Created: 9 years ago by nduca
Modified: 9 years ago
CC: chromium-reviews, apatrick_chromium, reveman, marcheu
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description: Disable 5ms flusher on Android to reduce kernel thrashing.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=114688
Patch Set 1
Total comments: 1
Patch Set 2 : fix typo
Messages
Total messages: 12 (0 generated)
On 2011/12/15 01:10:03, nduca wrote: On Android, this 5ms flush causes the calling thread to be descheduled, which holds up compositing from completing. Even more annoying, each flush then has a corresponding ack that comes back to the main thread sometime in the middle of the next frame, descheduling us again for a ms or so. Removing this flush noticeably improves frame stability. I suspect this isn't particular to Android's kernel, hence the cc's of CrOS folks. Rather, I think the core issue here is that on <=2-core machines, flushing makes a bunch of threads runnable. If the kernel is feeling aggressive, it's going to thrash.
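The mechanism described above can be illustrated with a minimal sketch. This is not the code in cmd_buffer_helper.cc: `SketchCommandHelper`, `SimulateFrame`, `kPeriodicFlushEnabled` and the use of the `__ANDROID__` compiler macro are all hypothetical names and choices made for illustration, and the 16 ms frame with one command per millisecond is an assumed workload. It only shows how a time-based flusher adds extra flushes within a frame, and how a compile-time switch removes them.

```cpp
#include <cassert>
#include <chrono>

// Sketch only: every name here is hypothetical, not Chromium's.
#if defined(__ANDROID__)
constexpr bool kPeriodicFlushEnabled = false;  // flusher disabled on Android
#else
constexpr bool kPeriodicFlushEnabled = true;
#endif

constexpr std::chrono::milliseconds kFlushInterval(5);

class SketchCommandHelper {
 public:
  using Clock = std::chrono::steady_clock;

  SketchCommandHelper(bool periodic_flush, Clock::time_point start)
      : periodic_flush_(periodic_flush), last_flush_(start) {}

  // Called after each command is written. The current time is passed in
  // so the behaviour can be exercised without sleeping.
  void CommandWritten(Clock::time_point now) {
    assert(now >= last_flush_);  // time is expected to be monotonic
    ++pending_commands_;
    if (periodic_flush_ && now - last_flush_ >= kFlushInterval) Flush(now);
  }

  // Explicit flush, for example at the end of a frame.
  void Flush(Clock::time_point now) {
    if (pending_commands_ == 0) return;  // nothing to hand over
    pending_commands_ = 0;
    last_flush_ = now;
    ++flush_count_;
  }

  int flush_count() const { return flush_count_; }

 private:
  bool periodic_flush_;
  Clock::time_point last_flush_;
  int pending_commands_ = 0;
  int flush_count_ = 0;
};

// Simulates one 16 ms frame with a command written every millisecond,
// followed by the explicit end-of-frame flush. Returns the flush count.
int SimulateFrame(bool periodic_flush = kPeriodicFlushEnabled) {
  const SketchCommandHelper::Clock::time_point start{};
  SketchCommandHelper helper(periodic_flush, start);
  for (int ms = 1; ms <= 16; ++ms) {
    helper.CommandWritten(start + std::chrono::milliseconds(ms));
  }
  helper.Flush(start + std::chrono::milliseconds(16));
  return helper.flush_count();
}
```

Under these assumptions, `SimulateFrame(true)` performs four flushes per frame (at 5, 10 and 15 ms, plus the end-of-frame one) and `SimulateFrame(false)` performs one. Each extra flush is a point where other threads become runnable, which is the effect the change aims to avoid.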
http://codereview.chromium.org/8956001/diff/1/gpu/command_buffer/client/cmd_b...
File gpu/command_buffer/client/cmd_buffer_helper.cc (right):
http://codereview.chromium.org/8956001/diff/1/gpu/command_buffer/client/cmd_b...
gpu/command_buffer/client/cmd_buffer_helper.cc:184: // amount of work has been done. On highend machines, this reuces the
typo 'reuces' :)
On Wed, Dec 14, 2011 at 5:15 PM, <nduca@chromium.org> wrote:
> On Android, this 5ms flush causes the descheduling of the calling thread.

I'm curious about the details, to see if it can have similar effects on Chrome OS etc. Flush is an asynchronous message, so all it should do is post a task (which takes a lock that shouldn't be contended in the general case) and signal an event; it shouldn't cause the thread to be stopped at any point. Are you saying that even in that case, the kernel will prefer switching threads over staying on the same thread? Or am I missing something else that would cause the thread to be stopped, and can we fix that? I'm afraid that if it's the former we will be chasing ghosts everywhere...
Great questions! I gathered some data to support the discussion: http://www.corp.google.com/~nduca/clank_kernel_trace/viewer/cur_trace_without... http://www.corp.google.com/~nduca/clank_kernel_trace/viewer/cur_trace_with_fl... This flusher was added back when we were trying, on our z600s, to get the GPU process cranking on commands early in the frame. On a lowend machine, when you do that, if the GPU process is going to execute anything that is cache/memory heavy, you're creating resource contention with whatever it is that you're "still trying to do." In our case, that's issuing the rest of the compositor frames. On Android, the scheduler prefers newly-woken threads for some reason. That just shifts the problem around. For every flush, you get a series of ack IPCs. These have an annoying tendency to land during the middle of the next frame, adding context switching as they work their way through the system. For whatever reason, a single IPC wakeup tends to cost a third of a millisecond on Android [my guess is the message loop, not the kernel; I routinely see the kernel wake up raw processes for barely a microsecond before shifting again]. Right now, I'm trying to eliminate these types of things where possible to reduce scheduling-induced noise. There are tons of other scheduling/GL/etc. issues that I need to study, and removing these things from consideration makes the remaining issues more visible. Down the road, we should revisit this in a more formal way, with certainty.
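To put the figures above in proportion, here is a rough back-of-the-envelope sketch. The 5 ms flush interval and the one-third-of-a-millisecond wakeup cost come from the messages above; the 60 Hz frame time and the count of one ack wakeup per flush are assumptions made here (the thread says "a series of ack IPCs", so one is a lower bound). `EstimateAckCost` and `WakeupEstimate` are hypothetical helper names.

```cpp
#include <cassert>

// Result of the estimate; all values are per frame.
struct WakeupEstimate {
  int flushes_per_frame;       // periodic flushes that fit in one frame
  double wakeup_ms_per_frame;  // time spent in ack wakeups
  double fraction_of_frame;    // the same, as a share of the frame time
};

// Estimates ack-wakeup overhead caused by a periodic flusher.
// acks_per_flush is an assumption; the thread does not give a number.
WakeupEstimate EstimateAckCost(double frame_ms, double flush_interval_ms,
                               double wakeup_cost_ms, int acks_per_flush) {
  assert(frame_ms > 0 && flush_interval_ms > 0);
  const int flushes = static_cast<int>(frame_ms / flush_interval_ms);
  const double cost = flushes * acks_per_flush * wakeup_cost_ms;
  return {flushes, cost, cost / frame_ms};
}
```

With `EstimateAckCost(1000.0 / 60.0, 5.0, 1.0 / 3.0, 1)` this gives three periodic flushes per frame, about 1 ms of wakeups, or roughly 6% of a 16.7 ms frame, before counting any cache or scheduling side effects.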
On Wed, Dec 14, 2011 at 6:26 PM, <nduca@chromium.org> wrote:
> On Android, the scheduler prefers newly-woken threads for some reason. That just
> shifts the problem around.

Stéphane said that's an Android-specific thing that they did to increase interactivity. That probably makes the whole thing way worse, because it dramatically increases the number of context switches. On top of that, as you're apparently very heavily CPU-bound, the context switches tend to be longer lived and more likely to trash the cache.

> For every flush, you get a series of ack IPCs. These have an annoying tendency
> to land during the middle of the next frame --- adding context switching as they
> work their way through the system. For whatever reason, a single IPC wakeup
> tends to cost a third of a millisecond on Android [my guess is message loop, not
> the kernel, I routinely see the kernel wake up raw processes for barely a
> microsecond before shifting again].

The message loop is costly on Linux, I did notice that. Not *that* much on Chrome OS but still. I suspect there are some micro-optimizations to be done there. But combined with the Android kernel scheduling behavior, that makes it worse.
On top of that, it seems that clock() is ill-advised as the trigger for a Flush, since it counts the entire process's time regardless of which thread (at least that's what the man page says, but MSDN gives it a different meaning), and it obviously triggers a lot more than we'd want it to.

> Right now, I'm trying to eliminate these types of things where possible to
> reduce scheduling-induced noise. There are tons of other scheduling/GL/etc etc
> etc issues that I need to study and removing these things from consideration
> makes the remaining issues more visible.

TBH I think it's a great find and I'm totally fine with disabling it on Android.
I'd be curious to verify whether or not we have the same sort of scheduling behavior on Chrome OS, but I haven't seen evidence of that.
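The point about clock() can be seen by sampling it next to a monotonic wall clock over the same interval. The sketch below is illustrative only; `MeasureSleep` and `ElapsedSample` are hypothetical names. On POSIX systems `std::clock()` reports processor time consumed by the whole process, so it advances with work done on any thread and barely moves while the process sleeps; Microsoft's C runtime instead reports elapsed wall-clock time. `std::chrono::steady_clock` measures elapsed time on both.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

// Elapsed time over one interval, as seen by two different clocks.
struct ElapsedSample {
  double cpu_ms;   // from std::clock(): process CPU time on POSIX
  double wall_ms;  // from std::chrono::steady_clock: elapsed time
};

// Sleeps for sleep_time and reports how far each clock advanced.
ElapsedSample MeasureSleep(std::chrono::milliseconds sleep_time) {
  const std::clock_t cpu_start = std::clock();
  const auto wall_start = std::chrono::steady_clock::now();

  std::this_thread::sleep_for(sleep_time);

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  ElapsedSample sample;
  sample.cpu_ms =
      1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  sample.wall_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  return sample;
}
```

After `MeasureSleep(std::chrono::milliseconds(20))`, `wall_ms` is at least 20, while `cpu_ms` depends on the platform's meaning of clock(). A timer meant to fire after 5 ms of elapsed time would therefore behave more predictably on a monotonic clock than on clock(); whether that is the right fix for the flusher is not something this thread settles.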
On Wed, Dec 14, 2011 at 19:53, Antoine Labour <piman@chromium.org> wrote:
> TBH I think it's a great find and I'm totally fine with disabling it on Android.
> I'd be curious to verify whether or not we have the same sort of scheduling
> behavior on Chrome OS, but I haven't seen evidence of that.

Yeah, I suspect that the big difference here is between Android's SGX (which uses deferred rendering, so flushes have no effect on it) and Chrome OS's Intel GPU (which loves early flushes). In short, as long as Chrome OS is on Intel GPUs I suspect we'll be better off with these flushes enabled. At this point this is just my intuition though, and I intend to apply your change on Chrome OS and do some measurements with it. Which raises the question: do you have specific test cases which show a big difference for you?

Stéphane
kk. Owners, start your lgtms
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/nduca@chromium.org/8956001/3002
Change committed as 114688
On Wed, Dec 14, 2011 at 20:16, Stéphane Marchesin <marcheu@chromium.org> wrote:
> Yeah I suspect that the big difference here is between Android's SGX
> (which uses deferred rendering, so flushes have no effect on it) and Chrome
> OS's Intel GPU (which loves early flushes). In short as long as Chrome OS
> is on Intel GPUs I suspect we'll be better off with these flushes enabled.

Catching up on an old TODO item of mine: I just tested this change on Chrome OS and I can't see any measurable difference here (on Alex).

Stéphane