Issue 8956001: Disable 5ms flusher on Android to reduce kernel thrashing. (Closed)

Created:
9 years ago by nduca

Modified:
9 years ago

Reviewers:
no sievers, piman

CC:
chromium-reviews, apatrick_chromium, reveman, marcheu

Base URL:
svn://svn.chromium.org/chrome/trunk/src

Visibility:
Public.

Description

Disable 5ms flusher on Android to reduce kernel thrashing.

Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=114688
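
For readers without the diff open, here is a minimal sketch of the idea behind this change. It is not the committed patch: PeriodicFlusher, MaybeFlush and the interval argument are hypothetical names chosen for illustration, and the sketch assumes the Android build defines OS_ANDROID. Other flush triggers that a real command buffer helper may have are not modelled.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for a command buffer helper. None of these names
// come from cmd_buffer_helper.cc.
class PeriodicFlusher {
 public:
  using Clock = std::chrono::steady_clock;

  PeriodicFlusher(Clock::duration interval, Clock::time_point start)
      : interval_(interval), last_flush_(start) {
    assert(interval > Clock::duration::zero());
  }

  // Called after a batch of commands has been written. Returns true if a
  // timed flush was issued.
  bool MaybeFlush(Clock::time_point now) {
#if defined(OS_ANDROID)
    // Assumption: on Android the timed flush is compiled out, so the
    // producer is not descheduled in the middle of a frame.
    (void)now;
    return false;
#else
    if (now - last_flush_ < interval_)
      return false;
    Flush(now);
    return true;
#endif
  }

  // Explicit flush, still available on every platform.
  void Flush(Clock::time_point now) {
    last_flush_ = now;
    ++flush_count_;
  }

  int flush_count() const { return flush_count_; }

 private:
  Clock::duration interval_;
  Clock::time_point last_flush_;
  int flush_count_ = 0;
};
```

With a 5 ms interval, a call 2 ms after the last flush does nothing and a call 6 ms after it flushes, unless OS_ANDROID is defined, in which case only explicit Flush() calls have any effect.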

Patch Set 2 : fix typo #

Created: 9 years ago

Stats: +5 lines, -1 line
M gpu/command_buffer/client/cmd_buffer_helper.cc (1 chunk, +5 lines, -1 line, 0 comments)

Messages

Total messages: 12 (0 generated)

nduca

On 2011/12/15 01:10:03, nduca wrote: On Android, this 5ms flush causes the descheduling of the ...

9 years ago (2011-12-15 01:15:58 UTC) #2

no sievers

http://codereview.chromium.org/8956001/diff/1/gpu/command_buffer/client/cmd_buffer_helper.cc File gpu/command_buffer/client/cmd_buffer_helper.cc (right): http://codereview.chromium.org/8956001/diff/1/gpu/command_buffer/client/cmd_buffer_helper.cc#newcode184 gpu/command_buffer/client/cmd_buffer_helper.cc:184: // amount of work has been done. On highend ...

9 years ago (2011-12-15 01:34:20 UTC) #3

piman

On Wed, Dec 14, 2011 at 5:15 PM, <nduca@chromium.org> wrote: > On 2011/12/15 01:10:03, nduca ...

9 years ago (2011-12-15 01:38:41 UTC) #4

nduca

Great questions! I gathered some data to support the discussion: http://www.corp.google.com/~nduca/clank_kernel_trace/viewer/cur_trace_without_flush.html http://www.corp.google.com/~nduca/clank_kernel_trace/viewer/cur_trace_with_flush.html This flusher was ...

9 years ago (2011-12-15 02:26:19 UTC) #5

piman

9 years ago (2011-12-15 03:53:47 UTC) #6

On Wed, Dec 14, 2011 at 6:26 PM, <nduca@chromium.org> wrote:

> Great questions! I gathered some data to support the discussion:
>
> http://www.corp.google.com/~nduca/clank_kernel_trace/viewer/cur_trace_without_flush.html
> http://www.corp.google.com/~nduca/clank_kernel_trace/viewer/cur_trace_with_flush.html
>
> This flusher was added back when we were trying, on our z600s, to get the
> GPU process cranking on commands early in the frame. On a low-end machine,
> when you do that, if the GPU process is going to execute anything that is
> cache- or memory-intensive, you're creating resource contention with
> whatever it is that you're "still trying to do." In our case, that's
> issuing the rest of the compositor frames.
>
> On Android, the scheduler prefers newly-woken threads for some reason.
> That just shifts the problem around.
>

Stéphane said that's an Android-specific thing that they did to increase
interactivity. That probably makes the whole thing way worse, because it
dramatically increases the number of context switches. On top of that, as
you're apparently very heavily CPU-bound, the context switches tend to be
longer lived and more likely to trash the cache.

> For every flush, you get a series of ack IPCs. These have an annoying
> tendency to land during the middle of the next frame --- adding context
> switching as they work their way through the system. For whatever reason,
> a single IPC wakeup tends to cost a third of a millisecond on Android [my
> guess is message loop, not the kernel, I routinely see the kernel wake up
> raw processes for barely a microsecond before shifting again].
>

The message loop is costly on Linux, I did notice that. Not *that* much on
Chrome OS but still. I suspect there are some micro-optimizations to be done
there. But combined with the Android kernel scheduling behavior, that makes
it worse.
On top of that, it seems that using clock() to decide when to send a Flush
is ill-advised, since it counts the entire process time regardless of which
thread (at least that's what the man page says, but MSDN gives it a
different meaning), and it obviously triggers a lot more than we'd want it
to.
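
To make the clock() point concrete, here is a small sketch that measures the same sleep with clock() and with a monotonic wall clock. MeasureSleep and Elapsed are hypothetical names, not anything from the Chromium tree. On POSIX systems clock() reports processor time charged to the whole process, so it barely moves while the thread sleeps and can run faster than wall time when several threads are busy; Microsoft's C runtime documents clock() as elapsed wall-clock time instead. A timer meant to fire "every 5 ms" would probably be better served by std::chrono::steady_clock, though that is a suggestion and not what the code under review does.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

struct Elapsed {
  double cpu_ms;   // advance of clock(): processor time on POSIX
  double wall_ms;  // advance of steady_clock: monotonic wall time
};

// Sleeps for sleep_ms milliseconds and reports how far each clock advanced.
inline Elapsed MeasureSleep(int sleep_ms) {
  assert(sleep_ms >= 0);
  const std::clock_t cpu_start = std::clock();
  const auto wall_start = std::chrono::steady_clock::now();

  std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));

  const std::clock_t cpu_end = std::clock();
  const auto wall_end = std::chrono::steady_clock::now();

  Elapsed e;
  e.cpu_ms =
      1000.0 * static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
  e.wall_ms =
      std::chrono::duration<double, std::milli>(wall_end - wall_start).count();
  return e;
}
```

Calling MeasureSleep(50) on a POSIX system should report a wall_ms of at least about 50 and a much smaller cpu_ms, which is why comparing clock() deltas against a fixed delay does not correspond to a fixed amount of real time.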

>
> Right now, I'm trying to eliminate these types of things where possible to
> reduce scheduling-induced noise. There are tons of other scheduling/GL/etc.
> issues that I need to study, and removing these things from consideration
> makes the remaining issues more visible.
>

TBH I think it's a great find and I'm totally fine with disabling it on
Android.
I'd be curious to verify whether or not we have the same sort of scheduling
behavior on Chrome OS, but I haven't seen evidence of that.

>
> Down the road, we should revisit this in a more formal way, with certainty.
>
>
> http://codereview.chromium.org/8956001/
>

marcheu

9 years ago (2011-12-15 04:17:07 UTC) #7

On Wed, Dec 14, 2011 at 19:53, Antoine Labour <piman@chromium.org> wrote:

> TBH I think it's a great find and I'm totally fine with disabling it on
> Android.
> I'd be curious to verify whether or not we have the same sort of
> scheduling behavior on Chrome OS, but I haven't seen evidence of that.
>

Yeah, I suspect that the big difference here is between Android's SGX (which
uses deferred rendering, so flushes have no effect on it) and Chrome OS's
Intel GPU (which loves early flushes). In short, as long as Chrome OS is on
Intel GPUs I suspect we'll be better off with these flushes enabled. At
this point this is just my intuition though, and I intend to apply your
change on Chrome OS and do some measurements with it. Which brings up the
question: do you have specific test cases which show a big difference for
you?

Stéphane

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/nduca@chromium.org/8956001/3002

9 years ago (2011-12-15 18:28:35 UTC) #10

marcheu

8 years, 11 months ago (2012-01-03 22:53:09 UTC) #12

On Wed, Dec 14, 2011 at 20:16, Stéphane Marchesin <marcheu@chromium.org> wrote:

> Yeah I suspect that the big difference here is between Android's SGX
> (which uses deferred rendering, so flushes have no effect on it) and Chrome
> OS's Intel GPU (which loves early flushes). In short as long as Chrome OS
> is on Intel GPUs I suspect we'll be better off with these flushes enabled.
>

Catching up on an old TODO item of mine: I just tested this change on
Chrome OS and I can't see any measurable difference here (on Alex).

Stéphane
