Created: 7 years, 1 month ago by pasko
Modified: 6 years, 2 months ago
CC: chromium-reviews, craigdh+watch_chromium.org, bulach+watch_chromium.org, yfriedman+watch_chromium.org, ilevy-cc_chromium.org, klundberg+watch_chromium.org, frankf+watch_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Visibility: Public.
Description: Telemetry: manage CPU core online state for some device types
To reduce the variance in testing on Qualcomm devices:
1. stop the mpdecision daemon from managing CPU cores
2. enable all cores, as this produces more stable page load times.
Also output CPU frequency numbers before and after running tests, to get an idea of what typical frequencies occur and to manage them later as constants.
Let's see how this works and, in later CLs, attempt to manage CPU frequencies.
BUG=318915
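As a rough illustration of the approach in the description (not the actual patch, which lives in build/android/pylib/perf/perf_control.py), stopping the hotplug daemon and forcing cores online amounts to a few root shell writes to sysfs. The helper names and the device-serial plumbing below are hypothetical:

  import subprocess

  def _adb_shell_as_root(serial, shell_cmd):
    # Hypothetical helper: run a shell command on the device as root via adb.
    return subprocess.check_output(
        ['adb', '-s', serial, 'shell', 'su -c "%s"' % shell_cmd]).decode()

  def force_cpus_online(serial, num_cpus):
    # Stop the Qualcomm hotplug daemon so it cannot take cores offline again.
    _adb_shell_as_root(serial, 'stop mpdecision')
    for cpu in range(num_cpus):
      # Writing '1' to the per-CPU 'online' file brings the core up.
      _adb_shell_as_root(
          serial, 'echo 1 > /sys/devices/system/cpu/cpu%d/online' % cpu)

  def dump_cpu_frequencies(serial, num_cpus):
    # Log current frequencies before/after a test run, as the description suggests.
    for cpu in range(num_cpus):
      freq = _adb_shell_as_root(
          serial,
          'cat /sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq' % cpu)
      print('cpu%d: %s kHz' % (cpu, freq.strip()))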
Patch Set 1 #
Total comments: 39
Patch Set 2 : . #
Total comments: 2
Patch Set 3 : enable 4 cores, modify files without su #
Patch Set 4 : Useful only for N4, enable only for that #
Patch Set 5 : rebased on top of r263055 #
Messages
Total messages: 25 (0 generated)
The state management in this code is complex, but I could not make it simpler without breaking some cases. Also, am I right that we are not accustomed to testing changes like this?
thanks egor! just nits and suggestions, I like the overall idea!!

not sure what you mean by "not accustomed to testing changes"? this is part of the test harness, and it'll be exercised all the time... if you meant tests for the tests, then I'd agree, we probably stop at some level (especially since this is essentially issuing adb shell commands based on the device response, there's not much logic to test...) on parts that have more meaty logic, we eventually add tests... for instance, just recently dominik has kindly added some for the "surface stats": https://codereview.chromium.org/71353006/ so I think the answer to your question is "it depends" :) where there's meaty logic, we tend to have tests (certainly in telemetry), but thin wrappers most of the time don't...

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:21: ]
nit: align with _ from 17

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:35: # Manage CPUs on some device types manually to reduce variance in results.
"manually" is a bit misleading, and they're all "managed" one way or another....
so perhaps:
self._forced_cpus as the list (i.e, in place of "cpus_to_manager" ?
no need for the bool, you can infer that from self._forced_cpus not being empty... wdyt?

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:55: for cpu in xrange(self._kernel_max +1):
nit: space after +

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:60: logging.info('Bringing CPU #%d offline.' %cpu_index)
nit: logging takes a format and varargs, so replace % with ,

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:73: else:
nit: no need for else, unindent the following block.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:75: logging.warning('Gave up bringing CPU %d online')
nit: missing ", cpu_index"

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:78: logging.warning('Could not bring CPU %d online, retrying..' %
nit: s/%/,/

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:83: return '\n'.join(self._adb.GetProtectedFileContents(file_name))
is this function needed? :)
line 71 would be fine:
'1' in ['1', 'foo', 'bar']
and then just do this directly at line 90...

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:100: # Stop the proprietary 'mpdecision' hotplug manager to get a constant
is "proprietary" relevant? How about just "Stop 'mdpdecision' hotplug ...

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:128: pass
sorry, this block is a bit confusing :) how about just inside 122:
if not self._forced_cpus:
  # Set the default perf mode only if the CPU was not forced online.
  for cpu in xrange(...):
    ...

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:152: logging.info('Writing scaling governor mode \'%s\' -> %s',
nit: replace \' with "
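For readers unfamiliar with the logging nit above: passing the arguments to the logging call instead of pre-formatting with % lets logging do the interpolation itself. A minimal sketch:

  import logging

  logging.basicConfig(level=logging.INFO)
  cpu_index = 1
  # Works, but the string is formatted eagerly and a missing argument raises
  # at the call site:
  logging.info('Bringing CPU #%d offline.' % cpu_index)
  # Preferred: let logging interpolate lazily from the varargs.
  logging.info('Bringing CPU #%d offline.', cpu_index)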
In addition to the inline comments, I have never really understood what we are trying to test by setting performance mode. On the only tests we run with and without performance mode (the old warm and cold startup tests on kapook.com) the results track each other very closely, and there is no evidence that one is noisier than the other. The more we modify the phone's behaviour from its default behaviour, the less likely it is that the regressions we find correspond to the regressions experienced by real users.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:48: 'Nexus 5': [0, 1],
What is your intention in only managing two of the CPUs? I have certainly seen, in traces, Chrome using 4 CPUs, and the (relatively small) difference in Chrome startup time between a normal Nexus 4 and a svelte Nexus 4 appeared to be due to svelte disabling two of the CPUs.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:61: self._adb.SetProtectedFileContents(PerfControl._CPU_ONLINE_FMT, '0')
This doesn't use cpu_index, how does it know which CPU to take offline? Do you mean PerfControl._CPU_ONLINE_FMT % cpu_index?

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:101: # number of CPUs online when running tests. TODO(pasko): stop 'thermald'.
Isn't stopping thermald going to burn out the device?

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:118: 'Nexus 4': 'ondemand',
Please add 'Nexus 5' to this list.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:142: if cpu in self._cpus_on_originally:
I think if the CPU was both online originally and was in the manually managed set you need to reset the performance mode.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:145: self._BringCpuOffline(cpu)
Does this clear the performance mode back to the default, or does it simply hide it? If it simply hides it then you need to reset the performance mode before taking it offline.
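To illustrate the bug flagged at line 61 above (the format string is never interpolated with the CPU index), here is a minimal sketch assuming _CPU_ONLINE_FMT is the usual sysfs path template; the adb calls are shown as comments since the wrapper object is not defined here:

  # Assumed template, matching the sysfs layout on Android/Linux.
  _CPU_ONLINE_FMT = '/sys/devices/system/cpu/cpu%d/online'

  cpu_index = 2
  # Buggy: writes to the literal path '/sys/devices/system/cpu/cpu%d/online'.
  #   adb.SetProtectedFileContents(_CPU_ONLINE_FMT, '0')
  # Fixed: interpolate the index first.
  #   adb.SetProtectedFileContents(_CPU_ONLINE_FMT % cpu_index, '0')
  print(_CPU_ONLINE_FMT % cpu_index)  # -> /sys/devices/system/cpu/cpu2/online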
Marcus, Anthony, thank you for review!

On 2013/11/15 19:00:12, aberent wrote:
> In addition to the inline comments, I have never really understood what we are
> trying to test by setting performance mode. On the only tests we run with and
> without performance mode (the old warm and cold startup tests on kapook.com)
> the results track each other very closely, and there is no evidence that one is
> noisier than the other. The more we modify the phone's behaviour from its
> default behaviour, the less likely it is that the regressions we find
> correspond to the regressions experienced by real users.

I feel strongly that we have to reduce all possible sources of variance in our measurements. Without actively fighting for reproducible results, we cannot track regressions, or even observe any performance change. This CL addresses one point about variance, but alone it will not magically fix our numbers. There are a few known methodological problems that, if fixed, together may produce better results, or they may not, due to some other still unknown sources of variance that are masking the signal.

Startup tests have a built-in variability, I believe, because they rely a lot on loading binaries from disk, which makes the flash controller a source of variability. So I am not surprised that we cannot sort out signal from noise in startup tests. However, my hope is that not all tests are like this; *some* JavaScript non-GC-intensive workloads might be much more stable than others.

To answer your question: if we want to choose which of the performance modes should be our constant throughout the test runs, the highest-performance mode that users hit should be our choice, for a few reasons: (a) CPUs are designed to be more predictable in this mode, and (b) optimizing hot codepaths usually pays off better than optimizing cold paths. There are issues with high-perf mode that are hard:
1. melting silicon
2. running benchmarks always in high-perf mode would mask some issues with power management induced by our software: when we drop frames at the wrong times, causing later perf-critical parts to be executed in low-power CPU mode.
I don't have any good generic solutions in mind for (1) and (2), rather some ad-hoc, pain-in-the-tail ones, but I still feel strongly that we should always run performance tests in high-performance CPU mode.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:21: ]
On 2013/11/15 17:54:43, bulach wrote:
> nit: align with _ from 17
Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:35: # Manage CPUs on some device types manually to reduce variance in results.
On 2013/11/15 17:54:43, bulach wrote:
> "manually" is a bit misleading, and they're all "managed" one way or another....
> so perhaps:
> self._forced_cpus as the list (i.e, in place of "cpus_to_manager" ?
> no need for the bool, you can infer that from self._forced_cpus not being
> empty... wdyt?
I tried that, and it looks less readable to me to check for emptiness of a list in a few places compared to checking for one boolean mode-like variable that is set up once and remains constant throughout the lifetime of the object. Though arguably the list is set up only once too. _forced_cpus also sounds a bit general, how about _must_be_online_cpus?

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:48: 'Nexus 5': [0, 1],
On 2013/11/15 19:00:12, aberent wrote:
> What is your intention in only managing two of the CPUs? I have certainly seen,
> in traces, Chrome using 4 CPUs, and the (relatively small) difference in Chrome
> startup time between a normal Nexus 4 and a svelte Nexus 4 appeared to be due
> to svelte disabling two of the CPUs.
I looked at a few pagecycler runs on the bot and found that 2-3 CPUs are used at the end of the tests. With 2 I am also trying to be on the safer side to avoid overheating, but we can certainly tweak these numbers later. The main purpose is making the number of CPUs online a constant throughout the test run, to make the results more predictable. Frankly, I did not measure whether anything in fact became more stable; we would need to check that on the dashboard, since the variability impact can be quite different for different tests.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:55: for cpu in xrange(self._kernel_max +1):
On 2013/11/15 17:54:43, bulach wrote:
> nit: space after +
Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:60: logging.info('Bringing CPU #%d offline.' %cpu_index)
On 2013/11/15 17:54:43, bulach wrote:
> nit: logging takes a format and varargs, so replace % with ,
Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:61: self._adb.SetProtectedFileContents(PerfControl._CPU_ONLINE_FMT, '0')
On 2013/11/15 19:00:12, aberent wrote:
> This doesn't use cpu_index, how does it know which CPU to take offline? Do you
> mean PerfControl._CPU_ONLINE_FMT % cpu_index?
That's a great catch! Thank you! Do you think it makes sense to mock self._adb and verify the logic in unittests? The logic is not particularly tricky, but... with Python I am never sure even the syntax of the code is correct without covering every line of code at runtime, which can be even more tedious than writing unittests. Yes, I am not a big advocate for Python :)

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:73: else:
On 2013/11/15 17:54:43, bulach wrote:
> nit: no need for else, unindent the following block.
Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:75: logging.warning('Gave up bringing CPU %d online')
On 2013/11/15 17:54:43, bulach wrote:
> nit: missing ", cpu_index"
Thank you! You know, I really got used to GCC warnings about unsatisfied format strings, but luckily good reviewers do the job of the compiler. It adds a little bit of a feeling of guilt, but nothing unmanageable :) Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:78: logging.warning('Could not bring CPU %d online, retrying..' %
On 2013/11/15 17:54:43, bulach wrote:
> nit: s/%/,/
Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:83: return '\n'.join(self._adb.GetProtectedFileContents(file_name))
On 2013/11/15 17:54:43, bulach wrote:
> is this function needed? :)
> line 71 would be fine:
> '1' in ['1', 'foo', 'bar']
> and then just do this directly at line 90...
This function made 3 lines (lines 56, 72 and 90) shorter and not wrap; on the other hand, it added 3 lines to the file. The final argument of it having a descriptive name won.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:100: # Stop the proprietary 'mpdecision' hotplug manager to get a constant
On 2013/11/15 17:54:43, bulach wrote:
> is "proprietary" relevant? How about just "Stop 'mdpdecision' hotplug ...
No, not relevant, I was just angry :) Removed. I would not take your s/ suggestion with the typo because it may sound even more rude than "proprietary". But if you want, we can make it "m#$%decision".

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:101: # number of CPUs online when running tests. TODO(pasko): stop 'thermald'.
On 2013/11/15 19:00:12, aberent wrote:
> Isn't stopping thermald going to burn out the device?
Yes, it certainly may. So I was thinking of having it as an option for devices with enough heatsinks installed. That is not typical of user devices, but I value low variance more than being maximally representative. Let's separate this: in-lab performance tests should have low variance, in-field metrics/histograms should be representative. WDYT?

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:118: 'Nexus 4': 'ondemand',
On 2013/11/15 19:00:12, aberent wrote:
> Please add 'Nexus 5' to this list.
Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:128: pass
On 2013/11/15 17:54:43, bulach wrote:
> sorry, this block is a bit confusing :) how about just inside 122:
> if not self._forced_cpus:
>   # Set the default perf mode only if the CPU was not forced online.
>   for cpu in xrange(...):
>     ...
Sorry, it was. Good suggestion. Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:142: if cpu in self._cpus_on_originally:
On 2013/11/15 19:00:12, aberent wrote:
> I think if the CPU was both online originally and was in the manually managed
> set you need to reset the performance mode.
That's right. I forgot to restore the scaling governor mode, mainly because it would be manipulated by mpdecision anyway. This would be cleaner if restores were performed with the scaling governor involved explicitly. Done. By the way, I always wanted to ask: do we use SetDefaultPerfMode() for something that is running on perf bots, or is it just for local experimentation?

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:145: self._BringCpuOffline(cpu)
On 2013/11/15 19:00:12, aberent wrote:
> Does this clear the performance mode back to the default, or does it simply
> hide it? If it simply hides it then you need to reset the performance mode
> before taking it offline.
Done.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:152: logging.info('Writing scaling governor mode \'%s\' -> %s',
On 2013/11/15 17:54:43, bulach wrote:
> nit: replace \' with "
Done.
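The restore-the-governor point discussed above boils down to saving each CPU's scaling_governor value before forcing 'performance' and writing it back on teardown. A minimal sketch of that idea, assuming direct sysfs access on a rooted device (the real code goes through the adb wrapper instead):

  _GOVERNOR_FMT = '/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor'

  def set_governor(cpu, mode):
    with open(_GOVERNOR_FMT % cpu, 'w') as f:
      f.write(mode)

  def run_with_performance_governor(cpus, test_fn):
    # Remember what each CPU was using so it can be restored afterwards.
    saved = {}
    for cpu in cpus:
      with open(_GOVERNOR_FMT % cpu) as f:
        saved[cpu] = f.read().strip()
      set_governor(cpu, 'performance')
    try:
      return test_fn()
    finally:
      # Restore the original governor even if the test raises.
      for cpu, mode in saved.items():
        set_governor(cpu, mode)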
On 2013/11/18 15:11:18, pasko wrote:
> Marcus, Anthony, thank you for review!
>
> On 2013/11/15 19:00:12, aberent wrote:
> > In addition to the inline comments, I have never really understood what we
> > are trying to test by setting performance mode. On the only tests we run with
> > and without performance mode (the old warm and cold startup tests on
> > kapook.com) the results track each other very closely, and there is no
> > evidence that one is noisier than the other. The more we modify the phone's
> > behaviour from its default behaviour, the less likely it is that the
> > regressions we find correspond to the regressions experienced by real users.
>
> I feel strongly that we have to reduce all possible sources of variance in our
> measurements. Without actively fighting for reproducible results, we cannot
> track regressions, or even observe any performance change. This CL addresses
> one point about variance, but alone it will not magically fix our numbers.
> There are a few known methodological problems that, if fixed, together may
> produce better results, or they may not, due to some other still unknown
> sources of variance that are masking the signal.
>
> Startup tests have a built-in variability, I believe, because they rely a lot
> on loading binaries from disk, which makes the flash controller a source of
> variability. So I am not surprised that we cannot sort out signal from noise in
> startup tests. However, my hope is that not all tests are like this; *some*
> JavaScript non-GC-intensive workloads might be much more stable than others.

Actually the startup tests, at least on Manta and Mako, have much lower variability than many of our other tests. See https://chromeperf.appspot.com/report?masters=ClankUpstreamInternal&bots=mako.... The typical sd on the kapook.com results on Manta is around 1%, and it is less than this on Mako. My worry, apart from performance mode not being representative, is that setting performance mode may actually increase the variability of the tests by increasing thermal throttling. On the startup tests we throw away results and re-run any tests that are thermally throttled; I don't think we yet do this on Telemetry tests, although we do print a warning in the log.

It would be interesting to try some tests with and without performance mode, and see how the variability compares.

Maybe we should really have two sets of bots: one for test devices (possibly development boards with lots of extra cooling) running permanently at maximum performance, and a second set running typical user devices in a typical user configuration.
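The throw-away-and-retry behaviour aberent describes for thermally throttled startup runs amounts to the loop below. This is a hypothetical sketch, not the actual harness code; run_startup_test and is_device_throttled stand in for whatever the real test and throttle checks are:

  def run_until_not_throttled(run_startup_test, is_device_throttled,
                              max_attempts=5):
    """Re-run a benchmark until a run completes without thermal throttling."""
    for attempt in range(max_attempts):
      result = run_startup_test()
      if not is_device_throttled():
        return result
      # The run is suspect: the CPU was clocked down mid-test, so discard it.
      print('Attempt %d was thermally throttled, retrying...' % (attempt + 1))
    raise RuntimeError(
        'Device overheated on every run, giving up after %d retries'
        % max_attempts)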
https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:101: # number of CPUs online when running tests. TODO(pasko): stop 'thermald'.
On 2013/11/18 15:11:18, pasko wrote:
> On 2013/11/15 19:00:12, aberent wrote:
> > Isn't stopping thermald going to burn out the device?
>
> Yes, it certainly may. So I was thinking of having it as an option for devices
> with enough heatsinks installed. That is not typical of user devices, but I
> value low variance more than being maximally representative. Let's separate
> this: in-lab performance tests should have low variance, in-field
> metrics/histograms should be representative. WDYT?
Makes sense, if we can find devices that will run reliably at full performance without overheating.
On 2013/11/18 16:31:51, aberent wrote:
> Actually the startup tests, at least on Manta and Mako, have much lower
> variability than many of our other tests. See
> https://chromeperf.appspot.com/report?masters=ClankUpstreamInternal&bots=mako....
> The typical sd on the kapook.com results on Manta is around 1%, and it is less
> than this on Mako. My worry, apart from performance mode not being
> representative, is that setting performance mode may actually increase the
> variability of the tests by increasing thermal throttling. On the startup tests
> we throw away results and re-run any tests that are thermally throttled; I
> don't think we yet do this on Telemetry tests, although we do print a warning
> in the log.

Oh, that's great stuff! By the way, did you look at CPU utilization numbers on startup tests? I would not be surprised if it's below 50%. Does thermal throttling appear often on startup tests? Sorry for the off-topic :)

> It would be interesting to try some tests with and without performance mode,
> and see how the variability compares.
>
> Maybe we should really have two sets of bots: one for test devices (possibly
> development boards with lots of extra cooling) running permanently at maximum
> performance, and a second set running typical user devices in a typical user
> configuration.

Yes, that'd be cool. The 2x maintenance cost makes me sad, though. So if I am forced to choose between real devices and rock-stable cooled dev boards (do they exist?), my choice should be obvious now.
On 2013/11/18 16:48:22, pasko wrote:
> On 2013/11/18 16:31:51, aberent wrote:
> > Actually the startup tests, at least on Manta and Mako, have much lower
> > variability than many of our other tests. See
> > https://chromeperf.appspot.com/report?masters=ClankUpstreamInternal&bots=mako....
> > The typical sd on the kapook.com results on Manta is around 1%, and it is
> > less than this on Mako. My worry, apart from performance mode not being
> > representative, is that setting performance mode may actually increase the
> > variability of the tests by increasing thermal throttling. On the startup
> > tests we throw away results and re-run any tests that are thermally
> > throttled; I don't think we yet do this on Telemetry tests, although we do
> > print a warning in the log.
>
> Oh, that's great stuff! By the way, did you look at CPU utilization numbers on
> startup tests? I would not be surprised if it's below 50%. Does thermal
> throttling appear often on startup tests?

I don't know what the overall CPU usage is on the startup tests, but I do know that some parts of the startup sequence are CPU bound, and other parts are a long way from being CPU bound. There is a measurable difference in startup time between a Nexus 4 with four enabled CPUs and one with two enabled CPUs (an occam svelte), and if one looks at a trace one can see that at times during startup all the CPUs are running at 100% (even with 4 CPUs). At one time we had an issue on some targets, when running with the default performance governor setting, where the CPU frequency would be stepped down during the non-CPU-bound sections of startup and would then take some time to step up again in the CPU-bound sections, but this has now (I think) gone away due to other changes in the startup sequence. Note that we would not have seen this issue if we had been running in performance mode, but it was entirely reproducible.

Yes, thermal throttling is a big problem in the startup tests. I originally wrote the thermal throttling code because thermal throttling was a major source of variance in the startup tests. I am rather surprised that kapook.com gives any results at all on the bots; I haven't been able to run it locally on devices that detect thermal throttling since I put in the thermal throttling check, since they always overheat on every run and give up after 5 retries.

> Sorry for the off-topic :)
>
> > It would be interesting to try some tests with and without performance mode,
> > and see how the variability compares.
> >
> > Maybe we should really have two sets of bots: one for test devices (possibly
> > development boards with lots of extra cooling) running permanently at maximum
> > performance, and a second set running typical user devices in a typical user
> > configuration.
>
> Yes, that'd be cool. The 2x maintenance cost makes me sad, though. So if I am
> forced to choose between real devices and rock-stable cooled dev boards (do
> they exist?), my choice should be obvious now.
On 2013/11/18 17:22:03, aberent wrote:
> On 2013/11/18 16:48:22, pasko wrote:
> > On 2013/11/18 16:31:51, aberent wrote:
> > > Actually the startup tests, at least on Manta and Mako, have much lower
> > > variability than many of our other tests. See
> > > https://chromeperf.appspot.com/report?masters=ClankUpstreamInternal&bots=mako....
> > > The typical sd on the kapook.com results on Manta is around 1%, and it is
> > > less than this on Mako. My worry, apart from performance mode not being
> > > representative, is that setting performance mode may actually increase the
> > > variability of the tests by increasing thermal throttling. On the startup
> > > tests we throw away results and re-run any tests that are thermally
> > > throttled; I don't think we yet do this on Telemetry tests, although we do
> > > print a warning in the log.
> >
> > Oh, that's great stuff! By the way, did you look at CPU utilization numbers
> > on startup tests? I would not be surprised if it's below 50%. Does thermal
> > throttling appear often on startup tests?
>
> I don't know what the overall CPU usage is on the startup tests, but I do know
> that some parts of the startup sequence are CPU bound, and other parts are a
> long way from being CPU bound. There is a measurable difference in startup time
> between a Nexus 4 with four enabled CPUs and one with two enabled CPUs (an
> occam svelte), and if one looks at a trace one can see that at times during
> startup all the CPUs are running at 100% (even with 4 CPUs). At one time we had
> an issue on some targets, when running with the default performance governor
> setting, where the CPU frequency would be stepped down during the
> non-CPU-bound sections of startup and would then take some time to step up
> again in the CPU-bound sections, but this has now (I think) gone away due to
> other changes in the startup sequence. Note that we would not have seen this
> issue if we had been running in performance mode, but it was entirely
> reproducible.
>
> Yes, thermal throttling is a big problem in the startup tests. I originally
> wrote the thermal throttling code because thermal throttling was a major source
> of variance in the startup tests. I am rather surprised that kapook.com gives
> any results at all on the bots; I haven't been able to run it locally on
> devices that detect thermal throttling since I put in the thermal throttling
> check, since they always overheat on every run and give up after 5 retries.

Thanks! Interesting.

P.S.: Looked at kapook.com for the first time. Well, an impressive piece of design and engineering, I would say. I hope it stands out separately as a non-typical marvel.
oops, forgot to upload my latest changes yesterday. Now done, PTAL.
lgtm, mostly nits.. please give aberent a chance to look into it as well..

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:35: # Manage CPUs on some device types manually to reduce variance in results.
On 2013/11/18 15:11:18, pasko wrote:
> On 2013/11/15 17:54:43, bulach wrote:
> > "manually" is a bit misleading, and they're all "managed" one way or
> > another....
> > so perhaps:
> > self._forced_cpus as the list (i.e, in place of "cpus_to_manager" ?
> > no need for the bool, you can infer that from self._forced_cpus not being
> > empty... wdyt?
>
> I tried that, and it looks less readable to me to check for emptiness of a list
> in a few places compared to checking for one boolean mode-like variable that is
> set up once and remains constant throughout the lifetime of the object. Though
> arguably the list is set up only once too. _forced_cpus also sounds a bit
> general, how about _must_be_online_cpus?

how about this: extract cpu_indexes_to_manage_manually into a constant:

_MODELS_TO_FULLY_MANAGED_CPU = {
  'Nexus 4': [0, 1],
  'Nexus 5': [0, 1],
}

then rather than a bool + list, it can use:
self._fully_managed_cpus = _MODELS_TO_FULLY_MANAGED_CPU.get(
    self._adb.GetProductModel(), [])

and from then on, you can just test "if self._fully_managed_cpus" as a simple bool-like test..

I still think "manually" is a bit misleading as nothing here is manual, this script can't even be called from the shell directly :)

...or perhaps even more directly, like governor_mode, and just have:

self._must_be_online_cpus = {
  'Nexus 4': [0, 1],
  'Nexus 5': [0, 1],
}.get(self._adb.GetProductModel(), [])

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:83: return '\n'.join(self._adb.GetProtectedFileContents(file_name))
On 2013/11/18 15:11:18, pasko wrote:
> On 2013/11/15 17:54:43, bulach wrote:
> > is this function needed? :)
> > line 71 would be fine:
> > '1' in ['1', 'foo', 'bar']
> > and then just do this directly at line 90...
>
> This function made 3 lines (lines 56, 72 and 90) shorter and not wrap; on the
> other hand, it added 3 lines to the file. The final argument of it having a
> descriptive name won.

perhaps then name it _GetProtectedFileContents, like the underlying method it's calling?

https://codereview.chromium.org/73173005/diff/120001/build/android/pylib/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/120001/build/android/pylib/perf...
build/android/pylib/perf/perf_control.py:102: # online when running tests. TODO(pasko): stop 'thermald'.
make sure you send a heads up to chrome-perf-sheriffs, and please keep an eye on the bots to ensure nothing will burn out :) assuming you tried locally, right?
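As a self-contained illustration of the pattern bulach suggests above (a model-keyed dict whose .get() fallback doubles as a bool-like test), here is a small runnable sketch; DummyAdb stands in for the real adb wrapper used by PerfControl:

  _MODELS_TO_FULLY_MANAGED_CPUS = {
    'Nexus 4': [0, 1],
    'Nexus 5': [0, 1],
  }

  class DummyAdb(object):
    # Stand-in for the real adb wrapper.
    def GetProductModel(self):
      return 'Nexus 5'

  adb = DummyAdb()
  # Unknown models fall back to an empty list, which is falsy.
  fully_managed_cpus = _MODELS_TO_FULLY_MANAGED_CPUS.get(
      adb.GetProductModel(), [])
  if fully_managed_cpus:
    print('Forcing CPUs %s online' % fully_managed_cpus)
  else:
    print('Leaving CPU hotplug management to the device')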
https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:35: # Manage CPUs on some device types manually to reduce variance in results.
On 2013/11/19 15:13:57, bulach wrote:
> On 2013/11/18 15:11:18, pasko wrote:
> > On 2013/11/15 17:54:43, bulach wrote:
> > > "manually" is a bit misleading, and they're all "managed" one way or
> > > another....
> > > so perhaps:
> > > self._forced_cpus as the list (i.e, in place of "cpus_to_manager" ?
> > > no need for the bool, you can infer that from self._forced_cpus not being
> > > empty... wdyt?
> >
> > I tried that, and it looks less readable to me to check for emptiness of a
> > list in a few places compared to checking for one boolean mode-like variable
> > that is set up once and remains constant throughout the lifetime of the
> > object. Though arguably the list is set up only once too. _forced_cpus also
> > sounds a bit general, how about _must_be_online_cpus?
>
> how about this: extract cpu_indexes_to_manage_manually into a constant:
>
> _MODELS_TO_FULLY_MANAGED_CPU = {
>   'Nexus 4': [0, 1],
>   'Nexus 5': [0, 1],
> }
>
> then rather than a bool + list, it can use:
> self._fully_managed_cpus = _MODELS_TO_FULLY_MANAGED_CPU.get(
>     self._adb.GetProductModel(), [])
>
> and from then on, you can just test "if self._fully_managed_cpus" as a simple
> bool-like test..
>
> I still think "manually" is a bit misleading as nothing here is manual, this
> script can't even be called from the shell directly :)
>
> ...or perhaps even more directly, like governor_mode, and just have:
>
> self._must_be_online_cpus = {
>   'Nexus 4': [0, 1],
>   'Nexus 5': [0, 1],
> }.get(self._adb.GetProductModel(), [])

I like "fully managed CPUs" as a term. Thanks!! self._fully_managed_cpus is okay, I'll go with that. It is less readable to me when one variable gets interpreted as different types in different contexts (I just had one of those rare Perl flashbacks), but you are right, it seems more pythonistic readers will find it more convenient.

https://codereview.chromium.org/73173005/diff/1/build/android/pylib/perf/perf...
build/android/pylib/perf/perf_control.py:83: return '\n'.join(self._adb.GetProtectedFileContents(file_name))
On 2013/11/19 15:13:57, bulach wrote:
> perhaps then name it _GetProtectedFileContents, like the underlying method it's
> calling?
Whichever you like. Done.

https://codereview.chromium.org/73173005/diff/120001/build/android/pylib/perf...
File build/android/pylib/perf/perf_control.py (right):

https://codereview.chromium.org/73173005/diff/120001/build/android/pylib/perf...
build/android/pylib/perf/perf_control.py:102: # online when running tests. TODO(pasko): stop 'thermald'.
On 2013/11/19 15:13:57, bulach wrote:
> make sure you send a heads up to chrome-perf-sheriffs,
Yes, I am planning to let the sheriffs know; this can shift performance numbers in unexpected ways.
> and please keep an eye on the bots to ensure nothing will burn out :) assuming
> you tried locally, right?
I ran this locally, of course. It is very unlikely that anything burns with the current CL, so I'll leave most of the watching to the sheriffs and will just occasionally look at the bots.
Great catch on the scaling governor fixes! Could we please land those fixes first so we can observe their effects in isolation? However, I'm really skeptical about the idea of disabling cores. Can you explain that a little further and punt it out to a separate CL?
On 2013/11/19 18:07:24, tonyg wrote:
> Great catch on the scaling governor fixes! Could we please land those fixes
> first so we can observe their effects in isolation?
>
> However, I'm really skeptical about the idea of disabling cores. Can you
> explain that a little further and punt it out to a separate CL?

The scaling governor mode not being set for all CPUs (the scaling governor issue) is tightly connected to managing cores. When we stop 'mpdecision' there is no daemon or kernel module left to bring CPUs back online on Qualcomm chips. So when we run tests we need to bring up a somewhat representative number of CPUs directly in the Telemetry PerfControl. In practice we never stop a CPU, we only bring up an extra core on N4/N5 slightly before 'mpdecision' would have done so.

I looked at a few perfbot logs to get a somewhat representative number of CPUs online at the moment the tests had just finished running, though this can be after the device had cooled for a few milliseconds.

Moreover, fun fact: on N7/Tegra one core actually works faster if the other cores are down. It remains to be seen what a predictable and representative performance mode is for that chip. I'd vote for running some tests in 2 modes: 1 core and 2 cores.
I'm also curious how this will affect the smoothness benchmark. Part of the jankiness there comes from the fact that mpdecision or its equivalent is slow in bringing up additional cores, so we limp along with a single core for the first few frames. Having two cores online from the start should improve this, but isn't entirely realistic.

I think ultimately we want to optimize for scrolling smoothly with just one core since that gives us more room to breathe, so I like Egor's idea of running some tests in two modes.
On 2013/11/21 10:56:20, Sami wrote:
> I'm also curious how this will affect the smoothness benchmark. Part of the
> jankiness there comes from the fact that mpdecision or its equivalent is slow
> in bringing up additional cores, so we limp along with a single core for the
> first few frames. Having two cores online from the start should improve this,
> but isn't entirely realistic.
>
> I think ultimately we want to optimize for scrolling smoothly with just one
> core since that gives us more room to breathe, so I like Egor's idea of running
> some tests in two modes.

Even the current disabling of CPU frequency scaling takes us down a path of sacrificing realism for low noise. This takes us even farther down that path.

I think that is a good idea for microbenchmarks like octane, blink_perf, cc_perftests, etc. But it's a bad idea for regression tests like page_cycler, smoothness, etc.

What would folks think about first changing Telemetry so that we don't call the methods that put us into low-noise mode for those end-to-end regression tests, but do it for the micros? Then we can land this improvement to our noise reduction functionality.
On 2013/11/21 16:20:43, tonyg wrote:
> On 2013/11/21 10:56:20, Sami wrote:
> > I'm also curious how this will affect the smoothness benchmark. Part of the
> > jankiness there comes from the fact that mpdecision or its equivalent is slow
> > in bringing up additional cores, so we limp along with a single core for the
> > first few frames. Having two cores online from the start should improve this,
> > but isn't entirely realistic.
> >
> > I think ultimately we want to optimize for scrolling smoothly with just one
> > core since that gives us more room to breathe, so I like Egor's idea of
> > running some tests in two modes.
>
> Even the current disabling of CPU frequency scaling takes us down a path of
> sacrificing realism for low noise. This takes us even farther down that path.
>
> I think that is a good idea for microbenchmarks like octane, blink_perf,
> cc_perftests, etc. But it's a bad idea for regression tests like page_cycler,
> smoothness, etc.
>
> What would folks think about first changing Telemetry so that we don't call the
> methods that put us into low-noise mode for those end-to-end regression tests,
> but do it for the micros? Then we can land this improvement to our noise
> reduction functionality.

Yes, sounds good to me. My concerns about this have been about end-to-end benchmarks (in particular startup benchmarks). For microbenchmarks, running in performance mode probably makes sense.
On 2013/11/21 17:47:32, aberent wrote:
> On 2013/11/21 16:20:43, tonyg wrote:
> > On 2013/11/21 10:56:20, Sami wrote:
> > > I'm also curious how this will affect the smoothness benchmark. Part of the
> > > jankiness there comes from the fact that mpdecision or its equivalent is
> > > slow in bringing up additional cores, so we limp along with a single core
> > > for the first few frames. Having two cores online from the start should
> > > improve this, but isn't entirely realistic.
> > >
> > > I think ultimately we want to optimize for scrolling smoothly with just one
> > > core since that gives us more room to breathe, so I like Egor's idea of
> > > running some tests in two modes.
> >
> > Even the current disabling of CPU frequency scaling takes us down a path of
> > sacrificing realism for low noise. This takes us even farther down that path.

I prefer to think of this CL as fixing the broken disabling of CPU frequency scaling. If for some tests we do not want to disable cpufreq scaling, then OK, let's address that, but the current state is a half-solution that is not consistent within itself.

> > I think that is a good idea for microbenchmarks like octane, blink_perf,
> > cc_perftests, etc. But it's a bad idea for regression tests like page_cycler,
> > smoothness, etc.
> >
> > What would folks think about first changing Telemetry so that we don't call
> > the methods that put us into low-noise mode for those end-to-end regression
> > tests, but do it for the micros? Then we can land this improvement to our
> > noise reduction functionality.
>
> Yes, sounds good to me. My concerns about this have been about end-to-end
> benchmarks (in particular startup benchmarks). For microbenchmarks, running in
> performance mode probably makes sense.

I disagree.

I feel strongly that low variance is more important than our immediate attempts to be more 'representative'. That is because:
1. we spend immense resources on bisecting performance problems just to discover that regressions are within noise; this slows us down and does not prevent regressions
2. in-lab benchmarks are always liars, in all industries, some more, some less. Benchmarks are not representative, but they are a good tool to assess some properties of the product.

Ideally we would have *all* benchmarks run in two modes: one 'closest to representative', as much as we can get (non-rooted devices?), and another 'tuned for stable performance' with proper cooling etc. We should not separate by benchmark type to decide which subset of them runs in which mode.

By the way, are we OK with the current level of realism on the page_cycler benchmarks? I would think realism requires non-rooted devices, with network delay simulation. I remember we actually discovered problems in the presence of network delays in the past.
On 2013/11/21 19:53:11, pasko wrote:
> On 2013/11/21 17:47:32, aberent wrote:
> > On 2013/11/21 16:20:43, tonyg wrote:
> > > On 2013/11/21 10:56:20, Sami wrote:
> > > > I'm also curious how this will affect the smoothness benchmark. Part of
> > > > the jankiness there comes from the fact that mpdecision or its equivalent
> > > > is slow in bringing up additional cores, so we limp along with a single
> > > > core for the first few frames. Having two cores online from the start
> > > > should improve this, but isn't entirely realistic.
> > > >
> > > > I think ultimately we want to optimize for scrolling smoothly with just
> > > > one core since that gives us more room to breathe, so I like Egor's idea
> > > > of running some tests in two modes.
> > >
> > > Even the current disabling of CPU frequency scaling takes us down a path of
> > > sacrificing realism for low noise. This takes us even farther down that
> > > path.
>
> I prefer to think of this CL as fixing the broken disabling of CPU frequency
> scaling. If for some tests we do not want to disable cpufreq scaling, then OK,
> let's address that, but the current state is a half-solution that is not
> consistent within itself.

Agreed; if we are going to use performance mode at all we should use it correctly, and this CL appears to fix how we currently set performance mode. I don't think anything in this CL changes when we attempt to set performance mode.

> > > I think that is a good idea for microbenchmarks like octane, blink_perf,
> > > cc_perftests, etc. But it's a bad idea for regression tests like
> > > page_cycler, smoothness, etc.
> > >
> > > What would folks think about first changing Telemetry so that we don't call
> > > the methods that put us into low-noise mode for those end-to-end regression
> > > tests, but do it for the micros? Then we can land this improvement to our
> > > noise reduction functionality.
> >
> > Yes, sounds good to me. My concerns about this have been about end-to-end
> > benchmarks (in particular startup benchmarks). For microbenchmarks, running
> > in performance mode probably makes sense.
>
> I disagree.
>
> I feel strongly that low variance is more important than our immediate attempts
> to be more 'representative'. That is because:
> 1. we spend immense resources on bisecting performance problems just to
> discover that regressions are within noise; this slows us down and does not
> prevent regressions
> 2. in-lab benchmarks are always liars, in all industries, some more, some less.
> Benchmarks are not representative, but they are a good tool to assess some
> properties of the product.
>
> Ideally we would have *all* benchmarks run in two modes: one 'closest to
> representative', as much as we can get (non-rooted devices?), and another
> 'tuned for stable performance' with proper cooling etc. We should not separate
> by benchmark type to decide which subset of them runs in which mode.
>
> By the way, are we OK with the current level of realism on the page_cycler
> benchmarks? I would think realism requires non-rooted devices, with network
> delay simulation. I remember we actually discovered problems in the presence of
> network delays in the past.

Do we need to have this discussion before landing this CL? While I have some doubts as to the benefits of performance mode (I am not convinced that this has much to do with the variability we are seeing), I don't have much data. Once this has landed we will be able to see the true effect of performance mode on the variability of the test results, and whether (and on which tests) the benefits of it are sufficient to justify the lack of realism.

BTW on the downstream bots we do run what benchmarks we can on a non-rooted S3, in addition to running on rooted Nexus devices.
On 2013/11/22 10:00:09, aberent wrote:
> > I disagree.
> >
> > I feel strongly that low variance is more important than our immediate
> > attempts to be more 'representative'. That is because:
> > 1. we spend immense resources on bisecting performance problems just to
> > discover that regressions are within noise; this slows us down and does not
> > prevent regressions
> > 2. in-lab benchmarks are always liars, in all industries, some more, some
> > less. Benchmarks are not representative, but they are a good tool to assess
> > some properties of the product.
> >
> > Ideally we would have *all* benchmarks run in two modes: one 'closest to
> > representative', as much as we can get (non-rooted devices?), and another
> > 'tuned for stable performance' with proper cooling etc. We should not
> > separate by benchmark type to decide which subset of them runs in which mode.
> >
> > By the way, are we OK with the current level of realism on the page_cycler
> > benchmarks? I would think realism requires non-rooted devices, with network
> > delay simulation. I remember we actually discovered problems in the presence
> > of network delays in the past.
>
> Do we need to have this discussion before landing this CL? While I have some
> doubts as to the benefits of performance mode (I am not convinced that this has
> much to do with the variability we are seeing), I don't have much data. Once
> this has landed we will be able to see the true effect of performance mode on
> the variability of the test results, and whether (and on which tests) the
> benefits of it are sufficient to justify the lack of realism.

I agree that we should land this and observe the effect. :) The performance mode discussion is indeed broad, we will continue it beyond this review.

> BTW on the downstream bots we do run what benchmarks we can on a non-rooted S3,
> in addition to running on rooted Nexus devices.

I just looked at the downstream page_cycler runs on the S3; they all say "User build device, skipping test" and walk away. The S3 bot is *not* useless, it runs startup tests and other benchmarks, just not (yet?) page_cycler.

So, Anthony, does this CL LGTY?
lgtm
lgtm
lgtm, thanks!
After a conversation with tonyg@ we decided to wait until we have a dedicated bot tuned for perf predictability. The device on that bot should have the same hardware/OS configuration as one of our existing perf devices, and then be tuned to be more predictable (i.e. a constant number of cores, scaling governor, or even userspace frequency management).
On 2013/11/27 18:29:17, pasko wrote:
> After a conversation with tonyg@ we decided to wait until we have a dedicated
> bot tuned for perf predictability. The device on that bot should have the same
> hardware/OS configuration as one of our existing perf devices, and then be
> tuned to be more predictable (i.e. a constant number of cores, scaling
> governor, or even userspace frequency management).

done in another patch, closing this