|
|
Created:
6 years, 8 months ago by bulach Modified:
5 years, 6 months ago CC:
chromium-reviews, telemetry+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionTelemetry: adds CPU frequency stats.
Initial attempt to gather how much time the CPUs spend per frequency.
It may (or may not) be useful for deriving power usage.
BUG=
Patch Set 1 : #
Total comments: 8
Patch Set 2 : Comments #
Total comments: 13
Patch Set 3 : Comments #
Total comments: 7
Patch Set 4 : unit-info #
Total comments: 10
Patch Set 5 : Rebased #Patch Set 6 : Link to docs #Patch Set 7 : Nit comment #Patch Set 8 : Rebased (and fixed test data) #Patch Set 9 : Fixes mac #Messages
Total messages: 41 (1 generated)
hey tony, egor, I think we chatted a while ago about gathering frequency state.. this patch adds results that look like: RESULT cpu_freq: 883200_hz= [0,0,0,0,0,0,0,0,0,0] ms RESULT cpu_freq: 2265600_hz= [456,508,480,464,423,653,632,586,497,470] ms RESULT cpu_freq: 960000_hz= [0,0,0,0,0,1,77,7,0,0] ms does it look useful? wdyt? https://codereview.chromium.org/239083010/diff/20001/tools/perf/measurements/... File tools/perf/measurements/page_cycler.py (left): https://codereview.chromium.org/239083010/diff/20001/tools/perf/measurements/... tools/perf/measurements/page_cycler.py:92: # self._cpu_metric.Start(page, tab) See crbug.com/301714. never mind this file, I'll split into a separate patch later on (the bug seems fixed..)
https://codereview.chromium.org/239083010/diff/20001/tools/perf/measurements/... File tools/perf/measurements/page_cycler.py (left): https://codereview.chromium.org/239083010/diff/20001/tools/perf/measurements/... tools/perf/measurements/page_cycler.py:92: # self._cpu_metric.Start(page, tab) See crbug.com/301714. On 2014/04/17 17:42:10, bulach wrote: > never mind this file, I'll split into a separate patch later on (the bug seems > fixed..) Yes, separate patch sg https://codereview.chromium.org/239083010/diff/20001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/20001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:41: results.Add(k, 'ms', v, I think this needs to either be a histogram result type or else be a single average_frequency_hz number. In its current form, the data wouldn't visualize well on the dashboard and html results. https://codereview.chromium.org/239083010/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/core/platform/platform_backend.py (right): https://codereview.chromium.org/239083010/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/core/platform/platform_backend.py:59: def GetGlobalCpuStats(self): We shouldn't need a new API method, right? Should GetCpuStats just include a Global key? Also, I'm wondering whether this should go into the StopMonitoringPower dict instead. There's some other global power related stuff in there. I was thinking c-states would go in there, so p-states probably make sense as well.
thanks tony! addressed some comments and have some questions below, ptal. https://codereview.chromium.org/239083010/diff/20001/tools/perf/measurements/... File tools/perf/measurements/page_cycler.py (left): https://codereview.chromium.org/239083010/diff/20001/tools/perf/measurements/... tools/perf/measurements/page_cycler.py:92: # self._cpu_metric.Start(page, tab) See crbug.com/301714. On 2014/04/18 03:58:56, tonyg wrote: > On 2014/04/17 17:42:10, bulach wrote: > > never mind this file, I'll split into a separate patch later on (the bug seems > > fixed..) > > Yes, separate patch sg split into https://codereview.chromium.org/246453004/ (still here to make my life easier, I'll make sure the other one lands first and gets merged here..) https://codereview.chromium.org/239083010/diff/20001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/20001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:41: results.Add(k, 'ms', v, On 2014/04/18 03:58:56, tonyg wrote: > I think this needs to either be a histogram result type or else be a single > average_frequency_hz number. In its current form, the data wouldn't visualize > well on the dashboard and html results. ouch, indeed.... averaging it out as average_frequency_hz https://codereview.chromium.org/239083010/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/core/browser.py (right): https://codereview.chromium.org/239083010/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/core/browser.py:109: def _GetStatsCommon(self, pid_stats_function): ^^ this is what ends up calling the CpuStats... at this level, it assumes it's "per pid" rather than global, i.e., GetCpuStats takes a pid parameter, and that's why exposing a "GetGlobalCpuStats" seemed more natural.. (not just GetCpuStats, but also GetMemory and GetIOStats are per PID). I think we can't push this all down due to self._browser_backend.GetProcessName(...), since platform_backend doesn't know about browser_backend.. having said that, the only implementation of |GetProcessName| seems static :) so if we want to avoid exposing GetGlobalCpuStats, we'd need to: - extract |browser_backend.GetProcessName| into some utility. - move this _GetStatsCommon into platform_backend - change platform_backend.|GetMemoryStats, GetIOStats, GetCPUStats| so that they would be global (not per PID), and internally would then call this PID-aggregation function.. wdyt? happy to do it separately if you think it's a good cleanup.. https://codereview.chromium.org/239083010/diff/20001/tools/telemetry/telemetr... File tools/telemetry/telemetry/core/platform/platform_backend.py (right): https://codereview.chromium.org/239083010/diff/20001/tools/telemetry/telemetr... tools/telemetry/telemetry/core/platform/platform_backend.py:59: def GetGlobalCpuStats(self): On 2014/04/18 03:58:56, tonyg wrote: > We shouldn't need a new API method, right? Should GetCpuStats just include a > Global key? see above :) > > Also, I'm wondering whether this should go into the StopMonitoringPower dict > instead. There's some other global power related stuff in there. I was thinking > c-states would go in there, so p-states probably make sense as well. hmm... tough choice: on one hand, this is certainly related to power stuff.. otoh, it feels a bit weird to have "%CPU per process" in the cpu metrics, and then the frequency histogram separated out? happy to do it in StopMonitoringPower, if you prefer, just explaining my reasoning..
Seems to be potentially useful. I'd leave code design considerations for Tony's judgement, but please enjoy my concerns on noise levels. One way to deal with it would be the opposite to Tony's proposal (i.e. avoid averaging early, apply more data analysis later), but it's too early to say anything right now before we see any numbers. Can you experiment locally with say --pageset-repeat=30 and report the cpufreq numbers for those runs with stdev? https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:48: data_type='unimportant') what does data_type mean? is it documented anywhere? is it used anywhere? https://codereview.chromium.org/239083010/diff/60001/tools/telemetry/telemetr... File tools/telemetry/telemetry/core/platform/proc_supporting_platform_backend.py (right): https://codereview.chromium.org/239083010/diff/60001/tools/telemetry/telemetr... tools/telemetry/telemetry/core/platform/proc_supporting_platform_backend.py:48: cpu_frequencies[int(k)] += int(v) I'm concerned with the story towards two vectors: 1. CPU hotplug 2. scaling governor set to 'performance' More on (1): Depending on CPU hotplug state (on/off) at the time of collecting the stats (in page_cycler case when the page has already loaded), a few CPUs can be already switched off. The decision to switch CPUs on/off are quite coarse-grained, so we may end up loosing large chunks of cpufreq data occasionally. This would lead to very noisy stats especially for "time spent at freq X". More on (2): Wouldn't the frequency be maxed out most of the time with scaling governor set to 'performance'? Do we want to detect thermal frequency scaling this way? Ways to reduce noise would be: 1. hotpug ON all cores all the time (just like https://codereview.chromium.org/73173005/), this does not lead to realistic power usage, however 2. special power-predictability bots with number of online cores restricted to one at all times (infra maintenance troubles) 3. don't restrict anything, measure only cpu0, it would be less noisy this way and may actually correlate to power usage much better, however, correlation coefficients may have to me gathered experimentally for each test in separation (i.e. data analysis would be tricky)
On 2014/04/22 14:06:30, pasko wrote: > Seems to be potentially useful. > > I'd leave code design considerations for Tony's judgement, but please enjoy my > concerns on noise levels. One way to deal with it would be the opposite to > Tony's proposal (i.e. avoid averaging early, apply more data analysis later), > but it's too early to say anything right now before we see any numbers. > > Can you experiment locally with say --pageset-repeat=30 and report the cpufreq > numbers for those runs with stdev? I surely can, but I'm afraid I won't have enough "scale" locally. so how about using the bots for that? :) given that it's a brand new trace, offline with respect to other measures (i.e., it doesn't use tracing / won't affect other measurements), would you be ok if we do this analysis on the bots? if it proves too noisy / useless, we can just revert this patch, no harm done (I hope!). > > https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.py > File tools/perf/metrics/cpu.py (right): > > https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... > tools/perf/metrics/cpu.py:48: data_type='unimportant') > what does data_type mean? is it documented anywhere? is it used anywhere? > > https://codereview.chromium.org/239083010/diff/60001/tools/telemetry/telemetr... > File tools/telemetry/telemetry/core/platform/proc_supporting_platform_backend.py > (right): > > https://codereview.chromium.org/239083010/diff/60001/tools/telemetry/telemetr... > tools/telemetry/telemetry/core/platform/proc_supporting_platform_backend.py:48: > cpu_frequencies[int(k)] += int(v) > I'm concerned with the story towards two vectors: > 1. CPU hotplug > 2. scaling governor set to 'performance' > > More on (1): > Depending on CPU hotplug state (on/off) at the time of collecting the stats (in > page_cycler case when the page has already loaded), a few CPUs can be already > switched off. The decision to switch CPUs on/off are quite coarse-grained, so we > may end up loosing large chunks of cpufreq data occasionally. This would lead to > very noisy stats especially for "time spent at freq X". > > More on (2): > Wouldn't the frequency be maxed out most of the time with scaling governor set > to 'performance'? Do we want to detect thermal frequency scaling this way? I'm not sure if 'performance' maxes out all the time, or is just bias towards higher freqs. in my local tests (page cycler, n5), I haven't seen it 100% of time on max freq, but then again, I didn't test it too broadly.. > > Ways to reduce noise would be: > 1. hotpug ON all cores all the time (just like > https://codereview.chromium.org/73173005/), this does not lead to realistic > power usage, however > 2. special power-predictability bots with number of online cores restricted to > one at all times (infra maintenance troubles) > 3. don't restrict anything, measure only cpu0, it would be less noisy this way > and may actually correlate to power usage much better, however, correlation > coefficients may have to me gathered experimentally for each test in separation > (i.e. data analysis would be tricky) hmm, measuring only cpu0 seems an interesting metric to gather! I suppose it wouldn't hurt to split it out, do you think it's worth experimenting at this stage or later on? there's little cost to split it anyways, but if the whole-cpu approach is a good first approximation, maybe it's better to start that way and refine later? wdyt?
https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:39: # Add a result for Global CPU stats. Does "Global" mean all process types combined? https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:40: time_spent = 0.0 Is this in seconds or milliseconds? https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:43: frequency_sum += (k * v) So what does frequency_sum represent? What are k and v? I'm guessing that it's Hertz-seconds? So 1 MHz for 2 seconds would be 2,000,000 Hertz-seconds? https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:48: data_type='unimportant') On 2014/04/22 14:06:30, pasko wrote: > what does data_type mean? is it documented anywhere? is it used anywhere? The data_type parameter specifies two things -- whether the result that's being added is a scalar value, a list of values, or a "histogram", and also whether it should be marked as "important" (whether it's a primary thing that's being measured or just extra info). I think this way of specifying values is going to be deprecated.
On 2014/04/22 16:09:18, bulach wrote: > On 2014/04/22 14:06:30, pasko wrote: > > Seems to be potentially useful. > > > > I'd leave code design considerations for Tony's judgement, but please enjoy my > > concerns on noise levels. One way to deal with it would be the opposite to > > Tony's proposal (i.e. avoid averaging early, apply more data analysis later), > > but it's too early to say anything right now before we see any numbers. > > > > Can you experiment locally with say --pageset-repeat=30 and report the cpufreq > > numbers for those runs with stdev? > > I surely can, but I'm afraid I won't have enough "scale" locally. > so how about using the bots for that? :) I assumed that if the numbers are not useful on one local device, they won't be useful on others. And this assumption may not be true, so I'm fine with experimenting on a wider range of bots if you want to evaluate all devices at once. The concerns I have is nothing new: 1. with reviews and dashboard turnaround it is slower to iterate through series of non-working solutions 2. hacking on telemetry is likely to introduce breakages which would eat our otherwise valid results for lunch (hey it's python, only runtime checks, and covering all the lines is annoying, nobody does that;) > given that it's a brand new trace, offline with respect to other measures (i.e., > it doesn't use tracing / won't affect other measurements), would you be ok if we > do this analysis on the bots? > if it proves too noisy / useless, we can just revert this patch, no harm done (I > hope!). Yeah, sure, this is low overhead. > > https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.py > > File tools/perf/metrics/cpu.py (right): > > > > > https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... > > tools/perf/metrics/cpu.py:48: data_type='unimportant') > > what does data_type mean? is it documented anywhere? is it used anywhere? > > > > > https://codereview.chromium.org/239083010/diff/60001/tools/telemetry/telemetr... > > File > tools/telemetry/telemetry/core/platform/proc_supporting_platform_backend.py > > (right): > > > > > https://codereview.chromium.org/239083010/diff/60001/tools/telemetry/telemetr... > > > tools/telemetry/telemetry/core/platform/proc_supporting_platform_backend.py:48: > > cpu_frequencies[int(k)] += int(v) > > I'm concerned with the story towards two vectors: > > 1. CPU hotplug > > 2. scaling governor set to 'performance' > > > > More on (1): > > Depending on CPU hotplug state (on/off) at the time of collecting the stats > (in > > page_cycler case when the page has already loaded), a few CPUs can be already > > switched off. The decision to switch CPUs on/off are quite coarse-grained, so > we > > may end up loosing large chunks of cpufreq data occasionally. This would lead > to > > very noisy stats especially for "time spent at freq X". > > > > More on (2): > > Wouldn't the frequency be maxed out most of the time with scaling governor set > > to 'performance'? Do we want to detect thermal frequency scaling this way? > > I'm not sure if 'performance' maxes out all the time, or is just bias towards > higher freqs. > in my local tests (page cycler, n5), I haven't seen it 100% of time on max freq, > but then again, I didn't test it too broadly.. Hm, interesting, here is a quote from the kernel documentation: """ The CPUfreq governor "performance" sets the CPU statically to the highest frequency within the borders of scaling_min_freq and scaling_max_freq. """ Was it tweaked for Android? Does mpdecision tweak scaling_max_freq? Can you reproduce non-100% with governor == "performance"? > > Ways to reduce noise would be: > > 1. hotpug ON all cores all the time (just like > > https://codereview.chromium.org/73173005/), this does not lead to realistic > > power usage, however > > 2. special power-predictability bots with number of online cores restricted to > > one at all times (infra maintenance troubles) > > 3. don't restrict anything, measure only cpu0, it would be less noisy this way > > and may actually correlate to power usage much better, however, correlation > > coefficients may have to me gathered experimentally for each test in > separation > > (i.e. data analysis would be tricky) > > hmm, measuring only cpu0 seems an interesting metric to gather! > I suppose it wouldn't hurt to split it out, do you think it's worth > experimenting at this stage or later on? I'm OK with both, it's just a suggestion. If it were me, I'd have looked at both variants with local testing to make conclusions faster, but you can go breadth-first :) > there's little cost to split it anyways, but if the whole-cpu approach is a good > first approximation, maybe it's better to start that way and refine later? wdyt? Refine later sgtm, apply your judgement.
thanks qyearsley! inline, some questions: https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:39: # Add a result for Global CPU stats. On 2014/04/22 17:00:51, qyearsley wrote: > Does "Global" mean all process types combined? sort of. :) it means system-wide utilization, not per process. so yes, all process combined, but not necessarily "chrome process". https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:40: time_spent = 0.0 On 2014/04/22 17:00:51, qyearsley wrote: > Is this in seconds or milliseconds? I suppose the unit is 10mS: https://www.kernel.org/doc/Documentation/cpu-freq/cpufreq-stats.txt but it doesn't matter, it's just for a weighted average. https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:43: frequency_sum += (k * v) On 2014/04/22 17:00:51, qyearsley wrote: > So what does frequency_sum represent? What are k and v? > I'm guessing that it's Hertz-seconds? So 1 MHz for 2 seconds would be 2,000,000 > Hertz-seconds? k and v are the key-values, happy to expand to frequency and time if you think it's clearer. you're right about Hertz-seconds, but that's probably a difficult to grasp unit since Hertz is 1/second :) I suppose keeping it unit-less is clearer and correct?
https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:39: # Add a result for Global CPU stats. On 2014/04/22 17:12:30, bulach wrote: > On 2014/04/22 17:00:51, qyearsley wrote: > > Does "Global" mean all process types combined? > > sort of. :) > it means system-wide utilization, not per process. > so yes, all process combined, but not necessarily "chrome process". Alright -- not sure if it would be clearer if you say this in the comment above? https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:43: frequency_sum += (k * v) On 2014/04/22 17:12:30, bulach wrote: > On 2014/04/22 17:00:51, qyearsley wrote: > > So what does frequency_sum represent? What are k and v? > > I'm guessing that it's Hertz-seconds? So 1 MHz for 2 seconds would be > 2,000,000 > > Hertz-seconds? > > k and v are the key-values, happy to expand to frequency and time if you think > it's clearer. > > you're right about Hertz-seconds, but that's probably a difficult to grasp unit > since Hertz is 1/second :) I suppose keeping it unit-less is clearer and > correct? I think it would be a bit clearer if it were frequency and time rather than k and v, although then you're gonna have to split that line :-P Since Hertz in the context of processing is cycles/second, Hertz-seconds = cycles, but since the time isn't in seconds, frequency_sum isn't actually number of cycles... Either way, frequency_sum is just used to compute average frequency, so I guess it's better this way.
thanks! another look please? https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:39: # Add a result for Global CPU stats. On 2014/04/22 17:25:52, qyearsley wrote: > On 2014/04/22 17:12:30, bulach wrote: > > On 2014/04/22 17:00:51, qyearsley wrote: > > > Does "Global" mean all process types combined? > > > > sort of. :) > > it means system-wide utilization, not per process. > > so yes, all process combined, but not necessarily "chrome process". > > Alright -- not sure if it would be clearer if you say this in the comment above? Done. https://codereview.chromium.org/239083010/diff/60001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:43: frequency_sum += (k * v) On 2014/04/22 17:25:52, qyearsley wrote: > On 2014/04/22 17:12:30, bulach wrote: > > On 2014/04/22 17:00:51, qyearsley wrote: > > > So what does frequency_sum represent? What are k and v? > > > I'm guessing that it's Hertz-seconds? So 1 MHz for 2 seconds would be > > 2,000,000 > > > Hertz-seconds? > > > > k and v are the key-values, happy to expand to frequency and time if you think > > it's clearer. > > > > you're right about Hertz-seconds, but that's probably a difficult to grasp > unit > > since Hertz is 1/second :) I suppose keeping it unit-less is clearer and > > correct? > > I think it would be a bit clearer if it were frequency and time rather than k > and v, although then you're gonna have to split that line :-P > > Since Hertz in the context of processing is cycles/second, Hertz-seconds = > cycles, but since the time isn't in seconds, frequency_sum isn't actually number > of cycles... > > Either way, frequency_sum is just used to compute average frequency, so I guess > it's better this way. Replaced k, v with frequency, time_in_state. I couldn't think of anything better for the the sums, but happy to change if you have any other suggestion.
https://codereview.chromium.org/239083010/diff/80001/tools/perf/measurements/... File tools/perf/measurements/page_cycler.py (right): https://codereview.chromium.org/239083010/diff/80001/tools/perf/measurements/... tools/perf/measurements/page_cycler.py:142: # changed. See crbug.com/301714. This TODO comment can be removed as well, right? https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:47: results.Add('average_frequency_hz', 'Hz', This would require changing a couple places, but might it be reasonable to use MHz or GHz as the units and report a floating-point number? (It's easier to see the difference between 200 and 2000 MHz than it is to see between 200000000 and 2000000000 Hz...) https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:48: int(frequency_sum / total_time), Is total_time ever 0.0? (i.e. would it ever be the case that self._results['Global']['GlobalCpuFrequencyStats'] is empty, or contains an entry with a time_in_state of 0.0)? For the other CPU stats, we're expecting that it if everything is working OK, then time should never be 0, and we have: assert total_time > 0, 'Expected total_time > 0, was: %d' % total_time
https://codereview.chromium.org/239083010/diff/80001/tools/perf/measurements/... File tools/perf/measurements/page_cycler.py (right): https://codereview.chromium.org/239083010/diff/80001/tools/perf/measurements/... tools/perf/measurements/page_cycler.py:142: # changed. See crbug.com/301714. On 2014/04/22 18:22:26, qyearsley wrote: > This TODO comment can be removed as well, right? this patch is not yet ready to land :) I'll merge once the other one https://codereview.chromium.org/246453004/ lands, and then this file will disappear from here.. https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:47: results.Add('average_frequency_hz', 'Hz', On 2014/04/22 18:22:26, qyearsley wrote: > This would require changing a couple places, but might it be reasonable to use > MHz or GHz as the units and report a floating-point number? > > (It's easier to see the difference between 200 and 2000 MHz than it is to see > between 200000000 and 2000000000 Hz...) good point! rounded to MHz with 2 digits precision, hopefully that's good enough? https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:48: int(frequency_sum / total_time), On 2014/04/22 18:22:26, qyearsley wrote: > Is total_time ever 0.0? (i.e. would it ever be the case that > self._results['Global']['GlobalCpuFrequencyStats'] is empty, or contains an > entry with a time_in_state of 0.0)? > > For the other CPU stats, we're expecting that it if everything is working OK, > then time should never be 0, and we have: > assert total_time > 0, 'Expected total_time > 0, was: %d' % total_time ahn, great catch, thanks! I'm only adding this to "proc_supporting" platforms, so no Windows for now.. guarded with an if instead of an assert.
https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.p... tools/perf/metrics/cpu.py:47: results.Add('average_frequency_hz', 'Hz', On 2014/04/22 18:34:50, bulach wrote: > On 2014/04/22 18:22:26, qyearsley wrote: > > This would require changing a couple places, but might it be reasonable to use > > MHz or GHz as the units and report a floating-point number? > > > > (It's easier to see the difference between 200 and 2000 MHz than it is to see > > between 200000000 and 2000000000 Hz...) > > good point! rounded to MHz with 2 digits precision, hopefully that's good > enough? I think that sounds good; although note that in order for the perf dashboard to know whether increase/decrease should be considered a regression, you would have to add an entry to src/tools/perf/unit-info.json. But, in this case, is higher or lower necessarily better? Lower means less power consumption, but possibly slower page load right?
On 2014/04/22 18:48:37, qyearsley wrote: > https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.py > File tools/perf/metrics/cpu.py (right): > > https://codereview.chromium.org/239083010/diff/80001/tools/perf/metrics/cpu.p... > tools/perf/metrics/cpu.py:47: results.Add('average_frequency_hz', 'Hz', > On 2014/04/22 18:34:50, bulach wrote: > > On 2014/04/22 18:22:26, qyearsley wrote: > > > This would require changing a couple places, but might it be reasonable to > use > > > MHz or GHz as the units and report a floating-point number? > > > > > > (It's easier to see the difference between 200 and 2000 MHz than it is to > see > > > between 200000000 and 2000000000 Hz...) > > > > good point! rounded to MHz with 2 digits precision, hopefully that's good > > enough? > > I think that sounds good; although note that in order for the perf dashboard to > know whether increase/decrease should be considered a regression, you would have > to add an entry to src/tools/perf/unit-info.json. > > But, in this case, is higher or lower necessarily better? Lower means less power > consumption, but possibly slower page load right? added to unit-info, thanks! yeah, it's a bit tricky, but as you said: lower power is good. slower page load / rasterization / smoothness should be captured by existing other metrics... this one is more about "let's use the slowest possible frequency to perform at the same level". I needed to add more unit tests to the other patch, so this one isn't ready yet, but anyways, ptal!
On 2014/04/22 16:09:18, bulach wrote: > > Ways to reduce noise would be: > > 1. hotpug ON all cores all the time (just like > > https://codereview.chromium.org/73173005/), this does not lead to realistic > > power usage, however > > 2. special power-predictability bots with number of online cores restricted to > > one at all times (infra maintenance troubles) > > 3. don't restrict anything, measure only cpu0, it would be less noisy this way > > and may actually correlate to power usage much better, however, correlation > > coefficients may have to me gathered experimentally for each test in > separation > > (i.e. data analysis would be tricky) > > hmm, measuring only cpu0 seems an interesting metric to gather! > I suppose it wouldn't hurt to split it out, do you think it's worth > experimenting at this stage or later on? > there's little cost to split it anyways, but if the whole-cpu approach is a good > first approximation, maybe it's better to start that way and refine later? wdyt? I think limiting measurements for cpu0 is a good first approximation, the whole-cpu is less reasonable IMO, but it's hard to say more without a single bit of experimental data.
https://codereview.chromium.org/239083010/diff/100001/tools/perf/measurements... File tools/perf/measurements/page_cycler.py (left): https://codereview.chromium.org/239083010/diff/100001/tools/perf/measurements... tools/perf/measurements/page_cycler.py:146: # changed. See crbug.com/301714. should this also remove the comment? https://codereview.chromium.org/239083010/diff/100001/tools/perf/unit-info.json File tools/perf/unit-info.json (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/unit-info.js... tools/perf/unit-info.json:86: "why": "CPU governors were able to keep the CPU at lower frequency." I thought we don't need to track regressions for this metric just yet, but rather want to find correlation with power draw. We are not enabling the alerts for this metric in near future. In other words this adds some 'dead code', not covered by tests. Truly hypothetically: this may backfire in the future if someone aims at maximizing MHz during a certain action to save power (make critical computations faster). Will that reverse the improvement_direction for MHz? This example is not something realistic, but rather demonstrates my feelings towards introducing the unnecessary.
Overall my concerns are rather about the unnecessary complexity and speed of iteration. Feel free to ignore them as a better expert in telemetry. Other things L G T M (let me know when the CL that this depends on gets committed and you upload the rebase, I'd like to have a final look)
https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.... tools/perf/metrics/cpu.py:49: round((frequency_sum / total_time) / 1000.0, 2), If the frequency numbers are in Hz, then should we divide by 1,000,000 here? https://codereview.chromium.org/239083010/diff/100001/tools/perf/unit-info.json File tools/perf/unit-info.json (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/unit-info.js... tools/perf/unit-info.json:86: "why": "CPU governors were able to keep the CPU at lower frequency." On 2014/04/23 15:18:54, pasko wrote: > I thought we don't need to track regressions for this metric just yet, but > rather want to find correlation with power draw. We are not enabling the alerts > for this metric in near future. In other words this adds some 'dead code', not > covered by tests. > > Truly hypothetically: this may backfire in the future if someone aims at > maximizing MHz during a certain action to save power (make critical computations > faster). Will that reverse the improvement_direction for MHz? This example is > not something realistic, but rather demonstrates my feelings towards introducing > the unnecessary. Fair enough. This unit-info.json is only used by the perf dashboard; if no entry for MHz was added here, then on the result graphs, on the Y-axis it would display: "MHz (? is better)"
Please ping this review on Monday after http://go/perf-fixit .
folks, I rebased with the recently landed patches, would you mind another look please? https://codereview.chromium.org/239083010/diff/100001/tools/perf/measurements... File tools/perf/measurements/page_cycler.py (left): https://codereview.chromium.org/239083010/diff/100001/tools/perf/measurements... tools/perf/measurements/page_cycler.py:146: # changed. See crbug.com/301714. On 2014/04/23 15:18:54, pasko wrote: > should this also remove the comment? Done. https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.... tools/perf/metrics/cpu.py:49: round((frequency_sum / total_time) / 1000.0, 2), On 2014/04/23 18:38:22, qyearsley wrote: > If the frequency numbers are in Hz, then should we divide by 1,000,000 here? I suppose /proc is in KHz? https://codereview.chromium.org/239083010/diff/100001/tools/perf/unit-info.json File tools/perf/unit-info.json (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/unit-info.js... tools/perf/unit-info.json:86: "why": "CPU governors were able to keep the CPU at lower frequency." On 2014/04/23 18:38:22, qyearsley wrote: > On 2014/04/23 15:18:54, pasko wrote: > > I thought we don't need to track regressions for this metric just yet, but > > rather want to find correlation with power draw. We are not enabling the > alerts > > for this metric in near future. In other words this adds some 'dead code', not > > covered by tests. > > > > Truly hypothetically: this may backfire in the future if someone aims at > > maximizing MHz during a certain action to save power (make critical > computations > > faster). Will that reverse the improvement_direction for MHz? This example is > > not something realistic, but rather demonstrates my feelings towards > introducing > > the unnecessary. > > Fair enough. This unit-info.json is only used by the perf dashboard; if no entry > for MHz was added here, then on the result graphs, on the Y-axis it would > display: "MHz (? is better)" removed this file..
LGTM, I think :-) https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.... tools/perf/metrics/cpu.py:49: round((frequency_sum / total_time) / 1000.0, 2), On 2014/05/06 17:08:19, bulach wrote: > On 2014/04/23 18:38:22, qyearsley wrote: > > If the frequency numbers are in Hz, then should we divide by 1,000,000 here? > > I suppose /proc is in KHz? Might be good to add a link to some reference that says this so we can be sure that this is the case.
thanks qyearsley! I added a link about the documentation (that unfortunately doesn't quite specify the unit). I also changed so that it's always Hz at that level, and then higher up it divides up by 1,000,000 as you suggested. ptal. tony, another look too?
lgtm thanks! https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.... tools/perf/metrics/cpu.py:73: # Skip any process_types that are empty. s/empty/empty or global/ ?
https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.py File tools/perf/metrics/cpu.py (right): https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.... tools/perf/metrics/cpu.py:73: # Skip any process_types that are empty. On 2014/05/07 17:59:27, pasko wrote: > s/empty/empty or global/ ? Done.
On 2014/05/08 12:38:02, bulach wrote: > https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.py > File tools/perf/metrics/cpu.py (right): > > https://codereview.chromium.org/239083010/diff/100001/tools/perf/metrics/cpu.... > tools/perf/metrics/cpu.py:73: # Skip any process_types that are empty. > On 2014/05/07 17:59:27, pasko wrote: > > s/empty/empty or global/ ? > > Done. I'm wondering, is this CL waiting on anything?
argh! sorry, it was waiting on me hitting the CQ.. :) CQing now..
The CQ bit was checked by bulach@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bulach@chromium.org/239083010/180001
FYI, CQ is re-trying this CL (attempt #1). The failing builders are: android_aosp on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/android_aosp/builds/7...) android_chromium_gn_compile_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/android_chromium_gn_c...) android_clang_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/android_clang_dbg/bui...) android_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/android_dbg/builds/18...) chromium_presubmit on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/chromium_presubmit/bu...) ios_dbg_simulator on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/ios_dbg_simulator/bui...) ios_rel_device on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/ios_rel_device/builds...) ios_rel_device_ninja on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/ios_rel_device_ninja/...) linux_chromium_chromeos_clang_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_chrome...) linux_chromium_chromeos_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_chrome...) linux_chromium_clang_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_clang_...) linux_chromium_gn_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_gn_rel...) linux_chromium_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_rel/bu...) mac_chromium_compile_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/mac_chromium_compile_...) mac_chromium_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/mac_chromium_rel/buil...) win_chromium_compile_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/win_chromium_compile_...) win_chromium_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/win_chromium_rel/buil...) win_chromium_x64_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/win_chromium_x64_rel/...) linux_gpu on tryserver.chromium.gpu (http://build.chromium.org/p/tryserver.chromium.gpu/builders/linux_gpu/builds/...) mac_gpu on tryserver.chromium.gpu (http://build.chromium.org/p/tryserver.chromium.gpu/builders/mac_gpu/builds/9895) win_gpu on tryserver.chromium.gpu (http://build.chromium.org/p/tryserver.chromium.gpu/builders/win_gpu/builds/12982)
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_clang_dbg on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_clang_...) linux_chromium_rel on tryserver.chromium (http://build.chromium.org/p/tryserver.chromium/builders/linux_chromium_rel/bu...)
The CQ bit was checked by bulach@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bulach@chromium.org/239083010/200001
The mac trybot failure might be legit.
The CQ bit was unchecked by bulach@chromium.org
On 2014/05/28 16:26:25, tonyg wrote: > The mac trybot failure might be legit. indeed. :( looking..
new job on the (re)try bot :) tony: the fix is relatively small, but feel free to look into it! (and thanks for pointing it out!)
tony: sorry, try-bot is green, but I'll need your OWNERS for tools/telemetry (I forgot I removed myself a few days ago, sorry!)
On 2014/05/29 08:46:21, bulach wrote: > tony: sorry, try-bot is green, but I'll need your OWNERS for tools/telemetry (I > forgot I removed myself a few days ago, sorry!) Hey, what's the status of this CL? Do we still want to add this? (Will probably need to be rebased, if so.)
On 2014/09/05 02:45:56, qyearsley wrote: > On 2014/05/29 08:46:21, bulach wrote: > > tony: sorry, try-bot is green, but I'll need your OWNERS for tools/telemetry > (I > > forgot I removed myself a few days ago, sorry!) > > Hey, what's the status of this CL? Do we still want to add this? (Will probably > need to be rebased, if so.) as we all know bulach@ has left the team. I think the experiment is interesting, but I don't have cycles to do anything about it in Q3.
tonyg@chromium.org changed reviewers: - tonyg@chromium.org |