|
|
Created:
7 years, 3 months ago by edmundyan Modified:
7 years, 3 months ago CC:
chromium-reviews, chrome-speed-team+watch_google.com, telemetry+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAdd cpu_stats for the browser
Implementing linux_backend for now.
BUG=263959
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=221897
Patch Set 1 #Patch Set 2 : . #
Total comments: 2
Patch Set 3 : Using /proc to get current CPU counters #
Total comments: 4
Patch Set 4 : Get jiffies from timer_list #Patch Set 5 : Finalize jiffies. Remove unittests #
Total comments: 2
Patch Set 6 : Only store the timestamp once instead of summing it #
Total comments: 2
Patch Set 7 : . #
Messages
Total messages: 25 (0 generated)
How's this? One issue is that the browser process is not affected by the javascript on the page. What's a unittest we can write to add load to the browser process?
+tonyg
looks solid, lgtm on my part. seems like the test cases are gonna fail on non-linux since you've got it stubbed out tho...
On 2013/09/03 22:53:15, nduca wrote: > looks solid, lgtm on my part. seems like the test cases are gonna fail on > non-linux since you've got it stubbed out tho... Nice catch. Using .get() instead. Or would it be better to write actual stubs for each platform and do {'CpuPercent': 0}. I've also renamed the test to specify that it's only testing the Renderer portion. We should probably write a test for Browser/Gpu too, but I'm not sure how to do that. Tony, care to take a look? Also, accidentally rebased so excuse the diff.
https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/platform/linux_platform_backend.py (right): https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/linux_platform_backend.py:35: cpu_load = p.get_cpu_percent(1) I don't think this is what we want. This will block for 1 second while it measures CPU usage. Also, we currently don't have psutil available on non-linux platforms and even if we installed psutil on the other platforms, this will not work on android because there Telemetry runs on the host machine instead of the target device. Instead, what I think we want is an API that allows us to start the CPU sampling period without blocking, then query again later and get the average CPU usage over the sampling period. One way to grab the data we need on linux, android and cros is to scrape the CPU time counters from /proc/pid/stat (see GetMemoryStats), and I think there are equivalent ways of doing that on mac and windows (similar to their GetMemoryStats methods).
Would making cpu_load a metric make more sense then? It seems to follow better with the start/stop idea you have. https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/platform/linux_platform_backend.py (right): https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/linux_platform_backend.py:35: cpu_load = p.get_cpu_percent(1) On 2013/09/05 00:11:41, tonyg wrote: > I don't think this is what we want. This will block for 1 second while it > measures CPU usage. > > Also, we currently don't have psutil available on non-linux platforms and even > if we installed psutil on the other platforms, this will not work on android > because there Telemetry runs on the host machine instead of the target device. > > Instead, what I think we want is an API that allows us to start the CPU sampling > period without blocking, then query again later and get the average CPU usage > over the sampling period. > > One way to grab the data we need on linux, android and cros is to scrape the CPU > time counters from /proc/pid/stat (see GetMemoryStats), and I think there are > equivalent ways of doing that on mac and windows (similar to their > GetMemoryStats methods). Would making cpu_load a metric make more sense then? It seems to follow better with the start/stop idea you have.
tonyg, i'm not sure i agree with you on the api thing. I'm pretty sure we can get per-process cpu usage without psutil using some adb shell cat /proc/<pid>/somethignorother i'm sure. Same for cros. The reason I felt it was good to get this at a per pid level is because the machine might end up doing some other stuff by accident that we dont catch.E.g system update runs, or vnc server is running and eating 5% on the cpu. Thoughts on that front?
On 2013/09/05 01:24:26, nduca wrote: > tonyg, i'm not sure i agree with you on the api thing. > > I'm pretty sure we can get per-process cpu usage without psutil using some adb > shell cat /proc/<pid>/somethignorother i'm sure. Same for cros. That was exactly my suggestion. Please re-read my comment :) > > The reason I felt it was good to get this at a per pid level is because the > machine might end up doing some other stuff by accident that we dont catch.E.g > system update runs, or vnc server is running and eating 5% on the cpu. > I definitely agree this needs to be per-pid. Wasn't proposing otherwise. Let me try to explain my comment better. As this patch is currently written, when you call GetCpuStats(), it will block for 1 second and report the CPU % usage over that 1 second of blocking. Instead, what I think we need is: StartCPUMonitoring() RunMeasurement() percent_usage = StopCPUMonitoring() Then we get CPU utilization over the course of the measurement we are interested in instead of during some idle blocking time. Please stop by tomorrow if I'm still not making sense.
On 2013/09/05 01:05:44, edmundyan wrote: > Would making cpu_load a metric make more sense then? It seems to follow better > with the start/stop idea you have. > > https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... > File tools/telemetry/telemetry/core/platform/linux_platform_backend.py (right): > > https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... > tools/telemetry/telemetry/core/platform/linux_platform_backend.py:35: cpu_load = > p.get_cpu_percent(1) > On 2013/09/05 00:11:41, tonyg wrote: > > I don't think this is what we want. This will block for 1 second while it > > measures CPU usage. > > > > Also, we currently don't have psutil available on non-linux platforms and even > > if we installed psutil on the other platforms, this will not work on android > > because there Telemetry runs on the host machine instead of the target device. > > > > Instead, what I think we want is an API that allows us to start the CPU > sampling > > period without blocking, then query again later and get the average CPU usage > > over the sampling period. > > > > One way to grab the data we need on linux, android and cros is to scrape the > CPU > > time counters from /proc/pid/stat (see GetMemoryStats), and I think there are > > equivalent ways of doing that on mac and windows (similar to their > > GetMemoryStats methods). > > Would making cpu_load a metric make more sense then? It seems to follow better > with the start/stop idea you have. Yes, this should be a Metric. But platform needs to take care of the OS specific bits. The way this works is that /proc/<pid>/stat (or win/mac equivalent) keeps track of the number of cycles that have been used by that pid. We read the values once at the beginning along with a timestamp. Then we read them again at the end with a timestamp. The utilization percent is then something like (end_cycles - start_cycles) / (end_timestamp - start_timestamp) * CYCLES_PER_SECOND. So the best way for this to work is for the platform method to return the current cycle's used by the process and the timestamp. Then the Metric can do the math upon stopping.
On 2013/09/05 04:43:40, tonyg wrote: > On 2013/09/05 01:05:44, edmundyan wrote: > > Would making cpu_load a metric make more sense then? It seems to follow > better > > with the start/stop idea you have. > > > > > https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... > > File tools/telemetry/telemetry/core/platform/linux_platform_backend.py > (right): > > > > > https://codereview.chromium.org/23717016/diff/16001/tools/telemetry/telemetry... > > tools/telemetry/telemetry/core/platform/linux_platform_backend.py:35: cpu_load > = > > p.get_cpu_percent(1) > > On 2013/09/05 00:11:41, tonyg wrote: > > > I don't think this is what we want. This will block for 1 second while it > > > measures CPU usage. > > > > > > Also, we currently don't have psutil available on non-linux platforms and > even > > > if we installed psutil on the other platforms, this will not work on android > > > because there Telemetry runs on the host machine instead of the target > device. > > > > > > Instead, what I think we want is an API that allows us to start the CPU > > sampling > > > period without blocking, then query again later and get the average CPU > usage > > > over the sampling period. > > > > > > One way to grab the data we need on linux, android and cros is to scrape the > > CPU > > > time counters from /proc/pid/stat (see GetMemoryStats), and I think there > are > > > equivalent ways of doing that on mac and windows (similar to their > > > GetMemoryStats methods). > > > > Would making cpu_load a metric make more sense then? It seems to follow > better > > with the start/stop idea you have. > > Yes, this should be a Metric. But platform needs to take care of the OS specific > bits. > > The way this works is that /proc/<pid>/stat (or win/mac equivalent) keeps track > of the number of cycles that have been used by that pid. We read the values once > at the beginning along with a timestamp. Then we read them again at the end with > a timestamp. The utilization percent is then something like (end_cycles - > start_cycles) / (end_timestamp - start_timestamp) * CYCLES_PER_SECOND. > > So the best way for this to work is for the platform method to return the > current cycle's used by the process and the timestamp. Then the Metric can do > the math upon stopping. Cool, that's exactly what I had implemented (I think)! Here's a linux version. Will add other platforms if this is what you want. 'CpuPercent' is a bit confusing now, as it calculates the avg CPU usage throughout the whole pid's runtime (which for the unittest includes a bunch of the startup time). It still passes though.
This is great, exactly what I had in mind. A couple of last nits and let's make sure Nat follows. https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/platform/proc_util.py (right): https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/proc_util.py:42: hertz = float(os.sysconf(os.sysconf_names['SC_CLK_TCK'])) or 100 Is there a way we can get the clock tick from proc instead of os? As this is written, this method could be shared on android except for this line (because it will return the host's frequency). https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/proc_util.py:50: return {'CpuPercent': 100 * cpu_process_time_secs / total_time_secs, I don't think we should include this line because we'll never actually want to know the CPU percent over the entire process lifetime. We'll only want to know it over the course of the duration we are measuring. The second two fields give us exactly the info needed to calculate that.
https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/platform/proc_util.py (right): https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/proc_util.py:42: hertz = float(os.sysconf(os.sysconf_names['SC_CLK_TCK'])) or 100 On 2013/09/05 15:20:14, tonyg wrote: > Is there a way we can get the clock tick from proc instead of os? As this is > written, this method could be shared on android except for this line (because it > will return the host's frequency). From some googling, not easily. Atleast there is no direct way to access it from /proc. The only way I found is to take a jiffy sample before/after and calculate the average jiffies/second. This can be done using: date +"%s.%N" && grep '^jiffies' /proc/timer_list You would think /proc/uptime would work, but I'm not getting 100 from it. For android/mac (and actually all systems) we can use ps instead. ps -p <pid> -o cputime,etime And just convert "[dd-]hh:mm:ss" to seconds. It might be cleaner having all non-windows platforms do it this way. https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/proc_util.py:50: return {'CpuPercent': 100 * cpu_process_time_secs / total_time_secs, On 2013/09/05 15:20:14, tonyg wrote: > I don't think we should include this line because we'll never actually want to > know the CPU percent over the entire process lifetime. We'll only want to know > it over the course of the duration we are measuring. > > The second two fields give us exactly the info needed to calculate that. Shall we remove the browser unittest then?
On 2013/09/05 18:11:18, edmundyan wrote: > https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... > File tools/telemetry/telemetry/core/platform/proc_util.py (right): > > https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... > tools/telemetry/telemetry/core/platform/proc_util.py:42: hertz = > float(os.sysconf(os.sysconf_names['SC_CLK_TCK'])) or 100 > On 2013/09/05 15:20:14, tonyg wrote: > > Is there a way we can get the clock tick from proc instead of os? As this is > > written, this method could be shared on android except for this line (because > it > > will return the host's frequency). > From some googling, not easily. Atleast there is no direct way to access it > from /proc. The only way I found is to take a jiffy sample before/after and > calculate the average jiffies/second. This can be done using: > > date +"%s.%N" && grep '^jiffies' /proc/timer_list > > You would think /proc/uptime would work, but I'm not getting 100 from it. > > For android/mac (and actually all systems) we can use ps instead. > ps -p <pid> -o cputime,etime > > And just convert "[dd-]hh:mm:ss" to seconds. It might be cleaner having all > non-windows platforms do it this way. > > https://codereview.chromium.org/23717016/diff/30001/tools/telemetry/telemetry... > tools/telemetry/telemetry/core/platform/proc_util.py:50: return {'CpuPercent': > 100 * cpu_process_time_secs / total_time_secs, > On 2013/09/05 15:20:14, tonyg wrote: > > I don't think we should include this line because we'll never actually want to > > know the CPU percent over the entire process lifetime. We'll only want to know > > it over the course of the duration we are measuring. > > > > The second two fields give us exactly the info needed to calculate that. > Shall we remove the browser unittest then? Actually, ignore that first paragraph. There IS a way to use /proc for everything. We can just skip converting back to seconds and store everything as jiffies since we only care about %CPU in the end. I ran some crude tests, and they oddly output consistently different results. After running high_cpu.html for 5 seconds a few times, the avg cpu load over that span were: Jiffies%: Renderer: 8.745 Browser: 2.19 Seconds%: Renderer: 9.2032 Browser: 1.62 They both give a good ballpark, but not sure which one is more correct. Which one do you like better? I guess if we are using jiffies-version for android, we should stay with that on cros/linux. The 'ps -p <pid> -o cputime,etime' option I mention above is actually not very optimal as the lowest denomination is seconds, so we won't get good precision unless we run the process quite long. Not sure how else we will do it on OSX though. tl;dr Tony: Added new patch. Lmk which method you like better.
If only jiffies works on android, then let's go with that everywhere. After that change, this patch lgtm. But we should probably make sure Nat's concern is resolved before landing.
On 2013/09/06 02:24:24, tonyg wrote: > If only jiffies works on android, then let's go with that everywhere. After that > change, this patch lgtm. > > But we should probably make sure Nat's concern is resolved before landing. Okay done. PTAL tony/nat. Will create bugs to implement backends for non-linux platforms.
lgtm https://codereview.chromium.org/23717016/diff/58001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/platform/proc_util.py (right): https://codereview.chromium.org/23717016/diff/58001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/proc_util.py:34: return 0 Should this raise?
lgtm
Sorry, actually found a bug. If we grab cpu_stats when the # of Renderer processes is different (Ie. different number of tabs) we won't be able to compare them. Fix is to only grab the timestamp once per process_type. https://codereview.chromium.org/23717016/diff/58001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/platform/proc_util.py (right): https://codereview.chromium.org/23717016/diff/58001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/platform/proc_util.py:34: return 0 On 2013/09/06 17:42:15, tonyg wrote: > Should this raise? Done.
https://codereview.chromium.org/23717016/diff/65001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/browser.py (right): https://codereview.chromium.org/23717016/diff/65001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/browser.py:189: for process_type in ['Browser', 'Renderer']: Why restrict to Browser/Renderer?
https://codereview.chromium.org/23717016/diff/65001/tools/telemetry/telemetry... File tools/telemetry/telemetry/core/browser.py (right): https://codereview.chromium.org/23717016/diff/65001/tools/telemetry/telemetry... tools/telemetry/telemetry/core/browser.py:189: for process_type in ['Browser', 'Renderer']: On 2013/09/06 21:35:48, tonyg wrote: > Why restrict to Browser/Renderer? No reason to. Fixed to be more general.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/edmundyan@chromium.org/23717016/74001
Commit queue rejected this change because the description was changed between the time the change entered the commit queue and the time it was ready to commit. You can safely check the commit box again.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/edmundyan@chromium.org/23717016/74001
Message was sent while issue was closed.
Change committed as 221897 |