Description
Added GPU performance metrics.
Metrics have been added for the GPU using the GPU Tracer. Currently
we record the average, max, and stddev in ms per frame for each of
the following categories:
render_compositor: CPU and GPU time spent in the render compositor.
browser_compositor: CPU and GPU time spent in the browser compositor.
total: CPU and GPU time, plus total frame time, spent in the renderer.
swap: Time between swaps (overall FPS).
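For illustration, the three statistics recorded per category could be computed along these lines (a minimal sketch; the helper name is hypothetical):

import math

def frame_stats(frame_times_ms):
  # Average, max, and standard deviation of one category's per-frame times (ms).
  n = len(frame_times_ms)
  avg = sum(frame_times_ms) / float(n)
  variance = sum((t - avg) ** 2 for t in frame_times_ms) / float(n)
  return {'avg': avg, 'max': max(frame_times_ms),
          'stddev': math.sqrt(variance)}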
R=vmiura@chromium.org
BUG=None
Committed: https://crrev.com/dfb418e345ac741c380a5570b386a5bfad1c6115
Cr-Commit-Position: refs/heads/master@{#313419}
Patch Set 1
Patch Set 2: skip device metrics instead of filling 0 for unsupported devices
Patch Set 3: Added test for filtering out context_ids
Patch Set 4: Removed gpu_device check in gpu_times_unittest
Total comments: 24
Patch Set 5: Modified metric names, utilize thread_duration when available
Patch Set 6: Simplified gpu_timeline using new SwapBuffer trace
Total comments: 1
Patch Set 7: Standardized names, added frame times
Patch Set 8: Replaced gpu_times pagetest with standard timeline_based_measurement
Total comments: 8
Patch Set 9: Fixed help for TBM Options
Total comments: 2
Patch Set 10: Fixed bad merge, moved metrics object creation
Patch Set 11: Get Metrics callback now returns a list of metrics
Total comments: 4
Patch Set 12: Fixed help string, added VerifyNonOverlappedRecords
Total comments: 2
Patch Set 13: Merged with master
Patch Set 14: Fixed issue with metrics callback called multiple times per metric
Total comments: 12
Patch Set 15: Handle empty flags case
Patch Set 16: Fixed flags variable name
Patch Set 17: Fixed typos
Patch Set 18: Added better assert message
Patch Set 19: Reverted accidental file modification
Patch Set 20: Added early assert to check for valid overhead level
Patch Set 21: Fixed unit test
Patch Set 22: Renamed total frame time to swap frame time
Patch Set 23: Fix swap name in unit tests
Total comments: 14
Patch Set 24: Fixed nits
Patch Set 25: CPU and GPU frame times are calculated separately now
Patch Set 26: Updated comments, ensure lack of thread_duration becomes 0

Messages
Total messages: 49 (5 generated)
I've added the telemetry data for the toplevel traces now. The one thing I was not sure of was which pagesets to use so I just used the top 10. PTAL.
vmiura@chromium.org changed reviewers: + epenner@chromium.org
epenner@, mind taking a look? These are similar to thread_times.
Cool! At a high level, it seems like both this and thread_times are solving a very similar problem. I wonder if some of the code could even be reused. This is just some feedback and questions from looking at gpu_timeline.py https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:16: FRAME_END_MARKER = ('gpu', 'GLES2DecoderImpl::DoSwapBuffers') Two things on this trace: First, is there no GPU-side trace for this? That would be the best, as there is some latency between this and when this actually occurs on the GPU's timeline.. But over many frames it's fine, so not a big deal. Second, I believe this is never called on some platforms. In thread-times, I tried this, as well as adding "RealSwapBuffers" traces, but even that didn't work on all platforms (now I'm having trouble remembering which, I think it was Mac), so I switched to compositor swaps. You might want to do the same, or dig into the remaining platforms to find an equivalent trace. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:21: GPU_SERVICE_DEVICE_VARIANCE = 5 See below, I'm not sure this threshold is really needed. Perhaps it might reduce some noise, but I don't believe we should fail if we exceed this. It seems to me it will just cause the test to flake occasionally. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:61: TRACKED_NAMES_service: Using the TRACKED_NAMES dictionary, we include It took me a minute to parse what these mean. Does total_frame mean the total clock time for a frame? If so, I would call this 'mean_frame_time' for consistency with other metrics. Does 'service' mean the CPU time spent in chrome's GPU service? Or perhaps cpu time in the driver? And does 'device' mean the real GPU time? If so, I would find 'dispatch' or 'cpu-dispatch' or 'service-cpu' more clear. Also, have you considered thread_duration instead of duration for the cpu-time in the service? A clock duration will often lack meaning for anything but mean_frame_time/fps, as several unrelated things contribute to it (descheduling and blocking on the GPU). https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:75: if event.thread_start: This is the only use of thread_start, are you sure this is what you want? https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:91: # on when the tracing stopped the device trace could not have come back yet. Another reason for this could be that the device events are likely delayed by 1-2 frames, so if these events were previously clipped against a time-range, we might lose 1-2 GPU frames at the right side. It's difficult to guarantee that the traces will correlate exactly, so I'm inclined to just accept the noise. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:98: assert len(service_events) == len(device_events), ( It seems like we should either have a hard '==' in the assertion (without the event_difference threshold), or just accept this as a small source of noise, and add a comment somewhere about this limitation. I'm worried that this would flake depending on the number of gpu events per frame. 
https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:129: for buffer_swap_event in buffer_swap_events: Couple of minor things: Firstly, this looks to be M*N. Perhaps not a big deal, but do we even need to group gpu events into frames? Given the GPU's latency, the frame grouping won't be perfect. Also, it seems like given the metrics you want, you just need something simple like this? time_per_frame = sum(times_events) / count(frame_events) Or am I missing something else that you want to calculate by grouping all the traces into frames?
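As an aside on the thread_duration question above, a minimal sketch of the distinction, assuming telemetry timeline events with thread_duration and duration fields (the helper itself is hypothetical):

def cpu_time_ms(event):
  # thread_duration is CPU time on the issuing thread; duration is wall
  # clock, which also counts descheduling and blocking on the GPU.
  if event.thread_duration is not None:
    return event.thread_duration
  return event.duration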
https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:16: FRAME_END_MARKER = ('gpu', 'GLES2DecoderImpl::DoSwapBuffers') We should be able to add a "DoSwapBuffers" gpu trace which drops into gpu.service & gpu.device categories. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:21: GPU_SERVICE_DEVICE_VARIANCE = 5 Perhaps if we have the "swap" trace on the GPU we don't need to match CPU & GPU tracks, and remove this matching. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:61: TRACKED_NAMES_service: Using the TRACKED_NAMES dictionary, we include Currently the 'gpu.service' traces are using the "asynchronous traces" (TRACE_EVENT_COPY_ASYNC_BEGIN/TRACE_EVENT_COPY_ASYNC_END) which don't have the 'thread_duration' numbers. We can look into making these normal traces. It may violate trace stacking, but I think Tracing can cope with that.
https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:16: FRAME_END_MARKER = ('gpu', 'GLES2DecoderImpl::DoSwapBuffers') On 2015/01/16 01:41:15, vmiura wrote: > We should be able to add a "DoSwapBuffers" gpu trace which drops into > gpu.service & gpu.device categories. Done here: https://codereview.chromium.org/799753009/ Will switch to use that instead once the CL lands. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:21: GPU_SERVICE_DEVICE_VARIANCE = 5 On 2015/01/16 01:41:14, vmiura wrote: > Perhaps if we have the "swap" trace on the GPU we don't need to match CPU & GPU > tracks, and remove this matching. Yes, once the Swap trace is plumbed through we won't need this anymore. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:61: TRACKED_NAMES_service: Using the TRACKED_NAMES dictionary, we include On 2015/01/16 01:41:14, vmiura wrote: > Currently the 'gpu.service' traces are using the the "asynchronous traces" > (TRACE_EVENT_COPY_ASYNC_BEGIN/TRACE_EVENT_COPY_ASYNC_END) which don't have the > 'thread_duration' numbers. > > We can look into making these normal traces. It may violate trace stacking, but > I think Tracing can cope with that. I've changed the Traces to use normal traces instead of async traces here: https://codereview.chromium.org/855653003/ After that lands I will convert the values to use thread_duration. I've also changed the names a bit to make them clearer. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:75: if event.thread_start: On 2015/01/16 01:17:59, epenner wrote: > This is the only use of thread_start, are you sure this is what you want? I was going to look into this later, but the AsyncSlices are inserted twice for some reason. The only difference was that thread_start is set for one of them, and is None for the other. So this was only to filter out one of the extra ones. Now that we are not using Async traces anymore this is irrelevant, but we should still figure out why there are multiple async traces. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:91: # on when the tracing stopped the device trace could not have come back yet. On 2015/01/16 01:17:59, epenner wrote: > Another reason for this could be that the device events are likely delayed by > 1-2 frames, so if these events were previously clipped against a time-range, we > might lose 1-2 GPU frames at the right side. It's difficult to guarantee that > the traces will correlate exactly, so I'm inclined to just accept the noise. This is no longer relevant and will be removed once the new buffer swap traces land. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:98: assert len(service_events) == len(device_events), ( On 2015/01/16 01:17:59, epenner wrote: > It seems like we should either have a hard '==' in the assertion (without the > event_difference threshold), or just accept this as a small source of noise, and > add a comment somewhere about this limitation. I'm worried that this would flake > depending on the number of gpu events per frame. 
Before we were using the BufferSwap trace that only happens on the CPU so it was important that the CPU/GPU traces matched up perfectly (if anything for the zip call below). Once the SwapBuffer traces get plumbed through to the device traces, most of this will be unnecessary though. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:129: for buffer_swap_event in buffer_swap_events: On 2015/01/16 01:17:59, epenner wrote: > Couple minor things: > > Firstly, this looks to be M*N. Perhaps not a big deal, but do we even need to > group gpu events into frames? Given the GPU's latency, the frame grouping don't > be perfect. Also, it seems like given the metrics you want, you just need > something simple like this? > > time_per_frame = sum(times_events) / count(frame_events) > > Or am I missing something else that you want to calculate by grouping all the > traces into frames? Before I only had SwapBuffer traces on the CPU side so it was impossible to figure out the start and end for a frame on the GPU side. The grouping was just to figure out the start and end of a frame on the GPU device side. After the SwapBuffer trace CL lands this will be simplified and I can remove this grouping step.
Utilizing the new SwapBuffer traces, which go through both the gpu service & device, lots of the old code has been deleted and the new code is much simpler. PTAL. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:16: FRAME_END_MARKER = ('gpu', 'GLES2DecoderImpl::DoSwapBuffers') On 2015/01/16 22:58:04, David Yen wrote: > On 2015/01/16 01:41:15, vmiura wrote: > > We should be able to add a "DoSwapBuffers" gpu trace which drops into > > gpu.service & gpu.device categories. > > Done here: https://codereview.chromium.org/799753009/ > > Will switch to use that instead once the CL lands. > Done. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:21: GPU_SERVICE_DEVICE_VARIANCE = 5 On 2015/01/16 22:58:04, David Yen wrote: > On 2015/01/16 01:41:14, vmiura wrote: > > Perhaps if we have the "swap" trace on the GPU we don't need to match CPU & > GPU > > tracks, and remove this matching. > > Yes, once the Swap trace is plumbed through we won't need this anymore. Done. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:61: TRACKED_NAMES_service: Using the TRACKED_NAMES dictionary, we include On 2015/01/16 22:58:04, David Yen wrote: > On 2015/01/16 01:41:14, vmiura wrote: > > Currently the 'gpu.service' traces are using the the "asynchronous traces" > > (TRACE_EVENT_COPY_ASYNC_BEGIN/TRACE_EVENT_COPY_ASYNC_END) which don't have the > > 'thread_duration' numbers. > > > > We can look into making these normal traces. It may violate trace stacking, > but > > I think Tracing can cope with that. > > I've changed the Traces to use normal traces instead of async traces here: > https://codereview.chromium.org/855653003/ > > After that lands I will use convert the values to use thread_duration. > > I've also changed the names a bit to make it more clear. Done. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:91: # on when the tracing stopped the device trace could not have come back yet. On 2015/01/16 22:58:04, David Yen wrote: > On 2015/01/16 01:17:59, epenner wrote: > > Another reason for this could be that the device events are likely delayed by > > 1-2 frames, so if these events were previously clipped against a time-range, > we > > might lose 1-2 GPU frames at the right side. It's difficult to guarantee that > > the traces will correlate exactly, so I'm inclined to just accept the noise. > > This is no longer relevant and will be removed once the new buffer swap traces > lands. Done. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:98: assert len(service_events) == len(device_events), ( On 2015/01/16 22:58:04, David Yen wrote: > On 2015/01/16 01:17:59, epenner wrote: > > It seems like we should either have a hard '==' in the assertion (without the > > event_difference threshold), or just accept this as a small source of noise, > and > > add a comment somewhere about this limitation. I'm worried that this would > flake > > depending on the number of gpu events per frame. 
Once the SwapBuffer traces gets plumbed through to the device > traces most of this will be unnecessary though. Done. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:129: for buffer_swap_event in buffer_swap_events: On 2015/01/16 22:58:04, David Yen wrote: > On 2015/01/16 01:17:59, epenner wrote: > > Couple minor things: > > > > Firstly, this looks to be M*N. Perhaps not a big deal, but do we even need to > > group gpu events into frames? Given the GPU's latency, the frame grouping > don't > > be perfect. Also, it seems like given the metrics you want, you just need > > something simple like this? > > > > time_per_frame = sum(times_events) / count(frame_events) > > > > Or am I missing something else that you want to calculate by grouping all the > > traces into frames? > > Before I only had SwapBuffer traces on the CPU side so it was impossible to > figure out the start and end for a frame on the GPU side. The grouping was just > to figure out the start and end of a frame on the GPU device side. After the > SwapBuffer trace CL lands this will be simplified and I can remove this grouping > step. This is all removed now.
On 2015/01/20 19:37:26, David Yen wrote: > Utilizing the new SwapBuffer traces which goes through both the gpu service & > device, lots of the old code has been deleted and the new one is much more > simplified. PTAL. > > https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... > File tools/perf/metrics/gpu_timeline.py (right): > > https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... > tools/perf/metrics/gpu_timeline.py:16: FRAME_END_MARKER = ('gpu', > 'GLES2DecoderImpl::DoSwapBuffers') > On 2015/01/16 22:58:04, David Yen wrote: > > On 2015/01/16 01:41:15, vmiura wrote: > > > We should be able to add a "DoSwapBuffers" gpu trace which drops into > > > gpu.service & gpu.device categories. > > > > Done here: https://codereview.chromium.org/799753009/ > > > > Will switch to use that instead once the CL lands. > > > > Done. > > https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... > tools/perf/metrics/gpu_timeline.py:21: GPU_SERVICE_DEVICE_VARIANCE = 5 > On 2015/01/16 22:58:04, David Yen wrote: > > On 2015/01/16 01:41:14, vmiura wrote: > > > Perhaps if we have the "swap" trace on the GPU we don't need to match CPU & > > GPU > > > tracks, and remove this matching. > > > > Yes, once the Swap trace is plumbed through we won't need this anymore. > > Done. > > https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... > tools/perf/metrics/gpu_timeline.py:61: TRACKED_NAMES_service: Using the > TRACKED_NAMES dictionary, we include > On 2015/01/16 22:58:04, David Yen wrote: > > On 2015/01/16 01:41:14, vmiura wrote: > > > Currently the 'gpu.service' traces are using the the "asynchronous traces" > > > (TRACE_EVENT_COPY_ASYNC_BEGIN/TRACE_EVENT_COPY_ASYNC_END) which don't have > the > > > 'thread_duration' numbers. > > > > > > We can look into making these normal traces. It may violate trace stacking, > > but > > > I think Tracing can cope with that. > > > > I've changed the Traces to use normal traces instead of async traces here: > > https://codereview.chromium.org/855653003/ > > > > After that lands I will use convert the values to use thread_duration. > > > > I've also changed the names a bit to make it more clear. > > Done. > > https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... > tools/perf/metrics/gpu_timeline.py:91: # on when the tracing stopped the device > trace could not have come back yet. > On 2015/01/16 22:58:04, David Yen wrote: > > On 2015/01/16 01:17:59, epenner wrote: > > > Another reason for this could be that the device events are likely delayed > by > > > 1-2 frames, so if these events were previously clipped against a time-range, > > we > > > might lose 1-2 GPU frames at the right side. It's difficult to guarantee > that > > > the traces will correlate exactly, so I'm inclined to just accept the noise. > > > > This is no longer relevant and will be removed once the new buffer swap traces > > lands. > > Done. > > https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... > tools/perf/metrics/gpu_timeline.py:98: assert len(service_events) == > len(device_events), ( > On 2015/01/16 22:58:04, David Yen wrote: > > On 2015/01/16 01:17:59, epenner wrote: > > > It seems like we should either have a hard '==' in the assertion (without > the > > > event_difference threshold), or just accept this as a small source of noise, > > and > > > add a comment somewhere about this limitation. 
I'm worried that this would > > flake > > > depending on the number of gpu events per frame. > > > > Before we were using the BufferSwap trace that only happens on the CPU so it > was > > important that the CPU/GPU traces matched up perfectly (if anything for the > zip > > call below). Once the SwapBuffer traces gets plumbed through to the device > > traces most of this will be unnecessary though. > > Done. > > https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... > tools/perf/metrics/gpu_timeline.py:129: for buffer_swap_event in > buffer_swap_events: > On 2015/01/16 22:58:04, David Yen wrote: > > On 2015/01/16 01:17:59, epenner wrote: > > > Couple minor things: > > > > > > Firstly, this looks to be M*N. Perhaps not a big deal, but do we even need > to > > > group gpu events into frames? Given the GPU's latency, the frame grouping > > don't > > > be perfect. Also, it seems like given the metrics you want, you just need > > > something simple like this? > > > > > > time_per_frame = sum(times_events) / count(frame_events) > > > > > > Or am I missing something else that you want to calculate by grouping all > the > > > traces into frames? > > > > Before I only had SwapBuffer traces on the CPU side so it was impossible to > > figure out the start and end for a frame on the GPU side. The grouping was > just > > to figure out the start and end of a frame on the GPU device side. After the > > SwapBuffer trace CL lands this will be simplified and I can remove this > grouping > > step. > > This is all removed now. ping
I tried out the measurement. I'm not sure if I did something wrong. Some notes below.
===================
View result at file:///usr/local/google/code/content_shell/src/gpu_times_top_25_smooth/results.html
vmiura@vmiura1:/usr/local/google/code/content_shell/src$ ./tools/perf/run_benchmark --browser android-content-shell gpu_times.top_25_smooth --results-label=Software --output-dir ./gpu_times_top_25_smooth --extra-browser-args="--disable-gpu-rasterization --ignore-certificate-errors" --page-filter=Wordpress
WARNING:root:Installing ./out/Release/apks/ContentShell.apk on device if needed.
[ RUN ] Wordpress
77 KB/s (3383 bytes in 0.042s)
[ OK ] Wordpress (22152 ms)
[ PASSED ] 1 test.
Pages: [Wordpress]
*RESULT compositor_device-gpu_avg: compositor_device-gpu_avg= 0 ms
*RESULT compositor_device-gpu_max: compositor_device-gpu_max= 0 ms
*RESULT compositor_device-gpu_stddev: compositor_device-gpu_stddev= 0.0 ms
*RESULT compositor_service-cpu_avg: compositor_service-cpu_avg= 0 ms
*RESULT compositor_service-cpu_max: compositor_service-cpu_max= 0 ms
*RESULT compositor_service-cpu_stddev: compositor_service-cpu_stddev= 0.0 ms
*RESULT mean_frame_avg: mean_frame_avg= 4718.14399993 ms
*RESULT mean_frame_max: mean_frame_max= 4718.14399993 ms
*RESULT mean_frame_stddev: mean_frame_stddev= 0.0 ms
*RESULT mean_gpu_device-gpu_avg: mean_gpu_device-gpu_avg= 186.352001726 ms
*RESULT mean_gpu_device-gpu_max: mean_gpu_device-gpu_max= 186.352001726 ms
*RESULT mean_gpu_device-gpu_stddev: mean_gpu_device-gpu_stddev= 0.0 ms
*RESULT mean_gpu_service-cpu_avg: mean_gpu_service-cpu_avg= 1419.806 ms
*RESULT mean_gpu_service-cpu_max: mean_gpu_service-cpu_max= 1419.806 ms
*RESULT mean_gpu_service-cpu_stddev: mean_gpu_service-cpu_stddev= 0.0 ms
*RESULT render_compositor_device-gpu_avg: render_compositor_device-gpu_avg= 7.70999993133 ms
*RESULT render_compositor_device-gpu_max: render_compositor_device-gpu_max= 7.70999993133 ms
*RESULT render_compositor_device-gpu_stddev: render_compositor_device-gpu_stddev= 0.0 ms
*RESULT render_compositor_service-cpu_avg: render_compositor_service-cpu_avg= 386.682 ms
*RESULT render_compositor_service-cpu_max: render_compositor_service-cpu_max= 386.682 ms
*RESULT render_compositor_service-cpu_stddev: render_compositor_service-cpu_stddev= 0.0 ms
RESULT telemetry_page_measurement_results: num_failed= 0 count
RESULT telemetry_page_measurement_results: num_errored= 0 count
Pages: [Wordpress]
===================
I'm a bit confused by the naming:
- "mean_frame_average" -- average of the mean?
- "compositor_service-cpu_avg" -- we should call it either service, or cpu.
- "compositor_device-gpu_avg" -- we should call it either device, or gpu.
Also the values don't look correct: they're very high, and avg == max.
It would be great to get all the actual cpu & gpu frame times as well (in a ListOfScalarValues result).
nednguyen@google.com changed reviewers: + nednguyen@google.com
https://codereview.chromium.org/854833003/diff/100001/tools/perf/measurements... File tools/perf/measurements/gpu_times.py (right): https://codereview.chromium.org/854833003/diff/100001/tools/perf/measurements... tools/perf/measurements/gpu_times.py:15: class GPUTimes(page_test.PageTest): Why not add these metrics to timeline_based_measurement?
Ahh, ignore the issue about bad values. Seems I didn't have the necessary patch for SwapBuffers so I was getting a frame count of 1. This is what I get on Nexus 4.
Pages: [Wordpress]
*RESULT compositor_device-gpu_avg: compositor_device-gpu_avg= 0 ms
*RESULT compositor_device-gpu_max: compositor_device-gpu_max= 0 ms
*RESULT compositor_device-gpu_stddev: compositor_device-gpu_stddev= 0.0 ms
*RESULT compositor_service-cpu_avg: compositor_service-cpu_avg= 0 ms
*RESULT compositor_service-cpu_max: compositor_service-cpu_max= 0 ms
*RESULT compositor_service-cpu_stddev: compositor_service-cpu_stddev= 0.0 ms
*RESULT mean_frame_avg: mean_frame_avg= 30.6377290326 ms
*RESULT mean_frame_max: mean_frame_max= 1348.90799999 ms
*RESULT mean_frame_stddev: mean_frame_stddev= 112.730757875 ms
*RESULT mean_gpu_device-gpu_avg: mean_gpu_device-gpu_avg= 1.29135482965 ms
*RESULT mean_gpu_device-gpu_max: mean_gpu_device-gpu_max= 5.71199989319 ms
*RESULT mean_gpu_device-gpu_stddev: mean_gpu_device-gpu_stddev= 1.17938375687 ms
*RESULT mean_gpu_service-cpu_avg: mean_gpu_service-cpu_avg= 5.23331612903 ms
*RESULT mean_gpu_service-cpu_max: mean_gpu_service-cpu_max= 77.304 ms
*RESULT mean_gpu_service-cpu_stddev: mean_gpu_service-cpu_stddev= 7.24114888831 ms
*RESULT render_compositor_device-gpu_avg: render_compositor_device-gpu_avg= 0.0475483786798 ms
*RESULT render_compositor_device-gpu_max: render_compositor_device-gpu_max= 1.69999966812 ms
*RESULT render_compositor_device-gpu_stddev: render_compositor_device-gpu_stddev= 0.175933025227 ms
*RESULT render_compositor_service-cpu_avg: render_compositor_service-cpu_avg= 1.22275483871 ms
*RESULT render_compositor_service-cpu_max: render_compositor_service-cpu_max= 42.366 ms
*RESULT render_compositor_service-cpu_stddev: render_compositor_service-cpu_stddev= 4.46883791395 ms
RESULT telemetry_page_measurement_results: num_failed= 0 count
RESULT telemetry_page_measurement_results: num_errored= 0 count
Pages: [Wordpress]
On 2015/01/23 02:45:27, nednguyen wrote: > https://codereview.chromium.org/854833003/diff/100001/tools/perf/measurements... > File tools/perf/measurements/gpu_times.py (right): > > https://codereview.chromium.org/854833003/diff/100001/tools/perf/measurements... > tools/perf/measurements/gpu_times.py:15: class GPUTimes(page_test.PageTest): > Why not adding this metrics to timeline_based_measurement? These metrics are GPU-specific timeline measurements which require certain GPU categories to be turned on. These categories are not exactly lightweight either, so they should only be turned on during the GPU metric tests.
On 2015/01/23 22:20:10, David Yen wrote: > On 2015/01/23 02:45:27, nednguyen wrote: > > > https://codereview.chromium.org/854833003/diff/100001/tools/perf/measurements... > > File tools/perf/measurements/gpu_times.py (right): > > > > > https://codereview.chromium.org/854833003/diff/100001/tools/perf/measurements... > > tools/perf/measurements/gpu_times.py:15: class GPUTimes(page_test.PageTest): > > Why not adding this metrics to timeline_based_measurement? > > These metrics are GPU specific timeline measurements which require certain GPU > categories to be turned on. These categories are not exactly light weight either > so they should only be turned on during the GPU metric tests. That's ok. We can expose the categories for timeline_based_measurement. See https://code.google.com/p/chromium/codesearch#chromium/src/tools/telemetry/te...
On 2015/01/23 01:52:31, vmiura wrote: > It would be great to get all the actual cpu & gpu frame times as well (in a > ListOfScalarValues result). I've changed the names to the following scheme: $(NAME)_$(mean,stddev,max)_$(gpu,cpu)_time Where NAME is one of "render_compositor", "browser_compositor", "total". I also added the values for the actual frame times under the name: $(NAME)_$(gpu,cpu)_frame_times The only special case is the total frame time, which is under the name: total_$(mean,stddev,max)_time PTAL.
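A quick sketch enumerating the names this scheme implies (the NAME, stat, and source lists are taken from the message above):

NAMES = ('render_compositor', 'browser_compositor', 'total')
STATS = ('mean', 'stddev', 'max')
SOURCES = ('gpu', 'cpu')

for name in NAMES:
  for stat in STATS:
    for source in SOURCES:
      # e.g. render_compositor_mean_gpu_time
      print('%s_%s_%s_time' % (name, stat, source))
  for source in SOURCES:
    # raw per-frame values, e.g. render_compositor_gpu_frame_times
    print('%s_%s_frame_times' % (name, source))
# Special case: the total frame time additionally reports as
# total_$(mean,stddev,max)_time.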
On 2015/01/23 22:24:20, David Yen wrote: > On 2015/01/23 01:52:31, vmiura wrote: > > It would be great to get all the actual cpu & gpu frame times as well (in a > > ListOfScalarValues result). > > I've changed the names to the following scheme: > > $(NAME)_$(mean,stddev,max)_$(gpu,cpu)_time > > Where NAME is one of "render_compositor", "browser_compositor", "total". > > I also added the values for the actual frame times under the name: > > $(NAME)_$(gpu,cpu)_frame_times > > The only special case one is the total frame time which is under the name: > total_$(mean,stddev,max)_time > > PTAL. I've now removed measurements/gpu_times.py and replaced it with some changes to timeline_based_measurements. I've modified the options to be able to configure the standard TimelineBasedPageTest to measure using custom categories and custom metrics.
https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... File tools/perf/benchmarks/gpu_times.py (right): https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... tools/perf/benchmarks/gpu_times.py:15: return gpu_timeline.GPUTimelineMetric() This means your benchmark will compute gpu_timeline on all the interaction records in the page. Is this really what you want? https://codereview.chromium.org/854833003/diff/140001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/140001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:141: metrics: Metrics class to use when measuring pages. Documentation doesn't match. https://codereview.chromium.org/854833003/diff/140001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:216: if tracing_controller.IsDisplayTracingSupported(): Why this change? I think I removed this recently.
nednguyen@google.com changed reviewers: + benjhayden@chromium.org
+Ben: I think this patch will make your work to migrate thread_times to TBM possible.
https://codereview.chromium.org/854833003/diff/160001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/160001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:118: metric = self._get_metric_from_metric_type_callback(metric_type) Can you make it so that get_metric_from_metric_type_callback returns a list of metrics instead of 1? I can imagine that people may want to have more than 1 metric for a certain type.
https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... File tools/perf/benchmarks/gpu_times.py (right): https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... tools/perf/benchmarks/gpu_times.py:15: return gpu_timeline.GPUTimelineMetric() On 2015/01/23 23:58:13, nednguyen wrote: > This mean your benchmark will compute gpu_timeline on all the interaction > records in the page. Is this really what you want? Done. I moved the creation to CreateTimelineBasedMeasurementOptions(). https://codereview.chromium.org/854833003/diff/140001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/140001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:141: metrics: Metrics class to use when measuring pages. On 2015/01/23 23:58:13, nednguyen wrote: > Documentation doesn't match. Done. https://codereview.chromium.org/854833003/diff/140001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:216: if tracing_controller.IsDisplayTracingSupported(): On 2015/01/23 23:58:13, nednguyen wrote: > Why is this change? I think I removed this recently. Looks like bad merge, this is fixed now. https://codereview.chromium.org/854833003/diff/160001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/160001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:118: metric = self._get_metric_from_metric_type_callback(metric_type) On 2015/01/24 00:15:46, nednguyen wrote: > Can you make it so that get_metric_from_metric_type_callback return a list of > metrics instead of 1? I can imagine that people may want to have more than 1 > metrics for a certain type. Done.
https://codereview.chromium.org/854833003/diff/200001/tools/perf/metrics/gpu_... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/200001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:59: def AddResults(self, model, _, interaction_records, results): Does this implementation support overlapped interaction records? If not, you may want to add a call to self.VerifyNonOverlappedRecords https://codereview.chromium.org/854833003/diff/200001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/200001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:141: get_metrics_from_metric_type_callback: Callback function to create a metric "return a list of metrics based on timeline interaction records' flag"
https://codereview.chromium.org/854833003/diff/200001/tools/perf/metrics/gpu_... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/200001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:59: def AddResults(self, model, _, interaction_records, results): On 2015/01/24 00:32:22, nednguyen wrote: > Does this implementation support overlapped interaction records? If not, you may > want to make the call self.VerifyNonOverlappedRecords Done. https://codereview.chromium.org/854833003/diff/200001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/200001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:141: get_metrics_from_metric_type_callback: Callback function to create a metric On 2015/01/24 00:32:22, nednguyen wrote: > "return a list of metrics based on timeline interaction records' flag" Done.
https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... File tools/perf/benchmarks/gpu_times.py (right): https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... tools/perf/benchmarks/gpu_times.py:15: return gpu_timeline.GPUTimelineMetric() On 2015/01/24 00:22:33, David Yen wrote: > On 2015/01/23 23:58:13, nednguyen wrote: > > This mean your benchmark will compute gpu_timeline on all the interaction > > records in the page. Is this really what you want? > > Done. I moved the creation to CreateTimelineBasedMeasurementOptions(). Sorry, by "compute gpu_timeline on all the interaction", I meant that you will get the cpu metric computed for all the interaction records issued on the page. I was asking whether this is what you want, or do you want to have some filtering here, i.e. only return gpu_timeline.GPUTimelineMetric() if your interaction records have some special flags. Note that no filtering is fine too, but I just want to make sure you are aware of this. https://codereview.chromium.org/854833003/diff/220001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/220001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:118: metrics_list = self._get_metrics_from_metric_type_callback(metric_type) I just realized you are duplicating your metrics when the interaction has multiple metric types. Let's make _get_metrics_from_metric_types_callback take a list of the interaction's flags as input. You can define a method for the TimelineInteractionRecords class named GetUserDefinedFlags() that returns all flags (could be empty) in the interaction record except the "repeatable" flag, which is reserved for the records repeat checking logic. Then instead of the for loop in #108, you do: # make sure that all the interaction records have the same flags (see comment in line 109) records_custom_flags = interactions[0].GetUserDefinedFlags() metrics = set(self._get_metrics_from_record_flags_callback(records_custom_flags)) for m in metrics: m.AddResults(...) With this change, the default _GetMetricsFromRecordsFlags would be implemented as: def _GetMetricsFromRecordFlags(records_custom_flags): metrics = [] if 'is_smooth' in records_custom_flags: metrics.append(smoothness.SmoothnessMetric()) if 'is_fast' in records_custom_flags: metrics.append(fast_metric.FastMetric()) if 'is_responsive' in records_custom_flags: metrics.append(responsiveness_metric.ResponsivenessMetric()) return metrics
https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... File tools/perf/benchmarks/gpu_times.py (right): https://codereview.chromium.org/854833003/diff/140001/tools/perf/benchmarks/g... tools/perf/benchmarks/gpu_times.py:15: return gpu_timeline.GPUTimelineMetric() On 2015/01/24 16:43:04, nednguyen wrote: > On 2015/01/24 00:22:33, David Yen wrote: > > On 2015/01/23 23:58:13, nednguyen wrote: > > > This mean your benchmark will compute gpu_timeline on all the interaction > > > records in the page. Is this really what you want? > > > > Done. I moved the creation to CreateTimelineBasedMeasurementOptions(). > > Sorry, by "compute gpu_timeline on all the interaction", I meant that you will > get the cpu > metric computed for all the interaction records issued on the page. I was asking > whether this is what you want, or do you want to have some filtering here, i.e: > only return gpu_timeline.GPUTimelineMetric() if your interaction records have > some special flags. Note that no filtering is fine too, but I just want to make > sure you are aware of this. No filtering sounds like what we want. I will move the _CreateGPUTimelineMetric(_) function back to where it was before. https://codereview.chromium.org/854833003/diff/220001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/220001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:118: metrics_list = self._get_metrics_from_metric_type_callback(metric_type) On 2015/01/24 16:43:04, nednguyen wrote: > I just realize you are duplicating your metrics multiple times when the > interaction have multiple metric types. > > Let make _get_metrics_from_metric_types_callback take a list of interaction's > flags as input. You can define a method for TimelineInteractionRecords class > named GetUserDefinedFlags() that return all flags (could be empty) in the > interaction record except the "repeatable" flag which is reserved for records > repeat checking logic. > > Then instead of the for loop in #108, you do: > # make sure that all the interaction records have the same flags (see comment in > line 109) > records_custom_flags = interactions[0].GetUserDefinedFlags() > metrics = > set(self._get_metrics_from_record_flags_callback(records_custom_flags)) > for m in metrics: > m.AddResults(...) > > With this change, the default _GetMetricsFromRecordsFlags would be implemented > as: > def _GetMetricsFromRecordFlags(records_custom_flags): > metrics = [] > if 'is_smooth' in records_custom_flags: > metrics += smoothness.SmoothnessMetric() > if 'is_fast' in records_custom_flags: > metrics += fast_metric.FastMetric() > if 'is_responsive' in records_custom_flags: > metrics += responsiveness_metric.ResponsivenessMetric() > return metrics Really great catch. This is now fixed.
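For reference, a self-contained sketch of the flag-based dispatch suggested above; the metric class is stubbed as a placeholder, since the real metrics live in telemetry:

class SmoothnessMetric(object):
  def AddResults(self, model, renderer_thread, interactions, results):
    pass  # placeholder; the real metric lives in telemetry

def GetMetricsFromRecordFlags(records_custom_flags):
  # Called once per group of interaction records, so a record carrying
  # several flags no longer instantiates the same metric more than once.
  metrics = []
  if 'is_smooth' in records_custom_flags:
    metrics.append(SmoothnessMetric())
  # 'is_fast' and 'is_responsive' would map to their metrics the same way.
  return metrics

# Usage sketch:
# for metric in GetMetricsFromRecordFlags(record.GetUserDefinedFlags()):
#   metric.AddResults(model, renderer_thread, interactions, results)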
The overall shape looks good to me. Can you pull some changes out into another patch as I suggested below? https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:34: tir_module.IS_FAST, fast_metric.FastMetric, I think you mean "tir_module.IS_FAST: fast_metric.FastMetric," https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:52: return [metric() for flag, metric in DEFAULT_METRICS if flag in flags_set] in DEFAULT_METRICS.iteritems() https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:149: that may be removed. Instead of making overhead_level sometimes an object, sometimes a string, you can just make it a TracingCategoryFilter only. So NO_OVERHEAD_LEVEL, V8_OVERHEAD_LEVEL, MINIMAL_OVERHEAD_LEVEL, DEBUG_OVERHEAD_LEVEL will be default TracingCategoryFilter objects instead of enum values. For this change though, can you please make a different patch and send it to me for ease of reviewing? https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:159: self._extra_category_filters.extend(filters) What is this method for?
https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:34: tir_module.IS_FAST, fast_metric.FastMetric, On 2015/01/26 19:45:48, nednguyen wrote: > I think you mean "tir_module.IS_FAST: fast_metric.FastMetric," Done. https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:52: return [metric() for flag, metric in DEFAULT_METRICS if flag in flags_set] On 2015/01/26 19:45:48, nednguyen wrote: > in DEFAULT_METRICS.iteritems() Done. https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:149: that may be removed. On 2015/01/26 19:45:48, nednguyen wrote: > Instead of making overhead_level sometimes object, sometimes string, you can > just make it TracingCategoryFilter type only. > > So NO_OVERHEAD_LEVEL, V8_OVERHEAD_LEVEL, MINIMAL_OVERHEAD_LEVEL, > DEBUG_OVERHEAD_LEVEL will be the 3 default TracingCategoryFilter object instead > of enum. > > For this change though, can you please make a different patch sent to me for the > ease of reviewing? I think these are better as strings because we cannot just use global objects that get reused. The default objects need to be new option objects per TBM instance because those option objects are actually modifiable. The new "extra_category_filters" that was added actually does modify the category_filter object right afterwards. If we were going to replace the strings with types we would probably replace the NO_OVERHEAD_LEVEL...etc. variables with either a function that takes a string, or various functions such as NoOverheadLevelFilterCategory(). I'm not sure if replacing the strings with functions is going to be cleaner in the long run. https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:159: self._extra_category_filters.extend(filters) On 2015/01/26 19:45:48, nednguyen wrote: > What is this method for? This was pulled in from master; I merely renamed the variable.
lgtm https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:149: that may be removed. On 2015/01/26 20:01:48, David Yen wrote: > On 2015/01/26 19:45:48, nednguyen wrote: > > Instead of making overhead_level sometimes object, sometimes string, you can > > just make it TracingCategoryFilter type only. > > > > So NO_OVERHEAD_LEVEL, V8_OVERHEAD_LEVEL, MINIMAL_OVERHEAD_LEVEL, > > DEBUG_OVERHEAD_LEVEL will be the 3 default TracingCategoryFilter object > instead > > of enum. > > > > For this change though, can you please make a different patch sent to me for > the > > ease of reviewing? > > I think these are better as strings because we cannot just use global objects > that get reused. The default objects need to be new option objects per TBM > instance because those option objects are actually modifiable. The new > "extra_category_filters" that was added actually do modify the category_filter > object right afterwards. > > If we were going to replace the strings as types we would probably replace the > NO_OVERHEAD_LEVEL...etc. variables with either a function that takes a string, > or various functions such as NoOverheadLevelFilterCategory(). I'm not sure if > replacing the strings with functions is going to be cleaner in the long run. You are right. I thought TracingCategoryFilter is immutable but it isn't. For now, please add an assertion that overhead_level is either in ALL_OVERHEAD_LEVELS or is of type TracingCategoryFilter.
On 2015/01/26 22:26:51, nednguyen wrote: > lgtm > > https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... > File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): > > https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... > tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:149: that may > be removed. > On 2015/01/26 20:01:48, David Yen wrote: > > On 2015/01/26 19:45:48, nednguyen wrote: > > > Instead of making overhead_level sometimes object, sometimes string, you > can > > > just make it TracingCategoryFilter type only. > > > > > > So NO_OVERHEAD_LEVEL, V8_OVERHEAD_LEVEL, MINIMAL_OVERHEAD_LEVEL, > > > DEBUG_OVERHEAD_LEVEL will be the 3 default TracingCategoryFilter object > > instead > > > of enum. > > > > > > For this change though, can you please make a different patch sent to me for > > the > > > ease of reviewing? > > > > I think these are better as strings because we cannot just use global objects > > that get reused. The default objects need to be new option objects per TBM > > instance because those option objects are actually modifiable. The new > > "extra_category_filters" that was added actually do modify the category_filter > > object right afterwards. > > > > If we were going to replace the strings as types we would probably replace the > > NO_OVERHEAD_LEVEL...etc. variables with either a function that takes a string, > > or various functions such as NoOverheadLevelFilterCategory(). I'm not sure if > > replacing the strings with functions is going to be cleaner in the long run. > > You are right. I thought TracingCategoryFilter is immutable but it isn't. For > now, please add assertion that overhead_level either in ALL_OVERHEAD_LEVELS or > it is of type TracingCategoryFilter. Code looks good on my side. Please wait for other reviewers' approval.
https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:149: that may be removed. On 2015/01/26 22:26:51, nednguyen wrote: > On 2015/01/26 20:01:48, David Yen wrote: > > On 2015/01/26 19:45:48, nednguyen wrote: > > > Instead of making overhead_level sometimes object, sometimes string, you > can > > > just make it TracingCategoryFilter type only. > > > > > > So NO_OVERHEAD_LEVEL, V8_OVERHEAD_LEVEL, MINIMAL_OVERHEAD_LEVEL, > > > DEBUG_OVERHEAD_LEVEL will be the 3 default TracingCategoryFilter object > > instead > > > of enum. > > > > > > For this change though, can you please make a different patch sent to me for > > the > > > ease of reviewing? > > > > I think these are better as strings because we cannot just use global objects > > that get reused. The default objects need to be new option objects per TBM > > instance because those option objects are actually modifiable. The new > > "extra_category_filters" that was added actually do modify the category_filter > > object right afterwards. > > > > If we were going to replace the strings as types we would probably replace the > > NO_OVERHEAD_LEVEL...etc. variables with either a function that takes a string, > > or various functions such as NoOverheadLevelFilterCategory(). I'm not sure if > > replacing the strings with functions is going to be cleaner in the long run. > > You are right. I thought TracingCategoryFilter is immutable but it isn't. For > now, please add assertion that overhead_level either in ALL_OVERHEAD_LEVELS or > it is of type TracingCategoryFilter. I think the assertion on line 215 handles this case. I will add a readable assertion text.
https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:149: that may be removed. On 2015/01/26 22:37:38, David Yen wrote: > On 2015/01/26 22:26:51, nednguyen wrote: > > On 2015/01/26 20:01:48, David Yen wrote: > > > On 2015/01/26 19:45:48, nednguyen wrote: > > > > Instead of making overhead_level sometimes object, sometimes string, you > > can > > > > just make it TracingCategoryFilter type only. > > > > > > > > So NO_OVERHEAD_LEVEL, V8_OVERHEAD_LEVEL, MINIMAL_OVERHEAD_LEVEL, > > > > DEBUG_OVERHEAD_LEVEL will be the 3 default TracingCategoryFilter object > > > instead > > > > of enum. > > > > > > > > For this change though, can you please make a different patch sent to me > for > > > the > > > > ease of reviewing? > > > > > > I think these are better as strings because we cannot just use global > objects > > > that get reused. The default objects need to be new option objects per TBM > > > instance because those option objects are actually modifiable. The new > > > "extra_category_filters" that was added actually do modify the > category_filter > > > object right afterwards. > > > > > > If we were going to replace the strings as types we would probably replace > the > > > NO_OVERHEAD_LEVEL...etc. variables with either a function that takes a > string, > > > or various functions such as NoOverheadLevelFilterCategory(). I'm not sure > if > > > replacing the strings with functions is going to be cleaner in the long run. > > > > You are right. I thought TracingCategoryFilter is immutable but it isn't. For > > now, please add assertion that overhead_level either in ALL_OVERHEAD_LEVELS or > > it is of type TracingCategoryFilter. > > I think the assertion on line 215 handles this case. I will add a readable > assertion text. Let's move it here instead. Raise the error as early as possible.
ping epenner, vmiura for gpu_times.py and gpu_timeline* changes. https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... File tools/telemetry/telemetry/web_perf/timeline_based_measurement.py (right): https://codereview.chromium.org/854833003/diff/260001/tools/telemetry/telemet... tools/telemetry/telemetry/web_perf/timeline_based_measurement.py:149: that may be removed. On 2015/01/26 22:51:41, nednguyen wrote: > On 2015/01/26 22:37:38, David Yen wrote: > > On 2015/01/26 22:26:51, nednguyen wrote: > > > On 2015/01/26 20:01:48, David Yen wrote: > > > > On 2015/01/26 19:45:48, nednguyen wrote: > > > > > Instead of making overhead_level sometimes object, sometimes string, > you > > > can > > > > > just make it TracingCategoryFilter type only. > > > > > > > > > > So NO_OVERHEAD_LEVEL, V8_OVERHEAD_LEVEL, MINIMAL_OVERHEAD_LEVEL, > > > > > DEBUG_OVERHEAD_LEVEL will be the 3 default TracingCategoryFilter object > > > > instead > > > > > of enum. > > > > > > > > > > For this change though, can you please make a different patch sent to me > > for > > > > the > > > > > ease of reviewing? > > > > > > > > I think these are better as strings because we cannot just use global > > objects > > > > that get reused. The default objects need to be new option objects per TBM > > > > instance because those option objects are actually modifiable. The new > > > > "extra_category_filters" that was added actually do modify the > > category_filter > > > > object right afterwards. > > > > > > > > If we were going to replace the strings as types we would probably replace > > the > > > > NO_OVERHEAD_LEVEL...etc. variables with either a function that takes a > > string, > > > > or various functions such as NoOverheadLevelFilterCategory(). I'm not > sure > > if > > > > replacing the strings with functions is going to be cleaner in the long > run. > > > > > > You are right. I thought TracingCategoryFilter is immutable but it isn't. > For > > > now, please add assertion that overhead_level either in ALL_OVERHEAD_LEVELS > or > > > it is of type TracingCategoryFilter. > > > > I think the assertion on line 215 handles this case. I will add a readable > > assertion text. > > Let move it here instead. Raise error as early as possible. Done.
LGTM with nits. Only one real issue regarding thread_duration and the rest are just nits. https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/60001/tools/perf/metrics/gpu_t... tools/perf/metrics/gpu_timeline.py:129: for buffer_swap_event in buffer_swap_events: I also just noticed that you calculate stats on the frames (min/max/ave/std-dev) above, which is nice, and explains why you need a bit more processing. Cool. https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... File tools/perf/metrics/gpu_timeline.py (right): https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:20: TRACKED_NAMES = { 'RenderCompositor': 'render_compositor', Optional nit: Could this be something like TRACKED_GL_CONTEXT_NAMES? I was briefly unclear whether we were looking at client-side traces. https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:30: event_times = [(event.thread_duration or event.duration) I'd be tempted to just not report anything if thread_duration isn't supported. If you want to keep clock times, it would be good to: - Report a different name when using clock times - Check if thread_duration is 'None' instead (it's possible for thread_duration to be zero and duration to be non-zero, since the clock granularity could be different). - Might be nice to add a sanity check that 'if any thread_duration is None, all thread_durations are None'. https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:76: if durations: Nit: Looks like we should either add or not add all results here. Can you make it an empty list and still add it? https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:99: mean_frame - Mean time each frame is calculated to be. Nit: I think this name might have changed in the code below. https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:123: current_tracked_service_events = collections.defaultdict(list) Optional Nit: current_frame_service_events, or current_frame_tracked_service_events or just a comment that these track the traces within the current frame. https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:142: Optional nit: The alternating if blocks are almost the same. Could it be less code, like a for loop over two with variables in arrays, or a function? But if you think it would make this less clear you can forget this.
https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... File tools/perf/metrics/gpu_timeline.py (right):

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:20: TRACKED_NAMES = { 'RenderCompositor': 'render_compositor',
On 2015/01/27 22:06:38, epenner wrote:
> Optional nit: Could this be something like TRACKED_GL_CONTEXT_NAMES? I was
> briefly unclear whether we were looking at client-side traces.
Done.

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:30: event_times = [(event.thread_duration or event.duration)
On 2015/01/27 22:06:38, epenner wrote:
> I'd be tempted to just not report anything if thread_duration isn't
> supported. [...]
This was to simplify the calculation of both synchronous and asynchronous slices. Asynchronous slices do not have the thread_duration, only the duration, which is why we need to fall back to the duration when thread_duration does not exist. Currently the CPU-side traces use synchronous markers and the GPU-side traces use asynchronous markers.

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:76: if durations:
On 2015/01/27 22:06:38, epenner wrote:
> Nit: Looks like we should either add all results here or none. Can you make
> it an empty list and still add it?
list_of_scalar_values requires a non-empty list; I added this after I saw it was throwing an exception.

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:99: mean_frame - Mean time each frame is calculated to be.
On 2015/01/27 22:06:38, epenner wrote:
> Nit: I think this name might have changed in the code below.
Updated the description to match the new return values.

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:123: current_tracked_service_events = collections.defaultdict(list)
On 2015/01/27 22:06:38, epenner wrote:
> Optional nit: current_frame_service_events, or
> current_frame_tracked_service_events, or just a comment that these track
> the traces within the current frame.
Done.

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:142:
On 2015/01/27 22:06:38, epenner wrote:
> Optional nit: The alternating if blocks are almost the same. Could it be
> less code, like a for loop over two with variables in arrays, or a
> function? But if you think it would make this less clear, you can forget
> this.
Yeah, originally I had it trying to share more code, but it got confusing pretty quickly.
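[Editor's note: on the min/max/avg/std-dev point and the empty-list guard, the shape of that computation, sketched as a standalone helper. The name FrameTimeStats is hypothetical; the landed code reports through telemetry's list_of_scalar_values rather than returning a dict.]

    import math

    def FrameTimeStats(durations):
      # Per-category stats reported by the CL: avg, max, and stddev in ms.
      # Return None for an empty list -- list_of_scalar_values raises on
      # empty input, so an empty category is simply not reported.
      if not durations:
        return None
      mean = sum(durations) / float(len(durations))
      variance = sum((d - mean) ** 2 for d in durations) / len(durations)
      return {'avg': mean, 'max': max(durations),
              'stddev': math.sqrt(variance)}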
https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... File tools/perf/metrics/gpu_timeline.py (right):

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:30: event_times = [(event.thread_duration or event.duration)
Minimally, any detection of a missing thread_duration should check event.thread_duration is None; I believe the current code treats both None and 0 as missing (and a thread_duration of 0 might come with a non-zero clock duration).

Switching between thread_duration and duration changes the meaning of the result, so IMO it would be better to report a different name, to not report the result at all when there are no thread_durations, or to just use the clock duration all the time. In terms of the current benchmark, aren't all the async slices delivered via the command stream? When would CPU time actually be available?

Last, it seems like we could actually report thread_duration if the async slice begins/ends on the same thread... No need for that now, but would that make this consistently work with thread_durations?
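[Editor's note: epenner's None-vs-0 point is the standard Python truthiness pitfall: `or` cannot distinguish "missing" from "measured as zero". A self-contained illustration:]

    thread_duration = 0   # a real measurement: zero CPU ms at this clock's
                          # granularity, even though wall-clock time elapsed
    duration = 1.5        # wall-clock ms

    wrong = thread_duration or duration   # -> 1.5; silently drops the 0
    right = (thread_duration if thread_duration is not None
             else duration)               # -> 0; only substitutes for None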
> Last, it seems like we could actually report thread_duration if the async
> slice begins/ends on the same thread... No need for that now, but would
> that make this consistently work with thread_durations?

I believe we are using async slices only for GPU-side events, so the CPU thread_duration isn't meaningful in those cases.
On 2015/01/27 23:03:09, vmiura wrote:
> I believe we are using async slices only for GPU-side events, so the CPU
> thread_duration isn't meaningful in those cases.

I see. So thread_duration is only expected to be None for GPU events where we are measuring GPU time? As long as we use None to detect this, it's okay with me.

This is less important, but some platforms don't support thread_duration, so those will also be None for all traces and we'll get clock times on those platforms. We could detect this at a higher level and zero out all our results, but it's not a big deal IMO.
https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... File tools/perf/metrics/gpu_timeline.py (right):

https://codereview.chromium.org/854833003/diff/430001/tools/perf/metrics/gpu_... tools/perf/metrics/gpu_timeline.py:30: event_times = [(event.thread_duration or event.duration)
On 2015/01/27 22:59:02, epenner wrote:
> Minimally, any detection of a missing thread_duration should check
> event.thread_duration is None; I believe the current code treats both None
> and 0 as missing (and a thread_duration of 0 might come with a non-zero
> clock duration).
>
> Switching between thread_duration and duration changes the meaning of the
> result, so IMO it would be better to report a different name, to not
> report the result at all when there are no thread_durations, or to just
> use the clock duration all the time. In terms of the current benchmark,
> aren't all the async slices delivered via the command stream? When would
> CPU time actually be available?
>
> Last, it seems like we could actually report thread_duration if the async
> slice begins/ends on the same thread... No need for that now, but would
> that make this consistently work with thread_durations?

I've split this up into two functions now, so CPU events only use event.thread_duration and GPU events only use event.duration. Does that sound reasonable?
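[Editor's note: the split described here would look roughly like this; the helper names are hypothetical and the field names mirror telemetry slice events. Handling of a None thread_duration is addressed a few messages below.]

    def GetCpuEventTimes(events):
      # Synchronous CPU-side slices: use thread_duration only, never the
      # wall-clock duration.
      return [event.thread_duration for event in events]

    def GetGpuEventTimes(events):
      # Asynchronous GPU-side slices never carry thread_duration; their
      # duration is the GPU time reported by the tracer.
      return [event.duration for event in events]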
On 2015/01/27 23:20:00, David Yen wrote:
> I've split this up into two functions now, so CPU events only use
> event.thread_duration and GPU events only use event.duration. Does that
> sound reasonable?

SGTM
On 2015/01/27 23:22:03, epennerAtGoogle wrote:
> SGTM

Oh, one last note on that: on some platforms, some operations won't work when thread_duration is None. Just ensure that always gets converted to zero to avoid bot redness.
On 2015/01/27 23:23:57, epennerAtGoogle wrote:
> Oh, one last note on that: on some platforms, some operations won't work
> when thread_duration is None. Just ensure that always gets converted to
> zero to avoid bot redness.

Done, and I added comments explaining this behavior.
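[Editor's note: the coercion from the final patch set ("Updated comments, ensure lack of thread_duration becomes 0"), sketched; the helper name is hypothetical.]

    def ThreadDurationOrZero(event):
      # On platforms without thread_duration support every slice reports
      # None; arithmetic on None would redden the bots, so force it to 0.
      return event.thread_duration if event.thread_duration is not None else 0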
The CQ bit was checked by dyen@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/854833003/490001
Message was sent while issue was closed.
Committed patchset #26 (id:490001)
Message was sent while issue was closed.
Patchset 26 (id:??) landed as https://crrev.com/dfb418e345ac741c380a5570b386a5bfad1c6115 Cr-Commit-Position: refs/heads/master@{#313419}
Message was sent while issue was closed.
nduca@chromium.org changed reviewers: + nduca@chromium.org
Message was sent while issue was closed.
Why is this in tools/perf? All tests should produce GPU times.
Message was sent while issue was closed.
On 2015/02/05 16:41:14, nduca wrote:
> Why is this in tools/perf? All tests should produce GPU times.

Enabling GPU tracing has significant overhead, so we cannot enable it on every test.