OLD | NEW |
---|---|
(Empty) | |
1 # Copyright 2015 The Chromium Authors. All rights reserved. | |
2 # Use of this source code is governed by a BSD-style license that can be | |
3 # found in the LICENSE file. | |
4 import collections | |
5 import math | |
6 | |
7 from telemetry.timeline import async_slice as async_slice_module | |
8 from telemetry.timeline import slice as slice_module | |
9 from telemetry.value import scalar | |
10 from telemetry.web_perf.metrics import timeline_based_metric | |
11 | |
# Value of the 'gl_category' trace argument that marks a top-level GPU
# event worth aggregating.
TOPLEVEL_GL_CATEGORY = 'gpu_toplevel'
# Trace categories for the CPU-side (service) and GPU-side (device) timings
# of the same GPU work.
TOPLEVEL_SERVICE_CATEGORY = 'disabled-by-default-gpu.service'
TOPLEVEL_DEVICE_CATEGORY = 'disabled-by-default-gpu.device'

# (category, name) pair of the slice that marks the end of a frame; used to
# bucket GPU events into per-frame groups.
FRAME_END_MARKER = ('gpu', 'GLES2DecoderImpl::DoSwapBuffers')
epenner
2015/01/16 01:17:59
Two things on this trace:
First, is there no GPU-
vmiura
2015/01/16 01:41:15
We should be able to add a "DoSwapBuffers" gpu tra
David Yen
2015/01/16 22:58:04
Done here: https://codereview.chromium.org/7997530
David Yen
2015/01/20 19:37:26
Done.
| |
17 | |
# Maps raw GPU trace names to the metric-name prefix they are reported
# under ('<prefix>_service' / '<prefix>_device' per frame).
TRACKED_NAMES = { 'RenderCompositor': 'render_compositor',
                  'Compositor': 'compositor' }

# Maximum tolerated surplus of service events over device events; trailing
# service events without a device counterpart (e.g. tracing stopped before
# the device trace came back) are trimmed up to this count.
GPU_SERVICE_DEVICE_VARIANCE = 5
epenner
2015/01/16 01:17:59
See below, I'm not sure this threshold is really n
vmiura
2015/01/16 01:41:14
Perhaps if we have the "swap" trace on the GPU we
David Yen
2015/01/16 22:58:04
Yes, once the Swap trace is plumbed through we won
David Yen
2015/01/20 19:37:26
Done.
| |
22 | |
23 | |
class GPUTimelineMetric(timeline_based_metric.TimelineBasedMetric):
  """Computes GPU based metrics.

  Aggregates per-frame GPU service (CPU side) and device (GPU side) times
  from toplevel GPU trace events, and reports max/avg/stddev for each.
  """

  def __init__(self):
    super(GPUTimelineMetric, self).__init__()

  def AddResults(self, model, _, interaction_records, results):
    """Adds max/avg/stddev scalar values for every per-frame time series.

    Args:
      model: Timeline model to extract GPU events from.
      _: Unused (renderer thread slot of the metric interface).
      interaction_records: Unused; the metric spans the whole model.
      results: Page results object receiving the ScalarValues.
    """
    service_times = self._CalculateGPUTimelineData(model)
    # .items() (not the Python-2-only .iteritems()) so this runs on both
    # Python 2 and Python 3.
    for name, durations in service_times.items():
      count = len(durations)
      avg = 0.0
      stddev = 0.0
      maximum = 0.0
      if count:
        avg = sum(durations) / count
        # Population standard deviation (divide by N, not N-1).
        stddev = math.sqrt(sum((d - avg) ** 2 for d in durations) / count)
        maximum = max(durations)

      results.AddValue(scalar.ScalarValue(results.current_page,
                                          name + '_max', 'ms', maximum))
      results.AddValue(scalar.ScalarValue(results.current_page,
                                          name + '_avg', 'ms', avg))
      results.AddValue(scalar.ScalarValue(results.current_page,
                                          name + '_stddev', 'ms', stddev))

  def _CalculateGPUTimelineData(self, model):
    """Uses the model and calculates the times for various values for each
    frame. The return value will be a dictionary of the following format:
      {
        EVENT_NAME1: [FRAME0_TIME, FRAME1_TIME...etc.],
        EVENT_NAME2: [FRAME0_TIME, FRAME1_TIME...etc.],
      }

    Event Names:
      total_frame - Total time each frame is calculated to be.
      total_gpu_service: Total time the GPU service took per frame.
      total_gpu_device: Total time the GPU device took per frame.
      TRACKED_NAMES_service: Using the TRACKED_NAMES dictionary, we include
        service traces per frame for the tracked name.
      TRACKED_NAMES_device: Using the TRACKED_NAMES dictionary, we include
        device traces per frame for the tracked name.
    """
    service_events = []
    device_events = []
    buffer_swap_events = []

    for event in model.IterAllEvents():
      if isinstance(event, slice_module.Slice):
        if (event.category, event.name) == FRAME_END_MARKER:
          buffer_swap_events.append(event)
      elif isinstance(event, async_slice_module.AsyncSlice):
        # NOTE(review): the thread_start filter was questioned in review;
        # presumably it skips async slices that never ran — confirm
        # AsyncSlice reliably populates thread_start.
        if event.thread_start:
          if event.args.get('gl_category', None) == TOPLEVEL_GL_CATEGORY:
            if event.category == TOPLEVEL_SERVICE_CATEGORY:
              service_events.append(event)
            elif event.category == TOPLEVEL_DEVICE_CATEGORY:
              device_events.append(event)

    # Some platforms do not support GPU device tracing: synthesize
    # zero-length device events so the pairing below still works, and
    # remember to drop the *_device metrics at the end.
    no_device_traces = False
    if service_events and not device_events:
      device_events = [async_slice_module.AsyncSlice(TOPLEVEL_DEVICE_CATEGORY,
                                                     event.name, 0)
                       for event in service_events]
      no_device_traces = True

    # Allow some variance in the number of service and device events:
    # depending on when tracing stopped, the last device traces may not have
    # come back yet, so trim the trailing unmatched service events.
    if len(service_events) > len(device_events):
      event_difference = len(service_events) - len(device_events)
      if event_difference <= GPU_SERVICE_DEVICE_VARIANCE:
        service_events = service_events[:-event_difference]

    # Group together GPU events and validate that the markers match.
    assert len(service_events) == len(device_events), (
        'Mismatching number of GPU Service (%s) and Device events (%s).' %
        (len(service_events), len(device_events)))

    service_events_dict = collections.defaultdict(list)
    for event in service_events:
      service_events_dict[event.name].append(event)

    device_events_dict = collections.defaultdict(list)
    for event in device_events:
      device_events_dict[event.name].append(event)

    assert set(service_events_dict.keys()) == set(device_events_dict.keys()), (
        'Mismatching event names between GPU Service and Device events.')

    # Pair each service event with its same-named device event, then order
    # the pairs by when the service work started.
    gpu_events = []
    for event_name in service_events_dict:
      service_events_list = service_events_dict[event_name]
      device_events_list = device_events_dict[event_name]
      assert len(service_events_list) == len(device_events_list), (
          'GPU service event (%s) does not correspond with all device events.'
          % (event_name))

      gpu_events.extend(zip(service_events_list, device_events_list))

    gpu_events.sort(key=lambda events: events[0].start)

    # Utilize Swap Buffer events to separate out gpu events by frames: a
    # pair belongs to the current frame while its service slice ends at or
    # before the swap; the first pair past the swap opens the next frame.
    gpu_events_by_frame = []
    gpu_event_iter = iter(gpu_events)
    current_frame = []
    for buffer_swap_event in buffer_swap_events:
      for gpu_event in gpu_event_iter:
        service_event, _ = gpu_event
        if service_event.end <= buffer_swap_event.end:
          current_frame.append(gpu_event)
        else:
          if current_frame:
            gpu_events_by_frame.append(current_frame)
          current_frame = [gpu_event]
          break

    # Everything after the last swap marker belongs to the final frame.
    current_frame.extend(gpu_event_iter)
    if current_frame:
      gpu_events_by_frame.append(current_frame)

    # Calculate service times that we care about.
    total_frame_times = []
    gpu_service_times = []
    gpu_device_times = []
    tracked_times = {}
    for tracked_name in TRACKED_NAMES.values():
      tracked_times[tracked_name + '_service'] = []
      tracked_times[tracked_name + '_device'] = []

    # The first frame is measured from the start of the first service event.
    if gpu_events:
      first_service_event, _ = gpu_events[0]
      prev_frame_end = first_service_event.start
    else:
      prev_frame_end = 0

    for frame_gpu_events in gpu_events_by_frame:
      last_service_in_frame, _ = frame_gpu_events[-1]

      total_frame_time = last_service_in_frame.end - prev_frame_end
      prev_frame_end = last_service_in_frame.end

      total_gpu_service_time = 0
      total_gpu_device_time = 0
      tracked_markers = collections.defaultdict(int)
      for service_event, device_event in frame_gpu_events:
        service_time = service_event.end - service_event.start
        device_time = device_event.end - device_event.start
        total_gpu_service_time += service_time
        total_gpu_device_time += device_time

        # Trace names may carry a '-<suffix>'; strip the last dashed part
        # before matching against TRACKED_NAMES.
        base_name = service_event.name
        dash_index = base_name.rfind('-')
        if dash_index != -1:
          base_name = base_name[:dash_index]

        tracked_name = TRACKED_NAMES.get(base_name)
        if tracked_name:
          tracked_markers[tracked_name + '_service'] += service_time
          tracked_markers[tracked_name + '_device'] += device_time

      total_frame_times.append(total_frame_time)
      gpu_service_times.append(total_gpu_service_time)
      gpu_device_times.append(total_gpu_device_time)

      for tracked_name in TRACKED_NAMES.values():
        service_name = tracked_name + '_service'
        device_name = tracked_name + '_device'
        tracked_times[service_name].append(tracked_markers[service_name])
        tracked_times[device_name].append(tracked_markers[device_name])

    # Create the service times dictionary.
    service_times = { 'total_frame': total_frame_times,
                      'total_gpu_service': gpu_service_times,
                      'total_gpu_device': gpu_device_times }
    service_times.update(tracked_times)

    # Remove device metrics if no device traces were found.
    if no_device_traces:
      for device_name in [name for name in service_times
                          if name.endswith('_device')]:
        service_times.pop(device_name)

    return service_times
OLD | NEW |