Description: Add tracing to the SkCanvas drawFoo() methods to find long draw ops.
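For context, the change amounts to dropping a TRACE_EVENT macro at the top of each public drawFoo() method. A minimal sketch of what that looks like, assuming Skia's SkTraceEvent.h macros are available in SkCanvas.cpp (the exact set of methods and trace arguments in the landed patch may differ):

    #include "SkTraceEvent.h"

    void SkCanvas::drawRect(const SkRect& r, const SkPaint& paint) {
        // Emit a trace event into the "skia" category; this costs only a
        // cached flag check unless that category is enabled in about:tracing.
        TRACE_EVENT0("skia", "SkCanvas::drawRect()");
        this->onDrawRect(r, paint);
    }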
BUG=skia:3088
Committed: https://skia.googlesource.com/skia/+/8f757f540a8378c7b1354aab3d4650eaa920b17a
Patch Set 1 #
Total comments: 2
Patch Set 2 : canvas-tracing: fixmac #
Messages
Total messages: 38 (7 generated)
danakj@chromium.org changed reviewers: + bsalomon@google.com, reed@google.com
These traces were essential to track down why raster is slow in https://code.google.com/p/chromium/issues/detail?id=428296. And they will be useful again in the future whenever we have a slow raster bug. Being able to narrow it down in an instant by turning on the skia category in tracing will be super valuable and save many hours.
lgtm, but wondering if we want this in the canvas, device, or both.
what is the perf overhead of the calls? Is this of any real value in the GPU case? Also, when the canvas is subclassed, having the tracers here may not do anything. In general I am reluctant to add this, as it may end up being in lots of places for speculative value. I have not had too much pain using profilers instead.
reed@google.com changed required reviewers: + reed@google.com
I wonder if this info can be gotten via a custom canvas that performs the timing.
On 2014/11/03 21:38:23, reed1 wrote: > what is the perf overhead of the calls? Is this of any real value in the GPU > case? Also, when the canvas is subclassed, having the tracers here may not do > anything. In general I am reluctant to add this, as it may end up being in lots > of places for speculative value. I have not had too much pain using profilers > instead. The idea is that any chrome developer can just check "skia" in about:tracing and immediately see what's going on. Otherwise, every time we get a "Raster is slow" bug we have to go and insert a bunch of traces and spend a lot of time narrowing things down. When traces are not enabled the overhead is approximately nothing. If there are other places we should add them to catch subclasses, I'm happy to add those too. > I wonder if this info can be gotten via a custom canvas that performs the > timing. The idea is to provide the output to about:tracing so it's part of the standard chrome debugging toolkit. It's not precise numbers that are important here but tool integration.
On 2014/11/04 15:47:16, danakj wrote: > On 2014/11/03 21:38:23, reed1 wrote: > > what is the perf overhead of the calls? Is this of any real value in the GPU > > case? Also, when the canvas is subclassed, having the tracers here may not do > > anything. In general I am reluctant to add this, as it may end up being in > lots > > of places for speculative value. I have not had too much pain using profilers > > instead. > > The idea is that any chrome developer can just check "skia" in the about:tracing > and immediately see what's going on. Otherwise everytime we get a "Raster is > slow" but we have to go and insert a bunch of traces and spend a lot of time > narrowing things down. When traces are not enabled the overhead is approximately > nothing. > > If there's other places we should add them to catch subclasses I'm happy to add > those too. > > > I wonder if this info can be gotten via a custom canvas that performs the > > timing. > > The idea is to provide the output to about:tracing so it's part of the standard > chrome debugging toolkit. It's not precise numbers that is important here but > tool integration. If we can provide it via a custom canvas, that might better isolate the added complexity and overhead.
On 2014/11/04 15:50:37, reed1 wrote: > On 2014/11/04 15:47:16, danakj wrote: > > On 2014/11/03 21:38:23, reed1 wrote: > > > what is the perf overhead of the calls? Is this of any real value in the GPU > > > case? Also, when the canvas is subclassed, having the tracers here may not > do > > > anything. In general I am reluctant to add this, as it may end up being in > > lots > > > of places for speculative value. I have not had too much pain using > profilers > > > instead. > > > > The idea is that any chrome developer can just check "skia" in the > about:tracing > > and immediately see what's going on. Otherwise everytime we get a "Raster is > > slow" but we have to go and insert a bunch of traces and spend a lot of time > > narrowing things down. When traces are not enabled the overhead is > approximately > > nothing. > > > > If there's other places we should add them to catch subclasses I'm happy to > add > > those too. > > > > > I wonder if this info can be gotten via a custom canvas that performs the > > > timing. > > > > The idea is to provide the output to about:tracing so it's part of the > standard > > chrome debugging toolkit. It's not precise numbers that is important here but > > tool integration. > > If we can provide it via a custom canvas, that might better isolate the added > complexity and overhead. A proxy canvas would handle all subclassing fine, as it would wrap the real canvas (whatever it is).
On 2014/11/04 15:54:31, reed1 wrote: > On 2014/11/04 15:50:37, reed1 wrote: > > On 2014/11/04 15:47:16, danakj wrote: > > > On 2014/11/03 21:38:23, reed1 wrote: > > > > what is the perf overhead of the calls? Is this of any real value in the > GPU > > > > case? Also, when the canvas is subclassed, having the tracers here may not > > do > > > > anything. In general I am reluctant to add this, as it may end up being in > > > lots > > > > of places for speculative value. I have not had too much pain using > > profilers > > > > instead. > > > > > > The idea is that any chrome developer can just check "skia" in the > > about:tracing > > > and immediately see what's going on. Otherwise everytime we get a "Raster is > > > slow" but we have to go and insert a bunch of traces and spend a lot of time > > > narrowing things down. When traces are not enabled the overhead is > > approximately > > > nothing. > > > > > > If there's other places we should add them to catch subclasses I'm happy to > > add > > > those too. > > > > > > > I wonder if this info can be gotten via a custom canvas that performs the > > > > timing. > > > > > > The idea is to provide the output to about:tracing so it's part of the > > standard > > > chrome debugging toolkit. It's not precise numbers that is important here > but > > > tool integration. > > > > If we can provide it via a custom canvas, that might better isolate the added > > complexity and overhead. > > A proxy canvas would handle all subclassing fine, as it would wrap the real > canvas (whatever it is). Any ballpark overhead of a disabled trace? I'm assuming the "disabled" bool stays nice and cached, all branches are predicted away... something in the 10s of ns?
From http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/tracing...: "Trace macros are very low overhead. When tracing is not turned on, trace macros cost at most a few dozen clocks. When running, trace macros cost a few thousand clocks at most."
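For the curious, the reason a disabled trace is nearly free is that the macro reduces to loading a cached per-category "enabled" flag and branching on it; only the enabled path does real work. A simplified, illustrative sketch of that pattern (not the real trace_event implementation, which also records a scoped end event):

    #include <atomic>
    #include <cstdio>

    // Illustrative stand-in for the per-category "enabled" flag the real
    // macros cache; the tracing backend flips it when the category is on.
    static std::atomic<bool> g_skia_category_enabled{false};

    #define SKETCH_TRACE_EVENT0(name)                                        \
        do {                                                                 \
            if (g_skia_category_enabled.load(std::memory_order_relaxed)) {   \
                /* Only when tracing is on do we pay to record the event. */ \
                std::printf("begin/end: %s\n", (name));                      \
            }                                                                \
        } while (0)

    void drawSomething() {
        SKETCH_TRACE_EVENT0("SkCanvas::drawRect()");
        // ...actual drawing work...
    }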
that sounds like a good result. if our nanobenches reflect that (inc. our picture recording perf) then I have no objections.
On 2014/11/04 17:54:56, enne wrote: > From > http://www.chromium.org/developers/how-tos/trace-event-profiling-tool/tracing...: > > "Trace macros are very low overhead. When tracing is not turned on, trace macros > cost at most a few dozen clocks. When running, trace macros cost a few thousand > clocks at most." There's a *lot* of benchmarks in nanobench, so I tried running a few to be representative? BEFORE: mega:skia [] ((detached from e672e9f))% ./out/Release/nanobench --threads 2 -m picture_playback Timer overhead: 24ns maxrss loops min median mean max stddev samples config bench 46M 1 4.59ms 4.61ms 4.72ms 5.11ms 4% █▄▃▁▁▁▁▁▅▁ 8888 picture_playback_drawPosText 46M 1 4.56ms 4.58ms 4.61ms 4.87ms 2% ▃▁▂▁▁▂█▂▁▁ 565 picture_playback_drawPosText 59M 1 7.58ms 7.68ms 8.78ms 18.2ms 38% ▁▁█▁▁▁▁▁▁▁ gpu picture_playback_drawPosText 59M 1 4.41ms 4.45ms 4.6ms 5.64ms 8% █▁▂▃▁▁▁▁▁▁ 8888 picture_playback_drawPosTextH 59M 1 4.36ms 4.39ms 4.42ms 4.54ms 2% ▁▂▁▂▂▁█▇▃█ 565 picture_playback_drawPosTextH 59M 1 7.62ms 7.8ms 7.79ms 7.93ms 1% ▅█▇▆▄▅▄▁▅▆ gpu picture_playback_drawPosTextH 59M 1 3.42ms 4.02ms 4.77ms 8.36ms 40% █▂▂▁▆▅▁▁▁▁ 8888 picture_playback_drawText 59M 1 3.39ms 3.4ms 3.41ms 3.47ms 1% ▄▂▂▁▂▁█▂▃▂ 565 picture_playback_drawText 59M 1 6.71ms 6.84ms 6.83ms 6.94ms 1% ▁█▃▅▃▅▃▇▅▅ gpu picture_playback_drawText mega:skia [] ((detached from e672e9f))% ./out/Release/nanobench --threads 2 -m Xfermode Timer overhead: 25.3ns maxrss loops min median mean max stddev samples config bench 42M 2 92.8µs 93.4µs 95.7µs 116µs 7% ▂▁▁▁▁▁█▁▁▁ 8888 Xfermode_Luminosity 43M 2 109µs 109µs 111µs 127µs 5% ▁▁▁▁▁▁▁▁▁█ 565 Xfermode_Luminosity 67M 403 12.2µs 12.9µs 14.5µs 29.2µs 36% ▁▁▁▁▁▁█▂▁▁ gpu Xfermode_Luminosity 67M 1 450µs 492µs 491µs 515µs 3% ▆▆▅▅▅▅█▆▆▁ 8888 Xfermode_Color 67M 1 219µs 498µs 442µs 565µs 29% █▇▆▇▅▇▇▅▁▁ 565 Xfermode_Color 69M 384 12.2µs 12.8µs 15µs 25.4µs 34% ▁▂▁▁▇█▁▁▁▁ gpu Xfermode_Color 69M 3 81.7µs 81.8µs 81.8µs 82.4µs 0% █▂▂▂▁▂▂▂▂▃ 8888 Xfermode_Saturation 69M 3 115µs 122µs 121µs 123µs 2% ▁█▇▇▇▇▇▇▇▇ 565 Xfermode_Saturation 72M 405 12.1µs 12.2µs 12.2µs 12.4µs 1% ▁▂▁█▃▂▅▇▂▄ gpu Xfermode_Saturation 72M 1 201µs 204µs 223µs 268µs 13% ▆▆▆█▁▁▁▁▁▁ 8888 Xfermode_Hue 72M 2 197µs 198µs 199µs 207µs 1% ▂▁▁█▂▁▁▁▂▁ 565 Xfermode_Hue 74M 395 12.6µs 12.8µs 12.8µs 13.2µs 1% ▃▄▃▆█▃▃▅▁▁ gpu Xfermode_Hue 74M 4 40.9µs 41µs 41.1µs 41.8µs 1% █▃▃▂▂▂▁▁▂▁ 8888 Xfermode_Multiply 74M 5 54.5µs 54.6µs 54.7µs 55.3µs 0% █▄▂▂▁▁▂▂▁▁ 565 Xfermode_Multiply 76M 395 12.4µs 12.5µs 12.6µs 12.7µs 1% ▄▁▇▂█▄▇▄▄▂ gpu Xfermode_Multiply 76M 3 47.9µs 48.1µs 48.9µs 55.3µs 5% ▂▁█▂▁▁▁▁▁▁ 8888 Xfermode_Exclusion 76M 4 39.8µs 40µs 53.1µs 84.1µs 40% ███▁▁▁▁▁▁▁ 565 Xfermode_Exclusion 78M 397 12.5µs 12.7µs 12.7µs 13.1µs 2% ▁▅█▆▂▃▃▃▄▃ gpu Xfermode_Exclusion 78M 5 37.7µs 37.7µs 38.1µs 41.3µs 3% █▂▁▁▁▁▁▁▁▁ 8888 Xfermode_Difference 78M 6 58.4µs 58.5µs 58.9µs 62.1µs 2% ▂█▂▁▁▁▁▁▁▁ 565 Xfermode_Difference 80M 362 12.6µs 13.1µs 13µs 13.6µs 3% ▁▅▆▁▂▂▄█▆▁ gpu Xfermode_Difference 80M 1 289µs 523µs 443µs 526µs 25% ▁▁▇█████▅▁ 8888 Xfermode_SoftLight 80M 1 293µs 293µs 294µs 298µs 1% ▃▂▂▁▁▁▁▁█▂ 565 Xfermode_SoftLight 80M 76 12.8µs 13µs 13.1µs 13.6µs 2% █▂▁▄▄▃▃▁█▃ gpu Xfermode_SoftLight 80M 6 39.1µs 39.3µs 39.8µs 44.1µs 4% ▂▁▁▁▁▁█▂▁▁ 8888 Xfermode_HardLight 80M 6 49µs 49.1µs 49.4µs 51.5µs 2% ▂▁▁▁▁▁▁█▂▁ 565 Xfermode_HardLight 83M 371 12.7µs 12.9µs 12.9µs 13.1µs 1% ▂▃▄▄██▅▁▄▄ gpu Xfermode_HardLight 83M 4 67.2µs 163µs 136µs 164µs 29% ▁▁▄██▇█▇██ 8888 Xfermode_ColorBurn 83M 2 132µs 133µs 133µs 136µs 1% ▅▃▂▂█▄▂▂▁▁ 565 Xfermode_ColorBurn 84M 385 12.7µs 12.9µs 12.9µs 13.3µs 1% █▆▃▂▂▃▂▂▁▃ 
gpu Xfermode_ColorBurn 84M 4 56.1µs 56.3µs 56.5µs 58.3µs 1% ▃▁▃▁▁█▂▁▁▁ 8888 Xfermode_ColorDodge 84M 5 63.7µs 63.9µs 64µs 64.9µs 1% ▅▃▁▂▁▁█▃▂▁ 565 Xfermode_ColorDodge 85M 389 12.6µs 12.6µs 12.7µs 13µs 1% ▂▂▁▃▁▂▂▁█▄ gpu Xfermode_ColorDodge 85M 5 29µs 38µs 35.7µs 43.3µs 14% ▆▆▅▅▅█▁▁▃▁ 8888 Xfermode_Lighten 85M 7 42.5µs 42.6µs 42.9µs 45.2µs 2% ▂▂▁▁▁▁▁▁█▂ 565 Xfermode_Lighten 86M 378 12.9µs 13µs 13µs 13.1µs 0% █▅▄▃▇▂▅▇▆▁ gpu Xfermode_Lighten 86M 6 29.8µs 29.9µs 31µs 35.8µs 7% █▅▁▁▁▁▁▃▁▁ 8888 Xfermode_Darken 86M 4 39.5µs 60.8µs 58.9µs 61.9µs 12% ███▁██████ 565 Xfermode_Darken 86M 394 12.1µs 12.4µs 12.3µs 12.4µs 1% ▆▁▇▆▂▄▆▇█▇ gpu Xfermode_Darken 86M 5 40.1µs 40.2µs 41.6µs 45.7µs 5% ▂▁▁▁▁▁▁▆▆█ 8888 Xfermode_Overlay 86M 4 53.2µs 53.3µs 58.4µs 104µs 27% █▁▁▁▁▁▁▁▁▁ 565 Xfermode_Overlay 87M 387 12µs 12.1µs 12.1µs 12.3µs 1% ▄▂▃▆▅▃▄▁▁█ gpu Xfermode_Overlay 87M 9 18.5µs 18.6µs 19.6µs 29µs 17% ▁▁▁▁▁▁▁▁▁█ 8888 Xfermode_Screen 87M 4 56.5µs 65.5µs 64.5µs 66.2µs 5% ▆███▁█▇█▇▇ 565 Xfermode_Screen 87M 7754 641ns 651ns 3.33µs 27.3µs 253% ▁▁▁▁▁▁▁▁█▁ gpu Xfermode_Screen 87M 7 17.9µs 19.1µs 19.7µs 24.2µs 9% ▂▁▅▂▂▂▂▂▂█ 8888 Xfermode_Modulate 87M 7 30.2µs 30.2µs 30.7µs 33.7µs 4% ▂▁▁▁▁▁▁▃█▁ 565 Xfermode_Modulate 87M 7746 626ns 633ns 633ns 643ns 1% ▇▄▁▄▄▃█▃▂▃ gpu Xfermode_Modulate 87M 7 17.3µs 17.6µs 18.4µs 22.4µs 10% ▁█▂▇▂▁▁▁▁▁ 8888 Xfermode_Plus 87M 9 24.6µs 24.6µs 24.7µs 25.5µs 1% ▄▂▁▁▂▁▁▁█▂ 565 Xfermode_Plus 87M 7767 622ns 641ns 640ns 669ns 2% ▃▄▁█▁▄▅▄▃▃ gpu Xfermode_Plus 87M 4 52.6µs 52.7µs 53.6µs 61.1µs 5% ▂▁█▁▁▁▁▁▁▁ 8888 Xfermode_Xor 87M 4 71µs 71.4µs 74.5µs 89.5µs 8% ▄▃█▁▁▁▁▁▁▁ 565 Xfermode_Xor 87M 7871 620ns 631ns 631ns 643ns 1% ▆▇█▄▄▁▅▃▃▁ gpu Xfermode_Xor 87M 6 27.4µs 27.5µs 27.8µs 30.4µs 3% ▂▁▁▁▁▁▁▁▁█ 8888 Xfermode_DstATop 87M 7 38.8µs 38.8µs 38.9µs 39.2µs 0% █▃▂▁▁▁▁▁▂▁ 565 Xfermode_DstATop 87M 7776 625ns 633ns 633ns 645ns 1% █▅▅▃▁▁▄▃▄▅ gpu Xfermode_DstATop 87M 6 28.3µs 29.1µs 31.3µs 39.8µs 13% ▁▁▁▃▃▇█▁▁▁ 8888 Xfermode_SrcATop 87M 7 32.4µs 42.1µs 40.2µs 43.8µs 10% ▆▆▇█▇▇▆█▁▁ 565 Xfermode_SrcATop 87M 8163 484ns 488ns 489ns 495ns 1% █▃▁▃▁▇▂█▅▃ gpu Xfermode_SrcATop 87M 10 7.1µs 7.13µs 7.18µs 7.45µs 2% █▃▂▁▁▁▁▁▇▂ 8888 Xfermode_DstOut 87M 14 16.8µs 16.8µs 16.9µs 17.2µs 1% ▄▂▁█▂▂▁▆▂▁ 565 Xfermode_DstOut 87M 7999 614ns 629ns 628ns 640ns 1% ▄▅▁▁▃▅▆▇█▇ gpu Xfermode_DstOut 87M 9 7.59µs 7.64µs 7.84µs 9.52µs 8% █▂▁▁▁▁▁▁▁▁ 8888 Xfermode_SrcOut 87M 14 17.1µs 17.2µs 17.2µs 17.8µs 1% ▃▂▁▁▁▁▁▁▂█ 565 Xfermode_SrcOut 87M 7616 623ns 646ns 643ns 654ns 1% ▇▆▄▃▅▅▇█▁█ gpu Xfermode_SrcOut 87M 7 14.4µs 15.3µs 15.2µs 16.4µs 4% ▂▄▂▁▅▄▄▃▃█ 8888 Xfermode_DstIn 87M 9 26.1µs 28µs 28µs 29.9µs 4% ▅▄▅▁▆▃▄█▄▄ 565 Xfermode_DstIn 87M 8111 621ns 634ns 631ns 646ns 2% ▁▅▁▁▁▅▅▃██ gpu Xfermode_DstIn 87M 12 7.07µs 7.11µs 7.33µs 8.53µs 7% ▂▁▅▁▁▁█▁▁▁ 8888 Xfermode_SrcIn 87M 14 17.2µs 17.2µs 17.2µs 17.4µs 0% █▃▃▂▁▁▁▁▃█ 565 Xfermode_SrcIn 87M 7800 626ns 647ns 643ns 663ns 2% ▅▅▁▃▁▆█▅▃▆ gpu Xfermode_SrcIn 87M 7 8.04µs 12.2µs 11µs 14.5µs 20% ▆▆▆▆█▂▃▄▁▁ 8888 Xfermode_DstOver 87M 27 240ns 263ns 303ns 681ns 44% ▂▁▁▁▂▁▁▁█▁ 565 Xfermode_DstOver 87M 7971 617ns 633ns 634ns 650ns 1% ▄▄▃▁▄▅▆█▇▅ gpu Xfermode_DstOver 87M 10 3.88µs 3.99µs 4.18µs 5.43µs 11% ▄▂▂▁▂▂▁▁█▁ 8888 Xfermode_SrcOver 87M 25 7.14µs 7.17µs 7.19µs 7.48µs 1% █▂▂▂▁▁▁▁▂▁ 565 Xfermode_SrcOver 87M 5937 624ns 630ns 629ns 636ns 1% ▁▃▅▄▂▃▆▂█▆ gpu Xfermode_SrcOver 87M 37 79.1ns 82.9ns 87.3ns 117ns 14% ▃▁▁▁▅▁█▂▂▁ 8888 Xfermode_Dst 87M 133 72.8ns 73.1ns 73.2ns 73.8ns 0% █▃▁▂▄▄▆▂▃▂ 565 Xfermode_Dst 87M 79716 62.8ns 63.5ns 63.5ns 63.8ns 1% ▅██▄▅▇█▁▅▆ gpu Xfermode_Dst 87M 9 5.48µs 5.77µs 5.88µs 6.79µs 6% ▁▄▃▃▃▃▃█▂▂ 
8888 Xfermode_Src 87M 6 51µs 52.9µs 53.3µs 58.5µs 4% ▁▃▃▃▃█▃▃▃▃ 565 Xfermode_Src 87M 7870 632ns 642ns 644ns 663ns 2% █▃▄▆▃▁▁█▂▂ gpu Xfermode_Src 87M 7 3.51µs 7.22µs 6.39µs 14.9µs 57% ▄▄▃▃█▁▁▁▁▁ 8888 Xfermode_Clear 87M 9 27.1µs 27.2µs 27.3µs 27.7µs 1% ▃▁▂▂█▃▁▁▂▂ 565 Xfermode_Clear 87M 7604 637ns 640ns 642ns 656ns 1% ▁▂▁▁█▇▂▂▂▃ gpu Xfermode_Clear
AFTER: mega:skia [] (canvas-tracing)% ./out/Release/nanobench --threads 2 -m picture_playback Timer overhead: 23.2ns maxrss loops min median mean max stddev samples config bench 46M 1 4.6ms 4.64ms 4.76ms 5.46ms 6% █▃▃▁▁▁▁▁▂▁ 8888 picture_playback_drawPosText 46M 1 4.56ms 4.58ms 4.6ms 4.65ms 1% ▂▄▂▂▁▇█▃▂▆ 565 picture_playback_drawPosText 59M 1 7.57ms 7.6ms 8.6ms 17.5ms 36% ▁▁█▁▁▁▁▁▁▁ gpu picture_playback_drawPosText 59M 1 4.42ms 4.43ms 4.66ms 6.43ms 14% █▂▁▁▁▁▁▁▁▁ 8888 picture_playback_drawPosTextH 59M 1 4.39ms 4.4ms 4.4ms 4.44ms 0% ▂▂▃▂▁▁▃█▃▃ 565 picture_playback_drawPosTextH 59M 1 7.43ms 7.54ms 7.55ms 7.73ms 1% ▄▃▅▃▁▃▃▄▄█ gpu picture_playback_drawPosTextH 59M 1 3.46ms 3.47ms 3.73ms 5.31ms 16% █▄▁▁▁▁▁▁▁▁ 8888 picture_playback_drawText 59M 1 3.42ms 3.44ms 3.44ms 3.45ms 0% ▅▅▅▁▄▆▂██▆ 565 picture_playback_drawText 59M 1 6.82ms 6.94ms 6.94ms 7.06ms 1% ▇▃▃▆▅█▆▃▁▃ gpu picture_playback_drawText mega:skia [] (canvas-tracing)% ./out/Release/nanobench --threads 2 -m Xfermode Timer overhead: 22.8ns maxrss loops min median mean max stddev samples config bench 42M 1 163µs 164µs 164µs 168µs 1% █▄▃▂▁▁▂▂▁▁ 8888 Xfermode_Luminosity 43M 2 170µs 170µs 171µs 175µs 1% ▆▂▁▁▁▁▁▁▁█ 565 Xfermode_Luminosity 67M 385 12.7µs 12.9µs 12.9µs 13.1µs 1% ▂▄▆▁▇▄█▃█▄ gpu Xfermode_Luminosity 67M 1 205µs 325µs 297µs 371µs 23% █▁███▆▂▂▃▂ 8888 Xfermode_Color 67M 1 253µs 254µs 255µs 272µs 2% ▂▁▁▁█▁▂▁▁▁ 565 Xfermode_Color 69M 375 13.1µs 13.2µs 13.2µs 13.3µs 0% ▆▅█▄▃▁▃▇▁▂ gpu Xfermode_Color 69M 2 94.6µs 103µs 102µs 106µs 4% ▁▁██▆▆▆▇▆▆ 8888 Xfermode_Saturation 69M 2 126µs 126µs 127µs 133µs 2% ▂▁▁█▂▁▁▁▁▁ 565 Xfermode_Saturation 74M 205 13.2µs 13.3µs 13.3µs 13.4µs 1% ▁▅▅▃█▃▂▁█▁ gpu Xfermode_Saturation 74M 1 210µs 211µs 211µs 213µs 0% █▄▃▂▂▁▄▂▂▂ 8888 Xfermode_Hue 74M 1 235µs 235µs 235µs 236µs 0% █▃▂▁▂▂▁▁▄▂ 565 Xfermode_Hue 77M 379 12.9µs 13.1µs 13.1µs 13.2µs 0% ▃▁▆▂▄▄█▆▆▅ gpu Xfermode_Hue 77M 4 36.4µs 36.5µs 36.6µs 37.4µs 1% █▃▂▁▁▁▁▁▁▁ 8888 Xfermode_Multiply 77M 5 49µs 49.2µs 50µs 54.1µs 3% ▂▁▁█▅▁▁▁▁▁ 565 Xfermode_Multiply 78M 396 12.5µs 12.7µs 12.8µs 13.5µs 3% ▆▅▂▂█▁▁▁▂▁ gpu Xfermode_Multiply 78M 7 27µs 27µs 27.3µs 29.4µs 3% ▂▁▁▁▁█▁▁▁▁ 8888 Xfermode_Exclusion 78M 8 36.5µs 36.6µs 39.5µs 64.6µs 22% █▁▁▁▁▁▁▁▁▁ 565 Xfermode_Exclusion 80M 392 12.6µs 12.7µs 14.6µs 24µs 25% ▁▁▁▁▁▄█▂▃▁ gpu Xfermode_Exclusion 80M 3 37µs 47.4µs 44.7µs 49µs 11% █▇▇▇▇▇▇▂▁▁ 8888 Xfermode_Difference 80M 4 63.1µs 63.3µs 64.1µs 71.4µs 4% ▂▁▁▁▁▁▁▁▁█ 565 Xfermode_Difference 82M 237 13µs 13.3µs 13.2µs 13.3µs 1% ▅▄▆█▂▆▁▇█▅ gpu Xfermode_Difference 82M 1 378µs 531µs 482µs 600µs 23% ████▆▁▁▁▁▁ 8888 Xfermode_SoftLight 82M 1 384µs 385µs 386µs 395µs 1% ▂█▃▂▁▁▁▂▁▁ 565 Xfermode_SoftLight 83M 379 12.8µs 13µs 13µs 13.2µs 1% ▆█▄██▂▃▅▁▄ gpu Xfermode_SoftLight 83M 2 62.9µs 63.2µs 63.5µs 66µs 1% █▃▃▂▂▂▁▂▂▁ 8888 Xfermode_HardLight 83M 3 82.7µs 83.4µs 88.2µs 111µs 10% ▄▂▁▁▁█▄▁▁▁ 565 Xfermode_HardLight 84M 377 13.1µs 13.2µs 13.2µs 13.3µs 1% ▂▂██▄▄▄█▁▄ gpu Xfermode_HardLight 84M 2 78.3µs 78.4µs 78.6µs 80.1µs 1% █▃▂▂▁▂▁▁▁▁ 8888 Xfermode_ColorBurn 84M 3 93.4µs 93.6µs 93.8µs 95.6µs 1% █▂▂▁▁▁▂▁▁▁ 565 Xfermode_ColorBurn 85M 380 12.9µs 13.1µs 13.1µs 13.3µs 1% ▁▄▅▆▇▄▃█▄▄ gpu Xfermode_ColorBurn 85M 3 65.7µs 65.8µs 66.7µs 72.5µs 3% █▁▁▁▁▁▁▃▁▁ 8888 Xfermode_ColorDodge 85M 3 76.4µs 76.4µs 76.8µs 79.1µs 1% ▄▂▁▁▁▁█▁▁▁ 565 Xfermode_ColorDodge 85M 375 12.9µs 13.1µs 13.1µs 13.2µs 1% ▅▇▆█▇▇▅▄▂▁ gpu Xfermode_ColorDodge 85M 4 45.6µs 45.8µs 46µs 47.1µs 1% ▇█▂▂▂▂▂▂▂▁ 8888 Xfermode_Lighten 85M 4 60.3µs 60.6µs 61.3µs 67.4µs 4% ▂▁▁█▁▁▁▁▁▁ 565 Xfermode_Lighten 86M 375 13.2µs 13.3µs 13.3µs 13.4µs 1% ▃▅▄█▄▃▁▂█▁ gpu 
Xfermode_Lighten 86M 3 48.7µs 48.8µs 49µs 50.7µs 1% █▃▁▂▁▁▂▁▁▁ 8888 Xfermode_Darken 86M 3 32µs 32.1µs 32.2µs 33µs 1% █▄▂▁▂▂▁▂▁▂ 565 Xfermode_Darken 87M 384 12.6µs 12.9µs 12.9µs 13.4µs 2% ▁▃▄▇█▂▁▂▃▃ gpu Xfermode_Darken 87M 3 54µs 54.2µs 55.9µs 69.7µs 9% ▁▁▁▁▁▁▁▁█▂ 8888 Xfermode_Overlay 87M 3 65.4µs 65.5µs 65.7µs 66.7µs 1% █▄▂▂▂▁▁▁▂▂ 565 Xfermode_Overlay 87M 372 12.9µs 13.1µs 13µs 13.1µs 1% ▆█▆▅▁▇▇█▆▇ gpu Xfermode_Overlay 87M 6 28.6µs 28.6µs 28.6µs 29.1µs 1% █▃▂▁▂▁▁▁▁▁ 8888 Xfermode_Screen 87M 6 44.1µs 44.2µs 45.7µs 54.6µs 7% ▁▄▁▁▁▁▁▁█▁ 565 Xfermode_Screen 87M 8718 564ns 575ns 572ns 584ns 1% ▁▃▇▁▁▅▂▆█▅ gpu Xfermode_Screen 87M 3 13µs 31.5µs 27.1µs 43.1µs 36% ▆▅▅▃▁█▁▃▅▅ 8888 Xfermode_Modulate 87M 4 30µs 62.3µs 52.8µs 63µs 27% ▇██▂████▁▁ 565 Xfermode_Modulate 87M 8709 582ns 596ns 592ns 604ns 1% ▂▁▅▁▇▆▆▁█▃ gpu Xfermode_Modulate 87M 6 16.8µs 16.9µs 17.3µs 19.4µs 5% ▂▁█▁▆▁▁▁▁▁ 8888 Xfermode_Plus 87M 10 23.4µs 23.5µs 23.8µs 24.8µs 2% ▃▁▁▁▁█▁▁▁▇ 565 Xfermode_Plus 87M 8679 556ns 557ns 557ns 559ns 0% ▄▃▇▁▁▂▄█▅█ gpu Xfermode_Plus 87M 6 24.4µs 24.4µs 25.3µs 32.8µs 10% ▁▁▁▁▁▁▁▁▁█ 8888 Xfermode_Xor 87M 6 41.2µs 41.3µs 41.6µs 43.6µs 2% ▂█▁▁▁▁▁▁▁▁ 565 Xfermode_Xor 87M 8592 560ns 572ns 570ns 581ns 1% ▃▄▁▅▂▇▁█▅█ gpu Xfermode_Xor 87M 6 22.6µs 22.7µs 23µs 25.7µs 4% ▂▁▁▁▁▁▁▁█▂ 8888 Xfermode_DstATop 87M 8 31.4µs 31.5µs 31.6µs 32.1µs 1% ▄▂▂▁█▁▁▂▁▁ 565 Xfermode_DstATop 87M 8910 571ns 584ns 585ns 614ns 2% ▂▄▄█▁▂▂▃▃▃ gpu Xfermode_DstATop 87M 7 21.5µs 21.6µs 21.7µs 22.6µs 2% ▄▂▁▂█▁▁▁▂▁ 8888 Xfermode_SrcATop 87M 8 31.4µs 31.5µs 33µs 42.1µs 10% █▁▁▁▂▁▁▁▁▃ 565 Xfermode_SrcATop 87M 8920 558ns 564ns 565ns 580ns 1% ▃▂▆▃▁█▄▁▁▁ gpu Xfermode_SrcATop 87M 8 6.62µs 6.66µs 6.83µs 8.39µs 8% █▁▁▁▁▁▁▁▁▁ 8888 Xfermode_DstOut 87M 14 14.2µs 14.7µs 14.8µs 15.5µs 3% ▄▄█▅▄▄▃▇▁▁ 565 Xfermode_DstOut 87M 9746 433ns 435ns 507ns 1.14µs 44% ▁▁▁▁▁▁▁▁█▁ gpu Xfermode_DstOut 87M 8 6.29µs 6.34µs 6.37µs 6.63µs 2% █▄▂▂▁▂▂▂▂▁ 8888 Xfermode_SrcOut 87M 14 14.5µs 14.7µs 14.8µs 15.2µs 2% ▃▁▁█▁▁▁▇▆▆ 565 Xfermode_SrcOut 87M 10385 469ns 488ns 486ns 493ns 1% ▁▆█▅▇▆▆▇▇▆ gpu Xfermode_SrcOut 87M 5 16.5µs 16.8µs 17.3µs 21.3µs 9% ▃▂▁▁█▃▁▁▁▁ 8888 Xfermode_DstIn 87M 6 42µs 42.2µs 42.3µs 43.1µs 1% ▆▃▂▁▁▂▂█▂▁ 565 Xfermode_DstIn 87M 9962 488ns 501ns 498ns 510ns 2% ▃▆█▆▁▅▃▁▂▇ gpu Xfermode_DstIn 87M 8 8.17µs 8.25µs 9.21µs 17.7µs 33% ▁▁▁▁▁▁▁▁▁█ 8888 Xfermode_SrcIn 87M 12 17.2µs 17.2µs 17.3µs 17.6µs 1% ▅▄▂▁▁█▂▂▁▃ 565 Xfermode_SrcIn 87M 10292 465ns 478ns 546ns 1.16µs 39% ▁▁▁▁▁▁▁▁█▁ gpu Xfermode_SrcIn 87M 8 8.51µs 8.54µs 8.58µs 8.85µs 1% █▅▂▃▁▁▁▁▂▁ 8888 Xfermode_DstOver 87M 30 283ns 329ns 330ns 373ns 9% █▅▁▆█▂▆▄▄▄ 565 Xfermode_DstOver 87M 8829 560ns 565ns 565ns 568ns 0% ██▇▆▇▅▂▄▁▆ gpu Xfermode_DstOver 87M 8 4.73µs 4.75µs 4.8µs 5.03µs 2% ██▃▂▂▁▁▁▁▁ 8888 Xfermode_SrcOver 87M 21 8.34µs 8.39µs 8.65µs 9.77µs 6% ▁▁▁▁▁▁▁▅█▄ 565 Xfermode_SrcOver 87M 10617 463ns 471ns 575ns 1.49µs 56% ▁▁▁▁▁▁▁▁█▁ gpu Xfermode_SrcOver 87M 23 92ns 95.3ns 96.5ns 109ns 5% █▂▂▃▂▂▂▁▂▄ 8888 Xfermode_Dst 87M 107 80.7ns 80.9ns 81.1ns 82.7ns 1% █▂▁▁▃▃▂▂▂▁ 565 Xfermode_Dst 87M 79754 62.8ns 63.3ns 63.6ns 64.6ns 1% ▇▂▃█▆█▁▃▂▂ gpu Xfermode_Dst 87M 9 4.85µs 4.88µs 4.92µs 5.3µs 3% █▂▁▁▁▂▂▂▁▁ 8888 Xfermode_Src 87M 6 43.1µs 47.1µs 46.9µs 50.5µs 4% ▅▅▆▄█▅▁▄▄▄ 565 Xfermode_Src 87M 5710 453ns 496ns 489ns 524ns 4% █▇▅▆▃▄▃▃▅▁ gpu Xfermode_Src 87M 6 7.99µs 8.06µs 8.16µs 9.05µs 4% █▂▁▁▁▁▂▂▁▁ 8888 Xfermode_Clear 87M 4 25.8µs 28.2µs 40.9µs 63.2µs 46% ████▁▁▁▁▁▁ 565 Xfermode_Clear 87M 8559 563ns 587ns 660ns 1.26µs 32% ▂▁▁█▂▁▁▁▁▁ gpu Xfermode_Clear
On 2014/11/04 18:00:49, reed1 wrote: > that sounds like a good result. if our nanobenches reflect that (inc. our > picture recording pref) then I have no objections. Oh I'll try the recording ones.
On 2014/11/04 18:03:14, danakj wrote: > On 2014/11/04 18:00:49, reed1 wrote: > > that sounds like a good result. if our nanobenches reflect that (inc. our > > picture recording pref) then I have no objections. > > Oh I'll try the recording ones. BEFORE: Timer overhead: 23.3ns maxrss loops min median mean max stddev samples config bench 42M 6 472ns 484ns 523ns 832ns 21% █▂▂▂▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary 43M 15 187ns 195ns 213ns 290ns 18% ▇█▂▂▁▂▁▂▁▂ nonrendering picture_record_unique_paint_dictionary 43M 6 3.24µs 3.31µs 3.69µs 6.84µs 30% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries AFTER: Timer overhead: 23.4ns maxrss loops min median mean max stddev samples config bench 42M 14 126ns 129ns 138ns 221ns 21% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary 43M 16 200ns 211ns 220ns 276ns 13% ██▂▂▂▁▁▁▁▁ nonrendering picture_record_unique_paint_dictionary 43M 6 3.41µs 3.56µs 3.82µs 6.38µs 24% █▂▂▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries These tests seem maybe a bit noisy, those are some big stddev. So I'm not sure how much data we get from these.
On 2014/11/04 18:05:13, danakj wrote: > On 2014/11/04 18:03:14, danakj wrote: > > On 2014/11/04 18:00:49, reed1 wrote: > > > that sounds like a good result. if our nanobenches reflect that (inc. our > > > picture recording pref) then I have no objections. > > > > Oh I'll try the recording ones. > > BEFORE: > > Timer overhead: 23.3ns > maxrss loops min median mean max stddev samples config bench > > 42M 6 472ns 484ns 523ns 832ns 21% █▂▂▂▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary > > 43M 15 187ns 195ns 213ns 290ns 18% ▇█▂▂▁▂▁▂▁▂ nonrendering picture_record_unique_paint_dictionary > > 43M 6 3.24µs 3.31µs 3.69µs 6.84µs 30% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries > > AFTER: > > Timer overhead: 23.4ns > maxrss loops min median mean max stddev samples config bench > > 42M 14 126ns 129ns 138ns 221ns 21% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary > > 43M 16 200ns 211ns 220ns 276ns 13% ██▂▂▂▁▁▁▁▁ nonrendering picture_record_unique_paint_dictionary > > 43M 6 3.41µs 3.56µs 3.82µs 6.38µs 24% █▂▂▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries > > These tests seem maybe a bit noisy, those are some big stddev. So I'm not sure > how much data we get from these. Was this on android? I wonder if it is more or less noisy.
On 2014/11/04 18:07:59, reed1 wrote: > On 2014/11/04 18:05:13, danakj wrote: > > On 2014/11/04 18:03:14, danakj wrote: > > > On 2014/11/04 18:00:49, reed1 wrote: > > > > that sounds like a good result. if our nanobenches reflect that (inc. our > > > > picture recording pref) then I have no objections. > > > > > > Oh I'll try the recording ones. > > > > BEFORE: > > > > Timer overhead: 23.3ns > > maxrss loops min median mean max stddev samples config bench > > > > > 42M 6 472ns 484ns 523ns 832ns 21% █▂▂▂▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary > > > > > 43M 15 187ns 195ns 213ns 290ns 18% ▇█▂▂▁▂▁▂▁▂ nonrendering picture_record_unique_paint_dictionary > > > > > 43M 6 3.24µs 3.31µs 3.69µs 6.84µs 30% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries > > > > AFTER: > > > > Timer overhead: 23.4ns > > maxrss loops min median mean max stddev samples config bench > > > > > 42M 14 126ns 129ns 138ns 221ns 21% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary > > > > > 43M 16 200ns 211ns 220ns 276ns 13% ██▂▂▂▁▁▁▁▁ nonrendering picture_record_unique_paint_dictionary > > > > > 43M 6 3.41µs 3.56µs 3.82µs 6.38µs 24% █▂▂▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries > > > > These tests seem maybe a bit noisy, those are some big stddev. So I'm not sure > > how much data we get from these. > > Was this on android? I wonder if it is more or less noisy. Oh dear, those particular benchmarks are both noisy and not testing anything we do in recording any more. Let's not look at those. I'll send out a CL to delete them. The best way I know to measure real recording cost is to pass some SKPs to nanobench, e.g. out/Release/nanobench --match skp --skps path/to/directory/of/skps. Will do a quick comparison here and post back. An alternative is to just submit and let skiaperf catch the regression.
On 2014/11/04 18:10:37, mtklein wrote: > On 2014/11/04 18:07:59, reed1 wrote: > > On 2014/11/04 18:05:13, danakj wrote: > > > On 2014/11/04 18:03:14, danakj wrote: > > > > On 2014/11/04 18:00:49, reed1 wrote: > > > > > that sounds like a good result. if our nanobenches reflect that (inc. > our > > > > > picture recording pref) then I have no objections. > > > > > > > > Oh I'll try the recording ones. > > > > > > BEFORE: > > > > > > Timer overhead: 23.3ns > > > maxrss loops min median mean max stddev samples config bench > > > > > > > > > 42M 6 472ns 484ns 523ns 832ns 21% █▂▂▂▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary > > > > > > > > > 43M 15 187ns 195ns 213ns 290ns 18% ▇█▂▂▁▂▁▂▁▂ nonrendering picture_record_unique_paint_dictionary > > > > > > > > > 43M 6 3.24µs 3.31µs 3.69µs 6.84µs 30% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries > > > > > > AFTER: > > > > > > Timer overhead: 23.4ns > > > maxrss loops min median mean max stddev samples config bench > > > > > > > > > 42M 14 126ns 129ns 138ns 221ns 21% █▂▁▁▁▁▁▁▁▁ nonrendering picture_record_recurring_paint_dictionary > > > > > > > > > 43M 16 200ns 211ns 220ns 276ns 13% ██▂▂▂▁▁▁▁▁ nonrendering picture_record_unique_paint_dictionary > > > > > > > > > 43M 6 3.41µs 3.56µs 3.82µs 6.38µs 24% █▂▂▁▁▁▁▁▁▁ nonrendering picture_record_dictionaries > > > > > > These tests seem maybe a bit noisy, those are some big stddev. So I'm not > sure > > > how much data we get from these. > > > > Was this on android? I wonder if it is more or less noisy. > > Oh dear, those particular benchmarks are both noisy and not testing anything we > do in recording any more. Let's not look at those. I'll send out a CL to > delete them. > > The best way I know to measure real recording cost is to pass some SKPs to > nanobench, e.g. out/Release/nanobench --match skp --skps > path/to/directory/of/skps. Will do a quick comparison here and post back. > > An alternative is to just submit and let skiaperf catch the regression. Hmm. I'm seeing this as roughly a 10% recording perf hit on my laptop, consistently for both large and small SKPs. 
desk_youtubetvbrowse.skp 7.77us -> 9.04us 1.2x desk_yahoosports.skp 3.15us -> 3.64us 1.2x mobi_wikipedia.skp 584us -> 670us 1.1x desk_espn.skp 209us -> 233us 1.1x desk_samoasvg.skp 346us -> 383us 1.1x desk_booking.skp 540us -> 595us 1.1x desk_yahoogames.skp 24.9us -> 27.5us 1.1x desk_jsfiddlebigcar.skp 26.1us -> 28.8us 1.1x desk_youtube.skp 253us -> 278us 1.1x desk_oldinboxapp.skp 16.8us -> 18.4us 1.1x desk_pokemonwiki.skp 5.37ms -> 5.9ms 1.1x tabl_cnn.skp 476us -> 521us 1.1x tabl_techmeme.skp 95.4us -> 104us 1.1x desk_fontwipe.skp 4.74us -> 5.17us 1.1x desk_forecastio.skp 116us -> 126us 1.1x desk_chalkboard.skp 319us -> 347us 1.1x tabl_mozilla.skp 1.83ms -> 1.98ms 1.1x desk_rectangletransition.skp 10.2us -> 11.1us 1.1x desk_baidu.skp 88.4us -> 95.8us 1.1x desk_sfgate.skp 208us -> 225us 1.1x desk_weather.skp 159us -> 172us 1.1x desk_googlehome.skp 37us -> 40us 1.1x desk_mapsvg.skp 617us -> 667us 1.1x tabl_mlb.skp 141us -> 152us 1.1x desk_mobilenews.skp 295us -> 318us 1.1x tabl_cnet.skp 450us -> 484us 1.1x desk_facebook.skp 316us -> 340us 1.1x tabl_culturalsolutions.skp 254us -> 273us 1.1x tabl_googlecalendar.skp 161us -> 172us 1.1x tabl_androidpolice.skp 703us -> 755us 1.1x desk_googleplus.skp 1.13ms -> 1.22ms 1.1x desk_tigersvg.skp 43.2us -> 46.4us 1.1x desk_jsfiddlehumperclip.skp 28.8us -> 30.9us 1.1x desk_css3gradients.skp 203us -> 217us 1.1x tabl_mercurynews.skp 179us -> 191us 1.1x desk_gws.skp 124us -> 133us 1.1x desk_gmailthread.skp 206us -> 221us 1.1x tabl_sahadan.skp 60.3us -> 64.6us 1.1x tabl_googleblog.skp 364us -> 390us 1.1x desk_carsvg.skp 200us -> 214us 1.1x tabl_digg.skp 387us -> 414us 1.1x tabl_vnexpress.skp 238us -> 254us 1.1x tabl_pravda.skp 169us -> 181us 1.1x desk_yahooanswers.skp 132us -> 141us 1.1x tabl_hsfi.skp 296us -> 316us 1.1x tabl_frantzen.skp 53.2us -> 56.7us 1.1x desk_wowwiki.skp 1.03ms -> 1.1ms 1.1x desk_amazon.skp 107us -> 114us 1.1x tabl_gmail.skp 19.7us -> 20.9us 1.1x desk_googlespreadsheetdashed.skp 1.2ms -> 1.27ms 1.1x tabl_ukwsj.skp 273us -> 290us 1.1x tabl_gamedeksiam.skp 528us -> 562us 1.1x desk_techcrunch.skp 227us -> 241us 1.1x desk_blogger.skp 198us -> 210us 1.1x desk_twitter.skp 292us -> 310us 1.1x tabl_nytimes.skp 209us -> 222us 1.1x tabl_deviantart.skp 131us -> 139us 1.1x desk_linkedin.skp 149us -> 158us 1.1x tabl_slashdot.skp 109us -> 115us 1.1x tabl_engadget.skp 390us -> 411us 1.1x desk_ebay.skp 194us -> 204us 1.1x tabl_nofolo.skp 42.8us -> 44.9us 1x desk_pinterest.skp 40.6us -> 42.5us 1x tabl_worldjournal.skp 465us -> 487us 1x tabl_cuteoverload.skp 590us -> 613us 1x desk_wordpress.skp 372us -> 375us 1x desk_youtubetvvideo.skp 11.7us -> 11us 0.94x These numbers come from running nanobench --match skp --config nonrendering (i.e. loading SKPs and timing how long it takes to rerecord them) with 100 samples of each SKP. The times and ratios you see are the change in minimum sample time: ToT on the left, this CL on the right, with larger ratios indicating slowdown. The tool's omitted any SKPs that don't appear to have significantly different timing distributions. 
For reference, here's the sort of noise floor we'd expect for a no-op, comparing ToT against itself: desk_yahoosports.skp 3.05us -> 3.18us 1x desk_pokemonwiki.skp 5.39ms -> 5.56ms 1x tabl_digg.skp 384us -> 393us 1x desk_samoasvg.skp 346us -> 353us 1x tabl_techmeme.skp 96.2us -> 97.9us 1x tabl_cuteoverload.skp 584us -> 594us 1x tabl_nofolo.skp 43us -> 43.5us 1x desk_facebook.skp 315us -> 318us 1x desk_wowwiki.skp 1.02ms -> 1.04ms 1x tabl_gspro.skp 57.6us -> 58.2us 1x tabl_worldjournal.skp 472us -> 477us 1x tabl_mlb.skp 142us -> 143us 1x tabl_mozilla.skp 1.86ms -> 1.87ms 1x desk_fontwipe.skp 4.85us -> 4.89us 1x tabl_mercurynews.skp 178us -> 180us 1x tabl_slashdot.skp 108us -> 109us 1x tabl_ukwsj.skp 272us -> 274us 1x tabl_googlecalendar.skp 160us -> 161us 1x desk_googlespreadsheetdashed.skp 1.2ms -> 1.21ms 1x desk_carsvg.skp 201us -> 202us 1x desk_tigersvg.skp 43.1us -> 43.3us 1x desk_espn.skp 208us -> 209us 1x mobi_wikipedia.skp 587us -> 587us 1x desk_css3gradients.skp 202us -> 202us 1x tabl_sahadan.skp 61.9us -> 61.9us 1x desk_youtubetvvideo.skp 9.84us -> 9.84us 1x desk_googleplus.skp 1.15ms -> 1.14ms 1x tabl_googleblog.skp 365us -> 364us 1x desk_baidu.skp 88.6us -> 88.2us 1x tabl_deviantart.skp 134us -> 133us 0.99x desk_pinterest.skp 40.7us -> 40.4us 0.99x desk_youtubetvbrowse.skp 9.38us -> 9.22us 0.98x desk_amazon.skp 109us -> 107us 0.98x tabl_pravda.skp 182us -> 171us 0.94x
This is what I see on my desktop: tabl_hsfi.skp 365us -> 480us 1.3x desk_tigersvg.skp 124us -> 163us 1.3x desk_booking.skp 703us -> 907us 1.3x desk_amazon.skp 80.8us -> 104us 1.3x tabl_gamedeksiam.skp 590us -> 732us 1.2x desk_sfgate.skp 483us -> 493us 1x desk_chalkboard.skp 796us -> 813us 1x desk_googlespreadsheetdashed.skp 1.45ms -> 1.47ms 1x tabl_androidpolice.skp 3.93us -> 4us 1x desk_mobilenews.skp 450us -> 456us 1x desk_mapsvg.skp 1.42ms -> 1.43ms 1x desk_wordpress.skp 702us -> 707us 1x desk_wowwiki.skp 1.53ms -> 1.54ms 1x tabl_engadget.skp 586us -> 590us 1x desk_pokemonwiki.skp 7.39ms -> 7.43ms 1x tabl_ukwsj.skp 526us -> 529us 1x tabl_mozilla.skp 2.4ms -> 2.41ms 1x desk_linkedin.skp 290us -> 291us 1x tabl_nytimes.skp 116us -> 116us 1x tabl_googleblog.skp 300us -> 300us 1x desk_yahooanswers.skp 155us -> 155us 1x tabl_pravda.skp 222us -> 221us 1x tabl_googlecalendar.skp 187us -> 187us 1x desk_youtube.skp 501us -> 499us 1x desk_carsvg.skp 420us -> 417us 0.99x tabl_transformice.skp 141us -> 141us 0.99x tabl_sahadan.skp 85.1us -> 84.5us 0.99x tabl_worldjournal.skp 201us -> 199us 0.99x tabl_slashdot.skp 121us -> 120us 0.99x tabl_culturalsolutions.skp 336us -> 332us 0.99x desk_jsfiddlebigcar.skp 32.7us -> 32.2us 0.99x tabl_deviantart.skp 116us -> 114us 0.99x tabl_nofolo.skp 63.8us -> 62.8us 0.99x desk_jsfiddlehumperclip.skp 37.2us -> 36.6us 0.98x tabl_digg.skp 785us -> 772us 0.98x tabl_techmeme.skp 112us -> 110us 0.98x desk_googlehome.skp 47.5us -> 46.5us 0.98x tabl_gmail.skp 16us -> 15.6us 0.98x desk_googleplus.skp 30.7us -> 29.9us 0.97x desk_forecastio.skp 75.8us -> 73.7us 0.97x desk_pinterest.skp 125us -> 122us 0.97x tabl_frantzen.skp 47.5us -> 45.9us 0.97x Maybe we should try on android?
I'm really skeptical of our bench results. The effect is way larger than I'd expect, and running this under Instruments I'm just not seeing any time spent in TRACE_EVENT*, or anywhere near them.
Here's the results from my N4 device: desk_chalkboard.skp 7.83ms -> 8.94ms 1.1x tabl_gamedeksiam.skp 5.71ms -> 6.39ms 1.1x tabl_mozilla.skp 26.7ms -> 29.9ms 1.1x desk_samoasvg.skp 4.62ms -> 5.11ms 1.1x desk_mapsvg.skp 16.2ms -> 17.7ms 1.1x tabl_googleblog.skp 2.17ms -> 2.32ms 1.1x desk_googlespreadsheetdashed.skp 16.7ms -> 17.4ms 1x desk_wordpress.skp 5.33ms -> 5.49ms 1x desk_pokemonwiki.skp 87.3ms -> 89.9ms 1x desk_sfgate.skp 3.57ms -> 3.68ms 1x tabl_ukwsj.skp 3.95ms -> 4.06ms 1x tabl_cnn.skp 1.3ms -> 1.33ms 1x desk_facebook.skp 3.89ms -> 3.97ms 1x tabl_hsfi.skp 3.27ms -> 3.33ms 1x desk_googlespreadsheet.skp 3.65ms -> 3.71ms 1x desk_youtube.skp 3.82ms -> 3.88ms 1x desk_twitter.skp 3.19ms -> 3.25ms 1x desk_mobilenews.skp 3.27ms -> 3.31ms 1x desk_wowwiki.skp 14.9ms -> 15.1ms 1x tabl_nofolo.skp 486us -> 492us 1x tabl_cnet.skp 956us -> 966us 1x tabl_engadget.skp 4.43ms -> 4.47ms 1x desk_espn.skp 1.79ms -> 1.81ms 1x tabl_cuteoverload.skp 3.77ms -> 3.8ms 1x desk_gws.skp 1.3ms -> 1.31ms 1x desk_yahooanswers.skp 1.11ms -> 1.11ms 1x desk_amazon.skp 790us -> 794us 1x desk_ebay.skp 1.51ms -> 1.52ms 1x tabl_frantzen.skp 335us -> 336us 1x tabl_gmail.skp 120us -> 120us 1x desk_silkfinance.skp 456us -> 456us 1x desk_jsfiddlehumperclip.skp 244us -> 244us 1x tabl_techmeme.skp 800us -> 800us 1x desk_googlehome.skp 371us -> 370us 1x tabl_nytimes.skp 804us -> 801us 1x tabl_transformice.skp 1ms -> 998us 1x desk_tigersvg.skp 1.09ms -> 1.09ms 1x desk_jsfiddlebigcar.skp 219us -> 218us 1x desk_booking.skp 7.71ms -> 7.66ms 0.99x tabl_gspro.skp 449us -> 446us 0.99x tabl_slashdot.skp 855us -> 848us 0.99x desk_weather.skp 1.8ms -> 1.78ms 0.99x desk_forecastio.skp 602us -> 595us 0.99x desk_gmailthread.skp 2.06ms -> 2.04ms 0.99x tabl_culturalsolutions.skp 2.65ms -> 2.61ms 0.99x desk_blogger.skp 3.48ms -> 3.43ms 0.99x tabl_googlecalendar.skp 1.35ms -> 1.33ms 0.98x tabl_deviantart.skp 905us -> 891us 0.98x desk_pinterest.skp 992us -> 970us 0.98x desk_baidu.skp 1.34ms -> 1.29ms 0.96x desk_googleplus.skp 231us -> 223us 0.96x tabl_digg.skp 6.76ms -> 6.41ms 0.95x
On 2014/11/04 19:15:42, danakj wrote: > Here's the results from my N4 device: > > desk_chalkboard.skp 7.83ms -> 8.94ms 1.1x > tabl_gamedeksiam.skp 5.71ms -> 6.39ms 1.1x > tabl_mozilla.skp 26.7ms -> 29.9ms 1.1x > desk_samoasvg.skp 4.62ms -> 5.11ms 1.1x > desk_mapsvg.skp 16.2ms -> 17.7ms 1.1x > tabl_googleblog.skp 2.17ms -> 2.32ms 1.1x > desk_googlespreadsheetdashed.skp 16.7ms -> 17.4ms 1x > desk_wordpress.skp 5.33ms -> 5.49ms 1x > desk_pokemonwiki.skp 87.3ms -> 89.9ms 1x > desk_sfgate.skp 3.57ms -> 3.68ms 1x > tabl_ukwsj.skp 3.95ms -> 4.06ms 1x > tabl_cnn.skp 1.3ms -> 1.33ms 1x > desk_facebook.skp 3.89ms -> 3.97ms 1x > tabl_hsfi.skp 3.27ms -> 3.33ms 1x > desk_googlespreadsheet.skp 3.65ms -> 3.71ms 1x > desk_youtube.skp 3.82ms -> 3.88ms 1x > desk_twitter.skp 3.19ms -> 3.25ms 1x > desk_mobilenews.skp 3.27ms -> 3.31ms 1x > desk_wowwiki.skp 14.9ms -> 15.1ms 1x > tabl_nofolo.skp 486us -> 492us 1x > tabl_cnet.skp 956us -> 966us 1x > tabl_engadget.skp 4.43ms -> 4.47ms 1x > desk_espn.skp 1.79ms -> 1.81ms 1x > tabl_cuteoverload.skp 3.77ms -> 3.8ms 1x > desk_gws.skp 1.3ms -> 1.31ms 1x > desk_yahooanswers.skp 1.11ms -> 1.11ms 1x > desk_amazon.skp 790us -> 794us 1x > desk_ebay.skp 1.51ms -> 1.52ms 1x > tabl_frantzen.skp 335us -> 336us 1x > tabl_gmail.skp 120us -> 120us 1x > desk_silkfinance.skp 456us -> 456us 1x > desk_jsfiddlehumperclip.skp 244us -> 244us 1x > tabl_techmeme.skp 800us -> 800us 1x > desk_googlehome.skp 371us -> 370us 1x > tabl_nytimes.skp 804us -> 801us 1x > tabl_transformice.skp 1ms -> 998us 1x > desk_tigersvg.skp 1.09ms -> 1.09ms 1x > desk_jsfiddlebigcar.skp 219us -> 218us 1x > desk_booking.skp 7.71ms -> 7.66ms 0.99x > tabl_gspro.skp 449us -> 446us 0.99x > tabl_slashdot.skp 855us -> 848us 0.99x > desk_weather.skp 1.8ms -> 1.78ms 0.99x > desk_forecastio.skp 602us -> 595us 0.99x > desk_gmailthread.skp 2.06ms -> 2.04ms 0.99x > tabl_culturalsolutions.skp 2.65ms -> 2.61ms 0.99x > desk_blogger.skp 3.48ms -> 3.43ms 0.99x > tabl_googlecalendar.skp 1.35ms -> 1.33ms 0.98x > tabl_deviantart.skp 905us -> 891us 0.98x > desk_pinterest.skp 992us -> 970us 0.98x > desk_baidu.skp 1.34ms -> 1.29ms 0.96x > desk_googleplus.skp 231us -> 223us 0.96x > tabl_digg.skp 6.76ms -> 6.41ms 0.95x That looks as neutral as these things come. Let's try landing this? skiaperf.com/alerts should catch any steps up or down this CL causes. May take a few hours for all the bench results to come in, so I wouldn't bother looking before tomorrow.
mtklein@google.com changed reviewers: + mtklein@google.com
https://codereview.chromium.org/702473004/diff/1/src/core/SkCanvas.cpp File src/core/SkCanvas.cpp (right): https://codereview.chromium.org/702473004/diff/1/src/core/SkCanvas.cpp#newcod... src/core/SkCanvas.cpp:1743: TRACE_EVENT1("skia", "SkCanvas::drawPoints()", "count", count); I had to drop count and make this a TRACE_EVENT0() to get this to compile with Clang on my Mac laptop. It was complaining that this is somehow ambiguous. Might need to cast count to some other data type?
https://codereview.chromium.org/702473004/diff/1/src/core/SkCanvas.cpp File src/core/SkCanvas.cpp (right): https://codereview.chromium.org/702473004/diff/1/src/core/SkCanvas.cpp#newcod... src/core/SkCanvas.cpp:1743: TRACE_EVENT1("skia", "SkCanvas::drawPoints()", "count", count); On 2014/11/04 19:21:37, mtklein wrote: > I had to drop count and make this a TRACE_EVENT0() to get this to compile with > Clang on my Mac laptop. It was complaining that this is somehow ambiguous. > Might need to cast count to some other data type? Hm, okay! Thanks.
On 2014/11/04 19:21:37, mtklein wrote: > https://codereview.chromium.org/702473004/diff/1/src/core/SkCanvas.cpp > File src/core/SkCanvas.cpp (right): > > https://codereview.chromium.org/702473004/diff/1/src/core/SkCanvas.cpp#newcod... > src/core/SkCanvas.cpp:1743: TRACE_EVENT1("skia", "SkCanvas::drawPoints()", > "count", count); > I had to drop count and make this a TRACE_EVENT0() to get this to compile with > Clang on my Mac laptop. It was complaining that this is somehow ambiguous. > Might need to cast count to some other data type? For what it's worth, I've been disassembling and Instruments'ing nanobench picture recording. The TRACE_EVENT macros do seem to add a tiny bit of overhead to the methods they're called from. My exemplar was drawPosTextH(), because it's very common and all it does is call (virtual) onDrawPosTextH(). Before the CL that virtual call was ~100% of the time spent _in that method_, and now it's ~95%, with 5% of the time spent in that method devoted to checking if the trace is active and managing the newly-required stack space. Keep in mind that the time spent in the method itself is something like 0.5% of the overall runtime, and that's the most expensive SkCanvas method I see.
PTAL, hopefully this fixes mac.
On 2014/11/04 19:36:17, danakj wrote: > PTAL hopefully this fixes mac Looks like it did.
On 2014/11/04 19:37:47, danakj wrote: > On 2014/11/04 19:36:17, danakj wrote: > > PTAL hopefully this fixes mac > > Looks like it did. Yeah. Weird. Aren't size_t and uint64_t the same type on 64 bit? Or is one UL, one ULL? Anyway, this lgtm. Let's land it and let the fleet of perf bots see what's what. Any remaining dissenters, please direct your wrath towards me!
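One way to keep the count argument while sidestepping the overload ambiguity would be to cast it to a fixed-width type; this is only a suggestion, not necessarily what patch set 2 does:

    // Hypothetical alternative to dropping the argument: on platforms where
    // size_t is unsigned long, it may match neither the unsigned int nor the
    // unsigned long long overloads exactly, so cast explicitly.
    TRACE_EVENT1("skia", "SkCanvas::drawPoints()",
                 "count", static_cast<uint64_t>(count));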
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/702473004/20001
Note for Reviewers: The CQ is waiting for an approval. If you believe that the CL is not ready yet, or if you would like to L-G-T-M with comments then please uncheck the CQ checkbox. Waiting for LGTM from valid reviewer(s) till 2014-11-05 01:40 UTC
mtklein@google.com changed required reviewers: - reed@google.com
The CQ bit was unchecked by mtklein@google.com
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/702473004/20001
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as 8f757f540a8378c7b1354aab3d4650eaa920b17a