|
|
Created:
7 years, 5 months ago by f(malita) Modified:
7 years, 4 months ago CC:
chromium-reviews, extensions-reviews_chromium.org, jam, apatrick_chromium, joi+watch-content_chromium.org, darin-cc_chromium.org, chromium-apps-reviews_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
Description[SkiaBenchmarkingExtension] Add draw command timing info.
Extend skiaBenchmarking.getOps() to also report per-op timing information (microseconds, as measured while drawing to a bitmap canvas).
The CL introduces a new SkiaBenchmarkingCanvas abstraction, which is responsible for extracting the op list and gathering render timings. This is accomplished by multiplexing the draw commands onto two internal canvases:
* an SkDebugCanvas - records command info and tracks the current command index.
* an SkiaTimingCanvas - instrumented canvas (records timing information while drawing to a backing bitmap canvas).
Since SkiaTimingCanvas relies on SkDebugCanvas for tracking the current command index, we do not have to worry about timing indices getting out of sync due to missing SkCanvas method overrides, and can selectively instrument only methods of interest. This insulates skiaBenchmarking from SkCanvas API changes.
R=nduca@chromium.org, piman@chromium.org, senorblanco@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=214404
Patch Set 1 #Patch Set 2 : Self review, minor cleanup. #Patch Set 3 : Rebased. #
Total comments: 3
Patch Set 4 : [not for landing] Fixed timing measurement, uploading for pdr testing. #Patch Set 5 : Add dedicated getOpTimings() method, relocate SkiaBenchmarkingCanvas. #Patch Set 6 : Speculative Mac & Win build fix. #
Total comments: 2
Patch Set 7 : Rebased, fixed SkDevice refcounting. #
Total comments: 4
Patch Set 8 : Rebased. #
Messages
Total messages: 32 (0 generated)
This CL depends on https://code.google.com/p/skia/source/detail?r=10095, which hasn't rolled in yet. Will have to retry the jobs after Skia rolls.
On 2013/07/16 19:47:59, Florin Malita wrote: > This CL depends on https://code.google.com/p/skia/source/detail?r=10095, which > hasn't rolled in yet. Will have to retry the jobs after Skia rolls. Did that roll yet? I'm still getting errors in linking this patch: Undefined symbols for architecture i386: "SkProxyCanvas::skew(float, float)", referenced from: vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o "SkProxyCanvas::concat(SkMatrix const&)", referenced from: vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o "SkProxyCanvas::addComment(char const*, char const*)", referenced from: vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o "SkProxyCanvas::endCommentGroup()", referenced from: vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o ... etc
On 2013/07/17 22:26:27, pdr wrote: > On 2013/07/16 19:47:59, Florin Malita wrote: > > This CL depends on https://code.google.com/p/skia/source/detail?r=10095, which > > hasn't rolled in yet. Will have to retry the jobs after Skia rolls. > > Did that roll yet? I'm still getting errors in linking this patch: > Undefined symbols for architecture i386: > "SkProxyCanvas::skew(float, float)", referenced from: > vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o > "SkProxyCanvas::concat(SkMatrix const&)", referenced from: > vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o > "SkProxyCanvas::addComment(char const*, char const*)", referenced from: > vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o > "SkProxyCanvas::endCommentGroup()", referenced from: > vtable for content::SkiaTimingCanvas in content.skia_benchmarking_canvas.o > ... etc Yes, it rolled around noon but the bots were slow to pick it up. You should definitely get it if you sync though.
This looks good to me from a high level. I'll defer to nduca on the real review. https://codereview.chromium.org/19266015/diff/19001/content/renderer/skia_ben... File content/renderer/skia_benchmarking/skia_benchmarking_canvas.cc (right): https://codereview.chromium.org/19266015/diff/19001/content/renderer/skia_ben... content/renderer/skia_benchmarking/skia_benchmarking_canvas.cc:33: // (but timing information will be inaccurate). Why not just bail? https://codereview.chromium.org/19266015/diff/19001/content/renderer/skia_ben... content/renderer/skia_benchmarking/skia_benchmarking_canvas.cc:240: DCHECK_LT(index, static_cast<size_t>(debug_canvas_->getSize())); extra space
I plumbed this into the debugger and I think these numbers are wrong. I'm getting almost every op costing 0microseconds (even big bitmap blits), with the odd op costing 1microsecond. Are we actually drawing in the timing canvas?
On 2013/07/18 23:15:40, pdr wrote: > I plumbed this into the debugger and I think these numbers are wrong. I'm > getting almost every op costing 0microseconds (even big bitmap blits), with the > odd op costing 1microsecond. Are we actually drawing in the timing canvas? Yes, we are. I actually plugged Rasterize on SkiaDebugCanvas at some point to verify. Agreed that it's strange thought, will have to figure what's going on...
lgtm as a baby step, but have you determined the difference between sum(cmd_time) compared to the time taken from 194 to 196? you should obtain that number and return it as well --- this will indicate the correctness of your measurement --- either under or over. might mean that getops should return {ops: ops, time_total: blah} instead of just an array...
hmm though, can we keep the benchmarking extension as a cpp and put the new stuff in src/skia? it seems like thats a better place for skia helper code than our content folder.
On 2013/07/19 05:15:11, nduca wrote: > lgtm as a baby step, but have you determined the difference between > sum(cmd_time) compared to the time taken from 194 to 196? you should obtain that > number and return it as well --- this will indicate the correctness of your > measurement --- either under or over. Problem with that is that it includes the time to draw on the SkDebugCanvas also (SkiaBenchmarkingCanvas delegates to both in lock-step) plus the SkPicture playback overhead, which I expect would skew the numbers. We could avoid the former if we do an extra rendering pass to a plain bitmap canvas, but I don't think we can eliminate the playback overhead. Maybe it's not a problem though, since that's always going to be present in practice - but it won't align with sum(cmd_time). On 2013/07/19 10:21:41, nduca wrote: > hmm though, can we keep the benchmarking extension as a cpp and put the new > stuff in src/skia? it seems like thats a better place for skia helper code than > our content folder. sgtm.
https://codereview.chromium.org/19266015/diff/19001/content/renderer/skia_ben... File content/renderer/skia_benchmarking/skia_benchmarking_canvas.cc (right): https://codereview.chromium.org/19266015/diff/19001/content/renderer/skia_ben... content/renderer/skia_benchmarking/skia_benchmarking_canvas.cc:33: // (but timing information will be inaccurate). On 2013/07/18 20:57:11, pdr wrote: > Why not just bail? Bailing from the constructor is tricky. We could change this to be an init method maybe and return a success code. Since this class is now used for gathering both the op list and timings list, it seemed like a good idea to still return the former even if we cannot get the latter. But I can see why you're concerned that this could be masking bugs and feeding bad timing info so it's probably a good idea to just blow up on failure.
I understand the difficulty, but how do we justify this metric as valid or representative of reality? I like the feature, but I'm not hearing much evidence that its giving our users good numbers....
Meaning, lets get the evidence that the numbers are good, or discuss the limitations etc. :0
On 2013/07/19 18:17:08, nduca wrote: > Meaning, lets get the evidence that the numbers are good, or discuss the > limitations etc. :0 Agreed, I'm not trying to push this patch atm, but instead trying to determine whether the granular timings are accurate and/or useful. I think we can make your cumulative time idea work though, so at the very least we can add that piece of information.
Maybe the thing to do with this patch i nail the api down enough with this as the starter implementation that we can then do more stuff as a followup. One comment on that front is that we've got the timings coming in with the ops request. If you think about it, in the long term, we're gonna need to do many passes on the data in order to get op costs estimates. This implies to me that maybe the getOps and the timing calls should be separate. And, perhaps the timing should allow an index argument so that we can do ops 1-1000, then 1001-2000 etc until we get them all done, without hanging the entire UI...
On 2013/07/19 18:26:59, nduca wrote: > Maybe the thing to do with this patch i nail the api down enough with this as > the starter implementation that we can then do more stuff as a followup. Found the timing bug(s) that were causing incorrect measurements, so at this point the granular measurements appear to be both realistic and useful. We should also add a cumulative figure, with the understanding that it will be larger than the sum of individual ops timings due to picture playback overhead. > One comment on that front is that we've got the timings coming in with the ops > request. If you think about it, in the long term, we're gonna need to do many > passes on the data in order to get op costs estimates. > > This implies to me that maybe the getOps and the timing calls should be > separate. Good point. The reason combining the two seemed like a good idea is that we're drawing to a SkDebugCanvas anyways (for index<->time correlation), so getting full ops info on the timing call is close to free. We could still split the two such that we don't render to a timing canvas when only interested in the commands list thought, if that's a use case worth supporting. > And, perhaps the timing should allow an index argument so that we can > do ops 1-1000, then 1001-2000 etc until we get them all done, without hanging > the entire UI... Unfortunately, I don't think that would work: individual draw commands depend on the full context both in semantics (restore corresponding to a saveLayer does something quite different from a restore for a save(matrix)), and in cost (a previous clip may drastically reduce the area affected by some op), so executing them in isolation/incrementally is going to be inaccurate.
On 2013/07/20 14:55:56, Florin Malita wrote: > On 2013/07/19 18:26:59, nduca wrote: > > And, perhaps the timing should allow an index argument so that we can > > do ops 1-1000, then 1001-2000 etc until we get them all done, without hanging > > the entire UI... > > Unfortunately, I don't think that would work: individual draw commands depend on > the full context both in semantics (restore corresponding to a saveLayer does > something quite different from a restore for a save(matrix)), and in cost (a > previous clip may drastically reduce the area affected by some op), so executing > them in isolation/incrementally is going to be inaccurate. Unless we're introducing a persistent debugging session (an idea I've toyed with before): then a) we don't have to ship the skp64 over ipc and de-serialize it on every call, and b) we can support partial/incremental semantics. But this is a long shot, just thought I'd put it up here :)
I ran some tests comparing the measured cumulative time (sum{cmd_time[i]}) vs. the time to fully draw the picture straight to a bitmap-backed canvas. There's some wild variance on the first run, but after that the numbers are quite stable. Here are some repeated measurements for the main nyt.com picture: cumulative time(us): 6431 6391 6420 6300 6775 6386 6340 6380 6334 6281 atomic paint time(us): 6176 6216 6183 6126 6336 6169 6190 6200 6240 6213 We can probably use the full paint/atomic time vs. cumulative time to validate a measurement (and possibly repeat it if the numbers don't agree), so it's a good idea to return this too - on a dedicated getTimings() call.
<3 Lgtm thank you Florin!
(your points about incrementality are well made. Lets deal with that "someday")
Thanks, Nat! Adding content & Skia owners.
https://codereview.chromium.org/19266015/diff/52001/content/renderer/skia_ben... File content/renderer/skia_benchmarking_extension.cc (right): https://codereview.chromium.org/19266015/diff/52001/content/renderer/skia_ben... content/renderer/skia_benchmarking_extension.cc:251: bounds.width(), bounds.height()))); do we need to unref the SkDevice? With Skia honestly I never know, but I thought objects started at refcount==1 and need to be explicitly unref'd at least once.
https://codereview.chromium.org/19266015/diff/52001/content/renderer/skia_ben... File content/renderer/skia_benchmarking_extension.cc (right): https://codereview.chromium.org/19266015/diff/52001/content/renderer/skia_ben... content/renderer/skia_benchmarking_extension.cc:251: bounds.width(), bounds.height()))); On 2013/07/23 16:29:02, piman wrote: > do we need to unref the SkDevice? Ah, we surely do, nice catch.
https://codereview.chromium.org/19266015/diff/69001/content/renderer/skia_ben... File content/renderer/skia_benchmarking_extension.cc (right): https://codereview.chromium.org/19266015/diff/69001/content/renderer/skia_ben... content/renderer/skia_benchmarking_extension.cc:248: // Measure the total time by drawing straight into a bitmap-backed canvas. Out of curiosity, is this going to work for GPU-backed canvases? It seems like timing individual commands is only going to work for raster-backed ones. I wonder if there's some way we could optionally annotate SkPicture ops for (later) automatic tracing, or something. Anyway not important for this CL, but something to think about. https://codereview.chromium.org/19266015/diff/69001/skia/ext/benchmarking_can... File skia/ext/benchmarking_canvas.h (right): https://codereview.chromium.org/19266015/diff/69001/skia/ext/benchmarking_can... skia/ext/benchmarking_canvas.h:17: class SK_API BenchmarkingCanvas : public SkNWayCanvas { Is it necessary to derive from SkNWayCanvas, or could it be aggregated instead? Is this class actually used directly in place of an SkCanvas? I'm just asking since we've had issues in the past with subclasses of SkCanvas and SkDevice in Chrome that we wish we hadn't derived.
https://codereview.chromium.org/19266015/diff/69001/content/renderer/skia_ben... File content/renderer/skia_benchmarking_extension.cc (right): https://codereview.chromium.org/19266015/diff/69001/content/renderer/skia_ben... content/renderer/skia_benchmarking_extension.cc:248: // Measure the total time by drawing straight into a bitmap-backed canvas. On 2013/07/29 14:43:13, Stephen White wrote: > Out of curiosity, is this going to work for GPU-backed canvases? It seems like > timing individual commands is only going to work for raster-backed ones. Yes, we're timing on raster-only canvases. I don't have a good story for adding GPU support atm (we could try to just create a GPU-backed canvas and attempt to time ops on it, but I suspect that's not going to be accurate). Maybe you have some better idea how that may work, but at this point we just wanted to provide some numbers reflecting the relative cost of the draw commands - and timing raster draw ops seems to work pretty well. https://codereview.chromium.org/19266015/diff/69001/skia/ext/benchmarking_can... File skia/ext/benchmarking_canvas.h (right): https://codereview.chromium.org/19266015/diff/69001/skia/ext/benchmarking_can... skia/ext/benchmarking_canvas.h:17: class SK_API BenchmarkingCanvas : public SkNWayCanvas { On 2013/07/29 14:43:13, Stephen White wrote: > Is it necessary to derive from SkNWayCanvas, or could it be aggregated instead? > Is this class actually used directly in place of an SkCanvas? Yes: we pass this to Picture::Replay() to draw itself onto the canvas (which it turn ends up calling SkPicture::draw(canvas)). We could try to derive directly from SkCanvas, but the problem I'm trying to avoid is being too closely tied to the Skia API: if we miss overriding a virtual in BenchmarkingCanvas (or if Skia adds a new one), things go bad as we stop dispatching all ops and the index<->timing mapping gets out of sync; SkNWayCanvas OTOH is part of Skia and guaranteed to be up to date and dispatch all ops correctly. > I'm just asking > since we've had issues in the past with subclasses of SkCanvas and SkDevice in > Chrome that we wish we hadn't derived. Is there anything it particular that I should be watching out for?
On 2013/07/29 15:43:04, Florin Malita wrote: > https://codereview.chromium.org/19266015/diff/69001/content/renderer/skia_ben... > File content/renderer/skia_benchmarking_extension.cc (right): > > https://codereview.chromium.org/19266015/diff/69001/content/renderer/skia_ben... > content/renderer/skia_benchmarking_extension.cc:248: // Measure the total time > by drawing straight into a bitmap-backed canvas. > On 2013/07/29 14:43:13, Stephen White wrote: > > Out of curiosity, is this going to work for GPU-backed canvases? It seems like > > timing individual commands is only going to work for raster-backed ones. > > Yes, we're timing on raster-only canvases. I don't have a good story for adding > GPU support atm (we could try to just create a GPU-backed canvas and attempt to > time ops on it, but I suspect that's not going to be accurate). Maybe you have > some better idea how that may work, but at this point we just wanted to provide > some numbers reflecting the relative cost of the draw commands - and timing > raster draw ops seems to work pretty well. > > https://codereview.chromium.org/19266015/diff/69001/skia/ext/benchmarking_can... > File skia/ext/benchmarking_canvas.h (right): > > https://codereview.chromium.org/19266015/diff/69001/skia/ext/benchmarking_can... > skia/ext/benchmarking_canvas.h:17: class SK_API BenchmarkingCanvas : public > SkNWayCanvas { > On 2013/07/29 14:43:13, Stephen White wrote: > > Is it necessary to derive from SkNWayCanvas, or could it be aggregated > instead? > > Is this class actually used directly in place of an SkCanvas? > > Yes: we pass this to Picture::Replay() to draw itself onto the canvas (which it > turn ends up calling SkPicture::draw(canvas)). > > We could try to derive directly from SkCanvas, but the problem I'm trying to > avoid is being too closely tied to the Skia API: if we miss overriding a virtual > in BenchmarkingCanvas (or if Skia adds a new one), things go bad as we stop > dispatching all ops and the index<->timing mapping gets out of sync; > SkNWayCanvas OTOH is part of Skia and guaranteed to be up to date and dispatch > all ops correctly. That makes sense. Don't you have the same problem with TimingCanvas, though? (This is not a blocker, BTW, this patch LGTM.) > > I'm just asking > > since we've had issues in the past with subclasses of SkCanvas and SkDevice in > > Chrome that we wish we hadn't derived. > > Is there anything it particular that I should be watching out for? I don't remember all the details; Mike might. It had to do with PlatformCanvas and PlatformDevice and a bunch of stuff in Chrome-land I only vaguely know about. So not very useful feedback, I'm afraid.
I'd like to use ARB_timer_query for the gpu equivalent timing... we've been slowly plumbing it through the stack and its perfect for this. Can we put an assert in the code that causes it to explode if its handed a GPU-based canvas?
Thanks Stephen! On 2013/07/29 16:52:11, Stephen White wrote: > > We could try to derive directly from SkCanvas, but the problem I'm trying to > > avoid is being too closely tied to the Skia API: if we miss overriding a > virtual > > in BenchmarkingCanvas (or if Skia adds a new one), things go bad as we stop > > dispatching all ops and the index<->timing mapping gets out of sync; > > SkNWayCanvas OTOH is part of Skia and guaranteed to be up to date and dispatch > > all ops correctly. > > That makes sense. Don't you have the same problem with TimingCanvas, though? In a much narrower sense, I think. TimingCanvas is responsible for two things: * proxy draw commands through to the backing bitmap canvas - for this, it's really important that we cover all draw commands, otherwise the canvas can end up in a different state and throw measurements off. This is solved by inheriting from SkProxyCanvas and letting it handle draw forwarding: as part of Skia, SkProxyCanvas should be up to date and properly dispatch all commands to the target. * measure and report the execution time for draw commands - this is implemented using instrumented overrides, and uses an external SkDebugCanvas (multiplexed by BenchmarkingCanvas) to track command indices. Index tracking is always correct, since it's done Skia-side only: SkPicture::draw()->SkNWayCanvas(BenchmarkingCanvas)->SkDebugCanvas. Timing info is gathered using instrumented overrides, and it's true that we may miss some draw calls at this level - but the only side effect should be missing timing info for those commands, while the rest of the results are still valid (unlike the other failure scenarios above). So yes, in a sense TimingCanvas still depends on the Skia canvas API but the dependency is much looser: missing some results for not-yet-instrumented commands vs. completely incorrect results.
On 2013/07/29 17:49:35, nduca wrote: > I'd like to use ARB_timer_query for the gpu equivalent timing... we've been > slowly plumbing it through the stack and its perfect for this. Can we put an > assert in the code that causes it to explode if its handed a GPU-based canvas? At this point SkiaBenchmarkingExtension (and helper classes) allocate the canvas internally and always go for the raster variant. I guess we can still add the assert as a precaution for when we start plumbing the GPU stuff.
lgtm
Message was sent while issue was closed.
Committed patchset #8 manually as r214404 (presubmit successful). |