Issue 1876363005: cc: GPU rasterize to an auxiliary surface when using MSAA renderbuffers - Code Review

Chromium Code Reviews

chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out

(317)

My Issues | Starred Open | Closed | All

Issue 1876363005: cc: GPU rasterize to an auxiliary surface when using MSAA renderbuffers

Created:
4 years, 8 months ago by Kimmo Kinnunen

Modified:
4 years, 6 months ago

Reviewers:
reveman, piman, Stephen White, vmiura, enne (OOO)

CC:
chromium-reviews, piman+watch_chromium.org, cc-bugs_chromium.org, bsalomon

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Target Ref:
refs/pending/heads/master

Project:
chromium

Visibility:
Public.

More Reviews

Description

cc: GPU rasterize to an auxiliary surface when using MSAA renderbuffers GPU rasterize to an auxiliary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel

Patch Set 1 #

Patch Set 2 : #

Total comments: 4

Patch Set 3 : rebase #

Patch Set 4 : #

Patch Set 5 : rebase #

Patch Set 6 : rebase #

Created: 4 years, 6 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+138 lines, -18 lines)			Patch
	M	cc/raster/gpu_rasterizer.h	View	1 2 3 4	2 chunks	+19 lines, -2 lines	0 comments	Download
	M	cc/raster/gpu_rasterizer.cc	View	1 2 3 4	3 chunks	+106 lines, -14 lines	0 comments	Download
	M	cc/resources/resource_provider.h	View	1 2 3 4	2 chunks	+2 lines, -1 line	0 comments	Download
	M	gpu/command_buffer/common/capabilities.h	View	1 2 3 4 5	2 chunks	+3 lines, -1 line	0 comments	Download
	M	gpu/command_buffer/common/capabilities.cc	View	1 2 3 4	1 chunk	+2 lines, -0 lines	0 comments	Download
	M	gpu/command_buffer/service/gles2_cmd_decoder.cc	View	1 2 3 4	1 chunk	+4 lines, -0 lines	0 comments	Download
	M	gpu/ipc/common/gpu_command_buffer_traits_multi.h	View	1 2 3 4	1 chunk	+2 lines, -0 lines	0 comments	Download

Depends on Patchset:

Issue 2020013002 Patch 20001

Messages

Total messages: 27 (11 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1876363005/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1876363005/1

4 years, 8 months ago (2016-04-14 06:05:10 UTC) #6

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 8 months ago (2016-04-14 07:22:54 UTC) #8

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1876363005/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1876363005/20001

4 years, 8 months ago (2016-04-14 13:05:37 UTC) #10

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

4 years, 8 months ago (2016-04-14 14:02:02 UTC) #12

piman

I'm not sufficiently fluent in skia to review in depth those parts - +senorblanco. But ...

4 years, 8 months ago (2016-04-14 22:45:03 UTC) #14

Kimmo Kinnunen

Before this patch with multisampled render to texture: Render multisampled to texture After this patch ...

4 years, 8 months ago (2016-04-15 06:17:09 UTC) #15

Before this patch with multisampled render to texture:
Render multisampled to texture
After this patch with multisampled render to texture:
Render multisampled to texture

Before this patch with explicit resolve multisample:
Create multisampled renderbuffers
Render to multisampled renderbuffers
Resolve to target texture
Destroy multisampled renderbuffers

After this patch with explicit resolve multisample:
Render to longlived multisampled renderbuffers
Resolve to target texture (*)

(*) If Skia is smart enough. If it's not, I may try to fix it. At least it does
not seem to have much of a perf penalty compared to the in-Skia approach which
guaranteed the blit.

So if you trace the if statements, there is no change of operation at all if
your hw is multisampled render to texture.
Skia will use it to render "directly" to the texture.

Only change is for the explicit resolve type hw. Here instead of creating the
surface every time for each tile, we create it only once.

On 2016/04/14 22:45:03, piman wrote:
> If I understand correctly, you're doing 2 things here:
> 1- explicitly separating the MSAA backing from the resolved texture, so that
it
> can be reused

> (1) seems good, but is there a reason it couldn't also apply for
> explicit-resolve-type multisampling (which is now standard, i.e. probably more
> commonly supported)? Is it that skia APIs are missing to be able to do that?

I'm a bit confused. This applies _only_ to explicit resolve type of
multisampling, as the commit message tries to convey. There's no use of
separating any backings for multisampled-render-to-texture, as there is no
backings to separate? The only thing you have is the texture, to which you
render, multisampled.

As for the API:
There kind of is one, but it's not efficient, hence this bug. I tried to make it
efficient, but it was rejected, hence this commit. There's also the perspective
that Chromium is creating these transient SkSurfaces -- typically if you want
something to be efficient, you preserve the work done. Even if Chromium did have
persistent per-tile SkSurfaces, there would be a problem, since the render
buffer parts of the surface would not move between surfaces -- you would waste
very much memory. Again, unless you'd "cache" the render buffer parts separately
inside Skia, and let them move between surfaces.

> 2- leverage EXT_multisampled_render_to_texture (implicit resolve) as opposed
to
> GL_EXT_framebuffer_multisample/es3 - type multisampling (explicit resolve),
but
> still force the resolve by drawing the MSAA texture into the tile backing (I
> assume, so that you can reuse the MSAA backing)
> 
> What is the advantage of (2)? Are there platforms that benefit from it that
> don't support explicit resolve? Also, it seems like this could potentially be
> more costly because of the extra blit (if the implicit resolve is *also* a
> blit).

The practical answer is that Skia will use use multisampled-render-to-texture if
it is present. Thus for multisampled-render-to-texture, this patch tries to
change nothing.

If you're asking why I do not change Skia to not use
multisampled-render-to-texture and then not add the condition to Chromium:
It is a good strategy because switching FBOs seems to be costly, especially with
command buffer. With multisampled render to texture, there's less FBO switches.
I guess also, in practice, the "implicit resolve" is very fast on the HW that
support the method of rendering, at least relative to the alternatives on the
same hardware. Also, Skia will store its internal temp multisampled render
targets to the small GPU object cache during its rendering. This makes
multisampled render to texture very beneficial, as any complex scene will start
to trash the GPU object cache with when using explicit resolve MSAA, due to the
render targets being so much bigger.

Stephen White

https://codereview.chromium.org/1876363005/diff/20001/cc/raster/gpu_rasterizer.cc File cc/raster/gpu_rasterizer.cc (right): https://codereview.chromium.org/1876363005/diff/20001/cc/raster/gpu_rasterizer.cc#newcode115 cc/raster/gpu_rasterizer.cc:115: .gpu.multisampled_render_to_texture && Rather than checking for the absence of ...

4 years, 8 months ago (2016-04-15 14:10:54 UTC) #18

Stephen White

On 2016/04/15 06:17:09, Kimmo Kinnunen wrote: > Before this patch with multisampled render to texture: ...

4 years, 8 months ago (2016-04-15 14:15:25 UTC) #19

On 2016/04/15 06:17:09, Kimmo Kinnunen wrote:
> Before this patch with multisampled render to texture:
> Render multisampled to texture
> After this patch with multisampled render to texture:
> Render multisampled to texture
> 
> 
> Before this patch with explicit resolve multisample:
> Create multisampled renderbuffers
> Render to multisampled renderbuffers
> Resolve to target texture
> Destroy multisampled renderbuffers
> 
> After this patch with explicit resolve multisample:
> Render to longlived multisampled renderbuffers
> Resolve to target texture (*)
> 
> (*) If Skia is smart enough. If it's not, I may try to fix it. At least it
does
> not seem to have much of a perf penalty compared to the in-Skia approach which
> guaranteed the blit.

I'm not sure what you mean by this comment. Are you hoping that the draw() from
the multisampled SkSurface to the tile's SkSurface will do the resolve?
I haven't checked, and this could be possible, but doesn't that mean that
we'll have an RGBA8 texture inside the msaa SkSurface that goes unused?

> So if you trace the if statements, there is no change of operation at all if
> your hw is multisampled render to texture.
> Skia will use it to render "directly" to the texture.
> 
> 
> Only change is for the explicit resolve type hw. Here instead of creating the
> surface every time for each tile, we create it only once.

Victor and/or enne should definitely take a look at this, since my cc knowledge 
is incomplete.

I'm not against this idea, since the creation/destruction of renderbuffers is
slow on some drivers. (Note that it wasn't the cause of the catastrophic
slowdowns 
on Intel that I noticed, though.)

How long-lived is the SkSurface? I'm guessing that its contained RenderBuffer 
is counted against Skia's texture budget, but not cc's? This might force Skia to
evict textures it otherwise might not. OTOH, it was probably doing that anyway
when the renderbuffer was created.

(Question for cc folks): Does cc create tiles of different sizes? If so,
creating
a single RenderBuffer might have to be recreated to accommodate tiles of
different sizes.

> I'm a bit confused. This applies _only_ to explicit resolve type of
> multisampling, as the commit message tries to convey. There's no use of
> separating any backings for multisampled-render-to-texture, as there is no
> backings to separate? The only thing you have is the texture, to which you
> render, multisampled.

Perhaps we should make its creation contingent on the presence of
explicit-resolve 
multisampling, rather than the absence of implicit-resolve (if I understand the
if-statements correctly).

> As for the API:
> There kind of is one, but it's not efficient, hence this bug. I tried to make
it
> efficient, but it was rejected, hence this commit. There's also the
perspective
> that Chromium is creating these transient SkSurfaces -- typically if you want
> something to be efficient, you preserve the work done. Even if Chromium did
have
> persistent per-tile SkSurfaces, there would be a problem, since the render
> buffer parts of the surface would not move between surfaces -- you would waste
> very much memory. Again, unless you'd "cache" the render buffer parts
separately
> inside Skia, and let them move between surfaces.

That would definitely be cleaner, and would allow Skia to manage
lifetime/caching
a bit better. but I understand there has been resistance to that from
the Skia folks.

> The practical answer is that Skia will use use multisampled-render-to-texture
if
> it is present. Thus for multisampled-render-to-texture, this patch tries to
> change nothing.
> 
> If you're asking why I do not change Skia to not use
> multisampled-render-to-texture and then not add the condition to Chromium:
> It is a good strategy because switching FBOs seems to be costly, especially
with
> command buffer. With multisampled render to texture, there's less FBO
switches.
> I guess also, in practice, the "implicit resolve" is very fast on the HW that
> support the method of rendering, at least relative to the alternatives on the
> same hardware. Also, Skia will store its internal temp multisampled render
> targets to the small GPU object cache during its rendering. This makes
> multisampled render to texture very beneficial, as any complex scene will
start
> to trash the GPU object cache with when using explicit resolve MSAA, due to
the
> render targets being so much bigger.

multisampled-render-to-texture is very performant on tiling architectures, where
the resolve is done per-tile, and there's no VRAM overhead.

Kimmo Kinnunen

On 2016/04/15 14:15:25, Stephen White wrote: > cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples; > Out of curiosity, why do ...

4 years, 8 months ago (2016-04-18 06:16:37 UTC) #20

On 2016/04/15 14:15:25, Stephen White wrote:
> cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples;
> Out of curiosity, why do we not keep a persistent SkSurface in the
mixed_samples
> case?

Mixed samples mode in Skia always uses only 1 color sample. This means that
there's no renderbuffers used, Skia will just attach the texture to the FBO.
Thus there is no problem with allocating extra renderbuffers, and the operation
resembles the render-to-texture mode.

> On 2016/04/15 06:17:09, Kimmo Kinnunen wrote:
> > (*) If Skia is smart enough. If it's not, I may try to fix it. At least it
> does
> > not seem to have much of a perf penalty compared to the in-Skia approach
which
> > guaranteed the blit.
> 
> I'm not sure what you mean by this comment. Are you hoping that the draw()
from
> the multisampled SkSurface to the tile's SkSurface will do the resolve?

Yes, this is what the comment speculates on.

> I haven't checked, and this could be possible, but doesn't that mean that
> we'll have an RGBA8 texture inside the msaa SkSurface that goes unused?

Well, I guess the clearer way to say that would be that this approach uses one
tiles worth of RGBA8 memory more. It is unused in the best case scenario. Worst
case, and the most probable case, scenario is that Skia is actually using the
extra texture data. Anyhow, this increases the memory consumption by one tile
texture. Sorry, should have been explicit about this in the commit message.

> I'm guessing that its contained RenderBuffer 
> is counted against Skia's texture budget, but not cc's? This might force Skia
to
> evict textures it otherwise might not. OTOH, it was probably doing that anyway
> when the renderbuffer was created.

In this patch, the extra render target is counted in the Skia texture budget in
order to enable recycling of the render target when it is not needed. This does
decrease the Skia budget quite much, by 1 tile * (1 texture sample + # color
samples) (E.g. 5x or 9x size of one tile). I can also make it not do that, if
that is more logical.

> Perhaps we should make its creation contingent on the presence of
> explicit-resolve 
> multisampling, rather than the absence of implicit-resolve (if I understand
the
> if-statements correctly).

Presence of explicit-resolve does not imply absence of implicit-resolve :) Skia
will use implicit resolve on systems that have both.

I think it is clearer this way. Skia has the same condition. After all, this is
just looking inside the abstraction, hardcoding what Skia is expected to do.

Stephen White

On 2016/04/18 06:16:37, Kimmo Kinnunen wrote: > On 2016/04/15 14:15:25, Stephen White wrote: > > ...

4 years, 8 months ago (2016-04-18 15:01:57 UTC) #22

On 2016/04/18 06:16:37, Kimmo Kinnunen wrote:
> On 2016/04/15 14:15:25, Stephen White wrote:
> > cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples;
> > Out of curiosity, why do we not keep a persistent SkSurface in the
> mixed_samples
> > case?
> 
> Mixed samples mode in Skia always uses only 1 color sample. This means that
> there's no renderbuffers used, Skia will just attach the texture to the FBO.
> Thus there is no problem with allocating extra renderbuffers, and the
operation
> resembles the render-to-texture mode.

I must admit I don't know NV_framebuffer_mixed_samples (or its _chromium subset)

that well. So where is the extra VRAM for multiple stencil (etc) samples per
pixel
accounted for? In the texture itself? Or is this functionality truly "free" 
in the multisampled_render_to_texture sense (ie., ephemeral)?

> > On 2016/04/15 06:17:09, Kimmo Kinnunen wrote:
> > > (*) If Skia is smart enough. If it's not, I may try to fix it. At least it
> > does
> > > not seem to have much of a perf penalty compared to the in-Skia approach
> which
> > > guaranteed the blit.
> > 
> > I'm not sure what you mean by this comment. Are you hoping that the draw()
> from
> > the multisampled SkSurface to the tile's SkSurface will do the resolve?
> 
> Yes, this is what the comment speculates on.
> 
> > I haven't checked, and this could be possible, but doesn't that mean that
> > we'll have an RGBA8 texture inside the msaa SkSurface that goes unused?
> 
> Well, I guess the clearer way to say that would be that this approach uses one
> tiles worth of RGBA8 memory more. It is unused in the best case scenario.
Worst
> case, and the most probable case, scenario is that Skia is actually using the
> extra texture data. Anyhow, this increases the memory consumption by one tile
> texture. Sorry, should have been explicit about this in the commit message.
> 
> > I'm guessing that its contained RenderBuffer 
> > is counted against Skia's texture budget, but not cc's? This might force
Skia
> to
> > evict textures it otherwise might not. OTOH, it was probably doing that
anyway
> > when the renderbuffer was created.
> 
> In this patch, the extra render target is counted in the Skia texture budget
in
> order to enable recycling of the render target when it is not needed. This
does
> decrease the Skia budget quite much, by 1 tile * (1 texture sample + # color
> samples) (E.g. 5x or 9x size of one tile). I can also make it not do that, if
> that is more logical.

When you say "I can also make it not do that", do you mean "it" = blink, or "it"
= Skia?

> > Perhaps we should make its creation contingent on the presence of
> > explicit-resolve 
> > multisampling, rather than the absence of implicit-resolve (if I understand
> the
> > if-statements correctly).
> 
> Presence of explicit-resolve does not imply absence of implicit-resolve :)
Skia
> will use implicit resolve on systems that have both.
> 
> I think it is clearer this way. Skia has the same condition. After all, this
is
> just looking inside the abstraction, hardcoding what Skia is expected to do.

This is another argument for pushing this functionality into Skia, instead of
duplicating its conditions here, which may be fragile going forward.

Is there a specific benchmark on which this change improves performance? Is it
checked 
into Telemetry? Which platforms does it improve? My inclination is that the
extra 
complexity introduced by this patch must be compensated for by a corresponding 
demonstrated (and tracked) improvement in performance, either on a single 
widely-used platform, or multiple platforms.

Kimmo Kinnunen

On 2016/04/18 15:01:57, Stephen White wrote: > On 2016/04/18 06:16:37, Kimmo Kinnunen wrote: > > ...

4 years, 8 months ago (2016-04-18 19:23:41 UTC) #23

On 2016/04/18 15:01:57, Stephen White wrote:
> On 2016/04/18 06:16:37, Kimmo Kinnunen wrote:
> > On 2016/04/15 14:15:25, Stephen White wrote:
> > > cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples;
> > > Out of curiosity, why do we not keep a persistent SkSurface in the
> > mixed_samples
> > > case?
> > 
> > Mixed samples mode in Skia always uses only 1 color sample. This means that
> > there's no renderbuffers used, Skia will just attach the texture to the FBO.
> > Thus there is no problem with allocating extra renderbuffers, and the
> operation
> > resembles the render-to-texture mode.
> 
> I must admit I don't know NV_framebuffer_mixed_samples (or its _chromium
subset)
> that well. So where is the extra VRAM for multiple stencil (etc) samples per
> pixel
> accounted for? In the texture itself? Or is this functionality truly "free" 
> in the multisampled_render_to_texture sense (ie., ephemeral)?

Not truely free, it needs the multisampled stencil buffer. 
More specifically:
NV_framebuffer_mixed_samples just allows "evaluate coverage at a higher
frequency than color samples are stored".
When NV_framebuffer_mixed_samples and NV_path_rendering are present, Skia will
use 1 color sample and 4/8 stencil samples.

The stencil buffers will use render buffers but are recycled by Skia when the
render target is destroyed, so they are not that slow. Or, the current behavior
of creating FBOs for each tile is not slow compared to other command buffer / CC
interaction overhead, at the moment. 

> > In this patch, the extra render target is counted in the Skia texture budget
> in
> > order to enable recycling of the render target when it is not needed. This
> does
> > decrease the Skia budget quite much, by 1 tile * (1 texture sample + # color
> > samples) (E.g. 5x or 9x size of one tile). I can also make it not do that,
if
> > that is more logical.
> 
> When you say "I can also make it not do that", do you mean "it" = blink, or
"it"
> = Skia?

I can change this patch to allocate the render targets with "unbudgeted" Skia
caching policy. I can add any needed CC callbacks for accounting the memory held
in GpuRasterizer, so that the memory appears in CC budget. 
I'll do this if it is more logical than accounting this memory in Skia.

> > I think it is clearer this way. Skia has the same condition. After all, this
> is
> > just looking inside the abstraction, hardcoding what Skia is expected to do.
> 
> This is another argument for pushing this functionality into Skia, instead of
> duplicating its conditions here, which may be fragile going forward.

I would be first to agree. 

> Is there a specific benchmark on which this change improves performance? Is it
> checked into Telemetry?

If you share your typical command-line to test the MSAA code, I'm sure I can
report if it improves. I'm pretty sure it will.

I've been using manual testing, mostly by repainting svg content continuously.
I'm partly using the few .html files animating the tiger head from one of your
issues.
Sorry, I haven't been able to invest time into digging into telemetry so that
I'd know the cases used to test these specifically. I'm working on that, though.

> Which platforms does it improve?

I'm guessing it improves every platform that uses explicit resolve and has MSAA
not blacklisted. I have not been able to test any other than NV platforms,
tough.

I'm do not know  if any such platform counts for "widely used"? If I understand
correctly, most widely sold high-end Android mobile chips are such that they
have multisampled-render-to-texture?

For NV chips without mixed samples, it reduces frame rendering time by 10-20%.
One such a device is Nexus 9.

> My inclination is that the extra 
> complexity introduced by this patch must be compensated for by a corresponding

> demonstrated (and tracked) improvement in performance, either on a single 
> widely-used platform, or multiple platforms.

I must say compared to lots of other code in the system, to me it would seem
this be quite simple patch.

Anyhow, the initial intention was to entice reveman (or any CC person) to
comment on how he wants the code to be structured... If you think there's room
for simplification or an alternative approach, I can also do that. It's not as
if this would be the first declined approach.

Kimmo Kinnunen

On 2016/04/18 15:01:57, Stephen White wrote: >My inclination is that the > extra > complexity ...

4 years, 8 months ago (2016-04-18 19:31:01 UTC) #24

Stephen White

On 2016/04/18 19:31:01, Kimmo Kinnunen wrote: > On 2016/04/18 15:01:57, Stephen White wrote: > >My ...

4 years, 8 months ago (2016-04-18 22:11:14 UTC) #25

Kimmo Kinnunen

The patch did not improve chalkboard avg_surface_fps, which remains at stable 19 for the device ...

4 years, 8 months ago (2016-04-21 13:47:45 UTC) #26

Stephen White

4 years, 8 months ago (2016-04-21 13:50:37 UTC) #27

On 2016/04/21 13:47:45, Kimmo Kinnunen wrote:
> The patch did not improve chalkboard avg_surface_fps, which remains at stable
19
> for the device I tested.
> It did progres and regress some other measurements, but I don't know which of
> these are useful (stable and meaningful).
> The numbers did not improve as much as with the testcases that I had.
> I uploaded a telemetry change for discussion.

Great, thanks!

> I didn't manage to do a successful tryjob with nexus9, and my local nexus9 is
> not rooted (I don't know if userdebug images are available?). I tested with
> similar device (shield tablet 8).
> 
> Profiling the gpu process, it appears as if with this patch, the gpu process
is
> consuming way less cpu time than without it on chalkboard. It does not
translate
> to FPS, though.

Presumably this would also translate into lower power usage? Telemetry has some
recent additions to measure power, but I'm not sure if we get numbers for
Android
I think it might just be the "battor" device on Mac.

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

Powered by Google App Engine

This is Rietveld 408576698