|
|
Created:
4 years, 8 months ago by Kimmo Kinnunen Modified:
4 years, 6 months ago CC:
chromium-reviews, piman+watch_chromium.org, cc-bugs_chromium.org, bsalomon Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Descriptioncc: GPU rasterize to an auxiliary surface when using MSAA renderbuffers
GPU rasterize to an auxiliary MSAA surface when rendering with MSAA
hardware that uses renderbuffers.
Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which
are time-consuming.
Instead, allocate the MSAA SkSurface once, and then blit the rendering
results to the tile.
Improves GPU rasterization performance when MSAA backend is used.
For example, the tiger svg changes from ~40 to ~60 updates per second
on certain Android hardware.
BUG=594928
CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel
Patch Set 1 #Patch Set 2 : #
Total comments: 4
Patch Set 3 : rebase #Patch Set 4 : #Patch Set 5 : rebase #Patch Set 6 : rebase #
Depends on Patchset: Messages
Total messages: 27 (11 generated)
Description was changed from ========== cc: GPU rasterize to an auxilary surface when using MSAA renderbuffers GPU rasterize to an auxilary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 ========== to ========== cc: GPU rasterize to an auxilary surface when using MSAA renderbuffers GPU rasterize to an auxilary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel ==========
Description was changed from ========== cc: GPU rasterize to an auxilary surface when using MSAA renderbuffers GPU rasterize to an auxilary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== cc: GPU rasterize to an auxilary surface when using MSAA renderbuffers GPU rasterize to an auxilary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ==========
kkinnunen@nvidia.com changed reviewers: + piman@chromium.org, reveman@chromium.org
The CQ bit was checked by kkinnunen@nvidia.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1876363005/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1876363005/1
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by kkinnunen@nvidia.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1876363005/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1876363005/20001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
piman@chromium.org changed reviewers: + senorblanco@chromium.org
I'm not sufficiently fluent in skia to review in depth those parts - +senorblanco. But I have a couple of high-level questions. If I understand correctly, you're doing 2 things here: 1- explicitly separating the MSAA backing from the resolved texture, so that it can be reused 2- leverage EXT_multisampled_render_to_texture (implicit resolve) as opposed to GL_EXT_framebuffer_multisample/es3 - type multisampling (explicit resolve), but still force the resolve by drawing the MSAA texture into the tile backing (I assume, so that you can reuse the MSAA backing) (1) seems good, but is there a reason it couldn't also apply for explicit-resolve-type multisampling (which is now standard, i.e. probably more commonly supported)? Is it that skia APIs are missing to be able to do that? What is the advantage of (2)? Are there platforms that benefit from it that don't support explicit resolve? Also, it seems like this could potentially be more costly because of the extra blit (if the implicit resolve is *also* a blit). https://codereview.chromium.org/1876363005/diff/20001/gpu/command_buffer/comm... File gpu/command_buffer/common/capabilities.h (right): https://codereview.chromium.org/1876363005/diff/20001/gpu/command_buffer/comm... gpu/command_buffer/common/capabilities.h:148: bool chromium_framebuffer_mixed_samples; For these to work (across processes), you also need to add them to gpu/ipc/common/gpu_command_buffer_traits_multi.h
Before this patch with multisampled render to texture: Render multisampled to texture After this patch with multisampled render to texture: Render multisampled to texture Before this patch with explicit resolve multisample: Create multisampled renderbuffers Render to multisampled renderbuffers Resolve to target texture Destroy multisampled renderbuffers After this patch with explicit resolve multisample: Render to longlived multisampled renderbuffers Resolve to target texture (*) (*) If Skia is smart enough. If it's not, I may try to fix it. At least it does not seem to have much of a perf penalty compared to the in-Skia approach which guaranteed the blit. So if you trace the if statements, there is no change of operation at all if your hw is multisampled render to texture. Skia will use it to render "directly" to the texture. Only change is for the explicit resolve type hw. Here instead of creating the surface every time for each tile, we create it only once. On 2016/04/14 22:45:03, piman wrote: > If I understand correctly, you're doing 2 things here: > 1- explicitly separating the MSAA backing from the resolved texture, so that it > can be reused > (1) seems good, but is there a reason it couldn't also apply for > explicit-resolve-type multisampling (which is now standard, i.e. probably more > commonly supported)? Is it that skia APIs are missing to be able to do that? I'm a bit confused. This applies _only_ to explicit resolve type of multisampling, as the commit message tries to convey. There's no use of separating any backings for multisampled-render-to-texture, as there is no backings to separate? The only thing you have is the texture, to which you render, multisampled. As for the API: There kind of is one, but it's not efficient, hence this bug. I tried to make it efficient, but it was rejected, hence this commit. There's also the perspective that Chromium is creating these transient SkSurfaces -- typically if you want something to be efficient, you preserve the work done. Even if Chromium did have persistent per-tile SkSurfaces, there would be a problem, since the render buffer parts of the surface would not move between surfaces -- you would waste very much memory. Again, unless you'd "cache" the render buffer parts separately inside Skia, and let them move between surfaces. > 2- leverage EXT_multisampled_render_to_texture (implicit resolve) as opposed to > GL_EXT_framebuffer_multisample/es3 - type multisampling (explicit resolve), but > still force the resolve by drawing the MSAA texture into the tile backing (I > assume, so that you can reuse the MSAA backing) > > What is the advantage of (2)? Are there platforms that benefit from it that > don't support explicit resolve? Also, it seems like this could potentially be > more costly because of the extra blit (if the implicit resolve is *also* a > blit). The practical answer is that Skia will use use multisampled-render-to-texture if it is present. Thus for multisampled-render-to-texture, this patch tries to change nothing. If you're asking why I do not change Skia to not use multisampled-render-to-texture and then not add the condition to Chromium: It is a good strategy because switching FBOs seems to be costly, especially with command buffer. With multisampled render to texture, there's less FBO switches. I guess also, in practice, the "implicit resolve" is very fast on the HW that support the method of rendering, at least relative to the alternatives on the same hardware. Also, Skia will store its internal temp multisampled render targets to the small GPU object cache during its rendering. This makes multisampled render to texture very beneficial, as any complex scene will start to trash the GPU object cache with when using explicit resolve MSAA, due to the render targets being so much bigger.
Description was changed from ========== cc: GPU rasterize to an auxilary surface when using MSAA renderbuffers GPU rasterize to an auxilary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ========== to ========== cc: GPU rasterize to an auxilary surface when using MSAA renderbuffers GPU rasterize to an auxilary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ==========
senorblanco@chromium.org changed reviewers: + enne@chromium.org, vmiura@chromium.org
https://codereview.chromium.org/1876363005/diff/20001/cc/raster/gpu_rasterize... File cc/raster/gpu_rasterizer.cc (right): https://codereview.chromium.org/1876363005/diff/20001/cc/raster/gpu_rasterize... cc/raster/gpu_rasterizer.cc:115: .gpu.multisampled_render_to_texture && Rather than checking for the absence of multisampled_render_to_texture, could we check for the presence of explicit-resolve MSAA? That might be clearer. https://codereview.chromium.org/1876363005/diff/20001/cc/raster/gpu_rasterize... cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples; Out of curiosity, why do we not keep a persistent SkSurface in the mixed_samples case? https://codereview.chromium.org/1876363005/diff/20001/cc/raster/gpu_rasterize... cc/raster/gpu_rasterizer.cc:148: msaa_surface_->draw(target_surface->getCanvas(), 0, 0, nullptr); Is this the extra blit you're talking about? So we've got draw-to-renderbuffer, resolve-to-texture (inside Skia, on msaa_surface), then draw-to-tile-texture? (here) Or you're hoping that Skia does the multisample resolve at the same time as this blit? If that were true, we're keeping around an extra tile-sized RGBA texture that's unused, if I understand correctly.
On 2016/04/15 06:17:09, Kimmo Kinnunen wrote: > Before this patch with multisampled render to texture: > Render multisampled to texture > After this patch with multisampled render to texture: > Render multisampled to texture > > > Before this patch with explicit resolve multisample: > Create multisampled renderbuffers > Render to multisampled renderbuffers > Resolve to target texture > Destroy multisampled renderbuffers > > After this patch with explicit resolve multisample: > Render to longlived multisampled renderbuffers > Resolve to target texture (*) > > (*) If Skia is smart enough. If it's not, I may try to fix it. At least it does > not seem to have much of a perf penalty compared to the in-Skia approach which > guaranteed the blit. I'm not sure what you mean by this comment. Are you hoping that the draw() from the multisampled SkSurface to the tile's SkSurface will do the resolve? I haven't checked, and this could be possible, but doesn't that mean that we'll have an RGBA8 texture inside the msaa SkSurface that goes unused? > So if you trace the if statements, there is no change of operation at all if > your hw is multisampled render to texture. > Skia will use it to render "directly" to the texture. > > > Only change is for the explicit resolve type hw. Here instead of creating the > surface every time for each tile, we create it only once. Victor and/or enne should definitely take a look at this, since my cc knowledge is incomplete. I'm not against this idea, since the creation/destruction of renderbuffers is slow on some drivers. (Note that it wasn't the cause of the catastrophic slowdowns on Intel that I noticed, though.) How long-lived is the SkSurface? I'm guessing that its contained RenderBuffer is counted against Skia's texture budget, but not cc's? This might force Skia to evict textures it otherwise might not. OTOH, it was probably doing that anyway when the renderbuffer was created. (Question for cc folks): Does cc create tiles of different sizes? If so, creating a single RenderBuffer might have to be recreated to accommodate tiles of different sizes. > I'm a bit confused. This applies _only_ to explicit resolve type of > multisampling, as the commit message tries to convey. There's no use of > separating any backings for multisampled-render-to-texture, as there is no > backings to separate? The only thing you have is the texture, to which you > render, multisampled. Perhaps we should make its creation contingent on the presence of explicit-resolve multisampling, rather than the absence of implicit-resolve (if I understand the if-statements correctly). > As for the API: > There kind of is one, but it's not efficient, hence this bug. I tried to make it > efficient, but it was rejected, hence this commit. There's also the perspective > that Chromium is creating these transient SkSurfaces -- typically if you want > something to be efficient, you preserve the work done. Even if Chromium did have > persistent per-tile SkSurfaces, there would be a problem, since the render > buffer parts of the surface would not move between surfaces -- you would waste > very much memory. Again, unless you'd "cache" the render buffer parts separately > inside Skia, and let them move between surfaces. That would definitely be cleaner, and would allow Skia to manage lifetime/caching a bit better. but I understand there has been resistance to that from the Skia folks. > The practical answer is that Skia will use use multisampled-render-to-texture if > it is present. Thus for multisampled-render-to-texture, this patch tries to > change nothing. > > If you're asking why I do not change Skia to not use > multisampled-render-to-texture and then not add the condition to Chromium: > It is a good strategy because switching FBOs seems to be costly, especially with > command buffer. With multisampled render to texture, there's less FBO switches. > I guess also, in practice, the "implicit resolve" is very fast on the HW that > support the method of rendering, at least relative to the alternatives on the > same hardware. Also, Skia will store its internal temp multisampled render > targets to the small GPU object cache during its rendering. This makes > multisampled render to texture very beneficial, as any complex scene will start > to trash the GPU object cache with when using explicit resolve MSAA, due to the > render targets being so much bigger. multisampled-render-to-texture is very performant on tiling architectures, where the resolve is done per-tile, and there's no VRAM overhead.
On 2016/04/15 14:15:25, Stephen White wrote: > cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples; > Out of curiosity, why do we not keep a persistent SkSurface in the mixed_samples > case? Mixed samples mode in Skia always uses only 1 color sample. This means that there's no renderbuffers used, Skia will just attach the texture to the FBO. Thus there is no problem with allocating extra renderbuffers, and the operation resembles the render-to-texture mode. > On 2016/04/15 06:17:09, Kimmo Kinnunen wrote: > > (*) If Skia is smart enough. If it's not, I may try to fix it. At least it > does > > not seem to have much of a perf penalty compared to the in-Skia approach which > > guaranteed the blit. > > I'm not sure what you mean by this comment. Are you hoping that the draw() from > the multisampled SkSurface to the tile's SkSurface will do the resolve? Yes, this is what the comment speculates on. > I haven't checked, and this could be possible, but doesn't that mean that > we'll have an RGBA8 texture inside the msaa SkSurface that goes unused? Well, I guess the clearer way to say that would be that this approach uses one tiles worth of RGBA8 memory more. It is unused in the best case scenario. Worst case, and the most probable case, scenario is that Skia is actually using the extra texture data. Anyhow, this increases the memory consumption by one tile texture. Sorry, should have been explicit about this in the commit message. > I'm guessing that its contained RenderBuffer > is counted against Skia's texture budget, but not cc's? This might force Skia to > evict textures it otherwise might not. OTOH, it was probably doing that anyway > when the renderbuffer was created. In this patch, the extra render target is counted in the Skia texture budget in order to enable recycling of the render target when it is not needed. This does decrease the Skia budget quite much, by 1 tile * (1 texture sample + # color samples) (E.g. 5x or 9x size of one tile). I can also make it not do that, if that is more logical. > Perhaps we should make its creation contingent on the presence of > explicit-resolve > multisampling, rather than the absence of implicit-resolve (if I understand the > if-statements correctly). Presence of explicit-resolve does not imply absence of implicit-resolve :) Skia will use implicit resolve on systems that have both. I think it is clearer this way. Skia has the same condition. After all, this is just looking inside the abstraction, hardcoding what Skia is expected to do.
Description was changed from ========== cc: GPU rasterize to an auxilary surface when using MSAA renderbuffers GPU rasterize to an auxilary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ========== to ========== cc: GPU rasterize to an auxiliary surface when using MSAA renderbuffers GPU rasterize to an auxiliary MSAA surface when rendering with MSAA hardware that uses renderbuffers. Creating MSAA SkSurfaces per tile causes renderbuffer allocations, which are time-consuming. Instead, allocate the MSAA SkSurface once, and then blit the rendering results to the tile. Improves GPU rasterization performance when MSAA backend is used. For example, the tiger svg changes from ~40 to ~60 updates per second on certain Android hardware. BUG=594928 CQ_INCLUDE_TRYBOTS=tryserver.blink:linux_blink_rel ==========
On 2016/04/18 06:16:37, Kimmo Kinnunen wrote: > On 2016/04/15 14:15:25, Stephen White wrote: > > cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples; > > Out of curiosity, why do we not keep a persistent SkSurface in the > mixed_samples > > case? > > Mixed samples mode in Skia always uses only 1 color sample. This means that > there's no renderbuffers used, Skia will just attach the texture to the FBO. > Thus there is no problem with allocating extra renderbuffers, and the operation > resembles the render-to-texture mode. I must admit I don't know NV_framebuffer_mixed_samples (or its _chromium subset) that well. So where is the extra VRAM for multiple stencil (etc) samples per pixel accounted for? In the texture itself? Or is this functionality truly "free" in the multisampled_render_to_texture sense (ie., ephemeral)? > > On 2016/04/15 06:17:09, Kimmo Kinnunen wrote: > > > (*) If Skia is smart enough. If it's not, I may try to fix it. At least it > > does > > > not seem to have much of a perf penalty compared to the in-Skia approach > which > > > guaranteed the blit. > > > > I'm not sure what you mean by this comment. Are you hoping that the draw() > from > > the multisampled SkSurface to the tile's SkSurface will do the resolve? > > Yes, this is what the comment speculates on. > > > I haven't checked, and this could be possible, but doesn't that mean that > > we'll have an RGBA8 texture inside the msaa SkSurface that goes unused? > > Well, I guess the clearer way to say that would be that this approach uses one > tiles worth of RGBA8 memory more. It is unused in the best case scenario. Worst > case, and the most probable case, scenario is that Skia is actually using the > extra texture data. Anyhow, this increases the memory consumption by one tile > texture. Sorry, should have been explicit about this in the commit message. > > > I'm guessing that its contained RenderBuffer > > is counted against Skia's texture budget, but not cc's? This might force Skia > to > > evict textures it otherwise might not. OTOH, it was probably doing that anyway > > when the renderbuffer was created. > > In this patch, the extra render target is counted in the Skia texture budget in > order to enable recycling of the render target when it is not needed. This does > decrease the Skia budget quite much, by 1 tile * (1 texture sample + # color > samples) (E.g. 5x or 9x size of one tile). I can also make it not do that, if > that is more logical. When you say "I can also make it not do that", do you mean "it" = blink, or "it" = Skia? > > Perhaps we should make its creation contingent on the presence of > > explicit-resolve > > multisampling, rather than the absence of implicit-resolve (if I understand > the > > if-statements correctly). > > Presence of explicit-resolve does not imply absence of implicit-resolve :) Skia > will use implicit resolve on systems that have both. > > I think it is clearer this way. Skia has the same condition. After all, this is > just looking inside the abstraction, hardcoding what Skia is expected to do. This is another argument for pushing this functionality into Skia, instead of duplicating its conditions here, which may be fragile going forward. Is there a specific benchmark on which this change improves performance? Is it checked into Telemetry? Which platforms does it improve? My inclination is that the extra complexity introduced by this patch must be compensated for by a corresponding demonstrated (and tracked) improvement in performance, either on a single widely-used platform, or multiple platforms.
On 2016/04/18 15:01:57, Stephen White wrote: > On 2016/04/18 06:16:37, Kimmo Kinnunen wrote: > > On 2016/04/15 14:15:25, Stephen White wrote: > > > cc/raster/gpu_rasterizer.cc:117: .gpu.chromium_framebuffer_mixed_samples; > > > Out of curiosity, why do we not keep a persistent SkSurface in the > > mixed_samples > > > case? > > > > Mixed samples mode in Skia always uses only 1 color sample. This means that > > there's no renderbuffers used, Skia will just attach the texture to the FBO. > > Thus there is no problem with allocating extra renderbuffers, and the > operation > > resembles the render-to-texture mode. > > I must admit I don't know NV_framebuffer_mixed_samples (or its _chromium subset) > that well. So where is the extra VRAM for multiple stencil (etc) samples per > pixel > accounted for? In the texture itself? Or is this functionality truly "free" > in the multisampled_render_to_texture sense (ie., ephemeral)? Not truely free, it needs the multisampled stencil buffer. More specifically: NV_framebuffer_mixed_samples just allows "evaluate coverage at a higher frequency than color samples are stored". When NV_framebuffer_mixed_samples and NV_path_rendering are present, Skia will use 1 color sample and 4/8 stencil samples. The stencil buffers will use render buffers but are recycled by Skia when the render target is destroyed, so they are not that slow. Or, the current behavior of creating FBOs for each tile is not slow compared to other command buffer / CC interaction overhead, at the moment. > > In this patch, the extra render target is counted in the Skia texture budget > in > > order to enable recycling of the render target when it is not needed. This > does > > decrease the Skia budget quite much, by 1 tile * (1 texture sample + # color > > samples) (E.g. 5x or 9x size of one tile). I can also make it not do that, if > > that is more logical. > > When you say "I can also make it not do that", do you mean "it" = blink, or "it" > = Skia? I can change this patch to allocate the render targets with "unbudgeted" Skia caching policy. I can add any needed CC callbacks for accounting the memory held in GpuRasterizer, so that the memory appears in CC budget. I'll do this if it is more logical than accounting this memory in Skia. > > I think it is clearer this way. Skia has the same condition. After all, this > is > > just looking inside the abstraction, hardcoding what Skia is expected to do. > > This is another argument for pushing this functionality into Skia, instead of > duplicating its conditions here, which may be fragile going forward. I would be first to agree. > Is there a specific benchmark on which this change improves performance? Is it > checked into Telemetry? If you share your typical command-line to test the MSAA code, I'm sure I can report if it improves. I'm pretty sure it will. I've been using manual testing, mostly by repainting svg content continuously. I'm partly using the few .html files animating the tiger head from one of your issues. Sorry, I haven't been able to invest time into digging into telemetry so that I'd know the cases used to test these specifically. I'm working on that, though. > Which platforms does it improve? I'm guessing it improves every platform that uses explicit resolve and has MSAA not blacklisted. I have not been able to test any other than NV platforms, tough. I'm do not know if any such platform counts for "widely used"? If I understand correctly, most widely sold high-end Android mobile chips are such that they have multisampled-render-to-texture? For NV chips without mixed samples, it reduces frame rendering time by 10-20%. One such a device is Nexus 9. > My inclination is that the extra > complexity introduced by this patch must be compensated for by a corresponding > demonstrated (and tracked) improvement in performance, either on a single > widely-used platform, or multiple platforms. I must say compared to lots of other code in the system, to me it would seem this be quite simple patch. Anyhow, the initial intention was to entice reveman (or any CC person) to comment on how he wants the code to be structured... If you think there's room for simplification or an alternative approach, I can also do that. It's not as if this would be the first declined approach.
On 2016/04/18 15:01:57, Stephen White wrote: >My inclination is that the > extra > complexity introduced by this patch must be compensated for by a corresponding > demonstrated (and tracked) improvement in performance, either on a single > widely-used platform, or multiple platforms. And if you have any such a platform in mind, I'd be happy to go and acquire that and make sure the criteria be demonstrated.
On 2016/04/18 19:31:01, Kimmo Kinnunen wrote: > On 2016/04/18 15:01:57, Stephen White wrote: > >My inclination is that the > > extra > > complexity introduced by this patch must be compensated for by a corresponding > > > demonstrated (and tracked) improvement in performance, either on a single > > widely-used platform, or multiple platforms. > > And if you have any such a platform in mind, I'd be happy to go and acquire that > and make sure the criteria be demonstrated. Is there a platform on Telemetry that would show the change? https://build.chromium.org/p/chromium.perf/waterfall
The patch did not improve chalkboard avg_surface_fps, which remains at stable 19 for the device I tested. It did progres and regress some other measurements, but I don't know which of these are useful (stable and meaningful). The numbers did not improve as much as with the testcases that I had. I uploaded a telemetry change for discussion. I didn't manage to do a successful tryjob with nexus9, and my local nexus9 is not rooted (I don't know if userdebug images are available?). I tested with similar device (shield tablet 8). Profiling the gpu process, it appears as if with this patch, the gpu process is consuming way less cpu time than without it on chalkboard. It does not translate to FPS, though.
On 2016/04/21 13:47:45, Kimmo Kinnunen wrote: > The patch did not improve chalkboard avg_surface_fps, which remains at stable 19 > for the device I tested. > It did progres and regress some other measurements, but I don't know which of > these are useful (stable and meaningful). > The numbers did not improve as much as with the testcases that I had. > I uploaded a telemetry change for discussion. Great, thanks! > I didn't manage to do a successful tryjob with nexus9, and my local nexus9 is > not rooted (I don't know if userdebug images are available?). I tested with > similar device (shield tablet 8). > > Profiling the gpu process, it appears as if with this patch, the gpu process is > consuming way less cpu time than without it on chalkboard. It does not translate > to FPS, though. Presumably this would also translate into lower power usage? Telemetry has some recent additions to measure power, but I'm not sure if we get numbers for Android I think it might just be the "battor" device on Mac. |