Issue 651863002: Use glDiscardFramebuffer to save memory bandwidth in PNaCl

Issue 651863002: Use glDiscardFramebuffer to save memory bandwidth in PNaCl (Closed)

Created:
6 years, 2 months ago by dshwang

Modified:
6 years, 2 months ago

Reviewers:
Ken Russell (switch to Gerrit), aelias_OOO_until_Jul13, Hongbo Min, Zhenyao Mo, vmiura, no sievers

CC:
chromium-reviews, piman+watch_chromium.org, Shouqun Liu

Base URL:
https://chromium.googlesource.com/chromium/src.git@master

Project:
chromium

Visibility:
Public.

More Reviews

Description

Use glDiscardFramebuffer to save memory bandwidth in PNaCl Leverage glDiscardFramebuffer for tiled-based GPU to improve performance of offscreen GL contexnt with preserved target buffer and also save memory bandwidth. BUG=424176

Patch Set 1 #

Total comments: 2

Created: 6 years, 2 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+8 lines, -0 lines)			Patch
	M	gpu/command_buffer/service/gles2_cmd_decoder.cc	View		1 chunk	+8 lines, -0 lines	2 comments	Download

Messages

Total messages: 19 (2 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

dshwang

Could you review this optimization? It is the same optimization of WebGL for gles2 decoder: ...

6 years, 2 months ago (2014-10-13 10:11:36 UTC) #2

Hongbo Min

https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc#newcode9375 gpu/command_buffer/service/gles2_cmd_decoder.cc:9375: if (feature_info_->feature_flags().ext_discard_framebuffer) { According to my previous work, an ...

6 years, 2 months ago (2014-10-14 02:27:48 UTC) #3

dshwang

Hongbo, Thank you for review. https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/gles2_cmd_decoder.cc#newcode9375 gpu/command_buffer/service/gles2_cmd_decoder.cc:9375: if (feature_info_->feature_flags().ext_discard_framebuffer) { On ...

6 years, 2 months ago (2014-10-14 12:09:30 UTC) #4

Hongbo, Thank you for review.

https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/g...
File gpu/command_buffer/service/gles2_cmd_decoder.cc (right):

https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/g...
gpu/command_buffer/service/gles2_cmd_decoder.cc:9375: if
(feature_info_->feature_flags().ext_discard_framebuffer) {
On 2014/10/14 02:27:48, Hongbo Min wrote:
> According to my previous work, an attempt to clear a framebuffer on  tiled GPU
> is time-consuming, if glDiscardFramebuffer is used, it gives a performance
hint
> and allow GPU driver to set the pixel vale to whatever it wishes, and the
> glClear could be skipped (I have verified it on ASUS Nexus series, it works
well
> without glClear if glDiscardFramebuffer is used).

As far as I understand, your CL didn't affect whether glClear is skipped or not.
WebGLRenderingContextBase::clearIfComposited still do glClear every frame.

Let me explain why your CL speed up in tiled-based GPU as I understand. When
binding another fbo, tiled-based GPU must stores tiled cache to gpu memory
(texture, stencil buffer and depth buffer). glDiscardFramebuffer hints GPU to
not read back tiled cache to gpu memory because the software don't need anymore
the changed fbo information. It saves huge gpu bandwidth.

Let's go back to your CL. The fbo lifecycle is as follows:
1. WebGL replaces color texture to front fbo. At that time, you hint
glDiscardFramebuffer because DrawingBuffer doesn't update fbo so you don't want
to copy tiled cache to color texture redundantly. It wastes bandwidth.
2. Something later, WebGLRenderingContextBase::clearIfComposited performs
glClear on the fbo.

My CL is the same to your WebGL CL. This code just bind fbo and replace color
texture, so we don't want to copy back tiled cache to color texture redundantly.
glDiscardFramebuffer can make it.

> So my question is, where is glClear called to clear
> |offscreen_target_frame_buffer_| and prepare a new frame? and can we skip that
> glClear if have?

As I explain above, glClear is not related to glDiscardFramebuffer.

Hongbo Min

Very elaborate description. Thanks. On 2014/10/14 12:09:30, dshwang wrote: > Hongbo, Thank you for review. ...

6 years, 2 months ago (2014-10-15 02:03:59 UTC) #5

Very elaborate description. Thanks.

On 2014/10/14 12:09:30, dshwang wrote:
> Hongbo, Thank you for review.
> 
>
https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/g...
> File gpu/command_buffer/service/gles2_cmd_decoder.cc (right):
> 
>
https://codereview.chromium.org/651863002/diff/1/gpu/command_buffer/service/g...
> gpu/command_buffer/service/gles2_cmd_decoder.cc:9375: if
> (feature_info_->feature_flags().ext_discard_framebuffer) {
> On 2014/10/14 02:27:48, Hongbo Min wrote:
> > According to my previous work, an attempt to clear a framebuffer on  tiled
GPU
> > is time-consuming, if glDiscardFramebuffer is used, it gives a performance
> hint
> > and allow GPU driver to set the pixel vale to whatever it wishes, and the
> > glClear could be skipped (I have verified it on ASUS Nexus series, it works
> well
> > without glClear if glDiscardFramebuffer is used).
> 
> As far as I understand, your CL didn't affect whether glClear is skipped or
not.
> WebGLRenderingContextBase::clearIfComposited still do glClear every frame.

Indeed, my CL for webgl explicitly skips glClear in
WebGLRenderingContextBase::clearIfComposited which would return false early if
the DrawingBuffer supports discarding framebuffer by checking
DrawingBuffer::discardFramebufferSupported. See
https://codereview.chromium.org/614883002/patch/40001/50001.

> 
> Let me explain why your CL speed up in tiled-based GPU as I understand. When
> binding another fbo, tiled-based GPU must stores tiled cache to gpu memory
> (texture, stencil buffer and depth buffer). glDiscardFramebuffer hints GPU to
> not read back tiled cache to gpu memory because the software don't need
anymore
> the changed fbo information. It saves huge gpu bandwidth.
> 
> Let's go back to your CL. The fbo lifecycle is as follows:
> 1. WebGL replaces color texture to front fbo. At that time, you hint
> glDiscardFramebuffer because DrawingBuffer doesn't update fbo so you don't
want
> to copy tiled cache to color texture redundantly. It wastes bandwidth.
> 2. Something later, WebGLRenderingContextBase::clearIfComposited performs
> glClear on the fbo.
> 
> My CL is the same to your WebGL CL. This code just bind fbo and replace color
> texture, so we don't want to copy back tiled cache to color texture
redundantly.
> glDiscardFramebuffer can make it.
> 
> 
> > So my question is, where is glClear called to clear
> > |offscreen_target_frame_buffer_| and prepare a new frame? and can we skip
that
> > glClear if have?
> 
> As I explain above, glClear is not related to glDiscardFramebuffer.

According to my experiments, on ASUS Nexus 7 device, only if glClear is skipped,
the performance could get improved ~30% when using glDiscardFramebuffer.
Otherwise, glDiscardFramebuffer won't have an impact on fps improvement. I have
no much idea about why glClear calling on framebuffer is so time-consuming.
Maybe a driver bug?

dshwang

On 2014/10/15 02:03:59, Hongbo Min wrote: > Very elaborate description. Thanks. > > On 2014/10/14 ...

6 years, 2 months ago (2014-10-15 10:06:31 UTC) #7

Hongbo Min

On 2014/10/15 10:06:31, dshwang wrote: > On 2014/10/15 02:03:59, Hongbo Min wrote: > > Very ...

6 years, 2 months ago (2014-10-15 10:11:14 UTC) #8

On 2014/10/15 10:06:31, dshwang wrote:
> On 2014/10/15 02:03:59, Hongbo Min wrote:
> > Very elaborate description. Thanks.
> > 
> > On 2014/10/14 12:09:30, dshwang wrote:
> > > As far as I understand, your CL didn't affect whether glClear is skipped
or
> > not.
> > > WebGLRenderingContextBase::clearIfComposited still do glClear every frame.
> > 
> > Indeed, my CL for webgl explicitly skips glClear in
> > WebGLRenderingContextBase::clearIfComposited which would return false early
if
> > the DrawingBuffer supports discarding framebuffer by checking
> > DrawingBuffer::discardFramebufferSupported. See
> > https://codereview.chromium.org/614883002/patch/40001/50001.
> 
> Ah, I didn't notice this change. Thank you for correcting it.
> However, I found a bug thanks to your explanation. glDiscardFramebuffer()
> doesn't guarantee clearing FBO, although ASUS Nexus series clear FBO.
> I'll fix this bug in https://codereview.chromium.org/656053002/
> 
> > > As I explain above, glClear is not related to glDiscardFramebuffer.
> > 
> > According to my experiments, on ASUS Nexus 7 device, only if glClear is
> skipped,
> > the performance could get improved ~30% when using glDiscardFramebuffer.
> > Otherwise, glDiscardFramebuffer won't have an impact on fps improvement. I
> have
> > no much idea about why glClear calling on framebuffer is so time-consuming.
> > Maybe a driver bug?
> 
> You explain that skipping glClear mostly contribute on speed-up. Which test
> cases do speed-up ~30%? If WebGL draws few triangles, it might be possible,
but
> huge tests like Aquarium would speed-up little. Am I right?
> Ah.. After the above fix, the speed-up will be rollback, unfortunately.
> GLES driver may already have optimization like glDiscardFramebuffer. The code
in
> which you add glDiscardFramebuffer doesn't update buffer of FBO, so driver may
> don't read back tiled cache to gpu memory.

The benchmarks I am working on is HexGL[1], and PlayCanvas[2] in Ludei demo
list.
[1] https://github.com/hmin/HexGL/tree/ffos-local 
[2] https://cocoonjsservice.ludei.com/cocoonjslaunchersvr/demo-list/ (grep
PlayCanvas)

I strongly suggest you to try PlayCanvas, it gets a very big performance boost.

Ken Russell (switch to Gerrit)

I'm deferring to sievers@, aelias@ and vmiura@ since this would principally affect Chrome on Android. ...

6 years, 2 months ago (2014-10-16 02:09:58 UTC) #10

Ken Russell (switch to Gerrit)

Sorry. I didn't read the CL description carefully. Please file a new bug about applying ...

6 years, 2 months ago (2014-10-16 02:12:28 UTC) #11

Hongbo Min

On 2014/10/16 02:09:58, Ken Russell wrote: > I'm deferring to sievers@, aelias@ and vmiura@ since ...

6 years, 2 months ago (2014-10-16 06:27:15 UTC) #13

dshwang

On 2014/10/16 02:12:28, Ken Russell wrote: > Sorry. I didn't read the CL description carefully. ...

6 years, 2 months ago (2014-10-16 14:08:37 UTC) #14

vmiura

Due to Chromium security needs, glDiscardFramebufferEXT imposes a forced glClear in GPU command buffer. glDiscardFramebufferEXT ...

6 years, 2 months ago (2014-10-16 17:10:12 UTC) #15

dshwang

On 2014/10/16 17:10:12, vmiura wrote: Thank you for good review. > Due to Chromium security ...

6 years, 2 months ago (2014-10-16 20:17:36 UTC) #16

On 2014/10/16 17:10:12, vmiura wrote:

Thank you for good review.

> Due to Chromium security needs, glDiscardFramebufferEXT imposes a forced
glClear
> in GPU command buffer.
> 
> glDiscardFramebufferEXT marks framebuffer attachments as invalid (needs
clear),
> which forces a glClear
> the next time the buffer is used.  Therefore replacing a glClear with
> glDiscardFramebufferEXT generally
> leads to no improvement, and actually adds some overhead.

PNaCl spec said that
"The contents of ancillary buffers are always undefined after calling
SwapBuffers. The contents of the color buffer are undefined if the value of the
PP_GRAPHICS3DATTRIB_SWAP_BEHAVIOR attribute of context is not
PP_GRAPHICS3DATTRIB_BUFFER_PRESERVED."
https://code.google.com/p/chromium/codesearch#chromium/src/ppapi/cpp/graphics...
https://developer.chrome.com/native-client/pepper_dev/c/struct_p_p_b___graphi...

So it's intended to make the front buffer (i.e.
|offscreen_target_frame_buffer_|) undefined. Currently, the front buffer on
PNaCl keeps 1 frame ago back buffer (i.e. |offscreen_saved_color_texture_|).
After this CL, it's really undefined. Is it ACCEPTABLE in terms of Chromium
security???

FYI, in my experiences about glDiscardFramebufferEXT,
Adreno GPU: clear fbo
Intel GPU with mesa driver: do nothing

> There should be a win when glDiscardFramebufferEXT is used right at the |end|
of
> a rendering pass when
> there are unneeded attachments, e.g. usually DEPTH & STENCIL buffers can be
> discarded to reduce
> bandwidth on tiling GPUs.
> 
> In this case I think you could try using glDiscardFramebufferEXT with
> GL_DEPTH_ATTACHMENT & GL_STENCIL_ATTACHMENT,
> |before| doing the swap.

After re-thinking, we can use glDiscardFramebufferEXT(GL_DEPTH_ATTACHMENT &
GL_STENCIL_ATTACHMENT) with preserve-fbo because the spec said "The contents of
ancillary buffers are always undefined"

In more detail, let me explain how to composite the back buffer on browser.
After SwapBuffer(), browser generates mailbox by the new back buffer and draws
the mailbox on the browser.

no sievers

On 2014/10/16 20:17:36, dshwang wrote: > On 2014/10/16 17:10:12, vmiura wrote: > > PNaCl spec ...

6 years, 2 months ago (2014-10-16 23:04:18 UTC) #17

vmiura

> @Victor: I thought that tilers are usually able to optimize Clear + full Draw, ...

6 years, 2 months ago (2014-10-17 00:59:45 UTC) #18

dshwang

6 years, 2 months ago (2014-10-17 13:14:36 UTC) #19

@vmlura, @sievers, thank you for good explanation.
As a result, this CL regresses perf also. I close it as Invalid.

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages