Issue 2776923003: cc: Partial draw without partial swap support.

reveman

Description was changed from ========== gpu: Partial update without copy. BUG= ========== to ========== gpu: ...

2017-03-26 14:16:34 UTC #1

reveman

Description was changed from ========== gpu: Partial update without copy. BUG= CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== ...

2017-03-26 14:16:34 UTC #2

reveman

Description was changed from ========== gpu: Partial update without copy. BUG= CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel;master.tryserver.chromium.android:android_optional_gpu_tests_rel;master.tryserver.chromium.linux:linux_optional_gpu_tests_rel;master.tryserver.chromium.mac:mac_optional_gpu_tests_rel;master.tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== ...

2017-03-26 17:50:39 UTC #3

reveman

Description was changed from ========== gpu: Partial swap without copy for BufferQueue. Implement partial swap ...

2017-03-26 17:57:30 UTC #4

reveman

Description was changed from ========== gpu: Partial swap without copy for BufferQueue. Implement partial swap ...

2017-03-28 16:25:43 UTC #5

reveman

Description was changed from ========== gpu: Partial swap without copy for BufferQueue. NOT FOR REVIEW ...

2017-03-29 12:37:58 UTC #6

reveman

reveman@chromium.org changed reviewers: + dcastagna@chromium.org

2017-03-29 12:39:28 UTC #7

piman

The advantage of the previous method is that for a large set of workloads (where ...

2017-03-29 17:15:42 UTC #9

Daniele Castagna

On 2017/03/29 at 17:15:42, piman wrote: > The advantage of the previous method is that ...

2017-03-29 18:31:47 UTC #10

reveman

2017-03-29 19:22:04 UTC #11

On 2017/03/29 at 18:31:47, dcastagna wrote:
> On 2017/03/29 at 17:15:42, piman wrote:
> > The advantage of the previous method is that for a large set of workloads
> > (where the damage rect is consistent frame-to-frame, e.g. blinking cursor,
> > video/canvas updates), the region to be copied becomes empty in the steady
> > state. I think this is a very useful property to keep for devices that can do
> > it, because of the performance advantages. Thoughts?
> 
> I didn't know how we were doing partial updates. That seems pretty efficient,
> assuming the workload to copy old_damage - new_damage is generally less than the
> workload to recomposite the same region (I guess it might not be the case if
> we're compositing solid color quads or YUV buffers).
> 
> I don't understand why the region to be copied becoming empty in the steady
> state would be better than this approach, though. Wouldn't the steady state of
> this approach (for the workloads you described) eventually have the same damage
> rect as the current approach (and no "copy" region)?

Yes, I'm also failing to see how the steady state would be different. The copy
approach might seem more efficient in theory but I wonder if increasing the
scissor rect slightly is not sometimes more efficient than binding a fullscreen
texture (that was not recently used by the GPU) for sampling and reading some
pixels out of it.
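To make the steady-state question concrete, here is a toy model of the two approaches (not the BufferQueue code under review). It assumes each buffer tracks a single bounding-box rect of the pixels it missed while other buffers were presented, that buffers are reused round-robin, and that a never-drawn buffer is fully stale; `Rect` and `simulate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Half-open pixel rect [x0, x1) x [y0, y1)."""
    x0: int
    y0: int
    x1: int
    y1: int

    def area(self) -> int:
        return max(0, self.x1 - self.x0) * max(0, self.y1 - self.y0)

    def is_empty(self) -> bool:
        return self.area() == 0

    def union(self, other: "Rect") -> "Rect":
        # Bounding-box union: a single damage rect cannot hold an exact region.
        if self.is_empty():
            return other
        if other.is_empty():
            return self
        return Rect(min(self.x0, other.x0), min(self.y0, other.y0),
                    max(self.x1, other.x1), max(self.y1, other.y1))

    def intersect(self, other: "Rect") -> "Rect":
        r = Rect(max(self.x0, other.x0), max(self.y0, other.y0),
                 min(self.x1, other.x1), min(self.y1, other.y1))
        return EMPTY if r.is_empty() else r


EMPTY = Rect(0, 0, 0, 0)


def simulate(damages, num_buffers, screen):
    """Per frame, return (rendered, copied) pixel counts for both approaches."""
    # Pixels each buffer missed since it was last drawn; new buffers are fully stale.
    stale = [screen] * num_buffers
    copy_path, no_copy_path = [], []
    for frame, damage in enumerate(damages):
        i = frame % num_buffers  # round-robin buffer reuse (assumption)
        old = stale[i]
        # Copy approach: render the new damage, blit (stale - damage) from the last frame.
        copied = old.area() - old.intersect(damage).area()
        copy_path.append((damage.area(), copied))
        # Non-copy approach: enlarge the damage rect to cover the stale area.
        no_copy_path.append((old.union(damage).area(), 0))
        # Buffer i is now current; every other buffer missed this frame's damage.
        for j in range(num_buffers):
            stale[j] = EMPTY if j == i else stale[j].union(damage)
    return copy_path, no_copy_path
```

For a blinking cursor, `simulate([Rect(10, 10, 12, 30)] * 6, 3, Rect(0, 0, 100, 100))` shows both approaches rendering 40 pixels and copying none from the fourth frame on, so the steady state is indeed the same. With a damage rect that moves 10 pixels per frame, frame 3 touches 300 pixels either way, but the copy path re-renders only 100 of them and blits the other 200; the open question in this thread is which of those two costs is cheaper on real hardware.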

> 
> Taking a step back, let me recap why we ended up in this situation and see if
> I understand what's going on.
> We implemented and enabled implicit sync to be able to move the fence wait
> from the DRM thread to the kernel. This way we save a few microseconds
> (probably even less) and make it more likely that we make the next
> pageflip.
> We then disabled partial updates because implicit sync doesn't play well with
> partial updates. This significantly increases the GPU work for almost every
> frame; now on Kevin we always miss the next pageflip and always end up
> triple buffering when compositing is involved (even for a tiny cursor update).
> 
> Would it make sense to reconsider disabling implicit sync (while we wait
> for explicit sync) and re-enabling partial updates as they are right now?

I think that's a valid question to ask. Turning off implicit sync before we have
explicit sync feels like a larger change, though, and if we decide that this
non-copy approach is better no matter what, there's no reason not to move forward
with it. If we decide that the copy approach is definitely better, then disabling
implicit sync might be a better approach than handling partial updates in
different ways depending on the device.

piman

2017-03-29 19:29:47 UTC #12

On 2017/03/29 19:22:04, reveman wrote:
> Yes, I'm also failing to see how the steady state would be different. The copy
> approach might seem more efficient in theory but I wonder if increasing the
> scissor rect slightly is not sometimes more efficient than binding a fullscreen
> texture (that was not recently used by the GPU) for sampling and reading some
> pixels out of it.

Oh, sorry, I initially misread the patch, I see what you're doing. I agree that
in the steady state it should be equivalent in terms of pixels written. I still
think the copy is going to be generally more efficient than re-render, in
particular as we look at enabling multisampling etc. and for high overdraw (and
because it can use dedicated 2D HW instead of the full 3D pipeline). Would it be
hard to treat the "can't read in-use images" restriction as a driver issue, with
a workaround of enlarging the damage rect (what this patch does) instead of
doing the copy, but keeping the copy as the favored path?
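The proposal above, keeping the copy as the favored path and treating the "can't read in-use images" restriction as a driver issue whose workaround is enlarging the damage rect, might be structured roughly as follows. This is a hedged sketch, not code from the CL: `plan_partial_swap`, the `can_read_in_use_buffer` flag, and the tuple rect representation are all hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical rect representation: half-open (x0, y0, x1, y1).
Rect = Tuple[int, int, int, int]
EMPTY: Rect = (0, 0, 0, 0)


def is_empty(r: Rect) -> bool:
    return r[2] <= r[0] or r[3] <= r[1]


def contains(outer: Rect, inner: Rect) -> bool:
    if is_empty(inner):
        return True
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def bbox_union(a: Rect, b: Rect) -> Rect:
    if is_empty(a):
        return b
    if is_empty(b):
        return a
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def plan_partial_swap(damage: Rect, stale: Rect,
                      can_read_in_use_buffer: bool) -> Tuple[Rect, Optional[Rect]]:
    """Return (rect to re-render, rect to copy from the previous frame or None).

    `stale` is what the reused buffer missed while other buffers were presented.
    """
    if contains(damage, stale):
        # Nothing outside the new damage is stale (e.g. steady state with a
        # constant damage rect), so both paths just re-render the damage.
        return damage, None
    if can_read_in_use_buffer:
        # Favored path: blit the stale rect from the previous frame's buffer,
        # then re-render the new damage on top of it.
        return damage, stale
    # Driver workaround: never sample a buffer that may still be in use;
    # enlarge the damage rect instead (what this patch does).
    return bbox_union(damage, stale), None
```

Keeping the choice behind a single flag means devices without the restriction stay on the copy path, which can use dedicated 2D hardware, while affected devices fall back to re-rendering the enlarged rect.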

Daniele Castagna

2017-03-29 19:53:06 UTC #13

On 2017/03/29 at 19:29:47, piman wrote:
> Oh, sorry, I initially misread the patch, I see what you're doing. I agree
> that in the steady state it should be equivalent in terms of pixels written. I
> still think the copy is going to be generally more efficient than re-render, in
> particular as we look at enabling multisampling etc. and for high overdraw (and
> because it can use dedicated 2D HW instead of the full 3D pipeline). Would it be
> hard to treat the "can't read in-use images" restriction as a driver issue, with
> a workaround of enlarging the damage rect (what this patch does) instead of
> doing the copy, but keeping the copy as the favored path?

I agree it seems like the copy might be generally faster than re-render (and its
cost is certainly more predictable and has an upper bound). On Kevin the buffer
is also compressed, so that should help even more.

We introduced the issue in the first place, though. Wouldn't it be better to just
evaluate how big a change it would be to disable implicit sync and leave the
code as it is, instead of adding two different paths?

reveman

2017-03-29 20:11:30 UTC #14

On 2017/03/29 at 19:53:06, dcastagna wrote:
> I agree it seems like the copy might be generally faster (and for sure the
> cost is more predictable and has an upper-bound) than re-render. On Kevin the
> buffer is also compressed, so that should help even more.
> 
> We introduced the issue in the first place though. Wouldn't it be better just
> to try to evaluate how big of a change would be to disable implicit sync and
> leave the code as it is instead of adding two different paths?

Yes, let's figure out if we can instead disable implicit sync and, if not, move
forward with this as an optional partial swap mechanism where the copy approach
is still preferred.

reveman

Description was changed from ========== gpu: Partial swap without copy for BufferQueue. Implement partial swap ...

2017-04-18 03:20:15 UTC #15

reveman

The CQ bit was checked by reveman@chromium.org to run a CQ dry run

2017-04-18 03:20:24 UTC #16