Issue 1479673003: Verify resource provider sync tokens before sending to parent.

David Yen

5 years ago (2015-11-26 01:04:48 UTC) #1

David Yen

Description was changed from ========== Verify resource provider sync tokens before sending to parent. R=piman@chromium.org ...

5 years ago (2015-11-26 01:04:49 UTC) #2

piman

https://codereview.chromium.org/1479673003/diff/1/cc/resources/resource_provider.cc File cc/resources/resource_provider.cc (right): https://codereview.chromium.org/1479673003/diff/1/cc/resources/resource_provider.cc#newcode1152 cc/resources/resource_provider.cc:1152: unverified_sync_tokens.push_back(new_sync_token.GetData()); Actually, the VerifySyncTokensCHROMIUM below will tag the local ...

5 years ago (2015-11-26 02:32:44 UTC) #3

David Yen

https://codereview.chromium.org/1479673003/diff/1/cc/resources/resource_provider.cc File cc/resources/resource_provider.cc (right): https://codereview.chromium.org/1479673003/diff/1/cc/resources/resource_provider.cc#newcode1152 cc/resources/resource_provider.cc:1152: unverified_sync_tokens.push_back(new_sync_token.GetData()); On 2015/11/26 02:32:44, piman (slow to review) wrote: ...

5 years ago (2015-12-08 00:57:56 UTC) #4

David Yen

On 2015/12/08 02:27:33, piman (OOO back 12-11) wrote: > lgtm In fixing the unit tests, ...

5 years ago (2015-12-09 23:21:37 UTC) #6

David Yen

The patchset sent to the CQ was uploaded after l-g-t-m from piman@chromium.org Link to the ...

5 years ago (2015-12-16 01:03:16 UTC) #9

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1479673003/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1479673003/60001

5 years ago (2015-12-16 01:08:04 UTC) #10

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years ago (2015-12-16 02:48:29 UTC) #11

commit-bot: I haz the power

Try jobs failed on following builders: android_arm64_dbg_recipe on tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) cast_shell_android on ...

5 years ago (2015-12-16 02:48:30 UTC) #12

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1479673003/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1479673003/60001

5 years ago (2015-12-16 06:03:06 UTC) #14

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years ago (2015-12-16 07:04:43 UTC) #15

commit-bot: I haz the power

Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/156917)

5 years ago (2015-12-16 07:04:43 UTC) #16

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1479673003/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1479673003/60001

5 years ago (2015-12-16 17:13:28 UTC) #18

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years ago (2015-12-16 18:08:56 UTC) #19

commit-bot: I haz the power

Try jobs failed on following builders: linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/157105)

5 years ago (2015-12-16 18:08:56 UTC) #20

David Yen

On 2015/12/16 18:08:56, commit-bot: I haz the power wrote: > Try jobs failed on following ...

5 years ago (2015-12-16 21:13:04 UTC) #21

piman

On Thu, Dec 17, 2015 at 6:13 AM, <dyen@chromium.org> wrote: > On 2015/12/16 18:08:56, commit-bot: ...

5 years ago (2015-12-17 06:16:39 UTC) #22

On Thu, Dec 17, 2015 at 6:13 AM, <dyen@chromium.org> wrote:

> On 2015/12/16 18:08:56, commit-bot: I haz the power wrote:
>
>> Try jobs failed on following builders:
>>    linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED,
>>
>
>
>
http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...
> )
>
> So a couple of issues I've found here with VerifySyncTokensCHROMIUM():
>
> 1. This is most likely command buffer implementation dependent (it was the
> case
> for mojo) but I had to actually generate a fence sync command in order to
> be
> sure a flush actually does something. For that reason I've changed
> VerifySyncTokensCHROMIUM() to call InsertFenceSyncCHROMIUM and
> GenSyncTokenCHROMIUM() instead of calling gpu_control calls directly.
>

I'd like to understand this "flush actually does something". Is it because
of the last_put_offset_ == put_offset early exit? That shouldn't be a
problem - it would only trigger if no commands were inserted since the last
flush. For mojo, it may trigger even without an actual Flush, because
OrderingBarrier isn't optimized and behaves exactly like a Flush, but the
result is the same - when Flush returns, it's guaranteed that an IPC has
been sent that makes the commands up to put_offset visible to the service
side; it doesn't matter whether that IPC was sent from the Flush itself, or
some time in the past. In either case it's valid to do the round-trip, and
that will satisfy the guarantee that the service side has seen that IPC.

Doing an extra InsertFenceSyncCHROMIUM basically forces an extra command,
which forces an extra flush, but I don't think it's needed because of the
above - or maybe I missed something, but I would like to understand it (and
it probably needs to be documented).

2. I also thought of a subtle race condition that could happen, since we are
> ensuring that ordering barriers are flushed within
> CanWaitUnverifiedSyncToken(),
> we have to do the synchronization validation after all the
> CanWaitUnverifiedSyncToken() calls. For that reason I have restructured the
> function to loop through the sync tokens twice. This requires an extra
> memcpy
> call for the first loop, but I think it is still faster to do the memcpy
> on the
> stack rather than allocate a vector of sync tokens on the heap. I also
> added a
> test to be sure the order is done correctly
> (VerifySyncTokensCHROMIUM_Sequence).
>

Ok. I wonder if it could make sense to set the verified bit directly on the
byte array - it's a bit of a hack but would save those copies. We can make
a function that sets the bit directly (reinterpret_cast the byte array to
bool*), and defend it with a static_assert(offsetof(SyncToken,
verified_flush_)==0) to make sure we're touching the correct field.

>
> GLES2Implementation::VerifySyncTokensCHROMIUM() had changed quite a bit so
> I'll
> let you re-review this again. PTAL.
>
> https://codereview.chromium.org/1479673003/
>

-- 
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.

David Yen

On 2015/12/17 06:16:39, piman (in KST timezone) wrote: > On Thu, Dec 17, 2015 at ...

5 years ago (2015-12-17 06:31:09 UTC) #23

On 2015/12/17 06:16:39, piman (in KST timezone) wrote:
> On Thu, Dec 17, 2015 at 6:13 AM, <mailto:dyen@chromium.org> wrote:
> 
> > On 2015/12/16 18:08:56, commit-bot: I haz the power wrote:
> >
> >> Try jobs failed on following builders:
> >>    linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED,
> >>
> >
> >
> >
>
http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...
> > )
> >
> > So a couple of issues I've found here with VerifySyncTokensCHROMIUM():
> >
> > 1. This is most likely command buffer implementation dependent (it was the
> > case
> > for mojo) but I had to actually generate a fence sync command in order to
> > be
> > sure a flush actually does something. For that reason I've changed
> > VerifySyncTokensCHROMIUM() to call InsertFenceSyncCHROMIUM and
> > GenSyncTokenCHROMIUM() instead of calling gpu_control calls directly.
> >
> 
> I'd like to understand this "flush actually does something". Is it because
> of the last_put_offset_ == put_offset early exit? That shouldn't be a
> problem - it would only trigger if no commands were inserted since the last
> flush. For mojo, it may trigger even without an actual Flush, because
> OrderingBarrier isn't optimized and behaves exactly like a Flush, but the
> result is the same - when Flush returns, it's guaranteed that an IPC has
> been sent that makes the commands up to put_offset visible to the service
> side; it doesn't matter whether that IPC was sent from the Flush itself, or
> some time in the past. In either case it's valid to do the round-trip, and
> that will satisfy the guarantee that the service side has seen that IPC.
> 
> Doing an extra InsertFenceSyncCHROMIUM basically forces an extra command,
> which forces an extra flush, but I don't think it's needed because of the
> above - or maybe I missed something, but I would like to understand it (and
> it probably needs to be documented).
> 

Yeah the extra command is not really needed, the only thing that is needed is to
flush any pending ordering barriers and issue a synchronous IPC to verify the
sync token. Guaranteeing a command is in the buffer to be flushed allows the
rest of the system do the validation. This seems like the easiest way apart from
writing yet another gpu_control function to handle this case.

> 2. I also thought of a subtle race condition that could happen, since we are
> > ensuring that ordering barriers are flushed within
> > CanWaitUnverifiedSyncToken(),
> > we have to do the synchronization validation after all the
> > CanWaitUnverifiedSyncToken() calls. For that reason I have restructured the
> > function to loop through the sync tokens twice. This requires an extra
> > memcpy
> > call for the first loop, but I think it is still faster to do the memcpy
> > on the
> > stack rather than allocate a vector of sync tokens on the heap. I also
> > added a
> > test to be sure the order is done correctly
> > (VerifySyncTokensCHROMIUM_Sequence).
> >
> 
> Ok. I wonder if it could make sense to set the verified bit directly on the
> byte array - it's a bit of a hack but would save those copies. We can make
> a function that sets the bit directly (reinterpret_cast the byte array to
> bool*), and defend it with a static_assert(offsetof(SyncToken,
> verified_flush_)==0) to make sure we're touching the correct field.
> 
> 

That could work, I'm not too concerned with the performance of this part yet
since the sync_token is not that big and it's all done on the stack (IE in
cache).

> >
> > GLES2Implementation::VerifySyncTokensCHROMIUM() had changed quite a bit so
> > I'll
> > let you re-review this again. PTAL.
> >
> > https://codereview.chromium.org/1479673003/
> >
> 
> -- 
> You received this message because you are subscribed to the Google Groups
> "Chromium-reviews" group.
> To unsubscribe from this group and stop receiving emails from it, send an
email
> to mailto:chromium-reviews+unsubscribe@chromium.org.

piman

On Thu, Dec 17, 2015 at 3:31 PM, <dyen@chromium.org> wrote: > On 2015/12/17 06:16:39, piman ...

5 years ago (2015-12-17 06:44:47 UTC) #24

On Thu, Dec 17, 2015 at 3:31 PM, <dyen@chromium.org> wrote:

> On 2015/12/17 06:16:39, piman (in KST timezone) wrote:
>
> On Thu, Dec 17, 2015 at 6:13 AM, <mailto:dyen@chromium.org> wrote:
>>
>
> > On 2015/12/16 18:08:56, commit-bot: I haz the power wrote:
>> >
>> >> Try jobs failed on following builders:
>> >>    linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED,
>> >>
>> >
>> >
>> >
>>
>
>
>
http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...
>
>> > )
>> >
>> > So a couple of issues I've found here with VerifySyncTokensCHROMIUM():
>> >
>> > 1. This is most likely command buffer implementation dependent (it was
>> the
>> > case
>> > for mojo) but I had to actually generate a fence sync command in order
>> to
>> > be
>> > sure a flush actually does something. For that reason I've changed
>> > VerifySyncTokensCHROMIUM() to call InsertFenceSyncCHROMIUM and
>> > GenSyncTokenCHROMIUM() instead of calling gpu_control calls directly.
>> >
>>
>
> I'd like to understand this "flush actually does something". Is it because
>> of the last_put_offset_ == put_offset early exit? That shouldn't be a
>> problem - it would only trigger if no commands were inserted since the
>> last
>> flush. For mojo, it may trigger even without an actual Flush, because
>> OrderingBarrier isn't optimized and behaves exactly like a Flush, but the
>> result is the same - when Flush returns, it's guaranteed that an IPC has
>> been sent that makes the commands up to put_offset visible to the service
>> side; it doesn't matter whether that IPC was sent from the Flush itself,
>> or
>> some time in the past. In either case it's valid to do the round-trip, and
>> that will satisfy the guarantee that the service side has seen that IPC.
>>
>
> Doing an extra InsertFenceSyncCHROMIUM basically forces an extra command,
>> which forces an extra flush, but I don't think it's needed because of the
>> above - or maybe I missed something, but I would like to understand it
>> (and
>> it probably needs to be documented).
>>
>
>
> Yeah the extra command is not really needed, the only thing that is needed
> is to
> flush any pending ordering barriers and issue a synchronous IPC to verify
> the
> sync token. Guaranteeing a command is in the buffer to be flushed allows
> the
> rest of the system do the validation. This seems like the easiest way
> apart from
> writing yet another gpu_control function to handle this case.


But why is Flush() not enough?



>
>
> 2. I also thought of a subtle race condition that could happen, since we
>> are
>> > ensuring that ordering barriers are flushed within
>> > CanWaitUnverifiedSyncToken(),
>> > we have to do the synchronization validation after all the
>> > CanWaitUnverifiedSyncToken() calls. For that reason I have restructured
>> the
>> > function to loop through the sync tokens twice. This requires an extra
>> > memcpy
>> > call for the first loop, but I think it is still faster to do the memcpy
>> > on the
>> > stack rather than allocate a vector of sync tokens on the heap. I also
>> > added a
>> > test to be sure the order is done correctly
>> > (VerifySyncTokensCHROMIUM_Sequence).
>> >
>>
>
> Ok. I wonder if it could make sense to set the verified bit directly on the
>> byte array - it's a bit of a hack but would save those copies. We can make
>> a function that sets the bit directly (reinterpret_cast the byte array to
>> bool*), and defend it with a static_assert(offsetof(SyncToken,
>> verified_flush_)==0) to make sure we're touching the correct field.
>>
>
>
>
> That could work, I'm not too concerned with the performance of this part
> yet
> since the sync_token is not that big and it's all done on the stack (IE in
> cache).
>
> >
>> > GLES2Implementation::VerifySyncTokensCHROMIUM() had changed quite a bit
>> so
>> > I'll
>> > let you re-review this again. PTAL.
>> >
>> > https://codereview.chromium.org/1479673003/
>> >
>>
>
> --
>> You received this message because you are subscribed to the Google Groups
>> "Chromium-reviews" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>>
> email
>
>> to mailto:chromium-reviews+unsubscribe@chromium.org.
>>
>
>
>
> https://codereview.chromium.org/1479673003/
>

-- 
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.

David Yen

On 2015/12/17 06:44:47, piman (in KST timezone) wrote: > On Thu, Dec 17, 2015 at ...

5 years ago (2015-12-17 06:54:30 UTC) #25

On 2015/12/17 06:44:47, piman (in KST timezone) wrote:
> On Thu, Dec 17, 2015 at 3:31 PM, <mailto:dyen@chromium.org> wrote:
> 
> > On 2015/12/17 06:16:39, piman (in KST timezone) wrote:
> >
> > On Thu, Dec 17, 2015 at 6:13 AM, <mailto:dyen@chromium.org> wrote:
> >>
> >
> > > On 2015/12/16 18:08:56, commit-bot: I haz the power wrote:
> >> >
> >> >> Try jobs failed on following builders:
> >> >>    linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED,
> >> >>
> >> >
> >> >
> >> >
> >>
> >
> >
> >
>
http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...
> >
> >> > )
> >> >
> >> > So a couple of issues I've found here with VerifySyncTokensCHROMIUM():
> >> >
> >> > 1. This is most likely command buffer implementation dependent (it was
> >> the
> >> > case
> >> > for mojo) but I had to actually generate a fence sync command in order
> >> to
> >> > be
> >> > sure a flush actually does something. For that reason I've changed
> >> > VerifySyncTokensCHROMIUM() to call InsertFenceSyncCHROMIUM and
> >> > GenSyncTokenCHROMIUM() instead of calling gpu_control calls directly.
> >> >
> >>
> >
> > I'd like to understand this "flush actually does something". Is it because
> >> of the last_put_offset_ == put_offset early exit? That shouldn't be a
> >> problem - it would only trigger if no commands were inserted since the
> >> last
> >> flush. For mojo, it may trigger even without an actual Flush, because
> >> OrderingBarrier isn't optimized and behaves exactly like a Flush, but the
> >> result is the same - when Flush returns, it's guaranteed that an IPC has
> >> been sent that makes the commands up to put_offset visible to the service
> >> side; it doesn't matter whether that IPC was sent from the Flush itself,
> >> or
> >> some time in the past. In either case it's valid to do the round-trip, and
> >> that will satisfy the guarantee that the service side has seen that IPC.
> >>
> >
> > Doing an extra InsertFenceSyncCHROMIUM basically forces an extra command,
> >> which forces an extra flush, but I don't think it's needed because of the
> >> above - or maybe I missed something, but I would like to understand it
> >> (and
> >> it probably needs to be documented).
> >>
> >
> >
> > Yeah the extra command is not really needed, the only thing that is needed
> > is to
> > flush any pending ordering barriers and issue a synchronous IPC to verify
> > the
> > sync token. Guaranteeing a command is in the buffer to be flushed allows
> > the
> > rest of the system do the validation. This seems like the easiest way
> > apart from
> > writing yet another gpu_control function to handle this case.
> 
> 
> But why is Flush() not enough?
> 

So the flush just becomes a no-op since the put offset didn't change. Since no
flush occurred the fence release is not set as flushed which causes the sync
token verification to fail.

On the other hand, generating a fence sync which is not a part of any command
buffer seems like an error. Although in this case it can never be passed
anywhere so there is no technical danger, but that is why the logic doesn't
handle it.

piman

On Thu, Dec 17, 2015 at 3:44 PM, Antoine Labour <piman@chromium.org> wrote: > > > ...

5 years ago (2015-12-17 06:57:00 UTC) #26

On Thu, Dec 17, 2015 at 3:44 PM, Antoine Labour <piman@chromium.org> wrote:

>
>
> On Thu, Dec 17, 2015 at 3:31 PM, <dyen@chromium.org> wrote:
>
>> On 2015/12/17 06:16:39, piman (in KST timezone) wrote:
>>
>> On Thu, Dec 17, 2015 at 6:13 AM, <mailto:dyen@chromium.org> wrote:
>>>
>>
>> > On 2015/12/16 18:08:56, commit-bot: I haz the power wrote:
>>> >
>>> >> Try jobs failed on following builders:
>>> >>    linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED,
>>> >>
>>> >
>>> >
>>> >
>>>
>>
>>
>>
http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...
>>
>>> > )
>>> >
>>> > So a couple of issues I've found here with VerifySyncTokensCHROMIUM():
>>> >
>>> > 1. This is most likely command buffer implementation dependent (it was
>>> the
>>> > case
>>> > for mojo) but I had to actually generate a fence sync command in order
>>> to
>>> > be
>>> > sure a flush actually does something. For that reason I've changed
>>> > VerifySyncTokensCHROMIUM() to call InsertFenceSyncCHROMIUM and
>>> > GenSyncTokenCHROMIUM() instead of calling gpu_control calls directly.
>>> >
>>>
>>
>> I'd like to understand this "flush actually does something". Is it because
>>> of the last_put_offset_ == put_offset early exit? That shouldn't be a
>>> problem - it would only trigger if no commands were inserted since the
>>> last
>>> flush. For mojo, it may trigger even without an actual Flush, because
>>> OrderingBarrier isn't optimized and behaves exactly like a Flush, but the
>>> result is the same - when Flush returns, it's guaranteed that an IPC has
>>> been sent that makes the commands up to put_offset visible to the service
>>> side; it doesn't matter whether that IPC was sent from the Flush itself,
>>> or
>>> some time in the past. In either case it's valid to do the round-trip,
>>> and
>>> that will satisfy the guarantee that the service side has seen that IPC.
>>>
>>
>> Doing an extra InsertFenceSyncCHROMIUM basically forces an extra command,
>>> which forces an extra flush, but I don't think it's needed because of the
>>> above - or maybe I missed something, but I would like to understand it
>>> (and
>>> it probably needs to be documented).
>>>
>>
>>
>> Yeah the extra command is not really needed, the only thing that is
>> needed is to
>> flush any pending ordering barriers and issue a synchronous IPC to verify
>> the
>> sync token. Guaranteeing a command is in the buffer to be flushed allows
>> the
>> rest of the system do the validation. This seems like the easiest way
>> apart from
>> writing yet another gpu_control function to handle this case.
>
>
> But why is Flush() not enough?
>

To be clear:
- on mojo, Flush() is enough to guarantee pending ordering barriers are
flushed, because they're always flushed immediately anyway (OrderingBarrier
== Flush there).
- on CommandBufferProxyImpl/GpuChannelHost, Flush will also provide the
guarantee:
  - there's only at most one OrderingBarrier pending
  - if the OrderingBarrier is from another command buffer, it will be
flushed before doing anything
  - if the OrderingBarrier is from the same command buffer, then
last_barrier_put_offset_ > last_put_offset_ (otherwise we wouldn't have
kept it), so put_offset >=  last_barrier_put_offset_ > last_put_offset_, so
we will flush.

Antoine


>
>
>
>>
>>
>> 2. I also thought of a subtle race condition that could happen, since we
>>> are
>>> > ensuring that ordering barriers are flushed within
>>> > CanWaitUnverifiedSyncToken(),
>>> > we have to do the synchronization validation after all the
>>> > CanWaitUnverifiedSyncToken() calls. For that reason I have
>>> restructured the
>>> > function to loop through the sync tokens twice. This requires an extra
>>> > memcpy
>>> > call for the first loop, but I think it is still faster to do the
>>> memcpy
>>> > on the
>>> > stack rather than allocate a vector of sync tokens on the heap. I also
>>> > added a
>>> > test to be sure the order is done correctly
>>> > (VerifySyncTokensCHROMIUM_Sequence).
>>> >
>>>
>>
>> Ok. I wonder if it could make sense to set the verified bit directly on
>>> the
>>> byte array - it's a bit of a hack but would save those copies. We can
>>> make
>>> a function that sets the bit directly (reinterpret_cast the byte array to
>>> bool*), and defend it with a static_assert(offsetof(SyncToken,
>>> verified_flush_)==0) to make sure we're touching the correct field.
>>>
>>
>>
>>
>> That could work, I'm not too concerned with the performance of this part
>> yet
>> since the sync_token is not that big and it's all done on the stack (IE in
>> cache).
>>
>> >
>>> > GLES2Implementation::VerifySyncTokensCHROMIUM() had changed quite a
>>> bit so
>>> > I'll
>>> > let you re-review this again. PTAL.
>>> >
>>> > https://codereview.chromium.org/1479673003/
>>> >
>>>
>>
>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Chromium-reviews" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>>
>> email
>>
>>> to mailto:chromium-reviews+unsubscribe@chromium.org.
>>>
>>
>>
>>
>> https://codereview.chromium.org/1479673003/
>>
>
>

-- 
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.

David Yen

On 2015/12/17 06:57:00, piman (in KST timezone) wrote: > On Thu, Dec 17, 2015 at ...

5 years ago (2015-12-17 15:40:23 UTC) #27

On 2015/12/17 06:57:00, piman (in KST timezone) wrote:
> On Thu, Dec 17, 2015 at 3:44 PM, Antoine Labour <mailto:piman@chromium.org>
wrote:
> 
> >
> >
> > On Thu, Dec 17, 2015 at 3:31 PM, <mailto:dyen@chromium.org> wrote:
> >
> >> On 2015/12/17 06:16:39, piman (in KST timezone) wrote:
> >>
> >> On Thu, Dec 17, 2015 at 6:13 AM, <mailto:dyen@chromium.org> wrote:
> >>>
> >>
> >> > On 2015/12/16 18:08:56, commit-bot: I haz the power wrote:
> >>> >
> >>> >> Try jobs failed on following builders:
> >>> >>    linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED,
> >>> >>
> >>> >
> >>> >
> >>> >
> >>>
> >>
> >>
> >>
>
http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...
> >>
> >>> > )
> >>> >
> >>> > So a couple of issues I've found here with VerifySyncTokensCHROMIUM():
> >>> >
> >>> > 1. This is most likely command buffer implementation dependent (it was
> >>> the
> >>> > case
> >>> > for mojo) but I had to actually generate a fence sync command in order
> >>> to
> >>> > be
> >>> > sure a flush actually does something. For that reason I've changed
> >>> > VerifySyncTokensCHROMIUM() to call InsertFenceSyncCHROMIUM and
> >>> > GenSyncTokenCHROMIUM() instead of calling gpu_control calls directly.
> >>> >
> >>>
> >>
> >> I'd like to understand this "flush actually does something". Is it because
> >>> of the last_put_offset_ == put_offset early exit? That shouldn't be a
> >>> problem - it would only trigger if no commands were inserted since the
> >>> last
> >>> flush. For mojo, it may trigger even without an actual Flush, because
> >>> OrderingBarrier isn't optimized and behaves exactly like a Flush, but the
> >>> result is the same - when Flush returns, it's guaranteed that an IPC has
> >>> been sent that makes the commands up to put_offset visible to the service
> >>> side; it doesn't matter whether that IPC was sent from the Flush itself,
> >>> or
> >>> some time in the past. In either case it's valid to do the round-trip,
> >>> and
> >>> that will satisfy the guarantee that the service side has seen that IPC.
> >>>
> >>
> >> Doing an extra InsertFenceSyncCHROMIUM basically forces an extra command,
> >>> which forces an extra flush, but I don't think it's needed because of the
> >>> above - or maybe I missed something, but I would like to understand it
> >>> (and
> >>> it probably needs to be documented).
> >>>
> >>
> >>
> >> Yeah the extra command is not really needed, the only thing that is
> >> needed is to
> >> flush any pending ordering barriers and issue a synchronous IPC to verify
> >> the
> >> sync token. Guaranteeing a command is in the buffer to be flushed allows
> >> the
> >> rest of the system do the validation. This seems like the easiest way
> >> apart from
> >> writing yet another gpu_control function to handle this case.
> >
> >
> > But why is Flush() not enough?
> >
> 
> To be clear:
> - on mojo, Flush() is enough to guarantee pending ordering barriers are
> flushed, because they're always flushed immediately anyway (OrderingBarrier
> == Flush there).
> - on CommandBufferProxyImpl/GpuChannelHost, Flush will also provide the
> guarantee:
>   - there's only at most one OrderingBarrier pending
>   - if the OrderingBarrier is from another command buffer, it will be
> flushed before doing anything
>   - if the OrderingBarrier is from the same command buffer, then
> last_barrier_put_offset_ > last_put_offset_ (otherwise we wouldn't have
> kept it), so put_offset >=  last_barrier_put_offset_ > last_put_offset_, so
> we will flush.
> 
> Antoine
> 

Yes the flush is enough to ensure there are no pending ordering barriers, that's
not the issue here. The issue is forcing a synchronous IPC to happen even if
there are no fence syncs to be validated on this stream.

So in order to force the synchronous IPC the code inserts a dummy fence sync,
flushes, and validates that dummy fence sync. The issue is if flush didn't do
anything, it does not mark the dummy fence sync as being flushed, and the
validation also does not do anything because there are no flushed fence syncs
waiting to be validated.

Another option we can do is to add a function in gpu_control which just issues a
synchronous iPC, then here we would just call flush() and the new function. I am
not sure if it's worth adding the new function just for this use case though,
but I can understand if you don't want to cause an unnecessary flush.

David Yen

On 2015/12/17 15:40:23, David Yen wrote: > On 2015/12/17 06:57:00, piman (in KST timezone) wrote: ...

4 years, 11 months ago (2016-01-05 21:17:30 UTC) #28

On 2015/12/17 15:40:23, David Yen wrote:
> On 2015/12/17 06:57:00, piman (in KST timezone) wrote:
> > On Thu, Dec 17, 2015 at 3:44 PM, Antoine Labour <mailto:piman@chromium.org>
> wrote:
> > 
> > >
> > >
> > > On Thu, Dec 17, 2015 at 3:31 PM, <mailto:dyen@chromium.org> wrote:
> > >
> > >> On 2015/12/17 06:16:39, piman (in KST timezone) wrote:
> > >>
> > >> On Thu, Dec 17, 2015 at 6:13 AM, <mailto:dyen@chromium.org> wrote:
> > >>>
> > >>
> > >> > On 2015/12/16 18:08:56, commit-bot: I haz the power wrote:
> > >>> >
> > >>> >> Try jobs failed on following builders:
> > >>> >>    linux_chromium_rel_ng on tryserver.chromium.linux (JOB_FAILED,
> > >>> >>
> > >>> >
> > >>> >
> > >>> >
> > >>>
> > >>
> > >>
> > >>
> >
>
http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...
> > >>
> > >>> > )
> > >>> >
> > >>> > So a couple of issues I've found here with VerifySyncTokensCHROMIUM():
> > >>> >
> > >>> > 1. This is most likely command buffer implementation dependent (it was
> > >>> the
> > >>> > case
> > >>> > for mojo) but I had to actually generate a fence sync command in order
> > >>> to
> > >>> > be
> > >>> > sure a flush actually does something. For that reason I've changed
> > >>> > VerifySyncTokensCHROMIUM() to call InsertFenceSyncCHROMIUM and
> > >>> > GenSyncTokenCHROMIUM() instead of calling gpu_control calls directly.
> > >>> >
> > >>>
> > >>
> > >> I'd like to understand this "flush actually does something". Is it
because
> > >>> of the last_put_offset_ == put_offset early exit? That shouldn't be a
> > >>> problem - it would only trigger if no commands were inserted since the
> > >>> last
> > >>> flush. For mojo, it may trigger even without an actual Flush, because
> > >>> OrderingBarrier isn't optimized and behaves exactly like a Flush, but
the
> > >>> result is the same - when Flush returns, it's guaranteed that an IPC has
> > >>> been sent that makes the commands up to put_offset visible to the
service
> > >>> side; it doesn't matter whether that IPC was sent from the Flush itself,
> > >>> or
> > >>> some time in the past. In either case it's valid to do the round-trip,
> > >>> and
> > >>> that will satisfy the guarantee that the service side has seen that IPC.
> > >>>
> > >>
> > >> Doing an extra InsertFenceSyncCHROMIUM basically forces an extra command,
> > >>> which forces an extra flush, but I don't think it's needed because of
the
> > >>> above - or maybe I missed something, but I would like to understand it
> > >>> (and
> > >>> it probably needs to be documented).
> > >>>
> > >>
> > >>
> > >> Yeah the extra command is not really needed, the only thing that is
> > >> needed is to
> > >> flush any pending ordering barriers and issue a synchronous IPC to verify
> > >> the
> > >> sync token. Guaranteeing a command is in the buffer to be flushed allows
> > >> the
> > >> rest of the system do the validation. This seems like the easiest way
> > >> apart from
> > >> writing yet another gpu_control function to handle this case.
> > >
> > >
> > > But why is Flush() not enough?
> > >
> > 
> > To be clear:
> > - on mojo, Flush() is enough to guarantee pending ordering barriers are
> > flushed, because they're always flushed immediately anyway (OrderingBarrier
> > == Flush there).
> > - on CommandBufferProxyImpl/GpuChannelHost, Flush will also provide the
> > guarantee:
> >   - there's only at most one OrderingBarrier pending
> >   - if the OrderingBarrier is from another command buffer, it will be
> > flushed before doing anything
> >   - if the OrderingBarrier is from the same command buffer, then
> > last_barrier_put_offset_ > last_put_offset_ (otherwise we wouldn't have
> > kept it), so put_offset >=  last_barrier_put_offset_ > last_put_offset_, so
> > we will flush.
> > 
> > Antoine
> > 
> 
> Yes the flush is enough to ensure there are no pending ordering barriers,
that's
> not the issue here. The issue is forcing a synchronous IPC to happen even if
> there are no fence syncs to be validated on this stream.
> 
> So in order to force the synchronous IPC the code inserts a dummy fence sync,
> flushes, and validates that dummy fence sync. The issue is if flush didn't do
> anything, it does not mark the dummy fence sync as being flushed, and the
> validation also does not do anything because there are no flushed fence syncs
> waiting to be validated.
> 
> Another option we can do is to add a function in gpu_control which just issues
a
> synchronous iPC, then here we would just call flush() and the new function. I
am
> not sure if it's worth adding the new function just for this use case though,
> but I can understand if you don't want to cause an unnecessary flush.

I've modified the function to use EnsureWorkVisible() now so we do not have to
use a dummy fence sync anymore. This saves us a flush if nothing needs to be
flushed. PTAL.

David Yen

The patchset sent to the CQ was uploaded after l-g-t-m from piman@chromium.org Link to the ...

4 years, 11 months ago (2016-01-05 22:50:43 UTC) #31

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1479673003/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1479673003/140001

4 years, 11 months ago (2016-01-05 22:51:49 UTC) #32

commit-bot: I haz the power

Description was changed from ========== Verify resource provider sync tokens before sending to parent. R=piman@chromium.org ...

4 years, 11 months ago (2016-01-06 00:52:17 UTC) #34

commit-bot: I haz the power

4 years, 11 months ago (2016-01-06 00:52:18 UTC) #35

Message was sent while issue was closed.

Patchset 8 (id:??) landed as
https://crrev.com/882d69d97b5c43b7579bc610722dbadd328ac63e
Cr-Commit-Position: refs/heads/master@{#367727}

Issue 1479673003: Verify resource provider sync tokens before sending to parent. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Verify new sync token in mailbox holder #

Patch Set 3 : Insert sync token after transfer all resources again #

Patch Set 4 : rebase #

Patch Set 5 : Fix verify sync token chromium order #

Patch Set 6 : No longer insert dummy fence syncs, just use EnsureWorkVisible() #

Patch Set 7 : Fixed merge error #

Patch Set 8 : Fixed unit test so it doesn't expect dummy fence sync #

Messages