|
|
Chromium Code Reviews|
Created:
4 years, 1 month ago by Ken Rockot(use gerrit already) Modified:
4 years, 1 month ago Reviewers:
Torne CC:
Aaron Boodman, abarth-chromium, chromium-reviews, darin (slow to review), qsr+mojo_chromium.org, viettrungluu+watch_chromium.org, yzshen+watch_chromium.org Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionMojo EDK: Work around Nexus 9 hardware bug
Adds a memory barrier to port table lookup, which is apparently
sufficient to avoid a nasty Nexus 9 bug.
BUG=665869
Committed: https://crrev.com/bb50f7d41c6aded291aec15a16853a1cf044c275
Cr-Commit-Position: refs/heads/master@{#433291}
Patch Set 1 #Patch Set 2 : fix? #Patch Set 3 : . #
Total comments: 1
Patch Set 4 : . #
Total comments: 2
Patch Set 5 : . #Messages
Total messages: 27 (18 generated)
The CQ bit was checked by rockot@chromium.org to run a CQ dry run
The CQ bit was checked by rockot@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from
==========
Stress test to repro Nexus 9 crash
BUG=665869
==========
to
==========
Mojo EDK: Work around Nexus 9 hardward bug
This makes a redundant copy of the scoped_refptr<Port> used in
OnObserveClosure before acuiring its lock, effectively working
around what appears to be a hardware bug triggered by some
code sequence there.
BUG=665869
==========
rockot@chromium.org changed reviewers: + torne@chromium.org
torne@ mind taking a look at this hack to see if it maybe makes sense?
Description was changed from
==========
Mojo EDK: Work around Nexus 9 hardward bug
This makes a redundant copy of the scoped_refptr<Port> used in
OnObserveClosure before acuiring its lock, effectively working
around what appears to be a hardware bug triggered by some
code sequence there.
BUG=665869
==========
to
==========
Mojo EDK: Work around Nexus 9 hardware bug
This makes a redundant copy of the scoped_refptr<Port> used in
OnObserveClosure before acquiring its lock, effectively working
around what appears to be a hardware bug triggered by some
code sequence there.
BUG=665869
==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2514553002/diff/40001/mojo/edk/system/ports/n... File mojo/edk/system/ports/node.cc (right): https://codereview.chromium.org/2514553002/diff/40001/mojo/edk/system/ports/n... mojo/edk/system/ports/node.cc:647: scoped_refptr<Port> port = closed_port; Doing this here isn't sufficient - there are also crashes occurring in the other functions in this file that call GetPort, so whatever weird hack we introduce should probably be inside GetPort. It's weird that adding another scoped_refptr here works. I guess it's just spreading things out enough that the out-of-order execution engine isn't making whatever mistake it was making any more due to the changed input. You might find that it's more efficient to add a MemoryBarrier() call somewhere instead of making an extra refptr. My comment in the bug about creating the scoped_refptr including a barrier is wrong - incrementing a refcount doesn't use a barrier, only decrementing does, which doesn't happen until after the bug was observed - so adding a barrier might be enough.
The CQ bit was checked by rockot@chromium.org to run a CQ dry run
On 2016/11/18 at 11:59:57, torne wrote: > https://codereview.chromium.org/2514553002/diff/40001/mojo/edk/system/ports/n... > File mojo/edk/system/ports/node.cc (right): > > https://codereview.chromium.org/2514553002/diff/40001/mojo/edk/system/ports/n... > mojo/edk/system/ports/node.cc:647: scoped_refptr<Port> port = closed_port; > Doing this here isn't sufficient - there are also crashes occurring in the other functions in this file that call GetPort, so whatever weird hack we introduce should probably be inside GetPort. > > It's weird that adding another scoped_refptr here works. I guess it's just spreading things out enough that the out-of-order execution engine isn't making whatever mistake it was making any more due to the changed input. > > You might find that it's more efficient to add a MemoryBarrier() call somewhere instead of making an extra refptr. My comment in the bug about creating the scoped_refptr including a barrier is wrong - incrementing a refcount doesn't use a barrier, only decrementing does, which doesn't happen until after the bug was observed - so adding a barrier might be enough. Thanks for looking. I missed that there were other crashes outside of OnObserveClosure, but you're right. It appears to be sufficient to added a barrier within GetPort_Locked.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/11/18 at 15:30:14, Ken Rockot wrote: > On 2016/11/18 at 11:59:57, torne wrote: > > https://codereview.chromium.org/2514553002/diff/40001/mojo/edk/system/ports/n... > > File mojo/edk/system/ports/node.cc (right): > > > > https://codereview.chromium.org/2514553002/diff/40001/mojo/edk/system/ports/n... > > mojo/edk/system/ports/node.cc:647: scoped_refptr<Port> port = closed_port; > > Doing this here isn't sufficient - there are also crashes occurring in the other functions in this file that call GetPort, so whatever weird hack we introduce should probably be inside GetPort. > > > > It's weird that adding another scoped_refptr here works. I guess it's just spreading things out enough that the out-of-order execution engine isn't making whatever mistake it was making any more due to the changed input. > > > > You might find that it's more efficient to add a MemoryBarrier() call somewhere instead of making an extra refptr. My comment in the bug about creating the scoped_refptr including a barrier is wrong - incrementing a refcount doesn't use a barrier, only decrementing does, which doesn't happen until after the bug was observed - so adding a barrier might be enough. > > Thanks for looking. I missed that there were other crashes outside of OnObserveClosure, but you're right. > > It appears to be sufficient to added a barrier within GetPort_Locked. (Also, done)
That's... really interesting that putting a barrier there actually works, because that's *before* the refptr is even constructed on the stack :) Oh well. If that appears to work, we can ship it, it can't hurt. LGTM with a nit: https://codereview.chromium.org/2514553002/diff/60001/mojo/edk/system/ports/n... File mojo/edk/system/ports/node.cc (right): https://codereview.chromium.org/2514553002/diff/60001/mojo/edk/system/ports/n... mojo/edk/system/ports/node.cc:808: #if defined(OS_ANDROID) You could use #if defined(OS_ANDROID) && defined(ARCH_CPU_ARM64) to limit it more specifically.
The CQ bit was checked by rockot@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from
==========
Mojo EDK: Work around Nexus 9 hardware bug
This makes a redundant copy of the scoped_refptr<Port> used in
OnObserveClosure before acquiring its lock, effectively working
around what appears to be a hardware bug triggered by some
code sequence there.
BUG=665869
==========
to
==========
Mojo EDK: Work around Nexus 9 hardware bug
Adds a memory barrier to port table lookup, which is apparently
sufficient to avoid a nasty Nexus 9 bug.
BUG=665869
==========
Yes, strange indeed. Also note that even more curiously, adding a memory barrier *after* the scoped_refptr construction appeared to be insufficient. https://codereview.chromium.org/2514553002/diff/60001/mojo/edk/system/ports/n... File mojo/edk/system/ports/node.cc (right): https://codereview.chromium.org/2514553002/diff/60001/mojo/edk/system/ports/n... mojo/edk/system/ports/node.cc:808: #if defined(OS_ANDROID) On 2016/11/18 at 15:37:57, Torne wrote: > You could use #if defined(OS_ANDROID) && defined(ARCH_CPU_ARM64) to limit it more specifically. Done
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by rockot@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from torne@chromium.org Link to the patchset: https://codereview.chromium.org/2514553002/#ps80001 (title: ".")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Committed patchset #5 (id:80001)
Message was sent while issue was closed.
Description was changed from ========== Mojo EDK: Work around Nexus 9 hardware bug Adds a memory barrier to port table lookup, which is apparently sufficient to avoid a nasty Nexus 9 bug. BUG=665869 ========== to ========== Mojo EDK: Work around Nexus 9 hardware bug Adds a memory barrier to port table lookup, which is apparently sufficient to avoid a nasty Nexus 9 bug. BUG=665869 Committed: https://crrev.com/bb50f7d41c6aded291aec15a16853a1cf044c275 Cr-Commit-Position: refs/heads/master@{#433291} ==========
Message was sent while issue was closed.
Patchset 5 (id:??) landed as https://crrev.com/bb50f7d41c6aded291aec15a16853a1cf044c275 Cr-Commit-Position: refs/heads/master@{#433291} |
