|
|
Description[Sync] Signal UIModelWorker to abort on sync shutdown.
This should prevent a deadlock between the sync and UI threads during
shutdown.
BUG=663600
Committed: https://crrev.com/52e8a13c965a4777fa96eb2feb016e82a1baccbf
Cr-Commit-Position: refs/heads/master@{#433628}
Patch Set 1 #Patch Set 2 : keep reference to UIModelWorker #Patch Set 3 : self-review #Patch Set 4 : self-review #
Total comments: 12
Patch Set 5 : CR maxbogue #21 #
Total comments: 2
Messages
Total messages: 34 (26 generated)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...) linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_x86-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_x86-ge...) linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
fdoray@chromium.org changed reviewers: + maxbogue@chromium.org
PTAL
Mostly LG, I just want to fully understand why this works over the approach I was trying Monday. https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker.cc (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.cc:18: class ScopedEventSignalWithUIModelWorker { nit: ScopedEventSignalWithWorker would be plenty descriptive IMO. At some point a name becomes so long it's actually less readable. https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.cc:45: ScopedEventSignalWithUIModelWorker scoped_event_signal_with_ui_model_worker, nit: scoped_event_signal_with_worker https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker.h (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.h:47: base::WaitableEvent stop_requested_; Really this is necessary to prevent the race between the stopped signal being set and the work_done_or_abandoned_ signal being reset, right? I'm trying to see why this works better than checking IsStopped in CallDoWorkAndSignalEvent did. https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker_unittest.cc (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker_unittest.cc:94: base::PlatformThread::Sleep(TestTimeouts::tiny_timeout()); Ooo, interesting. Didn't know we had anything like this. This isn't racy? https://codereview.chromium.org/2505913003/diff/60001/components/sync/engine/... File components/sync/engine/model_safe_worker.cc (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/engine/... components/sync/engine/model_safe_worker.cc:82: bool ModelSafeWorker::IsStopped() { IsStopped() is still below DoWorkAndWaitUntilDone in the header so you can probably revert this file.
PTAnL https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker.cc (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.cc:18: class ScopedEventSignalWithUIModelWorker { On 2016/11/18 18:26:51, maxbogue wrote: > nit: ScopedEventSignalWithWorker would be plenty descriptive IMO. At some point > a name becomes so long it's actually less readable. Done. https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.cc:45: ScopedEventSignalWithUIModelWorker scoped_event_signal_with_ui_model_worker, On 2016/11/18 18:26:51, maxbogue wrote: > nit: scoped_event_signal_with_worker Done. https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker.h (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.h:47: base::WaitableEvent stop_requested_; On 2016/11/18 18:26:51, maxbogue wrote: > Really this is necessary to prevent the race between the stopped signal being > set and the work_done_or_abandoned_ signal being reset, right? I'm trying to see > why this works better than checking IsStopped in CallDoWorkAndSignalEvent did. With your CL: Thread: Action: Sync DoWorkAndWaitUntilDone() sees that the worker isn't stopped UI RequestStop() sets the stopped flag signals the event Sync DoWorkAndWaitUntilDoneImpl() resets the event posts the task waits for it forever With this CL: Thread: Action: Sync DoWorkAndWaitUntilDone() sees that the worker isn't stopped UI RequestStop() sets the stopped flag signals |stop_requested_| - which is never reset Sync DoWorkAndWaitUntilDoneImpl() resets |work_done_or_abandoned_| posts the task WaitMany returns immediately because |stop_requested_| is signaled https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker_unittest.cc (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker_unittest.cc:94: base::PlatformThread::Sleep(TestTimeouts::tiny_timeout()); On 2016/11/18 18:26:51, maxbogue wrote: > Ooo, interesting. Didn't know we had anything like this. This isn't racy? Yes, it's racy. I explained it in the comment. Joining the sync thread within the callback like I asked you to do here https://codereview.chromium.org/2496723003/diff/20001/components/sync/driver/... doesn't work. It causes a write-after-free when |error_info| is accessed here https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... https://codereview.chromium.org/2505913003/diff/60001/components/sync/engine/... File components/sync/engine/model_safe_worker.cc (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/engine/... components/sync/engine/model_safe_worker.cc:82: bool ModelSafeWorker::IsStopped() { On 2016/11/18 18:26:51, maxbogue wrote: > IsStopped() is still below DoWorkAndWaitUntilDone in the header so you can > probably revert this file. Done.
The CQ bit was checked by fdoray@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm, thanks for figuring this out François! https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker.h (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.h:47: base::WaitableEvent stop_requested_; On 2016/11/21 16:57:55, fdoray wrote: > On 2016/11/18 18:26:51, maxbogue wrote: > > Really this is necessary to prevent the race between the stopped signal being > > set and the work_done_or_abandoned_ signal being reset, right? I'm trying to > see > > why this works better than checking IsStopped in CallDoWorkAndSignalEvent did. > > With your CL: > Thread: Action: > Sync DoWorkAndWaitUntilDone() > sees that the worker isn't stopped > UI RequestStop() > sets the stopped flag > signals the event > Sync DoWorkAndWaitUntilDoneImpl() > resets the event > posts the task > waits for it forever > > With this CL: > Thread: Action: > Sync DoWorkAndWaitUntilDone() > sees that the worker isn't stopped > UI RequestStop() > sets the stopped flag > signals |stop_requested_| - which is never reset > Sync DoWorkAndWaitUntilDoneImpl() > resets |work_done_or_abandoned_| > posts the task > WaitMany returns immediately because |stop_requested_| is signaled > Ah ok, so mine tried to make the posted task return immediately, but it might never run or be destroyed before deadlocking, so not waiting for it at all works better? Makes sense. Thanks! https://codereview.chromium.org/2505913003/diff/80001/components/sync/engine/... File components/sync/engine/ui_model_worker.cc (right): https://codereview.chromium.org/2505913003/diff/80001/components/sync/engine/... components/sync/engine/ui_model_worker.cc:56: work_done_or_abandoned_(base::WaitableEvent::ResetPolicy::MANUAL, I suspect this could be AUTOMATIC and you could omit the Reset() now, but I'm not positive.
https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... File components/sync/driver/glue/ui_model_worker.h (right): https://codereview.chromium.org/2505913003/diff/60001/components/sync/driver/... components/sync/driver/glue/ui_model_worker.h:47: base::WaitableEvent stop_requested_; On 2016/11/21 17:17:46, maxbogue wrote: > On 2016/11/21 16:57:55, fdoray wrote: > > On 2016/11/18 18:26:51, maxbogue wrote: > > > Really this is necessary to prevent the race between the stopped signal > being > > > set and the work_done_or_abandoned_ signal being reset, right? I'm trying to > > see > > > why this works better than checking IsStopped in CallDoWorkAndSignalEvent > did. > > > > With your CL: > > Thread: Action: > > Sync DoWorkAndWaitUntilDone() > > sees that the worker isn't stopped > > UI RequestStop() > > sets the stopped flag > > signals the event > > Sync DoWorkAndWaitUntilDoneImpl() > > resets the event > > posts the task > > waits for it forever > > > > With this CL: > > Thread: Action: > > Sync DoWorkAndWaitUntilDone() > > sees that the worker isn't stopped > > UI RequestStop() > > sets the stopped flag > > signals |stop_requested_| - which is never reset > > Sync DoWorkAndWaitUntilDoneImpl() > > resets |work_done_or_abandoned_| > > posts the task > > WaitMany returns immediately because |stop_requested_| is signaled > > > > Ah ok, so mine tried to make the posted task return immediately, but it might > never run or be destroyed before deadlocking, so not waiting for it at all works > better? Makes sense. Thanks! That's right. https://codereview.chromium.org/2505913003/diff/80001/components/sync/engine/... File components/sync/engine/ui_model_worker.cc (right): https://codereview.chromium.org/2505913003/diff/80001/components/sync/engine/... components/sync/engine/ui_model_worker.cc:56: work_done_or_abandoned_(base::WaitableEvent::ResetPolicy::MANUAL, On 2016/11/21 17:17:46, maxbogue wrote: > I suspect this could be AUTOMATIC and you could omit the Reset() now, but I'm > not positive. If it was AUTOMATIC, we would still need a reset when PostTask() fails. Since we don't wait for the event when PostTask() fails, it is not automatically reset. I prefer having a Reset() that runs in all cases than a Reset() just for this special case.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by fdoray@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 80001, "attempt_start_ts": 1479759736448760, "parent_rev": "33c28cb616fddd91a9c70d2e631fbfd009b9f980", "commit_rev": "0396ea72fac03c1ee0706917198c60130df3feeb"}
Message was sent while issue was closed.
Committed patchset #5 (id:80001)
Message was sent while issue was closed.
Description was changed from ========== [Sync] Signal UIModelWorker to abort on sync shutdown. This should prevent a deadlock between the sync and UI threads during shutdown. BUG=663600 ========== to ========== [Sync] Signal UIModelWorker to abort on sync shutdown. This should prevent a deadlock between the sync and UI threads during shutdown. BUG=663600 Committed: https://crrev.com/52e8a13c965a4777fa96eb2feb016e82a1baccbf Cr-Commit-Position: refs/heads/master@{#433628} ==========
Message was sent while issue was closed.
Patchset 5 (id:??) landed as https://crrev.com/52e8a13c965a4777fa96eb2feb016e82a1baccbf Cr-Commit-Position: refs/heads/master@{#433628} |