|
|
DescriptionExempt the Service Thread from BLOCK_SHUTDOWN DCHECKs
In production, FileDescriptorWatchers can be leaky and post
BLOCK_SHUTDOWN tasks in response to their notification.
Stopping the service thread requires some non-trivial task lifetime
fixes to do it correctly, so this was deemed the next best quick fix
for redirection.
BUG=717380
Review-Url: https://codereview.chromium.org/2857103005
Cr-Commit-Position: refs/heads/master@{#469461}
Committed: https://chromium.googlesource.com/chromium/src/+/f10490bd9e23a11a75977d6c53e2571248a61900
Patch Set 1 #
Total comments: 10
Patch Set 2 : CR Feedback + Additional DCHECK_IS_ON due to the way DCHECKs are implemented #
Total comments: 15
Patch Set 3 : CR Feedback #
Messages
Total messages: 29 (18 generated)
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Exempt the Service Thread from BLOCK_SHUTDOWN DCHECKs In production, FileDescriptorWatchers can be leaky and post BLOCK_SHUTDOWN tasks in response to their notification. Stopping the service thread requires some non-trivial task lifetime fixes to do it correctly, so this was deemed the next best quick fix for redirection. BUG=717380 ========== to ========== Exempt the Service Thread from BLOCK_SHUTDOWN DCHECKs In production, FileDescriptorWatchers can be leaky and post BLOCK_SHUTDOWN tasks in response to their notification. Stopping the service thread requires some non-trivial task lifetime fixes to do it correctly, so this was deemed the next best quick fix for redirection. BUG=717380 ==========
robliao@chromium.org changed reviewers: + fdoray@chromium.org, gab@chromium.org
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromeos_amd64-...)
lgtm w/ comments on comments and tests https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... File base/task_scheduler/task_scheduler_impl_unittest.cc (right): https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:29: #include "testing/gtest/include/gtest/gtest.h" #include "build/build_config.h" https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:461: scheduler_.PostDelayedTaskWithTraits(FROM_HERE, TaskTraits(), Comment that this is racy or make it non-racy. Post task A to sequence 1. This task waits on a barrier WaitableEvent. Post delayed task B to sequence 1. This task has an ADD_FAILURE(). Shutdown. It is guaranteed that task B has not run yet since it is sequenced with a blocked task. Signal barrier. Wait a little bit and make sure that task B does not run. https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:463: TimeDelta::FromMilliseconds(50)); TestTimeouts::tiny_timeout() here and 2*TestTimeouts::tiny_timeout() below https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_tr... File base/task_scheduler/task_tracker.cc (right): https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_tr... base/task_scheduler/task_tracker.cc:403: // FileDescriptorWatcher. Delayed tasks are fine because they're marked Delayed tasks go through TaskTracker::BeforePostTask() when they are posted, not when their delay expires on the service thread. Therefore, I'm not sure it is necessary to comment on delayed tasks here.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... File base/task_scheduler/task_scheduler_impl_unittest.cc (right): https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:29: #include "testing/gtest/include/gtest/gtest.h" On 2017/05/03 20:34:33, fdoray wrote: > #include "build/build_config.h" Done. https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:461: scheduler_.PostDelayedTaskWithTraits(FROM_HERE, TaskTraits(), On 2017/05/03 20:34:33, fdoray wrote: > Comment that this is racy or make it non-racy. > > Post task A to sequence 1. This task waits on a barrier WaitableEvent. > Post delayed task B to sequence 1. This task has an ADD_FAILURE(). > Shutdown. It is guaranteed that task B has not run yet since it is sequenced > with a blocked task. > Signal barrier. > Wait a little bit and make sure that task B does not run. Seems like the barrier just ensures that a task actually ran, but it doesn't change the race between the timeout and the delayed task. Added a comment. https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:463: TimeDelta::FromMilliseconds(50)); On 2017/05/03 20:34:33, fdoray wrote: > TestTimeouts::tiny_timeout() here and 2*TestTimeouts::tiny_timeout() below Done. https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_tr... File base/task_scheduler/task_tracker.cc (right): https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_tr... base/task_scheduler/task_tracker.cc:403: // FileDescriptorWatcher. Delayed tasks are fine because they're marked On 2017/05/03 20:34:33, fdoray wrote: > Delayed tasks go through TaskTracker::BeforePostTask() when they are posted, not > when their delay expires on the service thread. Therefore, I'm not sure it is > necessary to comment on delayed tasks here. Indeed. I was thinking of the general case. Removed.
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by robliao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from fdoray@chromium.org Link to the patchset: https://codereview.chromium.org/2857103005/#ps20001 (title: "CR Feedback + Additional DCHECK_IS_ON due to the way DCHECKs are implemented")
The CQ bit was unchecked by robliao@chromium.org
https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_tracker.cc (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker.cc:403: // FileDescriptorWatcher. Indicate that while these tasks shouldn't be BLOCK_SHUTDOWN since they have no guarantees of firing in the first place, they often are because they're posted on a sequence that is. This will be solved by sequence funneling. I think the TODO (and documentation in this CL) should be around that more (i.e. I don't think we care to fix it as those tasks aren't truly BLOCK_SHUTDOWN, instead we want to properly post them as SKIP_ON_SHUTDOWN when that becomes possible) https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_tracker_posix.h (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker_posix.h:46: void set_service_thread_handle( Can you set a bit in TLS in TaskTracker instead of using the PlatformThreadHandle? I have a TODO in thread.cc to get rid of |thread_lock_| and would be sad to add a valid use case of it... https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker_posix.h:58: bool IsExemptFromBlockingShutdownChecks() override; Why is this POSIX only?
https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_tracker.cc (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker.cc:403: // FileDescriptorWatcher. On 2017/05/04 17:36:11, gab wrote: > Indicate that while these tasks shouldn't be BLOCK_SHUTDOWN since they have no > guarantees of firing in the first place, they often are because they're posted > on a sequence that is. This will be solved by sequence funneling. > > I think the TODO (and documentation in this CL) should be around that more (i.e. > I don't think we care to fix it as those tasks aren't truly BLOCK_SHUTDOWN, > instead we want to properly post them as SKIP_ON_SHUTDOWN when that becomes > possible) We should keep the extra details in the bug, especially implementation details that may change and outdate any further comments here. This means we should keep the comments here factual on the state of the world when we introduced this code. Sequence funneling will not inherently solve the issue as we still need to fix all of the callers that register these FileDescriptorWatchers. https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_tracker_posix.h (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker_posix.h:46: void set_service_thread_handle( On 2017/05/04 17:36:11, gab wrote: > Can you set a bit in TLS in TaskTracker instead of using the > PlatformThreadHandle? I have a TODO in thread.cc to get rid of |thread_lock_| > and would be sad to add a valid use case of it... That would create a race between starting the thread and shutting down the task tracker. Since the handle is known at start, this avoids the race. https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker_posix.h:58: bool IsExemptFromBlockingShutdownChecks() override; On 2017/05/04 17:36:11, gab wrote: > Why is this POSIX only? FileDescriptorWatchers only exist on POSIX. Delayed tasks aren't subject to this because they go through the task tracker. FileDescriptorWatchers fire and execute directly.
https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_tracker_posix.h (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker_posix.h:46: void set_service_thread_handle( On 2017/05/04 17:43:24, robliao wrote: > On 2017/05/04 17:36:11, gab wrote: > > Can you set a bit in TLS in TaskTracker instead of using the > > PlatformThreadHandle? I have a TODO in thread.cc to get rid of |thread_lock_| > > and would be sad to add a valid use case of it... > > That would create a race between starting the thread and shutting down the task > tracker. Since the handle is known at start, this avoids the race. One more note, this code will go away once we fix the shutdown issue with regards to Service thread, so this legitimate use is temporary.
lgtm w/ comments https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_scheduler_impl_unittest.cc (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_scheduler_impl_unittest.cc:500: controller.release(); inline controller.release() in the leak macro above instead of .get() then .release? https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_scheduler_impl_unittest.cc:510: PlatformThread::Sleep(TimeDelta::FromMilliseconds(100)); TestTimeouts::tiny_timeout() https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_tracker.cc (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker.cc:405: DCHECK(IsExemptFromBlockingShutdownChecks()); The log "DCHECK(!shutdown_event_->IsSignaled());" was previously more obvious and since people hit this occasionally it should remain easy to see what's going on whereas "DCHECK(IsExemptFromBlockingShutdownChecks());" isn't as obvious to the untrained IMO. How about DCHECK(IsPostingBlockShutdownTaskAfterShutdownAllowed()); or something? https://codereview.chromium.org/2857103005/diff/20001/base/threading/thread.h File base/threading/thread.h (right): https://codereview.chromium.org/2857103005/diff/20001/base/threading/thread.h... base/threading/thread.h:251: // This method is thread-safe. TODO(robliao): remove this when it no longer needs to be temporarily exposed for http://crbug.com/717380. ...so its usage doesn't spread like wildfire and we can actually remove it when we fix the current use case.
still lgtm https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... File base/task_scheduler/task_scheduler_impl_unittest.cc (right): https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:461: scheduler_.PostDelayedTaskWithTraits(FROM_HERE, TaskTraits(), On 2017/05/03 21:11:17, robliao wrote: > On 2017/05/03 20:34:33, fdoray wrote: > > Comment that this is racy or make it non-racy. > > > > Post task A to sequence 1. This task waits on a barrier WaitableEvent. > > Post delayed task B to sequence 1. This task has an ADD_FAILURE(). > > Shutdown. It is guaranteed that task B has not run yet since it is sequenced > > with a blocked task. > > Signal barrier. > > Wait a little bit and make sure that task B does not run. > > Seems like the barrier just ensures that a task actually ran, but it doesn't > change the race between the timeout and the delayed task. Added a comment. In the current state, the test may fail if the 50 ms timeout expires before Shutdown() is called. With the change I proposed, this would not be possible. The race between the timeout and the delayed task still exists (the delayed task was gonna run but didn't because the 100 ms delay wasn't long enough), but at least it doesn't lead to a test failure.
Patchset #3 (id:40001) has been deleted
https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... File base/task_scheduler/task_scheduler_impl_unittest.cc (right): https://codereview.chromium.org/2857103005/diff/1/base/task_scheduler/task_sc... base/task_scheduler/task_scheduler_impl_unittest.cc:461: scheduler_.PostDelayedTaskWithTraits(FROM_HERE, TaskTraits(), On 2017/05/04 18:37:32, fdoray wrote: > On 2017/05/03 21:11:17, robliao wrote: > > On 2017/05/03 20:34:33, fdoray wrote: > > > Comment that this is racy or make it non-racy. > > > > > > Post task A to sequence 1. This task waits on a barrier WaitableEvent. > > > Post delayed task B to sequence 1. This task has an ADD_FAILURE(). > > > Shutdown. It is guaranteed that task B has not run yet since it is sequenced > > > with a blocked task. > > > Signal barrier. > > > Wait a little bit and make sure that task B does not run. > > > > Seems like the barrier just ensures that a task actually ran, but it doesn't > > change the race between the timeout and the delayed task. Added a comment. > > In the current state, the test may fail if the 50 ms timeout expires before > Shutdown() is called. With the change I proposed, this would not be possible. > > The race between the timeout and the delayed task still exists (the delayed task > was gonna run but didn't because the 100 ms delay wasn't long enough), but at > least it doesn't lead to a test failure. Discussed offline. Shutdown() would block, which means we would have to release the WaitableEvent on a different thread. Added a comment documenting the decision. https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_scheduler_impl_unittest.cc (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_scheduler_impl_unittest.cc:500: controller.release(); On 2017/05/04 18:11:58, gab wrote: > inline controller.release() in the leak macro above instead of .get() then > .release? Turns out this breaks the test :-( #define ANNOTATE_LEAKING_OBJECT_PTR(X) ((void)0) https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_scheduler_impl_unittest.cc:510: PlatformThread::Sleep(TimeDelta::FromMilliseconds(100)); On 2017/05/04 18:11:58, gab wrote: > TestTimeouts::tiny_timeout() This seems to work, switched. https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... File base/task_scheduler/task_tracker.cc (right): https://codereview.chromium.org/2857103005/diff/20001/base/task_scheduler/tas... base/task_scheduler/task_tracker.cc:405: DCHECK(IsExemptFromBlockingShutdownChecks()); On 2017/05/04 18:11:58, gab wrote: > The log "DCHECK(!shutdown_event_->IsSignaled());" was previously more obvious > and since people hit this occasionally it should remain easy to see what's going > on whereas "DCHECK(IsExemptFromBlockingShutdownChecks());" isn't as obvious to > the untrained IMO. > > How about > > DCHECK(IsPostingBlockShutdownTaskAfterShutdownAllowed()); > > or something? sgtm. Done. https://codereview.chromium.org/2857103005/diff/20001/base/threading/thread.h File base/threading/thread.h (right): https://codereview.chromium.org/2857103005/diff/20001/base/threading/thread.h... base/threading/thread.h:251: // This method is thread-safe. On 2017/05/04 18:11:58, gab wrote: > TODO(robliao): remove this when it no longer needs to be temporarily exposed for > http://crbug.com/717380. > > ...so its usage doesn't spread like wildfire and we can actually remove it when > we fix the current use case. Done.
The CQ bit was checked by robliao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from gab@chromium.org, fdoray@chromium.org Link to the patchset: https://codereview.chromium.org/2857103005/#ps60001 (title: "CR Feedback")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 60001, "attempt_start_ts": 1493924513274670, "parent_rev": "1672250ece2bf96f487093d5b495848c8580fef2", "commit_rev": "f10490bd9e23a11a75977d6c53e2571248a61900"}
Message was sent while issue was closed.
Description was changed from ========== Exempt the Service Thread from BLOCK_SHUTDOWN DCHECKs In production, FileDescriptorWatchers can be leaky and post BLOCK_SHUTDOWN tasks in response to their notification. Stopping the service thread requires some non-trivial task lifetime fixes to do it correctly, so this was deemed the next best quick fix for redirection. BUG=717380 ========== to ========== Exempt the Service Thread from BLOCK_SHUTDOWN DCHECKs In production, FileDescriptorWatchers can be leaky and post BLOCK_SHUTDOWN tasks in response to their notification. Stopping the service thread requires some non-trivial task lifetime fixes to do it correctly, so this was deemed the next best quick fix for redirection. BUG=717380 Review-Url: https://codereview.chromium.org/2857103005 Cr-Commit-Position: refs/heads/master@{#469461} Committed: https://chromium.googlesource.com/chromium/src/+/f10490bd9e23a11a75977d6c53e2... ==========
Message was sent while issue was closed.
Committed patchset #3 (id:60001) as https://chromium.googlesource.com/chromium/src/+/f10490bd9e23a11a75977d6c53e2... |