|
|
DescriptionSchedulerWorker Refcounting for Destruction in Production
The upcoming work for one dedicated thread per SingleTaskTaskRunner
requires the ability to clean up SchedulerWorker. Additionally
dynamically ramping up and down SchedulerWorkers at runtime may also
utilize this work.
This change introduces refcounting and
SchedulerWorker::RequestThreadTermination() so that a caller may safely
release a SchedulerWorker at runtime.
The expected usage is
scoped_refptr<SchedulerWorker> worker_ = /* Existing Worker */
worker_->Cleanup();
worker_ = nullptr;
Done!
Reference Change:
https://codereview.chromium.org/2686593003/
BUG=684080
Review-Url: https://codereview.chromium.org/2692863012
Cr-Commit-Position: refs/heads/master@{#452686}
Committed: https://chromium.googlesource.com/chromium/src/+/bcc569081c8c449f98da7d1130bc2bcf5efecf7d
Patch Set 1 #
Total comments: 12
Patch Set 2 : CR Feedback #
Total comments: 8
Patch Set 3 : AtomicFlag #Patch Set 4 : Fix Up Comments and Description #
Total comments: 28
Patch Set 5 : CR Feedback #Patch Set 6 : CR Feedback #
Total comments: 6
Patch Set 7 : CR Feedback #Patch Set 8 : Remove Last Vestiges of std::unique_ptr SchedulerWorker #
Dependent Patchsets: Messages
Total messages: 65 (44 generated)
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
robliao@chromium.org changed reviewers: + fdoray@chromium.org, gab@chromium.org
Here's what life looks like with refcounting. I decided to keep SchedulerWorker::Thread as is with the following lifetime behaviors: Join - SchedulerWorker cleans up Anything other termination, the Thread cleans up. One downside of SchedulerWorker refcounted is that there's not a way to assert that dev has performed a termination action (Join or RequestTermination). However, it is very easy to guarantee the lifetime of everything SchedulerWorker::Thread needs.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.cc:110: if (!detached_thread) Why is it important to take ownership of |this| here? To break the ownership cycle between SchedulerWorker and Thread? -- : ref ptr == : unique_ptr Before this line: --> SchedulerWorker ==> Thread ^ | |______________| After this line: --> SchedulerWorker Thread <== stack ^ | |_______________| Wouldn't it be simpler to release our reference to SchedulerWorker? --> SchedulerWorker ==> Thread Oh... just realized that it would prevent you from DCHECK(!thread_); in ~SchedulerWorker. I prefer removing that DCHECK and doing what I said above than keeping the DCHECK. https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.cc:252: void SchedulerWorker::RequestThreadTermination() { Optional (I'm not convinced it's a perfect idea): Change this method to: // static SchedulerWorker::RequestThreadTermination( scoped_refptr<SchedulerWorker> worker) { AutoSchedulerLock auto_lock(worker->thread_lock_); worker->should_exit_.Set(); // At the end of this scope, there should be at // most one reference to |worker|, owned by // |worker->thread_|. } And add this at the end of ThreadMain(): AutoSchedulerLock auto_lock(outer->thread_lock_); DCHECK(!outer_->should_exit_.IsSet() || outer_->HasOneRef()); This enforces that no external reference to SchedulerWorker is kept after RequestThreadTermination(). https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.cc:252: void SchedulerWorker::RequestThreadTermination() { Change the name to Cleanup? "Request thread termination" might let the reader think that the SchedulerWorker can be reused after calling this. https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.h:165: bool should_exit_ = false; Since it is only set once [1], the fact that it may be set from any thread shouldn't be a problem. It doesn't have to be set from the sequence from which it was constructed https://cs.chromium.org/chromium/src/base/synchronization/atomic_flag.cc?sq=p... [1] Enforced by the DHCECK(!should_exit_) in ShouldExit(). https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_unittest.cc:393: void ResetState() { Reset all members in order of declaration? https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_unittest.cc:485: ; extra ;
Patchset #2 (id:20001) has been deleted
Thanks for the comments! https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.cc:110: if (!detached_thread) On 2017/02/17 16:43:41, fdoray wrote: > Why is it important to take ownership of |this| here? To break the ownership > cycle between SchedulerWorker and Thread? > > -- : ref ptr > == : unique_ptr > > Before this line: > > --> SchedulerWorker ==> Thread > ^ | > |______________| > > After this line: > > --> SchedulerWorker Thread <== stack > ^ | > |_______________| > > Wouldn't it be simpler to release our reference to SchedulerWorker? > > --> SchedulerWorker ==> Thread > > Oh... just realized that it would prevent you from DCHECK(!thread_); in > ~SchedulerWorker. I prefer removing that DCHECK and doing what I said above than > keeping the DCHECK. My lifetime model was that the Thread object would not outlive the thread: Either it's cleared after a Join or cleared after Termination. Clearing outer_ seemed like a roundabout way of doing it. We already have to do this for Detach so it seemed natural to do it here too. The DCHECK is a nice safeguard to make sure we don't orphan the thread if SchedulerWorker somehow goes away. https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.cc:252: void SchedulerWorker::RequestThreadTermination() { On 2017/02/17 16:43:41, fdoray wrote: > Change the name to Cleanup? > > "Request thread termination" might let the reader think that the SchedulerWorker > can be reused after calling this. Renaming to Cleanup sounds good. The intent of the DCHECK(thread_) call in the unique_ptr world is to make sure we did not accidentally orphan the Thread object by releasing SchedulerWorker too early, forcing the caller to either cleanup or join. Holding a reference to the SchedulerWorker afterwards is fine. It's releasing it without calling Join or Cleanup. In this case, if callers set their pointers to nullptr without a Join or Cleanup, the SchedulerWorker is orphaned. If the SchedulerWorker is asleep, the worker itself can't assert that something bad has happened! The unique_ptr + refcounted scheme did cover this as folks declare that they're done with the worker by destroying it. We could do something similar by running our own AddRef() and Release() show. When the ref hits one, we can make sure that either thread_ is nullptr or that someone has called Join or Cleanup(). Perhaps even more elegantly, we could just request termination at this step. If this doesn't sound too crazy, I'll go with that. https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.h:165: bool should_exit_ = false; On 2017/02/17 16:43:41, fdoray wrote: > Since it is only set once [1], the fact that it may be set from any thread > shouldn't be a problem. It doesn't have to be set from the sequence from which > it was constructed > https://cs.chromium.org/chromium/src/base/synchronization/atomic_flag.cc?sq=p... > > [1] Enforced by the DHCECK(!should_exit_) in ShouldExit(). Gotcha. I must have misread something somewhere as an earlier iteration did assert here. https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_unittest.cc:393: void ResetState() { On 2017/02/17 16:43:41, fdoray wrote: > Reset all members in order of declaration? Done. https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_unittest.cc:485: ; On 2017/02/17 16:43:41, fdoray wrote: > extra ; Done.
lgtm https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.h:163: bool should_exit_ = false; Ping change to AtomicFlag? So you don't have to acquire |thread_lock_| at every loop iteration.
https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.h:163: bool should_exit_ = false; On 2017/02/17 20:28:04, fdoray wrote: > Ping change to AtomicFlag? > > So you don't have to acquire |thread_lock_| at every loop iteration. Actually we do. Threads A and B. A: Cleanup(), set should_exit_ B: Sees should_exit_. Grabs thread_, destroys and terminates. A: thread_->Wakeup(). Crash!
On 2017/02/17 20:38:58, robliao wrote: > https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... > File base/task_scheduler/scheduler_worker.h (right): > > https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... > base/task_scheduler/scheduler_worker.h:163: bool should_exit_ = false; > On 2017/02/17 20:28:04, fdoray wrote: > > Ping change to AtomicFlag? > > > > So you don't have to acquire |thread_lock_| at every loop iteration. > > Actually we do. > Threads A and B. > > A: Cleanup(), set should_exit_ > B: Sees should_exit_. Grabs thread_, destroys and terminates. > A: thread_->Wakeup(). Crash! You have to acquire |thread_lock_| in SchedulerWorker::Cleanup() but not in ShouldExit() if you use an AtomicFlag for should_exit_.
Patchset #3 (id:60001) has been deleted
Patchset #3 (id:80001) has been deleted
> You have to acquire |thread_lock_| in SchedulerWorker::Cleanup() but not in ShouldExit() if you use an AtomicFlag for should_exit_. Ah, you're right! Done. https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker.cc:252: void SchedulerWorker::RequestThreadTermination() { On 2017/02/17 19:24:36, robliao wrote: > On 2017/02/17 16:43:41, fdoray wrote: > > Change the name to Cleanup? > > > > "Request thread termination" might let the reader think that the > SchedulerWorker > > can be reused after calling this. > > Renaming to Cleanup sounds good. > > The intent of the DCHECK(thread_) call in the unique_ptr world is to make sure > we did not accidentally orphan the Thread object by releasing SchedulerWorker > too early, forcing the caller to either cleanup or join. > > Holding a reference to the SchedulerWorker afterwards is fine. It's releasing it > without calling Join or Cleanup. > > In this case, if callers set their pointers to nullptr without a Join or > Cleanup, the SchedulerWorker is orphaned. > > If the SchedulerWorker is asleep, the worker itself can't assert that something > bad has happened! > > The unique_ptr + refcounted scheme did cover this as folks declare that they're > done with the worker by destroying it. > > We could do something similar by running our own AddRef() and Release() show. > When the ref hits one, we can make sure that either thread_ is nullptr or that > someone has called Join or Cleanup(). Perhaps even more elegantly, we could just > request termination at this step. > > If this doesn't sound too crazy, I'll go with that. > Do we want to pursue a custom AddRef/Release?
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
So in this CL the refcount never goes above one still, right? (i.e. this will only happen in DestroyOnDetachment() CL?) lgtm (didn't look at test updates -- riding Francois' review for that :)) https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.cc:110: if (!detached_thread) Add comment https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.h:129: // All further method calls are undefined after this calling this function. "this calling this", but I'd just say last sentence: // The caller is expected to release its ref after calling this (all further method calls are undefined after calling this). actually, your example in CL desc is good, it should be in the code. https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.h:130: void Cleanup(); Need to update CL description? (i.e. not RequestThreadTermination())
Description was changed from ========== SchedulerWorker Refcounting for Destruction in Production The upcoming work for one dedicated thread per SingleTaskTaskRunner requires the ability to clean up SchedulerWorker. Additionally dynamically ramping up and down SchedulerWorkers at runtime may also utilize this work. This change introduces refcounting and SchedulerWorker::RequestThreadTermination() so that a caller may safely release a SchedulerWorker at runtime. The expected usage is scoped_refptr<SchedulerWorker> worker_ = /* Existing Worker */ worker_->RequestThreadTermination; worker_ = nullptr; Done! Reference Change: https://codereview.chromium.org/2686593003/ BUG=684080 ========== to ========== SchedulerWorker Refcounting for Destruction in Production The upcoming work for one dedicated thread per SingleTaskTaskRunner requires the ability to clean up SchedulerWorker. Additionally dynamically ramping up and down SchedulerWorkers at runtime may also utilize this work. This change introduces refcounting and SchedulerWorker::RequestThreadTermination() so that a caller may safely release a SchedulerWorker at runtime. The expected usage is scoped_refptr<SchedulerWorker> worker_ = /* Existing Worker */ worker_->Cleanup(); worker_ = nullptr; Done! Reference Change: https://codereview.chromium.org/2686593003/ BUG=684080 ==========
Patchset #4 (id:120001) has been deleted
Patchset #4 (id:140001) has been deleted
> So in this CL the refcount never goes above one still, right? (i.e. this will only happen in DestroyOnDetachment() CL?) The refcount will be above one. 0 = Object should be destroyed 1 = The thread is terminating or the thread is already detached 2+ = The environment and the thread have references. https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.cc:110: if (!detached_thread) On 2017/02/17 21:38:35, gab wrote: > Add comment Done. https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.h:129: // All further method calls are undefined after this calling this function. On 2017/02/17 21:38:35, gab wrote: > "this calling this", but I'd just say last sentence: > > // The caller is expected to release its ref after calling this (all further > method calls are undefined after calling this). > > actually, your example in CL desc is good, it should be in the code. Done and done. https://codereview.chromium.org/2692863012/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker.h:130: void Cleanup(); On 2017/02/17 21:38:35, gab wrote: > Need to update CL description? (i.e. not RequestThreadTermination()) Done.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Took a fresh peak :). lvg, just a few things with tests https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (left): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:296: thread_ = Thread::Create(this); Thread::Create(scoped_refptr<SchedulerWorker>(this)); (scoped_refptr isn't currently explicit but ideally would be -- this is what I was missing in first pass to realize there were two refs around) https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:111: // in Join(), we go ahead and grab the thread object here. We don't want to What makes Join() not hit this code? https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:256: // should_exit_ is synchronized with thread_ for writes here so that we can |should_exit_| |thread_| https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:257: // maintain access |thread_| for wakeup. Otherwise, the thread may take away "maintain alive" or just "access" ? https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.h:99: // is enabled. Mention that Cleanup() must be called before releasing the obtained ref. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:469: // ControllableDetachDelegate: This usually means that it implements that interface which isn't the case here. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:470: scoped_refptr<Controls> controls() { return controls_; } Does this need to expose the ref (usage seems to non-owning?). If not, can the whole class not be refcounted? https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:604: controls->set_can_detach(true); What's the point of setting it after? (I guess I also don't understand why |can_detach_| is saved before the wait and used as the return value?) https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:675: // While Start blocks until the thread starts, that doesn't cover the actual Start no longer blocks until thread starts (Francois just changed that -- or did that land? anyways it's coming!)
On 2017/02/21 19:01:05, gab wrote: > Took a fresh peak :). lvg, just a few things with tests So good it's at the peak ;)... s/peak/peek! > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > File base/task_scheduler/scheduler_worker.cc (left): > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker.cc:296: thread_ = Thread::Create(this); > Thread::Create(scoped_refptr<SchedulerWorker>(this)); > > (scoped_refptr isn't currently explicit but ideally would be -- this is what I > was missing in first pass to realize there were two refs around) > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > File base/task_scheduler/scheduler_worker.cc (right): > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker.cc:111: // in Join(), we go ahead and grab > the thread object here. We don't want to > What makes Join() not hit this code? > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker.cc:256: // should_exit_ is synchronized > with thread_ for writes here so that we can > |should_exit_| > > |thread_| > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker.cc:257: // maintain access |thread_| for > wakeup. Otherwise, the thread may take away > "maintain alive" or just "access" ? > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > File base/task_scheduler/scheduler_worker.h (right): > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker.h:99: // is enabled. > Mention that Cleanup() must be called before releasing the obtained ref. > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > File base/task_scheduler/scheduler_worker_unittest.cc (right): > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_unittest.cc:469: // > ControllableDetachDelegate: > This usually means that it implements that interface which isn't the case here. > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_unittest.cc:470: scoped_refptr<Controls> > controls() { return controls_; } > Does this need to expose the ref (usage seems to non-owning?). If not, can the > whole class not be refcounted? > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_unittest.cc:604: > controls->set_can_detach(true); > What's the point of setting it after? (I guess I also don't understand why > |can_detach_| is saved before the wait and used as the return value?) > > https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_unittest.cc:675: // While Start blocks > until the thread starts, that doesn't cover the actual > Start no longer blocks until thread starts (Francois just changed that -- or did > that land? anyways it's coming!)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
Thanks for the comments! https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (left): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:296: thread_ = Thread::Create(this); On 2017/02/21 19:01:04, gab wrote: > Thread::Create(scoped_refptr<SchedulerWorker>(this)); > > (scoped_refptr isn't currently explicit but ideally would be -- this is what I > was missing in first pass to realize there were two refs around) Went with make_scoped_refptr https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:111: // in Join(), we go ahead and grab the thread object here. We don't want to On 2017/02/21 19:01:04, gab wrote: > What makes Join() not hit this code? Join() nulls out |thread_| which means DetachThreadObject() will return nullptr. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:256: // should_exit_ is synchronized with thread_ for writes here so that we can On 2017/02/21 19:01:04, gab wrote: > |should_exit_| > > |thread_| Done. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:257: // maintain access |thread_| for wakeup. Otherwise, the thread may take away On 2017/02/21 19:01:04, gab wrote: > "maintain alive" or just "access" ? Lost a preposition there. Added a "to" -> maintain access to |thread_| https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.h (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.h:99: // is enabled. On 2017/02/21 19:01:04, gab wrote: > Mention that Cleanup() must be called before releasing the obtained ref. Done. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:469: // ControllableDetachDelegate: On 2017/02/21 19:01:04, gab wrote: > This usually means that it implements that interface which isn't the case here. This was habit for me working in C++ interfaces for me so you could identify which methods aren't part of any interface. Otherwise they all blend together and you have no idea! Any strong preference to removing this? There doesn't appear to be a style guideline either way. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:470: scoped_refptr<Controls> controls() { return controls_; } On 2017/02/21 19:01:04, gab wrote: > Does this need to expose the ref (usage seems to non-owning?). If not, can the > whole class not be refcounted? This needs to be owned for cases where we race between Join/Shutdown and Cleanup. In these cases, the delegate can be destroyed at anytime. Refcounting the controls addresses this and is robust to whatever happens to the delegate. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:604: controls->set_can_detach(true); On 2017/02/21 19:01:04, gab wrote: > What's the point of setting it after? (I guess I also don't understand why > |can_detach_| is saved before the wait and used as the return value?) In this case, we just want to wait for a CanDetach() serviced at least once with CanDetach() => false, forcing us to go into the Wait (there isn't a Wait callback (coming attractions for COM). https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:675: // While Start blocks until the thread starts, that doesn't cover the actual On 2017/02/21 19:01:04, gab wrote: > Start no longer blocks until thread starts (Francois just changed that -- or did > that land? anyways it's coming!) SimpleThread::Start() waits. Thread::Start() doesn't wait.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromeos_amd64-generic_chromium_compile_only_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) chromium_presubmit on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL) linux_chromium_tsan_rel_ng on master.tryserver.chromium.linux (JOB_TIMED_OUT, no build URL)
https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:111: // in Join(), we go ahead and grab the thread object here. We don't want to On 2017/02/21 22:26:57, robliao wrote: > On 2017/02/21 19:01:04, gab wrote: > > What makes Join() not hit this code? > > Join() nulls out |thread_| which means DetachThreadObject() will return nullptr. I see and JoinForTesting() also checks for null |thread_| w/ lock before Join() in case the thread somehow terminated itself already (potentially noticing the join_called_for_testing event before Join() was invoked in which case the detach below will be the one reclaiming the Thread object). How about the following: // Silently make sure the Thread object was reclaimed. // Cases where this no-ops: // - |detached_thread != nullptr|: Thread was explicitly detached above. // - JoinForTesting() reclaimed the Thread object. // Cases where this reclaims the Thread object: // - Main loop exited per IsShutdownComplete() or Cleanup(). // - Main loop exited per JoinForTesting() and this point was reached before // JoinForTesting() could even reclaim the Thread object (that's okay: no-op // join). outer_->DetachThreadObject(DetachNotify::SILENT); (note: don't even bother with conditional or assigning to variable -- just call DetachThreadObject(DetachNotify::SILENT) and let it do the right thing) https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:327: task_tracker_->IsShutdownComplete() || join_called_for_testing_.IsSet(); inline below https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:332: // on its own to perform changes to thread_. |thread_| https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:333: return should_exit_.IsSet(); It's unclear why this is checked separately now (it used to be because it required a lock). merge as return X || Y || Z;? https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:675: // While Start blocks until the thread starts, that doesn't cover the actual On 2017/02/21 22:26:57, robliao wrote: > On 2017/02/21 19:01:04, gab wrote: > > Start no longer blocks until thread starts (Francois just changed that -- or > did > > that land? anyways it's coming!) > > SimpleThread::Start() waits. Ah, found it: https://codereview.chromium.org/2664953004/ (was reverted :( ). SimpleThread::Start() shouldn't block either (and the reason is exactly as your comment here describes -- that serves no purpose as it doesn't guarantee Run() was invoked when it unblocks). Hence I'd prefer to rephrase this comment to avoid making an explicit statement about SimpleThread::Start()'s behavior. Perhaps just: Why not add an event that is signaled by the worker thread right before it calls JoinForTesting()? That explicitly highlights what you want (have a race in this test which the impl has to handle). > > Thread::Start() doesn't wait.
Patchset #6 (id:200001) has been deleted
https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:111: // in Join(), we go ahead and grab the thread object here. We don't want to On 2017/02/22 18:02:23, gab wrote: > On 2017/02/21 22:26:57, robliao wrote: > > On 2017/02/21 19:01:04, gab wrote: > > > What makes Join() not hit this code? > > > > Join() nulls out |thread_| which means DetachThreadObject() will return > nullptr. > > I see and JoinForTesting() also checks for null |thread_| w/ lock before Join() > in case the thread somehow terminated itself already (potentially noticing the > join_called_for_testing event before Join() was invoked in which case the detach > below will be the one reclaiming the Thread object). How about the following: > > // Silently make sure the Thread object was reclaimed. > // Cases where this no-ops: > // - |detached_thread != nullptr|: Thread was explicitly detached above. > // - JoinForTesting() reclaimed the Thread object. > // Cases where this reclaims the Thread object: > // - Main loop exited per IsShutdownComplete() or Cleanup(). > // - Main loop exited per JoinForTesting() and this point was reached before > // JoinForTesting() could even reclaim the Thread object (that's okay: no-op > // join). > outer_->DetachThreadObject(DetachNotify::SILENT); > > (note: don't even bother with conditional or assigning to variable -- just call > DetachThreadObject(DetachNotify::SILENT) and let it do the right thing) I've enumerated the conditions, but we should keep the check should keep this check for robustness (it basically costs us nothing and buys us insurance against future crashes). It is unexpected that a call to DetachThreadObject would destroy the this pointer during the execution of ThreadMain. If the thread detached, the this pointer would get destroyed after the statement completes. Calling without assignment would result in the same destruction during execution. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:327: task_tracker_->IsShutdownComplete() || join_called_for_testing_.IsSet(); On 2017/02/22 18:02:24, gab wrote: > inline below Done. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:332: // on its own to perform changes to thread_. On 2017/02/22 18:02:24, gab wrote: > |thread_| Removed https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:333: return should_exit_.IsSet(); On 2017/02/22 18:02:23, gab wrote: > It's unclear why this is checked separately now (it used to be because it > required a lock). > > merge as return X || Y || Z;? Done. https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:675: // While Start blocks until the thread starts, that doesn't cover the actual On 2017/02/22 18:02:24, gab wrote: > On 2017/02/21 22:26:57, robliao wrote: > > On 2017/02/21 19:01:04, gab wrote: > > > Start no longer blocks until thread starts (Francois just changed that -- or > > did > > > that land? anyways it's coming!) > > > > SimpleThread::Start() waits. > > Ah, found it: https://codereview.chromium.org/2664953004/ (was reverted :( ). > SimpleThread::Start() shouldn't block either (and the reason is exactly as your > comment here describes -- that serves no purpose as it doesn't guarantee Run() > was invoked when it unblocks). > > Hence I'd prefer to rephrase this comment to avoid making an explicit statement > about SimpleThread::Start()'s behavior. Perhaps just: > > Why not add an event that is signaled by the worker thread right before it calls > JoinForTesting()? That explicitly highlights what you want (have a race in this > test which the impl has to handle). > > > > > Thread::Start() doesn't wait. Went with the event. Amended comment.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm w/ comments https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:118: // * Join: DetachThreadObject returns nullptr. Join cleans up. It's also possible to receive the thread object in Join use case FWIW (i.e. if this thread sees the flag and exits before JoinForTesting() even has a chance to grab the thread). Worth mentioning? Also s/Join/JoinForTesting/ ? https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:688: PlatformThread::YieldCurrentThread(); I'm not convinced yielding is necessary/better. I'd assume most kernels don't preempt a thread that signals (I'd noticed Android seemed to do that but not others -- an "android only" failure it took me days to track down when doing the BrowserThread redirect CL which was actually an all platform race which resolved worse on Android because of this); as such JoinForTesting() has likely already been called (and yielding merely alters the race conditions).
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker.cc (right): https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker.cc:118: // * Join: DetachThreadObject returns nullptr. Join cleans up. On 2017/02/23 19:25:24, gab wrote: > It's also possible to receive the thread object in Join use case FWIW (i.e. if > this thread sees the flag and exits before JoinForTesting() even has a chance to > grab the thread). Worth mentioning? > > Also s/Join/JoinForTesting/ ? Sure, and done. https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:688: PlatformThread::YieldCurrentThread(); On 2017/02/23 19:25:24, gab wrote: > I'm not convinced yielding is necessary/better. I'd assume most kernels don't > preempt a thread that signals (I'd noticed Android seemed to do that but not > others -- an "android only" failure it took me days to track down when doing the > BrowserThread redirect CL which was actually an all platform race which resolved > worse on Android because of this); as such JoinForTesting() has likely already > been called (and yielding merely alters the race conditions). Let's go with PlatformThread::Sleep to hammer the point home.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:662: TEST(TaskSchedulerWorkerTest, WorkerDetachesAndSelfDestroysDuringJoin) { As discussed offline, need to tweak test or adjust name, the code below doesn't "detach".
https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_unittest.cc (right): https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_unittest.cc:662: TEST(TaskSchedulerWorkerTest, WorkerDetachesAndSelfDestroysDuringJoin) { On 2017/02/23 21:52:54, gab wrote: > As discussed offline, need to tweak test or adjust name, the code below doesn't > "detach". Nice. This was here since the first revision. Adjusted to WorkerCleanup*.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm++! On 2017/02/23 22:00:03, robliao wrote: > https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... > File base/task_scheduler/scheduler_worker_unittest.cc (right): > > https://codereview.chromium.org/2692863012/diff/220001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_unittest.cc:662: > TEST(TaskSchedulerWorkerTest, WorkerDetachesAndSelfDestroysDuringJoin) { > On 2017/02/23 21:52:54, gab wrote: > > As discussed offline, need to tweak test or adjust name, the code below > doesn't > > "detach". > > Nice. This was here since the first revision. Adjusted to WorkerCleanup*. Right, sorry for not spotting earlier, I was more looking at a high-level at first, then at logic and just now did I pay attention to the names per my confusion with what a test was doing :)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by robliao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from fdoray@chromium.org Link to the patchset: https://codereview.chromium.org/2692863012/#ps260001 (title: "Remove Last Vestiges of std::unique_ptr SchedulerWorker")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 260001, "attempt_start_ts": 1487895324852350, "parent_rev": "83f9a7b7a698e9cb7b6e03c2ad567ab5d1435d93", "commit_rev": "bcc569081c8c449f98da7d1130bc2bcf5efecf7d"}
Message was sent while issue was closed.
Description was changed from ========== SchedulerWorker Refcounting for Destruction in Production The upcoming work for one dedicated thread per SingleTaskTaskRunner requires the ability to clean up SchedulerWorker. Additionally dynamically ramping up and down SchedulerWorkers at runtime may also utilize this work. This change introduces refcounting and SchedulerWorker::RequestThreadTermination() so that a caller may safely release a SchedulerWorker at runtime. The expected usage is scoped_refptr<SchedulerWorker> worker_ = /* Existing Worker */ worker_->Cleanup(); worker_ = nullptr; Done! Reference Change: https://codereview.chromium.org/2686593003/ BUG=684080 ========== to ========== SchedulerWorker Refcounting for Destruction in Production The upcoming work for one dedicated thread per SingleTaskTaskRunner requires the ability to clean up SchedulerWorker. Additionally dynamically ramping up and down SchedulerWorkers at runtime may also utilize this work. This change introduces refcounting and SchedulerWorker::RequestThreadTermination() so that a caller may safely release a SchedulerWorker at runtime. The expected usage is scoped_refptr<SchedulerWorker> worker_ = /* Existing Worker */ worker_->Cleanup(); worker_ = nullptr; Done! Reference Change: https://codereview.chromium.org/2686593003/ BUG=684080 Review-Url: https://codereview.chromium.org/2692863012 Cr-Commit-Position: refs/heads/master@{#452686} Committed: https://chromium.googlesource.com/chromium/src/+/bcc569081c8c449f98da7d1130bc... ==========
Message was sent while issue was closed.
Committed patchset #8 (id:260001) as https://chromium.googlesource.com/chromium/src/+/bcc569081c8c449f98da7d1130bc... |