Created: 4 years, 5 months ago by robliao
Modified: 4 years, 5 months ago
CC: chromium-reviews, gab+watch_chromium.org, robliao+watch_chromium.org, fdoray+watch_chromium.org, danakj
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description: Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool
BUG=553459
Committed: https://crrev.com/8a4029212fd36fdd8befc9479d6e2966bd27b95e
Cr-Commit-Position: refs/heads/master@{#407205}
Patch Set 1 #
Total comments: 41
Patch Set 2 : CR Feedback #
Patch Set 3 : CR Feedback fdoray@ #
Total comments: 9
Patch Set 4 : Test Controlled Detachment and Some Cleanup #
Total comments: 39
Patch Set 5 : CR Feedback gab@ #
Patch Set 6 : Rebase on top of SchedulerWorkerPoolParams #
Patch Set 7 : CR Feedback Continuation #
Total comments: 29
Patch Set 8 : CR Feedback #
Patch Set 9 : Rebase #
Patch Set 10 : CR Feedback #
Total comments: 3
Patch Set 11 : TimeTicks #
Patch Set 12 : Fenced Load #
Total comments: 1
Depends on Patchset:
Messages
Total messages: 67 (33 generated)
robliao@chromium.org changed reviewers: + fdoray@chromium.org, gab@chromium.org
Description was changed from ========== Enable Lazy Creation and Thread Detachment in the Scheduler Worker Pool BUG=553459 ========== to ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ==========
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:525: return suggested_reclaim_time_; From scheduler_worker.cc:

while (...) {
  scoped_refptr<Sequence> sequence = outer_->delegate_->GetWork(outer_);
  if (!sequence) {
    if (outer_->delegate_->CanDetach(outer_)) {
      detached_thread = outer_->Detach();
      ...
    }
    WaitForWork();
    continue;
  }
  ...
}

What prevents the worker from detaching as soon as GetWork() returns nullptr (i.e. without staying idle for |suggested_reclaim_time_|)? https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && Add comment explaining why it doesn't matter if |num_single_threaded_runners_| is incremented right after this (since no task has been posted to the newly created single-threaded TaskRunner, there is no need to keep the current "physical" thread). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:608: if (!idle_workers_stack_.Contains(worker)) A detached worker is only re-created when its WakeUp() method is called, right? WakeUp() can be called from 2 places: A) https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... A single-threaded task is posted to the worker. The next time that GetWork() is called, the worker will be removed from the idle stack because work is available (no other worker can run the single-threaded task). B) https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... A worker is woken up to run a shared task. Right before this is done, the worker is popped from the idle stack. 
I don't see how detachment makes it possible to try to add the same worker to the idle stack twice. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:99: void UnregisterSingleThreadTaskRunner(SchedulerWorker* worker); It would be better if these methods weren't here:
- They aren't meant to be called from outside of scheduler_worker_pool_impl.cc
- They don't access any member of SchedulerWorkerPoolImpl

Could you:
- Delete these methods.
- Make SchedulerSingleThreadTaskRunner a nested class of SchedulerWorkerPoolImpl.
- Call SchedulerWorkerDelegateImpl::RegisterSingleThreadTaskRunner directly from SchedulerSingleThreadTaskRunner.

class SchedulerWorkerPoolImpl::SchedulerSingleThreadTaskRunner {
 public:
  SchedulerSingleThreadTaskRunner(..., SchedulerWorker* worker) {
    auto delegate =
        static_cast<SchedulerWorkerDelegateImpl*>(worker->delegate());
    delegate->RegisterSingleThreadTaskRunner();
  }
  ...
};

https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent instead of a SchedulerLock+bool. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:449: Test that no detachment occurs when suggested reclaim time is TimeDelta::Max? https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might not have reached line 457 yet when Reset() is called. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, Nothing prevents all CountZeroTlsValue tasks from running on the same thread (the Wait task posted at line 522 doesn't have to run on the same thread as the CountZeroTlsValue task). Maybe you should move the Signal+Wait inside CountZeroTlsValue?
lg, a single comment for now and +1 to Francois'. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && Why do you need Release_Load() here?
Thanks for the review! https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:525: return suggested_reclaim_time_; On 2016/07/04 19:41:32, fdoray wrote: > From scheduler_worker.cc: > > while (...) { > scoped_refptr<Sequence> sequence = outer_->delegate_->GetWork(outer_); > if (!sequence) { > if (outer_->delegate_->CanDetach(outer_)) { > detached_thread = outer_->Detach(); > ... > } > WaitForWork(); > continue; > } > ... > } > > What prevents the worker from detaching as soon as GetWork() returns nullptr > (i.e. without staying idle for |suggested_reclaim_time_|)? Nice catch! Nothing at the moment. Fixed with an idle cycle bool tracked in CanDetach(). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && Comment added; done. On 2016/07/07 15:58:30, gab wrote: > Why do you need Release_Load() here? Since CanDetach may make decisions based on surrounding state from the single-threaded task runners (like idle thread accounting), I figured it would be robust to make sure everything is updated at this point. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:608: if (!idle_workers_stack_.Contains(worker)) On 2016/07/04 19:41:32, fdoray wrote: > A detached worker is only re-created when its WakeUp() method is called, right? > WakeUp() can be called from 2 places: > > A) > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > A single-threaded task is posted to the worker. 
The next time that GetWork() is > called, the worker will be removed from the idle stack because work is available > (no other worker can run the single-threaded task). > > B) > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > A worker is woken up to run a shared task. Right before this is done, the worker > is popped from the idle stack. > > I don't see how detachment makes it possible to try to add the same worker to > the idle stack twice. Prior to this change, GetWork had two preconditions:
1) There may be shared work to be done (the worker has been removed from the stack via WakeUpOneWorker).
2) There is single-threaded work to be done (the worker will remove itself from the stack).
With sleep timeouts, these preconditions no longer necessarily hold, since we may have been woken up by the timeout. This means we'll call GetWork while we're idle, note that there's no work to be done, and then re-add ourselves to the idle workers stack. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:99: void UnregisterSingleThreadTaskRunner(SchedulerWorker* worker); On 2016/07/04 19:41:32, fdoray wrote: > It would be better if these methods weren't here: > - They aren't meant to be called from outside of scheduler_worker_pool_impl.cc > - They don't access any member of SchedulerWorkerPoolImpl > > Could you: > - Delete these methods. > - Make SchedulerSingleThreadTaskRunner a nested class of > SchedulerWorkerPoolImpl. > - Call SchedulerWorkerDelegateImpl::RegisterSingleThreadTaskRunner directly from > SchedulerSingleThreadTaskRunner. 
> > class SchedulerWorkerPoolImpl::SchedulerSingleThreadTaskRunner { > public: > SchedulerSingleThreadTaskRunner(..., SchedulerWorker* worker) { > auto delegate = > static_cast<SchedulerWorkerDelegateImpl*>(worker->delegate()); > delegate->RegisterSingleThreadTaskRunner(); > } > ... > }; Done. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/04 19:41:32, fdoray wrote: > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent instead of > a SchedulerLock+bool. Here, that seems like an abuse of the WaitableEvent since we don't intend to wait on it. Maybe we should be using an atomic instead in those cases? https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:449: On 2016/07/04 19:41:33, fdoray wrote: > Test that no detachment occurs when suggested reclaim time is TimeDelta::Max? If we could test this, this would be better handled in the SchedulerWorkerTest. In either case, I'm not sure how we could test this since it depends on an indefinite waiting period. The test would have to check the state at an arbitrary time, and if the component was indeed adversarial, it could simply detach after the arbitrary time. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); On 2016/07/04 19:41:33, fdoray wrote: > Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might not > have reached line 457 yet when Reset() is callled. The tricky bit here is that we can't wait for all idle workers because they won't ever be idle if they're waiting for the signal. 
The WaitForAllTasksToRun in the for loop is the best current guarantee short of doing something semaphore-ish to make sure the thread is occupied before dispatching the next set. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, On 2016/07/04 19:41:32, fdoray wrote: > Nothing prevents all CountZeroTlsValue tasks from running on the same thread > (the Wait task posted at line 522 doesn't have to run on the same thread as the > CountZeroTlsValue task). Maybe you should move the Signal+Wait inside > CountZeroTlsValue? Doesn't WaitForAllTasksToRun with a waiting task also help provide this guarantee?
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:608: if (!idle_workers_stack_.Contains(worker)) On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:32, fdoray wrote: > > A detached worker is only re-created when its WakeUp() method is called, > right? > > WakeUp() can be called from 2 places: > > > > A) > > > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > > A single-threaded task is posted to the worker. The next time that GetWork() > is > > called, the worker will be removed from the idle stack because work is > available > > (no other worker can run the single-threaded task). > > > > B) > > > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > > A worker is woken up to run a shared task. Right before this is done, the > worker > > is popped from the idle stack. > > > > I don't see how detachment makes it possible to try to add the same worker to > > the idle stack twice. > > Prior to this change, GetWork had two preconditions: > 1) There may be shared work to be done (The worker has been removed from the > stack via WakeUpOneWorker). > 2) There is single-threaded work to be done (The worker will remove itself from > the stack). > > With sleep timeouts, this precondition is no longer true since we may have been > woken up by the timeout. This means we'll call GetWork while we're idle, note > that there's no work to be done, and then readd ourselves back onto the idle > workers stack. You're right. Thanks! https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:32, fdoray wrote: > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent instead > of > > a SchedulerLock+bool. > > Here, that seems like an abuse of the WaitableEvent since we don't intend to > wait on it. Maybe we should be using an atomic instead in those cases? It would be nice to have an AtomicBool class. Since we don't have it yet, you can keep this as-is. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:449: On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:33, fdoray wrote: > > Test that no detachment occurs when suggested reclaim time is TimeDelta::Max? > > If we could test this, this would be better handled in the SchedulerWorkerTest. > In either case, I'm not sure how we could test this since it depends on an > indefinite waiting period. The test would have to check the state at an > arbitrary time, and if the component was indeed adversarial, it could simply > detach after the arbitrary time. Acknowledged. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:33, fdoray wrote: > > Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might > not > > have reached line 457 yet when Reset() is callled. > > The tricky bit here is that we can't wait for all idle workers because they > won't ever be idle if they're waiting for the signal. 
The WaitForAllTasksToRun > in the for loop is the best current guarantee short of doing something > semaphore-ish to make sure the thread is occupied before dispatching the next > set. If Signal() stays before WaitForAllWorkersIdleForTesting(), all workers should eventually become idle? I think Reset() should be moved after WaitForAllWorkersIdleForTesting() to avoid having a task that doesn't call Wait() before Reset() is called (and thus blocks forever when it does call Wait()). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:32, fdoray wrote: > > Nothing prevents all CountZeroTlsValue tasks from running on the same thread > > (the Wait task posted at line 522 doesn't have to run on the same thread as > the > > CountZeroTlsValue task). Maybe you should move the Signal+Wait inside > > CountZeroTlsValue? > > Doesn't WaitForAllTasksToRun with a waiting task also help provide this > guarantee? Line 486: You create factories with PARALLEL TaskRunners. Loop iteration 1: Line 511: Task runs on thread A. Lines 515 and 518: Tasks run on thread B (nothing prevents these tasks from running on another thread than A). Loop iteration 2: Line 511: Task runs on thread A (again). [since two CountZeroTlsValues tasks ran on the same thread, there is a thread that won't be inspected]
Updated. Thanks! https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); On 2016/07/07 20:45:57, fdoray wrote: > On 2016/07/07 17:49:16, robliao wrote: > > On 2016/07/04 19:41:33, fdoray wrote: > > > Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might > > not > > > have reached line 457 yet when Reset() is callled. > > > > The tricky bit here is that we can't wait for all idle workers because they > > won't ever be idle if they're waiting for the signal. The > WaitForAllTasksToRun > > in the for loop is the best current guarantee short of doing something > > semaphore-ish to make sure the thread is occupied before dispatching the next > > set. > > If Signal() stays before WaitForAllWorkersIdleForTesting(), all workers should > eventually become idle? I think Reset() should be moved after > WaitForAllWorkersIdleForTesting() to avoid having a task that doesn't call > Wait() before Reset() is called (and thus blocks forever when it does call > Wait()). I initially misread this. Yup, let's move Reset after the wait for idle. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, On 2016/07/07 20:45:57, fdoray wrote: > On 2016/07/07 17:49:16, robliao wrote: > > On 2016/07/04 19:41:32, fdoray wrote: > > > Nothing prevents all CountZeroTlsValue tasks from running on the same thread > > > (the Wait task posted at line 522 doesn't have to run on the same thread as > > the > > > CountZeroTlsValue task). Maybe you should move the Signal+Wait inside > > > CountZeroTlsValue? 
> > > > Doesn't WaitForAllTasksToRun with a waiting task also help provide this > > guarantee? > > Line 486: You create factories with PARALLEL TaskRunners. > > Loop iteration 1: > Line 511: Task runs on thread A. > Lines 515 and 518: Tasks run on thread B (nothing prevents these tasks from > running on another thread than A). > > Loop iteration 2: > Line 511: Task runs on thread A (again). > [since two CountZeroTlsValues tasks ran on the same thread, there is a thread > that won't be inspected] Ah yes, indeed. I've now moved both waitable events into CountZeroTlsValues.
lgtm with nits and something that could be addressed in a separate CL if you prefer https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:183: // True if the worker performed and idle cycle. Workers start out idle. an* idle cycle https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:528: // It's not an issue if num_single_threaded_runners_ is incremented after this |num_single_threaded_runners_| https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:531: bool can_detach = performed_idle_cycle_ && const bool https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:534: !outer_->HasJoinedForTesting(); Sadly this is racy:
- Thread A: SchedulerWorker::CanDetach() - returns true.
- Thread B: SchedulerWorkerPoolImpl::JoinForTesting()
- Thread B: SchedulerWorker::JoinForTesting()
- Thread B: Acquires SchedulerWorker::thread_lock_
- Thread A: SchedulerWorker::Detach()
- Thread A: DCHECK(!ShouldExitForTesting()) fails in DCHECK builds.
- Thread A: Tries to acquire SchedulerWorker::thread_lock_.
Deadlock between the join and the acquisition of SchedulerWorker::thread_lock_.

I think SchedulerWorker::JoinForTesting() could be rewritten like this:

void SchedulerWorker::JoinForTesting() {
  {
    AutoSchedulerLock auto_lock(should_exit_for_testing_lock_);
    should_exit_for_testing_ = true;
  }
  WakeUp();

  std::unique_ptr<Thread> thread;

  {
    AutoSchedulerLock auto_lock(thread_lock_);
    thread = std::move(thread_);
  }

  // Join without holding the lock!
  thread->Join();
}

and SchedulerWorker::Detach() like this:

// Returns nullptr if called during thread->Join(). The thread won't
// detach. It will exit because outer_->ShouldExitForTesting() is true.
std::unique_ptr<SchedulerWorker::Thread> SchedulerWorker::Detach() {
  DCHECK(!ShouldExitForTesting()) << "Worker was already joined";
  AutoSchedulerLock auto_lock(thread_lock_);
  // If a wakeup is pending, then a WakeUp() came in while we were deciding to
  // detach. This means we can't go away anymore since we would break the
  // guarantee that we call GetWork() after a successful wakeup.
  return thread_ && thread_->IsWakeUpPending() ? nullptr : std::move(thread_);
}

To remove the requirement that Delegate::CanDetach() must return false when the thread may be joined. Maybe we should also add DCHECK(!ShouldExitForTesting()) in WakeUp()? Let me know if my reasoning makes sense.
Patchset #4 (id:60001) has been deleted
Patchset #4 (id:80001) has been deleted
Patchset #4 (id:100001) has been deleted
Patchset #4 (id:120001) has been deleted
Patchset #4 (id:140001) has been deleted
Went with test controlled detachment to fix the race. Thanks! https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:183: // True if the worker performed and idle cycle. Workers start out idle. On 2016/07/08 18:58:06, fdoray wrote: > an* idle cycle Done. https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:528: // It's not an issue if num_single_threaded_runners_ is incremented after this On 2016/07/08 18:58:06, fdoray wrote: > |num_single_threaded_runners_| Done. https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:531: bool can_detach = performed_idle_cycle_ && On 2016/07/08 18:58:07, fdoray wrote: > const bool Done. https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:534: !outer_->HasJoinedForTesting(); On 2016/07/08 18:58:07, fdoray wrote: > Sadly this is racy: > - Thread A: SchedulerWorker::CanDetach() - returns true. > - Thread B: SchedulerWorkerPoolImpl::JoinForTesting() > - Thread B: SchedulerWorker::JoinForTesting() > - Thread B: Acquires SchedulerWorker::thread_lock_ > - Thread A: SchedulerWorker::Detach() > - Thread A: DCHECK(!ShouldExitForTesting()) fails in DCHECK builds. > - Thread A: Tries to acquire SchedulerWorker::thread_lock_. 
> Dead lock between the join and the acquisition of SchedulerWorker::thread_lock_ > > I think SchedulerWorker::JoinForTesting() could be rewritten like this: > > void SchedulerWorker::JoinForTesting() { > { > AutoSchedulerLock auto_lock(should_exit_for_testing_lock_); > should_exit_for_testing_ = true; > } > WakeUp(); > > std::unique_ptr<Thread> thread; > > { > AutoSchedulerLock auto_lock(thread_lock_); > thread = std::move(thread_); > } > > // Join without holding the lock! > thread->Join(); > } > > and SchedulerWorker::Detach() like this: > > // Returns nullptr if called during thread->Join(). The thread won't > // detach. It will exit because !outer_->ShouldExitForTesting(). > std::unique_ptr<SchedulerWorker::Thread> SchedulerWorker::Detach() { > DCHECK(!ShouldExitForTesting()) << "Worker was already joined"; > AutoSchedulerLock auto_lock(thread_lock_); > // If a wakeup is pending, then a WakeUp() came in while we were deciding to > // detach. This means we can't go away anymore since we would break the > // guarantee that we call GetWork() after a successful wakeup. > return thread_ && thread_->IsWakeUpPending() ? nullptr : std::move(thread_); > } > > To remove the requirement that Delegate::CanDetach() must return false when the > thread may be joined. > > Maybe we should also add DCHECK(!ShouldExitForTesting()) in WakeUp()? > > Let me know if my reasoning makes sense. Yup. This became the first test that flew in the face of this comment in SchedulerWorker: // Normally holding a lock and joining is dangerous. However, since this is // only for testing, we're okay since the only scenario that could impact this // is a call to Detach, which is disallowed by having the delegate always // return false for the CanDetach call. Because we have no control over the delegate from the test, I added HasJoinedForTesting because of the race the other way: Thread wakes up after an idle cycle and attempts to detach after the ShouldExitForTesting check. 
As long as the joins occur before the detach timeout, we should be okay. Alternatively, we can wait until after the detach timeout occurs as well. I'm not sure we want to go with the suggested code because it walks down a different problematic road. If we were to make SchedulerWorker robust to Join and Detach races, now there's no guarantee that the current thread is cleaned up after Join since it could be in the middle of detaching, which also seems undesirable for tests that care about that. It seems that tension comes from having no control over the delegate, so I've added a disallow control to address this. Multithreaded code is tricky indeed!
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/07/08 21:17:03, robliao wrote: > Went with test controlled detachment to fix the race. Thanks! > [full reply quoted above snipped] No further comments?
Cc danakj for optional review.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
lg, full review this time, comments below https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && > > Why do you need Release_Load() here? > Since CanDetach may make decisions based off of surrounding state from the > single threaded task runners (like idle thread accounting), I figured it would > be robust to make sure everything is updated at this point. Hmmm, I don't think this matters, idle thread accounting is done on its own (and is in fact checked before the ReleaseLoad here anyways). All checks are self-sufficient so this one is really only about |num_single_threaded_runners| IMO. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:211: subtle::NoBarrier_AtomicIncrement(&num_single_threaded_runners_, 1); Let's wait until Dana/Francois settle the barrier/no-barrier discussion @ https://codereview.chromium.org/2019763002/diff/140001/base/task_scheduler/ta... https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; is_idle_ ? Then it's true whether it's a new worker or because it became idle? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:273: "Workers can detach during join."; CanWorker...() versus "Workers" (plural versus singular forms). I think it should be plural everywhere? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:399: SchedulerWorker* worker) missing space: git cl format? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:447: performed_idle_cycle_ = true; This is redundant per the value being initialized to true. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:506: performed_idle_cycle_ = false; is_idle_ = false; with no comments is sufficient IMO. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; Shouldn't is_idle_ instead be set when about to return no work from GetWork()? It seems weird to me to check |is_idle_| and then set it to true after? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:561: worker_detachment_allowed_(true), If keeping the bool (ref. WaitableEvent comment), initialize it at declaration site. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:616: // Detachment may cause multiple attempts to add because the delegate cannot Shouldn't a thread that wakes up after its sleep timeout for detachment detach without bothering to GetWork()? I guess we need to check GetWork() in the racy event that the thread was actually woken for work but happens to only get a CPU slot after its detach deadline? And hence we need this? (just making sure I follow :-)!) https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc...
base/task_scheduler/scheduler_worker_pool_impl.h:88: void DisallowWorkerDetachmentForTesting(); It seems weird to me that something that is "recommended" not be highlighted more than this. I see though that JoinForTesting() ensures this is called (but it can't fully confirm it was called early enough). So perhaps tweak the comment to something like: // Disallows worker thread detachment. In tests, this must be called before the last synchronization point with all worker threads (e.g. before having them sync on the last WaitableEvent or even before dispatching the first set of work) before calling JoinForTesting() or detach/join races may ensue. Actually, this makes me think, should we not instead get rid of Detach/Join races? I guess that adds a bit of test specific synchronization logic to Detach() in non-test code but since the thread is about to Detach(), it doesn't need to be efficient and the Join() side is only for testing so wtv. I think I prefer that to the complexity involved in this required DisallowWorkerDetachmentForTesting() call. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:177: bool worker_detachment_allowed_; Use a manual WaitableEvent instead of bool + lock? (this is what we did in many other similar scenarios in scheduler) https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:410: EXPECT_EQ(kMagicTlsValue, reinterpret_cast<size_t>(slot_.Get())); Instead of using TLS to do an "is this the right thread" check, I think it's cleaner to use ThreadCheckerImpl. To set it you can do: // Re-attach |thread_checker_| to the current thread. 
thread_checker_.DetachFromThread(); thread_checker_.CalledOnValidThread(); And then simply use CalledOnValidThread() in the check phase. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); Make the event AUTOMATIC instead of doing Wait()+Reset() https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:447: // Give the worker pool a chance to reclaim its threads. It was never forced to create more than one thread above so I don't understand what this means? IMO to test this you need N blocking tasks (N-1 of which are not single-threaded). Release the single-threaded one first, then the others. And confirm that despite that the single-threaded task can still run on the same thread (i.e. it wasn't among the terminated threads). Would be nice to verify actual thread count in process too to confirm that threads actually were created and wound down (confirming that things happened and single-threaded bound thread was protected). https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:539: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValuesAndWait, Instead of counting lack of TLS values here I think a cleaner approach would be to build a std::set of PlatformThreadRef in the first phase and check against it in the second phase (effectively ThreadCheckerImpl but pooled). https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_stack.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_stack.h:38: // Removes the top SchedulerWorker from the stack and returns it. Doesn't "remove"
Patchset #5 (id:180001) has been deleted
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/13 18:36:31, gab wrote: > > > Why do you need Release_Load() here? > > Since CanDetach may make decisions based off of surrounding state from the > > single threaded task runners (like idle thread accounting), I figured it would > > be robust to make sure everything is updated at this point. > > Hmmm, I don't think this matters, idle thread accounting is done on its own (and > is in fact checked before the ReleaseLoad here anyways). All checks are > self-sufficient so this one is really only about |num_single_threaded_runners| > IMO. In either case, it's the safest thing to do here. We want to make sure all state from before gets synced up to this point. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:211: subtle::NoBarrier_AtomicIncrement(&num_single_threaded_runners_, 1); On 2016/07/13 18:36:31, gab wrote: > Let's wait until Dana/Francois settle the barrier/no-barrier discussion @ > https://codereview.chromium.org/2019763002/diff/140001/base/task_scheduler/ta... sgtm. As we are all well aware, atomics are tricky. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; On 2016/07/13 18:36:31, gab wrote: > is_idle_ ? Then it's true whether it's a new worker or because it became idle? 
The delegate needs to distinguish between a thread that's idle (GetWork() == nullptr) and a thread that's idle after running an idle cycle (Sleeping and then waking up again with GetWork() == nullptr). Tracking is_idle_ is insufficient as that makes detachment happen too early (detachment is checked after the GetWork() call). We can't check for detachment after wakeup because we can't distinguish between wakeups and timeouts (and they're racy). Checking for an idle cycle is equivalent to checking that we got no work, attempted to check if we can detach, slept, and then woke up again. If we went through this cycle once, then we can be assured that we should detach in the next wake up if there's no work. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:273: "Workers can detach during join."; On 2016/07/13 18:36:31, gab wrote: > CanWorker...() versus "Workers" (plural versus singular forms). I think it > should be plural everywhere? The act of detaching applies to a single worker, making it singular. A set of workers can't determine among themselves if detachment is allowed. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:399: SchedulerWorker* worker) On 2016/07/13 18:36:31, gab wrote: > missing space: git cl format? Done. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:447: performed_idle_cycle_ = true; On 2016/07/13 18:36:31, gab wrote: > This is redundant per the value being initialized to true. The delegate is reused across thread instances, so we can have multiple MainEntry calls per delegate. The state needs to be reset here. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc...
base/task_scheduler/scheduler_worker_pool_impl.cc:506: performed_idle_cycle_ = false; On 2016/07/13 18:36:31, gab wrote: > is_idle_ = false; > > with no comments is sufficient IMO. sgtm. Removed. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; On 2016/07/13 18:36:31, gab wrote: > Shouldn't is_idle_ instead be set when about to return no work from GetWork()? > It seems weird to me to check |is_idle_| and then set it to true after? See is_idle_ comment. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:561: worker_detachment_allowed_(true), On 2016/07/13 18:36:31, gab wrote: > If keeping the bool (ref. WaitableEvent comment), initialize it at declaration > site. See earlier comment. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:616: // Detachment may cause multiple attempts to add because the delegate cannot On 2016/07/13 18:36:31, gab wrote: > Shouldn't a thread that wakes up after it's sleep timeout for detachment detach > without bothering to GetWork()? I guess we need to check GetWork() in the racy > event that the thread was actually woken for work but happens to only get a CPU > slot after its detach deadline? And hence we need this? (just making sure I > follow :-)!) We can't distinguish between sleep timeouts and actual wakeups. They look the same from the thread's perspective. Add to that the race, and we have to check GetWork in all cases. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.h:88: void DisallowWorkerDetachmentForTesting(); On 2016/07/13 18:36:31, gab wrote: > It seems weird to me that something that is "recommended" not be highlighted > more than this. I see though that JoinForTesting() ensures this is called (but > it can't fully confirm it was called early enough). So perhaps tweak the comment > to something like: > > // Disallows worker thread detachment. In tests, this must be called before the > last synchronization point with all worker threads (e.g. before having them sync > on the last WaitableEvent or even before dispatching the first set of work) > before calling JoinForTesting() or detach/join races may ensue. > > Actually, this makes me think, should we not instead get rid of Detach/Join > races? I guess that adds a bit of test specific synchronization logic to > Detach() in non-test code but since the thread is about to Detach(), it doesn't > need to be efficient and the Join() side is only for testing so wtv. > > I think I prefer that to the complexity involved in this required > DisallowWorkerDetachmentForTesting() call. I've amended the comment. Tests that specify a reclaim time of TimeDelta::Max() don't need to call this function. The reason why I think we should avoid fixing the race is it only applies to tests that allow detachment (and as of the new patchset, it's only the tests that test for detachment). Given that the test is looking for detachment, it already has to construct the worker pool where it can inspect state at certain timed points. All other tests can specify TimeDelta::Max() to avoid the issue entirely. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:177: bool worker_detachment_allowed_; On 2016/07/13 18:36:31, gab wrote: > Use a manual WaitableEvent instead of bool + lock? 
(this is what we did in many > other similar scenarios in scheduler) Discussed here: https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... In short, it's an abuse of WaitableEvent since we're never actually going to wait on this value. What we really want is something like AtomicBool. Until then, this is the next best thing. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:410: EXPECT_EQ(kMagicTlsValue, reinterpret_cast<size_t>(slot_.Get())); On 2016/07/13 18:36:31, gab wrote: > Instead of using TLS to do an "is this the right thread" check, I think it's > cleaner to use ThreadCheckerImpl. > > To set it you can do: > > // Re-attach |thread_checker_| to the current thread. > thread_checker_.DetachFromThread(); > thread_checker.CalledOnValidThread(); > > And then simply use CalledOnValidThread() in the check phase. Nice. I forgot about ThreadCheckerImpl. Done. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); On 2016/07/13 18:36:31, gab wrote: > Make the event AUTOMATIC instead of doing Wait()+Reset() I thought we were preferring Manual events for tests? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:447: // Give the worker pool a chance to reclaim its threads. On 2016/07/13 18:36:32, gab wrote: > It was never forced to create more than one thread above so I don't understand > what this means? IMO to test this you need N blocking tasks (N-1 of which are > not single-threaded). Release the single-threaded one first, then the others. 
> And confirm that despite that the single-threaded task can still run on the same > thread (i.e. it wasn't among the terminated threads). > > Would be nice to verify actual thread count in process too to confirm that > threads actually were created and wound down (confirming that things happened > and single-threaded bound thread was protected). The goal of this test is to verify that the worker pool does not reclaim the thread with single threaded work after an idle cycle. This test fails if the thread was allowed to reclaim (e.g. CanDetach always returns true). From the caller's standpoint, we don't know how many threads were created. All we know is that stuff can happen during the suggested reclaim time, so we wait for that long and then some to ensure things have settled down. I wanted to avoid checking thread counts since I don't think there are guarantees for stable thread counts in a testing environment. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:539: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValuesAndWait, On 2016/07/13 18:36:31, gab wrote: > Instead of counting lack of TLS values here I think a cleaner approach would be > to build a std::set of PlatformThreadRef in the first phase and check against it > in the second phase (effectively ThreadCheckerImpl but pooled). Wouldn't using the TLS be basically the same thing except with less bookkeeping? The test doesn't actually care about the thread instances themselves, but rather that the environment is clean when it entered. TLS slots are guaranteed by the OS to be 0/null upon entry into a new thread. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_stack.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc...
base/task_scheduler/scheduler_worker_stack.h:38: // Removes the top SchedulerWorker from the stack and returns it. On 2016/07/13 18:36:32, gab wrote: > Doesn't "remove" Indeed! Done.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/13 20:19:46, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > > > Why do you need Release_Load() here? > > > Since CanDetach may make decisions based off of surrounding state from the > > > single threaded task runners (like idle thread accounting), I figured it > would > > > be robust to make sure everything is updated at this point. > > > > Hmmm, I don't think this matters, idle thread accounting is done on its own > (and > > is in fact checked before the ReleaseLoad here anyways). All checks are > > self-sufficient so this one is really only about |num_single_threaded_runners| > > IMO. > > In either case, it's the safest thing to do here. We want to make sure all state > from before gets synced up to this point. I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() doesn't document that it produces a memory barrier and nothing relies on it doing so. The only other thing after this call is HasJoinedForTesting() which is self-sufficient. Everything in the scheduler is designed to do exactly what it needs to be safe. Not more, not less. Adding Release semantics here implicitly encodes a requirement which doesn't exist. It's like using ThreadTaskRunnerHandle when one merely cares for a sequence ;-). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/07 20:45:57, fdoray wrote: > On 2016/07/07 17:49:16, robliao wrote: > > On 2016/07/04 19:41:32, fdoray wrote: > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > instead > > of > > > a SchedulerLock+bool. > > > > Here, that seems like an abuse of the WaitableEvent since we don't intend to > > wait on it. Maybe we should be using an atomic instead in those cases? > > It would be nice to have an AtomicBool class. Since we don't have it yet, you > can keep this as-is. We have CancellationFlag. Its name is a bit too specific and no one has bothered renaming it, but what some people do is typedef it to a WhateverElseFlag. Whether the flag or a WaitableEvent I would strongly prefer having a single member here. Having single-member locks already bothers me some and when they're test specific it bothers me some more. PS: I'd be happy to see someone change that class to a BooleanFlag with a typedef at bottom to CancellationFlag for legacy usage. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; On 2016/07/13 20:19:46, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > is_idle_ ? Then it's true whether it's a new worker or because it became idle? > > The delegate needs to distinguish between a thread that's idle (GetWork() == > nullptr) and a thread that's idle after a running an idle cycle (Sleeping and > then waking up again with GetWork() == nullptr). > > Tracking is_idle_ is insufficient as that makes detachment happen too early > > (detachment is checked after the GetWork() call. 
> > We can't check for detachment after wakeup because we can't distinguish between > wakeups and timeouts (and they're racy). > > Checking for an idle cycle is equivalent to checking that we got no work, > attempted to check if we can detach, slept, and then woke up again. If we went > through this cycle once, then we can be assured that we should detach in the > next wake up if there's no work. Ah!! Got it :-). Hadn't fully understood the logic here. Maybe |detach_in_next_cycle_if_no_work_| to be more explicit? Should it actually start at true? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; On 2016/07/13 20:19:46, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > Shouldn't is_idle_ instead be set when about to return no work from GetWork()? > > It seems weird to me to check |is_idle_| and then set it to true after? > > See is_idle_ comment. Perhaps expand comment above here to explain the detachment dance logic you just explained to me in that other comment? (your explanation there was very compelling but not captured in code comments AFAICT) https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); On 2016/07/13 20:19:47, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > Make the event AUTOMATIC instead of doing Wait()+Reset() > > I thought we were preferring Manual events for tests? I don't think there's a preference. I tend to use MANUAL unless I care about it being reset for re-use which you do here. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:539: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValuesAndWait, On 2016/07/13 20:19:47, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > Instead of counting lack of TLS values here I think a cleaner approach would > be > > to build a std::set of PlatformThreadRef in the first phase and check against > it > > in the second phase (effectively ThreadCheckerImpl but pooled). > > Wouldn't using the TLS be basically the same thing except with less bookkeeping? > The test doesn't actually care about the thread instances themselves, but rather > that the environment is clean when it entered. TLS's are guaranteed by the OS to > be 0/null upon entry into a new thread. Ah ok you're right, the TLS effectively acts as a "one of these threads" flag without having to bookkeep threads themselves.
Patchset #7 (id:240001) has been deleted
Rebased and incorporated some more changes. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:46, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > > > Why do you need Release_Load() here? > > > > Since CanDetach may make decisions based off of surrounding state from the > > > > single threaded task runners (like idle thread accounting), I figured it > > would > > > > be robust to make sure everything is updated at this point. > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on its own > > (and > > > is in fact checked before the ReleaseLoad here anyways). All checks are > > > self-sufficient so this one is really only about > |num_single_threaded_runners| > > > IMO. > > > > In either case, it's the safest thing to do here. We want to make sure all > state > > from before gets synced up to this point. > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() doesn't > document that it produces a memory barrier and nothing relies on it doing so. > The only other thing after this call is HasJoinedForTesting() which is > self-sufficient. > > Everything in the scheduler is designed to do exactly what it needs to be safe. > Not more, not less. > > Adding Release semantics here implicitly encodes a requirement which doesn't > exist. It's like using ThreadTaskRunnerHandle when one merely cares for a > sequence ;-). In light of the atomic ops discussion, any strong opinion to remove the barrier here? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:46, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > is_idle_ ? Then it's true whether it's a new worker or because it became > idle? > > > > The delegate needs to distinguish between a thread that's idle (GetWork() == > > nullptr) and a thread that's idle after a running an idle cycle (Sleeping and > > then waking up again with GetWork() == nullptr). > > > > Tracking is_idle_ is insufficient as that makes detachment happen too early > > (detachment is checked after the GetWork() call. > > > > We can't check for detachment after wakeup because we can't distinguish > between > > wakeups and timeouts (and they're racy). > > > > Checking for an idle cycle is equivalent to checking that we got no work, > > attempted to check if we can detach, slept, and then woke up again. If we went > > through this cycle once, then we can be assured that we should detach in the > > next wake up if there's no work. > > Ah!! Got it :-). Hadn't fully understood the logic here. > > Maybe |detach_in_next_cycle_if_no_work_| to be more explicit? Should it actually > start at true? I like performed_idle_cycle_ since it lets us express can_detach in terms of things that have happened in the SchedulerWorkerDelegateImpl rather than in terms of itself. e.g. can_detach = performed_idle_cycle_ ... reads better than can_detach = detach_in_next_cycle_if_no_work_ ... https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:46, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > Shouldn't is_idle_ instead be set when about to return no work from > GetWork()? > > > It seems weird to me to check |is_idle_| and then set it to true after? > > > > See is_idle_ comment. > > Perhaps expand comment above here to explain the detachment dance logic you just > explained to me in that other comment? (your explanation there was very > compelling but not captured in code comments AFAICT) Done. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:47, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > Make the event AUTOMATIC instead of doing Wait()+Reset() > > > > I thought we were preferring Manual events for tests? > > I don't think there's a preference. I tend to use MANUAL unless I care about it > being reset for re-use which you do here. Cool. I prefer automatic here as well. Done.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/19 22:03:47, robliao wrote: > On 2016/07/13 21:07:55, gab wrote: > > On 2016/07/13 20:19:46, robliao wrote: > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > Why do you need Release_Load() here? > > > > > Since CanDetach may make decisions based off of surrounding state from > the > > > > > single threaded task runners (like idle thread accounting), I figured it > > > would > > > > > be robust to make sure everything is updated at this point. > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on its > own > > > (and > > > > is in fact checked before the ReleaseLoad here anyways). All checks are > > > > self-sufficient so this one is really only about > > |num_single_threaded_runners| > > > > IMO. > > > > > > In either case, it's the safest thing to do here. We want to make sure all > > state > > > from before gets synced up to this point. > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > doesn't > > document that it produces a memory barrier and nothing relies on it doing so. > > The only other thing after this call is HasJoinedForTesting() which is > > self-sufficient. > > > > Everything in the scheduler is designed to do exactly what it needs to be > safe. > > Not more, not less. > > > > Adding Release semantics here implicitly encodes a requirement which doesn't > > exist. It's like using ThreadTaskRunnerHandle when one merely cares for a > > sequence ;-). > > In light of the atomic ops discussion, any strong opinion to remove the barrier > here? 
From the conclusion we derived, I think the inc/decrements should use Barriers and the read here should use Acquire_Load(). I'm still not convinced that this is required when dealing with a single variable (all memory re-ordering examples I've seen use two variables and even a NoBarrier_Increment has to use RMW -- read-modify-write -- semantics which have an implicit "barrier" for at least that single variable (but would allow memory re-ordering of other memory regions with respect to it?), i.e. maybe we could get away with NoBarrier_Increment and Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we need the Acquire_Load() though as the read doesn't have an implicit RMW like the increments do (don't understand when Release_Load() is appropriate -- C++11 doesn't provide it FWIW). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/13 21:07:55, gab wrote: > On 2016/07/07 20:45:57, fdoray wrote: > > On 2016/07/07 17:49:16, robliao wrote: > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > > instead > > > of > > > > a SchedulerLock+bool. > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't intend to > > > wait on it. Maybe we should be using an atomic instead in those cases? > > > > It would be nice to have an AtomicBool class. Since we don't have it yet, you > > can keep this as-is. > > We have CancellationFlag. It's name is a bit too specific and no one has > bothered renaming it, but what some people do is typedef it to a > WhateverElseFlag. > > Whether the flag or a WaitableEvent I would strongly prefer having a single > member here. 
Having single-member locks already bothers me some and when they're > test specific it bothers me some more. ping ^^ > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > typedef at bottom to CancellationFlag for legacy usage. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; So if we create/wake one too many thread (i.e. 2 too many threads for workload given we always aim to keep one ready), we'll immediately detach it? Or do we want to at least wait for "reclaim time" (in which case I think starting this at false gives us what we want?) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:564: worker_detachment_allowed_(true), (bool should be replaced by flag as mentioned in previous comment, but if it's not : initialize it inline in header) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); constexpr https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); "kSuggestedReclaimTimeForDetachTests" or something (otherwise I find it unexpected that it's not used directly in InitializeWorkerPool()) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: EXPECT_GT(zero_tls_values_, 0); Should this use NoBarrier_Load()? On Windows this is equivalent, but in atomicops_internals_portable.h it appears to do more than just a simple deref. Note: NoBarrier is clearly correct because of the surrounding use of WaitableEvent (not that it's very efficient given the heavier synchronization constructs nearby but the barrier is definitely not necessary)
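The Barrier_AtomicIncrement vs Acquire_Load trade-off weighed in these comments maps onto C++11 std::atomic orderings. A hedged sketch follows; the counter name mirrors the CL's |num_single_threaded_runners_| but this is not the actual base/atomicops code:

```cpp
#include <atomic>

// Stand-in for the pool's single-threaded-runner count.
std::atomic<int> num_single_threaded_runners{0};

void OnSingleThreadedRunnerCreated() {
  // The read-modify-write is atomic regardless of ordering; the ordering
  // argument (~Barrier_AtomicIncrement) controls whether *other* writes made
  // before this increment become visible to a load that observes it.
  num_single_threaded_runners.fetch_add(1, std::memory_order_acq_rel);
}

void OnSingleThreadedRunnerDestroyed() {
  num_single_threaded_runners.fetch_sub(1, std::memory_order_acq_rel);
}

bool CanDetachWrtRunners() {
  // ~Acquire_Load: if this reads 0, writes that happened-before the matching
  // decrements are visible here. A relaxed (NoBarrier) load would still be
  // atomic, just without that visibility guarantee for surrounding state.
  return num_single_threaded_runners.load(std::memory_order_acquire) == 0;
}
```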
still lgtm w/ comments https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:534: !outer_->HasJoinedForTesting(); On 2016/07/08 21:17:03, robliao wrote: > On 2016/07/08 18:58:07, fdoray wrote: > > Sadly this is racy: > > - Thread A: SchedulerWorker::CanDetach() - returns true. > > - Thread B: SchedulerWorkerPoolImpl::JoinForTesting() > > - Thread B: SchedulerWorker::JoinForTesting() > > - Thread B: Acquires SchedulerWorker::thread_lock_ > > - Thread A: SchedulerWorker::Detach() > > - Thread A: DCHECK(!ShouldExitForTesting()) fails in DCHECK builds. > > - Thread A: Tries to acquire SchedulerWorker::thread_lock_. > > Dead lock between the join and the acquisition of > SchedulerWorker::thread_lock_ > > > > I think SchedulerWorker::JoinForTesting() could be rewritten like this: > > > > void SchedulerWorker::JoinForTesting() { > > { > > AutoSchedulerLock auto_lock(should_exit_for_testing_lock_); > > should_exit_for_testing_ = true; > > } > > WakeUp(); > > > > std::unique_ptr<Thread> thread; > > > > { > > AutoSchedulerLock auto_lock(thread_lock_); > > thread = std::move(thread_); > > } > > > > // Join without holding the lock! > > thread->Join(); > > } > > > > and SchedulerWorker::Detach() like this: > > > > // Returns nullptr if called during thread->Join(). The thread won't > > // detach. It will exit because !outer_->ShouldExitForTesting(). > > std::unique_ptr<SchedulerWorker::Thread> SchedulerWorker::Detach() { > > DCHECK(!ShouldExitForTesting()) << "Worker was already joined"; > > AutoSchedulerLock auto_lock(thread_lock_); > > // If a wakeup is pending, then a WakeUp() came in while we were deciding to > > // detach. This means we can't go away anymore since we would break the > > // guarantee that we call GetWork() after a successful wakeup. 
> > return thread_ && thread_->IsWakeUpPending() ? nullptr : std::move(thread_); > > } > > > > To remove the requirement that Delegate::CanDetach() must return false when > the > > thread may be joined. > > > > Maybe we should also add DCHECK(!ShouldExitForTesting()) in WakeUp()? > > > > Let me know if my reasoning makes sense. > > Yup. This became the first test that flew in the face of this comment in > SchedulerWorker: > // Normally holding a lock and joining is dangerous. However, since this is > // only for testing, we're okay since the only scenario that could impact this > // is a call to Detach, which is disallowed by having the delegate always > // return false for the CanDetach call. > > Because we have no control over the delegate from the test, I added > HasJoinedForTesting because of the race the other way: Thread wakes up after an > idle cycle and attempts to detach after the ShouldExitForTesting check. Also > long as the joins occur before the detach timeout we should be okay. > > Alternatively, we can wait until after the detach timeout occurs as well. > > I'm not sure we want to go with the suggested code because it walks down a > different problematic road. If we were to make SchedulerWorker robust to Join > and Detach races, now there's no guarantee that the current thread is cleaned up > after Join since it could be in the middle of detaching, which also seems > undesirable for tests that care about that. > > It seems that tension comes from having no control on the delegate, so I've > added a disallow control to address this. > > Multithreaded code is tricky indeed! Acknowledged. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:144: SchedulerWorkerPoolImpl* worker_pool, Can stay SchedulerWorkerPool* https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an issue if |num_single_threaded_runners_| is incremented after [...] after this because the newly created TaskRunner (from which no task has run yet) will simply run all its tasks on the next physical thread created by this worker. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker thread detachment. If the suggested reclaim time is not // [...], then the test should call this before JoinForTesting() // when no worker thread can be in the middle of detaching. That // can be when all workers are known to be idle (after // WaitForAllWorkersIdleForTesting() returned) or busy (after // posting as many blocking tasks as the number of workers). "before the first set of work is dispatched": If someone wants to disallow detachment before posting work, shouldn't they use TimeDelta::Max() for the suggested reclaim time? https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: worker_pool_->DisallowWorkerDetachmentForTesting(); This is not required given that suggested reclaim time is TimeDelta::Max(), right? https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: worker_pool->DisallowWorkerDetachmentForTesting(); This is not required given that suggested reclaim time is TimeDelta::Max(), right? https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: EXPECT_GT(zero_tls_values_, 0); On 2016/07/20 01:45:21, gab wrote: > Should this use NoBarrier_Load()? On Windows this is equivalent, but in > atomicops_internals_portable.h it appears to do more than just a simple deref. > > Note: NoBarrier is clearly correct because of the surrounding use of > WaitableEvent (not that it's very efficient given the heavier synchronization > constructs nearby but the barrier is definitely not necessary) atomicops.h:19 It is incorrect to make direct assignments to/from an atomic variable. You should use one of the Load or Store routines. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.cc:7: #include "base/time/time.h" Not needed if included in the header. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; #include "base/time/time.h" given that there is a TimeDelta member?
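The TimeDelta::Max() point above can be illustrated with a small sketch of the reclaim-time check, using std::chrono stand-ins for base::TimeDelta/TimeTicks (the function name is hypothetical, not from the CL): a pool whose suggested reclaim time is "infinite" can never satisfy the detach condition, which is why DisallowWorkerDetachmentForTesting() is redundant in those tests.

```cpp
#include <chrono>

using Duration = std::chrono::steady_clock::duration;
using TimePoint = std::chrono::steady_clock::time_point;

// Stand-in for TimeDelta::Max().
constexpr Duration kInfiniteReclaimTime = Duration::max();

// Hypothetical detach check: has the worker been idle at least as long as
// the suggested reclaim time?
bool IdleLongEnoughToDetach(TimePoint became_idle,
                            TimePoint now,
                            Duration suggested_reclaim_time) {
  if (suggested_reclaim_time == kInfiniteReclaimTime)
    return false;  // detachment effectively disabled
  return now - became_idle >= suggested_reclaim_time;
}
```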
Patchset #8 (id:280001) has been deleted
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 01:45:21, gab wrote: > On 2016/07/19 22:03:47, robliao wrote: > > On 2016/07/13 21:07:55, gab wrote: > > > On 2016/07/13 20:19:46, robliao wrote: > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > Why do you need Release_Load() here? > > > > > > Since CanDetach may make decisions based off of surrounding state from > > the > > > > > > single threaded task runners (like idle thread accounting), I figured > it > > > > would > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on its > > own > > > > (and > > > > > is in fact checked before the ReleaseLoad here anyways). All checks are > > > > > self-sufficient so this one is really only about > > > |num_single_threaded_runners| > > > > > IMO. > > > > > > > > In either case, it's the safest thing to do here. We want to make sure all > > > state > > > > from before gets synced up to this point. > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > > doesn't > > > document that it produces a memory barrier and nothing relies on it doing > so. > > > The only other thing after this call is HasJoinedForTesting() which is > > > self-sufficient. > > > > > > Everything in the scheduler is designed to do exactly what it needs to be > > safe. > > > Not more, not less. > > > > > > Adding Release semantics here implicitly encodes a requirement which doesn't > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares for a > > > sequence ;-). 
> > > > In light of the atomic ops discussion, any strong opinion to remove the > barrier > > here? > > From the conclusion we derived, I think the inc/decrements should use Barriers > and the read here should use Acquire_Load(). I'm still not convinced that this > is required when dealing with a single variable (all memory re-ordering examples > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > read-modify-write -- semantics which have an implicit "barrier" for at least > that single variable (but would allow memory re-ordering of other memory regions > with respect to it?), i.e. maybe we could get away with NoBarrier_Increment and > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we need > the Acquire_Load() though as the read doesn't have an implicit RMW like the > increments do (don't understand when Release_Load() is appropriate -- C++11 > doesn't provide it FWIW).

If my understanding is right, the atomic ops are guaranteed to provide atomicity in operations. The barriers control visibility. So IIUC, it's like this:

[Atomic Operation]

NoBarrier_AtomicIncrement of the same atomic variable:
Thread A: [In-place Atomic Increment] [In-place Atomic Increment]
Thread B: [In-place Atomic Increment] [In-place Atomic Increment]
Result Case 1: 0->1, 1->2
Result Case 2: 1->2, 2->3
(And more)

The value is atomic (we aren't reading in the middle of another thread writing). The visibility is up to the processor.

With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual exclusion):
Thread A: [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic Increment + B]
Thread B: [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic Increment + B]
Result Case 1: A: 0->1, 3->4; B: 1->2, 2->3
Result Case 2: A: 1->2, 3->4; B: 0->1, 2->3
(...)

A non-atomic increment would separate the read-increment-write steps, which for word-sized data types would lead to similar behavior as NoBarrier_AtomicIncrement.
For sizes larger than a word, we could see intermediate reads and writes. So that indeed suggests I need a barrier at the increments and decrements.

As for whether to use Acquire_Load or Release_Load, I guess the question comes down to when I want to sync the world:

Acquire: [Sync] Read
Release: Read [Sync]

Given that this count could be changing at any time, practically speaking it doesn't seem to matter: if we get a 0, it could be 1 right after the read in either case. The other thing to consider is the memory around the value. If that needs to be visible too, then you'll also need a barrier (e.g. a certain variable A is set to a value iff a variable B is 0). A Release_Store ensures that setting A and B in concert is visible immediately after B is set. In the case of a read, the other values are synchronized, so it doesn't matter as much if we sync now or later. With this analysis, I think arguably we can use NoBarrier_Load since, with the increment barriers, the value is visible everywhere. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/20 01:45:21, gab wrote: > On 2016/07/13 21:07:55, gab wrote: > > On 2016/07/07 20:45:57, fdoray wrote: > > > On 2016/07/07 17:49:16, robliao wrote: > > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > > > instead > > > > of > > > > > a SchedulerLock+bool. > > > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't intend > to > > > > wait on it. Maybe we should be using an atomic instead in those cases? > > > > > > It would be nice to have an AtomicBool class. Since we don't have it yet, > you > > > can keep this as-is.
> > > > We have CancellationFlag. It's name is a bit too specific and no one has > > bothered renaming it, but what some people do is typedef it to a > > WhateverElseFlag. > > > > Whether the flag or a WaitableEvent I would strongly prefer having a single > > member here. Having single-member locks already bothers me some and when > they're > > test specific it bothers me some more. > > ping ^^ > > > > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > > typedef at bottom to CancellationFlag for legacy usage. > Yup. I see your pending change for the atomic flag. We can incorporate that once the name change goes in. A typedef seems like an abuse of the type system just to perform a local rename. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:144: SchedulerWorkerPoolImpl* worker_pool, On 2016/07/20 14:15:43, fdoray wrote: > Can stay SchedulerWorkerPool* Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 01:45:21, gab wrote: > So if we create/wake one too many thread (i.e. 2 too many threads for workload > given we always aim to keep one ready), we'll immediately detach it? Or do we > want to at least wait for "reclaim time" (in which case I think starting this at > false gives us what we want?) Yes, if we wake one too many threads with no work, those threads will detach. In general, we shouldn't be doing that as that's indicative of a bug on our side. Waiting for the reclaim time is just as bad since we'll effectively have these threads spin waiting. We should instead fix the wake bug. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an issue if |num_single_threaded_runners_| is incremented after On 2016/07/20 14:15:43, fdoray wrote: > [...] after this because the newly created TaskRunner (from which no task has > run yet) will simply run all its tasks on the next physical thread created by > this worker. Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:564: worker_detachment_allowed_(true), On 2016/07/20 01:45:21, gab wrote: > (bool should be replaced by flag as mentioned in previous comment, but if it's > not : initialize it inline in header) Moved initialization. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker thread detachment. If the suggested reclaim time is not On 2016/07/20 14:15:43, fdoray wrote: > // [...], then the test should call this before JoinForTesting() > // when no worker thread can be in the middle of detaching. That > // can be when all workers are known to be idle (after > // WaitForAllWorkersIdleForTesting() returned) or busy (after > // posting as many blocking tasks as the number of workers). > > "before the first set of work is dispatched": If someone wants to disallow > detachment before posting work, shouldn't they use TimeDelta::Max() for the > suggested reclaim time? > Unless the test requires the ability to detach. I've relaxed the requirement from "first". https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); On 2016/07/20 01:45:21, gab wrote: > constexpr Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: worker_pool_->DisallowWorkerDetachmentForTesting(); On 2016/07/20 14:15:43, fdoray wrote: > This is not required given that suggested reclaim time is TimeDelta::Max(), > right? Yep. Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: worker_pool->DisallowWorkerDetachmentForTesting(); On 2016/07/20 14:15:43, fdoray wrote: > This is not required given that suggested reclaim time is TimeDelta::Max(), > right? Yep. Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { On 2016/07/20 01:45:21, gab wrote: > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) We're counting zero TLS values here so the name should stick. The TLS is always initialized to zero. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: EXPECT_GT(zero_tls_values_, 0); On 2016/07/20 14:15:43, fdoray wrote: > On 2016/07/20 01:45:21, gab wrote: > > Should this use NoBarrier_Load()? On Windows this is equivalent, but in > > atomicops_internals_portable.h it appears to do more than just a simple deref. > > > > Note: NoBarrier is clearly correct because of the surrounding use of > > WaitableEvent (not that it's very efficient given the heavier synchronization > > constructs nearby but the barrier is definitely not necessary) > > atomicops.h:19 It is incorrect to make direct assignments to/from an atomic > variable. 
You should use one of the Load or Store routines. The atomicops unittest directly references its variables for testing (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). Went ahead and changed this to a NoBarrier_Load. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.cc:7: #include "base/time/time.h" On 2016/07/20 14:15:44, fdoray wrote: > Not needed if included in the header. See earlier comment. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; On 2016/07/20 14:15:44, fdoray wrote: > #include "base/time/time.h" given that there is a TimeDelta member? The forward-decl still works here since we only need to know the size in the constructor. Folks who include this should also include time.h since they themselves will be using TimeDelta.
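The "variable A is set to a value iff variable B is 0" visibility argument in the reply above is the classic release/acquire publication pattern. A minimal C++11 sketch (illustrative names, not code from the CL):

```cpp
#include <atomic>

// Plain memory being published ("variable A") and its guard flag
// ("variable B" in the discussion above).
int payload = 0;
std::atomic<bool> ready{false};

void Publish(int value) {
  payload = value;                               // write the payload first
  ready.store(true, std::memory_order_release);  // then release the flag
}

// Returns true and fills |out| only when the payload is safely visible.
bool TryConsume(int* out) {
  if (!ready.load(std::memory_order_acquire))    // acquire pairs with release
    return false;
  *out = payload;  // guaranteed to see the write made before the store
  return true;
}
```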
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... > File base/task_scheduler/scheduler_worker_pool_impl.cc (right): > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... > base/task_scheduler/scheduler_worker_pool_impl.cc:531: > !subtle::Release_Load(&num_single_threaded_runners_) && > On 2016/07/20 01:45:21, gab wrote: > > On 2016/07/19 22:03:47, robliao wrote: > > > On 2016/07/13 21:07:55, gab wrote: > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > Why do you need Release_Load() here? > > > > > > > Since CanDetach may make decisions based off of surrounding > state from > > > the > > > > > > > single threaded task runners (like idle thread accounting), > I figured > > it > > > > > would > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is > done on its > > > own > > > > > (and > > > > > > is in fact checked before the ReleaseLoad here anyways). All > checks are > > > > > > self-sufficient so this one is really only about > > > > |num_single_threaded_runners| > > > > > > IMO. > > > > > > > > > > In either case, it's the safest thing to do here. We want to > make sure all > > > > state > > > > > from before gets synced up to this point. > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > CanDetach() > > > doesn't > > > > document that it produces a memory barrier and nothing relies on > it doing > > so. > > > > The only other thing after this call is HasJoinedForTesting() > which is > > > > self-sufficient. > > > > > > > > Everything in the scheduler is designed to do exactly what it > needs to be > > > safe. > > > > Not more, not less. > > > > > > > > Adding Release semantics here implicitly encodes a requirement > which doesn't > > > > exist. 
It's like using ThreadTaskRunnerHandle when one merely > cares for a > > > > sequence ;-). > > > > > > In light of the atomic ops discussion, any strong opinion to remove > the > > barrier > > > here? > > > > From the conclusion we derived, I think the inc/decrements should use > Barriers > > and the read here should use Acquire_Load(). I'm still not convinced > that this > > is required when dealing with a single variable (all memory > re-ordering examples > > I've seen use two variables and even a NoBarrier_Increment has to use > RMW -- > > read-modify-write -- semantics which have an implicit "barrier" for at > least > > that single variable (but would allow memory re-ordering of other > memory regions > > with respect to it?), i.e. maybe we could get away with > NoBarrier_Increment and > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced > we need > > the Acquire_Load() though as the read doesn't have an implicit RMW > like the > > increments do (don't understand when Release_Load() is appropriate -- > C++11 > > doesn't provide it FWIW). > > I've my understanding is right, the atomic ops are guaranteed to provide > atomicity in operations. The barriers control visibility. > > So IIUC, it's like this > > [Atomic Operation] > > NoBarrier_AtomicIncrement of the same atomic variable: > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > Result Case 1 0->1 1->2 > Result Case 2 1->2 2->3 > (And more) > > The value is atomic (we aren't reading in the middle of another thread > writing). The visibility is up to the processor. > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual > exclusion) > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic Increment + B] > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic Increment + B] > Result Case 1 A: 0->1 3->4 > B: 1->2 2->3 > > Result Case 2 A: 1->2 3->4 > B: 0->1 2->3 > (...) > > Non-atomic increment would separate read-increment-write steps, which > for word sized data types, would lead to similar behavior as > NoBarrier_AtomicIncrement. > Quibble: My understanding is that even word-sized increments can end up being non-atomic inside the processor. > For sizes larger than a word, we could see > intermediate reads and writes. > > So that indeed suggests I need a barrier at the increment and > decrements. > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess > the question comes down when do I want to sync the world > > Acquire [Sync] Read > Release Read [Sync] > > Given that this count could be changing any time, practically speaking, > it doesn't seem to matter. If we get a 0, it could be 1 right after the > read in either case. The other thing to consider is the memory around > the value. If that needs to be visible too, then you'll also need a > barrier (e.g. a certain variable A is set to a value iff a variable B is > 0). > A Release_Store ensures that setting A and B in concert is visible > immediately after B is set. > > In this case of a read, the other values are synchronized, so it doesn't > matter as much if we sync now or later. With this analysis here, I think > arguably we can use NoBarrier_Load since with the increment barriers, > the value is visible everywhere. > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... > File base/task_scheduler/scheduler_worker_pool_impl.h (right): > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void > CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { > On 2016/07/20 01:45:21, gab wrote: > > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) > > We're counting zero TLS values here so the name should stick. The TLS is > always initialized to zero. > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: > EXPECT_GT(zero_tls_values_, 0); > On 2016/07/20 14:15:43, fdoray wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > Should this use NoBarrier_Load()? On Windows this is equivalent, but > in > > > atomicops_internals_portable.h it appears to do more than just a > simple deref. > > > > > > Note: NoBarrier is clearly correct because of the surrounding use of > > > WaitableEvent (not that it's very efficient given the heavier > synchronization > > > constructs nearby but the barrier is definitely not necessary) > > > > atomicops.h:19 It is incorrect to make direct assignments to/from an > atomic > > variable. You should use one of the Load or Store routines. > > The atomicops unittest direct references their variables for testing > (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). > > Went ahead and changed this to a NoBarrier_Load > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > File base/task_scheduler/scheduler_worker_pool_params.cc (right): > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_pool_params.cc:7: #include > "base/time/time.h" > On 2016/07/20 14:15:44, fdoray wrote: > > Not needed if included in the header. > > See earlier comment. > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
> File base/task_scheduler/scheduler_worker_pool_params.h (right): > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; > On 2016/07/20 14:15:44, fdoray wrote: > > #include "base/time/time.h" given that there is a TimeDelta member? > > The forward-decl still works here since we only need to know the size in > the constructor. Folks who include this should also include time.h as > well since they themselves will be using TimeDelta. > > https://codereview.chromium.org/2116163002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 19:44:00, robliao wrote: > On 2016/07/20 01:45:21, gab wrote: > > On 2016/07/19 22:03:47, robliao wrote: > > > On 2016/07/13 21:07:55, gab wrote: > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > Why do you need Release_Load() here? > > > > > > > Since CanDetach may make decisions based off of surrounding state > from > > > the > > > > > > > single threaded task runners (like idle thread accounting), I > figured > > it > > > > > would > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on > its > > > own > > > > > (and > > > > > > is in fact checked before the ReleaseLoad here anyways). All checks > are > > > > > > self-sufficient so this one is really only about > > > > |num_single_threaded_runners| > > > > > > IMO. > > > > > > > > > > In either case, it's the safest thing to do here. We want to make sure > all > > > > state > > > > > from before gets synced up to this point. > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > > > doesn't > > > > document that it produces a memory barrier and nothing relies on it doing > > so. > > > > The only other thing after this call is HasJoinedForTesting() which is > > > > self-sufficient. > > > > > > > > Everything in the scheduler is designed to do exactly what it needs to be > > > safe. > > > > Not more, not less. > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > doesn't > > > > exist. 
It's like using ThreadTaskRunnerHandle when one merely cares for a > > > > sequence ;-). > > > > > > In light of the atomic ops discussion, any strong opinion to remove the > > barrier > > > here? > > > > From the conclusion we derived, I think the inc/decrements should use Barriers > > and the read here should use Acquire_Load(). I'm still not convinced that this > > is required when dealing with a single variable (all memory re-ordering > examples > > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > > read-modify-write -- semantics which have an implicit "barrier" for at least > > that single variable (but would allow memory re-ordering of other memory > regions > > with respect to it?), i.e. maybe we could get away with NoBarrier_Increment > and > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we need > > the Acquire_Load() though as the read doesn't have an implicit RMW like the > > increments do (don't understand when Release_Load() is appropriate -- C++11 > > doesn't provide it FWIW). > > I've my understanding is right We have different understandings on a few things (inline) >, the atomic ops are guaranteed to provide > atomicity in operations. The barriers control visibility. > > So IIUC, it's like this > > [Atomic Operation] > > NoBarrier_AtomicIncrement of the same atomic variable: > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > Result Case 1 0->1 1->2 > Result Case 2 1->2 2->3 > (And more) > > The value is atomic (we aren't reading in the middle of another thread writing). > The visibility is up to the processor. > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual > exclusion) > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > Increment + B] > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > Increment + B] > Result Case 1 A: 0->1 3->4 > B: 1->2 2->3 > > Result Case 2 A: 1->2 3->4 > B: 0->1 2->3 > (...) > > Non-atomic increment would separate read-increment-write steps, which for word > sized data types, would lead to similar behavior as NoBarrier_AtomicIncrement. Hmmm separating read-increment-write steps would be wrong even for word sized data types (the whole point of atomic RMW is to make sure to do so as a single transaction). > For sizes larger than a word, we could see intermediate reads and writes. > > So that indeed suggests I need a barrier at the increment and decrements. I think you need a barrier so that the result of the increment is "flushed" (guaranteed to not be reordered after the next sequential read). > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > question comes down when do I want to sync the world > > Acquire [Sync] Read > Release Read [Sync] > > Given that this count could be changing any time, practically speaking, it > doesn't seem to matter. If we get a 0, it could be 1 right after the read in > either case. The other thing to consider is the memory around the value. If that > needs to be visible too, then you'll also need a barrier (e.g. a certain > variable A is set to a value iff a variable B is 0). I think we do care about syncing before (Acquire), otherwise a thread could see 0 forever. It's not relevant that the count could go up right after check (we explicitly support that by sending a WakeUp which would result in a sleep no-op IIRC). What's relevant is that we don't see a value that's outdated beyond the range we explicitly protect ourselves against. > A Release_Store ensures that setting A and B in concert is visible immediately > after B is set. I don't think we care about that. 
Nothing depends on being synchronized with the variable at stake here, adding the Release adds an implicit dependency which we don't document and that's no good for readability (and a potential burden if someone starts depending on it because it happens to be there). Every synchronization in the scheduler is explicit and intentional, let's not add one for the sake of "why not". > > In this case of a read, the other values are synchronized What do you mean by "the other values are synchronized on read"? Which other values and why and why does it matter? >, so it doesn't matter > as much if we sync now or later. With this analysis here, I think arguably we > can use NoBarrier_Load since with the increment barriers, the value is visible > everywhere. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/20 19:44:00, robliao wrote: > On 2016/07/20 01:45:21, gab wrote: > > On 2016/07/13 21:07:55, gab wrote: > > > On 2016/07/07 20:45:57, fdoray wrote: > > > > On 2016/07/07 17:49:16, robliao wrote: > > > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > > > > instead > > > > > of > > > > > > a SchedulerLock+bool. > > > > > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't > intend > > to > > > > > wait on it. Maybe we should be using an atomic instead in those cases? > > > > > > > > It would be nice to have an AtomicBool class. Since we don't have it yet, > > you > > > > can keep this as-is. > > > > > > We have CancellationFlag. It's name is a bit too specific and no one has > > > bothered renaming it, but what some people do is typedef it to a > > > WhateverElseFlag. 
> > > > > > Whether the flag or a WaitableEvent I would strongly prefer having a single > > > member here. Having single-member locks already bothers me some and when > > they're > > > test specific it bothers me some more. > > > > ping ^^ > > > > > > > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > > > typedef at bottom to CancellationFlag for legacy usage. > > > > Yup. I see your pending change for the atomic flag. Which change? It sure isn't mine or I code in my sleep :-) > We can incorporate that once > the name change goes in. A typedef seems like an abuse of the type system just > to perform a local rename. It's at least done here (and I remember other places but I can't find it, maybe deleted code). I don't see why not, but fine to do it later too. https://cs.chromium.org/chromium/src/chrome/browser/after_startup_task_utils.... https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 19:44:01, robliao wrote: > On 2016/07/20 01:45:21, gab wrote: > > So if we create/wake one too many thread (i.e. 2 too many threads for workload > > given we always aim to keep one ready), we'll immediately detach it? Or do we > > want to at least wait for "reclaim time" (in which case I think starting this > at > > false gives us what we want?) > > Yes, if we wake one too many threads with no work, those threads will detach. In > general, we shouldn't be doing that as that's indicative of a bug on our side. > > Waiting for the reclaim time is just as bad since we'll effectively have these > threads spin waiting. We should instead fix the wake bug. It's not necessarily a bug. It can today under a simple bursty load. i.e. get 4 tasks, wake 4 threads. 
Tasks happen to be fast and are all processed by the first two threads before the other 2 even get scheduled. I think in those cases the ever-idle threads should wait for the reclaim time or we might end up thrashing the system. Say 4 more tasks show up after an immediate reclaim: we recreate those 2 threads but the 2 active ones yet again manage to process the 4 tasks before the 2 recreated threads are scheduled (etc.). So I still think we should start this false (and explain the above in a comment here). (And we should reset it to false right before detaching as well, for the same reasons.)
The way that the atomic increment is implemented for x86 on Windows is via _InterlockedExchangeAdd, which is intended to be an atomic operation. x86 allows for atomic increments in the processor as long as you specify the lock prefix on the INC instruction. Adds register to register are generally atomic in word sizes. The _InterlockedExchangeAdd gets translated to lock xadd [mem location], [increment value] making the increment atomic. Since on x86 in multiprocessor systems lock prefixed instructions have a total order, the barriers are not necessary. On Wed, Jul 20, 2016 at 1:17 PM, <danakj@chromium.org> wrote: > On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: > >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> base/task_scheduler/scheduler_worker_pool_impl.cc:531: >> !subtle::Release_Load(&num_single_threaded_runners_) && >> On 2016/07/20 01:45:21, gab wrote: >> > On 2016/07/19 22:03:47, robliao wrote: >> > > On 2016/07/13 21:07:55, gab wrote: >> > > > On 2016/07/13 20:19:46, robliao wrote: >> > > > > On 2016/07/13 18:36:31, gab wrote: >> > > > > > > > Why do you need Release_Load() here? >> > > > > > > Since CanDetach may make decisions based off of surrounding >> state from >> > > the >> > > > > > > single threaded task runners (like idle thread accounting), >> I figured >> > it >> > > > > would >> > > > > > > be robust to make sure everything is updated at this point. >> > > > > > >> > > > > > Hmmm, I don't think this matters, idle thread accounting is >> done on its >> > > own >> > > > > (and >> > > > > > is in fact checked before the ReleaseLoad here anyways). All >> checks are >> > > > > > self-sufficient so this one is really only about >> > > > |num_single_threaded_runners| >> > > > > > IMO. 
>> > > > > >> > > > > In either case, it's the safest thing to do here. We want to >> make sure all >> > > > state >> > > > > from before gets synced up to this point. >> > > > >> > > > I disagree. It's not safer. It's equivalent (plus overhead). >> CanDetach() >> > > doesn't >> > > > document that it produces a memory barrier and nothing relies on >> it doing >> > so. >> > > > The only other thing after this call is HasJoinedForTesting() >> which is >> > > > self-sufficient. >> > > > >> > > > Everything in the scheduler is designed to do exactly what it >> needs to be >> > > safe. >> > > > Not more, not less. >> > > > >> > > > Adding Release semantics here implicitly encodes a requirement >> which doesn't >> > > > exist. It's like using ThreadTaskRunnerHandle when one merely >> cares for a >> > > > sequence ;-). >> > > >> > > In light of the atomic ops discussion, any strong opinion to remove >> the >> > barrier >> > > here? >> > >> > From the conclusion we derived, I think the inc/decrements should use >> Barriers >> > and the read here should use Acquire_Load(). I'm still not convinced >> that this >> > is required when dealing with a single variable (all memory >> re-ordering examples >> > I've seen use two variables and even a NoBarrier_Increment has to use >> RMW -- >> > read-modify-write -- semantics which have an implicit "barrier" for at >> least >> > that single variable (but would allow memory re-ordering of other >> memory regions >> > with respect to it?), i.e. maybe we could get away with >> NoBarrier_Increment and >> > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced >> we need >> > the Acquire_Load() though as the read doesn't have an implicit RMW >> like the >> > increments do (don't understand when Release_Load() is appropriate -- >> C++11 >> > doesn't provide it FWIW). >> >> I've my understanding is right, the atomic ops are guaranteed to provide >> atomicity in operations. The barriers control visibility. 
>> >> So IIUC, it's like this >> >> [Atomic Operation] >> >> NoBarrier_AtomicIncrement of the same atomic variable: >> Thread A [In-place Atomic Increment] [In-place Atomic Increment] >> Thread B [In-place Atomic Increment] [In-place Atomic Increment] >> >> Result Case 1 0->1 1->2 >> Result Case 2 1->2 2->3 >> (And more) >> >> The value is atomic (we aren't reading in the middle of another thread >> writing). The visibility is up to the processor. >> >> With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual >> exclusion) >> >> Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place >> Atomic Increment + B] >> Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place >> Atomic Increment + B] >> Result Case 1 A: 0->1 3->4 >> B: 1->2 2->3 >> >> Result Case 2 A: 1->2 3->4 >> B: 0->1 2->3 >> (...) >> >> Non-atomic increment would separate read-increment-write steps, which >> for word sized data types, would lead to similar behavior as >> NoBarrier_AtomicIncrement. >> > > Quibble: My understanding is that even word-sized increments can end up > being non-atomic inside the processor. > > >> For sizes larger than a word, we could see >> intermediate reads and writes. >> >> So that indeed suggests I need a barrier at the increment and >> decrements. >> >> >> >> >> >> So as for whether or not to use Acquire_Load and Release_Load, I guess >> the question comes down when do I want to sync the world >> >> Acquire [Sync] Read >> Release Read [Sync] >> >> Given that this count could be changing any time, practically speaking, >> it doesn't seem to matter. If we get a 0, it could be 1 right after the >> read in either case. The other thing to consider is the memory around >> the value. If that needs to be visible too, then you'll also need a >> barrier (e.g. a certain variable A is set to a value iff a variable B is >> 0). >> A Release_Store ensures that setting A and B in concert is visible >> immediately after B is set. 
>> >> In this case of a read, the other values are synchronized, so it doesn't >> matter as much if we sync now or later. With this analysis here, I think >> arguably we can use NoBarrier_Load since with the increment barriers, >> the value is visible everywhere. >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock >> join_for_testing_called_lock_; >> On 2016/07/20 01:45:21, gab wrote: >> > On 2016/07/13 21:07:55, gab wrote: >> > > On 2016/07/07 20:45:57, fdoray wrote: >> > > > On 2016/07/07 17:49:16, robliao wrote: >> > > > > On 2016/07/04 19:41:32, fdoray wrote: >> > > > > > Elsewhere in base/task_scheduler, we use a manual reset >> WaitableEvent >> > > > instead >> > > > > of >> > > > > > a SchedulerLock+bool. >> > > > > >> > > > > Here, that seems like an abuse of the WaitableEvent since we >> don't intend >> > to >> > > > > wait on it. Maybe we should be using an atomic instead in those >> cases? >> > > > >> > > > It would be nice to have an AtomicBool class. Since we don't have >> it yet, >> > you >> > > > can keep this as-is. >> > > >> > > We have CancellationFlag. It's name is a bit too specific and no one >> has >> > > bothered renaming it, but what some people do is typedef it to a >> > > WhateverElseFlag. >> > > >> > > Whether the flag or a WaitableEvent I would strongly prefer having a >> single >> > > member here. Having single-member locks already bothers me some and >> when >> > they're >> > > test specific it bothers me some more. >> > >> > ping ^^ >> > >> > > >> > > PS: I'd be happy to see someone change that class to a BooleanFlag >> with a >> > > typedef at bottom to CancellationFlag for legacy usage. >> > >> >> Yup. I see your pending change for the atomic flag. 
We can incorporate >> that once the name change goes in. A typedef seems like an abuse of the >> type system just to perform a local rename. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.cc:144: >> SchedulerWorkerPoolImpl* worker_pool, >> On 2016/07/20 14:15:43, fdoray wrote: >> > Can stay SchedulerWorkerPool* >> >> Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.cc:446: >> performed_idle_cycle_ = true; >> On 2016/07/20 01:45:21, gab wrote: >> > So if we create/wake one too many thread (i.e. 2 too many threads for >> workload >> > given we always aim to keep one ready), we'll immediately detach it? >> Or do we >> > want to at least wait for "reclaim time" (in which case I think >> starting this at >> > false gives us what we want?) >> >> Yes, if we wake one too many threads with no work, those threads will >> detach. In general, we shouldn't be doing that as that's indicative of a >> bug on our side. >> >> Waiting for the reclaim time is just as bad since we'll effectively have >> these threads spin waiting. We should instead fix the wake bug. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an >> issue if |num_single_threaded_runners_| is incremented after >> On 2016/07/20 14:15:43, fdoray wrote: >> > [...] after this because the newly created TaskRunner (from which no >> task has >> > run yet) will simply run all its tasks on the next physical thread >> created by >> > this worker. >> >> Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
>> base/task_scheduler/scheduler_worker_pool_impl.cc:564: >> worker_detachment_allowed_(true), >> On 2016/07/20 01:45:21, gab wrote: >> > (bool should be replaced by flag as mentioned in previous comment, but >> if it's >> > not : initialize it inline in header) >> >> Moved initialization. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker >> thread detachment. If the suggested reclaim time is not >> On 2016/07/20 14:15:43, fdoray wrote: >> > // [...], then the test should call this before JoinForTesting() >> > // when no worker thread can be in the middle of detaching. That >> > // can be when all workers are known to be idle (after >> > // WaitForAllWorkersIdleForTesting() returned) or busy (after >> > // posting as many blocking tasks as the number of workers). >> > >> > "before the first set of work is dispatched": If someone wants to >> disallow >> > detachment before posting work, shouldn't they use TimeDelta::Max() >> for the >> > suggested reclaim time? >> > >> >> Unless the test requires the ability to detach. I've relaxed the >> requirement from "first". >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const >> TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); >> On 2016/07/20 01:45:21, gab wrote: >> > constexpr >> >> Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: >> worker_pool_->DisallowWorkerDetachmentForTesting(); >> On 2016/07/20 14:15:43, fdoray wrote: >> > This is not required given that suggested reclaim time is >> TimeDelta::Max(), >> > right? >> >> Yep. Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: >> worker_pool->DisallowWorkerDetachmentForTesting(); >> On 2016/07/20 14:15:43, fdoray wrote: >> > This is not required given that suggested reclaim time is >> TimeDelta::Max(), >> > right? >> >> Yep. Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void >> CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { >> On 2016/07/20 01:45:21, gab wrote: >> > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) >> >> We're counting zero TLS values here so the name should stick. The TLS is >> always initialized to zero. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: >> EXPECT_GT(zero_tls_values_, 0); >> On 2016/07/20 14:15:43, fdoray wrote: >> > On 2016/07/20 01:45:21, gab wrote: >> > > Should this use NoBarrier_Load()? On Windows this is equivalent, but >> in >> > > atomicops_internals_portable.h it appears to do more than just a >> simple deref. >> > > >> > > Note: NoBarrier is clearly correct because of the surrounding use of >> > > WaitableEvent (not that it's very efficient given the heavier >> synchronization >> > > constructs nearby but the barrier is definitely not necessary) >> > >> > atomicops.h:19 It is incorrect to make direct assignments to/from an >> atomic >> > variable. You should use one of the Load or Store routines. 
>> >> The atomicops unittest direct references their variables for testing >> (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). >> >> Went ahead and changed this to a NoBarrier_Load >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_params.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_params.cc:7: #include >> "base/time/time.h" >> On 2016/07/20 14:15:44, fdoray wrote: >> > Not needed if included in the header. >> >> See earlier comment. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_params.h (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; >> On 2016/07/20 14:15:44, fdoray wrote: >> > #include "base/time/time.h" given that there is a TimeDelta member? >> >> The forward-decl still works here since we only need to know the size in >> the constructor. Folks who include this should also include time.h as >> well since they themselves will be using TimeDelta. >> >> https://codereview.chromium.org/2116163002/ >> > > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On Wed, Jul 20, 2016 at 1:58 PM, Robert Liao <robliao@chromium.org> wrote: > The way that the atomic increment is implemented for x86 on Windows is via > _InterlockedExchangeAdd, which is intended to be an atomic operation. > > x86 allows for atomic increments in the processor as long as you specify > the lock prefix on the INC instruction. > Adds register to register are generally atomic in word sizes. > > The _InterlockedExchangeAdd gets translated to > lock xadd [mem location], [increment value] > making the increment atomic. > > Since on x86 in multiprocessor systems lock prefixed instructions have a > total order, the barriers are not necessary. > Perhaps thats related to why barriers have 0 perf impact on x86? (A reason to write generally for any platform I guess.) > > > On Wed, Jul 20, 2016 at 1:17 PM, <danakj@chromium.org> wrote: > >> On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: >> >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:531: >>> !subtle::Release_Load(&num_single_threaded_runners_) && >>> On 2016/07/20 01:45:21, gab wrote: >>> > On 2016/07/19 22:03:47, robliao wrote: >>> > > On 2016/07/13 21:07:55, gab wrote: >>> > > > On 2016/07/13 20:19:46, robliao wrote: >>> > > > > On 2016/07/13 18:36:31, gab wrote: >>> > > > > > > > Why do you need Release_Load() here? >>> > > > > > > Since CanDetach may make decisions based off of surrounding >>> state from >>> > > the >>> > > > > > > single threaded task runners (like idle thread accounting), >>> I figured >>> > it >>> > > > > would >>> > > > > > > be robust to make sure everything is updated at this point. 
>>> > > > > > >>> > > > > > Hmmm, I don't think this matters, idle thread accounting is >>> done on its >>> > > own >>> > > > > (and >>> > > > > > is in fact checked before the ReleaseLoad here anyways). All >>> checks are >>> > > > > > self-sufficient so this one is really only about >>> > > > |num_single_threaded_runners| >>> > > > > > IMO. >>> > > > > >>> > > > > In either case, it's the safest thing to do here. We want to >>> make sure all >>> > > > state >>> > > > > from before gets synced up to this point. >>> > > > >>> > > > I disagree. It's not safer. It's equivalent (plus overhead). >>> CanDetach() >>> > > doesn't >>> > > > document that it produces a memory barrier and nothing relies on >>> it doing >>> > so. >>> > > > The only other thing after this call is HasJoinedForTesting() >>> which is >>> > > > self-sufficient. >>> > > > >>> > > > Everything in the scheduler is designed to do exactly what it >>> needs to be >>> > > safe. >>> > > > Not more, not less. >>> > > > >>> > > > Adding Release semantics here implicitly encodes a requirement >>> which doesn't >>> > > > exist. It's like using ThreadTaskRunnerHandle when one merely >>> cares for a >>> > > > sequence ;-). >>> > > >>> > > In light of the atomic ops discussion, any strong opinion to remove >>> the >>> > barrier >>> > > here? >>> > >>> > From the conclusion we derived, I think the inc/decrements should use >>> Barriers >>> > and the read here should use Acquire_Load(). I'm still not convinced >>> that this >>> > is required when dealing with a single variable (all memory >>> re-ordering examples >>> > I've seen use two variables and even a NoBarrier_Increment has to use >>> RMW -- >>> > read-modify-write -- semantics which have an implicit "barrier" for at >>> least >>> > that single variable (but would allow memory re-ordering of other >>> memory regions >>> > with respect to it?), i.e. maybe we could get away with >>> NoBarrier_Increment and >>> > Acquire_Load but I'm not 100% sure anymore...). 
I am pretty convinced >>> we need >>> > the Acquire_Load() though as the read doesn't have an implicit RMW >>> like the >>> > increments do (don't understand when Release_Load() is appropriate -- >>> C++11 >>> > doesn't provide it FWIW). >>> >>> If my understanding is right, the atomic ops are guaranteed to provide >>> atomicity in operations. The barriers control visibility. >>> >>> So IIUC, it's like this >>> >>> [Atomic Operation] >>> >>> NoBarrier_AtomicIncrement of the same atomic variable: >>> Thread A [In-place Atomic Increment] [In-place Atomic Increment] >>> Thread B [In-place Atomic Increment] [In-place Atomic Increment] >>> >>> Result Case 1 0->1 1->2 >>> Result Case 2 1->2 2->3 >>> (And more) >>> >>> The value is atomic (we aren't reading in the middle of another thread >>> writing). The visibility is up to the processor. >>> >>> With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual >>> exclusion) >>> >>> Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place >>> Atomic Increment + B] >>> Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place >>> Atomic Increment + B] >>> Result Case 1 A: 0->1 3->4 >>> B: 1->2 2->3 >>> >>> Result Case 2 A: 1->2 3->4 >>> B: 0->1 2->3 >>> (...) >>> >>> Non-atomic increment would separate read-increment-write steps, which >>> for word sized data types, would lead to similar behavior as >>> NoBarrier_AtomicIncrement. >>> >> >> Quibble: My understanding is that even word-sized increments can end up >> being non-atomic inside the processor. >> >> >>> For sizes larger than a word, we could see >>> intermediate reads and writes. >>> >>> So that indeed suggests I need a barrier at the increment and >>> decrements. 
>>> >>> >>> >>> >>> >>> So as for whether or not to use Acquire_Load and Release_Load, I guess >>> the question comes down when do I want to sync the world >>> >>> Acquire [Sync] Read >>> Release Read [Sync] >>> >>> Given that this count could be changing any time, practically speaking, >>> it doesn't seem to matter. If we get a 0, it could be 1 right after the >>> read in either case. The other thing to consider is the memory around >>> the value. If that needs to be visible too, then you'll also need a >>> barrier (e.g. a certain variable A is set to a value iff a variable B is >>> 0). >>> A Release_Store ensures that setting A and B in concert is visible >>> immediately after B is set. >>> >>> In this case of a read, the other values are synchronized, so it doesn't >>> matter as much if we sync now or later. With this analysis here, I think >>> arguably we can use NoBarrier_Load since with the increment barriers, >>> the value is visible everywhere. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock >>> join_for_testing_called_lock_; >>> On 2016/07/20 01:45:21, gab wrote: >>> > On 2016/07/13 21:07:55, gab wrote: >>> > > On 2016/07/07 20:45:57, fdoray wrote: >>> > > > On 2016/07/07 17:49:16, robliao wrote: >>> > > > > On 2016/07/04 19:41:32, fdoray wrote: >>> > > > > > Elsewhere in base/task_scheduler, we use a manual reset >>> WaitableEvent >>> > > > instead >>> > > > > of >>> > > > > > a SchedulerLock+bool. >>> > > > > >>> > > > > Here, that seems like an abuse of the WaitableEvent since we >>> don't intend >>> > to >>> > > > > wait on it. Maybe we should be using an atomic instead in those >>> cases? >>> > > > >>> > > > It would be nice to have an AtomicBool class. 
Since we don't have >>> it yet, >>> > you >>> > > > can keep this as-is. >>> > > >>> > > We have CancellationFlag. It's name is a bit too specific and no one >>> has >>> > > bothered renaming it, but what some people do is typedef it to a >>> > > WhateverElseFlag. >>> > > >>> > > Whether the flag or a WaitableEvent I would strongly prefer having a >>> single >>> > > member here. Having single-member locks already bothers me some and >>> when >>> > they're >>> > > test specific it bothers me some more. >>> > >>> > ping ^^ >>> > >>> > > >>> > > PS: I'd be happy to see someone change that class to a BooleanFlag >>> with a >>> > > typedef at bottom to CancellationFlag for legacy usage. >>> > >>> >>> Yup. I see your pending change for the atomic flag. We can incorporate >>> that once the name change goes in. A typedef seems like an abuse of the >>> type system just to perform a local rename. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:144: >>> SchedulerWorkerPoolImpl* worker_pool, >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > Can stay SchedulerWorkerPool* >>> >>> Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:446: >>> performed_idle_cycle_ = true; >>> On 2016/07/20 01:45:21, gab wrote: >>> > So if we create/wake one too many thread (i.e. 2 too many threads for >>> workload >>> > given we always aim to keep one ready), we'll immediately detach it? >>> Or do we >>> > want to at least wait for "reclaim time" (in which case I think >>> starting this at >>> > false gives us what we want?) >>> >>> Yes, if we wake one too many threads with no work, those threads will >>> detach. 
In general, we shouldn't be doing that as that's indicative of a >>> bug on our side. >>> >>> Waiting for the reclaim time is just as bad since we'll effectively have >>> these threads spin waiting. We should instead fix the wake bug. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an >>> issue if |num_single_threaded_runners_| is incremented after >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > [...] after this because the newly created TaskRunner (from which no >>> task has >>> > run yet) will simply run all its tasks on the next physical thread >>> created by >>> > this worker. >>> >>> Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:564: >>> worker_detachment_allowed_(true), >>> On 2016/07/20 01:45:21, gab wrote: >>> > (bool should be replaced by flag as mentioned in previous comment, but >>> if it's >>> > not : initialize it inline in header) >>> >>> Moved initialization. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker >>> thread detachment. If the suggested reclaim time is not >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > // [...], then the test should call this before JoinForTesting() >>> > // when no worker thread can be in the middle of detaching. That >>> > // can be when all workers are known to be idle (after >>> > // WaitForAllWorkersIdleForTesting() returned) or busy (after >>> > // posting as many blocking tasks as the number of workers). 
>>> > >>> > "before the first set of work is dispatched": If someone wants to >>> disallow >>> > detachment before posting work, shouldn't they use TimeDelta::Max() >>> for the >>> > suggested reclaim time? >>> > >>> >>> Unless the test requires the ability to detach. I've relaxed the >>> requirement from "first". >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const >>> TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); >>> On 2016/07/20 01:45:21, gab wrote: >>> > constexpr >>> >>> Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: >>> worker_pool_->DisallowWorkerDetachmentForTesting(); >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > This is not required given that suggested reclaim time is >>> TimeDelta::Max(), >>> > right? >>> >>> Yep. Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: >>> worker_pool->DisallowWorkerDetachmentForTesting(); >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > This is not required given that suggested reclaim time is >>> TimeDelta::Max(), >>> > right? >>> >>> Yep. Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void >>> CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { >>> On 2016/07/20 01:45:21, gab wrote: >>> > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) >>> >>> We're counting zero TLS values here so the name should stick. The TLS is >>> always initialized to zero. 
>>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: >>> EXPECT_GT(zero_tls_values_, 0); >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > On 2016/07/20 01:45:21, gab wrote: >>> > > Should this use NoBarrier_Load()? On Windows this is equivalent, but >>> in >>> > > atomicops_internals_portable.h it appears to do more than just a >>> simple deref. >>> > > >>> > > Note: NoBarrier is clearly correct because of the surrounding use of >>> > > WaitableEvent (not that it's very efficient given the heavier >>> synchronization >>> > > constructs nearby but the barrier is definitely not necessary) >>> > >>> > atomicops.h:19 It is incorrect to make direct assignments to/from an >>> atomic >>> > variable. You should use one of the Load or Store routines. >>> >>> The atomicops unittest direct references their variables for testing >>> (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). >>> >>> Went ahead and changed this to a NoBarrier_Load >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_params.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_params.cc:7: #include >>> "base/time/time.h" >>> On 2016/07/20 14:15:44, fdoray wrote: >>> > Not needed if included in the header. >>> >>> See earlier comment. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_params.h (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; >>> On 2016/07/20 14:15:44, fdoray wrote: >>> > #include "base/time/time.h" given that there is a TimeDelta member? 
>>> >>> The forward-decl still works here since we only need to know the size in >>> the constructor. Folks who include this should also include time.h as >>> well since they themselves will be using TimeDelta. >>> >>> https://codereview.chromium.org/2116163002/
Yeah, I think atomicops are indeed atomic. Even the arm64 version of NoBarrier_AtomicIncrement <https://cs.chromium.org/chromium/src/v8/src/base/atomicops_internals_arm64_gc...> attempts to make sure the op was atomic. Since arm64 doesn't ensure the total ordering in multiprocessor systems, the barriers are required on both sides of the increment.

On Wed, Jul 20, 2016 at 2:00 PM, <danakj@chromium.org> wrote:
> [...]
*Barriers are required on both sides on the Barrier_AtomicIncrement version. On Wed, Jul 20, 2016 at 2:07 PM, Robert Liao <robliao@chromium.org> wrote: > Yeah, I think atomicops are indeed atomic. Even the arm64 version of > NoBarrier_AtomicIncrement > <https://cs.chromium.org/chromium/src/v8/src/base/atomicops_internals_arm64_gc...> > attempts to make sure the op was atomic. Since arm64 doesn't ensure the > total ordering in multiprocessor systems, the barriers are required on both > sides of the increment. > > On Wed, Jul 20, 2016 at 2:00 PM, <danakj@chromium.org> wrote: > >> On Wed, Jul 20, 2016 at 1:58 PM, Robert Liao <robliao@chromium.org> >> wrote: >> >>> The way that the atomic increment is implemented for x86 on Windows is >>> via _InterlockedExchangeAdd, which is intended to be an atomic operation. >>> >>> x86 allows for atomic increments in the processor as long as you specify >>> the lock prefix on the INC instruction. >>> Adds register to register are generally atomic in word sizes. >>> >>> The _InterlockedExchangeAdd gets translated to >>> lock xadd [mem location], [increment value] >>> making the increment atomic. >>> >>> Since on x86 in multiprocessor systems lock prefixed instructions have a >>> total order, the barriers are not necessary. >>> >> >> Perhaps thats related to why barriers have 0 perf impact on x86? (A >> reason to write generally for any platform I guess.) >> >> >>> >>> >>> On Wed, Jul 20, 2016 at 1:17 PM, <danakj@chromium.org> wrote: >>> >>>> On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: >>>> >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>>>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
>>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:531: >>>>> !subtle::Release_Load(&num_single_threaded_runners_) && >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > On 2016/07/19 22:03:47, robliao wrote: >>>>> > > On 2016/07/13 21:07:55, gab wrote: >>>>> > > > On 2016/07/13 20:19:46, robliao wrote: >>>>> > > > > On 2016/07/13 18:36:31, gab wrote: >>>>> > > > > > > > Why do you need Release_Load() here? >>>>> > > > > > > Since CanDetach may make decisions based off of surrounding >>>>> state from >>>>> > > the >>>>> > > > > > > single threaded task runners (like idle thread accounting), >>>>> I figured >>>>> > it >>>>> > > > > would >>>>> > > > > > > be robust to make sure everything is updated at this point. >>>>> > > > > > >>>>> > > > > > Hmmm, I don't think this matters, idle thread accounting is >>>>> done on its >>>>> > > own >>>>> > > > > (and >>>>> > > > > > is in fact checked before the ReleaseLoad here anyways). All >>>>> checks are >>>>> > > > > > self-sufficient so this one is really only about >>>>> > > > |num_single_threaded_runners| >>>>> > > > > > IMO. >>>>> > > > > >>>>> > > > > In either case, it's the safest thing to do here. We want to >>>>> make sure all >>>>> > > > state >>>>> > > > > from before gets synced up to this point. >>>>> > > > >>>>> > > > I disagree. It's not safer. It's equivalent (plus overhead). >>>>> CanDetach() >>>>> > > doesn't >>>>> > > > document that it produces a memory barrier and nothing relies on >>>>> it doing >>>>> > so. >>>>> > > > The only other thing after this call is HasJoinedForTesting() >>>>> which is >>>>> > > > self-sufficient. >>>>> > > > >>>>> > > > Everything in the scheduler is designed to do exactly what it >>>>> needs to be >>>>> > > safe. >>>>> > > > Not more, not less. >>>>> > > > >>>>> > > > Adding Release semantics here implicitly encodes a requirement >>>>> which doesn't >>>>> > > > exist. It's like using ThreadTaskRunnerHandle when one merely >>>>> cares for a >>>>> > > > sequence ;-). 
>>>>> > > >>>>> > > In light of the atomic ops discussion, any strong opinion to remove >>>>> the >>>>> > barrier >>>>> > > here? >>>>> > >>>>> > From the conclusion we derived, I think the inc/decrements should use >>>>> Barriers >>>>> > and the read here should use Acquire_Load(). I'm still not convinced >>>>> that this >>>>> > is required when dealing with a single variable (all memory >>>>> re-ordering examples >>>>> > I've seen use two variables and even a NoBarrier_Increment has to use >>>>> RMW -- >>>>> > read-modify-write -- semantics which have an implicit "barrier" for >>>>> at >>>>> least >>>>> > that single variable (but would allow memory re-ordering of other >>>>> memory regions >>>>> > with respect to it?), i.e. maybe we could get away with >>>>> NoBarrier_Increment and >>>>> > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced >>>>> we need >>>>> > the Acquire_Load() though as the read doesn't have an implicit RMW >>>>> like the >>>>> > increments do (don't understand when Release_Load() is appropriate -- >>>>> C++11 >>>>> > doesn't provide it FWIW). >>>>> >>>>> I've my understanding is right, the atomic ops are guaranteed to >>>>> provide >>>>> atomicity in operations. The barriers control visibility. >>>>> >>>>> So IIUC, it's like this >>>>> >>>>> [Atomic Operation] >>>>> >>>>> NoBarrier_AtomicIncrement of the same atomic variable: >>>>> Thread A [In-place Atomic Increment] [In-place Atomic Increment] >>>>> Thread B [In-place Atomic Increment] [In-place Atomic Increment] >>>>> >>>>> Result Case 1 0->1 1->2 >>>>> Result Case 2 1->2 2->3 >>>>> (And more) >>>>> >>>>> The value is atomic (we aren't reading in the middle of another thread >>>>> writing). The visibility is up to the processor. >>>>> >>>>> With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual >>>>> exclusion) >>>>> >>>>> Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place >>>>> Atomic Increment + B] >>>>> Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place >>>>> Atomic Increment + B] >>>>> Result Case 1 A: 0->1 3->4 >>>>> B: 1->2 2->3 >>>>> >>>>> Result Case 2 A: 1->2 3->4 >>>>> B: 0->1 2->3 >>>>> (...) >>>>> >>>>> Non-atomic increment would separate read-increment-write steps, which >>>>> for word sized data types, would lead to similar behavior as >>>>> NoBarrier_AtomicIncrement. >>>>> >>>> >>>> Quibble: My understanding is that even word-sized increments can end up >>>> being non-atomic inside the processor. >>>> >>>> >>>>> For sizes larger than a word, we could see >>>>> intermediate reads and writes. >>>>> >>>>> So that indeed suggests I need a barrier at the increment and >>>>> decrements. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> So as for whether or not to use Acquire_Load and Release_Load, I guess >>>>> the question comes down when do I want to sync the world >>>>> >>>>> Acquire [Sync] Read >>>>> Release Read [Sync] >>>>> >>>>> Given that this count could be changing any time, practically speaking, >>>>> it doesn't seem to matter. If we get a 0, it could be 1 right after the >>>>> read in either case. The other thing to consider is the memory around >>>>> the value. If that needs to be visible too, then you'll also need a >>>>> barrier (e.g. a certain variable A is set to a value iff a variable B >>>>> is >>>>> 0). >>>>> A Release_Store ensures that setting A and B in concert is visible >>>>> immediately after B is set. >>>>> >>>>> In this case of a read, the other values are synchronized, so it >>>>> doesn't >>>>> matter as much if we sync now or later. With this analysis here, I >>>>> think >>>>> arguably we can use NoBarrier_Load since with the increment barriers, >>>>> the value is visible everywhere. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
>>>>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>>>> base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock >>>>> join_for_testing_called_lock_; >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > On 2016/07/13 21:07:55, gab wrote: >>>>> > > On 2016/07/07 20:45:57, fdoray wrote: >>>>> > > > On 2016/07/07 17:49:16, robliao wrote: >>>>> > > > > On 2016/07/04 19:41:32, fdoray wrote: >>>>> > > > > > Elsewhere in base/task_scheduler, we use a manual reset >>>>> WaitableEvent >>>>> > > > instead >>>>> > > > > of >>>>> > > > > > a SchedulerLock+bool. >>>>> > > > > >>>>> > > > > Here, that seems like an abuse of the WaitableEvent since we >>>>> don't intend >>>>> > to >>>>> > > > > wait on it. Maybe we should be using an atomic instead in those >>>>> cases? >>>>> > > > >>>>> > > > It would be nice to have an AtomicBool class. Since we don't have >>>>> it yet, >>>>> > you >>>>> > > > can keep this as-is. >>>>> > > >>>>> > > We have CancellationFlag. It's name is a bit too specific and no >>>>> one >>>>> has >>>>> > > bothered renaming it, but what some people do is typedef it to a >>>>> > > WhateverElseFlag. >>>>> > > >>>>> > > Whether the flag or a WaitableEvent I would strongly prefer having >>>>> a >>>>> single >>>>> > > member here. Having single-member locks already bothers me some and >>>>> when >>>>> > they're >>>>> > > test specific it bothers me some more. >>>>> > >>>>> > ping ^^ >>>>> > >>>>> > > >>>>> > > PS: I'd be happy to see someone change that class to a BooleanFlag >>>>> with a >>>>> > > typedef at bottom to CancellationFlag for legacy usage. >>>>> > >>>>> >>>>> Yup. I see your pending change for the atomic flag. We can incorporate >>>>> that once the name change goes in. A typedef seems like an abuse of the >>>>> type system just to perform a local rename. 
>>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:144: >>>>> SchedulerWorkerPoolImpl* worker_pool, >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > Can stay SchedulerWorkerPool* >>>>> >>>>> Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:446: >>>>> performed_idle_cycle_ = true; >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > So if we create/wake one too many thread (i.e. 2 too many threads for >>>>> workload >>>>> > given we always aim to keep one ready), we'll immediately detach it? >>>>> Or do we >>>>> > want to at least wait for "reclaim time" (in which case I think >>>>> starting this at >>>>> > false gives us what we want?) >>>>> >>>>> Yes, if we wake one too many threads with no work, those threads will >>>>> detach. In general, we shouldn't be doing that as that's indicative of >>>>> a >>>>> bug on our side. >>>>> >>>>> Waiting for the reclaim time is just as bad since we'll effectively >>>>> have >>>>> these threads spin waiting. We should instead fix the wake bug. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an >>>>> issue if |num_single_threaded_runners_| is incremented after >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > [...] after this because the newly created TaskRunner (from which no >>>>> task has >>>>> > run yet) will simply run all its tasks on the next physical thread >>>>> created by >>>>> > this worker. >>>>> >>>>> Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
>>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:564: >>>>> worker_detachment_allowed_(true), >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > (bool should be replaced by flag as mentioned in previous comment, >>>>> but >>>>> if it's >>>>> > not : initialize it inline in header) >>>>> >>>>> Moved initialization. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows >>>>> worker >>>>> thread detachment. If the suggested reclaim time is not >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > // [...], then the test should call this before JoinForTesting() >>>>> > // when no worker thread can be in the middle of detaching. That >>>>> > // can be when all workers are known to be idle (after >>>>> > // WaitForAllWorkersIdleForTesting() returned) or busy (after >>>>> > // posting as many blocking tasks as the number of workers). >>>>> > >>>>> > "before the first set of work is dispatched": If someone wants to >>>>> disallow >>>>> > detachment before posting work, shouldn't they use TimeDelta::Max() >>>>> for the >>>>> > suggested reclaim time? >>>>> > >>>>> >>>>> Unless the test requires the ability to detach. I've relaxed the >>>>> requirement from "first". >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc >>>>> (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const >>>>> TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > constexpr >>>>> >>>>> Done. 
>>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: >>>>> worker_pool_->DisallowWorkerDetachmentForTesting(); >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > This is not required given that suggested reclaim time is >>>>> TimeDelta::Max(), >>>>> > right? >>>>> >>>>> Yep. Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: >>>>> worker_pool->DisallowWorkerDetachmentForTesting(); >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > This is not required given that suggested reclaim time is >>>>> TimeDelta::Max(), >>>>> > right? >>>>> >>>>> Yep. Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void >>>>> CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) >>>>> >>>>> We're counting zero TLS values here so the name should stick. The TLS >>>>> is >>>>> always initialized to zero. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: >>>>> EXPECT_GT(zero_tls_values_, 0); >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > On 2016/07/20 01:45:21, gab wrote: >>>>> > > Should this use NoBarrier_Load()? On Windows this is equivalent, >>>>> but >>>>> in >>>>> > > atomicops_internals_portable.h it appears to do more than just a >>>>> simple deref. 
>>>>> > > >>>>> > > Note: NoBarrier is clearly correct because of the surrounding use >>>>> of >>>>> > > WaitableEvent (not that it's very efficient given the heavier >>>>> synchronization >>>>> > > constructs nearby but the barrier is definitely not necessary) >>>>> > >>>>> > atomicops.h:19 It is incorrect to make direct assignments to/from an >>>>> atomic >>>>> > variable. You should use one of the Load or Store routines. >>>>> >>>>> The atomicops unittest direct references their variables for testing >>>>> (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). >>>>> >>>>> Went ahead and changed this to a NoBarrier_Load >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_params.cc (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_params.cc:7: #include >>>>> "base/time/time.h" >>>>> On 2016/07/20 14:15:44, fdoray wrote: >>>>> > Not needed if included in the header. >>>>> >>>>> See earlier comment. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_params.h (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; >>>>> On 2016/07/20 14:15:44, fdoray wrote: >>>>> > #include "base/time/time.h" given that there is a TimeDelta member? >>>>> >>>>> The forward-decl still works here since we only need to know the size >>>>> in >>>>> the constructor. Folks who include this should also include time.h as >>>>> well since they themselves will be using TimeDelta. 
>>>>> >>>>> https://codereview.chromium.org/2116163002/ >>>>> >>>> >>>> >>> >> > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Patchset #10 (id:340001) has been deleted
Patchset #10 (id:360001) has been deleted
Patchset #10 (id:380001) has been deleted
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 20:26:31, gab wrote: > On 2016/07/20 19:44:00, robliao wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > On 2016/07/19 22:03:47, robliao wrote: > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > Since CanDetach may make decisions based off of surrounding state > > from > > > > the > > > > > > > > single threaded task runners (like idle thread accounting), I > > figured > > > it > > > > > > would > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on > > its > > > > own > > > > > > (and > > > > > > > is in fact checked before the ReleaseLoad here anyways). All checks > > are > > > > > > > self-sufficient so this one is really only about > > > > > |num_single_threaded_runners| > > > > > > > IMO. > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to make sure > > all > > > > > state > > > > > > from before gets synced up to this point. > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > > > > doesn't > > > > > document that it produces a memory barrier and nothing relies on it > doing > > > so. > > > > > The only other thing after this call is HasJoinedForTesting() which is > > > > > self-sufficient. > > > > > > > > > > Everything in the scheduler is designed to do exactly what it needs to > be > > > > safe. > > > > > Not more, not less. 
> > > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > > doesn't > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares for > a > > > > > sequence ;-). > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove the > > > barrier > > > > here? > > > > > > From the conclusion we derived, I think the inc/decrements should use > Barriers > > > and the read here should use Acquire_Load(). I'm still not convinced that > this > > > is required when dealing with a single variable (all memory re-ordering > > examples > > > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > > > read-modify-write -- semantics which have an implicit "barrier" for at least > > > that single variable (but would allow memory re-ordering of other memory > > regions > > > with respect to it?), i.e. maybe we could get away with NoBarrier_Increment > > and > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we > need > > > the Acquire_Load() though as the read doesn't have an implicit RMW like the > > > increments do (don't understand when Release_Load() is appropriate -- C++11 > > > doesn't provide it FWIW). > > > > I've my understanding is right > > We have different understandings on a few things (inline) > > >, the atomic ops are guaranteed to provide > > atomicity in operations. The barriers control visibility. > > > > So IIUC, it's like this > > > > [Atomic Operation] > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > Result Case 1 0->1 1->2 > > Result Case 2 1->2 2->3 > > (And more) > > > > The value is atomic (we aren't reading in the middle of another thread > writing). > > The visibility is up to the processor. > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual > > exclusion) > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > Increment + B] > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > Increment + B] > > Result Case 1 A: 0->1 3->4 > > B: 1->2 2->3 > > > > Result Case 2 A: 1->2 3->4 > > B: 0->1 2->3 > > (...) > > > > Non-atomic increment would separate read-increment-write steps, which for word > > sized data types, would lead to similar behavior as NoBarrier_AtomicIncrement. > > Hmmm separating read-increment-write steps would be wrong even for word sized > data types (the whole point of atomic RMW is to make sure to do so as a single > transaction). > > > For sizes larger than a word, we could see intermediate reads and writes. > > > > So that indeed suggests I need a barrier at the increment and decrements. > > I think you need a barrier so that the result of the increment is "flushed" > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > > question comes down when do I want to sync the world > > > > Acquire [Sync] Read > > Release Read [Sync] > > > > Given that this count could be changing any time, practically speaking, it > > doesn't seem to matter. If we get a 0, it could be 1 right after the read in > > either case. The other thing to consider is the memory around the value. If > that > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > variable A is set to a value iff a variable B is 0). > > I think we do care about syncing before (Acquire), otherwise a thread could see > 0 forever. It's not relevant that the count could go up right after check (we > explicitly support that by sending a WakeUp which would result in a sleep no-op > IIRC). What's relevant is that we don't see a value that's outdated beyond the > range we explicitly protect ourselves against. 
> > > A Release_Store ensures that setting A and B in concert is visible immediately > > after B is set. > > I don't think we care about that. Nothing depends on being synchronized with the > variable at stake here, adding the Release adds an implicit dependency which we > don't document and that's no good for readability (and a potential burden if > someone starts depending on it because it happens to be there). Every > synchronization in the scheduler is explicit and intentional, let's not add one > for the sake of "why not". > > > > > In this case of a read, the other values are synchronized > > What do you mean by "the other values are synchronized on read"? Which other > values and why and why does it matter? > > >, so it doesn't matter > > as much if we sync now or later. With this analysis here, I think arguably we > > can use NoBarrier_Load since with the increment barriers, the value is visible > > everywhere. > > Hmmm separating read-increment-write steps would be wrong even for word sized data types (the whole point of atomic RMW is to make sure to do so as a single transaction). Indeed, but that's the joy of omitting the barriers. Non-atomic ops look indistinguishable from atomic ops. > I think you need a barrier so that the result of the increment is "flushed" (guaranteed to not be reordered after the next sequential read). Processors and compilers guarantee that single-threaded behavior is preserved regardless of the ordering game within a single processor. When multiple processors (and thus threads) are running, that guarantee goes down to the memory coherence guarantees (which are quite strong on x86 and weak on arm for example). > I don't think we care about that. Nothing depends on being synchronized with the variable at stake here, adding the Release adds an implicit dependency which we don't document and that's no good for readability (and a potential burden if someone starts depending on it because it happens to be there). 
Every synchronization in the scheduler is explicit and intentional, let's not add one for the sake of "why not". That's correct. Since all of the other values are already sync'ed via locks or are only modified on one thread, we're generally okay. > I think we do care about syncing before (Acquire), otherwise a thread could see 0 forever. It's not relevant that the count could go up right after check (we explicitly support that by sending a WakeUp which would result in a sleep no-op IIRC). What's relevant is that we don't see a value that's outdated beyond the range we explicitly protect ourselves against. In the Barrier_AtomicIncrement world, a thread won't always see zero because the increments are barriered. The memory address in cache is flushed and invalidated on the other cores. > What do you mean by "the other values are synchronized on read"? Which other values and why and why does it matter? I mean Release_Read. It doesn't matter in this case now that we're using barriers for the counters. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/20 20:26:31, gab wrote: > On 2016/07/20 19:44:00, robliao wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > On 2016/07/13 21:07:55, gab wrote: > > > > On 2016/07/07 20:45:57, fdoray wrote: > > > > > On 2016/07/07 17:49:16, robliao wrote: > > > > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > > > > Elsewhere in base/task_scheduler, we use a manual reset > WaitableEvent > > > > > instead > > > > > > of > > > > > > > a SchedulerLock+bool. > > > > > > > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't > > intend > > > to > > > > > > wait on it. Maybe we should be using an atomic instead in those cases? 
> > > > > > > > > > It would be nice to have an AtomicBool class. Since we don't have it > yet, > > > you > > > > > can keep this as-is. > > > > > > > > We have CancellationFlag. It's name is a bit too specific and no one has > > > > bothered renaming it, but what some people do is typedef it to a > > > > WhateverElseFlag. > > > > > > > > Whether the flag or a WaitableEvent I would strongly prefer having a > single > > > > member here. Having single-member locks already bothers me some and when > > > they're > > > > test specific it bothers me some more. > > > > > > ping ^^ > > > > > > > > > > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > > > > typedef at bottom to CancellationFlag for legacy usage. > > > > > > > Yup. I see your pending change for the atomic flag. > > Which change? It sure isn't mine or I code in my sleep :-) > > > We can incorporate that once > > the name change goes in. A typedef seems like an abuse of the type system just > > to perform a local rename. > > It's at least done here (and I remember other places but I can't find it, maybe > deleted code). I don't see why not, but fine to do it later too. > > https://cs.chromium.org/chromium/src/chrome/browser/after_startup_task_utils.... Ah, I guess I misread it and it was fdoray's https://codereview.chromium.org/2163753004/ It's coming, so we might as well wait for that. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 20:26:31, gab wrote: > On 2016/07/20 19:44:01, robliao wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > So if we create/wake one too many thread (i.e. 2 too many threads for > workload > > > given we always aim to keep one ready), we'll immediately detach it? 
Or do > we > > > want to at least wait for "reclaim time" (in which case I think starting > this > > at > > > false gives us what we want?) > > > > Yes, if we wake one too many threads with no work, those threads will detach. > In > > general, we shouldn't be doing that as that's indicative of a bug on our side. > > > > > Waiting for the reclaim time is just as bad since we'll effectively have these > > threads spin waiting. We should instead fix the wake bug. > > It's not necessarily a bug. It can today under a simple bursty load. > > i.e. get 4 tasks, wake 4 threads. Tasks happen to be fast and are all processed > by first two threads before other 2 even get scheduled. > > I think in those cases the ever idle threads should wait for reclaim time or we > might end up trashing the system. Say 4 more tasks show up after an immediate > reclaim: we recreate those 2 threads but the 2 active one yet again manage to > process the 4 tasks before the 2 recreated threads are scheduled (etc.). > > So I still think we should start this false (and explain the above in a comment > here). > > (and we should reset it to false right before detaching as well for the same > reasons) Fair enough. I've moved this to an idle_start_time instead of tracking one cycle count.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 22:18:03, robliao wrote: > On 2016/07/20 20:26:31, gab wrote: > > On 2016/07/20 19:44:00, robliao wrote: > > > On 2016/07/20 01:45:21, gab wrote: > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > Since CanDetach may make decisions based off of surrounding > state > > > from > > > > > the > > > > > > > > > single threaded task runners (like idle thread accounting), I > > > figured > > > > it > > > > > > > would > > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done > on > > > its > > > > > own > > > > > > > (and > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > checks > > > are > > > > > > > > self-sufficient so this one is really only about > > > > > > |num_single_threaded_runners| > > > > > > > > IMO. > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to make > sure > > > all > > > > > > state > > > > > > > from before gets synced up to this point. > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > CanDetach() > > > > > doesn't > > > > > > document that it produces a memory barrier and nothing relies on it > > doing > > > > so. > > > > > > The only other thing after this call is HasJoinedForTesting() which is > > > > > > self-sufficient. 
> > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it needs to > > be > > > > > safe. > > > > > > Not more, not less. > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > > > doesn't > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares > for > > a > > > > > > sequence ;-). > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove the > > > > barrier > > > > > here? > > > > > > > > From the conclusion we derived, I think the inc/decrements should use > > Barriers > > > > and the read here should use Acquire_Load(). I'm still not convinced that > > this > > > > is required when dealing with a single variable (all memory re-ordering > > > examples > > > > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > > > > read-modify-write -- semantics which have an implicit "barrier" for at least > > > > that single variable (but would allow memory re-ordering of other memory > > > regions > > > > with respect to it?), i.e. maybe we could get away with > NoBarrier_Increment > > > and > > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we > > need > > > > the Acquire_Load() though as the read doesn't have an implicit RMW like > the > > > > increments do (don't understand when Release_Load() is appropriate -- > C++11 > > > > doesn't provide it FWIW). > > > > > > If my understanding is right > > > > We have different understandings on a few things (inline) > > > > >, the atomic ops are guaranteed to provide > > > atomicity in operations. The barriers control visibility.
> > > > > > So IIUC, it's like this > > > > > > [Atomic Operation] > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > > > Result Case 1 0->1 1->2 > > > Result Case 2 1->2 2->3 > > > (And more) > > > > > > The value is atomic (we aren't reading in the middle of another thread > > writing). > > > The visibility is up to the processor. > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > exclusion) > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > > Increment + B] > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > > Increment + B] > > > Result Case 1 A: 0->1 3->4 > > > B: 1->2 2->3 > > > > > > Result Case 2 A: 1->2 3->4 > > > B: 0->1 2->3 > > > (...) > > > > > > Non-atomic increment would separate read-increment-write steps, which for > word > > > sized data types, would lead to similar behavior as > NoBarrier_AtomicIncrement. > > > > Hmmm separating read-increment-write steps would be wrong even for word sized > > data types (the whole point of atomic RMW is to make sure to do so as a single > > transaction). > > > > > For sizes larger than a word, we could see intermediate reads and writes. > > > > > > So that indeed suggests I need a barrier at the increment and decrements. > > > > I think you need a barrier so that the result of the increment is "flushed" > > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > > > question comes down when do I want to sync the world > > > > > > Acquire [Sync] Read > > > Release Read [Sync] > > > > > > Given that this count could be changing any time, practically speaking, it > > > doesn't seem to matter. 
If we get a 0, it could be 1 right after the read in > > > either case. The other thing to consider is the memory around the value. If > > that > > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > > variable A is set to a value iff a variable B is 0). > > > > I think we do care about syncing before (Acquire), otherwise a thread could > see > > 0 forever. It's not relevant that the count could go up right after check (we > > explicitly support that by sending a WakeUp which would result in a sleep > no-op > > IIRC). What's relevant is that we don't see a value that's outdated beyond the > > range we explicitly protect ourselves against. > > > > > A Release_Store ensures that setting A and B in concert is visible > immediately > > > after B is set. > > > > I don't think we care about that. Nothing depends on being synchronized with > the > > variable at stake here, adding the Release adds an implicit dependency which > we > > don't document and that's no good for readability (and a potential burden if > > someone starts depending on it because it happens to be there). Every > > synchronization in the scheduler is explicit and intentional, let's not add > one > > for the sake of "why not". > > > > > > > > In this case of a read, the other values are synchronized > > > > What do you mean by "the other values are synchronized on read"? Which other > > values and why and why does it matter? > > > > >, so it doesn't matter > > > as much if we sync now or later. With this analysis here, I think arguably > we > > > can use NoBarrier_Load since with the increment barriers, the value is > visible > > > everywhere. > > > > > Hmmm separating read-increment-write steps would be wrong even for word sized > data types (the whole point of atomic RMW is to make sure to do so as a single > transaction). > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > indistinguishable from atomic ops. 
> > > I think you need a barrier so that the result of the increment is "flushed" > (guaranteed to not be reordered after the next sequential read). > Processors and compilers guarantee that single-threaded behavior is preserved > regardless of the ordering game within a single processor. When multiple > processors (and thus threads) are running, that guarantee goes down to the > memory coherence guarantees (which are quite strong on x86 and weak on arm for > example). > > > I don't think we care about that. Nothing depends on being synchronized with > the > variable at stake here, adding the Release adds an implicit dependency which we > don't document and that's no good for readability (and a potential burden if > someone starts depending on it because it happens to be there). Every > synchronization in the scheduler is explicit and intentional, let's not add one > for the sake of "why not". > > That's correct. Since all of the other values are already sync'ed via locks or > are only modified on one thread, we're generally okay. > > > I think we do care about syncing before (Acquire), otherwise a thread could > see > 0 forever. It's not relevant that the count could go up right after check (we > explicitly support that by sending a WakeUp which would result in a sleep no-op > IIRC). What's relevant is that we don't see a value that's outdated beyond the > range we explicitly protect ourselves against. > > In the Barrier_AtomicIncrement world, a thread won't always see zero because the > increments are barriered. The memory address in cache is flushed and invalidated > on the other cores. I'm not convinced this is true. The RMW will add an lfence (Acquire) on its read and an sfence (Release) on its write *for that thread*. That doesn't mean that other threads will see the value (they need an lfence (Acquire) to be forced to fetch it).
In any case, even if NoBarrier was correct here, it's not worth optimizing CanDetach() as it's running when the thread is idle anyways so might as well add the lfence (Acquire). > > > What do you mean by "the other values are synchronized on read"? Which other > values and why and why does it matter? > I mean Release_Read. It doesn't matter in this case now that we're using > barriers for the counters. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 22:18:03, robliao wrote: > On 2016/07/20 20:26:31, gab wrote: > > On 2016/07/20 19:44:01, robliao wrote: > > > On 2016/07/20 01:45:21, gab wrote: > > > > So if we create/wake one too many thread (i.e. 2 too many threads for > > workload > > > > given we always aim to keep one ready), we'll immediately detach it? Or do > > we > > > > want to at least wait for "reclaim time" (in which case I think starting > > this > > > at > > > > false gives us what we want?) > > > > > > Yes, if we wake one too many threads with no work, those threads will > detach. > > In > > > general, we shouldn't be doing that as that's indicative of a bug on our > side. > > > > > > > > Waiting for the reclaim time is just as bad since we'll effectively have > these > > > threads spin waiting. We should instead fix the wake bug. > > > > It's not necessarily a bug. It can today under a simple bursty load. > > > > i.e. get 4 tasks, wake 4 threads. Tasks happen to be fast and are all > processed > > by first two threads before other 2 even get scheduled. > > > > I think in those cases the ever idle threads should wait for reclaim time or > we > > might end up trashing the system. 
Say 4 more tasks show up after an immediate > > reclaim: we recreate those 2 threads but the 2 active one yet again manage to > > process the 4 tasks before the 2 recreated threads are scheduled (etc.). > > > > So I still think we should start this false (and explain the above in a > comment > > here). > > > > (and we should reset it to false right before detaching as well for the same > > reasons) > > Fair enough. I've moved this to an idle_start_time instead of tracking one cycle > count. I like the new version very much, easier to read and reason about too :-). https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:541: (Time::Now() - idle_start_time_) > outer_->suggested_reclaim_time_ && Use TimeTicks? IIRC they're more reliable (and maybe even more efficient?) @fdoray's the expert on this
https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:541: (Time::Now() - idle_start_time_) > outer_->suggested_reclaim_time_ && On 2016/07/21 13:43:22, gab wrote: > Use TimeTicks? IIRC they're more reliable (and maybe even more efficient?) > > @fdoray's the expert on this Yes, use TimeTicks. Time = use it to get an absolute value convertible to UTC and comparable with values taken on other machines. It is low resolution, non-monotonic and doesn't increase at a constant rate. TimeTicks = use it to measure elapsed time. The absolute value has no meaning. It has a high resolution, is monotonic and increases at a constant rate. https://docs.google.com/document/d/1ypBZPZnshJ612FWAjkkpSBQiII7vYEEDsRkDOkVjQ...
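The same split exists in std::chrono terms (an analogy only, not the base/time API): system_clock plays the role of Time and steady_clock the role of TimeTicks, and an elapsed-time check like the reclaim comparison wants the monotonic one:

```cpp
#include <cassert>
#include <chrono>

// Analogy for the Time vs. TimeTicks distinction (illustration only, not
// the base/time API):
//  - std::chrono::system_clock ~ base::Time: wall-clock time, convertible
//    to a calendar date, but it can jump when the system clock is adjusted.
//  - std::chrono::steady_clock ~ base::TimeTicks: monotonic, meaningful
//    only as a difference, which is what an elapsed-time check wants.
bool PastReclaimTime(std::chrono::steady_clock::time_point idle_start,
                     std::chrono::steady_clock::time_point now,
                     std::chrono::steady_clock::duration suggested_reclaim) {
  // Safe because steady_clock never goes backwards: |now - idle_start| is
  // a true elapsed duration, unlike a Time::Now() subtraction which could
  // even come out negative after a clock adjustment.
  return (now - idle_start) > suggested_reclaim;
}
```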
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/21 13:43:21, gab wrote: > On 2016/07/20 22:18:03, robliao wrote: > > On 2016/07/20 20:26:31, gab wrote: > > > On 2016/07/20 19:44:00, robliao wrote: > > > > On 2016/07/20 01:45:21, gab wrote: > > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > > Since CanDetach may make decisions based off of surrounding > > state > > > > from > > > > > > the > > > > > > > > > > single threaded task runners (like idle thread accounting), I > > > > figured > > > > > it > > > > > > > > would > > > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done > > on > > > > its > > > > > > own > > > > > > > > (and > > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > > checks > > > > are > > > > > > > > > self-sufficient so this one is really only about > > > > > > > |num_single_threaded_runners| > > > > > > > > > IMO. > > > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to make > > sure > > > > all > > > > > > > state > > > > > > > > from before gets synced up to this point. > > > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > > CanDetach() > > > > > > doesn't > > > > > > > document that it produces a memory barrier and nothing relies on it > > > doing > > > > > so. 
> > > > > > > The only other thing after this call is HasJoinedForTesting() which > is > > > > > > > self-sufficient. > > > > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it needs > to > > > be > > > > > > safe. > > > > > > > Not more, not less. > > > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > > > > doesn't > > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares > > for > > > a > > > > > > > sequence ;-). > > > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove > the > > > > > barrier > > > > > > here? > > > > > > > > > > From the conclusion we derived, I think the inc/decrements should use > > > Barriers > > > > > and the read here should use Acquire_Load(). I'm still not convinced > that > > > this > > > > > is required when dealing with a single variable (all memory re-ordering > > > > examples > > > > > I've seen use two variables and even a NoBarrier_Increment has to use > RMW > > -- > > > > > read-modify-write -- semantics which have an implicit "barrier" for at > > least > > > > > that single variable (but would allow memory re-ordering of other memory > > > > regions > > > > > with respect to it?), i.e. maybe we could get away with > > NoBarrier_Increment > > > > and > > > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we > > > need > > > > > the Acquire_Load() though as the read doesn't have an implicit RMW like > > the > > > > > increments do (don't understand when Release_Load() is appropriate -- > > C++11 > > > > > doesn't provide it FWIW). > > > > > > > > I've my understanding is right > > > > > > We have different understandings on a few things (inline) > > > > > > >, the atomic ops are guaranteed to provide > > > > atomicity in operations. The barriers control visibility. 
> > > > > > > > So IIUC, it's like this > > > > > > > > [Atomic Operation] > > > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > > > > > Result Case 1 0->1 1->2 > > > > Result Case 2 1->2 2->3 > > > > (And more) > > > > > > > > The value is atomic (we aren't reading in the middle of another thread > > > writing). > > > > The visibility is up to the processor. > > > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > > exclusion) > > > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic > > > > Increment + B] > > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic > > > > Increment + B] > > > > Result Case 1 A: 0->1 3->4 > > > > B: 1->2 2->3 > > > > > > > > Result Case 2 A: 1->2 3->4 > > > > B: 0->1 2->3 > > > > (...) > > > > > > > > Non-atomic increment would separate read-increment-write steps, which for > > word > > > > sized data types, would lead to similar behavior as > > NoBarrier_AtomicIncrement. > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > sized > > > data types (the whole point of atomic RMW is to make sure to do so as a > single > > > transaction). > > > > > > > For sizes larger than a word, we could see intermediate reads and writes. > > > > > > > > So that indeed suggests I need a barrier at the increment and decrements. > > > > > > I think you need a barrier so that the result of the increment is "flushed" > > > (guaranteed to not be reordered after the next sequential read). 
> > > > > > > > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > > > > question comes down when do I want to sync the world > > > > > > > > Acquire [Sync] Read > > > > Release Read [Sync] > > > > > > > > Given that this count could be changing any time, practically speaking, it > > > > doesn't seem to matter. If we get a 0, it could be 1 right after the read > in > > > > either case. The other thing to consider is the memory around the value. > If > > > that > > > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > > > variable A is set to a value iff a variable B is 0). > > > > > > I think we do care about syncing before (Acquire), otherwise a thread could > > see > > > 0 forever. It's not relevant that the count could go up right after check > (we > > > explicitly support that by sending a WakeUp which would result in a sleep > > no-op > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > the > > > range we explicitly protect ourselves against. > > > > > > > A Release_Store ensures that setting A and B in concert is visible > > immediately > > > > after B is set. > > > > > > I don't think we care about that. Nothing depends on being synchronized with > > the > > > variable at stake here, adding the Release adds an implicit dependency which > > we > > > don't document and that's no good for readability (and a potential burden if > > > someone starts depending on it because it happens to be there). Every > > > synchronization in the scheduler is explicit and intentional, let's not add > > one > > > for the sake of "why not". > > > > > > > > > > > In this case of a read, the other values are synchronized > > > > > > What do you mean by "the other values are synchronized on read"? Which other > > > values and why and why does it matter? > > > > > > >, so it doesn't matter > > > > as much if we sync now or later. 
With this analysis here, I think arguably > > we > > > > can use NoBarrier_Load since with the increment barriers, the value is > > visible > > > > everywhere. > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > sized > > data types (the whole point of atomic RMW is to make sure to do so as a single > > transaction). > > > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > > indistinguishable from atomic ops. > > > > > I think you need a barrier so that the result of the increment is "flushed" > > (guaranteed to not be reordered after the next sequential read). > > Processors and compilers guarantee that single-threaded behavior is preserved > > regardless of the ordering game within a single processor. When multiple > > processors (and thus threads) are running, that guarantee goes down to the > > memory coherence guarantees (which are quite strong on x86 and weak on arm for > > example). > > > > > I don't think we care about that. Nothing depends on being synchronized with > > the > > variable at stake here, adding the Release adds an implicit dependency which > we > > don't document and that's no good for readability (and a potential burden if > > someone starts depending on it because it happens to be there). Every > > synchronization in the scheduler is explicit and intentional, let's not add > one > > for the sake of "why not". > > > > That's correct. Since all of the other values are already sync'ed via locks or > > are only modified on one thread, we're generally okay. > > > > > I think we do care about syncing before (Acquire), otherwise a thread could > > see > > 0 forever. It's not relevant that the count could go up right after check (we > > explicitly support that by sending a WakeUp which would result in a sleep > no-op > > IIRC). What's relevant is that we don't see a value that's outdated beyond the > > range we explicitly protect ourselves against. 
> > > > In the Barrier_AtomicIncrement world, a thread won't always see zero because > the > > increments are barriered. The memory address in cache is flushed and > invalidated > > on the other cores. > > I'm not convinced this is true. The RMW will add an lfence (Acquire) on its read > and an rfence (Release) on its write *for that thread*. That doesn't mean that > other threads will see the value (they need an lfence (Acquire)) to be forced to > fetch it. > > In any case, even if NoBarrier was correct here, it's not worth optimizing > CanDetach() as it's running when the thread is idle anyways so might as well add > the lfence (Acquire). > > > > > > What do you mean by "the other values are synchronized on read"? Which other > > values and why and why does it matter? > > I mean Release_Read. It doesn't matter in this case now that we're using > > barriers for the counters. > Digging into this deeper, the RMW will happen atomically with atomicops (lock xadd for x86, and a retry loop for arm to make sure the memory written is as expected). My operating assumption above on visibility incorrectly assumed a non-atomic RMW. The fence is to make sure that state is updated in a consistent way. Because memory ops can be reordered, things will appear just fine on the current thread, but could appear strange to outside threads. Fencing ensures that memory updates from a single thread appear in the correct order for other threads. Using the refcounted object as an example... When a refcounted object addrefs, the refcount doesn't really matter to an external thread except for the fact that it is non-zero. When refcounting down, the comment says to use a barrier to make sure the "state written before the reference count became zero will be visible to a thread that has just made the count zero". 
This comment makes no sense to me at first glance because all memory operations performed by a thread are immediately visible to that same thread (even with reordering, processors work hard to ensure that single-threaded behavior still works correctly). However, if someone knew they had the last ref (already problematic, but that's a separate issue) and set a boolean fReleasedLastRef, this could potentially be useful to another thread expecting to see fReleasedLastRef and an object of refcount 0. https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:541: (Time::Now() - idle_start_time_) > outer_->suggested_reclaim_time_ && On 2016/07/21 13:59:08, fdoray wrote: > On 2016/07/21 13:43:22, gab wrote: > > Use TimeTicks? IIRC they're more reliable (and maybe even more efficient?) > > > > @fdoray's the expert on this > > Yes, use TimeTicks. > > Time = use it to get an absolute value convertible to UTC and comparable with > values taken on other machines. It is low resolution, non-monotonic and doesn't > increase at a constant rate. > TimeTicks = use it to measure elapsed time. The absolute value has no meaning. > It has a high resolution, is monotonic and increases at a constant rate. > > https://docs.google.com/document/d/1ypBZPZnshJ612FWAjkkpSBQiII7vYEEDsRkDOkVjQ... Done.
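To pin down the counter pattern the Release_Load thread above converged on (barriered increments/decrements, Acquire read in CanDetach()), here is a rough translation into std::atomic terms. The names are illustrative, not the Chromium atomicops API:

```cpp
#include <atomic>
#include <cassert>

// Sketch of the |num_single_threaded_runners_| pattern in std::atomic
// terms (the CL uses base atomicops' Barrier_AtomicIncrement and
// Acquire_Load; the function names here are hypothetical).
std::atomic<int> g_num_single_threaded_runners{0};

void OnSingleThreadedRunnerCreated() {
  // Fully-ordered RMW, analogous to Barrier_AtomicIncrement(..., 1).
  g_num_single_threaded_runners.fetch_add(1, std::memory_order_seq_cst);
}

void OnSingleThreadedRunnerDestroyed() {
  g_num_single_threaded_runners.fetch_sub(1, std::memory_order_seq_cst);
}

bool NoSingleThreadedRunners() {
  // Acquire load, analogous to Acquire_Load(): it pairs with the
  // barriered writes above, so an idle worker deciding whether to detach
  // doesn't act on a read reordered ahead of earlier synchronization.
  return g_num_single_threaded_runners.load(std::memory_order_acquire) == 0;
}
```

As discussed, the acquire on the read is cheap relative to an idle worker's detach path, so there is little point shaving it off with a relaxed load.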
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/21 18:44:23, robliao wrote: > On 2016/07/21 13:43:21, gab wrote: > > On 2016/07/20 22:18:03, robliao wrote: > > > On 2016/07/20 20:26:31, gab wrote: > > > > On 2016/07/20 19:44:00, robliao wrote: > > > > > On 2016/07/20 01:45:21, gab wrote: > > > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > > > Since CanDetach may make decisions based off of surrounding > > > state > > > > > from > > > > > > > the > > > > > > > > > > > single threaded task runners (like idle thread accounting), > I > > > > > figured > > > > > > it > > > > > > > > > would > > > > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is > done > > > on > > > > > its > > > > > > > own > > > > > > > > > (and > > > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > > > checks > > > > > are > > > > > > > > > > self-sufficient so this one is really only about > > > > > > > > |num_single_threaded_runners| > > > > > > > > > > IMO. > > > > > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to > make > > > sure > > > > > all > > > > > > > > state > > > > > > > > > from before gets synced up to this point. > > > > > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). 
> > > CanDetach() > > > > > > > doesn't > > > > > > > > document that it produces a memory barrier and nothing relies on > it > > > > doing > > > > > > so. > > > > > > > > The only other thing after this call is HasJoinedForTesting() > which > > is > > > > > > > > self-sufficient. > > > > > > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it > needs > > to > > > > be > > > > > > > safe. > > > > > > > > Not more, not less. > > > > > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement > which > > > > > doesn't > > > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely > cares > > > for > > > > a > > > > > > > > sequence ;-). > > > > > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove > > the > > > > > > barrier > > > > > > > here? > > > > > > > > > > > > From the conclusion we derived, I think the inc/decrements should use > > > > Barriers > > > > > > and the read here should use Acquire_Load(). I'm still not convinced > > that > > > > this > > > > > > is required when dealing with a single variable (all memory > re-ordering > > > > > examples > > > > > > I've seen use two variables and even a NoBarrier_Increment has to use > > RMW > > > -- > > > > > > read-modify-write -- semantics which have an implicit "barrier" for at > > > least > > > > > > that single variable (but would allow memory re-ordering of other > memory > > > > > regions > > > > > > with respect to it?), i.e. maybe we could get away with > > > NoBarrier_Increment > > > > > and > > > > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced > we > > > > need > > > > > > the Acquire_Load() though as the read doesn't have an implicit RMW > like > > > the > > > > > > increments do (don't understand when Release_Load() is appropriate -- > > > C++11 > > > > > > doesn't provide it FWIW). 
> > > > > > > > > > I've my understanding is right > > > > > > > > We have different understandings on a few things (inline) > > > > > > > > >, the atomic ops are guaranteed to provide > > > > > atomicity in operations. The barriers control visibility. > > > > > > > > > > So IIUC, it's like this > > > > > > > > > > [Atomic Operation] > > > > > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > > > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > > > > > > > Result Case 1 0->1 1->2 > > > > > Result Case 2 1->2 2->3 > > > > > (And more) > > > > > > > > > > The value is atomic (we aren't reading in the middle of another thread > > > > writing). > > > > > The visibility is up to the processor. > > > > > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > > > exclusion) > > > > > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > > Atomic > > > > > Increment + B] > > > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > > Atomic > > > > > Increment + B] > > > > > Result Case 1 A: 0->1 3->4 > > > > > B: 1->2 2->3 > > > > > > > > > > Result Case 2 A: 1->2 3->4 > > > > > B: 0->1 2->3 > > > > > (...) > > > > > > > > > > Non-atomic increment would separate read-increment-write steps, which > for > > > word > > > > > sized data types, would lead to similar behavior as > > > NoBarrier_AtomicIncrement. > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > sized > > > > data types (the whole point of atomic RMW is to make sure to do so as a > > single > > > > transaction). > > > > > > > > > For sizes larger than a word, we could see intermediate reads and > writes. > > > > > > > > > > So that indeed suggests I need a barrier at the increment and > decrements. 
> > > > > > > > I think you need a barrier so that the result of the increment is > "flushed" > > > > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess > the > > > > > question comes down when do I want to sync the world > > > > > > > > > > Acquire [Sync] Read > > > > > Release Read [Sync] > > > > > > > > > > Given that this count could be changing any time, practically speaking, > it > > > > > doesn't seem to matter. If we get a 0, it could be 1 right after the > read > > in > > > > > either case. The other thing to consider is the memory around the value. > > If > > > > that > > > > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > > > > variable A is set to a value iff a variable B is 0). > > > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > could > > > see > > > > 0 forever. It's not relevant that the count could go up right after check > > (we > > > > explicitly support that by sending a WakeUp which would result in a sleep > > > no-op > > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > > the > > > > range we explicitly protect ourselves against. > > > > > > > > > A Release_Store ensures that setting A and B in concert is visible > > > immediately > > > > > after B is set. > > > > > > > > I don't think we care about that. Nothing depends on being synchronized > with > > > the > > > > variable at stake here, adding the Release adds an implicit dependency > which > > > we > > > > don't document and that's no good for readability (and a potential burden > if > > > > someone starts depending on it because it happens to be there). Every > > > > synchronization in the scheduler is explicit and intentional, let's not > add > > > one > > > > for the sake of "why not". 
> > > > > > > > > > > > > > In this case of a read, the other values are synchronized > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > other > > > > values and why and why does it matter? > > > > > > > > >, so it doesn't matter > > > > > as much if we sync now or later. With this analysis here, I think > arguably > > > we > > > > > can use NoBarrier_Load since with the increment barriers, the value is > > > visible > > > > > everywhere. > > > > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > sized > > > data types (the whole point of atomic RMW is to make sure to do so as a > single > > > transaction). > > > > > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > > > indistinguishable from atomic ops. > > > > > > > I think you need a barrier so that the result of the increment is > "flushed" > > > (guaranteed to not be reordered after the next sequential read). > > > Processors and compilers guarantee that single-threaded behavior is > preserved > > > regardless of the ordering game within a single processor. When multiple > > > processors (and thus threads) are running, that guarantee goes down to the > > > memory coherence guarantees (which are quite strong on x86 and weak on arm > for > > > example). > > > > > > > I don't think we care about that. Nothing depends on being synchronized > with > > > the > > > variable at stake here, adding the Release adds an implicit dependency which > > we > > > don't document and that's no good for readability (and a potential burden if > > > someone starts depending on it because it happens to be there). Every > > > synchronization in the scheduler is explicit and intentional, let's not add > > one > > > for the sake of "why not". > > > > > > That's correct. Since all of the other values are already sync'ed via locks > or > > > are only modified on one thread, we're generally okay. 
> > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > could > > > see > > > 0 forever. It's not relevant that the count could go up right after check > (we > > > explicitly support that by sending a WakeUp which would result in a sleep > > no-op > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > the > > > range we explicitly protect ourselves against. > > > > > > In the Barrier_AtomicIncrement world, a thread won't always see zero because > > the > > > increments are barriered. The memory address in cache is flushed and > > invalidated > > > on the other cores. > > > > I'm not convinced this is true. The RMW will add an lfence (Acquire) on its > read > > and an rfence (Release) on its write *for that thread*. That doesn't mean that > > other threads will see the value (they need an lfence (Acquire)) to be forced > to > > fetch it. > > > > In any case, even if NoBarrier was correct here, it's not worth optimizing > > CanDetach() as it's running when the thread is idle anyways so might as well > add > > the lfence (Acquire). > > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > other > > > values and why and why does it matter? > > > I mean Release_Read. It doesn't matter in this case now that we're using > > > barriers for the counters. > > > > Digging into this deeper, the RMW will happen atomically with atomicops (lock > xadd for x86, and a retry loop for arm to make sure the memory written is as > expected). My operating assumption above on visibility incorrectly assumed a > non-atomic RMW. > > The fence is to make sure that state is updated in a consistent way. Because > memory ops can be reordered, things will appear just fine on the current thread, > but could appear strange to outside threads. Fencing ensures that memory updates > from a single thread appear in the correct order for other threads. > > Using the refcounted object as an example... 
> When a refcounted object addrefs, the refcount doesn't really matter to an > external thread except for the fact that it is non-zero. > > When refcounting down, the comment says to use a barrier to make sure the "state > written before the reference count became zero will be visible to a thread that > has just made the count zero". This comment makes no sense to me because from > the same thread, all memory operations on the same thread are immediately > visible on the same thread (even with reordering, processors work hard to ensure > that single-threaded behaviors still work correctly). > > However, if considering that someone knew they had the last ref (already > problematic, but that's a separate issue), and they set a boolean > fReleasedLastRef, this could potentially be useful to another thread expecting > to see fReleasedLastRef and an object of refcount 0. Refcount is different, no one ever wants to just "read" the value (well except in say AtomicRefCountIsOne() which does use Acquire_Load() as I think we need to here).
Added a fence to the load. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/21 21:24:24, gab wrote: > On 2016/07/21 18:44:23, robliao wrote: > > On 2016/07/21 13:43:21, gab wrote: > > > On 2016/07/20 22:18:03, robliao wrote: > > > > On 2016/07/20 20:26:31, gab wrote: > > > > > On 2016/07/20 19:44:00, robliao wrote: > > > > > > On 2016/07/20 01:45:21, gab wrote: > > > > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > > > > Since CanDetach may make decisions based off of > surrounding > > > > state > > > > > > from > > > > > > > > the > > > > > > > > > > > > single threaded task runners (like idle thread > accounting), > > I > > > > > > figured > > > > > > > it > > > > > > > > > > would > > > > > > > > > > > > be robust to make sure everything is updated at this > point. > > > > > > > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is > > done > > > > on > > > > > > its > > > > > > > > own > > > > > > > > > > (and > > > > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > > > > checks > > > > > > are > > > > > > > > > > > self-sufficient so this one is really only about > > > > > > > > > |num_single_threaded_runners| > > > > > > > > > > > IMO. > > > > > > > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to > > make > > > > sure > > > > > > all > > > > > > > > > state > > > > > > > > > > from before gets synced up to this point. 
> > > > > > > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > > > > CanDetach() > > > > > > > > doesn't > > > > > > > > > document that it produces a memory barrier and nothing relies on > > it > > > > > doing > > > > > > > so. > > > > > > > > > The only other thing after this call is HasJoinedForTesting() > > which > > > is > > > > > > > > > self-sufficient. > > > > > > > > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it > > needs > > > to > > > > > be > > > > > > > > safe. > > > > > > > > > Not more, not less. > > > > > > > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement > > which > > > > > > doesn't > > > > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely > > cares > > > > for > > > > > a > > > > > > > > > sequence ;-). > > > > > > > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to > remove > > > the > > > > > > > barrier > > > > > > > > here? > > > > > > > > > > > > > > From the conclusion we derived, I think the inc/decrements should > use > > > > > Barriers > > > > > > > and the read here should use Acquire_Load(). I'm still not convinced > > > that > > > > > this > > > > > > > is required when dealing with a single variable (all memory > > re-ordering > > > > > > examples > > > > > > > I've seen use two variables and even a NoBarrier_Increment has to > use > > > RMW > > > > -- > > > > > > > read-modify-write -- semantics which have an implicit "barrier" for > at > > > > least > > > > > > > that single variable (but would allow memory re-ordering of other > > memory > > > > > > regions > > > > > > > with respect to it?), i.e. maybe we could get away with > > > > NoBarrier_Increment > > > > > > and > > > > > > > Acquire_Load but I'm not 100% sure anymore...). 
> > > > > > > I am pretty convinced we need the Acquire_Load() though as the read doesn't have an implicit RMW like the increments do (don't understand when Release_Load() is appropriate -- C++11 doesn't provide it FWIW). > > > > > > > > > > > > If my understanding is right > > > > > > > > > > We have different understandings on a few things (inline) > > > > > > > > > > >, the atomic ops are guaranteed to provide > > > > > > atomicity in operations. The barriers control visibility. > > > > > > > > > > > > So IIUC, it's like this > > > > > > > > > > > > [Atomic Operation] > > > > > > > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > > > > Thread A [In-place Atomic Increment] [In-place Atomic > Increment] > > > > > > Thread B [In-place Atomic Increment] [In-place Atomic > Increment] > > > > > > > > > > > > Result Case 1 0->1 1->2 > > > > > > Result Case 2 1->2 2->3 > > > > > > (And more) > > > > > > > > > > > > The value is atomic (we aren't reading in the middle of another thread > > > > > writing). > > > > > > The visibility is up to the processor. > > > > > > > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > > > > exclusion) > > > > > > > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > > > Atomic > > > > > > Increment + B] > > > > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > > > Atomic > > > > > > Increment + B] > > > > > > Result Case 1 A: 0->1 3->4 > > > > > > B: 1->2 2->3 > > > > > > > > > > > > Result Case 2 A: 1->2 3->4 > > > > > > B: 0->1 2->3 > > > > > > (...) > > > > > > > > > > > > Non-atomic increment would separate read-increment-write steps, which > > for > > > > word > > > > > > sized data types, would lead to similar behavior as > > > > NoBarrier_AtomicIncrement. 
> > > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > > sized > > > > > data types (the whole point of atomic RMW is to make sure to do so as a > > > single > > > > > transaction). > > > > > > > > > > > For sizes larger than a word, we could see intermediate reads and > > writes. > > > > > > > > > > > > So that indeed suggests I need a barrier at the increment and > > decrements. > > > > > > > > > > I think you need a barrier so that the result of the increment is > > "flushed" > > > > > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess > > the > > > > > > question comes down when do I want to sync the world > > > > > > > > > > > > Acquire [Sync] Read > > > > > > Release Read [Sync] > > > > > > > > > > > > Given that this count could be changing any time, practically > speaking, > > it > > > > > > doesn't seem to matter. If we get a 0, it could be 1 right after the > > read > > > in > > > > > > either case. The other thing to consider is the memory around the > value. > > > If > > > > > that > > > > > > needs to be visible too, then you'll also need a barrier (e.g. a > certain > > > > > > variable A is set to a value iff a variable B is 0). > > > > > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > > could > > > > see > > > > > 0 forever. It's not relevant that the count could go up right after > check > > > (we > > > > > explicitly support that by sending a WakeUp which would result in a > sleep > > > > no-op > > > > > IIRC). What's relevant is that we don't see a value that's outdated > beyond > > > the > > > > > range we explicitly protect ourselves against. > > > > > > > > > > > A Release_Store ensures that setting A and B in concert is visible > > > > immediately > > > > > > after B is set. 
> > > > > > > > > > I don't think we care about that. Nothing depends on being synchronized > > with > > > > the > > > > > variable at stake here, adding the Release adds an implicit dependency > > which > > > > we > > > > > don't document and that's no good for readability (and a potential > burden > > if > > > > > someone starts depending on it because it happens to be there). Every > > > > > synchronization in the scheduler is explicit and intentional, let's not > > add > > > > one > > > > > for the sake of "why not". > > > > > > > > > > > > > > > > > In this case of a read, the other values are synchronized > > > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > > other > > > > > values and why and why does it matter? > > > > > > > > > > >, so it doesn't matter > > > > > > as much if we sync now or later. With this analysis here, I think > > arguably > > > > we > > > > > > can use NoBarrier_Load since with the increment barriers, the value is > > > > visible > > > > > > everywhere. > > > > > > > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > > sized > > > > data types (the whole point of atomic RMW is to make sure to do so as a > > single > > > > transaction). > > > > > > > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > > > > indistinguishable from atomic ops. > > > > > > > > > I think you need a barrier so that the result of the increment is > > "flushed" > > > > (guaranteed to not be reordered after the next sequential read). > > > > Processors and compilers guarantee that single-threaded behavior is > > preserved > > > > regardless of the ordering game within a single processor. When multiple > > > > processors (and thus threads) are running, that guarantee goes down to the > > > > memory coherence guarantees (which are quite strong on x86 and weak on arm > > for > > > > example). > > > > > > > > > I don't think we care about that. 
Nothing depends on being synchronized > > with > > > > the > > > > variable at stake here, adding the Release adds an implicit dependency > which > > > we > > > > don't document and that's no good for readability (and a potential burden > if > > > > someone starts depending on it because it happens to be there). Every > > > > synchronization in the scheduler is explicit and intentional, let's not > add > > > one > > > > for the sake of "why not". > > > > > > > > That's correct. Since all of the other values are already sync'ed via > locks > > or > > > > are only modified on one thread, we're generally okay. > > > > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > > could > > > > see > > > > 0 forever. It's not relevant that the count could go up right after check > > (we > > > > explicitly support that by sending a WakeUp which would result in a sleep > > > no-op > > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > > the > > > > range we explicitly protect ourselves against. > > > > > > > > In the Barrier_AtomicIncrement world, a thread won't always see zero > because > > > the > > > > increments are barriered. The memory address in cache is flushed and > > > invalidated > > > > on the other cores. > > > > > > I'm not convinced this is true. The RMW will add an lfence (Acquire) on its > > read > > > and an rfence (Release) on its write *for that thread*. That doesn't mean > that > > > other threads will see the value (they need an lfence (Acquire)) to be > forced > > to > > > fetch it. > > > > > > In any case, even if NoBarrier was correct here, it's not worth optimizing > > > CanDetach() as it's running when the thread is idle anyways so might as well > > add > > > the lfence (Acquire). > > > > > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > > other > > > > values and why and why does it matter? > > > > I mean Release_Read. 
It doesn't matter in this case now that we're using > > > > barriers for the counters. > > > > > > > Digging into this deeper, the RMW will happen atomically with atomicops (lock > > xadd for x86, and a retry loop for arm to make sure the memory written is as > > expected). My operating assumption above on visibility incorrectly assumed a > > non-atomic RMW. > > > > The fence is to make sure that state is updated in a consistent way. Because > > memory ops can be reordered, things will appear just fine on the current > thread, > > but could appear strange to outside threads. Fencing ensures that memory > updates > > from a single thread appear in the correct order for other threads. > > > > Using the refcounted object as an example... > > When a refcounted object addrefs, the refcount doesn't really matter to an > > external thread except for the fact that it is non-zero. > > > > When refcounting down, the comment says to use a barrier to make sure the > "state > > written before the reference count became zero will be visible to a thread > that > > has just made the count zero". This comment makes no sense to me because from > > the same thread, all memory operations on the same thread are immediately > > visible on the same thread (even with reordering, processors work hard to > ensure > > that single-threaded behaviors still work correctly). > > > > However, if considering that someone knew they had the last ref (already > > problematic, but that's a separate issue), and they set a boolean > > fReleasedLastRef, this could potentially be useful to another thread expecting > > to see fReleasedLastRef and an object of refcount 0. > > Refcount is different, no one ever wants to just "read" the value (well except > in say AtomicRefCountIsOne() which does us Acquire_Load() as I think we need to > here). 
> > The Barrier on decrement (and that comment) IMO mean that: although the memory > order doesn't matter, whoever reaches 0 (and thus falls last in the relaxed > memory order) should sync its state for *other variables* so that whatever it > deletes (likely action of getting to 0) is the most recent version (and it's > probably guaranteed to see what other threads have done to *other variables* > because their RMW "happened before" in the relaxed memory order per this thread > being the one hitting 0 -- though since it hit 0 last I would assume even a > NoBarrier RMW would force the lfence on the R to sync...). Fenced the load.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm, thanks. Keep an eye on https://codereview.chromium.org/2163753004/ to fix lock+bool usage after it lands.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by robliao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from fdoray@chromium.org Link to the patchset: https://codereview.chromium.org/2116163002/#ps440001 (title: "Fenced Load")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ========== to ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ==========
Message was sent while issue was closed.
Committed patchset #12 (id:440001)
Message was sent while issue was closed.
Description was changed from ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ========== to ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 Committed: https://crrev.com/8a4029212fd36fdd8befc9479d6e2966bd27b95e Cr-Commit-Position: refs/heads/master@{#407205} ==========
Message was sent while issue was closed.
Patchset 12 (id:??) landed as https://crrev.com/8a4029212fd36fdd8befc9479d6e2966bd27b95e Cr-Commit-Position: refs/heads/master@{#407205}
Message was sent while issue was closed.
https://codereview.chromium.org/2116163002/diff/440001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/440001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:169: bool worker_detachment_allowed_ = true; https://codereview.chromium.org/2163753004/ landed, let's follow-up with using an AtomicFlag for this, thanks! |