Created: 4 years, 5 months ago by robliao
Modified: 4 years, 5 months ago
CC: chromium-reviews, gab+watch_chromium.org, robliao+watch_chromium.org, fdoray+watch_chromium.org, danakj
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description: Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool
BUG=553459
Committed: https://crrev.com/8a4029212fd36fdd8befc9479d6e2966bd27b95e
Cr-Commit-Position: refs/heads/master@{#407205}
Patch Set 1 #
Total comments: 41
Patch Set 2 : CR Feedback #
Patch Set 3 : CR Feedback fdoray@ #
Total comments: 9
Patch Set 4 : Test Controlled Detachment and Some Cleanup #
Total comments: 39
Patch Set 5 : CR Feedback gab@ #
Patch Set 6 : Rebase on top of SchedulerWorkerPoolParams #
Patch Set 7 : CR Feedback Continuation #
Total comments: 29
Patch Set 8 : CR Feedback #
Patch Set 9 : Rebase #
Patch Set 10 : CR Feedback #
Total comments: 3
Patch Set 11 : TimeTicks #
Patch Set 12 : Fenced Load #
Total comments: 1
Depends on Patchset:
Messages
Total messages: 67 (33 generated)
robliao@chromium.org changed reviewers: + fdoray@chromium.org, gab@chromium.org
Description was changed from ========== Enable Lazy Creation and Thread Detachment in the Scheduler Worker Pool BUG=553459 ========== to ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ==========
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:525: return suggested_reclaim_time_; From scheduler_worker.cc:

while (...) {
  scoped_refptr<Sequence> sequence = outer_->delegate_->GetWork(outer_);
  if (!sequence) {
    if (outer_->delegate_->CanDetach(outer_)) {
      detached_thread = outer_->Detach();
      ...
    }
    WaitForWork();
    continue;
  }
  ...
}

What prevents the worker from detaching as soon as GetWork() returns nullptr (i.e. without staying idle for |suggested_reclaim_time_|)? https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && Add comment explaining why it doesn't matter if |num_single_threaded_runners_| is incremented right after this (since no task has been posted to the newly created single-threaded TaskRunner, there is no need to keep the current "physical" thread). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:608: if (!idle_workers_stack_.Contains(worker)) A detached worker is only re-created when its WakeUp() method is called, right? WakeUp() can be called from 2 places: A) https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... A single-threaded task is posted to the worker. The next time that GetWork() is called, the worker will be removed from the idle stack because work is available (no other worker can run the single-threaded task). B) https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... A worker is woken up to run a shared task. Right before this is done, the worker is popped from the idle stack. 
I don't see how detachment makes it possible to try to add the same worker to the idle stack twice. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:99: void UnregisterSingleThreadTaskRunner(SchedulerWorker* worker); It would be better if these methods weren't here:
- They aren't meant to be called from outside of scheduler_worker_pool_impl.cc
- They don't access any member of SchedulerWorkerPoolImpl

Could you:
- Delete these methods.
- Make SchedulerSingleThreadTaskRunner a nested class of SchedulerWorkerPoolImpl.
- Call SchedulerWorkerDelegateImpl::RegisterSingleThreadTaskRunner directly from SchedulerSingleThreadTaskRunner.

class SchedulerWorkerPoolImpl::SchedulerSingleThreadTaskRunner {
 public:
  SchedulerSingleThreadTaskRunner(..., SchedulerWorker* worker) {
    auto delegate =
        static_cast<SchedulerWorkerDelegateImpl*>(worker->delegate());
    delegate->RegisterSingleThreadTaskRunner();
  }
  ...
};

https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent instead of a SchedulerLock+bool. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:449: Test that no detachment occurs when suggested reclaim time is TimeDelta::Max? https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might not have reached line 457 yet when Reset() is called. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, Nothing prevents all CountZeroTlsValue tasks from running on the same thread (the Wait task posted at line 522 doesn't have to run on the same thread as the CountZeroTlsValue task). Maybe you should move the Signal+Wait inside CountZeroTlsValue?
lg, a single comment for now and +1 to Francois'. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && Why do you need Release_Load() here?
Thanks for the review! https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:525: return suggested_reclaim_time_; On 2016/07/04 19:41:32, fdoray wrote: > From scheduler_worker.cc: > > while (...) { > scoped_refptr<Sequence> sequence = outer_->delegate_->GetWork(outer_); > if (!sequence) { > if (outer_->delegate_->CanDetach(outer_)) { > detached_thread = outer_->Detach(); > ... > } > WaitForWork(); > continue; > } > ... > } > > What prevents the worker from detaching as soon as GetWork() returns nullptr > (i.e. without staying idle for |suggested_reclaim_time_|)? Nice catch! Nothing at the moment. Fixed with an idle cycle bool tracked in CanDetach(). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && Comment added; done. On 2016/07/07 15:58:30, gab wrote: > Why do you need Release_Load() here? Since CanDetach may make decisions based on surrounding state from the single-threaded task runners (like idle thread accounting), I figured it would be robust to make sure everything is updated at this point. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:608: if (!idle_workers_stack_.Contains(worker)) On 2016/07/04 19:41:32, fdoray wrote: > A detached worker is only re-created when its WakeUp() method is called, right? > WakeUp() can be called from 2 places: > > A) > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > A single-threaded task is posted to the worker. 
The next time that GetWork() is > called, the worker will be removed from the idle stack because work is available > (no other worker can run the single-threaded task). > > B) > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > A worker is woken up to run a shared task. Right before this is done, the worker > is popped from the idle stack. > > I don't see how detachment makes it possible to try to add the same worker to > the idle stack twice. Prior to this change, GetWork had two preconditions:
1) There may be shared work to be done (the worker has been removed from the stack via WakeUpOneWorker).
2) There is single-threaded work to be done (the worker will remove itself from the stack).
With sleep timeouts, these preconditions no longer necessarily hold, since we may have been woken up by the timeout. This means we'll call GetWork while we're idle, note that there's no work to be done, and then re-add ourselves to the idle workers stack. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:99: void UnregisterSingleThreadTaskRunner(SchedulerWorker* worker); On 2016/07/04 19:41:32, fdoray wrote: > It would be better if these methods weren't here: > - They aren't meant to be called from outside of scheduler_worker_pool_impl.cc > - They don't access any member of SchedulerWorkerPoolImpl > > Could you: > - Delete these methods. > - Make SchedulerSingleThreadTaskRunner a nested class of > SchedulerWorkerPoolImpl. > - Call SchedulerWorkerDelegateImpl::RegisterSingleThreadTaskRunner directly from > SchedulerSingleThreadTaskRunner. 
> > class SchedulerWorkerPoolImpl::SchedulerSingleThreadTaskRunner { > public: > SchedulerSingleThreadTaskRunner(..., SchedulerWorker* worker) { > auto delegate = > static_cast<SchedulerWorkerDelegateImpl*>(worker->delegate()); > delegate->RegisterSingleThreadTaskRunner(); > } > ... > }; Done. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/04 19:41:32, fdoray wrote: > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent instead of > a SchedulerLock+bool. Here, that seems like an abuse of the WaitableEvent since we don't intend to wait on it. Maybe we should be using an atomic instead in those cases? https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:449: On 2016/07/04 19:41:33, fdoray wrote: > Test that no detachment occurs when suggested reclaim time is TimeDelta::Max? If we could test this, this would be better handled in the SchedulerWorkerTest. In either case, I'm not sure how we could test this since it depends on an indefinite waiting period. The test would have to check the state at an arbitrary time, and if the component was indeed adversarial, it could simply detach after the arbitrary time. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); On 2016/07/04 19:41:33, fdoray wrote: > Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might not > have reached line 457 yet when Reset() is callled. The tricky bit here is that we can't wait for all idle workers because they won't ever be idle if they're waiting for the signal. 
The WaitForAllTasksToRun in the for loop is the best current guarantee short of doing something semaphore-ish to make sure the thread is occupied before dispatching the next set. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, On 2016/07/04 19:41:32, fdoray wrote: > Nothing prevents all CountZeroTlsValue tasks from running on the same thread > (the Wait task posted at line 522 doesn't have to run on the same thread as the > CountZeroTlsValue task). Maybe you should move the Signal+Wait inside > CountZeroTlsValue? Doesn't WaitForAllTasksToRun with a waiting task also help provide this guarantee?
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:608: if (!idle_workers_stack_.Contains(worker)) On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:32, fdoray wrote: > > A detached worker is only re-created when its WakeUp() method is called, > right? > > WakeUp() can be called from 2 places: > > > > A) > > > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > > A single-threaded task is posted to the worker. The next time that GetWork() > is > > called, the worker will be removed from the idle stack because work is > available > > (no other worker can run the single-threaded task). > > > > B) > > > https://cs.chromium.org/chromium/src/base/task_scheduler/scheduler_worker_poo... > > A worker is woken up to run a shared task. Right before this is done, the > worker > > is popped from the idle stack. > > > > I don't see how detachment makes it possible to try to add the same worker to > > the idle stack twice. > > Prior to this change, GetWork had two preconditions: > 1) There may be shared work to be done (The worker has been removed from the > stack via WakeUpOneWorker). > 2) There is single-threaded work to be done (The worker will remove itself from > the stack). > > With sleep timeouts, this precondition is no longer true since we may have been > woken up by the timeout. This means we'll call GetWork while we're idle, note > that there's no work to be done, and then readd ourselves back onto the idle > workers stack. You're right. Thanks! https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:32, fdoray wrote: > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent instead > of > > a SchedulerLock+bool. > > Here, that seems like an abuse of the WaitableEvent since we don't intend to > wait on it. Maybe we should be using an atomic instead in those cases? It would be nice to have an AtomicBool class. Since we don't have it yet, you can keep this as-is. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:449: On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:33, fdoray wrote: > > Test that no detachment occurs when suggested reclaim time is TimeDelta::Max? > > If we could test this, this would be better handled in the SchedulerWorkerTest. > In either case, I'm not sure how we could test this since it depends on an > indefinite waiting period. The test would have to check the state at an > arbitrary time, and if the component was indeed adversarial, it could simply > detach after the arbitrary time. Acknowledged. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:33, fdoray wrote: > > Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might > not > > have reached line 457 yet when Reset() is callled. > > The tricky bit here is that we can't wait for all idle workers because they > won't ever be idle if they're waiting for the signal. 
The WaitForAllTasksToRun > in the for loop is the best current guarantee short of doing something > semaphore-ish to make sure the thread is occupied before dispatching the next > set. If Signal() stays before WaitForAllWorkersIdleForTesting(), all workers should eventually become idle? I think Reset() should be moved after WaitForAllWorkersIdleForTesting() to avoid having a task that doesn't call Wait() before Reset() is called (and thus blocks forever when it does call Wait()). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, On 2016/07/07 17:49:16, robliao wrote: > On 2016/07/04 19:41:32, fdoray wrote: > > Nothing prevents all CountZeroTlsValue tasks from running on the same thread > > (the Wait task posted at line 522 doesn't have to run on the same thread as > the > > CountZeroTlsValue task). Maybe you should move the Signal+Wait inside > > CountZeroTlsValue? > > Doesn't WaitForAllTasksToRun with a waiting task also help provide this > guarantee? Line 486: You create factories with PARALLEL TaskRunners. Loop iteration 1: Line 511: Task runs on thread A. Lines 515 and 518: Tasks run on thread B (nothing prevents these tasks from running on another thread than A). Loop iteration 2: Line 511: Task runs on thread A (again). [since two CountZeroTlsValues tasks ran on the same thread, there is a thread that won't be inspected]
Updated. Thanks! https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:498: waiter_.Signal(); On 2016/07/07 20:45:57, fdoray wrote: > On 2016/07/07 17:49:16, robliao wrote: > > On 2016/07/04 19:41:33, fdoray wrote: > > > Should the Reset() be after WaitForAllWorkersIdleForTesting()? A task might > > not > > > have reached line 457 yet when Reset() is callled. > > > > The tricky bit here is that we can't wait for all idle workers because they > > won't ever be idle if they're waiting for the signal. The > WaitForAllTasksToRun > > in the for loop is the best current guarantee short of doing something > > semaphore-ish to make sure the thread is occupied before dispatching the next > > set. > > If Signal() stays before WaitForAllWorkersIdleForTesting(), all workers should > eventually become idle? I think Reset() should be moved after > WaitForAllWorkersIdleForTesting() to avoid having a task that doesn't call > Wait() before Reset() is called (and thus blocks forever when it does call > Wait()). I initially misread this. Yup, let's move Reset after the wait for idle. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:513: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValues, On 2016/07/07 20:45:57, fdoray wrote: > On 2016/07/07 17:49:16, robliao wrote: > > On 2016/07/04 19:41:32, fdoray wrote: > > > Nothing prevents all CountZeroTlsValue tasks from running on the same thread > > > (the Wait task posted at line 522 doesn't have to run on the same thread as > > the > > > CountZeroTlsValue task). Maybe you should move the Signal+Wait inside > > > CountZeroTlsValue? 
> > > > Doesn't WaitForAllTasksToRun with a waiting task also help provide this > > guarantee? > > Line 486: You create factories with PARALLEL TaskRunners. > > Loop iteration 1: > Line 511: Task runs on thread A. > Lines 515 and 518: Tasks run on thread B (nothing prevents these tasks from > running on another thread than A). > > Loop iteration 2: > Line 511: Task runs on thread A (again). > [since two CountZeroTlsValues tasks ran on the same thread, there is a thread > that won't be inspected] Ah yes, indeed. I've now moved both waitable events into CountZeroTlsValues.
lgtm with nits and something that could be addressed in a separate CL if you prefer https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:183: // True if the worker performed and idle cycle. Workers start out idle. an* idle cycle https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:528: // It's not an issue if num_single_threaded_runners_ is incremented after this |num_single_threaded_runners_| https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:531: bool can_detach = performed_idle_cycle_ && const bool https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:534: !outer_->HasJoinedForTesting(); Sadly this is racy:
- Thread A: SchedulerWorker::CanDetach() - returns true.
- Thread B: SchedulerWorkerPoolImpl::JoinForTesting()
- Thread B: SchedulerWorker::JoinForTesting()
- Thread B: Acquires SchedulerWorker::thread_lock_
- Thread A: SchedulerWorker::Detach()
- Thread A: DCHECK(!ShouldExitForTesting()) fails in DCHECK builds.
- Thread A: Tries to acquire SchedulerWorker::thread_lock_.
Deadlock between the join and the acquisition of SchedulerWorker::thread_lock_.

I think SchedulerWorker::JoinForTesting() could be rewritten like this:

void SchedulerWorker::JoinForTesting() {
  {
    AutoSchedulerLock auto_lock(should_exit_for_testing_lock_);
    should_exit_for_testing_ = true;
  }
  WakeUp();

  std::unique_ptr<Thread> thread;

  {
    AutoSchedulerLock auto_lock(thread_lock_);
    thread = std::move(thread_);
  }

  // Join without holding the lock!
  thread->Join();
}

and SchedulerWorker::Detach() like this:

// Returns nullptr if called during thread->Join(). The thread won't
// detach. It will exit because outer_->ShouldExitForTesting() is true.
std::unique_ptr<SchedulerWorker::Thread> SchedulerWorker::Detach() {
  DCHECK(!ShouldExitForTesting()) << "Worker was already joined";
  AutoSchedulerLock auto_lock(thread_lock_);
  // If a wakeup is pending, then a WakeUp() came in while we were deciding to
  // detach. This means we can't go away anymore since we would break the
  // guarantee that we call GetWork() after a successful wakeup.
  return thread_ && thread_->IsWakeUpPending() ? nullptr : std::move(thread_);
}

To remove the requirement that Delegate::CanDetach() must return false when the thread may be joined. Maybe we should also add DCHECK(!ShouldExitForTesting()) in WakeUp()? Let me know if my reasoning makes sense.
Patchset #4 (id:60001) has been deleted
Patchset #4 (id:80001) has been deleted
Patchset #4 (id:100001) has been deleted
Patchset #4 (id:120001) has been deleted
Patchset #4 (id:140001) has been deleted
Went with test controlled detachment to fix the race. Thanks! https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:183: // True if the worker performed and idle cycle. Workers start out idle. On 2016/07/08 18:58:06, fdoray wrote: > an* idle cycle Done. https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:528: // It's not an issue if num_single_threaded_runners_ is incremented after this On 2016/07/08 18:58:06, fdoray wrote: > |num_single_threaded_runners_| Done. https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:531: bool can_detach = performed_idle_cycle_ && On 2016/07/08 18:58:07, fdoray wrote: > const bool Done. https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:534: !outer_->HasJoinedForTesting(); On 2016/07/08 18:58:07, fdoray wrote: > Sadly this is racy: > - Thread A: SchedulerWorker::CanDetach() - returns true. > - Thread B: SchedulerWorkerPoolImpl::JoinForTesting() > - Thread B: SchedulerWorker::JoinForTesting() > - Thread B: Acquires SchedulerWorker::thread_lock_ > - Thread A: SchedulerWorker::Detach() > - Thread A: DCHECK(!ShouldExitForTesting()) fails in DCHECK builds. > - Thread A: Tries to acquire SchedulerWorker::thread_lock_. 
> Dead lock between the join and the acquisition of SchedulerWorker::thread_lock_ > > I think SchedulerWorker::JoinForTesting() could be rewritten like this: > > void SchedulerWorker::JoinForTesting() { > { > AutoSchedulerLock auto_lock(should_exit_for_testing_lock_); > should_exit_for_testing_ = true; > } > WakeUp(); > > std::unique_ptr<Thread> thread; > > { > AutoSchedulerLock auto_lock(thread_lock_); > thread = std::move(thread_); > } > > // Join without holding the lock! > thread->Join(); > } > > and SchedulerWorker::Detach() like this: > > // Returns nullptr if called during thread->Join(). The thread won't > // detach. It will exit because !outer_->ShouldExitForTesting(). > std::unique_ptr<SchedulerWorker::Thread> SchedulerWorker::Detach() { > DCHECK(!ShouldExitForTesting()) << "Worker was already joined"; > AutoSchedulerLock auto_lock(thread_lock_); > // If a wakeup is pending, then a WakeUp() came in while we were deciding to > // detach. This means we can't go away anymore since we would break the > // guarantee that we call GetWork() after a successful wakeup. > return thread_ && thread_->IsWakeUpPending() ? nullptr : std::move(thread_); > } > > To remove the requirement that Delegate::CanDetach() must return false when the > thread may be joined. > > Maybe we should also add DCHECK(!ShouldExitForTesting()) in WakeUp()? > > Let me know if my reasoning makes sense. Yup. This became the first test that flew in the face of this comment in SchedulerWorker: // Normally holding a lock and joining is dangerous. However, since this is // only for testing, we're okay since the only scenario that could impact this // is a call to Detach, which is disallowed by having the delegate always // return false for the CanDetach call. Because we have no control over the delegate from the test, I added HasJoinedForTesting because of the race the other way: Thread wakes up after an idle cycle and attempts to detach after the ShouldExitForTesting check. 
As long as the joins occur before the detach timeout, we should be okay. Alternatively, we can wait until after the detach timeout occurs as well. I'm not sure we want to go with the suggested code because it walks down a different problematic road. If we were to make SchedulerWorker robust to Join and Detach races, now there's no guarantee that the current thread is cleaned up after Join since it could be in the middle of detaching, which also seems undesirable for tests that care about that. It seems that tension comes from having no control over the delegate, so I've added a disallow control to address this. Multithreaded code is tricky indeed!
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/07/08 21:17:03, robliao wrote: > Went with test controlled detachment to fix the race. Thanks! > [full reply quoted above snipped] No further comments?
Cc danakj for optional review.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
lg, full review this time, comments below https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && > > Why do you need Release_Load() here? > Since CanDetach may make decisions based off of surrounding state from the > single threaded task runners (like idle thread accounting), I figured it would > be robust to make sure everything is updated at this point. Hmmm, I don't think this matters, idle thread accounting is done on its own (and is in fact checked before the ReleaseLoad here anyways). All checks are self-sufficient so this one is really only about |num_single_threaded_runners| IMO. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:211: subtle::NoBarrier_AtomicIncrement(&num_single_threaded_runners_, 1); Let's wait until Dana/Francois settle the barrier/no-barrier discussion @ https://codereview.chromium.org/2019763002/diff/140001/base/task_scheduler/ta... https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; is_idle_ ? Then it's true whether it's a new worker or because it became idle? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:273: "Workers can detach during join."; CanWorker...() versus "Workers" (plural versus singular forms). I think it should be plural everywhere? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:399: SchedulerWorker* worker) missing space: git cl format? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:447: performed_idle_cycle_ = true; This is redundant per the value being initialized to true. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:506: performed_idle_cycle_ = false; is_idle_ = false; with no comments is sufficient IMO. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; Shouldn't is_idle_ instead be set when about to return no work from GetWork()? It seems weird to me to check |is_idle_| and then set it to true after? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:561: worker_detachment_allowed_(true), If keeping the bool (ref. WaitableEvent comment), initialize it at declaration site. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:616: // Detachment may cause multiple attempts to add because the delegate cannot Shouldn't a thread that wakes up after its sleep timeout for detachment detach without bothering to GetWork()? I guess we need to check GetWork() in the racy event that the thread was actually woken for work but happens to only get a CPU slot after its detach deadline? And hence we need this? (just making sure I follow :-)!) https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc...
base/task_scheduler/scheduler_worker_pool_impl.h:88: void DisallowWorkerDetachmentForTesting(); It seems weird to me that something that is "recommended" not be highlighted more than this. I see though that JoinForTesting() ensures this is called (but it can't fully confirm it was called early enough). So perhaps tweak the comment to something like: // Disallows worker thread detachment. In tests, this must be called before the last synchronization point with all worker threads (e.g. before having them sync on the last WaitableEvent or even before dispatching the first set of work) before calling JoinForTesting() or detach/join races may ensue. Actually, this makes me think, should we not instead get rid of Detach/Join races? I guess that adds a bit of test specific synchronization logic to Detach() in non-test code but since the thread is about to Detach(), it doesn't need to be efficient and the Join() side is only for testing so wtv. I think I prefer that to the complexity involved in this required DisallowWorkerDetachmentForTesting() call. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:177: bool worker_detachment_allowed_; Use a manual WaitableEvent instead of bool + lock? (this is what we did in many other similar scenarios in scheduler) https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:410: EXPECT_EQ(kMagicTlsValue, reinterpret_cast<size_t>(slot_.Get())); Instead of using TLS to do an "is this the right thread" check, I think it's cleaner to use ThreadCheckerImpl. To set it you can do: // Re-attach |thread_checker_| to the current thread. 
thread_checker_.DetachFromThread(); thread_checker_.CalledOnValidThread(); And then simply use CalledOnValidThread() in the check phase. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); Make the event AUTOMATIC instead of doing Wait()+Reset() https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:447: // Give the worker pool a chance to reclaim its threads. It was never forced to create more than one thread above so I don't understand what this means? IMO to test this you need N blocking tasks (N-1 of which are not single-threaded). Release the single-threaded one first, then the others. And confirm that despite that the single-threaded task can still run on the same thread (i.e. it wasn't among the terminated threads). Would be nice to verify actual thread count in process too to confirm that threads actually were created and wound down (confirming that things happened and single-threaded bound thread was protected). https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:539: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValuesAndWait, Instead of counting lack of TLS values here I think a cleaner approach would be to build a std::set of PlatformThreadRef in the first phase and check against it in the second phase (effectively ThreadCheckerImpl but pooled). https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_stack.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_stack.h:38: // Removes the top SchedulerWorker from the stack and returns it. Doesn't "remove"
Patchset #5 (id:180001) has been deleted
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/13 18:36:31, gab wrote: > > > Why do you need Release_Load() here? > > Since CanDetach may make decisions based off of surrounding state from the > > single threaded task runners (like idle thread accounting), I figured it would > > be robust to make sure everything is updated at this point. > > Hmmm, I don't think this matters, idle thread accounting is done on its own (and > is in fact checked before the ReleaseLoad here anyways). All checks are > self-sufficient so this one is really only about |num_single_threaded_runners| > IMO. In either case, it's the safest thing to do here. We want to make sure all state from before gets synced up to this point. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:211: subtle::NoBarrier_AtomicIncrement(&num_single_threaded_runners_, 1); On 2016/07/13 18:36:31, gab wrote: > Let's wait until Dana/Francois settle the barrier/no-barrier discussion @ > https://codereview.chromium.org/2019763002/diff/140001/base/task_scheduler/ta... sgtm. As we are all well aware, atomics are tricky. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; On 2016/07/13 18:36:31, gab wrote: > is_idle_ ? Then it's true whether it's a new worker or because it became idle? 
The delegate needs to distinguish between a thread that's idle (GetWork() == nullptr) and a thread that's idle after running an idle cycle (Sleeping and then waking up again with GetWork() == nullptr). Tracking is_idle_ is insufficient as that makes detachment happen too early (detachment is checked after the GetWork() call). We can't check for detachment after wakeup because we can't distinguish between wakeups and timeouts (and they're racy). Checking for an idle cycle is equivalent to checking that we got no work, attempted to check if we can detach, slept, and then woke up again. If we went through this cycle once, then we can be assured that we should detach in the next wake up if there's no work. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:273: "Workers can detach during join."; On 2016/07/13 18:36:31, gab wrote: > CanWorker...() versus "Workers" (plural versus singular forms). I think it > should be plural everywhere? The act of detaching applies to a single worker, making it singular. A set of workers can't determine among themselves if detachment is allowed. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:399: SchedulerWorker* worker) On 2016/07/13 18:36:31, gab wrote: > missing space: git cl format? Done. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:447: performed_idle_cycle_ = true; On 2016/07/13 18:36:31, gab wrote: > This is redundant per the value being initialized to true. The delegate is reused across thread instances, so we can have multiple MainEntry calls per delegate. The state needs to be reset here. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc...
base/task_scheduler/scheduler_worker_pool_impl.cc:506: performed_idle_cycle_ = false; On 2016/07/13 18:36:31, gab wrote: > is_idle_ = false; > > with no comments is sufficient IMO. sgtm. Removed. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; On 2016/07/13 18:36:31, gab wrote: > Shouldn't is_idle_ instead be set when about to return no work from GetWork()? > It seems weird to me to check |is_idle_| and then set it to true after? See is_idle_ comment. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:561: worker_detachment_allowed_(true), On 2016/07/13 18:36:31, gab wrote: > If keeping the bool (ref. WaitableEvent comment), initialize it at declaration > site. See earlier comment. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:616: // Detachment may cause multiple attempts to add because the delegate cannot On 2016/07/13 18:36:31, gab wrote: > Shouldn't a thread that wakes up after it's sleep timeout for detachment detach > without bothering to GetWork()? I guess we need to check GetWork() in the racy > event that the thread was actually woken for work but happens to only get a CPU > slot after its detach deadline? And hence we need this? (just making sure I > follow :-)!) We can't distinguish between sleep timeouts and actual wakeups. They look the same from the thread's perspective. Add to that the race, and we have to check GetWork in all cases. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.h:88: void DisallowWorkerDetachmentForTesting(); On 2016/07/13 18:36:31, gab wrote: > It seems weird to me that something that is "recommended" not be highlighted > more than this. I see though that JoinForTesting() ensures this is called (but > it can't fully confirm it was called early enough). So perhaps tweak the comment > to something like: > > // Disallows worker thread detachment. In tests, this must be called before the > last synchronization point with all worker threads (e.g. before having them sync > on the last WaitableEvent or even before dispatching the first set of work) > before calling JoinForTesting() or detach/join races may ensue. > > Actually, this makes me think, should we not instead get rid of Detach/Join > races? I guess that adds a bit of test specific synchronization logic to > Detach() in non-test code but since the thread is about to Detach(), it doesn't > need to be efficient and the Join() side is only for testing so wtv. > > I think I prefer that to the complexity involved in this required > DisallowWorkerDetachmentForTesting() call. I've amended the comment. Tests that specify a reclaim time of TimeDelta::Max() don't need to call this function. The reason why I think we should avoid fixing the race is it only applies to tests that allow detachment (and as of the new patchset, it's only the tests that test for detachment). Given that the test is looking for detachment, it already has to construct the worker pool where it can inspect state at certain timed points. All other tests can specify TimeDelta::Max() to avoid the issue entirely. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:177: bool worker_detachment_allowed_; On 2016/07/13 18:36:31, gab wrote: > Use a manual WaitableEvent instead of bool + lock? 
(this is what we did in many > other similar scenarios in scheduler) Discussed here: https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... In short, it's an abuse of WaitableEvent since we're never actually going to wait on this value. What we really want is something like AtomicBool. Until then, this is the next best thing. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:410: EXPECT_EQ(kMagicTlsValue, reinterpret_cast<size_t>(slot_.Get())); On 2016/07/13 18:36:31, gab wrote: > Instead of using TLS to do an "is this the right thread" check, I think it's > cleaner to use ThreadCheckerImpl. > > To set it you can do: > > // Re-attach |thread_checker_| to the current thread. > thread_checker_.DetachFromThread(); > thread_checker.CalledOnValidThread(); > > And then simply use CalledOnValidThread() in the check phase. Nice. I forgot about ThreadCheckerImpl. Done. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); On 2016/07/13 18:36:31, gab wrote: > Make the event AUTOMATIC instead of doing Wait()+Reset() I thought we were preferring Manual events for tests? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:447: // Give the worker pool a chance to reclaim its threads. On 2016/07/13 18:36:32, gab wrote: > It was never forced to create more than one thread above so I don't understand > what this means? IMO to test this you need N blocking tasks (N-1 of which are > not single-threaded). Release the single-threaded one first, then the others. 
> And confirm that despite that the single-threaded task can still run on the same > thread (i.e. it wasn't among the terminated threads). > > Would be nice to verify actual thread count in process too to confirm that > threads actually were created and wound down (confirming that things happened > and single-threaded bound thread was protected). The goal of this test is to verify that the worker pool does not reclaim the thread with single threaded work after an idle cycle. This test fails if the thread was allowed to reclaim (e.g. CanDetach always returns true). From the caller's standpoint, we don't know how many threads were created. All we know is that stuff can happen during the suggested reclaim time, so we wait for that long and then some to ensure things have settled down. I wanted to avoid checking thread counts since I don't think there are guarantees for stable thread counts in a testing environment. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:539: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValuesAndWait, On 2016/07/13 18:36:31, gab wrote: > Instead of counting lack of TLS values here I think a cleaner approach would be > to build a std::set of PlatformThreadRef in the first phase and check against it > in the second phase (effectively ThreadCheckerImpl but pooled). Wouldn't using the TLS be basically the same thing except with less bookkeeping? The test doesn't actually care about the thread instances themselves, but rather that the environment is clean when it entered. TLS slots are guaranteed by the OS to be 0/null upon entry into a new thread. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_stack.h (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc...
base/task_scheduler/scheduler_worker_stack.h:38: // Removes the top SchedulerWorker from the stack and returns it. On 2016/07/13 18:36:32, gab wrote: > Doesn't "remove" Indeed! Done.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/13 20:19:46, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > > > Why do you need Release_Load() here? > > > Since CanDetach may make decisions based off of surrounding state from the > > > single threaded task runners (like idle thread accounting), I figured it > would > > > be robust to make sure everything is updated at this point. > > > > Hmmm, I don't think this matters, idle thread accounting is done on its own > (and > > is in fact checked before the ReleaseLoad here anyways). All checks are > > self-sufficient so this one is really only about |num_single_threaded_runners| > > IMO. > > In either case, it's the safest thing to do here. We want to make sure all state > from before gets synced up to this point. I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() doesn't document that it produces a memory barrier and nothing relies on it doing so. The only other thing after this call is HasJoinedForTesting() which is self-sufficient. Everything in the scheduler is designed to do exactly what it needs to be safe. Not more, not less. Adding Release semantics here implicitly encodes a requirement which doesn't exist. It's like using ThreadTaskRunnerHandle when one merely cares for a sequence ;-). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/07 20:45:57, fdoray wrote: > On 2016/07/07 17:49:16, robliao wrote: > > On 2016/07/04 19:41:32, fdoray wrote: > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > instead > > of > > > a SchedulerLock+bool. > > > > Here, that seems like an abuse of the WaitableEvent since we don't intend to > > wait on it. Maybe we should be using an atomic instead in those cases? > > It would be nice to have an AtomicBool class. Since we don't have it yet, you > can keep this as-is. We have CancellationFlag. Its name is a bit too specific and no one has bothered renaming it, but what some people do is typedef it to a WhateverElseFlag. Whether the flag or a WaitableEvent I would strongly prefer having a single member here. Having single-member locks already bothers me some and when they're test specific it bothers me some more. PS: I'd be happy to see someone change that class to a BooleanFlag with a typedef at bottom to CancellationFlag for legacy usage. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; On 2016/07/13 20:19:46, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > is_idle_ ? Then it's true whether it's a new worker or because it became idle? > > The delegate needs to distinguish between a thread that's idle (GetWork() == > nullptr) and a thread that's idle after a running an idle cycle (Sleeping and > then waking up again with GetWork() == nullptr). > > Tracking is_idle_ is insufficient as that makes detachment happen too early > > (detachment is checked after the GetWork() call. 
> > We can't check for detachment after wakeup because we can't distinguish between > wakeups and timeouts (and they're racy). > > Checking for an idle cycle is equivalent to checking that we got no work, > attempted to check if we can detach, slept, and then woke up again. If we went > through this cycle once, then we can be assured that we should detach in the > next wake up if there's no work. Ah!! Got it :-). Hadn't fully understood the logic here. Maybe |detach_in_next_cycle_if_no_work_| to be more explicit? Should it actually start at true? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; On 2016/07/13 20:19:46, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > Shouldn't is_idle_ instead be set when about to return no work from GetWork()? > > It seems weird to me to check |is_idle_| and then set it to true after? > > See is_idle_ comment. Perhaps expand comment above here to explain the detachment dance logic you just explained to me in that other comment? (your explanation there was very compelling but not captured in code comments AFAICT) https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); On 2016/07/13 20:19:47, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > Make the event AUTOMATIC instead of doing Wait()+Reset() > > I thought we were preferring Manual events for tests? I don't think there's a preference. I tend to use MANUAL unless I care about it being reset for re-use which you do here. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:539: Bind(&TaskSchedulerWorkerPoolCheckTlsReuse::CountZeroTlsValuesAndWait, On 2016/07/13 20:19:47, robliao wrote: > On 2016/07/13 18:36:31, gab wrote: > > Instead of counting lack of TLS values here I think a cleaner approach would > be > > to build a std::set of PlatformThreadRef in the first phase and check against > it > > in the second phase (effectively ThreadCheckerImpl but pooled). > > Wouldn't using the TLS be basically the same thing except with less bookkeeping? > The test doesn't actually care about the thread instances themselves, but rather > that the environment is clean when it entered. TLS's are guaranteed by the OS to > be 0/null upon entry into a new thread. Ah ok you're right, the TLS effectively acts as a "one of these threads" flag without having to bookkeep threads themselves.
Patchset #7 (id:240001) has been deleted
Rebased and incorporated some more changes. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:46, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > > > Why do you need Release_Load() here? > > > > Since CanDetach may make decisions based off of surrounding state from the > > > > single threaded task runners (like idle thread accounting), I figured it > > would > > > > be robust to make sure everything is updated at this point. > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on its own > > (and > > > is in fact checked before the ReleaseLoad here anyways). All checks are > > > self-sufficient so this one is really only about > |num_single_threaded_runners| > > > IMO. > > > > In either case, it's the safest thing to do here. We want to make sure all > state > > from before gets synced up to this point. > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() doesn't > document that it produces a memory barrier and nothing relies on it doing so. > The only other thing after this call is HasJoinedForTesting() which is > self-sufficient. > > Everything in the scheduler is designed to do exactly what it needs to be safe. > Not more, not less. > > Adding Release semantics here implicitly encodes a requirement which doesn't > exist. It's like using ThreadTaskRunnerHandle when one merely cares for a > sequence ;-). In light of the atomic ops discussion, any strong opinion to remove the barrier here? https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:230: bool performed_idle_cycle_ = true; On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:46, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > is_idle_ ? Then it's true whether it's a new worker or because it became > idle? > > > > The delegate needs to distinguish between a thread that's idle (GetWork() == > > nullptr) and a thread that's idle after a running an idle cycle (Sleeping and > > then waking up again with GetWork() == nullptr). > > > > Tracking is_idle_ is insufficient as that makes detachment happen too early > > (detachment is checked after the GetWork() call. > > > > We can't check for detachment after wakeup because we can't distinguish > between > > wakeups and timeouts (and they're racy). > > > > Checking for an idle cycle is equivalent to checking that we got no work, > > attempted to check if we can detach, slept, and then woke up again. If we went > > through this cycle once, then we can be assured that we should detach in the > > next wake up if there's no work. > > Ah!! Got it :-). Hadn't fully understood the logic here. > > Maybe |detach_in_next_cycle_if_no_work_| to be more explicit? Should it actually > start at true? I like performed_idle_cycle_ since it lets us express can_detach in terms of things that have happened in the SchedulerWorkerDelegateImpl rather than in terms of itself. e.g. can_detach = performed_idle_cycle_ ... reads better than can_detach = detach_in_next_cycle_if_no_work_ ... https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:543: performed_idle_cycle_ = true; On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:46, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > Shouldn't is_idle_ instead be set when about to return no work from > GetWork()? > > > It seems weird to me to check |is_idle_| and then set it to true after? > > > > See is_idle_ comment. > > Perhaps expand comment above here to explain the detachment dance logic you just > explained to me in that other comment? (your explanation there was very > compelling but not captured in code comments AFAICT) Done. https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/160001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:444: task_waiter.Reset(); On 2016/07/13 21:07:55, gab wrote: > On 2016/07/13 20:19:47, robliao wrote: > > On 2016/07/13 18:36:31, gab wrote: > > > Make the event AUTOMATIC instead of doing Wait()+Reset() > > > > I thought we were preferring Manual events for tests? > > I don't think there's a preference. I tend to use MANUAL unless I care about it > being reset for re-use which you do here. Cool. I prefer automatic here as well. Done.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/19 22:03:47, robliao wrote: > On 2016/07/13 21:07:55, gab wrote: > > On 2016/07/13 20:19:46, robliao wrote: > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > Why do you need Release_Load() here? > > > > > Since CanDetach may make decisions based off of surrounding state from > the > > > > > single threaded task runners (like idle thread accounting), I figured it > > > would > > > > > be robust to make sure everything is updated at this point. > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on its > own > > > (and > > > > is in fact checked before the ReleaseLoad here anyways). All checks are > > > > self-sufficient so this one is really only about > > |num_single_threaded_runners| > > > > IMO. > > > > > > In either case, it's the safest thing to do here. We want to make sure all > > state > > > from before gets synced up to this point. > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > doesn't > > document that it produces a memory barrier and nothing relies on it doing so. > > The only other thing after this call is HasJoinedForTesting() which is > > self-sufficient. > > > > Everything in the scheduler is designed to do exactly what it needs to be > safe. > > Not more, not less. > > > > Adding Release semantics here implicitly encodes a requirement which doesn't > > exist. It's like using ThreadTaskRunnerHandle when one merely cares for a > > sequence ;-). > > In light of the atomic ops discussion, any strong opinion to remove the barrier > here? 
From the conclusion we derived, I think the inc/decrements should use Barriers and the read here should use Acquire_Load(). I'm still not convinced that this is required when dealing with a single variable (all memory re-ordering examples I've seen use two variables and even a NoBarrier_Increment has to use RMW -- read-modify-write -- semantics which have an implicit "barrier" for at least that single variable (but would allow memory re-ordering of other memory regions with respect to it?), i.e. maybe we could get away with NoBarrier_Increment and Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we need the Acquire_Load() though as the read doesn't have an implicit RMW like the increments do (don't understand when Release_Load() is appropriate -- C++11 doesn't provide it FWIW). https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/13 21:07:55, gab wrote: > On 2016/07/07 20:45:57, fdoray wrote: > > On 2016/07/07 17:49:16, robliao wrote: > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > > instead > > > of > > > > a SchedulerLock+bool. > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't intend to > > > wait on it. Maybe we should be using an atomic instead in those cases? > > > > It would be nice to have an AtomicBool class. Since we don't have it yet, you > > can keep this as-is. > > We have CancellationFlag. It's name is a bit too specific and no one has > bothered renaming it, but what some people do is typedef it to a > WhateverElseFlag. > > Whether the flag or a WaitableEvent I would strongly prefer having a single > member here. 
Having single-member locks already bothers me some and when they're > test specific it bothers me some more. ping ^^ > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > typedef at bottom to CancellationFlag for legacy usage. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; So if we create/wake one too many thread (i.e. 2 too many threads for workload given we always aim to keep one ready), we'll immediately detach it? Or do we want to at least wait for "reclaim time" (in which case I think starting this at false gives us what we want?) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:564: worker_detachment_allowed_(true), (bool should be replaced by flag as mentioned in previous comment, but if it's not : initialize it inline in header) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); constexpr https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); "kSuggestedReclaimTimeForDetachTests" or something (otherwise I find it unexpected that it's not used directly in InitializeWorkerPool()) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: EXPECT_GT(zero_tls_values_, 0); Should this use NoBarrier_Load()? On Windows this is equivalent, but in atomicops_internals_portable.h it appears to do more than just a simple deref. Note: NoBarrier is clearly correct because of the surrounding use of WaitableEvent (not that it's very efficient given the heavier synchronization constructs nearby but the barrier is definitely not necessary)
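The Barrier_AtomicIncrement vs Acquire_Load trade-off weighed in these comments maps onto C++11 std::atomic orderings. A hedged sketch follows; the counter name mirrors the CL's |num_single_threaded_runners_| but this is not the actual base/atomicops code:

```cpp
#include <atomic>

// Stand-in for the pool's single-threaded-runner count.
std::atomic<int> num_single_threaded_runners{0};

void OnSingleThreadedRunnerCreated() {
  // The read-modify-write is atomic regardless of ordering; the ordering
  // argument (~Barrier_AtomicIncrement) controls whether *other* writes made
  // before this increment become visible to a load that observes it.
  num_single_threaded_runners.fetch_add(1, std::memory_order_acq_rel);
}

void OnSingleThreadedRunnerDestroyed() {
  num_single_threaded_runners.fetch_sub(1, std::memory_order_acq_rel);
}

bool CanDetachWrtRunners() {
  // ~Acquire_Load: if this reads 0, writes that happened-before the matching
  // decrements are visible here. A relaxed (NoBarrier) load would still be
  // atomic, just without that visibility guarantee for surrounding state.
  return num_single_threaded_runners.load(std::memory_order_acquire) == 0;
}
```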
still lgtm w/ comments https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/40001/base/task_scheduler/sch... base/task_scheduler/scheduler_worker_pool_impl.cc:534: !outer_->HasJoinedForTesting(); On 2016/07/08 21:17:03, robliao wrote: > On 2016/07/08 18:58:07, fdoray wrote: > > Sadly this is racy: > > - Thread A: SchedulerWorker::CanDetach() - returns true. > > - Thread B: SchedulerWorkerPoolImpl::JoinForTesting() > > - Thread B: SchedulerWorker::JoinForTesting() > > - Thread B: Acquires SchedulerWorker::thread_lock_ > > - Thread A: SchedulerWorker::Detach() > > - Thread A: DCHECK(!ShouldExitForTesting()) fails in DCHECK builds. > > - Thread A: Tries to acquire SchedulerWorker::thread_lock_. > > Dead lock between the join and the acquisition of > SchedulerWorker::thread_lock_ > > > > I think SchedulerWorker::JoinForTesting() could be rewritten like this: > > > > void SchedulerWorker::JoinForTesting() { > > { > > AutoSchedulerLock auto_lock(should_exit_for_testing_lock_); > > should_exit_for_testing_ = true; > > } > > WakeUp(); > > > > std::unique_ptr<Thread> thread; > > > > { > > AutoSchedulerLock auto_lock(thread_lock_); > > thread = std::move(thread_); > > } > > > > // Join without holding the lock! > > thread->Join(); > > } > > > > and SchedulerWorker::Detach() like this: > > > > // Returns nullptr if called during thread->Join(). The thread won't > > // detach. It will exit because !outer_->ShouldExitForTesting(). > > std::unique_ptr<SchedulerWorker::Thread> SchedulerWorker::Detach() { > > DCHECK(!ShouldExitForTesting()) << "Worker was already joined"; > > AutoSchedulerLock auto_lock(thread_lock_); > > // If a wakeup is pending, then a WakeUp() came in while we were deciding to > > // detach. This means we can't go away anymore since we would break the > > // guarantee that we call GetWork() after a successful wakeup. 
> > return thread_ && thread_->IsWakeUpPending() ? nullptr : std::move(thread_); > > } > > > > To remove the requirement that Delegate::CanDetach() must return false when > the > > thread may be joined. > > > > Maybe we should also add DCHECK(!ShouldExitForTesting()) in WakeUp()? > > > > Let me know if my reasoning makes sense. > > Yup. This became the first test that flew in the face of this comment in > SchedulerWorker: > // Normally holding a lock and joining is dangerous. However, since this is > // only for testing, we're okay since the only scenario that could impact this > // is a call to Detach, which is disallowed by having the delegate always > // return false for the CanDetach call. > > Because we have no control over the delegate from the test, I added > HasJoinedForTesting because of the race the other way: Thread wakes up after an > idle cycle and attempts to detach after the ShouldExitForTesting check. Also > long as the joins occur before the detach timeout we should be okay. > > Alternatively, we can wait until after the detach timeout occurs as well. > > I'm not sure we want to go with the suggested code because it walks down a > different problematic road. If we were to make SchedulerWorker robust to Join > and Detach races, now there's no guarantee that the current thread is cleaned up > after Join since it could be in the middle of detaching, which also seems > undesirable for tests that care about that. > > It seems that tension comes from having no control on the delegate, so I've > added a disallow control to address this. > > Multithreaded code is tricky indeed! Acknowledged. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:144: SchedulerWorkerPoolImpl* worker_pool, Can stay SchedulerWorkerPool* https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an issue if |num_single_threaded_runners_| is incremented after [...] after this because the newly created TaskRunner (from which no task has run yet) will simply run all its tasks on the next physical thread created by this worker. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker thread detachment. If the suggested reclaim time is not // [...], then the test should call this before JoinForTesting() // when no worker thread can be in the middle of detaching. That // can be when all workers are known to be idle (after // WaitForAllWorkersIdleForTesting() returned) or busy (after // posting as many blocking tasks as the number of workers). "before the first set of work is dispatched": If someone wants to disallow detachment before posting work, shouldn't they use TimeDelta::Max() for the suggested reclaim time? https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: worker_pool_->DisallowWorkerDetachmentForTesting(); This is not required given that suggested reclaim time is TimeDelta::Max(), right? https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: worker_pool->DisallowWorkerDetachmentForTesting(); This is not required given that suggested reclaim time is TimeDelta::Max(), right? https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: EXPECT_GT(zero_tls_values_, 0); On 2016/07/20 01:45:21, gab wrote: > Should this use NoBarrier_Load()? On Windows this is equivalent, but in > atomicops_internals_portable.h it appears to do more than just a simple deref. > > Note: NoBarrier is clearly correct because of the surrounding use of > WaitableEvent (not that it's very efficient given the heavier synchronization > constructs nearby but the barrier is definitely not necessary) atomicops.h:19 It is incorrect to make direct assignments to/from an atomic variable. You should use one of the Load or Store routines. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.cc:7: #include "base/time/time.h" Not needed if included in the header. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; #include "base/time/time.h" given that there is a TimeDelta member?
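The TimeDelta::Max() point above can be illustrated with a small sketch of the reclaim-time check, using std::chrono stand-ins for base::TimeDelta/TimeTicks (the function name is hypothetical, not from the CL): a pool whose suggested reclaim time is "infinite" can never satisfy the detach condition, which is why DisallowWorkerDetachmentForTesting() is redundant in those tests.

```cpp
#include <chrono>

using Duration = std::chrono::steady_clock::duration;
using TimePoint = std::chrono::steady_clock::time_point;

// Stand-in for TimeDelta::Max().
constexpr Duration kInfiniteReclaimTime = Duration::max();

// Hypothetical detach check: has the worker been idle at least as long as
// the suggested reclaim time?
bool IdleLongEnoughToDetach(TimePoint became_idle,
                            TimePoint now,
                            Duration suggested_reclaim_time) {
  if (suggested_reclaim_time == kInfiniteReclaimTime)
    return false;  // detachment effectively disabled
  return now - became_idle >= suggested_reclaim_time;
}
```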
Patchset #8 (id:280001) has been deleted
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 01:45:21, gab wrote: > On 2016/07/19 22:03:47, robliao wrote: > > On 2016/07/13 21:07:55, gab wrote: > > > On 2016/07/13 20:19:46, robliao wrote: > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > Why do you need Release_Load() here? > > > > > > Since CanDetach may make decisions based off of surrounding state from > > the > > > > > > single threaded task runners (like idle thread accounting), I figured > it > > > > would > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on its > > own > > > > (and > > > > > is in fact checked before the ReleaseLoad here anyways). All checks are > > > > > self-sufficient so this one is really only about > > > |num_single_threaded_runners| > > > > > IMO. > > > > > > > > In either case, it's the safest thing to do here. We want to make sure all > > > state > > > > from before gets synced up to this point. > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > > doesn't > > > document that it produces a memory barrier and nothing relies on it doing > so. > > > The only other thing after this call is HasJoinedForTesting() which is > > > self-sufficient. > > > > > > Everything in the scheduler is designed to do exactly what it needs to be > > safe. > > > Not more, not less. > > > > > > Adding Release semantics here implicitly encodes a requirement which doesn't > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares for a > > > sequence ;-). 
> > > > In light of the atomic ops discussion, any strong opinion to remove the > barrier > > here? > > From the conclusion we derived, I think the inc/decrements should use Barriers > and the read here should use Acquire_Load(). I'm still not convinced that this > is required when dealing with a single variable (all memory re-ordering examples > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > read-modify-write -- semantics which have an implicit "barrier" for at least > that single variable (but would allow memory re-ordering of other memory regions > with respect to it?), i.e. maybe we could get away with NoBarrier_Increment and > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we need > the Acquire_Load() though as the read doesn't have an implicit RMW like the > increments do (don't understand when Release_Load() is appropriate -- C++11 > doesn't provide it FWIW).

If my understanding is right, the atomic ops are guaranteed to provide atomicity in operations. The barriers control visibility. So IIUC, it's like this:

[Atomic Operation]

NoBarrier_AtomicIncrement of the same atomic variable:
Thread A: [In-place Atomic Increment] [In-place Atomic Increment]
Thread B: [In-place Atomic Increment] [In-place Atomic Increment]
Result Case 1: 0->1, 1->2
Result Case 2: 1->2, 2->3
(And more)

The value is atomic (we aren't reading in the middle of another thread writing). The visibility is up to the processor.

With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual exclusion):
Thread A: [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic Increment + B]
Thread B: [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic Increment + B]
Result Case 1: A: 0->1, 3->4; B: 1->2, 2->3
Result Case 2: A: 1->2, 3->4; B: 0->1, 2->3
(...)

A non-atomic increment would separate the read-increment-write steps, which for word-sized data types would lead to similar behavior as NoBarrier_AtomicIncrement.
For sizes larger than a word, we could see intermediate reads and writes. So that indeed suggests I need a barrier at the increments and decrements.

As for whether to use Acquire_Load or Release_Load, I guess the question comes down to when I want to sync the world:

Acquire: [Sync] Read
Release: Read [Sync]

Given that this count could be changing at any time, practically speaking it doesn't seem to matter: if we get a 0, it could be 1 right after the read in either case. The other thing to consider is the memory around the value. If that needs to be visible too, then you'll also need a barrier (e.g. a certain variable A is set to a value iff a variable B is 0). A Release_Store ensures that setting A and B in concert is visible immediately after B is set. In the case of a read, the other values are synchronized, so it doesn't matter as much if we sync now or later. With this analysis, I think arguably we can use NoBarrier_Load since, with the increment barriers, the value is visible everywhere. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/20 01:45:21, gab wrote: > On 2016/07/13 21:07:55, gab wrote: > > On 2016/07/07 20:45:57, fdoray wrote: > > > On 2016/07/07 17:49:16, robliao wrote: > > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > > > instead > > > > of > > > > > a SchedulerLock+bool. > > > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't intend > to > > > > wait on it. Maybe we should be using an atomic instead in those cases? > > > > > > It would be nice to have an AtomicBool class. Since we don't have it yet, > you > > > can keep this as-is.
> > > > We have CancellationFlag. It's name is a bit too specific and no one has > > bothered renaming it, but what some people do is typedef it to a > > WhateverElseFlag. > > > > Whether the flag or a WaitableEvent I would strongly prefer having a single > > member here. Having single-member locks already bothers me some and when > they're > > test specific it bothers me some more. > > ping ^^ > > > > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > > typedef at bottom to CancellationFlag for legacy usage. > Yup. I see your pending change for the atomic flag. We can incorporate that once the name change goes in. A typedef seems like an abuse of the type system just to perform a local rename. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:144: SchedulerWorkerPoolImpl* worker_pool, On 2016/07/20 14:15:43, fdoray wrote: > Can stay SchedulerWorkerPool* Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 01:45:21, gab wrote: > So if we create/wake one too many thread (i.e. 2 too many threads for workload > given we always aim to keep one ready), we'll immediately detach it? Or do we > want to at least wait for "reclaim time" (in which case I think starting this at > false gives us what we want?) Yes, if we wake one too many threads with no work, those threads will detach. In general, we shouldn't be doing that as that's indicative of a bug on our side. Waiting for the reclaim time is just as bad since we'll effectively have these threads spin waiting. We should instead fix the wake bug. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an issue if |num_single_threaded_runners_| is incremented after On 2016/07/20 14:15:43, fdoray wrote: > [...] after this because the newly created TaskRunner (from which no task has > run yet) will simply run all its tasks on the next physical thread created by > this worker. Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:564: worker_detachment_allowed_(true), On 2016/07/20 01:45:21, gab wrote: > (bool should be replaced by flag as mentioned in previous comment, but if it's > not : initialize it inline in header) Moved initialization. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker thread detachment. If the suggested reclaim time is not On 2016/07/20 14:15:43, fdoray wrote: > // [...], then the test should call this before JoinForTesting() > // when no worker thread can be in the middle of detaching. That > // can be when all workers are known to be idle (after > // WaitForAllWorkersIdleForTesting() returned) or busy (after > // posting as many blocking tasks as the number of workers). > > "before the first set of work is dispatched": If someone wants to disallow > detachment before posting work, shouldn't they use TimeDelta::Max() for the > suggested reclaim time? > Unless the test requires the ability to detach. I've relaxed the requirement from "first". https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); On 2016/07/20 01:45:21, gab wrote: > constexpr Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: worker_pool_->DisallowWorkerDetachmentForTesting(); On 2016/07/20 14:15:43, fdoray wrote: > This is not required given that suggested reclaim time is TimeDelta::Max(), > right? Yep. Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: worker_pool->DisallowWorkerDetachmentForTesting(); On 2016/07/20 14:15:43, fdoray wrote: > This is not required given that suggested reclaim time is TimeDelta::Max(), > right? Yep. Done. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { On 2016/07/20 01:45:21, gab wrote: > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) We're counting zero TLS values here so the name should stick. The TLS is always initialized to zero. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: EXPECT_GT(zero_tls_values_, 0); On 2016/07/20 14:15:43, fdoray wrote: > On 2016/07/20 01:45:21, gab wrote: > > Should this use NoBarrier_Load()? On Windows this is equivalent, but in > > atomicops_internals_portable.h it appears to do more than just a simple deref. > > > > Note: NoBarrier is clearly correct because of the surrounding use of > > WaitableEvent (not that it's very efficient given the heavier synchronization > > constructs nearby but the barrier is definitely not necessary) > > atomicops.h:19 It is incorrect to make direct assignments to/from an atomic > variable. 
You should use one of the Load or Store routines. The atomicops unittest directly references its variables for testing (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). Went ahead and changed this to a NoBarrier_Load. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.cc:7: #include "base/time/time.h" On 2016/07/20 14:15:44, fdoray wrote: > Not needed if included in the header. See earlier comment. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_params.h (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; On 2016/07/20 14:15:44, fdoray wrote: > #include "base/time/time.h" given that there is a TimeDelta member? The forward-decl still works here since we only need to know the size in the constructor. Folks who include this should also include time.h since they themselves will be using TimeDelta.
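The "variable A is set to a value iff variable B is 0" visibility argument in the reply above is the classic release/acquire publication pattern. A minimal C++11 sketch (illustrative names, not code from the CL):

```cpp
#include <atomic>

// Plain memory being published ("variable A") and its guard flag
// ("variable B" in the discussion above).
int payload = 0;
std::atomic<bool> ready{false};

void Publish(int value) {
  payload = value;                               // write the payload first
  ready.store(true, std::memory_order_release);  // then release the flag
}

// Returns true and fills |out| only when the payload is safely visible.
bool TryConsume(int* out) {
  if (!ready.load(std::memory_order_acquire))    // acquire pairs with release
    return false;
  *out = payload;  // guaranteed to see the write made before the store
  return true;
}
```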
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... > File base/task_scheduler/scheduler_worker_pool_impl.cc (right): > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... > base/task_scheduler/scheduler_worker_pool_impl.cc:531: > !subtle::Release_Load(&num_single_threaded_runners_) && > On 2016/07/20 01:45:21, gab wrote: > > On 2016/07/19 22:03:47, robliao wrote: > > > On 2016/07/13 21:07:55, gab wrote: > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > Why do you need Release_Load() here? > > > > > > > Since CanDetach may make decisions based off of surrounding > state from > > > the > > > > > > > single threaded task runners (like idle thread accounting), > I figured > > it > > > > > would > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is > done on its > > > own > > > > > (and > > > > > > is in fact checked before the ReleaseLoad here anyways). All > checks are > > > > > > self-sufficient so this one is really only about > > > > |num_single_threaded_runners| > > > > > > IMO. > > > > > > > > > > In either case, it's the safest thing to do here. We want to > make sure all > > > > state > > > > > from before gets synced up to this point. > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > CanDetach() > > > doesn't > > > > document that it produces a memory barrier and nothing relies on > it doing > > so. > > > > The only other thing after this call is HasJoinedForTesting() > which is > > > > self-sufficient. > > > > > > > > Everything in the scheduler is designed to do exactly what it > needs to be > > > safe. > > > > Not more, not less. > > > > > > > > Adding Release semantics here implicitly encodes a requirement > which doesn't > > > > exist. 
It's like using ThreadTaskRunnerHandle when one merely > cares for a > > > > sequence ;-). > > > > > > In light of the atomic ops discussion, any strong opinion to remove > the > > barrier > > > here? > > > > From the conclusion we derived, I think the inc/decrements should use > Barriers > > and the read here should use Acquire_Load(). I'm still not convinced > that this > > is required when dealing with a single variable (all memory > re-ordering examples > > I've seen use two variables and even a NoBarrier_Increment has to use > RMW -- > > read-modify-write -- semantics which have an implicit "barrier" for at > least > > that single variable (but would allow memory re-ordering of other > memory regions > > with respect to it?), i.e. maybe we could get away with > NoBarrier_Increment and > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced > we need > > the Acquire_Load() though as the read doesn't have an implicit RMW > like the > > increments do (don't understand when Release_Load() is appropriate -- > C++11 > > doesn't provide it FWIW). > > I've my understanding is right, the atomic ops are guaranteed to provide > atomicity in operations. The barriers control visibility. > > So IIUC, it's like this > > [Atomic Operation] > > NoBarrier_AtomicIncrement of the same atomic variable: > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > Result Case 1 0->1 1->2 > Result Case 2 1->2 2->3 > (And more) > > The value is atomic (we aren't reading in the middle of another thread > writing). The visibility is up to the processor. > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual > exclusion) > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic Increment + B] > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic Increment + B] > Result Case 1 A: 0->1 3->4 > B: 1->2 2->3 > > Result Case 2 A: 1->2 3->4 > B: 0->1 2->3 > (...) > > Non-atomic increment would separate read-increment-write steps, which > for word sized data types, would lead to similar behavior as > NoBarrier_AtomicIncrement. > Quibble: My understanding is that even word-sized increments can end up being non-atomic inside the processor. > For sizes larger than a word, we could see > intermediate reads and writes. > > So that indeed suggests I need a barrier at the increment and > decrements. > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess > the question comes down when do I want to sync the world > > Acquire [Sync] Read > Release Read [Sync] > > Given that this count could be changing any time, practically speaking, > it doesn't seem to matter. If we get a 0, it could be 1 right after the > read in either case. The other thing to consider is the memory around > the value. If that needs to be visible too, then you'll also need a > barrier (e.g. a certain variable A is set to a value iff a variable B is > 0). > A Release_Store ensures that setting A and B in concert is visible > immediately after B is set. > > In this case of a read, the other values are synchronized, so it doesn't > matter as much if we sync now or later. With this analysis here, I think > arguably we can use NoBarrier_Load since with the increment barriers, > the value is visible everywhere. > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... > File base/task_scheduler/scheduler_worker_pool_impl.h (right): > > > https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void > CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { > On 2016/07/20 01:45:21, gab wrote: > > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) > > We're counting zero TLS values here so the name should stick. The TLS is > always initialized to zero. > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: > EXPECT_GT(zero_tls_values_, 0); > On 2016/07/20 14:15:43, fdoray wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > Should this use NoBarrier_Load()? On Windows this is equivalent, but > in > > > atomicops_internals_portable.h it appears to do more than just a > simple deref. > > > > > > Note: NoBarrier is clearly correct because of the surrounding use of > > > WaitableEvent (not that it's very efficient given the heavier > synchronization > > > constructs nearby but the barrier is definitely not necessary) > > > > atomicops.h:19 It is incorrect to make direct assignments to/from an > atomic > > variable. You should use one of the Load or Store routines. > > The atomicops unittest direct references their variables for testing > (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). > > Went ahead and changed this to a NoBarrier_Load > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > File base/task_scheduler/scheduler_worker_pool_params.cc (right): > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_pool_params.cc:7: #include > "base/time/time.h" > On 2016/07/20 14:15:44, fdoray wrote: > > Not needed if included in the header. > > See earlier comment. > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
> File base/task_scheduler/scheduler_worker_pool_params.h (right): > > > https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... > base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; > On 2016/07/20 14:15:44, fdoray wrote: > > #include "base/time/time.h" given that there is a TimeDelta member? > > The forward-decl still works here since we only need to know the size in > the constructor. Folks who include this should also include time.h as > well since they themselves will be using TimeDelta. > > https://codereview.chromium.org/2116163002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 19:44:00, robliao wrote: > On 2016/07/20 01:45:21, gab wrote: > > On 2016/07/19 22:03:47, robliao wrote: > > > On 2016/07/13 21:07:55, gab wrote: > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > Why do you need Release_Load() here? > > > > > > > Since CanDetach may make decisions based off of surrounding state > from > > > the > > > > > > > single threaded task runners (like idle thread accounting), I > figured > > it > > > > > would > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on > its > > > own > > > > > (and > > > > > > is in fact checked before the ReleaseLoad here anyways). All checks > are > > > > > > self-sufficient so this one is really only about > > > > |num_single_threaded_runners| > > > > > > IMO. > > > > > > > > > > In either case, it's the safest thing to do here. We want to make sure > all > > > > state > > > > > from before gets synced up to this point. > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > > > doesn't > > > > document that it produces a memory barrier and nothing relies on it doing > > so. > > > > The only other thing after this call is HasJoinedForTesting() which is > > > > self-sufficient. > > > > > > > > Everything in the scheduler is designed to do exactly what it needs to be > > > safe. > > > > Not more, not less. > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > doesn't > > > > exist. 
It's like using ThreadTaskRunnerHandle when one merely cares for a > > > > sequence ;-). > > > > > > In light of the atomic ops discussion, any strong opinion to remove the > > barrier > > > here? > > > > From the conclusion we derived, I think the inc/decrements should use Barriers > > and the read here should use Acquire_Load(). I'm still not convinced that this > > is required when dealing with a single variable (all memory re-ordering > examples > > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > > read-modify-write -- semantics which have an implicit "barrier" for at least > > that single variable (but would allow memory re-ordering of other memory > regions > > with respect to it?), i.e. maybe we could get away with NoBarrier_Increment > and > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we need > > the Acquire_Load() though as the read doesn't have an implicit RMW like the > > increments do (don't understand when Release_Load() is appropriate -- C++11 > > doesn't provide it FWIW). > > I've my understanding is right We have different understandings on a few things (inline) >, the atomic ops are guaranteed to provide > atomicity in operations. The barriers control visibility. > > So IIUC, it's like this > > [Atomic Operation] > > NoBarrier_AtomicIncrement of the same atomic variable: > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > Result Case 1 0->1 1->2 > Result Case 2 1->2 2->3 > (And more) > > The value is atomic (we aren't reading in the middle of another thread writing). > The visibility is up to the processor. > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual > exclusion) > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > Increment + B] > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > Increment + B] > Result Case 1 A: 0->1 3->4 > B: 1->2 2->3 > > Result Case 2 A: 1->2 3->4 > B: 0->1 2->3 > (...) > > Non-atomic increment would separate read-increment-write steps, which for word > sized data types, would lead to similar behavior as NoBarrier_AtomicIncrement. Hmmm separating read-increment-write steps would be wrong even for word sized data types (the whole point of atomic RMW is to make sure to do so as a single transaction). > For sizes larger than a word, we could see intermediate reads and writes. > > So that indeed suggests I need a barrier at the increment and decrements. I think you need a barrier so that the result of the increment is "flushed" (guaranteed to not be reordered after the next sequential read). > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > question comes down when do I want to sync the world > > Acquire [Sync] Read > Release Read [Sync] > > Given that this count could be changing any time, practically speaking, it > doesn't seem to matter. If we get a 0, it could be 1 right after the read in > either case. The other thing to consider is the memory around the value. If that > needs to be visible too, then you'll also need a barrier (e.g. a certain > variable A is set to a value iff a variable B is 0). I think we do care about syncing before (Acquire), otherwise a thread could see 0 forever. It's not relevant that the count could go up right after check (we explicitly support that by sending a WakeUp which would result in a sleep no-op IIRC). What's relevant is that we don't see a value that's outdated beyond the range we explicitly protect ourselves against. > A Release_Store ensures that setting A and B in concert is visible immediately > after B is set. I don't think we care about that. 
Nothing depends on being synchronized with the variable at stake here, adding the Release adds an implicit dependency which we don't document and that's no good for readability (and a potential burden if someone starts depending on it because it happens to be there). Every synchronization in the scheduler is explicit and intentional, let's not add one for the sake of "why not". > > In this case of a read, the other values are synchronized What do you mean by "the other values are synchronized on read"? Which other values and why and why does it matter? >, so it doesn't matter > as much if we sync now or later. With this analysis here, I think arguably we > can use NoBarrier_Load since with the increment barriers, the value is visible > everywhere. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/20 19:44:00, robliao wrote: > On 2016/07/20 01:45:21, gab wrote: > > On 2016/07/13 21:07:55, gab wrote: > > > On 2016/07/07 20:45:57, fdoray wrote: > > > > On 2016/07/07 17:49:16, robliao wrote: > > > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > > > Elsewhere in base/task_scheduler, we use a manual reset WaitableEvent > > > > instead > > > > > of > > > > > > a SchedulerLock+bool. > > > > > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't > intend > > to > > > > > wait on it. Maybe we should be using an atomic instead in those cases? > > > > > > > > It would be nice to have an AtomicBool class. Since we don't have it yet, > > you > > > > can keep this as-is. > > > > > > We have CancellationFlag. It's name is a bit too specific and no one has > > > bothered renaming it, but what some people do is typedef it to a > > > WhateverElseFlag. 
> > > > > > Whether the flag or a WaitableEvent I would strongly prefer having a single > > > member here. Having single-member locks already bothers me some and when > > they're > > > test specific it bothers me some more. > > > > ping ^^ > > > > > > > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > > > typedef at bottom to CancellationFlag for legacy usage. > > > > Yup. I see your pending change for the atomic flag. Which change? It sure isn't mine or I code in my sleep :-) > We can incorporate that once > the name change goes in. A typedef seems like an abuse of the type system just > to perform a local rename. It's at least done here (and I remember other places but I can't find it, maybe deleted code). I don't see why not, but fine to do it later too. https://cs.chromium.org/chromium/src/chrome/browser/after_startup_task_utils.... https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 19:44:01, robliao wrote: > On 2016/07/20 01:45:21, gab wrote: > > So if we create/wake one too many thread (i.e. 2 too many threads for workload > > given we always aim to keep one ready), we'll immediately detach it? Or do we > > want to at least wait for "reclaim time" (in which case I think starting this > at > > false gives us what we want?) > > Yes, if we wake one too many threads with no work, those threads will detach. In > general, we shouldn't be doing that as that's indicative of a bug on our side. > > Waiting for the reclaim time is just as bad since we'll effectively have these > threads spin waiting. We should instead fix the wake bug. It's not necessarily a bug. It can today under a simple bursty load. i.e. get 4 tasks, wake 4 threads. 
Tasks happen to be fast and are all processed by the first two threads before the other 2 even get scheduled. I think in those cases the ever-idle threads should wait for the reclaim time or we might end up thrashing the system. Say 4 more tasks show up after an immediate reclaim: we recreate those 2 threads but the 2 active ones yet again manage to process the 4 tasks before the 2 recreated threads are scheduled (etc.). So I still think we should start this false (and explain the above in a comment here). (And we should reset it to false right before detaching as well, for the same reasons.)
The way that the atomic increment is implemented for x86 on Windows is via _InterlockedExchangeAdd, which is intended to be an atomic operation. x86 allows for atomic increments in the processor as long as you specify the lock prefix on the INC instruction. Adds register to register are generally atomic in word sizes. The _InterlockedExchangeAdd gets translated to lock xadd [mem location], [increment value] making the increment atomic. Since on x86 in multiprocessor systems lock prefixed instructions have a total order, the barriers are not necessary. On Wed, Jul 20, 2016 at 1:17 PM, <danakj@chromium.org> wrote: > On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: > >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> base/task_scheduler/scheduler_worker_pool_impl.cc:531: >> !subtle::Release_Load(&num_single_threaded_runners_) && >> On 2016/07/20 01:45:21, gab wrote: >> > On 2016/07/19 22:03:47, robliao wrote: >> > > On 2016/07/13 21:07:55, gab wrote: >> > > > On 2016/07/13 20:19:46, robliao wrote: >> > > > > On 2016/07/13 18:36:31, gab wrote: >> > > > > > > > Why do you need Release_Load() here? >> > > > > > > Since CanDetach may make decisions based off of surrounding >> state from >> > > the >> > > > > > > single threaded task runners (like idle thread accounting), >> I figured >> > it >> > > > > would >> > > > > > > be robust to make sure everything is updated at this point. >> > > > > > >> > > > > > Hmmm, I don't think this matters, idle thread accounting is >> done on its >> > > own >> > > > > (and >> > > > > > is in fact checked before the ReleaseLoad here anyways). All >> checks are >> > > > > > self-sufficient so this one is really only about >> > > > |num_single_threaded_runners| >> > > > > > IMO. 
>> > > > > >> > > > > In either case, it's the safest thing to do here. We want to >> make sure all >> > > > state >> > > > > from before gets synced up to this point. >> > > > >> > > > I disagree. It's not safer. It's equivalent (plus overhead). >> CanDetach() >> > > doesn't >> > > > document that it produces a memory barrier and nothing relies on >> it doing >> > so. >> > > > The only other thing after this call is HasJoinedForTesting() >> which is >> > > > self-sufficient. >> > > > >> > > > Everything in the scheduler is designed to do exactly what it >> needs to be >> > > safe. >> > > > Not more, not less. >> > > > >> > > > Adding Release semantics here implicitly encodes a requirement >> which doesn't >> > > > exist. It's like using ThreadTaskRunnerHandle when one merely >> cares for a >> > > > sequence ;-). >> > > >> > > In light of the atomic ops discussion, any strong opinion to remove >> the >> > barrier >> > > here? >> > >> > From the conclusion we derived, I think the inc/decrements should use >> Barriers >> > and the read here should use Acquire_Load(). I'm still not convinced >> that this >> > is required when dealing with a single variable (all memory >> re-ordering examples >> > I've seen use two variables and even a NoBarrier_Increment has to use >> RMW -- >> > read-modify-write -- semantics which have an implicit "barrier" for at >> least >> > that single variable (but would allow memory re-ordering of other >> memory regions >> > with respect to it?), i.e. maybe we could get away with >> NoBarrier_Increment and >> > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced >> we need >> > the Acquire_Load() though as the read doesn't have an implicit RMW >> like the >> > increments do (don't understand when Release_Load() is appropriate -- >> C++11 >> > doesn't provide it FWIW). >> >> I've my understanding is right, the atomic ops are guaranteed to provide >> atomicity in operations. The barriers control visibility. 
>> >> So IIUC, it's like this >> >> [Atomic Operation] >> >> NoBarrier_AtomicIncrement of the same atomic variable: >> Thread A [In-place Atomic Increment] [In-place Atomic Increment] >> Thread B [In-place Atomic Increment] [In-place Atomic Increment] >> >> Result Case 1 0->1 1->2 >> Result Case 2 1->2 2->3 >> (And more) >> >> The value is atomic (we aren't reading in the middle of another thread >> writing). The visibility is up to the processor. >> >> With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual >> exclusion) >> >> Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place >> Atomic Increment + B] >> Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place >> Atomic Increment + B] >> Result Case 1 A: 0->1 3->4 >> B: 1->2 2->3 >> >> Result Case 2 A: 1->2 3->4 >> B: 0->1 2->3 >> (...) >> >> Non-atomic increment would separate read-increment-write steps, which >> for word sized data types, would lead to similar behavior as >> NoBarrier_AtomicIncrement. >> > > Quibble: My understanding is that even word-sized increments can end up > being non-atomic inside the processor. > > >> For sizes larger than a word, we could see >> intermediate reads and writes. >> >> So that indeed suggests I need a barrier at the increment and >> decrements. >> >> >> >> >> >> So as for whether or not to use Acquire_Load and Release_Load, I guess >> the question comes down when do I want to sync the world >> >> Acquire [Sync] Read >> Release Read [Sync] >> >> Given that this count could be changing any time, practically speaking, >> it doesn't seem to matter. If we get a 0, it could be 1 right after the >> read in either case. The other thing to consider is the memory around >> the value. If that needs to be visible too, then you'll also need a >> barrier (e.g. a certain variable A is set to a value iff a variable B is >> 0). >> A Release_Store ensures that setting A and B in concert is visible >> immediately after B is set. 
>> >> In this case of a read, the other values are synchronized, so it doesn't >> matter as much if we sync now or later. With this analysis here, I think >> arguably we can use NoBarrier_Load since with the increment barriers, >> the value is visible everywhere. >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >> >> >> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >> base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock >> join_for_testing_called_lock_; >> On 2016/07/20 01:45:21, gab wrote: >> > On 2016/07/13 21:07:55, gab wrote: >> > > On 2016/07/07 20:45:57, fdoray wrote: >> > > > On 2016/07/07 17:49:16, robliao wrote: >> > > > > On 2016/07/04 19:41:32, fdoray wrote: >> > > > > > Elsewhere in base/task_scheduler, we use a manual reset >> WaitableEvent >> > > > instead >> > > > > of >> > > > > > a SchedulerLock+bool. >> > > > > >> > > > > Here, that seems like an abuse of the WaitableEvent since we >> don't intend >> > to >> > > > > wait on it. Maybe we should be using an atomic instead in those >> cases? >> > > > >> > > > It would be nice to have an AtomicBool class. Since we don't have >> it yet, >> > you >> > > > can keep this as-is. >> > > >> > > We have CancellationFlag. It's name is a bit too specific and no one >> has >> > > bothered renaming it, but what some people do is typedef it to a >> > > WhateverElseFlag. >> > > >> > > Whether the flag or a WaitableEvent I would strongly prefer having a >> single >> > > member here. Having single-member locks already bothers me some and >> when >> > they're >> > > test specific it bothers me some more. >> > >> > ping ^^ >> > >> > > >> > > PS: I'd be happy to see someone change that class to a BooleanFlag >> with a >> > > typedef at bottom to CancellationFlag for legacy usage. >> > >> >> Yup. I see your pending change for the atomic flag. 
We can incorporate >> that once the name change goes in. A typedef seems like an abuse of the >> type system just to perform a local rename. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.cc:144: >> SchedulerWorkerPoolImpl* worker_pool, >> On 2016/07/20 14:15:43, fdoray wrote: >> > Can stay SchedulerWorkerPool* >> >> Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.cc:446: >> performed_idle_cycle_ = true; >> On 2016/07/20 01:45:21, gab wrote: >> > So if we create/wake one too many thread (i.e. 2 too many threads for >> workload >> > given we always aim to keep one ready), we'll immediately detach it? >> Or do we >> > want to at least wait for "reclaim time" (in which case I think >> starting this at >> > false gives us what we want?) >> >> Yes, if we wake one too many threads with no work, those threads will >> detach. In general, we shouldn't be doing that as that's indicative of a >> bug on our side. >> >> Waiting for the reclaim time is just as bad since we'll effectively have >> these threads spin waiting. We should instead fix the wake bug. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an >> issue if |num_single_threaded_runners_| is incremented after >> On 2016/07/20 14:15:43, fdoray wrote: >> > [...] after this because the newly created TaskRunner (from which no >> task has >> > run yet) will simply run all its tasks on the next physical thread >> created by >> > this worker. >> >> Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
>> base/task_scheduler/scheduler_worker_pool_impl.cc:564: >> worker_detachment_allowed_(true), >> On 2016/07/20 01:45:21, gab wrote: >> > (bool should be replaced by flag as mentioned in previous comment, but >> if it's >> > not : initialize it inline in header) >> >> Moved initialization. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker >> thread detachment. If the suggested reclaim time is not >> On 2016/07/20 14:15:43, fdoray wrote: >> > // [...], then the test should call this before JoinForTesting() >> > // when no worker thread can be in the middle of detaching. That >> > // can be when all workers are known to be idle (after >> > // WaitForAllWorkersIdleForTesting() returned) or busy (after >> > // posting as many blocking tasks as the number of workers). >> > >> > "before the first set of work is dispatched": If someone wants to >> disallow >> > detachment before posting work, shouldn't they use TimeDelta::Max() >> for the >> > suggested reclaim time? >> > >> >> Unless the test requires the ability to detach. I've relaxed the >> requirement from "first". >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const >> TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); >> On 2016/07/20 01:45:21, gab wrote: >> > constexpr >> >> Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: >> worker_pool_->DisallowWorkerDetachmentForTesting(); >> On 2016/07/20 14:15:43, fdoray wrote: >> > This is not required given that suggested reclaim time is >> TimeDelta::Max(), >> > right? >> >> Yep. Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: >> worker_pool->DisallowWorkerDetachmentForTesting(); >> On 2016/07/20 14:15:43, fdoray wrote: >> > This is not required given that suggested reclaim time is >> TimeDelta::Max(), >> > right? >> >> Yep. Done. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void >> CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { >> On 2016/07/20 01:45:21, gab wrote: >> > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) >> >> We're counting zero TLS values here so the name should stick. The TLS is >> always initialized to zero. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: >> EXPECT_GT(zero_tls_values_, 0); >> On 2016/07/20 14:15:43, fdoray wrote: >> > On 2016/07/20 01:45:21, gab wrote: >> > > Should this use NoBarrier_Load()? On Windows this is equivalent, but >> in >> > > atomicops_internals_portable.h it appears to do more than just a >> simple deref. >> > > >> > > Note: NoBarrier is clearly correct because of the surrounding use of >> > > WaitableEvent (not that it's very efficient given the heavier >> synchronization >> > > constructs nearby but the barrier is definitely not necessary) >> > >> > atomicops.h:19 It is incorrect to make direct assignments to/from an >> atomic >> > variable. You should use one of the Load or Store routines. 
>> >> The atomicops unittest direct references their variables for testing >> (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). >> >> Went ahead and changed this to a NoBarrier_Load >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_params.cc (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_params.cc:7: #include >> "base/time/time.h" >> On 2016/07/20 14:15:44, fdoray wrote: >> > Not needed if included in the header. >> >> See earlier comment. >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> File base/task_scheduler/scheduler_worker_pool_params.h (right): >> >> >> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >> base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; >> On 2016/07/20 14:15:44, fdoray wrote: >> > #include "base/time/time.h" given that there is a TimeDelta member? >> >> The forward-decl still works here since we only need to know the size in >> the constructor. Folks who include this should also include time.h as >> well since they themselves will be using TimeDelta. >> >> https://codereview.chromium.org/2116163002/ >> > > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On Wed, Jul 20, 2016 at 1:58 PM, Robert Liao <robliao@chromium.org> wrote: > The way that the atomic increment is implemented for x86 on Windows is via > _InterlockedExchangeAdd, which is intended to be an atomic operation. > > x86 allows for atomic increments in the processor as long as you specify > the lock prefix on the INC instruction. > Adds register to register are generally atomic in word sizes. > > The _InterlockedExchangeAdd gets translated to > lock xadd [mem location], [increment value] > making the increment atomic. > > Since on x86 in multiprocessor systems lock prefixed instructions have a > total order, the barriers are not necessary. > Perhaps thats related to why barriers have 0 perf impact on x86? (A reason to write generally for any platform I guess.) > > > On Wed, Jul 20, 2016 at 1:17 PM, <danakj@chromium.org> wrote: > >> On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: >> >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:531: >>> !subtle::Release_Load(&num_single_threaded_runners_) && >>> On 2016/07/20 01:45:21, gab wrote: >>> > On 2016/07/19 22:03:47, robliao wrote: >>> > > On 2016/07/13 21:07:55, gab wrote: >>> > > > On 2016/07/13 20:19:46, robliao wrote: >>> > > > > On 2016/07/13 18:36:31, gab wrote: >>> > > > > > > > Why do you need Release_Load() here? >>> > > > > > > Since CanDetach may make decisions based off of surrounding >>> state from >>> > > the >>> > > > > > > single threaded task runners (like idle thread accounting), >>> I figured >>> > it >>> > > > > would >>> > > > > > > be robust to make sure everything is updated at this point. 
>>> > > > > > >>> > > > > > Hmmm, I don't think this matters, idle thread accounting is >>> done on its >>> > > own >>> > > > > (and >>> > > > > > is in fact checked before the ReleaseLoad here anyways). All >>> checks are >>> > > > > > self-sufficient so this one is really only about >>> > > > |num_single_threaded_runners| >>> > > > > > IMO. >>> > > > > >>> > > > > In either case, it's the safest thing to do here. We want to >>> make sure all >>> > > > state >>> > > > > from before gets synced up to this point. >>> > > > >>> > > > I disagree. It's not safer. It's equivalent (plus overhead). >>> CanDetach() >>> > > doesn't >>> > > > document that it produces a memory barrier and nothing relies on >>> it doing >>> > so. >>> > > > The only other thing after this call is HasJoinedForTesting() >>> which is >>> > > > self-sufficient. >>> > > > >>> > > > Everything in the scheduler is designed to do exactly what it >>> needs to be >>> > > safe. >>> > > > Not more, not less. >>> > > > >>> > > > Adding Release semantics here implicitly encodes a requirement >>> which doesn't >>> > > > exist. It's like using ThreadTaskRunnerHandle when one merely >>> cares for a >>> > > > sequence ;-). >>> > > >>> > > In light of the atomic ops discussion, any strong opinion to remove >>> the >>> > barrier >>> > > here? >>> > >>> > From the conclusion we derived, I think the inc/decrements should use >>> Barriers >>> > and the read here should use Acquire_Load(). I'm still not convinced >>> that this >>> > is required when dealing with a single variable (all memory >>> re-ordering examples >>> > I've seen use two variables and even a NoBarrier_Increment has to use >>> RMW -- >>> > read-modify-write -- semantics which have an implicit "barrier" for at >>> least >>> > that single variable (but would allow memory re-ordering of other >>> memory regions >>> > with respect to it?), i.e. maybe we could get away with >>> NoBarrier_Increment and >>> > Acquire_Load but I'm not 100% sure anymore...). 
I am pretty convinced >>> we need >>> > the Acquire_Load() though as the read doesn't have an implicit RMW >>> like the >>> > increments do (don't understand when Release_Load() is appropriate -- >>> C++11 >>> > doesn't provide it FWIW). >>> >>> If my understanding is right, the atomic ops are guaranteed to provide >>> atomicity in operations. The barriers control visibility. >>> >>> So IIUC, it's like this >>> >>> [Atomic Operation] >>> >>> NoBarrier_AtomicIncrement of the same atomic variable: >>> Thread A [In-place Atomic Increment] [In-place Atomic Increment] >>> Thread B [In-place Atomic Increment] [In-place Atomic Increment] >>> >>> Result Case 1 0->1 1->2 >>> Result Case 2 1->2 2->3 >>> (And more) >>> >>> The value is atomic (we aren't reading in the middle of another thread >>> writing). The visibility is up to the processor. >>> >>> With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual >>> exclusion) >>> >>> Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place >>> Atomic Increment + B] >>> Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place >>> Atomic Increment + B] >>> Result Case 1 A: 0->1 3->4 >>> B: 1->2 2->3 >>> >>> Result Case 2 A: 1->2 3->4 >>> B: 0->1 2->3 >>> (...) >>> >>> Non-atomic increment would separate read-increment-write steps, which >>> for word sized data types, would lead to similar behavior as >>> NoBarrier_AtomicIncrement. >>> >> >> Quibble: My understanding is that even word-sized increments can end up >> being non-atomic inside the processor. >> >> >>> For sizes larger than a word, we could see >>> intermediate reads and writes. >>> >>> So that indeed suggests I need a barrier at the increment and >>> decrements. 
>>> >>> >>> >>> >>> >>> So as for whether or not to use Acquire_Load and Release_Load, I guess >>> the question comes down when do I want to sync the world >>> >>> Acquire [Sync] Read >>> Release Read [Sync] >>> >>> Given that this count could be changing any time, practically speaking, >>> it doesn't seem to matter. If we get a 0, it could be 1 right after the >>> read in either case. The other thing to consider is the memory around >>> the value. If that needs to be visible too, then you'll also need a >>> barrier (e.g. a certain variable A is set to a value iff a variable B is >>> 0). >>> A Release_Store ensures that setting A and B in concert is visible >>> immediately after B is set. >>> >>> In this case of a read, the other values are synchronized, so it doesn't >>> matter as much if we sync now or later. With this analysis here, I think >>> arguably we can use NoBarrier_Load since with the increment barriers, >>> the value is visible everywhere. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>> base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock >>> join_for_testing_called_lock_; >>> On 2016/07/20 01:45:21, gab wrote: >>> > On 2016/07/13 21:07:55, gab wrote: >>> > > On 2016/07/07 20:45:57, fdoray wrote: >>> > > > On 2016/07/07 17:49:16, robliao wrote: >>> > > > > On 2016/07/04 19:41:32, fdoray wrote: >>> > > > > > Elsewhere in base/task_scheduler, we use a manual reset >>> WaitableEvent >>> > > > instead >>> > > > > of >>> > > > > > a SchedulerLock+bool. >>> > > > > >>> > > > > Here, that seems like an abuse of the WaitableEvent since we >>> don't intend >>> > to >>> > > > > wait on it. Maybe we should be using an atomic instead in those >>> cases? >>> > > > >>> > > > It would be nice to have an AtomicBool class. 
Since we don't have >>> it yet, >>> > you >>> > > > can keep this as-is. >>> > > >>> > > We have CancellationFlag. It's name is a bit too specific and no one >>> has >>> > > bothered renaming it, but what some people do is typedef it to a >>> > > WhateverElseFlag. >>> > > >>> > > Whether the flag or a WaitableEvent I would strongly prefer having a >>> single >>> > > member here. Having single-member locks already bothers me some and >>> when >>> > they're >>> > > test specific it bothers me some more. >>> > >>> > ping ^^ >>> > >>> > > >>> > > PS: I'd be happy to see someone change that class to a BooleanFlag >>> with a >>> > > typedef at bottom to CancellationFlag for legacy usage. >>> > >>> >>> Yup. I see your pending change for the atomic flag. We can incorporate >>> that once the name change goes in. A typedef seems like an abuse of the >>> type system just to perform a local rename. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:144: >>> SchedulerWorkerPoolImpl* worker_pool, >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > Can stay SchedulerWorkerPool* >>> >>> Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:446: >>> performed_idle_cycle_ = true; >>> On 2016/07/20 01:45:21, gab wrote: >>> > So if we create/wake one too many thread (i.e. 2 too many threads for >>> workload >>> > given we always aim to keep one ready), we'll immediately detach it? >>> Or do we >>> > want to at least wait for "reclaim time" (in which case I think >>> starting this at >>> > false gives us what we want?) >>> >>> Yes, if we wake one too many threads with no work, those threads will >>> detach. 
In general, we shouldn't be doing that as that's indicative of a >>> bug on our side. >>> >>> Waiting for the reclaim time is just as bad since we'll effectively have >>> these threads spin waiting. We should instead fix the wake bug. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an >>> issue if |num_single_threaded_runners_| is incremented after >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > [...] after this because the newly created TaskRunner (from which no >>> task has >>> > run yet) will simply run all its tasks on the next physical thread >>> created by >>> > this worker. >>> >>> Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.cc:564: >>> worker_detachment_allowed_(true), >>> On 2016/07/20 01:45:21, gab wrote: >>> > (bool should be replaced by flag as mentioned in previous comment, but >>> if it's >>> > not : initialize it inline in header) >>> >>> Moved initialization. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows worker >>> thread detachment. If the suggested reclaim time is not >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > // [...], then the test should call this before JoinForTesting() >>> > // when no worker thread can be in the middle of detaching. That >>> > // can be when all workers are known to be idle (after >>> > // WaitForAllWorkersIdleForTesting() returned) or busy (after >>> > // posting as many blocking tasks as the number of workers). 
>>> > >>> > "before the first set of work is dispatched": If someone wants to >>> disallow >>> > detachment before posting work, shouldn't they use TimeDelta::Max() >>> for the >>> > suggested reclaim time? >>> > >>> >>> Unless the test requires the ability to detach. I've relaxed the >>> requirement from "first". >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const >>> TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); >>> On 2016/07/20 01:45:21, gab wrote: >>> > constexpr >>> >>> Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: >>> worker_pool_->DisallowWorkerDetachmentForTesting(); >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > This is not required given that suggested reclaim time is >>> TimeDelta::Max(), >>> > right? >>> >>> Yep. Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: >>> worker_pool->DisallowWorkerDetachmentForTesting(); >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > This is not required given that suggested reclaim time is >>> TimeDelta::Max(), >>> > right? >>> >>> Yep. Done. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void >>> CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { >>> On 2016/07/20 01:45:21, gab wrote: >>> > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) >>> >>> We're counting zero TLS values here so the name should stick. The TLS is >>> always initialized to zero. 
>>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: >>> EXPECT_GT(zero_tls_values_, 0); >>> On 2016/07/20 14:15:43, fdoray wrote: >>> > On 2016/07/20 01:45:21, gab wrote: >>> > > Should this use NoBarrier_Load()? On Windows this is equivalent, but >>> in >>> > > atomicops_internals_portable.h it appears to do more than just a >>> simple deref. >>> > > >>> > > Note: NoBarrier is clearly correct because of the surrounding use of >>> > > WaitableEvent (not that it's very efficient given the heavier >>> synchronization >>> > > constructs nearby but the barrier is definitely not necessary) >>> > >>> > atomicops.h:19 It is incorrect to make direct assignments to/from an >>> atomic >>> > variable. You should use one of the Load or Store routines. >>> >>> The atomicops unittest direct references their variables for testing >>> (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). >>> >>> Went ahead and changed this to a NoBarrier_Load >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_params.cc (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_params.cc:7: #include >>> "base/time/time.h" >>> On 2016/07/20 14:15:44, fdoray wrote: >>> > Not needed if included in the header. >>> >>> See earlier comment. >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> File base/task_scheduler/scheduler_worker_pool_params.h (right): >>> >>> >>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>> base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; >>> On 2016/07/20 14:15:44, fdoray wrote: >>> > #include "base/time/time.h" given that there is a TimeDelta member? 
>>> >>> The forward-decl still works here since we only need to know the size in >>> the constructor. Folks who include this should also include time.h as >>> well since they themselves will be using TimeDelta. >>> >>> https://codereview.chromium.org/2116163002/
Yeah, I think atomicops are indeed atomic. Even the arm64 version of NoBarrier_AtomicIncrement <https://cs.chromium.org/chromium/src/v8/src/base/atomicops_internals_arm64_gc...> attempts to make sure the op was atomic. Since arm64 doesn't ensure the total ordering in multiprocessor systems, the barriers are required on both sides of the increment.

On Wed, Jul 20, 2016 at 2:00 PM, <danakj@chromium.org> wrote:
> [...]
*Barriers are required on both sides on the Barrier_AtomicIncrement version. On Wed, Jul 20, 2016 at 2:07 PM, Robert Liao <robliao@chromium.org> wrote: > Yeah, I think atomicops are indeed atomic. Even the arm64 version of > NoBarrier_AtomicIncrement > <https://cs.chromium.org/chromium/src/v8/src/base/atomicops_internals_arm64_gc...> > attempts to make sure the op was atomic. Since arm64 doesn't ensure the > total ordering in multiprocessor systems, the barriers are required on both > sides of the increment. > > On Wed, Jul 20, 2016 at 2:00 PM, <danakj@chromium.org> wrote: > >> On Wed, Jul 20, 2016 at 1:58 PM, Robert Liao <robliao@chromium.org> >> wrote: >> >>> The way that the atomic increment is implemented for x86 on Windows is >>> via _InterlockedExchangeAdd, which is intended to be an atomic operation. >>> >>> x86 allows for atomic increments in the processor as long as you specify >>> the lock prefix on the INC instruction. >>> Adds register to register are generally atomic in word sizes. >>> >>> The _InterlockedExchangeAdd gets translated to >>> lock xadd [mem location], [increment value] >>> making the increment atomic. >>> >>> Since on x86 in multiprocessor systems lock prefixed instructions have a >>> total order, the barriers are not necessary. >>> >> >> Perhaps thats related to why barriers have 0 perf impact on x86? (A >> reason to write generally for any platform I guess.) >> >> >>> >>> >>> On Wed, Jul 20, 2016 at 1:17 PM, <danakj@chromium.org> wrote: >>> >>>> On Wed, Jul 20, 2016 at 12:44 PM, <robliao@chromium.org> wrote: >>>> >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>>>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
>>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:531: >>>>> !subtle::Release_Load(&num_single_threaded_runners_) && >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > On 2016/07/19 22:03:47, robliao wrote: >>>>> > > On 2016/07/13 21:07:55, gab wrote: >>>>> > > > On 2016/07/13 20:19:46, robliao wrote: >>>>> > > > > On 2016/07/13 18:36:31, gab wrote: >>>>> > > > > > > > Why do you need Release_Load() here? >>>>> > > > > > > Since CanDetach may make decisions based off of surrounding >>>>> state from >>>>> > > the >>>>> > > > > > > single threaded task runners (like idle thread accounting), >>>>> I figured >>>>> > it >>>>> > > > > would >>>>> > > > > > > be robust to make sure everything is updated at this point. >>>>> > > > > > >>>>> > > > > > Hmmm, I don't think this matters, idle thread accounting is >>>>> done on its >>>>> > > own >>>>> > > > > (and >>>>> > > > > > is in fact checked before the ReleaseLoad here anyways). All >>>>> checks are >>>>> > > > > > self-sufficient so this one is really only about >>>>> > > > |num_single_threaded_runners| >>>>> > > > > > IMO. >>>>> > > > > >>>>> > > > > In either case, it's the safest thing to do here. We want to >>>>> make sure all >>>>> > > > state >>>>> > > > > from before gets synced up to this point. >>>>> > > > >>>>> > > > I disagree. It's not safer. It's equivalent (plus overhead). >>>>> CanDetach() >>>>> > > doesn't >>>>> > > > document that it produces a memory barrier and nothing relies on >>>>> it doing >>>>> > so. >>>>> > > > The only other thing after this call is HasJoinedForTesting() >>>>> which is >>>>> > > > self-sufficient. >>>>> > > > >>>>> > > > Everything in the scheduler is designed to do exactly what it >>>>> needs to be >>>>> > > safe. >>>>> > > > Not more, not less. >>>>> > > > >>>>> > > > Adding Release semantics here implicitly encodes a requirement >>>>> which doesn't >>>>> > > > exist. It's like using ThreadTaskRunnerHandle when one merely >>>>> cares for a >>>>> > > > sequence ;-). 
>>>>> > > >>>>> > > In light of the atomic ops discussion, any strong opinion to remove >>>>> the >>>>> > barrier >>>>> > > here? >>>>> > >>>>> > From the conclusion we derived, I think the inc/decrements should use >>>>> Barriers >>>>> > and the read here should use Acquire_Load(). I'm still not convinced >>>>> that this >>>>> > is required when dealing with a single variable (all memory >>>>> re-ordering examples >>>>> > I've seen use two variables and even a NoBarrier_Increment has to use >>>>> RMW -- >>>>> > read-modify-write -- semantics which have an implicit "barrier" for >>>>> at >>>>> least >>>>> > that single variable (but would allow memory re-ordering of other >>>>> memory regions >>>>> > with respect to it?), i.e. maybe we could get away with >>>>> NoBarrier_Increment and >>>>> > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced >>>>> we need >>>>> > the Acquire_Load() though as the read doesn't have an implicit RMW >>>>> like the >>>>> > increments do (don't understand when Release_Load() is appropriate -- >>>>> C++11 >>>>> > doesn't provide it FWIW). >>>>> >>>>> I've my understanding is right, the atomic ops are guaranteed to >>>>> provide >>>>> atomicity in operations. The barriers control visibility. >>>>> >>>>> So IIUC, it's like this >>>>> >>>>> [Atomic Operation] >>>>> >>>>> NoBarrier_AtomicIncrement of the same atomic variable: >>>>> Thread A [In-place Atomic Increment] [In-place Atomic Increment] >>>>> Thread B [In-place Atomic Increment] [In-place Atomic Increment] >>>>> >>>>> Result Case 1 0->1 1->2 >>>>> Result Case 2 1->2 2->3 >>>>> (And more) >>>>> >>>>> The value is atomic (we aren't reading in the middle of another thread >>>>> writing). The visibility is up to the processor. >>>>> >>>>> With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual >>>>> exclusion) >>>>> >>>>> Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place >>>>> Atomic Increment + B] >>>>> Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place >>>>> Atomic Increment + B] >>>>> Result Case 1 A: 0->1 3->4 >>>>> B: 1->2 2->3 >>>>> >>>>> Result Case 2 A: 1->2 3->4 >>>>> B: 0->1 2->3 >>>>> (...) >>>>> >>>>> Non-atomic increment would separate read-increment-write steps, which >>>>> for word sized data types, would lead to similar behavior as >>>>> NoBarrier_AtomicIncrement. >>>>> >>>> >>>> Quibble: My understanding is that even word-sized increments can end up >>>> being non-atomic inside the processor. >>>> >>>> >>>>> For sizes larger than a word, we could see >>>>> intermediate reads and writes. >>>>> >>>>> So that indeed suggests I need a barrier at the increment and >>>>> decrements. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> So as for whether or not to use Acquire_Load and Release_Load, I guess >>>>> the question comes down when do I want to sync the world >>>>> >>>>> Acquire [Sync] Read >>>>> Release Read [Sync] >>>>> >>>>> Given that this count could be changing any time, practically speaking, >>>>> it doesn't seem to matter. If we get a 0, it could be 1 right after the >>>>> read in either case. The other thing to consider is the memory around >>>>> the value. If that needs to be visible too, then you'll also need a >>>>> barrier (e.g. a certain variable A is set to a value iff a variable B >>>>> is >>>>> 0). >>>>> A Release_Store ensures that setting A and B in concert is visible >>>>> immediately after B is set. >>>>> >>>>> In this case of a read, the other values are synchronized, so it >>>>> doesn't >>>>> matter as much if we sync now or later. With this analysis here, I >>>>> think >>>>> arguably we can use NoBarrier_Load since with the increment barriers, >>>>> the value is visible everywhere. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... 
>>>>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... >>>>> base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock >>>>> join_for_testing_called_lock_; >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > On 2016/07/13 21:07:55, gab wrote: >>>>> > > On 2016/07/07 20:45:57, fdoray wrote: >>>>> > > > On 2016/07/07 17:49:16, robliao wrote: >>>>> > > > > On 2016/07/04 19:41:32, fdoray wrote: >>>>> > > > > > Elsewhere in base/task_scheduler, we use a manual reset >>>>> WaitableEvent >>>>> > > > instead >>>>> > > > > of >>>>> > > > > > a SchedulerLock+bool. >>>>> > > > > >>>>> > > > > Here, that seems like an abuse of the WaitableEvent since we >>>>> don't intend >>>>> > to >>>>> > > > > wait on it. Maybe we should be using an atomic instead in those >>>>> cases? >>>>> > > > >>>>> > > > It would be nice to have an AtomicBool class. Since we don't have >>>>> it yet, >>>>> > you >>>>> > > > can keep this as-is. >>>>> > > >>>>> > > We have CancellationFlag. It's name is a bit too specific and no >>>>> one >>>>> has >>>>> > > bothered renaming it, but what some people do is typedef it to a >>>>> > > WhateverElseFlag. >>>>> > > >>>>> > > Whether the flag or a WaitableEvent I would strongly prefer having >>>>> a >>>>> single >>>>> > > member here. Having single-member locks already bothers me some and >>>>> when >>>>> > they're >>>>> > > test specific it bothers me some more. >>>>> > >>>>> > ping ^^ >>>>> > >>>>> > > >>>>> > > PS: I'd be happy to see someone change that class to a BooleanFlag >>>>> with a >>>>> > > typedef at bottom to CancellationFlag for legacy usage. >>>>> > >>>>> >>>>> Yup. I see your pending change for the atomic flag. We can incorporate >>>>> that once the name change goes in. A typedef seems like an abuse of the >>>>> type system just to perform a local rename. 
>>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_impl.cc (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:144: >>>>> SchedulerWorkerPoolImpl* worker_pool, >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > Can stay SchedulerWorkerPool* >>>>> >>>>> Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:446: >>>>> performed_idle_cycle_ = true; >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > So if we create/wake one too many thread (i.e. 2 too many threads for >>>>> workload >>>>> > given we always aim to keep one ready), we'll immediately detach it? >>>>> Or do we >>>>> > want to at least wait for "reclaim time" (in which case I think >>>>> starting this at >>>>> > false gives us what we want?) >>>>> >>>>> Yes, if we wake one too many threads with no work, those threads will >>>>> detach. In general, we shouldn't be doing that as that's indicative of >>>>> a >>>>> bug on our side. >>>>> >>>>> Waiting for the reclaim time is just as bad since we'll effectively >>>>> have >>>>> these threads spin waiting. We should instead fix the wake bug. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:533: // It's not an >>>>> issue if |num_single_threaded_runners_| is incremented after >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > [...] after this because the newly created TaskRunner (from which no >>>>> task has >>>>> > run yet) will simply run all its tasks on the next physical thread >>>>> created by >>>>> > this worker. >>>>> >>>>> Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... 
>>>>> base/task_scheduler/scheduler_worker_pool_impl.cc:564: >>>>> worker_detachment_allowed_(true), >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > (bool should be replaced by flag as mentioned in previous comment, >>>>> but >>>>> if it's >>>>> > not : initialize it inline in header) >>>>> >>>>> Moved initialization. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_impl.h (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl.h:73: // Disallows >>>>> worker >>>>> thread detachment. If the suggested reclaim time is not >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > // [...], then the test should call this before JoinForTesting() >>>>> > // when no worker thread can be in the middle of detaching. That >>>>> > // can be when all workers are known to be idle (after >>>>> > // WaitForAllWorkersIdleForTesting() returned) or busy (after >>>>> > // posting as many blocking tasks as the number of workers). >>>>> > >>>>> > "before the first set of work is dispatched": If someone wants to >>>>> disallow >>>>> > detachment before posting work, shouldn't they use TimeDelta::Max() >>>>> for the >>>>> > suggested reclaim time? >>>>> > >>>>> >>>>> Unless the test requires the ability to detach. I've relaxed the >>>>> requirement from "first". >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_impl_unittest.cc >>>>> (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:46: const >>>>> TimeDelta kSuggestedReclaimTime = TimeDelta::FromMilliseconds(10); >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > constexpr >>>>> >>>>> Done. 
>>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:72: >>>>> worker_pool_->DisallowWorkerDetachmentForTesting(); >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > This is not required given that suggested reclaim time is >>>>> TimeDelta::Max(), >>>>> > right? >>>>> >>>>> Yep. Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:386: >>>>> worker_pool->DisallowWorkerDetachmentForTesting(); >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > This is not required given that suggested reclaim time is >>>>> TimeDelta::Max(), >>>>> > right? >>>>> >>>>> Yep. Done. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:477: void >>>>> CountZeroTlsValuesAndWait(WaitableEvent* count_waiter) { >>>>> On 2016/07/20 01:45:21, gab wrote: >>>>> > s/ZeroTlsValue/NoMagicTlsValue/ (here and below for variable?) >>>>> >>>>> We're counting zero TLS values here so the name should stick. The TLS >>>>> is >>>>> always initialized to zero. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_impl_unittest.cc:554: >>>>> EXPECT_GT(zero_tls_values_, 0); >>>>> On 2016/07/20 14:15:43, fdoray wrote: >>>>> > On 2016/07/20 01:45:21, gab wrote: >>>>> > > Should this use NoBarrier_Load()? On Windows this is equivalent, >>>>> but >>>>> in >>>>> > > atomicops_internals_portable.h it appears to do more than just a >>>>> simple deref. 
>>>>> > > >>>>> > > Note: NoBarrier is clearly correct because of the surrounding use >>>>> of >>>>> > > WaitableEvent (not that it's very efficient given the heavier >>>>> synchronization >>>>> > > constructs nearby but the barrier is definitely not necessary) >>>>> > >>>>> > atomicops.h:19 It is incorrect to make direct assignments to/from an >>>>> atomic >>>>> > variable. You should use one of the Load or Store routines. >>>>> >>>>> The atomicops unittest direct references their variables for testing >>>>> (https://cs.chromium.org/chromium/src/base/atomicops_unittest.cc). >>>>> >>>>> Went ahead and changed this to a NoBarrier_Load >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_params.cc (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_params.cc:7: #include >>>>> "base/time/time.h" >>>>> On 2016/07/20 14:15:44, fdoray wrote: >>>>> > Not needed if included in the header. >>>>> >>>>> See earlier comment. >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> File base/task_scheduler/scheduler_worker_pool_params.h (right): >>>>> >>>>> >>>>> https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... >>>>> base/task_scheduler/scheduler_worker_pool_params.h:15: class TimeDelta; >>>>> On 2016/07/20 14:15:44, fdoray wrote: >>>>> > #include "base/time/time.h" given that there is a TimeDelta member? >>>>> >>>>> The forward-decl still works here since we only need to know the size >>>>> in >>>>> the constructor. Folks who include this should also include time.h as >>>>> well since they themselves will be using TimeDelta. 
>>>>> >>>>> https://codereview.chromium.org/2116163002/ >>>>> >>>> >>>> >>> >> > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Patchset #10 (id:340001) has been deleted
Patchset #10 (id:360001) has been deleted
Patchset #10 (id:380001) has been deleted
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 20:26:31, gab wrote: > On 2016/07/20 19:44:00, robliao wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > On 2016/07/19 22:03:47, robliao wrote: > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > Since CanDetach may make decisions based off of surrounding state > > from > > > > the > > > > > > > > single threaded task runners (like idle thread accounting), I > > figured > > > it > > > > > > would > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done on > > its > > > > own > > > > > > (and > > > > > > > is in fact checked before the ReleaseLoad here anyways). All checks > > are > > > > > > > self-sufficient so this one is really only about > > > > > |num_single_threaded_runners| > > > > > > > IMO. > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to make sure > > all > > > > > state > > > > > > from before gets synced up to this point. > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). CanDetach() > > > > doesn't > > > > > document that it produces a memory barrier and nothing relies on it > doing > > > so. > > > > > The only other thing after this call is HasJoinedForTesting() which is > > > > > self-sufficient. > > > > > > > > > > Everything in the scheduler is designed to do exactly what it needs to > be > > > > safe. > > > > > Not more, not less. 
> > > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > > doesn't > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares for > a > > > > > sequence ;-). > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove the > > > barrier > > > > here? > > > > > > From the conclusion we derived, I think the inc/decrements should use > Barriers > > > and the read here should use Acquire_Load(). I'm still not convinced that > this > > > is required when dealing with a single variable (all memory re-ordering > > examples > > > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > > > read-modify-write -- semantics which have an implicit "barrier" for at least > > > that single variable (but would allow memory re-ordering of other memory > > regions > > > with respect to it?), i.e. maybe we could get away with NoBarrier_Increment > > and > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we > need > > > the Acquire_Load() though as the read doesn't have an implicit RMW like the > > > increments do (don't understand when Release_Load() is appropriate -- C++11 > > > doesn't provide it FWIW). > > > > I've my understanding is right > > We have different understandings on a few things (inline) > > >, the atomic ops are guaranteed to provide > > atomicity in operations. The barriers control visibility. > > > > So IIUC, it's like this > > > > [Atomic Operation] > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > Result Case 1 0->1 1->2 > > Result Case 2 1->2 2->3 > > (And more) > > > > The value is atomic (we aren't reading in the middle of another thread > writing). > > The visibility is up to the processor. > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. 
mutual > > exclusion) > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > Increment + B] > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > Increment + B] > > Result Case 1 A: 0->1 3->4 > > B: 1->2 2->3 > > > > Result Case 2 A: 1->2 3->4 > > B: 0->1 2->3 > > (...) > > > > Non-atomic increment would separate read-increment-write steps, which for word > > sized data types, would lead to similar behavior as NoBarrier_AtomicIncrement. > > Hmmm separating read-increment-write steps would be wrong even for word sized > data types (the whole point of atomic RMW is to make sure to do so as a single > transaction). > > > For sizes larger than a word, we could see intermediate reads and writes. > > > > So that indeed suggests I need a barrier at the increment and decrements. > > I think you need a barrier so that the result of the increment is "flushed" > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > > question comes down when do I want to sync the world > > > > Acquire [Sync] Read > > Release Read [Sync] > > > > Given that this count could be changing any time, practically speaking, it > > doesn't seem to matter. If we get a 0, it could be 1 right after the read in > > either case. The other thing to consider is the memory around the value. If > that > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > variable A is set to a value iff a variable B is 0). > > I think we do care about syncing before (Acquire), otherwise a thread could see > 0 forever. It's not relevant that the count could go up right after check (we > explicitly support that by sending a WakeUp which would result in a sleep no-op > IIRC). What's relevant is that we don't see a value that's outdated beyond the > range we explicitly protect ourselves against. 
> > > A Release_Store ensures that setting A and B in concert is visible immediately > > after B is set. > > I don't think we care about that. Nothing depends on being synchronized with the > variable at stake here, adding the Release adds an implicit dependency which we > don't document and that's no good for readability (and a potential burden if > someone starts depending on it because it happens to be there). Every > synchronization in the scheduler is explicit and intentional, let's not add one > for the sake of "why not". > > > > > In this case of a read, the other values are synchronized > > What do you mean by "the other values are synchronized on read"? Which other > values and why and why does it matter? > > >, so it doesn't matter > > as much if we sync now or later. With this analysis here, I think arguably we > > can use NoBarrier_Load since with the increment barriers, the value is visible > > everywhere. > > Hmmm separating read-increment-write steps would be wrong even for word sized data types (the whole point of atomic RMW is to make sure to do so as a single transaction). Indeed, but that's the joy of omitting the barriers. Non-atomic ops look indistinguishable from atomic ops. > I think you need a barrier so that the result of the increment is "flushed" (guaranteed to not be reordered after the next sequential read). Processors and compilers guarantee that single-threaded behavior is preserved regardless of the ordering game within a single processor. When multiple processors (and thus threads) are running, that guarantee goes down to the memory coherence guarantees (which are quite strong on x86 and weak on arm for example). > I don't think we care about that. Nothing depends on being synchronized with the variable at stake here, adding the Release adds an implicit dependency which we don't document and that's no good for readability (and a potential burden if someone starts depending on it because it happens to be there). 
Every synchronization in the scheduler is explicit and intentional, let's not add one for the sake of "why not". That's correct. Since all of the other values are already sync'ed via locks or are only modified on one thread, we're generally okay. > I think we do care about syncing before (Acquire), otherwise a thread could see 0 forever. It's not relevant that the count could go up right after check (we explicitly support that by sending a WakeUp which would result in a sleep no-op IIRC). What's relevant is that we don't see a value that's outdated beyond the range we explicitly protect ourselves against. In the Barrier_AtomicIncrement world, a thread won't always see zero because the increments are barriered. The memory address in cache is flushed and invalidated on the other cores. > What do you mean by "the other values are synchronized on read"? Which other values and why and why does it matter? I mean Release_Read. It doesn't matter in this case now that we're using barriers for the counters. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.h:167: SchedulerLock join_for_testing_called_lock_; On 2016/07/20 20:26:31, gab wrote: > On 2016/07/20 19:44:00, robliao wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > On 2016/07/13 21:07:55, gab wrote: > > > > On 2016/07/07 20:45:57, fdoray wrote: > > > > > On 2016/07/07 17:49:16, robliao wrote: > > > > > > On 2016/07/04 19:41:32, fdoray wrote: > > > > > > > Elsewhere in base/task_scheduler, we use a manual reset > WaitableEvent > > > > > instead > > > > > > of > > > > > > > a SchedulerLock+bool. > > > > > > > > > > > > Here, that seems like an abuse of the WaitableEvent since we don't > > intend > > > to > > > > > > wait on it. Maybe we should be using an atomic instead in those cases? 
> > > > > > > > > > It would be nice to have an AtomicBool class. Since we don't have it > yet, > > > you > > > > > can keep this as-is. > > > > > > > > We have CancellationFlag. It's name is a bit too specific and no one has > > > > bothered renaming it, but what some people do is typedef it to a > > > > WhateverElseFlag. > > > > > > > > Whether the flag or a WaitableEvent I would strongly prefer having a > single > > > > member here. Having single-member locks already bothers me some and when > > > they're > > > > test specific it bothers me some more. > > > > > > ping ^^ > > > > > > > > > > > PS: I'd be happy to see someone change that class to a BooleanFlag with a > > > > typedef at bottom to CancellationFlag for legacy usage. > > > > > > > Yup. I see your pending change for the atomic flag. > > Which change? It sure isn't mine or I code in my sleep :-) > > > We can incorporate that once > > the name change goes in. A typedef seems like an abuse of the type system just > > to perform a local rename. > > It's at least done here (and I remember other places but I can't find it, maybe > deleted code). I don't see why not, but fine to do it later too. > > https://cs.chromium.org/chromium/src/chrome/browser/after_startup_task_utils.... Ah, I guess I misread it and it was fdoray's https://codereview.chromium.org/2163753004/ It's coming, so we might as well wait for that. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 20:26:31, gab wrote: > On 2016/07/20 19:44:01, robliao wrote: > > On 2016/07/20 01:45:21, gab wrote: > > > So if we create/wake one too many thread (i.e. 2 too many threads for > workload > > > given we always aim to keep one ready), we'll immediately detach it? 
Or do > we > > > want to at least wait for "reclaim time" (in which case I think starting > this > > at > > > false gives us what we want?) > > > > Yes, if we wake one too many threads with no work, those threads will detach. > In > > general, we shouldn't be doing that as that's indicative of a bug on our side. > > > > > Waiting for the reclaim time is just as bad since we'll effectively have these > > threads spin waiting. We should instead fix the wake bug. > > It's not necessarily a bug. It can today under a simple bursty load. > > i.e. get 4 tasks, wake 4 threads. Tasks happen to be fast and are all processed > by first two threads before other 2 even get scheduled. > > I think in those cases the ever idle threads should wait for reclaim time or we > might end up trashing the system. Say 4 more tasks show up after an immediate > reclaim: we recreate those 2 threads but the 2 active one yet again manage to > process the 4 tasks before the 2 recreated threads are scheduled (etc.). > > So I still think we should start this false (and explain the above in a comment > here). > > (and we should reset it to false right before detaching as well for the same > reasons) Fair enough. I've moved this to an idle_start_time instead of tracking one cycle count.
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/20 22:18:03, robliao wrote: > On 2016/07/20 20:26:31, gab wrote: > > On 2016/07/20 19:44:00, robliao wrote: > > > On 2016/07/20 01:45:21, gab wrote: > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > Since CanDetach may make decisions based off of surrounding > state > > > from > > > > > the > > > > > > > > > single threaded task runners (like idle thread accounting), I > > > figured > > > > it > > > > > > > would > > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done > on > > > its > > > > > own > > > > > > > (and > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > checks > > > are > > > > > > > > self-sufficient so this one is really only about > > > > > > |num_single_threaded_runners| > > > > > > > > IMO. > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to make > sure > > > all > > > > > > state > > > > > > > from before gets synced up to this point. > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > CanDetach() > > > > > doesn't > > > > > > document that it produces a memory barrier and nothing relies on it > > doing > > > > so. > > > > > > The only other thing after this call is HasJoinedForTesting() which is > > > > > > self-sufficient. 
> > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it needs to > > be > > > > > safe. > > > > > > Not more, not less. > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > > > doesn't > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares > for > > a > > > > > > sequence ;-). > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove the > > > > barrier > > > > > here? > > > > > > > > From the conclusion we derived, I think the inc/decrements should use > > Barriers > > > > and the read here should use Acquire_Load(). I'm still not convinced that > > this > > > > is required when dealing with a single variable (all memory re-ordering > > > examples > > > > I've seen use two variables and even a NoBarrier_Increment has to use RMW -- > > > > read-modify-write -- semantics which have an implicit "barrier" for at least > > > > that single variable (but would allow memory re-ordering of other memory > > > regions > > > > with respect to it?), i.e. maybe we could get away with > NoBarrier_Increment > > > and > > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we > > need > > > > the Acquire_Load() though as the read doesn't have an implicit RMW like > the > > > > increments do (don't understand when Release_Load() is appropriate -- > C++11 > > > > doesn't provide it FWIW). > > > > > > If my understanding is right > > > > We have different understandings on a few things (inline) > > > > >, the atomic ops are guaranteed to provide > > > atomicity in operations. The barriers control visibility.
> > > > > > So IIUC, it's like this > > > > > > [Atomic Operation] > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > > > Result Case 1 0->1 1->2 > > > Result Case 2 1->2 2->3 > > > (And more) > > > > > > The value is atomic (we aren't reading in the middle of another thread > > writing). > > > The visibility is up to the processor. > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > exclusion) > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > > Increment + B] > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place Atomic > > > Increment + B] > > > Result Case 1 A: 0->1 3->4 > > > B: 1->2 2->3 > > > > > > Result Case 2 A: 1->2 3->4 > > > B: 0->1 2->3 > > > (...) > > > > > > Non-atomic increment would separate read-increment-write steps, which for > word > > > sized data types, would lead to similar behavior as > NoBarrier_AtomicIncrement. > > > > Hmmm separating read-increment-write steps would be wrong even for word sized > > data types (the whole point of atomic RMW is to make sure to do so as a single > > transaction). > > > > > For sizes larger than a word, we could see intermediate reads and writes. > > > > > > So that indeed suggests I need a barrier at the increment and decrements. > > > > I think you need a barrier so that the result of the increment is "flushed" > > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > > > question comes down when do I want to sync the world > > > > > > Acquire [Sync] Read > > > Release Read [Sync] > > > > > > Given that this count could be changing any time, practically speaking, it > > > doesn't seem to matter. 
If we get a 0, it could be 1 right after the read in > > > either case. The other thing to consider is the memory around the value. If > > that > > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > > variable A is set to a value iff a variable B is 0). > > > > I think we do care about syncing before (Acquire), otherwise a thread could > see > > 0 forever. It's not relevant that the count could go up right after check (we > > explicitly support that by sending a WakeUp which would result in a sleep > no-op > > IIRC). What's relevant is that we don't see a value that's outdated beyond the > > range we explicitly protect ourselves against. > > > > > A Release_Store ensures that setting A and B in concert is visible > immediately > > > after B is set. > > > > I don't think we care about that. Nothing depends on being synchronized with > the > > variable at stake here, adding the Release adds an implicit dependency which > we > > don't document and that's no good for readability (and a potential burden if > > someone starts depending on it because it happens to be there). Every > > synchronization in the scheduler is explicit and intentional, let's not add > one > > for the sake of "why not". > > > > > > > > In this case of a read, the other values are synchronized > > > > What do you mean by "the other values are synchronized on read"? Which other > > values and why and why does it matter? > > > > >, so it doesn't matter > > > as much if we sync now or later. With this analysis here, I think arguably > we > > > can use NoBarrier_Load since with the increment barriers, the value is > visible > > > everywhere. > > > > > Hmmm separating read-increment-write steps would be wrong even for word sized > data types (the whole point of atomic RMW is to make sure to do so as a single > transaction). > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > indistinguishable from atomic ops. 
> > > I think you need a barrier so that the result of the increment is "flushed" > (guaranteed to not be reordered after the next sequential read). > Processors and compilers guarantee that single-threaded behavior is preserved > regardless of the ordering game within a single processor. When multiple > processors (and thus threads) are running, that guarantee goes down to the > memory coherence guarantees (which are quite strong on x86 and weak on arm for > example). > > > I don't think we care about that. Nothing depends on being synchronized with > the > variable at stake here, adding the Release adds an implicit dependency which we > don't document and that's no good for readability (and a potential burden if > someone starts depending on it because it happens to be there). Every > synchronization in the scheduler is explicit and intentional, let's not add one > for the sake of "why not". > > That's correct. Since all of the other values are already sync'ed via locks or > are only modified on one thread, we're generally okay. > > > I think we do care about syncing before (Acquire), otherwise a thread could > see > 0 forever. It's not relevant that the count could go up right after check (we > explicitly support that by sending a WakeUp which would result in a sleep no-op > IIRC). What's relevant is that we don't see a value that's outdated beyond the > range we explicitly protect ourselves against. > > In the Barrier_AtomicIncrement world, a thread won't always see zero because the > increments are barriered. The memory address in cache is flushed and invalidated > on the other cores. I'm not convinced this is true. The RMW will add an lfence (Acquire) on its read and an sfence (Release) on its write *for that thread*. That doesn't mean that other threads will see the value (they need an lfence (Acquire) to be forced to fetch it).
In any case, even if NoBarrier was correct here, it's not worth optimizing CanDetach() as it's running when the thread is idle anyways so might as well add the lfence (Acquire). > > > What do you mean by "the other values are synchronized on read"? Which other > values and why and why does it matter? > I mean Release_Read. It doesn't matter in this case now that we're using > barriers for the counters. https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/260001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:446: performed_idle_cycle_ = true; On 2016/07/20 22:18:03, robliao wrote: > On 2016/07/20 20:26:31, gab wrote: > > On 2016/07/20 19:44:01, robliao wrote: > > > On 2016/07/20 01:45:21, gab wrote: > > > > So if we create/wake one too many thread (i.e. 2 too many threads for > > workload > > > > given we always aim to keep one ready), we'll immediately detach it? Or do > > we > > > > want to at least wait for "reclaim time" (in which case I think starting > > this > > > at > > > > false gives us what we want?) > > > > > > Yes, if we wake one too many threads with no work, those threads will > detach. > > In > > > general, we shouldn't be doing that as that's indicative of a bug on our > side. > > > > > > > > Waiting for the reclaim time is just as bad since we'll effectively have > these > > > threads spin waiting. We should instead fix the wake bug. > > > > It's not necessarily a bug. It can today under a simple bursty load. > > > > i.e. get 4 tasks, wake 4 threads. Tasks happen to be fast and are all > processed > > by first two threads before other 2 even get scheduled. > > > > I think in those cases the ever idle threads should wait for reclaim time or > we > > might end up trashing the system. 
Say 4 more tasks show up after an immediate > > reclaim: we recreate those 2 threads but the 2 active one yet again manage to > > process the 4 tasks before the 2 recreated threads are scheduled (etc.). > > > > So I still think we should start this false (and explain the above in a > comment > > here). > > > > (and we should reset it to false right before detaching as well for the same > > reasons) > > Fair enough. I've moved this to an idle_start_time instead of tracking one cycle > count. I like the new version very much, easier to read and reason about too :-). https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:541: (Time::Now() - idle_start_time_) > outer_->suggested_reclaim_time_ && Use TimeTicks? IIRC they're more reliable (and maybe even more efficient?) @fdoray's the expert on this
https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:541: (Time::Now() - idle_start_time_) > outer_->suggested_reclaim_time_ && On 2016/07/21 13:43:22, gab wrote: > Use TimeTicks? IIRC they're more reliable (and maybe even more efficient?) > > @fdoray's the expert on this Yes, use TimeTicks. Time = use it to get an absolute value convertible to UTC and comparable with values taken on other machines. It is low resolution, non-monotonic and doesn't increase at a constant rate. TimeTicks = use it to measure elapsed time. The absolute value has no meaning. It has a high resolution, is monotonic and increases at a constant rate. https://docs.google.com/document/d/1ypBZPZnshJ612FWAjkkpSBQiII7vYEEDsRkDOkVjQ...
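The same split exists in std::chrono terms (an analogy only, not the base/time API): system_clock plays the role of Time and steady_clock the role of TimeTicks, and an elapsed-time check like the reclaim comparison wants the monotonic one:

```cpp
#include <cassert>
#include <chrono>

// Analogy for the Time vs. TimeTicks distinction (illustration only, not
// the base/time API):
//  - std::chrono::system_clock ~ base::Time: wall-clock time, convertible
//    to a calendar date, but it can jump when the system clock is adjusted.
//  - std::chrono::steady_clock ~ base::TimeTicks: monotonic, meaningful
//    only as a difference, which is what an elapsed-time check wants.
bool PastReclaimTime(std::chrono::steady_clock::time_point idle_start,
                     std::chrono::steady_clock::time_point now,
                     std::chrono::steady_clock::duration suggested_reclaim) {
  // Safe because steady_clock never goes backwards: |now - idle_start| is
  // a true elapsed duration, unlike a Time::Now() subtraction which could
  // even come out negative after a clock adjustment.
  return (now - idle_start) > suggested_reclaim;
}
```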
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/21 13:43:21, gab wrote: > On 2016/07/20 22:18:03, robliao wrote: > > On 2016/07/20 20:26:31, gab wrote: > > > On 2016/07/20 19:44:00, robliao wrote: > > > > On 2016/07/20 01:45:21, gab wrote: > > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > > Since CanDetach may make decisions based off of surrounding > > state > > > > from > > > > > > the > > > > > > > > > > single threaded task runners (like idle thread accounting), I > > > > figured > > > > > it > > > > > > > > would > > > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is done > > on > > > > its > > > > > > own > > > > > > > > (and > > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > > checks > > > > are > > > > > > > > > self-sufficient so this one is really only about > > > > > > > |num_single_threaded_runners| > > > > > > > > > IMO. > > > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to make > > sure > > > > all > > > > > > > state > > > > > > > > from before gets synced up to this point. > > > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > > CanDetach() > > > > > > doesn't > > > > > > > document that it produces a memory barrier and nothing relies on it > > > doing > > > > > so. 
> > > > > > > The only other thing after this call is HasJoinedForTesting() which > is > > > > > > > self-sufficient. > > > > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it needs > to > > > be > > > > > > safe. > > > > > > > Not more, not less. > > > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement which > > > > doesn't > > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely cares > > for > > > a > > > > > > > sequence ;-). > > > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove > the > > > > > barrier > > > > > > here? > > > > > > > > > > From the conclusion we derived, I think the inc/decrements should use > > > Barriers > > > > > and the read here should use Acquire_Load(). I'm still not convinced > that > > > this > > > > > is required when dealing with a single variable (all memory re-ordering > > > > examples > > > > > I've seen use two variables and even a NoBarrier_Increment has to use > RMW > > -- > > > > > read-modify-write -- semantics which have an implicit "barrier" for at > > least > > > > > that single variable (but would allow memory re-ordering of other memory > > > > regions > > > > > with respect to it?), i.e. maybe we could get away with > > NoBarrier_Increment > > > > and > > > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced we > > > need > > > > > the Acquire_Load() though as the read doesn't have an implicit RMW like > > the > > > > > increments do (don't understand when Release_Load() is appropriate -- > > C++11 > > > > > doesn't provide it FWIW). > > > > > > > > I've my understanding is right > > > > > > We have different understandings on a few things (inline) > > > > > > >, the atomic ops are guaranteed to provide > > > > atomicity in operations. The barriers control visibility. 
> > > > > > > > So IIUC, it's like this > > > > > > > > [Atomic Operation] > > > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > > > > > Result Case 1 0->1 1->2 > > > > Result Case 2 1->2 2->3 > > > > (And more) > > > > > > > > The value is atomic (we aren't reading in the middle of another thread > > > writing). > > > > The visibility is up to the processor. > > > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > > exclusion) > > > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic > > > > Increment + B] > > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > Atomic > > > > Increment + B] > > > > Result Case 1 A: 0->1 3->4 > > > > B: 1->2 2->3 > > > > > > > > Result Case 2 A: 1->2 3->4 > > > > B: 0->1 2->3 > > > > (...) > > > > > > > > Non-atomic increment would separate read-increment-write steps, which for > > word > > > > sized data types, would lead to similar behavior as > > NoBarrier_AtomicIncrement. > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > sized > > > data types (the whole point of atomic RMW is to make sure to do so as a > single > > > transaction). > > > > > > > For sizes larger than a word, we could see intermediate reads and writes. > > > > > > > > So that indeed suggests I need a barrier at the increment and decrements. > > > > > > I think you need a barrier so that the result of the increment is "flushed" > > > (guaranteed to not be reordered after the next sequential read). 
> > > > > > > > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess the > > > > question comes down when do I want to sync the world > > > > > > > > Acquire [Sync] Read > > > > Release Read [Sync] > > > > > > > > Given that this count could be changing any time, practically speaking, it > > > > doesn't seem to matter. If we get a 0, it could be 1 right after the read > in > > > > either case. The other thing to consider is the memory around the value. > If > > > that > > > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > > > variable A is set to a value iff a variable B is 0). > > > > > > I think we do care about syncing before (Acquire), otherwise a thread could > > see > > > 0 forever. It's not relevant that the count could go up right after check > (we > > > explicitly support that by sending a WakeUp which would result in a sleep > > no-op > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > the > > > range we explicitly protect ourselves against. > > > > > > > A Release_Store ensures that setting A and B in concert is visible > > immediately > > > > after B is set. > > > > > > I don't think we care about that. Nothing depends on being synchronized with > > the > > > variable at stake here, adding the Release adds an implicit dependency which > > we > > > don't document and that's no good for readability (and a potential burden if > > > someone starts depending on it because it happens to be there). Every > > > synchronization in the scheduler is explicit and intentional, let's not add > > one > > > for the sake of "why not". > > > > > > > > > > > In this case of a read, the other values are synchronized > > > > > > What do you mean by "the other values are synchronized on read"? Which other > > > values and why and why does it matter? > > > > > > >, so it doesn't matter > > > > as much if we sync now or later. 
With this analysis here, I think arguably > > we > > > > can use NoBarrier_Load since with the increment barriers, the value is > > visible > > > > everywhere. > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > sized > > data types (the whole point of atomic RMW is to make sure to do so as a single > > transaction). > > > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > > indistinguishable from atomic ops. > > > > > I think you need a barrier so that the result of the increment is "flushed" > > (guaranteed to not be reordered after the next sequential read). > > Processors and compilers guarantee that single-threaded behavior is preserved > > regardless of the ordering game within a single processor. When multiple > > processors (and thus threads) are running, that guarantee goes down to the > > memory coherence guarantees (which are quite strong on x86 and weak on arm for > > example). > > > > > I don't think we care about that. Nothing depends on being synchronized with > > the > > variable at stake here, adding the Release adds an implicit dependency which > we > > don't document and that's no good for readability (and a potential burden if > > someone starts depending on it because it happens to be there). Every > > synchronization in the scheduler is explicit and intentional, let's not add > one > > for the sake of "why not". > > > > That's correct. Since all of the other values are already sync'ed via locks or > > are only modified on one thread, we're generally okay. > > > > > I think we do care about syncing before (Acquire), otherwise a thread could > > see > > 0 forever. It's not relevant that the count could go up right after check (we > > explicitly support that by sending a WakeUp which would result in a sleep > no-op > > IIRC). What's relevant is that we don't see a value that's outdated beyond the > > range we explicitly protect ourselves against. 
> > > > In the Barrier_AtomicIncrement world, a thread won't always see zero because > the > > increments are barriered. The memory address in cache is flushed and > invalidated > > on the other cores. > > I'm not convinced this is true. The RMW will add an lfence (Acquire) on its read > and an rfence (Release) on its write *for that thread*. That doesn't mean that > other threads will see the value (they need an lfence (Acquire)) to be forced to > fetch it. > > In any case, even if NoBarrier was correct here, it's not worth optimizing > CanDetach() as it's running when the thread is idle anyways so might as well add > the lfence (Acquire). > > > > > > What do you mean by "the other values are synchronized on read"? Which other > > values and why and why does it matter? > > I mean Release_Read. It doesn't matter in this case now that we're using > > barriers for the counters. > Digging into this deeper, the RMW will happen atomically with atomicops (lock xadd for x86, and a retry loop for arm to make sure the memory written is as expected). My operating assumption above on visibility incorrectly assumed a non-atomic RMW. The fence is to make sure that state is updated in a consistent way. Because memory ops can be reordered, things will appear just fine on the current thread, but could appear strange to outside threads. Fencing ensures that memory updates from a single thread appear in the correct order for other threads. Using the refcounted object as an example... When a refcounted object addrefs, the refcount doesn't really matter to an external thread except for the fact that it is non-zero. When refcounting down, the comment says to use a barrier to make sure the "state written before the reference count became zero will be visible to a thread that has just made the count zero". 
This comment makes no sense to me at first glance because all memory operations performed by a thread are immediately visible to that same thread (even with reordering, processors work hard to ensure that single-threaded behavior still works correctly). However, if someone knew they had the last ref (already problematic, but that's a separate issue) and set a boolean fReleasedLastRef, this could potentially be useful to another thread expecting to see fReleasedLastRef and an object of refcount 0. https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/400001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.cc:541: (Time::Now() - idle_start_time_) > outer_->suggested_reclaim_time_ && On 2016/07/21 13:59:08, fdoray wrote: > On 2016/07/21 13:43:22, gab wrote: > > Use TimeTicks? IIRC they're more reliable (and maybe even more efficient?) > > > > @fdoray's the expert on this > > Yes, use TimeTicks. > > Time = use it to get an absolute value convertible to UTC and comparable with > values taken on other machines. It is low resolution, non-monotonic and doesn't > increase at a constant rate. > TimeTicks = use it to measure elapsed time. The absolute value has no meaning. > It has a high resolution, is monotonic and increases at a constant rate. > > https://docs.google.com/document/d/1ypBZPZnshJ612FWAjkkpSBQiII7vYEEDsRkDOkVjQ... Done.
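To pin down the counter pattern the Release_Load thread above converged on (barriered increments/decrements, Acquire read in CanDetach()), here is a rough translation into std::atomic terms. The names are illustrative, not the Chromium atomicops API:

```cpp
#include <atomic>
#include <cassert>

// Sketch of the |num_single_threaded_runners_| pattern in std::atomic
// terms (the CL uses base atomicops' Barrier_AtomicIncrement and
// Acquire_Load; the function names here are hypothetical).
std::atomic<int> g_num_single_threaded_runners{0};

void OnSingleThreadedRunnerCreated() {
  // Fully-ordered RMW, analogous to Barrier_AtomicIncrement(..., 1).
  g_num_single_threaded_runners.fetch_add(1, std::memory_order_seq_cst);
}

void OnSingleThreadedRunnerDestroyed() {
  g_num_single_threaded_runners.fetch_sub(1, std::memory_order_seq_cst);
}

bool NoSingleThreadedRunners() {
  // Acquire load, analogous to Acquire_Load(): it pairs with the
  // barriered writes above, so an idle worker deciding whether to detach
  // doesn't act on a read reordered ahead of earlier synchronization.
  return g_num_single_threaded_runners.load(std::memory_order_acquire) == 0;
}
```

As discussed, the acquire on the read is cheap relative to an idle worker's detach path, so there is little point shaving it off with a relaxed load.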
https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/21 18:44:23, robliao wrote: > On 2016/07/21 13:43:21, gab wrote: > > On 2016/07/20 22:18:03, robliao wrote: > > > On 2016/07/20 20:26:31, gab wrote: > > > > On 2016/07/20 19:44:00, robliao wrote: > > > > > On 2016/07/20 01:45:21, gab wrote: > > > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > > > Since CanDetach may make decisions based off of surrounding > > > state > > > > > from > > > > > > > the > > > > > > > > > > > single threaded task runners (like idle thread accounting), > I > > > > > figured > > > > > > it > > > > > > > > > would > > > > > > > > > > > be robust to make sure everything is updated at this point. > > > > > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is > done > > > on > > > > > its > > > > > > > own > > > > > > > > > (and > > > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > > > checks > > > > > are > > > > > > > > > > self-sufficient so this one is really only about > > > > > > > > |num_single_threaded_runners| > > > > > > > > > > IMO. > > > > > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to > make > > > sure > > > > > all > > > > > > > > state > > > > > > > > > from before gets synced up to this point. > > > > > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). 
> > > CanDetach() > > > > > > > doesn't > > > > > > > > document that it produces a memory barrier and nothing relies on > it > > > > doing > > > > > > so. > > > > > > > > The only other thing after this call is HasJoinedForTesting() > which > > is > > > > > > > > self-sufficient. > > > > > > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it > needs > > to > > > > be > > > > > > > safe. > > > > > > > > Not more, not less. > > > > > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement > which > > > > > doesn't > > > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely > cares > > > for > > > > a > > > > > > > > sequence ;-). > > > > > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to remove > > the > > > > > > barrier > > > > > > > here? > > > > > > > > > > > > From the conclusion we derived, I think the inc/decrements should use > > > > Barriers > > > > > > and the read here should use Acquire_Load(). I'm still not convinced > > that > > > > this > > > > > > is required when dealing with a single variable (all memory > re-ordering > > > > > examples > > > > > > I've seen use two variables and even a NoBarrier_Increment has to use > > RMW > > > -- > > > > > > read-modify-write -- semantics which have an implicit "barrier" for at > > > least > > > > > > that single variable (but would allow memory re-ordering of other > memory > > > > > regions > > > > > > with respect to it?), i.e. maybe we could get away with > > > NoBarrier_Increment > > > > > and > > > > > > Acquire_Load but I'm not 100% sure anymore...). I am pretty convinced > we > > > > need > > > > > > the Acquire_Load() though as the read doesn't have an implicit RMW > like > > > the > > > > > > increments do (don't understand when Release_Load() is appropriate -- > > > C++11 > > > > > > doesn't provide it FWIW). 
> > > > > > > > > > I've my understanding is right > > > > > > > > We have different understandings on a few things (inline) > > > > > > > > >, the atomic ops are guaranteed to provide > > > > > atomicity in operations. The barriers control visibility. > > > > > > > > > > So IIUC, it's like this > > > > > > > > > > [Atomic Operation] > > > > > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > > > Thread A [In-place Atomic Increment] [In-place Atomic Increment] > > > > > Thread B [In-place Atomic Increment] [In-place Atomic Increment] > > > > > > > > > > Result Case 1 0->1 1->2 > > > > > Result Case 2 1->2 2->3 > > > > > (And more) > > > > > > > > > > The value is atomic (we aren't reading in the middle of another thread > > > > writing). > > > > > The visibility is up to the processor. > > > > > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > > > exclusion) > > > > > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > > Atomic > > > > > Increment + B] > > > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > > Atomic > > > > > Increment + B] > > > > > Result Case 1 A: 0->1 3->4 > > > > > B: 1->2 2->3 > > > > > > > > > > Result Case 2 A: 1->2 3->4 > > > > > B: 0->1 2->3 > > > > > (...) > > > > > > > > > > Non-atomic increment would separate read-increment-write steps, which > for > > > word > > > > > sized data types, would lead to similar behavior as > > > NoBarrier_AtomicIncrement. > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > sized > > > > data types (the whole point of atomic RMW is to make sure to do so as a > > single > > > > transaction). > > > > > > > > > For sizes larger than a word, we could see intermediate reads and > writes. > > > > > > > > > > So that indeed suggests I need a barrier at the increment and > decrements. 
> > > > > > > > I think you need a barrier so that the result of the increment is > "flushed" > > > > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess > the > > > > > question comes down when do I want to sync the world > > > > > > > > > > Acquire [Sync] Read > > > > > Release Read [Sync] > > > > > > > > > > Given that this count could be changing any time, practically speaking, > it > > > > > doesn't seem to matter. If we get a 0, it could be 1 right after the > read > > in > > > > > either case. The other thing to consider is the memory around the value. > > If > > > > that > > > > > needs to be visible too, then you'll also need a barrier (e.g. a certain > > > > > variable A is set to a value iff a variable B is 0). > > > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > could > > > see > > > > 0 forever. It's not relevant that the count could go up right after check > > (we > > > > explicitly support that by sending a WakeUp which would result in a sleep > > > no-op > > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > > the > > > > range we explicitly protect ourselves against. > > > > > > > > > A Release_Store ensures that setting A and B in concert is visible > > > immediately > > > > > after B is set. > > > > > > > > I don't think we care about that. Nothing depends on being synchronized > with > > > the > > > > variable at stake here, adding the Release adds an implicit dependency > which > > > we > > > > don't document and that's no good for readability (and a potential burden > if > > > > someone starts depending on it because it happens to be there). Every > > > > synchronization in the scheduler is explicit and intentional, let's not > add > > > one > > > > for the sake of "why not". 
> > > > > > > > > > > > > > In this case of a read, the other values are synchronized > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > other > > > > values and why and why does it matter? > > > > > > > > >, so it doesn't matter > > > > > as much if we sync now or later. With this analysis here, I think > arguably > > > we > > > > > can use NoBarrier_Load since with the increment barriers, the value is > > > visible > > > > > everywhere. > > > > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > sized > > > data types (the whole point of atomic RMW is to make sure to do so as a > single > > > transaction). > > > > > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > > > indistinguishable from atomic ops. > > > > > > > I think you need a barrier so that the result of the increment is > "flushed" > > > (guaranteed to not be reordered after the next sequential read). > > > Processors and compilers guarantee that single-threaded behavior is > preserved > > > regardless of the ordering game within a single processor. When multiple > > > processors (and thus threads) are running, that guarantee goes down to the > > > memory coherence guarantees (which are quite strong on x86 and weak on arm > for > > > example). > > > > > > > I don't think we care about that. Nothing depends on being synchronized > with > > > the > > > variable at stake here, adding the Release adds an implicit dependency which > > we > > > don't document and that's no good for readability (and a potential burden if > > > someone starts depending on it because it happens to be there). Every > > > synchronization in the scheduler is explicit and intentional, let's not add > > one > > > for the sake of "why not". > > > > > > That's correct. Since all of the other values are already sync'ed via locks > or > > > are only modified on one thread, we're generally okay. 
> > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > could > > > see > > > 0 forever. It's not relevant that the count could go up right after check > (we > > > explicitly support that by sending a WakeUp which would result in a sleep > > no-op > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > the > > > range we explicitly protect ourselves against. > > > > > > In the Barrier_AtomicIncrement world, a thread won't always see zero because > > the > > > increments are barriered. The memory address in cache is flushed and > > invalidated > > > on the other cores. > > > > I'm not convinced this is true. The RMW will add an lfence (Acquire) on its > read > > and an rfence (Release) on its write *for that thread*. That doesn't mean that > > other threads will see the value (they need an lfence (Acquire)) to be forced > to > > fetch it. > > > > In any case, even if NoBarrier was correct here, it's not worth optimizing > > CanDetach() as it's running when the thread is idle anyways so might as well > add > > the lfence (Acquire). > > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > other > > > values and why and why does it matter? > > > I mean Release_Read. It doesn't matter in this case now that we're using > > > barriers for the counters. > > > > Digging into this deeper, the RMW will happen atomically with atomicops (lock > xadd for x86, and a retry loop for arm to make sure the memory written is as > expected). My operating assumption above on visibility incorrectly assumed a > non-atomic RMW. > > The fence is to make sure that state is updated in a consistent way. Because > memory ops can be reordered, things will appear just fine on the current thread, > but could appear strange to outside threads. Fencing ensures that memory updates > from a single thread appear in the correct order for other threads. > > Using the refcounted object as an example... 
> When a refcounted object addrefs, the refcount doesn't really matter to an > external thread except for the fact that it is non-zero. > > When refcounting down, the comment says to use a barrier to make sure the "state > written before the reference count became zero will be visible to a thread that > has just made the count zero". This comment makes no sense to me because from > the same thread, all memory operations on the same thread are immediately > visible on the same thread (even with reordering, processors work hard to ensure > that single-threaded behaviors still work correctly). > > However, if considering that someone knew they had the last ref (already > problematic, but that's a separate issue), and they set a boolean > fReleasedLastRef, this could potentially be useful to another thread expecting > to see fReleasedLastRef and an object of refcount 0. Refcount is different, no one ever wants to just "read" the value (well except in say AtomicRefCountIsOne() which does use Acquire_Load() as I think we need to here).
Added a fence to the load. https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... File base/task_scheduler/scheduler_worker_pool_impl.cc (right): https://codereview.chromium.org/2116163002/diff/1/base/task_scheduler/schedul... base/task_scheduler/scheduler_worker_pool_impl.cc:531: !subtle::Release_Load(&num_single_threaded_runners_) && On 2016/07/21 21:24:24, gab wrote: > On 2016/07/21 18:44:23, robliao wrote: > > On 2016/07/21 13:43:21, gab wrote: > > > On 2016/07/20 22:18:03, robliao wrote: > > > > On 2016/07/20 20:26:31, gab wrote: > > > > > On 2016/07/20 19:44:00, robliao wrote: > > > > > > On 2016/07/20 01:45:21, gab wrote: > > > > > > > On 2016/07/19 22:03:47, robliao wrote: > > > > > > > > On 2016/07/13 21:07:55, gab wrote: > > > > > > > > > On 2016/07/13 20:19:46, robliao wrote: > > > > > > > > > > On 2016/07/13 18:36:31, gab wrote: > > > > > > > > > > > > > Why do you need Release_Load() here? > > > > > > > > > > > > Since CanDetach may make decisions based off of > surrounding > > > > state > > > > > > from > > > > > > > > the > > > > > > > > > > > > single threaded task runners (like idle thread > accounting), > > I > > > > > > figured > > > > > > > it > > > > > > > > > > would > > > > > > > > > > > > be robust to make sure everything is updated at this > point. > > > > > > > > > > > > > > > > > > > > > > Hmmm, I don't think this matters, idle thread accounting is > > done > > > > on > > > > > > its > > > > > > > > own > > > > > > > > > > (and > > > > > > > > > > > is in fact checked before the ReleaseLoad here anyways). All > > > > checks > > > > > > are > > > > > > > > > > > self-sufficient so this one is really only about > > > > > > > > > |num_single_threaded_runners| > > > > > > > > > > > IMO. > > > > > > > > > > > > > > > > > > > > In either case, it's the safest thing to do here. We want to > > make > > > > sure > > > > > > all > > > > > > > > > state > > > > > > > > > > from before gets synced up to this point. 
> > > > > > > > > > > > > > > > > > I disagree. It's not safer. It's equivalent (plus overhead). > > > > CanDetach() > > > > > > > > doesn't > > > > > > > > > document that it produces a memory barrier and nothing relies on > > it > > > > > doing > > > > > > > so. > > > > > > > > > The only other thing after this call is HasJoinedForTesting() > > which > > > is > > > > > > > > > self-sufficient. > > > > > > > > > > > > > > > > > > Everything in the scheduler is designed to do exactly what it > > needs > > > to > > > > > be > > > > > > > > safe. > > > > > > > > > Not more, not less. > > > > > > > > > > > > > > > > > > Adding Release semantics here implicitly encodes a requirement > > which > > > > > > doesn't > > > > > > > > > exist. It's like using ThreadTaskRunnerHandle when one merely > > cares > > > > for > > > > > a > > > > > > > > > sequence ;-). > > > > > > > > > > > > > > > > In light of the atomic ops discussion, any strong opinion to > remove > > > the > > > > > > > barrier > > > > > > > > here? > > > > > > > > > > > > > > From the conclusion we derived, I think the inc/decrements should > use > > > > > Barriers > > > > > > > and the read here should use Acquire_Load(). I'm still not convinced > > > that > > > > > this > > > > > > > is required when dealing with a single variable (all memory > > re-ordering > > > > > > examples > > > > > > > I've seen use two variables and even a NoBarrier_Increment has to > use > > > RMW > > > > -- > > > > > > > read-modify-write -- semantics which have an implicit "barrier" for > at > > > > least > > > > > > > that single variable (but would allow memory re-ordering of other > > memory > > > > > > regions > > > > > > > with respect to it?), i.e. maybe we could get away with > > > > NoBarrier_Increment > > > > > > and > > > > > > > Acquire_Load but I'm not 100% sure anymore...). 
> > > > > > > I am pretty convinced we need the Acquire_Load() though as the read doesn't have an implicit RMW like the increments do (don't understand when Release_Load() is appropriate -- C++11 doesn't provide it FWIW). > > > > > > > > > > > > If my understanding is right > > > > > > > > > > We have different understandings on a few things (inline) > > > > > > > > > > >, the atomic ops are guaranteed to provide > > > > > > atomicity in operations. The barriers control visibility. > > > > > > > > > > > > So IIUC, it's like this > > > > > > > > > > > > [Atomic Operation] > > > > > > > > > > > > NoBarrier_AtomicIncrement of the same atomic variable: > > > > > > Thread A [In-place Atomic Increment] [In-place Atomic > Increment] > > > > > > Thread B [In-place Atomic Increment] [In-place Atomic > Increment] > > > > > > > > > > > > Result Case 1 0->1 1->2 > > > > > > Result Case 2 1->2 2->3 > > > > > > (And more) > > > > > > > > > > > > The value is atomic (we aren't reading in the middle of another thread > > > > > writing). > > > > > > The visibility is up to the processor. > > > > > > > > > > > > With Barrier_AtomicIncrement, you impose a total ordering (e.g. mutual > > > > > > exclusion) > > > > > > > > > > > > Thread A [B + In-place Atomic Increment + B] [B + Barrier In-place > > > Atomic > > > > > > Increment + B] > > > > > > Thread B [B + In-place Atomic Increment + B] [B + Barrier In-place > > > Atomic > > > > > > Increment + B] > > > > > > Result Case 1 A: 0->1 3->4 > > > > > > B: 1->2 2->3 > > > > > > > > > > > > Result Case 2 A: 1->2 3->4 > > > > > > B: 0->1 2->3 > > > > > > (...) > > > > > > > > > > > > Non-atomic increment would separate read-increment-write steps, which > > for > > > > word > > > > > > sized data types, would lead to similar behavior as > > > > NoBarrier_AtomicIncrement. 
> > > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > > sized > > > > > data types (the whole point of atomic RMW is to make sure to do so as a > > > single > > > > > transaction). > > > > > > > > > > > For sizes larger than a word, we could see intermediate reads and > > writes. > > > > > > > > > > > > So that indeed suggests I need a barrier at the increment and > > decrements. > > > > > > > > > > I think you need a barrier so that the result of the increment is > > "flushed" > > > > > (guaranteed to not be reordered after the next sequential read). > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > So as for whether or not to use Acquire_Load and Release_Load, I guess > > the > > > > > > question comes down when do I want to sync the world > > > > > > > > > > > > Acquire [Sync] Read > > > > > > Release Read [Sync] > > > > > > > > > > > > Given that this count could be changing any time, practically > speaking, > > it > > > > > > doesn't seem to matter. If we get a 0, it could be 1 right after the > > read > > > in > > > > > > either case. The other thing to consider is the memory around the > value. > > > If > > > > > that > > > > > > needs to be visible too, then you'll also need a barrier (e.g. a > certain > > > > > > variable A is set to a value iff a variable B is 0). > > > > > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > > could > > > > see > > > > > 0 forever. It's not relevant that the count could go up right after > check > > > (we > > > > > explicitly support that by sending a WakeUp which would result in a > sleep > > > > no-op > > > > > IIRC). What's relevant is that we don't see a value that's outdated > beyond > > > the > > > > > range we explicitly protect ourselves against. > > > > > > > > > > > A Release_Store ensures that setting A and B in concert is visible > > > > immediately > > > > > > after B is set. 
> > > > > > > > > > I don't think we care about that. Nothing depends on being synchronized > > with > > > > the > > > > > variable at stake here, adding the Release adds an implicit dependency > > which > > > > we > > > > > don't document and that's no good for readability (and a potential > burden > > if > > > > > someone starts depending on it because it happens to be there). Every > > > > > synchronization in the scheduler is explicit and intentional, let's not > > add > > > > one > > > > > for the sake of "why not". > > > > > > > > > > > > > > > > > In this case of a read, the other values are synchronized > > > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > > other > > > > > values and why and why does it matter? > > > > > > > > > > >, so it doesn't matter > > > > > > as much if we sync now or later. With this analysis here, I think > > arguably > > > > we > > > > > > can use NoBarrier_Load since with the increment barriers, the value is > > > > visible > > > > > > everywhere. > > > > > > > > > > > > > > Hmmm separating read-increment-write steps would be wrong even for word > > > sized > > > > data types (the whole point of atomic RMW is to make sure to do so as a > > single > > > > transaction). > > > > > > > > Indeed, but that's the joy of omitting the barriers. Non-atomic ops look > > > > indistinguishable from atomic ops. > > > > > > > > > I think you need a barrier so that the result of the increment is > > "flushed" > > > > (guaranteed to not be reordered after the next sequential read). > > > > Processors and compilers guarantee that single-threaded behavior is > > preserved > > > > regardless of the ordering game within a single processor. When multiple > > > > processors (and thus threads) are running, that guarantee goes down to the > > > > memory coherence guarantees (which are quite strong on x86 and weak on arm > > for > > > > example). > > > > > > > > > I don't think we care about that. 
Nothing depends on being synchronized > > with > > > > the > > > > variable at stake here, adding the Release adds an implicit dependency > which > > > we > > > > don't document and that's no good for readability (and a potential burden > if > > > > someone starts depending on it because it happens to be there). Every > > > > synchronization in the scheduler is explicit and intentional, let's not > add > > > one > > > > for the sake of "why not". > > > > > > > > That's correct. Since all of the other values are already sync'ed via > locks > > or > > > > are only modified on one thread, we're generally okay. > > > > > > > > > I think we do care about syncing before (Acquire), otherwise a thread > > could > > > > see > > > > 0 forever. It's not relevant that the count could go up right after check > > (we > > > > explicitly support that by sending a WakeUp which would result in a sleep > > > no-op > > > > IIRC). What's relevant is that we don't see a value that's outdated beyond > > the > > > > range we explicitly protect ourselves against. > > > > > > > > In the Barrier_AtomicIncrement world, a thread won't always see zero > because > > > the > > > > increments are barriered. The memory address in cache is flushed and > > > invalidated > > > > on the other cores. > > > > > > I'm not convinced this is true. The RMW will add an lfence (Acquire) on its > > read > > > and an rfence (Release) on its write *for that thread*. That doesn't mean > that > > > other threads will see the value (they need an lfence (Acquire)) to be > forced > > to > > > fetch it. > > > > > > In any case, even if NoBarrier was correct here, it's not worth optimizing > > > CanDetach() as it's running when the thread is idle anyways so might as well > > add > > > the lfence (Acquire). > > > > > > > > > > > > What do you mean by "the other values are synchronized on read"? Which > > other > > > > values and why and why does it matter? > > > > I mean Release_Read. 
It doesn't matter in this case now that we're using > > > > barriers for the counters. > > > > > > > Digging into this deeper, the RMW will happen atomically with atomicops (lock > > xadd for x86, and a retry loop for arm to make sure the memory written is as > > expected). My operating assumption above on visibility incorrectly assumed a > > non-atomic RMW. > > > > The fence is to make sure that state is updated in a consistent way. Because > > memory ops can be reordered, things will appear just fine on the current > thread, > > but could appear strange to outside threads. Fencing ensures that memory > updates > > from a single thread appear in the correct order for other threads. > > > > Using the refcounted object as an example... > > When a refcounted object addrefs, the refcount doesn't really matter to an > > external thread except for the fact that it is non-zero. > > > > When refcounting down, the comment says to use a barrier to make sure the > "state > > written before the reference count became zero will be visible to a thread > that > > has just made the count zero". This comment makes no sense to me because from > > the same thread, all memory operations on the same thread are immediately > > visible on the same thread (even with reordering, processors work hard to > ensure > > that single-threaded behaviors still work correctly). > > > > However, if considering that someone knew they had the last ref (already > > problematic, but that's a separate issue), and they set a boolean > > fReleasedLastRef, this could potentially be useful to another thread expecting > > to see fReleasedLastRef and an object of refcount 0. > > Refcount is different, no one ever wants to just "read" the value (well except > in say AtomicRefCountIsOne() which does us Acquire_Load() as I think we need to > here). 
> > The Barrier on decrement (and that comment) IMO mean that: although the memory > order doesn't matter, whoever reaches 0 (and thus falls last in the relaxed > memory order) should sync its state for *other variables* so that whatever it > deletes (likely action of getting to 0) is the most recent version (and it's > probably guaranteed to see what other threads have done to *other variables* > because their RMW "happened before" in the relaxed memory order per this thread > being the one hitting 0 -- though since it hit 0 last I would assume even a > NoBarrier RMW would force the lfence on the R to sync...). Fenced the load.
The CQ bit was checked by robliao@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm, thanks. Keep an eye on https://codereview.chromium.org/2163753004/ to fix lock+bool usage after it lands.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by robliao@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from fdoray@chromium.org Link to the patchset: https://codereview.chromium.org/2116163002/#ps440001 (title: "Fenced Load")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ========== to ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ==========
Message was sent while issue was closed.
Committed patchset #12 (id:440001)
Message was sent while issue was closed.
Description was changed from ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 ========== to ========== Add Lazy Creation and Thread Detachment Support in the Scheduler Worker Pool BUG=553459 Committed: https://crrev.com/8a4029212fd36fdd8befc9479d6e2966bd27b95e Cr-Commit-Position: refs/heads/master@{#407205} ==========
Message was sent while issue was closed.
Patchset 12 (id:??) landed as https://crrev.com/8a4029212fd36fdd8befc9479d6e2966bd27b95e Cr-Commit-Position: refs/heads/master@{#407205}
Message was sent while issue was closed.
https://codereview.chromium.org/2116163002/diff/440001/base/task_scheduler/sc... File base/task_scheduler/scheduler_worker_pool_impl.h (right): https://codereview.chromium.org/2116163002/diff/440001/base/task_scheduler/sc... base/task_scheduler/scheduler_worker_pool_impl.h:169: bool worker_detachment_allowed_ = true; https://codereview.chromium.org/2163753004/ landed, let's follow-up with using an AtomicFlag for this, thanks! |