Created: 4 years, 2 months ago by prashant.n
Modified: 4 years, 1 month ago
CC: chromium-reviews
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description

base: Take care of unexpected wakeups in condition_variable_win.cc.
As per the MSDN documentation (https://goo.gl/tPHPkY), condition variables
are subject to spurious wakeups (those not associated with an explicit
wake) and stolen wakeups (another thread manages to run before the
woken thread). These unexpected wakeups can cause bizarre behaviour in
code that uses condition variables. This patch adds a function that
checks whether a wakeup is a real timeout; if it is not, it retries up to
a defined number of times, after which it crashes.
BUG=640577, 617010
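The patch diff itself is not reproduced in this review thread. A minimal sketch of the retry behavior the description implies, with illustrative names (SleepSRWChecked and kMaxTries are not from the patch), might look like:

#include <windows.h>
#include <cstdlib>

// Sleeps on |cv| while holding |lock| exclusively. Returns true if woken,
// false on a real timeout. Any other failure is retried a bounded number of
// times, then treated as fatal (the "after which it crashes" behavior).
bool SleepSRWChecked(CONDITION_VARIABLE* cv, SRWLOCK* lock, DWORD timeout_ms) {
  const int kMaxTries = 10;  // Illustrative limit; debated later in the review.
  for (int tries = 0; tries < kMaxTries; ++tries) {
    if (SleepConditionVariableSRW(cv, lock, timeout_ms, 0))
      return true;  // Woken, possibly spuriously; the caller rechecks its state.
    if (GetLastError() == ERROR_TIMEOUT)
      return false;  // A real timeout.
    // Unexpected error: fall through and retry the sleep.
  }
  abort();  // Crash after too many unexpected failures.
}

(Note that retrying with the same timeout_ms stretches the total wait; the discussion below considers computing the remaining timeout instead.)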
Patch Set 1
Patch Set 2: build break
Patch Set 3: nits
Patch Set 4: build break
Total comments: 3
Messages
Total messages: 45 (11 generated)
The CQ bit was checked by prashant.n@samsung.com to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
prashant.n@samsung.com changed reviewers: + danakj@chromium.org
For bug "Chrome: Crash Report - [Renderer hang] cc::TaskGraphWorkQueue::HasReadyToRunTasksInNamespace" suspect spurious wakeups of SleepConditionVariableSRW() as per call stack seen in crash. So adding this patch. ptal.
For bug "Chrome: Crash Report - [Renderer hang] cc::TaskGraphWorkQueue::HasReadyToRunTasksInNamespace" suspect spurious wakeups of SleepConditionVariableSRW() as per call stack seen in crash. So adding this patch. ptal.
Description was changed: "after which it crashes" was appended to the sentence describing the retry limit.
The CQ bit was checked by prashant.n@samsung.com to run a CQ dry run
Dry run: CQ is trying the patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
danakj@chromium.org changed reviewers: + robliao@chromium.org
+robliao
On 2016/09/29 19:22:33, danakj wrote:
> +robliao

Two questions:
1) Like most condition variables, the caller has to recheck the condition in any case before proceeding. If there is a spurious wakeup, the caller should recheck and go back to sleep. What is the goal of this retry logic?
2) What's the role of http://crbug.com/617010 for this bug?
On 2016/09/29 19:25:38, robliao wrote:
> Two questions:
> 1) Like most condition variables, the caller has to recheck the condition in
> any case before proceeding. If there is a spurious wakeup, the caller should
> recheck and go back to sleep. What is the goal of this retry logic?

As per the MSDN documentation (https://goo.gl/tPHPkY and https://msdn.microsoft.com/en-us/library/windows/desktop/ms682052(v=vs.85).aspx), we should continue to wait in a while loop until the predicate returns true. Here I loop until the function either succeeds or returns the expected error. Instead of an infinite loop, I put in a predefined number of tries, after which we crash.

> 2) What's the role of http://crbug.com/617010 for this bug?

That bug also had a hang, with a similar call stack to where this hang crash was observed.
On 2016/09/30 00:35:18, prashant.n wrote:
> As per the MSDN documentation (https://goo.gl/tPHPkY and
> https://msdn.microsoft.com/en-us/library/windows/desktop/ms682052(v=vs.85).aspx),
> we should continue to wait in a while loop until the predicate returns true.
> Here I loop until the function either succeeds or returns the expected error.
> Instead of an infinite loop, I put in a predefined number of tries, after
> which we crash.

MSDN is referring to the normal checking that developers already have to do when using condition variables. Generally the pattern is:

[Lock]
while ([Check State Synchronized by Lock]) {
  [Condition Variable Sleep and Wait to be Woken Up]
}

At any wake, legitimate or spurious, you have to recheck the state. It's not clear to me that this retry logic is needed.

Also, by indirecting the CV one level, you potentially introduce some races where the CV is signaled during error retry. In the normal case, if the CV is signaled during a spurious wakeup, there's generally not a problem, as the conditions are rechecked. In the retry case, if the CV is signaled, that signal gets lost during the retry logic and we sleep until timeout or forever.
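For concreteness, the recheck-the-predicate pattern above, written with standard C++ primitives rather than base::ConditionVariable (all names here are illustrative):

#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex g_mu;
std::condition_variable g_cv;
std::queue<int> g_tasks;  // The state synchronized by the lock.

int ConsumeOne() {
  std::unique_lock<std::mutex> lock(g_mu);  // [Lock]
  while (g_tasks.empty())                   // [Check state under the lock]
    g_cv.wait(lock);                        // [Sleep; every wake rechecks]
  int task = g_tasks.front();
  g_tasks.pop();
  return task;
}

A spurious or stolen wake simply re-evaluates the predicate and goes back to sleep; no retry counter is needed.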
> Also, by indirecting the CV one level, you potentially introduce some races
> where the CV is signaled during error retry.
>
> In the normal case, if the CV is signaled during a spurious wakeup, there's
> generally not a problem, as the conditions are rechecked.

But here there is no guarantee of whether the lock is acquired or not. And there is also the case of stolen wakeups, which would occur with Broadcast().

How about having a function like the one below, which tells us whether the lock is acquired after the call to SleepConditionVariableSRW(); if it is not, we continue to sleep until we get the expected return values on retry.

bool Lock::IsAcquired() const {
  return owning_thread_ref_ == PlatformThread::CurrentRef();
}

> In the retry case, if the CV is signaled, that signal gets lost during the
> retry logic and we sleep until timeout or forever.

This scenario would be: suppose there are two threads t1 and t2 using the condition variable.
1. t1 is sleeping on the condition variable and, due to a spurious wakeup, it comes out of the sleep function.
2. t2 signals the condition variable.
3. t1 retries the sleep.

In such a case, checking whether the lock is acquired before retrying would help. If the lock is not acquired and we sleep, that sleep should return immediately, as it can easily acquire the lock.

In the case of non-infinite timeouts, with a delta from Time::Now(), it will time out or succeed. In the case of infinite timeouts, we can keep the retry timeouts small (e.g. 50 ms), so that we only sleep momentarily.
> In such a case, checking whether the lock is acquired before retrying would
> help. If the lock is not acquired and we sleep, that sleep should return
> immediately, as it can easily acquire the lock.

Once lost, the signal is not usable later, as condition variables are stateless.

The situation of spurious and stolen wakeups would not arise if there were one producer and one consumer. When there are multiple consumers, a signal should ideally wake up only one consumer, but on a multiprocessor there can be multiple wakeups, out of which only one succeeds and the others fail. Due to an unexpected wakeup, SleepConditionVariableSRW() may return a different error. IMO, in such a case we can continue to wait again if the expected error is not returned.
IMO, the patch should work provided that SleepConditionVariableSRW() fails on spurious wakeups. If the function does not fail, then we can use the mechanism explained below.

1. Suppose there are three threads t1, t2 and t3 running on three different processor cores. t1 has acquired the lock, and the other two, t2 and t3, are waiting on the condition variable.
2. t1 is done and it signals the condition variable.
3. t2 and t3 both wake up at the same time.
4. Suppose t2 succeeds in acquiring the lock; then the sleep function of t3 fails, and we need to continue to wait for the remaining timeout as if there had been no signalling.

If we don't want to handle this mechanism at the src/base level, we can handle it on the client side, where the condition variable's spurious wakeups are causing problems.

+++++++++++++++++++++++++++++

If the sleep function does not fail, we can have logic something like below:

ConditionVariable::Wait() {
  // Check held and unmark.
  owning_thread_ref_ = PlatformThreadRef();  // Lock owning ref is null.

  while (1) {
    SleepConditionVariableSRW(timeout);

    // Only we acquired the lock.
    if (owning_thread_ref_ == PlatformThreadRef()) {
      // Check unheld and mark.
      owning_thread_ref_ = PlatformThread::CurrentRef();  // Current thread is the owner.
      break;
    }

    // Compute the remaining timeout.
  }
}
On 2016/09/30 11:34:37, prashant.n wrote:
> How about having a function like the one below, which tells us whether the
> lock is acquired after the call to SleepConditionVariableSRW(); if it is not,
> we continue to sleep until we get the expected return values on retry. [...]

SleepConditionVariableSRW guarantees the thread owns the lock regardless of return value. The last thing it does is attempt to reacquire the lock, on success or failure.

> In such a case, checking whether the lock is acquired before retrying would
> help. [...] Once lost, the signal is not usable later, as condition variables
> are stateless.

This is indistinguishable from a normal timeout, which suggests that there is a race condition in the code. t2 should only signal the condition variable once it has updated the requisite state to allow t1 to move forward. t1 should only check that state while holding the lock.

> The situation of spurious and stolen wakeups would not arise if there were
> one producer and one consumer. [...] Due to an unexpected wakeup,
> SleepConditionVariableSRW() may return a different error. IMO, in such a case
> we can continue to wait again if the expected error is not returned.

Designs that have multiple consumers need to account for this by rechecking their own state before proceeding.

> IMO, the patch should work provided that SleepConditionVariableSRW() fails on
> spurious wakeups. [...] If we don't want to handle this mechanism at the
> src/base level, we can handle it on the client side, where the condition
> variable's spurious wakeups are causing problems.

There are two main scenarios to consider when dealing with the CV:
1) A CV sleep with a timeout - a spurious wakeup is indistinguishable from a timeout.
2) A CV sleep with a timeout of infinite - a spurious wakeup is indistinguishable from a real wakeup.

In both cases, the code using the CV is required to check its state before proceeding or sleeping, on success or failure. The retry code doesn't help in either case.

Taking a step back, what evidence is there that spurious wakeups are actually a problem?
There is nothing windows-specific about spuriously waking up from wait either. pthreads says this: "When using condition variables there is always a Boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return." [1]

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/pthread_cond_timedwa...

We probably have code that is waiting and failing to check anything. I think I can name a couple of places like this, actually, and I guess they are wrong.
> There are two main scenarios to consider when dealing with the CV:
> 1) A CV sleep with a timeout - a spurious wakeup is indistinguishable from a
> timeout.
> 2) A CV sleep with a timeout of infinite - a spurious wakeup is
> indistinguishable from a real wakeup.
>
> In both cases, the code using the CV is required to check its state before
> proceeding or sleeping, on success or failure. The retry code doesn't help in
> either case.

In the case of a spurious wakeup, is acquiring the lock after the signal is received guaranteed? A spurious wakeup means a single signal acted on by two consumer threads simultaneously. In that case only one can acquire the lock; the other got signalled but may or may not acquire the lock.

> Taking a step back, what evidence is there that spurious wakeups are actually
> a problem?

The way the condition variable is used in the categorized worker pool is correct, and the same mechanism works perfectly fine on Linux. But on a few Windows machines it crashes, and the call stack shows calls from Wait() and the function to be called after Wait(). This is just a suspect; the root cause may be different.
On Fri, Sep 30, 2016 at 11:12 AM, <danakj@chromium.org> wrote:
> There is nothing windows-specific about spuriously waking up from wait
> either. [...] We probably have code that is waiting and failing to check
> anything. I think I can name a couple of places like this, actually, and I
> guess they are wrong.

Oh, I'm thinking of base::WaitableEvent, but you're talking about ConditionVariable. Maybe you want to be using WaitableEvent?
> In the case of a spurious wakeup, is acquiring the lock after the signal is
> received guaranteed? A spurious wakeup means a single signal acted on by two
> consumer threads simultaneously. In that case only one can acquire the lock;
> the other got signalled but may or may not acquire the lock.

Yes. It is an error not to reacquire the lock, because the thread presumes it had the lock before. It is a necessary postcondition in all cases, success or failure, to reacquire the lock. The code on Windows and on POSIX does this today.

> The way the condition variable is used in the categorized worker pool is
> correct, and the same mechanism works perfectly fine on Linux. [...] This is
> just a suspect; the root cause may be different.

If you're referring to https://crbug.com/617010 for the crashes, those aren't actually crashes. Those are hangs reported by the hang detector. It is known that it's misclassifying these threads as hangs. Incidentally, that bug is marked as fixed.

> Oh, I'm thinking of base::WaitableEvent, but you're talking about
> ConditionVariable. Maybe you want to be using WaitableEvent?

Agreed. If you don't need external state to proceed and simply need a wakeup call, WaitableEvent is the one to use. CategorizedWorkerPool::Run does the correct thing and rechecks its state on CV wakes.
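For reference, a minimal sketch of the WaitableEvent alternative mentioned above. The constructor shown is the enum-based form; the exact base API has varied across Chromium revisions, so treat this as illustrative:

#include "base/synchronization/waitable_event.h"

// Unlike a condition variable, the signaled state is latched by the event,
// so a Signal() that happens before Wait() is not lost, and there is no
// external predicate to recheck.
base::WaitableEvent g_wake(base::WaitableEvent::ResetPolicy::AUTOMATIC,
                           base::WaitableEvent::InitialState::NOT_SIGNALED);

void WaitForWork() {
  g_wake.Wait();  // Returns once signaled; auto-reset rearms the event.
}

void NotifyWork() {
  g_wake.Signal();
}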
> Yes. It is an error not to reacquire the lock, because the thread presumes it
> had the lock before. It is a necessary postcondition in all cases, success or
> failure, to reacquire the lock. The code on Windows and on POSIX does this
> today.

So my query is: can the sleep function return with some error if it fails to re-acquire the lock? But in the case of Broadcast(), all the consumers waiting on the condition variable are signalled in one shot, and they acquire the lock one by one after the first one releases it. So the above case is highly unlikely.
prashant.n@samsung.com changed reviewers: + reveman@chromium.org, vmpstr@chromium.org
+reveman@ and vmpstr@
My conclusion on this is: we cannot have spurious wakeups (we don't see them in the POSIX implementation). Bug 640577 also looks to be a hang, not a crash, similar to 617010. I'll try to get some more info on bug 640577, regarding whether this crash is observed when the user sees the "Kill/Wait" view.
A while loop for this on Windows seems like the appropriate fix based on the linked MSDN documentation. Returning without the CV being signaled is clearly an incorrect implementation of base::ConditionVariable as currently documented. Is there a way we can test this in a unit test?

https://codereview.chromium.org/2381893002/diff/60001/base/synchronization/co...
File base/synchronization/condition_variable_win.cc (right):

https://codereview.chromium.org/2381893002/diff/60001/base/synchronization/co...
base/synchronization/condition_variable_win.cc:20: static const int kMaxNumberOfTriesAfterFailure = 10;
do we need to limit the max number of retries? what if 10 retries is needed for a valid use case?
> Is there a way we can test this in a unit test?

I'm not sure we can generate the scenario of spurious wakeups in a test environment, but if we enable MultiThreadConsumerTest on Windows, which has been disabled due to its flaky behavior (http://crbug.com/10607), it may test this patch.
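Not the actual base MultiThreadConsumerTest, but a standard-C++ analogue of the multi-consumer scenario it exercises (all names illustrative): several consumers share one condition variable, and correctness relies only on the predicate recheck, never on counting wakeups:

#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

int main() {
  std::mutex mu;
  std::condition_variable cv;
  std::queue<int> work;
  int consumed = 0;

  auto consumer = [&] {
    for (;;) {
      std::unique_lock<std::mutex> lock(mu);
      while (work.empty())
        cv.wait(lock);  // Robust to spurious wakeups: predicate rechecked.
      int item = work.front();
      work.pop();
      if (item == -1)  // Sentinel: this consumer is done.
        return;
      ++consumed;  // Guarded by |mu|.
    }
  };

  std::vector<std::thread> consumers;
  for (int i = 0; i < 4; ++i)
    consumers.emplace_back(consumer);

  {
    std::lock_guard<std::mutex> lock(mu);
    for (int i = 0; i < 100; ++i)
      work.push(i);
    for (int i = 0; i < 4; ++i)
      work.push(-1);  // One sentinel per consumer.
  }
  cv.notify_all();

  for (auto& t : consumers)
    t.join();
  assert(consumed == 100);
  return 0;
}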
https://codereview.chromium.org/2381893002/diff/60001/base/synchronization/co...
File base/synchronization/condition_variable_win.cc (right):

https://codereview.chromium.org/2381893002/diff/60001/base/synchronization/co...
base/synchronization/condition_variable_win.cc:20: static const int kMaxNumberOfTriesAfterFailure = 10;
On 2016/10/03 10:58:07, reveman wrote:
> do we need to limit the max number of retries? what if 10 retries is needed
> for a valid use case?

Hmm. Actually the MSDN documentation calls for a forever while loop, with no limits. I thought of putting in some limit so that we can avoid looping forever on erroneous conditions. Maybe we can decide on a proper limit.

In the case of CategorizedWorkerPool, a while loop without limits is okay, but for other users of the condition variable it may cause forever waits. (Not confirmed in code, though.)
On 2016/10/03 11:39:50, prashant.n wrote:
> > Is there a way we can test this in a unit test?
>
> I'm not sure we can generate the scenario of spurious wakeups in a test
> environment, but if we enable MultiThreadConsumerTest on Windows, which has
> been disabled due to its flaky behavior (http://crbug.com/10607), it may test
> this patch.

https://codereview.chromium.org/2388083002 is added to check MultiThreadConsumerTest on Windows with this patch.
https://codereview.chromium.org/2381893002/diff/60001/base/synchronization/co...
File base/synchronization/condition_variable_win.cc (right):

https://codereview.chromium.org/2381893002/diff/60001/base/synchronization/co...
base/synchronization/condition_variable_win.cc:20: static const int kMaxNumberOfTriesAfterFailure = 10;
On 2016/10/03 at 11:40:14, prashant.n wrote:
> Hmm. Actually the MSDN documentation calls for a forever while loop, with no
> limits. I thought of putting in some limit so that we can avoid looping
> forever on erroneous conditions. Maybe we can decide on a proper limit. [...]

The situation is very similar to how we handle interrupts on posix platforms. In that case we loop forever in debug builds and make 100 attempts in release builds: https://cs.chromium.org/chromium/src/base/posix/eintr_wrapper.h?rcl=0&l=37

maybe do the same here?
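A paraphrase of the pattern being pointed at, written as a function template rather than the real HANDLE_EINTR macro; it follows the description above (loop forever in debug, cap attempts in release), and the actual header may differ in detail:

#include <cerrno>

template <typename Call>
auto RetryOnEintr(Call call) -> decltype(call()) {
  auto result = call();
#if defined(NDEBUG)
  // Release: give up after a bounded number of interrupted attempts.
  for (int attempts = 0; result == -1 && errno == EINTR && attempts < 100;
       ++attempts) {
    result = call();
  }
#else
  // Debug: keep retrying for as long as the call is interrupted.
  while (result == -1 && errno == EINTR)
    result = call();
#endif
  return result;
}

Used as, e.g., RetryOnEintr([&] { return read(fd, buf, len); }).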
On 2016/10/03 13:23:40, reveman wrote:
> The situation is very similar to how we handle interrupts on posix platforms.
> In that case we loop forever in debug builds and make 100 attempts in release
> builds:
> https://cs.chromium.org/chromium/src/base/posix/eintr_wrapper.h?rcl=0&l=37
>
> maybe do the same here?

Ok, I'll make this change. One query: as condition variables on any platform are subject to spurious wakeups, should we make this change for all platforms?

danakj@ and robliao@, what is your opinion of the patch?
On 2016/10/03 13:23:40, reveman wrote:
> The situation is very similar to how we handle interrupts on posix platforms.
> [...] maybe do the same here?

As indicated before, clients will naturally have a while loop rechecking their condition when using a condition variable. There isn't a need for special handling of spurious wakeups. Spurious wakeups are indistinguishable from timeouts as well.
On 2016/10/03 17:05:08, robliao wrote:
> As indicated before, clients will naturally have a while loop rechecking
> their condition when using a condition variable. There isn't a need for
> special handling of spurious wakeups. Spurious wakeups are indistinguishable
> from timeouts as well.

In CWP (CategorizedWorkerPool), we do a few more operations while checking state and then go to sleep. Those operations assume the lock is acquired; if it is not, we are bound to see crashes.

while (true) {
  if (!RunTaskWithLockAcquired(categories)) {
    // We are no longer running tasks, which may allow another category to
    // start running. Signal other worker threads.
    SignalHasReadyToRunTasksWithLockAcquired();

    // Exit when shutdown is set and no more tasks are pending.
    if (shutdown_)
      break;

    // Wait for more tasks.
    has_ready_to_run_tasks_cv->Wait();
    continue;
  }
}

I thought the loop check as per this patch would have been better here. If not, then we cannot have a fix. I'm unsure whether we can fix the bug at the moment, as we may be leaving it as a hang at some place other than the reported call stack.
On 2016/10/03 at 17:05:08, robliao wrote:
> As indicated before, clients will naturally have a while loop rechecking
> their condition when using a condition variable. There isn't a need for
> special handling of spurious wakeups. Spurious wakeups are indistinguishable
> from timeouts as well.

Ah, it's documented here: https://cs.chromium.org/chromium/src/base/synchronization/condition_variable....

I think it's perfectly fine to leave this up to the caller, but it would be nice to also document it here: https://cs.chromium.org/chromium/src/base/synchronization/condition_variable....

as currently, if you read the comment next to the function declaration, it sounds like Wait() will return only if the CV was signaled.
On 2016/10/03 19:46:17, reveman wrote:
> Ah, it's documented here:
> https://cs.chromium.org/chromium/src/base/synchronization/condition_variable....
>
> I think it's perfectly fine to leave this up to the caller

Hmm. Then should we patch it up at the call site in CWP?

> but it would be nice to also document it here:
> https://cs.chromium.org/chromium/src/base/synchronization/condition_variable....
>
> as currently, if you read the comment next to the function declaration, it
> sounds like Wait() will return only if the CV was signaled.

I'll modify the comment, so that callers can be cautious.
On 2016/10/04 01:38:41, prashant.n wrote:
> Hmm. Then should we patch it up at the call site in CWP?

I checked the code of CWP once again. Even if a spurious wakeup happens on the condition variable, the lock cannot be acquired, so that particular thread will continue to wait to acquire the lock. Once it acquires the lock, it will work fine, as the other functions in the while loop (comment #37) work as expected with the lock acquired.

So we'll conclude that we don't need to do anything.
On 2016/10/04 07:26:23, prashant.n wrote:
> I checked the code of CWP once again. Even if a spurious wakeup happens on
> the condition variable, the lock cannot be acquired, so that particular
> thread will continue to wait to acquire the lock. Once it acquires the lock,
> it will work fine, as the other functions in the while loop (comment #37)
> work as expected with the lock acquired.
>
> So we'll conclude that we don't need to do anything.

Here I assume SleepConditionVariableSRW() will never return an error other than ERROR_TIMEOUT.
On 2016/10/04 13:42:05, prashant.n wrote:
> Here I assume SleepConditionVariableSRW() will never return an error other
> than ERROR_TIMEOUT.

CWP needs to be robust to spurious wakeups. If CWP can't make forward progress due to deadlocking, that suggests a bug.
On 2016/10/05 00:21:18, robliao wrote:
> CWP needs to be robust to spurious wakeups. If CWP can't make forward
> progress due to deadlocking, that suggests a bug.

CWP handles this situation and will not deadlock, assuming SleepConditionVariableSRW() provides atomic operations.

On waiting: SleepConditionVariableSRW() = Unlock + Start Waiting.
On signalling: it should function as Stop Waiting + Lock.

Suppose there are two threads t1 and t2. t1 is about to wait on the CV and t2 has already been waiting on the same CV. If, due to a spurious/stolen wakeup, both get woken up from the CV, only one can succeed in acquiring the lock; the other will continue to wait to acquire it.
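A minimal Win32 sketch of the semantics just described, using only documented Win32 calls (the producer/consumer framing is illustrative):

#include <windows.h>

CONDITION_VARIABLE g_cv = CONDITION_VARIABLE_INIT;
SRWLOCK g_lock = SRWLOCK_INIT;
bool g_has_work = false;  // The predicate, guarded by g_lock.

void Consumer() {
  AcquireSRWLockExclusive(&g_lock);
  // SleepConditionVariableSRW atomically releases the lock and starts
  // waiting; whenever it returns, the thread holds the lock again, so the
  // predicate recheck below is always safe.
  while (!g_has_work)
    SleepConditionVariableSRW(&g_cv, &g_lock, INFINITE, 0);
  g_has_work = false;
  ReleaseSRWLockExclusive(&g_lock);
}

void Producer() {
  AcquireSRWLockExclusive(&g_lock);
  g_has_work = true;  // Update state before signalling.
  ReleaseSRWLockExclusive(&g_lock);
  WakeConditionVariable(&g_cv);
}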
On 2016/10/05 07:01:27, prashant.n wrote:
> CWP handles this situation and will not deadlock, assuming
> SleepConditionVariableSRW() provides atomic operations. [...]
>
> Suppose there are two threads t1 and t2. t1 is about to wait on the CV and
> t2 has already been waiting on the same CV. If, due to a spurious/stolen
> wakeup, both get woken up from the CV, only one can succeed in acquiring the
> lock; the other will continue to wait to acquire it.

Are we all agreed that we do not need this change?
Message was sent while issue was closed.
On 2016/10/07 22:33:53, robliao wrote:
> Are we all agreed that we do not need this change?

Closing this patch for now. Feel free to reopen if the need presents itself.