|
|
Created:
6 years, 5 months ago by Sami Modified:
6 years, 5 months ago CC:
chromium-reviews, dsinclair+watch_chromium.org, erikwright+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Project:
chromium Visibility:
Public. |
DescriptionFix data race with synthetic delay initialization
The synthetic delay macro initializes a local atomic variable to point
to a delay implementation class. Because initializing the class involves
writing data to memory which is subsequently deferences through the
atomic pointer, we need to use memory barries when updating the pointer
to make sure the data behind it is committed.
Thanks to Alexander Potapenko for spotting this.
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=284203
Patch Set 1 #
Total comments: 2
Patch Set 2 : Review comments. #Patch Set 3 : Rebased. #Patch Set 4 : Just the fix. #
Messages
Total messages: 17 (0 generated)
lgtm
LGTM https://codereview.chromium.org/392933002/diff/1/base/debug/trace_event_synth... File base/debug/trace_event_synthetic_delay_unittest.cc (right): https://codereview.chromium.org/392933002/diff/1/base/debug/trace_event_synth... base/debug/trace_event_synthetic_delay_unittest.cc:172: // synthetic delays so that dynamic analyzers like TSAN can spot any That's 'TSan', not 'TSAN' :) Have you actually tried running it on this test? https://codereview.chromium.org/392933002/diff/1/base/debug/trace_event_synth... base/debug/trace_event_synthetic_delay_unittest.cc:177: threads[i].Start(); I think there should be two-space indentation here (and in the loop below)
On 2014/07/15 14:30:15, Alexander Potapenko wrote: > LGTM > > https://codereview.chromium.org/392933002/diff/1/base/debug/trace_event_synth... > File base/debug/trace_event_synthetic_delay_unittest.cc (right): > > https://codereview.chromium.org/392933002/diff/1/base/debug/trace_event_synth... > base/debug/trace_event_synthetic_delay_unittest.cc:172: // synthetic delays so > that dynamic analyzers like TSAN can spot any > That's 'TSan', not 'TSAN' :) Fixed. > Have you actually tried running it on this test? Not yet -- I was going to use the tsan trybot but looks like I forgot to tick the box :P > https://codereview.chromium.org/392933002/diff/1/base/debug/trace_event_synth... > base/debug/trace_event_synthetic_delay_unittest.cc:177: threads[i].Start(); > I think there should be two-space indentation here (and in the loop below) D'oh, done.
I've tried your test under TSan with barriers removed, and it didn't report any data races. This makes me think the test doesn't expose the race condition in the code.
On 2014/07/15 15:51:08, Alexander Potapenko wrote: > I've tried your test under TSan with barriers removed, and it didn't report any > data races. > This makes me think the test doesn't expose the race condition in the code. Hmm, I wonder why. My understanding is that TSan can have false negatives, but mostly in fairly rare corner cases.
On 2014/07/15 15:51:08, Alexander Potapenko wrote: > I've tried your test under TSan with barriers removed, and it didn't report any > data races. > This makes me think the test doesn't expose the race condition in the code. Or maybe I was overcautious about the barriers, but I still think they're necessary here. Perhaps the accesses in TRACE_EVENT_SYNTHETIC_DELAY(kThreadedDelayName) and TraceEventSyntheticDelay::Lookup are getting synchronized by the Acquire_Load/Release_Store pair on |delay_count_|.
I've also noticed the ANNOTATE_BENIGN_RACE macros in trace_event_synthetic_delay.cc - did you copy them from base/debug/trace_event.h?
On 2014/07/15 16:05:12, Alexander Potapenko wrote: > On 2014/07/15 15:51:08, Alexander Potapenko wrote: > > I've tried your test under TSan with barriers removed, and it didn't report > any > > data races. > > This makes me think the test doesn't expose the race condition in the code. > > Or maybe I was overcautious about the barriers, but I still think they're > necessary here. > Perhaps the accesses in TRACE_EVENT_SYNTHETIC_DELAY(kThreadedDelayName) and > TraceEventSyntheticDelay::Lookup are getting synchronized by the > Acquire_Load/Release_Store pair on |delay_count_|. I've the following scenario in my mind: - |storage| is initially zero - first thread calls GetOrCreateDelay(kName, &storage) - second thread calls GetOrCreateDelay(kName, &storage)->CalculateEndTimeLocked() (or any other method of TraceEventSyntheticDelay) Could this be possible in the real life? This is actually the situation when the TraceEventSyntheticDelay initialization can be reordered with the write to |storage| and the subsequent accesses to that objects from other threads can see wrong data.
On 2014/07/15 16:30:59, Alexander Potapenko wrote: > I've the following scenario in my mind: > - |storage| is initially zero > - first thread calls GetOrCreateDelay(kName, &storage) > - second thread calls GetOrCreateDelay(kName, > &storage)->CalculateEndTimeLocked() (or any other method of > TraceEventSyntheticDelay) > > Could this be possible in the real life? > This is actually the situation when the TraceEventSyntheticDelay initialization > can be reordered with the write to |storage| and the subsequent accesses to that > objects from other threads can see wrong data. I played around with this in tsan a bit and it looks like the ANNOTATE_BENIGN_RACE in BeginParallel() is somehow hiding the race. If I remove the !target_duration_.ToInternalValue() check, then TSan reports the race in the NoBarrier variant but is happy with the code from this patch. I'm not sure if that's a bug in TSan or my code. In any case, I think this patch is still correct, but it's very tricky to write a test for it.
On 2014/07/15 18:22:36, Sami wrote: > On 2014/07/15 16:30:59, Alexander Potapenko wrote: > > I've the following scenario in my mind: > > - |storage| is initially zero > > - first thread calls GetOrCreateDelay(kName, &storage) > > - second thread calls GetOrCreateDelay(kName, > > &storage)->CalculateEndTimeLocked() (or any other method of > > TraceEventSyntheticDelay) > > > > Could this be possible in the real life? > > This is actually the situation when the TraceEventSyntheticDelay > initialization > > can be reordered with the write to |storage| and the subsequent accesses to > that > > objects from other threads can see wrong data. > > I played around with this in tsan a bit and it looks like the > ANNOTATE_BENIGN_RACE in BeginParallel() is somehow hiding the race. If I remove > the !target_duration_.ToInternalValue() check, then TSan reports the race in the > NoBarrier variant but is happy with the code from this patch. I'm not sure if > that's a bug in TSan or my code. For me the bug appears in both cases. Looking into that. > In any case, I think this patch is still correct, but it's very tricky to write > a test for it. Agreed. If your patch fixes this particular test we can consider it a regtest.
By the way have you noticed the lock inversion reported by the TSan trybot: http://build.chromium.org/p/tryserver.chromium/builders/linux_clang_tsan/buil... ?
> > I played around with this in tsan a bit and it looks like the > > ANNOTATE_BENIGN_RACE in BeginParallel() is somehow hiding the race. If I > remove > > the !target_duration_.ToInternalValue() check, then TSan reports the race in > the > > NoBarrier variant but is happy with the code from this patch. I'm not sure if > > that's a bug in TSan or my code. > For me the bug appears in both cases. Looking into that. Double-checked: when I apply your patch and remove the ANNOTATE_BENIGN_RACE lines the race is still reported. This is actually expected, as accesses to target_duration_ aren't synchronized (i.e. there's a race in the test). How about landing just the trace_event_synthetic_delay.cc part of the patch?
Sorry for the delay in getting back to this... On 2014/07/16 12:21:03, Alexander Potapenko wrote: > By the way have you noticed the lock inversion reported by the TSan trybot: > http://build.chromium.org/p/tryserver.chromium/builders/linux_clang_tsan/buil... Hmm, no I hadn't. Filed crbug.com/395156. > Double-checked: when I apply your patch and remove the ANNOTATE_BENIGN_RACE > lines the race is still reported. > This is actually expected, as accesses to target_duration_ aren't synchronized > (i.e. there's a race in the test). Weird, here's what happens for me: 1. trace_event_synthetic_delay.cc + the new test: No errors from TSan. 2. Just the new test: No errors from TSan. > How about landing just the trace_event_synthetic_delay.cc part of the patch? Ok, I think we can agree at least that part is correct :)
The CQ bit was checked by skyostil@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/skyostil@chromium.org/392933002/60001
Message was sent while issue was closed.
Change committed as 284203 |