|
|
Chromium Code Reviews|
Created:
4 years, 1 month ago by Primiano Tucci (use gerrit) Modified:
4 years, 1 month ago CC:
chromium-reviews, tracing+reviews_chromium.org, wfh+watch_chromium.org, Lei Zhang Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Descriptiontracing: fix races in CategoryRegistry
The previous CL crrev.com/2452063003 did introduce a subtle race and
a more trivial test flakiness.
1) race in trace_log.cc: this is a classical time-of-check vs
time-of-use from school books. The newly introduced
GetOrCreateCategoryByName() API was badly designed. If two threads
hit that concurrently only one of the two would see a true retval and initialize
the category (good) but the other one will carry on and use the uninitialized
category before the first one has completed the initialization from the
TraceLog (bad).
This was a consequence of trying to separate the locks between TraceLog
and CategoryRegistry. The new design is simpler and more aligned with
the old behavior. TraceLog is the only one handling a lock and
CategoryRegistry expects the caller to serialize calls, exposing an
explicit GetOrCreateCategoryLocked method.
2) flakiness in trace_category_unittest.cc: categories cannot be
reset once they are created, even in tests. This is because
the static pointers injected by the TRACE_EVENT macros cannot
be reset once they have been initialized so once hit a category
pointer must be stable for the entire lifetime of the process (this
behavior predates crrev.com/2452063003 and isn't a recent change).
Concretely this is causing an interference between the two new
test fixtures recently introduced in crrev.com/2452063003 if they are
ran in reverse order. Fixed by changing the prefix of the category
names so they don't collide.
BUG=659689
Committed: https://crrev.com/5a5b09bbda9a7399e602b4cc01e0c56bb187e651
Cr-Commit-Position: refs/heads/master@{#433601}
Patch Set 1 #Patch Set 2 : . #
Total comments: 2
Dependent Patchsets: Messages
Total messages: 23 (15 generated)
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== tracing: fix races in tests The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from playbooks. The GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls to GetOrCreateCategoryLocked. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once created a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 ========== to ========== tracing: fix races in CategoryRegistry tests The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from school books. The newly introduced GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed the initialization from the TraceLog (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls, exposing an explicit GetOrCreateCategoryLocked method. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once hit a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 ==========
The CQ bit was checked by primiano@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
primiano@chromium.org changed reviewers: + oysteine@chromium.org
Small fix. long flights are great to debug races.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Adding some comment to explain the situation better https://codereview.chromium.org/2495173002/diff/20001/base/trace_event/trace_... File base/trace_event/trace_log.cc (left): https://codereview.chromium.org/2495173002/diff/20001/base/trace_event/trace_... base/trace_event/trace_log.cc:498: CategoryRegistry::GetOrCreateCategoryByName(category_group, &category); Essentially this was the core of the problem. The original code, before my previous CL, was quite similar to the latest patchset. Then, in my previous CL, I modified as follows: if (category_NOT_seen_before) { 1. Lock 2. Create Category 3. Unlock 4. Update category flags looking at trace config } This opened a race window such that if two threads hit this code at the same time: - They will only create one category entry, as the creation is locked (good) - The second thread might use the category before the first thread as hit 4 (bad) So, in practice, what can happen is that: two threads hit TRACE_EVENT0("something-enabled-by-default"). 1. Thread 1: adds "something-enabled-by-default" to the registry 2. Thread 2: sees that "something-enabled-by-default" is already in the registry and uses that. However since T1 didn't initialize the flags yet, T2 will assume that the category is not enabled and skip and not add its event to the log. 3. T1: does the initialization: queries TraceConfig, figures out that "something-enabled-by-default" is enabled, and updates the flag of the category. Now, this is not really a drama. It means that if two thread hit a new TRACE_EVENT with the same category at the same time, one of the two might be not recorded. However, there are unitttests that depend on this behavior. Also this behavior was guaranteed by the previous design, before my previous CL, so it's seems reasonable to re-establish. As an unrelated note, this is yet another reason to kill the concept of "dynamic" categories in the future and only have statically defined categories.
Description was changed from ========== tracing: fix races in CategoryRegistry tests The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from school books. The newly introduced GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed the initialization from the TraceLog (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls, exposing an explicit GetOrCreateCategoryLocked method. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once hit a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 ========== to ========== tracing: fix races in CategoryRegistry The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from school books. The newly introduced GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed the initialization from the TraceLog (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls, exposing an explicit GetOrCreateCategoryLocked method. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once hit a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 ==========
primiano@chromium.org changed reviewers: + simonhatch@chromium.org
Simon, can I ask you to review this? Oystein went on holidays and this is quite important as is causing races on TSan bots (See crbug.com/666592) I left some inline comments in #10 to explain what's going on. Thanks
On 2016/11/21 13:54:25, Primiano - slow(travelling) wrote: > Simon, can I ask you to review this? Oystein went on holidays and this is quite > important as is causing races on TSan bots (See crbug.com/666592) > > I left some inline comments in #10 to explain what's going on. > > Thanks Thanks for the detailed description/comments, really made this easy to follow. lgtm
The CQ bit was checked by primiano@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
thestig@chromium.org changed reviewers: + thestig@chromium.org
https://codereview.chromium.org/2495173002/diff/20001/base/trace_event/catego... File base/trace_event/category_registry.h (right): https://codereview.chromium.org/2495173002/diff/20001/base/trace_event/catego... base/trace_event/category_registry.h:21: // Allows fast and thread-safe acces to the state of all tracing categories. (Maybe later since it's already in CQ) typo: access
CQ is committing da patch.
Bot data: {"patchset_id": 20001, "attempt_start_ts": 1479750894688620,
"parent_rev": "0bf2a7127c4158cdd85140fd95f2722d850b15ae", "commit_rev":
"3afc1a698da967790215bf0da80e1ab56948f119"}
Message was sent while issue was closed.
Description was changed from ========== tracing: fix races in CategoryRegistry The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from school books. The newly introduced GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed the initialization from the TraceLog (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls, exposing an explicit GetOrCreateCategoryLocked method. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once hit a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 ========== to ========== tracing: fix races in CategoryRegistry The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from school books. The newly introduced GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed the initialization from the TraceLog (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls, exposing an explicit GetOrCreateCategoryLocked method. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once hit a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 ==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001)
Message was sent while issue was closed.
Description was changed from ========== tracing: fix races in CategoryRegistry The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from school books. The newly introduced GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed the initialization from the TraceLog (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls, exposing an explicit GetOrCreateCategoryLocked method. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once hit a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 ========== to ========== tracing: fix races in CategoryRegistry The previous CL crrev.com/2452063003 did introduce a subtle race and a more trivial test flakiness. 1) race in trace_log.cc: this is a classical time-of-check vs time-of-use from school books. The newly introduced GetOrCreateCategoryByName() API was badly designed. If two threads hit that concurrently only one of the two would see a true retval and initialize the category (good) but the other one will carry on and use the uninitialized category before the first one has completed the initialization from the TraceLog (bad). This was a consequence of trying to separate the locks between TraceLog and CategoryRegistry. The new design is simpler and more aligned with the old behavior. TraceLog is the only one handling a lock and CategoryRegistry expects the caller to serialize calls, exposing an explicit GetOrCreateCategoryLocked method. 2) flakiness in trace_category_unittest.cc: categories cannot be reset once they are created, even in tests. This is because the static pointers injected by the TRACE_EVENT macros cannot be reset once they have been initialized so once hit a category pointer must be stable for the entire lifetime of the process (this behavior predates crrev.com/2452063003 and isn't a recent change). Concretely this is causing an interference between the two new test fixtures recently introduced in crrev.com/2452063003 if they are ran in reverse order. Fixed by changing the prefix of the category names so they don't collide. BUG=659689 Committed: https://crrev.com/5a5b09bbda9a7399e602b4cc01e0c56bb187e651 Cr-Commit-Position: refs/heads/master@{#433601} ==========
Message was sent while issue was closed.
Patchset 2 (id:??) landed as https://crrev.com/5a5b09bbda9a7399e602b4cc01e0c56bb187e651 Cr-Commit-Position: refs/heads/master@{#433601} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
