|
|
Created:
4 years, 11 months ago by Anand Mistry (off Chromium) Modified:
4 years, 10 months ago Reviewers:
kinuko CC:
chromium-reviews, darin-cc_chromium.org, jam, chrome-apps-syd-reviews_chromium.org, gab Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionFix races in BrowserThreadImpl, and some related updates.
This change fixes two different races:
1. On destruction of BrowserThreadImpl, the thread is removed from the global
thread table after it is stopped. The process of stopping the thread destroys
the thread's message loop. However, PostTaskHelper accesses the message loop,
which races with the stopping thread. Reordering the two operations is not
sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread)
stopping the thread.
2. There's a missing lock in GetCurrentThreadIdentifier when accessing the
global thread table. To track whether or not this causes lock contention, a
ScopedTracker has been added.
The two other changes are:
1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the
relevant objects are leaky.
2. Remove impossible cases in BrowserThreadImpl::Run().
BUG=115540
Committed: https://crrev.com/83e3480a731d95a97ccc1eacd7b3356ded42b0d4
Cr-Commit-Position: refs/heads/master@{#370560}
Patch Set 1 #Patch Set 2 : V2 #Patch Set 3 : fix #Patch Set 4 : For review #
Total comments: 2
Patch Set 5 : Review comment. #
Total comments: 5
Patch Set 6 : Remove useless code. #Patch Set 7 : Add comment. #Patch Set 8 : Fix shutdown race. #Patch Set 9 : Fix tests failures. #
Total comments: 4
Patch Set 10 : Revert to not using thread-local. #Patch Set 11 : Rebase #
Messages
Total messages: 33 (8 generated)
Description was changed from ========== Fix race in BrowserThread::GetCurrentThreadIdentifier. BUG=115540 ========== to ========== Fix a race in BrowserThreadImpl, and some related updates. Due to an "avoid locking" optimisation, it's possible for GetCurrentThreadIdentifier to cause a nullptr dereference. To fix this, store a pointer to the current thread's BrowserThreadImpl in a thread-local. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 ==========
amistry@chromium.org changed reviewers: + kinuko@chromium.org
WDYT?
The bug itself looks to be old / have existed before my change? But ok I see GetCurrentThreadIdentifier() doesn't lock when it looks into g_globals..
I think this is looking good. One nit + I'll take one more look https://codereview.chromium.org/1564193002/diff/60001/content/browser/browser... File content/browser/browser_thread_impl.cc (right): https://codereview.chromium.org/1564193002/diff/60001/content/browser/browser... content/browser/browser_thread_impl.cc:262: default: Let's avoid having default here, which makes it harder to catch changes made on the enum. Instead we could just have NOTREACHED (or CHECK(false)) after this switch/case.
Honestly, I kinda picked you as a reviewer at random. Feel free to add another reviewer if you think this needs a second look. https://codereview.chromium.org/1564193002/diff/60001/content/browser/browser... File content/browser/browser_thread_impl.cc (right): https://codereview.chromium.org/1564193002/diff/60001/content/browser/browser... content/browser/browser_thread_impl.cc:262: default: On 2016/01/08 08:43:32, kinuko wrote: > Let's avoid having default here, which makes it harder to catch changes made on > the enum. Instead we could just have NOTREACHED (or CHECK(false)) after this > switch/case. Done.
A few more questions... https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... File content/browser/browser_thread_impl.cc (right): https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... content/browser/browser_thread_impl.cc:311: if (g_browser_thread.Get().Get() == this) Is this condition for tests? https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... content/browser/browser_thread_impl.cc:513: return false; Do we still need to check this?
Also: can you update the description to note that this used to be racy before the optimization to make it clearer why we have the BUG line?
On 2016/01/08 14:58:00, kinuko wrote: > Also: can you update the description to note that this used to be racy before > the optimization to make it clearer why we have the BUG line? I don't understand. The race is because of the optimisation. I've fixed the race. The alternative is to add a lock around the loop in GetCurrentThreadIdentifier(), and partially throw out the optimisation.
https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... File content/browser/browser_thread_impl.cc (right): https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... content/browser/browser_thread_impl.cc:311: if (g_browser_thread.Get().Get() == this) On 2016/01/08 11:35:50, kinuko wrote: > Is this condition for tests? This is for the main/UI thread, where a BrowserThreadImpl object is created for the current thread. https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... content/browser/browser_thread_impl.cc:513: return false; On 2016/01/08 11:35:50, kinuko wrote: > Do we still need to check this? Nope. Removed.
On 2016/01/10 23:18:22, Anand Mistry wrote: > On 2016/01/08 14:58:00, kinuko wrote: > > Also: can you update the description to note that this used to be racy before > > the optimization to make it clearer why we have the BUG line? > > I don't understand. The race is because of the optimisation. I've fixed the > race. The alternative is to add a lock around the loop in > GetCurrentThreadIdentifier(), and partially throw out the optimisation. Ah ok I misunderstood what you meant by 'the optimization' (I had thought another irrelevant change). You meant the fact that GetCurrentThreadIdentifier doesn't have a lock... then could we just say it more specifically, e.g. 'Due to the fact that GetCurrentThreadIdentifier touches the table without lock'? It's not an optimization, it's rather simply a bug.
lgtm https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... File content/browser/browser_thread_impl.cc (right): https://codereview.chromium.org/1564193002/diff/80001/content/browser/browser... content/browser/browser_thread_impl.cc:311: if (g_browser_thread.Get().Get() == this) On 2016/01/10 23:18:28, Anand Mistry wrote: > On 2016/01/08 11:35:50, kinuko wrote: > > Is this condition for tests? > > This is for the main/UI thread, where a BrowserThreadImpl object is created for > the current thread. I see thanks, could we add a short comment about that here?
On 2016/01/11 03:49:59, kinuko wrote: > On 2016/01/10 23:18:22, Anand Mistry wrote: > > On 2016/01/08 14:58:00, kinuko wrote: > > > Also: can you update the description to note that this used to be racy > before > > > the optimization to make it clearer why we have the BUG line? > > > > I don't understand. The race is because of the optimisation. I've fixed the > > race. The alternative is to add a lock around the loop in > > GetCurrentThreadIdentifier(), and partially throw out the optimisation. > > Ah ok I misunderstood what you meant by 'the optimization' (I had thought > another irrelevant change). You meant the fact that GetCurrentThreadIdentifier > doesn't have a lock... then could we just say it more specifically, e.g. 'Due to > the fact that GetCurrentThreadIdentifier touches the table without lock'? It's > not an optimization, it's rather simply a bug. I've been thinking a bit more about this. The loop in GetCurrentThreadIdentifier is trivial and bounded to a small number of loop iterations. If I just add the lock, it would be simpler and the lock would be almost never contended. From what I can tell, the benefit of avoiding locks is in PostTaskHelper() where holding |BrowserThreadGlobals::lock| would end up serialising task posting (a non-trivial, potentially blocking operation) for the entire process. On the other hand, holding |BrowserThreadGlobals::lock| while looping over the threads list is a short, non-blocking, bounded operation. And if what you say about this not being an optimisation and just a bug, then the correct fix would be to add the lock.
On 2016/01/11 06:50:15, Anand Mistry wrote: > On 2016/01/11 03:49:59, kinuko wrote: > > On 2016/01/10 23:18:22, Anand Mistry wrote: > > > On 2016/01/08 14:58:00, kinuko wrote: > > > > Also: can you update the description to note that this used to be racy > > before > > > > the optimization to make it clearer why we have the BUG line? > > > > > > I don't understand. The race is because of the optimisation. I've fixed the > > > race. The alternative is to add a lock around the loop in > > > GetCurrentThreadIdentifier(), and partially throw out the optimisation. > > > > Ah ok I misunderstood what you meant by 'the optimization' (I had thought > > another irrelevant change). You meant the fact that > GetCurrentThreadIdentifier > > doesn't have a lock... then could we just say it more specifically, e.g. 'Due > to > > the fact that GetCurrentThreadIdentifier touches the table without lock'? > It's > > not an optimization, it's rather simply a bug. > > I've been thinking a bit more about this. The loop in GetCurrentThreadIdentifier > is trivial and bounded to a small number of loop iterations. If I just add the > lock, it would be simpler and the lock would be almost never contended. > > From what I can tell, the benefit of avoiding locks is in PostTaskHelper() where > holding |BrowserThreadGlobals::lock| would end up serialising task posting (a > non-trivial, potentially blocking operation) for the entire process. On the > other hand, holding |BrowserThreadGlobals::lock| while looping over the threads > list is a short, non-blocking, bounded operation. And if what you say about this > not being an optimisation and just a bug, then the correct fix would be to add > the lock. Could we measure the actual impact for adding a lock via profiler etc? Maybe we could add ScopedTracker to track the impact when the system's loaded (i.e. startup time)? (E.g. Land a profiler change, take a look for a few days, add a lock and see if it adds more jank/overhead) If the impact seems negligible we could probably just add a lock, which feels more straightforward.
On 2016/01/12 07:41:10, kinuko wrote: > On 2016/01/11 06:50:15, Anand Mistry wrote: > > On 2016/01/11 03:49:59, kinuko wrote: > > > On 2016/01/10 23:18:22, Anand Mistry wrote: > > > > On 2016/01/08 14:58:00, kinuko wrote: > > > > > Also: can you update the description to note that this used to be racy > > > before > > > > > the optimization to make it clearer why we have the BUG line? > > > > > > > > I don't understand. The race is because of the optimisation. I've fixed > the > > > > race. The alternative is to add a lock around the loop in > > > > GetCurrentThreadIdentifier(), and partially throw out the optimisation. > > > > > > Ah ok I misunderstood what you meant by 'the optimization' (I had thought > > > another irrelevant change). You meant the fact that > > GetCurrentThreadIdentifier > > > doesn't have a lock... then could we just say it more specifically, e.g. > 'Due > > to > > > the fact that GetCurrentThreadIdentifier touches the table without lock'? > > It's > > > not an optimization, it's rather simply a bug. > > > > I've been thinking a bit more about this. The loop in > GetCurrentThreadIdentifier > > is trivial and bounded to a small number of loop iterations. If I just add the > > lock, it would be simpler and the lock would be almost never contended. > > > > From what I can tell, the benefit of avoiding locks is in PostTaskHelper() > where > > holding |BrowserThreadGlobals::lock| would end up serialising task posting (a > > non-trivial, potentially blocking operation) for the entire process. On the > > other hand, holding |BrowserThreadGlobals::lock| while looping over the > threads > > list is a short, non-blocking, bounded operation. And if what you say about > this > > not being an optimisation and just a bug, then the correct fix would be to add > > the lock. > > > Could we measure the actual impact for adding a lock via profiler etc? Maybe we > could add ScopedTracker to track the impact when the system's loaded (i.e. > startup time)? (E.g. Land a profiler change, take a look for a few days, add a > lock and see if it adds more jank/overhead) If the impact seems negligible we > could probably just add a lock, which feels more straightforward. We could add a profiler to see how contended this is. But, I've been staring more at this code, and there's another problem which adding a lock won't help solve (and using a thread-local will help with). The lock only protects the threads array. It doesn't protect the thread itself, and specifically the message loop. It's possible for the message loop to become invalid while the lock is held (if the thread is shutting down), so the lock doesn't help. Let me put this CL aside for a few days while I take a more detailed look at BrowserThreadImpl. There are some subtle interactions going on here.
On 2016/01/13 06:18:39, Anand Mistry wrote: > On 2016/01/12 07:41:10, kinuko wrote: > > On 2016/01/11 06:50:15, Anand Mistry wrote: > > > On 2016/01/11 03:49:59, kinuko wrote: > > > > On 2016/01/10 23:18:22, Anand Mistry wrote: > > > > > On 2016/01/08 14:58:00, kinuko wrote: > > > > > > Also: can you update the description to note that this used to be racy > > > > before > > > > > > the optimization to make it clearer why we have the BUG line? > > > > > > > > > > I don't understand. The race is because of the optimisation. I've fixed > > the > > > > > race. The alternative is to add a lock around the loop in > > > > > GetCurrentThreadIdentifier(), and partially throw out the optimisation. > > > > > > > > Ah ok I misunderstood what you meant by 'the optimization' (I had thought > > > > another irrelevant change). You meant the fact that > > > GetCurrentThreadIdentifier > > > > doesn't have a lock... then could we just say it more specifically, e.g. > > 'Due > > > to > > > > the fact that GetCurrentThreadIdentifier touches the table without lock'? > > > It's > > > > not an optimization, it's rather simply a bug. > > > > > > I've been thinking a bit more about this. The loop in > > GetCurrentThreadIdentifier > > > is trivial and bounded to a small number of loop iterations. If I just add > the > > > lock, it would be simpler and the lock would be almost never contended. > > > > > > From what I can tell, the benefit of avoiding locks is in PostTaskHelper() > > where > > > holding |BrowserThreadGlobals::lock| would end up serialising task posting > (a > > > non-trivial, potentially blocking operation) for the entire process. On the > > > other hand, holding |BrowserThreadGlobals::lock| while looping over the > > threads > > > list is a short, non-blocking, bounded operation. And if what you say about > > this > > > not being an optimisation and just a bug, then the correct fix would be to > add > > > the lock. > > > > > > Could we measure the actual impact for adding a lock via profiler etc? Maybe > we > > could add ScopedTracker to track the impact when the system's loaded (i.e. > > startup time)? (E.g. Land a profiler change, take a look for a few days, add > a > > lock and see if it adds more jank/overhead) If the impact seems negligible we > > could probably just add a lock, which feels more straightforward. > > We could add a profiler to see how contended this is. > > But, I've been staring more at this code, and there's another problem which > adding a lock won't help solve (and using a thread-local will help with). The > lock only protects the threads array. It doesn't protect the thread itself, and > specifically the message loop. It's possible for the message loop to become > invalid while the lock is held (if the thread is shutting down), so the lock > doesn't help. Let me put this CL aside for a few days while I take a more > detailed look at BrowserThreadImpl. There are some subtle interactions going on > here. Gotcha. Thanks for taking a closer look! Yeah the code around the global table seems to have several subtle points...
Description was changed from ========== Fix a race in BrowserThreadImpl, and some related updates. Due to an "avoid locking" optimisation, it's possible for GetCurrentThreadIdentifier to cause a nullptr dereference. To fix this, store a pointer to the current thread's BrowserThreadImpl in a thread-local. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 ========== to ========== Fix races in BrowserThreadImpl, and some related updates. This change fixes two different, but related races: 1. On destruction of BrowserThreadImpl, the thread is removed from the global thread table after it is stopped. The process of stopping the thread destroys the thread's message loop. However, PostTaskHelper accesses the message loop, which races with the stopping thread. Reordering the two operations is not sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread) stopping the thread. 2. Due to an "avoid locking" optimisation, it's possible for GetCurrentThreadIdentifier to cause a nullptr dereference. To fix this, store a pointer to the current thread's BrowserThreadImpl in a thread-local. Locking the access to the global thread table doesn't account for potential calls to CurrentlyOn() in message loop destruction observers and message loop callback bound objects. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 ==========
On 2016/01/13 06:21:53, kinuko wrote: > On 2016/01/13 06:18:39, Anand Mistry wrote: > > On 2016/01/12 07:41:10, kinuko wrote: > > > On 2016/01/11 06:50:15, Anand Mistry wrote: > > > > On 2016/01/11 03:49:59, kinuko wrote: > > > > > On 2016/01/10 23:18:22, Anand Mistry wrote: > > > > > > On 2016/01/08 14:58:00, kinuko wrote: > > > > > > > Also: can you update the description to note that this used to be > racy > > > > > before > > > > > > > the optimization to make it clearer why we have the BUG line? > > > > > > > > > > > > I don't understand. The race is because of the optimisation. I've > fixed > > > the > > > > > > race. The alternative is to add a lock around the loop in > > > > > > GetCurrentThreadIdentifier(), and partially throw out the > optimisation. > > > > > > > > > > Ah ok I misunderstood what you meant by 'the optimization' (I had > thought > > > > > another irrelevant change). You meant the fact that > > > > GetCurrentThreadIdentifier > > > > > doesn't have a lock... then could we just say it more specifically, e.g. > > > 'Due > > > > to > > > > > the fact that GetCurrentThreadIdentifier touches the table without > lock'? > > > > It's > > > > > not an optimization, it's rather simply a bug. > > > > > > > > I've been thinking a bit more about this. The loop in > > > GetCurrentThreadIdentifier > > > > is trivial and bounded to a small number of loop iterations. If I just add > > the > > > > lock, it would be simpler and the lock would be almost never contended. > > > > > > > > From what I can tell, the benefit of avoiding locks is in PostTaskHelper() > > > where > > > > holding |BrowserThreadGlobals::lock| would end up serialising task posting > > (a > > > > non-trivial, potentially blocking operation) for the entire process. On > the > > > > other hand, holding |BrowserThreadGlobals::lock| while looping over the > > > threads > > > > list is a short, non-blocking, bounded operation. And if what you say > about > > > this > > > > not being an optimisation and just a bug, then the correct fix would be to > > add > > > > the lock. > > > > > > > > > Could we measure the actual impact for adding a lock via profiler etc? > Maybe > > we > > > could add ScopedTracker to track the impact when the system's loaded (i.e. > > > startup time)? (E.g. Land a profiler change, take a look for a few days, > add > > a > > > lock and see if it adds more jank/overhead) If the impact seems negligible > we > > > could probably just add a lock, which feels more straightforward. > > > > We could add a profiler to see how contended this is. > > > > But, I've been staring more at this code, and there's another problem which > > adding a lock won't help solve (and using a thread-local will help with). The > > lock only protects the threads array. It doesn't protect the thread itself, > and > > specifically the message loop. It's possible for the message loop to become > > invalid while the lock is held (if the thread is shutting down), so the lock > > doesn't help. Let me put this CL aside for a few days while I take a more > > detailed look at BrowserThreadImpl. There are some subtle interactions going > on > > here. > > Gotcha. Thanks for taking a closer look! Yeah the code around the global table > seems to have several subtle points... PTAL. I've identified the other significant problem and fixed that as well. BrowserThreadImpl should be a happier place now.
Hang on. This introduced a unit test failure. Sigh! On 19 January 2016 at 12:48, <amistry@chromium.org> wrote: > On 2016/01/13 06:21:53, kinuko wrote: > >> On 2016/01/13 06:18:39, Anand Mistry wrote: >> > On 2016/01/12 07:41:10, kinuko wrote: >> > > On 2016/01/11 06:50:15, Anand Mistry wrote: >> > > > On 2016/01/11 03:49:59, kinuko wrote: >> > > > > On 2016/01/10 23:18:22, Anand Mistry wrote: >> > > > > > On 2016/01/08 14:58:00, kinuko wrote: >> > > > > > > Also: can you update the description to note that this used >> to be >> racy >> > > > > before >> > > > > > > the optimization to make it clearer why we have the BUG line? >> > > > > > >> > > > > > I don't understand. The race is because of the optimisation. >> I've >> fixed >> > > the >> > > > > > race. The alternative is to add a lock around the loop in >> > > > > > GetCurrentThreadIdentifier(), and partially throw out the >> optimisation. >> > > > > >> > > > > Ah ok I misunderstood what you meant by 'the optimization' (I had >> thought >> > > > > another irrelevant change). You meant the fact that >> > > > GetCurrentThreadIdentifier >> > > > > doesn't have a lock... then could we just say it more >> specifically, >> > e.g. > >> > > 'Due >> > > > to >> > > > > the fact that GetCurrentThreadIdentifier touches the table without >> lock'? >> > > > It's >> > > > > not an optimization, it's rather simply a bug. >> > > > >> > > > I've been thinking a bit more about this. The loop in >> > > GetCurrentThreadIdentifier >> > > > is trivial and bounded to a small number of loop iterations. If I >> just >> > add > >> > the >> > > > lock, it would be simpler and the lock would be almost never >> contended. >> > > > >> > > > From what I can tell, the benefit of avoiding locks is in >> > PostTaskHelper() > >> > > where >> > > > holding |BrowserThreadGlobals::lock| would end up serialising task >> > posting > >> > (a >> > > > non-trivial, potentially blocking operation) for the entire >> process. On >> the >> > > > other hand, holding |BrowserThreadGlobals::lock| while looping over >> the >> > > threads >> > > > list is a short, non-blocking, bounded operation. And if what you >> say >> about >> > > this >> > > > not being an optimisation and just a bug, then the correct fix >> would be >> > to > >> > add >> > > > the lock. >> > > >> > > >> > > Could we measure the actual impact for adding a lock via profiler etc? >> Maybe >> > we >> > > could add ScopedTracker to track the impact when the system's loaded >> (i.e. >> > > startup time)? (E.g. Land a profiler change, take a look for a few >> days, >> add >> > a >> > > lock and see if it adds more jank/overhead) If the impact seems >> > negligible > >> we >> > > could probably just add a lock, which feels more straightforward. >> > >> > We could add a profiler to see how contended this is. >> > >> > But, I've been staring more at this code, and there's another problem >> which >> > adding a lock won't help solve (and using a thread-local will help >> with). >> > The > >> > lock only protects the threads array. It doesn't protect the thread >> itself, >> and >> > specifically the message loop. It's possible for the message loop to >> become >> > invalid while the lock is held (if the thread is shutting down), so the >> lock >> > doesn't help. Let me put this CL aside for a few days while I take a >> more >> > detailed look at BrowserThreadImpl. There are some subtle interactions >> going >> on >> > here. >> > > Gotcha. Thanks for taking a closer look! Yeah the code around the global >> table >> seems to have several subtle points... >> > > PTAL. I've identified the other significant problem and fixed that as well. > BrowserThreadImpl should be a happier place now. > > https://codereview.chromium.org/1564193002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Ok, I fixed the unit tests issue. PTAL.
https://codereview.chromium.org/1564193002/diff/160001/content/browser/browse... File content/browser/browser_thread_impl.cc (right): https://codereview.chromium.org/1564193002/diff/160001/content/browser/browse... content/browser/browser_thread_impl.cc:433: // On shutdown, the global thread array could have been set to null, even nit: The expression 'on shutdown' feels a bit ambiguous, this could be read either 'on the thread's shutdown' or 'during system shutdown'. https://codereview.chromium.org/1564193002/diff/160001/content/browser/browse... content/browser/browser_thread_impl.cc:443: } Hmm I see. But if that's the case it starts to sound a bit weird we need this code only for testing. If that's the case it'd be nicer this code is somehow hidden in test-only code (like TestBrowserThreadImpl) but this is static code so we can't do that... If we remove the thread local impl below will we actually have a problem?
Description was changed from ========== Fix races in BrowserThreadImpl, and some related updates. This change fixes two different, but related races: 1. On destruction of BrowserThreadImpl, the thread is removed from the global thread table after it is stopped. The process of stopping the thread destroys the thread's message loop. However, PostTaskHelper accesses the message loop, which races with the stopping thread. Reordering the two operations is not sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread) stopping the thread. 2. Due to an "avoid locking" optimisation, it's possible for GetCurrentThreadIdentifier to cause a nullptr dereference. To fix this, store a pointer to the current thread's BrowserThreadImpl in a thread-local. Locking the access to the global thread table doesn't account for potential calls to CurrentlyOn() in message loop destruction observers and message loop callback bound objects. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 ========== to ========== Fix races in BrowserThreadImpl, and some related updates. This change fixes two different races: 1. On destruction of BrowserThreadImpl, the thread is removed from the global thread table after it is stopped. The process of stopping the thread destroys the thread's message loop. However, PostTaskHelper accesses the message loop, which races with the stopping thread. Reordering the two operations is not sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread) stopping the thread. 2. There's a missing lock in GetCurrentThreadIdentifier when accessing the global thread table. To track whether or not this causes lock contention, a ScopedTracker has been added. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 ==========
PTAL. I've taken another look at this code and decided to revert the thread-local part since my concern about a thread shutdown case was unfounded. Also, I think there's technically a third race on startup, but I don't think any part of Chrome would trigger it, so I haven't fixed it. https://codereview.chromium.org/1564193002/diff/160001/content/browser/browse... File content/browser/browser_thread_impl.cc (right): https://codereview.chromium.org/1564193002/diff/160001/content/browser/browse... content/browser/browser_thread_impl.cc:433: // On shutdown, the global thread array could have been set to null, even On 2016/01/19 14:27:20, kinuko wrote: > nit: The expression 'on shutdown' feels a bit ambiguous, this could be read > either 'on the thread's shutdown' or 'during system shutdown'. This whole section is gone. https://codereview.chromium.org/1564193002/diff/160001/content/browser/browse... content/browser/browser_thread_impl.cc:443: } On 2016/01/19 14:27:20, kinuko wrote: > Hmm I see. But if that's the case it starts to sound a bit weird we need this > code only for testing. If that's the case it'd be nicer this code is somehow > hidden in test-only code (like TestBrowserThreadImpl) but this is static code so > we can't do that... > > If we remove the thread local impl below will we actually have a problem? So, with even more staring at code, I've determined this problem doesn't exist and so I've gotten rid of the thread-local and stuck with locking. The reason a MessageLoop shutdown problem doesn't exist is that at the time a Thread destroys it's MessageLoop, it has already set it's message_loop_ member to null. So any attempt to read Thread::message_loop() during a message loop destruciton observer will see null. Here, this means CurrentlyOn will return false, even though you may technically still be on the correct thread.
lgtm!
The CQ bit was checked by amistry@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from kinuko@chromium.org Link to the patchset: https://codereview.chromium.org/1564193002/#ps200001 (title: "Rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1564193002/200001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1564193002/200001
Message was sent while issue was closed.
Committed patchset #11 (id:200001)
Message was sent while issue was closed.
Description was changed from ========== Fix races in BrowserThreadImpl, and some related updates. This change fixes two different races: 1. On destruction of BrowserThreadImpl, the thread is removed from the global thread table after it is stopped. The process of stopping the thread destroys the thread's message loop. However, PostTaskHelper accesses the message loop, which races with the stopping thread. Reordering the two operations is not sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread) stopping the thread. 2. There's a missing lock in GetCurrentThreadIdentifier when accessing the global thread table. To track whether or not this causes lock contention, a ScopedTracker has been added. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 ========== to ========== Fix races in BrowserThreadImpl, and some related updates. This change fixes two different races: 1. On destruction of BrowserThreadImpl, the thread is removed from the global thread table after it is stopped. The process of stopping the thread destroys the thread's message loop. However, PostTaskHelper accesses the message loop, which races with the stopping thread. Reordering the two operations is not sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread) stopping the thread. 2. There's a missing lock in GetCurrentThreadIdentifier when accessing the global thread table. To track whether or not this causes lock contention, a ScopedTracker has been added. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 Committed: https://crrev.com/83e3480a731d95a97ccc1eacd7b3356ded42b0d4 Cr-Commit-Position: refs/heads/master@{#370560} ==========
Message was sent while issue was closed.
Patchset 11 (id:??) landed as https://crrev.com/83e3480a731d95a97ccc1eacd7b3356ded42b0d4 Cr-Commit-Position: refs/heads/master@{#370560}
Message was sent while issue was closed.
Just stumbled upon this change, this is great thanks :-)! Curious to know where you're tracking the lock contention results as I'm also interested in those results for go/lucky-luke. Thanks! Gab
Message was sent while issue was closed.
Description was changed from ========== Fix races in BrowserThreadImpl, and some related updates. This change fixes two different races: 1. On destruction of BrowserThreadImpl, the thread is removed from the global thread table after it is stopped. The process of stopping the thread destroys the thread's message loop. However, PostTaskHelper accesses the message loop, which races with the stopping thread. Reordering the two operations is not sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread) stopping the thread. 2. There's a missing lock in GetCurrentThreadIdentifier when accessing the global thread table. To track whether or not this causes lock contention, a ScopedTracker has been added. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 Committed: https://crrev.com/83e3480a731d95a97ccc1eacd7b3356ded42b0d4 Cr-Commit-Position: refs/heads/master@{#370560} ========== to ========== Fix races in BrowserThreadImpl, and some related updates. This change fixes two different races: 1. On destruction of BrowserThreadImpl, the thread is removed from the global thread table after it is stopped. The process of stopping the thread destroys the thread's message loop. However, PostTaskHelper accesses the message loop, which races with the stopping thread. Reordering the two operations is not sufficient because it doesn't handle subclasses (i.e. BrowserProcessSubThread) stopping the thread. 2. There's a missing lock in GetCurrentThreadIdentifier when accessing the global thread table. To track whether or not this causes lock contention, a ScopedTracker has been added. The two other changes are: 1. Remove uses of ScopedAllowSingleton. This isn't necessary any more since the relevant objects are leaky. 2. Remove impossible cases in BrowserThreadImpl::Run(). BUG=115540 Committed: https://crrev.com/83e3480a731d95a97ccc1eacd7b3356ded42b0d4 Cr-Commit-Position: refs/heads/master@{#370560} ==========
Message was sent while issue was closed.
On 2016/02/09 17:55:01, gab wrote: > Just stumbled upon this change, this is great thanks :-)! > > Curious to know where you're tracking the lock contention results as I'm also > interested in those results for go/lucky-luke. > > Thanks! > Gab I'm not keeping track of contention, but it's visible in UMA's profiler section. Look up "BrowserThread::GetCurrentThreadIdentifier =>". If we consider anything over 1ms as contention, then it looks like there's ~0.001% contention in GetCurrentThreadIdentifier(). Take this with a massive grain of salt, since UMA is not a substitute for real contention tracking. |