|
|
Created:
9 years, 1 month ago by Robert Sesek Modified:
9 years, 1 month ago CC:
chromium-reviews, brettw-cc_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionMake ObserverListThreadSafe key its observers by PlatformThreadId instead of MessageLoop.
This fixes a subtle behavior that drops RemoveObserver operations when the
current() loop is NULL. Now those operations will always succeed.
BUG=104826
TEST=base_unittests
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=111404
Patch Set 1 #
Total comments: 4
Patch Set 2 : Add a test #Patch Set 3 : Add comment #
Messages
Total messages: 25 (0 generated)
http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h... base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); It's not clear to me that this is safe. Won't ObserverListContext construct a NULL MessageLoopProxy if MessageLoop::current() is NULL? Later on, we might crash on Notify() then. Is that desired behavior?
Thanks for the review. http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h... base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); On 2011/11/22 19:15:23, willchan wrote: > It's not clear to me that this is safe. Won't ObserverListContext construct a > NULL MessageLoopProxy if MessageLoop::current() is NULL? Later on, we might > crash on Notify() then. Is that desired behavior? You're right that it may crash if the current loop is NULL, but it seems the comment above is no longer applicable as all tests pass, which is the only context in which an observer could be added without a loop. That said, I'd be OK with re-adding the check in AddObserver; the real problematic piece of code is dropping RemoveObserver ops when there's no loop because it leaves garbage pointers around. When combined with TCMalloc's address reuse, it means that something being added to the list can technically be "found in it" even if the object is theoretically garbage. Alternatively, the Notify variants could ensure that the loop is non-NULL before posting.
http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h... base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); On 2011/11/22 19:24:39, rsesek wrote: > On 2011/11/22 19:15:23, willchan wrote: > > It's not clear to me that this is safe. Won't ObserverListContext construct a > > NULL MessageLoopProxy if MessageLoop::current() is NULL? Later on, we might > > crash on Notify() then. Is that desired behavior? > > You're right that it may crash if the current loop is NULL, but it seems the > comment above is no longer applicable as all tests pass, which is the only > context in which an observer could be added without a loop. Yep, it looks like the time implementation changed, so the comment isn't valid. > > That said, I'd be OK with re-adding the check in AddObserver; the real > problematic piece of code is dropping RemoveObserver ops when there's no loop > because it leaves garbage pointers around. When combined with TCMalloc's address > reuse, it means that something being added to the list can technically be "found > in it" even if the object is theoretically garbage. > > Alternatively, the Notify variants could ensure that the loop is non-NULL before > posting. I prefer the former solution, not adding to the ObserverList if the MessageLoop is NULL. Can you add a test for this case?
http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h... base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); On 2011/11/22 19:31:24, willchan wrote: > I prefer the former solution, not adding to the ObserverList if the MessageLoop > is NULL. Can you add a test for this case? Done. I also tried to add a test for the broken behavior before, but it's pretty darn impossible without reaching directly into the ObserverListThreadSafe internals.
Btw, can I see the actual code that revealed the problem? I'm sort of thinking it's the calling code that is at fault here. I think that if you add an observer, you must remove it while the MessageLoop is alive or not try to remove it at all. Do we actually have situations where the MessageLoop goes away and comes back again?
On 2011/11/22 21:39:09, willchan wrote: > Btw, can I see the actual code that revealed the problem? I'm sort of thinking > it's the calling code that is at fault here. I think that if you add an > observer, you must remove it while the MessageLoop is alive or not try to remove > it at all. Do we actually have situations where the MessageLoop goes away and > comes back again? There's no code per se, but I can try and explain it here as best as I can. The specific issue is that net::CertDatabase is a singleton that holds a ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and do not properly clean it up with an AtExitManager. (I'm trying to fix this in a separate CL by putting up an AtExitManager for each individual test.) In some tests, I've been using a base::ShadowingAtExitingManager to clean up a singleton that is partially under test, with the destruction order looking like this: class T : public testing::Test { private: base::ShadowingAtExitManager at_exit_manager_; MessageLoop loop_; ObjectUnderTest o_; }; o_ uses some other singleton S_, partially under test, that is cleaned up properly by at_exit_manager_. S_ indirectly uses a part of a URLRequest, which then adds some member as a CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with loop_ already gone) and that URLRequest goes down, the URLRequest submember tries to remove itself as an observer, but because there is no loop_, the RemoveObserver operation gets dropped. Subsequent allocations of the same submember of S_ that observes CertDatabase will get allocated at the exact same address because of TCMalloc, leading the ObserverList to hit a CHECK that the observer has already been added. Now, loop_ could outlive at_exit_manager_, but that sort of violates shutdown causality. In short: - Singletons don't get cleaned up properly between tests, making it possible for state to stick around between them. - MessageLoop going away before objects that hold ObserverListThreadSafe leaves garbage in the ObserverLists. - Singletons that hold ObserverListThreadSafes compound these two problems. This has been a total WTF BBQ in my head. I hope this helps share the love.
On 2011/11/22 22:20:56, rsesek wrote: > On 2011/11/22 21:39:09, willchan wrote: > > Btw, can I see the actual code that revealed the problem? I'm sort of thinking > > it's the calling code that is at fault here. I think that if you add an > > observer, you must remove it while the MessageLoop is alive or not try to > remove > > it at all. Do we actually have situations where the MessageLoop goes away and > > comes back again? > > There's no code per se, but I can try and explain it here as best as I can. > > The specific issue is that net::CertDatabase is a singleton that holds a > ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and do not > properly clean it up with an AtExitManager. (I'm trying to fix this in a > separate CL by putting up an AtExitManager for each individual test.) > > In some tests, I've been using a base::ShadowingAtExitingManager to clean up a > singleton that is partially under test, with the destruction order looking like > this: > > class T : public testing::Test { > private: > base::ShadowingAtExitManager at_exit_manager_; > MessageLoop loop_; > ObjectUnderTest o_; > }; > > o_ uses some other singleton S_, partially under test, that is cleaned up > properly by at_exit_manager_. > > S_ indirectly uses a part of a URLRequest, which then adds some member as a > CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with loop_ > already gone) and that URLRequest goes down, the URLRequest submember tries to > remove itself as an observer, but because there is no loop_, the RemoveObserver > operation gets dropped. > > Subsequent allocations of the same submember of S_ that observes CertDatabase > will get allocated at the exact same address because of TCMalloc, leading the > ObserverList to hit a CHECK that the observer has already been added. > > Now, loop_ could outlive at_exit_manager_, but that sort of violates shutdown > causality. > > In short: > - Singletons don't get cleaned up properly between tests, making it possible for > state to stick around between them. > - MessageLoop going away before objects that hold ObserverListThreadSafe leaves > garbage in the ObserverLists. > - Singletons that hold ObserverListThreadSafes compound these two problems. > > This has been a total WTF BBQ in my head. I hope this helps share the love. I'm inclined to agree with willchan@ that the way callers are making use of ObserverListThreadSafe is faulty; ObserverListThreadSafe should surely barf if it can't remove an observer, since otherwise a thread that adds an observer and then exits before tearing down observers might leave the world in an unstable state?
On Tue, Nov 22, 2011 at 2:20 PM, <rsesek@chromium.org> wrote: > On 2011/11/22 21:39:09, willchan wrote: > >> Btw, can I see the actual code that revealed the problem? I'm sort of >> thinking >> it's the calling code that is at fault here. I think that if you add an >> observer, you must remove it while the MessageLoop is alive or not try to >> > remove > >> it at all. Do we actually have situations where the MessageLoop goes away >> and >> comes back again? >> > > There's no code per se, but I can try and explain it here as best as I can. > > The specific issue is that net::CertDatabase is a singleton that holds a > ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and do > not > properly clean it up with an AtExitManager. (I'm trying to fix this in a > separate CL by putting up an AtExitManager for each individual test.) > > In some tests, I've been using a base::**ShadowingAtExitingManager to > clean up a > singleton that is partially under test, with the destruction order looking > like > this: > > class T : public testing::Test { > private: > base::ShadowingAtExitManager at_exit_manager_; > MessageLoop loop_; > ObjectUnderTest o_; > }; > > o_ uses some other singleton S_, partially under test, that is cleaned up > properly by at_exit_manager_. > > S_ indirectly uses a part of a URLRequest, which then adds some member as a > CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with > loop_ > already gone) and that URLRequest goes down, the URLRequest submember > tries to > remove itself as an observer, but because there is no loop_, the > RemoveObserver > operation gets dropped. > This sounds like the problem to me. The Singleton should not be owning the URLRequest. More specifically, a URLRequest shouldn't be getting destroyed by the AtExitManager, since that should be destroyed after the MessageLoop is. Why is that happening? > > Subsequent allocations of the same submember of S_ that observes > CertDatabase > will get allocated at the exact same address because of TCMalloc, leading > the > ObserverList to hit a CHECK that the observer has already been added. > > Now, loop_ could outlive at_exit_manager_, but that sort of violates > shutdown > causality. > > In short: > - Singletons don't get cleaned up properly between tests, making it > possible for > state to stick around between them. > - MessageLoop going away before objects that hold ObserverListThreadSafe > leaves > garbage in the ObserverLists. > - Singletons that hold ObserverListThreadSafes compound these two problems. > > This has been a total WTF BBQ in my head. I hope this helps share the love. > > http://codereview.chromium.**org/8635002/<http://codereview.chromium.org/8635... >
On 2011/11/22 22:39:37, willchan wrote: > On Tue, Nov 22, 2011 at 2:20 PM, <mailto:rsesek@chromium.org> wrote: > > > On 2011/11/22 21:39:09, willchan wrote: > > > >> Btw, can I see the actual code that revealed the problem? I'm sort of > >> thinking > >> it's the calling code that is at fault here. I think that if you add an > >> observer, you must remove it while the MessageLoop is alive or not try to > >> > > remove > > > >> it at all. Do we actually have situations where the MessageLoop goes away > >> and > >> comes back again? > >> > > > > There's no code per se, but I can try and explain it here as best as I can. > > > > The specific issue is that net::CertDatabase is a singleton that holds a > > ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and do > > not > > properly clean it up with an AtExitManager. (I'm trying to fix this in a > > separate CL by putting up an AtExitManager for each individual test.) > > > > In some tests, I've been using a base::**ShadowingAtExitingManager to > > clean up a > > singleton that is partially under test, with the destruction order looking > > like > > this: > > > > class T : public testing::Test { > > private: > > base::ShadowingAtExitManager at_exit_manager_; > > MessageLoop loop_; > > ObjectUnderTest o_; > > }; > > > > o_ uses some other singleton S_, partially under test, that is cleaned up > > properly by at_exit_manager_. > > > > S_ indirectly uses a part of a URLRequest, which then adds some member as a > > CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with > > loop_ > > already gone) and that URLRequest goes down, the URLRequest submember > > tries to > > remove itself as an observer, but because there is no loop_, the > > RemoveObserver > > operation gets dropped. > > > > This sounds like the problem to me. The Singleton should not be owning the > URLRequest. More specifically, a URLRequest shouldn't be getting destroyed > by the AtExitManager, since that should be destroyed after the MessageLoop > is. Why is that happening? > > > > > > Subsequent allocations of the same submember of S_ that observes > > CertDatabase > > will get allocated at the exact same address because of TCMalloc, leading > > the > > ObserverList to hit a CHECK that the observer has already been added. > > > > Now, loop_ could outlive at_exit_manager_, but that sort of violates > > shutdown > > causality. > > > > In short: > > - Singletons don't get cleaned up properly between tests, making it > > possible for > > state to stick around between them. > > - MessageLoop going away before objects that hold ObserverListThreadSafe > > leaves > > garbage in the ObserverLists. > > - Singletons that hold ObserverListThreadSafes compound these two problems. > > > > This has been a total WTF BBQ in my head. I hope this helps share the love. Haha, I bet. That's what you get for being a good fellow and trying to tackle this stuff. Thanks for doing it! > > > > > http://codereview.chromium.**org/8635002/%3Chttp://codereview.chromium.org/86...> > >
On 2011/11/22 22:39:37, willchan wrote: > On Tue, Nov 22, 2011 at 2:20 PM, <mailto:rsesek@chromium.org> wrote: > > > On 2011/11/22 21:39:09, willchan wrote: > > > >> Btw, can I see the actual code that revealed the problem? I'm sort of > >> thinking > >> it's the calling code that is at fault here. I think that if you add an > >> observer, you must remove it while the MessageLoop is alive or not try to > >> > > remove > > > >> it at all. Do we actually have situations where the MessageLoop goes away > >> and > >> comes back again? > >> > > > > There's no code per se, but I can try and explain it here as best as I can. > > > > The specific issue is that net::CertDatabase is a singleton that holds a > > ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and do > > not > > properly clean it up with an AtExitManager. (I'm trying to fix this in a > > separate CL by putting up an AtExitManager for each individual test.) > > > > In some tests, I've been using a base::**ShadowingAtExitingManager to > > clean up a > > singleton that is partially under test, with the destruction order looking > > like > > this: > > > > class T : public testing::Test { > > private: > > base::ShadowingAtExitManager at_exit_manager_; > > MessageLoop loop_; > > ObjectUnderTest o_; > > }; > > > > o_ uses some other singleton S_, partially under test, that is cleaned up > > properly by at_exit_manager_. > > > > S_ indirectly uses a part of a URLRequest, which then adds some member as a > > CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with > > loop_ > > already gone) and that URLRequest goes down, the URLRequest submember > > tries to > > remove itself as an observer, but because there is no loop_, the > > RemoveObserver > > operation gets dropped. > > > > This sounds like the problem to me. The Singleton should not be owning the > URLRequest. More specifically, a URLRequest shouldn't be getting destroyed > by the AtExitManager, since that should be destroyed after the MessageLoop > is. Why is that happening? It's not the URLRequest specifically (typing from memory now that bash logs are long since clobbered). Some things use a MockURLRequestContext which leave observers hanging around (I think). But in this specific instance, it's because of some SSL related service that's used that adds the observer to CertDatabase Sorry for the vagueness. I did this investigation a bit ago and determined that this was a pretty smart approach rather than trying to fix every broken test. Of which there are a lot.
On 2011/11/22 22:51:13, rsesek wrote: > long since clobbered). Some things use a MockURLRequestContext which leave > observers hanging around (I think). But in this specific instance, it's because No, they leave the CertDatabase around.
On Tue, Nov 22, 2011 at 2:51 PM, <rsesek@chromium.org> wrote: > On 2011/11/22 22:39:37, willchan wrote: > >> On Tue, Nov 22, 2011 at 2:20 PM, <mailto:rsesek@chromium.org> wrote: >> > > > On 2011/11/22 21:39:09, willchan wrote: >> > >> >> Btw, can I see the actual code that revealed the problem? I'm sort of >> >> thinking >> >> it's the calling code that is at fault here. I think that if you add an >> >> observer, you must remove it while the MessageLoop is alive or not try >> to >> >> >> > remove >> > >> >> it at all. Do we actually have situations where the MessageLoop goes >> away >> >> and >> >> comes back again? >> >> >> > >> > There's no code per se, but I can try and explain it here as best as I >> can. >> > >> > The specific issue is that net::CertDatabase is a singleton that holds a >> > ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and >> do >> > not >> > properly clean it up with an AtExitManager. (I'm trying to fix this in a >> > separate CL by putting up an AtExitManager for each individual test.) >> > >> > In some tests, I've been using a base::****ShadowingAtExitingManager to >> > clean up a >> > singleton that is partially under test, with the destruction order >> looking >> > like >> > this: >> > >> > class T : public testing::Test { >> > private: >> > base::ShadowingAtExitManager at_exit_manager_; >> > MessageLoop loop_; >> > ObjectUnderTest o_; >> > }; >> > >> > o_ uses some other singleton S_, partially under test, that is cleaned >> up >> > properly by at_exit_manager_. >> > >> > S_ indirectly uses a part of a URLRequest, which then adds some member >> as a >> > CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with >> > loop_ >> > already gone) and that URLRequest goes down, the URLRequest submember >> > tries to >> > remove itself as an observer, but because there is no loop_, the >> > RemoveObserver >> > operation gets dropped. >> > >> > > This sounds like the problem to me. The Singleton should not be owning the >> URLRequest. More specifically, a URLRequest shouldn't be getting destroyed >> by the AtExitManager, since that should be destroyed after the MessageLoop >> is. Why is that happening? >> > > It's not the URLRequest specifically (typing from memory now that bash > logs are > long since clobbered). Some things use a MockURLRequestContext which leave > observers hanging around (I think). But in this specific instance, it's > because > of some SSL related service that's used that adds the observer to > CertDatabase > > Sorry for the vagueness. I did this investigation a bit ago and determined > that > this was a pretty smart approach rather than trying to fix every broken > test. Of > which there are a lot. > So, in my mind, there are the "right" thing to do and the "practical" thing to do. The "right" thing to do would be to fix every broken test. The "practical" thing to do would be to allow your change. Your change is adds an extra header include and syscall on each AddObserver() and RemoveObserver(). From a purist standpoint, that's a bit lame, but practically speaking, the effects are negligible. And yes, it will fix the problem. It comes down to a balancing act for me. If there are < 10 tests that need to be fixed, and it's not too hard to do so, I'd push back. If there are significantly more than that, then I'd accept this changelist as is, because your time is more valuable than that. I would like extra comments explaining why this is necessary, since when reading the code, it won't be obvious why we're keying by PlatformThreadId. > > http://codereview.chromium.**org/8635002/<http://codereview.chromium.org/8635... >
On 2011/11/22 23:00:14, willchan wrote: > On Tue, Nov 22, 2011 at 2:51 PM, <mailto:rsesek@chromium.org> wrote: > > > On 2011/11/22 22:39:37, willchan wrote: > > > >> On Tue, Nov 22, 2011 at 2:20 PM, <mailto:rsesek@chromium.org> wrote: > >> > > > > > On 2011/11/22 21:39:09, willchan wrote: > >> > > >> >> Btw, can I see the actual code that revealed the problem? I'm sort of > >> >> thinking > >> >> it's the calling code that is at fault here. I think that if you add an > >> >> observer, you must remove it while the MessageLoop is alive or not try > >> to > >> >> > >> > remove > >> > > >> >> it at all. Do we actually have situations where the MessageLoop goes > >> away > >> >> and > >> >> comes back again? > >> >> > >> > > >> > There's no code per se, but I can try and explain it here as best as I > >> can. > >> > > >> > The specific issue is that net::CertDatabase is a singleton that holds a > >> > ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and > >> do > >> > not > >> > properly clean it up with an AtExitManager. (I'm trying to fix this in a > >> > separate CL by putting up an AtExitManager for each individual test.) > >> > > >> > In some tests, I've been using a base::****ShadowingAtExitingManager to > >> > clean up a > >> > singleton that is partially under test, with the destruction order > >> looking > >> > like > >> > this: > >> > > >> > class T : public testing::Test { > >> > private: > >> > base::ShadowingAtExitManager at_exit_manager_; > >> > MessageLoop loop_; > >> > ObjectUnderTest o_; > >> > }; > >> > > >> > o_ uses some other singleton S_, partially under test, that is cleaned > >> up > >> > properly by at_exit_manager_. > >> > > >> > S_ indirectly uses a part of a URLRequest, which then adds some member > >> as a > >> > CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with > >> > loop_ > >> > already gone) and that URLRequest goes down, the URLRequest submember > >> > tries to > >> > remove itself as an observer, but because there is no loop_, the > >> > RemoveObserver > >> > operation gets dropped. > >> > > >> > > > > This sounds like the problem to me. The Singleton should not be owning the > >> URLRequest. More specifically, a URLRequest shouldn't be getting destroyed > >> by the AtExitManager, since that should be destroyed after the MessageLoop > >> is. Why is that happening? > >> > > > > It's not the URLRequest specifically (typing from memory now that bash > > logs are > > long since clobbered). Some things use a MockURLRequestContext which leave > > observers hanging around (I think). But in this specific instance, it's > > because > > of some SSL related service that's used that adds the observer to > > CertDatabase > > > > Sorry for the vagueness. I did this investigation a bit ago and determined > > that > > this was a pretty smart approach rather than trying to fix every broken > > test. Of > > which there are a lot. > > > > So, in my mind, there are the "right" thing to do and the "practical" thing > to do. The "right" thing to do would be to fix every broken test. The > "practical" thing to do would be to allow your change. Your change is adds > an extra header include and syscall on each AddObserver() and > RemoveObserver(). From a purist standpoint, that's a bit lame, but > practically speaking, the effects are negligible. And yes, it will fix the > problem. It comes down to a balancing act for me. If there are < 10 tests > that need to be fixed, and it's not too hard to do so, I'd push back. If > there are significantly more than that, then I'd accept this changelist as > is, because your time is more valuable than that. I would like extra > comments explaining why this is necessary, since when reading the code, it > won't be obvious why we're keying by PlatformThreadId. I agree that this is kind of lame. Unfortunately, there are indeed more than 10 test cases. I gave up trying to fix them after getting to chrome/browser/extensions/. My ultimate pipe dream goal is to make singleton use hermetic between tests by injecting an AtExitManager via base::TestSuite (mentioned above). This is probably a week out or so (if all goes well). How about we compromise? Go with this CL with the intention of it being temporary on condition of me AtExiting all singletons between tests. Failing that, we (I) should fix all those broken tests and revert back to using the MessageLoop pointer. Or we can hold off on this and see if I can slay the singleton dragon. I'm off for today, but I've added a comment about PlatformThreadId.
On Tue, Nov 22, 2011 at 3:08 PM, <rsesek@chromium.org> wrote: > On 2011/11/22 23:00:14, willchan wrote: > > On Tue, Nov 22, 2011 at 2:51 PM, <mailto:rsesek@chromium.org> wrote: >> > > > On 2011/11/22 22:39:37, willchan wrote: >> > >> >> On Tue, Nov 22, 2011 at 2:20 PM, <mailto:rsesek@chromium.org> wrote: >> >> >> > >> > > On 2011/11/22 21:39:09, willchan wrote: >> >> > >> >> >> Btw, can I see the actual code that revealed the problem? I'm sort >> of >> >> >> thinking >> >> >> it's the calling code that is at fault here. I think that if you >> add an >> >> >> observer, you must remove it while the MessageLoop is alive or not >> try >> >> to >> >> >> >> >> > remove >> >> > >> >> >> it at all. Do we actually have situations where the MessageLoop goes >> >> away >> >> >> and >> >> >> comes back again? >> >> >> >> >> > >> >> > There's no code per se, but I can try and explain it here as best as >> I >> >> can. >> >> > >> >> > The specific issue is that net::CertDatabase is a singleton that >> holds a >> >> > ObserverListThreadSafe. A lot of tests indirectly use CertDatabase >> and >> >> do >> >> > not >> >> > properly clean it up with an AtExitManager. (I'm trying to fix this >> in a >> >> > separate CL by putting up an AtExitManager for each individual test.) >> >> > >> >> > In some tests, I've been using a base::******ShadowingAtExitingManager >> to >> >> >> > clean up a >> >> > singleton that is partially under test, with the destruction order >> >> looking >> >> > like >> >> > this: >> >> > >> >> > class T : public testing::Test { >> >> > private: >> >> > base::ShadowingAtExitManager at_exit_manager_; >> >> > MessageLoop loop_; >> >> > ObjectUnderTest o_; >> >> > }; >> >> > >> >> > o_ uses some other singleton S_, partially under test, that is >> cleaned >> >> up >> >> > properly by at_exit_manager_. >> >> > >> >> > S_ indirectly uses a part of a URLRequest, which then adds some >> member >> >> as a >> >> > CertDatabase observer. When S_ is cleaned up by at_exit_manager_ >> (with >> >> > loop_ >> >> > already gone) and that URLRequest goes down, the URLRequest submember >> >> > tries to >> >> > remove itself as an observer, but because there is no loop_, the >> >> > RemoveObserver >> >> > operation gets dropped. >> >> > >> >> >> > >> > This sounds like the problem to me. The Singleton should not be owning >> the >> >> URLRequest. More specifically, a URLRequest shouldn't be getting >> destroyed >> >> by the AtExitManager, since that should be destroyed after the >> MessageLoop >> >> is. Why is that happening? >> >> >> > >> > It's not the URLRequest specifically (typing from memory now that bash >> > logs are >> > long since clobbered). Some things use a MockURLRequestContext which >> leave >> > observers hanging around (I think). But in this specific instance, it's >> > because >> > of some SSL related service that's used that adds the observer to >> > CertDatabase >> > >> > Sorry for the vagueness. I did this investigation a bit ago and >> determined >> > that >> > this was a pretty smart approach rather than trying to fix every broken >> > test. Of >> > which there are a lot. >> > >> > > So, in my mind, there are the "right" thing to do and the "practical" >> thing >> to do. The "right" thing to do would be to fix every broken test. The >> "practical" thing to do would be to allow your change. Your change is adds >> an extra header include and syscall on each AddObserver() and >> RemoveObserver(). From a purist standpoint, that's a bit lame, but >> practically speaking, the effects are negligible. And yes, it will fix the >> problem. It comes down to a balancing act for me. If there are < 10 tests >> that need to be fixed, and it's not too hard to do so, I'd push back. If >> there are significantly more than that, then I'd accept this changelist as >> is, because your time is more valuable than that. I would like extra >> comments explaining why this is necessary, since when reading the code, it >> won't be obvious why we're keying by PlatformThreadId. >> > > I agree that this is kind of lame. Unfortunately, there are indeed more > than 10 > test cases. I gave up trying to fix them after getting to > chrome/browser/extensions/. > > My ultimate pipe dream goal is to make singleton use hermetic between > tests by > injecting an AtExitManager via base::TestSuite (mentioned above). This is > probably a week out or so (if all goes well). > > How about we compromise? Go with this CL with the intention of it being > temporary on condition of me AtExiting all singletons between tests. > Failing > that, we (I) should fix all those broken tests and revert back to using the > MessageLoop pointer. Or we can hold off on this and see if I can slay the > singleton dragon. > > I'm off for today, but I've added a comment about PlatformThreadId. > Glad we agree on the situation. I'm LGTM'ing this changelist so you can submit it if you want, and I will let you decide whether or not to do so, and hopefully reverting the change in the future if there's a point at which we don't need it. Thanks for trying to make our tests' Singleton use hermetic, much appreciated. > > http://codereview.chromium.**org/8635002/<http://codereview.chromium.org/8635... >
On 2011/11/22 23:08:53, rsesek wrote: > I'm off for today, but I've added a comment about PlatformThreadId. This has been keeping "Linux Clang (ChromeOS dbg)" red forever. I've been trying to figure out tests to disable temporarily to get it green, but since the problem is latent from a previous test, it's annoying. Does anyone mind if I click the commit button? [I'm going to go look at the CL again, now :-).]
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rsesek@chromium.org/8635002/19001
On 2011/11/23 17:31:16, shess wrote: > On 2011/11/22 23:08:53, rsesek wrote: > > I'm off for today, but I've added a comment about PlatformThreadId. > > This has been keeping "Linux Clang (ChromeOS dbg)" red forever. I've been > trying to figure out tests to disable temporarily to get it green, but since the > problem is latent from a previous test, it's annoying. Does anyone mind if I > click the commit button? > > [I'm going to go look at the CL again, now :-).] Your request is granted. I'm also super close (I hope) to SLAYING ALL THE SINGLETONS. Huzzah!
BTW, I find myself wondering if another alternative would be to add a destruction observer to the message loop when the ObserverListContext is created, and then clear that out when the loop goes away. A future change could assert that there are no observers left when the context is destroyed. [If I understand the bug appropriately.]
On 2011/11/23 17:35:04, shess wrote: > BTW, I find myself wondering if another alternative would be to add a > destruction observer to the message loop when the ObserverListContext is > created, and then clear that out when the loop goes away. A future change could > assert that there are no observers left when the context is destroyed. > > [If I understand the bug appropriately.] That would likely work, but it seems kind of kludgy as it would be hacking in test isolation into base/. I think this CL is a good enough stopgap and can be undone once the singletons are properly recycled between tests.
Try job failure for 8635002-19001 (retry) on mac_rel for step "compile" (clobber build). It's a second try, previously, step "compile" failed. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=mac_rel&nu...
On 2011/11/23 18:47:48, I haz the power (commit-bot) wrote: > Try job failure for 8635002-19001 (retry) on mac_rel for step "compile" (clobber > build). > It's a second try, previously, step "compile" failed. > http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=mac_rel&nu... I'll go poke at it a bit. Looks like someone else's breakage.
On 2011/11/23 18:56:50, shess wrote: > On 2011/11/23 18:47:48, I haz the power (commit-bot) wrote: > > Try job failure for 8635002-19001 (retry) on mac_rel for step "compile" > (clobber > > build). > > It's a second try, previously, step "compile" failed. > > > http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=mac_rel&nu... > > I'll go poke at it a bit. Looks like someone else's breakage. Yeah, looks like this caught robertshield's morning build break. I'll click it again.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rsesek@chromium.org/8635002/19001
Change committed as 111404 |