Issue 8635002: Make ObserverListThreadSafe key its observers by PlatformThreadId instead of MessageLoop.

Robert Sesek

9 years, 1 month ago (2011-11-22 05:13:37 UTC) #1

willchan no longer on Chromium

http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h#newcode98 base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); It's not clear to me ...

9 years, 1 month ago (2011-11-22 19:15:23 UTC) #2

Robert Sesek

Thanks for the review. http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h#newcode98 base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); On ...

9 years, 1 month ago (2011-11-22 19:24:39 UTC) #3

willchan no longer on Chromium

http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h#newcode98 base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); On 2011/11/22 19:24:39, rsesek wrote: ...

9 years, 1 month ago (2011-11-22 19:31:24 UTC) #4

Robert Sesek

http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h File base/observer_list_threadsafe.h (right): http://codereview.chromium.org/8635002/diff/1/base/observer_list_threadsafe.h#newcode98 base/observer_list_threadsafe.h:98: observer_lists_[thread_id] = new ObserverListContext(type_); On 2011/11/22 19:31:24, willchan wrote: ...

9 years, 1 month ago (2011-11-22 20:48:51 UTC) #5

willchan no longer on Chromium

Btw, can I see the actual code that revealed the problem? I'm sort of thinking ...

9 years, 1 month ago (2011-11-22 21:39:09 UTC) #6

Robert Sesek

9 years, 1 month ago (2011-11-22 22:20:56 UTC) #7

On 2011/11/22 21:39:09, willchan wrote:
> Btw, can I see the actual code that revealed the problem? I'm sort of thinking
> it's the calling code that is at fault here. I think that if you add an
> observer, you must remove it while the MessageLoop is alive or not try to remove
> it at all. Do we actually have situations where the MessageLoop goes away and
> comes back again?

There's no code per se, but I can try and explain it here as best as I can.

The specific issue is that net::CertDatabase is a singleton that holds an
ObserverListThreadSafe. A lot of tests indirectly use CertDatabase and do not
properly clean it up with an AtExitManager. (I'm trying to fix this in a
separate CL by putting up an AtExitManager for each individual test.)

In some tests, I've been using a base::ShadowingAtExitManager to clean up a
singleton that is partially under test, with the fixture members declared like
this (members are destroyed in reverse declaration order, so o_ goes first and
at_exit_manager_ last):

class T : public testing::Test {
 private:
  base::ShadowingAtExitManager at_exit_manager_;
  MessageLoop loop_;
  ObjectUnderTest o_;
};

o_ uses some other singleton S_, partially under test, that is cleaned up
properly by at_exit_manager_.

S_ indirectly uses a part of a URLRequest, which then adds some member as a
CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with loop_
already gone) and that URLRequest goes down, the URLRequest submember tries to
remove itself as an observer, but because there is no loop_, the RemoveObserver
operation gets dropped.

Subsequent allocations of the same submember of S_ that observes CertDatabase
will get allocated at the exact same address because of TCMalloc, leading the
ObserverList to hit a CHECK that the observer has already been added.

Now, loop_ could outlive at_exit_manager_, but that sort of violates shutdown
causality.
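
To make the ordering concrete: C++ destroys non-static data members in reverse
declaration order, so with the fixture layout above o_ is destroyed first, then
loop_, and at_exit_manager_ last. The sketch below is only a stand-in that
records that order; Tracked, Fixture, and DestructionOrder are hypothetical
names for illustration, not Chromium code.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in type: appends its name to a log when destroyed.
struct Tracked {
  Tracked(std::string name, std::vector<std::string>* log)
      : name_(std::move(name)), log_(log) {}
  ~Tracked() { log_->push_back(name_); }
  std::string name_;
  std::vector<std::string>* log_;
};

// Mirrors the member layout of the test fixture, using stand-in members.
struct Fixture {
  explicit Fixture(std::vector<std::string>* log)
      : at_exit_manager_("at_exit_manager_", log),
        loop_("loop_", log),
        o_("o_", log) {}
  Tracked at_exit_manager_;  // Declared first, destroyed last.
  Tracked loop_;
  Tracked o_;                // Declared last, destroyed first.
};

// Builds and destroys one Fixture and returns the order of destruction.
std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  { Fixture fixture(&log); }  // Fixture is destroyed at the closing brace.
  return log;
}
```

Calling DestructionOrder() yields the members in the order o_, loop_,
at_exit_manager_, which is why anything torn down by at_exit_manager_ runs
after loop_ is already gone.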

In short:
- Singletons don't get cleaned up properly between tests, making it possible for
state to stick around between them.
- MessageLoop going away before objects that hold ObserverListThreadSafe leaves
garbage in the ObserverLists.
- Singletons that hold ObserverListThreadSafes compound these two problems.

This has been a total WTF BBQ in my head. I hope this helps share the love.
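
The failure mode summarized above can be modeled with a toy registry. This is a
simplified, single-threaded sketch and not the ObserverListThreadSafe
implementation: KeyedObservers, Loop, and the two scenario functions are
hypothetical, std::thread::id stands in for PlatformThreadId, and reusing the
same Loop address stands in for the allocator handing back the same address.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <thread>
#include <vector>

struct Observer {};  // Hypothetical observer type.
struct Loop {};      // Hypothetical stand-in for a message loop.

// Toy per-key observer lists. Add() refuses duplicates, loosely modeling the
// "observer has already been added" check mentioned in the thread.
template <typename Key>
class KeyedObservers {
 public:
  bool Add(const Key& key, Observer* obs) {
    std::vector<Observer*>& list = lists_[key];
    if (std::find(list.begin(), list.end(), obs) != list.end())
      return false;  // Already registered under this key.
    list.push_back(obs);
    return true;
  }

  bool Remove(const Key& key, Observer* obs) {
    auto it = lists_.find(key);
    if (it == lists_.end()) return false;
    auto pos = std::find(it->second.begin(), it->second.end(), obs);
    if (pos == it->second.end()) return false;
    it->second.erase(pos);
    return true;
  }

 private:
  std::map<Key, std::vector<Observer*>> lists_;
};

// Keyed by loop pointer: once the loop is gone there is no key to look up,
// so the removal is dropped and a later add at the same address is rejected.
bool ReAddSucceedsWithLoopKey() {
  KeyedObservers<Loop*> observers;
  Observer obs;
  Loop loop;
  Loop* current = &loop;
  observers.Add(current, &obs);
  current = nullptr;                             // The loop has gone away.
  if (current) observers.Remove(current, &obs);  // Skipped: removal dropped.
  current = &loop;  // Assumption: the next loop lands at the same address.
  return observers.Add(current, &obs);
}

// Keyed by thread id: the key is available whether or not a loop exists.
bool ReAddSucceedsWithThreadKey() {
  KeyedObservers<std::thread::id> observers;
  Observer obs;
  const std::thread::id tid = std::this_thread::get_id();
  observers.Add(tid, &obs);
  observers.Remove(tid, &obs);  // Works without any loop.
  return observers.Add(tid, &obs);
}
```

In this model the loop-keyed scenario leaves a stale entry behind, so the
second Add() is rejected, while the thread-keyed scenario removes cleanly and
the second Add() succeeds.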

Wez

9 years, 1 month ago (2011-11-22 22:38:12 UTC) #8

On 2011/11/22 22:20:56, rsesek wrote:
> S_ indirectly uses a part of a URLRequest, which then adds some member as a
> CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with loop_
> already gone) and that URLRequest goes down, the URLRequest submember tries to
> remove itself as an observer, but because there is no loop_, the RemoveObserver
> operation gets dropped.

I'm inclined to agree with willchan@ that the way callers are making use of
ObserverListThreadSafe is faulty; ObserverListThreadSafe should surely barf if
it can't remove an observer, since otherwise a thread that adds an observer and
then exits before tearing down observers might leave the world in an unstable
state?

willchan no longer on Chromium

9 years, 1 month ago (2011-11-22 22:39:37 UTC) #9

On Tue, Nov 22, 2011 at 2:20 PM, <rsesek@chromium.org> wrote:

> S_ indirectly uses a part of a URLRequest, which then adds some member as a
> CertDatabase observer. When S_ is cleaned up by at_exit_manager_ (with loop_
> already gone) and that URLRequest goes down, the URLRequest submember tries to
> remove itself as an observer, but because there is no loop_, the RemoveObserver
> operation gets dropped.
>

This sounds like the problem to me. The Singleton should not be owning the
URLRequest. More specifically, a URLRequest shouldn't be getting destroyed
by the AtExitManager, since that should be destroyed after the MessageLoop
is. Why is that happening?


willchan no longer on Chromium

9 years, 1 month ago (2011-11-22 22:46:22 UTC) #10

On 2011/11/22 22:39:37, willchan wrote:
> On Tue, Nov 22, 2011 at 2:20 PM, <mailto:rsesek@chromium.org> wrote:
> 
> > This has been a total WTF BBQ in my head. I hope this helps share the love.

Haha, I bet. That's what you get for being a good fellow and trying to tackle
this stuff. Thanks for doing it!

Robert Sesek

9 years, 1 month ago (2011-11-22 22:51:13 UTC) #11

On 2011/11/22 22:39:37, willchan wrote:
> This sounds like the problem to me. The Singleton should not be owning the
> URLRequest. More specifically, a URLRequest shouldn't be getting destroyed
> by the AtExitManager, since that should be destroyed after the MessageLoop
> is. Why is that happening?

It's not the URLRequest specifically (typing from memory now that bash logs are
long since clobbered). Some things use a MockURLRequestContext, which leaves
observers hanging around (I think). But in this specific instance, it's because
of some SSL-related service that adds the observer to CertDatabase.

Sorry for the vagueness. I did this investigation a bit ago and determined that
this was a pretty smart approach rather than trying to fix every broken test, of
which there are a lot.

Robert Sesek

On 2011/11/22 22:51:13, rsesek wrote: > long since clobbered). Some things use a MockURLRequestContext which ...

9 years, 1 month ago (2011-11-22 22:54:26 UTC) #12

willchan no longer on Chromium

9 years, 1 month ago (2011-11-22 23:00:14 UTC) #13

On Tue, Nov 22, 2011 at 2:51 PM, <rsesek@chromium.org> wrote:

> It's not the URLRequest specifically (typing from memory now that bash logs are
> long since clobbered). Some things use a MockURLRequestContext which leave
> observers hanging around (I think). But in this specific instance, it's because
> of some SSL related service that's used that adds the observer to CertDatabase
>
> Sorry for the vagueness. I did this investigation a bit ago and determined that
> this was a pretty smart approach rather than trying to fix every broken test. Of
> which there are a lot.
>

So, in my mind, there is the "right" thing to do and the "practical" thing
to do. The "right" thing to do would be to fix every broken test. The
"practical" thing to do would be to allow your change. Your change adds
an extra header include and a syscall on each AddObserver() and
RemoveObserver(). From a purist standpoint, that's a bit lame, but
practically speaking, the effects are negligible. And yes, it will fix the
problem. It comes down to a balancing act for me. If there are fewer than 10
tests that need to be fixed, and it's not too hard to do so, I'd push back. If
there are significantly more than that, then I'd accept this changelist as
is, because your time is more valuable than that. I would like extra
comments explaining why this is necessary, since when reading the code, it
won't be obvious why we're keying by PlatformThreadId.


Robert Sesek

9 years, 1 month ago (2011-11-22 23:08:53 UTC) #14

On 2011/11/22 23:00:14, willchan wrote:
> So, in my mind, there are the "right" thing to do and the "practical" thing
> to do. The "right" thing to do would be to fix every broken test. The
> "practical" thing to do would be to allow your change. Your change is adds
> an extra header include and syscall on each AddObserver() and
> RemoveObserver(). From a purist standpoint, that's a bit lame, but
> practically speaking, the effects are negligible. And yes, it will fix the
> problem. It comes down to a balancing act for me. If there are < 10 tests
> that need to be fixed, and it's not too hard to do so, I'd push back. If
> there are significantly more than that, then I'd accept this changelist as
> is, because your time is more valuable than that. I would like extra
> comments explaining why this is necessary, since when reading the code, it
> won't be obvious why we're keying by PlatformThreadId.

I agree that this is kind of lame. Unfortunately, there are indeed more than 10
test cases. I gave up trying to fix them after getting to
chrome/browser/extensions/.

My ultimate pipe dream goal is to make singleton use hermetic between tests by
injecting an AtExitManager via base::TestSuite (mentioned above). This is
probably a week out or so (if all goes well).

How about we compromise? Go with this CL with the intention of it being
temporary on condition of me AtExiting all singletons between tests. Failing
that, we (I) should fix all those broken tests and revert back to using the
MessageLoop pointer. Or we can hold off on this and see if I can slay the
singleton dragon.

I'm off for today, but I've added a comment about PlatformThreadId.
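
The per-test cleanup idea can be sketched roughly as follows. This is a toy
model under stated assumptions, not base::AtExitManager or base::TestSuite:
ScopedAtExit, Counter, and RunOneTest are hypothetical names, and the only
behavior modeled is "callbacks registered during a scope run when that scope
ends, in reverse registration order".

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical scoped cleanup manager. A newly constructed manager shadows
// the previous one until it is destroyed.
class ScopedAtExit {
 public:
  ScopedAtExit() : previous_(top_) { top_ = this; }
  ~ScopedAtExit() {
    // Run callbacks in reverse registration order, then unshadow.
    for (auto it = callbacks_.rbegin(); it != callbacks_.rend(); ++it) (*it)();
    top_ = previous_;
  }
  ScopedAtExit(const ScopedAtExit&) = delete;
  ScopedAtExit& operator=(const ScopedAtExit&) = delete;

  // Registers a callback with the innermost live manager.
  // Returns false if no manager exists.
  static bool Register(std::function<void()> callback) {
    if (!top_) return false;
    top_->callbacks_.push_back(std::move(callback));
    return true;
  }

 private:
  static ScopedAtExit* top_;
  ScopedAtExit* previous_;
  std::vector<std::function<void()>> callbacks_;
};

ScopedAtExit* ScopedAtExit::top_ = nullptr;

// Toy "singleton" state that would otherwise leak between tests.
int& Counter() {
  static int value = 0;
  return value;
}

// Models one test body wrapped in its own manager: the state it touches is
// reset when the manager goes out of scope.
int RunOneTest() {
  ScopedAtExit per_test;
  ScopedAtExit::Register([] { Counter() = 0; });
  ++Counter();
  return Counter();  // Value is copied before per_test is destroyed.
}
```

With a manager per test, each RunOneTest() call starts from a clean Counter(),
which is the hermetic behavior the compromise is aiming for.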

willchan no longer on Chromium

9 years, 1 month ago (2011-11-22 23:11:26 UTC) #15

On Tue, Nov 22, 2011 at 3:08 PM, <rsesek@chromium.org> wrote:

> I agree that this is kind of lame. Unfortunately, there are indeed more than 10
> test cases. I gave up trying to fix them after getting to
> chrome/browser/extensions/.
>
> My ultimate pipe dream goal is to make singleton use hermetic between tests by
> injecting an AtExitManager via base::TestSuite (mentioned above). This is
> probably a week out or so (if all goes well).
>
> How about we compromise? Go with this CL with the intention of it being
> temporary on condition of me AtExiting all singletons between tests. Failing
> that, we (I) should fix all those broken tests and revert back to using the
> MessageLoop pointer. Or we can hold off on this and see if I can slay the
> singleton dragon.
>
> I'm off for today, but I've added a comment about PlatformThreadId.
>

Glad we agree on the situation. I'm LGTM'ing this changelist so you can
submit it if you want, and I will let you decide whether or not to do so.
Hopefully we can revert the change in the future if there's a point at
which we don't need it. Thanks for trying to make our tests' Singleton use
hermetic, much appreciated.


Scott Hess - ex-Googler

On 2011/11/22 23:08:53, rsesek wrote: > I'm off for today, but I've added a comment ...

9 years, 1 month ago (2011-11-23 17:31:16 UTC) #16

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rsesek@chromium.org/8635002/19001

9 years, 1 month ago (2011-11-23 17:32:23 UTC) #17

Robert Sesek

On 2011/11/23 17:31:16, shess wrote: > On 2011/11/22 23:08:53, rsesek wrote: > > I'm off ...

9 years, 1 month ago (2011-11-23 17:32:55 UTC) #18

Scott Hess - ex-Googler

BTW, I find myself wondering if another alternative would be to add a destruction observer ...

9 years, 1 month ago (2011-11-23 17:35:04 UTC) #19

Robert Sesek

On 2011/11/23 17:35:04, shess wrote: > BTW, I find myself wondering if another alternative would ...

9 years, 1 month ago (2011-11-23 17:43:38 UTC) #20

commit-bot: I haz the power

Try job failure for 8635002-19001 (retry) on mac_rel for step "compile" (clobber build). It's a ...

9 years, 1 month ago (2011-11-23 18:47:48 UTC) #21

Scott Hess - ex-Googler

On 2011/11/23 18:47:48, I haz the power (commit-bot) wrote: > Try job failure for 8635002-19001 ...

9 years, 1 month ago (2011-11-23 18:56:50 UTC) #22

Scott Hess - ex-Googler

On 2011/11/23 18:56:50, shess wrote: > On 2011/11/23 18:47:48, I haz the power (commit-bot) wrote: ...

9 years, 1 month ago (2011-11-23 18:58:26 UTC) #23

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/rsesek@chromium.org/8635002/19001

9 years, 1 month ago (2011-11-23 19:01:00 UTC) #24

Change committed as 111404

Issue 8635002: Make ObserverListThreadSafe key its observers by PlatformThreadId instead of MessageLoop. (Closed)

Patch Set 1 #

Patch Set 2 : Add a test #

Patch Set 3 : Add comment #