Created: 10 years ago by MAD
Modified: 9 years, 7 months ago
CC: chromium-reviews, brettw-cc_chromium.org
Base URL: svn://svn.chromium.org/chrome/trunk/src/
Visibility: Public.
Description

Properly lock access to static variables.

There were a few race conditions where a static method would test the validity of the histograms_ static variable before attempting to use lock_ to protect access to it, but nothing prevented the destructor from freeing both lock_ and the histograms_ memory in between.

So I decided not to use a dynamic allocation for the static lock_ to resolve this problem.
BUG=38354
TEST=Hard to repro exit scenario crashes.
Committed: http://src.chromium.org/viewvc/chrome?view=rev&revision=70054
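To make the race concrete, here is a minimal hypothetical reduction of the pattern the description refers to. The real code lives in base/metrics/histogram.cc (StatisticsRecorder); std::mutex stands in for Chromium's Lock, and the class and member names below are simplified stand-ins, not the actual patch:

    #include <map>
    #include <mutex>
    #include <string>

    class Recorder {
     public:
      typedef std::map<std::string, int> HistogramMap;  // simplified registry

      Recorder() { histograms_ = new HistogramMap; }

      // Runs during static teardown; nothing stops a still-running worker
      // thread from having just passed the NULL check in Add() below.
      ~Recorder() {
        HistogramMap* doomed = histograms_;
        histograms_ = nullptr;
        delete doomed;
      }

      static void Add(const std::string& name) {
        if (!histograms_)                          // thread A passes this check,
          return;                                  // then the destructor runs...
        std::lock_guard<std::mutex> guard(lock_);
        (*histograms_)[name]++;                    // ...use-after-free.
      }

     private:
      // The CL's change: a plain static Lock instead of a heap-allocated
      // Lock* that the destructor deleted. The lock is now constructed before
      // main() and destroyed only at static teardown, which shrinks (but, as
      // discussed below, does not eliminate) the race window.
      static std::mutex lock_;
      static HistogramMap* histograms_;
    };

    std::mutex Recorder::lock_;
    Recorder::HistogramMap* Recorder::histograms_ = nullptr;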
Patch Set 1 #
Patch Set 2 : '' #
Patch Set 3 : '' #
Total comments: 6
Patch Set 4 : '' #
Patch Set 5 : '' #
Messages
Total messages: 22 (0 generated)
How about this? BYE MAD
I believe the code is still vulnerable to the same class of races, as an unjoined thread can still attempt to use the lock after the static destructor has run (the atexit routine does not wait for all threads to join). The existing code used a dynamic allocation and deallocation; this code now allocates sooner (before browser main) and deallocates later (long after browser main has exited)... but it is still vulnerable.

I think it is an improvement (the race window is smaller)... and we could see how much better it does in the field.

As I mentioned, better yet may be to more aggressively leak these resources (leaking the lock may be enough), as that would "solve" any such shutdown races (and hopefully not create a system resource leak). To do that full fix, you need the dynamic allocation of the lock that this CL removes.

LGTM (since this is at least a partial improvement... but I'd rather see the full fix)
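A minimal sketch of the "leak the lock" full fix described above, under the same hypothetical stand-in names as before (std::mutex again substitutes for Chromium's Lock):

    #include <mutex>

    class Recorder {
     public:
      Recorder() {
        // Allocate once and never delete. No destructor ever tears the lock
        // down, so a straggler thread can always take it safely; the OS
        // reclaims the memory at process exit, but leak checkers such as
        // valgrind will need a suppression for this allocation.
        if (!lock_)
          lock_ = new std::mutex;
      }
      ~Recorder() {
        // Deliberately no `delete lock_` here.
      }

     private:
      static std::mutex* lock_;
    };

    std::mutex* Recorder::lock_ = nullptr;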
It fixes the crash (or maybe just moves it to another location). We should really decide whether we need these metrics or not. If we are OK with losing them, then maybe just don't send them at exit at all?

On 2010/12/14 22:54:37, jar wrote:
> I believe the code is still vulnerable to the same class of races, as an
> unjoined thread can still attempt to use the lock after the static
> destructor has run (the atexit routine does not wait for all threads
> to join). [...]
Personally, I would rather join the thread and wait for the upload. It may slow shutdown, but I see no problem with that.

On 2010/12/15 00:57:47, Vitaly Buka wrote:
> It fixes the crash (or maybe just moves it to another location). [...]
I just checked. The CL definitely does not fix much. Last time it crashed in this same thread, but on a different static Lock.

On 2010/12/15 01:00:11, Vitaly Buka wrote:
> Personally, I would rather join the thread and wait for the upload. It may
> slow shutdown, but I see no problem with that. [...]
Which other static lock?

Maybe we should try the compromise proposed by jar@, which is to allocate the lock and not release it?

BYE
MAD

On Dec. 14, 2010 20:03, <vitalybuka@chromium.org> wrote:
> I just checked. The CL definitely does not fix much. Last time it crashed
> in this same thread, but on a different static Lock. [...]
>
> http://codereview.chromium.org/5784005/
On Tue, Dec 14, 2010 at 17:19, Marc-Andre Decoste <mad@chromium.org> wrote:
> Which other static lock?

I need some time to reproduce.

> Maybe we should try the compromise proposed by jar@, which is to allocate
> the lock and not release it?

I see no benefit in the compromise. Leaking such threads will still produce some noise of crashes. Any reason not to join the thread?

Also, inconsistency in these metrics may create biased results. I am not sure the probability of finishing this task is equal across different computers.
This is one of them; I've seen others.

Lock rollover_lock; in base\time_win.cc

ntdll.dll!_RtlpWaitOnCriticalSection@8() + 0x99 bytes
ntdll.dll!_RtlEnterCriticalSection@4() + 0x168d8 bytes
> ceee_broker.exe!`anonymous namespace'::RolloverProtectedNow() Line 276 C++
ceee_broker.exe!base::TimeTicks::Now() Line 392 + 0x9 bytes C++
ceee_broker.exe!net::CallWinHttpGetProxyForUrl(void * session=0x0073f818, const wchar_t * url=0x0087b868, WINHTTP_AUTOPROXY_OPTIONS * options=0x0428f95c, WINHTTP_PROXY_INFO * results=0x0428f944) Line 30 + 0xb bytes C++
ceee_broker.exe!net::ProxyResolverWinHttp::GetProxyForURL(const GURL & query_url={...}, net::ProxyInfo * results=0x024f0e98, CallbackRunner<Tuple1<int> > * __formal=0x00000000, CallbackRunner<Tuple1<int> > * __formal=0x00000000, CallbackRunner<Tuple1<int> > * __formal=0x00000000) Line 86 + 0x35 bytes C++
ceee_broker.exe!net::MultiThreadedProxyResolver::GetProxyForURLJob::Run(MessageLoop * origin_loop=0x040efe50) Line 267 C++
ceee_broker.exe!RunnableMethod<net::MultiThreadedProxyResolver::SetPacScriptJob,void (__thiscall net::MultiThreadedProxyResolver::SetPacScriptJob::*)(int),Tuple1<int> >::Run() Line 330 + 0xf bytes C++
ceee_broker.exe!MessageLoop::RunTask(Task * task=0x0087a6e0) Line 419 C++
ceee_broker.exe!MessageLoop::DeferOrRunPendingTask(const MessageLoop::PendingTask & pending_task={...}) Line 430 C++
ceee_broker.exe!MessageLoop::DoWork() Line 534 + 0x7 bytes C++
ceee_broker.exe!base::MessagePumpDefault::Run(base::MessagePump::Delegate * delegate=0x0428fb88) Line 50 + 0x11 bytes C++
ceee_broker.exe!MessageLoop::RunInternal() Line 267 C++
ceee_broker.exe!MessageLoop::RunHandler() Line 238 + 0x5 bytes C++
ceee_broker.exe!MessageLoop::Run() Line 217 C++
ceee_broker.exe!base::Thread::Run(MessageLoop * message_loop=0x0428fb88) Line 141 C++
ceee_broker.exe!base::Thread::ThreadMain() Line 167 C++

Thanks.
Vitaly

On Tue, Dec 14, 2010 at 17:36, Vitaly Buka <vitalybuka@google.com> wrote:
> I see no benefit in the compromise. [...]
I am strongly against leaking threads unless you know 100% of the resources they use.

On Tue, Dec 14, 2010 at 17:41, Vitaly Buka <vitalybuka@google.com> wrote:
> This is one of them; I've seen others.
>
> Lock rollover_lock; in base\time_win.cc
> [...]
Ha, I thought you saw other crashes with the same metrics code... Yes, this other one is known too, and has been filed here:
http://code.google.com/p/chromium/issues/detail?id=64388

I don't understand what you mean by joining the thread... I'm not sure it would resolve the issue identified in http://code.google.com/p/chromium/issues/detail?id=38354...

As for the leak, it's only leaking once (and only one object) per process, and only at process exit, so it's not a "real" leak, right? Because process exit will relinquish all system resources anyway, right? Unless I'm missing something...

BYE
MAD

On Tue, Dec 14, 2010 at 8:41 PM, Vitaly Buka <vitalybuka@google.com> wrote:
> This is one of them; I've seen others.
>
> Lock rollover_lock; in base\time_win.cc
> [...]
Sorry, that is a different one.

On Tue, Dec 14, 2010 at 17:59, Marc-Andre Decoste <mad@chromium.org> wrote:
> Ha, I thought you saw other crashes with the same metrics code...
>
> Yes, this other one is known too, and has been filed here:
> http://code.google.com/p/chromium/issues/detail?id=64388
>
> I don't understand what you mean by joining the thread... I'm not sure it
> would resolve the issue identified in
> http://code.google.com/p/chromium/issues/detail?id=38354...

Waiting for the thread. But I am not sure that is possible here; it is possible in issue 64388.

> As for the leak, it's only leaking once (and only one object) per process,
> and only at process exit, so it's not a "real" leak, right? [...]

I am not afraid of leaking resources; it will just sometimes crash while accessing some other destroyed resource, and I hate this kind of bug :-). The real problem is not that the Lock is destroyed. The problem is that this thread is still running when the main thread destroys the application.
> I am not afraid of leaking resources; it will just sometimes crash while
> accessing some other destroyed resource, and I hate this kind of bug :-).
> The real problem is not that the Lock is destroyed. The problem is that this
> thread is still running when the main thread destroys the application.

OK, then maybe we were not seeing the same thing. In the case where I saw this, the thread that was accessing the destroyed histograms_ data was about to be joined, but crashed before the join completed...

BYE
MAD

On Tue, Dec 14, 2010 at 9:13 PM, Vitaly Buka <vitalybuka@google.com> wrote:
> Sorry, that is a different one. [...]
It looks like chrome.dll was unloaded before this moment. Maybe the best fix is to unload the DLL later.

On Tue, Dec 14, 2010 at 18:19, Marc-Andre Decoste <mad@chromium.org> wrote:
> OK, then maybe we were not seeing the same thing. In the case where I saw
> this, the thread that was accessing the destroyed histograms_ data was
> about to be joined, but crashed before the join completed... [...]
LGTM. I don't know the details well enough to keep arguing about this.
The performance of Chrome shutdown precludes joining most threads. We can't afford to wait for all the worker IO threads, as the wait may take an unbounded length of time. We need to be able to shut down quickly so that we can do (for example) a restart. Historically, joining all threads (a.k.a. waiting for all threads to cleanly terminate) has been unacceptable for this reason.

Jim

On Tue, Dec 14, 2010 at 5:00 PM, <vitalybuka@chromium.org> wrote:
> Personally, I would rather join the thread and wait for the upload. It may
> slow shutdown, but I see no problem with that. [...]
>
> http://codereview.chromium.org/5784005/
How about this? BYE MAD
Made one more small change to get the tests happy... BYE MAD
PING? Can I commit this now? Thanks... BYE MAD
Please anticipate that there is a good chance you'll break the build on valgrind and need to land a suppression. As a result, please land in off-hours where you can recover gracefully without a lot of group pain.

Also, please see the nits below. I don't like the replicated code... but I didn't see a good way to factor it out other than as per the nits.

With those nit changes... LGTM

p.s. Extra credit nit repair: you might also consider changing WasStarted() to IsActive(), now that we're showing concern for times after static teardown has taken place and we transition to an inactive state (with histograms_ == NULL). ...but I'll leave it to you as an optional item.

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc
File base/metrics/histogram.cc (right):

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc#n...
base/metrics/histogram.cc:908: lock_ = new Lock;
nit: Please add a comment about the leak here, as this will be the site that gets credit for the allocation and needs the leak suppression.

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc#n...
base/metrics/histogram.cc:968: }
nit: Please replace lines 962-968 with:
  if (!WasStarted())
    return;

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc#n...
base/metrics/histogram.cc:997: }
nit: Please replace lines 991-997 with:
  if (!WasStarted())
    return;
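Pieced together, the shape these nits ask for looks roughly like the sketch below. This is a reconstruction from the review comments, not the actual patch: Record() and HistogramMap are invented stand-ins, and std::mutex again substitutes for Chromium's Lock.

    #include <map>
    #include <mutex>
    #include <string>

    class StatisticsRecorder {
     public:
      typedef std::map<std::string, int> HistogramMap;  // simplified stand-in

      StatisticsRecorder() {
        // Intentionally leaked to avoid shutdown races; this is the
        // allocation site that needs the valgrind leak suppression.
        lock_ = new std::mutex;
        histograms_ = new HistogramMap;
      }

      // Per the extra-credit nit: WasStarted() renamed to IsActive(), since
      // after static teardown we transition to an inactive state.
      static bool IsActive() { return histograms_ != nullptr; }

      static void Record(const std::string& name) {
        if (!IsActive())  // the two-line guard replacing the replicated code
          return;
        std::lock_guard<std::mutex> guard(*lock_);
        (*histograms_)[name]++;
      }

     private:
      static std::mutex* lock_;
      static HistogramMap* histograms_;
    };

    std::mutex* StatisticsRecorder::lock_ = nullptr;
    StatisticsRecorder::HistogramMap* StatisticsRecorder::histograms_ = nullptr;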
Thanks... The linux_valgrind try bot seems happy with the change:
http://build.chromium.org/p/tryserver.chromium/builders/linux_valgrind/builds...

The memory test for base succeeded (there were failures in other tests, but they don't seem related to our change).

Still waiting for Mac to try it out:
http://build.chromium.org/p/tryserver.chromium/builders/mac_valgrind/builds/145

I still need to make the changes you requested (I've been distracted by my migration to Win7)... I'll attempt a commit tomorrow morning (EST) and see how the tree swallows it... and will react appropriately if a problem occurs...

Thanks!
BYE
MAD

On Tue, Dec 21, 2010 at 7:05 PM, <jar@chromium.org> wrote:
> Please anticipate that there is a good chance you'll break the build on
> valgrind and need to land a suppression. [...]
>
> http://codereview.chromium.org/5784005/
Thanks... Addressed all comments, even the extra credit naming nits (I'm a big fan of those :-)... Will send another try to the mac/linux valgrind bots and will commit tomorrow morning if all goes well...

BYE
MAD

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc
File base/metrics/histogram.cc (right):

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc#n...
base/metrics/histogram.cc:908: lock_ = new Lock;
On 2010/12/22 00:05:16, jar wrote:
> nit: Please add a comment about the leak here, as this will be the site that
> gets credit for the allocation and needs the leak suppression.
Done.

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc#n...
base/metrics/histogram.cc:968: }
On 2010/12/22 00:05:16, jar wrote:
> nit: Please replace lines 962-968 with:
>   if (!WasStarted())
>     return;
Done.

http://codereview.chromium.org/5784005/diff/23001/base/metrics/histogram.cc#n...
base/metrics/histogram.cc:997: }
On 2010/12/22 00:05:16, jar wrote:
> nit: Please replace lines 991-997 with:
>   if (!WasStarted())
>     return;
Done.
LGTM (including extra credit for renaming ;-) )