|
|
Created:
6 years, 9 months ago by sof Modified:
6 years, 8 months ago CC:
blink-reviews, eae+blinkwatch, falken, dglazkov+blink, adamk+blink_chromium.org, Inactive, horo+watch_chromium.org, kinuko+watch, rwlbuis, dcheng Base URL:
https://chromium.googlesource.com/chromium/blink.git@master Visibility:
Public. |
DescriptionExplicit notification and removal of lifecycle observers.
To help make disposing of execution contexts cleaner in a setting
where both a context and its lifetime observers are being finalized
by a garbage collector, add support for explicitly clearing out
all lifecycle observers:
protected:
void LifecycleNotifier<T>::removeAndNotifyAllObservers()
Notifying them that the context has been destroyed while doing so.
An operation that was previously (only) done by the LifecycleNotifier
destructor.
Also provide a convenience method over ExecutionContext for clearing
out its observers:
void ExecutionContext::removeAllLifecycleObservers()
The operation is used here when shutting down and finalizing a worker
thread and its global scope -- explicitly severing the connection
of all observers to its execution context before performing the
final garbage collection.
Doing so ensures that all observer objects being finalized during that
garbage collection will not have access to its execution context object.
A heap object that may or may not have been finalized by the garbage
collector already.
R=haraken@chromium.org,ager@chromium.org,abarth@chromium.org
BUG=340522
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=169435
Patch Set 1 #Patch Set 2 : Complete set of removeAndNotifyAllObservers() implementations #Patch Set 3 : Delay notifying thread's worker proxy when dispose()ing #Patch Set 4 : Make LifeCycleNotifier<T>::removeAndNotifyAllObservers() protect:ed #Patch Set 5 : Delay proxy notification #
Total comments: 5
Patch Set 6 : Try to clarify shutdown step sequencing a bit. #
Total comments: 1
Patch Set 7 : Comment improvements #Patch Set 8 : Perform ThreadState detachment last #
Total comments: 2
Messages
Total messages: 35 (0 generated)
When you get the chance, please take a look. I think this is ready now. However, my attempts to also have non-Oilpan code call dispose() followed by clearing the ref-counted pointer to WorkerGlobalScope (in WorkerThread::workerThread()) runs into some a timing-sensitive shutdown crash on linux_blink bots. Unable to repro locally at all, but it suggests there is some ref-counting imbalance problem hiding somewhere on trunk. Most frustrating. Hence, this CL keeps the shutdown sequence for non-Oilpan builds unaltered.
On 2014/03/12 11:03:35, sof wrote: > When you get the chance, please take a look. > > I think this is ready now. > > However, my attempts to also have non-Oilpan code call dispose() > followed by clearing the ref-counted pointer to WorkerGlobalScope > (in WorkerThread::workerThread()) runs into some a timing-sensitive > shutdown crash on linux_blink bots. Unable to repro locally at all, > but it suggests there is some ref-counting imbalance problem > hiding somewhere on trunk. Most frustrating. > > Hence, this CL keeps the shutdown sequence for non-Oilpan builds > unaltered. Just to clarify, it is a single test that occasionally behaves flakily (worker-terminate.html) -- http://build.chromium.org/p/tryserver.chromium/builders/linux_blink/builds/16236
On 2014/03/12 11:48:08, sof wrote: > On 2014/03/12 11:03:35, sof wrote: ... > > Just to clarify, it is a single test that occasionally behaves flakily > (worker-terminate.html) -- > http://build.chromium.org/p/tryserver.chromium/builders/linux_blink/builds/16236 The problem that this non-Oilpan test showed up has been addressed in the latest patchset.
LGTM. abarth: would you take another look? https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... File Source/core/workers/WorkerThread.cpp (right): https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... Source/core/workers/WorkerThread.cpp:147: ThreadState::detach(); Shall we move ThreadState::detach() at the very end of this method? A thread should be detached at the very end of the thread execution.
https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... File Source/core/workers/WorkerThread.cpp (right): https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... Source/core/workers/WorkerThread.cpp:147: ThreadState::detach(); On 2014/03/17 01:22:56, haraken wrote: > > Shall we move ThreadState::detach() at the very end of this method? A thread > should be detached at the very end of the thread execution. As far as I read your comment in https://codereview.chromium.org/196033004/#msg6, probably do you need to call ThreadState::detach() before workerReportingProxy().workerGlobalScopeDestroyed()? Then I might want to have a comment about that.
LGTM I agree with Haraken that it would be nice to have ThreadState::detach() as the last call. If we cannot do that I would like to understand why. :-) https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... File Source/core/workers/WorkerGlobalScope.cpp (right): https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... Source/core/workers/WorkerGlobalScope.cpp:200: removeAllLifecycleObservers(); This seems subtle enough that we should probably have a comment? The observers for the worker global scope access the thread member. Therefore, they have to be notified before clearing that member.
PS: The description of the issue is a little misleading. "Doing so ensures that all observer objects being finalized during that garbage collection will not have access to its execution context object. A heap object that may or may not have been finalized by the garbage collector already." The last part is not true as far as I can tell? There is no actual use after free involved in the thread member issue. The ContextLifecycleObserver pattern works just fine and none of the observers will access the worker global scope after it has been destructed. However, the observer code will not run before either the observer or the WorkerGlobalProxy dies. Therefore, we cannot clear out the m_thread member before the WorkerGlobalProxy is dead (at which point all of the observers have run). This patch makes it possible to clear out m_thread at the point where we know that the worker global proxy will die by making sure to run all the observer code at that point (instead of as part of the destructor) and then clear out the thread member.
On 2014/03/17 07:00:57, Mads Ager (chromium) wrote: > PS: The description of the issue is a little misleading. > > "Doing so ensures that all observer objects being finalized during that > garbage collection will not have access to its execution context object. > A heap object that may or may not have been finalized by the garbage > collector already." > > The last part is not true as far as I can tell? There is no actual use after > free involved in the thread member issue. The ContextLifecycleObserver pattern > works just fine and none of the observers will access the worker global scope > after it has been destructed. However, the observer code will not run before > either the observer or the WorkerGlobalProxy dies. Therefore, we cannot clear > out the m_thread member before the WorkerGlobalProxy is dead (at which point all > of the observers have run). This patch makes it possible to clear out m_thread > at the point where we know that the worker global proxy will die by making sure > to run all the observer code at that point (instead of as part of the > destructor) and then clear out the thread member. That is all correct and what I was trying to convey; there is no use-after-free bug lurking here (at least that we know of.) You avoid the situation where an observer's destructor may have access to the to-be-finalized context of the WorkerGlobalScope. e.g., ~ActiveDOMObject() may or may not peek at the context's thread object. Better to avoid such non-determinism.
On 2014/03/17 06:46:14, Mads Ager (chromium) wrote: > LGTM > > I agree with Haraken that it would be nice to have ThreadState::detach() as the > last call. If we cannot do that I would like to understand why. :-) > I didn't read his comment that way :) Just that the proxy notification has to be done after we've thoroughly cleaned out the heap (and its WorkerGlobalScope.) Which I agree with; doing ThreadState::detach() last and after the proxy notification sounds like status-quo -- you run the risk of process/thread scheduling letting that notification be responded to before ThreadState::detach() has completed & the global scope will in some cases be attempted destroyed once more. (That is how I read what https://codereview.chromium.org/196033004/ is running into.)
https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... File Source/core/workers/WorkerGlobalScope.cpp (right): https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... Source/core/workers/WorkerGlobalScope.cpp:200: removeAllLifecycleObservers(); On 2014/03/17 06:46:14, Mads Ager (chromium) wrote: > This seems subtle enough that we should probably have a comment? The observers > for the worker global scope access the thread member. Therefore, they have to be > notified before clearing that member. Yes, tried to clarify the reason for explicitly doing it right here. https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... File Source/core/workers/WorkerThread.cpp (right): https://codereview.chromium.org/194923007/diff/80001/Source/core/workers/Work... Source/core/workers/WorkerThread.cpp:147: ThreadState::detach(); On 2014/03/17 01:25:45, haraken wrote: > On 2014/03/17 01:22:56, haraken wrote: > > > > Shall we move ThreadState::detach() at the very end of this method? A thread > > should be detached at the very end of the thread execution. > > As far as I read your comment in > https://codereview.chromium.org/196033004/#msg6, probably do you need to call > ThreadState::detach() before > workerReportingProxy().workerGlobalScopeDestroyed()? Then I might want to have a > comment about that. Added a comment, which might address this? (not sure..) Notification has to come after the (ThreadState) detach.
> > > Shall we move ThreadState::detach() at the very end of this method? A thread > > > should be detached at the very end of the thread execution. > > > > As far as I read your comment in > > https://codereview.chromium.org/196033004/#msg6, probably do you need to call > > ThreadState::detach() before > > workerReportingProxy().workerGlobalScopeDestroyed()? Then I might want to have > a > > comment about that. > > Added a comment, which might address this? (not sure..) Notification has to come > after the (ThreadState) detach. LGTM. (As mads mentioned, I wanted to understand why we cannot put ThreadState::detach() as the last call.)
On 2014/03/17 07:38:10, sof wrote: > On 2014/03/17 06:46:14, Mads Ager (chromium) wrote: > > LGTM > > > > I agree with Haraken that it would be nice to have ThreadState::detach() as > the > > last call. If we cannot do that I would like to understand why. :-) > > > > I didn't read his comment that way :) Just that the proxy notification has to be > done after we've thoroughly cleaned out the heap (and its WorkerGlobalScope.) > > Which I agree with; doing ThreadState::detach() last and after the proxy > notification sounds like status-quo -- you run the risk of process/thread > scheduling letting that notification be responded to before > ThreadState::detach() has completed & the global scope will in some cases be > attempted destroyed once more. (That is how I read what > https://codereview.chromium.org/196033004/ is running into.) I think it is fine to have thread state detach at the end here. The problem https://codereview.chromium.org/196033004/ addressed was in this code itself and not a destructor of a heap allocated object. The problem was the m_workerGlobalScope = nullptr assignment after notification. m_workerGlobalScope is a persistent handle owned by the main thread. Therefore, if you notify before doing the assignement to m_workerGlobalScope, the main thread could respond to the notification, delete the thread object which deletes the persistent and the nullptr assignment will be a use-after-free. However, as far as I can tell the detach call should be safe to have where it is now or at the end. I personally like having it at the end because it is then completely clear that this thread is done with all it wants to do, but I don't mind the current placement either. :)
Still LGTM! https://codereview.chromium.org/194923007/diff/100001/Source/core/workers/Wor... File Source/core/workers/WorkerThread.cpp (right): https://codereview.chromium.org/194923007/diff/100001/Source/core/workers/Wor... Source/core/workers/WorkerThread.cpp:141: // The below assignment will destroy the context, which will in turn notify messaging proxy. This comment should be updated now that the notification is explicit below.
On 2014/03/17 08:20:37, Mads Ager (chromium) wrote: .. > > The problem was the m_workerGlobalScope = nullptr assignment after notification. > m_workerGlobalScope is a persistent handle owned by the main thread. Therefore, > if you notify before doing the assignement to m_workerGlobalScope, the main > thread could respond to the notification, delete the thread object which deletes > the persistent and the nullptr assignment will be a use-after-free. However, as > far as I can tell the detach call should be safe to have where it is now or at > the end. I personally like having it at the end because it is then completely > clear that this thread is done with all it wants to do, but I don't mind the > current placement either. :) Thanks, I see, so those are the object relationships at play here. Then as long as we don't touch the WorkerThread object after notification, this should work out fine. And ThreadState::detach() won't do that. I've consequently moved ThreadState::detach() last.
https://codereview.chromium.org/194923007/diff/140001/Source/platform/Lifecyc... File Source/platform/LifecycleNotifier.h (right): https://codereview.chromium.org/194923007/diff/140001/Source/platform/Lifecyc... Source/platform/LifecycleNotifier.h:51: virtual ~LifecycleNotifier(); Does this mean we can make the destructor non-virtual?
https://codereview.chromium.org/194923007/diff/140001/Source/platform/Lifecyc... File Source/platform/LifecycleNotifier.h (right): https://codereview.chromium.org/194923007/diff/140001/Source/platform/Lifecyc... Source/platform/LifecycleNotifier.h:51: virtual ~LifecycleNotifier(); On 2014/03/17 22:48:05, abarth wrote: > Does this mean we can make the destructor non-virtual? That's not readily doable in practice, as the notifier also keeps virtual methods. Compilers do to warn noisily when generating code for 'delete' over such classes.
The CQ bit was checked by sigbjornf@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sigbjornf@opera.com/194923007/140001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: tryserver.blink on blink_presubmit
On 2014/03/18 08:46:58, I haz the power (commit-bot) wrote: > Try jobs failed on following builders: > tryserver.blink on blink_presubmit Needs Source/platform/ owner approval.
rslgtm
The CQ bit was checked by sigbjornf@opera.com
On 2014/03/18 09:26:37, tkent wrote: > rslgtm thanks!
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sigbjornf@opera.com/194923007/140001
Message was sent while issue was closed.
Change committed as 169435
Message was sent while issue was closed.
Some problems have surfaced as a result of this one, http://test-results.appspot.com/dashboards/flakiness_dashboard.html#group=%40... Reverting to address.
Message was sent while issue was closed.
A revert of this CL has been created in https://codereview.chromium.org/203393002/ by sigbjornf@opera.com. The reason for reverting is: http://test-results.appspot.com/dashboards/flakiness_dashboard.html#group=%40....
Message was sent while issue was closed.
The LifecycleNotifier changes here won't be needed, as we're unable to strictly finalize/dispose WorkerGlobalScopes in the general case. https://codereview.chromium.org/203233004/ makes the necessary shutdown changes.
Message was sent while issue was closed.
On 2014/03/18 14:45:21, sof wrote: > The LifecycleNotifier changes here won't be needed, as we're unable to strictly > finalize/dispose WorkerGlobalScopes in the general case. > > https://codereview.chromium.org/203233004/ makes the necessary shutdown changes. I'm sorry for asking you the same question as before, but I'd be happy if you could help me understand: Why can't we call 'm_thread = 0' in WorkerGlobalScope::dispose()? Why does the following logic not work? void WorkerGlobalScope::dispose() { ...; willDestroyContext(); // This clears m_executionContext of ActiveDOMObjects associated with this WorkerGlobalScope. m_thread = 0; }
Message was sent while issue was closed.
On 2014/04/16 13:01:28, haraken wrote: > On 2014/03/18 14:45:21, sof wrote: > > The LifecycleNotifier changes here won't be needed, as we're unable to > strictly > > finalize/dispose WorkerGlobalScopes in the general case. > > > > https://codereview.chromium.org/203233004/ makes the necessary shutdown > changes. > > I'm sorry for asking you the same question as before, but I'd be happy if you > could help me understand: > > Why can't we call 'm_thread = 0' in WorkerGlobalScope::dispose()? Why does the > following logic not work? > > void WorkerGlobalScope::dispose() { > ...; > willDestroyContext(); // This clears m_executionContext of ActiveDOMObjects > associated with this WorkerGlobalScope. > m_thread = 0; > } That runs into problems with objects that need to keep the execution context alive until they really are destructed -- the 2nd paragraph of ager's fine description in https://codereview.chromium.org/203233004/ explains in some detail. Overall, that description captures the offline discussion we had about why the current shutdown sequence is the one needed. If you can make it simpler, that'd be (again) awesome, of course :-)
Message was sent while issue was closed.
On 2014/04/16 13:38:33, sof wrote: > On 2014/04/16 13:01:28, haraken wrote: > > On 2014/03/18 14:45:21, sof wrote: > > > The LifecycleNotifier changes here won't be needed, as we're unable to > > strictly > > > finalize/dispose WorkerGlobalScopes in the general case. > > > > > > https://codereview.chromium.org/203233004/ makes the necessary shutdown > > changes. > > > > I'm sorry for asking you the same question as before, but I'd be happy if you > > could help me understand: > > > > Why can't we call 'm_thread = 0' in WorkerGlobalScope::dispose()? Why does the > > following logic not work? > > > > void WorkerGlobalScope::dispose() { > > ...; > > willDestroyContext(); // This clears m_executionContext of ActiveDOMObjects > > associated with this WorkerGlobalScope. > > m_thread = 0; > > } > > That runs into problems with objects that need to keep the execution context > alive until they really are destructed -- the 2nd paragraph of ager's fine > description in https://codereview.chromium.org/203233004/ explains in some > detail. Overall, that description captures the offline discussion we had about > why the current shutdown sequence is the one needed. > > If you can make it simpler, that'd be (again) awesome, of course :-) If an ActiveDOMObject is the only problematic case, I guess that it's not a problem. ~ActiveDOMObject() has the following code: ActiveDOMObject::~ActiveDOMObject() { // ActiveDOMObject may be inherited by a sub-class whose life-cycle // exceeds that of the associated ExecutionContext. In those cases, // m_executionContext would/should have been nullified by // ContextLifecycleObserver::contextDestroyed() (which we implement / // inherit). Hence, we should ensure that this is not 0 before use it // here. if (!executionContext()) return; ASSERT(m_suspendIfNeededCalled); ASSERT(executionContext()->isContextThread()); } If we clear LifecycleObserver::m_lifecycleContext in willDestroyContext(), ~ActiveDOMObject() will hit the "if (!executionContext())" branch and return immediately.
Message was sent while issue was closed.
On 2014/04/16 14:13:35, haraken wrote: > On 2014/04/16 13:38:33, sof wrote: > > On 2014/04/16 13:01:28, haraken wrote: > > > On 2014/03/18 14:45:21, sof wrote: > > > > The LifecycleNotifier changes here won't be needed, as we're unable to > > > strictly > > > > finalize/dispose WorkerGlobalScopes in the general case. > > > > > > > > https://codereview.chromium.org/203233004/ makes the necessary shutdown > > > changes. > > > > > > I'm sorry for asking you the same question as before, but I'd be happy if > you > > > could help me understand: > > > > > > Why can't we call 'm_thread = 0' in WorkerGlobalScope::dispose()? Why does > the > > > following logic not work? > > > > > > void WorkerGlobalScope::dispose() { > > > ...; > > > willDestroyContext(); // This clears m_executionContext of > ActiveDOMObjects > > > associated with this WorkerGlobalScope. > > > m_thread = 0; > > > } > > > > That runs into problems with objects that need to keep the execution context > > alive until they really are destructed -- the 2nd paragraph of ager's fine > > description in https://codereview.chromium.org/203233004/ explains in some > > detail. Overall, that description captures the offline discussion we had about > > why the current shutdown sequence is the one needed. > > > > If you can make it simpler, that'd be (again) awesome, of course :-) > > If an ActiveDOMObject is the only problematic case, I guess that it's not a > problem. ~ActiveDOMObject() has the following code: > > ActiveDOMObject::~ActiveDOMObject() { > // ActiveDOMObject may be inherited by a sub-class whose life-cycle > // exceeds that of the associated ExecutionContext. In those cases, > // m_executionContext would/should have been nullified by > // ContextLifecycleObserver::contextDestroyed() (which we implement / > // inherit). Hence, we should ensure that this is not 0 before use it > // here. > if (!executionContext()) > return; > ASSERT(m_suspendIfNeededCalled); > ASSERT(executionContext()->isContextThread()); > } > > If we clear LifecycleObserver::m_lifecycleContext in willDestroyContext(), > ~ActiveDOMObject() will hit the "if (!executionContext())" branch and return > immediately. It's not the only issue (I thought it was when going ahead with this CL) -- cf SQLCallbackWrapper that that description mentions uses the execution context to control where to shut down.
Message was sent while issue was closed.
On 2014/04/16 14:17:19, sof wrote: > On 2014/04/16 14:13:35, haraken wrote: > > On 2014/04/16 13:38:33, sof wrote: > > > On 2014/04/16 13:01:28, haraken wrote: > > > > On 2014/03/18 14:45:21, sof wrote: > > > > > The LifecycleNotifier changes here won't be needed, as we're unable to > > > > strictly > > > > > finalize/dispose WorkerGlobalScopes in the general case. > > > > > > > > > > https://codereview.chromium.org/203233004/ makes the necessary shutdown > > > > changes. > > > > > > > > I'm sorry for asking you the same question as before, but I'd be happy if > > you > > > > could help me understand: > > > > > > > > Why can't we call 'm_thread = 0' in WorkerGlobalScope::dispose()? Why does > > the > > > > following logic not work? > > > > > > > > void WorkerGlobalScope::dispose() { > > > > ...; > > > > willDestroyContext(); // This clears m_executionContext of > > ActiveDOMObjects > > > > associated with this WorkerGlobalScope. > > > > m_thread = 0; > > > > } > > > > > > That runs into problems with objects that need to keep the execution context > > > alive until they really are destructed -- the 2nd paragraph of ager's fine > > > description in https://codereview.chromium.org/203233004/ explains in some > > > detail. Overall, that description captures the offline discussion we had > about > > > why the current shutdown sequence is the one needed. > > > > > > If you can make it simpler, that'd be (again) awesome, of course :-) > > > > If an ActiveDOMObject is the only problematic case, I guess that it's not a > > problem. ~ActiveDOMObject() has the following code: > > > > ActiveDOMObject::~ActiveDOMObject() { > > // ActiveDOMObject may be inherited by a sub-class whose life-cycle > > // exceeds that of the associated ExecutionContext. In those cases, > > // m_executionContext would/should have been nullified by > > // ContextLifecycleObserver::contextDestroyed() (which we implement / > > // inherit). Hence, we should ensure that this is not 0 before use it > > // here. > > if (!executionContext()) > > return; > > ASSERT(m_suspendIfNeededCalled); > > ASSERT(executionContext()->isContextThread()); > > } > > > > If we clear LifecycleObserver::m_lifecycleContext in willDestroyContext(), > > ~ActiveDOMObject() will hit the "if (!executionContext())" branch and return > > immediately. > > It's not the only issue (I thought it was when going ahead with this CL) -- cf > SQLCallbackWrapper that that description mentions uses the execution context to > control where to shut down. ah, this looks like a real issue. Thanks for the clarification! |