|
|
Created:
6 years, 7 months ago by eroman Modified:
6 years, 7 months ago CC:
blink-reviews, Nils Barth (inactive), arv+blink, jsbell+bindings_chromium.org, sof, kouhei+bindings_chromium.org, marja+watch_chromium.org, blink-reviews-bindings_chromium.org, Nate Chapin Base URL:
svn://svn.chromium.org/blink/trunk Visibility:
Public. |
DescriptionFix crash when ExecutionContext is torn down before a crypto operation has completed.
The issue is that ActiveDOMObjects cannot be deleted as part of a different observer's contextDestroyed().
The solution taken here is to subclass PromiseResolveWithContext so that the deletion is done as part of its contextDestroyed() instead.
BUG=368821, 245025
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=173981
Patch Set 1 #Patch Set 2 : Add layout test #
Total comments: 2
Patch Set 3 : fix comment typo #
Messages
Total messages: 30 (0 generated)
Test?
Thanks, I will try to add a test shortly! It also reproduces using the existing test (crypto/worker-subtle-crypto-concurrent.html) albeit flakily.
Added a corresponding layout test, PTAL
https://codereview.chromium.org/263163006/diff/40001/Source/modules/crypto/Cr... File Source/modules/crypto/CryptoResultImpl.cpp (right): https://codereview.chromium.org/263163006/diff/40001/Source/modules/crypto/Cr... Source/modules/crypto/CryptoResultImpl.cpp:154: virtual void contextDestroyed() OVERRIDE How about adding the functionality to ScriptPromiseResolverWithContext, instead of adding a new subclass? I think the goal of ScriptPromiseResolverWithContext is to take care of this kind of complexity. yhirano@ would know better.
https://codereview.chromium.org/263163006/diff/40001/Source/modules/crypto/Cr... File Source/modules/crypto/CryptoResultImpl.cpp (right): https://codereview.chromium.org/263163006/diff/40001/Source/modules/crypto/Cr... Source/modules/crypto/CryptoResultImpl.cpp:154: virtual void contextDestroyed() OVERRIDE On 2014/05/06 01:44:02, haraken wrote: > > How about adding the functionality to ScriptPromiseResolverWithContext, instead > of adding a new subclass? > > I think the goal of ScriptPromiseResolverWithContext is to take care of this > kind of complexity. yhirano@ would know better. Sure what do you have in mind? Perhaps pass in a delegate pointer that ScriptPromiseResolverWithContext calls on context destruction?
A ScriptPromiseResolverWithContext releases almost all information when resolved, rejected or the associated ExecutionContext is stopped. So I think you don't need to clear the resolver in contextDestroyed. Can you merge PromiseState into CryptoResultImpl?
Let me try and explain the use case a little bit more clearly: When starting a WebCrypto operation, a PromiseResolverWithContext is created to communicate the result back to the JavaScript. Ordinarily the PromiseResolveWithContext is completed, and everything cleans up properly. However, if the ExecutionContext is destroyed before the PromiseResolverWithContext has been completed, then we need to abort the async WebCrypto operation which has already been started (and is running on another thread). Currently, I am registering my own Lifecycle observer on ExecutionContext to be notified of teardown (PromiseState), and using this to cancel the operation (I actually orphan it at the moment rather than cancel). HOWEVER this doesn't work. As a matter of cancelling the outstanding operation, we drop the final reference to PromiseResolverWithContext. Lifecycle notifier does not support removing observers while dispatching, so this causes the crash I am trying to fix. This is further complicated by the fact that the WebCrypto runs on its own thread. Therefore it cannot directly hold a reference to the PromiseResolverWithContext. This is why the reference to the PromiseResolverWithContext is dropped upon completion of the request OR upon destruction of the ExecutionContext. In the case of PromiseResolverWithContext created from a WebWorker thread, destruction of the ExecutionContext means that the thread is being torn down. This is therefore the last opportunity to delete the PromiseResolverWithContext from the correct thread.
Let me summarize to check if I understand correctly. - Resolution and rejection happen on the origin thread. - contextDestroyed is called on the origin thraed. - CryptoResultImpl is created on the origin thread. - CryptoResultImpl is NOT destructed on the origin thread. - PromiseState is needed to clear out states bound to the origin thread when the associated execution context is destroyed.
Adam, do we really need TemporaryChange in LifeCycleNotifier::~LifeCycleNotifier? The function removes all observers and it works correctly even if some observers are destructed and removed in contextDestroyed. If there is no problem, I think removing TemporaryChange from the ~LifeCycleNotifier is the best solution. https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit...
By the way, abandon-crypto-operation.html sometimes crashes in my local environment with PS3. There may be another problem.
Yes exactly! The circumstance when CryptoResultImpl gets destroyed from another thread is when the Promise was created from a WebWorker thread, but this WebWorker thread is stopped before the Promise has been completed. Specifically what happen is: (1) We post a task to the crypto thread, giving it a CryptoResultImpl as its threadsafe handle back into Blink. (2) The WebWorker thread is stopped, and its associated ExecutionContext torn down. (3) The crypto thread finishes its work, and tries to post a task back to the Blink WebWorker thread to complete the Promise. However posting a task now fails, because the thread no longer exists. (4) With nothing left to do, it releases the CryptoResultImpl (from crypto thread). Since step (4) is executing on the crypto thread rather than the blink thread, it is not appropriate for action to release a reference to the ScriptPromiseResolverWithContext. For that reason, I have chosen to release the ScriptPromiseResolverWithContext as part of contextDestroyed(), so if the thread dies before the Promise is completed everything that was bound to that thread has been cleaned up. PromiseState is the class which holds the reference to ScriptPromiseResolverWithContext and is deleted upon request completion, or thread death (ExecutionContext destruction).
Yes I have been debugging a flake issue happening on another of my worker tests, which I think is a different issue (although I haven't figured it out yet). When running the test for multiple iterations I get a crash in: # # Fatal error in ../../v8/src/handles.h, line 48 # CHECK(location_ != NULL) failed # Program received signal SIGILL, Illegal instruction. #0 0x00007ffff7b30362 in v8::internal::OS::Abort () #1 0x00007ffff76adaac in V8_Fatal (file=<optimized out>, #2 0x00007ffff785072b in v8::internal::Isolate::RunMicrotasks ( #3 0x00007ffff766dee1 in v8::Isolate::RunMicrotasks (this=<optimized out>, #4 0x00007fffef800abc in WebCore::(anonymous namespace)::RunMicrotasksTask::performTask (this=0x381011825810, executionContext=0x3810119c56c8) #5 0x00007ffff0f69212 in WebCore::WorkerRunLoopTask::run (this=0x38101180ddd0) #6 0x00007ffff0f671b1 in WebCore::WorkerRunLoop::run (this=0x381011934ab0, queue=..., waitMode=WebCore::WorkerRunLoop::WaitForMessage) #7 0x00007ffff0f66dea in WebCore::WorkerRunLoop::run (this=0x381011934ab0) #8 0x00007ffff0f6ac6c in WebCore::WorkerThread::runEventLoop ( this=0x381011934a90) #9 0x00007ffff0f56b55 in WebCore::DedicatedWorkerThread::runEventLoop ( this=0x381011934a90) #10 0x00007ffff0f6ab2b in WebCore::WorkerThread::workerThread ( this=0x381011934a90) #11 0x00007ffff0f6a835 in WebCore::WorkerThread::workerThreadStart ( thread=0x381011934a90) #12 0x00007fffe6599c60 in WTF::threadEntryPoint (contextData=0x15ca78ef6b20) #13 0x00007fffe659a138 in WTF::wtfThreadEntryPoint (param=0x15ca77a29f20) #14 0x00007fffe7dade9a in start_thread (arg=0x7fffd9d89700) #15 0x00007fffe7ada3fd in clone ()
On 2014/05/08 04:12:37, yhirano wrote: > Adam, do we really need TemporaryChange in > LifeCycleNotifier::~LifeCycleNotifier? Yes, let's keep that please. > The function removes all observers and it works correctly even if some observers > are destructed and removed in contextDestroyed. We also want to prevent new observers from being added during destruction. These ASSERTs make life harder for some clients, but they make the overall system easier to reason about.
I see. > the assersion on ~LifeCycleNotifier Eric, can you try the following? I'm not sure how ugly it will be, but perhaps it will be better than PS3. 1. Remove PromiseResolver 2. Make PromiseState inherit ScriptFunction and add an empty |call| function 3. Remove contextDestroyed from PromiseState 4. Pass the PromiseState to ScriptPromise::then When the ScriptFunction is called or the promise is garbage collected the ScriptFunction will be deleted.
Thanks for the suggestion. I am not quite sure if I am understanding the proposal correctly, but it sounds more involved. Wouldn't it mean: * Re-implementing the logic of PromiseResolver/PromiseResolverWithContext in order to reject/resolve the ScriptPromise, re-enter the V8 context, delay completion when the context is suspended. * Where would the ScriptPromise be deleted from? Would it be done from the ScriptFunction destructor? Is it guaranteed that doing so is safe? (Or can the ScriptFunction be deleted while ScriptPromise code is still on stack?)
In response to comment #11 (that flakiness is still observed with patch 3): I have found 2 additional problems: (1) A race in BlinkPlatformImpl::crypto() initialization (http://crbug.com/371243) (2) Tear down problem with WebWorkers + Promises (http://crubg.com/371566) causing CHECK(location_ != NULL) failure.
>#15, #16 I tried it by myself and it was nasty. PS3 LGTM. The root problem is, you want to do some asynchronous computation but you don't have a means of holding the computation states (PromiseState in this case) naturally. I think this is a general problem and I need to provide some helper class, but it may take a while. If urgent feel free to land this CL (with OWNERs' approval, of course).
Thanks! I will go ahead and land this fix then since I want the flaky crash gone :) As a follow-up I am happy to generalize this or make design changes as you see best for a cleaner solution to the problem. Let's continue that discussion over email or on the bug thread.
The CQ bit was checked by eroman@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/eroman@chromium.org/263163006/60001
FYI, CQ is re-trying this CL (attempt #1). Please consider checking whether the failures are real, and report flakes to chrome-troopers@google.com. The failing builders are: blink_presubmit on tryserver.blink (http://build.chromium.org/p/tryserver.blink/builders/blink_presubmit/builds/5102)
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: blink_presubmit on tryserver.blink (http://build.chromium.org/p/tryserver.blink/builders/blink_presubmit/builds/5108)
abarth or haraken, could you review? Thanks! ** Presubmit ERRORS ** Missing LGTM from an OWNER for these files: Source/bindings/v8/ScriptPromiseResolverWithContext.h
ping @haraken, would you mind taking a look for OWNERS approval?
LGTM
The CQ bit was checked by eroman@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/eroman@chromium.org/263163006/60001
Message was sent while issue was closed.
Change committed as 173981 |