|
|
Created:
7 years, 6 months ago by vcarbune.chromium Modified:
7 years, 6 months ago CC:
blink-reviews, feature-media-reviews_chromium.org, dglazkov+blink, eae+blinkwatch, adamk+blink_chromium.org Base URL:
svn://svn.chromium.org/blink/trunk Visibility:
Public. |
DescriptionIf an event is created using as target an HTMLMediaElement which is
currently being deleted it becomes a heap-use-after free situation.
The GenericEventQueue instance is already owned by the HTMLMediaElement,
and there already is an underlying mechanism to set the target of the
event to NULL, if their target is owner of the queue.
In order to avoid creating this reference in the first place, we enqueue
the event with a NULL target to defer the refcount increment until the
timer for dispatching the event happens (which won't happen at all if
garbage collection is already destroying the objects).
BUG=243045
R=inferno,acolwell
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=151692
Patch Set 1 #
Total comments: 1
Patch Set 2 : Update #Patch Set 3 : Build fix #Patch Set 4 : Rebased #
Total comments: 2
Patch Set 5 : Fixed build error again #
Messages
Total messages: 28 (0 generated)
I did try thinking about alternatives to this, but this seems to be the most straightforward one. One other thing is that I've had troubles reproducing this using testRunner. The solution that seemed to work is to reload it at least 5 times, and each time it crashed at least once. This makes it slow. We could remove this behavior from the test but if the issue ever appears again, we'd likely end with a flaky test.
I don't know this code well enough to review this. Adding Adam and Eric.
https://codereview.chromium.org/15739014/diff/1/Source/core/html/HTMLMediaEle... File Source/core/html/HTMLMediaElement.cpp (right): https://codereview.chromium.org/15739014/diff/1/Source/core/html/HTMLMediaEle... Source/core/html/HTMLMediaElement.cpp:521: m_asyncEventQueue->enqueueEventWithoutTarget(event.release()); This doesn't seem right to me. I believe we do want to set the target in most cases. If ~HTMLMediaElement() has run shouldn't we just avoid scheduling any events at all? What about deleting m_asyncEventQueue in the destructor or adding a m_inDestructor member variable like MediaPlayer has?
Can you help me understand how an event is fired during HTMLMediaElement destructor? That seems fishy.
On 2013/05/30 17:18:09, Dimitri Glazkov wrote: > Can you help me understand how an event is fired during HTMLMediaElement > destructor? That seems fishy. No, it's not the event being dispatched, it's being enqueued in the queue to be dispatched later (and setting the target of the event does the memory violation) If you look at the stack trace from the issue, the issue is that HTMLTrackElement::removedFrom triggers active processing (including event enqueueing) from HTMLMediaElement::didRemoveTrack.
On 2013/05/30 17:44:34, vcarbune.chromium wrote: > > If you look at the stack trace from the issue, the issue is that > HTMLTrackElement::removedFrom triggers active processing (including > event enqueueing) from HTMLMediaElement::didRemoveTrack. I see. I think acolwell has the right general idea here. Could we get away with just calling GenericEventQueue::close as soon as we enter ~HTMLMediaElement?
On 2013/05/30 17:52:24, Dimitri Glazkov wrote: > On 2013/05/30 17:44:34, vcarbune.chromium wrote: > > > > If you look at the stack trace from the issue, the issue is that > > HTMLTrackElement::removedFrom triggers active processing (including > > event enqueueing) from HTMLMediaElement::didRemoveTrack. > > I see. I think acolwell has the right general idea here. Could we get away with > just calling GenericEventQueue::close as soon as we enter ~HTMLMediaElement? It's not enough to close the event queue. The problem is the event creation and setting its target, which happens in HTMLMediaElement::scheduleEvent (and it triggers a reference count increase, and that's where the assert fails). I don't seem to be able to get it working with setting a variable m_inDestructor either, at least apparently even though m_deletionHasBegun == true, the destructor has not yet been called.
On 2013/05/30 23:06:50, vcarbune.chromium wrote: > On 2013/05/30 17:52:24, Dimitri Glazkov wrote: > > On 2013/05/30 17:44:34, vcarbune.chromium wrote: > > > > > > If you look at the stack trace from the issue, the issue is that > > > HTMLTrackElement::removedFrom triggers active processing (including > > > event enqueueing) from HTMLMediaElement::didRemoveTrack. > > > > I see. I think acolwell has the right general idea here. Could we get away > with > > just calling GenericEventQueue::close as soon as we enter ~HTMLMediaElement? > > It's not enough to close the event queue. The problem is the event creation and > setting > its target, which happens in HTMLMediaElement::scheduleEvent (and it triggers a > reference > count increase, and that's where the assert fails). > > I don't seem to be able to get it working with setting a variable m_inDestructor > either, > at least apparently even though m_deletionHasBegun == true, the destructor has > not yet > been called. Something like https://codereview.chromium.org/15959010/ doesn't fix this problem? I tried running the test you provided, but it didn't generate the crash for me so I couldn't verify the fix.
On 2013/05/31 00:01:43, acolwell wrote: > On 2013/05/30 23:06:50, vcarbune.chromium wrote: > > On 2013/05/30 17:52:24, Dimitri Glazkov wrote: > > > On 2013/05/30 17:44:34, vcarbune.chromium wrote: > > > > > > > > If you look at the stack trace from the issue, the issue is that > > > > HTMLTrackElement::removedFrom triggers active processing (including > > > > event enqueueing) from HTMLMediaElement::didRemoveTrack. > > > > > > I see. I think acolwell has the right general idea here. Could we get away > > with > > > just calling GenericEventQueue::close as soon as we enter ~HTMLMediaElement? > > > > It's not enough to close the event queue. The problem is the event creation > and > > setting > > its target, which happens in HTMLMediaElement::scheduleEvent (and it triggers > a > > reference > > count increase, and that's where the assert fails). > > > > I don't seem to be able to get it working with setting a variable > m_inDestructor > > either, > > at least apparently even though m_deletionHasBegun == true, the destructor has > > not yet > > been called. > > Something like https://codereview.chromium.org/15959010/ doesn't fix this > problem? I tried running the test you provided, but it didn't generate the crash > for me so I couldn't verify the fix. Yes, I did exactly that and it didn't fix it. The test I added generates the crash on my (debug) linux build both when running it within the browser and using the drt. I'll look again through it the first thing tomorrow.
On 2013/05/31 00:04:41, vcarbune.chromium wrote: > On 2013/05/31 00:01:43, acolwell wrote: > > On 2013/05/30 23:06:50, vcarbune.chromium wrote: > > > On 2013/05/30 17:52:24, Dimitri Glazkov wrote: > > > > On 2013/05/30 17:44:34, vcarbune.chromium wrote: > > > > > > > > > > If you look at the stack trace from the issue, the issue is that > > > > > HTMLTrackElement::removedFrom triggers active processing (including > > > > > event enqueueing) from HTMLMediaElement::didRemoveTrack. > > > > > > > > I see. I think acolwell has the right general idea here. Could we get away > > > with > > > > just calling GenericEventQueue::close as soon as we enter > ~HTMLMediaElement? > > > > > > It's not enough to close the event queue. The problem is the event creation > > and > > > setting > > > its target, which happens in HTMLMediaElement::scheduleEvent (and it > triggers > > a > > > reference > > > count increase, and that's where the assert fails). > > > > > > I don't seem to be able to get it working with setting a variable > > m_inDestructor > > > either, > > > at least apparently even though m_deletionHasBegun == true, the destructor > has > > > not yet > > > been called. > > > > Something like https://codereview.chromium.org/15959010/ doesn't fix this > > problem? I tried running the test you provided, but it didn't generate the > crash > > for me so I couldn't verify the fix. > > Yes, I did exactly that and it didn't fix it. The test I added generates the > crash on my (debug) linux build both when running it within the browser and > using the drt. I'll look again through it the first thing tomorrow. In the car ride home, I realized that HTMLMediaElement::stop() probably needs a m_asyncEventQueue->close() call so that events can't get queued between the stop() call and destruction. I bet that might also be a reason we are getting into trouble.
On 2013/05/31 02:13:38, acolwell wrote: > On 2013/05/31 00:04:41, vcarbune.chromium wrote: > > On 2013/05/31 00:01:43, acolwell wrote: > > > On 2013/05/30 23:06:50, vcarbune.chromium wrote: > > > > On 2013/05/30 17:52:24, Dimitri Glazkov wrote: > > > > > On 2013/05/30 17:44:34, vcarbune.chromium wrote: > > > > > > > > > > > > If you look at the stack trace from the issue, the issue is that > > > > > > HTMLTrackElement::removedFrom triggers active processing (including > > > > > > event enqueueing) from HTMLMediaElement::didRemoveTrack. > > > > > > > > > > I see. I think acolwell has the right general idea here. Could we get > away > > > > with > > > > > just calling GenericEventQueue::close as soon as we enter > > ~HTMLMediaElement? > > > > > > > > It's not enough to close the event queue. The problem is the event > creation > > > and > > > > setting > > > > its target, which happens in HTMLMediaElement::scheduleEvent (and it > > triggers > > > a > > > > reference > > > > count increase, and that's where the assert fails). > > > > > > > > I don't seem to be able to get it working with setting a variable > > > m_inDestructor > > > > either, > > > > at least apparently even though m_deletionHasBegun == true, the destructor > > has > > > > not yet > > > > been called. > > > > > > Something like https://codereview.chromium.org/15959010/ doesn't fix this > > > problem? I tried running the test you provided, but it didn't generate the > > crash > > > for me so I couldn't verify the fix. > > > > Yes, I did exactly that and it didn't fix it. The test I added generates the > > crash on my (debug) linux build both when running it within the browser and > > using the drt. I'll look again through it the first thing tomorrow. > > In the car ride home, I realized that HTMLMediaElement::stop() probably needs a > m_asyncEventQueue->close() call so that events can't get queued between the > stop() call and destruction. I bet that might also be a reason we are getting > into trouble. That's certainly something we need to add in the code, but it doesn't fix the current issue (note that in the crash stack trace there's no call to stop()). There's also a behaviour issue I've been wondering about since I started looking into this: because updateActiveTextTrackCues is responsible of dispatching cue enter, exit and change events, it makes little sense to dispatch these events when the track has just been removed from the media element. In fact, each cue of the track that was active is set silently (without an event dispatch) as inactive in removeTrack() as a result of another security issue solved in WebKit. Shortly, resuming the options that can fix the problem (at least on my local linux debug build): 1) Exit early from updateActiveTextTrackCues if the call comes from didRemoveTrack 2) Append events without setting their target to owner, to avoid reference count increases Even though I did 1) and no tests have failed at all, I have some concerns about maybe unexpected behavior if fix is merged in earlier release branches. However, 2) ensures the same behavior from a functionality perspective, but feels more like a hack. The other problem I can't really understand is why m_deletionHasBegun = true and destructor not called.
On 2013/05/31 13:17:34, vcarbune.chromium wrote: > Shortly, resuming the options that can fix the problem (at least on my local > linux debug build): > 1) Exit early from updateActiveTextTrackCues if the call comes from > didRemoveTrack > 2) Append events without setting their target to owner, to avoid reference count > increases > > Even though I did 1) and no tests have failed at all, I have some concerns about > maybe unexpected behavior if fix is merged in earlier release branches. However, > 2) ensures the same behavior from a functionality perspective, but feels more > like a hack. > > The other problem I can't really understand is why m_deletionHasBegun = true and > destructor not called. After digging around a bunch I discovered that m_deletionHasBegun is getting set in addChildNodesToDeletionQueue()(https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/dom/ContainerNodeAlgorithms.h&q=addChildNodesToDeletionQueue&sq=package:chromium&type=cs&l=173) This code is really strange because the behavior is totally different depending on whether something has a reference to HTMLMediaElement or not. If something has a reference then HTMLMediaElement::removedFrom() will get called so it could theoretically do something to avoid scheduling these events. If nothing has a reference to HTMLMediaElement, then it never gets a notification that it is being removed. Which situation you are in appears to be completely timing dependent which is why the test appears to be flaky and requires multiple loads to trigger the issue. I'm actually surprised that HTMLMediaElement doesn't receive some sort of notification that it is being removed from the page and doesn't have its ActiveDOMObject::stop() override called before being destructed. Unfortunately I'm not an expert in those areas so I don't know how to guide you there. Now that I see what is going on, I'm more comfortable with your fix. IIUC it essentially defers the target refcount increment until GenericEventQueue::timerFired() runs. timerFired() will never end up running in this situation because the queue gets destroyed when the element is destroyed. Instead of adding the new method to GenericEventQueue, I think you should just remove the ASSERT() that verifies that the target is set. I think it is easier to understand that not setting the target will simply fire the event at the queue's owner instead of trying to figure out when it is appropriate to call the NoTarget varient.
On 2013/05/31 22:54:20, acolwell wrote: > On 2013/05/31 13:17:34, vcarbune.chromium wrote: > > Shortly, resuming the options that can fix the problem (at least on my local > > linux debug build): > > 1) Exit early from updateActiveTextTrackCues if the call comes from > > didRemoveTrack > > 2) Append events without setting their target to owner, to avoid reference > count > > increases > > > > Even though I did 1) and no tests have failed at all, I have some concerns > about > > maybe unexpected behavior if fix is merged in earlier release branches. > However, > > 2) ensures the same behavior from a functionality perspective, but feels more > > like a hack. > > > > The other problem I can't really understand is why m_deletionHasBegun = true > and > > destructor not called. > After digging around a bunch I discovered that m_deletionHasBegun is getting set > in > addChildNodesToDeletionQueue()(https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/dom/ContainerNodeAlgorithms.h&q=addChildNodesToDeletionQueue&sq=package:chromium&type=cs&l=173) > > > This code is really strange because the behavior is totally different depending > on whether something has a reference to HTMLMediaElement or not. If something > has a reference then HTMLMediaElement::removedFrom() will get called so it could > theoretically do something to avoid scheduling these events. If nothing has a > reference to HTMLMediaElement, then it never gets a notification that it is > being removed. Which situation you are in appears to be completely timing > dependent which is why the test appears to be flaky and requires multiple loads > to trigger the issue. > > I'm actually surprised that HTMLMediaElement doesn't receive some sort of > notification that it is being removed from the page and doesn't have its > ActiveDOMObject::stop() override called before being destructed. Unfortunately > I'm not an expert in those areas so I don't know how to guide you there. Thank you for taking the time and looking through this part of the code, completing the overview of this whole picture. I don't feel familiar enough either, but definitely there's something confusing happening there. > Now that I see what is going on, I'm more comfortable with your fix. IIUC it > essentially defers the target refcount increment until > GenericEventQueue::timerFired() runs. timerFired() will never end up running in > this situation because the queue gets destroyed when the element is destroyed. > Instead of adding the new method to GenericEventQueue, I think you should just > remove the ASSERT() that verifies that the target is set. I think it is easier > to understand that not setting the target will simply fire the event at the > queue's owner instead of trying to figure out when it is appropriate to call the > NoTarget varient. OK, I updated the code and description to match your suggestion. Even though we fixed this issue, I still think that updateActiveTextTrackCues shouldn't be called at all while the element is being removed from it's parent. But probably that also needs some correction in the spec. Anyway, I'll probably open a different issue.
On 2013/06/01 10:49:02, vcarbune.chromium wrote: > Thank you for taking the time and looking through this part of the code, > completing the overview of this whole picture. I don't feel familiar enough > either, but definitely there's something confusing happening there. No problem. I needed to dig deeper to make sure I was providing helpful feedback. ;) Thanks for fixing this. > > Now that I see what is going on, I'm more comfortable with your fix. IIUC it > > essentially defers the target refcount increment until > > GenericEventQueue::timerFired() runs. timerFired() will never end up running > in > > this situation because the queue gets destroyed when the element is destroyed. > > Instead of adding the new method to GenericEventQueue, I think you should just > > remove the ASSERT() that verifies that the target is set. I think it is easier > > to understand that not setting the target will simply fire the event at the > > queue's owner instead of trying to figure out when it is appropriate to call > the > > NoTarget varient. > > OK, I updated the code and description to match your suggestion. Thanks. LGTM > Even though we fixed this issue, I still think that updateActiveTextTrackCues > shouldn't be called at all while the element is being removed from it's parent. > But probably that also needs some correction in the spec. Anyway, I'll probably > open a different issue. SGTM.
lgtm
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vcarbune@chromium.org/15739014/15001
Sorry for I got bad news for ya. Compile failed with a clobber build on linux_layout_rel. http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=linux_layo... Your code is likely broken or HEAD is junk. Please ensure your code is not broken then alert the build sheriffs. Look at the try server FAQ for more details.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vcarbune@chromium.org/15739014/30001
On 2013/06/02 18:59:09, I haz the power (commit-bot) wrote: > Sorry for I got bad news for ya. > Compile failed with a clobber build on linux_layout_rel. > http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=linux_layo... > Your code is likely broken or HEAD is junk. Please ensure your > code is not broken then alert the build sheriffs. > Look at the try server FAQ for more details. yay! this one was an easy one to fix. so, i uploaded it instead :)
Retried try job too often on win_layout_rel for step(s) webkit_tests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_layout...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vcarbune@chromium.org/15739014/30001
Retried try job too often on win_layout_rel for step(s) webkit_tests http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=win_layout...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vcarbune@chromium.org/15739014/30001
https://codereview.chromium.org/15739014/diff/49002/Source/core/html/HTMLMedi... File Source/core/html/HTMLMediaElement.cpp (right): https://codereview.chromium.org/15739014/diff/49002/Source/core/html/HTMLMedi... Source/core/html/HTMLMediaElement.cpp:294: m_asyncEventQueue.close(); This line was a compile failure, that is why i uploaded the build fix.
On 2013/06/03 16:47:26, aarya wrote: > https://codereview.chromium.org/15739014/diff/49002/Source/core/html/HTMLMedi... > File Source/core/html/HTMLMediaElement.cpp (right): > > https://codereview.chromium.org/15739014/diff/49002/Source/core/html/HTMLMedi... > Source/core/html/HTMLMediaElement.cpp:294: m_asyncEventQueue.close(); > This line was a compile failure, that is why i uploaded the build fix. We are planning to close code 28 tmrw, please try to commit patch before that [make sure to take the build fix]. The patch ran green on all bots, except win_layout_rel which seems jacked up.
https://codereview.chromium.org/15739014/diff/49002/Source/core/html/HTMLMedi... File Source/core/html/HTMLMediaElement.cpp (right): https://codereview.chromium.org/15739014/diff/49002/Source/core/html/HTMLMedi... Source/core/html/HTMLMediaElement.cpp:294: m_asyncEventQueue.close(); On 2013/06/03 16:47:26, aarya wrote: > This line was a compile failure, that is why i uploaded the build fix. Argh, sorry about that, I didn't realize you uploaded another patch. I'll fix it and dcommit.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/vcarbune@chromium.org/15739014/28002
Message was sent while issue was closed.
Change committed as 151692 |