|
|
Created:
4 years, 3 months ago by no sievers Modified:
4 years, 3 months ago CC:
chromium-reviews, cc-bugs_chromium.org Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Descriptioncc: Avoid ResourceProvider nullptr deref
When ReleaseCompositorFrameSink() was called explicitly we delete
|resource_provider_|, but we might still have a BeginMainFrame
occur and processing UIResourceRequests will try deref it.
BUG=643721
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel
Committed: https://crrev.com/68dc75d78f547eeb4c80a9ff8ced7e1bdb62b485
Cr-Commit-Position: refs/heads/master@{#420742}
Patch Set 1 #
Total comments: 2
Patch Set 2 : doh #
Total comments: 2
Patch Set 3 : not for review: fix crash #Patch Set 4 : not for review: fix crash #Patch Set 5 : not for review: fix crash #Patch Set 6 : not for review: fix crash #Patch Set 7 : not for review: fix crash #Patch Set 8 : not for review: fix crash #
Total comments: 4
Patch Set 9 : not for review: fix crash #
Messages
Total messages: 35 (16 generated)
Description was changed from ========== not for review: fix crash BUG=643721 ========== to ========== not for review: fix crash BUG=643721 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel ==========
danakj@chromium.org changed reviewers: + danakj@chromium.org
https://codereview.chromium.org/2349143003/diff/1/cc/trees/layer_tree_host_im... File cc/trees/layer_tree_host_impl.cc (right): https://codereview.chromium.org/2349143003/diff/1/cc/trees/layer_tree_host_im... cc/trees/layer_tree_host_impl.cc:3711: MarkUIResourceNotEvicted(uid); I think you wanna mark evicted (instead of *not evicted*) if !valid
Description was changed from ========== not for review: fix crash BUG=643721 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel ========== to ========== cc: Avoid ResourceProvider nullptr deref When ReleaseCompositorFrameSink() was called explicitly we delete |resource_provider_|, but we might still have a BeginMainFrame occur and processing UIResourceRequests will try deref it. BUG=643721 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel ==========
well i guess i'll land this after all... https://codereview.chromium.org/2349143003/diff/1/cc/trees/layer_tree_host_im... File cc/trees/layer_tree_host_impl.cc (right): https://codereview.chromium.org/2349143003/diff/1/cc/trees/layer_tree_host_im... cc/trees/layer_tree_host_impl.cc:3711: MarkUIResourceNotEvicted(uid); On 2016/09/20 01:17:34, danakj wrote: > I think you wanna mark evicted (instead of *not evicted*) if !valid Doh, thanks.. done!
sunnyps@chromium.org changed reviewers: + sunnyps@chromium.org
Just FYI I'm trying to prevent the BeginMainFrame (or any scheduler action) from happening as part of the retro frame removal CL (https://codereview.chromium.org/2339633003/)
On 2016/09/20 23:01:55, sunnyps wrote: > Just FYI I'm trying to prevent the BeginMainFrame (or any scheduler action) from > happening as part of the retro frame removal CL > (https://codereview.chromium.org/2339633003/) That's even better. I'm asked though to land a fix before 2pm tomorrow to unblock the dev build.
hmmm, actually my CL won't fix this issue - here the BeginMainFrame task is posted before CFS is lost but run afterwards. My CL only takes care of the swap ack inside STP::Stop/~ProxyImpl which causes the scheduler to run. https://codereview.chromium.org/2349143003/diff/20001/cc/trees/layer_tree_hos... File cc/trees/layer_tree_host_impl.h (right): https://codereview.chromium.org/2349143003/diff/20001/cc/trees/layer_tree_hos... cc/trees/layer_tree_host_impl.h:840: bool has_valid_compositor_frame_sink_; Can you add an IsLost method to CompositorFrameSink instead? I don't like stuffing state like this into LTHI.
https://codereview.chromium.org/2349143003/diff/20001/cc/trees/layer_tree_hos... File cc/trees/layer_tree_host_impl.h (right): https://codereview.chromium.org/2349143003/diff/20001/cc/trees/layer_tree_hos... cc/trees/layer_tree_host_impl.h:840: bool has_valid_compositor_frame_sink_; On 2016/09/21 00:08:41, sunnyps wrote: > Can you add an IsLost method to CompositorFrameSink instead? I don't like > stuffing state like this into LTHI. I should've mentioned that this is a nit. Feel free to ignore this.
Patchset #3 (id:40001) has been deleted
The CQ bit was checked by sievers@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
sunnyps@chromium.org changed reviewers: + enne@chromium.org
+enne@ who knows more about this Just to make sure we're on the same page here: 1. CompositorImplAndroid releases the output surface (CFS) when becoming invisible. This can happen after a BeginMainFrame task is posted but before it's run. 2. Before https://codereview.chromium.org/2270573002/ we used to keep track of output surface lost state in LayerTreeHost and early out from STP BeginMainFrame based on that. 3. After that change BeginMainFrame gets to run and during commit it tries to upload textures for UI resources. This causes a crash because resource provider is dead. 4. This CL prevents that from happening by keeping track of the output surface lost state in LTHI and not uploading/deleting ui resources in commit based on that. 5. As a result it's possible for LTHI to mark the newly requested resources as evicted but any existing resources (from previous commits) are not evicted. I don't know enough about ui resources to know if this will break something other than the one test in this CL. 6. Also I suspect that there are other uses of resource provider / tile manager / output surface in BeginMainFrame/Commit that we also need to guard. Otherwise this crash might just happen in a different place. 7. I suggest that CompositorImplAndroid call LayerTreeHost::SetDeferCommits(true) when becoming invisible (and reset on becoming visible). This will cause BeginMainFrame to early out correctly and also ensure that the scheduler resumes it later.
On Wed, Sep 21, 2016 at 2:59 PM, <sunnyps@chromium.org> wrote: > +enne@ who knows more about this > > Just to make sure we're on the same page here: > 1. CompositorImplAndroid releases the output surface (CFS) when becoming > invisible. This can happen after a BeginMainFrame task is posted but > before it's > run. > 2. Before https://codereview.chromium.org/2270573002/ we used to keep > track of > output surface lost state in LayerTreeHost and early out from STP > BeginMainFrame > based on that. > Yep main frames used to depend on parameters of the output surface so it needed to know. > 3. After that change BeginMainFrame gets to run and during commit it tries > to > upload textures for UI resources. This causes a crash because resource > provider > is dead. > Now output surface is used only for the compositor thread, so finishing a commit after the surface is lost is not a problem. 4. This CL prevents that from happening by keeping track of the output > surface > lost state in LTHI and not uploading/deleting ui resources in commit based > on > that. > However, we can't try to upload ui resources while the surface is lost, of course. What we do when losing the surface is lost is evict ui resources, so I asked that we evict them immediately here as well. > 5. As a result it's possible for LTHI to mark the newly requested > resources as > evicted but any existing resources (from previous commits) are not > evicted. I > don't know enough about ui resources to know if this will break something > other > than the one test in this CL. > The existing resources will eventually be evicted too. The evicted list is used as a bool (empty or not) to tell the next commit to recreate them. Evicting them at different times and building up the list seems okay to me. > 6. Also I suspect that there are other uses of resource provider / tile > manager > / output surface in BeginMainFrame/Commit that we also need to guard. > Otherwise > this crash might just happen in a different place. > That's possible though generally we lose all resources when the output surface is lost. And when a replacement surface fails to initialize things become null, so we already have to handle this (that was demonstrated by this crash already occuring sometimes, but becoming more reproducable after (3)). So I think we already handle null and were missing this case. There may be others but this is the one we can reproduce atm. > 7. I suggest that CompositorImplAndroid call > LayerTreeHost::SetDeferCommits(true) when becoming invisible (and reset on > becoming visible). This will cause BeginMainFrame to early out correctly > and > also ensure that the scheduler resumes it later. > We could pickyback on SetDeferCommits, though as of (2) there's no reason to not commit anymore when the output surface is lost (other than we don't want to waste work and generate frames we won't show). However if the output surface is lost from the compositor side (not from ReleaseOutputSurface) there's no signal to the main side that it got lost, we just stop generating frames. I guess we could send a signal and just abort main frames in the proxy without it leaking beyond the proxy. Since we'll do another commit after recovering the output surface (this actually isn't needed anymore except for OutputSurface) we'd be able to recover and get a new main frame there. I guess it's a question of where do you prefer complexity? Having messages/coordination between impl and main side for output surface lost to about main frames? Or evict things that arrive while the surface is lost as we will do for existing resources? > > https://codereview.chromium.org/2349143003/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
I think the simplest solution here is to just revert that change from #2 and abort commits when there's no compositor frame sink. I don't think there's any reason to commit when you don't have a compositor frame sink. If we need a quick fix, I think this is it. I think aborting main frames on the compositor thread before they're sent also seems like a valid option, but is tricky. The only difficulty there is handling updates that come with the begin main frame args. An aborted begin main frame is saying that the main thread didn't have any updates to reply with, but not that it didn't receive information. So, if you aborted on the compositor thread then we'd have to have some other aborting path where scroll updates aren't notified that the main thread got them. I'm less keen on deferring commits from CompositorImplAndroid, largely because that's making the caller responsible for internal compositor state? I'd rather have LayerTreeHost::SetVisible end up deferring commits during visibility changes. I'm not sure if that's enough to DCHECK that we have a compositor frame sink during commit. So, maybe we'd still need that early out.
On Thu, Sep 22, 2016 at 11:17 AM, enne@chromium.org via codereview.chromium.org <reply@chromiumcodereview-hr.appspotmail.com> wrote: > I think the simplest solution here is to just revert that change from #2 > and > abort commits when there's no compositor frame sink. I don't think there's > any > reason to commit when you don't have a compositor frame sink. If we need a > quick fix, I think this is it. > I don't think that fixes everything because a) The context can be lost independently of calling ReleaseOutputSurface/DeferCommits. b) The context can be lost after main frame is complete but before commit happens. There were already crashes of this nature but they became more common/easy to reproduce now. So we'd need to abort *commit* not just abort *main frame* which is what reverting #2 would do. Which is why I'd rather handle commit happening and just don't try to make resources (we don't try to raster tiles either, UIResources just weren't looking for this), than try to abort commits (now we have to save damage and stuff..) or abort main frames and go back to our old low crash rate. > > I think aborting main frames on the compositor thread before they're sent > also > seems like a valid option, but is tricky. The only difficulty there is > handling > updates that come with the begin main frame args. An aborted begin main > frame > is saying that the main thread didn't have any updates to reply with, but > not > that it didn't receive information. So, if you aborted on the compositor > thread > then we'd have to have some other aborting path where scroll updates aren't > notified that the main thread got them. > > I'm less keen on deferring commits from CompositorImplAndroid, largely > because > that's making the caller responsible for internal compositor state? I'd > rather > have LayerTreeHost::SetVisible end up deferring commits during visibility > changes. I'm not sure if that's enough to DCHECK that we have a compositor > frame sink during commit. So, maybe we'd still need that early out. > > https://codereview.chromium.org/2349143003/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2016/09/22 20:07:01, danakj wrote: > On Thu, Sep 22, 2016 at 11:17 AM, mailto:enne@chromium.org via > http://codereview.chromium.org <mailto:reply@chromiumcodereview-hr.appspotmail.com> wrote: > > > I think the simplest solution here is to just revert that change from #2 > > and > > abort commits when there's no compositor frame sink. I don't think there's > > any > > reason to commit when you don't have a compositor frame sink. If we need a > > quick fix, I think this is it. > > > > I don't think that fixes everything because > a) The context can be lost independently of calling > ReleaseOutputSurface/DeferCommits. True. > b) The context can be lost after main frame is complete but before commit > happens. There were already crashes of this nature but they became more > common/easy to reproduce now. So we'd need to abort *commit* not just abort > *main frame* which is what reverting #2 would do. This does not happen. It's true that context can be lost in between NotifyReadyToCommit and Commit in renderer (browser BMF + Commit is synchronous) but the impl side doesn't call ReleaseCFS until it hears back from the main thread which is blocked while commit is pending. > > Which is why I'd rather handle commit happening and just don't try to make > resources (we don't try to raster tiles either, UIResources just weren't > looking for this), than try to abort commits (now we have to save damage > and stuff..) or abort main frames and go back to our old low crash rate. > > > > > > I think aborting main frames on the compositor thread before they're sent > > also > > seems like a valid option, but is tricky. The only difficulty there is > > handling > > updates that come with the begin main frame args. An aborted begin main > > frame > > is saying that the main thread didn't have any updates to reply with, but > > not > > that it didn't receive information. So, if you aborted on the compositor > > thread > > then we'd have to have some other aborting path where scroll updates aren't > > notified that the main thread got them. > > > > I'm less keen on deferring commits from CompositorImplAndroid, largely > > because > > that's making the caller responsible for internal compositor state? I'd > > rather > > have LayerTreeHost::SetVisible end up deferring commits during visibility > > changes. I'm not sure if that's enough to DCHECK that we have a compositor > > frame sink during commit. So, maybe we'd still need that early out. > > > > https://codereview.chromium.org/2349143003/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
On Thu, Sep 22, 2016 at 1:41 PM, <sunnyps@chromium.org> wrote: > On 2016/09/22 20:07:01, danakj wrote: > > On Thu, Sep 22, 2016 at 11:17 AM, mailto:enne@chromium.org via > > http://codereview.chromium.org > <mailto:reply@chromiumcodereview-hr.appspotmail.com> wrote: > > > > > I think the simplest solution here is to just revert that change from > #2 > > > and > > > abort commits when there's no compositor frame sink. I don't think > there's > > > any > > > reason to commit when you don't have a compositor frame sink. If we > need a > > > quick fix, I think this is it. > > > > > > > I don't think that fixes everything because > > a) The context can be lost independently of calling > > ReleaseOutputSurface/DeferCommits. > > True. > > > b) The context can be lost after main frame is complete but before commit > > happens. There were already crashes of this nature but they became more > > common/easy to reproduce now. So we'd need to abort *commit* not just > abort > > *main frame* which is what reverting #2 would do. > > This does not happen. It's true that context can be lost in between > NotifyReadyToCommit and Commit in renderer (browser BMF + Commit is > synchronous) > but the impl side doesn't call ReleaseCFS until it hears back from the main > thread which is blocked while commit is pending. > Hm there is some way this was crashing before that early out in begin main frame was removed, on android. But UIResources are used in the renderer too for scrollbars on non-android. > > > > > > Which is why I'd rather handle commit happening and just don't try to > make > > resources (we don't try to raster tiles either, UIResources just weren't > > looking for this), than try to abort commits (now we have to save damage > > and stuff..) or abort main frames and go back to our old low crash rate. > > > > > > > > > > I think aborting main frames on the compositor thread before they're > sent > > > also > > > seems like a valid option, but is tricky. The only difficulty there is > > > handling > > > updates that come with the begin main frame args. An aborted begin main > > > frame > > > is saying that the main thread didn't have any updates to reply with, > but > > > not > > > that it didn't receive information. So, if you aborted on the > compositor > > > thread > > > then we'd have to have some other aborting path where scroll updates > aren't > > > notified that the main thread got them. > > > > > > I'm less keen on deferring commits from CompositorImplAndroid, largely > > > because > > > that's making the caller responsible for internal compositor state? I'd > > > rather > > > have LayerTreeHost::SetVisible end up deferring commits during > visibility > > > changes. I'm not sure if that's enough to DCHECK that we have a > compositor > > > frame sink during commit. So, maybe we'd still need that early out. > > > > > > https://codereview.chromium.org/2349143003/ > > > > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Chromium-reviews" group. > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > https://codereview.chromium.org/2349143003/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
LGTM https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_impl.cc (right): https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_impl.cc:2311: has_valid_compositor_frame_sink_ = true; nit: put this right beside the assignment to compositor_frame_sink_ https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_impl.h (right): https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_impl.h:840: bool has_valid_compositor_frame_sink_; Can you TODO(danakj): Delete the compositor frame sink and all resources when it's lost instead of having this bool
https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_impl.cc (right): https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_impl.cc:2311: has_valid_compositor_frame_sink_ = true; On 2016/09/23 21:24:10, danakj wrote: > nit: put this right beside the assignment to compositor_frame_sink_ Done. https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_impl.h (right): https://codereview.chromium.org/2349143003/diff/160001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_impl.h:840: bool has_valid_compositor_frame_sink_; On 2016/09/23 21:24:10, danakj wrote: > Can you TODO(danakj): Delete the compositor frame sink and all resources when > it's lost instead of having this bool Done.
The CQ bit was checked by sievers@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from danakj@chromium.org Link to the patchset: https://codereview.chromium.org/2349143003/#ps180001 (title: "not for review: fix crash")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...)
The CQ bit was checked by sievers@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Failed to apply the patch. On branch working_branch Your branch is up-to-date with 'origin/refs/pending/heads/master'. nothing to commit, working tree clean
Message was sent while issue was closed.
Description was changed from ========== cc: Avoid ResourceProvider nullptr deref When ReleaseCompositorFrameSink() was called explicitly we delete |resource_provider_|, but we might still have a BeginMainFrame occur and processing UIResourceRequests will try deref it. BUG=643721 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel ========== to ========== cc: Avoid ResourceProvider nullptr deref When ReleaseCompositorFrameSink() was called explicitly we delete |resource_provider_|, but we might still have a BeginMainFrame occur and processing UIResourceRequests will try deref it. BUG=643721 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_precise_blink_rel Committed: https://crrev.com/68dc75d78f547eeb4c80a9ff8ced7e1bdb62b485 Cr-Commit-Position: refs/heads/master@{#420742} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/68dc75d78f547eeb4c80a9ff8ced7e1bdb62b485 Cr-Commit-Position: refs/heads/master@{#420742} |