|
|
Descriptioncc: Prevent recursion in *TileTaskWorkerPool methods.
PixelBufferTileTaskWorkerPool::CheckForCompletedRasterTasks() has a DCHECK
that triggers due to TileManager::DidFinishRunningTileTasks recursing
back into PixelBufferTileTaskWorkerPool.
This patch removes the recursion.
BUG=442598
Committed: https://crrev.com/49dc26e98442218f986ec0804e9637f8cde91366
Cr-Commit-Position: refs/heads/master@{#309603}
Patch Set 1 : #Patch Set 2 : Update tests to also not call TTW recursively. #Patch Set 3 : Use DCHECK instead of CHECK. #
Total comments: 3
Patch Set 4 : Removed reentrance checks. Will make that a separate CL. #
Total comments: 4
Patch Set 5 : Fixed suggestions. #Patch Set 6 : Rebase. #Patch Set 7 : Rebase more. #
Messages
Total messages: 22 (4 generated)
Patchset #1 (id:1) has been deleted
vmiura@chromium.org changed reviewers: + vmpstr@chromium.org
PTAL. If this make sense, I would add the single_entry checks to all TTWs.
I'm not sure about the single entry checks. You could argue that being single-entry-only is a requirement in general in cc (with a few exceptions). So, it's not clear to me why we want these checks in task runners but not in, say, TileManager, or any other code that calls to a client, which in turn could call back into the code. That aside, and assuming there is little perf impact from this change, I think the TM changes to schedule a task look good.
On 2014/12/18 00:49:35, vmpstr wrote: > I'm not sure about the single entry checks. You could argue that being > single-entry-only is a requirement in general in cc (with a few exceptions). So, > it's not clear to me why we want these checks in task runners but not in, say, > TileManager, or any other code that calls to a client, which in turn could call > back into the code. > > That aside, and assuming there is little perf impact from this change, I think > the TM changes to schedule a task look good. Also, is it easy to add a unittest (lth test probably) that demonstrates the bugs caused by reentrancy into this object?
On 2014/12/18 00:49:35, vmpstr wrote: > I'm not sure about the single entry checks. You could argue that being > single-entry-only is a requirement in general in cc (with a few exceptions). So, > it's not clear to me why we want these checks in task runners but not in, say, > TileManager, or any other code that calls to a client, which in turn could call > back into the code. It's a sanity check, to the same level as DCHECKs. Sanity checks are optional, but good to have. We could make this a no-op if DCHECKs are off as well. It would be good to add these to TileManager as well, as changing some "seemingly innocent" things around LTHI/Scheduler show up as hard to debug crashes. > Also, is it easy to add a unittest (lth test probably) that demonstrates the > bugs caused by reentrancy into this object? I'm not sure. Any ideas? We don't usually write tests that should crash.
On 2014/12/18 01:10:55, vmiura wrote: > On 2014/12/18 00:49:35, vmpstr wrote: > > I'm not sure about the single entry checks. You could argue that being > > single-entry-only is a requirement in general in cc (with a few exceptions). > So, > > it's not clear to me why we want these checks in task runners but not in, say, > > TileManager, or any other code that calls to a client, which in turn could > call > > back into the code. > > It's a sanity check, to the same level as DCHECKs. Sanity checks are optional, > but good to have. We could make this a no-op if DCHECKs are off as well. It's a CHECK in the constructor, no? This would assert in release as well. > > It would be good to add these to TileManager as well, as changing some > "seemingly innocent" things around LTHI/Scheduler show up as hard to debug > crashes. > > > Also, is it easy to add a unittest (lth test probably) that demonstrates the > > bugs caused by reentrancy into this object? > > I'm not sure. Any ideas? We don't usually write tests that should crash. I meant a unittest that would demonstrate that the changes you are making in tile manager are fixing an issue (ie a test that would crash/fail without the patch, but pass with the patch).
On 2014/12/18 01:15:39, vmpstr wrote: > On 2014/12/18 01:10:55, vmiura wrote: > > On 2014/12/18 00:49:35, vmpstr wrote: > > > I'm not sure about the single entry checks. You could argue that being > > > single-entry-only is a requirement in general in cc (with a few exceptions). > > So, > > > it's not clear to me why we want these checks in task runners but not in, > say, > > > TileManager, or any other code that calls to a client, which in turn could > > call > > > back into the code. > > > > It's a sanity check, to the same level as DCHECKs. Sanity checks are > optional, > > but good to have. We could make this a no-op if DCHECKs are off as well. > > It's a CHECK in the constructor, no? This would assert in release as well. Right, I'll change it to DCHECK. > > > > It would be good to add these to TileManager as well, as changing some > > "seemingly innocent" things around LTHI/Scheduler show up as hard to debug > > crashes. > > > > > Also, is it easy to add a unittest (lth test probably) that demonstrates the > > > bugs caused by reentrancy into this object? > > > > I'm not sure. Any ideas? We don't usually write tests that should crash. > > I meant a unittest that would demonstrate that the changes you are making in > tile manager are fixing an issue (ie a test that would crash/fail without the > patch, but pass with the patch). Oh sure. We have three tests that fail the sanity check without the TileManager change: LayerTreeHostOnDemandRasterPixelTest.RasterPictureLayer LayerTreeHostTestOneActivatePerPrepareTiles.RunMultiThread_DelegatingRenderer_ImplSidePaint LayerTreeHostTestOneActivatePerPrepareTiles.RunMultiThread_DirectRenderer_ImplSidePaint
> Oh sure. We have three tests that fail the sanity check without the TileManager > change: > > LayerTreeHostOnDemandRasterPixelTest.RasterPictureLayer > > LayerTreeHostTestOneActivatePerPrepareTiles.RunMultiThread_DelegatingRenderer_ImplSidePaint > > LayerTreeHostTestOneActivatePerPrepareTiles.RunMultiThread_DirectRenderer_ImplSidePaint What I was trying to say is that this came out of a bug for which I thought a good solution was to ensure that we issued notifications in a particular order (specifically that ALL came last). The claim in the description here is that there are subtle bugs if we don't reschedule things, is that the bug that you're referring to? As a backstory, this was always tail recursion, even when it was initially added in https://codereview.chromium.org/17351017 (1y5m ago) which was changed to generalized notifications in: https://codereview.chromium.org/523243002 and then I asked to order notifications alphabetically in: https://codereview.chromium.org/672283003/diff/1/cc/resources/raster_worker_p... which broke everything :D My proposal was to undo the last ordering change, since clearly ALL has to be last. Anyway, the solution here adds a bit more timing confusion (now other things can happen before we reschedule more work, which is probably fine, but it's just more non determinism in the system). Additionally, with this patch, we still have a situation that we get an ALL callback (as in everything finished) before getting RFA or RFD callback, which is a bit awkward in my opinion, but not a big deal. All in all, I do believe this fixes a bug that exists right now (ordering issue), so I think this can go in. Are there any performance changes as a result of this change (on thread times)?
On 2014/12/18 03:36:10, vmpstr wrote: > > Oh sure. We have three tests that fail the sanity check without the > TileManager > > change: > > > > LayerTreeHostOnDemandRasterPixelTest.RasterPictureLayer > > > > > LayerTreeHostTestOneActivatePerPrepareTiles.RunMultiThread_DelegatingRenderer_ImplSidePaint > > > > > LayerTreeHostTestOneActivatePerPrepareTiles.RunMultiThread_DirectRenderer_ImplSidePaint > > What I was trying to say is that this came out of a bug for which I thought a > good solution > was to ensure that we issued notifications in a particular order (specifically > that ALL > came last). > > The claim in the description here is that there are subtle bugs if > we don't > reschedule things, is that the bug that you're referring to? > As a backstory, this was always tail recursion, even when it was initially added > in > https://codereview.chromium.org/17351017 (1y5m ago) > which was changed to generalized notifications in: > https://codereview.chromium.org/523243002 > and then I asked to order notifications alphabetically in: > https://codereview.chromium.org/672283003/diff/1/cc/resources/raster_worker_p... > which broke everything :D > > My proposal was to undo the last ordering change, since clearly ALL has to be > last. > I see so ordering matters. Should we put ALL at the end, and add a test for that? This CL, changing the scheduling and adding checks is more for future sanity. I realize it was intended to tail recurse, but we don't have a test for that. I realize we are changing scheduling, so I'll leave it to you to determine if it's a good trade off. > Anyway, the solution here adds a bit more timing confusion (now other things can > happen before > we reschedule more work, which is probably fine, but it's just more non > determinism in the system). > > Additionally, with this patch, we still have a situation that we get an ALL > callback (as in everything finished) > before getting RFA or RFD callback, which is a bit awkward in my opinion, but > not a big deal. > > All in all, I do believe this fixes a bug that exists right now (ordering > issue), so I think this can go in. > Are there any performance changes as a result of this change (on thread times)? I ran smoothness & frame times but it's hard to see a difference between the two. The extra callback happens somewhat infrequently.
reveman@chromium.org changed reviewers: + reveman@chromium.org
https://codereview.chromium.org/807273005/diff/60001/cc/base/util.h File cc/base/util.h (right): https://codereview.chromium.org/807273005/diff/60001/cc/base/util.h#newcode28 cc/base/util.h:28: class ScopedSingleEntry { This seems very similar to base::ThreadCollisionWarner. I guess you could just use that instead but that might be confusing as it's not really checking for thread collisions. Would be nice if usage was at least consistent with base::ThreadCollisionWarner..
> > I ran smoothness & frame times but it's hard to see a difference between the > two. The extra callback happens somewhat infrequently. OK. I'm fine with this change. I would mildly prefer that it's the task runner that ensures that the stack be unwound (ie, instead of calling DidFinishRunningTileTasks directly, it schedules it). However, that might be more work to ensure correct behavior, so I'll leave the choice to you. lgtm (% other comments)
https://codereview.chromium.org/807273005/diff/60001/cc/base/util.h File cc/base/util.h (right): https://codereview.chromium.org/807273005/diff/60001/cc/base/util.h#newcode28 cc/base/util.h:28: class ScopedSingleEntry { On 2014/12/18 19:24:07, reveman wrote: > This seems very similar to base::ThreadCollisionWarner. I guess you could just > use that instead but that might be confusing as it's not really checking for > thread collisions. Would be nice if usage was at least consistent with > base::ThreadCollisionWarner.. DFAKE_SCOPED_LOCK() seems like a superset of what we need; it detects recursion, as well as multiple thread entry. Would you be OK if I use that? The failure looks like: [40382:40383:1218/131159:763931716939:FATAL:thread_collision_warner.cc(13)] Check failed: false. Thread Collision #0 0x0000010f55fe base::debug::StackTrace::StackTrace() #1 0x00000113c6b5 logging::LogMessage::~LogMessage() #2 0x0000011c40e1 base::DCheckAsserter::warn() #3 0x0000011c41f6 base::ThreadCollisionWarner::Enter() #4 0x000000ef76a3 base::ThreadCollisionWarner::ScopedCheck::ScopedCheck() #5 0x000000ef5fa4 cc::PixelBufferTileTaskWorkerPool::CheckForCompletedTasks() #6 0x000000ef64ac cc::PixelBufferTileTaskWorkerPool::CheckForCompletedTasks() #7 0x000000f68297 cc::TileManager::CheckIfMoreTilesNeedPrepare() #8 0x000000f69209 cc::TileManager::DidFinishRunningTileTasks() #9 0x000000ef0ebc cc::PixelBufferTileTaskWorkerPool::CheckForCompletedRasterTasks() #10 0x000000ef689c cc::PixelBufferTileTaskWorkerPool::OnTaskSetFinished() #11 0x000000b62eda base::internal::RunnableAdapter<>::Run() #12 0x000000ea20d8 base::internal::InvokeHelper<>::MakeItSo() #13 0x000000efc88c base::internal::Invoker<>::Run() #14 0x0000004e376e base::Callback<>::Run()
https://codereview.chromium.org/807273005/diff/60001/cc/base/util.h File cc/base/util.h (right): https://codereview.chromium.org/807273005/diff/60001/cc/base/util.h#newcode28 cc/base/util.h:28: class ScopedSingleEntry { On 2014/12/18 21:16:48, vmiura wrote: > On 2014/12/18 19:24:07, reveman wrote: > > This seems very similar to base::ThreadCollisionWarner. I guess you could just > > use that instead but that might be confusing as it's not really checking for > > thread collisions. Would be nice if usage was at least consistent with > > base::ThreadCollisionWarner.. > > DFAKE_SCOPED_LOCK() seems like a superset of what we need; it detects recursion, > as well as multiple thread entry. Would you be OK if I use that? Yes, I think that's better than adding this new class. Ideally we'd have something called a ReentryChecker in base but if ThreadCollisionWarner works I think it's a good idea to just use that. This is only for PixelBufferTileTaskWorkerPool, which we'll hopefully be able to remove soon anyhow.
PTAL. Will add the reentrance checks as a separate CL.
lgtm with nit https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manage... File cc/resources/tile_manager.cc (right): https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manage... cc/resources/tile_manager.cc:879: // would otherwise not be picked up by the old raster queue. Not your patch but it would be nice if fixed the formatting of this comment while touching this code. https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manager.h File cc/resources/tile_manager.h (right): https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manage... cc/resources/tile_manager.h:246: void CheckIfMoreTilesNeedPrepare(); nit: ...NeedToBePrepared() Maybe it would be better to invert this and call it CheckIfIdle and the notifier becomes idle_check_notifier_. I don't feel that strongly about it though.
https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manage... File cc/resources/tile_manager.cc (right): https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manage... cc/resources/tile_manager.cc:879: // would otherwise not be picked up by the old raster queue. On 2014/12/19 01:07:44, reveman wrote: > Not your patch but it would be nice if fixed the formatting of this comment > while touching this code. Done. https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manager.h File cc/resources/tile_manager.h (right): https://codereview.chromium.org/807273005/diff/80001/cc/resources/tile_manage... cc/resources/tile_manager.h:246: void CheckIfMoreTilesNeedPrepare(); On 2014/12/19 01:07:44, reveman wrote: > nit: ...NeedToBePrepared() > > Maybe it would be better to invert this and call it CheckIfIdle and the notifier > becomes idle_check_notifier_. I don't feel that strongly about it though. Done.
The CQ bit was checked by vmiura@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/807273005/140001
Message was sent while issue was closed.
Committed patchset #7 (id:140001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/49dc26e98442218f986ec0804e9637f8cde91366 Cr-Commit-Position: refs/heads/master@{#309603} |