|
|
Created:
3 years, 9 months ago by Saman Sami Modified:
3 years, 8 months ago Reviewers:
danakj CC:
chromium-reviews, cc-bugs_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionFixing flakiness of TextureLayerChangeInvisibleMailboxTest
The flake seems to be because of a race between the order of running main and impl
thread. Resources are always reclaimed after impl threads posts
DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5
depends on whether we switch to main thread or keep running impl thread and
reclaim resources which triggers posting a task to the main thread to release the
mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task
for releasing mailbox is posted using a blocking task runner so it is run first.
In order to solve this issue, we decided to use
DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because
all the resources are released before DidReceiveCompositorFrameAck is
called so the race between releasing resources and starting next commit
doesn't happen. In single-thread mode we used to call
DidReceiveCompositorFrameAck directly instead of PostTasking which
caused different behaviour from multi-thread mode. We decided to also
use PostTask for DidReceiveCompositorFrameAck in single thread mode and
use weak pointers to discard the ack if CompositorFrameSink is released
before ack is processed.
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel
BUG=702868
Review-Url: https://codereview.chromium.org/2757373002
Cr-Commit-Position: refs/heads/master@{#465638}
Committed: https://chromium.googlesource.com/chromium/src/+/44b6dfc95bd4a7b8240b81720b9804692b439ae9
Patch Set 1 #Patch Set 2 : Use DidReceiveCompositorFrameAck #Patch Set 3 : Post DidReceiveCompositorFrameAck #Patch Set 4 : Added comments #Patch Set 5 : c #
Total comments: 4
Patch Set 6 : c #
Total comments: 3
Patch Set 7 : Don't PostTask release callback in single thread mode #
Total comments: 2
Patch Set 8 : c #Patch Set 9 : Drop ack if CFS is gone #
Total comments: 2
Patch Set 10 : WeakPtr #Patch Set 11 : WeakPtr for multi thread #Patch Set 12 : Invalidate on DidLose #
Total comments: 1
Patch Set 13 : c #Patch Set 14 : c #Patch Set 15 : c #Patch Set 16 : Added test #Patch Set 17 : Cleanup #Patch Set 18 : Rename #
Total comments: 12
Patch Set 19 : Addressed comments #
Total comments: 1
Patch Set 20 : Updated test #
Total comments: 4
Patch Set 21 : Rebased #Patch Set 22 : Addressed comments #Patch Set 23 : c #
Messages
Total messages: 122 (81 generated)
Description was changed from ========== Fixing flakiness ========== to ========== Fixing flakiness CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
Description was changed from ========== Fixing flakiness CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== texture_layer_->ClearClient() returns the mailbox in the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Description was changed from ========== texture_layer_->ClearClient() returns the mailbox in the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
Description was changed from ========== texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
samans@chromium.org changed reviewers: + danakj@chromium.org
Description was changed from ========== texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ==========
PTAL
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CL title "Fixing flakiness of TextureLayerChangeInvisibleMailboxTest" should be the first line of the Description. The git commit doesn't use the rietveld title.
I don't understand what the race is from the description, can you explain why this is the right fix? The commit_count_ is set by DidCommitAndDrawFrame on the main thread.
Description was changed from ========== texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ========== to ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ==========
On 2017/03/20 19:02:54, danakj wrote: > I don't understand what the race is from the description, can you explain why > this is the right fix? The commit_count_ is set by DidCommitAndDrawFrame on the > main thread. Yes but the call to DidCommitAndDrawFrame is posted from the impl thread. Releasing the mailbox happens on the main thread apparently.
On 2017/03/20 19:14:57, Saman Sami wrote: > On 2017/03/20 19:02:54, danakj wrote: > > I don't understand what the race is from the description, can you explain why > > this is the right fix? The commit_count_ is set by DidCommitAndDrawFrame on > the > > main thread. > > Yes but the call to DidCommitAndDrawFrame is posted from the impl thread. > Releasing the mailbox happens on the main thread apparently. The mailbox release and the DidCommitAndDrawFrame both come from the compositor thread, and both execute on the main thread.
On 2017/03/20 19:24:49, danakj wrote: > On 2017/03/20 19:14:57, Saman Sami wrote: > > On 2017/03/20 19:02:54, danakj wrote: > > > I don't understand what the race is from the description, can you explain > why > > > this is the right fix? The commit_count_ is set by DidCommitAndDrawFrame on > > the > > > main thread. > > > > Yes but the call to DidCommitAndDrawFrame is posted from the impl thread. > > Releasing the mailbox happens on the main thread apparently. > > The mailbox release and the DidCommitAndDrawFrame both come from the compositor > thread, and both execute on the main thread. Was thinking about this some more, maybe I can put this another way. This is changing an expectation in the test. Presumably this expectation was testing something for correctness. Now it's testing something else. Was it testing before that it isn't testing now? Does this no longer catch some class of bugs? Or was it just incorrect? Can you explain?
On 2017/03/21 16:24:00, danakj wrote: > On 2017/03/20 19:24:49, danakj wrote: > > On 2017/03/20 19:14:57, Saman Sami wrote: > > > On 2017/03/20 19:02:54, danakj wrote: > > > > I don't understand what the race is from the description, can you explain > > why > > > > this is the right fix? The commit_count_ is set by DidCommitAndDrawFrame > on > > > the > > > > main thread. > > > > > > Yes but the call to DidCommitAndDrawFrame is posted from the impl thread. > > > Releasing the mailbox happens on the main thread apparently. > > > > The mailbox release and the DidCommitAndDrawFrame both come from the > compositor > > thread, and both execute on the main thread. > > Was thinking about this some more, maybe I can put this another way. > > This is changing an expectation in the test. Presumably this expectation was > testing something for correctness. Now it's testing something else. Was it > testing before that it isn't testing now? Does this no longer catch some class > of bugs? Or was it just incorrect? Can you explain? Thanks for the follow-up. I don't know a lot about texture mailboxes, and I was not able to reproduce this problem on my workstation. I'll try it on my mac, maybe it fails there. If I can reproduce the failure I might get a better idea what's going on.
On Tue, Mar 21, 2017 at 1:46 PM, <samans@chromium.org> wrote: > On 2017/03/21 16:24:00, danakj wrote: > > On 2017/03/20 19:24:49, danakj wrote: > > > On 2017/03/20 19:14:57, Saman Sami wrote: > > > > On 2017/03/20 19:02:54, danakj wrote: > > > > > I don't understand what the race is from the description, can you > explain > > > why > > > > > this is the right fix? The commit_count_ is set by > DidCommitAndDrawFrame > > on > > > > the > > > > > main thread. > > > > > > > > Yes but the call to DidCommitAndDrawFrame is posted from the impl > thread. > > > > Releasing the mailbox happens on the main thread apparently. > > > > > > The mailbox release and the DidCommitAndDrawFrame both come from the > > compositor > > > thread, and both execute on the main thread. > > > > Was thinking about this some more, maybe I can put this another way. > > > > This is changing an expectation in the test. Presumably this expectation > was > > testing something for correctness. Now it's testing something else. Was > it > > testing before that it isn't testing now? Does this no longer catch some > class > > of bugs? Or was it just incorrect? Can you explain? > > Thanks for the follow-up. I don't know a lot about texture mailboxes, and > I was > not able to reproduce this problem on my workstation. I'll try it on my > mac, > maybe it fails there. If I can reproduce the failure I might get a better > idea > what's going on. > FWIW on linux it happens for me regularly (almost always) on a release asan, or on a debug build, when i run the whole cc_unittests suite (which runs a bunch of tests in parallel) > > https://codereview.chromium.org/2757373002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2017/03/21 18:41:51, danakj wrote: > On Tue, Mar 21, 2017 at 1:46 PM, <mailto:samans@chromium.org> wrote: > > > On 2017/03/21 16:24:00, danakj wrote: > > > On 2017/03/20 19:24:49, danakj wrote: > > > > On 2017/03/20 19:14:57, Saman Sami wrote: > > > > > On 2017/03/20 19:02:54, danakj wrote: > > > > > > I don't understand what the race is from the description, can you > > explain > > > > why > > > > > > this is the right fix? The commit_count_ is set by > > DidCommitAndDrawFrame > > > on > > > > > the > > > > > > main thread. > > > > > > > > > > Yes but the call to DidCommitAndDrawFrame is posted from the impl > > thread. > > > > > Releasing the mailbox happens on the main thread apparently. > > > > > > > > The mailbox release and the DidCommitAndDrawFrame both come from the > > > compositor > > > > thread, and both execute on the main thread. > > > > > > Was thinking about this some more, maybe I can put this another way. > > > > > > This is changing an expectation in the test. Presumably this expectation > > was > > > testing something for correctness. Now it's testing something else. Was > > it > > > testing before that it isn't testing now? Does this no longer catch some > > class > > > of bugs? Or was it just incorrect? Can you explain? > > > > Thanks for the follow-up. I don't know a lot about texture mailboxes, and > > I was > > not able to reproduce this problem on my workstation. I'll try it on my > > mac, > > maybe it fails there. If I can reproduce the failure I might get a better > > idea > > what's going on. > > > > FWIW on linux it happens for me regularly (almost always) on a release > asan, or on a debug build, when i run the whole cc_unittests suite (which > runs a bunch of tests in parallel) > > > > > > https://codereview.chromium.org/2757373002/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org. This seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame on the main thread (here: https://cs.chromium.org/chromium/src/cc/trees/proxy_impl.cc?rcl=db8b09a592fde...), so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first.
On 2017/03/29 15:58:07, Saman Sami wrote: > On 2017/03/21 18:41:51, danakj wrote: > > On Tue, Mar 21, 2017 at 1:46 PM, <mailto:samans@chromium.org> wrote: > > > > > On 2017/03/21 16:24:00, danakj wrote: > > > > On 2017/03/20 19:24:49, danakj wrote: > > > > > On 2017/03/20 19:14:57, Saman Sami wrote: > > > > > > On 2017/03/20 19:02:54, danakj wrote: > > > > > > > I don't understand what the race is from the description, can you > > > explain > > > > > why > > > > > > > this is the right fix? The commit_count_ is set by > > > DidCommitAndDrawFrame > > > > on > > > > > > the > > > > > > > main thread. > > > > > > > > > > > > Yes but the call to DidCommitAndDrawFrame is posted from the impl > > > thread. > > > > > > Releasing the mailbox happens on the main thread apparently. > > > > > > > > > > The mailbox release and the DidCommitAndDrawFrame both come from the > > > > compositor > > > > > thread, and both execute on the main thread. > > > > > > > > Was thinking about this some more, maybe I can put this another way. > > > > > > > > This is changing an expectation in the test. Presumably this expectation > > > was > > > > testing something for correctness. Now it's testing something else. Was > > > it > > > > testing before that it isn't testing now? Does this no longer catch some > > > class > > > > of bugs? Or was it just incorrect? Can you explain? > > > > > > Thanks for the follow-up. I don't know a lot about texture mailboxes, and > > > I was > > > not able to reproduce this problem on my workstation. I'll try it on my > > > mac, > > > maybe it fails there. If I can reproduce the failure I might get a better > > > idea > > > what's going on. > > > > > > > FWIW on linux it happens for me regularly (almost always) on a release > > asan, or on a debug build, when i run the whole cc_unittests suite (which > > runs a bunch of tests in parallel) > > > > > > > > > > https://codereview.chromium.org/2757373002/ > > > > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Chromium-reviews" group. > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > This seems to be because of a race between the order of running main and impl > thread. Resources are always reclaimed after impl threads posts > DidCommitAndDrawFrame on the main thread (here: > https://cs.chromium.org/chromium/src/cc/trees/proxy_impl.cc?rcl=db8b09a592fde...), > so whether commit_count_ == 4 or 5 depends on whether we switch to main thread > or keep running impl thread and reclaim resources which triggers posting a task > to the main thread to release the mailbox. Even though the task for > DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted > using a blocking task runner so it is run first. Thanks, let me try to reiterate this back and see if I'm following.. - DidCommitAndDrawFrame is posted impl->main using the main thread task runner during impl thread Drawing. - During DidCommitAndDrawFrame on main with commit_count_ = 4, we release the texture mailbox via ClearClient(). - The ClearClient() causes another commit, which releases the mailbox. - The TextureLayerImpl deletes the resource in ResourceProvider from the active tree here https://cs.chromium.org/chromium/src/cc/layers/texture_layer_impl.cc?rcl=44c1... - The resource would be in use in the display compositor, so it is not released yet. - The impl thread "draws" and submits a frame to the display compositor without the mailbox. - When this drawing is done, it posts DidCommitAndDrawFrame impl->main using the main thread task runner. (1) - The display compositor returns the old mailbox to the impl thread via LayerTreeHostImpl::ReclaimResources (2) - ReclaimResources frees the resource in ResourceProvider, which posts the release callback impl->main via TextureLayer::TextureMailboxHolder::ReturnAndReleaseOnImplThread which uses a blocking task runner. (3) So which are the conditions when (3) executes before (1)? You mentioned that blocking task runner was a part of it, though it forwards to the usual task runner unless it is capturing via BlockingTaskRunner::CapturePostTasks being active. This happens in proxy main while the thread is blocked. https://cs.chromium.org/chromium/src/cc/trees/proxy_main.cc?rcl=c8cf9f23a6c3b... When clearing the client, TextureLayer calls SetNextCommitWaitsForActivation() which causes proxy_main's hold_commit_for_activation to be true. That means the main thread is blocked (and the BlockingTaskRunner is active) until activation happens in ProxyImpl::DidActivateSyncTree https://cs.chromium.org/chromium/src/cc/trees/proxy_impl.cc?rcl=c8cf9f23a6c3b.... But that should be before drawing. Are you maybe saying that the signal there that releases the BlockingTaskRunner is sometimes staying alive longer so that (3) gets captured instead of forwarding? If that's the case, causing flaky ordering would not be ideal, and we could fix this by posting the DidCommitAndDrawFrame through the blocking task runner too? Let me know if I'm following, thanks.
I was actually not aware of how BlockingTaskRunner works and I assumed it always runs before a normal task runner. I did some investigation and yes in fact when the test fails both DidDraw post and release post happen during the time that CapturePostTasks is active. If this is a bug, I can do more digging to figure out why this is happening. Do we really need DidCommitAndDrawFrame to be posted through a blocking task runner? I would rather just fix the expectation in test if it's not accurate.
Description was changed from ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest texture_layer_->ClearClient() returns the mailbox on the main thread, so there is no guarantee that impl thread commits before it. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ========== to ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ==========
Description was changed from ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ========== to ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ==========
On 2017/03/29 23:02:05, Saman Sami wrote: > I was actually not aware of how BlockingTaskRunner works and I assumed it always > runs before a normal task runner. I did some investigation and yes in fact when > the test fails both DidDraw post and release post happen during the time that > CapturePostTasks is active. If this is a bug, I can do more digging to figure > out why this is happening. It's definitely a race which is usually not desirable, especially in tests. The goal of our system should be consistent ordering, again, especially in tests. What we'd want to do here requires some thought to understand the pieces in play. The point of BlockingTaskRunner is to ensure that we return resources back to the main thread before we tell the main thread DidCommit() so that code on the main thread can consistently reuse resources at the time of DidCommit. I found that DidCommit is coming from ProxyMain here https://cs.chromium.org/chromium/src/cc/trees/proxy_main.cc?rcl=16e103e585d6f... right after BlockingTaskRunner::CapturePostTasks goes out of scope. Since those are both on the main thread the ordering is okay. Step (2) above is fundamentally meant to be asynchronous cuz it's out of process normally. I was *going* to say that DidCommitAndDrawFrame is for tests and maybe we should consider some more synchronization for it but apparently it is used to plumb swap-acks thru to plugins. However.. if it's a swap ack, we'd want to unblock plugins the same time we'd want to unblock the renderer.. which is when CompositorFrameAck happens, not when CompositorFrame is sent. If DidCommitAndDrawFrame was after the CompositorFrameAck would that prevent the race and the resources would always come back to the main thread before DidCommitAndDrawFrame? Is (2) the resources in CompositorFrameAck or is it something else asynchronous? > Do we really need DidCommitAndDrawFrame to be posted through a blocking task > runner? I would rather just fix the expectation in test if it's not accurate. It looks to me after digging that that would be wrong, cuz that would put DidCommitAndDrawFrame before DidCommit, since we run that after stopping capture on BlockingTaskRunner (the code link above).
The resources are returned separately from the ack but always before the ack to ensure we can always reuse resources when making the next frame. Sending DidCommitAndDrawFrame after ack should remove the race, but this kinda changes what DidCommitAndDrawFrame means, right? Actually, why not just get rid of DidCommitAndDrawFrame and use DidReceiveCompositorFrameAck if we're gonna do this?
I tried using DidReceiveCompositorFrameAck in the unit test, hoping that it would fix the flakiness. Unfortunately, now we have inconsistent behaviour between single- and multi-thread mode. In multi-thread mode, DidReceiveCompositorFrameAck and release callback are both post-tasked, but in single-thread mode DidReceiveCompositorFrameAck is called directly but release callback is post-tasked. So in multi-thread mode resources are returned before ack on the main thread, but in single-thread mode ack comes first.
I'm not sure if single-thread mode is used in real world, but if it's used, wouldn't this problem actually cause memory usage issues? I remember that we used to send ack before resources in CompostorFrameSinkSupport and it causes memory regressions. See crbug.com/696850.
On 2017/04/03 23:46:50, Saman Sami wrote: > I'm not sure if single-thread mode is used in real world, but if it's used, > wouldn't this problem actually cause memory usage issues? I remember that we > used to send ack before resources in CompostorFrameSinkSupport and it causes > memory regressions. See crbug.com/696850. Thanks for looking at this in such detail! Single-thread proxy is used for: - ui::Compositor - blink layout tests I think the DidReceiveCompositorFrameAck direct-call you're referring to here is https://cs.chromium.org/chromium/src/cc/trees/single_thread_proxy.cc?rcl=85ca... right? Keeping the single and multi thread ordering the same is also desirable. In this case, its probably not important since DidCommitAndDrawFrame was only being used for plugins and unit tests, and plugins are only used in blink which is threaded (outside layout tests). But I like the idea very much of dropping DidCommitAndDrawFrame and just using DidReceiveCompositorFrameAck, and then making SingleThread mode post it so that it gets the same ordering, and it happens on a new stack frame the same as for threaded.
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks! I did what you suggested. PTAL. I ran the test 20000 time and it didn't fail.
Thanks! Now that we've identified DidCommitAndDrawFrame and DidReceiveCompositorFrameAck as being redundant would you be down to remove the first entirely? It'd be great to reduce API surfaces and it would make the plugins code more clear about when things are happening too ftw. https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... File cc/trees/single_thread_proxy.cc (right): https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... cc/trees/single_thread_proxy.cc:803: if (layer_tree_host_) This is unusual in this file, the host is only null after Stop() happens. After which the proxy is destroyed IIRC. Can you explain why it's here?
If it always make sense to use DidReceiveCompositorFrameAck instead, then yeah definitely. I actually think using it makes more sense because I don't think we ever start the next commit before getting an ack? At least that's what I always thought ack is used for, so I'm not sure why we ever used DidCommitAndDrawFrame. One problem is that Fady used to say we want to get rid of DidReceiveCompositorFrameAck so I'm not sure if we use it in more places it's a good thing or not. Do you think we should remove DidCommitAndDrawFrame in this CL? That could be a little more controversial CL and I want this change to stick so maybe I can make another CL for it?
https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... File cc/trees/single_thread_proxy.cc (right): https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... cc/trees/single_thread_proxy.cc:803: if (layer_tree_host_) On 2017/04/04 19:20:54, danakj wrote: > This is unusual in this file, the host is only null after Stop() happens. After > which the proxy is destroyed IIRC. Can you explain why it's here? I'm not familiar with lifetime of layer_tree_host_ compared to the proxy so I thought this check will be useful. I'll get rid of it.
On 2017/04/04 19:32:13, Saman Sami wrote: > If it always make sense to use DidReceiveCompositorFrameAck instead, then yeah > definitely. I actually think using it makes more sense because I don't think we > ever start the next commit before getting an ack? At least that's what I always > thought ack is used for, so I'm not sure why we ever used DidCommitAndDrawFrame. > One problem is that Fady used to say we want to get rid of > DidReceiveCompositorFrameAck so I'm not sure if we use it in more places it's a > good thing or not. Do you think we should remove DidCommitAndDrawFrame in this > CL? That could be a little more controversial CL and I want this change to stick > so maybe I can make another CL for it? Well, there's already code using DidCommitAndDrawFrame, which surprised me cuz I thot it was only tests :) But I don't see what would be worse about changing them to DidReceiveCompositorFrameAck. That is what plugins actually want, they are waiting for a "swap ack" which is exactly what that is to them, so using that actually makes more sense to me, and now we know that it also means they get back resources before it happens which is a win! Having 1 method in the API instead of 2, and keeping the one with more deterministic ordering seems like a good improvement to me. I think we should keep this CL small and fix the test ASAP but would love to tackle this with you in another CL. https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... File cc/trees/single_thread_proxy.cc (right): https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... cc/trees/single_thread_proxy.cc:803: if (layer_tree_host_) On 2017/04/04 19:32:31, Saman Sami wrote: > On 2017/04/04 19:20:54, danakj wrote: > > This is unusual in this file, the host is only null after Stop() happens. > After > > which the proxy is destroyed IIRC. Can you explain why it's here? > > I'm not familiar with lifetime of layer_tree_host_ compared to the proxy so I > thought this check will be useful. I'll get rid of it. I'm quite against branches just-in-case, and want them understood and explained, because every branch is complexity when someone else reads or wants to change the code later. Then they'd wonder when layer_tree_host_ is null, and people will easily cargo-cult it into other places, and it just kinda spreads. Sounds good to remove then.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I'll definitely look into using DidReceiveCompositorFrameAck in more places after this CL. Right now this DCHECK seems to fire on Android so I'm kinda stuck: https://cs.chromium.org/chromium/src/content/browser/renderer_host/compositor... I'm investigating. https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... File cc/trees/single_thread_proxy.cc (right): https://codereview.chromium.org/2757373002/diff/80001/cc/trees/single_thread_... cc/trees/single_thread_proxy.cc:803: if (layer_tree_host_) On 2017/04/04 19:36:21, danakj wrote: > On 2017/04/04 19:32:31, Saman Sami wrote: > > On 2017/04/04 19:20:54, danakj wrote: > > > This is unusual in this file, the host is only null after Stop() happens. > > After > > > which the proxy is destroyed IIRC. Can you explain why it's here? > > > > I'm not familiar with lifetime of layer_tree_host_ compared to the proxy so I > > thought this check will be useful. I'll get rid of it. > > I'm quite against branches just-in-case, and want them understood and explained, > because every branch is complexity when someone else reads or wants to change > the code later. Then they'd wonder when layer_tree_host_ is null, and people > will easily cargo-cult it into other places, and it just kinda spreads. Sounds > good to remove then. Done.
https://codereview.chromium.org/2757373002/diff/100001/cc/trees/single_thread... File cc/trees/single_thread_proxy.cc (right): https://codereview.chromium.org/2757373002/diff/100001/cc/trees/single_thread... cc/trees/single_thread_proxy.cc:433: weak_factory_.GetWeakPtr())); You can make a separate weak factory for this, and invalidate it when ReleaseCompositorFrameSink happens, then we will stop acking in this case. Probably should do the same thing in proxy_impl, and you could add a LayerTreeHostTest that sends a commit, and in DidCommit it does ReleaseCompositorFrameSink and verifies no ack happens? https://codereview.chromium.org/2757373002/diff/100001/content/browser/render... File content/browser/renderer_host/compositor_impl_android.cc (left): https://codereview.chromium.org/2757373002/diff/100001/content/browser/render... content/browser/renderer_host/compositor_impl_android.cc:551: pending_frames_ = 0; Oh it looks like DidReceiveCompositorFrameSink is coming after ReleaseCompositorFrameSink which it's not expecting.
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I think my latest patch should make android happy. So basically we just call the release callback directly in single-thread mode. This way, we'll have consistent ordering between single- and multi-thread proxies and also not introduce new problems by delivering acks later than we used to. https://codereview.chromium.org/2757373002/diff/100001/content/browser/render... File content/browser/renderer_host/compositor_impl_android.cc (left): https://codereview.chromium.org/2757373002/diff/100001/content/browser/render... content/browser/renderer_host/compositor_impl_android.cc:551: pending_frames_ = 0; On 2017/04/04 21:47:50, danakj wrote: > Oh it looks like DidReceiveCompositorFrameSink is coming after > ReleaseCompositorFrameSink which it's not expecting. Yes probably.
https://codereview.chromium.org/2757373002/diff/120001/cc/layers/texture_laye... File cc/layers/texture_layer.cc (right): https://codereview.chromium.org/2757373002/diff/120001/cc/layers/texture_laye... cc/layers/texture_layer.cc:302: if (main_thread_task_runner->BelongsToCurrentThread()) { You're right this orders it right, but I usually avoid this pattern of if (on thread) { A } else { post to thread(A) } The reason is that things that are expecting to be on a fresh call stack suddenly are only sometimes on a fresh callstack which can lead to unexpected re-entrancy or other bad side effects (like for instance a recent bug where during paint we ended up destroying the compositor, which would then crash when it tried to continue painting but it's destroyed). So I'd prefer that we keep posting here, and just drop the ack if the frame sink is released. wdyt?
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2757373002/diff/120001/cc/layers/texture_laye... File cc/layers/texture_layer.cc (right): https://codereview.chromium.org/2757373002/diff/120001/cc/layers/texture_laye... cc/layers/texture_layer.cc:302: if (main_thread_task_runner->BelongsToCurrentThread()) { I see. Yes, dropping the ack should work. I sent out a patch doing that. Let's see how the bots feel.
https://codereview.chromium.org/2757373002/diff/160001/content/browser/render... File content/browser/renderer_host/compositor_impl_android.cc (right): https://codereview.chromium.org/2757373002/diff/160001/content/browser/render... content/browser/renderer_host/compositor_impl_android.cc:809: if (!has_compositor_frame_sink_) This would be racey if CompositorImpl can has_compositor_frame_sink_ again before the ack arrives, which is why I suggested the weakptrfactory to cancel the callback mid-flight. But maybe you can convince us that that wouldn't happen?
https://codereview.chromium.org/2757373002/diff/160001/content/browser/render... File content/browser/renderer_host/compositor_impl_android.cc (right): https://codereview.chromium.org/2757373002/diff/160001/content/browser/render... content/browser/renderer_host/compositor_impl_android.cc:809: if (!has_compositor_frame_sink_) On 2017/04/04 22:14:47, danakj wrote: > This would be racey if CompositorImpl can has_compositor_frame_sink_ again > before the ack arrives, which is why I suggested the weakptrfactory to cancel > the callback mid-flight. But maybe you can convince us that that wouldn't > happen? Interesting. No, it's not obvious to me why it won't be racey. I'll implement this tomorrow.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL. I added weak pointers for both single and multi threaded case.
Thanks, looks good! Can you add unit tests that verify the cases that we invalidate weakptrs and show that the callback doesn't run in each of the cases you added? Since this was an expectation of the android compositor these should have existed but we didn't realize it was an expectation until now. https://codereview.chromium.org/2757373002/diff/220001/cc/trees/proxy_impl.cc File cc/trees/proxy_impl.cc (right): https://codereview.chromium.org/2757373002/diff/220001/cc/trees/proxy_impl.cc... cc/trees/proxy_impl.cc:126: base::WeakPtr<ProxyMain> compositor_frame_sink_weak_ptr) { If you want, and I think it'd be slightly simpler, you could make the weakptrs entirely in proxy_impl, because - ReleaseCompositorFrameSink() is a synchronous method to the compositor thread, so the main thread can't run the callback while telling the compositor thread to release. - DidLoseCompositorFrameSink comes from the compositor thread. You'd do it by putting the factory right on proxy_impl, and invalidating inside this class.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_ozone_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL. Android passes and I added a test.
Description was changed from ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ========== to ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. In order to solve this issue, we decided to use DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because all the resources are released before DidReceiveCompositorFrameAck is called so the race between releasing resources and starting next commit doesn't happen. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ==========
Description was changed from ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. In order to solve this issue, we decided to use DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because all the resources are released before DidReceiveCompositorFrameAck is called so the race between releasing resources and starting next commit doesn't happen. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ========== to ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. In order to solve this issue, we decided to use DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because all the resources are released before DidReceiveCompositorFrameAck is called so the race between releasing resources and starting next commit doesn't happen. In single-thread mode we used to call DidReceiveCompositorFrameAck directly instead of PostTasking which caused different behaviour from multi-thread mode. We decided to also use PostTask for DidReceiveCompositorFrameAck in single thread mode and use weak pointers to discard the ack if CompositorFrameSink is released before ack is processed. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ==========
Description was changed from ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. In order to solve this issue, we decided to use DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because all the resources are released before DidReceiveCompositorFrameAck is called so the race between releasing resources and starting next commit doesn't happen. In single-thread mode we used to call DidReceiveCompositorFrameAck directly instead of PostTasking which caused different behaviour from multi-thread mode. We decided to also use PostTask for DidReceiveCompositorFrameAck in single thread mode and use weak pointers to discard the ack if CompositorFrameSink is released before ack is processed. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ========== to ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. In order to solve this issue, we decided to use DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because all the resources are released before DidReceiveCompositorFrameAck is called so the race between releasing resources and starting next commit doesn't happen. In single-thread mode we used to call DidReceiveCompositorFrameAck directly instead of PostTasking which caused different behaviour from multi-thread mode. We decided to also use PostTask for DidReceiveCompositorFrameAck in single thread mode and use weak pointers to discard the ack if CompositorFrameSink is released before ack is processed. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ==========
Thanks! The test looks great. A couple last comments https://codereview.chromium.org/2757373002/diff/340001/cc/test/test_hooks.h File cc/test/test_hooks.h (right): https://codereview.chromium.org/2757373002/diff/340001/cc/test/test_hooks.h#n... cc/test/test_hooks.h:64: virtual void WillReceiveCompositorFrameAckOnImplThread() {} just "OnThread" is the convention used here, and can you group it with the other OnThread things just above? https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_unittest.cc (right): https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7505: // Verifies that LayerTreeHost does not receive frame acks from a released LayerTreeHostClient https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7512: layer_tree_host()->SetRootLayer(root); std::move() https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7529: &LayerTreeHostTestDiscardAckAfterRelease::VerifyAckNotReceived, Can you post the VerifyAckNotReceived with a small delay (100ms?). Not because what you have is racey but just in case code changes in the future, and the ack ends up going down a different path, maybe gets posted later, then the test would see the ack happen /at least/ flakily. https://codereview.chromium.org/2757373002/diff/340001/cc/trees/proxy_main.cc File cc/trees/proxy_main.cc (right): https://codereview.chromium.org/2757373002/diff/340001/cc/trees/proxy_main.cc... cc/trees/proxy_main.cc:92: frame_sink_bound_weak_factory_.InvalidateWeakPtrs(); I would either add a unit test for this case too or remove this change. https://codereview.chromium.org/2757373002/diff/340001/cc/trees/single_thread... File cc/trees/single_thread_proxy.cc (right): https://codereview.chromium.org/2757373002/diff/340001/cc/trees/single_thread... cc/trees/single_thread_proxy.cc:409: frame_sink_bound_weak_factory_.InvalidateWeakPtrs(); same here
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks for the comments! PTAL. https://codereview.chromium.org/2757373002/diff/340001/cc/test/test_hooks.h File cc/test/test_hooks.h (right): https://codereview.chromium.org/2757373002/diff/340001/cc/test/test_hooks.h#n... cc/test/test_hooks.h:64: virtual void WillReceiveCompositorFrameAckOnImplThread() {} On 2017/04/18 20:58:42, danakj wrote: > just "OnThread" is the convention used here, and can you group it with the other > OnThread things just above? Done. https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_unittest.cc (right): https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7505: // Verifies that LayerTreeHost does not receive frame acks from a released On 2017/04/18 20:58:42, danakj wrote: > LayerTreeHostClient Done. https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7512: layer_tree_host()->SetRootLayer(root); On 2017/04/18 20:58:42, danakj wrote: > std::move() Done. https://codereview.chromium.org/2757373002/diff/340001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7529: &LayerTreeHostTestDiscardAckAfterRelease::VerifyAckNotReceived, On 2017/04/18 20:58:42, danakj wrote: > Can you post the VerifyAckNotReceived with a small delay (100ms?). Not because > what you have is racey but just in case code changes in the future, and the ack > ends up going down a different path, maybe gets posted later, then the test > would see the ack happen /at least/ flakily. Done. Good point. https://codereview.chromium.org/2757373002/diff/340001/cc/trees/proxy_main.cc File cc/trees/proxy_main.cc (right): https://codereview.chromium.org/2757373002/diff/340001/cc/trees/proxy_main.cc... cc/trees/proxy_main.cc:92: frame_sink_bound_weak_factory_.InvalidateWeakPtrs(); On 2017/04/18 20:58:42, danakj wrote: > I would either add a unit test for this case too or remove this change. Removed to keep things simple. If we ever need this behaviour, we can revisit. https://codereview.chromium.org/2757373002/diff/340001/cc/trees/single_thread... File cc/trees/single_thread_proxy.cc (right): https://codereview.chromium.org/2757373002/diff/340001/cc/trees/single_thread... cc/trees/single_thread_proxy.cc:409: frame_sink_bound_weak_factory_.InvalidateWeakPtrs(); On 2017/04/18 20:58:42, danakj wrote: > same here Removed.
https://codereview.chromium.org/2757373002/diff/360001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_unittest.cc (right): https://codereview.chromium.org/2757373002/diff/360001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7528: MainThreadTaskRunner()->PostDelayedTask( Thanks! LGTM One suggestion is pretend you are a traveller from the future and are reading this code and want to understand it. Over-commenting in tests is very helpful because every thing you do in a test has a reason but it's not always entirely clear, and it makes understanding this later much easier. So like why delayed here, so you don't have to reason it out again in the future. Same for why the Release is happening between WillReceive and DidReceive on the compositor thread.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Patchset #20 (id:380001) has been deleted
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_compile_dbg_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...) win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...) win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Thanks for the LGTM. PTAL if you can. I updated the test as we discussed offline.
LGTM https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_unittest.cc (right): https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7514: void PrepareNextCommit() { nit: I think the standard way to write this would be to keep the layer setup in SetupTree, then in CheckFrameAck() to do layer_tree_host()->SetNeedsCommit() instead of resetting the root layer. And lastly to use layer_tree_host()->source_frame_number() instead of rolling your own frame counter as commit_count_.
The CQ bit was checked by samans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_unittest.cc (right): https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7514: void PrepareNextCommit() { On 2017/04/19 14:36:28, danakj wrote: > nit: I think the standard way to write this would be to keep the layer setup in > SetupTree, then in CheckFrameAck() to do layer_tree_host()->SetNeedsCommit() > instead of resetting the root layer. And lastly to use > layer_tree_host()->source_frame_number() instead of rolling your own frame > counter as commit_count_. Seems like SetNeedsCommit doesn't work on its own. I think If I don't make the tree dirty, the swap will not happen.
https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_unittest.cc (right): https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7514: void PrepareNextCommit() { On 2017/04/19 15:00:15, Saman Sami wrote: > On 2017/04/19 14:36:28, danakj wrote: > > nit: I think the standard way to write this would be to keep the layer setup > in > > SetupTree, then in CheckFrameAck() to do layer_tree_host()->SetNeedsCommit() > > instead of resetting the root layer. And lastly to use > > layer_tree_host()->source_frame_number() instead of rolling your own frame > > counter as commit_count_. > > Seems like SetNeedsCommit doesn't work on its own. I think If I don't make the > tree dirty, the swap will not happen. Oh oh yeah, to redraw you need some damage. One way is to do something to change the root layer (instead of SetNeedsCommit) like // Cause damage so that we draw and swap. layer_tree_host()->root_layer()->SetBackgroundColor(SK_ColorGREEN);
https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... File cc/trees/layer_tree_host_unittest.cc (right): https://codereview.chromium.org/2757373002/diff/400001/cc/trees/layer_tree_ho... cc/trees/layer_tree_host_unittest.cc:7514: void PrepareNextCommit() { On 2017/04/19 15:23:01, danakj wrote: > On 2017/04/19 15:00:15, Saman Sami wrote: > > On 2017/04/19 14:36:28, danakj wrote: > > > nit: I think the standard way to write this would be to keep the layer setup > > in > > > SetupTree, then in CheckFrameAck() to do layer_tree_host()->SetNeedsCommit() > > > instead of resetting the root layer. And lastly to use > > > layer_tree_host()->source_frame_number() instead of rolling your own frame > > > counter as commit_count_. > > > > Seems like SetNeedsCommit doesn't work on its own. I think If I don't make the > > tree dirty, the swap will not happen. > > Oh oh yeah, to redraw you need some damage. One way is to do something to change > the root layer (instead of SetNeedsCommit) like > > // Cause damage so that we draw and swap. > layer_tree_host()->root_layer()->SetBackgroundColor(SK_ColorGREEN); Done.
The CQ bit was checked by samans@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from danakj@chromium.org Link to the patchset: https://codereview.chromium.org/2757373002/#ps460001 (title: "c")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 460001, "attempt_start_ts": 1492616750531760, "parent_rev": "7196f5ddd17df5379a7b8042ad6d208ee06b667c", "commit_rev": "44b6dfc95bd4a7b8240b81720b9804692b439ae9"}
Message was sent while issue was closed.
Description was changed from ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. In order to solve this issue, we decided to use DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because all the resources are released before DidReceiveCompositorFrameAck is called so the race between releasing resources and starting next commit doesn't happen. In single-thread mode we used to call DidReceiveCompositorFrameAck directly instead of PostTasking which caused different behaviour from multi-thread mode. We decided to also use PostTask for DidReceiveCompositorFrameAck in single thread mode and use weak pointers to discard the ack if CompositorFrameSink is released before ack is processed. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 ========== to ========== Fixing flakiness of TextureLayerChangeInvisibleMailboxTest The flake seems to be because of a race between the order of running main and impl thread. Resources are always reclaimed after impl threads posts DidCommitAndDrawFrame to the main thread so whether commit_count_ == 4 or 5 depends on whether we switch to main thread or keep running impl thread and reclaim resources which triggers posting a task to the main thread to release the mailbox. Even though the task for DidCommitAndDrawFrame is posted first, the task for releasing mailbox is posted using a blocking task runner so it is run first. In order to solve this issue, we decided to use DidReceiveCompositorFrameAck instead of DidCommitAndDrawFrame because all the resources are released before DidReceiveCompositorFrameAck is called so the race between releasing resources and starting next commit doesn't happen. In single-thread mode we used to call DidReceiveCompositorFrameAck directly instead of PostTasking which caused different behaviour from multi-thread mode. We decided to also use PostTask for DidReceiveCompositorFrameAck in single thread mode and use weak pointers to discard the ack if CompositorFrameSink is released before ack is processed. CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel BUG=702868 Review-Url: https://codereview.chromium.org/2757373002 Cr-Commit-Position: refs/heads/master@{#465638} Committed: https://chromium.googlesource.com/chromium/src/+/44b6dfc95bd4a7b8240b81720b98... ==========
Message was sent while issue was closed.
Committed patchset #23 (id:460001) as https://chromium.googlesource.com/chromium/src/+/44b6dfc95bd4a7b8240b81720b98... |