|
|
Descriptiongpu: Reduce GL context switches used to check pending queries.
This reduces the frequency of GL context switches for checking
pending queries by keeping track of the last time we checked
queries. Instead of simply posting as task that will switch
context and process pending queries we now keep track of the
time at which we should process queries and only when we have
reached this time will we switch context and process queries.
BUG=522903, 436314
Committed: https://crrev.com/87580eb4e9a648eff277e8e2883f9c1616877285
Cr-Commit-Position: refs/heads/master@{#349502}
Patch Set 1 #Patch Set 2 : Fix InProcessCommandBuffer #Patch Set 3 : webview fix #
Total comments: 17
Patch Set 4 : rebase #Messages
Total messages: 38 (11 generated)
reveman@chromium.org changed reviewers: + sievers@chromium.org
The CQ bit was checked by reveman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1315713007/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1315713007/20001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_android_r...)
dongseong.hwang@intel.com changed reviewers: + dongseong.hwang@intel.com
Thank you for fixing this issue. looks very good. some nits. https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:341: task_runner_->PostDelayedTask( do we have to delay one more? PollWork() is already called via PostDelayedTask considering |process_delayed_work_time_|. In addition, I'm afraid that twice post-tasks delays to perform pending tasks for unnecessarily long time. https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:399: scheduler_->HasMoreIdleWork()); very nice :) https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:428: scheduler_->HasMoreIdleWork()) { Don't we need to check (scheduler_->HasPendingQueries() || scheduler_->HasMoreIdleWork())? https://codereview.chromium.org/1315713007/diff/40001/gpu/command_buffer/serv... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/1315713007/diff/40001/gpu/command_buffer/serv... gpu/command_buffer/service/gles2_cmd_decoder.cc:11900: if (!query_manager_.get()) if (!HasPendingQueries())
I checked if this CL resolves the root issue. Unfortunately, not. I measure FPS on http://webglsamples.org/aquarium/aquarium.html on daisy_sping I applied this CL and then check with and without quirk: https://codereview.chromium.org/1221433002/ with quirk without quirk 50: 58 40 250: 50 33 1k: 34 24 2k: 22 19 4k: 12 12 It's same result I reported in https://code.google.com/p/chromium/issues/detail?id=522903 I guess 1. EGL driver is smart enough to do nothing on redundant context change. 2. The root issue probably is preemption by scheduler BTW, this CL is nice. lgtm with nits
On 2015/09/01 at 10:52:35, dongseong.hwang wrote: > I checked if this CL resolves the root issue. Unfortunately, not. > > I measure FPS on http://webglsamples.org/aquarium/aquarium.html on daisy_sping > I applied this CL and then check with and without quirk: https://codereview.chromium.org/1221433002/ > with quirk without quirk > 50: 58 40 > 250: 50 33 > 1k: 34 24 > 2k: 22 19 > 4k: 12 12 > > It's same result I reported in https://code.google.com/p/chromium/issues/detail?id=522903 > I guess > 1. EGL driver is smart enough to do nothing on redundant context change. > 2. The root issue probably is preemption by scheduler I was afraid that this would not be enough. We're still doing a lot of context switches. This just removed a few obviously unnecessary ones. It think this patch is a good first step. The next step is to start using SignalQuery in cc/ as that would allow us to reduce the number context switches significantly because it's only really when the compositor has run out of staging buffers that we need to poll. We can use a pending SignalQuery as an indicator that polling frequency needs to be relatively high. If there's no pending SignalQuery then checking queries ones per frame might be enough. https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:341: task_runner_->PostDelayedTask( On 2015/09/01 at 08:56:10, dshwang_ooo_5.9-27.9 wrote: > do we have to delay one more? > PollWork() is already called via PostDelayedTask considering |process_delayed_work_time_|. > In addition, I'm afraid that twice post-tasks delays to perform pending tasks for unnecessarily long time. That's what this patch is supposed to improve. If "process_delayed_work_time_ > current_time" then that means we've checked for pending queries since the task was posted so performing work here would cause us to do that more frequently than desired. We need to delay the PerformWork more to avoid that. https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:428: scheduler_->HasMoreIdleWork()) { On 2015/09/01 at 08:56:10, dshwang_ooo_5.9-27.9 wrote: > Don't we need to check (scheduler_->HasPendingQueries() || scheduler_->HasMoreIdleWork())? This code is only for handling idle work. Pending queries should not affect it. https://codereview.chromium.org/1315713007/diff/40001/gpu/command_buffer/serv... File gpu/command_buffer/service/gles2_cmd_decoder.cc (right): https://codereview.chromium.org/1315713007/diff/40001/gpu/command_buffer/serv... gpu/command_buffer/service/gles2_cmd_decoder.cc:11900: if (!query_manager_.get()) On 2015/09/01 at 08:56:10, dshwang_ooo_5.9-27.9 wrote: > if (!HasPendingQueries()) It's up to the caller to do that check if needed. This way the code is more consistent with PerformIdleWork.
https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:341: task_runner_->PostDelayedTask( On 2015/09/01 15:54:11, reveman wrote: > On 2015/09/01 at 08:56:10, dshwang_ooo_5.9-27.9 wrote: > > do we have to delay one more? > > PollWork() is already called via PostDelayedTask considering > |process_delayed_work_time_|. > > In addition, I'm afraid that twice post-tasks delays to perform pending tasks > for unnecessarily long time. > > That's what this patch is supposed to improve. If "process_delayed_work_time_ > > current_time" then that means we've checked for pending queries since the task > was posted so performing work here would cause us to do that more frequently > than desired. We need to delay the PerformWork more to avoid that. How is it possible |process_delayed_work_time_ > current_time|? Only GpuCommandBufferStub::ScheduleDelayedWork posts this method with |delay|, which is set to |process_delayed_work_time_|.
On 2015/09/01 15:54:11, reveman wrote: > On 2015/09/01 at 10:52:35, dongseong.hwang wrote: > > I checked if this CL resolves the root issue. Unfortunately, not. > > > > I measure FPS on http://webglsamples.org/aquarium/aquarium.html on daisy_sping > > I applied this CL and then check with and without quirk: > https://codereview.chromium.org/1221433002/ > > with quirk without quirk > > 50: 58 40 > > 250: 50 33 > > 1k: 34 24 > > 2k: 22 19 > > 4k: 12 12 > > > > It's same result I reported in > https://code.google.com/p/chromium/issues/detail?id=522903 > > I guess > > 1. EGL driver is smart enough to do nothing on redundant context change. > > 2. The root issue probably is preemption by scheduler > > I was afraid that this would not be enough. We're still doing a lot of context > switches. This just removed a few obviously unnecessary ones. It think this > patch is a good first step. The next step is to start using SignalQuery in cc/ > as that would allow us to reduce the number context switches significantly > because it's only really when the compositor has run out of staging buffers that > we need to poll. We can use a pending SignalQuery as an indicator that polling > frequency needs to be relatively high. If there's no pending SignalQuery then > checking queries ones per frame might be enough. Wow, sounds plan. Thx!
https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:341: task_runner_->PostDelayedTask( On 2015/09/01 at 16:15:50, dshwang_ooo_5.9-27.9 wrote: > On 2015/09/01 15:54:11, reveman wrote: > > On 2015/09/01 at 08:56:10, dshwang_ooo_5.9-27.9 wrote: > > > do we have to delay one more? > > > PollWork() is already called via PostDelayedTask considering > > |process_delayed_work_time_|. > > > In addition, I'm afraid that twice post-tasks delays to perform pending tasks > > for unnecessarily long time. > > > > That's what this patch is supposed to improve. If "process_delayed_work_time_ > > > current_time" then that means we've checked for pending queries since the task > > was posted so performing work here would cause us to do that more frequently > > than desired. We need to delay the PerformWork more to avoid that. > > How is it possible |process_delayed_work_time_ > current_time|? > Only GpuCommandBufferStub::ScheduleDelayedWork posts this method with |delay|, which is set to |process_delayed_work_time_|. ScheduleDelayedWork is called as a result of performing work. A call to that function means that we should re-schedule the time at which we should perform delayed. At line 409 we update |process_delayed_work_time_| to the new time at which we should perform delayed work and if a delayed task is already in flight then "process_delayed_work_time_ > current_time" is likely going to be true when that task runs.
https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:341: task_runner_->PostDelayedTask( On 2015/09/01 16:25:17, reveman wrote: > On 2015/09/01 at 16:15:50, dshwang_ooo_5.9-27.9 wrote: > > On 2015/09/01 15:54:11, reveman wrote: > > > On 2015/09/01 at 08:56:10, dshwang_ooo_5.9-27.9 wrote: > > > > do we have to delay one more? > > > > PollWork() is already called via PostDelayedTask considering > > > |process_delayed_work_time_|. > > > > In addition, I'm afraid that twice post-tasks delays to perform pending > tasks > > > for unnecessarily long time. > > > > > > That's what this patch is supposed to improve. If > "process_delayed_work_time_ > > > > current_time" then that means we've checked for pending queries since the > task > > > was posted so performing work here would cause us to do that more frequently > > > than desired. We need to delay the PerformWork more to avoid that. > > > > How is it possible |process_delayed_work_time_ > current_time|? > > Only GpuCommandBufferStub::ScheduleDelayedWork posts this method with |delay|, > which is set to |process_delayed_work_time_|. > > ScheduleDelayedWork is called as a result of performing work. A call to that > function means that we should re-schedule the time at which we should perform > delayed. At line 409 we update |process_delayed_work_time_| to the new time at > which we should perform delayed work and if a delayed task is already in flight > then "process_delayed_work_time_ > current_time" is likely going to be true when > that task runs. I understand, thx for explanation.
https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:341: task_runner_->PostDelayedTask( On 2015/09/01 16:25:17, reveman wrote: > ScheduleDelayedWork is called as a result of performing work. A call to that > function means that we should re-schedule the time at which we should perform > delayed. At line 409 we update |process_delayed_work_time_| to the new time at > which we should perform delayed work and if a delayed task is already in flight > then "process_delayed_work_time_ > current_time" is likely going to be true when > that task runs. Is it possible for idle tasks and pending query to be starved? If GpuCommandBufferStub get messages continuously, it looks possible.
https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:341: task_runner_->PostDelayedTask( On 2015/09/01 at 16:37:00, dshwang_ooo_5.9-27.9 wrote: > On 2015/09/01 16:25:17, reveman wrote: > > ScheduleDelayedWork is called as a result of performing work. A call to that > > function means that we should re-schedule the time at which we should perform > > delayed. At line 409 we update |process_delayed_work_time_| to the new time at > > which we should perform delayed work and if a delayed task is already in flight > > then "process_delayed_work_time_ > current_time" is likely going to be true when > > that task runs. > > Is it possible for idle tasks and pending query to be starved? If GpuCommandBufferStub get messages continuously, it looks possible. Pending queries are processed each time we process messages so they can't be starved. Idle work can be starved but I'm not sure if that's a problem or the behavior we prefer. It was a bit of a problem with async uploads but with that removed it might be better to only do idle work when actually idle.
I forgot about this. sievers, can you take a look?
You have to rebase on top of https://codereview.chromium.org/1308913004/. It touches this code but shouldn't really change anything functionally. https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:381: scheduler_->ProcessPendingQueries(); Why do we really treat pending queries different from idle work? Apart from the fact that we already check queries after each handled message and after flush and finish. I'm wondering if we can even avoid the MakeCurrent() above in line 355 if we are not idle. It means we will handle a message very soon and check pending queries anyhow. https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:409: process_delayed_work_time_ = current_time + delay; This keeps pushing out the time though if called repeatedly, so it's a bit counterintuitive if you just look at ScheduleDelayedWork() in isolation (the previous implementation has an early return if already scheduled). Is this because calling this *should* imply that you also probably just did PollWork() or weren't idle (i.e. handled a message and also at least processed queries then)? I'm just ever so slightly worried that it could be misconceived and somebody adds a call to this somewhere with the intention of 'making sure work is scheduled'. Maybe a comment above the function is all it needs.
The CQ bit was checked by reveman@chromium.org to run a CQ dry run
ptal https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:381: scheduler_->ProcessPendingQueries(); On 2015/09/16 at 21:04:35, sievers wrote: > Why do we really treat pending queries different from idle work? > Apart from the fact that we already check queries after each handled message and after flush and finish. I'm not that familiar with the current use cases for idle work. It was created for async uploads but is used for other things today. Would be nice to remove the idle work concept if possible. > > I'm wondering if we can even avoid the MakeCurrent() above in line 355 if we are not idle. It means we will handle a message very soon and check pending queries anyhow. I'm not sure processing of queries as idle work is going to work well. I think that would result in even worse stalls when running out of staging buffers than we have today. While flush and finish will often be enough to process pending queries, it's important that we check them frequently in situations where the compositor depends on them to make progress when initializing tiles. e.g. when raster is fast enough that we run out of staging buffers. My plan for reducing processing of queries when not needed but increasing the interval when needed is to use SignalQuery API from the compositor when we run out of staging buffers to tell the GPU service that it's now important to process queries. I'd prefer to keep PerformIdleWork and ProcessPendingQueries separated for now. https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:409: process_delayed_work_time_ = current_time + delay; On 2015/09/16 at 21:04:35, sievers wrote: > This keeps pushing out the time though if called repeatedly, so it's a bit counterintuitive if you just look at ScheduleDelayedWork() in isolation (the previous implementation has an early return if already scheduled). Is this because calling this *should* imply that you also probably just did PollWork() or weren't idle (i.e. handled a message and also at least processed queries then)? > > I'm just ever so slightly worried that it could be misconceived and somebody adds a call to this somewhere with the intention of 'making sure work is scheduled'. Maybe a comment above the function is all it needs. Improved the comment in the header file to make this more clear.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1315713007/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1315713007/60001
lgtm https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... File content/common/gpu/gpu_command_buffer_stub.cc (right): https://codereview.chromium.org/1315713007/diff/40001/content/common/gpu/gpu_... content/common/gpu/gpu_command_buffer_stub.cc:381: scheduler_->ProcessPendingQueries(); On 2015/09/16 23:12:08, reveman wrote: > On 2015/09/16 at 21:04:35, sievers wrote: > > Why do we really treat pending queries different from idle work? > > Apart from the fact that we already check queries after each handled message > and after flush and finish. > > I'm not that familiar with the current use cases for idle work. It was created > for async uploads but is used for other things today. Would be nice to remove > the idle work concept if possible. > > > > > I'm wondering if we can even avoid the MakeCurrent() above in line 355 if we > are not idle. It means we will handle a message very soon and check pending > queries anyhow. > > I'm not sure processing of queries as idle work is going to work well. I think > that would result in even worse stalls when running out of staging buffers than > we have today. > > While flush and finish will often be enough to process pending queries, it's > important that we check them frequently in situations where the compositor > depends on them to make progress when initializing tiles. e.g. when raster is > fast enough that we run out of staging buffers. > > My plan for reducing processing of queries when not needed but increasing the > interval when needed is to use SignalQuery API from the compositor when we run > out of staging buffers to tell the GPU service that it's now important to > process queries. I'd prefer to keep PerformIdleWork and ProcessPendingQueries > separated for now. > ok sounds good. i was just thinking that because we already process queries above after each message being handled, that i was wondering how much value it adds to do it here for |!is_idle|. it seems that would occur when we have messages pending (!is_idle) but are not processing them because we are descheduled. so probably important enough to consider.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by reveman@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dongseong.hwang@intel.com Link to the patchset: https://codereview.chromium.org/1315713007/#ps60001 (title: "rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1315713007/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1315713007/60001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
reveman@chromium.org changed reviewers: + boliu@chromium.org
+boliu for android_webview/
So I guess this is just a rename in InProcessCommandBuffer, but there isn't actually any change there?
On 2015/09/17 at 20:44:06, boliu wrote: > So I guess this is just a rename in InProcessCommandBuffer, but there isn't actually any change there? Yes, it should have been called "delayed" work from the start as it's both idle work and processing of queries. This rename makes it less confusing.
On 2015/09/17 21:33:38, reveman wrote: > On 2015/09/17 at 20:44:06, boliu wrote: > > So I guess this is just a rename in InProcessCommandBuffer, but there isn't > actually any change there? > > Yes, it should have been called "delayed" work from the start as it's both idle > work and processing of queries. This rename makes it less confusing. lgtm
The CQ bit was checked by reveman@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1315713007/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1315713007/60001
Message was sent while issue was closed.
Committed patchset #4 (id:60001)
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/87580eb4e9a648eff277e8e2883f9c1616877285 Cr-Commit-Position: refs/heads/master@{#349502} |