|
|
Chromium Code Reviews|
Created:
4 years, 8 months ago by ccameron Modified:
4 years, 8 months ago CC:
chromium-reviews, darin-cc_chromium.org, jam, piman+watch_chromium.org, enne (OOO), danakj Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionMac: Clean up ImageTransportSurfaceOverlayMac timing
Change the processing of a frame to be the following:
0. GPU commands are decoded (compositor, WebGL, nor nothing)
1. SwapBuffers is called on ImageTransportSurfaceOverlayMac
2. Do glFinish on the GL context
3. Update the CALayer tree and ack the swap to the browser
This is much simpler to reason about, and appears to result in improved
performance.
BUG=601608
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel
Committed: https://crrev.com/5a3c87b37a46ce1b0813f0530d981165daaa7cfe
Cr-Commit-Position: refs/heads/master@{#386316}
Patch Set 1 #Patch Set 2 : Rebase #
Messages
Total messages: 26 (10 generated)
ccameron@chromium.org changed reviewers: + kbr@chromium.org
This makes the change that we tested locally, and then deletes all of the now-dead code. I feel that in this area, I built something far too complicated just because the simple solution felt "too easy". If this creates new problems on other hardware, we can revert and look at other options. We'll have to rely on our Canary/Dev/Beta channels to see that.
Based on our local testing, this patch significantly improves behavior for GPU-bound content. The large code simplifications are a great added benefit. It would be good to get a review from someone else more familiar with scheduling and the overall compositor architecture. CC'ing piman, sunnyps, enne, danakj as candidates. The glFinish call does seem a bit scary, but also visibly improves both performance and behavior. It improves back-pressure and prevents the renderer from issuing more work than the GPU can consume. It would be ideal if we could add performance tests covering this change. LGTM overall.
On Thu, Apr 7, 2016 at 3:38 PM, <kbr@chromium.org> wrote: > Based on our local testing, this patch significantly improves behavior for > GPU-bound content. The large code simplifications are a great added > benefit. > > It would be good to get a review from someone else more familiar with > scheduling > and the overall compositor architecture. CC'ing piman, sunnyps, enne, > danakj as > candidates. > > The glFinish call does seem a bit scary, but also visibly improves both > performance and behavior. It improves back-pressure and prevents the > renderer > from issuing more work than the GPU can consume. > There's 2 things that I can see the glFinish hurting: 1- paint-heavy content, because while we wait on glFinish on the browser context, we can't schedule uploads. 2- content that is CPU-bound on the GPU process - i.e. driver-heavy or command-buffer-heavy, maybe skia - because we miss overlap opportunities between CPU and GPU. So I guess the question is, have you tested this change against that type of content? For #1 maybe we can use the SwapBuffersAsync mechanism, like on Ozone. On Mac we could do it by queueing a fence after the glFlush and then polling it. That would free up the thread so that we can schedule other things. For #2, maybe it's not as much of a problem if #1 is fixed (we don't lose the time completely, we can schedule other things). That said, if you tested those use cases and they're fine, we can go ahead with this. > > It would be ideal if we could add performance tests covering this change. > > LGTM overall. > > > https://codereview.chromium.org/1867163002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2016/04/07 23:07:03, piman wrote: > On Thu, Apr 7, 2016 at 3:38 PM, <mailto:kbr@chromium.org> wrote: > > > Based on our local testing, this patch significantly improves behavior for > > GPU-bound content. The large code simplifications are a great added > > benefit. > > > > > It would be good to get a review from someone else more familiar with > > scheduling > > and the overall compositor architecture. CC'ing piman, sunnyps, enne, > > danakj as > > candidates. > > > > The glFinish call does seem a bit scary, but also visibly improves both > > performance and behavior. It improves back-pressure and prevents the > > renderer > > from issuing more work than the GPU can consume. > > > > There's 2 things that I can see the glFinish hurting: > 1- paint-heavy content, because while we wait on glFinish on the browser > context, we can't schedule uploads. > 2- content that is CPU-bound on the GPU process - i.e. driver-heavy or > command-buffer-heavy, maybe skia - because we miss overlap opportunities > between CPU and GPU. > > So I guess the question is, have you tested this change against that type > of content? > > For #1 maybe we can use the SwapBuffersAsync mechanism, like on Ozone. On > Mac we could do it by queueing a fence after the glFlush and then polling > it. That would free up the thread so that we can schedule other things. For > #2, maybe it's not as much of a problem if #1 is fixed (we don't lose the > time completely, we can schedule other things). For #1, the glFinish is in the GPU process, and the scheduler allows ~2 swaps to be in-flight, so this doesn't start stalling the browser until we get >2 frames behind (which I think is what we want). For #2, that is a concern -- if we are GPU-decoder-limited, this will make things worse. I haven't managed to encounter a page that is decoder limited -- I tried on my ~2014 machine and on a ~2013 Macbook Air, and they both behave very smoothly with this change. > > That said, if you tested those use cases and they're fine, we can go ahead > with this. > > > > > > It would be ideal if we could add performance tests covering this change. > > > > LGTM overall. > > > > > > https://codereview.chromium.org/1867163002/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
On Thu, Apr 7, 2016 at 4:24 PM, <ccameron@chromium.org> wrote: > On 2016/04/07 23:07:03, piman wrote: > > > On Thu, Apr 7, 2016 at 3:38 PM, <mailto:kbr@chromium.org> wrote: > > > > > Based on our local testing, this patch significantly improves behavior > for > > > GPU-bound content. The large code simplifications are a great added > > > benefit. > > > > > > > > It would be good to get a review from someone else more familiar with > > > scheduling > > > and the overall compositor architecture. CC'ing piman, sunnyps, enne, > > > danakj as > > > candidates. > > > > > > The glFinish call does seem a bit scary, but also visibly improves both > > > performance and behavior. It improves back-pressure and prevents the > > > renderer > > > from issuing more work than the GPU can consume. > > > > > > > There's 2 things that I can see the glFinish hurting: > > 1- paint-heavy content, because while we wait on glFinish on the browser > > context, we can't schedule uploads. > > 2- content that is CPU-bound on the GPU process - i.e. driver-heavy or > > command-buffer-heavy, maybe skia - because we miss overlap opportunities > > between CPU and GPU. > > > > So I guess the question is, have you tested this change against that type > > of content? > > > > For #1 maybe we can use the SwapBuffersAsync mechanism, like on Ozone. On > > Mac we could do it by queueing a fence after the glFlush and then polling > > it. That would free up the thread so that we can schedule other things. > For > > #2, maybe it's not as much of a problem if #1 is fixed (we don't lose the > > time completely, we can schedule other things). > > For #1, the glFinish is in the GPU process, and the scheduler allows ~2 > swaps to > be in-flight, so this doesn't start stalling the browser until we get >2 > frames > behind (which I think is what we want). > But it would prevent renderer contexts (that want to do texture uploads) from being scheduled, wouldn't it? > > For #2, that is a concern -- if we are GPU-decoder-limited, this will make > things worse. I haven't managed to encounter a page that is decoder > limited -- I > tried on my ~2014 machine and on a ~2013 Macbook Air, and they both behave > very > smoothly with this change. > > > > > That said, if you tested those use cases and they're fine, we can go > ahead > > with this. > > > > > > > > > > It would be ideal if we could add performance tests covering this > change. > > > > > > LGTM overall. > > > > > > > > > https://codereview.chromium.org/1867163002/ > > > > > > > -- > > You received this message because you are subscribed to the Google Groups > > "Chromium-reviews" group. > > To unsubscribe from this group and stop receiving emails from it, send an > email > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > https://codereview.chromium.org/1867163002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On 2016/04/07 23:27:39, piman wrote: > On Thu, Apr 7, 2016 at 4:24 PM, <mailto:ccameron@chromium.org> wrote: > > > On 2016/04/07 23:07:03, piman wrote: > > > > > On Thu, Apr 7, 2016 at 3:38 PM, <mailto:kbr@chromium.org> wrote: > > > > > > > Based on our local testing, this patch significantly improves behavior > > for > > > > GPU-bound content. The large code simplifications are a great added > > > > benefit. > > > > > > > > > > > It would be good to get a review from someone else more familiar with > > > > scheduling > > > > and the overall compositor architecture. CC'ing piman, sunnyps, enne, > > > > danakj as > > > > candidates. > > > > > > > > The glFinish call does seem a bit scary, but also visibly improves both > > > > performance and behavior. It improves back-pressure and prevents the > > > > renderer > > > > from issuing more work than the GPU can consume. > > > > > > > > > > There's 2 things that I can see the glFinish hurting: > > > 1- paint-heavy content, because while we wait on glFinish on the browser > > > context, we can't schedule uploads. > > > 2- content that is CPU-bound on the GPU process - i.e. driver-heavy or > > > command-buffer-heavy, maybe skia - because we miss overlap opportunities > > > between CPU and GPU. > > > > > > So I guess the question is, have you tested this change against that type > > > of content? > > > > > > For #1 maybe we can use the SwapBuffersAsync mechanism, like on Ozone. On > > > Mac we could do it by queueing a fence after the glFlush and then polling > > > it. That would free up the thread so that we can schedule other things. > > For > > > #2, maybe it's not as much of a problem if #1 is fixed (we don't lose the > > > time completely, we can schedule other things). > > > > For #1, the glFinish is in the GPU process, and the scheduler allows ~2 > > swaps to > > be in-flight, so this doesn't start stalling the browser until we get >2 > > frames > > behind (which I think is what we want). > > > > But it would prevent renderer contexts (that want to do texture uploads) > from being scheduled, wouldn't it? True, it would prevent texture uploads (or any other GL commands) from being scheduled. Mac is zero-copy-only -- I see raster tasks being executed independently of when the finish is happening. I tested way-oversubscribing the GPU, and I see that indeed raster happens during the glFinish (and also GMB allocation happens during the finish). > > > > > > For #2, that is a concern -- if we are GPU-decoder-limited, this will make > > things worse. I haven't managed to encounter a page that is decoder > > limited -- I > > tried on my ~2014 machine and on a ~2013 Macbook Air, and they both behave > > very > > smoothly with this change. > > > > > > > > That said, if you tested those use cases and they're fine, we can go > > ahead > > > with this. > > > > > > > > > > > > > > It would be ideal if we could add performance tests covering this > > change. > > > > > > > > LGTM overall. > > > > > > > > > > > > https://codereview.chromium.org/1867163002/ > > > > > > > > > > -- > > > You received this message because you are subscribed to the Google Groups > > > "Chromium-reviews" group. > > > To unsubscribe from this group and stop receiving emails from it, send an > > email > > > to mailto:chromium-reviews+unsubscribe@chromium.org. > > > > > > > > https://codereview.chromium.org/1867163002/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
Ok, LGTM from my side if we're happy with this.
sunnyps@chromium.org changed reviewers: + sunnyps@chromium.org
Did you consider using a fence to delay SwapBuffersAck to the browser instead of using glFinish?
On 2016/04/08 02:22:05, sunnyps wrote: > Did you consider using a fence to delay SwapBuffersAck to the browser instead of > using glFinish? Sorry, I saw your notes on the bug after I wrote this. I'll ask any questions I have there.
Thanks!
The CQ bit was checked by ccameron@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1867163002/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1867163002/1
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by ccameron@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1867163002/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1867163002/1
The CQ bit was unchecked by ccameron@chromium.org
Description was changed from ========== Mac: Clean up ImageTransportSurfaceOverlayMac timing Change the processing of a frame to be the following: 0. GPU commands are decoded (compositor, WebGL, nor nothing) 1. SwapBuffers is called on ImageTransportSurfaceOverlayMac 2. Do glFinish on the GL context 3. Update the CALayer tree and ack the swap to the browser This is much simpler to reason about, and appears to result in improved performance. BUG=601608 ========== to ========== Mac: Clean up ImageTransportSurfaceOverlayMac timing Change the processing of a frame to be the following: 0. GPU commands are decoded (compositor, WebGL, nor nothing) 1. SwapBuffers is called on ImageTransportSurfaceOverlayMac 2. Do glFinish on the GL context 3. Update the CALayer tree and ack the swap to the browser This is much simpler to reason about, and appears to result in improved performance. BUG=601608 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel ==========
The CQ bit was checked by ccameron@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from kbr@chromium.org, piman@chromium.org Link to the patchset: https://codereview.chromium.org/1867163002/#ps20001 (title: "Rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1867163002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1867163002/20001
Message was sent while issue was closed.
Committed patchset #2 (id:20001)
Message was sent while issue was closed.
Description was changed from ========== Mac: Clean up ImageTransportSurfaceOverlayMac timing Change the processing of a frame to be the following: 0. GPU commands are decoded (compositor, WebGL, nor nothing) 1. SwapBuffers is called on ImageTransportSurfaceOverlayMac 2. Do glFinish on the GL context 3. Update the CALayer tree and ack the swap to the browser This is much simpler to reason about, and appears to result in improved performance. BUG=601608 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel ========== to ========== Mac: Clean up ImageTransportSurfaceOverlayMac timing Change the processing of a frame to be the following: 0. GPU commands are decoded (compositor, WebGL, nor nothing) 1. SwapBuffers is called on ImageTransportSurfaceOverlayMac 2. Do glFinish on the GL context 3. Update the CALayer tree and ack the swap to the browser This is much simpler to reason about, and appears to result in improved performance. BUG=601608 CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel Committed: https://crrev.com/5a3c87b37a46ce1b0813f0530d981165daaa7cfe Cr-Commit-Position: refs/heads/master@{#386316} ==========
Message was sent while issue was closed.
Patchset 2 (id:??) landed as https://crrev.com/5a3c87b37a46ce1b0813f0530d981165daaa7cfe Cr-Commit-Position: refs/heads/master@{#386316} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
