Description
gpu, cmaa: optimize COMBINE_EDGES path to reduce fragment shader tasks
The CMAA fragment shader is heavy, but CMAA as a whole is not expensive, because
the fragment shader runs only on edge fragments thanks to early-Z rejection.
Edge fragments are only a small fraction of the whole screen.
However, the COMBINE_EDGES path runs the fragment shader on all screen fragments.
That is redundant, because the combined edges in the COMBINE_EDGES path must be a
subset of the edges that DETECT_EDGES1 finds. So COMBINE_EDGES should be performed
only inside the area that DETECT_EDGES1 marks with depth value 1 in the depth buffer.
For reference, CMAA consists of the following passes, in terms of GPU cost:
* DETECT_EDGES1 : cheap shader on the whole screen.
* DETECT_EDGES2 : cheap shader on edges only.
* COMBINE_EDGES : cheap shader on edges only. <- fixed in this CL
* BLUR_EDGES : heavy shader on edges only.
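The saving described above can be illustrated with a small CPU-side model. This is only a sketch, not the Chromium implementation: `PassStats` and `RunPass` are hypothetical names, and the depth buffer is modelled as a flat array that holds 1.0 where DETECT_EDGES1 marked an edge candidate and 0.0 elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU-side model of one full-screen pass; not Chromium code.
struct PassStats {
  std::size_t shaded = 0;    // fragments that would run the fragment shader
  std::size_t rejected = 0;  // fragments discarded by the depth test
};

// Counts how many fragments one full-screen pass would shade.
// depth[i] is assumed to be 1.0f where an earlier pass marked pixel i as an
// edge candidate and 0.0f elsewhere. With use_depth_test == false every
// fragment is shaded; with use_depth_test == true only marked fragments are.
PassStats RunPass(const std::vector<float>& depth, bool use_depth_test) {
  PassStats stats;
  for (float d : depth) {
    assert(d == 0.0f || d == 1.0f);  // the model only knows two depth values
    if (use_depth_test && d < 1.0f) {
      ++stats.rejected;  // rejected before the shader runs
    } else {
      ++stats.shaded;
    }
  }
  return stats;
}
```

For a buffer in which 2 of 8 pixels are marked, `RunPass(depth, false)` shades all 8 fragments while `RunPass(depth, true)` shades only the 2 marked ones, which is the kind of reduction this CL aims for in COMBINE_EDGES.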
Performance data:
FPS was measured for NoAA, MSAA, CMAA-before and CMAA-after on http://akirodic.com/p/jellyfish/ with 50 jellyfish.
The test machine is an Intel Haswell, Intel(R) Core(TM) i7-4900MQ CPU @ 2.80GHz.
FPS was measured with --show-fps-counter --enable-logging=stderr --vmodule="head*=1"
NoAA 25.2 FPS
MSAA 10.6 FPS
CMAA-before 19.9 FPS
CMAA-after 21.3 FPS
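To read the FPS figures above as per-frame cost, they can be converted to frame times. The helpers below are a minimal sketch; `FrameTimeMs` and `FpsGainPercent` are illustrative names, not part of the CL.

```cpp
#include <cassert>

// Converts frames per second to milliseconds per frame.
double FrameTimeMs(double fps) {
  assert(fps > 0.0);
  return 1000.0 / fps;
}

// Relative FPS change from `before` to `after`, as a percentage.
double FpsGainPercent(double before, double after) {
  assert(before > 0.0);
  return (after - before) / before * 100.0;
}
```

With the numbers reported here, CMAA-before at 19.9 FPS is about 50.3 ms per frame and CMAA-after at 21.3 FPS is about 46.9 ms per frame, so the change saves roughly 3.3 ms per frame, an FPS gain of about 7%.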
BUG=535198
TEST=Run a WebGL app on Chromebook Pixel 2015
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_optional_gpu_tests_rel;tryserver.chromium.mac:mac_optional_gpu_tests_rel;tryserver.chromium.win:win_optional_gpu_tests_rel
Committed: https://crrev.com/45a5dde9824768ca016b4abae13a05abbd99014c
Cr-Commit-Position: refs/heads/master@{#404132}
Patch Set 1
Patch Set 2 : update comments
Total comments: 1
Patch Set 3 : remove g_Depth uniform because frag shader controls depth by itself
Total comments: 2
Messages
Total messages: 20 (10 generated)
dongseong.hwang@intel.com changed reviewers: + adrian.belgun@intel.com, jon.kennedy@intel.com, piman@chromium.org, zmo@chromium.org
jon, adrian, could you review? zmo@, could you review as owner?
https://codereview.chromium.org/2125803002/diff/20001/gpu/command_buffer/serv...
File gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc (left):

https://codereview.chromium.org/2125803002/diff/20001/gpu/command_buffer/serv...
gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc:404: // detection to work correctly).
This comment is wrong, because all edge information is generated by the DETECT_EDGES1 path. Pixels that DETECT_EDGES1 did not mark carry no edge information.
remove g_Depth uniform because frag shader controls depth by itself

https://codereview.chromium.org/2125803002/diff/40001/gpu/command_buffer/serv...
File gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc (right):

https://codereview.chromium.org/2125803002/diff/40001/gpu/command_buffer/serv...
gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc:1157: gl_FragDepth = any(bvec4(outputEdges)) ? 1.0 : 0.0;
g_Depth is redundant.

https://codereview.chromium.org/2125803002/diff/40001/gpu/command_buffer/serv...
gpu/command_buffer/service/gles2_cmd_apply_framebuffer_attachment_cmaa_intel.cc:1425: any(greaterThan(outEdge4, uvec4(1))) ? 1.0 : 0.0;
g_Depth is redundant.
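For readers less familiar with GLSL, the two depth expressions quoted above can be mirrored on the CPU as follows. This is a sketch under assumptions: `DepthFromEdgeFlags` and `DepthFromPackedEdges` are hypothetical names, and the four components are assumed to have already been converted to booleans or unsigned integers as the GLSL `bvec4(...)` and `uvec4` types suggest.

```cpp
#include <array>
#include <cstdint>

// Mirrors `any(bvec4(outputEdges)) ? 1.0 : 0.0`:
// returns depth 1.0 if any of the four edge flags is set.
float DepthFromEdgeFlags(const std::array<bool, 4>& edges) {
  for (bool e : edges) {
    if (e) return 1.0f;
  }
  return 0.0f;
}

// Mirrors `any(greaterThan(outEdge4, uvec4(1))) ? 1.0 : 0.0`:
// returns depth 1.0 if any component is strictly greater than 1.
float DepthFromPackedEdges(const std::array<std::uint32_t, 4>& packed) {
  for (std::uint32_t v : packed) {
    if (v > 1u) return 1.0f;
  }
  return 0.0f;
}
```

Because the shader derives the depth value from its own edge output in this way, a separate uniform carrying the depth is not needed, which matches the reasoning given for patch set 3.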
lgtm
On 2016/07/06 18:39:02, piman wrote:
> lgtm

Thank you for reviewing. Filip Strugar, the author of CMAA, gave me lgtm via email because he is not familiar with Rietveld. Let me land.
The CQ bit was checked by dongseong.hwang@intel.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm
The CQ bit was unchecked by dongseong.hwang@intel.com
The CQ bit was checked by dongseong.hwang@intel.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Committed patchset #3 (id:40001)
Message was sent while issue was closed.
Patchset 3 (id:??) landed as https://crrev.com/45a5dde9824768ca016b4abae13a05abbd99014c
Cr-Commit-Position: refs/heads/master@{#404132}