|
|
DescriptionAVX 2 SrcOver blits: color32, blitmask.
As a follow up to the SSE 4.1 CL, this should look pretty familiar.
I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later.
Perf changes (relative to SSE 4.1):
Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit
Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent
text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit
text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit
This should be a big throughout win, and a small latency win.
This should all be pixel-exact to the previous SSE 4.1 code.
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1532613002
Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36
TODO:
- x86_64-cros-linux-gnu-g++ does not seem to support AVX2, despite accepting the -mavx2 command line option:
error: '__m256i' does not name a type (and many others)
- we don't appear to be passing /arch:avx2 correctly to MSVC, producing this warning-as-error:
warning C4752: found Intel(R) Advanced Vector Extensions; consider using /arch:AVX
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot,Build-Ubuntu-GCC-x86_64-Debug-CrOS_Link-Trybot,Build-Win-MSVC-x86_64-Release-Trybot
Patch Set 1 #Patch Set 2 : 8 at a time #Patch Set 3 : hacking #Patch Set 4 : fix #Patch Set 5 : tighter #Patch Set 6 : separate out packing strategy #Patch Set 7 : TODO #Patch Set 8 : small stuff #
Total comments: 4
Patch Set 9 : mul255 #Patch Set 10 : update SSE4.1 to match new style #Patch Set 11 : factoring #Patch Set 12 : replicate_coverage #Patch Set 13 : tiny tweak #Patch Set 14 : streamline n<4 #Patch Set 15 : consistency #Patch Set 16 : inline #Patch Set 17 : dup #Patch Set 18 : ah, struct #Patch Set 19 : more consistent like this #Patch Set 20 : turn avx2 back on #Patch Set 21 : branch-free tail coverage #Patch Set 22 : formatting tweaks #Patch Set 23 : undo buggy branch-free tail code #Patch Set 24 : i'd have sworn other_cflags used to work #
Messages
Total messages: 92 (50 generated)
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. Follow up to the SSE 4.1 CL. ========== to ========== AVX 2 SrcOver blits: color32, blitmask. Follow up to the SSE 4.1 CL. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/80001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Ubuntu-GCC-x86_64-Release-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-x86...)
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/100001
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. Follow up to the SSE 4.1 CL. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== AVX 2 SrcOver blits: color32, blitmask. Follow up to the SSE 4.1 CL. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1535443003 ==========
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/120001
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. Follow up to the SSE 4.1 CL. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1535443003 ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1535443003 ==========
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1535443003 ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1535443003 ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/140001
mtklein@chromium.org changed reviewers: + herb@google.com - mtklein@google.com
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1535443003 ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ==========
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/140001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/140001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/1532613002/diff/140001/src/opts/SkOpts_avx2.cpp File src/opts/SkOpts_avx2.cpp (right): https://codereview.chromium.org/1532613002/diff/140001/src/opts/SkOpts_avx2.c... src/opts/SkOpts_avx2.cpp:181: s_255_128 = div255_part1(_mm256_mullo_epi16(s, _mm256_set1_epi16(255))), Instead of _mm256_mullo_epi16, could we use _mm256_bslli_epi128? (s<<8) - s. This should be two cycles instead of 5. https://codereview.chromium.org/1532613002/diff/140001/src/opts/SkOpts_avx2.c... src/opts/SkOpts_avx2.cpp:205: dst += dstRB / sizeof(*dst); My paranoid lizard brain says that we should hoist the increments like : dstRB / sizeof(*dst), just because my lizard brain does not trust the compiler.
https://codereview.chromium.org/1532613002/diff/140001/src/opts/SkOpts_avx2.cpp File src/opts/SkOpts_avx2.cpp (right): https://codereview.chromium.org/1532613002/diff/140001/src/opts/SkOpts_avx2.c... src/opts/SkOpts_avx2.cpp:181: s_255_128 = div255_part1(_mm256_mullo_epi16(s, _mm256_set1_epi16(255))), On 2015/12/17 at 21:37:26, herb_g wrote: > Instead of _mm256_mullo_epi16, could we use _mm256_bslli_epi128? (s<<8) - s. This should be two cycles instead of 5. Done. I admit I didn't pay much attention to things that asymptotically cost 0. https://codereview.chromium.org/1532613002/diff/140001/src/opts/SkOpts_avx2.c... src/opts/SkOpts_avx2.cpp:205: dst += dstRB / sizeof(*dst); Nah, `ptr += k / sizeof(*ptr)` is as cheap as you can get for a pointer increment in C++. Remember, all pointers are really byte pointers, so this compiles into a simple increment by k, where something like int stride = k / sizeof(*ptr); ... ptr += stride; is potentially more expensive, as it's got to remultiply stride by sizeof(*ptr) to get it into bytes. (Though, in practice I bet it optimizes to the same code.) It's also not anywhere near the critical path. So, agreed, paranoid lizard brain.
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/160001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I plan to backport these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ==========
mtklein@chromium.org changed reviewers: - mtklein@google.com
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should be pixel-exact to the SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Mac10.8-Clang-MacMini4.1-CPU-SSE4-x86_64-Release-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/200001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/200001
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/220001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/220001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/240001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/240001
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/260001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/260001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/320001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/320001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/340001
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/360001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/360001
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/380001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/380001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/420001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/420001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/360002 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/360002
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/360002 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/360002
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: Test-Mac10.8-Clang-MacMini4.1-CPU-SSE4-x86_64-Release-Trybot on client.skia (JOB_FAILED, http://build.chromium.org/p/client.skia/builders/Test-Mac10.8-Clang-MacMini4....)
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/360002 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/360002
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: Test-Mac10.8-Clang-MacMini4.1-CPU-SSE4-x86_64-Release-Trybot on client.skia (JOB_FAILED, http://build.chromium.org/p/client.skia/builders/Test-Mac10.8-Clang-MacMini4....)
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/360002 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/360002
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: Build-Mac10.8-Clang-x86_64-Release-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Mac10.8-Clang-...)
The CQ bit was checked by mtklein@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from herb@google.com Link to the patchset: https://codereview.chromium.org/1532613002/#ps450001 (title: "i'd have sworn other_cflags used to work")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/450001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/450001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: Test-Mac10.8-Clang-MacMini4.1-CPU-SSE4-x86_64-Release-Trybot on client.skia (JOB_FAILED, http://build.chromium.org/p/client.skia/builders/Test-Mac10.8-Clang-MacMini4....)
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Mac10.8-Clang-MacMini4.1-CPU-SSE4-x86_64-Release-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ==========
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1532613002/450001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1532613002/450001
Message was sent while issue was closed.
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36 ==========
Message was sent while issue was closed.
Committed patchset #24 (id:450001) as https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36
Message was sent while issue was closed.
A revert of this CL (patchset #24 id:450001) has been created in https://codereview.chromium.org/1632713002/ by msarett@google.com. The reason for reverting is: Bot failures.
Message was sent while issue was closed.
Description was changed from ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36 ========== to ========== AVX 2 SrcOver blits: color32, blitmask. As a follow up to the SSE 4.1 CL, this should look pretty familiar. I've made some organizational changes around how we load, store, pack, and unpack data that I think makes things clearer and more orthogonal, and it'll make it easier to try out a pmaddubsw lerp. I have backported these changes to the SSE 4.1 code, and I hope that I can actually get a lot of this code templated for sharing between the two later. Perf changes (relative to SSE 4.1): Xfermode_SrcOver: 1650 -> 1180 (0.71x) // large opaque blit Xfermode_SrcOver_aa: 1794 -> 1653 (0.92x) // large opaque + small transparent text_16_AA_{FF,BK,WT}: 1.72 -> 1.59 (0.92x) // small opaque blit text_16_AA_88: 1.83 -> 1.77 (0.97x) // small transparent blit This should be a big throughout win, and a small latency win. This should all be pixel-exact to the previous SSE 4.1 code. GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... Committed: https://skia.googlesource.com/skia/+/5d2117015eb271e09faf4a7ddd89093c9d618a36 TODO: - x86_64-cros-linux-gnu-g++ does not seem to support AVX2, despite accepting the -mavx2 command line option: error: '__m256i' does not name a type (and many others) - we don't appear to be passing /arch:avx2 correctly to MSVC, producing this warning-as-error: warning C4752: found Intel(R) Advanced Vector Extensions; consider using /arch:AVX CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;client.skia.compile:Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot,Build-Mac10.9-Clang-x86_64-Release-CMake-Trybot,Build-Ubuntu-GCC-x86_64-Debug-CrOS_Link-Trybot,Build-Win-MSVC-x86_64-Release-Trybot ========== |