|
|
DescriptionClean up SkXfermode_opts.h
It seems that MSVC + __vectorcall don't play well together,
so back ourselves out into a situation where we don't need it.
- Inline transfermode functions. This removes the need for SK_VECTORCALL.
- Remove 565 destination specializations.
Blending into 565 is not speed-critical enough to merit the code bloat.
- Removing 565 specializations means a bunch of Sk4px code is now dead.
8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate.
565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate.
[1] the 565->8888 conversion is actually being autovectorized
BUG=skia:4765, skia:4776
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1565223002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
No public API changes.
TBR=reed@google.com
Committed: https://skia.googlesource.com/skia/+/defa0daa6a0f4e97a3527a522ae602c6771a7c80
Patch Set 1 #Patch Set 2 : names and refactoring #Patch Set 3 : maybe we dont need to throw away all perf #Patch Set 4 : typo #Patch Set 5 : &&&&&& #
Messages
Total messages: 31 (21 generated)
Description was changed from ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. BUG=skia:4765 ========== to ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. 8888 xfermodes generally speed up a bit, smoothly ranging from equal speed down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. 8888 xfermodes generally speed up a bit, smoothly ranging from equal speed down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not important enough (or a good enough idea) to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing them one pixel at a time, smoothly ranging from no change up to 3.8x for the slowest functions like Multiply and HardLight. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing them one pixel at a time, smoothly ranging from no change up to 3.8x for the slowest functions like Multiply and HardLight. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially* on the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. * the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially* on the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. * the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] on the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] on the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Clean up SkXfermode_opts.h Seems like MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1565223002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1565223002/60001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1565223002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1565223002/80001
mtklein@chromium.org changed reviewers: + lsalzman@mozilla.com, reed@google.com
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Any reason why the lambdas in the Map calls still need to be there? The only real function they seem to serve is to swap the order of the argument to the xfermode, whereas that could just as well be done by flipping those in the xfermode definition itself. That was one of the things I tried addressing in my own patch.
On 2016/01/07 21:04:56, lsalzman1 wrote: > Any reason why the lambdas in the Map calls still need to be there? The only > real function they seem to serve is to swap the order of the argument to the > xfermode, whereas that could just as well be done by flipping those in the > xfermode definition itself. That was one of the things I tried addressing in my > own patch. Ah, I noticed that too. Was thinking of just cleaning that up in another CL, given that it's a no-op refactor. I actually tried that once before, but I think I got stymied by MSVC. In retrospect, it was probably just miscompiling it even worse, so I'd rather follow up than do that first. :)
Description was changed from ========== Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765,skia:4776 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
lgtm lgtm
The CQ bit was checked by mtklein@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1565223002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1565223002/80001
Description was changed from ========== Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765,skia:4776 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765,skia:4776 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot No public API changes. TBR=reed@google.com ==========
Message was sent while issue was closed.
Description was changed from ========== Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765,skia:4776 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot No public API changes. TBR=reed@google.com ========== to ========== Clean up SkXfermode_opts.h It seems that MSVC + __vectorcall don't play well together, so back ourselves out into a situation where we don't need it. - Inline transfermode functions. This removes the need for SK_VECTORCALL. - Remove 565 destination specializations. Blending into 565 is not speed-critical enough to merit the code bloat. - Removing 565 specializations means a bunch of Sk4px code is now dead. 8888 xfermodes generally speed up a bit from inlining, smoothly ranging from no change down to 0.65x for the fastest functions like Plus or Modulate. 565 xfermodes generally slow down because we're doing 565 -> 8888 and 8888->565 conversion serially[1] and using the stack, smoothly ranging from no change up to 2x slower for the fastest functions like Plus and Modulate. [1] the 565->8888 conversion is actually being autovectorized BUG=skia:4765,skia:4776 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot No public API changes. TBR=reed@google.com Committed: https://skia.googlesource.com/skia/+/defa0daa6a0f4e97a3527a522ae602c6771a7c80 ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001) as https://skia.googlesource.com/skia/+/defa0daa6a0f4e97a3527a522ae602c6771a7c80
Message was sent while issue was closed.
Patchset #6 (id:100001) has been deleted |