|
|
Descriptionarithmetic mode with Sk4f
After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package.
Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time:
- short circuits on 0x00 and 0xff coverage;
- floating point multiplication with untrusted k1-k4; if someone figures out a clever way to sometimes create denorm floats and sometimes not, there's a gigantic performance difference.
I would hazard the pin is constant time now though.
I've also fixed the lerp to lerp between dst and r instead of src and r. That can't have been right.
curr/maxrss loops min median mean max stddev samples config bench
9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa
9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa
9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm
9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic
~~~~>
curr/maxrss loops min median mean max stddev samples config bench
9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa
9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa
9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm
9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1873963003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/d58f840650a6768b50d024247e2817ccbacd8a0d
Patch Set 1 #Patch Set 2 : all together #Messages
Total messages: 21 (11 generated)
Description was changed from ========== arithmetic mode with Sk4f BUG=skia: ========== to ========== arithmetic mode with Sk4f BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== arithmetic mode with Sk4f BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== arithmetic mode with Sk4f curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== arithmetic mode with Sk4f curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== arithmetic mode with Sk4f I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage - floating point multiplication with untrusted k1-k4 curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== arithmetic mode with Sk4f I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage - floating point multiplication with untrusted k1-k4 curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== arithmetic mode with Sk4f After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage - floating point multiplication with untrusted k1-k4 curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== arithmetic mode with Sk4f After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage - floating point multiplication with untrusted k1-k4 curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== arithmetic mode with Sk4f After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage - floating point multiplication with untrusted k1-k4 I've also fixed the lerp to lerp between dst and r instead of src and r. That can't have been right. curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== arithmetic mode with Sk4f After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage - floating point multiplication with untrusted k1-k4 I've also fixed the lerp to lerp between dst and r instead of src and r. That can't have been right. curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== arithmetic mode with Sk4f After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage; - floating point multiplication with untrusted k1-k4; if someone figures out a clever way to sometimes create denorm floats and sometimes not, there's a gigantic performance difference. I would hazard the pin is constant time now though. I've also fixed the lerp to lerp between dst and r instead of src and r. That can't have been right. curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
mtklein@chromium.org changed reviewers: + senorblanco@chromium.org
(The raw nanobench times obviously vary from your code and machine, but the speedups are all qualitatively the same.)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1873963003/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1873963003/20001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/04/08 at 22:29:25, commit-bot wrote: > Dry run: This issue passed the CQ dry run. Neat, we can see the lerp fix in gm/aaxfermodes. The AA edges are no longer disturbingly bright and jagged.
This is freakin' awesome! Even if the perf were slightly worse that my hand-rolled SSE2 (which I doubt), it'd be worth it for the clarity and maintenance win alone. I also didn't realize (until this made me check) that we've made NEON min-spec on ARM. That makes life sooo much easier. Does it give a pixel-exact match for the old code in the non-AA case? Otherwise it'll break layout tests in Chrome. (Chrome doesn't use the AA path, so changing those is ok.)
Just to answer my own questions: this absolutely destroys my version (at least on this Mac where it gets to do PSHUFB and other SSSE3 goodness). And the non-AA GMs look to be an exact match. A most hearty LGTM!
On 2016/04/09 at 23:45:58, senorblanco wrote: > Just to answer my own questions: this absolutely destroys my version (at least on this Mac where it gets to do PSHUFB and other SSSE3 goodness). And the non-AA GMs look to be an exact match. > > A most hearty LGTM! :)
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1873963003/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1873963003/20001
Message was sent while issue was closed.
Description was changed from ========== arithmetic mode with Sk4f After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage; - floating point multiplication with untrusted k1-k4; if someone figures out a clever way to sometimes create denorm floats and sometimes not, there's a gigantic performance difference. I would hazard the pin is constant time now though. I've also fixed the lerp to lerp between dst and r instead of src and r. That can't have been right. curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== arithmetic mode with Sk4f After reading the SSE version, I figured I'd show off the new hotness a little. This'll get us SSE, NEON and portable implementations all in one easy to read package. Since we've been talking about it, it's worth noting the several ways this implementation is still not constant time: - short circuits on 0x00 and 0xff coverage; - floating point multiplication with untrusted k1-k4; if someone figures out a clever way to sometimes create denorm floats and sometimes not, there's a gigantic performance difference. I would hazard the pin is constant time now though. I've also fixed the lerp to lerp between dst and r instead of src and r. That can't have been right. curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 25.5ms 25.5ms 25.5ms 25.5ms 0% ▃▁▁▃▂▇▅▆▇█ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 24.1ms 24.2ms 24.2ms 24.3ms 0% ▄▃▁▄█▆▆█▃█ 8888 Xfermode_arithmetic_aa 9/9 MB 1 102ms 102ms 102ms 103ms 0% ▁▅▂▆▂█▂█▁▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 94.8ms 95.4ms 95.2ms 95.8ms 0% ▅▅▁▁▁▁▄▇█▇ 8888 Xfermode_arithmetic ~~~~> curr/maxrss loops min median mean max stddev samples config bench 9/9 MB 1 9.71ms 9.74ms 9.73ms 9.78ms 0% █▅▄▄▁▂▂▂▄▄ 8888 Xfermode_arithmetic_enforce_pm_aa 9/9 MB 1 9.5ms 9.57ms 9.58ms 9.7ms 1% ▂▁█▅▂▂▆▃▄▄ 8888 Xfermode_arithmetic_aa 9/9 MB 1 21.8ms 21.8ms 21.8ms 21.9ms 0% █▂▂▂▂▂▂▁▄▂ 8888 Xfermode_arithmetic_enforce_pm 9/9 MB 1 16.5ms 16.6ms 16.6ms 16.6ms 0% ▃█▁▁▄▄▁▁▆▅ 8888 Xfermode_arithmetic BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/d58f840650a6768b50d024247e2817ccbacd8a0d ==========
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://skia.googlesource.com/skia/+/d58f840650a6768b50d024247e2817ccbacd8a0d |