|
|
Descriptionfloat xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX.
Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x
Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x
Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x
Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x
Xfermode_ColorDodge 25ms -> 16.5ms 0.66x
Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x
Ought to be no pixel diffs:
https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dgm&master=false
Incidental stuff:
I made the SkNx(T) constructors implicit to make writing math expressions simpler.
This allows us to write expressions like
Sk4f v;
...
v = v*4;
rather than
Sk4f v;
...
v = v * Sk4f(4);
As written it only works when the constant is on the right-hand side,
so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on
following up with a CL that lets those become `(1 - da)` too.
BUG=skia:4117
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/084db25d47dbad3ffbd7d15c04b63d344b351f90
Patch Set 1 #Patch Set 2 : 1.0f/255 #
Total comments: 14
Messages
Total messages: 25 (13 generated)
Description was changed from ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1432903002/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1432903002/1
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
Description was changed from ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1432903002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1432903002/20001
Description was changed from ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1431143002 ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1431143002 ========== to ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x Ought to be no pixel diffs: https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dg... Incidental stuff: I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1431143002 ==========
mtklein@chromium.org changed reviewers: + msarett@google.com
"I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4);" I'm not sure I'm convinced that this is better than before. v = v * Sk4f(4); Seems to indicate that vector math is happening, while the alternative causes me to pause and try to figure our what's going on. On the other hand, it's nice to get rid of T(4) and other similar expressions. Also, this seems to be keeping with overriding +, *, -, etc. https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkNx_avx.h File src/opts/SkNx_avx.h (right): https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkNx_avx.h#new... src/opts/SkNx_avx.h:32: __m256i fix32 = _mm256_insertf128_si256(_mm256_castsi128_si256(lo32), hi32, 1); Seems annoying to do all the unpacking in 128-bit and then combine to 256-bits. But I guess we don't get 256-bit integer unpack/shuffle instructions until AVX2. https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opts.h File src/opts/SkXfermode_opts.h (right): https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:126: static inline Sk8f alphas(const Sk8f& f) { return SkNx_shuffle<3,3,3,3,7,7,7,7>(f); } Where is shuffle defined for AVX? Oh, it looks like it uses SkNx's default implementation with kth(). Is a platform specific implementation a TODO? Or can we not do better than the default? https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:254: void xfer32(SkPMColor dst[], const SkPMColor src[], int n, const SkAlpha aa[]) const override { It looks like you've deleted some private helper functions and pulled them into the implementation of this function? Which allows you to efficiently combine Sk8f and Sk4f? https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:257: auto d = Sk8f::FromBytes((const uint8_t*)dst) * (1.0f/255), nit: Any reason not to write 1.0f / 255.0f? https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:289: dst[i] = SkPixel32ToPixel16(dst32); // Repack dst to 565 and store. This seems slow? Although not different from before.
On 2015/11/09 at 23:25:06, msarett wrote: > "I made the SkNx(T) constructors implicit to make writing math expressions simpler. > This allows us to write expressions like > Sk4f v; > ... > v = v*4; > rather than > Sk4f v; > ... > v = v * Sk4f(4);" > > I'm not sure I'm convinced that this is better than before. > > v = v * Sk4f(4); > Seems to indicate that vector math is happening, while the alternative causes me to pause and try to figure our what's going on. > > On the other hand, it's nice to get rid of T(4) and other similar expressions. Also, this seems to be keeping with overriding +, *, -, etc. Pretty much exactly how I feel about it.
mtklein@google.com changed reviewers: + mtklein@google.com
https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkNx_avx.h File src/opts/SkNx_avx.h (right): https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkNx_avx.h#new... src/opts/SkNx_avx.h:32: __m256i fix32 = _mm256_insertf128_si256(_mm256_castsi128_si256(lo32), hi32, 1); On 2015/11/09 at 23:25:06, msarett wrote: > Seems annoying to do all the unpacking in 128-bit and then combine to 256-bits. But I guess we don't get 256-bit integer unpack/shuffle instructions until AVX2. Right. Even then, the AVX and AVX2 packing, unpacking, and shuffling instructions can't cross from the low 128 bits to the high 128 bits or the other way around. They basically only work on 2 128-bit arguments in parallel. They're depressingly disappointing. E.g., take a look at the pseudocode for vpshufb... at first you think, that looks weird, why is it written so complicated... and then you're like, oh, it just sucks that's why: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3868,474... The only unpacking instruction that's really useful for us and that works in the way you'd hope is vpmovzxbd to zero-extend 8 bytes to 8 ints, but that's AVX2. That'd get us right from fix8 to fix32 in this code (and I think can even take a memory argument to read from, compiling the entire function up to __m256i fix32 = ...; into a single instruction). https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opts.h File src/opts/SkXfermode_opts.h (right): https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:126: static inline Sk8f alphas(const Sk8f& f) { return SkNx_shuffle<3,3,3,3,7,7,7,7>(f); } On 2015/11/09 at 23:25:06, msarett wrote: > Where is shuffle defined for AVX? > > Oh, it looks like it uses SkNx's default implementation with kth(). Is a platform specific implementation a TODO? Or can we not do better than the default? Right, default implementation. This compiles into one instruction, vpermilps $255, %ymmN, %ymmM. (Shuffle element 3 into each of lanes 0,1,2,3 and element 3+4=7 into 4,5,6,7). Pretty hard to beat. https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:254: void xfer32(SkPMColor dst[], const SkPMColor src[], int n, const SkAlpha aa[]) const override { On 2015/11/09 at 23:25:06, msarett wrote: > It looks like you've deleted some private helper functions and pulled them into the implementation of this function? Which allows you to efficiently combine Sk8f and Sk4f? Yep, just moving things around. https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:257: auto d = Sk8f::FromBytes((const uint8_t*)dst) * (1.0f/255), On 2015/11/09 at 23:25:06, msarett wrote: > nit: Any reason not to write 1.0f / 255.0f? Nope, either way works the same and gets to the same compile time constant. I'd write 1.0/255, but MSVC complains that 1.0/255 != 1.0f/255. Gotta get an 'f' in there somewhere. https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:289: dst[i] = SkPixel32ToPixel16(dst32); // Repack dst to 565 and store. On 2015/11/09 at 23:25:06, msarett wrote: > This seems slow? Although not different from before. Right. No different from before. Just always 1 pixel at a time. Seems like a fine compromise for 565.
lgtm https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkNx_avx.h File src/opts/SkNx_avx.h (right): https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkNx_avx.h#new... src/opts/SkNx_avx.h:32: __m256i fix32 = _mm256_insertf128_si256(_mm256_castsi128_si256(lo32), hi32, 1); On 2015/11/10 00:22:05, mtklein wrote: > On 2015/11/09 at 23:25:06, msarett wrote: > > Seems annoying to do all the unpacking in 128-bit and then combine to > 256-bits. But I guess we don't get 256-bit integer unpack/shuffle instructions > until AVX2. > > Right. Even then, the AVX and AVX2 packing, unpacking, and shuffling > instructions can't cross from the low 128 bits to the high 128 bits or the other > way around. They basically only work on 2 128-bit arguments in parallel. > They're depressingly disappointing. E.g., take a look at the pseudocode for > vpshufb... at first you think, that looks weird, why is it written so > complicated... and then you're like, oh, it just sucks that's why: > https://software.intel.com/sites/landingpage/IntrinsicsGuide/#expand=3868,474... > Good point, forgot about that. > The only unpacking instruction that's really useful for us and that works in the > way you'd hope is vpmovzxbd to zero-extend 8 bytes to 8 ints, but that's AVX2. > That'd get us right from fix8 to fix32 in this code (and I think can even take a > memory argument to read from, compiling the entire function up to __m256i fix32 > = ...; into a single instruction). Cool! https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opts.h File src/opts/SkXfermode_opts.h (right): https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:126: static inline Sk8f alphas(const Sk8f& f) { return SkNx_shuffle<3,3,3,3,7,7,7,7>(f); } On 2015/11/10 00:22:05, mtklein wrote: > On 2015/11/09 at 23:25:06, msarett wrote: > > Where is shuffle defined for AVX? > > > > Oh, it looks like it uses SkNx's default implementation with kth(). Is a > platform specific implementation a TODO? Or can we not do better than the > default? > > Right, default implementation. This compiles into one instruction, vpermilps > $255, %ymmN, %ymmM. (Shuffle element 3 into each of lanes 0,1,2,3 and element > 3+4=7 into 4,5,6,7). Pretty hard to beat. Great, cool instruction! I'm impressed by the compiler. Is having the shared code the benefit of obscuring this? https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:254: void xfer32(SkPMColor dst[], const SkPMColor src[], int n, const SkAlpha aa[]) const override { On 2015/11/10 00:22:05, mtklein wrote: > On 2015/11/09 at 23:25:06, msarett wrote: > > It looks like you've deleted some private helper functions and pulled them > into the implementation of this function? Which allows you to efficiently > combine Sk8f and Sk4f? > > Yep, just moving things around. Acknowledged.
https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opts.h File src/opts/SkXfermode_opts.h (right): https://codereview.chromium.org/1432903002/diff/20001/src/opts/SkXfermode_opt... src/opts/SkXfermode_opts.h:126: static inline Sk8f alphas(const Sk8f& f) { return SkNx_shuffle<3,3,3,3,7,7,7,7>(f); } On 2015/11/10 at 14:54:16, msarett wrote: > On 2015/11/10 00:22:05, mtklein wrote: > > On 2015/11/09 at 23:25:06, msarett wrote: > > > Where is shuffle defined for AVX? > > > > > > Oh, it looks like it uses SkNx's default implementation with kth(). Is a > > platform specific implementation a TODO? Or can we not do better than the > > default? > > > > Right, default implementation. This compiles into one instruction, vpermilps > > $255, %ymmN, %ymmM. (Shuffle element 3 into each of lanes 0,1,2,3 and element > > 3+4=7 into 4,5,6,7). Pretty hard to beat. > > Great, cool instruction! > > I'm impressed by the compiler. Is having the shared code the benefit of obscuring this? Yeah, Clang's pretty impressive. Mostly SkNx_shuffle works the way it does in case we find things the compiler can't quite do best. There have been a couple shuffles we think we can do better on NEON, for instance, though none are currently landed. SkNx_shuffle also gives us a point where we can hook in and help out less smart compilers, not to name names.
Description was changed from ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x Ought to be no pixel diffs: https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dg... Incidental stuff: I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot COMMIT=false Depends on https://codereview.chromium.org/1431143002 ========== to ========== float xfermodes (burn, dodge, softlight) in Sk8f, possibly using AVX. Xfermode_ColorDodge_aa 10.3ms -> 7.85ms 0.76x Xfermode_SoftLight_aa 13.8ms -> 10.2ms 0.74x Xfermode_ColorBurn_aa 10.7ms -> 7.82ms 0.73x Xfermode_SoftLight 33.6ms -> 23.2ms 0.69x Xfermode_ColorDodge 25ms -> 16.5ms 0.66x Xfermode_ColorBurn 26.1ms -> 16.6ms 0.63x Ought to be no pixel diffs: https://gold.skia.org/search2?issue=1432903002&unt=true&query=source_type%3Dg... Incidental stuff: I made the SkNx(T) constructors implicit to make writing math expressions simpler. This allows us to write expressions like Sk4f v; ... v = v*4; rather than Sk4f v; ... v = v * Sk4f(4); As written it only works when the constant is on the right-hand side, so expressions like `(Sk4f(1) - da)` have to stay for now. I plan on following up with a CL that lets those become `(1 - da)` too. BUG=skia:4117 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1432903002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1432903002/20001
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://skia.googlesource.com/skia/+/084db25d47dbad3ffbd7d15c04b63d344b351f90 |