Description
Add NEON swap opts and use opts in SkSwizzler
All RGBA, RGBX, BGRA, BGRX routines in SkSwizzler now use the fast
(optimized) routines, with the exception of conversions to 565.
Swizzle Time for swap_rb
0.94x Nexus 9
0.81x Nexus 6P
Unpremul Decode Time for RGBA PNGs***
ZeroInit 0.93x Nexus 9
Regular 0.94x Nexus 9
ZeroInit 0.97x Nexus 6P
ZeroInit 0.95x Nexus 6P
***Two Notes:
(1) The improvements here are actually due to taking advantage of
    memcpy() (no need to swap, the bytes are already in the proper
    order).
(2) ZeroInit skips writing zeros to zero-initialized memory. This
    is a memory use opt in Android.
BMP decodes should also benefit from these improvements.
I am relying on Gold to help test all possible cases.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1581933006
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/03108de163354fa574679ad153b58ce57126b2ba
Patch Set 1
Total comments: 15
Patch Set 2 : Removing unrelated code
Patch Set 3 : Remove opts for kBGRX
Total comments: 4
Patch Set 4 : Use SkTSwap
Messages
Total messages: 23 (10 generated)
Patchset #1 (id:1) has been deleted
msarett@google.com changed reviewers: + mtklein@google.com, scroggo@google.com
https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:322: memcpy(dst, src + offset, width * bpp);
This could be a no-op if we decoded directly into dst. I think this makes an argument to try decoding directly into dst.

I could see us making dst == src on decodes where the SrcConfig and DstColorType have the same number of bytes per pixel.

One potential complication is subset decodes (where offset is non-zero).

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:734: case kRGB_565_SkColorType:
We have never needed this because we "fill" RGB PNGs to RGB(FF) in libpng.

We will stop doing this in libpng because it is slow and an extra pass over the output.

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:741: case kBGR:
This will diverge from BGRX because of the opt functions.

https://codereview.chromium.org/1581933006/diff/20001/src/opts/SkSwizzler_opts.h
File src/opts/SkSwizzler_opts.h (right):

https://codereview.chromium.org/1581933006/diff/20001/src/opts/SkSwizzler_opt...
src/opts/SkSwizzler_opts.h:145: uint8x16_t r = bgra.val[2],
No matter how I write this code, clang really wants to do it in 4 moves (though it seems like it should be possible in 3). Maybe there is a limitation on which registers can load and which can store? Or maybe clang is dumb?

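For illustration, a minimal sketch of the two row paths discussed in the comment on line 322: a plain copy when the source bytes are already in destination order, and a scalar red/blue swap when they are not. The helper names `copy_row_32` and `swap_rb_row` are hypothetical and are not Skia's API; `offset` is assumed to be a byte offset into the source row, as in the quoted `memcpy` call.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not Skia's API): when the source channel order already
// matches the destination, the "swizzle" is just a copy of width * bpp bytes.
// `offset` is a byte offset into src and is non-zero for subset decodes.
static void copy_row_32(void* dst, const uint8_t* src, int width, int bpp, int offset) {
    std::memcpy(dst, src + offset, static_cast<size_t>(width) * bpp);
}

// Hypothetical scalar reference for the case where the orders differ:
// bytes 0 and 2 of every 4-byte pixel trade places, bytes 1 and 3 stay put.
static void swap_rb_row(uint8_t* dst, const uint8_t* src, int width) {
    for (int i = 0; i < width; i++) {
        dst[4 * i + 0] = src[4 * i + 2];
        dst[4 * i + 1] = src[4 * i + 1];
        dst[4 * i + 2] = src[4 * i + 0];
        dst[4 * i + 3] = src[4 * i + 3];
    }
}
```

If the decoder wrote directly into dst, the first path would disappear entirely, which is the no-op case raised above; a non-zero offset is what makes that harder for subset decodes.
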
https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:322: memcpy(dst, src + offset, width * bpp);
On 2016/01/14 17:08:25, msarett wrote:
> This could be a no-op if we decoded directly into dst. I think this makes an
> argument to try decoding directly into dst.
>
> I could see us making dst == src on decodes where the SrcConfig and DstColorType
> have the same number of bytes per pixel.
>
> One potential complication is subset decodes (where offset is non-zero).

+1

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:366: static void fast_swizzle_bgra_to_n32_premul(
Should we have a fast_swizzle to unpremul, as well?

Also, we've talked about decoding to the non-native 32 bit version. Should we add those along with these changes or hold off?

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:734: case kRGB_565_SkColorType:
On 2016/01/14 17:08:25, msarett wrote:
> We have never needed this because we "fill" RGB PNGs to RGB(FF) in libpng.
>
> We will stop doing this in libpng because it is slow and an extra pass over the
> output.

Is that a separate (future) CL?

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:734: case kRGB_565_SkColorType:
On 2016/01/14 17:55:06, scroggo wrote:
> On 2016/01/14 17:08:25, msarett wrote:
> > We have never needed this because we "fill" RGB PNGs to RGB(FF) in libpng.
> >
> > We will stop doing this in libpng because it is slow and an extra pass over
> > the output.
>
> Is that a separate (future) CL?

Yes. Maybe this doesn't belong in this CL?

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:734: case kRGB_565_SkColorType:
On 2016/01/14 18:09:09, msarett wrote:
> On 2016/01/14 17:55:06, scroggo wrote:
> > On 2016/01/14 17:08:25, msarett wrote:
> > > We have never needed this because we "fill" RGB PNGs to RGB(FF) in libpng.
> > >
> > > We will stop doing this in libpng because it is slow and an extra pass over
> > > the output.
> >
> > Is that a separate (future) CL?
>
> Yes. Maybe this doesn't belong in this CL?

If it's only needed once we turn off filling, then I don't think it belongs. (Unless it makes the rest of this code simpler?)

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:741: case kBGR:
On 2016/01/14 17:08:25, msarett wrote:
> This will diverge from BGRX because of the opt functions.

Will there be opts for BGR? Or is that not conducive to using SIMD? Would it be faster to make two passes: one to fill (in libpng) and then use SIMD to swizzle? I'm guessing not?

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:366: static void fast_swizzle_bgra_to_n32_premul(
On 2016/01/14 17:55:06, scroggo wrote:
> Should we have a fast_swizzle to unpremul, as well?
>
> Also, we've talked about decoding to the non-native 32 bit version. Should we
> add those along with these changes or hold off?

Sorry I missed this.

This is implemented in this CL also. It uses fast_swizzle_bgrx_to_n32 since the code is identical. I originally had them separated, but then I realized that the code was the same.

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:741: case kBGR:
On 2016/01/14 18:14:51, scroggo wrote:
> On 2016/01/14 17:08:25, msarett wrote:
> > This will diverge from BGRX because of the opt functions.
>
> Will there be opts for BGR? Or is that not conducive to using SIMD? Would it be
> faster to make two passes: one to fill (in libpng) and then use SIMD to swizzle?
> I'm guessing not?

Yes, I plan on opts for BGR and RGB (coming soon). What you've suggested is an option, but I think we can do better than that.

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:734: case kRGB_565_SkColorType:
On 2016/01/14 18:14:51, scroggo wrote:
> On 2016/01/14 18:09:09, msarett wrote:
> > On 2016/01/14 17:55:06, scroggo wrote:
> > > On 2016/01/14 17:08:25, msarett wrote:
> > > > We have never needed this because we "fill" RGB PNGs to RGB(FF) in libpng.
> > > >
> > > > We will stop doing this in libpng because it is slow and an extra pass
> > > > over the output.
> > >
> > > Is that a separate (future) CL?
> >
> > Yes. Maybe this doesn't belong in this CL?
>
> If it's only needed once we turn off filling, then I don't think it belongs.
> (Unless it makes the rest of this code simpler?)

Done.

One naming nit; otherwise lgtm. Again deferring to Mike on the details.

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:366: static void fast_swizzle_bgra_to_n32_premul(
On 2016/01/14 18:23:19, msarett wrote:
> On 2016/01/14 17:55:06, scroggo wrote:
> > Should we have a fast_swizzle to unpremul, as well?
> >
> > Also, we've talked about decoding to the non-native 32 bit version. Should we
> > add those along with these changes or hold off?
>
> Sorry I missed this.
>
> This is implemented in this CL also. It uses fast_swizzle_bgrx_to_n32 since the
> code is identical. I originally had them separated, but then I realized that
> the code was the same.

Ah, then should these functions be named something other than "n32"? _to_32?

I've removed the kBGRX opts because we cannot guarantee that the "X" will be "FF" (BMP is the only user of this case anyway). This makes the names confusing. In a follow-up change, I will make a distinction between kBGRX/kBGRF and kRGBX/kRGBF.

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cpp
File src/codec/SkSwizzler.cpp (right):

https://codereview.chromium.org/1581933006/diff/20001/src/codec/SkSwizzler.cp...
src/codec/SkSwizzler.cpp:366: static void fast_swizzle_bgra_to_n32_premul(
On 2016/01/14 19:08:06, scroggo wrote:
> On 2016/01/14 18:23:19, msarett wrote:
> > On 2016/01/14 17:55:06, scroggo wrote:
> > > Should we have a fast_swizzle to unpremul, as well?
> > >
> > > Also, we've talked about decoding to the non-native 32 bit version. Should
> > > we add those along with these changes or hold off?
> >
> > Sorry I missed this.
> >
> > This is implemented in this CL also. It uses fast_swizzle_bgrx_to_n32 since
> > the code is identical. I originally had them separated, but then I realized
> > that the code was the same.
>
> Ah, then should these functions be named something other than "n32"? _to_32?

Changed to _to_32.

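To make the kBGRX concern concrete, here is a small sketch under stated assumptions: a swap-only routine passes the fourth byte through untouched, so if the source "X" is not already FF the output is not opaque. Both function names are hypothetical, the routines are scalar stand-ins rather than the NEON opts, and pixels are treated as 32-bit words with the "X" byte in the top 8 bits.

```cpp
#include <cstdint>

// Hypothetical swap-only routine: exchanges bits 0-7 with bits 16-23 and
// leaves bits 8-15 and 24-31 exactly as they were in the source.
static void swap_rb_keep_x(uint32_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; i++) {
        const uint32_t p = src[i];
        dst[i] = (p & 0xFF00FF00u) | ((p & 0x00FF0000u) >> 16) | ((p & 0x000000FFu) << 16);
    }
}

// Hypothetical variant that does not trust the source "X" byte and forces
// the top byte to FF so the result is always opaque.
static void swap_rb_force_opaque(uint32_t* dst, const uint32_t* src, int count) {
    for (int i = 0; i < count; i++) {
        const uint32_t p = src[i];
        dst[i] = 0xFF000000u | (p & 0x0000FF00u) | ((p & 0x00FF0000u) >> 16) |
                 ((p & 0x000000FFu) << 16);
    }
}
```

With a source word whose top byte is 00, the first routine yields a pixel that reads as fully transparent, which is why the swap-only opt is only safe when the fourth byte is known to be FF.
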
I looked mostly at SkSwizzler_opts.h. If there's anything tricky I should see in SkSwizzler.cpp, let me know?

https://codereview.chromium.org/1581933006/diff/60001/src/opts/SkSwizzler_opts.h
File src/opts/SkSwizzler_opts.h (right):

https://codereview.chromium.org/1581933006/diff/60001/src/opts/SkSwizzler_opt...
src/opts/SkSwizzler_opts.h:148: bgra.val[0] = r;
Consider replacing this with:
SkTSwap(bgra.val[0], bgra.val[2]);
(and below).

https://codereview.chromium.org/1581933006/diff/60001/src/opts/SkSwizzler_opt...
src/opts/SkSwizzler_opts.h:157: while (count >= 8) {
My math says this "while" can be written as "if".

Nothing too complicated in SkSwizzler.cpp. Leon's looked over that file.

https://codereview.chromium.org/1581933006/diff/60001/src/opts/SkSwizzler_opts.h
File src/opts/SkSwizzler_opts.h (right):

https://codereview.chromium.org/1581933006/diff/60001/src/opts/SkSwizzler_opt...
src/opts/SkSwizzler_opts.h:148: bgra.val[0] = r;
On 2016/01/15 17:50:50, mtklein wrote:
> Consider replacing this with:
> SkTSwap(bgra.val[0], bgra.val[2]);
> (and below).

Done.

https://codereview.chromium.org/1581933006/diff/60001/src/opts/SkSwizzler_opt...
src/opts/SkSwizzler_opts.h:157: while (count >= 8) {
On 2016/01/15 17:50:50, mtklein wrote:
> my math says this "while" can be written as "if".

Done.

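A rough sketch of the loop shape the two comments above describe, not the committed code. It assumes a 16-pixel main loop followed by an 8-pixel step, uses `std::swap` as a stand-in for `SkTSwap`, and assumes little-endian pixels so that byte 0 of each pixel is the low 8 bits of the word. The name `swap_rb_sketch` is hypothetical. The NEON path only compiles where `__ARM_NEON` is defined; elsewhere the scalar loop handles every pixel.

```cpp
#include <cstdint>
#include <utility>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

// Hypothetical sketch: swap bytes 0 and 2 of each 32-bit pixel.
static void swap_rb_sketch(uint32_t* dst, const uint32_t* src, int count) {
#if defined(__ARM_NEON)
    const uint8_t* s = reinterpret_cast<const uint8_t*>(src);
    uint8_t* d = reinterpret_cast<uint8_t*>(dst);
    while (count >= 16) {
        // vld4q_u8 de-interleaves 16 pixels into four 16-byte channel planes.
        uint8x16x4_t px = vld4q_u8(s);
        std::swap(px.val[0], px.val[2]);
        vst4q_u8(d, px);
        s += 64; d += 64; count -= 16;
    }
    // After the loop above count < 16, so an 8-pixel step can run at most
    // once: an "if" is sufficient where a "while" was written before.
    if (count >= 8) {
        uint8x8x4_t px = vld4_u8(s);
        std::swap(px.val[0], px.val[2]);
        vst4_u8(d, px);
        s += 32; d += 32; count -= 8;
    }
    src = reinterpret_cast<const uint32_t*>(s);
    dst = reinterpret_cast<uint32_t*>(d);
#endif
    // Scalar tail (and the whole job on non-NEON builds).
    for (int i = 0; i < count; i++) {
        const uint32_t p = src[i];
        dst[i] = (p & 0xFF00FF00u) | ((p & 0x00FF0000u) >> 16) | ((p & 0x000000FFu) << 16);
    }
}
```

A count such as 27 exercises all three stages on a NEON build: one 16-pixel iteration, one 8-pixel step, and three scalar pixels.
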
lgtm
The CQ bit was checked by msarett@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1581933006/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1581933006/80001
Message was sent while issue was closed.
Committed patchset #4 (id:80001) as https://skia.googlesource.com/skia/+/03108de163354fa574679ad153b58ce57126b2ba