Description:
SSSE3 optimizations for gray -> RGBA (or BGRA)
Swizzle Bench Runtime
Dell Venue 8 0.16x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.80x
HP z620 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1657393002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/0700651128f8c505da65e651f9788589593f07c4
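For reference, the operation being optimized maps each 8-bit gray sample to an opaque 32-bit pixel. R, G and B all receive the same value, so a single routine produces correct output for both RGBA and BGRA byte orders, which is presumably why the title says "RGBA (or BGRA)". Below is a minimal scalar sketch under stated assumptions: it takes a typed `const uint8_t*` source instead of the `const void* vsrc` in the signature reviewed below, and the name `gray_to_RGB1_scalar` is hypothetical, not Skia's portable code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar reference for gray -> RGB1 (opaque RGBA/BGRA).
// Not the Skia implementation; shown only to pin down what the SIMD code computes.
static void gray_to_RGB1_scalar(uint32_t dst[], const uint8_t* src, int count) {
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        const uint32_t g = src[i];
        // Gray in the three color bytes, 0xFF in the alpha (top) byte.
        // Because the color bytes are equal, RGBA and BGRA layouts coincide.
        dst[i] = 0xFF000000u | (g << 16) | (g << 8) | g;
    }
}
```

For example, a gray value of 0x80 becomes the pixel 0xFF808080.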
Patch Set 1
Total comments: 6
Patch Set 2: Unpack based approach
Patch Set 3: Fix windows bot

Messages
Total messages: 23 (11 generated)
Description was changed from
==========
SSSE3 optimizations for gray -> RGBA (or BGRA)
Swizzle Bench Runtime
Dell Venue 8 0.34x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.85x
HP z620 0.96x
BUG=skia:4767
==========
to
==========
SSSE3 optimizations for gray -> RGBA (or BGRA)
Swizzle Bench Runtime
Dell Venue 8 0.34x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.85x
HP z620 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
==========
msarett@google.com changed reviewers: + mtklein@google.com
https://codereview.chromium.org/1657393002/diff/20001/src/opts/SkSwizzler_opts.h
File src/opts/SkSwizzler_opts.h (right):

https://codereview.chromium.org/1657393002/diff/20001/src/opts/SkSwizzler_opt...
src/opts/SkSwizzler_opts.h:455: static void gray_to_RGB1(uint32_t dst[], const void* vsrc, int count) {
This performs almost identically to an *unpack*-based approach. It may be just slightly faster, since it needs a couple fewer MOV instructions. The *unpack* approach requires more MOVs because unpack is destructive to both argument registers.
lgtm

On 2016/02/02 20:22:36, msarett wrote:
> This performs almost identically to an *unpack*-based approach. It may be just
> slightly faster, since it needs a couple fewer MOV instructions.

Even on mobile x86 (Venue8)?
On 2016/02/02 20:29:30, mtklein wrote:
> Even on mobile x86 (Venue8)?

(If so that makes me very happy. I love, love, love pshufb.)
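The thread implies that the first version of gray_to_RGB1 was built around pshufb. The following is a minimal sketch of that style of kernel, not the code from the CL: each 16-byte load of gray samples is expanded into four vectors of four pixels with one pshufb each, and an OR fills in the alpha byte. It assumes x86 with SSSE3 and GCC or Clang (the `target("ssse3")` attribute lets it compile without a global `-mssse3` flag); the name `gray_to_RGB1_pshufb` and the typed `const uint8_t*` source are hypothetical.

```cpp
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 (pshufb); also pulls in SSE2
#include <cassert>
#include <cstdint>

// Hypothetical pshufb-based gray -> RGB1 sketch (not the CL's code).
// Assumes x86 + GCC/Clang; the target attribute enables SSSE3 for this function only.
__attribute__((target("ssse3")))
static void gray_to_RGB1_pshufb(uint32_t dst[], const uint8_t* src, int count) {
    assert(count >= 0);
    // 0xFF in the alpha (top) byte of every 32-bit pixel.
    const __m128i alpha = _mm_set1_epi32(static_cast<int>(0xFF000000u));
    // Each output pixel copies gray byte k into bytes 0..2. An index with the
    // high bit set (-1) makes pshufb write 0; the OR with `alpha` makes it 0xFF.
    const __m128i m0 = _mm_setr_epi8(0, 0, 0, -1, 1, 1, 1, -1, 2, 2, 2, -1, 3, 3, 3, -1);
    const __m128i m1 = _mm_setr_epi8(4, 4, 4, -1, 5, 5, 5, -1, 6, 6, 6, -1, 7, 7, 7, -1);
    const __m128i m2 = _mm_setr_epi8(8, 8, 8, -1, 9, 9, 9, -1, 10, 10, 10, -1, 11, 11, 11, -1);
    const __m128i m3 = _mm_setr_epi8(12, 12, 12, -1, 13, 13, 13, -1, 14, 14, 14, -1, 15, 15, 15, -1);

    while (count >= 16) {
        const __m128i g = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        // Four pshufb + four OR per 16 gray pixels.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 0),  _mm_or_si128(_mm_shuffle_epi8(g, m0), alpha));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 4),  _mm_or_si128(_mm_shuffle_epi8(g, m1), alpha));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 8),  _mm_or_si128(_mm_shuffle_epi8(g, m2), alpha));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 12), _mm_or_si128(_mm_shuffle_epi8(g, m3), alpha));
        src += 16;
        dst += 16;
        count -= 16;
    }
    // Scalar tail for the remaining count % 16 pixels.
    while (count-- > 0) {
        const uint32_t v = *src++;
        *dst++ = 0xFF000000u | (v << 16) | (v << 8) | v;
    }
}
```

Every 16 pixels cost four pshufb, so the kernel's speed depends directly on how cheap pshufb is on the target core.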
On 2016/02/02 20:29:30, mtklein wrote:
> Even on mobile x86 (Venue8)?

I only compared the two approaches on my desktop initially. Unpack is wayyy faster on mobile.

How did you know that? Wow!

I'm switching to *unpack*. Mobile is more important.
On 2016/02/02 20:41:44, msarett wrote:
> I only compared the two approaches on my desktop initially. Unpack is wayyy
> faster on mobile.
>
> How did you know that? Wow!
>
> I'm switching to *unpack*. Mobile is more important.

:( This is one of those times I hate to be right.
pshufb costs 5 cycles on mobile x86, but only 1 on normal x86. Unpacks are 1 everywhere.

I usually refer to https://software.intel.com/sites/landingpage/IntrinsicsGuide/ for a rough idea of how most desktop-class processors will work, but http://www.agner.org/optimize/instruction_tables.pdf gives you details for many more (including mobile, e.g. Silvermont).
Description was changed from
==========
SSSE3 optimizations for gray -> RGBA (or BGRA)
Swizzle Bench Runtime
Dell Venue 8 0.34x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.85x
HP z620 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
==========
to
==========
SSSE3 optimizations for gray -> RGBA (or BGRA)
Swizzle Bench Runtime
Dell Venue 8 0.16x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.80x
HP z620 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
==========
Patchset #1 (id:1) has been deleted
PTAL. A completely different approach.

On 2016/02/02 20:47:18, mtklein wrote:
> :( This is one of those times I hate to be right.
> pshufb costs 5 cycles on mobile x86, but only 1 on normal x86. unpacks are 1
> everywhere.
>
> I usually refer to https://software.intel.com/sites/landingpage/IntrinsicsGuide/
> for a rough idea of how most desktop-class processors will work, but
> http://www.agner.org/optimize/instruction_tables.pdf gives you details for many
> more (including mobile, e.g. Silvermont).

Thanks for the extra reference! Got an extra 5% on mobile! Woohoo!
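Patch Set 2 ("Unpack based approach") replaced pshufb with unpacks. Below is a minimal sketch of one way to build the same pixels from unpacks alone, under stated assumptions: SSE2 intrinsics on x86-64 (where SSE2 is always available), a hypothetical name `gray_to_RGB1_unpack`, and a typed `const uint8_t*` source. It is not claimed to be the committed code.

```cpp
#include <emmintrin.h>  // SSE2: _mm_unpacklo/hi_epi8, _mm_unpacklo/hi_epi16
#include <cassert>
#include <cstdint>

// Hypothetical unpack-based gray -> RGB1 sketch (not the CL's code).
// Assumes x86-64, where SSE2 is part of the baseline ISA.
static void gray_to_RGB1_unpack(uint32_t dst[], const uint8_t* src, int count) {
    assert(count >= 0);
    const __m128i alpha = _mm_set1_epi8(-1);  // every byte 0xFF

    while (count >= 16) {
        const __m128i g = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
        // Interleave bytes into 16-bit pairs: (g, g) and (g, 0xFF).
        const __m128i gg_lo = _mm_unpacklo_epi8(g, g);      // gray 0..7
        const __m128i gg_hi = _mm_unpackhi_epi8(g, g);      // gray 8..15
        const __m128i ga_lo = _mm_unpacklo_epi8(g, alpha);  // gray 0..7 with alpha
        const __m128i ga_hi = _mm_unpackhi_epi8(g, alpha);  // gray 8..15 with alpha
        // Interleave the pairs into 32-bit pixels with bytes (g, g, g, 0xFF),
        // i.e. 0xFFgggggg when read as a little-endian uint32_t.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 0),  _mm_unpacklo_epi16(gg_lo, ga_lo));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 4),  _mm_unpackhi_epi16(gg_lo, ga_lo));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 8),  _mm_unpacklo_epi16(gg_hi, ga_hi));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 12), _mm_unpackhi_epi16(gg_hi, ga_hi));
        src += 16;
        dst += 16;
        count -= 16;
    }
    // Scalar tail for the remaining count % 16 pixels.
    while (count-- > 0) {
        const uint32_t v = *src++;
        *dst++ = 0xFF000000u | (v << 16) | (v << 8) | v;
    }
}
```

This version spends eight unpacks per 16 pixels instead of four pshufb plus four OR, and needs no shuffle-mask constants because alpha comes from interleaving with an all-0xFF vector. Since `g` and the intermediate vectors each feed two unpacks, a compiler targeting non-VEX SSE has to insert register copies (movdqa) to keep them alive, which matches the extra MOVs mentioned earlier in the thread.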
lgtm (Don't worry about lots of movdqa instructions... they're typically 0 cycles.)
The CQ bit was checked by msarett@google.com
The CQ bit was unchecked by commit-bot@chromium.org
This CL has an open dependency (Issue 1656383002 Patch 40001). Please resolve the dependency and try again.
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1657393002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1657393002/40001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders:
Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
The CQ bit was checked by msarett@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from mtklein@google.com Link to the patchset: https://codereview.chromium.org/1657393002/#ps60001 (title: "Fix windows bot")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1657393002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1657393002/60001
Message was sent while issue was closed.
Description was changed from
==========
SSSE3 optimizations for gray -> RGBA (or BGRA)
Swizzle Bench Runtime
Dell Venue 8 0.16x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.80x
HP z620 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
==========
to
==========
SSSE3 optimizations for gray -> RGBA (or BGRA)
Swizzle Bench Runtime
Dell Venue 8 0.16x
HP z620 0.47x
PNG Decode Time (for test set of gray encoded PNGs)
Dell Venue 8 0.80x
HP z620 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/0700651128f8c505da65e651f9788589593f07c4
==========
Message was sent while issue was closed.
Committed patchset #3 (id:60001) as https://skia.googlesource.com/skia/+/0700651128f8c505da65e651f9788589593f07c4