|
|
DescriptionSSSE3 opts for RGB -> RGB(FF) or BGR(FF)
Swizzle Bench Runtime
z620 0.21x
Dell Venue 8 0.26x
RGB PNGs Decode Runtime
z620 0.91x
Dell Venus 8 0.96x
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1618603003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/13aa1a5ad97156e35184970fc1ce1aaf3c50c91c
Patch Set 1 : #
Total comments: 5
Patch Set 2 : Do not read off the end of memory #
Depends on Patchset: Messages
Total messages: 16 (5 generated)
Description was changed from ========== SSSE3 opts for RGB -> RGB(FF) or BGR(FF) Swizzle Bench Runtime z620 0.21x Dell Venue 8 0.26x RGB PNGs Decode Runtime z620 0.91x Dell Venus 8 0.96x BUG=skia:4767 ========== to ========== SSSE3 opts for RGB -> RGB(FF) or BGR(FF) Swizzle Bench Runtime z620 0.21x Dell Venue 8 0.26x RGB PNGs Decode Runtime z620 0.91x Dell Venus 8 0.96x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Patchset #1 (id:1) has been deleted
msarett@google.com changed reviewers: + mtklein@google.com
https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opts.h File src/opts/SkSwizzler_opts.h (right): https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opt... src/opts/SkSwizzler_opts.h:367: const uint8_t X = 0xFF; // Used a placeholder. The value of X is irrelevant. Because X > 127, this actually sets those lanes to zero. Shame we can't exploit that. https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opt... src/opts/SkSwizzler_opts.h:375: // Load a vector. While this actually contains 5 pixels plus an So isn't this only safe while (count >= 6) ?
https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opts.h File src/opts/SkSwizzler_opts.h (right): https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opt... src/opts/SkSwizzler_opts.h:375: // Load a vector. While this actually contains 5 pixels plus an On 2016/01/22 17:51:50, mtklein wrote: > So isn't this only safe while (count >= 6) ? Alternatively, since only the load is weird and the store is quite ordinary, we might want to try looping while (count >= 4) and loading exactly the 12 bytes of 4 pixels using some combination of an 8 byte load and a 4 byte load? We can load the bytes anywhere into the register that's convenient, as we'll be _mm_shuffle_epi8'ing anyway. E.g. __m128i rgb = _mm_cvtsi32_si128(*(const uint32_t*)src); // load 4 bytes into low 4 bytes rgb = _mm_castpd_si128(_mm_loadh_pd(_mm_castsi128_pd(rgb), (const double*)(src+4))); // load next 8 bytes into high 8 bytes, keeping the original 4 in the low lane _mm_shuffle_epi8(rgb, TBD)
https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opts.h File src/opts/SkSwizzler_opts.h (right): https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opt... src/opts/SkSwizzler_opts.h:367: const uint8_t X = 0xFF; // Used a placeholder. The value of X is irrelevant. On 2016/01/22 17:51:50, mtklein wrote: > Because X > 127, this actually sets those lanes to zero. Shame we can't exploit > that. Agreed. I was hoping there was something that would automatically set them to 0xFF :(. https://codereview.chromium.org/1618603003/diff/20001/src/opts/SkSwizzler_opt... src/opts/SkSwizzler_opts.h:375: // Load a vector. While this actually contains 5 pixels plus an On 2016/01/22 17:51:50, mtklein wrote: > So isn't this only safe while (count >= 6) ? I don't understand why it's not always safe? At the very end of image we will load outside of our pixel memory, but we will never use those values. Is that not ok? If it's not, maybe we should use this approach up until >= 6 and then your approach for >= 4?
> At the very end of image we will load outside of our pixel memory, but we will > never use those values. Is that not ok? Very not ok. If you run off the end of the page you're reading from, you'll segfault. > If it's not, maybe we should use this approach up until >= 6 and then your > approach for >= 4? I suspect we won't care about the difference between 5 serial pixels and 3 serial pixels, so I'd go for one or the other, but a hybrid's probably overkill.
On 2016/01/22 18:21:46, mtklein wrote: > > At the very end of image we will load outside of our pixel memory, but we will > > never use those values. Is that not ok? > Very not ok. If you run off the end of the page you're reading from, you'll > segfault. I should qualify that. You _can_ segfault, and technically you can segfault within a page too. Reading or writing beyond your known allocation bounds is a time bomb.
Of course thanks! I'm going with the (count >= 6) approach. It appears slightly faster.
lgtm
The CQ bit was checked by msarett@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1618603003/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1618603003/40001
Message was sent while issue was closed.
Description was changed from ========== SSSE3 opts for RGB -> RGB(FF) or BGR(FF) Swizzle Bench Runtime z620 0.21x Dell Venue 8 0.26x RGB PNGs Decode Runtime z620 0.91x Dell Venus 8 0.96x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== SSSE3 opts for RGB -> RGB(FF) or BGR(FF) Swizzle Bench Runtime z620 0.21x Dell Venue 8 0.26x RGB PNGs Decode Runtime z620 0.91x Dell Venus 8 0.96x BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/13aa1a5ad97156e35184970fc1ce1aaf3c50c91c ==========
Message was sent while issue was closed.
Committed patchset #2 (id:40001) as https://skia.googlesource.com/skia/+/13aa1a5ad97156e35184970fc1ce1aaf3c50c91c |