Description
Sk4x4f: Simplify x86 down to SSE2.
- This drops the minimum requirement for Sk4x4f on x86 to SSE2 by
removing calls to _mm_shuffle_epi8(). Instead we use good old
shifting and masking.
- Performance is very similar to SSSE3, close enough that I'm having trouble
telling which is faster. I think we should let ourselves circle back
later on whether we need an SSSE3 version. When possible it's nice
to stick to SSE2: it's the most widely available, and performs most uniformly
across different chips.
This makes Sk4x4f fast on Windows and Linux, and may help mobile x86.
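For readers unfamiliar with the trick, here is a minimal sketch of SSE2 shift-and-mask unpacking. It assumes 8888 pixels with red in the low byte and alpha in the high byte. `Planar4f` and `unpack_8888_sse2()` are hypothetical names for illustration, not Sk4x4f's actual API or code.

```c
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>

// Hypothetical type: four pixels transposed into one float vector per channel.
typedef struct { __m128 r, g, b, a; } Planar4f;

// Hypothetical helper (not Skia code). Assumes R in bits 0-7, A in bits 24-31.
// Uses only SSE2: shift each channel down to the low byte, mask, convert.
static Planar4f unpack_8888_sse2(const uint32_t px[4]) {
    __m128i p    = _mm_loadu_si128((const __m128i*)px);
    __m128i mask = _mm_set1_epi32(0xFF);
    __m128  inv  = _mm_set1_ps(1.0f / 255.0f);

    Planar4f out;
    out.r = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(p, mask)), inv);
    out.g = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(p, 8), mask)), inv);
    out.b = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(_mm_srli_epi32(p, 16), mask)), inv);
    // A logical right shift by 24 already clears the upper bits, so no mask.
    out.a = _mm_mul_ps(_mm_cvtepi32_ps(_mm_srli_epi32(p, 24)), inv);
    return out;
}
```

Each channel costs at most one shift and one AND. An SSSE3 version might instead handle each channel with a single `_mm_shuffle_epi8()` that moves the byte into place and zeroes the rest.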
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1817353005
Committed: https://skia.googlesource.com/skia/+/1443c6920c4b7aa80811c30ed9cdc81395d5df4f
Patch Set 1
Patch Set 2: tweaks
Patch Set 3: derp
Messages
Total messages: 25 (12 generated)
Description was changed: added the GOLD_TRYBOT_URL line.
Description was changed: fixed the typo "performst" to "performs" and added "This makes Sk4x4f fast(er) on Windows, Linux, and mobile x86."
Description was changed: reworded that sentence to "This makes Sk4x4f fast on Windows and Linux, and may help mobile x86."
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch.
Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/1
View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/1
mtklein@chromium.org changed reviewers: + fmalita@chromium.org
The CQ bit was checked by mtklein@chromium.org
CQ is trying da patch.
Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/20001
View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/20001
Note for Reviewers: The CQ is waiting for an approval. If you believe that the CL is not ready yet, or if you would like to L-G-T-M with comments then please uncheck the CQ checkbox. Waiting for LGTM from valid reviewer(s) till 2016-03-23 08:22 UTC
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders:
- Test-Ubuntu-GCC-ShuttleA-GPU-GTX660-x86_64-Release-Trybot on client.skia (JOB_FAILED, http://build.chromium.org/p/client.skia/builders/Test-Ubuntu-GCC-ShuttleA-GPU...)
- Build-Mac-Clang-x86_64-Release-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Mac-Clang-x86_...)
- Build-Ubuntu-Clang-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-Clang-x...)
- Build-Ubuntu-GCC-x86_64-Release-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-x86...)
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch.
Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/40001
View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/40001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
mtklein@chromium.org changed reviewers: + herb@google.com - mtklein@google.com
+Herb
The CQ bit was checked by fmalita@chromium.org
lgtm
CQ is trying da patch.
Follow status at https://chromium-cq-status.appspot.com/patch-status/1817353005/40001
View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1817353005/40001
Message was sent while issue was closed.
Description was changed: added "Committed: https://skia.googlesource.com/skia/+/1443c6920c4b7aa80811c30ed9cdc81395d5df4f".
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://skia.googlesource.com/skia/+/1443c6920c4b7aa80811c30ed9cdc81395d5df4f
Message was sent while issue was closed.
You may be able to move the shifts into the mults and divs for normalization.
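One way to read this suggestion, sketched below under the same assumptions as a plain shift-and-mask unpack (8888 pixels, red in the low byte; `Planar4f` and `unpack_8888_folded()` are hypothetical names, not Skia code): mask green and blue where they sit and divide the leftover power of two out in the 1/255 multiplier, so only an AND precedes each conversion. Alpha keeps its shift, because masking it in place leaves the sign bit set for values of 128 and up, and SSE2's `_mm_cvtepi32_ps()` converts signed integers.

```c
#include <emmintrin.h>  // SSE2 intrinsics
#include <stdint.h>

// Hypothetical type: four pixels transposed into one float vector per channel.
typedef struct { __m128 r, g, b, a; } Planar4f;

// Hypothetical variant (not Skia code): fold the shifts into the normalizing
// multiplies instead of shifting each channel down first.
static Planar4f unpack_8888_folded(const uint32_t px[4]) {
    __m128i p = _mm_loadu_si128((const __m128i*)px);
    Planar4f out;
    // Red is already in the low byte: value in [0, 255].
    out.r = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(p, _mm_set1_epi32(0x000000FF))),
                       _mm_set1_ps(1.0f / 255.0f));
    // Green stays in place as g * 256; the extra 2^8 is divided out here.
    out.g = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(p, _mm_set1_epi32(0x0000FF00))),
                       _mm_set1_ps(1.0f / (255.0f * 256.0f)));
    // Blue stays in place as b * 65536 <= 16,711,680 < 2^24, so the
    // int->float conversion is exact.
    out.b = _mm_mul_ps(_mm_cvtepi32_ps(_mm_and_si128(p, _mm_set1_epi32(0x00FF0000))),
                       _mm_set1_ps(1.0f / (255.0f * 65536.0f)));
    // Alpha masked in place would be negative as a signed int32, so shift it.
    out.a = _mm_mul_ps(_mm_cvtepi32_ps(_mm_srli_epi32(p, 24)),
                       _mm_set1_ps(1.0f / 255.0f));
    return out;
}
```

Because every masked value converts to float exactly and scaling by a power of two is exact in float (away from overflow and underflow), this should produce the same bits as shifting first, while saving two shifts per four pixels.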
Message was sent while issue was closed.
lgtm