|
|
DescriptionMove CPU feature detection to its own file.
- Moves CPU feature detection to its own file.
- Cleans up some redundant feature detection scattered around core/ and opts/.
- Can now detect a few new CPU features:
* F16C -> Intel f16<->f32 instructions, added between AVX and AVX2
* FMA -> Intel FMA instructions, added at the same time as AVX2
* VFP_FP16 -> ARM f16<->f32 instructions, quite common
* NEON_FMA -> ARM FMA instructions, also quite common
* SSE and SSE3... why not?
This new internal API makes it very cheap to do fine-grained runtime CPU
feature detection. Redundant calls to SkCpu::Supports() should be eliminated
and it's hoistable out of loops. It compiles away entirely when we have the
appropriate instructions available at compile time.
This means we can call it to guard even a little snippet of 1 or 2 instructions
right where needed and let inlining hoist the check (if any at all) up to
somewhere that doesn't hurt performance. I've explained how I made this work
in the private section of the new header.
Once this lands and bakes a bit, I'll start following up with CLs to use it more
and to add a bunch of those little 1-2 instruction snippets we've been wanting,
e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps
(for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1890483002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
Committed: https://skia.googlesource.com/skia/+/4311f016612a814282029daa4bd102053a853d82
Patch Set 1 #Patch Set 2 : port opts_check_x86 #Patch Set 3 : typo #Patch Set 4 : warning #Patch Set 5 : Update old NEON detection code too. #Patch Set 6 : getting ahead of myself on F16C #
Total comments: 6
Patch Set 7 : link #
Messages
Total messages: 51 (27 generated)
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around opts/. - Can now detect a few new CPU features: * F16C -> Intel added f16<->f32 instructions between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * NEON_FMA -> ARM FMA instructions, quite common * VFP_FP16 -> ARM f16<->f32 instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around opts/. - Can now detect a few new CPU features: * F16C -> Intel added f16<->f32 instructions between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * NEON_FMA -> ARM FMA instructions, quite common * VFP_FP16 -> ARM f16<->f32 instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around opts/. - Can now detect a few new CPU features: * F16C -> Intel added f16<->f32 instructions between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * NEON_FMA -> ARM FMA instructions, quite common * VFP_FP16 -> ARM f16<->f32 instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around opts/. - Can now detect a few new CPU features: * F16C -> Intel added f16<->f32 instructions between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * NEON_FMA -> ARM FMA instructions, quite common * VFP_FP16 -> ARM f16<->f32 instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around opts/. - Can now detect a few new CPU features: * F16C -> Intel added f16<->f32 instructions between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * NEON_FMA -> ARM FMA instructions, quite common * VFP_FP16 -> ARM f16<->f32 instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/40001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Ubuntu-GCC-x86_64-Release-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-x86...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/60001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the call (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/80001
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
mtklein@chromium.org changed reviewers: + herb@google.com
The CQ bit was checked by mtklein@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/80001
Note for Reviewers: The CQ is waiting for an approval. If you believe that the CL is not ready yet, or if you would like to L-G-T-M with comments then please uncheck the CQ checkbox. Waiting for LGTM from valid reviewer(s) till 2016-04-13 23:16 UTC
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/100001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
mtklein@chromium.org changed reviewers: + fmalita@chromium.org, reed@google.com
+Mike and Florin I think I can take yesterday's half<->float CL in a slightly different direction that makes it generally cheaper to do runtime CPU detection. This is a foundation CL, and then the next one I'll send out builds on it to add that fast x86 float conversion.
lgtm Yeah!!! https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.cpp File src/core/SkCpu.cpp (right): https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.cpp#new... src/core/SkCpu.cpp:18: #include <cpuid.h> If you have a link to the docs that describe cpuid that you used to figure out this code could you add it here.
lgtm https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.cpp File src/core/SkCpu.cpp (right): https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.cpp#new... src/core/SkCpu.cpp:39: if (abcd[3] & (1<<25)) { features |= SkCpu:: SSE1; } It's a bummer we can't use the cpuid.h macros instead of magic constants. I guess adding an enum would be kinda redundant too. https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.h File src/core/SkCpu.h (right): https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.h#newco... src/core/SkCpu.h:62: // (Static intializers would work fine everywhere, but Chrome really dislikes them.) Does this mean we'll have to sell a static initialized to Chrome/MSVC?
mtklein@google.com changed reviewers: + mtklein@google.com
https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.cpp File src/core/SkCpu.cpp (right): https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.cpp#new... src/core/SkCpu.cpp:18: #include <cpuid.h> On 2016/04/14 at 15:38:34, herb_g wrote: > If you have a link to the docs that describe cpuid that you used to figure out this code could you add it here. Done, below. https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.cpp#new... src/core/SkCpu.cpp:39: if (abcd[3] & (1<<25)) { features |= SkCpu:: SSE1; } On 2016/04/14 at 15:40:35, f(malita) wrote: > It's a bummer we can't use the cpuid.h macros instead of magic constants. I guess adding an enum would be kinda redundant too. Yeah, we could use them when we have cpuid.h, but that'd fork this code. This function more or less is the enum(s) we'd write. https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.h File src/core/SkCpu.h (right): https://codereview.chromium.org/1890483002/diff/100001/src/core/SkCpu.h#newco... src/core/SkCpu.h:62: // (Static intializers would work fine everywhere, but Chrome really dislikes them.) On 2016/04/14 at 15:40:35, f(malita) wrote: > Does this mean we'll have to sell a static initialized to Chrome/MSVC? Yeah, maybe, for some values of "have to" and "sell". Skia's already got a few Windows-only static initializers, and I'm told they're not such a problem on Windows as they are (used to be?) with other executable formats that store the initializers in a way that's not at all disk-seek friendly. I'm open to continuing to look for other non-static-intializer approaches for MSVC that still allow this to act like a constant... I just haven't found one yet.
The CQ bit was checked by mtklein@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from herb@google.com, fmalita@chromium.org Link to the patchset: https://codereview.chromium.org/1890483002/#ps120001 (title: "link")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/120001
Message was sent while issue was closed.
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001) as https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607
Message was sent while issue was closed.
A revert of this CL (patchset #7 id:120001) has been created in https://codereview.chromium.org/1892643003/ by mtklein@google.com. The reason for reverting is: many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places..
On 2016/04/15 at 15:40:06, mtklein wrote: > A revert of this CL (patchset #7 id:120001) has been created in https://codereview.chromium.org/1892643003/ by mtklein@google.com. > > The reason for reverting is: many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places.. Note to self: reverting src/core/SkOpts.cpp is enough to fix the diffs on Windows.
On 2016/04/19 at 15:32:39, mtklein wrote: > On 2016/04/15 at 15:40:06, mtklein wrote: > > A revert of this CL (patchset #7 id:120001) has been created in https://codereview.chromium.org/1892643003/ by mtklein@google.com. > > > > The reason for reverting is: many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places.. > > Note to self: reverting src/core/SkOpts.cpp is enough to fix the diffs on Windows. Disabling if (abcd[2] & (1<<19)) { Init_sse41(); } in the old code gives the same effect as landing this CL.
On 2016/04/19 at 15:36:33, mtklein wrote: > On 2016/04/19 at 15:32:39, mtklein wrote: > > On 2016/04/15 at 15:40:06, mtklein wrote: > > > A revert of this CL (patchset #7 id:120001) has been created in https://codereview.chromium.org/1892643003/ by mtklein@google.com. > > > > > > The reason for reverting is: many unexpected GM diffs across GPU+CPU configs on Windows (hopefully just text masks on GPU?). seems like we pick a different srcover variant in some places.. > > > > Note to self: reverting src/core/SkOpts.cpp is enough to fix the diffs on Windows. > > Disabling if (abcd[2] & (1<<19)) { Init_sse41(); } in the old code gives the same effect as landing this CL. I've fallen into a classic trap. The problem is global initialization is in unspecified order. I have an implicit ordering requirement between the new Windows-only gCachedCpuFeatures and this intiializer in SkOpts.cpp: #if SK_ALLOW_STATIC_GLOBAL_INITIALIZERS static struct AutoInit { AutoInit() { Init(); } } gAutoInit; #endif This gAutoInit calls Init() which calls init() once, which calls SkCpu::Supports() which calls SkCpu::RuntimeFeatures() which reads gCachedCpuFeatures, which has not yet been initialized. It reads as zero, so all SkCpu::Supports() calls return false. Simplest way to handle is probably remove that SkOpts.cpp static initializer.
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/120001
The CQ bit was unchecked by mtklein@chromium.org
The CQ bit was checked by mtklein@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890483002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890483002/120001
Message was sent while issue was closed.
Description was changed from ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 ========== to ========== Move CPU feature detection to its own file. - Moves CPU feature detection to its own file. - Cleans up some redundant feature detection scattered around core/ and opts/. - Can now detect a few new CPU features: * F16C -> Intel f16<->f32 instructions, added between AVX and AVX2 * FMA -> Intel FMA instructions, added at the same time as AVX2 * VFP_FP16 -> ARM f16<->f32 instructions, quite common * NEON_FMA -> ARM FMA instructions, also quite common * SSE and SSE3... why not? This new internal API makes it very cheap to do fine-grained runtime CPU feature detection. Redundant calls to SkCpu::Supports() should be eliminated and it's hoistable out of loops. It compiles away entirely when we have the appropriate instructions available at compile time. This means we can call it to guard even a little snippet of 1 or 2 instructions right where needed and let inlining hoist the check (if any at all) up to somewhere that doesn't hurt performance. I've explained how I made this work in the private section of the new header. Once this lands and bakes a bit, I'll start following up with CLs to use it more and to add a bunch of those little 1-2 instruction snippets we've been wanting, e.g. cvtps2ph, cvtph2ps, ptest, pmulld, pmovzxbd, blendvps, pshufb, roundps (for floor) on x86, and vcvt.f32.f16, vcvt.f16.f32 on ARM. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/872ea29357439f05b1f6995dd300fc054733e607 Committed: https://skia.googlesource.com/skia/+/4311f016612a814282029daa4bd102053a853d82 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001) as https://skia.googlesource.com/skia/+/4311f016612a814282029daa4bd102053a853d82 |