|
|
DescriptionExpand _01 half<->float limitation to _finite. Simplify.
It's become clear we need to sometimes deal with values <0 or >1.
I'm not yet convinced we care about NaN or +-inf.
We had some fairly clever tricks and optimizations here for NEON
and SSE. I've thrown them out in favor of a single implementation.
If we find the specializations mattered, we can certainly figure out
how to extend them to this new range/domain.
This happens to add a vectorized float -> half for ARMv7, which was
missing from the _01 version. (The SSE strategy was not portable to
platforms that flush denorm floats to zero.)
I've tested the full float range for FloatToHalf on my desktop and a 5x.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003
CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot
Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Committed: https://skia.googlesource.com/skia/+/58e389b0518b46bbe58ba01c23443cf23c18435c
Patch Set 1 #Patch Set 2 : working #Patch Set 3 : that feels better #Patch Set 4 : comments and tweaks #Patch Set 5 : NEON #Patch Set 6 : typo #
Total comments: 13
Patch Set 7 : clarify #Patch Set 8 : drop _mm_packus_epi32 for now. #
Messages
Total messages: 48 (29 generated)
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. We had some fairly clever little tricks and optimizations for NEON and SSE. I've thrown them out in favor of a single Sk4i implementation. If we find it matters, we can certainly put them back. BUG=skia: ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. We had some fairly clever little tricks and optimizations for NEON and SSE. I've thrown them out in favor of a single Sk4i implementation. If we find it matters, we can certainly put them back. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. We had some fairly clever little tricks and optimizations for NEON and SSE. I've thrown them out in favor of a single Sk4i implementation. If we find it matters, we can certainly put them back. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. We had some fairly clever little tricks and optimizations for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. We had some fairly clever little tricks and optimizations for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Ubuntu-GCC-x86_64-Release-CMake-Trybot on master.client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-x86...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
mtklein@chromium.org changed reviewers: + msarett@google.com
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Your CL relies on deprecated CQ feature(s): * Specifying master names without "master." prefix is deprecated: client.skia. For more details, see http://crbug.com/617627.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h File src/core/SkHalf.h (right): https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:49: Sk4i positive = SkNx_cast<int>(Sk4h::Load(&hs)), nit: I found this code block confusing because "positive" is defined before the sign bit is stripped. This would be more clear: Sk4i positive = SkNx_cast<int>(Sk4h::Load(&hs)) & 0x00007FFF; But I guess since you use "sign" down below this doesn't save anything (and uses another constant). https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:53: // For normal half floats, align the exponent/mantissa line and rebias the exponent. This is the simplest part, and still, this code is really complicated! Can we try to make it clearer? Ex: static constexpr int kF32MantissaBits = 23; static constexpr int kF32Bias = 127; static constexpr int kF16MantissaBits = 10; static constexpr int kF16Bias = 15; // Align the f16 exponent/mantissa line with the f32 exponent/mantissa line x = x << (kF32MantissaBits - kF16MantissaBits); // Rebias the exponent x = x + ((kF32Bias - kF16Bias) << kF32MantissaBits) https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:56: // For denorm half floats, mask in a value with the right exponent for 2^-14, I think the comment that would have made things much clearer for me is about why it's ok to put the exponent bits of the float16 in the mantissa of the float32. https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:59: Sk4i denorm = positive | denorm_fixup; nit: Confused by the variable name. It's not "denorm" here. It's just a bunch of bits. I guess this is the same style pattern as above. It's not my personal preference, but I don't feel too strongly. https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:87: const Sk4i denorm_fixup = 126<<23; Haven't looked here yet... https://codereview.chromium.org/2145663003/diff/100001/src/opts/SkNx_sse.h File src/opts/SkNx_sse.h (right): https://codereview.chromium.org/2145663003/diff/100001/src/opts/SkNx_sse.h#ne... src/opts/SkNx_sse.h:397: // Sign extend to trick _mm_packs_epi32() into doing the pack we want. Cool!
https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h File src/core/SkHalf.h (right): https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:56: // For denorm half floats, mask in a value with the right exponent for 2^-14, On 2016/07/13 22:07:05, msarett wrote: > I think the comment that would have made things much clearer for me is about why > it's ok to put the exponent bits of the float16 in the mantissa of the float32. Oh duh the exponent bits are all zero.
Alright I finally understand the code. Had to wait until my brain was working a little better this morning... I think there might be some things you can do to make this a little more clear, but otherwise this looks good. https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h File src/core/SkHalf.h (right): https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:57: // then subtract it off as a float. This leaves just our original fraction. // Desired exponent is 2^-14 because that is the exponent on denormal half floats (-bias + 1) https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:58: const Sk4i denorm_fixup = 126<<23; // Because the bias is 127 this is an exponent of 2^-1. But the mantissa is also shifted right by 13, so we really have 2^-14. https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:60: Sk4f denorm_f = Sk4f::Load(&denorm) - Sk4f::Load(&denorm_fixup); // ((1 * 2^-1) + value) - (1 * 2^-1) = value https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:85: Sk4i norm = (positive - (112<<23)) >> 13; nit: Still think this is clearer with constants. // What happens when the exponent is less than 112? It'll be a denormal half-float so it doesn't matter anyway? https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:88: Sk4f denorm_f = Sk4f::Load(&positive) + Sk4f::Load(&denorm_fixup); // (1 * 2^-1) + small float effectively shifts the small float down to the bottom ten bits of the mantissa. https://codereview.chromium.org/2145663003/diff/100001/src/core/SkHalf.h#newc... src/core/SkHalf.h:89: Sk4i denorm = Sk4i::Load(&denorm_f) ^ denorm_fixup; Cool this saves us a mask. // Mask away the exponent bits.
I agree... I'll try to make things more understandable and ping the thread later today.
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Your CL relies on deprecated CQ feature(s): * Specifying master names in CQ_INCLUDE_TRYBOTS part of description without "master." prefix is deprecated: client.skia For more details, see http://crbug.com/617627.
On 2016/07/14 at 13:41:46, mtklein_C wrote: > I agree... I'll try to make things more understandable and ping the thread later today. OK, please have another look to see if things are clearer?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Your CL relies on deprecated CQ feature(s): * Specifying master names in CQ_INCLUDE_TRYBOTS part of description without "master." prefix is deprecated: client.skia For more details, see http://crbug.com/617627.
Message was sent while issue was closed.
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001) as https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90
Message was sent while issue was closed.
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 ==========
Message was sent while issue was closed.
A revert of this CL (patchset #7 id:120001) has been created in https://codereview.chromium.org/2151023003/ by mtklein@google.com. The reason for reverting is: Unit tests fail on Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast.
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Your CL relies on deprecated CQ feature(s): * Specifying master names in CQ_INCLUDE_TRYBOTS part of description without "master." prefix is deprecated: client.skia For more details, see http://crbug.com/617627.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Don't know why yet, but it looks like the problem with the -Fast- bot was due to using _mm_packus_epi32 to pack int to uint16_t (i.e. what it's designed for) when available, SSE4.1+. It's causing the (Linux, GCC) bot to fail at runtime and my (Mac,Clang) laptop to crash during code generation... mysterious. For now I'm just going with the <<16, >>16, packs approach, and am in parallel writing up a CL that measures the best ways to do this operation.
The CQ bit was checked by mtklein@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from msarett@google.com Link to the patchset: https://codereview.chromium.org/2145663003/#ps140001 (title: "drop _mm_packus_epi32 for now.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Your CL relies on deprecated CQ feature(s): * Specifying master names in CQ_INCLUDE_TRYBOTS part of description without "master." prefix is deprecated: client.skia For more details, see http://crbug.com/617627.
Message was sent while issue was closed.
Description was changed from ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 ========== to ========== Expand _01 half<->float limitation to _finite. Simplify. It's become clear we need to sometimes deal with values <0 or >1. I'm not yet convinced we care about NaN or +-inf. We had some fairly clever tricks and optimizations here for NEON and SSE. I've thrown them out in favor of a single implementation. If we find the specializations mattered, we can certainly figure out how to extend them to this new range/domain. This happens to add a vectorized float -> half for ARMv7, which was missing from the _01 version. (The SSE strategy was not portable to platforms that flush denorm floats to zero.) I've tested the full float range for FloatToHalf on my desktop and a 5x. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2145663003 CQ_INCLUDE_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot;master.client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot,Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-Fast-Trybot Committed: https://skia.googlesource.com/skia/+/3296bee70d074bb8094b3229dbe12fa016657e90 Committed: https://skia.googlesource.com/skia/+/58e389b0518b46bbe58ba01c23443cf23c18435c ==========
Message was sent while issue was closed.
Committed patchset #8 (id:140001) as https://skia.googlesource.com/skia/+/58e389b0518b46bbe58ba01c23443cf23c18435c
Message was sent while issue was closed.
CQ bit was unchecked. |