|
|
Description
SkHalfToFloat_01 / SkFloatToHalf_01
These are basically inlined, 4-at-a-time versions of our existing functions,
but cut down to avoid any work that's only necessary outside [0,1].
Both f16 and f32 denorms should work fine, modulo the usual ARMv7 NEON caveat that denorms are treated as zero.
In exchange for a little speed, f32->f16 does not round properly.
Instead it truncates, so it's never off by more than 1 bit.
Support for finite values >1 or <0 is straightforward to add back.
>1 might already work as-is.
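The truncating f32->f16 behaviour described above can be illustrated with a scalar sketch. This is not the code from this CL (which works 4-at-a-time); the helper name `float_to_half_01_truncate` is hypothetical, and the sketch assumes inputs in [0,1] and a platform that does not flush float denormals to zero.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar helper for illustration only, not the CL's SIMD code.
// Assumes 0 <= f <= 1 and that float denormals are not flushed to zero.
inline uint16_t float_to_half_01_truncate(float f) {
    assert(f >= 0.0f && f <= 1.0f);

    // The float whose exponent field is 15 is 2^(15-127) = 2^-112.
    // Multiplying by it rebiases the exponent from f32 (bias 127) to f16 (bias 15).
    const uint32_t magic_bits = 15u << 23;
    float magic;
    std::memcpy(&magic, &magic_bits, sizeof magic);

    const float scaled = f * magic;
    uint32_t bits;
    std::memcpy(&bits, &scaled, sizeof bits);

    // Dropping the low 13 mantissa bits truncates instead of rounding.
    return static_cast<uint16_t>(bits >> 13);
}
```

For example, the largest float below 1 (0.99999994f) maps to 0x3BFF here, while round-to-nearest would give 0x3C00 (1.0): a difference of one in the last bit.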
Getting close to _u16 performance:
micros bench
261.13 xferu64_bw_1_opaque_u16
1833.51 xferu64_bw_1_alpha_u16
2762.32 ? xferu64_aa_1_opaque_u16
3334.29 xferu64_aa_1_alpha_u16
249.78 xferu64_bw_1_opaque_f16
3383.18 xferu64_bw_1_alpha_f16
4214.72 xferu64_aa_1_opaque_f16
4701.19 xferu64_aa_1_alpha_f16
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1685133005
Committed: https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/fff055cc5f9ca5015678f4f13a4f842084bd62d5
Patch Set 1
Patch Set 2 : other way
Patch Set 3 : relax test
Patch Set 4 : sampling mode
Patch Set 5 : comments
Patch Set 6 : simplify
Patch Set 7 : fixes
Patch Set 8 : saner portable code
Patch Set 9 : more fixes
Patch Set 10 : shhhh
Patch Set 11 : loadl/storel
Patch Set 12 : guard
Messages
Total messages: 48 (30 generated)
Description was changed from
==========
SkHalfToFloat_01 / SkFloatToHalf_01
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
==========
to
==========
SkHalfToFloat_01 / SkFloatToHalf_01

curr/maxrss  loops  min     median  mean    max     stddev  samples     config        bench
8/8 MB       1      254µs   264µs   260µs   265µs   2%      ▁▁▁▇████▅▁  nonrendering  xferu64_bw_1_opaque_u16
8/8 MB       1      1.75ms  1.83ms  1.83ms  2.1ms   6%      █▃▃▃▃▃▁▁▁▁  nonrendering  xferu64_bw_1_alpha_u16
8/8 MB       1      2.66ms  2.78ms  2.78ms  3.02ms  4%      ▃▃▃▁▁▄█▃▃▃  nonrendering  xferu64_aa_1_opaque_u16
8/8 MB       1      3.26ms  3.41ms  3.36ms  3.55ms  3%      ▁▁▁▁▃▅▅█▅▅  nonrendering  xferu64_aa_1_alpha_u16
8/8 MB       2      268µs   268µs   268µs   272µs   1%      ▁▁▁▅█▁▁▁▁▁  nonrendering  xferu64_bw_1_opaque_f16
8/8 MB       1      3.24ms  3.29ms  3.32ms  3.57ms  3%      ▁▁▁▂▁▁█▄▄▄  nonrendering  xferu64_bw_1_alpha_f16
8/8 MB       1      4.03ms  4.06ms  4.13ms  4.38ms  3%      ▁▁▁▂▁▁█▅▅▅  nonrendering  xferu64_aa_1_opaque_f16
8/8 MB       1      4.67ms  4.7ms   4.76ms  4.97ms  2%      ▁▁▂▃▁█▆▆▂▁  nonrendering  xferu64_aa_1_alpha_f16

BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is...
==========
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685133005/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685133005/100001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders:
  Build-Mac-Clang-Arm7-Debug-iOS-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Mac-Clang-Arm7...)
  Build-Ubuntu-GCC-Arm64-Debug-Android-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-Arm...)
  Build-Ubuntu-GCC-Arm7-Debug-Android-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-Arm...)
  Build-Ubuntu-GCC-Mips-Debug-Android-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-Mip...)
  Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
  Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685133005/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685133005/120001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders:
  Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Debug-Trybot on client.skia (JOB_FAILED, http://build.chromium.org/p/client.skia/builders/Test-Ubuntu-GCC-GCE-CPU-AVX2...)
  Build-Mac-Clang-Arm7-Debug-iOS-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Mac-Clang-Arm7...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685133005/160001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685133005/160001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders:
  Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
  Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685133005/180001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685133005/180001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...)
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685133005/200001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685133005/200001
mtklein@chromium.org changed reviewers: + jvanverth@google.com, reed@google.com
The bots may have a little more complaining to do, but it's probably good enough to take a look at now.
(A weirder version of f16 -> f32 requiring more denormalized multiplication,

    __m128i h = _mm_unpacklo_epi16(_mm_loadl_epi64((const __m128i*)&hs), _mm_setzero_si128());
    const __m128 magic = _mm_castsi128_ps(_mm_set1_epi32(239 << 23));
    return _mm_mul_ps(magic, _mm_castsi128_ps(_mm_slli_epi32(h, 13)));

results in very interesting performance numbers:

     261.13  xferu64_bw_1_opaque_u16  nonrendering
    1832.58  xferu64_bw_1_alpha_u16   nonrendering
    2704.76  xferu64_aa_1_opaque_u16  nonrendering
    3268.43  xferu64_aa_1_alpha_u16   nonrendering
     262.84  xferu64_bw_1_opaque_f16  nonrendering
    2061.21  xferu64_bw_1_alpha_f16   nonrendering
    2906.57  xferu64_aa_1_opaque_f16  nonrendering
    8183.68  xferu64_aa_1_alpha_f16   nonrendering

Generally closer to _u16, with one notable regression (probably hitting the denormalized multiply more often).)
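For readers less familiar with SSE intrinsics, the shift-and-multiply idea in the snippet above can be sketched for a single value. The function name `half_to_float_01_magic` is hypothetical, and the sketch assumes the input encodes a half in [0,1] and that float denormals are not flushed to zero; it is an illustration, not code from this CL.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical scalar counterpart of the 4-wide SSE snippet, for illustration only.
// Assumes h encodes a half in [0,1] (bit patterns 0x0000..0x3C00).
inline float half_to_float_01_magic(uint16_t h) {
    assert(h <= 0x3C00);

    // Shift the 10 mantissa bits into the top of the 23-bit f32 mantissa.
    // The 5-bit half exponent lands in the f32 exponent field with the wrong bias,
    // so the result is a very small (possibly denormal) float.
    const uint32_t shifted = static_cast<uint32_t>(h) << 13;
    float tiny;
    std::memcpy(&tiny, &shifted, sizeof tiny);

    // 239 << 23 is the float 2^(239-127) = 2^112, which corrects the bias
    // difference (127 - 15 = 112) with a single multiply.
    const uint32_t magic_bits = 239u << 23;
    float magic;
    std::memcpy(&magic, &magic_bits, sizeof magic);

    return tiny * magic;
}
```

With these assumptions, 0x3C00 converts to 1.0f and 0x3800 to 0.5f. Half denormals become float denormals before the multiply, which is consistent with the guess above that the slow case is the denormalized multiply.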
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
double-woot! Glad we at least have a chance to add back in < 0 and > 1 if needed in the future, but certainly not needed today. lgtm
The CQ bit was checked by mtklein@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685133005/200001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685133005/200001
Message was sent while issue was closed.
Message was sent while issue was closed.
Committed patchset #11 (id:200001) as https://skia.googlesource.com/skia/+/9ea11a4235b3e3521cc8bf914a27c2d0dc062db9
Message was sent while issue was closed.
A revert of this CL (patchset #11 id:200001) has been created in https://codereview.chromium.org/1693443003/ by mtklein@google.com. The reason for reverting is: Gotta fix Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD.
Message was sent while issue was closed.
The CQ bit was checked by mtklein@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from reed@google.com Link to the patchset: https://codereview.chromium.org/1685133005/#ps220001 (title: "guard")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1685133005/220001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1685133005/220001
Message was sent while issue was closed.
Message was sent while issue was closed.
Committed patchset #12 (id:220001) as https://skia.googlesource.com/skia/+/fff055cc5f9ca5015678f4f13a4f842084bd62d5 |