Descriptiondiv255(x) as ((x+128)*257)>>16 with SSE
_mm_mulhi_epu16 makes the (...*257)>>16 part simple.
This seems to speed up every transfermode that uses div255(),
in the 7-25% range.
It even appears to obviate the need for approxMulDiv255() on SSE.
I'm not sure about NEON yet, so I'll keep approxMulDiv255() for now.
Should be no pixels change:
https://gold.skia.org/search2?issue=1452903004&unt=true&query=source_type%3Dgm&master=false
BUG=skia:
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/cbf4fba43933302a846872e4c5ce8f1adb8b325e
Patch Set 1 #Patch Set 2 : rest #Patch Set 3 : notes #
Messages
Total messages: 14 (8 generated)
|