Description
Add SSSE3 implementation for SkPMFloat, with faster get() and set().
With SSSE3, we can use the Swiss Army Knife byte shuffler pshufb,
a.k.a. _mm_shuffle_epi8(), to jump directly between 32 and 128 bits.
In microbench isolation, this looks like an additional 7-16% speedup:
SkPMFloat_get: 2.35ns -> 1.98ns
SkPMFloat_clamp: 2.35ns -> 2.18ns
Before this CL, get() and clamp() were identical code. The _get benchmark improves because both set() and get() become faster; the _clamp benchmark shows the improvement from set() getting faster with clamp() staying the same.
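
The 32-to-128-bit jump described above can be sketched as follows. This is a minimal illustration, not the code from this CL: the function names expand_to_floats and pack_from_floats and the SKETCH_SSSE3 macro are hypothetical, the target attribute assumes GCC or Clang on x86, and the pack step assumes every lane already lies in [0, 255] (it does no clamping).

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2: _mm_cvtsi32_si128, _mm_cvtepi32_ps, ...
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8 (pshufb)

// Hypothetical helper macro: lets GCC/Clang build these two functions
// for SSSE3 without passing -mssse3 for the whole translation unit.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_SSSE3 __attribute__((target("ssse3")))
#else
#define SKETCH_SSSE3
#endif

// 32 bits -> 128 bits: spread four packed bytes across four 32-bit lanes,
// then convert each lane to float.
SKETCH_SSSE3 static inline __m128 expand_to_floats(uint32_t px) {
    // Each mask byte selects a source byte; a mask byte with its high bit
    // set (here -1) writes zero instead. Byte i lands in the low byte of lane i.
    const __m128i mask = _mm_setr_epi8(0, -1, -1, -1,
                                       1, -1, -1, -1,
                                       2, -1, -1, -1,
                                       3, -1, -1, -1);
    __m128i bytes = _mm_cvtsi32_si128(static_cast<int>(px));
    __m128i lanes = _mm_shuffle_epi8(bytes, mask);
    return _mm_cvtepi32_ps(lanes);
}

// 128 bits -> 32 bits: convert each float lane to an integer, then gather
// the low byte of every lane into the bottom 32 bits.
// Assumes each lane is already in [0, 255]; out-of-range values are not clamped.
SKETCH_SSSE3 static inline uint32_t pack_from_floats(__m128 v) {
    const __m128i mask = _mm_setr_epi8(0, 4, 8, 12,
                                       -1, -1, -1, -1,
                                       -1, -1, -1, -1,
                                       -1, -1, -1, -1);
    __m128i lanes = _mm_cvtps_epi32(v);  // rounds per the current MXCSR mode
    __m128i bytes = _mm_shuffle_epi8(lanes, mask);
    return static_cast<uint32_t>(_mm_cvtsi128_si32(bytes));
}
```

Each direction needs only one shuffle, where an SSE2-only version would typically need a sequence of unpack or pack steps to move between bytes and 32-bit lanes.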
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/8c7ba092a68836c5db95c2a80b74d4f9cb475cc2
Patch Set 1
Patch Set 2 : leave a note about SSE 4.1
Patch Set 3 : rediff
Patch Set 4 : remove the note for now
Patch Set 5 : rebase
Total comments: 4
Messages
Total messages: 8 (3 generated)