Description
Skeleton for float <-> half optimized procs
Nothing fancy yet, just calls the serial code in a loop.
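The skeleton "just calls the serial code in a loop". Below is a minimal sketch of that shape, assuming IEEE 754 binary16 halves stored as `uint16_t` and round-to-nearest-even for float -> half. The names `half_to_float`, `float_to_half`, `float_to_half_proc`, `half_to_float_proc` and `roundtrip_ok` are hypothetical and chosen for illustration; they are not Skia's actual functions or signatures.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper names; a sketch, not Skia's actual functions.

// half -> float: every binary16 value is exactly representable as a float.
static float half_to_float(uint16_t h) {
    const int exp  = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;
    float mag;
    if (exp == 0) {
        mag = std::ldexp((float)mant, -24);                 // zero or subnormal: mant * 2^-24
    } else if (exp == 31) {
        mag = mant ? NAN : INFINITY;                        // NaN payload is not preserved
    } else {
        mag = std::ldexp((float)(mant | 0x400), exp - 25);  // (1024 + mant) * 2^(exp - 25)
    }
    return (h & 0x8000) ? -mag : mag;
}

// float -> half with round-to-nearest-even, integer ops only.
static uint16_t float_to_half(float f) {
    uint32_t x;
    std::memcpy(&x, &f, sizeof x);
    const uint32_t sign = (x >> 16) & 0x8000;
    x &= 0x7FFFFFFF;                                  // magnitude bits
    uint32_t h;
    if (x > 0x7F800000) {
        h = 0x7E00;                                   // NaN -> quiet half NaN
    } else if (x >= 0x47800000) {
        h = 0x7C00;                                   // >= 65536 (incl. inf) -> inf
    } else if (x >= 0x38800000) {
        // Normal half range [2^-14, 65536): rebias exponent, keep top 10 mantissa bits.
        const uint32_t e = x >> 23, mant = x & 0x7FFFFF;
        h = ((e - 112) << 10) | (mant >> 13);
        const uint32_t rem = mant & 0x1FFF;           // the 13 dropped bits
        if (rem > 0x1000 || (rem == 0x1000 && (h & 1))) h++;  // carry may produce 0x7C00 (inf)
    } else {
        const uint32_t e = x >> 23;
        if (e < 102) {
            h = 0;                                    // below 2^-25 (incl. float subnormals)
        } else {
            const uint32_t m = (x & 0x7FFFFF) | 0x800000;    // restore the implicit 1
            const uint32_t shift = 126 - e;                  // 14..24
            h = m >> shift;                                  // value in units of 2^-24
            const uint32_t rem = m & ((1u << shift) - 1);
            const uint32_t halfway = 1u << (shift - 1);
            if (rem > halfway || (rem == halfway && (h & 1))) h++;  // may yield 0x400, smallest normal
        }
    }
    return (uint16_t)(sign | h);
}

// The skeleton's shape: call the serial conversion once per element.
void float_to_half_proc(uint16_t dst[], const float src[], size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = float_to_half(src[i]);
}
void half_to_float_proc(float dst[], const uint16_t src[], size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = half_to_float(src[i]);
}

// Exhaustive check: every non-NaN half survives half -> float -> half unchanged.
bool roundtrip_ok() {
    for (uint32_t i = 0; i <= 0xFFFF; i++) {
        const uint16_t h = (uint16_t)i;
        const bool is_nan = ((h & 0x7C00) == 0x7C00) && (h & 0x3FF);
        if (!is_nan && float_to_half(half_to_float(h)) != h) return false;
    }
    return true;
}
```

Because `n` is only a loop bound here, non-power-of-two counts need no special handling; a SIMD port would typically convert 4 or 8 elements per iteration and finish the remainder with the scalar code.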
I will try to follow this up with at least some of:
- SSE2 version of serial code
- NEON version of serial code
- NEON version using vcvt.f32.f16/vcvt.f16.f32
- F16C (between AVX and AVX2) version using vcvtph2ps/vcvtps2ph
The last two are the fastest, but they need runtime CPU feature detection.
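Runtime detection usually means probing the CPU once and caching a function pointer to the best available proc. The sketch below shows only that dispatch pattern. `cpu_has_hw_half_convert()` and `half_to_float_hw()` are hypothetical placeholders: a real x86 probe would typically check CPUID for F16C plus OS-enabled AVX state (F16C instructions are VEX-encoded), and an ARM probe would query the OS for half-precision support. The probe returns false here so the sketch runs anywhere, and the "hardware" proc just forwards to the serial loop.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical proc signature; not Skia's actual API.
using HalfToFloatProc = void (*)(float dst[], const uint16_t src[], size_t n);

// Scalar half -> float (exact for all non-NaN inputs; NaN payload not preserved).
static float half_to_float(uint16_t h) {
    const int exp  = (h >> 10) & 0x1F;
    const int mant = h & 0x3FF;
    float mag;
    if (exp == 0)       mag = std::ldexp((float)mant, -24);
    else if (exp == 31) mag = mant ? NAN : INFINITY;
    else                mag = std::ldexp((float)(mant | 0x400), exp - 25);
    return (h & 0x8000) ? -mag : mag;
}

// Portable serial path: one scalar conversion per element.
static void half_to_float_serial(float dst[], const uint16_t src[], size_t n) {
    for (size_t i = 0; i < n; i++) dst[i] = half_to_float(src[i]);
}

// HYPOTHETICAL feature probe; always false in this sketch.
static bool cpu_has_hw_half_convert() { return false; }

// HYPOTHETICAL hardware path (where vcvtph2ps or vcvt.f32.f16 would be used);
// until written, it forwards to the serial code.
static void half_to_float_hw(float dst[], const uint16_t src[], size_t n) {
    half_to_float_serial(dst, src, n);
}

// Probe once and cache the choice; function-local static initialization
// is thread-safe since C++11.
HalfToFloatProc choose_half_to_float_proc() {
    static const HalfToFloatProc proc =
        cpu_has_hw_half_convert() ? half_to_float_hw : half_to_float_serial;
    return proc;
}
```

Callers then write `choose_half_to_float_proc()(dst, src, n);`, so the cost of detection is paid only on first use.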
BUG=skia:
GOLD_TRYBOT_URL=https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1686543003
Committed: https://skia.googlesource.com/skia/+/a525cb151bb39fb6362af051f69b6d633f660fd9
Patch Set 1
Patch Set 2: add a test
Patch Set 3: npot
Patch Set 4: bench
Messages
Total messages: 14 (7 generated)