Description
Rewrite memset benches, then use results to add a small-N optimization.
The benches for N <= 10 get around 2x faster on my N7 and N9. I believe this
is because of the reduced function-call-then-function-pointer-call overhead on
the N7, and additionally because it seems autovectorization beats our NEON code
for small N on the N9.
My desktop is unchanged, though that's probably because N=10 lies well within a
region where memset's performance is essentially constant: N=100 takes only
about 2x as long as N=1 and N=10, which perform nearly identically.
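
A minimal sketch of the kind of small-N fast path described above, not the actual patch. All names here (memset32, g_memset32_large, kSmallN) are hypothetical, and the threshold of 10 is assumed from the N <= 10 bench results. The point is that small counts are filled by an inline loop the compiler can unroll or autovectorize, so only larger counts pay for the call through the function pointer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a platform-specific routine (for example a NEON
// or SSE2 version) that would normally be chosen once at startup.
static void memset32_large_portable(uint32_t* dst, uint32_t value, size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = value;
    }
}

// Hypothetical function pointer through which large fills are dispatched.
static void (*g_memset32_large)(uint32_t*, uint32_t, size_t) =
        memset32_large_portable;

// Assumed threshold, taken from the N <= 10 benches in the description.
static const size_t kSmallN = 10;

// Inline entry point: small counts never leave this function, so they avoid
// both the outer function call and the indirect call behind it.
static inline void memset32(uint32_t* dst, uint32_t value, size_t n) {
    if (n <= kSmallN) {
        for (size_t i = 0; i < n; i++) {
            dst[i] = value;
        }
        return;
    }
    g_memset32_large(dst, value, n);
}
```

Callers use memset32 the same way for any count. Which branch runs depends only on n, so the threshold can be retuned from bench results without touching call sites.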
BUG=skia:
Committed: https://skia.googlesource.com/skia/+/9ff378b01be0b0a3fc35677a2155ba4ade286cc2
Patch Set 1 #
Total comments: 2
Patch Set 2 : generalize #
Total comments: 4
Patch Set 3 : undef #
Messages
Total messages: 20 (8 generated)