Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(116)

Issue 1092433002: Rework SSE and NEON Color32 algorithms to be more correct and faster. (Closed)

Created:
5 years, 8 months ago by mtklein_C
Modified:
5 years, 8 months ago
Reviewers:
f(malita), mtklein, reed1
CC:
reviews_skia.org
Base URL:
https://skia.googlesource.com/skia@master
Target Ref:
refs/heads/master
Project:
skia
Visibility:
Public.

Description

Rework SSE and NEON Color32 algorithms to be more correct and faster. This algorithm changes the blend math, guarded by SK_LEGACY_COLOR32_MATH. The new math is more correct: it's never off by more than 1, and correct in all the interesting 0x00 and 0xFF edge cases, where the old math was never off by more than 2, and not always correct on the edges. If you look at tests/BlendTest.cpp, the old code was using the `blend_256_plus1_trunc` algorithm, while the new code uses `blend_256_round_alt`. Neither uses `blend_perfect`, which is about ~35% slower than `blend_256_round_alt`. This will require an unfathomable number of rebaselines, first to Skia, then to Blink when I remove the guard. I plan to follow up with some integer SIMD abstractions that can unify these two implementations into a single algorithm. This was originally what I was working on here, but the correctness gains seem to be quite compelling. The only places these two algorithms really differ greatly now is the kernel function, and even there they can really both be expressed abstractly as: - multiply 8-bits and 8-bits producing 16-bits - add 16-bits to 16-bits, returning the top 8 bits. All the constants are the same, except SSE is a little faster to keep 8 16-bit inverse alphas, NEON's a little faster to keep 8 8-bit inverse alphas. I may need to take this small speed win back to unify the two. We should expect a ~25% speedup on Intel (mostly from unrolling to 8 pixels) and a ~20% speedup on ARM (mostly from using vaddhn to add `color`, round, and narrow back down to 8-bit all into one instruction. (I am probably missing several more related bugs here.) BUG=skia:3738, skia:420, chromium:111470 Committed: https://skia.googlesource.com/skia/+/afe2ffb8ba5e7362a2ee6f4e1540c9ab22df2c1e

Patch Set 1 #

Patch Set 2 : update NEON algorithm to match #

Patch Set 3 : by 8 #

Patch Set 4 : tweak out SSE. faster too #

Patch Set 5 : add legacy guards #

Patch Set 6 : fix Color32 too, add some notes #

Patch Set 7 : >> 0 #

Patch Set 8 : comment #

Patch Set 9 : support #

Unified diffs Side-by-side diffs Delta from patch set Stats (+136 lines, -156 lines) Patch
M src/core/SkBlitRow_D32.cpp View 1 2 3 4 5 6 7 8 1 chunk +28 lines, -18 lines 0 comments Download
M src/opts/SkBlitRow_opts_SSE2.cpp View 1 2 3 4 5 6 7 8 1 chunk +55 lines, -47 lines 0 comments Download
M src/opts/SkBlitRow_opts_arm_neon.cpp View 1 2 3 4 5 6 7 8 1 chunk +53 lines, -91 lines 0 comments Download

Messages

Total messages: 27 (10 generated)
commit-bot: I haz the power
Dry run: COMMIT=false is deprecated please use the dry run button instead.
5 years, 8 months ago (2015-04-17 14:24:45 UTC) #3
mtklein_C
5 years, 8 months ago (2015-04-17 14:24:50 UTC) #5
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1092433002/80001
5 years, 8 months ago (2015-04-17 14:50:41 UTC) #7
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
5 years, 8 months ago (2015-04-17 14:57:41 UTC) #9
reed1
Lets be sure to capture the new "blessed" src-over algorithm somewhere in the code/documentation, so ...
5 years, 8 months ago (2015-04-17 15:18:36 UTC) #10
mtklein
Done. Updated the vanilla Color32 too, which we'll see on the Xoom.
5 years, 8 months ago (2015-04-17 15:52:32 UTC) #11
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1092433002/140001
5 years, 8 months ago (2015-04-17 16:18:14 UTC) #13
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
5 years, 8 months ago (2015-04-17 16:24:16 UTC) #15
reed1
thanks for the change/dox in the portable code. Obviously we'll want to centralize (or remove) ...
5 years, 8 months ago (2015-04-17 16:30:11 UTC) #16
reed1
nit: for the most part, we're using SK_SUPPORT_LEGACY_... for these flags, to make it easy ...
5 years, 8 months ago (2015-04-17 16:30:52 UTC) #17
mtklein
On 2015/04/17 16:30:52, reed1 wrote: > nit: for the most part, we're using SK_SUPPORT_LEGACY_... for ...
5 years, 8 months ago (2015-04-17 16:36:01 UTC) #18
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1092433002/160001
5 years, 8 months ago (2015-04-17 16:36:18 UTC) #20
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
5 years, 8 months ago (2015-04-17 16:48:10 UTC) #22
mtklein
Going to land this to see the overall effect. The Chrome guard has landed, so ...
5 years, 8 months ago (2015-04-17 18:00:24 UTC) #23
mtklein
lgtm
5 years, 8 months ago (2015-04-17 18:00:29 UTC) #25
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1092433002/160001
5 years, 8 months ago (2015-04-17 18:00:38 UTC) #26
commit-bot: I haz the power
5 years, 8 months ago (2015-04-17 18:01:02 UTC) #27
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as
https://skia.googlesource.com/skia/+/afe2ffb8ba5e7362a2ee6f4e1540c9ab22df2c1e

Powered by Google App Engine
This is Rietveld 408576698