Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(329)

Issue 874863002: SSE4 opaque blend using intrinsics instead of assembly. (Closed)

Created:
5 years, 11 months ago by mtklein_C
Modified:
5 years, 10 months ago
CC:
reviews_skia.org
Base URL:
https://skia.googlesource.com/skia.git@master
Target Ref:
refs/heads/master
Project:
skia
Visibility:
Public.

Description

SSE4 opaque blend using intrinsics instead of assembly. Since we had such a hard time with the assembly versions of this blit (to the point that we have them completely disabled everywhere), I thought I'd take a shot at writing a version of the blit using intrinsics. The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*) to skip the blend when the 16 src pixels we consider each loop are all opaque or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract all those alphas. It's worth looking to see if we can backport this type of logic to SSE2 using _mm_movemask_epi8, or up to 32 pixels at a time using AVX. My local performance testing doesn't show this to be an unambiguous win (there are probably microbenchmarks and SKPs where we'd be better off just powering through the blend rather than looking at alphas), but the potential does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.) DM says it draws pixel perfect compare to the old code. Microbenchmarks: bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x bitmap_RGBA_8888 5.73us -> 5.58us 0.97x bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x Running over our ~70 SKP web page captures, this looks like we spend 0.7x the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should be a decent predictor of real-world impact. BUG=chromium:399842 Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785 CQ_EXTRA_TRYBOTS=client.skia:Test-Mac10.6-MacMini4.1-GeForce320M-x86_64-Release-Trybot Committed: https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493

Patch Set 1 #

Patch Set 2 : 16 at a time #

Patch Set 3 : correct #

Patch Set 4 : merge in over_tail #

Patch Set 5 : wasted work #

Patch Set 6 : shuffle #

Patch Set 7 : comments #

Patch Set 8 : single ptest #

Patch Set 9 : windows boo #

Patch Set 10 : rebase #

Patch Set 11 : simpler with one alpha mask. same speed #

Patch Set 12 : simpler still, maybe faster #

Patch Set 13 : MSVC doesn't like &, | #

Patch Set 14 : can't hurt to swap the order #

Total comments: 1

Patch Set 15 : 10.6 fix? #

Patch Set 16 : rebase for gyp changes #

Unified diffs Side-by-side diffs Delta from patch set Stats (+80 lines, -972 lines) Patch
M gyp/opts.gypi View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 chunk +1 line, -0 lines 0 comments Download
M src/opts/SkBlitRow_opts_SSE4.h View 1 chunk +4 lines, -19 lines 0 comments Download
A src/opts/SkBlitRow_opts_SSE4.cpp View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 chunk +66 lines, -0 lines 0 comments Download
D src/opts/SkBlitRow_opts_SSE4_asm.S View 1 chunk +0 lines, -475 lines 0 comments Download
D src/opts/SkBlitRow_opts_SSE4_x64_asm.S View 1 chunk +0 lines, -472 lines 0 comments Download
M src/opts/SkColor_opts_SSE2.h View 1 2 3 4 5 6 7 8 9 1 chunk +8 lines, -1 line 0 comments Download
M src/opts/opts_check_x86.cpp View 1 chunk +1 line, -5 lines 0 comments Download

Messages

Total messages: 43 (15 generated)
mtklein_C
Hi Henrik, since you were the author of https://codereview.chromium.org/289473009, and this is sort of my ...
5 years, 11 months ago (2015-01-25 16:53:39 UTC) #2
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874863002/140001
5 years, 11 months ago (2015-01-25 16:56:07 UTC) #4
commit-bot: I haz the power
Note for Reviewers: The CQ is waiting for an approval. If you believe that the ...
5 years, 11 months ago (2015-01-25 16:56:08 UTC) #5
commit-bot: I haz the power
Try jobs failed on following builders: Build-Win-VS2013-x86-Debug-Trybot on client.skia.compile (http://build.chromium.org/p/client.skia.compile/builders/Build-Win-VS2013-x86-Debug-Trybot/builds/2081)
5 years, 11 months ago (2015-01-25 16:59:48 UTC) #7
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874863002/160001
5 years, 11 months ago (2015-01-25 17:04:08 UTC) #9
commit-bot: I haz the power
Note for Reviewers: The CQ is waiting for an approval. If you believe that the ...
5 years, 11 months ago (2015-01-25 17:04:09 UTC) #10
commit-bot: I haz the power
No LGTM from a valid reviewer yet. Please ask for an LGTM from a full ...
5 years, 11 months ago (2015-01-25 23:04:18 UTC) #12
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874863002/200001
5 years, 11 months ago (2015-01-26 19:38:34 UTC) #14
commit-bot: I haz the power
Note for Reviewers: The CQ is waiting for an approval. If you believe that the ...
5 years, 11 months ago (2015-01-26 19:38:34 UTC) #15
mtklein_C
+Herb, who's been working on this with me this afternoon.
5 years, 11 months ago (2015-01-26 21:40:24 UTC) #18
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874863002/220001
5 years, 11 months ago (2015-01-26 21:45:29 UTC) #20
commit-bot: I haz the power
Note for Reviewers: The CQ is waiting for an approval. If you believe that the ...
5 years, 11 months ago (2015-01-26 21:45:29 UTC) #21
commit-bot: I haz the power
Try jobs failed on following builders: Build-Win-VS2013-x86-Debug-Trybot on client.skia.compile (http://build.chromium.org/p/client.skia.compile/builders/Build-Win-VS2013-x86-Debug-Trybot/builds/2117)
5 years, 11 months ago (2015-01-26 21:49:39 UTC) #23
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874863002/240001
5 years, 11 months ago (2015-01-26 21:52:38 UTC) #25
commit-bot: I haz the power
Note for Reviewers: The CQ is waiting for an approval. If you believe that the ...
5 years, 11 months ago (2015-01-26 21:52:39 UTC) #26
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874863002/260001
5 years, 11 months ago (2015-01-26 21:59:59 UTC) #28
commit-bot: I haz the power
Note for Reviewers: The CQ is waiting for an approval. If you believe that the ...
5 years, 11 months ago (2015-01-26 22:00:00 UTC) #29
reed1
hurray for deleting code! (and speeding things up) lgtm
5 years, 11 months ago (2015-01-26 22:04:32 UTC) #31
commit-bot: I haz the power
Committed patchset #14 (id:260001) as https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785
5 years, 11 months ago (2015-01-26 22:06:46 UTC) #32
Nico
https://codereview.chromium.org/874863002/diff/260001/gyp/opts.gyp File gyp/opts.gyp (right): https://codereview.chromium.org/874863002/diff/260001/gyp/opts.gyp#newcode210 gyp/opts.gyp:210: 'target_name': 'opts_sse4', You probably need something like https://codereview.chromium.org/475273002 for ...
5 years, 11 months ago (2015-01-26 22:23:02 UTC) #34
bungeman-skia
A revert of this CL (patchset #14 id:260001) has been created in https://codereview.chromium.org/874033004/ by bungeman@google.com. ...
5 years, 11 months ago (2015-01-26 22:31:42 UTC) #35
Nico
On 2015/01/26 22:31:42, bungeman1 wrote: > A revert of this CL (patchset #14 id:260001) has ...
5 years, 11 months ago (2015-01-26 22:37:45 UTC) #36
mtklein
> This might be a config problem with your 10.6 bots. On a chromium 10.6 ...
5 years, 11 months ago (2015-01-26 23:08:02 UTC) #37
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874863002/300001
5 years, 11 months ago (2015-01-27 22:19:20 UTC) #39
commit-bot: I haz the power
Committed patchset #16 (id:300001) as https://skia.googlesource.com/skia/+/6dbfb21a6c88af6d94e8c823c3ad559f1a41b493
5 years, 11 months ago (2015-01-27 22:35:24 UTC) #40
fmalita_google_do_not_use
On 2015/01/27 22:35:24, I haz the power (commit-bot) wrote: > Committed patchset #16 (id:300001) as ...
5 years, 10 months ago (2015-01-29 05:04:13 UTC) #41
mtklein
> I just noticed a ~5% linux/key_mobile_sites_smooth rasterization time > improvement which seems attributable to ...
5 years, 10 months ago (2015-01-29 14:38:21 UTC) #42
stephana
5 years, 10 months ago (2015-02-02 17:52:00 UTC) #43
Message was sent while issue was closed.
A revert of this CL (patchset #16 id:300001) has been created in
https://codereview.chromium.org/873553003/ by stephana@google.com.

The reason for reverting is: This causes a bug on the 'hittestpath' GM on
MacMini 4,1 

See: 

https://gold.skia.org/#/triage/hittestpath?head=0 

for details. .

Powered by Google App Engine
This is Rietveld 408576698