Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(164)

Issue 874033004: Revert of SSE4 opaque blend using intrinsics instead of assembly. (Closed)

Created:
5 years, 11 months ago by bungeman-skia
Modified:
5 years, 11 months ago
CC:
reviews_skia.org
Base URL:
https://skia.googlesource.com/skia.git@master
Target Ref:
refs/heads/master
Project:
skia
Visibility:
Public.

Description

Revert of SSE4 opaque blend using intrinsics instead of assembly. (patchset #14 id:260001 of https://codereview.chromium.org/874863002/) Reason for revert: This kills Mac 10.6 bots. FAILED: c++ -MMD -MF obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o.d -DSK_INTERNAL -DSK_GAMMA_SRGB -DSK_GAMMA_APPLY_TO_A8 -DSK_SCALAR_TO_FLOAT_EXCLUDED -DSK_ALLOW_STATIC_GLOBAL_INITIALIZERS=1 -DSK_SUPPORT_GPU=1 -DSK_SUPPORT_OPENCL=0 -DSK_FORCE_DISTANCE_FIELD_TEXT=0 -DSK_BUILD_FOR_MAC -DSK_CRASH_HANDLER -DSK_DEVELOPER=1 -I../../src/core -I../../src/utils -I../../include/c -I../../include/config -I../../include/core -I../../include/pathops -I../../include/pipe -I../../include/utils/mac -I../../include/effects -O0 -gdwarf-2 -mmacosx-version-min=10.6 -arch x86_64 -mssse3 -Wall -Wextra -Winit-self -Wpointer-arith -Wsign-compare -Wno-unused-parameter -Wno-invalid-offsetof -msse4.1 -c ../../src/opts/SkBlitRow_opts_SSE4.cpp -o obj/src/opts/opts_sse4.SkBlitRow_opts_SSE4.o ../../src/opts/SkBlitRow_opts_SSE4.cpp:15:27: warning: x86intrin.h: No such file or directory ../../src/opts/SkBlitRow_opts_SSE4.cpp: In function 'void S32A_Opaque_BlitRow32_SSE4(SkPMColor*, const SkPMColor*, int, U8CPU)': ../../src/opts/SkBlitRow_opts_SSE4.cpp:40: error: '_mm_testz_si128' was not declared in this scope ../../src/opts/SkBlitRow_opts_SSE4.cpp:45: error: '_mm_testc_si128' was not declared in this scope Original issue's description: > SSE4 opaque blend using intrinsics instead of assembly. > > Since we had such a hard time with the assembly versions of this blit (to the > point that we have them completely disabled everywhere), I thought I'd take > a shot at writing a version of the blit using intrinsics. > > The key feature of SSE4 we're exploiting is that we can use ptest (_mm_test*) > to skip the blend when the 16 src pixels we consider each loop are all opaque > or all transparent. _mm_shuffle_epi8 from SSSE3 also lends a hand to extract > all those alphas. > > It's worth looking to see if we can backport this type of logic to SSE2 using > _mm_movemask_epi8, or up to 32 pixels at a time using AVX. > > My local performance testing doesn't show this to be an unambiguous win > (there are probably microbenchmarks and SKPs where we'd be better off just > powering through the blend rather than looking at alphas), but the potential > does seem tantalizing enough to let skiaperf vet it on the bots. (< 1.0x is a win.) > > DM says it draws pixel perfect compare to the old code. > > Microbenchmarks: > bitmap_RGBA_8888_A_source_stripes_two 14us -> 14.4us 1.03x > bitmap_RGBA_8888_A_source_stripes_three 14.3us -> 14.5us 1.01x > bitmap_RGBA_8888_scale_bilerp 61.9us -> 62.2us 1.01x > bitmap_RGBA_8888_update_volatile_scale_rotate_bilerp 102us -> 101us 0.99x > bitmap_RGBA_8888_scale_rotate_bilerp 103us -> 101us 0.99x > bitmap_RGBA_8888_scale 18.4us -> 18.2us 0.99x > bitmap_RGBA_8888_A_scale_rotate_bicubic 71us -> 70us 0.99x > bitmap_RGBA_8888_update_scale_rotate_bilerp 103us -> 101us 0.99x > bitmap_RGBA_8888_A_scale_rotate_bilerp 112us -> 109us 0.98x > bitmap_RGBA_8888_update_volatile 5.72us -> 5.58us 0.98x > bitmap_RGBA_8888 5.73us -> 5.58us 0.97x > bitmap_RGBA_8888_update 5.78us -> 5.6us 0.97x > bitmap_RGBA_8888_A_scale_bilerp 70.7us -> 68us 0.96x > bitmap_RGBA_8888_A_scale_bicubic 23.7us -> 21.8us 0.92x > bitmap_RGBA_8888_A 13.9us -> 10.9us 0.78x > bitmap_RGBA_8888_A_source_opaque 14us -> 6.29us 0.45x > bitmap_RGBA_8888_A_source_transparent 14us -> 3.65us 0.26x > > Running over our ~70 SKP web page captures, this looks like we spend 0.7x > the time in S32A_Opaque_BlitRow compared to the SSE2 version, which should > be a decent predictor of real-world impact. > > BUG=chromium:399842 > > Committed: https://skia.googlesource.com/skia/+/04bc91b972417038fecfa87c484771eac2b9b785 TBR=henrik.smiding@intel.com,mtklein@google.com,herb@google.com,reed@google.com,thakis@chromium.org,mtklein@chromium.org NOPRESUBMIT=true NOTREECHECKS=true NOTRY=true BUG=chromium:399842 Committed: https://skia.googlesource.com/skia/+/2d80dd264704984bca4c2915f31ef427033b0b9b

Patch Set 1 #

Unified diffs Side-by-side diffs Delta from patch set Stats (+982 lines, -83 lines) Patch
M gyp/opts.gyp View 1 chunk +10 lines, -1 line 0 comments Download
M src/opts/SkBlitRow_opts_SSE4.h View 1 chunk +19 lines, -4 lines 0 comments Download
D src/opts/SkBlitRow_opts_SSE4.cpp View 1 chunk +0 lines, -69 lines 0 comments Download
A src/opts/SkBlitRow_opts_SSE4_asm.S View 1 chunk +475 lines, -0 lines 0 comments Download
A src/opts/SkBlitRow_opts_SSE4_x64_asm.S View 1 chunk +472 lines, -0 lines 0 comments Download
M src/opts/SkColor_opts_SSE2.h View 1 chunk +1 line, -8 lines 0 comments Download
M src/opts/opts_check_x86.cpp View 1 chunk +5 lines, -1 line 0 comments Download

Messages

Total messages: 3 (0 generated)
bungeman-skia
Created Revert of SSE4 opaque blend using intrinsics instead of assembly.
5 years, 11 months ago (2015-01-26 22:31:43 UTC) #1
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/874033004/1
5 years, 11 months ago (2015-01-26 22:31:58 UTC) #2
commit-bot: I haz the power
5 years, 11 months ago (2015-01-26 22:32:12 UTC) #3
Message was sent while issue was closed.
Committed patchset #1 (id:1) as
https://skia.googlesource.com/skia/+/2d80dd264704984bca4c2915f31ef427033b0b9b

Powered by Google App Engine
This is Rietveld 408576698