Chromium Code Reviews

Issue 1505673003: Optimize yuv alpha blend AVX2 code to do 32 pixels at a time. (Closed)

Created: 5 years ago by fbarchard1
Modified: 5 years ago
Reviewers: harryjin, fbarchard, Diony Rosa
CC: harryjin
Base URL: https://chromium.googlesource.com/libyuv/libyuv@master
Target Ref: refs/heads/master
Project: libyuv
Visibility: Public.

Description

Optimize yuv alpha blend AVX2 code to do 32 pixels at a time.

out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt

Was
LibYUVPlanarTest.I420Blend_Opt (2335 ms)
Now
LibYUVPlanarTest.I420Blend_Opt (1937 ms)
vs SSSE3
LibYUVPlanarTest.I420Blend_Opt (2599 ms)

BUG=libyuv:527
R=dhrosa@google.com

Committed: https://chromium.googlesource.com/libyuv/libyuv/+/dee77a4ebeaebc781cb3acd80aa6627fd1c7c825
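For readers without the diff open, here is a rough scalar sketch of what a row blend like this computes per pixel. It is not taken from the patch: the function names `blend_pixel` and `blend_plane_row_ref` are hypothetical, and the formula `(a * fg + (255 - a) * bg + 255) >> 8` is an assumption about the rounding used (Patch Set 5 mentions matching a spreadsheet, which is not shown on this page). The outer loop walks the row in blocks of 32 only to mirror the granularity the AVX2 path is described as using; the remainder loop handles widths that are not a multiple of 32.

```c
#include <stdint.h>

/* Hypothetical helper: blend one 8-bit foreground sample over one
 * background sample. The "+ 255" rounding term is an assumption, not
 * confirmed by this review page. */
static uint8_t blend_pixel(uint8_t fg, uint8_t bg, uint8_t a) {
  return (uint8_t)((a * fg + (255 - a) * bg + 255) >> 8);
}

/* Hypothetical scalar reference for a planar row blend.
 * src0 is the foreground row, src1 the background row, alpha the
 * per-pixel 8-bit weight for src0. */
void blend_plane_row_ref(const uint8_t* src0, const uint8_t* src1,
                         const uint8_t* alpha, uint8_t* dst, int width) {
  int x = 0;
  /* Main loop: whole blocks of 32 pixels, the unit a 32-pixel SIMD
   * loop would consume per iteration. */
  for (; x + 32 <= width; x += 32) {
    for (int i = 0; i < 32; ++i) {
      dst[x + i] = blend_pixel(src0[x + i], src1[x + i], alpha[x + i]);
    }
  }
  /* Remainder: fewer than 32 pixels left over. */
  for (; x < width; ++x) {
    dst[x] = blend_pixel(src0[x], src1[x], alpha[x]);
  }
}
```

With this assumed formula, alpha 255 returns the foreground sample unchanged and alpha 0 returns the background sample unchanged, which is a useful sanity check when comparing a SIMD path against a scalar one.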

Patch Set 1

Patch Set 2 : avx2 does 32 pixels at a time now

Patch Set 3 : gcc port of avx2 that does 32 pixels

Total comments: 6

Patch Set 4 : add xgetbv comment

Patch Set 5 : update formula to match spreadsheet

Patch Set 6 : merge cpuid changes

Stats (+58 lines, -39 lines)
M source/cpu_id.cc: 2 chunks, +3 lines, -3 lines, 0 comments
M source/planar_functions.cc: 2 chunks, +2 lines, -2 lines, 0 comments
M source/row_gcc.cc: 4 chunks, +26 lines, -16 lines, 0 comments
M source/row_win.cc: 3 chunks, +27 lines, -18 lines, 0 comments

Messages

Total messages: 10 (4 generated)
fbarchard
perf report on avx2 function:
0.02 │ vpsllw $0x8,%ymm5,%ymm5
0.02 │ mov $0x80808080,%eax
0.06 │ ...
5 years ago (2015-12-08 06:59:03 UTC) #3
harryjin
https://codereview.chromium.org/1505673003/diff/40001/source/cpu_id.cc
File source/cpu_id.cc (right):

https://codereview.chromium.org/1505673003/diff/40001/source/cpu_id.cc#newcode110
source/cpu_id.cc:110: //0 && (defined(_MSC_VER) && !defined(__clang__)) && (_MSC_FULL_VER >= 160040219) ...
5 years ago (2015-12-08 07:48:58 UTC) #5
fbarchard
Added a comment about the xgetbv bug, which only affects AVX2 on vs2013 and earlier. Added the version number. ...
5 years ago (2015-12-08 19:24:04 UTC) #6
fbarchard
BlendPlaneRow_AVX2 now takes less time, so ScaleRowDown2Box_SSE2 accounts for a larger share of the profile:
Samples: 7K of event 'cycles', Event count ...
5 years ago (2015-12-08 19:36:39 UTC) #7
Diony Rosa
lgtm
5 years ago (2015-12-09 01:28:51 UTC) #8
fbarchard1
5 years ago (2015-12-09 02:20:35 UTC) #10
Message was sent while issue was closed.
Committed patchset #6 (id:100001) manually as dee77a4ebeaebc781cb3acd80aa6627fd1c7c825 (presubmit successful).
