Chromium Code Reviews

Issue 1505433002: AVX2 YUV alpha blender and improved unittests (Closed)

Created:
5 years ago by fbarchard1
Modified:
5 years ago
CC:
Diony Rosa
Base URL:
https://chromium.googlesource.com/libyuv/libyuv@master
Target Ref:
refs/heads/master
Project:
libyuv
Visibility:
Public.

Description

AVX2 YUV alpha blender and improved unittests

AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions.
unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Committed: https://chromium.googlesource.com/libyuv/libyuv/+/bea690b3e03d24f77fea45c9a8592ea480a4acd8
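To illustrate what "exactness when alpha is 0 or 255" means, here is a minimal scalar sketch. The formula and the `+ 255` rounding term are an assumption chosen so that the extremes come out exact; the patch's actual arithmetic is not shown in this thread. `BlendPixelSketch` and `BlendIsExactAtExtremes` are hypothetical names, not libyuv functions.

```cpp
#include <cstdint>

// Assumed scalar blend (not confirmed to be the formula used by this patch):
//   result = (fg * alpha + bg * (255 - alpha) + 255) >> 8
static inline uint8_t BlendPixelSketch(uint8_t fg, uint8_t bg, uint8_t alpha) {
  return static_cast<uint8_t>((fg * alpha + bg * (255 - alpha) + 255) >> 8);
}

// Returns true if alpha == 255 reproduces the foreground and alpha == 0
// reproduces the background for every possible 8-bit value.
bool BlendIsExactAtExtremes() {
  for (int v = 0; v < 256; ++v) {
    for (int other = 0; other < 256; ++other) {
      const uint8_t a = static_cast<uint8_t>(v);
      const uint8_t b = static_cast<uint8_t>(other);
      if (BlendPixelSketch(a, b, 255) != a) return false;  // fully opaque
      if (BlendPixelSketch(b, a, 0) != a) return false;    // fully transparent
    }
  }
  return true;
}
```

With alpha at 255 the expression reduces to (255 * (fg + 1)) >> 8, which equals fg for every 8-bit fg; the alpha 0 case is symmetric for bg.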

Patch Set 1 #

Patch Set 2 : port avx2 to gcc #

Patch Set 3 : enable avx2 for gcc #

Patch Set 4 : add vpermq for avx2 #

Patch Set 5 : move vpermq after pack" #

Patch Set 6 : I420Blend implemented #

Total comments: 2

Patch Set 7 : off by 1 fix on win #

Stats: +539 lines, -46 lines

M README.chromium                  1 chunk   +1 line, -1 line       0 comments
M include/libyuv/planar_functions.h  2 chunks  +26 lines, -0 lines    0 comments
M include/libyuv/row.h             3 chunks  +7 lines, -6 lines     0 comments
M include/libyuv/version.h         1 chunk   +1 line, -1 line       0 comments
M source/planar_functions.cc       2 chunks  +162 lines, -0 lines   0 comments
M source/row_gcc.cc                2 chunks  +50 lines, -1 line     0 comments
M source/row_win.cc                4 chunks  +59 lines, -3 lines    0 comments
M unit_test/planar_test.cc         4 chunks  +233 lines, -34 lines  0 comments

Messages

Total messages: 12 (4 generated)
fbarchard1
5 years ago (2015-12-05 01:07:28 UTC) #1
fbarchard
AVX2 LibYUVPlanarTest.BlendPlane_Opt (176 ms) SSSE3 LibYUVPlanarTest.BlendPlane_Opt (193 ms)
5 years ago (2015-12-05 05:16:28 UTC) #3
Diony Rosa
lgtm
5 years ago (2015-12-06 05:24:07 UTC) #6
fbarchard
ARGBBlend_Any (605 ms) ARGBBlend_Unaligned (547 ms) ARGBBlend_Invert (489 ms) ARGBBlend_Opt (489 ms) BlendPlaneRow_Opt (167 ms) ...
5 years ago (2015-12-06 06:23:12 UTC) #7
fbarchard1
Committed patchset #7 (id:120001) manually as bea690b3e03d24f77fea45c9a8592ea480a4acd8 (presubmit successful).
5 years ago (2015-12-06 06:23:34 UTC) #9
fbarchard
gcc versions with AVX2 ARGBBlend_Unaligned (515 ms) ARGBBlend_Any (498 ms) ARGBBlend_Opt (491 ms) ARGBBlend_Invert (456 ...
5 years ago (2015-12-06 06:25:07 UTC) #10
Diony Rosa
https://codereview.chromium.org/1505433002/diff/100001/source/planar_functions.cc File source/planar_functions.cc (right): https://codereview.chromium.org/1505433002/diff/100001/source/planar_functions.cc#newcode725 source/planar_functions.cc:725: for (y = 0; y < height; ++y) { ...
5 years ago (2015-12-06 07:10:35 UTC) #11
fbarchard
5 years ago (2015-12-07 18:49:34 UTC) #12
Message was sent while issue was closed.
Linux64 AVX2
LibYUVPlanarTest.I420Blend_Opt (275 ms)
LibYUVPlanarTest.I420Blend_Unaligned (237 ms)

Win32 SSSE3
LibYUVPlanarTest.I420Blend_Opt (373 ms)
LibYUVPlanarTest.I420Blend_Unaligned (355 ms)

perf on linux version
Samples: 1K of event 'cycles', Event count (approx.): 997736971
 79.26%  libyuv_unittest  libyuv_unittest    [.] BlendPlaneRow_AVX2
 14.91%  libyuv_unittest  libyuv_unittest    [.] ScaleRowDown2Box_SSE2
  2.15%  libyuv_unittest  libyuv_unittest    [.] libyuv::TestI420Blend(int, int, int, int, int, int, int) [clone .isra.28]
  1.07%  libyuv_unittest  [kernel.kallsyms]  [k] 0xffffffff8104f45a
  0.74%  libyuv_unittest  libyuv_unittest    [.] testing::AssertionResult testing::internal::CmpHelperEQ<unsigned char, unsigned char>(char const*, char const*, unsigned
  0.64%  libyuv_unittest  libyuv_unittest    [.] I420Blend

       │    Disassembly of section .text:
       │
       │    0000000000597240 <ScaleRowDown2Box_SSE2>:
       │      pcmpeq %xmm5,%xmm5
  0.60 │      psrlw  $0x8,%xmm5
       │ 9:   movdqu (%rdi),%xmm0
 19.28 │      movdqu 0x10(%rdi),%xmm1
  2.41 │      movdqu (%rdi,%rsi,1),%xmm2
 29.52 │      movdqu 0x10(%rdi,%rsi,1),%xmm3
  3.01 │      lea    0x20(%rdi),%rdi
  1.20 │      pavgb  %xmm2,%xmm0
  6.63 │      pavgb  %xmm3,%xmm1
  4.22 │      movdqa %xmm0,%xmm2
  3.01 │      psrlw  $0x8,%xmm0
  3.61 │      movdqa %xmm1,%xmm3
  5.42 │      psrlw  $0x8,%xmm1
  6.02 │      pand   %xmm5,%xmm2
  2.41 │      pand   %xmm5,%xmm3
  2.41 │      pavgw  %xmm2,%xmm0
  2.41 │      pavgw  %xmm3,%xmm1
  1.81 │      packus %xmm1,%xmm0
  3.61 │      movdqu %xmm0,(%rdx)
  1.81 │      lea    0x10(%rdx),%rdx
  0.60 │      sub    $0x10,%ecx
       │    ↑ jg     9
       │    ← retq
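
For readers less familiar with SSE2, the loop in the listing above can be modelled in scalar code roughly as follows. This is a sketch read off the disassembly only, not the libyuv C source: `AvgRound` and `ScaleRowDown2BoxModel` are hypothetical names, and the model accepts any output width, whereas the listed loop consumes 16 output pixels per iteration.

```cpp
#include <cstddef>
#include <cstdint>

// Rounded average of two bytes, which is what pavgb and pavgw compute per lane.
static inline uint8_t AvgRound(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a + b + 1) >> 1);
}

// Scalar model of the listing: average the two source rows vertically
// (pavgb), then average each even/odd column pair horizontally (pavgw).
void ScaleRowDown2BoxModel(const uint8_t* src, ptrdiff_t stride,
                           uint8_t* dst, int dst_width) {
  const uint8_t* row1 = src + stride;  // second source row
  for (int x = 0; x < dst_width; ++x) {
    const uint8_t left = AvgRound(src[2 * x], row1[2 * x]);           // even column
    const uint8_t right = AvgRound(src[2 * x + 1], row1[2 * x + 1]);  // odd column
    dst[x] = AvgRound(left, right);
  }
}
```

Note that two chained rounded averages are not always identical to a single (a + b + c + d + 2) >> 2 box filter; the model follows the instruction sequence shown rather than the ideal mean.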

I ran a test with Dr. Memory on an odd height, but it didn't catch an overread.

Dr. Memory version 1.9.0 build 4 built on Oct  2 2015 13:13:14
Dr. Memory results for pid 5144: "libyuv_unittest.exe"
Application cmdline: "out\debug\libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=*BlendPlane_Opt"
Recorded 116 suppression(s) from default C:\Program Files (x86)\Dr. Memory\bin\suppress-default.txt

===========================================================================
FINAL SUMMARY:

DUPLICATE ERROR COUNTS:

SUPPRESSIONS USED:

NO ERRORS FOUND:
      0 unique,     0 total unaddressable access(es)
      0 unique,     0 total uninitialized access(es)
      0 unique,     0 total invalid heap argument(s)
      0 unique,     0 total GDI usage error(s)
      0 unique,     0 total handle leak(s)
      0 unique,     0 total warning(s)
      0 unique,     0 total,      0 byte(s) of leak(s)
      0 unique,     0 total,      0 byte(s) of possible leak(s)
ERRORS IGNORED:
     10 potential leak(s) (suspected false positives)
         (details: C:\Users\fbarchard\AppData\Roaming\Dr. Memory\DrMemory-libyuv_unittest.exe.5144.000\potential_errors.txt)
    123 unique,   195 total,  12506 byte(s) of still-reachable allocation(s)
         (re-run with "-show_reachable" for details)
Details: C:\Users\fbarchard\AppData\Roaming\Dr. Memory\DrMemory-libyuv_unittest.exe.5144.000\results.txt

https://codereview.chromium.org/1505433002/diff/100001/source/planar_function...
File source/planar_functions.cc (right):

https://codereview.chromium.org/1505433002/diff/100001/source/planar_function...
source/planar_functions.cc:725: for (y = 0; y < height; ++y) {
On 2015/12/06 07:10:35, Diony Rosa wrote:
> What about images with odd height? I think this will read past the end of the
> alpha plane.
> 
> e.g., if the height is 5, then the chroma channel has height 3.
> 
> First loop will use alpha rows 0 + 1,
> next loop will use alpha rows 2 + 3
> final loop will use alpha rows 4 + 5, but the alpha channel doesn't have a row
> 5.

Acknowledged.
Good catch. I'll add odd height support in followup.
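
One possible shape for the odd-height handling discussed above, as a minimal sketch: when the last chroma row has only one alpha row behind it, reuse that row instead of reading row `height`. This is not the follow-up change itself, which is not shown in this thread; `SubsampleAlphaSketch` is a hypothetical helper, and the plain (sum + 2) >> 2 box filter is an assumption made to keep the example short.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: 2x2 box-subsample an alpha plane to chroma resolution
// without reading past the last row or the last column.
std::vector<uint8_t> SubsampleAlphaSketch(const uint8_t* alpha, int stride,
                                          int width, int height) {
  const int half_w = (width + 1) / 2;   // chroma width, rounded up
  const int half_h = (height + 1) / 2;  // chroma height, rounded up
  std::vector<uint8_t> out(static_cast<size_t>(half_w) * half_h);
  for (int y = 0; y < half_h; ++y) {
    const uint8_t* row0 = alpha + static_cast<ptrdiff_t>(2 * y) * stride;
    // Odd height: the final chroma row has a single alpha row, so reuse it.
    const uint8_t* row1 = (2 * y + 1 < height) ? row0 + stride : row0;
    for (int x = 0; x < half_w; ++x) {
      const int x0 = 2 * x;
      // Odd width: the final chroma column has a single alpha column.
      const int x1 = (x0 + 1 < width) ? x0 + 1 : x0;
      out[static_cast<size_t>(y) * half_w + x] = static_cast<uint8_t>(
          (row0[x0] + row0[x1] + row1[x0] + row1[x1] + 2) >> 2);
    }
  }
  return out;
}
```

With height 5 this touches alpha rows 0 + 1, 2 + 3, and then row 4 twice, so row 5 is never read.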
