Chromium Code Reviews

Issue 2387713002: HalfFloat_SSE2 for Visual C (Closed)

Created: 4 years, 2 months ago by fbarchard1
Modified: 4 years, 2 months ago
Reviewers: wangcheng, hubbe
Target Ref: refs/heads/master
Project: libyuv
Visibility: Public.

Description

HalfFloat_SSE2 for Visual C

Low-level support for 12-bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=LibYUVPlanarTest.TestHalfFloatPlane on Windows
R=hubbe@chromium.org, wangcheng@google.com

Committed: https://chromium.googlesource.com/libyuv/libyuv/+/aa197ee1a307fefb7853784fc04c82e0c7bd823b
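A minimal scalar sketch of the half-float trick the SSE2 listings below appear to use: multiply by a scale, then shift the raw float bits right by 13. It assumes the constant that mulss folds into the scale is 2^-112 (1.9259299444e-34f). That value is the gap between the float (127) and half (15) exponent biases; the relocation in the disassembly does not show the actual value. The function name half_float_row_ref is hypothetical, not libyuv's API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar reference (not libyuv's API).
 * Assumption: the constant folded into scale is 2^-112 (== 1.9259299444e-34f),
 * the difference between the float and half-float exponent biases. */
void half_float_row_ref(const uint16_t *src, uint16_t *dst, float scale,
                        int width) {
  /* After this multiply, the float exponent field equals the half-float
   * exponent field, so dropping the low 13 mantissa bits (23 - 10) leaves
   * the half-float encoding in the low 16 bits. */
  const float mult = scale * 0x1p-112f;
  assert(width >= 0);
  for (int i = 0; i < width; ++i) {
    float value = (float)src[i] * mult;
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits); /* type-pun without aliasing UB */
    dst[i] = (uint16_t)(bits >> 13);    /* truncates (rounds toward zero) */
  }
}
```

For example, with scale 1.0f an input of 1 encodes as 0x3C00 (half 1.0), and 3 as 0x4200 (half 3.0). With scale 1.0f / 4096, a 12-bit 2048 becomes 0x3800 (half 0.5).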

Patch Set 1

Patch Set 2: switch to xmm2/3 to allow gcc to use xmm0 for param

Patch Set 3: cast to dword ptr for movd on clang-cl

Patch Set 4: planar functions SSE2 only

Files changed (+78 lines, -58 lines):
M include/libyuv/row.h: 2 chunks, +1 line, -7 lines
M source/planar_functions.cc: 1 chunk, +0 lines, -9 lines
M source/row_gcc.cc: 3 chunks, +31 lines, -32 lines
M source/row_win.cc: 2 chunks, +46 lines, -10 lines

Messages

Total messages: 7 (2 generated)
fbarchard1
Windows 32, Visual C 2015, on Sandy Bridge Z620: was C 2081 ms, now SSE2 ...
4 years, 2 months ago (2016-09-30 18:05:30 UTC) #2
hubbe
LGTM. Still can't really review the SIMD code properly.
4 years, 2 months ago (2016-09-30 18:07:44 UTC) #3
wangcheng
lgtm
4 years, 2 months ago (2016-09-30 18:10:36 UTC) #4
fbarchard1
Committed patchset #4 (id:60001) manually as aa197ee1a307fefb7853784fc04c82e0c7bd823b (presubmit successful).
4 years, 2 months ago (2016-10-03 17:33:42 UTC) #6
fbarchard1
4 years, 2 months ago (2016-10-13 00:16:05 UTC) #7
Message was sent while issue was closed.
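For gcc, switching to xmm2/xmm3 reduced overhead: the scale parameter arriving in xmm0 is now broadcast directly with pshufd instead of round-tripping through the stack (the movss/movd pair), shrinking the function from 0x59 to 0x4d bytes.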

Was
0000000000000000 <HalfFloatRow_SSE2>:
   0:	f3 0f 59 05 00 00 00 	mulss  0x0(%rip),%xmm0        # 8 <HalfFloatRow_SSE2+0x8>
   7:	00 
   8:	f3 0f 11 44 24 fc    	movss  %xmm0,-0x4(%rsp)
   e:	66 0f 6e 64 24 fc    	movd   -0x4(%rsp),%xmm4
  14:	66 0f 70 e4 00       	pshufd $0x0,%xmm4,%xmm4
  19:	66 0f ef ed          	pxor   %xmm5,%xmm5
  1d:	f3 0f 6f 07          	movdqu (%rdi),%xmm0
  21:	48 8d 7f 10          	lea    0x10(%rdi),%rdi
  25:	66 0f 6f c8          	movdqa %xmm0,%xmm1
  29:	66 0f 61 c5          	punpcklwd %xmm5,%xmm0
  2d:	0f 5b c0             	cvtdq2ps %xmm0,%xmm0
  30:	66 0f 69 cd          	punpckhwd %xmm5,%xmm1
  34:	0f 5b c9             	cvtdq2ps %xmm1,%xmm1
  37:	0f 59 c4             	mulps  %xmm4,%xmm0
  3a:	0f 59 cc             	mulps  %xmm4,%xmm1
  3d:	66 0f 72 d0 0d       	psrld  $0xd,%xmm0
  42:	66 0f 72 d1 0d       	psrld  $0xd,%xmm1
  47:	66 0f 6b c1          	packssdw %xmm1,%xmm0
  4b:	f3 0f 7f 06          	movdqu %xmm0,(%rsi)
  4f:	48 8d 76 10          	lea    0x10(%rsi),%rsi
  53:	83 ea 08             	sub    $0x8,%edx
  56:	7f c5                	jg     1d <HalfFloatRow_SSE2+0x1d>
  58:	c3                   	retq   

Now
0000000000000000 <HalfFloatRow_SSE2>:
   0:	f3 0f 59 05 00 00 00 	mulss  0x0(%rip),%xmm0        # 8 <HalfFloatRow_SSE2+0x8>
   7:	00 
   8:	66 0f 70 e0 00       	pshufd $0x0,%xmm0,%xmm4
   d:	66 0f ef ed          	pxor   %xmm5,%xmm5
  11:	f3 0f 6f 17          	movdqu (%rdi),%xmm2
  15:	48 8d 7f 10          	lea    0x10(%rdi),%rdi
  19:	66 0f 6f da          	movdqa %xmm2,%xmm3
  1d:	66 0f 61 d5          	punpcklwd %xmm5,%xmm2
  21:	0f 5b d2             	cvtdq2ps %xmm2,%xmm2
  24:	66 0f 69 dd          	punpckhwd %xmm5,%xmm3
  28:	0f 5b db             	cvtdq2ps %xmm3,%xmm3
  2b:	0f 59 d4             	mulps  %xmm4,%xmm2
  2e:	0f 59 dc             	mulps  %xmm4,%xmm3
  31:	66 0f 72 d2 0d       	psrld  $0xd,%xmm2
  36:	66 0f 72 d3 0d       	psrld  $0xd,%xmm3
  3b:	66 0f 6b d3          	packssdw %xmm3,%xmm2
  3f:	f3 0f 7f 16          	movdqu %xmm2,(%rsi)
  43:	48 8d 76 10          	lea    0x10(%rsi),%rsi
  47:	83 ea 08             	sub    $0x8,%edx
  4a:	7f c5                	jg     11 <HalfFloatRow_SSE2+0x11>
  4c:	c3                   	retq
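For readers less at home with AT&T syntax, here is a sketch of the same loop written with SSE2 intrinsics, one step per instruction group in the "Now" listing. It is not the patch's code, which is hand-written assembly. It assumes the mulss constant is 2^-112, and the function name half_float_row_sse2_sketch is hypothetical. The listed loop (sub $0x8 / jg) appears to assume a positive width that is a multiple of 8, so the sketch adds a scalar tail for other widths.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics */
#include <stdint.h>
#include <string.h>

/* Hypothetical intrinsics sketch mirroring the disassembly (not libyuv code).
 * Assumption: the constant folded into scale is 2^-112. Requires SSE2. */
void half_float_row_sse2_sketch(const uint16_t *src, uint16_t *dst,
                                float scale, int width) {
  const float mult = scale * 0x1p-112f;       /* mulss */
  const __m128 vmult = _mm_set1_ps(mult);     /* pshufd $0x0: broadcast */
  const __m128i zero = _mm_setzero_si128();   /* pxor %xmm5,%xmm5 */
  int i = 0;
  assert(width >= 0);
  for (; i + 8 <= width; i += 8) {
    __m128i px = _mm_loadu_si128((const __m128i *)(src + i)); /* movdqu */
    /* punpcklwd/punpckhwd with zero, then cvtdq2ps: 8 x u16 -> 2 x 4 floats. */
    __m128 lo = _mm_cvtepi32_ps(_mm_unpacklo_epi16(px, zero));
    __m128 hi = _mm_cvtepi32_ps(_mm_unpackhi_epi16(px, zero));
    /* mulps, then psrld $0xd on the raw float bits. */
    __m128i lo_bits =
        _mm_srli_epi32(_mm_castps_si128(_mm_mul_ps(lo, vmult)), 13);
    __m128i hi_bits =
        _mm_srli_epi32(_mm_castps_si128(_mm_mul_ps(hi, vmult)), 13);
    /* packssdw narrows to 16 bits with signed saturation; movdqu stores. */
    _mm_storeu_si128((__m128i *)(dst + i), _mm_packs_epi32(lo_bits, hi_bits));
  }
  for (; i < width; ++i) { /* scalar tail, same bit trick */
    float value = (float)src[i] * mult;
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    dst[i] = (uint16_t)(bits >> 13);
  }
}
```

Keeping the broadcast scale and the zero register in xmm4/xmm5 and working in other registers is what lets the compiler hand the float parameter over in xmm0 without a stack spill, as the "Now" listing shows.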
