Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(200)

Issue 2553403002: Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions (Closed)

Created:
4 years ago by manojkumar.bhosale
Modified:
4 years ago
Reviewers:
fbarchard1
CC:
gordana.cmiljanovic_imgtec.com, raghu.gandham_imgtec.com, parag.salasakar_imgtec.com, mandar.sahastrabuddhe_imgtec.com, rob.isherwood_imgtec.com
Target Ref:
refs/heads/master
Project:
libyuv
Visibility:
Public.

Description

Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C vectorized) TransposeWx8_MSA - ~2.7x TransposeWx8_Any_MSA - ~2.1x TransposeUVWx8_MSA - ~2.5x TransposeUVWx8_Any_MSA - ~2.7x Performance Gain (vs C non-vectorized) TransposeWx8_MSA - ~4.6x TransposeWx8_Any_MSA - ~2.9x TransposeUVWx8_MSA - ~4.4x TransposeUVWx8_Any_MSA - ~3.7x Committed: https://chromium.googlesource.com/libyuv/libyuv/+/6fa5e4eb780b67fe3275a529c6c0da9ea7b58cff

Patch Set 1 #

Total comments: 9

Patch Set 2 : Changes as per review comments #

Unified diffs Side-by-side diffs Delta from patch set Stats (+342 lines, -33 lines) Patch
M Android.mk View 1 chunk +2 lines, -1 line 0 comments Download
M BUILD.gn View 1 chunk +1 line, -0 lines 0 comments Download
M CMakeLists.txt View 1 chunk +1 line, -0 lines 0 comments Download
M include/libyuv/macros_msa.h View 1 3 chunks +83 lines, -32 lines 0 comments Download
M include/libyuv/rotate_row.h View 5 chunks +29 lines, -0 lines 0 comments Download
M libyuv.gypi View 1 chunk +1 line, -0 lines 0 comments Download
M source/rotate.cc View 1 2 chunks +16 lines, -0 lines 0 comments Download
M source/rotate_any.cc View 2 chunks +6 lines, -0 lines 0 comments Download
A source/rotate_msa.cc View 1 chunk +203 lines, -0 lines 0 comments Download

Messages

Total messages: 16 (4 generated)
manojkumar.bhosale
4 years ago (2016-12-07 08:44:46 UTC) #1
manojkumar.bhosale
Updated with reviewers, cc list and performance gain numbers
4 years ago (2016-12-07 08:52:56 UTC) #4
manojkumar.bhosale
On 2016/12/07 08:52:56, manojkumar.bhosale wrote: > Updated with reviewers, cc list and performance gain numbers ...
4 years ago (2016-12-12 10:52:57 UTC) #6
fbarchard1
https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h File include/libyuv/macros_msa.h (right): https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h#newcode58 include/libyuv/macros_msa.h:58: \ remove blank lines from macros. https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h#newcode130 include/libyuv/macros_msa.h:130: asm ...
4 years ago (2016-12-12 19:59:19 UTC) #7
manojkumar.bhosale
https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h File include/libyuv/macros_msa.h (right): https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h#newcode58 include/libyuv/macros_msa.h:58: \ On 2016/12/12 19:59:18, fbarchard1 wrote: > remove blank ...
4 years ago (2016-12-13 08:52:23 UTC) #8
fbarchard1
lgtm https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc File source/rotate_msa.cc (right): https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc#newcode67 source/rotate_msa.cc:67: val0 = __msa_copy_s_d((v2i64)dst0, 0); On 2016/12/13 08:52:23, manojkumar.bhosale ...
4 years ago (2016-12-13 18:32:03 UTC) #9
manojkumar.bhosale
On 2016/12/13 18:32:03, fbarchard1 wrote: > lgtm > > https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc > File source/rotate_msa.cc (right): > ...
4 years ago (2016-12-14 06:23:20 UTC) #10
fbarchard1
On 2016/12/14 06:23:20, manojkumar.bhosale wrote: > On 2016/12/13 18:32:03, fbarchard1 wrote: > > lgtm > ...
4 years ago (2016-12-14 18:40:38 UTC) #11
fbarchard1
lgtm
4 years ago (2016-12-14 18:40:45 UTC) #12
manojkumar.bhosale
Committed patchset #2 (id:20001) manually as 6fa5e4eb780b67fe3275a529c6c0da9ea7b58cff (presubmit successful).
4 years ago (2016-12-15 04:34:57 UTC) #14
manojkumar.bhosale
On 2016/12/14 18:40:38, fbarchard1 wrote: > On 2016/12/14 06:23:20, manojkumar.bhosale wrote: > > On 2016/12/13 ...
4 years ago (2016-12-20 09:36:11 UTC) #15
fbarchard1
4 years ago (2016-12-20 19:12:01 UTC) #16
Message was sent while issue was closed.
On 2016/12/20 09:36:11, manojkumar.bhosale wrote:
> On 2016/12/14 18:40:38, fbarchard1 wrote:
> > On 2016/12/14 06:23:20, manojkumar.bhosale wrote:
> > > On 2016/12/13 18:32:03, fbarchard1 wrote:
> > > > lgtm
> > > > 
> > > > https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc
> > > > File source/rotate_msa.cc (right):
> > > > 
> > > >
> > >
> >
>
https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc#newcode67
> > > > source/rotate_msa.cc:67: val0 = __msa_copy_s_d((v2i64)dst0, 0);
> > > > On 2016/12/13 08:52:23, manojkumar.bhosale wrote:
> > > > > On 2016/12/12 19:59:19, fbarchard1 wrote:
> > > > > > consider if this can be weaved back into simd registers and use
single
> > > store
> > > > > per
> > > > > > vector?
> > > > > 
> > > > > As each double word store is offset with dst_stride, not possible to
> form
> > a
> > > > > vector for single store.
> > > > 
> > > > Acknowledged.
> > > > msa can't store 8 bytes directly from a SIMD register?
> > > 
> > > No, it can't.
> > 
> > ok, so a full 16x16 transpose would work better.  Suggest you add a comment
or
> > TODO that the function could be improved doing 16 byte writes.
> 
> Yes, 16x16 transpose would work better for MSA, however then in rotate.cc,
> in function TransposePlane, we will need to change the calling from
TransposeWx8
> to TransposeWx16 (processing 16x16 tile)
> Will this kind of MSA specific change be allowed in rotate.cc (generic C
code)?

yes.
We could either do it for all platforms, or have a block size constant for each
platform.

Powered by Google App Engine
This is Rietveld 408576698