Issue 2553403002: Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions

Issue 2553403002: Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions (Closed)

Created:
4 years ago by manojkumar.bhosale

Modified:
4 years ago

Reviewers:
fbarchard1

CC:
gordana.cmiljanovic_imgtec.com, raghu.gandham_imgtec.com, parag.salasakar_imgtec.com, mandar.sahastrabuddhe_imgtec.com, rob.isherwood_imgtec.com

Target Ref:
refs/heads/master

Project:
libyuv

Visibility:
Public.

More Reviews

Description

Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C vectorized) TransposeWx8_MSA - ~2.7x TransposeWx8_Any_MSA - ~2.1x TransposeUVWx8_MSA - ~2.5x TransposeUVWx8_Any_MSA - ~2.7x Performance Gain (vs C non-vectorized) TransposeWx8_MSA - ~4.6x TransposeWx8_Any_MSA - ~2.9x TransposeUVWx8_MSA - ~4.4x TransposeUVWx8_Any_MSA - ~3.7x Committed: https://chromium.googlesource.com/libyuv/libyuv/+/6fa5e4eb780b67fe3275a529c6c0da9ea7b58cff

Patch Set 1 #

Total comments: 9

Patch Set 2 : Changes as per review comments #

Created: 4 years ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+342 lines, -33 lines)			Patch
M	Android.mk	View		1 chunk	+2 lines, -1 line	0 comments	Download
M	BUILD.gn	View		1 chunk	+1 line, -0 lines	0 comments	Download
M	CMakeLists.txt	View		1 chunk	+1 line, -0 lines	0 comments	Download
M	include/libyuv/macros_msa.h	View	1	3 chunks	+83 lines, -32 lines	0 comments	Download
M	include/libyuv/rotate_row.h	View		5 chunks	+29 lines, -0 lines	0 comments	Download
M	libyuv.gypi	View		1 chunk	+1 line, -0 lines	0 comments	Download
M	source/rotate.cc	View	1	2 chunks	+16 lines, -0 lines	0 comments	Download
M	source/rotate_any.cc	View		2 chunks	+6 lines, -0 lines	0 comments	Download
A	source/rotate_msa.cc	View		1 chunk	+203 lines, -0 lines	0 comments	Download

Messages

Total messages: 16 (4 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

manojkumar.bhosale

Updated with reviewers, cc list and performance gain numbers

4 years ago (2016-12-07 08:52:56 UTC) #4

manojkumar.bhosale

On 2016/12/07 08:52:56, manojkumar.bhosale wrote: > Updated with reviewers, cc list and performance gain numbers ...

4 years ago (2016-12-12 10:52:57 UTC) #6

fbarchard1

https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h File include/libyuv/macros_msa.h (right): https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h#newcode58 include/libyuv/macros_msa.h:58: \ remove blank lines from macros. https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h#newcode130 include/libyuv/macros_msa.h:130: asm ...

4 years ago (2016-12-12 19:59:19 UTC) #7

manojkumar.bhosale

https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h File include/libyuv/macros_msa.h (right): https://codereview.chromium.org/2553403002/diff/1/include/libyuv/macros_msa.h#newcode58 include/libyuv/macros_msa.h:58: \ On 2016/12/12 19:59:18, fbarchard1 wrote: > remove blank ...

4 years ago (2016-12-13 08:52:23 UTC) #8

fbarchard1

lgtm https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc File source/rotate_msa.cc (right): https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc#newcode67 source/rotate_msa.cc:67: val0 = __msa_copy_s_d((v2i64)dst0, 0); On 2016/12/13 08:52:23, manojkumar.bhosale ...

4 years ago (2016-12-13 18:32:03 UTC) #9

manojkumar.bhosale

On 2016/12/13 18:32:03, fbarchard1 wrote: > lgtm > > https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc > File source/rotate_msa.cc (right): > ...

4 years ago (2016-12-14 06:23:20 UTC) #10

fbarchard1

On 2016/12/14 06:23:20, manojkumar.bhosale wrote: > On 2016/12/13 18:32:03, fbarchard1 wrote: > > lgtm > ...

4 years ago (2016-12-14 18:40:38 UTC) #11

manojkumar.bhosale

Committed patchset #2 (id:20001) manually as 6fa5e4eb780b67fe3275a529c6c0da9ea7b58cff (presubmit successful).

4 years ago (2016-12-15 04:34:57 UTC) #14

manojkumar.bhosale

On 2016/12/14 18:40:38, fbarchard1 wrote: > On 2016/12/14 06:23:20, manojkumar.bhosale wrote: > > On 2016/12/13 ...

4 years ago (2016-12-20 09:36:11 UTC) #15

fbarchard1

4 years ago (2016-12-20 19:12:01 UTC) #16

Message was sent while issue was closed.

On 2016/12/20 09:36:11, manojkumar.bhosale wrote:
> On 2016/12/14 18:40:38, fbarchard1 wrote:
> > On 2016/12/14 06:23:20, manojkumar.bhosale wrote:
> > > On 2016/12/13 18:32:03, fbarchard1 wrote:
> > > > lgtm
> > > > 
> > > > https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc
> > > > File source/rotate_msa.cc (right):
> > > > 
> > > >
> > >
> >
>
https://codereview.chromium.org/2553403002/diff/1/source/rotate_msa.cc#newcode67
> > > > source/rotate_msa.cc:67: val0 = __msa_copy_s_d((v2i64)dst0, 0);
> > > > On 2016/12/13 08:52:23, manojkumar.bhosale wrote:
> > > > > On 2016/12/12 19:59:19, fbarchard1 wrote:
> > > > > > consider if this can be weaved back into simd registers and use
single
> > > store
> > > > > per
> > > > > > vector?
> > > > > 
> > > > > As each double word store is offset with dst_stride, not possible to
> form
> > a
> > > > > vector for single store.
> > > > 
> > > > Acknowledged.
> > > > msa can't store 8 bytes directly from a SIMD register?
> > > 
> > > No, it can't.
> > 
> > ok, so a full 16x16 transpose would work better.  Suggest you add a comment
or
> > TODO that the function could be improved doing 16 byte writes.
> 
> Yes, 16x16 transpose would work better for MSA, however then in rotate.cc,
> in function TransposePlane, we will need to change the calling from
TransposeWx8
> to TransposeWx16 (processing 16x16 tile)
> Will this kind of MSA specific change be allowed in rotate.cc (generic C
code)?

yes.
We could either do it for all platforms, or have a block size constant for each
platform.

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages