Issue 1295873004: SkColorCubeFilter: require alpha == 0xFF.

Issue 1295873004: SkColorCubeFilter: require alpha == 0xFF. (Closed)

Created:
5 years, 4 months ago by mtklein_C

Modified:
5 years, 3 months ago

Reviewers:
Noel Gordon, mtklein, reed1

CC:
reviews_skia.org, radu.velea

Base URL:
https://skia.googlesource.com/skia.git@master

Target Ref:
refs/heads/master

Project:
skia

Visibility:
Public.

More Reviews

Description

SkColorCubeFilter: require alpha == 0xFF. This is about a 12% improvement on my desktop, from 134 to 118ms on our bench. BUG=skia: Committed: https://skia.googlesource.com/skia/+/d1c6b7c5007b5c609b44a9cdfe95ef64a5a8f29f

Patch Set 1 #

Patch Set 2 : fixes #

Patch Set 3 : just perf #

Total comments: 5

Created: 5 years, 4 months ago

Download [raw] [tar.bz2]

	Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+14 lines, -14 lines)			Patch
M	include/effects/SkColorCubeFilter.h	View		1 chunk	+1 line, -1 line	1 comment	Download
M	src/core/SkPMFloat.h	View		1 chunk	+1 line, -1 line	0 comments	Download
M	src/opts/SkColorCubeFilter_opts.h	View		1 chunk	+4 lines, -4 lines	1 comment	Download
M	src/opts/SkPMFloat_neon.h	View		1 chunk	+2 lines, -2 lines	0 comments	Download
M	src/opts/SkPMFloat_none.h	View	1 2	1 chunk	+2 lines, -1 line	1 comment	Download
M	src/opts/SkPMFloat_sse.h	View	1	2 chunks	+4 lines, -5 lines	2 comments	Download

Messages

Total messages: 30 (7 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1295873004/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1295873004/20001

5 years, 4 months ago (2015-08-19 17:54:57 UTC) #2

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

5 years, 4 months ago (2015-08-19 18:02:41 UTC) #4

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1295873004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1295873004/40001

5 years, 4 months ago (2015-08-19 18:07:18 UTC) #6

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

5 years, 4 months ago (2015-08-19 18:13:44 UTC) #8

Noel Gordon

LGTM++ Awesome patch mtklein. I expect this to match qcms tetrahedral SSE perf-wise on win, ...

5 years, 4 months ago (2015-08-20 01:43:29 UTC) #12

mtklein

https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkPMFloat_sse.h File src/opts/SkPMFloat_sse.h (left): https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkPMFloat_sse.h#oldcode59 src/opts/SkPMFloat_sse.h:59: fix8_32 = _mm_or_si128(fix8_32, _mm_set_epi32(0xFF,0,0,0)); // Force alpha to 1. ...

5 years, 4 months ago (2015-08-20 01:55:17 UTC) #14

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1295873004/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1295873004/40001

5 years, 4 months ago (2015-08-20 01:56:18 UTC) #16

commit-bot: I haz the power

Committed patchset #3 (id:40001) as https://skia.googlesource.com/skia/+/d1c6b7c5007b5c609b44a9cdfe95ef64a5a8f29f

5 years, 4 months ago (2015-08-20 01:56:54 UTC) #17

mtklein

On 2015/08/20 01:56:54, commit-bot: I haz the power wrote: > Committed patchset #3 (id:40001) as ...

5 years, 4 months ago (2015-08-20 02:07:26 UTC) #18

Noel Gordon

On 2015/08/20 01:55:17, mtklein wrote: > https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkPMFloat_sse.h > File src/opts/SkPMFloat_sse.h (left): > > https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkPMFloat_sse.h#oldcode59 > ...

5 years, 4 months ago (2015-08-20 04:01:30 UTC) #19

Message was sent while issue was closed.

On 2015/08/20 01:55:17, mtklein wrote:
> https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkPMFloat_sse.h
> File src/opts/SkPMFloat_sse.h (left):
> 
>
https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkPMFloat_sse....
> src/opts/SkPMFloat_sse.h:59: fix8_32 = _mm_or_si128(fix8_32,
> _mm_set_epi32(0xFF,0,0,0));  // Force alpha to 1.
> On 2015/08/20 01:43:29, noel gordon wrote:
> > Very cool; less instructions generally means more perf.
> > 
> > Aside: there is one other way I've thought of implementing this entire
> function:
> > 
> > const __m128 __color = _mm_set_ps((float)SkColorGetR(c),
> (float)SkColorGetR(g),
> > (float)SkColorGetB(c), 1.0f);  // FIXME
> > SkPMFloat pmf = Sk4f(_mm_mul_ps(__color, _mm_set1_ps(1.0f/255)));
> > SkASSERT(pmf.isValid());	
> > return pmf;
> > 
> > Would need a fix to get the __color R G B byte order correct for platform
> > PMColor order.  But with many fewer SSE instructions, it might be faster, or
> it
> > might not -- it would depend on the latency of _mm_set_ps() and maybe the
C++
> > compiler.
> 
> Yeah, _mm_set_ps doesn't really correspond to any particular instruction,

Ah yes, it's a compiler intrinsic so ..

> so I've always preferred writing things out myself rather than having to hope
GCC,
> Clang, and MSVC all generate something good.

this indeed makes sense.

> But it's worth trying out.  If we
> find we can trust _mm_set_ps, it'd make lots of code quite a bit clearer.

One thing we found when doing the qcms tetra SSE, was that MSVC _mm_set_ps() was
awesome compared to writing it out long hand.  However, the long hand was better
on Linux and Mac.  Since most chrome users are in win, we preferred _mm_set_ps()
to avoid #ifdef-ing the code up the wahzoo.

> I'm not particularly optimistic about this one though, as it'd involve three
> cvtsi2ss calls rather than a single cvtdq2ps, and a bunch of serial shifts to
> move components around.

When measuring win, I recall I quickly tried the _mm_set_ps() idea.  Shorter
code, and I expected to see a perf improvement compared to long hand way of
SkPMFloat.  I _did not_ see any improvement in MSVC 2013 at all (from memory,
was doing other measurements for you guys at that time so I dropped
_mm_set_ps())

The result suggested to me that the long hand way in Skia is very fast / good.

@Radu is this something we could re-investigate for QCMS? eg., compare
_mm_set_ps(), vs a backport of the Skia readers from SkPMFloat into QCMS to
confirm whether _mm_set_ps() is the right option?

Noel Gordon

On 2015/08/20 02:07:26, mtklein wrote: > On 2015/08/20 01:56:54, commit-bot: I haz the power wrote: ...

5 years, 4 months ago (2015-08-20 04:08:58 UTC) #20

Noel Gordon

> This is about a 12% improvement on my desktop, from 134 to 118ms on ...

5 years, 4 months ago (2015-08-20 04:56:01 UTC) #21

Noel Gordon

https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkColorCubeFilter_opts.h File src/opts/SkColorCubeFilter_opts.h (right): https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkColorCubeFilter_opts.h#newcode56 src/opts/SkColorCubeFilter_opts.h:56: for (int x = 0; x < 2; ++x) ...

5 years, 4 months ago (2015-08-20 07:10:07 UTC) #22

Noel Gordon

On 2015/08/20 07:10:07, noel gordon wrote: > https://codereview.chromium.org/1295873004/diff/40001/src/opts/SkColorCubeFilter_opts.h#newcode56 > src/opts/SkColorCubeFilter_opts.h:56: for (int x = 0; ...

5 years, 3 months ago (2015-08-27 06:20:24 UTC) #23

mtklein

On 2015/08/27 06:20:24, noel gordon wrote: > On 2015/08/20 07:10:07, noel gordon wrote: > > ...

5 years, 3 months ago (2015-08-27 11:45:42 UTC) #24

mtklein

On 2015/08/27 12:41:37, noel gordon wrote: > > Yes? > > #22 do/did you confirm? ...

5 years, 3 months ago (2015-08-27 13:07:06 UTC) #26

Noel Gordon

On 2015/08/27 13:07:06, mtklein wrote: > On 2015/08/27 12:41:37, noel gordon wrote: > > > ...

5 years, 3 months ago (2015-08-27 13:55:55 UTC) #27

mtklein

> I measured on a mac air, it has an integrated GPU (shares the memory ...

5 years, 3 months ago (2015-08-27 14:00:51 UTC) #28

mtklein

> 3) Parameterize SkPMFloat by a divisor bias. E.g. SkPMFloat<1> would be > today's SkPMFloat, ...

5 years, 3 months ago (2015-08-27 15:36:00 UTC) #29

Noel Gordon

5 years, 3 months ago (2015-09-01 11:13:31 UTC) #30

Message was sent while issue was closed.

On 2015/08/27 14:00:51, mtklein wrote:
> > I measured on a mac air, it has an integrated GPU (shares the memory bus
with
> > the CPU). Perhaps your mac has a discrete GPU (has it's own memory bus),
like
> a
> > mac pro or similar?
> 
> Does that matter?  We're entirely in software here.  No GPU involved in these
> tests.
> 
> But yes, I've got a MBP locked into Intel HD Graphics 4000 mode.

No it shouldn't matter, but I do see the improvement on a mac air.  Anyway, with
the change in
bias CL that was discussed in another CL, we get a good speed improvement.  My
result is: if you combine that CL with a loop unroll, there is a speed
_regression_.

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages