Issue 2060823003: Implement fast, correct gamma conversion for color xforms

msarett

Description was changed from ========== Implement fast, correct gamma conversion for color xforms BUG=skia: ========== ...

4 years, 6 months ago (2016-06-13 19:49:12 UTC) #1

msarett

Description was changed from ========== Implement fast, correct gamma conversion for color xforms BUG=skia: GOLD_TRYBOT_URL= ...

4 years, 6 months ago (2016-06-14 22:50:15 UTC) #3

msarett

msarett@google.com changed reviewers: + brianosman@google.com, herb@google.com, mtklein@google.com, reed@google.com

4 years, 6 months ago (2016-06-14 22:52:54 UTC) #4

msarett

https://codereview.chromium.org/2060823003/diff/20001/src/core/SkColorSpaceXform.cpp File src/core/SkColorSpaceXform.cpp (right): https://codereview.chromium.org/2060823003/diff/20001/src/core/SkColorSpaceXform.cpp#newcode40 src/core/SkColorSpaceXform.cpp:40: if (SkColorSpace::k2Dot2Curve_GammaNamed == dstSpace->gammaNamed() && I plan to add ...

4 years, 6 months ago (2016-06-14 22:52:56 UTC) #5

Brian Osman

lgtm, but I'll defer to the CPU experts on the SSE implementation details. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h File ...

4 years, 6 months ago (2016-06-15 14:02:01 UTC) #6

msarett

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h#newcode292 src/opts/SkColorXform_opts.h:292: static uint8_t clamp_float_to_byte(float v) { On 2016/06/15 14:02:01, Brian ...

4 years, 6 months ago (2016-06-15 14:05:27 UTC) #7

mtklein

We don't absolutely require SSE2 or NEON, but in turn we do not care how ...

4 years, 6 months ago (2016-06-15 14:31:36 UTC) #8

mtklein

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h#newcode160 src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32); If I remember my high ...

4 years, 6 months ago (2016-06-15 15:10:46 UTC) #9

msarett

Description was changed from ========== Implement fast, correct gamma conversion for color xforms 201295.jpg on ...

4 years, 6 months ago (2016-06-15 17:52:58 UTC) #10

msarett

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h#newcode160 src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32); On 2016/06/15 15:10:46, mtklein wrote: ...

4 years, 6 months ago (2016-06-15 17:55:03 UTC) #11

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
File src/opts/SkColorXform_opts.h (right):

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32);
On 2016/06/15 15:10:46, mtklein wrote:
> If I remember my high school math, the limit of  x^(y) as y --> 0 is 1.   Just
> curious how close to the limit we are here, and whether we're close enough to
> ignore this term?
> 
> I guess that's the same as asking whether x ^ (29/64)  is different from x ^
> (30/64) within the precision of our rsqrts and rcps?

This is a good question, I'll need to follow up.

I've been assuming that rsqrts and rcps are perfect.  "The maximum relative
error for this approximation is less than 1.5*2^-12" seems pretty good, but I'm
sure the error could propagate...  I think I'll need to actually measure the
accuracy of these instructions to have a confident answer.

In the simpler world of "rsqrts and rcps are perfect" it does matter.  The diff
to the true value goes from 0.9 (off by 1 max) to 2.9 (off by 3 max).

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:162: // x^(+29/64) = x * x^(-1/2) * x^(-1/32) *
x^(-1/64)
On 2016/06/15 15:10:46, mtklein wrote:
> Why do we need to involve x?  Is it not simpler to do
> 
>     x^(29/64) == x^(32/64) * x^(-2/64) * x^(-1/64)
> 
> something like rcp(x2) * x32 * rcp(x64)?

Long story short: Looks good, let's do that!

My first implementation was actually rcp(x2 * x64) * x32.
That turned out to be a fair amount slower than the current version.  I'm
guessing that's because we spend time waiting for x64 to be ready.

I had tried your suggestion before but didn't pick it, because I didn't see a
performance difference and because I preferred to trust an extra mul over an
extra reciprocal in terms of accuracy.

Of course, now I'm running it again and seeing a slight boost from using
reciprocal.  So I've made the change :).  I think it's clearer and faster.
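For readers following along, here is a minimal sketch of the rcp(x2) * x32 * rcp(x64) form discussed above. It is not the code from the patch: the intermediate names x4, x8 and x16, the function name, and the scalar wrapper are assumptions for illustration. Only x2, x32 and x64 appear in the review.

```cpp
#include <xmmintrin.h>  // SSE: _mm_rsqrt_ps, _mm_rcp_ps, _mm_mul_ps

// Approximates x^(1/2.2) by x^(29/64), four floats at a time.
// Each _mm_rsqrt_ps halves the exponent and flips its sign.
static inline __m128 approx_pow_29_64(__m128 x) {
    __m128 x2  = _mm_rsqrt_ps(x);    // x^(-1/2)
    __m128 x4  = _mm_rsqrt_ps(x2);   // x^(+1/4)   (name is an assumption)
    __m128 x8  = _mm_rsqrt_ps(x4);   // x^(-1/8)   (name is an assumption)
    __m128 x16 = _mm_rsqrt_ps(x8);   // x^(+1/16)  (name is an assumption)
    __m128 x32 = _mm_rsqrt_ps(x16);  // x^(-1/32)
    __m128 x64 = _mm_rsqrt_ps(x32);  // x^(+1/64)

    // x^(29/64) = x^(32/64) * x^(-2/64) * x^(-1/64)
    //           = rcp(x2)   * x32       * rcp(x64)
    return _mm_mul_ps(_mm_mul_ps(_mm_rcp_ps(x2), x32), _mm_rcp_ps(x64));
}

// Hypothetical helper: run one value through the vector routine.
static inline float approx_pow_29_64_scalar(float x) {
    return _mm_cvtss_f32(approx_pow_29_64(_mm_set1_ps(x)));
}
```

Because rsqrt and rcp are approximate instructions, the result should be compared against a reference with a tolerance rather than for equality.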

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:188: __m128 reds   =
_mm_setr_ps(gamma_to_linear[(src[0] >>  0) & 0xFF],
On 2016/06/15 15:10:46, mtklein wrote:
> Let's hook this all into SkOpts_sse41.cpp too?  These setr_ps will (should?) be
> faster with SSE4.1..

SGTM

Woohoo!  Another small performance win.

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:261: gamma_srgb_to_linear[(src[0] >> 24) & 0xFF]);
On 2016/06/15 15:10:46, mtklein wrote:
> Can't this lane just be a constant, e.g. 0 or 1?  It seems unused... seems silly
> to look it up in the table (which itself is semantically confusing, given that
> alpha is already linear).

Of course, removing thanks.

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:268: __m128 dstPixel =               _mm_mul_ps(r,
rXgXbX);
On 2016/06/15 15:10:46, mtklein wrote:
> funky formatting here?

Done.

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:330: dstFloats[0] = pow(dstFloats[0], (1/2.2f)) *
255.0f;
On 2016/06/15 15:10:46, mtklein wrote:
> might as well use powf.  that at least avoids float ->  double and double ->
> float?

Done.
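A rough sketch of what the portable per-channel path might look like with powf. The function name, the rounding step, and the exact clamp are assumptions; the review only shows the powf(..., 1/2.2f) * 255.0f expression and a truncated clamp_float_to_byte signature.

```cpp
#include <math.h>    // powf
#include <stdint.h>

// Hypothetical helper: encode one linear channel value as a 2.2-gamma byte.
static inline uint8_t linear_to_2dot2_byte(float linear) {
    // powf keeps the whole computation in single precision.
    float v = powf(linear, 1.0f / 2.2f) * 255.0f;

    // Written so that NaN (e.g. from a negative input) falls through to 0.
    if (!(v > 0.0f)) {
        return 0;
    }
    if (v > 255.0f) {
        return 255;
    }
    // Round to nearest; whether the patch rounds or truncates is not shown.
    return (uint8_t)(v + 0.5f);
}
```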

mtklein

https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXform.h File src/core/SkColorSpaceXform.h (right): https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXform.h#newcode28 src/core/SkColorSpaceXform.h:28: * Apply the color conversion to a src buffer, ...

4 years, 6 months ago (2016-06-15 20:39:43 UTC) #12

msarett

https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXform.h File src/core/SkColorSpaceXform.h (right): https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXform.h#newcode28 src/core/SkColorSpaceXform.h:28: * Apply the color conversion to a src buffer, ...

4 years, 6 months ago (2016-06-15 21:20:24 UTC) #13

mtklein_C

mtklein@chromium.org changed reviewers: + mtklein@chromium.org

4 years, 6 months ago (2016-06-16 13:27:39 UTC) #14

mtklein_C

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXform.cpp File src/core/SkColorSpaceXform.cpp (right): https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXform.cpp#newcode41 src/core/SkColorSpaceXform.cpp:41: 0.0f == srcToDst.getFloat(3, 0) && So the idea is ...

4 years, 6 months ago (2016-06-16 13:27:41 UTC) #15

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf...
File src/core/SkColorSpaceXform.cpp (right):

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf...
src/core/SkColorSpaceXform.cpp:41: 0.0f == srcToDst.getFloat(3, 0) &&
So the idea is that the Q terms are so uncommon they're not even worth an
optimized routine?

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h
File src/core/SkOpts.h (right):

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h#newco...
src/core/SkOpts.h:72: // Color xform RGB1 input into SkPMColor ordered 8888
opaque pixels.
Doesn't this actually transform RGBA into RGB1 pixels now?  It's only because
you've swapped around the matrix that we get SkPMColor out, right?

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
File src/opts/SkColorXform_opts.h (right):

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:152: static __m128
inverse_gamma_linear_to_2dot2(__m128 x) {
Just some naming questions.  Some of these names feel redundant, and I'm not
sure if they're really redundant or there's some important clarification going
on.  Would things still mean the same if they were rewritten like this?

   gamma_srgb_to_linear          -> linear_from_srgb
   gamma_2dot2_to_linear         -> linear_from_2dot2
   inverse_gamma_linear_to_2dot2 -> linear_to_2dot2

or even
   from_srgb, from_2dot2, to_2dot2

?

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:172: const float matrix[16]) {
This line might want to be re-wrapped?

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:174: if (SkColorSpace::kSRGB_GammaNamed ==
kGammaNamed) {
Since we're not otherwise using kGammaNamed, I think this would read more
clearly if we just templatize these functions (here and portable) on the table
pointers themselves:

static const float gamma_srgb_to_linear[] = { ... };

template <const float* gamma_to_linear>
static void color_xform_RGB1(...) {
   ... use gamma_to_linear directly ...
}

static void color_xform_RGB1_srgb_to_2dot2(...) {
   color_xform_RGB1<gamma_srgb_to_linear>(...);
}

It's one of the underhyped features of C++11 that you can use pointers to static
const arrays as template arguments.  In C++98 they had to be extern.

And if you want to be really careful, you can specify the template like
    template <const float (&gamma_to_linear)[256]>
which will make sure you passed an array with exactly 256 entries.
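A compilable toy version of the idea, for illustration only. The tables here have 4 made-up entries instead of 256, and every name is hypothetical; the point is just that a static const array can be passed by reference as a template argument, and that the array bound is checked at compile time.

```cpp
#include <stdint.h>

// Toy 4-entry tables standing in for the real 256-entry ones.
// The values are invented for the example.
static const float toy_linear_from_a[4] = {0.0f, 0.1f, 0.4f, 1.0f};
static const float toy_linear_from_b[4] = {0.0f, 0.2f, 0.5f, 1.0f};

// Templated on a reference to an array of exactly 4 floats.
// Passing an array of any other size fails to compile.
template <const float (&table)[4]>
static void toy_to_linear(float* dst, const uint8_t* src, int len) {
    for (int i = 0; i < len; ++i) {
        dst[i] = table[src[i] & 3];  // mask keeps the toy index in range
    }
}

// Thin named entry points, one per table.
static void toy_to_linear_a(float* dst, const uint8_t* src, int len) {
    toy_to_linear<toy_linear_from_a>(dst, src, len);
}

static void toy_to_linear_b(float* dst, const uint8_t* src, int len) {
    toy_to_linear<toy_linear_from_b>(dst, src, len);
}
```

Each instantiation has its table baked in at compile time, so there is no runtime branch on the gamma type inside the loop.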

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:238: // clamps to zero.  Note that max(NaN, 0) = 0,
while max(0, NaN) = NaN.
Do we have test cases exercising the NaN input?  Just want to make sure that if
we think we care, we care enough to test it and keep it correct.

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:257: __m128 srcPixel =
_mm_setr_ps(gamma_srgb_to_linear[(src[0] >>  0) & 0xFF],
Wouldn't this part be simpler as,

// Splat red, green, and blue components.
__m128 r = _mm_set1_ps(gamma_srgb_to_linear[...]),
       g = _mm_set1_ps(gamma_srgb_to_linear[...]),
       b = _mm_set1_ps(gamma_srgb_to_linear[...]);

Seems like there's no need for the srcPixel intermediate.

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:275: dstPixel = _mm_min_ps(_mm_max_ps(dstPixel,
_mm_setzero_ps()), _mm_set1_ps(255.0f));
Let's make the clamping a static function?  That can help make sure the
important comment about NaN and argument order is always paired up with the
code.
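Something along these lines, as a sketch. The function names are made up here; the clamp expression itself is the one quoted from line 275.

```cpp
#include <limits>
#include <xmmintrin.h>  // SSE: _mm_min_ps, _mm_max_ps

// Clamp each lane to [0, 255], mapping NaN to 0.
static inline __m128 clamp_0_to_255(__m128 x) {
    // _mm_max_ps returns its second operand if either operand is NaN, so the
    // value being clamped must be the first argument:
    //   max(NaN, 0) = 0, while max(0, NaN) = NaN.
    return _mm_min_ps(_mm_max_ps(x, _mm_setzero_ps()), _mm_set1_ps(255.0f));
}

// Hypothetical helper: clamp a single value through the vector routine.
static inline float clamp_one(float v) {
    return _mm_cvtss_f32(clamp_0_to_255(_mm_set1_ps(v)));
}

// Hypothetical self-check: true if a NaN input comes out as 0.
static inline bool nan_clamps_to_zero() {
    return clamp_one(std::numeric_limits<float>::quiet_NaN()) == 0.0f;
}
```

A check like nan_clamps_to_zero() is the kind of thing a unit test could call to keep the argument order from being swapped by accident.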

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:333: *dst = SkPackARGB32NoCheck(0xFF,
Didn't you already munge the matrix so that we should always act as if the
output were RGBA and then that is actually SkPMColor order?  SkPackARGB32NoCheck
applies SkPMColor order itself, which seems wrong... shouldn't this just be
shifts?

This is made complicated, of course, by all our non-x86 bots having RGBA as
their SkPMColor order.  There shouldn't be any visible bug here, but I think
this code is semantically misleading.
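To make the "just shifts" suggestion concrete, a sketch of an opaque RGBA pack with red in the low byte, matching how the source pixels are unpacked above ((src[0] >> 0) & 0xFF for red, >> 24 for alpha). The function name is hypothetical.

```cpp
#include <stdint.h>

// Pack three channel bytes as opaque RGBA, red in bits 0-7 and alpha in
// bits 24-31, with no dependence on the platform's SkPMColor order.
static inline uint32_t pack_rgb1(uint8_t r, uint8_t g, uint8_t b) {
    return ((uint32_t)r <<  0) |
           ((uint32_t)g <<  8) |
           ((uint32_t)b << 16) |
           (0xFFu       << 24);
}
```

Any swap to SkPMColor order would then live in one place, the matrix, rather than being applied a second time by the pack helper.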

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:346: color_xform_RGB1_portable<kGammaNamed>(dst,
src, len, matrix);
This appears to be a complete passthrough function.  Why don't we just rename
what's now color_xform_RGB1_portable to color_xform_RGB1?

mtklein

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h#newcode72 src/core/SkOpts.h:72: // Color xform RGB1 input into SkPMColor ordered 8888 ...

4 years, 6 months ago (2016-06-16 13:29:28 UTC) #16

msarett

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_opts.h#newcode160 src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32); On 2016/06/15 17:55:03, msarett wrote: ...

4 years, 6 months ago (2016-06-16 15:46:12 UTC) #17

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
File src/opts/SkColorXform_opts.h (right):

https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32);
On 2016/06/15 17:55:03, msarett wrote:
> On 2016/06/15 15:10:46, mtklein wrote:
> > If I remember my high school math, the limit of  x^(y) as y --> 0 is 1.  Just
> > curious how close to the limit we are here, and whether we're close enough to
> > ignore this term?
> > 
> > I guess that's the same as asking whether x ^ (29/64)  is different from x ^
> > (30/64) within the precision of our rsqrts and rcps?
> 
> This is a good question, I'll need to follow up.
> 
> I've been assuming that rsqrts and rcps are perfect.  "The maximum relative
> error for this approximation is less than 1.5*2^-12" seems pretty good, but I'm
> sure the error could propagate...  I think I'll need to actually measure the
> accuracy of these instructions to have a confident answer.
> 
> In the simpler world of "rsqrts and rcps are perfect" it does matter.  The diff
> to the true value goes from 0.9 (off by 1 max) to 2.9 (off by 3 max).

I tested x^(29/64) and x^(30/64) over the 0 to 1 range (at increments of 0.001).
x^(29/64) stays within 1 of x^(1/2.2) and x^(30/64) stays within 3.  The
instructions are accurate enough that we can trust them.
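The exact-arithmetic side of this comparison can be reproduced with a short sweep. This is only a sketch: it uses std::pow rather than the rsqrt/rcp instructions, assumes the "within 1" and "within 3" figures are on the 0 to 255 output scale, and the function name is made up.

```cpp
#include <algorithm>
#include <cmath>

// Largest |255 * x^(num/64) - 255 * x^(1/2.2)| over x = 0, 0.001, ..., 1.
// This models the exponent approximation only, not instruction error.
static double max_error_vs_2dot2(int num) {
    double worst = 0.0;
    for (int i = 0; i <= 1000; ++i) {
        double x = i / 1000.0;
        double approx = 255.0 * std::pow(x, num / 64.0);
        double exact  = 255.0 * std::pow(x, 1.0 / 2.2);
        worst = std::max(worst, std::fabs(approx - exact));
    }
    return worst;
}
```

Calling max_error_vs_2dot2(29) and max_error_vs_2dot2(30) gives the two worst-case differences; any gap between these and a measurement made with the real instructions is what the rsqrt/rcp approximations contribute.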

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf...
File src/core/SkColorSpaceXform.cpp (right):

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf...
src/core/SkColorSpaceXform.cpp:41: 0.0f == srcToDst.getFloat(3, 0) &&
On 2016/06/16 13:27:40, mtklein_C wrote:
> So the idea is that the Q terms are so uncommon they're not even worth an
> optimized routine?

Well not right now, I may consider templating them back in or something...

I have one image with Q terms.  It comes from a website that says, "here are
some wacky profiles".  And it actually doesn't even hit the opt code anyway
because it has a funky gamma.

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h
File src/core/SkOpts.h (right):

https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h#newco...
src/core/SkOpts.h:72: // Color xform RGB1 input into SkPMColor ordered 8888
opaque pixels.
On 2016/06/16 13:27:40, mtklein_C wrote:
> Doesn't this actually transform RGBA into RGB1 pixels now?  It's only because
> you've swapped around the matrix that we get SkPMColor out, right?

Yes fixing this comment.

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
File src/opts/SkColorXform_opts.h (right):

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:152: static __m128
inverse_gamma_linear_to_2dot2(__m128 x) {
On 2016/06/16 13:27:41, mtklein_C wrote:
> Just some naming questions.  Some of these names feel redundant, and I'm not
> sure if they're really redundant or there's some important clarification going
> on.  Would things still mean the same if they were rewritten like this?
> 
>    gamma_srgb_to_linear          -> linear_from_srgb
>    gamma_2dot2_to_linear         -> linear_from_2dot2
>    inverse_gamma_linear_to_2dot2 -> linear_to_2dot2
> 
> or even
>    from_srgb, from_2dot2, to_2dot2
> 
> ?

sgtm, shorter names are better.

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:172: const float matrix[16]) {
On 2016/06/16 13:27:41, mtklein_C wrote:
> This line might want to be re-wrapped?

Done.

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:174: if (SkColorSpace::kSRGB_GammaNamed ==
kGammaNamed) {
On 2016/06/16 13:27:41, mtklein_C wrote:
> Since we're not otherwise using kGammaNamed, I think this would read more
> clearly if we just templatize these functions (here and portable) on the table
> pointers themselves:
> 
> static const float gamma_srgb_to_linear[] = { ... };
> 
> template <const float* gamma_to_linear>
> static void color_xform_RGB1(...) {
>    ... use gamma_to_linear directly ...
> }
> 
> static void color_xform_RGB1_srgb_to_2dot2(...) {
>    color_xform_RGB1<gamma_srgb_to_linear>(...);
> }
> 
> It's one of the underhyped features of C++11 that you can use pointers to static
> const arrays as template arguments.  In C++98 they had to be extern.
> 
> And if you want to be really careful, you can specify the template like
>     template <const float (&gamma_to_linear)[256]>
> which will make sure you passed an array with exactly 256 entries.

Woohoo, this is cool!

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:238: // clamps to zero.  Note that max(NaN, 0) = 0,
while max(0, NaN) = NaN.
On 2016/06/16 13:27:40, mtklein_C wrote:
> Do we have test cases exercising the NaN input?  Just want to make sure that if
> we think we care, we care enough to test it and keep it correct.

Yes we are hitting this case quite frequently actually.  The
inverse_gamma_xform(0) is giving NaN.
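A scalar model of why an input of 0 ends up as NaN, assuming the rsqrt chain discussed earlier in the review. Exact 1/sqrt and 1/x stand in for the approximate instructions, and all names here are hypothetical.

```cpp
#include <cmath>

// Exact stand-ins for the approximate SSE instructions.
static float rsqrt_model(float v) { return 1.0f / std::sqrt(v); }
static float rcp_model(float v)   { return 1.0f / v; }

// Scalar model of x^(29/64) built from repeated reciprocal square roots.
static float pow_29_64_model(float x) {
    float x2  = rsqrt_model(x);    // x = 0 gives +inf
    float x4  = rsqrt_model(x2);   // rsqrt(+inf) = 0
    float x8  = rsqrt_model(x4);   // +inf again
    float x16 = rsqrt_model(x8);   // 0
    float x32 = rsqrt_model(x16);  // +inf
    float x64 = rsqrt_model(x32);  // 0

    // For x = 0 this is rcp(+inf) * (+inf) * rcp(0) = 0 * inf * inf = NaN.
    return rcp_model(x2) * x32 * rcp_model(x64);
}
```

So black pixels take this path every time, which is why the clamp has to turn NaN into 0 rather than pass it through.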

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:257: __m128 srcPixel =
_mm_setr_ps(gamma_srgb_to_linear[(src[0] >>  0) & 0xFF],
On 2016/06/16 13:27:40, mtklein_C wrote:
> Wouldn't this part be simpler as,
> 
> // Splat red, green, and blue components.
> __m128 r = _mm_set1_ps(gamma_srgb_to_linear[...]),
>        g = _mm_set1_ps(gamma_srgb_to_linear[...]),
>        b = _mm_set1_ps(gamma_srgb_to_linear[...]);
> 
> Seems like there's no need for the srcPixel intermediate.

Yes I like this better, done.

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:275: dstPixel = _mm_min_ps(_mm_max_ps(dstPixel,
_mm_setzero_ps()), _mm_set1_ps(255.0f));
On 2016/06/16 13:27:41, mtklein_C wrote:
> Let's make the clamping a static function?  That can help make sure the
> important comment about NaN and argument order is always paired up with the
> code.

SGTM

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:333: *dst = SkPackARGB32NoCheck(0xFF,
On 2016/06/16 13:27:41, mtklein_C wrote:
> Didn't you already munge the matrix so that we should always act as if the
> output were RGBA and then that is actually SkPMColor order?  SkPackARGB32NoCheck
> applies SkPMColor order itself, which seems wrong... shouldn't this just be
> shifts?
> 
> This is made complicated, of course, by all our non-x86 bots having RGBA as
> their SkPMColor order.  There shouldn't be any visible bug here, but I think
> this code is semantically misleading.

Agreed, this is wrong.  Using shifts.

https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o...
src/opts/SkColorXform_opts.h:346: color_xform_RGB1_portable<kGammaNamed>(dst,
src, len, matrix);
On 2016/06/16 13:27:40, mtklein_C wrote:
> This appears to be a complete passthrough function.  Why don't we just rename
> what's now color_xform_RGB1_portable to color_xform_RGB1?

Of course, Done.

mtklein

lgtm https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_opts.h File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_opts.h#newcode200 src/opts/SkColorXform_opts.h:200: // Splat rX, rY, rZ, and rQ each ...

4 years, 6 months ago (2016-06-16 16:46:04 UTC) #18

msarett

https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_opts.h File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_opts.h#newcode200 src/opts/SkColorXform_opts.h:200: // Splat rX, rY, rZ, and rQ each across ...

4 years, 6 months ago (2016-06-16 17:05:08 UTC) #19

msarett

The CQ bit was checked by msarett@google.com to run a CQ dry run

4 years, 6 months ago (2016-06-16 17:05:15 UTC) #20