|
|
DescriptionImplement fast, correct gamma conversion for color xforms
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
QCMS Xform 0.495 ms
Skia Old Xform 0.235 ms
Skia NEW Xform 0.423 ms
Vs Old Code 0.56x
Vs QCMS 1.17x
So to summarize, we are now much slower than before,
but still a bit faster than QCMS. And now we are also
far more accurate than QCMS :).
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/dea0340cadb759932e53416a657f5ea75fee8b5f
Patch Set 1 #
Total comments: 19
Patch Set 2 : Improvements #
Total comments: 4
Patch Set 3 : Should be 0xFF #
Total comments: 21
Patch Set 4 : Naming and organization improvements #
Total comments: 2
Patch Set 5 : Fix comments #Patch Set 6 : Win test #
Depends on Patchset: Messages
Total messages: 32 (13 generated)
Description was changed from ========== Implement fast, correct gamma conversion for color xforms BUG=skia: ========== to ========== Implement fast, correct gamma conversion for color xforms BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Patchset #1 (id:1) has been deleted
Description was changed from ========== Implement fast, correct gamma conversion for color xforms BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Implement fast, correct gamma conversion for color xforms 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.429 ms Vs Old Code 0.55x :( Vs QCMS 1.15x :) So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
msarett@google.com changed reviewers: + brianosman@google.com, herb@google.com, mtklein@google.com, reed@google.com
https://codereview.chromium.org/2060823003/diff/20001/src/core/SkColorSpaceXf... File src/core/SkColorSpaceXform.cpp (right): https://codereview.chromium.org/2060823003/diff/20001/src/core/SkColorSpaceXf... src/core/SkColorSpaceXform.cpp:40: if (SkColorSpace::k2Dot2Curve_GammaNamed == dstSpace->gammaNamed() && I plan to add sRGB dsts next... https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:151: // x^(29/64) is a very good approximation of the true value, x^(1/2.2). This is the most accurate approach I tried. It's always within 0.9 of the true float value, so the max diff is 1. The max diff for the entire transform is 1 since the forward transform is perfect. It's worth mentioning that the inverse 2.2 table is pretty inaccurate because the curve is so steep. Even if I go to 8192 entries, table[0] = 0 and table[1] = 4 (aahhh!). I managed to make this faster than the table, but it's slower than pretty much everything else I tried (though not awful). https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:256: while (len > 0) { The planar approach ultimately was faster than the interleaved one...but the interleaved one is useful to handle any tail pixels one by one. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:292: static uint8_t clamp_float_to_byte(float v) { Next two functions copied from above, with minor modifications to match the new opt code. Don't think anyone should call these. (we always have SSE2?)
lgtm, but I'll defer to the CPU experts on the SSE implementation details. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:292: static uint8_t clamp_float_to_byte(float v) { On 2016/06/14 22:52:56, msarett wrote: > Next two functions copied from above, with minor modifications to match the new > opt code. Don't think anyone should call these. (we always have SSE2?) Not on ARM?
https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:292: static uint8_t clamp_float_to_byte(float v) { On 2016/06/15 14:02:01, Brian Osman wrote: > On 2016/06/14 22:52:56, msarett wrote: > > Next two functions copied from above, with minor modifications to match the > new > > opt code. Don't think anyone should call these. (we always have SSE2?) > > Not on ARM? True. NEON implementations are coming soon. We can always assume NEON with Arm, I think.
We don't absolutely require SSE2 or NEON, but in turn we do not care how fast code is without them. There's always got to be a portable code path, but it's much more important that path exists, is simple, clear, correct, and portable than is fast. Now in practice, we do always have SSE2 or NEON in Chrome, SSSE3 for Chrome on x86 Android or Mac, and -march=native in Android itself. I'm fuzzy on what minimums other clients like Firefox set. Some clients also compile Skia for other architectures (e.g. MIPS). These are all compile-time minimums; with runtime detection like SkOpts here can provide, there's a high probability you'll find support up through SSE4.1/2 on x86, and about even odds for AVX. While we're in the area, one related bit we've really started completely ignoring is endianness. Until we're made aware of a compellingly reason to change our minds, please just blindly assume the target architecture is little-endian. Don't check SK_CPU_BENDIAN, don't think about it at all... we don't have any testing coverage for big-endian targets, so there's no reproducible way prove that any thinking or #ifdefing we do about endianness is worthwhile. Now that you've read this paragraph, maybe even forget I mentioned it.
https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32); If I remember my high school math, the limit of x^(y) as y --> 0 is 1. Just curious how close to the limit we are here, and whether we're close enough to ignore this term? I guess that's the same as asking whether x ^ (29/64) is different from x ^ (30/64) within the precision of our rsqrts and rcps? https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:162: // x^(+29/64) = x * x^(-1/2) * x^(-1/32) * x^(-1/64) Why do we need to involve x? Is it not simpler to do x^(29/64) == x^(32/64) * x^(-2/64) * x^(-1/64) something like rcp(x2) * x32 * rcp(x64)? https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:188: __m128 reds = _mm_setr_ps(gamma_to_linear[(src[0] >> 0) & 0xFF], Let's hook this all into SkOpts_sse41.cpp too? These setr_ps will (should?) be faster with SSE4.1.. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:261: gamma_srgb_to_linear[(src[0] >> 24) & 0xFF]); Can't this lane just be a constant, e.g. 0 or 1? It seems unused... seems silly to look it up in the table (which itself is semantically confusing, given that alpha is already linear). https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:268: __m128 dstPixel = _mm_mul_ps(r, rXgXbX); funky formatting here? https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:330: dstFloats[0] = pow(dstFloats[0], (1/2.2f)) * 255.0f; might as well use powf. that at least avoids float -> double and double -> float?
Description was changed from ========== Implement fast, correct gamma conversion for color xforms 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.429 ms Vs Old Code 0.55x :( Vs QCMS 1.15x :) So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Implement fast, correct gamma conversion for color xforms 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.423 ms Vs Old Code 0.56x Vs QCMS 1.17x So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32); On 2016/06/15 15:10:46, mtklein wrote: > If I remember my high school math, the limit of x^(y) as y --> 0 is 1. Just > curious how close to the limit we are here, and whether we're close enough to > ignore this term? > > I guess that's the same as asking whether x ^ (29/64) is different from x ^ > (30/64) within the precision of our rsqrts and rcps? This is a good question, I'll need to follow up. I've been assuming that rsqrts and rcps are perfect. "The maximum relative error for this approximation is less than 1.5*2^-12" seems pretty good, but I'm sure the error could propagate... I think I'll need to actually measure the accuracy of these instructions to have a confident answer. In the simpler world of "rsqrts and rcps are perfect" it does matter. The diff to the true value goes from 0.9 (off by 1 max) to 2.9 (off by 3 max). https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:162: // x^(+29/64) = x * x^(-1/2) * x^(-1/32) * x^(-1/64) On 2016/06/15 15:10:46, mtklein wrote: > Why do we need to involve x? Is it not simpler to do > > x^(29/64) == x^(32/64) * x^(-2/64) * x^(-1/64) > > something like rcp(x2) * x32 * rcp(x64)? Long story short: Looks good, let's do that! My first implementation was actually rcp(x2 * x64) * x32 Turns out to be a fair amount slower than the current - I'm guessing because we spend time waiting for x64 to be ready. I tried and didn't pick your suggestion because I didn't see a performance difference, and because I preferred to trust an extra mul vs. an extra reciprocal in terms of accuracy. Of course, now I'm running it again and seeing a slight boost from using reciprocal. So I've made the change :). I think it's clearer and faster. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:188: __m128 reds = _mm_setr_ps(gamma_to_linear[(src[0] >> 0) & 0xFF], On 2016/06/15 15:10:46, mtklein wrote: > Let's hook this all into SkOpts_sse41.cpp too? These setr_ps will (should?) be > faster with SSE4.1.. SGTM Woohoo! Another small performance win. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:261: gamma_srgb_to_linear[(src[0] >> 24) & 0xFF]); On 2016/06/15 15:10:46, mtklein wrote: > Can't this lane just be a constant, e.g. 0 or 1? It seems unused... seems silly > to look it up in the table (which itself is semantically confusing, given that > alpha is already linear). Of course, removing thanks. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:268: __m128 dstPixel = _mm_mul_ps(r, rXgXbX); On 2016/06/15 15:10:46, mtklein wrote: > funky formatting here? Done. https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:330: dstFloats[0] = pow(dstFloats[0], (1/2.2f)) * 255.0f; On 2016/06/15 15:10:46, mtklein wrote: > might as well use powf. that at least avoids float -> double and double -> > float? Done.
https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXf... File src/core/SkColorSpaceXform.h (right): https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXf... src/core/SkColorSpaceXform.h:28: * Apply the color conversion to a src buffer, storing the output in the dst buffer. Just curious, is there any real reason we need to require RGB1 input to produce RGB1 output? Can't these be RGBx -> RGB1 with the same performance? Ideally we'll never even look at the source alpha. This kinda trickles through the various API names and comments. I'm not sure it's important. https://codereview.chromium.org/2060823003/diff/40001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/40001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:333: *dst = SkPackARGB32NoCheck(((*src >> 24) & 0xFF), any particular reason we read src alpha here instead of just setting 0xff?
https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXf... File src/core/SkColorSpaceXform.h (right): https://codereview.chromium.org/2060823003/diff/40001/src/core/SkColorSpaceXf... src/core/SkColorSpaceXform.h:28: * Apply the color conversion to a src buffer, storing the output in the dst buffer. On 2016/06/15 20:39:42, mtklein wrote: > Just curious, is there any real reason we need to require RGB1 input to produce > RGB1 output? Can't these be RGBx -> RGB1 with the same performance? Ideally > we'll never even look at the source alpha. > > This kinda trickles through the various API names and comments. I'm not sure > it's important. It became RGB1 because I wanted to take advantage of always writing 0xFF to the alpha channel... It could be renamed RGBx, I just don't know where we'd ever get RGBx input from. I think I prefer it as is, but don't feel strongly. It'll get extended and possibly renamed anyway when we add RGBA unpremul and premul. https://codereview.chromium.org/2060823003/diff/40001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/40001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:333: *dst = SkPackARGB32NoCheck(((*src >> 24) & 0xFF), On 2016/06/15 20:39:42, mtklein wrote: > any particular reason we read src alpha here instead of just setting 0xff? Should be 0xFF, thanks.
mtklein@chromium.org changed reviewers: + mtklein@chromium.org
https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf... File src/core/SkColorSpaceXform.cpp (right): https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf... src/core/SkColorSpaceXform.cpp:41: 0.0f == srcToDst.getFloat(3, 0) && So the idea is that the Q terms are so uncommon they're not even worth an optimized routine? https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h#newco... src/core/SkOpts.h:72: // Color xform RGB1 input into SkPMColor ordered 8888 opaque pixels. Doesn't this actually transform RGBA into RGB1 pixels now? It's only because you've swapped around the matrix that we get SkPMColor out, right? https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:152: static __m128 inverse_gamma_linear_to_2dot2(__m128 x) { Just some naming questions. Some of these names feel redundant, and I'm not sure if they're really redundant or there's some important clarification going on. Would things still mean the same if they were rewritten like this? gamma_srgb_to_linear -> linear_from_srgb gamma_2dot2_to_linear -> linear_from_2dot2 inverse_gamma_linear_to_2dot2 -> linear_to_2dot2 or even from_srgb, from_2dot2, to_2dot2 ? https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:172: const float matrix[16]) { This line might want to be re-wrapped? https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:174: if (SkColorSpace::kSRGB_GammaNamed == kGammaNamed) { Since we're not otherwise using kGammaNamed, I think this would read more clearly if we just templatize these functions (here and portable) on the table pointers themselves: static const float gamma_srgb_to_linear[] = { ... }; template <const float* gamma_to_linear> static void color_xform_RGB1(...) { ... use gamma_to_linear directly ... } static void color_xform_RGB1_srgb_to_2dot2(...) { color_xform_RGB1<gamma_srgb_to_linear>(...); } It's one of the underhyped features of C++11 that you can use pointers to static const arrays as template arguments. In C++98 they had to be extern. And if you want to be really careful, you can specify the template like template <const float (&gamma_to_linear)[256]> which will make sure you passed an array with exactly 256 entries. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:238: // clamps to zero. Note that max(NaN, 0) = 0, while max(0, NaN) = NaN. Do we have test cases exercising the NaN input? Just want to make sure that if we think we care, we care enough to test it and keep it correct. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:257: __m128 srcPixel = _mm_setr_ps(gamma_srgb_to_linear[(src[0] >> 0) & 0xFF], Wouldn't this part be simpler as, // Splat red, green, and blue components. __m128 r = _mm_set1_ps(gamma_srgb_to_linear[...]), g = _mm_set1_ps(gamma_srgb_to_linear[...]), b = _mm_set1_ps(gamma_srgb_to_linear[...]); Seems like there's no need for the srcPixel intermediate. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:275: dstPixel = _mm_min_ps(_mm_max_ps(dstPixel, _mm_setzero_ps()), _mm_set1_ps(255.0f)); Let's make the clamping a static function? That can help make sure the important comment about NaN and argument order is always paired up with the code. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:333: *dst = SkPackARGB32NoCheck(0xFF, Didn't you already munge the matrix so that we should always act as if the output were RGBA and then that is actually SkPMColor order? SkPackARGB32NoCheck applies SkPMColor order itself, which seems wrong... shouldn't this just be shifts? This is made complicated, of course, by all our non-x86 bots having RGBA as their SkPMColor order. There shouldn't be any visible bug here, but I think this code is semantically misleading. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:346: color_xform_RGB1_portable<kGammaNamed>(dst, src, len, matrix); This appears to be a complete passthrough function. Why don't we just rename what's now color_xform_RGB1_portable to color_xform_RGB1?
https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h#newco... src/core/SkOpts.h:72: // Color xform RGB1 input into SkPMColor ordered 8888 opaque pixels. On 2016/06/16 13:27:40, mtklein_C wrote: > Doesn't this actually transform RGBA into RGB1 pixels now? It's only because > you've swapped around the matrix that we get SkPMColor out, right? Err, RGB1 -> RGB1. Just interested in color order here, not alpha.
https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/20001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:160: __m128 x64 = _mm_rsqrt_ps(x32); On 2016/06/15 17:55:03, msarett wrote: > On 2016/06/15 15:10:46, mtklein wrote: > > If I remember my high school math, the limit of x^(y) as y --> 0 is 1. Just > > curious how close to the limit we are here, and whether we're close enough to > > ignore this term? > > > > I guess that's the same as asking whether x ^ (29/64) is different from x ^ > > (30/64) within the precision of our rsqrts and rcps? > > This is a good question, I'll need to follow up. > > I've been assuming that rsqrts and rcps are perfect. "The maximum relative > error for this approximation is less than 1.5*2^-12" seems pretty good, but I'm > sure the error could propagate... I think I'll need to actually measure the > accuracy of these instructions to have a confident answer. > > In the simpler world of "rsqrts and rcps are perfect" it does matter. The diff > to the true value goes from 0.9 (off by 1 max) to 2.9 (off by 3 max). I tested x^(29/64) and x^(30/64) over the 0 to 1 range (at increments of 0.001). x^(29/64) stays within 1 of x^(1/2.2) and x^(30/64) stays within 3. The instructions are accurate enough that we can trust them. https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf... File src/core/SkColorSpaceXform.cpp (right): https://codereview.chromium.org/2060823003/diff/60001/src/core/SkColorSpaceXf... src/core/SkColorSpaceXform.cpp:41: 0.0f == srcToDst.getFloat(3, 0) && On 2016/06/16 13:27:40, mtklein_C wrote: > So the idea is that the Q terms are so uncommon they're not even worth an > optimized routine? Well not right now, I may consider templating them back in or something... I have one image with Q terms. It comes from a website that says, "here are some wacky profiles". And it actually doesn't even hit the opt code anyway because it has a funky gamma. https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2060823003/diff/60001/src/core/SkOpts.h#newco... src/core/SkOpts.h:72: // Color xform RGB1 input into SkPMColor ordered 8888 opaque pixels. On 2016/06/16 13:27:40, mtklein_C wrote: > Doesn't this actually transform RGBA into RGB1 pixels now? It's only because > you've swapped around the matrix that we get SkPMColor out, right? Yes fixing this comment. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:152: static __m128 inverse_gamma_linear_to_2dot2(__m128 x) { On 2016/06/16 13:27:41, mtklein_C wrote: > Just some naming questions. Some of these names feel redundant, and I'm not > sure if they're really redundant or there's some important clarification going > on. Would things still mean the same if they were rewritten like this? > > gamma_srgb_to_linear -> linear_from_srgb > gamma_2dot2_to_linear -> linear_from_2dot2 > inverse_gamma_linear_to_2dot2 -> linear_to_2dot2 > > or even > from_srgb, from_2dot2, to_2dot2 > > ? sgtm, shorter names are better. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:172: const float matrix[16]) { On 2016/06/16 13:27:41, mtklein_C wrote: > This line might want to be re-wrapped? Done. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:174: if (SkColorSpace::kSRGB_GammaNamed == kGammaNamed) { On 2016/06/16 13:27:41, mtklein_C wrote: > Since we're not otherwise using kGammaNamed, I think this would read more > clearly if we just templatize these functions (here and portable) on the table > pointers themselves: > > static const float gamma_srgb_to_linear[] = { ... }; > > template <const float* gamma_to_linear> > static void color_xform_RGB1(...) { > ... use gamma_to_linear directly ... > } > > static void color_xform_RGB1_srgb_to_2dot2(...) { > color_xform_RGB1<gamma_srgb_to_linear>(...); > } > > It's one of the underhyped features of C++11 that you can use pointers to static > const arrays as template arguments. In C++98 they had to be extern. > > And if you want to be really careful, you can specify the template like > template <const float (&gamma_to_linear)[256]> > which will make sure you passed an array with exactly 256 entries. Woohoo, this is cool! https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:238: // clamps to zero. Note that max(NaN, 0) = 0, while max(0, NaN) = NaN. On 2016/06/16 13:27:40, mtklein_C wrote: > Do we have test cases exercising the NaN input? Just want to make sure that if > we think we care, we care enough to test it and keep it correct. Yes we are hitting this case quite frequently actually. The inverse_gamma_xform(0) is giving NaN. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:257: __m128 srcPixel = _mm_setr_ps(gamma_srgb_to_linear[(src[0] >> 0) & 0xFF], On 2016/06/16 13:27:40, mtklein_C wrote: > Wouldn't this part be simpler as, > > // Splat red, green, and blue components. > __m128 r = _mm_set1_ps(gamma_srgb_to_linear[...]), > g = _mm_set1_ps(gamma_srgb_to_linear[...]), > b = _mm_set1_ps(gamma_srgb_to_linear[...]); > > Seems like there's no need for the srcPixel intermediate. Yes I like this better, done. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:275: dstPixel = _mm_min_ps(_mm_max_ps(dstPixel, _mm_setzero_ps()), _mm_set1_ps(255.0f)); On 2016/06/16 13:27:41, mtklein_C wrote: > Let's make the clamping a static function? That can help make sure the > important comment about NaN and argument order is always paired up with the > code. SGTM https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:333: *dst = SkPackARGB32NoCheck(0xFF, On 2016/06/16 13:27:41, mtklein_C wrote: > Didn't you already munge the matrix so that we should always act as if the > output were RGBA and then that is actually SkPMColor order? SkPackARGB32NoCheck > applies SkPMColor order itself, which seems wrong... shouldn't this just be > shifts? > > This is made complicated, of course, by all our non-x86 bots having RGBA as > their SkPMColor order. There shouldn't be any visible bug here, but I think > this code is semantically misleading. Agreed, this is wrong. Using shifts. https://codereview.chromium.org/2060823003/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:346: color_xform_RGB1_portable<kGammaNamed>(dst, src, len, matrix); On 2016/06/16 13:27:40, mtklein_C wrote: > This appears to be a complete passthrough function. Why don't we just rename > what's now color_xform_RGB1_portable to color_xform_RGB1? Of course, Done.
lgtm https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:200: // Splat rX, rY, rZ, and rQ each across a register. We can probably drop references to rQ, gQ, bQ?
https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2060823003/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:200: // Splat rX, rY, rZ, and rQ each across a register. On 2016/06/16 16:46:03, mtklein wrote: > We can probably drop references to rQ, gQ, bQ? Done.
The CQ bit was checked by msarett@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2060823003/100001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
The CQ bit was checked by msarett@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2060823003/120001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by msarett@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from brianosman@google.com, mtklein@google.com Link to the patchset: https://codereview.chromium.org/2060823003/#ps120001 (title: "Win test")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2060823003/120001
Message was sent while issue was closed.
Description was changed from ========== Implement fast, correct gamma conversion for color xforms 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.423 ms Vs Old Code 0.56x Vs QCMS 1.17x So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Implement fast, correct gamma conversion for color xforms 201295.jpg on HP z620 (300x280, most common form of sRGB profile) QCMS Xform 0.495 ms Skia Old Xform 0.235 ms Skia NEW Xform 0.423 ms Vs Old Code 0.56x Vs QCMS 1.17x So to summarize, we are now much slower than before, but still a bit faster than QCMS. And now we are also far more accurate than QCMS :). BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2060823003 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/dea0340cadb759932e53416a657f5ea75fee8b5f ==========
Message was sent while issue was closed.
Committed patchset #6 (id:120001) as https://skia.googlesource.com/skia/+/dea0340cadb759932e53416a657f5ea75fee8b5f |