|
|
DescriptionOptimize color xforms with 2.2 gammas for SSE2
Because we recognize commonly used gamma tables and
parameters as 2.2f, about 98% of jpegs with color profiles
will pass through this xform (assuming the dst is also
2.2f). Sample size is 10,322 jpegs.
I won't go crazy with performance numbers because this is
a work in progress, particularly in terms of correctness.
201295.jpg on HP z620
(300x280, most common form of sRGB profile)
Decode Time + QCMS Xform 1.28 ms
QCMS Xform Only 0.495 ms
Decode Time + Skia Opt Xform 1.01 ms
Skia Opt Xform Only 0.235 ms
Decode Time + Xform Speed-up 1.27x
Xform Only Speed-up 2.11x
FWIW, Skia xform time before these optimizations was
41.1 ms. But we expected that code to be slow.
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002
CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot
Committed: https://skia.googlesource.com/skia/+/a9e878c836994bce695274b4c28890290139dcdf
Patch Set 1 : #
Total comments: 2
Patch Set 2 : #
Total comments: 20
Patch Set 3 : Response to comments #
Total comments: 2
Patch Set 4 : Clarify different versions of clamping #
Total comments: 6
Patch Set 5 : Response to comments #
Messages
Total messages: 35 (19 generated)
Description was changed from ========== Optimize color xforms with 2.2 gammas for SSE2 BUG=skia: ========== to ========== Optimize color xforms with 2.2 gammas for SSE2 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Patchset #1 (id:1) has been deleted
Patchset #1 (id:20001) has been deleted
Description was changed from ========== Optimize color xforms with 2.2 gammas for SSE2 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 images. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 8.72 ms QCMS Xform Only 7.88 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 8.72x Xform Only Speed-up 34.9x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 images. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 8.72 ms QCMS Xform Only 7.88 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 8.72x Xform Only Speed-up 34.9x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 8.72 ms QCMS Xform Only 7.88 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 8.72x Xform Only Speed-up 34.9x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Patchset #1 (id:40001) has been deleted
Description was changed from ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 8.72 ms QCMS Xform Only 7.88 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 8.72x Xform Only Speed-up 34.9x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 1.28x Xform Only Speed-up 2.19x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
Description was changed from ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 1.28x Xform Only Speed-up 2.19x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 1.28x Xform Only Speed-up 2.19x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
msarett@google.com changed reviewers: + mtklein@google.com, scroggo@google.com
I was hoping to win by a bit more (given that we may need to tweak things for accuracy), but we're solidly faster than qcms. Should be a bigger win on ARM, since qcms doesn't optimize for that. https://codereview.chromium.org/2046013002/diff/60001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2046013002/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:15: static void color_xform_2Dot2_RGBA_to_8888_portable(uint32_t* dst, const uint32_t* src, int len, This function is moved from SkColorSpaceXform.cpp with a few changes: * Gamma is estimated instead of using pow(x, 2.2) * No longer normalizing floats to the 0-1 range. Seems completely fine to work with them in the 0-255 range. * No longer clamping the output * Added comments https://codereview.chromium.org/2046013002/diff/60001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:100: __m128 rX = _mm_shuffle_ps(rXgXbX, rXgXbX, 0); The compiler moves these shuffle instructions outside of the loop, which is what we want I think. Though things do spill out of the registers a little bit. Not sure if it's more clear to leave the code here or move it out of the loop.
I take it the plan is: 1) pin down correctness 2) NEON 3) AVX+ ? https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:55: // or is there a performance reason to clamp only when necessary? Also, clamping is not I think clamping is always the right choice when converting down to bytes. If you're keeping these in float or half, sure, you can get philosophical about whether or when to clamp. But it really makes no sense to claim you're converting 256.3 to a byte. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:90: // Should we be more accurate? Luckily we've been working on just this problem! Grab me tomorrow and I can walk you through all the options for speed/accuracy tradeoffs. Brian Osman will be able to give us the exact constants once we pick a particular formula skeletons. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:97: // rQ, gQ, and bQ are almost always zero. Can we save a couple instructions? Seems worthwhile, given you can do the check once outside the pixel loop. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:99: // Copy rX, rY, rZ, and rQ across their vector own vectors. I think I get what you're saying here, but this doesn't strike me as well-formed English. // Splat rX, rY, rZ, and rQ each across a register. ? https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:100: __m128 rX = _mm_shuffle_ps(rXgXbX, rXgXbX, 0); I sort of like to write this sort of shuffle as 0x00 as a reminder it's a bit pattern. And of course I'll prefer 0b00000000 once C++14 rolls around. :) https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:118: __m128 dstGreens = _mm_mul_ps(reds, gX); This mul-add chain makes me think we should follow up with versions using AVX and/or AVX+FMA (≈AVX2). https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:138: dstReds = _mm_sqrt_ps(dstReds); You might want to try _mm_rcp_ps(_mm_rsqrt_ps(...)) as a faster sqrt. It should be precise to 11 bits, and very noticeably faster, at least 2x and on some machines up to something like 7x. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:144: // We need to clamp. Definitely. Any overflow here will clobber neighboring channels. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkOpts_ssse3.cpp File src/opts/SkOpts_ssse3.cpp (right): https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkOpts_ssse3.c... src/opts/SkOpts_ssse3.cpp:33: color_xform_2Dot2_RGBA_to_8888 = sk_ssse3::color_xform_2Dot2_RGBA_to_8888; I think everything you used is ≤SSE2. sk_default:: should cover it fine. Did I miss something?
https://codereview.chromium.org/2046013002/diff/80001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2046013002/diff/80001/src/core/SkOpts.h#newco... src/core/SkOpts.h:74: extern void (*color_xform_2Dot2_RGBA_to_8888)(uint32_t* dst, const uint32_t* src, int len, Is this really meant to be 2.2 gamma or is it the sRGB curve? Or are those two close enough for you not to care?
> I take it the plan is: > 1) pin down correctness > 2) NEON > 3) AVX+ > ? Yeah, I think that's how we should we move forward. I may have jumped the gun on NEON a little bit, but I think pinning down correctness first is a better use of time. I'll do that next. https://codereview.chromium.org/2046013002/diff/80001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2046013002/diff/80001/src/core/SkOpts.h#newco... src/core/SkOpts.h:74: extern void (*color_xform_2Dot2_RGBA_to_8888)(uint32_t* dst, const uint32_t* src, int len, On 2016/06/08 02:01:40, mtklein wrote: > Is this really meant to be 2.2 gamma or is it the sRGB curve? Or are those two > close enough for you not to care? It is both. We are treating the two as close enough to not care... Gold compares our outputs to QCMS, so this should help make sure that our results are accurate enough. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:55: // or is there a performance reason to clamp only when necessary? Also, clamping is not On 2016/06/08 01:41:40, mtklein wrote: > I think clamping is always the right choice when converting down to bytes. If > you're keeping these in float or half, sure, you can get philosophical about > whether or when to clamp. But it really makes no sense to claim you're > converting 256.3 to a byte. Yeah I think it's fine (and in many cases necessary) to clamp. I was suggesting that, in some cases, the form of the srcToDst matrix might guarantee that the final result will be between 0 and 255. I've added clamping. It adds about 20us to xform time. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:90: // Should we be more accurate? On 2016/06/08 01:41:40, mtklein wrote: > Luckily we've been working on just this problem! Grab me tomorrow and I can > walk you through all the options for speed/accuracy tradeoffs. Brian Osman will > be able to give us the exact constants once we pick a particular formula > skeletons. Very cool, sounds good. I plan to start by landing this (to see diffs on Gold), then to start to mess with the formulas until we are at least as accurate as QCMS. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:97: // rQ, gQ, and bQ are almost always zero. Can we save a couple instructions? On 2016/06/08 01:41:40, mtklein wrote: > Seems worthwhile, given you can do the check once outside the pixel loop. This actually improves performance more than I suspected. Minus about 25us. Maybe it allows us to do a better job of keeping these constants in the registers. I think I'll leave the TODO and handle in a follow-up. I think we'll need to use a different proc (or template this one)... https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:99: // Copy rX, rY, rZ, and rQ across their vector own vectors. On 2016/06/08 01:41:41, mtklein wrote: > I think I get what you're saying here, but this doesn't strike me as well-formed > English. > > // Splat rX, rY, rZ, and rQ each across a register. > > ? Yes that's better :). https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:100: __m128 rX = _mm_shuffle_ps(rXgXbX, rXgXbX, 0); On 2016/06/08 01:41:41, mtklein wrote: > I sort of like to write this sort of shuffle as 0x00 as a reminder it's a bit > pattern. And of course I'll prefer 0b00000000 once C++14 rolls around. :) sgtm https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:118: __m128 dstGreens = _mm_mul_ps(reds, gX); On 2016/06/08 01:41:41, mtklein wrote: > This mul-add chain makes me think we should follow up with versions using AVX > and/or AVX+FMA (≈AVX2). Yes! I had the same thought. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:138: dstReds = _mm_sqrt_ps(dstReds); On 2016/06/08 01:41:40, mtklein wrote: > You might want to try _mm_rcp_ps(_mm_rsqrt_ps(...)) as a faster sqrt. It should > be precise to 11 bits, and very noticeably faster, at least 2x and on some > machines up to something like 7x. Great! Drops about 10us. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkColorXform_o... src/opts/SkColorXform_opts.h:144: // We need to clamp. On 2016/06/08 01:41:40, mtklein wrote: > Definitely. Any overflow here will clobber neighboring channels. Done. https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkOpts_ssse3.cpp File src/opts/SkOpts_ssse3.cpp (right): https://codereview.chromium.org/2046013002/diff/80001/src/opts/SkOpts_ssse3.c... src/opts/SkOpts_ssse3.cpp:33: color_xform_2Dot2_RGBA_to_8888 = sk_ssse3::color_xform_2Dot2_RGBA_to_8888; On 2016/06/08 01:41:41, mtklein wrote: > I think everything you used is ≤SSE2. sk_default:: should cover it fine. Did I > miss something? Thanks, yes SSE2 is fine! Didn't realize that default == sse2.
Description was changed from ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.00 ms Skia Opt Xform Only 0.226 ms Decode Time + Xform Speed-up 1.28x Xform Only Speed-up 2.19x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.01 ms Skia Opt Xform Only 0.235 ms Decode Time + Xform Speed-up 1.27x Xform Only Speed-up 2.11x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ==========
https://codereview.chromium.org/2046013002/diff/100001/src/opts/SkColorXform_... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2046013002/diff/100001/src/opts/SkColorXform_... src/opts/SkColorXform_opts.h:15: static uint8_t clamp_float_to_byte(float v) { This method is slightly different than the other one (in SkColorSpaceXform), despite having the same name. The other one multiplies by 255 first, and compares against 255 instead of 254.5. Are they intended to be different? If so, maybe give them different names than clarify the difference?
https://codereview.chromium.org/2046013002/diff/100001/src/opts/SkColorXform_... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2046013002/diff/100001/src/opts/SkColorXform_... src/opts/SkColorXform_opts.h:15: static uint8_t clamp_float_to_byte(float v) { On 2016/06/08 17:08:34, scroggo wrote: > This method is slightly different than the other one (in SkColorSpaceXform), > despite having the same name. > > The other one multiplies by 255 first, and compares against 255 instead of > 254.5. > > Are they intended to be different? If so, maybe give them different names than > clarify the difference? Done. I think the version with 254.5 and 0.5 is "better", because we will be less likely to need to cast float to int. I'll update both versions to behave like that. The other difference is that here we assume that the float corresponds to 0-255 (while the other assumes 0-1). I'm still figuring out whether we need to normalize to 0-1 or not. It may depend on how we decide to approximate the gamma curve. I've changed the name in one of the functions to reference "normalized floats". It's likely that both implementations will keep changing.
lgtm https://codereview.chromium.org/2046013002/diff/120001/src/core/SkColorSpaceX... File src/core/SkColorSpaceXform.cpp (right): https://codereview.chromium.org/2046013002/diff/120001/src/core/SkColorSpaceX... src/core/SkColorSpaceXform.cpp:41: SkColorSpace::k2Dot2Curve_GammaNamed == dstSpace->gammaNamed()) { Wouldn't you feel so much better if all those letters lined up in tidy parallel? Maybe just me. https://codereview.chromium.org/2046013002/diff/120001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2046013002/diff/120001/src/core/SkOpts.h#newc... src/core/SkOpts.h:72: // Color xform RGBA input into platform dependent 8888 pixels. Does not premultiply, and might be clearer to "platform dependent" -> "SkPMColor order" ? https://codereview.chromium.org/2046013002/diff/120001/src/opts/SkColorXform_... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2046013002/diff/120001/src/opts/SkColorXform_... src/opts/SkColorXform_opts.h:145: dstReds = _mm_min_ps(_mm_max_ps(dstReds, _mm_setzero_ps()), _mm_set1_ps(255.0f)); Aww man, it looks so much prettier if you write max(0, min(x, 255)). The x is between 0 and 255 :P
https://codereview.chromium.org/2046013002/diff/120001/src/core/SkColorSpaceX... File src/core/SkColorSpaceXform.cpp (right): https://codereview.chromium.org/2046013002/diff/120001/src/core/SkColorSpaceX... src/core/SkColorSpaceXform.cpp:41: SkColorSpace::k2Dot2Curve_GammaNamed == dstSpace->gammaNamed()) { On 2016/06/08 18:49:57, mtklein wrote: > Wouldn't you feel so much better if all those letters lined up in tidy parallel? > Maybe just me. Haha sure. https://codereview.chromium.org/2046013002/diff/120001/src/core/SkOpts.h File src/core/SkOpts.h (right): https://codereview.chromium.org/2046013002/diff/120001/src/core/SkOpts.h#newc... src/core/SkOpts.h:72: // Color xform RGBA input into platform dependent 8888 pixels. Does not premultiply, and On 2016/06/08 18:49:57, mtklein wrote: > might be clearer to "platform dependent" -> "SkPMColor order" ? Done. https://codereview.chromium.org/2046013002/diff/120001/src/opts/SkColorXform_... File src/opts/SkColorXform_opts.h (right): https://codereview.chromium.org/2046013002/diff/120001/src/opts/SkColorXform_... src/opts/SkColorXform_opts.h:145: dstReds = _mm_min_ps(_mm_max_ps(dstReds, _mm_setzero_ps()), _mm_set1_ps(255.0f)); On 2016/06/08 18:49:57, mtklein wrote: > Aww man, it looks so much prettier if you write > > max(0, min(x, 255)). > > The x is between 0 and 255 :P Agreed!
The CQ bit was checked by msarett@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2046013002/140001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...) Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
The CQ bit was checked by msarett@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2046013002/140001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86-D...) Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
The CQ bit was checked by msarett@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2046013002/140001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by msarett@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from mtklein@google.com Link to the patchset: https://codereview.chromium.org/2046013002/#ps140001 (title: "Response to comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2046013002/140001
Message was sent while issue was closed.
Description was changed from ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.01 ms Skia Opt Xform Only 0.235 ms Decode Time + Xform Speed-up 1.27x Xform Only Speed-up 2.11x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot ========== to ========== Optimize color xforms with 2.2 gammas for SSE2 Because we recognize commonly used gamma tables and parameters as 2.2f, about 98% of jpegs with color profiles will pass through this xform (assuming the dst is also 2.2f). Sample size is 10,322 jpegs. I won't go crazy with performance numbers because this is a work in progress, particularly in terms of correctness. 201295.jpg on HP z620 (300x280, most common form of sRGB profile) Decode Time + QCMS Xform 1.28 ms QCMS Xform Only 0.495 ms Decode Time + Skia Opt Xform 1.01 ms Skia Opt Xform Only 0.235 ms Decode Time + Xform Speed-up 1.27x Xform Only Speed-up 2.11x FWIW, Skia xform time before these optimizations was 41.1 ms. But we expected that code to be slow. BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2046013002 CQ_EXTRA_TRYBOTS=client.skia:Test-Ubuntu-GCC-GCE-CPU-AVX2-x86_64-Release-SKNX_NO_SIMD-Trybot Committed: https://skia.googlesource.com/skia/+/a9e878c836994bce695274b4c28890290139dcdf ==========
Message was sent while issue was closed.
Committed patchset #5 (id:140001) as https://skia.googlesource.com/skia/+/a9e878c836994bce695274b4c28890290139dcdf |