|
|
DescriptionApply linear gradient premul in 4f
(spotted by reed@)
Instead of converting to SkPMColor first, apply the premul in 4f and
then just store.
A modified bench shows a significant speedup:
17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color
17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor
17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp
vs.
17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color
17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor
17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp
R=reed@google.com,brianosman@google.com
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003
Committed: https://skia.googlesource.com/skia/+/0ce4f230eb2e95523557bf6c6a195f51bc88163f
Patch Set 1 #
Total comments: 2
Patch Set 2 : deferred dithering #
Total comments: 3
Patch Set 3 : cleanup #Messages
Total messages: 34 (19 generated)
Description was changed from ========== Apply linear gradient premul in 4f Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows almost 2x speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com ========== to ========== Apply linear gradient premul in 4f Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows almost 2x speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003 ==========
Description was changed from ========== Apply linear gradient premul in 4f Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows almost 2x speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003 ========== to ========== Apply linear gradient premul in 4f Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows a significant speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003 ==========
I'll land a Chromium guard first. This is for the legacy 4f impl only, planning to update the new impl separately.
Description was changed from ========== Apply linear gradient premul in 4f Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows a significant speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003 ========== to ========== Apply linear gradient premul in 4f (spotted by reed@) Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows a significant speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003 ==========
The CQ bit was checked by fmalita@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... File src/effects/gradients/SkLinearGradient.cpp (right): https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... src/effects/gradients/SkLinearGradient.cpp:543: c4f255 = Sk4f::Min(Sk4f::Max(x, Sk4f(0)), Sk4f(255)); does the store() call also pin? Why don't we have to pin in the non-alpha case?
https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... File src/effects/gradients/SkLinearGradient.cpp (right): https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... src/effects/gradients/SkLinearGradient.cpp:543: c4f255 = Sk4f::Min(Sk4f::Max(x, Sk4f(0)), Sk4f(255)); On 2016/10/10 18:20:17, reed1 wrote: > does the store() call also pin? Why don't we have to pin in the non-alpha case? Good question. Doesn't look like it pins, just truncates - but somehow we don't run into trouble in that case (maybe because all values are biased the same way). When we apply the premul in 4f though we seem to sometimes end up breaking the premul invariant (rgb > a) without a pin, presumably because after premul the channels truncate differently. This manifests as dithery-looking artifacts in the output. Also, I seem to recall we pre-bias all values by 0.5 to prepare for truncate -- so maybe we only need an upper clamp on alpha? Let me see if I tweak this some more.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/10/10 18:33:18, f(malita) wrote: > https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... > File src/effects/gradients/SkLinearGradient.cpp (right): > > https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... > src/effects/gradients/SkLinearGradient.cpp:543: c4f255 = Sk4f::Min(Sk4f::Max(x, > Sk4f(0)), Sk4f(255)); > On 2016/10/10 18:20:17, reed1 wrote: > > does the store() call also pin? Why don't we have to pin in the non-alpha > case? > > Good question. > > Doesn't look like it pins, just truncates - but somehow we don't run into > trouble in that case (maybe because all values are biased the same way). > > When we apply the premul in 4f though we seem to sometimes end up breaking the > premul invariant (rgb > a) without a pin, presumably because after premul the > channels truncate differently. This manifests as dithery-looking artifacts in > the output. > > Also, I seem to recall we pre-bias all values by 0.5 to prepare for truncate -- > so maybe we only need an upper clamp on alpha? Let me see if I tweak this some > more. or do we need to logically remove the 0.5 before the scale, and then re-add it x' = (x - 0.5) * alpha + 0.5 ?
On 2016/10/10 18:48:51, reed1 wrote: > On 2016/10/10 18:33:18, f(malita) wrote: > > > https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... > > File src/effects/gradients/SkLinearGradient.cpp (right): > > > > > https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... > > src/effects/gradients/SkLinearGradient.cpp:543: c4f255 = > Sk4f::Min(Sk4f::Max(x, > > Sk4f(0)), Sk4f(255)); > > On 2016/10/10 18:20:17, reed1 wrote: > > > does the store() call also pin? Why don't we have to pin in the non-alpha > > case? > > > > Good question. > > > > Doesn't look like it pins, just truncates - but somehow we don't run into > > trouble in that case (maybe because all values are biased the same way). > > > > When we apply the premul in 4f though we seem to sometimes end up breaking the > > premul invariant (rgb > a) without a pin, presumably because after premul the > > channels truncate differently. This manifests as dithery-looking artifacts in > > the output. > > > > Also, I seem to recall we pre-bias all values by 0.5 to prepare for truncate > -- > > so maybe we only need an upper clamp on alpha? Let me see if I tweak this > some > > more. > > or do we need to logically remove the 0.5 before the scale, and then re-add it > > x' = (x - 0.5) * alpha + 0.5 > ? That alone doesn't seem to suffice. I suspect we'd also have to undo the dither portion of the bias, which is not easily accessible at this stage. The minimal tweak I've found to work is to truncate alpha in the scale calculation (which makes sense, since that's what we're going to commit to pmcolor, e.g.: const float scale = static_cast<uint8_t>(x[SkPM4f::A]) * 1.f / 255; This does have a significant overhead though, and I'm not convinced that it produces accurate results... do we want the bias to be compressed by alpha or not? If not, then maybe we should defer it until after premul?
On 2016/10/10 19:22:24, f(malita) wrote: > On 2016/10/10 18:48:51, reed1 wrote: > > On 2016/10/10 18:33:18, f(malita) wrote: > > > > > > https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... > > > File src/effects/gradients/SkLinearGradient.cpp (right): > > > > > > > > > https://codereview.chromium.org/2409583003/diff/1/src/effects/gradients/SkLin... > > > src/effects/gradients/SkLinearGradient.cpp:543: c4f255 = > > Sk4f::Min(Sk4f::Max(x, > > > Sk4f(0)), Sk4f(255)); > > > On 2016/10/10 18:20:17, reed1 wrote: > > > > does the store() call also pin? Why don't we have to pin in the non-alpha > > > case? > > > > > > Good question. > > > > > > Doesn't look like it pins, just truncates - but somehow we don't run into > > > trouble in that case (maybe because all values are biased the same way). > > > > > > When we apply the premul in 4f though we seem to sometimes end up breaking > the > > > premul invariant (rgb > a) without a pin, presumably because after premul > the > > > channels truncate differently. This manifests as dithery-looking artifacts > in > > > the output. > > > > > > Also, I seem to recall we pre-bias all values by 0.5 to prepare for truncate > > -- > > > so maybe we only need an upper clamp on alpha? Let me see if I tweak this > > some > > > more. > > > > or do we need to logically remove the 0.5 before the scale, and then re-add it > > > > x' = (x - 0.5) * alpha + 0.5 > > ? > > That alone doesn't seem to suffice. I suspect we'd also have to undo the dither > portion of the bias, which is not easily accessible at this stage. > > The minimal tweak I've found to work is to truncate alpha in the scale > calculation (which makes sense, since that's what we're going to commit to > pmcolor, e.g.: > > const float scale = static_cast<uint8_t>(x[SkPM4f::A]) * 1.f / 255; > > This does have a significant overhead though, and I'm not convinced that it > produces accurate results... do we want the bias to be compressed by alpha or > not? If not, then maybe we should defer it until after premul? Curse you dither!!!
It turns out it's not quite that hard to defer dithering when we need to premul. Updated the CL, WDYT?
The CQ bit was checked by fmalita@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
wow, that seems very tricky, that at several call-sites we have to know to add the bias ourselves, or pass it as a parameter. Does it have to be that complex? https://codereview.chromium.org/2409583003/diff/20001/src/effects/gradients/S... File src/effects/gradients/SkLinearGradient.cpp (right): https://codereview.chromium.org/2409583003/diff/20001/src/effects/gradients/S... src/effects/gradients/SkLinearGradient.cpp:543: const float scale = x[SkPM4f::A] * 1.f / 255; are we sure the 1 / 255 *always* gets resolved at compile time into a single multiply?
mtklein@google.com changed reviewers: + mtklein@google.com
https://codereview.chromium.org/2409583003/diff/20001/src/effects/gradients/S... File src/effects/gradients/SkLinearGradient.cpp (right): https://codereview.chromium.org/2409583003/diff/20001/src/effects/gradients/S... src/effects/gradients/SkLinearGradient.cpp:543: const float scale = x[SkPM4f::A] * 1.f / 255; On 2016/10/11 16:31:39, reed1 wrote: > are we sure the 1 / 255 *always* gets resolved at compile time into a single > multiply? No, as written this will never be a single multiply. It will multiply then divide. You should always write 1/255 as (1/255.0f). That'll never go wrong.
On 2016/10/11 16:31:39, reed1 wrote: > wow, that seems very tricky, that at several call-sites we have to know to add > the bias ourselves, or pass it as a parameter. Does it have to be that complex? Cleaned up a bit, added a couple of helpers to encapsulate the pre/post bias logic. PTAL. https://codereview.chromium.org/2409583003/diff/20001/src/effects/gradients/S... File src/effects/gradients/SkLinearGradient.cpp (right): https://codereview.chromium.org/2409583003/diff/20001/src/effects/gradients/S... src/effects/gradients/SkLinearGradient.cpp:543: const float scale = x[SkPM4f::A] * 1.f / 255; On 2016/10/11 16:38:15, mtklein wrote: > On 2016/10/11 16:31:39, reed1 wrote: > > are we sure the 1 / 255 *always* gets resolved at compile time into a single > > multiply? > > No, as written this will never be a single multiply. It will multiply then > divide. > > You should always write 1/255 as (1/255.0f). That'll never go wrong. Done.
The CQ bit was checked by fmalita@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
The CQ bit was checked by fmalita@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1481041076045420, "parent_rev": "83926346f106a63e72e768614ebc70c6d4c93b6f", "commit_rev": "0ce4f230eb2e95523557bf6c6a195f51bc88163f"}
Message was sent while issue was closed.
Description was changed from ========== Apply linear gradient premul in 4f (spotted by reed@) Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows a significant speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003 ========== to ========== Apply linear gradient premul in 4f (spotted by reed@) Instead of converting to SkPMColor first, apply the premul in 4f and then just store. A modified bench shows a significant speedup: 17/17 MB 1 690µs 695µs 695µs 700µs 0% █▅▁▄▆▅▁▅▆▃ 8888 gradient_linear_clamp_3color 17/17 MB 1 832µs 837µs 839µs 870µs 1% █▁▁▂▂▂▂▂▂▂ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 651µs 659µs 665µs 701µs 3% ▆█▅▁▂▂▁▁▂▂ 8888 gradient_linear_clamp vs. 17/17 MB 1 1.03ms 1.03ms 1.04ms 1.08ms 2% ██▇▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_3color 17/17 MB 1 1.17ms 1.18ms 1.18ms 1.22ms 1% █▄▂▁▁▁▁▁▁▁ 8888 gradient_linear_clamp_hicolor 17/17 MB 1 1.1ms 1.15ms 1.14ms 1.16ms 2% ▇▇▇▇▇▇██▁▁ 8888 gradient_linear_clamp R=reed@google.com,brianosman@google.com GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2409583003 Committed: https://skia.googlesource.com/skia/+/0ce4f230eb2e95523557bf6c6a195f51bc88163f ==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://skia.googlesource.com/skia/+/0ce4f230eb2e95523557bf6c6a195f51bc88163f |