Issue 1391303002: Optional gradient dithering

Issue 1391303002: Optional gradient dithering (Closed)

Created:
5 years, 2 months ago by f(malita)

Modified:
5 years, 2 months ago

Reviewers:
reed1, mtklein, tomhudson

CC:
reviews_skia.org

Base URL:
https://chromium.googlesource.com/skia.git@master

Target Ref:
refs/heads/master

Project:
skia

Visibility:
Public.

More Reviews

Description

Optional gradient dithering Update raster gradient impls to observe the paint dithering flag. For now this new behavior is disabled pending internal/external client updates to mitigate diffs. BUG=skia:4436 R=reed@google.com,mtklein@google.com,tomhudson@google.com Committed: https://skia.googlesource.com/skia/+/37d8688927988cc71e968e0b5e90689e4e6ed7ab

Patch Set 3 : template-based #

Patch Set 4 : straight kDitherStride32 runtime plumbing #

Patch Set 5 : table-based approach #

Created: 5 years, 2 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+103 lines, -71 lines)			Patch
	M	src/effects/gradients/SkGradientShader.cpp	View	1 2 3 4	15 chunks	+96 lines, -67 lines	0 comments	Download
	M	src/effects/gradients/SkGradientShaderPriv.h	View	1 2 3 4	6 chunks	+7 lines, -4 lines	0 comments	Download

Messages

Total messages: 27 (6 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1391303002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1391303002/40001

5 years, 2 months ago (2015-10-07 21:52:23 UTC) #2

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

5 years, 2 months ago (2015-10-07 21:58:53 UTC) #4

f(malita)

Patch set #2 is fairly contained, parameterizing the toggle dither stride (0 disables dithering). Unfortunately ...

5 years, 2 months ago (2015-10-07 22:30:08 UTC) #6

Patch set #2 is fairly contained, parameterizing the toggle dither stride (0
disables dithering).  Unfortunately it yields a significant nanobench regression
(~15% for linear gradients).  Apparently this little toggle is quite hot :(

Patch set #3 is template-based and intrusive, but not only it doesn't regress -
it improves perf when dithering is disabled by ~13%.

Based on these numbers, the complexity cost of PS#2 may be justified.  WDYT?


ToT:

out/Release/nanobench -m gradient_linear_clamp --samplingTime 1s --config 8888
565
Timer overhead: 37.1ns
curr/maxrss	loops	min	median	mean	max	stddev	samples	config	bench
  33/33 
MB	2	94.1µs	94.7µs	102µs	279µs	19%	3995	8888	gradient_linear_clamp_shallow
  33/33 
MB	5	80.9µs	81.1µs	81.4µs	139µs	2%	2325	565	gradient_linear_clamp_shallow
  33/33 
MB	5	76.5µs	76.8µs	77.1µs	119µs	2%	2330	8888	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	80.9µs	81.1µs	81.5µs	144µs	3%	2323	565	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	76.5µs	76.8µs	77.1µs	120µs	3%	2329	8888	gradient_linear_clamp_3color
  33/33 
MB	5	80.9µs	81.1µs	81.3µs	93.7µs	1%	2328	565	gradient_linear_clamp_3color
  33/33 
MB	5	76.5µs	76.8µs	77.2µs	135µs	3%	2328	8888	gradient_linear_clamp_hicolor
  33/33 
MB	5	80.8µs	81.1µs	81.3µs	109µs	1%	2328	565	gradient_linear_clamp_hicolor
  33/33  MB	5	76.5µs	76.8µs	77.3µs	137µs	5%	2323	8888	gradient_linear_clamp
  33/33  MB	5	80.9µs	81.1µs	81.4µs	136µs	2%	2326	565	gradient_linear_clamp


PS#2:

out/Release/nanobench -m gradient_linear_clamp --samplingTime 1s --config 8888
565
Timer overhead: 39.8ns
curr/maxrss	loops	min	median	mean	max	stddev	samples	config	bench
  33/33 
MB	2	90.9µs	91.1µs	98µs	333µs	20%	4109	8888	gradient_linear_clamp_shallow
  32/33 
MB	5	89.9µs	90.7µs	90.9µs	127µs	2%	2095	565	gradient_linear_clamp_shallow
  33/33 
MB	5	90.9µs	91.2µs	91.8µs	156µs	4%	1988	8888	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	89.9µs	90.6µs	90.9µs	114µs	1%	2096	565	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	90.9µs	91.2µs	92.4µs	188µs	8%	1975	8888	gradient_linear_clamp_3color
  33/33 
MB	5	89.9µs	90.7µs	90.9µs	127µs	2%	2096	565	gradient_linear_clamp_3color
  33/33 
MB	5	90.9µs	91.2µs	91.6µs	151µs	3%	1992	8888	gradient_linear_clamp_hicolor
  33/33  MB	5	89.9µs	90.9µs	91µs	159µs	2%	2094	565	gradient_linear_clamp_hicolor
  33/33  MB	5	90.9µs	91.2µs	91.5µs	149µs	2%	1994	8888	gradient_linear_clamp
  33/33  MB	5	89.9µs	90.7µs	90.9µs	127µs	2%	2096	565	gradient_linear_clamp


PS#3:

out/Release/nanobench -m gradient_linear_clamp --samplingTime 1s --config 8888
565
Timer overhead: 36.3ns
curr/maxrss	loops	min	median	mean	max	stddev	samples	config	bench
  33/33 
MB	2	93.5µs	95.8µs	111µs	261µs	27%	3651	8888	gradient_linear_clamp_shallow
  32/33 
MB	5	68.5µs	70.1µs	70.2µs	110µs	2%	2675	565	gradient_linear_clamp_shallow
  33/33 
MB	5	79µs	80.3µs	80.4µs	120µs	2%	2242	8888	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	81µs	81.5µs	81.8µs	139µs	3%	2315	565	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	65.8µs	69.2µs	69.2µs	85.4µs	2%	2564	8888	gradient_linear_clamp_3color
  33/33  MB	5	68.2µs	70µs	70.1µs	119µs	3%	2676	565	gradient_linear_clamp_3color
  33/33 
MB	5	66.1µs	69.1µs	69.2µs	80.8µs	2%	2564	8888	gradient_linear_clamp_hicolor
  33/33  MB	5	67.9µs	70µs	70.2µs	111µs	3%	2675	565	gradient_linear_clamp_hicolor
  33/33  MB	5	66.3µs	69.2µs	69.3µs	103µs	2%	2561	8888	gradient_linear_clamp
  33/33  MB	5	67.9µs	69.9µs	70.1µs	122µs	3%	2678	565	gradient_linear_clamp

reed1

Could we not have just made SkGradientShaderBase::kDitherStride32 be a runtime value instead of a const?

5 years, 2 months ago (2015-10-08 14:39:20 UTC) #7

f(malita)

On 2015/10/08 14:39:20, reed1 wrote: > Could we not have just made SkGradientShaderBase::kDitherStride32 be a ...

5 years, 2 months ago (2015-10-08 15:42:48 UTC) #8

f(malita)

On 2015/10/08 15:42:48, f(malita) wrote: > On 2015/10/08 14:39:20, reed1 wrote: > > Could we ...

5 years, 2 months ago (2015-10-08 15:43:33 UTC) #9

reed1

Another 2nd-order consideration: If we're going to NOT dither, then the tables we built are ...

5 years, 2 months ago (2015-10-08 15:44:23 UTC) #10

f(malita)

On 2015/10/08 15:44:23, reed1 wrote: > Another 2nd-order consideration: > > If we're going to ...

5 years, 2 months ago (2015-10-08 16:01:03 UTC) #11

reed1

On 2015/10/08 16:01:03, f(malita) wrote: > On 2015/10/08 15:44:23, reed1 wrote: > > Another 2nd-order ...

5 years, 2 months ago (2015-10-08 16:20:25 UTC) #12

tomhudson

Since Mike's got concerns about the approach, reading the code may not have been useful, ...

5 years, 2 months ago (2015-10-08 16:43:04 UTC) #13

f(malita)

On 2015/10/08 16:43:04, tomhudson wrote: > What PS#2 does looks like a really expensive, inefficient ...

5 years, 2 months ago (2015-10-08 19:47:26 UTC) #14

On 2015/10/08 16:43:04, tomhudson wrote:
> What PS#2 does looks like a really expensive, inefficient way to do it -
object
> construction, object assignment, obj-to-int cast overridden to return a member
> field. Are we confident that the compiler was really optimizing all that away?

Confident is overstating it, but yes, I would hope modern compilers
inline/optimize
all that away.

> Or could we get a more primitive implementation that wouldn't have the perf
> impact? Also, it isn't quite exactly the same thing as Mike proposed in #7,
> "make kDitherStride32 a runtime value".

Yup, that's what I was just trying - see patch set #4.  Mike, is this what you
had in
mind?

Unfortunately it still shows the same perf degradation as PS#2.  It's surprising
to me
that swapping a constant with something that should fit in a register for the
duration
of the hot loops has such an impact.  Register spilling?

[ToT]

out/Release/nanobench -m gradient_linear_clamp --samplingTime 1s --config 8888
565
Timer overhead: 36.9ns
curr/maxrss	loops	min	median	mean	max	stddev	samples	config	bench
  33/33 
MB	2	93.1µs	93.6µs	101µs	226µs	19%	4020	8888	gradient_linear_clamp_shallow
  32/33 
MB	5	80.7µs	81.1µs	81.4µs	209µs	4%	2324	565	gradient_linear_clamp_shallow
  33/33 
MB	5	76.4µs	79.1µs	79.1µs	146µs	3%	2275	8888	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	80.8µs	81µs	81.4µs	125µs	2%	2325	565	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	76.4µs	78.6µs	78.8µs	135µs	3%	2282	8888	gradient_linear_clamp_3color
  33/33 
MB	5	80.8µs	81.1µs	81.5µs	158µs	3%	2321	565	gradient_linear_clamp_3color
  33/33 
MB	5	76.4µs	78.5µs	78.8µs	115µs	2%	2284	8888	gradient_linear_clamp_hicolor
  33/33  MB	5	80.8µs	81µs	81.4µs	140µs	2%	2325	565	gradient_linear_clamp_hicolor
  33/33  MB	5	76.4µs	78.8µs	79µs	149µs	4%	2278	8888	gradient_linear_clamp
  33/33  MB	5	80.8µs	81.3µs	85.7µs	156µs	15%	2209	565	gradient_linear_clamp


[PS#4]

out/Release/nanobench -m gradient_linear_clamp --samplingTime 1s --config 8888
565
Timer overhead: 33ns
curr/maxrss	loops	min	median	mean	max	stddev	samples	config	bench
  33/33 
MB	2	87.2µs	96.5µs	108µs	349µs	25%	3737	8888	gradient_linear_clamp_shallow
  32/33 
MB	4	87.2µs	89.5µs	89.8µs	162µs	3%	2616	565	gradient_linear_clamp_shallow
  33/33 
MB	4	87.1µs	93.1µs	93.7µs	206µs	4%	2388	8888	gradient_linear_clamp_shallow_dither
  33/33 
MB	4	87.3µs	87.5µs	88µs	135µs	2%	2666	565	gradient_linear_clamp_shallow_dither
  33/33 
MB	4	86.9µs	92.7µs	94.4µs	149µs	7%	2366	8888	gradient_linear_clamp_3color
  33/33 
MB	4	87.8µs	89.6µs	89.8µs	122µs	1%	2617	565	gradient_linear_clamp_3color
  33/33 
MB	4	86.9µs	92.9µs	93.8µs	169µs	4%	2386	8888	gradient_linear_clamp_hicolor
  33/33 
MB	4	87.3µs	89.6µs	89.8µs	177µs	3%	2615	565	gradient_linear_clamp_hicolor
  33/33  MB	4	86.9µs	92.6µs	93.5µs	178µs	6%	2390	8888	gradient_linear_clamp
  33/33  MB	4	87.3µs	89.6µs	90.2µs	170µs	5%	2603	565	gradient_linear_clamp

tomhudson

On 2015/10/08 19:47:26, f(malita) wrote: > On 2015/10/08 16:43:04, tomhudson wrote: > > What PS#2 ...

5 years, 2 months ago (2015-10-08 19:52:21 UTC) #15

reed1

Hmmmm. 1. I can believe the slow down, esp. in the very trivial cases where ...

5 years, 2 months ago (2015-10-08 20:32:18 UTC) #16

f(malita)

On 2015/10/08 20:32:18, reed1 wrote: > Hmmmm. > > 1. I can believe the slow ...

5 years, 2 months ago (2015-10-08 21:09:46 UTC) #17

On 2015/10/08 20:32:18, reed1 wrote:
> Hmmmm.
> 
> 1. I can believe the slow down, esp. in the very trivial cases where our loop
is
> only 3-4 instructions.
> 
> 2. Not sure that I like using the template per-se, in that it is a lot of
binary
> code-bloat.
> 
> 3. In the "big" picture, not sure how important this perf-diff is
> 
> 4. In the medium term, we are very likely to rewrite much of this code anyway,
> to handle the new hot bugs around steep gradients with lots of color-stops.

ACK

> 5. Given all of that, I am willing to go with whatever approach you choose,
but
> I do think we should consider the table itself (i.e. setting the bias to 1/2).

Done, see patch set #5.  I like this approach because it's contained and solves
both the rounding bias & disables dithering in one swoop - plus there's no perf
regression.

The gotcha is we may need to recompute the cache tables if the shader is reused
with a different dither flag, but I presume this case is uncommon.

(not sure whether there was any hidden bias in Build16bitCache that I missed,
please double-check)


[ToT]

out/Release/nanobench -m gradient_linear_clamp --samplingTime 1s --config 8888
565
Timer overhead: 37.1ns
curr/maxrss	loops	min	median	mean	max	stddev	samples	config	bench
  33/33 
MB	2	93.9µs	94.5µs	102µs	221µs	19%	3997	8888	gradient_linear_clamp_shallow
  32/33 
MB	5	80.8µs	81.2µs	81.7µs	140µs	4%	2318	565	gradient_linear_clamp_shallow
  33/33 
MB	5	76.5µs	77.4µs	77.6µs	110µs	2%	2316	8888	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	80.8µs	81.2µs	81.5µs	135µs	3%	2323	565	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	76.5µs	77.4µs	77.5µs	97.8µs	1%	2319	8888	gradient_linear_clamp_3color
  33/33 
MB	5	80.8µs	81.2µs	81.5µs	140µs	3%	2323	565	gradient_linear_clamp_3color
  33/33 
MB	5	76.5µs	77.4µs	77.6µs	108µs	2%	2317	8888	gradient_linear_clamp_hicolor
  33/33 
MB	5	80.8µs	81.2µs	81.6µs	134µs	4%	2320	565	gradient_linear_clamp_hicolor
  33/33  MB	5	76.5µs	77.4µs	77.4µs	89.4µs	1%	2320	8888	gradient_linear_clamp
  33/33  MB	5	80.8µs	81.2µs	81.6µs	133µs	3%	2322	565	gradient_linear_clamp


[PS#5]

out/Release/nanobench -m gradient_linear_clamp --samplingTime 1s --config 8888
565
Timer overhead: 37.3ns
curr/maxrss	loops	min	median	mean	max	stddev	samples	config	bench
  33/33 
MB	2	94.1µs	94.6µs	102µs	218µs	19%	3994	8888	gradient_linear_clamp_shallow
  32/33 
MB	5	80.7µs	81.1µs	81.5µs	103µs	2%	2322	565	gradient_linear_clamp_shallow
  33/33 
MB	5	76.6µs	77.2µs	77.4µs	124µs	2%	2321	8888	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	80.8µs	81.2µs	81.7µs	134µs	3%	2318	565	gradient_linear_clamp_shallow_dither
  33/33 
MB	5	76.6µs	77.2µs	77.5µs	97.4µs	2%	2320	8888	gradient_linear_clamp_3color
  33/33 
MB	5	80.8µs	81.1µs	81.6µs	142µs	3%	2320	565	gradient_linear_clamp_3color
  33/33 
MB	5	76.6µs	77.2µs	77.5µs	127µs	3%	2318	8888	gradient_linear_clamp_hicolor
  33/33 
MB	5	80.7µs	81.3µs	82.6µs	121µs	5%	2293	565	gradient_linear_clamp_hicolor
  33/33  MB	5	76.6µs	77.2µs	77.5µs	96.1µs	2%	2319	8888	gradient_linear_clamp
  33/33  MB	5	80.7µs	81.4µs	88.6µs	218µs	22%	2137	565	gradient_linear_clamp

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1391303002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1391303002/80001

5 years, 2 months ago (2015-10-08 21:12:54 UTC) #19

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

5 years, 2 months ago (2015-10-08 21:25:04 UTC) #21

tomhudson

> The gotcha is we may need to recompute the cache tables if the shader ...

5 years, 2 months ago (2015-10-08 21:37:32 UTC) #22

f(malita)

On 2015/10/08 21:37:32, tomhudson wrote: > > The gotcha is we may need to recompute ...

5 years, 2 months ago (2015-10-09 12:41:38 UTC) #23

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1391303002/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1391303002/80001

5 years, 2 months ago (2015-10-09 17:22:09 UTC) #26

commit-bot: I haz the power

5 years, 2 months ago (2015-10-09 17:23:16 UTC) #27

Message was sent while issue was closed.

Committed patchset #5 (id:80001) as
https://skia.googlesource.com/skia/+/37d8688927988cc71e968e0b5e90689e4e6ed7ab

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages