|
|
Descriptionspeedup mip builders
1. push the inner-loop into the procs
2. for the 3x3 and 3x2 cases, skip the redundant horizontal read/expand
Some before/after timings (unfortunately there's a lot of variance) -- on MacBook Air
Before
9/9 MB 1 1.45ms 1.65ms 1.92ms 2.93ms 31% ▂█▂▁▁▁▂█▆▂ nonrendering mipmap_build_512x512
9/9 MB 1 1.85ms 2.33ms 2.47ms 3.69ms 28% ▃██▄▃▂▁▂▁▁ nonrendering mipmap_build_511x512
9/9 MB 1 2.15ms 2.21ms 2.37ms 3.28ms 15% █▂▂▁▁▁▁▅▁▁ nonrendering mipmap_build_512x511
9/9 MB 1 2.74ms 3.9ms 4.03ms 5.89ms 25% ▄▂▃▄█▂▁▂▇▅ nonrendering mipmap_build_511x511
After
10/10 MB 1 1.08ms 1.09ms 1.1ms 1.18ms 3% ▁▁▁▁▁▁▁█▃▁ nonrendering mipmap_build_512x512
10/10 MB 1 1.22ms 1.44ms 1.66ms 2.83ms 30% ▂▂▄▁▁▃█▅▂▁ nonrendering mipmap_build_511x512
10/10 MB 1 1.45ms 1.91ms 2.04ms 3.75ms 36% ▁▁▁▃▅█▃▂▂▂ nonrendering mipmap_build_512x511
10/10 MB 1 1.7ms 1.7ms 1.81ms 2.41ms 13% █▁▁▁▁▁▁▁▁▄ nonrendering mipmap_build_511x511
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1593073002
Committed: https://skia.googlesource.com/skia/+/326253ef22d39e72a729e8069e54b34ade72ad1b
Patch Set 1 : 1. push inner-loop into proc #Patch Set 2 : 2. remove redundant load/expand of 1 pixel per-row #
Total comments: 4
Patch Set 3 : fix copy/paste bug, remove unneeded prevW/H #Patch Set 4 : smarten up loop setup (via fmalita) #Messages
Total messages: 28 (13 generated)
Description was changed from ========== speedup mip builders BUG=skia: ========== to ========== speedup mip builders BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
reed@google.com changed reviewers: + fmalita@chromium.org, mtklein@google.com
ptal I bet a next improvement may be using faster ways to expand/compact ...
Description was changed from ========== speedup mip builders BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== speedup mip builders 1. push the inner-loop into the procs 2. for the 3x3 and 3x2 cases, skip the redundant horizontal read/expand Some before/after timings (unfortunately there's a lot of variance) -- on MacBook Air Before 9/9 MB 1 1.45ms 1.65ms 1.92ms 2.93ms 31% ▂█▂▁▁▁▂█▆▂ nonrendering mipmap_build_512x512 9/9 MB 1 1.85ms 2.33ms 2.47ms 3.69ms 28% ▃██▄▃▂▁▂▁▁ nonrendering mipmap_build_511x512 9/9 MB 1 2.15ms 2.21ms 2.37ms 3.28ms 15% █▂▂▁▁▁▁▅▁▁ nonrendering mipmap_build_512x511 9/9 MB 1 2.74ms 3.9ms 4.03ms 5.89ms 25% ▄▂▃▄█▂▁▂▇▅ nonrendering mipmap_build_511x511 After 9/9 MB 1 1.23ms 1.24ms 1.24ms 1.27ms 1% ▂▂▃▆█▁▃▃▁▁ nonrendering mipmap_build_512x512 10/10 MB 1 1.35ms 1.4ms 1.61ms 2.68ms 27% ▁▁▂▄█▁▁▁▁▃ nonrendering mipmap_build_511x512 10/10 MB 1 1.61ms 1.62ms 1.69ms 2.25ms 12% █▁▁▁▁▁▁▂▁▁ nonrendering mipmap_build_512x511 10/10 MB 1 1.7ms 1.83ms 2.1ms 3.5ms 29% ▁▁▁▁▁▅█▂▂▄ nonrendering mipmap_build_511x511 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1593073002/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1593073002/20001
The CQ bit was unchecked by reed@google.com
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1593073002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1593073002/40001
Awesome! https://codereview.chromium.org/1593073002/diff/20001/src/core/SkMipMap.cpp File src/core/SkMipMap.cpp (right): https://codereview.chromium.org/1593073002/diff/20001/src/core/SkMipMap.cpp#n... src/core/SkMipMap.cpp:430: } Instead of unrolling a full iteration, could we set up just the relevant values and then fall through a regular 0-based loop? auto c02 = F::Expand(p0[0]); auto c12 = F::Expand(p1[0]); for (int i = 0; i < count; ++i) { auto c00 = c02; auto c01 = F::Expand(p0[1]); c02 = F::Expand(p0[2]); auto c10 = c12; auto c11 = F::Expand(p1[1]); c12 = F::Expand(p1[2]); auto c = add_121(c00, c01, c02) + add_121(c10, c11, c12); d[i] = F::Compact(c >> 3); p0 += 2; p1 += 2; } https://codereview.chromium.org/1593073002/diff/20001/src/core/SkMipMap.cpp#n... src/core/SkMipMap.cpp:489: } Ditto.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was unchecked by reed@google.com
https://codereview.chromium.org/1593073002/diff/20001/src/core/SkMipMap.cpp File src/core/SkMipMap.cpp (right): https://codereview.chromium.org/1593073002/diff/20001/src/core/SkMipMap.cpp#n... src/core/SkMipMap.cpp:430: } On 2016/01/16 16:46:10, f(malita) wrote: > Instead of unrolling a full iteration, could we set up just the relevant values > and then fall through a regular 0-based loop? > > auto c02 = F::Expand(p0[0]); > auto c12 = F::Expand(p1[0]); > > for (int i = 0; i < count; ++i) { > auto c00 = c02; > auto c01 = F::Expand(p0[1]); > c02 = F::Expand(p0[2]); > > auto c10 = c12; > auto c11 = F::Expand(p1[1]); > c12 = F::Expand(p1[2]); > > auto c = add_121(c00, c01, c02) + add_121(c10, c11, c12); > d[i] = F::Compact(c >> 3); > > p0 += 2; > p1 += 2; > } Good catch! Thanks. https://codereview.chromium.org/1593073002/diff/20001/src/core/SkMipMap.cpp#n... src/core/SkMipMap.cpp:489: } On 2016/01/16 16:46:10, f(malita) wrote: > Ditto. Done.
Description was changed from ========== speedup mip builders 1. push the inner-loop into the procs 2. for the 3x3 and 3x2 cases, skip the redundant horizontal read/expand Some before/after timings (unfortunately there's a lot of variance) -- on MacBook Air Before 9/9 MB 1 1.45ms 1.65ms 1.92ms 2.93ms 31% ▂█▂▁▁▁▂█▆▂ nonrendering mipmap_build_512x512 9/9 MB 1 1.85ms 2.33ms 2.47ms 3.69ms 28% ▃██▄▃▂▁▂▁▁ nonrendering mipmap_build_511x512 9/9 MB 1 2.15ms 2.21ms 2.37ms 3.28ms 15% █▂▂▁▁▁▁▅▁▁ nonrendering mipmap_build_512x511 9/9 MB 1 2.74ms 3.9ms 4.03ms 5.89ms 25% ▄▂▃▄█▂▁▂▇▅ nonrendering mipmap_build_511x511 After 9/9 MB 1 1.23ms 1.24ms 1.24ms 1.27ms 1% ▂▂▃▆█▁▃▃▁▁ nonrendering mipmap_build_512x512 10/10 MB 1 1.35ms 1.4ms 1.61ms 2.68ms 27% ▁▁▂▄█▁▁▁▁▃ nonrendering mipmap_build_511x512 10/10 MB 1 1.61ms 1.62ms 1.69ms 2.25ms 12% █▁▁▁▁▁▁▂▁▁ nonrendering mipmap_build_512x511 10/10 MB 1 1.7ms 1.83ms 2.1ms 3.5ms 29% ▁▁▁▁▁▅█▂▂▄ nonrendering mipmap_build_511x511 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== speedup mip builders 1. push the inner-loop into the procs 2. for the 3x3 and 3x2 cases, skip the redundant horizontal read/expand Some before/after timings (unfortunately there's a lot of variance) -- on MacBook Air Before 9/9 MB 1 1.45ms 1.65ms 1.92ms 2.93ms 31% ▂█▂▁▁▁▂█▆▂ nonrendering mipmap_build_512x512 9/9 MB 1 1.85ms 2.33ms 2.47ms 3.69ms 28% ▃██▄▃▂▁▂▁▁ nonrendering mipmap_build_511x512 9/9 MB 1 2.15ms 2.21ms 2.37ms 3.28ms 15% █▂▂▁▁▁▁▅▁▁ nonrendering mipmap_build_512x511 9/9 MB 1 2.74ms 3.9ms 4.03ms 5.89ms 25% ▄▂▃▄█▂▁▂▇▅ nonrendering mipmap_build_511x511 After 10/10 MB 1 1.08ms 1.09ms 1.1ms 1.18ms 3% ▁▁▁▁▁▁▁█▃▁ nonrendering mipmap_build_512x512 10/10 MB 1 1.22ms 1.44ms 1.66ms 2.83ms 30% ▂▂▄▁▁▃█▅▂▁ nonrendering mipmap_build_511x512 10/10 MB 1 1.45ms 1.91ms 2.04ms 3.75ms 36% ▁▁▁▃▅█▃▂▂▂ nonrendering mipmap_build_512x511 10/10 MB 1 1.7ms 1.7ms 1.81ms 2.41ms 13% █▁▁▁▁▁▁▁▁▄ nonrendering mipmap_build_511x511 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1593073002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1593073002/60001
ptal
LGTM On 2016/01/16 15:54:54, reed1 wrote: > I bet a next improvement may be using faster ways to expand/compact ... Thinking of Blink possibly switching to mipmaps for HQ downscaling, and the horrible hack people use to make their images "Retina-ready" - it seems we may see a bunch of huge imgs scaled down by a factor of 2 exactly. One thing that occurred to me is we're always building the full mipmap table - but it's reasonable to assume we'll only need one (maybe two, for trilerp) level most of the time. Is there a fundamental reason why building needed levels lazily won't work?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
I'm been thinking a little bit about lazy scaling too. I don't see any problem per-se with it. We'll allocate the full tree the first time (I presume) and just have a way to mark a given level as having been already built. All that said, if the common level is the first, the time spent to build the rest is only 33% more (or perhaps even slightly less, as we'll get better memory-cache-line usage as we get smaller). And, this code is already gobs faster than the current HQ scaling cost. Anyway, I think that'd be a fine experiment, to see how complicated (or not) it makes the code.
I'm also interested in trying out tri-lerp, as something we could enable in HQ mode. Not sure how much it will help, but something to try. It will definitely be slower.
The CQ bit was checked by reed@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1593073002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1593073002/60001
Message was sent while issue was closed.
Description was changed from ========== speedup mip builders 1. push the inner-loop into the procs 2. for the 3x3 and 3x2 cases, skip the redundant horizontal read/expand Some before/after timings (unfortunately there's a lot of variance) -- on MacBook Air Before 9/9 MB 1 1.45ms 1.65ms 1.92ms 2.93ms 31% ▂█▂▁▁▁▂█▆▂ nonrendering mipmap_build_512x512 9/9 MB 1 1.85ms 2.33ms 2.47ms 3.69ms 28% ▃██▄▃▂▁▂▁▁ nonrendering mipmap_build_511x512 9/9 MB 1 2.15ms 2.21ms 2.37ms 3.28ms 15% █▂▂▁▁▁▁▅▁▁ nonrendering mipmap_build_512x511 9/9 MB 1 2.74ms 3.9ms 4.03ms 5.89ms 25% ▄▂▃▄█▂▁▂▇▅ nonrendering mipmap_build_511x511 After 10/10 MB 1 1.08ms 1.09ms 1.1ms 1.18ms 3% ▁▁▁▁▁▁▁█▃▁ nonrendering mipmap_build_512x512 10/10 MB 1 1.22ms 1.44ms 1.66ms 2.83ms 30% ▂▂▄▁▁▃█▅▂▁ nonrendering mipmap_build_511x512 10/10 MB 1 1.45ms 1.91ms 2.04ms 3.75ms 36% ▁▁▁▃▅█▃▂▂▂ nonrendering mipmap_build_512x511 10/10 MB 1 1.7ms 1.7ms 1.81ms 2.41ms 13% █▁▁▁▁▁▁▁▁▄ nonrendering mipmap_build_511x511 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== speedup mip builders 1. push the inner-loop into the procs 2. for the 3x3 and 3x2 cases, skip the redundant horizontal read/expand Some before/after timings (unfortunately there's a lot of variance) -- on MacBook Air Before 9/9 MB 1 1.45ms 1.65ms 1.92ms 2.93ms 31% ▂█▂▁▁▁▂█▆▂ nonrendering mipmap_build_512x512 9/9 MB 1 1.85ms 2.33ms 2.47ms 3.69ms 28% ▃██▄▃▂▁▂▁▁ nonrendering mipmap_build_511x512 9/9 MB 1 2.15ms 2.21ms 2.37ms 3.28ms 15% █▂▂▁▁▁▁▅▁▁ nonrendering mipmap_build_512x511 9/9 MB 1 2.74ms 3.9ms 4.03ms 5.89ms 25% ▄▂▃▄█▂▁▂▇▅ nonrendering mipmap_build_511x511 After 10/10 MB 1 1.08ms 1.09ms 1.1ms 1.18ms 3% ▁▁▁▁▁▁▁█▃▁ nonrendering mipmap_build_512x512 10/10 MB 1 1.22ms 1.44ms 1.66ms 2.83ms 30% ▂▂▄▁▁▃█▅▂▁ nonrendering mipmap_build_511x512 10/10 MB 1 1.45ms 1.91ms 2.04ms 3.75ms 36% ▁▁▁▃▅█▃▂▂▂ nonrendering mipmap_build_512x511 10/10 MB 1 1.7ms 1.7ms 1.81ms 2.41ms 13% █▁▁▁▁▁▁▁▁▄ nonrendering mipmap_build_511x511 BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... Committed: https://skia.googlesource.com/skia/+/326253ef22d39e72a729e8069e54b34ade72ad1b ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as https://skia.googlesource.com/skia/+/326253ef22d39e72a729e8069e54b34ade72ad1b
Message was sent while issue was closed.
The numbers are still very noisy, but here's a bench run only building the first level... 9/9 MB 1 922µs 930µs 929µs 950µs 1% ▄▁▁▄▁▁▁▃█▃ nonrendering mipmap_build_512x512 9/9 MB 1 889µs 1.05ms 1.06ms 1.54ms 17% ▃▃▃▂▂█▃▂▁▁ nonrendering mipmap_build_511x512 9/9 MB 1 1.06ms 1.14ms 1.31ms 2.15ms 30% █▆▃▂▂▁▁▁▁▁ nonrendering mipmap_build_512x511 9/9 MB 1 1.28ms 1.69ms 1.65ms 2.54ms 22% ▃▃▄▃▃▂▁▁▁█ nonrendering mipmap_build_511x511 |