|
|
DescriptionSkSwizzler: Factor skipping zeros out into a helper function.
I figure something like this lets us not worry about it in the new opts.
This skips only leading zeros per-scanline, not all zeros, but my bet is that
leading zeros are all that matters: it's got to be rare that a scanline is both
larger than 1024 pixels and has runs of 1024 transparent pixels in the middle.
I bet the big bang for the buck comes from skipping full scanlines (or even
multiple adjacent scanlines).
BUG=skia:4767
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1566653007
Committed: https://skia.googlesource.com/skia/+/8604ca2d84cfa9527705c69d74a8449b532e0ee4
Patch Set 1 #
Total comments: 1
Patch Set 2 : split in two #Patch Set 3 : back to 1 #Patch Set 4 : rebase, back to PS 1 strategy with note #Patch Set 5 : clearer #Messages
Total messages: 32 (9 generated)
Description was changed from ========== SkSwizzler: Factor skipping zeros out into a helper function. I figure something like this lets us not worry about it in the new opts. This skips only leading zeros per-scanline, not all zeros, but my bet is that leading zeros are all that matters: it's got to be rare that a scanline is both larger than 1024 pixels and has runs of 1024 transparent pixels in the middle. I bet the big bang for the buck comes from skipping full scanlines (or even multiple adjacent scanlines). BUG=skia:4767 ========== to ========== SkSwizzler: Factor skipping zeros out into a helper function. I figure something like this lets us not worry about it in the new opts. This skips only leading zeros per-scanline, not all zeros, but my bet is that leading zeros are all that matters: it's got to be rare that a scanline is both larger than 1024 pixels and has runs of 1024 transparent pixels in the middle. I bet the big bang for the buck comes from skipping full scanlines (or even multiple adjacent scanlines). BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
The CQ bit was checked by mtklein@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1566653007/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1566653007/1
mtklein@chromium.org changed reviewers: + msarett@google.com, scroggo@google.com - mtklein@google.com
Seem sane? How do I best test / bench this?
On 2016/01/08 21:21:40, mtklein_C wrote: > Seem sane? Yes. LGTM. > How do I best test / bench this? :( Currently untested, except by running on Android. We should add a test in DM that uses calloc and then uses kYes_ZeroInitialized. There is a benchmark called SkipZeroesBench which tests using SkImageDecoder. We should update it to use SkCodec. https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp File src/codec/SkSwizzler.cpp (right): https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp#ne... src/codec/SkSwizzler.cpp:474: while (dstWidth > 0 && *src32 == 0) { FWIW, when we are premultiplying, we really only need to look at the alpha value, since if the rest of them will be zero in the output whatever they are. I don't know that that helps any, or that there are actually images encoded like that.
> https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp#ne... > src/codec/SkSwizzler.cpp:474: while (dstWidth > 0 && *src32 == 0) { > FWIW, when we are premultiplying, we really only need to look at the alpha > value, since if the rest of them will be zero in the output whatever they are. I > don't know that that helps any, or that there are actually images encoded like > that. Ah, I see, this won't skip 0x00FFFFFF even though it could. It does seem weird to encode transparent as anything but 0x00000000. It's probably no faster or slower right now to test the pixel or to test alpha, but testing the pixel is easier to make faster (e.g. 64-bit reads of 2 pixels, 128-bit reads of 4 pixels).
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/01/08 21:41:34, mtklein wrote: > > > https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp#ne... > > src/codec/SkSwizzler.cpp:474: while (dstWidth > 0 && *src32 == 0) { > > FWIW, when we are premultiplying, we really only need to look at the alpha > > value, since if the rest of them will be zero in the output whatever they are. > I > > don't know that that helps any, or that there are actually images encoded like > > that. > > Ah, I see, this won't skip 0x00FFFFFF even though it could. It does seem weird > to encode transparent as anything but 0x00000000. > It's probably no faster or slower right now to test the pixel or to test alpha, > but testing the pixel is easier to make faster (e.g. 64-bit reads of 2 pixels, > 128-bit reads of 4 pixels). Let's just check the alpha for now. Then, as next steps, we can look into: How much faster can we go if we check the whole pixel for zero? Are there actually PNGs encoded with zeto alpha and non-zero components? - There are always "some", but maybe it's not common.
On 2016/01/08 21:46:28, msarett wrote: > On 2016/01/08 21:41:34, mtklein wrote: > > > > > > https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp#ne... > > > src/codec/SkSwizzler.cpp:474: while (dstWidth > 0 && *src32 == 0) { > > > FWIW, when we are premultiplying, we really only need to look at the alpha > > > value, since if the rest of them will be zero in the output whatever they > are. > > I > > > don't know that that helps any, or that there are actually images encoded > like > > > that. > > > > Ah, I see, this won't skip 0x00FFFFFF even though it could. It does seem > weird > > to encode transparent as anything but 0x00000000. > > It's probably no faster or slower right now to test the pixel or to test > alpha, > > but testing the pixel is easier to make faster (e.g. 64-bit reads of 2 pixels, > > 128-bit reads of 4 pixels). > > Let's just check the alpha for now. > > Then, as next steps, we can look into: > How much faster can we go if we check the whole pixel for zero? > Are there actually PNGs encoded with zeto alpha and non-zero components? > - There are always "some", but maybe it's not common. OK. That means I've got to drop the unpremul case, right?
On 2016/01/08 21:53:47, mtklein wrote: > On 2016/01/08 21:46:28, msarett wrote: > > On 2016/01/08 21:41:34, mtklein wrote: > > > > > > > > > > https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp#ne... > > > > src/codec/SkSwizzler.cpp:474: while (dstWidth > 0 && *src32 == 0) { > > > > FWIW, when we are premultiplying, we really only need to look at the alpha > > > > value, since if the rest of them will be zero in the output whatever they > > are. > > > I > > > > don't know that that helps any, or that there are actually images encoded > > like > > > > that. > > > > > > Ah, I see, this won't skip 0x00FFFFFF even though it could. It does seem > > weird > > > to encode transparent as anything but 0x00000000. > > > It's probably no faster or slower right now to test the pixel or to test > > alpha, > > > but testing the pixel is easier to make faster (e.g. 64-bit reads of 2 > pixels, > > > 128-bit reads of 4 pixels). > > > > Let's just check the alpha for now. > > > > Then, as next steps, we can look into: > > How much faster can we go if we check the whole pixel for zero? > > Are there actually PNGs encoded with zeto alpha and non-zero components? > > - There are always "some", but maybe it's not common. > > OK. That means I've got to drop the unpremul case, right? Actually, just realized, the full pixel test is correct for UPM, the alpha byte test is correct for PM.... so, two functions. Duh.
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1566653007/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1566653007/20001
On 2016/01/08 22:01:31, mtklein wrote: > On 2016/01/08 21:53:47, mtklein wrote: > > On 2016/01/08 21:46:28, msarett wrote: > > > On 2016/01/08 21:41:34, mtklein wrote: > > > > > > > > > > > > > > > https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp#ne... > > > > > src/codec/SkSwizzler.cpp:474: while (dstWidth > 0 && *src32 == 0) { > > > > > FWIW, when we are premultiplying, we really only need to look at the > alpha > > > > > value, since if the rest of them will be zero in the output whatever > they > > > are. > > > > I > > > > > don't know that that helps any, or that there are actually images > encoded > > > like > > > > > that. > > > > > > > > Ah, I see, this won't skip 0x00FFFFFF even though it could. It does seem > > > weird > > > > to encode transparent as anything but 0x00000000. > > > > It's probably no faster or slower right now to test the pixel or to test > > > alpha, > > > > but testing the pixel is easier to make faster (e.g. 64-bit reads of 2 > > pixels, > > > > 128-bit reads of 4 pixels). > > > > > > Let's just check the alpha for now. > > > > > > Then, as next steps, we can look into: > > > How much faster can we go if we check the whole pixel for zero? > > > Are there actually PNGs encoded with zeto alpha and non-zero components? > > > - There are always "some", but maybe it's not common. > > > > OK. That means I've got to drop the unpremul case, right? > > Actually, just realized, the full pixel test is correct for UPM, the alpha byte > test is correct for PM.... so, two functions. Duh. sgtm I think we have the option to not check zero init for Unpremul? Since we have never done it, it would not be a regression? Also, I'm not convinced that it matters what we put in RGB for an Unpremul pixel where A=0. But I think someone has explained this to me before?
On 2016/01/08 22:05:15, msarett wrote: > On 2016/01/08 22:01:31, mtklein wrote: > > On 2016/01/08 21:53:47, mtklein wrote: > > > On 2016/01/08 21:46:28, msarett wrote: > > > > On 2016/01/08 21:41:34, mtklein wrote: > > > > > > > > > > > > > > > > > > > > > https://codereview.chromium.org/1566653007/diff/1/src/codec/SkSwizzler.cpp#ne... > > > > > > src/codec/SkSwizzler.cpp:474: while (dstWidth > 0 && *src32 == 0) { > > > > > > FWIW, when we are premultiplying, we really only need to look at the > > alpha > > > > > > value, since if the rest of them will be zero in the output whatever > > they > > > > are. > > > > > I > > > > > > don't know that that helps any, or that there are actually images > > encoded > > > > like > > > > > > that. > > > > > > > > > > Ah, I see, this won't skip 0x00FFFFFF even though it could. It does > seem > > > > weird > > > > > to encode transparent as anything but 0x00000000. > > > > > It's probably no faster or slower right now to test the pixel or to test > > > > alpha, > > > > > but testing the pixel is easier to make faster (e.g. 64-bit reads of 2 > > > pixels, > > > > > 128-bit reads of 4 pixels). > > > > > > > > Let's just check the alpha for now. > > > > > > > > Then, as next steps, we can look into: > > > > How much faster can we go if we check the whole pixel for zero? > > > > Are there actually PNGs encoded with zeto alpha and non-zero components? > > > > - There are always "some", but maybe it's not common. > > > > > > OK. That means I've got to drop the unpremul case, right? > > > > Actually, just realized, the full pixel test is correct for UPM, the alpha > byte > > test is correct for PM.... so, two functions. Duh. > > sgtm > > I think we have the option to not check zero init for Unpremul? Since we have > never done it, it would not be a regression? > > Also, I'm not convinced that it matters what we put in RGB for an Unpremul pixel > where A=0. But I think someone has explained this to me before? You're suggesting do what the commented out hack did, read the alpha and skip the whole pixel if it's 0? E.g. skip 0x00FFFFFF even for UPM output, effectively transforming that into 0x00000000? That makes sense to me, especially because the client has a way to turn it off (stop setting the zero-intialized bit in Options).
PS3 puts things back to 1 function checking only alpha, applied to UPM and PM. Won't land this until next week so we can talk it over and make some perf measurements.
I think this is a great change! "What are the effects of only skipping leading zeros?" I'm sure there are images where there are 1024 consecutive zero pixels (not in the same row). I have some test images that look that way. But this will certainly catch the egregious cases. I'm perfectly fine with that and I hope Android will be too. Leon, any ideas how to measure memory usage? Can you link the old bug? In terms of performance, I think this is awesome for two reasons: (1) Many of my test pngs are opaque figures on a transparent background. We may actually make performance gains by skipping the premul on the transparent backgrounds. (now we need to figure out if we can efficiently skip the premul on opaque pixels :)) (2) This allows us to use the accelerated premultiply after we have determined that the row has non-zero pixels. "How do we test?" Mike, thanks for improving CodecBench. I'll add something to DM on Monday (since we'll iterating on this). But that doesn't need to hold this CL up. I've shared some RGBA encoded PNGs that I think are useful for benching. LGTM
On 2016/01/08 22:19:49, mtklein_C wrote: > PS3 puts things back to 1 function checking only alpha, applied to UPM and PM. > > Won't land this until next week so we can talk it over and make some perf > measurements. Yes we can talk. I don't feel strongly about the UPM zero checking. Sorry if I made you back and forth.
On 2016/01/08 at 22:24:00, msarett wrote: > On 2016/01/08 22:19:49, mtklein_C wrote: > > PS3 puts things back to 1 function checking only alpha, applied to UPM and PM. > > > > Won't land this until next week so we can talk it over and make some perf > > measurements. > > Yes we can talk. I don't feel strongly about the UPM zero checking. > > Sorry if I made you back and forth. I've enjoyed creating multiple viable options, one per patch set. :)
On 2016/01/08 22:22:18, msarett wrote: > I think this is a great change! > > "What are the effects of only skipping leading zeros?" > > I'm sure there are images where there are 1024 consecutive zero pixels (not in > the same row). I have some test images that look that way. Oh, of course - the actual image is in the center, so the transparent pixels from the right side of the actual image to the left side on the next line ends up being more than a page. So maybe this is not good enough.... > > But this will certainly catch the egregious cases. I'm perfectly fine with that > and I hope Android will be too. Leon, any ideas how to measure memory usage? In order to measure memory usage, I'd imagine we need to use or mimic the system on Android that shares the pages which are never dirtied. In the original bug, romlem@ (who originally requested the feature) first reported that it saved 8-9 MB during startup on an occam-svelte (https://b.corp.google.com/u/0/issues/10016979#comment32), and later reported the difference (in KSM) to be about 1745 pages (https://b.corp.google.com/u/0/issues/10016979#comment38). I'm not sure how he measured. > Can you link the old bug? b/10016979 > > In terms of performance, I think this is awesome for two reasons: > (1) Many of my test pngs are opaque figures on a transparent background. We may > actually make performance gains by skipping the premul on the transparent > backgrounds. (now we need to figure out if we can efficiently skip the premul > on opaque pixels :)) What do you have in mind for skipping the premul on opaque pixels? FWIW, SkPremultiplyARGBInline (thanks for finding that, mtklein@!) already checks for a == 255. > (2) This allows us to use the accelerated premultiply after we have determined > that the row has non-zero pixels. > > "How do we test?" > > Mike, thanks for improving CodecBench. I'll add something to DM on Monday > (since we'll iterating on this). But that doesn't need to hold this CL up. > > I've shared some RGBA encoded PNGs that I think are useful for benching. > > LGTM
On 2016/01/11 14:47:02, scroggo wrote: > On 2016/01/08 22:22:18, msarett wrote: > > I think this is a great change! > > > > "What are the effects of only skipping leading zeros?" > > > > I'm sure there are images where there are 1024 consecutive zero pixels (not in > > the same row). I have some test images that look that way. > > Oh, of course - the actual image is in the center, so the transparent pixels > from the right side of the actual image to the left side on the next line ends > up being more than a page. So maybe this is not good enough.... > > > > > But this will certainly catch the egregious cases. I'm perfectly fine with > that > > and I hope Android will be too. Leon, any ideas how to measure memory usage? > > In order to measure memory usage, I'd imagine we need to use or mimic the system > on Android that shares the pages which are never dirtied. In the original bug, > romlem@ (who originally requested the feature) first reported that it saved 8-9 > MB during startup on an occam-svelte > (https://b.corp.google.com/u/0/issues/10016979#comment32), and later reported > the difference (in KSM) to be about 1745 pages > (https://b.corp.google.com/u/0/issues/10016979#comment38). I'm not sure how he > measured. > > > Can you link the old bug? > > b/10016979 > > > > > In terms of performance, I think this is awesome for two reasons: > > (1) Many of my test pngs are opaque figures on a transparent background. We > may > > actually make performance gains by skipping the premul on the transparent > > backgrounds. (now we need to figure out if we can efficiently skip the premul > > on opaque pixels :)) > > What do you have in mind for skipping the premul on opaque pixels? FWIW, > SkPremultiplyARGBInline (thanks for finding that, mtklein@!) already checks for > a == 255. I think in SIMD it will be cheaper to always do the multiply rather than check for 255 and then decide whether or not to multiply. This is just an idea if there happens to be a fast way to check for 255 (SSE has one - still not sure if it's worth using though). > > > (2) This allows us to use the accelerated premultiply after we have determined > > that the row has non-zero pixels. > > > > "How do we test?" > > > > Mike, thanks for improving CodecBench. I'll add something to DM on Monday > > (since we'll iterating on this). But that doesn't need to hold this CL up. > > > > I've shared some RGBA encoded PNGs that I think are useful for benching. > > > > LGTM
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1566653007/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1566653007/80001
ok guys, ptal PS 5 is rebased and back to its original strategy: check all 4 bytes for zero, both for unpremul output (perfect) and for premul output (false negatives on weird transparent pixels like 0x00ABCDEF)
lgtm
On 2016/01/11 21:07:08, msarett wrote: > lgtm lgtm 2
The CQ bit was unchecked by mtklein@chromium.org
The CQ bit was checked by mtklein@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1566653007/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1566653007/80001
Message was sent while issue was closed.
Description was changed from ========== SkSwizzler: Factor skipping zeros out into a helper function. I figure something like this lets us not worry about it in the new opts. This skips only leading zeros per-scanline, not all zeros, but my bet is that leading zeros are all that matters: it's got to be rare that a scanline is both larger than 1024 pixels and has runs of 1024 transparent pixels in the middle. I bet the big bang for the buck comes from skipping full scanlines (or even multiple adjacent scanlines). BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== SkSwizzler: Factor skipping zeros out into a helper function. I figure something like this lets us not worry about it in the new opts. This skips only leading zeros per-scanline, not all zeros, but my bet is that leading zeros are all that matters: it's got to be rare that a scanline is both larger than 1024 pixels and has runs of 1024 transparent pixels in the middle. I bet the big bang for the buck comes from skipping full scanlines (or even multiple adjacent scanlines). BUG=skia:4767 GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... Committed: https://skia.googlesource.com/skia/+/8604ca2d84cfa9527705c69d74a8449b532e0ee4 ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001) as https://skia.googlesource.com/skia/+/8604ca2d84cfa9527705c69d74a8449b532e0ee4 |