|
|
Descriptionleave pixel memory uninitialized for opaque alpha type in SkSurface::MakeRaster
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2412633002
Committed: https://skia.googlesource.com/skia/+/c64ef3563dc8badac3d64544513b03df826cf8c3
Patch Set 1 #
Messages
Total messages: 38 (11 generated)
Description was changed from ========== leave pixel memory uninitialized for opaque alpha type in SkSurface::MakeRaster BUG=skia: ========== to ========== leave pixel memory uninitialized for opaque alpha type in SkSurface::MakeRaster BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2412633002 ==========
lsalzman@mozilla.com changed reviewers: + reed@google.com
This saves the caller the cost of a double-clear for opaque alpha types where it is not appropriate to zero out alpha pixels, like is done for SkBitmapDevice. If opaque, the pixels are left uninitialized. Otherwise, they are zeroed out as before.
reed@google.com changed reviewers: + bsalomon@google.com, mtklein@google.com
mike/brian, are we ok to go back to the previous (raster at least) behavior of only zeroing the surface initially if the imageinfo is not opaque?
The CQ bit was checked by mtklein@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Sure. The important bit from the previous CL was replacing malloc()->bzero() with just calloc(). Beyond that, I think we just figured calloc() was usually negligibly more expensive than malloc(), and that it's easier to reason about zeroed memory than conditionally uninitialized memory. Just out of curiosity, how much faster does this make Firefox?
On 2016/10/12 00:32:43, mtklein_C wrote: > Sure. The important bit from the previous CL was replacing malloc()->bzero() > with just calloc(). Beyond that, I think we just figured calloc() was usually > negligibly more expensive than malloc(), and that it's easier to reason about > zeroed memory than conditionally uninitialized memory. > > Just out of curiosity, how much faster does this make Firefox? Don't have hard numbers on the performance impact right now because our Talos benchmark numbers are being thrown off by a more serious regression in blit_row_color32 (we are seeing 3.5x or more slowdown on some benchmarks just from this). Trying to track down why that alone is tanking our numbers so much before I can see what other dragons lurk below it.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/10/12 at 00:43:00, lsalzman wrote: > On 2016/10/12 00:32:43, mtklein_C wrote: > > Sure. The important bit from the previous CL was replacing malloc()->bzero() > > with just calloc(). Beyond that, I think we just figured calloc() was usually > > negligibly more expensive than malloc(), and that it's easier to reason about > > zeroed memory than conditionally uninitialized memory. > > > > Just out of curiosity, how much faster does this make Firefox? > > Don't have hard numbers on the performance impact right now because our Talos > benchmark numbers are being thrown off by a more serious regression in > blit_row_color32 (we are seeing 3.5x or more slowdown on some benchmarks just > from this). Trying to track down why that alone is tanking our numbers so much > before I can see what other dragons lurk below it. This proposed change makes both the implementation of and human interface to this code more complex and error prone. To land it without having measured its impact borders on superstition. Addressing these problems in order of descending significance may give us a clearer view on each change's value. What's up with your blit_row_color32()? Where's the time spent, how are you compiling it, and what does the disassembly look like? Are you sure you're seeing the time spent in blit_row_color32() and not SkBlitRow::Color32()? The two differ by a possible early-out into memmove() or sk_memset32(). It looks like the current implementation of sk_memset32() is a simple for loop right now, so we should probably check that it's auto-vectorizing well for you.
On 2016/10/12 03:03:51, mtklein_C wrote: > This proposed change makes both the implementation of and human interface to > this code more complex and error prone. To land it without having measured its > impact borders on superstition. > > Addressing these problems in order of descending significance may give us a > clearer view on each change's value. What's up with your blit_row_color32()? > Where's the time spent, how are you compiling it, and what does the disassembly > look like? Are you sure you're seeing the time spent in blit_row_color32() and > not SkBlitRow::Color32()? The two differ by a possible early-out into memmove() > or sk_memset32(). It looks like the current implementation of sk_memset32() is > a simple for loop right now, so we should probably check that it's > auto-vectorizing well for you. I don't think it is the source of what is complex and error prone to this particular case, because we're dealing with an opaque surface. To set its alpha values to zero is, in fact, an error, because then the surface is not actually opaque, and Skia will selectively treat it like it is not actually an opaque surface and read the alpha values sometimes, while at other times going down a code path that looks at the opaque alpha type and treats it as such. You could argue the fact that Skia's non-uniform handling of the kOpaque_SkAlphaType (for lack of kIgnore_SkAlphaType, which I still mourn) is the complex and error prone mechanism here. So in the efforts to work around this selective handling of kOpaque_SkAlphaType, I have to actually fill the alpha values with opaque values, instead of zero, which gets me into this nice mess. Now, as an alternative. I'd be just as happy to have it fill with Sk_ColorBLACK by default, since really that's what we're trying to do in the first place, if the objection is to just leaving it uninitialized, since really, that would be far more correct than the current status quo. In an ideal world, the alpha value wouldn't matter for kOpaque_SkAlphaType so a surface could legitimately be initialized to zero as such, but, well, that's not the scenario the library is presenting me to solve.
On 2016/10/12 at 04:36:20, lsalzman wrote: > On 2016/10/12 03:03:51, mtklein_C wrote: > > This proposed change makes both the implementation of and human interface to > > this code more complex and error prone. To land it without having measured its > > impact borders on superstition. > > > > Addressing these problems in order of descending significance may give us a > > clearer view on each change's value. What's up with your blit_row_color32()? > > Where's the time spent, how are you compiling it, and what does the disassembly > > look like? Are you sure you're seeing the time spent in blit_row_color32() and > > not SkBlitRow::Color32()? The two differ by a possible early-out into memmove() > > or sk_memset32(). It looks like the current implementation of sk_memset32() is > > a simple for loop right now, so we should probably check that it's > > auto-vectorizing well for you. > > I don't think it is the source of what is complex and error prone to this particular case, because we're dealing with an opaque surface. To set its alpha values to zero is, in fact, an error, because then the surface is not actually opaque, and Skia will selectively treat it like it is not actually an opaque surface and read the alpha values sometimes, while at other times going down a code path that looks at the opaque alpha type and treats it as such. You could argue the fact that Skia's non-uniform handling of the kOpaque_SkAlphaType (for lack of kIgnore_SkAlphaType, which I still mourn) is the complex and error prone mechanism here. > > So in the efforts to work around this selective handling of kOpaque_SkAlphaType, I have to actually fill the alpha values with opaque values, instead of zero, which gets me into this nice mess. Now, as an alternative. I'd be just as happy to have it fill with Sk_ColorBLACK by default, since really that's what we're trying to do in the first place, if the objection is to just leaving it uninitialized, since really, that would be far more correct than the current status quo. > > In an ideal world, the alpha value wouldn't matter for kOpaque_SkAlphaType so a surface could legitimately be initialized to zero as such, but, well, that's not the scenario the library is presenting me to solve. So, if we want black, why are we messing around with uninitialized? Let's just make an SkSurface::MakeRaster() that takes an SkColor? (Zeroing memory cannot possibly be less correct than leaving it uninitialized. What if that uninitialized memory was all zero already?)
On 2016/10/12 13:16:21, mtklein_C wrote: > On 2016/10/12 at 04:36:20, lsalzman wrote: > > On 2016/10/12 03:03:51, mtklein_C wrote: > > > This proposed change makes both the implementation of and human interface to > > > this code more complex and error prone. To land it without having measured > its > > > impact borders on superstition. > > > > > > Addressing these problems in order of descending significance may give us a > > > clearer view on each change's value. What's up with your > blit_row_color32()? > > > Where's the time spent, how are you compiling it, and what does the > disassembly > > > look like? Are you sure you're seeing the time spent in blit_row_color32() > and > > > not SkBlitRow::Color32()? The two differ by a possible early-out into > memmove() > > > or sk_memset32(). It looks like the current implementation of sk_memset32() > is > > > a simple for loop right now, so we should probably check that it's > > > auto-vectorizing well for you. > > > > I don't think it is the source of what is complex and error prone to this > particular case, because we're dealing with an opaque surface. To set its alpha > values to zero is, in fact, an error, because then the surface is not actually > opaque, and Skia will selectively treat it like it is not actually an opaque > surface and read the alpha values sometimes, while at other times going down a > code path that looks at the opaque alpha type and treats it as such. You could > argue the fact that Skia's non-uniform handling of the kOpaque_SkAlphaType (for > lack of kIgnore_SkAlphaType, which I still mourn) is the complex and error prone > mechanism here. > > > > So in the efforts to work around this selective handling of > kOpaque_SkAlphaType, I have to actually fill the alpha values with opaque > values, instead of zero, which gets me into this nice mess. Now, as an > alternative. I'd be just as happy to have it fill with Sk_ColorBLACK by default, > since really that's what we're trying to do in the first place, if the objection > is to just leaving it uninitialized, since really, that would be far more > correct than the current status quo. > > > > In an ideal world, the alpha value wouldn't matter for kOpaque_SkAlphaType so > a surface could legitimately be initialized to zero as such, but, well, that's > not the scenario the library is presenting me to solve. > > So, if we want black, why are we messing around with uninitialized? Let's just > make an SkSurface::MakeRaster() that takes an SkColor? > > (Zeroing memory cannot possibly be less correct than leaving it uninitialized. > What if that uninitialized memory was all zero already?) Okay, let me propose this then, we tack an SkColor argument onto the end of MakeRaster like so: static sk_sp<SkSurface> MakeRaster(const SkImageInfo&, size_t rowBytes, const SkSurfaceProps*, SkColor = 0); static sk_sp<SkSurface> MakeRaster(const SkImageInfo& info, const SkSurfaceProps* props = nullptr, SkColor color = 0) { MakeRaster(info, 0, props, color); } Could change MakeRasterN32Premul similarly. This way, existing users of these functions are unaffected because the color just defaults to zero and the argument order is preserved. When the color is 0, it will behave as usual, it zeroes everything, regardless of format, because that is what the color we're asking for explicitly is. Otherwise, it will use whatever color we ask to clear, regardless of format, which it basically is doing even if the color is zero, but zero will be special-cased to use the calloc route that it currently uses under the hood. So nothing gets left uninitialized, and you can initialize any format to any color you want.
On 2016/10/12 14:58:47, lsalzman1 wrote: > On 2016/10/12 13:16:21, mtklein_C wrote: > > On 2016/10/12 at 04:36:20, lsalzman wrote: > > > On 2016/10/12 03:03:51, mtklein_C wrote: > > > > This proposed change makes both the implementation of and human interface > to > > > > this code more complex and error prone. To land it without having > measured > > its > > > > impact borders on superstition. > > > > > > > > Addressing these problems in order of descending significance may give us > a > > > > clearer view on each change's value. What's up with your > > blit_row_color32()? > > > > Where's the time spent, how are you compiling it, and what does the > > disassembly > > > > look like? Are you sure you're seeing the time spent in > blit_row_color32() > > and > > > > not SkBlitRow::Color32()? The two differ by a possible early-out into > > memmove() > > > > or sk_memset32(). It looks like the current implementation of > sk_memset32() > > is > > > > a simple for loop right now, so we should probably check that it's > > > > auto-vectorizing well for you. > > > > > > I don't think it is the source of what is complex and error prone to this > > particular case, because we're dealing with an opaque surface. To set its > alpha > > values to zero is, in fact, an error, because then the surface is not actually > > opaque, and Skia will selectively treat it like it is not actually an opaque > > surface and read the alpha values sometimes, while at other times going down a > > code path that looks at the opaque alpha type and treats it as such. You could > > argue the fact that Skia's non-uniform handling of the kOpaque_SkAlphaType > (for > > lack of kIgnore_SkAlphaType, which I still mourn) is the complex and error > prone > > mechanism here. > > > > > > So in the efforts to work around this selective handling of > > kOpaque_SkAlphaType, I have to actually fill the alpha values with opaque > > values, instead of zero, which gets me into this nice mess. Now, as an > > alternative. I'd be just as happy to have it fill with Sk_ColorBLACK by > default, > > since really that's what we're trying to do in the first place, if the > objection > > is to just leaving it uninitialized, since really, that would be far more > > correct than the current status quo. > > > > > > In an ideal world, the alpha value wouldn't matter for kOpaque_SkAlphaType > so > > a surface could legitimately be initialized to zero as such, but, well, that's > > not the scenario the library is presenting me to solve. > > > > So, if we want black, why are we messing around with uninitialized? Let's > just > > make an SkSurface::MakeRaster() that takes an SkColor? > > > > (Zeroing memory cannot possibly be less correct than leaving it uninitialized. > > > What if that uninitialized memory was all zero already?) > > Okay, let me propose this then, we tack an SkColor argument onto the end of > MakeRaster like so: > > static sk_sp<SkSurface> MakeRaster(const SkImageInfo&, size_t rowBytes, const > SkSurfaceProps*, SkColor = 0); > static sk_sp<SkSurface> MakeRaster(const SkImageInfo& info, const > SkSurfaceProps* props = nullptr, SkColor color = 0) { MakeRaster(info, 0, props, > color); } > > Could change MakeRasterN32Premul similarly. > > This way, existing users of these functions are unaffected because the color > just defaults to zero and the argument order is preserved. > > When the color is 0, it will behave as usual, it zeroes everything, regardless > of format, because that is what the color we're asking for explicitly is. > > Otherwise, it will use whatever color we ask to clear, regardless of format, > which it basically is doing even if the color is zero, but zero will be > special-cased to use the calloc route that it currently uses under the hood. > > So nothing gets left uninitialized, and you can initialize any format to any > color you want. Yeah, that sounds ideal to me.
SkColor isn't very future-oriented, given colorspaces etc. I think just the opaque-check is reasonable and happens to address the current concern (perf). if (opaque) malloc else calloc
On 2016/10/12 at 16:54:18, reed wrote: > SkColor isn't very future-oriented, given colorspaces etc. > > I think just the opaque-check is reasonable and happens to address the current concern (perf). > > if (opaque) > malloc > else > calloc No, we don't know that this addresses the current concern. No one has timed this. We will be less bothered by spurious bugs from Chrome and other well-instrumented clients if we make sure all these raster surfaces are initialized to something. I don't think it's important how we communicate what to initialize it to: SkColor, SkColor4f + SkColorspace, an SkShader, whatever... that's an independent concern here.
Seems like any change is speculative at this point.
On 2016/10/12 at 17:04:08, reed wrote: > Seems like any change is speculative at this point. Yes, definitely agree with that. I have yet to see anything show that allocating a zeroed buffer then clearing manually is bad.
On 2016/10/12 17:15:34, mtklein_C wrote: > On 2016/10/12 at 17:04:08, reed wrote: > > Seems like any change is speculative at this point. > > Yes, definitely agree with that. I have yet to see anything show that > allocating a zeroed buffer then clearing manually is bad. Did some microbenchmarking with our allocator in situ. Malloc followed by memset, or calloc followed by memset. Results represent average speedup of using malloc instead of calloc over 1000 allocations: randomly chosen sizes between 50^2 to 250^2: 29% randomly chosen sizes between 100^2 to 600^2: 10% randomly chosen sizes between 300^2 to 1200^2: it's a wash 100^2: 32% 200^2: 40% 250^2: 50% 500^2: 39% 750^2: it's a wash 1000^2: it's a wash So, basically, the results are highly dependent on how the allocator is feeling. If the allocator decides to deal with the memory by decommitting it and remapping it, which clearly there is a crossover effect somewhere in the neighborhood of those 250^2 to 500^2 allocations, the calloc is already paid for and the performance is even. Smaller allocations which tend to be more serviced by pulling things out of bins are fairly strongly showing the cost of the extra superfluous zeroing. In random sample sizes the just larger overall allocation times for larger surfaces seem to dominate the results and tend to average away part of the cost of allocating the smaller ones, and possibly also affecting the decommit behavior. The random vs fixed allocation patterns seem to have a clear effect on this as well.
On 2016/10/12 at 18:53:47, lsalzman wrote: > On 2016/10/12 17:15:34, mtklein_C wrote: > > On 2016/10/12 at 17:04:08, reed wrote: > > > Seems like any change is speculative at this point. > > > > Yes, definitely agree with that. I have yet to see anything show that > > allocating a zeroed buffer then clearing manually is bad. > > Did some microbenchmarking with our allocator in situ. Malloc followed by memset, or calloc followed by memset. Results represent average speedup of using malloc instead of calloc over 1000 allocations: > > randomly chosen sizes between 50^2 to 250^2: 29% > randomly chosen sizes between 100^2 to 600^2: 10% > randomly chosen sizes between 300^2 to 1200^2: it's a wash > 100^2: 32% > 200^2: 40% > 250^2: 50% > 500^2: 39% > 750^2: it's a wash > 1000^2: it's a wash > > So, basically, the results are highly dependent on how the allocator is feeling. If the allocator decides to deal with the memory by decommitting it and remapping it, which clearly there is a crossover effect somewhere in the neighborhood of those 250^2 to 500^2 allocations, the calloc is already paid for and the performance is even. > > Smaller allocations which tend to be more serviced by pulling things out of bins are fairly strongly showing the cost of the extra superfluous zeroing. > > In random sample sizes the just larger overall allocation times for larger surfaces seem to dominate the results and tend to average away part of the cost of allocating the smaller ones, and possibly also affecting the decommit behavior. > > The random vs fixed allocation patterns seem to have a clear effect on this as well. Neat. What sort of affect does this have on Firefox's overall performance?
On 2016/10/12 18:56:28, mtklein_C wrote: > On 2016/10/12 at 18:53:47, lsalzman wrote: > > On 2016/10/12 17:15:34, mtklein_C wrote: > > > On 2016/10/12 at 17:04:08, reed wrote: > > > > Seems like any change is speculative at this point. > > > > > > Yes, definitely agree with that. I have yet to see anything show that > > > allocating a zeroed buffer then clearing manually is bad. > > > > Did some microbenchmarking with our allocator in situ. Malloc followed by > memset, or calloc followed by memset. Results represent average speedup of using > malloc instead of calloc over 1000 allocations: > > > > randomly chosen sizes between 50^2 to 250^2: 29% > > randomly chosen sizes between 100^2 to 600^2: 10% > > randomly chosen sizes between 300^2 to 1200^2: it's a wash > > 100^2: 32% > > 200^2: 40% > > 250^2: 50% > > 500^2: 39% > > 750^2: it's a wash > > 1000^2: it's a wash > > > > So, basically, the results are highly dependent on how the allocator is > feeling. If the allocator decides to deal with the memory by decommitting it and > remapping it, which clearly there is a crossover effect somewhere in the > neighborhood of those 250^2 to 500^2 allocations, the calloc is already paid for > and the performance is even. > > > > Smaller allocations which tend to be more serviced by pulling things out of > bins are fairly strongly showing the cost of the extra superfluous zeroing. > > > > In random sample sizes the just larger overall allocation times for larger > surfaces seem to dominate the results and tend to average away part of the cost > of allocating the smaller ones, and possibly also affecting the decommit > behavior. > > > > The random vs fixed allocation patterns seem to have a clear effect on this as > well. > > Neat. What sort of affect does this have on Firefox's overall performance? Harder to see on the average macrobenchmark as they are not necessarily exercising the issue in such a concentrated manner and are dominated by other performance bottlenecks.
> Harder to see on the average macrobenchmark as they are not necessarily exercising the issue in such a concentrated manner and are dominated by other performance bottlenecks. So, why is this important to do?
On 2016/10/12 19:56:14, mtklein_C wrote: > > Harder to see on the average macrobenchmark as they are not necessarily > exercising the issue in such a concentrated manner and are dominated by other > performance bottlenecks. > > So, why is this important to do? Because there is a regression in performance in the interface I am providing via Skia to the rest of the Firefox project, and I'd rather not do that. My job is to make things faster where I can, which enables people higher up in the stack to do as they want with fewer worries. It's pretty trivial to fix, and would simplify my life downstream to have, so I did not imagine this would turn into another controversial feature. And overall, it is an improvement to the behavior to at least let me initialize to something other than zero, since the initialization to zero behavior is measurably less useful than just being uninitialized. It is not safe to leave the surface zeroed out anyway, and arguably, I am now paying a cost for it in my corner of the world, either in performance or in code complexity to avoid that on our end.
Lets try landing this change as is, and see if it sticks. We can certainly entertain other variations in the future if we want.
The CQ bit was checked by mtklein@chromium.org
lgtm
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: skia_presubmit-Trybot on master.client.skia.fyi (JOB_FAILED, http://build.chromium.org/p/client.skia.fyi/builders/skia_presubmit-Trybot/bu...)
The CQ bit was checked by reed@google.com
lgtm
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== leave pixel memory uninitialized for opaque alpha type in SkSurface::MakeRaster BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2412633002 ========== to ========== leave pixel memory uninitialized for opaque alpha type in SkSurface::MakeRaster BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search?issue=2412633002 Committed: https://skia.googlesource.com/skia/+/c64ef3563dc8badac3d64544513b03df826cf8c3 ==========
Message was sent while issue was closed.
Committed patchset #1 (id:1) as https://skia.googlesource.com/skia/+/c64ef3563dc8badac3d64544513b03df826cf8c3
Message was sent while issue was closed.
On 2016/10/12 23:50:19, commit-bot: I haz the power wrote: > Committed patchset #1 (id:1) as > https://skia.googlesource.com/skia/+/c64ef3563dc8badac3d64544513b03df826cf8c3 Thank you
Message was sent while issue was closed.
A revert of this CL (patchset #1 id:1) has been created in https://codereview.chromium.org/2421473003/ by mtklein@google.com. The reason for reverting is: Reverting while we think about skia:5854 so the bot doesn't regress further.. |