Issue 2035793002: Add color correction benchmark - with comparison to qcms

msarett

Description was changed from ========== Add color correction benchmark - with comparison to qcms --colorImages ...

4 years, 6 months ago (2016-06-02 16:17:07 UTC) #1

msarett

msarett@google.com changed reviewers: + scroggo@google.com

4 years, 6 months ago (2016-06-02 16:18:20 UTC) #2

scroggo

https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp File bench/ColorCodecBench.cpp (right): https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp#newcode14 bench/ColorCodecBench.cpp:14: #if !defined(GOOGLE3) nit: My general preference would be to ...

4 years, 6 months ago (2016-06-02 18:51:50 UTC) #4

msarett

https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp File bench/ColorCodecBench.cpp (right): https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp#newcode14 bench/ColorCodecBench.cpp:14: #if !defined(GOOGLE3) On 2016/06/02 18:51:50, scroggo wrote: > nit: ...

4 years, 6 months ago (2016-06-02 19:34:40 UTC) #5

scroggo


4 years, 6 months ago (2016-06-02 20:42:31 UTC) #6

https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp
File bench/ColorCodecBench.cpp (right):

https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp#n...
bench/ColorCodecBench.cpp:124: // Transform in place
On 2016/06/02 19:34:39, msarett wrote:
> On 2016/06/02 18:51:50, scroggo wrote:
> > Why is this in place, but not some others?
> 
> I decided to time the "xform only" benches in place.  I think this makes more
> sense than keeping an entire src buffer and dst buffer.  I'm not sure if the
> extra memory pressure would impact the timing (probably not), but it doesn't
> match Chrome's use case (where they only have a single dst buffer).
> 
> When we time the decode and the xform together, we can do exactly what chrome
> does.  First decode into a row buffer, then xform to the dst buffer.
> 
> If this is too strange, I think we can make both src->dst without worrying too
> much about it changing anything.

Looking at this a little harder, won't the second iteration transform the output
from the first? And the third one will transform the output from the second?
Don't you want to transform the same/original data each time?

FWIW, I don't think you ever need an entire dst buffer - you only need one row,
right? We're never going to look at the output of this test, so can you just
write over it? Or might the transformation do some blending?
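
A minimal sketch of the compounding problem described above. `xformRow` is a hypothetical stand-in that halves each byte; it is not the Skia or qcms transform, and the function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a color transform (NOT Skia's or qcms's API).
// It halves each byte, so applying it twice differs from applying it once.
inline void xformRow(const uint8_t* src, uint8_t* dst, size_t n) {
    assert(src != nullptr && dst != nullptr);
    for (size_t i = 0; i < n; ++i) {
        dst[i] = static_cast<uint8_t>(src[i] / 2);
    }
}

// In place: every loop after the first transforms already-transformed data.
inline std::vector<uint8_t> benchInPlace(std::vector<uint8_t> pixels, int loops) {
    for (int i = 0; i < loops; ++i) {
        xformRow(pixels.data(), pixels.data(), pixels.size());
    }
    return pixels;
}

// src -> dst: the source is never modified, so each loop does identical work.
inline std::vector<uint8_t> benchSrcToDst(const std::vector<uint8_t>& src, int loops) {
    std::vector<uint8_t> dst(src.size());
    for (int i = 0; i < loops; ++i) {
        xformRow(src.data(), dst.data(), src.size());
    }
    return dst;
}
```

With a real transform the per-iteration cost may or may not depend on the pixel values, but the src -> dst form removes the question entirely.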

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.cpp
File bench/ColorCodecBench.cpp (right):

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:62: codec->getScanlines(srcRow, 1, 0);
This will be awfully slow on interlaced PNG, and the output will be upside
down/scrambled for bottom-up BMP/interlaced GIF. The latter may be fine for this
benchmark, since you're not looking at the output, but it does make me think
about how color transforms will interact with incrementalDecode...

One nice thing about incrementalDecode is that the client doesn't need to care
about the SkScanlineOrder - the codec knows the whole block of destination
memory and writes to it in the right place.

But if the client doesn't know what lines were written, they won't know which
lines to transform.

Some possible solutions:
- do the transformation on every row
  - inefficient when the image is partially decoded, since we'll transform
     each row on each pass
- make incrementalDecode() somehow report the rows it decoded into
  - also helps with the filling problem
  - but I don't yet know what that API would look like
- pass the destination space to the codec, so it can do the transformation as it
  decodes. (This is more or less what Chrome does now, right?)
  - this seems the cleanest to me, but I thought we had reasons not to do it
    that way?
    - one is that the same image may be drawn to different destination spaces
      - e.g. two different monitors with different profiles

Generally it's not clear to me when the transformation will be applied, and
whether or not it will be in place. Maybe I'm missing something that makes this
simpler...?
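
To make the second option concrete, here is a rough sketch of what "report the rows it decoded into" could look like from the client side. `RowRange` and `xformReportedRows` are invented for illustration; SkCodec does not expose anything like this in the code under review, and the byte inversion is only a placeholder for a real transform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of the rows one decode pass wrote.
struct RowRange {
    int first;  // first row written by this pass
    int step;   // distance between written rows (1 when not interlaced)
    int count;  // number of rows written
};

// Placeholder transform: inverts each byte of one row in place.
inline void xformRow(uint8_t* row, size_t rowBytes) {
    for (size_t i = 0; i < rowBytes; ++i) {
        row[i] = static_cast<uint8_t>(255 - row[i]);
    }
}

// Transforms only the rows the pass reported and returns how many it touched.
inline int xformReportedRows(std::vector<uint8_t>& pixels, size_t rowBytes,
                             const RowRange& range) {
    int transformed = 0;
    for (int i = 0; i < range.count; ++i) {
        const size_t y = static_cast<size_t>(range.first + i * range.step);
        assert((y + 1) * rowBytes <= pixels.size());
        xformRow(pixels.data() + y * rowBytes, rowBytes);
        ++transformed;
    }
    return transformed;
}
```

Rows that a pass did not report stay untouched, which avoids re-transforming every row on every pass.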

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:183: sk_sp<SkColorSpace> dstSpace = nullptr;
Why not do all this setup in onDelayedSetup? If nanobench decides it only needs
one loop to time this benchmark, this will still be included in the time.
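
A small sketch of the split being suggested, using a hypothetical miniature interface. `MiniBench`, `XformBench`, and `runBench` are made up for illustration; the real nanobench Benchmark class has more to it, and only the `onDelayedSetup`/`onDraw` names come from this review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature of a benchmark interface.
class MiniBench {
public:
    virtual ~MiniBench() = default;
    virtual void onDelayedSetup() {}
    virtual void onDraw(int loops) = 0;
};

class XformBench : public MiniBench {
public:
    int setupCount = 0;
    int xformCount = 0;

    // One-time work (allocating and filling buffers) stays out of the timed loop.
    void onDelayedSetup() override {
        const size_t kPixels = 1024;
        fSrc = std::vector<uint8_t>(kPixels, static_cast<uint8_t>(128));
        fDst.resize(kPixels);
        ++setupCount;
    }

    // Only the work being measured runs here.
    void onDraw(int loops) override {
        assert(!fSrc.empty());  // setup must have run first
        for (int i = 0; i < loops; ++i) {
            for (size_t j = 0; j < fSrc.size(); ++j) {
                fDst[j] = static_cast<uint8_t>(255 - fSrc[j]);
            }
            ++xformCount;
        }
    }

private:
    std::vector<uint8_t> fSrc;
    std::vector<uint8_t> fDst;
};

// Hypothetical driver: setup runs once, untimed; a timer would wrap onDraw only.
inline void runBench(MiniBench& bench, int loops) {
    bench.onDelayedSetup();
    bench.onDraw(loops);
}
```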

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:198: fProc(fEncoded.get(), fDst.get(), fSrcRow.get(),
fInfo, dstProfile,
What if you made fProc call a member method? Then you wouldn't need to pass 6
parameters, and you wouldn't need to cast dstProfile. OTOH, then fProc could
mess with all the member variables in ways you do not intend.
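
A sketch of the member-method idea, assuming a pointer-to-member-function is an acceptable way to do it. Only the name `fProc` comes from the patch; the class, its members, and the two procs are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical class; the procs read their inputs from members, so the
// call site passes no parameters and needs no casts.
class ColorBench {
public:
    using Proc = void (ColorBench::*)();

    explicit ColorBench(bool invert)
        : fProc(invert ? &ColorBench::invertProc : &ColorBench::copyProc)
        , fSrc{10, 20, 30}
        , fDst(3) {}

    void draw(int loops) {
        assert(fProc != nullptr);
        for (int i = 0; i < loops; ++i) {
            (this->*fProc)();
        }
    }

    const std::vector<uint8_t>& dst() const { return fDst; }

private:
    void copyProc() { fDst = fSrc; }

    void invertProc() {
        for (size_t i = 0; i < fSrc.size(); ++i) {
            fDst[i] = static_cast<uint8_t>(255 - fSrc[i]);
        }
    }

    Proc fProc;
    std::vector<uint8_t> fSrc;
    std::vector<uint8_t> fDst;
};
```

The trade-off mentioned above still holds: each proc can touch any member, so the narrower interface of a free function is lost.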

msarett


4 years, 6 months ago (2016-06-02 22:17:47 UTC) #7

https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp
File bench/ColorCodecBench.cpp (right):

https://codereview.chromium.org/2035793002/diff/1/bench/ColorCodecBench.cpp#n...
bench/ColorCodecBench.cpp:124: // Transform in place
On 2016/06/02 20:42:31, scroggo wrote:
> On 2016/06/02 19:34:39, msarett wrote:
> > On 2016/06/02 18:51:50, scroggo wrote:
> > > Why is this in place, but not some others?
> > 
> > I decided to time the "xform only" benches in place.  I think this makes more
> > sense than keeping an entire src buffer and dst buffer.  I'm not sure if the
> > extra memory pressure would impact the timing (probably not), but it doesn't
> > match Chrome's use case (where they only have a single dst buffer).
> > 
> > When we time the decode and the xform together, we can do exactly what chrome
> > does.  First decode into a row buffer, then xform to the dst buffer.
> > 
> > If this is too strange, I think we can make both src->dst without worrying too
> > much about it changing anything.
> 
> Looking at this a little harder, won't the second iteration transform the output
> from the first? And the third one will transform the output from the second?
> Don't you want to transform the same/original data each time?
> 

You're right, I didn't think past the first loop iteration...  I'll decode into
a srcBuffer and then xform into the dstBuffer.

> FWIW, I don't think you ever need an entire dst buffer - you only need one row,
> right? We're never going to look at the output of this test, so can you just
> write over it? Or might the transformation do some blending?

You're right.  I think I'll keep the dstBuffer though, just to match the actual
use case as closely as we can.

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.cpp
File bench/ColorCodecBench.cpp (right):

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:62: codec->getScanlines(srcRow, 1, 0);
On 2016/06/02 20:42:31, scroggo wrote:
> This will be awfully slow on interlaced PNG, and the output will be upside
> down/scrambled for bottom-up BMP/interlaced GIF. The latter may be fine for this
> benchmark, since you're not looking at the output, but it does make me think
> about how color transforms will interact with incrementalDecode...
> 
> One nice thing about incrementalDecode is that the client doesn't need to care
> about the SkScanlineOrder - the codec knows the whole block of destination
> memory and writes to it in the right place.
> 
> But if the client doesn't know what lines were written, they won't know which
> lines to transform.
> 
> Some possible solutions:
> - do the transformation on every row
>   - inefficient when the image is partially decoded, since we'll transform
>      each row on each pass
> - make incrementalDecode() somehow report the rows it decoded into
>   - also helps with the filling problem
>   - but I don't yet know what that API would look like
> - pass the destination space to the codec, so it can do the transformation as it
>   decodes. (This is more or less what Chrome does now, right?)
>   - this seems the cleanest to me, but I thought we had reasons not to do it
>     that way?
>     - one is that the same image may be drawn to different destination spaces
>       - e.g. two different monitors with different profiles
> 
> Generally it's not clear to me when the transformation will be applied, and
> whether or not it will be in place. Maybe I'm missing something that makes this
> simpler...?

"pass the destination space to the codec, so it can do the transformation as it
decodes. (This is more or less what Chrome does now, right?)"

Yes I think this needs to happen inside the codec, the way Chrome does it right
now.  We already have a way to pass the destination space to the codec (since
there is an SkColorSpace on SkImageInfo).

Maybe I'm wrong to write benchmarks before it's integrated with our codecs. 
Because maybe we'll need new ones after...  But I'm actually thinking about
integrating with Chrome's codecs first.  I think this is a good way to measure
the impact that we'll have there (starting with jpeg in particular).

I'm really far away from thinking about BMP and GIF, I don't think anybody color
corrects those anyway.

Interlaced PNG/JPEG is an interesting thought.  It looks like Chrome will redo the
correction every time a row is updated.

-------------------------------------------

I also think the color xform logic needs to exist outside of the codecs,
for other uses.  Which I guess is why it's floating around in src/core right
now.

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:183: sk_sp<SkColorSpace> dstSpace = nullptr;
On 2016/06/02 20:42:31, scroggo wrote:
> Why not do all this setup in onDelayedSetup? If nanobench decides it only needs
> one loop to time this benchmark, this will still be included in the time.

Even better.  Done.

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:198: fProc(fEncoded.get(), fDst.get(), fSrcRow.get(),
fInfo, dstProfile,
On 2016/06/02 20:42:31, scroggo wrote:
> What if you made fProc call a member method? Then you wouldn't need to pass 6
> parameters, and you wouldn't need to cast dstProfile. OTOH, then fProc could
> mess with all the member variables in ways you do not intend.

I've made this change, and I think it makes things a lot cleaner :)

scroggo


4 years, 6 months ago (2016-06-03 14:25:37 UTC) #8

lgtm

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.cpp
File bench/ColorCodecBench.cpp (right):

https://codereview.chromium.org/2035793002/diff/20001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:62: codec->getScanlines(srcRow, 1, 0);
On 2016/06/02 22:17:47, msarett wrote:
> On 2016/06/02 20:42:31, scroggo wrote:
> > This will be awfully slow on interlaced PNG, and the output will be upside
> > down/scrambled for bottom-up BMP/interlaced GIF. The latter may be fine for this
> > benchmark, since you're not looking at the output, but it does make me think
> > about how color transforms will interact with incrementalDecode...
> > 
> > One nice thing about incrementalDecode is that the client doesn't need to care
> > about the SkScanlineOrder - the codec knows the whole block of destination
> > memory and writes to it in the right place.
> > 
> > But if the client doesn't know what lines were written, they won't know which
> > lines to transform.
> > 
> > Some possible solutions:
> > - do the transformation on every row
> >   - inefficient when the image is partially decoded, since we'll transform
> >      each row on each pass
> > - make incrementalDecode() somehow report the rows it decoded into
> >   - also helps with the filling problem
> >   - but I don't yet know what that API would look like
> > - pass the destination space to the codec, so it can do the transformation as it
> >   decodes. (This is more or less what Chrome does now, right?)
> >   - this seems the cleanest to me, but I thought we had reasons not to do it
> >     that way?
> >     - one is that the same image may be drawn to different destination spaces
> >       - e.g. two different monitors with different profiles
> > 
> > Generally it's not clear to me when the transformation will be applied, and
> > whether or not it will be in place. Maybe I'm missing something that makes this
> > simpler...?
> 
> "pass the destination space to the codec, so it can do the transformation as it
> decodes. (This is more or less what Chrome does now, right?)"
> 
> Yes I think this needs to happen inside the codec, the way Chrome does it right
> now. 

Whew! That makes the most sense to me, with the caveat that I don't know what to
do if the same image is used on multiple screens with different properties.
That's probably an uncommon use case, though (and we don't handle it today). The
simplest approach will be to decode twice, which is not perfect, but maybe it's
okay.
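
A toy sketch of "transform inside the codec, decode once per destination". Every name here (`DstSpace`, `DecodeInfo`, `decode`) is a hypothetical stand-in, not SkImageInfo, SkColorSpace, or SkCodec, and the "+20" adjustment is a placeholder for a real color transform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical destination spaces, e.g. two monitors with different profiles.
enum class DstSpace { kSRGB, kWide };

// The destination space travels with the decode request.
struct DecodeInfo {
    int width;
    int height;
    DstSpace dstSpace;
};

// Fake decoder: "decodes" a constant gray image and applies a per-space
// adjustment as it writes, so the caller never transforms rows itself.
inline std::vector<uint8_t> decode(const DecodeInfo& info) {
    assert(info.width > 0 && info.height > 0);
    const uint8_t decoded = 100;
    const uint8_t out = (info.dstSpace == DstSpace::kSRGB)
                                ? decoded
                                : static_cast<uint8_t>(decoded + 20);
    const size_t count = static_cast<size_t>(info.width) *
                         static_cast<size_t>(info.height);
    return std::vector<uint8_t>(count, out);
}
```

Drawing the same image to two destinations then means two `decode` calls with different `dstSpace` values, which costs a second decode but keeps the codec the only place that needs to know row order.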

> We already have a way to pass the destination space to the codec (since
> there is an SkColorSpace on SkImageInfo).

Of course! I had been trying to wrap my head around putting the color space in
the image info, but I think this demonstrates why it makes sense.

> 
> Maybe I'm wrong to write benchmarks before it's integrated with our codecs. 
> Because maybe we'll need new ones after... 

No, I think it does make sense to write these benchmarks now. Even if it changes
later you can find out how to improve the current code, which will likely still
apply.

> But I'm actually thinking about
> integrating with Chrome's codecs first.  I think this is a good way to measure
> the impact that we'll have there (starting with jpeg in particular).

I think that's the right approach. SkCodec still needs to finish incremental
decoding and to support animation. And there's still some Chromium plumbing work
to be done.

> 
> I'm really far away from thinking about BMP and GIF, I don't think anybody color
> corrects those anyway.

Probably not, although we'll want to do it eventually.

> 
> Interlaced PNG/JPEG is an interesting thought.  It looks like Chrome will redo the
> correction every time a row is updated.
> 
> -------------------------------------------
> 
> I also think the color xform logic needs to exist outside of the codecs,
> for other uses.  Which I guess is why it's floating around in src/core right
> now.

sgtm

https://codereview.chromium.org/2035793002/diff/40001/bench/ColorCodecBench.cpp
File bench/ColorCodecBench.cpp (right):

https://codereview.chromium.org/2035793002/diff/40001/bench/ColorCodecBench.c...
bench/ColorCodecBench.cpp:21: , fDstSpaceQCMS(nullptr)
This method is only declared if !GOOGLE3, so this needs to do the same.
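
A reduced sketch of the pattern being asked for: anything declared only under `!defined(GOOGLE3)` needs the same guard wherever it is used, including the constructor's initializer list. `FakeQcmsProfile`, `Bench`, and `hasQcmsMember` are hypothetical; only `fDstSpaceQCMS` and the `GOOGLE3` macro come from the patch.

```cpp
#include <cassert>

#if !defined(GOOGLE3)
// Hypothetical stand-in for a qcms type.
struct FakeQcmsProfile { int id; };
#endif

class Bench {
public:
    Bench()
        : fLoops(0)
#if !defined(GOOGLE3)
        , fDstSpaceQCMS(nullptr)  // guarded to match the declaration below
#endif
    {}

    int loops() const { return fLoops; }

    // Reports whether this build includes the qcms-only member.
    bool hasQcmsMember() const {
#if !defined(GOOGLE3)
        assert(fDstSpaceQCMS == nullptr);
        return true;
#else
        return false;
#endif
    }

private:
    int fLoops;
#if !defined(GOOGLE3)
    FakeQcmsProfile* fDstSpaceQCMS;
#endif
};
```

Without the guard in the initializer list, a build that defines `GOOGLE3` would fail to compile because the member no longer exists.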

msarett

https://codereview.chromium.org/2035793002/diff/40001/bench/ColorCodecBench.cpp File bench/ColorCodecBench.cpp (right): https://codereview.chromium.org/2035793002/diff/40001/bench/ColorCodecBench.cpp#newcode21 bench/ColorCodecBench.cpp:21: , fDstSpaceQCMS(nullptr) On 2016/06/03 14:25:36, scroggo wrote: > This ...

4 years, 6 months ago (2016-06-03 14:46:36 UTC) #9

msarett

The CQ bit was checked by msarett@google.com to run a CQ dry run

4 years, 6 months ago (2016-06-03 14:46:41 UTC) #10