Issue 2697413002: color: Add analytic transfer functions in shaders

ccameron

Description was changed from ========== color: Add analytic transfer functions Create a separate ColorTransformStep for ...

3 years, 10 months ago (2017-02-16 22:43:44 UTC) #1

ccameron

ccameron@chromium.org changed reviewers: + hubbe@chromium.org

3 years, 10 months ago (2017-02-16 22:43:44 UTC) #2

ccameron

I spent a while trying to write "SkTranfserFn almost cancel" functions analytically, but found that ...

3 years, 10 months ago (2017-02-16 22:57:28 UTC) #3

hubbe

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#newcode393 ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { Please add ...

3 years, 10 months ago (2017-02-16 23:06:38 UTC) #4

ccameron

Description was changed from ========== color: Add analytic transfer functions in shaders Create a separate ...

3 years, 10 months ago (2017-02-16 23:29:43 UTC) #5

ccameron

Updated with a handful more tests. For the "bunch o different transfer functions", the generated ...

3 years, 10 months ago (2017-02-16 23:51:58 UTC) #6

Updated with a handful more tests. For the "bunch o different transfer
functions", the generated sources look like...

(and I'm seeing now that the "add a vec3 that is zero" could be eliminated...).

        // The identity.                                                 
        {1.f, 1.f, 0.f, 1.f, 0.f, 0.f, 0.f},                                    
              
SOURCE:
vec3 DoColorConversion(vec3 color) {
  color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659,
0.148399, 0.060623, 0.865465) * color;
  color += vec3(0.000000, 0.000000, 0.000000);
  return color;
}

        // The identity, with an if statement.                                  
              
        {1.f, 1.f, 0.f, 1.f, 0.5f, 0.f, 0.f},                                   
              
SOURCE:
vec3 DoColorConversion(vec3 color) {
  color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659,
0.148399, 0.060623, 0.865465) * color;
  color += vec3(0.000000, 0.000000, 0.000000);
  return color;
}
        // Just the power function.                                             
              
        {1.1f, 1.f, 0.f, 1.f, 0.f, 0.f, 0.f},                                   
              
SOURCE:
vec3 DoColorConversion(vec3 color) {
  color.r = pow(color.r, 1.100000);
  color.g = pow(color.g, 1.100000);
  color.b = pow(color.b, 1.100000);
  color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659,
0.148399, 0.060623, 0.865465) * color;
  color += vec3(0.000000, 0.000000, 0.000000);
  return color;
}

        // Everything but the power function, nonlinear only.                   
              
        {1.f, 0.9f, 0.1f, 0.9f, 0.f, 0.1f, 0.1f},                               
              
SOURCE:
vec3 DoColorConversion(vec3 color) {
  color.r = 0.900000 * color.r + 0.100000 + 0.100000;
  color.g = 0.900000 * color.g + 0.100000 + 0.100000;
  color.b = 0.900000 * color.b + 0.100000 + 0.100000;
  color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659,
0.148399, 0.060623, 0.865465) * color;
  color += vec3(0.000000, 0.000000, 0.000000);
  return color;
}

        // Everything, nonlinear only.                                          
              
        {1.1f, 0.9f, 0.1f, 0.9f, 0.f, 0.1f, 0.1f},                              
              
SOURCE:
vec3 DoColorConversion(vec3 color) {
  color.r = pow(0.900000 * color.r + 0.100000, 1.100000) + 0.100000;
  color.g = pow(0.900000 * color.g + 0.100000, 1.100000) + 0.100000;
  color.b = pow(0.900000 * color.b + 0.100000, 1.100000) + 0.100000;
  color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659,
0.148399, 0.060623, 0.865465) * color;
  color += vec3(0.000000, 0.000000, 0.000000);
  return color;
}

        // Everything but the power function.                                   
              
        {1.f, 0.9f, 0.1f, 0.9f, 0.5f, 0.1f, 0.1f},                              
              
SOURCE:
vec3 DoColorConversion(vec3 color) {
  if (color.r < 0.500000)
    color.r = 0.900000 * color.r+ 0.100000;
  else
    color.r = 0.900000 * color.r + 0.100000 + 0.100000;
  if (color.g < 0.500000)
    color.g = 0.900000 * color.g+ 0.100000;
  else
    color.g = 0.900000 * color.g + 0.100000 + 0.100000;
  if (color.b < 0.500000)
    color.b = 0.900000 * color.b+ 0.100000;
  else
    color.b = 0.900000 * color.b + 0.100000 + 0.100000;
  color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659,
0.148399, 0.060623, 0.865465) * color;
  color += vec3(0.000000, 0.000000, 0.000000);
  return color;
}

        // Everything.                                                          
              
        {1.1f, 0.9f, 0.1f, 0.9f, 0.5f, 0.1f, 0.1f},                             
              
SOURCE:
vec3 DoColorConversion(vec3 color) {
  if (color.r < 0.500000)
    color.r = 0.900000 * color.r+ 0.100000;
  else
    color.r = pow(0.900000 * color.r + 0.100000, 1.100000) + 0.100000;
  if (color.g < 0.500000)
    color.g = 0.900000 * color.g+ 0.100000;
  else
    color.g = pow(0.900000 * color.g + 0.100000, 1.100000) + 0.100000;
  if (color.b < 0.500000)
    color.b = 0.900000 * color.b+ 0.100000;
  else
    color.b = pow(0.900000 * color.b + 0.100000, 1.100000) + 0.100000;
  color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659,
0.148399, 0.060623, 0.865465) * color;
  color += vec3(0.000000, 0.000000, 0.000000);
  return color;
}

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc
File ui/gfx/color_transform.cc (right):

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string*
result, const char* value) {
On 2017/02/16 23:06:38, hubbe wrote:
> Please add some tests for this.
> It would also help me review the generated code, as I can see what the code
> would look like in the test. :)

Mmh, good point -- this was hit a little bit by the GLRenderer tests, but it
didn't get all of the cases (where some things are 1 and others are 0).

I added a test that goes through a variety of SkColorSpaceTransferFns, to
generate the various combinations of elided operations.

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, "color.r");
On 2017/02/16 23:06:38, hubbe wrote:
> Is the compiler smart enough to vectorize these?
> I just wonder if it would be better to work on all channels simultaneously?

I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they operate the
same.

Since the ~Tesla generation, GPU shader cores are actually scalar units -- when
you do matrix-vector multiplies, they're executed one-float-at-a-time. At least
at a per-pixel level.

The place where GPU shader cores get parallelism is that they're executing the
same scalar program for each pixel. So, the compiler can be super-lazy and not
try to vectorize anything, and just rely on the fact that we're running the same
program on thousands of pixels in parallel to get the parallelism. In
particular, vectorizing in the compiler is a losing game.

In NVIDIA parlance, the parallel execution units are called "warps" of 16 or 32
"threads", where "thread" is sort of a misnomer (it's more like 1 lane of a
SIMD). Within this warp
- if the shader programs diverge (like have an if-statement), the shader will
execute both paths of the if-statement 
- if one of the threads is delayed cause of a texture cache miss, every thread
is delayed

This is all a mental relief for me, cause it means I don't have to think about
"is this vectorize-able".

hubbe

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#newcode393 ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { On 2017/02/16 ...

3 years, 10 months ago (2017-02-17 00:06:00 UTC) #7

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc
File ui/gfx/color_transform.cc (right):

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string*
result, const char* value) {
On 2017/02/16 23:51:58, ccameron wrote:
> On 2017/02/16 23:06:38, hubbe wrote:
> > Please add some tests for this.
> > It would also help me review the generated code, as I can see what the code
> > would look like in the test. :)
> 
> Mmh, good point -- this was hit a little bit by the GLRenderer tests, but it
> didn't get all of the cases (where some things are 1 and others are 0).
> 
> I added a test that goes through a variety of SkColorSpaceTransferFns, to
> generate the various combinations of elided operations.

Could I get one of two tests on the form:

string str;
trasnform->AppendShaderSource(&str);
EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ?

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, "color.r");
On 2017/02/16 23:51:58, ccameron wrote:
> On 2017/02/16 23:06:38, hubbe wrote:
> > Is the compiler smart enough to vectorize these?
> > I just wonder if it would be better to work on all channels simultaneously?
> 
> I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they operate
the
> same.
> 
> Since the ~Tesla generation, GPU shader cores are actually scalar units --
when
> you do matrix-vector multiplies, they're executed one-float-at-a-time. At
least
> at a per-pixel level.
> 
> The place where GPU shader cores get parallelism is that they're executing the
> same scalar program for each pixel. So, the compiler can be super-lazy and not
> try to vectorize anything, and just rely on the fact that we're running the
same
> program on thousands of pixels in parallel to get the parallelism. In
> particular, vectorizing in the compiler is a losing game.
> 
> In NVIDIA parlance, the parallel execution units are called "warps" of 16 or
32
> "threads", where "thread" is sort of a misnomer (it's more like 1 lane of a
> SIMD). Within this warp
> - if the shader programs diverge (like have an if-statement), the shader will
> execute both paths of the if-statement 
> - if one of the threads is delayed cause of a texture cache miss, every thread
> is delayed
> 
> This is all a mental relief for me, cause it means I don't have to think about
> "is this vectorize-able".

That's very interesting. It's my impression that some GPUs operate more
efficiently on quads. I'd like to get a second opinion on on this, perhaps
piman@?

hubbe

On 2017/02/17 00:06:00, hubbe wrote: > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc > File ui/gfx/color_transform.cc (right): > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#newcode393 > ...

3 years, 10 months ago (2017-02-17 00:07:59 UTC) #8

On 2017/02/17 00:06:00, hubbe wrote:
> https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc
> File ui/gfx/color_transform.cc (right):
> 
>
https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
> ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string*
> result, const char* value) {
> On 2017/02/16 23:51:58, ccameron wrote:
> > On 2017/02/16 23:06:38, hubbe wrote:
> > > Please add some tests for this.
> > > It would also help me review the generated code, as I can see what the
code
> > > would look like in the test. :)
> > 
> > Mmh, good point -- this was hit a little bit by the GLRenderer tests, but it
> > didn't get all of the cases (where some things are 1 and others are 0).
> > 
> > I added a test that goes through a variety of SkColorSpaceTransferFns, to
> > generate the various combinations of elided operations.
> 
> Could I get one of two tests on the form:
> 
> string str;
> trasnform->AppendShaderSource(&str);
> EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ?
> 
>
https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
> ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, "color.r");
> On 2017/02/16 23:51:58, ccameron wrote:
> > On 2017/02/16 23:06:38, hubbe wrote:
> > > Is the compiler smart enough to vectorize these?
> > > I just wonder if it would be better to work on all channels
simultaneously?
> > 
> > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they operate
> the
> > same.
> > 
> > Since the ~Tesla generation, GPU shader cores are actually scalar units --
> when
> > you do matrix-vector multiplies, they're executed one-float-at-a-time. At
> least
> > at a per-pixel level.
> > 
> > The place where GPU shader cores get parallelism is that they're executing
the
> > same scalar program for each pixel. So, the compiler can be super-lazy and
not
> > try to vectorize anything, and just rely on the fact that we're running the
> same
> > program on thousands of pixels in parallel to get the parallelism. In
> > particular, vectorizing in the compiler is a losing game.
> > 
> > In NVIDIA parlance, the parallel execution units are called "warps" of 16 or
> 32
> > "threads", where "thread" is sort of a misnomer (it's more like 1 lane of a
> > SIMD). Within this warp
> > - if the shader programs diverge (like have an if-statement), the shader
will
> > execute both paths of the if-statement 
> > - if one of the threads is delayed cause of a texture cache miss, every
thread
> > is delayed
> > 
> > This is all a mental relief for me, cause it means I don't have to think
about
> > "is this vectorize-able".
> 
> That's very interesting. It's my impression that some GPUs operate more
> efficiently on quads. I'd like to get a second opinion on on this, perhaps
> piman@?

I guess piman is on leave, who else knows a lot about a variety of gpu
architectures?

chromium-reviews

PS: If we can't find a second opinion, I'm ok with checking it in and ...

3 years, 10 months ago (2017-02-17 00:09:09 UTC) #9

PS: If we can't find a second opinion, I'm ok with checking it in and let
the performance tests tell us if it sucks. :)


On Thu, Feb 16, 2017 at 4:07 PM, <hubbe@chromium.org> wrote:

> On 2017/02/17 00:06:00, hubbe wrote:
> > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/
> color_transform.cc
> > File ui/gfx/color_transform.cc (right):
> >
> >
> https://codereview.chromium.org/2697413002/diff/1/ui/gfx/
> color_transform.cc#newcode393
> > ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std:
> :string*
> > result, const char* value) {
> > On 2017/02/16 23:51:58, ccameron wrote:
> > > On 2017/02/16 23:06:38, hubbe wrote:
> > > > Please add some tests for this.
> > > > It would also help me review the generated code, as I can see what
> the
> code
> > > > would look like in the test. :)
> > >
> > > Mmh, good point -- this was hit a little bit by the GLRenderer tests,
> but it
> > > didn't get all of the cases (where some things are 1 and others are 0).
> > >
> > > I added a test that goes through a variety of SkColorSpaceTransferFns,
> to
> > > generate the various combinations of elided operations.
> >
> > Could I get one of two tests on the form:
> >
> > string str;
> > trasnform->AppendShaderSource(&str);
> > EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ?
> >
> >
> https://codereview.chromium.org/2697413002/diff/1/ui/gfx/
> color_transform.cc#newcode433
> > ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result,
> "color.r");
> > On 2017/02/16 23:51:58, ccameron wrote:
> > > On 2017/02/16 23:06:38, hubbe wrote:
> > > > Is the compiler smart enough to vectorize these?
> > > > I just wonder if it would be better to work on all channels
> simultaneously?
> > >
> > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they
> operate
> > the
> > > same.
> > >
> > > Since the ~Tesla generation, GPU shader cores are actually scalar
> units --
> > when
> > > you do matrix-vector multiplies, they're executed one-float-at-a-time.
> At
> > least
> > > at a per-pixel level.
> > >
> > > The place where GPU shader cores get parallelism is that they're
> executing
> the
> > > same scalar program for each pixel. So, the compiler can be super-lazy
> and
> not
> > > try to vectorize anything, and just rely on the fact that we're
> running the
> > same
> > > program on thousands of pixels in parallel to get the parallelism. In
> > > particular, vectorizing in the compiler is a losing game.
> > >
> > > In NVIDIA parlance, the parallel execution units are called "warps" of
> 16 or
> > 32
> > > "threads", where "thread" is sort of a misnomer (it's more like 1 lane
> of a
> > > SIMD). Within this warp
> > > - if the shader programs diverge (like have an if-statement), the
> shader
> will
> > > execute both paths of the if-statement
> > > - if one of the threads is delayed cause of a texture cache miss, every
> thread
> > > is delayed
> > >
> > > This is all a mental relief for me, cause it means I don't have to
> think
> about
> > > "is this vectorize-able".
> >
> > That's very interesting. It's my impression that some GPUs operate more
> > efficiently on quads. I'd like to get a second opinion on on this,
> perhaps
> > piman@?
>
> I guess piman is on leave, who else knows a lot about a variety of gpu
> architectures?
>
>
> https://codereview.chromium.org/2697413002/
>

-- 
You received this message because you are subscribed to the Google Groups
"Chromium-reviews" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to chromium-reviews+unsubscribe@chromium.org.

ccameron

The CQ bit was checked by ccameron@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-19 22:05:35 UTC) #10

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2697413002/80001

3 years, 10 months ago (2017-02-19 22:05:41 UTC) #11

ccameron

On 2017/02/17 00:09:09, chromium-reviews wrote: > > > > > Is the compiler smart enough ...

3 years, 10 months ago (2017-02-19 22:28:07 UTC) #12

On 2017/02/17 00:09:09, chromium-reviews wrote:

> > > > > Is the compiler smart enough to vectorize these?
> > > > > I just wonder if it would be better to work on all channels
> > > > > simultaneously?
> > > > > 
> > > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure GPU
> > > > cores are actually scalar units
> > > > 
> > > That's very interesting. It's my impression that some GPUs operate more
> > > efficiently on quads. I'd like to get a second opinion on on this,
> > > perhaps piman@?
> >
> > I guess piman is on leave, who else knows a lot about a variety of gpu
> > architectures?
> 
> PS: If we can't find a second opinion, I'm ok with checking it in and let
> the performance tests tell us if it sucks. :)
> 

Stéphane (cced) is also another good resource on this.

I don't think that performance concerns are warranted here. This really is the
sort of math that GPUs are designed to do efficiently.

I also did a number of experiments to verify that the behavior was as expected.
This document (which I've sent around a few times) runs the worst-case of this
shader source (in the "Analytic sRGB" column), and it out-performs LUTs across
the board.

https://docs.google.com/document/d/1lTyFs7_BtBAdL0ChoUtq1RVmP_1boGcnpqiNuNY5U...

Also of note (particularly WRT HDR) is that 
- in the experiments, the LUTs are 8-bit, which is likely much lower power than
16-bit
- with an analytic approach, we don't have to know what the input domain is

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc
File ui/gfx/color_transform.cc (right):

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string*
result, const char* value) {
On 2017/02/17 00:06:00, hubbe wrote:
> Could I get one of two tests on the form:
> 
> string str;
> trasnform->AppendShaderSource(&str);
> EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ?

I don't feel that that sort of test is particularly helpful -- it will fail
every time we change the shader source, but will give no signal as to whether or
not the shader source is correct.

I added a test similar to this, which verifies that the shader source doesn't
emit an "if" or a "pow" if they aren't needed -- is that closer to what you were
looking for?

We currently have 3 other sets of tests for this
1. GLRenderer tests that verify that all versions of this compile
2. GLRenderer pixel tests that verify that the output pixels did not change
3. Blink pixel layout tests that verify that the output pixels did not change

IMO we need to add more of category [2] tests (and I'm working on that, but it
has lots of unrelated plumbing that is needed first).

ccameron

ccameron@chromium.org changed reviewers: + marcheu@chromium.org

3 years, 10 months ago (2017-02-19 22:28:35 UTC) #13

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-19 23:00:14 UTC) #15

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_android_rel_ng/builds/235447)

3 years, 10 months ago (2017-02-19 23:00:15 UTC) #16

hubbe

https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#newcode393 ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { > it ...

3 years, 10 months ago (2017-02-20 05:01:02 UTC) #17

ccameron

The CQ bit was checked by ccameron@chromium.org to run a CQ dry run

3 years, 10 months ago (2017-02-21 22:32:29 UTC) #18

ccameron

Updated https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#newcode393 ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { On ...

3 years, 10 months ago (2017-02-21 22:32:42 UTC) #19

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2697413002/100001

3 years, 10 months ago (2017-02-21 22:35:01 UTC) #20

hubbe

LGTM (I'm still hoping for a second opinion on the shader code, but I'm fine ...

3 years, 10 months ago (2017-02-21 22:37:28 UTC) #21

marcheu

On 2017/02/19 22:28:07, ccameron wrote: > On 2017/02/17 00:09:09, chromium-reviews wrote: > > > > ...

3 years, 10 months ago (2017-02-21 22:48:52 UTC) #22

On 2017/02/19 22:28:07, ccameron wrote:
> On 2017/02/17 00:09:09, chromium-reviews wrote:
> 
> > > > > > Is the compiler smart enough to vectorize these?
> > > > > > I just wonder if it would be better to work on all channels
> > > > > > simultaneously?
> > > > > > 
> > > > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure GPU
> > > > > cores are actually scalar units
> > > > > 
> > > > That's very interesting. It's my impression that some GPUs operate more
> > > > efficiently on quads. I'd like to get a second opinion on on this,
> > > > perhaps piman@?
> > >
> > > I guess piman is on leave, who else knows a lot about a variety of gpu
> > > architectures?
> > 
> > PS: If we can't find a second opinion, I'm ok with checking it in and let
> > the performance tests tell us if it sucks. :)
> > 
> 
> Stéphane (cced) is also another good resource on this.
> 
> I don't think that performance concerns are warranted here. This really is the
> sort of math that GPUs are designed to do efficiently.

Sorry, I only see this now. I was gone all weekend and am catching up on email
backlog.

The debate on scalar vs vector GPU architectures is essentially over at this
point, since all GPU families, except for one, have switched to scalar models
within the last decade (nvidia was one of the first on nv50/g80, others followed
quickly). This happened just after the transition away from combiners and to
shaders; as soon as combiners disappeared, the notion of handling RGBA vectors
was made obsolete because it lacked flexibility. The last remaining GPU family
using vectors (one of the mobile GPUs) is in the process of switching too. 

So at this point, we shouldn't need to worry about how well things vectorize,
because at the low level it's not needed for efficiency.


> 
> I also did a number of experiments to verify that the behavior was as
expected.
> This document (which I've sent around a few times) runs the worst-case of this
> shader source (in the "Analytic sRGB" column), and it out-performs LUTs across
> the board.
> 
>
https://docs.google.com/document/d/1lTyFs7_BtBAdL0ChoUtq1RVmP_1boGcnpqiNuNY5U...
> 
> Also of note (particularly WRT HDR) is that 
> - in the experiments, the LUTs are 8-bit, which is likely much lower power
than
> 16-bit
> - with an analytic approach, we don't have to know what the input domain is
> 
> https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc
> File ui/gfx/color_transform.cc (right):
> 
>
https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n...
> ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string*
> result, const char* value) {
> On 2017/02/17 00:06:00, hubbe wrote:
> > Could I get one of two tests on the form:
> > 
> > string str;
> > trasnform->AppendShaderSource(&str);
> > EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ?
> 
> I don't feel that that sort of test is particularly helpful -- it will fail
> every time we change the shader source, but will give no signal as to whether
or
> not the shader source is correct.
> 
> I added a test similar to this, which verifies that the shader source doesn't
> emit an "if" or a "pow" if they aren't needed -- is that closer to what you
were
> looking for?
> 
> We currently have 3 other sets of tests for this
> 1. GLRenderer tests that verify that all versions of this compile
> 2. GLRenderer pixel tests that verify that the output pixels did not change
> 3. Blink pixel layout tests that verify that the output pixels did not change
> 
> IMO we need to add more of category [2] tests (and I'm working on that, but it
> has lots of unrelated plumbing that is needed first).

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-21 23:58:20 UTC) #23

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_swarming_rel/builds/123557)

3 years, 10 months ago (2017-02-21 23:58:21 UTC) #24

ccameron

The patchset sent to the CQ was uploaded after l-g-t-m from hubbe@chromium.org Link to the ...

3 years, 10 months ago (2017-02-22 02:44:37 UTC) #26

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2697413002/120001

3 years, 10 months ago (2017-02-22 02:45:24 UTC) #27

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 10 months ago (2017-02-22 04:01:07 UTC) #28

commit-bot: I haz the power

Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/387107)

3 years, 10 months ago (2017-02-22 04:01:08 UTC) #29

ccameron

The patchset sent to the CQ was uploaded after l-g-t-m from hubbe@chromium.org Link to the ...

3 years, 10 months ago (2017-02-22 09:49:09 UTC) #31

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2697413002/160001

3 years, 10 months ago (2017-02-22 09:50:11 UTC) #32

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 160001, "attempt_start_ts": 1487756948538780, "parent_rev": "85ed9bafd675dc129eebb238ac3c679d436e5d40", "commit_rev": "8958419e9b0c522ad702b225c9348d2486a456c9"}

3 years, 10 months ago (2017-02-22 11:47:37 UTC) #33

commit-bot: I haz the power

Description was changed from ========== color: Add analytic transfer functions in shaders Create a separate ...

3 years, 10 months ago (2017-02-22 11:48:15 UTC) #34

commit-bot: I haz the power

3 years, 10 months ago (2017-02-22 11:48:15 UTC) #35

Message was sent while issue was closed.

Committed patchset #9 (id:160001) as
https://chromium.googlesource.com/chromium/src/+/8958419e9b0c522ad702b225c934...

Issue 2697413002: color: Add analytic transfer functions in shaders (Closed)

Description

Patch Set 1 #

Patch Set 2 : Add rec601 for non-cancelling tests #

Patch Set 3 : More tests #

Patch Set 4 : Add tests for shader source #

Patch Set 5 : Rebase, use +1.8e for full precision #

Patch Set 6 : Add source unit test #

Patch Set 7 : Dont use ICC profile in shader source test #

Patch Set 8 : Disable on android #

Patch Set 9 : Disable on android #

Messages