|
|
Descriptioncolor: Add analytic transfer functions in shaders
Create a separate ColorTransformStep for SkColorSpaceTransferFn-based
transfer functions.
Add helpers SkTransferFnsApproximatelyCancel and
SkMatrixIsApproximatelyIdentity, which will empirically test if two
transfer functions nearly match.
The threshold for "nearly match" is chosen such that BT709 (the most
common video transfer function) and sRGB (the most common monitor
transfer function) are considered to "approximately cancel out".
Add code to generate shader source for SkColorSpaceTransferFn-based
transfer functions.
BUG=667966
CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel
Review-Url: https://codereview.chromium.org/2697413002
Cr-Commit-Position: refs/heads/master@{#452000}
Committed: https://chromium.googlesource.com/chromium/src/+/8958419e9b0c522ad702b225c9348d2486a456c9
Patch Set 1 #
Total comments: 9
Patch Set 2 : Add rec601 for non-cancelling tests #Patch Set 3 : More tests #Patch Set 4 : Add tests for shader source #Patch Set 5 : Rebase, use +1.8e for full precision #Patch Set 6 : Add source unit test #Patch Set 7 : Dont use ICC profile in shader source test #Patch Set 8 : Disable on android #Patch Set 9 : Disable on android #
Dependent Patchsets: Messages
Total messages: 35 (19 generated)
Description was changed from ========== color: Add analytic transfer functions Create a separate ColorTransformStep for SkColorSpaceTransferFn-based transfer functions. Add helpers SkTransferFnsApproximatelyCancel and SkMatrixIsApproximatelyIdentity, which will empirically test if two transfer functions nearly match. The threshold for "nearly match" is chosen such that BT709 (the most common video transfer function) and sRGB (the most common monitor transfer function) are considered to "approximately cancel out". Add code to generate shader source for SkColorSpaceTransferFn-based transfer functions. BUG=667966 ========== to ========== color: Add analytic transfer functions in shaders Create a separate ColorTransformStep for SkColorSpaceTransferFn-based transfer functions. Add helpers SkTransferFnsApproximatelyCancel and SkMatrixIsApproximatelyIdentity, which will empirically test if two transfer functions nearly match. The threshold for "nearly match" is chosen such that BT709 (the most common video transfer function) and sRGB (the most common monitor transfer function) are considered to "approximately cancel out". Add code to generate shader source for SkColorSpaceTransferFn-based transfer functions. BUG=667966 ==========
ccameron@chromium.org changed reviewers: + hubbe@chromium.org
I spent a while trying to write "SkTranfserFn almost cancel" functions analytically, but found that evaluating them at 8 points was very effective (and plotted the graphs in python to convince myself). I chose the epsilon of 2.5/256 so that bt709-video-to-sRGB will have the transfer functions cancel out (and added a test to verify it).
https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { Please add some tests for this. It would also help me review the generated code, as I can see what the code would look like in the test. :) https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, "color.r"); Is the compiler smart enough to vectorize these? I just wonder if it would be better to work on all channels simultaneously?
Description was changed from ========== color: Add analytic transfer functions in shaders Create a separate ColorTransformStep for SkColorSpaceTransferFn-based transfer functions. Add helpers SkTransferFnsApproximatelyCancel and SkMatrixIsApproximatelyIdentity, which will empirically test if two transfer functions nearly match. The threshold for "nearly match" is chosen such that BT709 (the most common video transfer function) and sRGB (the most common monitor transfer function) are considered to "approximately cancel out". Add code to generate shader source for SkColorSpaceTransferFn-based transfer functions. BUG=667966 ========== to ========== color: Add analytic transfer functions in shaders Create a separate ColorTransformStep for SkColorSpaceTransferFn-based transfer functions. Add helpers SkTransferFnsApproximatelyCancel and SkMatrixIsApproximatelyIdentity, which will empirically test if two transfer functions nearly match. The threshold for "nearly match" is chosen such that BT709 (the most common video transfer function) and sRGB (the most common monitor transfer function) are considered to "approximately cancel out". Add code to generate shader source for SkColorSpaceTransferFn-based transfer functions. BUG=667966 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ==========
Updated with a handful more tests. For the "bunch o different transfer functions", the generated sources look like... (and I'm seeing now that the "add a vec3 that is zero" could be eliminated...). // The identity. {1.f, 1.f, 0.f, 1.f, 0.f, 0.f, 0.f}, SOURCE: vec3 DoColorConversion(vec3 color) { color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659, 0.148399, 0.060623, 0.865465) * color; color += vec3(0.000000, 0.000000, 0.000000); return color; } // The identity, with an if statement. {1.f, 1.f, 0.f, 1.f, 0.5f, 0.f, 0.f}, SOURCE: vec3 DoColorConversion(vec3 color) { color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659, 0.148399, 0.060623, 0.865465) * color; color += vec3(0.000000, 0.000000, 0.000000); return color; } // Just the power function. {1.1f, 1.f, 0.f, 1.f, 0.f, 0.f, 0.f}, SOURCE: vec3 DoColorConversion(vec3 color) { color.r = pow(color.r, 1.100000); color.g = pow(color.g, 1.100000); color.b = pow(color.b, 1.100000); color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659, 0.148399, 0.060623, 0.865465) * color; color += vec3(0.000000, 0.000000, 0.000000); return color; } // Everything but the power function, nonlinear only. {1.f, 0.9f, 0.1f, 0.9f, 0.f, 0.1f, 0.1f}, SOURCE: vec3 DoColorConversion(vec3 color) { color.r = 0.900000 * color.r + 0.100000 + 0.100000; color.g = 0.900000 * color.g + 0.100000 + 0.100000; color.b = 0.900000 * color.b + 0.100000 + 0.100000; color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659, 0.148399, 0.060623, 0.865465) * color; color += vec3(0.000000, 0.000000, 0.000000); return color; } // Everything, nonlinear only. {1.1f, 0.9f, 0.1f, 0.9f, 0.f, 0.1f, 0.1f}, SOURCE: vec3 DoColorConversion(vec3 color) { color.r = pow(0.900000 * color.r + 0.100000, 1.100000) + 0.100000; color.g = pow(0.900000 * color.g + 0.100000, 1.100000) + 0.100000; color.b = pow(0.900000 * color.b + 0.100000, 1.100000) + 0.100000; color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659, 0.148399, 0.060623, 0.865465) * color; color += vec3(0.000000, 0.000000, 0.000000); return color; } // Everything but the power function. {1.f, 0.9f, 0.1f, 0.9f, 0.5f, 0.1f, 0.1f}, SOURCE: vec3 DoColorConversion(vec3 color) { if (color.r < 0.500000) color.r = 0.900000 * color.r+ 0.100000; else color.r = 0.900000 * color.r + 0.100000 + 0.100000; if (color.g < 0.500000) color.g = 0.900000 * color.g+ 0.100000; else color.g = 0.900000 * color.g + 0.100000 + 0.100000; if (color.b < 0.500000) color.b = 0.900000 * color.b+ 0.100000; else color.b = 0.900000 * color.b + 0.100000 + 0.100000; color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659, 0.148399, 0.060623, 0.865465) * color; color += vec3(0.000000, 0.000000, 0.000000); return color; } // Everything. {1.1f, 0.9f, 0.1f, 0.9f, 0.5f, 0.1f, 0.1f}, SOURCE: vec3 DoColorConversion(vec3 color) { if (color.r < 0.500000) color.r = 0.900000 * color.r+ 0.100000; else color.r = pow(0.900000 * color.r + 0.100000, 1.100000) + 0.100000; if (color.g < 0.500000) color.g = 0.900000 * color.g+ 0.100000; else color.g = pow(0.900000 * color.g + 0.100000, 1.100000) + 0.100000; if (color.b < 0.500000) color.b = 0.900000 * color.b+ 0.100000; else color.b = pow(0.900000 * color.b + 0.100000, 1.100000) + 0.100000; color = mat3(0.452210, 0.222478, 0.016876, 0.399392, 0.716899, 0.117659, 0.148399, 0.060623, 0.865465) * color; color += vec3(0.000000, 0.000000, 0.000000); return color; } https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { On 2017/02/16 23:06:38, hubbe wrote: > Please add some tests for this. > It would also help me review the generated code, as I can see what the code > would look like in the test. :) Mmh, good point -- this was hit a little bit by the GLRenderer tests, but it didn't get all of the cases (where some things are 1 and others are 0). I added a test that goes through a variety of SkColorSpaceTransferFns, to generate the various combinations of elided operations. https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, "color.r"); On 2017/02/16 23:06:38, hubbe wrote: > Is the compiler smart enough to vectorize these? > I just wonder if it would be better to work on all channels simultaneously? I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they operate the same. Since the ~Tesla generation, GPU shader cores are actually scalar units -- when you do matrix-vector multiplies, they're executed one-float-at-a-time. At least at a per-pixel level. The place where GPU shader cores get parallelism is that they're executing the same scalar program for each pixel. So, the compiler can be super-lazy and not try to vectorize anything, and just rely on the fact that we're running the same program on thousands of pixels in parallel to get the parallelism. In particular, vectorizing in the compiler is a losing game. In NVIDIA parlance, the parallel execution units are called "warps" of 16 or 32 "threads", where "thread" is sort of a misnomer (it's more like 1 lane of a SIMD). Within this warp - if the shader programs diverge (like have an if-statement), the shader will execute both paths of the if-statement - if one of the threads is delayed cause of a texture cache miss, every thread is delayed This is all a mental relief for me, cause it means I don't have to think about "is this vectorize-able".
https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { On 2017/02/16 23:51:58, ccameron wrote: > On 2017/02/16 23:06:38, hubbe wrote: > > Please add some tests for this. > > It would also help me review the generated code, as I can see what the code > > would look like in the test. :) > > Mmh, good point -- this was hit a little bit by the GLRenderer tests, but it > didn't get all of the cases (where some things are 1 and others are 0). > > I added a test that goes through a variety of SkColorSpaceTransferFns, to > generate the various combinations of elided operations. Could I get one of two tests on the form: string str; trasnform->AppendShaderSource(&str); EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ? https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, "color.r"); On 2017/02/16 23:51:58, ccameron wrote: > On 2017/02/16 23:06:38, hubbe wrote: > > Is the compiler smart enough to vectorize these? > > I just wonder if it would be better to work on all channels simultaneously? > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they operate the > same. > > Since the ~Tesla generation, GPU shader cores are actually scalar units -- when > you do matrix-vector multiplies, they're executed one-float-at-a-time. At least > at a per-pixel level. > > The place where GPU shader cores get parallelism is that they're executing the > same scalar program for each pixel. So, the compiler can be super-lazy and not > try to vectorize anything, and just rely on the fact that we're running the same > program on thousands of pixels in parallel to get the parallelism. In > particular, vectorizing in the compiler is a losing game. > > In NVIDIA parlance, the parallel execution units are called "warps" of 16 or 32 > "threads", where "thread" is sort of a misnomer (it's more like 1 lane of a > SIMD). Within this warp > - if the shader programs diverge (like have an if-statement), the shader will > execute both paths of the if-statement > - if one of the threads is delayed cause of a texture cache miss, every thread > is delayed > > This is all a mental relief for me, cause it means I don't have to think about > "is this vectorize-able". That's very interesting. It's my impression that some GPUs operate more efficiently on quads. I'd like to get a second opinion on on this, perhaps piman@?
On 2017/02/17 00:06:00, hubbe wrote: > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc > File ui/gfx/color_transform.cc (right): > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... > ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* > result, const char* value) { > On 2017/02/16 23:51:58, ccameron wrote: > > On 2017/02/16 23:06:38, hubbe wrote: > > > Please add some tests for this. > > > It would also help me review the generated code, as I can see what the code > > > would look like in the test. :) > > > > Mmh, good point -- this was hit a little bit by the GLRenderer tests, but it > > didn't get all of the cases (where some things are 1 and others are 0). > > > > I added a test that goes through a variety of SkColorSpaceTransferFns, to > > generate the various combinations of elided operations. > > Could I get one of two tests on the form: > > string str; > trasnform->AppendShaderSource(&str); > EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ? > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... > ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, "color.r"); > On 2017/02/16 23:51:58, ccameron wrote: > > On 2017/02/16 23:06:38, hubbe wrote: > > > Is the compiler smart enough to vectorize these? > > > I just wonder if it would be better to work on all channels simultaneously? > > > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they operate > the > > same. > > > > Since the ~Tesla generation, GPU shader cores are actually scalar units -- > when > > you do matrix-vector multiplies, they're executed one-float-at-a-time. At > least > > at a per-pixel level. > > > > The place where GPU shader cores get parallelism is that they're executing the > > same scalar program for each pixel. So, the compiler can be super-lazy and not > > try to vectorize anything, and just rely on the fact that we're running the > same > > program on thousands of pixels in parallel to get the parallelism. In > > particular, vectorizing in the compiler is a losing game. > > > > In NVIDIA parlance, the parallel execution units are called "warps" of 16 or > 32 > > "threads", where "thread" is sort of a misnomer (it's more like 1 lane of a > > SIMD). Within this warp > > - if the shader programs diverge (like have an if-statement), the shader will > > execute both paths of the if-statement > > - if one of the threads is delayed cause of a texture cache miss, every thread > > is delayed > > > > This is all a mental relief for me, cause it means I don't have to think about > > "is this vectorize-able". > > That's very interesting. It's my impression that some GPUs operate more > efficiently on quads. I'd like to get a second opinion on on this, perhaps > piman@? I guess piman is on leave, who else knows a lot about a variety of gpu architectures?
PS: If we can't find a second opinion, I'm ok with checking it in and let the performance tests tell us if it sucks. :) On Thu, Feb 16, 2017 at 4:07 PM, <hubbe@chromium.org> wrote: > On 2017/02/17 00:06:00, hubbe wrote: > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/ > color_transform.cc > > File ui/gfx/color_transform.cc (right): > > > > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/ > color_transform.cc#newcode393 > > ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std: > :string* > > result, const char* value) { > > On 2017/02/16 23:51:58, ccameron wrote: > > > On 2017/02/16 23:06:38, hubbe wrote: > > > > Please add some tests for this. > > > > It would also help me review the generated code, as I can see what > the > code > > > > would look like in the test. :) > > > > > > Mmh, good point -- this was hit a little bit by the GLRenderer tests, > but it > > > didn't get all of the cases (where some things are 1 and others are 0). > > > > > > I added a test that goes through a variety of SkColorSpaceTransferFns, > to > > > generate the various combinations of elided operations. > > > > Could I get one of two tests on the form: > > > > string str; > > trasnform->AppendShaderSource(&str); > > EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ? > > > > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/ > color_transform.cc#newcode433 > > ui/gfx/color_transform.cc:433: AppendShaderSourceChannel(result, > "color.r"); > > On 2017/02/16 23:51:58, ccameron wrote: > > > On 2017/02/16 23:06:38, hubbe wrote: > > > > Is the compiler smart enough to vectorize these? > > > > I just wonder if it would be better to work on all channels > simultaneously? > > > > > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure they > operate > > the > > > same. > > > > > > Since the ~Tesla generation, GPU shader cores are actually scalar > units -- > > when > > > you do matrix-vector multiplies, they're executed one-float-at-a-time. > At > > least > > > at a per-pixel level. > > > > > > The place where GPU shader cores get parallelism is that they're > executing > the > > > same scalar program for each pixel. So, the compiler can be super-lazy > and > not > > > try to vectorize anything, and just rely on the fact that we're > running the > > same > > > program on thousands of pixels in parallel to get the parallelism. In > > > particular, vectorizing in the compiler is a losing game. > > > > > > In NVIDIA parlance, the parallel execution units are called "warps" of > 16 or > > 32 > > > "threads", where "thread" is sort of a misnomer (it's more like 1 lane > of a > > > SIMD). Within this warp > > > - if the shader programs diverge (like have an if-statement), the > shader > will > > > execute both paths of the if-statement > > > - if one of the threads is delayed cause of a texture cache miss, every > thread > > > is delayed > > > > > > This is all a mental relief for me, cause it means I don't have to > think > about > > > "is this vectorize-able". > > > > That's very interesting. It's my impression that some GPUs operate more > > efficiently on quads. I'd like to get a second opinion on on this, > perhaps > > piman@? > > I guess piman is on leave, who else knows a lot about a variety of gpu > architectures? > > > https://codereview.chromium.org/2697413002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
The CQ bit was checked by ccameron@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2017/02/17 00:09:09, chromium-reviews wrote: > > > > > Is the compiler smart enough to vectorize these? > > > > > I just wonder if it would be better to work on all channels > > > > > simultaneously? > > > > > > > > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure GPU > > > > cores are actually scalar units > > > > > > > That's very interesting. It's my impression that some GPUs operate more > > > efficiently on quads. I'd like to get a second opinion on on this, > > > perhaps piman@? > > > > I guess piman is on leave, who else knows a lot about a variety of gpu > > architectures? > > PS: If we can't find a second opinion, I'm ok with checking it in and let > the performance tests tell us if it sucks. :) > Stéphane (cced) is also another good resource on this. I don't think that performance concerns are warranted here. This really is the sort of math that GPUs are designed to do efficiently. I also did a number of experiments to verify that the behavior was as expected. This document (which I've sent around a few times) runs the worst-case of this shader source (in the "Analytic sRGB" column), and it out-performs LUTs across the board. https://docs.google.com/document/d/1lTyFs7_BtBAdL0ChoUtq1RVmP_1boGcnpqiNuNY5U... Also of note (particularly WRT HDR) is that - in the experiments, the LUTs are 8-bit, which is likely much lower power than 16-bit - with an analytic approach, we don't have to know what the input domain is https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { On 2017/02/17 00:06:00, hubbe wrote: > Could I get one of two tests on the form: > > string str; > trasnform->AppendShaderSource(&str); > EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ? I don't feel that that sort of test is particularly helpful -- it will fail every time we change the shader source, but will give no signal as to whether or not the shader source is correct. I added a test similar to this, which verifies that the shader source doesn't emit an "if" or a "pow" if they aren't needed -- is that closer to what you were looking for? We currently have 3 other sets of tests for this 1. GLRenderer tests that verify that all versions of this compile 2. GLRenderer pixel tests that verify that the output pixels did not change 3. Blink pixel layout tests that verify that the output pixels did not change IMO we need to add more of category [2] tests (and I'm working on that, but it has lots of unrelated plumbing that is needed first).
ccameron@chromium.org changed reviewers: + marcheu@chromium.org
(actually ccing marcheu@)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { > it will fail every time we change the shader source That is exactly the point. I don't want a lot of these kind of tests, because that would be really annoying, but one or two will force people to explicitly show how their changes affect the generated code, which will make it much easier for the person reviewing the changes. (Which is likely to be you or me.)
The CQ bit was checked by ccameron@chromium.org to run a CQ dry run
Updated https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc File ui/gfx/color_transform.cc (right): https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* result, const char* value) { On 2017/02/20 05:01:02, hubbe wrote: > > > it will fail every time we change the shader source > > That is exactly the point. I don't want a lot of these kind of tests, because > that would be really annoying, but one or two will force people to explicitly > show how their changes affect the generated code, which will make it much easier > for the person reviewing the changes. (Which is likely to be you or me.) Ah, that's a good idea. I've added one test which does that.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
LGTM (I'm still hoping for a second opinion on the shader code, but I'm fine with checking it in.)
On 2017/02/19 22:28:07, ccameron wrote: > On 2017/02/17 00:09:09, chromium-reviews wrote: > > > > > > > Is the compiler smart enough to vectorize these? > > > > > > I just wonder if it would be better to work on all channels > > > > > > simultaneously? > > > > > > > > > > > I can't speak to anything but NVIDIA GPUs, but I'm pretty sure GPU > > > > > cores are actually scalar units > > > > > > > > > That's very interesting. It's my impression that some GPUs operate more > > > > efficiently on quads. I'd like to get a second opinion on on this, > > > > perhaps piman@? > > > > > > I guess piman is on leave, who else knows a lot about a variety of gpu > > > architectures? > > > > PS: If we can't find a second opinion, I'm ok with checking it in and let > > the performance tests tell us if it sucks. :) > > > > Stéphane (cced) is also another good resource on this. > > I don't think that performance concerns are warranted here. This really is the > sort of math that GPUs are designed to do efficiently. Sorry, I only see this now. I was gone all weekend and am catching up on email backlog. The debate on scalar vs vector GPU architectures is essentially over at this point, since all GPU families, except for one, have switched to scalar models within the last decade (nvidia was one of the first on nv50/g80, others followed quickly). This happened just after the transition away from combiners and to shaders; as soon as combiners disappeared, the notion of handling RGBA vectors was made obsolete because it lacked flexibility. The last remaining GPU family using vectors (one of the mobile GPUs) is in the process of switching too. So at this point, we shouldn't need to worry about how well things vectorize, because at the low level it's not needed for efficiency. > > I also did a number of experiments to verify that the behavior was as expected. > This document (which I've sent around a few times) runs the worst-case of this > shader source (in the "Analytic sRGB" column), and it out-performs LUTs across > the board. > > https://docs.google.com/document/d/1lTyFs7_BtBAdL0ChoUtq1RVmP_1boGcnpqiNuNY5U... > > Also of note (particularly WRT HDR) is that > - in the experiments, the LUTs are 8-bit, which is likely much lower power than > 16-bit > - with an analytic approach, we don't have to know what the input domain is > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc > File ui/gfx/color_transform.cc (right): > > https://codereview.chromium.org/2697413002/diff/1/ui/gfx/color_transform.cc#n... > ui/gfx/color_transform.cc:393: void AppendShaderSourceChannel(std::string* > result, const char* value) { > On 2017/02/17 00:06:00, hubbe wrote: > > Could I get one of two tests on the form: > > > > string str; > > trasnform->AppendShaderSource(&str); > > EXPECT_EQ(str, "LITERAL SHADER SOURCE HERE"); ? > > I don't feel that that sort of test is particularly helpful -- it will fail > every time we change the shader source, but will give no signal as to whether or > not the shader source is correct. > > I added a test similar to this, which verifies that the shader source doesn't > emit an "if" or a "pow" if they aren't needed -- is that closer to what you were > looking for? > > We currently have 3 other sets of tests for this > 1. GLRenderer tests that verify that all versions of this compile > 2. GLRenderer pixel tests that verify that the output pixels did not change > 3. Blink pixel layout tests that verify that the output pixels did not change > > IMO we need to add more of category [2] tests (and I'm working on that, but it > has lots of unrelated plumbing that is needed first).
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by ccameron@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from hubbe@chromium.org Link to the patchset: https://codereview.chromium.org/2697413002/#ps120001 (title: "Dont use ICC profile in shader source test")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by ccameron@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from hubbe@chromium.org Link to the patchset: https://codereview.chromium.org/2697413002/#ps160001 (title: "Disable on android")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 160001, "attempt_start_ts": 1487756948538780, "parent_rev": "85ed9bafd675dc129eebb238ac3c679d436e5d40", "commit_rev": "8958419e9b0c522ad702b225c9348d2486a456c9"}
Message was sent while issue was closed.
Description was changed from ========== color: Add analytic transfer functions in shaders Create a separate ColorTransformStep for SkColorSpaceTransferFn-based transfer functions. Add helpers SkTransferFnsApproximatelyCancel and SkMatrixIsApproximatelyIdentity, which will empirically test if two transfer functions nearly match. The threshold for "nearly match" is chosen such that BT709 (the most common video transfer function) and sRGB (the most common monitor transfer function) are considered to "approximately cancel out". Add code to generate shader source for SkColorSpaceTransferFn-based transfer functions. BUG=667966 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel ========== to ========== color: Add analytic transfer functions in shaders Create a separate ColorTransformStep for SkColorSpaceTransferFn-based transfer functions. Add helpers SkTransferFnsApproximatelyCancel and SkMatrixIsApproximatelyIdentity, which will empirically test if two transfer functions nearly match. The threshold for "nearly match" is chosen such that BT709 (the most common video transfer function) and sRGB (the most common monitor transfer function) are considered to "approximately cancel out". Add code to generate shader source for SkColorSpaceTransferFn-based transfer functions. BUG=667966 CQ_INCLUDE_TRYBOTS=master.tryserver.blink:linux_trusty_blink_rel Review-Url: https://codereview.chromium.org/2697413002 Cr-Commit-Position: refs/heads/master@{#452000} Committed: https://chromium.googlesource.com/chromium/src/+/8958419e9b0c522ad702b225c934... ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001) as https://chromium.googlesource.com/chromium/src/+/8958419e9b0c522ad702b225c934... |