|
|
Descriptionstarter procs for blending with pm4f
curr/maxrss loops min median mean max stddev samples config bench
8/8 MB 4 87.1µs 91µs 89.8µs 92µs 2% ▇▇▇▇█▇▅▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear
9/9 MB 2 196µs 196µs 215µs 383µs 27% ▁▁▁▁█▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_srgb
9/9 MB 1 313µs 313µs 313µs 313µs 0% ▁▄▅▅▅▂████ nonrendering xfer4f_srcover_N_alpha_linear
9/9 MB 1 580µs 580µs 582µs 602µs 1% ▁▁▁▁▁▁▂▁▁█ nonrendering xfer4f_srcover_N_alpha_srgb
9/9 MB 23 13.1µs 13.1µs 13.1µs 13.1µs 0% ▆▄▄█▂▂▂▁▂▁ nonrendering xfer4f_srcover_1_opaque_linear
9/9 MB 23 13.2µs 13.2µs 13.2µs 13.2µs 0% █▄▂▁▃▁▂▂▂▂ nonrendering xfer4f_srcover_1_opaque_srgb
9/9 MB 2 178µs 183µs 183µs 185µs 1% ▇▇▇█▇▇▇▇▇▁ nonrendering xfer4f_srcover_1_alpha_linear
9/9 MB 1 517µs 517µs 517µs 517µs 0% ▇█▄▃▄▁▂▁▂▄ nonrendering xfer4f_srcover_1_alpha_srgb
BUG=skia:
GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&issue=1642703003
TBR=
landing now so these incremental types/functions can be used to collaborate with herb's work. nothing is active at this point
Committed: https://skia.googlesource.com/skia/+/fbc1e296b2e98dc76de533a2bb45d9ccc8c2498f
Patch Set 1 #Patch Set 2 : #Patch Set 3 : #Patch Set 4 : int to float #
Total comments: 27
Messages
Total messages: 31 (15 generated)
Description was changed from ========== starter procs for blending with pm4f BUG=skia: ========== to ========== starter procs for blending with pm4f BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
Description was changed from ========== starter procs for blending with pm4f BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== starter procs for blending with pm4f curr/maxrss loops min median mean max stddev samples config bench 8/8 MB 4 91µs 91.2µs 193µs 563µs 89% ▁█▆▄▁▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear 8/8 MB 3 151µs 151µs 152µs 154µs 1% █▁▁▁▁▁▁▁▄▁ nonrendering xfer4f_srcover_N_opaque_srgb 8/8 MB 2 289µs 290µs 315µs 504µs 21% ▂▁▁█▁▁▁▁▁▁ nonrendering xfer4f_srcover_N_alpha_linear 8/8 MB 1 496µs 496µs 496µs 497µs 0% ▁▂▂▂▂▂█▂▂▂ nonrendering xfer4f_srcover_N_alpha_srgb 8/8 MB 24 13.3µs 13.3µs 13.3µs 13.5µs 0% ▁█▁▁▁▁▁▁▁▁ nonrendering xfer4f_srcover_1_opaque_linear 8/8 MB 25 12.6µs 12.6µs 12.6µs 12.6µs 0% █▄▃▂▁▂▁▂▁▃ nonrendering xfer4f_srcover_1_opaque_srgb 8/8 MB 2 175µs 175µs 176µs 182µs 1% ▁▁▁█▁▁▁▁▁▁ nonrendering xfer4f_srcover_1_alpha_linear 8/8 MB 1 479µs 479µs 479µs 479µs 0% █▆▂▂▂▂▂▂▁▂ nonrendering xfer4f_srcover_1_alpha_srgb BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
reed@google.com changed reviewers: + herb@google.com, mtklein@google.com
just a starter set. - need to support coverage at some point - need to swap in "correct" srgb conversions - need to return something more generic for other modes
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1642703003/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1642703003/1
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Mac10.9-Clang-Arm7-Debug-iOS-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Mac10.9-Clang-...) Build-Ubuntu-GCC-Mips-Debug-Android-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Ubuntu-GCC-Mip...)
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1642703003/20001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1642703003/20001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== starter procs for blending with pm4f curr/maxrss loops min median mean max stddev samples config bench 8/8 MB 4 91µs 91.2µs 193µs 563µs 89% ▁█▆▄▁▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear 8/8 MB 3 151µs 151µs 152µs 154µs 1% █▁▁▁▁▁▁▁▄▁ nonrendering xfer4f_srcover_N_opaque_srgb 8/8 MB 2 289µs 290µs 315µs 504µs 21% ▂▁▁█▁▁▁▁▁▁ nonrendering xfer4f_srcover_N_alpha_linear 8/8 MB 1 496µs 496µs 496µs 497µs 0% ▁▂▂▂▂▂█▂▂▂ nonrendering xfer4f_srcover_N_alpha_srgb 8/8 MB 24 13.3µs 13.3µs 13.3µs 13.5µs 0% ▁█▁▁▁▁▁▁▁▁ nonrendering xfer4f_srcover_1_opaque_linear 8/8 MB 25 12.6µs 12.6µs 12.6µs 12.6µs 0% █▄▃▂▁▂▁▂▁▃ nonrendering xfer4f_srcover_1_opaque_srgb 8/8 MB 2 175µs 175µs 176µs 182µs 1% ▁▁▁█▁▁▁▁▁▁ nonrendering xfer4f_srcover_1_alpha_linear 8/8 MB 1 479µs 479µs 479µs 479µs 0% █▆▂▂▂▂▂▂▁▂ nonrendering xfer4f_srcover_1_alpha_srgb BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== starter procs for blending with pm4f curr/maxrss loops min median mean max stddev samples config bench 8/8 MB 4 87.1µs 91µs 89.8µs 92µs 2% ▇▇▇▇█▇▅▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear 9/9 MB 2 196µs 196µs 215µs 383µs 27% ▁▁▁▁█▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_srgb 9/9 MB 1 313µs 313µs 313µs 313µs 0% ▁▄▅▅▅▂████ nonrendering xfer4f_srcover_N_alpha_linear 9/9 MB 1 580µs 580µs 582µs 602µs 1% ▁▁▁▁▁▁▂▁▁█ nonrendering xfer4f_srcover_N_alpha_srgb 9/9 MB 23 13.1µs 13.1µs 13.1µs 13.1µs 0% ▆▄▄█▂▂▂▁▂▁ nonrendering xfer4f_srcover_1_opaque_linear 9/9 MB 23 13.2µs 13.2µs 13.2µs 13.2µs 0% █▄▂▁▃▁▂▂▂▂ nonrendering xfer4f_srcover_1_opaque_srgb 9/9 MB 2 178µs 183µs 183µs 185µs 1% ▇▇▇█▇▇▇▇▇▁ nonrendering xfer4f_srcover_1_alpha_linear 9/9 MB 1 517µs 517µs 517µs 517µs 0% ▇█▄▃▄▁▂▁▂▄ nonrendering xfer4f_srcover_1_alpha_srgb BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ==========
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1642703003/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1642703003/40001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: Build-Win-MSVC-x86_64-Debug-Trybot on client.skia.compile (JOB_FAILED, http://build.chromium.org/p/client.skia.compile/builders/Build-Win-MSVC-x86_6...)
The CQ bit was checked by reed@google.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1642703003/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1642703003/60001
Now with bench and gm. PTAL
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
Description was changed from ========== starter procs for blending with pm4f curr/maxrss loops min median mean max stddev samples config bench 8/8 MB 4 87.1µs 91µs 89.8µs 92µs 2% ▇▇▇▇█▇▅▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear 9/9 MB 2 196µs 196µs 215µs 383µs 27% ▁▁▁▁█▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_srgb 9/9 MB 1 313µs 313µs 313µs 313µs 0% ▁▄▅▅▅▂████ nonrendering xfer4f_srcover_N_alpha_linear 9/9 MB 1 580µs 580µs 582µs 602µs 1% ▁▁▁▁▁▁▂▁▁█ nonrendering xfer4f_srcover_N_alpha_srgb 9/9 MB 23 13.1µs 13.1µs 13.1µs 13.1µs 0% ▆▄▄█▂▂▂▁▂▁ nonrendering xfer4f_srcover_1_opaque_linear 9/9 MB 23 13.2µs 13.2µs 13.2µs 13.2µs 0% █▄▂▁▃▁▂▂▂▂ nonrendering xfer4f_srcover_1_opaque_srgb 9/9 MB 2 178µs 183µs 183µs 185µs 1% ▇▇▇█▇▇▇▇▇▁ nonrendering xfer4f_srcover_1_alpha_linear 9/9 MB 1 517µs 517µs 517µs 517µs 0% ▇█▄▃▄▁▂▁▂▄ nonrendering xfer4f_srcover_1_alpha_srgb BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... ========== to ========== starter procs for blending with pm4f curr/maxrss loops min median mean max stddev samples config bench 8/8 MB 4 87.1µs 91µs 89.8µs 92µs 2% ▇▇▇▇█▇▅▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear 9/9 MB 2 196µs 196µs 215µs 383µs 27% ▁▁▁▁█▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_srgb 9/9 MB 1 313µs 313µs 313µs 313µs 0% ▁▄▅▅▅▂████ nonrendering xfer4f_srcover_N_alpha_linear 9/9 MB 1 580µs 580µs 582µs 602µs 1% ▁▁▁▁▁▁▂▁▁█ nonrendering xfer4f_srcover_N_alpha_srgb 9/9 MB 23 13.1µs 13.1µs 13.1µs 13.1µs 0% ▆▄▄█▂▂▂▁▂▁ nonrendering xfer4f_srcover_1_opaque_linear 9/9 MB 23 13.2µs 13.2µs 13.2µs 13.2µs 0% █▄▂▁▃▁▂▂▂▂ nonrendering xfer4f_srcover_1_opaque_srgb 9/9 MB 2 178µs 183µs 183µs 185µs 1% ▇▇▇█▇▇▇▇▇▁ nonrendering xfer4f_srcover_1_alpha_linear 9/9 MB 1 517µs 517µs 517µs 517µs 0% ▇█▄▃▄▁▂▁▂▄ nonrendering xfer4f_srcover_1_alpha_srgb BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... TBR= landing now so these incremental types/functions can be used to collaborate with herb's work. nothing is active at this point ==========
The CQ bit was checked by reed@google.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1642703003/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1642703003/60001
Message was sent while issue was closed.
Description was changed from ========== starter procs for blending with pm4f curr/maxrss loops min median mean max stddev samples config bench 8/8 MB 4 87.1µs 91µs 89.8µs 92µs 2% ▇▇▇▇█▇▅▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear 9/9 MB 2 196µs 196µs 215µs 383µs 27% ▁▁▁▁█▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_srgb 9/9 MB 1 313µs 313µs 313µs 313µs 0% ▁▄▅▅▅▂████ nonrendering xfer4f_srcover_N_alpha_linear 9/9 MB 1 580µs 580µs 582µs 602µs 1% ▁▁▁▁▁▁▂▁▁█ nonrendering xfer4f_srcover_N_alpha_srgb 9/9 MB 23 13.1µs 13.1µs 13.1µs 13.1µs 0% ▆▄▄█▂▂▂▁▂▁ nonrendering xfer4f_srcover_1_opaque_linear 9/9 MB 23 13.2µs 13.2µs 13.2µs 13.2µs 0% █▄▂▁▃▁▂▂▂▂ nonrendering xfer4f_srcover_1_opaque_srgb 9/9 MB 2 178µs 183µs 183µs 185µs 1% ▇▇▇█▇▇▇▇▇▁ nonrendering xfer4f_srcover_1_alpha_linear 9/9 MB 1 517µs 517µs 517µs 517µs 0% ▇█▄▃▄▁▂▁▂▄ nonrendering xfer4f_srcover_1_alpha_srgb BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... TBR= landing now so these incremental types/functions can be used to collaborate with herb's work. nothing is active at this point ========== to ========== starter procs for blending with pm4f curr/maxrss loops min median mean max stddev samples config bench 8/8 MB 4 87.1µs 91µs 89.8µs 92µs 2% ▇▇▇▇█▇▅▁▁▁ nonrendering xfer4f_srcover_N_opaque_linear 9/9 MB 2 196µs 196µs 215µs 383µs 27% ▁▁▁▁█▁▁▁▁▁ nonrendering xfer4f_srcover_N_opaque_srgb 9/9 MB 1 313µs 313µs 313µs 313µs 0% ▁▄▅▅▅▂████ nonrendering xfer4f_srcover_N_alpha_linear 9/9 MB 1 580µs 580µs 582µs 602µs 1% ▁▁▁▁▁▁▂▁▁█ nonrendering xfer4f_srcover_N_alpha_srgb 9/9 MB 23 13.1µs 13.1µs 13.1µs 13.1µs 0% ▆▄▄█▂▂▂▁▂▁ nonrendering xfer4f_srcover_1_opaque_linear 9/9 MB 23 13.2µs 13.2µs 13.2µs 13.2µs 0% █▄▂▁▃▁▂▂▂▂ nonrendering xfer4f_srcover_1_opaque_srgb 9/9 MB 2 178µs 183µs 183µs 185µs 1% ▇▇▇█▇▇▇▇▇▁ nonrendering xfer4f_srcover_1_alpha_linear 9/9 MB 1 517µs 517µs 517µs 517µs 0% ▇█▄▃▄▁▂▁▂▄ nonrendering xfer4f_srcover_1_alpha_srgb BUG=skia: GOLD_TRYBOT_URL= https://gold.skia.org/search2?unt=true&query=source_type%3Dgm&master=false&is... TBR= landing now so these incremental types/functions can be used to collaborate with herb's work. nothing is active at this point Committed: https://skia.googlesource.com/skia/+/fbc1e296b2e98dc76de533a2bb45d9ccc8c2498f ==========
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as https://skia.googlesource.com/skia/+/fbc1e296b2e98dc76de533a2bb45d9ccc8c2498f
Message was sent while issue was closed.
https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp File bench/Xfer4fBench.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp#n... bench/Xfer4fBench.cpp:26: c.fVec[0] = 1; c.fVec[1] = 1; c.fVec[2] = 1; c.fVec[3] = 1; This can also be SkPM4f c = {{ 1,1,1,1 }}; https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp#n... bench/Xfer4fBench.cpp:39: for (int i = 0; i < loops; ++i) { Generally, you can also write for (int i = 0; i < INNER_LOOPS*loops; ++i) { ... } https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp#n... bench/Xfer4fBench.cpp:65: DEF_BENCH( return new Xfer4fBench(SkXfermode::kSrcOver_Mode, "srcover", false, kDstIsSRGB_SkXfer4fFlag); ) Wanna throw in equal-pixel-count 8888 benchmarks here as a reference? https://codereview.chromium.org/1642703003/diff/60001/gm/xfer4f.cpp File gm/xfer4f.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/gm/xfer4f.cpp#newcode30 gm/xfer4f.cpp:30: const SkPM4f src = SkPM4f::FromPMColor(SkPreMultiplyColor(c)); Just had a thought that we might want to generally do this in the opposite order (-> float, then premul), so that the premultiply step does not lose precision. https://codereview.chromium.org/1642703003/diff/60001/include/core/SkColor.h File include/core/SkColor.h (right): https://codereview.chromium.org/1642703003/diff/60001/include/core/SkColor.h#... include/core/SkColor.h:172: }; +1 https://codereview.chromium.org/1642703003/diff/60001/include/core/SkColor.h#... include/core/SkColor.h:175: float a() const { return fVec[3]; } Ahem, fVec[A]; https://codereview.chromium.org/1642703003/diff/60001/src/core/SkColor.cpp File src/core/SkColor.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/src/core/SkColor.cpp#ne... src/core/SkColor.cpp:145: bool SkPM4f::isUnit() const { If this is only used inside asserts, we may want to power this down to #ifdef SK_DEBUG void assertIsUnit() const; #else void assertIsUnit() const {} #endif allTrue() is the best way to do this sort of thing, but it's not exactly fast (especially on ARM). Best to not use in hot code. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h File src/core/SkPM4fPriv.h (right): https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h#n... src/core/SkPM4fPriv.h:30: static inline Sk4f s2l(const Sk4f& s4) { Had to think about what these were for a second. Might wanna expand some of these names out: sRGB_to_linear(), linear_to_sRGB(). https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h#n... src/core/SkPM4fPriv.h:58: static Sk4f unit_to_l255_round(const SkPM4f& pm4) { I think this might be clearer if we start first with completely orthogonal helper functions, e.g. Sk4f to_4f(uint32_t); uint32_t to_4b(Sk4f f255); Sk4f scale_255_to_1(Sk4f f255); Sk4f scale_1_to_255(Sk4f f1); Sk4f to_sRGB(Sk4f f1); Sk4f to_linear(Sk4f f1); then build the compound operations (sRGB 8888 -> linear unit float) out of those exclusively, either as more static helper functions after a big slashline or as little local lambdas. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h#n... src/core/SkPM4fPriv.h:67: for (int i = 0; i < (count >> 2); ++i) { This looks right to me, but in my experience I've found a loop skeleton that ticks `count` down is more glanceably correct when different strides are involved: while (count >= 4) { ... src += 4; dst += 4; count -= 4; } while (count >= 1) { ... src++; dst++; count--; } Usually these loops don't require us to name or think about an `i` either, which is nice when it's not otherwise useful. Neither style measurably affects performance. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h#n... src/core/SkPM4fPriv.h:94: static inline void SkPM4f_s32_src_mode(SkPMColor dst[], const SkPM4f src[], int count) { I think we're going to have to come up with a new abbreviation for sRGB 8888, or not abbreviate it. I keep reading this as "uses 4 floats... 32-bit source... src mode". "s32" screams 32-bit source to me. Let's just write out _sRGB_ and _linear_ ? We could probably name these SkPM4f_src_sRGB SkPM4f_src_linear SkPM4f_srcover_sRGB SkPM4f_srcover_linear without losing any information. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp File src/core/SkXfer4f.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp#n... src/core/SkXfer4f.cpp:14: void CLEAR_pm41p(uint32_t dst[], const SkPM4f& src, int count) { pm41p / pm4np is a bit beyond my readability decryption limit. I don't think we need the pm4 on any of these... they're all going to have that for the foreseeable future. I'm not sure what the last 'p' is for. How about one of: static void CLEAR_const_src(uint32_t dst[], const SkPM4f& src, int count) static void CLEAR_array_src(uint32_t dst[], const SkPM4f* src, int count) or static void CLEAR_1(uint32_t dst[], const SkPM4f& src, int n) static void CLEAR_n(uint32_t dst[], const SkPM4f* src, int n) (I think all these functions in here can/should be static.) https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp#n... src/core/SkXfer4f.cpp:64: dst[i] = to_4b(s4 + d4 * scale + Sk4f(0.5f)); Add the half to s4 outside the loop too? https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp#n... src/core/SkXfer4f.cpp:79: struct Pair { This struct seems a little dangerous without an anonymous namespace around it. It's very easy to violate ODR with a name like Pair. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp#n... src/core/SkXfer4f.cpp:84: const Pair gClearPairs[] = { static for all these too? https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp#n... src/core/SkXfer4f.cpp:117: case SkXfermode::kClear_Mode: pairs = gClearPairs; I think you've missed some breaks? https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp#n... src/core/SkXfer4f.cpp:123: return &pairs[flags & 3]; I think this could be a lot simpler if we return Pair instead of const Pair*. static Pair find_pair(SkXfermode::Mode mode, uint32_t flags) { SkASSERT(0 == (flags& ~3)); flags &= 3; switch (mode) { case SkXfermode::kClear_Mode: return {CLEAR_pm41p, CLEAR_pm4np}; case SkXfermode::kDst_Mode: return { DST_pm41p, DST_pm4np}; ... case SkXfermode::kSrc_Mode: return gSrcPairs [flags]; case SkXfermode::kSrcOver_Mode: return gSrcOverPairs[flags]; ... default: return {nullptr, nullptr} } } and the callers: return find_pair(mode, flags).fProc1; or return find_pair(mode, flags).fProcN;
Message was sent while issue was closed.
https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp File src/core/SkXfer4f.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/src/core/SkXfer4f.cpp#n... src/core/SkXfer4f.cpp:55: s4 = s4 * Sk4f(255); I think if we break these problems apart into independent factors (sRGB vs. linear dst, single vs many src, blend mode) we can piece them all together with templates and end up with just as fast code. This'd let us write each mode once, figure out sRGB and back once, and let the compiler handle the single-src work hoisting. (The compiler can eliminate all the unnecessary work for special cases like Dst, Src and Clear on its own.) I'd be happy to try to sketch this up.
Message was sent while issue was closed.
all fixes to comments rolling into https://codereview.chromium.org/1634273002/ except that SkXfer4f.cpp is totally gone https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp File bench/Xfer4fBench.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp#n... bench/Xfer4fBench.cpp:26: c.fVec[0] = 1; c.fVec[1] = 1; c.fVec[2] = 1; c.fVec[3] = 1; On 2016/01/29 15:23:00, mtklein wrote: > This can also be > SkPM4f c = {{ 1,1,1,1 }}; Done. https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp#n... bench/Xfer4fBench.cpp:39: for (int i = 0; i < loops; ++i) { On 2016/01/29 15:23:00, mtklein wrote: > Generally, you can also write > for (int i = 0; i < INNER_LOOPS*loops; ++i) { > ... > } Ah, so you never pass 0 for loops? https://codereview.chromium.org/1642703003/diff/60001/gm/xfer4f.cpp File gm/xfer4f.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/gm/xfer4f.cpp#newcode30 gm/xfer4f.cpp:30: const SkPM4f src = SkPM4f::FromPMColor(SkPreMultiplyColor(c)); On 2016/01/29 15:23:00, mtklein wrote: > Just had a thought that we might want to generally do this in the opposite order > (-> float, then premul), so that the premultiply step does not lose precision. Totally agree, was just being lazy since I already have a helper function to premul the SkColor. Will change. https://codereview.chromium.org/1642703003/diff/60001/include/core/SkColor.h File include/core/SkColor.h (right): https://codereview.chromium.org/1642703003/diff/60001/include/core/SkColor.h#... include/core/SkColor.h:175: float a() const { return fVec[3]; } On 2016/01/29 15:23:00, mtklein wrote: > Ahem, fVec[A]; Doh https://codereview.chromium.org/1642703003/diff/60001/src/core/SkColor.cpp File src/core/SkColor.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/src/core/SkColor.cpp#ne... src/core/SkColor.cpp:145: bool SkPM4f::isUnit() const { On 2016/01/29 15:23:00, mtklein wrote: > If this is only used inside asserts, we may want to power this down to > > #ifdef SK_DEBUG > void assertIsUnit() const; > #else > void assertIsUnit() const {} > #endif > > allTrue() is the best way to do this sort of thing, but it's not exactly fast > (especially on ARM). Best to not use in hot code. Gotcha. No idea if we'll need it for real, so for now I'll just make it a debug-assert. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h File src/core/SkPM4fPriv.h (right): https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h#n... src/core/SkPM4fPriv.h:30: static inline Sk4f s2l(const Sk4f& s4) { On 2016/01/29 15:23:00, mtklein wrote: > Had to think about what these were for a second. Might wanna expand some of > these names out: sRGB_to_linear(), linear_to_sRGB(). Done. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h#n... src/core/SkPM4fPriv.h:58: static Sk4f unit_to_l255_round(const SkPM4f& pm4) { On 2016/01/29 15:23:00, mtklein wrote: > I think this might be clearer if we start first with completely orthogonal > helper functions, e.g. > > Sk4f to_4f(uint32_t); > uint32_t to_4b(Sk4f f255); > > Sk4f scale_255_to_1(Sk4f f255); > Sk4f scale_1_to_255(Sk4f f1); > > Sk4f to_sRGB(Sk4f f1); > Sk4f to_linear(Sk4f f1); > > then build the compound operations (sRGB 8888 -> linear unit float) out of those > exclusively, either as more static helper functions after a big slashline or as > little local lambdas. > Agree, this is definitely a worktable of experimental helpers. Will work to refine/reduce over time. https://codereview.chromium.org/1642703003/diff/60001/src/core/SkPM4fPriv.h#n... src/core/SkPM4fPriv.h:94: static inline void SkPM4f_s32_src_mode(SkPMColor dst[], const SkPM4f src[], int count) { On 2016/01/29 15:23:00, mtklein wrote: > I think we're going to have to come up with a new abbreviation for sRGB 8888, or > not abbreviate it. I keep reading this as "uses 4 floats... 32-bit source... > src mode". "s32" screams 32-bit source to me. > > Let's just write out _sRGB_ and _linear_ ? > > We could probably name these > SkPM4f_src_sRGB > SkPM4f_src_linear > SkPM4f_srcover_sRGB > SkPM4f_srcover_linear > without losing any information. Done.
Message was sent while issue was closed.
https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp File bench/Xfer4fBench.cpp (right): https://codereview.chromium.org/1642703003/diff/60001/bench/Xfer4fBench.cpp#n... bench/Xfer4fBench.cpp:39: for (int i = 0; i < loops; ++i) { On 2016/01/29 22:35:27, reed1 wrote: > On 2016/01/29 15:23:00, mtklein wrote: > > Generally, you can also write > > for (int i = 0; i < INNER_LOOPS*loops; ++i) { > > ... > > } > > Ah, so you never pass 0 for loops? Um, it's true that nanobench happens to never pass 0 for loops, but I can't help but ask... what do you think goes wrong if loops is 0? That's still just for (int i = 0; i < 0; ++i) { ... } |