|
|
Created:
7 years, 3 months ago by kevin.petit.not.used.account Modified:
7 years, 2 months ago CC:
skia-review_googlegroups.com Base URL:
https://skia.googlecode.com/svn/trunk Visibility:
Public. |
DescriptionARM Skia NEON patches - 28 - Xfermode: SIMD modeprocs
Xfermode: allow for SIMD modeprocs
This patch introduces the ability to have SIMD Xfermode modeprocs.
In the NEON implementation, SIMD modeprocs will process 8 pixels
at a time.
Signed-off-by: Kévin PETIT <kevin.petit@arm.com>
BUG=
Committed: http://code.google.com/p/skia/source/detail?r=11654
Committed: http://code.google.com/p/skia/source/detail?r=11669
Patch Set 1 #
Total comments: 5
Patch Set 2 : #
Total comments: 6
Patch Set 3 : Review comments / cleaning #Patch Set 4 : Review comments #Patch Set 5 : Rebase #Patch Set 6 : Fix a warning with clang #Patch Set 7 : Fix android build + implement serialization #
Messages
Total messages: 28 (0 generated)
ping. This is the foundation for a series of ~20 patches. The whole series: - has no mismatches - brings a big speedup on Xfermodes (up to 3x) - should allow good speed improvements on some CSS-intensive web workloads The speedup is already interesting despite gcc (applies to 4.7) generating *really bad* code for a lot of things. When gcc has it completely right, it will be possible to: - remove the few bits of assembly (4.8 should allow to do this) - benefit from speedups up to 10x (!)
I think this CL can (and probably should) be shrunken down to not affect any of the gyp files (with the exception of adding a few new files to opts.gyp. Making it line up with the factory like methods we use for other opts will allow us to have greater flexibility in environments where we don't know the capabilities of the system until runtime. https://codereview.chromium.org/23644006/diff/1/include/core/SkXfermode.h File include/core/SkXfermode.h (right): https://codereview.chromium.org/23644006/diff/1/include/core/SkXfermode.h#new... include/core/SkXfermode.h:15: #include "SkXfermode_opts.h" we shouldn't have private headers exposed in our public includes. https://codereview.chromium.org/23644006/diff/1/include/core/SkXfermode.h#new... include/core/SkXfermode.h:280: SkXfermodeProcSIMD fProcSIMD; I don't think we should expose this here. My initial reaction is if a custom subclass really wants to include SIMD optimizations they should override the entire xfer..() method and not use SkProcXferMode. However, for the built-in SkProcXferModes that we support we could provide an internal SIMD override. https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp File src/core/SkXfermode.cpp (right): https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp#newco... src/core/SkXfermode.cpp:639: SkXfermodeProcSIMD fProcSIMD; instead of defining this at compile time why not have SkXfermode_opts.h implement a function like this... typedef void (*SkXfermodeProcSIMD)(SkPMColor dst[], const SkPMColor src[], int count, const SkAlpha aa[]); // type is an enum consisting of 32, 16, or A8 entries SkXfermodeProcSIMD SkPlatformXfermodeProcSIMDFactory(unsigned type, SkXferMode::Mode mode) That way for clients like Chrome where it is not known at compile time if NEON is available we can still benefit from these optimizations. https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp#newco... src/core/SkXfermode.cpp:797: #if SK_ARM_NEON_IS_ALWAYS if you do the above approach this code could be put in a static inline function inside SkXfermode_opts_arm_neon.cpp
Thanks a lot for the review. On 2013/10/03 16:38:29, djsollen wrote: > I think this CL can (and probably should) be shrunken down to not affect any of > the gyp files (with the exception of adding a few new files to opts.gyp. > > Making it line up with the factory like methods we use for other opts will allow > us to have greater flexibility in environments where we don't know the > capabilities of the system until runtime. My initial version was less invasive and more in the line of what you're proposing but I dropped it for performance reasons. > https://codereview.chromium.org/23644006/diff/1/include/core/SkXfermode.h > File include/core/SkXfermode.h (right): > > https://codereview.chromium.org/23644006/diff/1/include/core/SkXfermode.h#new... > include/core/SkXfermode.h:15: #include "SkXfermode_opts.h" > we shouldn't have private headers exposed in our public includes. Yes, I agree. I expected that comment and to be honest, this has proven to be a real pain to rebase / maintain because of the gyp files. > https://codereview.chromium.org/23644006/diff/1/include/core/SkXfermode.h#new... > include/core/SkXfermode.h:280: SkXfermodeProcSIMD fProcSIMD; > I don't think we should expose this here. My initial reaction is if a custom > subclass really wants to include SIMD optimizations they should override the > entire xfer..() method and not use SkProcXferMode. > > However, for the built-in SkProcXferModes that we support we could provide an > internal SIMD override. I don't understand your comment about subclasses. This is a private member... The problem of not storing it in the objects is that the xferXX functions have to get the pointer on each call. This cost is negligible but not for small values of count. > https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp > File src/core/SkXfermode.cpp (right): > > https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp#newco... > src/core/SkXfermode.cpp:639: SkXfermodeProcSIMD fProcSIMD; > instead of defining this at compile time why not have SkXfermode_opts.h > implement a function like this... > > typedef void (*SkXfermodeProcSIMD)(SkPMColor dst[], const SkPMColor src[], int > count, const SkAlpha aa[]); > > // type is an enum consisting of 32, 16, or A8 entries > SkXfermodeProcSIMD SkPlatformXfermodeProcSIMDFactory(unsigned type, > SkXferMode::Mode mode) That was my initial approach but there were performance regressions on some non-optimized cases with small values of count which I deemed unacceptable. > That way for clients like Chrome where it is not known at compile time if NEON > is available we can still benefit from these optimizations. OK, that alone invalidates the approach I've kept. > https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp#newco... > src/core/SkXfermode.cpp:797: #if SK_ARM_NEON_IS_ALWAYS > if you do the above approach this code could be put in a static inline function > inside SkXfermode_opts_arm_neon.cpp Yes. ------------------- Here are my proposals for a second version: 1) == - Get rid of the private header in SkXfermode.h and define SkXfermodeProcSIMD directly in it (NEON/non-NEON). This would solve the gyp files mess. - Populate gProcCoeffs at startup time with the relevant pointers in a constructor function. - Have the xfer32 function calling the right helper function at runtime. That would allow to get rid of the NEON code in SkXfermode.cpp and handle the dynamically detected NEON case. This proposal will already have a significant cost for small values of count. 2) == Alternatively, if you really don't care about small values of count, it is possible to get rid of the fProcSIMD member altogether and get everything dynamically from the xfer32 function. If I remember correctly that means the code would be two times slower for count = 1 in the worst case scenario. Overall that's still a big performance win but there could be some pathological cases. All these assume that the common way to create a SkXfermode is using SkXfermode::Create(Mode mode). If that's not the case or if I've missed something important (I'm absolutely not familiar with how this code is used), I'm open to suggestions.
After talking with a few people on the team, I'm convinced that for maintainability we want to move the core logic out of SkXferMode.cpp and into an external function. Unlike the micro-benchmarks in practice we don't often see small values of count so I'm not too concerned about the overhead of one extra function call. > > ------------------- > > Here are my proposals for a second version: > > 1) > == > > - Get rid of the private header in SkXfermode.h and define SkXfermodeProcSIMD > directly in it (NEON/non-NEON). This would solve the gyp files mess. > - Populate gProcCoeffs at startup time with the relevant pointers in a > constructor function. > - Have the xfer32 function calling the right helper function at runtime. That > would allow to get rid of the NEON code in SkXfermode.cpp and handle the > dynamically detected NEON case. > > This proposal will already have a significant cost for small values of count. > > 2) > == > > Alternatively, if you really don't care about small values of count, it is > possible to get rid of the fProcSIMD member altogether and get everything > dynamically from the xfer32 function. If I remember correctly that means the > code would be two times slower for count = 1 in the worst case scenario. Overall > that's still a big performance win but there could be some pathological cases. > > All these assume that the common way to create a SkXfermode is using > SkXfermode::Create(Mode mode). If that's not the case or if I've missed > something important (I'm absolutely not familiar with how this code is used), > I'm open to suggestions.
Thanks for the feedback. On 2013/10/04 13:00:44, djsollen wrote: > Unlike the micro-benchmarks in practice we don't often see > small values of count so I'm not too concerned about the overhead of one extra > function call. I hadn't studied everything in great details but I don't think the function call alone was responsible for the slowdown. Code locality probably had an impact too. > After talking with a few people on the team, I'm convinced that for > maintainability we want to move the core logic out of SkXferMode.cpp and into an > external function. Ok great then I'll go for my first proposal minus the constructor function which in fact isn't necessary. To sum up: - Get rid of the private header in SkXfermode.h and define SkXfermodeProcSIMD directly in it (NEON/non-NEON). This would solve the gyp files mess. - Have the xfer32 function calling the right helper function at runtime. That would allow to get rid of the NEON code in SkXfermode.cpp and handle the dynamically detected NEON case. - Keep the static definition of the SIMD procs in gProcCoeffs. If you have objections, please shout now.
On 2013/10/04 13:24:06, kevin.petit.arm wrote: > Thanks for the feedback. > > On 2013/10/04 13:00:44, djsollen wrote: > > Unlike the micro-benchmarks in practice we don't often see > > small values of count so I'm not too concerned about the overhead of one extra > > function call. > > I hadn't studied everything in great details but I don't think the function call > alone was responsible for the slowdown. Code locality probably had an impact > too. > > > After talking with a few people on the team, I'm convinced that for > > maintainability we want to move the core logic out of SkXferMode.cpp and into > an > > external function. > > Ok great then I'll go for my first proposal minus the constructor function which > in fact isn't necessary. To sum up: > > - Get rid of the private header in SkXfermode.h and define SkXfermodeProcSIMD > directly in it (NEON/non-NEON). This would solve the gyp files mess. Yes. > - Have the xfer32 function calling the right helper function at runtime. That > would allow to get rid of the NEON code in SkXfermode.cpp and handle the > dynamically detected NEON case. Yes. For reference see how SkBlitRow.h defines PlatformProcs32(unsigned flags) and how we then implement that function on a per platform basis. > - Keep the static definition of the SIMD procs in gProcCoeffs. No. Why do you need to keep the modification in gProcCoeffs? > > If you have objections, please shout now.
https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp File src/core/SkXfermode.cpp (right): https://codereview.chromium.org/23644006/diff/1/src/core/SkXfermode.cpp#newco... src/core/SkXfermode.cpp:797: #if SK_ARM_NEON_IS_ALWAYS On 2013/10/03 16:38:29, djsollen wrote: > if you do the above approach this code could be put in a static inline function > inside SkXfermode_opts_arm_neon.cpp Looks like we could also return a different subclass when we detect the SIMD proc at factory time. Would make this more readable.
On 2013/10/04 13:28:53, djsollen wrote: > On 2013/10/04 13:24:06, kevin.petit.arm wrote: > > - Keep the static definition of the SIMD procs in gProcCoeffs. > > No. Why do you need to keep the modification in gProcCoeffs? When creating a SkProcCoeffXfermode with SkXfermode::Create(Mode mode) the object contains a pointer to the proc (owned by the super class) and the mode. However this class doesn't override xfer32 and the version used comes from SkProcXfermode which only has a pointer to the proc function. The SIMD code needs to know which ProcSIMD function to call and for this I see the following solutions: - Determine the pointer to the SIMD proc from a pointer to the proc. That would need a map or, alternatively, to look for the mode number in gProcCoeffs by comparing pointers and then use the number to get a pointer to the SIMD proc from another table. Those two solutions look dodgy to me. - Move the fMode member up in the class hierarchy and have a table containing pointers to the SIMD procs. To be honest I haven't looked extensively in this approach as I thought there ought to be a good reason for fMode to be defined only in SkProcCoeffXfermode. - Store a pointer to the SIMD proc at the creation time along with the one to the proc. Those pointers being in the same table as the ones to the procs. I thought that was the cleanest way of doing things that allowed not to have to lookup the pointer in xfer32.
Or you could go with reed's solution that would look something like this... 1) In SkXferMode::Create you query to see if their is a platform specific implementation of xfer32. 2) If so then create a separate subclass of SkProcCoeffXfermode that stores the platform proc and overrides xfer32 to use it.
On 2013/10/04 14:21:34, djsollen wrote: > Or you could go with reed's solution that would look something like this... > > 1) In SkXferMode::Create you query to see if their is a platform specific > implementation of xfer32. > 2) If so then create a separate subclass of SkProcCoeffXfermode that stores the > platform proc and overrides xfer32 to use it. That looks like a better idea :-). Thanks to the both of you, I'll give that a try.
Reed's idea is the way to go. It has so many advantages that I'm ashamed I missed that approach. Patch Set 2 implements it and looks a lot cleaner than the initial one. However I'm not happy I had to move the definition of SkProcCoeffXfermode in a separate header.
not critical, but we'd prefer to not have so much impl inside a shared header for virtual methods. Can we move those impls into either SkXfermode.cpp or a new impl file? https://codereview.chromium.org/23644006/diff/16001/include/core/SkXfermode.h File include/core/SkXfermode.h (right): https://codereview.chromium.org/23644006/diff/16001/include/core/SkXfermode.h... include/core/SkXfermode.h:274: SkXfermodeProc fProc; I presume your subclasses need access to fProc. Is that read-only access? I hope so, as we now *require* that xfermodes be immutable once created, so we can share instances of them between threads. Assuming it is read-only, can you just add a (protected) getter instead of exposing the field?
https://codereview.chromium.org/23644006/diff/16001/src/opts/SkXfermode_opts_... File src/opts/SkXfermode_opts_arm.cpp (right): https://codereview.chromium.org/23644006/diff/16001/src/opts/SkXfermode_opts_... src/opts/SkXfermode_opts_arm.cpp:83: [SkXfermode::kSrc_Mode] = NULL, spacing https://codereview.chromium.org/23644006/diff/16001/src/opts/SkXfermode_opts_... src/opts/SkXfermode_opts_arm.cpp:113: }; Add something like SK_COMPILE_ASSERT(SK_ARRAY_COUNT(gNEONXfermodeProcs) == SkXferMode::kLastMode + 1, mode_count_arm);
https://codereview.chromium.org/23644006/diff/16001/include/core/SkXfermode.h File include/core/SkXfermode.h (right): https://codereview.chromium.org/23644006/diff/16001/include/core/SkXfermode.h... include/core/SkXfermode.h:274: SkXfermodeProc fProc; On 2013/10/07 16:24:31, reed1 wrote: > I presume your subclasses need access to fProc. Is that read-only access? I hope > so, as we now *require* that xfermodes be immutable once created, so we can > share instances of them between threads. > > Assuming it is read-only, can you just add a (protected) getter instead of > exposing the field? Done. https://codereview.chromium.org/23644006/diff/16001/src/opts/SkXfermode_opts_... File src/opts/SkXfermode_opts_arm.cpp (right): https://codereview.chromium.org/23644006/diff/16001/src/opts/SkXfermode_opts_... src/opts/SkXfermode_opts_arm.cpp:83: [SkXfermode::kSrc_Mode] = NULL, On 2013/10/07 16:58:03, djsollen wrote: > spacing Done. https://codereview.chromium.org/23644006/diff/16001/src/opts/SkXfermode_opts_... src/opts/SkXfermode_opts_arm.cpp:113: }; On 2013/10/07 16:58:03, djsollen wrote: > Add something like > SK_COMPILE_ASSERT(SK_ARRAY_COUNT(gNEONXfermodeProcs) == SkXferMode::kLastMode + > 1, mode_count_arm); Done.
I've rebased, fixed conflicts and did a quick retest. Any more comments?
assuming existing tests and benches exercise the new code, lgtm
This code is just a skeleton and should have no effect on tests and benchmarks (which already exist BTW). I'll commit this one and upload the NEON code that will make it useful after.
CQ is trying da patch. Follow status at https://skia-tree-status.appspot.com/cq/kevin.petit.arm@gmail.com/23644006/30001
Retried try job too often on Build-Mac10.7-Clang-x86-Release-Trybot for step(s) BuildSkiaLib http://skiabot-master.pogerlabs.com:10117/buildstatus?builder=Build-Mac10.7-C...
CQ is trying da patch. Follow status at https://skia-tree-status.appspot.com/cq/kevin.petit.arm@gmail.com/23644006/44001
Message was sent while issue was closed.
Change committed as 11654
Message was sent while issue was closed.
This change has been reverted due to breakages in the Android build bots. https://code.google.com/p/skia/source/detail?r=11655
On 2013/10/08 17:01:11, djsollen wrote: > This change has been reverted due to breakages in the Android build bots. > > https://code.google.com/p/skia/source/detail?r=11655 What was the issue? I can't find any log.
On 2013/10/08 17:09:20, kevin.petit.arm wrote: > On 2013/10/08 17:01:11, djsollen wrote: > > This change has been reverted due to breakages in the Android build bots. > > > > https://code.google.com/p/skia/source/detail?r=11655 > > What was the issue? I can't find any log. Console: http://108.170.217.252:10117/console Some of the failing logs... http://108.170.217.252:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-GalaxyNex... http://108.170.217.252:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-Nexus4/bu... http://108.170.217.252:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-NexusS/bu...
> > On 2013/10/08 17:01:11, djsollen wrote: > > > This change has been reverted due to breakages in the Android build bots. > > > > What was the issue? I can't find any log. > > Console: http://108.170.217.252:10117/console > > Some of the failing logs... > http://108.170.217.252:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-GalaxyNex... > http://108.170.217.252:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-Nexus4/bu... > http://108.170.217.252:10117/builders/Build-Ubuntu12-GCC-Arm7-Debug-NexusS/bu... Thanks. I've fixed that and also added the implementation for serialization which I had missed. Patch Set 7 should hopefully stay more than 10 minutes in the tree :-).
CQ is trying da patch. Follow status at https://skia-tree-status.appspot.com/cq/kevin.petit.arm@gmail.com/23644006/51001
Message was sent while issue was closed.
Change committed as 11669 |