|
|
Created:
6 years, 8 months ago by KhNo Modified:
4 years, 7 months ago Reviewers:
Raymond Toy CC:
blink-reviews Base URL:
https://chromium.googlesource.com/chromium/blink.git@master Visibility:
Public. |
DescriptionMore optimize approach for NEON in VectorMath
I have tested some webkit-patch for performance of vectorMath.
https://bugs.webkit.org/show_bug.cgi?id=110744
The test result are presented enhanced point.
Test Site :
https://chromium.googlecode.com/svn/trunk/samples/audio/convolution-effects.html
Spoken Word (spring reverb simulation)
Test Device :
Galaxy S5
Test Result :
<Original>
zvmul -> 5.22 %
vadd -> 4.53 %
<Unlooprolling>
zvmul -> 4.29 %
vadd -> 4.40 %
BUG=416405
Patch Set 1 #
Total comments: 4
Patch Set 2 : relocate function calling for caching machanism. #Messages
Total messages: 16 (3 generated)
keonho07.kim@samsung.com changed reviewers: + rtoy@google.com
Message was sent while issue was closed.
keonho07.kim@samsung.com changed reviewers: + rtoy@chromium.org - rtoy@google.com
Message was sent while issue was closed.
On 2014/09/03 09:25:17, KhNo wrote: > mailto:keonho07.kim@samsung.com changed reviewers: > + mailto:rtoy@chromium.org > - mailto:rtoy@google.com Hi, Raymond. I have tested this patch from webkit. However, there was no big enhanced point differently with kaustubh joshi's comment. It was little bad when we use unloop rolling way. This patch for just sharing information to you.
Message was sent while issue was closed.
On 2014/09/03 09:28:08, KhNo wrote: > On 2014/09/03 09:25:17, KhNo wrote: > > mailto:keonho07.kim@samsung.com changed reviewers: > > + mailto:rtoy@chromium.org > > - mailto:rtoy@google.com > > Hi, Raymond. > > I have tested this patch from webkit. > However, there was no big enhanced point differently with kaustubh joshi's > comment. > It was little bad when we use unloop rolling way. > This patch for just sharing information to you. In the description you give some percentages. What do these mean? And it looks like you have some cpu percentages for some kind of test. What is being tested? Why is only 30-40% cpu in use? As I mentioned earlier, this code, while nice, is never enabled for Chrome on Android because Chrome still needs to run on devices without NEON. I'm very interested in having these optimizations, but we'll have to do them in a different way and add run-time selection of the NEON and non-NEON code paths. We do this for the OpenMAX DL FFT routines. We'll have to do something similar for VectorMath.
Message was sent while issue was closed.
On 2014/09/03 16:33:36, Raymond Toy wrote: > On 2014/09/03 09:28:08, KhNo wrote: > > On 2014/09/03 09:25:17, KhNo wrote: > > > mailto:keonho07.kim@samsung.com changed reviewers: > > > + mailto:rtoy@chromium.org > > > - mailto:rtoy@google.com > > > > Hi, Raymond. > > > > I have tested this patch from webkit. > > However, there was no big enhanced point differently with kaustubh joshi's > > comment. > > It was little bad when we use unloop rolling way. > > This patch for just sharing information to you. > > In the description you give some percentages. What do these mean? And it looks > like you > have some cpu percentages for some kind of test. What is being tested? Why is > only 30-40% > cpu in use? > I should have give more detailed information about my test. I have tested with 3 ways to check cpu usages. Test Site : http://chromium.googlecode.com/svn/trunk/samples/audio/shiny-drum-machine.html (I have know this site one of the most heaviest.) Test Device : Galaxy S5 1. non-neon (common.gypi : arm_neon = 0 and arm_neon_optional = 1) Average CPU % = 37.45% This is default about arm neon usage in current chromium. Blink doesn't handle arm_neon_optional in blink_platform.gyp. So, arm_fpu, arm_float_abi and arm_thumb are " " arm_neon_optional, other thirdpary lib as Skia is using to check on runtime arm supporting. 2. always neon (common.gypi : arm_neon =1 and arm_neon_optional = 0) Average CPU % = 33.7% The flags, arm_neon and arm_neon_optional can not be set both 1. compile option, arm_fpu="neon" , arm_float_abi="softfp" and arm_thumb=1 are enabled. So, ARM_NEON_INTRINSICS is enabled in VectorMath.cpp 3. always neon + unloop rolling patch (common.gypi : arm_neon =1 and arm_neon_optional = 0) Average CPU % = 34.1% Kastubh Joshi patch for vectorMath that is using unloop rolling way. I believe that it can give optimization of neon a little. However, logically in web audio feature, that is very little. If you want to apply this patch, I can revert code review status to open. It will give potential enhancement if there is case that is using VectorMath many times and recursively. > As I mentioned earlier, this code, while nice, is never enabled for Chrome on > Android because > Chrome still needs to run on devices without NEON. I'm very interested in having > these > optimizations, but we'll have to do them in a different way and add run-time > selection of > the NEON and non-NEON code paths. We do this for the OpenMAX DL FFT routines. > We'll have to > do something similar for VectorMath. Currently, as you know, blink is not using default neon for supporting non-neon devices. If it makes supporting, arm_neon flag should be 1. As we discussed previously by email, blink needs to handle arm_neon_optional flag as skia. Now I'm preparing patch that make neon supporting with checking runtime same as skia. Skia has also 3 way type about neon. 1. non-neon 2. always neon 3. dynamic neon ( default ) When it uses option 3, skia build two symbols with _neon. Then, it is checking neon supporting from chipset by __attribute__((pure)) Please refer how they can check with __attribute__((pure)) in SkUtilsArm.h and SkPreConfig.h Now I'm preparing patch with this way. It will be included some neon flag on other module ( platform/graphic - WebGL ) After my patch is successfully landed, thirdparty/blink will support checking on runtime about neon supporting . I hope you and some owner will review my patch set soon about it.
On 2014/09/04 04:02:51, KhNo wrote: > On 2014/09/03 16:33:36, Raymond Toy wrote: > > On 2014/09/03 09:28:08, KhNo wrote: > > > On 2014/09/03 09:25:17, KhNo wrote: > > > > mailto:keonho07.kim@samsung.com changed reviewers: > > > > + mailto:rtoy@chromium.org > > > > - mailto:rtoy@google.com > > > > > > Hi, Raymond. > > > > > > I have tested this patch from webkit. > > > However, there was no big enhanced point differently with kaustubh joshi's > > > comment. > > > It was little bad when we use unloop rolling way. > > > This patch for just sharing information to you. > > > > In the description you give some percentages. What do these mean? And it > looks > > like you > > have some cpu percentages for some kind of test. What is being tested? Why is > > only 30-40% > > cpu in use? > > > > I should have give more detailed information about my test. > I have tested with 3 ways to check cpu usages. > > Test Site : > http://chromium.googlecode.com/svn/trunk/samples/audio/shiny-drum-machine.html > (I have know this site one of the most heaviest.) > Test Device : Galaxy S5 I don't think this is a very good test. I didn't profile shiny-drum-machine, but I'm guessing it is spending the majority of it's time doing FFTs. I did some profiling on /audio/simple.html with a Galaxy Nexus. The FFT routines were taking 60%. The VectorMath::zvmul routine was taking 6%; vsmul, 2.6%. You need to come up with some other testing scheme that exercises the routines more. Or use perf to measure the cost of each routine for some demo and see how the percentages change different implementations are used. > > 1. non-neon (common.gypi : arm_neon = 0 and arm_neon_optional = 1) > Average CPU % = 37.45% > This is default about arm neon usage in current chromium. > Blink doesn't handle arm_neon_optional in blink_platform.gyp. > So, arm_fpu, arm_float_abi and arm_thumb are " " > arm_neon_optional, other thirdpary lib as Skia is using to check on runtime arm > supporting. > > 2. always neon (common.gypi : arm_neon =1 and arm_neon_optional = 0) > Average CPU % = 33.7% > The flags, arm_neon and arm_neon_optional can not be set both 1. > compile option, arm_fpu="neon" , arm_float_abi="softfp" and arm_thumb=1 are > enabled. > So, ARM_NEON_INTRINSICS is enabled in VectorMath.cpp > > 3. always neon + unloop rolling patch (common.gypi : arm_neon =1 and > arm_neon_optional = 0) > Average CPU % = 34.1% > Kastubh Joshi patch for vectorMath that is using unloop rolling way. > I believe that it can give optimization of neon a little. > However, logically in web audio feature, that is very little. > If you want to apply this patch, I can revert code review status to open. > It will give potential enhancement if there is case that is using VectorMath > many times and recursively. > > > > As I mentioned earlier, this code, while nice, is never enabled for Chrome on > > Android because > > Chrome still needs to run on devices without NEON. I'm very interested in > having > > these > > optimizations, but we'll have to do them in a different way and add run-time > > selection of > > the NEON and non-NEON code paths. We do this for the OpenMAX DL FFT routines. > > We'll have to > > do something similar for VectorMath. > > Currently, as you know, blink is not using default neon for supporting non-neon > devices. > If it makes supporting, arm_neon flag should be 1. > As we discussed previously by email, blink needs to handle arm_neon_optional > flag as skia. > Now I'm preparing patch that make neon supporting with checking runtime same as > skia. > > Skia has also 3 way type about neon. > 1. non-neon > 2. always neon > 3. dynamic neon ( default ) > > When it uses option 3, skia build two symbols with _neon. > Then, it is checking neon supporting from chipset by __attribute__((pure)) > Please refer how they can check with __attribute__((pure)) in SkUtilsArm.h and > SkPreConfig.h > > Now I'm preparing patch with this way. It will be included some neon flag on > other module ( platform/graphic - WebGL ) > After my patch is successfully landed, thirdparty/blink will support checking on > runtime about neon supporting . > I hope you and some owner will review my patch set soon about it. I would be excited to see these changes. I think we only need to support dynamic neon; you don't have to support any other way at this time.
On 2014/09/04 17:01:00, Raymond Toy wrote: > On 2014/09/04 04:02:51, KhNo wrote: > > On 2014/09/03 16:33:36, Raymond Toy wrote: > > > On 2014/09/03 09:28:08, KhNo wrote: > > > > On 2014/09/03 09:25:17, KhNo wrote: > > > > > mailto:keonho07.kim@samsung.com changed reviewers: > > > > > + mailto:rtoy@chromium.org > > > > > - mailto:rtoy@google.com > > > > > > > > Hi, Raymond. > > > > > > > > I have tested this patch from webkit. > > > > However, there was no big enhanced point differently with kaustubh joshi's > > > > comment. > > > > It was little bad when we use unloop rolling way. > > > > This patch for just sharing information to you. > > > > > > In the description you give some percentages. What do these mean? And it > > looks > > > like you > > > have some cpu percentages for some kind of test. What is being tested? Why > is > > > only 30-40% > > > cpu in use? > > > > > > > I should have give more detailed information about my test. > > I have tested with 3 ways to check cpu usages. > > > > Test Site : > > http://chromium.googlecode.com/svn/trunk/samples/audio/shiny-drum-machine.html > > (I have know this site one of the most heaviest.) > > Test Device : Galaxy S5 > > I don't think this is a very good test. I didn't profile shiny-drum-machine, but > I'm guessing it is spending the majority of it's time doing FFTs. I did some > profiling on /audio/simple.html with a Galaxy Nexus. The FFT routines were > taking 60%. The VectorMath::zvmul routine was taking 6%; vsmul, 2.6%. > > You need to come up with some other testing scheme that exercises the routines > more. Or use perf to measure the cost of each routine for some demo and see how > the percentages change different implementations are used. > Ok, I will create a simple test, then use profiling tools exactly VectorMath:: is used how much. Thanks for giving hint. > > > > 1. non-neon (common.gypi : arm_neon = 0 and arm_neon_optional = 1) > > Average CPU % = 37.45% > > This is default about arm neon usage in current chromium. > > Blink doesn't handle arm_neon_optional in blink_platform.gyp. > > So, arm_fpu, arm_float_abi and arm_thumb are " " > > arm_neon_optional, other thirdpary lib as Skia is using to check on runtime > arm > > supporting. > > > > 2. always neon (common.gypi : arm_neon =1 and arm_neon_optional = 0) > > Average CPU % = 33.7% > > The flags, arm_neon and arm_neon_optional can not be set both 1. > > compile option, arm_fpu="neon" , arm_float_abi="softfp" and arm_thumb=1 are > > enabled. > > So, ARM_NEON_INTRINSICS is enabled in VectorMath.cpp > > > > 3. always neon + unloop rolling patch (common.gypi : arm_neon =1 and > > arm_neon_optional = 0) > > Average CPU % = 34.1% > > Kastubh Joshi patch for vectorMath that is using unloop rolling way. > > I believe that it can give optimization of neon a little. > > However, logically in web audio feature, that is very little. > > If you want to apply this patch, I can revert code review status to open. > > It will give potential enhancement if there is case that is using VectorMath > > many times and recursively. > > > > > > > As I mentioned earlier, this code, while nice, is never enabled for Chrome > on > > > Android because > > > Chrome still needs to run on devices without NEON. I'm very interested in > > having > > > these > > > optimizations, but we'll have to do them in a different way and add run-time > > > selection of > > > the NEON and non-NEON code paths. We do this for the OpenMAX DL FFT > routines. > > > We'll have to > > > do something similar for VectorMath. > > > > Currently, as you know, blink is not using default neon for supporting > non-neon > > devices. > > If it makes supporting, arm_neon flag should be 1. > > As we discussed previously by email, blink needs to handle arm_neon_optional > > flag as skia. > > Now I'm preparing patch that make neon supporting with checking runtime same > as > > skia. > > > > Skia has also 3 way type about neon. > > 1. non-neon > > 2. always neon > > 3. dynamic neon ( default ) > > > > When it uses option 3, skia build two symbols with _neon. > > Then, it is checking neon supporting from chipset by __attribute__((pure)) > > Please refer how they can check with __attribute__((pure)) in SkUtilsArm.h and > > SkPreConfig.h > > > > Now I'm preparing patch with this way. It will be included some neon flag on > > other module ( platform/graphic - WebGL ) > > After my patch is successfully landed, thirdparty/blink will support checking > on > > runtime about neon supporting . > > I hope you and some owner will review my patch set soon about it. > > I would be excited to see these changes. I think we only need to support > dynamic neon; you don't have to support any other way at this time.
This reply just for moving test result from description since it is too long. # non-neon 18272 3 40% S 26 1579604K 303508K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 35% S 26 1580448K 305836K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 33% S 26 1583380K 307600K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 39% S 26 1584260K 309544K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 39% S 26 1585144K 311480K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 40% S 26 1588148K 313344K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 34% S 26 1588956K 315052K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 36% S 26 1591872K 316628K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 36% S 26 1592852K 317912K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 36% S 26 1593672K 319640K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 39% S 26 1596544K 321284K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 42% S 26 1598656K 322952K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 37% S 26 1601632K 324556K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 36% S 26 1602508K 326672K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 35% S 26 1603356K 328196K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 38% S 26 1606256K 329936K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 35% S 26 1607152K 331604K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 40% S 26 1610028K 333332K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 41% S 26 1610936K 335132K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 18272 3 38% S 26 1611828K 336840K fg u0_i2 org.chromium.chrome.shell:sandboxed_process0 # neon = 33.7 % 23322 1 36% S 26 1570176K 288964K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 33% S 26 1571040K 290684K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 30% S 26 1571932K 292244K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 33% S 26 1574852K 294092K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 29% S 26 1575680K 295808K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 32% S 26 1578632K 297608K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 30% S 26 1579488K 299308K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 33% S 26 1580360K 301560K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 31% S 26 1583284K 303212K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 34% S 26 1584272K 304812K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 38% S 26 1587112K 306596K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 35% S 26 1587940K 308200K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 35% S 26 1588844K 310008K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 36% S 26 1591768K 311860K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 32% S 26 1592632K 313704K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 37% S 26 1593600K 315680K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 33% S 26 1596504K 317412K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 39% S 26 1597368K 318928K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 36% S 26 1600252K 320672K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 23322 1 32% S 26 1601348K 322440K fg u0_i3 org.chromium.chrome.shell:sandboxed_process0 # neon + unloop rolling = 34. 1% 25795 1 35% S 26 1566004K 289000K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 30% S 26 1568780K 290560K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 34% S 26 1569720K 292692K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 35% S 26 1570560K 294500K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 35% S 26 1573532K 296116K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 34% S 26 1574360K 297896K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 37% S 26 1577240K 299472K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 38% S 26 1578136K 301324K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 31% S 26 1579012K 303144K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 36% S 26 1582032K 305072K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 36% S 26 1582884K 306736K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 35% S 26 1585728K 308380K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 32% S 26 1586660K 310188K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 36% S 26 1587504K 312008K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 31% S 26 1590456K 313872K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 31% S 26 1591416K 315584K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 34% S 26 1594304K 317304K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 34% S 26 1595168K 319112K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 36% S 26 1596012K 321172K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0 25795 1 32% S 26 1599060K 322972K fg u0_i4 org.chromium.chrome.shell:sandboxed_process0
Dear Raymond. I did a profiling specific sample for this patch. There was enhanced point for VectorMath functions if unloop rolling method is applied. <Test demo> https://chromium.googlecode.com/svn/trunk/samples/audio/convolution-effects.html Spoken Word (spring reverb simulation) <Original> zvmul -> 5.22 % vadd -> 4.53 % <Unlooprolling> zvmul -> 4.29 % vadd -> 4.40 % I have attached the screenshot of perf data. I think we should apply it for low-end device (neon existed).
Code looks good, with a few minor questions. I'm not sure it's really worth doing these changes here because chrome on android is never compiled with NEON enabled. The improvements you show with unrolling are, however, quite nice. https://codereview.chromium.org/255573002/diff/1/Source/platform/audio/Vector... File Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/255573002/diff/1/Source/platform/audio/Vector... Source/platform/audio/VectorMath.cpp:278: float32x4_t scaleNum = vdupq_n_f32(*scale); Is there any performance difference between making 4 copies of the same item in scaleNum instead of doing a vector multiply by a single float scalar, as done in the original code? https://codereview.chromium.org/255573002/diff/1/Source/platform/audio/Vector... Source/platform/audio/VectorMath.cpp:497: float32x4_t source20 = vld1q_f32(source2P); It might be advantageous to load all of the elements from source1 and then source2 instead of alternating between source1 and source2 Caching might work out better?
Patchset #2 (id:20001) has been deleted
PTAL, I have changed hardly to build and check performance common.gypi file such as below. ['OS=="android"', { 'arm_neon%': 1, 'arm_neon_optional%': 0, ARM_NEON_INTRINSICS was enabled and –mfpu=neon, -mfloat-abi=softfp were defined. As I mentioned before, I will raise up patch for runtime neon checking for blink. https://codereview.chromium.org/255573002/diff/1/Source/platform/audio/Vector... File Source/platform/audio/VectorMath.cpp (right): https://codereview.chromium.org/255573002/diff/1/Source/platform/audio/Vector... Source/platform/audio/VectorMath.cpp:278: float32x4_t scaleNum = vdupq_n_f32(*scale); On 2014/09/22 20:00:36, Raymond Toy wrote: > Is there any performance difference between making 4 copies of the same item in > scaleNum instead of doing a vector multiply by a single float scalar, as done in > the original code? No performance difference for scaleNum, It is just for making correct type develver to vmulq_f32 function instead of vmulq_n_f32 that is originally used. https://codereview.chromium.org/255573002/diff/1/Source/platform/audio/Vector... Source/platform/audio/VectorMath.cpp:497: float32x4_t source20 = vld1q_f32(source2P); On 2014/09/22 20:00:35, Raymond Toy wrote: > It might be advantageous to load all of the elements from source1 and then > source2 instead of alternating between source1 and source2 Caching might work > out better? Thanks for review, I think also it is better for cache mechanism.
I talked this over with kbr@ and we've decided that, since Chrome on Android never compiles with NEON enabled (because Chrome needs to run on devices without NEON), we do not want this change. Instead, we need to figure out how to split this into a NEON and non-NEON version and do a runtime check. You mentioned that skia does this, so we should try to do something similar.
On 2014/09/24 18:34:09, Raymond Toy wrote: > I talked this over with kbr@ and we've decided that, since Chrome on Android > never compiles with NEON enabled (because Chrome needs to run on devices without > NEON), we do not want this change. > > Instead, we need to figure out how to split this into a NEON and non-NEON > version and do a runtime check. You mentioned that skia does this, so we should > try to do something similar. Sure, let's take look again this patch after successfully supporting run-time neon optimization. If I say basic concept how they are during for runtime-checking, 1. build two symbol and symbol_neon if arm_optional is enabled. 2. use wrapper to call whether symbol or symbol_neon 3. inside wrapper included runtime-checking routine. for android their is APIs in NDK. // Use the Android NDK's cpu-features helper library to detect NEON at runtime. // See http://crbug.com/164154 for skia to see why this is needed in Chromium for Android. result = (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0;
On 2014/09/25 02:17:37, KhNo wrote: > On 2014/09/24 18:34:09, Raymond Toy wrote: > > I talked this over with kbr@ and we've decided that, since Chrome on Android > > never compiles with NEON enabled (because Chrome needs to run on devices > without > > NEON), we do not want this change. > > > > Instead, we need to figure out how to split this into a NEON and non-NEON > > version and do a runtime check. You mentioned that skia does this, so we > should > > try to do something similar. > > Sure, let's take look again this patch after successfully supporting run-time > neon optimization. > > If I say basic concept how they are during for runtime-checking, > 1. build two symbol and symbol_neon if arm_optional is enabled. > 2. use wrapper to call whether symbol or symbol_neon > 3. inside wrapper included runtime-checking routine. > for android their is APIs in NDK. > // Use the Android NDK's cpu-features helper library to detect NEON at runtime. > // See http://crbug.com/164154 for skia to see why this is needed in Chromium > for Android. > result = (android_getCpuFeatures() & ANDROID_CPU_ARM_FEATURE_NEON) != 0; You can see how I did this for the OpenMAX DL FFT library: https://webrtc-codereview.appspot.com/2065004/ I don't know if this approach can be done here because openmax dl creates two libraries, one for neon and and one without because then the appropriate flags can be set for each library. (I don't think you can set compile flags on individual files.) We'll have to see exactly how skia does it.
Now that we can use NEON in Chrome without having to do detection (compile-time option), we should revisit this CL. And also make it work with ARMv8 if the intrinsics are different. |