Chromium Code Reviews

Issue 10960023: Add ARM NEON intrinsic optimizations for SincResampler. (Closed)

Created: 8 years, 3 months ago by DaleCurtis
Modified: 8 years, 2 months ago
CC: chromium-reviews, feature-media-reviews_chromium.org, trchen, Johann
Visibility: Public.

Description

Add ARM NEON intrinsic optimizations for SincResampler.

On an exynos board these yielded an ~2.3x speedup:

Benchmarking 50000000 iterations:
Convolve_C took 5682.71ms.
Convolve_NEON(unaligned) took 2451.18ms; which is 2.32x faster than Convolve_C.
Convolve_NEON (aligned) took 2397.01ms; which is 2.37x faster than Convolve_C and 1.02x faster than Convolve_NEON (unaligned).

BUG=none
TEST=try bot, fischman.

Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=158870
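
For readers unfamiliar with the technique, here is a minimal sketch of a multiply-accumulate convolution of the kind this change describes. It is not the code from the patch. The names ConvolveScalar, ConvolveSketch and SKETCH_HAS_NEON, the function signature, the linear blend between two kernels, and the scalar fallback on non-ARM builds are all assumptions made for illustration. The review itself only mentions vmovq_n_f32 and the move to multiply-accumulate intrinsics.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define SKETCH_HAS_NEON 1  // Hypothetical macro, not from the patch.
#else
#define SKETCH_HAS_NEON 0
#endif

// Scalar reference (hypothetical helper): dot the input against two kernels,
// then blend the two sums linearly by `factor`.
float ConvolveScalar(const float* input, const float* k1, const float* k2,
                     std::size_t n, double factor) {
  float sum1 = 0.0f;
  float sum2 = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    sum1 += input[i] * k1[i];
    sum2 += input[i] * k2[i];
  }
  return static_cast<float>((1.0 - factor) * sum1 + factor * sum2);
}

// Vectorized sketch (hypothetical name). Assumes n is a multiple of 4.
// Falls back to the scalar version when NEON is not available.
float ConvolveSketch(const float* input, const float* k1, const float* k2,
                     std::size_t n, double factor) {
  assert(n % 4 == 0);
#if SKETCH_HAS_NEON
  float32x4_t sums1 = vmovq_n_f32(0);
  float32x4_t sums2 = vmovq_n_f32(0);
  for (std::size_t i = 0; i < n; i += 4) {
    // vld1q_f32 does not require 16-byte alignment.
    const float32x4_t in = vld1q_f32(input + i);
    // Multiply-accumulate: sums += in * kernel, four lanes at a time.
    sums1 = vmlaq_f32(sums1, in, vld1q_f32(k1 + i));
    sums2 = vmlaq_f32(sums2, in, vld1q_f32(k2 + i));
  }
  // Blend the two partial results: sums1 * (1 - factor) + sums2 * factor.
  sums1 = vmlaq_f32(
      vmulq_f32(sums1, vmovq_n_f32(static_cast<float>(1.0 - factor))),
      sums2, vmovq_n_f32(static_cast<float>(factor)));
  // Horizontal add of the four lanes.
  const float32x2_t half = vadd_f32(vget_high_f32(sums1), vget_low_f32(sums1));
  return vget_lane_f32(vpadd_f32(half, half), 0);
#else
  return ConvolveScalar(input, k1, k2, n, factor);
#endif
}
```

Because the vector path sums in a different order than the scalar loop, the two results can differ in the last few bits, so any comparison between them should use a tolerance rather than exact equality.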

Patch Set 1 #

Total comments: 14

Patch Set 2 : Comments. #

Total comments: 10

Patch Set 3 : Clean up. #

Patch Set 4 : Use multiply-accumulate intrinsics. #

Total comments: 3

Patch Set 5 : Use exclusive-or. #

Patch Set 6 : Fix NE issue for ARM. #

Stats: +94 lines, -28 lines
M media/base/sinc_resampler.h: 3 chunks, +6 lines, -3 lines, 0 comments
M media/base/sinc_resampler.cc: 3 chunks, +40 lines, -3 lines, 0 comments
M media/base/sinc_resampler_unittest.cc: 6 chunks, +38 lines, -22 lines, 0 comments
M media/media.gyp: 2 chunks, +10 lines, -0 lines, 0 comments

Messages

Total messages: 23 (0 generated)
Ami GONE FROM CHROMIUM
https://codereview.chromium.org/10960023/diff/1/media/base/sinc_resampler_unittest.cc File media/base/sinc_resampler_unittest.cc (right): https://codereview.chromium.org/10960023/diff/1/media/base/sinc_resampler_unittest.cc#newcode115 media/base/sinc_resampler_unittest.cc:115: double result2 = result; I don't like this b/c ...
8 years, 3 months ago (2012-09-21 04:41:41 UTC) #1
Ami GONE FROM CHROMIUM
Looks like you got some 'splainin' to do. Due to rietveld limits, splitting this response ...
8 years, 3 months ago (2012-09-21 04:42:43 UTC) #2
Ami GONE FROM CHROMIUM
Then the same cmdline with both the test & .cc file built with USE_NEON: Note: ...
8 years, 3 months ago (2012-09-21 04:42:52 UTC) #3
DaleCurtis
Fast work Ami! Thanks! Too bad the optimizations aren't :) Have to look at the ...
8 years, 3 months ago (2012-09-21 05:40:04 UTC) #4
DaleCurtis
On 2012/09/21 05:40:04, DaleCurtis wrote: > Fast work Ami! Thanks! Too bad the optimizations aren't ...
8 years, 3 months ago (2012-09-21 17:49:36 UTC) #5
trchen
On 2012/09/21 17:49:36, DaleCurtis wrote: > On 2012/09/21 05:40:04, DaleCurtis wrote: > > Fast work ...
8 years, 3 months ago (2012-09-21 21:24:18 UTC) #6
Johann
On 2012/09/21 21:24:18, trchen wrote: > What compiler flags do you use? If I only ...
8 years, 3 months ago (2012-09-21 23:07:53 UTC) #7
DaleCurtis
On 2012/09/21 23:07:53, Johann wrote: > On 2012/09/21 21:24:18, trchen wrote: > > What compiler ...
8 years, 3 months ago (2012-09-21 23:22:58 UTC) #8
Ami GONE FROM WEBRTC_CHROMIUM
The gyp defines I used are in the reviewlog. But this was a Debug build. ...
8 years, 3 months ago (2012-09-22 00:33:05 UTC) #9
Ami GONE FROM CHROMIUM
Yay! 2x speedup! See code comment below, and note there was a failure in SRT.Flush. ...
8 years, 3 months ago (2012-09-22 02:38:27 UTC) #10
DaleCurtis
Thanks for the benchmarks Ami. Comments addressed. https://codereview.chromium.org/10960023/diff/1/media/base/sinc_resampler_unittest.cc File media/base/sinc_resampler_unittest.cc (right): https://codereview.chromium.org/10960023/diff/1/media/base/sinc_resampler_unittest.cc#newcode115 media/base/sinc_resampler_unittest.cc:115: double result2 ...
8 years, 3 months ago (2012-09-24 19:54:36 UTC) #11
Ami GONE FROM CHROMIUM
LGTM % nits https://codereview.chromium.org/10960023/diff/7003/media/base/sinc_resampler_unittest.cc File media/base/sinc_resampler_unittest.cc (right): https://codereview.chromium.org/10960023/diff/7003/media/base/sinc_resampler_unittest.cc#newcode127 media/base/sinc_resampler_unittest.cc:127: #error This test should only be ...
8 years, 3 months ago (2012-09-24 20:04:16 UTC) #12
DaleCurtis
https://codereview.chromium.org/10960023/diff/7003/media/base/sinc_resampler_unittest.cc File media/base/sinc_resampler_unittest.cc (right): https://codereview.chromium.org/10960023/diff/7003/media/base/sinc_resampler_unittest.cc#newcode127 media/base/sinc_resampler_unittest.cc:127: #error This test should only be compiled when SSE ...
8 years, 3 months ago (2012-09-24 20:13:50 UTC) #13
DaleCurtis
https://codereview.chromium.org/10960023/diff/7003/media/base/sinc_resampler_unittest.cc File media/base/sinc_resampler_unittest.cc (right): https://codereview.chromium.org/10960023/diff/7003/media/base/sinc_resampler_unittest.cc#newcode135 media/base/sinc_resampler_unittest.cc:135: #if defined(ARCH_CPU_X86_FAMILY) && defined(__SSE__) On 2012/09/24 20:04:16, Ami Fischman ...
8 years, 3 months ago (2012-09-24 20:22:27 UTC) #14
Ami GONE FROM CHROMIUM
LGTM
8 years, 3 months ago (2012-09-24 20:24:53 UTC) #15
DaleCurtis
On 2012/09/24 20:24:53, Ami Fischman wrote: > LGTM Ami, I optimized the loop according to ...
8 years, 3 months ago (2012-09-25 00:52:57 UTC) #16
DaleCurtis
trchen or johann, would either of you mind reviewing the ARM NEON code for correctness? ...
8 years, 2 months ago (2012-09-25 17:55:35 UTC) #17
Johann
LGTM http://codereview.chromium.org/10960023/diff/13001/media/base/sinc_resampler.cc File media/base/sinc_resampler.cc (right): http://codereview.chromium.org/10960023/diff/13001/media/base/sinc_resampler.cc#newcode318 media/base/sinc_resampler.cc:318: float32x4_t m_sums2 = vmovq_n_f32(0); For some reason it ...
8 years, 2 months ago (2012-09-25 19:02:51 UTC) #18
DaleCurtis
http://codereview.chromium.org/10960023/diff/13001/media/base/sinc_resampler.cc File media/base/sinc_resampler.cc (right): http://codereview.chromium.org/10960023/diff/13001/media/base/sinc_resampler.cc#newcode318 media/base/sinc_resampler.cc:318: float32x4_t m_sums2 = vmovq_n_f32(0); On 2012/09/25 19:02:51, Johann wrote: ...
8 years, 2 months ago (2012-09-25 20:30:01 UTC) #19
DaleCurtis
http://codereview.chromium.org/10960023/diff/13001/media/base/sinc_resampler.cc File media/base/sinc_resampler.cc (right): http://codereview.chromium.org/10960023/diff/13001/media/base/sinc_resampler.cc#newcode318 media/base/sinc_resampler.cc:318: float32x4_t m_sums2 = vmovq_n_f32(0); On 2012/09/25 20:30:01, DaleCurtis wrote: ...
8 years, 2 months ago (2012-09-25 21:27:00 UTC) #20
Johann
Sounds reasonable to me On Tue, Sep 25, 2012 at 2:27 PM, <dalecurtis@chromium.org> wrote: > ...
8 years, 2 months ago (2012-09-25 21:37:15 UTC) #21
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/dalecurtis@chromium.org/10960023/26001
8 years, 2 months ago (2012-09-25 22:04:11 UTC) #22
commit-bot: I haz the power
8 years, 2 months ago (2012-09-26 00:13:00 UTC) #23
