Issue 1288773003: SSE2 Optimization for ParamEvent::SetTarget

Issue 1288773003: SSE2 Optimization for ParamEvent::SetTarget (Closed)

Created:
5 years, 4 months ago by adrian.belgun

Modified:
5 years, 4 months ago

Reviewers:
Raymond Toy, robert.bradford, hongchan

CC:
blink-reviews

Base URL:
https://chromium.googlesource.com/chromium/blink.git@master

Target Ref:
refs/heads/master

Project:
blink

Visibility:
Public.

More Reviews

Description

SSE2 Optimization for ParamEvent::SetTarget Optimized with SSE2 and obtained 4.37x speedup. Standard: 2.900553 seconds SSE2 (fp32): 0.662973 seconds Speedup (fp32): 4.375067 Due to multiple multiplications with a small discreteTimeConstant, some divergences appear versus the serialized version. Test adjusted accordingly. BUG=512376 TEST=Run webaudio/audioparam-setTargetAtTime.html Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=200726

Patch Set 1 #

Total comments: 7

Patch Set 2 : Added extra comments. #

Total comments: 2

Patch Set 3 : Calculate constants using Horner's rule. #

Patch Set 4 : Added missing header. #

Patch Set 5 : Include SSE2 header only on x86/x86_64. #

Created: 5 years, 4 months ago

Download [raw] [tar.bz2]

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+38 lines, -1 line)			Patch
	M	LayoutTests/webaudio/audioparam-setTargetAtTime.html	View		1 chunk	+1 line, -1 line	0 comments	Download
	M	Source/modules/webaudio/AudioParamTimeline.cpp	View	1 2 3 4	2 chunks	+37 lines, -0 lines	0 comments	Download

Messages

Total messages: 23 (8 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

adrian.belgun

rtoy@chromium.org: Please review changes in Source/modules/webaudio/AudioParamTimeline.cpp Could you please recommend a reviewer for the test? ...

5 years, 4 months ago (2015-08-13 09:27:13 UTC) #2

Raymond Toy

On 2015/08/13 09:27:13, adrian.belgun wrote: > mailto:rtoy@chromium.org: Please review changes in > Source/modules/webaudio/AudioParamTimeline.cpp > > ...

5 years, 4 months ago (2015-08-13 15:36:02 UTC) #3

Raymond Toy

https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/AudioParamTimeline.cpp File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/AudioParamTimeline.cpp#newcode353 Source/modules/webaudio/AudioParamTimeline.cpp:353: // Necessary for unrolling.. As in the other panner ...

5 years, 4 months ago (2015-08-13 15:42:56 UTC) #6

adrian.belgun

Are you OK with this approach? I would like to also try to optimize other ...

5 years, 4 months ago (2015-08-14 07:49:21 UTC) #7

Raymond Toy

https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/AudioParamTimeline.cpp File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/AudioParamTimeline.cpp#newcode357 Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6 ...

5 years, 4 months ago (2015-08-14 16:00:03 UTC) #8

Raymond Toy

On 2015/08/14 07:49:21, adrian.belgun wrote: > Are you OK with this approach? I would like ...

5 years, 4 months ago (2015-08-14 16:00:43 UTC) #9

adrian.belgun

Updated new patch set addressing the issues below. https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/AudioParamTimeline.cpp File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/AudioParamTimeline.cpp#newcode357 Source/modules/webaudio/AudioParamTimeline.cpp:357: const ...

5 years, 4 months ago (2015-08-17 07:58:14 UTC) #10

Updated new patch set addressing the issues below.

https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud...
File Source/modules/webaudio/AudioParamTimeline.cpp (right):

https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud...
Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6
* c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0;
On 2015/08/14 16:00:03, Raymond Toy wrote:
> On 2015/08/14 07:49:21, adrian.belgun wrote:
> > On 2015/08/13 15:42:56, Raymond Toy wrote:
> > > Would it help accuracy if the constants were computed using Horner's rule
> like
> > > so:
> > > 
> > > c1 = c0*(2-c0)
> > > c2 = c0*((c0-3)*c0+3)
> > > c3 = c0*(c0*((4-c0)*c0-6)+4)
> > > 
> > > If that's not enough, would it help if the computations were done using
> double
> > > precision (but leaving the c<n> values as floats)?
> > 
> > Experimented also with doing the computations with doubles and also with
> > Horner's rule. Got the same result.
> > Will keep the expanded version for clarity.
> 
> You mean you computed the c's using doubles and also Horner's rule and still
got
> the same result?
> 

Yes, applied Horner's rule and for testing to make sure that the computations
were made on doubles made target, timeConstant,  discreteTimeConstant and c*
doubles. Still a 3.9e-5 error was needed to pass the test.

The issue is caused by the line operation 
  value += delta * c3;
If we exchange this with 
  value += delta * c0;
  delta = target - value;
  value += delta * c0;
  delta = target - value;
  value += delta * c0;
  delta = target - value;
  value += delta * c0;
The error needed to pass the test got reduced to 2.84e-5 (but still higher due
to SSE ops), but the performance gain got completely canceled.
> Horner's rule has fewer operations (not really important here), and also
> generally has much better roundoff errors.  I prefer Horner's rule here for
> these reasons.

Changed as per request.

https://codereview.chromium.org/1288773003/diff/20001/Source/modules/webaudio...
File Source/modules/webaudio/AudioParamTimeline.cpp (right):

https://codereview.chromium.org/1288773003/diff/20001/Source/modules/webaudio...
Source/modules/webaudio/AudioParamTimeline.cpp:353: // Resorve recursion by
expanding constants to achieve a 4-step loop unrolling.
On 2015/08/14 16:00:03, Raymond Toy wrote:
> "Resorve"?

Done. Sorry for that.

Raymond Toy

lgtm. While it's unfortunate that the threshold had to be increased, I don't think it's ...

5 years, 4 months ago (2015-08-17 15:54:51 UTC) #11

lgtm.

While it's unfortunate that the threshold had to be increased, I don't think
it's show stopper.

Thanks for your patience!

https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud...
File Source/modules/webaudio/AudioParamTimeline.cpp (right):

https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud...
Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6
* c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0;
On 2015/08/17 07:58:14, adrian.belgun wrote:
> On 2015/08/14 16:00:03, Raymond Toy wrote:
> > On 2015/08/14 07:49:21, adrian.belgun wrote:
> > > On 2015/08/13 15:42:56, Raymond Toy wrote:
> > > > Would it help accuracy if the constants were computed using Horner's
rule
> > like
> > > > so:
> > > > 
> > > > c1 = c0*(2-c0)
> > > > c2 = c0*((c0-3)*c0+3)
> > > > c3 = c0*(c0*((4-c0)*c0-6)+4)
> > > > 
> > > > If that's not enough, would it help if the computations were done using
> > double
> > > > precision (but leaving the c<n> values as floats)?
> > > 
> > > Experimented also with doing the computations with doubles and also with
> > > Horner's rule. Got the same result.
> > > Will keep the expanded version for clarity.
> > 
> > You mean you computed the c's using doubles and also Horner's rule and still
> got
> > the same result?
> > 
> 
> Yes, applied Horner's rule and for testing to make sure that the computations
> were made on doubles made target, timeConstant,  discreteTimeConstant and c*
> doubles. Still a 3.9e-5 error was needed to pass the test.

Interesting. Thanks for doing the tests.

I think this change in threshold is acceptable.  Note also that the test
computes the expected result using the simple (non-SSE2) equation, so there
might be some round-off there as well.  The results might also be better than
indicated if we used a relative error instead of absolute.  I think the curve
starts at a gain of 100 so absolute errors might be larger than the relative
error.

> 
> The issue is caused by the line operation 
>   value += delta * c3;
> If we exchange this with 
>   value += delta * c0;
>   delta = target - value;
>   value += delta * c0;
>   delta = target - value;
>   value += delta * c0;
>   delta = target - value;
>   value += delta * c0;
> The error needed to pass the test got reduced to 2.84e-5 (but still higher due
> to SSE ops), but the performance gain got completely canceled.
> > Horner's rule has fewer operations (not really important here), and also
> > generally has much better roundoff errors.  I prefer Horner's rule here for
> > these reasons.
> 
> Changed as per request.