|
|
Created:
5 years, 4 months ago by adrian.belgun Modified:
5 years, 4 months ago CC:
blink-reviews Base URL:
https://chromium.googlesource.com/chromium/blink.git@master Target Ref:
refs/heads/master Project:
blink Visibility:
Public. |
DescriptionSSE2 Optimization for ParamEvent::SetTarget
Optimized with SSE2 and obtained 4.37x speedup.
Standard: 2.900553 seconds
SSE2 (fp32): 0.662973 seconds
Speedup (fp32): 4.375067
Due to multiple multiplications with a small discreteTimeConstant, some divergences appear versus the serialized version. Test adjusted accordingly.
BUG=512376
TEST=Run webaudio/audioparam-setTargetAtTime.html
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=200726
Patch Set 1 #
Total comments: 7
Patch Set 2 : Added extra comments. #
Total comments: 2
Patch Set 3 : Calculate constants using Horner's rule. #Patch Set 4 : Added missing header. #Patch Set 5 : Include SSE2 header only on x86/x86_64. #
Messages
Total messages: 23 (8 generated)
adrian.belgun@intel.com changed reviewers: + rtoy@chromium.org
rtoy@chromium.org: Please review changes in Source/modules/webaudio/AudioParamTimeline.cpp Could you please recommend a reviewer for the test? Thank you!
On 2015/08/13 09:27:13, adrian.belgun wrote: > mailto:rtoy@chromium.org: Please review changes in > Source/modules/webaudio/AudioParamTimeline.cpp > > Could you please recommend a reviewer for the test? > Thank you! I can review it.
rtoy@chromium.org changed reviewers: + hongchan@chromium.org
+hongchan@: FYI
https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... Source/modules/webaudio/AudioParamTimeline.cpp:353: // Necessary for unrolling.. As in the other panner node CL, can you include a brief comment on how you derived these constants? https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6 * c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0; Would it help accuracy if the constants were computed using Horner's rule like so: c1 = c0*(2-c0) c2 = c0*((c0-3)*c0+3) c3 = c0*(c0*((4-c0)*c0-6)+4) If that's not enough, would it help if the computations were done using double precision (but leaving the c<n> values as floats)?
Are you OK with this approach? I would like to also try to optimize other event event types in this function. https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... Source/modules/webaudio/AudioParamTimeline.cpp:353: // Necessary for unrolling.. On 2015/08/13 15:42:56, Raymond Toy wrote: > As in the other panner node CL, can you include a brief comment on how you > derived these constants? Done. https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6 * c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0; On 2015/08/13 15:42:56, Raymond Toy wrote: > Would it help accuracy if the constants were computed using Horner's rule like > so: > > c1 = c0*(2-c0) > c2 = c0*((c0-3)*c0+3) > c3 = c0*(c0*((4-c0)*c0-6)+4) > > If that's not enough, would it help if the computations were done using double > precision (but leaving the c<n> values as floats)? Experimented also with doing the computations with doubles and also with Horner's rule. Got the same result. Will keep the expanded version for clarity.
https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6 * c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0; On 2015/08/14 07:49:21, adrian.belgun wrote: > On 2015/08/13 15:42:56, Raymond Toy wrote: > > Would it help accuracy if the constants were computed using Horner's rule like > > so: > > > > c1 = c0*(2-c0) > > c2 = c0*((c0-3)*c0+3) > > c3 = c0*(c0*((4-c0)*c0-6)+4) > > > > If that's not enough, would it help if the computations were done using double > > precision (but leaving the c<n> values as floats)? > > Experimented also with doing the computations with doubles and also with > Horner's rule. Got the same result. > Will keep the expanded version for clarity. You mean you computed the c's using doubles and also Horner's rule and still got the same result? Horner's rule has fewer operations (not really important here), and also generally has much better roundoff errors. I prefer Horner's rule here for these reasons. https://codereview.chromium.org/1288773003/diff/20001/Source/modules/webaudio... File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/20001/Source/modules/webaudio... Source/modules/webaudio/AudioParamTimeline.cpp:353: // Resorve recursion by expanding constants to achieve a 4-step loop unrolling. "Resorve"?
On 2015/08/14 07:49:21, adrian.belgun wrote: > Are you OK with this approach? I would like to also try to optimize other event > event types in this function. Yes, the general approach is good. > > https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... > File Source/modules/webaudio/AudioParamTimeline.cpp (right): > > https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... > Source/modules/webaudio/AudioParamTimeline.cpp:353: // Necessary for unrolling.. > On 2015/08/13 15:42:56, Raymond Toy wrote: > > As in the other panner node CL, can you include a brief comment on how you > > derived these constants? > > Done. > > https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... > Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6 > * c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0; > On 2015/08/13 15:42:56, Raymond Toy wrote: > > Would it help accuracy if the constants were computed using Horner's rule like > > so: > > > > c1 = c0*(2-c0) > > c2 = c0*((c0-3)*c0+3) > > c3 = c0*(c0*((4-c0)*c0-6)+4) > > > > If that's not enough, would it help if the computations were done using double > > precision (but leaving the c<n> values as floats)? > > Experimented also with doing the computations with doubles and also with > Horner's rule. Got the same result. > Will keep the expanded version for clarity.
Updated new patch set addressing the issues below. https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6 * c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0; On 2015/08/14 16:00:03, Raymond Toy wrote: > On 2015/08/14 07:49:21, adrian.belgun wrote: > > On 2015/08/13 15:42:56, Raymond Toy wrote: > > > Would it help accuracy if the constants were computed using Horner's rule > like > > > so: > > > > > > c1 = c0*(2-c0) > > > c2 = c0*((c0-3)*c0+3) > > > c3 = c0*(c0*((4-c0)*c0-6)+4) > > > > > > If that's not enough, would it help if the computations were done using > double > > > precision (but leaving the c<n> values as floats)? > > > > Experimented also with doing the computations with doubles and also with > > Horner's rule. Got the same result. > > Will keep the expanded version for clarity. > > You mean you computed the c's using doubles and also Horner's rule and still got > the same result? > Yes, applied Horner's rule and for testing to make sure that the computations were made on doubles made target, timeConstant, discreteTimeConstant and c* doubles. Still a 3.9e-5 error was needed to pass the test. The issue is caused by the line operation value += delta * c3; If we exchange this with value += delta * c0; delta = target - value; value += delta * c0; delta = target - value; value += delta * c0; delta = target - value; value += delta * c0; The error needed to pass the test got reduced to 2.84e-5 (but still higher due to SSE ops), but the performance gain got completely canceled. > Horner's rule has fewer operations (not really important here), and also > generally has much better roundoff errors. I prefer Horner's rule here for > these reasons. Changed as per request. https://codereview.chromium.org/1288773003/diff/20001/Source/modules/webaudio... File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/20001/Source/modules/webaudio... Source/modules/webaudio/AudioParamTimeline.cpp:353: // Resorve recursion by expanding constants to achieve a 4-step loop unrolling. On 2015/08/14 16:00:03, Raymond Toy wrote: > "Resorve"? Done. Sorry for that.
lgtm. While it's unfortunate that the threshold had to be increased, I don't think it's show stopper. Thanks for your patience! https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... File Source/modules/webaudio/AudioParamTimeline.cpp (right): https://codereview.chromium.org/1288773003/diff/1/Source/modules/webaudio/Aud... Source/modules/webaudio/AudioParamTimeline.cpp:357: const float c3 = 4 * c0 - 6 * c0 * c0 + 4 * c0 * c0 * c0 - c0 * c0 * c0 * c0; On 2015/08/17 07:58:14, adrian.belgun wrote: > On 2015/08/14 16:00:03, Raymond Toy wrote: > > On 2015/08/14 07:49:21, adrian.belgun wrote: > > > On 2015/08/13 15:42:56, Raymond Toy wrote: > > > > Would it help accuracy if the constants were computed using Horner's rule > > like > > > > so: > > > > > > > > c1 = c0*(2-c0) > > > > c2 = c0*((c0-3)*c0+3) > > > > c3 = c0*(c0*((4-c0)*c0-6)+4) > > > > > > > > If that's not enough, would it help if the computations were done using > > double > > > > precision (but leaving the c<n> values as floats)? > > > > > > Experimented also with doing the computations with doubles and also with > > > Horner's rule. Got the same result. > > > Will keep the expanded version for clarity. > > > > You mean you computed the c's using doubles and also Horner's rule and still > got > > the same result? > > > > Yes, applied Horner's rule and for testing to make sure that the computations > were made on doubles made target, timeConstant, discreteTimeConstant and c* > doubles. Still a 3.9e-5 error was needed to pass the test. Interesting. Thanks for doing the tests. I think this change in threshold is acceptable. Note also that the test computes the expected result using the simple (non-SSE2) equation, so there might be some round-off there as well. The results might also be better than indicated if we used a relative error instead of absolute. I think the curve starts at a gain of 100 so absolute errors might be larger than the relative error. > > The issue is caused by the line operation > value += delta * c3; > If we exchange this with > value += delta * c0; > delta = target - value; > value += delta * c0; > delta = target - value; > value += delta * c0; > delta = target - value; > value += delta * c0; > The error needed to pass the test got reduced to 2.84e-5 (but still higher due > to SSE ops), but the performance gain got completely canceled. > > Horner's rule has fewer operations (not really important here), and also > > generally has much better roundoff errors. I prefer Horner's rule here for > > these reasons. > > Changed as per request.
The CQ bit was checked by robert.bradford@intel.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1288773003/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1288773003/40001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...) linux_chromium_gn_rel on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by adrian.belgun@intel.com to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1288773003/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1288773003/80001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by adrian.belgun@intel.com
The patchset sent to the CQ was uploaded after l-g-t-m from rtoy@chromium.org Link to the patchset: https://codereview.chromium.org/1288773003/#ps80001 (title: "Include SSE2 header only on x86/x86_64.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1288773003/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1288773003/80001
Message was sent while issue was closed.
Committed patchset #5 (id:80001) as https://src.chromium.org/viewvc/blink?view=rev&revision=200726 |