|
|
Created:
5 years, 10 months ago by vivekg_samsung Modified:
5 years, 10 months ago CC:
arv+blink, blink-reviews, blink-reviews-bindings_chromium.org, vivekg_samsung Base URL:
https://chromium.googlesource.com/chromium/blink.git@master Target Ref:
refs/heads/master Project:
blink Visibility:
Public. |
Description[bindings] Break V8Binding::toDouble into fast inline and slower non-inline code paths.
The call to value->IsNumber() is done twice once in the V8Binding.cpp:toDouble and inside the v8 code.
This can be avoided if we invoke V8::Value->NumberValue() directly and catching any exception if thrown.
Hence splitting toDouble into fast inilne path and slow non-inline path otherwise.
R=haraken@chromium.org, jl@opera.com
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=190569
Patch Set 1 #Patch Set 2 : Review comments addressed. #
Messages
Total messages: 24 (3 generated)
vivekg@chromium.org changed reviewers: + haraken@chromium.org, jl@opera.com
vivekg@chromium.org changed reviewers: + vivekg@chromium.org
PTAL, thank you.
We have in the past seen rather dramatic performance improvements by eliminating v8::TryCatch objects from the executed code path. This adds one, so I'm immediately somewhat surprised that it improves performance. In what tests do you see an improvement? Another option that you might want to test is to inline the if (value->IsNumber()) return value->NumberValue(); bit. It'd be interesting to know how that compares performance-wise to your proposed patch. (I think "it's already a number" is the case we ought to optimize for, so inlining just the above makes some sense.)
On 2015/02/19 at 14:20:48, jl wrote: Thank you jl@ for your feedback. > We have in the past seen rather dramatic performance improvements by eliminating v8::TryCatch objects from the executed code path. This adds one, so I'm immediately somewhat surprised that it improves performance. In what tests do you see an improvement? I tried running //tools/perf/run_benchmark --browser=android-chrome-shell [kraken/sunspider/octane/speedometer/blink.bindings] for android. But these tests are very flaky in nature so was getting fluctuating results. Sometimes the results were too good to be true (system load etc.) but most of the times were almost same just ~ +/-(1%) > > Another option that you might want to test is to inline the > > if (value->IsNumber()) > return value->NumberValue(); > > bit. It'd be interesting to know how that compares performance-wise to your proposed patch. (I think "it's already a number" is the case we ought to optimize for, so inlining just the above makes some sense.) So are you suggesting something like: inline double toDouble(v8::Handle<v8::Value> value, ExceptionState& exceptionState) { if (value->IsNumber()) return value->NumberValue(); v8::Isolate* isolate = v8::Isolate::GetCurrent(); v8::TryCatch block(isolate); double doubleValue = value->NumberValue(); if (block.HasCaught()) { exceptionState.rethrowV8Exception(block.Exception()); return 0; } return doubleValue; }
On 2015/02/19 15:38:39, vivekg_ wrote: > On 2015/02/19 at 14:20:48, jl wrote: > > Thank you jl@ for your feedback. > > > We have in the past seen rather dramatic performance improvements by > eliminating v8::TryCatch objects from the executed code path. This adds one, so > I'm immediately somewhat surprised that it improves performance. In what tests > do you see an improvement? > > I tried running //tools/perf/run_benchmark --browser=android-chrome-shell > [kraken/sunspider/octane/speedometer/blink.bindings] for android. > > But these tests are very flaky in nature so was getting fluctuating results. > Sometimes the results were too good to be true (system load etc.) but most of > the times were almost same just ~ +/-(1%) I'd expect the potential gain of having one IsNumber() call instead of two is extremely minor compared to the total complexity of V8 and Blink, and that you'll need to construct a very specific micro-benchmark that magnifies the impact of this particular thing in order for any effect to be distinguishable from just random noise. > > > > Another option that you might want to test is to inline the > > > > if (value->IsNumber()) > > return value->NumberValue(); > > > > bit. It'd be interesting to know how that compares performance-wise to your > proposed patch. (I think "it's already a number" is the case we ought to > optimize for, so inlining just the above makes some sense.) > > So are you suggesting something like: > > inline double toDouble(v8::Handle<v8::Value> value, ExceptionState& > exceptionState) > { > if (value->IsNumber()) > return value->NumberValue(); > > v8::Isolate* isolate = v8::Isolate::GetCurrent(); > v8::TryCatch block(isolate); > double doubleValue = value->NumberValue(); > if (block.HasCaught()) { > exceptionState.rethrowV8Exception(block.Exception()); > return 0; > } > return doubleValue; > } I'm rather suggesting inline double toDouble(v8::Handle<v8::Value> value, ExceptionState& exceptionState) { if (value->IsNumber()) return value->NumberValue(); else return toDoubleSlow(value, exceptionState); } where toDoubleSlow() would contain the rest of the code (and not be inlined).
On 2015/02/19 15:44:27, Jens Widell wrote: > On 2015/02/19 15:38:39, vivekg_ wrote: > > On 2015/02/19 at 14:20:48, jl wrote: > > > > Thank you jl@ for your feedback. > > > > > We have in the past seen rather dramatic performance improvements by > > eliminating v8::TryCatch objects from the executed code path. This adds one, > so > > I'm immediately somewhat surprised that it improves performance. In what tests > > do you see an improvement? > > > > I tried running //tools/perf/run_benchmark --browser=android-chrome-shell > > [kraken/sunspider/octane/speedometer/blink.bindings] for android. > > > > But these tests are very flaky in nature so was getting fluctuating results. > > Sometimes the results were too good to be true (system load etc.) but most of > > the times were almost same just ~ +/-(1%) > > I'd expect the potential gain of having one IsNumber() call instead of two is > extremely minor compared to the total complexity of V8 and Blink, and that > you'll need to construct a very specific micro-benchmark that magnifies the > impact of this particular thing in order for any effect to be distinguishable > from just random noise. > Yeah sure. I was also bit skeptical about the change as the processors are equipped with instruction caching etc thereby making the measurement of the impact almost impossible. Let me see if we can arrive at any such micro benchmark reliably as we don't want to add any flakiness there. > > > > > > Another option that you might want to test is to inline the > > > > > > if (value->IsNumber()) > > > return value->NumberValue(); > > > > > > bit. It'd be interesting to know how that compares performance-wise to your > > proposed patch. (I think "it's already a number" is the case we ought to > > optimize for, so inlining just the above makes some sense.) > > > > So are you suggesting something like: > > > > inline double toDouble(v8::Handle<v8::Value> value, ExceptionState& > > exceptionState) > > { > > if (value->IsNumber()) > > return value->NumberValue(); > > > > v8::Isolate* isolate = v8::Isolate::GetCurrent(); > > v8::TryCatch block(isolate); > > double doubleValue = value->NumberValue(); > > if (block.HasCaught()) { > > exceptionState.rethrowV8Exception(block.Exception()); > > return 0; > > } > > return doubleValue; > > } > > I'm rather suggesting > > inline double toDouble(v8::Handle<v8::Value> value, ExceptionState& > exceptionState) > { > if (value->IsNumber()) > return value->NumberValue(); > else > return toDoubleSlow(value, exceptionState); > } > > where toDoubleSlow() would contain the rest of the code (and not be inlined). Sure. This looks pretty neat, thanks for the suggestion, appreciate your feedback. :)
Done, PTAL.
On 2015/02/20 04:18:26, vivekg_ wrote: > Done, PTAL. That matches what I suggested. Now the question is: is this measurably better?
On 2015/02/20 at 07:53:53, jl wrote: > On 2015/02/20 04:18:26, vivekg_ wrote: > > Done, PTAL. > > That matches what I suggested. Now the question is: is this measurably better? I ran subset of perf tests and could see a bit better performance (but unable to get the exact amount due to heavy noise) with the patch. The crypto subset of tests from octane and sunspider performed slightly better with the patch (Guessing that these tests use the numbers heavily). Also didn't notice any regressions either caused by the patch. Do you have any ideas about how do we measure this to be able to take this forward?
On 2015/02/20 08:36:33, vivekg_ wrote: > On 2015/02/20 at 07:53:53, jl wrote: > > On 2015/02/20 04:18:26, vivekg_ wrote: > > > Done, PTAL. > > > > That matches what I suggested. Now the question is: is this measurably better? > > I ran subset of perf tests and could see a bit better performance (but unable to > get the exact amount due to heavy noise) with the patch. > The crypto subset of tests from octane and sunspider performed slightly better > with the patch (Guessing that these tests use the numbers heavily). > Also didn't notice any regressions either caused by the patch. Last I checked, Octane and Sunspider were pretty much pure ECMAScript tests. IOW, they would essentially not call any code in Blink at all. > Do you have any ideas about how do we measure this to be able to take this > forward? I would find a simple double attribute or method with a double argument, and construct a micro-benchmark that simply triggers that code in a tight loop, preferably doing as little as possible. Or, better yet, add a dummy double attribute to some interface, e.g. Document, with an empty C++ setter: Document.idl: + attribute unrestricted double dummy; Document.h: + double dummy() { return 0.0; } + void setDummy(double value) { } micro-benchmark.html: <script> function test() { var d = document; for (var i = 0; i < SOME_LARGE_NUMBER; ++i) d.dummy = 1.5; } </script>
Some initial data: Test ==== <!DOCTYPE html> <body> <script src="../resources/runner.js"></script> <script> PerfTestRunner.measureTime({ description: "Micro benchmark to measure time taken by V8Binding.toDouble", run: function() { (function() { var root = document.createElement("div"); for (var i = 0; i < 1000000; i++) root.scrollLeft = 1.2; })(); } }); </script> </body> Result - Android ================ Command line$ tools/perf/run_benchmark --browser=android-chrome-shell-release blink_perf.bindings --output-dir=perf-results/android +-------------------------------------------+----------+----------+---------+ | Test - ANDROID | Unit | Without | With | +-------------------------------------------+----------+----------+---------+ | blink_perf.bindings: micro-benchmark.html | +-------------------------------------------+----------+----------+---------+ | Run 1 | ms | 693.18 | 647.02 | | Run 2 | ms | 670.79 | 609.45 | | Run 3 | ms | 653.19 | 658.27 | | Run 4 | ms | 702.07 | 641.16 | | Run 5 | ms | 693.11 | 684.22 | +-------------------------------------------+----------+----------+---------+ | Average | ms | 682.47 | 648.02 | +-------------------------------------------+----------+----------+---------+ | Geometric Mean | ms | 682.23 | 647.57 | +-------------------------------------------+----------+----------+---------+ | Standard Deviation | ms | 20.04 | 27.16 | +-------------------------------------------+----------+----------+---------+ Result - Linux ============== Command line$ tools/perf/run_benchmark --browser=content-shell-release blink_perf.bindings --output-dir=perf-results/linux +-------------------------------------------+----------+----------+---------+ | Test - LINUX | Unit | Without | With | +-------------------------------------------+----------+----------+---------+ | blink_perf.bindings: micro-benchmark.html | +-------------------------------------------+----------+----------+---------+ | Run 1 | ms | 92.44 | 92.34 | | Run 2 | ms | 91.92 | 93.77 | | Run 3 | ms | 91.28 | 93.87 | | Run 4 | ms | 91.92 | 92.27 | | Run 5 | ms | 91.86 | 92.29 | +-------------------------------------------+----------+----------+---------+ | Average | ms | 91.88 | 92.91 | +-------------------------------------------+----------+----------+---------+ | Geometric Mean | ms | 91.88 | 92.91 | +-------------------------------------------+----------+----------+---------+ | Standard Deviation | ms | 0.41 | 0.83 | +-------------------------------------------+----------+----------+---------+ From this data, difference on linux is almost none whereas we are seeing slight differences in android measurements.
On 2015/02/20 12:23:12, vivekg_ wrote: > Some initial data: Thanks! That's comparing trunk and PS#2?
On 2015/02/20 at 12:26:32, jl wrote: > On 2015/02/20 12:23:12, vivekg_ wrote: > > Some initial data: > > Thanks! That's comparing trunk and PS#2? Yes.
One more test with slight change of number value setting. Generating this value dynamically now to rule out any possibility of code caching (correct me if I am wrong). Test ==== <!DOCTYPE html> <body> <script src="../resources/runner.js"></script> <script> PerfTestRunner.measureTime({ description: "Micro benchmark to measure time taken by V8Binding.toDouble", run: function() { (function() { var root = document.createElement("div"); for (var i = 0; i < 1000000; i++) root.scrollLeft = i % 1000; })(); } }); </script> </body> +-------------------------------------------+----------+----------+---------+ | Test - ANDROID | Unit | Without | With | +-------------------------------------------+----------+----------+---------+ | blink_perf.bindings: micro-benchmark.html | +-------------------------------------------+----------+----------+---------+ | Run 1 | ms | 689.8 | 631.7 | | Run 2 | ms | 682.4 | 627.9 | | Run 3 | ms | 649.3 | 627.8 | | Run 4 | ms | 668.7 | 687.7 | | Run 5 | ms | 682.0 | 624.0 | +-------------------------------------------+----------+----------+---------+ | Average | ms | 674.44 | 639.82 | +-------------------------------------------+----------+----------+---------+ | Geometric Mean | ms | 674.29 | 639.38 | +-------------------------------------------+----------+----------+---------+ | Standard Deviation | ms | 15.98 | 26.90 | +-------------------------------------------+----------+----------+---------+
On 2015/02/20 13:26:07, vivekg_ wrote: > One more test with slight change of number value setting. > Generating this value dynamically now to rule out any possibility of code > caching (correct me if I am wrong). I doubt caching is anything to worry about; V8 can hardly make assumptions that assigning to this type of property (that V8 knows little about) is without side-effects. One additional thing that you might want to test is the slow case, e.g. root.scrollLeft = true;
<!DOCTYPE html> <body> <script src="../resources/runner.js"></script> <script> PerfTestRunner.measureTime({ description: "Micro benchmark to measure time taken by V8Binding.toDouble", run: function() { (function() { var root = document.createElement("div"); for (var i = 0; i < 1000000; i++) root.scrollLeft = true; })(); } }); </script> </body> +-------------------------------------------+----------+----------+---------+ | Test - ANDROID | Unit | Without | With | +-------------------------------------------+----------+----------+---------+ | blink_perf.bindings: micro-benchmark.html | +-------------------------------------------+----------+----------+---------+ | Run 1 | ms | 1193 | 1180 | | Run 2 | ms | 1222 | 1215 | +-------------------------------------------+----------+----------+---------+ | Average | ms | 1207.5 | 1197.5 | +-------------------------------------------+----------+----------+---------+ These are almost the same.
Right. So, I think this is a simple and reasonable enough change that even a fairly modest performance gain is enough motivation to do it. Doing the equivalent thing for toInt32() and toUInt32(), both starting with equivalent (non-inlined) fast paths currently, seem equally meaningful. And toFloat() could simply be inlined in its current form, which just calls toDouble() and casts the result.
On 2015/02/20 15:00:29, Jens Widell wrote: > Right. So, I think this is a simple and reasonable enough change that even a > fairly modest performance gain is enough motivation to do it. > > Doing the equivalent thing for toInt32() and toUInt32(), both starting with > equivalent (non-inlined) fast paths currently, seem equally meaningful. > > And toFloat() could simply be inlined in its current form, which just calls > toDouble() and casts the result. Yeah sure. The others for int and uint are also in the pipeline on the similar lines :) I thought of making these individual patches after evaluating each separately. WDYT? Also do you think we should add any such micro benchmark tests to perf test suite? The gain is not so significant for warranting one though. :)
LGTM On 2015/02/20 15:29:28, vivekg_ wrote: > On 2015/02/20 15:00:29, Jens Widell wrote: > > Right. So, I think this is a simple and reasonable enough change that even a > > fairly modest performance gain is enough motivation to do it. > > > > Doing the equivalent thing for toInt32() and toUInt32(), both starting with > > equivalent (non-inlined) fast paths currently, seem equally meaningful. > > > > And toFloat() could simply be inlined in its current form, which just calls > > toDouble() and casts the result. > > Yeah sure. The others for int and uint are also in the pipeline on the similar > lines :) > > I thought of making these individual patches after evaluating each separately. > WDYT? Sure, that makes sense. > Also do you think we should add any such micro benchmark tests to perf test > suite? Oh, absolutely not. :-) Micro-benchmarks are noise measuring devices that should be discarded ASAP after proving (or disproving) whatever they were meant to prove.
Thank you for your feedback. :)
The CQ bit was checked by vivekg@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/939923002/20001
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://src.chromium.org/viewvc/blink?view=rev&revision=190569 |