Description
Optimization: Saves 30M instructions on x64 parsing ~6MB of JS code in parser-shell
BUG=
Patch Set 1 #
Messages
Total messages: 9 (2 generated)
littledan@chromium.org changed reviewers: + littledan@chromium.org, nikolaos@chromium.org
Adrian, do you have any overall performance numbers on how this affects the speed of parsing?
On 2016/06/07 16:18:58, Dan Ehrenberg wrote:
> Adrian, do you have any overall performance numbers on how this affects the
> speed of parsing?

I think this is a nice patch, if we keep the existing implementation of expression classifiers. It avoids unnecessary copies of error objects when two kinds of errors are forgiven. The problem with copying error objects in all other places, including all calls to Accumulate, still remains, however. If you could share the details of your test, maybe we could find a way to improve my patch. I've only tested it with the perf bots so far, not with valgrind.
On 2016/06/07 16:18:58, Dan Ehrenberg wrote:
> Adrian, do you have any overall performance numbers on how this affects the
> speed of parsing?

Sorry that this is so terse; I have just posted this CL to be able to link it from https://codereview.chromium.org/1708193003/#msg51

The way I tested this was by running a debug build (release builds won't work) like this:

  % valgrind --read-inline-info=yes --tool=callgrind \
      ./out/x64.debug/parser-shell jsfiles/*.js
  % callgrind_annotate callgrind.out.${PID}

The output has the total instruction count and a list of functions sorted by the number of CPU instructions they execute. Alternatively, one can use KCachegrind to open the output files. Because Valgrind is run on debug builds, improvements observed there don't necessarily translate to a release build (though they often do), so here are also the timings for a release build:

  Before:  164/18 167/18 168/18 171/19 166/18 (avg: 167.2/18.2)
  After:   164/18 165/18 164/18 164/18 165/18 (avg: 164.4/18.0)
  Speedup: 1.01703/1.01111

(Seven runs, cutting the biggest and smallest values before calculating the average, to discard e.g. the first run, which is usually slower due to the JS files not yet being in the disk buffer cache.)
On 2016/06/07 16:40:06, nickie wrote:
> On 2016/06/07 16:18:58, Dan Ehrenberg wrote:
> > Adrian, do you have any overall performance numbers on how this affects the
> > speed of parsing?
>
> I think this is a nice patch, if we keep the existing implementation of
> expression classifiers. It avoids unnecessary copies of error objects when
> two kinds of errors are forgiven. The problem with copying error objects in
> all other places, including all calls to Accumulate, still remains, however.

Indeed, there's probably more room for improvement, as this was a quick proof-of-concept patch that happily worked. Devoting more time to reasoning about the parser code with the aid of profiling tools could yield more optimizations, and not just in ExpressionClassifier :-)

> If you could share the details of your test, maybe we could find a way to
> improve my patch. I've only tested it with the perf bots so far, not with
> valgrind.

See my other comment for info on how I did the testing.
On 2016/06/07 17:03:12, aperez wrote:
> On 2016/06/07 16:18:58, Dan Ehrenberg wrote:
> > Adrian, do you have any overall performance numbers on how this affects the
> > speed of parsing?
>
> Sorry that this is so terse; I have just posted this CL to be able to link
> it from https://codereview.chromium.org/1708193003/#msg51
>
> The way I tested this was by running a debug build (release builds won't
> work) like this:

Why won't release builds work? I'm skeptical that counting instructions in a debug build is likely to give particularly useful results.

> % valgrind --read-inline-info=yes --tool=callgrind \
>     ./out/x64.debug/parser-shell jsfiles/*.js
> % callgrind_annotate callgrind.out.${PID}
>
> The output has the total instruction count and a list of functions sorted
> by the number of CPU instructions they execute. Alternatively, one can use
> KCachegrind to open the output files. Because Valgrind is run on debug
> builds, improvements observed there don't necessarily translate to a
> release build (though they often do), so here are also the timings for a
> release build:
>
> Before:  164/18 167/18 168/18 171/19 166/18 (avg: 167.2/18.2)
> After:   164/18 165/18 164/18 164/18 165/18 (avg: 164.4/18.0)
> Speedup: 1.01703/1.01111
>
> (Seven runs, cutting the biggest and smallest values before calculating
> the average, to discard e.g. the first run, which is usually slower due to
> the JS files not yet being in the disk buffer cache.)
On 2016/06/09 15:34:34, adamk wrote:
> On 2016/06/07 17:03:12, aperez wrote:
> > On 2016/06/07 16:18:58, Dan Ehrenberg wrote:
> > > Adrian, do you have any overall performance numbers on how this affects
> > > the speed of parsing?
> >
> > Sorry that this is so terse; I have just posted this CL to be able to link
> > it from https://codereview.chromium.org/1708193003/#msg51
> >
> > The way I tested this was by running a debug build (release builds won't
> > work) like this:
>
> Why won't release builds work? I'm skeptical that counting instructions in a
> debug build is likely to give particularly useful results.

Well, Valgrind does need debug info, so a release build won't work. I used a debug build to get a grasp of where to look for things to optimize, and then the timings of “parser-shell” in release mode (as seen below) to validate that the change is positive.

Today, looking at the Makefiles, I noticed that it is possible to make “optdebug” builds (with debug info, plus optimizations enabled), and re-ran the tests. As you suspected, the improvement in instruction count with optimized code is meager:

  Before: 1,364,066,329 instructions
  After:  1,364,066,284 instructions (only 45 fewer ☹)

> > % valgrind --read-inline-info=yes --tool=callgrind \
> >     ./out/x64.debug/parser-shell jsfiles/*.js
> > % callgrind_annotate callgrind.out.${PID}
> >
> > Before:  164/18 167/18 168/18 171/19 166/18 (avg: 167.2/18.2)
> > After:   164/18 165/18 164/18 164/18 165/18 (avg: 164.4/18.0)
> > Speedup: 1.01703/1.01111

These results still apply (also re-checked today). I think that the slightly smaller values with the patch applied are because the Error::reset() method is marked with V8_INLINE, so function calls into the constructor are avoided.

> > (Seven runs, cutting the biggest and smallest values before calculating
> > the average, to discard e.g. the first run, which is usually slower due to
> > the JS files not yet being in the disk buffer cache.)
I'm closing this, as I think it's not relevant any more.
Description was changed from

==========
Optimization: Saves 30M instructions on x64 parsing ~6MB of JS code in parser-shell
BUG=
==========

to

==========
Optimization: Saves 30M instructions on x64 parsing ~6MB of JS code in parser-shell
BUG=
==========