Chromium Code Reviews
Description
Possibly merge consecutive script fragments to reduce execution overhead
When parsing webpages written with multiple consecutive script fragments
of the form below (or similar),
<script>
...
</script>
<script>
...
</script>
HTMLScriptRunner is invoked once per script fragment, which adds
execution overhead.
This CL merges such consecutive scripts and runs them at once to reduce
execution cost when parsing HTML. Telemetry page_cycler tests on a Cyan
Chromebook show a ~20% improvement in loading the Gmail page in the
top_25 page set, using warm_times as the performance indicator.
BUG=658131
Patch Set 1 #
Messages
Total messages: 24 (9 generated)
lingyun.cai@intel.com changed reviewers: + csharrison@chromium.org, tkent@chromium.org
PTAL
Description was changed: fixed a typo in the last paragraph ("improvment" to "improvement").
tkent@chromium.org changed reviewers: + kouhei@chromium.org
+kouhei Does this mean the resultant DOM tree is not the same as before this CL?
On 2016/10/21 08:59:47, tkent wrote: > +kouhei > > > Does this mean resultant DOM tree is not same before this CL? To me it looks like yes, as we are merging the tokens at the tokenizer layer. I'll run this through a CQ dry run to see if anything breaks.
The CQ bit was checked by csharrison@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
20% improvement sounds pretty attractive, but I think the general direction is rather we might want to break up big script execution tasks into multiple smaller ones esp. when we have other tasks to run? I'm also interested in what extra overhead we're having when they are executed in multiple chunks (vs. run in a merged task) and if we could reduce some of the overhead (lingyun@- do you perhaps have any insights on that part too?).
On 2016/10/21 at 12:24:53, csharrison wrote: > On 2016/10/21 08:59:47, tkent wrote: > > +kouhei > > > > > > Does this mean resultant DOM tree is not same before this CL? > > To me it looks like yes, as we are merging the tokens at the tokenizer layer. I'll run this through a CQ dry run to see if anything breaks. Then, this change has web-exposed behavior changes. We need to change the HTML specification first.
Hmm. I'd be more interested to know where the performance gain comes from. I don't think we can accept this patch as-is (it needs a spec change, as tkent-san said). Also, there are existing websites that rely on the existing behavior (such as fb). Would you post more detailed benchmark results? On which metric did you notice the 20% improvement?
hong.zheng@intel.com changed reviewers: + hong.zheng@intel.com
Because Lingyun is on leave, I am updating this on her behalf.
On 2016/10/21 12:24:53, Charlie Harrison wrote: > On 2016/10/21 08:59:47, tkent wrote: > > +kouhei > > > > > > Does this mean resultant DOM tree is not same before this CL? > > To me it looks like yes, as we are merging the tokens at the tokenizer layer. > I'll run this through a CQ dry run to see if anything breaks. I see some trybot results are failures; does it mean the resultant DOM tree breaks some tests?
On 2016/10/21 14:34:13, kinuko wrote: > 20% improvement sounds pretty attractive, but I think the general direction is > rather we might want to break up big script execution tasks into multiple > smaller ones esp. when we have other tasks to run? I'm also interested in what > extra overhead we're having when they are executed in multiple chunks (vs. run > in a merged task) and if we could reduce some of the overhead (lingyun@- do you > perhaps have any insights on that part too?). gmail.com loads a file which includes about 100 pairs of <script></script>. From the trace in bug #658131, we can see it triggers many tasks and much V8 compilation and execution w/o the patch. I am not sure whether these behaviors result in extra overhead. Where the extra overhead comes from is an interesting question. Do you have any ideas about how to locate it? If so, please let me know and I will verify it.
On 2016/10/24 01:17:21, tkent wrote: > On 2016/10/21 at 12:24:53, csharrison wrote: > > On 2016/10/21 08:59:47, tkent wrote: > > > +kouhei > > > > > > > > > Does this mean resultant DOM tree is not same before this CL? > > > > To me it looks like yes, as we are merging the tokens at the tokenizer layer. > I'll run this through a CQ dry run to see if anything breaks. > > Then, this change has web-exposed behavior changes. We need to change the HTML > specification first. Oh, about "change the HTML specification": I think that is difficult work, isn't it? Is there another way to work around it and still gain the benefit of the patch?
On 2016/10/24 01:33:03, kouhei wrote: > Hmm. I'd be more interested to know where the performance gain come from. > I don't think we can accept this patch as is (needs spec change as tkent-san > said). Also there are existing websites that rely on the existing behavior (such > as fb) > > Would you post more detailed benchmark result? On which metric did you notice > 20% improvement? The detailed results are uploaded to bug #658131. We use page-loading warm_times as the metric.
On 2016/10/21 14:34:13, kinuko wrote: > 20% improvement sounds pretty attractive, but I think the general direction is > rather we might want to break up big script execution tasks into multiple > smaller ones esp. when we have other tasks to run? I'm also interested in what > extra overhead we're having when they are executed in multiple chunks (vs. run > in a merged task) and if we could reduce some of the overhead (lingyun@- do you > perhaps have any insights on that part too?). Thanks for your comments. I've extracted the HTML file that contains lots of script fragments when Gmail is loading, and loaded it locally to get trace statistics (in ms). Below is the breakdown:

Function                    w/ patch   w/o patch
parseHTML                   620.334    802.416
HTMLScriptRunner::execute   609.045    784.206
evaluateScript              606.036    776.525
v8.compile                  582.905    562.822
v8.run                      22.875     207.908

We can see that the main overhead lies in the v8 part. I also simplified this HTML into a test file consisting of 140 consecutive script fragments of the form:

<script>
var test = 'string';
</script>

The breakdown trace statistics (in ms) are shown below, which also indicate the overhead in v8:

Function                    w/ patch   w/o patch
HTMLScriptRunner::execute   1.349      45.274
evaluateScript              1.267      38.781
v8.compile                  1.016      22.539
v8.run                      0.198      10.044

I'm not very familiar with the v8 internals; do the results shown above look reasonable? Besides, all the related HTML files and traces are attached and updated in BUG #658131. If any more data are needed, I can help provide them :)
On 2016/10/24 01:17:21, tkent wrote: > On 2016/10/21 at 12:24:53, csharrison wrote: > > On 2016/10/21 08:59:47, tkent wrote: > > > +kouhei > > > > > > > > > Does this mean resultant DOM tree is not same before this CL? > > > > To me it looks like yes, as we are merging the tokens at the tokenizer layer. > I'll run this through a CQ dry run to see if anything breaks. > > Then, this change has web-exposed behavior changes. We need to change the HTML > specification first. Thanks for your comments. Could you point out which part of the HTML spec you are referring to? We can also double-check on our side :)
Another thing that would be worthwhile to check is an about:tracing log before and after the patch is applied. I wonder if this is affecting v8 compilation.
lingyun.cai@intel.com changed reviewers: + kinuko@chromium.org
On 2016/11/08 15:01:50, Charlie Harrison wrote:
> Another thing that would be worthwhile to check is an about:tracing log before
> and after the patch is applied. I wonder if this is affecting v8 compilation.
I've recorded the trace files using the Telemetry option "--profiler=trace". The
tracing statistics (in ms) for the page-loading period are shown below:

              w/ patch   w/o patch   improvement
warm_times    2859*      3119*       260
break-down:
v8.compile    602        602         -
v8.execute    1867       2104        237

The 260 ms performance gap mainly lies in the v8.execute part (237 ms
improvement), and I'm planning to break this down further. Do you
perhaps have any insights about it?
One thing worth mentioning is that tracing adds 1000+ ms of overhead to the
final results both with and without the patch (e.g., 1783 ms with no tracing
vs. 3119 ms with tracing for the w/o-patch results), so the performance
improvement is not as pronounced as in the earlier numbers. I don't know
whether there is a better way to profile with less overhead when running
Telemetry.
Besides, the Gmail page is complex, and I wonder if we could use the extracted
Gmail HTML file for analysis; it is the main HTML parsed during the loading
period and consists of 100+ consecutive scripts. Loaded locally, it shows
behavior similar to the statistics above (i.e., a performance gap in the
v8.execute part):

             w/ patch   w/o patch
parseHTML    620.334    802.416
v8.compile   582.905    562.822
v8.execute   22.875     207.908

Because I can't attach anything here, for reference the Gmail loading traces
w/ and w/o the patch are uploaded in BUG#658131 Comment#7 in .zip format, and
the extracted Gmail HTML file is in Comment#5 as 'gmail_js.html'.