Description
Add blink_perf.html_to_dom benchmark.
For background and motivation see bugs.
BUG=625986, 595492
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:android_s5_perf_cq;master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq
Patch Set 1 #
Patch Set 2: Changes to the original benchmark #
Messages
Total messages: 29 (6 generated)
Description was changed from ========== Add blink_perf.html_to_dom benchmark. For background and motivation see bugs. BUG=625986, 595492 ========== to ========== Add blink_perf.html_to_dom benchmark. For background and motivation see bugs. BUG=625986, 595492 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:android_s5_perf_cq;master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq ==========
ulan@chromium.org changed reviewers: + esprehn@chromium.org
I moved the HTML parsing benchmark to blink PerformanceTests as you suggested. I tried to keep the workload as close to the original as possible (the diff between PS1 and PS2 shows modifications to the benchmark). To achieve that I had to modify the runner to support benchmarks that produce multiple results in one run. WDYT?

Log of running the benchmark:
> tools/perf/run_benchmark blink_perf.html_to_dom --reset-results
[ RUN      ] html-benchmark.html
Description: This benchmark parses subset of HTML and adds parsed data to DOM. See http://crbug.com/595492 for more information.
values innerHTML 27.055000000000007 ms
nodes innerHTML 19601
values real-bindings 60.269999999999996 ms
nodes real-bindings 19601
values fake-bindings 30.67500000000001 ms
nodes fake-bindings 19601
[       OK ] html-benchmark.html (3537 ms)
[  PASSED  ] 1 test.
Pages: [html-benchmark.html]
RESULT fake-bindings: html-benchmark.html= 30.675 ms
*RESULT fake-bindings: fake-bindings= 30.675 ms
RESULT innerHTML: html-benchmark.html= 27.055 ms
*RESULT innerHTML: innerHTML= 27.055 ms
RESULT real-bindings: html-benchmark.html= 60.27 ms
*RESULT real-bindings: real-bindings= 60.27 ms
RESULT telemetry_page_measurement_results: num_failed= 0 count
RESULT telemetry_page_measurement_results: num_errored= 0 count
This looks really awesome, will the dashboard show all three output lines? I'm happy with this, but I want to sync with Nat first. :)
On 2016/07/28 21:48:23, esprehn wrote:
> This looks really awesome, will the dashboard show all three output lines?

I think so, but Ulan, if you can paste the json that's generated with --output-format=chartjson somewhere, I could double-check.

> I'm happy with this, but I want to sync with Nat first. :)
On 2016/07/29 01:59:11, sullivan wrote:
> On 2016/07/28 21:48:23, esprehn wrote:
> > This looks really awesome, will the dashboard show all three output lines?
>
> I think so, but Ulan, if you can paste the json that's generated with
> --output-format=chartjson somewhere, I could double-check.

tools/perf/results-chart.json:
{
  "trace_rerun_options": [],
  "format_version": "0.1",
  "benchmark_description": null,
  "charts": {
    "real-bindings": {
      "html-benchmark.html": {"std": 0.0, "name": "real-bindings", "type": "list_of_scalar_values", "important": true, "values": [68.885], "units": "ms", "page_id": 0},
      "summary": {"std": 0.0, "name": "real-bindings", "important": true, "values": [68.885], "units": "ms", "type": "list_of_scalar_values"}
    },
    "fake-bindings": {
      "html-benchmark.html": {"std": 0.0, "name": "fake-bindings", "type": "list_of_scalar_values", "important": true, "values": [37.08000000000001], "units": "ms", "page_id": 0},
      "summary": {"std": 0.0, "name": "fake-bindings", "important": true, "values": [37.08000000000001], "units": "ms", "type": "list_of_scalar_values"}
    },
    "innerHTML": {
      "html-benchmark.html": {"std": 0.0, "name": "innerHTML", "type": "list_of_scalar_values", "important": true, "values": [30.35000000000001], "units": "ms", "page_id": 0},
      "summary": {"std": 0.0, "name": "innerHTML", "important": true, "values": [30.35000000000001], "units": "ms", "type": "list_of_scalar_values"}
    }
  },
  "benchmark_metadata": {"rerun_options": [], "type": "telemetry_benchmark", "name": "blink_perf.html_to_dom", "description": null},
  "next_version": "0.2",
  "benchmark_name": "blink_perf.html_to_dom"
}
Thanks, Ulan! Yes, this will show 3 charts.
Okay great, is this ready to land?
On 2016/08/15 16:13:18, esprehn wrote:
> Okay great, is this ready to land?

Yes. Could you please take a look?
lgtm
ulan@chromium.org changed reviewers: + nednguyen@google.com
Ned, could you please take a look at the tools/perf changes as an owner?

The main change is replacing the manual parsing of the blink_perf results lines with a regexp and extending the line format with an optional metric name.
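For illustration, here is a minimal sketch of the kind of regexp this refers to. It is not the actual code in this CL; the group names and the helper function are made up, and the real pattern in tools/perf may differ:

import re

# Accepts both the legacy results line and the extended form with an optional
# metric name, e.g.:
#   "values 27.05 28.10 ms"        (legacy: no metric name)
#   "values innerHTML 27.055 ms"   (extended: optional metric name)
#   "nodes innerHTML 19601"        (counts, no unit)
_RESULT_LINE_RE = re.compile(
    r'^(?P<key>values|nodes)'               # result type
    r'(?:\s+(?P<metric>[A-Za-z][\w-]*))?'   # optional metric name (the extension)
    r'(?P<samples>(?:\s+[\d.]+)+)'          # one or more numeric samples
    r'(?:\s+(?P<units>[A-Za-z/%]+))?\s*$')  # optional unit, e.g. "ms"

def parse_result_line(line):
  """Returns (key, metric name or None, samples, units or None), or None."""
  match = _RESULT_LINE_RE.match(line.strip())
  if not match:
    return None
  samples = [float(v) for v in match.group('samples').split()]
  return match.group('key'), match.group('metric'), samples, match.group('units')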
On 2016/09/01 08:27:45, ulan wrote:
> Ned, could you please take a look at the tools/perf changes as an owner?
>
> The main change is replacing the manual parsing of the blink_perf results lines
> with a regexp and extending the line format with an optional metric name.

esprehn@: what was the result of your syncing with Nat on this benchmark?

ulan@: can you do a few runs of this and let me know:
1) What is the total cycle time of the BlinkHTMLToDOM benchmark on Android?
2) What is the noise level of the metrics on desktop & mobile?
nduca@chromium.org changed reviewers: + haraken@chromium.org, nduca@chromium.org
why'd we move this from a tbmv2 benchmark to the blink_perf benchmark? I don't think we resolved the questions I raised, just moved the problem and in so doing kinda punted? I really think we need Elliot or someone seasoned with TBMv2 from Tokyo to provide direction and work with speed infra folks such as myself and Ned to come up with a vision here.
On 2016/09/01 at 22:07:46, nduca wrote:
> why'd we move this from a tbmv2 benchmark to the blink_perf benchmark? I don't think we resolved the questions I raised, just moved the problem and in so doing kinda punted? I really think we need Elliot or someone seasoned with TBMv2 from Tokyo to provide direction and work with speed infra folks such as myself and Ned to come up with a vision here.

I'd rather not block on tracing. The blink-style micro benchmarks are very valuable, and having a trace of what this script is doing is not super helpful: there are no trace points inside of what it's doing, and we can't add them without seriously regressing performance.
On 2016/09/01 22:10:20, esprehn wrote:
> On 2016/09/01 at 22:07:46, nduca wrote:
> > why'd we move this from a tbmv2 benchmark to the blink_perf benchmark? I don't
> > think we resolved the questions I raised, just moved the problem and in so doing
> > kinda punted? I really think we need Elliot or someone seasoned with TBMv2 from
> > Tokyo to provide direction and work with speed infra folks such as myself and
> > Ned to come up with a vision here.
>
> I'd rather not block on tracing. The blink-style micro benchmarks are very
> valuable, and having a trace of what this script is doing is not super helpful:
> there are no trace points inside of what it's doing, and we can't add them
> without seriously regressing performance.

The non-tracing-based benchmark is the technical debt we are trying to get rid of. I am not convinced that adding trace points will seriously regress performance, and even if it does, it won't be as bad compared to the timing done in JS.

On another note, a technique to avoid over-tracing for events that have high frequency is this:

[ U ]
[x] [x] [x] [x] [x]

Assuming [x] refers to a very short running function (like 0.001 ms short), to avoid blowing up the trace-event overhead, you can manually measure the wall-time/cpu-time of [x] in the parent trace event U and add these data to the trace event of U as extra args. So a trace event U would look something like:

{ title: "U", start: ..., end: ..., args: {X_count: 5, X_total_time: 0.005ms}}

The reason we insist on tracing is that we have infra that captures the bot's performance state and combines that with the Chrome trace, so we can debug cases where metrics regress because of bot environment changes. If we don't do this the tracing way, it will be hard for us to keep supporting this benchmark running stably in the long term.
This is not blocking on tracing. This is about the architecture team leads having a conversation with speed benchmarking experts about what this benchmark is, whether it's a one-off, what other content might fit in here in the future, and how we understand it relative to the many other benchmarks that teams create. Kentaro did a great job in the memory benchmarks and Kouhei in the loading-dev benchmarks finding a good middle ground between "moving quickly" and "long term," and I'd really like to understand why we can't do that here.

I wrote a pretty long email to you more than a month ago explaining ways we could move this forward that were reconciled with our vision, trying to help move this forward. I'm confused why we can't just do something great here...
There was this attempt at doing it with the tracing stuff: https://codereview.chromium.org/2119413003

- I don't want a WPR; that's not compatible with other browsers and is difficult to run and profile with Instruments.
- There's no way to run the benchmark in that issue against Safari and Firefox, which is critical here.

Ned tells me we can somehow use the python-based benchmark thing over there, but not use a WPR, and still retain a simple way to load the file off disk and see the benchmark run in other browsers. If that's true we can use the tracing benchmark system. He said he'll help next week. :)
Description was changed from ========== Add blink_perf.html_to_dom benchmark. For background and motivation see bugs. BUG=625986, 595492 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:android_s5_perf_cq;master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq ========== to ========== Add blink_perf.html_to_dom benchmark. For background and motivation see bugs. BUG=625986, 595492 ==========
Description was changed from ========== Add blink_perf.html_to_dom benchmark. For background and motivation see bugs. BUG=625986, 595492 ========== to ========== Add blink_perf.html_to_dom benchmark. For background and motivation see bugs. BUG=625986, 595492 CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.perf:android_s5_perf_cq;master.tryserver.chromium.perf:linux_perf_cq;master.tryserver.chromium.perf:mac_retina_perf_cq;master.tryserver.chromium.perf:winx64_10_perf_cq ==========
> 1) What is the total cycle time of the BlinkHTMLToDOM benchmark on Android?

About 7 seconds on Nexus5 and 3 seconds on desktop.

> 2) What is the noise level of the metrics on desktop & mobile?

3 runs on Nexus5:
blink_perf.html_to_dom:fake-bindings ms 95.62 ± 11.47%
blink_perf.html_to_dom:innerHTML ms 129.85 ± 8.83%
blink_perf.html_to_dom:real-bindings ms 255.73 ± 4.95%

3 runs on desktop:
blink_perf.html_to_dom:fake-bindings ms 26.52 ± 13.34%
blink_perf.html_to_dom:innerHTML ms 24.89 ± 13.62%
blink_perf.html_to_dom:real-bindings ms 48.47 ± 8.66%

> The non-tracing-based benchmark is the technical debt we are trying to get rid of.

Does it mean that the existing blink_perf, Octane, Kraken, and JetStream benchmarks are going to be converted to TBMv2 in the future?

> you can manually measure wall-time/cpu-time of [x] and add these data to the trace
> event of U as extra args. So a trace event U would look something like:
> { title: "U", start: ..., end: ..., args: {X_count: 5, X_total_time: 0.005ms}}

Then we would have to add a benchmark-specific metric that can handle X_count and X_total_time?
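A quick sketch of how the noise numbers above can be computed, assuming the ± figure is the sample standard deviation across runs expressed as a percentage of the mean (the per-run values in the example are made up, not the actual three runs):

import statistics

def noise_level(per_run_ms):
  # Mean and relative sample standard deviation (in percent) across runs.
  mean = statistics.mean(per_run_ms)
  return mean, 100.0 * statistics.stdev(per_run_ms) / mean

mean, noise = noise_level([85.1, 95.0, 106.7])  # hypothetical per-run results
print('%.2f ms +- %.2f%%' % (mean, noise))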
On 2016/09/02 14:24:13, ulan wrote:
> > 1) What is the total cycle time of the BlinkHTMLToDOM benchmark on Android?
>
> About 7 seconds on Nexus5 and 3 seconds on desktop.
>
> > 2) What is the noise level of the metrics on desktop & mobile?
>
> 3 runs on Nexus5:
> blink_perf.html_to_dom:fake-bindings ms 95.62 ± 11.47%
> blink_perf.html_to_dom:innerHTML ms 129.85 ± 8.83%
> blink_perf.html_to_dom:real-bindings ms 255.73 ± 4.95%
>
> 3 runs on desktop:
> blink_perf.html_to_dom:fake-bindings ms 26.52 ± 13.34%
> blink_perf.html_to_dom:innerHTML ms 24.89 ± 13.62%
> blink_perf.html_to_dom:real-bindings ms 48.47 ± 8.66%
>
> > The non-tracing-based benchmark is the technical debt we are trying to get rid of.
>
> Does it mean that the existing blink_perf, Octane, Kraken, and JetStream benchmarks
> are going to be converted to TBMv2 in the future?

Correct. I think we're going to use V8's runtime callstats metric for those.

> > you can manually measure wall-time/cpu-time of [x] and add these data to the trace
> > event of U as extra args. So a trace event U would look something like:
> > { title: "U", start: ..., end: ..., args: {X_count: 5, X_total_time: 0.005ms}}
>
> Then we would have to add a benchmark-specific metric that can handle X_count
> and X_total_time?

Correct, if X here is specific to bindings and is not covered by runtime callstats already.
On 2016/09/02 14:25:58, nednguyen wrote: > On 2016/09/02 14:24:13, ulan wrote: > > > 1) What is the total cycle time of BlinkHTMLToDOM benchmark on Android? > > About 7 seconds on Nexus5 and 3 seconds on desktop. > > > > > 2) What is the noise level of the metrics on desktop & mobile? > > 3 runs on Nexus5: > > blink_perf.html_to_dom:fake-bindings ms 95.62 ± 11.47% > > blink_perf.html_to_dom:innerHTML ms 129.85 ± 8.83% > > blink_perf.html_to_dom:real-bindings ms 255.73 ± 4.95% > > > > 3 runs on desktop: > > blink_perf.html_to_dom:fake-bindings ms 26.52 ± 13.34% > > blink_perf.html_to_dom:innerHTML ms 24.89 ± 13.62% > > blink_perf.html_to_dom:real-bindings ms 48.47 ± 8.66% > > > > > The non tracing based benchmark is the technical debt we are trying to get > rid > > off. > > Does it mean that the existing blink_perf, Octane, Kraken, JetStream > benchmarks > > are going to be converted to tbmv2 in future? > > Correct. I think we gonna use the v8's runtime callstats metric for those. I think the TBMv2 versions of the benchmarks should match the results of the existing non-tracing benchmarks. For example, http://chromium.github.io/octane computes scores for 17 line items and then combines the scores to get the total score. I would expect Octane TBMv2 to have 17 lines items and the total score too. Otherwise, we cannot remove the old Octane without losing coverage of what people outside Chrome team are measuring. Since the score computation is Octane specific and is not covered by v8 runtime stats, we will probably end up having Octane specific TBMv2 metric? > > Then we would have to add a benchmark specific metric that can handle X_count > > and X_total_time? > Correct, if X here is specific to binding and are not covered by runtime > callstats already. I would like to avoid benchmark specific metrics, because I like the idea that metrics are generic and can be run on any trace. Looks like we have to choose between the following options for a non-TBM benchmark: 1. Add TBMv2 version of the benchmark without the benchmark specific scoring. In this case the new benchmark would use Chrome specific metric (e.g. v8 runtime callstats), and we would go out of sync with the rest of the world, because they will continue to track the benchmark scores. I doubt that this option is feasible for V8 and Blink teams. 2. Add TBMv2 version of the benchmark with the benchmark specific scoring. This would lead to many custom metrics. The existing benchmark code will be split between perf/benchmark and catapult/tracing. 3. Keep supporting non-TBM benchmarks. What option do you think is the best? Is there another option I missed?
On 2016/09/05 12:43:33, ulan wrote:
> On 2016/09/02 14:25:58, nednguyen wrote:
> > On 2016/09/02 14:24:13, ulan wrote:
> > > > 1) What is the total cycle time of the BlinkHTMLToDOM benchmark on Android?
> > >
> > > About 7 seconds on Nexus5 and 3 seconds on desktop.
> > >
> > > > 2) What is the noise level of the metrics on desktop & mobile?
> > >
> > > 3 runs on Nexus5:
> > > blink_perf.html_to_dom:fake-bindings ms 95.62 ± 11.47%
> > > blink_perf.html_to_dom:innerHTML ms 129.85 ± 8.83%
> > > blink_perf.html_to_dom:real-bindings ms 255.73 ± 4.95%
> > >
> > > 3 runs on desktop:
> > > blink_perf.html_to_dom:fake-bindings ms 26.52 ± 13.34%
> > > blink_perf.html_to_dom:innerHTML ms 24.89 ± 13.62%
> > > blink_perf.html_to_dom:real-bindings ms 48.47 ± 8.66%
> > >
> > > > The non-tracing-based benchmark is the technical debt we are trying to get rid of.
> > >
> > > Does it mean that the existing blink_perf, Octane, Kraken, and JetStream benchmarks
> > > are going to be converted to TBMv2 in the future?
> >
> > Correct. I think we're going to use V8's runtime callstats metric for those.
>
> I think the TBMv2 versions of the benchmarks should match the results of the
> existing non-tracing benchmarks.
>
> For example, http://chromium.github.io/octane computes scores for 17 line items
> and then combines the scores to get the total score. I would expect Octane TBMv2
> to have the 17 line items and the total score too. Otherwise, we cannot remove
> the old Octane without losing coverage of what people outside the Chrome team
> are measuring.
>
> Since the score computation is Octane-specific and is not covered by the V8
> runtime stats, we will probably end up having an Octane-specific TBMv2 metric?
>
> > > Then we would have to add a benchmark-specific metric that can handle X_count
> > > and X_total_time?
> >
> > Correct, if X here is specific to bindings and is not covered by runtime
> > callstats already.
>
> I would like to avoid benchmark-specific metrics, because I like the idea that
> metrics are generic and can be run on any trace.
>
> Looks like we have to choose between the following options for a non-TBM benchmark:
>
> 1. Add a TBMv2 version of the benchmark without the benchmark-specific scoring.
>    In this case the new benchmark would use a Chrome-specific metric (e.g. V8
>    runtime callstats), and we would go out of sync with the rest of the world,
>    because they will continue to track the benchmark scores. I doubt that this
>    option is feasible for the V8 and Blink teams.
>
> 2. Add a TBMv2 version of the benchmark with the benchmark-specific scoring.
>    This would lead to many custom metrics. The existing benchmark code would be
>    split between perf/benchmark and catapult/tracing.
>
> 3. Keep supporting non-TBM benchmarks.
>
> Which option do you think is the best? Is there another option I missed?

There is also the option of supporting both benchmark-specific metrics & V8 metrics. The way I think we can architect it is as follows:

1) Create a TBMv2 metric that can parse special trace events that encode "benchmark-specific metrics" (like the binding benchmark's numbers in this case) and put them into TBMv2 values.

2) Add an API to action_runner that allows you to add benchmark-specific metrics parsed from the page. So you would write an html_to_dom story as follows:

class HTMLToDomPage(page.Page):
  def RunPageInteractions(self, action_runner):
    action_runner.WaitForJavaScriptExpression('testRunner.isDone')
    # ... parse the metrics output by the page ...
    action_runner.AddStorySpecificMetric(name='...', value=...)
    # ^ uses the performance.mark or performance.measure APIs to put those
    #   values into the Chrome trace.

Then with this approach, we can enable both the benchmark's specific metrics & other V8 metrics.
On 2016/09/02 at 00:14:36, esprehn wrote:
> There was this attempt at doing it with the tracing stuff:
> https://codereview.chromium.org/2119413003
>
> - I don't want a WPR; that's not compatible with other browsers and is difficult to run and profile with Instruments.
> - There's no way to run the benchmark in that issue against Safari and Firefox, which is critical here.

I think we need to step back on the requirements you have here for this benchmark. Elliot, my understanding is that your primary goal is to speed up bindings and track that performance over time. I understand that you also have a secondary goal of a comparative benchmark. However, we have had numerous conversations about comparative benchmarking over the years and in every case concluded that focusing on "chrome / chrome" competition, i.e. us competing with ourselves, should always be our primary goal. The main time to do cross-browser benchmarking is when we seek to shift a vendor that has resisted shifting via other means.

If you feel that bindings needs cross-browser benchmarking for this use case, then I think you need to write something into a doc explaining your rationale. Then we can review it with Dimitri and see if this indeed does make sense in this case. Right now, I'm not sold on that particular step in your chain of requirements, so everything else in my head is a bit stuck.
On 2016/09/22 at 15:24:36, nduca wrote:
> On 2016/09/02 at 00:14:36, esprehn wrote:
> > There was this attempt at doing it with the tracing stuff:
> > https://codereview.chromium.org/2119413003
> >
> > - I don't want a WPR; that's not compatible with other browsers and is difficult to run and profile with Instruments.
> > - There's no way to run the benchmark in that issue against Safari and Firefox, which is critical here.
>
> I think we need to step back on the requirements you have here for this benchmark. Elliot, my understanding is that your primary goal is to speed up bindings and track that performance over time. I understand that you also have a secondary goal of a comparative benchmark. However, we have had numerous conversations about comparative benchmarking over the years and in every case concluded that focusing on "chrome / chrome" competition, i.e. us competing with ourselves, should always be our primary goal.

I don't agree, and I don't think others do either. See the massive gains made from Animometer.

> The main time to do cross-browser benchmarking is when we seek to shift a vendor that has resisted shifting via other means.

I don't think that's true.

> If you feel that bindings needs cross-browser benchmarking for this use case, then I think you need to write something into a doc explaining your rationale. Then we can review it with Dimitri and see if this indeed does make sense in this case. Right now, I'm not sold on that particular step in your chain of requirements, so everything else in my head is a bit stuck.

You've been blocking progress now for months, which is not acceptable; I want to land this benchmark for tracking our own progress. We need to move on here and start adding more benchmarks, for example ones for postMessage. jbroman@ is landing lots of code for that now. Cross-browser testing there is also critical: we were so slow before, but how does jbroman@ know when he's done optimizing if he can't tell when we're as fast as or faster than Safari?

I'm happy to meet about this, but I'd respectfully ask that you allow my team to go forward landing benchmarks for our own tracking. You can ignore these benchmarks, that's fine, but we need them. :)
I'm sorry you feel blocked. I think I've raised pretty good questions, hoping that maybe you'd kick off a design doc or a VC so we could all get on the same page. That's what happened when, for instance, we raised good questions about page cycler v1: Kouhei stepped up, booked a VC, we all had a great chat and aired a ton of stuff, and out of that emerged a great doc for pcv2. Nobody got blocked, and everyone walked away happy. Is the mistake here that I didn't book a VC for us to all work through this?

If you think I dropped the ball here, then I'm super sorry for any offense caused, and I'd ask you to hear my perspective: Annie, Ned and I handle about 30 different urgent benchmarking design problems a quarter from different parts of Chrome. That's a lot in aggregate, and each one is unique, so we often rely on the authors of the CLs or their TLs to take the initiative to work through the issues we raise and get to consensus with us. Usually the authors assume that we've got something valid to say, because we do a ton of benchmarking as our day-to-day job, and want to talk to us to get advice, so it doesn't usually end up in a situation of frustration like this one.

What I see here is that we've all got our point of view, and for some reason our points of view have stayed separated for long enough that everyone feels frustrated. We are frustrated here, too, because we like to help people and we don't yet understand how you've reached the conclusions you've reached. We'd like to understand.

But, if it does help de-escalate the situation, I could rubberstamp this patch. That seems like something you want, and what I want more than anything is to get to a happy place. In parallel, I will book a VC for mid next week so we can chat. I am in Europe then, so I will have to make it a relatively early meeting compared to our usual chats. :)

Best,
- Nat
Nat, I am sorry that I did not schedule a VC much earlier. Could you please add me to the upcoming meeting?

In the meantime I would appreciate it if you could rubber-stamp this CL. Please note that it is using the _existing_ blink_perf framework and thus is not adding new problems; there are already dozens of blink_perf benchmarks. If the existing benchmarks are converted to TBMv2, then converting this benchmark by following the template will be trivial (I can do that part).

However, I doubt that there is a clean solution for converting JS-based benchmarks to TBMv2: see my concerns above about the proliferation of benchmark-specific TBMv2 metrics to compute scores for Octane, Kraken, blink_perf, etc. Converting to TBMv2 would also make it harder to compare results between browsers, which is critical for these benchmarks. With that in mind I don't see great value in converting these benchmarks to TBMv2.