Created: 3 years, 6 months ago by hans
Modified: 3 years, 5 months ago
Reviewers: jochen (gone - plz use gerrit), Michael Starzinger, Michael Lippautz, rmcilroy, Nico, vogelheim, marja
CC: Hannes Payer (out of office), marja+watch_chromium.org, v8-reviews_googlegroups.com, Yang
Target Ref: refs/heads/master
Project: v8
Visibility: Public.
Description
Make some functions that are hit during renderer startup available for inlining
This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.
The binary size of chrome_child.dll increases by 2KB with this.
BUG=chromium:728324
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_compile_dbg_ng;master.tryserver.chromium.mac:mac_chromium_compile_dbg_ng
Review-Url: https://codereview.chromium.org/2950993002
Cr-Commit-Position: refs/heads/master@{#46229}
Committed: https://chromium.googlesource.com/v8/v8/+/777da354d20286d39048f1421d89fa109e38b9e1
Patch Set 1 #
Total comments: 2
Patch Set 2 : don't expose bytecode-traits.h in bytecodes.h #
Total comments: 3
Patch Set 3 : rebase #
Messages
Total messages: 66 (22 generated)
hans@chromium.org changed reviewers: + mstarzinger@chromium.org, vogelheim@chromium.org
Please take a look.
The CQ bit was checked by hans@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
rmcilroy@chromium.org changed reviewers: + rmcilroy@chromium.org
https://codereview.chromium.org/2950993002/diff/1/src/interpreter/bytecodes.h
File src/interpreter/bytecodes.h (right):
https://codereview.chromium.org/2950993002/diff/1/src/interpreter/bytecodes.h...
src/interpreter/bytecodes.h:14: #include "src/interpreter/bytecode-traits.h"
I'd prefer not to expose bytecode-traits in bytecodes.h - bytecode-traits is an icky internal file and bytecodes.h is one of the headers that is used outside of interpreter/. Could you instead make kOperandSizes a static const field initialized at the top of bytecode.cc and just inline the reading of that field here?
https://codereview.chromium.org/2950993002/diff/1/src/interpreter/bytecodes.h
File src/interpreter/bytecodes.h (right):
https://codereview.chromium.org/2950993002/diff/1/src/interpreter/bytecodes.h...
src/interpreter/bytecodes.h:14: #include "src/interpreter/bytecode-traits.h"
On 2017/06/20 23:20:32, rmcilroy wrote:
> I'd prefer not to expose bytecode-traits in bytecodes.h - bytecode-traits is an
> icky internal file and bytecodes.h is one of the headers that is used outside of
> interpreter/. Could you instead make kOperandSizes a static const field
> initialized at the top of bytecode.cc and just inline the reading of that field
> here?

Sounds good to me. There already is a kOperandSizes, so I called it kOperandKindSizes.
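The table-behind-an-inline-accessor pattern rmcilroy suggests can be sketched as follows. This is a hypothetical, heavily simplified reconstruction, not V8's actual code: the enum values, the sizes, and the class layout here are illustrative only.

```cpp
// --- bytecodes.h (sketch) ---
#include <cstdint>

// Illustrative operand kinds; V8's real set is generated via macros.
enum class OperandKind : uint8_t { kByte, kShort, kQuad };

class Bytecodes {
 public:
  // Inline accessor: call sites compile down to a single table load,
  // without the header ever including the internal bytecode-traits.h.
  static int SizeOfOperandKind(OperandKind kind) {
    return kOperandKindSizes[static_cast<int>(kind)];
  }

 private:
  // Declared here, defined (and in the real CL, computed from the
  // traits machinery) in the .cc file.
  static const int kOperandKindSizes[];
};

// --- bytecodes.cc (sketch) ---
// This is where bytecode-traits.h would be included to populate the
// table; here the sizes are simply spelled out.
const int Bytecodes::kOperandKindSizes[] = {1, 2, 4};
```

The effect is the one discussed in the thread: the "icky" traits header stays private to the .cc file, while callers of the accessor avoid the overhead of an out-of-line function call.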
vogelheim@chromium.org changed reviewers: + jochen@chromium.org
Generally, this looks good, and 2kB code size certainly looks like an acceptable trade-off.

The main concern I have is code health: We already have a huge amount of code in our headers and have done several clean-ups to get our header size down, and this will increase it even further. @Jochen: Do you have an opinion?

Pro:
- Hans diligently measured this, and this has a provable positive impact on renderer start-up, for a very modest code size penalty.

Con:
- Goes against our current ideas about code health.
- Seems to be for a temporary problem (note: "(where LTO isn't ready on Windows yet)" in the CL description), and I think it's safe to say that no-one will remove this code once the actual reason for it (lack of link-time optimization on this compiler & platform) has been rectified.

https://codereview.chromium.org/2950993002/diff/20001/src/heap/objects-visiti...
File src/heap/objects-visiting.h (right):
https://codereview.chromium.org/2950993002/diff/20001/src/heap/objects-visiti...
src/heap/objects-visiting.h:89: static inline VisitorId GetVisitorId(int instance_type, int instance_size,
[Here & elsewhere] Could we use the V8_INLINE macro?
what's the expected difference in startup from this change?

https://codereview.chromium.org/2950993002/diff/20001/src/heap/objects-visiti...
File src/heap/objects-visiting.h (right):
https://codereview.chromium.org/2950993002/diff/20001/src/heap/objects-visiti...
src/heap/objects-visiting.h:89: static inline VisitorId GetVisitorId(int instance_type, int instance_size,
On 2017/06/21 at 10:18:02, vogelheim wrote:
> [Here & elsewhere] Could we use the V8_INLINE macro?

V8_INLINE expands to __attribute__((always_inline)) - that might be acceptable, but we should re-measure the impact then.
Re code health: I looked at v8's slow build time a while ago, and back then the main issue was that some basic v8 header (and thereby every cc file in v8) included many c++ standard headers. This cl doesn't affect that property (it didn't add many includes in headers iirc -- and even if it did, it doesn't add stdlib includes) (but phone).

On 2017/06/21 10:24:21, jochen wrote:
> what's the expected difference in startup from this change?

We don't know. We do know that LTO builds of chrome are faster (but also unusably large), and these are some of the functions in v8 called most during renderer startup that also happen to be inlined with lto. We hope to land this and then look at UMA to see what changed. (See bug.) If it doesn't help, we can revert this again.

> https://codereview.chromium.org/2950993002/diff/20001/src/heap/objects-visiti...
> File src/heap/objects-visiting.h (right):
>
> https://codereview.chromium.org/2950993002/diff/20001/src/heap/objects-visiti...
> src/heap/objects-visiting.h:89: static inline VisitorId GetVisitorId(int
> instance_type, int instance_size,
> On 2017/06/21 at 10:18:02, vogelheim wrote:
> > [Here & elsewhere] Could we use the V8_INLINE macro?
>
> V8_INLINE expands to __attribute__((always_inline)) - that might be acceptable,
> but we should re-meassure the impact then

I think we should let the compiler decide if inlining is worth it.
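For context on the V8_INLINE exchange above, a force-inline macro of that shape typically looks roughly like the sketch below. The MY_V8_INLINE name and the exact #if chain are illustrative reconstructions, not V8's actual definition; the point is the contrast with plain `inline`, which (as Nico argues) leaves the decision to the compiler's heuristics.

```cpp
// Hypothetical reconstruction of a V8_INLINE-style macro. Plain `inline`
// is only a hint; always_inline / __forceinline make the compiler inline
// the function even when its inlining budget would otherwise decline.
#if defined(__GNUC__) || defined(__clang__)
#define MY_V8_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#define MY_V8_INLINE __forceinline
#else
#define MY_V8_INLINE inline
#endif

// Example: a tiny function where the call overhead would rival the body,
// the kind of candidate this CL targets.
MY_V8_INLINE int SquareForInlining(int x) { return x * x; }
```

This is the trade-off being weighed: forcing inlining guarantees the call disappears but bypasses the compiler's cost model, whereas the CL's `static inline` approach merely makes the body available and lets each call site be decided individually.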
Can you quantify the impact of inlining the Scavenger slow path? ScavengeObjectSlow should result in proper visitation and copying which should both dominate the call here.
On 2017/06/20 23:49:08, hans wrote: > https://codereview.chromium.org/2950993002/diff/1/src/interpreter/bytecodes.h > File src/interpreter/bytecodes.h (right): > > https://codereview.chromium.org/2950993002/diff/1/src/interpreter/bytecodes.h... > src/interpreter/bytecodes.h:14: #include "src/interpreter/bytecode-traits.h" > On 2017/06/20 23:20:32, rmcilroy wrote: > > I'd prefer not to expose bytecode-traits in bytecodes.h - bytecode-traits is > an > > icky internal file and bytecodes.h is one of the headers that is used outside > of > > interpreter/. Could you instead make kOperandSizes a static const field > > initialized at the top of bytecode.cc and just inline the reading of that > field > > here? > > Sounds good to me. There already is a kOperandSizes, so I called it > kOperandKindSizes. Looks good, thanks. I'll let other reviewers approve for the overall CL.
marja@chromium.org changed reviewers: + marja@chromium.org
Q1: How much impact does this CL have?

Q2 (more meta): If de-inlining the ast part (see comment below) is important, how come no benchmark tanked when I did the opposite change? Are we missing benchmarks?

https://codereview.chromium.org/2950993002/diff/20001/src/ast/scopes.h
File src/ast/scopes.h (right):
https://codereview.chromium.org/2950993002/diff/20001/src/ast/scopes.h#newcode8
src/ast/scopes.h:8: #include "src/ast/ast.h"
The changes in this file are exactly reversing the ones I made a while back...
Ah, noticed thakis's answer. The current problem with build times we're trying to address is the "I touch something trivial in v8 -> I need to rebuild the whole v8" problem, not high build times as such. For fixing that, it's necessary to make the #includes not be out of control as they are now. (Ofc if we solved the C++ header thing thakis is mentioning, maybe rebuilding all of v8 would be less of a problem?) And I noticed you already answered the question about the impact and promised to report back and potentially revert if this was not relevant. So ast lgtm, you'll still need owners reviews from others.
parsing/ lgtm too
I realize this is trading a bit of build time and header complexity against performance. In a future where we're using LTO on all platforms, this won't be necessary. In a way, this is a temporary problem yes, but it's a step towards getting to using Clang on Windows, which is a step towards Clang and LTO everywhere.

In terms of expected impact, we don't know exactly yet because we're targeting a UMA metric rather than a local benchmark. What we do know is the number of times these functions are called during render start and NTP bring-up, and that they do get inlined in LTO builds. Many of them also look like obviously good candidates based on their source.

> Can you quantify the impact of inlining the Scavenger slow path?
>
> ScavengeObjectSlow should result in proper visitation and copying which should both dominate the call here.

It was called 80931 times in my measurement, and it gets inlined (at least in some cases) in an LTO build. Even if the body dominates the call, the call still takes time, and we're going after a "death by a thousand cuts" situation here.

I think Nico addressed most comments already. Is there anything I missed? We'd like this to go in the next dev release so we can check back on that UMA metric :-)
vogelheim: owner ping for snapshot/ mstarzinger: owner ping for heap/ and interpreter/ As I mentioned before, we're keen to get this in for the next Dev release.
mstarzinger@chromium.org changed reviewers: + mlippautz@chromium.org
Sorry for delay, wasn't aware I was a required reviewer on this. LGTM on "interpreter", for "heap" I'll defer to Michael Lippautz.
Are you solely evaluating based on UMA call counts here? Call counts do not necessarily relate to execution times. You might push the compiler over some inlining budget (because the current code base is already too aggressive with inlining) and regress some other metric.

How are you determining the impact of this CL based on UMA? There are dozens of CLs pushed every day that change inlining behavior. Even within a busy roll it could already be hard to nail this down, not even talking about a whole Canary release.
On 2017/06/22 15:55:48, Michael Lippautz wrote:
> Are you solely evaluating based on UMA call counts here? Call counts do not
> necessarily related to execution times. You might push the compiler over some
> inlining budget (because the current code base is already too aggressive with
> inlining) and regress some other metric.

The call counts are from a local instrumented build. These functions were found based on those counts, combined with the information about what functions are inlined in LTO builds. We're trying to analyze what's different between the MSVC and Clang build, and the data points to these functions.

We use the UMA metric to evaluate (NewTabPage.Loadtime) if this works or not. Since the binary size was small, I don't expect I drastically changed the inlining decisions.

> How are you determining the impact of this CL based on UMA? There are dozens of
> CLs pushed every day that change inlining behavior. Even within a busy roll it
> could already be hard to nail this down, not even talking about a whole Canary
> release.

We're looking at the difference between MSVC and Clang builds on the UMA metric, and that doesn't seem to move much between releases.

> (vogelheim wrote, but it didn't go to rietveld for some reason)
> I'm still not particularly happy, since this trades a permanent maintenance
> burden vs a temporary, potential performance gain.

I understand your concern, but I really don't think these are too bad. Looking at the functions, most of them are obvious inlining wins. For example, inlining SizeOfOperand should result in a single load, which is significantly smaller than the overhead of doing the call. I think until the use of LTO is more wide-spread, this is something performance-focused C++ code has to live with.

> How much does it gain when trying this locally? I understand you're "targeting
> a UMA metric", but surely you've measured this locally at least once, on some
> machine, right?

I see 2 ms drop of the median time to refresh the NTP (50 samples), but it's pretty noisy so it's hard to tell (also I'm not sure if some things might be cached during the reload).

> What is the 'success' criterion that would determine whether this stays in or
> not?

That it shrinks the difference between MSVC and Clang builds on NewTabPage.Loadtime in a measurable way. If we get 10ms from this and the inlinings we've landed in chrome, we'll be very happy.
I am fine with inlining in the Scavenger in this CL.

The approach in general does not scale though. We cannot just prematurely optimize based on the assumption that it will move the needle on UMA. What if UMA does not improve? Are you going to revert this or are the V8 devs going to be stuck with this change?

I have been looking at NewTabPage.LoadTime for Win dev now. We are in the range of 1100ms. 2ms is roughly 1 permille on this metric. Where can I see the clang build on UMA?

How do the 2KB relate to the overall size?

On 2017/06/23 03:23:07, hans wrote:
> On 2017/06/22 15:55:48, Michael Lippautz wrote:
> > Are you solely evaluating based on UMA call counts here? Call counts do not
> > necessarily related to execution times. You might push the compiler over some
> > inlining budget (because the current code base is already too aggressive with
> > inlining) and regress some other metric.
>
> The call counts are from a local instrumented build. These functions were found
> based on those counts, combined with the information about what functions are
> inlined in LTO builds. We're trying to analyze what's different between the MSVC
> and Clang build, and the data points to these functions.
>
> We use the UMA metric to evaluate (NewTabPage.Loadtime) if this works or not.
> Since the binary size was small, I don't expect I drastically changed the
> inlining decisions.
>
> > How are you determining the impact of this CL based on UMA? There are dozens of
> > CLs pushed every day that change inlining behavior. Even within a busy roll it
> > could already be hard to nail this down, not even talking about a whole Canary
> > release.
>
> We're looking at the difference between MSVC and Clang builds on the UMA metric,
> and that doesn't seem to move much between releases.
>
> > (vogelheim wrote, but it didn't go to rietveld for some reason)
> > I'm still not particularly happy, since this trades a permanent maintenance
> > burden vs a temporary, potential performance gain.
>
> I understand your concern, but I really don't think these are too bad. Looking
> at the functions, most of them are obvious inlining wins. For example, inlining
> SizeOfOperand should result in a single load, which is significantly smaller
> than the overhead of doing the call. I think until the use of LTO is more
> wide-spread, this is something performance-focused C++ code has to live with.
>
> > How much does it gain when trying this locally? I understand you're "targeting
> > a UMA metric", but surely you've measured this locally at least once, on some
> > machine, right?
>
> I see 2 ms drop of the median time to refresh the NTP (50 samples), but it's
> pretty noisy so it's hard to tell (also I'm not sure if some things might be
> cached during the reload).
>
> > What is the 'success' criterion that would determine whether this stays in or
> > not?
>
> That it shrinks the difference between MSVC and Clang builds on
> NewTabPage.Loadtime in a measureable way. If we get 10ms from this and the
> inlinings we've landed in chrome, we'll be very happy.
On 2017/06/23 09:15:01, Michael Lippautz wrote: > I am fine with inlining in the Scavenger in this CL. > > The approach in general does not scale though. We cannot just prematurely > optimize based on the assumption that it will move the needle on UMA. What if > UMA does not improve? Are you going to revert this or are the V8 devs going to > be stuck with this change? > > I have been looking at NewTabPage.LoadTime for Win dev now. We are in the range > of 1100ms. 2ms is roughly 1 permille on this metric. Where can is see the clang > build on UMA? > > How are the 2KB relating to the overall size? > > On 2017/06/23 03:23:07, hans wrote: > > On 2017/06/22 15:55:48, Michael Lippautz wrote: > > > Are you solely evaluating based on UMA call counts here? Call counts do not > > > necessarily related to execution times. You might push the compiler over > some > > > inlining budget (because the current code base is already too aggressive > with > > > inlining) and regress some other metric. > > > > The call counts are from a local instrumented build. These functions were > found > > based on those counts, combined with the information about what functions are > > inlined in LTO builds. We're trying to analyze what's different between the > MSVC > > and Clang build, and the data points to these functions. > > > > We use the UMA metric to evaluate (NewTabPage.Loadtime) if this works or not. > > Since the binary size was small, I don't expect I drastically changed the > > inlining decisions. > > > > > How are you determining the impact of this CL based on UMA? There are dozens > > of > > > CLs pushed every day that change inlining behavior. Even within a busy roll > it > > > could already be hard to nail this down, not even talking about a whole > Canary > > > release. > > > > We're looking at the difference between MSVC and Clang builds on the UMA > metric, > > and that doesn't seem to move much between releases. 
> > > > > (vogelheim wrote, but it didn't go to rietveld for some reason) > > > I'm still not particularly happy, since this trades a permanent maintenance > > burden vs a temporary, potential performance gain. > > > > I understand your concern, but I really don't think these are too bad. Looking > > at the functions, most of them are obvious inlining wins. For example, > inlining > > SizeOfOperand should result in a single load, which is significantly smaller > > than the overhead of doing the call. I think until the use of LTO is more > > wide-spread, this is something performance-focused C++ code has to live with. > > > > > How much does it gain when trying this locally? I understand you're > "targeting > > a UMA metric", but surely you've measured this locally at least once, on some > > machine, right? > > > > I see 2 ms drop of the median time to refresh the NTP (50 samples), but it's > > pretty noisy so it's hard to tell (also I'm not sure if some things might be > > cached during the reload). > > > > > What is the 'success' criterion that would determine whether this stays in > or > > not? > > > > That it shrinks the difference between MSVC and Clang builds on > > NewTabPage.Loadtime in a measureable way. If we get 10ms from this and the > > inlinings we've landed in chrome, we'll be very happy. ... which brings us back to one of my previous questions: would it be possible to see this effect on a smaller, less noisy benchmark? Don't we run the relevant benchmarks on Win, or why haven't we seen any regressions?
On 2017/06/23 09:15:01, Michael Lippautz wrote:
> I am fine with inlining in the Scavenger in this CL.
>
> The approach in general does not scale though. We cannot just prematurely
> optimize based on the assumption that it will move the needle on UMA. What if
> UMA does not improve? Are you going to revert this or are the V8 devs going to
> be stuck with this change?

If it doesn't improve at all, I will revert if you prefer.

> I have been looking at NewTabPage.LoadTime for Win dev now. We are in the range
> of 1100ms. 2ms is roughly 1 permille on this metric. Where can is see the clang
> build on UMA?

Most of the 1100ms is network time. The easiest way to see the Clang numbers is to look at the reports sent to g/lexan; the latest one is here: https://groups.google.com/a/google.com/d/msg/lexan/84o6mNvbSEs/ZPsQP6VsDQAJ

What we're looking at is the difference between MSVC and Clang builds; that is 62ms on the last build, so 2ms is 3.2% of that.

> How are the 2KB relating to the overall size?

chrome_child.dll is in the 50MB ball-park, so 2KB is about 0.0039%.
On 2017/06/23 11:04:22, marja wrote: > > > That it shrinks the difference between MSVC and Clang builds on > > > NewTabPage.Loadtime in a measureable way. If we get 10ms from this and the > > > inlinings we've landed in chrome, we'll be very happy. > > ... which brings us back to one of my previous questions: would it be possible > to see this effect on a smaller, less noisy benchmark? Don't we run the relevant > benchmarks on Win, or why haven't we seen any regressions? I'm not aware of one, but I'm also not very familiar with V8's benchmarks. The NTP is not a Javascript-heavy web page; we don't think we're spending a lot of time in V8 in absolute terms, but it accounts for the majority of the CPU time when loading the page. And on Windows, I expect all benchmarks are built with MSVC using LTO which already inlines these functions.
mlippautz, vogelheim: I believe I have approval from the others. If there is more I can do to address your concerns, please let me know.
thakis@chromium.org changed reviewers: + thakis@chromium.org
v8 folks: As said above, if this doesn't help, we'll gladly revert this. If it does help, it'll improve the life of people using chrome on any platform (including android), somewhere between 1-2 billion humans. I'd appreciate it if we could land this, get numbers from UMA, and then revert this again if it doesn't help. From a code health PoV, only the change in ast.h seems dubious since it adds an include. If you feel very strongly about that file, we can remove that change from this CL.
lgtm for what it's worth The approach itself does not look good to me and I will not approve of any future CLs that have non-trivial code changes.
On 2017/06/23 16:05:31, Michael Lippautz wrote: > lgtm for what it's worth Thanks for the lg. > The approach itself does not look good to me and I will not approve of any > future CLs that have non-trivial code changes. I'd like to understand what you dislike about the approach. We ran benchmarks, looked at differences in code, came up with a data-supported hypothesis and now we're doing an experiment to confirm our hypothesis. To me, this seems like a very reasonable approach. What is your problem with it?
On 2017/06/23 16:09:57, Nico wrote:
> On 2017/06/23 16:05:31, Michael Lippautz wrote:
> > lgtm for what it's worth
>
> Thanks for the lg.
>
> > The approach itself does not look good to me and I will not approve of any
> > future CLs that have non-trivial code changes.
>
> I'd like to understand what you dislike about the approach. We ran benchmarks,
> looked at differences in code, came up with a data-supported hypothesis and now
> we're doing an experiment to confirm our hypothesis. To me, this seems like a
> very reasonable approach. What is your problem with it?

In essence, I'd like to see the data (you retrieved locally) on a benchmark that is accessible to *all* of us somewhere in an automated fashion. Otherwise nothing is preventing us from unintentionally regressing during refactorings or architectural changes -- which there will be some in the future. Without a graph or alerts, we will regress at some point and unless somebody repeats the experiment it will go by unnoticed.

Now, in particular for this change, I would expect that we actually have this coverage in another form. All microbenchmarks execute a ton of scavenges and we should see an improvement on single items on Octane (e.g. typescript) on non-LTO builds if that change had any impact. That's where we should look first. If it doesn't have any impact there, then there is no way that it will have noticeable impact when loading the NTP. UMA is nice for a global view but if I understood correctly you only have the MSVC roll there and you cannot compare against clang. So you end up looking at the page load time numbers for MSVC and I honestly cannot see you pinpoint this CL to a regression *or* improvement there. I would be happy if you could prove me wrong and show me the impact of this particular CL on the metric.
One more thing: I just saw that we ship the clang build and there should be a filter for it somewhere I guess? What am I not seeing?
vogelheim, I think we need your approval too.

On 2017/06/23 16:38:57, Michael Lippautz wrote:
> On 2017/06/23 16:09:57, Nico wrote:
> > On 2017/06/23 16:05:31, Michael Lippautz wrote:
> > > lgtm for what it's worth
> >
> > Thanks for the lg.
> >
> > > The approach itself does not look good to me and I will not approve of any
> > > future CLs that have non-trivial code changes.
> >
> > I'd like to understand what you dislike about the approach. We ran benchmarks,
> > looked at differences in code, came up with a data-supported hypothesis and now
> > we're doing an experiment to confirm our hypothesis. To me, this seems like a
> > very reasonable approach. What is your problem with it?
>
> In essence, I'd like to see the data (you retrieved locally) on a benchmark that
> is accessible to *all* of us somewhere in an automated fashion. Otherwise
> nothing is preventing us from unintentionally regressing during refactorings or
> architectural changes -- which there will be some in the future. Without graph
> or alerts, we will regress at some point and unless somebody repeats the
> experiment it will go by unnoticed.

Sure, if the experiment shows that this helps, then we'll look into making something like that. But most of these ideas don't work out, and building these benchmarks takes time. So doing them only for experiments that do work out saves lots of unnecessary busywork.

> Now, in particular for this change, I would expect that we actually have this
> coverage in another form. All microbenchmarks execute a ton of scavenges and we
> should see an improvement on single items on Octane (e.g. typescript) on non-LTO
> builds if that change had any impact. That's where we should look first. If it
> doesn't have any impact there, then there is no way that it will have noticable
> impact when loading the NTP. UMA is nice for a global view but if I understood
> correctly you only have the MSVC roll there and you cannot compare against
> clang. So you end up looking at the page load time numbers MSCV and I honestly
> cannot see you pinpoint this CL to a regression *or* improvement there. I would
> be happy if you could prove me wrong and show me the impact of this particular
> CL on the metric.

We ship both a clang and an MSVC build to the chrome/win dev channel (50% each).
With mlippautz's help, I ran Typescript-octane2.1 locally; two runs (each took about 5 minutes) with and without this patch (64-bit linux clang build):

hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ sudo /work/v8/cpu.sh fast
[sudo] password for hwennborg:
Setting CPU frequency governor to "performance"
hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ /work/v8/out/release/d8 run-some.js -- typescript
Typescript-octane2.1(Score): 430
hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ /work/v8/out/release/d8 run-some.js -- typescript
Typescript-octane2.1(Score): 435
hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ echo without inlining...
without inlining...
hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ /work/v8/out/release/d8 run-some.js -- typescript
Typescript-octane2.1(Score): 423
hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ /work/v8/out/release/d8 run-some.js -- typescript
Typescript-octane2.1(Score): 428

The scores with inlining are higher, which speaks in favour of this patch, but I suppose I don't know what the expected noise level is.

I'll rebase the patch and send perf tryjobs too, but IIUC those can be pretty noisy.
On 2017/06/23 18:06:37, hans wrote: > With mlippautz's help, I ran Typescript-octane2.1 locally; two runs (each took > about 5 minutes) with and without this patch (64-bit linux clang build): > > hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ sudo /work/v8/cpu.sh > fast > [sudo] password for hwennborg: > Setting CPU frequency governor to "performance" > hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ > /work/v8/out/release/d8 run-some.js -- typescript > Typescript-octane2.1(Score): 430 > hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ > /work/v8/out/release/d8 run-some.js -- typescript > Typescript-octane2.1(Score): 435 > hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ echo without > inlining... > without inlining... > hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ > /work/v8/out/release/d8 run-some.js -- typescript > Typescript-octane2.1(Score): 423 > hwennborg@hwennborg2:/work/v8/v8-perf/benchmarks/Octane2.1$ > /work/v8/out/release/d8 run-some.js -- typescript > Typescript-octane2.1(Score): 428 > > > The scores with inlining are higher, which speaks in favour of this patch, but I > suppose I don't know what the expected noise level is. > > I'll rebase the patch and send perf tryjobs too, but IIUC those can be pretty > noisy. If I read the numbers on the perfbot correctly (typescript, v8_linux64_perf_try), the patch is measured as ~2.5% slower on average. But the differences between patch/non-patch is less than the differences between runs, so this is pretty much all noise. As an aside, the Typescript scores on the perfbot are in the >20k range, and even on my Chromebox it's 22k, while you got values of ~430. I do wonder what exactly you're measuring there.
lgtm

The "lgtm" is more due to attrition - I got pinged to "stamp the CL" Friday night, while at home - than to the change actually looking good to me.

In particular
- I share Michael's concerns in full,
- it seems the expected best-case result for the V8-side changes is unlikely to exceed the measurement noise,
- the main motivating benchmark - closing the gap between MSVC and clang builds - is a vanity benchmark without user relevance, and for the user-relevant benchmark (empty tab page) we expect only ~0.2%.
On 2017/06/23 20:09:48, vogelheim wrote:
> lgtm

Thanks!

> The "lgtm" is more due to attrition - I got pinged to "stamp the CL" Friday
> night, while at home - than to the change actually looking good to me.

I'd recommend not to lg CLs because of "attrition" though. That doesn't lead to productive long-term interactions, and we are definitely interested in having those going forward. I'd also recommend to not look at gtalk while at home :-)

I figured if you're still around, and since you said "basically ok" 2.5 days ago, I'd give pinging you a shot. I do want to point out that this CL has been in review for 3 days now.

> In particular
> - I share Michael's concerns in full,

I tried addressing the concern above in comment 34. Was that not convincing?

> - it seems the expected best-case result for the V8-side changes is unlikely to
> exceed the measurement noise,

Then we'll revert this.

> - the main motivating benchmark - closing the gap between MSVC and clang builds
> - is a vanity benchmark without user relevance,

Not so! If this helps, it will help on all platforms that use clang – which is Android, Mac, and Linux, in addition to chrome/win/clang. And closing this gap is our last blocker for switching chrome/win to clang.
On 2017/06/23 20:00:11, vogelheim wrote:
> If I read the numbers on the perfbot correctly (typescript,
> v8_linux64_perf_try), the patch is measured as ~2.5% slower on average. But the
> differences between patch/non-patch is less than the differences between runs,
> so this is pretty much all noise.

Right. I ran tryjobs for a null-change (https://codereview.chromium.org/2958593002) which improved the TypeScript part of Octane by 2.4%, so I suppose the noise level is pretty high here :-)

> As an aside, the Typescript scores on the perfbot are in the >20k range, and
> even on my Chromebox it's 22k, while you got values of ~430. I do wonder what
> exactly you're measuring there.

Huh, this is my out/release/args.gn:

is_debug = false
target_cpu = "x64"
v8_target_cpu = "arm64"
use_goma = true

I thought I was doing default release builds, but that "arm64" doesn't look right :-( Let's replace that with "x64". Sorry about that; this stuff is not so easy for a project outsider.

New runs:

$ for i in `seq 1 50`; do /work/v8/out/release/d8.inline run-some.js -- typescript ; done
Typescript-octane2.1(Score): 37568
Typescript-octane2.1(Score): 36223
Typescript-octane2.1(Score): 39366
Typescript-octane2.1(Score): 39208
Typescript-octane2.1(Score): 38452
Typescript-octane2.1(Score): 36809
Typescript-octane2.1(Score): 39586
Typescript-octane2.1(Score): 37959
Typescript-octane2.1(Score): 37779
Typescript-octane2.1(Score): 35458
Typescript-octane2.1(Score): 36275
Typescript-octane2.1(Score): 36788
Typescript-octane2.1(Score): 36692
Typescript-octane2.1(Score): 39293
Typescript-octane2.1(Score): 38005
Typescript-octane2.1(Score): 35019
Typescript-octane2.1(Score): 36692
Typescript-octane2.1(Score): 36883
Typescript-octane2.1(Score): 36703
Typescript-octane2.1(Score): 39414
Typescript-octane2.1(Score): 39945
Typescript-octane2.1(Score): 36556
Typescript-octane2.1(Score): 37982
Typescript-octane2.1(Score): 38118
Typescript-octane2.1(Score): 39378
Typescript-octane2.1(Score): 39016
Typescript-octane2.1(Score): 37847
Typescript-octane2.1(Score): 39233
Typescript-octane2.1(Score): 38874
Typescript-octane2.1(Score): 36745
Typescript-octane2.1(Score): 38267
Typescript-octane2.1(Score): 35408
Typescript-octane2.1(Score): 37023
Typescript-octane2.1(Score): 38580
Typescript-octane2.1(Score): 40491
Typescript-octane2.1(Score): 36735
Typescript-octane2.1(Score): 37381
Typescript-octane2.1(Score): 39390
Typescript-octane2.1(Score): 37447
Typescript-octane2.1(Score): 35438
Typescript-octane2.1(Score): 35566
Typescript-octane2.1(Score): 36608
Typescript-octane2.1(Score): 37546
Typescript-octane2.1(Score): 35786
Typescript-octane2.1(Score): 35467
Typescript-octane2.1(Score): 38210
Typescript-octane2.1(Score): 36535
Typescript-octane2.1(Score): 36809
Typescript-octane2.1(Score): 40008
Typescript-octane2.1(Score): 37524

$ for i in `seq 1 50`; do /work/v8/out/release/d8.vanilla run-some.js -- typescript ; done
Typescript-octane2.1(Score): 36441
Typescript-octane2.1(Score): 38615
Typescript-octane2.1(Score): 36682
Typescript-octane2.1(Score): 37370
Typescript-octane2.1(Score): 36745
Typescript-octane2.1(Score): 36368
Typescript-octane2.1(Score): 35947
Typescript-octane2.1(Score): 37195
Typescript-octane2.1(Score): 35968
Typescript-octane2.1(Score): 37414
Typescript-octane2.1(Score): 37304
Typescript-octane2.1(Score): 37141
Typescript-octane2.1(Score): 37087
Typescript-octane2.1(Score): 39709
Typescript-octane2.1(Score): 37447
Typescript-octane2.1(Score): 36213
Typescript-octane2.1(Score): 34762
Typescript-octane2.1(Score): 37914
Typescript-octane2.1(Score): 35696
Typescript-octane2.1(Score): 35686
Typescript-octane2.1(Score): 38909
Typescript-octane2.1(Score): 39697
Typescript-octane2.1(Score): 37326
Typescript-octane2.1(Score): 39821
Typescript-octane2.1(Score): 36080
Typescript-octane2.1(Score): 36285
Typescript-octane2.1(Score): 38897
Typescript-octane2.1(Score): 37768
Typescript-octane2.1(Score): 36969
Typescript-octane2.1(Score): 38267
Typescript-octane2.1(Score): 36524
Typescript-octane2.1(Score): 34424
Typescript-octane2.1(Score): 36357
Typescript-octane2.1(Score): 37066
Typescript-octane2.1(Score): 38510
Typescript-octane2.1(Score): 38371
Typescript-octane2.1(Score): 38050
Typescript-octane2.1(Score): 35656
Typescript-octane2.1(Score): 38591
Typescript-octane2.1(Score): 37746
Typescript-octane2.1(Score): 36121
Typescript-octane2.1(Score): 38615
Typescript-octane2.1(Score): 37044
Typescript-octane2.1(Score): 40121
Typescript-octane2.1(Score): 39341
Typescript-octane2.1(Score): 37033
Typescript-octane2.1(Score): 37304
Typescript-octane2.1(Score): 37513
Typescript-octane2.1(Score): 35606
Typescript-octane2.1(Score): 35988

The median is 37535 with my change and 37168 without, but the standard deviation is over 1000. My change is well in the noise, which is to be expected I suppose.

On 2017/06/23 20:09:48, vogelheim wrote:
> The "lgtm" is more due to attrition - I got pinged to "stamp the CL" Friday
> night, while at home - than to the change actually looking good to me.
>
> In particular
> - I share Michael's concerns in full,
> - it seems the expected best-case result for the V8-side changes is unlikely to
> exceed the measurement noise,

That's the problem with death by a thousand cuts.

> - the main motivating benchmark - closing the gap between MSVC and clang builds
> - is a vanity benchmark without user relevance, and for the user-relevant
> benchmark (empty tab page) we expect only ~0.2%.

That's one way to look at it I suppose. Another way is that this is a real-world web page that's loaded billions of times per day where our instrumentation shows that V8 is the bottle-neck and that these functions benefit from inlining.

I'm sorry if this patch has been ill received; I certainly didn't intend to annoy you with it, and believe me I would have been much happier to show some hard benchmark numbers.
Assuming all goes well and this rolls in before the next dev cut, we should have numbers on the UMA metric by the end of next week and will report back. Thanks.
The CQ bit was checked by hans@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mstarzinger@chromium.org, mlippautz@chromium.org, marja@chromium.org

Link to the patchset: https://codereview.chromium.org/2950993002/#ps40001 (title: "rebase")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1498250574619930, "parent_rev": "ee0e295d8e829b5c3fc1225b09c7c886e00305f6", "commit_rev": "d00d52be1fce9c1bf5558c8b26bf984efd09e65b"}
Message was sent while issue was closed.
Description was changed from

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
==========

to

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324

Review-Url: https://codereview.chromium.org/2950993002
Cr-Commit-Position: refs/heads/master@{#46191}
Committed: https://chromium.googlesource.com/v8/v8/+/d00d52be1fce9c1bf5558c8b26bf984efd0...
==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/v8/v8/+/d00d52be1fce9c1bf5558c8b26bf984efd0...
Message was sent while issue was closed.
A revert of this CL (patchset #3 id:40001) has been created in https://codereview.chromium.org/2955793002/ by machenbach@chromium.org.

The reason for reverting is: Blocks roll: https://codereview.chromium.org/2954833002/

E.g.:
https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_com...
https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium...

Please include those chromium trybots on reland. Maybe missing symbol export?
Message was sent while issue was closed.
Description was changed from

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324

Review-Url: https://codereview.chromium.org/2950993002
Cr-Commit-Position: refs/heads/master@{#46191}
Committed: https://chromium.googlesource.com/v8/v8/+/d00d52be1fce9c1bf5558c8b26bf984efd0...
==========

to

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
==========
Description was changed from

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
==========

to

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_compile_dbg_ng,master.tryserver.chromium.mac:mac_chromium_compile_dbg_ng
==========
On 2017/06/25 20:29:39, Michael Achenbach wrote:
> A revert of this CL (patchset #3 id:40001) has been created in
> https://codereview.chromium.org/2955793002/ by mailto:machenbach@chromium.org.
>
> The reason for reverting is: Blocks roll:
> https://codereview.chromium.org/2954833002/
>
> E.g.:
> https://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_com...
> https://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium...
>
> Please include those chromium trybots on reland. Maybe missing symbol export?

Yup, missing export. Sorry for the breakage. That was fixed in https://codereview.chromium.org/2954353002/
The CQ bit was checked by hans@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Your CL can not be processed by CQ because of: * Failed to parse additional trybots: master (or bucket) and trybuilders must be separated by : in `master.tryserver.chromium.linux:linux_chromium_compile_dbg_ng,master.tryserver.chromium.mac:mac_chromium_compile_dbg_ng`. Correct syntax is "trymaster:bot1,bot2;trybucket:bot3,bot4".
Description was changed from

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_compile_dbg_ng,master.tryserver.chromium.mac:mac_chromium_compile_dbg_ng
==========

to

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_compile_dbg_ng;master.tryserver.chromium.mac:mac_chromium_compile_dbg_ng
==========
The CQ bit was checked by hans@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 40001, "attempt_start_ts": 1498498901604130, "parent_rev": "a2f51f779025adcd32fa72664591246edfcc3b9b", "commit_rev": "777da354d20286d39048f1421d89fa109e38b9e1"}
Message was sent while issue was closed.
Description was changed from

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_compile_dbg_ng;master.tryserver.chromium.mac:mac_chromium_compile_dbg_ng
==========

to

==========
Make some functions that are hit during renderer startup available for inlining

This is towards closing the perf gap between the MSVC build (which uses link-
time optimization) and Clang (where LTO isn't ready on Windows yet). We did
a study (see bug) to see which non-inlined functions are hit a lot during render
start-up, and which would be inlined during LTO. This should benefit performance
in all builds which currently don't use LTO (Android, Linux, Mac) as well as
the Win/Clang build.

The binary size of chrome_child.dll increases by 2KB with this.

BUG=chromium:728324
CQ_INCLUDE_TRYBOTS=master.tryserver.chromium.linux:linux_chromium_compile_dbg_ng;master.tryserver.chromium.mac:mac_chromium_compile_dbg_ng

Review-Url: https://codereview.chromium.org/2950993002
Cr-Commit-Position: refs/heads/master@{#46229}
Committed: https://chromium.googlesource.com/v8/v8/+/777da354d20286d39048f1421d89fa109e3...
==========
Message was sent while issue was closed.
Committed patchset #3 (id:40001) as https://chromium.googlesource.com/v8/v8/+/777da354d20286d39048f1421d89fa109e3...
Message was sent while issue was closed.
To follow up: since this got reverted initially (sorry, I should have watched the roller), it was only rolled into Chromium in #482434, and so it didn't make this week's dev channel (which was cut from #482153).

That means we won't have any UMA data for it until next week, but the silver lining is that it should be easier to see the impact of this patch, as my other inlining patches landed in this week's release.
Message was sent while issue was closed.
On 2017/06/28 21:35:27, hans wrote:
> To follow up: since this got reverted initially (sorry, I should have watched
> the roller), it was only rolled into Chromium in #482434, and so it didn't make
> this week's dev channel (which was cut from #482153).
>
> That means we won't have any UMA data for it until next week, but the silver
> lining is that it should be easier to see the impact of this patch, as my other
> inlining patches landed in this week's release.

We already have >400k samples for Windows Canaries containing this CL. How can I filter for MSVC vs clang on the UMA timeline?
Message was sent while issue was closed.
It took a week longer than expected because there was no Dev channel release on the week of 3 July. (If you're at Google, the numbers are at go/61.0.3153.2-uma)

Since most of us were on vacation, this V8 change is the only CL targeting the MSVC/Clang gap in that release. What we saw is that on SessionRestore.ForegroundTabFirstLoaded and Scheduling.Renderer.DrawDuration2, Clang-built Chrome is now faster than MSVC-built Chrome on the median. We can't tell with certainty that this is because of the V8 patch, but since we didn't do any other patches in this release, and since Clang-built Chrome was previously slower on those metrics, we'd like to keep this if possible.

Having said that, I promised to reconsider these inlinings when we had metrics. To re-cap, these are the functions that were inlined, along with the number of times they were called before inlining during NTP load:

205153 Scanner::ScanTemplateContinuation
151841 VariableProxy::VariableProxy(AsRawString, ...)
140454 Scope::NewUnresolved
 80931 Scavenger::ScavengeObjectsSlow
 57994 Variable::Variable(Scope, AstRawString, ...)
 45411 SnapshotByteSource::CopyRaw
 37636 Heap::AllocateFixedArray
 32558 StaticVisitorBase::GetVisitorId
 30137 ByteCodes::SizeOfOperand

I think some of these make a lot of sense on their own, but if you think some of them need to be reverted, I'll do so. Scope::NewUnresolved was the only one that required including another header (ast.h in scopes.h). StaticVisitorBase::GetVisitorId is a rather ugly switch statement, and Scavenger::ScavengeObjectsSlow looks designed to be out-of-line.

I'm also in Munich this week so I can discuss this in person if you'd like.