Created: 4 years, 5 months ago by Eran Messeri
Modified: 4 years, 5 months ago
CC: chromium-reviews, darin-cc_chromium.org, RyanS, Ryan Sleevi
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description
STH Set: Delay loading until after start-up.
When loading the STHs, the STHSet Component will invoke the
SafeJsonParser up to half a dozen times (one for each STH), which
can have performance implications.
Also update the PostAfterStartupTask documentation to indicate the
task will be posted immediately if called after start-up.
BUG=607946
Committed: https://crrev.com/af4e289461794790203de6d050198b7e8e7ad5d1
Cr-Commit-Position: refs/heads/master@{#406009}
Patch Set 1 #
Total comments: 4
Patch Set 2 : Directly posting #
Patch Set 3 : Adding variation parameter #
Total comments: 14
Patch Set 4 : WeakPtrFactory, other comments #
Patch Set 5 : Switched back to GetVariationParamValueByFeature #
Total comments: 2
Patch Set 6 : Not using SafeJsonParser #
Total comments: 2
Patch Set 7 : Another variation parameter #
Patch Set 8 : Merging with master #
Patch Set 9 : Removed comment #
Messages
Total messages: 66 (13 generated)
Description was changed.
eranm@chromium.org changed reviewers: + asvitkine@chromium.org, jam@chromium.org, waffles@chromium.org
jam, kindly review the small documentation change to content/public/browser/browser_thread.h. waffles / asvitkine, the STHSet component changes are for your review.
https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:79: base::Bind(&STHSetComponentInstallerTraits::PostStartupLoadSTHsFromDisk,
Why not directly post LoadSTHsFromDisk? (I'm not sure what the purpose of PostStartupLoadSTHsFromDisk is.)
https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:77: content::BrowserThread::PostAfterStartupTask(
Per comment on the bug, please keep the old behavior as well and use variations params (go/variations-params) to choose one behavior over the other. This way, we can confirm with an experiment that this change is working as expected.
Addressed both comments, PTAL.

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:77: content::BrowserThread::PostAfterStartupTask(
On 2016/07/12 14:56:09, Alexei Svitkine wrote:
> Per comment on the bug, please keep the old behavior as well and use variations
> params (go/variations-params) to choose one behavior over the other.
>
> This way, we can confirm with an experiment that this change is working as
> expected.

Done.

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:79: base::Bind(&STHSetComponentInstallerTraits::PostStartupLoadSTHsFromDisk,
On 2016/07/12 13:56:59, waffles wrote:
> Why not directly post LoadSTHsFromDisk? (I'm not sure what the purpose of
> PostStartupLoadSTHsFromDisk is.)

No reason really, removed.
lgtm
I just read through a bunch of the notes on the bug. Why should we load the utility process on each browser startup to do the same task over and over, instead of saving a sanitized version once per comment 138?
On 2016/07/12 16:27:00, jam wrote: > I just read through a bunch of the notes on the bug. > > Why should we load the utility process on each browser startup to do the same > task over and over, instead of saving a sanitized version once per comment 138? To be clear, fresh data is periodically (every day) pushed via the component updater. It's not static. My understanding is that sanitizing the data from the component updater is component-specific - that is, the STHSetComponent will have to sanitize each of the JSON files it receives, save them to disk and load them from disk on start-up, having to distinguish between browser start-up (where the call to ComponentReady should not do anything because sanitized data is available) and regular updates (where the call to ComponentReady should sanitize the input and store it to disk). I do not think the added complexity is justified, simply because the data (Signed Tree Heads from Certificate Transparency logs) is used asynchronously to audit CT logs. It is not used during start-up at all and it is perfectly fine if it is made available later. Right now the STHSetComponent benefits from being very simple thanks to the component updater interface - it does not persist anything to disk by itself, does not have to care about per-profile directories / ChromeOS storage because the component updater takes care of all of that. Adding the logic of handling sanitized/unsanitized data and distinguishing between start-up/regular update scenarios will not provide any functional benefits.
This change looks good % comment below. At the very least, I think it's a good intermediate solution even if we decide to do something more involved like jam suggests later.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:84: base::Unretained(this), GetInstalledPath(install_dir),
As an aside, why is base::Unretained() used here? Is there a guarantee that this class won't be destroyed (e.g. shutting down while it's fetching the component) before the task runs? If there's no obvious such guarantee, please add a WeakPtrFactory as a member and pass a weak pointer. (If it is guaranteed, add a comment.)
gab@chromium.org changed reviewers: + gab@chromium.org
https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:80: "delayed_load") == "yes") {
Flip this condition around (i.e. make the default on trunk, in the absence of a value for the feature, a delayed load); this will make it run on the waterfall (and hence give a cleaner merge to Stable when we confirm it works).

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask(
Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this is *really* not critical (i.e. it could even be delayed much further after startup?). Can we make it a 5 or even 10 minute delay instead?

PostAfterStartupTask is meant for non-startup-critical work which should still run very shortly after startup (i.e. earlier than a fixed delay on fast machines but later on slow machines).

PS: In the upcoming TaskScheduler world this will be a perfect candidate for BACKGROUND work, but until then we have to play with hackish delays...

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:89: base::Bind(&STHSetComponentInstallerTraits::LoadSTHsFromDisk,
Extract:

  const base::Closure load_sths_closure = base::Bind(...);

outside of the conditionals and re-use |load_sths_closure| in both places.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:92: NOTREACHED();
This NOTREACHED() is not necessary. PostTask() will never fail except during late shutdown, in which case we don't care to load STHs, I assume. (Callers of PostTask() pretty much never check the return value unless it is critical to run before shutdown -- and even then the API states that "true" only means the task "may run"...)
On 2016/07/12 19:56:34, Eran Messeri wrote:
> To be clear, fresh data is periodically (every day) pushed via the component
> updater. It's not static.
> [...]
> Adding the logic of handling sanitized/unsanitized data and distinguishing
> between start-up/regular update scenarios will not provide any functional
> benefits.

The motivation is performance and power. On mobile, a user will have many browser startups a day.
Doing redundant work on each startup, even if it's delayed, causes two problems:
- there are a limited number of cores; starting a utility process to sanitize the JSON on each browser startup means the cores are tied up and not handling tasks that the user is waiting on
- the battery drains faster

Regarding per-profile, note as an example that we already have installation-wide data that is also updated frequently: Safe Browsing. For that we also don't download/save it per profile, since it's not user-specific.
I'm fine with the CL as-is, since it seems an improvement over the existing code. But since we're discussing, I will toss in my 2¢:

> My understanding is that sanitizing the data from the component updater is
> component-specific - that is, the STHSetComponent will have to sanitize each of
> the JSON files it receives, save them to disk and load them from disk on
> start-up, having to distinguish between browser start-up (where the call to
> ComponentReady should not do anything because sanitized data is available) and
> regular updates (where the call to ComponentReady should sanitize the input and
> store it to disk).

You're correct that it's component-specific, but if we went this path I think what you would do here is customize OnCustomInstall to sanitize the contents and write them out (in the dir passed to OnCustomInstall) as [filename]_sanitized.json or something like that. OnCustomInstall is called exactly once during each update. ComponentReady can just assume *_sanitized.json exists and load them - no distinction between update and browser boot case.

> Regarding per-profile, note as an example we already have installation wide data
> that is also updated frequently: safe browsing. For that we also don't
> download/save it per profile since it's not user specific.

To clarify, components are not per-profile, but they are per-user-data-dir (so per-OS-user). We don't have a solution for installation-wide installs of components because we believe it requires user→system elevation in system-wide install cases. I am very curious to know if Safe Browsing somehow solved this problem; maybe we can talk out of band?

Can we take a step back and ask: What's the purpose of using SafeJsonParser at all in this situation? Which attacker are we protecting against?
* Compromised Google server? (Seems far-fetched: sending crashy JSON seems less useful than telling the browser to go fetch evil_widevine_implementation.dll. Plus they could just mess with manifest.json, which is already parsed in-process.)
* Network MITM? (Already secured by the component updater using a pinned ECDSA key & SHA-256, defense-in-depth with Google-signed CRX.)
* Evildoer with access to local disk? (Obviously not, if writing post-sanitized files back to disk is an option.)
On 2016/07/12 20:46:07, waffles wrote:
> To clarify, components are not per-profile, but they are per-user-data-dir (so
> per-OS-user). We don't have a solution for an installation-wide installs of
> components because we believe it requires user→system elevation in system-wide
> install cases. I am very curious to know if safe browsing somehow solved this
> problem, maybe we can talk out of band?

Apologies, I wasn't speaking precisely; I meant per user data directory.

> Can we take a step back and ask: What's the purpose of using SafeJsonParser at
> all in this situation? Which attacker are we protecting against?
> [...]

+1, thanks for bringing this up. I was going to, but then I figured that would be obvious and there's a reason it's done this way. But we should verify :)
[+rsleevi, as he originally pointed out JSON parsing should be done in a separate process]. As far as I recall Ryan suggested parsing the Signed Tree Heads in a separate process, but that was when we were fetching the STHs directly from CT logs. Now that the STHs are provided by Google I don't see such a strong argument for using the SafeJsonParser. The only scenario I could think of is if we (Google) push a bad STH Set that not only fails to parse but crashes the parser. In that case crashing the SJP's process is less bad than crashing the browser process itself. I don't know how likely that is to happen.
PTAL - Is everyone, including jam, happy with the CL in its current state?

Note I have switched from GetVariationParamsByFeature to GetVariationParamValue, which takes a trial name, as my understanding of variation params is that they are set at the experiment level, not the feature level. The STHSetComponentStudy trial is now set up to have Disabled/Control experiments, and we'll further split the Control experiment into two: both have the STHSetComponent feature enabled, one with the "delayed_load" param set to "no" and the other with the "delayed_load" param set to "yes", or not set at all.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:80: "delayed_load") == "yes") {
On 2016/07/12 20:09:25, gab wrote:
> Flip this condition around (i.e. make the default on trunk in the absence of a
> value for the feature to be a delayed load), this will make it run on the
> waterfall (and hence a cleaner merge to Stable when we confirm it works).

Done.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask(
On 2016/07/12 20:09:24, gab wrote:
> Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this is
> *really* not critical (i.e. it could even be delayed much further after
> startup?) Can we make it a 5 or even 10 minutes delay instead?
> [...]

Yes, loading the STHs can be delayed by 5-10 minutes.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:84: base::Unretained(this), GetInstalledPath(install_dir),
On 2016/07/12 19:59:27, Alexei Svitkine (slow) wrote:
> As an aside, why is base::Unretained() used here? Is there a guarantee that this
> class won't be destroyed (e.g. shutting down while it's fetching the component)
> before the task runs? If there's no obvious such guarantee, please add a
> WeakPtrFactory as a member and pass a weak pointer.
>
> (If it is guaranteed, add a comment.)

There are no guarantees, so I've switched to a WeakPtrFactory.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:89: base::Bind(&STHSetComponentInstallerTraits::LoadSTHsFromDisk,
On 2016/07/12 20:09:24, gab wrote:
> Extract:
>
>   const base::Closure load_sths_closure = base::Bind(...);
>
> outside of the conditionals and re-use |load_sths_closure| in both places.

Done.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:92: NOTREACHED();
On 2016/07/12 20:09:24, gab wrote:
> This NOTREACHED() is not necessary. PostTask() will never fail unless during
> late shutdown in which case we don't care to load STHs I assume.
> [...]

Thanks for the explanation, removed. Does it make sense to change the API? The if (!...PostTask()) { NOTREACHED(); } pattern is something I saw scattered around the code quite a few times.
On Jul 13, 2016 6:51 AM, <eranm@chromium.org> wrote:
> PTAL - Is everyone, including jam, happy with the CL in its current state?
>
> Note I have switched from GetVariationParamsByFeature to GetVariationParamValue
> which takes a trial name as my understanding of variation params is that they
> are set at the experiment level, not feature level.
> [...]

GetVariationParamsByFeature should be fine to use, please switch back to using this. It's true they are per experiment, but the experiment is associated with a feature. We might have some old docs, however - so if you saw something that says the former is not supported, let me know and I can fix the docs.
> GetVariationParamsByFeature should be fine to use, please switch back to
> using this. It's true they are per experiment, but the experiment is
> associated with a feature. We might have some old docs, however - so if you
> saw something that says the former is not supported, let me know and I can
> fix the docs.

Done. FWIW, these are the flags I used:

  --enable-features="STHSetComponent<STHSetComponentStudy"
  --force-fieldtrials=STHSetComponentStudy/Control
  --force-fieldtrial-params=STHSetComponentStudy.Control:delayed_load/no
lgtm
lgtm
On 2016/07/13 10:47:43, Eran Messeri wrote:
> [+rsleevi, as he originally pointed out JSON parsing should be done in a
> separate process].
> [...]
> The only scenario I could think of is if we (Google) push a bad STH Set that not
> only fails to parse but crashes the parser. In that case crashing the SJP's
> process is less bad than crashing the browser process itself. I don't know how
> likely that is to happen.

There are many places where we send JSON (or other data types) from Google and parse it in the browser process. The point of the safe JSON parser is to protect against untrusted sources, which isn't the case here.

IMO there's no need to launch separate processes and take the performance/power hit because theoretically Google can send JSON that's so malformed that it crashes our parser. If that happens, we'll get crash reports and we should fix the parser to not crash. But there won't be security issues, because the response from Google won't be crafted to exploit the user.
https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask(
On 2016/07/13 10:51:55, Eran Messeri wrote:
> Yes, loading the STHs can be delayed by 5-10 minutes.

Ok, then please use:

  // STHs perform an async sanity verification, it is fine to delay them much
  // after startup.
  constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10);
  content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., sth_load_delay);

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:92: NOTREACHED();
On 2016/07/13 10:51:55, Eran Messeri wrote:
> Thanks for the explanation, removed. Does it make sense to change the API?

There are far more places that don't have NOTREACHED() than places that do. TBH, I would like for the API to just return void, but it's *way* too widespread for that by now. It's not harmful to NOTREACHED(), but it's not really ever correct either (except in a few low-level corner cases which it's not worth going into here -- but those are the reason why this can't be trivially turned into "void").
rsleevi@chromium.org changed reviewers: + rsleevi@chromium.org
LGTM

As for in-proc/out-of-proc, while jam@'s correct that if it's coming from Google it should be fine to trust (in-proc), my understanding of the current pipeline is that we're just re-bundling externally provided data, from potentially 'untrustworthy' sources, and not validating it on the server.

Put differently, if the pipeline for packaging the data delivered in the component validated, on the server, that it was well-formed, then we'd be safe to parse this data in-browser-process on the clients. But if we're just packaging up the API responses from the logs (as I understood we were), then you should treat it as hostile data. In either event, more documentation in the code about these threat models doesn't hurt :)

https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.h (right):

https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.h:81: // Provides weak_ptrs to this for callbacks.
Seems an unnecessary comment.
jam@chromium.org changed reviewers: + jschuh@chromium.org
On 2016/07/13 17:22:01, jam wrote:
> There are many places that we send JSON (or other data types) from Google and
> parse it in the browser process. The point of the safe json parser is to protect
> against untrusted sources, which isn't the case here.
> [...]

(Just chatted with Justin in person; adding him as well so he can write his thoughts better than I can transcribe them.)
On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > LGTM > > As for in-proc/out-of-proc, while jam@'s correct that if it's coming from Google > it should be fine to trust (in-proc), my understanding of the current pipeline > is that we're just re-bundling up externally provided data, from potentially > 'untrustworthy' sources, and not validating on the server. > > Put differently, if the pipeline for packaging this data delivered in the > component was validating, on the server, that it was well-formed, then we'd be > safe to parse this data in-browser-process on the clients. But if we're just > packaging up the API responses from the logs (as I understood we were), then you > should treat it as hostile data. In either event, more documentation in the code > about these threat models doesn't hurt :) Ah, thanks for the background. I'll defer to security & net folks to decide on whether this needs to be done in a sandbox or not. If it's in a sandbox, I think we should have sanitized versions so that we don't need to redo work needlessly.
On 2016/07/13 18:30:02, jam wrote: > On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > > LGTM > > > > As for in-proc/out-of-proc, while jam@'s correct that if it's coming from > Google > > it should be fine to trust (in-proc), my understanding of the current pipeline > > is that we're just re-bundling up externally provided data, from potentially > > 'untrustworthy' sources, and not validating on the server. > > > > Put differently, if the pipeline for packaging this data delivered in the > > component was validating, on the server, that it was well-formed, then we'd be > > safe to parse this data in-browser-process on the clients. But if we're just > > packaging up the API responses from the logs (as I understood we were), then > you > > should treat it as hostile data. In either event, more documentation in the > code > > about these threat models doesn't hurt :) > > Ah, thanks for the background. I'll defer to security & net folks to decide on > whether this needs to be done in a sandbox or not. > If it's in a sandbox, I think we should have sanitized versions so that we don't > need to redo work needlessly. We're not just re-bundling up externally provided data, we're re-packaging the JSON after we've parsed it and validated the signature over the STH, so according to this criteria it should be safe to parse in-process. I'll send the links internally.
On 2016/07/13 18:46:54, Eran Messeri wrote: > On 2016/07/13 18:30:02, jam wrote: > > On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > > > LGTM > > > > > > As for in-proc/out-of-proc, while jam@'s correct that if it's coming from > > Google > > > it should be fine to trust (in-proc), my understanding of the current > pipeline > > > is that we're just re-bundling up externally provided data, from potentially > > > 'untrustworthy' sources, and not validating on the server. > > > > > > Put differently, if the pipeline for packaging this data delivered in the > > > component was validating, on the server, that it was well-formed, then we'd > be > > > safe to parse this data in-browser-process on the clients. But if we're just > > > packaging up the API responses from the logs (as I understood we were), then > > you > > > should treat it as hostile data. In either event, more documentation in the > > code > > > about these threat models doesn't hurt :) > > > > Ah, thanks for the background. I'll defer to security & net folks to decide on > > whether this needs to be done in a sandbox or not. > > If it's in a sandbox, I think we should have sanitized versions so that we > don't > > need to redo work needlessly. > > We're not just re-bundling up externally provided data, we're re-packaging the > JSON after we've parsed it and validated the signature over the STH, so > according to this criteria it should be safe to parse in-process. > I'll send the links internally. Thanks for the clarification. Given that information, agree it does sound like we can just parse it in the browser process.
> Thanks for the clarification. Given that information, agree it does sound like > we can just parse it in the browser process. Done, PTAL.
https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... chrome/browser/component_updater/sth_set_component_installer.cc:158: int error_code = 0; Can you keep the old behavior based on variation params so we can see the effect experimentally? Otherwise, it will be hard to judge the effect of this.
lgtm, thanks for the simplification IMO using finch here to keep the old code path is over-kill. We know this feature is what caused the startup regression per the bug. There's very little to gain to keep old code path just so we can verify with a finch experiment what we already know. This seems like not the best use of everyone's time.
How do we know that the new codepath fully restores the regression rather than only partially without being able to A/B test? And no, looking at the value before and after the change is not sufficient due to noise. On Wed, Jul 13, 2016 at 3:31 PM, <jam@chromium.org> wrote: > lgtm, thanks for the simplification > > IMO using finch here to keep the old code path is over-kill. We know this > feature is what caused the startup regression per the bug. There's very > little > to gain to keep old code path just so we can verify with a finch > experiment what > we already know. This seems like not the best use of everyone's time. > > https://codereview.chromium.org/2140093002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
lgtm (FWIW I agree with jam@ regarding keeping Finch here, unless we are worried about regressions of some sort.)
On 2016/07/13 19:38:04, Alexei Svitkine (slow) wrote: > How do we know that the new codepath fully restores the regression rather > than only partially without being able to A/B test? And no, looking at the > value before and after the change is not sufficient due to noise. We'll see that the number of utility process launches dropped back down. IMO this is a slippery slope where bug fixes now are done through finch experiments. Finch is for experimentation. We don't need experimentation here to know that a feature which launched the utility process at startup slowed things down, and removing the process launching will fix things. > > On Wed, Jul 13, 2016 at 3:31 PM, <mailto:jam@chromium.org> wrote: > > > lgtm, thanks for the simplification > > > > IMO using finch here to keep the old code path is over-kill. We know this > > feature is what caused the startup regression per the bug. There's very > > little > > to gain to keep old code path just so we can verify with a finch > > experiment what > > we already know. This seems like not the best use of everyone's time. > > > > https://codereview.chromium.org/2140093002/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
I agree we do not need Finch in this case. We should run experiments when we need to, but they make everything more complicated. In this case, we've done a lot of analysis and have confidence in the fix. And since we noticed the regression in the first place, it should be equally easy to validate that the fix is working as intended.
Actually, thinking about this more, I think you're right - we can omit the
additional Finch control here.

We already have the hammer set up that turns off the feature entirely via
Finch which we've used to validate this caused a regression. We still need
to get results from Dev for it to see if it really caused the full Dev
regression (there was no Dev release this week and Canary regression is
much smaller I think).

However, that existing set up should be sufficient to check that the
regression is resolved by this CL. Basically, we just need to compare the
Enabled/Disabled groups on versions after this CL to ensure there's no
longer any performance difference between the two.

So, I agree we can avoid the extra Finch control in the JSON code and we
can also remove the variations param check in the startup delay codepath.
Sorry for the opposite advice earlier.

On Wed, Jul 13, 2016 at 3:51 PM, <brettw@chromium.org> wrote:

> I agree we do not need Finch in this case. We should run experiments when
> we
> need to, but they make everything more complicated. In this case, we've
> done a
> lot of analysis and have confidence in the fix. And since we noticed the
> regression in the first place, it should be equally easy to validate that
> the
> fix is working as intended.
>
> https://codereview.chromium.org/2140093002/
On 2016/07/13 20:07:02, Alexei Svitkine (slow) wrote: > Actually, thinking about this more, I think you're right - we can omit the > additional Finch control here. > > We already have the hammer set up that turns off the feature entirely via > Finch which we've used to validate this caused a regression. We still need > to get results from Dev for it to see if it really caused the full Dev > regression (there was no Dev release this week and Canary regression is > much smaller I think). > > However, that existing set up should be sufficient to check that the > regression is resolved by this CL. Basically, we just need to compare the > Enabled/Disabled groups on versions after this CL to ensure there's no > longer any performance difference between the two. Well, that's only true if we don't land this CL before the next Dev as it currently removes the safe JSON parse which was part of the problem whether the feature is on or off. In this case though the regression is large enough (visible on timeline) that I agree we don't need Finch to tell the effect (we do on bugs where the effect is lost in noise but not here). Speaking of which, maybe we can just remove the Finch code altogether? > > So, I agree we can avoid the extra Finch control in the JSON code and we > can also remove the variations param check in the startup delay codepath. > Sorry for the opposite advice earlier. > > On Wed, Jul 13, 2016 at 3:51 PM, <https://mail.google.com/mail/?view=cm&fs=1&tf=1&to=brettw@chromium.org> wrote: > > > I agree we do not need Finch in this case. We should run experiments when > > we > > need to, but they make everything more complicated. In this case, we've > > done a > > lot of analysis and have confidence in the fix. And since we noticed the > > regression in the first place, it should be equally easy to validate that > > the > > fix is working as intended. 
> >
> > https://codereview.chromium.org/2140093002/
I'm fine either way - asvitkine, please make a decision, I'll revise the code according to your decision tomorrow morning (London time). On 13 Jul 2016 9:18 pm, <gab@chromium.org> wrote: On 2016/07/13 20:07:02, Alexei Svitkine (slow) wrote: > Actually, thinking about this more, I think you're right - we can omit the > additional Finch control here. > > We already have the hammer set up that turns off the feature entirely via > Finch which we've used to validate this caused a regression. We still need > to get results from Dev for it to see if it really caused the full Dev > regression (there was no Dev release this week and Canary regression is > much smaller I think). > > However, that existing set up should be sufficient to check that the > regression is resolved by this CL. Basically, we just need to compare the > Enabled/Disabled groups on versions after this CL to ensure there's no > longer any performance difference between the two. Well, that's only true if we don't land this CL before the next Dev as it currently removes the safe JSON parse which was part of the problem whether the feature is on or off. In this case though the regression is large enough (visible on timeline) that I agree we don't need Finch to tell the effect (we do on bugs where the effect is lost in noise but not here). Speaking of which, maybe we can just remove the Finch code altogether? > > So, I agree we can avoid the extra Finch control in the JSON code and we > can also remove the variations param check in the startup delay codepath. > Sorry for the opposite advice earlier. > > On Wed, Jul 13, 2016 at 3:51 PM, <https://mail.google.com/mail/?view=cm&fs=1&tf=1&to=brettw@chromium.org> wrote: > > > I agree we do not need Finch in this case. We should run experiments when > > we > > need to, but they make everything more complicated. In this case, we've > > done a > > lot of analysis and have confidence in the fix. 
> > And since we noticed the
> > regression in the first place, it should be equally easy to validate that
> > the
> > fix is working as intended.
> >
> > https://codereview.chromium.org/2140093002/

https://codereview.chromium.org/2140093002/
On 2016/07/13 20:58:44, Eran Messeri wrote: > I'm fine either way - asvitkine, please make a decision, I'll revise the > code according to your decision tomorrow morning (London time). +1 to removing the old code (that was my expectation if we have consensus that we don't need to use finch)
Okay, I discussed this more with gab@ and rkaplow@ offline and here's some more thoughts on this. They're different from my previous reply. If we just remove all Finch control right now and land this before dev cut, then we risk conflating the result of this change with any other startup regression or improvement that happened since the previous dev (note: and since we didn't have a dev release this week, it's a longer time window than usual). For example, if something regressed startup by 200ms in the mean time and this improves it by 1s, we'll think it improved it by 800ms and miss the other regression. So I don't think this would be acceptable. We shouldn't allow the possibility of new regressions to slip through if we can easily control for it. Also note that Canary data is not very reliable for startup - for the original regression we had a lot of trouble identifying anything on Canary due to noise, we only saw a noticeable effect on dev/beta/stable. So given the above, I can think of two paths forward: 1. We just Finch it (the JSON parser change) - defaulting of course to the fixed behavior by default. The developer overhead is just cleaning it up after, which honestly is like 10 mins of work - probably more than we've spent discussing whether to do it already. 2. We remove all Finch control in this CL, but do not land it until after Dev cut. This way, we can still get data about the dev impact (in particular, which will explain whether this issue explains the full M51 regression or not which we don't have a signal for yet). I am okay with either approach, but have a preference for 1, since then we don't have to delay landing the CL. On Wed, Jul 13, 2016 at 5:05 PM, <jam@chromium.org> wrote: > On 2016/07/13 20:58:44, Eran Messeri wrote: > > I'm fine either way - asvitkine, please make a decision, I'll revise the > > code according to your decision tomorrow morning (London time). 
>
> +1 to removing the old code (that was my expectation if we have consensus
> that
> we don't need to use finch)
>
> https://codereview.chromium.org/2140093002/
On 2016/07/13 21:18:53, Alexei Svitkine (slow) wrote: > Okay, I discussed this more with gab@ and rkaplow@ offline and here's some > more thoughts on this. They're different from my previous reply. > > If we just remove all Finch control right now and land this before dev cut, > then we risk conflating the result of this change with any other startup > regression or improvement that happened since the previous dev (note: and > since we didn't have a dev release this week, it's a longer time window > than usual). The flip side is that there's also less changes than usual because many people are gone. > For example, if something regressed startup by 200ms in the > mean time and this improves it by 1s, we'll think it improved it by 800ms > and miss the other regression. There's a lot of noise in the startup numbers regardless afaik, so I don't think the data is as clear cut to pinpoint small regressions. At the end of the day, there are hundreds of changes landing per day and as a result there could be many other changes landing that change timing. We don't isolate each of them through finch experiments. I still think that we should remove the old code path and finch code in this change, and not use finch for bug fixes. > > So I don't think this would be acceptable. We shouldn't allow the > possibility of new regressions to slip through if we can easily control for > it. > > Also note that Canary data is not very reliable for startup - for the > original regression we had a lot of trouble identifying anything on Canary > due to noise, we only saw a noticeable effect on dev/beta/stable. > > So given the above, I can think of two paths forward: > 1. We just Finch it (the JSON parser change) - defaulting of course to > the fixed behavior by default. The developer overhead is just cleaning it > up after, which honestly is like 10 mins of work - probably more than we've > spent discussing whether to do it already. > 2. 
> We remove all Finch control in this CL, but do not land it until after
> Dev cut. This way, we can still get data about the dev impact (in
> particular, which will explain whether this issue explains the full M51
> regression or not which we don't have a signal for yet).
>
> I am okay with either approach, but have a preference for 1, since then we
> don't have to delay landing the CL.
>
> On Wed, Jul 13, 2016 at 5:05 PM, <mailto:jam@chromium.org> wrote:
>
> > On 2016/07/13 20:58:44, Eran Messeri wrote:
> > > I'm fine either way - asvitkine, please make a decision, I'll revise the
> > > code according to your decision tomorrow morning (London time).
> >
> > +1 to removing the old code (that was my expectation if we have consensus
> > that
> > we don't need to use finch)
> >
> > https://codereview.chromium.org/2140093002/
In the latest patchset there's another variation parameter for choosing whether to use the SafeJsonParser or parse directly using the JSONReader in-process. I note jam@ still objects to using Finch here and so may not like that I've added *more* finch code. jam, asvitkine, any chance you could discuss this out of band and update the issue with the conclusions?
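[Editorial note: the param-gated dispatch being described — choose the parse path from a variations param, with the new in-process behavior as the default — can be sketched roughly as below. The enum, function name, and the "safe_json_parser" param value are hypothetical stand-ins; the thread does not show the actual names used with variations::GetVariationParamValueByFeature() in the CL.]

```cpp
#include <string>

// Hypothetical sketch of the STH component's parse-path choice.
enum class ParseMode { kSafeJsonParser, kInProcessJsonReader };

// An empty or unrecognized param value falls through to the new default
// (in-process parsing), matching "fixed behavior by default": only an
// explicit opt-in keeps the old sandboxed-parser path for the experiment.
ParseMode ChooseParseMode(const std::string& param_value) {
  if (param_value == "safe_json_parser")
    return ParseMode::kSafeJsonParser;
  return ParseMode::kInProcessJsonReader;
}
```

The key property for the experiment is that clients without any Finch config get the fix, while the experiment arm can still flip back to the old path for comparison.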
https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... chrome/browser/component_updater/sth_set_component_installer.cc:158: int error_code = 0; On 2016/07/13 19:25:01, Alexei Svitkine (slow) wrote: > Can you keep the old behavior based on variation params so we can see the effect > experimentally? Otherwise, it will be hard to judge the effect of this. Done.
patchset 7 LGTM jam: Just to be clear, I completely agree that in general, we shouldn't use Finch for bug fixes. But this is an exceptional case and here it *does* make sense to use it, see my comment #43. (It's an exceptional case because we're chasing down a massive startup regression that we don't yet have full evidence that's caused just by the bug this is fixing. Landing without Finch will conflate the signal from this fix with other churn on dev channel.) Just to be clear, the code in patchset 7 is making the new behavior the default in the code and once it lands TOT will run the bugfixed version. Then the only other extra work would be to go back and spend 10 minutes to clean up the code in a follow-up CL, which we can do as soon as Dev branches with this change. (The Finch config / launch bug are already in place from the config we used to disable the component on canary during the regression investigation, so besides cleaning up the code there's no additional overhead.)
https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask( On 2016/07/13 17:29:08, gab wrote: > On 2016/07/13 10:51:55, Eran Messeri wrote: > > On 2016/07/12 20:09:24, gab wrote: > > > Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this > is > > > *really* not critical (i.e. it could even be delayed much further after > > > startup?) Can we make it a 5 or even 10 minutes delay instead? > > > > > > PostAfterStartupTask is meant for non-startup critical work which should > still > > > run very shortly after startup (i.e. earlier than a fixed delay on fast > > machines > > > but later on slow machines). > > > > > > PS: In the upcoming TaskScheduler world this will be a perfect candidate for > > > BACKGROUND work, but until then we have to play with hackish delays... > > > > Yes, loading the STHs can be delayed by 5-10 minutes. > > Ok then please use: > > // STHs perform an async sanity verification, it is fine to delay them much > after startup. > constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10); > content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., sth_load_delay); ping (shall we do a mega delay? that won't result in a better metric but potential jank right after recording first paint is not great either if really not needed right away)
On 2016/07/14 14:46:03, gab wrote: > https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... > File chrome/browser/component_updater/sth_set_component_installer.cc (right): > > https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... > chrome/browser/component_updater/sth_set_component_installer.cc:81: > content::BrowserThread::PostAfterStartupTask( > On 2016/07/13 17:29:08, gab wrote: > > On 2016/07/13 10:51:55, Eran Messeri wrote: > > > On 2016/07/12 20:09:24, gab wrote: > > > > Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this > > is > > > > *really* not critical (i.e. it could even be delayed much further after > > > > startup?) Can we make it a 5 or even 10 minutes delay instead? > > > > > > > > PostAfterStartupTask is meant for non-startup critical work which should > > still > > > > run very shortly after startup (i.e. earlier than a fixed delay on fast > > > machines > > > > but later on slow machines). > > > > > > > > PS: In the upcoming TaskScheduler world this will be a perfect candidate > for > > > > BACKGROUND work, but until then we have to play with hackish delays... > > > > > > Yes, loading the STHs can be delayed by 5-10 minutes. > > > > Ok then please use: > > > > // STHs perform an async sanity verification, it is fine to delay them much > > after startup. > > constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10); > > content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., > sth_load_delay); > > ping (shall we do a mega delay? that won't result in a better metric but > potential jank right after recording first paint is not great either if really > not needed right away) (lgtm otherwise -- I'll be OOO until Monday starting now so don't wait for me)
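[Editorial note: the scheduling contract gab@ is suggesting — post the STH load with a long fixed delay instead of on the startup-critical path — can be approximated in portable C++ as below. This is only an illustrative sketch; Chrome's blocking pool queues work on existing threads rather than spawning one per task, and none of the Chromium types appear here.]

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Minimal stand-in for PostDelayedTask(): run |task| off the calling
// thread after |delay|. The caller joins (or detaches) the returned
// thread; a real task runner would own this scheduling instead.
std::thread PostDelayedTask(std::function<void()> task,
                            std::chrono::milliseconds delay) {
  return std::thread([task, delay] {
    std::this_thread::sleep_for(delay);  // Nothing startup-critical waits on this.
    task();
  });
}
```

The point of the delay is that the work is correctness-irrelevant to startup: the STHs perform an async sanity verification, so loading them minutes later costs nothing visible to the user.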
On 2016/07/14 10:00:47, Eran Messeri wrote:
> In the latest patchset there's another variation parameter for choosing whether
> to use the SafeJsonParser or parse directly using the JSONReader in-process.
>
> I note jam@ still objects to using Finch here and so may not like that I've
> added *more* finch code.
>
> jam, asvitkine, any chance you could discuss this out of band and update the
> issue with the conclusions?

Yep I tried to VC yesterday but Alexei was in training. I'll try again today.

On 2016/07/14 13:37:57, Alexei Svitkine (slow) wrote:
> patchset 7 LGTM
>
> jam:
>
> Just to be clear, I completely agree that in general, we shouldn't use Finch for
> bug fixes. But this is an exceptional case and here it *does* make sense to use
> it, see my comment #43.
>
> (It's an exceptional case because we're chasing down a massive startup
> regression that we don't yet have full evidence that's caused just by the bug
> this is fixing. Landing without Finch will conflate the signal from this fix
> with other churn on dev channel.)

The bug comments seem clear that we know that this feature is what led to the startup regression, since when it was disabled through finch the regression went away. Given that there's still a finch experiment for enabling/disabling this feature, that can be used to verify that re-enabling it again doesn't result in a regression.

> Just to be clear, the code in patchset 7 is making the new behavior the default
> in the code and once it lands TOT will run the bugfixed version. Then the only
> other extra work would be to go back and spend 10 minutes to clean up the code
> in a follow-up CL, which we can do as soon as Dev branches with this change.
> (The Finch config / launch bug are already in place from the config we used to
> disable the component on canary during the regression investigation, so besides
> cleaning up the code there's no additional overhead.)

Yep I understand all this. 
I still don't think it's necessary, and I'm against setting a precedent of using finch for fixing regressions like this. At the extreme, any feature or even commit can cause performance regressions.
btw we've sent an email to launch-review asking for feedback.
The CQ bit was checked by eranm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Per out of band discussion with asvitkine, will submit this now with finch experiments code in. https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask( On 2016/07/14 14:46:03, gab -- OOO until Monday wrote: > On 2016/07/13 17:29:08, gab wrote: > > On 2016/07/13 10:51:55, Eran Messeri wrote: > > > On 2016/07/12 20:09:24, gab wrote: > > > > Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this > > is > > > > *really* not critical (i.e. it could even be delayed much further after > > > > startup?) Can we make it a 5 or even 10 minutes delay instead? > > > > > > > > PostAfterStartupTask is meant for non-startup critical work which should > > still > > > > run very shortly after startup (i.e. earlier than a fixed delay on fast > > > machines > > > > but later on slow machines). > > > > > > > > PS: In the upcoming TaskScheduler world this will be a perfect candidate > for > > > > BACKGROUND work, but until then we have to play with hackish delays... > > > > > > Yes, loading the STHs can be delayed by 5-10 minutes. > > > > Ok then please use: > > > > // STHs perform an async sanity verification, it is fine to delay them much > > after startup. > > constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10); > > content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., > sth_load_delay); > > ping (shall we do a mega delay? that won't result in a better metric but > potential jank right after recording first paint is not great either if really > not needed right away) To follow up, since we came to the conclusion we can parse the JSON simply using the JSONReader rather than the SafeJsonParser, it seems unnecessary to do such a long delay. 
There's now another variation parameter in place that'll allow us to experiment both types of parsing methods, so if post-startup read using the JSONReader is still too slow we can re-visit the option of a long delay. https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen... File chrome/browser/component_updater/sth_set_component_installer.h (right): https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen... chrome/browser/component_updater/sth_set_component_installer.h:81: // Provides weak_ptrs to this for callbacks. On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > Seems an unnecessary comment Done.
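[Editorial note: on the weak_ptrs comment — the pattern base::WeakPtrFactory provides, where a pending callback silently becomes a no-op once its owner is gone, can be approximated in portable C++ with a std::weak_ptr sentinel. This is an illustrative sketch of the idiom, not Chromium's implementation; the class and member names are invented.]

```cpp
#include <functional>
#include <memory>

// Sketch: a delayed STH-load callback must not touch a destroyed owner.
// A shared_ptr sentinel plays the role of the WeakPtrFactory; callbacks
// hold only a weak_ptr to it and bail out once it has been reset.
class SthLoader {
 public:
  SthLoader() : alive_(std::make_shared<int>(0)) {}

  // Returns a callback that counts a load only while the owner is alive.
  std::function<void()> MakeLoadCallback() {
    std::weak_ptr<int> weak = alive_;
    return [this, weak] {
      if (weak.lock())  // Owner still alive? Otherwise silently no-op.
        ++loads_;
    };
  }

  void Invalidate() { alive_.reset(); }  // Analogous to InvalidateWeakPtrs().
  int loads() const { return loads_; }

 private:
  std::shared_ptr<int> alive_;
  int loads_ = 0;
};
```

This is why the "Provides weak_ptrs to this for callbacks" comment was deemed redundant: the factory's purpose is self-evident from the pattern.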
The CQ bit was checked by eranm@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rsleevi@chromium.org, waffles@chromium.org, jam@chromium.org, gab@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/2140093002/#ps160001 (title: "Removed comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by asvitkine@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 ========== to ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001)
Message was sent while issue was closed.
CQ bit was unchecked.
Message was sent while issue was closed.
Description was changed from ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 ========== to ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 Committed: https://crrev.com/af4e289461794790203de6d050198b7e8e7ad5d1 Cr-Commit-Position: refs/heads/master@{#406009} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/af4e289461794790203de6d050198b7e8e7ad5d1 Cr-Commit-Position: refs/heads/master@{#406009} |