Created: 4 years, 5 months ago by Eran Messeri
Modified: 4 years, 5 months ago
CC: chromium-reviews, darin-cc_chromium.org, RyanS, Ryan Sleevi
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description
STH Set: Delay loading until after start-up.
When loading the STHs, the STHSet Component will invoke the
SafeJsonParser up to half a dozen times (one for each STH), which
can have performance implications.
Also update the PostAfterStartupTask documentation to indicate the
task will be posted immediately if called after start-up.
BUG=607946
Committed: https://crrev.com/af4e289461794790203de6d050198b7e8e7ad5d1
Cr-Commit-Position: refs/heads/master@{#406009}
Patch Set 1 #
Total comments: 4
Patch Set 2 : Directly posting #
Patch Set 3 : Adding variation parameter #
Total comments: 14
Patch Set 4 : WeakPtrFactory, other comments #
Patch Set 5 : Switched back to GetVariationParamValueByFeature #
Total comments: 2
Patch Set 6 : Not using SafeJsonParser #
Total comments: 2
Patch Set 7 : Another variation parameter #
Patch Set 8 : Merging with master #
Patch Set 9 : Removed comment #
Messages
Total messages: 66 (13 generated)
Description was changed.
eranm@chromium.org changed reviewers: + asvitkine@chromium.org, jam@chromium.org, waffles@chromium.org
jam, kindly review the small documentation change to content/public/browser/browser_thread.h. waffles / asvitkine, the STHSet component changes are for your review.
https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:79: base::Bind(&STHSetComponentInstallerTraits::PostStartupLoadSTHsFromDisk,
Why not directly post LoadSTHsFromDisk? (I'm not sure what the purpose of PostStartupLoadSTHsFromDisk is.)
https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:77: content::BrowserThread::PostAfterStartupTask(
Per comment on the bug, please keep the old behavior as well and use variations params (go/variations-params) to choose one behavior over the other. This way, we can confirm with an experiment that this change is working as expected.
Addressed both comments, PTAL.

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:77: content::BrowserThread::PostAfterStartupTask(
On 2016/07/12 14:56:09, Alexei Svitkine wrote:
> Per comment on the bug, please keep the old behavior as well and use variations
> params (go/variations-params) to choose one behavior over the other.
>
> This way, we can confirm with an experiment that this change is working as
> expected.

Done.

https://codereview.chromium.org/2140093002/diff/1/chrome/browser/component_up...
chrome/browser/component_updater/sth_set_component_installer.cc:79: base::Bind(&STHSetComponentInstallerTraits::PostStartupLoadSTHsFromDisk,
On 2016/07/12 13:56:59, waffles wrote:
> Why not directly post LoadSTHsFromDisk? (I'm not sure what the purpose of
> PostStartupLoadSTHsFromDisk is.)

No reason really, removed.
lgtm
I just read through a bunch of the notes on the bug. Why should we load the utility process on each browser startup to do the same task over and over, instead of saving a sanitized version once per comment 138?
On 2016/07/12 16:27:00, jam wrote: > I just read through a bunch of the notes on the bug. > > Why should we load the utility process on each browser startup to do the same > task over and over, instead of saving a sanitized version once per comment 138? To be clear, fresh data is periodically (every day) pushed via the component updater. It's not static. My understanding is that sanitizing the data from the component updater is component-specific - that is, the STHSetComponent will have to sanitize each of the JSON files it receives, save them to disk and load them from disk on start-up, having to distinguish between browser start-up (where the call to ComponentReady should not do anything because sanitized data is available) and regular updates (where the call to ComponentReady should sanitize the input and store it to disk). I do not think the added complexity is justified, simply because the data (Signed Tree Heads from Certificate Transparency logs) is used asynchronously to audit CT logs. It is not used during start-up at all and it is perfectly fine if it is made available later. Right now the STHSetComponent benefits from being very simple thanks to the component updater interface - it does not persist anything to disk by itself, does not have to care about per-profile directories / ChromeOS storage because the component updater takes care of all of that. Adding the logic of handling sanitized/unsanitized data and distinguishing between start-up/regular update scenarios will not provide any functional benefits.
This change looks good % comment below. At the very least, I think it's a good intermediate solution even if we decide to do something more involved like jam suggests later.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:84: base::Unretained(this), GetInstalledPath(install_dir),
As an aside, why is base::Unretained() used here? Is there a guarantee that this class won't be destroyed (e.g. shutting down while it's fetching the component) before the task runs? If there's no obvious such guarantee, please add a WeakPtrFactory as a member and pass a weak pointer. (If it is guaranteed, add a comment.)
gab@chromium.org changed reviewers: + gab@chromium.org
https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:80: "delayed_load") == "yes") {
Flip this condition around (i.e. make the default on trunk, in the absence of a value for the feature, a delayed load); this will make it run on the waterfall (and hence give a cleaner merge to Stable when we confirm it works).

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask(
Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this is *really* not critical (i.e. it could even be delayed much further after startup?). Can we make it a 5 or even 10 minute delay instead?

PostAfterStartupTask is meant for non-startup-critical work which should still run very shortly after startup (i.e. earlier than a fixed delay on fast machines but later on slow machines).

PS: In the upcoming TaskScheduler world this will be a perfect candidate for BACKGROUND work, but until then we have to play with hackish delays...

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:89: base::Bind(&STHSetComponentInstallerTraits::LoadSTHsFromDisk,
Extract:

  const base::Closure load_sths_closure = base::Bind(...);

outside of the conditionals and re-use |load_sths_closure| in both places.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:92: NOTREACHED();
This NOTREACHED() is not necessary. PostTask() will never fail except during late shutdown, in which case we don't care to load STHs, I assume. (Callers of PostTask() pretty much never check the return value unless it is critical to run before shutdown -- and even then the API states that "true" only means the task "may run"...)
On 2016/07/12 19:56:34, Eran Messeri wrote:
> To be clear, fresh data is periodically (every day) pushed via the component
> updater. It's not static.
> [...]
> Adding the logic of handling sanitized/unsanitized data and distinguishing
> between start-up/regular update scenarios will not provide any functional
> benefits.

The motivation is performance and power. On mobile, a user will have many browser startups a day.
Doing redundant work on each startup, even if it's delayed, causes two problems:
- there are a limited number of cores; starting a utility process to sanitize the JSON on each browser startup means the cores are tied up and not handling tasks that the user is waiting on
- the battery drains faster

Regarding per-profile, note as an example that we already have installation-wide data that is also updated frequently: Safe Browsing. For that we also don't download/save it per profile, since it's not user-specific.
I'm fine with the CL as-is, since it seems an improvement over the existing code. But since we're discussing, I will toss in my 2¢:

> My understanding is that sanitizing the data from the component updater is
> component-specific - that is, the STHSetComponent will have to sanitize each of
> the JSON files it receives, save them to disk and load them from disk on
> start-up, having to distinguish between browser start-up (where the call to
> ComponentReady should not do anything because sanitized data is available) and
> regular updates (where the call to ComponentReady should sanitize the input and
> store it to disk).

You're correct that it's component-specific, but if we went this path I think what you would do here is customize OnCustomInstall to sanitize the contents and write them out (in the dir passed to OnCustomInstall) as [filename]_sanitized.json or something like that. OnCustomInstall is called exactly once during each update. ComponentReady can just assume *_sanitized.json exists and load them - no distinction between update and browser boot case.

> Regarding per-profile, note as an example we already have installation wide data
> that is also updated frequently: safe browsing. For that we also don't
> download/save it per profile since it's not user specific.

To clarify, components are not per-profile, but they are per-user-data-dir (so per-OS-user). We don't have a solution for installation-wide installs of components because we believe it requires user→system elevation in system-wide install cases. I am very curious to know if Safe Browsing somehow solved this problem; maybe we can talk out of band?

Can we take a step back and ask: What's the purpose of using SafeJsonParser at all in this situation? Which attacker are we protecting against?
* Compromised Google server? (Seems far-fetched: sending crashy JSON seems less useful than telling the browser to go fetch evil_widevine_implementation.dll. Plus they could just mess with manifest.json, which is already parsed in-process.)
* Network MITM? (Already secured by the component updater using a pinned ECDSA key & SHA-256, defense-in-depth with Google-signed CRX.)
* Evildoer with access to local disk? (Obviously not, if writing post-sanitized files back to disk is an option.)
On 2016/07/12 20:46:07, waffles wrote:
> To clarify, components are not per-profile, but they are per-user-data-dir (so
> per-OS-user). We don't have a solution for an installation-wide installs of
> components because we believe it requires user→system elevation in system-wide
> install cases. I am very curious to know if safe browsing somehow solved this
> problem, maybe we can talk out of band?

Apologies, I wasn't speaking precisely; I meant per user data directory.

> Can we take a step back and ask: What's the purpose of using SafeJsonParser at
> all in this situation? Which attacker are we protecting against?
> [...]

+1, thanks for bringing this up. I was going to, but then I figured that would be obvious and there's a reason it's done this way. But we should verify :)
[+rsleevi, as he originally pointed out JSON parsing should be done in a separate process]. As far as I recall Ryan suggested parsing the Signed Tree Heads in a separate process, but that was when we were fetching the STHs directly from CT logs. Now that the STHs are provided by Google I don't see such a strong argument for using the SafeJsonParser. The only scenario I could think of is if we (Google) push a bad STH Set that not only fails to parse but crashes the parser. In that case crashing the SJP's process is less bad than crashing the browser process itself. I don't know how likely that is to happen.
PTAL - Is everyone, including jam, happy with the CL in its current state?

Note I have switched from GetVariationParamsByFeature to GetVariationParamValue, which takes a trial name, as my understanding of variation params is that they are set at the experiment level, not the feature level. The STHSetComponentStudy trial is now set up to have Disabled/Control experiments, and we'll further split the Control experiment into two: both have the STHSetComponent feature enabled, one with the "delayed_load" param set to "no" and the other with the "delayed_load" param set to "yes", or not set at all.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:80: "delayed_load") == "yes") {
On 2016/07/12 20:09:25, gab wrote:
> Flip this condition around (i.e. make the default on trunk in the absence of a
> value for the feature to be a delayed load), this will make it run on the
> waterfall (and hence a cleaner merge to Stable when we confirm it works).

Done.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask(
On 2016/07/12 20:09:24, gab wrote:
> Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this is
> *really* not critical (i.e. it could even be delayed much further after
> startup?) Can we make it a 5 or even 10 minutes delay instead?
> [...]

Yes, loading the STHs can be delayed by 5-10 minutes.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:84: base::Unretained(this), GetInstalledPath(install_dir),
On 2016/07/12 19:59:27, Alexei Svitkine (slow) wrote:
> As an aside, why is base::Unretained() used here? Is there a guarantee that this
> class won't be destroyed (e.g. shutting down while it's fetching the component)
> before the task runs? If there's no obvious such guarantee, please add a
> WeakPtrFactory as a member and pass a weak pointer.
>
> (If it is guaranteed, add a comment.)

There are no guarantees, so I've switched to a WeakPtrFactory.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:89: base::Bind(&STHSetComponentInstallerTraits::LoadSTHsFromDisk,
On 2016/07/12 20:09:24, gab wrote:
> Extract:
>
>   const base::Closure load_sths_closure = base::Bind(...);
>
> outside of the conditionals and re-use |load_sths_closure| in both places.

Done.

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:92: NOTREACHED();
On 2016/07/12 20:09:24, gab wrote:
> This NOTREACHED() is not necessary. PostTask() will never fail unless during
> late shutdown in which case we don't care to load STHs I assume.
> [...]

Thanks for the explanation, removed. Does it make sense to change the API? The if (!...PostTask()) { NOTREACHED(); } pattern is something I saw scattered around the code quite a few times.
On Jul 13, 2016 6:51 AM, <eranm@chromium.org> wrote:
> PTAL - Is everyone, including jam, happy with the CL in its current state?
>
> Note I have switched from GetVariationParamsByFeature to GetVariationParamValue
> which takes a trial name as my understanding of variation params is that they
> are set at the experiment level, not feature level.
> [...]

GetVariationParamsByFeature should be fine to use, please switch back to using this. It's true they are per experiment, but the experiment is associated with a feature. We might have some old docs, however - so if you saw something that says the former is not supported, let me know and I can fix the docs.
> GetVariationParamsByFeature should be fine to use, please switch back to
> using this. It's true they are per experiment, but the experiment is
> associated with a feature. We might have some old docs, however - so if you
> saw something that says the former is not supported, let me know and I can
> fix the docs.

Done. FWIW, these are the flags I used:

  --enable-features="STHSetComponent<STHSetComponentStudy"
  --force-fieldtrials=STHSetComponentStudy/Control
  --force-fieldtrial-params=STHSetComponentStudy.Control:delayed_load/no
lgtm
lgtm
On 2016/07/13 10:47:43, Eran Messeri wrote:
> [+rsleevi, as he originally pointed out JSON parsing should be done in a
> separate process].
> [...]
> The only scenario I could think of is if we (Google) push a bad STH Set that not
> only fails to parse but crashes the parser. In that case crashing the SJP's
> process is less bad than crashing the browser process itself. I don't know how
> likely that is to happen.

There are many places where we send JSON (or other data types) from Google and parse it in the browser process. The point of the safe JSON parser is to protect against untrusted sources, which isn't the case here.

IMO there's no need to launch separate processes and take the performance/power hit because theoretically Google can send JSON that's so malformed that it crashes our parser. If that happens, we'll get crash reports and we should fix the parser to not crash. But there won't be security issues, because the response from Google won't be crafted to exploit the user.
https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.cc (right):

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask(
On 2016/07/13 10:51:55, Eran Messeri wrote:
> Yes, loading the STHs can be delayed by 5-10 minutes.

Ok, then please use:

  // STHs perform an async sanity verification, it is fine to delay them much
  // after startup.
  constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10);
  content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., sth_load_delay);

https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.cc:92: NOTREACHED();
On 2016/07/13 10:51:55, Eran Messeri wrote:
> Thanks for the explanation, removed. Does it make sense to change the API?

There are far more places that don't have NOTREACHED() than places that do. TBH, I would like for the API to just return void, but it's *way* too widespread for that by now. It's not harmful to NOTREACHED(), but it's not really ever correct either (except in a few low-level corner cases which it's not worth going into here -- but those are the reason why this can't be trivially turned into "void").
rsleevi@chromium.org changed reviewers: + rsleevi@chromium.org
LGTM

As for in-proc/out-of-proc, while jam@'s correct that if it's coming from Google it should be fine to trust (in-proc), my understanding of the current pipeline is that we're just re-bundling externally provided data, from potentially 'untrustworthy' sources, and not validating it on the server.

Put differently, if the pipeline for packaging the data delivered in the component validated, on the server, that it was well-formed, then we'd be safe to parse this data in-browser-process on the clients. But if we're just packaging up the API responses from the logs (as I understood we were), then you should treat it as hostile data. In either event, more documentation in the code about these threat models doesn't hurt :)

https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen...
File chrome/browser/component_updater/sth_set_component_installer.h (right):

https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen...
chrome/browser/component_updater/sth_set_component_installer.h:81: // Provides weak_ptrs to this for callbacks.
Seems an unnecessary comment.
jam@chromium.org changed reviewers: + jschuh@chromium.org
On 2016/07/13 17:22:01, jam wrote:
> There are many places that we send JSON (or other data types) from Google and
> parse it in the browser process. The point of the safe json parser is to protect
> against untrusted sources, which isn't the case here.
> [...]

(Just chatted with Justin in person; adding him as well so he can write his thoughts better than I can transcribe them.)
On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > LGTM > > As for in-proc/out-of-proc, while jam@'s correct that if it's coming from Google > it should be fine to trust (in-proc), my understanding of the current pipeline > is that we're just re-bundling up externally provided data, from potentially > 'untrustworthy' sources, and not validating on the server. > > Put differently, if the pipeline for packaging this data delivered in the > component was validating, on the server, that it was well-formed, then we'd be > safe to parse this data in-browser-process on the clients. But if we're just > packaging up the API responses from the logs (as I understood we were), then you > should treat it as hostile data. In either event, more documentation in the code > about these threat models doesn't hurt :) Ah, thanks for the background. I'll defer to security & net folks to decide on whether this needs to be done in a sandbox or not. If it's in a sandbox, I think we should have sanitized versions so that we don't need to redo work needlessly.
On 2016/07/13 18:30:02, jam wrote: > On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > > LGTM > > > > As for in-proc/out-of-proc, while jam@'s correct that if it's coming from > Google > > it should be fine to trust (in-proc), my understanding of the current pipeline > > is that we're just re-bundling up externally provided data, from potentially > > 'untrustworthy' sources, and not validating on the server. > > > > Put differently, if the pipeline for packaging this data delivered in the > > component was validating, on the server, that it was well-formed, then we'd be > > safe to parse this data in-browser-process on the clients. But if we're just > > packaging up the API responses from the logs (as I understood we were), then > you > > should treat it as hostile data. In either event, more documentation in the > code > > about these threat models doesn't hurt :) > > Ah, thanks for the background. I'll defer to security & net folks to decide on > whether this needs to be done in a sandbox or not. > If it's in a sandbox, I think we should have sanitized versions so that we don't > need to redo work needlessly. We're not just re-bundling up externally provided data, we're re-packaging the JSON after we've parsed it and validated the signature over the STH, so according to this criteria it should be safe to parse in-process. I'll send the links internally.
On 2016/07/13 18:46:54, Eran Messeri wrote: > On 2016/07/13 18:30:02, jam wrote: > > On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > > > LGTM > > > > > > As for in-proc/out-of-proc, while jam@'s correct that if it's coming from > > Google > > > it should be fine to trust (in-proc), my understanding of the current > pipeline > > > is that we're just re-bundling up externally provided data, from potentially > > > 'untrustworthy' sources, and not validating on the server. > > > > > > Put differently, if the pipeline for packaging this data delivered in the > > > component was validating, on the server, that it was well-formed, then we'd > be > > > safe to parse this data in-browser-process on the clients. But if we're just > > > packaging up the API responses from the logs (as I understood we were), then > > you > > > should treat it as hostile data. In either event, more documentation in the > > code > > > about these threat models doesn't hurt :) > > > > Ah, thanks for the background. I'll defer to security & net folks to decide on > > whether this needs to be done in a sandbox or not. > > If it's in a sandbox, I think we should have sanitized versions so that we > don't > > need to redo work needlessly. > > We're not just re-bundling up externally provided data, we're re-packaging the > JSON after we've parsed it and validated the signature over the STH, so > according to this criteria it should be safe to parse in-process. > I'll send the links internally. Thanks for the clarification. Given that information, agree it does sound like we can just parse it in the browser process.
> Thanks for the clarification. Given that information, agree it does sound like > we can just parse it in the browser process. Done, PTAL.
https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... chrome/browser/component_updater/sth_set_component_installer.cc:158: int error_code = 0; Can you keep the old behavior based on variation params so we can see the effect experimentally? Otherwise, it will be hard to judge the effect of this.
lgtm, thanks for the simplification IMO using finch here to keep the old code path is over-kill. We know this feature is what caused the startup regression per the bug. There's very little to gain to keep old code path just so we can verify with a finch experiment what we already know. This seems like not the best use of everyone's time.
How do we know that the new codepath fully restores the regression rather than only partially without being able to A/B test? And no, looking at the value before and after the change is not sufficient due to noise. On Wed, Jul 13, 2016 at 3:31 PM, <jam@chromium.org> wrote: > lgtm, thanks for the simplification > > IMO using finch here to keep the old code path is over-kill. We know this > feature is what caused the startup regression per the bug. There's very > little > to gain to keep old code path just so we can verify with a finch > experiment what > we already know. This seems like not the best use of everyone's time. > > https://codereview.chromium.org/2140093002/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
lgtm (FWIW I agree with jam@ regarding keeping Finch here, unless we are worried about regressions of some sort.)
On 2016/07/13 19:38:04, Alexei Svitkine (slow) wrote: > How do we know that the new codepath fully restores the regression rather > than only partially without being able to A/B test? And no, looking at the > value before and after the change is not sufficient due to noise. We'll see that the number of utility process launches dropped back down. IMO this is a slippery slope where bug fixes now are done through finch experiments. Finch is for experimentation. We don't need experimentation here to know that a feature which launched the utility process at startup slowed things down, and removing the process launching will fix things. > > On Wed, Jul 13, 2016 at 3:31 PM, <mailto:jam@chromium.org> wrote: > > > lgtm, thanks for the simplification > > > > IMO using finch here to keep the old code path is over-kill. We know this > > feature is what caused the startup regression per the bug. There's very > > little > > to gain to keep old code path just so we can verify with a finch > > experiment what > > we already know. This seems like not the best use of everyone's time. > > > > https://codereview.chromium.org/2140093002/ > > > > -- > You received this message because you are subscribed to the Google Groups > "Chromium-reviews" group. > To unsubscribe from this group and stop receiving emails from it, send an email > to mailto:chromium-reviews+unsubscribe@chromium.org.
I agree we do not need Finch in this case. We should run experiments when we need to, but they make everything more complicated. In this case, we've done a lot of analysis and have confidence in the fix. And since we noticed the regression in the first place, it should be equally easy to validate that the fix is working as intended.
Actually, thinking about this more, I think you're right - we can omit the
additional Finch control here.

We already have the hammer set up that turns off the feature entirely via
Finch which we've used to validate this caused a regression. We still need
to get results from Dev for it to see if it really caused the full Dev
regression (there was no Dev release this week and Canary regression is
much smaller I think).

However, that existing set up should be sufficient to check that the
regression is resolved by this CL. Basically, we just need to compare the
Enabled/Disabled groups on versions after this CL to ensure there's no
longer any performance difference between the two.

So, I agree we can avoid the extra Finch control in the JSON code and we
can also remove the variations param check in the startup delay codepath.
Sorry for the opposite advice earlier.

On Wed, Jul 13, 2016 at 3:51 PM, <brettw@chromium.org> wrote:

> I agree we do not need Finch in this case. We should run experiments when
> we
> need to, but they make everything more complicated. In this case, we've
> done a
> lot of analysis and have confidence in the fix. And since we noticed the
> regression in the first place, it should be equally easy to validate that
> the
> fix is working as intended.
>
> https://codereview.chromium.org/2140093002/
On 2016/07/13 20:07:02, Alexei Svitkine (slow) wrote: > Actually, thinking about this more, I think you're right - we can omit the > additional Finch control here. > > We already have the hammer set up that turns off the feature entirely via > Finch which we've used to validate this caused a regression. We still need > to get results from Dev for it to see if it really caused the full Dev > regression (there was no Dev release this week and Canary regression is > much smaller I think). > > However, that existing set up should be sufficient to check that the > regression is resolved by this CL. Basically, we just need to compare the > Enabled/Disabled groups on versions after this CL to ensure there's no > longer any performance difference between the two. Well, that's only true if we don't land this CL before the next Dev as it currently removes the safe JSON parse which was part of the problem whether the feature is on or off. In this case though the regression is large enough (visible on timeline) that I agree we don't need Finch to tell the effect (we do on bugs where the effect is lost in noise but not here). Speaking of which, maybe we can just remove the Finch code altogether? > > So, I agree we can avoid the extra Finch control in the JSON code and we > can also remove the variations param check in the startup delay codepath. > Sorry for the opposite advice earlier. > > On Wed, Jul 13, 2016 at 3:51 PM, <https://mail.google.com/mail/?view=cm&fs=1&tf=1&to=brettw@chromium.org> wrote: > > > I agree we do not need Finch in this case. We should run experiments when > > we > > need to, but they make everything more complicated. In this case, we've > > done a > > lot of analysis and have confidence in the fix. And since we noticed the > > regression in the first place, it should be equally easy to validate that > > the > > fix is working as intended. 
> >
> > https://codereview.chromium.org/2140093002/
I'm fine either way - asvitkine, please make a decision, I'll revise the code according to your decision tomorrow morning (London time). On 13 Jul 2016 9:18 pm, <gab@chromium.org> wrote: On 2016/07/13 20:07:02, Alexei Svitkine (slow) wrote: > Actually, thinking about this more, I think you're right - we can omit the > additional Finch control here. > > We already have the hammer set up that turns off the feature entirely via > Finch which we've used to validate this caused a regression. We still need > to get results from Dev for it to see if it really caused the full Dev > regression (there was no Dev release this week and Canary regression is > much smaller I think). > > However, that existing set up should be sufficient to check that the > regression is resolved by this CL. Basically, we just need to compare the > Enabled/Disabled groups on versions after this CL to ensure there's no > longer any performance difference between the two. Well, that's only true if we don't land this CL before the next Dev as it currently removes the safe JSON parse which was part of the problem whether the feature is on or off. In this case though the regression is large enough (visible on timeline) that I agree we don't need Finch to tell the effect (we do on bugs where the effect is lost in noise but not here). Speaking of which, maybe we can just remove the Finch code altogether? > > So, I agree we can avoid the extra Finch control in the JSON code and we > can also remove the variations param check in the startup delay codepath. > Sorry for the opposite advice earlier. > > On Wed, Jul 13, 2016 at 3:51 PM, <https://mail.google.com/mail/?view=cm&fs=1&tf=1&to=brettw@chromium.org> wrote: > > > I agree we do not need Finch in this case. We should run experiments when > > we > > need to, but they make everything more complicated. In this case, we've > > done a > > lot of analysis and have confidence in the fix. 
> > And since we noticed the
> > regression in the first place, it should be equally easy to validate that
> > the
> > fix is working as intended.
> >
> > https://codereview.chromium.org/2140093002/

https://codereview.chromium.org/2140093002/
On 2016/07/13 20:58:44, Eran Messeri wrote: > I'm fine either way - asvitkine, please make a decision, I'll revise the > code according to your decision tomorrow morning (London time). +1 to removing the old code (that was my expectation if we have consensus that we don't need to use finch)
Okay, I discussed this more with gab@ and rkaplow@ offline and here's some more thoughts on this. They're different from my previous reply. If we just remove all Finch control right now and land this before dev cut, then we risk conflating the result of this change with any other startup regression or improvement that happened since the previous dev (note: and since we didn't have a dev release this week, it's a longer time window than usual). For example, if something regressed startup by 200ms in the mean time and this improves it by 1s, we'll think it improved it by 800ms and miss the other regression. So I don't think this would be acceptable. We shouldn't allow the possibility of new regressions to slip through if we can easily control for it. Also note that Canary data is not very reliable for startup - for the original regression we had a lot of trouble identifying anything on Canary due to noise, we only saw a noticeable effect on dev/beta/stable. So given the above, I can think of two paths forward: 1. We just Finch it (the JSON parser change) - defaulting of course to the fixed behavior by default. The developer overhead is just cleaning it up after, which honestly is like 10 mins of work - probably more than we've spent discussing whether to do it already. 2. We remove all Finch control in this CL, but do not land it until after Dev cut. This way, we can still get data about the dev impact (in particular, which will explain whether this issue explains the full M51 regression or not which we don't have a signal for yet). I am okay with either approach, but have a preference for 1, since then we don't have to delay landing the CL. On Wed, Jul 13, 2016 at 5:05 PM, <jam@chromium.org> wrote: > On 2016/07/13 20:58:44, Eran Messeri wrote: > > I'm fine either way - asvitkine, please make a decision, I'll revise the > > code according to your decision tomorrow morning (London time). 
>
> +1 to removing the old code (that was my expectation if we have consensus
> that
> we don't need to use finch)
>
> https://codereview.chromium.org/2140093002/
On 2016/07/13 21:18:53, Alexei Svitkine (slow) wrote: > Okay, I discussed this more with gab@ and rkaplow@ offline and here's some > more thoughts on this. They're different from my previous reply. > > If we just remove all Finch control right now and land this before dev cut, > then we risk conflating the result of this change with any other startup > regression or improvement that happened since the previous dev (note: and > since we didn't have a dev release this week, it's a longer time window > than usual). The flip side is that there's also less changes than usual because many people are gone. > For example, if something regressed startup by 200ms in the > mean time and this improves it by 1s, we'll think it improved it by 800ms > and miss the other regression. There's a lot of noise in the startup numbers regardless afaik, so I don't think the data is as clear cut to pinpoint small regressions. At the end of the day, there are hundreds of changes landing per day and as a result there could be many other changes landing that change timing. We don't isolate each of them through finch experiments. I still think that we should remove the old code path and finch code in this change, and not use finch for bug fixes. > > So I don't think this would be acceptable. We shouldn't allow the > possibility of new regressions to slip through if we can easily control for > it. > > Also note that Canary data is not very reliable for startup - for the > original regression we had a lot of trouble identifying anything on Canary > due to noise, we only saw a noticeable effect on dev/beta/stable. > > So given the above, I can think of two paths forward: > 1. We just Finch it (the JSON parser change) - defaulting of course to > the fixed behavior by default. The developer overhead is just cleaning it > up after, which honestly is like 10 mins of work - probably more than we've > spent discussing whether to do it already. > 2. 
> We remove all Finch control in this CL, but do not land it until after
> Dev cut. This way, we can still get data about the dev impact (in
> particular, which will explain whether this issue explains the full M51
> regression or not which we don't have a signal for yet).
>
> I am okay with either approach, but have a preference for 1, since then we
> don't have to delay landing the CL.
>
> On Wed, Jul 13, 2016 at 5:05 PM, <mailto:jam@chromium.org> wrote:
>
> > On 2016/07/13 20:58:44, Eran Messeri wrote:
> > > I'm fine either way - asvitkine, please make a decision, I'll revise the
> > > code according to your decision tomorrow morning (London time).
> >
> > +1 to removing the old code (that was my expectation if we have consensus
> > that
> > we don't need to use finch)
> >
> > https://codereview.chromium.org/2140093002/
In the latest patchset there's another variation parameter for choosing whether to use the SafeJsonParser or parse directly using the JSONReader in-process. I note jam@ still objects to using Finch here and so may not like that I've added *more* finch code. jam, asvitkine, any chance you could discuss this out of band and update the issue with the conclusions?
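[Editorial note: the param-gated dispatch being described — choose the parse path from a variations param, with the new in-process behavior as the default — can be sketched roughly as below. The enum, function name, and the "safe_json_parser" param value are hypothetical stand-ins; the thread does not show the actual names used with variations::GetVariationParamValueByFeature() in the CL.]

```cpp
#include <string>

// Hypothetical sketch of the STH component's parse-path choice.
enum class ParseMode { kSafeJsonParser, kInProcessJsonReader };

// An empty or unrecognized param value falls through to the new default
// (in-process parsing), matching "fixed behavior by default": only an
// explicit opt-in keeps the old sandboxed-parser path for the experiment.
ParseMode ChooseParseMode(const std::string& param_value) {
  if (param_value == "safe_json_parser")
    return ParseMode::kSafeJsonParser;
  return ParseMode::kInProcessJsonReader;
}
```

The key property for the experiment is that clients without any Finch config get the fix, while the experiment arm can still flip back to the old path for comparison.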
https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/100001/chrome/browser/compone... chrome/browser/component_updater/sth_set_component_installer.cc:158: int error_code = 0; On 2016/07/13 19:25:01, Alexei Svitkine (slow) wrote: > Can you keep the old behavior based on variation params so we can see the effect > experimentally? Otherwise, it will be hard to judge the effect of this. Done.
patchset 7 LGTM jam: Just to be clear, I completely agree that in general, we shouldn't use Finch for bug fixes. But this is an exceptional case and here it *does* make sense to use it, see my comment #43. (It's an exceptional case because we're chasing down a massive startup regression that we don't yet have full evidence that's caused just by the bug this is fixing. Landing without Finch will conflate the signal from this fix with other churn on dev channel.) Just to be clear, the code in patchset 7 is making the new behavior the default in the code and once it lands TOT will run the bugfixed version. Then the only other extra work would be to go back and spend 10 minutes to clean up the code in a follow-up CL, which we can do as soon as Dev branches with this change. (The Finch config / launch bug are already in place from the config we used to disable the component on canary during the regression investigation, so besides cleaning up the code there's no additional overhead.)
https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask( On 2016/07/13 17:29:08, gab wrote: > On 2016/07/13 10:51:55, Eran Messeri wrote: > > On 2016/07/12 20:09:24, gab wrote: > > > Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this > is > > > *really* not critical (i.e. it could even be delayed much further after > > > startup?) Can we make it a 5 or even 10 minutes delay instead? > > > > > > PostAfterStartupTask is meant for non-startup critical work which should > still > > > run very shortly after startup (i.e. earlier than a fixed delay on fast > > machines > > > but later on slow machines). > > > > > > PS: In the upcoming TaskScheduler world this will be a perfect candidate for > > > BACKGROUND work, but until then we have to play with hackish delays... > > > > Yes, loading the STHs can be delayed by 5-10 minutes. > > Ok then please use: > > // STHs perform an async sanity verification, it is fine to delay them much > after startup. > constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10); > content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., sth_load_delay); ping (shall we do a mega delay? that won't result in a better metric but potential jank right after recording first paint is not great either if really not needed right away)
On 2016/07/14 14:46:03, gab wrote: > https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... > File chrome/browser/component_updater/sth_set_component_installer.cc (right): > > https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... > chrome/browser/component_updater/sth_set_component_installer.cc:81: > content::BrowserThread::PostAfterStartupTask( > On 2016/07/13 17:29:08, gab wrote: > > On 2016/07/13 10:51:55, Eran Messeri wrote: > > > On 2016/07/12 20:09:24, gab wrote: > > > > Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this > > is > > > > *really* not critical (i.e. it could even be delayed much further after > > > > startup?) Can we make it a 5 or even 10 minutes delay instead? > > > > > > > > PostAfterStartupTask is meant for non-startup critical work which should > > still > > > > run very shortly after startup (i.e. earlier than a fixed delay on fast > > > machines > > > > but later on slow machines). > > > > > > > > PS: In the upcoming TaskScheduler world this will be a perfect candidate > for > > > > BACKGROUND work, but until then we have to play with hackish delays... > > > > > > Yes, loading the STHs can be delayed by 5-10 minutes. > > > > Ok then please use: > > > > // STHs perform an async sanity verification, it is fine to delay them much > > after startup. > > constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10); > > content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., > sth_load_delay); > > ping (shall we do a mega delay? that won't result in a better metric but > potential jank right after recording first paint is not great either if really > not needed right away) (lgtm otherwise -- I'll be OOO until Monday starting now so don't wait for me)
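[Editorial note: the scheduling contract gab@ is suggesting — post the STH load with a long fixed delay instead of on the startup-critical path — can be approximated in portable C++ as below. This is only an illustrative sketch; Chrome's blocking pool queues work on existing threads rather than spawning one per task, and none of the Chromium types appear here.]

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Minimal stand-in for PostDelayedTask(): run |task| off the calling
// thread after |delay|. The caller joins (or detaches) the returned
// thread; a real task runner would own this scheduling instead.
std::thread PostDelayedTask(std::function<void()> task,
                            std::chrono::milliseconds delay) {
  return std::thread([task, delay] {
    std::this_thread::sleep_for(delay);  // Nothing startup-critical waits on this.
    task();
  });
}
```

The point of the delay is that the work is correctness-irrelevant to startup: the STHs perform an async sanity verification, so loading them minutes later costs nothing visible to the user.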
On 2016/07/14 10:00:47, Eran Messeri wrote:
> In the latest patchset there's another variation parameter for choosing whether
> to use the SafeJsonParser or parse directly using the JSONReader in-process.
>
> I note jam@ still objects to using Finch here and so may not like that I've
> added *more* finch code.
>
> jam, asvitkine, any chance you could discuss this out of band and update the
> issue with the conclusions?

Yep I tried to VC yesterday but Alexei was in training. I'll try again today.

On 2016/07/14 13:37:57, Alexei Svitkine (slow) wrote:
> patchset 7 LGTM
>
> jam:
>
> Just to be clear, I completely agree that in general, we shouldn't use Finch for
> bug fixes. But this is an exceptional case and here it *does* make sense to use
> it, see my comment #43.
>
> (It's an exceptional case because we're chasing down a massive startup
> regression that we don't yet have full evidence that's caused just by the bug
> this is fixing. Landing without Finch will conflate the signal from this fix
> with other churn on dev channel.)

The bug comments seem clear that we know that this feature is what led to the startup regression, since when it was disabled through finch the regression went away. Given that there's still a finch experiment for enabling/disabling this feature, that can be used to verify that re-enabling it again doesn't result in a regression.

> Just to be clear, the code in patchset 7 is making the new behavior the default
> in the code and once it lands TOT will run the bugfixed version. Then the only
> other extra work would be to go back and spend 10 minutes to clean up the code
> in a follow-up CL, which we can do as soon as Dev branches with this change.
> (The Finch config / launch bug are already in place from the config we used to
> disable the component on canary during the regression investigation, so besides
> cleaning up the code there's no additional overhead.)

Yep I understand all this. 
I still don't think it's necessary, and I'm against setting a precedent of using finch for fixing regressions like this. At the extreme, any feature or even commit can cause performance regressions.
btw we've sent an email to launch-review asking for feedback.
The CQ bit was checked by eranm@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Per out of band discussion with asvitkine, will submit this now with finch experiments code in. https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... File chrome/browser/component_updater/sth_set_component_installer.cc (right): https://codereview.chromium.org/2140093002/diff/40001/chrome/browser/componen... chrome/browser/component_updater/sth_set_component_installer.cc:81: content::BrowserThread::PostAfterStartupTask( On 2016/07/14 14:46:03, gab -- OOO until Monday wrote: > On 2016/07/13 17:29:08, gab wrote: > > On 2016/07/13 10:51:55, Eran Messeri wrote: > > > On 2016/07/12 20:09:24, gab wrote: > > > > Actually, IIUC from https://codereview.chromium.org/2140093002/#msg10 this > > is > > > > *really* not critical (i.e. it could even be delayed much further after > > > > startup?) Can we make it a 5 or even 10 minutes delay instead? > > > > > > > > PostAfterStartupTask is meant for non-startup critical work which should > > still > > > > run very shortly after startup (i.e. earlier than a fixed delay on fast > > > machines > > > > but later on slow machines). > > > > > > > > PS: In the upcoming TaskScheduler world this will be a perfect candidate > for > > > > BACKGROUND work, but until then we have to play with hackish delays... > > > > > > Yes, loading the STHs can be delayed by 5-10 minutes. > > > > Ok then please use: > > > > // STHs perform an async sanity verification, it is fine to delay them much > > after startup. > > constexpr base::TimeDelta sth_load_delay = base::TimeDelta::FromMinutes(10); > > content::BrowserThread::GetBlockingPool()->PostDelayedTask(..., > sth_load_delay); > > ping (shall we do a mega delay? that won't result in a better metric but > potential jank right after recording first paint is not great either if really > not needed right away) To follow up, since we came to the conclusion we can parse the JSON simply using the JSONReader rather than the SafeJsonParser, it seems unnecessary to do such a long delay. 
There's now another variation parameter in place that'll allow us to experiment both types of parsing methods, so if post-startup read using the JSONReader is still too slow we can re-visit the option of a long delay. https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen... File chrome/browser/component_updater/sth_set_component_installer.h (right): https://codereview.chromium.org/2140093002/diff/80001/chrome/browser/componen... chrome/browser/component_updater/sth_set_component_installer.h:81: // Provides weak_ptrs to this for callbacks. On 2016/07/13 17:42:19, Ryan Sleevi (extremely slow) wrote: > Seems an unnecessary comment Done.
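[Editorial note: on the weak_ptrs comment — the pattern base::WeakPtrFactory provides, where a pending callback silently becomes a no-op once its owner is gone, can be approximated in portable C++ with a std::weak_ptr sentinel. This is an illustrative sketch of the idiom, not Chromium's implementation; the class and member names are invented.]

```cpp
#include <functional>
#include <memory>

// Sketch: a delayed STH-load callback must not touch a destroyed owner.
// A shared_ptr sentinel plays the role of the WeakPtrFactory; callbacks
// hold only a weak_ptr to it and bail out once it has been reset.
class SthLoader {
 public:
  SthLoader() : alive_(std::make_shared<int>(0)) {}

  // Returns a callback that counts a load only while the owner is alive.
  std::function<void()> MakeLoadCallback() {
    std::weak_ptr<int> weak = alive_;
    return [this, weak] {
      if (weak.lock())  // Owner still alive? Otherwise silently no-op.
        ++loads_;
    };
  }

  void Invalidate() { alive_.reset(); }  // Analogous to InvalidateWeakPtrs().
  int loads() const { return loads_; }

 private:
  std::shared_ptr<int> alive_;
  int loads_ = 0;
};
```

This is why the "Provides weak_ptrs to this for callbacks" comment was deemed redundant: the factory's purpose is self-evident from the pattern.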
The CQ bit was checked by eranm@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rsleevi@chromium.org, waffles@chromium.org, jam@chromium.org, gab@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/2140093002/#ps160001 (title: "Removed comment")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by asvitkine@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 ========== to ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001)
Message was sent while issue was closed.
CQ bit was unchecked.
Message was sent while issue was closed.
Description was changed from ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 ========== to ========== STH Set: Delay loading until after start-up. When loading the STHs, the STHSet Component will invoke the SafeJsonParser up to half a dozen times (one for each STH), which can have performance implications. Also update the PostAfterStartupTask documentation to indicate the task will be posted immediately if called after start-up. BUG=607946 Committed: https://crrev.com/af4e289461794790203de6d050198b7e8e7ad5d1 Cr-Commit-Position: refs/heads/master@{#406009} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/af4e289461794790203de6d050198b7e8e7ad5d1 Cr-Commit-Position: refs/heads/master@{#406009} |