|
|
Created:
5 years, 1 month ago by Charlie Harrison Modified:
5 years ago Reviewers:
Bryan McQuade CC:
chromium-reviews, droger+watchlist_chromium.org, blundell+watchlist_chromium.org, sdefresne+watchlist_chromium.org, loading-reviews+metrics_chromium.org, csharrison+watch_chromium.org, asvitkine+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@plm_navigation_start Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Description[page_load_metrics] User Initiated Abort Tracking
Brief implementation design document here:
https://docs.google.com/document/d/110xLqjh9pRBfUevRZYgJevyBoq36uulzkUvTYhDveR4/edit#
BUG=517209
Patch Set 1 #
Total comments: 19
Patch Set 2 : Bryan partial review #Patch Set 3 : Add 100ms condition to overriding ABORT_OTHER. Remove ABORT_OTHER for committed loads #
Total comments: 17
Depends on Patchset: Messages
Total messages: 9 (2 generated)
csharrison@chromium.org changed reviewers: + bmcquade@chromium.org
PTAL. Not in scope: adding this to observers. I'll add more tests when you sign off on the design. Thanks!
Thanks, this looks really good! My only concern is around having to keep track of aborted_provisional_loads_. IIUC, the reason we do this is to be able to update their abort reason in a subsequent callback, but it feels unfortunate and in some cases it seems like we could end up producing the wrong result (see my comments for details). I'm not sure there's anything we can do to address that, other than perhaps annotating the navhandle to record the abort reason as part of DidFinishNavigation (and I don't know if we even have the state needed to do that at the point where DidFinishNavigation gets invoked). Let's discuss today - will find some time on calendar. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... File components/page_load_metrics/browser/metrics_web_contents_observer.cc (right): https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:230: PAGE_LOAD_HISTOGRAM(kHistogramCommittedAbortBackground, this does feel like a user-initiated abort, but i worry that since it's different from the typical definition of an aborted load, consumers of this in the UMA dash might misinterpret it. maybe it's better to call this a 'background before first paint' and not mix the abort concept in here. wdyt? if we do that, seems like this might be better logged in RecordTimingHistograms as e.g. PageLoad.Timing2.NavigationToFirstBackgroundBeforePaint https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:241: // Loads are not considered aborts if they painted before the abort event. i'm inclined to either ignore aborts when a tab is backgrounded, or if you think tracking time-to-abort for backgrounded tabs is useful, separating them into '.background'-suffixed histograms. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:531: : error == net::ERR_ABORTED ? PROVISIONAL_LOAD_ERR_ABORTED IIRC downloads/204s report ERR_ABORTED, so we'll end up counting them as aborts currently. I like your TODO above - wondering if it's important for us to fix it so we can differentiate these cases. I guess as-is we can at least know that these aborts will end up in the 'other' category and we can look at it with skepticism for now. Still though I worry about consumers of this data not understanding the subtleties and drawing incorrect conclusions. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:536: aborted_provisional_loads_.push_back(finished_nav.Pass()); the aborted_provisional_loads_ member makes me a bit uneasy. seems like we hold on to these in order to find a possibly more accurate abort reason in the future. but it also seems like we could end up attributing the wrong abort reason in some cases. for example, imagine we get a 204 response which leads to an ERR_ABORTED, which we classify here as ABORT_OTHER. Then many seconds in the future, before the next nav commits (perhaps the user just never navigates again), the user closes the tab and the destructor runs, causing us to update the abort type to ABORT_CLOSE with the abort timestamp from when the 204 completed. This seems incorrect to me - the 'close' event was not really associated with the 204 that happened a bit in the past. It seems like there is a good bit of complexity here though, and your solution may be the best we can do given the constraints, but I'd like to talk things through to see if there are any simplifications we can make. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:540: finished_nav->RecordProvisionalEvent(PROVISIONAL_LOAD_COMMITTED); just noticing now - unrelated to this change - should this call to record a PROVISIONAL_LOAD_COMMITTED event be after the IsSamePage check below? I seem to remember feeling this was correct at the time it was added but now I'm not sure. This is not something we need to fix in this CL - but just want to make sure it doesn't get overlooked. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:564: aborted_provisional_loads_.clear(); should we clear aborted_provisional_loads_ even if IsRelevantNavigation returns false? should this be moved further up, near committed_load_.reset()? https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... File components/page_load_metrics/browser/metrics_web_contents_observer.h (right): https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:79: "PageLoad.Timing2.Aborts.Provisional.ForwardBackBeforeFirstPaint"; can we use 'BeforePaint' rather than 'BeforeFirstPaint' to shorten these names a bit? https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:79: "PageLoad.Timing2.Aborts.Provisional.ForwardBackBeforeFirstPaint"; do you think the provisional vs committed distinction is important here? from a user standpoint whether we aborted before vs after commit doesn't seem all that significant to me. i could see it making sense to track a general time-to-commit stat to track how quickly we get to commit (for e.g. plznav) but i'm not sure we also need to track time-to-abort before vs after commit. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:79: "PageLoad.Timing2.Aborts.Provisional.ForwardBackBeforeFirstPaint"; nit: let's stick to singular rather than pluran in these names, so 'Abort' rather than 'Aborts'. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:213: ABORT_NULL, most enums I see use a _NONE suffix for this, rather than _NULL.
Address most of the comments except aborted_provisional_load_. Thanks for the review. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... File components/page_load_metrics/browser/metrics_web_contents_observer.cc (right): https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:230: PAGE_LOAD_HISTOGRAM(kHistogramCommittedAbortBackground, On 2015/11/17 13:33:08, Bryan McQuade wrote: > this does feel like a user-initiated abort, but i worry that since it's > different from the typical definition of an aborted load, consumers of this in > the UMA dash might misinterpret it. maybe it's better to call this a 'background > before first paint' and not mix the abort concept in here. wdyt? if we do that, > seems like this might be better logged in RecordTimingHistograms as e.g. > PageLoad.Timing2.NavigationToFirstBackgroundBeforePaint I could go either way. I only put it here because it got brought up multiple times in reference to user abort discussions. Moving it sgtm. Note that this will end up tracking only committed, relevant loads. I think that's fine. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:241: // Loads are not considered aborts if they painted before the abort event. On 2015/11/17 13:33:08, Bryan McQuade wrote: > i'm inclined to either ignore aborts when a tab is backgrounded, or if you think > tracking time-to-abort for backgrounded tabs is useful, separating them into > '.background'-suffixed histograms. I'm inclined to ignore them for now and add later if we feel we need them. Good catch. Done. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:531: : error == net::ERR_ABORTED ? PROVISIONAL_LOAD_ERR_ABORTED On 2015/11/17 13:33:08, Bryan McQuade wrote: > IIRC downloads/204s report ERR_ABORTED, so we'll end up counting them as aborts > currently. I like your TODO above - wondering if it's important for us to fix it > so we can differentiate these cases. I guess as-is we can at least know that > these aborts will end up in the 'other' category and we can look at it with > skepticism for now. Still though I worry about consumers of this data not > understanding the subtleties and drawing incorrect conclusions. Let's see. It's possible clamy@ will land her change before this one. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:540: finished_nav->RecordProvisionalEvent(PROVISIONAL_LOAD_COMMITTED); On 2015/11/17 13:33:08, Bryan McQuade wrote: > just noticing now - unrelated to this change - should this call to record a > PROVISIONAL_LOAD_COMMITTED event be after the IsSamePage check below? I seem to > remember feeling this was correct at the time it was added but now I'm not sure. > This is not something we need to fix in this CL - but just want to make sure it > doesn't get overlooked. This was intentional. If we want we can break up the provisional loads into another class (same-page). This was done so all the provisional events add up. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.cc:564: aborted_provisional_loads_.clear(); On 2015/11/17 13:33:08, Bryan McQuade wrote: > should we clear aborted_provisional_loads_ even if IsRelevantNavigation returns > false? should this be moved further up, near committed_load_.reset()? Yupp good idea. Done. https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... File components/page_load_metrics/browser/metrics_web_contents_observer.h (right): https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:79: "PageLoad.Timing2.Aborts.Provisional.ForwardBackBeforeFirstPaint"; On 2015/11/17 13:33:08, Bryan McQuade wrote: > do you think the provisional vs committed distinction is important here? from a > user standpoint whether we aborted before vs after commit doesn't seem all that > significant to me. i could see it making sense to track a general time-to-commit > stat to track how quickly we get to commit (for e.g. plznav) but i'm not sure we > also need to track time-to-abort before vs after commit. Yeah the distinction is not great, but there are two main reasons to do this: - We don't know if a provisional load is relevant - We have much coarser heuristics for provisional loads https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:79: "PageLoad.Timing2.Aborts.Provisional.ForwardBackBeforeFirstPaint"; On 2015/11/17 13:33:08, Bryan McQuade wrote: > can we use 'BeforePaint' rather than 'BeforeFirstPaint' to shorten these names a > bit? Done. (BeforePaint) https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:79: "PageLoad.Timing2.Aborts.Provisional.ForwardBackBeforeFirstPaint"; On 2015/11/17 13:33:08, Bryan McQuade wrote: > nit: let's stick to singular rather than pluran in these names, so 'Abort' > rather than 'Aborts'. Done. (Aborts => Abort) https://codereview.chromium.org/1449253002/diff/1/components/page_load_metric... components/page_load_metrics/browser/metrics_web_contents_observer.h:213: ABORT_NULL, On 2015/11/17 13:33:08, Bryan McQuade wrote: > most enums I see use a _NONE suffix for this, rather than _NULL. Done.
Updated per discussion about adding a 100ms delay check on overriding abort causes. This new change keeps the aborted_provisional_load_ vector, but only overrides their abort cause if the timestamp was within 100ms of the load aborting. For some edge cases (like reloading a zillion times really fast), this can cause many loads to be bucketed in OTHER (because they were aborted by another provisional load whose transition we don't know, and their abort was > 100ms from the committed load). However, the other buckets will be more or less correct.
Some thoughts specifically on metric naming. I'll look at the implementation next. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... File components/page_load_metrics/browser/metrics_web_contents_observer.cc (right): https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:253: } can we add: else { DLOG(FATAL) << "Received ABORT_OTHER for committed load."; } https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... File components/page_load_metrics/browser/metrics_web_contents_observer.h (right): https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.h:69: "PageLoad.Timing2.NavigationToFirstBackgroundBeforePaint"; What do you think about using '.BeforePaint' as a suffix here? PageLoad.Timing2.NavigationToFirstBackground.BeforePaint This uses the same suffix as the proposal for aborts below, and IMO makes the name slightly more readable as well. And then (see comment below for more details) we can also use the .BeforeCommit suffix to track backgrounds of provisional loads: PageLoad.Timing2.NavigationToFirstBackground.BeforeCommit (a provisional load is also implicitly before first paint, since we can't paint until we commit). https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.h:81: "PageLoad.Timing2.Abort.Provisional.ForwardBackBeforePaint"; nit: ForwardBack -> ForwardBackNavigation (even though it's slightly longer) https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.h:81: "PageLoad.Timing2.Abort.Provisional.ForwardBackBeforePaint"; I know there are a bunch of different ways to name these, but I'm wondering if we can: * make them a bit shorter * name them such that they are self describing if you say the name aloud Including Timing2 in the name is technically correct since these are timing metrics, but I'd claim that these belong in their own top-level PageLoad metric category for aborts. Maybe 'Timing2' was named over generally since my intent for that category was to measure time to successful points during the page load. With all that in mind, how about using more suffixes. Example: Bare metric names are: PageLoad.Abort.ForwardBackNavigation PageLoad.Abort.Reload PageLoad.Abort.Stop etc but we would never log bare metrics, and only log with certain suffixes, below: .BeforeCommit suffix for provisional loads (note that this is also implicitly before paint, since paints only happen after commit): PageLoad.Abort.ForwardBackNavigation.BeforeCommit PageLoad.Abort.Reload.BeforeCommit etc .AfterCommit.BeforePaint suffix for things after commit but before paint: PageLoad.Abort.Reload.AfterCommit.BeforePaint we could consider dropping .AfterCommit and using just '.BeforePaint' which makes the metric name shorter: PageLoad.Abort.Reload.BeforePaint but I worry that consumers of these metrics would not realize that PageLoad.Abort.Reload.BeforePaint doesn't include uncommitted reloads before paint, and draw incorrect conclusions. So in this case I think it's better to be verbose and indicate that the metric tracks aborts after commit but before paint, by using the .AfterCommit.BeforePaint suffix. WDYT? https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.h:92: const char kHistogramProvisionalAbortBackground[] = looks like we aren't logging this anymore, but it does still seem like a useful metric. wdyt about logging this as PageLoad.Timing2.NavigationToFirstBackground.BeforeCommit? This makes for an additional case where we can use the .BeforeCommit suffix to log metrics for provisional loads, in addition to the proposal for aborts above. note that any load before commit is also before first paint, so .BeforeCommit implies before paint as well.
looks good overall. initial comments. will finish tomorrow. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... File components/page_load_metrics/browser/metrics_web_contents_observer.cc (right): https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:86: if (ui::PageTransitionCoreTypeIs(transition, ui::PAGE_TRANSITION_RELOAD)) any reason we use PageTransitionCoreTypeIs here but not in the 'else if' below? https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:221: // methods. Let's link to your design doc here so people reading the code can get more insight on why the 100ms threshold is the best overall option, given the complexity tradeoffs. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:225: abort_time_ = std::min(abort_time_, timestamp); do we ever expect timestamp to be less than abort_time_? on first glance that seems like it should never happen. if you don't expect it, i'd rather do: DCHECK_LE(abort_time_, timestamp); abort_time_ = timestamp; https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:383: if (!background_time_.is_null()) { should this be moved up above the timing_.IsEmpty() check as well given that these metrics are not based on data in PageLoadTiming? Perhaps add a TODO to address this in a follow up change. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:537: if (event != PROVISIONAL_LOAD_ERR_FAILED_NON_ABORT) { if event == PROVISIONAL_LOAD_STOPPED, should we pass ABORT_STOP to NotifyAbort? or does PROVISIONAL_LOAD_STOPPED have a slightly different meaning than the STOP in ABORT_STOP (in which case is there a better name we should use instead of PROVISIONAL_LOAD_STOPPED)?
I'm going to transition to using the new observer system, fwiw. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... File components/page_load_metrics/browser/metrics_web_contents_observer.cc (right): https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:86: if (ui::PageTransitionCoreTypeIs(transition, ui::PAGE_TRANSITION_RELOAD)) On 2015/11/23 22:39:44, Bryan McQuade wrote: > any reason we use PageTransitionCoreTypeIs here but not in the 'else if' below? PAGE_TRANSITION_FORWARD_BACK is a qualifier, not a core type. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:221: // methods. On 2015/11/23 22:39:44, Bryan McQuade wrote: > Let's link to your design doc here so people reading the code can get more > insight on why the 100ms threshold is the best overall option, given the > complexity tradeoffs. Done. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:225: abort_time_ = std::min(abort_time_, timestamp); On 2015/11/23 22:39:44, Bryan McQuade wrote: > do we ever expect timestamp to be less than abort_time_? on first glance that > seems like it should never happen. if you don't expect it, i'd rather do: > > DCHECK_LE(abort_time_, timestamp); > abort_time_ = timestamp; I think it is possible. For instance if we get an ABORT_OTHER which gets replaced by a navigation start. We always want the earliest value. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:253: } On 2015/11/23 21:33:42, Bryan McQuade wrote: > can we add: > else { > DLOG(FATAL) << "Received ABORT_OTHER for committed load."; > } Done. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.cc:537: if (event != PROVISIONAL_LOAD_ERR_FAILED_NON_ABORT) { On 2015/11/23 22:39:44, Bryan McQuade wrote: > if event == PROVISIONAL_LOAD_STOPPED, should we pass ABORT_STOP to NotifyAbort? > or does PROVISIONAL_LOAD_STOPPED have a slightly different meaning than the STOP > in ABORT_STOP (in which case is there a better name we should use instead of > PROVISIONAL_LOAD_STOPPED)? I think it should represent stop, but I'm not fully sure. Right now they supposedly should not be occuring but we're still seeing some samples. I think best to be safe for now and we can continue investigating. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... File components/page_load_metrics/browser/metrics_web_contents_observer.h (right): https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.h:81: "PageLoad.Timing2.Abort.Provisional.ForwardBackBeforePaint"; On 2015/11/23 21:33:43, Bryan McQuade wrote: > nit: ForwardBack -> ForwardBackNavigation (even though it's slightly longer) Done. https://codereview.chromium.org/1449253002/diff/40001/components/page_load_me... components/page_load_metrics/browser/metrics_web_contents_observer.h:92: const char kHistogramProvisionalAbortBackground[] = On 2015/11/23 21:33:42, Bryan McQuade wrote: > looks like we aren't logging this anymore, but it does still seem like a useful > metric. wdyt about logging this as > PageLoad.Timing2.NavigationToFirstBackground.BeforeCommit? This makes for an > additional case where we can use the .BeforeCommit suffix to log metrics for > provisional loads, in addition to the proposal for aborts above. note that any > load before commit is also before first paint, so .BeforeCommit implies before > paint as well. Yeah we can log this. I convinced myself it's useful.
Description was changed from ========== [page_load_metrics] User Initiated Abort Tracking Brief implementation design document here: https://docs.google.com/document/d/110xLqjh9pRBfUevRZYgJevyBoq36uulzkUvTYhDve... BUG=517209 ========== to ========== [page_load_metrics] User Initiated Abort Tracking Brief implementation design document here: https://docs.google.com/document/d/110xLqjh9pRBfUevRZYgJevyBoq36uulzkUvTYhDve... BUG=517209 ========== |