|
|
Created:
5 years, 7 months ago by Randy Smith (Not in Mondays) Modified:
5 years, 7 months ago CC:
chromium-reviews, darin-cc_chromium.org, jam, asvitkine+watch_chromium.org Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionCreate histograms to track time-to-error values for different errors.
This is only done for main frame loads (so that it matches the behavior
of Net.ErrorCodesForMainFrame3), and is done for the dominant errors
seen in the wild. It also includes successful loads (for comparison) and all
non-dominant errors in a final bucket (for completeness).
BUG=487663
R=mmenke@chromium.org
R=jkarlin@chromium.org
Committed: https://crrev.com/8b7c064122b1d0a28718672a44bcaea99f0d4d91
Cr-Commit-Position: refs/heads/master@{#329858}
Patch Set 1 #Patch Set 2 : Merged to revision p329442. #
Total comments: 10
Patch Set 3 : Incorporated comments. #
Total comments: 2
Patch Set 4 : Added Connection Timed Out to error list. #
Total comments: 2
Patch Set 5 : Add name not resolved to histograms.xml. #
Total comments: 3
Patch Set 6 : Used histogram suffixes. #Patch Set 7 : Shifted separator to '.' #
Total comments: 2
Patch Set 8 : Switched header file. #
Messages
Total messages: 38 (7 generated)
Josh: Could you review this with an eye for how useful these histograms are for examining chrome behavior in poor network situations. Matt: Could you review this with an eye towards how well the code fits into resource_dispatcher_host_impl.cc? Thanks!
https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:867: // the aggregate remainder errors. High level question: What's the motivation for adding timing hooks here instead of in https://code.google.com/p/chromium/codesearch#chromium/src/net/url_request/ur... Looks like there's also a cancel time there. The times here do include redirects, and those don't, but adding yet another incomparable set of time histograms. I suppose I do want to get rid of the LOAD_MAIN_FRAME flag, and we can't only filter out the non-main frame resources there without using it. https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:870: switch (loader->request()->status().error()) { Looking at error without looking at loader->request()->status().status() makes me nervous. https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:873: "Net.RequestTimeToSuccess", request_loading_time); optional: May want to add another dot in here, to make them stand out more. Net.RequestTime.Blah https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:875: case net::ERR_ABORTED: We should include loader->request()->status().status() == CANCELED in this block, too.
lgtm https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:890: break; Perhaps ERR_CONNECTION_REFUSED, ERR_CONNECTION_TIMED_OUT, and ERR_NAME_NOT_RESOLVED as well? Those all seem to show up a fair bit in Net.ErrorCodesForMainFrame3 UMA.
On 2015/05/12 18:23:26, mmenke wrote: > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > File content/browser/loader/resource_dispatcher_host_impl.cc (right): > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:867: // the aggregate > remainder errors. > High level question: > > What's the motivation for adding timing hooks here instead of in > https://code.google.com/p/chromium/codesearch#chromium/src/net/url_request/ur... > > Looks like there's also a cancel time there. The times here do include > redirects, and those don't, but adding yet another incomparable set of time > histograms. > > I suppose I do want to get rid of the LOAD_MAIN_FRAME flag, and we can't only > filter out the non-main frame resources there without using it. > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:870: switch > (loader->request()->status().error()) { > Looking at error without looking at loader->request()->status().status() makes > me nervous. > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:873: > "Net.RequestTimeToSuccess", request_loading_time); > optional: May want to add another dot in here, to make them stand out more. > > Net.RequestTime.Blah > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:875: case > net::ERR_ABORTED: > We should include loader->request()->status().status() == CANCELED in this > block, too. These all see to have a similar theme, so I'm responding at a high level, and when we resolve the high level issue I'll go back to the details and make sure I've taken care of everything. The basic issue here is that I'm trying to create stats in parallel to Net.ErrorCodesForMainFrame3, because that's got a fairly direct connection to the (bad) experience the user has that's driven by network loads. That's why the location in the code, why I'm just looking at status().error(). Do the concerns you've raised apply to Net.ErrorCodesForMainFrame3? If so, do you have a sense as to how/if that histogram is going to be changed to address those concerns? I don't have a major objection to moving the histograms to url_request_http_job.cc, as long as I can keep them main frame only; what would your feelings about that be?
On 2015/05/12 18:32:21, rdsmith wrote: > On 2015/05/12 18:23:26, mmenke wrote: > > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > > File content/browser/loader/resource_dispatcher_host_impl.cc (right): > > > > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > > content/browser/loader/resource_dispatcher_host_impl.cc:867: // the aggregate > > remainder errors. > > High level question: > > > > What's the motivation for adding timing hooks here instead of in > > > https://code.google.com/p/chromium/codesearch#chromium/src/net/url_request/ur... > > > > Looks like there's also a cancel time there. The times here do include > > redirects, and those don't, but adding yet another incomparable set of time > > histograms. > > > > I suppose I do want to get rid of the LOAD_MAIN_FRAME flag, and we can't only > > filter out the non-main frame resources there without using it. > > > > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > > content/browser/loader/resource_dispatcher_host_impl.cc:870: switch > > (loader->request()->status().error()) { > > Looking at error without looking at loader->request()->status().status() makes > > me nervous. > > > > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > > content/browser/loader/resource_dispatcher_host_impl.cc:873: > > "Net.RequestTimeToSuccess", request_loading_time); > > optional: May want to add another dot in here, to make them stand out more. > > > > Net.RequestTime.Blah > > > > > https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... > > content/browser/loader/resource_dispatcher_host_impl.cc:875: case > > net::ERR_ABORTED: > > We should include loader->request()->status().status() == CANCELED in this > > block, too. > > These all see to have a similar theme, so I'm responding at a high level, and > when we resolve the high level issue I'll go back to the details and make sure > I've taken care of everything. The basic issue here is that I'm trying to > create stats in parallel to Net.ErrorCodesForMainFrame3, because that's got a > fairly direct connection to the (bad) experience the user has that's driven by > network loads. That's why the location in the code, why I'm just looking at > status().error(). Do the concerns you've raised apply to > Net.ErrorCodesForMainFrame3? If so, do you have a sense as to how/if that > histogram is going to be changed to address those concerns? > > I don't have a major objection to moving the histograms to > url_request_http_job.cc, as long as I can keep them main frame only; what would > your feelings about that be? (Just a sub-note that it's not my ideal because then I can't assume that the Net.ErrorCodesForMainFrame3 histogram is directly comparable with these outputs, but I don't think that's a really high priority.)
On 2015/05/12 18:33:00, rdsmith wrote: > On 2015/05/12 18:32:21, rdsmith wrote: > > These all see to have a similar theme, so I'm responding at a high level, and > > when we resolve the high level issue I'll go back to the details and make sure > > I've taken care of everything. The basic issue here is that I'm trying to > > create stats in parallel to Net.ErrorCodesForMainFrame3, because that's got a > > fairly direct connection to the (bad) experience the user has that's driven by > > network loads. That's why the location in the code, why I'm just looking at > > status().error(). Do the concerns you've raised apply to > > Net.ErrorCodesForMainFrame3? If so, do you have a sense as to how/if that > > histogram is going to be changed to address those concerns? They do. URLRequest::DoCancel has the fun and exciting behavior of setting an error other than ERR_ABORTED along with URLRequestStatus::Canceled, which is weird. https://code.google.com/p/chromium/codesearch#chromium/src/content/browser/lo... is a lovely bit of code that handles the URLRequestStatus::CANCELED / FAILED and error() == net::OK cases. I had been thinking there was a more general case where we were mapping CANCELED to ERR_ABORTED, but I must be misremembering - I can't find it. Given AsyncResourceHandler's behavior, looks like it's best to keep as-is, for now. > > I don't have a major objection to moving the histograms to > > url_request_http_job.cc, as long as I can keep them main frame only; what > would > > your feelings about that be? > (Just a sub-note that it's not my ideal because then I can't assume that the > Net.ErrorCodesForMainFrame3 histogram is directly comparable with these outputs, > but I don't think that's a really high priority.) I'm thinking this may be the way to go - both to correlate with the Net.ErrorCodesForMainFrame3 histogram, and because I want to get the load flag out of net.
Next round. Josh, Matt: PTAL? https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:867: // the aggregate remainder errors. On 2015/05/12 18:23:26, mmenke wrote: > High level question: > > What's the motivation for adding timing hooks here instead of in > https://code.google.com/p/chromium/codesearch#chromium/src/net/url_request/ur... > > Looks like there's also a cancel time there. The times here do include > redirects, and those don't, but adding yet another incomparable set of time > histograms. > > I suppose I do want to get rid of the LOAD_MAIN_FRAME flag, and we can't only > filter out the non-main frame resources there without using it. Discussed on main CL thread; IMO no code changes came from discussion. https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:870: switch (loader->request()->status().error()) { On 2015/05/12 18:23:26, mmenke wrote: > Looking at error without looking at loader->request()->status().status() makes > me nervous. Believe main thread discussion left this as ok to be consistent wiht the ErrorCodesForMainFrame3 histogram above. https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:873: "Net.RequestTimeToSuccess", request_loading_time); On 2015/05/12 18:23:26, mmenke wrote: > optional: May want to add another dot in here, to make them stand out more. > > Net.RequestTime.Blah Good idea; done. https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:875: case net::ERR_ABORTED: On 2015/05/12 18:23:26, mmenke wrote: > We should include loader->request()->status().status() == CANCELED in this > block, too. Left as is for compatibility with ErrorCodesForMainFrame3 above as per main thread discussion. https://codereview.chromium.org/1127383009/diff/20001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:890: break; On 2015/05/12 18:26:06, jkarlin wrote: > Perhaps ERR_CONNECTION_REFUSED, ERR_CONNECTION_TIMED_OUT, and > ERR_NAME_NOT_RESOLVED as well? Those all seem to show up a fair bit in > Net.ErrorCodesForMainFrame3 UMA. ERR_NAME_NOT_RESOLVED was an oversight on my part; added. CONNECTION_REFUSED doesn't seem that high to me (close to a tenth of INTERNET_DISCONNECTED/NAME_NOT_RESOLVED in an android only everything UMA view) and I can't find CONNECTION_TIMED_OUT at all (as opposed to TIMED_OUT, which I'm already including). I've also sorted these by name, which might have been the confusion with the timed out errors.
https://codereview.chromium.org/1127383009/diff/40001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/40001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:891: case net::ERR_TIMED_OUT: Josh was right about ERR_CONNECTION_TIMED_OUT - TIMED_OUT is pretty uncommon. I think we only get it on TCP keep-alive timeouts. ERR_CONNECTION_TIMED_OUT is when it takes too long to establish a connection, and is pretty clearly what you want here.
https://codereview.chromium.org/1127383009/diff/40001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/40001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:891: case net::ERR_TIMED_OUT: On 2015/05/12 20:57:00, mmenke wrote: > Josh was right about ERR_CONNECTION_TIMED_OUT - TIMED_OUT is pretty uncommon. I > think we only get it on TCP keep-alive timeouts. ERR_CONNECTION_TIMED_OUT is > when it takes too long to establish a connection, and is pretty clearly what you > want here. Oh, we also get it when we sent a TCP packet and never get a response, I guess.
So I've added CONNECTION_TIMED_OUT, but in all the stats I'm looking at (which are mobile and EM focussed), TIMED_OUT shows up at about double the frequency. FWIW. PTAL?
On 2015/05/13 15:27:25, rdsmith wrote: > So I've added CONNECTION_TIMED_OUT, but in all the stats I'm looking at (which > are mobile and EM focussed), TIMED_OUT shows up at about double the frequency. > FWIW. > > PTAL? Just FYI, iOS does not use use RDH, so this won't give us those. Could you file a bug for this? Be nice to have some record of the reason we're looking at these, for the record, and allows for better evaluation of what errors we should care about here.
On 2015/05/13 15:39:40, mmenke wrote: > On 2015/05/13 15:27:25, rdsmith wrote: > > So I've added CONNECTION_TIMED_OUT, but in all the stats I'm looking at (which > > are mobile and EM focussed), TIMED_OUT shows up at about double the frequency. > > > FWIW. > > > > PTAL? > > Just FYI, iOS does not use use RDH, so this won't give us those. "Those" being numbers on iOS. > Could you file a bug for this? Be nice to have some record of the reason we're > looking at these, for the record, and allows for better evaluation of what > errors we should care about here.
On 2015/05/13 15:39:40, mmenke wrote: > On 2015/05/13 15:27:25, rdsmith wrote: > > So I've added CONNECTION_TIMED_OUT, but in all the stats I'm looking at (which > > are mobile and EM focussed), TIMED_OUT shows up at about double the frequency. > > > FWIW. > > > > PTAL? > > Just FYI, iOS does not use use RDH, so this won't give us those. > > Could you file a bug for this? Be nice to have some record of the reason we're > looking at these, for the record, and allows for better evaluation of what > errors we should care about here. Bug filed. Given your comments about iOS I think I'm not going to worry about it.
https://codereview.chromium.org/1127383009/diff/60001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/60001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:893: "Net.RequestTime.ErrNameNotResolved", request_loading_time); This doesn't have corresponding xml data.
https://codereview.chromium.org/1127383009/diff/60001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/60001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:893: "Net.RequestTime.ErrNameNotResolved", request_loading_time); On 2015/05/13 16:26:57, jkarlin wrote: > This doesn't have corresponding xml data. *rolls eyes at self* Fixed.
nit: In description, dominate errors -> dominant errors. LGTM
On 2015/05/13 16:39:27, mmenke wrote: > nit: In description, dominate errors -> dominant errors. > > LGTM (Note that nit applies twice)
On 2015/05/13 16:39:47, mmenke wrote: > On 2015/05/13 16:39:27, mmenke wrote: > > nit: In description, dominate errors -> dominant errors. > > > > LGTM > > (Note that nit applies twice) Fixed. Thanks!
The CQ bit was checked by rdsmith@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from jkarlin@chromium.org Link to the patchset: https://codereview.chromium.org/1127383009/#ps80001 (title: "Add name not resolved to histograms.xml.")
The CQ bit was unchecked by rdsmith@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1127383009/80001
rdsmith@chromium.org changed reviewers: + asvitkine@chromium.org
Alexei: Could you take a look at histograms.xml?
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: chromium_presubmit on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:21158: +<histogram name="Net.RequestTime.ErrAborted"> You can use the histogram_suffixes feature to make this more concise - see docs at the top of the file and examples throughout.
Alexei: PTAL? Matt: Note that I'm now using "_" instead of "." as the separator between "RequestTime" and the particular type of request time. https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:21158: +<histogram name="Net.RequestTime.ErrAborted"> On 2015/05/13 16:55:30, Alexei Svitkine wrote: > You can use the histogram_suffixes feature to make this more concise - see docs > at the top of the file and examples throughout. I didn't find a way to use the same names; it looked from the docs like using suffices for real always used "_", and using suffix with order=prefix put things after the first ".", not the second. So I switched names and used histogram_suffixes; please let e know if I did something wrong.
https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:21158: +<histogram name="Net.RequestTime.ErrAborted"> On 2015/05/13 18:29:56, rdsmith wrote: > On 2015/05/13 16:55:30, Alexei Svitkine wrote: > > You can use the histogram_suffixes feature to make this more concise - see > docs > > at the top of the file and examples throughout. > > I didn't find a way to use the same names; it looked from the docs like using > suffices for real always used "_", and using suffix with order=prefix put things > after the first ".", not the second. So I switched names and used > histogram_suffixes; please let e know if I did something wrong. You can specify separator="." if you'd like.
On 2015/05/13 18:46:18, Alexei Svitkine wrote: > https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... > File tools/metrics/histograms/histograms.xml (right): > > https://codereview.chromium.org/1127383009/diff/80001/tools/metrics/histogram... > tools/metrics/histograms/histograms.xml:21158: +<histogram > name="Net.RequestTime.ErrAborted"> > On 2015/05/13 18:29:56, rdsmith wrote: > > On 2015/05/13 16:55:30, Alexei Svitkine wrote: > > > You can use the histogram_suffixes feature to make this more concise - see > > docs > > > at the top of the file and examples throughout. > > > > I didn't find a way to use the same names; it looked from the docs like using > > suffices for real always used "_", and using suffix with order=prefix put > things > > after the first ".", not the second. So I switched names and used > > histogram_suffixes; please let e know if I did something wrong. > > You can specify separator="." if you'd like. I would actually. Hold for new PS.
New PS uploaded. PTAL.
lgtm % nit https://codereview.chromium.org/1127383009/diff/120001/content/browser/loader... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/120001/content/browser/loader... content/browser/loader/resource_dispatcher_host_impl.cc:22: #include "base/metrics/histogram.h" Nit: Can you switch this to histogram_macros.h instead?
Thanks! https://codereview.chromium.org/1127383009/diff/120001/content/browser/loader... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1127383009/diff/120001/content/browser/loader... content/browser/loader/resource_dispatcher_host_impl.cc:22: #include "base/metrics/histogram.h" On 2015/05/14 14:56:04, Alexei Svitkine wrote: > Nit: Can you switch this to histogram_macros.h instead? Done.
The CQ bit was checked by rdsmith@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from jkarlin@chromium.org, mmenke@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/1127383009/#ps140001 (title: "Switched header file.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1127383009/140001
Message was sent while issue was closed.
Committed patchset #8 (id:140001)
Message was sent while issue was closed.
Patchset 8 (id:??) landed as https://crrev.com/8b7c064122b1d0a28718672a44bcaea99f0d4d91 Cr-Commit-Position: refs/heads/master@{#329858} |