|
|
Chromium Code Reviews
DescriptionAdd NetworkTimeTracker UMA histograms
This CL adds histograms to count:
- the number of attempted time query fetches
- the number of fetches that succeeded
- the number of fetches that fetched valid data
- the number of fetches that failed (and their net error
code, if applicable)
This will help us investigate unexpectedly large numbers of
Canary users who do not have network time available.
This CL also fixes a small logic bug where a response had to have
both a non-200 status code *and* a non-success status to be
considered an error.
BUG=589700
TBR=zea@chromium.org
Committed: https://crrev.com/be8e2e6e69a68977785d2deb3d76b867a8abb88a
Cr-Commit-Position: refs/heads/master@{#409566}
Patch Set 1 #
Total comments: 17
Patch Set 2 : mab comments #
Total comments: 2
Patch Set 3 : add a comment about net::OK in the unit test #
Total comments: 4
Patch Set 4 : holte comments #
Messages
Total messages: 48 (27 generated)
Description was changed from ========== Add NetworkTimeTracker UMA histograms BUG=589700 ========== to ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. BUG=589700 ==========
Description was changed from ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. BUG=589700 ========== to ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. This CL also fixes a small logic bug where a response had to have both a non-200 status code *and* a non-success status to be considered an error. BUG=589700 ==========
estark@chromium.org changed reviewers: + mab@google.com
mab, WDYT?
https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:373: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchAttempted", true); Is a Boolean the idiomatic way to count things? https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:382: if (time_fetcher_->GetStatus().status() != net::URLRequestStatus::SUCCESS || Yikes, good catch. https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:387: -time_fetcher_->GetStatus().error()); Why negative? Is that the convention for errors? Is error() well-defined for all cases? https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:391: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchSucceeded", true); I think you could get away with having just one histogram that records just one outcome (including success) per attempt. That feels easier to read. In particular, do you have a theory about how an attempt would be started without subsequently invoking UpdateTimeFromResponse()? Or is part of the goal here to detect whether this is happening? It would be nice to be explicit about this detail.
(no code changes yet, clarifying question inline) https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:373: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchAttempted", true); On 2016/07/26 21:05:48, mab wrote: > Is a Boolean the idiomatic way to count things? I have been told that it is, though admittedly I can't remember who told me that and don't think it's documented anywhere. There are other examples of this, though (https://cs.chromium.org/chromium/src/chrome/browser/chromeos/first_run/first_..., https://cs.chromium.org/chromium/src/chrome/browser/extensions/api/gcm/gcm_ap..., https://cs.chromium.org/chromium/src/chrome/browser/extensions/chrome_content..., https://cs.chromium.org/chromium/src/chrome/browser/profiles/profile_manager....). https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:387: -time_fetcher_->GetStatus().error()); On 2016/07/26 21:05:48, mab wrote: > Why negative? Is that the convention for errors? Net error codes are negative (see net/base/net_error_list.h) but histograms.xml defines a parallel enum that is positive (https://cs.chromium.org/chromium/src/tools/metrics/histograms/histograms.xml?...). :( I'll add a comment. > > Is error() well-defined for all cases? Believe so. It will be net::OK in some cases (e.g. connection succeeded but server responded with a 500) but I think that's okay. If we are getting large numbers of UpdateTimeFetchFailed with net::OK as the error code, then I think we should be able to diagnose on the server what non-200 status codes it's sending. https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:391: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchSucceeded", true); On 2016/07/26 21:05:48, mab wrote: > I think you could get away with having just one histogram that records just one > outcome (including success) per attempt. That feels easier to read. > > In particular, do you have a theory about how an attempt would be started > without subsequently invoking UpdateTimeFromResponse()? Or is part of the goal > here to detect whether this is happening? It would be nice to be explicit about > this detail. By "just one histogram", do you mean collapsing UpdateTimeFetchAttempted and UpdateTimeFetchSucceeded into one, or collapsing all 4 into one (including UpdateTimeFetchFailed and UpdateTimeFetchValid)? If the latter, I'm not sure how I would collapse UpdateTimeFetchFailed and continue to record the error code. I could collapse UpdateTimeFetchSucceeded and UpdateTimeFetchValid into one very easily, I should at the very least do that. As for why I separated Attempted and Succeeded, I was thinking that a request could hang in some way that messes things up, e.g. we are issuing a query before the previous one finishes or something like that. Do you think that sounds remotely plausible? If so I'll leave them separate but add a comment about why they are separate.
https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:373: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchAttempted", true); Ack. https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:387: -time_fetcher_->GetStatus().error()); OK, SG. https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:391: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchSucceeded", true); I meant collapse all 4 into one. Is the problem that there is no way to reserve values outside the space of valid error() codes, to represent various the various non-transport-error possibilities? Or is just that this would be ugly? (I don't reject these arguments; just want to understand what you're saying.) To your second point: I'm trying to imagine what we could learn from the numbers. It seems implausible that Succeeded would be greater than Attempted. They might be ~equal. Or, possibly, Attempted might be greater than Failed + Succeeded by enough that it's weird. I guess I don't object to checking for that possibility, but I posit that this can only be the result of a logic error. (And your comment makes me think you may agree with this.) If so, perhaps it makes sense to remove the Attempted counter in the future if we don't see the condition it's looking for.
In the latest patch set I combined the Succeeded and Valid histograms into one. https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:391: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchSucceeded", true); On 2016/07/27 02:47:56, mab wrote: > I meant collapse all 4 into one. Is the problem that there is no way to reserve > values outside the space of valid error() codes, to represent various the > various non-transport-error possibilities? Or is just that this would be ugly? > (I don't reject these arguments; just want to understand what you're saying.) Yeah, I don't know of any way to do that... but a metrics owner will have to approve the histograms.xml changes so maybe they will come along and enlighten me. :) > > To your second point: I'm trying to imagine what we could learn from the > numbers. It seems implausible that Succeeded would be greater than Attempted. > They might be ~equal. Or, possibly, Attempted might be greater than Failed + > Succeeded by enough that it's weird. > > I guess I don't object to checking for that possibility, but I posit that this > can only be the result of a logic error. (And your comment makes me think you > may agree with this.) If so, perhaps it makes sense to remove the Attempted > counter in the future if we don't see the condition it's looking for. That sounds reasonable, and yes I think I agree with you that it would be a logic error. How about we put it on Canary for a few days and then remove it if it's not interesting?
lgtm https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:382: if (time_fetcher_->GetStatus().status() != net::URLRequestStatus::SUCCESS || (I take it your metrics would have caught this bug?) https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:391: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchSucceeded", true); Sure. Can you document that intent somewhere outside this CL? https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker_unittest.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker_unittest.cc:601: histograms.ExpectBucketCount(kFetchFailedHistogram, net::OK, 1); Why is this net::OK for an error response? https://codereview.chromium.org/2176373003/diff/20001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/20001/components/network_time... components/network_time/network_time_tracker.cc:389: -time_fetcher_->GetStatus().error()); What do you think about recording this in the success case as well as the error case, just for completeness?
The CQ bit was checked by estark@chromium.org to run a CQ dry run
https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:382: if (time_fetcher_->GetStatus().status() != net::URLRequestStatus::SUCCESS || On 2016/07/28 06:58:54, mab wrote: > (I take it your metrics would have caught this bug?) Presumably, yeah -- one of the tests I added failed. https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:391: UMA_HISTOGRAM_BOOLEAN("NetworkTimeTracker.UpdateTimeFetchSucceeded", true); On 2016/07/28 06:58:55, mab wrote: > Sure. Can you document that intent somewhere outside this CL? Filed https://crbug.com/632419 https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... File components/network_time/network_time_tracker_unittest.cc (right): https://codereview.chromium.org/2176373003/diff/1/components/network_time/net... components/network_time/network_time_tracker_unittest.cc:601: histograms.ExpectBucketCount(kFetchFailedHistogram, net::OK, 1); On 2016/07/28 06:58:55, mab wrote: > Why is this net::OK for an error response? net::OK because there was no network error, just a non-200 status code. I could have sworn I had a comment here to explain that. O.o I guess I was thinking of the histogram description, so I'll add a comment here. I don't think we need to histogram the HTTP status codes that we receive because I assume we can get that data from the server. https://codereview.chromium.org/2176373003/diff/20001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/20001/components/network_time... components/network_time/network_time_tracker.cc:389: -time_fetcher_->GetStatus().error()); On 2016/07/28 06:58:55, mab wrote: > What do you think about recording this in the success case as well as the error > case, just for completeness? Hmm, we'd have no way to differentiate success from net::OK-with-a-non-200-status-code in that case. I think we'd have to do two histograms, one for net error code and one for HTTP status code. (Where we could only record the latter in non-net::OK cases, and doesn't give us any information that we couldn't get from the server.) I think that confuses things a lot just for completeness.
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
estark@chromium.org changed reviewers: + holte@chromium.org, zea@chromium.org
zea: can you please review components/network_time_tracker? holte: can you please review histograms.xml? Thanks!
https://codereview.chromium.org/2176373003/diff/40001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/40001/components/network_time... components/network_time/network_time_tracker.cc:398: UMA_HISTOGRAM_BOOLEAN(kHistogramName, false); This macro uses a static pointer to the underlying Histogram object, so it is generally prefered to use the histogram name inline, and only invoke the macro from a single place. You can coalesce these by using a helper function in the anonymous namespace. https://codereview.chromium.org/2176373003/diff/40001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2176373003/diff/40001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:34630: +<histogram name="NetworkTimeTracker.UpdateTimeFetchValid"> Add enum="BooleanValid"
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by estark@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Patchset #4 (id:60001) has been deleted
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-gn on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-gn/bui...) ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) ios-simulator-gn on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-gn/...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by estark@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2176373003/diff/40001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2176373003/diff/40001/components/network_time... components/network_time/network_time_tracker.cc:398: UMA_HISTOGRAM_BOOLEAN(kHistogramName, false); On 2016/07/28 19:22:46, Steven Holte wrote: > This macro uses a static pointer to the underlying Histogram object, so it is > generally prefered to use the histogram name inline, and only invoke the macro > from a single place. You can coalesce these by using a helper function in the > anonymous namespace. Done. https://codereview.chromium.org/2176373003/diff/40001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2176373003/diff/40001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:34630: +<histogram name="NetworkTimeTracker.UpdateTimeFetchValid"> On 2016/07/28 19:22:46, Steven Holte wrote: > Add enum="BooleanValid" Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
m7md.aljehani@gmail.com changed reviewers: + m7md.aljehani@gmail.com
lgtm
The patchset sent to the CQ was uploaded after l-g-t-m from mab@google.com Link to the patchset: https://codereview.chromium.org/2176373003/#ps80001 (title: "holte comments")
Friendly ping to zea and holte
estark@chromium.org changed reviewers: - m7md.aljehani@gmail.com
lgtm
estark@chromium.org changed reviewers: + jochen@chromium.org
Hi Jochen -- The only components/network_time_tracker OWNER (zea@) is OOO right now, so I'm wondering if you'd be comfortable reviewing this. It's not super urgent so I can wait till he gets back if you'd rather, but thought I'd check. Thanks!
jochen@chromium.org changed reviewers: + m7md.aljehani@gmail.com
lgtm
Description was changed from ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. This CL also fixes a small logic bug where a response had to have both a non-200 status code *and* a non-success status to be considered an error. BUG=589700 ========== to ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. This CL also fixes a small logic bug where a response had to have both a non-200 status code *and* a non-success status to be considered an error. BUG=589700 TBR=zea@chromium.org ==========
The CQ bit was checked by estark@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by estark@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. This CL also fixes a small logic bug where a response had to have both a non-200 status code *and* a non-success status to be considered an error. BUG=589700 TBR=zea@chromium.org ========== to ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. This CL also fixes a small logic bug where a response had to have both a non-200 status code *and* a non-success status to be considered an error. BUG=589700 TBR=zea@chromium.org ==========
Message was sent while issue was closed.
Committed patchset #4 (id:80001)
Message was sent while issue was closed.
Description was changed from ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. This CL also fixes a small logic bug where a response had to have both a non-200 status code *and* a non-success status to be considered an error. BUG=589700 TBR=zea@chromium.org ========== to ========== Add NetworkTimeTracker UMA histograms This CL adds histograms to count: - the number of attempted time query fetches - the number of fetches that succeeded - the number of fetches that fetched valid data - the number of fetches that failed (and their net error code, if applicable) This will help us investigate unexpectedly large numbers of Canary users who do not have network time available. This CL also fixes a small logic bug where a response had to have both a non-200 status code *and* a non-success status to be considered an error. BUG=589700 TBR=zea@chromium.org Committed: https://crrev.com/be8e2e6e69a68977785d2deb3d76b867a8abb88a Cr-Commit-Position: refs/heads/master@{#409566} ==========
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/be8e2e6e69a68977785d2deb3d76b867a8abb88a Cr-Commit-Position: refs/heads/master@{#409566} |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
