|
|
DescriptionWhen network time is unavailable, record the reason in UMA
Network time is frequently unavailable when we hit certificate date
errors. This CL histograms additional information about why network time
is unavailable (specifically, whether no time query has been made or
whether the system clock lost sync since the last time query).
To do so, I deprecated the old network time histogram that shared an
enum with the build time histogram, and introduced a new network
time histogram with a network time enum (rather than adding more
network-specific values to the enum that is shared with the build time
histogram).
BUG=
Committed: https://crrev.com/62a431efd87c068af27f689f7f6ef0fa7b44ffe5
Cr-Commit-Position: refs/heads/master@{#415244}
Patch Set 1 #
Total comments: 11
Patch Set 2 : rebase #Patch Set 3 : mab comments #Patch Set 4 : fix compile failure #
Total comments: 6
Patch Set 5 : mab comments #
Total comments: 1
Patch Set 6 : rebase #Patch Set 7 : holte comments #
Total comments: 2
Patch Set 8 : don't use kClockDivergenceSeconds in histogram buckets #Patch Set 9 : fix maximums to not overflow, and fix unit tests #
Messages
Total messages: 50 (25 generated)
Description was changed from ========== When network time is unavailable, record the reason in UMA Network time is frequently unavailable when we hit certificate date errors. This CL histograms additional information about why network time is unavailable (specifically, whether no time query has been made or whether the system clock lost sync since the last time query). To do so, I deprecated the old network time histogram that shared an enum with the build time histogram, and introduced a new network timehistogram with a network time enum (rather than adding more network-specific values to the enum that is shared with the build time histogram). BUG= ========== to ========== When network time is unavailable, record the reason in UMA Network time is frequently unavailable when we hit certificate date errors. This CL histograms additional information about why network time is unavailable (specifically, whether no time query has been made or whether the system clock lost sync since the last time query). To do so, I deprecated the old network time histogram that shared an enum with the build time histogram, and introduced a new network time histogram with a network time enum (rather than adding more network-specific values to the enum that is shared with the build time histogram). BUG= ==========
estark@chromium.org changed reviewers: + mab@google.com
mab, PTAL?
lgtm https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:206: time_at_last_measurement_ = base::Time::FromJsTime(time_js); Totally optional, but I've thought for a while maybe we should just get rid of storing time in prefs. It might make startup behavior easier to reason about. https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.h (right): https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.h:59: NETWORK_TIME_SYNC_LOST, I was thinking this morning that if we've once obtained sync, and if we trust the TickClock, we at least know a lower bound on the time. I don't know if this is useful. :-) https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.h:62: NETWORK_TIME_NO_SYNC, There's an implementation detail here that you may or may not want to expose. Either the TickClock is ahead of the Clock, or the Clock is ahead of the TickClock. Might it be useful to know which? https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.h:91: base::TimeDelta* uncertainty) const; Most callers don't care about the reason for failure. You could do this if you wanted to avoid changing callers. But there are so few callers maybe it isn't worth it. bool GetNetworkTime(base::Time* network_time, base::TimeDelta* uncertainty, NetworkTimeResult* status = nullptr) https://codereview.chromium.org/2254433003/diff/1/components/ssl_errors/error... File components/ssl_errors/error_classification.cc (right): https://codereview.chromium.org/2254433003/diff/1/components/ssl_errors/error... components/ssl_errors/error_classification.cc:238: network_state = NETWORK_CLOCK_STATE_FUTURE; I'm not sure how to fix this, but it sounds a bit funny. It's not the network clock that's in the future, but the system clock!
The CQ bit was checked by estark@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
estark@chromium.org changed reviewers: + holte@chromium.org, thestig@chromium.org, zea@chromium.org
zea, can you please review components/network_time? holte, can you please review histograms.xml? thestig, can you please review chrome/browser/upgrade_detector_impl.cc? https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.cc:206: time_at_last_measurement_ = base::Time::FromJsTime(time_js); On 2016/08/16 19:04:52, mab wrote: > Totally optional, but I've thought for a while maybe we should just get rid of > storing time in prefs. It might make startup behavior easier to reason about. +1, the prefs stuff hurts my brain. Will save for a different CL though. https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... File components/network_time/network_time_tracker.h (right): https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.h:59: NETWORK_TIME_SYNC_LOST, On 2016/08/16 19:04:52, mab wrote: > I was thinking this morning that if we've once obtained sync, and if we trust > the TickClock, we at least know a lower bound on the time. I don't know if this > is useful. :-) Acknowledged. Might be useful; I was also thinking that if it turns out that we just can't keep sync well, then maybe we should revisit the strategy of querying on-demand when we hit ERR_CERT_DATE_INVALID... https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.h:62: NETWORK_TIME_NO_SYNC, On 2016/08/16 19:04:52, mab wrote: > There's an implementation detail here that you may or may not want to expose. > Either the TickClock is ahead of the Clock, or the Clock is ahead of the > TickClock. Might it be useful to know which? It occurred to me that knowing the amount of divergence might be useful (maybe kClockDivergenceSeconds is a little too small and we could maintain "sync" a lot more often if we relax the definition of "synced"). So I added another histogram to record the difference of the two clocks, which will also reveal which was ahead of the other. https://codereview.chromium.org/2254433003/diff/1/components/network_time/net... components/network_time/network_time_tracker.h:91: base::TimeDelta* uncertainty) const; On 2016/08/16 19:04:52, mab wrote: > Most callers don't care about the reason for failure. You could do this if you > wanted to avoid changing callers. But there are so few callers maybe it isn't > worth it. > > bool GetNetworkTime(base::Time* network_time, base::TimeDelta* uncertainty, > NetworkTimeResult* status = nullptr) Acknowledged. Decided to leave as-is because I already changed the small number of callers, plus I think default arguments might be discouraged or forbidden in Chromium. https://codereview.chromium.org/2254433003/diff/1/components/ssl_errors/error... File components/ssl_errors/error_classification.cc (right): https://codereview.chromium.org/2254433003/diff/1/components/ssl_errors/error... components/ssl_errors/error_classification.cc:238: network_state = NETWORK_CLOCK_STATE_FUTURE; On 2016/08/16 19:04:52, mab wrote: > I'm not sure how to fix this, but it sounds a bit funny. It's not the network > clock that's in the future, but the system clock! Changed it to NETWORK_CLOCK_STATE_CLOCK_IN_FUTURE (/PAST) which is kinda clunky but hopefully slightly less confusing.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_compile_dbg_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by estark@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm https://codereview.chromium.org/2254433003/diff/60001/chrome/browser/upgrade_... File chrome/browser/upgrade_detector_impl.cc (right): https://codereview.chromium.org/2254433003/diff/60001/chrome/browser/upgrade_... chrome/browser/upgrade_detector_impl.cc:415: network_time::NetworkTimeTracker::NETWORK_TIME_AVAILABLE) { network time, network time, network time!
lgtm
https://codereview.chromium.org/2254433003/diff/1/components/ssl_errors/error... File components/ssl_errors/error_classification.cc (right): https://codereview.chromium.org/2254433003/diff/1/components/ssl_errors/error... components/ssl_errors/error_classification.cc:238: network_state = NETWORK_CLOCK_STATE_FUTURE; _DINOSAURS and _FLYING_CARS? https://codereview.chromium.org/2254433003/diff/60001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/60001/components/network_time... components/network_time/network_time_tracker.cc:325: return NETWORK_TIME_SYNC_LOST; Do you not want to record the new histogram in this case as well? https://codereview.chromium.org/2254433003/diff/60001/components/network_time... components/network_time/network_time_tracker.cc:335: divergence.InSeconds()); Is this always positive? It would be nice to get a sign as well as a magnitude, so we know which clock is ahead of the other.
The CQ bit was checked by estark@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2254433003/diff/60001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/60001/components/network_time... components/network_time/network_time_tracker.cc:325: return NETWORK_TIME_SYNC_LOST; On 2016/08/19 01:51:08, mab wrote: > Do you not want to record the new histogram in this case as well? I think it would be clearest to have this be yet another histogram. NetworkTimeTracker.ClockDivergence is the tick delta compared to the time delta, whereas this is the wall clock compared to itself. Hmm, guess I might as well add that one as well. https://codereview.chromium.org/2254433003/diff/60001/components/network_time... components/network_time/network_time_tracker.cc:335: divergence.InSeconds()); On 2016/08/19 01:51:07, mab wrote: > Is this always positive? It would be nice to get a sign as well as a magnitude, > so we know which clock is ahead of the other. This should be signed. My bad for not writing a test to make it obvious. Added one now.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2254433003/diff/80001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/80001/components/network_time... components/network_time/network_time_tracker.cc:336: UMA_HISTOGRAM_SPARSE_SLOWLY("NetworkTimeTracker.ClockDivergence", Using sparse histogram here seems a little surprising. Are you sure that this won't get called frequently with many different values? Sparse histogram will use more memory for each distinct value recorded. It can also be problematic to process this histogram if you get bunch of distinct values from different users, since this histogram won't be aggregate well.
On 2016/08/19 19:24:52, Steven Holte wrote: > https://codereview.chromium.org/2254433003/diff/80001/components/network_time... > File components/network_time/network_time_tracker.cc (right): > > https://codereview.chromium.org/2254433003/diff/80001/components/network_time... > components/network_time/network_time_tracker.cc:336: > UMA_HISTOGRAM_SPARSE_SLOWLY("NetworkTimeTracker.ClockDivergence", > Using sparse histogram here seems a little surprising. Are you sure that this > won't get called frequently with many different values? Sparse histogram will > use more memory for each distinct value recorded. > > It can also be problematic to process this histogram if you get bunch of > distinct values from different users, since this histogram won't be aggregate > well. Oh, I think I misunderstood what SPARSE_SLOWLY is for, because I am indeed expecting this to get called with many different values. Can you recommend the right macro to use for this? I'm trying to track the number of seconds by which a user's clock runs backwards or by which the tick clock and wall clock get out of sync with each other.
On 2016/08/19 20:09:55, Steven Holte wrote: I think Reitveld ate my comment, but basically look at UMA_HISTOGRAM_CUSTOM_TIMES and macros that reuse it. Pick a min and max time that are reasonable boundaries one the times that you care about the distinction between, so if you don't care about subsecond times, use a min of 1 second, and if all you need to know about a time is that it was longer than day, use a max of 1 day. The buckets will spaced on a log scale, so you'll have about the same number of buckets between 1-100 seconds as 100 - 10000. The cost of the histogram is proportional to the number of buckets, and 50 is the most common number used.
https://codereview.chromium.org/2254433003/diff/60001/components/network_time... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/60001/components/network_time... components/network_time/network_time_tracker.cc:325: return NETWORK_TIME_SYNC_LOST; Your comment makes me wonder if it's worth distinguishing the two cases of NETWORK_TIME_SYNC_LOST. You could even distinguish 3 cases, if you wanted: - wall clock ran backwards - wall clock is ahead of tick clock - wall clock is behind tick clock Though maybe that's redundant enough with the histograms that it's not worth bothering. Totally up to you, of course, but it seems like you're aiming for the instrumentation to end instrumentation, so I thought I'd mention it. :-)
On 2016/08/19 20:37:36, Steven Holte wrote: > On 2016/08/19 20:09:55, Steven Holte wrote: > > I think Reitveld ate my comment, but basically look at > UMA_HISTOGRAM_CUSTOM_TIMES and macros that reuse it. > Pick a min and max time that are reasonable boundaries one the times that you > care about the distinction between, so if you don't care about subsecond times, > use a min of 1 second, and if all you need to know about a time is that it was > longer than day, use a max of 1 day. The buckets will spaced on a log scale, so > you'll have about the same number of buckets between 1-100 seconds as 100 - > 10000. The cost of the histogram is proportional to the number of buckets, and > 50 is the most common number used. Thanks! I forgot to mention, one complication is that the times I'm recording are differences, so they can be negative (if a user reset their clock, for example) or positive. It seems like UMA_HISTOGRAM_CUSTOM_TIMES might not be designed for that case. Is there any conventional way to handle this? Maybe I should just split up each histogram into two histograms, one for negative values and one for positive, and use CUSTOM_TIMES for each of those?
On 2016/08/20 16:40:22, estark wrote: > On 2016/08/19 20:37:36, Steven Holte wrote: > > On 2016/08/19 20:09:55, Steven Holte wrote: > > > > I think Reitveld ate my comment, but basically look at > > UMA_HISTOGRAM_CUSTOM_TIMES and macros that reuse it. > > Pick a min and max time that are reasonable boundaries one the times that you > > care about the distinction between, so if you don't care about subsecond > times, > > use a min of 1 second, and if all you need to know about a time is that it was > > longer than day, use a max of 1 day. The buckets will spaced on a log scale, > so > > you'll have about the same number of buckets between 1-100 seconds as 100 - > > 10000. The cost of the histogram is proportional to the number of buckets, > and > > 50 is the most common number used. > > Thanks! I forgot to mention, one complication is that the times I'm recording > are differences, so they can be negative (if a user reset their clock, for > example) or positive. It seems like UMA_HISTOGRAM_CUSTOM_TIMES might not be > designed for that case. Is there any conventional way to handle this? Maybe I > should just split up each histogram into two histograms, one for negative values > and one for positive, and use CUSTOM_TIMES for each of those? Friendly ping to holte.
On 2016/08/24 23:11:45, estark wrote: > On 2016/08/20 16:40:22, estark wrote: > > On 2016/08/19 20:37:36, Steven Holte wrote: > > > On 2016/08/19 20:09:55, Steven Holte wrote: > > > > > > I think Reitveld ate my comment, but basically look at > > > UMA_HISTOGRAM_CUSTOM_TIMES and macros that reuse it. > > > Pick a min and max time that are reasonable boundaries one the times that > you > > > care about the distinction between, so if you don't care about subsecond > > times, > > > use a min of 1 second, and if all you need to know about a time is that it > was > > > longer than day, use a max of 1 day. The buckets will spaced on a log > scale, > > so > > > you'll have about the same number of buckets between 1-100 seconds as 100 - > > > 10000. The cost of the histogram is proportional to the number of buckets, > > and > > > 50 is the most common number used. > > > > Thanks! I forgot to mention, one complication is that the times I'm recording > > are differences, so they can be negative (if a user reset their clock, for > > example) or positive. It seems like UMA_HISTOGRAM_CUSTOM_TIMES might not be > > designed for that case. Is there any conventional way to handle this? Maybe I > > should just split up each histogram into two histograms, one for negative > values > > and one for positive, and use CUSTOM_TIMES for each of those? > > Friendly ping to holte. Sorry for the slow reply, I missed your question somehow. Yes, splitting into two histograms is preferred if you have both negative and positive values. This is a reasonably common pattern, e.g. Extensions.ResourceLastModified{Negative}Delta or Navigation.Start.RendererBrowserDifference.{Positive,Negative}
On 2016/08/25 01:28:31, Steven Holte wrote: > On 2016/08/24 23:11:45, estark wrote: > > On 2016/08/20 16:40:22, estark wrote: > > > On 2016/08/19 20:37:36, Steven Holte wrote: > > > > On 2016/08/19 20:09:55, Steven Holte wrote: > > > > > > > > I think Reitveld ate my comment, but basically look at > > > > UMA_HISTOGRAM_CUSTOM_TIMES and macros that reuse it. > > > > Pick a min and max time that are reasonable boundaries one the times that > > you > > > > care about the distinction between, so if you don't care about subsecond > > > times, > > > > use a min of 1 second, and if all you need to know about a time is that it > > was > > > > longer than day, use a max of 1 day. The buckets will spaced on a log > > scale, > > > so > > > > you'll have about the same number of buckets between 1-100 seconds as 100 > - > > > > 10000. The cost of the histogram is proportional to the number of > buckets, > > > and > > > > 50 is the most common number used. > > > > > > Thanks! I forgot to mention, one complication is that the times I'm > recording > > > are differences, so they can be negative (if a user reset their clock, for > > > example) or positive. It seems like UMA_HISTOGRAM_CUSTOM_TIMES might not be > > > designed for that case. Is there any conventional way to handle this? Maybe > I > > > should just split up each histogram into two histograms, one for negative > > values > > > and one for positive, and use CUSTOM_TIMES for each of those? > > > > Friendly ping to holte. > > Sorry for the slow reply, I missed your question somehow. Yes, splitting into > two histograms is preferred if you have both negative and positive values. This > is a reasonably common pattern, e.g. > Extensions.ResourceLastModified{Negative}Delta or > Navigation.Start.RendererBrowserDifference.{Positive,Negative} Ok, thanks -- PTAL.
Friendly ping to holte
lgtm
One note though https://codereview.chromium.org/2254433003/diff/120001/components/network_tim... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/120001/components/network_tim... components/network_time/network_time_tracker.cc:340: base::TimeDelta::FromSeconds(kClockDivergenceSeconds), You may want to be a bit careful here, since even if you change kClockDiveregenceSeconds later you should not change it here, so that buckets do not change and you would have incompatible data from old and new clients.
The CQ bit was checked by estark@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mab@google.com, thestig@chromium.org, zea@chromium.org Link to the patchset: https://codereview.chromium.org/2254433003/#ps120001 (title: "holte comments")
The CQ bit was unchecked by estark@chromium.org
https://codereview.chromium.org/2254433003/diff/120001/components/network_tim... File components/network_time/network_time_tracker.cc (right): https://codereview.chromium.org/2254433003/diff/120001/components/network_tim... components/network_time/network_time_tracker.cc:340: base::TimeDelta::FromSeconds(kClockDivergenceSeconds), On 2016/08/29 22:22:59, Steven Holte wrote: > You may want to be a bit careful here, since even if you change > kClockDiveregenceSeconds later you should not change it here, so that buckets do > not change and you would have incompatible data from old and new clients. Ah, thanks. I think I'll just change this hard-code the constant value 60, with a comment. I anticipate that we might want to raise the value of |kClockDivergenceSeconds| so that way we'll have room to do that.
The CQ bit was checked by estark@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thestig@chromium.org, zea@chromium.org, mab@google.com, holte@chromium.org Link to the patchset: https://codereview.chromium.org/2254433003/#ps140001 (title: "don't use kClockDivergenceSeconds in histogram buckets")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by estark@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from thestig@chromium.org, zea@chromium.org, mab@google.com, holte@chromium.org Link to the patchset: https://codereview.chromium.org/2254433003/#ps160001 (title: "fix maximums to not overflow, and fix unit tests")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Message was sent while issue was closed.
Description was changed from ========== When network time is unavailable, record the reason in UMA Network time is frequently unavailable when we hit certificate date errors. This CL histograms additional information about why network time is unavailable (specifically, whether no time query has been made or whether the system clock lost sync since the last time query). To do so, I deprecated the old network time histogram that shared an enum with the build time histogram, and introduced a new network time histogram with a network time enum (rather than adding more network-specific values to the enum that is shared with the build time histogram). BUG= ========== to ========== When network time is unavailable, record the reason in UMA Network time is frequently unavailable when we hit certificate date errors. This CL histograms additional information about why network time is unavailable (specifically, whether no time query has been made or whether the system clock lost sync since the last time query). To do so, I deprecated the old network time histogram that shared an enum with the build time histogram, and introduced a new network time histogram with a network time enum (rather than adding more network-specific values to the enum that is shared with the build time histogram). BUG= ==========
Message was sent while issue was closed.
Committed patchset #9 (id:160001)
Message was sent while issue was closed.
Description was changed from ========== When network time is unavailable, record the reason in UMA Network time is frequently unavailable when we hit certificate date errors. This CL histograms additional information about why network time is unavailable (specifically, whether no time query has been made or whether the system clock lost sync since the last time query). To do so, I deprecated the old network time histogram that shared an enum with the build time histogram, and introduced a new network time histogram with a network time enum (rather than adding more network-specific values to the enum that is shared with the build time histogram). BUG= ========== to ========== When network time is unavailable, record the reason in UMA Network time is frequently unavailable when we hit certificate date errors. This CL histograms additional information about why network time is unavailable (specifically, whether no time query has been made or whether the system clock lost sync since the last time query). To do so, I deprecated the old network time histogram that shared an enum with the build time histogram, and introduced a new network time histogram with a network time enum (rather than adding more network-specific values to the enum that is shared with the build time histogram). BUG= Committed: https://crrev.com/62a431efd87c068af27f689f7f6ef0fa7b44ffe5 Cr-Commit-Position: refs/heads/master@{#415244} ==========
Message was sent while issue was closed.
Patchset 9 (id:??) landed as https://crrev.com/62a431efd87c068af27f689f7f6ef0fa7b44ffe5 Cr-Commit-Position: refs/heads/master@{#415244} |