|
|
Chromium Code Reviews|
Created:
3 years, 7 months ago by altimin Modified:
3 years, 7 months ago CC:
chromium-reviews, danakj+watch_chromium.org, asvitkine+watch_chromium.org, vmpstr+watch_chromium.org, Alexei Svitkine (slow), bcwhite, Steven Holte, skare_ Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
Description[histogram] Make histograms more resistant to overflows.
Instead of calculating deltas from cumulative snapshots, store unlogged
histogram samples in a separate HistogramSample object.
This will reduce overflows from long-running sessions given that data
between UMA pings fits into 32-bit integer.
BUG=719446
Review-Url: https://codereview.chromium.org/2867303004
Cr-Commit-Position: refs/heads/master@{#471359}
Committed: https://chromium.googlesource.com/chromium/src/+/498c838a685f07dda12f74620f9a13ae0cf62383
Patch Set 1 #
Total comments: 10
Patch Set 2 : addressed comments from isherman@ #
Total comments: 4
Patch Set 3 : addressed comments from bcwhite@ #
Total comments: 16
Patch Set 4 : addressed comments from isherman@ and asvitkine@ #Patch Set 5 : fix unittest compilation #
Total comments: 5
Patch Set 6 : addressed comments #
Total comments: 1
Patch Set 7 : final fix #
Total comments: 6
Patch Set 8 : final final fix #
Messages
Total messages: 53 (35 generated)
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
altimin@chromium.org changed reviewers: + isherman@chromium.org
PTAL
isherman@chromium.org changed reviewers: + asvitkine@chromium.org
Thanks! I'm adding Alexei as a reviewer as well -- this code is subtle enough that I'd like to have an extra pair of eyes on it. https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc#n... base/metrics/histogram.cc:463: logged_samples_->Add(*snapshot); I'm noticing that the samples in this vector could still overflow. Might that be an issue? Please audit code that ends up reading values from this vector to evaluate. I'm not worried about the about:histograms UI, nor test code, since overflow is both (a) unexpected and (b) not a big deal in those cases. But I'm not sure whether there are perhaps other parts of the code that might break from overflow. (I do realize that your CL doesn't directly make this situation worse. However, it does encourage people to author histograms that might overflow, so it might have an impact in practice.) https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc#n... base/metrics/histogram.cc:515: logged_samples_.reset(new SampleVector(HashMetricName(name), ranges)); nit: Could you avoid the second HashMetricName call? https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc#n... base/metrics/histogram.cc:598: std::unique_ptr<SampleVector> Histogram::CloneSampleVector( nit: I'd name this SnapshotUnloggedSamples, and drop the param, since the param is always the same. https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram_unit... File base/metrics/histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram_unit... base/metrics/histogram_unittest.cc:404: TEST_P(HistogramTest, AddCount_LargeCountsDontOverflow) { Please add a comment documenting what this is intending to test. In particular, it should be clear to someone reading this test that ten calls to AddCount(42, 1 << 30) without an intervening call to SnapshotDelta() should be expected to overflow. https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram_unit... base/metrics/histogram_unittest.cc:407: Histogram::FactoryGet("AddCountHistogram", 10, 1000000000, kBucketCount, nit: Do you need such a large max bucket value? It's a bit hard to read and distracting from the rest of the test.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc#n... base/metrics/histogram.cc:463: logged_samples_->Add(*snapshot); On 2017/05/09 22:15:32, Ilya Sherman wrote: > I'm noticing that the samples in this vector could still overflow. Might that > be an issue? Please audit code that ends up reading values from this vector to > evaluate. I'm not worried about the about:histograms UI, nor test code, since > overflow is both (a) unexpected and (b) not a big deal in those cases. But I'm > not sure whether there are perhaps other parts of the code that might break from > overflow. (I do realize that your CL doesn't directly make this situation > worse. However, it does encourage people to author histograms that might > overflow, so it might have an impact in practice.) I can see three call sites to Histogram::SnapshotSamples: * SnapshotHistogramToValue in task_scheduler_internals - Snapshots data for the UI. * LogAggregatedMetrics in components/metrics/net - Used for network errors. No real danger of overflowing here. * OnSetUMACallback in tracing. - This does not look at count, only at values. But I think we should add a comment to HistogramBase::SnapshotSamples with a warning. https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc#n... base/metrics/histogram.cc:515: logged_samples_.reset(new SampleVector(HashMetricName(name), ranges)); On 2017/05/09 22:15:32, Ilya Sherman wrote: > nit: Could you avoid the second HashMetricName call? Done. https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram.cc#n... base/metrics/histogram.cc:598: std::unique_ptr<SampleVector> Histogram::CloneSampleVector( On 2017/05/09 22:15:32, Ilya Sherman wrote: > nit: I'd name this SnapshotUnloggedSamples, and drop the param, since the param > is always the same. Nice, thanks! Done. https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram_unit... File base/metrics/histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram_unit... base/metrics/histogram_unittest.cc:404: TEST_P(HistogramTest, AddCount_LargeCountsDontOverflow) { On 2017/05/09 22:15:32, Ilya Sherman wrote: > Please add a comment documenting what this is intending to test. In particular, > it should be clear to someone reading this test that ten calls to AddCount(42, 1 > << 30) without an intervening call to SnapshotDelta() should be expected to > overflow. Done. https://codereview.chromium.org/2867303004/diff/1/base/metrics/histogram_unit... base/metrics/histogram_unittest.cc:407: Histogram::FactoryGet("AddCountHistogram", 10, 1000000000, kBucketCount, On 2017/05/09 22:15:32, Ilya Sherman wrote: > nit: Do you need such a large max bucket value? It's a bit hard to read and > distracting from the rest of the test. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: cast_shell_linux on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/cast_shell_linu...)
bcwhite@chromium.org changed reviewers: + bcwhite@chromium.org
Please do the same for SparseHistogram. https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram.... base/metrics/histogram.cc:514: logged_samples_.reset(new SampleVector(unlogged_samples_->id(), ranges)); Historical Note: This was avoided in the past so as to not allocate a large "logged" array that might never be used. Now, the array is managed internally to the SampleVector object and only allocated when 2 different samples are saved; so not a problem. https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram_... File base/metrics/histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram_... base/metrics/histogram_unittest.cc:409: "AddCountHistogram", 10, 50, kBucketCount, HistogramBase::kNoFlags)); You have 50 buckets for a range of 40.
PTAL. SparseHistogram is changed now too. https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram.... base/metrics/histogram.cc:514: logged_samples_.reset(new SampleVector(unlogged_samples_->id(), ranges)); On 2017/05/10 12:40:33, bcwhite wrote: > Historical Note: This was avoided in the past so as to not allocate a large > "logged" array that might never be used. Now, the array is managed internally > to the SampleVector object and only allocated when 2 different samples are > saved; so not a problem. Acknowledged. https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram_... File base/metrics/histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/20001/base/metrics/histogram_... base/metrics/histogram_unittest.cc:409: "AddCountHistogram", 10, 50, kBucketCount, HistogramBase::kNoFlags)); On 2017/05/10 12:40:33, bcwhite wrote: > You have 50 buckets for a range of 40. Done.
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.cc:461: unlogged_samples_->Subtract(*snapshot); So logic difference here was that before it was: 1. Take snapshot. 2. Subtract from snapshot logged samples 3. Add to logged samples the snapshot 4. Return snapshot And now it's: 1. Take unlogged snapshot 2. Subtract from unlogged_samples the snapshot. 3. Add to logged_samples the snapshot 4. Return snapshot This has the effect that unlogged_samples gets modified (via Subtract()) while it may be being updated from other threads that may be logging histograms. Maybe, this is OK - but it's a difference from before, where this code was only *reading* that state and not modifying it (it was *modifying* logged_samples but no one else was changing that behind the scenes). So this makes me a bit worried about concurrent modifications around here. At the very least, we should have a comment here discussing why this is OK. The comment could describe what happens under those scenarios and why the end result is still safe. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.cc:599: base::MakeUnique<SampleVector>(unlogged_samples_->id(), bucket_ranges()); Nit: base::MakeUnique doesn't seem to add much here since it's already declaring the type of |samples| so I'd just use the previous syntax - passing new to the std::unique_ptr ctor. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.h:268: std::unique_ptr<SampleVector> SnapshotSampleVector() const; While you're here, probably worth renaming this and expanding the comment to be more clear about what it returns (i.e. both logged and unlogged samples together). https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.h:312: // Accumulation of all samples that have logged with SnapshotDelta(). Nit: "that have been logged" or "that were logged"
Thanks! LGTM % nits and other reviewer's comments. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram_... File base/metrics/histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram_... base/metrics/histogram_unittest.cc:405: // even when a total count (returned by Histogram::SnapshotSample) does. nit: I'd phrase this comment as something like: "Some metrics are designed so that they are guaranteed not to overflow between snapshots, but could overflow over a long-running session. Make sure that ..." https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... File base/metrics/sparse_histogram.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... base/metrics/sparse_histogram.cc:88: return logged_samples_->id(); nit: Why logged_ rather than unlogged_ here? It looks like they actually differ by 1, so it should matter which one is chosen. I take it there aren't any tests that verify this behavior is correct? (Or am I confused about something?)
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
PTAL https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.cc:461: unlogged_samples_->Subtract(*snapshot); On 2017/05/10 16:55:38, Alexei Svitkine (slow) wrote: > So logic difference here was that before it was: > 1. Take snapshot. > 2. Subtract from snapshot logged samples > 3. Add to logged samples the snapshot > 4. Return snapshot > > And now it's: > 1. Take unlogged snapshot > 2. Subtract from unlogged_samples the snapshot. > 3. Add to logged_samples the snapshot > 4. Return snapshot > > This has the effect that unlogged_samples gets modified (via Subtract()) while > it may be being updated from other threads that may be logging histograms. > Maybe, this is OK - but it's a difference from before, where this code was only > *reading* that state and not modifying it (it was *modifying* logged_samples but > no one else was changing that behind the scenes). > > So this makes me a bit worried about concurrent modifications around here. At > the very least, we should have a comment here discussing why this is OK. The > comment could describe what happens under those scenarios and why the end result > is still safe. Please take a look. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.cc:599: base::MakeUnique<SampleVector>(unlogged_samples_->id(), bucket_ranges()); On 2017/05/10 16:55:38, Alexei Svitkine (slow) wrote: > Nit: base::MakeUnique doesn't seem to add much here since it's already declaring > the type of |samples| so I'd just use the previous syntax - passing new to the > std::unique_ptr ctor. Done. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.h:268: std::unique_ptr<SampleVector> SnapshotSampleVector() const; On 2017/05/10 16:55:38, Alexei Svitkine (slow) wrote: > While you're here, probably worth renaming this and expanding the comment to be > more clear about what it returns (i.e. both logged and unlogged samples > together). Done. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.h:312: // Accumulation of all samples that have logged with SnapshotDelta(). On 2017/05/10 16:55:38, Alexei Svitkine (slow) wrote: > Nit: "that have been logged" or "that were logged" Done. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram_... File base/metrics/histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram_... base/metrics/histogram_unittest.cc:405: // even when a total count (returned by Histogram::SnapshotSample) does. On 2017/05/11 00:42:34, Ilya Sherman wrote: > nit: I'd phrase this comment as something like: > > "Some metrics are designed so that they are guaranteed not to overflow between > snapshots, but could overflow over a long-running session. Make sure that ..." Done. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... File base/metrics/sparse_histogram.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... base/metrics/sparse_histogram.cc:88: return logged_samples_->id(); On 2017/05/11 00:42:34, Ilya Sherman wrote: > nit: Why logged_ rather than unlogged_ here? It looks like they actually differ > by 1, so it should matter which one is chosen. I take it there aren't any tests > that verify this behavior is correct? (Or am I confused about something?) Yes, you're correct. Note that this behaviour is very confusing (at least for the reader unfamiliar with this code) and it probably warrants a comments from someone who understands what's going on.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...)
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... File base/metrics/sparse_histogram.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... base/metrics/sparse_histogram.cc:88: return logged_samples_->id(); On 2017/05/11 12:38:33, altimin wrote: > On 2017/05/11 00:42:34, Ilya Sherman wrote: > > nit: Why logged_ rather than unlogged_ here? It looks like they actually > differ > > by 1, so it should matter which one is chosen. I take it there aren't any > tests > > that verify this behavior is correct? (Or am I confused about something?) > > Yes, you're correct. Note that this behaviour is very confusing (at least for > the reader unfamiliar with this code) and it probably warrants a comments from > someone who understands what's going on. I think it should be unlogged_samples to match previous behavior. Otherwise, this is a change in behavior. Since what was previously samples is now unlogged_samples. I believe they need different ids because ids need to be unique and are what are stored in the persistent allocator. However, name_hash() on the histogram object is what is used to report to UMA - i.e. it's a cached version of HashMetricName(name). So this looks like to be breaking that. Given no tests failed for this, please add a test for this in this CL since you would otherwise be breaking this. The test should verify that value returned by name_hash() corresponds to HashMetricName(hist_name). https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.... base/metrics/histogram.cc:466: // to be reported by the next call to SnapshotDelta. Maybe call out the function names where the atomicity happens? e.g. for Subtract(), it's in SampleVectorBase::AddSubtractImpl(). Maybe also worth discussing why it's OK with the single bucket optimization (where we don't allocate the full counts array until a 2nd bucket is needed). Basically, discuss the case where we can go from single bucket to full counts array in the middle of any of these operations (including in the middle of Subtract) - and clearly point out why that's correct. bcwhite@ can answer questions about how that works. https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.... base/metrics/histogram.h:268: // Implementation of SnapshotSamples method with correct type Nit: Wrapping is off - some of the words in line below should fit on this line.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.cc:461: unlogged_samples_->Subtract(*snapshot); > So logic difference here was that before it was: > 1. Take snapshot. > 2. Subtract from snapshot logged samples > 3. Add to logged samples the snapshot > 4. Return snapshot > > And now it's: > 1. Take unlogged snapshot > 2. Subtract from unlogged_samples the snapshot. > 3. Add to logged_samples the snapshot > 4. Return snapshot > > This has the effect that unlogged_samples gets modified (via Subtract()) while > it may be being updated from other threads that may be logging histograms. > Maybe, this is OK - but it's a difference from before, where this code was only > *reading* that state and not modifying it (it was *modifying* logged_samples but > no one else was changing that behind the scenes). It's okay in that everything is still "eventually consistent". There is a brief moment between lines 461 and 462 where the total count (logged + unlogged) has gone down should a total be taken in parallel by about://histograms or whatever. The sample counts themselves are protected by the atomic add/sub operations. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/histogram.... base/metrics/histogram.cc:599: base::MakeUnique<SampleVector>(unlogged_samples_->id(), bucket_ranges()); On 2017/05/11 12:38:33, altimin wrote: > On 2017/05/10 16:55:38, Alexei Svitkine (slow) wrote: > > Nit: base::MakeUnique doesn't seem to add much here since it's already > declaring > > the type of |samples| so I'd just use the previous syntax - passing new to the > > std::unique_ptr ctor. > > Done. I've been told that its desirable to always use MakeUnique if possible as it means someone doing a manual audit of unique_ptr+new invocations doesn't have to check it. https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... File base/metrics/sparse_histogram.cc (right): https://codereview.chromium.org/2867303004/diff/40001/base/metrics/sparse_his... base/metrics/sparse_histogram.cc:88: return logged_samples_->id(); Yes, using logged_samples ID will cause breakage and mismatches. The comment on line 188 explains why the IDs differ. I have a plan (no timeline, just a plan) to change this to holding both samples in the same allocation rather than the existing separate allocations. That'll mean only one ID and prevent another possible problem should it be possible to allocate only one due to a full allocator. https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.... base/metrics/histogram.cc:466: // to be reported by the next call to SnapshotDelta. On 2017/05/11 13:54:35, Alexei Svitkine (slow) wrote: > Maybe call out the function names where the atomicity happens? > > e.g. for Subtract(), it's in SampleVectorBase::AddSubtractImpl(). > > Maybe also worth discussing why it's OK with the single bucket optimization > (where we don't allocate the full counts array until a 2nd bucket is needed). > Basically, discuss the case where we can go from single bucket to full counts > array in the middle of any of these operations (including in the middle of > Subtract) - and clearly point out why that's correct. > > bcwhite@ can answer questions about how that works. Basically, every individual operation is atomic but there's no synchronization between them, hence the "eventually consistent" design. You might read 5 but have it (externally) incremented to 50 before that count is removed and dropping it to 45. The snapshot sum/count could also be off because of timing between updates and reads. We just live with that. The single-bucket optimization is contained in SampleVector and its transition from one-sample to many-sample is also atomic and "eventually consistent". At the SampleVectorBase API, there is no apparent difference.
PTAL https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.... base/metrics/histogram.cc:466: // to be reported by the next call to SnapshotDelta. On 2017/05/11 13:54:35, Alexei Svitkine (slow) wrote: > Maybe call out the function names where the atomicity happens? > > e.g. for Subtract(), it's in SampleVectorBase::AddSubtractImpl(). > > Maybe also worth discussing why it's OK with the single bucket optimization > (where we don't allocate the full counts array until a 2nd bucket is needed). > Basically, discuss the case where we can go from single bucket to full counts > array in the middle of any of these operations (including in the middle of > Subtract) - and clearly point out why that's correct. > > bcwhite@ can answer questions about how that works. I think that does not belong here. We should mention that SampleVector::Subtract and SampleVector::Accumulate are thread-safe and maybe add a comment in SampleVector explaining the details. https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2867303004/diff/80001/base/metrics/histogram.... base/metrics/histogram.h:268: // Implementation of SnapshotSamples method with correct type On 2017/05/11 13:54:35, Alexei Svitkine (slow) wrote: > Nit: Wrapping is off - some of the words in line below should fit on this line. Done.
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm https://codereview.chromium.org/2867303004/diff/100001/base/metrics/sparse_hi... File base/metrics/sparse_histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/100001/base/metrics/sparse_hi... base/metrics/sparse_histogram_unittest.cc:388: const char* name = "TestName"; Nit: static const char kName[] =
lgtm
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks! LGTM % comments on comments (and a nit): https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram... base/metrics/histogram.cc:468: // the next call to SnapshotDelta. I find this comment quite confusing, as it is not clear to me from reading it what "thread-safe" and "atomic" are referring to. Here's my take at a suggested rephrasing. Feel free to refine further! Note: The code below has subtle thread-safety guarantees! All changes to the underlying SampleVectors use atomic integer operations, which guarantee eventual consistency, but do not guarantee full synchronization between different entries in the SampleVector. In particular, this means that concurrent updates to the histogram might result in the reported sum not matching the individual bucket counts; or there being some buckets that are logically updated "together", but end up being only partially updated when a snapshot is captured. Note that this is why it's important to subtract exactly the snapshotted unlogged samples, rather than simply resetting the vector: this way, the next snapshot will include any concurrent updates missed by the current snapshot. https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram... base/metrics/histogram.h:268: // Implementation of SnapshotSamples method with correct type for internal nit: s/correct type/a more specific type https://codereview.chromium.org/2867303004/diff/120001/base/metrics/sparse_hi... File base/metrics/sparse_histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/120001/base/metrics/sparse_hi... base/metrics/sparse_histogram_unittest.cc:388: const char* kName = "TestName"; nit: "const char kName[]"
The CQ bit was checked by altimin@chromium.org to run a CQ dry run
https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram.cc File base/metrics/histogram.cc (right): https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram... base/metrics/histogram.cc:468: // the next call to SnapshotDelta. On 2017/05/11 19:57:41, Ilya Sherman wrote: > I find this comment quite confusing, as it is not clear to me from reading it > what "thread-safe" and "atomic" are referring to. Here's my take at a suggested > rephrasing. Feel free to refine further! > > Note: The code below has subtle thread-safety guarantees! All changes to > the underlying SampleVectors use atomic integer operations, which guarantee > eventual consistency, but do not guarantee full synchronization between > different entries in the SampleVector. In particular, this means that > concurrent updates to the histogram might result in the reported sum not > matching the individual bucket counts; or there being some buckets that are > logically updated "together", but end up being only partially updated when > a snapshot is captured. Note that this is why it's important to subtract > exactly the snapshotted unlogged samples, rather than simply resetting the > vector: this way, the next snapshot will include any concurrent updates > missed by the current snapshot. Done. https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram.h File base/metrics/histogram.h (right): https://codereview.chromium.org/2867303004/diff/120001/base/metrics/histogram... base/metrics/histogram.h:268: // Implementation of SnapshotSamples method with correct type for internal On 2017/05/11 19:57:41, Ilya Sherman wrote: > nit: s/correct type/a more specific type Done. https://codereview.chromium.org/2867303004/diff/120001/base/metrics/sparse_hi... File base/metrics/sparse_histogram_unittest.cc (right): https://codereview.chromium.org/2867303004/diff/120001/base/metrics/sparse_hi... base/metrics/sparse_histogram_unittest.cc:388: const char* kName = "TestName"; On 2017/05/11 19:57:41, Ilya Sherman wrote: > nit: "const char kName[]" Done.
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
LGTM, thanks.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by altimin@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from bcwhite@chromium.org, asvitkine@chromium.org Link to the patchset: https://codereview.chromium.org/2867303004/#ps140001 (title: "final final fix")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 140001, "attempt_start_ts": 1494609891154320,
"parent_rev": "c19a67fae86a4247b53ccece5bd37eac5afa7edf", "commit_rev":
"498c838a685f07dda12f74620f9a13ae0cf62383"}
Message was sent while issue was closed.
Description was changed from ========== [histogram] Make histograms more resistant to overflows. Instead of calculating deltas from cumulative snapshots, store unlogged histogram samples in a separate HistogramSample object. This will reduce overflows from long-running sessions given that data between UMA pings fits into 32-bit integer. BUG=719446 ========== to ========== [histogram] Make histograms more resistant to overflows. Instead of calculating deltas from cumulative snapshots, store unlogged histogram samples in a separate HistogramSample object. This will reduce overflows from long-running sessions given that data between UMA pings fits into 32-bit integer. BUG=719446 Review-Url: https://codereview.chromium.org/2867303004 Cr-Commit-Position: refs/heads/master@{#471359} Committed: https://chromium.googlesource.com/chromium/src/+/498c838a685f07dda12f74620f9a... ==========
Message was sent while issue was closed.
Committed patchset #8 (id:140001) as https://chromium.googlesource.com/chromium/src/+/498c838a685f07dda12f74620f9a... |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
