|
|
Chromium Code Reviews|
Created:
3 years, 7 months ago by manzagop (departed) Modified:
3 years, 7 months ago CC:
chromium-reviews, asvitkine+watch_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionQuantify instability according to the stability instrumentation
Crash reporting based on the stability instrumentation does not provide
an accurate idea of the magnitude of issues because of report throttling.
This CL introduces user metrics to provide a more accurate idea
of the magnitude. Note however that the metrics will be affected by
version smear: the collecting Chrome's version may be different from the
unclean one's.
Details:
- UncleanShutdownCount is an estimate of unclean shutdowns as derived
from the stability instrumentation. Only successfully parsed stability
files contribute to this count.
- UncleanSystemCount is an estimate of unclean shutdowns attributable to
system instability. The number may be underestimated because of analysis
failure or overestimated because of the looseness in detection (Chrome
might not be running when the system suffers instability).
BUG=620813
Review-Url: https://codereview.chromium.org/2883103002
Cr-Commit-Position: refs/heads/master@{#473318}
Committed: https://chromium.googlesource.com/chromium/src/+/2c8986a0628c40c2c78175723454e0bd5f4b7abc
Patch Set 1 #
Total comments: 8
Patch Set 2 : address Siggi's comments #
Total comments: 8
Patch Set 3 : Siggi round 2 #
Total comments: 2
Patch Set 4 : Siggi round 3 #
Total comments: 2
Patch Set 5 : final nit #
Messages
Total messages: 37 (23 generated)
Description was changed from ========== Quantify instability according to the stability instrumentation Crash reporting based on the stability instrumentation does not provide an accurate idea of the magnitude of issues because of report throttling. This CL introduces some user metrics that will provide an accurate idea of the magnitude. Note however that the metrics will be affected by version smear: the collecting Chrome's version may be different from the unclean one's. Details: - UncleanShutdownCount is an estimate of unclean shutdowns as derived from the stability instrumentation. Only successfully parsed stability files contribute to this count. - UncleanSystemCount is an estimate of unclean shutdowns attributable to system instability. The number may be underestimated because of analysis failure or overestimated because of the looseness in detection (Chrome might not be running when the system suffers instability). BUG=620813 ========== to ========== Quantify instability according to the stability instrumentation Crash reporting based on the stability instrumentation does not provide an accurate idea of the magnitude of issues because of report throttling. This CL introduces user metrics to provide a more accurate idea of the magnitude. Note however that the metrics will be affected by version smear: the collecting Chrome's version may be different from the unclean one's. Details: - UncleanShutdownCount is an estimate of unclean shutdowns as derived from the stability instrumentation. Only successfully parsed stability files contribute to this count. - UncleanSystemCount is an estimate of unclean shutdowns attributable to system instability. The number may be underestimated because of analysis failure or overestimated because of the looseness in detection (Chrome might not be running when the system suffers instability). BUG=620813 ==========
manzagop@chromium.org changed reviewers: + siggi@chromium.org
Hi Siggi, Could you have a quick look at this? Thanks, Pierre
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
nice! https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... components/browser_watcher/postmortem_report_collector.cc:79: unclean_system_cnt_ = 0; looks like rather than a member variable, you want a local that's passed into CollectAndSubmitOneReport? https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... components/browser_watcher/postmortem_report_collector.cc:110: UMA_STABILITY_HISTOGRAM_COUNTS_100( is it possible to test these metrics - it'd be pretty sweet to have metrics we can rely on ;). https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... components/browser_watcher/postmortem_report_collector.h:113: mutable int unclean_system_cnt_; This doesn't have to be mutable, from what I can see. Also perhaps document what this is and when it's valid? https://codereview.chromium.org/2883103002/diff/1/tools/metrics/histograms/hi... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2883103002/diff/1/tools/metrics/histograms/hi... tools/metrics/histograms/histograms.xml:437: + Number of unclean shutdowns that can potentially be attributed to system This should be less than or equal to UCS?
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
Thanks! Another look? https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... components/browser_watcher/postmortem_report_collector.cc:79: unclean_system_cnt_ = 0; On 2017/05/15 20:18:04, Sigurður Ásgeirsson wrote: > looks like rather than a member variable, you want a local that's passed into > CollectAndSubmitOneReport? Yes! Done. https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... components/browser_watcher/postmortem_report_collector.cc:110: UMA_STABILITY_HISTOGRAM_COUNTS_100( On 2017/05/15 20:18:04, Sigurður Ásgeirsson wrote: > is it possible to test these metrics - it'd be pretty sweet to have metrics we > can rely on ;). I've added some histogram testing. https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2883103002/diff/1/components/browser_watcher/... components/browser_watcher/postmortem_report_collector.h:113: mutable int unclean_system_cnt_; On 2017/05/15 20:18:04, Sigurður Ásgeirsson wrote: > This doesn't have to be mutable, from what I can see. Also perhaps document what > this is and when it's valid? It was because RecordSystemShutdownState is const. Removed the variable as you suggested in anther comment. https://codereview.chromium.org/2883103002/diff/1/tools/metrics/histograms/hi... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2883103002/diff/1/tools/metrics/histograms/hi... tools/metrics/histograms/histograms.xml:437: + Number of unclean shutdowns that can potentially be attributed to system On 2017/05/15 20:18:04, Sigurður Ásgeirsson wrote: > This should be less than or equal to UCS? Yes. Modified comment. Also noticed an inconsistency: the counter for this was incremented on system log analysis but the UCS counter was conditional on other things, such as successful deletion of the stability file. Deferred the increment until after we know collection is successful.
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
nice - cool to see how easily the histogram collection is tested! https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector.cc:109: if (status == SUCCESS) yups, if the CollectOne function reports a bool, then your counting all comes to this point. https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector.h:93: crashpad::CrashReportDatabase* report_database, A single session can only contribute a single count - right? So maybe the contract here is better with a bool out parameter, signifying whether THIS session was unclean. That way the success checking logic also comes to a single point, and doesn't need a whole lot of commenting? https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... File components/browser_watcher/postmortem_report_collector_unittest.cc (right): https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector_unittest.cc:262: RunBasicTest(true); ubernit: at the cost of a single additional line of code, you can make this so much more readable. Consider RecordOneSession(<true|false>); ValidateHistograms(...); https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector_unittest.cc:273: base::HistogramTester histogram_tester; do you need this in addition to the member variable?
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks! Another look? https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector.cc:109: if (status == SUCCESS) On 2017/05/18 12:45:03, Sigurður Ásgeirsson wrote: > yups, if the CollectOne function reports a bool, then your counting all comes to > this point. Done. https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector.h:93: crashpad::CrashReportDatabase* report_database, On 2017/05/18 12:45:03, Sigurður Ásgeirsson wrote: > A single session can only contribute a single count - right? > So maybe the contract here is better with a bool out parameter, signifying > whether THIS session was unclean. > That way the success checking logic also comes to a single point, and doesn't > need a whole lot of commenting? Done. https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... File components/browser_watcher/postmortem_report_collector_unittest.cc (right): https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector_unittest.cc:262: RunBasicTest(true); On 2017/05/18 12:45:03, Sigurður Ásgeirsson wrote: > ubernit: at the cost of a single additional line of code, you can make this so > much more readable. > > Consider > > RecordOneSession(<true|false>); > ValidateHistograms(...); Done. https://codereview.chromium.org/2883103002/diff/20001/components/browser_watc... components/browser_watcher/postmortem_report_collector_unittest.cc:273: base::HistogramTester histogram_tester; On 2017/05/18 12:45:03, Sigurður Ásgeirsson wrote: > do you need this in addition to the member variable? Done.
https://codereview.chromium.org/2883103002/diff/40001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2883103002/diff/40001/components/browser_watc... components/browser_watcher/postmortem_report_collector.h:95: int* unclean_system_cnt); sorry - what I meant that instead of passing an int* here, you can pass a bool*. As the count is incremented by zero or one, it's IMHO clearer to return a bool in the *OneReport function and have the counting done in the *AllReports function.
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Thanks! PTAL. https://codereview.chromium.org/2883103002/diff/40001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2883103002/diff/40001/components/browser_watc... components/browser_watcher/postmortem_report_collector.h:95: int* unclean_system_cnt); On 2017/05/18 15:10:04, Sigurður Ásgeirsson wrote: > sorry - what I meant that instead of passing an int* here, you can pass a bool*. > As the count is incremented by zero or one, it's IMHO clearer to return a bool > in the *OneReport function and have the counting done in the *AllReports > function. > Oh, my bad! Done. Though I left the previous change.
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm with but a little nit. https://codereview.chromium.org/2883103002/diff/60001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2883103002/diff/60001/components/browser_watc... components/browser_watcher/postmortem_report_collector.cc:114: if (system_unclean) nit: put this inside the block, as that way you're not brittle to maintenance where this gets set on failure.
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Thanks for the review! https://codereview.chromium.org/2883103002/diff/60001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2883103002/diff/60001/components/browser_watc... components/browser_watcher/postmortem_report_collector.cc:114: if (system_unclean) On 2017/05/18 16:38:18, Sigurður Ásgeirsson wrote: > nit: put this inside the block, as that way you're not brittle to maintenance > where this gets set on failure. Done.
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
manzagop@chromium.org changed reviewers: + jwd@chromium.org
Hi Jesse! Could you have an OWNERS' look for tools\metrics\histograms\histograms.xml? Thanks, Pierre-Antoine
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
Thanks for the review!
The CQ bit was checked by manzagop@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from siggi@chromium.org Link to the patchset: https://codereview.chromium.org/2883103002/#ps80001 (title: "final nit")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 80001, "attempt_start_ts": 1495223395157780,
"parent_rev": "d01f2e3a7396302e91f56391f4f62e7d2cc0ff68", "commit_rev":
"2c8986a0628c40c2c78175723454e0bd5f4b7abc"}
Message was sent while issue was closed.
Description was changed from ========== Quantify instability according to the stability instrumentation Crash reporting based on the stability instrumentation does not provide an accurate idea of the magnitude of issues because of report throttling. This CL introduces user metrics to provide a more accurate idea of the magnitude. Note however that the metrics will be affected by version smear: the collecting Chrome's version may be different from the unclean one's. Details: - UncleanShutdownCount is an estimate of unclean shutdowns as derived from the stability instrumentation. Only successfully parsed stability files contribute to this count. - UncleanSystemCount is an estimate of unclean shutdowns attributable to system instability. The number may be underestimated because of analysis failure or overestimated because of the looseness in detection (Chrome might not be running when the system suffers instability). BUG=620813 ========== to ========== Quantify instability according to the stability instrumentation Crash reporting based on the stability instrumentation does not provide an accurate idea of the magnitude of issues because of report throttling. This CL introduces user metrics to provide a more accurate idea of the magnitude. Note however that the metrics will be affected by version smear: the collecting Chrome's version may be different from the unclean one's. Details: - UncleanShutdownCount is an estimate of unclean shutdowns as derived from the stability instrumentation. Only successfully parsed stability files contribute to this count. - UncleanSystemCount is an estimate of unclean shutdowns attributable to system instability. The number may be underestimated because of analysis failure or overestimated because of the looseness in detection (Chrome might not be running when the system suffers instability). BUG=620813 Review-Url: https://codereview.chromium.org/2883103002 Cr-Commit-Position: refs/heads/master@{#473318} Committed: https://chromium.googlesource.com/chromium/src/+/2c8986a0628c40c2c78175723454... ==========
Message was sent while issue was closed.
Committed patchset #5 (id:80001) as https://chromium.googlesource.com/chromium/src/+/2c8986a0628c40c2c78175723454... |
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
