|
|
Created:
3 years, 6 months ago by manzagop (departed) Modified:
3 years, 6 months ago CC:
chromium-reviews, asvitkine+watch_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionStability instrumentation: metrics for collection on crash
This CL revises adds metrics for stability instrumentation
collection on crash. The metrics can be used to estimate unclean
shutdown coverage in combination with postmortem collection (see
https://crrev.com/2883103002).
Additionally addresses a todo: sets the persitent allocator state to
deleted on crash.
BUG=620813
Review-Url: https://codereview.chromium.org/2910003002
Cr-Commit-Position: refs/heads/master@{#476319}
Committed: https://chromium.googlesource.com/chromium/src/+/47f2c48469be9345d1f9ab668d05212794767a0c
Patch Set 1 #
Total comments: 4
Patch Set 2 : Set pma state to deleted #
Total comments: 1
Patch Set 3 : Address comment #
Total comments: 11
Patch Set 4 : revise histograms #Patch Set 5 : missing files #Patch Set 6 : remove extra std::move #
Total comments: 2
Patch Set 7 : revise comment #Patch Set 8 : merge #
Total comments: 4
Patch Set 9 : address rkaplow comments #Messages
Total messages: 54 (31 generated)
manzagop@chromium.org changed reviewers: + siggi@chromium.org
Hi, This is needed for proper coverage estimation. Please have a look! P-A
Nice - I have questions on how best to record your metric. Are there other event counts that could be recorded in a common enum? https://codereview.chromium.org/2910003002/diff/1/components/browser_watcher/... File components/browser_watcher/stability_report_user_stream_data_source.cc (right): https://codereview.chromium.org/2910003002/diff/1/components/browser_watcher/... components/browser_watcher/stability_report_user_stream_data_source.cc:90: // TODO(manzagop): set the persistent allocator file's state to deleted in This seems like a potentially important gotcha here, as failure to delete will cause double-counting, no? Is this hard to do? Is there a metric for how often you encounter a nominally deleted file? https://codereview.chromium.org/2910003002/diff/1/components/browser_watcher/... components/browser_watcher/stability_report_user_stream_data_source.cc:96: UMA_STABILITY_HISTOGRAM_COUNTS_100("ActivityTracker.CollectCrash.CrashCount", QQ: is this the best way to record this? I've seen enums used to record event counts before. They're not super-convenient to decipher, but have the advantage of being extensible to new events and presumably fairly dense in memory and on wire.
Made the change to set the allocator state. PTAL. https://codereview.chromium.org/2910003002/diff/1/components/browser_watcher/... File components/browser_watcher/stability_report_user_stream_data_source.cc (right): https://codereview.chromium.org/2910003002/diff/1/components/browser_watcher/... components/browser_watcher/stability_report_user_stream_data_source.cc:90: // TODO(manzagop): set the persistent allocator file's state to deleted in On 2017/05/30 13:46:37, Sigurður Ásgeirsson wrote: > This seems like a potentially important gotcha here, as failure to delete will > cause double-counting, no? Correct. Note currently we undercount, as crashes are only counted if we fail to delete on crash. > Is this hard to do? I thought I needed multi-process collection, but turns out no. Done. > Is there a metric for how often you encounter a nominally deleted file? Looks like it's 5% of the files (~60 out of 1100). The activity analyzer logs when files are encountered with the deleted state: ActivityTracker.Collect.AnalyzerCreationError https://cs.chromium.org/chromium/src/base/debug/activity_analyzer.cc?l=120 https://codereview.chromium.org/2910003002/diff/1/components/browser_watcher/... components/browser_watcher/stability_report_user_stream_data_source.cc:96: UMA_STABILITY_HISTOGRAM_COUNTS_100("ActivityTracker.CollectCrash.CrashCount", On 2017/05/30 13:46:37, Sigurður Ásgeirsson wrote: > QQ: is this the best way to record this? I've seen enums used to record event > counts before. They're not super-convenient to decipher, but have the advantage > of being extensible to new events and presumably fairly dense in memory and on > wire. I'd want the metric to be usable in a timeline formula for coverage estimation. Is that possible?
manzagop@chromium.org changed reviewers: + bcwhite@chromium.org
Hi Brian, This change makes it so the PMA state is set to deleted from the crashpad handler process, on crash. Could you have a look? Thanks! P-A
> > I'd want the metric to be usable in a timeline formula for coverage > estimation. Is that possible? > Yes, see e.g. go/crashpad-handler-lifetime-milestone and go/crashpad-handler-success-rate, both of which compute ratios on enumerated event counts. I believe you can also create formulas across histograms the same way, though I don't think I've ever had the occasion. -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
I'd have a chat with some UMA folks about how to best record these event counts. https://codereview.chromium.org/2910003002/diff/20001/components/browser_watc... File components/browser_watcher/stability_paths.cc (right): https://codereview.chromium.org/2910003002/diff/20001/components/browser_watc... components/browser_watcher/stability_paths.cc:124: UMA_HISTOGRAM_BOOLEAN("ActivityTracker.Record.SetPmaDeletedSuccess", success); Yeah, this looks like another event count - I think it'd be better to consolidate these in an enumerated histogram.
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Addressed comment. PTAL.
Btw, I checked that I'd be able to use the value in a timeline formula.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... File components/browser_watcher/stability_paths.h (right): https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_paths.h:43: bool MarkStabilityFileDeleted(const base::FilePath& file_path); Under what conditions would it fail? https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... File components/browser_watcher/stability_report_user_stream_data_source.cc (right): https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_report_user_stream_data_source.cc:77: enum class CollectionEvent { Do you track failures? You could have a kAttempts that is counted every time and the kSuccess, etc. to indicate the things that happened within that attempt. https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_report_user_stream_data_source.cc:78: kSuccess = 0, Omit the numbers and just say "do not insert or remove values because..."
couple of more nits... https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... File components/browser_watcher/stability_paths.cc (right): https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_paths.cc:46: std::unique_ptr<MemoryMappedFile> mmfile(new MemoryMappedFile()); Does a MemoryMappedFile require that you allocate it from heap? If not, then why not a local variable? https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_paths.cc:115: return; // TODO(manzagop): add a metric for this. is now an opportune moment for addressing this TODO?
https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... File components/browser_watcher/stability_paths.cc (right): https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_paths.cc:46: std::unique_ptr<MemoryMappedFile> mmfile(new MemoryMappedFile()); On 2017/05/31 17:19:05, Sigurður Ásgeirsson wrote: > Does a MemoryMappedFile require that you allocate it from heap? If not, then why > not a local variable? Owership of the object is passed via std::move to the FilePersistentMemoryAllocator so has to be heap-based.
Just noticed that the CL description has speling[sic] problems.
Description was changed from ========== Stability intrumentation: quantify crash coverage This CL adds a histogram to count the number of crashes for which the stability instrumentation is successfully collected. This is a follow up to https://crrev.com/2883103002. BUG=620813 ========== to ========== Stability instrumentation: metrics for collection on crash This CL revises adds metrics for stability instrumentation collection on crash. The metrics can be used to estimate unclean shutdown coverage in combination with postmortem collection (see https://crrev.com/2883103002). Additionally addresses a todo: sets the persitent allocator state to deleted on crash. BUG=620813 ==========
Description was changed from ========== Stability instrumentation: metrics for collection on crash This CL revises adds metrics for stability instrumentation collection on crash. The metrics can be used to estimate unclean shutdown coverage in combination with postmortem collection (see https://crrev.com/2883103002). Additionally addresses a todo: sets the persitent allocator state to deleted on crash. BUG=620813 ========== to ========== Stability instrumentation: metrics for collection on crash This CL revises adds metrics for stability instrumentation collection on crash. The metrics can be used to estimate unclean shutdown coverage in combination with postmortem collection (see https://crrev.com/2883103002). Additionally addresses a todo: sets the persitent allocator state to deleted on crash. BUG=620813 ==========
Description was changed from ========== Stability instrumentation: metrics for collection on crash This CL revises adds metrics for stability instrumentation collection on crash. The metrics can be used to estimate unclean shutdown coverage in combination with postmortem collection (see https://crrev.com/2883103002). Additionally addresses a todo: sets the persitent allocator state to deleted on crash. BUG=620813 ========== to ========== Stability instrumentation: metrics for collection on crash This CL revises adds metrics for stability instrumentation collection on crash. The metrics can be used to estimate unclean shutdown coverage in combination with postmortem collection (see https://crrev.com/2883103002). Additionally addresses a todo: sets the persitent allocator state to deleted on crash. BUG=620813 ==========
Patchset #4 (id:60001) has been deleted
Addressed comments. PTAL. https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... File components/browser_watcher/stability_paths.cc (right): https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_paths.cc:46: std::unique_ptr<MemoryMappedFile> mmfile(new MemoryMappedFile()); On 2017/05/31 17:19:05, Sigurður Ásgeirsson wrote: > Does a MemoryMappedFile require that you allocate it from heap? If not, then why > not a local variable? The allocator takes ownership of it at l.53. That said, the allocator doesn't need to be on heap. https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_paths.cc:115: return; // TODO(manzagop): add a metric for this. On 2017/05/31 17:19:05, Sigurður Ásgeirsson wrote: > is now an opportune moment for addressing this TODO? Done. https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... File components/browser_watcher/stability_paths.h (right): https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_paths.h:43: bool MarkStabilityFileDeleted(const base::FilePath& file_path); On 2017/05/31 15:41:19, bcwhite wrote: > Under what conditions would it fail? No longer can! I've renamed the function and moved the error logging inside. The failure is if we can't get an allocator on the file and set the pma state. https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... File components/browser_watcher/stability_report_user_stream_data_source.cc (right): https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_report_user_stream_data_source.cc:77: enum class CollectionEvent { On 2017/05/31 15:41:19, bcwhite wrote: > Do you track failures? You could have a kAttempts that is counted every time > and the kSuccess, etc. to indicate the things that happened within that attempt. I've added a bunch of states. Done. https://codereview.chromium.org/2910003002/diff/40001/components/browser_watc... components/browser_watcher/stability_report_user_stream_data_source.cc:78: kSuccess = 0, On 2017/05/31 15:41:19, bcwhite wrote: > Omit the numbers and just say "do not insert or remove values because..." Done.
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I don't see any files defining LogCollectOnCrashEvent and the associated enum.
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Oops! Now added.
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
lgtm https://codereview.chromium.org/2910003002/diff/120001/components/browser_wat... File components/browser_watcher/stability_paths.h (right): https://codereview.chromium.org/2910003002/diff/120001/components/browser_wat... components/browser_watcher/stability_paths.h:41: // Marks a stability file for deletion on crash. Meant for the crashpad handler. nit: This description seems pretty ambiguous and redundant to the name of the function. Maybe you want to mention that some of these functions log specific kinds of events?
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks for the reviews! https://codereview.chromium.org/2910003002/diff/120001/components/browser_wat... File components/browser_watcher/stability_paths.h (right): https://codereview.chromium.org/2910003002/diff/120001/components/browser_wat... components/browser_watcher/stability_paths.h:41: // Marks a stability file for deletion on crash. Meant for the crashpad handler. On 2017/06/01 12:50:45, Sigurður Ásgeirsson wrote: > nit: This description seems pretty ambiguous and redundant to the name of the > function. > > Maybe you want to mention that some of these functions log specific kinds of > events? +1 Hopefully better now. Done.
manzagop@chromium.org changed reviewers: + rkaplow@chromium.org
Hi Rob! Could you have an OWNERS' look at: chrome\browser\chrome_browser_field_trials_desktop.cc chrome\browser\metrics\chrome_metrics_service_client.cc tools\metrics\histograms\histograms.xml Thanks! P-A
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-device on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device/builds...) ios-device-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-device-xcode-...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...)
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... File components/browser_watcher/stability_paths.cc (right): https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... components/browser_watcher/stability_paths.cc:19: #include "base/metrics/histogram_macros.h" don't think this include is used https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... File components/browser_watcher/stability_report_user_stream_data_source.cc (right): https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... components/browser_watcher/stability_report_user_stream_data_source.cc:77: // TODO(manzagop): collection should factor in whether this is a true crash or very minor nit - Collection
Thanks for the review! https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... File components/browser_watcher/stability_paths.cc (right): https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... components/browser_watcher/stability_paths.cc:19: #include "base/metrics/histogram_macros.h" On 2017/06/01 14:53:48, rkaplow wrote: > don't think this include is used Done. https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... File components/browser_watcher/stability_report_user_stream_data_source.cc (right): https://codereview.chromium.org/2910003002/diff/160001/components/browser_wat... components/browser_watcher/stability_report_user_stream_data_source.cc:77: // TODO(manzagop): collection should factor in whether this is a true crash or On 2017/06/01 14:53:48, rkaplow wrote: > very minor nit - Collection Done.
The CQ bit was checked by manzagop@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from bcwhite@chromium.org, siggi@chromium.org, rkaplow@chromium.org Link to the patchset: https://codereview.chromium.org/2910003002/#ps180001 (title: "address rkaplow comments")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 180001, "attempt_start_ts": 1496329114647350, "parent_rev": "3461b02d0ae8b12326a02fe4a41e3c956600f4a5", "commit_rev": "47f2c48469be9345d1f9ab668d05212794767a0c"}
Message was sent while issue was closed.
Description was changed from ========== Stability instrumentation: metrics for collection on crash This CL revises adds metrics for stability instrumentation collection on crash. The metrics can be used to estimate unclean shutdown coverage in combination with postmortem collection (see https://crrev.com/2883103002). Additionally addresses a todo: sets the persitent allocator state to deleted on crash. BUG=620813 ========== to ========== Stability instrumentation: metrics for collection on crash This CL revises adds metrics for stability instrumentation collection on crash. The metrics can be used to estimate unclean shutdown coverage in combination with postmortem collection (see https://crrev.com/2883103002). Additionally addresses a todo: sets the persitent allocator state to deleted on crash. BUG=620813 Review-Url: https://codereview.chromium.org/2910003002 Cr-Commit-Position: refs/heads/master@{#476319} Committed: https://chromium.googlesource.com/chromium/src/+/47f2c48469be9345d1f9ab668d05... ==========
Message was sent while issue was closed.
Committed patchset #9 (id:180001) as https://chromium.googlesource.com/chromium/src/+/47f2c48469be9345d1f9ab668d05... |