|
|
Chromium Code Reviews|
Created:
3 years, 10 months ago by manzagop (departed) Modified:
3 years, 9 months ago CC:
chromium-reviews, bcwhite Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionBound the impact of system instability on chrome instability.
Add to stability reports whether the containing system session was clean. This information is derived in the following way:
- add a timestamp to the stability file when chrome starts
- when processing unclean chrome shutdowns, inspect the system event logs for the past X sessions and interpret the stability file timestamps in that context
The log inspection and analysis is in system_session_analyzer_win*, the rest is plumbing.
BUG=695987
Review-Url: https://codereview.chromium.org/2715903003
Cr-Commit-Position: refs/heads/master@{#455476}
Committed: https://chromium.googlesource.com/chromium/src/+/deb62bfb3b4c7c2b78d6b6205766d9cf10ca39a0
Patch Set 1 #Patch Set 2 : Wire it in #Patch Set 3 : record start timestamp, analysis metric, make log scrape lazy #
Total comments: 4
Patch Set 4 : wfh comments #Patch Set 5 : BUILD fixup #Patch Set 6 : style + size mismatches #Patch Set 7 : Don't inline complex destructors #
Total comments: 20
Patch Set 8 : Address siggi@ comments #
Total comments: 21
Patch Set 9 : Address Siggi's comments #
Total comments: 26
Patch Set 10 : Address Siggi's comments #
Total comments: 8
Patch Set 11 : Merge #Patch Set 12 : Address Siggi's comments #Patch Set 13 : Merge #Patch Set 14 : Merge #Patch Set 15 : Merge fixups #Messages
Total messages: 58 (37 generated)
Description was changed from ========== Bound the impact of system instability on chrome instability. Label stability reports as potentially explained by system instability if they correspond to an unclean system session. BUG=695987 ========== to ========== Bound the impact of system instability on chrome instability. Add to stability reports whether the containing system session was clean. This information is derived in the following way: - add a timestamp to the stability file when chrome starts - when processing unclean chrome shutdowns, inspect the system event logs for the past X sessions and interpret the stability file timestamps in that context BUG=695987 ==========
Description was changed from ========== Bound the impact of system instability on chrome instability. Add to stability reports whether the containing system session was clean. This information is derived in the following way: - add a timestamp to the stability file when chrome starts - when processing unclean chrome shutdowns, inspect the system event logs for the past X sessions and interpret the stability file timestamps in that context BUG=695987 ========== to ========== Bound the impact of system instability on chrome instability. Add to stability reports whether the containing system session was clean. This information is derived in the following way: - add a timestamp to the stability file when chrome starts - when processing unclean chrome shutdowns, inspect the system event logs for the past X sessions and interpret the stability file timestamps in that context The log inspection and analysis is in system_session_analyzer_win*, the rest is plumbing. BUG=695987 ==========
manzagop@chromium.org changed reviewers: + bcwhite@chromium.org
Hi Brian, (+Will, optional / FYI) This change is to look at the event log as a possible explanation for stability issues. Please have a look. Thanks! Pierre
driveby but Q. when is this analysis done and on which thread?
wfh@chromium.org changed reviewers: + wfh@chromium.org
also had some nits https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... components/browser_watcher/postmortem_report_collector.h:109: void DetermineSystemSate(StabilityReport* report) const; typo https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... components/browser_watcher/system_session_analyzer_win.h:53: virtual Status IsSessionUnclean(base::Time timestamp); comment?
On 2017/02/27 22:36:44, Will Harris wrote: > driveby but Q. when is this analysis done and on which thread? (Heading out) - It's part of the metrics init tasks: https://cs.chromium.org/chromium/src/chrome/browser/metrics/chrome_metrics_se... - Note one of the function calls (IIRC EvtNext) asks for a timeout and I put 5 seconds.
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
manzagop@chromium.org changed reviewers: - bcwhite@chromium.org
Looks like Brian is OOO this week. Will: do you think you could take a look at this? If not I'd look for another reviewer. Thanks. https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... components/browser_watcher/postmortem_report_collector.h:109: void DetermineSystemSate(StabilityReport* report) const; On 2017/02/27 22:40:10, Will Harris wrote: > typo Done. https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/40001/components/browser_watc... components/browser_watcher/system_session_analyzer_win.h:53: virtual Status IsSessionUnclean(base::Time timestamp); On 2017/02/27 22:40:11, Will Harris wrote: > comment? Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_clang on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_clang/builds/...)
manzagop@chromium.org changed reviewers: + siggi@chromium.org
Hi Siggi, Brian is OOO this week. Could you have a look? Thanks!
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
nice, didn't review logic in great detail yet. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/BUILD.gn (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/BUILD.gn:101: static_library("system_session_analyzer") { can you glom all these static libraries into one? There's - I believe - a cost to each distinct library you add to the link, and there's no penalty for glomming things together that I can think of. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:21: const uint16_t kIdSessionStart = 6005U; do these come out of documentation somewhere? Maybe a link, or else a comment that these are from observation? https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:63: PLOG(ERROR) << "Failed to create render context."; DLOG, here and below. Also, the PLOG macros are useless, as they don't necessarily guarantee to grab the last error in time, plus they're brittle to maintenance. If you need the last error, explicitly grab it right after the call that caused the error. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:75: DWORD attribut_cnt = 2U; nit: attribut[e]_ https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:76: std::vector<EVT_VARIANT> buffer(attribut_cnt); running commentary through here would be nice, as this is an uncommonly used API. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:95: ULONGLONG time = buffer[1].FileTimeVal; ULARGE_INTEGER has a union to crack this? https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win_unittest.cc (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:18: TEST(SystemSessionEventFetcherTest, TestRetrieval) { mmm - will this fail on every new test bot until it's rebooted? Maybe add a comment that this might happen for a totally fresh bot? https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:36: In Syzygy we use "Mock"XXX - is there a tradition to use "Fake"XXX? https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:58: fetcher->AddEvent({6005U, time}); nit: can we have declarative constants for the magic values here? https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/watcher_metrics_provider_win.cc (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/watcher_metrics_provider_win.cc:273: std::unique_ptr<SystemSessionEventFetcher> fetcher( why can't these be simple locals? why the allocation? Are these things posting tasks and bouncing around threads?
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Addressed comments. Please have another look. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/BUILD.gn (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/BUILD.gn:101: static_library("system_session_analyzer") { On 2017/03/01 15:09:26, Sigurður Ásgeirsson wrote: > can you glom all these static libraries into one? There's - I believe - a cost > to each distinct library you add to the link, and there's no penalty for > glomming things together that I can think of. I think I got that notion from here: https://docs.google.com/presentation/d/15Zwb53JcncHfEwHpnG_PoIbbzQ3GQi_cpujYw... But your suggestion makes sense. I've merged :postmortem_minidump_writer and :system_session_analyzer into :postmortem_report_collector. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:21: const uint16_t kIdSessionStart = 6005U; On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > do these come out of documentation somewhere? Maybe a link, or else a comment > that these are from observation? Done. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:63: PLOG(ERROR) << "Failed to create render context."; On 2017/03/01 15:09:26, Sigurður Ásgeirsson wrote: > DLOG, here and below. Done. > Also, the PLOG macros are useless, as they don't necessarily guarantee to grab > the last error in time, plus they're brittle to maintenance. If you need the > last error, explicitly grab it right after the call that caused the error. Acknowledged. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:75: DWORD attribut_cnt = 2U; On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > nit: attribut[e]_ Done. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:76: std::vector<EVT_VARIANT> buffer(attribut_cnt); On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > running commentary through here would be nice, as this is an uncommonly used > API. Hopefully better. Done. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:95: ULONGLONG time = buffer[1].FileTimeVal; On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > ULARGE_INTEGER has a union to crack this? Done. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win_unittest.cc (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:18: TEST(SystemSessionEventFetcherTest, TestRetrieval) { On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > mmm - will this fail on every new test bot until it's rebooted? > Maybe add a comment that this might happen for a totally fresh bot? Yeah, this will happen and will be annoying. I've commented and disabled the test. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:36: On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > In Syzygy we use "Mock"XXX - is there a tradition to use "Fake"XXX? Actually, this is a stub. Mocks validate interaction, fakes are lightweight implementations of the real thing and stubs returned canned responses. Changed to Stub*. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:58: fetcher->AddEvent({6005U, time}); On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > nit: can we have declarative constants for the magic values here? Done. https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... File components/browser_watcher/watcher_metrics_provider_win.cc (right): https://codereview.chromium.org/2715903003/diff/120001/components/browser_wat... components/browser_watcher/watcher_metrics_provider_win.cc:273: std::unique_ptr<SystemSessionEventFetcher> fetcher( On 2017/03/01 15:09:27, Sigurður Ásgeirsson wrote: > why can't these be simple locals? why the allocation? > Are these things posting tasks and bouncing around threads? No reason. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
nice - got some more thoughts. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:350: // Note: a single process is instrumented. this comment is a little opaque - what does "instrumented" mean here? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:395: void PostmortemReportCollector::DetermineSystemState( Record{Previous|System}Shutdown{State|Reason}? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:401: // An analasis status for metrics: did the analysis succeed? analasis->analysis? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:429: UMA_HISTOGRAM_ENUMERATION( isn't this the terrible caching version of the histograms? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.h:106: virtual CollectionStatus Collect(const base::FilePath& debug_state_file, Nit: It would help the poor reader if there were a short comment on each of these functions as to what they do. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:91: DWORD attribute_cnt = 2U; nit: const kAttributeCnt? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:110: info->event_time = ULongLongTimeToFileTime(buffer[1].FileTimeVal); ULLFileTimeToTime, maybe, as this returns base::Time? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:50: explicit SystemSessionAnalyzer(SystemSessionEventFetcher* fetcher); If you (can) rearrange initialization, maybe it makes sense to just pass in the map here? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:54: virtual Status IsSessionUnclean(base::Time timestamp); looks like initialization here is implicit. Is there a good reason for this, e.g. are you doing this because most of the time it'll never happen? In that case, maybe you could mention this? If not, perhaps it's better to make initialization explicit, so that you can record/report it explicitly? EnsureInitialized() is an option... https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win_unittest.cc (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:25: // Note: disabling test as it requires the test host have at least 1 prior don't want to be a pest, but did you see this failing? I'd say it's better to have this test in place if it fails once a year, than not to have it. However, you'd want to leave a comment here saying "This test fails under <circumstances>."
Thanks! Another look? https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:350: // Note: a single process is instrumented. On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > this comment is a little opaque - what does "instrumented" mean here? Hopefully clearer. Done. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:395: void PostmortemReportCollector::DetermineSystemState( On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > Record{Previous|System}Shutdown{State|Reason}? Done. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:401: // An analasis status for metrics: did the analysis succeed? On 2017/03/02 21:52:57, Sigurður Ásgeirsson wrote: > analasis->analysis? Done. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:429: UMA_HISTOGRAM_ENUMERATION( On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > isn't this the terrible caching version of the histograms? FWIU this is the only version that has a shorthand (macro). Last time I had multiple uses of the macro for the same histogram. We ended up creating a function so the macro would be only used once per metric. Tell me what you think. If you feel strongly about it we should loop in metrics team, as I think most people use this without second thought. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/postmortem_report_collector.h (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.h:106: virtual CollectionStatus Collect(const base::FilePath& debug_state_file, On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > Nit: It would help the poor reader if there were a short comment on each of > these functions as to what they do. If you're ok with it, let's leave that for https://chromiumcodereview.appspot.com/2722223002/ https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:91: DWORD attribute_cnt = 2U; On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > nit: const kAttributeCnt? Done. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:110: info->event_time = ULongLongTimeToFileTime(buffer[1].FileTimeVal); On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > ULLFileTimeToTime, maybe, as this returns base::Time? Done. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:50: explicit SystemSessionAnalyzer(SystemSessionEventFetcher* fetcher); On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > If you (can) rearrange initialization, maybe it makes sense to just pass in the > map here? As commented below, using lazy initialization as most of the time the query is not necessary. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:54: virtual Status IsSessionUnclean(base::Time timestamp); On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > looks like initialization here is implicit. Is there a good reason for this, > e.g. are you doing this because most of the time it'll never happen? > In that case, maybe you could mention this? If not, perhaps it's better to make > initialization explicit, so that you can record/report it explicitly? > EnsureInitialized() is an option... That's correct: most of the time there will be no stability file and we won't need to query the system log. Expanded class comment. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win_unittest.cc (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/system_session_analyzer_win_unittest.cc:25: // Note: disabling test as it requires the test host have at least 1 prior On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > don't want to be a pest, but did you see this failing? I'd say it's better to > have this test in place if it fails once a year, than not to have it. > However, you'd want to leave a comment here saying "This test fails under > <circumstances>." Ok. I wasn't sure how prevalent the failures might be. +1 to let's see if it's a problem. Done.
Phew - very nice. I have moar nits. https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2715903003/diff/140001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:429: UMA_HISTOGRAM_ENUMERATION( On 2017/03/06 15:37:40, manzagop wrote: > On 2017/03/02 21:52:58, Sigurður Ásgeirsson wrote: > > isn't this the terrible caching version of the histograms? > > FWIU this is the only version that has a shorthand (macro). > > Last time I had multiple uses of the macro for the same histogram. We ended up > creating a function so the macro would be only used once per metric. > > Tell me what you think. If you feel strongly about it we should loop in metrics > team, as I think most people use this without second thought. It irks me that this generates a whole lot of code for optimize for efficiency on re-use. When a histogram is recorded only once in a process' lifetime, that's the wrong optimization to make. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:192: int PostmortemReportCollector::CollectAndSubmitForUpload( MaybeCollectAndSubmit or CollectAndSubmitAllPendingReports? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:249: PostmortemReportCollector::CollectAndSubmit( Per below comment, this could be collectAndSubmitOnereport? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:313: PostmortemReportCollector::CollectionStatus PostmortemReportCollector::Collect( I'm having trouble keeping pluralities straight in my head. Would this method perhaps be better named CollectOneReport? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:378: a quick comment here that we do this because the reporter is not necessarily the same version/bitness as the crashee would be nice? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:399: // A session state for the stability report: was the system session clean? rather than phrasing things as questions, maybe have a statement detailing what it is we record, and why? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/stability_report.proto (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/stability_report.proto:21: // The state of the system session that contained Chrome's execution. It's not necessarily obvious what a "system session" is, maybe document by the enum values above? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:70: DWORD value_cnt = 2U; nit: kValueCnt, maybe initialized from arraysize()? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:89: // specify the retrieval two attributes (event id and event time), each with nit: retrieval of? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:141: // Ensure we capture the last error immediately. Note: returning below to missing code for this - would have to ::GetLastError()? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:183: if (!init_success_) you might want to return a bool from Initialize, and assign it below. Also separate policy from mechanism in this small way, by doing e.g. if (!initalized_) { DCHECK(!init_success_); init_success_ = Initialize(); initialized_ = true; } This way the reader doesn't have to read the entire implementation to make sure these are maintained properly. Globals are evil, and so are member variables :). Also, if there are a great many ways to error out of initialization, I think you probably want the measure of how often that happens in - say - a histogram. Otherwise we've no way of assessing how successful this code is in parsing event logs in the wild? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:213: if (event_cnt < 3) don't we get stopped/started pairs for normal shutdown, startup? Oh, you're expecting to get either a started/stopped pair, or a started/unexpected pair, plus a started entry for the current session? Maybe make this explicit? Also, I'm not sure it's wise to filter this hard, "nature" is unclean and noisy. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:223: for (; start < event_cnt && end < event_cnt; start += 2, end += 2) { this loop needs some running commentary, please. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:17: class SystemSessionEventFetcher { This class appears to be a test seam, and it's state is an int. Does it make sense to fold it into the SystemSessionAnalyzer, maybe using a virtual function as a test seam?
Thanks! Another look? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/postmortem_report_collector.cc (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:192: int PostmortemReportCollector::CollectAndSubmitForUpload( On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > MaybeCollectAndSubmit or CollectAndSubmitAllPendingReports? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:249: PostmortemReportCollector::CollectAndSubmit( On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > Per below comment, this could be collectAndSubmitOnereport? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:313: PostmortemReportCollector::CollectionStatus PostmortemReportCollector::Collect( On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > I'm having trouble keeping pluralities straight in my head. Would this method > perhaps be better named CollectOneReport? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:378: On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > a quick comment here that we do this because the reporter is not necessarily the > same version/bitness as the crashee would be nice? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/postmortem_report_collector.cc:399: // A session state for the stability report: was the system session clean? On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > rather than phrasing things as questions, maybe have a statement detailing what > it is we record, and why? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/stability_report.proto (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/stability_report.proto:21: // The state of the system session that contained Chrome's execution. On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > It's not necessarily obvious what a "system session" is, maybe document by the > enum values above? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:70: DWORD value_cnt = 2U; On 2017/03/06 19:00:10, Sigurður Ásgeirsson wrote: > nit: kValueCnt, maybe initialized from arraysize()? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:89: // specify the retrieval two attributes (event id and event time), each with On 2017/03/06 19:00:10, Sigurður Ásgeirsson wrote: > nit: retrieval of? Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:141: // Ensure we capture the last error immediately. Note: returning below to On 2017/03/06 19:00:10, Sigurður Ásgeirsson wrote: > missing code for this - would have to ::GetLastError()? Removed, as I'm no longer using PLOG. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:183: if (!init_success_) On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > you might want to return a bool from Initialize, and assign it below. Also > separate policy from mechanism in this small way, by doing e.g. > > if (!initalized_) { > DCHECK(!init_success_); > init_success_ = Initialize(); > initialized_ = true; > } > > This way the reader doesn't have to read the entire implementation to make sure > these are maintained properly. Globals are evil, and so are member variables :). Done. > Also, if there are a great many ways to error out of initialization, I think you > probably want the measure of how often that happens in - say - a histogram. > Otherwise we've no way of assessing how successful this code is in parsing event > logs in the wild? I was thinking I'd wait and see how often the overall metric (ActivityTracker.Collect.SystemSessionAnalysisStatus) shows a failure before trying to drill down into the details of the errors. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:213: if (event_cnt < 3) On 2017/03/06 19:00:10, Sigurður Ásgeirsson wrote: > don't we get stopped/started pairs for normal shutdown, startup? > Oh, you're expecting to get either a started/stopped pair, or a > started/unexpected pair, plus a started entry for the current session? > Maybe make this explicit? Done. > Also, I'm not sure it's wise to filter this hard, "nature" is unclean and noisy. Start simple and iterate? https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:223: for (; start < event_cnt && end < event_cnt; start += 2, end += 2) { On 2017/03/06 19:00:09, Sigurður Ásgeirsson wrote: > this loop needs some running commentary, please. Done. https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/160001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:17: class SystemSessionEventFetcher { On 2017/03/06 19:00:10, Sigurður Ásgeirsson wrote: > This class appears to be a test seam, and it's state is an int. Does it make > sense to fold it into the SystemSessionAnalyzer, maybe using a virtual function > as a test seam? Done.
lgtm https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... File components/browser_watcher/fetch_system_session_events_main_win.cc (right): https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... components/browser_watcher/fetch_system_session_events_main_win.cc:14: using browser_watcher::SystemSessionEventFetcher; ubernit: put this in the sole scope where it's used. https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:187: coverage_start_ = events[event_cnt - 1].event_time; DCHECK event_cnt > 0 for defense against maintenance/extension regressions. https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:61: // delta to the next session start, and changes nothing wrt classiying nit: speling[sic] https://codereview.chromium.org/2715903003/diff/180001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2715903003/diff/180001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:79082: + <int value="3" label="Analysis failed"/> ah - this should allow you to bound the error rate. Missed that.
Thanks for the review! https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... File components/browser_watcher/fetch_system_session_events_main_win.cc (right): https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... components/browser_watcher/fetch_system_session_events_main_win.cc:14: using browser_watcher::SystemSessionEventFetcher; On 2017/03/07 14:38:39, Sigurður Ásgeirsson wrote: > ubernit: put this in the sole scope where it's used. Actually this doesn't exist anymore. Changed a few things. https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.cc (right): https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.cc:187: coverage_start_ = events[event_cnt - 1].event_time; On 2017/03/07 14:38:39, Sigurður Ásgeirsson wrote: > DCHECK event_cnt > 0 for defense against maintenance/extension regressions. Done. https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... File components/browser_watcher/system_session_analyzer_win.h (right): https://codereview.chromium.org/2715903003/diff/180001/components/browser_wat... components/browser_watcher/system_session_analyzer_win.h:61: // delta to the next session start, and changes nothing wrt classiying On 2017/03/07 14:38:39, Sigurður Ásgeirsson wrote: > nit: speling[sic] Done. https://codereview.chromium.org/2715903003/diff/180001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2715903003/diff/180001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:79082: + <int value="3" label="Analysis failed"/> On 2017/03/07 14:38:39, Sigurður Ásgeirsson wrote: > ah - this should allow you to bound the error rate. Missed that. Acknowledged.
manzagop@chromium.org changed reviewers: + jwd@chromium.org
Hi Jesse!
Could you have an OWNERS' look at:
chrome\browser\chrome_browser_field_trials_desktop.cc
tools\metrics\histograms\histograms.xml
Thanks!
Pierre
The CQ bit was checked by manzagop@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm
Thanks for the reviews. Going ahead with submission.
The CQ bit was checked by manzagop@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from siggi@chromium.org, jwd@chromium.org Link to the patchset: https://codereview.chromium.org/2715903003/#ps250015 (title: "Merge fixups")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by manzagop@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch.
Bot data: {"patchset_id": 250015, "attempt_start_ts": 1488989459853430,
"parent_rev": "05c38bb17ef885788bba95fa263089834d220446", "commit_rev":
"deb62bfb3b4c7c2b78d6b6205766d9cf10ca39a0"}
Message was sent while issue was closed.
Description was changed from ========== Bound the impact of system instability on chrome instability. Add to stability reports whether the containing system session was clean. This information is derived in the following way: - add a timestamp to the stability file when chrome starts - when processing unclean chrome shutdowns, inspect the system event logs for the past X sessions and interpret the stability file timestamps in that context The log inspection and analysis is in system_session_analyzer_win*, the rest is plumbing. BUG=695987 ========== to ========== Bound the impact of system instability on chrome instability. Add to stability reports whether the containing system session was clean. This information is derived in the following way: - add a timestamp to the stability file when chrome starts - when processing unclean chrome shutdowns, inspect the system event logs for the past X sessions and interpret the stability file timestamps in that context The log inspection and analysis is in system_session_analyzer_win*, the rest is plumbing. BUG=695987 Review-Url: https://codereview.chromium.org/2715903003 Cr-Commit-Position: refs/heads/master@{#455476} Committed: https://chromium.googlesource.com/chromium/src/+/deb62bfb3b4c7c2b78d6b6205766... ==========
Message was sent while issue was closed.
Committed patchset #15 (id:250015) as https://chromium.googlesource.com/chromium/src/+/deb62bfb3b4c7c2b78d6b6205766... |
