|
|
Created:
3 years, 9 months ago by Ilya Sherman Modified:
3 years, 9 months ago CC:
chromium-reviews, kalyank, sadrul, asvitkine+watch_chromium.org, agrieve+watch_chromium.org Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
Description[Android Crash Reporting] Simplify crash report upload code.
In particular: run the code to extract logcat output within the context
of the main Chrome process/service, rather than kicking off the work to
a pair of background services. Moreover, dramatically simplify the
MetricsUploadService code: remove all but one of the intents that it
handles.
BUG=694884
R=mariakhomenko@chromium.org
Review-Url: https://codereview.chromium.org/2727573004
Cr-Commit-Position: refs/heads/master@{#455704}
Committed: https://chromium.googlesource.com/chromium/src/+/a9a8f4e062bd1c72c3b511de3c41611e24b7949e
Patch Set 1 #Patch Set 2 : --similarity=20 #Patch Set 3 : Update tests #Patch Set 4 : Cleaned up a bit #Patch Set 5 : Rebase harder #
Total comments: 3
Patch Set 6 : Self-review #Patch Set 7 : This CL is now fully armed and operational #Patch Set 8 : Mebbe fix findbugs bugs? #Patch Set 9 : Mark IOException as thrown #Patch Set 10 : Simplify the test code exception handling a bit #
Total comments: 49
Patch Set 11 : 15-minute timeout for logcat extraction (and address other code review comments) #Patch Set 12 : Tuck up one more return stmt #
Total comments: 2
Patch Set 13 : Add a histogram to measure how many pending minidumps are present on startup #Patch Set 14 : Rebase #Patch Set 15 : Fix up a comment #
Total comments: 2
Messages
Total messages: 64 (41 generated)
Maria, PTAL. I still need to update some tests, but I've manually verified the functionality, so I think the non-test code is ready for review. In the interest of time, with the branch point approaching, I figure it's better to start discussing this code sooner rather than later. Also, I'm planning to inline the code from MinidumpLogcatPrepender.java into LogcatExtractionRunnable.java -- but I'll defer that to a follow-up CL, for clearer diffs.
Okay, tests are now updated as well. Maria, PTAL! =)
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...) ios-simulator-xcode-clang on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator-xco...) mac_chromium_compile_dbg_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_comp...) mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_...)
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2017/03/02 23:22:08, Ilya Sherman wrote: > Okay, tests are now updated as well. Maria, PTAL! =) Some high level comments: - You'll need code reviewers other than me, because I don't know enough to figure out pitfalls here, but I'll do a review too - Why do we have to only upload logcats for subprocess crashes? I would imagine logcats are primarily helpful with browser crashes, which we have a lot more than renderer ones?
https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:37: new Thread(new LogcatExtractionRunnable(context, path), "MinidumpLogcatWorkerThread") I would use AsyncTask here instead unless there's a reason to create a new thread explicitly.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:37: new Thread(new LogcatExtractionRunnable(context, path), "MinidumpLogcatWorkerThread") On 2017/03/03 00:50:41, Maria wrote: > I would use AsyncTask here instead unless there's a reason to create a new > thread explicitly. Chatted with some folks, the suggestion I got was to use AsyncTask.THREAD_POOL_EXECUTOR.execute(runnable), which will reuse a thread from thread pool, but won't bog you down in AsyncTask semantics.
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
isherman@chromium.org changed reviewers: + gsennton@chromium.org
Thanks for the review, Maria! I've now added code to handle appending logcat output to browser crashes as well, as long as those are recent. PTAL! Gustav, could you PTAL as well? I know this is outside of the Webview code, but you probably know the Chrome crash report uploading code better than anyone else at this point =) (Also, note to self: I have run automated tests for the latest patch set, but I have not yet manually sanity-checked things. I need to do that before submitting! Specifically, I should test that: (1) logcat data is included for subprocess crashes; (2) logcat data is included for browser crashes, but only if they're recent, and (3) forced upload of crash reports still works.) https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:37: new Thread(new LogcatExtractionRunnable(context, path), "MinidumpLogcatWorkerThread") On 2017/03/03 19:44:21, Maria wrote: > On 2017/03/03 00:50:41, Maria wrote: > > I would use AsyncTask here instead unless there's a reason to create a new > > thread explicitly. > > Chatted with some folks, the suggestion I got was to use > AsyncTask.THREAD_POOL_EXECUTOR.execute(runnable), which will reuse a thread from > thread pool, but won't bog you down in AsyncTask semantics. Thanks! Done.
Description was changed from ========== [Android Crash Reporting] Only append logcats to subprocess crashes. Moreover, do so within the main Chrome process/service, rather than kicking off the work to a pair of background services. BUG=694884 R=mariakhomenko@chromium.org ========== to ========== [Android Crash Reporting] Simplify crash report upload code. In particular: run the code to extract logcat output within the context of the main Chrome process/service, rather than kicking off the work to a pair of background services. Moreover, dramatically simplify the MetricsUploadService code: remove all but one of the intents that it handles. BUG=694884 R=mariakhomenko@chromium.org ==========
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_clang_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_clan...)
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_comp...)
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
I've now also manually verified that this all seems to work. PTAL!
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
Didn't look at the tests yet, will take a look tomorrow. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:255: TimeUnit.MILLISECONDS); looks like you are changing the semantics of when this histogram is reported to include users who opted out of crash uploading. I don't think we should do that. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:316: // Now check whether crash reporting is enable. If it is, broadcast the appropriate nit: enable -> enabled https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:320: if (crashReportingDisabled) { nit: feel free to use if (crashReportingDisabled) return; one-liner. This is common style in clank code. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:328: if (minidumps.length == 0) { ditto https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:332: Log.i(TAG, "Attempting to upload accumulated crash dumps."); Might as well log how many there are https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333: if (!doesCrashMinidumpNeedLogcat(minidumps[0])) { Do we have some sort of guarantee on what the first minidump returned is? Do we now just attach logcat to one random minidump? I think it would make sense to attach to the freshest dump. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; How did you choose the duration? I would absolutely support this being much shorter, e.g. 15 minutes. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119: public LogcatExtractionRunnable(Context context, File minidump) { So I know we are getting rid of the services here, but what's the reason we can't just attach the logcat to all minidumps like we used to do? Prepend all logcats here and then have the job scheduler take care of uploads? https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java:96: if (!intent.getAction().equals(ACTION_UPLOAD)) { nit: !ACTION_UPLOAD.equals(intent.getAction()) is a safer way to check
I also haven't looked at the tests yet. Is all the heavy-duty work put in the async-executor thread pool now? (I believe intents are handled on the main thread). https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:317: // permisison. nit: permisison -> permission https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333: if (!doesCrashMinidumpNeedLogcat(minidumps[0])) { On 2017/03/07 06:36:39, Maria wrote: > Do we have some sort of guarantee on what the first minidump returned is? Do we > now just attach logcat to one random minidump? I think it would make sense to > attach to the freshest dump. Please double-check whether the first minidump here is the newest or the oldest one (I remember that I was surprised about the order at some point but can't remember which way it is from the top of my head). Also, why only attach the logcat to the most recent (I assume) one? https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335: } else { Would you mind switching the if- and else-statements here? IMO it is more clear when the if-statement doesn't include a negation. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:356: * it provides a cool-off period that can help circumvent a possible infinite crash loop, if the The cool-off period, is that the infinite period after the first duration (currently an hour) where we can attach a logcat? https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/07 06:36:39, Maria wrote: > How did you choose the duration? I would absolutely support this being much > shorter, e.g. 15 minutes. What's the reasoning behind such a short duration? Looking at a logcat dump from my personal phone (running O - so probably a lot spammier than a regular user), I can see system output back to 4 and a half hours before the current time. I would not be surprised if a normal user could see logcat output 24+ hours after it is logged. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (left): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146: if (context == null) { Is this check unnecessary? https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:45: private static final String GOOD_IRI_CHAR = "a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"; Not sure if you want to make these format changes in this CL? https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:29: private MinidumpDirectoryObserver(CrashFileManager fileManager) { Can we just inline this into the public ctor?
https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/07 18:32:04, gsennton wrote: > On 2017/03/07 06:36:39, Maria wrote: > > How did you choose the duration? I would absolutely support this being much > > shorter, e.g. 15 minutes. > > What's the reasoning behind such a short duration? Looking at a logcat dump from > my personal phone (running O - so probably a lot spammier than a regular user), > I can see system output back to 4 and a half hours before the current time. > I would not be surprised if a normal user could see logcat output 24+ hours > after it is logged. The problem is that we only grab 256 lines from the log. On my personal user build N device that covers less than one minute of logs (and all I did was turn the screen on). I was just thinking if we are crashing and this is the way the user gets back into their Chrome, shorter is better. I guess I don't have a good feel for the duration between the browser crashes and Chrome is reopened again as well as how often we get good logs in that scenario.
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:255: TimeUnit.MILLISECONDS); On 2017/03/07 06:36:39, Maria wrote: > looks like you are changing the semantics of when this histogram is reported to > include users who opted out of crash uploading. I don't think we should do that. Done. Good catch! I misread when this histogram was recorded and what it means. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:316: // Now check whether crash reporting is enable. If it is, broadcast the appropriate On 2017/03/07 06:36:39, Maria wrote: > nit: enable -> enabled Done. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:317: // permisison. On 2017/03/07 18:32:04, gsennton wrote: > nit: permisison -> permission Done. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:320: if (crashReportingDisabled) { On 2017/03/07 06:36:39, Maria wrote: > nit: feel free to use > > if (crashReportingDisabled) return; > > one-liner. This is common style in clank code. Done. Yay! This was bugging me, but I wasn't sure whether it was acceptable to collapse the code to a single line =) https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:328: if (minidumps.length == 0) { On 2017/03/07 06:36:39, Maria wrote: > ditto Done. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:332: Log.i(TAG, "Attempting to upload accumulated crash dumps."); On 2017/03/07 06:36:39, Maria wrote: > Might as well log how many there are Done. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333: if (!doesCrashMinidumpNeedLogcat(minidumps[0])) { On 2017/03/07 18:32:04, gsennton wrote: > On 2017/03/07 06:36:39, Maria wrote: > > Do we have some sort of guarantee on what the first minidump returned is? Do > we > > now just attach logcat to one random minidump? I think it would make sense to > > attach to the freshest dump. > > Please double-check whether the first minidump here is the newest or the oldest > one (I remember that I was surprised about the order at some point but can't > remember which way it is from the top of my head). > > Also, why only attach the logcat to the most recent (I assume) one? I've double-checked that this is indeed the freshest minidump. I chose to attach the logcat to just the most recent crash for a couple of reasons: (1) We always grab the most recent logcat output, so it seems most likely to be relevant to the most recent crash; and (2) honestly, I'd expect it to almost never happen that we have multiple pending crash reports that need logcat output attached. Given that, I didn't want to support the extra complexity needed to attach logcat output to all the pending minidumps. That said, if y'all think it's worth a bit of extra complexity, we certainly could cover all of the minidumps. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335: } else { On 2017/03/07 18:32:04, gsennton wrote: > Would you mind switching the if- and else-statements here? IMO it is more clear > when the if-statement doesn't include a negation. I can do that, but let me run this by you first: WDYT of the code with some named variables extracted? I generally prefer to have the shorter/simpler branch come first, for scannability. But if you think it's still less clear this way, I'm happy to swap the order. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:356: * it provides a cool-off period that can help circumvent a possible infinite crash loop, if the On 2017/03/07 18:32:04, gsennton wrote: > The cool-off period, is that the infinite period after the first duration > (currently an hour) where we can attach a logcat? Right. I updated the comment to hopefully clarify a bit -- please let me know if it's still not quite clear. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/07 20:04:57, Maria wrote: > On 2017/03/07 18:32:04, gsennton wrote: > > On 2017/03/07 06:36:39, Maria wrote: > > > How did you choose the duration? I would absolutely support this being much > > > shorter, e.g. 15 minutes. > > > > What's the reasoning behind such a short duration? Looking at a logcat dump > from > > my personal phone (running O - so probably a lot spammier than a regular > user), > > I can see system output back to 4 and a half hours before the current time. > > I would not be surprised if a normal user could see logcat output 24+ hours > > after it is logged. > > The problem is that we only grab 256 lines from the log. On my personal user > build N device that covers less than one minute of logs (and all I did was turn > the screen on). > > I was just thinking if we are crashing and this is the way the user gets back > into their Chrome, shorter is better. I guess I don't have a good feel for the > duration between the browser crashes and Chrome is reopened again as well as how > often we get good logs in that scenario. I picked semi-randomly :P I'm happy to start with a 15-minute period, and adjust that if we find that logcat output is too often missing. Conversely, I'm also content with increasing the period to something longer, if we believe there's useful data that will be extracted. I think the likelihood of the code entering an infinite crash loop is relatively low, so I'm okay with making the tradeoff in favor of better logcat data. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (left): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146: if (context == null) { On 2017/03/07 18:32:04, gsennton wrote: > Is this check unnecessary? I think so? We pass the application context into this method. Is it accurate to assume that the application context is always non-null? https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:45: private static final String GOOD_IRI_CHAR = "a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"; On 2017/03/07 18:32:04, gsennton wrote: > Not sure if you want to make these format changes in this CL? You're right, probably not. Oh, the joys of clang autoformatting... I split off https://codereview.chromium.org/2735133003/ to format ALL the code. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119: public LogcatExtractionRunnable(Context context, File minidump) { On 2017/03/07 06:36:39, Maria wrote: > So I know we are getting rid of the services here, but what's the reason we > can't just attach the logcat to all minidumps like we used to do? Prepend all > logcats here and then have the job scheduler take care of uploads? I chose to attach the logcat to just the most recent crash for a couple of reasons: (1) We always grab the most recent logcat output, so it seems most likely to be relevant to the most recent crash; and (2) honestly, I'd expect it to almost never happen that we have multiple pending crash reports that need logcat output attached. Given that, I didn't want to support the extra complexity needed to attach logcat output to all the pending minidumps. That said, if y'all think it's worth a bit of extra complexity, we certainly could cover all of the minidumps. (Gustav asked the same thing in a comment on DeferredStartupHandler.java, so I'm including this reply twice) https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:29: private MinidumpDirectoryObserver(CrashFileManager fileManager) { On 2017/03/07 18:32:04, gsennton wrote: > Can we just inline this into the public ctor? I don't know of a way to pass the same CrashFileManager to super() and to store into mFileManager with just a single no-argument constructor. We could construct the file manager twice, or we could have clients pass it into the constructor. Do either of those sound preferable, and/or is there a nice trick I should know about? I cribbed the current approach from StackOverflow =) https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java:96: if (!intent.getAction().equals(ACTION_UPLOAD)) { On 2017/03/07 06:36:39, Maria wrote: > nit: !ACTION_UPLOAD.equals(intent.getAction()) is a safer way to check Done. Is it safer because intent.getAction() could theoretically be null?
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
lgtm https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 20:04:57, Maria wrote: > > On 2017/03/07 18:32:04, gsennton wrote: > > > On 2017/03/07 06:36:39, Maria wrote: > > > > How did you choose the duration? I would absolutely support this being > much > > > > shorter, e.g. 15 minutes. > > > > > > What's the reasoning behind such a short duration? Looking at a logcat dump > > from > > > my personal phone (running O - so probably a lot spammier than a regular > > user), > > > I can see system output back to 4 and a half hours before the current time. > > > I would not be surprised if a normal user could see logcat output 24+ hours > > > after it is logged. > > > > The problem is that we only grab 256 lines from the log. On my personal user > > build N device that covers less than one minute of logs (and all I did was > turn > > the screen on). > > > > I was just thinking if we are crashing and this is the way the user gets back > > into their Chrome, shorter is better. I guess I don't have a good feel for the > > duration between the browser crashes and Chrome is reopened again as well as > how > > often we get good logs in that scenario. > > I picked semi-randomly :P I'm happy to start with a 15-minute period, and > adjust that if we find that logcat output is too often missing. Conversely, I'm > also content with increasing the period to something longer, if we believe > there's useful data that will be extracted. I think the likelihood of the code > entering an infinite crash loop is relatively low, so I'm okay with making the > tradeoff in favor of better logcat data. Hmm, one thing I thought of: logcat data returned for an application would only contain log entries generated by the same application. So in theory even though we only get last 256 lines, we may be able to get the last crash even much later because those would be the last Chrome-generated log lines. I think this convinces me to go back on my suggestion to make this smaller -- let's do what Gustav suggested and make it longer. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (left): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146: if (context == null) { On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 18:32:04, gsennton wrote: > > Is this check unnecessary? > > I think so? We pass the application context into this method. Is it accurate > to assume that the application context is always non-null? I don't think it can be null anymore -- I think there are cases where it can be null within a service, but we are no longer calling this from a service. If anything, this should have been assert context != null; https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119: public LogcatExtractionRunnable(Context context, File minidump) { On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 06:36:39, Maria wrote: > > So I know we are getting rid of the services here, but what's the reason we > > can't just attach the logcat to all minidumps like we used to do? Prepend all > > logcats here and then have the job scheduler take care of uploads? > > I chose to attach the logcat to just the most recent crash for a couple of > reasons: (1) We always grab the most recent logcat output, so it seems most > likely to be relevant to the most recent crash; and (2) honestly, I'd expect it > to almost never happen that we have multiple pending crash reports that need > logcat output attached. Given that, I didn't want to support the extra > complexity needed to attach logcat output to all the pending minidumps. > > That said, if y'all think it's worth a bit of extra complexity, we certainly > could cover all of the minidumps. > > (Gustav asked the same thing in a comment on DeferredStartupHandler.java, so I'm > including this reply twice) What do you think about adding a histogram in the calling code to record how many pending crashes we are uploading? I would be cool with just adding logcat to one crash right now and checking that histogram. If we see the non-1 case often enough, we could consider re-attaching logcats to everything later on (I don't think it's a blocker for M-58 impl). Though I agree, the value of it is still pretty questionable. It would probably need to be paired with some code to find the right dump in the logcat and grab logcat preceding it. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java:96: if (!intent.getAction().equals(ACTION_UPLOAD)) { On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 06:36:39, Maria wrote: > > nit: !ACTION_UPLOAD.equals(intent.getAction()) is a safer way to check > > Done. Is it safer because intent.getAction() could theoretically be null? Yes (and I believe it happens sometimes in practice too, though probably not to intents we construct ourselves). I generally like the style of CONST.equals(func()) and have been trying to get everyone to adopt it :)
Thanks! I'll address your comments in the morning, and also wait for Gustav's review prior to committing =) https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/08 06:18:47, Maria wrote: > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > On 2017/03/07 20:04:57, Maria wrote: > > > On 2017/03/07 18:32:04, gsennton wrote: > > > > On 2017/03/07 06:36:39, Maria wrote: > > > > > How did you choose the duration? I would absolutely support this being > > much > > > > > shorter, e.g. 15 minutes. > > > > > > > > What's the reasoning behind such a short duration? Looking at a logcat > dump > > > from > > > > my personal phone (running O - so probably a lot spammier than a regular > > > user), > > > > I can see system output back to 4 and a half hours before the current > time. > > > > I would not be surprised if a normal user could see logcat output 24+ > hours > > > > after it is logged. > > > > > > The problem is that we only grab 256 lines from the log. On my personal user > > > build N device that covers less than one minute of logs (and all I did was > > turn > > > the screen on). > > > > > > I was just thinking if we are crashing and this is the way the user gets > back > > > into their Chrome, shorter is better. I guess I don't have a good feel for > the > > > duration between the browser crashes and Chrome is reopened again as well as > > how > > > often we get good logs in that scenario. > > > > I picked semi-randomly :P I'm happy to start with a 15-minute period, and > > adjust that if we find that logcat output is too often missing. Conversely, > I'm > > also content with increasing the period to something longer, if we believe > > there's useful data that will be extracted. I think the likelihood of the > code > > entering an infinite crash loop is relatively low, so I'm okay with making the > > tradeoff in favor of better logcat data. > > Hmm, one thing I thought of: logcat data returned for an application would only > contain log entries generated by the same application. So in theory even though > we only get last 256 lines, we may be able to get the last crash even much later > because those would be the last Chrome-generated log lines. > > I think this convinces me to go back on my suggestion to make this smaller -- > let's do what Gustav suggested and make it longer. Okay, sounds good. Any specific suggestions on how long to make it? ~12 hours?
Regarding the 'append-logcat-to-only-one-minidump' question, it sounds like having a logcat for just one minidump is fine for now (as long as we add a histogram at some point to know whether we ever hit the several-minidumps-in-a-row scenario). https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333: if (!doesCrashMinidumpNeedLogcat(minidumps[0])) { On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 18:32:04, gsennton wrote: > > On 2017/03/07 06:36:39, Maria wrote: > > > Do we have some sort of guarantee on what the first minidump returned is? Do > > we > > > now just attach logcat to one random minidump? I think it would make sense > to > > > attach to the freshest dump. > > > > Please double-check whether the first minidump here is the newest or the > oldest > > one (I remember that I was surprised about the order at some point but can't > > remember which way it is from the top of my head). > > > > Also, why only attach the logcat to the most recent (I assume) one? > > I've double-checked that this is indeed the freshest minidump. > > I chose to attach the logcat to just the most recent crash for a couple of > reasons: (1) We always grab the most recent logcat output, so it seems most > likely to be relevant to the most recent crash; and (2) honestly, I'd expect it > to almost never happen that we have multiple pending crash reports that need > logcat output attached. Given that, I didn't want to support the extra > complexity needed to attach logcat output to all the pending minidumps. > > That said, if y'all think it's worth a bit of extra complexity, we certainly > could cover all of the minidumps. I would say there are two different scenarios we care about here: 1. Chrome crashing over and over again (the user hits a reproducible crash) - in this case the logcat would probably be relevant for all of the queued minidump uploads, unless they are uploaded directly after the crash occurs (and I believe it would be nice with a logcat attached to each report here - I guess the investigator could look up the client ID to try to find reports close in time but I doubt investigators will do this in general). 2. Chrome hits several independent/unrelated crashes within the logcat-keeping time (15 minutes, or 12 hours or w/e), and won't upload each report directly (e.g. because we don't have WiFi). I have no idea how often this happens, but with a logcat-keeping time of ~12 hours this seems like it could happen every now and then (especially on Canary/Dev/Beta). Again, I don't really know how likely these events are.. it would be nice with a dremel query telling us how often the same device crashes twice within a short time-period ;) https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335: } else { On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 18:32:04, gsennton wrote: > > Would you mind switching the if- and else-statements here? IMO it is more > clear > > when the if-statement doesn't include a negation. > > I can do that, but let me run this by you first: WDYT of the code with some > named variables extracted? I generally prefer to have the shorter/simpler What do you mean by "some named variables extracted" in this case? (could you provide an example) > branch come first, for scannability. But if you think it's still less clear > this way, I'm happy to swap the order. I don't have a super-strong opinion here (and especially don't want to block the whole CL on this), go with what you feel is right. Note: one option could be to do something like if (doesCrashMinidumpNeedLogcat(minidumps[0])) { AsyncTask.THREAD_POOL_EXECUTOR.execute( new LogcatExtractionRunnable(mAppContext, minidumps[0])); } else { MinidumpUploadService.tryUploadCrashDump(mAppContext, minidumps[0]); } List<File> remainingMinidumps = Arrays.asList(minidumps).subList(1, minidumps.length); for (File minidump : remainingMinidumps) { MinidumpUploadService.tryUploadCrashDump(mAppContext, minidump); } https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/08 09:12:30, Ilya Sherman wrote: > On 2017/03/08 06:18:47, Maria wrote: > > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > > On 2017/03/07 20:04:57, Maria wrote: > > > > On 2017/03/07 18:32:04, gsennton wrote: > > > > > On 2017/03/07 06:36:39, Maria wrote: > > > > > > How did you choose the duration? I would absolutely support this being > > > much > > > > > > shorter, e.g. 15 minutes. > > > > > > > > > > What's the reasoning behind such a short duration? Looking at a logcat > > dump > > > > from > > > > > my personal phone (running O - so probably a lot spammier than a regular > > > > user), > > > > > I can see system output back to 4 and a half hours before the current > > time. > > > > > I would not be surprised if a normal user could see logcat output 24+ > > hours > > > > > after it is logged. > > > > > > > > The problem is that we only grab 256 lines from the log. On my personal > user > > > > build N device that covers less than one minute of logs (and all I did was > > > turn > > > > the screen on). > > > > > > > > I was just thinking if we are crashing and this is the way the user gets > > back > > > > into their Chrome, shorter is better. I guess I don't have a good feel for > > the > > > > duration between the browser crashes and Chrome is reopened again as well > as > > > how > > > > often we get good logs in that scenario. > > > > > > I picked semi-randomly :P I'm happy to start with a 15-minute period, and > > > adjust that if we find that logcat output is too often missing. Conversely, > > I'm > > > also content with increasing the period to something longer, if we believe > > > there's useful data that will be extracted. I think the likelihood of the > > code > > > entering an infinite crash loop is relatively low, so I'm okay with making > the > > > tradeoff in favor of better logcat data. > > > > Hmm, one thing I thought of: logcat data returned for an application would > only > > contain log entries generated by the same application. So in theory even > though > > we only get last 256 lines, we may be able to get the last crash even much > later > > because those would be the last Chrome-generated log lines. > > > > I think this convinces me to go back on my suggestion to make this smaller -- > > let's do what Gustav suggested and make it longer. > > Okay, sounds good. Any specific suggestions on how long to make it? ~12 hours? Ideally we would just read the time-stamp of the first line of logcat we read and check whether that is before (or slightly after) the time of the minidump generation? But I guess doing so will be brittle? Otherwise it would be great to count the time chrome has been active rather than the real-life time that has passed (using something like UsageStatsManager.queryUsageStats() - if that makes sense - though that API was introduced in API level 21). [if none of the above sound promising - or if you feel like implementing one of those for 59 instead] 12 hours sounds fairly reasonable to me :P (maybe run a logcat dump on your own personal device to check how far back in time you reach with 256 lines of Chrome-logs?) https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (left): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146: if (context == null) { On 2017/03/08 06:18:47, Maria wrote: > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > On 2017/03/07 18:32:04, gsennton wrote: > > > Is this check unnecessary? > > > > I think so? We pass the application context into this method. Is it accurate > > to assume that the application context is always non-null? > > I don't think it can be null anymore -- I think there are cases where it can be > null within a service, but we are no longer calling this from a service. If > anything, this should have been assert context != null; The application context should never be null. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:45: private static final String GOOD_IRI_CHAR = "a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF"; On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 18:32:04, gsennton wrote: > > Not sure if you want to make these format changes in this CL? > > You're right, probably not. Oh, the joys of clang autoformatting... I split > off https://codereview.chromium.org/2735133003/ to format ALL the code. Nice, thanks! https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119: public LogcatExtractionRunnable(Context context, File minidump) { On 2017/03/08 06:18:47, Maria wrote: > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > On 2017/03/07 06:36:39, Maria wrote: > > > So I know we are getting rid of the services here, but what's the reason we > > > can't just attach the logcat to all minidumps like we used to do? Prepend > all > > > logcats here and then have the job scheduler take care of uploads? > > > > I chose to attach the logcat to just the most recent crash for a couple of > > reasons: (1) We always grab the most recent logcat output, so it seems most > > likely to be relevant to the most recent crash; and (2) honestly, I'd expect > it > > to almost never happen that we have multiple pending crash reports that need > > logcat output attached. Given that, I didn't want to support the extra > > complexity needed to attach logcat output to all the pending minidumps. > > > > That said, if y'all think it's worth a bit of extra complexity, we certainly > > could cover all of the minidumps. > > > > (Gustav asked the same thing in a comment on DeferredStartupHandler.java, so > I'm > > including this reply twice) > > What do you think about adding a histogram in the calling code to record how > many pending crashes we are uploading? I would be cool with just adding logcat > to one crash right now and checking that histogram. If we see the non-1 case > often enough, we could consider re-attaching logcats to everything later on (I > don't think it's a blocker for M-58 impl). Though I agree, the value of it is > still pretty questionable. It would probably need to be paired with some code to > find the right dump in the logcat and grab logcat preceding it. Ah, good idea to add a histogram! It would be great to be able to make an informed decision on this (without blocking this CL for 58). https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:29: private MinidumpDirectoryObserver(CrashFileManager fileManager) { On 2017/03/08 02:32:44, Ilya Sherman wrote: > On 2017/03/07 18:32:04, gsennton wrote: > > Can we just inline this into the public ctor? > > I don't know of a way to pass the same CrashFileManager to super() and to store > into mFileManager with just a single no-argument constructor. We could > construct the file manager twice, or we could have clients pass it into the > constructor. Do either of those sound preferable, and/or is there a nice trick > I should know about? I cribbed the current approach from StackOverflow =) Ah ok, I just didn't see the problem, nice solution ;) https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:361: * would be circumvented. Cool, thanks for updating the explanation :). nit: should it be "at which point this potential crash loop would be circumvented." or "at which point this potential crash loop will be circumvented." ?
lgtm modulo going through the latest comments ;) (and following up on Maria's note on adding a histogram for the logcat/minidump upload).
The CQ bit was checked by isherman@chromium.org to run a CQ dry run
Thank you both for the review! Maria, would you mind taking one last look to sanity-check the post-approval changes? https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333: if (!doesCrashMinidumpNeedLogcat(minidumps[0])) { On 2017/03/08 15:18:41, gsennton wrote: > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > On 2017/03/07 18:32:04, gsennton wrote: > > > On 2017/03/07 06:36:39, Maria wrote: > > > > Do we have some sort of guarantee on what the first minidump returned is? > Do > > > we > > > > now just attach logcat to one random minidump? I think it would make sense > > to > > > > attach to the freshest dump. > > > > > > Please double-check whether the first minidump here is the newest or the > > oldest > > > one (I remember that I was surprised about the order at some point but can't > > > remember which way it is from the top of my head). > > > > > > Also, why only attach the logcat to the most recent (I assume) one? > > > > I've double-checked that this is indeed the freshest minidump. > > > > I chose to attach the logcat to just the most recent crash for a couple of > > reasons: (1) We always grab the most recent logcat output, so it seems most > > likely to be relevant to the most recent crash; and (2) honestly, I'd expect > it > > to almost never happen that we have multiple pending crash reports that need > > logcat output attached. Given that, I didn't want to support the extra > > complexity needed to attach logcat output to all the pending minidumps. > > > > That said, if y'all think it's worth a bit of extra complexity, we certainly > > could cover all of the minidumps. > > I would say there are two different scenarios we care about here: > 1. Chrome crashing over and over again (the user hits a reproducible crash) - in > this case the logcat would probably be relevant for all of the queued minidump > uploads, unless they are uploaded directly after the crash occurs (and I believe > it would be nice with a logcat attached to each report here - I guess the > investigator could look up the client ID to try to find reports close in time > but I doubt investigators will do this in general). It would have to be a crash that brings down the browser process before we have a chance to attach the logcat output -- so most likely, a crash on startup. It's certainly possible! But I'd worry that attaching the logcat output to the first crash in a series might be confusing, since it might have future output. (It would make more sense if we do something smarter, along the lines of attaching logcat output up to the timestamp of the crash. That's harder to implement correctly, though =)) > 2. Chrome hits several independent/unrelated crashes within the logcat-keeping > time (15 minutes, or 12 hours or w/e), and won't upload each report directly > (e.g. because we don't have WiFi). I have no idea how often this happens, but > with a logcat-keeping time of ~12 hours this seems like it could happen every > now and then (especially on Canary/Dev/Beta). Again, these would pretty much need to be crashes on startup. Once the logcat output is attached, we don't need to worry about that particular minidump anymore -- it's never detached =) > > Again, I don't really know how likely these events are.. it would be nice with a > dremel query telling us how often the same device crashes twice within a short > time-period ;) https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335: } else { On 2017/03/08 15:18:41, gsennton wrote: > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > On 2017/03/07 18:32:04, gsennton wrote: > > > Would you mind switching the if- and else-statements here? IMO it is more > > clear > > > when the if-statement doesn't include a negation. > > > > I can do that, but let me run this by you first: WDYT of the code with some > > named variables extracted? I generally prefer to have the shorter/simpler > > What do you mean by "some named variables extracted" in this case? (could you > provide an example) I meant what I included in the latest patch set ;-) > > branch come first, for scannability. But if you think it's still less clear > > this way, I'm happy to swap the order. > > I don't have a super-strong opinion here (and especially don't want to block the > whole CL on this), go with what you feel is right. > Note: one option could be to do something like > > if (doesCrashMinidumpNeedLogcat(minidumps[0])) { > AsyncTask.THREAD_POOL_EXECUTOR.execute( > new LogcatExtractionRunnable(mAppContext, minidumps[0])); > } else { > MinidumpUploadService.tryUploadCrashDump(mAppContext, minidumps[0]); > } > List<File> remainingMinidumps = Arrays.asList(minidumps).subList(1, > minidumps.length); > for (File minidump : remainingMinidumps) { > MinidumpUploadService.tryUploadCrashDump(mAppContext, minidump); > } Yeah, I think that's a better way to write it, at least for the current CL. I was kind of thinking about this with the next CL in mind, but can wait to cross that bridge until I actually come to it =) Thanks for the suggestion! Done. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/08 15:18:41, gsennton wrote: > On 2017/03/08 09:12:30, Ilya Sherman wrote: > > On 2017/03/08 06:18:47, Maria wrote: > > > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > > > On 2017/03/07 20:04:57, Maria wrote: > > > > > On 2017/03/07 18:32:04, gsennton wrote: > > > > > > On 2017/03/07 06:36:39, Maria wrote: > > > > > > > How did you choose the duration? I would absolutely support this > being > > > > much > > > > > > > shorter, e.g. 15 minutes. > > > > > > > > > > > > What's the reasoning behind such a short duration? Looking at a logcat > > > dump > > > > > from > > > > > > my personal phone (running O - so probably a lot spammier than a > regular > > > > > user), > > > > > > I can see system output back to 4 and a half hours before the current > > > time. > > > > > > I would not be surprised if a normal user could see logcat output 24+ > > > hours > > > > > > after it is logged. > > > > > > > > > > The problem is that we only grab 256 lines from the log. On my personal > > user > > > > > build N device that covers less than one minute of logs (and all I did > was > > > > turn > > > > > the screen on). > > > > > > > > > > I was just thinking if we are crashing and this is the way the user gets > > > back > > > > > into their Chrome, shorter is better. I guess I don't have a good feel > for > > > the > > > > > duration between the browser crashes and Chrome is reopened again as > well > > as > > > > how > > > > > often we get good logs in that scenario. > > > > > > > > I picked semi-randomly :P I'm happy to start with a 15-minute period, and > > > > adjust that if we find that logcat output is too often missing. > Conversely, > > > I'm > > > > also content with increasing the period to something longer, if we believe > > > > there's useful data that will be extracted. I think the likelihood of the > > > code > > > > entering an infinite crash loop is relatively low, so I'm okay with making > > the > > > > tradeoff in favor of better logcat data. > > > > > > Hmm, one thing I thought of: logcat data returned for an application would > > only > > > contain log entries generated by the same application. So in theory even > > though > > > we only get last 256 lines, we may be able to get the last crash even much > > later > > > because those would be the last Chrome-generated log lines. > > > > > > I think this convinces me to go back on my suggestion to make this smaller > -- > > > let's do what Gustav suggested and make it longer. > > > > Okay, sounds good. Any specific suggestions on how long to make it? ~12 > hours? > > Ideally we would just read the time-stamp of the first line of logcat we read > and check whether that is before (or slightly after) the time of the minidump > generation? But I guess doing so will be brittle? I'm not sure whether it's brittle, but it's harder to implement correctly, given how we buffer the logcat output. I think it's probably the right direction, but I'd prefer not to try to land something like this for M58 =) > Otherwise it would be great to count the time chrome has been active rather than > the real-life time that has passed (using something like > UsageStatsManager.queryUsageStats() - if that makes sense - though that API was > introduced in API level 21). That would likely make the escape-hatch not very viable. Plus, the relevant question is mostly "Is there still Chrome data in the logcat, or has it all aged out?" The logcat is a shared resource, and data could age out based on other app use. > [if none of the above sound promising - or if you feel like implementing one of > those for 59 instead] 12 hours sounds fairly reasonable to me :P > (maybe run a logcat dump on your own personal device to check how far back in > time you reach with 256 lines of Chrome-logs?) > https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119: public LogcatExtractionRunnable(Context context, File minidump) { On 2017/03/08 15:18:41, gsennton wrote: > On 2017/03/08 06:18:47, Maria wrote: > > On 2017/03/08 02:32:44, Ilya Sherman wrote: > > > On 2017/03/07 06:36:39, Maria wrote: > > > > So I know we are getting rid of the services here, but what's the reason > we > > > > can't just attach the logcat to all minidumps like we used to do? Prepend > > all > > > > logcats here and then have the job scheduler take care of uploads? > > > > > > I chose to attach the logcat to just the most recent crash for a couple of > > > reasons: (1) We always grab the most recent logcat output, so it seems most > > > likely to be relevant to the most recent crash; and (2) honestly, I'd expect > > it > > > to almost never happen that we have multiple pending crash reports that need > > > logcat output attached. Given that, I didn't want to support the extra > > > complexity needed to attach logcat output to all the pending minidumps. > > > > > > That said, if y'all think it's worth a bit of extra complexity, we certainly > > > could cover all of the minidumps. > > > > > > (Gustav asked the same thing in a comment on DeferredStartupHandler.java, so > > I'm > > > including this reply twice) > > > > What do you think about adding a histogram in the calling code to record how > > many pending crashes we are uploading? I would be cool with just adding logcat > > to one crash right now and checking that histogram. If we see the non-1 case > > often enough, we could consider re-attaching logcats to everything later on (I > > don't think it's a blocker for M-58 impl). Though I agree, the value of it is > > still pretty questionable. It would probably need to be paired with some code > to > > find the right dump in the logcat and grab logcat preceding it. > > Ah, good idea to add a histogram! It would be great to be able to make an > informed decision on this (without blocking this CL for 58). Done. I agree that this is a great idea -- thanks! =) https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:361: * would be circumvented. On 2017/03/08 15:18:41, gsennton wrote: > Cool, thanks for updating the explanation :). > nit: should it be > "at which point this potential crash loop would be circumvented." > or > "at which point this potential crash loop will be circumvented." > ? Mmm, I think this is the "subjunctive" tense, and should be "would" -- because I'm writing about a hypothetical situation -- but I'm not 100% sure on that =)
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
lgtm https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:394: .executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR); this doesn't look like correct formatting to me. might need another git cl format?
https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/sr... File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/sr... chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:394: .executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR); On 2017/03/09 01:27:57, Maria wrote: > this doesn't look like correct formatting to me. might need another git cl > format? It's what git cl format outputs :/ I tried reverting it, and git cl format insisted on this change.
The CQ bit was unchecked by isherman@chromium.org
The CQ bit was checked by isherman@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from gsennton@chromium.org Link to the patchset: https://codereview.chromium.org/2727573004/#ps280001 (title: "Fix up a comment")
OK. On Wed, Mar 8, 2017 at 5:29 PM <isherman@chromium.org> wrote: > > > https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/sr... > File > > chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java > (right): > > > https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/sr... > > chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:394: > .executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR); > On 2017/03/09 01:27:57, Maria wrote: > > this doesn't look like correct formatting to me. might need another > git cl > > format? > > It's what git cl format outputs :/ I tried reverting it, and git cl > format insisted on this change. > > https://codereview.chromium.org/2727573004/ > -- You received this message because you are subscribed to the Google Groups "Chromium-reviews" group. To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by isherman@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 280001, "attempt_start_ts": 1489047258110040, "parent_rev": "2f6c9f2a9a793b4b57d3b52fc59e7545dffceb65", "commit_rev": "a9a8f4e062bd1c72c3b511de3c41611e24b7949e"}
Message was sent while issue was closed.
Description was changed from ========== [Android Crash Reporting] Simplify crash report upload code. In particular: run the code to extract logcat output within the context of the main Chrome process/service, rather than kicking off the work to a pair of background services. Moreover, dramatically simplify the MetricsUploadService code: remove all but one of the intents that it handles. BUG=694884 R=mariakhomenko@chromium.org ========== to ========== [Android Crash Reporting] Simplify crash report upload code. In particular: run the code to extract logcat output within the context of the main Chrome process/service, rather than kicking off the work to a pair of background services. Moreover, dramatically simplify the MetricsUploadService code: remove all but one of the intents that it handles. BUG=694884 R=mariakhomenko@chromium.org Review-Url: https://codereview.chromium.org/2727573004 Cr-Commit-Position: refs/heads/master@{#455704} Committed: https://chromium.googlesource.com/chromium/src/+/a9a8f4e062bd1c72c3b511de3c41... ==========
Message was sent while issue was closed.
Committed patchset #15 (id:280001) as https://chromium.googlesource.com/chromium/src/+/a9a8f4e062bd1c72c3b511de3c41... |