Issue 2727573004: [Android Crash Reporting] Simplify crash report upload code.

Ilya Sherman

Maria, PTAL. I still need to update some tests, but I've manually verified the functionality, ...

3 years, 9 months ago (2017-03-02 03:22:24 UTC) #1

Ilya Sherman

Okay, tests are now updated as well. Maria, PTAL! =)

3 years, 9 months ago (2017-03-02 23:22:08 UTC) #2

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-02 23:22:13 UTC) #3

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/60001

3 years, 9 months ago (2017-03-02 23:23:00 UTC) #4

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-02 23:26:20 UTC) #5

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/builds/165927) ios-simulator-xcode-clang on ...

3 years, 9 months ago (2017-03-02 23:26:23 UTC) #6

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-02 23:52:39 UTC) #7

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/80001

3 years, 9 months ago (2017-03-02 23:53:08 UTC) #8

Maria

On 2017/03/02 23:22:08, Ilya Sherman wrote: > Okay, tests are now updated as well. Maria, ...

3 years, 9 months ago (2017-03-03 00:17:19 UTC) #9

Maria

https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java#newcode37 chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:37: new Thread(new LogcatExtractionRunnable(context, path), "MinidumpLogcatWorkerThread") I would use AsyncTask ...

3 years, 9 months ago (2017-03-03 00:50:41 UTC) #10

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-03 01:07:14 UTC) #11

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_chromium_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_rel_ng/builds/401819)

3 years, 9 months ago (2017-03-03 01:07:15 UTC) #12

Maria

https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java File chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java (right): https://codereview.chromium.org/2727573004/diff/80001/chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java#newcode37 chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:37: new Thread(new LogcatExtractionRunnable(context, path), "MinidumpLogcatWorkerThread") On 2017/03/03 00:50:41, Maria ...

3 years, 9 months ago (2017-03-03 19:44:21 UTC) #13

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-04 23:28:54 UTC) #14

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/120001

3 years, 9 months ago (2017-03-04 23:29:08 UTC) #15

Ilya Sherman

isherman@chromium.org changed reviewers: + gsennton@chromium.org

3 years, 9 months ago (2017-03-04 23:31:37 UTC) #16

Ilya Sherman

Thanks for the review, Maria! I've now added code to handle appending logcat output to ...

3 years, 9 months ago (2017-03-04 23:31:37 UTC) #17

Ilya Sherman

Description was changed from ========== [Android Crash Reporting] Only append logcats to subprocess crashes. Moreover, ...

3 years, 9 months ago (2017-03-04 23:34:42 UTC) #18

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-05 01:21:11 UTC) #19

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: android_clang_dbg_recipe on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_clang_dbg_recipe/builds/223032)

3 years, 9 months ago (2017-03-05 01:21:13 UTC) #20

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-05 07:46:26 UTC) #21

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/140001

3 years, 9 months ago (2017-03-05 07:46:36 UTC) #22

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-05 08:23:59 UTC) #23

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: android_compile_dbg on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_compile_dbg/builds/223354)

3 years, 9 months ago (2017-03-05 08:24:00 UTC) #24

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-06 22:22:49 UTC) #25

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/160001

3 years, 9 months ago (2017-03-06 22:23:31 UTC) #26

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-07 00:41:02 UTC) #27

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/395074)

3 years, 9 months ago (2017-03-07 00:41:04 UTC) #28

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-07 01:10:14 UTC) #29

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/180001

3 years, 9 months ago (2017-03-07 01:11:03 UTC) #30

Ilya Sherman

I've now also manually verified that this all seems to work. PTAL!

3 years, 9 months ago (2017-03-07 01:24:29 UTC) #31

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-07 02:34:28 UTC) #32

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_chromeos_rel_ng/builds/377958)

3 years, 9 months ago (2017-03-07 02:34:29 UTC) #33

Maria

Didn't look at the tests yet, will take a look tomorrow. https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): ...

3 years, 9 months ago (2017-03-07 06:36:39 UTC) #34

gsennton

I also haven't looked at the tests yet. Is all the heavy-duty work put in ...

3 years, 9 months ago (2017-03-07 18:32:04 UTC) #35

I also haven't looked at the tests yet.
Is all the heavy-duty work put in the async-executor thread pool now? (I believe
intents are handled on the main thread).

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:317:
// permisison.
nit: permisison -> permission

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333:
if (!doesCrashMinidumpNeedLogcat(minidumps[0])) {
On 2017/03/07 06:36:39, Maria wrote:
> Do we have some sort of guarantee on what the first minidump returned is? Do
we
> now just attach logcat to one random minidump? I think it would make sense to
> attach to the freshest dump.

Please double-check whether the first minidump here is the newest or the oldest
one (I remember that I was surprised about the order at some point but can't
remember which way it is from the top of my head).

Also, why only attach the logcat to the most recent (I assume) one?

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335:
} else {
Would you mind switching the if- and else-statements here? IMO it is more clear
when the if-statement doesn't include a negation.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:356:
* it provides a cool-off period that can help circumvent a possible infinite
crash loop, if the
The cool-off period, is that the infinite period after the first duration
(currently an hour) where we can attach a logcat?

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366:
return ageInHours < 1;
On 2017/03/07 06:36:39, Maria wrote:
> How did you choose the duration? I would absolutely support this being much
> shorter, e.g. 15 minutes. 

What's the reasoning behind such a short duration? Looking at a logcat dump from
my personal phone (running O - so probably a lot spammier than a regular user),
I can see system output back to 4 and a half hours before the current time.
I would not be surprised if a normal user could see logcat output 24+ hours
after it is logged.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(left):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146:
if (context == null) {
Is this check unnecessary?

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:45:
private static final String GOOD_IRI_CHAR =
"a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF";
Not sure if you want to make these format changes in this CL?

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:29:
private MinidumpDirectoryObserver(CrashFileManager fileManager) {
Can we just inline this into the public ctor?

Maria

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java#newcode366 chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/07 18:32:04, gsennton wrote: ...

3 years, 9 months ago (2017-03-07 20:04:57 UTC) #36

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-08 02:18:44 UTC) #37

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/200001

3 years, 9 months ago (2017-03-08 02:19:32 UTC) #38

Ilya Sherman

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java#newcode255 chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:255: TimeUnit.MILLISECONDS); On 2017/03/07 06:36:39, Maria wrote: > looks like ...

3 years, 9 months ago (2017-03-08 02:32:45 UTC) #39

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:255:
TimeUnit.MILLISECONDS);
On 2017/03/07 06:36:39, Maria wrote:
> looks like you are changing the semantics of when this histogram is reported
to
> include users who opted out of crash uploading. I don't think we should do
that.

Done.  Good catch!  I misread when this histogram was recorded and what it
means.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:316:
// Now check whether crash reporting is enable. If it is, broadcast the
appropriate
On 2017/03/07 06:36:39, Maria wrote:
> nit: enable -> enabled

Done.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:317:
// permisison.
On 2017/03/07 18:32:04, gsennton wrote:
> nit: permisison -> permission

Done.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:320:
if (crashReportingDisabled) {
On 2017/03/07 06:36:39, Maria wrote:
> nit: feel free to use 
> 
> if (crashReportingDisabled) return; 
> 
> one-liner. This is common style in clank code.

Done.  Yay!  This was bugging me, but I wasn't sure whether it was acceptable to
collapse the code to a single line =)

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:328:
if (minidumps.length == 0) {
On 2017/03/07 06:36:39, Maria wrote:
> ditto

Done.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:332:
Log.i(TAG, "Attempting to upload accumulated crash dumps.");
On 2017/03/07 06:36:39, Maria wrote:
> Might as well log how many there are

Done.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333:
if (!doesCrashMinidumpNeedLogcat(minidumps[0])) {
On 2017/03/07 18:32:04, gsennton wrote:
> On 2017/03/07 06:36:39, Maria wrote:
> > Do we have some sort of guarantee on what the first minidump returned is? Do
> we
> > now just attach logcat to one random minidump? I think it would make sense
to
> > attach to the freshest dump.
> 
> Please double-check whether the first minidump here is the newest or the
oldest
> one (I remember that I was surprised about the order at some point but can't
> remember which way it is from the top of my head).
> 
> Also, why only attach the logcat to the most recent (I assume) one?

I've double-checked that this is indeed the freshest minidump.

I chose to attach the logcat to just the most recent crash for a couple of
reasons: (1) We always grab the most recent logcat output, so it seems most
likely to be relevant to the most recent crash; and (2) honestly, I'd expect it
to almost never happen that we have multiple pending crash reports that need
logcat output attached.  Given that, I didn't want to support the extra
complexity needed to attach logcat output to all the pending minidumps.

That said, if y'all think it's worth a bit of extra complexity, we certainly
could cover all of the minidumps.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335:
} else {
On 2017/03/07 18:32:04, gsennton wrote:
> Would you mind switching the if- and else-statements here? IMO it is more
clear
> when the if-statement doesn't include a negation.

I can do that, but let me run this by you first: WDYT of the code with some
named variables extracted?  I generally prefer to have the shorter/simpler
branch come first, for scannability.  But if you think it's still less clear
this way, I'm happy to swap the order.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:356:
* it provides a cool-off period that can help circumvent a possible infinite
crash loop, if the
On 2017/03/07 18:32:04, gsennton wrote:
> The cool-off period, is that the infinite period after the first duration
> (currently an hour) where we can attach a logcat?

Right.  I updated the comment to hopefully clarify a bit -- please let me know
if it's still not quite clear.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366:
return ageInHours < 1;
On 2017/03/07 20:04:57, Maria wrote:
> On 2017/03/07 18:32:04, gsennton wrote:
> > On 2017/03/07 06:36:39, Maria wrote:
> > > How did you choose the duration? I would absolutely support this being
much
> > > shorter, e.g. 15 minutes. 
> > 
> > What's the reasoning behind such a short duration? Looking at a logcat dump
> from
> > my personal phone (running O - so probably a lot spammier than a regular
> user),
> > I can see system output back to 4 and a half hours before the current time.
> > I would not be surprised if a normal user could see logcat output 24+ hours
> > after it is logged.
> 
> The problem is that we only grab 256 lines from the log. On my personal user
> build N device that covers less than one minute of logs (and all I did was
turn
> the screen on). 
> 
> I was just thinking if we are crashing and this is the way the user gets back
> into their Chrome, shorter is better. I guess I don't have a good feel for the
> duration between the browser crashes and Chrome is reopened again as well as
how
> often we get good logs in that scenario.

I picked semi-randomly :P  I'm happy to start with a 15-minute period, and
adjust that if we find that logcat output is too often missing.  Conversely, I'm
also content with increasing the period to something longer, if we believe
there's useful data that will be extracted.  I think the likelihood of the code
entering an infinite crash loop is relatively low, so I'm okay with making the
tradeoff in favor of better logcat data.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(left):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146:
if (context == null) {
On 2017/03/07 18:32:04, gsennton wrote:
> Is this check unnecessary?

I think so?  We pass the application context into this method.  Is it accurate
to assume that the application context is always non-null?

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:45:
private static final String GOOD_IRI_CHAR =
"a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF";
On 2017/03/07 18:32:04, gsennton wrote:
> Not sure if you want to make these format changes in this CL?

You're right, probably not.  Oh, the joys of clang autoformatting...  I split
off https://codereview.chromium.org/2735133003/ to format ALL the code.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119:
public LogcatExtractionRunnable(Context context, File minidump) {
On 2017/03/07 06:36:39, Maria wrote:
> So I know we are getting rid of the services here, but what's the reason we
> can't just attach the logcat to all minidumps like we used to do? Prepend all
> logcats here and then have the job scheduler take care of uploads?

I chose to attach the logcat to just the most recent crash for a couple of
reasons: (1) We always grab the most recent logcat output, so it seems most
likely to be relevant to the most recent crash; and (2) honestly, I'd expect it
to almost never happen that we have multiple pending crash reports that need
logcat output attached.  Given that, I didn't want to support the extra
complexity needed to attach logcat output to all the pending minidumps.

That said, if y'all think it's worth a bit of extra complexity, we certainly
could cover all of the minidumps.

(Gustav asked the same thing in a comment on DeferredStartupHandler.java, so I'm
including this reply twice)

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:29:
private MinidumpDirectoryObserver(CrashFileManager fileManager) {
On 2017/03/07 18:32:04, gsennton wrote:
> Can we just inline this into the public ctor?

I don't know of a way to pass the same CrashFileManager to super() and to store
into mFileManager with just a single no-argument constructor.  We could
construct the file manager twice, or we could have clients pass it into the
constructor.  Do either of those sound preferable, and/or is there a nice trick
I should know about?  I cribbed the current approach from StackOverflow =)

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java:96:
if (!intent.getAction().equals(ACTION_UPLOAD)) {
On 2017/03/07 06:36:39, Maria wrote:
> nit: !ACTION_UPLOAD.equals(intent.getAction()) is a safer way to check

Done.  Is it safer because intent.getAction() could theoretically be null?

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-08 02:50:04 UTC) #40

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/220001

3 years, 9 months ago (2017-03-08 02:50:21 UTC) #41

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-08 04:24:40 UTC) #42

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

3 years, 9 months ago (2017-03-08 04:24:41 UTC) #43

Maria

lgtm https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java#newcode366 chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366: return ageInHours < 1; On 2017/03/08 02:32:44, Ilya ...

3 years, 9 months ago (2017-03-08 06:18:47 UTC) #44

lgtm

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366:
return ageInHours < 1;
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 20:04:57, Maria wrote:
> > On 2017/03/07 18:32:04, gsennton wrote:
> > > On 2017/03/07 06:36:39, Maria wrote:
> > > > How did you choose the duration? I would absolutely support this being
> much
> > > > shorter, e.g. 15 minutes. 
> > > 
> > > What's the reasoning behind such a short duration? Looking at a logcat
dump
> > from
> > > my personal phone (running O - so probably a lot spammier than a regular
> > user),
> > > I can see system output back to 4 and a half hours before the current
time.
> > > I would not be surprised if a normal user could see logcat output 24+
hours
> > > after it is logged.
> > 
> > The problem is that we only grab 256 lines from the log. On my personal user
> > build N device that covers less than one minute of logs (and all I did was
> turn
> > the screen on). 
> > 
> > I was just thinking if we are crashing and this is the way the user gets
back
> > into their Chrome, shorter is better. I guess I don't have a good feel for
the
> > duration between the browser crashes and Chrome is reopened again as well as
> how
> > often we get good logs in that scenario.
> 
> I picked semi-randomly :P  I'm happy to start with a 15-minute period, and
> adjust that if we find that logcat output is too often missing.  Conversely,
I'm
> also content with increasing the period to something longer, if we believe
> there's useful data that will be extracted.  I think the likelihood of the
code
> entering an infinite crash loop is relatively low, so I'm okay with making the
> tradeoff in favor of better logcat data.

Hmm, one thing I thought of: logcat data returned for an application would only
contain log entries generated by the same application. So in theory even though
we only get last 256 lines, we may be able to get the last crash even much later
because those would be the last Chrome-generated log lines.

I think this convinces me to go back on my suggestion to make this smaller --
let's do what Gustav suggested and make it longer.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(left):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146:
if (context == null) {
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 18:32:04, gsennton wrote:
> > Is this check unnecessary?
> 
> I think so?  We pass the application context into this method.  Is it accurate
> to assume that the application context is always non-null?

I don't think it can be null anymore -- I think there are cases where it can be
null within a service, but we are no longer calling this from a service. If
anything, this should have been assert context != null;

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119:
public LogcatExtractionRunnable(Context context, File minidump) {
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 06:36:39, Maria wrote:
> > So I know we are getting rid of the services here, but what's the reason we
> > can't just attach the logcat to all minidumps like we used to do? Prepend
all
> > logcats here and then have the job scheduler take care of uploads?
> 
> I chose to attach the logcat to just the most recent crash for a couple of
> reasons: (1) We always grab the most recent logcat output, so it seems most
> likely to be relevant to the most recent crash; and (2) honestly, I'd expect
it
> to almost never happen that we have multiple pending crash reports that need
> logcat output attached.  Given that, I didn't want to support the extra
> complexity needed to attach logcat output to all the pending minidumps.
> 
> That said, if y'all think it's worth a bit of extra complexity, we certainly
> could cover all of the minidumps.
> 
> (Gustav asked the same thing in a comment on DeferredStartupHandler.java, so
I'm
> including this reply twice)

What do you think about adding a histogram in the calling code to record how
many pending crashes we are uploading? I would be cool with just adding logcat
to one crash right now and checking that histogram. If we see the non-1 case
often enough, we could consider re-attaching logcats to everything later on (I
don't think it's a blocker for M-58 impl). Though I agree, the value of it is
still pretty questionable. It would probably need to be paired with some code to
find the right dump in the logcat and grab logcat preceding it.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpUploadService.java:96:
if (!intent.getAction().equals(ACTION_UPLOAD)) {
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 06:36:39, Maria wrote:
> > nit: !ACTION_UPLOAD.equals(intent.getAction()) is a safer way to check
> 
> Done.  Is it safer because intent.getAction() could theoretically be null?

Yes (and I believe it happens sometimes in practice too, though probably not to
intents we construct ourselves). I generally like the style of
CONST.equals(func()) and have been trying to get everyone to adopt it :)

Ilya Sherman

Thanks! I'll address your comments in the morning, and also wait for Gustav's review prior ...

3 years, 9 months ago (2017-03-08 09:12:30 UTC) #45

Thanks!  I'll address your comments in the morning, and also wait for Gustav's
review prior to committing =)

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366:
return ageInHours < 1;
On 2017/03/08 06:18:47, Maria wrote:
> On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > On 2017/03/07 20:04:57, Maria wrote:
> > > On 2017/03/07 18:32:04, gsennton wrote:
> > > > On 2017/03/07 06:36:39, Maria wrote:
> > > > > How did you choose the duration? I would absolutely support this being
> > much
> > > > > shorter, e.g. 15 minutes. 
> > > > 
> > > > What's the reasoning behind such a short duration? Looking at a logcat
> dump
> > > from
> > > > my personal phone (running O - so probably a lot spammier than a regular
> > > user),
> > > > I can see system output back to 4 and a half hours before the current
> time.
> > > > I would not be surprised if a normal user could see logcat output 24+
> hours
> > > > after it is logged.
> > > 
> > > The problem is that we only grab 256 lines from the log. On my personal
user
> > > build N device that covers less than one minute of logs (and all I did was
> > turn
> > > the screen on). 
> > > 
> > > I was just thinking if we are crashing and this is the way the user gets
> back
> > > into their Chrome, shorter is better. I guess I don't have a good feel for
> the
> > > duration between the browser crashes and Chrome is reopened again as well
as
> > how
> > > often we get good logs in that scenario.
> > 
> > I picked semi-randomly :P  I'm happy to start with a 15-minute period, and
> > adjust that if we find that logcat output is too often missing.  Conversely,
> I'm
> > also content with increasing the period to something longer, if we believe
> > there's useful data that will be extracted.  I think the likelihood of the
> code
> > entering an infinite crash loop is relatively low, so I'm okay with making
the
> > tradeoff in favor of better logcat data.
> 
> Hmm, one thing I thought of: logcat data returned for an application would
only
> contain log entries generated by the same application. So in theory even
though
> we only get last 256 lines, we may be able to get the last crash even much
later
> because those would be the last Chrome-generated log lines.
> 
> I think this convinces me to go back on my suggestion to make this smaller --
> let's do what Gustav suggested and make it longer.

Okay, sounds good.  Any specific suggestions on how long to make it?  ~12 hours?

gsennton

Regarding the 'append-logcat-to-only-one-minidump' question, it sounds like having a logcat for just one minidump is ...

3 years, 9 months ago (2017-03-08 15:18:41 UTC) #46

Regarding the 'append-logcat-to-only-one-minidump' question, it sounds like
having a logcat for just one minidump is fine for now (as long as we add a
histogram at some point to know whether we ever hit the
several-minidumps-in-a-row scenario).

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333:
if (!doesCrashMinidumpNeedLogcat(minidumps[0])) {
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 18:32:04, gsennton wrote:
> > On 2017/03/07 06:36:39, Maria wrote:
> > > Do we have some sort of guarantee on what the first minidump returned is?
Do
> > we
> > > now just attach logcat to one random minidump? I think it would make sense
> to
> > > attach to the freshest dump.
> > 
> > Please double-check whether the first minidump here is the newest or the
> oldest
> > one (I remember that I was surprised about the order at some point but can't
> > remember which way it is from the top of my head).
> > 
> > Also, why only attach the logcat to the most recent (I assume) one?
> 
> I've double-checked that this is indeed the freshest minidump.
> 
> I chose to attach the logcat to just the most recent crash for a couple of
> reasons: (1) We always grab the most recent logcat output, so it seems most
> likely to be relevant to the most recent crash; and (2) honestly, I'd expect
it
> to almost never happen that we have multiple pending crash reports that need
> logcat output attached.  Given that, I didn't want to support the extra
> complexity needed to attach logcat output to all the pending minidumps.
> 
> That said, if y'all think it's worth a bit of extra complexity, we certainly
> could cover all of the minidumps.

I would say there are two different scenarios we care about here:
1. Chrome crashing over and over again (the user hits a reproducible crash) - in
this case the logcat would probably be relevant for all of the queued minidump
uploads, unless they are uploaded directly after the crash occurs (and I believe
it would be nice with a logcat attached to each report here - I guess the
investigator could look up the client ID to try to find reports close in time
but I doubt investigators will do this in general).
2. Chrome hits several independent/unrelated crashes within the logcat-keeping
time (15 minutes, or 12 hours or w/e), and won't upload each report directly
(e.g. because we don't have WiFi). I have no idea how often this happens, but
with a logcat-keeping time of ~12 hours this seems like it could happen every
now and then (especially on Canary/Dev/Beta).

Again, I don't really know how likely these events are.. it would be nice with a
dremel query telling us how often the same device crashes twice within a short
time-period ;)

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335:
} else {
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 18:32:04, gsennton wrote:
> > Would you mind switching the if- and else-statements here? IMO it is more
> clear
> > when the if-statement doesn't include a negation.
> 
> I can do that, but let me run this by you first: WDYT of the code with some
> named variables extracted?  I generally prefer to have the shorter/simpler

What do you mean by "some named variables extracted" in this case? (could you
provide an example)

> branch come first, for scannability.  But if you think it's still less clear
> this way, I'm happy to swap the order.

I don't have a super-strong opinion here (and especially don't want to block the
whole CL on this), go with what you feel is right.
Note: one option could be to do something like

if (doesCrashMinidumpNeedLogcat(minidumps[0])) {
    AsyncTask.THREAD_POOL_EXECUTOR.execute(
            new LogcatExtractionRunnable(mAppContext, minidumps[0]));
} else {
    MinidumpUploadService.tryUploadCrashDump(mAppContext, minidumps[0]);
}
List<File> remainingMinidumps = Arrays.asList(minidumps).subList(1,
minidumps.length);
for (File minidump : remainingMinidumps) {
    MinidumpUploadService.tryUploadCrashDump(mAppContext, minidump);
}

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366:
return ageInHours < 1;
On 2017/03/08 09:12:30, Ilya Sherman wrote:
> On 2017/03/08 06:18:47, Maria wrote:
> > On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > > On 2017/03/07 20:04:57, Maria wrote:
> > > > On 2017/03/07 18:32:04, gsennton wrote:
> > > > > On 2017/03/07 06:36:39, Maria wrote:
> > > > > > How did you choose the duration? I would absolutely support this
being
> > > much
> > > > > > shorter, e.g. 15 minutes. 
> > > > > 
> > > > > What's the reasoning behind such a short duration? Looking at a logcat
> > dump
> > > > from
> > > > > my personal phone (running O - so probably a lot spammier than a
regular
> > > > user),
> > > > > I can see system output back to 4 and a half hours before the current
> > time.
> > > > > I would not be surprised if a normal user could see logcat output 24+
> > hours
> > > > > after it is logged.
> > > > 
> > > > The problem is that we only grab 256 lines from the log. On my personal
> user
> > > > build N device that covers less than one minute of logs (and all I did
was
> > > turn
> > > > the screen on). 
> > > > 
> > > > I was just thinking if we are crashing and this is the way the user gets
> > back
> > > > into their Chrome, shorter is better. I guess I don't have a good feel
for
> > the
> > > > duration between the browser crashes and Chrome is reopened again as
well
> as
> > > how
> > > > often we get good logs in that scenario.
> > > 
> > > I picked semi-randomly :P  I'm happy to start with a 15-minute period, and
> > > adjust that if we find that logcat output is too often missing. 
Conversely,
> > I'm
> > > also content with increasing the period to something longer, if we believe
> > > there's useful data that will be extracted.  I think the likelihood of the
> > code
> > > entering an infinite crash loop is relatively low, so I'm okay with making
> the
> > > tradeoff in favor of better logcat data.
> > 
> > Hmm, one thing I thought of: logcat data returned for an application would
> only
> > contain log entries generated by the same application. So in theory even
> though
> > we only get last 256 lines, we may be able to get the last crash even much
> later
> > because those would be the last Chrome-generated log lines.
> > 
> > I think this convinces me to go back on my suggestion to make this smaller
--
> > let's do what Gustav suggested and make it longer.
> 
> Okay, sounds good.  Any specific suggestions on how long to make it?  ~12
hours?

Ideally we would just read the time-stamp of the first line of logcat we read
and check whether that is before (or slightly after) the time of the minidump
generation? But I guess doing so will be brittle?

Otherwise it would be great to count the time chrome has been active rather than
the real-life time that has passed (using something like
UsageStatsManager.queryUsageStats() - if that makes sense - though that API was
introduced in API level 21).

[if none of the above sound promising - or if you feel like implementing one of
those for 59 instead] 12 hours sounds fairly reasonable to me :P
(maybe run a logcat dump on your own personal device to check how far back in
time you reach with 256 lines of Chrome-logs?)

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(left):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:146:
if (context == null) {
On 2017/03/08 06:18:47, Maria wrote:
> On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > On 2017/03/07 18:32:04, gsennton wrote:
> > > Is this check unnecessary?
> > 
> > I think so?  We pass the application context into this method.  Is it
accurate
> > to assume that the application context is always non-null?
> 
> I don't think it can be null anymore -- I think there are cases where it can
be
> null within a service, but we are no longer calling this from a service. If
> anything, this should have been assert context != null;

The application context should never be null.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:45:
private static final String GOOD_IRI_CHAR =
"a-zA-Z0-9\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF";
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 18:32:04, gsennton wrote:
> > Not sure if you want to make these format changes in this CL?
> 
> You're right, probably not.  Oh, the joys of clang autoformatting...  I split
> off https://codereview.chromium.org/2735133003/ to format ALL the code.

Nice, thanks!

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119:
public LogcatExtractionRunnable(Context context, File minidump) {
On 2017/03/08 06:18:47, Maria wrote:
> On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > On 2017/03/07 06:36:39, Maria wrote:
> > > So I know we are getting rid of the services here, but what's the reason
we
> > > can't just attach the logcat to all minidumps like we used to do? Prepend
> all
> > > logcats here and then have the job scheduler take care of uploads?
> > 
> > I chose to attach the logcat to just the most recent crash for a couple of
> > reasons: (1) We always grab the most recent logcat output, so it seems most
> > likely to be relevant to the most recent crash; and (2) honestly, I'd expect
> it
> > to almost never happen that we have multiple pending crash reports that need
> > logcat output attached.  Given that, I didn't want to support the extra
> > complexity needed to attach logcat output to all the pending minidumps.
> > 
> > That said, if y'all think it's worth a bit of extra complexity, we certainly
> > could cover all of the minidumps.
> > 
> > (Gustav asked the same thing in a comment on DeferredStartupHandler.java, so
> I'm
> > including this reply twice)
> 
> What do you think about adding a histogram in the calling code to record how
> many pending crashes we are uploading? I would be cool with just adding logcat
> to one crash right now and checking that histogram. If we see the non-1 case
> often enough, we could consider re-attaching logcats to everything later on (I
> don't think it's a blocker for M-58 impl). Though I agree, the value of it is
> still pretty questionable. It would probably need to be paired with some code
to
> find the right dump in the logcat and grab logcat preceding it.

Ah, good idea to add a histogram! It would be great to be able to make an
informed decision on this (without blocking this CL for 58).

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/MinidumpDirectoryObserver.java:29:
private MinidumpDirectoryObserver(CrashFileManager fileManager) {
On 2017/03/08 02:32:44, Ilya Sherman wrote:
> On 2017/03/07 18:32:04, gsennton wrote:
> > Can we just inline this into the public ctor?
> 
> I don't know of a way to pass the same CrashFileManager to super() and to
store
> into mFileManager with just a single no-argument constructor.  We could
> construct the file manager twice, or we could have clients pass it into the
> constructor.  Do either of those sound preferable, and/or is there a nice
trick
> I should know about?  I cribbed the current approach from StackOverflow =)

Ah ok, I just didn't see the problem, nice solution ;)

https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:361:
* would be circumvented.
Cool, thanks for updating the explanation :).
nit: should it be
"at which point this potential crash loop would be circumvented."
or
"at which point this potential crash loop will be circumvented."
?

gsennton

lgtm modulo going through the latest comments ;) (and following up on Maria's note on ...

3 years, 9 months ago (2017-03-08 18:08:18 UTC) #47

Ilya Sherman

The CQ bit was checked by isherman@chromium.org to run a CQ dry run

3 years, 9 months ago (2017-03-09 01:08:42 UTC) #48

Ilya Sherman

Thank you both for the review! Maria, would you mind taking one last look to ...

3 years, 9 months ago (2017-03-09 01:08:45 UTC) #49

Thank you both for the review!  Maria, would you mind taking one last look to
sanity-check the post-approval changes?

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:333:
if (!doesCrashMinidumpNeedLogcat(minidumps[0])) {
On 2017/03/08 15:18:41, gsennton wrote:
> On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > On 2017/03/07 18:32:04, gsennton wrote:
> > > On 2017/03/07 06:36:39, Maria wrote:
> > > > Do we have some sort of guarantee on what the first minidump returned
is?
> Do
> > > we
> > > > now just attach logcat to one random minidump? I think it would make
sense
> > to
> > > > attach to the freshest dump.
> > > 
> > > Please double-check whether the first minidump here is the newest or the
> > oldest
> > > one (I remember that I was surprised about the order at some point but
can't
> > > remember which way it is from the top of my head).
> > > 
> > > Also, why only attach the logcat to the most recent (I assume) one?
> > 
> > I've double-checked that this is indeed the freshest minidump.
> > 
> > I chose to attach the logcat to just the most recent crash for a couple of
> > reasons: (1) We always grab the most recent logcat output, so it seems most
> > likely to be relevant to the most recent crash; and (2) honestly, I'd expect
> it
> > to almost never happen that we have multiple pending crash reports that need
> > logcat output attached.  Given that, I didn't want to support the extra
> > complexity needed to attach logcat output to all the pending minidumps.
> > 
> > That said, if y'all think it's worth a bit of extra complexity, we certainly
> > could cover all of the minidumps.
> 
> I would say there are two different scenarios we care about here:
> 1. Chrome crashing over and over again (the user hits a reproducible crash) -
in
> this case the logcat would probably be relevant for all of the queued minidump
> uploads, unless they are uploaded directly after the crash occurs (and I
believe
> it would be nice with a logcat attached to each report here - I guess the
> investigator could look up the client ID to try to find reports close in time
> but I doubt investigators will do this in general).

It would have to be a crash that brings down the browser process before we have
a chance to attach the logcat output -- so most likely, a crash on startup. 
It's certainly possible!  But I'd worry that attaching the logcat output to the
first crash in a series might be confusing, since it might have future output. 
(It would make more sense if we do something smarter, along the lines of
attaching logcat output up to the timestamp of the crash.  That's harder to
implement correctly, though =))

> 2. Chrome hits several independent/unrelated crashes within the logcat-keeping
> time (15 minutes, or 12 hours or w/e), and won't upload each report directly
> (e.g. because we don't have WiFi). I have no idea how often this happens, but
> with a logcat-keeping time of ~12 hours this seems like it could happen every
> now and then (especially on Canary/Dev/Beta).

Again, these would pretty much need to be crashes on startup.  Once the logcat
output is attached, we don't need to worry about that particular minidump
anymore -- it's never detached =)

> > Again, I don't really know how likely these events are.. it would be nice
with a
> dremel query telling us how often the same device crashes twice within a short
> time-period ;)

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:335:
} else {
On 2017/03/08 15:18:41, gsennton wrote:
> On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > On 2017/03/07 18:32:04, gsennton wrote:
> > > Would you mind switching the if- and else-statements here? IMO it is more
> > clear
> > > when the if-statement doesn't include a negation.
> > 
> > I can do that, but let me run this by you first: WDYT of the code with some
> > named variables extracted?  I generally prefer to have the shorter/simpler
> 
> What do you mean by "some named variables extracted" in this case? (could you
> provide an example)

I meant what I included in the latest patch set ;-)

> > branch come first, for scannability.  But if you think it's still less clear
> > this way, I'm happy to swap the order.
> 
> I don't have a super-strong opinion here (and especially don't want to block
the
> whole CL on this), go with what you feel is right.
> Note: one option could be to do something like
> 
> if (doesCrashMinidumpNeedLogcat(minidumps[0])) {
>     AsyncTask.THREAD_POOL_EXECUTOR.execute(
>             new LogcatExtractionRunnable(mAppContext, minidumps[0]));
> } else {
>     MinidumpUploadService.tryUploadCrashDump(mAppContext, minidumps[0]);
> }
> List<File> remainingMinidumps = Arrays.asList(minidumps).subList(1,
> minidumps.length);
> for (File minidump : remainingMinidumps) {
>     MinidumpUploadService.tryUploadCrashDump(mAppContext, minidump);
> }

Yeah, I think that's a better way to write it, at least for the current CL.  I
was kind of thinking about this with the next CL in mind, but can wait to cross
that bridge until I actually come to it =)  Thanks for the suggestion!  Done.

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:366:
return ageInHours < 1;
On 2017/03/08 15:18:41, gsennton wrote:
> On 2017/03/08 09:12:30, Ilya Sherman wrote:
> > On 2017/03/08 06:18:47, Maria wrote:
> > > On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > > > On 2017/03/07 20:04:57, Maria wrote:
> > > > > On 2017/03/07 18:32:04, gsennton wrote:
> > > > > > On 2017/03/07 06:36:39, Maria wrote:
> > > > > > > How did you choose the duration? I would absolutely support this
> being
> > > > much
> > > > > > > shorter, e.g. 15 minutes. 
> > > > > > 
> > > > > > What's the reasoning behind such a short duration? Looking at a
logcat
> > > dump
> > > > > from
> > > > > > my personal phone (running O - so probably a lot spammier than a
> regular
> > > > > user),
> > > > > > I can see system output back to 4 and a half hours before the
current
> > > time.
> > > > > > I would not be surprised if a normal user could see logcat output
24+
> > > hours
> > > > > > after it is logged.
> > > > > 
> > > > > The problem is that we only grab 256 lines from the log. On my
personal
> > user
> > > > > build N device that covers less than one minute of logs (and all I did
> was
> > > > turn
> > > > > the screen on). 
> > > > > 
> > > > > I was just thinking if we are crashing and this is the way the user
gets
> > > back
> > > > > into their Chrome, shorter is better. I guess I don't have a good feel
> for
> > > the
> > > > > duration between the browser crashes and Chrome is reopened again as
> well
> > as
> > > > how
> > > > > often we get good logs in that scenario.
> > > > 
> > > > I picked semi-randomly :P  I'm happy to start with a 15-minute period,
and
> > > > adjust that if we find that logcat output is too often missing. 
> Conversely,
> > > I'm
> > > > also content with increasing the period to something longer, if we
believe
> > > > there's useful data that will be extracted.  I think the likelihood of
the
> > > code
> > > > entering an infinite crash loop is relatively low, so I'm okay with
making
> > the
> > > > tradeoff in favor of better logcat data.
> > > 
> > > Hmm, one thing I thought of: logcat data returned for an application would
> > only
> > > contain log entries generated by the same application. So in theory even
> > though
> > > we only get last 256 lines, we may be able to get the last crash even much
> > later
> > > because those would be the last Chrome-generated log lines.
> > > 
> > > I think this convinces me to go back on my suggestion to make this smaller
> --
> > > let's do what Gustav suggested and make it longer.
> > 
> > Okay, sounds good.  Any specific suggestions on how long to make it?  ~12
> hours?
> 
> Ideally we would just read the time-stamp of the first line of logcat we read
> and check whether that is before (or slightly after) the time of the minidump
> generation? But I guess doing so will be brittle?

I'm not sure whether it's brittle, but it's harder to implement correctly, given
how we buffer the logcat output.  I think it's probably the right direction, but
I'd prefer not to try to land something like this for M58 =)

> Otherwise it would be great to count the time chrome has been active rather
than
> the real-life time that has passed (using something like
> UsageStatsManager.queryUsageStats() - if that makes sense - though that API
was
> introduced in API level 21).

That would likely make the escape-hatch not very viable.  Plus, the relevant
question is mostly "Is there still Chrome data in the logcat, or has it all aged
out?"  The logcat is a shared resource, and data could age out based on other
app use.

> [if none of the above sound promising - or if you feel like implementing one
of
> those for 59 instead] 12 hours sounds fairly reasonable to me :P
> (maybe run a logcat dump on your own personal device to check how far back in
> time you reach with 256 lines of Chrome-logs?)
>

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java
(right):

https://codereview.chromium.org/2727573004/diff/180001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/crash/LogcatExtractionRunnable.java:119:
public LogcatExtractionRunnable(Context context, File minidump) {
On 2017/03/08 15:18:41, gsennton wrote:
> On 2017/03/08 06:18:47, Maria wrote:
> > On 2017/03/08 02:32:44, Ilya Sherman wrote:
> > > On 2017/03/07 06:36:39, Maria wrote:
> > > > So I know we are getting rid of the services here, but what's the reason
> we
> > > > can't just attach the logcat to all minidumps like we used to do?
Prepend
> > all
> > > > logcats here and then have the job scheduler take care of uploads?
> > > 
> > > I chose to attach the logcat to just the most recent crash for a couple of
> > > reasons: (1) We always grab the most recent logcat output, so it seems
most
> > > likely to be relevant to the most recent crash; and (2) honestly, I'd
expect
> > it
> > > to almost never happen that we have multiple pending crash reports that
need
> > > logcat output attached.  Given that, I didn't want to support the extra
> > > complexity needed to attach logcat output to all the pending minidumps.
> > > 
> > > That said, if y'all think it's worth a bit of extra complexity, we
certainly
> > > could cover all of the minidumps.
> > > 
> > > (Gustav asked the same thing in a comment on DeferredStartupHandler.java,
so
> > I'm
> > > including this reply twice)
> > 
> > What do you think about adding a histogram in the calling code to record how
> > many pending crashes we are uploading? I would be cool with just adding
logcat
> > to one crash right now and checking that histogram. If we see the non-1 case
> > often enough, we could consider re-attaching logcats to everything later on
(I
> > don't think it's a blocker for M-58 impl). Though I agree, the value of it
is
> > still pretty questionable. It would probably need to be paired with some
code
> to
> > find the right dump in the logcat and grab logcat preceding it.
> 
> Ah, good idea to add a histogram! It would be great to be able to make an
> informed decision on this (without blocking this CL for 58).

Done.  I agree that this is a great idea -- thanks! =)

https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr...
File
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java
(right):

https://codereview.chromium.org/2727573004/diff/220001/chrome/android/java/sr...
chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:361:
* would be circumvented.
On 2017/03/08 15:18:41, gsennton wrote:
> Cool, thanks for updating the explanation :).
> nit: should it be
> "at which point this potential crash loop would be circumvented."
> or
> "at which point this potential crash loop will be circumvented."
> ?

Mmm, I think this is the "subjunctive" tense, and should be "would" -- because
I'm writing about a hypothetical situation -- but I'm not 100% sure on that =)

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/280001

3 years, 9 months ago (2017-03-09 01:09:40 UTC) #50

Maria

lgtm https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java#newcode394 chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:394: .executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR); this doesn't look like correct formatting to ...

3 years, 9 months ago (2017-03-09 01:27:57 UTC) #51

Ilya Sherman

https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java File chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java (right): https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java#newcode394 chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java:394: .executeOnExecutor(AsyncTask.THREAD_POOL_EXECUTOR); On 2017/03/09 01:27:57, Maria wrote: > this doesn't ...

3 years, 9 months ago (2017-03-09 01:29:59 UTC) #52

Ilya Sherman

The patchset sent to the CQ was uploaded after l-g-t-m from gsennton@chromium.org Link to the ...

3 years, 9 months ago (2017-03-09 01:30:18 UTC) #55

chromium-reviews

OK. On Wed, Mar 8, 2017 at 5:29 PM <isherman@chromium.org> wrote: > > > https://codereview.chromium.org/2727573004/diff/280001/chrome/android/java/src/org/chromium/chrome/browser/DeferredStartupHandler.java ...

3 years, 9 months ago (2017-03-09 01:30:49 UTC) #56

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/280001

3 years, 9 months ago (2017-03-09 01:31:20 UTC) #57

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 9 months ago (2017-03-09 02:32:29 UTC) #58

commit-bot: I haz the power

Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_ng/builds/397489)

3 years, 9 months ago (2017-03-09 02:32:30 UTC) #59

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2727573004/280001

3 years, 9 months ago (2017-03-09 08:14:40 UTC) #61

commit-bot: I haz the power

CQ is committing da patch. Bot data: {"patchset_id": 280001, "attempt_start_ts": 1489047258110040, "parent_rev": "2f6c9f2a9a793b4b57d3b52fc59e7545dffceb65", "commit_rev": "a9a8f4e062bd1c72c3b511de3c41611e24b7949e"}

3 years, 9 months ago (2017-03-09 09:05:48 UTC) #62

commit-bot: I haz the power

Description was changed from ========== [Android Crash Reporting] Simplify crash report upload code. In particular: ...

3 years, 9 months ago (2017-03-09 09:06:33 UTC) #63

commit-bot: I haz the power

3 years, 9 months ago (2017-03-09 09:06:34 UTC) #64

Message was sent while issue was closed.

Committed patchset #15 (id:280001) as
https://chromium.googlesource.com/chromium/src/+/a9a8f4e062bd1c72c3b511de3c41...

Issue 2727573004: [Android Crash Reporting] Simplify crash report upload code. (Closed)

Description

Patch Set 1 #

Patch Set 2 : --similarity=20 #

Patch Set 3 : Update tests #

Patch Set 4 : Cleaned up a bit #

Patch Set 5 : Rebase harder #

Patch Set 6 : Self-review #

Patch Set 7 : This CL is now fully armed and operational #

Patch Set 8 : Mebbe fix findbugs bugs? #

Patch Set 9 : Mark IOException as thrown #

Patch Set 10 : Simplify the test code exception handling a bit #

Patch Set 11 : 15-minute timeout for logcat extraction (and address other code review comments) #

Patch Set 12 : Tuck up one more return stmt #

Patch Set 13 : Add a histogram to measure how many pending minidumps are present on startup #

Patch Set 14 : Rebase #

Patch Set 15 : Fix up a comment #

Messages