|
|
Created:
5 years, 1 month ago by Charlie Harrison Modified:
5 years ago CC:
chromium-reviews, loading-reviews_chromium.org, darin-cc_chromium.org, jam, mmenke, kinuko Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionSimulated Crash reporting for ErrAborted requests
BUG=557430
Committed: https://crrev.com/760092291653c7591c6a2d99e4069ec6a5bf3d58
Cr-Commit-Position: refs/heads/master@{#361341}
Patch Set 1 #Patch Set 2 : fix include #Patch Set 3 : #Patch Set 4 : #
Total comments: 2
Patch Set 5 : Dump and get stack trace at Cancel time, not creation #
Total comments: 6
Patch Set 6 : Per browser limit, int64 milliseconds, raw char[] on stack #
Total comments: 3
Patch Set 7 : remove alias #Messages
Total messages: 34 (12 generated)
eroman@chromium.org changed reviewers: + eroman@chromium.org
https://codereview.chromium.org/1438863002/diff/60001/content/browser/loader/... File content/browser/loader/resource_dispatcher_host_impl.cc (right): https://codereview.chromium.org/1438863002/diff/60001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:918: base::debug::SetCrashKeyToStackTrace(kFastErrAbortTrace, Instead of adding a crash key to the report, I think it will be easier to just alias a local variable: base::Debug::StackTrace strack_trace_copy = *loader->request()->stack_trace(); base::Debug::Alias(&stack_trace_copy); .. <dumpwithoutcrashing> The advantage of this approach is you can easily symbolicate it for Windows crash dumps using something like: > ddp @@(&stack_trace_copy) L100 Also note that getting stack traces is somewhat unpredictable; it doesn't always work. It is better now that we no longer omit the frame pointer, but I have done this plenty of times and gotten back an unwalkable callstack, so don't hold your breath :( https://codereview.chromium.org/1438863002/diff/60001/content/browser/loader/... content/browser/loader/resource_dispatcher_host_impl.cc:920: base::debug::DumpWithoutCrashing(); Do you have any estimates on how often this will be called? Can you suggest some limits or bounds on this? As a safety-net, how about limiting this to either a total number per session (say 1 or 2 for the lifetime of the browser), or maybe a time based threshold, like 1 report at most every hour? Also, put a link to the bug report by this line, so crash triagers will know what it corresponds to.
On 2015/11/16 22:56:11, eroman wrote: > https://codereview.chromium.org/1438863002/diff/60001/content/browser/loader/... > File content/browser/loader/resource_dispatcher_host_impl.cc (right): > > https://codereview.chromium.org/1438863002/diff/60001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:918: > base::debug::SetCrashKeyToStackTrace(kFastErrAbortTrace, > Instead of adding a crash key to the report, I think it will be easier to just > alias a local variable: > > base::Debug::StackTrace strack_trace_copy = *loader->request()->stack_trace(); > base::Debug::Alias(&stack_trace_copy); > .. > <dumpwithoutcrashing> > > > The advantage of this approach is you can easily symbolicate it for Windows > crash dumps using something like: > > > ddp @@(&stack_trace_copy) L100 > > > Also note that getting stack traces is somewhat unpredictable; it doesn't always > work. It is better now that we no longer omit the frame pointer, but I have done > this plenty of times and gotten back an unwalkable callstack, so don't hold your > breath :( > > https://codereview.chromium.org/1438863002/diff/60001/content/browser/loader/... > content/browser/loader/resource_dispatcher_host_impl.cc:920: > base::debug::DumpWithoutCrashing(); > Do you have any estimates on how often this will be called? Can you suggest some > limits or bounds on this? > > As a safety-net, how about limiting this to either a total number per session > (say 1 or 2 for the lifetime of the browser), or maybe a time based threshold, > like 1 report at most every hour? > > Also, put a link to the bug report by this line, so crash triagers will know > what it corresponds to. Thanks for the tips. Quick question I have after I inspected the code: Currently, set_stack_trace() is only set in URLFetcherCore::StartURLRequest. It is a member of URLFetcherCore, so it'll be initialized when URLFetcherCore is. Now I'm wondering if this is the wrong stack trace to use, and if we should conditionally be setting stack traces from the URLRequest proper. That seems much more invasive, however. I was also going to Alias the actual url, is that appropriate? WDYT?
csharrison@chromium.org changed reviewers: + rdsmith@chromium.org
Eric, After discussion at our net-error-codes meeting, Matt suggested that stack traces at cancel time will be more meaningful, as all MAIN_FRAME loads will have similar stack traces at start time. This makes the change pretty simple. PTAL. One note is that I don't know how to / if it's possible to restrict a bit of code for Canary (runtime or compile time). I'd do this if anyone knows a way. This is sampled to .01% of these "quick aborts."
On 2015/11/18 21:18:31, csharrison wrote: > Eric, > After discussion at our net-error-codes meeting, Matt suggested that stack > traces at cancel time will be more meaningful, as all MAIN_FRAME loads will have > similar stack traces at start time. > > This makes the change pretty simple. PTAL. > > One note is that I don't know how to / if it's possible to restrict a bit of > code for Canary (runtime or compile time). I'd do this if anyone knows a way. > > This is sampled to .01% of these "quick aborts." Can you add a bug associated with this.
On 2015/11/18 21:18:31, csharrison wrote: > Eric, > After discussion at our net-error-codes meeting, Matt suggested that stack > traces at cancel time will be more meaningful, as all MAIN_FRAME loads will have > similar stack traces at start time. > > This makes the change pretty simple. PTAL. > > One note is that I don't know how to / if it's possible to restrict a bit of > code for Canary (runtime or compile time). I'd do this if anyone knows a way. > > This is sampled to .01% of these "quick aborts." Can you add a bug associated with this.
Description was changed from ========== Simulated Crash reporting for ErrAborted requests BUG= ========== to ========== Simulated Crash reporting for ErrAborted requests BUG=557430 ==========
Done. When does the Canary branch get cut?
On 2015/11/18 22:40:03, csharrison wrote: > Done. When does the Canary branch get cut? Canary branches get cut daily.
cbentzel@chromium.org changed reviewers: + cbentzel@chromium.org
https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... File net/url_request/url_request.cc (right): https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... net/url_request/url_request.cc:690: double millis = (base::TimeTicks::Now() - creation_time_).InMillisecondsF(); Why are you converting to a double instead of just using integratl units?
https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... File net/url_request/url_request.cc (right): https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... net/url_request/url_request.cc:690: double millis = (base::TimeTicks::Now() - creation_time_).InMillisecondsF(); On 2015/11/19 01:14:47, cbentzel wrote: > Why are you converting to a double instead of just using integratl units? No real reason. I can change it back if you want.
https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... File net/url_request/url_request.cc (right): https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... net/url_request/url_request.cc:690: double millis = (base::TimeTicks::Now() - creation_time_).InMillisecondsF(); On 2015/11/19 02:14:25, csharrison wrote: > On 2015/11/19 01:14:47, cbentzel wrote: > > Why are you converting to a double instead of just using integratl units? > > No real reason. I can change it back if you want. Yes - doing conversion doesn't seem to be doing anything since just checking 100.0. Not sure if it will be easier to decipher integral values in stack traces as well.
lgtm https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... File net/url_request/url_request.cc (right): https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... net/url_request/url_request.cc:688: // ~500,000 ERR_ABORTED < 100ms in Canary channel a day. Sample .01% to get Thanks for doing some estimates! I would also suggest adding a per-user maximum as a safety net for users that are hitting thousands and thousands of these. https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... net/url_request/url_request.cc:694: base::debug::Alias(&url_copy); This may not save the information you want (because the string contents to alias may not be directly in the url_copy object). A more reliable way to save strings into minidumps is [1] char url_copy[256] = {0}; base::debug::Alias(&url_copy); strncpy(url_copy, url().c_str(), sizeof(url_copy)); [1] Hard to say current version will be included or not. If the URL is small, thanks to small-string optimization it will be there. If it is larger, the heap-allocated buffer may or may not be reachable in the minidump (dump generation has some heuristics to include memory some pages for memory pointed to by the stack). To be safe just copy it onto stack directly. https://codereview.chromium.org/1438863002/diff/80001/net/url_request/url_req... net/url_request/url_request.cc:696: base::debug::DumpWithoutCrashing(); Please also add a TODO somewhere to delete this code once experiment is done.
Addressed comments. Will try to land this today.
Note that I used a hard limit per execution instance using a statically allocated int. LMK if there's a better way.
Going to try to land this. Can update if needed.
The CQ bit was checked by csharrison@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from eroman@chromium.org Link to the patchset: https://codereview.chromium.org/1438863002/#ps100001 (title: "Per browser limit, int64 milliseconds, raw char[] on stack")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1438863002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1438863002/100001
lgtm https://codereview.chromium.org/1438863002/diff/100001/net/url_request/url_re... File net/url_request/url_request.cc (right): https://codereview.chromium.org/1438863002/diff/100001/net/url_request/url_re... net/url_request/url_request.cc:695: static int dump_times = 0; This works on the assumption that URLRequest is only ever used from 1 thread. Which I believe is true in Chrome https://codereview.chromium.org/1438863002/diff/100001/net/url_request/url_re... net/url_request/url_request.cc:698: base::debug::Alias(&url_copy); No need to alias twice (done below as well)
The CQ bit was unchecked by csharrison@chromium.org
mmenke@chromium.org changed reviewers: + mmenke@chromium.org
https://codereview.chromium.org/1438863002/diff/100001/net/url_request/url_re... File net/url_request/url_request.cc (right): https://codereview.chromium.org/1438863002/diff/100001/net/url_request/url_re... net/url_request/url_request.cc:695: static int dump_times = 0; On 2015/11/23 23:29:05, eroman wrote: > This works on the assumption that URLRequest is only ever used from 1 thread. > Which I believe is true in Chrome I'm not in favor of adding more places that assume that...but given that we shouldn't be keeping this around too long, think we can live with it.
The CQ bit was checked by csharrison@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from eroman@chromium.org Link to the patchset: https://codereview.chromium.org/1438863002/#ps120001 (title: "remove alias")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1438863002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1438863002/120001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by csharrison@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1438863002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1438863002/120001
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/760092291653c7591c6a2d99e4069ec6a5bf3d58 Cr-Commit-Position: refs/heads/master@{#361341} |