|
|
DescriptionAdd FD exhaustion logging for Chrome OS.
Users are encountering a large number of file descriptors being used,
causing us to hit the soft limit and eventually crash. This change logs
where those FDs point to as a measure for chasing down the source of
this bug.
I was able to trigger this logic by leaking file descriptors from the
task manager code and confirmed that the file descriptors were correctly
written to the user log.
BUG=733718
R=derat@chromium.org, thestig@chromium.org
Review-Url: https://codereview.chromium.org/2969453002 .
Cr-Commit-Position: refs/heads/master@{#483563}
Committed: https://chromium.googlesource.com/chromium/src/+/89e298f70872077a0e2325f57b6cafc5c7a36463
Patch Set 1 #Patch Set 2 : Add FD exhaustion logging for Chrome OS. #Patch Set 3 : with setrlimit #
Total comments: 6
Patch Set 4 : add stack data for breakpad #
Total comments: 10
Patch Set 5 : shrink buffer to 32K, walk FDs backward #Patch Set 6 : fixes for thestig #
Total comments: 5
Patch Set 7 : fix for derat #Messages
Total messages: 24 (8 generated)
teravest@chromium.org changed reviewers: + derat@chromium.org
Dan, let me know what you think of this change. We could also just call setrlimit() here to ensure we get crash data via breakpad; then we don't have to wait on https://chromium-review.googlesource.com/c/545240/ to get through.
setrlimit() logic added to simplify dealing with merging and getting breakpad data.
https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... File base/memory/shared_memory_helper.cc (right): https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:116: rlim.rlim_cur = 16384; i think you meant for this (and the PLOG(ERROR) above) to be in an else block if the getrlimit call fails, right? https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:124: sprintf(fd_path, "/proc/self/fd/%d", i); snprintf https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:125: ssize_t count = readlink(fd_path, buf, PATH_MAX); arraysize(buf)-1 instead of PATH_MAX here, i think (arraysize is generally better when possible, and readlink doesn't append a null byte)
I've added debug info to the stack as well. The stack size soft limit is 8M, so I chose 2M as hopefully enough room for us to write out information. I'd like to test to make sure that part works, but crash reporting is disabled on my chromium build. https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... File base/memory/shared_memory_helper.cc (right): https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:116: rlim.rlim_cur = 16384; On 2017/06/29 19:26:04, Daniel Erat wrote: > i think you meant for this (and the PLOG(ERROR) above) to be in an else block if > the getrlimit call fails, right? Yep. Fixed, thanks. https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:124: sprintf(fd_path, "/proc/self/fd/%d", i); On 2017/06/29 19:26:03, Daniel Erat wrote: > snprintf Done. https://codereview.chromium.org/2969453002/diff/40001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:125: ssize_t count = readlink(fd_path, buf, PATH_MAX); On 2017/06/29 19:26:03, Daniel Erat wrote: > arraysize(buf)-1 instead of PATH_MAX here, i think (arraysize is generally > better when possible, and readlink doesn't append a null byte) Done.
lgtm
Description was changed from ========== Add FD exhaustion logging for Chrome OS. Users are encountering a large number of file descriptors being used, causing us to hit the soft limit and eventually crash. This change logs where those FDs point to as a measure for chasing down the source of this bug. I was able to trigger this logic by leaking file descriptors from the task manager code and confirmed that the file descriptors were correctly written to the user log. BUG=733718 ========== to ========== Add FD exhaustion logging for Chrome OS. Users are encountering a large number of file descriptors being used, causing us to hit the soft limit and eventually crash. This change logs where those FDs point to as a measure for chasing down the source of this bug. I was able to trigger this logic by leaking file descriptors from the task manager code and confirmed that the file descriptors were correctly written to the user log. BUG=733718 ==========
teravest@chromium.org changed reviewers: + thestig@chromium.org
https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... File base/memory/shared_memory_helper.cc (right): https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:121: static const char* kFileDataMarker = "FDATA"; Can this be trully const as const char kFileDataMarker[] ? https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:122: char buf[PATH_MAX]; Declare |buf| and |fd_path| closer to where they are used? https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:135: sprintf(fd_path, "/proc/self/fd/%d", i, arraysize(fd_path) - 1); Did you mean to use snprintf() here? I only see "%d" in the format string, but there are 2 arguments following it. Also, the size argument is in the wrong place. https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:136: ssize_t count = readlink(fd_path, buf, arraysize(buf) - 1); Do you care if readlink() fails and returns -1? https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:146: CHECK(false) << "Logged for file descriptor exhaustion, crashing now"; super nitty: I like LOG(FATAL) better when it's actually logging a message. :)
Thanks for the quick review! https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... File base/memory/shared_memory_helper.cc (right): https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:121: static const char* kFileDataMarker = "FDATA"; On 2017/06/29 22:06:29, Lei Zhang wrote: > Can this be trully const as const char kFileDataMarker[] ? Done. https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:122: char buf[PATH_MAX]; On 2017/06/29 22:06:29, Lei Zhang wrote: > Declare |buf| and |fd_path| closer to where they are used? This is gross, but I want crash_buffer to be as close to the bottom of the stack as possible. Hopefully the compiler isn't reordering these too much. https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:135: sprintf(fd_path, "/proc/self/fd/%d", i, arraysize(fd_path) - 1); On 2017/06/29 22:06:29, Lei Zhang wrote: > Did you mean to use snprintf() here? I only see "%d" in the format string, but > there are 2 arguments following it. Also, the size argument is in the wrong > place. Fixed. https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:136: ssize_t count = readlink(fd_path, buf, arraysize(buf) - 1); On 2017/06/29 22:06:28, Lei Zhang wrote: > Do you care if readlink() fails and returns -1? Hmm. Probably not, but I'll add a log anyway. https://codereview.chromium.org/2969453002/diff/60001/base/memory/shared_memo... base/memory/shared_memory_helper.cc:146: CHECK(false) << "Logged for file descriptor exhaustion, crashing now"; On 2017/06/29 22:06:29, Lei Zhang wrote: > super nitty: I like LOG(FATAL) better when it's actually logging a message. :) Done.
https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... File base/memory/shared_memory_helper.cc (right): https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... base/memory/shared_memory_helper.cc:107: int original_fd_limit = 16384; should this be updated below when the getrlimit() call succeeds? https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... base/memory/shared_memory_helper.cc:125: char crash_buffer[32 * 1024]; i think you can avoid the next memset by using "= {0}" here
https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... File base/memory/shared_memory_helper.cc (right): https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... base/memory/shared_memory_helper.cc:107: int original_fd_limit = 16384; On 2017/06/29 22:23:24, Daniel Erat wrote: > should this be updated below when the getrlimit() call succeeds? I believe it is, at line 110. I don't force it up to the "new" rlim_cur, since we shouldn't care about fds that happen after our setrlimit. https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... base/memory/shared_memory_helper.cc:125: char crash_buffer[32 * 1024]; On 2017/06/29 22:23:24, Daniel Erat wrote: > i think you can avoid the next memset by using "= {0}" here cool, done.
https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... File base/memory/shared_memory_helper.cc (right): https://codereview.chromium.org/2969453002/diff/100001/base/memory/shared_mem... base/memory/shared_memory_helper.cc:107: int original_fd_limit = 16384; On 2017/06/29 22:37:05, teravest wrote: > On 2017/06/29 22:23:24, Daniel Erat wrote: > > should this be updated below when the getrlimit() call succeeds? > > I believe it is, at line 110. I don't force it up to the "new" rlim_cur, since > we shouldn't care about fds that happen after our setrlimit. whoops, my mistake
lgtm
The CQ bit was checked by teravest@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from derat@chromium.org Link to the patchset: https://codereview.chromium.org/2969453002/#ps120001 (title: "fix for derat")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios-simulator on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios-simulator/bui...)
The CQ bit was checked by teravest@chromium.org
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Add FD exhaustion logging for Chrome OS. Users are encountering a large number of file descriptors being used, causing us to hit the soft limit and eventually crash. This change logs where those FDs point to as a measure for chasing down the source of this bug. I was able to trigger this logic by leaking file descriptors from the task manager code and confirmed that the file descriptors were correctly written to the user log. BUG=733718 ========== to ========== Add FD exhaustion logging for Chrome OS. Users are encountering a large number of file descriptors being used, causing us to hit the soft limit and eventually crash. This change logs where those FDs point to as a measure for chasing down the source of this bug. I was able to trigger this logic by leaking file descriptors from the task manager code and confirmed that the file descriptors were correctly written to the user log. BUG=733718 R=derat@chromium.org, thestig@chromium.org Review-Url: https://codereview.chromium.org/2969453002 . Cr-Commit-Position: refs/heads/master@{#483563} Committed: https://chromium.googlesource.com/chromium/src/+/89e298f70872077a0e2325f57b6c... ==========
Message was sent while issue was closed.
Committed patchset #7 (id:120001) manually as 89e298f70872077a0e2325f57b6cafc5c7a36463 (presubmit successful). |