|
|
Created:
6 years, 3 months ago by Primiano Tucci (use gerrit) Modified:
6 years, 3 months ago CC:
chromium-reviews, klundberg+watch_chromium.org, yfriedman+watch_chromium.org, Dai Mikurube (NOT FULLTIME), bccheng1, eugenis Base URL:
https://chromium.googlesource.com/chromium/src.git@master Project:
chromium Visibility:
Public. |
Description[Android] libheap_profiler/heap_dump: use ptrace instead of SIGSTOP.
This change changes heap_dump to use ptrace rather than SIGSTOP to
freeze the target process. Recent kernels (Android L) forbid to attach
to /proc/PID/mem if the process is not ptrace-d.
This has also two advantageous side effects:
1. PTRACE_ATTACH is independent of the process state (if it was STOPped
it will stay stopped after the detach).
2. If heap_dump crashes or is killed, the process is automatically
detached by the kernel (so we can get rid of the SIGPIPE/INT logic).
Furthermore this CL makes the consistency checks optional (adding a -n
switch to disable them).
BUG=382489
Committed: https://crrev.com/e229b8263c7ee6a82ae90af54c32a1ce5705da98
Cr-Commit-Position: refs/heads/master@{#294222}
Patch Set 1 #
Total comments: 9
Patch Set 2 : handle signal interrupts #
Total comments: 3
Patch Set 3 : paranoid read #
Total comments: 2
Patch Set 4 : Fix read #Messages
Total messages: 20 (3 generated)
primiano@chromium.org changed reviewers: + pasko@chromium.org
Egor, would you mind taking a look to this small change? I'm making some fixes for recent Android releases. Thanks
I did not review heap_dump.c, and I forgot a few details about the heap_profiler, so pardon for many questions. In particular, what happens if we sigsnop/attach/whatever to the process while heap_profiler is maintaining the in-memory data structure (under a lock) in a way that the data structure is temporarily inconsistent? Do we report bogus data or fail fast in this case? https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:93: continue; // A false positive. this could be a short read, please check for EINTR and repeat the read, just like we do in base/posix/eintr_wrapper.h Please do that for all read() in this file. The term "false positive" in the comment is a bit too generic. Can we just say "This mapping has a too narrow of a margin to contain HeapStats", or something like that? I would be OK with removing the comment whatsoever, because the code speaks for itself. https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:281: while (((c = getopt(argc, argv, "xnc:")) & 0x80) == 0) { Please provide a short human-readable description of these individual options. I think having them in this file's comments is OK, but I would prefer a --help printout of course. https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:302: if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) { ptrace interface is tricky and I am not a good enough expert here. I think it's a good idea to ask eugenis@ to have a quick look at this, since I remember him playing with ptrace extensively. Will PTRACE_ATTACH continue to be allowed on L+? Some distributions require PR_SET_PTRACER followed by PTRACE_TRACEME. https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:321: ptrace(PTRACE_DETACH, pid, NULL, NULL); ptrace manual page says: """ While being traced, the tracee will stop each time a signal is delivered, even if the signal is being ignored. """ It seems the code as it is now ignores the signal: """ After signal-delivery-stop is observed by the tracer, the tracer should restart the tracee with the call: ptrace(PTRACE_restart, pid, 0, sig) If sig is 0, then a signal is not delivered. """
https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:321: ptrace(PTRACE_DETACH, pid, NULL, NULL); On 2014/09/10 12:09:18, pasko wrote: > ptrace manual page says: > """ > While being traced, the tracee will stop each time a signal is delivered, even > if the signal is being ignored. > """ > > It seems the code as it is now ignores the signal: > """ > After signal-delivery-stop is observed by the tracer, the tracer > should restart the tracee with the call: > ptrace(PTRACE_restart, pid, 0, sig) > If sig is 0, then a signal is not delivered. > """ please disregard this comment, the quote from above should apply to the signals that occurred while the process runs and is being traced. We have a simpler case, the program is either traced or running. (I am not 100% sure that I read the manual correctly though)
>In particular, what happens if we sigsnop/attach/whatever to >the process while >heap_profiler is maintaining the in-memory data structure (under a lock) in a >way that the data structure is temporarily inconsistent? > Do we report bogus data or fail fast in this case? Excellent question, that was exactly the reason of "pedantic" and the -d switch. In pedantic mode we doublecheck (on the heap_dump side) that the data is consistent. With -n we skip the check. Yes, it can happen that we stop the process while libheap_profiler is updating its metadata under its lock. However, that should be concurrent-safeish for the following reason: the metadata is stored on a flat and static array. The heap_dump only walks the array in memory order (doesn't follow the tree in address order). If we assume that writing a pointer is an atomic operation (which AFAICT is a reasonable assumption on x86 arm and x86_64), in the worst case, we can read one inconsistent entry. The rest is always intact. Therefore, I introduced the -n switch because, should this happen, I prefer to live with the fact that one of the thousands allocations are inconsistent rather than repeating the trace. https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:93: continue; // A false positive. > Please do that for all read() in this file. Very hard this will happen (The Android framework is oblivious of the existance of this process, as it is a shell process, and I don't see who could send a signal here). But adding the wrapper should be pretty straightforward. I would be OK with removing the comment whatsoever, because the code > speaks for itself. Me too. Removing the comment. https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:281: while (((c = getopt(argc, argv, "xnc:")) & 0x80) == 0) { On 2014/09/10 12:09:18, pasko wrote: > Please provide a short human-readable description of these individual options. I > think having them in this file's comments is OK, but I would prefer a --help > printout of course. OK. I'll expand the usage string. https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:302: if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1) { > Will PTRACE_ATTACH continue to be allowed on L+? Very hard to tell. However PTRACE_ATTACH is the way gdb --attach works, and I hope Android will keep supporting gdb :) Some distributions require > PR_SET_PTRACER followed by PTRACE_TRACEME. That is different and must be done by the traced process (child), not by the tracer. In general, AFAICT, as long as the tracer is root everything should be fine. Adding euginins@ to doublecheck btw. https://codereview.chromium.org/553403002/diff/1/tools/android/heap_profiler/... tools/android/heap_profiler/heap_dump.c:321: ptrace(PTRACE_DETACH, pid, NULL, NULL); On 2014/09/10 12:09:18, pasko wrote: > ptrace manual page says: > """ > While being traced, the tracee will stop each time a signal is delivered, even > if the signal is being ignored. > """ That is for the case of fork + TRACEME. In the case of PTRACE_ATTACH, the tracee is already stopped, so it's very hard it will be stopped more than that. :) > It seems the code as it is now ignores the signal: Which signal? The signal eventually goes to the tracee, not to this process. > """ > After signal-delivery-stop is observed by the tracer, the tracer > should restart the tracee with the call: > ptrace(PTRACE_restart, pid, 0, sig) > If sig is 0, then a signal is not delivered. > """ Again, this is for the case of the tracee being alive and the tracer wanting to be notified when the tracee gets a signal / exception. This is not the case here.
On 2014/09/10 13:17:35, Primiano Tucci wrote: > >In particular, what happens if we sigsnop/attach/whatever to >the process while > >heap_profiler is maintaining the in-memory data structure (under a lock) in a > >way that the data structure is temporarily inconsistent? > > Do we report bogus data or fail fast in this case? > > Excellent question, that was exactly the reason of "pedantic" and the -d switch. > In pedantic mode we doublecheck (on the heap_dump side) that the data is > consistent. With -n we skip the check. > > Yes, it can happen that we stop the process while libheap_profiler is updating > its metadata under its lock. However, that should be concurrent-safeish for the > following reason: > the metadata is stored on a flat and static array. The heap_dump only walks the > array in memory order (doesn't follow the tree in address order). > If we assume that writing a pointer is an atomic operation (which AFAICT is a > reasonable assumption on x86 arm and x86_64), in the worst case, we can read one > inconsistent entry. The rest is always intact. Not following pointers is good. Why are you saying that only one entry can be inconsistent? Do we acquire/release between writing entries? Also, consistency checks seem rather weak to me, how about writing CRC checksums to entries? As usual, I would prefer to rerun a tool than spending hours figuring out why individual runs do not agree with each other. *FAIL EARLY*
> Why are you saying that only one entry can be inconsistent? Do we > acquire/release between writing entries? Because record one allocation or free (+ its backtrace, eventually) per time, and every time we take the mutex. Taking a mutex enforces a strong RW barrier (and it couldn't be otherwise, because that's the assumption everybody wants from a critical section). > Also, consistency checks seem rather weak to me, how about writing CRC checksums > to entries? That sounds overkilling (both as regards tracing overhead and memory per entry) > As usual, I would prefer to rerun a tool than spending hours figuring out why > individual runs do not agree with each other. Hmm I'd live pretty happily knowing that one out of thousands entries will eventually report a stale value. Also because that means spending some extra 3-4 seconds to re-trace (due to all the involved adb laziness). To be fair, if we want 100% guarantee we don't need any CRC anyways. We just had to set a boolean before entering the mutex and clearing that after we leave it (+wb). If we see that we broke in the middle of libheap_profiler business we should just retry without even attempting to dump the data. Let's do that in one of the next CLs.
+eugenis@ if you have a spare minute can you please TAL to Egor's questions on ptrace? Thanks :)
On 2014/09/10 14:17:20, Primiano Tucci wrote: > > Why are you saying that only one entry can be inconsistent? Do we > > acquire/release between writing entries? > Because record one allocation or free (+ its backtrace, eventually) per time, > and every time we take the mutex. Taking a mutex enforces a strong RW barrier > (and it couldn't be otherwise, because that's the assumption everybody wants > from a critical section). ah, if we write only one entry per allocation, that should do it > > Also, consistency checks seem rather weak to me, how about writing CRC > checksums > > to entries? > That sounds overkilling (both as regards tracing overhead and memory per entry) > > > As usual, I would prefer to rerun a tool than spending hours figuring out why > > individual runs do not agree with each other. > Hmm I'd live pretty happily knowing that one out of thousands entries will > eventually report a stale value. Also because that means spending some extra 3-4 > seconds to re-trace (due to all the involved adb laziness). > > To be fair, if we want 100% guarantee we don't need any CRC anyways. We just had > to set a boolean before entering the mutex and clearing that after we leave it > (+wb). > If we see that we broke in the middle of libheap_profiler business we should > just retry without even attempting to dump the data. That would probably work, but I am not sure that the kernel is responsible to maintain memory consistency guarantees with userspace locks and reading from /proc/pid/mem. Really, the kernel could have cached some parts of /proc/pid/mem in some read buffers (unlikely, but .. I am not seeing evidence against this hypothesis:) > Let's do that in one of the next CLs. Sure, that was a top-level question out of my non-familiarity with the assumptions.
very close, but reading needs more love .. https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... tools/android/heap_profiler/heap_dump.c:238: } while(res == 1 && errno == EINTR); s/1/-1/ And it still can be a short read. Take a look at base/files/file_posix.cc where we continue reading until either getting the total amount of bytes or an error distinct from (-1 && errno == EINTR). I know, posix file io api is not ideal ...
pasko@chromium.org changed reviewers: + eugenis@chromium.org
https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... tools/android/heap_profiler/heap_dump.c:238: } while(res == 1 && errno == EINTR); On 2014/09/10 15:25:23, pasko wrote: > s/1/-1/ > > And it still can be a short read. Take a look at base/files/file_posix.cc where > we continue reading until either getting the total amount of bytes or an error > distinct from (-1 && errno == EINTR). > > I know, posix file io api is not ideal ... That is veeeeeeery unlikely to happen when reading /proc/pid/mem. And even if it happened, all the reads here are checked (hard file if rsize != sizeof(struct)). IMHO we're just adding useless overhead here. But if it makes you happy... there you go.
On 2014/09/10 15:42:17, Primiano Tucci wrote: > https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... > File tools/android/heap_profiler/heap_dump.c (right): > > https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... > tools/android/heap_profiler/heap_dump.c:238: } while(res == 1 && errno == > EINTR); > On 2014/09/10 15:25:23, pasko wrote: > > s/1/-1/ > > > > And it still can be a short read. Take a look at base/files/file_posix.cc > where > > we continue reading until either getting the total amount of bytes or an error > > distinct from (-1 && errno == EINTR). > > > > I know, posix file io api is not ideal ... > That is veeeeeeery unlikely to happen when reading /proc/pid/mem. > And even if it happened, all the reads here are checked (hard file if rsize != > sizeof(struct)). > IMHO we're just adding useless overhead here. But if it makes you happy... there > you go. I'm not sure what the status of PTRACE_TRACEME on Android is. Many linux distributions require it (by default). I guess, root will always be allowed to attach to any process.
https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/20001/tools/android/heap_profi... tools/android/heap_profiler/heap_dump.c:238: } while(res == 1 && errno == EINTR); On 2014/09/10 15:42:16, Primiano Tucci wrote: > On 2014/09/10 15:25:23, pasko wrote: > > s/1/-1/ > > > > And it still can be a short read. Take a look at base/files/file_posix.cc > where > > we continue reading until either getting the total amount of bytes or an error > > distinct from (-1 && errno == EINTR). > > > > I know, posix file io api is not ideal ... > That is veeeeeeery unlikely to happen when reading /proc/pid/mem. > And even if it happened, all the reads here are checked (hard file if rsize != > sizeof(struct)). > IMHO we're just adding useless overhead here. But if it makes you happy... there > you go. yes, it's unlikely, and your tool is unlikely to be used in as many cases as to make the undesired interrupt happen during the lifetime of the universe. Anyway, I would prefer the code to use the API under general assumptions because people often copy code without checking assumptions under which their code is running. You may argue that it's "their problem", and you'd be mostly correct until you yourself forget about these subtleties and start using this code elsewhere. Also, it's my problem now as well because I know about this wonderful tool, and am quite likely to use it under different assumptions. https://codereview.chromium.org/553403002/diff/40001/tools/android/heap_profi... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/40001/tools/android/heap_profi... tools/android/heap_profiler/heap_dump.c:245: bytes_read += res; buf += res
https://codereview.chromium.org/553403002/diff/40001/tools/android/heap_profi... File tools/android/heap_profiler/heap_dump.c (right): https://codereview.chromium.org/553403002/diff/40001/tools/android/heap_profi... tools/android/heap_profiler/heap_dump.c:245: bytes_read += res; Yeah, Apparently I am not able to copy and paste. Well it is not enough. Also count needs math love.
lgtm
The CQ bit was checked by primiano@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/primiano@chromium.org/553403002/60001
Message was sent while issue was closed.
Committed patchset #4 (id:60001) as a8972be9ebe291007b8a44403922ba64dbcec625
Message was sent while issue was closed.
Patchset 4 (id:??) landed as https://crrev.com/e229b8263c7ee6a82ae90af54c32a1ce5705da98 Cr-Commit-Position: refs/heads/master@{#294222} |