Chromium Code Reviews
Created: 4 years, 7 months ago by Dmitry Skiba
Modified: 4 years, 6 months ago
Reviewers: Primiano Tucci (use gerrit), jln (very slow on Chromium), rickyz (no longer on Chrome), Nico, Maria
CC: chromium-reviews, Maria
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public.
Description: Check stack pointer to be inside stack when unwinding.
The TraceStackFramePointers() function, which unwinds the stack using frame pointers,
sometimes crashes on Android when it dives deep into JVM internals and finds
a bad stack pointer there. See details here: crbug.com/602701#c18
This CL adds checks to make sure only valid stack pointers are dereferenced.
BUG=602701
Committed: https://crrev.com/0bed5150f19acdd4897dd12ea8e4d4802c51d75f
Cr-Commit-Position: refs/heads/master@{#398134}
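The check the CL describes can be illustrated with a minimal sketch. All names and the frame layout below are illustrative, not the actual Chromium code: the idea is simply that each frame pointer is validated against the known stack bounds before being dereferenced, so a bogus value such as 0xfff2f2ee is rejected instead of crashing the unwinder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical frame-pointer unwinder with a stack-bounds check.
// Assumed frame layout: fp[0] = caller's frame pointer, fp[1] = return address.
size_t UnwindWithBoundsCheck(uintptr_t fp,
                             uintptr_t stack_low,
                             uintptr_t stack_high,
                             const void** out_trace,
                             size_t max_depth) {
  size_t depth = 0;
  while (depth < max_depth) {
    // Reject pointers outside [stack_low, stack_high) BEFORE dereferencing;
    // this is the check that stops crashes on values that merely look like
    // stack addresses.
    if (fp < stack_low || fp + 2 * sizeof(uintptr_t) > stack_high)
      break;
    const uintptr_t* frame = reinterpret_cast<const uintptr_t*>(fp);
    uintptr_t next_fp = frame[0];
    uintptr_t return_address = frame[1];
    if (return_address == 0)
      break;
    out_trace[depth++] = reinterpret_cast<const void*>(return_address);
    // Frame pointers must strictly grow toward the stack base.
    if (next_fp <= fp)
      break;
    fp = next_fp;
  }
  return depth;
}
```

The same shape works whether the stack bounds come from pthread_getattr_np() or from the mincore() estimate discussed later in this thread.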
Patch Set 1
Patch Set 2: rebase
Total comments: 2
Patch Set 3: rebase
Patch Set 4: Check on Linux too; rename function to GetStackStart()
Patch Set 5: Disable Linux
Patch Set 6: Avoid issues on Linux
Patch Set 7: Fix renderer deadlock on Linux
Total comments: 4
Patch Set 8: Implement mincore() approach
Total comments: 11
Patch Set 9: Address comments
Patch Set 10: rebase
Patch Set 11: Revert to the LGTMed patchset 1

Messages
Total messages: 64 (12 generated)
Description was changed from

==========
Check stack pointer to be inside stack when unwinding.

TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android it dives deep into JVM internals and founds bad stack pointer. Details are here: crbug.com/602701#c18

This CL fixes the issue by getting 'end of stack' pointer and checking it before reading from frame pointer.

BUG=602701
==========

to

==========
Check stack pointer to be inside stack when unwinding.

TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android it dives deep into JVM internals and founds bad stack pointer. Details are here: crbug.com/602701#c18

This CL makes sure we never read past 'end of stack' pointer on Android.

BUG=602701
==========
dskiba@google.com changed reviewers: + primiano@chromium.org, thakis@chromium.org
Performance numbers:

Before:
TraceStackFramePointers(main thread): first call took 4us
TraceStackFramePointers(main thread): next 10000 calls took 3724us
TraceStackFramePointers(some other thread): first call took 4us
TraceStackFramePointers(some other thread): next 10000 calls took 7117us

After:
TraceStackFramePointers(main thread): first call took 4380us
TraceStackFramePointers(main thread): next 10000 calls took 4100us
TraceStackFramePointers(some other thread): first call took 35us
TraceStackFramePointers(some other thread): next 10000 calls took 8976us
Description was changed from

==========
Check stack pointer to be inside stack when unwinding.

TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android it dives deep into JVM internals and founds bad stack pointer. Details are here: crbug.com/602701#c18

This CL makes sure we never read past 'end of stack' pointer on Android.

BUG=602701
==========

to

==========
Check stack pointer to be inside stack when unwinding.

TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android when it dives deep into JVM internals and finds bad stack pointer there. See details here: crbug.com/602701#c18

This CL adds checks to make sure only valid stack pointers are dereferenced.

BUG=602701
==========
The patch itself LGTM. However, I just realized that Oilpan does pretty much the same in StackFrameDepth::getUnderestimatedStackSize(), but they solved all the various problems for the various platforms. Considering that WebKit/Source/platform/heap/ can include base/, I wonder if the right thing to do here is just moving that function to base and sharing it between Oilpan and your unwinder. Nico, WDYT?
On 2016/05/17 20:08:07, Primiano Tucci wrote: > The patch itself LGTM. Ehm I just realized that my LG is pointless anyways, I thought we were in trace_event here, realized that it is not the case too late. My question about moving getUnderestimatedStackSize still holds though.
On 2016/05/17 20:09:21, Primiano Tucci wrote:
> On 2016/05/17 20:08:07, Primiano Tucci wrote:
> > The patch itself LGTM.
> Ehm I just realized that my LG is pointless anyways, I thought we were in
> trace_event here, realized that it is not the case too late.
> My question about moving getUnderestimatedStackSize still holds though.
Also, you could define an opaque "struct ThreadLocalStackTraceCache" and
take a pointer to that as an optional argument to TraceStackFramePointers,
where struct ThreadLocalStackTraceCache { uintptr_t stack_limit; }.
In essence the semantics would be: the caller of TraceStackFramePointers is
strongly encouraged to hold a StackTraceCache in a TLS and pass it back, without
having to know its contents (StackTraceCache would be opaque to the caller).
If the caller does that, TraceStackFramePointers() will be faster. You
can pass nullptr, but that will make it slower.
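The suggestion above can be sketched as follows. This is an illustrative shape only, not Chromium's actual API: the struct name comes from the message, while ComputeStackLimit() and GetStackLimit() are stand-in names, with pthread_getattr_np()/pthread_attr_getstack() used as the slow lookup that the cache amortizes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>

// Opaque per-thread cache: the caller keeps one in TLS and passes it back,
// so the expensive stack-limit lookup runs at most once per thread.
struct ThreadLocalStackTraceCache {
  uintptr_t stack_limit = 0;  // 0 means "not computed yet".
};

// Stand-in for the real stack-limit lookup (glibc/bionic specific).
uintptr_t ComputeStackLimit() {
  pthread_attr_t attr;
  void* base = nullptr;
  size_t size = 0;
  if (pthread_getattr_np(pthread_self(), &attr) == 0) {
    pthread_attr_getstack(&attr, &base, &size);  // base = lowest address
    pthread_attr_destroy(&attr);
  }
  return reinterpret_cast<uintptr_t>(base);
}

uintptr_t GetStackLimit(ThreadLocalStackTraceCache* cache) {
  if (cache && cache->stack_limit)
    return cache->stack_limit;  // Fast path: reuse the cached value.
  uintptr_t limit = ComputeStackLimit();
  if (cache)
    cache->stack_limit = limit;  // Slow path runs once per cache.
  return limit;
}
```

A caller would hold `thread_local ThreadLocalStackTraceCache g_cache;` and pass `&g_cache`; passing nullptr still works, it just recomputes every time.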
I suppose thakis@ just won another question :)
Welcome back Nico! PTAL :)
lgtm

Re comment 12 on the bug: this not working in component builds would probably be fine (we never ship those), if that makes the other approach any simpler.

https://codereview.chromium.org/1975393002/diff/20001/base/debug/stack_trace.cc
File base/debug/stack_trace.cc (right):
https://codereview.chromium.org/1975393002/diff/20001/base/debug/stack_trace....
base/debug/stack_trace.cc:97: uintptr_t stack_end = GetStackEnd();
hm, should we do this on linux too? seems nicer if chrome/linux and chrome/android behave the same way here
https://codereview.chromium.org/1975393002/diff/20001/base/debug/stack_trace.cc
File base/debug/stack_trace.cc (right):
https://codereview.chromium.org/1975393002/diff/20001/base/debug/stack_trace....
base/debug/stack_trace.cc:97: uintptr_t stack_end = GetStackEnd();
On 2016/05/24 20:29:48, Nico wrote:
> hm, should we do this on linux too? seems nicer if chrome/linux and
> chrome/android behave the same way here

On Linux the heuristic check that we have (>100000) is enough; I've never seen it crash. And on Android we're crashing because we get that magic 0xfff2f2ee value, which is not a pointer, but something else. It just so happens that it's close to the stack and passes the 100000 test. The value is always the same, and we could just hardcode a check for it, but this solution is more generic. I'll enable it on Linux too if it starts crashing there.
> https://codereview.chromium.org/1975393002/diff/20001/base/debug/stack_trace....
> base/debug/stack_trace.cc:97: uintptr_t stack_end = GetStackEnd();
> On 2016/05/24 20:29:48, Nico wrote:
> > hm, should we do this on linux too? seems nicer if chrome/linux and
> > chrome/android behave the same way here

+1

> This not working in component builds would probably be fine (we never ship those), if that makes the other approach any simpler.

I think that does not apply to this CL, which should work in both component and non-component builds. The thing that wouldn't have worked in component mode is the heuristic stack scanning proposed in https://codereview.chromium.org/1996243002/, where he'd scan for addresses on the stack and use /proc/maps to figure out whether a word is possibly a PC or not. But that was wildly esoteric and we decided to do something simpler like this.

> On Linux heuristic check that we have (>100000) is enough, I've never seen it
> crash.

Well, the fact that it didn't crash in your experiments doesn't mean that there cannot be something we don't know about which might cause it to crash there too. I think the question is more: what is the reason for not having this on Linux as well? If the answer is purely performance, we should probably make this faster (maybe in a separate CL) and do the trick where the caller can pass an optional cache argument (essentially the cache content is opaque to the caller, whose only responsibility is keeping it alive in its TLS). Having the same code used on Linux/Android should make maintenance easier.
PTAL again. Performance numbers from Linux:

Before:
main thread: first call took 4us.
main thread: next 1000000 calls took 52933us, 0.052933 us/call
some other thread: first call took 2us.
some other thread: next 1000000 calls took 61104us, 0.061104 us/call

After:
main thread: first call took 123us.
main thread: next 1000000 calls took 153815us, 0.153815 us/call
some other thread: first call took 9us.
some other thread: next 1000000 calls took 332724us, 0.332724 us/call

So it now takes 3x longer for the main thread and 5x longer for a non-main thread. But the absolute numbers are still minuscule, so I think this is fine.
OK, I spoke too soon. Since malloc is weak on Linux, and pthread_getattr_np() allocates memory, we recurse back. And if I work around that by ignoring recursive calls to AllocationContextTracker::GetContextSnapshot() (using ignore_scope_depth_), I get seccomp-bpf failures later:

../../sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc:**CRASHING**:seccomp-bpf failure in syscall 0204
Received signal 11 SEGV_MAPERR 0000020020cc
#0 0x7f53658d3246 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#1 0x7f535d518340 <unknown>
#2 0x7f53630f2e5e sandbox::CrashSIGSYS_Handler()
#3 0x7f53630f17ae sandbox::Trap::SigSys()
#4 0x7f535d518340 <unknown>
#5 0x7f535d5185e4 __pthread_getaffinity_new
#6 0x7f535d51201e pthread_getattr_np
#7 0x7f53658d26b0 base::debug::TraceStackFramePointers()
#8 0x7f5365928bcb base::trace_event::AllocationContextTracker::GetContextSnapshot()
#9 0x7f53659484b7 base::trace_event::MallocDumpProvider::InsertAllocation()
#10 0x7f53659485a9 base::trace_event::(anonymous namespace)::HookAlloc()
#11 0x7f5361b0e36e ShimMalloc
#12 0x7f535a68a83a __strdup

On Android malloc is not weak, so even though pthread_getattr_np() allocates, it doesn't cause any issues. So let's focus on Android for now and return to this later.
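The ignore_scope_depth_ idea mentioned above can be sketched in a few lines. This is a simplified, hypothetical model (names are mine, not Chromium's): a thread-local depth counter makes the allocation hook bail out when it is re-entered by an allocation it triggered itself, instead of recursing or deadlocking.

```cpp
#include <cassert>

// Thread-local re-entrancy guard for an allocation hook: if taking a stack
// snapshot itself allocates (e.g. inside pthread_getattr_np), the nested
// hook invocation is skipped instead of recursing forever.
thread_local int g_hook_depth = 0;

int g_snapshots_taken = 0;  // Stand-in for real profiler bookkeeping.

void SomethingThatAllocates();  // Simulates an allocating callee.

void OnAllocationHook() {
  if (g_hook_depth > 0)
    return;  // Re-entrant call: bail out.
  ++g_hook_depth;
  ++g_snapshots_taken;       // Stand-in for GetContextSnapshot().
  SomethingThatAllocates();  // Re-enters the hook; the guard stops it.
  --g_hook_depth;
}

void SomethingThatAllocates() {
  // Simulates malloc inside the hook re-triggering the hook.
  OnAllocationHook();
}
```

Note this guard fixes the recursion, but (as the message shows) not the seccomp-bpf failure, which comes from the syscall pthread_getattr_np makes.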
On 2016/05/25 19:00:57, Dmitry Skiba wrote: > So it now takes 3x longer for the main thread and 5x longer for non-main thread. > But the absolute numbers are still minuscule, so I think this is fine. Thanks for the perf numbers. From my viewpoint if chrome is still reasonably usable it makes sense to share the same code paths and get the extra protection. Note: this cl, as is, seems to be still android-only. I suppose you changed the define locally?
On 2016/05/25 19:44:59, Primiano Tucci wrote: > On 2016/05/25 19:00:57, Dmitry Skiba wrote: > > So it now takes 3x longer for the main thread and 5x longer for non-main > thread. > > But the absolute numbers are still minuscule, so I think this is fine. > > Thanks for the perf numbers. From my viewpoint if chrome is still reasonably > usable it makes sense to share the same code paths and get the extra protection. > Note: this cl, as is, seems to be still android-only. I suppose you changed the > define locally? See my other comment, I disabled it on Linux again because it caused seccomp issues. We can return to Linux issues later.
what makes us recurse back? can we prevent that? (i really think not turning this on on linux 'cause it happens to work isn't so good)
On 2016/05/25 19:51:43, Nico wrote: > what makes us recurse back? can we prevent that? > > (i really think not turning this on on linux 'cause it happens to work isn't so > good) Yes, I can prevent deadlocks caused by recursion, but then I hit seccomp issues.
On 2016/05/25 19:51:43, Nico wrote:
> what makes us recurse back? can we prevent that?
>
> (i really think not turning this on on linux 'cause it happens to work isn't so
> good)

OK, the issue here is not recursion (that's one problem which, as you say, is easy to solve). The problem is that pthread_getattr_np seems to call pthread_getaffinity (not sure why you end up getting the cpu affinity if you just asked for the stack limit). pthread_getaffinity causes a __NR_sched_getaffinity syscall, which is filtered in the BPF sandbox: https://code.google.com/p/chromium/codesearch#chromium/src/content/common/san...

If I read the code correctly, the BPF policy allows only calls of the form getaffinity(0) (0 == self thread). I suppose the default implementation is passing the actual TID, which trips the sandbox blacklist. This means that this code, as it is, will also crash on Android arm64 where, IIRC, the BPF sandbox is on by default.
> This means that this code will crash as it is also on Android arm64 where, IIRC, > the BPF sandbox is on by default. I take this back, I think it's only glibc that calls getaffinity from getattr. Android bionic is way simpler: https://android.googlesource.com/platform/bionic/+/master/libc/bionic/pthread... The thing that really makes me struggle is: How does the code in blink's StackFrameDepth.cpp not hit the sandbox? It is doing literally the same thing. In which process are you hitting this?
On 2016/05/25 20:08:02, Primiano Tucci wrote: > > This means that this code will crash as it is also on Android arm64 where, > IIRC, > > the BPF sandbox is on by default. > > I take this back, I think it's only glibc that calls getaffinity from getattr. > Android bionic is way simpler: > https://android.googlesource.com/platform/bionic/+/master/libc/bionic/pthread... > > The thing that really makes me struggle is: > How does the code in blink's StackFrameDepth.cpp not hit the sandbox? It is > doing literally the same thing. > > In which process are you hitting this? I think the error was from the renderer, because Chrome was still on the screen, and later I got 'Bad tab / Kill / Wait' dialog.
OK, so the overall question is whether I should make it work on Linux to submit this CL, or whether I can work on the Linux issues later. What do you guys think?
On 2016/05/25 20:47:04, Dmitry Skiba wrote:
> OK, so the overall question is whether I should make it work on Linux to submit
> this CL, or can I work on Linux issues later. What do you guys think?

IMHO it would be great to try and see if there is an evident reason why the equivalent code in PartitionAlloc doesn't hit that sandbox issue. Maybe there is something trivial you can fix here?
OK, so I added a seccomp exception in profiling mode, and that avoids the seccomp
crashes.
But! Now it deadlocks in the renderer:
#0 __lll_lock_wait_private ()
at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
#1 0x00007ffff4938286 in _L_lock_40 ()
from /lib/x86_64-linux-gnu/libpthread.so.0
#2 0x00007ffff4937fa9 in pthread_getattr_np (thread_id=140737353818136,
attr=0x7fffffffb848) at pthread_getattr_np.c:41
#3 0x000055555a24d6bb in base::debug::TraceStackFramePointers(void const**,
unsigned long, unsigned long, base::debug::PerThreadStackInfo*) ()
#4 0x000055555a2a3c7f in
base::trace_event::AllocationContextTracker::GetContextSnapshot() ()
#5 0x000055555a2c3567 in
base::trace_event::MallocDumpProvider::InsertAllocation(void*, unsigned long) ()
#6 0x000055555a2c3929 in base::trace_event::(anonymous
namespace)::HookRealloc(base::allocator::AllocatorDispatch const*, void*,
unsigned long) ()
#7 0x00005555564894c9 in realloc ()
#8 0x00007ffff1a9706b in _IO_getdelim (lineptr=lineptr@entry=0x7fffffffc160,
n=n@entry=0x7fffffffc188, delimiter=delimiter@entry=10,
fp=fp@entry=0x7ffff7e59d40) at iogetdelim.c:106
#9 0x00007ffff49381be in pthread_getattr_np (thread_id=140737353816576,
attr=0x7fffffffc1f0) at pthread_getattr_np.c:116
#10 0x000055555915a9a6 in blink::StackFrameDepth::getStackStart() ()
#11 0x000055555915aa4e in blink::ThreadState::ThreadState(bool) ()
#12 0x0000555558f16829 in blink::Platform::initialize(blink::Platform*) ()
#13 0x00005555591b004f in blink::initialize(blink::Platform*) ()
#14 0x0000555559ffb220 in
content::RenderThreadImpl::InitializeWebKit(scoped_refptr<base::SingleThreadTaskRunner>&)
()
#15 0x0000555559ffa201 in
content::RenderThreadImpl::Init(scoped_refptr<base::SingleThreadTaskRunner>&) ()
#16 0x0000555559ffa074 in
content::RenderThreadImpl::RenderThreadImpl(std::unique_ptr<base::MessageLoop,
std::default_delete<base::MessageLoop> >,
std::unique_ptr<scheduler::RendererScheduler,
std::default_delete<scheduler::RendererScheduler> >) ()
#17 0x0000555559ff9d3c in
content::RenderThreadImpl::Create(std::unique_ptr<base::MessageLoop,
std::default_delete<base::MessageLoop> >,
std::unique_ptr<scheduler::RendererScheduler,
std::default_delete<scheduler::RendererScheduler> >) ()
#18 0x000055555a023a1c in content::RendererMain(content::MainFunctionParams
const&) ()
#19 0x000055555a228702 in content::ContentMainRunnerImpl::Run() ()
#20 0x000055555a2271d6 in content::ContentMain(content::ContentMainParams
const&) ()
#21 0x000055555606c20a in ChromeMain ()
#22 0x00007ffff1a49ec5 in __libc_start_main (main=0x55555606c1c0 <main>,
argc=28, argv=0x7fffffffcb58, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>, stack_end=0x7fffffffcb48) at libc-start.c:287
#23 0x000055555606c0dd in _start ()
pthread_getattr_np() recursively calls itself. Yet another reason why we
shouldn't override weak malloc on Linux - it catches too many things.
(Update: see the next message - "recursively calls itself" is bad wording on my part.)
On 2016/05/27 08:02:43, Dmitry Skiba wrote:
> pthread_getattr_np() recursively calls itself.

This is not true. What's happening is that you are causing a reentrant call of pthread_getattr_np. The situation seems pretty clear from this stack:
1. Oilpan code in Blink calls getStackStart()
2. Oilpan's getStackStart() calls pthread_getattr_np
3. pthread_getattr_np calls malloc (well, realloc, whatever)
4. you intercept malloc
5. you call pthread_getattr_np() in the malloc hook
6. you have now caused a re-entrant call into pthread_getattr_np :)

Welcome to the wonderful world of heap profiling, where the assumption that whatever function you call doesn't malloc() doesn't always hold :)

> Yet another reason why we shouldn't override weak malloc on Linux - it catches too many things.

I really don't understand how this is related.
- How do you intend to do heap profiling without hooking malloc?
- How is "weak" related? Contrary to popular belief, malloc is NOT a weak symbol. What is weak are the glibc malloc_hook symbols, but that is a different story and is irrelevant here.
- We've always overridden malloc on Linux, before any heap profiling. We did that for the sake of security checks and having our own allocator (TCMalloc). Not sure why we should change that now.
- This problem is not about overriding malloc. The problem is that writing a heap profiler is tricky. Here you were assuming that pthread_getattr_np doesn't call malloc, re-entering the shim. You just found that this is not true.
On 2016/05/27 08:39:43, Primiano Tucci wrote:
> On 2016/05/27 08:02:43, Dmitry Skiba wrote:
> > pthread_getattr_np() recursively calls itself.
> This is not true. What's happening is that you are causing a reentrant call of
> pthread_getattr_np.

Right. Bad wording on my side.

> I really don't understand how this is related.
> - How do you intend to do heap profiling without hooking malloc?
> - How is "weak" related? Conversely to popular belief malloc is NOT a weak
> symbol. What is weak are the glibc malloc_hook symbols, but that is a different
> story and is irrelevant here.

The problem with weak malloc is that we get calls from system libraries. We could just wrap malloc as we do on Android and limit our scope to Chrome only.

> - This problem is not about overriding malloc. The problem is that writing a
> heap profiler is tricky. Here you were assuming that pthread_gettattr_np doesn't
> call malloc re-entering the shim. You just found that this is not true.

The problem is not in the heap profiler, but in the fact that pthread_getattr_np() is not reentrant by design. And I can't blame the glibc guys - they obviously thought that if you don't explicitly reenter a function, you don't need to make it reentrant. BTW, the thread-name deadlock situation in Chrome is exactly the same - we can't ask for a thread name in GetCurrentSnapshot() because GetThreadName() would deadlock if indirectly called from SetThreadName() via an allocation. So on Android, where we initialize heap profiling too late, we have to get thread names through backchannels.
Anyway, I have a workaround for that reentrancy deadlock, but now I found that the GPU process reports way fewer allocations, and doesn't report the 'gpu' dumper (in the same row where 'malloc' is).
Here's an alternative proposal, which will not require you to punch any hole in the sandbox (^__^), will avoid any re-entrancy, and should work on both Linux and Android (and even Mac).

Don't use pthread_getattr_np. Estimate the stack start using mincore(). mincore is a very nice and fast syscall which can tell you two important things:
1. Whether a range of pages is resident in memory.
2. Whether all the pages in the range are mapped, or accessing them would segfault.

What you want here is 2. So my proposal is: do a binary search from (cur_stack) to (cur_stack - X), and check for errno=ENOMEM, where X is some reasonable expected stack amount (~256k). Also make sure you double-check at the end that the area [result_of_binary_search; cur_stack] is fully mapped (you didn't hit holes in the middle) by doing a final mincore() call.

On Linux desktop the cost of a mincore() call seems to be ~260 ns. If it turns out to be a perf issue, cache the stack start address in the TLS.

Also, mincore is supported on Mac too (not sure if it has the same semantics though, check). So the odds are that this code would work there as well. Definitely should work on Linux and Android. You are welcome ;-)
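The proposal above can be sketched as follows. This is a hedged sketch, not the code that eventually landed: for brevity it uses a linear page walk rather than the binary search the message suggests (the result is the same for contiguous stacks), and the function names are mine. The key primitive is that mincore() returns -1 with errno == ENOMEM when the probed range contains unmapped memory, i.e. exactly the "would segfault" signal wanted here.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// True if |page| (page-aligned) is mapped. mincore() succeeds for mapped
// pages (resident or swapped out) and fails with ENOMEM for unmapped ones.
bool IsMapped(uintptr_t page) {
  unsigned char vec;
  if (mincore(reinterpret_cast<void*>(page), 1, &vec) == 0)
    return true;
  return errno != ENOMEM;  // ENOMEM => unmapped memory in the range.
}

// Walks downward from |addr| and returns the lowest mapped page reachable
// without crossing an unmapped page, probing at most |max_bytes| down.
uintptr_t EstimateStackLowerBound(uintptr_t addr, size_t max_bytes) {
  const uintptr_t page_size =
      static_cast<uintptr_t>(sysconf(_SC_PAGESIZE));
  uintptr_t page = addr & ~(page_size - 1);
  uintptr_t limit = page > max_bytes ? page - max_bytes : 0;
  while (page > limit && IsMapped(page - page_size))
    page -= page_size;
  return page;  // First stop: either the probe limit or an unmapped page.
}
```

Because the walk stops at the first unmapped page, the "no holes in the middle" double-check from the message comes for free in this linear variant; a binary-search version would need the extra final mincore() call.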
On 2016/05/27 09:21:16, Primiano Tucci wrote:
> Here's an alternative proposal, which will not require you to punch any hole in
> the sandbox (^__^), will avoid any re-entrancy and should work on both linux and
> android (And even mac).

I don't think there is a problem with punching a hole in the sandbox - we're already punching a hole for sanitizers, so we might as well punch one more for profiling builds.

> Don't use pthread_getattr_np. Estimate the stack start using mincore().
> mincore is a very nice and fast syscall which cal tell you two important things:
> 1. If a range of page pages is resident in memory. 2. If all the pages in the
> range are mapped or accessing it would segfault.
> What you want here is 2.
>
> So my proposal is: do a binary search from (cur_stack) to (cur_stack - X), and
> check for errno=ENOMEM. Where X some reasonable expected stack amount (~256k).
> Make also sure you doublecheck at the end that the area
> [result_of_binary_search; cur_stack] is fully mapped (you didn't hit holes in
> the middle) by doing a final mincore() call.
> On linux desktop the cost of a mincore() call seems to be ~260 ns. If it turns
> out to be a perf issue, cache the stack start address in the TLS.
>
> Also, mincore is supported also on Mac (not sure if it has the same semantics
> though, check). So the odds are that this code would work also there.
> Definitely should work on Linux and Android.

Hmm, documentation says that mincore() determines whether accessing a page would cause a page fault (not a segfault). So I can imagine that for large stacks some pages would go to the page file and would be reported by mincore() as absent, causing us to stop early.

Overall, I've already spent a day and a half on a problem that doesn't exist (crashing during unwinding on Linux), so my plan now is to see whether I can fix the issues with the GPU process, and revert to pure Android if not.
BTW, it seems that OSX builds everything with frame pointers, so we're fine there (judging from its backtrace() function, which uses frame pointers).
On 2016/05/27 09:43:22, Dmitry Skiba wrote:
> I don't think there is a problem with punching a hole inside sandbox - we're
> already punching a hole for sanitizers, so we might as well punch one more for
> profiling builds.

That is a discussion that you should have with chrome-security then.

> Hmm, documentation says that mincore() determines whether accessing a page would
> cause a page fault (not a segfault). So I can imagine that for large stacks some
> pages would go to the page file and would be reported by mincore() as absent,
> causing us to stop early.

Read more, it says both. Given a range of pages, if ret=0 it determines whether accessing a page will cause a page fault, as you say. BUT, if ret=-1 && errno=ENOMEM -> "addr to addr + length contained unmapped memory." (which is what causes your segfault)
On 2016/05/27 09:50:11, Primiano Tucci wrote:
> Read more, it says both.
> Given a range of pages, if ret=0 it determines whether access a page will cause
> a page fault, as you say.
> BUT, if ret=-1 && errno=ENOMEM -> "addr to addr + length contained unmapped
> memory." (which is what causes your segfault)

Oh right. Anyway, I found that the GPU process issue is not really an issue; it's just that three gpu: dump providers are unregistered after some time. I was checking different points in the dumps.
dskiba@google.com changed reviewers: + rickyz@chromium.org
Ricky, please review the seccomp exception I added for __NR_sched_getaffinity for Linux profiling builds (enable_profiling = true). See comment #13 for the issue it avoids (the backtrace is from the GPU process): https://codereview.chromium.org/1975393002/#msg13 BTW, a similar exception exists for sanitizers.
Ehm, something I am missing here is: how did the re-entrancy case mentioned in #23 go away?

P.S. IMHO you are adding a whole bunch of #ifdefs and a special snowflake in the sandbox when you could just add a few lines of mincore(). But as long as all owners are fine, if you really think that's the right way to go, I'm fine.
On 2016/05/27 11:09:15, Primiano Tucci wrote:
> Ehm somethign I am missing here is: how did the re-entrancy case menioned in #23
> go away?
>
> P.S. IMHO you are adding a all bunch of #ifdef and a special snowflake in the
> sandbox when you could just add few lines of mincore().
> But as long as all owners are fine, if you really thing that's the right way to
> go I'm fine.

Because we're now not calling pthread_getattr_np() for the main thread, which avoids the problem.

Honestly, I don't like the complexity of the mincore() approach. I would rather just parse /proc/maps and find the region that contains the stack. My other CL even has the necessary FindMappedRegion() function. However, there are no issues left with the current approach, so I think we're fine.
On 2016/05/27 15:51:09, Dmitry Skiba wrote:
> Honestly, I don't like complexity of mincore() approach. I would rather just
> parse /proc/maps and find the region that contains the stack. My other CL even
> has the necessary FindMappedRegion() function.

Parsing /proc/maps is:
1) Slow: the kernel has to generate the gigantic textual section for you, take the mm lock and fill all the pages. It will take something on the order of ms to do that.
2) Complex: your code there depended on //third_party/symbolize, and AFAIK base cannot depend on anything else. Also, would you be able to do all that without triggering any malloc? If not, you are back to the reentrancy problem again.
3) You are blocked by the sandbox. You cannot open() in child processes on Linux, neither /proc/self/maps nor anything else.

> complexity of mincore() approach

I think we are talking about something like 10-20 lines of C which invoke only a 200ns syscall. ¯\_(ツ)_/¯
On 2016/05/27 16:19:30, Primiano Tucci wrote:
> Parsing /proc/maps is:
> 1) Slow, the kernel has to generate the gigantic textual section for you, take
> the mm lock and fill all the pages. It will take something in the order of ms.
> for doing that.

That parsing is currently happening in pthread_getattr_np() for the main thread for both bionic and glibc. Looks pretty fast.

> 2) Complex. Your code there depended on //third_part/symbolize, AFAIK base
> cannot depend on anything else. Also would you be able to do all that without
> triggering any malloc? If not you are back to the reentrancy problem again.

That dependency is already there for Linux because stack_trace_posix.cc uses google::Symbolize(). It's a one-line change to add it for Android.

> 3) You are blocked by the sandbox. You cannot open() in child processes on
> Linux, neither /proc/self/maps nor anything else.

Maybe we can punch yet another hole?

> complexity of mincore() approach
> I think we are talking about something like 10-20 lines of C which invoke only a
> 200ns syscall. ¯\_(ツ)_/¯

Why do you want to do mincore() anyway? The current approach works fine.
On 2016/05/27 16:28:30, Dmitry Skiba wrote:
> That parsing is currently happening in pthread_getattr_np() for the main thread
> for both bionic and glibc. Looks pretty fast.

http://pastebin.com/sF9hzJWd - reading /proc/.../maps for chrome takes ~300 us without any processing.

> Maybe we can punch a yet another hole?

Again, that is something you need to discuss with chrome-security. I am not a security expert, but:
- in principle /proc/self/maps is the last file you want to expose, as it reveals all ASLR :)
- technically the BPF sandbox can filter only on the arguments of the syscall and cannot dereference any memory, as it could be an invalid pointer. In other words, you cannot white/blacklist depending on the result of dereferencing a char* pointer.

> Why do you want to do mincore() anyway? Current approach works fine.

Anyway, as long as the other owners here are fine, the change in //base/trace_event itself LGTM.

https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_...
File base/trace_event/heap_profiler_allocation_context_tracker.cc (right):
https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_...
base/trace_event/heap_profiler_allocation_context_tracker.cc:212: #if HAVE_TRACE_STACK_FRAME_POINTERS && !defined(OS_NACL)
tip: maybe exclude OS_NACL in HAVE_TRACE_STACK_FRAME_POINTERS so you have to check only for HAVE_TRACE_STACK_FRAME_POINTERS without putting OS_NACL everywhere?
mmenke@chromium.org changed reviewers: + mmenke@chromium.org
https://codereview.chromium.org/1975393002/diff/120001/base/debug/stack_trace.cc File base/debug/stack_trace.cc (right): https://codereview.chromium.org/1975393002/diff/120001/base/debug/stack_trace... base/debug/stack_trace.cc:78: CHECK(!error); include base/logging.h
OK, so while I've worked around the pthread_getattr_np() reentrancy deadlock for the main thread, it still deadlocks for any other thread. A deadlock is 100% guaranteed if the first function we call in a new thread is pthread_getattr_np() itself.

So I'm going to implement the mincore() approach. It doesn't seem to be as complex as I thought, and the bonus part is that we won't need to unify GetStackStart() with blink. And compared to FindMappedRegion() it's less intrusive. Although we'll still need to punch a hole for the mincore syscall.

https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_...
File base/trace_event/heap_profiler_allocation_context_tracker.cc (right):
https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_...
base/trace_event/heap_profiler_allocation_context_tracker.cc:212: #if HAVE_TRACE_STACK_FRAME_POINTERS && !defined(OS_NACL)
On 2016/05/27 16:49:43, Primiano Tucci wrote:
> tip: maybe exclude OS_NACL in HAVE_TRACE_STACK_FRAME_POINTERS so you have to
> check only for HAVE_TRACE_STACK_FRAME_POINTERS without putting OS_NACL
> everywhere?

OS_NACL is there because in NaCl mode stack_trace.cc is not compiled. So stack_trace.h/cc itself is fine, and it shouldn't care about NaCl. The problem is in this file, which (without the NACL check) tries to use something that is not available.
On 2016/05/27 18:50:09, Dmitry Skiba wrote: > OK, so while I've worked around pthread_getattr_np() reentrancy deadlock for the > main thread, it still deadlocks for any other thread. Deadlock is 100% > guaranteed if first function we call in a new thread is pthread_getattr_np() > itself. > > So I'm going to implement mincore() approach. It doesn't seem to be that complex > as I thought, and the bonus part is that we won't need to unify GetStackStart() > with blink. And compared to FindMappedRegion() it's less intrusive. Although > we'll still need to punch a hole for mincore syscall. > there is no need to punch any hole. mincore is already allowed. we use it to implement resident stats in discardable memory dumper. > https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_... > File base/trace_event/heap_profiler_allocation_context_tracker.cc (right): > > https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_... > base/trace_event/heap_profiler_allocation_context_tracker.cc:212: #if > HAVE_TRACE_STACK_FRAME_POINTERS && !defined(OS_NACL) > On 2016/05/27 16:49:43, Primiano Tucci wrote: > > tip: maybe exclude OS_NACL in HAVE_TRACE_STACK_FRAME_POINTERS so you have to > > check only for HAVE_TRACE_STACK_FRAME_POINTERS without putting OS_NACL > > everywhere? > > OS_NACL is there because in NaCl mode stack_trace.cc is not compiled. So > stack_trace.h/cc itself is fine, and it shouldn't care about NACL. The problem > is in this file which (without NACL check) tries to use something that is not > available.
On 2016/05/27 20:04:03, Primiano Tucci wrote: > On 2016/05/27 18:50:09, Dmitry Skiba wrote: > > OK, so while I've worked around pthread_getattr_np() reentrancy deadlock for > the > > main thread, it still deadlocks for any other thread. Deadlock is 100% > > guaranteed if first function we call in a new thread is pthread_getattr_np() > > itself. > > > > So I'm going to implement mincore() approach. It doesn't seem to be that > complex > > as I thought, and the bonus part is that we won't need to unify > GetStackStart() > > with blink. And compared to FindMappedRegion() it's less intrusive. Although > > we'll still need to punch a hole for mincore syscall. > > > there is no need to punch any hole. mincore is already allowed. we use it to > implement resident stats in discardable memory dumper. It didn't work without punching. Search src/sandbox - mincore syscall is mentioned only once, in IsAllowedAddressSpaceAccess() where it causes it to return false. > > > https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_... > > File base/trace_event/heap_profiler_allocation_context_tracker.cc (right): > > > > > https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_... > > base/trace_event/heap_profiler_allocation_context_tracker.cc:212: #if > > HAVE_TRACE_STACK_FRAME_POINTERS && !defined(OS_NACL) > > On 2016/05/27 16:49:43, Primiano Tucci wrote: > > > tip: maybe exclude OS_NACL in HAVE_TRACE_STACK_FRAME_POINTERS so you have to > > > check only for HAVE_TRACE_STACK_FRAME_POINTERS without putting OS_NACL > > > everywhere? > > > > OS_NACL is there because in NaCl mode stack_trace.cc is not compiled. So > > stack_trace.h/cc itself is fine, and it shouldn't care about NACL. The problem > > is in this file which (without NACL check) tries to use something that is not > > available.
dskiba@google.com changed reviewers: + jln@chromium.org
Julien, since Ricky is OOO next week, can you please review the mincore exception for profiling builds?
On 2016/05/28 00:03:51, Dmitry Skiba wrote: > Julien, since Ricky is OOO next week, can you please review mincore exception > for profiling builds? This seems to have a pretty significant security impact. When is #if BUILDFLAG(ENABLE_PROFILING) true? Does such a config ever ship to users? I expect it can't be restricted to DEBUG builds, can it? On which OS do you need this?
On 2016/05/28 00:14:42, jln (very slow on Chromium) wrote: > On 2016/05/28 00:03:51, Dmitry Skiba wrote: > > Julien, since Ricky is OOO next week, can you please review the mincore exception > > for profiling builds? > > This seems to have a pretty significant security impact. > When is #if BUILDFLAG(ENABLE_PROFILING) true? Does such a config ever ship to users? When the enable_profiling GN variable is set to true (profiling = 1 for GYP). Enabling profiling also turns on all sorts of other things (e.g. stack frame pointers, -g2, no relocation packing), so I don't think that we ever ship such builds to users. I use this mode for native heap profiling, where we collect stack frames on each allocation. Primiano, can we guarantee that we won't ever ship profiling builds to users? > I expect it can't be restricted to DEBUG builds, can it? On which OS do you need this? Yeah, we actually need release profiling builds, as they are more performant. I need this only for Linux / Android, but since I modified a file in the linux/ folder, I suppose it's useless to add #ifdefs there?
Nico, Primiano, please review.
Thanks for the work. Makes sense conceptually but I think the code can be simplified quite a bit (removing template, lambdas and double nested loops :) ) On 2016/05/28 00:14:42, jln (very slow on Chromium) wrote: > On 2016/05/28 00:03:51, Dmitry Skiba wrote: > This seems to have a pretty significant security impact. > When is #if BUILDFLAG(ENABLE_PROFILING) true? Does such a config ever ship to users? > so I don't think that we ever ship such builds to users. Yup, I am not aware of any case where we ship this. profiling=true should just be for local perf development workflow. > Primiano, can we guarantee that we won't ever ship profiling builds to users? Would it make everybody feel better if we added, in the sandbox code, something like: #if defined(GOOGLE_CHROME_BUILD) && BUILDFLAG(ENABLE_PROFILING) #error Nononon this is not meant to be shipped #endif to make sure we never ship a chrome-branded thing built with profiling=1? https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_... File base/trace_event/heap_profiler_allocation_context_tracker.cc (right): https://codereview.chromium.org/1975393002/diff/120001/base/trace_event/heap_... base/trace_event/heap_profiler_allocation_context_tracker.cc:212: #if HAVE_TRACE_STACK_FRAME_POINTERS && !defined(OS_NACL) On 2016/05/27 18:50:09, Dmitry Skiba wrote: > On 2016/05/27 16:49:43, Primiano Tucci wrote: > > tip: maybe exclude OS_NACL in HAVE_TRACE_STACK_FRAME_POINTERS so you have to > > check only for HAVE_TRACE_STACK_FRAME_POINTERS without putting OS_NACL > > everywhere? > > OS_NACL is there because in NaCl mode stack_trace.cc is not compiled. So > stack_trace.h/cc itself is fine, and it shouldn't care about NACL. The problem > is in this file which (without NACL check) tries to use something that is not > available. Yeah but since we generally don't care about what happens in nacl, can we be aggressive (always set HAVE_TRACE_STACK_FRAME_POINTERS = false on nacl) in favor of code readability?
https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace.cc File base/debug/stack_trace.cc (right): https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.cc:62: template <size_t PageCount> why do you need a template function with a lambda inside? Can this function be simpler? https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.cc:66: int result = HANDLE_EINTR( according to its manpage mincore doesn't return EINTR (vm operations are typically not interruptible because they take the map_sem). At most it can return EAGAIN if the kernel is temporarily OOM. So maybe you want to do something like https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/p... https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.cc:72: CHECK_EQ(errno, ENOMEM); I'd remove this CHECK or make it a DCHECK. 1) you can get an EAGAIN 2) If you fail this estimation, worst case you'll end up scanning a reduced stack, which is not undefined behavior (which is what CHECK should protect against). https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.cc:88: for (size_t pages = PageCount; pages != 1;) { I think this is a bit hard to follow as you do a binary search here and an outer while below. Could this be just a single loop in one place so it's easier to read? https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace.h File base/debug/stack_trace.h (right): https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.h:107: struct BASE_EXPORT PerThreadStackInfo { maybe s/PerThreadStackInfo/ThreadStackLimits/ https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.h:122: // Note on |stack_info|.
By default the function relies on heuristics to check
IMHO this comment is a bit too apologetic. I'd just say: |stack_info| is an optional cache that avoids recomputing the stack limits on each invocation. The caller is supposed to just keep it in a TLS and pass it back.
mariakhomenko@chromium.org changed reviewers: + mariakhomenko@chromium.org
> Thanks for the work. Makes sense conceptually but I think the code can be > simplified quite a bit (removing template, lambdas and double nested loops :) ) I refactored AdvanceMappedPages() into a more obvious FindUnmappedPage() function, and moved the while loop there, but I still need all those things: 1. I need the lambda to abstract nasty mincore() details, since we're really interested in a binary answer. Code is just cleaner this way. 2. I need the while() loop because mincore() wants the 'vec' array to cover the whole page range. I don't want to allocate, and the input address range can be arbitrary, so we might need several probe iterations to cover the whole input range. 3. We have two distinct cases for probe() calls - the first one checks if the whole range is fine (and is really a part of #2), the second is inside the binary search loop, and divides the range in half on each iteration. I peppered FindUnmappedPages() with comments, hopefully it's easier to follow now. https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace.cc File base/debug/stack_trace.cc (right): https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.cc:66: int result = HANDLE_EINTR( On 2016/05/31 16:13:07, Primiano Tucci wrote: > according to its manpage mincore doesn't EINTR (vm operations are typically not > interruptible because they take the map_sem). At most can EAGAIN if the kernel > is temporarily OOM. So maybe you want to do something like > https://code.google.com/p/chromium/codesearch#chromium/src/base/trace_event/p... Done. https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.cc:72: CHECK_EQ(errno, ENOMEM); On 2016/05/31 16:13:06, Primiano Tucci wrote: > I'd remove this CHECK or make it a DCHECK. > 1) you can get an EAGAIN > 2) If you fail this estimation worst case you'll end up scanning a reduced stack > which is not undefined behavior (which is what CHECK should protect against).
Done. https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.cc:88: for (size_t pages = PageCount; pages != 1;) { On 2016/05/31 16:13:06, Primiano Tucci wrote: > I think this is a bit hard to follow as you do a binary search here and an outer > while below. > Could this be just a single loop in one place so it's easier to read? I don't see how it can be a single loop. When binary searching I halve the range on each iteration. So the first check that checks the whole range doesn't really belong here. Besides, I need to return true / false depending on the first check. https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace.h File base/debug/stack_trace.h (right): https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.h:107: struct BASE_EXPORT PerThreadStackInfo { On 2016/05/31 16:13:07, Primiano Tucci wrote: > maybe s/PerThreadStackInfo/ThreadStackLimits/ Acknowledged. https://codereview.chromium.org/1975393002/diff/140001/base/debug/stack_trace... base/debug/stack_trace.h:122: // Note on |stack_info|. By default the function relies on heuristics to check On 2016/05/31 16:13:07, Primiano Tucci wrote: > IMHO this comment is a bit too apologetic. I'd just say: > |stack_info| is an optional cache that avoids recomputing the stack limits on > each invocation. The caller is supposed to just keep it in a TLS and pass it > back. Hmm, I don't see it that way. I think it explains the motivation behind stack_info, which can help in deciding whether or not you need it.
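For context, the binary-search structure under discussion (one probe over the whole range first, then halving until the first unmapped page is isolated) can be sketched roughly as follows. The names RangeMapped and FindFirstUnmappedPage are assumptions for illustration, not the patchset's actual identifiers:

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <vector>

// True iff every page in [start, start + pages * page_size) is mapped.
// |start| must be page-aligned (mincore() returns EINVAL otherwise).
bool RangeMapped(uintptr_t start, size_t pages, size_t page_size) {
  std::vector<unsigned char> vec(pages);  // One residency byte per page.
  return mincore(reinterpret_cast<void*>(start), pages * page_size,
                 vec.data()) == 0;
}

// Sketch of the binary search (assumed name): returns the index of the first
// unmapped page in the range, or |pages| if the whole range is mapped.
size_t FindFirstUnmappedPage(uintptr_t start, size_t pages, size_t page_size) {
  if (RangeMapped(start, pages, page_size))
    return pages;  // First probe: the whole range is fine.
  size_t lo = 0;
  size_t hi = pages;  // Invariant: the first unmapped page is in [lo, hi).
  while (hi - lo > 1) {
    const size_t mid = lo + (hi - lo) / 2;
    if (RangeMapped(start + lo * page_size, mid - lo, page_size))
      lo = mid;  // [lo, mid) fully mapped; the answer is in [mid, hi).
    else
      hi = mid;  // [lo, mid) already contains an unmapped page.
  }
  return lo;
}
```

The initial RangeMapped() call is the "check the whole range" step Dmitry says does not belong inside the loop; the loop itself then needs only O(log n) mincore() probes to isolate the first unmapped page.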
Julien: I added !defined(GOOGLE_CHROME_BUILD) to the condition, to make sure it doesn't get into official builds.
mmenke@chromium.org changed reviewers: - mmenke@chromium.org
I think this can be really simplified. Unless I am missing something you can do
all this in 1/4 of the LOC, in a way which is IMHO more readable and less
obscure.
I might be a bit old style but I really don't see the point of using fancy
lambdas which in your case just hide the method name and require explanatory comments.
Here's a proposal in < 40 LOC
const size_t kMaxFrameSize = 128 * 1024;

#if defined(HAVE_MINCORE)
bool IsRangeMapped(uintptr_t start, uintptr_t end) {
  static uintptr_t page_size = 0;
  if (!page_size)
    page_size = GetPageSize();
  const uintptr_t aligned_start = start & ~(page_size - 1);
  const uintptr_t aligned_end = end & ~(page_size - 1);
  const size_t num_pages = 1 + (aligned_end - aligned_start) / page_size;
  const size_t kMaxProbePages = kMaxFrameSize / 4096;
  DCHECK_LE(num_pages, kMaxProbePages);
  uint8_t unused[kMaxProbePages];
  int error_counter = 0;
  int result;
  do {
    // Note: mincore() takes a length in bytes, not in pages.
    result = mincore(reinterpret_cast<void*>(aligned_start),
                     num_pages * page_size, unused);
  } while (result == -1 && errno == EAGAIN && error_counter++ < 100);
  if (result == 0)
    return true;
  // mincore returns ENOMEM if at least one page in the range is not mapped.
  DCHECK_EQ(ENOMEM, errno);
  return false;
}
#endif

size_t TraceStackFramePointers(..) {
  ...
  {
    uintptr_t next_sp = reinterpret_cast<const uintptr_t*>(sp)[0];
    // With the stack growing downwards, an older stack frame must be
    // at a greater address than the current one.
    if (next_sp <= sp)
      break;
    // Check alignment.
    if (sp & (sizeof(void*) - 1))
      break;
    // Assume stack frames larger than kMaxFrameSize bytes are bogus.
    if (next_sp - sp > kMaxFrameSize)
      break;
#if defined(HAVE_MINCORE)
    uintptr_t stack_limit = stack_info ? stack_info->stack_limit : 0;
    // If the next sp is beyond the previously probed stack limit, probe the
    // new limit and cache it.
    if (next_sp > stack_limit) {
      if (IsRangeMapped(sp, next_sp)) {
        stack_limit = base::Align(next_sp, page_size) - 1;
        if (stack_info)
          stack_info->stack_limit = stack_limit;
      }
    }
    if (next_sp > stack_limit)
      break;
#endif
    sp = next_sp;
  }
  ...
}
On 2016/06/01 20:10:58, Primiano Tucci wrote: > I think this can be really simplified. Unless I am missing something you can do > all this in 1/4 of the LOC, in a way which is IMHO more readable and less > obscure. > I might be a bit old style but I really don't see the point of using fancy > lambdas which in your case just hide the method name and require you comments. > Here's a proposal in < 40 LOC Hmm, your code is not equivalent:
1. It will likely probe all pages one by one, since you are probing the [sp, next_sp) range, and next_sp is likely to be close. I.e. it will be slower.
2. For large stack frames it will give inaccurate results, since when IsRangeMapped(sp, next_sp) fails, you don't know which page is unmapped, and simply use the previous stack_limit. Eventually, because of #1, it will find the true stack limit, but at a runtime cost.
3. IsRangeMapped hardcodes the page size to be 4096 (kMaxProbePages = kMaxFrameSize / 4096). This is probably fine, but all code everywhere else uses GetPageSize(), and I feel that using the function is safer.
If we want to go this route, we can simply have IsPageMapped() and check all pages between sp and next_sp. This will also solve some stylistic issues, like the fact that kMaxFrameSize, which is an implementation detail of TraceStackFramePointers(), is exposed. What exactly bothers you in my solution? I think it's pretty well commented, and the lambda actually contributes to the function's clarity.
Just found an issue with the mincore approach on Linux:
7f4299c1f000-7f429a41f000 rw-p 00000000 00:00 0 [stack:109410]
7f429a41f000-7f429a420000 ---p 00000000 00:00 0
7f429a420000-7f429ac20000 rw-p 00000000 00:00 0 [stack:109287]
7f429ac20000-7f429ac21000 ---p 00000000 00:00 0
7f429ac21000-7f429b421000 rw-p 00000000 00:00 0 [stack:109286]
As you can see, the stack for thread 109410 ends with an inaccessible '---p' page (a guard page?), which is immediately followed by the stack for thread 109287. So mincore will report that the guard page is mapped, which however doesn't mean that it's readable. What's more, mincore is actually useless here, since the stacks follow each other and there are no gaps. So if there is a bad pointer which nevertheless passes the <100000 check, it's probably perfectly readable.
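The guard-page problem described above is easy to demonstrate: a PROT_NONE mapping (which is what a guard page is) passes a mincore()-based check even though any read through it would fault. This is a standalone sketch illustrating the limitation, not code from this CL:

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>

// Returns true if mincore() considers [addr, addr + len) mapped. Note that
// "mapped" says nothing about page protection, which is exactly why a
// mincore()-based stack check cannot detect PROT_NONE guard pages.
bool MincoreSaysMapped(void* addr, size_t len) {
  unsigned char vec[1];  // Enough for a single page.
  return mincore(addr, len, vec) == 0;
}
```

For example, mmap()ing one page with PROT_NONE and passing it to this helper returns true, while an actual load from that page would crash, consistent with why the CL ultimately reverted to the simpler pointer-sanity checks of patchset 1.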
On 2016/06/01 23:26:23, Dmitry Skiba wrote: > 1. It will likely probe all pages one by one, since you are probing [sp, > next_sp) range, and next_sp is likely to be close. I.e. it will be slower. Yes, but only once per thread. Is it a real problem? Don't think so. > 2. For large stack frames it will give inaccurate results, since when > IsRangeMapped(sp, next_sp) fails, you don't know which page is unmapped, and > simply use previous stack_limit. Eventually because of #1 it will find true > stack limit, but at runtime cost. If a (sp, next_sp) range is not fully mapped, it means that the stack is corrupt, at which point I am not sure we care about whatever happens. The current code will stop at the latest valid stack limit found, which doesn't seem unreasonable as a behavior. > 3. IsRangeMapped hardcodes page size to be 4096 (kMaxProbePages = kMaxFrameSize > / 4096). This is probably fine, but all code everywhere else uses GetPageSize(), > and I feel that using function is safer. Right, sorry: just divide that by page_size, not 4096. My bad. > If we want to go this route, we can simply have IsPageMapped() and check all > pages between sp and next_sp. I just thought this code was the simplest and easiest to read. > This will also solve some stylistic issues, like > the fact that kMaxFrameSize, which is TraceStackFramePointers' implementation > detail, is exposed. You had that kMaxFrameSize anyway. It's just called anonymously 100000. I don't see the issue with turning that into a constant. This is not exposed anywhere; it's a const in the anonymous namespace. > What exactly bothers you in my solution? The fact that it is ~4x longer and, IMHO, way harder to read. The time spent on this CL is nothing compared to the time that we'll spend 6 months from now debugging this code. > I think it's pretty well commented, and lambda actually contributes to the function's clarity. OK, I don't know what else to say. I'm not even a reviewer here; I was just trying to help.
Figure it out with a //base/ OWNER. > So mincore will report that guard page is mapped, which however doesn't mean that it's readable. What's more, mincore is actually useless here, since stacks follow each other, and there are no gaps. So if there is a bad pointer, which however passes <100000 check, it's probably perfectly readable. Good point. So either you look at the resident bit of mincore() and just drop the ball early if you hit a stack that has been swapped out (I wouldn't expect that, as the stack should be in LRU, but I never checked actual data), or figure out some other solution.
The CQ bit was checked by dskiba@google.com
The patchset sent to the CQ was uploaded after l-g-t-m from thakis@chromium.org, primiano@chromium.org Link to the patchset: https://codereview.chromium.org/1975393002/#ps200001 (title: "Revert to the LGTMed patchset 1")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1975393002/200001
Message was sent while issue was closed.
Description was changed from ========== Check stack pointer to be inside stack when unwinding. TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android when it dives deep into JVM internals and finds bad stack pointer there. See details here: crbug.com/602701#c18 This CL adds checks to make sure only valid stack pointers are dereferenced. BUG=602701 ========== to ========== Check stack pointer to be inside stack when unwinding. TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android when it dives deep into JVM internals and finds bad stack pointer there. See details here: crbug.com/602701#c18 This CL adds checks to make sure only valid stack pointers are dereferenced. BUG=602701 ==========
Message was sent while issue was closed.
Committed patchset #11 (id:200001)
Message was sent while issue was closed.
Description was changed from ========== Check stack pointer to be inside stack when unwinding. TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android when it dives deep into JVM internals and finds bad stack pointer there. See details here: crbug.com/602701#c18 This CL adds checks to make sure only valid stack pointers are dereferenced. BUG=602701 ========== to ========== Check stack pointer to be inside stack when unwinding. TraceStackFramePointers() function, which unwinds stack using frame pointers, sometimes crashes on Android when it dives deep into JVM internals and finds bad stack pointer there. See details here: crbug.com/602701#c18 This CL adds checks to make sure only valid stack pointers are dereferenced. BUG=602701 Committed: https://crrev.com/0bed5150f19acdd4897dd12ea8e4d4802c51d75f Cr-Commit-Position: refs/heads/master@{#398134} ==========
Message was sent while issue was closed.
Patchset 11 (id:??) landed as https://crrev.com/0bed5150f19acdd4897dd12ea8e4d4802c51d75f Cr-Commit-Position: refs/heads/master@{#398134}
Message was sent while issue was closed.
Documented my attempts here: crbug.com/617730
