|
|
Created:
6 years, 2 months ago by haraken Modified:
6 years, 2 months ago CC:
blink-reviews, kouhei+heap_chromium.org, Mads Ager (chromium), mkwst+moarreviews_chromium.org Base URL:
svn://svn.chromium.org/blink/trunk Project:
blink Visibility:
Public. |
DescriptionOilpan: Avoid looking up TLS in ThreadState::current()
This CL adds a fast, conservative way to determine if we are in the main thread.
1) In ThreadState::init(), take the underestimated size of the main thread.
2) When ThreadState::current() is called, check if the current stack address is within the underestimated size from the stack start of the main thread. If yes, we can immediately return the mainThreadState() without looking up TLS. Otherwise, we look up TLS.
TLS is very slow especially in Mac and Windows. This CL improves blink_perf.parser.querySelector-* (and other) benchmarks by 6% in Mac.
BUG=420515
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=183637
Patch Set 1 #Patch Set 2 : #
Total comments: 1
Patch Set 3 : #
Total comments: 7
Patch Set 4 : #Patch Set 5 : #
Total comments: 4
Patch Set 6 : #Patch Set 7 : #Patch Set 8 : #Patch Set 9 : #Messages
Total messages: 28 (7 generated)
haraken@chromium.org changed reviewers: + oilpan-reviews@chromium.org
PTAL
This optimization idea was rejected in https://codereview.chromium.org/113693002/ (9 months ago), but it would make sense to revisit the optimization now. This actually improves various benchmarks significantly. I agree that this is a kind of nasty heuristics, but we've already introduced a lot of heuristics to our GC infrastructure :)
ager@chromium.org changed reviewers: + ager@chromium.org
https://codereview.chromium.org/643883002/diff/40001/Source/platform/heap/Thr... File Source/platform/heap/ThreadState.h (right): https://codereview.chromium.org/643883002/diff/40001/Source/platform/heap/Thr... Source/platform/heap/ThreadState.h:385: if (LIKELY(addressDiff <= 1024 * 1024)) { There is no way to guarantee that the stack is at least 1MB so this is not safe. Imagine that the system decides to use stacks of size 512kB each. Then you can get in a situation where objects that belong in a worker thread heap will be allocated in the main heap. To avoid thread-local access we should use the affinity support that we already have. We decided to just go through thread-local for anything that is not a Node or a CSS object. We can revisit that by modifying the thread affinity traits.
On 2014/10/10 13:44:36, Mads Ager (chromium) wrote: > https://codereview.chromium.org/643883002/diff/40001/Source/platform/heap/Thr... > File Source/platform/heap/ThreadState.h (right): > > https://codereview.chromium.org/643883002/diff/40001/Source/platform/heap/Thr... > Source/platform/heap/ThreadState.h:385: if (LIKELY(addressDiff <= 1024 * 1024)) > { > There is no way to guarantee that the stack is at least 1MB so this is not safe. > Imagine that the system decides to use stacks of size 512kB each. Then you can > get in a situation where objects that belong in a worker thread heap will be > allocated in the main heap. > > To avoid thread-local access we should use the affinity support that we already > have. We decided to just go through thread-local for anything that is not a Node > or a CSS object. We can revisit that by modifying the thread affinity traits. I'm not sure if it's realistic because: - we cannot make CSSValues MainThreadOnly since CSSValues are touched by a MediaQueryParser thread. - we've found that the MainThreadOnly macros add a lot of noises to the code base and produce unexpected crashes (if we specify the macro on a class that is actually not MainThreadOnly). If your concern is the uncertainty about the stack size, can we implement something like ThreadState::getStackStart() to get an exact stack size?
vegorov@google.com changed reviewers: + vegorov@google.com
FWIW apparently NaCl does quite an hack to get fast TLS access: they check in runtime that pthread_getspecific is implemented in a certain way and refuse to startup otherwise. In runtime they later just use inline assembly to get fast access to TLS through %gs/%fs segment registers https://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/tru... I wonder if we could just do the same for Oilpan
On 2014/10/10 14:12:26, Vyacheslav Egorov (Google) wrote: > FWIW apparently NaCl does quite an hack to get fast TLS access: they check in > runtime that pthread_getspecific is implemented in a certain way and refuse to > startup otherwise. > > In runtime they later just use inline assembly to get fast access to TLS through > %gs/%fs segment registers > > https://src.chromium.org/viewvc/native_client/trunk/src/native_client/src/tru... > > I wonder if we could just do the same for Oilpan The Android version of getStackStart() looks pthreads-generic. Is __get_stack_base() a faster (and stable enough) alternative, or has that already been considered?
We should not really call getStackStart() on the performance critical path anywhere, so I don't think it's really important to have a Android specific version of it. I think we used to use a more Android specific code at some point (or tried one), but to be honest at the moment I don't really remember what happened with that
On 2014/10/10 19:13:15, Vyacheslav Egorov (Google) wrote: > We should not really call getStackStart() on the performance critical path > anywhere, so I don't think it's really important to have a Android specific > version of it. > > I think we used to use a more Android specific code at some point (or tried > one), but to be honest at the moment I don't really remember what happened with > that I think getStackStart() is called only once when initializing ThreadState.
Implemented getStackSize(), although I'm not sure if the implementation is correct. The problem is that pthread_getattr_np fails for the main thread in __GLIBC__, ANDROID and FREEBSD and thus we cannot get a stack size in these platforms. However, we actually don't need to get the stack size for these platforms because TLS lookup is fast in these platforms. We can just use TLS lookup for these platforms. Mads is on vacation, so Slava and Sigborn, would you take a look?
sigbjornf@opera.com changed reviewers: + sigbjornf@opera.com
https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.h (right): https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:385: unsigned addressDiff = s_mainThreadStackStart - reinterpret_cast<intptr_t>(&dummy); To remain safe, doesn't this need to be ptrdiff_t (or intptr_t) ?
https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... Source/platform/heap/ThreadState.cpp:128: return __readgsqword(offsetof(NT_TIB64, StackBase)) - __readgsqword(offsetof(NT_TIB64, StackLimit)); Does this account for stack usage&sizes exceeding the initial commit, i.e., won't StackLimit grow as the guard page is hit and more is committed? (This comment really belongs at the getStackSize() call site, I suppose.)
https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... Source/platform/heap/ThreadState.cpp:128: return __readgsqword(offsetof(NT_TIB64, StackBase)) - __readgsqword(offsetof(NT_TIB64, StackLimit)); On 2014/10/12 08:09:23, sof wrote: > Does this account for stack usage&sizes exceeding the initial commit, i.e., > won't StackLimit grow as the guard page is hit and more is committed? > > (This comment really belongs at the getStackSize() call site, I suppose.) I don't know if the StackLimit grows up dynamically. Given that the stack pointer check is a best-effort service, it is OK to underestimate the stack size. I renamed the method to getUnderestimatedStackSize(). However, I welcome a better solution. https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.h (right): https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:385: unsigned addressDiff = s_mainThreadStackStart - reinterpret_cast<intptr_t>(&dummy); On 2014/10/12 07:53:55, sof wrote: > To remain safe, doesn't this need to be ptrdiff_t (or intptr_t) ? Replaced it with uintptr_t. It is important that it is *u*intptr_t to skip a comparison with 0.
The optimization made looks sound&fine to me, but deferring to platform/heap/ owners. [ The hottest current() call being from allocate(), right? ] https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.cpp (right): https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... Source/platform/heap/ThreadState.cpp:128: return __readgsqword(offsetof(NT_TIB64, StackBase)) - __readgsqword(offsetof(NT_TIB64, StackLimit)); On 2014/10/12 08:45:49, haraken wrote: > On 2014/10/12 08:09:23, sof wrote: > > Does this account for stack usage&sizes exceeding the initial commit, i.e., > > won't StackLimit grow as the guard page is hit and more is committed? > > > > (This comment really belongs at the getStackSize() call site, I suppose.) > > I don't know if the StackLimit grows up dynamically. > > Given that the stack pointer check is a best-effort service, it is OK to > underestimate the stack size. I renamed the method to > getUnderestimatedStackSize(). > > However, I welcome a better solution. > Yes, if the initial commit is 1M, then calling current() with a deeper stack ought to be rare (a guess), so not having it track the current stack limit/size seems just fine. https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.h (right): https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:385: unsigned addressDiff = s_mainThreadStackStart - reinterpret_cast<intptr_t>(&dummy); On 2014/10/12 08:45:50, haraken wrote: > On 2014/10/12 07:53:55, sof wrote: > > To remain safe, doesn't this need to be ptrdiff_t (or intptr_t) ? > > Replaced it with uintptr_t. It is important that it is *u*intptr_t to skip a > comparison with 0. That type needs to be used for s_maniThreadStackSize also (signed/unsigned mismatch if not) ?
Slava@: Would you take a look? Mads is on vacation. https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.h (right): https://codereview.chromium.org/643883002/diff/170001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:385: unsigned addressDiff = s_mainThreadStackStart - reinterpret_cast<intptr_t>(&dummy); On 2014/10/12 11:39:17, sof wrote: > On 2014/10/12 08:45:50, haraken wrote: > > On 2014/10/12 07:53:55, sof wrote: > > > To remain safe, doesn't this need to be ptrdiff_t (or intptr_t) ? > > > > Replaced it with uintptr_t. It is important that it is *u*intptr_t to skip a > > comparison with 0. > > That type needs to be used for s_maniThreadStackSize also (signed/unsigned > mismatch if not) ? Done.
lgtm https://codereview.chromium.org/643883002/diff/320001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.h (right): https://codereview.chromium.org/643883002/diff/320001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:385: uintptr_t addressDiff = s_mainThreadStackStart - reinterpret_cast<intptr_t>(&dummy); any reason to cast to intptr_t not uintptr_t here? https://codereview.chromium.org/643883002/diff/320001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:390: if (LIKELY(addressDiff <= s_mainThreadUnderestimatedStackSize)) { I think s_mainThreadStackStart is exclusive, not inclusive (e.g. it points past the last page of the stack in linear order) so addressDiff == 0 actually does not belong to the main thread state. I think substracting sizeof(void*) from the addressDiff and using < instead of <= would fix it.
https://codereview.chromium.org/643883002/diff/320001/Source/platform/heap/Th... File Source/platform/heap/ThreadState.h (right): https://codereview.chromium.org/643883002/diff/320001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:385: uintptr_t addressDiff = s_mainThreadStackStart - reinterpret_cast<intptr_t>(&dummy); On 2014/10/13 14:07:23, Vyacheslav Egorov (Google) wrote: > any reason to cast to intptr_t not uintptr_t here? Done. https://codereview.chromium.org/643883002/diff/320001/Source/platform/heap/Th... Source/platform/heap/ThreadState.h:390: if (LIKELY(addressDiff <= s_mainThreadUnderestimatedStackSize)) { On 2014/10/13 14:07:23, Vyacheslav Egorov (Google) wrote: > I think s_mainThreadStackStart is exclusive, not inclusive (e.g. it points past > the last page of the stack in linear order) so addressDiff == 0 actually does > not belong to the main thread state. > > I think substracting sizeof(void*) from the addressDiff and using < instead of > <= would fix it. Good point. In order to avoid do the subtraction in ThreadState::current() (which is performance-sensitive), I added the subtraction code to ThreadState's constructor.
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/643883002/380001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: mac_blink_rel on tryserver.blink (http://build.chromium.org/p/tryserver.blink/builders/mac_blink_rel/builds/26878)
The CQ bit was checked by haraken@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/643883002/270002
Message was sent while issue was closed.
Committed patchset #9 (id:270002) as 183637 |