Created: 6 years, 10 months ago by bulach
Modified: 6 years, 10 months ago
CC: chromium-reviews, dsinclair+watch_chromium.org, erikwright+watch_chromium.org
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description:
Trace event micro-optimization: use a finer-grained lock.

GetCategoryGroupEnabledInternal is a hotspot.
It traverses an append-only list, so it is safe to use a finer-grained lock,
and to avoid it altogether on the "fast path", i.e., when the category is
already known.
BUG=
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=251178
Patch Set 1
Patch Set 2: comments
Patch Set 3: off by one
Total comments: 1
Messages
Total messages: 23 (0 generated)
ptal
+wangxianzhu who has done a lot of work with the various tracing locks.
Can we avoid the lock by making g_category_index an AtomicWord?
There seems to be a problem: when two threads are getting the enabled flag of the same category, they may both initially find that the category has not been added yet, and then the category may be added twice, so the lock around the whole function still seems needed. I had an idea of static categories created at compile time to avoid most of the GetCategoryGroupEnabledInternal() calls at startup. Measurement of the overhead showed that it contributed little to the startup time, so I didn't continue. Perhaps it is still worth investigating, as this is still a hotspot.
good points! I'll change it to an atomic word, and how about this:
- have an unlocked fast path to detect that the category is there (should be the most common occurrence)
- if not: acquire the lock, check again... if it's there, bail out; if it isn't, append.
would that work? eric's profiler shows that this is a hotspot, so even if the code ends up a bit more complicated, it's probably worth such a micro-optimization? someone said that locks on ARM are somewhat costly, so avoiding them may be a good thing...
On 2014/02/05 18:01:42, bulach wrote:
> good points! I'll change it to an atomic word, and how about this:
> - have an unlocked fast path to detect that the category is there (should be the most common occurrence)
> - if not: acquire the lock, check again... if it's there, bail out; if it isn't, append.
> would that work? eric's profiler shows that this is a hotspot, so even if the code ends up a bit more complicated, it's probably worth such a micro-optimization? someone said that locks on ARM are somewhat costly, so avoiding them may be a good thing...

I think that would work.
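(For reference, here is a minimal sketch of the scheme being discussed, not the actual patch: g_category_groups, g_category_group_enabled, g_category_index and g_category_lock are placeholder names, and bounds checking is omitted. An acquire-load of the atomic index guards the lock-free fast path; the lock is only taken when the category group is not yet in the append-only list.)

// Sketch only: lock-free fast path plus double-checked, locked slow path.
// Placeholder names; bounds checking omitted for brevity.
#include <string.h>

#include "base/atomicops.h"
#include "base/synchronization/lock.h"

const char* g_category_groups[100];
unsigned char g_category_group_enabled[100];
base::subtle::AtomicWord g_category_index = 0;
base::Lock g_category_lock;

const unsigned char* GetCategoryGroupEnabledSketch(const char* category_group) {
  // Fast path: no lock. The index is published with Release_Store below and
  // read with Acquire_Load here, so every entry at index < count is fully
  // written before a reader can see it.
  base::subtle::AtomicWord count =
      base::subtle::Acquire_Load(&g_category_index);
  for (base::subtle::AtomicWord i = 0; i < count; ++i) {
    if (strcmp(g_category_groups[i], category_group) == 0)
      return &g_category_group_enabled[i];
  }

  // Slow path: take the lock and check again, since another thread may have
  // appended the same category between our load and the lock acquisition.
  base::AutoLock lock(g_category_lock);
  count = base::subtle::Acquire_Load(&g_category_index);
  for (base::subtle::AtomicWord i = 0; i < count; ++i) {
    if (strcmp(g_category_groups[i], category_group) == 0)
      return &g_category_group_enabled[i];
  }

  // Append: fill the new slot completely, then publish the new count.
  g_category_groups[count] = category_group;  // Assumes static lifetime.
  g_category_group_enabled[count] = 0;
  base::subtle::Release_Store(&g_category_index, count + 1);
  return &g_category_group_enabled[count];
}

(Whether the re-check under the lock really needs an acquire load is debatable, since the lock already provides the ordering, but keeping the two reads uniform makes the reasoning simpler.)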
This looks really nice! The more we can remove locks, use thread-locals, and defer stuff into 'overhead', the better :)
On 2014/02/05 20:44:44, epennerAtGoogle wrote: > This looks really nice! > > The more we can remove locks, use thread-locals, and defer stuff into > 'overhead', the better :) Question: I think I get how this is testing internal categories, but what about other categories that are not built in? Do we end up doing a string-comparison on all categories to determine if they are enabled? If so does that only happen all the time, or just once for every trace-site (presumably it could keep an index after that)?
second try, ptal..
On 2014/02/05 20:48:36, bulach wrote:
> second try, ptal..

sorry, we crossed our streams... :)

I haven't had a chance to look at the full Release build yet, but from disassembling a Debug build:

017a7aec <base::MessageLoop::RunTask(base::PendingTask const&)>:
 17a7aec: e92d 4ff0  stmdb sp!, {r4, r5, r6, r7, r8, r9, sl, fp, lr}
 17a7af0: b0d3       sub sp, #332 ; 0x14c
 17a7af2: f8df a288  ldr.w sl, [pc, #648] ; 17a7d7c <base::MessageLoop::RunTask(base::PendingTask const&)+0x290>
 17a7af6: 4605       mov r5, r0
 17a7af8: 6808       ldr r0, [r1, #0]
 17a7afa: 460c       mov r4, r1
 17a7afc: 44fa       add sl, pc
 17a7afe: f8da a000  ldr.w sl, [sl]
 17a7b02: 4f9f       ldr r7, [pc, #636] ; (17a7d80 <base::MessageLoop::RunTask(base::PendingTask const&)+0x294>)
 17a7b04: f8da 3000  ldr.w r3, [sl]
 17a7b08: 447f       add r7, pc
 17a7b0a: 9351       str r3, [sp, #324] ; 0x144
 17a7b0c: f000 f94e  bl 17a7dac <tracked_objects::ThreadData::NowForStartOfRun(tracked_objects::Births const*)>
 17a7b10: 683e       ldr r6, [r7, #0]
 17a7b12: 9011       str r0, [sp, #68] ; 0x44
 17a7b14: b92e       cbnz r6, 17a7b22 <base::MessageLoop::RunTask(base::PendingTask const&)+0x36>
 17a7b16: 489b       ldr r0, [pc, #620] ; (17a7d84 <base::MessageLoop::RunTask(base::PendingTask const&)+0x298>)
 17a7b18: 4478       add r0, pc
 17a7b1a: f7ea fd0d  bl 1792538 <base::debug::TraceLog::GetCategoryGroupEnabled(char const*)>
....

^^^ this is where I got the inspiration to micro-optimize... it looks like it does happen all the time for every TRACE macro, which I suppose is especially bad within RunTask.. (GetCategoryGroupEnabled calls into the GetCategoryGroupEnabledInternal that I'm trying to optimize for the fast path...)
On 2014/02/05 20:47:21, epennerAtGoogle wrote:
> On 2014/02/05 20:44:44, epennerAtGoogle wrote:
> > This looks really nice!
> >
> > The more we can remove locks, use thread-locals, and defer stuff into
> > 'overhead', the better :)
>
> Question: I think I get how this is testing the internal categories, but what
> about other categories that are not built in? Do we end up doing a string
> comparison on all categories to determine whether they are enabled? If so, does
> that happen on every invocation, or just once per trace site (presumably it
> could keep an index after that)?

Actually, I understand this better now. We initialize the list with some built-ins, then we append other ones as we go. Without reading other code, though, I can't see how often we do this string search. I'm thinking ideally it should be at most once per trace site.
On 2014/02/05 20:55:04, epennerAtGoogle wrote:
> Actually, I understand this better now. We initialize the list with some
> built-ins, then we append other ones as we go. Without reading other code,
> though, I can't see how often we do this string search. I'm thinking ideally it
> should be at most once per trace site.

Yes, it does work like this.
lgtm. Thanks!
On 2014/02/05 20:55:04, epennerAtGoogle wrote:
> On 2014/02/05 20:47:21, epennerAtGoogle wrote:
> > On 2014/02/05 20:44:44, epennerAtGoogle wrote:
> > > This looks really nice!
> > >
> > > The more we can remove locks, use thread-locals, and defer stuff into
> > > 'overhead', the better :)
> >
> > Question: I think I get how this is testing the internal categories, but what
> > about other categories that are not built in? Do we end up doing a string
> > comparison on all categories to determine whether they are enabled? If so,
> > does that happen on every invocation, or just once per trace site (presumably
> > it could keep an index after that)?
>
> Actually, I understand this better now. We initialize the list with some
> built-ins, then we append other ones as we go. Without reading other code,
> though, I can't see how often we do this string search. I'm thinking ideally it
> should be at most once per trace site.

by once per trace site you mean once per trace invocation? yes, I think that's what happens...
but then, on every invocation, it traverses this list again, under a lock. since your profiling showed it's hot, I think it can't hurt to have a fast path:
- put the categories used by the message loop first (the other patch)
- traverse the list lock-free, which should be by far the most common case..
...worst case, fall back to the slow path... ;)
> by once per trace site you mean once per trace invocation? yes, I think that's what happens...
> but then, on every invocation, it traverses this list again, under a lock. since your profiling showed it's hot, I think it can't hurt to have a fast path:
> - put the categories used by the message loop first (the other patch)
> - traverse the list lock-free, which should be by far the most common case..
> ...worst case, fall back to the slow path... ;)

LGTM

First, I totally like this patch and the approach; I'm just trying to learn more :) If we do this string search on every invocation, it seems like we could do even better: we could store the index in a static or thread-local variable at each trace site and only do the search if that isn't set yet. From Xianzhu it sounds like that might already be happening, in which case all's good.
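(As an aside, here is a simplified sketch of the per-trace-site caching being described, under the assumption that the enabled-flag pointer returned by base::debug::TraceLog::GetCategoryGroupEnabled() is stable for the lifetime of the process. SKETCH_GET_CATEGORY_GROUP_ENABLED is a made-up macro name; the real TRACE_EVENT macros do something along these lines but with more machinery.)

// Sketch only: cache the category-enabled pointer per call site, so the
// string search inside GetCategoryGroupEnabled() happens at most once per
// trace site. Macro name is hypothetical.
#include "base/atomicops.h"
#include "base/debug/trace_event_impl.h"

#define SKETCH_GET_CATEGORY_GROUP_ENABLED(category_group, enabled_ptr)     \
  do {                                                                     \
    static base::subtle::AtomicWord cached_ptr = 0;                        \
    enabled_ptr = reinterpret_cast<const unsigned char*>(                  \
        base::subtle::NoBarrier_Load(&cached_ptr));                        \
    if (!enabled_ptr) {                                                    \
      enabled_ptr =                                                        \
          base::debug::TraceLog::GetCategoryGroupEnabled(category_group);  \
      base::subtle::NoBarrier_Store(                                       \
          &cached_ptr,                                                     \
          reinterpret_cast<base::subtle::AtomicWord>(enabled_ptr));        \
    }                                                                      \
  } while (0)

(If two threads race here they both just do the lookup and store the same pointer, which is harmless.)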
nat / dsinclair, can I have your OWNERS review please?
lgtm. Sorry for the delay.
also lgtm
hey folks, I had a stupid facepalm off-by-one error... it was a trivial change from the previous patchset and the bots are now happy, so if I don't hear from you I'll put it through the CQ tomorrow. thanks anyway!

https://codereview.chromium.org/144423009/diff/300001/base/debug/trace_event_...
File base/debug/trace_event_impl.cc (right):

https://codereview.chromium.org/144423009/diff/300001/base/debug/trace_event_...
base/debug/trace_event_impl.cc:1289: int new_index = base::subtle::Acquire_Load(&g_category_index);
the + 1 was here.. :(
The CQ bit was checked by dsinclair@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bulach@chromium.org/144423009/300001
Message was sent while issue was closed.
Change committed as 251178