|
|
Chromium Code Reviews|
Created:
7 years ago by qsr Modified:
6 years, 5 months ago CC:
chromium-reviews, erikwright+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionDisable timing from chrome://profiler on Android.
On Android, on a arm CPU, TimeTicks::Now() is taking 900ns (20 times
what it takes on a desktop x86 CPU). The bigger user of TimeTicks::Now()
is tracked_objects::TrackedTime, and this impacts the overall
performance of Chrome on this platform.
This CL adds compile line flags that allows to disable
tracked_objects::TrackedTime. It also disable
tracked_objects::TrackedTime by default on Android.
BUG=315070
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=240092
Patch Set 1 #Patch Set 2 : Change flag location #Patch Set 3 : Fix compilation #
Total comments: 4
Patch Set 4 : Follow review #
Total comments: 17
Patch Set 5 : Follow review #
Total comments: 4
Patch Set 6 : Follow review #
Total comments: 4
Patch Set 7 : Follow review #
Total comments: 6
Patch Set 8 : Follow review #Patch Set 9 : Fix uninitialized command line/ #
Messages
Total messages: 39 (0 generated)
Hi Jar, Mark, This CL is disabling the timing in chrome://profiler by default on Android. The reason is the following: base::TimeTicks is taking 900 ns on android (that's 20 time what it takes on desktop). The result is that tracked_objects::TrackedTime() does appear as a couple of % on any perf run I do. I investigated disabling tracked_objects::TrackedTime() and I found an improvement in PLT from 2 to 10% using the page cycler telemtry test. The reason there is 2 command line argument is to allow re-enabling the profiler timing on android using a command line option: without any command line option, the profiler will be enabled. Android will hardcode the disable option, but if both options are set, the profiler will be enabled. What do you think?
https://codereview.chromium.org/99343002/diff/40001/base/base_switches.h File base/base_switches.h (right): https://codereview.chromium.org/99343002/diff/40001/base/base_switches.h#newc... base/base_switches.h:29: extern const char kEnableProfilerTiming[]; nit: Rather than defining two flags to compete (with the documentation hidden here), how about having one flag, with a boolean setting. Use a default if there is no flag. Use the supplied setting if one is given. Perhaps: extern const char kProfilerTiming[]; The switch could have values "0" or "1" or whatever is common for such boolean flags in Chrome. [Yeah... I can see lines 37 and 38 below... but it would be nicer if that was not done... but that is another CL.] https://codereview.chromium.org/99343002/diff/40001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/40001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:16: inline bool IsProfilerTimingDisabled() { Nit: code is IMO always much more confusing when negatives are used in flags and function. I try to make it a habit to use positives as much as possible. As a result, I'd much rather see the (very similar) function: IsProfilerTimingEnabled().
https://codereview.chromium.org/99343002/diff/40001/base/base_switches.h File base/base_switches.h (right): https://codereview.chromium.org/99343002/diff/40001/base/base_switches.h#newc... base/base_switches.h:29: extern const char kEnableProfilerTiming[]; On 2013/12/02 19:01:50, jar wrote: > nit: Rather than defining two flags to compete (with the documentation hidden > here), how about having one flag, with a boolean setting. Use a default if > there is no flag. Use the supplied setting if one is given. > > Perhaps: > extern const char kProfilerTiming[]; > > The switch could have values "0" or "1" or whatever is common for such boolean > flags in Chrome. > > [Yeah... I can see lines 37 and 38 below... but it would be nicer if that was > not done... but that is another CL.] Done. https://codereview.chromium.org/99343002/diff/40001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/40001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:16: inline bool IsProfilerTimingDisabled() { On 2013/12/02 19:01:50, jar wrote: > Nit: code is IMO always much more confusing when negatives are used in flags and > function. I try to make it a habit to use positives as much as possible. > > As a result, I'd much rather see the (very similar) function: > IsProfilerTimingEnabled(). Done. But I didn't change the static variable, as I need the default value to correspond to a negative value for it.
gentle ping?
https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:24: switches::kProfilerTiming) == "0"; You can indeed change the name to avoid negatives. Instead of using a "function static," you should use a "file scope static." You'll then get linker initialization to any constant value of your choice, and not need to use the negative naming convention. You could get fancy and use atomics here... but your original code was equally or more susceptible to the race initialization issues (gcc might try to protect these... but it is not guaranteed in most other implementations). For instance, the code could read: static enum { // You could rely on namespace, instead of static. UNDEFINED_TIMING, ENABLED_TIMING, DISABLED_TIMING, } timing_enabled = UNDEFINED TIMING; inline bool IsProfilerTimingEnabled() { if (timing_enabled == UNDEFINED_TIMING) { timing_enabled = (CommandLine::ForCurrentProcess()->GetSwitchValueASCII(switches::kProfilerTiming) != "0") ? ENABLED_TIMING : DISABLED_TIMING; return timing_enabled == ENABLED_TIMING; } https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:70: if (IsProfilerTimingEnabled()) { nit: Early return, inducing less indentation, etc., is generally the preferred style. It helps with readability. Please consider undoing the changes on lines 70-80.
Jim probably wants to look again too, but if you don’t hear back from him, I’ll LG it with these changes. https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h File base/base_switches.h (right): https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h#newc... base/base_switches.h:26: // Configure whether profiler will contain timing information. This option is Nit: the profiler (or reword). https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h#newc... base/base_switches.h:27: // enabled by default. A value of 0 will disable profiler timing, while all Put the zero in quotes so that it’s clearer that you’re looking for string "0" and not any numeric expression that evaluates to 0 (such as "0x0" or "000"). https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h#newc... base/base_switches.h:28: // other value will enable it. Nit: values, plural, not value. All of the other switches in here have all of their documentation in the .cc file. This should match. https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:17: // This initialization is not thread safe. In particular, some early calls Nit: thread-safe (with dash). https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:70: if (IsProfilerTimingEnabled()) { I’d rather flip the logic. Having the “real” definition of this function indented inside a conditional makes it seem more like it’s the afterthought and not the prime purpose here. https://codereview.chromium.org/99343002/diff/60001/content/browser/android/c... File content/browser/android/content_startup_flags.cc (right): https://codereview.chromium.org/99343002/diff/60001/content/browser/android/c... content/browser/android/content_startup_flags.cc:94: parsed_command_line->AppendSwitchASCII(switches::kProfilerTiming, "0"); Since you’ve used "0" as the magic value in a couple of spots now, you probably want a const char* for it somewhere.
Oh, well, there’s Jim. This is just a note on something he mentioned. https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:24: switches::kProfilerTiming) == "0"; jar wrote: > You could get fancy and use atomics here... but your original code was equally > or more susceptible to the race initialization issues (gcc might try to protect > these... but it is not guaranteed in most other implementations). C++11 requires thread-safe static initialization, and gcc and clang both have had thread-safe statics by default for forever, but we have disabled this feature globally for compatibility with our brain-dead Windows compiler. Lazy instances and singletons are cheap-ish in that they have a fast path that doesn’t use a lock, but there’s still a barrier.
https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h File base/base_switches.h (right): https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h#newc... base/base_switches.h:26: // Configure whether profiler will contain timing information. This option is On 2013/12/05 23:54:04, Mark Mentovai wrote: > Nit: the profiler (or reword). Done. https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h#newc... base/base_switches.h:27: // enabled by default. A value of 0 will disable profiler timing, while all On 2013/12/05 23:54:04, Mark Mentovai wrote: > Put the zero in quotes so that it’s clearer that you’re looking for string "0" > and not any numeric expression that evaluates to 0 (such as "0x0" or "000"). Done. https://codereview.chromium.org/99343002/diff/60001/base/base_switches.h#newc... base/base_switches.h:28: // other value will enable it. On 2013/12/05 23:54:04, Mark Mentovai wrote: > Nit: values, plural, not value. > > All of the other switches in here have all of their documentation in the .cc > file. This should match. Done. https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:17: // This initialization is not thread safe. In particular, some early calls On 2013/12/05 23:54:04, Mark Mentovai wrote: > Nit: thread-safe (with dash). Done. https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:24: switches::kProfilerTiming) == "0"; On 2013/12/05 23:51:06, jar wrote: > You can indeed change the name to avoid negatives. > > Instead of using a "function static," you should use a "file scope static." > You'll then get linker initialization to any constant value of your choice, and > not need to use the negative naming convention. > > You could get fancy and use atomics here... but your original code was equally > or more susceptible to the race initialization issues (gcc might try to protect > these... but it is not guaranteed in most other implementations). > > For instance, the code could read: > static enum { // You could rely on namespace, instead of static. > UNDEFINED_TIMING, > ENABLED_TIMING, > DISABLED_TIMING, > } timing_enabled = UNDEFINED TIMING; > > inline bool IsProfilerTimingEnabled() { > if (timing_enabled == UNDEFINED_TIMING) { > timing_enabled = > (CommandLine::ForCurrentProcess()->GetSwitchValueASCII(switches::kProfilerTiming) > != "0") > ? ENABLED_TIMING : DISABLED_TIMING; > return timing_enabled == ENABLED_TIMING; > } Done. https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:70: if (IsProfilerTimingEnabled()) { On 2013/12/05 23:51:06, jar wrote: > nit: Early return, inducing less indentation, etc., is generally the preferred > style. It helps with readability. Please consider undoing the changes on lines > 70-80. Done. https://codereview.chromium.org/99343002/diff/60001/content/browser/android/c... File content/browser/android/content_startup_flags.cc (right): https://codereview.chromium.org/99343002/diff/60001/content/browser/android/c... content/browser/android/content_startup_flags.cc:94: parsed_command_line->AppendSwitchASCII(switches::kProfilerTiming, "0"); On 2013/12/05 23:54:04, Mark Mentovai wrote: > Since you’ve used "0" as the magic value in a couple of spots now, you probably > want a const char* for it somewhere. Done.
+sievers@chromium.org for OWNERS
LGTM (I've opted out of base ownership... so Mark may need to bless this). https://codereview.chromium.org/99343002/diff/80001/base/base_switches.h File base/base_switches.h (right): https://codereview.chromium.org/99343002/diff/80001/base/base_switches.h#newc... base/base_switches.h:26: extern const char kTraceToConsole[]; nit: could you fix this alphabetization as well.
LGTM
https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:79: return TrackedTime(); What functionally does this break exactly? Seems weird to break things inside this function here, rather than properly disabling whatever the feature is at a higher level.
On 2013/12/06 20:13:54, sievers wrote: > https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... > File base/profiler/tracked_time.cc (right): > > https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... > base/profiler/tracked_time.cc:79: return TrackedTime(); > What functionally does this break exactly? meeh. 'functionality' > Seems weird to break things inside this function here, rather than properly > disabling whatever the feature is at a higher level.
https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:79: return TrackedTime(); On 2013/12/06 20:13:54, sievers wrote: > What functionally does this break exactly? > Seems weird to break things inside this function here, rather than properly > disabling whatever the feature is at a higher level. This stops getting timing... but still gets call counts, etc., providing some utility (that would be lost if we completely inhibited profiling).
On 2013/12/06 21:57:39, jar wrote: > https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... > File base/profiler/tracked_time.cc (right): > > https://codereview.chromium.org/99343002/diff/80001/base/profiler/tracked_tim... > base/profiler/tracked_time.cc:79: return TrackedTime(); > On 2013/12/06 20:13:54, sievers wrote: > > What functionally does this break exactly? > > Seems weird to break things inside this function here, rather than properly > > disabling whatever the feature is at a higher level. > > This stops getting timing... but still gets call counts, etc., providing some > utility (that would be lost if we completely inhibited profiling). Does it make sense to change the caller of the function instead?
sievers wrote: > Does it make sense to change the caller of the function instead? Would that be ThreadData::Now in src/base/tracked_objects.cc? There’s already logic there to alter the TrackedTime in use, so that does seem like a better place to fix this. You could add another condition to fall through to the “zero” TrackedTime case, or you could push in a custom now_function_ from elsewhere.
PTAL https://codereview.chromium.org/99343002/diff/80001/base/base_switches.h File base/base_switches.h (right): https://codereview.chromium.org/99343002/diff/80001/base/base_switches.h#newc... base/base_switches.h:26: extern const char kTraceToConsole[]; On 2013/12/06 18:19:23, jar wrote: > nit: could you fix this alphabetization as well. Done.
LGTM https://codereview.chromium.org/99343002/diff/100001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/99343002/diff/100001/base/tracked_objects.cc#... base/tracked_objects.cc:57: enum ProfilerTimingState { You can move this whole enum definition and variable declaration to be in-line in IsProfilerTimingEnabled. The enum can also be anonymous, it doesn’t need to have a type name. https://codereview.chromium.org/99343002/diff/100001/base/tracked_objects.cc#... base/tracked_objects.cc:65: // can be computed multiple time. This is not an issue, as the computed value time→times
https://codereview.chromium.org/99343002/diff/100001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/99343002/diff/100001/base/tracked_objects.cc#... base/tracked_objects.cc:57: enum ProfilerTimingState { On 2013/12/09 14:18:32, Mark Mentovai wrote: > You can move this whole enum definition and variable declaration to be in-line > in IsProfilerTimingEnabled. > > The enum can also be anonymous, it doesn’t need to have a type name. Done. https://codereview.chromium.org/99343002/diff/100001/base/tracked_objects.cc#... base/tracked_objects.cc:65: // can be computed multiple time. This is not an issue, as the computed value On 2013/12/09 14:18:32, Mark Mentovai wrote: > time→times Done.
LGTM
lgtm for android https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... File base/profiler/tracked_time.cc (right): https://codereview.chromium.org/99343002/diff/60001/base/profiler/tracked_tim... base/profiler/tracked_time.cc:24: switches::kProfilerTiming) == "0"; On 2013/12/05 23:51:06, jar wrote: > You could get fancy and use atomics here... but your original code was equally > or more susceptible to the race initialization issues (gcc might try to protect > these... but it is not guaranteed in most other implementations). Or you could just use LazyInstance (or Singleton), which is thread-safe and "will hopefully have fast access when the instance is already created" (lazy_instance.h).
LazyInstance and Singleton do have that optimization, but they still have a barrier. Ben’s comment in the code indicated that the barrier was undesirable. The comment has changed, though.
https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc#... base/tracked_objects.cc:57: enum { missing static
https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc#... base/tracked_objects.cc:61: } g_timing_enabled = UNDEFINED_TIMING; It probably *should* be a global static per the g_* prefix, or something similar in an anonymous file-scope namespace. That way it won't bother with initialization checking, rather than "just" a function static which would (perchance with barriers etc.). You'll probably get a thread sanitizer complaint... but that seems acceptable. If necessary, you could use atomics.
https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc#... base/tracked_objects.cc:61: } g_timing_enabled = UNDEFINED_TIMING; jar wrote: > It probably *should* be a global static per the g_* prefix, Yes, drop the g_ prefix and put on the “static” keyword as noted by sievers. > or something similar > in an anonymous file-scope namespace. That way it won't bother with > initialization checking, rather than "just" a function static which would > (perchance with barriers etc.). Huh? How would access to a file-scoped static be handled any differently than access to a function-scoped one? An integer constant-initialized static can be in BSS or the data segment whether it’s file-scoped or function-scoped.
https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc#... base/tracked_objects.cc:61: } g_timing_enabled = UNDEFINED_TIMING; On 2013/12/09 20:42:39, Mark Mentovai wrote: > jar wrote: > > It probably *should* be a global static per the g_* prefix, > > Yes, drop the g_ prefix and put on the “static” keyword as noted by sievers. > > > or something similar > > in an anonymous file-scope namespace. That way it won't bother with > > initialization checking, rather than "just" a function static which would > > (perchance with barriers etc.). > > Huh? > > How would access to a file-scoped static be handled any differently than access > to a function-scoped one? An integer constant-initialized static can be in BSS > or the data segment whether it’s file-scoped or function-scoped. <doh> My mistaken... function statics don't need the fancy handling when they are known at link time... and are the same as file scope statics. I don't know what I was thinking.
On 2013/12/09 19:38:18, Mark Mentovai wrote: > LazyInstance and Singleton do have that optimization, but they still have a > barrier. Ben’s comment in the code indicated that the barrier was undesirable. > The comment has changed, though. Comment is back. I do not want a memory barrier here because my instrumentation showed that this can be called, at peak time, 100k times in a second. Even at 20ns for the barrier, this is 2ms of CPU time.
https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc File base/tracked_objects.cc (right): https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc#... base/tracked_objects.cc:57: enum { On 2013/12/09 20:10:23, sievers wrote: > missing static Done. https://codereview.chromium.org/99343002/diff/120001/base/tracked_objects.cc#... base/tracked_objects.cc:61: } g_timing_enabled = UNDEFINED_TIMING; On 2013/12/09 20:42:39, Mark Mentovai wrote: > jar wrote: > > It probably *should* be a global static per the g_* prefix, > > Yes, drop the g_ prefix and put on the “static” keyword as noted by sievers. > > > or something similar > > in an anonymous file-scope namespace. That way it won't bother with > > initialization checking, rather than "just" a function static which would > > (perchance with barriers etc.). > > Huh? > > How would access to a file-scoped static be handled any differently than access > to a function-scoped one? An integer constant-initialized static can be in BSS > or the data segment whether it’s file-scoped or function-scoped. Done.
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/qsr@chromium.org/99343002/140001
Retried try job too often on mac_rel for step(s) nacl_integration http://build.chromium.org/p/tryserver.chromium/buildstatus?builder=mac_rel&nu...
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/qsr@chromium.org/99343002/160001
Message was sent while issue was closed.
Change committed as 240092
Message was sent while issue was closed.
Question: You noted that TimeTicks::Now() can take 900ns, or 0.9 microseconds, and suggested this is causing a slowdown from use by the profiler? The profiler only calls for time at the start and the end of a message-loop task. As a result, it would *appear* that this is adding roughly 1.8 microseconds to a task. Given the rate at which tasks are run... it is rather surprising that this is causing a visible perf degradation. What am I missing? Is the time estimate of 900ns unimportant, and is the real issue some cross-thread locking contention?
Message was sent while issue was closed.
On 2014/07/14 21:55:23, jar wrote: > Question: You noted that TimeTicks::Now() can take 900ns, or 0.9 microseconds, > and suggested this is causing a slowdown from use by the profiler? > > The profiler only calls for time at the start and the end of a message-loop > task. As a result, it would *appear* that this is adding roughly 1.8 > microseconds to a task. > > Given the rate at which tasks are run... it is rather surprising that this is > causing a visible perf degradation. > > What am I missing? Is the time estimate of 900ns unimportant, and is the real > issue some cross-thread locking contention? I do not think it has anything to do with locking contention. The issue was found looking into profile data showing that TimeTicks::Now is using 2% of the CPU, of those calls, half were from TrackedTime. Following this patch, I indeed see the decrease I was expecting, seeing only 1% of CPU used by TimeTicks::Now. Now, I do not see that as so surprising. Looking at about:profiler, I see thousands of tasks for each page load. To get 1% of CPU with those 1.8us, it is sufficient that task takes an average of 0.2 ms, which doesn't strike me as absurd.
Message was sent while issue was closed.
On 2014/07/15 08:16:08, qsr wrote: > On 2014/07/14 21:55:23, jar wrote: > > Question: You noted that TimeTicks::Now() can take 900ns, or 0.9 microseconds, > > and suggested this is causing a slowdown from use by the profiler? > > > > The profiler only calls for time at the start and the end of a message-loop > > task. As a result, it would *appear* that this is adding roughly 1.8 > > microseconds to a task. > > > > Given the rate at which tasks are run... it is rather surprising that this is > > causing a visible perf degradation. > > > > What am I missing? Is the time estimate of 900ns unimportant, and is the real > > issue some cross-thread locking contention? > > I do not think it has anything to do with locking contention. The issue was > found looking into profile data showing that TimeTicks::Now is using 2% of the > CPU, of those calls, half were from TrackedTime. Following this patch, I indeed > see the decrease I was expecting, seeing only 1% of CPU used by TimeTicks::Now. > > Now, I do not see that as so surprising. Looking at about:profiler, I see > thousands of tasks for each page load. To get 1% of CPU with those 1.8us, it is > sufficient that task takes an average of 0.2 ms, which doesn't strike me as > absurd. Not sure I find this somewhat speculative data convincing enough yet to break useful features and UMA stats.
On Tue, Jul 15, 2014 at 8:29 PM, <sievers@chromium.org> wrote: > On 2014/07/15 08:16:08, qsr wrote: >> >> On 2014/07/14 21:55:23, jar wrote: >> > Question: You noted that TimeTicks::Now() can take 900ns, or 0.9 > > microseconds, >> >> > and suggested this is causing a slowdown from use by the profiler? >> > >> > The profiler only calls for time at the start and the end of a >> > message-loop >> > task. As a result, it would *appear* that this is adding roughly 1.8 >> > microseconds to a task. >> > >> > Given the rate at which tasks are run... it is rather surprising that >> > this > > is >> >> > causing a visible perf degradation. >> > >> > What am I missing? Is the time estimate of 900ns unimportant, and is >> > the > > real >> >> > issue some cross-thread locking contention? > > >> I do not think it has anything to do with locking contention. The issue >> was >> found looking into profile data showing that TimeTicks::Now is using 2% of >> the >> CPU, of those calls, half were from TrackedTime. Following this patch, I > > indeed >> >> see the decrease I was expecting, seeing only 1% of CPU used by > > TimeTicks::Now. > >> Now, I do not see that as so surprising. Looking at about:profiler, I >> see >> thousands of tasks for each page load. To get 1% of CPU with those 1.8us, >> it > > is >> >> sufficient that task takes an average of 0.2 ms, which doesn't strike me >> as >> absurd. > +pasko@ and tonyg@ for chrome performance. > > Not sure I find this somewhat speculative data convincing enough yet to > break > useful features and UMA stats. What exactly do you mean by speculative data? This changes is based on perf profiling chrome for Android during page load time. The result is that 1% of CPU is used by base::TrackedTime::Now(). This change is to gain those 1% of CPU. The only speculation I make is why those 1.8 us can add up to 1% of CPU, which only mean a task has an average of 0.2 ms. Moreover, the feature (chrome://profiler) itself is only used by developer than can re-enable it by command line. On the other hand, I missed the UMA stats, and I guess this is a reason to see if we are ready to spend 1% of CPU of load time for those. Added pasko@ and tonyg@ to weight in. > > https://codereview.chromium.org/99343002/ To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
On Wed, Jul 16, 2014 at 11:20 AM, Benjamin Lerman <qsr@chromium.org> wrote: > On Tue, Jul 15, 2014 at 8:29 PM, <sievers@chromium.org> wrote: > > On 2014/07/15 08:16:08, qsr wrote: > >> > >> On 2014/07/14 21:55:23, jar wrote: > >> > Question: You noted that TimeTicks::Now() can take 900ns, or 0.9 > > > > microseconds, > >> > >> > and suggested this is causing a slowdown from use by the profiler? > >> > > >> > The profiler only calls for time at the start and the end of a > >> > message-loop > >> > task. As a result, it would *appear* that this is adding roughly 1.8 > >> > microseconds to a task. > >> > > >> > Given the rate at which tasks are run... it is rather surprising that > >> > this > > > > is > >> > >> > causing a visible perf degradation. > >> > > >> > What am I missing? Is the time estimate of 900ns unimportant, and is > >> > the > > > > real > >> > >> > issue some cross-thread locking contention? > > > > > >> I do not think it has anything to do with locking contention. The > issue > >> was > >> found looking into profile data showing that TimeTicks::Now is using 2% > of > >> the > >> CPU, of those calls, half were from TrackedTime. Following this patch, I > > > > indeed > >> > >> see the decrease I was expecting, seeing only 1% of CPU used by > > > > TimeTicks::Now. > > > >> Now, I do not see that as so surprising. Looking at about:profiler, I > >> see > >> thousands of tasks for each page load. To get 1% of CPU with those > 1.8us, > >> it > > > > is > >> > >> sufficient that task takes an average of 0.2 ms, which doesn't strike me > >> as > >> absurd. > > > > +pasko@ and tonyg@ for chrome performance. > > > > > Not sure I find this somewhat speculative data convincing enough yet to > > break > > useful features and UMA stats. > > What exactly do you mean by speculative data? This changes is based > on perf profiling chrome for Android during page load time. The result > is that 1% of CPU is used by base::TrackedTime::Now(). > > This change is to gain those 1% of CPU. The only speculation I make > is why those 1.8 us can add up to 1% of CPU, which only mean a task > has an average of 0.2 ms. > > Moreover, the feature (chrome://profiler) itself is only used by > developer than can re-enable it by command line. > > On the other hand, I missed the UMA stats, and I guess this is a > reason to see if we are ready to spend 1% of CPU of load time for > those. Added pasko@ and tonyg@ to weight in. > UMA from about:profiler is something new to me. How actionable would the number be? We have quite a few UMA stats reported on Android, and we have alerts firing, but we cannot do anything about those alerts because of the population shifts caused by the Omaha-vs-Play-Problem: https://code.google.com/p/chromium/issues/detail?id=392168 Can we fix that first before adding more UMA performance-related stats on Android with non-zero overhead? -- Egor Pasko To unsubscribe from this group and stop receiving emails from it, send an email to chromium-reviews+unsubscribe@chromium.org.
Message was sent while issue was closed.
Just an extra two cents: In the PostTask micro-benchmark, we can see that task overhead (in Chrome's code - there is more in the kernel) is about 25us per task. There is at least 60 tasks per frame IIRC, so that's 1.5ms, or ~10% of one CPU spent on task overhead (at 60Hz). If I recall, the entire profiler takes about 4-5us per task, which would be about 0.3ms or 2% of one CPU core, however only the timers are disabled so we are currently saving ~1% - similar to load time it seems. I don't have an opinion on whether it it should be behind a flag, but is there any reason to leave it enabled but yet with no timers? It seems like all or nothing makes more sense?
Message was sent while issue was closed.
On 2014/07/17 02:19:25, epennerAtGoogle wrote: > Just an extra two cents: > > In the PostTask micro-benchmark, we can see that task overhead (in Chrome's code > - there is more in the kernel) is about 25us per task. There is at least 60 > tasks per frame IIRC, so that's 1.5ms, or ~10% of one CPU spent on task overhead > (at 60Hz). If I recall, the entire profiler takes about 4-5us per task, which > would be about 0.3ms or 2% of one CPU core, however only the timers are disabled > so we are currently saving ~1% - similar to load time it seems. > > I don't have an opinion on whether it it should be behind a flag, but is there > any reason to leave it enabled but yet with no timers? It seems like all or > nothing makes more sense? I agree that probably all-or-nothing makes sense. That said, it is plausible that we are running too many tasks, that are too short, and *that* is the problem. Hence my suggestion seems vastly different from the current direction. You can shoot the messenger (profiler) and make it impossible to see performance problems (like this one!), or you can look to see what tasks are firing so often, but doing so little work, that task overhead becomes significant. This is indeed one of the things that a profiler identifies. To explore this avenue... before you completely disable the profiler... visit about:profiler.... the default sort-order is by count (which will highlight the tasks that must surely dominate this).... you should look at all tasks that have an average run time of under 1 ms (bad news... the partial neutering of the profiler gives all tasks 0 time....)... but you can at least explore the first 5 or 6 tasks, which dominate the task-count by several orders of magnitude!!!! That visitation (suggested above) is merely anecdotal. Visit Google's profiler dashboard page, and select the profiler summary for All Android... and you'll get the more general picture. Sort by Count, and sadly, you'll see that the usage won't fall off quite as fast. Select only browser processes, and then there is an order of magnitude fall off in the top 10 tasks. Sadly, due to the timer castration induced by this CL, you can't see which tasks do so little that the task-overhead is dominant... so we probably have to examine them all (or turn back on the profiler timing so we can do performance optimization!!). I suspect the answer will quickly emerge that for 1 or 2 types of tasks, we should NOT be using tasks!!! We should optimize away the task infrastructure in those SPECIAL cases, use a shared queue, restricted by a global lock, and aggregate the multitude of tasks into a single task that services NUMEROUS entries in the queue. Summary: Please don't remove the profiler. It is the tool that shows you how to optimize. It was written to be super efficient, and when *it* becomes a problem, then we need to fix the task that is being dominated by overhead! |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
