Created: 7 years, 4 months ago by Robert Sesek
Modified: 7 years, 3 months ago
CC: chromium-reviews, erikwright+watch_chromium.org, sadrul
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description
Add instrumentation to the MessagePumpMac family of classes.
This adds UMA-uploaded histograms of sampling intervals for key points of data:
* Total time spent in a CFRunLoop
* Total time waiting in CFRunLoop
* Total time servicing MessagePump::Delegate (the MessageLoop)
* The MessageLoop queue size and queueing delay before servicing
Sampling runs at 1-second intervals for the first 15 seconds of startup, and only for the main thread
(NSApplication-based) run loops.
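A minimal sketch of the recording side, assuming a helper and an example histogram name that are not the CL's actual code (the factory call is Chromium's real base/metrics interface):

#include <string>

#include "base/metrics/histogram.h"
#include "base/time/time.h"

// |name| is assumed to already encode the event and the 1-second generation,
// e.g. "MessagePumpMac.LoopCycle.SampleMs.3" (hypothetical event name).
void RecordSampledTime(const std::string& name, base::TimeDelta sample) {
  // The UMA_HISTOGRAM_* macros cache a histogram pointer per call site, so
  // they cannot be used with names built at runtime; go through the factory.
  base::HistogramBase* histogram = base::Histogram::FactoryTimeGet(
      name,
      base::TimeDelta::FromMilliseconds(1),  // minimum bucket
      base::TimeDelta::FromSeconds(1),       // maximum: one sampling interval
      50,                                    // bucket count
      base::HistogramBase::kUmaTargetedHistogramFlag);
  histogram->AddTime(sample);
}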
The data will be used to determine if adding scheduling-like behavior to the
MessagePump will result in more efficient processing of the MessageLoop work.
An unfortunate side effect of this change is exposing another method on the
MessagePump::Delegate interface, but there does not appear to be a better way
to do this.
BUG=264886
R=jar@chromium.org, mark@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=221427
Patch Set 1 #
Patch Set 2 : iOS compile #
Total comments: 11
Patch Set 3 : Address comments #
Total comments: 4
Patch Set 4 : More comments from mark #
Total comments: 16
Patch Set 5 : Address some comments from jar #
Total comments: 4
Patch Set 6 : Rebase #
Patch Set 7 : Add caveat #
Messages
Total messages: 21 (0 generated)
mark@ for initial review. I'll probably add darin@ and jar@ for followup.
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:161: for (HistogramEvent j = LOOP_CYCLE; j < QUEUE_SIZE; ++j) { There’s no indication in the enum above that QUEUE_SIZE has been overloaded to be “first thing in the list that’s not time-based.” Someone could add something before QUEUE_SIZE and mess this up. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:176: } That was only 75 histograms. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:186: case HISTOGRAM_EVENT_MAX: This line is not needed. (Some compiler with lots of warnings might get upset if it sees you’ve handled all of the possible enum values AND provided a default:, so it might be better to remove this line.) https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:217: Timer timer_; …this is running on a Timer. And the timers are dispatched by the message loop. 1. Would a CFTimer be better? Still dispatched by the message loop, still not guaranteed to run in any time, but maybe you’d get better priority given the current loop structure? 2. Maybe you should do something to track the actual time it took for your timer to fire. You’d then have a massive data-processing pain to deal with the data that produced, though. 3. Maybe it’s fine as-is, but tracking sample_count_ as a simple monotonically-increasing thing per timer “generation” is bad. Maybe what you need to do is set the timer on each iteration so that its fire time is as close to the time you want as possible, and maybe you need to adjust things so that if you expected a timer to fire at 2 seconds past the epoch to record things that happened in the [1.0, 2.0) interval, but it actually fired at 4 seconds, you record them as having happened in the [1.0, 4.0) interval, since that’s the best information you’d really have at that point. But I don’t know if that would work well for analysis with your current schema either. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:287: PreWaitObserver, Either your new code is temporary (and you should add a comment explaining why you call something called PreWaitObserver for post-wait also) or it’s not (and you should rename PreWaitObserver).
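For illustration, the boundary concern could be documented with a comment in the enum itself; in this sketch only LOOP_CYCLE, QUEUE_SIZE, and HISTOGRAM_EVENT_MAX come from the review, and the other values are placeholders.

enum HistogramEvent {
  // Time-based events. These must stay contiguous and must precede
  // QUEUE_SIZE: the histogram-creation loop iterates over the half-open
  // range [LOOP_CYCLE, QUEUE_SIZE) and treats everything in it as a timing
  // histogram.
  LOOP_CYCLE = 0,
  LOOP_WAIT,      // placeholder
  DELEGATE_WORK,  // placeholder

  // Non-time-based events begin here.
  QUEUE_SIZE,

  HISTOGRAM_EVENT_MAX,
};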
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:161: for (HistogramEvent j = LOOP_CYCLE; j < QUEUE_SIZE; ++j) { On 2013/08/20 22:44:34, Mark Mentovai wrote: > There’s no indication in the enum above that QUEUE_SIZE has been overloaded to > be “first thing in the list that’s not time-based.” Someone could add something > before QUEUE_SIZE and mess this up. "Solved" with a comment. Fixing this with an in-between marker value makes the code to account for it harder to follow IMO. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:176: } On 2013/08/20 22:44:34, Mark Mentovai wrote: > That was only 75 histograms. I'll leave it to jar@ to comment on the implications. I could probably reduce sampling_duration_ if necessary. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:186: case HISTOGRAM_EVENT_MAX: On 2013/08/20 22:44:34, Mark Mentovai wrote: > This line is not needed. (Some compiler with lots of warnings might get upset if > it sees you’ve handled all of the possible enum values AND provided a default:, > so it might be better to remove this line.) Done. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:217: Timer timer_; On 2013/08/20 22:44:34, Mark Mentovai wrote: > …this is running on a Timer. And the timers are dispatched by the message loop. Sigh, I know. > 1. Would a CFTimer be better? Still dispatched by the message loop, still not > guaranteed to run in any time, but maybe you’d get better priority given the > current loop structure? Yes, I've switched to a CFRunLoopTimer and it performs pretty well. I experimented with using the events the Instrumentation is recording as chances for advancing the generation. That had problems when the run loop went to sleep and there was nothing to interrupt it to advance the generation. > 2. Maybe you should do something to track the actual time it took for your timer > to fire. You’d then have a massive data-processing pain to deal with the data > that produced, though. Done. I'm not sure it can be used to correct any of the data (and I'm not sure it matters _too_ much). I'm more concerned with how data changes over time, local and absolute minima and maxima, and the effect of a dependent variable (the scheduling behavior) on it. If data doesn't line exactly up into buckets, that's probably fine for my purposes. > 3. Maybe it’s fine as-is, but tracking sample_count_ as a simple > monotonically-increasing thing per timer “generation” is bad. Maybe what you > need to do is set the timer on each iteration so that its fire time is as close > to the time you want as possible, and maybe you need to adjust things so that if > you expected a timer to fire at 2 seconds past the epoch to record things that > happened in the [1.0, 2.0) interval, but it actually fired at 4 seconds, you > record them as having happened in the [1.0, 4.0) interval, since that’s the best > information you’d really have at that point. But I don’t know if that would work > well for analysis with your current schema either. Now that I'm recording timer skew, I'm also advancing sample_count_ (now sample_generation_) by any skew that is greater than or equal to sampling_interval_. 
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:287: PreWaitObserver, On 2013/08/20 22:44:34, Mark Mentovai wrote: > Either your new code is temporary (and you should add a comment explaining why > you call something called PreWaitObserver for post-wait also) or it’s not (and > you should rename PreWaitObserver). Done.
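Since the reply above mentions switching to a CFRunLoopTimer, here is a hedged sketch of attaching such a sampling timer; the function names and empty callback body are assumptions, while the CF calls are the documented CoreFoundation API.

#include <CoreFoundation/CoreFoundation.h>

namespace {

// Fired once per sampling interval; |info| would point at the
// instrumentation object, which would advance its sampling generation here.
void SamplingTimerFired(CFRunLoopTimerRef timer, void* info) {}

}  // namespace

// Creates and schedules a repeating sampling timer on the current run loop.
// The caller owns the returned reference and must CFRelease() it.
CFRunLoopTimerRef ScheduleSamplingTimer(void* instrumentation,
                                        CFTimeInterval interval_seconds) {
  CFRunLoopTimerContext context = {0};
  context.info = instrumentation;
  // A dedicated CFRunLoopTimer is serviced as its own run loop source, so it
  // tracks the requested interval more closely than a delayed task that is
  // multiplexed onto the MessagePump's single delayed-work timer.
  CFRunLoopTimerRef timer = CFRunLoopTimerCreate(
      NULL,                                           // allocator
      CFAbsoluteTimeGetCurrent() + interval_seconds,  // first fire time
      interval_seconds,                               // repeat interval
      0,                                              // flags
      0,                                              // order
      SamplingTimerFired,
      &context);
  CFRunLoopAddTimer(CFRunLoopGetCurrent(), timer, kCFRunLoopCommonModes);
  return timer;
}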
This is pretty much LGTM now, but you still need to get at least jar’s review. https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_loop.cc:662: if (queue_size) Since nobody needs to call this with NULL arguments, you can get rid of the code that lets the arguments be NULL, and just document that both arguments are required. That will let you clean up the logic in this function in a way that will make the block on lines 665-669 substantially more readable. https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_pump_mac.mm:167: const char kHistogramName[] = "MessagePumpMac.%s.SampleMs.%lld"; Can you use PRId64 here?
+jar for my extensive use of histograms and another base/ reviewer.

https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_loop.cc (right):
https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_loop.cc:662: if (queue_size)
On 2013/08/22 18:00:12, Mark Mentovai wrote:
> Since nobody needs to call this with NULL arguments, you can get rid of the code
> that lets the arguments be NULL, and just document that both arguments are
> required. That will let you clean up the logic in this function in a way that
> will make the block on lines 665-669 substantially more readable.
Done.

https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right):
https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_pump_mac.mm:167: const char kHistogramName[] = "MessagePumpMac.%s.SampleMs.%lld";
On 2013/08/22 18:00:12, Mark Mentovai wrote:
> Can you use PRId64 here?
Done.
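The PRId64 change would look roughly like this; the format string matches the constant quoted in the review, and the surrounding helper is an assumption.

#include <stdint.h>

#include <string>

#include "base/format_macros.h"  // Defines PRId64 portably for Chromium.
#include "base/strings/stringprintf.h"

// "%lld" is not a portable conversion for int64_t, so the format macro is
// spliced into the string literal at compile time instead.
const char kHistogramName[] = "MessagePumpMac.%s.SampleMs.%" PRId64;

std::string HistogramNameForGeneration(const char* event_name,
                                       int64_t sample_generation) {
  return base::StringPrintf(kHistogramName, event_name, sample_generation);
}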
I'm not clear on where this whole investigation is going. Perhaps you can provide a link for me to read about what you are contemplating?

Have you looked at stats gathered in about:profiler? We gather a lot of stats about queue latency, without using sampling methodology. I would expect it to have a lot of info of interest to you.

I'm very wary of biases exposed by attempts to "sample" data that relates to how busy the system is. You might actually do better using an ordinary delayed task as a timer... since there would be no (sampling) timers injected into the system when you were doing the sampling (assuming you never post more than one at a time) but again... the biasing impact has my head spinning a tad.

I think there are a few mistakes and confusion in what you do sample... and I tried to list what I saw below... but since I'm not clear on what you're after, it is hard to say "what is right" to do.

https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... File base/message_loop/message_pump_mac.mm (right):
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:176: }
On 2013/08/22 17:49:15, rsesek wrote:
> On 2013/08/20 22:44:34, Mark Mentovai wrote:
> > That was only 75 histograms.
>
> I'll leave it to jar@ to comment on the implications. I could probably reduce
> sampling_duration_ if necessary.
I'm not convinced that all these histograms are useful (which is I think what Mark was quietly suggesting by pointing out the total). I'd be curious to understand better what you are trying to do, as I'm not at all following the utility.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_loop.cc (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:661: size_t size = work_queue_.size();
Note that the "work_queue" is not really the full queue, but is only the portion of the queue that is visible without a lock, and accessed locally on this thread. See ReloadWorkQueue() if you care about seeing what else is queued. Under heavy load, I'd expect the work_queue.size() to be about half the actual (combined) queue size... but things will vary.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:662: *queue_size = size + delayed_work_queue_.size();
Why is it interesting to add together pending (ready to run) queue size, plus the delayed_work_queue (which probably mostly can't be run for a while... can be an arbitrarily large queue... and would have no significant impact on performance)?

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:669: const PendingTask& next_to_run = work_queue_.front();
The work_queue_ could be empty... so you should test first.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump.h (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump.h:47: // been waiting to run.
Define semantics if the work queue is empty.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.h (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.h:143: static void PrePostWaitObserver(CFRunLoopObserverRef observer,
This name seems worse than the original. What semantic difference are you trying to illuminate?

The old name seemed to mean that this observer was called just before waiting (which seems like the right time to call for idle work, as we're about to sleep).

The new name seems to mean that just after we're done waiting (yikes: when we wake up due to scheduled work?) we call this observer before we call the PostWaitObserver?

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:169: for (TimeDelta i; i < sampling_duration_; i += sampling_interval_) {
I'm not clear at all why you want these different samples contained in separate histograms. This all seems random... so why separate stuff sampled during a first interval from things in a second interval??

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:221: UMA_HISTOGRAM_TIMES("MessagePumpMac.TimerSkew", sampling_interval_ - delta);
I think you're trying to measure how the actual interval differs from the planned interval. How does that impact anything?

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:614: return;
This maybe suggests it could be called before or after. Perhaps a name like PreOrPostWaitObserver would be meaningful.

The word "Post" could be a word-prefix (meaning before), but it also has meaning in the context of MessageLoop to mean "Post a task." I'd suggest avoiding its use here... so maybe: StartOrEndWaitObserver() ...or perhaps we should have two distinct observers... and not pass in a boolean to distinguish them.
On 2013/08/22 23:03:39, jar wrote:
> I'm not clear on where this whole investigation is going. Perhaps you can
> provide a link for me to read about what you are contemplating?

Sure, the linked bug (https://code.google.com/p/chromium/issues/detail?id=264886) has the full details. In short: we've noticed that the MessagePump could probably be more efficient about servicing the MessageLoop. I've confirmed this and some other experiments on my machines, but I'd like data from production to verify my initial findings and experiments. Ultimately, what I want to do is try something like https://codereview.chromium.org/20803005 (do not even bother reviewing at this stage :)) in a Finch field trial to compare results, potentially using Finch to control different time quanta in the "scheduler."

> Have you looked at stats gathered in about:profiler? We gather a lot of
> stats about queue latency, without using sampling methodology. I would
> expect it to have a lot of info of interest to you.

Yes! about:profiler is what I used to perform my initial investigation (https://codereview.chromium.org/20792003). However, I want data from production, not an engineering workstation. Unfortunately, about:profiling is not enabled by default and is not shipped back to us like UMA. This is why I'm trying to replicate about:profiling via sampling here.

I want to go with a sampling approach, as opposed to just aggregating all values into single histograms, because my experiments have shown the values vary significantly throughout the startup process. I've found that startup comes in roughly three "waves". Very early in startup, there is a medium-sized queue with significant queueing delays. The second wave is a long queue (1k+ items) of tasks that all execute quickly. The third wave is a small queue of long-running tasks that have a moderate queueing delay. Between each wave is a quiet/efficient period of work.

Initially, I want to know if the above conclusion roughly matches real-world data. For example, I was also only testing a single, fairly repeatable (though I think common) startup scenario. Later, after I start experimenting, I want to know how my experiments affect these "waves" individually (it may positively affect some work conditions and adversely affect others). In short: the absolute maximum and minimum for these data are definitely interesting by themselves (if we were to aggregate in a single histogram, rather than into samples), but I'm more interested in the local minima and maxima for what I'm doing.

> I'm very wary of biases exposed by attempts to "sample" data that relates to
> how busy the system is. You might actually do better using an ordinary delayed
> task as a timer... since there would be no (sampling) timers injected into the
> system when you were doing the sampling (assuming you never post more than one
> at a time) but again... the biasing impact has my head spinning a tad.

I'm a little confused by your first sentence. In patch set 1, I used a base::Timer. I found that using a system timer gives better fidelity to the interval. This is probably because the system timer will be treated as an independent run loop source, whereas a delayed task will be serviced along with all other delayed tasks on the single delayed_work_timer_.

> I think there are a few mistakes and confusion in what you do sample... and I
> tried to list what I saw below... but since I'm not clear on what you're after,
> it is hard to say "what is right" to do.

That's fair.
Please let me know if you have more questions or better ideas after reading the bug. It's the most thorough explanation I've written on the problem we're working at. In the Perfect World™, I'd want to turn on tracing for specific events and have it shipped back, but that doesn't seem possible. This is the closest approximation to that I could think of. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:661: size_t size = work_queue_.size(); On 2013/08/22 23:03:39, jar wrote: > Note that the "work_queue" is not really the full queue, but is only the portion > of the queue that is visible without a lock, and accessed locally on this > thread. > > See ReloadWorkQueue() if you care about seeing what else is queued. > > Under heavy load, I'd expect the work_queue.size() to be about half the actual > (combined) queue size... but things will vary. > Yes, I know the incoming_queue_ is not included in this count, but I did not want to take a lock to get that information. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:662: *queue_size = size + delayed_work_queue_.size(); On 2013/08/22 23:03:39, jar wrote: > Why is it interesting to add together pending (ready to run) queue size, plus > the delayed_work_queue (which probably mostly can't be run for a while... can be > an arbitrarily queue large... and would have no significant impact on > performance)? Fair point. I did not count this in my initial investigation, so removed. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:669: const PendingTask& next_to_run = work_queue_.front(); On 2013/08/22 23:03:39, jar wrote: > The work_queue_ could be empty... so you should test first. Already checking size above. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump.h (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump.h:47: // been waiting to run. On 2013/08/22 23:03:39, jar wrote: > Define semantics if the work queue is empty. Done. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.h (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.h:143: static void PrePostWaitObserver(CFRunLoopObserverRef observer, On 2013/08/22 23:03:39, jar wrote: > This name seems worse than the original. What semantic difference are you > trying to illuminate? > > The old name seemed to mean that this observer was called just before waiting > (which seems like the right time to call for idle work, as we're about to > sleep). > > The new names seems to mean that just after we're done waiting (yikes: When we > wake up due to scheduled work?) we call this observer before we call the > PostWaitObserver?. It matches the EnterExit observer that's already present. The name is trying to convey that it is called prior to going to sleep and after waking up. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... 
base/message_loop/message_pump_mac.mm:221: UMA_HISTOGRAM_TIMES("MessagePumpMac.TimerSkew", sampling_interval_ - delta); On 2013/08/22 23:03:39, jar wrote: > I think you're trying to measure how the actual interval differs from the > planned interval. How does that impact anything? I guess measuring it here is not really necessary since it's taken care of below with the skew_advance loop. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:614: return; On 2013/08/22 23:03:39, jar wrote: > This maybe suggests it could be called before or after. Perhaps a name like: > PreOrPostWaitObserver would be meaningful. > > The word "Post" could be a word-prefix (meaning before), but it also has meaning > in the context of MessageLoop to mean "Post a task." I'd suggest avoiding its > use here.... so maybe: > > StartOrEndWaitObserver() > > ...or perhaps we should have two distinct observers... and not pass in a boolean > to distinguish them. Each observer actually has an associated cost (a lock and some processing overhead) in the system run loop, so I don't think adding another is the best solution. I like StartOrEndWaitObserver().
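A rough sketch of the skew_advance idea referenced above, using an assumed standalone state struct rather than the CL's actual members:

#include <stdint.h>

#include "base/time/time.h"

struct SamplingState {
  base::TimeDelta sampling_interval;  // e.g. base::TimeDelta::FromSeconds(1)
  base::TimeDelta sampling_duration;  // e.g. base::TimeDelta::FromSeconds(15)
  base::TimeTicks last_fire_time;
  int64_t sample_generation = 0;
};

// Called from the sampling timer. Returns true once the sampling window has
// been exhausted and instrumentation should stop.
bool AdvanceGeneration(SamplingState* state, base::TimeTicks now) {
  // Skew: how much later the timer fired than the planned interval.
  base::TimeDelta skew =
      (now - state->last_fire_time) - state->sampling_interval;
  state->last_fire_time = now;

  // Advance one generation for this fire, plus one more for every whole
  // interval the timer was late, so that samples land in (roughly) the
  // correct time bucket even when the timer is delayed.
  ++state->sample_generation;
  for (; skew >= state->sampling_interval; skew -= state->sampling_interval)
    ++state->sample_generation;

  return state->sampling_interval * state->sample_generation >=
         state->sampling_duration;
}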
On Fri, Aug 23, 2013 at 9:58 AM, <rsesek@chromium.org> wrote:
>> Have you looked at stats gathered in about:profiler? We gather a lot of
>> stats about queue latency, without using sampling methodology. I would
>> expect it to have a lot of info of interest to you.
>
> Yes! about:profiler is what I used to perform my initial investigation
> (https://codereview.chromium.org/20792003). However, I want data from
> production, not an engineering workstation. Unfortunately, about:profiling
> is not enabled by default and is not shipped back to us like UMA.

FWIW: about:profiling is enabled by default full-time in all official releases of Chrome. It was written to be cross-platform, so I *think* it runs on Mac as well. We also currently send back the data from startup as part of UMA (data from the first 30 seconds). We haven't (at Google) done work to visualize this data... but this would perchance be a good time to start :-). I've wanted to write code to visualize it... but I've always had other more pressing things to do. Isherman also started to do this... and got pulled off with other stuff :-(.

Jim
On Fri, Aug 23, 2013 at 6:24 PM, Jim Roskind <jar@chromium.org> wrote:
> FWIW: about:profiling is enabled by default full-time in all official
> releases of Chrome. It was written to be cross-platform, so I *think* it
> runs on Mac as well. We also currently send back the data from startup as
> part of UMA (data from the first 30 seconds). We haven't (at Google) done
> work to visualize this data... but this would perchance be a good time to
> start :-).

Yes, about:profiling does work on Mac, but it's not enabled by default in production builds and its data does not appear to have an upload pipeline.

I looked into the startup data you mentioned though, and it doesn't seem like it quite aligns with my goals. The MetricsService uses the TrackingSynchronizer to record the tracked_objects::ThreadData, which is not (exclusively) what I want to measure. I'm trying to gather performance data about the underlying system run loop, rather than information about Tasks.
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:669: const PendingTask& next_to_run = work_queue_.front(); On 2013/08/23 16:58:24, rsesek wrote: > On 2013/08/22 23:03:39, jar wrote: > > The work_queue_ could be empty... so you should test first. > > Already checking size above. <doh>... I thought you checked the sum of the sizes (queue_size). You are correct.
On Mon, Aug 26, 2013 at 9:21 AM, Robert Sesek <rsesek@chromium.org> wrote:
> Yes, about:profiling does work on Mac, but it's not enabled by default in
> production builds

It is supposed to be enabled on all platforms. I just checked, and this does indeed include Mac. Have you tried to visit about:profiler in a release version of Chrome?

> and its data does not appear to have an upload pipeline.

It currently is uploaded on all platforms with the first UMA upload (at the 60 second point). Until we have visualization of those metrics, we didn't want to spend time considering what other upload point we might continue (such as shutdown??).

> I looked into the startup data you mentioned though, and it doesn't seem
> like it quite aligns with my goals. The MetricsService uses the
> TrackingSynchronizer to record the tracked_objects::ThreadData, which is
> not (exclusively) what I want to measure. I'm trying to gather performance
> data about the underlying system run loop, rather than information about
> Tasks.

...probably not everything you're after... but at one time, I wrote code to track sleep time on each thread, as though it was a task. That *might* be of interest to you.
On Tue, Aug 27, 2013 at 6:30 PM, Jim Roskind <jar@chromium.org> wrote:
> It is supposed to be enabled on all platforms. I just checked, and this
> does indeed include Mac. Have you tried to visit about:profiler in a
> release version of Chrome?

I think I'm conflating some terms here, so let me clarify. about:profiling is shipped with the production binary, but arbitrary TRACE_EVENT macros are not active at startup and are not uploaded to UMA. Profiling data about task births, deaths, and running times (i.e. tracked_objects via FROM_HERE) is shipped with the binary, is enabled at startup, and is uploaded.

> It currently is uploaded on all platforms with the first UMA upload (at
> the 60 second point). Until we have visualization of those metrics, we
> didn't want to spend time considering what other upload point we might
> continue (such as shutdown??).

The data are limited to tracked_objects -- i.e., only work that's posted to the MessageLoop. What I'm trying to capture are things happening outside/below the MessageLoop, so tracked_objects data are not really useful. I have not found anything for uploading arbitrary TRACE_EVENTS in the protobufs used by the MetricsService, which indicated to me that data from about:profiling cannot at present be uploaded. Therefore, I've created this set of histograms to mimic as closely as possible a category of arbitrary TRACE_EVENTS uploaded to UMA for the first 15 seconds of startup. This CL from my initial investigation shows the TRACE_EVENTS I'd add if those could be uploaded: https://codereview.chromium.org/20792003

> ...probably not everything you're after... but at one time, I wrote code
> to track sleep time on each thread, as though it was a task. That *might*
> be of interest to you.

That is one dimension that I'm interested in, but I'm focusing here on how the system run loop is dividing work between the MessagePump and other system work, and how those divisions affect the efficiency of the MessageLoop. Thread sleep time is one of the things I'm recording, but there are several others.
A note from the sidelines: I think you're conflating about:profiler and about:tracing. about:profiler data is definitely currently included in UMA uploads, whereas about:tracing data is not.
On Tue, Aug 27, 2013 at 7:57 PM, Ilya Sherman <isherman@chromium.org> wrote:
> A note from the sidelines: I think you're conflating about:profiler and
> about:tracing.

Oops! You're right; I do mean about:tracing.
Are we already placing an observer into the list just before and after each task run? Has anyone looked at the performance impact of this? https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump_mac.mm:328: pre_wait_observer_ = CFRunLoopObserverCreate(NULL, // allocator I'm surprised we do this. Adding an observer on a per-message basis has a large impact on performance (as I recall), even if you try to only do work infrequently.
https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump_mac.mm:328: pre_wait_observer_ = CFRunLoopObserverCreate(NULL, // allocator On 2013/08/29 22:15:08, jar wrote: > I'm surprised we do this. Adding an observer on a per-message basis has a large > impact on performance (as I recall), even if you try to only do work > infrequently. This observer is added once at the creation of the MessagePump, though it's called every pass through the system run loop as appropriate. The cost is small.
On 2013/08/30 16:25:07, rsesek wrote:
> This observer is added once at the creation of the MessagePump, though it's
> called every pass through the system run loop as appropriate. The cost is small.

It isn't that clear that the "recurring cost is small."

In the past, when I was single stepping through assembly code in the message loop, I spotted the observer (just before and maybe after the task.run() invocation). It used to routinely check a list, which was at the time empty. The cost of each "examination of the empty list" used to include (as I recall) two lock acquisitions and releases, and at least about 3 mallocs and frees (and that was for an empty list!). I think the cost derived from the fancy nature of the observer lists, which allowed for async multi-thread/reentrant addition and deletion from the list. I added a short-circuit in the "empty list" case to avoid most of this overhead, but if the list has even one element, the costs (and more) reappear.

Have you looked at the (recurring) cost of having an observer on this list?
On 2013/08/30 16:41:57, jar wrote:
> Have you looked at the (recurring) cost of having an observer on this list?

I'm a bit confused - there is no "list" here and this isn't adding an ObserverList. This is an observer on the system's run loop that is called out to before the run loop is going to sleep (which we use for DoIdleWork), and I'm adding here the corresponding event for when the run loop is waking up from sleep so that I can measure the sleep duration. This is the documentation for this observer, FYI: https://developer.apple.com/library/mac/documentation/CoreFoundation/Referenc...

One of the things this instrumentation will be used to measure is the cost of small but recurring events (something I call out explicitly in the bug). As is the nature of adding instrumentation code, there is some cost, but I think the data this would collect is significant enough to warrant paying for it.
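To make that concrete, a single observer registered for both the before-waiting and after-waiting activities could look like the sketch below; the callback body, histogram name, and global are assumptions, while the CF calls and activity constants are the documented API.

#include <CoreFoundation/CoreFoundation.h>

#include "base/metrics/histogram.h"
#include "base/time/time.h"

namespace {

base::TimeTicks g_sleep_start;

void StartOrEndWaitObserver(CFRunLoopObserverRef observer,
                            CFRunLoopActivity activity,
                            void* info) {
  if (activity == kCFRunLoopBeforeWaiting) {
    // The run loop is about to go to sleep.
    g_sleep_start = base::TimeTicks::Now();
  } else if (activity == kCFRunLoopAfterWaiting) {
    // The run loop just woke up; the elapsed time is how long it slept.
    // Histogram name is illustrative only.
    UMA_HISTOGRAM_TIMES("MessagePumpMac.SleepTime",
                        base::TimeTicks::Now() - g_sleep_start);
  }
}

}  // namespace

// One observer registered for both activities keeps the per-pass run loop
// overhead to a single callout.
CFRunLoopObserverRef CreateWaitObserver() {
  return CFRunLoopObserverCreate(
      NULL,                                              // allocator
      kCFRunLoopBeforeWaiting | kCFRunLoopAfterWaiting,  // activities
      true,                                              // repeats
      0,                                                 // order
      StartOrEndWaitObserver,
      NULL);                                             // context
}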
Please correct the comment (nit) listed below, or explain why there isn't a factor of 4 error in the metric, then LGTM. As a general comment on the goal: IMO, when we have "too many tasks" being queued up, we should look to see if those tasks can be coalesced, generally at a higher level. As I understand the bug, you are wondering why the overhead on a loop is "so high" as compared with task execution time. I think the coalescing approach will often be much more effective than searching for ways to run more tasks sooner (with less yielding and checking for "other things" to do). Coalescing can often skip over portions of tasks that will be "redone" by later tasks, and can also benefit from maintaining memory context (cache state) from one task to a related task (that could be combined). If you can run the tasks a lot faster.... more power to you... as we tried to minimize overhead. It is critical to avoid starving "other things," and on some platforms (most notably, windows), it is expensive to constantly check the time to see if you should give a slot to "other things" instead of tasks. It may be that on the Mac there is different mix of tasks and "other things" so there may be some value to gain... but it should be even larger if you could coalesce the tasks. https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump.h (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump.h:46: // work queue and the length of time that the first item in the queue has nit: You already know this... and we talked about it in the code... but.... It only returns the length of the internalized work-queue, not the full queue, for which (on average) less than half the queued tasks are recorded. When work loads are small, then this may be the entire queue (typically 0 queued items). When work loads are large, then there will (when the system is stable) be about N tasks in the external (unmeasured) queue, and between 0 and N tasks (average around N/2) tasks in the internal queue. As a result, I think this metric will (on average) return a number that is 1/4th the queue size (when it returns any size at all)... and of course 0 is 1/4th of 0... so it probably always returns a metric that is 1/4th of reality (on average). https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump_mac.mm:328: pre_wait_observer_ = CFRunLoopObserverCreate(NULL, // allocator On 2013/08/30 16:25:07, rsesek wrote: > On 2013/08/29 22:15:08, jar wrote: > > I'm surprised we do this. Adding an observer on a per-message basis has a > large > > impact on performance (as I recall), even if you try to only do work > > infrequently. > > This observer is added once at the creation of the MessagePump, though it's > called every pass through the system run loop as appropriate. The cost is small. I see... this is a Mac thing... not the generic task observer list that I've seen placed in the message loop. Sorry for my confusion.
On 2013/09/04 01:20:16, jar wrote:
> As a general comment on the goal: IMO, when we have "too many tasks" being
> queued up, we should look to see if those tasks can be coalesced, generally at
> a higher level. As I understand the bug, you are wondering why the overhead on
> a loop is "so high" as compared with task execution time. I think the
> coalescing approach will often be much more effective than searching for ways
> to run more tasks sooner (with less yielding and checking for "other things"
> to do). Coalescing can often skip over portions of tasks that will be "redone"
> by later tasks, and can also benefit from maintaining memory context (cache
> state) from one task to a related task (that could be combined).

This is something that I'm aware of. When using my about:tracing implementation, I did notice one specific FROM_HERE was repeatedly being sent to the UI loop. I'm investigating whether that can be coalesced (it's related to sending cookie change notifications to extensions from the IO thread) and will be filing a bug soon. I can CC you on that. Most of the other tasks did not look like they would/could benefit from coalescing.

> If you can run the tasks a lot faster.... more power to you... as we tried to
> minimize overhead. It is critical to avoid starving "other things," and on some
> platforms (most notably, windows), it is expensive to constantly check the time
> to see if you should give a slot to "other things" instead of tasks. It may be
> that on the Mac there is different mix of tasks and "other things" so there may
> be some value to gain... but it should be even larger if you could coalesce the
> tasks.

Agreed that starvation is bad. When I move on to the experimentation phase, I'm going to be sure that any changes do not starve the other run loop sources. The goal will be to run a bunch of small tasks in the amount of time of a medium-to-large sized task, while being aware that since we can't interrupt tasks, a long-running one could blow past a "scheduler" deadline.

> https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message...
> File base/message_loop/message_pump.h (right):
> https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message...
> base/message_loop/message_pump.h:46: // work queue and the length of time that
> the first item in the queue has
> nit: You already know this... and we talked about it in the code... but....
>
> It only returns the length of the internalized work-queue, not the full queue,
> for which (on average) less than half the queued tasks are recorded. When work
> loads are small, then this may be the entire queue (typically 0 queued items).
> When work loads are large, then there will (when the system is stable) be about
> N tasks in the external (unmeasured) queue, and between 0 and N tasks (average
> around N/2) tasks in the internal queue.
>
> As a result, I think this metric will (on average) return a number that is
> 1/4th the queue size (when it returns any size at all)... and of course 0 is
> 1/4th of 0... so it probably always returns a metric that is 1/4th of reality
> (on average).

I've added this caveat as a comment. In my own tests, I haven't quite noticed this extreme of a distribution between the two queues (though again on an engineering workstation), but there certainly can be quite a disparity (I noticed closer to 1/2 of reality), so it's good to highlight that.
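For reference, a queueing probe carrying that caveat might look like the following standalone sketch; the free function and its parameters are assumptions rather than MessageLoop's actual code, and PendingTask::time_posted is assumed to be available for computing the delay.

#include <queue>

#include "base/pending_task.h"
#include "base/time/time.h"

// Reports the size of the already-internalized work queue and how long its
// front task has been waiting. Caveat from the review: tasks still sitting
// in the locked incoming queue are not counted, so under load this can
// understate the true queue depth by 2x or more.
void GetQueueingInformation(const std::queue<base::PendingTask>& work_queue,
                            size_t* queue_size,
                            base::TimeDelta* queueing_delay) {
  *queue_size = work_queue.size();
  if (*queue_size == 0) {
    *queueing_delay = base::TimeDelta();
    return;
  }
  const base::PendingTask& next_to_run = work_queue.front();
  *queueing_delay = base::TimeTicks::Now() - next_to_run.time_posted;
}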
I don't think this is a very good general API, to be honest, and I wish it could be an implementation detail, but it's unfortunately the only way to extract the information for use in the MessagePumpInstrumentation. I'll try and keep this in the tree for as short a time as possible to avoid abuse.
Message was sent while issue was closed.
Committed patchset #7 manually as r221427 (presubmit successful).