Created: 7 years, 4 months ago by Robert Sesek
Modified: 7 years, 3 months ago
CC: chromium-reviews, erikwright+watch_chromium.org, sadrul
Base URL: svn://svn.chromium.org/chrome/trunk/src
Visibility: Public.
Description
Add instrumentation to the MessagePumpMac family of classes.
This adds UMA-uploaded histograms of sampling intervals for key points of data:
* Total time spent in a CFRunLoop
* Total time waiting in CFRunLoop
* Total time servicing MessagePump::Delegate (the MessageLoop)
* The MessageLoop queue size and queueing delay before servicing
Sampling runs at 1-second intervals for the first 15 seconds of startup, and only for the main thread
(NSApplication-based) run loops.
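A minimal sketch of the recording side, assuming a helper and an example histogram name that are not the CL's actual code (the factory call is Chromium's real base/metrics interface):

#include <string>

#include "base/metrics/histogram.h"
#include "base/time/time.h"

// |name| is assumed to already encode the event and the 1-second generation,
// e.g. "MessagePumpMac.LoopCycle.SampleMs.3" (hypothetical event name).
void RecordSampledTime(const std::string& name, base::TimeDelta sample) {
  // The UMA_HISTOGRAM_* macros cache a histogram pointer per call site, so
  // they cannot be used with names built at runtime; go through the factory.
  base::HistogramBase* histogram = base::Histogram::FactoryTimeGet(
      name,
      base::TimeDelta::FromMilliseconds(1),  // minimum bucket
      base::TimeDelta::FromSeconds(1),       // maximum: one sampling interval
      50,                                    // bucket count
      base::HistogramBase::kUmaTargetedHistogramFlag);
  histogram->AddTime(sample);
}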
The data will be used to determine if adding scheduling-like behavior to the
MessagePump will result in more efficient processing of the MessageLoop work.
An unfortunate side effect of this change is exposing another method on the
MessagePump::Delegate interface, but there does not appear to be a better way
to do this.
BUG=264886
R=jar@chromium.org, mark@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=221427
Patch Set 1 #
Patch Set 2 : iOS compile #
Total comments: 11
Patch Set 3 : Address comments #
Total comments: 4
Patch Set 4 : More comments from mark #
Total comments: 16
Patch Set 5 : Address some comments from jar #
Total comments: 4
Patch Set 6 : Rebase #
Patch Set 7 : Add caveat #
Messages
Total messages: 21 (0 generated)
mark@ for initial review. I'll probably add darin@ and jar@ for followup.
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:161: for (HistogramEvent j = LOOP_CYCLE; j < QUEUE_SIZE; ++j) { There’s no indication in the enum above that QUEUE_SIZE has been overloaded to be “first thing in the list that’s not time-based.” Someone could add something before QUEUE_SIZE and mess this up. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:176: } That was only 75 histograms. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:186: case HISTOGRAM_EVENT_MAX: This line is not needed. (Some compiler with lots of warnings might get upset if it sees you’ve handled all of the possible enum values AND provided a default:, so it might be better to remove this line.) https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:217: Timer timer_; …this is running on a Timer. And the timers are dispatched by the message loop. 1. Would a CFTimer be better? Still dispatched by the message loop, still not guaranteed to run in any time, but maybe you’d get better priority given the current loop structure? 2. Maybe you should do something to track the actual time it took for your timer to fire. You’d then have a massive data-processing pain to deal with the data that produced, though. 3. Maybe it’s fine as-is, but tracking sample_count_ as a simple monotonically-increasing thing per timer “generation” is bad. Maybe what you need to do is set the timer on each iteration so that its fire time is as close to the time you want as possible, and maybe you need to adjust things so that if you expected a timer to fire at 2 seconds past the epoch to record things that happened in the [1.0, 2.0) interval, but it actually fired at 4 seconds, you record them as having happened in the [1.0, 4.0) interval, since that’s the best information you’d really have at that point. But I don’t know if that would work well for analysis with your current schema either. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:287: PreWaitObserver, Either your new code is temporary (and you should add a comment explaining why you call something called PreWaitObserver for post-wait also) or it’s not (and you should rename PreWaitObserver).
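For illustration, the boundary concern could be documented with a comment in the enum itself; in this sketch only LOOP_CYCLE, QUEUE_SIZE, and HISTOGRAM_EVENT_MAX come from the review, and the other values are placeholders.

enum HistogramEvent {
  // Time-based events. These must stay contiguous and must precede
  // QUEUE_SIZE: the histogram-creation loop iterates over the half-open
  // range [LOOP_CYCLE, QUEUE_SIZE) and treats everything in it as a timing
  // histogram.
  LOOP_CYCLE = 0,
  LOOP_WAIT,      // placeholder
  DELEGATE_WORK,  // placeholder

  // Non-time-based events begin here.
  QUEUE_SIZE,

  HISTOGRAM_EVENT_MAX,
};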
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:161: for (HistogramEvent j = LOOP_CYCLE; j < QUEUE_SIZE; ++j) { On 2013/08/20 22:44:34, Mark Mentovai wrote: > There’s no indication in the enum above that QUEUE_SIZE has been overloaded to > be “first thing in the list that’s not time-based.” Someone could add something > before QUEUE_SIZE and mess this up. "Solved" with a comment. Fixing this with an in-between marker value makes the code to account for it harder to follow IMO. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:176: } On 2013/08/20 22:44:34, Mark Mentovai wrote: > That was only 75 histograms. I'll leave it to jar@ to comment on the implications. I could probably reduce sampling_duration_ if necessary. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:186: case HISTOGRAM_EVENT_MAX: On 2013/08/20 22:44:34, Mark Mentovai wrote: > This line is not needed. (Some compiler with lots of warnings might get upset if > it sees you’ve handled all of the possible enum values AND provided a default:, > so it might be better to remove this line.) Done. https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:217: Timer timer_; On 2013/08/20 22:44:34, Mark Mentovai wrote: > …this is running on a Timer. And the timers are dispatched by the message loop. Sigh, I know. > 1. Would a CFTimer be better? Still dispatched by the message loop, still not > guaranteed to run in any time, but maybe you’d get better priority given the > current loop structure? Yes, I've switched to a CFRunLoopTimer and it performs pretty well. I experimented with using the events the Instrumentation is recording as chances for advancing the generation. That had problems when the run loop went to sleep and there was nothing to interrupt it to advance the generation. > 2. Maybe you should do something to track the actual time it took for your timer > to fire. You’d then have a massive data-processing pain to deal with the data > that produced, though. Done. I'm not sure it can be used to correct any of the data (and I'm not sure it matters _too_ much). I'm more concerned with how data changes over time, local and absolute minima and maxima, and the effect of a dependent variable (the scheduling behavior) on it. If data doesn't line exactly up into buckets, that's probably fine for my purposes. > 3. Maybe it’s fine as-is, but tracking sample_count_ as a simple > monotonically-increasing thing per timer “generation” is bad. Maybe what you > need to do is set the timer on each iteration so that its fire time is as close > to the time you want as possible, and maybe you need to adjust things so that if > you expected a timer to fire at 2 seconds past the epoch to record things that > happened in the [1.0, 2.0) interval, but it actually fired at 4 seconds, you > record them as having happened in the [1.0, 4.0) interval, since that’s the best > information you’d really have at that point. But I don’t know if that would work > well for analysis with your current schema either. Now that I'm recording timer skew, I'm also advancing sample_count_ (now sample_generation_) by any skew that is greater than or equal to sampling_interval_. 
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:287: PreWaitObserver, On 2013/08/20 22:44:34, Mark Mentovai wrote: > Either your new code is temporary (and you should add a comment explaining why > you call something called PreWaitObserver for post-wait also) or it’s not (and > you should rename PreWaitObserver). Done.
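Since the reply above mentions switching to a CFRunLoopTimer, here is a hedged sketch of attaching such a sampling timer; the function names and empty callback body are assumptions, while the CF calls are the documented CoreFoundation API.

#include <CoreFoundation/CoreFoundation.h>

namespace {

// Fired once per sampling interval; |info| would point at the
// instrumentation object, which would advance its sampling generation here.
void SamplingTimerFired(CFRunLoopTimerRef timer, void* info) {}

}  // namespace

// Creates and schedules a repeating sampling timer on the current run loop.
// The caller owns the returned reference and must CFRelease() it.
CFRunLoopTimerRef ScheduleSamplingTimer(void* instrumentation,
                                        CFTimeInterval interval_seconds) {
  CFRunLoopTimerContext context = {0};
  context.info = instrumentation;
  // A dedicated CFRunLoopTimer is serviced as its own run loop source, so it
  // tracks the requested interval more closely than a delayed task that is
  // multiplexed onto the MessagePump's single delayed-work timer.
  CFRunLoopTimerRef timer = CFRunLoopTimerCreate(
      NULL,                                           // allocator
      CFAbsoluteTimeGetCurrent() + interval_seconds,  // first fire time
      interval_seconds,                               // repeat interval
      0,                                              // flags
      0,                                              // order
      SamplingTimerFired,
      &context);
  CFRunLoopAddTimer(CFRunLoopGetCurrent(), timer, kCFRunLoopCommonModes);
  return timer;
}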
This is pretty much LGTM now, but you still need to get at least jar’s review. https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_loop.cc:662: if (queue_size) Since nobody needs to call this with NULL arguments, you can get rid of the code that lets the arguments be NULL, and just document that both arguments are required. That will let you clean up the logic in this function in a way that will make the block on lines 665-669 substantially more readable. https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_pump_mac.mm:167: const char kHistogramName[] = "MessagePumpMac.%s.SampleMs.%lld"; Can you use PRId64 here?
+jar for my extensive use of histograms and another base/ reviewer.

https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_loop.cc (right):
https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_loop.cc:662: if (queue_size)
On 2013/08/22 18:00:12, Mark Mentovai wrote:
> Since nobody needs to call this with NULL arguments, you can get rid of the code
> that lets the arguments be NULL, and just document that both arguments are
> required. That will let you clean up the logic in this function in a way that
> will make the block on lines 665-669 substantially more readable.
Done.

https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right):
https://codereview.chromium.org/22911026/diff/12001/base/message_loop/message... base/message_loop/message_pump_mac.mm:167: const char kHistogramName[] = "MessagePumpMac.%s.SampleMs.%lld";
On 2013/08/22 18:00:12, Mark Mentovai wrote:
> Can you use PRId64 here?
Done.
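The PRId64 change would look roughly like this; the format string matches the constant quoted in the review, and the surrounding helper is an assumption.

#include <stdint.h>

#include <string>

#include "base/format_macros.h"  // Defines PRId64 portably for Chromium.
#include "base/strings/stringprintf.h"

// "%lld" is not a portable conversion for int64_t, so the format macro is
// spliced into the string literal at compile time instead.
const char kHistogramName[] = "MessagePumpMac.%s.SampleMs.%" PRId64;

std::string HistogramNameForGeneration(const char* event_name,
                                       int64_t sample_generation) {
  return base::StringPrintf(kHistogramName, event_name, sample_generation);
}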
I'm not clear on where this whole investigation is going. Perhaps you can provide a link for me to read about what you are contemplating?

Have you looked at stats gathered in about:profiler? We gather a lot of stats about queue latency, without using sampling methodology. I would expect it to have a lot of info of interest to you.

I'm very wary of biases exposed by attempts to "sample" data that relates to how busy the system is. You might actually do better using an ordinary delayed task as a timer... since there would be no (sampling) timers injected into the system when you were doing the sampling (assuming you never post more than one at a time) but again... the biasing impact has my head spinning a tad.

I think there are a few mistakes and confusion in what you do sample... and I tried to list what I saw below... but since I'm not clear on what you're after, it is hard to say "what is right" to do.

https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... File base/message_loop/message_pump_mac.mm (right):
https://codereview.chromium.org/22911026/diff/6001/base/message_loop/message_... base/message_loop/message_pump_mac.mm:176: }
On 2013/08/22 17:49:15, rsesek wrote:
> On 2013/08/20 22:44:34, Mark Mentovai wrote:
> > That was only 75 histograms.
>
> I'll leave it to jar@ to comment on the implications. I could probably reduce
> sampling_duration_ if necessary.
I'm not convinced that all these histograms are useful (which is I think what Mark was quietly suggesting by pointing out the total). I'd be curious to understand better what you are trying to do, as I'm not at all following the utility.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_loop.cc (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:661: size_t size = work_queue_.size();
Note that the "work_queue" is not really the full queue, but is only the portion of the queue that is visible without a lock, and accessed locally on this thread. See ReloadWorkQueue() if you care about seeing what else is queued. Under heavy load, I'd expect the work_queue.size() to be about half the actual (combined) queue size... but things will vary.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:662: *queue_size = size + delayed_work_queue_.size();
Why is it interesting to add together pending (ready to run) queue size, plus the delayed_work_queue (which probably mostly can't be run for a while... can be an arbitrarily large queue... and would have no significant impact on performance)?

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:669: const PendingTask& next_to_run = work_queue_.front();
The work_queue_ could be empty... so you should test first.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump.h (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump.h:47: // been waiting to run.
Define semantics if the work queue is empty.

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.h (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.h:143: static void PrePostWaitObserver(CFRunLoopObserverRef observer,
This name seems worse than the original. What semantic difference are you trying to illuminate?

The old name seemed to mean that this observer was called just before waiting (which seems like the right time to call for idle work, as we're about to sleep).

The new name seems to mean that just after we're done waiting (yikes: when we wake up due to scheduled work?) we call this observer before we call the PostWaitObserver?

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right):
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:169: for (TimeDelta i; i < sampling_duration_; i += sampling_interval_) {
I'm not clear at all why you want these different samples contained in separate histograms. This all seems random... so why separate stuff sampled during a first interval from things in a second interval??

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:221: UMA_HISTOGRAM_TIMES("MessagePumpMac.TimerSkew", sampling_interval_ - delta);
I think you're trying to measure how the actual interval differs from the planned interval. How does that impact anything?

https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:614: return;
This maybe suggests it could be called before or after. Perhaps a name like PreOrPostWaitObserver would be meaningful.

The word "Post" could be a word-prefix (meaning before), but it also has meaning in the context of MessageLoop to mean "Post a task." I'd suggest avoiding its use here... so maybe: StartOrEndWaitObserver() ...or perhaps we should have two distinct observers... and not pass in a boolean to distinguish them.
On 2013/08/22 23:03:39, jar wrote:
> I'm not clear on where this whole investigation is going. Perhaps you can
> provide a link for me to read about what you are contemplating?

Sure, the linked bug (https://code.google.com/p/chromium/issues/detail?id=264886) has the full details. In short: we've noticed that the MessagePump could probably be more efficient about servicing the MessageLoop. I've confirmed this and some other experiments on my machines, but I'd like data from production to verify my initial findings and experiments. Ultimately, what I want to do is try something like https://codereview.chromium.org/20803005 (do not even bother reviewing at this stage :)) in a Finch field trial to compare results, potentially using Finch to control different time quanta in the "scheduler."

> Have you looked at stats gathered in about:profiler? We gather a lot of
> stats about queue latency, without using sampling methodology. I would
> expect it to have a lot of info of interest to you.

Yes! about:profiler is what I used to perform my initial investigation (https://codereview.chromium.org/20792003). However, I want data from production, not an engineering workstation. Unfortunately, about:profiling is not enabled by default and is not shipped back to us like UMA. This is why I'm trying to replicate about:profiling via sampling here.

I want to go with a sampling approach, as opposed to just aggregating all values into single histograms, because my experiments have shown the values vary significantly throughout the startup process. I've found that startup comes in roughly three "waves". Very early in startup, there is a medium-sized queue with significant queueing delays. The second wave is a long queue (1k+ items) of tasks that all execute quickly. The third wave is a small queue of long-running tasks that have a moderate queueing delay. Between each wave is a quiet/efficient period of work.

Initially, I want to know if the above conclusion roughly matches real-world data. For example, I was also only testing a single, fairly repeatable (though I think common) startup scenario. Later, after I start experimenting, I want to know how my experiments affect these "waves" individually (it may positively affect some work conditions and adversely affect others). In short: the absolute maximum and minimum for these data are definitely interesting by themselves (if we were to aggregate in a single histogram, rather than into samples), but I'm more interested in the local minima and maxima for what I'm doing.

> I'm very wary of biases exposed by attempts to "sample" data that relates to
> how busy the system is. You might actually do better using an ordinary delayed
> task as a timer... since there would be no (sampling) timers injected into the
> system when you were doing the sampling (assuming you never post more than one
> at a time) but again... the biasing impact has my head spinning a tad.

I'm a little confused by your first sentence. In patch set 1, I used a base::Timer. I found that using a system timer gives better fidelity to the interval. This is probably because the system timer will be treated as an independent run loop source, whereas a delayed task will be serviced along with all other delayed tasks on the single delayed_work_timer_.

> I think there are a few mistakes and confusion in what you do sample... and I
> tried to list what I saw below... but since I'm not clear on what you're after,
> it is hard to say "what is right" to do.

That's fair.
Please let me know if you have more questions or better ideas after reading the bug. It's the most thorough explanation I've written on the problem we're working at. In the Perfect World™, I'd want to turn on tracing for specific events and have it shipped back, but that doesn't seem possible. This is the closest approximation to that I could think of. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:661: size_t size = work_queue_.size(); On 2013/08/22 23:03:39, jar wrote: > Note that the "work_queue" is not really the full queue, but is only the portion > of the queue that is visible without a lock, and accessed locally on this > thread. > > See ReloadWorkQueue() if you care about seeing what else is queued. > > Under heavy load, I'd expect the work_queue.size() to be about half the actual > (combined) queue size... but things will vary. > Yes, I know the incoming_queue_ is not included in this count, but I did not want to take a lock to get that information. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:662: *queue_size = size + delayed_work_queue_.size(); On 2013/08/22 23:03:39, jar wrote: > Why is it interesting to add together pending (ready to run) queue size, plus > the delayed_work_queue (which probably mostly can't be run for a while... can be > an arbitrarily queue large... and would have no significant impact on > performance)? Fair point. I did not count this in my initial investigation, so removed. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:669: const PendingTask& next_to_run = work_queue_.front(); On 2013/08/22 23:03:39, jar wrote: > The work_queue_ could be empty... so you should test first. Already checking size above. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump.h (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump.h:47: // been waiting to run. On 2013/08/22 23:03:39, jar wrote: > Define semantics if the work queue is empty. Done. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.h (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.h:143: static void PrePostWaitObserver(CFRunLoopObserverRef observer, On 2013/08/22 23:03:39, jar wrote: > This name seems worse than the original. What semantic difference are you > trying to illuminate? > > The old name seemed to mean that this observer was called just before waiting > (which seems like the right time to call for idle work, as we're about to > sleep). > > The new names seems to mean that just after we're done waiting (yikes: When we > wake up due to scheduled work?) we call this observer before we call the > PostWaitObserver?. It matches the EnterExit observer that's already present. The name is trying to convey that it is called prior to going to sleep and after waking up. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... 
base/message_loop/message_pump_mac.mm:221: UMA_HISTOGRAM_TIMES("MessagePumpMac.TimerSkew", sampling_interval_ - delta); On 2013/08/22 23:03:39, jar wrote: > I think you're trying to measure how the actual interval differs from the > planned interval. How does that impact anything? I guess measuring it here is not really necessary since it's taken care of below with the skew_advance loop. https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_pump_mac.mm:614: return; On 2013/08/22 23:03:39, jar wrote: > This maybe suggests it could be called before or after. Perhaps a name like: > PreOrPostWaitObserver would be meaningful. > > The word "Post" could be a word-prefix (meaning before), but it also has meaning > in the context of MessageLoop to mean "Post a task." I'd suggest avoiding its > use here.... so maybe: > > StartOrEndWaitObserver() > > ...or perhaps we should have two distinct observers... and not pass in a boolean > to distinguish them. Each observer actually has an associated cost (a lock and some processing overhead) in the system run loop, so I don't think adding another is the best solution. I like StartOrEndWaitObserver().
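A rough sketch of the skew_advance idea referenced above, using an assumed standalone state struct rather than the CL's actual members:

#include <stdint.h>

#include "base/time/time.h"

struct SamplingState {
  base::TimeDelta sampling_interval;  // e.g. base::TimeDelta::FromSeconds(1)
  base::TimeDelta sampling_duration;  // e.g. base::TimeDelta::FromSeconds(15)
  base::TimeTicks last_fire_time;
  int64_t sample_generation = 0;
};

// Called from the sampling timer. Returns true once the sampling window has
// been exhausted and instrumentation should stop.
bool AdvanceGeneration(SamplingState* state, base::TimeTicks now) {
  // Skew: how much later the timer fired than the planned interval.
  base::TimeDelta skew =
      (now - state->last_fire_time) - state->sampling_interval;
  state->last_fire_time = now;

  // Advance one generation for this fire, plus one more for every whole
  // interval the timer was late, so that samples land in (roughly) the
  // correct time bucket even when the timer is delayed.
  ++state->sample_generation;
  for (; skew >= state->sampling_interval; skew -= state->sampling_interval)
    ++state->sample_generation;

  return state->sampling_interval * state->sample_generation >=
         state->sampling_duration;
}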
On Fri, Aug 23, 2013 at 9:58 AM, <rsesek@chromium.org> wrote:
>> Have you looked at stats gathered in about:profiler? We gather a lot of
>> stats about queue latency, without using sampling methodology. I would
>> expect it to have a lot of info of interest to you.
>
> Yes! about:profiler is what I used to perform my initial investigation
> (https://codereview.chromium.org/20792003). However, I want data from
> production, not an engineering workstation. Unfortunately, about:profiling
> is not enabled by default and is not shipped back to us like UMA.

FWIW: about:profiling is enabled by default full-time in all official releases of Chrome. It was written to be cross-platform, so I *think* it runs on Mac as well. We also currently send back the data from startup as part of UMA (data from the first 30 seconds). We haven't (at Google) done work to visualize this data... but this would perchance be a good time to start :-). I've wanted to write code to visualize it... but I've always had other more pressing things to do. Isherman also started to do this... and got pulled off with other stuff :-(.

Jim
On Fri, Aug 23, 2013 at 6:24 PM, Jim Roskind <jar@chromium.org> wrote:
> FWIW: about:profiling is enabled by default full-time in all official
> releases of Chrome. It was written to be cross-platform, so I *think* it
> runs on Mac as well. We also currently send back the data from startup as
> part of UMA (data from the first 30 seconds). We haven't (at Google) done
> work to visualize this data... but this would perchance be a good time to
> start :-).

Yes, about:profiling does work on Mac, but it's not enabled by default in production builds and its data does not appear to have an upload pipeline.

I looked into the startup data you mentioned though, and it doesn't seem like it quite aligns with my goals. The MetricsService uses the TrackingSynchronizer to record the tracked_objects::ThreadData, which is not (exclusively) what I want to measure. I'm trying to gather performance data about the underlying system run loop, rather than information about Tasks.
https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... File base/message_loop/message_loop.cc (right): https://codereview.chromium.org/22911026/diff/19001/base/message_loop/message... base/message_loop/message_loop.cc:669: const PendingTask& next_to_run = work_queue_.front(); On 2013/08/23 16:58:24, rsesek wrote: > On 2013/08/22 23:03:39, jar wrote: > > The work_queue_ could be empty... so you should test first. > > Already checking size above. <doh>... I thought you checked the sum of the sizes (queue_size). You are correct.
On Mon, Aug 26, 2013 at 9:21 AM, Robert Sesek <rsesek@chromium.org> wrote:
> Yes, about:profiling does work on Mac, but it's not enabled by default in
> production builds

It is supposed to be enabled on all platforms. I just checked, and this does indeed include Mac. Have you tried to visit about:profiler in a release version of Chrome?

> and its data does not appear to have an upload pipeline.

It currently is uploaded on all platforms with the first UMA upload (at the 60 second point). Until we have visualization of those metrics, we didn't want to spend time considering what other upload point we might continue (such as shutdown??).

> I looked into the startup data you mentioned though, and it doesn't seem
> like it quite aligns with my goals. The MetricsService uses the
> TrackingSynchronizer to record the tracked_objects::ThreadData, which is
> not (exclusively) what I want to measure. I'm trying to gather performance
> data about the underlying system run loop, rather than information about
> Tasks.

...probably not everything you're after... but at one time, I wrote code to track sleep time on each thread, as though it was a task. That *might* be of interest to you.
On Tue, Aug 27, 2013 at 6:30 PM, Jim Roskind <jar@chromium.org> wrote:
> It is supposed to be enabled on all platforms. I just checked, and this
> does indeed include Mac. Have you tried to visit about:profiler in a
> release version of Chrome?

I think I'm conflating some terms here, so let me clarify. about:profiling is shipped with the production binary, but arbitrary TRACE_EVENT macros are not active at startup and are not uploaded to UMA. Profiling data about task births, deaths, and running times (i.e. tracked_objects via FROM_HERE) is shipped with the binary, is enabled at startup, and is uploaded.

> It currently is uploaded on all platforms with the first UMA upload (at
> the 60 second point). Until we have visualization of those metrics, we
> didn't want to spend time considering what other upload point we might
> continue (such as shutdown??).

The data are limited to tracked_objects -- i.e., only work that's posted to the MessageLoop. What I'm trying to capture are things happening outside/below the MessageLoop, so tracked_objects data are not really useful. I have not found anything for uploading arbitrary TRACE_EVENTS in the protobufs used by the MetricsService, which indicated to me that data from about:profiling cannot at present be uploaded. Therefore, I've created this set of histograms to mimic as closely as possible a category of arbitrary TRACE_EVENTS uploaded to UMA for the first 15 seconds of startup. This CL from my initial investigation shows the TRACE_EVENTS I'd add if those could be uploaded: https://codereview.chromium.org/20792003

> ...probably not everything you're after... but at one time, I wrote code
> to track sleep time on each thread, as though it was a task. That *might*
> be of interest to you.

That is one dimension that I'm interested in, but I'm focusing here on how the system run loop is dividing work between the MessagePump and other system work, and how those divisions affect the efficiency of the MessageLoop. Thread sleep time is one of the things I'm recording, but there are several others.
A note from the sidelines: I think you're conflating about:profiler and about:tracing. about:profiler data is definitely currently included in UMA uploads, whereas about:tracing data is not.
On Tue, Aug 27, 2013 at 7:57 PM, Ilya Sherman <isherman@chromium.org> wrote:
> A note from the sidelines: I think you're conflating about:profiler and
> about:tracing.

Oops! You're right; I do mean about:tracing.
Are we already placing an observer into the list just before and after each task run? Has anyone looked at the performance impact of this? https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump_mac.mm:328: pre_wait_observer_ = CFRunLoopObserverCreate(NULL, // allocator I'm surprised we do this. Adding an observer on a per-message basis has a large impact on performance (as I recall), even if you try to only do work infrequently.
https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump_mac.mm:328: pre_wait_observer_ = CFRunLoopObserverCreate(NULL, // allocator On 2013/08/29 22:15:08, jar wrote: > I'm surprised we do this. Adding an observer on a per-message basis has a large > impact on performance (as I recall), even if you try to only do work > infrequently. This observer is added once at the creation of the MessagePump, though it's called every pass through the system run loop as appropriate. The cost is small.
On 2013/08/30 16:25:07, rsesek wrote:
> This observer is added once at the creation of the MessagePump, though it's
> called every pass through the system run loop as appropriate. The cost is small.

It isn't that clear that the "recurring cost is small."

In the past, when I was single stepping through assembly code in the message loop, I spotted the observer (just before and maybe after the task.run() invocation). It used to routinely check a list, which was at the time empty. The cost of each "examination of the empty list" used to include (as I recall) two lock acquisitions and releases, and at least about 3 mallocs and frees (and that was for an empty list!). I think the cost derived from the fancy nature of the observer lists, which allowed for async multi-thread/reentrant addition and deletion from the list. I added a short-circuit in the "empty list" case to avoid most of this overhead, but if the list has even one element, the costs (and more) reappear.

Have you looked at the (recurring) cost of having an observer on this list?
On 2013/08/30 16:41:57, jar wrote:
> Have you looked at the (recurring) cost of having an observer on this list?

I'm a bit confused - there is no "list" here and this isn't adding an ObserverList. This is an observer on the system's run loop that is called out to before the run loop is going to sleep (which we use for DoIdleWork), and I'm adding here the corresponding event for when the run loop is waking up from sleep so that I can measure the sleep duration. This is the documentation for this observer, FYI: https://developer.apple.com/library/mac/documentation/CoreFoundation/Referenc...

One of the things this instrumentation will be used to measure is the cost of small but recurring events (something I call out explicitly in the bug). As is the nature of adding instrumentation code, there is some cost, but I think the data this would collect is significant enough to warrant paying for it.
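To make that concrete, a single observer registered for both the before-waiting and after-waiting activities could look like the sketch below; the callback body, histogram name, and global are assumptions, while the CF calls and activity constants are the documented API.

#include <CoreFoundation/CoreFoundation.h>

#include "base/metrics/histogram.h"
#include "base/time/time.h"

namespace {

base::TimeTicks g_sleep_start;

void StartOrEndWaitObserver(CFRunLoopObserverRef observer,
                            CFRunLoopActivity activity,
                            void* info) {
  if (activity == kCFRunLoopBeforeWaiting) {
    // The run loop is about to go to sleep.
    g_sleep_start = base::TimeTicks::Now();
  } else if (activity == kCFRunLoopAfterWaiting) {
    // The run loop just woke up; the elapsed time is how long it slept.
    // Histogram name is illustrative only.
    UMA_HISTOGRAM_TIMES("MessagePumpMac.SleepTime",
                        base::TimeTicks::Now() - g_sleep_start);
  }
}

}  // namespace

// One observer registered for both activities keeps the per-pass run loop
// overhead to a single callout.
CFRunLoopObserverRef CreateWaitObserver() {
  return CFRunLoopObserverCreate(
      NULL,                                              // allocator
      kCFRunLoopBeforeWaiting | kCFRunLoopAfterWaiting,  // activities
      true,                                              // repeats
      0,                                                 // order
      StartOrEndWaitObserver,
      NULL);                                             // context
}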
Please correct the comment (nit) listed below, or explain why there isn't a factor of 4 error in the metric, then LGTM. As a general comment on the goal: IMO, when we have "too many tasks" being queued up, we should look to see if those tasks can be coalesced, generally at a higher level. As I understand the bug, you are wondering why the overhead on a loop is "so high" as compared with task execution time. I think the coalescing approach will often be much more effective than searching for ways to run more tasks sooner (with less yielding and checking for "other things" to do). Coalescing can often skip over portions of tasks that will be "redone" by later tasks, and can also benefit from maintaining memory context (cache state) from one task to a related task (that could be combined). If you can run the tasks a lot faster.... more power to you... as we tried to minimize overhead. It is critical to avoid starving "other things," and on some platforms (most notably, windows), it is expensive to constantly check the time to see if you should give a slot to "other things" instead of tasks. It may be that on the Mac there is different mix of tasks and "other things" so there may be some value to gain... but it should be even larger if you could coalesce the tasks. https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump.h (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump.h:46: // work queue and the length of time that the first item in the queue has nit: You already know this... and we talked about it in the code... but.... It only returns the length of the internalized work-queue, not the full queue, for which (on average) less than half the queued tasks are recorded. When work loads are small, then this may be the entire queue (typically 0 queued items). When work loads are large, then there will (when the system is stable) be about N tasks in the external (unmeasured) queue, and between 0 and N tasks (average around N/2) tasks in the internal queue. As a result, I think this metric will (on average) return a number that is 1/4th the queue size (when it returns any size at all)... and of course 0 is 1/4th of 0... so it probably always returns a metric that is 1/4th of reality (on average). https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... File base/message_loop/message_pump_mac.mm (right): https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message... base/message_loop/message_pump_mac.mm:328: pre_wait_observer_ = CFRunLoopObserverCreate(NULL, // allocator On 2013/08/30 16:25:07, rsesek wrote: > On 2013/08/29 22:15:08, jar wrote: > > I'm surprised we do this. Adding an observer on a per-message basis has a > large > > impact on performance (as I recall), even if you try to only do work > > infrequently. > > This observer is added once at the creation of the MessagePump, though it's > called every pass through the system run loop as appropriate. The cost is small. I see... this is a Mac thing... not the generic task observer list that I've seen placed in the message loop. Sorry for my confusion.
On 2013/09/04 01:20:16, jar wrote:
> As a general comment on the goal: IMO, when we have "too many tasks" being
> queued up, we should look to see if those tasks can be coalesced, generally at
> a higher level. As I understand the bug, you are wondering why the overhead on
> a loop is "so high" as compared with task execution time. I think the
> coalescing approach will often be much more effective than searching for ways
> to run more tasks sooner (with less yielding and checking for "other things"
> to do). Coalescing can often skip over portions of tasks that will be "redone"
> by later tasks, and can also benefit from maintaining memory context (cache
> state) from one task to a related task (that could be combined).

This is something that I'm aware of. When using my about:tracing implementation, I did notice one specific FROM_HERE was repeatedly being sent to the UI loop. I'm investigating whether that can be coalesced (it's related to sending cookie change notifications to extensions from the IO thread) and will be filing a bug soon. I can CC you on that. Most of the other tasks did not look like they would/could benefit from coalescing.

> If you can run the tasks a lot faster.... more power to you... as we tried to
> minimize overhead. It is critical to avoid starving "other things," and on some
> platforms (most notably, windows), it is expensive to constantly check the time
> to see if you should give a slot to "other things" instead of tasks. It may be
> that on the Mac there is different mix of tasks and "other things" so there may
> be some value to gain... but it should be even larger if you could coalesce the
> tasks.

Agreed that starvation is bad. When I move on to the experimentation phase, I'm going to be sure that any changes do not starve the other run loop sources. The goal will be to run a bunch of small tasks in the amount of time of a medium-to-large sized task, while being aware that since we can't interrupt tasks, a long-running one could blow past a "scheduler" deadline.

> https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message...
> File base/message_loop/message_pump.h (right):
> https://codereview.chromium.org/22911026/diff/24001/base/message_loop/message...
> base/message_loop/message_pump.h:46: // work queue and the length of time that
> the first item in the queue has
> nit: You already know this... and we talked about it in the code... but....
>
> It only returns the length of the internalized work-queue, not the full queue,
> for which (on average) less than half the queued tasks are recorded. When work
> loads are small, then this may be the entire queue (typically 0 queued items).
> When work loads are large, then there will (when the system is stable) be about
> N tasks in the external (unmeasured) queue, and between 0 and N tasks (average
> around N/2) tasks in the internal queue.
>
> As a result, I think this metric will (on average) return a number that is
> 1/4th the queue size (when it returns any size at all)... and of course 0 is
> 1/4th of 0... so it probably always returns a metric that is 1/4th of reality
> (on average).

I've added this caveat as a comment. In my own tests, I haven't quite noticed this extreme of a distribution between the two queues (though again on an engineering workstation), but there certainly can be quite a disparity (I noticed closer to 1/2 of reality), so it's good to highlight that.
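For reference, a queueing probe carrying that caveat might look like the following standalone sketch; the free function and its parameters are assumptions rather than MessageLoop's actual code, and PendingTask::time_posted is assumed to be available for computing the delay.

#include <queue>

#include "base/pending_task.h"
#include "base/time/time.h"

// Reports the size of the already-internalized work queue and how long its
// front task has been waiting. Caveat from the review: tasks still sitting
// in the locked incoming queue are not counted, so under load this can
// understate the true queue depth by 2x or more.
void GetQueueingInformation(const std::queue<base::PendingTask>& work_queue,
                            size_t* queue_size,
                            base::TimeDelta* queueing_delay) {
  *queue_size = work_queue.size();
  if (*queue_size == 0) {
    *queueing_delay = base::TimeDelta();
    return;
  }
  const base::PendingTask& next_to_run = work_queue.front();
  *queueing_delay = base::TimeTicks::Now() - next_to_run.time_posted;
}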
I don't think this is a very good general API, to be honest, and I wish it could be an implementation detail, but it's unfortunately the only way to extract the information for use in the MessagePumpInstrumentation. I'll try and keep this in the tree for as short a time as possible to avoid abuse.
Message was sent while issue was closed.
Committed patchset #7 manually as r221427 (presubmit successful).