|
|
Created:
6 years, 7 months ago by Simon Que Modified:
6 years, 6 months ago CC:
chromium-reviews, jar (doing other things), asvitkine+watch_chromium.org Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
Descriptionmetrics: Use absolute interval-based perf collection
Replace the old CWP collection scheme with a new scheme:
- Divide post-login time into four-hour intervals.
- Randomly select a time in each interval to collect a perf profile. The
probability distribution should be even across that interval.
- Use the new SampledProfile protobuf to store the collected data.
Note that since this patch still collects only perf data, the module
perf_provider_chromeos is not renamed to reflect something more generic.
We would like to introduce other types of data collection in the future
but they may be in other modules, or we may create a new generic module
for data collection.
BUG=chromium:358778
TEST=The interval is too long to directly test. Shorten it to something
on the order of a minute and rebuild and deploy Chrome. Should be getting
some profile collections.
I added some logging to print the collected data size. It's around 12 kB,
and I get the same size data from running quipper manually:
quipper 2 `which perf` record -a -e cycles -c 1000003 > /usr/local/perf.proto
Signed-off-by: Simon Que <sque@chromium.org>
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=273971
Patch Set 1 #
Total comments: 19
Patch Set 2 : Addressed minor issues #Patch Set 3 : Updated interval definition; fixed scheduling #Patch Set 4 : Rebased on top of blundell's refactor (http://crbug.com/375776) #Patch Set 5 : Refactor collection to allow different trigger events in the future #
Total comments: 18
Patch Set 6 : Addressed nits #
Messages
Total messages: 20 (0 generated)
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:97: : weak_factory_(this), The weak pointer factory should always be the last member in the class. This ensures that weak pointers are destroyed before other parts of the class start to be torn down. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:132: // the same time in the next interval. So, the next interval might have two random times sampled from it? I'd suggest using two timers: One that fires every 4 hours to schedule a collection, and one that fires at some point within the four-hour window to actually perform the collection. Alternately, if you prefer to have just one timer, I'd use something other than the epoch as the base time -- perhaps the time at which this class was constructed? https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:209: set_ms_after_login((base::Time::Now() - login_time_).InSeconds()); The field names for the time fields claim the units are milliseconds, but here you're logging a time delta measured in seconds. Please decide which units you want, and either fix the field names or the logging. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.h (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.h:64: base::Time login_time_; Did you mean TimeTicks?
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:30: const size_t kPerfProfilingIntervalSeconds = 14400; It's clearer to write 4 * 60 * 60. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:30: const size_t kPerfProfilingIntervalSeconds = 14400; How'd you pick four hours? You should make some effort to tailor this number to the volume of data we can actually consume. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:134: base::RandGenerator(kPerfProfilingIntervalSeconds); You want repeated calls to ScheduleCollection to be determinstic (e.g., across reboots). So I think you need to seed a rand with two things: 1) a machine-specific identifier (so different machines will get sampled at different times) 2) the start time of the period
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:97: : weak_factory_(this), On 2014/05/16 21:48:57, Ilya Sherman wrote: > The weak pointer factory should always be the last member in the class. This > ensures that weak pointers are destroyed before other parts of the class start > to be torn down. Done. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:209: set_ms_after_login((base::Time::Now() - login_time_).InSeconds()); On 2014/05/16 21:48:57, Ilya Sherman wrote: > The field names for the time fields claim the units are milliseconds, but here > you're logging a time delta measured in seconds. Please decide which units you > want, and either fix the field names or the logging. Done. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.h (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.h:64: base::Time login_time_; On 2014/05/16 21:48:57, Ilya Sherman wrote: > Did you mean TimeTicks? Done.
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:134: base::RandGenerator(kPerfProfilingIntervalSeconds); On 2014/05/16 22:17:26, tipp wrote: > You want repeated calls to ScheduleCollection to be determinstic (e.g., across > reboots). > > So I think you need to seed a rand with two things: > 1) a machine-specific identifier (so different machines will get sampled at > different times) > 2) the start time of the period I'm not sure what your first sentence means. Shouldn't we want it to be as random and non-deterministic as possible? That said, I think that seeding with start time since the epoch + machine specific identifier is fine, but just the start time by itself should be good enough too, right?
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:127: int64 seconds_since_epoch = (now - epoch).InSeconds(); How about milliseconds everywhere instead of seconds. That'll be consistent with your proto units, and it feels a little weird to align the start time to such a large unit. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:134: base::RandGenerator(kPerfProfilingIntervalSeconds); On 2014/05/17 03:46:27, Simon Que wrote: > On 2014/05/16 22:17:26, tipp wrote: > > You want repeated calls to ScheduleCollection to be determinstic (e.g., across > > reboots). > > > > So I think you need to seed a rand with two things: > > 1) a machine-specific identifier (so different machines will get sampled at > > different times) > > 2) the start time of the period > > I'm not sure what your first sentence means. Shouldn't we want it to be as > random and non-deterministic as possible? > > That said, I think that seeding with start time since the epoch + machine > specific identifier is fine, but just the start time by itself should be good > enough too, right? You want it to be deterministically random. For example, suppose the start of your interval is at time 0, and ScheduleCollection says we should do a collection at time 17. Then at time 22 we reboot, and ScheduleCollection gets called again. You want ScheduleCollection to know that the collection time (17) for this machine+interval has already passed. Otherwise, you'll end up with a bias toward machines which reboot often.
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:127: int64 seconds_since_epoch = (now - epoch).InSeconds(); On 2014/05/17 04:18:29, tipp wrote: > How about milliseconds everywhere instead of seconds. That'll be consistent > with your proto units, and it feels a little weird to align the start time to > such a large unit. Will do. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:132: // the same time in the next interval. On 2014/05/16 21:48:57, Ilya Sherman wrote: > So, the next interval might have two random times sampled from it? I'd suggest > using two timers: One that fires every 4 hours to schedule a collection, and one > that fires at some point within the four-hour window to actually perform the > collection. Alternately, if you prefer to have just one timer, I'd use > something other than the epoch as the base time -- perhaps the time at which > this class was constructed? Yes, using the time of login / initialization would be a good alternative. If the randomly selected time is already past (and it would only be slightly past, due to the time it takes to execute the computations), I think we could just run the collection at the current time. The distribution would only slightly deviate from even across time. How does that sound?
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:97: : weak_factory_(this), On 2014/05/17 03:25:07, Simon Que wrote: > On 2014/05/16 21:48:57, Ilya Sherman wrote: > > The weak pointer factory should always be the last member in the class. This > > ensures that weak pointers are destroyed before other parts of the class start > > to be torn down. > > Done. Done. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:134: base::RandGenerator(kPerfProfilingIntervalSeconds); On 2014/05/17 04:18:29, tipp wrote: > On 2014/05/17 03:46:27, Simon Que wrote: > > On 2014/05/16 22:17:26, tipp wrote: > > > You want repeated calls to ScheduleCollection to be determinstic (e.g., > across > > > reboots). > > > > > > So I think you need to seed a rand with two things: > > > 1) a machine-specific identifier (so different machines will get sampled at > > > different times) > > > 2) the start time of the period > > > > I'm not sure what your first sentence means. Shouldn't we want it to be as > > random and non-deterministic as possible? > > > > That said, I think that seeding with start time since the epoch + machine > > specific identifier is fine, but just the start time by itself should be good > > enough too, right? > > You want it to be deterministically random. For example, suppose the start of > your interval is at time 0, and ScheduleCollection says we should do a > collection at time 17. Then at time 22 we reboot, and ScheduleCollection gets > called again. You want ScheduleCollection to know that the collection time (17) > for this machine+interval has already passed. Otherwise, you'll end up with a > bias toward machines which reboot often. Why can't we seed the random generator using the time since epoch (different each time boot up / login)? In your example, ScheduleCollection will choose 17, the user reboots, and ScheduleCollection will choose a time other than 17.
https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:30: const size_t kPerfProfilingIntervalSeconds = 14400; On 2014/05/16 22:17:26, tipp wrote: > It's clearer to write 4 * 60 * 60. Done. https://codereview.chromium.org/282093011/diff/1/chrome/browser/metrics/perf_... chrome/browser/metrics/perf_provider_chromeos.cc:30: const size_t kPerfProfilingIntervalSeconds = 14400; On 2014/05/16 22:17:26, tipp wrote: > How'd you pick four hours? You should make some effort to tailor this number to > the volume of data we can actually consume. I'm not sure we have enough information to make that decision right now. However, I can increase this to something longer to be on the safe side. e.g. 24 hours.
I'll wait for you and Tipp to resolve the question of how to sample the measurements. Once that's resolved, I'll do another review pass.
On 2014/05/21 08:55:16, Ilya Sherman wrote: > I'll wait for you and Tipp to resolve the question of how to sample the > measurements. Once that's resolved, I'll do another review pass. We discussed the scheduling strategy more in person and we've reasoned it seems fine.
On 2014/05/29 21:47:44, tipp wrote: > On 2014/05/21 08:55:16, Ilya Sherman wrote: > > I'll wait for you and Tipp to resolve the question of how to sample the > > measurements. Once that's resolved, I'll do another review pass. > > We discussed the scheduling strategy more in person and we've reasoned it seems > fine. Ok. Please rebase, and then I'll take another look. Thanks.
On 2014/05/29 21:59:54, Ilya Sherman wrote: > On 2014/05/29 21:47:44, tipp wrote: > > On 2014/05/21 08:55:16, Ilya Sherman wrote: > > > I'll wait for you and Tipp to resolve the question of how to sample the > > > measurements. Once that's resolved, I'll do another review pass. > > > > We discussed the scheduling strategy more in person and we've reasoned it > seems > > fine. > > Ok. Please rebase, and then I'll take another look. Thanks. Rebased already.
On 2014/05/29 22:02:48, Simon Que wrote: > On 2014/05/29 21:59:54, Ilya Sherman wrote: > > On 2014/05/29 21:47:44, tipp wrote: > > > On 2014/05/21 08:55:16, Ilya Sherman wrote: > > > > I'll wait for you and Tipp to resolve the question of how to sample the > > > > measurements. Once that's resolved, I'll do another review pass. > > > > > > We discussed the scheduling strategy more in person and we've reasoned it > > seems > > > fine. > > > > Ok. Please rebase, and then I'll take another look. Thanks. > > Rebased already. Oh, sorry, I could have sworn I saw a timestamp that was much older. Looking now.
https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:29: // This interval is twenty-four hours. nit: Please update this comment. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:147: void PerfProvider::Activate() { nit: It might be wise to rename this method to something like "OnUserDidLogIn()", now that it directly sets login_time_ internally. As is, it would be easy to accidentally start calling Activate() from some other trigger as well, without realizing that it has this side-effect. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:173: } nit: No need for curly braces. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:221: base::TimeDelta::FromMilliseconds(kPerfProfilingIntervalMs); nit: I think it would be slightly better to reset this within ScheduleCollection(). That way, if somebody were to accidentally call ScheduleCollection() multiple times, it wouldn't doubly-schedule collections during a single interval; and if the code were to be refactored to call ScheduleCollection() from somewhere else, it wouldn't be possible to forget to update the next_profile_interval_start_. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:247: collection_data.set_ms_after_boot(perf_data_proto.timestamp_sec()); You are still assigning seconds to a millisecond field. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:249: set_ms_after_login((base::TimeTicks::Now() - login_time_) nit: Please add a DCHECK that login_time_ is non-null. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... File chrome/browser/metrics/perf_provider_chromeos.h (right): https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.h:32: bool GetPerfData(std::vector<SampledProfile>* perf_data); Optional nit: Perhaps name this "GetSampledProfile()", and name the variable "sampled_profile"? https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.h:61: // interval. nit: Please update this comment. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.h:66: void CollectIfNecessary(SampledProfile::TriggerEvent trigger_event); nit: Please update the docs.
https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... File chrome/browser/metrics/perf_provider_chromeos.cc (right): https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:29: // This interval is twenty-four hours. On 2014/05/29 22:56:49, Ilya Sherman wrote: > nit: Please update this comment. Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:147: void PerfProvider::Activate() { On 2014/05/29 22:56:49, Ilya Sherman wrote: > nit: It might be wise to rename this method to something like > "OnUserDidLogIn()", now that it directly sets login_time_ internally. As is, it > would be easy to accidentally start calling Activate() from some other trigger > as well, without realizing that it has this side-effect. Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:173: } On 2014/05/29 22:56:49, Ilya Sherman wrote: > nit: No need for curly braces. Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:221: base::TimeDelta::FromMilliseconds(kPerfProfilingIntervalMs); On 2014/05/29 22:56:49, Ilya Sherman wrote: > nit: I think it would be slightly better to reset this within > ScheduleCollection(). That way, if somebody were to accidentally call > ScheduleCollection() multiple times, it wouldn't doubly-schedule collections > during a single interval; and if the code were to be refactored to call > ScheduleCollection() from somewhere else, it wouldn't be possible to forget to > update the next_profile_interval_start_. Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:247: collection_data.set_ms_after_boot(perf_data_proto.timestamp_sec()); On 2014/05/29 22:56:49, Ilya Sherman wrote: > You are still assigning seconds to a millisecond field. Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.cc:249: set_ms_after_login((base::TimeTicks::Now() - login_time_) On 2014/05/29 22:56:49, Ilya Sherman wrote: > nit: Please add a DCHECK that login_time_ is non-null. Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... File chrome/browser/metrics/perf_provider_chromeos.h (right): https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.h:32: bool GetPerfData(std::vector<SampledProfile>* perf_data); On 2014/05/29 22:56:49, Ilya Sherman wrote: > Optional nit: Perhaps name this "GetSampledProfile()", and name the variable > "sampled_profile"? Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.h:61: // interval. On 2014/05/29 22:56:49, Ilya Sherman wrote: > nit: Please update this comment. Done. https://codereview.chromium.org/282093011/diff/70001/chrome/browser/metrics/p... chrome/browser/metrics/perf_provider_chromeos.h:66: void CollectIfNecessary(SampledProfile::TriggerEvent trigger_event); On 2014/05/29 22:56:49, Ilya Sherman wrote: > nit: Please update the docs. Done.
LGTM, thanks.
The CQ bit was checked by isherman@chromium.org
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/sque@chromium.org/282093011/90001
Message was sent while issue was closed.
Change committed as 273971 |