|
|
Created:
8 years, 8 months ago by Joao da Silva Modified:
8 years, 2 months ago CC:
chromium-reviews, MAD, nkostylev+watch_chromium.org, stevenjb+watch_chromium.org, Ilya Sherman, davemoore+watch_chromium.org, jar (doing other things), petkov Base URL:
svn://svn.chromium.org/chrome/trunk/src Visibility:
Public. |
DescriptionAdded asynchronous notification of readiness to the StatisticsProvider, and
updated the MetricsService to wait for this.
BUG=chromium-os:29642
TEST=No more empty hardware_class fields in UMA reports
Patch Set 1 #
Total comments: 9
Patch Set 2 : Fixed non-chromeos builds #Patch Set 3 : Addressed comments #
Total comments: 2
Patch Set 4 : Load hardware_class when available without delaying init of MetricsService #
Total comments: 6
Patch Set 5 : Addressed comments, rebased #
Total comments: 19
Patch Set 6 : Don't use base::Unretained. Rebased. #
Total comments: 22
Patch Set 7 : Timeout while waiting for the hardware_class; sample UMA stats #
Total comments: 6
Patch Set 8 : Addressed Jim's comments #Patch Set 9 : Fixed clang build #Patch Set 10 : Fixed clang build, reupload #Patch Set 11 : Upload again... #Patch Set 12 : . #
Total comments: 8
Patch Set 13 : Addressed Ilya's comments #
Total comments: 18
Messages
Total messages: 38 (0 generated)
We're seeing a lot of UMA reports with hardware_class missing, and we now suspect it's the StatisticsProvider execution of crossystem that is timing out. This should fix that issue; other ideas are welcome! :-) @satorux: please review the changes to the StatisticsProvider @jar: please review the changes to the MetricsService Thanks!
(cc'ing @petkov, who helped investigate this issue)
http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.cc:765: self)); Does the WhenReady() call have any sort of timeout? I'd be worried about adding any code to the metrics initialization critical path that can block indefinitely... http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.cc:854: self_ptr_factory_.GetWeakPtr()), nit: No need to use a weak pointer if this is all happening on the UI thread -- or am I missing something? http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... File chrome/browser/metrics/metrics_service.h (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.h:163: void OnStatisticsProviderReady(); nit: This is #ifdef'd in the implementation file, but not here in the header. The two should probably be consistent.
Thanks for jumping in Ilya! Please see replies inline. http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.cc:765: self)); On 2012/04/19 19:35:32, Ilya Sherman wrote: > Does the WhenReady() call have any sort of timeout? I'd be worried about adding > any code to the metrics initialization critical path that can block > indefinitely... StatisticsProvider is a singleton whose ctor immediately posts to FILE to load stats from the output of a cmd line tool. WhenReady() fires its callbacks once this tool returns. The problem we're seeing is that some hardware is taking very long to initialize these stats (a couple seconds more than the 30 secs delay for this task). If the tool didn't ever return it'd completely block the FILE thread, and there'd be larger problems in that case :-) So this is expected to slightly delay the MetricsService initialization on only a couple of flaky hardware setups, but won't affect other cases. It will eventually start in any case. http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.cc:854: self_ptr_factory_.GetWeakPtr()), On 2012/04/19 19:35:32, Ilya Sherman wrote: > nit: No need to use a weak pointer if this is all happening on the UI thread -- > or am I missing something? The WeakPtr is bound to the callback task passed to WhenReady(), which is posted with a long delay (30 secs). Shutdown might have started when that task executes, and |this| might have been deleted by then (unlikely, but possible). http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... File chrome/browser/metrics/metrics_service.h (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.h:163: void OnStatisticsProviderReady(); On 2012/04/19 19:35:32, Ilya Sherman wrote: > nit: This is #ifdef'd in the implementation file, but not here in the header. > The two should probably be consistent. Of course, good catch. Done.
http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.cc:765: self)); On 2012/04/19 19:54:33, Joao da Silva wrote: > On 2012/04/19 19:35:32, Ilya Sherman wrote: > > Does the WhenReady() call have any sort of timeout? I'd be worried about > adding > > any code to the metrics initialization critical path that can block > > indefinitely... > > StatisticsProvider is a singleton whose ctor immediately posts to FILE to load > stats from the output of a cmd line tool. WhenReady() fires its callbacks once > this tool returns. > > The problem we're seeing is that some hardware is taking very long to initialize > these stats (a couple seconds more than the 30 secs delay for this task). If the > tool didn't ever return it'd completely block the FILE thread, and there'd be > larger problems in that case :-) So this is expected to slightly delay the > MetricsService initialization on only a couple of flaky hardware setups, but > won't affect other cases. It will eventually start in any case. What if the StatisticsProvider's task gets scheduled after a task that takes minutes to run? (Sadly, these exist sometimes, especially on the FILE thread.) I don't think it's appropriate to block the MetricsService initialization path for that long. How about instead not making fetching the hardware class be part of the critical path, e.g. (a) by calling into StatisticsProvider::GetMachineStatistic() when we would first use hardware_class_ or (b) notifying the MetricsService when the data is loaded, rather trying to cache it earlier on? http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.cc:854: self_ptr_factory_.GetWeakPtr()), On 2012/04/19 19:54:33, Joao da Silva wrote: > On 2012/04/19 19:35:32, Ilya Sherman wrote: > > nit: No need to use a weak pointer if this is all happening on the UI thread > -- > > or am I missing something? > > The WeakPtr is bound to the callback task passed to WhenReady(), which is posted > with a long delay (30 secs). Shutdown might have started when that task > executes, and |this| might have been deleted by then (unlikely, but possible). Fair enough.
http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.h (right): http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.h:164: void OnStatisticsProviderReady(); Rather than conditionally defining a method... and having the comments that seems very confused... How about always having the methods all the time (all OS versions)... always having the comment that described the flow... and then having the interior of the method *optionally* do work work, depending on whether we are in CrOS or not? The comments on the init tasks are confusing to start with (since we need to set several things up... with several callbacks)... and merging in the OS options to cause different flows across these methods seems extra confusing.
Thanks for the comments! Please have another look. http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics... chrome/browser/metrics/metrics_service.cc:765: self)); On 2012/04/19 20:16:54, Ilya Sherman wrote: > On 2012/04/19 19:54:33, Joao da Silva wrote: > > On 2012/04/19 19:35:32, Ilya Sherman wrote: > > > Does the WhenReady() call have any sort of timeout? I'd be worried about > > adding > > > any code to the metrics initialization critical path that can block > > > indefinitely... > > > > StatisticsProvider is a singleton whose ctor immediately posts to FILE to load > > stats from the output of a cmd line tool. WhenReady() fires its callbacks once > > this tool returns. > > > > The problem we're seeing is that some hardware is taking very long to > initialize > > these stats (a couple seconds more than the 30 secs delay for this task). If > the > > tool didn't ever return it'd completely block the FILE thread, and there'd be > > larger problems in that case :-) So this is expected to slightly delay the > > MetricsService initialization on only a couple of flaky hardware setups, but > > won't affect other cases. It will eventually start in any case. > > What if the StatisticsProvider's task gets scheduled after a task that takes > minutes to run? (Sadly, these exist sometimes, especially on the FILE thread.) > I don't think it's appropriate to block the MetricsService initialization path > for that long. There's another CL that moves its task to the BlockingPool instead; this is just FYI, it's not relevant for this CL. > > How about instead not making fetching the hardware class be part of the critical > path, e.g. (a) by calling into StatisticsProvider::GetMachineStatistic() when we > would first use hardware_class_ or (b) notifying the MetricsService when the > data is loaded, rather trying to cache it earlier on? At first I went for the lazy, straightforward fix :-) (b) sounds like a good idea to me, and I've read a bit more of the MetricsService code to figure how to do that correctly. Please have another look. http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.h (right): http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.h:164: void OnStatisticsProviderReady(); On 2012/04/20 00:37:39, jar wrote: > Rather than conditionally defining a method... and having the comments that > seems very confused... > > How about always having the methods all the time (all OS versions)... always > having the comment that described the flow... and then having the interior of > the method *optionally* do work work, depending on whether we are in CrOS or > not? > > The comments on the init tasks are confusing to start with (since we need to set > several things up... with several callbacks)... and merging in the OS options to > cause different flows across these methods seems extra confusing. Good points; I thought it was acceptable in this class because there were 2 precedents (StartExternalMetrics and LogChromeOSCrash). The last patch set removes the hardware_class lookup step out of the init flow completely, and the StatisticsProvider callback has an internal #ifdef as suggested.
LGTM with a few small nits. Thanks :) http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:184: VLOG(1) << "Finished loading statistics"; nit: Can this be a DVLOG(1) rather than a VLOG(1), so that it's compiled out in release builds? http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.h (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.h:21: // preserves the previous value. nit: Perhaps "is unchanged" rather than "preserves the previous value"? I had to read this a couple of times to figure out that this just meant that if |name| is empty then this method is a no-op. (In fact, could that just be a DCHECK instead?) http://codereview.chromium.org/10078017/diff/20001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:835: hardware_class_ = kHardwareClassNotReady; nit: Any reason not to initialize this in the constructor?
@satorux: friendly ping :-) http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:184: VLOG(1) << "Finished loading statistics"; On 2012/04/20 21:18:37, Ilya Sherman wrote: > nit: Can this be a DVLOG(1) rather than a VLOG(1), so that it's compiled out in > release builds? I'll defer to @satorux on this one. I think most chromeos testers use prebuilt release images; if that's the case then it's better to leave as is. http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.h (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.h:21: // preserves the previous value. On 2012/04/20 21:18:37, Ilya Sherman wrote: > nit: Perhaps "is unchanged" rather than "preserves the previous value"? I had > to read this a couple of times to figure out that this just meant that if |name| > is empty then this method is a no-op. (In fact, could that just be a DCHECK > instead?) Rephrased as suggested. It's not a DCHECK, since it's valid to query for names that aren't present on some machines. The return value indicates whether the requested name was found or not. http://codereview.chromium.org/10078017/diff/20001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:835: hardware_class_ = kHardwareClassNotReady; On 2012/04/20 21:18:37, Ilya Sherman wrote: > nit: Any reason not to initialize this in the constructor? Done.
http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( So, now we don't wait for the hardware class to become available, right? How does this solve the potential problem with missing HWIDs? If initialization is not complete, I think we still collect metrics (I may be wrong), we just don't send them, no? So maybe we can wait for the HWID with some increased timeout -- a couple of minutes, or whatever.
http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( On 2012/04/23 10:32:55, petkov wrote: > So, now we don't wait for the hardware class to become available, right? How > does this solve the potential problem with missing HWIDs? > > If initialization is not complete, I think we still collect metrics (I may be > wrong), we just don't send them, no? So maybe we can wait for the HWID with some > increased timeout -- a couple of minutes, or whatever. @isherman: if histograms are still collected even when the MetricsService is not initialized, then making the MetricsService wait for the StatisticsProvider to become ready should not be a problem. Is that the case? WDYT?
http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:191: base::Unretained(this))); base::Unretained should generally be avoided. Can we ensure that this object is alive when the callback is run? Proving it is often difficult. Instead, I'd suggest to copy callbacks like this: namespace { void NotifyOnUI(const std::vector<base::Closure>& callbacks) { ... } } // namespace This should work as callbacks are copyable. then we don't have to worry about base::Unretained(). http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:241: callback.Run(); maybe: DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, callback);
http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:116: on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs)); Ideally, this waitable event stuff should be gone. IIRC, this code was added as the existing clients were expecting the return values to be retrieved synchronously. The right solution was to make all clients code to be asynchronous but it was not feasible, as there were many clients... http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:127: void StatisticsProviderImpl::WhenReady(const base::Closure& callback) { Instead of adding this, wouldn't it be cleaner to add StatisticsProviderImpl::GetMachineStatisticAsync() that takes a callback? http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:149: BrowserThread::FILE, FILE thread is now deprecated. We should use BrowserThread::PostBlockingPoolTask() instead. Perhaps this is sufficient to fix the problem we are seeing?
Satoru: thanks for the comments! Please see inline. Ilya: can you comment on the suggestion proposed by Darin? http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:116: on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs)); On 2012/04/23 17:02:26, satorux1 wrote: > Ideally, this waitable event stuff should be gone. IIRC, this code was added as > the existing clients were expecting the return values to be retrieved > synchronously. The right solution was to make all clients code to be > asynchronous but it was not feasible, as there were many clients... That's the purpose of this CL, but as you mention there are many clients right now. I'd like to have this in for a few weeks and see if the empty HWID are gone. http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:127: void StatisticsProviderImpl::WhenReady(const base::Closure& callback) { On 2012/04/23 17:02:26, satorux1 wrote: > Instead of adding this, wouldn't it be cleaner to add > StatisticsProviderImpl::GetMachineStatisticAsync() that takes a callback? I considered both options, and went for this one because some clients read more than one value and having a callback for each would be cumbersome. A single callback suffices to indicate that all the statistics have been loaded. wdyt? http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:149: BrowserThread::FILE, On 2012/04/23 17:02:26, satorux1 wrote: > FILE thread is now deprecated. We should use > BrowserThread::PostBlockingPoolTask() instead. > Yes indeed! That has just landed in a related CL :-) http://codereview.chromium.org/10127009/diff/22001/chrome/browser/chromeos/sy... > Perhaps this is sufficient to fix the problem we are seeing? It's quite possible that some long-running task on FILE is delaying the initialization of this class. In any case, the interface for clients should be asynchronous and shouldn't block on the WaitableEvent from UI. http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:191: base::Unretained(this))); On 2012/04/23 16:45:49, satorux1 wrote: > base::Unretained should generally be avoided. Can we ensure that this object is > alive when the callback is run? Proving it is often difficult. Instead, I'd > suggest to copy callbacks like this: > > > namespace { > void NotifyOnUI(const std::vector<base::Closure>& callbacks) { > ... > } > } // namespace > > This should work as callbacks are copyable. then we don't have to worry about > base::Unretained(). Done. I've created a vector* here so that it can be swapped with callbacks_ to avoid copying the closures, and so that it can be passed as a pointer into the function. The callback owns the memory. An Unretained() is used above in StartLoadingMachineStatistics() too, btw. Usage of Unretained() here should be ok since this class is a singleton. http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:241: callback.Run(); On 2012/04/23 16:45:49, satorux1 wrote: > maybe: > > DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI)); > BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, callback); Done.
LGTM, with a request for adding some comment: http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:127: void StatisticsProviderImpl::WhenReady(const base::Closure& callback) { On 2012/04/23 17:24:42, Joao da Silva wrote: > On 2012/04/23 17:02:26, satorux1 wrote: > > Instead of adding this, wouldn't it be cleaner to add > > StatisticsProviderImpl::GetMachineStatisticAsync() that takes a callback? > > I considered both options, and went for this one because some clients read more > than one value and having a callback for each would be cumbersome. A single > callback suffices to indicate that all the statistics have been loaded. wdyt? That makes sense. Could you add some comment about the rational in the function comment for WhenReady()? http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:191: base::Unretained(this))); On 2012/04/23 17:24:42, Joao da Silva wrote: > On 2012/04/23 16:45:49, satorux1 wrote: > > base::Unretained should generally be avoided. Can we ensure that this object > is > > alive when the callback is run? Proving it is often difficult. Instead, I'd > > suggest to copy callbacks like this: > > > > > > namespace { > > void NotifyOnUI(const std::vector<base::Closure>& callbacks) { > > ... > > } > > } // namespace > > > > This should work as callbacks are copyable. then we don't have to worry about > > base::Unretained(). > > Done. I've created a vector* here so that it can be swapped with callbacks_ to > avoid copying the closures, and so that it can be passed as a pointer into the > function. The callback owns the memory. I'd personally think it's an unnecessary optimization. copying a small number of callbacks is cheap and simpler. > An Unretained() is used above in StartLoadingMachineStatistics() too, btw. Usage > of Unretained() here should be ok since this class is a singleton. Yes, but singletons are deleted in the end. It should be done after the UI thread loop ends, but we usually don't want to rely on a subtle assumption like this. :)
http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( On 2012/04/23 11:15:56, Joao da Silva wrote: > On 2012/04/23 10:32:55, petkov wrote: > > So, now we don't wait for the hardware class to become available, right? How > > does this solve the potential problem with missing HWIDs? > > > > If initialization is not complete, I think we still collect metrics (I may be > > wrong), we just don't send them, no? So maybe we can wait for the HWID with > some > > increased timeout -- a couple of minutes, or whatever. > > @isherman: if histograms are still collected even when the MetricsService is not > initialized, then making the MetricsService wait for the StatisticsProvider to > become ready should not be a problem. Is that the case? WDYT? Data is collected independently of uploading, but the initial upload includes critical heartbeat metrics like crash & stability counts and other basic sanity checks. Jim can correct me on this, but I don't think the hardware class is critical enough for us to want to potentially block reporting of these heartbeat metrics on fetching that data. Note that with your change, the hardware class will be included with the recurring logs as soon as it is available. In contrast, previously, if fetching the hardware class failed for the initial upload, it would never be properly set. Since UMA users also have a stable client_id on each device, a single client_id should map to exactly one hardware class. Hence, if you need to, you should be able to scan over the uploaded logs and fill in any missing hardware class info in earlier uploads based on the data in later uploads. Does that address your concern?
http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( I do queries on UMA data extremely rarely and I don't know how easy it is to match a HWIDs with client_id's. Maybe you can loop in interested PMs/TPMs. This may also require changes to quite a few server-side query tools and dashboards. We don't want to make it difficult to match critical data included in the initial report -- stability info, boot times, etc. -- with the hardware platform it is coming from. What's the downside with delaying the initial report? It will either be sent a few seconds or minutes later, or it will be sent on the next Chrome start, right? On 2012/04/23 22:44:13, Ilya Sherman wrote: > On 2012/04/23 11:15:56, Joao da Silva wrote: > > On 2012/04/23 10:32:55, petkov wrote: > > > So, now we don't wait for the hardware class to become available, right? How > > > does this solve the potential problem with missing HWIDs? > > > > > > If initialization is not complete, I think we still collect metrics (I may > be > > > wrong), we just don't send them, no? So maybe we can wait for the HWID with > > some > > > increased timeout -- a couple of minutes, or whatever. > > > > @isherman: if histograms are still collected even when the MetricsService is > not > > initialized, then making the MetricsService wait for the StatisticsProvider to > > become ready should not be a problem. Is that the case? WDYT? > > Data is collected independently of uploading, but the initial upload includes > critical heartbeat metrics like crash & stability counts and other basic sanity > checks. Jim can correct me on this, but I don't think the hardware class is > critical enough for us to want to potentially block reporting of these heartbeat > metrics on fetching that data. > > Note that with your change, the hardware class will be included with the > recurring logs as soon as it is available. In contrast, previously, if fetching > the hardware class failed for the initial upload, it would never be properly > set. > > Since UMA users also have a stable client_id on each device, a single client_id > should map to exactly one hardware class. Hence, if you need to, you should be > able to scan over the uploaded logs and fill in any missing hardware class info > in earlier uploads based on the data in later uploads. Does that address your > concern?
http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( On 2012/04/24 07:42:18, petkov wrote: > I do queries on UMA data extremely rarely and I don't know how easy it is to > match a HWIDs with client_id's. Maybe you can loop in interested PMs/TPMs. This > may also require changes to quite a few server-side query tools and dashboards. > > We don't want to make it difficult to match critical data included in the > initial report -- stability info, boot times, etc. -- with the hardware platform > it is coming from. > > What's the downside with delaying the initial report? It will either be sent a > few seconds or minutes later, or it will be sent on the next Chrome start, > right? Of course, this needs to be checked. Maybe the initial report will be sent 20 minutes later or something if initialization hasn't finished. But that can be fixed too, I guess. > > On 2012/04/23 22:44:13, Ilya Sherman wrote: > > On 2012/04/23 11:15:56, Joao da Silva wrote: > > > On 2012/04/23 10:32:55, petkov wrote: > > > > So, now we don't wait for the hardware class to become available, right? > How > > > > does this solve the potential problem with missing HWIDs? > > > > > > > > If initialization is not complete, I think we still collect metrics (I may > > be > > > > wrong), we just don't send them, no? So maybe we can wait for the HWID > with > > > some > > > > increased timeout -- a couple of minutes, or whatever. > > > > > > @isherman: if histograms are still collected even when the MetricsService is > > not > > > initialized, then making the MetricsService wait for the StatisticsProvider > to > > > become ready should not be a problem. Is that the case? WDYT? > > > > Data is collected independently of uploading, but the initial upload includes > > critical heartbeat metrics like crash & stability counts and other basic > sanity > > checks. Jim can correct me on this, but I don't think the hardware class is > > critical enough for us to want to potentially block reporting of these > heartbeat > > metrics on fetching that data. > > > > Note that with your change, the hardware class will be included with the > > recurring logs as soon as it is available. In contrast, previously, if > fetching > > the hardware class failed for the initial upload, it would never be properly > > set. > > > > Since UMA users also have a stable client_id on each device, a single > client_id > > should map to exactly one hardware class. Hence, if you need to, you should > be > > able to scan over the uploaded logs and fill in any missing hardware class > info > > in earlier uploads based on the data in later uploads. Does that address your > > concern? >
@satorux: addressed your comments, thanks for the review! @isherman, @petkov: let's discuss offline what's the right thing to do re initialization of the MetricsService. http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:127: void StatisticsProviderImpl::WhenReady(const base::Closure& callback) { On 2012/04/23 17:51:01, satorux1 wrote: > On 2012/04/23 17:24:42, Joao da Silva wrote: > > On 2012/04/23 17:02:26, satorux1 wrote: > > > Instead of adding this, wouldn't it be cleaner to add > > > StatisticsProviderImpl::GetMachineStatisticAsync() that takes a callback? > > > > I considered both options, and went for this one because some clients read > more > > than one value and having a callback for each would be cumbersome. A single > > callback suffices to indicate that all the statistics have been loaded. wdyt? > > That makes sense. Could you add some comment about the rational in the function > comment for WhenReady()? Done. http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:191: base::Unretained(this))); On 2012/04/23 17:51:01, satorux1 wrote: > On 2012/04/23 17:24:42, Joao da Silva wrote: > > On 2012/04/23 16:45:49, satorux1 wrote: > > > base::Unretained should generally be avoided. Can we ensure that this object > > is > > > alive when the callback is run? Proving it is often difficult. Instead, I'd > > > suggest to copy callbacks like this: > > > > > > > > > namespace { > > > void NotifyOnUI(const std::vector<base::Closure>& callbacks) { > > > ... > > > } > > > } // namespace > > > > > > This should work as callbacks are copyable. then we don't have to worry > about > > > base::Unretained(). > > > > Done. I've created a vector* here so that it can be swapped with callbacks_ to > > avoid copying the closures, and so that it can be passed as a pointer into the > > function. The callback owns the memory. > > I'd personally think it's an unnecessary optimization. copying a small number of > callbacks is cheap and simpler. > Right, done. > > > An Unretained() is used above in StartLoadingMachineStatistics() too, btw. > Usage > > of Unretained() here should be ok since this class is a singleton. > > Yes, but singletons are deleted in the end. It should be done after the UI > thread loop ends, but we usually don't want to rely on a subtle assumption like > this. :)
Do we have a histogram showing how long it takes to get this interface ready (so we can call for a hardware class) in real machines in the field?
http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:108: const std::string& name, std::string* result) { nit: arguments should be aligned, one per line. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:121: LOG(WARNING) << "Waiting to load statistics. Requested statistic: " To reduce the size of the binary, brett made a big sweep getting rid of LOG messages, and turning them into DLOG. Messages like the above probably never get looked at, clog the logs, and tend to provide little to no value. I suspect other LOG messages below are also worth considering transition to DLOG. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:123: on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs)); IMO, the timeout should definitely be part of the function argument list. The caller is in a much better position to decide how long it wishes to wait, and having this unknown time-out (unknown to the caller) makes this very questionable to use. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:137: return false; This API makes it impossible to tell whether there was a time-out, or if the value was not present. Hence you can't tell if it is "worth asking again." http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:172: LOG(WARNING) << "There were errors parsing the output of " Shouldn't there be action taken to rectify this situation? Is this LOG message valuable? If there was value to be had here, I'd expect that a error report, sent via UMA, such as in a histogram would be much more useful (informative). http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:203: callbacks->swap(callbacks_); What thread are you running on? Are you ensured to be on the UI thread, where callbacks is accessed? http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.h (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.h:26: virtual bool GetMachineStatistic(const std::string& name, nit: I had to read the comment a few times to parse it and possibly undrestand it. For instance, I still don't know the significance of "This does not update the statistics.". Plausibly better comment (for current API) would be: /* Get the machine statistic with a key of |name| (e.g., "hardware_class"). Returns true if the key was found, and data was returned in |result|. Returns false otherwise, and the contents of |result| are undefined. This call may block for an arbitrary length of time (waiting for statistics to load, or a timeout to take place). If this function is called anytime after WhenReady() has invoked any callback, then this function is guaranteed not to block. */ Reading that description, it is IMO clear that the interface *should* specify this timeout, and that would perchance be more valuable than providing an async interface in this case. With a timeout provided in this call, a polling loop could easily get the data. Given that this data is *not* something that a user needs to wait for, and I'm *guessing* that that the call is cheap and rarely used, a polling loop (which keeps posting delayed tasks to retry) sure sounds like a nice solution to avoid this whole problem. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.h:32: virtual void WhenReady(const base::Closure& callback) = 0; You should probably use a more standard pattern, and just register an observer that will be notified.
Thanks all for the comments here and via email. Please have another look at this CL :-) Histograms have been added to sample invalid crossystem output, and timeouts while waiting for the StatisticsProvider. @petkov: the 3 cases you described are handled: - valid hardware_classes are used when available; - invalid crossystem output will use "unknown", and sample a UMA histogram; - timed out reads will use "(not ready)", and start using hardware_class once it's available. The current timeout is 5 minutes. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:108: const std::string& name, std::string* result) { On 2012/04/24 18:37:40, jar wrote: > nit: arguments should be aligned, one per line. Done. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:121: LOG(WARNING) << "Waiting to load statistics. Requested statistic: " On 2012/04/24 18:37:40, jar wrote: > To reduce the size of the binary, brett made a big sweep getting rid of LOG > messages, and turning them into DLOG. Messages like the above probably never > get looked at, clog the logs, and tend to provide little to no value. > > I suspect other LOG messages below are also worth considering transition to > DLOG. Agree, done. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:123: on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs)); On 2012/04/24 18:37:40, jar wrote: > IMO, the timeout should definitely be part of the function argument list. The > caller is in a much better position to decide how long it wishes to wait, and > having this unknown time-out (unknown to the caller) makes this very > questionable to use. This is definitely questionable and can surprise callers; I think the right approach for these cases are async interfaces. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:137: return false; On 2012/04/24 18:37:40, jar wrote: > This API makes it impossible to tell whether there was a time-out, or if the > value was not present. Hence you can't tell if it is "worth asking again." This problem doesn't exist with the async notification; every user of this interface should be refactored to use the async interface, and the WaitableEvent should be removed. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:172: LOG(WARNING) << "There were errors parsing the output of " On 2012/04/24 18:37:40, jar wrote: > Shouldn't there be action taken to rectify this situation? Is this LOG message > valuable? If there was value to be had here, I'd expect that a error report, > sent via UMA, such as in a histogram would be much more useful (informative). Yes indeed! Added a UMA_HISTOGRAM_BOOLEAN to sample this case. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:203: callbacks->swap(callbacks_); On 2012/04/24 18:37:40, jar wrote: > What thread are you running on? Are you ensured to be on the UI thread, where > callbacks is accessed? This happens in the blocking pool, but the callbacks are passed to UI before being invoked. The |callbacks_| vector isn't modified from UI anymore after signaling the WaitableEvent. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.h (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.h:26: virtual bool GetMachineStatistic(const std::string& name, On 2012/04/24 18:37:40, jar wrote: > nit: I had to read the comment a few times to parse it and possibly undrestand > it. For instance, I still don't know the significance of "This does not update > the statistics.". Plausibly better comment (for current API) would be: > > /* Get the machine statistic with a key of |name| (e.g., "hardware_class"). > Returns true if the key was found, and data was returned in |result|. Returns > false otherwise, and the contents of |result| are undefined. This call may > block for an arbitrary length of time (waiting for statistics to load, or a > timeout to take place). If this function is called anytime after WhenReady() > has invoked any callback, then this function is guaranteed not to block. > */ Thanks, that's a much better wording :-) > > Reading that description, it is IMO clear that the interface *should* specify > this timeout, and that would perchance be more valuable than providing an async > interface in this case. With a timeout provided in this call, a polling loop > could easily get the data. Given that this data is *not* something that a user > needs to wait for, and I'm *guessing* that that the call is cheap and rarely > used, a polling loop (which keeps posting delayed tasks to retry) sure sounds > like a nice solution to avoid this whole problem. It seems to me that both approaches would be similar from the client's perspective; in both cases, the client needs to have a method that either handles the callback task or that retries polling. The callback interface doesn't require rewriting clients that don't care about the timeout though, and that's why it was the first choice. wdyt? http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.h:32: virtual void WhenReady(const base::Closure& callback) = 0; On 2012/04/24 18:37:40, jar wrote: > You should probably use a more standard pattern, and just register an observer > that will be notified. I've been encouraged in other CLs to use callbacks when possible. Observers are more appropriate when there is more than 1 callback method, or when there's a need to unregister them. Callbacks work well for one-shot notifications, as is the case here. Do you strongly prefer an Observer for this case?
http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:137: return false; I think what you're saying is that you plan to remove this API... and hence you don't want to fix it. Is that correct? On 2012/04/27 09:47:48, Joao da Silva wrote: > On 2012/04/24 18:37:40, jar wrote: > > This API makes it impossible to tell whether there was a time-out, or if the > > value was not present. Hence you can't tell if it is "worth asking again." > > This problem doesn't exist with the async notification; every user of this > interface should be refactored to use the async interface, and the WaitableEvent > should be removed. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:203: callbacks->swap(callbacks_); It appears as though the line: callbacks_.push_back(callback); can be executed asynchronously on the UI thread. Keep in mind that the test that preceeds it (by several lines) may be executed befroe you Signal() on line 196 (since we are on different threads). It is not clear to me what will happen if the swap() is being executed at the same time as the push is being attempted. It is plausible that you'd be safe, but it is also quite likely that we'd corrupt memory. What am I missing? On 2012/04/27 09:47:48, Joao da Silva wrote: > On 2012/04/24 18:37:40, jar wrote: > > What thread are you running on? Are you ensured to be on the UI thread, where > > callbacks is accessed? > > This happens in the blocking pool, but the callbacks are passed to UI before > being invoked. The |callbacks_| vector isn't modified from UI anymore after > signaling the WaitableEvent. http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.h (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.h:32: virtual void WhenReady(const base::Closure& callback) = 0; On 2012/04/27 09:47:48, Joao da Silva wrote: > On 2012/04/24 18:37:40, jar wrote: > > You should probably use a more standard pattern, and just register an observer > > that will be notified. > > I've been encouraged in other CLs to use callbacks when possible. Observers are > more appropriate when there is more than 1 callback method, or when there's a > need to unregister them. Callbacks work well for one-shot notifications, as is > the case here. Do you strongly prefer an Observer for this case? I like callbacks better, as you noted, when there is only one callback method. In this case, you implemented a stack of callbacks, and a push, which suggested you wanted to have a multitude of callbacks. I'd prefer to see exactly one callback... with a CHECK if we ever had more than one. If you want support for a multitude of callbacks, I'd rather see you use the observer pattern.
http://codereview.chromium.org/10078017/diff/48001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/48001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:815: void MetricsService::OnStatisticsProviderTimeout() { nit: Perhaps log a histogram/event for the timeout case as well? http://codereview.chromium.org/10078017/diff/48001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:818: // ready. It looks like, with this setup, we'll never record the hardware class if we reach the timeout, even for ongoing uploads. Am I misreading the code? If not, it would be nice to still listen for the "ready" event from the StatisticsProvider, just not have it block the init path, in the case of a timeout. http://codereview.chromium.org/10078017/diff/48001/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.h (right): http://codereview.chromium.org/10078017/diff/48001/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.h:425: base::WeakPtrFactory<MetricsService> timeout_task_factory_; nit: Why not just use self_ptr_factory_ here?
Joao, what's the status of this CL?
On Mon, May 7, 2012 at 10:15 PM, <isherman@chromium.org> wrote: > Joao, what's the status of this CL? > I still plan to upload a new version after finishing other work, probably this week. > > https://chromiumcodereview.**appspot.com/10078017/<https://chromiumcodereview... >
A CL back from the dead! :-) @isherman, @jar, @petkov: Please have another look I've introduced an Observer interface as requested by Jim, and simplified the timeout code for the StatisticsProvider in MetricsService, using CancelableCallbacks. https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr... File chrome/browser/chromeos/system/statistics_provider.cc (right): https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr... chrome/browser/chromeos/system/statistics_provider.cc:137: return false; On 2012/04/27 18:00:38, jar wrote: > I think what you're saying is that you plan to remove this API... and hence you > don't want to fix it. Is that correct? > This API is used in a couple of places that aren't straightforward to refactor to wait for the asynchronous notification. For example, StartupCustomizationDocument (chrome/browser/chromeos/customization_document.cc) call this from the ctor, and I don't see an easy way to make it wait for an async notification. So for now I'd like to fix the problem at hand (empty hardware_class in UMA reports), and introduce an API that clients who care should use :-) > On 2012/04/27 09:47:48, Joao da Silva wrote: > > On 2012/04/24 18:37:40, jar wrote: > > > This API makes it impossible to tell whether there was a time-out, or if the > > > value was not present. Hence you can't tell if it is "worth asking again." > > > > This problem doesn't exist with the async notification; every user of this > > interface should be refactored to use the async interface, and the > WaitableEvent > > should be removed. > https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr... chrome/browser/chromeos/system/statistics_provider.cc:203: callbacks->swap(callbacks_); On 2012/04/27 18:00:38, jar wrote: > It appears as though the line: > > callbacks_.push_back(callback); > > can be executed asynchronously on the UI thread. Keep in mind that the test > that preceeds it (by several lines) may be executed befroe you Signal() on line > 196 (since we are on different threads). > > It is not clear to me what will happen if the swap() is being executed at the > same time as the push is being attempted. It is plausible that you'd be safe, > but it is also quite likely that we'd corrupt memory. > > What am I missing? You're totally right. I tried to make this a static method to execute on the blocking pool and pass it a WeakPtr created from UI so that it could signal completion to UI, and pass the parsed data along the callback. However that'd mean that whoever is blocking in GetMachineStatistic() won't see the data before this task runs in UI too, so I gave up that approach. The current solution is to post back to UI a task just to invoke the callbacks, but keep the WaitableEvent to unblock whoever might be waiting at GetMachineStatistic() on UI. WDYT? (I'd really really like to get the whole thing right, but it's not easy at all to modify callers of GetMachineStatistic() that expect it to be a synchronous call...) > > On 2012/04/27 09:47:48, Joao da Silva wrote: > > On 2012/04/24 18:37:40, jar wrote: > > > What thread are you running on? Are you ensured to be on the UI thread, > where > > > callbacks is accessed? > > > > This happens in the blocking pool, but the callbacks are passed to UI before > > being invoked. The |callbacks_| vector isn't modified from UI anymore after > > signaling the WaitableEvent. > https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr... File chrome/browser/chromeos/system/statistics_provider.h (right): https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr... chrome/browser/chromeos/system/statistics_provider.h:32: virtual void WhenReady(const base::Closure& callback) = 0; On 2012/04/27 18:00:38, jar wrote: > On 2012/04/27 09:47:48, Joao da Silva wrote: > > On 2012/04/24 18:37:40, jar wrote: > > > You should probably use a more standard pattern, and just register an > observer > > > that will be notified. > > > > I've been encouraged in other CLs to use callbacks when possible. Observers > are > > more appropriate when there is more than 1 callback method, or when there's a > > need to unregister them. Callbacks work well for one-shot notifications, as is > > the case here. Do you strongly prefer an Observer for this case? > > I like callbacks better, as you noted, when there is only one callback method. > In this case, you implemented a stack of callbacks, and a push, which suggested > you wanted to have a multitude of callbacks. I'd prefer to see exactly one > callback... with a CHECK if we ever had more than one. > > If you want support for a multitude of callbacks, I'd rather see you use the > observer pattern. Added an observer as suggested. https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met... File chrome/browser/metrics/metrics_service.cc (right): https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met... chrome/browser/metrics/metrics_service.cc:815: void MetricsService::OnStatisticsProviderTimeout() { On 2012/04/30 07:36:41, Ilya Sherman wrote: > nit: Perhaps log a histogram/event for the timeout case as well? Good idea. I've moved these histograms to the StatisticsProvider code and added this one there as well. https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met... chrome/browser/metrics/metrics_service.cc:818: // ready. On 2012/04/30 07:36:41, Ilya Sherman wrote: > It looks like, with this setup, we'll never record the hardware class if we > reach the timeout, even for ongoing uploads. Am I misreading the code? If not, > it would be nice to still listen for the "ready" event from the > StatisticsProvider, just not have it block the init path, in the case of a > timeout. The WhenReady() callback is still pending in this case, and will update hardware_class_ once it executes. This code has changed meanwhile. https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met... File chrome/browser/metrics/metrics_service.h (right): https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met... chrome/browser/metrics/metrics_service.h:425: base::WeakPtrFactory<MetricsService> timeout_task_factory_; On 2012/04/30 07:36:41, Ilya Sherman wrote: > nit: Why not just use self_ptr_factory_ here? Because self_ptr_factory_ is used to post other tasks as well, and they shouldn't be invalidated when we only want to invalidate the timeout. This has been redone due to the new Observer interface at the StatisticsProvider. I've found it clearer to use CancelableCallbacks now.
metrics/ changes LGTM with nits https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/met... File chrome/browser/metrics/metrics_service.cc (right): https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/met... chrome/browser/metrics/metrics_service.cc:790: #endif nit: Please add " // #if defined(OS_CHROMEOS)" https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/met... chrome/browser/metrics/metrics_service.cc:804: void MetricsService::OnStatisticsProviderTimeout() { nit: DCHECK to make sure we're still on the UI thread? https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/met... chrome/browser/metrics/metrics_service.cc:812: #endif nit: Ditto https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/met... File chrome/browser/metrics/metrics_service.h (right): https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/met... chrome/browser/metrics/metrics_service.h:460: base::CancelableClosure on_ready_callback_; nit: I'd recommend either prefixing both callbacks with "on_", or neither.
@jar, @petkov: friendly ping :-) http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:790: #endif On 2012/05/16 01:01:17, Ilya Sherman wrote: > nit: Please add " // #if defined(OS_CHROMEOS)" Done. http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:804: void MetricsService::OnStatisticsProviderTimeout() { On 2012/05/16 01:01:17, Ilya Sherman wrote: > nit: DCHECK to make sure we're still on the UI thread? Done. http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:812: #endif On 2012/05/16 01:01:17, Ilya Sherman wrote: > nit: Ditto Done. http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.h (right): http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.h:460: base::CancelableClosure on_ready_callback_; On 2012/05/16 01:01:17, Ilya Sherman wrote: > nit: I'd recommend either prefixing both callbacks with "on_", or neither. Renamed to on_timeout_callback_.
LGTM AFAICT. It might be nice to reduce the CHROMEOS ifdefs but I assume you've looked into that already.
Jim, what do you think of the current patch? (friendly ping! :-) )
I'm missing, and/or confused by, some of the logic. Thanks in advance for explaining. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:146: if (on_statistics_loaded_.IsSignaled()) { This confuses me a bit, as we're doing signalling, and setting up an observer. I'd expect to see one or the other. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:149: observer_list_.AddObserver(observer); I see an observer optionally being added... but I don't see the corresponding call to remove the observer, presumably when the callback is made. Note that if you make the callback on line 147, you effectively remove (actually, avoid adding) the observer. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:246: FOR_EACH_OBSERVER(Observer, observer_list_, OnMachineStatisticsReady()); This, for instance, would be a great place to remove the observers (after we've done the callbacks). http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:417: chromeos::system::StatisticsProvider::GetInstance()->RemoveObserver(this); Are we sure it was added before we call this remove? http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:787: chromeos::system::StatisticsProvider::GetInstance()->AddObserver(this); I *think* the semantics of AddObserver() are such that they sometimes synchronously call the completion, and other times they actually add to the list. This leaves us unsure if we need to do a remove later. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:800: if (!on_ready_callback_.IsCancelled()) This cancellable callback looks more complex than need be. This code is only called on the UI thread, and the Run() method on the next line, if it runs at all, will call InitTaskGetPluginInfo() (just like we do in the OnStatisticsProviderTimeout(). Why don't we just have boolean like: get_plugin_info_was_called_ What is the complexity I'm missing? http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:941: MessageLoop::current()->PostDelayedTask( Why are we now willing to do this on the current (UI??) thread?
Thanks for your comments Jim. Please see inline; I'd like your comments on the comments before continuing :-) http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:146: if (on_statistics_loaded_.IsSignaled()) { On 2012/05/21 18:10:02, jar wrote: > This confuses me a bit, as we're doing signalling, and setting up an observer. > I'd expect to see one or the other. Removing the signal would require refactoring all the current clients to use the async interface, which isn't feasible in one go. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:149: observer_list_.AddObserver(observer); On 2012/05/21 18:10:02, jar wrote: > I see an observer optionally being added... but I don't see the corresponding > call to remove the observer, presumably when the callback is made. Note that if > you make the callback on line 147, you effectively remove (actually, avoid > adding) the observer. Right, this is a bit confusing. Removing an observer that wasn't added is OK, therefore this code. But observers that go away before the signal is ready should be able to unregister themselves. This is a one-shot event, and adding an observer after the signal is ready won't do anything, since it won't be notified again. I think the callback version was clearer for this; "Observers" could just invalidate the callback (using CancellableCallback or invalidating a WeakPtr), and wouldn't have to worry about unregistering. I see two options now: (1) automatically remove observers right after notifying them. RemoveObserver() is still required to remove observers that go away before the statistics are ready. (2) go back to the callback version :-) I prefer (2); what's your opinion? http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:417: chromeos::system::StatisticsProvider::GetInstance()->RemoveObserver(this); On 2012/05/21 18:10:02, jar wrote: > Are we sure it was added before we call this remove? Removing an observer that isn't registered is OK. Otherwise we'd have to track whether we are currently observing... http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:800: if (!on_ready_callback_.IsCancelled()) On 2012/05/21 18:10:02, jar wrote: > This cancellable callback looks more complex than need be. > > This code is only called on the UI thread, and the Run() method on the next > line, if it runs at all, will call InitTaskGetPluginInfo() (just like we do in > the OnStatisticsProviderTimeout(). > > Why don't we just have boolean like: > get_plugin_info_was_called_ > > What is the complexity I'm missing? That's another way to do it, I just wanted to avoid adding that state. I'll refactor this like that. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:941: MessageLoop::current()->PostDelayedTask( On 2012/05/21 18:10:02, jar wrote: > Why are we now willing to do this on the current (UI??) thread? Because this isn't a blocking operation anymore, it just sets up the timeout task and registers an observer.
http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:800: if (!on_ready_callback_.IsCancelled()) Note: You could implement the "was_called" bool inside of the method, so that extraneous calls would be ignored. That might simplify your concerns about state (held in the instance), but it might make it harder to test etc. On 2012/05/22 14:55:25, Joao da Silva wrote: > On 2012/05/21 18:10:02, jar wrote: > > This cancellable callback looks more complex than need be. > > > > This code is only called on the UI thread, and the Run() method on the next > > line, if it runs at all, will call InitTaskGetPluginInfo() (just like we do in > > the OnStatisticsProviderTimeout(). > > > > Why don't we just have boolean like: > > get_plugin_info_was_called_ > > > > What is the complexity I'm missing? > > That's another way to do it, I just wanted to avoid adding that state. I'll > refactor this like that. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:941: MessageLoop::current()->PostDelayedTask( Since it is non-blocking, there should be even less concern about running on the UI thread. I think registering an observer is supposed to be done on the "observer's thread." All notifications are done synchronously on that thread as well. If you want cross-thread notification, or registration, you probably need to use ObserverListThreadSafe... but that might mess things up and get the callback to the registration thread :-/. My guess is that you want all registration and notification to happen on the UI thread... so I think you should go ahead and post to the UI thread still (and not bother with ObserverListThreadSafe). On 2012/05/22 14:55:25, Joao da Silva wrote: > On 2012/05/21 18:10:02, jar wrote: > > Why are we now willing to do this on the current (UI??) thread? > > Because this isn't a blocking operation anymore, it just sets up the timeout > task and registers an observer.
Thanks for the comments Jim. Just one last issue before I make another try at this :-) Please see inline. Thanks! http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:149: observer_list_.AddObserver(observer); On 2012/05/22 14:55:25, Joao da Silva wrote: > On 2012/05/21 18:10:02, jar wrote: > > I see an observer optionally being added... but I don't see the corresponding > > call to remove the observer, presumably when the callback is made. Note that > if > > you make the callback on line 147, you effectively remove (actually, avoid > > adding) the observer. > > Right, this is a bit confusing. Removing an observer that wasn't added is OK, > therefore this code. But observers that go away before the signal is ready > should be able to unregister themselves. > > This is a one-shot event, and adding an observer after the signal is ready won't > do anything, since it won't be notified again. I think the callback version was > clearer for this; "Observers" could just invalidate the callback (using > CancellableCallback or invalidating a WeakPtr), and wouldn't have to worry about > unregistering. > > I see two options now: > > (1) automatically remove observers right after notifying them. RemoveObserver() > is still required to remove observers that go away before the statistics are > ready. > > (2) go back to the callback version :-) > > I prefer (2); what's your opinion? I'd like your take on this matter before proceeding further, please comment. Thanks! http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met... chrome/browser/metrics/metrics_service.cc:941: MessageLoop::current()->PostDelayedTask( On 2012/05/22 17:07:58, jar wrote: > Since it is non-blocking, there should be even less concern about running on the > UI thread. > > I think registering an observer is supposed to be done on the "observer's > thread." All notifications are done synchronously on that thread as well. I may be misunderstanding, but that is the case. The observer is added in the UI thread, and the notification also comes in the UI thread. > > If you want cross-thread notification, or registration, you probably need to use > ObserverListThreadSafe... but that might mess things up and get the callback to > the registration thread :-/. > > My guess is that you want all registration and notification to happen on the UI > thread... so I think you should go ahead and post to the UI thread still (and > not bother with ObserverListThreadSafe). Alright, I'll leave it as is then. > > > On 2012/05/22 14:55:25, Joao da Silva wrote: > > On 2012/05/21 18:10:02, jar wrote: > > > Why are we now willing to do this on the current (UI??) thread? > > > > Because this isn't a blocking operation anymore, it just sets up the timeout > > task and registers an observer. >
http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:149: observer_list_.AddObserver(observer); The "callback" version was re-inventing the wheel... and probably re-inventing all the evolutionary bugs. I think the current direction is better. To maintain symmetry, I'd suggest you remove the observer after you notify it, so that the end result is the same for both immediate notification, or async notification. You are right, you still have to clean up, just in case. On 2012/05/22 14:55:25, Joao da Silva wrote: > On 2012/05/21 18:10:02, jar wrote: > > I see an observer optionally being added... but I don't see the corresponding > > call to remove the observer, presumably when the callback is made. Note that > if > > you make the callback on line 147, you effectively remove (actually, avoid > > adding) the observer. > > Right, this is a bit confusing. Removing an observer that wasn't added is OK, > therefore this code. But observers that go away before the signal is ready > should be able to unregister themselves. > > This is a one-shot event, and adding an observer after the signal is ready won't > do anything, since it won't be notified again. I think the callback version was > clearer for this; "Observers" could just invalidate the callback (using > CancellableCallback or invalidating a WeakPtr), and wouldn't have to worry about > unregistering. > > I see two options now: > > (1) automatically remove observers right after notifying them. RemoveObserver() > is still required to remove observers that go away before the statistics are > ready. > > (2) go back to the callback version :-) > > I prefer (2); what's your opinion? http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy... chrome/browser/chromeos/system/statistics_provider.cc:246: FOR_EACH_OBSERVER(Observer, observer_list_, OnMachineStatisticsReady()); On 2012/05/21 18:10:02, jar wrote: > This, for instance, would be a great place to remove the observers (after we've > done the callbacks). Note that it would get messy if you tried to use async registration from other threads while you were iterating here. Again, single thread registration and notification seems like a much clearer picture. |