Issue 10078017: Added asynchronous notification of readiness to the StatisticsProvider, and

Joao da Silva

We're seeing a lot of UMA reports with hardware_class missing, and we now suspect it's ...

8 years, 8 months ago (2012-04-19 19:08:47 UTC) #1

Joao da Silva

(cc'ing @petkov, who helped investigate this issue)

8 years, 8 months ago (2012-04-19 19:09:20 UTC) #2

Ilya Sherman

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc#newcode765 chrome/browser/metrics/metrics_service.cc:765: self)); Does the WhenReady() call have any sort of ...

8 years, 8 months ago (2012-04-19 19:35:32 UTC) #3

Joao da Silva

Thanks for jumping in Ilya! Please see replies inline. http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc#newcode765 chrome/browser/metrics/metrics_service.cc:765: ...

8 years, 8 months ago (2012-04-19 19:54:33 UTC) #4

Thanks for jumping in Ilya! Please see replies inline.

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
chrome/browser/metrics/metrics_service.cc:765: self));
On 2012/04/19 19:35:32, Ilya Sherman wrote:
> Does the WhenReady() call have any sort of timeout?  I'd be worried about
adding
> any code to the metrics initialization critical path that can block
> indefinitely...

StatisticsProvider is a singleton whose ctor immediately posts to FILE to load
stats from the output of a cmd line tool. WhenReady() fires its callbacks once
this tool returns.

The problem we're seeing is that some hardware is taking very long to initialize
these stats (a couple seconds more than the 30 secs delay for this task). If the
tool didn't ever return it'd completely block the FILE thread, and there'd be
larger problems in that case :-) So this is expected to slightly delay the
MetricsService initialization on only a couple of flaky hardware setups, but
won't affect other cases. It will eventually start in any case.

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
chrome/browser/metrics/metrics_service.cc:854: self_ptr_factory_.GetWeakPtr()),
On 2012/04/19 19:35:32, Ilya Sherman wrote:
> nit: No need to use a weak pointer if this is all happening on the UI thread
--
> or am I missing something?

The WeakPtr is bound to the callback task passed to WhenReady(), which is posted
with a long delay (30 secs). Shutdown might have started when that task
executes, and |this| might have been deleted by then (unlikely, but possible).

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
File chrome/browser/metrics/metrics_service.h (right):

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
chrome/browser/metrics/metrics_service.h:163: void OnStatisticsProviderReady();
On 2012/04/19 19:35:32, Ilya Sherman wrote:
> nit: This is #ifdef'd in the implementation file, but not here in the header. 
> The two should probably be consistent.

Of course, good catch. Done.

Ilya Sherman

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc#newcode765 chrome/browser/metrics/metrics_service.cc:765: self)); On 2012/04/19 19:54:33, Joao da Silva wrote: > ...

8 years, 8 months ago (2012-04-19 20:16:53 UTC) #5

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
chrome/browser/metrics/metrics_service.cc:765: self));
On 2012/04/19 19:54:33, Joao da Silva wrote:
> On 2012/04/19 19:35:32, Ilya Sherman wrote:
> > Does the WhenReady() call have any sort of timeout?  I'd be worried about
> adding
> > any code to the metrics initialization critical path that can block
> > indefinitely...
> 
> StatisticsProvider is a singleton whose ctor immediately posts to FILE to load
> stats from the output of a cmd line tool. WhenReady() fires its callbacks once
> this tool returns.
> 
> The problem we're seeing is that some hardware is taking very long to
initialize
> these stats (a couple seconds more than the 30 secs delay for this task). If
the
> tool didn't ever return it'd completely block the FILE thread, and there'd be
> larger problems in that case :-) So this is expected to slightly delay the
> MetricsService initialization on only a couple of flaky hardware setups, but
> won't affect other cases. It will eventually start in any case.

What if the StatisticsProvider's task gets scheduled after a task that takes
minutes to run?  (Sadly, these exist sometimes, especially on the FILE thread.) 
I don't think it's appropriate to block the MetricsService initialization path
for that long.

How about instead not making fetching the hardware class be part of the critical
path, e.g. (a) by calling into StatisticsProvider::GetMachineStatistic() when we
would first use hardware_class_ or (b) notifying the MetricsService when the
data is loaded, rather trying to cache it earlier on?

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
chrome/browser/metrics/metrics_service.cc:854: self_ptr_factory_.GetWeakPtr()),
On 2012/04/19 19:54:33, Joao da Silva wrote:
> On 2012/04/19 19:35:32, Ilya Sherman wrote:
> > nit: No need to use a weak pointer if this is all happening on the UI thread
> --
> > or am I missing something?
> 
> The WeakPtr is bound to the callback task passed to WhenReady(), which is
posted
> with a long delay (30 secs). Shutdown might have started when that task
> executes, and |this| might have been deleted by then (unlikely, but possible).

Fair enough.

jar (doing other things)

http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/metrics_service.h File chrome/browser/metrics/metrics_service.h (right): http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/metrics_service.h#newcode164 chrome/browser/metrics/metrics_service.h:164: void OnStatisticsProviderReady(); Rather than conditionally defining a method... and ...

8 years, 8 months ago (2012-04-20 00:37:39 UTC) #8

Joao da Silva

Thanks for the comments! Please have another look. http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics_service.cc#newcode765 chrome/browser/metrics/metrics_service.cc:765: self)); ...

8 years, 8 months ago (2012-04-20 11:53:35 UTC) #9

Thanks for the comments! Please have another look.

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/1/chrome/browser/metrics/metrics...
chrome/browser/metrics/metrics_service.cc:765: self));
On 2012/04/19 20:16:54, Ilya Sherman wrote:
> On 2012/04/19 19:54:33, Joao da Silva wrote:
> > On 2012/04/19 19:35:32, Ilya Sherman wrote:
> > > Does the WhenReady() call have any sort of timeout?  I'd be worried about
> > adding
> > > any code to the metrics initialization critical path that can block
> > > indefinitely...
> > 
> > StatisticsProvider is a singleton whose ctor immediately posts to FILE to
load
> > stats from the output of a cmd line tool. WhenReady() fires its callbacks
once
> > this tool returns.
> > 
> > The problem we're seeing is that some hardware is taking very long to
> initialize
> > these stats (a couple seconds more than the 30 secs delay for this task). If
> the
> > tool didn't ever return it'd completely block the FILE thread, and there'd
be
> > larger problems in that case :-) So this is expected to slightly delay the
> > MetricsService initialization on only a couple of flaky hardware setups, but
> > won't affect other cases. It will eventually start in any case.
> 
> What if the StatisticsProvider's task gets scheduled after a task that takes
> minutes to run?  (Sadly, these exist sometimes, especially on the FILE
thread.) 
> I don't think it's appropriate to block the MetricsService initialization path
> for that long.

There's another CL that moves its task to the BlockingPool instead; this is just
FYI, it's not relevant for this CL.

> 
> How about instead not making fetching the hardware class be part of the
critical
> path, e.g. (a) by calling into StatisticsProvider::GetMachineStatistic() when
we
> would first use hardware_class_ or (b) notifying the MetricsService when the
> data is loaded, rather trying to cache it earlier on?

At first I went for the lazy, straightforward fix :-) (b) sounds like a good
idea to me, and I've read a bit more of the MetricsService code to figure how to
do that correctly. Please have another look.

http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/met...
File chrome/browser/metrics/metrics_service.h (right):

http://codereview.chromium.org/10078017/diff/13001/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.h:164: void OnStatisticsProviderReady();
On 2012/04/20 00:37:39, jar wrote:
> Rather than conditionally defining a method... and having the comments that
> seems very confused... 
> 
> How about always having the methods all the time (all OS versions)... always
> having the comment that described the flow... and then having the interior of
> the method *optionally* do work work, depending on whether we are in CrOS or
> not?
> 
> The comments on the init tasks are confusing to start with (since we need to
set
> several things up... with several callbacks)... and merging in the OS options
to
> cause different flows across these methods seems extra confusing.

Good points; I thought it was acceptable in this class because there were 2
precedents (StartExternalMetrics and LogChromeOSCrash).

The last patch set removes the hardware_class lookup step out of the init flow
completely, and the StatisticsProvider callback has an internal #ifdef as
suggested.

Ilya Sherman

LGTM with a few small nits. Thanks :) http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/system/statistics_provider.cc File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/system/statistics_provider.cc#newcode184 chrome/browser/chromeos/system/statistics_provider.cc:184: VLOG(1) ...

8 years, 8 months ago (2012-04-20 21:18:37 UTC) #10

Joao da Silva

@satorux: friendly ping :-) http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/system/statistics_provider.cc File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/20001/chrome/browser/chromeos/system/statistics_provider.cc#newcode184 chrome/browser/chromeos/system/statistics_provider.cc:184: VLOG(1) << "Finished loading statistics"; ...

8 years, 8 months ago (2012-04-23 09:36:49 UTC) #11

petkov

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc#newcode838 chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( So, now we don't wait for the hardware ...

8 years, 8 months ago (2012-04-23 10:32:54 UTC) #12

Joao da Silva

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc#newcode838 chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( On 2012/04/23 10:32:55, petkov wrote: > So, now ...

8 years, 8 months ago (2012-04-23 11:15:56 UTC) #13

satorux1

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/system/statistics_provider.cc File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/system/statistics_provider.cc#newcode191 chrome/browser/chromeos/system/statistics_provider.cc:191: base::Unretained(this))); base::Unretained should generally be avoided. Can we ensure ...

8 years, 8 months ago (2012-04-23 16:45:48 UTC) #14

satorux1

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/system/statistics_provider.cc File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/system/statistics_provider.cc#newcode116 chrome/browser/chromeos/system/statistics_provider.cc:116: on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs)); Ideally, this waitable event stuff should be gone. ...

8 years, 8 months ago (2012-04-23 17:02:26 UTC) #15

Joao da Silva

Satoru: thanks for the comments! Please see inline. Ilya: can you comment on the suggestion ...

8 years, 8 months ago (2012-04-23 17:24:42 UTC) #16

Satoru: thanks for the comments! Please see inline.

Ilya: can you comment on the suggestion proposed by Darin?

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:116:
on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs));
On 2012/04/23 17:02:26, satorux1 wrote:
> Ideally, this waitable event stuff should be gone. IIRC, this code was added
as
> the existing clients were expecting the return values to be retrieved
> synchronously. The right solution was to make all clients code to be
> asynchronous but it was not feasible, as there were many clients...

That's the purpose of this CL, but as you mention there are many clients right
now. I'd like to have this in for a few weeks and see if the empty HWID are
gone.

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:127: void
StatisticsProviderImpl::WhenReady(const base::Closure& callback) {
On 2012/04/23 17:02:26, satorux1 wrote:
> Instead of adding this, wouldn't it be cleaner to add
> StatisticsProviderImpl::GetMachineStatisticAsync() that takes a callback?

I considered both options, and went for this one because some clients read more
than one value and having a callback for each would be cumbersome. A single
callback suffices to indicate that all the statistics have been loaded. wdyt?

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:149: BrowserThread::FILE,
On 2012/04/23 17:02:26, satorux1 wrote:
> FILE thread is now deprecated. We should use
> BrowserThread::PostBlockingPoolTask() instead.
> 

Yes indeed! That has just landed in a related CL :-)

http://codereview.chromium.org/10127009/diff/22001/chrome/browser/chromeos/sy...

> Perhaps this is sufficient to fix the problem we are seeing?

It's quite possible that some long-running task on FILE is delaying the
initialization of this class. In any case, the interface for clients should be
asynchronous and shouldn't block on the WaitableEvent from UI.

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:191:
base::Unretained(this)));
On 2012/04/23 16:45:49, satorux1 wrote:
> base::Unretained should generally be avoided. Can we ensure that this object
is
> alive when the callback is run? Proving it is often difficult. Instead, I'd
> suggest to copy callbacks like this:
> 
> 
> namespace {
> void NotifyOnUI(const std::vector<base::Closure>& callbacks) {
>   ...
> }
> }  // namespace
> 
> This should work as callbacks are copyable. then we don't have to worry about
> base::Unretained().

Done. I've created a vector* here so that it can be swapped with callbacks_ to
avoid copying the closures, and so that it can be passed as a pointer into the
function. The callback owns the memory.

An Unretained() is used above in StartLoadingMachineStatistics() too, btw. Usage
of Unretained() here should be ok since this class is a singleton.

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:241: callback.Run();
On 2012/04/23 16:45:49, satorux1 wrote:
> maybe:
> 
> DCHECK(BrowserThread::CurrentlyOn(BrowserThread::UI));
> BrowserThread::PostTask(BrowserThread::UI, FROM_HERE, callback);

Done.

satorux1

LGTM, with a request for adding some comment: http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/system/statistics_provider.cc File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/system/statistics_provider.cc#newcode127 chrome/browser/chromeos/system/statistics_provider.cc:127: void ...

8 years, 8 months ago (2012-04-23 17:51:01 UTC) #17

LGTM, with a request for adding some comment:

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:127: void
StatisticsProviderImpl::WhenReady(const base::Closure& callback) {
On 2012/04/23 17:24:42, Joao da Silva wrote:
> On 2012/04/23 17:02:26, satorux1 wrote:
> > Instead of adding this, wouldn't it be cleaner to add
> > StatisticsProviderImpl::GetMachineStatisticAsync() that takes a callback?
> 
> I considered both options, and went for this one because some clients read
more
> than one value and having a callback for each would be cumbersome. A single
> callback suffices to indicate that all the statistics have been loaded. wdyt?

That makes sense. Could you add some comment about the rational in the function
comment for WhenReady()?

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:191:
base::Unretained(this)));
On 2012/04/23 17:24:42, Joao da Silva wrote:
> On 2012/04/23 16:45:49, satorux1 wrote:
> > base::Unretained should generally be avoided. Can we ensure that this object
> is
> > alive when the callback is run? Proving it is often difficult. Instead, I'd
> > suggest to copy callbacks like this:
> > 
> > 
> > namespace {
> > void NotifyOnUI(const std::vector<base::Closure>& callbacks) {
> >   ...
> > }
> > }  // namespace
> > 
> > This should work as callbacks are copyable. then we don't have to worry
about
> > base::Unretained().
> 
> Done. I've created a vector* here so that it can be swapped with callbacks_ to
> avoid copying the closures, and so that it can be passed as a pointer into the
> function. The callback owns the memory.

I'd personally think it's an unnecessary optimization. copying a small number of
callbacks is cheap and simpler.


> An Unretained() is used above in StartLoadingMachineStatistics() too, btw.
Usage
> of Unretained() here should be ok since this class is a singleton.

Yes, but singletons are deleted in the end. It should be done after the UI
thread loop ends, but we usually don't want to rely on a subtle assumption like
this. :)

Ilya Sherman

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc#newcode838 chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( On 2012/04/23 11:15:56, Joao da Silva wrote: > ...

8 years, 8 months ago (2012-04-23 22:44:13 UTC) #18

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:838:
chromeos::system::StatisticsProvider::GetInstance()->WhenReady(
On 2012/04/23 11:15:56, Joao da Silva wrote:
> On 2012/04/23 10:32:55, petkov wrote:
> > So, now we don't wait for the hardware class to become available, right? How
> > does this solve the potential problem with missing HWIDs?
> > 
> > If initialization is not complete, I think we still collect metrics (I may
be
> > wrong), we just don't send them, no? So maybe we can wait for the HWID with
> some
> > increased timeout -- a couple of minutes, or whatever.
> 
> @isherman: if histograms are still collected even when the MetricsService is
not
> initialized, then making the MetricsService wait for the StatisticsProvider to
> become ready should not be a problem. Is that the case? WDYT?

Data is collected independently of uploading, but the initial upload includes
critical heartbeat metrics like crash & stability counts and other basic sanity
checks.  Jim can correct me on this, but I don't think the hardware class is
critical enough for us to want to potentially block reporting of these heartbeat
metrics on fetching that data.

Note that with your change, the hardware class will be included with the
recurring logs as soon as it is available.  In contrast, previously, if fetching
the hardware class failed for the initial upload, it would never be properly
set.

Since UMA users also have a stable client_id on each device, a single client_id
should map to exactly one hardware class.  Hence, if you need to, you should be
able to scan over the uploaded logs and fill in any missing hardware class info
in earlier uploads based on the data in later uploads.  Does that address your
concern?

petkov

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc#newcode838 chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( I do queries on UMA data extremely rarely ...

8 years, 8 months ago (2012-04-24 07:42:18 UTC) #19

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:838:
chromeos::system::StatisticsProvider::GetInstance()->WhenReady(
I do queries on UMA data extremely rarely and I don't know how easy it is to
match a HWIDs with client_id's. Maybe you can loop in interested PMs/TPMs. This
may also require changes to quite a few server-side query tools and dashboards.

We don't want to make it difficult to match critical data included in the
initial report -- stability info, boot times, etc. -- with the hardware platform
it is coming from.

What's the downside with delaying the initial report? It will either be sent a
few seconds or minutes later, or it will be sent on the next Chrome start,
right?

On 2012/04/23 22:44:13, Ilya Sherman wrote:
> On 2012/04/23 11:15:56, Joao da Silva wrote:
> > On 2012/04/23 10:32:55, petkov wrote:
> > > So, now we don't wait for the hardware class to become available, right?
How
> > > does this solve the potential problem with missing HWIDs?
> > > 
> > > If initialization is not complete, I think we still collect metrics (I may
> be
> > > wrong), we just don't send them, no? So maybe we can wait for the HWID
with
> > some
> > > increased timeout -- a couple of minutes, or whatever.
> > 
> > @isherman: if histograms are still collected even when the MetricsService is
> not
> > initialized, then making the MetricsService wait for the StatisticsProvider
to
> > become ready should not be a problem. Is that the case? WDYT?
> 
> Data is collected independently of uploading, but the initial upload includes
> critical heartbeat metrics like crash & stability counts and other basic
sanity
> checks.  Jim can correct me on this, but I don't think the hardware class is
> critical enough for us to want to potentially block reporting of these
heartbeat
> metrics on fetching that data.
> 
> Note that with your change, the hardware class will be included with the
> recurring logs as soon as it is available.  In contrast, previously, if
fetching
> the hardware class failed for the initial upload, it would never be properly
> set.
> 
> Since UMA users also have a stable client_id on each device, a single
client_id
> should map to exactly one hardware class.  Hence, if you need to, you should
be
> able to scan over the uploaded logs and fill in any missing hardware class
info
> in earlier uploads based on the data in later uploads.  Does that address your
> concern?

petkov

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/metrics_service.cc#newcode838 chrome/browser/metrics/metrics_service.cc:838: chromeos::system::StatisticsProvider::GetInstance()->WhenReady( On 2012/04/24 07:42:18, petkov wrote: > I do ...

8 years, 8 months ago (2012-04-24 07:49:36 UTC) #20

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:838:
chromeos::system::StatisticsProvider::GetInstance()->WhenReady(
On 2012/04/24 07:42:18, petkov wrote:
> I do queries on UMA data extremely rarely and I don't know how easy it is to
> match a HWIDs with client_id's. Maybe you can loop in interested PMs/TPMs.
This
> may also require changes to quite a few server-side query tools and
dashboards.
> 
> We don't want to make it difficult to match critical data included in the
> initial report -- stability info, boot times, etc. -- with the hardware
platform
> it is coming from.
> 
> What's the downside with delaying the initial report? It will either be sent a
> few seconds or minutes later, or it will be sent on the next Chrome start,
> right?

Of course, this needs to be checked. Maybe the initial report will be sent 20
minutes later or something if initialization hasn't finished. But that can be
fixed too, I guess.

> 
> On 2012/04/23 22:44:13, Ilya Sherman wrote:
> > On 2012/04/23 11:15:56, Joao da Silva wrote:
> > > On 2012/04/23 10:32:55, petkov wrote:
> > > > So, now we don't wait for the hardware class to become available, right?
> How
> > > > does this solve the potential problem with missing HWIDs?
> > > > 
> > > > If initialization is not complete, I think we still collect metrics (I
may
> > be
> > > > wrong), we just don't send them, no? So maybe we can wait for the HWID
> with
> > > some
> > > > increased timeout -- a couple of minutes, or whatever.
> > > 
> > > @isherman: if histograms are still collected even when the MetricsService
is
> > not
> > > initialized, then making the MetricsService wait for the
StatisticsProvider
> to
> > > become ready should not be a problem. Is that the case? WDYT?
> > 
> > Data is collected independently of uploading, but the initial upload
includes
> > critical heartbeat metrics like crash & stability counts and other basic
> sanity
> > checks.  Jim can correct me on this, but I don't think the hardware class is
> > critical enough for us to want to potentially block reporting of these
> heartbeat
> > metrics on fetching that data.
> > 
> > Note that with your change, the hardware class will be included with the
> > recurring logs as soon as it is available.  In contrast, previously, if
> fetching
> > the hardware class failed for the initial upload, it would never be properly
> > set.
> > 
> > Since UMA users also have a stable client_id on each device, a single
> client_id
> > should map to exactly one hardware class.  Hence, if you need to, you should
> be
> > able to scan over the uploaded logs and fill in any missing hardware class
> info
> > in earlier uploads based on the data in later uploads.  Does that address
your
> > concern?
>

Joao da Silva

@satorux: addressed your comments, thanks for the review! @isherman, @petkov: let's discuss offline what's the ...

8 years, 8 months ago (2012-04-24 09:47:06 UTC) #21

@satorux: addressed your comments, thanks for the review!

@isherman, @petkov: let's discuss offline what's the right thing to do re
initialization of the MetricsService.

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:127: void
StatisticsProviderImpl::WhenReady(const base::Closure& callback) {
On 2012/04/23 17:51:01, satorux1 wrote:
> On 2012/04/23 17:24:42, Joao da Silva wrote:
> > On 2012/04/23 17:02:26, satorux1 wrote:
> > > Instead of adding this, wouldn't it be cleaner to add
> > > StatisticsProviderImpl::GetMachineStatisticAsync() that takes a callback?
> > 
> > I considered both options, and went for this one because some clients read
> more
> > than one value and having a callback for each would be cumbersome. A single
> > callback suffices to indicate that all the statistics have been loaded.
wdyt?
> 
> That makes sense. Could you add some comment about the rational in the
function
> comment for WhenReady()?

Done.

http://codereview.chromium.org/10078017/diff/29001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:191:
base::Unretained(this)));
On 2012/04/23 17:51:01, satorux1 wrote:
> On 2012/04/23 17:24:42, Joao da Silva wrote:
> > On 2012/04/23 16:45:49, satorux1 wrote:
> > > base::Unretained should generally be avoided. Can we ensure that this
object
> > is
> > > alive when the callback is run? Proving it is often difficult. Instead,
I'd
> > > suggest to copy callbacks like this:
> > > 
> > > 
> > > namespace {
> > > void NotifyOnUI(const std::vector<base::Closure>& callbacks) {
> > >   ...
> > > }
> > > }  // namespace
> > > 
> > > This should work as callbacks are copyable. then we don't have to worry
> about
> > > base::Unretained().
> > 
> > Done. I've created a vector* here so that it can be swapped with callbacks_
to
> > avoid copying the closures, and so that it can be passed as a pointer into
the
> > function. The callback owns the memory.
> 
> I'd personally think it's an unnecessary optimization. copying a small number
of
> callbacks is cheap and simpler.
> 

Right, done.

> 
> > An Unretained() is used above in StartLoadingMachineStatistics() too, btw.
> Usage
> > of Unretained() here should be ok since this class is a singleton.
> 
> Yes, but singletons are deleted in the end. It should be done after the UI
> thread loop ends, but we usually don't want to rely on a subtle assumption
like
> this. :)

jar (doing other things)

Do we have a histogram showing how long it takes to get this interface ready ...

8 years, 8 months ago (2012-04-24 16:09:53 UTC) #22

jar (doing other things)

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/system/statistics_provider.cc File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/system/statistics_provider.cc#newcode108 chrome/browser/chromeos/system/statistics_provider.cc:108: const std::string& name, std::string* result) { nit: arguments should ...

8 years, 8 months ago (2012-04-24 18:37:40 UTC) #23

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:108: const std::string&
name, std::string* result) {
nit: arguments should be aligned, one per line.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:121: LOG(WARNING) <<
"Waiting to load statistics. Requested statistic: "
To reduce the size of the binary, brett made a big sweep getting rid of LOG
messages, and turning them into DLOG.  Messages like the above probably never
get looked at, clog the logs, and tend to provide little to no value.

I suspect other LOG messages below are also worth considering transition to
DLOG.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:123:
on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs));
IMO, the timeout should definitely be part of the function argument list.  The
caller is in a much better position to decide how long it wishes to wait, and
having this unknown time-out (unknown to the caller) makes this very
questionable to use.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:137: return false;
This API makes it impossible to tell whether there was a time-out, or if the
value was not present.  Hence you can't tell if it is "worth asking again."

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:172: LOG(WARNING) <<
"There were errors parsing the output of "
Shouldn't there be action taken to rectify this situation?  Is this LOG message
valuable? If there was value to be had here, I'd expect that a error report,
sent via UMA, such as in a histogram would be much more useful (informative).

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:203:
callbacks->swap(callbacks_);
What thread are you running on?  Are you ensured to be on the UI thread, where
callbacks is accessed?

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.h (right):

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.h:26: virtual bool
GetMachineStatistic(const std::string& name,
nit: I had to read the comment a few times to parse it and possibly undrestand
it.  For instance, I still don't know the significance of "This does not update
the statistics.".  Plausibly better comment (for current API) would be:

/* Get the machine statistic with a key of |name| (e.g., "hardware_class"). 
Returns true if the key was found, and data was returned in |result|.  Returns
false otherwise, and the contents of |result| are undefined.  This call may
block for an arbitrary length of time (waiting for statistics to load, or a
timeout to take place).  If this function is called anytime after WhenReady()
has invoked any callback, then this function is guaranteed not to block. 
*/

Reading that description, it is IMO clear that the interface *should* specify
this timeout, and that would perchance be more valuable than providing an async
interface in this case.  With a timeout provided in this call, a polling loop
could easily get the data.  Given that this data is *not* something that a user
needs to wait for, and I'm *guessing* that that the call is cheap and rarely
used, a polling loop (which keeps posting delayed tasks to retry) sure sounds
like a nice solution to avoid this whole problem.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.h:32: virtual void
WhenReady(const base::Closure& callback) = 0;
You should probably use a more standard pattern, and just register an observer
that will be notified.

Joao da Silva

Thanks all for the comments here and via email. Please have another look at this ...

8 years, 8 months ago (2012-04-27 09:47:47 UTC) #24

Thanks all for the comments here and via email. Please have another look at this
CL :-)

Histograms have been added to sample invalid crossystem output, and timeouts
while waiting for the StatisticsProvider.

@petkov: the 3 cases you described are handled:
- valid hardware_classes are used when available;
- invalid crossystem output will use "unknown", and sample a UMA histogram;
- timed out reads will use "(not ready)", and start using hardware_class once
it's available. The current timeout is 5 minutes.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:108: const std::string&
name, std::string* result) {
On 2012/04/24 18:37:40, jar wrote:
> nit: arguments should be aligned, one per line.

Done.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:121: LOG(WARNING) <<
"Waiting to load statistics. Requested statistic: "
On 2012/04/24 18:37:40, jar wrote:
> To reduce the size of the binary, brett made a big sweep getting rid of LOG
> messages, and turning them into DLOG.  Messages like the above probably never
> get looked at, clog the logs, and tend to provide little to no value.
> 
> I suspect other LOG messages below are also worth considering transition to
> DLOG.

Agree, done.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:123:
on_statistics_loaded_.TimedWait(base::TimeDelta::FromSeconds(kTimeoutSecs));
On 2012/04/24 18:37:40, jar wrote:
> IMO, the timeout should definitely be part of the function argument list.  The
> caller is in a much better position to decide how long it wishes to wait, and
> having this unknown time-out (unknown to the caller) makes this very
> questionable to use.

This is definitely questionable and can surprise callers; I think the right
approach for these cases are async interfaces.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:137: return false;
On 2012/04/24 18:37:40, jar wrote:
> This API makes it impossible to tell whether there was a time-out, or if the
> value was not present.  Hence you can't tell if it is "worth asking again."

This problem doesn't exist with the async notification; every user of this
interface should be refactored to use the async interface, and the WaitableEvent
should be removed.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:172: LOG(WARNING) <<
"There were errors parsing the output of "
On 2012/04/24 18:37:40, jar wrote:
> Shouldn't there be action taken to rectify this situation?  Is this LOG
message
> valuable? If there was value to be had here, I'd expect that a error report,
> sent via UMA, such as in a histogram would be much more useful (informative).

Yes indeed! Added a UMA_HISTOGRAM_BOOLEAN to sample this case.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:203:
callbacks->swap(callbacks_);
On 2012/04/24 18:37:40, jar wrote:
> What thread are you running on?  Are you ensured to be on the UI thread, where
> callbacks is accessed?

This happens in the blocking pool, but the callbacks are passed to UI before
being invoked. The |callbacks_| vector isn't modified from UI anymore after
signaling the WaitableEvent.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.h (right):

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.h:26: virtual bool
GetMachineStatistic(const std::string& name,
On 2012/04/24 18:37:40, jar wrote:
> nit: I had to read the comment a few times to parse it and possibly undrestand
> it.  For instance, I still don't know the significance of "This does not
update
> the statistics.".  Plausibly better comment (for current API) would be:
> 
> /* Get the machine statistic with a key of |name| (e.g., "hardware_class"). 
> Returns true if the key was found, and data was returned in |result|.  Returns
> false otherwise, and the contents of |result| are undefined.  This call may
> block for an arbitrary length of time (waiting for statistics to load, or a
> timeout to take place).  If this function is called anytime after WhenReady()
> has invoked any callback, then this function is guaranteed not to block. 
> */

Thanks, that's a much better wording :-)

> 
> Reading that description, it is IMO clear that the interface *should* specify
> this timeout, and that would perchance be more valuable than providing an
async
> interface in this case.  With a timeout provided in this call, a polling loop
> could easily get the data.  Given that this data is *not* something that a
user
> needs to wait for, and I'm *guessing* that that the call is cheap and rarely
> used, a polling loop (which keeps posting delayed tasks to retry) sure sounds
> like a nice solution to avoid this whole problem.

It seems to me that both approaches would be similar from the client's
perspective; in both cases, the client needs to have a method that either
handles the callback task or that retries polling. The callback interface
doesn't require rewriting clients that don't care about the timeout though, and
that's why it was the first choice. wdyt?

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.h:32: virtual void
WhenReady(const base::Closure& callback) = 0;
On 2012/04/24 18:37:40, jar wrote:
> You should probably use a more standard pattern, and just register an observer
> that will be notified.

I've been encouraged in other CLs to use callbacks when possible. Observers are
more appropriate when there is more than 1 callback method, or when there's a
need to unregister them. Callbacks work well for one-shot notifications, as is
the case here. Do you strongly prefer an Observer for this case?

jar (doing other things)

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/system/statistics_provider.cc File chrome/browser/chromeos/system/statistics_provider.cc (right): http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/system/statistics_provider.cc#newcode137 chrome/browser/chromeos/system/statistics_provider.cc:137: return false; I think what you're saying is that ...

8 years, 8 months ago (2012-04-27 18:00:38 UTC) #25

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:137: return false;
I think what you're saying is that you plan to remove this API... and hence you
don't want to fix it.  Is that correct?

On 2012/04/27 09:47:48, Joao da Silva wrote:
> On 2012/04/24 18:37:40, jar wrote:
> > This API makes it impossible to tell whether there was a time-out, or if the
> > value was not present.  Hence you can't tell if it is "worth asking again."
> 
> This problem doesn't exist with the async notification; every user of this
> interface should be refactored to use the async interface, and the
WaitableEvent
> should be removed.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:203:
callbacks->swap(callbacks_);
It appears as though the line:

callbacks_.push_back(callback);

can be executed asynchronously on the UI thread.  Keep in mind that the test
that preceeds it (by several lines) may be executed befroe you Signal() on line
196 (since we are on different threads).

It is not clear to me what will happen if the swap() is being executed at the
same time as the push is being attempted.  It is plausible that you'd be safe,
but it is also quite likely that we'd corrupt memory.

What am I missing?

On 2012/04/27 09:47:48, Joao da Silva wrote:
> On 2012/04/24 18:37:40, jar wrote:
> > What thread are you running on?  Are you ensured to be on the UI thread,
where
> > callbacks is accessed?
> 
> This happens in the blocking pool, but the callbacks are passed to UI before
> being invoked. The |callbacks_| vector isn't modified from UI anymore after
> signaling the WaitableEvent.

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.h (right):

http://codereview.chromium.org/10078017/diff/34001/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.h:32: virtual void
WhenReady(const base::Closure& callback) = 0;
On 2012/04/27 09:47:48, Joao da Silva wrote:
> On 2012/04/24 18:37:40, jar wrote:
> > You should probably use a more standard pattern, and just register an
observer
> > that will be notified.
> 
> I've been encouraged in other CLs to use callbacks when possible. Observers
are
> more appropriate when there is more than 1 callback method, or when there's a
> need to unregister them. Callbacks work well for one-shot notifications, as is
> the case here. Do you strongly prefer an Observer for this case?

I like callbacks better, as you noted, when there is only one callback method. 
In this case, you implemented a stack of callbacks, and a push, which suggested
you wanted to have a multitude of callbacks.  I'd prefer to see exactly one
callback... with a CHECK if we ever had more than one.

If you want support for a multitude of callbacks, I'd rather see you use the
observer pattern.

Ilya Sherman

http://codereview.chromium.org/10078017/diff/48001/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/48001/chrome/browser/metrics/metrics_service.cc#newcode815 chrome/browser/metrics/metrics_service.cc:815: void MetricsService::OnStatisticsProviderTimeout() { nit: Perhaps log a histogram/event for ...

8 years, 7 months ago (2012-04-30 07:36:40 UTC) #26

Joao da Silva

On Mon, May 7, 2012 at 10:15 PM, <isherman@chromium.org> wrote: > Joao, what's the status ...

8 years, 7 months ago (2012-05-07 22:12:51 UTC) #28

Joao da Silva

A CL back from the dead! :-) @isherman, @jar, @petkov: Please have another look I've ...

8 years, 7 months ago (2012-05-15 13:52:46 UTC) #29

A CL back from the dead! :-)

@isherman, @jar, @petkov: Please have another look

I've introduced an Observer interface as requested by Jim, and simplified the
timeout code for the StatisticsProvider in MetricsService, using
CancelableCallbacks.

https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr...
chrome/browser/chromeos/system/statistics_provider.cc:137: return false;
On 2012/04/27 18:00:38, jar wrote:
> I think what you're saying is that you plan to remove this API... and hence
you
> don't want to fix it.  Is that correct?
> 

This API is used in a couple of places that aren't straightforward to refactor
to wait for the asynchronous notification. For example,
StartupCustomizationDocument (chrome/browser/chromeos/customization_document.cc)
call this from the ctor, and I don't see an easy way to make it wait for an
async notification.

So for now I'd like to fix the problem at hand (empty hardware_class in UMA
reports), and introduce an API that clients who care should use :-)

> On 2012/04/27 09:47:48, Joao da Silva wrote:
> > On 2012/04/24 18:37:40, jar wrote:
> > > This API makes it impossible to tell whether there was a time-out, or if
the
> > > value was not present.  Hence you can't tell if it is "worth asking
again."
> > 
> > This problem doesn't exist with the async notification; every user of this
> > interface should be refactored to use the async interface, and the
> WaitableEvent
> > should be removed.
>

https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr...
chrome/browser/chromeos/system/statistics_provider.cc:203:
callbacks->swap(callbacks_);
On 2012/04/27 18:00:38, jar wrote:
> It appears as though the line:
> 
> callbacks_.push_back(callback);
> 
> can be executed asynchronously on the UI thread.  Keep in mind that the test
> that preceeds it (by several lines) may be executed befroe you Signal() on
line
> 196 (since we are on different threads).
> 
> It is not clear to me what will happen if the swap() is being executed at the
> same time as the push is being attempted.  It is plausible that you'd be safe,
> but it is also quite likely that we'd corrupt memory.
> 
> What am I missing?

You're totally right. I tried to make this a static method to execute on the
blocking pool and pass it a WeakPtr created from UI so that it could signal
completion to UI, and pass the parsed data along the callback. However that'd
mean that whoever is blocking in GetMachineStatistic() won't see the data before
this task runs in UI too, so I gave up that approach.

The current solution is to post back to UI a task just to invoke the callbacks,
but keep the WaitableEvent to unblock whoever might be waiting at
GetMachineStatistic() on UI. WDYT?

(I'd really really like to get the whole thing right, but it's not easy at all
to modify callers of GetMachineStatistic() that expect it to be a synchronous
call...)

> 
> On 2012/04/27 09:47:48, Joao da Silva wrote:
> > On 2012/04/24 18:37:40, jar wrote:
> > > What thread are you running on?  Are you ensured to be on the UI thread,
> where
> > > callbacks is accessed?
> > 
> > This happens in the blocking pool, but the callbacks are passed to UI before
> > being invoked. The |callbacks_| vector isn't modified from UI anymore after
> > signaling the WaitableEvent.
>

https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr...
File chrome/browser/chromeos/system/statistics_provider.h (right):

https://chromiumcodereview.appspot.com/10078017/diff/34001/chrome/browser/chr...
chrome/browser/chromeos/system/statistics_provider.h:32: virtual void
WhenReady(const base::Closure& callback) = 0;
On 2012/04/27 18:00:38, jar wrote:
> On 2012/04/27 09:47:48, Joao da Silva wrote:
> > On 2012/04/24 18:37:40, jar wrote:
> > > You should probably use a more standard pattern, and just register an
> observer
> > > that will be notified.
> > 
> > I've been encouraged in other CLs to use callbacks when possible. Observers
> are
> > more appropriate when there is more than 1 callback method, or when there's
a
> > need to unregister them. Callbacks work well for one-shot notifications, as
is
> > the case here. Do you strongly prefer an Observer for this case?
> 
> I like callbacks better, as you noted, when there is only one callback method.

> In this case, you implemented a stack of callbacks, and a push, which
suggested
> you wanted to have a multitude of callbacks.  I'd prefer to see exactly one
> callback... with a CHECK if we ever had more than one.
> 
> If you want support for a multitude of callbacks, I'd rather see you use the
> observer pattern.

Added an observer as suggested.

https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met...
File chrome/browser/metrics/metrics_service.cc (right):

https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met...
chrome/browser/metrics/metrics_service.cc:815: void
MetricsService::OnStatisticsProviderTimeout() {
On 2012/04/30 07:36:41, Ilya Sherman wrote:
> nit: Perhaps log a histogram/event for the timeout case as well?

Good idea. I've moved these histograms to the StatisticsProvider code and added
this one there as well.

https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met...
chrome/browser/metrics/metrics_service.cc:818: // ready.
On 2012/04/30 07:36:41, Ilya Sherman wrote:
> It looks like, with this setup, we'll never record the hardware class if we
> reach the timeout, even for ongoing uploads.  Am I misreading the code?  If
not,
> it would be nice to still listen for the "ready" event from the
> StatisticsProvider, just not have it block the init path, in the case of a
> timeout.

The WhenReady() callback is still pending in this case, and will update
hardware_class_ once it executes. This code has changed meanwhile.

https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met...
File chrome/browser/metrics/metrics_service.h (right):

https://chromiumcodereview.appspot.com/10078017/diff/48001/chrome/browser/met...
chrome/browser/metrics/metrics_service.h:425:
base::WeakPtrFactory<MetricsService> timeout_task_factory_;
On 2012/04/30 07:36:41, Ilya Sherman wrote:
> nit: Why not just use self_ptr_factory_ here?

Because self_ptr_factory_ is used to post other tasks as well, and they
shouldn't be invalidated when we only want to invalidate the timeout.

This has been redone due to the new Observer interface at the
StatisticsProvider. I've found it clearer to use CancelableCallbacks now.

Ilya Sherman

metrics/ changes LGTM with nits https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): https://chromiumcodereview.appspot.com/10078017/diff/70003/chrome/browser/metrics/metrics_service.cc#newcode790 chrome/browser/metrics/metrics_service.cc:790: #endif nit: Please add ...

8 years, 7 months ago (2012-05-16 01:01:17 UTC) #30

Joao da Silva

@jar, @petkov: friendly ping :-) http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/70003/chrome/browser/metrics/metrics_service.cc#newcode790 chrome/browser/metrics/metrics_service.cc:790: #endif On 2012/05/16 01:01:17, ...

8 years, 7 months ago (2012-05-16 12:53:34 UTC) #31

petkov

LGTM AFAICT. It might be nice to reduce the CHROMEOS ifdefs but I assume you've ...

8 years, 7 months ago (2012-05-16 15:05:05 UTC) #32

Joao da Silva

Jim, what do you think of the current patch? (friendly ping! :-) )

8 years, 7 months ago (2012-05-21 16:58:55 UTC) #33

jar (doing other things)

I'm missing, and/or confused by, some of the logic. Thanks in advance for explaining. http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/system/statistics_provider.cc ...

8 years, 7 months ago (2012-05-21 18:10:02 UTC) #34

Joao da Silva

Thanks for your comments Jim. Please see inline; I'd like your comments on the comments ...

8 years, 7 months ago (2012-05-22 14:55:25 UTC) #35

Thanks for your comments Jim. Please see inline; I'd like your comments on the
comments before continuing :-)

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:146: if
(on_statistics_loaded_.IsSignaled()) {
On 2012/05/21 18:10:02, jar wrote:
> This confuses me a bit, as we're doing signalling, and setting up an observer.

> I'd expect to see one or the other.

Removing the signal would require refactoring all the current clients to use the
async interface, which isn't feasible in one go.

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:149:
observer_list_.AddObserver(observer);
On 2012/05/21 18:10:02, jar wrote:
> I see an observer optionally being added... but I don't see the corresponding
> call to remove the observer, presumably when the callback is made.  Note that
if
> you make the callback on line 147, you effectively remove (actually, avoid
> adding) the observer.

Right, this is a bit confusing. Removing an observer that wasn't added is OK,
therefore this code. But observers that go away before the signal is ready
should be able to unregister themselves.

This is a one-shot event, and adding an observer after the signal is ready won't
do anything, since it won't be notified again. I think the callback version was
clearer for this; "Observers" could just invalidate the callback (using
CancellableCallback or invalidating a WeakPtr), and wouldn't have to worry about
unregistering.

I see two options now:

(1) automatically remove observers right after notifying them. RemoveObserver()
is still required to remove observers that go away before the statistics are
ready.

(2) go back to the callback version :-)

I prefer (2); what's your opinion?

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:417:
chromeos::system::StatisticsProvider::GetInstance()->RemoveObserver(this);
On 2012/05/21 18:10:02, jar wrote:
> Are we sure it was added before we call this remove?

Removing an observer that isn't registered is OK. Otherwise we'd have to track
whether we are currently observing...

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:800: if
(!on_ready_callback_.IsCancelled())
On 2012/05/21 18:10:02, jar wrote:
> This cancellable callback looks more complex than need be.
> 
> This code is only called on the UI thread, and the Run() method on the next
> line, if it runs at all, will call InitTaskGetPluginInfo() (just like we do in
> the OnStatisticsProviderTimeout().
> 
> Why don't we just have boolean like:
> get_plugin_info_was_called_
> 
> What is the complexity I'm missing?

That's another way to do it, I just wanted to avoid adding that state. I'll
refactor this like that.

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:941:
MessageLoop::current()->PostDelayedTask(
On 2012/05/21 18:10:02, jar wrote:
> Why are we now willing to do this on the current (UI??) thread?

Because this isn't a blocking operation anymore, it just sets up the timeout
task and registers an observer.

jar (doing other things)

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/metrics_service.cc File chrome/browser/metrics/metrics_service.cc (right): http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/metrics_service.cc#newcode800 chrome/browser/metrics/metrics_service.cc:800: if (!on_ready_callback_.IsCancelled()) Note: You could implement the "was_called" bool ...

8 years, 7 months ago (2012-05-22 17:07:58 UTC) #36

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:800: if
(!on_ready_callback_.IsCancelled())
Note: You could implement the "was_called" bool inside of the method, so that
extraneous calls would be ignored. That might simplify your concerns about state
(held in the instance), but it might make it harder to test etc.

On 2012/05/22 14:55:25, Joao da Silva wrote:
> On 2012/05/21 18:10:02, jar wrote:
> > This cancellable callback looks more complex than need be.
> > 
> > This code is only called on the UI thread, and the Run() method on the next
> > line, if it runs at all, will call InitTaskGetPluginInfo() (just like we do
in
> > the OnStatisticsProviderTimeout().
> > 
> > Why don't we just have boolean like:
> > get_plugin_info_was_called_
> > 
> > What is the complexity I'm missing?
> 
> That's another way to do it, I just wanted to avoid adding that state. I'll
> refactor this like that.

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:941:
MessageLoop::current()->PostDelayedTask(
Since it is non-blocking, there should be even less concern about running on the
UI thread.

I think registering an observer is supposed to be done on the "observer's
thread."  All notifications are done synchronously on that thread as well.

If you want cross-thread notification, or registration, you probably need to use
ObserverListThreadSafe... but that might mess things up and get the callback to
the registration thread :-/.

My guess is that you want all registration and notification to happen on the UI
thread... so I think you should go ahead and post to the UI thread still (and
not bother with ObserverListThreadSafe).

On 2012/05/22 14:55:25, Joao da Silva wrote:
> On 2012/05/21 18:10:02, jar wrote:
> > Why are we now willing to do this on the current (UI??) thread?
> 
> Because this isn't a blocking operation anymore, it just sets up the timeout
> task and registers an observer.

Joao da Silva

Thanks for the comments Jim. Just one last issue before I make another try at ...

8 years, 7 months ago (2012-05-22 19:10:06 UTC) #37

Thanks for the comments Jim. Just one last issue before I make another try at
this :-) Please see inline. Thanks!

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:149:
observer_list_.AddObserver(observer);
On 2012/05/22 14:55:25, Joao da Silva wrote:
> On 2012/05/21 18:10:02, jar wrote:
> > I see an observer optionally being added... but I don't see the
corresponding
> > call to remove the observer, presumably when the callback is made.  Note
that
> if
> > you make the callback on line 147, you effectively remove (actually, avoid
> > adding) the observer.
> 
> Right, this is a bit confusing. Removing an observer that wasn't added is OK,
> therefore this code. But observers that go away before the signal is ready
> should be able to unregister themselves.
> 
> This is a one-shot event, and adding an observer after the signal is ready
won't
> do anything, since it won't be notified again. I think the callback version
was
> clearer for this; "Observers" could just invalidate the callback (using
> CancellableCallback or invalidating a WeakPtr), and wouldn't have to worry
about
> unregistering.
> 
> I see two options now:
> 
> (1) automatically remove observers right after notifying them.
RemoveObserver()
> is still required to remove observers that go away before the statistics are
> ready.
> 
> (2) go back to the callback version :-)
> 
> I prefer (2); what's your opinion?

I'd like your take on this matter before proceeding further, please comment.
Thanks!

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
File chrome/browser/metrics/metrics_service.cc (right):

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/metrics/met...
chrome/browser/metrics/metrics_service.cc:941:
MessageLoop::current()->PostDelayedTask(
On 2012/05/22 17:07:58, jar wrote:
> Since it is non-blocking, there should be even less concern about running on
the
> UI thread.
> 
> I think registering an observer is supposed to be done on the "observer's
> thread."  All notifications are done synchronously on that thread as well.

I may be misunderstanding, but that is the case. The observer is added in the UI
thread, and the notification also comes in the UI thread.

> 
> If you want cross-thread notification, or registration, you probably need to
use
> ObserverListThreadSafe... but that might mess things up and get the callback
to
> the registration thread :-/.
> 
> My guess is that you want all registration and notification to happen on the
UI
> thread... so I think you should go ahead and post to the UI thread still (and
> not bother with ObserverListThreadSafe).

Alright, I'll leave it as is then.

> 
> 
> On 2012/05/22 14:55:25, Joao da Silva wrote:
> > On 2012/05/21 18:10:02, jar wrote:
> > > Why are we now willing to do this on the current (UI??) thread?
> > 
> > Because this isn't a blocking operation anymore, it just sets up the timeout
> > task and registers an observer.
>

jar (doing other things)

8 years, 7 months ago (2012-05-22 20:09:30 UTC) #38

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
File chrome/browser/chromeos/system/statistics_provider.cc (right):

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:149:
observer_list_.AddObserver(observer);
The "callback" version was re-inventing the wheel... and probably re-inventing
all the evolutionary bugs.  I think the current direction is better.

To maintain symmetry, I'd suggest you remove the observer after you notify it,
so that the end result is the same for both immediate notification, or async
notification.

You are right, you still have to clean up, just in case.

On 2012/05/22 14:55:25, Joao da Silva wrote:
> On 2012/05/21 18:10:02, jar wrote:
> > I see an observer optionally being added... but I don't see the
corresponding
> > call to remove the observer, presumably when the callback is made.  Note
that
> if
> > you make the callback on line 147, you effectively remove (actually, avoid
> > adding) the observer.
> 
> Right, this is a bit confusing. Removing an observer that wasn't added is OK,
> therefore this code. But observers that go away before the signal is ready
> should be able to unregister themselves.
> 
> This is a one-shot event, and adding an observer after the signal is ready
won't
> do anything, since it won't be notified again. I think the callback version
was
> clearer for this; "Observers" could just invalidate the callback (using
> CancellableCallback or invalidating a WeakPtr), and wouldn't have to worry
about
> unregistering.
> 
> I see two options now:
> 
> (1) automatically remove observers right after notifying them.
RemoveObserver()
> is still required to remove observers that go away before the statistics are
> ready.
> 
> (2) go back to the callback version :-)
> 
> I prefer (2); what's your opinion?

http://codereview.chromium.org/10078017/diff/70005/chrome/browser/chromeos/sy...
chrome/browser/chromeos/system/statistics_provider.cc:246:
FOR_EACH_OBSERVER(Observer, observer_list_, OnMachineStatisticsReady());
On 2012/05/21 18:10:02, jar wrote:
> This, for instance, would be a great place to remove the observers (after
we've
> done the callbacks).

Note that it would get messy if you tried to use async registration from other
threads while you were iterating here.  Again, single thread registration and
notification seems like a much clearer picture.

Issue 10078017: Added asynchronous notification of readiness to the StatisticsProvider, and (Closed)

Description

Patch Set 1 #

Patch Set 2 : Fixed non-chromeos builds #

Patch Set 3 : Addressed comments #

Patch Set 4 : Load hardware_class when available without delaying init of MetricsService #

Patch Set 5 : Addressed comments, rebased #

Patch Set 6 : Don't use base::Unretained. Rebased. #

Patch Set 7 : Timeout while waiting for the hardware_class; sample UMA stats #

Patch Set 8 : Addressed Jim's comments #

Patch Set 9 : Fixed clang build #

Patch Set 10 : Fixed clang build, reupload #

Patch Set 11 : Upload again... #

Patch Set 12 : . #

Patch Set 13 : Addressed Ilya's comments #

Messages