Created: 5 years, 8 months ago by raymes
Modified: 5 years, 7 months ago
CC: chromium-reviews, erikwright+watch_chromium.org, chrome-apps-syd-reviews_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description
Add histograms to record the number of writes to the prefs file

This adds histograms that record how often we write to the prefs file. The number of writes is recorded for every 5-minute interval of time spent in Chrome.

To avoid doing any work when Chrome is idle and not recording any writes, we only record metrics at the point when a write occurs. Whenever a write occurs, we check whether the last write occurred more than 5 minutes ago. If it did, we report the write count for the previous period of time.
The histogram is called:
Settings.JsonDataWriteCount.<pref filename>
e.g.
Settings.JsonDataWriteCount.Local_State
This is in a similar vein to Settings.JsonDataReadSizeKilobytes.*
BUG=476800
Committed: https://crrev.com/bfb910ab2937285271031b257f3ba248afff6fdc
Cr-Commit-Position: refs/heads/master@{#327446}
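For readers skimming the thread, the reporting scheme in the description is simple enough to sketch. Below is a minimal, hypothetical illustration of that logic in Chromium-style C++; the class name, helper, and histogram parameters are assumptions for illustration, not the code that actually landed (see the patch for that).

```cpp
#include <string>

#include "base/metrics/histogram.h"
#include "base/time/time.h"

// Hypothetical sketch of the scheme described above. Writes are counted per
// 5-minute interval, and samples are only flushed when a write occurs after
// the interval has elapsed, so nothing runs while Chrome is idle.
class WriteCountRecorder {
 public:
  explicit WriteCountRecorder(const std::string& histogram_name)
      : histogram_name_(histogram_name), writes_since_last_report_(0) {}

  void RecordWrite(base::Time now) {
    if (last_report_time_.is_null())
      last_report_time_ = now;

    const base::TimeDelta report_interval = base::TimeDelta::FromMinutes(5);
    const base::TimeDelta elapsed = now - last_report_time_;
    if (elapsed >= report_interval) {
      // Report the count for the first elapsed interval, then one 0 sample
      // for each further full interval in which nothing was written.
      ReportSample(writes_since_last_report_);
      const int idle_intervals = elapsed.InMinutes() / 5 - 1;
      for (int i = 0; i < idle_intervals; ++i)
        ReportSample(0);
      writes_since_last_report_ = 0;
      last_report_time_ = now;
    }
    ++writes_since_last_report_;
  }

 private:
  void ReportSample(int sample) {
    // A max of 30 assumes at most one commit every 10 seconds over 5 minutes;
    // anything above that lands in the overflow bucket.
    base::HistogramBase* histogram = base::Histogram::FactoryGet(
        histogram_name_, 1, 30, 31,
        base::HistogramBase::kUmaTargetedHistogramFlag);
    histogram->Add(sample);
  }

  const std::string histogram_name_;
  base::Time last_report_time_;
  int writes_since_last_report_;
};
```

Flushing only from RecordWrite (rather than from a timer) is what keeps the mechanism free while Chrome is idle, at the cost of the final partial interval never being reported.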
Patch Set 1
Patch Set 2
Total comments: 9
Patch Set 3
Patch Set 4
Patch Set 5
Total comments: 1
Patch Set 6
Patch Set 7
Total comments: 6
Patch Set 8
Total comments: 10
Patch Set 9
Patch Set 10
Patch Set 11
Patch Set 12
Messages
Total messages: 39 (11 generated)
raymes@chromium.org changed reviewers: + mnissler@chromium.org
Mattias: I would like to write a test for this but I wanted to get feedback before spending the time on it. Does this approach seem ok to you? I managed to avoid changing the API by recording all files that get written, but this seems to be ok to do (I followed the pattern of the Settings.JsonDataReadSizeKilobytes.* histograms which are also recorded in this file). I also managed to avoid changing ImportantFileWriter because it already calls back into JsonPrefStore::SerializeData at the point that it writes. Let me know your thoughts. Thanks!
mnissler@chromium.org changed reviewers: + asvitkine@chromium.org
Took a look at the code, but then ended up pondering high-level questions again that we should probably resolve first:

1. Should we write a 0 sample for clients that don't perform a write in a bucket? Otherwise it's hard to distinguish clients that shut down before the bucket starts from those that truly don't have any writes.

2. A simpler approach to this could be to do a sparse histogram that just uses seconds elapsed since startup as the histogram buckets. Whenever we write, we'd just compute the bucket number and add a sample to a single histogram (but still per file name). This should still give us enough data to determine the distribution of writes over time. I'm not sure whether it'll help quantify the number of writes per client. I'm not a UMA expert though, so adding asvitkine@ to comment on the most appropriate way to capture write-frequency stats in UMA.

https://codereview.chromium.org/1083603002/diff/20001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:55: const scoped_refptr<base::SequencedTaskRunner> sequenced_task_runner);
Would be good to give this a name that makes it clear what it's used for (also the member).

base/prefs/json_pref_store.cc:81: uint32_t write_count_;
Why not a plain int?

base/prefs/json_pref_store.cc:96: int64_t start_delay;
Might want to call this start_delay_minutes for clarity?

base/prefs/json_pref_store.cc:101: };
Could make this an unnamed struct. If you don't, put a blank line here.

base/prefs/json_pref_store.cc:553: int32_t max_value = (stop_delay_ - start_delay_) / commit_interval_;
This calculation doesn't work in the presence of PrefService::CommitPendingWrite(). Maybe a sparse histogram is a better choice here?
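To make option 2 concrete, here is a rough sketch of what the sparse-histogram variant could look like. The 30-second bucket granularity and the function shape are assumptions for illustration, not part of the proposal.

```cpp
#include <string>

#include "base/metrics/histogram_base.h"
#include "base/metrics/sparse_histogram.h"
#include "base/time/time.h"

// Option 2 sketched: on every write, record which time-since-startup bucket
// the write fell into. A sparse histogram only allocates buckets that
// actually receive samples, so long-running but mostly idle clients cost
// nothing extra.
void RecordWriteSparse(const std::string& histogram_name,
                       base::Time process_start,
                       base::Time now) {
  const int64_t bucket = (now - process_start).InSeconds() / 30;
  base::HistogramBase* histogram = base::SparseHistogram::FactoryGet(
      histogram_name, base::HistogramBase::kUmaTargetedHistogramFlag);
  histogram->Add(static_cast<int>(bucket));
}
```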
On 2015/04/14 11:58:27, Mattias Nissler wrote:
> Took a look at the code, but then ended up pondering high-level questions again
> that we should probably resolve first:
>
> 1. Should we write a 0 sample for clients that don't perform a write in a
> bucket? Otherwise it's hard to distinguish clients that shut down before the
> bucket starts from those that truly don't have any writes.

I agree, I think we should record the write. I believe the current code will record a 0 sample for clients that don't perform a write in that window.

> 2. A simpler approach to this could be to do a sparse histogram that just uses
> seconds elapsed since startup as the histogram buckets. Whenever we write, we'd
> just compute the bucket number and add a sample to a single histogram (but still
> per file name). This should still give us enough data to determine the
> distribution of writes over time. I'm not sure whether it'll help quantify the
> number of writes per client. I'm not a UMA expert though, so adding asvitkine@
> to comment on the most appropriate way to capture write-frequency stats in UMA.

That's a good idea - I think both stats would be useful and the one you suggest would certainly be simpler to implement. I can't help but think that the stat I suggested would give us a better picture of what is actually going on, even though it's more complicated to implement. I'm certainly open to other people's opinions here. asvitkine@? benwells@?

> base/prefs/json_pref_store.cc:553: int32_t max_value = (stop_delay_ -
> start_delay_) / commit_interval_;
> This calculation doesn't work in the presence of
> PrefService::CommitPendingWrite(). Maybe a sparse histogram is a better choice
> here?

Assuming that in most cases this isn't called, I think the max value still makes sense. I /believe/ there should be an overflow bucket in the histogram which would capture all those that go over the number of max writes. This should give us a good picture of what's going on. E.g. if users are consistently writing more than once every 10 seconds (and hitting max_writes), I think it would be worrying and worth investigating further.
On 2015/04/15 01:45:14, benwells wrote:

I think the simpler option is probably easier. The only problem is that it will need to be weighted to derive a writes/minute kind of histogram, as things which are writing more frequently will add more data points. E.g. if one client is writing every 10 seconds, it will have 6 times as many data points as a client writing every minute. This means the histograms will give a skewed impression if read naively.

However, I think this is pretty easy to account for by multiplying each bucket by its average number of seconds, or something like that. We could even account for it before we write, but it might be better to just have the raw data logged.
That sounds reasonable to me.

Alternatively: what if we accumulate writes and report them every 5 minutes (writes/5 mins)? We could be clever in doing this and only send histogram data when a write occurs. E.g. if we are doing a write and no writes have occurred in the previous 10 minutes, we could report 2 samples of 0 and start accumulating for the next 5-minute period. Writes/5 minutes would give a more concrete data set to interpret/understand than a weighted "time since last write" metric.

What do you think?

As an aside, I think there are 2 things we are interested in:
1) How often we reach the cap on writes. E.g. how often we write more than once every 10 seconds.
2) Ensuring that when we add karma data the number of writes does not increase significantly.

I think that either of these metrics would give a sense of both of these.

Thanks!
Raymes
https://codereview.chromium.org/1083603002/diff/20001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:536: ++write_count_;
Can you add a DCHECK to ensure this doesn't get hit after SendMetric is hit? Or perhaps you can make it so that this method is responsible for doing the SendMetric once it gets called after stop_delay_ - then it eliminates the potential race (i.e. if for some reason the delayed task runs before this hits). Would also simplify things because then you don't need the post task or weak pointer factory.

base/prefs/json_pref_store.cc:545: "Settings.JsonDataWriteCount." + spaceless_basename + "_" +
Please document these histograms in histograms.xml. You can use histogram_suffixes entries to avoid repeating multiple histogram definitions.
On 2015/04/15 06:18:19, raymes wrote:
> Alternatively: what if we accumulate writes and report them every 5 minutes
> (writes/5 mins)? We could be clever in doing this and only send histogram
> data when a write occurs.

In the absence of advantages I think we should do it in the simplest way possible. Is there an advantage to storing it in buckets (e.g. writes/5 mins)? Is it just that we don't have to adjust from the raw data to get a feel for write frequency?

Are you worried about overhead from the amount of UMA stuff we'd be creating? My understanding is that it is very cheap, but I could be wrong. asvitkine@: should we be worried about it?

I'm also interested in whether mnissler@ has an opinion on which way to go.
I think I was more concerned about having a metric which we can reason about accurately. Writes/5 mins allows me to look at the actual proportion of the time (as a true percentage) that we spend in each histogram bucket. I'm not sure if we can do that with the weighted metric also? My statistics aren't very good, but when I thought about it, it seemed like the weighted metric would give some approximation and I'm not sure how statistically accurate that approximation would be. I guess that's what I'm worried about :)
Also I think writes/5 mins would only be a little harder to implement (certainly not as bad as my original proposal).
On 2015/04/16 01:05:04, raymes wrote:
> I think I was more concerned about having a metric which we can reason
> about accurately. Writes/5 mins allows me to look at the actual proportion
> of the time (as a true percentage) that we spend in each histogram bucket.

This is exactly what I was thinking when I started the discussion, and this is also why I brought in asvitkine@. If we are able to get the signal we want from a single histogram, I'd opt for that to keep things simple. If not, I'm fine with pre-aggregation on the client and sending the resulting data. I haven't thought deeply about it myself, but was hoping that somebody could help out with that and convince me :)
I implemented the idea of recording the writes in every 5-minute interval. This means there is only a single histogram, which hopefully simplifies things, and it also means that the metric is more accurate. It's implemented in such a way as to avoid doing work when Chrome isn't doing any writes at all. Please let me know your thoughts. Thanks!
LGTM w/ nit

https://codereview.chromium.org/1083603002/diff/80001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:197: write_count_histogram_.reset(new WriteCountHistogram(
Looks like there's no good reason to keep this in a scoped_ptr now (as opposed to just using a plain data member).
Not sure if you missed my comments on the earlier patch set, doesn't seem like they are addressed yet.
asvitkine: ptal. I'll write a test before landing this though. Thanks!

https://codereview.chromium.org/1083603002/diff/20001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:536: ++write_count_;
The design changed so this isn't an issue now.

base/prefs/json_pref_store.cc:545: "Settings.JsonDataWriteCount." + spaceless_basename + "_" +
On 2015/04/15 16:41:23, Alexei Svitkine wrote:
> Please document these histograms in histograms.xml. You can use
> histogram_suffixes entries to avoid repeating multiple histogram definitions.
Done.
I just added a test :) PTAL Thanks! Raymes
https://codereview.chromium.org/1083603002/diff/120001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:38: static void InitializeForTest();
This sounds like Reset rather than Initialize. Also, my preferred suffix is "ForTesting()" - so how about ResetForTesting()?

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:473: writes_since_last_report_++;
Nit: ++writes

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.h (right):

base/prefs/json_pref_store.h:140: base::Callback<base::Time(void)> get_time_func_;
Can you use a base::Clock instead?
https://codereview.chromium.org/1083603002/diff/120001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:38: static void InitializeForTest();
On 2015/04/24 16:34:47, Alexei Svitkine wrote:
> This sounds like Reset rather than initialize. Also, my preferred suffix is
> "ForTesting()" - so how about ResetForTesting().
Done.

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:473: writes_since_last_report_++;
On 2015/04/24 16:34:47, Alexei Svitkine wrote:
> Nit: ++writes
Done.

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.h (right):

base/prefs/json_pref_store.h:140: base::Callback<base::Time(void)> get_time_func_;
On 2015/04/24 16:34:47, Alexei Svitkine wrote:
> Can you use a base::Clock instead?
Ahh! This is what I was looking for!! Done.
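For anyone unfamiliar with the suggestion, base::Clock is an injectable time source: production code can pass a base::DefaultClock while tests pass a base::SimpleTestClock and advance it manually. A hypothetical sketch (names are illustrative, not the patch's exact ones):

```cpp
#include "base/test/simple_test_clock.h"
#include "base/time/clock.h"
#include "base/time/time.h"

// The histogram object takes a base::Clock* so that tests control time.
class WriteCountHistogram {
 public:
  explicit WriteCountHistogram(base::Clock* clock) : clock_(clock) {}

  void RecordWrite() {
    const base::Time now = clock_->Now();  // Virtual call; tests fake this.
    // ... interval accounting as sketched earlier, using |now| ...
  }

 private:
  base::Clock* clock_;  // Not owned.
};

// In a unit test, elapsed intervals are simulated by advancing the clock:
//   base::SimpleTestClock clock;
//   WriteCountHistogram histogram(&clock);
//   histogram.RecordWrite();
//   clock.Advance(base::TimeDelta::FromMinutes(6));
//   histogram.RecordWrite();  // Flushes the previous 5-minute interval.
```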
https://codereview.chromium.org/1083603002/diff/140001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:36: // Reset the StatisticsRecorder system for testing. All existing histogram
Nit: Reset -> Resets
To be consistent with the docs for the function above. (Even though I realise the rest of the file is not consistent.)

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:446: JsonPrefStore::WriteCountHistogram::kHistogramWriteReportIntervalMins = 5;
Add a comment that this should not be changed without renaming the histograms - else it would create incompatible buckets.

base/prefs/json_pref_store.cc:491: time_since_last_report = current_time - last_report_time_;
This logic is hard to follow. I think the effect is that it logs |writes_since_last_report_| once for the current interval and then records 0 a bunch of extra times corresponding to how many extra intervals elapsed. If that's the effect, how about expressing that logic differently so that it's easier to understand that that's what's going on (otherwise it's very difficult - it took me maybe 5 minutes to work this through with the current loop), maybe something like:

  if (time_since_last_report <= report_interval_)
    return;
  histogram->Add(writes_since_last_report_);
  int num_additional_intervals_elapsed = /* calculation */;
  for (int i = 0; i < num_additional_intervals_elapsed; i++)
    histogram->Add(0);
  writes_since_last_report_ = 0;
  last_report_time_ = /* ... */;

base/prefs/json_pref_store.cc:506: int32_t max_value = report_interval_ / commit_interval_;
Add a DCHECK() that report_interval_ > commit_interval_?

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store_unittest.cc (right):

base/prefs/json_pref_store_unittest.cc:34: void SetCurrentTimeInMinutes(base::SimpleTestClock* clock, double minutes) {
Nit: Modifiable parameters should be last.
Thanks Alexei!

https://codereview.chromium.org/1083603002/diff/140001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:36: // Reset the StatisticsRecorder system for testing. All existing histogram
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> Nit: Reset -> Resets
Done.

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:446: JsonPrefStore::WriteCountHistogram::kHistogramWriteReportIntervalMins = 5;
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> Add a comment that this should not be changed without renaming the histograms -
> else it would create incompatible buckets.
I did something different, but along the same lines. See the comment and DCHECKs below :)

base/prefs/json_pref_store.cc:491: time_since_last_report = current_time - last_report_time_;
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> This logic is hard to follow.
Done.

base/prefs/json_pref_store.cc:506: int32_t max_value = report_interval_ / commit_interval_;
I decided to do something safer. I DCHECKed the exact values of max_value and num_buckets to ensure that if they change, someone will update the histogram. Most histograms hard-code these values, so I think it's ok and provides a good tradeoff: it still expresses how we arrived at those numbers in the code, and the DCHECKs provide safety.

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store_unittest.cc (right):

base/prefs/json_pref_store_unittest.cc:34: void SetCurrentTimeInMinutes(base::SimpleTestClock* clock, double minutes) {
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> Nit: Modifiable parameters should be last.
Done.
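For readers following along, the DCHECK approach described here might look something like the sketch below; the values are derived from a 5-minute report interval and a 10-second commit interval, and the names are illustrative rather than copied from the patch.

```cpp
#include <stdint.h>

#include "base/logging.h"
#include "base/time/time.h"

// Hard-code the histogram parameters, then DCHECK that they still match the
// quantities they were derived from. If someone changes either interval,
// debug builds fail, forcing a histogram rename instead of silently
// producing incompatible buckets.
void CheckWriteCountHistogramParams(base::TimeDelta report_interval,
                                    base::TimeDelta commit_interval) {
  const int64_t kExpectedMaxValue = 30;    // 5 min / 10 s.
  const int64_t kExpectedNumBuckets = 31;  // One per value, plus overflow.

  DCHECK_EQ(kExpectedMaxValue,
            report_interval.InSeconds() / commit_interval.InSeconds());
  DCHECK_EQ(kExpectedNumBuckets, kExpectedMaxValue + 1);
}
```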
LGTM!
The CQ bit was checked by raymes@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mnissler@chromium.org Link to the patchset: https://codereview.chromium.org/1083603002/#ps180001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1083603002/180001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_compile_dbg_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...)
Patchset #11 (id:200001) has been deleted
The CQ bit was checked by raymes@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org, mnissler@chromium.org Link to the patchset: https://codereview.chromium.org/1083603002/#ps220001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1083603002/220001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...)
The CQ bit was checked by raymes@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org, mnissler@chromium.org Link to the patchset: https://codereview.chromium.org/1083603002/#ps240001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1083603002/240001
Message was sent while issue was closed.
Committed patchset #12 (id:240001)
Message was sent while issue was closed.
Patchset 12 (id:??) landed as https://crrev.com/bfb910ab2937285271031b257f3ba248afff6fdc Cr-Commit-Position: refs/heads/master@{#327446} |