Created: 5 years, 8 months ago by raymes
Modified: 5 years, 7 months ago
CC: chromium-reviews, erikwright+watch_chromium.org, chrome-apps-syd-reviews_chromium.org
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Target Ref: refs/pending/heads/master
Project: chromium
Visibility: Public
Description
Add histograms to record the number of writes to the prefs file

This adds histograms that record how often we write to the prefs file. The number of writes is recorded for every 5-minute interval of time spent in Chrome.

To avoid doing any work when Chrome is idle and not recording any writes, we only record metrics at the point when a write occurs. Whenever a write occurs, we check whether the last write occurred more than 5 minutes ago. If it did, we report the write count for the previous period of time.
The histogram is called:
Settings.JsonDataWriteCount.<pref filename>
e.g.
Settings.JsonDataWriteCount.Local_State
This is in a similar vein to Settings.JsonDataReadSizeKilobytes.*
BUG=476800
Committed: https://crrev.com/bfb910ab2937285271031b257f3ba248afff6fdc
Cr-Commit-Position: refs/heads/master@{#327446}
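For readers skimming the thread, the reporting scheme in the description is simple enough to sketch. Below is a minimal, hypothetical illustration of that logic in Chromium-style C++; the class name, helper, and histogram parameters are assumptions for illustration, not the code that actually landed (see the patch for that).

```cpp
#include <string>

#include "base/metrics/histogram.h"
#include "base/time/time.h"

// Hypothetical sketch of the scheme described above. Writes are counted per
// 5-minute interval, and samples are only flushed when a write occurs after
// the interval has elapsed, so nothing runs while Chrome is idle.
class WriteCountRecorder {
 public:
  explicit WriteCountRecorder(const std::string& histogram_name)
      : histogram_name_(histogram_name), writes_since_last_report_(0) {}

  void RecordWrite(base::Time now) {
    if (last_report_time_.is_null())
      last_report_time_ = now;

    const base::TimeDelta report_interval = base::TimeDelta::FromMinutes(5);
    const base::TimeDelta elapsed = now - last_report_time_;
    if (elapsed >= report_interval) {
      // Report the count for the first elapsed interval, then one 0 sample
      // for each further full interval in which nothing was written.
      ReportSample(writes_since_last_report_);
      const int idle_intervals = elapsed.InMinutes() / 5 - 1;
      for (int i = 0; i < idle_intervals; ++i)
        ReportSample(0);
      writes_since_last_report_ = 0;
      last_report_time_ = now;
    }
    ++writes_since_last_report_;
  }

 private:
  void ReportSample(int sample) {
    // A max of 30 assumes at most one commit every 10 seconds over 5 minutes;
    // anything above that lands in the overflow bucket.
    base::HistogramBase* histogram = base::Histogram::FactoryGet(
        histogram_name_, 1, 30, 31,
        base::HistogramBase::kUmaTargetedHistogramFlag);
    histogram->Add(sample);
  }

  const std::string histogram_name_;
  base::Time last_report_time_;
  int writes_since_last_report_;
};
```

Flushing only from RecordWrite (rather than from a timer) is what keeps the mechanism free while Chrome is idle, at the cost of the final partial interval never being reported.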
Patch Set 1
Patch Set 2
Total comments: 9
Patch Set 3
Patch Set 4
Patch Set 5
Total comments: 1
Patch Set 6
Patch Set 7
Total comments: 6
Patch Set 8
Total comments: 10
Patch Set 9
Patch Set 10
Patch Set 11
Patch Set 12
Messages
Total messages: 39 (11 generated)
raymes@chromium.org changed reviewers: + mnissler@chromium.org
Mattias: I would like to write a test for this but I wanted to get feedback before spending the time on it. Does this approach seem ok to you? I managed to avoid changing the API by recording all files that get written, but this seems to be ok to do (I followed the pattern of the Settings.JsonDataReadSizeKilobytes.* histograms which are also recorded in this file). I also managed to avoid changing ImportantFileWriter because it already calls back into JsonPrefStore::SerializeData at the point that it writes. Let me know your thoughts. Thanks!
mnissler@chromium.org changed reviewers: + asvitkine@chromium.org
Took a look at the code, but then ended up pondering high-level questions again that we should probably resolve first:

1. Should we write a 0 sample for clients that don't perform a write in a bucket? Otherwise it's hard to distinguish clients that shut down before the bucket starts from those that truly don't have any writes.

2. A simpler approach to this could be to do a sparse histogram that just uses seconds elapsed since startup as the histogram buckets. Whenever we write, we'd just compute the bucket number and add a sample to a single histogram (but still per file name). This should still give us enough data to determine the distribution of writes over time. I'm not sure whether it'll help quantify the number of writes per client. I'm not a UMA expert though, so adding asvitkine@ to comment on the most appropriate way to capture write-frequency stats in UMA.

https://codereview.chromium.org/1083603002/diff/20001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:55: const scoped_refptr<base::SequencedTaskRunner> sequenced_task_runner);
Would be good to give this a name that makes it clear what it's used for (also the member).

base/prefs/json_pref_store.cc:81: uint32_t write_count_;
Why not a plain int?

base/prefs/json_pref_store.cc:96: int64_t start_delay;
Might want to call this start_delay_minutes for clarity?

base/prefs/json_pref_store.cc:101: };
Could make this an unnamed struct. If you don't, put a blank line here.

base/prefs/json_pref_store.cc:553: int32_t max_value = (stop_delay_ - start_delay_) / commit_interval_;
This calculation doesn't work in the presence of PrefService::CommitPendingWrite(). Maybe a sparse histogram is a better choice here?
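To make option 2 concrete, here is a rough sketch of what the sparse-histogram variant could look like. The 30-second bucket granularity and the function shape are assumptions for illustration, not part of the proposal.

```cpp
#include <string>

#include "base/metrics/histogram_base.h"
#include "base/metrics/sparse_histogram.h"
#include "base/time/time.h"

// Option 2 sketched: on every write, record which time-since-startup bucket
// the write fell into. A sparse histogram only allocates buckets that
// actually receive samples, so long-running but mostly idle clients cost
// nothing extra.
void RecordWriteSparse(const std::string& histogram_name,
                       base::Time process_start,
                       base::Time now) {
  const int64_t bucket = (now - process_start).InSeconds() / 30;
  base::HistogramBase* histogram = base::SparseHistogram::FactoryGet(
      histogram_name, base::HistogramBase::kUmaTargetedHistogramFlag);
  histogram->Add(static_cast<int>(bucket));
}
```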
On 2015/04/14 11:58:27, Mattias Nissler wrote:
> Took a look at the code, but then ended up pondering high-level questions again
> that we should probably resolve first:
>
> 1. Should we write a 0 sample for clients that don't perform a write in a
> bucket? Otherwise it's hard to distinguish clients that shut down before the
> bucket starts from those that truly don't have any writes.

I agree, I think we should record the write. I believe the current code will record a 0 sample for clients that don't perform a write in that window.

> 2. A simpler approach to this could be to do a sparse histogram that just uses
> seconds elapsed since startup as the histogram buckets. Whenever we write, we'd
> just compute the bucket number and add a sample to a single histogram (but still
> per file name). This should still give us enough data to determine the
> distribution of writes over time. I'm not sure whether it'll help quantify the
> number of writes per client. I'm not a UMA expert though, so adding asvitkine@
> to comment on the most appropriate way to capture write-frequency stats in UMA.

That's a good idea - I think both stats would be useful and the one you suggest would certainly be simpler to implement. I can't help but think that the stat I suggested would give us a better picture of what is actually going on, even though it's more complicated to implement. I'm certainly open to other people's opinions here. asvitkine@? benwells@?

> base/prefs/json_pref_store.cc:553: int32_t max_value = (stop_delay_ -
> start_delay_) / commit_interval_;
> This calculation doesn't work in the presence of
> PrefService::CommitPendingWrite(). Maybe a sparse histogram is a better choice
> here?

Assuming that in most cases this isn't called, I think the max value still makes sense. I /believe/ there should be an overflow bucket in the histogram which would capture all those that go over the number of max writes. This should give us a good picture of what's going on. E.g. if users are consistently writing more than once every 10 seconds (and hitting max_writes), I think it would be worrying and worth investigating further.
On 2015/04/15 01:45:14, benwells wrote:

I think the simpler option is probably easier. The only problem is that it will need to be weighted to derive a writes/minute kind of histogram, as things which are writing more frequently will add more data points. E.g. if one client is writing every 10 seconds, it will have 6 times as many data points as a client writing every minute. This means the histograms will give a skewed impression if read naively.

However, I think this is pretty easy to account for by multiplying each bucket by its average number of seconds, or something like that. We could even account for it before we write, but it might be better to just have the raw data logged.
That sounds reasonable to me.

Alternatively: what if we accumulate writes and report them every 5 minutes (writes/5 mins)? We could be clever in doing this and only send histogram data when a write occurs. E.g. if we are doing a write and no writes have occurred in the previous 10 minutes, we could report 2 samples of 0 and start accumulating for the next 5-minute period. Writes/5 minutes would give a more concrete data set to interpret/understand than a weighted "time since last write" metric.

What do you think?

As an aside, I think there are 2 things we are interested in:
1) How often we reach the cap on writes. E.g. how often we write more than once every 10 seconds.
2) Ensuring that when we add karma data the number of writes does not increase significantly.

I think that either of these metrics would give a sense of both of these.

Thanks!
Raymes
https://codereview.chromium.org/1083603002/diff/20001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:536: ++write_count_;
Can you add a DCHECK to ensure this doesn't get hit after SendMetric is hit? Or perhaps you can make it so that this method is responsible for doing the SendMetric once it gets called after stop_delay_ - then it eliminates the potential race (i.e. if for some reason the delayed task runs before this hits). Would also simplify things because then you don't need the post task or weak pointer factory.

base/prefs/json_pref_store.cc:545: "Settings.JsonDataWriteCount." + spaceless_basename + "_" +
Please document these histograms in histograms.xml. You can use histogram_suffixes entries to avoid repeating multiple histogram definitions.
On 2015/04/15 06:18:19, raymes wrote:
> Alternatively: what if we accumulate writes and report them every 5 minutes
> (writes/5 mins)? We could be clever in doing this and only send histogram
> data when a write occurs.

In the absence of advantages I think we should do it in the simplest way possible. Is there an advantage to storing it in buckets (e.g. writes/5 mins)? Is it just that we don't have to adjust from the raw data to get a feel for write frequency?

Are you worried about overhead from the amount of UMA stuff we'd be creating? My understanding is that it is very cheap, but I could be wrong. asvitkine@: should we be worried about it?

I'm also interested in whether mnissler@ has an opinion on which way to go.
I think I was more concerned about having a metric which we can reason about accurately. Writes/5 mins allows me to look at the actual proportion of the time (as a true percentage) that we spend in each histogram bucket. I'm not sure if we can do that with the weighted metric also? My statistics aren't very good, but when I thought about it, it seemed like the weighted metric would give some approximation and I'm not sure how statistically accurate that approximation would be. I guess that's what I'm worried about :)
Also I think writes/5 mins would only be a little harder to implement (certainly not as bad as my original proposal).
On 2015/04/16 01:05:04, raymes wrote:
> I think I was more concerned about having a metric which we can reason
> about accurately. Writes/5 mins allows me to look at the actual proportion
> of the time (as a true percentage) that we spend in each histogram bucket.

This is exactly what I was thinking when I started the discussion, and this is also why I brought in asvitkine@. If we are able to get the signal we want from a single histogram, I'd opt for that to keep things simple. If not, I'm fine with pre-aggregation on the client and sending the resulting data. I haven't thought deeply about it myself, but was hoping that somebody could help out with that and convince me :)
I implemented the idea of recording the writes in every 5-minute interval. This means there is only a single histogram, which hopefully simplifies things, and it also means that the metric is more accurate. It's implemented in such a way as to avoid doing work when Chrome isn't doing any writes at all. Please let me know your thoughts. Thanks!
LGTM w/ nit

https://codereview.chromium.org/1083603002/diff/80001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:197: write_count_histogram_.reset(new WriteCountHistogram(
Looks like there's no good reason to keep this in a scoped_ptr now (as opposed to just using a plain data member).
Not sure if you missed my comments on the earlier patch set, doesn't seem like they are addressed yet.
asvitkine: ptal. I'll write a test before landing this though. Thanks!

https://codereview.chromium.org/1083603002/diff/20001/base/prefs/json_pref_st...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:536: ++write_count_;
The design changed so this isn't an issue now.

base/prefs/json_pref_store.cc:545: "Settings.JsonDataWriteCount." + spaceless_basename + "_" +
On 2015/04/15 16:41:23, Alexei Svitkine wrote:
> Please document these histograms in histograms.xml. You can use
> histogram_suffixes entries to avoid repeating multiple histogram definitions.
Done.
I just added a test :) PTAL Thanks! Raymes
https://codereview.chromium.org/1083603002/diff/120001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:38: static void InitializeForTest();
This sounds like Reset rather than Initialize. Also, my preferred suffix is "ForTesting()" - so how about ResetForTesting()?

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:473: writes_since_last_report_++;
Nit: ++writes

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.h (right):

base/prefs/json_pref_store.h:140: base::Callback<base::Time(void)> get_time_func_;
Can you use a base::Clock instead?
https://codereview.chromium.org/1083603002/diff/120001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:38: static void InitializeForTest();
On 2015/04/24 16:34:47, Alexei Svitkine wrote:
> This sounds like Reset rather than initialize. Also, my preferred suffix is
> "ForTesting()" - so how about ResetForTesting().
Done.

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:473: writes_since_last_report_++;
On 2015/04/24 16:34:47, Alexei Svitkine wrote:
> Nit: ++writes
Done.

https://codereview.chromium.org/1083603002/diff/120001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.h (right):

base/prefs/json_pref_store.h:140: base::Callback<base::Time(void)> get_time_func_;
On 2015/04/24 16:34:47, Alexei Svitkine wrote:
> Can you use a base::Clock instead?
Ahh! This is what I was looking for!! Done.
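For anyone unfamiliar with the suggestion, base::Clock is an injectable time source: production code can pass a base::DefaultClock while tests pass a base::SimpleTestClock and advance it manually. A hypothetical sketch (names are illustrative, not the patch's exact ones):

```cpp
#include "base/test/simple_test_clock.h"
#include "base/time/clock.h"
#include "base/time/time.h"

// The histogram object takes a base::Clock* so that tests control time.
class WriteCountHistogram {
 public:
  explicit WriteCountHistogram(base::Clock* clock) : clock_(clock) {}

  void RecordWrite() {
    const base::Time now = clock_->Now();  // Virtual call; tests fake this.
    // ... interval accounting as sketched earlier, using |now| ...
  }

 private:
  base::Clock* clock_;  // Not owned.
};

// In a unit test, elapsed intervals are simulated by advancing the clock:
//   base::SimpleTestClock clock;
//   WriteCountHistogram histogram(&clock);
//   histogram.RecordWrite();
//   clock.Advance(base::TimeDelta::FromMinutes(6));
//   histogram.RecordWrite();  // Flushes the previous 5-minute interval.
```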
https://codereview.chromium.org/1083603002/diff/140001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:36: // Reset the StatisticsRecorder system for testing. All existing histogram
Nit: Reset -> Resets
To be consistent with the docs for the function above. (Even though I realise the rest of the file is not consistent.)

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:446: JsonPrefStore::WriteCountHistogram::kHistogramWriteReportIntervalMins = 5;
Add a comment that this should not be changed without renaming the histograms - else it would create incompatible buckets.

base/prefs/json_pref_store.cc:491: time_since_last_report = current_time - last_report_time_;
This logic is hard to follow. I think the effect is that it logs |writes_since_last_report_| once for the current interval and then records 0 a bunch of extra times corresponding to how many extra intervals elapsed. If that's the effect, how about expressing that logic differently so that it's easier to understand that that's what's going on (otherwise it's very difficult - it took me maybe 5 minutes to work this through with the current loop), maybe something like:

  if (time_since_last_report <= report_interval_)
    return;
  histogram->Add(writes_since_last_report_);
  int num_additional_intervals_elapsed = /* calculation */;
  for (int i = 0; i < num_additional_intervals_elapsed; i++)
    histogram->Add(0);
  writes_since_last_report_ = 0;
  last_report_time_ = /* ... */;

base/prefs/json_pref_store.cc:506: int32_t max_value = report_interval_ / commit_interval_;
Add a DCHECK() that report_interval_ > commit_interval_?

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store_unittest.cc (right):

base/prefs/json_pref_store_unittest.cc:34: void SetCurrentTimeInMinutes(base::SimpleTestClock* clock, double minutes) {
Nit: Modifiable parameters should be last.
Thanks Alexei!

https://codereview.chromium.org/1083603002/diff/140001/base/metrics/statistic...
File base/metrics/statistics_recorder.h (right):

base/metrics/statistics_recorder.h:36: // Reset the StatisticsRecorder system for testing. All existing histogram
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> Nit: Reset -> Resets
Done.

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store.cc (right):

base/prefs/json_pref_store.cc:446: JsonPrefStore::WriteCountHistogram::kHistogramWriteReportIntervalMins = 5;
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> Add a comment that this should not be changed without renaming the histograms -
> else it would create incompatible buckets.
I did something different, but along the same lines. See the comment and DCHECKs below :)

base/prefs/json_pref_store.cc:491: time_since_last_report = current_time - last_report_time_;
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> This logic is hard to follow.
Done.

base/prefs/json_pref_store.cc:506: int32_t max_value = report_interval_ / commit_interval_;
I decided to do something safer. I DCHECKed the exact values of max_value and num_buckets to ensure that if they change, someone will update the histogram. Most histograms hard-code these values, so I think it's ok and provides a good tradeoff: it still expresses how we arrived at those numbers in the code, and the DCHECKs provide safety.

https://codereview.chromium.org/1083603002/diff/140001/base/prefs/json_pref_s...
File base/prefs/json_pref_store_unittest.cc (right):

base/prefs/json_pref_store_unittest.cc:34: void SetCurrentTimeInMinutes(base::SimpleTestClock* clock, double minutes) {
On 2015/04/27 20:21:20, Alexei Svitkine wrote:
> Nit: Modifiable parameters should be last.
Done.
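For readers following along, the DCHECK approach described here might look something like the sketch below; the values are derived from a 5-minute report interval and a 10-second commit interval, and the names are illustrative rather than copied from the patch.

```cpp
#include <stdint.h>

#include "base/logging.h"
#include "base/time/time.h"

// Hard-code the histogram parameters, then DCHECK that they still match the
// quantities they were derived from. If someone changes either interval,
// debug builds fail, forcing a histogram rename instead of silently
// producing incompatible buckets.
void CheckWriteCountHistogramParams(base::TimeDelta report_interval,
                                    base::TimeDelta commit_interval) {
  const int64_t kExpectedMaxValue = 30;    // 5 min / 10 s.
  const int64_t kExpectedNumBuckets = 31;  // One per value, plus overflow.

  DCHECK_EQ(kExpectedMaxValue,
            report_interval.InSeconds() / commit_interval.InSeconds());
  DCHECK_EQ(kExpectedNumBuckets, kExpectedMaxValue + 1);
}
```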
LGTM!
The CQ bit was checked by raymes@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mnissler@chromium.org Link to the patchset: https://codereview.chromium.org/1083603002/#ps180001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1083603002/180001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: win_chromium_compile_dbg_ng on tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_comp...)
Patchset #11 (id:200001) has been deleted
The CQ bit was checked by raymes@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org, mnissler@chromium.org Link to the patchset: https://codereview.chromium.org/1083603002/#ps220001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1083603002/220001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...)
The CQ bit was checked by raymes@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from asvitkine@chromium.org, mnissler@chromium.org Link to the patchset: https://codereview.chromium.org/1083603002/#ps240001 (title: " ")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1083603002/240001
Message was sent while issue was closed.
Committed patchset #12 (id:240001)
Message was sent while issue was closed.
Patchset 12 (id:??) landed as https://crrev.com/bfb910ab2937285271031b257f3ba248afff6fdc Cr-Commit-Position: refs/heads/master@{#327446} |