|
|
DescriptionAdd Precache.Fetch.MinWeight UMA.
This collects the distribution of resource weights that are actually
reached before hitting the byte caps. This helps us tune the value of
PrecacheConfigurationSettings.min_weight.
BUG=696035
Review-Url: https://codereview.chromium.org/2711473006
Cr-Commit-Position: refs/heads/master@{#453059}
Committed: https://chromium.googlesource.com/chromium/src/+/5f9a48e44f1257972cb83947355c7298555ccc94
Patch Set 1 #
Total comments: 11
Patch Set 2 : Replace emplace with operator= and report 0 in edge case. #Patch Set 3 : Track min_weight_fetched in unfinished_work, and add tests. #
Total comments: 12
Patch Set 4 : Rebase. #Patch Set 5 : Comment clarifications. #
Total comments: 2
Depends on Patchset: Messages
Total messages: 29 (10 generated)
twifkak@chromium.org changed reviewers: + jamartin@chromium.org, rkaplow@chromium.org
https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... File components/precache/core/precache_fetcher.cc (right): https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... components/precache/core/precache_fetcher.cc:426: min_weight_fetched.value() * 1000); BTW, the bucket boundaries will look approximately like this: irb(main):005:0> (0..50).map {|i| (Math.exp(Math.log(1) + i * (Math.log(1000) - Math.log(1)) / 50.0) / 1000.0).round(4) } => [0.001, 0.0011, 0.0013, 0.0015, 0.0017, 0.002, 0.0023, 0.0026, 0.003, 0.0035, 0.004, 0.0046, 0.0052, 0.006, 0.0069, 0.0079, 0.0091, 0.0105, 0.012, 0.0138, 0.0158, 0.0182, 0.0209, 0.024, 0.0275, 0.0316, 0.0363, 0.0417, 0.0479, 0.055, 0.0631, 0.0724, 0.0832, 0.0955, 0.1096, 0.1259, 0.1445, 0.166, 0.1905, 0.2188, 0.2512, 0.2884, 0.3311, 0.3802, 0.4365, 0.5012, 0.5754, 0.6607, 0.7586, 0.871, 1.0]
https://codereview.chromium.org/2711473006/diff/1/components/precache/content... File components/precache/content/precache_manager.cc (right): https://codereview.chromium.org/2711473006/diff/1/components/precache/content... components/precache/content/precache_manager.cc:276: unfinished_work->resource_size(), base::nullopt); When does this case happen? When we haven't precached everything that we should nor reached any of the caps yet the precaching session has lasted at least 6 hours? If yes, I wonder if we should still report the min_weight, since it is still representative of the last resource precached by this client. https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... File components/precache/core/precache_fetcher.cc (right): https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... components/precache/core/precache_fetcher.cc:589: !resources_to_fetch_.empty()) If resources_to_fetch_.empty(), shouldn't you report 0? https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... components/precache/core/precache_fetcher.cc:593: min_weight); I wonder if we may want to report also the rank distribution that was precached across hosts. I.e., what is the min, max, avg rank that we have precached across all top hosts. The reason is that, since we don't know the user's top host frequency, which is used to compare the weights across hosts, we are missing part of the picture by only having min_weight and not the rank distribution. For instance, it can happen that the min_weight is .007 (in host a) yet we left out a resource with a weight of .42 (in host b) because of a high difference in host frequency. The min_weight tells us something, but the rank tells us something else which includes additional data unseen by min_weight. https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... components/precache/core/precache_fetcher.cc:604: min_weight.emplace(0); (nit/opt) I prefer min_weight = 0.
No tests?
twifkak@chromium.org changed reviewers: + bengr@chromium.org
Patchset #2 (id:20001) has been deleted
lgtm histogram lg https://codereview.chromium.org/2711473006/diff/1/tools/metrics/histograms/hi... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2711473006/diff/1/tools/metrics/histograms/hi... tools/metrics/histograms/histograms.xml:50635: +<histogram name="Precache.Fetch.MinWeight" units="thousandths"> kind of an odd unit - should it be thousandth of resource weight? What is weight actually here - some internal metric?
I've held off on writing tests for patch 2 for a couple reasons: 1. I realized that the case where the prefetch completes, but was merely capped by total_resources_count, was kind of important. We shouldn't just send 0 in this case. This'll require a new proto in unfinished work. 2. Before I make that change, we should discuss whether it should be min weight, min weight ratio, max rank, or some other thing. https://codereview.chromium.org/2711473006/diff/1/components/precache/content... File components/precache/content/precache_manager.cc (right): https://codereview.chromium.org/2711473006/diff/1/components/precache/content... components/precache/content/precache_manager.cc:276: unfinished_work->resource_size(), base::nullopt); On 2017/02/22 22:54:07, jamartin wrote: > When does this case happen? When we haven't precached everything that we should > nor reached any of the caps yet the precaching session has lasted at least 6 > hours? If yes, I wonder if we should still report the min_weight, since it is > still representative of the last resource precached by this client. Yes. The problem is that the weights aren't in the unfinished_work (just the weight_ratios), so I'd need to add an API hook and some extra work to compute that. I presume that the fraction of precaches that hit this timeout is tiny enough that this isn't worth the bother. https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... File components/precache/core/precache_fetcher.cc (right): https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... components/precache/core/precache_fetcher.cc:589: !resources_to_fetch_.empty()) On 2017/02/22 22:54:07, jamartin wrote: > If resources_to_fetch_.empty(), shouldn't you report 0? Yes. This is a bit of an edge case, I think. If the precache completes without going over the byte cap, then this won't be run and instead the call at line 605 will. The edge case will happen, though, if the byte cap is hit on the very last resource. https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... components/precache/core/precache_fetcher.cc:593: min_weight); On 2017/02/22 22:54:07, jamartin wrote: > I wonder if we may want to report also the rank distribution that was precached > across hosts. I.e., what is the min, max, avg rank that we have precached across > all top hosts. > > The reason is that, since we don't know the user's top host frequency, which is > used to compare the weights across hosts, we are missing part of the picture by > only having min_weight and not the rank distribution. > > For instance, it can happen that the min_weight is .007 (in host a) yet we left > out a resource with a weight of .42 (in host b) because of a high difference in > host frequency. The min_weight tells us something, but the rank tells us > something else which includes additional data unseen by min_weight. The problem is that rank 23 for host A may mean something completely different from host B, if they have different distributions of weight_ratio. (Say host A has 50 common JS files, and host B has only 5, and then another 50 rare ones.) I think we can combine History.MonthlyVisitCount and History.TopHostsVisitsByRank [1] to get an approximate top host frequency for a site at each position in top hosts, and then divide MinWeight by that to get a minimum weight_ratio as a function of where the site sits in the user's top hosts ranking. But perhaps that's too much calculation on top of UMA. I could return the minimum weight_ratio rather than the minimum weight. Given that the fetches are only partially ordered by weight_ratio, this seems weird. But I don't have anything against it other than that. If you're looking to come up with a conservative server-side threshold, would this work? [1] https://uma.googleplex.com/p/chrome/histograms/?endDate=20170221&dayCount=1&h... https://codereview.chromium.org/2711473006/diff/1/components/precache/core/pr... components/precache/core/precache_fetcher.cc:604: min_weight.emplace(0); On 2017/02/22 22:54:07, jamartin wrote: > (nit/opt) I prefer min_weight = 0. Me too. Somehow I missed operator= when scanning the include file. https://codereview.chromium.org/2711473006/diff/1/tools/metrics/histograms/hi... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2711473006/diff/1/tools/metrics/histograms/hi... tools/metrics/histograms/histograms.xml:50635: +<histogram name="Precache.Fetch.MinWeight" units="thousandths"> On 2017/02/23 04:02:14, rkaplow (slow) wrote: > kind of an odd unit - should it be thousandth of resource weight? What is weight > actually here - some internal metric? Yes it's thousandth of resource weight (since we don't support real-valued histos, AFAICT). Resource weight is a funny internal thing, and its meaning depends on the value of resource_weight_function in the experiment config: 1. If FUNCTION_NAIVE, then it's the expected number of page visits in the next 30 days needing that resource. 2. If NUNCTION_GEOMETRIC, then it's the probability of a page visit in the next 30 days needing that resource. I can rename the unit if you like, but I assumed it needed to be something shortish for the UMA UI.
Discussed offline. Summary: - No, or very few users, should reach the end of the activity flow without reaching any of the caps, so there is no place where we should report "everything precached", or what I meant for "0". - It is marginally interesting to report which cap in particular did the user reach. This, however, is fit for a different CL. - It is marginally interesting to report the rank distribution, but again in a different CL. - The current implementation allows us to fine tune a client config parameter. The only missing bit are tests.
jamartin: PTAL. I added fields to track this value in unfinished work, which allows us to report it in two additional cases: hitting the total_resources_count cap, and hitting the deadline. I added tests. It may be simpler to look at the diff from base than from the last patch set; this change simplified the code a bit.
Please file a bug and refer to it in this CL. It makes it much easier to understand and track changes going forward.
Add a bug ID. Otherwise lgtm. https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... File components/precache/core/proto/precache.proto (right): https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/proto/precache.proto:25: // weight_ratio and TopHost.visits. Populated only in PrecacheUnfinishedWork. Does this also range from 0.0 to 1.0 with higher values being more important? Say so. https://codereview.chromium.org/2711473006/diff/60001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2711473006/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:50640: + byte cap, but not if it was cancelled due to an ill-formed manifest or If you prefer American English, the spelling is canceled. :)
https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... File components/precache/core/precache_fetcher_unittest.cc (right): https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/precache_fetcher_unittest.cc:1147: EXPECT_EQ(kNumResources - 2, url_callback_.requested_urls().size()); If it is config (1), manifest (1) and all but 3 resources (kNumResources - 3), shouldn't you EXPECT_EQ(kNumResources - 3 + 1 + 1, ...? https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/precache_fetcher_unittest.cc:1155: good_manifest.resource(kNumResources - 3).weight_ratio(); good_manifest.resource.weight_ratio is the weight ratio as served from the backend. I thought you wanted to report weight_ratio * <either the visit-frequency of the top-host or the probability of visiting the top-host>. https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... File components/precache/core/proto/precache.proto (right): https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/proto/precache.proto:26: optional double weight = 4; This also depends on resource_weight_function, isn't it? If so, please say so. https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/proto/precache.proto:26: optional double weight = 4; I'm wondering if it makes sense to have distinct weight_naive and weight_geometric or a resource_weight_function enum specifying the kind of weight. For instance, if we have users using both weight functions, the UMA is going to be quite messy. I guess we are fine, since UMA would still allow us to slice and dice and this proto is not meant for long-time storage, but it feels wrong having the same proto field for two related but different things. If you decide to have a fully identified weight, maybe the way would be to have a submessage, so future changes are slightly less painful. message Weight { optional ResourceWeightFunction function = 1; optional double weight = 2; } (or maybe weight_naive and weight_geometric?).
Description was changed from ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us to tune our server-side weight thresholds. BUG= ========== to ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us to tune our server-side weight thresholds. BUG=696035 ==========
https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... File components/precache/core/precache_fetcher_unittest.cc (right): https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/precache_fetcher_unittest.cc:1147: EXPECT_EQ(kNumResources - 2, url_callback_.requested_urls().size()); On 2017/02/24 21:23:29, jamartin wrote: > If it is config (1), manifest (1) and all but 3 resources (kNumResources - 3), > shouldn't you EXPECT_EQ(kNumResources - 3 + 1 + 1, ...? Good point. Added a comment, and clarified the math. https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/precache_fetcher_unittest.cc:1155: good_manifest.resource(kNumResources - 3).weight_ratio(); On 2017/02/24 21:23:29, jamartin wrote: > good_manifest.resource.weight_ratio is the weight ratio as served from the > backend. I thought you wanted to report weight_ratio * <either the > visit-frequency of the top-host or the probability of visiting the top-host>. Per line 1096, top_host->visits() == 1, up until line 1139, when the std::move() invalidates top_host. I added a `* 1` for you. https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... File components/precache/core/proto/precache.proto (right): https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/proto/precache.proto:25: // weight_ratio and TopHost.visits. Populated only in PrecacheUnfinishedWork. On 2017/02/24 20:58:54, bengr wrote: > Does this also range from 0.0 to 1.0 with higher values being more important? > Say so. Done. https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/proto/precache.proto:26: optional double weight = 4; On 2017/02/24 21:23:29, jamartin wrote: > This also depends on resource_weight_function, isn't it? If so, please say so. Done. https://codereview.chromium.org/2711473006/diff/60001/components/precache/cor... components/precache/core/proto/precache.proto:26: optional double weight = 4; On 2017/02/24 21:23:29, jamartin wrote: > I'm wondering if it makes sense to have distinct weight_naive and > weight_geometric or a resource_weight_function enum specifying the kind of > weight. I don't think that's necessary. resource_weight_function is configured in the config URL, which is stable for a given field trial group. So we just need to slice by group. Which we need to do anyways. https://codereview.chromium.org/2711473006/diff/60001/tools/metrics/histogram... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2711473006/diff/60001/tools/metrics/histogram... tools/metrics/histograms/histograms.xml:50640: + byte cap, but not if it was cancelled due to an ill-formed manifest or On 2017/02/24 20:58:54, bengr wrote: > If you prefer American English, the spelling is canceled. :) This is one of the cases where I prefer the British spelling (like theatre and advertisement), but I'm hapy to spel it howevr you wan.
lgtm
lgtm
https://codereview.chromium.org/2711473006/diff/100001/components/precache/co... File components/precache/core/precache_fetcher_unittest.cc (right): https://codereview.chromium.org/2711473006/diff/100001/components/precache/co... components/precache/core/precache_fetcher_unittest.cc:1230: // reason, we are seeing it fetch all but 4 resources. Meh, close enough. Are you sure is close enough?
Thanks for the reviews. I'll send to CQ as soon as my previous CL lands. https://codereview.chromium.org/2711473006/diff/100001/components/precache/co... File components/precache/core/precache_fetcher_unittest.cc (right): https://codereview.chromium.org/2711473006/diff/100001/components/precache/co... components/precache/core/precache_fetcher_unittest.cc:1230: // reason, we are seeing it fetch all but 4 resources. Meh, close enough. On 2017/02/25 02:12:47, jamartin wrote: > Are you sure is close enough? I'd put my certainty at 99%. In any case, this problem preexisted this CL, so I don't feel compelled to fix it here. I added a bug for it: https://bugs.chromium.org/p/chromium/issues/detail?id=696118
The CQ bit was checked by twifkak@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rkaplow@chromium.org, bengr@chromium.org Link to the patchset: https://codereview.chromium.org/2711473006/#ps100001 (title: "Comment clarifications.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Description was changed from ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us to tune our server-side weight thresholds. BUG=696035 ========== to ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us tune the value of PrecacheConfigurationSettings.min_weight. BUG=696035 ==========
Description was changed from ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us tune the value of PrecacheConfigurationSettings.min_weight. BUG=696035 ========== to ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us tune the value of PrecacheConfigurationSettings.min_weight. BUG=696035 ==========
CQ is committing da patch. Bot data: {"patchset_id": 100001, "attempt_start_ts": 1487991317853160, "parent_rev": "c05388c6fc37ef12cf3b11df405c8dbc631b8f70", "commit_rev": "5f9a48e44f1257972cb83947355c7298555ccc94"}
Message was sent while issue was closed.
Description was changed from ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us tune the value of PrecacheConfigurationSettings.min_weight. BUG=696035 ========== to ========== Add Precache.Fetch.MinWeight UMA. This collects the distribution of resource weights that are actually reached before hitting the byte caps. This helps us tune the value of PrecacheConfigurationSettings.min_weight. BUG=696035 Review-Url: https://codereview.chromium.org/2711473006 Cr-Commit-Position: refs/heads/master@{#453059} Committed: https://chromium.googlesource.com/chromium/src/+/5f9a48e44f1257972cb83947355c... ==========
Message was sent while issue was closed.
Committed patchset #5 (id:100001) as https://chromium.googlesource.com/chromium/src/+/5f9a48e44f1257972cb83947355c... |