|
|
Created:
3 years, 9 months ago by shaktisahu Modified:
3 years, 9 months ago CC:
chromium-reviews, asanka, jam, darin-cc_chromium.org, asvitkine+watch_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
DescriptionRecording download mime types for normal profile
BUG=699807
Review-Url: https://codereview.chromium.org/2758453003
Cr-Commit-Position: refs/heads/master@{#459344}
Committed: https://chromium.googlesource.com/chromium/src/+/d87debc9b133cb364f9b307d9643ebb7c5bd7fb6
Patch Set 1 #
Total comments: 2
Patch Set 2 : started and completed count #Patch Set 3 : Mime type breakdown #
Total comments: 4
Patch Set 4 : asanka@ comments #Patch Set 5 : Fixed test #Patch Set 6 : Rebase #
Total comments: 8
Patch Set 7 : Renamed histogam and added new metrics for new download #
Total comments: 2
Patch Set 8 : comments #
Messages
Total messages: 48 (29 generated)
Description was changed from ========== Recording download mime types for normal profile BUG=699807 ========== to ========== Recording download mime types for normal profile BUG=699807 ==========
shaktisahu@chromium.org changed reviewers: + asanka@chromium.org
asanka@ - PTAL. I have a question below. https://codereview.chromium.org/2758453003/diff/1/content/browser/download/do... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/1/content/browser/download/do... content/browser/download/download_item_impl.cc:1261: RecordDownloadMimeType(mime_type_); asanka@ - I moved this from DownloadRequestCore::OnResponseStarted as I don't have access to IsOffTheRecord() over there. I can move this to the beginning of the function also. However I think, compared to the older location, I am missing two return points inside DownloadResourceHandler::OnStart and DownloadManagerImpl::StartDownloadWithId. Will that cause significant change in the metrics behavior? Other option would be add a completely new metrics such as Download.Start.ContentType and its counterpart for this purpose.
There's a mismatch in counts as noted below, but that discrepancy should be miniscule compared to the noise introduced by recording these metrics for all network requests handled by downloads. The latter includes all network requests that were initiated for resumption. For partial resumption, there's no requirement that the Content-Type match the original content type since it's not possible for a subrange of a file to have a different Content-Type. Roughly 13% of partial resumption responses include a different Content-Disposition, for comparison. It's possible that a significant percentage doesn't include a Content-Type. In that sense, the metrics are already too noisy to be useful. It's worth considering what you want to measure. * If you want a break down in terms of the *count* of attempted downloads, then we should only count new downloads. Ignore Content-Type and similar metrics for resumption attempts. * If you want a break down in terms of *size* of downloads, then we can: * measure the expected size for new downloads against their content type (noise introduced by downloads of unknown size, interruptions, and Content-Length mismatches), or * measure the count of bytes written against the download's content type (noise introduced by validation failures, downloads that never complete, interruptions that result in the temporary file being discarded), or * measure the size vs. content type for completed downloads (no noise, but the metric depends on how good Chrome is at driving a download to completion). Any of the above is a change in what's being measured, so will warrant new metric names. https://codereview.chromium.org/2758453003/diff/1/content/browser/download/do... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/1/content/browser/download/do... content/browser/download/download_item_impl.cc:1261: RecordDownloadMimeType(mime_type_); On 2017/03/16 23:39:49, shaktisahu wrote: > asanka@ - I moved this from DownloadRequestCore::OnResponseStarted as I don't > have access to IsOffTheRecord() over there. I can move this to the beginning of > the function also. However I think, compared to the older location, I am missing > two return points inside DownloadResourceHandler::OnStart and > DownloadManagerImpl::StartDownloadWithId. Will that cause significant change in > the metrics behavior? > Other option would be add a completely new metrics such as > Download.Start.ContentType and its counterpart for this purpose. This is fine. The early return at DownloadResourceHandler::OnStart() is only taken if the result is a cancellation. The original location wouldn't have recorded UMA for that case anyway. The only discrepancy is the one where the server response indicates a success, but DownloadManagerImpl throws the response away before we get here because the download was cancelled between issuing the request and receiving the response. The incidence rate of that should be within noise. Also, that case would only happen for resumption. I'll address the implications of this as a top level comment.
shaktisahu@chromium.org changed reviewers: + dtrainor@chromium.org
shaktisahu@chromium.org changed reviewers: + dahlke@chromium.org
+dahlke@ to the conversation for the top level comment from asanka@. I think we just need the *count* of new downloads attempted for normal profile. We don't need to collect size/total bytes written/content-type etc for normal profile since most of them have caveats of being noisy. Considering that I am thinking of adding to the Download.Counts metrics one or two new enum entries, e.g. "started for normal profile", "completed for normal profile". I believe that should be enough. dahlke@ - wdyt?
On 2017/03/20 20:16:58, shaktisahu wrote: > +dahlke@ to the conversation for the top level comment from asanka@. > > I think we just need the *count* of new downloads attempted for normal profile. > We don't need to collect size/total bytes written/content-type etc for normal > profile since most of them have caveats of being noisy. > Considering that I am thinking of adding to the Download.Counts metrics one or > two new enum entries, e.g. "started for normal profile", "completed for normal > profile". > I believe that should be enough. dahlke@ - wdyt? Had offline conversation with dahlke@. We need the the total number of new downloads being attempted (and their mime types if possible). I will modify the CL accordingly.
On 2017/03/20 21:21:03, shaktisahu wrote: > On 2017/03/20 20:16:58, shaktisahu wrote: > > +dahlke@ to the conversation for the top level comment from asanka@. > > > > I think we just need the *count* of new downloads attempted for normal > profile. > > We don't need to collect size/total bytes written/content-type etc for normal > > profile since most of them have caveats of being noisy. > > Considering that I am thinking of adding to the Download.Counts metrics one or > > two new enum entries, e.g. "started for normal profile", "completed for normal > > profile". > > I believe that should be enough. dahlke@ - wdyt? > > Had offline conversation with dahlke@. > We need the the total number of new downloads being attempted (and their mime > types if possible). > I will modify the CL accordingly. PTAL.
lgtm % asanka
https://codereview.chromium.org/2758453003/diff/40001/content/browser/downloa... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/40001/content/browser/downloa... content/browser/download/download_item_impl.cc:1163: if (GetWebContents() && For a new single resource download, Init() gets called before Start(). So at this point, the DownloadItemImpl doesn't have a request handle. The latter is what binds the download to a WebContents. I.e. at this point GetWebContents() will always fail. Since you just need to check if this is OTR, you can safely call DownloadItemImpl::GetBrowserContext() during Init(). So your condition would be: if (!GetBrowserContext()->IsOffTheRecord()) Even with this, I'd recommend moving this to Start(). Init() is called for SavePackage downloads for which we are not collecting UMA elsewhere. So your counts here are going to slightly inconsistent with those taken after Start(). https://codereview.chromium.org/2758453003/diff/40001/content/browser/downloa... content/browser/download/download_item_impl.cc:1268: !GetWebContents()->GetBrowserContext()->IsOffTheRecord()) { Call DownloadItemImpl::GetBrowserContext() directly. Here and elsewhere. It's much more reliable. Also, are you okay with this recording the MIME type for all download requests regardless of whether it's a resumption attempt or a new download? You could use (state_ == RESUMING_INTERNAL) to differentiate between the two cases.
And LGTM with the comments addressed and assuming you are okay with the caveats I pointed out.
The CQ bit was checked by shaktisahu@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks asanka@. I made the suggested changes. https://codereview.chromium.org/2758453003/diff/40001/content/browser/downloa... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/40001/content/browser/downloa... content/browser/download/download_item_impl.cc:1163: if (GetWebContents() && On 2017/03/22 21:03:02, asanka wrote: > For a new single resource download, Init() gets called before Start(). So at > this point, the DownloadItemImpl doesn't have a request handle. The latter is > what binds the download to a WebContents. I.e. at this point GetWebContents() > will always fail. > > Since you just need to check if this is OTR, you can safely call > DownloadItemImpl::GetBrowserContext() during Init(). So your condition would be: > > if (!GetBrowserContext()->IsOffTheRecord()) > > Even with this, I'd recommend moving this to Start(). Init() is called for > SavePackage downloads for which we are not collecting UMA elsewhere. So your > counts here are going to slightly inconsistent with those taken after Start(). Done. https://codereview.chromium.org/2758453003/diff/40001/content/browser/downloa... content/browser/download/download_item_impl.cc:1268: !GetWebContents()->GetBrowserContext()->IsOffTheRecord()) { On 2017/03/22 21:03:03, asanka wrote: > Call DownloadItemImpl::GetBrowserContext() directly. Here and elsewhere. It's > much more reliable. > > Also, are you okay with this recording the MIME type for all download requests > regardless of whether it's a resumption attempt or a new download? You could use > (state_ == RESUMING_INTERNAL) to differentiate between the two cases. Done. Thanks. I am interested in new downloads only. INITIAL_INTERNAL should filter that out. Also I moved the START_COUNT metrics from Init() to here. It will be a bit different from historical data since it doesn't include save package data, but that should be for the better I guess :)
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_tsan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by shaktisahu@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_rel_...)
The CQ bit was checked by shaktisahu@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presub...)
The CQ bit was checked by shaktisahu@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1266: RecordDownloadCount(START_COUNT); START_COUNT is probably better if recorded for all regular download requests (i.e. don't filter START_COUNT based on INITIAL_INTERNAL. Instead just record it each time we hit Start()). I think it's OK to move the metric here so that we don't count save package requests assuming that the noise introduced by save package requests was small. However, if we limit it to new downloads only, then we are removing all the resumption attempts from START_COUNT (roughly 2% overall, and 5% change on mobile). This is a larger change that would make it challenging to interpret UMA without carefully filtering out the old and new data. This is likely to confuse lots of folks who look at these stats. One possibility is to add a new count, NEW_DOWNLOAD_COUNT or somesuch that's recorded here. That way it'll be easier to measure our success rate (i.e. COMPLETED / NEW_DOWNLOAD_COUNT or (COMPLETED + CANCELLED) / NEW_DOWNLOAD_COUNT depending on who you ask :-)). CUrrently we have to do some arithmetic to get at success rates. https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1267: RecordDownloadMimeType(mime_type_); Ditto here, but in this case it's probably easier to just change the name of the metric so that old and new data doesn't get mixed. I agree that recording the MIME type here is the right way to go. HTTP 206 responses aren't guaranteed to have the same MIME type as the original response and we don't MIME sniff resumption responses.
https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1266: RecordDownloadCount(START_COUNT); On 2017/03/23 14:34:43, asanka wrote: > START_COUNT is probably better if recorded for all regular download requests > (i.e. don't filter START_COUNT based on INITIAL_INTERNAL. Instead just record it > each time we hit Start()). I think it's OK to move the metric here so that we > don't count save package requests assuming that the noise introduced by save > package requests was small. > > However, if we limit it to new downloads only, then we are removing all the > resumption attempts from START_COUNT (roughly 2% overall, and 5% change on > mobile). This is a larger change that would make it challenging to interpret UMA > without carefully filtering out the old and new data. This is likely to confuse > lots of folks who look at these stats. > > One possibility is to add a new count, NEW_DOWNLOAD_COUNT or somesuch that's > recorded here. That way it'll be easier to measure our success rate (i.e. > COMPLETED / NEW_DOWNLOAD_COUNT or (COMPLETED + CANCELLED) / NEW_DOWNLOAD_COUNT > depending on who you ask :-)). CUrrently we have to do some arithmetic to get at > success rates. Oh, I misunderstood the comments of Init() that |active| is true for new downloads only, but it is not :-) So START_COUNT includes the resumptions already... I will move this line (and the START_COUNT_NORMAL_PROFILE metrics) before the INITIAL_INTERNAL check. Not sure if it is required to add the NEW_DOWNLOAD_COUNT metrics, but sounds like a good idea if there is no such metrics today. How do we calculate the number of new downloads (or number of resumption attempts that you mentioned 2%) today? https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1267: RecordDownloadMimeType(mime_type_); On 2017/03/23 14:34:43, asanka wrote: > Ditto here, but in this case it's probably easier to just change the name of the > metric so that old and new data doesn't get mixed. I agree that recording the > MIME type here is the right way to go. HTTP 206 responses aren't guaranteed to > have the same MIME type as the original response and we don't MIME sniff > resumption responses. Are you saying it is okay to remove the old metric name (Download.ContentType) altogether and replace it by the new name?
https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1266: RecordDownloadCount(START_COUNT); On 2017/03/23 15:33:28, shaktisahu wrote: > On 2017/03/23 14:34:43, asanka wrote: > > START_COUNT is probably better if recorded for all regular download requests > > (i.e. don't filter START_COUNT based on INITIAL_INTERNAL. Instead just record > it > > each time we hit Start()). I think it's OK to move the metric here so that we > > don't count save package requests assuming that the noise introduced by save > > package requests was small. > > > > However, if we limit it to new downloads only, then we are removing all the > > resumption attempts from START_COUNT (roughly 2% overall, and 5% change on > > mobile). This is a larger change that would make it challenging to interpret > UMA > > without carefully filtering out the old and new data. This is likely to > confuse > > lots of folks who look at these stats. > > > > One possibility is to add a new count, NEW_DOWNLOAD_COUNT or somesuch that's > > recorded here. That way it'll be easier to measure our success rate (i.e. > > COMPLETED / NEW_DOWNLOAD_COUNT or (COMPLETED + CANCELLED) / NEW_DOWNLOAD_COUNT > > depending on who you ask :-)). CUrrently we have to do some arithmetic to get > at > > success rates. > > Oh, I misunderstood the comments of Init() that |active| is true for new > downloads only, but it is not :-) > So START_COUNT includes the resumptions already... > I will move this line (and the START_COUNT_NORMAL_PROFILE metrics) before the > INITIAL_INTERNAL check. > Not sure if it is required to add the NEW_DOWNLOAD_COUNT metrics, but sounds > like a good idea if there is no such metrics today. > How do we calculate the number of new downloads (or number of resumption > attempts that you mentioned 2%) today? Count of all resumptions = Count of (Download.Source["initiated by manual resumption"]) + Count of (Download.Source["Initiated by automatic resumption"]) Count of new downloads = Count of (Download.Counts["STARTED"]) - count of all resumptions. https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1267: RecordDownloadMimeType(mime_type_); On 2017/03/23 15:33:28, shaktisahu wrote: > On 2017/03/23 14:34:43, asanka wrote: > > Ditto here, but in this case it's probably easier to just change the name of > the > > metric so that old and new data doesn't get mixed. I agree that recording the > > MIME type here is the right way to go. HTTP 206 responses aren't guaranteed to > > have the same MIME type as the original response and we don't MIME sniff > > resumption responses. > > Are you saying it is okay to remove the old metric name (Download.ContentType) > altogether and replace it by the new name? We need to keep the old one in histograms.xml since older builds of Chrome will continue to report the old metric. It just needs to be annotated as <obsolete> in the xml file. In C++ land, we can change the metrics recording logic to record to a new histogram name.
The CQ bit was checked by shaktisahu@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... File content/browser/download/download_item_impl.cc (right): https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1266: RecordDownloadCount(START_COUNT); On 2017/03/23 16:23:45, asanka wrote: > On 2017/03/23 15:33:28, shaktisahu wrote: > > On 2017/03/23 14:34:43, asanka wrote: > > > START_COUNT is probably better if recorded for all regular download requests > > > (i.e. don't filter START_COUNT based on INITIAL_INTERNAL. Instead just > record > > it > > > each time we hit Start()). I think it's OK to move the metric here so that > we > > > don't count save package requests assuming that the noise introduced by save > > > package requests was small. > > > > > > However, if we limit it to new downloads only, then we are removing all the > > > resumption attempts from START_COUNT (roughly 2% overall, and 5% change on > > > mobile). This is a larger change that would make it challenging to interpret > > UMA > > > without carefully filtering out the old and new data. This is likely to > > confuse > > > lots of folks who look at these stats. > > > > > > One possibility is to add a new count, NEW_DOWNLOAD_COUNT or somesuch that's > > > recorded here. That way it'll be easier to measure our success rate (i.e. > > > COMPLETED / NEW_DOWNLOAD_COUNT or (COMPLETED + CANCELLED) / > NEW_DOWNLOAD_COUNT > > > depending on who you ask :-)). CUrrently we have to do some arithmetic to > get > > at > > > success rates. > > > > Oh, I misunderstood the comments of Init() that |active| is true for new > > downloads only, but it is not :-) > > So START_COUNT includes the resumptions already... > > I will move this line (and the START_COUNT_NORMAL_PROFILE metrics) before the > > INITIAL_INTERNAL check. > > Not sure if it is required to add the NEW_DOWNLOAD_COUNT metrics, but sounds > > like a good idea if there is no such metrics today. > > How do we calculate the number of new downloads (or number of resumption > > attempts that you mentioned 2%) today? > > Count of all resumptions = Count of (Download.Source["initiated by manual > resumption"]) + Count of (Download.Source["Initiated by automatic resumption"]) > > Count of new downloads = Count of (Download.Counts["STARTED"]) - count of all > resumptions. Done. https://codereview.chromium.org/2758453003/diff/100001/content/browser/downlo... content/browser/download/download_item_impl.cc:1267: RecordDownloadMimeType(mime_type_); On 2017/03/23 16:23:45, asanka wrote: > On 2017/03/23 15:33:28, shaktisahu wrote: > > On 2017/03/23 14:34:43, asanka wrote: > > > Ditto here, but in this case it's probably easier to just change the name of > > the > > > metric so that old and new data doesn't get mixed. I agree that recording > the > > > MIME type here is the right way to go. HTTP 206 responses aren't guaranteed > to > > > have the same MIME type as the original response and we don't MIME sniff > > > resumption responses. > > > > Are you saying it is okay to remove the old metric name (Download.ContentType) > > altogether and replace it by the new name? > > We need to keep the old one in histograms.xml since older builds of Chrome will > continue to report the old metric. It just needs to be annotated as <obsolete> > in the xml file. > > In C++ land, we can change the metrics recording logic to record to a new > histogram name. Done.
shaktisahu@chromium.org changed reviewers: + isherman@chromium.org
isherman@ - PTAL
LGTM. Comments are nits. Feel free to ignore :-) https://codereview.chromium.org/2758453003/diff/120001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2758453003/diff/120001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:13221: + Deprecated 03/2017. Add a note here pointing at Download.Start.ContentType as the new hotness. https://codereview.chromium.org/2758453003/diff/120001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:13770: + <summary>Content types of the downloads that are started.</summary> I'd leave out the "that are started" part. In isolation it's confusing. At some point it's probably worth going through and making it explicit that some metrics are per request while some others are per download. The concepts diverged when resumption rolled out and we didn't clean up the descriptions to keep up. My bad. :-(
Metrics LGTM. FWIW, I think we agreed with the privacy team on an email thread that it's equally fine to measure Incognito usage directly or indirectly (and in both cases there's a pretty high bar to meet for gathering Incognito metrics). So, if you'd prefer to structure these metrics as Normal vs. Incognito rather than Normal vs. Total, that would also be ok.
The CQ bit was checked by shaktisahu@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dtrainor@chromium.org, asanka@chromium.org, isherman@chromium.org Link to the patchset: https://codereview.chromium.org/2758453003/#ps140001 (title: "comments")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2017/03/24 00:44:57, Ilya Sherman wrote: > Metrics LGTM. FWIW, I think we agreed with the privacy team on an email thread > that it's equally fine to measure Incognito usage directly or indirectly (and in > both cases there's a pretty high bar to meet for gathering Incognito metrics). > So, if you'd prefer to structure these metrics as Normal vs. Incognito rather > than Normal vs. Total, that would also be ok. Thanks isherman@. Could you forward the email thread to me for reference?
CQ is committing da patch. Bot data: {"patchset_id": 140001, "attempt_start_ts": 1490320437583510, "parent_rev": "343c2e8b5e562909edb991a235eae0926212d019", "commit_rev": "d87debc9b133cb364f9b307d9643ebb7c5bd7fb6"}
Message was sent while issue was closed.
Description was changed from ========== Recording download mime types for normal profile BUG=699807 ========== to ========== Recording download mime types for normal profile BUG=699807 Review-Url: https://codereview.chromium.org/2758453003 Cr-Commit-Position: refs/heads/master@{#459344} Committed: https://chromium.googlesource.com/chromium/src/+/d87debc9b133cb364f9b307d9643... ==========
Message was sent while issue was closed.
Committed patchset #8 (id:140001) as https://chromium.googlesource.com/chromium/src/+/d87debc9b133cb364f9b307d9643... |