|
|
Created:
3 years, 7 months ago by romax Modified:
3 years, 6 months ago Reviewers:
aelias_OOO_until_Jul13, jochen (gone - plz use gerrit), Steven Holte, fgorski, dcheng, qinmin, Pete Williamson, chili, Łukasz Anforowicz CC:
blink-reviews, blink-reviews-api_chromium.org, blink-reviews-frames_chromium.org, chromium-reviews, creis+watch_chromium.org, darin-cc_chromium.org, dcheng, dglazkov+blink, David Trainor- moved to gerrit, jam, kinuko+watch, mlamouri+watch-content_chromium.org, mlamouri+watch-blink_chromium.org, nasko+codewatch_chromium.org Target Ref:
refs/heads/master Project:
chromium Visibility:
Public. |
Description[Offline Pages] Adding missing image/CSS detection in FrameSerializer.
Adding interfaces of problem detectors which are going to be used for UMA
collecting for offline pages.
See go/offline-pages-problems-metric for more information.
BUG=730233
Review-Url: https://codereview.chromium.org/2886943003
Cr-Commit-Position: refs/heads/master@{#479180}
Committed: https://chromium.googlesource.com/chromium/src/+/df3595bb1db14d038879823a24b60e9e284c861c
Patch Set 1 #
Total comments: 6
Patch Set 2 : comments from petewil@ #Patch Set 3 : unittests. #Patch Set 4 : unittests (merged with TOT) #
Total comments: 2
Patch Set 5 : comments from dcheng@. #
Total comments: 5
Patch Set 6 : Added missing call to VisitingDone, and fix merge. #Patch Set 7 : Remove interfaces and insert code into iteration. #Patch Set 8 : Remove interfaces and add code into iteration. #
Total comments: 3
Patch Set 9 : comments from pete #Patch Set 10 : reducing the number of buckets of percentage to 21. #
Total comments: 17
Patch Set 11 : Addressed comments from lucasza@ and holte@. #
Total comments: 2
Patch Set 12 : more comments. #
Total comments: 2
Patch Set 13 : Change to linear histogram. #Patch Set 14 : fixing build. #Patch Set 15 : Fixing hitting DCHECK. #Patch Set 16 : fix tests. #Patch Set 17 : fix tests. #Patch Set 18 : trying fix test and minor bugs. #Patch Set 19 : finally fixed. #Messages
Total messages: 79 (46 generated)
romax@chromium.org changed reviewers: + chili@chromium.org, fgorski@chromium.org, petewil@chromium.org
This is the drafty CL I mentioned during the BlankPageDetector sync-up. Any comments are welcomed! PTAL, thanks!
Looks pretty good, a few comments. https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebFrameSerializer.cpp:120: std::unique_ptr<BlankPageDetector> blank_page_detector( Let's get rid of the blank page detector for now, we may implement it elsewhere (perhaps on the browser side, perhaps on load instead of on save). https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebPageProblemDetector.cpp (right): https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebPageProblemDetector.cpp:56: void MissingImageDetector::VisitNode(const Node* node) {} Let's move these into their own cpp files. https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebPageProblemDetector.h (right): https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebPageProblemDetector.h:79: class MissingImageDetector : public WebPageProblemDetector { Let's move this and the CSS detector into their own header files (and remove the blank page detector class).
Removed BlankPageDetector. Moved MissingImage/CSS detector to separate files. PTAnL, thanks. https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebFrameSerializer.cpp:120: std::unique_ptr<BlankPageDetector> blank_page_detector( On 2017/05/17 00:31:48, Pete Williamson wrote: > Let's get rid of the blank page detector for now, we may implement it elsewhere > (perhaps on the browser side, perhaps on load instead of on save). Done. https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebPageProblemDetector.cpp (right): https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebPageProblemDetector.cpp:56: void MissingImageDetector::VisitNode(const Node* node) {} On 2017/05/17 00:31:48, Pete Williamson wrote: > Let's move these into their own cpp files. Done. https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebPageProblemDetector.h (right): https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebPageProblemDetector.h:79: class MissingImageDetector : public WebPageProblemDetector { On 2017/05/17 00:31:48, Pete Williamson wrote: > Let's move this and the CSS detector into their own header files (and remove the > blank page detector class). Done.
On 2017/05/17 22:28:19, romax wrote: > Removed BlankPageDetector. > Moved MissingImage/CSS detector to separate files. > PTAnL, thanks. > > https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... > File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): > > https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... > third_party/WebKit/Source/web/WebFrameSerializer.cpp:120: > std::unique_ptr<BlankPageDetector> blank_page_detector( > On 2017/05/17 00:31:48, Pete Williamson wrote: > > Let's get rid of the blank page detector for now, we may implement it > elsewhere > > (perhaps on the browser side, perhaps on load instead of on save). > > Done. > > https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... > File third_party/WebKit/Source/web/WebPageProblemDetector.cpp (right): > > https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... > third_party/WebKit/Source/web/WebPageProblemDetector.cpp:56: void > MissingImageDetector::VisitNode(const Node* node) {} > On 2017/05/17 00:31:48, Pete Williamson wrote: > > Let's move these into their own cpp files. > > Done. > > https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... > File third_party/WebKit/Source/web/WebPageProblemDetector.h (right): > > https://codereview.chromium.org/2886943003/diff/1/third_party/WebKit/Source/w... > third_party/WebKit/Source/web/WebPageProblemDetector.h:79: class > MissingImageDetector : public WebPageProblemDetector { > On 2017/05/17 00:31:48, Pete Williamson wrote: > > Let's move this and the CSS detector into their own header files (and remove > the > > blank page detector class). > > Done. Looks good overall. Can we add some unit tests?
Added 2 unit tests for the collection. PTAL! If there's no major problem then I'll send this for OWNER review. Thanks!
Description was changed from ========== [DRAFT][Offline Pages] Adding base interfaces for Problem Detectors. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. BUG=NONE ========== to ========== [Offline Pages] Adding base interfaces for Problem Detectors. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. BUG=NONE ==========
lgtm
romax@chromium.org changed reviewers: + dcheng@chromium.org, dglazkov@chromium.org, qinmin@chromium.org
qinmin@chromium.org: Please review changes in content/browser/download dglazkov@chromium.org: Please review changes in content/renderer and third_party/WebKit/ dcheng@chromium.org: Please review content/common/frame_messages.h chili@, fgorski@: FYI The design doc is at: go/offline-pages-problems-metric in case you need more context. PTAL, thanks!
Description was changed from ========== [Offline Pages] Adding base interfaces for Problem Detectors. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. BUG=NONE ========== to ========== [Offline Pages] Adding base interfaces for Problem Detectors. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. See go/offline-pages-problems-metric for more information. BUG=NONE ==========
Can you give comment access to the design doc? https://codereview.chromium.org/2886943003/diff/60001/third_party/WebKit/Sour... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/web/WebFrameSerializer.cpp:123: new MissingImageDetector()); Nit: prefer auto missing_image_detector = WTF::MakeUnique<...>(...); Even better: problem_detectors_.AddDetector( WTF::MakeUnique<MissingCSSDetector>()); problem_detectors_.AddDetector( WTF::MakeUnique<MissingImageDetector>());
Just changed the doc to give comment access. Sorry about that.
romax@chromium.org changed reviewers: - dglazkov@chromium.org
romax@chromium.org changed reviewers: + aelias@chromium.org
Addressed dcheng@'s comment, PTAnL. also -dglazkov@. +aelias@ for content/renderer change. dcheng@, please review WebKit/ as well. Thanks! https://codereview.chromium.org/2886943003/diff/60001/third_party/WebKit/Sour... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/60001/third_party/WebKit/Sour... third_party/WebKit/Source/web/WebFrameSerializer.cpp:123: new MissingImageDetector()); On 2017/05/23 05:52:41, dcheng wrote: > Nit: prefer auto missing_image_detector = WTF::MakeUnique<...>(...); > > Even better: > > problem_detectors_.AddDetector( > WTF::MakeUnique<MissingCSSDetector>()); > problem_detectors_.AddDetector( > WTF::MakeUnique<MissingImageDetector>()); Done. https://codereview.chromium.org/2886943003/diff/80001/content/browser/downloa... File content/browser/download/mhtml_generation_manager.cc (right): https://codereview.chromium.org/2886943003/diff/80001/content/browser/downloa... content/browser/download/mhtml_generation_manager.cc:613: << file_path.value(); 'git cl format' change.
Mostly looks OK. Please solve the visiting done call and add a corresponding bug. https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:304: delegate_.VisitNode(node); where is the corresponding call to visiting done? https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/web/WebFrameSerializer.cpp:102: void VisitNode(const Node*) override; should we also have visiting done here?
content/browser/download lgtm
content/renderer lgtm
addressed comments from fgorski@. dcheng@ could you please take another look? Happy to chat if you have any concerns. Thanks! https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:304: delegate_.VisitNode(node); On 2017/05/30 17:37:50, fgorski wrote: > where is the corresponding call to visiting done? Done. https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/80001/third_party/WebKit/Sour... third_party/WebKit/Source/web/WebFrameSerializer.cpp:102: void VisitNode(const Node*) override; On 2017/05/30 17:37:50, fgorski wrote: > should we also have visiting done here? Done.
I like that it's easy to detect serialization issues--but it seems unnecessary to visit every single node for these. AFAIK, we already have special handling for images/CSS--maybe it would be simpler just to have that logic inline there? WDYT?
A few comments on the new approach. https://codereview.chromium.org/2886943003/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:329: missing_image_count++; I think it is better to count requested vs completed images, as opposed to total and missing - the concepts are a bit more concrete, and should make maintenance a bit easier. Same comment for CSS below. https://codereview.chromium.org/2886943003/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:365: "PageSerialization.ProblemDetection.MissingImagePercentage", I think that stating this in the positive will be slightly easier to understand and maintain - CompletedImagePercentage vs MissingImagePercentage. Then the calculation would be (completed_images/requested_images) * 100 Same for CSS. https://codereview.chromium.org/2886943003/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:367: UMA_HISTOGRAM_COUNTS_100( Is 100 buckets too granular? Maybe about 20 buckets, each 5% would be easier to reason about, and easier to read in the UMA histogram data. One bucket should be exactly 0 completion%, one exactly 100% completion, and others in between running about 5% of the completion range each.
romax@chromium.org changed reviewers: + holte@google.com
Removed the interfaces and added the logic in the iteration of nodes. PTAnL. +holte@ Please review histograms.xml Thanks!
Description was changed from ========== [Offline Pages] Adding base interfaces for Problem Detectors. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. See go/offline-pages-problems-metric for more information. BUG=NONE ========== to ========== [Offline Pages] Adding missing image/CSS detection in FrameSerializer. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. See go/offline-pages-problems-metric for more information. BUG=NONE ==========
Description was changed from ========== [Offline Pages] Adding missing image/CSS detection in FrameSerializer. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. See go/offline-pages-problems-metric for more information. BUG=NONE ========== to ========== [Offline Pages] Adding missing image/CSS detection in FrameSerializer. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. See go/offline-pages-problems-metric for more information. BUG=730233 ==========
dcheng@chromium.org changed reviewers: + lukasza@chromium.org
+lukasza, do you mind reviewing this and I'll rubberstamp?
This seems like useful data to collect - thanks! I've just had a few comments and questions below. https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:307: delegate_.ShouldCollectProblemMetric() && frame.IsMainFrame(); You are obviously in a better position than me to decide whether you need data for subframes, but it seems weird to me that we don't care if a CSS or image is missing in a subframe. FWIW, I've also commented in the design doc. https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:328: if (image_element.complete()) I see that FrameSerializer::AddImageToResources checks whether to serialize the image, by looking at: |!image || !image->HasImage() || image->ErrorOccurred() || !ShouldAddURL(url)|. Will HTMLImageElement::complete always be in-sync with the checks done by AddImageToResources? If not, then maybe we can consider moving the counting logic to the body of AddImageToResources? (my working mental model for these metrics is that we want to count how many MHTML parts containing images we generate VS how many such parts we would have generated if we could wait for all images to finish loading; please shout if this is not the right model). https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:330: } Have you thought about avoiding counting resources skipped by |delegate_.ShouldSkipResourceWithURL(...)|? Could you please comment on that? I think that if we start gathering data for both main frame and for subframes, then avoiding counting of skipping resources would be required to avoid double-counting resources that are included from more than 1 frame. https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:357: loaded_css_count++; "loading" vs "loaded" seems out of sync - can you double-check and confirm that this shouldn't be |!sheet->IsLoading()|? Also - I assume that you've manually tested the new metrics, as described in https://chromium.googlesource.com/chromium/src.git/+/HEAD/tools/metrics/histo... ? https://codereview.chromium.org/2886943003/diff/180001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2886943003/diff/180001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:50063: + Percentage of loaded images in the main frame at the time of serialization. It seems that "loaded" (above) vs "missing" (in the name of the histogram) are in conflict.
holte@chromium.org changed reviewers: + holte@chromium.org
https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:368: static_cast<int64_t>(loaded_image_count * 100 / total_image_count), 21); You are using 21 buckets, but these buckets are not 5% wide, you are just going to be recording everything over 20% into the overflow bucket, which doesn't seem like what you want. To get 5% wide buckets, you probably need to use base::LinearHistogram::FactoryGet, or UMA_HISTOGRAM_CUSTOM_ENUMERATION.
On 2017/06/07 22:36:42, Steven Holte wrote: > https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... > File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): > > https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... > third_party/WebKit/Source/core/frame/FrameSerializer.cpp:368: > static_cast<int64_t>(loaded_image_count * 100 / total_image_count), 21); > You are using 21 buckets, but these buckets are not 5% wide, you are just going > to be recording everything over 20% into the overflow bucket, which doesn't seem > like what you want. To get 5% wide buckets, you probably need to use > base::LinearHistogram::FactoryGet, or UMA_HISTOGRAM_CUSTOM_ENUMERATION. BTW: I wonder how hard it would be to add some base::HistogramTester-based test assertions to a small subset of the existing mhtml tests? SavePageSitePerProcessBrowserTest.SaveAsMHTML in particular might be useful for getting coverage for deduping the counts (based on the fact that this test was tweaked when adding mhtml part deduping logic in https://codereview.chromium.org/1417323006/patch/300001/310001). Disclaimer: I don't have much experience using base::HistogramTester (and in particular don't know if it is useful for testing UMA logged from a renderer process).
romax@chromium.org changed reviewers: - holte@google.com
Addressed comments, PTAnL. Thanks! https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:307: delegate_.ShouldCollectProblemMetric() && frame.IsMainFrame(); On 2017/06/07 21:11:35, Łukasz A. wrote: > You are obviously in a better position than me to decide whether you need data > for subframes, but it seems weird to me that we don't care if a CSS or image is > missing in a subframe. FWIW, I've also commented in the design doc. I think it was mainly because in the offline page cases we care more about main frame, and the subframes/iframes are mostly useless/ads, as commented by petewil@ in the doc. So i'd like to keep the main frame check. Is it reasonable to you? https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:328: if (image_element.complete()) On 2017/06/07 21:11:35, Łukasz A. wrote: > I see that FrameSerializer::AddImageToResources checks whether to serialize the > image, by looking at: > |!image || !image->HasImage() || image->ErrorOccurred() || !ShouldAddURL(url)|. > Will HTMLImageElement::complete always be in-sync with the checks done by > AddImageToResources? If not, then maybe we can consider moving the counting > logic to the body of AddImageToResources? > > (my working mental model for these metrics is that we want to count how many > MHTML parts containing images we generate VS how many such parts we would have > generated if we could wait for all images to finish loading; please shout if > this is not the right model). Done. Your mental model is correct. And I wasn't sure if ImageElement::complete() is the right choice. I think moving the logic into the sub-function definitely makes more sense. https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:330: } On 2017/06/07 21:11:35, Łukasz A. wrote: > Have you thought about avoiding counting resources skipped by > |delegate_.ShouldSkipResourceWithURL(...)|? Could you please comment on that? > I think that if we start gathering data for both main frame and for subframes, > then avoiding counting of skipping resources would be required to avoid > double-counting resources that are included from more than 1 frame. I didn't think about it before seeing comments. And it seems like after moving the count logic into AddImageToResources and SerializeCSSStyleSheet, the resources which should be skipped will not be counted, so I think we're good in this case. https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:357: loaded_css_count++; On 2017/06/07 21:11:35, Łukasz A. wrote: > "loading" vs "loaded" seems out of sync - can you double-check and confirm that > this shouldn't be |!sheet->IsLoading()|? > > Also - I assume that you've manually tested the new metrics, as described in > https://chromium.googlesource.com/chromium/src.git/+/HEAD/tools/metrics/histo... > ? Done. https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:368: static_cast<int64_t>(loaded_image_count * 100 / total_image_count), 21); On 2017/06/07 22:36:42, Steven Holte wrote: > You are using 21 buckets, but these buckets are not 5% wide, you are just going > to be recording everything over 20% into the overflow bucket, which doesn't seem > like what you want. To get 5% wide buckets, you probably need to use > base::LinearHistogram::FactoryGet, or UMA_HISTOGRAM_CUSTOM_ENUMERATION. Done. Changed to CustomCountHistogram in Blink, since classes in base:: cannot be used directly in Blink::Core. However i'm not sure if the change will split the bucket into 5% ones. Otherwise I'll have to either change to 1% buckets using HISTOGRAM_PERCENTAGE, or adding the linear histograms in Blink. Am I missing anything? https://codereview.chromium.org/2886943003/diff/180001/tools/metrics/histogra... File tools/metrics/histograms/histograms.xml (right): https://codereview.chromium.org/2886943003/diff/180001/tools/metrics/histogra... tools/metrics/histograms/histograms.xml:50063: + Percentage of loaded images in the main frame at the time of serialization. On 2017/06/07 21:11:35, Łukasz A. wrote: > It seems that "loaded" (above) vs "missing" (in the name of the histogram) are > in conflict. Done.
https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:307: delegate_.ShouldCollectProblemMetric() && frame.IsMainFrame(); On 2017/06/08 02:25:22, romax wrote: > On 2017/06/07 21:11:35, Łukasz A. wrote: > > You are obviously in a better position than me to decide whether you need data > > for subframes, but it seems weird to me that we don't care if a CSS or image > is > > missing in a subframe. FWIW, I've also commented in the design doc. > > I think it was mainly because in the offline page cases we care more about main > frame, and the subframes/iframes are mostly useless/ads, as commented by > petewil@ in the doc. So i'd like to keep the main frame check. Is it reasonable > to you? Sounds good. One final thought - how would the names of the metrics look like if (hypothetically) we wanted to add tracking of 1) subframe problems or 2) overall problems? One option would be to have PageSerialization.ProblemDetection.TotalImageCount.AllFrames, and PageSerialization.ProblemDetection.TotalImageCount.Subframes in the future. Another option would be to insert "MainFrame" substring somewhere in the name of the metrics being introduced in the current CL (e.g. maybe PageSerialization.ProblemDetection.MainFrame.TotalImageCount?). https://codereview.chromium.org/2886943003/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:521: total_image_count_++; I wonder how the data will be calculated in case of an html with the same image appearing twice (e.g. <img src="same-uri.jpg"> <img src="same-uri.jpg">). AFAIK the second occurence of the image will exit early via ShouldAddURL, and therefore I think this might double-count |total_image_count_|, while still correctly counting |loaded_image_count_| (and therefore artificially reporting a problem rate of 50% in this example). Does the concern above make sense? Maybe this can be solved by moving processing of |total_xxx_count_| below ShouldAddURL (here and for css)?
thanks for the detailed comments, replied and addressed in the latest patch. PTAL. Thanks! https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:307: delegate_.ShouldCollectProblemMetric() && frame.IsMainFrame(); On 2017/06/08 16:08:45, Łukasz A. wrote: > On 2017/06/08 02:25:22, romax wrote: > > On 2017/06/07 21:11:35, Łukasz A. wrote: > > > You are obviously in a better position than me to decide whether you need > data > > > for subframes, but it seems weird to me that we don't care if a CSS or image > > is > > > missing in a subframe. FWIW, I've also commented in the design doc. > > > > I think it was mainly because in the offline page cases we care more about > main > > frame, and the subframes/iframes are mostly useless/ads, as commented by > > petewil@ in the doc. So i'd like to keep the main frame check. Is it > reasonable > > to you? > > Sounds good. One final thought - how would the names of the metrics look like > if (hypothetically) we wanted to add tracking of 1) subframe problems or 2) > overall problems? One option would be to have > PageSerialization.ProblemDetection.TotalImageCount.AllFrames, and > PageSerialization.ProblemDetection.TotalImageCount.Subframes in the future. > Another option would be to insert "MainFrame" substring somewhere in the name of > the metrics being introduced in the current CL (e.g. maybe > PageSerialization.ProblemDetection.MainFrame.TotalImageCount?). Adding suffixes sounds good to me, and it is something that can be added later when we decide to record counting for sub frames. Other than ordering there doesn't seem to have much difference. Or do you want me to add the suffix right now? https://codereview.chromium.org/2886943003/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:521: total_image_count_++; On 2017/06/08 16:08:45, Łukasz A. wrote: > I wonder how the data will be calculated in case of an html with the same image > appearing twice (e.g. <img src="same-uri.jpg"> <img src="same-uri.jpg">). AFAIK > the second occurence of the image will exit early via ShouldAddURL, and > therefore I think this might double-count |total_image_count_|, while still > correctly counting |loaded_image_count_| (and therefore artificially reporting a > problem rate of 50% in this example). > > Does the concern above make sense? Maybe this can be solved by moving > processing of |total_xxx_count_| below ShouldAddURL (here and for css)? Sorry for lacking knowledge about processing HTMLs... I think your concern totally makes sense. I've separated the check which contains ShouldAddURL since image->ErrorOccurred() means it's loaded but failed to decode/load. I think the code after change is a little bit ugly, but that's what I can do I guess (without changing too much code). And for the CSS one, seems like moving the total_count++ below the check makes sense.
https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:368: static_cast<int64_t>(loaded_image_count * 100 / total_image_count), 21); On 2017/06/08 02:25:22, romax wrote: > On 2017/06/07 22:36:42, Steven Holte wrote: > > You are using 21 buckets, but these buckets are not 5% wide, you are just > going > > to be recording everything over 20% into the overflow bucket, which doesn't > seem > > like what you want. To get 5% wide buckets, you probably need to use > > base::LinearHistogram::FactoryGet, or UMA_HISTOGRAM_CUSTOM_ENUMERATION. > > Done. > Changed to CustomCountHistogram in Blink, since classes in base:: cannot be used > directly in Blink::Core. However i'm not sure if the change will split the > bucket into 5% ones. Otherwise I'll have to either change to 1% buckets using > HISTOGRAM_PERCENTAGE, or adding the linear histograms in Blink. Am I missing > anything? CustomCountHistogram is also not going to give you 5% wide histograms. It base::Histogram, which uses exponential spacing, instead of linear spacing. Another option is to just record the ventiles instead of percentiles with EXACT_LINEAR (*20 instead of *100), but either of the options you suggested seem fine to me too.
lgtm (with an optional question/comment below). https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:307: delegate_.ShouldCollectProblemMetric() && frame.IsMainFrame(); On 2017/06/08 20:02:07, romax wrote: > On 2017/06/08 16:08:45, Łukasz A. wrote: > > On 2017/06/08 02:25:22, romax wrote: > > > On 2017/06/07 21:11:35, Łukasz A. wrote: > > > > You are obviously in a better position than me to decide whether you need > > data > > > > for subframes, but it seems weird to me that we don't care if a CSS or > image > > > is > > > > missing in a subframe. FWIW, I've also commented in the design doc. > > > > > > I think it was mainly because in the offline page cases we care more about > > main > > > frame, and the subframes/iframes are mostly useless/ads, as commented by > > > petewil@ in the doc. So i'd like to keep the main frame check. Is it > > reasonable > > > to you? > > > > Sounds good. One final thought - how would the names of the metrics look like > > if (hypothetically) we wanted to add tracking of 1) subframe problems or 2) > > overall problems? One option would be to have > > PageSerialization.ProblemDetection.TotalImageCount.AllFrames, and > > PageSerialization.ProblemDetection.TotalImageCount.Subframes in the future. > > Another option would be to insert "MainFrame" substring somewhere in the name > of > > the metrics being introduced in the current CL (e.g. maybe > > PageSerialization.ProblemDetection.MainFrame.TotalImageCount?). > > Adding suffixes sounds good to me, and it is something that can be added later > when we decide to record counting for sub frames. Other than ordering there > doesn't seem to have much difference. Or do you want me to add the suffix right > now? I don't have a strong opinion - I am fine with whatever approach works for you. https://codereview.chromium.org/2886943003/diff/220001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/220001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:392: if (should_collect_problem_metric_) { Should this say if (should_collect_prolem_metric && !is_inline_css) ? This question assumes that you are interested in success rate for external resources, not for style sheets embedded directly in a html document (i.e. inside <style> tags). (sorry for not catching this in the earlier round of reviews :-/)
RS LGTM with lukasza's comments addressed
romax@chromium.org changed reviewers: + jochen@chromium.org
Addressed the last comment from lukasza@. Also added the linear histogram methods in WebKit/Platform in order to make 5% buckets to report. holte@ PTAL the usage of linear histograms, +jochen@ PTAL WebKit/Source/platform Thanks! https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:368: static_cast<int64_t>(loaded_image_count * 100 / total_image_count), 21); On 2017/06/08 20:43:00, Steven Holte wrote: > On 2017/06/08 02:25:22, romax wrote: > > On 2017/06/07 22:36:42, Steven Holte wrote: > > > You are using 21 buckets, but these buckets are not 5% wide, you are just > > going > > > to be recording everything over 20% into the overflow bucket, which doesn't > > seem > > > like what you want. To get 5% wide buckets, you probably need to use > > > base::LinearHistogram::FactoryGet, or UMA_HISTOGRAM_CUSTOM_ENUMERATION. > > > > Done. > > Changed to CustomCountHistogram in Blink, since classes in base:: cannot be > used > > directly in Blink::Core. However i'm not sure if the change will split the > > bucket into 5% ones. Otherwise I'll have to either change to 1% buckets using > > HISTOGRAM_PERCENTAGE, or adding the linear histograms in Blink. Am I missing > > anything? > > CustomCountHistogram is also not going to give you 5% wide histograms. It > base::Histogram, which uses exponential spacing, instead of linear spacing. > Another option is to just record the ventiles instead of percentiles with > EXACT_LINEAR (*20 instead of *100), but either of the options you suggested seem > fine to me too. > > I've added the LinearHistogram in Webkit/Platforms, so i'm using the linear histogram with 21 buckets(min/max: 1/100) which I think is good for the use case. https://codereview.chromium.org/2886943003/diff/220001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2886943003/diff/220001/third_party/WebKit/Sou... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:392: if (should_collect_problem_metric_) { On 2017/06/08 20:48:54, Łukasz A. wrote: > Should this say > if (should_collect_prolem_metric && !is_inline_css) > ? This question assumes that you are interested in success rate for external > resources, not for style sheets embedded directly in a html document (i.e. > inside <style> tags). > > (sorry for not catching this in the earlier round of reviews :-/) Done.
histogram linear use lgtm
The CQ bit was checked by romax@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_compile_dbg_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by romax@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
The CQ bit was checked by romax@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by romax@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_android_rel_ng on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/linux_androi...)
lgtm
The CQ bit was checked by romax@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by romax@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: android_n5x_swarming_rel on master.tryserver.chromium.android (JOB_FAILED, https://build.chromium.org/p/tryserver.chromium.android/builders/android_n5x_...)
The CQ bit was checked by romax@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Serialization changes still lgtm - thanks!
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by romax@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from petewil@chromium.org, aelias@chromium.org, qinmin@chromium.org, dcheng@chromium.org, holte@chromium.org, jochen@chromium.org Link to the patchset: https://codereview.chromium.org/2886943003/#ps360001 (title: "finally fixed.")
CQ is trying da patch. Follow status at: https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 360001, "attempt_start_ts": 1497392422938160, "parent_rev": "7938f5d3da218f7d59feddd5cebda6079fd8f09b", "commit_rev": "def15aeecaa2189243abff33cc582e8443e4d6c8"}
CQ is committing da patch. Bot data: {"patchset_id": 360001, "attempt_start_ts": 1497392422938160, "parent_rev": "35001f810af1f93e170667b570b6f38f9f358ad6", "commit_rev": "df3595bb1db14d038879823a24b60e9e284c861c"}
Message was sent while issue was closed.
Description was changed from ========== [Offline Pages] Adding missing image/CSS detection in FrameSerializer. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. See go/offline-pages-problems-metric for more information. BUG=730233 ========== to ========== [Offline Pages] Adding missing image/CSS detection in FrameSerializer. Adding interfaces of problem detectors which are going to be used for UMA collecting for offline pages. See go/offline-pages-problems-metric for more information. BUG=730233 Review-Url: https://codereview.chromium.org/2886943003 Cr-Commit-Position: refs/heads/master@{#479180} Committed: https://chromium.googlesource.com/chromium/src/+/df3595bb1db14d038879823a24b6... ==========
Message was sent while issue was closed.
Committed patchset #19 (id:360001) as https://chromium.googlesource.com/chromium/src/+/df3595bb1db14d038879823a24b6... |