Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(33)

Issue 1411603004: Discard hidden articles when using fast path (Closed)

Created:
5 years, 2 months ago by marcelorcorrea
Modified:
4 years, 6 months ago
Reviewers:
mdjones, wychen
CC:
mlongaray, dalmirsilva
Base URL:
https://github.com/chromium/dom-distiller.git@master
Target Ref:
refs/heads/master
Visibility:
Public.

Description

Discard hidden articles when using fast path Dom Distiller returns an empty content string for certain webpages. The shortcut logic that Dom Distiller uses when some pages have one or more <article> or <element itemscope itemtype="Article"> can lead to erroneous content. Some web pages have hidden articles, and these are being considered as valid elements. In order to fix this, when Dom Distiller is searching for article tags, it will now discard hidden article elements. Contributors=marcelorcorrea@hp.com, mlongaray@hp.com BUG=544962, 616954 R=mdjones@chromium.org, wychen@chromium.org Committed: 54d05ba208e431963af489785edf036f9fda3cb7

Patch Set 1 #

Total comments: 1

Patch Set 2 : Fixed inconsistent indentation #

Total comments: 5

Patch Set 3 : Comments addressed #

Total comments: 1

Patch Set 4 : Fixed indentation. #

Patch Set 5 : Added accurate visibility test #

Total comments: 1

Patch Set 6 : wychen's comments addressed #

Total comments: 5

Patch Set 7 : Comments addressed & master rebased #

Total comments: 18

Patch Set 8 : Method and unit tests names changed #

Total comments: 7

Patch Set 9 : isVisibleByOffset() method improved. #

Total comments: 10

Patch Set 10 : nit fixed #

Total comments: 1

Patch Set 11 : nitpick fixed 2 #

Total comments: 2

Patch Set 12 : nitpick 3 fixed #

Total comments: 4

Patch Set 13 : nit fixed 4 #

Unified diffs Side-by-side diffs Delta from patch set Stats (+277 lines, -117 lines) Patch
M java/org/chromium/distiller/ContentExtractor.java View 1 2 3 4 5 1 chunk +1 line, -23 lines 0 comments Download
M java/org/chromium/distiller/DomUtil.java View 1 2 3 4 5 6 7 8 9 10 11 2 chunks +56 lines, -5 lines 0 comments Download
M javatests/org/chromium/distiller/ContentExtractorTest.java View 1 2 3 4 5 6 1 chunk +0 lines, -85 lines 0 comments Download
M javatests/org/chromium/distiller/DomUtilTest.java View 1 2 3 4 5 6 7 8 9 10 11 12 4 chunks +211 lines, -4 lines 0 comments Download
M javatests/org/chromium/distiller/TestUtil.java View 2 chunks +9 lines, -0 lines 0 comments Download

Messages

Total messages: 50 (4 generated)
marcelorcorrea
5 years, 2 months ago (2015-10-21 16:31:34 UTC) #1
wychen
Thanks for the patch. Can you provide some example web sites that have hidden articles? ...
5 years, 2 months ago (2015-10-21 17:21:20 UTC) #2
marcelorcorrea
On 2015/10/21 17:21:20, wychen wrote: > Thanks for the patch. Can you provide some example ...
5 years, 2 months ago (2015-10-21 17:33:13 UTC) #3
mdjones
https://codereview.chromium.org/1411603004/diff/20001/java/org/chromium/distiller/ContentExtractor.java File java/org/chromium/distiller/ContentExtractor.java (right): https://codereview.chromium.org/1411603004/diff/20001/java/org/chromium/distiller/ContentExtractor.java#newcode194 java/org/chromium/distiller/ContentExtractor.java:194: for (int i = 0; i < nodeList.getLength(); i ...
5 years, 2 months ago (2015-10-21 18:15:03 UTC) #4
marcelorcorrea
comments addressed
5 years, 2 months ago (2015-10-21 18:46:11 UTC) #5
wychen
https://codereview.chromium.org/1411603004/diff/40001/javatests/org/chromium/distiller/ContentExtractorTest.java File javatests/org/chromium/distiller/ContentExtractorTest.java (right): https://codereview.chromium.org/1411603004/diff/40001/javatests/org/chromium/distiller/ContentExtractorTest.java#newcode654 javatests/org/chromium/distiller/ContentExtractorTest.java:654: "<h1>" + CONTENT_TEXT + "</h1>" + nit: You could ...
5 years, 2 months ago (2015-10-21 21:00:46 UTC) #6
marcelorcorrea
Fixed. I'm very sorry for the indentation issues. :\
5 years, 2 months ago (2015-10-22 12:29:39 UTC) #7
wychen
On 2015/10/21 17:33:13, marcelorcorrea wrote: > On 2015/10/21 17:21:20, wychen wrote: > > Thanks for ...
5 years, 1 month ago (2015-10-27 23:15:12 UTC) #8
wychen
On 2015/10/27 23:15:12, wychen wrote: > The isVisible() checking is fast but not accurate. For ...
5 years, 1 month ago (2015-10-27 23:54:47 UTC) #9
marcelorcorrea
Hello Wychen. I've never seen this case you mentioned, but I think this is something ...
5 years, 1 month ago (2015-10-28 20:33:33 UTC) #10
wychen
On 2015/10/28 20:33:33, marcelorcorrea wrote: > Hello Wychen. > > I've never seen this case ...
5 years, 1 month ago (2015-10-28 21:24:05 UTC) #11
marcelorcorrea
Actually I tried to add the offsetWidth/Height check into the isVisible() method, but it broke ...
5 years, 1 month ago (2015-10-29 15:36:58 UTC) #12
wychen
On 2015/10/29 15:36:58, marcelorcorrea wrote: > Actually I tried to add the offsetWidth/Height check into ...
5 years, 1 month ago (2015-11-13 06:58:52 UTC) #13
wychen
https://codereview.chromium.org/1411603004/diff/80001/javatests/org/chromium/distiller/ContentExtractorTest.java File javatests/org/chromium/distiller/ContentExtractorTest.java (right): https://codereview.chromium.org/1411603004/diff/80001/javatests/org/chromium/distiller/ContentExtractorTest.java#newcode499 javatests/org/chromium/distiller/ContentExtractorTest.java:499: public void testOnlyProcessArticleElement() { It might be better to ...
5 years, 1 month ago (2015-11-13 07:10:57 UTC) #14
marcelorcorrea
getArticleElement() method moved to DomUtil class.
5 years, 1 month ago (2015-11-13 12:50:04 UTC) #15
wychen
https://codereview.chromium.org/1411603004/diff/100001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/100001/java/org/chromium/distiller/DomUtil.java#newcode109 java/org/chromium/distiller/DomUtil.java:109: return !(e.getOffsetHeight() <= 0 || e.getOffsetWidth() <= 0); I ...
5 years ago (2015-12-10 23:33:37 UTC) #16
marcelorcorrea
Hello wychen That CL is not on master branch yet, can you push it so ...
5 years ago (2015-12-11 12:46:16 UTC) #17
marcelorcorrea
Hello, I've applied your solution manually, tested our patch and it worked as expected. So, ...
4 years, 11 months ago (2016-01-06 12:50:01 UTC) #18
mlongaray
Hi there, We have a pending CL here that's being blocked by another one (https://codereview.chromium.org/1508963003/). ...
4 years, 8 months ago (2016-04-18 14:15:10 UTC) #19
wychen
On 2016/04/18 14:15:10, mlongaray wrote: > Hi there, > We have a pending CL here ...
4 years, 6 months ago (2016-06-01 08:46:20 UTC) #20
wychen
https://codereview.chromium.org/1411603004/diff/100001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/100001/java/org/chromium/distiller/DomUtil.java#newcode109 java/org/chromium/distiller/DomUtil.java:109: return !(e.getOffsetHeight() <= 0 || e.getOffsetWidth() <= 0); On ...
4 years, 6 months ago (2016-06-01 08:46:30 UTC) #21
wychen
It's a bit hard to review if rebasing and intended modifications are mixed up.
4 years, 6 months ago (2016-06-01 21:18:14 UTC) #22
marcelorcorrea
On 2016/06/01 08:46:30, wychen wrote: > https://codereview.chromium.org/1411603004/diff/100001/java/org/chromium/distiller/DomUtil.java > File java/org/chromium/distiller/DomUtil.java (right): > > https://codereview.chromium.org/1411603004/diff/100001/java/org/chromium/distiller/DomUtil.java#newcode109 > ...
4 years, 6 months ago (2016-06-01 21:30:39 UTC) #23
marcelorcorrea
On 2016/06/01 21:18:14, wychen wrote: > It's a bit hard to review if rebasing and ...
4 years, 6 months ago (2016-06-01 21:35:39 UTC) #24
wychen
On 2016/06/01 21:35:39, marcelorcorrea wrote: > On 2016/06/01 21:18:14, wychen wrote: > > It's a ...
4 years, 6 months ago (2016-06-01 21:55:49 UTC) #25
marcelorcorrea
On 2016/06/01 21:55:49, wychen wrote: > On 2016/06/01 21:35:39, marcelorcorrea wrote: > > On 2016/06/01 ...
4 years, 6 months ago (2016-06-01 21:58:17 UTC) #26
wychen
On 2016/06/01 21:30:39, marcelorcorrea wrote: > However, I'm not sure I understood what you said ...
4 years, 6 months ago (2016-06-01 22:10:24 UTC) #27
wychen
https://codereview.chromium.org/1411603004/diff/120001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/120001/java/org/chromium/distiller/DomUtil.java#newcode108 java/org/chromium/distiller/DomUtil.java:108: public static boolean isVisibleByItsOffset(Element e) { Rename to IsVisibleByOffset()? ...
4 years, 6 months ago (2016-06-02 05:56:01 UTC) #28
marcelorcorrea
https://codereview.chromium.org/1411603004/diff/120001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/120001/java/org/chromium/distiller/DomUtil.java#newcode108 java/org/chromium/distiller/DomUtil.java:108: public static boolean isVisibleByItsOffset(Element e) { On 2016/06/02 05:56:01, ...
4 years, 6 months ago (2016-06-03 16:21:57 UTC) #29
wychen
https://codereview.chromium.org/1411603004/diff/120001/javatests/org/chromium/distiller/DomUtilTest.java File javatests/org/chromium/distiller/DomUtilTest.java (right): https://codereview.chromium.org/1411603004/diff/120001/javatests/org/chromium/distiller/DomUtilTest.java#newcode426 javatests/org/chromium/distiller/DomUtilTest.java:426: "<h1>" + CONTENT_TEXT + "</h1>" + On 2016/06/03 16:21:57, ...
4 years, 6 months ago (2016-06-03 16:49:53 UTC) #30
wychen
4 years, 6 months ago (2016-06-03 16:49:54 UTC) #31
wychen
This CL seems to fix https://bugs.chromium.org/p/chromium/issues/detail?id=616954 as well. If so, it can be added to ...
4 years, 6 months ago (2016-06-04 00:19:04 UTC) #32
marcelorcorrea
On 2016/06/04 00:19:04, wychen wrote: > This CL seems to fix > https://bugs.chromium.org/p/chromium/issues/detail?id=616954 as well. ...
4 years, 6 months ago (2016-06-06 13:15:31 UTC) #34
marcelorcorrea
https://codereview.chromium.org/1411603004/diff/120001/javatests/org/chromium/distiller/DomUtilTest.java File javatests/org/chromium/distiller/DomUtilTest.java (right): https://codereview.chromium.org/1411603004/diff/120001/javatests/org/chromium/distiller/DomUtilTest.java#newcode426 javatests/org/chromium/distiller/DomUtilTest.java:426: "<h1>" + CONTENT_TEXT + "</h1>" + On 2016/06/03 16:49:53, ...
4 years, 6 months ago (2016-06-06 13:17:57 UTC) #35
wychen
https://codereview.chromium.org/1411603004/diff/160001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/160001/java/org/chromium/distiller/DomUtil.java#newcode146 java/org/chromium/distiller/DomUtil.java:146: public static List<Element> getVisibleElements(NodeList<Element> nodeList) { nit: I meant ...
4 years, 6 months ago (2016-06-06 18:11:23 UTC) #36
marcelorcorrea
https://codereview.chromium.org/1411603004/diff/160001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/160001/java/org/chromium/distiller/DomUtil.java#newcode146 java/org/chromium/distiller/DomUtil.java:146: public static List<Element> getVisibleElements(NodeList<Element> nodeList) { On 2016/06/06 18:11:22, ...
4 years, 6 months ago (2016-06-06 19:21:59 UTC) #37
wychen
https://codereview.chromium.org/1411603004/diff/160001/javatests/org/chromium/distiller/webdocument/DomConverterTest.java File javatests/org/chromium/distiller/webdocument/DomConverterTest.java (right): https://codereview.chromium.org/1411603004/diff/160001/javatests/org/chromium/distiller/webdocument/DomConverterTest.java#newcode133 javatests/org/chromium/distiller/webdocument/DomConverterTest.java:133: mBody.appendChild(container); On 2016/06/06 19:21:59, marcelorcorrea wrote: > On 2016/06/06 ...
4 years, 6 months ago (2016-06-06 21:11:28 UTC) #38
wychen
https://codereview.chromium.org/1411603004/diff/200001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/200001/java/org/chromium/distiller/DomUtil.java#newcode110 java/org/chromium/distiller/DomUtil.java:110: // Using offsetParent alone wouldn't work because it's also ...
4 years, 6 months ago (2016-06-07 13:43:11 UTC) #39
wychen
https://codereview.chromium.org/1411603004/diff/200001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/200001/java/org/chromium/distiller/DomUtil.java#newcode110 java/org/chromium/distiller/DomUtil.java:110: // Using offsetParent alone wouldn't work because it's also ...
4 years, 6 months ago (2016-06-07 13:43:11 UTC) #40
marcelorcorrea
https://codereview.chromium.org/1411603004/diff/200001/java/org/chromium/distiller/DomUtil.java File java/org/chromium/distiller/DomUtil.java (right): https://codereview.chromium.org/1411603004/diff/200001/java/org/chromium/distiller/DomUtil.java#newcode110 java/org/chromium/distiller/DomUtil.java:110: // Using offsetParent alone wouldn't work because it's also ...
4 years, 6 months ago (2016-06-07 15:35:11 UTC) #41
wychen
lgtm with nits. https://codereview.chromium.org/1411603004/diff/220001/javatests/org/chromium/distiller/DomUtilTest.java File javatests/org/chromium/distiller/DomUtilTest.java (right): https://codereview.chromium.org/1411603004/diff/220001/javatests/org/chromium/distiller/DomUtilTest.java#newcode19 javatests/org/chromium/distiller/DomUtilTest.java:19: nit: extra line https://codereview.chromium.org/1411603004/diff/220001/javatests/org/chromium/distiller/DomUtilTest.java#newcode117 javatests/org/chromium/distiller/DomUtilTest.java:117: nit: ...
4 years, 6 months ago (2016-06-07 16:20:08 UTC) #42
marcelorcorrea
https://codereview.chromium.org/1411603004/diff/220001/javatests/org/chromium/distiller/DomUtilTest.java File javatests/org/chromium/distiller/DomUtilTest.java (right): https://codereview.chromium.org/1411603004/diff/220001/javatests/org/chromium/distiller/DomUtilTest.java#newcode19 javatests/org/chromium/distiller/DomUtilTest.java:19: On 2016/06/07 16:20:08, wychen wrote: > nit: extra line ...
4 years, 6 months ago (2016-06-07 17:06:02 UTC) #43
mdjones
Fix typos in cl description, otherwise lgtm. "and this ones" -> "and these" "to a ...
4 years, 6 months ago (2016-06-07 23:02:40 UTC) #44
marcelorcorrea
On 2016/06/07 23:02:40, mdjones wrote: > Fix typos in cl description, otherwise lgtm. > > ...
4 years, 6 months ago (2016-06-08 13:01:49 UTC) #46
wychen
No change in eval scores. Landing.
4 years, 6 months ago (2016-06-11 00:13:35 UTC) #48
wychen
4 years, 6 months ago (2016-06-11 00:15:40 UTC) #50
Message was sent while issue was closed.
Committed patchset #13 (id:240001) manually as
54d05ba208e431963af489785edf036f9fda3cb7 (presubmit successful).

Powered by Google App Engine
This is Rietveld 408576698