Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(131)

Issue 2020403002: Add support for figure element (Closed)

Created:
4 years, 6 months ago by marcelorcorrea
Modified:
4 years, 4 months ago
Reviewers:
mdjones, wychen
CC:
dalmirsilva
Base URL:
https://github.com/chromium/dom-distiller.git@master
Target Ref:
refs/heads/master
Visibility:
Public.

Description

Add support for figure element Created the WebFigure class that represents a figure element, containing an image and optionally a caption. This class is basically a placeholder for an image and a caption, so it extends WebImage and handles the caption. Contributors=marcelorcorrea@hp.com; dalmirsilva@hp.com ** Score changes: https://x20web.corp.google.com/~wychen/domdistillerscore/figure/cjk.html Average precision: 0.950 → 0.948 https://x20web.corp.google.com/~wychen/domdistillerscore/figure/knowledge.html Average recall: 0.973 → 0.973+ https://x20web.corp.google.com/~wychen/domdistillerscore/figure/multi-page.html Average precision: 0.745 → 0.744 Average recall: 0.563 → 0.563 https://x20web.corp.google.com/~wychen/domdistillerscore/figure/page-links.html Average precision: 0.918 → 0.918- Average recall: 0.946 → 0.947+ https://x20web.corp.google.com/~wychen/domdistillerscore/figure/reader-golden.html Average precision: 0.947 → 0.945 Average recall: 0.975 → 0.976 Overall, more things are extracted, so recall improved, but precision regressed. BUG=613374 R=mdjones@chromium.org, wychen@chromium.org Committed: 365c44e85bbba9c237827444cef6df66e56f3689

Patch Set 1 #

Total comments: 9

Patch Set 2 : wychen's comments addressed #

Total comments: 4

Patch Set 3 : comments addressed #

Total comments: 11

Patch Set 4 : merged from master #

Total comments: 4

Patch Set 5 : wychen's comments addressed #

Patch Set 6 : added support for lazily-loaded images in figures #

Total comments: 8

Patch Set 7 : comments addressed #

Total comments: 1

Patch Set 8 : Handling plain texts with generateOutputFromTree. #

Total comments: 5

Patch Set 9 : rebase & comments addressed #

Total comments: 2

Patch Set 10 : mdjones' comments addressed #

Unified diffs Side-by-side diffs Delta from patch set Stats (+282 lines, -28 lines) Patch
M java/org/chromium/distiller/extractors/embeds/ImageExtractor.java View 1 2 3 4 5 6 7 8 9 3 chunks +69 lines, -27 lines 0 comments Download
A java/org/chromium/distiller/webdocument/WebFigure.java View 1 2 3 4 5 6 7 8 9 1 chunk +44 lines, -0 lines 0 comments Download
M java/org/chromium/distiller/webdocument/WebImage.java View 1 2 3 4 5 6 7 8 1 chunk +7 lines, -0 lines 0 comments Download
M javatests/org/chromium/distiller/EmbedExtractorTest.java View 1 2 3 4 5 6 7 3 chunks +162 lines, -1 line 0 comments Download

Messages

Total messages: 37 (7 generated)
marcelorcorrea
Add support for figure element.
4 years, 6 months ago (2016-05-31 20:52:34 UTC) #1
wychen
Thanks for the patch! https://codereview.chromium.org/2020403002/diff/1/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/1/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode10 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:10: import org.chromium.distiller.webdocument.WebFigure; Forget to upload ...
4 years, 6 months ago (2016-05-31 21:28:04 UTC) #2
marcelorcorrea
Sorry, wychen! Forgot, indeed ! :P WebFigure added. Thanks! https://codereview.chromium.org/2020403002/diff/1/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/1/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode54 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:54: ...
4 years, 6 months ago (2016-05-31 21:46:44 UTC) #3
wychen
How does this look in the viewer? Do we need to update the CSS? https://codereview.chromium.org/2020403002/diff/20001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java ...
4 years, 6 months ago (2016-05-31 22:14:46 UTC) #4
marcelorcorrea
Done. https://codereview.chromium.org/2020403002/diff/20001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/20001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode23 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:23: private String caption; On 2016/05/31 22:14:46, wychen wrote: ...
4 years, 6 months ago (2016-06-01 18:09:54 UTC) #5
marcelorcorrea
On 2016/05/31 22:14:46, wychen wrote: > How does this look in the viewer? Do we ...
4 years, 6 months ago (2016-06-01 18:15:51 UTC) #6
marcelorcorrea
Regarding how it looks like in the preview, it looks really good! See: https://s3-us-west-1.amazonaws.com/printx/public/Screen+Shot+2016-06-01+at+3.39.09+PM.png
4 years, 6 months ago (2016-06-01 18:48:55 UTC) #7
wychen
I've run this through our internal dataset and found some issues. After this CL, some ...
4 years, 6 months ago (2016-06-02 23:48:49 UTC) #8
marcelorcorrea
On 2016/06/02 23:48:49, wychen wrote: > I've run this through our internal dataset and found ...
4 years, 6 months ago (2016-06-06 20:38:23 UTC) #9
marcelorcorrea
https://codereview.chromium.org/2020403002/diff/40001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/40001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode56 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:56: Element cap = getFirstElementByTagName(e, "FIGCAPTION"); On 2016/06/02 23:48:49, wychen ...
4 years, 6 months ago (2016-06-06 20:38:30 UTC) #10
wychen
https://codereview.chromium.org/2020403002/diff/40001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/40001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode56 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:56: Element cap = getFirstElementByTagName(e, "FIGCAPTION"); On 2016/06/06 20:38:30, marcelorcorrea ...
4 years, 6 months ago (2016-06-06 21:49:05 UTC) #11
wychen
We also need to consider compatibility with ContentExtractor.getImageUrls() and LeadImageFinder. In that case, would keeping ...
4 years, 6 months ago (2016-06-07 00:56:08 UTC) #12
wychen
On 2016/06/07 00:56:08, wychen wrote: > We also need to consider compatibility with ContentExtractor.getImageUrls() and ...
4 years, 6 months ago (2016-06-07 13:46:50 UTC) #13
wychen
Thanks for formating my part. https://codereview.chromium.org/2020403002/diff/60001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/60001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode10 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:10: import com.google.gwt.dom.client.NodeList; import should ...
4 years, 6 months ago (2016-06-07 16:27:20 UTC) #14
marcelorcorrea
https://codereview.chromium.org/2020403002/diff/60001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/60001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode10 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:10: import com.google.gwt.dom.client.NodeList; On 2016/06/07 16:27:20, wychen wrote: > import ...
4 years, 6 months ago (2016-06-07 17:02:25 UTC) #15
wychen
https://codereview.chromium.org/2020403002/diff/100001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/100001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode12 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:12: import org.chromium.distiller.LogUtil; The sorting is case sensitive. Capital ones ...
4 years, 6 months ago (2016-06-07 17:39:19 UTC) #16
marcelorcorrea
thanks for reviewing it! https://codereview.chromium.org/2020403002/diff/100001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/100001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode59 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:59: cap.getInnerHTML() : cap.getInnerText(); On 2016/06/07 ...
4 years, 6 months ago (2016-06-08 14:34:16 UTC) #17
wychen
https://codereview.chromium.org/2020403002/diff/100001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/100001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode59 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:59: cap.getInnerHTML() : cap.getInnerText(); On 2016/06/08 14:34:15, marcelorcorrea wrote: > ...
4 years, 6 months ago (2016-06-20 18:54:20 UTC) #18
wychen
Hi, What is the status of this CL?
4 years, 4 months ago (2016-07-24 23:26:43 UTC) #19
marcelorcorrea
On 2016/07/24 23:26:43, wychen wrote: > Hi, > > What is the status of this ...
4 years, 4 months ago (2016-07-28 20:25:23 UTC) #20
wychen
On 2016/07/28 20:25:23, marcelorcorrea wrote: > On 2016/07/24 23:26:43, wychen wrote: > > Hi, > ...
4 years, 4 months ago (2016-08-01 07:39:16 UTC) #21
marcelorcorrea
Wychen comments addressed
4 years, 4 months ago (2016-08-04 16:42:35 UTC) #22
wychen
Thanks. We are getting close. https://codereview.chromium.org/2020403002/diff/140001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/140001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode57 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:57: Element figcaption = Document.get().createElement("FIGCAPTION"); ...
4 years, 4 months ago (2016-08-05 02:13:04 UTC) #23
marcelorcorrea
wychen's comments addressed again :D! Thanks for reviewing it!
4 years, 4 months ago (2016-08-05 13:57:25 UTC) #26
wychen
lgtm. mdjones@, can you also take a look? Thanks!
4 years, 4 months ago (2016-08-05 16:30:43 UTC) #27
wychen
lgtm. mdjones@, can you also take a look? Thanks!
4 years, 4 months ago (2016-08-05 16:30:44 UTC) #28
mdjones
lgtm % nits https://codereview.chromium.org/2020403002/diff/200001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java File java/org/chromium/distiller/extractors/embeds/ImageExtractor.java (right): https://codereview.chromium.org/2020403002/diff/200001/java/org/chromium/distiller/extractors/embeds/ImageExtractor.java#newcode60 java/org/chromium/distiller/extractors/embeds/ImageExtractor.java:60: figcaption = getFirstElementByTagName(cap, "A") != null ...
4 years, 4 months ago (2016-08-05 17:35:15 UTC) #29
wychen
Could you help update the CL description so that we can land it? Thanks!
4 years, 4 months ago (2016-08-06 07:09:05 UTC) #30
marcelorcorrea
On 2016/08/06 07:09:05, wychen wrote: > Could you help update the CL description so that ...
4 years, 4 months ago (2016-08-08 14:03:41 UTC) #32
wychen
4 years, 4 months ago (2016-08-10 23:26:12 UTC) #37
Message was sent while issue was closed.
Committed patchset #10 (id:220001) manually as
365c44e85bbba9c237827444cef6df66e56f3689 (presubmit successful).

Powered by Google App Engine
This is Rietveld 408576698