Chromium Code Reviews

Side by Side Diff: java/org/chromium/distiller/extractors/embeds/ImageExtractor.java

Issue 2670643006: Skip non-text elements in <a> without href in <figcaption> (Closed)
Patch Set: Created 3 years, 10 months ago
OLD NEW
1 // Copyright 2015 The Chromium Authors. All rights reserved. 1 // Copyright 2015 The Chromium Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style license that can be 2 // Use of this source code is governed by a BSD-style license that can be
3 // found in the LICENSE file. 3 // found in the LICENSE file.
4 4
5 package org.chromium.distiller.extractors.embeds; 5 package org.chromium.distiller.extractors.embeds;
6 6
7 import com.google.gwt.dom.client.Document; 7 import com.google.gwt.dom.client.Document;
8 import com.google.gwt.dom.client.Element; 8 import com.google.gwt.dom.client.Element;
9 import com.google.gwt.dom.client.ImageElement; 9 import com.google.gwt.dom.client.ImageElement;
10 import com.google.gwt.dom.client.NodeList; 10 import com.google.gwt.dom.client.NodeList;
(...skipping 48 matching lines...)
59 return null; 59 return null;
60 } 60 }
61 extractImageAttributes(ie); 61 extractImageAttributes(ie);
62 Element figcaption; 62 Element figcaption;
63 Element cap = DomUtil.getFirstElementByTagName(e, "FIGCAPTION"); 63 Element cap = DomUtil.getFirstElementByTagName(e, "FIGCAPTION");
64 if (cap != null) { 64 if (cap != null) {
65 // We look for links because some sites put non-caption 65 // We look for links because some sites put non-caption
66 // elements into <figcaption>. For example: image credit 66 // elements into <figcaption>. For example: image credit
67 // could contain a link. So we get the whole DOM structure within 67 // could contain a link. So we get the whole DOM structure within
68 // <figcaption> only when it contains links, otherwise we get the innerText. 68 // <figcaption> only when it contains links, otherwise we get the innerText.
69 figcaption = DomUtil.getFirstElementByTagName(cap, "A") != null ? 69 NodeList<Element> links = DomUtil.querySelectorAll(cap, "A[HREF]");
70 figcaption = links.getLength() > 0 ?
70 cap : createFigcaptionElement(cap); 71 cap : createFigcaptionElement(cap);
71 } else { 72 } else {
72 figcaption = createFigcaptionElement(e); 73 figcaption = createFigcaptionElement(e);
73 } 74 }
74 return new WebFigure(img, width, height, imgSrc, figcaption); 75 return new WebFigure(img, width, height, imgSrc, figcaption);
75 } 76 }
76 77
77 extractImageAttributes(ie); 78 extractImageAttributes(ie);
78 return new WebImage(e, width, height, imgSrc); 79 return new WebImage(e, width, height, imgSrc);
79 } 80 }
(...skipping 23 matching lines...)
103 LogUtil.logToConsole("Extracted WebImage: " + imgSrc); 104 LogUtil.logToConsole("Extracted WebImage: " + imgSrc);
104 } 105 }
105 } 106 }
106 107
107 private Element createFigcaptionElement(Element element) { 108 private Element createFigcaptionElement(Element element) {
108 Element figcaption = Document.get().createElement("FIGCAPTION"); 109 Element figcaption = Document.get().createElement("FIGCAPTION");
109 figcaption.setInnerText(DomUtil.getInnerText(element)); 110 figcaption.setInnerText(DomUtil.getInnerText(element));
110 return figcaption; 111 return figcaption;
111 } 112 }
112 } 113 }
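
The change above narrows the link test in <figcaption> from "any <a>" to "<a> with an href" (the A[HREF] selector). A minimal, self-contained sketch of that difference follows. It does not use the GWT DOM or DomUtil: the Node class and both helper methods are hypothetical stand-ins written only for illustration, and the example URL is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FigcaptionLinkSketch {
    /** Hypothetical stand-in for a DOM element; not the GWT Element class. */
    static final class Node {
        final String tag;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<Node> children = new ArrayList<>();

        Node(String tag) { this.tag = tag; }

        Node attr(String name, String value) { attrs.put(name, value); return this; }

        Node add(Node child) { children.add(child); return this; }
    }

    /** Old behavior (sketch): any <a> descendant counts as a link. */
    static boolean hasAnyAnchor(Node root) {
        for (Node child : root.children) {
            if (child.tag.equals("A") || hasAnyAnchor(child)) {
                return true;
            }
        }
        return false;
    }

    /** New behavior (sketch): only <a> descendants carrying an href count. */
    static boolean hasAnchorWithHref(Node root) {
        for (Node child : root.children) {
            boolean isLink = child.tag.equals("A") && child.attrs.containsKey("href");
            if (isLink || hasAnchorWithHref(child)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // <figcaption><a><span/></a></figcaption>: an anchor without href.
        Node placeholder = new Node("FIGCAPTION")
                .add(new Node("A").add(new Node("SPAN")));
        // <figcaption><a href="..."/></figcaption>: a real link, e.g. an image credit.
        Node credit = new Node("FIGCAPTION")
                .add(new Node("A").attr("href", "https://example.com/credit"));

        System.out.println(hasAnyAnchor(placeholder));      // true
        System.out.println(hasAnchorWithHref(placeholder)); // false
        System.out.println(hasAnyAnchor(credit));           // true
        System.out.println(hasAnchorWithHref(credit));      // true
    }
}
```

Under the stricter check, the first caption would take the createFigcaptionElement(cap) branch and be reduced to its inner text, while the second would keep its DOM structure.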