Chromium Code Reviews

Side by Side Diff: java/org/chromium/distiller/DomUtil.java

Issue 2092553003: Fix schema.org article matching (Closed) Base URL: git@github.com:chromium/dom-distiller.git@master
Patch Set: Created 4 years, 6 months ago
OLD | NEW
1 // Copyright 2014 The Chromium Authors. All rights reserved. 1 // Copyright 2014 The Chromium Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style license that can be 2 // Use of this source code is governed by a BSD-style license that can be
3 // found in the LICENSE file. 3 // found in the LICENSE file.
4 4
5 package org.chromium.distiller; 5 package org.chromium.distiller;
6 6
7 import com.google.gwt.core.client.JsArray; 7 import com.google.gwt.core.client.JsArray;
8 import com.google.gwt.core.client.JsArrayString; 8 import com.google.gwt.core.client.JsArrayString;
9 import com.google.gwt.dom.client.AnchorElement; 9 import com.google.gwt.dom.client.AnchorElement;
10 import com.google.gwt.dom.client.Document; 10 import com.google.gwt.dom.client.Document;
(...skipping 109 matching lines...)
120 public static Element getArticleElement(Element root) { 120 public static Element getArticleElement(Element root) {
121 NodeList<Element> allArticles = root.getElementsByTagName("ARTICLE"); 121 NodeList<Element> allArticles = root.getElementsByTagName("ARTICLE");
122 List<Element> visibleElements = getVisibleElements(allArticles); 122 List<Element> visibleElements = getVisibleElements(allArticles);
123 // Having multiple article elements usually indicates a bad case for this shortcut. 123 // Having multiple article elements usually indicates a bad case for this shortcut.
124 // TODO(wychen): some sites exclude things like title and author in article element. 124 // TODO(wychen): some sites exclude things like title and author in article element.
125 if (visibleElements.size() == 1) { 125 if (visibleElements.size() == 1) {
126 return visibleElements.get(0); 126 return visibleElements.get(0);
127 } 127 }
128 // Note that the CSS property matching is case sensitive, and "Article" is the correct 128 // Note that the CSS property matching is case sensitive, and "Article" is the correct
129 // capitalization. 129 // capitalization.
130 String query = "[itemscope][itemtype*=\"Article\"],[itemscope][itemtype*=\"Post\"]"; 130 String query = "[itemscope][itemtype*=\"Article\"],[itemscope][itemtype*=\"Posting\"]";
131 allArticles = DomUtil.querySelectorAll(root, query); 131 allArticles = DomUtil.querySelectorAll(root, query);
132 visibleElements = getVisibleElements(allArticles); 132 visibleElements = getVisibleElements(allArticles);
133 // It is commonly seen that the article is wrapped separately or in multiple layers. 133 // It is commonly seen that the article is wrapped separately or in multiple layers.
134 if (visibleElements.size() > 0) { 134 if (visibleElements.size() > 0) {
135 return Element.as(DomUtil.getNearestCommonAncestor(visibleElements)); 135 return Element.as(DomUtil.getNearestCommonAncestor(visibleElements));
136 } 136 }
137 return null; 137 return null;
138 } 138 }
139 139
140 /** 140 /**
(...skipping 383 matching lines...)
524 }-*/; 524 }-*/;
525 525
526 public static native Document createHTMLDocument(Document doc) /*-{ 526 public static native Document createHTMLDocument(Document doc) /*-{
527 return doc.implementation.createHTMLDocument(); 527 return doc.implementation.createHTMLDocument();
528 }-*/; 528 }-*/;
529 529
530 public static native Element getFirstElementChild(Document document) /*-{ 530 public static native Element getFirstElementChild(Document document) /*-{
531 return document.firstElementChild; 531 return document.firstElementChild;
532 }-*/; 532 }-*/;
533 } 533 }
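
Note on the one-line change at line 130: the selector `[itemtype*="..."]` is a case-sensitive substring match on the attribute value, so the old `"Post"` pattern would also match any itemtype that merely contains "Post". The sketch below emulates that check with plain `String.contains` to show the difference. It is only an illustration and not part of the patch: the class and method names are hypothetical, and the sample itemtype URLs are assumptions chosen for the example rather than values taken from this issue or its tests.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DomUtil: emulates the substring semantics
// of the CSS attribute selector [itemtype*="needle"].
final class ItemtypeMatchSketch {
    // Case-sensitive substring test, mirroring [attr*="value"].
    static boolean attrContains(String itemtype, String needle) {
        return itemtype != null && itemtype.contains(needle);
    }

    // Old query: itemtype contains "Article" or "Post".
    static boolean matchesOldQuery(String itemtype) {
        return attrContains(itemtype, "Article") || attrContains(itemtype, "Post");
    }

    // New query: itemtype contains "Article" or "Posting".
    static boolean matchesNewQuery(String itemtype) {
        return attrContains(itemtype, "Article") || attrContains(itemtype, "Posting");
    }

    public static void main(String[] args) {
        // Sample values are assumptions for illustration only.
        List<String> samples = Arrays.asList(
                "http://schema.org/Article",
                "http://schema.org/BlogPosting",
                "http://schema.org/PostalAddress");
        for (String itemtype : samples) {
            System.out.println(itemtype
                    + " old=" + matchesOldQuery(itemtype)
                    + " new=" + matchesNewQuery(itemtype));
        }
    }
}
```

With the narrower `"Posting"` pattern, a value such as the assumed `PostalAddress` sample no longer counts as an article candidate, while `BlogPosting` still does.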