Chromium Code Reviews

Unified Diff: src/com/dom_distiller/client/ContentExtractor.java

Issue 290993004: Fix final content and title extraction. (Closed)
Base URL: https://code.google.com/p/dom-distiller/@master
Patch Set: Created 6 years, 7 months ago
Index: src/com/dom_distiller/client/ContentExtractor.java
diff --git a/src/com/dom_distiller/client/ContentExtractor.java b/src/com/dom_distiller/client/ContentExtractor.java
index f107e66f615960e667cab6beece10da68bd615a5..2790575d9286812f57610750f86d9bd86ceda27a 100644
--- a/src/com/dom_distiller/client/ContentExtractor.java
+++ b/src/com/dom_distiller/client/ContentExtractor.java
@@ -73,7 +73,6 @@ public class ContentExtractor implements Exportable {
}
Node clonedSubtree = NodeListExpander.expand(contentAndImages).cloneSubtree();
-
if (clonedSubtree.getNodeType() != Node.ELEMENT_NODE) {
return "";
}
@@ -99,6 +98,9 @@ public class ContentExtractor implements Exportable {
TextDocument document, List<Node> textNodes) {
List<Integer> contentTextIndexes = new ArrayList<Integer>();
for (TextBlock tb : document.getTextBlocks()) {
+ if (!tb.isContent()) {
+ continue;
+ }
if (!tb.hasLabel(DefaultLabels.TITLE)) {
contentTextIndexes.addAll(tb.getContainedTextElements());
}
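The second hunk changes which text blocks feed `contentTextIndexes`: a block now contributes its text-element indexes only if it is classified as content and is not labeled as the title. Below is a minimal Java sketch of that filtering. It assumes a simplified, hypothetical `Block` class in place of the real `TextBlock`, modeling only `isContent()`, `hasLabel()`, and `getContainedTextElements()` (assumed here to return a `List<Integer>`), and a plain string constant in place of `DefaultLabels.TITLE`. It is an illustration of the loop's logic, not the project's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ContentIndexSketch {
    // Hypothetical stand-in for TextBlock: only the three members the patch uses.
    static final class Block {
        final boolean content;
        final Set<String> labels;
        final List<Integer> textElements;

        Block(boolean content, Set<String> labels, List<Integer> textElements) {
            this.content = content;
            this.labels = labels;
            this.textElements = textElements;
        }

        boolean isContent() { return content; }
        boolean hasLabel(String label) { return labels.contains(label); }
        List<Integer> getContainedTextElements() { return textElements; }
    }

    // Hypothetical label constant standing in for DefaultLabels.TITLE.
    static final String TITLE = "TITLE";

    // Mirrors the patched loop: skip non-content blocks first, then skip title blocks.
    static List<Integer> collectContentTextIndexes(List<Block> blocks) {
        List<Integer> indexes = new ArrayList<>();
        for (Block b : blocks) {
            if (!b.isContent()) {
                continue; // boilerplate block: contributes nothing
            }
            if (!b.hasLabel(TITLE)) {
                indexes.addAll(b.getContainedTextElements());
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        Set<String> none = new HashSet<>();
        Set<String> title = new HashSet<>(Arrays.asList(TITLE));
        List<Block> blocks = Arrays.asList(
            new Block(true, title, Arrays.asList(0)),     // title block: excluded
            new Block(false, none, Arrays.asList(1, 2)),  // non-content block: excluded
            new Block(true, none, Arrays.asList(3, 4)));  // body content: kept
        System.out.println(collectContentTextIndexes(blocks)); // prints [3, 4]
    }
}
```

Without the `isContent()` check, the same input would yield `[1, 2, 3, 4]`, so indexes from non-content blocks would leak into the extracted content; this appears to be the behavior the patch is meant to fix.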
