Chromium Code Reviews

Unified Diff: src/com/dom_distiller/client/ContentExtractor.java

Issue 290993004: Fix final content and title extraction. (Closed) Base URL: https://code.google.com/p/dom-distiller/@master
Patch Set: changed title handling, added test Created 6 years, 7 months ago
Index: src/com/dom_distiller/client/ContentExtractor.java
diff --git a/src/com/dom_distiller/client/ContentExtractor.java b/src/com/dom_distiller/client/ContentExtractor.java
index f107e66f615960e667cab6beece10da68bd615a5..44d384332c04de248ee526fdc748cfcf58a7471e 100644
--- a/src/com/dom_distiller/client/ContentExtractor.java
+++ b/src/com/dom_distiller/client/ContentExtractor.java
@@ -73,7 +73,7 @@ public class ContentExtractor implements Exportable {
}
Node clonedSubtree = NodeListExpander.expand(contentAndImages).cloneSubtree();
-
+ LogUtil.logToConsole(clonedSubtree.getNodeType() + " node:" + Node.ELEMENT_NODE);
cjhopman 2014/05/22 00:27:21 probably don't need this. If it's been useful, at
Yaron 2014/05/22 17:05:24 err right that was supposed to go. I was trying to
if (clonedSubtree.getNodeType() != Node.ELEMENT_NODE) {
return "";
}
@@ -99,6 +99,9 @@ public class ContentExtractor implements Exportable {
TextDocument document, List<Node> textNodes) {
List<Integer> contentTextIndexes = new ArrayList<Integer>();
for (TextBlock tb : document.getTextBlocks()) {
+ if (!tb.isContent()) {
+ continue;
+ }
if (!tb.hasLabel(DefaultLabels.TITLE)) {
contentTextIndexes.addAll(tb.getContainedTextElements());
}
