src/com/dom_distiller/client/ContentExtractor.java - Issue 290993004: Fix final content and title extraction.

Unified Diff: src/com/dom_distiller/client/ContentExtractor.java

Issue 290993004: Fix final content and title extraction. (Closed) Base URL: https://code.google.com/p/dom-distiller/@master

Patch Set: changed title handling, added test Created 6 years, 7 months ago

Index: src/com/dom_distiller/client/ContentExtractor.java
diff --git a/src/com/dom_distiller/client/ContentExtractor.java b/src/com/dom_distiller/client/ContentExtractor.java
index f107e66f615960e667cab6beece10da68bd615a5..44d384332c04de248ee526fdc748cfcf58a7471e 100644
--- a/src/com/dom_distiller/client/ContentExtractor.java
+++ b/src/com/dom_distiller/client/ContentExtractor.java

@@ -73,7 +73,7 @@ public class ContentExtractor implements Exportable {
         }
         Node clonedSubtree = NodeListExpander.expand(contentAndImages).cloneSubtree();
+        LogUtil.logToConsole(clonedSubtree.getNodeType() + " node:" + Node.ELEMENT_NODE);

cjhopman 2014/05/22 00:27:21 probably don't need this. If it's been useful, at

Yaron 2014/05/22 17:05:24 err right that was supposed to go. I was trying to

         if (clonedSubtree.getNodeType() != Node.ELEMENT_NODE) {
             return "";
         }

@@ -99,6 +99,9 @@ public class ContentExtractor implements Exportable {
             TextDocument document, List<Node> textNodes) {
         List<Integer> contentTextIndexes = new ArrayList<Integer>();
         for (TextBlock tb : document.getTextBlocks()) {
+            if (!tb.isContent()) {
+                continue;
+            }
             if (!tb.hasLabel(DefaultLabels.TITLE)) {
                 contentTextIndexes.addAll(tb.getContainedTextElements());
             }
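
Note on the hunk at line 99: the added check appears to skip any text block not classified as content before the existing title check runs, so only non-title content blocks contribute indexes. A minimal standalone sketch of that filtering follows. `Block` and `ContentIndexSketch` are hypothetical stand-ins that model only the three accessors the hunk calls (`isContent()`, `hasLabel(DefaultLabels.TITLE)`, `getContainedTextElements()`); this is not the project's actual `TextBlock` class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ContentIndexSketch {
    // Hypothetical stand-in for TextBlock: holds only what the hunk reads.
    static final class Block {
        final boolean content;            // models tb.isContent()
        final boolean title;              // models tb.hasLabel(DefaultLabels.TITLE)
        final List<Integer> textElements; // models tb.getContainedTextElements()

        Block(boolean content, boolean title, Integer... textElements) {
            this.content = content;
            this.title = title;
            this.textElements = Arrays.asList(textElements);
        }
    }

    // Collects text-element indexes from blocks that are content and not the title.
    static List<Integer> contentTextIndexes(List<Block> blocks) {
        List<Integer> indexes = new ArrayList<Integer>();
        for (Block b : blocks) {
            if (!b.content) {
                continue; // mirrors the check added in this patch set
            }
            if (!b.title) {
                indexes.addAll(b.textElements);
            }
        }
        return indexes;
    }

    public static void main(String[] args) {
        List<Block> blocks = Arrays.asList(
                new Block(true, true, 0),       // title block: skipped
                new Block(false, false, 1, 2),  // non-content block: skipped
                new Block(true, false, 3, 4));  // content block: kept
        System.out.println(contentTextIndexes(blocks)); // prints [3, 4]
    }
}
```

Without the `continue`, the second block's indexes (1 and 2) would also be collected, which is presumably the final-content problem the issue title refers to.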