Chromium Code Reviews

Diff: src/com/dom_distiller/client/ContentExtractor.java

Issue 291823005: Restore Title identification. (Closed)
Base URL: https://code.google.com/p/dom-distiller/@master
Patch Set: trim title (created 6 years, 7 months ago)
--- a/src/com/dom_distiller/client/ContentExtractor.java
+++ b/src/com/dom_distiller/client/ContentExtractor.java
@@ -1,10 +1,10 @@
 // Copyright 2014 The Chromium Authors. All rights reserved.
 // Use of this source code is governed by a BSD-style license that can be
 // found in the LICENSE file.
 
 package com.dom_distiller.client;
 
 import com.google.gwt.dom.client.AnchorElement;
 import com.google.gwt.dom.client.Document;
 import com.google.gwt.dom.client.Element;
 import com.google.gwt.dom.client.ImageElement;
@@ -43,20 +43,21 @@
             htmlParser.startDocument();
             Element documentElement = Document.get().getDocumentElement();
             textNodes = parse(documentElement, htmlParser);
             htmlParser.endDocument();
         } catch (SAXException e) {
             logger.warning("Parsing failed.");
             return "";
         }
 
         TextDocument document = htmlParser.toTextDocument();
+        document.setTitle(Document.get().getTitle().trim());
         try {
             CommonExtractors.ARTICLE_EXTRACTOR.process(document);
         } catch (BoilerpipeProcessingException e) {
             logger.warning("Processing failed.");
             return "";
         }
 
         if (text_only) {
             return document.getText(true, false);
         }
@@ -121,10 +122,10 @@
             link.setHref(link.getHref());
         }
 
         NodeList<Element> allImages = root.getElementsByTagName("IMG");
         for (int i = 0; i < allImages.getLength(); i++) {
             ImageElement image = ImageElement.as(allImages.getItem(i));
             image.setSrc(image.getSrc());
         }
     }
 }
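
Note on the one added line (new line 53): it copies the page title into the TextDocument after trimming it. The following is a minimal sketch of that trimming step only, not the project's code. `TitleTrimSketch` and `normalizeTitle` are hypothetical names, and the GWT call `Document.get().getTitle()` is replaced by a plain string literal so the example runs outside a browser.

```java
public class TitleTrimSketch {
    // Hypothetical helper mirroring the `.trim()` call added in the patch.
    // String.trim() removes leading and trailing characters whose code
    // point is at or below U+0020 (spaces, tabs, newlines).
    static String normalizeTitle(String rawTitle) {
        return rawTitle.trim();
    }

    public static void main(String[] args) {
        // Assumed input: a <title> whose text is padded by markup whitespace.
        String raw = "\n  Example Article Title \t";
        // Brackets make the absence of surrounding whitespace visible.
        System.out.println("[" + normalizeTitle(raw) + "]");
    }
}
```

Interior whitespace is left untouched; only the ends of the string are affected.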