Chromium Code Reviews

Side by Side Diff: java/org/chromium/distiller/webdocument/WebText.java

Issue 2401853004: Strip unwanted classNames from all nodes (Closed)
Patch Set: address comments Created 4 years, 2 months ago
OLD | NEW
1 // Copyright 2015 The Chromium Authors. All rights reserved. 1 // Copyright 2015 The Chromium Authors. All rights reserved.
2 // Use of this source code is governed by a BSD-style license that can be 2 // Use of this source code is governed by a BSD-style license that can be
3 // found in the LICENSE file. 3 // found in the LICENSE file.
4 4
5 package org.chromium.distiller.webdocument; 5 package org.chromium.distiller.webdocument;
6 6
7 import com.google.gwt.dom.client.Document; 7 import com.google.gwt.dom.client.Document;
8 import com.google.gwt.dom.client.Element; 8 import com.google.gwt.dom.client.Element;
9 import org.chromium.distiller.DomUtil; 9 import org.chromium.distiller.DomUtil;
10 import org.chromium.distiller.TreeCloneBuilder; 10 import org.chromium.distiller.TreeCloneBuilder;
(...skipping 115 matching lines...)
126 if ("BODY".equals(Element.as(srcRoot).getTagName())) break; 126 if ("BODY".equals(Element.as(srcRoot).getTagName())) break;
127 Node parentClone = srcRoot.cloneNode(false); 127 Node parentClone = srcRoot.cloneNode(false);
128 parentClone.appendChild(clonedRoot); 128 parentClone.appendChild(clonedRoot);
129 clonedRoot = parentClone; 129 clonedRoot = parentClone;
130 } 130 }
131 131
132 // Make sure links are absolute and IDs are gone. 132 // Make sure links are absolute and IDs are gone.
133 DomUtil.makeAllLinksAbsolute(clonedRoot); 133 DomUtil.makeAllLinksAbsolute(clonedRoot);
134 DomUtil.stripTargetAttributes(clonedRoot); 134 DomUtil.stripTargetAttributes(clonedRoot);
135 DomUtil.stripIds(clonedRoot); 135 DomUtil.stripIds(clonedRoot);
136 DomUtil.stripUnwantedClassNames(clonedRoot);
136 DomUtil.stripFontColorAttributes(clonedRoot); 137 DomUtil.stripFontColorAttributes(clonedRoot);
137 DomUtil.stripStyleAttributes(clonedRoot); 138 DomUtil.stripStyleAttributes(clonedRoot);
138 // TODO(wychen): if we allow images in WebText later, add stripImageElements(). 139 // TODO(wychen): if we allow images in WebText later, add stripImageElements().
139 140
140 // Since there are tag elements that are being wrapped 141 // Since there are tag elements that are being wrapped
141 // by a pair of {@link WebTag}s, we only need to 142 // by a pair of {@link WebTag}s, we only need to
142 // get the innerHTML, otherwise these tags would be duplicated. 143 // get the innerHTML, otherwise these tags would be duplicated.
143 Element elementClonedRoot = Element.as(clonedRoot); 144 Element elementClonedRoot = Element.as(clonedRoot);
144 if (textOnly) { 145 if (textOnly) {
145 return DomUtil.getInnerText(elementClonedRoot); 146 return DomUtil.getInnerText(elementClonedRoot);
(...skipping 54 matching lines...)
200 } 201 }
201 202
202 public void setGroupNumber(int group) { 203 public void setGroupNumber(int group) {
203 groupNumber = group; 204 groupNumber = group;
204 } 205 }
205 206
206 public int getGroupNumber() { 207 public int getGroupNumber() {
207 return groupNumber; 208 return groupNumber;
208 } 209 }
209 } 210 }
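
For context on the one line this patch adds to WebText.java (the DomUtil.stripUnwantedClassNames(clonedRoot) call), here is a minimal sketch of the kind of filtering a class-name stripping pass might apply to each node's class attribute. It is not the DomUtil implementation from this issue, which lives in DomUtil.java and is not shown in this diff. The class ClassNameFilter, the method filterClassNames, and the allowlist containing "caption" are all hypothetical names and values chosen for illustration, and the sketch works on plain strings so that it runs without GWT.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration only; not the DomUtil code from this patch. */
final class ClassNameFilter {
    // Assumed allowlist, picked for the example. The real set of wanted
    // class names is defined in DomUtil.java, which this diff does not show.
    static final Set<String> ALLOWED = new HashSet<>(Arrays.asList("caption"));

    /**
     * Returns the class attribute value with every token that is not in
     * {@code allowed} removed. Returns null when no token survives, in which
     * case a caller would remove the class attribute from the element.
     */
    static String filterClassNames(String classAttr, Set<String> allowed) {
        if (classAttr == null) {
            return null;
        }
        List<String> kept = new ArrayList<>();
        // Class names are whitespace-separated tokens.
        for (String token : classAttr.trim().split("\\s+")) {
            if (allowed.contains(token)) {
                kept.add(token);
            }
        }
        return kept.isEmpty() ? null : String.join(" ", kept);
    }
}
```

In a DOM pass, a function like this would be applied to the root and to every descendant that carries a class attribute, setting the attribute to the returned value or removing it when the result is null.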