Description
Improve handling of <video>, <figure> and <br>
Omit <video> and <figure> from the document Boilerpipe sees, since they
contain text that can confuse processing. Instead, treat these (and <br>s)
in a similar manner to data tables and <img> tags.
This shows a very minor decline in eval, but it is a false negative: a
<figcaption> is omitted from the text-only response but is included
later when the page is reconstructed.
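
The approach described above can be illustrated with a minimal sketch. It is
not the code in this patch: the block model, the names split_embeds,
reconstruct and EMBED_TAGS, and the sample page are all hypothetical, and the
data-table classification is left out. It only shows why a caption can be
missing from the text-only output yet present in the rebuilt page.

```python
# Hypothetical sketch: the names and the (tag, content) block model are
# illustrative assumptions, not DOM Distiller's or Boilerpipe's API.
EMBED_TAGS = {"video", "figure", "br", "img"}  # data tables omitted for brevity


def split_embeds(blocks):
    """Separate embed-like blocks from the text the extractor should see.

    blocks: list of (tag, content) tuples in document order.
    Returns (text_blocks, embeds); each entry keeps its original index.
    """
    text_blocks, embeds = [], []
    for index, (tag, content) in enumerate(blocks):
        target = embeds if tag in EMBED_TAGS else text_blocks
        target.append((index, tag, content))
    return text_blocks, embeds


def reconstruct(selected_text, embeds):
    """Merge the selected text blocks with the held-out embeds by index."""
    merged = sorted(selected_text + embeds, key=lambda item: item[0])
    return [(tag, content) for _, tag, content in merged]


page = [
    ("p", "First paragraph."),
    ("figure", "<img src='a.png'><figcaption>A caption</figcaption>"),
    ("p", "Second paragraph."),
    ("video", "Your browser does not support video."),
]

text_blocks, embeds = split_embeds(page)
# What a text-only evaluation would see: no caption, no fallback text.
text_only = [content for _, _, content in text_blocks]
# What the reader gets after reconstruction: embeds back in document order.
rebuilt = reconstruct(text_blocks, embeds)
```

In this sketch the text-only list lacks the caption, which matches the eval
decline noted above, while the rebuilt page still contains the full <figure>.
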
BUG=376107, 376102, 378385
R=cjhopman@chromium.org, nyquist@chromium.org
Committed: b6100c8