Description
Make isVisible() faster and more accurate
isVisible() checked whether an element's display is none, but display is
not inherited: a descendant of a hidden element does not itself report
display: none. This is OK during DOM traversal, since we skip the children
of invisible elements, but it doesn't work in general, for example when
checking whether an anchor element is visible during next page detection.
To fix this, we check properties like offsetParent, offsetWidth, and
offsetHeight instead.
isVisible() also checked display, opacity, and visibility, all of which
require a slow getComputedStyle() call. Opacity and visibility are rarely
used to hide elements, and the display check is replaced by the offset*
properties, so we no longer need to get computed styles.
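As an illustration only, the offset-based check described above could be sketched as follows. This is a minimal sketch, not the code in this patch: the interface OffsetBox and the function name isVisibleByOffset are hypothetical, and combining the three properties with a logical OR is an assumption. The OR is motivated by the fact that offsetParent is also null for position: fixed elements, so a null offsetParent alone does not prove the element is hidden.

```typescript
// Hypothetical minimal shape of the properties the check reads.
// A real HTMLElement satisfies this interface structurally.
interface OffsetBox {
  offsetParent: object | null;
  offsetWidth: number;
  offsetHeight: number;
}

// Assumption: an element counts as visible if it has an offset parent
// or a non-zero layout box. No getComputedStyle() call is needed.
function isVisibleByOffset(e: OffsetBox): boolean {
  return e.offsetParent !== null || e.offsetWidth !== 0 || e.offsetHeight !== 0;
}

// Element inside a display: none subtree: no offset parent, no box.
const hidden: OffsetBox = { offsetParent: null, offsetWidth: 0, offsetHeight: 0 };

// position: fixed element: offsetParent is null, but it still has a box.
const fixedBanner: OffsetBox = { offsetParent: null, offsetWidth: 320, offsetHeight: 48 };

// Ordinary in-flow element with a containing offset parent.
const anchor: OffsetBox = { offsetParent: {}, offsetWidth: 80, offsetHeight: 16 };

console.log(isVisibleByOffset(hidden), isVisibleByOffset(fixedBanner), isVisibleByOffset(anchor));
```

Unlike the display check, this gives the same answer for an anchor regardless of whether its own display is none or an ancestor's is, which is the case that matters for next page detection.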
** Score changes:
Multi-page dataset:
Minor changes in content extraction. Most entries here are not
distillable anyway.
Average F1 of content: 0.561 -> 0.561
https://x20web.corp.google.com/~wychen/domdistillerscore/visible/multi-page.html
Reader-image dataset:
Average F1 of image: 0.633 -> 0.634
https://x20web.corp.google.com/~wychen/domdistillerscore/visible/reader-images.html
No changes on other data sets:
- cjk-golden-data
- cleaneval-golden-data
- golden_data_with_knowledge
- page-links-golden-data
- reader-mode-golden-data
** Performance impact:
The average time reported by the eval server for the dataset
"reader-mode-golden-data" is used as the benchmark. To reduce noise, the
benchmark is run 100 times.
The total time spent is 4.8% shorter.
BUG=431067
Patch Set 1
Total comments: 4
Patch Set 2 : address mdjones' comments
Messages
Total messages: 7 (3 generated)