|
|
DescriptionRemove hidden elements from MHTML
We remove the following hidden elements that will never be visible
due to the fact that script execution is disable in MHTML loading:
1) Any DOM element has hidden attribute
2) Input element has hidden type
BUG=669325
TEST=new tests added
Committed: https://crrev.com/6f960b80d7bee61f42b4e95c8f8c2e252dc36739
Cr-Commit-Position: refs/heads/master@{#440790}
Patch Set 1 #
Total comments: 4
Patch Set 2 : Address feedback #
Total comments: 2
Patch Set 3 : Rebase #Patch Set 4 : Use blacklist #Patch Set 5 : Fix browser test #Patch Set 6 : Rebase #Patch Set 7 : Fix trybots #
Messages
Total messages: 41 (21 generated)
jianli@chromium.org changed reviewers: + carlosk@chromium.org, lukasza@chromium.org, tkent@chromium.org
dcheng@chromium.org changed reviewers: + dcheng@chromium.org
Drive-by: please provide some more detail in the bug about the motivations for doing this. Presumably it's to reduce the size of the saved MHTML page, but I'm not really sure.
https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebFrameSerializer.cpp:106: } Can you check existence of layoutObject(), or check ComputedStyle()? It's a general way to check visibility.
LGTM with one question. https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/c... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/c... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:224: return m_delegate.shouldIgnoreElement(element); This might be a dumb question but why is this logic split between this method and the delegate's? Are these checks here for cases that should always be ignored when serializing and the ones in the delegate only for format-specific cases (assuming MHTML is not the only format)?
Patchset #2 (id:20001) has been deleted
https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/w... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/w... third_party/WebKit/Source/web/WebFrameSerializer.cpp:106: } On 2016/11/30 08:22:34, tkent wrote: > Can you check existence of layoutObject(), or check ComputedStyle()? It's a > general way to check visibility. > Done.
On 2016/11/30 02:56:31, dcheng wrote: > Drive-by: please provide some more detail in the bug about the motivations for > doing this. Presumably it's to reduce the size of the saved MHTML page, but I'm > not really sure. Done. Please see bug. Thanks.
https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/c... File third_party/WebKit/Source/core/frame/FrameSerializer.cpp (right): https://codereview.chromium.org/2538953002/diff/1/third_party/WebKit/Source/c... third_party/WebKit/Source/core/frame/FrameSerializer.cpp:224: return m_delegate.shouldIgnoreElement(element); On 2016/11/30 23:30:22, carlosk wrote: > This might be a dumb question but why is this logic split between this method > and the delegate's? Are these checks here for cases that should always be > ignored when serializing and the ones in the delegate only for format-specific > cases (assuming MHTML is not the only format)? It seems that this is useful in supporting multiple formats. I think we need to have some discussions to understand what we want to support here.
https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/web/WebFrameSerializer.cpp:100: return !element.layoutObject(); Though I suggested this, it drops <head>, <meta>, <style>, <link>, <script>, <option>, <datalist>, etc. Is it ok?
https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... third_party/WebKit/Source/web/WebFrameSerializer.cpp:100: return !element.layoutObject(); On 2016/12/01 14:56:43, tkent wrote: > Though I suggested this, it drops <head>, <meta>, <style>, <link>, <script>, > <option>, <datalist>, etc. Is it ok? We don't want to drop <head>, <meta>, <style>, <link>, <datalist> and etc. <script> should have already been dropped by MHTML serializer. Do you know any other way to check for that? It seems that these elements <head>, <meta> and etc are defined as "display: none" in the default stylesheet html.css. Can we call CSSDefaultStyleSheets to get the default display style for the element in order not to drop these elements? How do you think?
On 2016/12/07 02:21:02, jianli wrote: > https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... > File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): > > https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... > third_party/WebKit/Source/web/WebFrameSerializer.cpp:100: return > !element.layoutObject(); > On 2016/12/01 14:56:43, tkent wrote: > > Though I suggested this, it drops <head>, <meta>, <style>, <link>, <script>, > > <option>, <datalist>, etc. Is it ok? > > We don't want to drop <head>, <meta>, <style>, <link>, <datalist> and etc. > <script> should have already been dropped by MHTML serializer. > > Do you know any other way to check for that? > > It seems that these elements <head>, <meta> and etc are defined as "display: > none" in the default stylesheet html.css. Can we call CSSDefaultStyleSheets to > get the default display style for the element in order not to drop these > elements? How do you think? Indeed more of these tags are part of head element, like meta, style, link and etc. For the following that is not part of head, we can: 1) <datalist>: I think we can drop this since we're going to disable the form in another patch. So including this does not make any value. 2) <param>: This will be transformed into part of video element during parsing. So no problem. 3) <template>: This can be dropped since the existing DOM might have already include the activation from it. If not, we can ignore it. <meta> could also appear in <body>. We can just consider this as a special case. So it seems that we can check if the element is a child of head or meta. If yes, we don't want to drop them. How do you think?
On 2016/12/07 at 02:21:02, jianli wrote: > https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... > File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): > > https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... > third_party/WebKit/Source/web/WebFrameSerializer.cpp:100: return !element.layoutObject(); > On 2016/12/01 14:56:43, tkent wrote: > > Though I suggested this, it drops <head>, <meta>, <style>, <link>, <script>, > > <option>, <datalist>, etc. Is it ok? > > We don't want to drop <head>, <meta>, <style>, <link>, <datalist> and etc. <script> should have already been dropped by MHTML serializer. > > Do you know any other way to check for that? > > It seems that these elements <head>, <meta> and etc are defined as "display: none" in the default stylesheet html.css. Can we call CSSDefaultStyleSheets to get the default display style for the element in order not to drop these elements? How do you think? I think CSSDefaultStyleSheets isn't suitable because it is just a set of rules and has no functionality to apply rules to an element. Probably, there are two choices? * Safe-list: Like Patch Set 1, we drop only elements which we know it's safe to remove. - Elements with 'hidden' attribute, - <input type=hidden>, and - For example, <div> and <span> without layoutObject(). * Block-list: Like Patch Set 2, but we don't drop some specific elements - Drop elements without layoutObject() except <head>, <title>, <meta>, <style>, <link>, etc. Also, if we drop any elements, it can break page rendering layout because of CSS nth-child selector.
The CQ bit was checked by jianli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/12/08 02:29:27, tkent wrote: > On 2016/12/07 at 02:21:02, jianli wrote: > > > https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... > > File third_party/WebKit/Source/web/WebFrameSerializer.cpp (right): > > > > > https://codereview.chromium.org/2538953002/diff/40001/third_party/WebKit/Sour... > > third_party/WebKit/Source/web/WebFrameSerializer.cpp:100: return > !element.layoutObject(); > > On 2016/12/01 14:56:43, tkent wrote: > > > Though I suggested this, it drops <head>, <meta>, <style>, <link>, <script>, > > > <option>, <datalist>, etc. Is it ok? > > > > We don't want to drop <head>, <meta>, <style>, <link>, <datalist> and etc. > <script> should have already been dropped by MHTML serializer. > > > > Do you know any other way to check for that? > > > > It seems that these elements <head>, <meta> and etc are defined as "display: > none" in the default stylesheet html.css. Can we call CSSDefaultStyleSheets to > get the default display style for the element in order not to drop these > elements? How do you think? > > I think CSSDefaultStyleSheets isn't suitable because it is just a set of rules > and has no functionality to apply rules to an element. > > Probably, there are two choices? > * Safe-list: Like Patch Set 1, we drop only elements which we know it's safe > to remove. > - Elements with 'hidden' attribute, > - <input type=hidden>, and > - For example, <div> and <span> without layoutObject(). > > * Block-list: Like Patch Set 2, but we don't drop some specific elements > - Drop elements without layoutObject() except <head>, <title>, <meta>, > <style>, <link>, etc. > > Also, if we drop any elements, it can break page rendering layout because of CSS > nth-child selector. We may also want to include <li> and etc so maintaining a safe list seems not to be easy. So I turn to block list since we only have a limited set of elements to exclude. Thanks for pointing out CSS nth-child selector case. Yes, this is an unfortunate case but since it often applies to children of <div>, <ul> and etc and it seems that most often these children should be visible because otherwise nth-child CSS code will become too complicated, I think it might be OK to live with this.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_chromeos_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The code looks ok. Please fix a test failure.
The CQ bit was checked by jianli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
On 2016/12/14 03:41:27, tkent wrote: > The code looks ok. Please fix a test failure. Fixed. Please review again. Thanks.
jianli@chromium.org changed reviewers: + svaldez@chromium.org
svaldez for save_page_browsertest.cc
third_party/WebKit lgtm
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: linux_chromium_asan_rel_ng on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by jianli@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_...)
save_page_browsertest LGTM.
The CQ bit was checked by jianli@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from carlosk@chromium.org, tkent@chromium.org, svaldez@chromium.org Link to the patchset: https://codereview.chromium.org/2538953002/#ps140001 (title: "Fix trybots")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 140001, "attempt_start_ts": 1482866687160630, "parent_rev": "a6d4555816eefd1371f7b0e5a4ae1953458d801d", "commit_rev": "229bc0d9bc7b947e19cf0904e2e0d710f81b3ae6"}
Message was sent while issue was closed.
Description was changed from ========== Remove hidden elements from MHTML We remove the following hidden elements that will never be visible due to the fact that script execution is disable in MHTML loading: 1) Any DOM element has hidden attribute 2) Input element has hidden type BUG=669325 TEST=new tests added ========== to ========== Remove hidden elements from MHTML We remove the following hidden elements that will never be visible due to the fact that script execution is disable in MHTML loading: 1) Any DOM element has hidden attribute 2) Input element has hidden type BUG=669325 TEST=new tests added Review-Url: https://codereview.chromium.org/2538953002 ==========
Message was sent while issue was closed.
Committed patchset #7 (id:140001)
Message was sent while issue was closed.
Description was changed from ========== Remove hidden elements from MHTML We remove the following hidden elements that will never be visible due to the fact that script execution is disable in MHTML loading: 1) Any DOM element has hidden attribute 2) Input element has hidden type BUG=669325 TEST=new tests added Review-Url: https://codereview.chromium.org/2538953002 ========== to ========== Remove hidden elements from MHTML We remove the following hidden elements that will never be visible due to the fact that script execution is disable in MHTML loading: 1) Any DOM element has hidden attribute 2) Input element has hidden type BUG=669325 TEST=new tests added Committed: https://crrev.com/6f960b80d7bee61f42b4e95c8f8c2e252dc36739 Cr-Commit-Position: refs/heads/master@{#440790} ==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/6f960b80d7bee61f42b4e95c8f8c2e252dc36739 Cr-Commit-Position: refs/heads/master@{#440790} |