|
|
Created:
5 years, 1 month ago by Łukasz Anforowicz Modified:
5 years ago Reviewers:
David Trainor- moved to gerrit, fgorski, tkent, Randy Smith (Not in Mondays), dcheng, yosin_UTC9 CC:
blink-reviews, blink-reviews-_chromium.org, chromium-reviews, dglazkov+blink, fgorski, hajimehoshi, Jay Civelli, site-isolation-reviews_chromium.org, yosin_UTC9 Base URL:
https://chromium.googlesource.com/chromium/src.git@mhtml-per-frame-page-serializer-only Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionGenerating CIDs in Blink during MHTML serialization.
This CL generalizes link rewriting done by PageSerializer
and gives control over it to users of PageSerializer through
the new PageSerializer::Delegate::rewriteLink virtual method.
The link rewriting is used to write cid:... uris in place of original
iframe.src attributes. This is a more generic approach for solving the
about:blank cross-referencing (the old approach of using artificial
wyciwyg://frame/X URLs is being removed from PageSerializer). The new
approach also solves bugs in MHTMLs containing subframes with a
different original-vs-current URIs (i.e. bugs 537047, 539936).
Additionally the new approach is compatible with OOPIFs (which don't
necessarily know the URLs of remote child frames and therefore wouldn't
be able to handle about:blank frames in the old way). To see other
reasons, why Content-IDs are desirable, please see the CL description of
crrev.com/1418653009.
The old link rewriting code in PageSerializer (i.e.
LinkChangeSerializerMarkupAccumulator) is being removed - this code was
part of the merge of PageSerializer and WebPageSerializerImpl that was
reverted back in July 2015. This code was not being used currently
(i.e. the registerRewriteURL and setRewriteURLFolder methods were not
being called right now outside of the test code).
The "rewritelinks_base.html" testcase is not applicable anymore, because
the code it was testing (base rewriting code in
LinkChangeSerializerMarkupAccumulator) is now removed. The other
PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator
(RewriteLinksSimple) has been rewritten on top of the new
PageSerializer::Delegate::rewriteLink API.
ContentIDs for subframes are for now generated in WebPageSerializer.cpp
(eventually this will have to be done in the browser process to be
compatible with --site-per-process flag and OOPIFs).
convertContentIDToURI method is being moved from
ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this
method by Content-ID-generating code in WebPageSerializer.
Pages with subframes that have been saved as MHTML after this CL lands will
contain Content-ID. This means that newly generated MHTML files will not
render properly in older versions of Chromium (i.e. before crrev.com/1418653009
which first landed in 48.0.2555.0) and will instead say "[Chromium / Google
Chrome] needs to launch an external application to handle cid: links".
After building Chromium ToT + this CL and using it to save an MHTML of a test
page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to
open the resulting test.mht file in various other, non-chromium-based browsers:
- Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097
opened and rendered the file correctly.
- Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all:
- Firefox renders MHTML files as if they were text/plain (i.e. it renders
MHTML headers and boundary markers and escaped/encoded MHTML parts).
- Safari simply opens the Mac Finder with the location of the MHTML file.
BUG=538766, 537047, 539936
Committed: https://crrev.com/0ebf1b1e9d43d344c5647de3630844b0244dd174
Cr-Commit-Position: refs/heads/master@{#362983}
Patch Set 1 #Patch Set 2 : Rebasing... #Patch Set 3 : Relaxing verification done by OfflinePageBridgeTest. #Patch Set 4 : Rebasing + self-review. #Patch Set 5 : More self-review-induced changes... #Patch Set 6 : Added browser tests. #
Total comments: 4
Patch Set 7 : Added a code comment explaining OfflinePageBridgeTest.java changes. #
Total comments: 2
Patch Set 8 : Simplified SerializerMarkupAccumulator::appendAttribute. #
Total comments: 4
Patch Set 9 : Addressed more CR feedback from yosin@. #
Total comments: 2
Patch Set 10 : Rebasing... #
Total comments: 6
Patch Set 11 : Using references for out parameters in Blink. #
Total comments: 14
Patch Set 12 : Addressed CR feedback from tkent@. #Patch Set 13 : Rebasing... #
Total comments: 5
Patch Set 14 : Addressed more CR feedback from tkent@ and yosin@. #Patch Set 15 : Fixing blink-gc (aka oilpan) integration. #
Total comments: 2
Patch Set 16 : Remove finalization and virtual destructor from PageSerializer::Delegate. #Patch Set 17 : Replace list Replaced initializer lists with array initialization. #Dependent Patchsets: Messages
Total messages: 68 (24 generated)
The CQ bit was checked by lukasza@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/100001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_TIMED_OUT, no build URL) ios_rel_device_ninja on tryserver.chromium.mac (JOB_TIMED_OUT, no build URL)
lukasza@chromium.org changed reviewers: + dcheng@chromium.org
dcheng@, could you please take a look?
Adding to CC a few people who are technically not in OWNERS files, but might be interested in the changes: - yosin@ - for serializer changes - fgorski@ - for OfflinePageBridgeTest.java changes + this is a fix for crbug.com/537047 that he pinged me about
Adding jcivelli@ to CC - I hear that you authored original MHTML support for Chromium - please let me know if you have any comments and/or feedback about adding Content-ID support to MHTML. I tried to explain why Content-IDs are desirable in the CL description. There is also a (possibly a bit stale) doc with some more notes and an example of before/after MHTML file at https://docs.google.com/document/d/1OMRoBCBuLV-HOcL8ZGcstYdocq0kHmFbZhWwJC7Ge...
fgorski@chromium.org changed reviewers: + fgorski@chromium.org
https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javates... File chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java (right): https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javates... chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java:94: // BUG(518758): Depending on the bot the result will be either 626 or 627. Remove? https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javates... chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java:96: assertTrue("Offline page item size is incorrect: " + size, 600 < size && size < 800); Add a comment why the range is so broad. (Include the information from the comment above, but don't stop there.
yosin@chromium.org changed reviewers: + yosin@chromium.org
+ @hajimehoshi, he did re-factor MarkupAccumulator spring 2015. https://codereview.chromium.org/1441553002/diff/120001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/120001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.cpp:186: do { Using |do { ... } while (false)| other than macro isn't popular. Can we write as below? if (delegate) { String newValue = delegate->rewriteLink(element); if (newValue.isNotNull()) { if (... hasLegalLink) return appendRewrittenAttribute(...); if (... isHTMLFrameElementBase(...) ...) return appendRewrittenAttribute(...); } } ... It seems introduce PageSerializer::rewriteLink() making code simpler: String newValue = m_serailzier->rewriteLink(element); if (newValue.isNotNull) { if (... hasLegalLink) return appendRewrittenAttribute(...); if (... isHTMLFrameElementBase(...) ...) return appendRewrittenAttribute(...); } String PageSerializer::rewriteLink(const Element& element) { return m_delegate ? m_delegae->rewriteLink(element) : String(); } BTW, we may want to expose all member functions in Delegate to PageSerializer and drop |PageSerializer::delegate()| from PageSerializer.
fgorski@ + yosin@ - thanks for reviewing. Could you please take another look? https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javates... File chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java (right): https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javates... chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java:94: // BUG(518758): Depending on the bot the result will be either 626 or 627. On 2015/11/20 17:07:48, fgorski wrote: > Remove? Good point. Done. https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javates... chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java:96: assertTrue("Offline page item size is incorrect: " + size, 600 < size && size < 800); On 2015/11/20 17:07:48, fgorski wrote: > Add a comment why the range is so broad. (Include the information from the > comment above, but don't stop there. Done. https://codereview.chromium.org/1441553002/diff/120001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/120001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.cpp:186: do { On 2015/11/24 01:59:04, Yosi_UTC9 wrote: > Using |do { ... } while (false)| other than macro isn't popular. > Can we write as below? > > if (delegate) { > String newValue = delegate->rewriteLink(element); > if (newValue.isNotNull()) { > if (... hasLegalLink) > return appendRewrittenAttribute(...); > if (... isHTMLFrameElementBase(...) ...) > return appendRewrittenAttribute(...); > } > } > ... > I simplified that function and got rid of do...while(false). Looks much better now... > It seems introduce PageSerializer::rewriteLink() making code simpler: > > String newValue = m_serailzier->rewriteLink(element); > if (newValue.isNotNull) { > if (... hasLegalLink) > return appendRewrittenAttribute(...); > if (... isHTMLFrameElementBase(...) ...) > return appendRewrittenAttribute(...); > } > > String PageSerializer::rewriteLink(const Element& element) { > return m_delegate ? m_delegae->rewriteLink(element) : String(); > } > It looks nice, but this (SerializerMarkupAccumulator::appendAttribute) would be the only place where PageSerializer::rewriteLink would be used - I think I can just inline. > BTW, we may want to expose all member functions in Delegate to PageSerializer > and drop |PageSerializer::delegate()| from PageSerializer. This would be tricky for shouldIgnoreAttribute, so I think I'd rather not go this way.
offlinepages tests lgtm
Mixing chromium-style code and blink-style code makes me brain switching code. (^_^;) I wish blink to follow chromium-style. https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.cpp:195: if (element.hasLegalLinkAttribute(attribute.name())) { nit: early return style is preferred to get rid of else-if. if ((element.hasLegalLinkAttribute(attribute.name()))) return appendRewrittenAttribute(...); if (isHTMLFrameElementBase(&element) && attribute.name() == HTMLNames::srcdocAttr) { // Emit src instead of srcdoc attribute for frame elements. return appendRewrittenAttribute(out, element, HTMLNames::srcAttr.localName(), newValue); } // Please put comment why we don't append attribute. Note: void foo() { return bar(); } void bar() {} is valid in C++. https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:165: String contentID = (i == 0) ? frameToContentID.get(frame) : String(); I think having |bool isFirstResource;| with using range-for is better. Or, move |contentID| out of for-loop. String conentID = frameToContentID.get(frame); for (const auto& resource : resources) { MHTMArchive::generateMHTMLPart(..., contentID, ...); // Other than first frame, we don't need to have Content-ID header. contentID = String(); }
yosin@, could you please take another look? https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.cpp:195: if (element.hasLegalLinkAttribute(attribute.name())) { On 2015/11/25 01:21:55, Yosi_UTC9 wrote: > nit: early return style is preferred to get rid of else-if. > > if ((element.hasLegalLinkAttribute(attribute.name()))) > return appendRewrittenAttribute(...); > > if (isHTMLFrameElementBase(&element) && attribute.name() == > HTMLNames::srcdocAttr) { > // Emit src instead of srcdoc attribute for frame elements. > return appendRewrittenAttribute(out, element, HTMLNames::srcAttr.localName(), > newValue); > } > Done. > // Please put comment why we don't append attribute. Done. > > Note: void foo() { return bar(); } void bar() {} is valid in C++. Ack, although I am not sure if I understand the significance of your comment wrt to the changelist :-( https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:165: String contentID = (i == 0) ? frameToContentID.get(frame) : String(); On 2015/11/25 01:21:55, Yosi_UTC9 wrote: > I think having |bool isFirstResource;| with using range-for is better. Ok, done. (btw: the range-for was introduced by me in the previous CL :-) > Or, move |contentID| out of for-loop. > > String conentID = frameToContentID.get(frame); > for (const auto& resource : resources) { > MHTMArchive::generateMHTMLPart(..., contentID, ...); > // Other than first frame, we don't need to have Content-ID header. > contentID = String(); > } I think using the boolean (your first suggestion) is more verbose, but easier to understand.
https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:129: contentIDBuilder.appendLiteral("<frame"); How about just leaving off the angle brackets here and letting generateMHTMLPart insert them as needed? That way, convertContentIDToURI doesn't need to strip off the angle brackets.
https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:129: contentIDBuilder.appendLiteral("<frame"); On 2015/11/25 18:41:53, dcheng wrote: > How about just leaving off the angle brackets here and letting generateMHTMLPart > insert them as needed? That way, convertContentIDToURI doesn't need to strip off > the angle brackets. convertContentIDToURI strips the angle brackets, because they are present when MHTMLParser reads the MHTML/Mime Content-ID headers. We could do the stripping in MHTMLParser / outside of convertContentIDToURI, but it would mean that now we have 3 not 2 forms of content-id floating around: 1. <foo@bar.com> 2. cid:foo@bar.com 3. foo@bar.com So - I think we might want to leave this aspect of the code as-is?
https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:83: virtual String rewriteLink(const Element&) = 0; It might be easier to just return a bool here and return the String result in a ref parameter. https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:90: Vector<SerializedResource>*, Nit: Vector<>& https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/mhtml/MHTMLParser.cpp (right): https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/mhtml/MHTMLParser.cpp:422: return KURL(KURL(), uriBuilder.toString()); Maybe use ParsedURLString? It's guaranteed that uriBuilder.toString() will be a valid URL right?
dcheng@, could you please take another look? https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:83: virtual String rewriteLink(const Element&) = 0; On 2015/12/01 04:19:31, dcheng wrote: > It might be easier to just return a bool here and return the String result in a > ref parameter. Done. Explicit bool does look/feel better than overloading the meaning of a null String. Side node: For a moment I wondered whether I should split this method into 2 (isRewritingRequired(const Element&) + String getRewrittenLink(const Element&)) but the 2 methods would have to share most of their implementation => keeping this as a single method feels better. https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:90: Vector<SerializedResource>*, On 2015/12/01 04:19:31, dcheng wrote: > Nit: Vector<>& Done. I guess https://www.chromium.org/blink/coding-style#TOC-Pointers-and-References wins with https://google.github.io/styleguide/cppguide.html#Reference_Arguments in this case :-) Maybe one day I'll get used to the differences between Chromium and Blink styles :-> I was wondering how deep I should go with this change. I've decided to change the public API (i.e. the type of the constructor parameter that you've pointed out) but leave private implementation details intact (i.e. leaving the type of the PageSerializer::m_resources field as a pointer). https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... File third_party/WebKit/Source/platform/mhtml/MHTMLParser.cpp (right): https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Sou... third_party/WebKit/Source/platform/mhtml/MHTMLParser.cpp:422: return KURL(KURL(), uriBuilder.toString()); On 2015/12/01 04:19:31, dcheng wrote: > Maybe use ParsedURLString? It's guaranteed that uriBuilder.toString() will be a > valid URL right? I almost made the change you've suggested, but started having second thoughts after realizing that I can't really add an assert that the url will be always valid. This is because |contentID| may come from an untrusted source (i.e. from MHTML/MIME headers from an MHTML document). Additionally, this doesn't feel like a performance-sensitive code path (OTOH, this statement is based on an armchair analysis rather than on data), so it might be better to just play it safe. Given above, I think it is better to leave it as is. WDYT?
lgtm
lukasza@chromium.org changed reviewers: + rdsmith@chromium.org
Thanks for reviewing. rdsmith@, could you please take a look?
chrome/browser/download, chrome/test/data LGTM.
lukasza@chromium.org changed reviewers: + tkent@chromium.org
Thanks for reviewing. tkent@, could you please take a look? (mostly at third_party/WebKit/Source/platform/mhtml)
Description was changed from ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach is being removed from PageSerializer). This approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. BUG=538766, 537047, 539936 ========== to ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. BUG=538766, 537047, 539936 ==========
https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:78: // return true and populate |desiredLink| with a desired value of the desiredLink -> rewrittenLink ? https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:88: // serialization aspects. nit: We should note ownership/lifetime of Delegate object. e.g. Callsites should own the specified Delegate object and keep it alive until PageSerializer dies. https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:70: public: Please add STACK_ALLCOATED(); for oilpan https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:71: MHTMLPageSerializerDelegate(HashMap<Frame*, String>* frameToContentID); The argument and m_frameToContentId should be |const WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String>&|. https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:74: private: nit: add a blank line before |private:| https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:127: static HashMap<Frame*, String> generateFrameContentIDs(Page* page) The function should be in anonymous namespace above. The return value and |frameToContenID| below should be WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String>.
tkent@, thanks for the feedback - could you take another look please? https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:78: // return true and populate |desiredLink| with a desired value of the On 2015/12/02 00:59:59, tkent wrote: > desiredLink -> rewrittenLink ? Done. Thanks for catching this. https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:88: // serialization aspects. On 2015/12/02 00:59:59, tkent wrote: > nit: We should note ownership/lifetime of Delegate object. e.g. Callsites > should own the specified Delegate object and keep it alive until PageSerializer > dies. Done. Good point. https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:70: public: On 2015/12/02 00:59:59, tkent wrote: > Please add STACK_ALLCOATED(); for oilpan After doing this I got the following error: [blink-gc] Stack-allocated class 'MHTMLPageSerializerDelegate' derives class 'Delegate' which is not stack allocated. I think I can't fix this, because some classes derived from the Delegate (i.e. the test suite in PageSerializerTest.cpp) will not be stack allocated. Still - after your comment, I've added the wtf way of saying this class should not be copyable. Does that sound ok? https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:71: MHTMLPageSerializerDelegate(HashMap<Frame*, String>* frameToContentID); On 2015/12/02 00:59:59, tkent wrote: > The argument and m_frameToContentId should be |const > WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String>&|. I made the change you've suggested, but I wanted to double-check if it is ok, because I am allocating the HashMap on the stack (i.e. see the new code in WebPageSerializer.cpp in serializePageToMHTML - frameToContentID is allocated on the stack [although I did change its type to WillBeHeap... in the latest patchset). https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:74: private: On 2015/12/02 00:59:59, tkent wrote: > nit: add a blank line before |private:| Done. https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:127: static HashMap<Frame*, String> generateFrameContentIDs(Page* page) On 2015/12/02 00:59:59, tkent wrote: > The function should be in anonymous namespace above. > Done (for this and the next function). I am not sure how that happened... :-/ > The return value and |frameToContenID| below should be > WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String>. Ok (but please see the question in my other reply).
https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:71: MHTMLPageSerializerDelegate(HashMap<Frame*, String>* frameToContentID); On 2015/12/02 02:57:08, Łukasz Anforowicz wrote: > On 2015/12/02 00:59:59, tkent wrote: > > The argument and m_frameToContentId should be |const > > WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String>&|. > > I made the change you've suggested, but I wanted to double-check if it is ok, > because I am allocating the HashMap on the stack (i.e. see the new code in > WebPageSerializer.cpp in serializePageToMHTML - frameToContentID is allocated on > the stack [although I did change its type to WillBeHeap... in the latest > patchset). Actually, I don't know how I managed to confuse myself here - blink::HeapHashMap has "heap" in its name because it allocates buckets / elements / etc on the heap, not because the heap map itself is be stack allocated... So - I think that the latest patchset should be ok.
lgtm https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:91: PageSerializer( nit: We can put both parameter in same line.
lgtm https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Sou... third_party/WebKit/Source/web/WebPageSerializer.cpp:70: public: On 2015/12/02 at 02:57:08, Łukasz Anforowicz wrote: > On 2015/12/02 00:59:59, tkent wrote: > > Please add STACK_ALLCOATED(); for oilpan > > After doing this I got the following error: [blink-gc] Stack-allocated class 'MHTMLPageSerializerDelegate' derives class 'Delegate' which is not stack allocated. > > I think I can't fix this, because some classes derived from the Delegate (i.e. the test suite in PageSerializerTest.cpp) will not be stack allocated. > > Still - after your comment, I've added the wtf way of saying this class should not be copyable. > > Does that sound ok? Looks ok. It seems it's hard to make Delegate STACK_ALLOCATED because it is used with a test fixture. https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.cpp:129: HashSet<const Element*> m_elementsWithRewrittenLinks; The type should be WillBeHeapHashSet<RawPtrWillBeMember<const Element>>.
Thanks for reviewing tkent@ and yosin@. https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.cpp:129: HashSet<const Element*> m_elementsWithRewrittenLinks; On 2015/12/02 06:33:54, tkent wrote: > The type should be WillBeHeapHashSet<RawPtrWillBeMember<const Element>>. Done. Thank you for pointing this out (and apologies for not realizing I need to also check HashSet and Vector in addition to HashMap). BTW: I see that there are still some references to Vector that have not been converted to WillBeHeapVector. I assume that these shouldn't block this CL (they are fairly old + I'll try to run an oilpan try bot to double-check), but I wonder if I should do something about this in a future CL. https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.h:91: PageSerializer( On 2015/12/02 05:25:12, Yosi_back_on_Jan_5 wrote: > nit: We can put both parameter in same line. Done.
Description was changed from ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. BUG=538766, 537047, 539936 ========== to ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. TBR=dtrainor@chromium.org BUG=538766, 537047, 539936 ==========
lukasza@chromium.org changed reviewers: + dtrainor@chromium.org
dtrainor@, I've added you to TBR for chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java (this has already been reviewed by fgorski@).
thanks for the heads up! lgtm for that particular test anyway :).
Description was changed from ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. TBR=dtrainor@chromium.org BUG=538766, 537047, 539936 ========== to ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. BUG=538766, 537047, 539936 ==========
dcheng@, could you please take one more look from Oilpan perspective? In particular - I would appreciate an extra set of eyes for WebPageSerializer.cpp and PageSerializer.h changes made in patchsets #12 through #15. Notes I made when working on patchset #15: - MHTMLPageSerializerDelegate (in WebPageSerializer.cpp) needs to contain a reference/pointer to a map from Frame pointers to content id Strings. The map is stack-allocated in serializePageToMHTML function (in WebPageSerializer.cpp). As suggested by tkent@ I've declared the map as WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String> and I am storing a reference to the map as a const-ref m_frameToContentID field. - MHTMLPageSerializerDelegate is always allocated on the stack, but I cannot declare it as STACK_ALLOCATED(), because it is implementing PageSerializer::Delegate (which is also implemented by non-stack-allocated classes like PageSerializerTest. Therefore [1] I believe I need to declare MHTMLPageSerializerDelegate as derived from GarbageCollectedFinalized<MHTMLPageSerializerDelegate> (because otherwise I won't be able to use WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String> as the type of the map). I think I need to inherit from *NoBaseWillBe*GarbageCollectedFinalized<...> to work ok when oilpan is disabled. And that I need "finalized" version, because the base class has a virtual destructor (needed because the base class is an abstract interface). - I had to remove USING_FAST_MALLOC(Delegate) from Delegate class declaration in PageSerializer.h. This was done because otherwise when MHTMLPageSerializerDelegate inherits from GarbageCollectedFinalized<MHTMLPageSerializerDelegate> then the compiler complains that member 'operator delete' found in multiple base classes of different types. - I've added a trace method definition to MHTMLPageSerializerDelegate. Tracing into a const-ref field should be fine, right? [1] https://www.chromium.org/developers/blink-gc-plugin-errors (section about "Stack-allocated class 'Foo' derives class 'Bar' which is not stack allocated")
On 2015/12/02 at 19:42:26, lukasza wrote: > dcheng@, could you please take one more look from Oilpan perspective? In particular - I would appreciate an extra set of eyes for WebPageSerializer.cpp and PageSerializer.h changes made in patchsets #12 through #15. > > Notes I made when working on patchset #15: > > - MHTMLPageSerializerDelegate (in WebPageSerializer.cpp) needs to > contain a reference/pointer to a map from Frame pointers to content id > Strings. The map is stack-allocated in serializePageToMHTML function > (in WebPageSerializer.cpp). As suggested by tkent@ I've declared the > map as WillBeHeapHashMap<RawPtrWillBeMember<Frame>, String> and I am > storing a reference to the map as a const-ref m_frameToContentID field. > > - MHTMLPageSerializerDelegate is always allocated on the stack, but I > cannot declare it as STACK_ALLOCATED(), because it is implementing > PageSerializer::Delegate (which is also implemented by > non-stack-allocated classes like PageSerializerTest. Therefore [1] > I believe I need to declare MHTMLPageSerializerDelegate as derived from > GarbageCollectedFinalized<MHTMLPageSerializerDelegate> (because otherwise I > won't be able to use WillBeHeapHashMap<RawPtrWillBeMember<Frame>, > String> as the type of the map). I think I need to inherit from > *NoBaseWillBe*GarbageCollectedFinalized<...> to work ok when oilpan > is disabled. And that I need "finalized" version, because the > base class has a virtual destructor (needed because the base class > is an abstract interface). I think you can just remove the virtual dtor from Delegate (there's nothing to destroy, and the test fixture already gets a virtual dtor from testing::Test. It's kind of ugly, but requiring finalization here is kind of silly too. > > - I had to remove USING_FAST_MALLOC(Delegate) from Delegate class > declaration in PageSerializer.h. This was done because otherwise when > MHTMLPageSerializerDelegate inherits from > GarbageCollectedFinalized<MHTMLPageSerializerDelegate> then the compiler > complains that member 'operator delete' found in multiple base classes > of different types. > > - I've added a trace method definition to MHTMLPageSerializerDelegate. > Tracing into a const-ref field should be fine, right? Yep, this is fine. And if it isn't, well, it shouldn't have compiled ;-) > > [1] https://www.chromium.org/developers/blink-gc-plugin-errors > (section about "Stack-allocated class 'Foo' derives class 'Bar' which is not stack allocated") lgtm
On 2015/12/02 22:05:01, dcheng wrote: > On 2015/12/02 at 19:42:26, lukasza wrote: > > > > - MHTMLPageSerializerDelegate is always allocated on the stack, but I > > cannot declare it as STACK_ALLOCATED(), because it is implementing > > PageSerializer::Delegate (which is also implemented by > > non-stack-allocated classes like PageSerializerTest. Therefore [1] > > I believe I need to declare MHTMLPageSerializerDelegate as derived from > > GarbageCollectedFinalized<MHTMLPageSerializerDelegate> (because otherwise I > > won't be able to use WillBeHeapHashMap<RawPtrWillBeMember<Frame>, > > String> as the type of the map). I think I need to inherit from > > *NoBaseWillBe*GarbageCollectedFinalized<...> to work ok when oilpan > > is disabled. And that I need "finalized" version, because the > > base class has a virtual destructor (needed because the base class > > is an abstract interface). > > I think you can just remove the virtual dtor from Delegate (there's nothing to > destroy, and the test fixture already gets a virtual dtor from testing::Test. > It's kind of ugly, but requiring finalization here is kind of silly too. Done. > > > > > - I had to remove USING_FAST_MALLOC(Delegate) from Delegate class > > declaration in PageSerializer.h. This was done because otherwise when > > MHTMLPageSerializerDelegate inherits from > > GarbageCollectedFinalized<MHTMLPageSerializerDelegate> then the compiler > > complains that member 'operator delete' found in multiple base classes > > of different types. > > > > - I've added a trace method definition to MHTMLPageSerializerDelegate. > > Tracing into a const-ref field should be fine, right? > > Yep, this is fine. And if it isn't, well, it shouldn't have compiled ;-) Ack. Thanks for double-checking. It just feels weird to have to trace into constant/unchanging hashtable that I know is always living on the stack (and is therefore always a gc "root"). OTOH, I understand that "living on the stack" is something I see as a human, but this isn't statically enforced or discoverable.
The CQ bit was checked by lukasza@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from fgorski@chromium.org, rdsmith@chromium.org, tkent@chromium.org, yosin@chromium.org, dtrainor@chromium.org, dcheng@chromium.org Link to the patchset: https://codereview.chromium.org/1441553002/#ps300001 (title: "Remove finalization and virtual destructor from PageSerializer::Delegate.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/300001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/300001
https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/downloa... File chrome/browser/download/save_page_browsertest.cc (right): https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/downloa... chrome/browser/download/save_page_browsertest.cc:1087: std::vector<std::string> expected_substrings{ Btw, using initializer lists isn't actually allowed yet. Sorry =(
The CQ bit was unchecked by lukasza@chromium.org
Patchset #17 (id:320001) has been deleted
https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/downloa... File chrome/browser/download/save_page_browsertest.cc (right): https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/downloa... chrome/browser/download/save_page_browsertest.cc:1087: std::vector<std::string> expected_substrings{ On 2015/12/02 22:43:26, dcheng wrote: > Btw, using initializer lists isn't actually allowed yet. Sorry =( :-( Done. (replaced initializer lists with array initialization + constructing the vector via std::begin and std::end [which are allowed in Chromium]).
The CQ bit was checked by lukasza@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from dtrainor@chromium.org, fgorski@chromium.org, tkent@chromium.org, rdsmith@chromium.org, dcheng@chromium.org, yosin@chromium.org Link to the patchset: https://codereview.chromium.org/1441553002/#ps340001 (title: "Replace list Replaced initializer lists with array initialization.")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Sou... third_party/WebKit/Source/core/page/PageSerializer.cpp:129: HashSet<const Element*> m_elementsWithRewrittenLinks; On 2015/12/02 at 15:12:34, Łukasz Anforowicz wrote: > On 2015/12/02 06:33:54, tkent wrote: > > The type should be WillBeHeapHashSet<RawPtrWillBeMember<const Element>>. > > Done. Thank you for pointing this out (and apologies for not realizing I need to also check HashSet and Vector in addition to HashMap). > > BTW: I see that there are still some references to Vector that have not been converted to WillBeHeapVector. I assume that these shouldn't block this CL (they are fairly old + I'll try to run an oilpan try bot to double-check), but I wonder if I should do something about this in a future CL. We continue to use vanilla Vector for SerializedResource because it won't be a garbage-collected type.
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: linux_chromium_compile_dbg_32_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_...)
The CQ bit was checked by lukasza@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...)
The CQ bit was checked by lukasza@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
On 2015/12/03 03:57:06, commit-bot: I haz the power wrote: > Try jobs failed on following builders: > ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, > http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...) The failures: RoundTripTestCookieStore/CookieStoreTest/0.TestNonDottedAndTLD CookieStoreIOS/CookieStoreTest/0.TestNonDottedAndTLD It seems that the same tests are also failing on the main waterfall: - http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator... - http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator... And at a first glance the tests do not seem related to the changes in this CL. So, I guess I'll just keep re-checking the CQ checkbox... :-/
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator...)
The CQ bit was checked by lukasza@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
Message was sent while issue was closed.
Committed patchset #17 (id:340001)
Message was sent while issue was closed.
Description was changed from ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. BUG=538766, 537047, 539936 ========== to ========== Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. BUG=538766, 537047, 539936 Committed: https://crrev.com/0ebf1b1e9d43d344c5647de3630844b0244dd174 Cr-Commit-Position: refs/heads/master@{#362983} ==========
Message was sent while issue was closed.
Patchset 17 (id:??) landed as https://crrev.com/0ebf1b1e9d43d344c5647de3630844b0244dd174 Cr-Commit-Position: refs/heads/master@{#362983} |