Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(659)

Issue 1441553002: Generating CIDs in Blink during MHTML serialization. (Closed)

Created:
5 years, 1 month ago by Łukasz Anforowicz
Modified:
5 years ago
CC:
blink-reviews, blink-reviews-_chromium.org, chromium-reviews, dglazkov+blink, fgorski, hajimehoshi, Jay Civelli, site-isolation-reviews_chromium.org, yosin_UTC9
Base URL:
https://chromium.googlesource.com/chromium/src.git@mhtml-per-frame-page-serializer-only
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Generating CIDs in Blink during MHTML serialization. This CL generalizes link rewriting done by PageSerializer and gives control over it to users of PageSerializer through the new PageSerializer::Delegate::rewriteLink virtual method. The link rewriting is used to write cid:... uris in place of original iframe.src attributes. This is a more generic approach for solving the about:blank cross-referencing (the old approach of using artificial wyciwyg://frame/X URLs is being removed from PageSerializer). The new approach also solves bugs in MHTMLs containing subframes with a different original-vs-current URIs (i.e. bugs 537047, 539936). Additionally the new approach is compatible with OOPIFs (which don't necessarily know the URLs of remote child frames and therefore wouldn't be able to handle about:blank frames in the old way). To see other reasons, why Content-IDs are desirable, please see the CL description of crrev.com/1418653009. The old link rewriting code in PageSerializer (i.e. LinkChangeSerializerMarkupAccumulator) is being removed - this code was part of the merge of PageSerializer and WebPageSerializerImpl that was reverted back in July 2015. This code was not being used currently (i.e. the registerRewriteURL and setRewriteURLFolder methods were not being called right now outside of the test code). The "rewritelinks_base.html" testcase is not applicable anymore, because the code it was testing (base rewriting code in LinkChangeSerializerMarkupAccumulator) is now removed. The other PageSerializerTest testcase related to LinkChangeSerializerMarkupAccumulator (RewriteLinksSimple) has been rewritten on top of the new PageSerializer::Delegate::rewriteLink API. ContentIDs for subframes are for now generated in WebPageSerializer.cpp (eventually this will have to be done in the browser process to be compatible with --site-per-process flag and OOPIFs). convertContentIDToURI method is being moved from ArchiveResourceCollection.cpp to MHTMLParser, to enable reusing this method by Content-ID-generating code in WebPageSerializer. Pages with subframes that have been saved as MHTML after this CL lands will contain Content-ID. This means that newly generated MHTML files will not render properly in older versions of Chromium (i.e. before crrev.com/1418653009 which first landed in 48.0.2555.0) and will instead say "[Chromium / Google Chrome] needs to launch an external application to handle cid: links". After building Chromium ToT + this CL and using it to save an MHTML of a test page from http://anforowicz.github.io/nested-frames/index.htm, I've tried to open the resulting test.mht file in various other, non-chromium-based browsers: - Chromium ToT, Chrome 48.0.2564.8 and Internet Explorer 11.0.9600.18097 opened and rendered the file correctly. - Both Firefox 42.0 and Safari 9.0 (11601.1.56) seem to not support MHTML at all: - Firefox renders MHTML files as if they were text/plain (i.e. it renders MHTML headers and boundary markers and escaped/encoded MHTML parts). - Safari simply opens the Mac Finder with the location of the MHTML file. BUG=538766, 537047, 539936 Committed: https://crrev.com/0ebf1b1e9d43d344c5647de3630844b0244dd174 Cr-Commit-Position: refs/heads/master@{#362983}

Patch Set 1 #

Patch Set 2 : Rebasing... #

Patch Set 3 : Relaxing verification done by OfflinePageBridgeTest. #

Patch Set 4 : Rebasing + self-review. #

Patch Set 5 : More self-review-induced changes... #

Patch Set 6 : Added browser tests. #

Total comments: 4

Patch Set 7 : Added a code comment explaining OfflinePageBridgeTest.java changes. #

Total comments: 2

Patch Set 8 : Simplified SerializerMarkupAccumulator::appendAttribute. #

Total comments: 4

Patch Set 9 : Addressed more CR feedback from yosin@. #

Total comments: 2

Patch Set 10 : Rebasing... #

Total comments: 6

Patch Set 11 : Using references for out parameters in Blink. #

Total comments: 14

Patch Set 12 : Addressed CR feedback from tkent@. #

Patch Set 13 : Rebasing... #

Total comments: 5

Patch Set 14 : Addressed more CR feedback from tkent@ and yosin@. #

Patch Set 15 : Fixing blink-gc (aka oilpan) integration. #

Total comments: 2

Patch Set 16 : Remove finalization and virtual destructor from PageSerializer::Delegate. #

Patch Set 17 : Replace list Replaced initializer lists with array initialization. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+407 lines, -267 lines) Patch
M chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java View 1 2 3 4 5 6 1 chunk +11 lines, -2 lines 0 comments Download
M chrome/browser/download/save_page_browsertest.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 6 chunks +39 lines, -5 lines 0 comments Download
A chrome/test/data/save_page/frames-runtime-changes.htm View 1 2 3 4 5 1 chunk +74 lines, -0 lines 0 comments Download
M third_party/WebKit/Source/core/page/PageSerializer.h View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 4 chunks +21 lines, -18 lines 0 comments Download
M third_party/WebKit/Source/core/page/PageSerializer.cpp View 1 2 3 4 5 6 7 8 9 10 11 12 13 12 chunks +69 lines, -151 lines 0 comments Download
M third_party/WebKit/Source/platform/mhtml/ArchiveResourceCollection.cpp View 1 2 3 3 chunks +3 lines, -29 lines 0 comments Download
M third_party/WebKit/Source/platform/mhtml/MHTMLArchive.h View 1 4 5 6 7 8 9 1 chunk +5 lines, -1 line 0 comments Download
M third_party/WebKit/Source/platform/mhtml/MHTMLArchive.cpp View 1 4 5 6 7 8 9 5 chunks +27 lines, -4 lines 0 comments Download
M third_party/WebKit/Source/platform/mhtml/MHTMLParser.h View 1 2 3 2 chunks +9 lines, -0 lines 0 comments Download
M third_party/WebKit/Source/platform/mhtml/MHTMLParser.cpp View 1 2 3 1 chunk +22 lines, -0 lines 0 comments Download
M third_party/WebKit/Source/web/WebPageSerializer.cpp View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 6 chunks +86 lines, -9 lines 0 comments Download
M third_party/WebKit/Source/web/tests/MHTMLTest.cpp View 1 2 3 1 chunk +1 line, -1 line 0 comments Download
M third_party/WebKit/Source/web/tests/PageSerializerTest.cpp View 1 2 3 4 5 6 7 8 9 10 6 chunks +40 lines, -34 lines 0 comments Download
D third_party/WebKit/Source/web/tests/data/pageserializer/rewritelinks/rewritelinks_base.html View 1 chunk +0 lines, -13 lines 0 comments Download

Dependent Patchsets:

Messages

Total messages: 68 (24 generated)
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/100001
5 years, 1 month ago (2015-11-20 01:06:50 UTC) #2
commit-bot: I haz the power
Dry run: Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_TIMED_OUT, no build URL) ...
5 years, 1 month ago (2015-11-20 03:11:05 UTC) #4
Łukasz Anforowicz
dcheng@, could you please take a look?
5 years, 1 month ago (2015-11-20 16:28:37 UTC) #6
Łukasz Anforowicz
Adding to CC a few people who are technically not in OWNERS files, but might ...
5 years, 1 month ago (2015-11-20 16:33:44 UTC) #7
Łukasz Anforowicz
Adding jcivelli@ to CC - I hear that you authored original MHTML support for Chromium ...
5 years, 1 month ago (2015-11-20 16:41:28 UTC) #8
fgorski
https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java File chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java (right): https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java#newcode94 chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java:94: // BUG(518758): Depending on the bot the result will ...
5 years, 1 month ago (2015-11-20 17:07:48 UTC) #10
yosin_UTC9
+ @hajimehoshi, he did re-factor MarkupAccumulator spring 2015. https://codereview.chromium.org/1441553002/diff/120001/third_party/WebKit/Source/core/page/PageSerializer.cpp File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/120001/third_party/WebKit/Source/core/page/PageSerializer.cpp#newcode186 third_party/WebKit/Source/core/page/PageSerializer.cpp:186: do ...
5 years, 1 month ago (2015-11-24 01:59:04 UTC) #12
Łukasz Anforowicz
fgorski@ + yosin@ - thanks for reviewing. Could you please take another look? https://codereview.chromium.org/1441553002/diff/100001/chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java File ...
5 years ago (2015-11-24 18:40:42 UTC) #13
fgorski
offlinepages tests lgtm
5 years ago (2015-11-24 21:04:48 UTC) #14
yosin_UTC9
Mixing chromium-style code and blink-style code makes me brain switching code. (^_^;) I wish blink ...
5 years ago (2015-11-25 01:21:55 UTC) #15
Łukasz Anforowicz
yosin@, could you please take another look? https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Source/core/page/PageSerializer.cpp File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/140001/third_party/WebKit/Source/core/page/PageSerializer.cpp#newcode195 third_party/WebKit/Source/core/page/PageSerializer.cpp:195: if (element.hasLegalLinkAttribute(attribute.name())) ...
5 years ago (2015-11-25 17:19:18 UTC) #16
dcheng
https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Source/web/WebPageSerializer.cpp File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Source/web/WebPageSerializer.cpp#newcode129 third_party/WebKit/Source/web/WebPageSerializer.cpp:129: contentIDBuilder.appendLiteral("<frame"); How about just leaving off the angle brackets ...
5 years ago (2015-11-25 18:41:53 UTC) #17
Łukasz Anforowicz
https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Source/web/WebPageSerializer.cpp File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/160001/third_party/WebKit/Source/web/WebPageSerializer.cpp#newcode129 third_party/WebKit/Source/web/WebPageSerializer.cpp:129: contentIDBuilder.appendLiteral("<frame"); On 2015/11/25 18:41:53, dcheng wrote: > How about ...
5 years ago (2015-11-25 19:16:49 UTC) #18
dcheng
https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Source/core/page/PageSerializer.h File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Source/core/page/PageSerializer.h#newcode83 third_party/WebKit/Source/core/page/PageSerializer.h:83: virtual String rewriteLink(const Element&) = 0; It might be ...
5 years ago (2015-12-01 04:19:31 UTC) #19
Łukasz Anforowicz
dcheng@, could you please take another look? https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Source/core/page/PageSerializer.h File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/180001/third_party/WebKit/Source/core/page/PageSerializer.h#newcode83 third_party/WebKit/Source/core/page/PageSerializer.h:83: virtual String ...
5 years ago (2015-12-01 18:45:14 UTC) #20
dcheng
lgtm
5 years ago (2015-12-01 18:50:51 UTC) #21
Łukasz Anforowicz
Thanks for reviewing. rdsmith@, could you please take a look?
5 years ago (2015-12-01 19:12:43 UTC) #23
Randy Smith (Not in Mondays)
chrome/browser/download, chrome/test/data LGTM.
5 years ago (2015-12-01 21:56:42 UTC) #24
Łukasz Anforowicz
Thanks for reviewing. tkent@, could you please take a look? (mostly at third_party/WebKit/Source/platform/mhtml)
5 years ago (2015-12-01 23:04:02 UTC) #26
tkent
https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Source/core/page/PageSerializer.h File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Source/core/page/PageSerializer.h#newcode78 third_party/WebKit/Source/core/page/PageSerializer.h:78: // return true and populate |desiredLink| with a desired ...
5 years ago (2015-12-02 00:59:59 UTC) #28
Łukasz Anforowicz
tkent@, thanks for the feedback - could you take another look please? https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Source/core/page/PageSerializer.h File third_party/WebKit/Source/core/page/PageSerializer.h ...
5 years ago (2015-12-02 02:57:08 UTC) #29
Łukasz Anforowicz
https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Source/web/WebPageSerializer.cpp File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Source/web/WebPageSerializer.cpp#newcode71 third_party/WebKit/Source/web/WebPageSerializer.cpp:71: MHTMLPageSerializerDelegate(HashMap<Frame*, String>* frameToContentID); On 2015/12/02 02:57:08, Łukasz Anforowicz wrote: ...
5 years ago (2015-12-02 03:11:47 UTC) #30
yosin_UTC9
lgtm https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Source/core/page/PageSerializer.h File third_party/WebKit/Source/core/page/PageSerializer.h (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Source/core/page/PageSerializer.h#newcode91 third_party/WebKit/Source/core/page/PageSerializer.h:91: PageSerializer( nit: We can put both parameter in ...
5 years ago (2015-12-02 05:25:12 UTC) #31
tkent
lgtm https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Source/web/WebPageSerializer.cpp File third_party/WebKit/Source/web/WebPageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/200001/third_party/WebKit/Source/web/WebPageSerializer.cpp#newcode70 third_party/WebKit/Source/web/WebPageSerializer.cpp:70: public: On 2015/12/02 at 02:57:08, Łukasz Anforowicz wrote: ...
5 years ago (2015-12-02 06:33:54 UTC) #32
Łukasz Anforowicz
Thanks for reviewing tkent@ and yosin@. https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Source/core/page/PageSerializer.cpp File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Source/core/page/PageSerializer.cpp#newcode129 third_party/WebKit/Source/core/page/PageSerializer.cpp:129: HashSet<const Element*> m_elementsWithRewrittenLinks; ...
5 years ago (2015-12-02 15:12:34 UTC) #33
Łukasz Anforowicz
dtrainor@, I've added you to TBR for chrome/android/javatests/src/org/chromium/chrome/browser/offlinepages/OfflinePageBridgeTest.java (this has already been reviewed by fgorski@).
5 years ago (2015-12-02 15:14:41 UTC) #36
David Trainor- moved to gerrit
thanks for the heads up! lgtm for that particular test anyway :).
5 years ago (2015-12-02 16:22:54 UTC) #37
Łukasz Anforowicz
dcheng@, could you please take one more look from Oilpan perspective? In particular - I ...
5 years ago (2015-12-02 19:42:26 UTC) #39
dcheng
On 2015/12/02 at 19:42:26, lukasza wrote: > dcheng@, could you please take one more look ...
5 years ago (2015-12-02 22:05:01 UTC) #40
Łukasz Anforowicz
On 2015/12/02 22:05:01, dcheng wrote: > On 2015/12/02 at 19:42:26, lukasza wrote: > > > ...
5 years ago (2015-12-02 22:30:01 UTC) #41
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/300001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/300001
5 years ago (2015-12-02 22:32:10 UTC) #44
dcheng
https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/download/save_page_browsertest.cc File chrome/browser/download/save_page_browsertest.cc (right): https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/download/save_page_browsertest.cc#newcode1087 chrome/browser/download/save_page_browsertest.cc:1087: std::vector<std::string> expected_substrings{ Btw, using initializer lists isn't actually allowed ...
5 years ago (2015-12-02 22:43:26 UTC) #45
Łukasz Anforowicz
https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/download/save_page_browsertest.cc File chrome/browser/download/save_page_browsertest.cc (right): https://codereview.chromium.org/1441553002/diff/280001/chrome/browser/download/save_page_browsertest.cc#newcode1087 chrome/browser/download/save_page_browsertest.cc:1087: std::vector<std::string> expected_substrings{ On 2015/12/02 22:43:26, dcheng wrote: > Btw, ...
5 years ago (2015-12-02 23:23:54 UTC) #48
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
5 years ago (2015-12-02 23:26:43 UTC) #51
tkent
https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Source/core/page/PageSerializer.cpp File third_party/WebKit/Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1441553002/diff/240001/third_party/WebKit/Source/core/page/PageSerializer.cpp#newcode129 third_party/WebKit/Source/core/page/PageSerializer.cpp:129: HashSet<const Element*> m_elementsWithRewrittenLinks; On 2015/12/02 at 15:12:34, Łukasz Anforowicz ...
5 years ago (2015-12-03 00:20:46 UTC) #52
commit-bot: I haz the power
Try jobs failed on following builders: linux_chromium_compile_dbg_32_ng on tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/linux_chromium_compile_dbg_32_ng/builds/130394)
5 years ago (2015-12-03 00:46:52 UTC) #54
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
5 years ago (2015-12-03 01:11:35 UTC) #56
commit-bot: I haz the power
Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator_ninja/builds/102111)
5 years ago (2015-12-03 03:57:06 UTC) #58
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
5 years ago (2015-12-03 04:40:36 UTC) #60
Łukasz Anforowicz
On 2015/12/03 03:57:06, commit-bot: I haz the power wrote: > Try jobs failed on following ...
5 years ago (2015-12-03 06:10:32 UTC) #61
commit-bot: I haz the power
Try jobs failed on following builders: ios_dbg_simulator_ninja on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/ios_dbg_simulator_ninja/builds/102309)
5 years ago (2015-12-03 06:40:48 UTC) #63
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1441553002/340001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1441553002/340001
5 years ago (2015-12-03 14:40:22 UTC) #65
commit-bot: I haz the power
Committed patchset #17 (id:340001)
5 years ago (2015-12-03 15:43:23 UTC) #66
commit-bot: I haz the power
5 years ago (2015-12-03 15:44:18 UTC) #68
Message was sent while issue was closed.
Patchset 17 (id:??) landed as
https://crrev.com/0ebf1b1e9d43d344c5647de3630844b0244dd174
Cr-Commit-Position: refs/heads/master@{#362983}

Powered by Google App Engine
This is Rietveld 408576698