Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(586)

Issue 1407663004: Tweaking WebPageSerializerImpl to emit a BOM for UTF16/32. (Closed)

Created:
5 years, 2 months ago by Łukasz Anforowicz
Modified:
5 years ago
CC:
asanka, blink-reviews, cbentzel+watch_chromium.org, chromium-reviews, ncarter (slow), site-isolation-reviews_chromium.org
Base URL:
https://chromium.googlesource.com/chromium/src.git@master
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Tweaking WebPageSerializerImpl to emit a BOM for UTF16/32. Previously save-page-as would truncate saved html files if they would include a '\0' byte (i.e. if saving a html document that was encoded as utf16). The truncation doesn't happen anymore [1], but the saved html files still don't round-trip properly, because the browser is not able to detect the encoding of the files (<meta> character cannot be sniffed in encodings that are not ascii compatible [2] + it is obviously impossible for local files to declare their encoding via http headers). This CL makes changes to always emits a BOM when saving in UTF16 or UTF32 encoding. This in-turn causes round-tripping tests in save_page_browsertest.cc to pass for tests using these encodings. Note that this CL applies only to save as "complete html" and has no impact to save as "html only" or save as "mhtml". [1] https://chromium.googlesource.com/chromium/src/+/17e4a38aef61b832589933c942426a9818e008ec%5E%21/content/renderer/render_frame_impl.cc [2] http://www.w3.org/TR/html5/document-metadata#charset: If an HTML document contains a meta element with a charset attribute or a meta element with an http-equiv attribute in the encoding declaration state, then the character encoding used must be an ASCII-compatible character encoding. BUG=541699 Committed: https://crrev.com/36f0d157b6d2c6fae627a3aba955ad4e0fe07d71 Cr-Commit-Position: refs/heads/master@{#364394}

Patch Set 1 : #

Patch Set 2 : Reverting unnecessary embedded_test_server.cc changes. #

Total comments: 4

Patch Set 3 : Unconditionally enabling the new test. #

Patch Set 4 : Rebasing... #

Total comments: 4

Patch Set 5 : Ready for further review. #

Patch Set 6 : Making .gitattributes more generic. #

Patch Set 7 : Reuploading with artificially high similarity requirements. #

Patch Set 8 : Dropped euc-kr test (little value + trouble with tools). #

Patch Set 9 : Dropping .gitattributes #

Patch Set 10 : Minor tweaks. #

Patch Set 11 : Saving UTF16 with BOM. Leaving UTF32 in a broken state. #

Total comments: 5

Patch Set 12 : Using TextEncoding::isNonByteBasedEncoding instead. #

Total comments: 12

Patch Set 13 : Addressed CR feedback from jsbell@. #

Patch Set 14 : Rebasing... #

Patch Set 15 : Switched \uNNNN escape codes into \xNN. #

Patch Set 16 : Rebased after landing the test files in a separate CL. #

Unified diffs Side-by-side diffs Delta from patch set Stats (+86 lines, -9 lines) Patch
M chrome/browser/download/save_page_browsertest.cc View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 2 chunks +59 lines, -8 lines 0 comments Download
A chrome/test/data/save_page/frames-encodings.htm View 1 2 3 4 5 6 7 8 9 10 1 chunk +21 lines, -0 lines 0 comments Download
M third_party/WebKit/Source/web/WebPageSerializerImpl.cpp View 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 1 chunk +4 lines, -0 lines 0 comments Download
M third_party/WebKit/Source/wtf/text/TextEncoding.h View 1 2 3 4 5 6 7 8 9 10 11 1 chunk +2 lines, -1 line 0 comments Download

Messages

Total messages: 70 (26 generated)
Łukasz Anforowicz
Randy, could you please take a look?
5 years, 2 months ago (2015-10-14 17:01:37 UTC) #3
Randy Smith (Not in Mondays)
(I feel competent to review the overall test structure, but I know very little about ...
5 years, 2 months ago (2015-10-15 18:57:06 UTC) #4
Łukasz Anforowicz
Randy, can you take another look please? https://codereview.chromium.org/1407663004/diff/40001/chrome/browser/download/save_page_browsertest.cc File chrome/browser/download/save_page_browsertest.cc (right): https://codereview.chromium.org/1407663004/diff/40001/chrome/browser/download/save_page_browsertest.cc#newcode561 chrome/browser/download/save_page_browsertest.cc:561: // Disabled ...
5 years, 2 months ago (2015-10-20 21:08:43 UTC) #5
Łukasz Anforowicz
On 2015/10/15 18:57:06, rdsmith wrote: > (I feel competent to review the overall test structure, ...
5 years, 2 months ago (2015-10-20 21:10:22 UTC) #6
Randy Smith (Not in Mondays)
On 2015/10/20 21:08:43, Łukasz Anforowicz wrote: > Randy, can you take another look please? > ...
5 years, 2 months ago (2015-10-21 17:27:16 UTC) #7
Łukasz Anforowicz
Daniel, could you please take a look? (mostly at third_party/WebKit/Source/web/WebPageSerializerImpl.cpp, but obviously feedback for the ...
5 years, 2 months ago (2015-10-21 17:35:35 UTC) #9
Łukasz Anforowicz
On 2015/10/15 18:57:06, rdsmith wrote: > (I feel competent to review the overall test structure, ...
5 years, 2 months ago (2015-10-21 17:40:29 UTC) #11
ncarter (slow)
lgtm
5 years, 2 months ago (2015-10-21 19:59:53 UTC) #12
dcheng
Also, you might need to land this manually with git cl land. I recall strange ...
5 years, 2 months ago (2015-10-21 20:09:29 UTC) #13
ncarter (slow)
https://codereview.chromium.org/1407663004/diff/80001/third_party/WebKit/Source/web/WebPageSerializerImpl.cpp File third_party/WebKit/Source/web/WebPageSerializerImpl.cpp (right): https://codereview.chromium.org/1407663004/diff/80001/third_party/WebKit/Source/web/WebPageSerializerImpl.cpp#newcode133 third_party/WebKit/Source/web/WebPageSerializerImpl.cpp:133: if (isHTMLMetaElement(*element)) { It looks like it's designed to ...
5 years, 2 months ago (2015-10-21 20:17:58 UTC) #14
Łukasz Anforowicz
I've replied to CR questions. I think the current CL should be put on hold ...
5 years, 2 months ago (2015-10-23 21:42:23 UTC) #15
dcheng
Blink changes seem OK, but given your reply, I'll defer from L-G-T-M until you think ...
5 years, 1 month ago (2015-10-26 17:01:31 UTC) #16
Łukasz Anforowicz
jsbell@ + kojii@ - could you please take a look? In particular: 1. Please shout ...
5 years ago (2015-12-02 22:04:36 UTC) #23
jsbell
Quick nit about the CL description: it uses "multi-byte encodings" in a non-standard way. That ...
5 years ago (2015-12-02 23:26:10 UTC) #24
Łukasz Anforowicz
On 2015/12/02 23:26:10, jsbell wrote: > Quick nit about the CL description: it uses "multi-byte ...
5 years ago (2015-12-02 23:46:15 UTC) #31
jsbell
+jshin@ Thanks for the detailed CL description, BTW. It's answered most of the questions I've ...
5 years ago (2015-12-03 00:09:47 UTC) #32
kojii
I agree with jsbell@, in general, it's fine to always save in utf-8 as long ...
5 years ago (2015-12-03 03:41:45 UTC) #33
kojii
One more random thought, since IIUC what we want is not to save in UTF-8, ...
5 years ago (2015-12-04 05:08:59 UTC) #34
jsbell
On 2015/12/04 05:08:59, kojii wrote: > One more random thought, since IIUC what we want ...
5 years ago (2015-12-04 17:55:56 UTC) #35
Łukasz Anforowicz
jsbell@ / kojii@, thank you for your feedback. Can you please take a look at ...
5 years ago (2015-12-04 18:20:38 UTC) #36
blink-reviews
The reason why I am looking at save-page-as behavior for save-page-as is that some time ...
5 years ago (2015-12-04 18:24:53 UTC) #37
chromium-reviews
The reason why I am looking at save-page-as behavior for save-page-as is that some time ...
5 years ago (2015-12-04 18:24:54 UTC) #38
jsbell
On 2015/12/04 18:20:38, Łukasz Anforowicz wrote: > > If we serialize a BOM for UTF-16 ...
5 years ago (2015-12-04 23:12:01 UTC) #39
kojii
Sorry, I missed the discussion point in my previous reply. On 2015/12/04 at 23:12:01, jsbell ...
5 years ago (2015-12-07 02:42:37 UTC) #40
Łukasz Anforowicz
jsbell@, could you please take another look? Things that are ready for review: - API ...
5 years ago (2015-12-07 22:16:04 UTC) #42
Łukasz Anforowicz
On 2015/12/04 23:12:01, jsbell wrote: > I don't know where the extra character is coming ...
5 years ago (2015-12-07 22:19:48 UTC) #43
jsbell
What's the status of those extra 0x0A bytes you were seeing? A result of temp ...
5 years ago (2015-12-07 22:53:51 UTC) #44
Łukasz Anforowicz
jsbell@, can you take another look please? Thank you for the previous round of CR ...
5 years ago (2015-12-07 23:32:36 UTC) #46
jsbell
I like the direction this CL is trending. :) Overall lgtm with nits, but you'll ...
5 years ago (2015-12-08 01:19:08 UTC) #47
jsbell
On 2015/12/07 23:32:36, Łukasz Anforowicz wrote: > It turned out that this extra character was ...
5 years ago (2015-12-08 01:20:17 UTC) #48
dcheng
third_party/WebKit LGTM, but you'll need an owner in wtf (+tkent for that review)
5 years ago (2015-12-08 01:33:35 UTC) #50
dcheng
On 2015/12/08 at 01:33:35, dcheng wrote: > third_party/WebKit LGTM, but you'll need an owner in ...
5 years ago (2015-12-08 01:33:59 UTC) #51
blink-reviews
On Monday, December 7, 2015, <dcheng@chromium.org> wrote: > On 2015/12/08 at 01:33:35, dcheng wrote: > ...
5 years ago (2015-12-08 03:53:31 UTC) #52
chromium-reviews
On Monday, December 7, 2015, <dcheng@chromium.org> wrote: > On 2015/12/08 at 01:33:35, dcheng wrote: > ...
5 years ago (2015-12-08 03:53:32 UTC) #53
tkent
wtf lgtm
5 years ago (2015-12-08 06:20:09 UTC) #54
Łukasz Anforowicz
jsbell@, could you please take another look? (I've tried to address your feedback in the ...
5 years ago (2015-12-09 00:50:59 UTC) #55
jsbell
still lgtm - apart from the \u#### usage, alas. https://codereview.chromium.org/1407663004/diff/230001/chrome/browser/download/save_page_browsertest.cc File chrome/browser/download/save_page_browsertest.cc (right): https://codereview.chromium.org/1407663004/diff/230001/chrome/browser/download/save_page_browsertest.cc#newcode1100 chrome/browser/download/save_page_browsertest.cc:1100: ...
5 years ago (2015-12-09 01:02:47 UTC) #56
Łukasz Anforowicz
jsbell@, thanks for the review. BTW: the separate CL for the test files is @ ...
5 years ago (2015-12-09 01:27:36 UTC) #57
jungshik at Google
On 2015/12/09 01:27:36, Łukasz Anforowicz wrote: > jsbell@, thanks for the review. > > BTW: ...
5 years ago (2015-12-10 00:47:09 UTC) #59
commit-bot: I haz the power
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1407663004/310001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1407663004/310001
5 years ago (2015-12-10 01:13:34 UTC) #61
commit-bot: I haz the power
Dry run: This issue passed the CQ dry run.
5 years ago (2015-12-10 06:40:58 UTC) #63
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1407663004/310001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1407663004/310001
5 years ago (2015-12-10 17:56:56 UTC) #66
commit-bot: I haz the power
Committed patchset #16 (id:310001)
5 years ago (2015-12-10 18:06:19 UTC) #68
commit-bot: I haz the power
5 years ago (2015-12-10 18:07:31 UTC) #70
Message was sent while issue was closed.
Patchset 16 (id:??) landed as
https://crrev.com/36f0d157b6d2c6fae627a3aba955ad4e0fe07d71
Cr-Commit-Position: refs/heads/master@{#364394}

Powered by Google App Engine
This is Rietveld 408576698