Chromium Code Reviews

Issue 1890103002: Reland "UTF-8 detector for pages missing encoding info" (Closed)

Created:
4 years, 8 months ago by Jinsuk Kim
Modified:
4 years, 8 months ago
CC:
blink-reviews, blink-reviews-html_chromium.org, blink-reviews-wtf_chromium.org, chromium-reviews, dglazkov+blink, jshin+watch_chromium.org, kinuko+watch, Mikhail, tyoshino+watch_chromium.org
Base URL:
https://chromium.googlesource.com/chromium/src.git@master
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

Reland "UTF-8 detector for pages missing encoding info"

TextResourceDecoder is designed (or used) in such a way that the text encoding of a document is resolved from the first chunk (up to 4096 bytes) of the text received from the network: by BOM, meta tag, or auto encoding detection (if enabled).

The newly introduced UTF-8 encoding detector (crrev.com/1721373002) was reverted (crbug.com/603558) because it worked in a slightly different way: it also examined all subsequent chunks in search of a non-ASCII UTF-8-encoded character sequence. This means it was possible for TextResourceDecoder to start with a codec for, say, windows-1252, and then later switch to one for UTF-8. Theoretically this should still work, but it doesn't in practice (perhaps because it has never been used or tested that way). This is what happened with the failed perf tests: one of the JS files was big (13K) and pure ASCII except for one tiny character sequence, \xc2\xa7, almost at the end.

The CL was updated so that UTF-8 encoding detection also works against the first chunk only, like the other methods, to avoid potential codec switching in the middle.

BUG=583549, 603558

Committed: https://crrev.com/57139d64c5b98142ca9305792f39ae23a4950375
Cr-Commit-Position: refs/heads/master@{#388927}
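The first-chunk-only behaviour described above can be illustrated with a minimal, standalone C++ sketch. This is not the code from the CL: `IsNonAsciiUtf8`, `Encoding`, and `FirstChunkDetector` are hypothetical names invented for illustration, and the real logic lives in TextResourceDecoder and wtf/text/UTF8. The sketch assumes that a chunk counts as UTF-8 only if it is entirely valid UTF-8 and contains at least one non-ASCII sequence, and it simply rejects a multi-byte sequence that is cut off at the end of the chunk.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not the CL's actual function): returns true if `data`
// is valid UTF-8 and contains at least one non-ASCII sequence.
bool IsNonAsciiUtf8(const std::string& data) {
  bool saw_non_ascii = false;
  const std::size_t n = data.size();
  std::size_t i = 0;
  while (i < n) {
    const std::uint8_t lead = static_cast<std::uint8_t>(data[i]);
    if (lead < 0x80) {  // Plain ASCII byte.
      ++i;
      continue;
    }
    std::size_t len = 0;
    std::uint32_t cp = 0;
    if (lead >= 0xC2 && lead <= 0xDF) {
      len = 2;
      cp = lead & 0x1F;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
      len = 3;
      cp = lead & 0x0F;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
      len = 4;
      cp = lead & 0x07;
    } else {
      return false;  // Stray continuation byte or invalid lead byte.
    }
    if (i + len > n)
      return false;  // Simplification: a truncated sequence is rejected.
    for (std::size_t k = 1; k < len; ++k) {
      const std::uint8_t c = static_cast<std::uint8_t>(data[i + k]);
      if ((c & 0xC0) != 0x80)
        return false;  // Not a continuation byte.
      cp = (cp << 6) | (c & 0x3F);
    }
    // Reject overlong forms, surrogates, and values beyond U+10FFFF.
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
      return false;
    }
    saw_non_ascii = true;
    i += len;
  }
  return saw_non_ascii;
}

// Hypothetical stand-ins for the decoder's encoding state.
enum class Encoding { kDefault, kUtf8 };

// Decides the encoding from the first chunk only; later chunks never change
// the decision, so the codec cannot switch mid-stream.
class FirstChunkDetector {
 public:
  Encoding Feed(const std::string& chunk) {
    if (!decided_) {
      encoding_ = IsNonAsciiUtf8(chunk) ? Encoding::kUtf8 : Encoding::kDefault;
      decided_ = true;
    }
    return encoding_;
  }

 private:
  bool decided_ = false;
  Encoding encoding_ = Encoding::kDefault;
};
```

Under these assumptions, a resource like the JS file from the perf test (pure ASCII in the first chunk, with \xc2\xa7 appearing only in a later chunk) keeps the default encoding for its whole lifetime, which is the trade-off the relanded CL accepts in order to avoid switching codecs.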

Patch Set 1 #

Patch Set 2 : add tests #

Total comments: 4

Patch Set 3 : #

Stats (+209 lines, -60 lines)
A + third_party/WebKit/LayoutTests/fast/encoding/unlabelled-non-ascii-utf8.html: 2 chunks, +10 lines, -11 lines, 0 comments
A + third_party/WebKit/LayoutTests/fast/encoding/unlabelled-non-ascii-utf8-expected.html: 2 chunks, +11 lines, -11 lines, 0 comments
M third_party/WebKit/Source/core/core.gypi: 1 chunk, +1 line, -0 lines, 0 comments
M third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h: 2 chunks, +9 lines, -6 lines, 0 comments
M third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp: 4 chunks, +43 lines, -26 lines, 0 comments
A third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp: 1 chunk, +38 lines, -0 lines, 0 comments
M third_party/WebKit/Source/core/xmlhttprequest/XMLHttpRequest.cpp: 1 chunk, +5 lines, -2 lines, 0 comments
M third_party/WebKit/Source/platform/text/TextEncodingDetector.h: 1 chunk, +3 lines, -3 lines, 0 comments
M third_party/WebKit/Source/platform/text/TextEncodingDetector.cpp: 1 chunk, +1 line, -1 line, 0 comments
M third_party/WebKit/Source/wtf/text/UTF8.h: 1 chunk, +6 lines, -0 lines, 0 comments
M third_party/WebKit/Source/wtf/text/UTF8.cpp: 1 chunk, +18 lines, -0 lines, 0 comments
A third_party/WebKit/Source/wtf/text/UTF8Test.cpp: 1 chunk, +63 lines, -0 lines, 0 comments
M third_party/WebKit/Source/wtf/wtf.gypi: 1 chunk, +1 line, -0 lines, 0 comments

Messages

Total messages: 18 (6 generated)
Jinsuk Kim
The ideal fix would be to allow TextResourceDecoder to switch codecs but first, this is ...
4 years, 8 months ago (2016-04-15 12:37:26 UTC) #1
jungshik at Google
On 2016/04/15 12:37:26, Jinsuk wrote: > The ideal fix would be to allow TextResourceDecoder to ...
4 years, 8 months ago (2016-04-15 19:10:36 UTC) #2
Jinsuk Kim
On 2016/04/15 19:10:36, jshin (ooo Fri aka jungshik) wrote: > On 2016/04/15 12:37:26, Jinsuk wrote: ...
4 years, 8 months ago (2016-04-15 19:20:40 UTC) #3
Jinsuk Kim
jshin@, tkent@: could you take a look? This is a relanding CL that was reverted. And ...
4 years, 8 months ago (2016-04-19 22:59:28 UTC) #6
tkent
This CL should have a test for crbug.com/603558.
4 years, 8 months ago (2016-04-20 00:07:49 UTC) #7
Jinsuk Kim
On 2016/04/20 00:07:49, tkent wrote: > This CL should have a test for crbug.com/603558. Done. ...
4 years, 8 months ago (2016-04-20 02:08:39 UTC) #8
tkent
lgtm
https://codereview.chromium.org/1890103002/diff/20001/third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp
File third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp (right):
https://codereview.chromium.org/1890103002/diff/20001/third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp#newcode24
third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp:24: ASSERT_EQ(UTF8Encoding(), decoder->encoding());
probably this should be EXPECT_EQ because ...
4 years, 8 months ago (2016-04-20 06:37:47 UTC) #9
Jinsuk Kim
Thanks again Kent for the review.
https://codereview.chromium.org/1890103002/diff/20001/third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp
File third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp (right):
https://codereview.chromium.org/1890103002/diff/20001/third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp#newcode24
third_party/WebKit/Source/core/html/parser/TextResourceDecoderTest.cpp:24: ASSERT_EQ(UTF8Encoding(), decoder->encoding());
On ...
4 years, 8 months ago (2016-04-20 07:13:55 UTC) #10
jungshik at Google
LGTM ! Thank you for your patience.
4 years, 8 months ago (2016-04-21 18:31:49 UTC) #11
commit-bot: I haz the power
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1890103002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1890103002/40001
4 years, 8 months ago (2016-04-21 20:58:30 UTC) #14
commit-bot: I haz the power
Committed patchset #3 (id:40001)
4 years, 8 months ago (2016-04-21 22:33:36 UTC) #16
commit-bot: I haz the power
4 years, 8 months ago (2016-04-22 19:40:38 UTC) #18
Message was sent while issue was closed.
Patchset 3 (id:??) landed as
https://crrev.com/57139d64c5b98142ca9305792f39ae23a4950375
Cr-Commit-Position: refs/heads/master@{#388927}
