[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::createAlwaysUseUTF8ForText() and
AlwaysUseUTF8ForText that corresponds to utf-8 decode in the spec [2].
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
Committed: https://crrev.com/2198aba1db661a195eb6f191274c1595c3bc6867
Cr-Commit-Position: refs/heads/master@{#364978}
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/1 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/1
Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/144517)
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/40001
Description was changed from
==========
[Fetch] Always use utf-8 for decoding in text()
BUG=557510
==========
to
==========
[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::CheckForOnlyUTF8BOM option, which does not
check for utf-16/32 BOMs but checks utf-8 BOMs, so that
TextResourceDecoder::create("text/plain", UTF8Encoding(), false,
CheckForOnlyUTF8BOM)
corresponds to utf-8 decode in the spec [2],
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
==========
hiroshige
The CQ bit was checked by hiroshige@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/60001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/60001
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/100001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/100001
5 years, 1 month ago
(2015-11-23 14:32:33 UTC)
#12
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/120001
5 years, 1 month ago
(2015-11-23 14:34:10 UTC)
#14
5 years, 1 month ago
(2015-11-23 18:33:39 UTC)
#18
PTAL.
yhirano
https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp File third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp (right): https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp#newcode197 third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp:197: // We let it override even a user-chosen encoding. ...
5 years, 1 month ago
(2015-11-24 02:06:40 UTC)
#19
https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp (right):
https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp:197: // We
let it override even a user-chosen encoding.
This is the problem, right?
Rather than introducing a new parameter, I think it is good to expand
|usesEncodingDetector| constructor parameter so that a user can choose one of
- UseAutoDetection (<= usesEncodingDetector == true)
- UseBOMBasedDetectionOnly (<= usesEncodingDetector == false, default value)
- DoNotUseAutoDetection (<= Response.text())
. What do you think?
hiroshige
The CQ bit was checked by hiroshige@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/130001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/130001
https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp File third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp (right): https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp#newcode197 third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp:197: // We let it override even a user-chosen encoding. ...
https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp (right):
https://codereview.chromium.org/1470893002/diff/120001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp:197: // We
let it override even a user-chosen encoding.
How about Patch Set 8:
|usesEncodingDetector| can be:
- UseAllAutoDetection (<= usesEncodingDetector == true)
- UseContentAndBOMBasedDetection (<= usesEncodingDetector == false, default
value)
- AlwaysUseUTF8ForText (<= Response.text())
and when it is AlwaysUseUTF8ForText, |defaultEncoding| should be utf-8, and
|mimeType| should be "text/plain".
This is not orthogonal, but implementing DoNotUseAutoDetection for all encodings
and non-text contents complicates the code:
- Even when we don't detect encoding, we should handle utf-8 BOMs in
AlwaysUseUTF8ForText, i.e. we should skip utf-8 BOMs (but not other BOMs such as
utf-16be BOMs) when we decode the content as utf-8.
Supporting this BOM skipping for utf-16/32 makes code more complicated.
- In non-text contents (i.e. CSS/XML/HTML), TextResourceDecoder handles
encodings explicitly specified in those files (e.g. <meta> tag) and its
detection logic is built into other logics in the decoder.
Disabling such logic makes code and state management of TextResourceDecoder
more complicated.
- And anyway we don't have such use cases.
commit-bot: I haz the power
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/150001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/150001
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/150001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/150001
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/170001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/170001
Description was changed from
==========
[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::CheckForOnlyUTF8BOM option, which does not
check for utf-16/32 BOMs but checks utf-8 BOMs, so that
TextResourceDecoder::create("text/plain", UTF8Encoding(), false,
CheckForOnlyUTF8BOM)
corresponds to utf-8 decode in the spec [2],
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
==========
to
==========
[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::createAlwaysUseUTF8ForText() and
AlwaysUseUTF8ForText that corresponds to utf-8 decode in the spec [2].
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
==========
Also updated CL description.
https://codereview.chromium.org/1470893002/diff/150001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp (right):
https://codereview.chromium.org/1470893002/diff/150001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp:128:
ASSERT(!(m_encodingDetectionOption == AlwaysUseUTF8ForText && (m_contentType !=
PlainTextContent || m_encoding != UTF8Encoding())));
On 2015/12/14 02:00:22, kouhei wrote:
> Optional nit: Would you split this ASSERT stmt?
> if (m_encodingDetectionOption == AlwaysUseUTF8ForText)
> ASSERT(m_contentType != PlainTextContent || m_encoding != UTF8Encoding());
Done.
https://codereview.chromium.org/1470893002/diff/150001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/html/parser/TextResourceDecoder.cpp:214: if
(m_encodingDetectionOption != AlwaysUseUTF8ForText && c1 == 0xFF && c2 == 0xFE)
{
On 2015/12/14 02:00:22, kouhei wrote:
> Can we reorder the if stmts here?
>
> if (c1 == 0xEF && c2 == 0xBB && c3 == 0xBF) {
>
> } else if (m_encodingDetectionOption != AlwaysUseUTF8ForText) {
> if (c1 == 0xFF && c2 == 0xFE) {
>
> } else if (c1 == 0xFE && c2 == 0xFF) {
>
> } else if (...) {
>
> }
> }
Done.
https://codereview.chromium.org/1470893002/diff/150001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h (right):
https://codereview.chromium.org/1470893002/diff/150001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h:49: static
PassOwnPtr<TextResourceDecoder> create(const String& mimeType, const
WTF::TextEncoding& defaultEncoding = WTF::TextEncoding(), bool
usesEncodingDetector = false)
On 2015/12/14 02:00:22, kouhei wrote:
> Most of the create() methods use exactly same params as c-tors.
> Would you change "bool usesEncodingDetector" to EncodingDecodingOption?
I'd like to enforce mimeType == "plain/text" && defaultEncoding ==
UTF8Enconding() if EncodingDecodingOption is AlwaysUseUTF8ForText, so
AlwaysUseUTF8ForText shouldn't specified from create().
Changing the last param to EncodingDetectionOption and placing RELEASE_ASSERT or
ASSERT(encodingDetectionOption != AlwaysUseUTF8ForText) would also work (but I
slightly prefer the current Patch Set).
How do you think?
lgtm
>
https://codereview.chromium.org/1470893002/diff/150001/third_party/WebKit/Sou...
> File third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h (right):
>
>
https://codereview.chromium.org/1470893002/diff/150001/third_party/WebKit/Sou...
> third_party/WebKit/Source/core/html/parser/TextResourceDecoder.h:49: static
> PassOwnPtr<TextResourceDecoder> create(const String& mimeType, const
> WTF::TextEncoding& defaultEncoding = WTF::TextEncoding(), bool
> usesEncodingDetector = false)
> On 2015/12/14 02:00:22, kouhei wrote:
> > Most of the create() methods use exactly same params as c-tors.
> > Would you change "bool usesEncodingDetector" to EncodingDecodingOption?
>
> I'd like to enforce mimeType == "plain/text" && defaultEncoding ==
> UTF8Enconding() if EncodingDecodingOption is AlwaysUseUTF8ForText, so
> AlwaysUseUTF8ForText shouldn't specified from create().
> Changing the last param to EncodingDetectionOption and placing RELEASE_ASSERT
or
> ASSERT(encodingDetectionOption != AlwaysUseUTF8ForText) would also work (but I
> slightly prefer the current Patch Set).
> How do you think?
ok
hiroshige
The CQ bit was unchecked by hiroshige@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1470893002/170001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1470893002/170001
Description was changed from
==========
[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::createAlwaysUseUTF8ForText() and
AlwaysUseUTF8ForText that corresponds to utf-8 decode in the spec [2].
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
==========
to
==========
[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::createAlwaysUseUTF8ForText() and
AlwaysUseUTF8ForText that corresponds to utf-8 decode in the spec [2].
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
==========
Description was changed from
==========
[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::createAlwaysUseUTF8ForText() and
AlwaysUseUTF8ForText that corresponds to utf-8 decode in the spec [2].
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
==========
to
==========
[Fetch] Always use utf-8 for decoding in text()
In the spec, Response.text() should run utf-8 decode [1][2], i.e. always use
utf-8. But in the implementation, TextResourceDecoder used by Response.text()
decodes the input as utf-16/32 if it contains utf-16/32 BOMs.
This CL
- Adds TextResourceDecoder::createAlwaysUseUTF8ForText() and
AlwaysUseUTF8ForText that corresponds to utf-8 decode in the spec [2].
- Fixes Response.text() by using CheckForOnlyUTF8BOM in FetchDataLoaderAsString.
- Adds layout tests for Fetch API and XHRs
(but doesn't change the behavior of XHRs).
[1] https://fetch.spec.whatwg.org/#concept-body-package-data
[2] https://encoding.spec.whatwg.org/#utf-8-decode
BUG=557510
Committed: https://crrev.com/2198aba1db661a195eb6f191274c1595c3bc6867
Cr-Commit-Position: refs/heads/master@{#364978}
==========
commit-bot: I haz the power
Patchset 10 (id:??) landed as https://crrev.com/2198aba1db661a195eb6f191274c1595c3bc6867 Cr-Commit-Position: refs/heads/master@{#364978}
Issue 1470893002: [Fetch] Always use utf-8 for decoding in text()
(Closed)
Created 5 years, 1 month ago by hiroshige
Modified 5 years ago
Reviewers: yhirano, kouhei (in TOK)
Base URL: https://chromium.googlesource.com/chromium/src.git@master
Comments: 9