|
|
Chromium Code Reviews|
Created:
4 years, 7 months ago by Peng Xinchao Modified:
3 years, 6 months ago CC:
cbentzel+watch_chromium.org, chromium-reviews, Nate Chapin, Mike West Base URL:
https://chromium.googlesource.com/chromium/src.git@master Target Ref:
refs/pending/heads/master Project:
chromium Visibility:
Public. |
DescriptionRemove default charset of dataURL request
It is a special case.
DataURL is not include chareset field ,data of DataURL is HTML and it include '<meta charset="UTF-8">'.
so data of DataURL should UTF-8 charset .
But because the charset of DataURL is setted by default charset 'US-ASCII', it lead to that
m_source of TextResourceDecoder::checkForMetaCharset function is EncodingFromHTTPHeader, it return directly and can not continue to check meta tag.
BUG=612104
Signed-off-by: Peng Xinchao <pxinchao@gmail.com>
Patch Set 1 #
Total comments: 1
Patch Set 2 : Add testcase #Patch Set 3 : for RFC #
Messages
Total messages: 28 (11 generated)
Description was changed from ========== Remove default charset of dataURL request It is a special case. DataURL is not include chareset field ,data of DataURL is HTML and it include '<meta charset="UTF-8">'. so data of DataURL should UTF-8 charset . But because the charset of DataURL is setted by default charset 'US-ASCII', it lead to that m_source of TextResourceDecoder::checkForMetaCharset function is EncodingFromHTTPHeader, it return directly and can not continue to check meta tag. BUG=612104 Signed-off-by: Peng Xinchao <pxinchao@gmail.com> ========== to ========== Remove default charset of dataURL request It is a special case. DataURL is not include chareset field ,data of DataURL is HTML and it include '<meta charset="UTF-8">'. so data of DataURL should UTF-8 charset . But because the charset of DataURL is setted by default charset 'US-ASCII', it lead to that m_source of TextResourceDecoder::checkForMetaCharset function is EncodingFromHTTPHeader, it return directly and can not continue to check meta tag. BUG=612104 Signed-off-by: Peng Xinchao <pxinchao@gmail.com> ==========
pxinchao@gmail.com changed reviewers: + davve@opera.com, rdevlin.cronin@chromium.org
pxinchao@gmail.com changed reviewers: + kavvaru@chromium.org
davve@opera.com changed reviewers: + eroman@chromium.org, mmenke@chromium.org
It looks like you're on the track of something here but I'm not a good reviewer for this change. +eroman and +mmenke.
mmenke@chromium.org changed reviewers: + asanka@chromium.org
[+asanka], who knows more about mime types than I do. Seems weird to me that all of this is handled differently than it is for HTTP, but the HTTP and data URL specs are pretty clear about default mime types, at least, if not default charsets. https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc File net/base/data_url.cc (left): https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc#oldcode74 net/base/data_url.cc:74: mime_type->assign("text/plain"); Per RFC2397, we should certainly be setting charset to "US-ASCII" if media type is empty (It doesn't specifically exclude the case media type is empty but charset is not, but I'm certainly fine keeping the charset if we get one).
On 2016/05/17 14:00:28, David Vest wrote: > It looks like you're on the track of something here but I'm not a good reviewer > for this change. +eroman and +mmenke. I'm also not a good reviewer for net changes. :)
On 2016/05/17 15:08:56, mmenke wrote: > [+asanka], who knows more about mime types than I do. Seems weird to me that > all of this is handled differently than it is for HTTP, but the HTTP and data > URL specs are pretty clear about default mime types, at least, if not default > charsets. > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc > File net/base/data_url.cc (left): > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc#oldcode74 > net/base/data_url.cc:74: mime_type->assign("text/plain"); > Per RFC2397, we should certainly be setting charset to "US-ASCII" if media type > is empty (It doesn't specifically exclude the case media type is empty but > charset is not, but I'm certainly fine keeping the charset if we get one). If data_url.cc setted default charset , blink don't have method to know it is default charset or custom charset . So i think it is not necessary to set default charset . The default charset should be setted by Blink . And Blink have done it . const WTF::TextEncoding& TextResourceDecoder::defaultEncoding(ContentType contentType, const WTF::TextEncoding& specifiedDefaultEncoding) { // Despite 8.5 "Text/xml with Omitted Charset" of RFC 3023, we assume UTF-8 instead of US-ASCII // for text/xml. This matches Firefox. if (contentType == XMLContent) return UTF8Encoding(); if (!specifiedDefaultEncoding.isValid()) return Latin1Encoding(); return specifiedDefaultEncoding; }
On 2016/05/18 02:01:08, pxinchao wrote: > On 2016/05/17 15:08:56, mmenke wrote: > > [+asanka], who knows more about mime types than I do. Seems weird to me that > > all of this is handled differently than it is for HTTP, but the HTTP and data > > URL specs are pretty clear about default mime types, at least, if not default > > charsets. > > > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc > > File net/base/data_url.cc (left): > > > > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc#oldcode74 > > net/base/data_url.cc:74: mime_type->assign("text/plain"); > > Per RFC2397, we should certainly be setting charset to "US-ASCII" if media > type > > is empty (It doesn't specifically exclude the case media type is empty but > > charset is not, but I'm certainly fine keeping the charset if we get one). > > If data_url.cc setted default charset , blink don't have method to know it is > default charset or custom charset . > So i think it is not necessary to set default charset . > The default charset should be setted by Blink . And Blink have done it . The data URL RFC is quite explicit in this case. It's different from the HTML RFC on the default mime type. Note that the blink code seems to be unaware of scheme, but the RFCs make it quite clear the default is different for different schemes. > const WTF::TextEncoding& TextResourceDecoder::defaultEncoding(ContentType > contentType, const WTF::TextEncoding& specifiedDefaultEncoding) > { > // Despite 8.5 "Text/xml with Omitted Charset" of RFC 3023, we assume UTF-8 > instead of US-ASCII > // for text/xml. This matches Firefox. > if (contentType == XMLContent) > return UTF8Encoding(); > if (!specifiedDefaultEncoding.isValid()) > return Latin1Encoding(); > return specifiedDefaultEncoding; > }
On 2016/05/18 03:56:50, mmenke wrote: > On 2016/05/18 02:01:08, pxinchao wrote: > > On 2016/05/17 15:08:56, mmenke wrote: > > > [+asanka], who knows more about mime types than I do. Seems weird to me > that > > > all of this is handled differently than it is for HTTP, but the HTTP and > data > > > URL specs are pretty clear about default mime types, at least, if not > default > > > charsets. > > > > > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc > > > File net/base/data_url.cc (left): > > > > > > > > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc#oldcode74 > > > net/base/data_url.cc:74: mime_type->assign("text/plain"); > > > Per RFC2397, we should certainly be setting charset to "US-ASCII" if media > > type > > > is empty (It doesn't specifically exclude the case media type is empty but > > > charset is not, but I'm certainly fine keeping the charset if we get one). > > > > If data_url.cc setted default charset , blink don't have method to know it is > > default charset or custom charset . > > So i think it is not necessary to set default charset . > > The default charset should be setted by Blink . And Blink have done it . > > The data URL RFC is quite explicit in this case. It's different from the HTML > RFC on the default mime type. Note that the blink code seems to be unaware of > scheme, but the RFCs make it quite clear the default is different for different > schemes. Err..default charset, rather. Data URLs default mime type to text/plain and US-ASCII, if no mime type is given. It leaves ambiguous what to do if there's a mime type but no charset. HTTP defaults to text/html and charset to ISO-8859-1. > > > const WTF::TextEncoding& TextResourceDecoder::defaultEncoding(ContentType > > contentType, const WTF::TextEncoding& specifiedDefaultEncoding) > > { > > // Despite 8.5 "Text/xml with Omitted Charset" of RFC 3023, we assume > UTF-8 > > instead of US-ASCII > > // for text/xml. This matches Firefox. > > if (contentType == XMLContent) > > return UTF8Encoding(); > > if (!specifiedDefaultEncoding.isValid()) > > return Latin1Encoding(); > > return specifiedDefaultEncoding; > > }
On 2016/05/18 04:03:20, mmenke wrote: > On 2016/05/18 03:56:50, mmenke wrote: > > On 2016/05/18 02:01:08, pxinchao wrote: > > > On 2016/05/17 15:08:56, mmenke wrote: > > > > [+asanka], who knows more about mime types than I do. Seems weird to me > > that > > > > all of this is handled differently than it is for HTTP, but the HTTP and > > data > > > > URL specs are pretty clear about default mime types, at least, if not > > default > > > > charsets. > > > > > > > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc > > > > File net/base/data_url.cc (left): > > > > > > > > > > > > > > https://codereview.chromium.org/1986053003/diff/1/net/base/data_url.cc#oldcode74 > > > > net/base/data_url.cc:74: mime_type->assign("text/plain"); > > > > Per RFC2397, we should certainly be setting charset to "US-ASCII" if media > > > type > > > > is empty (It doesn't specifically exclude the case media type is empty but > > > > charset is not, but I'm certainly fine keeping the charset if we get one). > > > > > > If data_url.cc setted default charset , blink don't have method to know it > is > > > default charset or custom charset . > > > So i think it is not necessary to set default charset . > > > The default charset should be setted by Blink . And Blink have done it . > > > > The data URL RFC is quite explicit in this case. It's different from the HTML > > RFC on the default mime type. Note that the blink code seems to be unaware of > > scheme, but the RFCs make it quite clear the default is different for > different > > schemes. > > Err..default charset, rather. Data URLs default mime type to text/plain and > US-ASCII, if no mime type is given. It leaves ambiguous what to do if there's a > mime type but no charset. > > HTTP defaults to text/html and charset to ISO-8859-1. First the fault charset is US-ASCII, you are right . But i think it should not be setted in DataURL parse . When Data parsing , it should return the real context , it should not include default setting . If data without charset , it should set charset empty , not default . When other part need to use charset , if it is empty , it should be done with by defacut charset. For example HTTP reposene , if reponse without charset , it should be setted empty , not default charset . How about you ?
The CQ bit was checked by pxinchao@gmail.com
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1986053003/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1986053003/40001
The CQ bit was unchecked by commit-bot@chromium.org
No L-G-T-M from a valid reviewer yet. CQ run can only be started by full committers or once the patch has received an L-G-T-M from a full committer. Even if an L-G-T-M may have been provided, it was from a non-committer, _not_ a full super star committer. See http://www.chromium.org/getting-involved/become-a-committer Note that this has nothing to do with OWNERS files.
The CQ bit was checked by mmenke@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1986053003/40001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/1986053003/40001
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: The author pxinchao@gmail.com has not signed Google Contributor License Agreement. Please visit https://cla.developers.google.com to sign and manage CLA.
hi ,mmenke. How about the Path Set 3? Do you have idea?
On 2016/05/19 01:06:49, pxinchao wrote: > hi ,mmenke. > How about the Path Set 3? Do you have idea? I've been waiting for Asanka to chime in, who knows more about this stuff than I do. Asanka's currently pretty heavily loaded. I'd tend to say "Without a standards track spec or security issue, we should not be implicitly extending the data URL spec to have more guarantees, which other browsers may or may not support." If there were a spec on behavior here, I'd certainly be fine with changing behavior. I wanted to give Asanka a chance to weigh in without biasing his initial thoughts on the topic.
Sorry about the delay. There are several rules at play here and I believe the code in the tree is [mostly] correct. As mmenke pointed out, if a media type is specified and the charset is not, then the default charset is dependent on the scheme. This is called out in RFC 1866 sec 4.1. For HTTP, text/html defaults to iso-8859-1. As for data: URIs, RFC 2397 states (paraphrasing): if <mediatype> is empty, then assume it is "text/plain;charset=US-ASCII" if <mediatype> only contains a "charset" value, then assume the full type is "text/plain;charset=<value>" No mention of text/html, but in general, this RFC 2397 follows RFC 1521 and RFC 822 in using a default of "text/plain; charset=us-ascii". Hence, "charset=us-ascii" is a reasonable default for text/html. 'charset' parameters don't make sense for all media types, and data_url.cc should really be setting it only if the effective media type is a text subtype without an explicit charset. Resolving this doesn't help with issue 612104. The effective charset can be specified via several means for a data URI containing text/html: data URI <mediatype>, BOM, and <meta charset=...> OR <meta http-equiv="content-type"...>. If more than one mechanism is being used, they must all match. If there's a mismatch (as in issue 612104 where the defaults for <mediatype> conflict with <meta charset>), the behavior is not specified other than considering it a document-conformance error. Given this, I'd say not lgtm for this data_url.cc change. The current behavior is reasonable and there's no evidence of a significant compatibility issue in handling of conformant documents. The data URI should specify a charset if it intends to have a specific charset. If a change was to be made to allow a character encoding declaration to override Content-Type metadata for non-conformant documents, then we could reconsider (by 'we' I mean Blink folks). But the latter would have to be justified as being compatible with other browsers.
pxinchao@gmail.com changed reviewers: + japhet@chromium.org, kouhei@chromium.org
On 2016/05/19 19:17:51, asanka wrote: > Sorry about the delay. > > There are several rules at play here and I believe the code in the tree is > [mostly] correct. > > As mmenke pointed out, if a media type is specified and the charset is not, then > the default charset is dependent on the scheme. This is called out in RFC 1866 > sec 4.1. For HTTP, text/html defaults to iso-8859-1. > > As for data: URIs, RFC 2397 states (paraphrasing): > > if <mediatype> is empty, then assume it is "text/plain;charset=US-ASCII" > if <mediatype> only contains a "charset" value, then assume the full type is > "text/plain;charset=<value>" > > No mention of text/html, but in general, this RFC 2397 follows RFC 1521 and RFC > 822 in using a default of "text/plain; charset=us-ascii". Hence, > "charset=us-ascii" is a reasonable default for text/html. 'charset' parameters > don't make sense for all media types, and data_url.cc should really be setting > it only if the effective media type is a text subtype without an explicit > charset. Resolving this doesn't help with issue 612104. > > The effective charset can be specified via several means for a data URI > containing text/html: data URI <mediatype>, BOM, and <meta charset=...> OR <meta > http-equiv="content-type"...>. If more than one mechanism is being used, they > must all match. If there's a mismatch (as in issue 612104 where the defaults for > <mediatype> conflict with <meta charset>), the behavior is not specified other > than considering it a document-conformance error. > > Given this, I'd say not lgtm for this data_url.cc change. The current behavior > is reasonable and there's no evidence of a significant compatibility issue in > handling of conformant documents. The data URI should specify a charset if it > intends to have a specific charset. > > If a change was to be made to allow a character encoding declaration to override > Content-Type metadata for non-conformant documents, then we could reconsider (by > 'we' I mean Blink folks). But the latter would have to be justified as being > compatible with other browsers. hi ,@japhet and @kouhei : How about the issue? Do you have idea? or do you have a plan to motify ? thank you
On 2016/05/20 02:00:41, pxinchao wrote: > On 2016/05/19 19:17:51, asanka wrote: > > Sorry about the delay. > > > > There are several rules at play here and I believe the code in the tree is > > [mostly] correct. > > > > As mmenke pointed out, if a media type is specified and the charset is not, > then > > the default charset is dependent on the scheme. This is called out in RFC 1866 > > sec 4.1. For HTTP, text/html defaults to iso-8859-1. > > > > As for data: URIs, RFC 2397 states (paraphrasing): > > > > if <mediatype> is empty, then assume it is "text/plain;charset=US-ASCII" > > if <mediatype> only contains a "charset" value, then assume the full type is > > "text/plain;charset=<value>" > > > > No mention of text/html, but in general, this RFC 2397 follows RFC 1521 and > RFC > > 822 in using a default of "text/plain; charset=us-ascii". Hence, > > "charset=us-ascii" is a reasonable default for text/html. 'charset' parameters > > don't make sense for all media types, and data_url.cc should really be setting > > it only if the effective media type is a text subtype without an explicit > > charset. Resolving this doesn't help with issue 612104. > > > > The effective charset can be specified via several means for a data URI > > containing text/html: data URI <mediatype>, BOM, and <meta charset=...> OR > <meta > > http-equiv="content-type"...>. If more than one mechanism is being used, they > > must all match. If there's a mismatch (as in issue 612104 where the defaults > for > > <mediatype> conflict with <meta charset>), the behavior is not specified other > > than considering it a document-conformance error. > > > > Given this, I'd say not lgtm for this data_url.cc change. The current behavior > > is reasonable and there's no evidence of a significant compatibility issue in > > handling of conformant documents. The data URI should specify a charset if it > > intends to have a specific charset. > > > > If a change was to be made to allow a character encoding declaration to > override > > Content-Type metadata for non-conformant documents, then we could reconsider > (by > > 'we' I mean Blink folks). But the latter would have to be justified as being > > compatible with other browsers. > hi ,@japhet and @kouhei : > How about the issue? Do you have idea? or do you have a plan to motify ? > thank you not lgtm from me too. I think it makes sense to keep the current behavior as is.
mmenke@chromium.org changed reviewers: - mmenke@chromium.org
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
