Issue 1180793002: Document attribute charset and its aliases to return UTF-8

Habib Virji

The CQ bit was checked by habib.virji@samsung.com to run a CQ dry run

5 years, 6 months ago (2015-06-11 09:24:30 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/1

5 years, 6 months ago (2015-06-11 09:24:56 UTC) #2

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years, 6 months ago (2015-06-11 10:38:45 UTC) #3

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

5 years, 6 months ago (2015-06-11 10:38:46 UTC) #4

haraken

Given that these are non-standard attributes, would you describe the behavior of other browsers in ...

5 years, 6 months ago (2015-06-11 11:08:35 UTC) #6

philipj_slow

In the description, please also describe the conditions under which these getters will return null, ...

5 years, 6 months ago (2015-06-11 11:42:54 UTC) #7

Habib Virji

The CQ bit was checked by habib.virji@samsung.com to run a CQ dry run

5 years, 6 months ago (2015-06-12 16:29:39 UTC) #8

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/20001

5 years, 6 months ago (2015-06-12 16:29:48 UTC) #9

Habib Virji

I have added a small description about what's supported. Let me know if more information ...

5 years, 6 months ago (2015-06-12 16:45:41 UTC) #10

Habib Virji

Thanks philip, updated test and also description of where it can return null. https://codereview.chromium.org/1180793002/diff/1/LayoutTests/fast/dom/document-attribute-js-null.html File ...

5 years, 6 months ago (2015-06-12 16:46:31 UTC) #11

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years, 6 months ago (2015-06-12 17:59:30 UTC) #12

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: linux_blink_rel on tryserver.blink (JOB_FAILED, http://build.chromium.org/p/tryserver.blink/builders/linux_blink_rel/builds/66410)

5 years, 6 months ago (2015-06-12 17:59:32 UTC) #13

philipj_slow

Hmm, so the changes to characterSet and inputEncoding make it clear that this perhaps is ...

5 years, 6 months ago (2015-06-12 21:45:14 UTC) #14

Habib Virji

I had a look in the Gecko code in regards to characterSet value, particularly looking ...

5 years, 6 months ago (2015-06-15 08:59:27 UTC) #15

Habib Virji

The CQ bit was checked by habib.virji@samsung.com to run a CQ dry run

5 years, 6 months ago (2015-06-15 11:28:54 UTC) #16

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/40001

5 years, 6 months ago (2015-06-15 11:28:59 UTC) #17

Habib Virji

The CQ bit was checked by habib.virji@samsung.com to run a CQ dry run

5 years, 6 months ago (2015-06-15 11:41:58 UTC) #19

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/60001

5 years, 6 months ago (2015-06-15 11:42:11 UTC) #20

Habib Virji

The CQ bit was checked by habib.virji@samsung.com to run a CQ dry run

5 years, 6 months ago (2015-06-15 11:47:50 UTC) #22

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/80001

5 years, 6 months ago (2015-06-15 11:48:01 UTC) #23

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years, 6 months ago (2015-06-15 12:57:49 UTC) #24

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

5 years, 6 months ago (2015-06-15 12:57:50 UTC) #25

Habib Virji

Thanks philip, it has been updated now to initialize m_encodingData by default to UTF-8. Since ...

5 years, 6 months ago (2015-06-15 12:58:32 UTC) #26

philipj_slow

https://codereview.chromium.org/1180793002/diff/80001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js File LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js (right): https://codereview.chromium.org/1180793002/diff/80001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js#newcode107 LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js:107: assertEquals("documentgetinputencoding02", "UTF-8".toLowerCase(), encodingName.toLowerCase()); Why not compare the encodings case-sensitively? ...

5 years, 6 months ago (2015-06-15 15:31:38 UTC) #27

Habib Virji

Thanks philip, replied to your comments below. https://codereview.chromium.org/1180793002/diff/80001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js File LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js (right): https://codereview.chromium.org/1180793002/diff/80001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js#newcode107 LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js:107: assertEquals("documentgetinputencoding02", "UTF-8".toLowerCase(), ...

5 years, 6 months ago (2015-06-15 16:14:55 UTC) #28

Habib Virji

@philip: do let me know if comments are okay or should work on any of ...

5 years, 6 months ago (2015-06-16 16:04:18 UTC) #29

philipj_slow

5 years, 6 months ago (2015-06-16 22:05:57 UTC) #30

So defaultCharset was trickier than I had hoped, but let's deal with that
separately.

https://codereview.chromium.org/1180793002/diff/80001/Source/core/dom/Documen...
File Source/core/dom/Document.cpp (right):

https://codereview.chromium.org/1180793002/diff/80001/Source/core/dom/Documen...
Source/core/dom/Document.cpp:1131: return String("UTF-8");
On 2015/06/15 16:14:55, Habib Virji wrote:
> |document.implementation.createHTMLDocument('').defaultCharset| will be null
> without this change as settings will be null. document-attribute-js-null.html
> without this string will be null and thus updated it to return correct value. 

This case is tricky, I'm unsure what to do. It's strange to return anything
other than the encoding that will actually be used, but in order to not depend
on the settings object we'd have to make the default encoding a part of
Document, and make createDocument() and createHTMLDocument() get the default
encoding from the creating context. However, if the same logic isn't used in
TextResourceDecoderBuilder::createDecoderInstance that would just be a fancy
lie...

Document.defaultCharset is non-standard so there's nothing to guide us here. I
think the best thing is probably to leave it out of this CL and only deal with
Document.characterSet and its many aliases here. When Document.defaultCharset is
the last remaining user of [TreatReturnedNullStringAs=Undefined] let's try
again.

https://codereview.chromium.org/1180793002/diff/80001/Source/core/dom/Documen...
Source/core/dom/Document.cpp:4220: m_encodingData = DocumentEncodingData();
On 2015/06/15 16:14:55, Habib Virji wrote:
> With the change in DocumentEncodingData it should not be.

OK, so would an ASSERT(newData.encoding().isValid()) here always pass? There are
a few callers to Document::setEncodingData and it's not obvious at first glance.

https://codereview.chromium.org/1180793002/diff/80001/Source/core/dom/Documen...
File Source/core/dom/DocumentEncodingData.cpp (right):

https://codereview.chromium.org/1180793002/diff/80001/Source/core/dom/Documen...
Source/core/dom/DocumentEncodingData.cpp:39: : m_encoding("UTF-8")
Could this use UTF8Encoding() instead of "UTF-8"?
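
The two points above (a default-constructed encoding of UTF-8, and an assertion in the setter) can be pictured with a minimal sketch. This is not Blink code: SketchEncoding, SketchEncodingData and SketchDocument are hypothetical stand-ins for TextEncoding, DocumentEncodingData and Document, and the assertion only mirrors the ASSERT(newData.encoding().isValid()) asked about in the review.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an encoding object: an empty name means "invalid".
class SketchEncoding {
public:
    SketchEncoding() = default;
    explicit SketchEncoding(std::string name) : m_name(std::move(name)) {}
    bool isValid() const { return !m_name.empty(); }
    const std::string& name() const { return m_name; }

private:
    std::string m_name;
};

// Hypothetical stand-in for DocumentEncodingData: default-constructs to UTF-8,
// so a freshly created document never starts with an invalid encoding.
class SketchEncodingData {
public:
    SketchEncodingData() : m_encoding("UTF-8") {}
    explicit SketchEncodingData(SketchEncoding encoding)
        : m_encoding(std::move(encoding)) {}
    const SketchEncoding& encoding() const { return m_encoding; }

private:
    SketchEncoding m_encoding;
};

// Hypothetical stand-in for Document.
class SketchDocument {
public:
    void setEncodingData(const SketchEncodingData& newData) {
        // Every caller is expected to pass a valid encoding.
        assert(newData.encoding().isValid());
        m_encodingData = newData;
    }

    // With a valid default there is no null case left for the getter.
    std::string characterSet() const { return m_encodingData.encoding().name(); }

private:
    SketchEncodingData m_encodingData;
};
```

Under these assumptions a default-constructed SketchDocument reports "UTF-8" from characterSet() until a caller sets another valid encoding. Whether the real callers of Document::setEncodingData always satisfy the assertion is exactly the open question in the comment above.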

philipj_slow

Actually it looks like Document.defaultCharset and Document.charset are the only two remaining uses of [TreatReturnedNullStringAs=Undefined].

5 years, 6 months ago (2015-06-16 22:14:52 UTC) #31

philipj_slow

philipj@opera.com changed reviewers: + jl@opera.com

5 years, 6 months ago (2015-06-16 22:16:48 UTC) #33

philipj_slow

haraken@, jl@, do you have any thoughts on what to do with Document.defaultCharset, and for ...

5 years, 6 months ago (2015-06-16 22:16:49 UTC) #34

haraken

On 2015/06/16 22:16:49, philipj wrote: > haraken@, jl@, do you have any thoughts on what ...

5 years, 6 months ago (2015-06-16 22:21:06 UTC) #35

philipj_slow

On 2015/06/16 22:21:06, haraken wrote: > On 2015/06/16 22:16:49, philipj wrote: > > haraken@, jl@, ...

5 years, 6 months ago (2015-06-16 22:43:52 UTC) #36

haraken

On 2015/06/16 22:43:52, philipj wrote: > On 2015/06/16 22:21:06, haraken wrote: > > On 2015/06/16 ...

5 years, 6 months ago (2015-06-16 23:00:56 UTC) #37

philipj_slow

5 years, 6 months ago (2015-06-17 07:22:20 UTC) #38

On 2015/06/16 23:00:56, haraken wrote:
> On 2015/06/16 22:43:52, philipj wrote:
> > On 2015/06/16 22:21:06, haraken wrote:
> > > On 2015/06/16 22:16:49, philipj wrote:
> > > > haraken@, jl@, do you have any thoughts on what to do with
> > > > Document.defaultCharset, and for that matter the rest of this CL? Usage is
> > > > pretty high so just removing it isn't so easy:
> > > > https://www.chromestatus.com/metrics/feature/timeline/popularity/428
> > >
> > > Just to clarify:
> > >
> > > > In firefox only characterSet is supported and it returns an
> > > > "utf-8". It does not implement charset and defaultCharset.
> > > >
> > > > In IE all characterSet, charset and defaultCharset are
> > > > supported.
> > >
> > > What is the behavior of IE?
> > >
> > > It is great that we can remove [TreatReturnedNullStringAs=undefined], but we
> > > don't need to (shouldn't) do that if it breaks compatibility.
> >
> > IE11:
> >
> > document.implementation.createDocument(null, null, null).characterSet => "utf-8"
> > document.implementation.createDocument(null, null, null).charset => "utf-8"
> > document.implementation.createDocument(null, null, null).defaultCharset => "windows-1252"
> > document.implementation.createDocument(null, null, null).inputEncoding => "UTF-8"
> >
> > The results are the same for document.implementation.createHTMLDocument('').*
> >
> > I'm not sure if there are any other cases where IE might return null or
> > undefined for these, but I don't know how to create a document which is more
> > "orphaned" than these.
>
> Doesn't that mean that this CL makes Blink more conformant to Firefox and IE?
> (If the change aligns with Firefox, IE and the spec, I think it would be
> reasonable to land it and see what happens even if the usage rate is high.)

Yes, bringing us closer to Firefox and IE is indeed the intention. The bit about
this CL that is troubling is defaultCharset for a newly created document where
we don't have access to settings.
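
To make the troubling case concrete, here is a minimal sketch of the two defaultCharset behaviors discussed in this thread. It is an illustration only: SketchSettings and both function names are hypothetical, and the fallback variant corresponds to the `return String("UTF-8");` approach that was later backed out of this CL.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-in for the settings object a document may or may not have.
struct SketchSettings {
    std::string defaultTextEncodingName;
};

// Behavior as described in the thread before the change: a document created via
// createDocument() or createHTMLDocument() has no settings, so the getter
// yields a null string.
std::optional<std::string> defaultCharsetNullable(const SketchSettings* settings) {
    if (!settings)
        return std::nullopt;
    return settings->defaultTextEncodingName;
}

// The fallback tried in an earlier patch set: never null, but the value may not
// match the encoding a decoder would actually use for the document.
std::string defaultCharsetWithFallback(const SketchSettings* settings) {
    if (!settings)
        return "UTF-8";
    return settings->defaultTextEncodingName;
}
```

With a null settings pointer the first function returns no value and the second returns "UTF-8". The second removes the null case, but only by reporting a value that nothing else is guaranteed to honor, which is the concern raised about TextResourceDecoderBuilder::createDecoderInstance.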

haraken

5 years, 6 months ago (2015-06-17 07:49:32 UTC) #39

On 2015/06/17 07:22:20, philipj wrote:
> Yes, bringing us closer to Firefox and IE is indeed the intention. The bit about
> this CL that is troubling is defaultCharset for a newly created document where
> we don't have access to settings.

Thanks, now I understand the point.

You're an expert on these compatibility issues, so I'll defer the decision to
you. There is no strong reason we must remove [TreatReturnedNullStringAs] from
the IDL compiler, so I'm fine either way from the perspective of the IDL
compiler. Compatibility should come first.

philipj_slow

Habib, my advice is to back out the defaultCharset change from this CL. The long-term ...

5 years, 6 months ago (2015-06-17 11:04:31 UTC) #40

Habib Virji

Thanks, I was looking to find a way of addressing a "default encoding for the ...

5 years, 6 months ago (2015-06-17 11:10:38 UTC) #41

philipj_slow

On 2015/06/17 11:10:38, Habib Virji wrote: > Thanks, I was looking to find a way ...

5 years, 6 months ago (2015-06-17 11:20:34 UTC) #42

philipj_slow

Please also update the title and description of this CL to be more accurate about ...

5 years, 6 months ago (2015-06-17 12:17:00 UTC) #43

Habib Virji

The CQ bit was checked by habib.virji@samsung.com to run a CQ dry run

5 years, 6 months ago (2015-06-17 15:37:55 UTC) #44

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/100001

5 years, 6 months ago (2015-06-17 15:38:14 UTC) #45

Habib Virji

Thanks philip, I have uploaded a patch without defaultCharset.

5 years, 6 months ago (2015-06-17 16:53:24 UTC) #46

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years, 6 months ago (2015-06-17 17:15:44 UTC) #47

commit-bot: I haz the power

Dry run: This issue passed the CQ dry run.

5 years, 6 months ago (2015-06-17 17:15:45 UTC) #48

philipj_slow

philipj@opera.com changed reviewers: + pdr@chromium.org

5 years, 6 months ago (2015-06-17 20:50:12 UTC) #49

philipj_slow

OK, this now LGTM, but since this does poke at some very central code I'd ...

5 years, 6 months ago (2015-06-17 20:50:14 UTC) #50

pdr.

pdr@chromium.org changed reviewers: + dominicc@chromium.org

5 years, 6 months ago (2015-06-17 20:55:52 UTC) #51

pdr.

On 2015/06/17 at 20:50:14, philipj wrote: > OK, this now LGTM, but since this does ...

5 years, 6 months ago (2015-06-17 20:56:36 UTC) #52

dominicc (has gone to gerrit)

On 2015/06/17 at 20:56:36, pdr wrote: > On 2015/06/17 at 20:50:14, philipj wrote: > > ...

5 years, 6 months ago (2015-06-22 01:23:00 UTC) #53

dominicc (has gone to gerrit)

Oops, and here are the comments. https://codereview.chromium.org/1180793002/diff/100001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js File LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js (right): https://codereview.chromium.org/1180793002/diff/100001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js#newcode107 LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js:107: assertEquals("documentgetinputencoding02", "UTF-8".toLowerCase(), encodingName.toLowerCase()); ...

5 years, 6 months ago (2015-06-22 01:23:31 UTC) #54

Habib Virji

https://codereview.chromium.org/1180793002/diff/100001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js File LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js (right): https://codereview.chromium.org/1180793002/diff/100001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js#newcode107 LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js:107: assertEquals("documentgetinputencoding02", "UTF-8".toLowerCase(), encodingName.toLowerCase()); The spec link I mentioned was ...

5 years, 6 months ago (2015-06-22 14:07:31 UTC) #55

philipj_slow

https://codereview.chromium.org/1180793002/diff/100001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js File LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js (right): https://codereview.chromium.org/1180793002/diff/100001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js#newcode107 LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js:107: assertEquals("documentgetinputencoding02", "UTF-8".toLowerCase(), encodingName.toLowerCase()); On 2015/06/22 14:07:30, Habib Virji wrote: ...

5 years, 6 months ago (2015-06-23 08:32:57 UTC) #56

Habib Virji

The CQ bit was checked by habib.virji@samsung.com to run a CQ dry run

5 years, 6 months ago (2015-06-23 13:57:53 UTC) #57

Habib Virji

The patchset sent to the CQ was uploaded after l-g-t-m from philipj@opera.com Link to the ...

5 years, 6 months ago (2015-06-23 13:57:54 UTC) #58

Habib Virji

https://codereview.chromium.org/1180793002/diff/120001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js File LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js (right): https://codereview.chromium.org/1180793002/diff/120001/LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js#newcode107 LayoutTests/dom/xhtml/level3/core/documentgetinputencoding02.js:107: assertEquals("documentgetinputencoding02", "utf-8", encodingName); On 2015/06/23 08:32:57, philipj (away until ...

5 years, 6 months ago (2015-06-23 13:58:09 UTC) #59

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/140001

5 years, 6 months ago (2015-06-23 13:58:45 UTC) #60

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

5 years, 6 months ago (2015-06-23 15:23:13 UTC) #61

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: mac_blink_rel on tryserver.blink (JOB_FAILED, http://build.chromium.org/p/tryserver.blink/builders/mac_blink_rel/builds/60311)

5 years, 6 months ago (2015-06-23 15:23:15 UTC) #62

Habib Virji

@philipj/dominicc: The failures on mac and win are unrelated to my changes. Are changes good ...

5 years, 6 months ago (2015-06-25 14:08:56 UTC) #63

philipj_slow

LGTM for me, but I'll give dominicc@ the final say.

5 years, 6 months ago (2015-06-26 13:05:23 UTC) #64

Habib Virji

On 2015/06/26 13:05:23, philipj wrote: > LGTM for me, but I'll give dominicc@ the final ...

5 years, 5 months ago (2015-07-08 08:51:02 UTC) #65

dominicc (has gone to gerrit)

On 2015/07/08 at 08:51:02, habib.virji wrote: > On 2015/06/26 13:05:23, philipj wrote: > > LGTM ...

5 years, 5 months ago (2015-07-09 04:13:18 UTC) #66

dominicc (has gone to gerrit)

The CQ bit was checked by dominicc@chromium.org

5 years, 5 months ago (2015-07-09 04:13:34 UTC) #67

commit-bot: I haz the power

CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1180793002/140001

5 years, 5 months ago (2015-07-09 04:13:48 UTC) #69

commit-bot: I haz the power

5 years, 5 months ago (2015-07-09 05:22:38 UTC) #70

Message was sent while issue was closed.

Committed patchset #8 (id:140001) as
https://src.chromium.org/viewvc/blink?view=rev&revision=198563

Patch Set 1

Patch Set 2 : Clean up further Document.idl to not return undefined for null string

Patch Set 3 : Sets m_encodingData as UTF-8 by default. If DocumentEncodingData is empty in setEncoding, default v…

Patch Set 4 : Updated layout tests and defaultCharset to return UTF-8

Patch Set 5 : Updated expectation file with UTF-8

Patch Set 6 : Removed defaultCharset changes

Patch Set 7 : Add utf-8, utf8 and unicode-1-1-utf-8 as the return type

Patch Set 8 : Updated as per philipj suggestions
