|
|
|
Created:
4 years, 10 months ago by Tiger (Sony Mobile) Modified:
4 years, 10 months ago CC:
blink-reviews Base URL:
https://chromium.googlesource.com/chromium/blink.git@master Target Ref:
refs/heads/master Project:
blink Visibility:
Public. |
DescriptionCleanup PageSerializer, add XHTML compatibility
Rename paramater in functions inherited from MarkupAccumulator to match
parent class and output a valid meta tag even for XML documents.
--
This review is part of set aiming to merge the two existing page
serializers, WebPageSerializerImpl and PageSerializer into the latter
one.
In addition to this it moves all the old tests from
WebPageNewSerializerTest and WebPageSerializerTest to the newer
PageSerializerTest structure and splits out a test for MHTML into the
MHTMLTest file.
Note: This is a rebase and split of
https://codereview.chromium.org/68613003 from Opera which was reverted
due to failing tests. In addition to the rebase some smaller fixes was
done to accommodate to changes in the code from when it was first
landed.
BUG=328354
R=hajimehoshi@chromium.org
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=197041
Patch Set 1 #
Total comments: 14
Patch Set 2 : Fix issues #
Total comments: 2
Messages
Total messages: 26 (3 generated)
PTAL
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML nit: Use TODO(yourname) instead. https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:162: result.append(m_document->suggestedMIMEType()); Use MarkupFormatter::appendAttributeValue for escaping
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:163: result.appendLiteral("; charset="); last '"' is missing? https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:164: result.append(m_document->charset()); MarkupFormatter::appendAttributeValue
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML Now MarkupFormatter has two modes represented by enum SerializationType (XML or not). I suggest to expand this to have XHTML mode so that we can delegate these special cases to MarkupFormatter. yosin@ or I am going to do that. What do you think?
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML On 2015/06/11 01:40:32, hajimehoshi wrote: > nit: Use TODO(yourname) instead. Done. https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML On 2015/06/11 02:30:16, hajimehoshi wrote: > Now MarkupFormatter has two modes represented by enum SerializationType (XML or > not). I suggest to expand this to have XHTML mode so that we can delegate these > special cases to MarkupFormatter. yosin@ or I am going to do that. What do you > think? It would be nice to have that delegated yes. Is another mode needed or can XML mode be used for XHTML as well? As an alternative, how feasible is to create a temporary HTMLMetaElement and print it using the existing methods in the serializer? https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:162: result.append(m_document->suggestedMIMEType()); On 2015/06/11 01:40:32, hajimehoshi wrote: > Use MarkupFormatter::appendAttributeValue for escaping Done. https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:163: result.appendLiteral("; charset="); On 2015/06/11 01:41:40, hajimehoshi wrote: > last '"' is missing? The last " is on line 166/168. The complete tag will look like this: <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:164: result.append(m_document->charset()); On 2015/06/11 01:41:41, hajimehoshi wrote: > MarkupFormatter::appendAttributeValue Done.
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML On 2015/06/11 08:15:16, Tiger (Sony Mobile) wrote: > On 2015/06/11 02:30:16, hajimehoshi wrote: > > Now MarkupFormatter has two modes represented by enum SerializationType (XML > or > > not). I suggest to expand this to have XHTML mode so that we can delegate > these > > special cases to MarkupFormatter. yosin@ or I am going to do that. What do you > > think? > > It would be nice to have that delegated yes. Is another mode needed or can XML > mode be used for XHTML as well? Yes, XHTML is slightly different from XML in that XHTML needs a space char before closing a tag '/>' as you did. > As an alternative, how feasible is to create a temporary HTMLMetaElement and > print it using the existing methods in the serializer? Good point. I definitely recommend to create a temporary object and use appendElement so far.
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML On 2015/06/11 08:37:12, hajimehoshi wrote: > On 2015/06/11 08:15:16, Tiger (Sony Mobile) wrote: > > On 2015/06/11 02:30:16, hajimehoshi wrote: > > > Now MarkupFormatter has two modes represented by enum SerializationType (XML > > or > > > not). I suggest to expand this to have XHTML mode so that we can delegate > > these > > > special cases to MarkupFormatter. yosin@ or I am going to do that. What do > you > > > think? > > > > It would be nice to have that delegated yes. Is another mode needed or can XML > > mode be used for XHTML as well? > > Yes, XHTML is slightly different from XML in that XHTML needs a space char > before closing a tag '/>' as you did. This is only recommended for compatibility but not required as far as I know. > > > As an alternative, how feasible is to create a temporary HTMLMetaElement and > > print it using the existing methods in the serializer? > > Good point. I definitely recommend to create a temporary object and use > appendElement so far. This does seem to have its own set of problems. Mainly creating a new Elements requires a non-const Document reference. I'm not sure why. PageSerializer only have a const reference to the document which makes sense, and it is probably a good idea to keep it that way. Disregarding the const problem, this at least feels slightly better: RefPtrWillBeRawPtr<HTMLMetaElement> meta = HTMLMetaElement::create(const_cast<Document&>(*m_document)); meta->setAttribute(HTMLNames::http_equivAttr, "Content-Type"); String content = String::format("%s; charset=%s", m_document->suggestedMIMEType().utf8().data(), m_document->charset().utf8().data()); meta->setAttribute(HTMLNames::contentAttr, AtomicString(content)); MarkupAccumulator::appendElement(result, *meta.release(), nullptr);
hajimehoshi@chromium.org changed reviewers: + yosin@chromium.org
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML On 2015/06/11 10:22:10, Tiger (Sony Mobile) wrote: > On 2015/06/11 08:37:12, hajimehoshi wrote: > > On 2015/06/11 08:15:16, Tiger (Sony Mobile) wrote: > > > On 2015/06/11 02:30:16, hajimehoshi wrote: > > > > Now MarkupFormatter has two modes represented by enum SerializationType > (XML > > > or > > > > not). I suggest to expand this to have XHTML mode so that we can delegate > > > these > > > > special cases to MarkupFormatter. yosin@ or I am going to do that. What do > > you > > > > think? > > > > > > It would be nice to have that delegated yes. Is another mode needed or can > XML > > > mode be used for XHTML as well? > > > > Yes, XHTML is slightly different from XML in that XHTML needs a space char > > before closing a tag '/>' as you did. > > This is only recommended for compatibility but not required as far as I know. > > > > > > As an alternative, how feasible is to create a temporary HTMLMetaElement and > > > print it using the existing methods in the serializer? > > > > Good point. I definitely recommend to create a temporary object and use > > appendElement so far. > > This does seem to have its own set of problems. Mainly creating a new Elements > requires a non-const Document reference. I'm not sure why. PageSerializer only > have a const reference to the document which makes sense, and it is probably a > good idea to keep it that way. Disregarding the const problem, this at least > feels slightly better: > > RefPtrWillBeRawPtr<HTMLMetaElement> meta = > HTMLMetaElement::create(const_cast<Document&>(*m_document)); > meta->setAttribute(HTMLNames::http_equivAttr, "Content-Type"); > String content = String::format("%s; charset=%s", > m_document->suggestedMIMEType().utf8().data(), > m_document->charset().utf8().data()); > meta->setAttribute(HTMLNames::contentAttr, AtomicString(content)); > MarkupAccumulator::appendElement(result, *meta.release(), nullptr); > How about using |Document* document = element->document();| and use local variable |documnet| instead of |m_document|. Yes, it is rather hacky, but not so strange since |element| parameter isn't const.
philipj@opera.com changed reviewers: + philipj@opera.com
https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so it is easier to append an element like this, without special cases for XHTML On 2015/06/11 08:37:12, hajimehoshi wrote: > On 2015/06/11 08:15:16, Tiger (Sony Mobile) wrote: > > On 2015/06/11 02:30:16, hajimehoshi wrote: > > > Now MarkupFormatter has two modes represented by enum SerializationType (XML > > or > > > not). I suggest to expand this to have XHTML mode so that we can delegate > > these > > > special cases to MarkupFormatter. yosin@ or I am going to do that. What do > you > > > think? > > > > It would be nice to have that delegated yes. Is another mode needed or can XML > > mode be used for XHTML as well? > > Yes, XHTML is slightly different from XML in that XHTML needs a space char > before closing a tag '/>' as you did. > > > As an alternative, how feasible is to create a temporary HTMLMetaElement and > > print it using the existing methods in the serializer? > > Good point. I definitely recommend to create a temporary object and use > appendElement so far. Actually XHTML does not need a space before the slash, I believe that style came from trying to write polyglot documents that would work in IE6 way back, and that no browser has had any problem with e.g. |<img src="image"/>| with no space since then. Is this the only difference, or are there other differences with an XHTML serializer? If the output is expected to actually be parsed as XML and not HTML, then serializing it as XML ought to be just fine...
On 2015/06/12 07:58:50, philipj wrote: > https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... > File Source/core/page/PageSerializer.cpp (right): > > https://codereview.chromium.org/1174923003/diff/1/Source/core/page/PageSerial... > Source/core/page/PageSerializer.cpp:159: // FIXME: Refactor MarkupAccumulator so > it is easier to append an element like this, without special cases for XHTML > On 2015/06/11 08:37:12, hajimehoshi wrote: > > On 2015/06/11 08:15:16, Tiger (Sony Mobile) wrote: > > > On 2015/06/11 02:30:16, hajimehoshi wrote: > > > > Now MarkupFormatter has two modes represented by enum SerializationType > (XML > > > or > > > > not). I suggest to expand this to have XHTML mode so that we can delegate > > > these > > > > special cases to MarkupFormatter. yosin@ or I am going to do that. What do > > you > > > > think? > > > > > > It would be nice to have that delegated yes. Is another mode needed or can > XML > > > mode be used for XHTML as well? > > > > Yes, XHTML is slightly different from XML in that XHTML needs a space char > > before closing a tag '/>' as you did. > > > > > As an alternative, how feasible is to create a temporary HTMLMetaElement and > > > print it using the existing methods in the serializer? > > > > Good point. I definitely recommend to create a temporary object and use > > appendElement so far. > > Actually XHTML does not need a space before the slash, I believe that style came > from trying to write polyglot documents that would work in IE6 way back, and > that no browser has had any problem with e.g. |<img src="image"/>| with no space > since then. Is this the only difference, or are there other differences with an > XHTML serializer? If the output is expected to actually be parsed as XML and not > HTML, then serializing it as XML ought to be just fine... IIUC, this is the only difference. I think there are very few browsers that can't accept XHTML without the space chars, but considering the compatibility, I'm not sure if we can omit this hack from Chromium.
Maybe I'm missing some context, but it looks like this hack wasn't there to begin with? http://stackoverflow.com/a/463692/191722 says the extra space was needed for Netscape 4, so I really doubt we need it for compat.
On 2015/06/12 08:40:26, philipj wrote: > Maybe I'm missing some context, but it looks like this hack wasn't there to > begin with? http://stackoverflow.com/a/463692/191722 says the extra space was > needed for Netscape 4, so I really doubt we need it for compat. Yeah, it wasn't there, but before there was no support for XML documents at all. The same logic with the space is also in MarkupFormatter::appendCloseTag. The standard (in http://www.w3.org/TR/xhtml1/#media) seem to suggest (as well as the stackoverflow post) that to be able to serve the content as text/html you need to follow the compatibility guidelines http://www.w3.org/TR/xhtml1/#C_2 but yes, a modern browser doesn't care.
On 2015/06/12 08:40:26, philipj wrote: > Maybe I'm missing some context, but it looks like this hack wasn't there to > begin with? http://stackoverflow.com/a/463692/191722 says the extra space was > needed for Netscape 4, so I really doubt we need it for compat. I'm OK to remove this as long as all the current (layout) tests pass. My only concern is the backward compatibility of Chromium.
On 2015/06/12 08:52:31, hajimehoshi wrote: > On 2015/06/12 08:40:26, philipj wrote: > > Maybe I'm missing some context, but it looks like this hack wasn't there to > > begin with? http://stackoverflow.com/a/463692/191722 says the extra space was > > needed for Netscape 4, so I really doubt we need it for compat. > > I'm OK to remove this as long as all the current (layout) tests pass. My only > concern is the backward compatibility of Chromium. If this is preserving existing behavior I'm fine with leaving it alone, it just looked like new code at first glance. In order to make the distinction, then outputting string literals really does seem the most straightforward, even if it looks ugly.
On 2015/06/12 08:56:50, philipj wrote: > On 2015/06/12 08:52:31, hajimehoshi wrote: > > On 2015/06/12 08:40:26, philipj wrote: > > > Maybe I'm missing some context, but it looks like this hack wasn't there to > > > begin with? http://stackoverflow.com/a/463692/191722 says the extra space > was > > > needed for Netscape 4, so I really doubt we need it for compat. > > > > I'm OK to remove this as long as all the current (layout) tests pass. My only > > concern is the backward compatibility of Chromium. > > If this is preserving existing behavior I'm fine with leaving it alone, it just > looked like new code at first glance. In order to make the distinction, then > outputting string literals really does seem the most straightforward, even if it > looks ugly. Ah, the term 'compatibility' was ambiguous. I noticed there are several views of compabitility: one is SerializerMarkupAccumulator, another is MarkupFormatter. In terms of the backward compatibility of SerializerMarkupAccumulator, adding XHTML space chars would change the current behavior of SerializerMarkupAccumulator, right? Adding space chars itself is OK but I think this should be in another CL. Creating HTMLMetaElement and using MarkupFormatter::appendElement here would add the space chars for XHTML and also change the behavior. Yeah, let's go with outputting strings here.
On 2015/06/12 09:17:13, hajimehoshi wrote: > On 2015/06/12 08:56:50, philipj wrote: > > On 2015/06/12 08:52:31, hajimehoshi wrote: > > > On 2015/06/12 08:40:26, philipj wrote: > > > > Maybe I'm missing some context, but it looks like this hack wasn't there > to > > > > begin with? http://stackoverflow.com/a/463692/191722 says the extra space > > was > > > > needed for Netscape 4, so I really doubt we need it for compat. > > > > > > I'm OK to remove this as long as all the current (layout) tests pass. My > only > > > concern is the backward compatibility of Chromium. > > > > If this is preserving existing behavior I'm fine with leaving it alone, it > just > > looked like new code at first glance. In order to make the distinction, then > > outputting string literals really does seem the most straightforward, even if > it > > looks ugly. > > Ah, the term 'compatibility' was ambiguous. I noticed there are several views of > compabitility: one is SerializerMarkupAccumulator, another is MarkupFormatter. > In terms of the backward compatibility of SerializerMarkupAccumulator, adding > XHTML space chars would change the current behavior of > SerializerMarkupAccumulator, right? Adding space chars itself is OK but I think > this should be in another CL. > > Creating HTMLMetaElement and using MarkupFormatter::appendElement here would add > the space chars for XHTML and also change the behavior. Yeah, let's go with > outputting strings here. Sounds good. Are there any outstanding issues in this CL now?
LGTM for me if hajimehoshi@ is happy.
On 2015/06/12 09:47:19, Tiger (Sony Mobile) wrote: > On 2015/06/12 09:17:13, hajimehoshi wrote: > > On 2015/06/12 08:56:50, philipj wrote: > > > On 2015/06/12 08:52:31, hajimehoshi wrote: > > > > On 2015/06/12 08:40:26, philipj wrote: > > > > > Maybe I'm missing some context, but it looks like this hack wasn't there > > to > > > > > begin with? http://stackoverflow.com/a/463692/191722 says the extra > space > > > was > > > > > needed for Netscape 4, so I really doubt we need it for compat. > > > > > > > > I'm OK to remove this as long as all the current (layout) tests pass. My > > only > > > > concern is the backward compatibility of Chromium. > > > > > > If this is preserving existing behavior I'm fine with leaving it alone, it > > just > > > looked like new code at first glance. In order to make the distinction, then > > > outputting string literals really does seem the most straightforward, even > if > > it > > > looks ugly. > > > > Ah, the term 'compatibility' was ambiguous. I noticed there are several views > of > > compabitility: one is SerializerMarkupAccumulator, another is MarkupFormatter. > > In terms of the backward compatibility of SerializerMarkupAccumulator, adding > > XHTML space chars would change the current behavior of > > SerializerMarkupAccumulator, right? Adding space chars itself is OK but I > think > > this should be in another CL. > > > > Creating HTMLMetaElement and using MarkupFormatter::appendElement here would > add > > the space chars for XHTML and also change the behavior. Yeah, let's go with > > outputting strings here. > > Sounds good. Are there any outstanding issues in this CL now? I think no.
https://codereview.chromium.org/1174923003/diff/20001/Source/core/page/PageSe... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/20001/Source/core/page/PageSe... Source/core/page/PageSerializer.cpp:165: if (m_document->isXHTMLDocument()) Please remove fixing XHTML compatibility in this CL not to change the behavior of SerializerMarkupAccumulator as I explained; let's do it in another CL if necessary.
https://codereview.chromium.org/1174923003/diff/20001/Source/core/page/PageSe... File Source/core/page/PageSerializer.cpp (right): https://codereview.chromium.org/1174923003/diff/20001/Source/core/page/PageSe... Source/core/page/PageSerializer.cpp:165: if (m_document->isXHTMLDocument()) On 2015/06/12 09:57:02, hajimehoshi wrote: > Please remove fixing XHTML compatibility in this CL not to change the behavior > of SerializerMarkupAccumulator as I explained; let's do it in another CL if > necessary. This is not only for XHTML compatibility but XML compatibility as well. I can remove the space but if we don't add the '/' it will not generate a correct XML document and we at least want that.
On 2015/06/12 11:31:55, Tiger (Sony Mobile) wrote: > https://codereview.chromium.org/1174923003/diff/20001/Source/core/page/PageSe... > File Source/core/page/PageSerializer.cpp (right): > > https://codereview.chromium.org/1174923003/diff/20001/Source/core/page/PageSe... > Source/core/page/PageSerializer.cpp:165: if (m_document->isXHTMLDocument()) > On 2015/06/12 09:57:02, hajimehoshi wrote: > > Please remove fixing XHTML compatibility in this CL not to change the behavior > > of SerializerMarkupAccumulator as I explained; let's do it in another CL if > > necessary. > > This is not only for XHTML compatibility but XML compatibility as well. I can > remove the space but if we don't add the '/' it will not generate a correct XML > document and we at least want that. I see. lgtm
The CQ bit was checked by hajimehoshi@chromium.org
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/1174923003/20001
Message was sent while issue was closed.
Committed patchset #2 (id:20001) as https://src.chromium.org/viewvc/blink?view=rev&revision=197041 |
Chromium Code Reviews