|
|
Created:
6 years, 9 months ago by Daniel Bratell Modified:
6 years, 8 months ago CC:
blink-reviews, ed+blinkwatch_opera.com, dglazkov+blink, apavlov+blink_chromium.org, rune+blink, rwlbuis Base URL:
https://chromium.googlesource.com/chromium/blink.git@master Visibility:
Public. |
DescriptionSmaller CSSParser UTF16 buffers for escaped strings.
If an ASCII stylesheet contains escaped characters those need to be
unescaped and stored somewhere. The code used to allocate a buffer the
size of the whole CSS file * sizeof(UChar). That meant 700 KB on
arstechnica.com to store a single character. Instead store the
supposedly rare characters in special buffers of appropriate sizes.
BUG=352544
Committed: https://src.chromium.org/viewvc/blink?view=rev&revision=170451
Patch Set 1 #
Total comments: 21
Patch Set 2 : Dropped /* static */ and rewrote comment. #
Total comments: 6
Patch Set 3 : Rewrote comments. #
Total comments: 9
Patch Set 4 : More comments. #
Total comments: 2
Patch Set 5 : Reused existing code. #Patch Set 6 : Rebased #Messages
Total messages: 16 (0 generated)
jchaffraix, could you please have a look at this patch to stop using the full-stylesheet-unicode buffer for storing unicode strings for ASCII stylesheets? It is a big memory win (700 KB) for arstechnica at least.
https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:389: /* static */ There's no precedence for indication of static like this in Blink. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:420: /* static */ Likewise. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:430: /* static */ Likewise. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:449: // write it down so use that count as upper limit. By that you mean that: "For escapes, the number of UTF-16 code units needed to represent it will not exceed the number of ASCII characters used to express that as a CSS escape"? Perhaps write something along those lines. At least I didn't immediately understand what you wrote above. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:462: /* static */ and here. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:514: /* static */ Same. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:546: /* static */ Same. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:638: /* static */ Same. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:660: /* static */ Same.
https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:449: // write it down so use that count as upper limit. On 2014/03/14 12:50:23, rune - CET wrote: > > By that you mean that: "For escapes, the number of UTF-16 code units needed to > represent it will not exceed the number of ASCII characters used to express that > as a CSS escape"? Perhaps write something along those lines. At least I didn't > immediately understand what you wrote above. That is what it means, but considering that this code talks about SrcCharacterType and DestCharacterType the comment will be a little more fuzzy. I will rewrite it though.
https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:666: if (unicode > 0xff && sizeof(DestCharacterType) == 1) Note: This is an unrelated bug fix. Before this fix sizeof(SrcCharacterType) == sizeof(DestCharacterType) when this code was run so the bug wasn't noticed.
https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:389: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > There's no precedence for indication of static like this in Blink. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:420: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > Likewise. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:430: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > Likewise. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:449: // write it down so use that count as upper limit. On 2014/03/14 12:50:23, rune - CET wrote: > > By that you mean that: "For escapes, the number of UTF-16 code units needed to > represent it will not exceed the number of ASCII characters used to express that > as a CSS escape"? Perhaps write something along those lines. At least I didn't > immediately understand what you wrote above. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:449: // write it down so use that count as upper limit. On 2014/03/14 12:50:23, rune - CET wrote: > > By that you mean that: "For escapes, the number of UTF-16 code units needed to > represent it will not exceed the number of ASCII characters used to express that > as a CSS escape"? Perhaps write something along those lines. At least I didn't > immediately understand what you wrote above. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:449: // write it down so use that count as upper limit. On 2014/03/17 15:55:35, Daniel Bratell wrote: > On 2014/03/14 12:50:23, rune - CET wrote: > > > > By that you mean that: "For escapes, the number of UTF-16 code units needed to > > represent it will not exceed the number of ASCII characters used to express > that > > as a CSS escape"? Perhaps write something along those lines. At least I didn't > > immediately understand what you wrote above. > > That is what it means, but considering that this code talks about > SrcCharacterType and DestCharacterType the comment will be a little more fuzzy. > > I will rewrite it though. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:462: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > and here. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:514: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > Same. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:546: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > Same. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:638: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > Same. Done. https://codereview.chromium.org/196353018/diff/1/Source/core/css/CSSTokenizer... Source/core/css/CSSTokenizer-in.cpp:660: /* static */ On 2014/03/14 12:50:23, rune - CET wrote: > > Same. Done. https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:666: if (unicode > 0xff && sizeof(DestCharacterType) == 1) On 2014/03/17 16:01:07, Daniel Bratell wrote: > Note: This is an unrelated bug fix. Before this fix sizeof(SrcCharacterType) == > sizeof(DestCharacterType) when this code was run so the bug wasn't noticed. Done.
https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:514: // it down so use that as the upper limit. I just noticed now this comment has the same issue as I commented on in the previous patch set. https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:636: // it down so use that as the upper limit. I just noticed now this comment has the same issue as I commented on in the previous patch set.
https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:514: // it down so use that as the upper limit. On 2014/03/17 20:18:35, rune - CET wrote: > > I just noticed now this comment has the same issue as I commented on in the > previous patch set. Done. https://codereview.chromium.org/196353018/diff/20001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:636: // it down so use that as the upper limit. On 2014/03/17 20:18:35, rune - CET wrote: > > I just noticed now this comment has the same issue as I commented on in the > previous patch set. Done.
rslgtm
I don't know this code really well so it would take me quite some time to understand the change. @Alexis, could you review this change?
https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:307: UChar* CSSTokenizer::getStringBuffer16(size_t len) We usually don't put the word "get" on getters as it's redundant. Here we could write allocateStringBuffer16 as it would better match the intent. Maybe we could mention somehow that the string is owned by the tokenizer in the name too but we don't want to make the name too long. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:389: unsigned CSSTokenizer::parseEscape(CharacterType*& src) This function should now have static linkage. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:535: src += src[2] == '\n' ? 3 : 2; This looks awfully like checkAndSkipString, maybe we could share some code? https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer.h (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer.h:162: Vector<OwnPtr<UChar[]> > m_cssStrings16; Really not a huge fan of the Vector approach but we need to hold onto the characters as the intermediate strings don't do it. Would you add a comment to explain this optimization in the code and why this is needed? (if we revisit how we handle CSSParserString's, we could remove this giant Vector).
Now even better. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:307: UChar* CSSTokenizer::getStringBuffer16(size_t len) On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote: > We usually don't put the word "get" on getters as it's redundant. Here we could > write allocateStringBuffer16 as it would better match the intent. > > Maybe we could mention somehow that the string is owned by the tokenizer in the > name too but we don't want to make the name too long. Done. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:389: unsigned CSSTokenizer::parseEscape(CharacterType*& src) On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote: > This function should now have static linkage. Hmm, not sure I understand. A function can have internal or external linkage. This has external linkage since it's part of a class. I did make it a static method in the class in this patch though, mostly to prevent it from modifying the tokenizer state by accident which it used to do ("src" and "currentCharacter()" happened to be the same pointer so it "worked(tm)".) I think it's doing fine as a static member function, but I can make it a file level internal function though if you think that matches the style better. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer.h (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer.h:162: Vector<OwnPtr<UChar[]> > m_cssStrings16; On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote: > Really not a huge fan of the Vector approach but we need to hold onto the > characters as the intermediate strings don't do it. > > Would you add a comment to explain this optimization in the code and why this is > needed? (if we revisit how we handle CSSParserString's, we could remove this > giant Vector). Done. - Though I wouldn't call it giant. It will be 20 bytes and empty in most cases. For Ars Technica it will also contain a 1 character string for another ~20 bytes. Those 40 bytes should be compared to the 700 KB we used to allocate into m_currentCharacter16 before this patch.
lgtm but please try to share some code. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:389: unsigned CSSTokenizer::parseEscape(CharacterType*& src) On 2014/03/21 15:14:39, Daniel Bratell wrote: > On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote: > > This function should now have static linkage. > > Hmm, not sure I understand. A function can have internal or external linkage. > This has external linkage since it's part of a class. > > I did make it a static method in the class in this patch though, mostly to > prevent it from modifying the tokenizer state by accident which it used to do > ("src" and "currentCharacter()" happened to be the same pointer so it > "worked(tm)".) > > I think it's doing fine as a static member function, but I can make it a file > level internal function though if you think that matches the style better. I was talking about file level internal function as you pointed out. All in all, it's not really blocking and it works either way. https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer.h (right): https://codereview.chromium.org/196353018/diff/40001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer.h:162: Vector<OwnPtr<UChar[]> > m_cssStrings16; On 2014/03/21 15:14:39, Daniel Bratell wrote: > On 2014/03/20 20:57:39, Julien Chaffraix - PST wrote: > > Really not a huge fan of the Vector approach but we need to hold onto the > > characters as the intermediate strings don't do it. > > > > Would you add a comment to explain this optimization in the code and why this > is > > needed? (if we revisit how we handle CSSParserString's, we could remove this > > giant Vector). > > Done. - Though I wouldn't call it giant. It will be 20 bytes and empty in most > cases. For Ars Technica it will also contain a 1 character string for another > ~20 bytes. Those 40 bytes should be compared to the 700 KB we used to allocate > into m_currentCharacter16 before this patch. I meant giant *hack* Vector but forgot one word. Ideally we shouldn't require that but due to the way CSSParserString works, we have to retain the memory until parsing is done :-( https://codereview.chromium.org/196353018/diff/60001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/60001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:540: parseEscape<SrcCharacterType>(src); I really think this inner loop should be replaced by a call to checkAndSkipString as it seems to do the exact same checks.
I don't think it's a hack at all to have a string table. It's the conceptual solution to many kinds of parsers. That we get away with using the storage of the input in most cases is a nice optimization but if you were to make an academic level solution, you would have collections of strings, symbols, uris. But academic.... Thanks for the review! https://codereview.chromium.org/196353018/diff/60001/Source/core/css/CSSToken... File Source/core/css/CSSTokenizer-in.cpp (right): https://codereview.chromium.org/196353018/diff/60001/Source/core/css/CSSToken... Source/core/css/CSSTokenizer-in.cpp:540: parseEscape<SrcCharacterType>(src); On 2014/03/21 17:56:27, Julien Chaffraix - PST wrote: > I really think this inner loop should be replaced by a call to > checkAndSkipString as it seems to do the exact same checks. Done! Good. It does feel like there is a bit too much duplicated code right now (and even more duplicated machine code since these are templates with multiple invocations).
The CQ bit was checked by bratell@opera.com
CQ is trying da patch. Follow status at https://chromium-status.appspot.com/cq/bratell@opera.com/196353018/100001
Message was sent while issue was closed.
Change committed as 170451 |