Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(504)

Issue 2503683003: [WIP] Streaming CSS parser (Closed)

Created:
4 years, 1 month ago by Timothy Loh
Modified:
3 years, 8 months ago
CC:
chromium-reviews, blink-reviews-css, dglazkov+blink, apavlov+blink_chromium.org, darktears, blink-reviews, rwlbuis, Charlie Harrison
Target Ref:
refs/pending/heads/master
Project:
chromium
Visibility:
Public.

Description

[WIP] Streaming CSS parser This patch introduces a CSSParserTokenStream class, which is a lazily tokenized list of CSSParserTokens. It has a similar interface to CSSParserTokenRange, but as it lazily tokenizes it allows us to get the character offset of where we've tokenized up to. Instead of doing an up-front tokenization of entire stylesheets, we have it now interleaved with parsing. This does *not* make the parser interruptible. Lazy parsing now just stores the start offset for declaration list instead of a vector of CSSParserTokens. A function on CSSTokenizer is added to efficiently skip over a block to support this, without needing to actually tokenize. Most of the complexity in this comes from url tokens, which have interesting error recovery. The empty-block optimization in lazy parsing is removed, although I suspect it is generally not useful as empty blocks are only going to be used as sentinel values. We could re-add this later by seeing the number of characters if needed (the minimum required is 3, e.g. "x:0"), although it's slightly awkward as we would skip the block but then have to rewind to tokenize it. The CSSParserObserverWrapper class, which was used to store character offset information about tokens and comment locations, is removed since we can now extract the information directly from stream objects. This means anywhere that requires this offset information now needs to operate on stream objects instead of ranges. This requires being careful when calling peek()/atEnd() as token lookahead advances the current offset. For at-rules, we now need to pass in the offsets where the preludes start. This is so we can retain the generic at-rule parsing logic as independent of the individual at-rule logic. For style rules we now take a stream from the start of the selector, instead of a range and a stream for the block, as the observer requires callbacks for the selector structure. The callbacks for @import rules now also include the imported url (i.e. contain the entire prelude) for simplicity. This patch removes some of the tracing metrics we have around tokenization and parsing. As these are now interleaved, we can no longer have separate measurements for tokenization and parsing. We also lose the information about number of tokens as when lazy parsing is enabled we will skip tokenization inside style rule declaration blocks. ************* TODO: Work out what to do with blink_style bucketing This patch greatly reduces the memory requirements of stylesheet parsing as we discard tokens after parsing them. We currently will allocate a large contiguous chunk of memory for storing tokens (can be up to a few MB for large web properties). With this patch, the vector usually caps at 128 tokens or less. The two changes in InspectorStyleSheet are due to slight changes in how we call the CSSParserObserver. Firstly we now call it for style rules even when the selector is invalid (startRuleHeader, observeSelector, endRuleHeader, but no start/endRuleBlock). Secondly we now nest keyframe rules inside the @keyframes rules (endRuleBlock for @keyframes is called after after the nested blocks). ************* TODO: Investigate and comment on performance ************* TODO: Comment on custom property stuff Some more details are available in the design doc: https://docs.google.com/document/d/125OYuPzEXLziVNzzgClkRdggS1iyGEqC75c_H61rPJM/edit BUG=661854

Patch Set 1 #

Patch Set 2 : bla #

Patch Set 3 : bla #

Patch Set 4 : wip #

Patch Set 5 : bla #

Patch Set 6 : bla #

Patch Set 7 : clean up a bit #

Patch Set 8 : rebase/fix last inspector tests #

Patch Set 9 : bla #

Patch Set 10 : bla #

Patch Set 11 : fix @apply #

Patch Set 12 : moo #

Patch Set 13 : ready for review #

Patch Set 14 : rebase #

Total comments: 11
Unified diffs Side-by-side diffs Delta from patch set Stats (+842 lines, -459 lines) Patch
M third_party/WebKit/Source/core/css/BUILD.gn View 1 2 3 4 5 6 7 8 9 10 11 12 13 1 chunk +2 lines, -1 line 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSLazyParsingState.h View 1 2 3 4 5 6 7 8 9 3 chunks +3 lines, -6 lines 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSLazyParsingState.cpp View 1 2 3 4 5 6 7 8 9 3 chunks +3 lines, -14 lines 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSLazyPropertyParserImpl.h View 1 2 3 4 5 3 chunks +2 lines, -4 lines 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSLazyPropertyParserImpl.cpp View 1 2 3 4 5 6 7 1 chunk +3 lines, -8 lines 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSParserImpl.h View 1 2 3 4 5 6 7 8 9 10 11 12 13 6 chunks +43 lines, -22 lines 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSParserImpl.cpp View 1 2 3 4 5 6 7 8 9 10 11 12 13 30 chunks +252 lines, -245 lines 0 comments Download
D third_party/WebKit/Source/core/css/parser/CSSParserObserverWrapper.h View 1 2 3 4 5 6 7 8 9 10 11 1 chunk +0 lines, -61 lines 0 comments Download
D third_party/WebKit/Source/core/css/parser/CSSParserObserverWrapper.cpp View 1 chunk +0 lines, -49 lines 0 comments Download
A third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h View 1 2 3 4 5 6 7 8 9 10 11 1 chunk +184 lines, -0 lines 3 comments Download
A third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp View 1 2 3 4 5 6 7 8 9 1 chunk +38 lines, -0 lines 3 comments Download
M third_party/WebKit/Source/core/css/parser/CSSSelectorParser.h View 1 2 3 4 5 6 7 8 9 10 11 12 13 3 chunks +13 lines, -0 lines 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSSelectorParser.cpp View 1 2 3 4 5 6 7 8 9 10 11 12 13 2 chunks +52 lines, -0 lines 1 comment Download
M third_party/WebKit/Source/core/css/parser/CSSTokenizer.h View 1 2 3 4 5 6 3 chunks +15 lines, -4 lines 0 comments Download
M third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp View 1 2 3 4 5 6 7 8 9 10 11 12 13 2 chunks +218 lines, -41 lines 4 comments Download
M third_party/WebKit/Source/core/inspector/InspectorStyleSheet.cpp View 1 2 3 4 5 6 7 8 9 10 11 2 chunks +14 lines, -4 lines 0 comments Download

Messages

Total messages: 36 (31 generated)
Timothy Loh
csharrison: This isn't quite done yet, but if you have a spare moment and it ...
4 years ago (2016-12-06 06:57:49 UTC) #16
Charlie Harrison
Awesome work! I just sent you a brief manual test I did with the 2x2 ...
4 years ago (2016-12-06 21:14:53 UTC) #18
Timothy Loh
There's still some stuff missing but I think this is ready for review now. Mainly ...
4 years ago (2016-12-15 05:29:34 UTC) #28
Charlie Harrison
FYI I will probably not have time to review this until after the new year ...
4 years ago (2016-12-22 15:47:27 UTC) #31
Charlie Harrison
3 years, 11 months ago (2017-01-09 21:35:07 UTC) #36
Generally looks good, though I am still learning this code.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp:13: //
TODO(timloh): Should be using CSSTokenier::skipToBlockEnd() instead.
s/CSSTokenier/CSSTokenizer

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp:16: if
(m_tokenizer.m_tokens.size() == m_currentIndex) {
could these conditions be re-written in terms of hasLookedAhead?

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp:28: const
CSSParserToken& CSSParserTokenStream::peekInternal() {
If tokenizeSingle() returned a token, this could be rewritten to:

if (hasLookedAhead)
  return m_tokenizer.m_tokens[m_currentIndex]
return m_tokenizer.tokenizeSingle()


Assuming tokenizeSingle just returns the staticEOFToken on end.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h:22: enum
MakeSubStreamTag { MakeSubStream };
optional: Put this right above the constructor that needs this, so as to make it
obvious this enum's purpose.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h:50:
m_tokenizer.m_tokens.remove(m_startIndex, m_currentIndex - m_startIndex);
I'll need to look at all usage, but it seems like if it is possible to be
removing in the middle of a Vector this could have bad perf consequences.

Disclaimer: I'm not seeing any of this in profiles.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h:133: bool
hasLookedAhead() {
Maybe replace this method with:

DCHECK(m_currentIndex == m_tokenizer.size() || m_currentIndex ==
m_tokenizer.m_tokens.size() - 1);
return m_currentIndex != m_tokenizer.m_tokens.size();

Consider this optional if the current code is successfully optimized away
though.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSSelectorParser.cpp (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSSelectorParser.cpp:162: return
CSSSelectorList::adoptSelectorVector(selectorList);
Not new code, but why do we allocate some selectors on the stack, then copy them
into a separate CSSSelectorList?

Context: I am seeing the fastMalloc (and other things) in adoptSelectorVector in
profiles as a non-significant chunk of time here.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp (left):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:34:
m_tokens.reserveInitialCapacity(string.length() / 3);
w000t!

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:26:
m_input.advance(startOffset);
It looks like advance() does not change the underlying length. Should I be
worried at the reserveInitialCapacity() call below?

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:48: void
CSSTokenizer::tokenizeSingle() {
Maybe tokenizeSingle could return something useful, like whether we're finished
tokenizing, or maybe the last token tokenized?

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:677: if
(isASCIIAlphaCaselessEqual(nextChar, cc))
Hm, this function has a compiler hint that the branch is likely to be true. I'm
worried that might lead to poor branch prediction. WDYT?

Powered by Google App Engine
This is Rietveld 408576698