Issue 2503683003: [WIP] Streaming CSS parser

Timothy Loh

The CQ bit was checked by timloh@chromium.org to run a CQ dry run

4 years, 1 month ago (2016-11-23 05:02:51 UTC) #1

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2503683003/80001

4 years, 1 month ago (2016-11-23 05:03:08 UTC) #2

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years, 1 month ago (2016-11-23 05:10:14 UTC) #3

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: chromium_presubmit on master.tryserver.chromium.linux (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.linux/builders/chromium_presubmit/builds/311230)

4 years, 1 month ago (2016-11-23 05:10:14 UTC) #4

Timothy Loh

Description was changed from ========== wip, not for review BUG= ========== to ========== wip, not ...

4 years ago (2016-12-05 02:15:13 UTC) #5

Timothy Loh

Description was changed from ========== wip, not for review - Introduce CSSParserTokenStream class - offset() ...

4 years ago (2016-12-05 02:51:31 UTC) #6

Timothy Loh

Description was changed from ========== wip, not for review - Introduce CSSParserTokenStream class - offset() ...

4 years ago (2016-12-05 05:45:48 UTC) #7

Timothy Loh

4 years ago (2016-12-05 06:30:18 UTC) #8

Description was changed from

==========
wip, not for review

- Introduce CSSParserTokenStream class
- offset()
- Lazy parsing
-- empty block optimisation
- Observer ickiness, offset before/after comments
-- Selectors
- Start/end offset for preludes
- Tracing metrics
-- Token count no longer available if we use lazy parsing, so not hooked up...
- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- CSSTokenizer::skipToBlockEnd()
-- url tokens :\

BUG=
==========

to

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values.

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========
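
A minimal sketch of the peek()/offset() interaction described above, assuming a toy tokenizer (identifier runs and single punctuation characters). ToyToken and ToyTokenStream are hypothetical names for illustration, not the classes in the patch; the only point is that lookahead has to tokenize, so offset() moves on peek() rather than on consume().

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical toy types; NOT Blink's CSSParserToken / CSSParserTokenStream.
struct ToyToken {
  std::string text;
};

class ToyTokenStream {
 public:
  explicit ToyTokenStream(std::string input) : input_(std::move(input)) {}

  // Character offset of how far the input has been tokenized so far.
  std::size_t offset() const { return offset_; }

  // Lookahead: tokenizes one token if none is buffered, so offset() may move.
  // Returns nullptr once the input is exhausted.
  const ToyToken* peek() {
    if (!has_lookahead_) has_lookahead_ = tokenizeOne();
    return has_lookahead_ ? &lookahead_ : nullptr;
  }

  bool atEnd() { return peek() == nullptr; }

  // Returns the buffered token (tokenizing first if needed) and drops it.
  ToyToken consume() {
    peek();
    ToyToken result = has_lookahead_ ? lookahead_ : ToyToken();
    has_lookahead_ = false;
    return result;
  }

 private:
  static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '-';
  }

  // Tokenizes exactly one token into lookahead_; false at end of input.
  bool tokenizeOne() {
    while (offset_ < input_.size() &&
           std::isspace(static_cast<unsigned char>(input_[offset_])))
      ++offset_;
    if (offset_ >= input_.size()) return false;
    std::size_t start = offset_;
    if (isIdentChar(input_[offset_])) {
      while (offset_ < input_.size() && isIdentChar(input_[offset_])) ++offset_;
    } else {
      ++offset_;  // single punctuation character
    }
    lookahead_.text = input_.substr(start, offset_ - start);
    return true;
  }

  std::string input_;
  std::size_t offset_ = 0;
  bool has_lookahead_ = false;
  ToyToken lookahead_;
};
```

In this sketch a caller that wants the offset before a token has to read offset() before calling peek() or atEnd(), which is the kind of care the description asks for.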

Timothy Loh

4 years ago (2016-12-05 06:30:38 UTC) #9

Description was changed from

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values.

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========

to

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values.

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========
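
A rough sketch of the "store the start offset instead of a vector of tokens" idea for lazy parsing. LazyDeclarationBlock is a hypothetical name, and the parse step here is a toy (split on ';' up to the first '}'); it is not the parser in the patch. It only shows that the rule holds an offset into the stylesheet text and does no work until the declarations are requested.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration; not a class from the patch.
class LazyDeclarationBlock {
 public:
  // `sheet` must outlive this object; only the offset is stored, no tokens.
  LazyDeclarationBlock(const std::string& sheet, std::size_t blockStart)
      : sheet_(sheet), block_start_(blockStart) {}

  bool parsed() const { return parsed_; }

  // Parses on first use, starting again from the stored offset.
  const std::vector<std::string>& declarations() {
    if (!parsed_) {
      parseNow();
      parsed_ = true;
    }
    return declarations_;
  }

 private:
  // Toy parse: ignores strings, comments and nested blocks.
  void parseNow() {
    std::size_t end = sheet_.find('}', block_start_);
    if (end == std::string::npos) end = sheet_.size();
    std::size_t pos = block_start_;
    while (pos < end) {
      std::size_t semi = sheet_.find(';', pos);
      if (semi == std::string::npos || semi > end) semi = end;
      std::string decl = trim(sheet_.substr(pos, semi - pos));
      if (!decl.empty()) declarations_.push_back(decl);
      pos = semi + 1;
    }
  }

  static std::string trim(const std::string& s) {
    std::size_t b = s.find_first_not_of(" \t\n");
    if (b == std::string::npos) return "";
    std::size_t e = s.find_last_not_of(" \t\n");
    return s.substr(b, e - b + 1);
  }

  const std::string& sheet_;
  std::size_t block_start_;
  bool parsed_ = false;
  std::vector<std::string> declarations_;
};
```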

Timothy Loh

4 years ago (2016-12-05 06:52:33 UTC) #10

Description was changed from

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values.

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========

to

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values. We could re-add this later by seeing the number of
characters if needed (the minimum required is 3, e.g. "x:0").

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========
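
A simplified sketch of skipping a block without producing tokens, plus the character-count check mentioned above. Both function names are hypothetical and this is not the CSSTokenizer function added by the patch. In particular, unquoted url(...) tokens and escapes outside strings are NOT handled, and the description names url tokens as the source of most of the real complexity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper. `pos` is the offset just past an opening '{'.
// Returns the offset just past the matching '}', or input.size() if the
// block is unterminated.
std::size_t skipToBlockEndSketch(const std::string& input, std::size_t pos) {
  std::string closers = "}";  // stack of the closing characters we expect
  while (pos < input.size()) {
    char c = input[pos];
    if (c == '/' && pos + 1 < input.size() && input[pos + 1] == '*') {
      // Comment: jump past the closing "*/" (or to the end of input).
      std::size_t end = input.find("*/", pos + 2);
      pos = (end == std::string::npos) ? input.size() : end + 2;
      continue;
    }
    if (c == '"' || c == '\'') {
      // String: ends at the matching quote or at an unescaped newline.
      char quote = c;
      ++pos;
      while (pos < input.size() && input[pos] != quote && input[pos] != '\n') {
        if (input[pos] == '\\' && pos + 1 < input.size()) ++pos;  // escaped char
        ++pos;
      }
      if (pos < input.size() && input[pos] == quote) ++pos;
      continue;
    }
    if (c == '{') {
      closers.push_back('}');
    } else if (c == '(') {
      closers.push_back(')');
    } else if (c == '[') {
      closers.push_back(']');
    } else if (c == closers.back()) {
      closers.pop_back();
      if (closers.empty()) return pos + 1;
    }
    // A closer that does not match the innermost open block is ignored.
    ++pos;
  }
  return input.size();
}

// Hypothetical check: fewer than 3 characters cannot hold a declaration
// such as "x:0". bodyEnd is the offset of the closing '}'.
bool bodyTooShortForDeclaration(std::size_t bodyStart, std::size_t bodyEnd) {
  return bodyEnd - bodyStart < 3;
}
```

The check can only run after the skip has found the end of the block, which is why re-adding the optimisation would mean skipping first and then going back over the same characters.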

Timothy Loh

The CQ bit was checked by timloh@chromium.org to run a CQ dry run

4 years ago (2016-12-05 07:06:46 UTC) #11

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2503683003/120001

4 years ago (2016-12-05 07:07:02 UTC) #12

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years ago (2016-12-05 08:19:24 UTC) #13

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/328422)

4 years ago (2016-12-05 08:19:24 UTC) #14

Timothy Loh

4 years ago (2016-12-06 05:28:43 UTC) #15

Description was changed from

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values. We could re-add this later by seeing the number of
characters if needed (the minimum required is 3, e.g. "x:0").

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========

to

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values. We could re-add this later by seeing the number of
characters if needed (the minimum required is 3, e.g. "x:0").

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

The two changes in InspectorStyleSheet are due to slight changes in how
we call the CSSParserObserver. Firstly we now call it for style rules
even when the selector is invalid (startRuleHeader, observeSelector,
endRuleHeader, but no start/endRuleBlock). Secondly we now nest keyframe
rules inside the @keyframes rules (endRuleBlock for @keyframes is called
after the nested blocks).

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========

Timothy Loh

csharrison: This isn't quite done yet, but if you have a spare moment and it ...

4 years ago (2016-12-06 06:57:49 UTC) #16

Charlie Harrison

csharrison@chromium.org changed reviewers: + csharrison@chromium.org

4 years ago (2016-12-06 21:14:52 UTC) #17

Charlie Harrison

Awesome work! I just sent you a brief manual test I did with the 2x2 ...

4 years ago (2016-12-06 21:14:53 UTC) #18

Timothy Loh

The CQ bit was checked by timloh@chromium.org to run a CQ dry run

4 years ago (2016-12-09 04:44:34 UTC) #19

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2503683003/200001

4 years ago (2016-12-09 04:44:54 UTC) #20

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years ago (2016-12-09 06:01:02 UTC) #21

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: mac_chromium_rel_ng on master.tryserver.chromium.mac (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.mac/builders/mac_chromium_rel_ng/builds/351227)

4 years ago (2016-12-09 06:01:03 UTC) #22

Timothy Loh

4 years ago (2016-12-12 06:50:38 UTC) #23

Description was changed from

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values. We could re-add this later by seeing the number of
characters if needed (the minimum required is 3, e.g. "x:0").

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

The two changes in InspectorStyleSheet are due to slight changes in how
we call the CSSParserObserver. Firstly we now call it for style rules
even when the selector is invalid (startRuleHeader, observeSelector,
endRuleHeader, but no start/endRuleBlock). Secondly we now nest keyframe
rules inside the @keyframes rules (endRuleBlock for @keyframes is called
after the nested blocks).

- Performance implication
- Duplicated consumeDeclarationList for @apply until we fix custom property
input preservation
- Custom property stuff

[DOC LINK]

BUG=
==========

to

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values. We could re-add this later by seeing the number of
characters if needed (the minimum required is 3, e.g. "x:0").

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure. The callbacks for @import rules now also include the imported
url (i.e. contain the entire prelude) for simplicity.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

The two changes in InspectorStyleSheet are due to slight changes in how
we call the CSSParserObserver. Firstly we now call it for style rules
even when the selector is invalid (startRuleHeader, observeSelector,
endRuleHeader, but no start/endRuleBlock). Secondly we now nest keyframe
rules inside the @keyframes rules (endRuleBlock for @keyframes is called
after the nested blocks).

- Performance implication
- Custom property stuff

[DOC LINK]

BUG=
==========

Timothy Loh

4 years ago (2016-12-15 04:41:37 UTC) #24

Description was changed from

==========
wip, not for review

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values. We could re-add this later by seeing the number of
characters if needed (the minimum required is 3, e.g. "x:0").

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure. The callbacks for @import rules now also include the imported
url (i.e. contain the entire prelude) for simplicity.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

The two changes in InspectorStyleSheet are due to slight changes in how
we call the CSSParserObserver. Firstly we now call it for style rules
even when the selector is invalid (startRuleHeader, observeSelector,
endRuleHeader, but no start/endRuleBlock). Secondly we now nest keyframe
rules inside the @keyframes rules (endRuleBlock for @keyframes is called
after the nested blocks).

- Performance implication
- Custom property stuff

[DOC LINK]

BUG=
==========

to

==========
[WIP] Streaming CSS parser

This patch introduces a CSSParserTokenStream class, which is a lazily
tokenized list of CSSParserTokens. It has a similar interface to
CSSParserTokenRange, but as it lazily tokenizes it allows us to get the
character offset of where we've tokenized up to. Instead of doing an
up-front tokenization of entire stylesheets, we have it now interleaved
with parsing. This does *not* make the parser interruptible.

Lazy parsing now just stores the start offset for declaration list
instead of a vector of CSSParserTokens. A function on CSSTokenizer is
added to efficiently skip over a block to support this, without needing
to actually tokenize. Most of the complexity in this comes from url
tokens, which have interesting error recovery. The empty-block
optimization in lazy parsing is removed, although I suspect it is
generally not useful as empty blocks are only going to be used as
sentinel values. We could re-add this later by seeing the number of
characters if needed (the minimum required is 3, e.g. "x:0"), although
it's slightly awkward as we would skip the block but then have to rewind
to tokenize it.

The CSSParserObserverWrapper class, which was used to store character
offset information about tokens and comment locations, is removed since
we can now extract the information directly from stream objects. This
means anywhere that requires this offset information now needs to
operate on stream objects instead of ranges. This requires being careful
when calling peek()/atEnd() as token lookahead advances the current
offset.

For at-rules, we now need to pass in the offsets where the preludes
start. This is so we can retain the generic at-rule parsing logic as
independent of the individual at-rule logic. For style rules we now take
a stream from the start of the selector, instead of a range and a stream
for the block, as the observer requires callbacks for the selector
structure. The callbacks for @import rules now also include the imported
url (i.e. contain the entire prelude) for simplicity.

This patch removes some of the tracing metrics we have around
tokenization and parsing. As these are now interleaved, we can no longer
have separate measurements for tokenization and parsing. We also lose
the information about number of tokens as when lazy parsing is enabled
we will skip tokenization inside style rule declaration blocks.

************* TODO: Work out what to do with blink_style bucketing

This patch greatly reduces the memory requirements of stylesheet parsing
as we discard tokens after parsing them. We currently will allocate a
large contiguous chunk of memory for storing tokens (can be up to a few
MB for large web properties). With this patch, the vector usually caps
at 128 tokens or less.

The two changes in InspectorStyleSheet are due to slight changes in how
we call the CSSParserObserver. Firstly we now call it for style rules
even when the selector is invalid (startRuleHeader, observeSelector,
endRuleHeader, but no start/endRuleBlock). Secondly we now nest keyframe
rules inside the @keyframes rules (endRuleBlock for @keyframes is called
after the nested blocks).

************* TODO: Investigate and comment on performance
************* TODO: Comment on custom property stuff

Some more details are available in the design doc:
https://docs.google.com/document/d/125OYuPzEXLziVNzzgClkRdggS1iyGEqC75c_H61rP...

BUG=661854
==========
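
To illustrate the lazy parsing paragraph in the description above, here is a minimal sketch of skipping a declaration block by offset without producing tokens. It is not Blink's CSSTokenizer::skipToBlockEnd(): the names skipToBlockEndSketch and LazyBlockSketch are hypothetical, std::string stands in for the tokenizer input, and comments, escapes and the url token error recovery that the description calls out as the hard part are deliberately ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not the real CSSTokenizer::skipToBlockEnd().
// `offset` points just past an already consumed '{'. Returns the offset just
// past the matching '}', or input.size() if the block is never closed.
// Assumption: only bracket depth and quoted strings matter; comments,
// escapes and url() recovery are not handled in this sketch.
std::size_t skipToBlockEndSketch(const std::string& input, std::size_t offset) {
  assert(offset <= input.size());
  int depth = 1;
  while (offset < input.size()) {
    char c = input[offset++];
    if (c == '"' || c == '\'') {
      // Skip to the closing quote so braces inside strings are not counted.
      while (offset < input.size() && input[offset] != c)
        ++offset;
      if (offset < input.size())
        ++offset;  // Step over the closing quote.
    } else if (c == '{') {
      ++depth;
    } else if (c == '}') {
      if (--depth == 0)
        return offset;
    }
  }
  return input.size();
}

// What lazy parsing would keep per style rule in this sketch: only the offset
// where the declaration list starts, rather than a vector of tokens.
struct LazyBlockSketch {
  std::size_t startOffset;
};
```

A caller would record startOffset when it sees the opening brace, call the skip function to find where parsing resumes, and only tokenize from startOffset if the rule's declarations are actually needed.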

Timothy Loh

The CQ bit was checked by timloh@chromium.org to run a CQ dry run

4 years ago (2016-12-15 05:09:26 UTC) #25

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2503683003/240001

4 years ago (2016-12-15 05:09:46 UTC) #26

Timothy Loh

timloh@chromium.org changed reviewers: + alancutter@chromium.org

4 years ago (2016-12-15 05:29:33 UTC) #27

Timothy Loh

There's still some stuff missing but I think this is ready for review now. Mainly ...

4 years ago (2016-12-15 05:29:34 UTC) #28

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

4 years ago (2016-12-15 06:22:26 UTC) #29

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/335182)

4 years ago (2016-12-15 06:22:27 UTC) #30

Charlie Harrison

FYI I will probably not have time to review this until after the new year ...

4 years ago (2016-12-22 15:47:27 UTC) #31

Timothy Loh

The CQ bit was checked by timloh@chromium.org to run a CQ dry run

3 years, 11 months ago (2017-01-06 04:06:30 UTC) #32

commit-bot: I haz the power

Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.org/2503683003/260001

3 years, 11 months ago (2017-01-06 04:06:48 UTC) #33

commit-bot: I haz the power

The CQ bit was unchecked by commit-bot@chromium.org

3 years, 11 months ago (2017-01-06 05:12:20 UTC) #34

commit-bot: I haz the power

Dry run: Try jobs failed on following builders: win_chromium_x64_rel_ng on master.tryserver.chromium.win (JOB_FAILED, http://build.chromium.org/p/tryserver.chromium.win/builders/win_chromium_x64_rel_ng/builds/343340)

3 years, 11 months ago (2017-01-06 05:12:21 UTC) #35

Charlie Harrison

3 years, 11 months ago (2017-01-09 21:35:07 UTC) #36

Generally looks good, though I am still learning this code.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp:13: //
TODO(timloh): Should be using CSSTokenier::skipToBlockEnd() instead.
s/CSSTokenier/CSSTokenizer

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp:16: if
(m_tokenizer.m_tokens.size() == m_currentIndex) {
could these conditions be re-written in terms of hasLookedAhead?

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.cpp:28: const
CSSParserToken& CSSParserTokenStream::peekInternal() {
If tokenizeSingle() returned a token, this could be rewritten to:

if (hasLookedAhead())
  return m_tokenizer.m_tokens[m_currentIndex];
return m_tokenizer.tokenizeSingle();


Assuming tokenizeSingle just returns the staticEOFToken on end.
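
A self-contained sketch of the shape suggested above, under stated assumptions: TokenSketch, TokenizerSketch and TokenStreamSketch are hypothetical stand-ins for the Blink classes, tokens are just whitespace-separated words, and std::vector stands in for WTF::Vector. It also shows the point from the description that token lookahead advances the current offset.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical token type; the real CSSParserToken is far richer.
struct TokenSketch {
  bool isEOF;
  std::string text;
};

struct TokenizerSketch {
  explicit TokenizerSketch(std::string input) : m_input(std::move(input)) {}

  // Tokenizes one word, stores it in m_tokens and returns it. At the end of
  // input it returns a shared EOF token without storing anything.
  const TokenSketch& tokenizeSingle() {
    static const TokenSketch eofToken{true, ""};
    while (m_offset < m_input.size() && m_input[m_offset] == ' ')
      ++m_offset;
    if (m_offset == m_input.size())
      return eofToken;
    std::size_t start = m_offset;
    while (m_offset < m_input.size() && m_input[m_offset] != ' ')
      ++m_offset;
    m_tokens.push_back({false, m_input.substr(start, m_offset - start)});
    return m_tokens.back();
  }

  std::string m_input;
  std::size_t m_offset = 0;
  std::vector<TokenSketch> m_tokens;
};

class TokenStreamSketch {
 public:
  explicit TokenStreamSketch(TokenizerSketch& tokenizer)
      : m_tokenizer(tokenizer) {}

  bool hasLookedAhead() const {
    return m_currentIndex != m_tokenizer.m_tokens.size();
  }

  // The returned reference is only valid until the next tokenizeSingle().
  const TokenSketch& peek() {
    if (hasLookedAhead())
      return m_tokenizer.m_tokens[m_currentIndex];
    return m_tokenizer.tokenizeSingle();
  }

  // Returns a copy so the caller is unaffected by later vector growth.
  TokenSketch consume() {
    TokenSketch token = peek();
    if (!token.isEOF)
      ++m_currentIndex;
    return token;
  }

  // Character offset the tokenizer has reached, including any lookahead.
  std::size_t offset() const { return m_tokenizer.m_offset; }

 private:
  TokenizerSketch& m_tokenizer;
  std::size_t m_currentIndex = 0;
};
```

With the input "a b ", offset() is 0 before the first peek() and 1 after it, even though nothing has been consumed yet, which is why callers that report offsets need to read them before looking ahead.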

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h:22: enum
MakeSubStreamTag { MakeSubStream };
optional: Put this right above the constructor that needs it, so that this
enum's purpose is obvious.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h:50:
m_tokenizer.m_tokens.remove(m_startIndex, m_currentIndex - m_startIndex);
I'll need to look at all usage, but it seems like if it is possible to be
removing in the middle of a Vector this could have bad perf consequences.

Disclaimer: I'm not seeing any of this in profiles.
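
A rough way to reason about the cost, as a sketch only: removing a range from a contiguous vector shifts every element after the range, so the work is proportional to what remains behind the removed run, not to the run itself. The helper name discardConsumed is hypothetical and std::vector stands in for WTF::Vector; whether the real call sites only ever leave zero or one lookahead token behind is an assumption here, not something verified against the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: drops the consumed tokens [startIndex, currentIndex)
// and returns how many trailing elements had to be shifted down to fill the
// gap. That count is the part that could hurt performance.
template <typename T>
std::size_t discardConsumed(std::vector<T>& tokens,
                            std::size_t startIndex,
                            std::size_t currentIndex) {
  assert(startIndex <= currentIndex && currentIndex <= tokens.size());
  std::size_t shifted = tokens.size() - currentIndex;
  tokens.erase(tokens.begin() + static_cast<std::ptrdiff_t>(startIndex),
               tokens.begin() + static_cast<std::ptrdiff_t>(currentIndex));
  return shifted;
}
```

If at most one lookahead token sits after m_currentIndex, the shift is a single element and the removal is effectively a truncation; it only becomes expensive when a long tail of tokens lives behind the removed range.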

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSParserTokenStream.h:133: bool
hasLookedAhead() {
Maybe replace this method with:

DCHECK(m_currentIndex == m_tokenizer.m_tokens.size() || m_currentIndex ==
m_tokenizer.m_tokens.size() - 1);
return m_currentIndex != m_tokenizer.m_tokens.size();

Consider this optional if the current code is successfully optimized away
though.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSSelectorParser.cpp (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSSelectorParser.cpp:162: return
CSSSelectorList::adoptSelectorVector(selectorList);
Not new code, but why do we allocate some selectors on the stack, then copy them
into a separate CSSSelectorList?

Context: I am seeing the fastMalloc (and other things) in adoptSelectorVector in
profiles as a non-significant chunk of time here.

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp (left):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:34:
m_tokens.reserveInitialCapacity(string.length() / 3);
w000t!

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
File third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp (right):

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:26:
m_input.advance(startOffset);
It looks like advance() does not change the underlying length. Should I be
worried about the reserveInitialCapacity() call below?

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:48: void
CSSTokenizer::tokenizeSingle() {
Maybe tokenizeSingle could return something useful, like whether we're finished
tokenizing, or maybe the last token tokenized?

https://codereview.chromium.org/2503683003/diff/260001/third_party/WebKit/Sou...
third_party/WebKit/Source/core/css/parser/CSSTokenizer.cpp:677: if
(isASCIIAlphaCaselessEqual(nextChar, cc))
Hm, this function has a compiler hint that the branch is likely to be true. I'm
worried that might lead to poor branch prediction. WDYT?

Issue 2503683003: [WIP] Streaming CSS parser (Closed)

Description

Patch Set 1 #

Patch Set 2 : bla #

Patch Set 3 : bla #

Patch Set 4 : wip #

Patch Set 5 : bla #

Patch Set 6 : bla #

Patch Set 7 : clean up a bit #

Patch Set 8 : rebase/fix last inspector tests #

Patch Set 9 : bla #

Patch Set 10 : bla #

Patch Set 11 : fix @apply #

Patch Set 12 : moo #

Patch Set 13 : ready for review #

Patch Set 14 : rebase #

Messages