Chromium Code Reviews

Side by Side Diff: sky/specs/parsing.md

Issue 677533004: Update parser to support <t>, <template>, /> syntax (Closed)
Base URL: https://github.com/domokit/mojo.git@master
Patch Set: Specs: parser updates
Created 6 years, 2 months ago
@@ -1,28 +1,35 @@
 Parsing
 =======
 
-Parsing in Sky is a strict pipeline consisting of four stages:
+Parsing in Sky is a strict pipeline consisting of five stages:
 
 - decoding, which converts incoming bytes into Unicode characters
-  using UTF-8
+  using UTF-8.
 
-- normalising, which converts certain sequences of characters
+- normalising, which manipulates the sequence of characters.
 
-- tokenising, which converts these characters into tokens
+- tokenising, which converts these characters into three kinds of
+  tokens: character tokens, start tag tokens, and end tag tokens.
+  Character tokens have a single character value. Tag tokens have a
+  tag name, and a list of name/value pairs known as attributes.
 
-- tree construction, which converts these tokens into a tree of nodes
+- token cleanup, which converts sequences of character tokens into
+  string tokens, and removes duplicate attributes in tag tokens.
+
+- tree construction, which converts these tokens into a tree of nodes.
 
 Later stages cannot affect earlier stages.
 
 When a sequence of bytes is to be parsed, there is always a defined
-_parsing context_, which is either "application" or "module".
+_parsing context_, which is either an Application object or a Module
+object.
 
 
 Decoding stage
 --------------
 
 To decode a sequence of bytes _bytes_ for parsing, the [UTF-8
 decoder](https://encoding.spec.whatwg.org/#utf-8-decoder) must be used
 to transform _bytes_ into a sequence of characters _characters_.
 
 This sequence must then be passed to the normalisation stage.
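The five-stage pipeline is strictly one-directional. Each stage consumes only the output of the stage before it. The following is a minimal Python sketch of that shape, with the decoding stage filled in. The helper names `decode` and `run_pipeline` are hypothetical. Python's built-in UTF-8 codec with the `"replace"` error handler stands in for the WHATWG UTF-8 decoder, and the two are assumed, not verified, to agree on every malformed input.

```python
from typing import Callable, Iterable, Iterator

# A stage is any function from an input stream to an output stream.
Stage = Callable[[Iterable], Iterable]


def decode(data: bytes) -> Iterator[str]:
    """Decoding stage: turn bytes into Unicode characters using UTF-8.

    Assumption: Python's "replace" error handler (malformed sequences
    become U+FFFD) stands in for the WHATWG UTF-8 decoder.
    """
    yield from data.decode("utf-8", errors="replace")


def run_pipeline(data: bytes, *stages: Stage) -> list:
    """Feed decoded characters through later stages, strictly in order.

    Each stage sees only the previous stage's output, so a later stage
    has no way to influence an earlier one.
    """
    stream: Iterable = decode(data)
    for stage in stages:
        stream = stage(stream)
    return list(stream)
```

Consider an identity placeholder standing in for normalisation, whose rules are not shown in this diff. With that stage, `run_pipeline("café".encode("utf-8"), lambda chars: chars)` returns `['c', 'a', 'f', 'é']`. A lone `0xFF` byte decodes to `'\ufffd'`.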
(...skipping 18 matching lines...)
@@ -47,21 +54,21 @@
 Tokenisation stage
 ------------------
 
 To tokenise a sequence of characters, a state machine is used.
 
 Initially, the state machine must begin in the **signature** state.
 
 Each character in turn must be processed according to the rules of the
 state at the time the character is processed. A character is processed
 once it has been _consumed_. This produces a stream of tokens; the
-tokens must be passed to the tree construction stage.
+tokens must be passed to the token cleanup stage.
 
 When the last character is consumed, the tokeniser ends.
 
 
 ### Expecting a string ###
 
 When the user agent is to _expect a string_, it must run these steps:
 
 1. Let _expectation_ be the string to expect. When this string is
    indexed, the first character has index 0.
(...skipping 10 matching lines...)
@@ -78,26 +85,26 @@
 
 6. Switch to the **expect a string** state.
 
 
 ### Tokeniser states ###
 
 #### **Signature** state ####
 
 If the current character is...
 
-* '```#```': If the _parsing context_ is not "application", switch to
+* '```#```': If the _parsing context_ is not an Application, switch to
   the _failed signature_ state. Otherwise, expect the string
   "```#!mojo mojo:sky```", with _after signature_ as the _success_
   state and _failed signature_ as the _failure_ state.
 
-* '```S```': If the _parsing context_ is not "module", switch to the
+* '```S```': If the _parsing context_ is not a Module, switch to the
   _failed signature_ state. Otherwise, expect the string
   "```SKY MODULE```", with _after signature_ as the _success_ state,
   and _failed signature_ as the _failure_ state.
 
 * Anything else: Jump to the **failed signature** state.
 
 
 #### **Expect a string** state ####
 
 If the current character is not the same as the <i>index</i>th character in
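The parsing context is now an object rather than a string, so the signature check reduces to comparing the start of the input against one of two fixed strings. The sketch below collapses the **signature** and **expect a string** states into a single prefix comparison. `Application`, `Module`, and `check_signature` are hypothetical stand-ins. Returning the unconsumed remainder, or `None` on failure, is an illustrative choice, since this diff does not show what the **after signature** and **failed signature** states do.

```python
from typing import Optional


class Application:
    """Hypothetical stand-in for the Application parsing context."""


class Module:
    """Hypothetical stand-in for the Module parsing context."""


def check_signature(chars: str, context: object) -> Optional[str]:
    """Return the input after the signature, or None if the check fails."""
    if isinstance(context, Application):
        expected = "#!mojo mojo:sky"
    elif isinstance(context, Module):
        expected = "SKY MODULE"
    else:
        return None
    # Equivalent to comparing the input index by index against the
    # expected string: any mismatch, or running out of input, fails.
    if not chars.startswith(expected):
        return None
    return chars[len(expected):]
```

A module whose text starts with `#!mojo mojo:sky` fails this check, because a leading '`#`' is only accepted when the parsing context is an Application.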
(...skipping 284 matching lines...)
@@ -388,21 +395,21 @@
   tag** state.
 
 * Anything else: Append the current character to the tag name, and
   consume the current character. Stay in this state.
 
 
 ### **Void tag** state ###
 
 If the current character is...
 
-* '```>```': Consume the current character. Switch to the **after
+* '```>```': Consume the current character. Switch to the **after void
   tag** state.
 
 * Anything else: Switch to the **before attribute name** state without
   consuming the current character.
 
 
 ### **Before attribute name** state ###
 
 If the current character is...
 
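The **void tag** state only needs to look at one character. The sketch below is minimal and assumes two things: the '`/`' that led into this state has already been consumed, and the caller keeps an index into the input. The function name and the `(state, position)` return convention are illustrative, not part of the spec.

```python
from typing import Tuple


def void_tag_state(text: str, pos: int) -> Tuple[str, int]:
    """Process the character at text[pos] in the void tag state.

    Returns the next state and the index of the next character to
    process. Only '>' is consumed; anything else is left unconsumed
    for the before attribute name state. Running off the end of the
    input is treated like "anything else" here, purely for simplicity.
    """
    if pos < len(text) and text[pos] == ">":
        return "after void tag", pos + 1
    return "before attribute name", pos
```

In `<br/>` the '`/`' is at index 3, so `void_tag_state("<br/>", 4)` returns `("after void tag", 5)`. In `<a/ href=x>` the '`/`' at index 2 is followed by a space. The call with `pos=3` therefore returns `("before attribute name", 3)`, and the stray '`/`' produces no token.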
(...skipping 137 matching lines...)
@@ -546,20 +553,30 @@
 
 If the tag token was a start tag token and the tag name was
 '```script```', then and switch to the **script raw data** state.
 
 If the tag token was a start tag token and the tag name was
 '```style```', then and switch to the **style raw data** state.
 
 Otherwise, switch to the **data** state.
 
 
+### **After void tag** state ###
+
+Emit the tag token.
+
+If the tag token is a start tag token, emit an end tag token with the
+same tag name.
+
+Switch to the **data** state.
+
+
 ### **Comment start 1** state ###
 
 If the current character is...
 
 * '```-```': Consume the character and switch to the **comment start
   2** state.
 
 * '```>```': Emit character tokens for '```<!>```'. Consume the
   current character. Switch to the **data** state.
 
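The new **after void tag** state gives `/>` its meaning. A start tag written as `<img src="a.png"/>` yields the start tag followed by a matching end tag. This is presumably why the tree construction stage no longer refers to a _selfClosing_ flag. The token classes and `after_void_tag` below are hypothetical names for a minimal sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class StartTag:
    name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class EndTag:
    name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)


def after_void_tag(tag: Union[StartTag, EndTag], emitted: list) -> str:
    """Emit the tag, plus a synthetic end tag for start tags.

    Returns the name of the next tokeniser state.
    """
    emitted.append(tag)
    if isinstance(tag, StartTag):
        # Same tag name, no attributes.
        emitted.append(EndTag(tag.name))
    return "data"
```

An end tag written as `</p/>`, if it reaches this state, is emitted unchanged, with no extra token.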
(...skipping 294 matching lines...)
@@ -860,14 +877,28 @@
 
 * Any other character in the range '```0```'..'```9```',
   '```a```'..'```f```', '```A```'..'```F```': Consume the character
   and stay in this state.
 
 * Anything else: Run the _emitting operation_ for all but the last
   character in _raw value_, and switch to the **data state** without
   consuming the current character.
 
 
-Tree construction
------------------
+Token cleanup stage
+-------------------
 
-To construct a node tree from a _sequence of tokens_ and a document _document_:
+Replace each sequence of character tokens with a single string token
+whose value is the concatenation of all the characters in the
+character tokens.
+
+For each start tag token, remove all but the first name/value pair for
+each name (i.e. remove duplicate attributes, keeping only the first
+one).
+
+For each end tag token, remove the attributes entirely.
+
+If the token is a start tag token, notify the JavaScript token stream
+callback of the token.
+
+Then, pass the tokens to the tree construction stage.
+
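The token cleanup rules can be sketched as a single pass over the token stream. In this minimal Python version, the token classes, `clean_up`, and the `on_start_tag` parameter are all hypothetical. `on_start_tag` stands in for the JavaScript token stream callback, whose real interface is not described here.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Tuple, Union


@dataclass
class CharToken:
    value: str  # exactly one character


@dataclass
class StringToken:
    value: str


@dataclass
class StartTag:
    name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class EndTag:
    name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)


def clean_up(
    tokens: Iterable[Union[CharToken, StartTag, EndTag]],
    on_start_tag: Callable[[StartTag], None] = lambda tag: None,
) -> Iterator[Union[StringToken, StartTag, EndTag]]:
    pending: List[str] = []  # characters of the current run
    for token in tokens:
        if isinstance(token, CharToken):
            pending.append(token.value)
            continue
        if pending:  # a tag ends the current run of character tokens
            yield StringToken("".join(pending))
            pending = []
        if isinstance(token, StartTag):
            # Keep only the first name/value pair for each name.
            seen = set()
            kept = []
            for name, value in token.attributes:
                if name not in seen:
                    seen.add(name)
                    kept.append((name, value))
            token = StartTag(token.name, kept)
            on_start_tag(token)
        else:
            token = EndTag(token.name)  # end tags lose all attributes
        yield token
    if pending:  # flush a trailing run of character tokens
        yield StringToken("".join(pending))
```

Because `clean_up` is a generator, the callback fires as the next stage pulls each start tag rather than all at once. Whether that matches the intended callback timing is not specified in this diff.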
@@ -874,26 +905,35 @@
+
+Tree construction stage
+-----------------------
+
+To construct a node tree from a _sequence of tokens_ and a document
+_document_:
 
 1. Initialize the _stack of open nodes_ to be _document_.
-2. Consider each token _token_ in the _sequence of tokens_ in turn.
-   - If _token_ is a text token,
-      1. Create a text node _node_ with character data _token.data_.
+2. Consider each token _token_ in the _sequence of tokens_ in turn, as
+   follows. If a token is to be skipped, then jump straight to the
+   next token, without doing any more work with the skipped token.
+   - If _token_ is a string token,
+      1. If the value of the token contains only U+0020 and U+000A
+         characters, and there is no ```t``` element on the _stack of
+         open nodes_, then skip the token.
+      2. Create a text node _node_ whose character data is the value of
+         the token.
+      3. Append _node_ to the top node in the _stack of open nodes_.
+   - If _token_ is a start tag token,
+      1. Create an element _node_ with tag name and attributes given by
+         the token.
       2. Append _node_ to the top node in the _stack of open nodes_.
-   - If _token_ is a start tag token,
-      1. Create an element _node_ with tag name _token.tagName_ and attributes
-         _token.attributes_.
-      2. Append _node_ to the top node in the _stack of open nodes_.
-      3. If the _token.selfClosing_ flag is not set, push _node_ onto the
-         _stack of open elements_.
-      4. If _token.tagName_ is _script_, TODO: Execute the script.
-   - If _token_ is an end tag token,
-      1. If the _stack of open nodes_ contains a node whose _tagName_ is
-         _token.tagName_,
-         - Pop nodes from the _stack of open nodes_ until a node with
-           a _tagName_ equal to _token.tagName_ has been popped.
-      2. Otherwise, ignore _token_.
-   - If _token_ is a comment token,
-      1. Ignore _token_.
-   - If _token_ is an EOF token,
-      1. Pop all the nodes from the _stack of open nodes_.
-      2. Signal _document_ that parsing is complete.
-
-TODO(ianh): &lt;template>, &lt;t>
+   - If _token_ is an end tag token:
+      1. Let _node_ be the topmost node in the _stack of open nodes_
+         whose tag name is the same as the token's tag name, if any. If
+         there isn't one, skip this token.
+      2. If there's a ```template``` element in the _stack of open
+         nodes_ above _node_, then skip this token.
+      3. Pop nodes from the _stack of open nodes_ until _node_ has been
+         popped.
+      4. If _node_'s tag name is ```script```, then yield until there
+         are no pending import loads, then execute the script given by
+         the element's contents.
+3. Yield until there are no pending import loads.
+3. Fire a ```load``` event at the _parsing context_ object.
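The following is a minimal Python sketch of the new tree construction rules. Two points are assumptions rather than text from the patch:

- The new start tag rule no longer says to push _node_ onto the _stack of open nodes_; the old rule did, unless the tag was self-closing. An end tag can only match an open node that was pushed, so the sketch pushes every element.
- Script execution, waiting for pending import loads, and the final `load` event are omitted.

`Node`, the token classes, and `build_tree` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class StringToken:
    value: str


@dataclass
class StartTag:
    name: str
    attributes: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class EndTag:
    name: str


@dataclass
class Node:
    tag: Optional[str] = None   # None for the document and for text nodes
    text: Optional[str] = None  # character data, text nodes only
    attributes: List[Tuple[str, str]] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def build_tree(tokens) -> Node:
    document = Node()
    stack = [document]  # the stack of open nodes; the top is stack[-1]
    for token in tokens:
        if isinstance(token, StringToken):
            blank = all(c in " \n" for c in token.value)  # only U+0020/U+000A
            if blank and not any(n.tag == "t" for n in stack):
                continue  # skip the token
            stack[-1].children.append(Node(text=token.value))
        elif isinstance(token, StartTag):
            node = Node(tag=token.name, attributes=list(token.attributes))
            stack[-1].children.append(node)
            stack.append(node)  # assumption: the patch text omits this push
        elif isinstance(token, EndTag):
            # Topmost open node with the same tag name. Index 0 is the
            # document, which has no tag name, so it is never matched.
            index = next(
                (i for i in range(len(stack) - 1, 0, -1)
                 if stack[i].tag == token.name),
                None,
            )
            if index is None:
                continue  # no such node: skip the token
            if any(n.tag == "template" for n in stack[index + 1:]):
                continue  # a template is open above it: skip the token
            del stack[index:]  # pop until that node has been popped
            # Script execution and import loading are not modelled.
    return document
```

Take the token stream for `<div>\n  <t> </t><template><p></div></p></template></div>`. It produces a `div` that contains a `t` and a `template`. The `t` keeps its single-space text node, because a `t` is open when that text arrives. The `template` contains a `p`. The leading `\n  ` run is dropped. The first `</div>` is skipped, because a `template` sits above the `div` on the stack at that point.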