sky/specs/parsing.md - Issue 677533004: Update parser to support <t>, <template>, /> syntax

Side by Side Diff: sky/specs/parsing.md

Issue 677533004: Update parser to support <t>, <template>, /> syntax (Closed) Base URL: https://github.com/domokit/mojo.git@master

Patch Set: Specs: parser updates Created 6 years, 2 months ago

OLD	NEW
1 Parsing	1 Parsing

2 =======	2 =======

3	3

4 Parsing in Sky is a strict pipeline consisting of four stages:	4 Parsing in Sky is a strict pipeline consisting of five stages:

5	5

6 - decoding, which converts incoming bytes into Unicode characters	6 - decoding, which converts incoming bytes into Unicode characters

7 using UTF-8	7 using UTF-8.

8	8

9 - normalising, which converts certain sequences of characters	9 - normalising, which manipulates the sequence of characters.

10	10

11 - tokenising, which converts these characters into tokens	11 - tokenising, which converts these characters into three kinds of

	12 tokens: character tokens, start tag tokens, and end tag tokens.

	13 Character tokens have a single character value. Tag tokens have a

	14 tag name, and a list of name/value pairs known as attributes.

12	15

13 - tree construction, which converts these tokens into a tree of nodes	16 - token cleanup, which converts sequences of character tokens into

	17 string tokens, and removes duplicate attributes in tag tokens.

	18

	19 - tree construction, which converts these tokens into a tree of nodes.

14	20

15 Later stages cannot affect earlier stages.	21 Later stages cannot affect earlier stages.

16	22

17 When a sequence of bytes is to be parsed, there is always a defined	23 When a sequence of bytes is to be parsed, there is always a defined

18 _parsing context_, which is either "application" or "module".	24 _parsing context_, which is either an Application object or a Module

	25 object.

19	26

20	27

21 Decoding stage	28 Decoding stage

22 --------------	29 --------------

23	30

24 To decode a sequence of bytes _bytes_ for parsing, the [UTF-8	31 To decode a sequence of bytes _bytes_ for parsing, the [UTF-8

25 decoder](https://encoding.spec.whatwg.org/#utf-8-decoder) must be used	32 decoder](https://encoding.spec.whatwg.org/#utf-8-decoder) must be used

26 to transform _bytes_ into a sequence of characters _characters_.	33 to transform _bytes_ into a sequence of characters _characters_.

27	34

28 This sequence must then be passed to the normalisation stage.	35 This sequence must then be passed to the normalisation stage.

(...skipping 18 matching lines...)
47 Tokenisation stage	54 Tokenisation stage

48 ------------------	55 ------------------

49	56

50 To tokenise a sequence of characters, a state machine is used.	57 To tokenise a sequence of characters, a state machine is used.

51	58

52 Initially, the state machine must begin in the signature state.	59 Initially, the state machine must begin in the signature state.

53	60

54 Each character in turn must be processed according to the rules of the	61 Each character in turn must be processed according to the rules of the

55 state at the time the character is processed. A character is processed	62 state at the time the character is processed. A character is processed

56 once it has been _consumed_. This produces a stream of tokens; the	63 once it has been _consumed_. This produces a stream of tokens; the

57 tokens must be passed to the tree construction stage.	64 tokens must be passed to the token cleanup stage.

58	65

59 When the last character is consumed, the tokeniser ends.	66 When the last character is consumed, the tokeniser ends.

60	67

61	68

62 ### Expecting a string ###	69 ### Expecting a string ###

63	70

64 When the user agent is to _expect a string_, it must run these steps:	71 When the user agent is to _expect a string_, it must run these steps:

65	72

66 1. Let _expectation_ be the string to expect. When this string is	73 1. Let _expectation_ be the string to expect. When this string is

67 indexed, the first character has index 0.	74 indexed, the first character has index 0.

(...skipping 10 matching lines...)
78	85

79 6. Switch to the expect a string state.	86 6. Switch to the expect a string state.

80	87

81	88

82 ### Tokeniser states ###	89 ### Tokeniser states ###

83	90

84 #### Signature state ####	91 #### Signature state ####

85	92

86 If the current character is...	93 If the current character is...

87	94

88 * '```#```': If the _parsing context_ is not "application", switch to	95 * '```#```': If the _parsing context_ is not an Application, switch to

89 the _failed signature_ state. Otherwise, expect the string	96 the _failed signature_ state. Otherwise, expect the string

90 "```#!mojo mojo:sky```", with _after signature_ as the _success_	97 "```#!mojo mojo:sky```", with _after signature_ as the _success_

91 state and _failed signature_ as the _failure_ state.	98 state and _failed signature_ as the _failure_ state.

92	99

93 * '```S```': If the _parsing context_ is not "module", switch to the	100 * '```S```': If the _parsing context_ is not a Module, switch to the

94 _failed signature_ state. Otherwise, expect the string	101 _failed signature_ state. Otherwise, expect the string

95 "```SKY MODULE```", with _after signature_ as the _success_ state,	102 "```SKY MODULE```", with _after signature_ as the _success_ state,

96 and _failed signature_ as the _failure_ state.	103 and _failed signature_ as the _failure_ state.

97	104

98 * Anything else: Jump to the failed signature state.	105 * Anything else: Jump to the failed signature state.
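A rough sketch of the signature check described by the NEW column above, for reviewers who want to see the branching at a glance. It only models which signature is expected for which parsing context and whether the input begins with it. The name `check_signature`, the use of the strings `"application"` and `"module"` to stand in for the Application and Module objects, and the treatment of empty input as a failure are all assumptions of this sketch; whatever the after signature state requires next is not covered.

```python
APPLICATION_SIGNATURE = "#!mojo mojo:sky"
MODULE_SIGNATURE = "SKY MODULE"


def check_signature(characters: str, context: str) -> bool:
    """Return True if the input would leave the signature state successfully.

    `context` is a hypothetical stand-in for the parsing context object:
    "application" for an Application, "module" for a Module.
    """
    if not characters:
        # Assumption: empty input is treated as a failed signature.
        return False
    first = characters[0]
    if first == "#" and context == "application":
        expectation = APPLICATION_SIGNATURE
    elif first == "S" and context == "module":
        expectation = MODULE_SIGNATURE
    else:
        # Wrong context for this character, or any other character.
        return False
    # Stand-in for the "expect a string" steps: compare character by character.
    return characters.startswith(expectation)
```

For example, `check_signature("SKY MODULE\n", "module")` succeeds, while the same input with an `"application"` context fails.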

99	106

100	107

101 #### Expect a string state ####	108 #### Expect a string state ####

102	109

103 If the current character is not the same as the <i>index</i>th character in	110 If the current character is not the same as the <i>index</i>th character in

(...skipping 284 matching lines...)
388 tag** state.	395 tag** state.

389	396

390 * Anything else: Append the current character to the tag name, and	397 * Anything else: Append the current character to the tag name, and

391 consume the current character. Stay in this state.	398 consume the current character. Stay in this state.

392	399

393	400

394 ### Void tag state ###	401 ### Void tag state ###

395	402

396 If the current character is...	403 If the current character is...

397	404

398 * '```>```': Consume the current character. Switch to the **after	405 * '```>```': Consume the current character. Switch to the **after void

399 tag** state.	406 tag** state.

400	407

401 * Anything else: Switch to the before attribute name state without	408 * Anything else: Switch to the before attribute name state without

402 consuming the current character.	409 consuming the current character.

403	410

404	411

405 ### Before attribute name state ###	412 ### Before attribute name state ###

406	413

407 If the current character is...	414 If the current character is...

408	415

(...skipping 137 matching lines...)
546	553

547 If the tag token was a start tag token and the tag name was	554 If the tag token was a start tag token and the tag name was

548 '```script```', then switch to the script raw data state.	555 '```script```', then switch to the script raw data state.

549	556

550 If the tag token was a start tag token and the tag name was	557 If the tag token was a start tag token and the tag name was

551 '```style```', then switch to the style raw data state.	558 '```style```', then switch to the style raw data state.

552	559

553 Otherwise, switch to the data state.	560 Otherwise, switch to the data state.

554	561

555	562

	563 ### After void tag state ###

	564

	565 Emit the tag token.

	566

	567 If the tag token is a start tag token, emit an end tag token with the

	568 same tag name.

	569

	570 Switch to the data state.
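The new after void tag state amounts to emitting a matching end tag straight after a void start tag, so that `<foo/>` behaves like `<foo></foo>` downstream. A minimal sketch, assuming tag tokens are plain dictionaries with hypothetical `kind`, `name`, and `attributes` keys, and assuming the synthesised end tag carries no attributes:

```python
def after_void_tag(tag_token: dict) -> list:
    """Return the tokens emitted by the after void tag state, in order."""
    emitted = [tag_token]
    if tag_token["kind"] == "start":
        # A void start tag is immediately closed by an end tag of the same name.
        emitted.append({"kind": "end", "name": tag_token["name"], "attributes": []})
    return emitted
```

The tokeniser would then switch to the data state; that transition is outside this sketch.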

	571

	572

556 ### Comment start 1 state ###	573 ### Comment start 1 state ###

557	574

558 If the current character is...	575 If the current character is...

559	576

560 * '```-```': Consume the character and switch to the **comment start	577 * '```-```': Consume the character and switch to the **comment start

561 2** state.	578 2** state.

562	579

563 * '```>```': Emit character tokens for '```<!>```'. Consume the	580 * '```>```': Emit character tokens for '```<!>```'. Consume the

564 current character. Switch to the data state.	581 current character. Switch to the data state.

565	582

(...skipping 294 matching lines...)
860	877

861 * Any other character in the range '```0```'..'```9```',	878 * Any other character in the range '```0```'..'```9```',

862 '```a```'..'```f```', '```A```'..'```F```': Consume the character	879 '```a```'..'```f```', '```A```'..'```F```': Consume the character

863 and stay in this state.	880 and stay in this state.

864	881

865 * Anything else: Run the _emitting operation_ for all but the last	882 * Anything else: Run the _emitting operation_ for all but the last

866 character in _raw value_, and switch to the data state without	883 character in _raw value_, and switch to the data state without

867 consuming the current character.	884 consuming the current character.

868	885

869	886

870 Tree construction	887 Token cleanup stage

871 -----------------	888 -------------------

872	889

873 To construct a node tree from a _sequence of tokens_ and a document _document_:	890 Replace each sequence of character tokens with a single string token

	891 whose value is the concatenation of all the characters in the

	892 character tokens.

	893

	894 For each start tag token, remove all but the first name/value pair for

	895 each name (i.e. remove duplicate attributes, keeping only the first

	896 one).

	897

	898 For each end tag token, remove the attributes entirely.

	899

	900 If the token is a start tag token, notify the JavaScript token stream

	901 callback of the token.

	902

	903 Then, pass the tokens to the tree construction stage.
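The token cleanup stage added in this patch can be sketched as a single pass over the token stream. The class names (`CharToken`, `StringToken`, `TagToken`), the `cleanup` function, and the `on_start_tag` parameter standing in for the JavaScript token stream callback are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CharToken:
    value: str  # a single character


@dataclass
class StringToken:
    value: str


@dataclass
class TagToken:
    kind: str  # "start" or "end"
    name: str
    attributes: list = field(default_factory=list)  # (name, value) pairs


def cleanup(tokens, on_start_tag=None):
    """Coalesce character tokens and tidy the attributes of tag tokens."""
    out = []
    buffer = []

    def flush():
        # Turn the pending run of character tokens into one string token.
        if buffer:
            out.append(StringToken("".join(buffer)))
            buffer.clear()

    for token in tokens:
        if isinstance(token, CharToken):
            buffer.append(token.value)
            continue
        flush()
        if token.kind == "start":
            seen = set()
            unique = []
            for name, value in token.attributes:
                if name not in seen:  # keep only the first pair for each name
                    seen.add(name)
                    unique.append((name, value))
            token = TagToken("start", token.name, unique)
            if on_start_tag is not None:
                on_start_tag(token)  # stand-in for the token stream callback
        else:
            token = TagToken("end", token.name, [])  # end tags lose attributes
        out.append(token)
    flush()
    return out
```

The output of `cleanup` is what the tree construction stage would consume.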

	904

	905

	906 Tree construction stage

	907 -----------------------

	908

	909 To construct a node tree from a _sequence of tokens_ and a document

	910 _document_:

874	911

875 1. Initialize the _stack of open nodes_ to be _document_.	912 1. Initialize the _stack of open nodes_ to be _document_.

876 2. Consider each token _token_ in the _sequence of tokens_ in turn.	913 2. Consider each token _token_ in the _sequence of tokens_ in turn, as

877 - If _token_ is a text token,	914 follows. If a token is to be skipped, then jump straight to the

878 1. Create a text node _node_ with character data _token.data_.	915 next token, without doing any more work with the skipped token.

	916 - If _token_ is a string token,

	917 1. If the value of the token contains only U+0020 and U+000A

	918 characters, and there is no ```t``` element on the _stack of

	919 open nodes_, then skip the token.

	920 2. Create a text node _node_ whose character data is the value of

	921 the token.

	922 3. Append _node_ to the top node in the _stack of open nodes_.

	923 - If _token_ is a start tag token,

	924 1. Create an element _node_ with tag name and attributes given by

	925 the token.

879 2. Append _node_ to the top node in the _stack of open nodes_.	926 2. Append _node_ to the top node in the _stack of open nodes_.

880 - If _token_ is a start tag token,	927 - If _token_ is an end tag token:

881 1. Create an element _node_ with tag name _token.tagName_ and attributes	928 1. Let _node_ be the topmost node in the _stack of open nodes_

882 _token.attributes_.	929 whose tag name is the same as the token's tag name, if any. If

883 2. Append _node_ to the top node in the _stack of open nodes_.	930 there isn't one, skip this token.

884 3. If the _token.selfClosing_ flag is not set, push _node_ onto the	931 2. If there's a ```template``` element in the _stack of open

885 _stack of open elements_.	932 nodes_ above _node_, then skip this token.

886 4. If _token.tagName_ is _script_, TODO: Execute the script.	933 3. Pop nodes from the _stack of open nodes_ until _node_ has been

887 - If _token_ is an end tag token,	934 popped.

888 1. If the _stack of open nodes_ contains a node whose _tagName_ is	935 4. If _node_'s tag name is ```script```, then yield until there

889 _token.tagName_,	936 are no pending import loads, then execute the script given by

890 - Pop nodes from the _stack of open nodes_ until a node with	937 the element's contents.

891 a _tagName_ equal to _token.tagName_ has been popped.	938 3. Yield until there are no pending import loads.

892 2. Otherwise, ignore _token_.	939 3. Fire a ```load``` event at the _parsing context_ object.

893 - If _token_ is a comment token,

894 1. Ignore _token_.

895 - If _token_ is an EOF token,

896 1. Pop all the nodes from the _stack of open nodes_.

897 2. Signal _document_ that parsing is complete.

898

899 TODO(ianh): <template>, <t>
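For reference, the NEW column's tree construction steps can be sketched as below. Two caveats. The new text does not show a step that pushes a start tag's element onto the _stack of open nodes_; the sketch assumes it is pushed, as step 3 of the old text did, because otherwise end tag handling could never find a matching node. Script execution, yielding for pending import loads, and firing the ```load``` event are left out. The `Node` class, `build_tree`, and the `(kind, value, attributes)` tuple shape are hypothetical.

```python
class Node:
    def __init__(self, tag_name=None, attributes=None, data=None):
        self.tag_name = tag_name  # None for the document and for text nodes
        self.attributes = attributes or []
        self.data = data  # character data, for text nodes only
        self.children = []


def build_tree(tokens, document):
    """Build a node tree under `document` from (kind, value, attributes) tuples."""
    stack = [document]  # the stack of open nodes
    for kind, value, attributes in tokens:
        if kind == "string":
            only_whitespace = all(c in " \n" for c in value)
            inside_t = any(n.tag_name == "t" for n in stack)
            if only_whitespace and not inside_t:
                continue  # skip whitespace-only text outside a t element
            stack[-1].children.append(Node(data=value))
        elif kind == "start":
            node = Node(tag_name=value, attributes=attributes)
            stack[-1].children.append(node)
            stack.append(node)  # assumption: not stated in the new text
        elif kind == "end":
            # Find the topmost open node with a matching tag name, if any.
            index = next(
                (i for i in range(len(stack) - 1, -1, -1)
                 if stack[i].tag_name == value),
                None,
            )
            if index is None:
                continue  # no matching open node: skip the token
            if any(n.tag_name == "template" for n in stack[index + 1:]):
                continue  # a template element sits above the match: skip
            del stack[index:]  # pop nodes until the matching node is popped
    return document
```

Note how an end tag inside a ```template``` cannot close an element outside it, which appears to be the point of step 2 in the end tag handling.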
