| OLD | NEW |
| (Empty) |
| 1 Parsing | |
| 2 ======= | |
| 3 | |
| 4 Parsing in Sky is a strict pipeline consisting of five stages: | |
| 5 | |
| 6 - decoding, which converts incoming bytes into Unicode characters | |
| 7 using UTF-8. | |
| 8 | |
| 9 - normalising, which rewrites line breaks to U+000A and replaces U+0000 characters. | |
| 10 | |
| 11 - tokenising, which converts these characters into four kinds of | |
| 12 tokens: character tokens, start tag tokens, end tag tokens, and | |
| 13 automatic end tag tokens. Character tokens have a single character | |
| 14 value. Start and end tag tokens have a tag name, and a list of | |
| 15 name/value pairs known as attributes. | |
| 16 | |
| 17 - token cleanup, which converts sequences of character tokens into | |
| 18 string tokens, and removes duplicate attributes in tag tokens. | |
| 19 | |
| 20 - tree construction, which converts these tokens into a tree of nodes. | |
| 21 | |
| 22 Later stages cannot affect earlier stages. | |
| 23 | |
| 24 When a sequence of bytes is to be parsed, there is always a defined | |
| 25 _parsing context_, which is either an Application object or a Module | |
| 26 object. | |
| 27 | |
| 28 | |
| 29 Decoding stage | |
| 30 -------------- | |
| 31 | |
| 32 To decode a sequence of bytes _bytes_ for parsing, the [utf-8 | |
| 33 decode](https://encoding.spec.whatwg.org/#utf-8-decode) algorithm must | |
| 34 be used to transform _bytes_ into a sequence of characters | |
| 35 _characters_. | |
| 36 | |
| 37 Note: The decoder strips a leading BOM, if any. | |
| 38 | |
| 39 This sequence must then be passed to the normalisation stage. | |
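
As a rough illustration of this stage, the sketch below leans on Python's built-in `utf-8-sig` codec, which drops a leading BOM, together with `errors="replace"` so that malformed bytes become U+FFFD. The name `decode_for_parsing` is hypothetical, and the Python codec is only assumed to approximate the referenced utf-8 decode algorithm; it may not agree with it on every malformed input.

```python
def decode_for_parsing(data: bytes) -> str:
    """Decode bytes as UTF-8, dropping a leading BOM if one is present.

    Malformed byte sequences are replaced with U+FFFD rather than raising.
    """
    return data.decode("utf-8-sig", errors="replace")
```

For example, `decode_for_parsing(b"\xef\xbb\xbfSKY MODULE\n")` yields the string `"SKY MODULE\n"` with no BOM character at the front.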
| 40 | |
| 41 | |
| 42 Normalisation stage | |
| 43 ------------------- | |
| 44 | |
| 45 To normalise a sequence of characters, apply the following rules: | |
| 46 | |
| 47 * Any U+000D character followed by a U+000A character must be removed. | |
| 48 | |
| 49 * Any U+000D character not followed by a U+000A character must be | |
| 50 converted to a U+000A character. | |
| 51 | |
| 52 * Any U+0000 character must be converted to a U+FFFD character. | |
| 53 | |
| 54 The converted sequence of characters must then be passed to the | |
| 55 tokenisation stage. | |
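
The three rules above can be sketched as a chain of string replacements. This is only an illustration; the function name `normalise` is an assumption and does not come from the specification.

```python
def normalise(characters: str) -> str:
    """Apply the three normalisation rules to a decoded character sequence."""
    # A U+000D followed by U+000A is removed, leaving just the U+000A.
    characters = characters.replace("\r\n", "\n")
    # Any remaining U+000D is not followed by U+000A, so it becomes U+000A.
    characters = characters.replace("\r", "\n")
    # U+0000 becomes U+FFFD.
    return characters.replace("\x00", "\ufffd")
```

The order matters: the CR LF pairs have to be handled before the lone CR characters, otherwise each pair would turn into two line breaks.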
| 56 | |
| 57 | |
| 58 Tokenisation stage | |
| 59 ------------------ | |
| 60 | |
| 61 To tokenise a sequence of characters, a state machine is used. | |
| 62 | |
| 63 Initially, the state machine must begin in the **signature** state. | |
| 64 | |
| 65 Each character in turn must be processed according to the rules of the | |
| 66 state at the time the character is processed. A character is processed | |
| 67 once it has been _consumed_. This produces a stream of tokens; the | |
| 68 tokens must be passed to the token cleanup stage. | |
| 69 | |
| 70 When the last character is consumed, the tokeniser ends. | |
| 71 | |
| 72 | |
| 73 ### Expecting a string ### | |
| 74 | |
| 75 When the user agent is to _expect a string_, it must run these steps: | |
| 76 | |
| 77 1. Let _expectation_ be the string to expect. When this string is | |
| 78 indexed, the first character has index 0. | |
| 79 | |
| 80 2. Assertion: The first character in _expectation_ is the current | |
| 81 character, and _expectation_ has more than one character. | |
| 82 | |
| 83 3. Consume the current character. | |
| 84 | |
| 85 4. Let _index_ be 1. | |
| 86 | |
| 87 5. Let _success_ and _failure_ be the states specified for success and | |
| 88 failure respectively. | |
| 89 | |
| 90 6. Switch to the **expect a string** state. | |
| 91 | |
| 92 | |
| 93 ### Tokeniser states ### | |
| 94 | |
| 95 #### **Signature** state #### | |
| 96 | |
| 97 If the current character is... | |
| 98 | |
| 99 * '``#``': If the _parsing context_ is not an Application, switch to | |
| 100 the **failed signature** state. Otherwise, expect the string | |
| 101 "``#!mojo mojo:sky``", with **after signature** as the _success_ | |
| 102 state and **failed signature** as the _failure_ state. | |
| 103 | |
| 104 * '``S``': If the _parsing context_ is not a Module, switch to the | |
| 105 **failed signature** state. Otherwise, expect the string | |
| 106 "``SKY MODULE``", with **after signature** as the _success_ state | |
| 107 and **failed signature** as the _failure_ state. | |
| 108 | |
| 109 * Anything else: Switch to the **failed signature** state. | |
| 110 | |
| 111 | |
| 112 #### **Expect a string** state #### | |
| 113 | |
| 114 If the current character is not the same as the <i>index</i>th character in | |
| 115 _expectation_, then switch to the _failure_ state. | |
| 116 | |
| 117 Otherwise, consume the character, and increase _index_. If _index_ is | |
| 118 now equal to the length of _expectation_, then switch to the _success_ | |
| 119 state. | |
| 120 | |
| 121 | |
| 122 #### **After signature** state #### | |
| 123 | |
| 124 If the current character is... | |
| 125 | |
| 126 * U+000A: Consume the character and switch to the **data** state. | |
| 127 * U+0020: Consume the character and switch to the **consume rest of | |
| 128 line** state. | |
| 129 * Anything else: Switch to the **failed signature** state. | |
| 130 | |
| 131 | |
| 132 #### **Failed signature** state #### | |
| 133 | |
| 134 Stop parsing. No tokens are emitted. The file is not a Sky file. | |
| 135 | |
| 136 | |
| 137 #### **Consume rest of line** state #### | |
| 138 | |
| 139 If the current character is... | |
| 140 | |
| 141 * U+000A: Consume the character and switch to the **data** state. | |
| 142 * Anything else: Consume the character and stay in this state. | |
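
Taken together, the signature-related states amount to a prefix check followed by skipping the rest of the first line. The sketch below collapses them into one function rather than modelling each state. The names `SIGNATURES` and `data_start`, and the use of the strings `"application"` and `"module"` for the _parsing context_, are assumptions made for illustration. Returning `None` when the input ends right after the signature is also an assumption, since the text above does not cover that case.

```python
SIGNATURES = {"application": "#!mojo mojo:sky", "module": "SKY MODULE"}


def data_start(chars: str, context: str):
    """Return the index where the data state begins, or None if the
    signature check fails (the "failed signature" state)."""
    expectation = SIGNATURES[context]
    if not chars.startswith(expectation):
        return None
    index = len(expectation)
    if index >= len(chars):
        return None  # assumption: input ended before the signature line did
    if chars[index] == "\n":
        return index + 1  # after signature -> data
    if chars[index] == " ":
        # consume rest of line: skip everything up to and including U+000A
        newline = chars.find("\n", index)
        return len(chars) if newline == -1 else newline + 1
    return None
```

With this sketch, `data_start("SKY MODULE v1\n<a>", "module")` points at the `<`, while the same input parsed with an `"application"` context fails.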
| 143 | |
| 144 | |
| 145 #### **Data** state #### | |
| 146 | |
| 147 If the current character is... | |
| 148 | |
| 149 * '``<``': Consume the character and switch to the **tag open** state. | |
| 150 | |
| 151 * '``&``': Consume the character and switch to the **character | |
| 152 reference** state, with the _return state_ set to the **data** | |
| 153 state, and the _emitting operation_ being to emit a character token | |
| 154 for the given character. | |
| 155 | |
| 156 * Anything else: Emit the current input character as a character | |
| 157 token. Consume the character. Stay in this state. | |
| 158 | |
| 159 | |
| 160 #### **Script raw data** state #### | |
| 161 | |
| 162 If the current character is... | |
| 163 | |
| 164 * '``<``': Consume the character and switch to the **script raw | |
| 165 data: close 1** state. | |
| 166 | |
| 167 * Anything else: Emit the current input character as a character | |
| 168 token. Consume the character. Stay in this state. | |
| 169 | |
| 170 | |
| 171 #### **Script raw data: close 1** state #### | |
| 172 | |
| 173 If the current character is... | |
| 174 | |
| 175 * '``/``': Consume the character and switch to the **script raw | |
| 176 data: close 2** state. | |
| 177 | |
| 178 * Anything else: Emit '``<``' character tokens. Switch to the | |
| 179 **script raw data** state without consuming the character. | |
| 180 | |
| 181 | |
| 182 #### **Script raw data: close 2** state #### | |
| 183 | |
| 184 If the current character is... | |
| 185 | |
| 186 * '``s``': Consume the character and switch to the **script raw | |
| 187 data: close 3** state. | |
| 188 | |
| 189 * Anything else: Emit '``</``' character tokens. Switch to the | |
| 190 **script raw data** state without consuming the character. | |
| 191 | |
| 192 | |
| 193 #### **Script raw data: close 3** state #### | |
| 194 | |
| 195 If the current character is... | |
| 196 | |
| 197 * '``c``': Consume the character and switch to the **script raw | |
| 198 data: close 4** state. | |
| 199 | |
| 200 * Anything else: Emit '``</s``' character tokens. Switch to the | |
| 201 **script raw data** state without consuming the character. | |
| 202 | |
| 203 | |
| 204 #### **Script raw data: close 4** state #### | |
| 205 | |
| 206 If the current character is... | |
| 207 | |
| 208 * '``r``': Consume the character and switch to the **script raw | |
| 209 data: close 5** state. | |
| 210 | |
| 211 * Anything else: Emit '``</sc``' character tokens. Switch to the | |
| 212 **script raw data** state without consuming the character. | |
| 213 | |
| 214 | |
| 215 #### **Script raw data: close 5** state #### | |
| 216 | |
| 217 If the current character is... | |
| 218 | |
| 219 * '``i``': Consume the character and switch to the **script raw | |
| 220 data: close 6** state. | |
| 221 | |
| 222 * Anything else: Emit '``</scr``' character tokens. Switch to the | |
| 223 **script raw data** state without consuming the character. | |
| 224 | |
| 225 | |
| 226 #### **Script raw data: close 6** state #### | |
| 227 | |
| 228 If the current character is... | |
| 229 | |
| 230 * '``p``': Consume the character and switch to the **script raw | |
| 231 data: close 7** state. | |
| 232 | |
| 233 * Anything else: Emit '``</scri``' character tokens. Switch to the | |
| 234 **script raw data** state without consuming the character. | |
| 235 | |
| 236 | |
| 237 #### **Script raw data: close 7** state #### | |
| 238 | |
| 239 If the current character is... | |
| 240 | |
| 241 * '``t``': Consume the character and switch to the **script raw | |
| 242 data: close 8** state. | |
| 243 | |
| 244 * Anything else: Emit '``</scrip``' character tokens. Switch to the | |
| 245 **script raw data** state without consuming the character. | |
| 246 | |
| 247 | |
| 248 #### **Script raw data: close 8** state #### | |
| 249 | |
| 250 If the current character is... | |
| 251 | |
| 252 * U+0020, U+000A, '``/``', '``>``': Create an end tag token, and | |
| 253 let its tag name be the string '``script``'. Switch to the | |
| 254 **before attribute name** state without consuming the character. | |
| 255 | |
| 256 * Anything else: Emit '``</script``' character tokens. Switch to the | |
| 257 **script raw data** state without consuming the character. | |
| 258 | |
| 259 | |
| 260 #### **Style raw data** state #### | |
| 261 | |
| 262 If the current character is... | |
| 263 | |
| 264 * '``<``': Consume the character and switch to the **style raw | |
| 265 data: close 1** state. | |
| 266 | |
| 267 * Anything else: Emit the current input character as a character | |
| 268 token. Consume the character. Stay in this state. | |
| 269 | |
| 270 | |
| 271 #### **Style raw data: close 1** state #### | |
| 272 | |
| 273 If the current character is... | |
| 274 | |
| 275 * '``/``': Consume the character and switch to the **style raw | |
| 276 data: close 2** state. | |
| 277 | |
| 278 * Anything else: Emit '``<``' character tokens. Switch to the | |
| 279 **style raw data** state without consuming the character. | |
| 280 | |
| 281 | |
| 282 #### **Style raw data: close 2** state #### | |
| 283 | |
| 284 If the current character is... | |
| 285 | |
| 286 * '``s``': Consume the character and switch to the **style raw | |
| 287 data: close 3** state. | |
| 288 | |
| 289 * Anything else: Emit '``</``' character tokens. Switch to the | |
| 290 **style raw data** state without consuming the character. | |
| 291 | |
| 292 | |
| 293 #### **Style raw data: close 3** state #### | |
| 294 | |
| 295 If the current character is... | |
| 296 | |
| 297 * '``t``': Consume the character and switch to the **style raw | |
| 298 data: close 4** state. | |
| 299 | |
| 300 * Anything else: Emit '``</s``' character tokens. Switch to the | |
| 301 **style raw data** state without consuming the character. | |
| 302 | |
| 303 | |
| 304 #### **Style raw data: close 4** state #### | |
| 305 | |
| 306 If the current character is... | |
| 307 | |
| 308 * '``y``': Consume the character and switch to the **style raw | |
| 309 data: close 5** state. | |
| 310 | |
| 311 * Anything else: Emit '``</st``' character tokens. Switch to the | |
| 312 **style raw data** state without consuming the character. | |
| 313 | |
| 314 | |
| 315 #### **Style raw data: close 5** state #### | |
| 316 | |
| 317 If the current character is... | |
| 318 | |
| 319 * '``l``': Consume the character and switch to the **style raw | |
| 320 data: close 6** state. | |
| 321 | |
| 322 * Anything else: Emit '``</sty``' character tokens. Switch to the | |
| 323 **style raw data** state without consuming the character. | |
| 324 | |
| 325 | |
| 326 #### **Style raw data: close 6** state #### | |
| 327 | |
| 328 If the current character is... | |
| 329 | |
| 330 * '``e``': Consume the character and switch to the **style raw | |
| 331 data: close 7** state. | |
| 332 | |
| 333 * Anything else: Emit '``</styl``' character tokens. Switch to the | |
| 334 **style raw data** state without consuming the character. | |
| 335 | |
| 336 | |
| 337 #### **Style raw data: close 7** state #### | |
| 338 | |
| 339 If the current character is... | |
| 340 | |
| 341 * U+0020, U+000A, '``/``', '``>``': Create an end tag token, and | |
| 342 let its tag name be the string '``style``'. Switch to the | |
| 343 **before attribute name** state without consuming the character. | |
| 344 | |
| 345 * Anything else: Emit '``</style``' character tokens. Switch to the | |
| 346 **style raw data** state without consuming the character. | |
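
The script and style raw data states both look for `</script` or `</style` followed by U+0020, U+000A, `/` or `>`, and treat everything before that as character data. A compact way to picture the result is a regular expression with a lookahead, as in the sketch below. The helper name `split_raw_data` is hypothetical, and the sketch ignores what happens when the input ends in the middle of a partial match.

```python
import re


def split_raw_data(chars: str, start: int, tag_name: str):
    """Return (raw_text, index) where raw_text is the character data and
    index is the position of the character that follows the end tag name
    (the point where the before attribute name state takes over)."""
    # The lookahead checks the terminating character without consuming it.
    terminator = re.compile("</" + re.escape(tag_name) + r"(?=[ \n/>])")
    match = terminator.search(chars, start)
    if match is None:
        return chars[start:], len(chars)
    return chars[start:match.start()], match.end()
```

Note that the match is case-sensitive and that a near miss such as `</scripts>` stays part of the raw text, matching the "Anything else" branches above.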
| 347 | |
| 348 | |
| 349 #### **Tag open** state #### | |
| 350 | |
| 351 If the current character is... | |
| 352 | |
| 353 * '``!``': Consume the character and switch to the **comment start | |
| 354 1** state. | |
| 355 | |
| 356 * '``/``': Consume the character and switch to the **close tag** | |
| 357 state. | |
| 358 | |
| 359 * '``>``': Emit character tokens for '``<>``'. Consume the current | |
| 360 character. Switch to the **data** state. | |
| 361 | |
| 362 * '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``', | |
| 363 '``-``', '``_``', '``.``': Create a start tag token, let its | |
| 364 tag name be the current character, consume the current character and | |
| 365 switch to the **tag name** state. | |
| 366 | |
| 367 * Anything else: Emit the character token for '``<``'. Switch to the | |
| 368 **data** state without consuming the current character. | |
| 369 | |
| 370 | |
| 371 #### **Close tag** state #### | |
| 372 | |
| 373 If the current character is... | |
| 374 | |
| 375 * '``>``': Emit an automatic end tag token. Switch to the **data** | |
| 376 state. | |
| 377 | |
| 378 * '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``', | |
| 379 '``-``', '``_``', '``.``': Create an end tag token, let its | |
| 380 tag name be the current character, consume the current character and | |
| 381 switch to the **tag name** state. | |
| 382 | |
| 383 * Anything else: Emit the character tokens for '``</``'. Switch to | |
| 384 the **data** state without consuming the current character. | |
| 385 | |
| 386 | |
| 387 #### **Tag name** state #### | |
| 388 | |
| 389 If the current character is... | |
| 390 | |
| 391 * U+0020, U+000A: Consume the current character. Switch to the | |
| 392 **before attribute name** state. | |
| 393 | |
| 394 * '``/``': Consume the current character. Switch to the **void tag** | |
| 395 state. | |
| 396 | |
| 397 * '``>``': Consume the current character. Switch to the **after | |
| 398 tag** state. | |
| 399 | |
| 400 * Anything else: Append the current character to the tag name, and | |
| 401 consume the current character. Stay in this state. | |
| 402 | |
| 403 | |
| 404 #### **Void tag** state #### | |
| 405 | |
| 406 If the current character is... | |
| 407 | |
| 408 * '``>``': Consume the current character. Switch to the **after void | |
| 409 tag** state. | |
| 410 | |
| 411 * Anything else: Switch to the **before attribute name** state without | |
| 412 consuming the current character. | |
| 413 | |
| 414 | |
| 415 #### **Before attribute name** state #### | |
| 416 | |
| 417 If the current character is... | |
| 418 | |
| 419 * U+0020, U+000A: Consume the current character. Stay in this state. | |
| 420 | |
| 421 * '``/``': Consume the current character. Switch to the **void tag** | |
| 422 state. | |
| 423 | |
| 424 * '``>``': Consume the current character. Switch to the **after | |
| 425 tag** state. | |
| 426 | |
| 427 * Anything else: Create a new attribute in the tag token, and set its | |
| 428 name to the current character and its value to the empty string. | |
| 429 Consume the current character. Switch to the **attribute name** | |
| 430 state. | |
| 431 | |
| 432 | |
| 433 #### **Attribute name** state #### | |
| 434 | |
| 435 If the current character is... | |
| 436 | |
| 437 * U+0020, U+000A: Consume the current character. Switch to the **after | |
| 438 attribute name** state. | |
| 439 | |
| 440 * '``/``': Consume the current character. Switch to the **void tag** | |
| 441 state. | |
| 442 | |
| 443 * '``=``': Consume the current character. Switch to the **before | |
| 444 attribute value** state. | |
| 445 | |
| 446 * '``>``': Consume the current character. Switch to the **after | |
| 447 tag** state. | |
| 448 | |
| 449 * Anything else: Append the current character to the most recently | |
| 450 added attribute's name, and consume the current character. Stay in | |
| 451 this state. | |
| 452 | |
| 453 | |
| 454 #### **After attribute name** state #### | |
| 455 | |
| 456 If the current character is... | |
| 457 | |
| 458 * U+0020, U+000A: Consume the current character. Stay in this state. | |
| 459 | |
| 460 * '``/``': Consume the current character. Switch to the **void tag** | |
| 461 state. | |
| 462 | |
| 463 * '``=``': Consume the current character. Switch to the **before | |
| 464 attribute value** state. | |
| 465 | |
| 466 * '``>``': Consume the current character. Switch to the **after | |
| 467 tag** state. | |
| 468 | |
| 469 * Anything else: Create a new attribute in the tag token, and set its | |
| 470 name to the current character and its value to the empty string. | |
| 471 Consume the current character. Switch to the **attribute name** | |
| 472 state. | |
| 473 | |
| 474 | |
| 475 #### **Before attribute value** state #### | |
| 476 | |
| 477 If the current character is... | |
| 478 | |
| 479 * U+0020, U+000A: Consume the current character. Stay in this state. | |
| 480 | |
| 481 * '``>``': Consume the current character. Switch to the **after | |
| 482 tag** state. | |
| 483 | |
| 484 * '``'``': Consume the current character. Switch to the | |
| 485 **single-quoted attribute value** state. | |
| 486 | |
| 487 * '``"``': Consume the current character. Switch to the | |
| 488 **double-quoted attribute value** state. | |
| 489 | |
| 490 * Anything else: Switch to the **unquoted attribute value** state | |
| 491 without consuming the current character. | |
| 492 | |
| 493 | |
| 494 #### **Single-quoted attribute value** state #### | |
| 495 | |
| 496 If the current character is... | |
| 497 | |
| 498 * '``'``': Consume the current character. Switch to the | |
| 499 **before attribute name** state. | |
| 500 | |
| 501 * '``&``': Consume the character and switch to the **character | |
| 502 reference** state, with the _return state_ set to the | |
| 503 **single-quoted attribute value** state and the _emitting operation_ | |
| 504 being to append the given character to the value of the most | |
| 505 recently added attribute. | |
| 506 | |
| 507 * Anything else: Append the current character to the value of the most | |
| 508 recently added attribute. Consume the current character. Stay in | |
| 509 this state. | |
| 510 | |
| 511 | |
| 512 #### **Double-quoted attribute value** state #### | |
| 513 | |
| 514 If the current character is... | |
| 515 | |
| 516 * '``"``': Consume the current character. Switch to the | |
| 517 **before attribute name** state. | |
| 518 | |
| 519 * '``&``': Consume the character and switch to the **character | |
| 520 reference** state, with the _return state_ set to the | |
| 521 **double-quoted attribute value** state and the _emitting operation_ | |
| 522 being to append the given character to the value of the most | |
| 523 recently added attribute. | |
| 524 | |
| 525 * Anything else: Append the current character to the value of the most | |
| 526 recently added attribute. Consume the current character. Stay in | |
| 527 this state. | |
| 528 | |
| 529 | |
| 530 #### **Unquoted attribute value** state #### | |
| 531 | |
| 532 If the current character is... | |
| 533 | |
| 534 * U+0020, U+000A: Consume the current character. Switch to the | |
| 535 **before attribute name** state. | |
| 536 | |
| 537 * '``>``': Consume the current character. Switch to the **after tag** | |
| 538 state. | |
| 539 | |
| 540 * '``&``': Consume the character and switch to the **character | |
| 541 reference** state, with the _return state_ set to the **unquoted | |
| 542 attribute value** state, and the _emitting operation_ being to | |
| 543 append the given character to the value of the most recently added | |
| 544 attribute. | |
| 545 | |
| 546 * Anything else: Append the current character to the value of the most | |
| 547 recently added attribute. Consume the current character. Stay in | |
| 548 this state. | |
| 549 | |
| 550 | |
| 551 #### **After tag** state #### | |
| 552 | |
| 553 Emit the tag token. | |
| 554 | |
| 555 If the tag token was a start tag token and the tag name was | |
| 556 '``script``', then switch to the **script raw data** state. | |
| 557 | |
| 558 If the tag token was a start tag token and the tag name was | |
| 559 '``style``', then switch to the **style raw data** state. | |
| 560 | |
| 561 Otherwise, switch to the **data** state. | |
| 562 | |
| 563 | |
| 564 #### **After void tag** state #### | |
| 565 | |
| 566 Emit the tag token. | |
| 567 | |
| 568 If the tag token is a start tag token, emit an end tag token with the | |
| 569 same tag name. | |
| 570 | |
| 571 Switch to the **data** state. | |
| 572 | |
| 573 | |
| 574 #### **Comment start 1** state #### | |
| 575 | |
| 576 If the current character is... | |
| 577 | |
| 578 * '``-``': Consume the character and switch to the **comment start | |
| 579 2** state. | |
| 580 | |
| 581 * Anything else: Emit character tokens for '``<!``'. Switch to the | |
| 582 **data** state without consuming the current character. | |
| 583 | |
| 584 | |
| 585 #### **Comment start 2** state #### | |
| 586 | |
| 587 If the current character is... | |
| 588 | |
| 589 * '``-``': Consume the character and switch to the **comment** | |
| 590 state. | |
| 591 | |
| 592 * Anything else: Emit character tokens for '``<!-``'. Switch to the | |
| 593 **data** state without consuming the current character. | |
| 594 | |
| 595 | |
| 596 #### **Comment** state #### | |
| 597 | |
| 598 If the current character is... | |
| 599 | |
| 600 * '``-``': Consume the character and switch to the **comment end 1** | |
| 601 state. | |
| 602 | |
| 603 * Anything else: Consume the character and stay in this state. | |
| 604 | |
| 605 | |
| 606 #### **Comment end 1** state #### | |
| 607 | |
| 608 If the current character is... | |
| 609 | |
| 610 * '``-``': Consume the character, switch to the **comment end 2** | |
| 611 state. | |
| 612 | |
| 613 * Anything else: Consume the character, and switch to the **comment** | |
| 614 state. | |
| 615 | |
| 616 | |
| 617 #### **Comment end 2** state #### | |
| 618 | |
| 619 If the current character is... | |
| 620 | |
| 621 * '``>``': Consume the character and switch to the **data** state. | |
| 622 | |
| 623 * '``-``': Consume the character, but stay in this state. | |
| 624 | |
| 625 * Anything else: Consume the character, and switch to the **comment** | |
| 626 state. | |
| 627 | |
| 628 | |
| 629 #### **Character reference** state #### | |
| 630 | |
| 631 Let _raw value_ be the string '``&``'. | |
| 632 | |
| 633 Append the current character to _raw value_. | |
| 634 | |
| 635 If the current character is... | |
| 636 | |
| 637 * '``#``': Consume the character, and switch to the **numeric | |
| 638 character reference** state. | |
| 639 | |
| 640 * '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``': Switch to the | |
| 641 **named character reference** state without consuming the current | |
| 642 character. | |
| 643 | |
| 644 * Anything else: Run the _emitting operation_ for all but the last | |
| 645 character in _raw value_, and switch to the _return state_ without | |
| 646 consuming the current character. | |
| 647 | |
| 648 | |
| 649 #### **Numeric character reference** state #### | |
| 650 | |
| 651 Append the current character to _raw value_. | |
| 652 | |
| 653 If the current character is... | |
| 654 | |
| 655 * '``x``', '``X``': Consume the character and switch to the **before | |
| 656 hexadecimal numeric character reference** state. | |
| 657 | |
| 658 * '``0``'..'``9``': Let _value_ be the numeric value of the | |
| 659 current character interpreted as a decimal digit, consume the | |
| 660 character, and switch to the **decimal numeric character reference** | |
| 661 state. | |
| 662 | |
| 663 * Anything else: Run the _emitting operation_ for all but the last | |
| 664 character in _raw value_, and switch to the _return state_ without | |
| 665 consuming the current character. | |
| 666 | |
| 667 | |
| 668 #### **Before hexadecimal numeric character reference** state #### | |
| 669 | |
| 670 Append the current character to _raw value_. | |
| 671 | |
| 672 If the current character is... | |
| 673 | |
| 674 * '``0``'..'``9``', '``a``'..'``f``', '``A``'..'``F``': | |
| 675 Let _value_ be the numeric value of the current character | |
| 676 interpreted as a hexadecimal digit, consume the character, and | |
| 677 switch to the **hexadecimal numeric character reference** state. | |
| 678 | |
| 679 * Anything else: Run the _emitting operation_ for all but the last | |
| 680 character in _raw value_, and switch to the _return state_ without | |
| 681 consuming the current character. | |
| 682 | |
| 683 | |
| 684 #### **Hexadecimal numeric character reference** state #### | |
| 685 | |
| 686 Append the current character to _raw value_. | |
| 687 | |
| 688 If the current character is... | |
| 689 | |
| 690 * '``0``'..'``9``', '``a``'..'``f``', '``A``'..'``F``': Let _value_ be | |
| 691 sixteen times _value_ plus the numeric value of the current character | |
| 692 interpreted as a hexadecimal digit. Consume the character and stay in this state. | |
| 693 | |
| 694 * '``;``': Consume the character. If _value_ is between 0x0001 and | |
| 695 0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive, | |
| 696 run the _emitting operation_ with a Unicode character having the | |
| 697 scalar value _value_; otherwise, run the _emitting operation_ with | |
| 698 the character U+FFFD. Then, in either case, switch to the _return | |
| 699 state_. | |
| 700 | |
| 701 * Anything else: Run the _emitting operation_ for all but the last | |
| 702 character in _raw value_, and switch to the _return state_ without | |
| 703 consuming the current character. | |
| 704 | |
| 705 | |
| 706 #### **Decimal numeric character reference** state #### | |
| 707 | |
| 708 Append the current character to _raw value_. | |
| 709 | |
| 710 If the current character is... | |
| 711 | |
| 712 * '``0``'..'``9``': Let _value_ be ten times _value_ plus the | |
| 713 numeric value of the current character interpreted as a decimal | |
| 714 digit. Consume the character and stay in this state. | |
| 715 | |
| 716 * '``;``': Consume the character. If _value_ is between 0x0001 and | |
| 717 0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive, | |
| 718 run the _emitting operation_ with a Unicode character having the | |
| 719 scalar value _value_; otherwise, run the _emitting operation_ with | |
| 720 the character U+FFFD. Then, in either case, switch to the _return | |
| 721 state_. | |
| 722 | |
| 723 * Anything else: Run the _emitting operation_ for all but the last | |
| 724 character in _raw value_, and switch to the _return state_ without | |
| 725 consuming the current character. | |
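
The range check shared by the hexadecimal and decimal states can be sketched as follows. The function name `numeric_reference_to_char` and its `digits`/`base` parameters are illustrative assumptions; the digit-by-digit accumulation mirrors the "sixteen times" and "ten times" steps above.

```python
REPLACEMENT = "\ufffd"


def numeric_reference_to_char(digits: str, base: int) -> str:
    """Accumulate the digits of a numeric character reference and map the
    result to a character, or to U+FFFD when the value is out of range."""
    value = 0
    for digit in digits:
        value = value * base + int(digit, base)
    in_range = 0x0001 <= value <= 0x10FFFF
    is_surrogate = 0xD800 <= value <= 0xDFFF
    if in_range and not is_surrogate:
        return chr(value)
    return REPLACEMENT
```

So `&#x41;` and `&#65;` would both produce `A`, while `&#0;` and `&#xD800;` would produce U+FFFD.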
| 726 | |
| 727 | |
| 728 #### **Named character reference** state #### | |
| 729 | |
| 730 Append the current character to _raw value_. | |
| 731 | |
| 732 If the current character is... | |
| 733 | |
| 734 * '``;``': Consume the character. | |
| 735 If the _raw value_ is... | |
| 736 | |
| 737 - '``&amp;``': Run the _emitting operation_ for the character | |
| 738 '``&``'. | |
| 739 | |
| 740 - '``&apos;``': Run the _emitting operation_ for the character | |
| 741 '``'``'. | |
| 742 | |
| 743 - '``&gt;``': Run the _emitting operation_ for the character | |
| 744 '``>``'. | |
| 745 | |
| 746 - '``&lt;``': Run the _emitting operation_ for the character | |
| 747 '``<``'. | |
| 748 | |
| 749 - '``&quot;``': Run the _emitting operation_ for the character | |
| 750 '``"``'. | |
| 751 | |
| 752 Then, switch to the _return state_. | |
| 753 | |
| 754 * '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``': Consume the | |
| 755 character and stay in this state. | |
| 756 | |
| 757 * Anything else: Run the _emitting operation_ for all but the last | |
| 758 character in _raw value_, and switch to the _return state_ without | |
| 759 consuming the current character. | |
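
The lookup performed on '``;``' can be pictured as a small table. The sketch below assumes that the five references use the conventional names `amp`, `apos`, `gt`, `lt` and `quot`, and that _raw value_ holds the literal reference text such as `&amp;`. The names `NAMED_REFERENCES` and `resolve_named_reference` are hypothetical, and returning `None` for an unknown name is an assumption, since the text above does not say what happens in that case.

```python
NAMED_REFERENCES = {
    "&amp;": "&",
    "&apos;": "'",
    "&gt;": ">",
    "&lt;": "<",
    "&quot;": '"',
}


def resolve_named_reference(raw_value: str):
    """Return the character for a recognised reference, or None otherwise."""
    return NAMED_REFERENCES.get(raw_value)
```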
| 760 | |
| 761 | |
| 762 Token cleanup stage | |
| 763 ------------------- | |
| 764 | |
| 765 Replace each sequence of character tokens with a single string token | |
| 766 whose value is the concatenation of all the characters in the | |
| 767 character tokens. | |
| 768 | |
| 769 For each start tag token, remove all but the first name/value pair for | |
| 770 each name (i.e. remove duplicate attributes, keeping only the first | |
| 771 one). | |
| 772 | |
| 773 TODO(ianh): maybe sort the attributes? | |
| 774 | |
| 775 For each end tag token, remove the attributes entirely. | |
| 776 | |
| 777 If the token is a start tag token, notify the JavaScript token stream | |
| 778 callback of the token. | |
| 779 | |
| 780 Then, pass the tokens to the tree construction stage. | |
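
A minimal sketch of the cleanup rules follows, leaving out the token stream callback. The token classes `Char`, `String` and `Tag`, the `kind` strings, and the function name `cleanup` are all assumptions made for illustration; the specification does not define a concrete token representation.

```python
from dataclasses import dataclass, field


@dataclass
class Char:
    value: str


@dataclass
class String:
    value: str


@dataclass
class Tag:
    kind: str  # "start", "end" or "auto-end" (hypothetical labels)
    name: str
    attributes: list = field(default_factory=list)  # (name, value) pairs


def cleanup(tokens):
    """Merge runs of character tokens and tidy tag attributes."""
    result = []
    for token in tokens:
        if isinstance(token, Char):
            # Extend the string token built from the preceding characters.
            if result and isinstance(result[-1], String):
                result[-1].value += token.value
            else:
                result.append(String(token.value))
        elif token.kind == "start":
            first_seen = {}
            for name, value in token.attributes:
                first_seen.setdefault(name, value)  # keep the first pair only
            result.append(Tag("start", token.name, list(first_seen.items())))
        elif token.kind == "end":
            result.append(Tag("end", token.name, []))  # attributes dropped
        else:
            result.append(token)
    return result
```

Because Python dictionaries preserve insertion order, the surviving attributes stay in their original order.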
| 781 | |
| 782 | |
| 783 Tree construction stage | |
| 784 ----------------------- | |
| 785 | |
| 786 To construct a node tree from a _sequence of tokens_ and an element | |
| 787 tree rooted at a `Root` node _root_ (this is implemented in JS): | |
| 788 | |
| 789 1. Initialize the _stack of open nodes_ to be _root_. | |
| 790 2. Initialize _imported modules_ to an empty list. | |
| 791 3. Consider each token _token_ in the _sequence of tokens_ in turn, as | |
| 792 follows. If a token is to be skipped, then jump straight to the | |
| 793 next token, without doing any more work with the skipped token. | |
| 794 - If _token_ is a string token, | |
| 795 1. If the value of the token contains only U+0020 and U+000A | |
| 796 characters, and there is no ``t`` element on the _stack of | |
| 797 open nodes_, then skip the token. | |
| 798 2. Create a text node _node_ whose character data is the value of | |
| 799 the token. | |
| 800 3. Append _node_ to the top node in the _stack of open nodes_. | |
| 801 - If _token_ is a start tag token, | |
| 802 1. If the tag name isn't a registered tag name, then yield until | |
| 803 _imported modules_ contains no entries with unresolved | |
| 804 promises. | |
| 805 2. If the tag name is not registered, then let the ErrorElement | |
| 806 constructor from dart:sky be the element constructor. | |
| 807 Otherwise, let the element constructor be the registered | |
| 808 element's constructor for that tag name in this module. | |
| 809 3. Create an element _node_ with the attributes given by the | |
| 810 token by calling the constructor. | |
| 811 4. If _node_ is not an Element object, then let the constructor | |
| 812 be the ErrorElement constructor and return to the previous | |
| 813 step. | |
| 814 5. Append _node_ to the top node in the _stack of open nodes_. | |
| 815 6. Push _node_ onto the top of the _stack of open nodes_. | |
| 816 7. If _node_ is a ``template`` element, then: | |
| 817 1. Let _fragment_ be the ``Fragment`` object that the | |
| 818 ``template`` element uses as its template contents | |
| 819 container. | |
| 820 2. Push _fragment_ onto the top of the _stack of open nodes_. | |
| 821 8. If _node_ is an ``import`` element, then: | |
| 822 1. Let ``url`` be the value of _node_'s ``src`` attribute. | |
| 823 2. Call the _parsing context_'s ``importModule()`` method, | |
| 824 passing it ``url``. | |
| 825 3. Add the returned promise to _imported modules_; if _node_ | |
| 826 has an ``as`` attribute, associate the entry with that | |
| 827 name. | |
| 828 - If _token_ is an end tag token: | |
| 829 1. If the tag name is registered, let _tag name_ be that tag | |
| 830 name. Otherwise, let _tag name_ be "error". | |
| 831 2. Let _node_ be the topmost node in the _stack of open nodes_ | |
| 832 whose tag name is _tag name_, if any. If there isn't one, skip | |
| 833 this token. | |
| 834 3. If there's a ``template`` element in the _stack of open | |
| 835 nodes_ above _node_, then skip this token. | |
| 836 4. Pop nodes from the _stack of open nodes_ until _node_ has been | |
| 837 popped. | |
| 838 5. If _node_'s tag name is ``script``, then yield until _imported | |
| 839 modules_ contains no entries with unresolved promises, then | |
| 840 execute the script given by the element's contents, using the | |
| 841 associated names as appropriate. | |
| 842 - If _token_ is an automatic end tag token: | |
| 843 1. Pop the top node from the _stack of open nodes_, unless it is | |
| 844 the _root_ node. | |
| 845 4. Yield until _imported modules_ contains no entries with unresolved promises. | |
| 846 5. Fire a ``load`` event at the _parsing context_ object. | |
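
The stack handling in the steps above can be sketched in a much reduced form. The version below treats every tag name as registered and leaves out element constructors, `template` fragments, `import` handling, script execution, yielding, and the `load` event, so it only shows how the _stack of open nodes_ grows and shrinks. The `Node` class, the `(kind, value)` token tuples, and the name `build_tree` are assumptions made for illustration.

```python
class Node:
    def __init__(self, name, text=None):
        self.name = name
        self.text = text
        self.children = []


def build_tree(tokens):
    """Build a node tree from (kind, value) tuples, where kind is one of
    "string", "start", "end" or "auto-end"."""
    root = Node("#root")
    stack = [root]  # the stack of open nodes; the last item is the top
    for kind, value in tokens:
        if kind == "string":
            only_whitespace = all(c in " \n" for c in value)
            if only_whitespace and not any(n.name == "t" for n in stack):
                continue  # skip whitespace outside of a t element
            stack[-1].children.append(Node("#text", value))
        elif kind == "start":
            node = Node(value)
            stack[-1].children.append(node)
            stack.append(node)
        elif kind == "end":
            # Find the topmost open node with this tag name, excluding root.
            matches = [i for i in range(1, len(stack)) if stack[i].name == value]
            if not matches:
                continue
            index = matches[-1]
            if any(n.name == "template" for n in stack[index + 1:]):
                continue
            del stack[index:]  # pop until the matching node has been popped
        elif kind == "auto-end":
            if stack[-1] is not root:
                stack.pop()
    return root
```

An end tag with no matching open node is skipped, and an automatic end tag never pops the root.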