Side by Side Diff: sky/specs/parsing.md

Issue 683493003: Specs: Tokeniser fixes: simplify the entity parser to just do string compares, fix copy-pasta (Closed) Base URL: https://github.com/domokit/mojo.git@master
Patch Set: Created 6 years, 1 month ago
OLD | NEW
1 Parsing 1 Parsing
2 ======= 2 =======
3 3
4 Parsing in Sky is a strict pipeline consisting of five stages: 4 Parsing in Sky is a strict pipeline consisting of five stages:
5 5
6 - decoding, which converts incoming bytes into Unicode characters 6 - decoding, which converts incoming bytes into Unicode characters
7 using UTF-8. 7 using UTF-8.
8 8
9 - normalising, which manipulates the sequence of characters. 9 - normalising, which manipulates the sequence of characters.
10 10
(...skipping 128 matching lines...)
139 139
140 140
141 #### **Data** state #### 141 #### **Data** state ####
142 142
143 If the current character is... 143 If the current character is...
144 144
145 * '``<``': Consume the character and switch to the **tag open** state. 145 * '``<``': Consume the character and switch to the **tag open** state.
146 146
147 * '``&``': Consume the character and switch to the **character 147 * '``&``': Consume the character and switch to the **character
148 reference** state, with the _return state_ set to the **data** 148 reference** state, with the _return state_ set to the **data**
149 state, the _extra terminating character_ unset (or set to U+0000, 149 state, and the _emitting operation_ being to emit a character token
150 which has the same effect), and the _emitting operation_ being to 150 for the given character.
151 emit a character token for the given character.
152 151
153 * Anything else: Emit the current input character as a character 152 * Anything else: Emit the current input character as a character
154 token. Consume the character. Stay in this state. 153 token. Consume the character. Stay in this state.
155 154
156 155
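For readers of this chunk, the new-side **data** state can be sketched as below. This is a minimal illustration, not code from the patch: the `Tokenizer` class, its field names, and `data_state_step` are hypothetical, and the _emitting operation_ is modelled as a plain callable.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Tokenizer:
    # Hypothetical container for the tokeniser's variables.
    state: str = "data"
    return_state: Optional[str] = None
    emit_op: Optional[Callable[[str], None]] = None
    tokens: List[str] = field(default_factory=list)

    def emit_char(self, ch: str) -> None:
        # "Emit a character token for the given character."
        self.tokens.append(ch)


def data_state_step(tok: Tokenizer, ch: str) -> None:
    """One step of the data state; every branch consumes the character."""
    if ch == "<":
        tok.state = "tag open"
    elif ch == "&":
        # New side of the diff: only the return state and the emitting
        # operation are set; there is no extra terminating character.
        tok.return_state = "data"
        tok.emit_op = tok.emit_char
        tok.state = "character reference"
    else:
        tok.emit_char(ch)  # stay in the data state
```

The character reference states later call `tok.emit_op` and then switch to `tok.return_state`, which is why the data state only has to record those two values.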
157 #### **Script raw data** state #### 156 #### **Script raw data** state ####
158 157
159 If the current character is... 158 If the current character is...
160 159
161 * '``<``': Consume the character and switch to the **script raw 160 * '``<``': Consume the character and switch to the **script raw
(...skipping 327 matching lines...)
489 488
490 #### **Single-quoted attribute value** state #### 489 #### **Single-quoted attribute value** state ####
491 490
492 If the current character is... 491 If the current character is...
493 492
494 * '``'``': Consume the current character. Switch to the 493 * '``'``': Consume the current character. Switch to the
495 **before attribute name** state. 494 **before attribute name** state.
496 495
497 * '``&``': Consume the character and switch to the **character 496 * '``&``': Consume the character and switch to the **character
498 reference** state, with the _return state_ set to the 497 reference** state, with the _return state_ set to the
499 **single-quoted attribute value** state, the _extra terminating 498 **single-quoted attribute value** state and the _emitting operation_
500 character_ set to '``'``', and the _emitting operation_ being to 499 being to append the given character to the value of the most
501 append the given character to the value of the most recently added 500 recently added attribute.
502 attribute.
503 501
504 * Anything else: Append the current character to the value of the most 502 * Anything else: Append the current character to the value of the most
505 recently added attribute. Consume the current character. Stay in 503 recently added attribute. Consume the current character. Stay in
506 this state. 504 this state.
507 505
508 506
509 #### **Double-quoted attribute value** state #### 507 #### **Double-quoted attribute value** state ####
510 508
511 If the current character is... 509 If the current character is...
512 510
513 * '``"``': Consume the current character. Switch to the 511 * '``"``': Consume the current character. Switch to the
514 **before attribute name** state. 512 **before attribute name** state.
515 513
516 * '``&``': Consume the character and switch to the **character 514 * '``&``': Consume the character and switch to the **character
517 reference** state, with the _return state_ set to the 515 reference** state, with the _return state_ set to the
518 **double-quoted attribute value** state, the _extra terminating 516 **double-quoted attribute value** state and the _emitting operation_
519 character_ set to '``"``', and the _emitting operation_ being to 517 being to append the given character to the value of the most
520 append the given character to the value of the most recently added 518 recently added attribute.
521 attribute.
522 519
523 * Anything else: Append the current character to the value of the most 520 * Anything else: Append the current character to the value of the most
524 recently added attribute. Consume the current character. Stay in 521 recently added attribute. Consume the current character. Stay in
525 this state. 522 this state.
526 523
527 524
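On the new side, the two quoted attribute value states differ only in the quote character and the _return state_ they record. A rough sketch under that reading follows; `TagState`, `QUOTES`, and `quoted_attribute_value_step` are names invented for illustration, and attributes are modelled as `[name, value]` pairs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical mapping from state name to its terminating quote.
QUOTES: Dict[str, str] = {
    "single-quoted attribute value": "'",
    "double-quoted attribute value": '"',
}


@dataclass
class TagState:
    state: str
    attributes: List[List[str]] = field(default_factory=list)
    return_state: Optional[str] = None
    emit_op: Optional[Callable[[str], None]] = None

    def append_to_attribute(self, ch: str) -> None:
        # "Append the given character to the value of the most
        # recently added attribute."
        self.attributes[-1][1] += ch


def quoted_attribute_value_step(tok: TagState, ch: str) -> None:
    """One step of either quoted state; every branch consumes the character."""
    if ch == QUOTES[tok.state]:
        tok.state = "before attribute name"
    elif ch == "&":
        tok.return_state = tok.state  # come back to the same quoted state
        tok.emit_op = tok.append_to_attribute
        tok.state = "character reference"
    else:
        tok.append_to_attribute(ch)  # stay in this state
```

With the _extra terminating character_ gone, the closing quote is handled only by the quoted state itself once the character reference states return to it.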
528 #### **Unquoted attribute value** state #### 525 #### **Unquoted attribute value** state ####
529 526
530 If the current character is... 527 If the current character is...
531 528
532 * U+0020, U+000A: Consume the current character. Switch to the 529 * U+0020, U+000A: Consume the current character. Switch to the
533 **before attribute name** state. 530 **before attribute name** state.
534 531
535 * '``>``': Consume the current character. Switch to the **data** 532 * '``>``': Consume the current character. Switch to the **data**
536 state. Switch to the **after tag** state. 533 state. Switch to the **after tag** state.
537 534
538 * '``&``': Consume the character and switch to the **character 535 * '``&``': Consume the character and switch to the **character
539 reference** state, with the _return state_ set to the **unquoted 536 reference** state, with the _return state_ set to the **unquoted
540 attribute value** state, the _extra terminating character_ unset (or 537 attribute value** state which has the same effect), and the
541 set to U+0000, which has the same effect), and the _emitting 538 _emitting operation_ being to append the given character to the
542 operation_ being to append the given character to the value of the 539 value of the most recently added attribute.
543 most recently added attribute.
544 540
545 * Anything else: Append the current character to the value of the most 541 * Anything else: Append the current character to the value of the most
546 recently added attribute. Consume the current character. Stay in 542 recently added attribute. Consume the current character. Stay in
547 this state. 543 this state.
548 544
549 545
550 #### **After tag** state #### 546 #### **After tag** state ####
551 547
552 Emit the tag token. 548 Emit the tag token.
553 549
(...skipping 76 matching lines...)
630 626
631 Let _raw value_ be the string '``&``'. 627 Let _raw value_ be the string '``&``'.
632 628
633 Append the current character to _raw value_. 629 Append the current character to _raw value_.
634 630
635 If the current character is... 631 If the current character is...
636 632
637 * '``#``': Consume the character, and switch to the **numeric 633 * '``#``': Consume the character, and switch to the **numeric
638 character reference** state. 634 character reference** state.
639 635
640 * '``l``': Consume the character and switch to the **named character 636 * '``0``'..'``9``', '``a``'..'``f``', '``A``'..'``F``': switch to the
641 reference L** state. 637 **named character reference** state without consuming the current
642 638 character.
643 * '``a``': Consume the character and switch to the **named character
644 reference A** state.
645
646 * '``g``': Consume the character and switch to the **named character
647 reference G** state.
648
649 * '``q``': Consume the character and switch to the **named character
650 reference Q** state.
651
652 * Any other character in the range '``0``'..'``9``',
653 '``a``'..'``f``', '``A``'..'``F``': Consume the character
654 and switch to the **bad named character reference** state.
655 639
656 * Anything else: Run the _emitting operation_ for all but the last 640 * Anything else: Run the _emitting operation_ for all but the last
657 character in _raw value_, and switch to the **data state** without 641 character in _raw value_, and switch to the **data state** without
658 consuming the current character. 642 consuming the current character.
659 643
660 644
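A literal reading of the new-side **character reference** state can be sketched as follows. The function name, the tuple return shape `(raw_value, next_state, consumed)`, and the `emit` callable are assumptions made for this illustration.

```python
from typing import Callable, Tuple

# The range exactly as written on the new side: 0-9, a-f, A-F.
HEX_LIKE = set("0123456789abcdefABCDEF")


def character_reference_step(
    ch: str, emit: Callable[[str], None]
) -> Tuple[str, str, bool]:
    """Return (raw_value, next_state, consumed) for the current character."""
    raw_value = "&" + ch  # "Let raw value be '&'", then append the character
    if ch == "#":
        return raw_value, "numeric character reference", True
    if ch in HEX_LIKE:
        # Switch without consuming the current character.
        return raw_value, "named character reference", False
    # Anything else: emit all but the last character of raw value.
    for c in raw_value[:-1]:
        emit(c)
    return raw_value, "data", False
```

Read this way, `l`, `g`, and `q` fall outside `a`..`f`, so `&lt;`, `&gt;`, and `&quot;` would take the "Anything else" branch instead of reaching the named state, which itself accepts `a`..`z`. The sketch keeps the range as written rather than guessing at the intended one.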
661 #### **Numeric character reference** state #### 645 #### **Numeric character reference** state ####
662 646
663 Append the current character to _raw value_. 647 Append the current character to _raw value_.
664 648
(...skipping 55 matching lines...)
720 character in _raw value_, and switch to the **data state** without 704 character in _raw value_, and switch to the **data state** without
721 consuming the current character. 705 consuming the current character.
722 706
723 707
724 #### **Named character reference L** state #### 708 #### **Named character reference L** state ####
725 709
726 Append the current character to _raw value_. 710 Append the current character to _raw value_.
727 711
728 If the current character is... 712 If the current character is...
729 713
730 * '``t``': Let _character_ be '``<``', consume the current 714 * '``;``': Consume the character.
731 character, and switch to the **after named character reference** 715 If the _raw value_ is...
732 state.
733 716
734 * Anything else: Switch to the _bad named character reference_ state 717 - '``&amp;``: Emit Run the _emitting operation_ for the character
735 without consuming the character. 718 '``&``'.
736 719
720 - '``&apos;``: Emit Run the _emitting operation_ for the character
721 '``'``'.
737 722
738 #### **Named character reference A** state #### 723 - '``&gt;``: Emit Run the _emitting operation_ for the character
724 '``>``'.
739 725
740 Append the current character to _raw value_. 726 - '``&lt;``: Emit Run the _emitting operation_ for the character
727 '``<``'.
741 728
742 If the current character is... 729 - '``&quot;``: Emit Run the _emitting operation_ for the character
730 '``"``'.
743 731
744 * '``p``': Consume the current character and switch to the **named 732 Then, switch to the _return state_.
745 character reference AP** state.
746 733
747 * '``m``': Consume the current character and switch to the **named 734 * '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``': Consume the
748 character reference AM** state. 735 character and stay in this state.
749
750 * Anything else: Switch to the _bad named character reference_ state
751 without consuming the character.
752
753
754 #### **Named character reference AM** state ####
755
756 Append the current character to _raw value_.
757
758 If the current character is...
759
760 * '``p``': Let _character_ be '``&``', consume the current
761 character, and switch to the **after named character reference**
762 state.
763
764 * Anything else: Switch to the _bad named character reference_ state
765 without consuming the character.
766
767
768 #### **Named character reference AP** state ####
769
770 Append the current character to _raw value_.
771
772 If the current character is...
773
774 * '``o``': Consume the current character and switch to the **named
775 character reference APO** state.
776
777 * Anything else: Switch to the _bad named character reference_ state
778 without consuming the character.
779
780
781 #### **Named character reference APO** state ####
782
783 Append the current character to _raw value_.
784
785 If the current character is...
786
787 * '``s``': Let _character_ be '``'``', consume the current
788 character, and switch to the **after named character reference**
789 state.
790
791 * Anything else: Switch to the _bad named character reference_ state
792 without consuming the character.
793
794
795 #### **Named character reference G** state ####
796
797 Append the current character to _raw value_.
798
799 If the current character is...
800
801 * '``t``': Let _character_ be '``>``', consume the current
802 character, and switch to the **after named character reference**
803 state.
804
805 * Anything else: Switch to the _bad named character reference_ state
806 without consuming the character.
807
808
809 #### **Named character reference Q** state ####
810
811 Append the current character to _raw value_.
812
813 If the current character is...
814
815 * '``u``': Consume the current character and switch to the **named
816 character reference QU** state.
817
818 * Anything else: Switch to the _bad named character reference_ state
819 without consuming the character.
820
821
822 #### **Named character reference QU** state ####
823
824 Append the current character to _raw value_.
825
826 If the current character is...
827
828 * '``o``': Consume the current character and switch to the **named
829 character reference QUO** state.
830
831 * Anything else: Switch to the _bad named character reference_ state
832 without consuming the character.
833
834
835 #### **Named character reference QUO** state ####
836
837 Append the current character to _raw value_.
838
839 If the current character is...
840
841 * '``t``': Let _character_ be '``"``', consume the current
842 character, and switch to the **after named character reference**
843 state.
844
845 * Anything else: Switch to the _bad named character reference_ state
846 without consuming the character.
847
848
849 #### **After named character reference** state ####
850
851 Append the current character to _raw value_.
852
853 If the current character is...
854
855 * '``;``': Consume the character. Run the _emitting operation_ with
856 the character _character_. Switch to the _return state_.
857
858 * The _extra terminating character_: Run the _emitting operation_ with
859 the character U+FFFD. Switch to the _return state_ without consuming
860 the current character.
861
862 * Anything else: Switch to the _bad named character reference_ state
863 without consuming the current character.
864
865
866 #### **Bad named character reference** state ####
867
868 Append the current character to _raw value_.
869
870 If the current character is...
871
872 * '``;``': Consume the character. Run the _emitting operation_ with
873 the character U+FFFD. Switch to the _return state_.
874
875 * The _extra terminating character_: Switch to the _return state_
876 without consuming the current character.
877
878 * Any other character in the range '``0``'..'``9``',
879 '``a``'..'``f``', '``A``'..'``F``': Consume the character
880 and stay in this state.
881 736
882 * Anything else: Run the _emitting operation_ for all but the last 737 * Anything else: Run the _emitting operation_ for all but the last
883 character in _raw value_, and switch to the **data state** without 738 character in _raw value_, and switch to the **data state** without
884 consuming the current character. 739 consuming the current character.
885 740
886 741
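The string-compare approach that replaces the per-letter states could look roughly like this. It is a sketch under stated assumptions, not the patch's implementation: `NAMED_REFERENCES`, `named_reference_step`, and the return tuple `(raw_value, next_state, consumed)` are hypothetical, and the caller is assumed to start with _raw value_ equal to `&` and to feed each character exactly once.

```python
from typing import Callable, Dict, Tuple

# The five raw values listed on the new side of the diff.
NAMED_REFERENCES: Dict[str, str] = {
    "&amp;": "&",
    "&apos;": "'",
    "&gt;": ">",
    "&lt;": "<",
    "&quot;": '"',
}


def named_reference_step(
    raw_value: str,
    ch: str,
    emit: Callable[[str], None],
    return_state: str,
) -> Tuple[str, str, bool]:
    """Return (raw_value, next_state, consumed) for the current character."""
    raw_value += ch
    if ch == ";":
        replacement = NAMED_REFERENCES.get(raw_value)
        if replacement is not None:
            emit(replacement)
        # The text lists no fallback for an unknown name, so nothing
        # is emitted in that case; this mirrors the text as written.
        return raw_value, return_state, True
    if ch.isascii() and ch.isalnum():
        return raw_value, "named character reference", True
    # Anything else: emit all but the last character of raw value.
    for c in raw_value[:-1]:
        emit(c)
    return raw_value, "data", False
```

For example, feeding `l`, `t`, `;` with an initial _raw value_ of `&` emits a single `<` and ends in the return state. The whole entity is matched with one dictionary lookup on `;`, which is what lets the patch delete the **L**, **A**, **AM**, **AP**, **APO**, **G**, **Q**, **QU**, and **QUO** states.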
887 Token cleanup stage 742 Token cleanup stage
888 ------------------- 743 -------------------
889 744
890 Replace each sequence of character tokens with a single string token 745 Replace each sequence of character tokens with a single string token
(...skipping 39 matching lines...)
930 there isn't one, skip this token. 785 there isn't one, skip this token.
931 2. If there's a ``template`` element in the _stack of open 786 2. If there's a ``template`` element in the _stack of open
932 nodes_ above _node_, then skip this token. 787 nodes_ above _node_, then skip this token.
933 3. Pop nodes from the _stack of open nodes_ until _node_ has been 788 3. Pop nodes from the _stack of open nodes_ until _node_ has been
934 popped. 789 popped.
935 4. If _node_'s tag name is ``script``, then yield until there 790 4. If _node_'s tag name is ``script``, then yield until there
936 are no pending import loads, then execute the script given by 791 are no pending import loads, then execute the script given by
937 the element's contents. 792 the element's contents.
938 3. Yield until there are no pending import loads. 793 3. Yield until there are no pending import loads.
939 3. Fire a ``load`` event at the _parsing context_ object. 794 3. Fire a ``load`` event at the _parsing context_ object.