Chromium Code Reviews

Side by Side Diff: sky/specs/parsing.md

Issue 650323005: Specs: Tokeniser should return to return state, not data state; &#x; should not emit U+FFFD. (Closed)
Base URL: https://github.com/domokit/mojo.git@master
Patch Set: Created 6 years, 1 month ago
--- sky/specs/parsing.md
+++ sky/specs/parsing.md
@@ -1,10 +1,10 @@
 Parsing
 =======
 
 Parsing in Sky is a strict pipeline consisting of five stages:
 
 - decoding, which converts incoming bytes into Unicode characters
 using UTF-8.
 
 - normalising, which manipulates the sequence of characters.
 
(...skipping 620 matching lines...)
@@ -631,84 +631,100 @@
 If the current character is...
 
 * '``#``': Consume the character, and switch to the **numeric
 character reference** state.
 
 * '``0``'..'``9``', '``a``'..'``f``', '``A``'..'``F``': switch to the
 **named character reference** state without consuming the current
 character.
 
 * Anything else: Run the _emitting operation_ for all but the last
-character in _raw value_, and switch to the **data state** without
+character in _raw value_, and switch to the _return state_ without
 consuming the current character.
 
 
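The main change in this hunk is where the fallback goes. When the characters after `&` turn out not to form a reference, the tokeniser now resumes in the _return state_, which is the state that started the reference, instead of always dropping into the **data state**. The minimal Python sketch below models the **character reference** state's dispatch and that shared fallback. The function names, the string state names, and the `"attribute value"` return state in the usage are hypothetical. The sketch also assumes the current character has already been appended to _raw value_, as the other reference states in this spec do.

```python
from typing import Callable, List, Tuple

HEX_DIGITS = frozenset("0123456789abcdefABCDEF")
Emit = Callable[[str], None]


def flush_and_return(raw_value: List[str], return_state: str,
                     emit: Emit) -> Tuple[str, bool]:
    """Shared "Anything else" branch: emit all but the last raw character.

    The last character is the current one. It is not consumed, so the
    return state sees it next. Returns (next_state, consumed).
    """
    for c in raw_value[:-1]:
        emit(c)
    return return_state, False


def character_reference_state(ch: str, raw_value: List[str],
                              return_state: str,
                              emit: Emit) -> Tuple[str, bool]:
    """Dispatch for the **character reference** state (sketch)."""
    if ch == "#":
        return "numeric character reference", True
    if ch in HEX_DIGITS:
        # Switch without consuming, so the named state sees `ch` again.
        return "named character reference", False
    return flush_and_return(raw_value, return_state, emit)


out: List[str] = []
# "&" followed by a space while tokenising a (hypothetical) attribute value:
print(character_reference_state(" ", ["&", " "], "attribute value", out.append), out)
# ('attribute value', False) ['&']
```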
 #### **Numeric character reference** state ####
 
 Append the current character to _raw value_.
 
 If the current character is...
 
-* '``x``', '``X``': Let _value_ be zero, consume the character,
-and switch to the **hexadecimal numeric character reference** state.
+* '``x``', '``X``': Consume the character and switch to the **before
+hexadecimal numeric character reference** state.
 
 * '``0``'..'``9``': Let _value_ be the numeric value of the
 current character interpreted as a decimal digit, consume the
 character, and switch to the **decimal numeric character reference**
 state.
 
 * Anything else: Run the _emitting operation_ for all but the last
-character in _raw value_, and switch to the **data state** without
+character in _raw value_, and switch to the _return state_ without
 consuming the current character.
 
 
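After this patch, the **numeric character reference** state no longer initialises _value_ when it sees `x` or `X`. It only routes to the new **before hexadecimal numeric character reference** state, and the first hexadecimal digit sets _value_ there. Below is a minimal Python sketch of the routing. It assumes the current character is already in _raw value_ and uses plain strings as hypothetical state names. It returns the next state, whether the character was consumed, and the initial _value_ where the spec sets one.

```python
from typing import Callable, List, Optional, Tuple

DECIMAL_DIGITS = frozenset("0123456789")


def numeric_character_reference_state(
        ch: str, raw_value: List[str], return_state: str,
        emit: Callable[[str], None]) -> Tuple[str, bool, Optional[int]]:
    """Sketch of the **numeric character reference** state after this patch."""
    if ch in ("x", "X"):
        # No value is set here any more; the next state requires a hex digit.
        return "before hexadecimal numeric character reference", True, None
    if ch in DECIMAL_DIGITS:
        return "decimal numeric character reference", True, int(ch)
    # Anything else: emit all but the last raw character and reconsume `ch`
    # in the return state.
    for c in raw_value[:-1]:
        emit(c)
    return return_state, False, None


out: List[str] = []
print(numeric_character_reference_state("x", ["&", "#", "x"], "data", out.append))
# ('before hexadecimal numeric character reference', True, None)
```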
+#### **Before hexadecimal numeric character reference** state ####
+
+Append the current character to _raw value_.
+
+If the current character is...
+
+* '``0``'..'``9``', '``a``'..'``f``', '``A``'..'``F``':
+Let _value_ be the numeric value of the current character
+interpreted as a hexadecimal digit, consume the character, and
+switch to the **hexadecimal numeric character reference** state.
+
+* Anything else: Run the _emitting operation_ for all but the last
+character in _raw value_, and switch to the _return state_ without
+consuming the current character.
+
+
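This new state is what stops `&#x;` from producing U+FFFD. Under the old text, `x` set _value_ to zero and went straight to the hexadecimal state, so `&#x;` reached the `;` branch with _value_ 0 and emitted U+FFFD. The new state requires at least one hexadecimal digit. `&#x;` therefore falls into "Anything else": `&`, `#` and `x` are emitted as-is, and `;` is reprocessed by the _return state_. The toy Python tokenizer below walks the numeric path end to end to show this. It is a sketch under stated assumptions, not Sky's implementation:

- The only return state is a data state that emits every character except `&`.
- Named references are not modelled; they are flushed like a failed reference.
- A reference still open at end of input is flushed verbatim, since the spec text shown here does not cover end of input.

```python
from typing import List

DEC = frozenset("0123456789")
HEX = frozenset("0123456789abcdefABCDEF")


def emitted_scalar(value: int) -> str:
    """The ';' rule: a valid non-surrogate scalar value, else U+FFFD."""
    if 0x0001 <= value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF:
        return chr(value)
    return "\uFFFD"


def tokenize(text: str) -> str:
    """Toy data state plus the numeric character reference states.

    Returns the emitted characters joined into one string.
    """
    out: List[str] = []
    raw: List[str] = []
    state, value, i = "data", 0, 0
    while i < len(text):
        ch = text[i]
        consumed = True
        if state == "data":
            if ch == "&":
                state, raw = "charref", [ch]
            else:
                out.append(ch)
        else:
            raw.append(ch)  # each reference state appends the current char
            if state == "charref" and ch == "#":
                state = "numeric"
            elif state == "numeric" and ch in ("x", "X"):
                state = "before_hex"  # no longer sets value to zero
            elif state == "numeric" and ch in DEC:
                state, value = "decimal", int(ch)
            elif state == "before_hex" and ch in HEX:
                state, value = "hex", int(ch, 16)
            elif state == "hex" and ch in HEX:
                value = value * 16 + int(ch, 16)
            elif state == "decimal" and ch in DEC:
                value = value * 10 + int(ch)
            elif state in ("hex", "decimal") and ch == ";":
                out.append(emitted_scalar(value))
                state = "data"
            else:
                # Anything else: emit all but the last raw character, then
                # reprocess the current character in the return (data) state.
                out.extend(raw[:-1])
                state, consumed = "data", False
        if consumed:
            i += 1
    if state != "data":  # assumption: flush an unfinished reference
        out.extend(raw)
    return "".join(out)


print(tokenize("&#x;"))    # the literal text, no U+FFFD
print(tokenize("&#x41;"))  # A
```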
 #### **Hexadecimal numeric character reference** state ####
 
 Append the current character to _raw value_.
 
 If the current character is...
 
 * '``0``'..'``9``', '``a``'..'``f``', '``A``'..'``F``':
 Let _value_ be sixteen times _value_ plus the numeric value of the
 current character interpreted as a hexadecimal digit.
 
 * '``;``': Consume the character. If _value_ is between 0x0001 and
 0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive,
 run the _emitting operation_ with a unicode character having the
 scalar value _value_; otherwise, run the _emitting operation_ with
 the character U+FFFD. Then, in either case, switch to the _return
 state_.
 
 * Anything else: Run the _emitting operation_ for all but the last
-character in _raw value_, and switch to the **data state** without
+character in _raw value_, and switch to the _return state_ without
 consuming the current character.
 
 
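The hexadecimal state builds _value_ as sixteen times _value_ plus each digit, then range-checks it on `;`. The Python sketch below applies that fold to the digit string of a complete hexadecimal reference. The function name `hex_reference` is hypothetical. Starting from zero is equivalent to seeding _value_ from the first digit in the before-hexadecimal state, because 16 * 0 + d = d.

```python
def hex_reference(digits: str) -> str:
    """Character emitted for '&#x' + digits + ';' (hypothetical helper)."""
    if not digits or any(d not in "0123456789abcdefABCDEF" for d in digits):
        raise ValueError("digits must be a non-empty string of hex digits")
    value = 0
    for d in digits:
        value = 16 * value + int(d, 16)  # sixteen times value plus the digit
    # Accept only Unicode scalar values: 1..0x10FFFF, excluding surrogates.
    if 0x0001 <= value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF:
        return chr(value)
    return "\uFFFD"


print(hex(ord(hex_reference("1F600"))))   # 0x1f600
print(hex_reference("D800") == "\uFFFD")  # True: surrogates are replaced
```

Python integers are unbounded, so very long references such as `&#xFFFFFFFFFFFF;` are harmless here. An implementation that uses fixed-width integers would need to stop growing _value_ once it exceeds 0x10FFFF; the spec text in this hunk does not mention overflow.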
 #### **Decimal numeric character reference** state ####
 
 Append the current character to _raw value_.
 
 If the current character is...
 
 * '``0``'..'``9``': Let _value_ be ten times _value_ plus the
 numeric value of the current character interpreted as a decimal
 digit.
 
 * '``;``': Consume the character. If _value_ is between 0x0001 and
 0x10FFFF inclusive, but is not between 0xD800 and 0xDFFF inclusive,
 run the _emitting operation_ with a unicode character having the
 scalar value _value_; otherwise, run the _emitting operation_ with
 the character U+FFFD. Then, in either case, switch to the _return
 state_.
 
 * Anything else: Run the _emitting operation_ for all but the last
-character in _raw value_, and switch to the **data state** without
+character in _raw value_, and switch to the _return state_ without
 consuming the current character.
 
 
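The decimal state uses the same fold with base ten. The sketch below (function name hypothetical) also shows one way a fixed-width implementation could avoid overflow: saturating _value_ at 0x110000. This is an implementation choice, not something the spec text here prescribes. It is safe because _value_ never decreases as digits are appended. Once it passes 0x10FFFF, the `;` branch emits U+FFFD regardless.

```python
LIMIT = 0x110000  # smallest value outside the Unicode code space


def decimal_reference(digits: str) -> str:
    """Character emitted for '&#' + digits + ';' (hypothetical helper).

    _value_ saturates at LIMIT, which never changes the result but keeps
    the accumulator bounded.
    """
    if not digits or any(d not in "0123456789" for d in digits):
        raise ValueError("digits must be a non-empty string of decimal digits")
    value = 0
    for d in digits:
        value = min(10 * value + int(d), LIMIT)  # ten times value plus digit
    if 0x0001 <= value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF:
        return chr(value)
    return "\uFFFD"


print(decimal_reference("65"))                    # A
print(decimal_reference("9" * 40) == "\uFFFD")  # True
```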
 #### **Named character reference** state ####
 
 Append the current character to _raw value_.
 
 If the current character is...
 
 * '``;``': Consume the character.
(...skipping 13 matching lines...)
@@ -728,21 +744,21 @@
 
 - '``"``: Emit Run the _emitting operation_ for the character
 '``"``'.
 
 Then, switch to the _return state_.
 
 * '``0``'..'``9``', '``a``'..'``z``', '``A``'..'``Z``': Consume the
 character and stay in this state.
 
 * Anything else: Run the _emitting operation_ for all but the last
-character in _raw value_, and switch to the **data state** without
+character in _raw value_, and switch to the _return state_ without
 consuming the current character.
 
 
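In the **named character reference** state, letters and digits keep accumulating in _raw value_. Any character other than `;` triggers the same flush-and-return fallback as the numeric states. The rules that map a name to a character, such as the `"` case shown above, sit in the lines elided from this diff. This Python sketch therefore takes them as a caller-supplied `resolve` function. That parameter, the function names, and the `&amp;` rule in `example_resolve` are all hypothetical.

```python
from typing import Callable, List, Tuple

ALNUM = frozenset("0123456789"
                  "abcdefghijklmnopqrstuvwxyz"
                  "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
Emit = Callable[[str], None]


def named_character_reference_state(
        ch: str, raw_value: List[str], return_state: str, emit: Emit,
        resolve: Callable[[List[str], Emit], None]) -> Tuple[str, bool]:
    """Sketch of the **named character reference** state.

    Returns (next_state, consumed). Assumes `ch` is already the last
    element of raw_value.
    """
    if ch == ";":
        resolve(raw_value, emit)  # stands in for the elided name rules
        return return_state, True
    if ch in ALNUM:
        return "named character reference", True
    # Anything else: emit all but the last raw character, reconsume `ch`.
    for c in raw_value[:-1]:
        emit(c)
    return return_state, False


def example_resolve(raw_value: List[str], emit: Emit) -> None:
    # Hypothetical rule table for illustration only.
    table = {"&amp;": "&"}
    text = "".join(raw_value)
    for c in table.get(text, text):
        emit(c)


out: List[str] = []
print(named_character_reference_state(";", list("&amp;"), "data", out.append, example_resolve), out)
```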
 Token cleanup stage
 -------------------
 
 Replace each sequence of character tokens with a single string token
 whose value is the concatenation of all the characters in the
 character tokens.
 
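The token cleanup stage is a run-length merge over the token sequence. Below is a minimal Python sketch using `itertools.groupby`. The token classes are hypothetical stand-ins, since the token representation is defined elsewhere in the spec.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Union


@dataclass(frozen=True)
class CharacterToken:
    char: str


@dataclass(frozen=True)
class StringToken:
    value: str


@dataclass(frozen=True)
class OtherToken:  # stand-in for tag tokens and the like
    name: str


Token = Union[CharacterToken, StringToken, OtherToken]


def cleanup(tokens: List[Token]) -> List[Token]:
    """Replace each run of character tokens with one string token."""
    result: List[Token] = []
    for is_char, run in groupby(tokens, key=lambda t: isinstance(t, CharacterToken)):
        if is_char:
            result.append(StringToken("".join(t.char for t in run)))
        else:
            result.extend(run)
    return result


tokens = [CharacterToken("h"), CharacterToken("i"), OtherToken("br"), CharacterToken("!")]
print(cleanup(tokens))
# [StringToken(value='hi'), OtherToken(name='br'), StringToken(value='!')]
```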
(...skipping 41 matching lines...)
@@ -790,10 +806,10 @@
 there isn't one, skip this token.
 2. If there's a ``template`` element in the _stack of open
 nodes_ above _node_, then skip this token.
 3. Pop nodes from the _stack of open nodes_ until _node_ has been
 popped.
 4. If _node_'s tag name is ``script``, then yield until there
 are no pending import loads, then execute the script given by
 the element's contents.
 3. Yield until there are no pending import loads.
 3. Fire a ``load`` event at the _parsing context_ object.
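The end-tag steps above can be sketched in Python as follows. The first step is cut off in this diff, so the sketch assumes it selects the topmost node on the _stack of open nodes_ whose tag name matches the end tag. Several other pieces are hypothetical: the `Node` class, and the list-as-stack representation with the top of the stack at the end. The same goes for the `wait_for_imports` and `run_script` callbacks, which stand in for "yield until there are no pending import loads" and for executing the script.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    tag_name: str
    text: str = ""  # stand-in for the element's contents


def handle_end_tag(tag_name: str, stack: List[Node],
                   wait_for_imports: Callable[[], None],
                   run_script: Callable[[str], None]) -> bool:
    """Return True if the end tag was processed, False if it was skipped."""
    # Step 1 (assumed; its start is elided): topmost node with this tag name.
    index: Optional[int] = None
    for i in range(len(stack) - 1, -1, -1):
        if stack[i].tag_name == tag_name:
            index = i
            break
    if index is None:
        return False  # there isn't one: skip this token
    # Step 2: a template element above the node means skip.
    if any(n.tag_name == "template" for n in stack[index + 1:]):
        return False
    # Step 3: pop until the node has been popped.
    node = stack[index]
    del stack[index:]
    # Step 4: scripts wait for pending imports, then run.
    if node.tag_name == "script":
        wait_for_imports()
        run_script(node.text)
    return True
```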