| OLD | NEW |
| 1 # SpecialCasing-6.3.0.txt | 1 # SpecialCasing-7.0.0.txt |
| 2 # Date: 2013-05-08, 13:54:51 GMT [MD] | 2 # Date: 2014-03-18, 07:18:02 GMT [MD] |
| 3 # | 3 # |
| 4 # Unicode Character Database | 4 # Unicode Character Database |
| 5 # Copyright (c) 1991-2013 Unicode, Inc. | 5 # Copyright (c) 1991-2014 Unicode, Inc. |
| 6 # For terms of use, see http://www.unicode.org/terms_of_use.html | 6 # For terms of use, see http://www.unicode.org/terms_of_use.html |
| 7 # For documentation, see http://www.unicode.org/reports/tr44/ | 7 # For documentation, see http://www.unicode.org/reports/tr44/ |
| 8 # | 8 # |
| 9 # Special Casing Properties | 9 # Special Casing |
| 10 # | 10 # |
| 11 # This file is a supplement to the UnicodeData file. | 11 # This file is a supplement to the UnicodeData.txt file. It does not define any |
| 12 # It contains additional information about the casing of Unicode characters. | 12 # properties, but rather provides additional information about the casing of |
| 13 # (For compatibility, the UnicodeData.txt file only contains case mappings for | 13 # Unicode characters, for situations when casing incurs a change in string length |
| 14 # characters where they are 1-1, and independent of context and language. | 14 # or is dependent on context or locale. For compatibility, the UnicodeData.txt |
| 15 # For more information, see the discussion of Case Mappings in the Unicode Standard. | 15 # file only contains simple case mappings for characters where they are one-to-one |
| 16 # and independent of context and language. The data in this file, combined with |
| 17 # the simple case mappings in UnicodeData.txt, defines the full case mappings |
| 18 # Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc). |
| 19 # |
| 20 # Note that the preferred mechanism for defining tailored casing operations is |
| 21 # the Unicode Common Locale Data Repository (CLDR). For more information, see the |
| 22 # discussion of case mappings and case algorithms in the Unicode Standard. |
| 16 # | 23 # |
| 17 # All code points not listed in this file that do not have a simple case mappings | 24 # All code points not listed in this file that do not have a simple case mappings |
| 18 # in UnicodeData.txt map to themselves. | 25 # in UnicodeData.txt map to themselves. |
| 19 # ================================================================================ | 26 # ================================================================================ |
| 20 # Format | 27 # Format |
| 21 # ================================================================================ | 28 # ================================================================================ |
| 22 # The entries in this file are in the following machine-readable format: | 29 # The entries in this file are in the following machine-readable format: |
| 23 # | 30 # |
| 24 # <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment> | 31 # <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment> |
| 25 # | 32 # |
| 26 # <code>, <lower>, <title>, and <upper> provide character values in hex. If there is more | 33 # <code>, <lower>, <title>, and <upper> provide the respective full case mappings |
| 27 # than one character, they are separated by spaces. Other than as used to separate | 34 # of <code>, expressed as character values in hex. If there is more than one character, |
| 28 # elements, spaces are to be ignored. | 35 # they are separated by spaces. Other than as used to separate elements, spaces are |
| 36 # to be ignored. |
| 29 # | 37 # |
| 30 # The <condition_list> is optional. Where present, it consists of one or more language IDs | 38 # The <condition_list> is optional. Where present, it consists of one or more language IDs |
| 31 # or contexts, separated by spaces. In these conditions: | 39 # or casing contexts, separated by spaces. In these conditions: |
| 32 # - A condition list overrides the normal behavior if all of the listed conditions are true. | 40 # - A condition list overrides the normal behavior if all of the listed conditions are true. |
| 33 # - The context is always the context of the characters in the original string, | 41 # - The casing context is always the context of the characters in the original string, |
| 34 # NOT in the resulting string. | 42 # NOT in the resulting string. |
| 35 # - Case distinctions in the condition list are not significant. | 43 # - Case distinctions in the condition list are not significant. |
| 36 # - Conditions preceded by "Not_" represent the negation of the condition. | 44 # - Conditions preceded by "Not_" represent the negation of the condition. |
| 37 # The condition list is not represented in the UCD as a formal property. | 45 # The condition list is not represented in the UCD as a formal property. |
| 38 # | 46 # |
| 39 # A language ID is defined by BCP 47, with '-' and '_' treated equivalently. | 47 # A language ID is defined by BCP 47, with '-' and '_' treated equivalently. |
| 40 # | 48 # |
| 41 # A context for a character C is defined by Section 3.13 Default Case | 49 # A casing context for a character is defined by Section 3.13 Default Case Algorithms |
| 42 # Algorithms, of The Unicode Standard, Version 6.3. | 50 # of The Unicode Standard. |
| 43 # (This is identical to the context defined by Unicode 4.1.0, | |
| 44 # as specified in http://www.unicode.org/versions/Unicode4.1.0/) | |
| 45 # | 51 # |
| 46 # Parsers of this file must be prepared to deal with future additions to this format: | 52 # Parsers of this file must be prepared to deal with future additions to this format: |
| 47 # * Additional contexts | 53 # * Additional contexts |
| 48 # * Additional fields | 54 # * Additional fields |
| 49 # ================================================================================ | 55 # ================================================================================ |
| 50 | 56 |
| 51 # @missing: 0000..10FFFF; <slc>; <stc>; <suc>; | |
| 52 | |
| 53 # ================================================================================ | 57 # ================================================================================ |
| 54 # Unconditional mappings | 58 # Unconditional mappings |
| 55 # ================================================================================ | 59 # ================================================================================ |
| 56 | 60 |
| 57 # The German es-zed is special--the normal mapping is to SS. | 61 # The German es-zed is special--the normal mapping is to SS. |
| 58 # Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>)) | 62 # Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>)) |
| 59 | 63 |
| 60 00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S | 64 00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S |
| 61 | 65 |
| 62 # Preserve canonical equivalence for I with dot. Turkic is handled below. | 66 # Preserve canonical equivalence for I with dot. Turkic is handled below. |
| (...skipping 44 matching lines...) |
| 107 1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI | 111 1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI |
| 108 1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI | 112 1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI |
| 109 1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI | 113 1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI |
| 110 | 114 |
| 111 # IMPORTANT-when iota-subscript (0345) is uppercased or titlecased, | 115 # IMPORTANT-when iota-subscript (0345) is uppercased or titlecased, |
| 112 # the result will be incorrect unless the iota-subscript is moved to the end | 116 # the result will be incorrect unless the iota-subscript is moved to the end |
| 113 # of any sequence of combining marks. Otherwise, the accents will go on the capital iota. | 117 # of any sequence of combining marks. Otherwise, the accents will go on the capital iota. |
| 114 # This process can be achieved by first transforming the text to NFC before casing. | 118 # This process can be achieved by first transforming the text to NFC before casing. |
| 115 # E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA> | 119 # E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA> |
| 116 | 120 |
| 117 # The following cases are already in the UnicodeData file, so are only commented here. | 121 # The following cases are already in the UnicodeData.txt file, so are only commented here. |
| 118 | 122 |
| 119 # 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI | 123 # 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI |
| 120 | 124 |
| 121 # All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript) | 125 # All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript) |
| 122 # have special uppercases. | 126 # have special uppercases. |
| 123 # Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase! | 127 # Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase! |
| 124 | 128 |
| 125 1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI | 129 1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI |
| 126 1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI | 130 1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI |
| 127 1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI | 131 1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI |
| (...skipping 70 matching lines...) |
| 198 # Language-Insensitive Mappings | 202 # Language-Insensitive Mappings |
| 199 # These are characters whose full case mappings do not depend on language, but do | 203 # These are characters whose full case mappings do not depend on language, but do |
| 200 # depend on context (which characters come before or after). For more information | 204 # depend on context (which characters come before or after). For more information |
| 201 # see the header of this file and the Unicode Standard. | 205 # see the header of this file and the Unicode Standard. |
| 202 # ================================================================================ | 206 # ================================================================================ |
| 203 | 207 |
| 204 # Special case for final form of sigma | 208 # Special case for final form of sigma |
| 205 | 209 |
| 206 03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA | 210 03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA |
| 207 | 211 |
| 208 # Note: the following cases for non-final are already in the UnicodeData file. | 212 # Note: the following cases for non-final are already in the UnicodeData.txt file. |
| 209 | 213 |
| 210 # 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA | 214 # 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA |
| 211 # 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA | 215 # 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA |
| 212 # 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA | 216 # 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA |
| 213 | 217 |
| 214 # Note: the following cases are not included, since they would case-fold in lowercasing | 218 # Note: the following cases are not included, since they would case-fold in lowercasing |
| 215 | 219 |
| 216 # 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA | 220 # 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA |
| 217 # 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA | 221 # 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA |
| 218 | 222 |
| (...skipping 42 matching lines...) |
| 261 # When lowercasing, unless an I is before a dot_above, it turns into a dotless i. | 265 # When lowercasing, unless an I is before a dot_above, it turns into a dotless i. |
| 262 | 266 |
| 263 0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I | 267 0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I |
| 264 0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I | 268 0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I |
| 265 | 269 |
| 266 # When uppercasing, i turns into a dotted capital I | 270 # When uppercasing, i turns into a dotted capital I |
| 267 | 271 |
| 268 0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I | 272 0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I |
| 269 0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I | 273 0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I |
| 270 | 274 |
| 271 # Note: the following case is already in the UnicodeData file. | 275 # Note: the following case is already in the UnicodeData.txt file. |
| 272 | 276 |
| 273 # 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I | 277 # 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I |
| 274 | 278 |
| 275 # EOF | 279 # EOF |
| 276 | 280 |
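
The Format section of the file header describes entries of the form `<code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment>`. As a rough illustration only, the sketch below shows one way such a line might be parsed in Python. The names `SpecialCasingEntry` and `parse_special_casing_line` are hypothetical and are not part of the Unicode data files or of any library. Lowercasing the conditions and folding `-` into `_` is one possible reading of the header's notes that case distinctions are not significant and that `-` and `_` are treated equivalently in language IDs. Fields beyond the condition list are ignored, since the header warns parsers to expect additional fields in the future.

```python
from typing import NamedTuple, Optional, Tuple


class SpecialCasingEntry(NamedTuple):
    """One data line of SpecialCasing.txt (hypothetical container)."""
    code: int
    lower: Tuple[int, ...]
    title: Tuple[int, ...]
    upper: Tuple[int, ...]
    conditions: Tuple[str, ...]


def _hex_sequence(field: str) -> Tuple[int, ...]:
    # Characters within a field are hex values separated by spaces.
    return tuple(int(token, 16) for token in field.split())


def parse_special_casing_line(line: str) -> Optional[SpecialCasingEntry]:
    """Parse one line; return None for blank or comment-only lines."""
    # Everything after '#' is a comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None

    fields = [field.strip() for field in data.split(";")]
    # The terminating ';' leaves an empty trailing field; drop it.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}: {line!r}")

    # Assumption: normalize conditions by lowercasing and mapping '-' to '_'.
    conditions: Tuple[str, ...] = ()
    if len(fields) > 4:
        conditions = tuple(
            cond.lower().replace("-", "_") for cond in fields[4].split()
        )

    # Any fields after the condition list are ignored (future additions).
    return SpecialCasingEntry(
        code=int(fields[0], 16),
        lower=_hex_sequence(fields[1]),
        title=_hex_sequence(fields[2]),
        upper=_hex_sequence(fields[3]),
        conditions=conditions,
    )
```

For example, passing the sharp s line from the diff (`00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S`) yields an entry with an empty condition list, while the Turkic line for `0049` yields the two conditions `tr` and `not_before_dot`. Evaluating contexts such as `Final_Sigma` against the surrounding text is a separate step that this sketch does not attempt.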