OLD | NEW |
1 # SpecialCasing-6.3.0.txt | 1 # SpecialCasing-7.0.0.txt |
2 # Date: 2013-05-08, 13:54:51 GMT [MD] | 2 # Date: 2014-03-18, 07:18:02 GMT [MD] |
3 # | 3 # |
4 # Unicode Character Database | 4 # Unicode Character Database |
5 # Copyright (c) 1991-2013 Unicode, Inc. | 5 # Copyright (c) 1991-2014 Unicode, Inc. |
6 # For terms of use, see http://www.unicode.org/terms_of_use.html | 6 # For terms of use, see http://www.unicode.org/terms_of_use.html |
7 # For documentation, see http://www.unicode.org/reports/tr44/ | 7 # For documentation, see http://www.unicode.org/reports/tr44/ |
8 # | 8 # |
9 # Special Casing Properties | 9 # Special Casing |
10 # | 10 # |
11 # This file is a supplement to the UnicodeData file. | 11 # This file is a supplement to the UnicodeData.txt file. It does not define any |
12 # It contains additional information about the casing of Unicode characters. | 12 # properties, but rather provides additional information about the casing of |
13 # (For compatibility, the UnicodeData.txt file only contains case mappings for | 13 # Unicode characters, for situations when casing incurs a change in string length |
14 # characters where they are 1-1, and independent of context and language. | 14 # or is dependent on context or locale. For compatibility, the UnicodeData.txt |
15 # For more information, see the discussion of Case Mappings in the Unicode Standard. | 15 # file only contains simple case mappings for characters where they are one-to-one |
| 16 # and independent of context and language. The data in this file, combined with |
| 17 # the simple case mappings in UnicodeData.txt, defines the full case mappings |
| 18 # Lowercase_Mapping (lc), Titlecase_Mapping (tc), and Uppercase_Mapping (uc). |
| 19 # |
| 20 # Note that the preferred mechanism for defining tailored casing operations is |
| 21 # the Unicode Common Locale Data Repository (CLDR). For more information, see the |
| 22 # discussion of case mappings and case algorithms in the Unicode Standard. |
16 # | 23 # |
17 # All code points not listed in this file that do not have a simple case mappings | 24 # All code points not listed in this file that do not have a simple case mappings |
18 # in UnicodeData.txt map to themselves. | 25 # in UnicodeData.txt map to themselves. |
19 # ================================================================================ | 26 # ================================================================================ |
20 # Format | 27 # Format |
21 # ================================================================================ | 28 # ================================================================================ |
22 # The entries in this file are in the following machine-readable format: | 29 # The entries in this file are in the following machine-readable format: |
23 # | 30 # |
24 # <code>; <lower> ; <title> ; <upper> ; (<condition_list> ;)? # <comment> | 31 # <code>; <lower>; <title>; <upper>; (<condition_list>;)? # <comment> |
25 # | 32 # |
26 # <code>, <lower>, <title>, and <upper> provide character values in hex. If there is more | 33 # <code>, <lower>, <title>, and <upper> provide the respective full case mappings |
27 # than one character, they are separated by spaces. Other than as used to separate | 34 # of <code>, expressed as character values in hex. If there is more than one character, |
28 # elements, spaces are to be ignored. | 35 # they are separated by spaces. Other than as used to separate elements, spaces are |
| 36 # to be ignored. |
29 # | 37 # |
30 # The <condition_list> is optional. Where present, it consists of one or more language IDs | 38 # The <condition_list> is optional. Where present, it consists of one or more language IDs |
31 # or contexts, separated by spaces. In these conditions: | 39 # or casing contexts, separated by spaces. In these conditions: |
32 # - A condition list overrides the normal behavior if all of the listed conditions are true. | 40 # - A condition list overrides the normal behavior if all of the listed conditions are true. |
33 # - The context is always the context of the characters in the original string, | 41 # - The casing context is always the context of the characters in the original string, |
34 # NOT in the resulting string. | 42 # NOT in the resulting string. |
35 # - Case distinctions in the condition list are not significant. | 43 # - Case distinctions in the condition list are not significant. |
36 # - Conditions preceded by "Not_" represent the negation of the condition. | 44 # - Conditions preceded by "Not_" represent the negation of the condition. |
37 # The condition list is not represented in the UCD as a formal property. | 45 # The condition list is not represented in the UCD as a formal property. |
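The condition rules listed above can be sketched in Python as follows. This is only an illustration, not part of the data file: the function name `conditions_hold`, the `CASING_CONTEXTS` set (the context names from the Unicode Standard's casing context table), and the decision to compare only the primary language subtag are assumptions made for this example.

```python
# Casing context names, lowercased because case is not significant.
# Assumption: anything not in this set is treated as a language ID.
CASING_CONTEXTS = {
    "final_sigma", "after_soft_dotted", "more_above", "before_dot", "after_i",
}

def conditions_hold(condition_list, language, true_contexts):
    """Return True if every condition in the list is satisfied.

    condition_list: raw field text, e.g. "tr Not_Before_Dot"
    language:       language ID of the text, e.g. "tr-TR"
    true_contexts:  casing contexts that hold for the character
                    in the ORIGINAL string
    """
    active = {c.lower() for c in true_contexts}
    # '-' and '_' are equivalent; only the primary subtag is compared here.
    primary = language.lower().replace("-", "_").split("_")[0]
    for cond in condition_list.split():
        name = cond.lower()
        negated = name.startswith("not_")
        if negated:
            name = name[len("not_"):]
        if name in CASING_CONTEXTS:
            holds = name in active
        else:
            holds = name == primary
        if holds == negated:
            return False
    return True
```

An empty condition list is trivially satisfied, which matches the unconditional entries.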
38 # | 46 # |
39 # A language ID is defined by BCP 47, with '-' and '_' treated equivalently. | 47 # A language ID is defined by BCP 47, with '-' and '_' treated equivalently. |
40 # | 48 # |
41 # A context for a character C is defined by Section 3.13 Default Case | 49 # A casing context for a character is defined by Section 3.13 Default Case Algorithms |
42 # Algorithms, of The Unicode Standard, Version 6.3. | 50 # of The Unicode Standard. |
43 # (This is identical to the context defined by Unicode 4.1.0, | |
44 # as specified in http://www.unicode.org/versions/Unicode4.1.0/) | |
45 # | 51 # |
46 # Parsers of this file must be prepared to deal with future additions to this format: | 52 # Parsers of this file must be prepared to deal with future additions to this format: |
47 # * Additional contexts | 53 # * Additional contexts |
48 # * Additional fields | 54 # * Additional fields |
49 # ================================================================================ | 55 # ================================================================================ |
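A minimal Python sketch of a parser for the entry format described in this header might look like the following. The names `SpecialCasingEntry` and `parse_special_casing_line` are hypothetical, and silently ignoring any fields after the condition list is one assumed way of tolerating future additional fields.

```python
from typing import NamedTuple, Optional, Tuple

class SpecialCasingEntry(NamedTuple):
    code: int
    lower: Tuple[int, ...]
    title: Tuple[int, ...]
    upper: Tuple[int, ...]
    conditions: Tuple[str, ...]

def parse_special_casing_line(line: str) -> Optional[SpecialCasingEntry]:
    """Parse one line; return None for blank and comment-only lines."""
    # Everything from '#' onward is a comment.
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    # Each data line ends with ';', so the last split element is empty.
    if fields[-1] == "":
        fields.pop()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields: {line!r}")
    code = int(fields[0], 16)
    # Multi-character mappings are space-separated hex code points.
    lower, title, upper = (
        tuple(int(cp, 16) for cp in f.split()) for f in fields[1:4]
    )
    # Optional condition list: case-insensitive, '-' and '_' equivalent.
    conditions: Tuple[str, ...] = ()
    if len(fields) > 4:
        conditions = tuple(
            c.lower().replace("-", "_") for c in fields[4].split()
        )
    # Fields beyond the fifth are ignored (possible future additions).
    return SpecialCasingEntry(code, lower, title, upper, conditions)
```

For example, parsing the sharp s entry yields a two-code-point uppercase mapping and an empty condition tuple.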
50 | 56 |
51 # @missing: 0000..10FFFF; <slc>; <stc>; <suc>; | |
52 | |
53 # ================================================================================ | 57 # ================================================================================ |
54 # Unconditional mappings | 58 # Unconditional mappings |
55 # ================================================================================ | 59 # ================================================================================ |
56 | 60 |
57 # The German es-zed is special--the normal mapping is to SS. | 61 # The German es-zed is special--the normal mapping is to SS. |
58 # Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>)) | 62 # Note: the titlecase should never occur in practice. It is equal to titlecase(uppercase(<es-zed>)) |
59 | 63 |
60 00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S | 64 00DF; 00DF; 0053 0073; 0053 0053; # LATIN SMALL LETTER SHARP S |
61 | 65 |
62 # Preserve canonical equivalence for I with dot. Turkic is handled below. | 66 # Preserve canonical equivalence for I with dot. Turkic is handled below. |
(...skipping 44 matching lines...)
107 1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI | 111 1FE6; 1FE6; 03A5 0342; 03A5 0342; # GREEK SMALL LETTER UPSILON WITH PERISPOMENI |
108 1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI | 112 1FE7; 1FE7; 03A5 0308 0342; 03A5 0308 0342; # GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND PERISPOMENI |
109 1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI | 113 1FF6; 1FF6; 03A9 0342; 03A9 0342; # GREEK SMALL LETTER OMEGA WITH PERISPOMENI |
110 | 114 |
111 # IMPORTANT-when iota-subscript (0345) is uppercased or titlecased, | 115 # IMPORTANT-when iota-subscript (0345) is uppercased or titlecased, |
112 # the result will be incorrect unless the iota-subscript is moved to the end | 116 # the result will be incorrect unless the iota-subscript is moved to the end |
113 # of any sequence of combining marks. Otherwise, the accents will go on the capital iota. | 117 # of any sequence of combining marks. Otherwise, the accents will go on the capital iota. |
114 # This process can be achieved by first transforming the text to NFC before casing. | 118 # This process can be achieved by first transforming the text to NFC before casing. |
115 # E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA> | 119 # E.g. <alpha><iota_subscript><acute> is uppercased to <ALPHA><acute><IOTA> |
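The effect of normalizing before casing can be illustrated with Python's standard library. This is a sketch that assumes the interpreter's `str.upper()` applies full case mappings (as CPython 3.3 and later do); the variable names are chosen for this example only.

```python
import unicodedata

# <alpha><iota_subscript><acute>, in the order used in the example above.
text = "\u03b1\u0345\u0301"

# Uppercasing the raw sequence leaves the acute after the capital iota.
naive = text.upper()

# NFC reorders the combining marks first, so the iota ends up last.
normalized = unicodedata.normalize("NFC", text).upper()

print([f"{ord(c):04X}" for c in naive])
print([f"{ord(c):04X}" for c in normalized])
```

In the normalized result the accent stays with the alpha (here as a precomposed letter) and the capital iota follows it.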
116 | 120 |
117 # The following cases are already in the UnicodeData file, so are only commented here. | 121 # The following cases are already in the UnicodeData.txt file, so are only commented here. |
118 | 122 |
119 # 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI | 123 # 0345; 0345; 0345; 0399; # COMBINING GREEK YPOGEGRAMMENI |
120 | 124 |
121 # All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript) | 125 # All letters with YPOGEGRAMMENI (iota-subscript) or PROSGEGRAMMENI (iota adscript) |
122 # have special uppercases. | 126 # have special uppercases. |
123 # Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase! | 127 # Note: characters with PROSGEGRAMMENI are actually titlecase, not uppercase! |
124 | 128 |
125 1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI | 129 1F80; 1F80; 1F88; 1F08 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND YPOGEGRAMMENI |
126 1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI | 130 1F81; 1F81; 1F89; 1F09 0399; # GREEK SMALL LETTER ALPHA WITH DASIA AND YPOGEGRAMMENI |
127 1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI | 131 1F82; 1F82; 1F8A; 1F0A 0399; # GREEK SMALL LETTER ALPHA WITH PSILI AND VARIA AND YPOGEGRAMMENI |
(...skipping 70 matching lines...)
198 # Language-Insensitive Mappings | 202 # Language-Insensitive Mappings |
199 # These are characters whose full case mappings do not depend on language, but do | 203 # These are characters whose full case mappings do not depend on language, but do |
200 # depend on context (which characters come before or after). For more information | 204 # depend on context (which characters come before or after). For more information |
201 # see the header of this file and the Unicode Standard. | 205 # see the header of this file and the Unicode Standard. |
202 # ================================================================================ | 206 # ================================================================================ |
203 | 207 |
204 # Special case for final form of sigma | 208 # Special case for final form of sigma |
205 | 209 |
206 03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA | 210 03A3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK CAPITAL LETTER SIGMA |
207 | 211 |
208 # Note: the following cases for non-final are already in the UnicodeData file. | 212 # Note: the following cases for non-final are already in the UnicodeData.txt file. |
209 | 213 |
210 # 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA | 214 # 03A3; 03C3; 03A3; 03A3; # GREEK CAPITAL LETTER SIGMA |
211 # 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA | 215 # 03C3; 03C3; 03A3; 03A3; # GREEK SMALL LETTER SIGMA |
212 # 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA | 216 # 03C2; 03C2; 03A3; 03A3; # GREEK SMALL LETTER FINAL SIGMA |
213 | 217 |
214 # Note: the following cases are not included, since they would case-fold in lowercasing | 218 # Note: the following cases are not included, since they would case-fold in lowercasing |
215 | 219 |
216 # 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA | 220 # 03C3; 03C2; 03A3; 03A3; Final_Sigma; # GREEK SMALL LETTER SIGMA |
217 # 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA | 221 # 03C2; 03C3; 03A3; 03A3; Not_Final_Sigma; # GREEK SMALL LETTER FINAL SIGMA |
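As a small illustration of the Final_Sigma entry, the following Python sketch relies on CPython's built-in `str.lower()`, which is assumed here to apply this context-sensitive rule; the variable names are hypothetical.

```python
# Capital sigma at the end of a word lowercases to final sigma (U+03C2);
# elsewhere it lowercases to the ordinary small sigma (U+03C3).
word = "\u039f\u0394\u039f\u03a3"      # capital omicron, delta, omicron, sigma
lowered = word.lower()

# A sigma at the start of a word is not in Final_Sigma context.
mixed = "\u03a3\u0391\u03a3".lower()   # capital sigma, alpha, sigma

print([f"{ord(c):04X}" for c in lowered])
print([f"{ord(c):04X}" for c in mixed])
```

Only the last sigma of each word takes the final form; the context is evaluated on the original string, not on the result.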
218 | 222 |
(...skipping 42 matching lines...)
261 # When lowercasing, unless an I is before a dot_above, it turns into a dotless i. | 265 # When lowercasing, unless an I is before a dot_above, it turns into a dotless i. |
262 | 266 |
263 0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I | 267 0049; 0131; 0049; 0049; tr Not_Before_Dot; # LATIN CAPITAL LETTER I |
264 0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I | 268 0049; 0131; 0049; 0049; az Not_Before_Dot; # LATIN CAPITAL LETTER I |
265 | 269 |
266 # When uppercasing, i turns into a dotted capital I | 270 # When uppercasing, i turns into a dotted capital I |
267 | 271 |
268 0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I | 272 0069; 0069; 0130; 0130; tr; # LATIN SMALL LETTER I |
269 0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I | 273 0069; 0069; 0130; 0130; az; # LATIN SMALL LETTER I |
270 | 274 |
271 # Note: the following case is already in the UnicodeData file. | 275 # Note: the following case is already in the UnicodeData.txt file. |
272 | 276 |
273 # 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I | 277 # 0131; 0131; 0049; 0049; tr; # LATIN SMALL LETTER DOTLESS I |
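Python's built-in casing is not language-sensitive, so the tr/az uppercase rule above has to be applied separately. A minimal sketch, using a hypothetical helper name `turkic_upper` and covering only the uppercase direction:

```python
def turkic_upper(text: str) -> str:
    """Uppercase text using the tr/az mapping for small letter i.

    Assumption: the caller has already decided that the text is
    Turkish or Azerbaijani; no language detection is done here.
    """
    # i (U+0069) uppercases to dotted capital I (U+0130) in tr/az.
    # Dotless i (U+0131) already maps to I (U+0049) by default.
    return text.replace("i", "\u0130").upper()

print(turkic_upper("istanbul"))
print("istanbul".upper())
```

Lowercasing needs the Not_Before_Dot context check as well, which this sketch does not attempt.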
274 | 278 |
275 # EOF | 279 # EOF |
276 | 280 |