Chromium Code Reviews

Side by Side Diff: source/data/mappings/convrtrs.txt

Issue 598383002: Make all the single byte encodings compliant to the encoding spec. (Closed) Base URL: svn://svn.chromium.org/chrome/trunk/deps/third_party/icu52/
Patch Set: Created 6 years, 2 months ago
OLD NEW
1 # ****************************************************************************** 1 # ******************************************************************************
2 # * 2 # *
3 # * Copyright (C) 1995-2013, International Business Machines 3 # * Copyright (C) 1995-2014, International Business Machines
4 # * Corporation and others. All Rights Reserved. 4 # * Corporation and others. All Rights Reserved.
5 # * 5 # *
6 # ****************************************************************************** 6 # ******************************************************************************
7 7
8 # If this converter alias table looks very confusing, a much easier to 8 # If this converter alias table looks very confusing, a much easier to
9 # understand view can be found at this demo: 9 # understand view can be found at this demo:
10 # http://demo.icu-project.org/icu-bin/convexp 10 # http://demo.icu-project.org/icu-bin/convexp
11 11
12 # IMPORTANT NOTE 12 # IMPORTANT NOTE
13 # 13 #
(...skipping 95 matching lines...)
109 109
110 # The immediately following list is the affinity list of supported standard tags. 110 # The immediately following list is the affinity list of supported standard tags.
111 # When multiple converters have the same alias under different standards, 111 # When multiple converters have the same alias under different standards,
112 # the standard nearest to the top of this list with that alias will 112 # the standard nearest to the top of this list with that alias will
113 # be the first converter that will be opened. The ordering of the aliases 113 # be the first converter that will be opened. The ordering of the aliases
114 # after this affinity list does not affect the preferred alias, but it may 114 # after this affinity list does not affect the preferred alias, but it may
115 # affect the order of the returned list of aliases for a given converter. 115 # affect the order of the returned list of aliases for a given converter.
116 # 116 #
117 # The general ordering is from specific and frequently used to more general 117 # The general ordering is from specific and frequently used to more general
118 # or rarely used at the bottom. 118 # or rarely used at the bottom.
119 { UTR22 # Name format specified by http://www.unicode.org/unicode/reports/tr22/ 119 {
120 # ICU # Can also use ICU_FEATURE 120 UTR22 # Name format specified by http://www.unicode.org/unicode/reports/tr22/
121 IBM # The IBM CCSID number is specified by ibm-* 121 HTML # WHATWG's encoding spec; https://encoding.spec.whatwg.org
122 WINDOWS # The Microsoft code page identifier number is specified by windows-*. The rest are recognized IE names.
123 JAVA # Source: Sun JDK. Alias name case is ignored, but dashes are not ignored.
124 # GLIBC
125 # AIX
126 # DB2
127 # SOLARIS
128 # APPLE
129 # HPUX
130 IANA # Source: http://www.iana.org/assignments/character-sets 122 IANA # Source: http://www.iana.org/assignments/character-sets
131 MIME # Source: http://www.iana.org/assignments/character-sets 123 MIME # Source: http://www.iana.org/assignments/character-sets
132 # MSIE # MSIE is Internet Explorer, which can be different from Windows (From the IMultiLanguage COM interface)
133 # ZOS_USS # z/OS (os/390) Unix System Services (USS), which has NL<->LF swapping. They have the same format as the IBM tag.
134 } 124 }
135 125
136 126 UTF-8 { MIME* HTML* }
137 127 unicode-1-1-utf-8
138 # Fully algorithmic converters 128 utf8
139 129
140 UTF-8 { IANA* MIME* JAVA* WINDOWS } 130 utf-16be { MIME* HTML* }
141 ibm-1208 { IBM* } # UTF-8 with IBM PUA 131
142 ibm-1209 { IBM } # UTF-8 132 utf-16le { MIME* HTML* }
143 ibm-5304 { IBM } # Unicode 2.0, UTF-8 with IBM PUA 133 utf-16
144 ibm-5305 { IBM } # Unicode 2.0, UTF-8 134
145 ibm-13496 { IBM } # Unicode 3.0, UTF-8 with IBM PUA 135 # Keep UTF-32 entries for now until we sort out Blink's behavior when
146 ibm-13497 { IBM } # Unicode 3.0, UTF-8 136 # UTF-32 is dropped.
147 ibm-17592 { IBM } # Unicode 4.0, UTF-8 with IBM PUA
148 ibm-17593 { IBM } # Unicode 4.0, UTF-8
149 windows-65001 { WINDOWS* }
150 cp1208
151 x-UTF_8J
152 unicode-1-1-utf-8
153 unicode-2-0-utf-8
154
155 # The ICU 2.2 UTF-16/32 converters detect and write a BOM.
156 UTF-16 { IANA* MIME* JAVA* } ISO-10646-UCS-2 { IANA }
157 ibm-1204 { IBM* } # UTF-16 with IBM PUA and BOM sensitive
158 ibm-1205 { IBM } # UTF-16 BOM sensitive
159 unicode
160 csUnicode
161 ucs-2
162 # The following Unicode CCSIDs (IBM) are not valid in ICU because they are
163 # considered pure DBCS (exactly 2 bytes) of Unicode,
164 # and they are a subset of Unicode. ICU does not support their encoding structures.
165 # 1400 1401 1402 1410 1414 1415 1446 1447 1448 1449 64770 64771 65520 5496 5497 5498 9592 13688
166 UTF-16BE { IANA* MIME* JAVA* } x-utf-16be { JAVA }
167 UnicodeBigUnmarked { JAVA } # java.io name
168 ibm-1200 { IBM* } # UTF-16 BE with IBM PUA
169 ibm-1201 { IBM } # UTF-16 BE
170 ibm-13488 { IBM } # Unicode 2.0, UTF-16 BE with IBM PUA
171 ibm-13489 { IBM } # Unicode 2.0, UTF-16 BE
172 ibm-17584 { IBM } # Unicode 3.0, UTF-16 BE with IBM PUA
173 ibm-17585 { IBM } # Unicode 3.0, UTF-16 BE
174 ibm-21680 { IBM } # Unicode 4.0, UTF-16 BE with IBM PUA
175 ibm-21681 { IBM } # Unicode 4.0, UTF-16 BE
176 ibm-25776 { IBM } # Unicode 4.1, UTF-16 BE with IBM PUA
177 ibm-25777 { IBM } # Unicode 4.1, UTF-16 BE
178 ibm-29872 { IBM } # Unicode 5.0, UTF-16 BE with IBM PUA
179 ibm-29873 { IBM } # Unicode 5.0, UTF-16 BE
180 ibm-61955 { IBM } # UTF-16BE with Gaidai University (Japan) PUA
181 ibm-61956 { IBM } # UTF-16BE with Microsoft HKSCS-Big 5 PUA
182 windows-1201 { WINDOWS* }
183 cp1200
184 cp1201
185 UTF16_BigEndian
186 # ibm-5297 { IBM } # Unicode 2.0, UTF-16 (BE) (reserved, never used)
187 # iso-10646-ucs-2 { JAVA } # This is ambiguous
188 # ibm-61952 is not a valid CCSID because it's Unicode 1.1
189 # ibm-61953 is not a valid CCSID because it's Unicode 1.0
190 UTF-16LE { IANA* MIME* JAVA* } x-utf-16le { JAVA }
191 UnicodeLittleUnmarked { JAVA } # java.io name
192 ibm-1202 { IBM* } # UTF-16 LE with IBM PUA
193 ibm-1203 { IBM } # UTF-16 LE
194 ibm-13490 { IBM } # Unicode 2.0, UTF-16 LE with IBM PUA
195 ibm-13491 { IBM } # Unicode 2.0, UTF-16 LE
196 ibm-17586 { IBM } # Unicode 3.0, UTF-16 LE with IBM PUA
197 ibm-17587 { IBM } # Unicode 3.0, UTF-16 LE
198 ibm-21682 { IBM } # Unicode 4.0, UTF-16 LE with IBM PUA
199 ibm-21683 { IBM } # Unicode 4.0, UTF-16 LE
200 ibm-25778 { IBM } # Unicode 4.1, UTF-16 LE with IBM PUA
201 ibm-25779 { IBM } # Unicode 4.1, UTF-16 LE
202 ibm-29874 { IBM } # Unicode 5.0, UTF-16 LE with IBM PUA
203 ibm-29875 { IBM } # Unicode 5.0, UTF-16 LE
204 UTF16_LittleEndian
205 windows-1200 { WINDOWS* }
206
207 UTF-32 { IANA* MIME* } ISO-10646-UCS-4 { IANA } 137 UTF-32 { IANA* MIME* } ISO-10646-UCS-4 { IANA }
208 ibm-1236 { IBM* } # UTF-32 with IBM PUA and BOM sensitive
209 ibm-1237 { IBM } # UTF-32 BOM sensitive
210 csUCS4 138 csUCS4
211 ucs-4 139 ucs-4
212 UTF-32BE { IANA* } UTF32_BigEndian 140 UTF-32BE { IANA* } UTF32_BigEndian
213 ibm-1232 { IBM* } # UTF-32 BE with IBM PUA
214 ibm-1233 { IBM } # UTF-32 BE
215 ibm-9424 { IBM } # Unicode 4.1, UTF-32 BE with IBM PUA
216 UTF-32LE { IANA* } UTF32_LittleEndian 141 UTF-32LE { IANA* } UTF32_LittleEndian
217 ibm-1234 { IBM* } # UTF-32 LE, with IBM PUA 142
218 ibm-1235 { IBM } # UTF-32 LE 143 ibm866-html
219 144 IBM866 { MIME* HTML* }
220 # ICU-specific names for special uses 145 866
221 UTF16_PlatformEndian 146 cp866
222 UTF16_OppositeEndian 147 csibm866
223 148
224 UTF32_PlatformEndian 149 iso-8859-2-html
225 UTF32_OppositeEndian 150 ISO-8859-2 { MIME* HTML* }
226 151 csisolatin2
227 152 iso-ir-101
228 # Java-specific, non-Unicode-standard UTF-16 variants. 153 iso8859-2
229 # These are in the Java "Basic Encoding Set (contained in lib/rt.jar)". 154 iso88592
230 # See the "Supported Encodings" at 155 iso_8859-2
231 # http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html 156 iso_8859-2:1987
232 # or a newer version of this document. 157 l2
233 # 158 latin2
234 # Aliases marked with { JAVA* } are canonical names for java.io and java.lang APIs. 159
235 # Aliases marked with { JAVA } are canonical names for the java.nio API. 160 iso-8859-3-html
236 # 161 ISO-8859-3 { MIME* HTML* }
237 # "BOM" means the Unicode Byte Order Mark, which is the encoding-scheme-specific 162 csisolatin3
238 # byte sequence for U+FEFF. 163 iso-ir-109
239 # "Reverse BOM" means the BOM for the sibling encoding scheme with the 164 iso8859-3
240 # opposite endianness. (LE<->BE) 165 iso88593
241 166 iso_8859-3
242 # "Sixteen-bit Unicode (or UCS) Transformation Format, big-endian byte order, 167 iso_8859-3:1988
243 # with byte-order mark" 168 l3
244 # 169 latin3
245 # From Unicode: Writes BOM. 170
246 # To Unicode: Detects and consumes BOM. 171 iso-8859-4-html
247 # If there is a "reverse BOM", Java throws 172 ISO-8859-4 { MIME* HTML* }
248 # MalformedInputException: Incorrect byte-order mark. 173 csisolatin4
249 # In this case, ICU4C sets a U_ILLEGAL_ESCAPE_SEQUENCE UErrorCode value 174 iso-ir-110
250 # and a UCNV_ILLEGAL UConverterCallbackReason. 175 iso8859-4
251 UTF-16BE,version=1 UnicodeBig { JAVA* } 176 iso88594
252 177 iso_8859-4
253 # "Sixteen-bit Unicode (or UCS) Transformation Format, little-endian byte order, 178 iso_8859-4:1988
254 # with byte-order mark" 179 l4
255 # 180 latin4
256 # From Unicode: Writes BOM. 181
257 # To Unicode: Detects and consumes BOM. 182 iso-8859-5-html
258 # If there is a "reverse BOM", Java throws 183 ISO-8859-5 { MIME* HTML* }
259 # MalformedInputException: Incorrect byte-order mark. 184 csisolatincyrillic
260 # In this case, ICU4C sets a U_ILLEGAL_ESCAPE_SEQUENCE UErrorCode value 185 cyrillic
261 # and a UCNV_ILLEGAL UConverterCallbackReason. 186 iso-ir-144
262 UTF-16LE,version=1 UnicodeLittle { JAVA* } x-UTF-16LE-BOM { JAVA } 187 iso8859-5
263 188 iso88595
264 # This one is not mentioned on the "Supported Encodings" page 189 iso_8859-5
265 # but is available in Java. 190 iso_8859-5:1988
266 # In Java, this is called "Unicode" but we cannot give it that alias 191
267 # because the standard UTF-16 converter already has a "unicode" alias. 192 iso-8859-6-html
268 # 193 ISO-8859-6 { MIME* HTML* }
269 # From Unicode: Writes BOM. 194 arabic
270 # To Unicode: Detects and consumes BOM. 195 asmo-708
271 # If there is no BOM, rather than defaulting to BE, Java throws 196 csiso88596e
272 # MalformedInputException: Missing byte-order mark. 197 csiso88596i
273 # In this case, ICU4C sets a U_ILLEGAL_ESCAPE_SEQUENCE UErrorCode value 198 csisolatinarabic
274 # and a UCNV_ILLEGAL UConverterCallbackReason. 199 ecma-114
275 UTF-16,version=1 200 iso-8859-6-e
276 201 iso-8859-6-i
277 # This is the same as standard UTF-16 but always writes a big-endian byte stream, 202 iso-ir-127
278 # regardless of the platform endianness, as expected by the Java compatibility tests. 203 iso8859-6
279 # See the java.nio.charset.Charset API documentation at 204 iso88596
280 # http://java.sun.com/javase/6/docs/api/java/nio/charset/Charset.html 205 iso_8859-6
281 # or a newer version of this document. 206 iso_8859-6:1987
282 # 207
283 # From Unicode: Write BE BOM and BE bytes 208 iso-8859-7-html
284 # To Unicode: Detects and consumes BOM. Defaults to BE. 209 ISO-8859-7 { MIME* HTML* }
285 UTF-16,version=2 210 csisolatingreek
286 211 ecma-118
287 # Note: ICU does not currently support Java-specific, non-Unicode-standard UTF-32 variants. 212 elot_928
288 # Presumably, these behave analogously to the UTF-16 variants with similar names. 213 greek
289 # UTF_32BE_BOM x-UTF-32BE-BOM 214 greek8
290 # UTF_32LE_BOM x-UTF-32LE-BOM 215 iso-ir-126
291 216 iso8859-7
292 # End of Java-specific, non-Unicode-standard UTF variants. 217 iso88597
293 218 iso_8859-7
294 219 iso_8859-7:1987
295 # Chrome: Remove all the entries for UTF-7, SCSU, BOCU, CESU-8. 220 sun_eu_greek
296 221
297 # Standard iso-8859-1, which does not have the Euro update. 222 iso-8859-8-html
298 # See iso-8859-15 (latin9) for the Euro update 223 ISO-8859-8 { MIME* HTML* }
299 ISO-8859-1 { MIME* IANA JAVA* } 224 csiso88598e { MIME }
300 ibm-819 { IBM* JAVA } # This is not truely ibm-819 because it's missing the fallbacks. 225 csisolatinhebrew
301 IBM819 { IANA } 226 hebrew
302 cp819 { IANA JAVA } 227 ISO-8859-8-E
303 latin1 { IANA JAVA } 228 ISO-8859-8-I
304 8859_1 { JAVA } 229 iso-ir-138
305 csISOLatin1 { IANA JAVA } 230 iso8859-8
306 iso-ir-100 { IANA JAVA } 231 iso88598
307 ISO_8859-1:1987 { IANA* JAVA } 232 iso_8859-8
308 l1 { IANA JAVA } 233 iso_8859-8:1988
309 819 { JAVA } 234 # adding this one leads to a failure in encoding-labels.html
310 # windows-28591 { WINDOWS* } # This has odd behavior because it has the Euro update, which isn't correct. 235 # csiso88598i
311 # LATIN_1 # Old ICU name 236
312 # ANSI_X3.110-1983 # This is for a different IANA alias. This isn't iso-8859-1. 237
313 238 # This alias has to be dealt with by TextCodecICU unless
314 US-ASCII { MIME* IANA JAVA WINDOWS } 239 # multiple encodings can share a single mapping table.
315 ASCII { JAVA* IANA WINDOWS } 240 #ISO-8859-8-I { MIME* HTML* }
316 ANSI_X3.4-1968 { IANA* WINDOWS } 241 # csiso88598i
317 ANSI_X3.4-1986 { IANA WINDOWS } 242 # logical
318 ISO_646.irv:1991 { IANA WINDOWS } 243
319 iso_646.irv:1983 { JAVA } 244 iso-8859-10-html
320 ISO646-US { JAVA IANA WINDOWS } 245 ISO-8859-10 { MIME* HTML* }
321 us { IANA } 246 csisolatin6
322 csASCII { IANA WINDOWS } 247 iso-ir-157
323 iso-ir-6 { IANA } 248 iso8859-10
324 cp367 { IANA WINDOWS } 249 iso885910
325 ascii7 { JAVA } 250 l6
326 646 { JAVA } 251 latin6
327 windows-20127 { WINDOWS* } 252
328 ibm-367 { IBM* } IBM367 { IANA WINDOWS } # This is not truely ibm-367 because it's missing the fallbacks. 253 iso-8859-13-html
329 254 ISO-8859-13 { MIME* HTML* }
330 # GB 18030 is partly algorithmic, using the MBCS converter 255 iso8859-13
331 # Chrome: HTML5 GBK an alias for GB18030 256 iso885913
332 # TODO(jshin): Decide if Chrome should follow spec. crbug.com/339862 257
333 gb18030 { IANA* } ibm-1392 { IBM* } windows-54936 { WINDOWS* } gb18030 { MIME* } 258 iso-8859-14-html
334 259 ISO-8859-14 { MIME* HTML* }
335 # Table-based interchange codepages 260 iso8859-14
336 261 iso885914
337 # Central Europe 262
338 ibm-912_P100-1995 { UTR22* } 263 iso-8859-15-html
339 ibm-912 { IBM* JAVA } 264 ISO-8859-15 { MIME* HTML* }
340 ISO-8859-2 { MIME* IANA JAVA* WINDOWS } 265 csisolatin9
341 ISO_8859-2:1987 { IANA* WINDOWS JAVA } 266 iso8859-15
342 latin2 { IANA WINDOWS JAVA } 267 iso885915
343 csISOLatin2 { IANA WINDOWS JAVA } 268 iso_8859-15
344 iso-ir-101 { IANA WINDOWS JAVA } 269 l9
345 l2 { IANA WINDOWS JAVA } 270
346 8859_2 { JAVA } 271 iso-8859-16-html
347 cp912 { JAVA } 272 ISO-8859-16 { MIME* HTML* }
348 912 { JAVA } 273
349 windows-28592 { WINDOWS* } 274 koi8-r-html
350 275 KOI8-R { MIME* HTML* }
351 # Maltese Esperanto 276 cskoi8r
352 ibm-913_P100-2000 { UTR22* } 277 koi
353 ibm-913 { IBM* JAVA } 278 koi8
354 ISO-8859-3 { MIME* IANA WINDOWS JAVA* } 279 koi8_r
355 ISO_8859-3:1988 { IANA* WINDOWS JAVA } 280
356 latin3 { IANA JAVA WINDOWS } 281 koi8-u-html
357 csISOLatin3 { IANA WINDOWS } 282 KOI8-U { MIME* HTML* }
358 iso-ir-109 { IANA WINDOWS JAVA } 283
359 l3 { IANA WINDOWS JAVA } 284 macintosh-html
360 8859_3 { JAVA } 285 macintosh { MIME* HTML* }
361 cp913 { JAVA } 286 csmacintosh
362 913 { JAVA } 287 mac
363 windows-28593 { WINDOWS* } 288 x-mac-roman
364 289
365 # Baltic 290 windows-874-html
366 ibm-914_P100-1995 { UTR22* } 291 windows-874 { MIME* HTML* }
367 ibm-914 { IBM* JAVA } 292 dos-874
368 ISO-8859-4 { MIME* IANA WINDOWS JAVA* } 293 iso-8859-11
369 latin4 { IANA WINDOWS JAVA } 294 iso8859-11
370 csISOLatin4 { IANA WINDOWS JAVA } 295 iso885911
371 iso-ir-110 { IANA WINDOWS JAVA } 296 tis-620
372 ISO_8859-4:1988 { IANA* WINDOWS JAVA } 297
373 l4 { IANA WINDOWS JAVA } 298 windows-1250-html
374 8859_4 { JAVA } 299 windows-1250 { MIME* HTML* }
375 cp914 { JAVA } 300 cp1250
376 914 { JAVA } 301 x-cp1250
377 windows-28594 { WINDOWS* } 302
378 303 windows-1251-html
379 # Cyrillic 304 windows-1251 { MIME* HTML* }
380 ibm-915_P100-1995 { UTR22* } 305 cp1251
381 ibm-915 { IBM* JAVA } 306 x-cp1251
382 ISO-8859-5 { MIME* IANA WINDOWS JAVA* } 307
383 cyrillic { IANA WINDOWS JAVA } 308 windows-1252-html
384 csISOLatinCyrillic { IANA WINDOWS JAVA } 309 windows-1252 { MIME* HTML* }
385 iso-ir-144 { IANA WINDOWS JAVA } 310 ansi_x3.4-1968
386 ISO_8859-5:1988 { IANA* WINDOWS JAVA } 311 ascii
387 8859_5 { JAVA } 312 cp1252
388 cp915 { JAVA } 313 cp819
389 915 { JAVA } 314 csisolatin1
390 windows-28595 { WINDOWS* } 315 ibm819
391 316 iso-8859-1
392 # Arabic 317 iso-ir-100
393 # ISO_8859-6-E and ISO_8859-6-I are similar to this charset, but BiDi is done differently 318 iso8859-1
394 # From a narrow mapping point of view, there is no difference. 319 iso88591
395 # -E means explicit. -I means implicit. 320 iso_8859-1
396 # -E requires the client to handle the ISO 6429 bidirectional controls 321 iso_8859-1:1987
397 ibm-1089_P100-1995 { UTR22* } 322 l1
398 ibm-1089 { IBM* JAVA } 323 latin1
399 ISO-8859-6 { MIME* IANA WINDOWS JAVA* } 324 us-ascii
400 arabic { IANA WINDOWS JAVA } 325 x-cp1252
401 csISOLatinArabic { IANA WINDOWS JAVA } 326
402 iso-ir-127 { IANA WINDOWS JAVA } 327 windows-1253-html
403 ISO_8859-6:1987 { IANA* WINDOWS JAVA } 328 windows-1253 { MIME* HTML* }
404 ECMA-114 { IANA JAVA } 329 cp1253
405 ASMO-708 { IANA JAVA } 330 x-cp1253
406 8859_6 { JAVA } 331
407 cp1089 { JAVA } 332 windows-1254-html
408 1089 { JAVA } 333 windows-1254 { MIME* HTML* }
409 windows-28596 { WINDOWS* } 334 cp1254
410 ISO-8859-6-I { IANA MIME } # IANA considers this alias different and BiDi needs to be applied. 335 csisolatin5
411 ISO-8859-6-E { IANA MIME } # IANA considers this alias different and BiDi needs to be applied. 336 iso-8859-9
412 x-ISO-8859-6S { JAVA } 337 iso-ir-148
413 338 iso8859-9
414 # ISO Greek (with euro update). This is really ISO_8859-7:2003 339 iso88599
415 ibm-9005_X110-2007 { UTR22* } 340 iso_8859-9
416 ibm-9005 { IBM* } 341 iso_8859-9:1989
417 ISO-8859-7 { MIME* IANA JAVA* WINDOWS } 342 l5
418 8859_7 { JAVA } 343 latin5
419 greek { IANA JAVA WINDOWS } 344 x-cp1254
420 greek8 { IANA JAVA WINDOWS } 345
421 ELOT_928 { IANA JAVA WINDOWS } 346 windows-1255-html
422 ECMA-118 { IANA JAVA WINDOWS } 347 windows-1255 { MIME* HTML* }
423 csISOLatinGreek { IANA JAVA WINDOWS } 348 cp1255
424 iso-ir-126 { IANA JAVA WINDOWS } 349 x-cp1255
425 ISO_8859-7:1987 { IANA* JAVA WINDOWS } 350
426 windows-28597 { WINDOWS* } 351 windows-1256-html
427 sun_eu_greek # For Solaris 352 windows-1256 { MIME* HTML* }
428 353 cp1256
429 # hebrew 354 x-cp1256
430 # ISO_8859-8-E and ISO_8859-8-I are similar to this charset, but BiDi is done differently 355
431 # From a narrow mapping point of view, there is no difference. 356 windows-1257-html
432 # -E means explicit. -I means implicit. 357 windows-1257 { MIME* HTML* }
433 # -E requires the client to handle the ISO 6429 bidirectional controls 358 cp1257
434 # This matches the official mapping on unicode.org 359 x-cp1257
435 ibm-5012_P100-1999 { UTR22* } 360
436 ibm-5012 { IBM* } 361 windows-1258-html
437 ISO-8859-8 { MIME* IANA WINDOWS JAVA* } 362 windows-1258 { MIME* HTML* }
438 hebrew { IANA WINDOWS JAVA } 363 cp1258
439 csISOLatinHebrew { IANA WINDOWS JAVA } 364 x-cp1258
440 iso-ir-138 { IANA WINDOWS JAVA } 365
441 ISO_8859-8:1988 { IANA* WINDOWS JAVA } 366 x-mac-cyrillic-html
442 ISO-8859-8-I { IANA MIME } # IANA and Windows considers this alias different and BiDi needs to be applied. 367 x-mac-cyrillic { MIME* HTML* }
443 ISO-8859-8-E { IANA MIME } # IANA and Windows considers this alias different and BiDi needs to be applied. 368 x-mac-ukrainian
444 8859_8 { JAVA } 369
445 windows-28598 { WINDOWS* } # Hebrew (ISO-Visual). A hybrid between ibm-5012 and ibm-916 with extra PUA mappings.
446 hebrew8 # Reflect HP-UX code page update
447
448 # Turkish
449 # Chrome: ISO-8859-9 and its aliases are moved to windows-1254 per
450 # HTML5.
451 ibm-920_P100-1995 { UTR22* }
452 ibm-920 { IBM* JAVA }
453 ISO-8859-9
454 latin5
455 csISOLatin5
456 iso-ir-148
457 ISO_8859-9:1989
458 l5
459 cp920 { JAVA }
460 920 { JAVA }
461 windows-28599 { WINDOWS* }
462 ECMA-128 # IANA doesn't have this alias 6/24/2002
463 turkish8 # Reflect HP-UX codepage update 8/1/2008
464 turkish # Reflect HP-UX codepage update 8/1/2008
465
466 # Nordic languages
467 iso-8859_10-1998 { UTR22* } ISO-8859-10 { MIME* IANA* }
468 iso-ir-157 { IANA }
469 l6 { IANA }
470 ISO_8859-10:1992 { IANA }
471 csISOLatin6 { IANA }
472 latin6 { IANA }
473
474 # Thai
475 # Be warned. There several iso-8859-11 codepage variants, and they are all incompatible.
476 # ISO-8859-11 is a superset of TIS-620. The difference is that ISO-8859-11 contains the C1 control codes.
477 iso-8859_11-2001 { UTR22* } ISO-8859-11
478 thai8 # HP-UX alias. HP-UX says TIS-620, but it's closer to ISO-8859-11.
479 x-iso-8859-11 { JAVA* }
480
481 # iso-8859-13, PC Baltic (w/o euro update)
482 ibm-921_P100-1995 { UTR22* }
483 ibm-921 { IBM* }
484 ISO-8859-13 { IANA* MIME* JAVA* }
485 8859_13 { JAVA }
486 windows-28603 { WINDOWS* }
487 cp921
488 921
489 x-IBM921 { JAVA }
490
491 # Celtic
492 iso-8859_14-1998 { UTR22* } ISO-8859-14 { IANA* }
493 iso-ir-199 { IANA }
494 ISO_8859-14:1998 { IANA }
495 latin8 { IANA }
496 iso-celtic { IANA }
497 l8 { IANA }
498
499 # Latin 9
500 ibm-923_P100-1998 { UTR22* }
501 ibm-923 { IBM* JAVA }
502 ISO-8859-15 { IANA* MIME* WINDOWS JAVA* }
503 Latin-9 { IANA WINDOWS }
504 l9 { WINDOWS }
505 8859_15 { JAVA }
506 latin0 { JAVA }
507 csisolatin0 { JAVA }
508 csisolatin9 { JAVA }
509 iso8859_15_fdis { JAVA }
510 cp923 { JAVA }
511 923 { JAVA }
512 windows-28605 { WINDOWS* }
513
514 # CJK encodings
515
516 # Chrome: Instead of ibm-943_P15A-2003, we use what's specified in the WHATWG
517 # encoding standard (HTML5) for Shift_JIS. Keep all the aliases (even though
518 not all of them not required by the encoding spec) for now.
519
520 shift_jis-html
521 ibm-943 # Leave untagged because this isn't the default
522 Shift_JIS { IANA* MIME* WINDOWS JAVA }
523 MS_Kanji { IANA WINDOWS JAVA }
524 csShiftJIS { IANA WINDOWS JAVA }
525 windows-31j { IANA JAVA } # A further extension of Shift_JIS to include NEC special characters (Row 13)
526 csWindows31J { IANA WINDOWS JAVA } # A further extension of Shift_JIS to include NEC special characters (Row 13)
527 x-sjis { WINDOWS JAVA }
528 x-ms-cp932 { WINDOWS }
529 cp932 { WINDOWS }
530 windows-932 { WINDOWS* }
531 cp943c { JAVA* } # This is slightly different, but the backslash mapping is the same.
532 IBM-943C #{ AIX* } # Add this tag once AIX aliases becomes available
533 ms932
534 pck # Probably SOLARIS
535 sjis # This might be for ibm-1351
536 ibm-943_VSUB_VPUA
537 x-MS932_0213 { JAVA }
538 x-JISAutoDetect { JAVA }
539
540 # Chrome: Instead of ibm-33722_P*, we use what's specified in the WHATWG
541 # encoding standard (HTML5). All the
542 # 3-byte seqeunces in the normative EUC-JP are now decode-only.
543 euc-jp-html
544 EUC-JP { MIME* IANA JAVA* WINDOWS*}
545 Extended_UNIX_Code_Packed_Format_for_Japanese { IANA* JAVA WINDOWS }
546 csEUCPkdFmtJapanese { IANA JAVA WINDOWS }
547 windows-51932 { WINDOWS }
548 X-EUC-JP { MIME JAVA WINDOWS } # Japan EUC. x-euc-jp is a MIME name
549 eucjis {JAVA}
550 ujis # Linux sometimes uses this name. This is an unfortunate generic and rarely used name. Its use is discouraged.
551
552
553 windows-950-2000 { UTR22* }
554 Big5 { IANA* MIME* JAVA* WINDOWS }
555 csBig5 { IANA WINDOWS }
556 windows-950 { WINDOWS* }
557 x-windows-950 { JAVA }
558 x-big5
559 ms950
560 # Chrome: HTML5 has big5-hkscs as an alias for big5
561 # TODO(jshin): Decide if Chrome should follow spec. crbug.com/277040
562 ibm-1375_P100-2007 { UTR22* } # Big5-HKSCS-2004 with Unicode 3.1 mappings. This uses supplementary characters.
563 ibm-1375 { IBM* }
564 Big5-HKSCS { IANA* JAVA* }
565 big5hk { JAVA }
566 HKSCS-BIG5 # From http://www.openi18n.org/localenameguide/
567
568 ibm-5471_P100-2006 { UTR22* } # Big5-HKSCS-2001 with Unicode 3.0 mappings. This uses many PUA characters.
569 ibm-5471 { IBM* }
570 Big5-HKSCS
571 MS950_HKSCS { JAVA* }
572 hkbig5 # from HP-UX 11i, which can't handle supplementary characters.
573 big5-hkscs:unicode3.0
574 x-MS950-HKSCS { JAVA }
575 # windows-950 # Windows-950 can be w/ or w/o HKSCS extensions. By default it's not.
576 # windows-950_hkscs
577 # GBK
578 # Chrome: Added 4 GB2312 aliases and EUC-CN to Windows-936 to reflect the 370 # Chrome: Added 4 GB2312 aliases and EUC-CN to Windows-936 to reflect the
579 # reality of the web (GB2312 is treated synonymously with its 371 # reality of the web (GB2312 is treated synonymously with its
580 # superset, Windows-936/GBK) 372 # superset, Windows-936/GBK)
581 # All the aliases listed for this converter (windows-936-2000)
582 # are removed from the list of aliases for other simplified Chinese
583 # converters above.
584 # HTML5 makes GBK an alias for GB18030 373 # HTML5 makes GBK an alias for GB18030
585 # TODO(jshin): Decide if Chrome should follow spec. crbug.com/339862 374 # TODO(jshin): Decide if Chrome should follow spec. crbug.com/339862
586 windows-936-2000 { UTR22* } 375 windows-936-2000
587 GB2312 { IANA MIME } 376 GB2312 { IANA MIME }
588 GBK { IANA* MIME* WINDOWS JAVA* } 377 GBK { IANA* MIME* }
589 CP936 { IANA JAVA } 378 CP936 { IANA }
590 MS936 { IANA } # In JDK 1.5, this goes to x-mswin-936. This is an IANA name split. 379 MS936 { IANA }
591 windows-936 { IANA WINDOWS* JAVA } 380 windows-936 { IANA }
592 chinese { IANA } 381 chinese { IANA }
593 iso-ir-58 { IANA } 382 iso-ir-58 { IANA }
594 gb2312-1980 383 gb2312-1980
595 EUC-CN 384 EUC-CN
596 csGB2312 { IANA } 385 csGB2312 { IANA }
597 GB_2312-80 { IANA } 386 GB_2312-80 { IANA }
387 x-gbk
388
389 # GB 18030 is partly algorithmic, using the MBCS converter
390 gb18030 { IANA* } gb18030 { MIME* } ibm-1392 windows-54936
391
392 windows-950-2000
393 big5 { MIME* HTML* }
394 big5-hkscs
395 cn-big5
396 csbig5
397 x-x-big5
398
399 # Chrome: WHATWG encoding spec has big5-hkscs as an alias for big5
400 # TODO(jshin): Decide if Chrome should follow spec. crbug.com/277040
401 ibm-1375_P100-2007 { UTR22* } # Big5-HKSCS-2004 with Unicode 3.1 mappings. This uses supplementary characters.
402 ibm-1375
403 Big5-HKSCS { MIME* IANA* }
404 big5hk
405 HKSCS-BIG5 # From http://www.openi18n.org/localenameguide/
598 406
599 407
600 # Chrome: ibm-5478 and ibm-949 are replaced by noop-gb2312_gl and windows-949 408 euc-jp-html
601 # (ksc_5601), respectively, in ucnv2022.c 409 EUC-JP { MIME* HTML* }
410 cseucpkdfmtjapanese
411 x-euc-jp
602 412
603 # Korean EUC. 413 ISO_2022,locale=ja,version=0
414 ISO-2022-JP { MIME* HTML* }
415 csiso2022jp
604 416
605 # Chrome: Windows-949 is not EUC-KR, but a superset of EUC-KR with 8,822 417 shift_jis-html
606 # additional Hangul syllables. However, the reality of the web 418 Shift_JIS { MIME* HTML* }
607 # and HTML5 require that we treat EUC-KR a 419 csshiftjis
608 # synonym of windows-949. 420 ms_kanji
609 # All the aliases listed for this converter (windows-949-2000) 421 shift-jis
610 # are removed from the list of aliases for other Korean converters 422 sjis
611 # above. 423 windows-31j
612 windows-949-2000 { UTR22* } 424 x-sjis
613 windows-949 { JAVA* WINDOWS* }
614 EUC-KR { IANA* MIME* WINDOWS }
615 KS_C_5601-1987 { WINDOWS IANA }
616 KS_C_5601-1989 { WINDOWS IANA }
617 KSC_5601 { IANA WINDOWS } # Needed by iso-2022
618 csKSC56011987 { WINDOWS }
619 korean { IANA WINDOWS }
620 iso-ir-149 { IANA WINDOWS }
621 csEUCKR { IANA WINDOWS }
622 425
623 #Chrome: TIS-620, ISO-8859-11 and Windows-874 are slightly different from 426 windows-949-2000
624 # each other, but they're used as if they're identical on the web. This is 427 EUC-KR { MIME* HTML* }
625 # also per HTML5. 428 cseuckr
626 windows-874-2000 { UTR22* } # Thai (w/ euro update) 429 csksc56011987
627 TIS-620 { IANA* WINDOWS MIME* } 430 iso-ir-149
628 windows-874 { JAVA* WINDOWS* MIME } 431 korean
629 MS874 { JAVA } 432 ks_c_5601-1987
630 x-windows-874 { JAVA } 433 ks_c_5601-1989
631 iso-8859-11 { IANA WINDOWS MIME } # iso-8859-11 is similar to TIS-620. ibm-13162 is a closer match. 434 ksc5601
435 ksc_5601
436 windows-949
632 437
633 # Platform codepages 438 # We need to keep these aliases so that documents labelled with them
634 # Chrome: only keep ibm-878 for KOI8-R, ibm-1168 for KOI8-RU and ibm-866 439 # are converted to a single U+FFFD instead of being rendered as a gibberish.
635 ibm-878_P100-1996 { UTR22* } ibm-878 { IBM* } KOI8-R { IANA* MIME* WINDOWS JAVA* } koi8 { WINDOWS JAVA } csKOI8R { IANA WINDOWS JAVA } windows-20866 { WINDOWS* } cp878 # Russian internet 440 ISO-2022-KR { HTML* MIME* } csISO2022KR { IANA }
636 # Chrome: Use the table from the WHATWG encoding standard (HTML5). 441 ISO-2022-CN { IANA* HTML* } csISO2022CN x-ISO-2022-CN-GB
637 ibm866-html ibm-866 { IBM* } IBM866 { IANA* MIME* JAVA } cp866 { IANA MIME WINDOWS JAVA* } 866 { IANA JAVA } csIBM866 { IANA JAVA } # PC Russian (w/o euro update) 442 ISO-2022-CN-EXT { IANA* HTML* }
638 ibm-1168_P100-2002 { UTR22* } ibm-1168 { IBM* } KOI8-U { IANA* WINDOWS } windows-21866 { WINDOWS* } # Ukrainian KOI8. koi8-ru != KOI8-U and Microsoft is wrong for aliasing them as the same. 443 HZ-GB-2312 { HTML* IANA* } HZ
639 444
640 # The cp aliases in this section aren't really windows aliases, but it was used by ICU for Windows.
641 # cp is usually used to denote IBM in Java, and that is why we don't do that any more.
642 # The windows-* aliases mean windows codepages.
643 ibm-5346_P100-1998 { UTR22* } ibm-5346 { IBM* } windows-1250 { IANA* JAVA* WINDOWS* } cp1250 { WINDOWS JAVA } # Windows Latin2 (w/ euro update)
644 ibm-5347_P100-1998 { UTR22* } ibm-5347 { IBM* } windows-1251 { IANA* JAVA* WINDOWS* } cp1251 { WINDOWS JAVA } ANSI1251 # Windows Cyrillic (w/ euro update). ANSI1251 is from Solaris
645 ibm-5348_P100-1997 { UTR22* } ibm-5348 { IBM* } windows-1252 { IANA* JAVA* WINDOWS* } cp1252 { JAVA } # Windows Latin1 (w/ euro update)
646 ibm-5349_P100-1998 { UTR22* } ibm-5349 { IBM* } windows-1253 { IANA* JAVA* WINDOWS* } cp1253 { JAVA } # Windows Greek (w/ euro update)
647
648 #CHROME : Make ISO-8859-9 an alias to windows-1254 per HTML5. Move
649 # other IANA aliases for ISO-8859-9 as well.
650 ibm-5350_P100-1998 { UTR22* } ibm-5350 { IBM* } windows-1254 { MIME* IANA* JAVA* WINDOWS* } cp1254 { JAVA } # Windows Turkish (w/ euro update)
651 ISO-8859-9 { MIME }
652 latin5 { IANA }
653 csISOLatin5 { IANA }
654 iso-ir-148 { IANA }
655 ISO_8859-9:1989 { IANA }
656 l5 { IANA }
657 8859_9 { JAVA }
658 ibm-9447_P100-2002 { UTR22* } ibm-9447 { IBM* } windows-1255 { IANA* JAVA* WINDOWS* } cp1255 { JAVA } # Windows Hebrew (w/ euro update)
659 ibm-9448_X100-2005 { UTR22* } ibm-9448 { IBM* } windows-1256 { IANA* JAVA* WINDOWS* } cp1256 { WINDOWS JAVA } x-windows-1256S { JAVA } # Windows Arabic (w/ euro update)
660 ibm-9449_P100-2002 { UTR22* } ibm-9449 { IBM* } windows-1257 { IANA* JAVA* WINDOWS* } cp1257 { JAVA } # Windows Baltic (w/ euro update)
661 ibm-5354_P100-1998 { UTR22* } ibm-5354 { IBM* } windows-1258 { IANA* JAVA* WINDOWS* } cp1258 { JAVA } # Windows Vietnamese (w/ euro update)
662
663 # Chrome: Only MacRoman and MacCyrillic are necessary for HTML5.
664 macos-0_2-10.2 { UTR22* } macintosh { IANA* MIME* WINDOWS } mac { IANA } csMacintosh { IANA } windows-10000 { WINDOWS* } macroman { JAVA } x-macroman { JAVA* } # Apple latin 1
665 macos-7_3-10.2 { UTR22* } x-mac-cyrillic { MIME* WINDOWS } windows-10007 { WINDOWS* } mac-cyrillic maccy x-MacCyrillic { JAVA } x-MacUkraine { JAVA* } # Apple Cyrillic
666
667 # Partially algorithmic converters
668
669 # [U_ENABLE_GENERIC_ISO_2022]
670 # The _generic_ ISO-2022 converter is disabled starting 2003-dec-03 (ICU 2.8).
671 # For details see the icu mailing list from 2003-dec-01 and the ucnv2022.c file.
672 # Language-specific variants of ISO-2022 continue to be available as listed below.
673 # ISO_2022 ISO-2022
674
675 # Chrome: The encoding standard only supports ISO-2022-JP.
676 # Remove ISO-2022-{KR,CN,CN-Ext} and HZ-GB from the alias table.
677 # See crbug.com/277037 and https://www.w3.org/Bugs/Public/show_bug.cgi?id=25339
678 # about HZ-GB.
679 ISO_2022,locale=ja,version=0 ISO-2022-JP { IANA* MIME* JAVA* } csISO2022JP { IANA JAVA } x-windows-iso2022jp { JAVA } x-windows-50220 { JAVA }
680
681 # Chrome: HTML5 does not need ISCII.
682 # Remove all Lotus entries as well.
683
684 # EBCDIC codepages according to the CDRA
685 # Chrome: Removed all EBCDIC code pages.
686
687 # These are not installed by default. They are rarely used.
688 # Many of them can be added through the online ICU Data Library Customization tool
689 # Chrome: Removed all these entries except for ISO-8859-16 required by HTML5.
690
691 iso-8859_16-2001 { UTR22* } ISO-8859-16 { IANA* } iso-ir-226 { IANA } ISO_8859-16:2001 { IANA } latin10 { IANA } l10 { IANA }
692
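
Note on reading the NEW column: each entry names a converter (for example ibm866-html), followed by the aliases that resolve to it. A `{ TAG* }` group marks the alias as the preferred name under that standard. The following is a minimal Python sketch of that structure, not ICU's actual parser. `parse_alias_table` and `SAMPLE` are hypothetical names. The sketch assumes continuation lines are indented, since the flattened diff above has lost leading whitespace, and it only lowercases names, whereas ICU's real alias matching is looser.

```python
import re

# One alias name, optionally followed by a "{ TAG TAG* }" group.
TOKEN = re.compile(r"([^\s{}]+)(?:\s*\{([^}]*)\})?")

# Two entries copied from the NEW column; the indentation is an assumption.
SAMPLE = """
ibm866-html
    IBM866 { MIME* HTML* }
    866
    cp866
    csibm866

windows-1254-html
    windows-1254 { MIME* HTML* }
    cp1254
    iso-8859-9   # ISO-8859-9 is folded into windows-1254 per the encoding spec
    latin5
"""


def parse_alias_table(text):
    """Hypothetical helper: map aliases to converters and collect preferred names."""
    alias_to_converter = {}
    preferred = {}  # (converter, standard) -> alias tagged with '*'
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]  # drop trailing comments
        if not line.strip():
            continue
        starts_entry = not line[0].isspace()  # unindented line opens a new converter
        for index, match in enumerate(TOKEN.finditer(line)):
            name = match.group(1)
            tags = (match.group(2) or "").split()
            if starts_entry and index == 0:
                current = name
            # Simplification: case-insensitive match only.
            alias_to_converter.setdefault(name.lower(), current)
            for tag in tags:
                if tag.endswith("*"):
                    preferred[(current, tag[:-1])] = name
    return alias_to_converter, preferred


aliases, preferred = parse_alias_table(SAMPLE)
print(aliases["iso-8859-9"])
print(preferred[("ibm866-html", "HTML")])
```

With this sample, the label iso-8859-9 resolves to the windows-1254-html converter, which mirrors the change in the patch that moves the ISO-8859-9 aliases under windows-1254.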