Chromium Code Reviews

Issue 598383002: Make all the single byte encodings compliant to the encoding spec. (Closed)

Created:
6 years, 2 months ago by jungshik at Google
Modified:
6 years, 2 months ago
Reviewers:
jsbell
CC:
chromium-reviews
Visibility:
Public.

Description

Make all the single-byte encodings compliant with the Encoding spec.

1. Replace the current (heavily patched) encoding alias list with our own HTML5-specific alias list. It's mostly generated from encoding.json, which is in turn derived from the WHATWG Encoding living standard. The most notable difference is that UTF-32 entries are kept until bug 417850 is resolved. Two other differences are:
   a. Two aliases for iso-8859-8-i (logical and csiso88598i) are not listed. They're dealt with in Blink.
   b. Chinese (gb*, big5*) aliases are not yet aligned with the Encoding spec, pending our decision on the unification of Big5 / Big5-HKSCS and GBK / GB18030.
2. Replace all the single-byte mapping tables with tables generated automatically by scripts/single_byte_gen.sh, which uses index-* files downloaded from the WHATWG spec site. This fixes the decoding (ToUnicode) of windows-874 and windows-1253 and removes a lot of fallback/spurious mapping entries in the encoding direction (FromUnicode) in a number of encodings.
3. Regenerate the ICU binary data files for Linux/Mac/Android/Windows/CrOS.
4. Remove the now-obsolete noop-*.ucm files that were used to make the ISO-2022-CN* decoders return an empty string. They're no longer necessary because ISO-2022-CN* were made 'replacement' encodings in Blink and our version of ICU no longer has any code for ISO-2022-CN*.

This cuts down the data size by 15kB. On Android, there's virtually no change in the data size because the previous data file on Android accidentally had smaller locale data for nb and ms.
BUG=412053
TEST=browser_tests --gtest_filter="*ncoding*"
TEST=net_unittest --gtest_filter="*ilenameUtil*"
TEST=base_unittests --gtest_filter="*Conv*"
TEST=Blink: fast/encoding/*
TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-indexes
TEST=http://www.w3.org/International/tests/repository/encoding/indexes/results-aliases
TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-1253_test
TEST=http://www.w3.org/International/tests/repository/run?manifest=encoding/indexes&test=windows-874_test
R=jsbell@chromium.org
Committed: https://src.chromium.org/viewvc/chrome?view=rev&revision=292447
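
For readers unfamiliar with the two file formats involved in item 2, the following is a minimal sketch of the kind of conversion that turns a WHATWG single-byte index into ICU UCM mapping lines. It is not the contents of scripts/single_byte_gen.sh (which is a shell script and is not reproduced on this page). The function name `index_to_ucm_lines` and the `SAMPLE` text are hypothetical. The sketch assumes the index format of tab-separated `pointer`, `code point`, `name` fields with `#` comment lines, and that pointer 0 corresponds to byte 0x80. It emits only roundtrip (`|0`) entries for the upper half; a complete .ucm file also needs a header and the ASCII range.

```python
def index_to_ucm_lines(index_text):
    """Turn the text of a single-byte index file into UCM mapping lines.

    Assumed input format (one entry per line, '#' starts a comment):
        pointer<TAB>0xCODEPOINT<TAB>name
    Assumed output format (ICU UCM roundtrip mapping):
        <UXXXX> \\xNN |0
    """
    ucm_lines = []
    for raw in index_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        pointer = int(fields[0])          # offset into the upper half
        code_point = int(fields[1], 16)   # accepts the "0x" prefix
        byte = 0x80 + pointer             # pointer 0 is byte 0x80
        ucm_lines.append("<U%04X> \\x%02X |0" % (code_point, byte))
    return ucm_lines


# Hypothetical two-entry excerpt in the style of a windows-1252 index.
SAMPLE = "# illustrative sample\n  0\t0x20AC\tEURO SIGN\n  2\t0x201A\tSINGLE LOW-9 QUOTATION MARK\n"

for ucm_line in index_to_ucm_lines(SAMPLE):
    print(ucm_line)
```

Pointers that are absent from the index produce no mapping line, so in this sketch an unmapped byte simply stays unmapped in both directions.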

Patch Set 1 #

Patch Set 2 : #

Patch Set 3 : #

Total comments: 10

Patch Set 4 : #

Stats: +7382 lines, -624 lines

M README.chromium: 1 chunk, +7 lines, -7 lines
M android/icudtl.dat: binary file
A scripts/single_byte_gen.sh: 1 chunk, +64 lines, -0 lines
M source/data/in/icudtl.dat: binary file
M source/data/mappings/convrtrs.txt: 3 chunks, +263 lines, -511 lines
A source/data/mappings/iso-8859-10-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-13-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-14-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-15-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-16-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-2-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-3-html.ucm: 1 chunk, +268 lines, -0 lines
A source/data/mappings/iso-8859-4-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-5-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/iso-8859-6-html.ucm: 1 chunk, +230 lines, -0 lines
A source/data/mappings/iso-8859-7-html.ucm: 1 chunk, +272 lines, -0 lines
A source/data/mappings/iso-8859-8-html.ucm: 1 chunk, +239 lines, -0 lines
A source/data/mappings/koi8-r-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/koi8-u-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/macintosh-html.ucm: 1 chunk, +275 lines, -0 lines
D source/data/mappings/noop-cns-11643.ucm: 1 chunk, +0 lines, -31 lines
D source/data/mappings/noop-gb2312_gl.ucm: 1 chunk, +0 lines, -32 lines
D source/data/mappings/noop-iso-ir-165.ucm: 1 chunk, +0 lines, -29 lines
M source/data/mappings/ucmlocal.mk: 2 chunks, +13 lines, -14 lines
A source/data/mappings/windows-1250-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/windows-1251-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/windows-1252-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/windows-1253-html.ucm: 1 chunk, +272 lines, -0 lines
A source/data/mappings/windows-1254-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/windows-1255-html.ucm: 1 chunk, +264 lines, -0 lines
A source/data/mappings/windows-1256-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/windows-1257-html.ucm: 1 chunk, +273 lines, -0 lines
A source/data/mappings/windows-1258-html.ucm: 1 chunk, +275 lines, -0 lines
A source/data/mappings/windows-874-html.ucm: 1 chunk, +267 lines, -0 lines
A source/data/mappings/x-mac-cyrillic-html.ucm: 1 chunk, +275 lines, -0 lines

Messages

Total messages: 7 (1 generated)
jungshik at Google
Can you take a look? Thanks. With this change, we're multiple steps closer to the ...
6 years, 2 months ago (2014-10-03 23:56:44 UTC) #2
jsbell
https://codereview.chromium.org/598383002/diff/20001/scripts/single_byte_gen.sh File scripts/single_byte_gen.sh (right): https://codereview.chromium.org/598383002/diff/20001/scripts/single_byte_gen.sh#newcode31 scripts/single_byte_gen.sh:31: encodings="ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6\ Consider including a ...
6 years, 2 months ago (2014-10-04 00:15:26 UTC) #3
jungshik at Google
https://codereview.chromium.org/598383002/diff/20001/scripts/single_byte_gen.sh File scripts/single_byte_gen.sh (right): https://codereview.chromium.org/598383002/diff/20001/scripts/single_byte_gen.sh#newcode31 scripts/single_byte_gen.sh:31: encodings="ibm866 iso-8859-2 iso-8859-3 iso-8859-4 iso-8859-5 iso-8859-6\ On 2014/10/04 00:15:26, ...
6 years, 2 months ago (2014-10-04 09:32:41 UTC) #4
jsbell
lgtm; I sanity checked the aliases and some of the single-byte mappings. https://codereview.chromium.org/598383002/diff/20001/source/data/mappings/convrtrs.txt File source/data/mappings/convrtrs.txt ...
6 years, 2 months ago (2014-10-07 17:13:21 UTC) #5
jungshik at Google
On 2014/10/07 17:13:21, jsbell wrote: > lgtm; I sanity checked the aliases and some of ...
6 years, 2 months ago (2014-10-13 22:38:12 UTC) #6
jungshik at Google
6 years, 2 months ago (2014-10-13 23:07:08 UTC) #7
Message was sent while issue was closed.
Committed patchset #4 (id:40001) manually as r292447 (presubmit successful).
