Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(25)

Issue 984233002: Update CJK converters and their generating scripts (Closed)

Created:
5 years, 9 months ago by jungshik at Google
Modified:
5 years, 9 months ago
Reviewers:
Mark Mentovai, jsbell
CC:
chromium-reviews
Base URL:
https://chromium.googlesource.com/chromium/deps/icu.git@master
Target Ref:
refs/heads/master
Visibility:
Public.

Description

Update CJK converters and their generating scripts 1. Update ucmlocal.mk and convertrs.txt to refer to euc-kr-html.ucm instead of windows-949.ucm 2. Tighten up the valid code range for the following converters: EUC-KR, Shift_JIS, Big5 This is to add back an ASCII range byte to the stream per the encoding spec when they're either illegal as a 'trail byte' or there's no assigned code point for a "lead + trail" sequence. For instance, with this change, '0xF3 0x41' in EUC-KR is converted to 'U+FFFD U+0041' instead of 'U+FFFD'. This change requires adding 2 ~ 8 new states to the conversion table of each converter mentioned above leading to 6.5kB net increase in the final data size. 3. Tighten the trail byte range for 2-byte sequences starting with 0x8E from [A1,E2] to [A1,DF] in EUC-JP and update the corresponding generating script. 4. Change the substitution characters for EUC-JP and Shift_JIS to match other converters. i.e. make them produce U+FFFD when encountering an invalid input. Before this chaange, they emitted U+001A. 5. Enable 'U_CHARSET_IS_UTF8' configuration flag. Chromium/Blink does not rely on ICU for the code conversion between the 'system native encoding' (if it's one of legacy encodings) and Unicode. With this configuration, we can cut down the code size a bit. 6. Update the icudtl.dat (all platforms) and assembly files (mac,linux) and the icudata dll (windows) See https://codereview.chromium.org/1026453002 for a new blink test added ( fast/encoding/char-decoding-invalid-trail.html ) BUG=450312, 430823 TEST=Blink: fast/encoding/char-decoding-{truncated,invalid-trail}.html TEST=base_unittests --gtest_filter=*Conv*, browser_tests --gtest_filter=*ncoding* R=jsbell@chromium.org, mark@chromium.org Committed: https://chromium.googlesource.com/chromium/deps/icu/+/dafa8443b5513c074d4ad6869d9e9b1775144ad3

Patch Set 1 #

Patch Set 2 : update the state tables and pre-built data #

Patch Set 3 : update Android data #

Patch Set 4 : update the icudata dll for Windows #

Total comments: 2

Patch Set 5 : tighten euc-jp trail byte for lead 0x8E #

Patch Set 6 : use https #

Patch Set 7 : add EUC-KR to README.chromium #

Unified diffs Side-by-side diffs Delta from patch set Stats (+86 lines, -44 lines) Patch
M README.chromium View 1 2 3 4 5 6 1 chunk +1 line, -1 line 0 comments Download
M android/icudtl.dat View Binary file 0 comments Download
M icu.gyp View 1 2 3 4 5 6 1 chunk +3 lines, -0 lines 0 comments Download
M scripts/big5_gen.sh View 1 2 3 4 5 6 1 chunk +13 lines, -1 line 0 comments Download
M scripts/eucjp_gen.sh View 1 2 3 4 5 6 3 chunks +11 lines, -11 lines 0 comments Download
M scripts/euckr_gen.sh View 1 2 3 4 5 6 2 chunks +6 lines, -2 lines 0 comments Download
M scripts/sjis_gen.sh View 1 2 3 4 5 6 6 chunks +16 lines, -8 lines 0 comments Download
M source/data/in/icudtl.dat View Binary file 0 comments Download
M source/data/mappings/big5-html.ucm View 1 2 3 4 5 6 1 chunk +13 lines, -1 line 0 comments Download
M source/data/mappings/convrtrs.txt View 1 2 3 4 5 6 1 chunk +1 line, -1 line 0 comments Download
M source/data/mappings/euc-jp-html.ucm View 1 2 3 4 5 6 2 chunks +3 lines, -6 lines 0 comments Download
M source/data/mappings/euc-kr-html.ucm View 1 2 3 4 5 6 1 chunk +5 lines, -1 line 0 comments Download
M source/data/mappings/shift_jis-html.ucm View 1 2 3 4 5 6 1 chunk +12 lines, -5 lines 0 comments Download
M source/data/mappings/ucmlocal.mk View 1 2 3 4 5 6 2 chunks +2 lines, -7 lines 0 comments Download
M windows/icudt.dll View Binary file 0 comments Download

Messages

Total messages: 6 (1 generated)
jungshik at Google
Can you take a look? Thanks
5 years, 9 months ago (2015-03-18 21:00:36 UTC) #2
jsbell
lgtm I didn't analyze the state tables in great detail. https://codereview.chromium.org/984233002/diff/30001/scripts/eucjp_gen.sh File scripts/eucjp_gen.sh (right): https://codereview.chromium.org/984233002/diff/30001/scripts/eucjp_gen.sh#newcode105 ...
5 years, 9 months ago (2015-03-18 21:38:32 UTC) #3
Mark Mentovai
LGTM. I agree, use https whenever possible.
5 years, 9 months ago (2015-03-18 22:45:21 UTC) #4
jungshik at Google
On 2015/03/18 22:45:21, Mark Mentovai wrote: > LGTM. > > I agree, use https whenever ...
5 years, 9 months ago (2015-03-19 19:00:32 UTC) #5
jungshik at Google
5 years, 9 months ago (2015-03-19 19:08:37 UTC) #6
Message was sent while issue was closed.
Committed patchset #7 (id:60001) manually as
dafa8443b5513c074d4ad6869d9e9b1775144ad3 (presubmit successful).

Powered by Google App Engine
This is Rietveld 408576698