Description
Update CJK converters and their generating scripts
1. Update ucmlocal.mk and convrtrs.txt to refer to euc-kr-html.ucm
instead of windows-949.ucm
2. Tighten up the valid code range for the following converters:
EUC-KR, Shift_JIS, Big5
This adds an ASCII-range byte back to the stream, per the encoding
spec, when that byte is either illegal as a 'trail byte' or forms a
"lead + trail" sequence that has no assigned code point.
For instance, with this change, '0xF3 0x41' in EUC-KR is converted to
'U+FFFD U+0041' instead of 'U+FFFD'.
This change requires adding 2 to 8 new states to the conversion
table of each converter mentioned above, leading to a 6.5 kB net increase
in the final data size.
3. Tighten the trail byte range for 2-byte sequences starting with 0x8E
from [A1,E2] to [A1,DF] in EUC-JP and update the corresponding generating
script.
4. Change the substitution characters for EUC-JP and Shift_JIS to
match other converters, i.e. make them produce U+FFFD when encountering
invalid input. Before this change, they emitted U+001A.
5. Enable the 'U_CHARSET_IS_UTF8' configuration flag.
Chromium/Blink does not rely on ICU for the code conversion between
the 'system native encoding' (if it's one of legacy encodings)
and Unicode. With this configuration, we can cut down the code size
a bit.
6. Update icudtl.dat (all platforms), the assembly files (mac, linux),
and the icudata dll (windows)
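
To illustrate item 3, here is a minimal sketch of the tightened check for
2-byte EUC-JP sequences that start with 0x8E. It is not the ICU state table
itself. The function name is hypothetical, and the mapping of trail bytes
A1..DF onto the half-width katakana block U+FF61..U+FF9F follows the usual
EUC-JP definition rather than anything stated in this change.

```python
from typing import Optional

# Hypothetical helper, for illustration only.
EUC_JP_SS2 = 0x8E          # lead byte that selects half-width katakana
TRAIL_MIN = 0xA1           # lower bound of the valid trail range
TRAIL_MAX = 0xDF           # tightened upper bound (previously 0xE2)


def decode_8e_sequence(trail: int) -> Optional[str]:
    """Return the character for the sequence 0x8E <trail>, or None if invalid."""
    if TRAIL_MIN <= trail <= TRAIL_MAX:
        # 0xA1 maps to U+FF61, 0xDF maps to U+FF9F.
        return chr(0xFF61 + (trail - TRAIL_MIN))
    return None  # outside [A1,DF]: the caller would emit U+FFFD
```

With the old [A1,E2] range, trail bytes E0..E2 would have been accepted as
valid; in this sketch they return None.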
See https://codereview.chromium.org/1026453002 for a new blink test
added (fast/encoding/char-decoding-invalid-trail.html)
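
The behavior described in item 2 can be sketched roughly as follows. This is
a toy two-byte decoder, not the ICU converter. The function name, the lead
byte range 0x81..0xFE, and the one-entry mapping table are assumptions made
for illustration. The point it shows is that an ASCII byte following a lead
byte is put back into the stream when the pair has no mapping.

```python
REPLACEMENT = "\ufffd"

# Toy table: only EUC-KR 0xB0 0xA1 (U+AC00) is mapped here.
TOY_TABLE = {(0xB0, 0xA1): "\uac00"}


def decode_two_byte(data: bytes, table: dict) -> str:
    """Decode ASCII plus two-byte sequences, re-emitting ASCII trail bytes."""
    out = []
    i = 0
    while i < len(data):
        lead = data[i]
        i += 1
        if lead < 0x80:                 # plain ASCII byte
            out.append(chr(lead))
            continue
        if not 0x81 <= lead <= 0xFE:    # not a valid lead byte (assumed range)
            out.append(REPLACEMENT)
            continue
        if i == len(data):              # truncated sequence at end of input
            out.append(REPLACEMENT)
            break
        trail = data[i]
        ch = table.get((lead, trail))
        if ch is not None:
            out.append(ch)
            i += 1                      # consume the trail byte
        else:
            out.append(REPLACEMENT)
            if trail >= 0x80:
                i += 1                  # non-ASCII trail is consumed with the lead
            # An ASCII trail is left in the stream and decoded on the next pass.
    return "".join(out)
```

Under these assumptions, decode_two_byte(b"\xf3\x41", TOY_TABLE) gives
U+FFFD followed by U+0041, matching the EUC-KR example in item 2.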
BUG=450312, 430823
TEST=Blink: fast/encoding/char-decoding-{truncated,invalid-trail}.html
TEST=base_unittests --gtest_filter=*Conv*, browser_tests --gtest_filter=*ncoding*
R=jsbell@chromium.org, mark@chromium.org
Committed: https://chromium.googlesource.com/chromium/deps/icu/+/dafa8443b5513c074d4ad6869d9e9b1775144ad3
Patch Set 1
Patch Set 2 : update the state tables and pre-built data
Patch Set 3 : update Android data
Patch Set 4 : update the icudata dll for Windows (Total comments: 2)
Patch Set 5 : tighten euc-jp trail byte for lead 0x8E
Patch Set 6 : use https
Patch Set 7 : add EUC-KR to README.chromium
Created: 5 years, 9 months ago