| Index: README.chromium
|
| diff --git a/README.chromium b/README.chromium
|
| index 6d73e837595a949c45ae8e7387d65fa3656046ff..d1f0fa0d08bb24f4abe2a66a2f8ad0459f73ee32 100644
|
| --- a/README.chromium
|
| +++ b/README.chromium
|
| @@ -12,122 +12,128 @@ create a branch for 54.1 to apply a fix on top of.
|
| Description:
|
| This directory contains the source code of ICU 56.1 for C/C++.
|
|
|
| +A. How to update ICU
|
|
|
| 1. Run "scripts/update.sh <version>" (e.g. 56-1).
|
| + This will download ICU from the upstream svn repository.
|
| + It preserves Chrome-specific build files (*local.mk) and
|
| + converter files (see section C).
|
|
|
| -2. Apply locale data patches from Google obtained by diff'ing
|
| - the upstream copy and Google's internal copy for source/data
|
| +2. Update the source file lists for i18n and common
|
| + in icu.gypi and BUILD.gn. See the comments in the files.
|
|
|
| - - patches/locale_google.patch:
|
| - * Google's internal ICU locale changes
|
| - * Simpler region names for Hong Kong and Macau in all locales
|
| - * Currency signs in ru, uk and tr locales
|
| - * AM/PM, midnight, noon formatting for a few Indian locales
|
| - * Timezone name changes in Korean and Chinese locales
|
| +3. Review and apply patches/changes in "D. Local Modifications" if
|
| + necessary/applicable. Update patch files in patches/.
|
|
|
| - - patches/locale1.patch: Minor fixes for Korean
|
| +4. Follow the instructions in section B on building ICU data files.
|
|
|
|
|
| -3. Apply post-56 fixes from the upstream for measure/date format bugs
|
| +B. How to build ICU data files
|
|
|
| - - patches/measure_format.patch: combined patch of 12 CLs taken
|
| - from bugs below.
|
| - - upstream bugs
|
| - http://bugs.icu-project.org/trac/ticket/11986
|
| - http://bugs.icu-project.org/trac/ticket/12031
|
| - http://bugs.icu-project.org/trac/ticket/12030
|
| - http://bugs.icu-project.org/trac/ticket/12041
|
|
|
| - - patches/relative_date.patch from Android
|
| - https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21
|
| +Pre-built data files are generated and checked in with the following steps:
|
|
|
| -3. Breakiterator patches
|
| - - patches/linebrk.patch
|
| - a. Drop *_loose.txt for all locales and use the corresponding normal.txt
|
| - b. Drop local patches we used to have for the following issues. They'll
|
| - be dealt with in the upstream (Unicode/CLDR).
|
| - http://unicode.org/cldr/trac/ticket/6557
|
| - http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
|
| -
|
| - - patches/wordbrk.patch
|
| - * word.txt
|
| - a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
|
| - FQDN labels can be split at '.'
|
| - b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
|
| - See http://unicode.org/cldr/trac/ticket/6555
|
| -
|
| - - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt
|
| - and word_POSIX.txt dropped from the build list.
|
| -
|
| - - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary
|
| - (source/data/brkitr/khmerdict.txt) obtained from
|
| - http://bugs.icu-project.org/trac/ticket/9451
|
| +1. icu data files for Chrome OS, Linux, Mac and Windows
|
|
|
| - - Add several common Chinese words that were dropped previously to
|
| - source/data/cjdict/brkitr/cjdict.txt
|
| - patch: patches/cjdict.patch
|
| - upstream bug: http://bugs.icu-project.org/trac/ticket/10888
|
| + a. Make an ICU data build directory outside the Chromium source tree
|
| + and cd to that directory (say, $ICUBUILDIR).
|
|
|
| + b. Run
|
|
|
| - - android/brkitr.patch (to be applied for Android build only) :
|
| - Do not use the C+J dictionary for Chinese/Japanese segmentation
|
| - to reduce the data size. Adjust word.txt and a few other files.
|
| + ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
|
|
|
| - - source/data/brkitr/word_ja.txt (used only on Android)
|
| - Added for Japanese-specific word-breaking without the C+J dictionary.
|
| + c. Run make
|
| + 'make' will fail when pkgdata looks for css3transform.res. This
|
| + is expected. See http://bugs.icu-project.org/trac/ticket/10570
|
|
|
| -4. Converter changes :
|
| + d. Run
|
| + ${CHROME_ICU_TREE_TOP}/scripts/trim_data.sh
|
|
|
| - - convrtrs.txt : Replaced the original by our own that only lists encodings
|
| - and aliases required by the WHATWG Encoding spec plus a few extra (see
|
| - the file as to why).
|
| + The full locale data for Chrome's UI languages and their select variants
|
| + and the bare minimum locale data for other locales will be kept.
|
|
|
| - - Add source/data/mappings/ucmlocal.txt : to list only converters we need.
|
| + e. Run
|
| + ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
|
|
|
| - - Add new tables per the WHATWG encoding standards for EUC-JP,
|
| - Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
|
| - They're generated with scripts :
|
| - scripts/{eucjp,sjis,big5,single_byte}_gen.sh
|
| + This will make icudt${version}l.dat and icudt${version}l_dat.S
|
|
|
| - - gb_table.patch
|
| - 1. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
|
| - the encoding spec (one-way mapping in toUnicode direction).
|
| - 2. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
|
| - from U+1E3F to \xA8\xBC (windows-936/GBK).
|
| - See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
|
| + f. Run
|
| + ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
|
| +
|
| + This will erase the result of step d (trim_data.sh).
|
| +
|
| + g. Run
|
| + ${CHROME_ICU_TREE_TOP}/scripts/copy_data.sh
|
| +
|
| + This will revert the changes made in source/data by trim_data.sh.
|
| + It will also copy the ICU data file for non-Android platforms
|
| + and the corresponding assembly source files for Linux and Mac to
|
| + the following places. Check them in.
|
| +
|
| + source/data/in/icudtl.dat
|
| + source/{linux,mac}/icudtl_dat.S
|
| +
|
| + h. Whenever data is updated (e.g. a timezone update), follow d ~ g as long
|
| + as the ICU build directory used in a ~ c is kept.
|
| +
|
| +2. icu data files for Android
|
| +
|
| + a. Follow steps a ~ d for non-Android platforms (item 1 above).
|
| + b. Run
|
| +
|
| + ${CHROME_ICU_TREE_TOP}/android/patch_locale.sh
|
| +
|
| + This further cuts the data entries for Android on top of trim_data.sh.
|
| +
|
| + c. Run
|
| + ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
|
| +
|
| + d. Run
|
| + ${CHROME_ICU_TREE_TOP}/scripts/copy_data_android.sh
|
| +
|
| + and check in the following files.
|
| +
|
| + android/icudtl.dat
|
| + android/icudtl_dat.S
|
|
|
| - - uconv.patch
|
| - http://www.icu-project.org/trac/ticket/11296 (uconv.patch)
|
| + e. Run
|
| + ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
|
|
|
| - It was landed in the upstream and is in 55 RC with the build
|
| - config changed to UCONFIG_ONLY_HTML_CONVERSION.
|
| + This will erase the result of trim_data.sh and patch_locale.sh.
|
|
|
| +3. icu data dll for Windows (non-default build option)
|
|
|
| -5. Locale changes
|
| + Follow these steps to build windows/icudt.dll. By default, we set
|
| + icu_use_icu_data_flag to 1 and don't use this file.
|
|
|
| - - Locale build configuration files: To include the full locale data
|
| - for Chrome's UI languages and the minimum locale data for other locales,
|
| - add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
|
| - source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}.
|
| + a. check out a clean copy of icu56 from the upstream on Windows
|
| + outside the Chrome tree.
|
|
|
| - This along with #8 (data.build.patch), #3 (brkiter) and #4 (converter)
|
| - cuts down the data size by ~ 11MB.
|
| + $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-56-1 ${SEPARATE_ICU_ROOT}/icu56
|
|
|
| - - Run scripts/trim_data.sh : About 2.1MB data size reduction.
|
| + b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
|
| + ${SEPARATE_ICU_ROOT}/source/data/in/icudt56l.dat
|
| + c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
|
| + ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
|
| + d. In Visual Studio, open source/allinone/allinone.sln solution
|
| + in ${SEPARATE_ICU_ROOT}
|
| + e. Build 'makedata' target
|
| + f. icudt56.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
|
| + g. Copy that icudt56.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
|
| + and check that in.
|
| +
|
| +4. Note on the locale data customization
|
| +
|
| + - scripts/trim_data.sh
|
| a. Trim the locale data for Chrome's UI langauges :
|
| - locales, lang, region, currency
|
| + locales, lang, region, currency, zone
|
| b. Trim the locale data for non-UI languages to the bare minimum :
|
| ExemplarCharacters, LocaleScript, layout, and the name of the
|
| language for a locale in its native language.
|
| c. Remove the legacy Chinese character set-based collation
|
| (big5han/gb2312han) that don't make any sense and nobdoy uses.
|
|
|
| - - Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang}
|
| - with the minimal locale data necessary for spellchecker and
|
| - and language menus. Also change the English display name
|
| - for ckb to 'Kurdish (Arabic)'.
|
| -
|
| - - android/patch_locale.sh (to be run for Android build only):
|
| + - android/patch_locale.sh
|
| a. Make changes to source/data/{region,lang} to exclude these data
|
| except the language and script names of zh_Hans and zh_Hant.
|
| b. Remove exemplar cities in timezone data (data/zone).
|
| @@ -137,112 +143,148 @@ This directory contains the source code of ICU 56.1 for C/C++.
|
| is not localized.
|
| f. Also apply android/brkitr.patch
|
|
|
| -6. Timezone data update
|
| - - Grab the latest version of the following timezone data files and
|
| - put them in source/data/misc using scripts/update_tz.sh
|
| + - android/brkitr.patch
|
| + Do not use the C+J dictionary for Chinese/Japanese segmentation
|
| + to reduce the data size. Adjust word.txt and a few other files.
|
|
|
| - metaZones.txt
|
| - timezoneTypes.txt
|
| - windowsZones.txt
|
| - zoneinfo64.txt
|
| +C. Chromium-specific data build files and converters
|
|
|
| - As of Jan 20 2016, the latest version is 2015g and the above files
|
| - are available at
|
| - http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/
|
| +They're preserved in step A.1 above. In general, there's no need to touch
|
| +them when updating ICU.
|
|
|
| -7. Transliterator customization
|
| +1. source/data/mappings
|
| + - convrtrs.txt : Lists encodings and aliases required by the WHATWG
|
| + Encoding spec plus a few extra (see the file as to why).
|
|
|
| - - Also add css3transform.txt to source/data/trnslit.
|
| - - Put the following line in trnslocal.mk
|
| + - ucmlocal.txt : Lists only the converters we need.
|
|
|
| - TRANSLIT_SOURCE=css3transform.txt
|
| + - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
|
| + Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
|
| + They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.
|
|
|
| -8. Build-related changes
|
| + - gb18030.ucm and windows-936.ucm
|
| + gb_table.patch was applied for the following changes.
|
| + a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
|
| + the encoding spec (one-way mapping in toUnicode direction).
|
| + b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
|
| + from U+1E3F to \xA8\xBC (windows-936/GBK).
|
| + See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
|
|
|
| - - patches/wpo.patch
|
| - upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
|
| - http://bugs.icu-project.org/trac/ticket/5701
|
| - - patches/vscomp.patch for building with Visual Studio on Windows.
|
| - a. do not use WINDOWS_LOCALE_API in locmap.c
|
| - b. do not redefine stringpiece::npos
|
| +2. source/data/*/*local.mk
|
| + - List locales of interest to Chromium
|
| + a. Chrome's UI languages
|
| + b. Variants of UI languages
|
| + c. Other locales in Accept-Language list : will only have bare minimum
|
| + locale data
|
|
|
| - - patches/data.build.patch :
|
| - Remove unnecessary resources : unames, collator rule source
|
| - - patches/data.build.win.patch :
|
| - Windows-only data build patch.
|
| - - patches/data_symb.patch :
|
| - Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
|
| - the icu data file or icudt.dll
|
| + - brklocal.mk drops all *loose.brk to save space (~370kB) for now.
|
|
|
| -9. Pre-built data files are checked in with the following steps on Linux:
|
| +3. source/data/brkitr
|
| + - khmerdict.txt: Abridged Khmer dictionary. See
|
| + http://bugs.icu-project.org/trac/ticket/9451
|
| + - word_ja.txt (used only on Android)
|
| + Added for Japanese-specific word-breaking without the C+J dictionary.
|
|
|
| - a. Make a icu data build directory outside the Chromium source tree
|
| - and cd to that directory, $ICUBUILDIR.
|
| - b. Run
|
| +4. source/data/trnslit/css3transform.txt
|
| + - Handle Greek case conversion with a transliterator
|
|
|
| - ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
|
| +5. Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang}
|
| + with the minimal locale data necessary for spellchecker and
|
| + language menus. Also change the English display name
|
| + for ckb to 'Kurdish (Arabic)'.
|
|
|
| - c. Run 'make'
|
| - d. 'make' will fail when pkgdata looks for css3transform.res.
|
| - See http://bugs.icu-project.org/trac/ticket/10570
|
| - e. run
|
| +D. Local Modifications
|
|
|
| - ${CHROME_ICU_TREE_TOP}/scripts/make_n_copy_data.sh
|
| +1. Applied locale data patches from Google obtained by diff'ing
|
| + the upstream copy and Google's internal copy for source/data
|
|
|
| - This will make and copy icudtl.dat and icudtl_dat.S for Linux and
|
| - Mac as listed below. Renaming the data/assembly files to drop
|
| - the ICU major version number as well as running make_mac_assembly.sh
|
| - is done by this script.
|
| + - patches/locale_google.patch:
|
| + * Google's internal ICU locale changes
|
| + * Simpler region names for Hong Kong and Macau in all locales
|
| + * Currency signs in ru, uk and tr locales
|
| + * AM/PM, midnight, noon formatting for a few Indian locales
|
| + * Timezone name changes in Korean and Chinese locales
|
|
|
| - This script can be run again whenever you update the data.
|
| + - patches/locale1.patch: Minor fixes for Korean
|
|
|
| - - source/data/in/icudtl.dat : Built on Linux with all the patches
|
| - above applied. icudt54l.dat is generated in
|
| - {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
|
| - version number (54) dropped.
|
|
|
| +2. Applied post-56 fixes from the upstream for measure/date format bugs
|
|
|
| - - {mac,linux}/icudtl_dat.S : Built on Linux with all the
|
| - patches above (except android/brkitr.patch) applied and checked in.
|
| - This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as
|
| - icudt54l_dat.S, but '54' is dropped while copying.
|
| + - patches/measure_format.patch: combined patch of 12 CLs taken
|
| + from bugs below.
|
| + - upstream bugs
|
| + http://bugs.icu-project.org/trac/ticket/11986
|
| + http://bugs.icu-project.org/trac/ticket/12031
|
| + http://bugs.icu-project.org/trac/ticket/12030
|
| + http://bugs.icu-project.org/trac/ticket/12041
|
|
|
| - mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for
|
| - the header portion. With "linux/icudtl_dat.S" in its place,
|
| + - patches/relative_date.patch from Android
|
| + https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21
|
| +
|
| +3. Breakiterator patches
|
| + - patches/linebrk.patch
|
| + a. Drop *_loose.txt for all locales and use the corresponding normal.txt
|
| + b. Drop local patches we used to have for the following issues. They'll
|
| + be dealt with in the upstream (Unicode/CLDR).
|
| + http://unicode.org/cldr/trac/ticket/6557
|
| + http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
|
| +
|
| + - patches/wordbrk.patch for word.txt
|
| + a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
|
| + FQDN labels can be split at '.'
|
| + b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
|
| + See http://unicode.org/cldr/trac/ticket/6555
|
| +
|
| + - patches/khmer-dictbe.patch
|
| + Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
|
| + http://bugs.icu-project.org/trac/ticket/9451
|
| +
|
| + - Add several common Chinese words that were dropped previously to
|
| + source/data/cjdict/brkitr/cjdict.txt
|
| + patch: patches/cjdict.patch
|
| + upstream bug: http://bugs.icu-project.org/trac/ticket/10888
|
|
|
| - - android/icudtl_dat.S : Built on Linux with all the patches above and
|
| - android/patch_locale.sh executed.
|
| - '54' is dropped from the name generated in the build tree.
|
| +4. Timezone data update
|
| + Run scripts/update_tz.sh to grab the latest version of the
|
| + following timezone data files and put them in source/data/misc
|
|
|
| - - android/icudtl.dat : Generated as icudt54l.dat in
|
| - {BUILD_DIR_ROOT}/data/out/tmp along with icudt54l_dat.S and
|
| - copied to the above location with '54' dropped in its name.
|
| + metaZones.txt
|
| + timezoneTypes.txt
|
| + windowsZones.txt
|
| + zoneinfo64.txt
|
|
|
| - - windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1
|
| - and don't use this file.)
|
| + As of Jan 20 2016, the latest version is 2015g and the above files
|
| + are available at
|
| + http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/
|
|
|
| - a. check out a clean copy of icu54 from the upstream on Windows
|
| - outside the Chrome tree.
|
| +5. Build-related changes
|
|
|
| - $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 ${SEPARATE_ICU_ROOT}/icu54
|
| + - patches/wpo.patch
|
| + upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
|
| + http://bugs.icu-project.org/trac/ticket/5701
|
| + - patches/vscomp.patch for building with Visual Studio on Windows.
|
| + a. do not use WINDOWS_LOCALE_API in locmap.c
|
| + b. do not redefine stringpiece::npos
|
|
|
| - b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
|
| - ${SEPARATE_ICU_ROOT}/source/data/in/icudt54l.dat
|
| - c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
|
| - ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
|
| - c. In Visual Studio, open source/allinone/allinone.sln solution
|
| - in ${SEPARATE_ICU_ROOT}
|
| - d. Build 'makedata' target
|
| - e. icudt54.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
|
| - f. Copy that icudt54.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
|
| - and check that in.
|
| + - patches/data.build.patch :
|
| + Remove unnecessary resources : unames, collator rule source
|
| + - patches/data.build.win.patch :
|
| + Windows-only data build patch.
|
| + - patches/data_symb.patch :
|
| + Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
|
| + the icu data file or icudt.dll
|
|
|
| -15. Apply a timezone detection API fix
|
| +6. Apply a timezone detection API fix
|
| - patches/tzdetect.patch
|
| - upstream bugs
|
| http://bugs.icu-project.org/trac/ticket/11623
|
|
|
| -23. Fix 'bad cast' found in Transliterator with a cfi build
|
| +7. Fix 'bad cast' found in Transliterator with a cfi build
|
| - patches/xlit_badcast.patch
|
| - upstream bug (yet to be resolved)
|
| http://bugs.icu-project.org/trac/ticket/11937
|
| +
|
| +8. TODO: If removing UTF-32 from Blink is more involved than expected,
|
| + add back UTF-32 temporarily even when UCONFIG_ONLY_HTML_CONVERSION is
|
| + defined. See
|
| + http://www.icu-project.org/trac/ticket/11296
|
|
|