Chromium Code Reviews| Index: README.chromium |
| diff --git a/README.chromium b/README.chromium |
| index 6d73e837595a949c45ae8e7387d65fa3656046ff..4ae2d024aa2b9456203818e59be5a310574472ec 100644 |
| --- a/README.chromium |
| +++ b/README.chromium |
| @@ -12,122 +12,128 @@ create a branch for 54.1 to apply a fix on top of. |
| Description: |
| This directory contains the source code of ICU 56.1 for C/C++. |
| +A. How to update ICU |
| 1. Run "scripts/update.sh <version>" (e.g. 56-1). |
| + This will download the ICU from the upstream svn repository. |
|
Mark Mentovai
2016/01/28 14:23:33
the ICU→ICU
jungshik at Google
2016/01/29 08:11:18
Done.
|
| + It does preserve Chrome-specific build files (*local.mk) and |
| + converter files. (see section C) |
| -2. Apply locale data patches from Google obtained by diff'ing |
| - the upstream copy and Google's internal copy for source/data |
| +2. Update the source file lists for i18n and common |
| + in icu.gypi and BUILD.gn. See the comments in the files. |
| - - patches/locale_google.patch: |
| - * Google's internal ICU locale changes |
| - * Simpler region names for Hong Kong and Macau in all locales |
| - * Currency signs in ru, uk and tr locales |
| - * AM/PM, midnight, noon formatting for a few Indian locales |
| - * Timezone name changes in Korean and Chinese locales |
| +3. Review and apply patches/changes in "D. Local Modifications" if |
| + necessary/applicable. Update patch files in patches/. |
| - - patches/locale1.patch: Minor fixes for Korean |
| +4. Follow the instructions in section B on building ICU data files |
| -3. Apply post-56 fixes from the upstream for measure/date format bugs |
| +B. How to build ICU data files |
| - - patches/measure_format.patch: combined patch of 12 CLs taken |
| - from bugs below. |
| - - upstream bugs |
| - http://bugs.icu-project.org/trac/ticket/11986 |
| - http://bugs.icu-project.org/trac/ticket/12031 |
| - http://bugs.icu-project.org/trac/ticket/12030 |
| - http://bugs.icu-project.org/trac/ticket/12041 |
| - - patches/relative_date.patch from Android |
| - https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21 |
| +Pre-built data files are generated and checked in with the following steps: |
| -3. Breakiterator patches |
| - - patches/linebrk.patch |
| - a. Drop *_loose.txt for all locales and use the corresponding normal.txt |
| - b. Drop local patches we used to have for the following issues. They'll |
| - be dealt with in the upstream (Unicode/CLDR). |
| - http://unicode.org/cldr/trac/ticket/6557 |
| - http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779) |
| - |
| - - patches/wordbrk.patch |
| - * word.txt |
| - a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that |
| - FQDN labels can be split at '.' |
| - b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric. |
| - See http://unicode.org/cldr/trac/ticket/6555 |
| - |
| - - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt |
| - and word_POSIX.txt dropped from the build list. |
| - |
| - - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary |
| - (source/data/brkitr/khmerdict.txt) obtained from |
| - http://bugs.icu-project.org/trac/ticket/9451 |
| +1. icu data files for Chrome OS, Linux, Mac and Windows |
| - - Add several common Chinese words that were dropped previously to |
| - source/data/cjdict/brkitr/cjdict.txt |
| - patch: patches/cjdict.patch |
| - upstream bug: http://bugs.icu-project.org/trac/ticket/10888 |
| + a. Make an ICU data build directory outside the Chromium source tree |
| + and cd to that directory (say, $ICUBUILDIR). |
| + b. Run |
| - - android/brkitr.patch (to be applied for Android build only) : |
| - Do not use the C+J dictionary for Chinese/Japanese segmentation |
| - to reduce the data size. Adjust word.txt and a few other files. |
| + ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout |
| - - source/data/brkitr/word_ja.txt (used only on Android) |
| - Added for Japanese-specific word-breaking without the C+J dictionary. |
| + c. Run make |
| + 'make' will fail when pkgdata looks for css3transform.res. This |
| + is expected. See http://bugs.icu-project.org/trac/ticket/10570 |
| -4. Converter changes : |
| + d. Run |
| + ${CHROME_ICU_TREE_TOP}/scripts/trim_data.sh |
| - - convrtrs.txt : Replaced the original by our own that only lists encodings |
| - and aliases required by the WHATWG Encoding spec plus a few extra (see |
| - the file as to why). |
| + The full locale data for Chrome's UI languages and their select variants |
| + and the bare minimum locale data for other locales will be kept. |
| - - Add source/data/mappings/ucmlocal.txt : to list only converters we need. |
| + e. Run |
| + ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh |
| - - Add new tables per the WHATWG encoding standards for EUC-JP, |
| - Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings. |
| - They're generated with scripts : |
| - scripts/{eucjp,sjis,big5,single_byte}_gen.sh |
| + This will make icudt${version}l.dat and icudt${version}l_dat.S |
| - - gb_table.patch |
| - 1. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per |
| - the encoding spec (one-way mapping in toUnicode direction). |
| - 2. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map |
| - from U+1E3F to \xA8\xBC (windows-936/GBK). |
| - See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3 |
| + f. Run |
| + ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh |
| + |
| + This will erase the result of step d (trim_data.sh). |
| + |
| + g. Run |
| + ${CHROME_ICU_TREE_TOP}/scripts/copy_data.sh |
| + |
| + This will revert the changes made in source/data by trim_data.sh. |
| + It will also copy the ICU data file for non-Android platform |
| + and the corresponding assembly source files for Linux and Mac to |
| + the following places. Check them in. |
| + |
| + source/data/in/icudtl.dat |
| + source/{linux,mac}/icudtl_dat.S |
| + |
| + h. Whenever data is updated (e.g. a timezone update), repeat steps d ~ g |
| + as long as the ICU build directory used in steps a ~ c is kept. |
| + |
| +2. icu data files for Android |
| + |
| + a. Follow a ~ d for non-Android platforms |
| + b. Run |
| + |
| + ${CHROME_ICU_TREE_TOP}/android/patch_locale.sh |
| + |
| + This further cuts the data entries for Android on top of trim_data.sh. |
| + |
| + c. Run |
| + ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh |
| + |
| + d. Run |
| + ${CHROME_ICU_TREE_TOP}/scripts/copy_data_android.sh |
| + |
| + and check in the following files. |
| + |
| + android/icudtl.dat |
| + android/icudtl_dat.S |
| + |
| + e. Run |
| + ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh |
| - - uconv.patch |
| - http://www.icu-project.org/trac/ticket/11296 (uconv.patch) |
| + This will erase the result of trim_data.sh and patch_locale.sh |
| - It was landed in the upstream and is in 55 RC with the build |
| - config changed to UCONFIG_ONLY_HTML_CONVERSION. |
| +3. icu data dll for Windows (non-default build option) |
| + Follow these steps to build windows/icudt.dll. By default, we set |
| + icu_use_icu_data_flag to 1 and don't use this file. |
| -5. Locale changes |
| + a. check out a clean copy of icu56 from the upstream on Windows |
| + outside the Chrome tree. |
| - - Locale build configuration files: To include the full locale data |
| - for Chrome's UI languages and the minimum locale data for other locales, |
| - add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to |
| - source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}. |
| + $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-56-1 ${SEPARATE_ICU_ROOT}/icu56 |
| - This along with #8 (data.build.patch), #3 (brkiter) and #4 (converter) |
| - cuts down the data size by ~ 11MB. |
| + b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to |
| + ${SEPARATE_ICU_ROOT}/source/data/in/icudt56l.dat |
| + c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to |
| + ${SEPARATE_ICU_ROOT}/source/data/makedata.mak |
| + d. In Visual Studio, open source/allinone/allinone.sln solution |
| + in ${SEPARATE_ICU_ROOT} |
| + e. Build 'makedata' target |
| + f. icudt56.dll will be generated in ${SEPARATE_ICU_ROOT}/bin |
| + g. Copy that icudt56.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll |
| + and check that in. |
| - - Run scripts/trim_data.sh : About 2.1MB data size reduction. |
| +4. Note on the locale data customization |
| + |
| + - scripts/trim_data.sh |
a. Trim the locale data for Chrome's UI languages : |
| - locales, lang, region, currency |
| + locales, lang, region, currency, zone |
| b. Trim the locale data for non-UI languages to the bare minimum : |
| ExemplarCharacters, LocaleScript, layout, and the name of the |
| language for a locale in its native language. |
c. Remove the legacy Chinese character set-based collation |
(big5han/gb2312han) that doesn't make any sense and nobody uses. |
| - - Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang} |
| - with the minimal locale data necessary for spellchecker and |
| - and language menus. Also change the English display name |
| - for ckb to 'Kurdish (Arabic)'. |
| - |
| - - android/patch_locale.sh (to be run for Android build only): |
| + - android/patch_locale.sh |
| a. Make changes to source/data/{region,lang} to exclude these data |
| except the language and script names of zh_Hans and zh_Hant. |
| b. Remove exemplar cities in timezone data (data/zone). |
| @@ -137,112 +143,150 @@ This directory contains the source code of ICU 56.1 for C/C++. |
| is not localized. |
| f. Also apply android/brkitr.patch |
| -6. Timezone data update |
| - - Grab the latest version of the following timezone data files and |
| - put them in source/data/misc using scripts/update_tz.sh |
| + - android/brkitr.patch |
| + Do not use the C+J dictionary for Chinese/Japanese segmentation |
| + to reduce the data size. Adjust word.txt and a few other files. |
| - metaZones.txt |
| - timezoneTypes.txt |
| - windowsZones.txt |
| - zoneinfo64.txt |
| +C. Chromium-specific data build files and converters |
| - As of Jan 20 2016, the latest version is 2015g and the above files |
| - are available at |
| - http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/ |
| +They're preserved in step A.1 above. In general, there's no need to touch |
| +them when updating ICU. |
| -7. Transliterator customization |
| +1. source/data/mappings |
| + - convrtrs.txt : Lists encodings and aliases required by the WHATWG |
| + Encoding spec plus a few extra (see the file as to why). |
| - - Also add css3transform.txt to source/data/trnslit. |
| - - Put the following line in trnslocal.mk |
| + - ucmlocal.txt : to list only converters we need. |
| - TRANSLIT_SOURCE=css3transform.txt |
| + - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP, |
| + Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings. |
| + They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh. |
| -8. Build-related changes |
| + - gb18030.ucm and windows-936.ucm |
| + gb_table.patch was applied for the following changes. |
| + a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per |
| + the encoding spec (one-way mapping in toUnicode direction). |
| + b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map |
| + from U+1E3F to \xA8\xBC (windows-936/GBK). |
| + See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3 |
| - - patches/wpo.patch |
| - upstream bugs : http://bugs.icu-project.org/trac/ticket/8043 |
| - http://bugs.icu-project.org/trac/ticket/5701 |
| - - patches/vscomp.patch for building with Visual Studio on Windows. |
| - a. do not use WINDOWS_LOCALE_API in locmap.c |
| - b. do not redefine stringpiece::npos |
| +2. source/data/*/*local.mk |
| + - List locales of interest to Chromium |
| + a. Chrome's UI languages |
| + b. Variants of UI languages |
| + c. Other locales in Accept-Language list : will only have bare minimum |
| + locale data |
| - - patches/data.build.patch : |
| - Remove unnecessary resources : unames, collator rule source |
| - - patches/data.build.win.patch : |
| - Windows-only data build patch. |
| - - patches/data_symb.patch : |
| - Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use |
| - the icu data file or icudt.dll |
| + - brklocal.mk drops all *loose.brk to save space (~370kB) for now. |
| -9. Pre-built data files are checked in with the following steps on Linux: |
| +3. source/data/brkitr |
| + - khmerdict.txt: Abridged Khmer dictionary. See |
| + http://bugs.icu-project.org/trac/ticket/9451 |
| + - word_ja.txt (used only on Android) |
| + Added for Japanese-specific word-breaking without the C+J dictionary. |
| - a. Make a icu data build directory outside the Chromium source tree |
| - and cd to that directory, $ICUBUILDIR. |
| - b. Run |
| +4. source/data/trnslit/css3transform.txt |
| + - Handle Greek case conversion with a transliterator |
| - ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout |
| +5. Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang} |
| + with the minimal locale data necessary for spellchecker and |
| + language menus. Also change the English display name |
| + for ckb to 'Kurdish (Arabic)'. |
| - c. Run 'make' |
| - d. 'make' will fail when pkgdata looks for css3transform.res. |
| - See http://bugs.icu-project.org/trac/ticket/10570 |
| - e. run |
| +D. Local Modifications |
| - ${CHROME_ICU_TREE_TOP}/scripts/make_n_copy_data.sh |
| +1. Applied locale data patches from Google obtained by diff'ing |
| + the upstream copy and Google's internal copy for source/data |
| + |
| + - patches/locale_google.patch: |
| + * Google's internal ICU locale changes |
| + * Simpler region names for Hong Kong and Macau in all locales |
| + * Currency signs in ru, uk and tr locales |
| + * AM/PM, midnight, noon formatting for a few Indian locales |
| + * Timezone name changes in Korean and Chinese locales |
| + |
| + - patches/locale1.patch: Minor fixes for Korean |
| - This will make and copy icudtl.dat and icudtl_dat.S for Linux and |
| - Mac as listed below. Renaming the data/assembly files to drop |
| - the ICU major version number as well as running make_mac_assembly.sh |
| - is done by this script. |
| - This script can be run again whenever you update the data. |
| +2. Applied post-56 fixes from the upstream for measure/date format bugs |
| - - source/data/in/icudtl.dat : Built on Linux with all the patches |
| - above applied. icudt54l.dat is generated in |
| - {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a |
| - version number (54) dropped. |
| + - patches/measure_format.patch: combined patch of 12 CLs taken |
| + from bugs below. |
| + - upstream bugs |
| + http://bugs.icu-project.org/trac/ticket/11986 |
| + http://bugs.icu-project.org/trac/ticket/12031 |
| + http://bugs.icu-project.org/trac/ticket/12030 |
| + http://bugs.icu-project.org/trac/ticket/12041 |
| + - patches/relative_date.patch from Android |
| + https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21 |
| - - {mac,linux}/icudtl_dat.S : Built on Linux with all the |
| - patches above (except android/brkitr.patch) applied and checked in. |
| - This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as |
| - icudt54l_dat.S, but '54' is dropped while copying. |
| +3. Breakiterator patches |
| + - patches/linebrk.patch |
| + a. Drop *_loose.txt for all locales and use the corresponding normal.txt |
| + b. Drop local patches we used to have for the following issues. They'll |
| + be dealt with in the upstream (Unicode/CLDR). |
| + http://unicode.org/cldr/trac/ticket/6557 |
| + http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779) |
| + |
| + - patches/wordbrk.patch for word.txt |
| + a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that |
| + FQDN labels can be split at '.' |
| + b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric. |
| + See http://unicode.org/cldr/trac/ticket/6555 |
| + |
| + - patches/khmer-dictbe.patch |
| + Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt). |
| + http://bugs.icu-project.org/trac/ticket/9451 |
| - mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for |
| - the header portion. With "linux/icudtl_dat.S" in its place, |
| + - Add several common Chinese words that were dropped previously to |
| + source/data/cjdict/brkitr/cjdict.txt |
| + patch: patches/cjdict.patch |
| + upstream bug: http://bugs.icu-project.org/trac/ticket/10888 |
| - - android/icudtl_dat.S : Built on Linux with all the patches above and |
| - android/patch_locale.sh executed. |
| - '54' is dropped from the name generated in the build tree. |
| +4. Timezone data update |
| + Run scripts/update_tz.sh to grab the latest version of the |
| + following timezone data files and put them in source/data/misc |
| - - android/icudtl.dat : Generated as icudt54l.dat in |
| - {BUILD_DIR_ROOT}/data/out/tmp along with icudt54l_dat.S and |
| - copied to the above location with '54' dropped in its name. |
| + metaZones.txt |
| + timezoneTypes.txt |
| + windowsZones.txt |
| + zoneinfo64.txt |
| - - windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1 |
| - and don't use this file.) |
| + As of Jan 20 2016, the latest version is 2015g and the above files |
| + are available at |
| + http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/ |
| - a. check out a clean copy of icu54 from the upstream on Windows |
| - outside the Chrome tree. |
| +5. Build-related changes |
| - $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 ${SEPARATE_ICU_ROOT}/icu54 |
| + - patches/wpo.patch |
| + upstream bugs : http://bugs.icu-project.org/trac/ticket/8043 |
| + http://bugs.icu-project.org/trac/ticket/5701 |
| + - patches/vscomp.patch for building with Visual Studio on Windows. |
| + a. do not use WINDOWS_LOCALE_API in locmap.c |
| + b. do not redefine stringpiece::npos |
| - b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to |
| - ${SEPARATE_ICU_ROOT}/source/data/in/icudt54l.dat |
| - c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to |
| - ${SEPARATE_ICU_ROOT}/source/data/makedata.mak |
| - c. In Visual Studio, open source/allinone/allinone.sln solution |
| - in ${SEPARATE_ICU_ROOT} |
| - d. Build 'makedata' target |
| - e. icudt54.dll will be generated in ${SEPARATE_ICU_ROOT}/bin |
| - f. Copy that icudt54.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll |
| - and check that in. |
| + - patches/data.build.patch : |
| + Remove unnecessary resources : unames, collator rule source |
| + - patches/data.build.win.patch : |
| + Windows-only data build patch. |
| + - patches/data_symb.patch : |
| + Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use |
| + the icu data file or icudt.dll |
| -15. Apply a timezone detection API fix |
| +6. Apply a timezone detection API fix |
| - patches/tzdetect.patch |
| - upstream bugs |
| http://bugs.icu-project.org/trac/ticket/11623 |
| -23. Fix 'bad cast' found in Transliterator with a cfi build |
| +7. Fix 'bad cast' found in Transliterator with a cfi build |
| - patches/xlit_badcast.patch |
| - upstream bug (yet to be resolved) |
| http://bugs.icu-project.org/trac/ticket/11937 |
| + |
| +8. TODO: If removing UTF-32 from Blink is more involved than expected, |
| + add back UTF-32 temporarily even when UCONFIG_ONLY_HTML_CONVERSION is |
| + defined. See |
| + http://www.icu-project.org/trac/ticket/11296 |
| + |
|
Mark Mentovai
2016/01/28 14:23:33
You don’t need these blank lines at EOF.
jungshik at Google
2016/01/29 08:11:18
Done.
|
| + |
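
The non-Android data build in section B.1 of the new README (steps b through g) can be sketched as a dry run in shell. This is only an illustration: print_icu_data_steps is a hypothetical helper that is not part of the change, it assumes the current directory is the separate build directory from step a, and it prints the commands in order rather than executing them.

```shell
#!/bin/sh
# Dry-run sketch of README section B.1, steps b-g.
# Assumption: the argument is the top of the Chromium ICU tree
# (called ${CHROME_ICU_TREE_TOP} in the README).
# print_icu_data_steps is a hypothetical name; it only prints commands.

print_icu_data_steps() {
  top="$1"
  # Step b: configure ICU for a Linux build without the layout engine.
  echo "${top}/source/runConfigureICU Linux --disable-layout"
  # Step c: build. The README says this is expected to fail when pkgdata
  # looks for css3transform.res.
  echo "make"
  # Steps d-g: trim the locale data, build the data files, clean up the
  # trimmed sources, then copy the results into the tree.
  for script in trim_data.sh make_data.sh clean_up_data_source.sh copy_data.sh; do
    echo "${top}/scripts/${script}"
  done
}

print_icu_data_steps "${CHROME_ICU_TREE_TOP:-/path/to/icu}"
```

A script that actually ran these commands would have to tolerate the expected 'make' failure in step c instead of aborting on it. Per step h, only steps d through g need repeating for a later data update, as long as the build directory is kept.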