Index: README.chromium |
diff --git a/README.chromium b/README.chromium |
index 6d73e837595a949c45ae8e7387d65fa3656046ff..d1f0fa0d08bb24f4abe2a66a2f8ad0459f73ee32 100644 |
--- a/README.chromium |
+++ b/README.chromium |
@@ -12,122 +12,128 @@ create a branch for 54.1 to apply a fix on top of. |
Description: |
This directory contains the source code of ICU 56.1 for C/C++. |
+A. How to update ICU |
1. Run "scripts/update.sh <version>" (e.g. 56-1). |
+ This will download ICU from the upstream svn repository. |
+ It preserves Chrome-specific build files (*local.mk) and |
+ converter files (see section C). |
-2. Apply locale data patches from Google obtained by diff'ing |
- the upstream copy and Google's internal copy for source/data |
+2. Update the source file lists for i18n and common |
+ in icu.gypi and BUILD.gn. See the comments in the files. |
- - patches/locale_google.patch: |
- * Google's internal ICU locale changes |
- * Simpler region names for Hong Kong and Macau in all locales |
- * Currency signs in ru, uk and tr locales |
- * AM/PM, midnight, noon formatting for a few Indian locales |
- * Timezone name changes in Korean and Chinese locales |
+3. Review and apply patches/changes in "D. Local Modifications" if |
+ necessary/applicable. Update patch files in patches/. |
- - patches/locale1.patch: Minor fixes for Korean |
+4. Follow the instructions in section B on building ICU data files |
-3. Apply post-56 fixes from the upstream for measure/date format bugs |
+B. How to build ICU data files |
- - patches/measure_format.patch: combined patch of 12 CLs taken |
- from bugs below. |
- - upstream bugs |
- http://bugs.icu-project.org/trac/ticket/11986 |
- http://bugs.icu-project.org/trac/ticket/12031 |
- http://bugs.icu-project.org/trac/ticket/12030 |
- http://bugs.icu-project.org/trac/ticket/12041 |
- - patches/relative_date.patch from Android |
- https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21 |
+Pre-built data files are generated and checked in with the following steps: |
-3. Breakiterator patches |
- - patches/linebrk.patch |
- a. Drop *_loose.txt for all locales and use the corresponding normal.txt |
- b. Drop local patches we used to have for the following issues. They'll |
- be dealt with in the upstream (Unicode/CLDR). |
- http://unicode.org/cldr/trac/ticket/6557 |
- http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779) |
- |
- - patches/wordbrk.patch |
- * word.txt |
- a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that |
- FQDN labels can be split at '.' |
- b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric. |
- See http://unicode.org/cldr/trac/ticket/6555 |
- |
- - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt |
- and word_POSIX.txt dropped from the build list. |
- |
- - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary |
- (source/data/brkitr/khmerdict.txt) obtained from |
- http://bugs.icu-project.org/trac/ticket/9451 |
+1. icu data files for Chrome OS, Linux, Mac and Windows |
- - Add several common Chinese words that were dropped previously to |
- source/data/cjdict/brkitr/cjdict.txt |
- patch: patches/cjdict.patch |
- upstream bug: http://bugs.icu-project.org/trac/ticket/10888 |
+ a. Make an ICU data build directory outside the Chromium source tree |
+ and cd to that directory (say, $ICUBUILDIR). |
+ b. Run |
- - android/brkitr.patch (to be applied for Android build only) : |
- Do not use the C+J dictionary for Chinese/Japanese segmentation |
- to reduce the data size. Adjust word.txt and a few other files. |
+ ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout |
- - source/data/brkitr/word_ja.txt (used only on Android) |
- Added for Japanese-specific word-breaking without the C+J dictionary. |
+ c. Run make |
+ 'make' will fail when pkgdata looks for css3transform.res. This |
+ is expected. See http://bugs.icu-project.org/trac/ticket/10570 |
-4. Converter changes : |
+ d. Run |
+ ${CHROME_ICU_TREE_TOP}/scripts/trim_data.sh |
- - convrtrs.txt : Replaced the original by our own that only lists encodings |
- and aliases required by the WHATWG Encoding spec plus a few extra (see |
- the file as to why). |
+ The full locale data for Chrome's UI languages and their select variants |
+ and the bare minimum locale data for other locales will be kept. |
- - Add source/data/mappings/ucmlocal.txt : to list only converters we need. |
+ e. Run |
+ ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh |
- - Add new tables per the WHATWG encoding standards for EUC-JP, |
- Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings. |
- They're generated with scripts : |
- scripts/{eucjp,sjis,big5,single_byte}_gen.sh |
+ This will make icudt${version}l.dat and icudt${version}l_dat.S |
- - gb_table.patch |
- 1. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per |
- the encoding spec (one-way mapping in toUnicode direction). |
- 2. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map |
- from U+1E3F to \xA8\xBC (windows-936/GBK). |
- See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3 |
+ f. Run |
+ ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh |
+ |
+ This will erase the result of step d (trim_data.sh). |
+ |
+ g. Run |
+ ${CHROME_ICU_TREE_TOP}/scripts/copy_data.sh |
+ |
+ This will revert the changes made in source/data by trim_data.sh. |
+ It will also copy the ICU data file for non-Android platform |
+ and the corresponding assembly source files for Linux and Mac to |
+ the following places. Check them in. |
+ |
+ source/data/in/icudtl.dat |
+ source/{linux,mac}/icudtl_dat.S |
+ |
+ h. Whenever data is updated (e.g. a timezone update), repeat steps d ~ g as long |
+ as the ICU build directory used in steps a ~ c is kept. |
+ |
+2. icu data files for Android |
+ |
+ a. Follow steps a ~ d in B.1 for non-Android platforms. |
+ b. Run |
+ |
+ ${CHROME_ICU_TREE_TOP}/android/patch_locale.sh |
+ |
+ This further trims the data entries for Android on top of trim_data.sh. |
+ |
+ c. Run |
+ ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh |
+ |
+ d. Run |
+ ${CHROME_ICU_TREE_TOP}/scripts/copy_data_android.sh |
+ |
+ and check in the following files. |
+ |
+ android/icudtl.dat |
+ android/icudtl_dat.S |
- - uconv.patch |
- http://www.icu-project.org/trac/ticket/11296 (uconv.patch) |
+ e. Run |
+ ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh |
- It was landed in the upstream and is in 55 RC with the build |
- config changed to UCONFIG_ONLY_HTML_CONVERSION. |
+ This will erase the result of trim_data.sh and patch_locale.sh |
+3. icu data dll for Windows (non-default build option) |
-5. Locale changes |
+ Follow these steps to build windows/icudt.dll. By default, we set |
+ icu_use_icu_data_flag to 1 and don't use this file. |
- - Locale build configuration files: To include the full locale data |
- for Chrome's UI languages and the minimum locale data for other locales, |
- add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to |
- source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}. |
+ a. check out a clean copy of icu56 from the upstream on Windows |
+ outside the Chrome tree. |
- This along with #8 (data.build.patch), #3 (brkiter) and #4 (converter) |
- cuts down the data size by ~ 11MB. |
+ $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-56-1 ${SEPARATE_ICU_ROOT}/icu56 |
- - Run scripts/trim_data.sh : About 2.1MB data size reduction. |
+ b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to |
+ ${SEPARATE_ICU_ROOT}/source/data/in/icudt56l.dat |
+ c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to |
+ ${SEPARATE_ICU_ROOT}/source/data/makedata.mak |
+ d. In Visual Studio, open source/allinone/allinone.sln solution |
+ in ${SEPARATE_ICU_ROOT} |
+ e. Build 'makedata' target |
+ f. icudt56.dll will be generated in ${SEPARATE_ICU_ROOT}/bin |
+ g. Copy that icudt56.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll |
+ and check that in. |
+ |
+4. Note on the locale data customization |
+ |
+ - scripts/trim_data.sh |
a. Trim the locale data for Chrome's UI langauges : |
- locales, lang, region, currency |
+ locales, lang, region, currency, zone |
b. Trim the locale data for non-UI languages to the bare minimum : |
ExemplarCharacters, LocaleScript, layout, and the name of the |
language for a locale in its native language. |
c. Remove the legacy Chinese character set-based collation |
(big5han/gb2312han) that don't make any sense and nobdoy uses. |
- - Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang} |
- with the minimal locale data necessary for spellchecker and |
- and language menus. Also change the English display name |
- for ckb to 'Kurdish (Arabic)'. |
- |
- - android/patch_locale.sh (to be run for Android build only): |
+ - android/patch_locale.sh |
a. Make changes to source/data/{region,lang} to exclude these data |
except the language and script names of zh_Hans and zh_Hant. |
b. Remove exemplar cities in timezone data (data/zone). |
@@ -137,112 +143,148 @@ This directory contains the source code of ICU 56.1 for C/C++. |
is not localized. |
f. Also apply android/brkitr.patch |
-6. Timezone data update |
- - Grab the latest version of the following timezone data files and |
- put them in source/data/misc using scripts/update_tz.sh |
+ - android/brkitr.patch |
+ Do not use the C+J dictionary for Chinese/Japanese segmentation |
+ to reduce the data size. Adjust word.txt and a few other files. |
- metaZones.txt |
- timezoneTypes.txt |
- windowsZones.txt |
- zoneinfo64.txt |
+C. Chromium-specific data build files and converters |
- As of Jan 20 2016, the latest version is 2015g and the above files |
- are available at |
- http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/ |
+They're preserved in step A.1 above. In general, there's no need to touch |
+them when updating ICU. |
-7. Transliterator customization |
+1. source/data/mappings |
+ - convrtrs.txt : Lists encodings and aliases required by the WHATWG |
+ Encoding spec plus a few extra (see the file as to why). |
- - Also add css3transform.txt to source/data/trnslit. |
- - Put the following line in trnslocal.mk |
+ - ucmlocal.txt : Lists only the converters we need. |
- TRANSLIT_SOURCE=css3transform.txt |
+ - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP, |
+ Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings. |
+ They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh. |
-8. Build-related changes |
+ - gb18030.ucm and windows-936.ucm |
+ gb_table.patch was applied for the following changes. |
+ a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per |
+ the encoding spec (one-way mapping in toUnicode direction). |
+ b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map |
+ from U+1E3F to \xA8\xBC (windows-936/GBK). |
+ See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3 |
- - patches/wpo.patch |
- upstream bugs : http://bugs.icu-project.org/trac/ticket/8043 |
- http://bugs.icu-project.org/trac/ticket/5701 |
- - patches/vscomp.patch for building with Visual Studio on Windows. |
- a. do not use WINDOWS_LOCALE_API in locmap.c |
- b. do not redefine stringpiece::npos |
+2. source/data/*/*local.mk |
+ - List locales of interest to Chromium |
+ a. Chrome's UI languages |
+ b. Variants of UI languages |
+ c. Other locales in Accept-Language list : will only have bare minimum |
+ locale data |
- - patches/data.build.patch : |
- Remove unnecessary resources : unames, collator rule source |
- - patches/data.build.win.patch : |
- Windows-only data build patch. |
- - patches/data_symb.patch : |
- Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use |
- the icu data file or icudt.dll |
+ - brklocal.mk drops all *loose.brk to save space (~370 kB) for now. |
-9. Pre-built data files are checked in with the following steps on Linux: |
+3. source/data/brkitr |
+ - khmerdict.txt: Abridged Khmer dictionary. See |
+ http://bugs.icu-project.org/trac/ticket/9451 |
+ - word_ja.txt (used only on Android) |
+ Added for Japanese-specific word-breaking without the C+J dictionary. |
- a. Make a icu data build directory outside the Chromium source tree |
- and cd to that directory, $ICUBUILDIR. |
- b. Run |
+4. source/data/trnslit/css3transform.txt |
+ - Handle Greek case conversion with a transliterator |
- ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout |
+5. Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang} |
+ with the minimal locale data necessary for the spellchecker and |
+ language menus. Also change the English display name |
+ for ckb to 'Kurdish (Arabic)'. |
- c. Run 'make' |
- d. 'make' will fail when pkgdata looks for css3transform.res. |
- See http://bugs.icu-project.org/trac/ticket/10570 |
- e. run |
+D. Local Modifications |
- ${CHROME_ICU_TREE_TOP}/scripts/make_n_copy_data.sh |
+1. Applied locale data patches from Google obtained by diff'ing |
+ the upstream copy and Google's internal copy for source/data |
- This will make and copy icudtl.dat and icudtl_dat.S for Linux and |
- Mac as listed below. Renaming the data/assembly files to drop |
- the ICU major version number as well as running make_mac_assembly.sh |
- is done by this script. |
+ - patches/locale_google.patch: |
+ * Google's internal ICU locale changes |
+ * Simpler region names for Hong Kong and Macau in all locales |
+ * Currency signs in ru, uk and tr locales |
+ * AM/PM, midnight, noon formatting for a few Indian locales |
+ * Timezone name changes in Korean and Chinese locales |
- This script can be run again whenever you update the data. |
+ - patches/locale1.patch: Minor fixes for Korean |
- - source/data/in/icudtl.dat : Built on Linux with all the patches |
- above applied. icudt54l.dat is generated in |
- {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a |
- version number (54) dropped. |
+2. Applied post-56 fixes from the upstream for measure/date format bugs |
- - {mac,linux}/icudtl_dat.S : Built on Linux with all the |
- patches above (except android/brkitr.patch) applied and checked in. |
- This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as |
- icudt54l_dat.S, but '54' is dropped while copying. |
+ - patches/measure_format.patch: combined patch of 12 CLs taken |
+ from bugs below. |
+ - upstream bugs |
+ http://bugs.icu-project.org/trac/ticket/11986 |
+ http://bugs.icu-project.org/trac/ticket/12031 |
+ http://bugs.icu-project.org/trac/ticket/12030 |
+ http://bugs.icu-project.org/trac/ticket/12041 |
- mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for |
- the header portion. With "linux/icudtl_dat.S" in its place, |
+ - patches/relative_date.patch from Android |
+ https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21 |
+ |
+3. Breakiterator patches |
+ - patches/linebrk.patch |
+ a. Drop *_loose.txt for all locales and use the corresponding normal.txt |
+ b. Drop local patches we used to have for the following issues. They'll |
+ be dealt with in the upstream (Unicode/CLDR). |
+ http://unicode.org/cldr/trac/ticket/6557 |
+ http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779) |
+ |
+ - patches/wordbrk.patch for word.txt |
+ a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that |
+ FQDN labels can be split at '.' |
+ b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric. |
+ See http://unicode.org/cldr/trac/ticket/6555 |
+ |
+ - patches/khmer-dictbe.patch |
+ Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt). |
+ http://bugs.icu-project.org/trac/ticket/9451 |
+ |
+ - Add several common Chinese words that were dropped previously to |
+ source/data/cjdict/brkitr/cjdict.txt |
+ patch: patches/cjdict.patch |
+ upstream bug: http://bugs.icu-project.org/trac/ticket/10888 |
- - android/icudtl_dat.S : Built on Linux with all the patches above and |
- android/patch_locale.sh executed. |
- '54' is dropped from the name generated in the build tree. |
+4. Timezone data update |
+ Run scripts/update_tz.sh to grab the latest version of the |
+ following timezone data files and put them in source/data/misc |
- - android/icudtl.dat : Generated as icudt54l.dat in |
- {BUILD_DIR_ROOT}/data/out/tmp along with icudt54l_dat.S and |
- copied to the above location with '54' dropped in its name. |
+ metaZones.txt |
+ timezoneTypes.txt |
+ windowsZones.txt |
+ zoneinfo64.txt |
- - windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1 |
- and don't use this file.) |
+ As of Jan 20 2016, the latest version is 2015g and the above files |
+ are available at |
+ http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/ |
- a. check out a clean copy of icu54 from the upstream on Windows |
- outside the Chrome tree. |
+5. Build-related changes |
- $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 ${SEPARATE_ICU_ROOT}/icu54 |
+ - patches/wpo.patch |
+ upstream bugs : http://bugs.icu-project.org/trac/ticket/8043 |
+ http://bugs.icu-project.org/trac/ticket/5701 |
+ - patches/vscomp.patch for building with Visual Studio on Windows. |
+ a. do not use WINDOWS_LOCALE_API in locmap.c |
+ b. do not redefine stringpiece::npos |
- b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to |
- ${SEPARATE_ICU_ROOT}/source/data/in/icudt54l.dat |
- c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to |
- ${SEPARATE_ICU_ROOT}/source/data/makedata.mak |
- c. In Visual Studio, open source/allinone/allinone.sln solution |
- in ${SEPARATE_ICU_ROOT} |
- d. Build 'makedata' target |
- e. icudt54.dll will be generated in ${SEPARATE_ICU_ROOT}/bin |
- f. Copy that icudt54.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll |
- and check that in. |
+ - patches/data.build.patch : |
+ Remove unnecessary resources : unames, collator rule source |
+ - patches/data.build.win.patch : |
+ Windows-only data build patch. |
+ - patches/data_symb.patch : |
+ Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use |
+ the icu data file or icudt.dll |
-15. Apply a timezone detection API fix |
+6. Apply a timezone detection API fix |
- patches/tzdetect.patch |
- upstream bugs |
http://bugs.icu-project.org/trac/ticket/11623 |
-23. Fix 'bad cast' found in Transliterator with a cfi build |
+7. Fix 'bad cast' found in Transliterator with a cfi build |
- patches/xlit_badcast.patch |
- upstream bug (yet to be resolved) |
http://bugs.icu-project.org/trac/ticket/11937 |
+ |
+8. TODO: If removing UTF-32 from Blink is more involved than expected, |
+ add back UTF-32 temporarily even when UCONFIG_ONLY_HTML_CONVERSION is |
+ defined. See |
+ http://www.icu-project.org/trac/ticket/11296 |
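
As a reading aid for section B of the patched README, the order of the data build commands can be sketched as a dry run. This is a minimal sketch and not part of the patch: print_icu_data_steps is a hypothetical helper that only prints the commands named in B.1 and B.2. It assumes step a (creating and entering a build directory outside the Chromium tree) is already done, and it takes the tree location as its first argument in place of ${CHROME_ICU_TREE_TOP}.

```shell
#!/bin/sh
# Dry-run sketch of the section B data build order. It only prints the
# commands; nothing is executed. print_icu_data_steps is a hypothetical
# helper for illustration, not a script shipped in the ICU tree.
print_icu_data_steps() {
  top="$1"        # stands in for ${CHROME_ICU_TREE_TOP}
  platform="$2"   # "common" (Chrome OS, Linux, Mac, Windows) or "android"

  # Steps b ~ d of B.1 are shared by both flows.
  echo "${top}/source/runConfigureICU Linux --disable-layout"
  echo "make   # expected to fail at css3transform.res"
  echo "${top}/scripts/trim_data.sh"

  case "$platform" in
    android)
      # B.2: cut further, build, copy, then clean up.
      echo "${top}/android/patch_locale.sh"
      echo "${top}/scripts/make_data.sh"
      echo "${top}/scripts/copy_data_android.sh"
      echo "${top}/scripts/clean_up_data_source.sh"
      ;;
    common)
      # B.1: build, clean up, then copy.
      echo "${top}/scripts/make_data.sh"
      echo "${top}/scripts/clean_up_data_source.sh"
      echo "${top}/scripts/copy_data.sh"
      ;;
    *)
      echo "unknown platform: $platform" >&2
      return 1
      ;;
  esac
}
```

For example, `print_icu_data_steps /path/to/icu android` lists the seven Android commands in order; the path is a placeholder.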