Chromium Code Reviews

Unified Diff: README.chromium

Issue 1639543006: ICU 56 step 6: Check in the pre-built ICU data (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/icu.git@56local_patches
Patch Set: address review comments Created 4 years, 11 months ago
Index: README.chromium
diff --git a/README.chromium b/README.chromium
index 6d73e837595a949c45ae8e7387d65fa3656046ff..d1f0fa0d08bb24f4abe2a66a2f8ad0459f73ee32 100644
--- a/README.chromium
+++ b/README.chromium
@@ -12,122 +12,128 @@ create a branch for 54.1 to apply a fix on top of.
Description:
This directory contains the source code of ICU 56.1 for C/C++.
+A. How to update ICU
1. Run "scripts/update.sh <version>" (e.g. 56-1).
+ This will download ICU from the upstream svn repository.
+ It preserves Chrome-specific build files (*local.mk) and
+ converter files (see section C).
-2. Apply locale data patches from Google obtained by diff'ing
- the upstream copy and Google's internal copy for source/data
+2. Update the source file lists for i18n and common
+ in icu.gypi and BUILD.gn. See the comments in the files.
- - patches/locale_google.patch:
- * Google's internal ICU locale changes
- * Simpler region names for Hong Kong and Macau in all locales
- * Currency signs in ru, uk and tr locales
- * AM/PM, midnight, noon formatting for a few Indian locales
- * Timezone name changes in Korean and Chinese locales
+3. Review and apply patches/changes in "D. Local Modifications" if
+ necessary/applicable. Update patch files in patches/.
- - patches/locale1.patch: Minor fixes for Korean
+4. Follow the instructions in section B on building ICU data files
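A small sketch of how the version string passed in step A.1 relates to names that appear later in this file. The svn URL pattern and the icudt56l.dat file name are taken from section B.3; the helper names icu_release_url and icu_data_name are hypothetical, and whether scripts/update.sh builds the URL this way is an assumption.

```shell
# icu_release_url VERSION
#   Print the upstream svn tag URL for a version written as "56-1".
icu_release_url() {
  printf 'http://source.icu-project.org/repos/icu/icu/tags/release-%s\n' "$1"
}

# icu_data_name VERSION
#   Print the versioned data file name: the major version is the part of
#   VERSION before the first '-', and the trailing 'l' is kept as in
#   icudt56l.dat.
icu_data_name() {
  printf 'icudt%sl.dat\n' "${1%%-*}"
}
```

For example, `icu_data_name 56-1` prints `icudt56l.dat`, the name that step B.3.b copies icudtl.dat to.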
-3. Apply post-56 fixes from the upstream for measure/date format bugs
+B. How to build ICU data files
- - patches/measure_format.patch: combined patch of 12 CLs taken
- from bugs below.
- - upstream bugs
- http://bugs.icu-project.org/trac/ticket/11986
- http://bugs.icu-project.org/trac/ticket/12031
- http://bugs.icu-project.org/trac/ticket/12030
- http://bugs.icu-project.org/trac/ticket/12041
- - patches/relative_date.patch from Android
- https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21
+Pre-built data files are generated and checked in with the following steps
-3. Breakiterator patches
- - patches/linebrk.patch
- a. Drop *_loose.txt for all locales and use the corresponding normal.txt
- b. Drop local patches we used to have for the following issues. They'll
- be dealt with in the upstream (Unicode/CLDR).
- http://unicode.org/cldr/trac/ticket/6557
- http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
-
- - patches/wordbrk.patch
- * word.txt
- a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
- FQDN labels can be split at '.'
- b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
- See http://unicode.org/cldr/trac/ticket/6555
-
- - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt
- and word_POSIX.txt dropped from the build list.
-
- - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary
- (source/data/brkitr/khmerdict.txt) obtained from
- http://bugs.icu-project.org/trac/ticket/9451
+1. icu data files for Chrome OS, Linux, Mac and Windows
- - Add several common Chinese words that were dropped previously to
- source/data/cjdict/brkitr/cjdict.txt
- patch: patches/cjdict.patch
- upstream bug: http://bugs.icu-project.org/trac/ticket/10888
+ a. Make an ICU data build directory outside the Chromium source tree
+ and cd to that directory (say, $ICUBUILDIR).
+ b. Run
- - android/brkitr.patch (to be applied for Android build only) :
- Do not use the C+J dictionary for Chinese/Japanese segmentation
- to reduce the data size. Adjust word.txt and a few other files.
+ ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
- - source/data/brkitr/word_ja.txt (used only on Android)
- Added for Japanese-specific word-breaking without the C+J dictionary.
+ c. Run make
+ 'make' will fail when pkgdata looks for css3transform.res. This
+ is expected. See http://bugs.icu-project.org/trac/ticket/10570
-4. Converter changes :
+ d. Run
+ ${CHROME_ICU_TREE_TOP}/scripts/trim_data.sh
- - convrtrs.txt : Replaced the original by our own that only lists encodings
- and aliases required by the WHATWG Encoding spec plus a few extra (see
- the file as to why).
+ The full locale data for Chrome's UI languages and their select variants
+ and the bare minimum locale data for other locales will be kept.
- - Add source/data/mappings/ucmlocal.txt : to list only converters we need.
+ e. Run
+ ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
- - Add new tables per the WHATWG encoding standards for EUC-JP,
- Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
- They're generated with scripts :
- scripts/{eucjp,sjis,big5,single_byte}_gen.sh
+ This will make icudt${version}l.dat and icudt${version}l_dat.S
- - gb_table.patch
- 1. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
- the encoding spec (one-way mapping in toUnicode direction).
- 2. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
- from U+1E3F to \xA8\xBC (windows-936/GBK).
- See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
+ f. Run
+ ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
+
+ This will erase the result of step d (trim_data.sh).
+
+ g. Run
+ ${CHROME_ICU_TREE_TOP}/scripts/copy_data.sh
+
+ This will revert the changes made in source/data by trim_data.sh.
+ It will also copy the ICU data file for non-Android platform
+ and the corresponding assembly source files for Linux and Mac to
+ the following places. Check them in.
+
+ source/data/in/icudtl.dat
+ source/{linux,mac}/icudtl_dat.S
+
+ h. Whenever data is updated (e.g. a timezone update), repeat steps d ~ g
+ as long as the ICU build directory used in a ~ c is kept.
+
+2. icu data files for Android
+
+ a. Follow steps a ~ d for non-Android platforms (B.1 above)
+ b. Run
+
+ ${CHROME_ICU_TREE_TOP}/android/patch_locale.sh
+
+ On top of trim_data.sh, further cut the data entries for Android.
+
+ c. Run
+ ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
+
+ d. Run
+ ${CHROME_ICU_TREE_TOP}/scripts/copy_data_android.sh
+
+ and check in the following files.
+
+ android/icudtl.dat
+ android/icudtl_dat.S
- - uconv.patch
- http://www.icu-project.org/trac/ticket/11296 (uconv.patch)
+ e. Run
+ ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
- It was landed in the upstream and is in 55 RC with the build
- config changed to UCONFIG_ONLY_HTML_CONVERSION.
+ This will erase the result of trim_data.sh and patch_locale.sh
+3. icu data dll for Windows (non-default build option)
-5. Locale changes
+ Follow these steps to build windows/icudt.dll. By default, we set
+ icu_use_icu_data_flag to 1 and don't use this file.
- - Locale build configuration files: To include the full locale data
- for Chrome's UI languages and the minimum locale data for other locales,
- add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
- source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}.
+ a. check out a clean copy of icu56 from the upstream on Windows
+ outside the Chrome tree.
- This along with #8 (data.build.patch), #3 (brkiter) and #4 (converter)
- cuts down the data size by ~ 11MB.
+ $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-56-1 ${SEPARATE_ICU_ROOT}/icu56
- - Run scripts/trim_data.sh : About 2.1MB data size reduction.
+ b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
+ ${SEPARATE_ICU_ROOT}/source/data/in/icudt56l.dat
+ c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
+ ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
+ d. In Visual Studio, open source/allinone/allinone.sln solution
+ in ${SEPARATE_ICU_ROOT}
+ e. Build 'makedata' target
+ f. icudt56.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
+ g. Copy that icudt56.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
+ and check that in.
+
+4. Note on the locale data customization
+
+ - scripts/trim_data.sh
a. Trim the locale data for Chrome's UI languages :
- locales, lang, region, currency
+ locales, lang, region, currency, zone
b. Trim the locale data for non-UI languages to the bare minimum :
ExemplarCharacters, LocaleScript, layout, and the name of the
language for a locale in its native language.
c. Remove the legacy Chinese character set-based collation
(big5han/gb2312han) that don't make any sense and nobody uses.
- - Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang}
- with the minimal locale data necessary for spellchecker and
- and language menus. Also change the English display name
- for ckb to 'Kurdish (Arabic)'.
-
- - android/patch_locale.sh (to be run for Android build only):
+ - android/patch_locale.sh
a. Make changes to source/data/{region,lang} to exclude these data
except the language and script names of zh_Hans and zh_Hant.
b. Remove exemplar cities in timezone data (data/zone).
@@ -137,112 +143,148 @@ This directory contains the source code of ICU 56.1 for C/C++.
is not localized.
f. Also apply android/brkitr.patch
-6. Timezone data update
- - Grab the latest version of the following timezone data files and
- put them in source/data/misc using scripts/update_tz.sh
+ - android/brkitr.patch
+ Do not use the C+J dictionary for Chinese/Japanese segmentation
+ to reduce the data size. Adjust word.txt and a few other files.
- metaZones.txt
- timezoneTypes.txt
- windowsZones.txt
- zoneinfo64.txt
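The command sequences in B.1 and B.2 can be sketched as a single helper. This is only an illustration: the script paths and their order are the ones listed in section B, while the function names, the ICU_RUN switch and the argument order are hypothetical. By default it prints the commands instead of running them.

```shell
# run_step CMD...
#   Print the command; execute it only when ICU_RUN=1 (hypothetical switch).
run_step() {
  printf '+ %s\n' "$*"
  if [ "${ICU_RUN:-0}" = "1" ]; then
    "$@"
  fi
}

# build_icu_data TARGET ICU_TOP BUILD_DIR
#   TARGET is "desktop" (Chrome OS, Linux, Mac, Windows) or "android".
#   Pass absolute paths, because the function changes into BUILD_DIR.
build_icu_data() {
  target=$1; top=$2; build=$3
  case $target in
    desktop|android) ;;
    *) echo "unknown target: $target" >&2; return 2 ;;
  esac
  run_step mkdir -p "$build" || return                        # B.1.a
  run_step cd "$build" || return
  run_step "$top/source/runConfigureICU" Linux --disable-layout || return  # B.1.b
  # B.1.c: 'make' is expected to fail at css3transform.res, so its exit
  # status is ignored on purpose.
  run_step make || true
  run_step "$top/scripts/trim_data.sh" || return              # B.1.d
  if [ "$target" = android ]; then
    run_step "$top/android/patch_locale.sh" || return         # B.2.b
    run_step "$top/scripts/make_data.sh" || return            # B.2.c
    run_step "$top/scripts/copy_data_android.sh" || return    # B.2.d
    run_step "$top/scripts/clean_up_data_source.sh" || return # B.2.e
  else
    run_step "$top/scripts/make_data.sh" || return            # B.1.e
    run_step "$top/scripts/clean_up_data_source.sh" || return # B.1.f
    run_step "$top/scripts/copy_data.sh" || return            # B.1.g
  fi
}
```

With ICU_RUN unset, `build_icu_data android "$CHROME_ICU_TREE_TOP" "$ICUBUILDIR"` only lists the steps, which makes it easy to compare the desktop and Android orderings before running anything.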
+C. Chromium-specific data build files and converters
- As of Jan 20 2016, the latest version is 2015g and the above files
- are available at
- http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/
+They're preserved in step A.1 above. In general, there's no need to touch
+them when updating ICU.
-7. Transliterator customization
+1. source/data/mappings
+ - convrtrs.txt : Lists encodings and aliases required by the WHATWG
+ Encoding spec plus a few extra (see the file as to why).
- - Also add css3transform.txt to source/data/trnslit.
- - Put the following line in trnslocal.mk
+ - ucmlocal.txt : to list only converters we need.
- TRANSLIT_SOURCE=css3transform.txt
+ - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
+ Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
+ They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.
-8. Build-related changes
+ - gb18030.ucm and windows-936.ucm
+ gb_table.patch was applied for the following changes.
+ a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
+ the encoding spec (one-way mapping in toUnicode direction).
+ b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
+ from U+1E3F to \xA8\xBC (windows-936/GBK).
+ See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
- - patches/wpo.patch
- upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
- http://bugs.icu-project.org/trac/ticket/5701
- - patches/vscomp.patch for building with Visual Studio on Windows.
- a. do not use WINDOWS_LOCALE_API in locmap.c
- b. do not redefine stringpiece::npos
+2. source/data/*/*local.mk
+ - List locales of interest to Chromium
+ a. Chrome's UI languages
+ b. Variants of UI languages
+ c. Other locales in Accept-Language list : will only have bare minimum
+ locale data
- - patches/data.build.patch :
- Remove unnecessary resources : unames, collator rule source
- - patches/data.build.win.patch :
- Windows-only data build patch.
- - patches/data_symb.patch :
- Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
- the icu data file or icudt.dll
+ - brklocal.mk drops all *loose.brk to save space (~370kB) for now.
-9. Pre-built data files are checked in with the following steps on Linux:
+3. source/data/brkitr
+ - khmerdict.txt: Abridged Khmer dictionary. See
+ http://bugs.icu-project.org/trac/ticket/9451
+ - word_ja.txt (used only on Android)
+ Added for Japanese-specific word-breaking without the C+J dictionary.
- a. Make a icu data build directory outside the Chromium source tree
- and cd to that directory, $ICUBUILDIR.
- b. Run
+4. source/data/translit/css3transform.txt
+ - Handle Greek case conversion with a transliterator
- ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
+5. Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang}
+ with the minimal locale data necessary for spellchecker
+ and language menus. Also change the English display name
+ for ckb to 'Kurdish (Arabic)'.
- c. Run 'make'
- d. 'make' will fail when pkgdata looks for css3transform.res.
- See http://bugs.icu-project.org/trac/ticket/10570
- e. run
+D. Local Modifications
- ${CHROME_ICU_TREE_TOP}/scripts/make_n_copy_data.sh
+1. Applied locale data patches from Google obtained by diff'ing
+ the upstream copy and Google's internal copy for source/data
- This will make and copy icudtl.dat and icudtl_dat.S for Linux and
- Mac as listed below. Renaming the data/assembly files to drop
- the ICU major version number as well as running make_mac_assembly.sh
- is done by this script.
+ - patches/locale_google.patch:
+ * Google's internal ICU locale changes
+ * Simpler region names for Hong Kong and Macau in all locales
+ * Currency signs in ru, uk and tr locales
+ * AM/PM, midnight, noon formatting for a few Indian locales
+ * Timezone name changes in Korean and Chinese locales
- This script can be run again whenever you update the data.
+ - patches/locale1.patch: Minor fixes for Korean
- - source/data/in/icudtl.dat : Built on Linux with all the patches
- above applied. icudt54l.dat is generated in
- {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
- version number (54) dropped.
+2. Applied post-56 fixes from the upstream for measure/date format bugs
- - {mac,linux}/icudtl_dat.S : Built on Linux with all the
- patches above (except android/brkitr.patch) applied and checked in.
- This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as
- icudt54l_dat.S, but '54' is dropped while copying.
+ - patches/measure_format.patch: combined patch of 12 CLs taken
+ from bugs below.
+ - upstream bugs
+ http://bugs.icu-project.org/trac/ticket/11986
+ http://bugs.icu-project.org/trac/ticket/12031
+ http://bugs.icu-project.org/trac/ticket/12030
+ http://bugs.icu-project.org/trac/ticket/12041
- mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for
- the header portion. With "linux/icudtl_dat.S" in its place,
+ - patches/relative_date.patch from Android
+ https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21
+
+3. Breakiterator patches
+ - patches/linebrk.patch
+ a. Drop *_loose.txt for all locales and use the corresponding normal.txt
+ b. Drop local patches we used to have for the following issues. They'll
+ be dealt with in the upstream (Unicode/CLDR).
+ http://unicode.org/cldr/trac/ticket/6557
+ http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
+
+ - patches/wordbrk.patch for word.txt
+ a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
+ FQDN labels can be split at '.'
+ b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
+ See http://unicode.org/cldr/trac/ticket/6555
+
+ - patches/khmer-dictbe.patch
+ Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
+ http://bugs.icu-project.org/trac/ticket/9451
+
+ - Add several common Chinese words that were dropped previously to
+ source/data/cjdict/brkitr/cjdict.txt
+ patch: patches/cjdict.patch
+ upstream bug: http://bugs.icu-project.org/trac/ticket/10888
- - android/icudtl_dat.S : Built on Linux with all the patches above and
- android/patch_locale.sh executed.
- '54' is dropped from the name generated in the build tree.
+4. Timezone data update
+ Run scripts/update_tz.sh to grab the latest version of the
+ following timezone data files and put them in source/data/misc
- - android/icudtl.dat : Generated as icudt54l.dat in
- {BUILD_DIR_ROOT}/data/out/tmp along with icudt54l_dat.S and
- copied to the above location with '54' dropped in its name.
+ metaZones.txt
+ timezoneTypes.txt
+ windowsZones.txt
+ zoneinfo64.txt
- - windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1
- and don't use this file.)
+ As of Jan 20 2016, the latest version is 2015g and the above files
+ are available at
+ http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/
- a. check out a clean copy of icu54 from the upstream on Windows
- outside the Chrome tree.
+5. Build-related changes
- $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 ${SEPARATE_ICU_ROOT}/icu54
+ - patches/wpo.patch
+ upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
+ http://bugs.icu-project.org/trac/ticket/5701
+ - patches/vscomp.patch for building with Visual Studio on Windows.
+ a. do not use WINDOWS_LOCALE_API in locmap.c
+ b. do not redefine stringpiece::npos
- b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
- ${SEPARATE_ICU_ROOT}/source/data/in/icudt54l.dat
- c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
- ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
- c. In Visual Studio, open source/allinone/allinone.sln solution
- in ${SEPARATE_ICU_ROOT}
- d. Build 'makedata' target
- e. icudt54.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
- f. Copy that icudt54.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
- and check that in.
+ - patches/data.build.patch :
+ Remove unnecessary resources : unames, collator rule source
+ - patches/data.build.win.patch :
+ Windows-only data build patch.
+ - patches/data_symb.patch :
+ Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
+ the icu data file or icudt.dll
-15. Apply a timezone detection API fix
+6. Apply a timezone detection API fix
- patches/tzdetect.patch
- upstream bugs
http://bugs.icu-project.org/trac/ticket/11623
-23. Fix 'bad cast' found in Transliterator with a cfi build
+7. Fix 'bad cast' found in Transliterator with a cfi build
- patches/xlit_badcast.patch
- upstream bug (yet to be resolved)
http://bugs.icu-project.org/trac/ticket/11937
+
+8. TODO: If removing UTF-32 from Blink is more involved than expected,
+ add back UTF-32 temporarily even when UCONFIG_ONLY_HTML_CONVERSION is
+ defined. See
+ http://www.icu-project.org/trac/ticket/11296
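For step A.3 ("Review and apply patches/changes in D"), one possible way to see which files under patches/ still apply after an update is a dry run. This is a sketch under assumptions: the function name check_patches is hypothetical, GNU patch is assumed to be installed, and -p1 paths relative to the tree top are a guess that may need adjusting per patch file.

```shell
# check_patches TREE_TOP
#   For every TREE_TOP/patches/*.patch, report whether it would apply
#   cleanly. Nothing is modified because of --dry-run.
check_patches() {
  top=$1
  for p in "$top"/patches/*.patch; do
    [ -e "$p" ] || continue   # glob did not match any file
    # -f: never prompt, -s: quiet, -d: apply relative to the tree top.
    # The patch is fed on stdin so its path is resolved before -d takes effect.
    if patch -p1 -f -s --dry-run -d "$top" < "$p" >/dev/null 2>&1; then
      printf 'OK   %s\n' "${p##*/}"
    else
      printf 'FAIL %s\n' "${p##*/}"
    fi
  done
}
```

A FAIL line only means the patch needs a manual look; it may already be upstream or may need refreshing in patches/.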