Chromium Code Reviews

Side by Side Diff: README.chromium

Issue 1639543006: ICU 56 step 6: Check in the pre-built ICU data (Closed) Base URL: https://chromium.googlesource.com/chromium/deps/icu.git@56local_patches
Patch Set: address review comments Created 4 years, 10 months ago
OLDNEW
1 Name: icu 1 Name: icu
2 URL: http://site.icu-project.org/ 2 URL: http://site.icu-project.org/
3 Version: 56.1 3 Version: 56.1
4 License: MIT 4 License: MIT
5 Security Critical: yes 5 Security Critical: yes
6 6
7 ***NOTE*** 7 ***NOTE***
8 ICU is in the middle of being updated to 56.1 and does not work, yet. 8 ICU is in the middle of being updated to 56.1 and does not work, yet.
9 If you have an urgent fix to apply, contact jshin@chromium.org to 9 If you have an urgent fix to apply, contact jshin@chromium.org to
10 create a branch for 54.1 to apply a fix on top of. 10 create a branch for 54.1 to apply a fix on top of.
11 11
12 Description: 12 Description:
13 This directory contains the source code of ICU 56.1 for C/C++. 13 This directory contains the source code of ICU 56.1 for C/C++.
14 14
15 A. How to update ICU
15 16
16 1. Run "scripts/update.sh <version>" (e.g. 56-1). 17 1. Run "scripts/update.sh <version>" (e.g. 56-1).
18 This will download ICU from the upstream svn repository.
19 It does preserve Chrome-specific build files (*local.mk) and
20 converter files. (see section C)
17 21
18 2. Apply locale data patches from Google obtained by diff'ing 22 2. Update the source file lists for i18n and common
19 the upstream copy and Google's internal copy for source/data 23 in icu.gypi and BUILD.gn. See the comments in the files.
20 24
21 - patches/locale_google.patch: 25 3. Review and apply patches/changes in "D. Local Modifications" if
22 * Google's internal ICU locale changes 26 necessary/applicable. Update patch files in patches/.
23 * Simpler region names for Hong Kong and Macau in all locales
24 * Currency signs in ru, uk and tr locales
25 * AM/PM, midnight, noon formatting for a few Indian locales
26 * Timezone name changes in Korean and Chinese locales
27 27
28 - patches/locale1.patch: Minor fixes for Korean 28 4. Follow the instructions in section B on building ICU data files
29 29
30 30
31 3. Apply post-56 fixes from the upstream for measure/date format bugs 31 B. How to build ICU data files
32
33 - patches/measure_format.patch: combined patch of 12 CLs taken
34 from bugs below.
35 - upstream bugs
36 http://bugs.icu-project.org/trac/ticket/11986
37 http://bugs.icu-project.org/trac/ticket/12031
38 http://bugs.icu-project.org/trac/ticket/12030
39 http://bugs.icu-project.org/trac/ticket/12041
40
41 - patches/relative_date.patch from Android
42 https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21
43
44 3. Breakiterator patches
45 - patches/linebrk.patch
46 a. Drop *_loose.txt for all locales and use the corresponding normal.txt
47 b. Drop local patches we used to have for the following issues. They'll
48 be dealt with in the upstream (Unicode/CLDR).
49 http://unicode.org/cldr/trac/ticket/6557
50 http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
51
52 - patches/wordbrk.patch
53 * word.txt
54 a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
55 FQDN labels can be split at '.'
56 b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
57 See http://unicode.org/cldr/trac/ticket/6555
58
59 - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt
60 and word_POSIX.txt dropped from the build list.
61
62 - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary
63 (source/data/brkitr/khmerdict.txt) obtained from
64 http://bugs.icu-project.org/trac/ticket/9451
65
66 - Add several common Chinese words that were dropped previously to
67 source/data/cjdict/brkitr/cjdict.txt
68 patch: patches/cjdict.patch
69 upstream bug: http://bugs.icu-project.org/trac/ticket/10888
70 32
71 33
72 - android/brkitr.patch (to be applied for Android build only) : 34 Pre-built data files are generated and checked in with the following steps
73 Do not use the C+J dictionary for Chinese/Japanese segmentation
74 to reduce the data size. Adjust word.txt and a few other files.
75 35
76 - source/data/brkitr/word_ja.txt (used only on Android) 36 1. icu data files for Chrome OS, Linux, Mac and Windows
77 Added for Japanese-specific word-breaking without the C+J dictionary.
78 37
79 4. Converter changes : 38 a. Make a icu data build directory outside the Chromium source tree
39 and cd to that directory (say, $ICUBUILDIR).
80 40
81 - convrtrs.txt : Replaced the original by our own that only lists encodings 41 b. Run
82 and aliases required by the WHATWG Encoding spec plus a few extra (see
83 the file as to why).
84 42
85 - Add source/data/mappings/ucmlocal.txt : to list only converters we need. 43 ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
86 44
87 - Add new tables per the WHATWG encoding standards for EUC-JP, 45 c. Run make
88 Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings. 46 'make' will fail when pkgdata looks for css3transform.res. This
89 They're generated with scripts : 47 is expected. See http://bugs.icu-project.org/trac/ticket/10570
90 scripts/{eucjp,sjis,big5,single_byte}_gen.sh
91 48
92 - gb_table.patch 49 d. Run
93 1. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per 50 ${CHROME_ICU_TREE_TOP}/scripts/trim_data.sh
94 the encoding spec (one-way mapping in toUnicode direction).
95 2. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
96 from U+1E3F to \xA8\xBC (windows-936/GBK).
97 See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
98 51
99 - uconv.patch 52 The full locale data for Chrome's UI languages and their select variants
100 http://www.icu-project.org/trac/ticket/11296 (uconv.patch) 53 and the bare minimum locale data for other locales will be kept.
101 54
102 It was landed in the upstream and is in 55 RC with the build 55 e. Run
103 config changed to UCONFIG_ONLY_HTML_CONVERSION. 56 ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
104 57
58 This will make icudt${version}l.dat and icudt${version}l_dat.S
105 59
106 5. Locale changes 60 f. Run
61 ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
107 62
108 - Locale build configuration files: To include the full locale data 63 This will erase the result of step d (trim_data.sh).
109 for Chrome's UI languages and the minimum locale data for other locales,
110 add reslocal.mk or {trns,sprep,rbnf,coll}local.mk files to
111 source/data/{coll,curr,lang.locale,curr,region,translit,zone,rbnf,sprep}.
112 64
113 This along with #8 (data.build.patch), #3 (brkiter) and #4 (converter) 65 g. Run
114 cuts down the data size by ~ 11MB. 66 ${CHROME_ICU_TREE_TOP}/scripts/copy_data.sh
115 67
116 - Run scripts/trim_data.sh : About 2.1MB data size reduction. 68 This will revert the changes made in source/data by trim_data.sh.
69 It will also copy the ICU data file for non-Android platform
70 and the corresponding assembly source files for Linux and Mac to
71 the following places. Check them in.
72
73 source/data/in/icudtl.dat
74 source/{linux,mac}/icudtl_dat.S
75
76 h. Whenever data is updated (e.g timezone update), follow d ~ g as long
77 as the ICU build directory used in a ~ c is kept.
78
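Steps e and g above imply a renaming: make_data.sh produces icudt${version}l.dat and icudt${version}l_dat.S, while the checked-in files are icudtl.dat and icudtl_dat.S. A minimal Python sketch of that name mapping follows. The helper name `unversioned_name` and the regular expression are illustrative assumptions; the actual renaming is done by the shell scripts, whose contents are not shown here.

```python
import re

# Hypothetical helper: only the file-name patterns come from the README;
# the real renaming happens inside the shell scripts.
_VERSIONED = re.compile(r"^icudt\d+(l\.dat|l_dat\.S)$")


def unversioned_name(filename: str) -> str:
    """Drop the ICU major version, e.g. icudt56l.dat -> icudtl.dat."""
    match = _VERSIONED.match(filename)
    if match is None:
        raise ValueError(f"not a versioned ICU data file name: {filename!r}")
    return "icudt" + match.group(1)
```

Rejecting names that do not match keeps an already renamed file (icudtl.dat) from being silently processed twice.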
79 2. icu data files for Android
80
81 a. Follow a ~ d for non-Android platforms
82 b. Run
83
84 ${CHROME_ICU_TREE_TOP}/android/patch_locale.sh
85
86 On top of trim_data.sh, further cut the data entries for Android.
87
88 c. Run
89 ${CHROME_ICU_TREE_TOP}/scripts/make_data.sh
90
91 d. Run
92 ${CHROME_ICU_TREE_TOP}/scripts/copy_data_android.sh
93
94 and check in the following files.
95
96 android/icudtl.dat
97 android/icudtl_dat.S
98
99 e. Run
100 ${CHROME_ICU_TREE_TOP}/scripts/clean_up_data_source.sh
101
102 This will erase the result of trim_data.sh and patch_locale.sh
103
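The command order for the two data builds above (B.1 for Chrome OS, Linux, Mac and Windows; B.2 for Android) can be summarized as a dry-run plan. This is only a sketch: the function name `data_build_commands` and the `tree_top` parameter (standing in for ${CHROME_ICU_TREE_TOP}) are assumptions, and the function returns the commands without running them. All commands are meant to be run from the ICU build directory outside the Chromium tree.

```python
def data_build_commands(tree_top: str, android: bool = False) -> list[list[str]]:
    """Return, in order, the commands listed in section B.1 or B.2."""
    scripts = f"{tree_top}/scripts"
    commands = [
        [f"{tree_top}/source/runConfigureICU", "Linux", "--disable-layout"],
        # 'make' is expected to fail when pkgdata looks for css3transform.res.
        ["make"],
        [f"{scripts}/trim_data.sh"],
    ]
    if android:
        commands += [
            [f"{tree_top}/android/patch_locale.sh"],
            [f"{scripts}/make_data.sh"],
            [f"{scripts}/copy_data_android.sh"],
            [f"{scripts}/clean_up_data_source.sh"],
        ]
    else:
        commands += [
            [f"{scripts}/make_data.sh"],
            [f"{scripts}/clean_up_data_source.sh"],
            [f"{scripts}/copy_data.sh"],
        ]
    return commands
```

Because 'make' is expected to fail, anything that executes this plan must not treat a non-zero exit status from that one step as fatal.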
104 3. icu data dll for Windows (non-default build option)
105
106 Follow these steps to build windows/icudt.dll. By default, we set
107 icu_use_icu_data_flag to 1 and don't use this file.
108
109 a. check out a clean copy of icu56 from the upstream on Windows
110 outside the Chrome tree.
111
112 $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-56-1 ${SEPARATE_ICU_ROOT}/icu56
113
114 b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
115 ${SEPARATE_ICU_ROOT}/source/data/in/icudt56l.dat
116 c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
117 ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
118 c. In Visual Studio, open source/allinone/allinone.sln solution
119 in ${SEPARATE_ICU_ROOT}
120 d. Build 'makedata' target
121 e. icudt56.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
122 f. Copy that icudt56.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
123 and check that in.
124
125 4. Note on the locale data customization
126
127 - scripts/trim_data.sh
117 a. Trim the locale data for Chrome's UI languages : 128 a. Trim the locale data for Chrome's UI languages :
118 locales, lang, region, currency 129 locales, lang, region, currency, zone
119 b. Trim the locale data for non-UI languages to the bare minimum : 130 b. Trim the locale data for non-UI languages to the bare minimum :
120 ExemplarCharacters, LocaleScript, layout, and the name of the 131 ExemplarCharacters, LocaleScript, layout, and the name of the
121 language for a locale in its native language. 132 language for a locale in its native language.
122 c. Remove the legacy Chinese character set-based collation 133 c. Remove the legacy Chinese character set-based collation
123 (big5han/gb2312han) that don't make any sense and nobody uses. 134 (big5han/gb2312han) that don't make any sense and nobody uses.
124 135
125 - Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang} 136 - android/patch_locale.sh
126 with the minimal locale data necessary for spellchecker and
127 language menus. Also change the English display name
128 for ckb to 'Kurdish (Arabic)'.
129
130 - android/patch_locale.sh (to be run for Android build only):
131 a. Make changes to source/data/{region,lang} to exclude these data 137 a. Make changes to source/data/{region,lang} to exclude these data
132 except the language and script names of zh_Hans and zh_Hant. 138 except the language and script names of zh_Hans and zh_Hant.
133 b. Remove exemplar cities in timezone data (data/zone). 139 b. Remove exemplar cities in timezone data (data/zone).
134 c. Keep only the minimal calendar data in data/locales. 140 c. Keep only the minimal calendar data in data/locales.
135 d. Include currency display names for a smaller subset of currencies. 141 d. Include currency display names for a smaller subset of currencies.
136 e. Minimize the locale data for 9 locales to which Chrome on Android 142 e. Minimize the locale data for 9 locales to which Chrome on Android
137 is not localized. 143 is not localized.
138 f. Also apply android/brkitr.patch 144 f. Also apply android/brkitr.patch
139 145
140 6. Timezone data update 146 - android/brkitr.patch
141 - Grab the latest version of the following timezone data files and 147 Do not use the C+J dictionary for Chinese/Japanese segmentation
142 put them in source/data/misc using scripts/update_tz.sh 148 to reduce the data size. Adjust word.txt and a few other files.
149
150 C. Chromium-specific data build files and converters
151
152 They're preserved in step A.1 above. In general, there's no need to touch
153 them when updating ICU.
154
155 1. source/data/mappings
156 - convrtrs.txt : Lists encodings and aliases required by the WHATWG
157 Encoding spec plus a few extra (see the file as to why).
158
159 - ucmlocal.txt : to list only converters we need.
160
161 - *html.ucm: Mapping files per WHATWG encoding standards for EUC-JP,
162 Shift_JIS, Big5 (Big5+Big5HKSCS), EUC-KR and all the single byte encodings.
163 They're generated with scripts/{eucjp,sjis,big5,euckr,single_byte}_gen.sh.
164
165 - gb18030.ucm and windows-936.ucm
166 gb_table.patch was applied for the following changes.
167 a. Map \xA3\xA0 to U+3000 instead of U+E5E5 in gb18030 and windows-936 per
168 the encoding spec (one-way mapping in toUnicode direction).
169 b. Map \xA8\xBF to U+01F9 instead of U+E7C8. Add one-way map
170 from U+1E3F to \xA8\xBC (windows-936/GBK).
171 See https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c3
172
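The gb_table.patch mappings listed above are one-way, which a small sketch can make concrete. The two override tables below contain only the mappings named in this README; the function names are illustrative, and Python's built-in gb18030 codec is used purely as a stand-in for the unpatched tables (it is not ICU's table and may differ from it elsewhere).

```python
# Overrides described for gb_table.patch. Everything else falls back to
# Python's own gb18030 codec, which is only a stand-in for ICU's tables.
TO_UNICODE = {b"\xa3\xa0": "\u3000", b"\xa8\xbf": "\u01f9"}  # bytes -> Unicode only
FROM_UNICODE = {"\u1e3f": b"\xa8\xbc"}  # Unicode -> bytes only


def to_unicode(seq: bytes) -> str:
    """Decode one byte sequence, applying the toUnicode overrides first."""
    override = TO_UNICODE.get(seq)
    return override if override is not None else seq.decode("gb18030")


def from_unicode(ch: str) -> bytes:
    """Encode one character, applying the fromUnicode override first."""
    override = FROM_UNICODE.get(ch)
    return override if override is not None else ch.encode("gb18030")
```

Keeping the two directions in separate tables is what makes a mapping one-way: \xA3\xA0 decodes to U+3000, but encoding U+3000 does not produce \xA3\xA0.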
173 2. source/data/*/*local.mk
174 - List locales of interest to Chromium
175 a. Chrome's UI languages
176 b. Variants of UI languages
177 c. Other locales in Accept-Language list : will only have bare minimum
178 locale data
179
180 - brklocal.mk drops all *loose.brk to save space ( ~370kB) for now.
181
182 3. source/data/brkitr
183 - khmerdict.txt: Abridged Khmer dictionary. See
184 http://bugs.icu-project.org/trac/ticket/9451
185 - word_ja.txt (used only on Android)
186 Added for Japanese-specific word-breaking without the C+J dictionary.
187
188 4. source/data/trnslit/css3transform.txt
189 - Handle Greek case conversion with a transliterator
190
191 5. Add tg.txt, ckb.txt, and ku.txt to source/data/{locale,lang}
192 with the minimal locale data necessary for spellchecker and
193 language menus. Also change the English display name
194 for ckb to 'Kurdish (Arabic)'.
195
196 D. Local Modifications
197
198 1. Applied locale data patches from Google obtained by diff'ing
199 the upstream copy and Google's internal copy for source/data
200
201 - patches/locale_google.patch:
202 * Google's internal ICU locale changes
203 * Simpler region names for Hong Kong and Macau in all locales
204 * Currency signs in ru, uk and tr locales
205 * AM/PM, midnight, noon formatting for a few Indian locales
206 * Timezone name changes in Korean and Chinese locales
207
208 - patches/locale1.patch: Minor fixes for Korean
209
210
211 2. Applied post-56 fixes from the upstream for measure/date format bugs
212
213 - patches/measure_format.patch: combined patch of 12 CLs taken
214 from bugs below.
215 - upstream bugs
216 http://bugs.icu-project.org/trac/ticket/11986
217 http://bugs.icu-project.org/trac/ticket/12031
218 http://bugs.icu-project.org/trac/ticket/12030
219 http://bugs.icu-project.org/trac/ticket/12041
220
221 - patches/relative_date.patch from Android
222 https://android.googlesource.com/platform/external/icu/+/f9ffd5b%5E%21
223
224 3. Breakiterator patches
225 - patches/linebrk.patch
226 a. Drop *_loose.txt for all locales and use the corresponding normal.txt
227 b. Drop local patches we used to have for the following issues. They'll
228 be dealt with in the upstream (Unicode/CLDR).
229 http://unicode.org/cldr/trac/ticket/6557
230 http://unicode.org/cldr/trac/ticket/4200 (http://crbug.com/39779)
231
232 - patches/wordbrk.patch for word.txt
233 a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that
234 FQDN labels can be split at '.'
235 b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric.
236 See http://unicode.org/cldr/trac/ticket/6555
237
238 - patches/khmer-dictbe.patch
239 Adjust parameters to use a smaller Khmer dictionary (khmerdict.txt).
240 http://bugs.icu-project.org/trac/ticket/9451
241
242 - Add several common Chinese words that were dropped previously to
243 source/data/cjdict/brkitr/cjdict.txt
244 patch: patches/cjdict.patch
245 upstream bug: http://bugs.icu-project.org/trac/ticket/10888
246
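The effect of the wordbrk.patch change (full stop moved from MidNumLet to MidNum) can be illustrated with a toy segmenter. This sketch models only a small subset of the UAX #29 word-boundary rules (WB5 to WB12) over ASCII letters, digits and '.'; it is not ICU's rule compiler, and the function name `split_words` and its `full_stop_class` parameter are assumptions made for the illustration.

```python
def split_words(text: str, full_stop_class: str) -> list[str]:
    """Split text with a toy subset of UAX #29 rules WB5-WB12.

    full_stop_class is the word-break class assigned to '.':
    "MidNumLet" (upstream) or "MidNum" (after wordbrk.patch).
    """
    def cls(ch: str) -> str:
        if ch == ".":
            return full_stop_class
        if ch.isdigit():
            return "Numeric"
        if ch.isalpha():
            return "ALetter"
        return "Other"

    def no_break(i: int) -> bool:
        # True if there is no boundary between text[i - 1] and text[i].
        prev, cur = cls(text[i - 1]), cls(text[i])
        nxt = cls(text[i + 1]) if i + 1 < len(text) else None
        prev2 = cls(text[i - 2]) if i >= 2 else None
        if prev in ("ALetter", "Numeric") and cur in ("ALetter", "Numeric"):
            return True  # WB5, WB8, WB9, WB10
        if prev == "ALetter" and cur == "MidNumLet" and nxt == "ALetter":
            return True  # WB6
        if prev2 == "ALetter" and prev == "MidNumLet" and cur == "ALetter":
            return True  # WB7
        if prev == "Numeric" and cur in ("MidNum", "MidNumLet") and nxt == "Numeric":
            return True  # WB12
        if prev2 == "Numeric" and prev in ("MidNum", "MidNumLet") and cur == "Numeric":
            return True  # WB11
        return False

    words, start = [], 0
    for i in range(1, len(text)):
        if not no_break(i):
            words.append(text[start:i])
            start = i
    if text:
        words.append(text[start:])
    return words
```

With '.' as MidNumLet, "www.example.com" stays a single word; with '.' as MidNum, the labels are split at each '.', while a number such as "3.14" is still kept together.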
247 4. Timezone data update
248 Run scripts/update_tz.sh to grab the latest version of the
249 following timezone data files and put them in source/data/misc
143 250
144 metaZones.txt 251 metaZones.txt
145 timezoneTypes.txt 252 timezoneTypes.txt
146 windowsZones.txt 253 windowsZones.txt
147 zoneinfo64.txt 254 zoneinfo64.txt
148 255
149 As of Jan 20 2016, the latest version is 2015g and the above files 256 As of Jan 20 2016, the latest version is 2015g and the above files
150 are available at 257 are available at
151 http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/ 258 http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/
152 259
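For reference, the four timezone files and the directory quoted above combine into the following download locations. This is a sketch only: it does not describe how scripts/update_tz.sh works, the function name `tz_file_urls` is hypothetical, and the base URL is the 2015g one quoted in this README.

```python
# File names and base URL as quoted in the README (2015g, as of Jan 20 2016).
TZ_FILES = ("metaZones.txt", "timezoneTypes.txt", "windowsZones.txt", "zoneinfo64.txt")
TZ_BASE_URL = "http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2015g/44/"


def tz_file_urls(base_url: str = TZ_BASE_URL) -> list[str]:
    """Return one URL per timezone data file under base_url."""
    return [base_url.rstrip("/") + "/" + name for name in TZ_FILES]
```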
153 7. Transliterator customization 260 5. Build-related changes
154
155 - Also add css3transform.txt to source/data/trnslit.
156 - Put the following line in trnslocal.mk
157
158 TRANSLIT_SOURCE=css3transform.txt
159
160 8. Build-related changes
161 261
162 - patches/wpo.patch 262 - patches/wpo.patch
163 upstream bugs : http://bugs.icu-project.org/trac/ticket/8043 263 upstream bugs : http://bugs.icu-project.org/trac/ticket/8043
164 http://bugs.icu-project.org/trac/ticket/5701 264 http://bugs.icu-project.org/trac/ticket/5701
165 - patches/vscomp.patch for building with Visual Studio on Windows. 265 - patches/vscomp.patch for building with Visual Studio on Windows.
166 a. do not use WINDOWS_LOCALE_API in locmap.c 266 a. do not use WINDOWS_LOCALE_API in locmap.c
167 b. do not redefine stringpiece::npos 267 b. do not redefine stringpiece::npos
168 268
169 - patches/data.build.patch : 269 - patches/data.build.patch :
170 Remove unnecessary resources : unames, collator rule source 270 Remove unnecessary resources : unames, collator rule source
171 - patches/data.build.win.patch : 271 - patches/data.build.win.patch :
172 Windows-only data build patch. 272 Windows-only data build patch.
173 - patches/data_symb.patch : 273 - patches/data_symb.patch :
174 Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use 274 Put ICU_DATA_ENTRY_POINT(icudtXX_dat) in common when we use
175 the icu data file or icudt.dll 275 the icu data file or icudt.dll
176 276
177 9. Pre-built data files are checked in with the following steps on Linux: 277 6. Apply a timezone detection API fix
178
179 a. Make a icu data build directory outside the Chromium source tree
180 and cd to that directory, $ICUBUILDIR.
181 b. Run
182
183 ${CHROME_ICU_TREE_TOP}/source/runConfigureICU Linux --disable-layout
184
185 c. Run 'make'
186 d. 'make' will fail when pkgdata looks for css3transform.res.
187 See http://bugs.icu-project.org/trac/ticket/10570
188 e. run
189
190 ${CHROME_ICU_TREE_TOP}/scripts/make_n_copy_data.sh
191
192 This will make and copy icudtl.dat and icudtl_dat.S for Linux and
193 Mac as listed below. Renaming the data/assembly files to drop
194 the ICU major version number as well as running make_mac_assembly.sh
195 is done by this script.
196
197 This script can be run again whenever you update the data.
198
199 - source/data/in/icudtl.dat : Built on Linux with all the patches
200 above applied. icudt54l.dat is generated in
201 {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a
202 version number (54) dropped.
203
204
205 - {mac,linux}/icudtl_dat.S : Built on Linux with all the
206 patches above (except android/brkitr.patch) applied and checked in.
207 This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp as
208 icudt54l_dat.S, but '54' is dropped while copying.
209
210 mac/icudtl_dat.S is identical to linux/icudtl_dat.S except for
211 the header portion. With "linux/icudtl_dat.S" in its place,
212
213 - android/icudtl_dat.S : Built on Linux with all the patches above and
214 android/patch_locale.sh executed.
215 '54' is dropped from the name generated in the build tree.
216
217 - android/icudtl.dat : Generated as icudt54l.dat in
218 {BUILD_DIR_ROOT}/data/out/tmp along with icudt54l_dat.S and
219 copied to the above location with '54' dropped in its name.
220
221 - windows/icudt.dll (by default, we set icu_use_icu_data_flag to 1
222 and don't use this file.)
223
224 a. check out a clean copy of icu54 from the upstream on Windows
225 outside the Chrome tree.
226
227 $ svn export --native-eol LF http://source.icu-project.org/repos/icu/icu/tags/release-54-1 ${SEPARATE_ICU_ROOT}/icu54
228
229 b. copy ${CHROME_ICU_ROOT}/source/data/in/icudtl.dat to
230 ${SEPARATE_ICU_ROOT}/source/data/in/icudt54l.dat
231 c. copy ${CHROME_ICU_ROOT}/source/data/makedata.mak to
232 ${SEPARATE_ICU_ROOT}/source/data/makedata.mak
233 c. In Visual Studio, open source/allinone/allinone.sln solution
234 in ${SEPARATE_ICU_ROOT}
235 d. Build 'makedata' target
236 e. icudt54.dll will be generated in ${SEPARATE_ICU_ROOT}/bin
237 f. Copy that icudt54.dll to ${CHROME_ICU_ROOT}/windows/icudt.dll
238 and check that in.
239
240 15. Apply a timezone detection API fix
241 - patches/tzdetect.patch 278 - patches/tzdetect.patch
242 - upstream bugs 279 - upstream bugs
243 http://bugs.icu-project.org/trac/ticket/11623 280 http://bugs.icu-project.org/trac/ticket/11623
244 281
245 23. Fix 'bad cast' found in Transliterator with a cfi build 282 7. Fix 'bad cast' found in Transliterator with a cfi build
246 - patches/xlit_badcast.patch 283 - patches/xlit_badcast.patch
247 - upstream bug (yet to be resolved) 284 - upstream bug (yet to be resolved)
248 http://bugs.icu-project.org/trac/ticket/11937 285 http://bugs.icu-project.org/trac/ticket/11937
286
287 8. TODO: If removing UTF-32 from Blink is more involved than expected,
288 add back UTF-32 temporarily even when UCONFIG_ONLY_HTML_CONVERSION is
289 defined See
290 http://www.icu-project.org/trac/ticket/11296