Chromium Code Reviews
| Index: icu52/README.chromium |
| =================================================================== |
| --- icu52/README.chromium (revision 259826) |
| +++ icu52/README.chromium (working copy) |
| @@ -18,91 +18,68 @@ |
| - source/layout |
| - source/layoutex |
| -2. Platform header files for Linux, FreeBSD, OpenBSD, Android, Mac OS X, and QNX: |
| + patches/configure.patch is applied to get runConfigureICU to work in the |
| + icudata generation step without the layout and layoutex directories by |
| + removing the corresponding Makefiles from the ac_config variable. |
| - - Apply platform.patch in patches directory. : It applies the upstream |
| - patch to platform.h.in (see http://bugs.icu-project.org/trac/ticket/8248) |
| - and change source/common/unicode/ptypes.h to refer to plinux.h and |
| - pmac.h generated below. |
| +2. Apply the following patch for platform related headers (putilimpl.h and |
| + others). |
| - - 'runConfigureICU Linux', 'runConfigureICU FreeBSD', and |
| - 'runConfigureICU MacOSX' are run to generate |
| - source/common/unicode/platform.h. |
| + - patches/putil.patch for Android and QNX |
| + Upstream bug for Android : http://bugs.icu-project.org/trac/ticket/10478 |
| + Upstream bug for QNX : http://bugs.icu-project.org/trac/ticket/10811 |
| - - On OpenBSD, source/common/unicode/platform.h is being generated |
| - by the icu4c port in the ports directory and not by runConfigureICU. |
| - In case the file has to be updated you can do: |
| - cd /home/ports/textproc/icu4c && make configure |
| - - Rename it to 'plinux.h', 'pfreebsd.h', 'popenbsd.h' and 'pmac.h' |
| +3. Breakiterator patches |
| - - Apply patches/pmach.h.patch on Mac to pmac.h |
| + - Apply patches/brkitr.patch |
| + * word.txt |
| + a. Move full stops (U+002E, U+FF0E) from MidNumLet to MidNum so that |
| + FQDN labels can be split at '.' |
| + b. Move fullwidth digits (U+FF10 - U+FF19) from Ideographic to Numeric. |
| + See http://unicode.org/cldr/trac/ticket/6555 |
| + * line.txt |
| + a. Use Japanese rules for all locales because Japanese tailoring only |
| + affects Japanese specific characters. |
| + See http://unicode.org/cldr/trac/ticket/3974 |
Mark Mentovai (2014/04/04 19:22:52): Nuke the tabs.
jungshik at Google (2014/04/04 22:20:27): Done.
| + b. Minor changes in CL, OP and IS definitions to handle 'comma-variants' |
| + more consistently. |
| + See http://unicode.org/cldr/trac/ticket/6557 |
| + c. Fix line breaking for Chinese characters and quotation marks |
| + See http://unicode.org/cldr/trac/ticket/4200 and |
| + http://crbug.com/39779 |
| + |
| - - On Android, the pandroid.h was generated by copying plinux.h to |
| - pandroid.h and applying the patches/pandroid.h.patch. |
| + - Add a new file brklocal.mk (copied from brkfiles.mk) with line_ja.txt |
| + and word_POSIX.txt dropped from the build list. |
| - - For QNX, the pqnx.h was generated by copying plinux.h to |
| - pqnx.h and applying the patches/platform.qnx.patch. |
| + - Apply patches/khmer-dictbe.patch and put in a smaller Khmer dictionary |
| + (source/data/brkitr/khmerdict.txt) obtained from |
| + http://bugs.icu-project.org/trac/ticket/9451 |
| - - For NaCl (icu_nacl.gypi), the pnacl.h was generated by copying plinux.h to |
| - pnacl.h and applying the patches/pnacl.h.patch. |
| - |
| - - Apply the CL at https://codereview.chromium.org/15973007/ to plinux.h |
| - |
| -3. The following directories were removed because they're not used by Chromium |
| - at the moment: |
| - as_is |
| - packaging |
| - source/extra |
| - source/sample |
| - source/layout |
| - source/layoutex |
| - |
| - |
| -4. The word breaking for Chinese and Japanese were modified to use a word |
| - frequency list with the following patch and cjdict.txt. |
| - |
| - - patches/segmentation.patch : |
| - Adds a dictionary (word-frequency)-based word breaking for CJK |
| - (Korean is supported in the code, but it does not do anything |
| - because we don't have a Korean word-list.) |
| - |
| - - source/data/brkitr/cjdict.txt : |
| - Chinese and Japanese word frequency list. |
| - See the file for license/copyright notice |
| - |
| - - source/data/brkitr/cc_edict.txt : |
| - the list of words derived from CC-Edict.) |
| - |
| - - patches/brkitr.patch |
| - * word.txt : Chinese/Japanese segmentation rules, Hebrew-script-specific |
| - handling of U+0022, and splitting of FQDN into labels at '.'. |
| - For Hebrew, see http://unicode.org/cldr/track/ticket/3120 |
| - * line.txt : Incorporated line_he and minor changes in CL, OP and ID |
| - definitions. |
| - For Hebrew, see http://unicode.org/cldr/track/ticket/4004 |
| - For others, see http://unicode.org/cldr/track/ticket/3974 |
| - http://unicode.org/cldr/track/ticket/4200 |
| - http://unicode.org/cldr/track/ticket/ |
| - * brklocal.mk : build file changes to drop unnecessary brkitr rule |
| - files (e.g. word_ja.txt, line_he.txt) |
| - |
| - android/brkitr.patch (to be applied for Android build only) : |
| Reverts some changes about Chinese/Japanese segmentation rules in |
| patches/brkitr.patch to reduce binary size for Android. |
| - If you want to run ICU tests, you have to copy source/data/brkitr/cjdict.txt |
| - to source/test/testdata/cjdict-truncated.txt to pass TestTrieWithValue test. |
| +4. Converter changes : |
| -5. Converter changes : converters.patch |
| - - Include what we really need. See source/data/mappings/ucmlocal.txt |
| - - Alias and mapping changes : source/data/mappings/convrtrs.txt |
| - - Changes several tables and add six new tables, three of which |
| - are 'fake' tables for ISO-2022-CN(-Ext). |
| - - ucnv2022.c is modified to use 3 'fake' tables added above for |
| - ISO-2022-CN(-Ext). |
| + - converters.patch : |
| + a. Revise existing mapping tables. |
| + b. Remove a lot of unused aliases in the converter alias table |
| + (source/data/mappings/convrtrs.txt ) leading to 40kB size reduction. |
| -6. Locale changes |
| + - Add source/data/mappings/ucmlocal.txt : to list only converters we need. |
| + - Add two new tables per WHATWG encoding standards for EUC-JP and CP866. |
| + They're generated with scripts/{eucjp, ibm866}_gen.sh. |
| + - Add three 'fake' tables for ISO-2022-CN(-Ext) : noop-*.ucm. |
| + |
| + - uconv.patch |
| + a. ucnv2022 uses 3 fake tables for ISO-2022-CN(-Ext) instead of two |
| + huge tables. |
| + b. ISO-2022-JP-[1-4] is dropped. |
| + c. SCSU, BOCU, ISCII, UTF-7 conversion is disabled. (25+kB reduction) |
| + |
| +5. Locale changes |
| - patches/locale1.patch : |
| Filipino, Amharic, and Swahili locales |
| exemplar character set changes for CJK + 9 Indian locales |
| @@ -130,13 +107,13 @@ |
| data necessary for the spellchecker. In both directories, add tg.txt to |
| reslocal.mk |
| -7. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt |
| +6. Removal of unihan collation tables from data/coll/{zh,ja,ko}.txt |
| - - patches/unihan.patch: |
| - unihan collation tables are never used in Chrome/Webkit, but it takes |
| - about 1MB in the uncompressed ICU data file in ICU 4.2.1. |
| + - run scripts/remove_unihan.sh |
| + unihan collation tables are never used in Chrome/Blink, but they take |
| + about 1MB in the uncompressed ICU data file. |
| -8. Timezone data update |
| +7. Timezone data update |
| - Grab the latest version of the following timezone data files and |
| put them in source/data/misc. |
| @@ -145,183 +122,83 @@ |
| windowsZones.txt |
| zoneinfo64.txt |
| - As of Mar 2014, the latest version is 2014a and the above files |
| + As of April 2014, the latest version is 2014b and the above files |
| are available at |
| - http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014a/44/ |
| + http://source.icu-project.org/repos/icu/data/trunk/tzdata/icunew/2014b/44/ |
| -9. Transliterator customization |
| +8. Transliterator customization |
| - - Add el_Upper.txt taken from ICU 52 to source/data/trnslit |
| - |
| - - Also add css3transform.txt to the same directory |
| + - Also add css3transform.txt to source/data/trnslit. |
| - Put the following line in trnslocal.mk |
| TRANSLIT_SOURCE=css3transform.txt |
| -10. Build-related changes |
| +9. Build-related changes |
| - patches/wpo.patch |
| - - patches/vscomp.patch |
| - (see http://bugs.icu-project.org/trac/ticket/8355 and |
| - http://bugs.icu-project.org/trac/ticket/8356 ) |
| - - patches/rtti.patch : Make RTTI work without exception handling on Windows |
| - (see http://bugs.icu-project.org/trac/ticket/8343) |
| + Upstream bugs : http://bugs.icu-project.org/trac/ticket/8043 |
| + http://bugs.icu-project.org/trac/ticket/5701 |
| + - patches/vscomp.patch for building with Visual Studio on Windows. |
| + a. do not use WINDOWS_LOCALE_API in locmap.c |
| + b. do not redefine stringpiece::npos |
| - patches/data.build.patch : |
| - To remove some data files we don't use and cut down the data size. |
| + Remove unnecessary resources : invuca, unames, collator source, stringprep |
| - patches/data.build.win.patch : |
| Windows-only data build patch. Add a new target DATALIB to makedata.mak |
| - - patches/clang.patch: To build with Clang. |
| - (see http://bugs.icu-project.org/trac/ticket/8954 Two other chunks in |
| - the patch have already been fixed in the ICU trunk.) |
| - add an empty file (stubdatabuilt.txt) to source/stubdata |
| -11. Pre-built data libraries are checked in. |
| +10. Pre-built data files are checked in with the following steps on Linux: |
| - Before building data file on Linux, re-run 'runConfigureICU Linux' again |
| - if it's run without data.build.patch in #10 above. |
| + a. Make an ICU data build directory outside the Chromium source tree. |
| + b. Run 'runConfigureICU Linux' outside the source tree. |
| + c. Run 'make' |
| + d. 'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu |
| + to {BUILD_DIR_ROOT}/data/out/build/icudt52l/coll and re-run 'make' |
| + in {BUILD_DIR_ROOT}/data. |
| - Because we removed layout and layoutex directories in step 3, |
| - 'runConfigureICU Linux' will fail even with '--disable-layout'. A |
| - work-around is to have a copy of our icu tree in a separate build directory |
| - and add back directories we removed in step 3 before |
| - running 'runConfigure'. |
| - |
| - 'make' will fail in the 1st pass. Copy source/data/in/coll/invuca.icu |
| - to {BUILD_DIR_ROOT}/data/out/build/icudt46l/coll and re-run 'make' |
| - in {BUILD_DIR_ROOT}/data. |
| - |
| - 'make' will fail again when pkgdata looks for css3transform.res. Edit |
| + e. 'make' will fail again when pkgdata looks for css3transform.res. Edit |
| data/out/tmp/icudata.lst to replace 'css3transform.res' with 'root.res'. |
| (see http://bugs.icu-project.org/trac/ticket/10570 ) and run 'make' again. |
| - source/data/in/icudtl.dat : Built on Linux with all the patches |
| - above applied. icudt46l.dat is generated in |
| + above applied. icudt52l.dat is generated in |
| {BUILD_DIR_ROOT}/data/out/tmp and copied to the above location with a |
| - version number (46) dropped. |
| + version number (52) dropped. |
| - - windows/icudt.dll : With icudt46l.dat in place, all the patches applied |
| + - windows/icudt.dll : With icudt52l.dat in place, all the patches applied |
| and header files moved (#11 below), generated by building icudt_build |
| - project of build/icudt_build.sln on Windows. icudt46.dll is |
| + project of build/icudt_build.sln on Windows. icudt52.dll is |
| generated in bin/{Release,Debug} and copied to windows/icudt.dll |
| - and checked in. Note that we drop the version number ('46') from the |
| + and checked in. Note that we drop the version number ('52') from the |
| dll name to avoid having to update our build scripts/configuration |
| files every time ICU is upgraded to a new version. |
| - - {mac,linux}/icudt46l_dat.S : Built on Linux with all the |
| + - {mac,linux}/icudt52l_dat.S : Built on Linux with all the |
| patches above (except android/brkitr.patch) applied and checked in. |
| This file will be generated in {BUILD_DIR_ROOT}/data/out/tmp. |
| - mac/icudt46l_dat.S is identical to linux/icudt46l_dat.S. It's made |
| + mac/icudt52l_dat.S is identical to linux/icudt52l_dat.S. It's made |
| by changing the header portion of the Linux version to read as follows |
| (no leading whitespace) : |
| - .globl _icudt46_dat |
| + .globl _icudt52_dat |
| #ifdef U_HIDE_DATA_SYMBOL |
| - .private_extern _icudt46_dat |
| + .private_extern _icudt52_dat |
| #endif |
| .data |
| .const |
| .align 4 |
| - _icudt46_dat: |
| + _icudt52_dat: |
| - - android/icudt46l_dat.S : Built on Linux with all the patches above and |
| + - android/icudt52l_dat.S : Built on Linux with all the patches above and |
| android/brkitr.patch applied and android/patch_locale.sh executed, and |
| checked in. |
| - - android/icudtl.dat : Generated as icudt46l.dat in |
| - {BUILD_DIR_ROOT}/data/out/tmp along with icudt46l_dat.S and |
| - copied to the above location with '46' dropped in its name. |
| + - android/icudtl.dat : Generated as icudt52l.dat in |
| + {BUILD_DIR_ROOT}/data/out/tmp along with icudt52l_dat.S and |
| + copied to the above location with '52' dropped in its name. |
| -12. Apply the fix found with static analysis tools such as PSV and coverity |
| - |
| - - patches/static.analysis.patch |
| - - upstream trunk/4.8 do not have this code any more. |
| - |
| -13. Fix for msvs2010 applied: |
| ---- D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp |
| - (revision 78292) |
| -+++ D:/src/ent/src/third_party/icu/source/common/stringpiece.cpp |
| - (working copy) |
| -@@ -75,7 +75,7 @@ |
| - * Visual Studios 9.0. |
| - * Cygwin with MSVC 9.0 also complains here about redefinition. |
| - */ |
| --#if (!defined(_MSC_VER) || (_MSC_VER > 1500)) && !defined(CYGWINMSVC) |
| -+#if (!defined(_MSC_VER) || (_MSC_VER > 1600)) && !defined(CYGWINMSVC) |
| - const int32_t StringPiece::npos; |
| - #endif |
| - |
| -14. Fix for locales that don't use '.' as decimal separator: patches/nan.patch |
| - - upstream bug: http://bugs.icu-project.org/trac/ticket/8561 |
| - - Handle other chars besides the dot. This is required because decNumber's |
| - parser expects the dot as a decimal separator. |
| - - Locales that don't use dot were producing "NaN" values. |
| - |
| -15. Fix a bug in the regex engine. |
| - - patches/regex.patch |
| - - upstream bug: http://bugs.icu-project.org/trac/ticket/8666 (fixed in the upstream) |
| - |
| -16. Apply the upstream patch for Korean search collator support (ICU 4.6.1). |
| - - patches/search_collation.patch |
| - - upstream bug: http://bugs.icu-project.org/trac/ticket/8290 |
| - |
| -17. Fix a use of uninitialized memory bug in regular expression matching |
| - - patches/rematch.patch |
| - - upstream bug: http://bugs.icu-project.org/trac/ticket/8824 |
| - |
| -18. Make it compile with -Werror on gcc 4.6 |
| - - patches/gcc46.patch (ToT upstream does not have this code any more). |
| - |
| -19. Fix four out of bounds memory access error in common/uloc.c |
| - and common/uresbund.c |
| - - patches/uloc.patch |
| - - upstream bug: |
| - 1. http://bugs.icu-project.org/trac/ticket/8984 (_canonicalize) |
| - 2. http://bugs.icu-project.org/trac/ticket/9114 (_getKeywords) |
| - 3. http://bugs.icu-project.org/trac/ticket/8812 (uresbund) |
| - http://bugs.icu-project.org/trac/ticket/8813 (uresbund) |
| - 4. http://bugs.icu-project.org/trac/ticket/10250 (_getKeywords) |
| - |
| -20. Fix a null pointer error in ubrk_setText in ubrk.cpp. |
| - - patches/ubrk.patch |
| - - upstream bug : http://bugs.icu-project.org/trac/ticket/9115 |
| - |
| -21. Fix a clang warning in rbbi.cpp by merging in an upstream change. |
| - - patches/changeset_30255.patch |
| - - upstream change : http://bugs.icu-project.org/trac/changeset/30255 |
| - |
| -22. Fix time zone handling and compilation on iOS. |
| - - patches/ios_timezone.patch |
| - - upstream bugs : http://bugs.icu-project.org/trac/ticket/9051 |
| - http://bugs.icu-project.org/trac/ticket/8661 |
| - |
| -23. Fix a buffer overflow in utext |
| - - patches/utext.patch |
| - - upstream change : http://bugs.icu-project.org/trac/changeset/29356 |
| - |
| -24. Fix compilation errors on VS2012 and above. |
| - - patches/vs2012.patch |
| - |
| -25. Fix a buffer overflow in UTF-16/32 detection. |
| - - patches/csetdet.patch |
| - - upstream bug: http://bugs.icu-project.org/trac/ticket/10318 |
| - |
| -26. Add BreakIterator::getRuleStatus |
| - - patches/breakiterator.patch |
| - - Copy and paste BreakIterator::getRuleStatus API from ICU 52 |
| - |
| -27. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT. |
| +11. Change export of U_ICUDATA_ENTRY_POINT from U_IMPORT to U_EXPORT. |
| - patches/declspec.patch |
| - |
| -28. Add support for QNX Neutrino. |
| - - patches/platform.qnx.patch: |
| - See #2 about the platform header generation. |
| - - patches/si_value.undef.patch: |
| - Work around an all-lowercase macro defined in <signal.h>. |
| - Upstream took a different approach: |
| - http://bugs.icu-project.org/trac/ticket/9935 |
| - - patches/xopen_source.patch: |
| - Set _XOPEN_SOURCE to 600 as in the upstream changeset: |
| - http://bugs.icu-project.org/trac/changeset/30418 |
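
The Linux data build described in step 10 of the new README (a. through e., plus copying icudt52l.dat to icudtl.dat) could be scripted roughly as follows. This is a minimal sketch, not part of the patch under review: the function names (fix_icudata_lst, install_unversioned_dat, build_icu_data) and their arguments are hypothetical, and the location of runConfigureICU under source/ is an assumption about the tree layout. The paths and the css3transform.res workaround are taken from the README text.

```shell
#!/bin/sh
# Sketch of README step 10. All function names here are hypothetical helpers.

# Step e: replace 'css3transform.res' with 'root.res' in the package list
# so that pkgdata stops looking for css3transform.res.
fix_icudata_lst() {
    lst="$1"
    sed 's/css3transform\.res/root.res/' "$lst" > "$lst.tmp" &&
        mv "$lst.tmp" "$lst"
}

# Copy the versioned data file (e.g. icudt52l.dat) to the unversioned
# name icudtl.dat in the destination directory.
install_unversioned_dat() {
    src="$1"
    dest_dir="$2"
    cp "$src" "$dest_dir/icudtl.dat"
}

# Steps a-e. $1 is the ICU tree (assumed to contain source/runConfigureICU),
# $2 is a build directory outside the Chromium source tree.
build_icu_data() {
    icu_src="$1"
    build_root="$2"
    mkdir -p "$build_root" || return 1
    (
        cd "$build_root" || exit 1
        "$icu_src/source/runConfigureICU" Linux || exit 1
        make    # the README says this first pass fails (step d)
        cp "$icu_src/source/data/in/coll/invuca.icu" \
            data/out/build/icudt52l/coll/ || exit 1
        cd data || exit 1
        make    # the README says this fails again at pkgdata (step e)
        fix_icudata_lst out/tmp/icudata.lst || exit 1
        make
    )
}
```

Sourcing the file only defines the functions; nothing is built until build_icu_data is called with the two directories. The two small helpers are kept separate so that the list edit and the rename can be checked without running a full ICU build.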