Chromium Code Reviews

Unified Diff: icu46/source/data/unidata/changes.txt

Issue 5516007: Check in the pristine copy of ICU 4.6... (Closed) Base URL: svn://chrome-svn/chrome/trunk/deps/third_party/
Patch Set: Created 10 years ago
Index: icu46/source/data/unidata/changes.txt
===================================================================
--- icu46/source/data/unidata/changes.txt (revision 0)
+++ icu46/source/data/unidata/changes.txt (revision 0)
@@ -0,0 +1,934 @@
+* Copyright (C) 2004-2010, International Business Machines
+* Corporation and others. All Rights Reserved.
+*
+* file name: changes.txt
+* encoding: US-ASCII
+* tab size: 8 (not used)
+* indentation:4
+*
+* created on: 2004may06
+* created by: Markus W. Scherer
+*
+* change log for Unicode updates
+
+---------------------------------------------------------------------------- ***
+
+Unicode 6.0 update
+
+*** related ICU Trac tickets
+
+7264 Unicode 6.0 Update
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+ (configure.in & configure have been modified to extract the version from uchar.h)
+- com.ibm.icu.util.VersionInfo
+
+*** data files & enums & parser code
+
+* file preparation
+
+~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed
+- This now prepares both unidata and testdata files in respective output subfolders.
+
+* PropertyAliases.txt changes
+- new Script_Extensions property defined in the new ScriptExtensions.txt file
+ but not listed in PropertyAliases.txt; reported to unicode.org;
+ -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt
+ scx; Script_Extensions
+ -> uchar.h with new UProperty section
+ -> com.ibm.icu.lang.UProperty, parallel with uchar.h
+
+* PropertyValueAliases.txt changes
+- 12 new block names:
+ Alchemical_Symbols
+ Bamum_Supplement
+ Batak
+ Brahmi
+ CJK_Unified_Ideographs_Extension_D
+ Emoticons
+ Ethiopic_Extended_A
+ Kana_Supplement
+ Mandaic
+ Miscellaneous_Symbols_And_Pictographs
+ Playing_Cards
+ Transport_And_Map_Symbols
+ -> add to uchar.h
+ -> add to UCharacter.UnicodeBlock
+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+)
+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- Joining_Group (jg) values:
+ Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias
+ -> uchar.h & UCharacter.JoiningGroup
+- 3 new scripts:
+ sc ; Batk ; Batak
+ sc ; Brah ; Brahmi
+ sc ; Mand ; Mandaic
+ -> remove these from SyntheticPropertyValueAliases.txt
+ -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN
+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI()
+ and in com.ibm.icu.dev.test.lang.TestUScript.java
+- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html
+ (added 2009-11-11..2010-07-18)
+ Bass 259 Bassa Vah
+ Dupl 755 Duployan shorthand
+ Elba 226 Elbasan
+ Gran 343 Grantha
+ Kpel 436 Kpelle
+ Loma 437 Loma
+ Mend 438 Mende
+ Merc 101 Meroitic Cursive
+ Narb 106 Old North Arabian
+ Nbat 159 Nabataean
+ Palm 126 Palmyrene
+ Sind 318 Sindhi
+ Wara 262 Warang Citi
+ -> uscript.h
+ -> com.ibm.icu.lang.UScript
+ find USCRIPT_([^ ]+) *= ([0-9]+),(.+)
+ replace public static final int \1 = \2;\3
+ -> SyntheticPropertyValueAliases.txt
+ -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI()
+ and in com.ibm.icu.dev.test.lang.TestUScript.java
+- ISO 15924 name change
+ Mero 100 Meroitic Hieroglyphs (was Meroitic)
+ -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC
+- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt
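Review note: the two Eclipse find/replace pairs in this section (UBLOCK_ to UnicodeBlock, USCRIPT_ to a Java int constant) can also be run outside the IDE. The following is a minimal sketch using Python's `re` module with the patterns transcribed from the log; the function names and the sample constant values are illustrative assumptions, not taken from uchar.h or uscript.h.

```python
import re

# Patterns and replacements transcribed from the find/replace pairs in the log.
UBLOCK_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
UBLOCK_REPLACE = r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'

USCRIPT_FIND = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")
USCRIPT_REPLACE = r"public static final int \1 = \2;\3"


def block_to_java(c_line):
    """Rewrite one C UBLOCK_ enum line as a Java UnicodeBlock declaration."""
    return UBLOCK_FIND.sub(UBLOCK_REPLACE, c_line)


def script_to_java(c_line):
    """Rewrite one C USCRIPT_ enum line as a Java int constant."""
    return USCRIPT_FIND.sub(USCRIPT_REPLACE, c_line)


if __name__ == "__main__":
    # Sample input lines are illustrative; the numeric values are assumed.
    print(block_to_java("UBLOCK_BATAK = 199, /*[1BC0]*/"))
    print(script_to_java("USCRIPT_BASSA_VAH = 134,/* Bass */"))
```

Lines that do not match either pattern pass through unchanged, so the functions can be applied to a whole header excerpt line by line.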
+
+* UnicodeData.txt changes
+- new CJK block:
+ 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;
+ 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;
+ -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion
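Review note: the `<..., First>` / `<..., Last>` pair above describes an algorithmic range rather than two individual characters. As a rough sketch (the helper name `parse_ranges` is hypothetical and this is not how gennames.c does it), the pair can be folded into a single range like this:

```python
def parse_ranges(lines):
    """Pair '<Name, First>' and '<Name, Last>' UnicodeData.txt lines
    into (name, start, end) tuples."""
    ranges = []
    pending = None  # (name, start) of a First line still waiting for its Last
    for line in lines:
        fields = line.split(";")
        if len(fields) < 2:
            continue
        code, name = int(fields[0], 16), fields[1]
        if name.endswith(", First>"):
            pending = (name[1:-len(", First>")], code)
        elif name.endswith(", Last>") and pending is not None:
            ranges.append((pending[0], pending[1], code))
            pending = None
    return ranges


if __name__ == "__main__":
    # The two lines quoted in the change log.
    sample = [
        "2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;;",
        "2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;;",
    ]
    for name, start, end in parse_ranges(sample):
        print(f"{name}: U+{start:04X}..U+{end:04X} ({end - start + 1} code points)")
```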
+
+* build Unicode tools using CMake+make
+
+* run genpname/preparse.pl (on Linux)
+ + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname
+ + make sure that data.h is writable
+ + perl preparse.pl ~/svn.icu/trunk/src > out.txt
+ + preparse.pl shows no errors, out.txt Info and Warning lines look ok
+
+* rebuild Unicode tools (at least genpname) using make
+- You might first need to "make install" ICU so that the tools build can pick
+ up the new definitions from the installed header files.
+
+* run genpname
+- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in
+- rebuild ICU & tools
+
+* update source/data/unidata/norm2/nfkc_cf.txt
+- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt
+
+* update source/data/unidata/norm2/uts46.txt
+- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt
+ to ~/svn.icu/tools/trunk/src/unicode/py
+- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values
+- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py
+- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2
+
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to
+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar)
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters
+- Unicode 6.0: U+2260, U+226E, U+226F
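Review note: the grep step above can be approximated with a short script. This is a sketch only; it assumes the usual UCD-style layout of IdnaMappingTable.txt (code point or range, then `;`-separated fields, then a `#` comment), and the function name and sample lines are illustrative.

```python
def non_ascii_std3_valid(lines):
    """Return non-ASCII code points whose status is disallowed_STD3_valid."""
    hits = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        # Only code points outside ASCII are interesting for the UTS #46 tests.
        hits.extend(range(max(first, 0x80), last + 1))
    return hits


if __name__ == "__main__":
    # Illustrative lines in the assumed file format.
    sample = [
        "003A..0040    ; disallowed_STD3_valid   # COLON..COMMERCIAL AT",
        "00C0          ; mapped ; 00E0           # LATIN CAPITAL LETTER A WITH GRAVE",
        "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
        "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
    ]
    print(["U+%04X" % cp for cp in non_ascii_std3_valid(sample)])
```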
+
+* generate core properties data files
+- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU & tools
+- run makeuca.sh so that genuca picks up the new nfc.nrm:
+ ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU & tools
+
+* implement new Script_Extensions property (provisional)
+- parser & generator: genprops & uprops.icu
+- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp
+- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java
+
+* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2
+- (one-time change)
+- genbidi/gencase/genprops tools changes
+- re-run makeprops.sh (see above)
+- UCharacterProperty.java, UCharacterTypeIterator.java,
+ UBiDiProps.java, UCaseProps.java, and several others with minor changes;
+ UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java
+
+* update Java data files
+- refresh just the UCD-related files, just to be safe
+- see (ICU4C)/source/data/icu4j-readme.txt
+- mkdir /tmp/icu4j
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+ output:
+ ...
+ Unicode .icu files built to ./out/build/icudt45l
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b
+ echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b
+ mkdir -p /tmp/icu4j/main/shared/data
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data
+- copy the big-endian Unicode data files to another location,
+ separate from the other data files
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
+ ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr
+- refresh ICU4J
+ ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
+
+* refresh Java test .txt files
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode
+
+* un-hardcode normalization skippable (NF*_Inert) test data
+- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools
+
+* copy updated break iterator test files
+- now handled by the early ucdcopy.py step and by
+ copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata
+ (old instructions:
+ copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt
+ to ~/svn.icu/trunk/src/source/test/testdata)
+- they are not used in ICU4J
+
+* UCA
+
+- get output from Mark's tools; look in
+ http://www.unicode.org/~book/incoming/mark/uca6.0.0/
+ http://www.macchiato.com/unicode/utc/additional-uca-files
+ http://www.unicode.org/Public/UCA/6.0.0/
+ http://www.unicode.org/~mdavis/uca/
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt
+- update Han-implicit ranges for new CJK extensions:
+ swapCJK() in ucol.cpp & ImplicitCEGenerator.java
+- genuca: allow bytes 02 for U+FFFE, new merge-sort character;
+ do not add it into invuca so that tailoring primary-after an ignorable works
+- genuca: permit space between [variable top] bytes
+- ucol.cpp: treat noncharacters like unassigned rather than ignorable
+- run makeuca.sh:
+ ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld
+- rebuild ICU4C
+- refresh ICU4J collation data:
+ (subset of instructions above for properties data refresh, except copies all coll/*)
+ ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll
+ ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b
+- update (ICU)/source/test/testdata/CollationTest_*.txt
+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt
+ with output from Mark's Unicode tools
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
+- note on intltest: if collate/UCAConformanceTest fails, then
+ utility/MultithreadTest/TestCollators will fail as well;
+ fix the conformance test before looking into the multi-thread test
+
+* When refreshing all of ICU4J data from ICU4C
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data
+or
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install
+
+*** LayoutEngine script information
+
+(For details see the Unicode 5.2 change log below.)
+
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+* fix mixed line endings
+* review the diffs and fix incorrect @draft and missing aliases;
+ Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc.
+* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
+
+---------------------------------------------------------------------------- ***
+
+Unicode 5.2 update
+
+*** related ICU Trac tickets
+
+7084 Unicode 5.2
+
+7167 verify collation bytes
+7235 Java test NAME_ALIAS
+7236 Java DerivedCoreProperties.txt test
+7237 Java BidiTest.txt
+7238 UTrie2 in core unidata
+7239 test for tailoring gaps
+7240 Java fix CollationMiscTest
+7243 update layout engine for Unicode 5.2
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in & configure
+- update ucdVersion in gennames.c if an algorithmic range changes
+
+*** data files & enums & parser code
+
+* file preparation
+
+python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata
+- includes finding files regardless of version numbers,
+ copying them, and performing the equivalent processing of the
+ ucdstrip and ucdmerge tools on the desired set of files
+
+* notes on changes
+- PropertyAliases.txt
+ moved from numeric to enumerated:
+ ccc ; Canonical_Combining_Class
+ new string properties:
+ NFKC_CF ; NFKC_Casefold
+ Name_Alias; Name_Alias
+ new binary properties:
+ Cased ; Cased
+ CI ; Case_Ignorable
+ CWCF ; Changes_When_Casefolded
+ CWCM ; Changes_When_Casemapped
+ CWKCF ; Changes_When_NFKC_Casefolded
+ CWL ; Changes_When_Lowercased
+ CWT ; Changes_When_Titlecased
+ CWU ; Changes_When_Uppercased
+ new CJK Unihan properties (not supported by ICU)
+- PropertyValueAliases.txt
+ new block names
+ new scripts
+ one script code change:
+ sc ; Qaai ; Inherited
+ ->
+ sc ; Zinh ; Inherited ; Qaai
+ new Line_Break (lb) value:
+ lb ; CP ; Close_Parenthesis
+ new Joining_Group (jg) values: Farsi_Yeh, Nya
+ other new values:
+ ccc; 214; ATA ; Attached_Above
+- DerivedBidiClass.txt
+ new default-R range: U+1E800 - U+1EFFF
+- UnicodeData.txt
+ all of the ISO comments are gone
+ new CJK block end:
+ 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last>
+ new CJK block:
+ 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;;
+ 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;;
+
+* genpname
+- run preparse.pl
+ + cd \svn\icuproj\icu\trunk\source\tools\genpname
+ + make sure that data.h is writable
+ + perl preparse.pl \svn\icuproj\icu\trunk > out.txt
+ + preparse.pl complains with errors like the following:
+ Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34.
+ This is because ICU 4.0 had scripts from ISO 15924 which are now
+ added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt
+ and PropertyValueAliases.txt.
+ -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
+ Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt
+ + preparse.pl complains with errors about block names missing from uchar.h; add them
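Review note: the preparse.pl conflict described above arises when a script code appears in both PropertyValueAliases.txt and SyntheticPropertyValueAliases.txt. A quick pre-check could look like the sketch below. It assumes both files use the `sc ; Code ; Long_Name` layout shown elsewhere in this log; the function names are hypothetical and this does not reproduce preparse.pl's own logic.

```python
def script_codes(lines):
    """Collect the short script codes from 'sc ; Code ; Long_Name' lines."""
    codes = set()
    for line in lines:
        data = line.split("#", 1)[0]
        fields = [f.strip() for f in data.split(";")]
        if len(fields) >= 3 and fields[0] == "sc":
            codes.add(fields[1])
    return codes


def duplicated_scripts(official_lines, synthetic_lines):
    """Script codes defined in both files, i.e. candidates for removal
    from the synthetic file."""
    return sorted(script_codes(official_lines) & script_codes(synthetic_lines))


if __name__ == "__main__":
    # Illustrative excerpts; the synthetic entries are assumed, not copied.
    official = ["sc ; Egyp ; Egyptian_Hieroglyphs", "sc ; Java ; Javanese"]
    synthetic = ["sc ; Egyp ; Egyp", "sc ; Moon ; Moon"]
    print(duplicated_scripts(official, synthetic))
```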
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new block & script values
+ + 26 new blocks
+ copy new blocks from Blocks.txt
+ MS VC++ 2008 regular expression:
+ find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$"
+ replace with " UBLOCK_\3 = 172, /*[\1]*/"
+ + several new script values already added in ICU 4.0 for ISO 15924 coverage
+ (removed from SyntheticPropertyValueAliases.txt, see genpname notes above)
+ + 3 new script values added for ISO 15924 and Unicode 5.2 coverage
+ + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2)
+ (added to SyntheticPropertyValueAliases.txt)
+- new Joining Group (JG) values: Farsi_Yeh, Nya
+- new Line_Break (lb) value:
+ lb ; CP ; Close_Parenthesis
+
+* hardcoded Unihan range end/limit
+- Unihan range end moves from 9FC3 to 9FCB
+ search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive)
+ + do change gennames.c
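Review note: the case-insensitive search for `9FC[34]` mentioned above is easy to script when no IDE is at hand. A minimal sketch follows; the function name and the sample source lines are made up for illustration.

```python
import re

# Matches the old Unihan range end (9FC3) and limit (9FC4), in any letter case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FC[34]", re.IGNORECASE)


def find_hardcoded(lines):
    """Return (line_number, text) for every line mentioning 9FC3 or 9FC4."""
    return [
        (number, text)
        for number, text in enumerate(lines, 1)
        if UNIHAN_END_OR_LIMIT.search(text)
    ]


if __name__ == "__main__":
    # Hypothetical source lines, not copied from ICU.
    sample = [
        "#define CJK_END 0x9fc3",
        "#define CJK_LIMIT 0x9FC4",
        "#define HANGUL_END 0xd7a3",
    ]
    for number, text in find_hardcoded(sample):
        print(number, text)
```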
+
+* Compare definitions of new binary properties with what we used to use
+ in algorithms, to see if the definitions changed.
+- Verified that definitions for Cased and Case_Ignorable are unchanged.
+ The gencase tool now parses the newly public Case_Ignorable values
+ in case the definition changes in the future.
+
+* uchar.c & uprops.h & uprops.c & genprops
+- new numeric values that didn't exist in Unicode data before:
+ 1/7, 1/9, 1/10, 3/10, 1/16, 3/16
+ the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5,
+ therefore redesign the encoding of numeric types and values for formatVersion 6;
+ design for simple numbers up to at least 144 ("one gross"),
+ large values up to at least 10^20,
+ and fractions with numerators -1..17 and denominators 1..16
+ to cover current and expected future values
+ (e.g., more Han numeric values, Meroitic twelfths)
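Review note: to make the fraction design target above concrete, here is one hypothetical way to pack numerators -1..17 and denominators 1..16 into a small integer. This is a sketch of the idea only; the log does not spell out the real formatVersion 6 bit layout, and it may differ.

```python
from fractions import Fraction

# Ranges quoted in the change log.
NUM_MIN, NUM_MAX = -1, 17
DEN_MIN, DEN_MAX = 1, 16


def encode(numerator, denominator):
    """Pack a fraction into one integer (hypothetical layout)."""
    if not (NUM_MIN <= numerator <= NUM_MAX and DEN_MIN <= denominator <= DEN_MAX):
        raise ValueError("fraction outside the supported range")
    return (numerator - NUM_MIN) * 16 + (denominator - DEN_MIN)


def decode(code):
    """Inverse of encode(); returns the numeric value as a Fraction."""
    num_index, den_index = divmod(code, 16)
    return Fraction(num_index + NUM_MIN, den_index + DEN_MIN)


if __name__ == "__main__":
    # The new Unicode 5.2 values listed in the log.
    for num, den in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
        code = encode(num, den)
        print(f"{num}/{den} -> {code} -> {decode(code)}")
```

With these ranges the largest code is 303, so this particular packing would need 9 bits for the fraction part.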
+
+* reimplement Hangul_Syllable_Type for new Jamo characters
+- the old code assumed that all Jamo characters are in the 11xx block
+- Unicode 5.2 fills holes there and adds new Jamo characters in
+ A960..A97F; Hangul Jamo Extended-A
+ and in
+ D7B0..D7FF; Hangul Jamo Extended-B
+- Hangul_Syllable_Type can be trivially derived from a subset of
+ Grapheme_Cluster_Break values
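Review note: the derivation mentioned above can be pictured as a lookup on the five Hangul-related Grapheme_Cluster_Break values, with everything else mapping to "not applicable". A minimal sketch using the short property value names (the function name is an assumption; ICU's C implementation is not shown in this log):

```python
# Grapheme_Cluster_Break values that correspond one-to-one to
# Hangul_Syllable_Type values; every other GCB value means NA.
HST_FROM_GCB = {"L": "L", "V": "V", "T": "T", "LV": "LV", "LVT": "LVT"}


def hangul_syllable_type(gcb_value):
    """Derive Hangul_Syllable_Type from a Grapheme_Cluster_Break short name."""
    return HST_FROM_GCB.get(gcb_value, "NA")


if __name__ == "__main__":
    for gcb in ["L", "LVT", "Extend", "CR"]:
        print(gcb, "->", hangul_syllable_type(gcb))
```

Because the lookup depends only on the property value and not on the code point, new Jamo in the Extended-A and Extended-B blocks need no special casing.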
+
+* build Unicode data source code for hardcoding core data
+C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data
+
+ICU data make path is \svn\icuproj\icu\trunk\source\data\
+ICU root path is \svn\icuproj\icu\trunk
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
+Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
+Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
+Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
+Information: cannot find "spreplocal.mk". Not building user-additional stringprep files.
+Creating data file for Unicode Property Names
+Creating data file for Unicode Character Properties
+Creating data file for Unicode Case Mapping Properties
+Creating data file for Unicode BiDi/Shaping Properties
+Creating data file for Unicode Normalization
+Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l"
+Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp"
+
+- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common
+ and rebuild the common library
+
+*** UCA
+
+- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools)
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools
+- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools
+[ Begin obsolete instructions:
+ Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files.
+ - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py
+ on Windows:
+ python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt
+ python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt
+ End obsolete instructions]
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments)
+ not just the *_STUB.txt files
+- note on intltest: if collate/UCAConformanceTest fails, then
+ utility/MultithreadTest/TestCollators will fail as well;
+ fix the conformance test before looking into the multi-thread test
+
+*** Implement Cased & Case_Ignorable properties
+- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable()
+- Problem: These properties should be disjoint, but aren't
+- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not
+- change ucase.icu to be able to store any combination of Cased and Case_Ignorable
+
+*** Implement Changes_When_Xyz properties
+- without stored data
+
+*** Implement Name_Alias property
+- add it as another name field in unames.icu
+- make it available via u_charName() and UCharNameChoice
+- consider it in u_charFromName()
+
+*** Break iterators
+
+* Update break iterator rules to new UAX versions and new property values
+* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary
+
+*** new BidiTest file
+- review format and data
+- copy BidiTest.txt to source/test/testdata
+- write test code using this data
+- fix ICU code where it fails the conformance test
+
+*** Java
+- generally, find and update code corresponding to C/C++
+- UCharacter.UnicodeBlock constants:
+ a) add an _ID integer per new block, update COUNT
+ b) add a class instance per new block
+ Visual Studio regex:
+ find UBLOCK_{[^ ]+} = [0-9]+, {/.+}
+ replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2
+- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias()
+
+- port test changes to Java
+
+*** LayoutEngine script information
+
+(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833)
+
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+-> Eric Mader wrote in email on 20090930:
+ "I think the tool has been modified to update @draft to @stable for
+ older scripts and to add @draft for new scripts.
+ (I worked with an intern on this last year.)
+ You should check the output after you run it."
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+* fix mixed line endings
+* review the diffs and fix incorrect @draft and missing aliases
+* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h
+
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
+
+-> Eric Mader wrote in email on 20090930:
+ "This is just a matter of making sure that all the per-script tables have
+ entries for any new scripts that were added.
+ If any new Indic characters were added, then the class tables in
+ IndicClassTables.cpp should be updated to reflect this.
+ John Emmons should know how to do this if it's required."
+
+* rebuild the layout and layoutex libraries.
+
+*** Documentation
+- Update User Guide
+ + Jamo_Short_Name, sfc->scf, binary property value aliases
+
+---------------------------------------------------------------------------- ***
+
+Unicode 5.1 update
+
+*** related ICU Trac tickets
+
+5696 Update to Unicode 5.1
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in & configure
+- update ucdVersion in gennames.c if an algorithmic range changes
+
+*** data files & enums & parser code
+
+* file preparation
+- ucdstrip:
+ DerivedCoreProperties.txt
+ DerivedNormalizationProps.txt
+ NormalizationTest.txt
+ PropList.txt
+ Scripts.txt
+ GraphemeBreakProperty.txt
+ SentenceBreakProperty.txt
+ WordBreakProperty.txt
+- ucdstrip and ucdmerge:
+ EastAsianWidth.txt
+ LineBreak.txt
+
+* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
+copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\
+copy 5.1.0\ucd\Blocks.txt ..\unidata\
+copy 5.1.0\ucd\CaseFolding.txt ..\unidata\
+copy 5.1.0\ucd\DerivedAge.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
+copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
+copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\
+copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\
+copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\
+copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\
+copy 5.1.0\ucd\UnicodeData.txt ..\unidata\
+
+ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
+ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
+ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
+ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt
+ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
+ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
+ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
+ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
+ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
+ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
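Review note: the log does not describe what ucdstrip and ucdmerge do. The sketch below assumes that ucdstrip drops `#` comments and empty lines and that ucdmerge coalesces adjacent code points or ranges that carry the same value. Both functions are illustrative stand-ins written for this note, not ports of the ICU tools.

```python
def strip_comments(lines):
    """Assumed ucdstrip behaviour: remove '#' comments and empty lines."""
    stripped = []
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            stripped.append(data)
    return stripped


def merge_ranges(lines):
    """Assumed ucdmerge behaviour: merge adjacent entries with equal values."""
    merged = []  # list of [start, end, value]
    for line in lines:
        code, value = [field.strip() for field in line.split(";", 1)]
        start, _, end = code.partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == first:
            merged[-1][1] = last  # extend the previous range
        else:
            merged.append([first, last, value])
    return [
        f"{s:04X};{v}" if s == e else f"{s:04X}..{e:04X};{v}"
        for s, e, v in merged
    ]


if __name__ == "__main__":
    # Illustrative EastAsianWidth-style input.
    sample = [
        "# EastAsianWidth sample",
        "0020;Na # SPACE",
        "0021..0023;Na # EXCLAMATION MARK..NUMBER SIGN",
        "0024;Na # DOLLAR SIGN",
        "00A1;A # INVERTED EXCLAMATION MARK",
    ]
    print(merge_ranges(strip_comments(sample)))
```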
+
+* genpname
+- run preparse.pl
+ + cd \svn\icuproj\icu\uni51\source\tools\genpname
+ + make sure that data.h is writable
+ + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt
+ + preparse.pl complains with errors like the following:
+ Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30.
+ This is because ICU 3.8 had scripts from ISO 15924 which are now
+ added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt
+ and PropertyValueAliases.txt.
+ -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt:
+ Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii
+ + PropertyValueAliases.txt now explicitly contains values for boolean properties:
+ N/Y, No/Yes, F/T, False/True
+ -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases.
+ It will use further values from the file if present.
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new block & script values
+ + 17 new blocks
+ + 11 new script values already added in ICU 3.8 for ISO 15924 coverage
+ (removed from SyntheticPropertyValueAliases.txt)
+ + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1)
+ (added to SyntheticPropertyValueAliases.txt)
+- uprops.icu (uprops.h) only provides 7 bits for script codes.
+ In ICU 4.0 there are USCRIPT_CODE_LIMIT=130 script codes now.
+ No code above 127 is yet the script code of an
+ assigned Unicode character, so ICU 4.0 uprops.icu does not store any
+ script code values greater than 127.
+ However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129
+ in a parallel bit field, and that overflows now.
+ Also, future values >=128 would be incompatible anyway.
+ uprops.h is modified to move around several of the bit fields
+ in the properties vector words, and now uses 8 bits for the script code.
+ Two other bit fields also grow to accommodate future growth:
+ Block (current count: 172) grows from 8 to 9 bits,
+ and Word_Break grows from 4 to 5 bits.
+- renamed property Simple_Case_Folding (sfc->scf)
+ + nothing to be done: handled as normal alias
+- new property JSN Jamo_Short_Name
+ + no new API: only contributes to the Name property
+- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark
+- new Joining Group (JG) value: Burushashki_Yeh_Barree
+- new Sentence_Break (SB) values:
+ SB ; CR ; CR
+ SB ; EX ; Extend
+ SB ; LF ; LF
+ SB ; SC ; SContinue
+- new Word_Break (WB) values:
+ WB ; CR ; CR
+ WB ; Extend ; Extend
+ WB ; LF ; LF
+ WB ; MB ; MidNumLet
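Review note: the uprops.h bit-field discussion earlier in this list comes down to how many bits the largest stored value needs. The arithmetic is sketched below; the helper name is an assumption, and the counts are the ones quoted in the log.

```python
def bits_needed(max_value):
    """Smallest bit-field width that can hold values 0..max_value."""
    return max(1, max_value.bit_length())


if __name__ == "__main__":
    uscript_code_limit = 130  # from the log: USCRIPT_CODE_LIMIT=130
    block_count = 172         # from the log: current block count

    # The maximum script value is USCRIPT_CODE_LIMIT-1 = 129, which no longer
    # fits into the old 7-bit field (largest 7-bit value: 127).
    print("script:", bits_needed(uscript_code_limit - 1), "bits")

    # 172 blocks still fit into 8 bits; per the log the field grows to 9 bits
    # to leave room for future blocks.
    print("block:", bits_needed(block_count - 1), "bits")
```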
+
+* Further changes in the 2008-02-29 update:
+- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP
+ because they should not normally be invisible.
+- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed)
+- new Grapheme_Cluster_Break (GCB) value: PP=Prepend
+- new Word_Break (WB) value: NL=Newline
+
+* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison)
+- Unihan range end moves from 9FBB to 9FC3
+ search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive)
+ + do change gennames.c
+
+* build Unicode data source code for hardcoding core data
+C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data
+
+ICU data make path is \svn\icuproj\icu\uni51\source\data\
+ICU root path is \svn\icuproj\icu\uni51
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
+Information: cannot find "brklocal.mk". Not building user-additional break iterator files.
+Information: cannot find "reslocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "collocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files.
+Information: cannot find "trnslocal.mk". Not building user-additional transliterator files.
+Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files.
+Creating data file for Unicode Character Properties
+Creating data file for Unicode Case Mapping Properties
+Creating data file for Unicode BiDi/Shaping Properties
+Creating data file for Unicode Normalization
+Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l"
+Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp"
+
+- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common
+ and rebuild the common library
+
+*** Break iterators
+
+* Update break iterator rules to new UAX versions and new property values
+
+*** UCA
+
+* update FractionalUCA.txt and UCARules.txt with new canonical closure
+
+*** Test suites
+- Test that APIs using Unicode property value aliases (like UnicodeSet)
+ support all of the boolean values N/Y, No/Yes, F/T, False/True
+ -> TestBinaryValues() tests in both cintltst and intltest
+
+*** LayoutEngine script information
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
+
+* rebuild the layout and layoutex libraries.
+
+*** Documentation
+- Update User Guide
+ + Jamo_Short_Name, sfc->scf, binary property value aliases
+
+---------------------------------------------------------------------------- ***
+
+Unicode 5.0 update
+
+*** related Jitterbugs
+
+5084 RFE: Update to Unicode 5.0
+
+*** data files & enums & parser code
+
+* file preparation
+- ucdstrip:
+ DerivedCoreProperties.txt
+ DerivedNormalizationProps.txt
+ NormalizationTest.txt
+ PropList.txt
+ Scripts.txt
+ GraphemeBreakProperty.txt
+ SentenceBreakProperty.txt
+ WordBreakProperty.txt
+- ucdstrip and ucdmerge:
+ EastAsianWidth.txt
+ LineBreak.txt
+
+* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers)
+copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\
+copy 5.0.0\ucd\Blocks.txt ..\unidata\
+copy 5.0.0\ucd\CaseFolding.txt ..\unidata\
+copy 5.0.0\ucd\DerivedAge.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\
+copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\
+copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\
+copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\
+copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\
+copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\
+copy 5.0.0\ucd\UnicodeData.txt ..\unidata\
+
+ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt
+ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt
+ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt
+ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt
+ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt
+ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt
+ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt
+ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt
+ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt
+ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt
+
+* update FractionalUCA.txt and UCARules.txt with new canonical closure
+
+* genpname
+- run preparse.pl
+ + make sure that data.h is writable
+ + perl preparse.pl \cvs\oss\icu > out.txt
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new block & script values
+ + script values already added in ICU 3.6 because all of ISO 15924 is now covered
+
+* build Unicode data source code for hardcoding core data
+C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data
+
+ICU data make path is \cvs\oss\icu\source\data\
+ICU root path is \cvs\oss\icu
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files.
+[etc.]
+Creating data file for Unicode Character Properties
+Creating data file for Unicode Case Mapping Properties
+Creating data file for Unicode BiDi/Shaping Properties
+Creating data file for Unicode Normalization
+Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l"
+Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp"
+
+- copy the .c source files to C:\cvs\oss\icu\source\common
+ and rebuild the common library
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in
+
+*** LayoutEngine script information
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h,
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates
+ScriptRunData.cpp, which is no longer needed.)
+
+The generated files have a current copyright date and "@draft" statement.
+
+* copy the above files into <icu>/source/layout, replacing the old files.
+
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...)
+
+* rebuild the layout and layoutex libraries.
+
+---------------------------------------------------------------------------- ***
+
+Unicode 4.1 update
+
+*** related Jitterbugs
+
+4332 RFE: Update to Unicode 4.1
+4157 RBBI, TR29 4.1 updates
+
+*** data files & enums & parser code
+
+* file preparation
+- ucdstrip:
+ DerivedCoreProperties.txt
+ DerivedNormalizationProps.txt
+ NormalizationTest.txt
+ GraphemeBreakProperty.txt
+ SentenceBreakProperty.txt
+ WordBreakProperty.txt
+- ucdstrip and ucdmerge:
+ EastAsianWidth.txt
+ LineBreak.txt
+
+* add new files to the repository
+ GraphemeBreakProperty.txt
+ SentenceBreakProperty.txt
+ WordBreakProperty.txt
+
+* update FractionalUCA.txt and UCARules.txt with new canonical closure
+
+* genpname
+- handle new enumerated properties in sub read_uchar
+- run preparse.pl
+
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops
+- new binary properties
+ + Pattern_Syntax
+ + Pattern_White_Space
+- new enumerated properties
+ + Grapheme_Cluster_Break
+ + Sentence_Break
+ + Word_Break
+- new block & script & line break values
+
+* gencase
+- case-ignorable changes
+ see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
+ now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in
+
+*** tests
+- verify that u_charMirror() round-trips
+- test all new properties and some new values of old properties
+
+*** other code
+
+* hardcoded Unihan range end/limit
+- Unihan range end moves from 9FA5 to 9FBB
+ search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive)
+ + do not modify BOCU/BOCSU code because that would change the encoding
+ and break binary compatibility!
+ + similarly, do not change the GB 18030 range data (ucnvmbcs.c),
+ NamePrepProfile.txt
+ + ignore trietest.c: test data is arbitrary
+ + ignore tstnorm.cpp: test optimization, not important
+ + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF
+ + do change line_th.txt and word_th.txt
+ by replacing hardcoded ranges with the new property values
+ + do change gennames.c
+
+source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
+source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6
+source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5,
+
+* case mappings
+- compare new special casing context conditions with previous ones
+ see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods
+
+* genpname
+- consider storing only the short name if it is the same as the long name
+
+*** other reviews
+- UAX #29 changes (grapheme/word/sentence breaks)
+- UAX #14 changes (line breaks)
+- Pattern_Syntax & Pattern_White_Space
+
+---------------------------------------------------------------------------- ***
+
+Unicode 4.0.1 update
+
+*** related Jitterbugs
+
+3170 RFE: Update to Unicode 4.0.1
+3171 Add new Unicode 4.0.1 properties
+3520 use Unicode 4.0.1 updates for break iteration
+
+*** data files & enums & parser code
+
+* file preparation
+- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt
+- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt
+
+* file fixes
+- fix UnicodeData.txt general categories of Ethiopic digits Nd->No
+ according to PRI #26
+ http://www.unicode.org/review/resolved-pri.html#pri26
+- undone again because no corrigendum in sight;
+ instead modified tests to not check consistency on this for Unicode 4.0.1
+
+* ucdterms.txt
+- update from http://www.unicode.org/copyright.html
+ formatted for plain text
+
+* uchar.h & uprops.h & uprops.c & genprops
+- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed
+- add U_LB_INSEPARABLE due to a spelling fix
+ + put short name comment only on line with new constant
+ for genpname perl script parser
+- new binary properties
+ + STerm
+ + Variation_Selector
+
+* genpname
+- fix genpname perl script so that it doesn't choke on more than 2 names per property value
+- perl script: correctly calculate the maximum number of fields per row
+
+* uscript.h
+- new script code Hrkt=Katakana_Or_Hiragana
+
+* gennorm.c track changes in DerivedNormalizationProps.txt
+- "FNC" -> "FC_NFKC"
+- single field "NFD_NO" -> two fields "NFD_QC; N" etc.
+
+* genprops/props2.c track changes in DerivedNumericValues.txt
+- changed from 3 columns to 2, dropping the numeric type
+ + assume that the type is always numeric for Han characters,
+ and that only those are added in addition to what UnicodeData.txt lists
+
+*** Unicode version numbers
+- makedata.mak
+- uchar.h
+- configure.in
+
+*** tests
+- update test of default bidi classes according to PRI #28
+ /tsutil/cucdtst/TestUnicodeData
+ http://www.unicode.org/review/resolved-pri.html#pri28
+- bidi tests: change exemplar character for ES depending on Unicode version
+- change hardcoded expected property values where they change
+
+*** other code
+
+* name matching
+- read UCD.html
+
+* scripts
+- use new Hrkt=Katakana_Or_Hiragana
+
+* ZWJ & ZWNJ
+- are now part of combining character sequences
+- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ
Property changes on: icu46/source/data/unidata/changes.txt
___________________________________________________________________
Added: svn:eol-style
+ LF
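
Note on the Unicode 4.1 "hardcoded Unihan range end/limit" step in the change log above: it says to search for both 9FA5 (end) and 9FA6 (limit) with the case-insensitive regex 9FA[56]. A minimal Python sketch of such a scan follows. The function name, the "file(line): text" output format and the demo file are illustrative assumptions, not part of ICU or of this patch. The hits still need the manual review the log describes (for example, leaving the BOCU/BOCSU code and the GB 18030 range data alone).

```python
import os
import re
import tempfile

# Pattern taken from the change log: 9FA5 (range end) or 9FA6 (range limit), any case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FA[56]", re.IGNORECASE)


def find_hardcoded_unihan_ranges(root):
    """Return (path, line_number, text) for every line under root that matches.

    Hypothetical helper for illustration; it only reports candidates and
    does not decide which occurrences should actually be changed.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="replace") as f:
                    for line_number, line in enumerate(f, start=1):
                        if UNIHAN_END_OR_LIMIT.search(line):
                            hits.append((path, line_number, line.strip()))
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
    return hits


if __name__ == "__main__":
    # Self-contained demo on a throwaway directory (not the real ICU tree).
    with tempfile.TemporaryDirectory() as demo_root:
        sample = os.path.join(demo_root, "sample.c")
        with open(sample, "w", encoding="utf-8") as f:
            f.write("int x;\n    0x4e00, 0x9fa5,\n")
        for path, line_number, text in find_hardcoded_unihan_ranges(demo_root):
            print(f"{os.path.basename(path)}({line_number}): {text}")
```

To scan a real checkout, pass the source root to find_hardcoded_unihan_ranges() instead of the demo directory. Matching on the bare hex digits (rather than on \u9FA5 or 0x9fa5 specifically) follows the log's regex and catches both notations, at the cost of some false positives in unrelated data.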