Index: icu46/source/data/unidata/changes.txt |
=================================================================== |
--- icu46/source/data/unidata/changes.txt (revision 0) |
+++ icu46/source/data/unidata/changes.txt (revision 0) |
@@ -0,0 +1,934 @@ |
+* Copyright (C) 2004-2010, International Business Machines |
+* Corporation and others. All Rights Reserved. |
+* |
+* file name: changes.txt |
+* encoding: US-ASCII |
+* tab size: 8 (not used) |
+* indentation:4 |
+* |
+* created on: 2004may06 |
+* created by: Markus W. Scherer |
+* |
+* change log for Unicode updates |
+ |
+---------------------------------------------------------------------------- *** |
+ |
+Unicode 6.0 update |
+ |
+*** related ICU Trac tickets |
+ |
+7264 Unicode 6.0 Update |
+ |
+*** Unicode version numbers |
+- makedata.mak |
+- uchar.h |
+ (configure.in & configure have been modified to extract the version from uchar.h) |
+- com.ibm.icu.util.VersionInfo |
+ |
+*** data files & enums & parser code |
+ |
+* file preparation |
+ |
+~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed |
+- This now prepares both unidata and testdata files in respective output subfolders. |
+ |
+* PropertyAliases.txt changes |
+- new Script_Extensions property defined in the new ScriptExtensions.txt file |
+ but not listed in PropertyAliases.txt; reported to unicode.org; |
+ -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt |
+ scx; Script_Extensions |
+ -> uchar.h with new UProperty section |
+ -> com.ibm.icu.lang.UProperty, parallel with uchar.h |
+ |
+* PropertyValueAliases.txt changes |
+- 12 new block names: |
+ Alchemical_Symbols |
+ Bamum_Supplement |
+ Batak |
+ Brahmi |
+ CJK_Unified_Ideographs_Extension_D |
+ Emoticons |
+ Ethiopic_Extended_A |
+ Kana_Supplement |
+ Mandaic |
+ Miscellaneous_Symbols_And_Pictographs |
+ Playing_Cards |
+ Transport_And_Map_Symbols |
+ -> add to uchar.h |
+ -> add to UCharacter.UnicodeBlock |
+ Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) |
+ replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
+- Joining_Group (jg) values: |
+ Teh_Marbuta_Goal becomes the new canonical value for the old Hamza_On_Heh_Goal which becomes an alias |
+ -> uchar.h & UCharacter.JoiningGroup |
+- 3 new scripts: |
+ sc ; Batk ; Batak |
+ sc ; Brah ; Brahmi |
+ sc ; Mand ; Mandaic |
+ -> remove these from SyntheticPropertyValueAliases.txt |
+ -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN |
+ -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() |
+ and in com.ibm.icu.dev.test.lang.TestUScript.java |
+- 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html |
+ (added 2009-11-11..2010-07-18) |
+ Bass 259 Bassa Vah |
+ Dupl 755 Duployan shorthand |
+ Elba 226 Elbasan |
+ Gran 343 Grantha |
+ Kpel 436 Kpelle |
+ Loma 437 Loma |
+ Mend 438 Mende |
+ Merc 101 Meroitic Cursive |
+ Narb 106 Old North Arabian |
+ Nbat 159 Nabataean |
+ Palm 126 Palmyrene |
+ Sind 318 Sindhi |
+ Wara 262 Warang Citi |
+ -> uscript.h |
+ -> com.ibm.icu.lang.UScript |
+ find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
+ replace public static final int \1 = \2;\3 |
+ -> SyntheticPropertyValueAliases.txt |
+ -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() |
+ and in com.ibm.icu.dev.test.lang.TestUScript.java |
+- ISO 15924 name change |
+ Mero 100 Meroitic Hieroglyphs (was Meroitic) |
+ -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC |
+- property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt |
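+ |
+A minimal Python sketch of the two Eclipse find/replace conversions noted above |
+(UBLOCK_ lines to UCharacter.UnicodeBlock constants, USCRIPT_ lines to UScript ints). |
+It is not part of the ICU tools; the sample enum lines and their numeric values are |
+illustrative placeholders, not copied from uchar.h or uscript.h. |
+ |

```python
import re

# Same patterns as the Eclipse find/replace notes, written in Python syntax.
UBLOCK_RE = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
USCRIPT_RE = re.compile(r"USCRIPT_([^ ]+) *= ([0-9]+),(.+)")


def ublock_to_java(line: str) -> str:
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock constant declaration."""
    return UBLOCK_RE.sub(
        r'public static final UnicodeBlock \g<1> = new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
        line,
    )


def uscript_to_java(line: str) -> str:
    """Turn a uscript.h USCRIPT_ enum line into a UScript int constant declaration."""
    return USCRIPT_RE.sub(r"public static final int \g<1> = \g<2>;\g<3>", line)


# Illustrative input lines (values are placeholders for this sketch).
print(ublock_to_java("UBLOCK_BATAK = 199, /*[1BC0]*/"))
print(uscript_to_java("USCRIPT_BASSA_VAH = 134,/* Bass */"))
```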
+ |
+* UnicodeData.txt changes |
+- new CJK block: |
+ 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;; |
+ 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;; |
+ -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion |
+ |
+* build Unicode tools using CMake+make |
+ |
+* run genpname/preparse.pl (on Linux) |
+ + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname |
+ + make sure that data.h is writable |
+ + perl preparse.pl ~/svn.icu/trunk/src > out.txt |
+ + preparse.pl shows no errors, out.txt Info and Warning lines look ok |
+ |
+* rebuild Unicode tools (at least genpname) using make |
+- You might first need to "make install" ICU so that the tools build can pick |
+ up the new definitions from the installed header files. |
+ |
+* run genpname |
+- ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in |
+- rebuild ICU & tools |
+ |
+* update source/data/unidata/norm2/nfkc_cf.txt |
+- follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt |
+ |
+* update source/data/unidata/norm2/uts46.txt |
+- download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt |
+ to ~/svn.icu/tools/trunk/src/unicode/py |
+- adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values |
+- ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py |
+- ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2 |
+ |
+* update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
+ sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
+- grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
+- Unicode 6.0: U+2260, U+226E, U+226F |
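+ |
+The grep step above can be sketched in Python as follows. This assumes the usual UCD-style |
+layout (code point or range, ';', status, optional '#' comment); the sample lines are |
+made up for the sketch, so run it on the real IdnaMappingTable.txt before trusting the result. |
+ |

```python
def std3_valid_non_ascii(lines):
    """Return (start, end) code point ranges marked disallowed_STD3_valid above U+007F."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        if hi > 0x7F:
            found.append((max(lo, 0x80), hi))
    return found


# Made-up sample in the assumed file layout.
SAMPLE = [
    "0021..002C    ; disallowed_STD3_valid  # ASCII punctuation",
    "0061..007A    ; valid                  # a..z",
    "2260          ; disallowed_STD3_valid  # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid  # NOT LESS-THAN..NOT GREATER-THAN",
]

for lo, hi in std3_valid_non_ascii(SAMPLE):
    print(f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}..U+{hi:04X}")
```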
+ |
+* generate core properties data files |
+- ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
+- rebuild ICU & tools |
+- run makeuca.sh so that genuca picks up the new nfc.nrm: |
+ ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
+- rebuild ICU & tools |
+ |
+* implement new Script_Extensions property (provisional) |
+- parser & generator: genprops & uprops.icu |
+- uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp |
+- UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java |
+ |
+* switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2 |
+- (one-time change) |
+- genbidi/gencase/genprops tools changes |
+- re-run makeprops.sh (see above) |
+- UCharacterProperty.java, UCharacterTypeIterator.java, |
+ UBiDiProps.java, UCaseProps.java, and several others with minor changes; |
+ UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java |
+ |
+* update Java data files |
+- refresh just the UCD-related files, just to be safe |
+- see (ICU4C)/source/data/icu4j-readme.txt |
+- mkdir /tmp/icu4j |
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
+ output: |
+ ... |
+ Unicode .icu files built to ./out/build/icudt45l |
+ mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b |
+ echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt |
+ LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b |
+ jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b |
+ mkdir -p /tmp/icu4j/main/shared/data |
+ cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
+- copy the big-endian Unicode data files to another location, |
+ separate from the other data files |
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr |
+- refresh ICU4J |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b |
+ |
+* refresh Java test .txt files |
+- copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
+ |
+* un-hardcode normalization skippable (NF*_Inert) test data |
+- removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools |
+ |
+* copy updated break iterator test files |
+- now handled by the earlier ucdcopy.py step and |
+ copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata |
+ (old instructions: |
+ copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt |
+ to ~/svn.icu/trunk/src/source/test/testdata) |
+- they are not used in ICU4J |
+ |
+* UCA |
+ |
+- get output from Mark's tools; look in |
+ http://www.unicode.org/~book/incoming/mark/uca6.0.0/ |
+ http://www.macchiato.com/unicode/utc/additional-uca-files |
+ http://www.unicode.org/Public/UCA/6.0.0/ |
+ http://www.unicode.org/~mdavis/uca/ |
+- update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
+- update Han-implicit ranges for new CJK extensions: |
+ swapCJK() in ucol.cpp & ImplicitCEGenerator.java |
+- genuca: allow bytes 02 for U+FFFE, new merge-sort character; |
+ do not add it into invuca so that tailoring primary-after an ignorable works |
+- genuca: permit space between [variable top] bytes |
+- ucol.cpp: treat noncharacters like unassigned rather than ignorable |
+- run makeuca.sh: |
+ ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
+- rebuild ICU4C |
+- refresh ICU4J collation data: |
+ (subset of instructions above for properties data refresh, except copies all coll/*) |
+ ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
+ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
+ ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b |
+- update (ICU)/source/test/testdata/CollationTest_*.txt |
+ and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
+ with output from Mark's Unicode tools |
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments) |
+- note on intltest: if collate/UCAConformanceTest fails, then |
+ utility/MultithreadTest/TestCollators will fail as well; |
+ fix the conformance test before looking into the multi-thread test |
+ |
+* When refreshing all of ICU4J data from ICU4C |
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
+- cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
+or |
+- ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
+ |
+*** LayoutEngine script information |
+ |
+(For details see the Unicode 5.2 change log below.) |
+ |
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
+ScriptRunData.cpp, which is no longer needed.) |
+ |
+The generated files have a current copyright date and "@draft" statement. |
+ |
+* copy the above files into <icu>/source/layout, replacing the old files. |
+* fix mixed line endings |
+* review the diffs and fix incorrect @draft and missing aliases; |
+ Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc. |
+* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
+ |
+---------------------------------------------------------------------------- *** |
+ |
+Unicode 5.2 update |
+ |
+*** related ICU Trac tickets |
+ |
+7084 Unicode 5.2 |
+ |
+7167 verify collation bytes |
+7235 Java test NAME_ALIAS |
+7236 Java DerivedCoreProperties.txt test |
+7237 Java BidiTest.txt |
+7238 UTrie2 in core unidata |
+7239 test for tailoring gaps |
+7240 Java fix CollationMiscTest |
+7243 update layout engine for Unicode 5.2 |
+ |
+*** Unicode version numbers |
+- makedata.mak |
+- uchar.h |
+- configure.in & configure |
+- update ucdVersion in gennames.c if an algorithmic range changes |
+ |
+*** data files & enums & parser code |
+ |
+* file preparation |
+ |
+python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata |
+- includes finding files regardless of version numbers, |
+ copying them, and performing the equivalent processing of the |
+ ucdstrip and ucdmerge tools on the desired set of files |
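+ |
+A rough Python sketch of the two processing steps named above, assuming that ucdstrip |
+drops '#' comments and that ucdmerge joins adjacent ranges that carry the same value. |
+It is not the code in ucdcopy.py, and the sample data is made up for the sketch. |
+ |

```python
def strip_comments(lines):
    """Approximate ucdstrip: drop '#' comments and lines that become empty."""
    for line in lines:
        data = line.split("#", 1)[0].rstrip()
        if data:
            yield data


def merge_ranges(lines):
    """Approximate ucdmerge: join adjacent code point ranges with equal values."""
    merged = []  # each entry is [start, end, value]
    for line in lines:
        rng, value = (part.strip() for part in line.split(";", 1))
        start, _, end = rng.partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == lo:
            merged[-1][1] = hi  # extend the previous range
        else:
            merged.append([lo, hi, value])
    return [
        f"{lo:04X};{value}" if lo == hi else f"{lo:04X}..{hi:04X};{value}"
        for lo, hi, value in merged
    ]


# Made-up sample in the style of EastAsianWidth.txt.
SAMPLE = [
    "# sample header comment",
    "0020;Na # Zs SPACE",
    "0021..0023;Na # Po",
    "0024;Na # Sc",
    "00A1;A # Po",
]

for out_line in merge_ranges(strip_comments(SAMPLE)):
    print(out_line)
```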
+ |
+* notes on changes |
+- PropertyAliases.txt |
+ moved from numeric to enumerated: |
+ ccc ; Canonical_Combining_Class |
+ new string properties: |
+ NFKC_CF ; NFKC_Casefold |
+ Name_Alias; Name_Alias |
+ new binary properties: |
+ Cased ; Cased |
+ CI ; Case_Ignorable |
+ CWCF ; Changes_When_Casefolded |
+ CWCM ; Changes_When_Casemapped |
+ CWKCF ; Changes_When_NFKC_Casefolded |
+ CWL ; Changes_When_Lowercased |
+ CWT ; Changes_When_Titlecased |
+ CWU ; Changes_When_Uppercased |
+ new CJK Unihan properties (not supported by ICU) |
+- PropertyValueAliases.txt |
+ new block names |
+ new scripts |
+ one script code change: |
+ sc ; Qaai ; Inherited |
+ -> |
+ sc ; Zinh ; Inherited ; Qaai |
+ new Line_Break (lb) value: |
+ lb ; CP ; Close_Parenthesis |
+ new Joining_Group (jg) values: Farsi_Yeh, Nya |
+ other new values: |
+ ccc; 214; ATA ; Attached_Above |
+- DerivedBidiClass.txt |
+ new default-R range: U+1E800 - U+1EFFF |
+- UnicodeData.txt |
+ all of the ISO comments are gone |
+ new CJK block end: |
+ 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last> |
+ new CJK block: |
+ 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;; |
+ 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;; |
+ |
+* genpname |
+- run preparse.pl |
+ + cd \svn\icuproj\icu\trunk\source\tools\genpname |
+ + make sure that data.h is writable |
+ + perl preparse.pl \svn\icuproj\icu\trunk > out.txt |
+ + preparse.pl complains with errors like the following: |
+ Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34. |
+ This is because ICU 4.0 had scripts from ISO 15924 which are now |
+ added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt |
+ and PropertyValueAliases.txt. |
+ -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: |
+ Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt |
+ + preparse.pl complains with errors about block names missing from uchar.h; add them |
+ |
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops |
+- new block & script values |
+ + 26 new blocks |
+ copy new blocks from Blocks.txt |
+ MS VC++ 2008 regular expression: |
+ find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$" |
+ replace with " UBLOCK_\3 = 172, /*[\1]*/" |
+ + several new script values already added in ICU 4.0 for ISO 15924 coverage |
+ (removed from SyntheticPropertyValueAliases.txt, see genpname notes above) |
+ + 3 new script values added for ISO 15924 and Unicode 5.2 coverage |
+ + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2) |
+ (added to SyntheticPropertyValueAliases.txt) |
+- new Joining Group (JG) values: Farsi_Yeh, Nya |
+- new Line_Break (lb) value: |
+ lb ; CP ; Close_Parenthesis |
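+ |
+The MS VC++ find/replace for Blocks.txt lines (see "26 new blocks" above) can be sketched |
+in Python as below. The identifier clean-up (upper-casing, underscores) is an extra step |
+added for this sketch, and 172 is kept as the same placeholder value as in the original |
+replace string, so the enum values still need to be renumbered by hand. |
+ |

```python
import re

# Python form of the pattern  ^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$
BLOCK_LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$")


def block_to_enum(line: str, value: int = 172) -> str:
    """Turn one Blocks.txt range line into a UBLOCK_ enum line for uchar.h."""
    match = BLOCK_LINE.match(line)
    if match is None:
        raise ValueError(f"not a Blocks.txt range line: {line!r}")
    start, _end, name = match.groups()
    # Extra step, not in the original regex: make the block name a C identifier.
    ident = re.sub(r"[^A-Za-z0-9]+", "_", name).upper()
    return f"    UBLOCK_{ident} = {value}, /*[{start}]*/"


print(block_to_enum("A960..A97F; Hangul Jamo Extended-A"))
```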
+ |
+* hardcoded Unihan range end/limit |
+- Unihan range end moves from 9FC3 to 9FCB |
+ search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive) |
+ + do change gennames.c |
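+ |
+A small Python sketch of the search described above: it lists lines that mention the old |
+range end or limit, case-insensitively. The sample source text is made up; point the |
+function at real file contents when doing the update. |
+ |

```python
import re


def find_hardcoded(text: str, pattern: str = r"9FC[34]"):
    """Return (line_number, line) pairs that mention the old range end or limit."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if rx.search(line)
    ]


# Made-up sample; real hits would come from the C/C++ and Java sources.
SAMPLE = """\
#define CJK_END 0x9fc3
if (c < 0x9FC4) { /* limit */ }
#define CJK_START 0x4e00
"""

for number, line in find_hardcoded(SAMPLE):
    print(f"{number}: {line}")
```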
+ |
+* Compare definitions of new binary properties with what we used to use |
+ in algorithms, to see if the definitions changed. |
+- Verified that definitions for Cased and Case_Ignorable are unchanged. |
+ The gencase tool now parses the newly public Case_Ignorable values |
+ in case the definition changes in the future. |
+ |
+* uchar.c & uprops.h & uprops.c & genprops |
+- new numeric values that didn't exist in Unicode data before: |
+ 1/7, 1/9, 1/10, 3/10, 1/16, 3/16 |
+ the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5, |
+ therefore redesign the encoding of numeric types and values for formatVersion 6; |
+ design for simple numbers up to at least 144 ("one gross"), |
+ large values up to at least 10^20, |
+ and fractions with numerators -1..17 and denominators 1..16 |
+ to cover current and expected future values |
+ (e.g., more Han numeric values, Meroitic twelfths) |
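+ |
+To illustrate the fraction range above (numerators -1..17, denominators 1..16), here is |
+one way such values could be packed into a small integer. The bit layout and the bias |
+of 12 are assumptions made for this sketch; this log does not spell out the actual |
+formatVersion 6 encoding, so check uprops.h before relying on any of it. |
+ |

```python
from fractions import Fraction

NUMERATOR_BIAS = 12  # assumed bias for this sketch, so that -1 maps to 11


def encode_fraction(numerator: int, denominator: int) -> int:
    """Pack a small fraction: biased numerator in the high bits, denominator-1 in the low 4."""
    if not -1 <= numerator <= 17 or not 1 <= denominator <= 16:
        raise ValueError("fraction outside the supported range")
    return ((numerator + NUMERATOR_BIAS) << 4) | (denominator - 1)


def decode_fraction(value: int) -> Fraction:
    """Inverse of encode_fraction()."""
    return Fraction((value >> 4) - NUMERATOR_BIAS, (value & 0xF) + 1)


# The new Unicode 5.2 values listed above all fit, including denominators > 9.
for num, den in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
    packed = encode_fraction(num, den)
    print(f"{num}/{den} -> 0x{packed:X} -> {decode_fraction(packed)}")
```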
+ |
+* reimplement Hangul_Syllable_Type for new Jamo characters |
+- the old code assumed that all Jamo characters are in the 11xx block |
+- Unicode 5.2 fills holes there and adds new Jamo characters in |
+ A960..A97F; Hangul Jamo Extended-A |
+ and in |
+ D7B0..D7FF; Hangul Jamo Extended-B |
+- Hangul_Syllable_Type can be trivially derived from a subset of |
+ Grapheme_Cluster_Break values |
+ |
+* build Unicode data source code for hardcoding core data |
+C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data |
+ |
+ICU data make path is \svn\icuproj\icu\trunk\source\data\ |
+ICU root path is \svn\icuproj\icu\trunk |
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
+Information: cannot find "brklocal.mk". Not building user-additional break iterator files. |
+Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. |
+Information: cannot find "collocal.mk". Not building user-additional resource bundle files. |
+Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. |
+Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. |
+Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. |
+Information: cannot find "spreplocal.mk". Not building user-additional stringprep files. |
+Creating data file for Unicode Property Names |
+Creating data file for Unicode Character Properties |
+Creating data file for Unicode Case Mapping Properties |
+Creating data file for Unicode BiDi/Shaping Properties |
+Creating data file for Unicode Normalization |
+Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l" |
+Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp" |
+ |
+- copy the .c source files to C:\svn\icuproj\icu\trunk\source\common |
+ and rebuild the common library |
+ |
+*** UCA |
+ |
+- update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools) |
+- update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools |
+- update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools |
+[ Begin obsolete instructions: |
+ Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files. |
+ - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py |
+ on Windows: |
+ python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt |
+ python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt |
+ End obsolete instructions] |
+- run all tests with the *_SHORT.txt or the full files (the full ones have comments) |
+ not just the *_STUB.txt files |
+- note on intltest: if collate/UCAConformanceTest fails, then |
+ utility/MultithreadTest/TestCollators will fail as well; |
+ fix the conformance test before looking into the multi-thread test |
+ |
+*** Implement Cased & Case_Ignorable properties |
+- via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable() |
+- Problem: These properties should be disjoint, but aren't |
+- UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not |
+- change ucase.icu to be able to store any combination of Cased and Case_Ignorable |
+ |
+*** Implement Changes_When_Xyz properties |
+- without stored data |
+ |
+*** Implement Name_Alias property |
+- add it as another name field in unames.icu |
+- make it available via u_charName() and UCharNameChoice |
+- consider it in u_charFromName() |
+ |
+*** Break iterators |
+ |
+* Update break iterator rules to new UAX versions and new property values |
+* Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary |
+ |
+*** new BidiTest file |
+- review format and data |
+- copy BidiTest.txt to source/test/testdata |
+- write test code using this data |
+- fix ICU code where it fails the conformance test |
+ |
+*** Java |
+- generally, find and update code corresponding to C/C++ |
+- UCharacter.UnicodeBlock constants: |
+ a) add an _ID integer per new block, update COUNT |
+ b) add a class instance per new block |
+ Visual Studio regex: |
+ find UBLOCK_{[^ ]+} = [0-9]+, {/.+} |
+ replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
+- CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias() |
+ |
+- port test changes to Java |
+ |
+*** LayoutEngine script information |
+ |
+(For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833) |
+ |
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
+ScriptRunData.cpp, which is no longer needed.) |
+ |
+The generated files have a current copyright date and "@draft" statement. |
+ |
+-> Eric Mader wrote in email on 20090930: |
+ "I think the tool has been modified to update @draft to @stable for |
+ older scripts and to add @draft for new scripts. |
+ (I worked with an intern on this last year.) |
+ You should check the output after you run it." |
+ |
+* copy the above files into <icu>/source/layout, replacing the old files. |
+* fix mixed line endings |
+* review the diffs and fix incorrect @draft and missing aliases |
+* manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
+ |
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
+ |
+-> Eric Mader wrote in email on 20090930: |
+ "This is just a matter of making sure that all the per-script tables have |
+ entries for any new scripts that were added. |
+ If any new Indic characters were added, then the class tables in |
+ IndicClassTables.cpp should be updated to reflect this. |
+ John Emmons should know how to do this if it's required." |
+ |
+* rebuild the layout and layoutex libraries. |
+ |
+*** Documentation |
+- Update User Guide |
+ + Jamo_Short_Name, sfc->scf, binary property value aliases |
+ |
+---------------------------------------------------------------------------- *** |
+ |
+Unicode 5.1 update |
+ |
+*** related ICU Trac tickets |
+ |
+5696 Update to Unicode 5.1 |
+ |
+*** Unicode version numbers |
+- makedata.mak |
+- uchar.h |
+- configure.in & configure |
+- update ucdVersion in gennames.c if an algorithmic range changes |
+ |
+*** data files & enums & parser code |
+ |
+* file preparation |
+- ucdstrip: |
+ DerivedCoreProperties.txt |
+ DerivedNormalizationProps.txt |
+ NormalizationTest.txt |
+ PropList.txt |
+ Scripts.txt |
+ GraphemeBreakProperty.txt |
+ SentenceBreakProperty.txt |
+ WordBreakProperty.txt |
+- ucdstrip and ucdmerge: |
+ EastAsianWidth.txt |
+ LineBreak.txt |
+ |
+* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
+copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\ |
+copy 5.1.0\ucd\Blocks.txt ..\unidata\ |
+copy 5.1.0\ucd\CaseFolding.txt ..\unidata\ |
+copy 5.1.0\ucd\DerivedAge.txt ..\unidata\ |
+copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
+copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
+copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
+copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
+copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\ |
+copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\ |
+copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\ |
+copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\ |
+copy 5.1.0\ucd\UnicodeData.txt ..\unidata\ |
+ |
+ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
+ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
+ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
+ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt |
+ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
+ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
+ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
+ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
+ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
+ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
+ |
+* genpname |
+- run preparse.pl |
+ + cd \svn\icuproj\icu\uni51\source\tools\genpname |
+ + make sure that data.h is writable |
+ + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt |
+ + preparse.pl complains with errors like the following: |
+ Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30. |
+ This is because ICU 3.8 had scripts from ISO 15924 which are now |
+ added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt |
+ and PropertyValueAliases.txt. |
+ -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: |
+ Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii |
+ + PropertyValueAliases.txt now explicitly contains values for boolean properties: |
+ N/Y, No/Yes, F/T, False/True |
+ -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases. |
+ It will use further values from the file if present. |
+ |
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops |
+- new block & script values |
+ + 17 new blocks |
+ + 11 new script values already added in ICU 3.8 for ISO 15924 coverage |
+ (removed from SyntheticPropertyValueAliases.txt) |
+ + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1) |
+ (added to SyntheticPropertyValueAliases.txt) |
+- uprops.icu (uprops.h) only provides 7 bits for script codes. |
+ ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes. |
+ None of the codes above 127 is yet the script code of an |
+ assigned Unicode character, so ICU 4.0 uprops.icu does not store any |
+ script code values greater than 127. |
+ However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129 |
+ in a parallel bit field, and that overflows now. |
+ Also, future values >=128 would be incompatible anyway. |
+ uprops.h is modified to move around several of the bit fields |
+ in the properties vector words, and now uses 8 bits for the script code. |
+ Two other bit fields also grow to accommodate future growth: |
+ Block (current count: 172) grows from 8 to 9 bits, |
+ and Word_Break grows from 4 to 5 bits. |
+- renamed property Simple_Case_Folding (sfc->scf) |
+ + nothing to be done: handled as normal alias |
+- new property JSN Jamo_Short_Name |
+ + no new API: only contributes to the Name property |
+- new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark |
+- new Joining Group (JG) value: Burushashki_Yeh_Barree |
+- new Sentence_Break (SB) values: |
+ SB ; CR ; CR |
+ SB ; EX ; Extend |
+ SB ; LF ; LF |
+ SB ; SC ; SContinue |
+- new Word_Break (WB) values: |
+ WB ; CR ; CR |
+ WB ; Extend ; Extend |
+ WB ; LF ; LF |
+ WB ; MB ; MidNumLet |
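+ |
+The bit-field arithmetic in the uprops.h note above (7 vs. 8 bits for script codes, |
+8 vs. 9 bits for blocks) can be checked with a few lines of Python. This is only a |
+worked calculation; the helper name is invented for the sketch. |
+ |

```python
def bits_needed(max_value: int) -> int:
    """Smallest bit-field width that can hold values 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130  # from the note above
BLOCK_COUNT = 172         # from the note above

print(bits_needed(127))                     # largest value a 7-bit field can hold
print(bits_needed(USCRIPT_CODE_LIMIT - 1))  # maximum script value 129
print(bits_needed(BLOCK_COUNT - 1))         # still fits in 8 bits; 9 bits leaves room up to 511
```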
+ |
+* Further changes in the 2008-02-29 update: |
+- Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP |
+ because they should not normally be invisible. |
+- new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed) |
+- new Grapheme_Cluster_Break (GCB) value: PP=Prepend |
+- new Word_Break (WB) value: NL=Newline |
+ |
+* hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison) |
+- Unihan range end moves from 9FBB to 9FC3 |
+ search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive) |
+ + do change gennames.c |
+ |
+* build Unicode data source code for hardcoding core data |
+C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data |
+ |
+ICU data make path is \svn\icuproj\icu\uni51\source\data\ |
+ICU root path is \svn\icuproj\icu\uni51 |
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
+Information: cannot find "brklocal.mk". Not building user-additional break iterator files. |
+Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. |
+Information: cannot find "collocal.mk". Not building user-additional resource bundle files. |
+Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. |
+Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. |
+Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. |
+Creating data file for Unicode Character Properties |
+Creating data file for Unicode Case Mapping Properties |
+Creating data file for Unicode BiDi/Shaping Properties |
+Creating data file for Unicode Normalization |
+Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l" |
+Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp" |
+ |
+- copy the .c source files to C:\svn\icuproj\icu\uni51\source\common |
+ and rebuild the common library |
+ |
+*** Break iterators |
+ |
+* Update break iterator rules to new UAX versions and new property values |
+ |
+*** UCA |
+ |
+* update FractionalUCA.txt and UCARules.txt with new canonical closure |
+ |
+*** Test suites |
+- Test that APIs using Unicode property value aliases (like UnicodeSet) |
+ support all of the boolean values N/Y, No/Yes, F/T, False/True |
+ -> TestBinaryValues() tests in both cintltst and intltest |
+ |
+*** LayoutEngine script information |
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
+ScriptRunData.cpp, which is no longer needed.) |
+ |
+The generated files have a current copyright date and "@draft" statement. |
+ |
+* copy the above files into <icu>/source/layout, replacing the old files. |
+ |
+Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
+ |
+* rebuild the layout and layoutex libraries. |
+ |
+*** Documentation |
+- Update User Guide |
+ + Jamo_Short_Name, sfc->scf, binary property value aliases |
+ |
+---------------------------------------------------------------------------- *** |
+ |
+Unicode 5.0 update |
+ |
+*** related Jitterbugs |
+ |
+5084 RFE: Update to Unicode 5.0 |
+ |
+*** data files & enums & parser code |
+ |
+* file preparation |
+- ucdstrip: |
+ DerivedCoreProperties.txt |
+ DerivedNormalizationProps.txt |
+ NormalizationTest.txt |
+ PropList.txt |
+ Scripts.txt |
+ GraphemeBreakProperty.txt |
+ SentenceBreakProperty.txt |
+ WordBreakProperty.txt |
+- ucdstrip and ucdmerge: |
+ EastAsianWidth.txt |
+ LineBreak.txt |
+ |
+* my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
+copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\ |
+copy 5.0.0\ucd\Blocks.txt ..\unidata\ |
+copy 5.0.0\ucd\CaseFolding.txt ..\unidata\ |
+copy 5.0.0\ucd\DerivedAge.txt ..\unidata\ |
+copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
+copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
+copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
+copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
+copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\ |
+copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\ |
+copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\ |
+copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\ |
+copy 5.0.0\ucd\UnicodeData.txt ..\unidata\ |
+ |
+ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
+ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
+ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
+ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt |
+ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
+ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
+ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
+ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
+ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
+ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
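+ |
+A rough Python sketch of the two filters used above, not the actual tools. It assumes |
+that ucdstrip drops trailing "#" comments from data lines and that ucdmerge merges |
+adjacent code points that carry the same value into one range line. The function names |
+are hypothetical. |
+ |
```python
def strip_line(line):
    """Drop a trailing '#' comment from a data line.

    Comment-only and empty lines are kept (minus trailing whitespace).
    """
    stripped = line.rstrip()
    if not stripped or stripped.lstrip().startswith("#"):
        return stripped
    return stripped.split("#", 1)[0].rstrip()


def parse_range(field):
    """Parse 'XXXX' or 'XXXX..YYYY' into an inclusive (start, end) pair."""
    start, _, end = field.strip().partition("..")
    return int(start, 16), int(end or start, 16)


def format_entry(start, end, value):
    code = "%04X" % start if start == end else "%04X..%04X" % (start, end)
    return "%s;%s" % (code, value)


def merge_lines(lines):
    """Merge adjacent data lines that have identical values."""
    merged = []
    pending = None  # (start, end, value) of the range being accumulated
    for line in lines:
        is_data = ";" in line and not line.lstrip().startswith("#")
        if is_data:
            code, value = line.split(";", 1)
            start, end = parse_range(code)
            value = value.strip()
            if pending and pending[2] == value and pending[1] + 1 == start:
                pending = (pending[0], end, value)  # extend the current range
                continue
        if pending:
            merged.append(format_entry(*pending))
            pending = None
        if is_data:
            pending = (start, end, value)
        else:
            merged.append(line)  # comments and blank lines pass through
    if pending:
        merged.append(format_entry(*pending))
    return merged


if __name__ == "__main__":
    raw = ["3400;ID # first", "3401;ID", "3402..3405;ID", "4E00;ID"]
    for out in merge_lines([strip_line(l) for l in raw]):
        print(out)
```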
+ |
+* update FractionalUCA.txt and UCARules.txt with new canonical closure |
+ |
+* genpname |
+- run preparse.pl |
+ + make sure that data.h is writable |
+ + perl preparse.pl \cvs\oss\icu > out.txt |
+ |
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops |
+- new block & script values |
+ + script values already added in ICU 3.6 because all of ISO 15924 is now covered |
+ |
+* build Unicode data source code for hardcoding core data |
+C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data |
+ |
+ICU data make path is \cvs\oss\icu\source\data\ |
+ICU root path is \cvs\oss\icu |
+Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
+[etc.] |
+Creating data file for Unicode Character Properties |
+Creating data file for Unicode Case Mapping Properties |
+Creating data file for Unicode BiDi/Shaping Properties |
+Creating data file for Unicode Normalization |
+Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l" |
+Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp" |
+ |
+- copy the .c source files to C:\cvs\oss\icu\source\common |
+ and rebuild the common library |
+ |
+*** Unicode version numbers |
+- makedata.mak |
+- uchar.h |
+- configure.in |
+ |
+*** LayoutEngine script information |
+* Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h, |
+ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
+ScriptRunData.cpp, which is no longer needed.) |
+ |
+The generated files have a current copyright date and "@draft" statement. |
+ |
+* copy the above files into <icu>/source/layout, replacing the old files. |
+ |
+* Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
+and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
+ |
+* rebuild the layout and layoutex libraries. |
+ |
+---------------------------------------------------------------------------- *** |
+ |
+Unicode 4.1 update |
+ |
+*** related Jitterbugs |
+ |
+4332 RFE: Update to Unicode 4.1 |
+4157 RBBI, TR29 4.1 updates |
+ |
+*** data files & enums & parser code |
+ |
+* file preparation |
+- ucdstrip: |
+ DerivedCoreProperties.txt |
+ DerivedNormalizationProps.txt |
+ NormalizationTest.txt |
+ GraphemeBreakProperty.txt |
+ SentenceBreakProperty.txt |
+ WordBreakProperty.txt |
+- ucdstrip and ucdmerge: |
+ EastAsianWidth.txt |
+ LineBreak.txt |
+ |
+* add new files to the repository |
+ GraphemeBreakProperty.txt |
+ SentenceBreakProperty.txt |
+ WordBreakProperty.txt |
+ |
+* update FractionalUCA.txt and UCARules.txt with new canonical closure |
+ |
+* genpname |
+- handle new enumerated properties in sub read_uchar |
+- run preparse.pl |
+ |
+* uchar.h & uscript.h & uprops.h & uprops.c & genprops |
+- new binary properties |
+ + Pattern_Syntax |
+ + Pattern_White_Space |
+- new enumerated properties |
+ + Grapheme_Cluster_Break |
+ + Sentence_Break |
+ + Word_Break |
+- new block & script & line break values |
+ |
+* gencase |
+- case-ignorable changes |
+ see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
+ now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk |
+ |
+*** Unicode version numbers |
+- makedata.mak |
+- uchar.h |
+- configure.in |
+ |
+*** tests |
+- verify that u_charMirror() round-trips |
+- test all new properties and some new values of old properties |
+ |
+*** other code |
+ |
+* hardcoded Unihan range end/limit |
+- Unihan range end moves from 9FA5 to 9FBB |
+ search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive) |
+ + do not modify BOCU/BOCSU code because that would change the encoding |
+ and break binary compatibility! |
+ + similarly, do not change the GB 18030 range data (ucnvmbcs.c), |
+ NamePrepProfile.txt |
+ + ignore trietest.c: test data is arbitrary |
+ + ignore tstnorm.cpp: test optimization, not important |
+ + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF |
+ + do change line_th.txt and word_th.txt |
+ by replacing hardcoded ranges with the new property values |
+ + do change gennames.c |
+ |
+source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
+source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
+source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5, |
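+ |
+A minimal sketch of the search described above: walk a source tree and list every line |
+that matches 9FA[56], ignoring case. The names scan_lines and scan_tree are hypothetical, |
+and the original search was presumably done with an editor or grep-like tool. |
+ |
```python
import os
import re

# Matches both the old range end (9FA5) and the old limit (9FA6), in any case.
UNIHAN_END_OR_LIMIT = re.compile(r"9FA[56]", re.IGNORECASE)


def scan_lines(path, lines, pattern=UNIHAN_END_OR_LIMIT):
    """Yield (path, line number, text) for every line that matches."""
    for lineno, line in enumerate(lines, 1):
        if pattern.search(line):
            yield path, lineno, line.rstrip("\r\n")


def scan_tree(root, pattern=UNIHAN_END_OR_LIMIT):
    """Walk root and collect matches from every readable file."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                # latin-1 decodes any byte sequence, so binary files do not raise.
                with open(path, encoding="latin-1") as f:
                    hits.extend(scan_lines(path, f, pattern))
            except OSError:
                continue  # unreadable file: skip it
    return hits


if __name__ == "__main__":
    import sys
    for path, lineno, text in scan_tree(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("%s(%d): %s" % (path, lineno, text.strip()))
```
+ |
+Each hit still needs the manual review listed above, since several matches |
+(BOCU/BOCSU, GB 18030, test data) must stay unchanged. |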
+ |
+* case mappings |
+- compare new special casing context conditions with previous ones |
+ see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
+ |
+* genpname |
+- consider storing only the short name if it is the same as the long name |
+ |
+*** other reviews |
+- UAX #29 changes (grapheme/word/sentence breaks) |
+- UAX #14 changes (line breaks) |
+- Pattern_Syntax & Pattern_White_Space |
+ |
+---------------------------------------------------------------------------- *** |
+ |
+Unicode 4.0.1 update |
+ |
+*** related Jitterbugs |
+ |
+3170 RFE: Update to Unicode 4.0.1 |
+3171 Add new Unicode 4.0.1 properties |
+3520 use Unicode 4.0.1 updates for break iteration |
+ |
+*** data files & enums & parser code |
+ |
+* file preparation |
+- ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt |
+- ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt |
+ |
+* file fixes |
+- fix UnicodeData.txt general categories of Ethiopic digits Nd->No |
+ according to PRI #26 |
+ http://www.unicode.org/review/resolved-pri.html#pri26 |
+- undone again because no corrigendum in sight; |
+ instead modified tests to not check consistency on this for Unicode 4.0.1 |
+ |
+* ucdterms.txt |
+- update from http://www.unicode.org/copyright.html |
+ formatted for plain text |
+ |
+* uchar.h & uprops.h & uprops.c & genprops |
+- add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed |
+- add U_LB_INSEPARABLE due to a spelling fix |
+ + put short name comment only on line with new constant |
+ for genpname perl script parser |
+- new binary properties |
+ + STerm |
+ + Variation_Selector |
+ |
+* genpname |
+- fix genpname perl script so that it doesn't choke on more than 2 names per property value |
+- perl script: correctly calculate the maximum number of fields per row |
+ |
+* uscript.h |
+- new script code Hrkt=Katakana_Or_Hiragana |
+ |
+* gennorm.c: track changes in DerivedNormalizationProps.txt |
+- "FNC" -> "FC_NFKC" |
+- single field "NFD_NO" -> two fields "NFD_QC; N" etc. |
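+ |
+A hedged sketch of how a parser might accept both the old and the new field layout. |
+Only "FNC" and "NFD_NO" are named above; the other old-style names covered by the |
+pattern (NFC/NFKC/NFKD with NO or MAYBE) are an assumption, and normalize_fields is a |
+hypothetical helper, not code from gennorm.c. |
+ |
```python
import re

# Old single-field quick-check values such as "NFD_NO".
# Assumption: the "etc." covers NFC/NFD/NFKC/NFKD with NO or MAYBE.
OLD_QUICK_CHECK = re.compile(r"^(NFK?[CD])_(NO|MAYBE)$")


def normalize_fields(fields):
    """Map old-format property fields to the new format.

    `fields` are the semicolon-separated fields after the code point field.
    Fields that are already in the new format are returned unchanged.
    """
    fields = [f.strip() for f in fields]
    if len(fields) == 1:
        match = OLD_QUICK_CHECK.match(fields[0])
        if match:
            # "NFD_NO" -> "NFD_QC", "N"; "NFC_MAYBE" -> "NFC_QC", "M"
            return [match.group(1) + "_QC", match.group(2)[0]]
    if fields and fields[0] == "FNC":
        return ["FC_NFKC"] + fields[1:]
    return fields


if __name__ == "__main__":
    print(normalize_fields(["NFD_NO"]))
    print(normalize_fields(["FNC", "0020 0308"]))
```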
+ |
+* genprops/props2.c: track changes in DerivedNumericValues.txt |
+- changed from 3 columns to 2, dropping the numeric type |
+ + assume that the type is always numeric for Han characters, |
+ and that only those are added in addition to what UnicodeData.txt lists |
+ |
+*** Unicode version numbers |
+- makedata.mak |
+- uchar.h |
+- configure.in |
+ |
+*** tests |
+- update test of default bidi classes according to PRI #28 |
+ /tsutil/cucdtst/TestUnicodeData |
+ http://www.unicode.org/review/resolved-pri.html#pri28 |
+- bidi tests: change exemplar character for ES depending on Unicode version |
+- change hardcoded expected property values where they change |
+ |
+*** other code |
+ |
+* name matching |
+- read UCD.html |
+ |
+* scripts |
+- use new Hrkt=Katakana_Or_Hiragana |
+ |
+* ZWJ & ZWNJ |
+- are now part of combining character sequences |
+- break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ |
Property changes on: icu46/source/data/unidata/changes.txt |
___________________________________________________________________ |
Added: svn:eol-style |
+ LF |