OLD | NEW |
(Empty) | |
| 1 * Copyright (C) 2004-2010, International Business Machines |
| 2 * Corporation and others. All Rights Reserved. |
| 3 * |
| 4 * file name: changes.txt |
| 5 * encoding: US-ASCII |
| 6 * tab size: 8 (not used) |
| 7 * indentation:4 |
| 8 * |
| 9 * created on: 2004may06 |
| 10 * created by: Markus W. Scherer |
| 11 * |
| 12 * change log for Unicode updates |
| 13 |
| 14 ---------------------------------------------------------------------------- *** |
| 15 |
| 16 Unicode 6.0 update |
| 17 |
| 18 *** related ICU Trac tickets |
| 19 |
| 20 7264 Unicode 6.0 Update |
| 21 |
| 22 *** Unicode version numbers |
| 23 - makedata.mak |
| 24 - uchar.h |
| 25 (configure.in & configure: have been modified to extract the version from uchar.h) |
| 26 - com.ibm.icu.util.VersionInfo |
| 27 |
| 28 *** data files & enums & parser code |
| 29 |
| 30 * file preparation |
| 31 |
| 32 ~/svn.icu/tools/trunk/src/unicode/c/genprops/misc$ ./ucdcopy.py ~/uni60/20100720/ucd ~/uni60/processed |
| 33 - This now prepares both unidata and testdata files in respective output subfolders. |
| 34 |
| 35 * PropertyAliases.txt changes |
| 36 - new Script_Extensions property defined in the new ScriptExtensions.txt file |
| 37 but not listed in PropertyAliases.txt; reported to unicode.org; |
| 38 -> added to tools/trunk/src/unicode/c/genpname/SyntheticPropertyAliases.txt |
| 39 scx; Script_Extensions |
| 40 -> uchar.h with new UProperty section |
| 41 -> com.ibm.icu.lang.UProperty, parallel with uchar.h |
| 42 |
| 43 * PropertyValueAliases.txt changes |
| 44 - 12 new block names: |
| 45 Alchemical_Symbols |
| 46 Bamum_Supplement |
| 47 Batak |
| 48 Brahmi |
| 49 CJK_Unified_Ideographs_Extension_D |
| 50 Emoticons |
| 51 Ethiopic_Extended_A |
| 52 Kana_Supplement |
| 53 Mandaic |
| 54 Miscellaneous_Symbols_And_Pictographs |
| 55 Playing_Cards |
| 56 Transport_And_Map_Symbols |
| 57 -> add to uchar.h |
| 58 -> add to UCharacter.UnicodeBlock |
| 59 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) |
| 60 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
| 61 - Joining_Group (jg) values: |
| 62 Teh_Marbuta_Goal becomes the new canonical value; the old Hamza_On_Heh_Goal becomes an alias |
| 63 -> uchar.h & UCharacter.JoiningGroup |
| 64 - 3 new scripts: |
| 65 sc ; Batk ; Batak |
| 66 sc ; Brah ; Brahmi |
| 67 sc ; Mand ; Mandaic |
| 68 -> remove these from SyntheticPropertyValueAliases.txt |
| 69 -> add alias USCRIPT_MANDAIC to USCRIPT_MANDAEAN |
| 70 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() |
| 71 and in com.ibm.icu.dev.test.lang.TestUScript.java |
| 72 - 13 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html |
| 73 (added 2009-11-11..2010-07-18) |
| 74 Bass 259 Bassa Vah |
| 75 Dupl 755 Duployan shorthand |
| 76 Elba 226 Elbasan |
| 77 Gran 343 Grantha |
| 78 Kpel 436 Kpelle |
| 79 Loma 437 Loma |
| 80 Mend 438 Mende |
| 81 Merc 101 Meroitic Cursive |
| 82 Narb 106 Old North Arabian |
| 83 Nbat 159 Nabataean |
| 84 Palm 126 Palmyrene |
| 85 Sind 318 Sindhi |
| 86 Wara 262 Warang Citi |
| 87 -> uscript.h |
| 88 -> com.ibm.icu.lang.UScript |
| 89 find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
| 90 replace public static final int \1 = \2;\3 |
| 91 -> SyntheticPropertyValueAliases.txt |
| 92 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() |
| 93 and in com.ibm.icu.dev.test.lang.TestUScript.java |
| 94 - ISO 15924 name change |
| 95 Mero 100 Meroitic Hieroglyphs (was Meroitic) |
| 96 -> add new alias USCRIPT_MEROITIC_HIEROGLYPHS to USCRIPT_MEROITIC |
| 97 - property value alias added for Cham, was already moved out of SyntheticPropertyValueAliases.txt |
| 98 |
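The two Eclipse find/replace steps above can also be scripted. The following is a minimal Python sketch, not part of the ICU tools: the function names are hypothetical, and the numeric values in the sample lines are placeholders rather than the real enum values.

```python
import re

# Patterns copied from the Eclipse find/replace notes above.
BLOCK_RE = re.compile(r'UBLOCK_([^ ]+) = [0-9]+, (/.+)')
SCRIPT_RE = re.compile(r'USCRIPT_([^ ]+) *= ([0-9]+),(.+)')


def block_to_java(line):
    """Turn a uchar.h UBLOCK_ enum line into a UnicodeBlock constant."""
    return BLOCK_RE.sub(
        r'public static final UnicodeBlock \g<1> = '
        r'new UnicodeBlock("\g<1>", \g<1>_ID); \g<2>',
        line)


def script_to_java(line):
    """Turn a uscript.h USCRIPT_ enum line into a UScript int constant."""
    return SCRIPT_RE.sub(r'public static final int \g<1> = \g<2>;\g<3>', line)


# Sample input lines; the numbers 199 and 134 are placeholders.
print(block_to_java('UBLOCK_BATAK = 199, /*[1BC0]*/'))
print(script_to_java('USCRIPT_BASSA_VAH = 134, /* Bass */'))
```

The generated Java lines still need the same manual review as the Eclipse output (for example the separate _ID constants and COUNT).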
| 99 * UnicodeData.txt changes |
| 100 - new CJK block: |
| 101 2B740;<CJK Ideograph Extension D, First>;Lo;0;L;;;;;N;;;;; |
| 102 2B81D;<CJK Ideograph Extension D, Last>;Lo;0;L;;;;;N;;;;; |
| 103 -> add to tools/trunk/src/unicode/c/gennames/gennames.c, with new ucdVersion |
| 104 |
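The First/Last pair in UnicodeData.txt means that names in this range are derived algorithmically instead of being listed per character. A minimal Python sketch of that derivation for the new range follows; it is an illustration only, not the gennames.c code, and the function name is hypothetical.

```python
# Range taken from the UnicodeData.txt First/Last entries above.
EXT_D_FIRST, EXT_D_LAST = 0x2B740, 0x2B81D


def cjk_ext_d_name(cp):
    """Return the algorithmic name for a code point in Extension D, else None."""
    if EXT_D_FIRST <= cp <= EXT_D_LAST:
        return 'CJK UNIFIED IDEOGRAPH-%04X' % cp
    return None


print(cjk_ext_d_name(0x2B740))
print(cjk_ext_d_name(0x2B81E))  # just past the end of the range
```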
| 105 * build Unicode tools using CMake+make |
| 106 |
| 107 * run genpname/preparse.pl (on Linux) |
| 108 + cd ~/svn.icu/tools/trunk/src/unicode/c/genpname |
| 109 + make sure that data.h is writable |
| 110 + perl preparse.pl ~/svn.icu/trunk/src > out.txt |
| 111 + preparse.pl shows no errors, out.txt Info and Warning lines look ok |
| 112 |
| 113 * rebuild Unicode tools (at least genpname) using make |
| 114 - You might first need to "make install" ICU so that the tools build can pick |
| 115 up the new definitions from the installed header files. |
| 116 |
| 117 * run genpname |
| 118 - ~/svn.icu/tools/trunk/bld/unicode$ c/genpname/genpname -v -d ~/svn.icu/trunk/src/source/data/in |
| 119 - rebuild ICU & tools |
| 120 |
| 121 * update source/data/unidata/norm2/nfkc_cf.txt |
| 122 - follow the instructions in nfkc_cf.txt for updating it from DerivedNormalizationProps.txt |
| 123 |
| 124 * update source/data/unidata/norm2/uts46.txt |
| 125 - download http://www.unicode.org/Public/idna/6.0.0/IdnaMappingTable.txt |
| 126 to ~/svn.icu/tools/trunk/src/unicode/py |
| 127 - adjust idna2nrm.py to handle new disallowed_STD3_valid and disallowed_STD3_mapped values |
| 128 - ~/svn.icu/tools/trunk/src/unicode/py$ ./idna2nrm.py |
| 129 - ~/svn.icu/tools/trunk/src/unicode/py$ cp uts46.txt ~/svn.icu/trunk/src/source/data/unidata/norm2 |
| 130 |
| 131 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
| 132 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
| 133 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
| 134 - Unicode 6.0: U+2260, U+226E, U+226F |
| 135 |
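The grep step above can be narrowed to non-ASCII code points with a small filter. This is a hedged Python sketch: it assumes data lines of the form "code point(s) ; status ... # comment", the function name is hypothetical, and the sample lines are simplified illustrations rather than verbatim lines from IdnaMappingTable.txt.

```python
def non_ascii_std3_valid(lines):
    """Yield (start, end) ranges with status disallowed_STD3_valid above U+007F."""
    for line in lines:
        data = line.split('#', 1)[0].strip()  # drop the trailing comment
        if not data:
            continue
        fields = [f.strip() for f in data.split(';')]
        if len(fields) < 2 or fields[1] != 'disallowed_STD3_valid':
            continue
        first, _, last = fields[0].partition('..')
        start = int(first, 16)
        end = int(last, 16) if last else start
        if end > 0x7F:
            # Clip a range that starts in ASCII to its non-ASCII part.
            yield (max(start, 0x80), end)


# Simplified sample lines in the assumed layout.
sample = [
    '0000..002C ; disallowed_STD3_valid # controls through COMMA',
    '2260 ; disallowed_STD3_valid # NOT EQUAL TO',
    '00AD ; ignored # SOFT HYPHEN',
]
for start, end in non_ascii_std3_valid(sample):
    print('U+%04X..U+%04X' % (start, end))
```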
| 136 * generate core properties data files |
| 137 - ~/svn.icu/tools/trunk/src/unicode$ ./makeprops.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
| 138 - rebuild ICU & tools |
| 139 - run makeuca.sh so that genuca picks up the new nfc.nrm: |
| 140 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
| 141 - rebuild ICU & tools |
| 142 |
| 143 * implement new Script_Extensions property (provisional) |
| 144 - parser & generator: genprops & uprops.icu |
| 145 - uscript.h, uprops.h, uchar.c, uniset_props.cpp and others, plus cintltst/cucdapi.c & intltest/usettest.cpp |
| 146 - UScript.java, UCharacterProperty.java, UnicodeSet.java, TestUScript.java, UnicodeSetTest.java |
| 147 |
| 148 * switch ubidi.icu, ucase.icu and uprops.icu from UTrie to UTrie2 |
| 149 - (one-time change) |
| 150 - genbidi/gencase/genprops tools changes |
| 151 - re-run makeprops.sh (see above) |
| 152 - UCharacterProperty.java, UCharacterTypeIterator.java, |
| 153 UBiDiProps.java, UCaseProps.java, and several others with minor changes; |
| 154 UCharacterPropertyReader.java deleted and its code folded into UCharacterProperty.java |
| 155 |
| 156 * update Java data files |
| 157 - refresh just the UCD-related files, just to be safe |
| 158 - see (ICU4C)/source/data/icu4j-readme.txt |
| 159 - mkdir /tmp/icu4j |
| 160 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
| 161 output: |
| 162 ... |
| 163 Unicode .icu files built to ./out/build/icudt45l |
| 164 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt45b |
| 165 echo ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt |
| 166 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt45l.dat ./out/icu4j/icudt45b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt45l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt45b |
| 167 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt45b |
| 168 mkdir -p /tmp/icu4j/main/shared/data |
| 169 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
| 170 - copy the big-endian Unicode data files to another location, |
| 171 separate from the other data files |
| 172 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
| 173 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr |
| 174 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b |
| 175 ~/svn.icu/trunk/bld/data/out/icu4j$ rm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/cnvalias.icu |
| 176 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/icudt45b |
| 177 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
| 178 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/brkitr |
| 179 - refresh ICU4J |
| 180 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b |
| 181 |
| 182 * refresh Java test .txt files |
| 183 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
| 184 |
| 185 * un-hardcode normalization skippable (NF*_Inert) test data |
| 186 - removes one manual step from the Unicode upgrade, and removes dependency on one of Mark's tools |
| 187 |
| 188 * copy updated break iterator test files |
| 189 - now handled by early ucdcopy.py and |
| 190 copying the uni60/processed/testdata files to ~/svn.icu/trunk/src/source/test/testdata |
| 191 (old instructions: |
| 192 copy from (Unicode 6.0)/ucd/auxiliary/*BreakTest-6....txt |
| 193 to ~/svn.icu/trunk/src/source/test/testdata) |
| 194 - they are not used in ICU4J |
| 195 |
| 196 * UCA |
| 197 |
| 198 - get output from Mark's tools; look in |
| 199 http://www.unicode.org/~book/incoming/mark/uca6.0.0/ |
| 200 http://www.macchiato.com/unicode/utc/additional-uca-files |
| 201 http://www.unicode.org/Public/UCA/6.0.0/ |
| 202 http://www.unicode.org/~mdavis/uca/ |
| 203 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
| 204 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
| 205 - update Han-implicit ranges for new CJK extensions: |
| 206 swapCJK() in ucol.cpp & ImplicitCEGenerator.java |
| 207 - genuca: allow bytes 02 for U+FFFE, new merge-sort character; |
| 208 do not add it into invuca so that tailoring primary-after an ignorable works |
| 209 - genuca: permit space between [variable top] bytes |
| 210 - ucol.cpp: treat noncharacters like unassigned rather than ignorable |
| 211 - run makeuca.sh: |
| 212 ~/svn.icu/tools/trunk/src/unicode$ ./makeuca.sh ~/svn.icu/trunk/src ~/svn.icu/trunk/bld |
| 213 - rebuild ICU4C |
| 214 - refresh ICU4J collation data: |
| 215 (subset of instructions above for properties data refresh, except copies all coll/*) |
| 216 ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
| 217 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
| 218 ~/svn.icu/trunk/bld/data/out/icu4j$ cp com/ibm/icu/impl/data/icudt45b/coll/* /tmp/icu4j/com/ibm/icu/impl/data/icudt45b/coll |
| 219 ~/svn.icu/trunk/bld/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/icudt45b |
| 220 - update (ICU)/source/test/testdata/CollationTest_*.txt |
| 221 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
| 222 with output from Mark's Unicode tools |
| 223 - run all tests with the *_SHORT.txt or the full files (the full ones have comments) |
| 224 - note on intltest: if collate/UCAConformanceTest fails, then |
| 225 utility/MultithreadTest/TestCollators will fail as well; |
| 226 fix the conformance test before looking into the multi-thread test |
| 227 |
| 228 * When refreshing all of ICU4J data from ICU4C |
| 229 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
| 230 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
| 231 or |
| 232 - ~/svn.icu/trunk/bld$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
| 233 |
| 234 *** LayoutEngine script information |
| 235 |
| 236 (For details see the Unicode 5.2 change log below.) |
| 237 |
| 238 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |
| 239 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
| 240 ScriptRunData.cpp, which is no longer needed.) |
| 241 |
| 242 The generated files have a current copyright date and "@draft" statement. |
| 243 |
| 244 * copy the above files into <icu>/source/layout, replacing the old files. |
| 245 * fix mixed line endings |
| 246 * review the diffs and fix incorrect @draft and missing aliases; |
| 247 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc. |
| 248 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
| 249 |
| 250 ---------------------------------------------------------------------------- *** |
| 251 |
| 252 Unicode 5.2 update |
| 253 |
| 254 *** related ICU Trac tickets |
| 255 |
| 256 7084 Unicode 5.2 |
| 257 |
| 258 7167 verify collation bytes |
| 259 7235 Java test NAME_ALIAS |
| 260 7236 Java DerivedCoreProperties.txt test |
| 261 7237 Java BidiTest.txt |
| 262 7238 UTrie2 in core unidata |
| 263 7239 test for tailoring gaps |
| 264 7240 Java fix CollationMiscTest |
| 265 7243 update layout engine for Unicode 5.2 |
| 266 |
| 267 *** Unicode version numbers |
| 268 - makedata.mak |
| 269 - uchar.h |
| 270 - configure.in & configure |
| 271 - update ucdVersion in gennames.c if an algorithmic range changes |
| 272 |
| 273 *** data files & enums & parser code |
| 274 |
| 275 * file preparation |
| 276 |
| 277 python source\tools\genprops\misc\ucdcopy.py "C:\Documents and Settings\mscherer\My Documents\unicode\ucd\5.2.0" C:\svn\icuproj\icu\trunk\source\data\unidata |
| 278 - includes finding files regardless of version numbers, |
| 279 copying them, and performing the equivalent processing of the |
| 280 ucdstrip and ucdmerge tools on the desired set of files |
| 281 |
| 282 * notes on changes |
| 283 - PropertyAliases.txt |
| 284 moved from numeric to enumerated: |
| 285 ccc ; Canonical_Combining_Class |
| 286 new string properties: |
| 287 NFKC_CF ; NFKC_Casefold |
| 288 Name_Alias; Name_Alias |
| 289 new binary properties: |
| 290 Cased ; Cased |
| 291 CI ; Case_Ignorable |
| 292 CWCF ; Changes_When_Casefolded |
| 293 CWCM ; Changes_When_Casemapped |
| 294 CWKCF ; Changes_When_NFKC_Casefolded |
| 295 CWL ; Changes_When_Lowercased |
| 296 CWT ; Changes_When_Titlecased |
| 297 CWU ; Changes_When_Uppercased |
| 298 new CJK Unihan properties (not supported by ICU) |
| 299 - PropertyValueAliases.txt |
| 300 new block names |
| 301 new scripts |
| 302 one script code change: |
| 303 sc ; Qaai ; Inherited |
| 304 -> |
| 305 sc ; Zinh ; Inherited ; Qaai |
| 306 new Line_Break (lb) value: |
| 307 lb ; CP ; Close_Parenthesis |
| 308 new Joining_Group (jg) values: Farsi_Yeh, Nya |
| 309 other new values: |
| 310 ccc; 214; ATA ; Attached_Above |
| 311 - DerivedBidiClass.txt |
| 312 new default-R range: U+1E800 - U+1EFFF |
| 313 - UnicodeData.txt |
| 314 all of the ISO comments are gone |
| 315 new CJK block end: |
| 316 9FC3;<CJK Ideograph, Last> -> 9FCB;<CJK Ideograph, Last> |
| 317 new CJK block: |
| 318 2A700;<CJK Ideograph Extension C, First>;Lo;0;L;;;;;N;;;;; |
| 319 2B734;<CJK Ideograph Extension C, Last>;Lo;0;L;;;;;N;;;;; |
| 320 |
| 321 * genpname |
| 322 - run preparse.pl |
| 323 + cd \svn\icuproj\icu\trunk\source\tools\genpname |
| 324 + make sure that data.h is writable |
| 325 + perl preparse.pl \svn\icuproj\icu\trunk > out.txt |
| 326 + preparse.pl complains with errors like the following: |
| 327 Error: sc:Egyp already set to Egyptian_Hieroglyphs, cannot set to Egyp at preparse.pl line 1322, <GEN6> line 34. |
| 328 This is because ICU 4.0 had scripts from ISO 15924 which are now |
| 329 added to Unicode 5.2, and the Perl script shows a conflict between SyntheticPropertyValueAliases.txt |
| 330 and PropertyValueAliases.txt. |
| 331 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: |
| 332 Egyp, Java, Lana, Mtei, Orkh, Armi, Avst, Kthi, Phli, Prti, Samr, Tavt |
| 333 + preparse.pl complains with errors about block names missing from uchar.h; add them |
| 334 |
| 335 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
| 336 - new block & script values |
| 337 + 26 new blocks |
| 338 copy new blocks from Blocks.txt |
| 339 MS VC++ 2008 regular expression: |
| 340 find "^{[0-9A-F]+}\.\.{[0-9A-F]+}; {[A-Z].+}$" |
| 341 replace with " UBLOCK_\3 = 172, /*[\1]*/" |
| 342 + several new script values already added in ICU 4.0 for ISO 15924 coverage |
| 343 (removed from SyntheticPropertyValueAliases.txt, see genpname notes above) |
| 344 + 3 new script values added for ISO 15924 and Unicode 5.2 coverage |
| 345 + 1 new script value added for ISO 15924 coverage (not in Unicode 5.2) |
| 346 (added to SyntheticPropertyValueAliases.txt) |
| 347 - new Joining Group (JG) values: Farsi_Yeh, Nya |
| 348 - new Line_Break (lb) value: |
| 349 lb ; CP ; Close_Parenthesis |
| 350 |
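The MS VC++ find/replace for new blocks above produces enum lines that still need the block name upper-cased and the values renumbered by hand. A Python sketch that does the name normalization in the same pass follows; the function name is hypothetical, and the enum value passed in is only a placeholder supplied by the caller.

```python
import re

# Same shape as the Blocks.txt lines matched by the regex in the notes above.
BLOCK_LINE = re.compile(r'^([0-9A-F]+)\.\.([0-9A-F]+); ([A-Z].+)$')


def block_enum_line(line, value):
    """Build a uchar.h-style enum line from one Blocks.txt line, or None."""
    m = BLOCK_LINE.match(line)
    if m is None:
        return None
    # Upper-case the name and turn runs of spaces/hyphens into underscores.
    name = re.sub(r'[^A-Z0-9]+', '_', m.group(3).upper()).strip('_')
    return '    UBLOCK_%s = %d, /*[%s]*/' % (name, value, m.group(1))


print(block_enum_line('A960..A97F; Hangul Jamo Extended-A', 172))
```

As with the editor regex, the values must still be checked against the existing enum before committing.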
| 351 * hardcoded Unihan range end/limit |
| 352 - Unihan range end moves from 9FC3 to 9FCB |
| 353 search for both 9FC3 (end) and 9FC4 (limit) (regex 9FC[34], case-insensitive) |
| 354 + do change gennames.c |
| 355 |
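The search for the old range end and limit can be sketched in Python as follows. This only illustrates the case-insensitive regex from the note above; the function name and the sample source text are made up.

```python
import re

# Regex from the note above: old Unihan end 9FC3 and limit 9FC4.
OLD_UNIHAN_END = re.compile(r'9FC[34]', re.IGNORECASE)


def find_hardcoded_ends(text):
    """Return (line number, line) pairs that mention the old end or limit."""
    return [(n, line) for n, line in enumerate(text.splitlines(), 1)
            if OLD_UNIHAN_END.search(line)]


# Made-up source snippet for illustration.
sample = 'if (c <= 0x9fc3) {\n    limit = 0x9FC4;\n}\nstart = 0x4E00;'
for n, line in find_hardcoded_ends(sample):
    print(n, line.strip())
```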
| 356 * Compare definitions of new binary properties with what we used to use |
| 357 in algorithms, to see if the definitions changed. |
| 358 - Verified that definitions for Cased and Case_Ignorable are unchanged. |
| 359 The gencase tool now parses the newly public Case_Ignorable values |
| 360 in case the definition changes in the future. |
| 361 |
| 362 * uchar.c & uprops.h & uprops.c & genprops |
| 363 - new numeric values that didn't exist in Unicode data before: |
| 364 1/7, 1/9, 1/10, 3/10, 1/16, 3/16 |
| 365 the ones with denominators >9 cannot be supported by uprops.icu formatVersion 5, |
| 366 therefore redesign the encoding of numeric types and values for formatVersion 6; |
| 367 design for simple numbers up to at least 144 ("one gross"), |
| 368 large values up to at least 10^20, |
| 369 and fractions with numerators -1..17 and denominators 1..16 |
| 370 to cover current and expected future values |
| 371 (e.g., more Han numeric values, Meroitic twelfths) |
| 372 |
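To illustrate what covering numerators -1..17 and denominators 1..16 involves, here is a small Python sketch of one possible packing. The bit layout is hypothetical and chosen only for this example; it is not claimed to be the actual uprops.icu formatVersion 6 encoding.

```python
from fractions import Fraction

NUM_MIN, NUM_MAX = -1, 17   # numerator range from the notes above
DEN_MIN, DEN_MAX = 1, 16    # denominator range from the notes above


def pack_fraction(num, den):
    """Pack numerator/denominator into one small integer (hypothetical layout)."""
    if not (NUM_MIN <= num <= NUM_MAX and DEN_MIN <= den <= DEN_MAX):
        raise ValueError('fraction out of range: %d/%d' % (num, den))
    # High bits: numerator offset to 0..18; low 4 bits: denominator offset to 0..15.
    return ((num - NUM_MIN) << 4) | (den - DEN_MIN)


def unpack_fraction(code):
    """Inverse of pack_fraction; returns a (reduced) Fraction."""
    return Fraction((code >> 4) + NUM_MIN, (code & 0xF) + DEN_MIN)


# The new Unicode 5.2 values listed above all fit.
for num, den in [(1, 7), (1, 9), (1, 10), (3, 10), (1, 16), (3, 16)]:
    assert unpack_fraction(pack_fraction(num, den)) == Fraction(num, den)
```

In this layout a denominator above 16 or a numerator above 17 raises an error, which mirrors the stated design limits.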
| 373 * reimplement Hangul_Syllable_Type for new Jamo characters |
| 374 - the old code assumed that all Jamo characters are in the 11xx block |
| 375 - Unicode 5.2 fills holes there and adds new Jamo characters in |
| 376 A960..A97F; Hangul Jamo Extended-A |
| 377 and in |
| 378 D7B0..D7FF; Hangul Jamo Extended-B |
| 379 - Hangul_Syllable_Type can be trivially derived from a subset of |
| 380 Grapheme_Cluster_Break values |
| 381 |
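The derivation mentioned in the last item can be sketched as a simple mapping over Grapheme_Cluster_Break short value names. This is a Python illustration only, not the ICU C code, and the function name is hypothetical.

```python
# Grapheme_Cluster_Break values that correspond one-to-one to
# Hangul_Syllable_Type values of the same name.
HANGUL_GCB_VALUES = ('L', 'V', 'T', 'LV', 'LVT')


def hangul_syllable_type(gcb):
    """Derive the Hangul_Syllable_Type short name from a GCB short name."""
    return gcb if gcb in HANGUL_GCB_VALUES else 'NA'  # NA = Not_Applicable


print(hangul_syllable_type('LV'))
print(hangul_syllable_type('EX'))
```

Because the lookup no longer depends on code point ranges, new Jamo in the Extended-A and Extended-B blocks are covered without further changes.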
| 382 * build Unicode data source code for hardcoding core data |
| 383 C:\svn\icuproj\icu\trunk\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\trunk\source\data\ CFG=x86\release uni-core-data |
| 384 |
| 385 ICU data make path is \svn\icuproj\icu\trunk\source\data\ |
| 386 ICU root path is \svn\icuproj\icu\trunk |
| 387 Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
| 388 Information: cannot find "brklocal.mk". Not building user-additional break iterator files. |
| 389 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. |
| 390 Information: cannot find "collocal.mk". Not building user-additional resource bundle files. |
| 391 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. |
| 392 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. |
| 393 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. |
| 394 Information: cannot find "spreplocal.mk". Not building user-additional stringprep files. |
| 395 Creating data file for Unicode Property Names |
| 396 Creating data file for Unicode Character Properties |
| 397 Creating data file for Unicode Case Mapping Properties |
| 398 Creating data file for Unicode BiDi/Shaping Properties |
| 399 Creating data file for Unicode Normalization |
| 400 Unicode .icu files built to "\svn\icuproj\icu\trunk\source\data\out\build\icudt43l" |
| 401 Unicode .c source files built to "\svn\icuproj\icu\trunk\source\data\out\tmp" |
| 402 |
| 403 - copy the .c source files to C:\svn\icuproj\icu\trunk\source\common |
| 404 and rebuild the common library |
| 405 |
| 406 *** UCA |
| 407 |
| 408 - update FractionalUCA.txt with new canonical closure (output from Mark's Unicode tools) |
| 409 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt from Mark's Unicode tools |
| 410 - update source/test/testdata/CollationTest_*.txt with output from Mark's Unicode tools |
| 411 [ Begin obsolete instructions: |
| 412 Starting with UCA 5.2, we use the CollationTest_*_SHORT.txt files not the *_STUB.txt files. |
| 413 - generate the source/test/testdata/CollationTest_*_STUB.txt files via source/tools/genuca/genteststub.py |
| 414 on Windows: |
| 415 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_NON_IGNORABLE_SHORT.txt CollationTest_NON_IGNORABLE_STUB.txt |
| 416 python C:\svn\icuproj\icu\trunk\source\tools\genuca\genteststub.py CollationTest_SHIFTED_SHORT.txt CollationTest_SHIFTED_STUB.txt |
| 417 End obsolete instructions] |
| 418 - run all tests with the *_SHORT.txt or the full files (the full ones have comments) |
| 419 not just the *_STUB.txt files |
| 420 - note on intltest: if collate/UCAConformanceTest fails, then |
| 421 utility/MultithreadTest/TestCollators will fail as well; |
| 422 fix the conformance test before looking into the multi-thread test |
| 423 |
| 424 *** Implement Cased & Case_Ignorable properties |
| 425 - via UProperty; call ucase.h functions ucase_getType() and ucase_getTypeOrIgnorable() |
| 426 - Problem: These properties should be disjoint, but aren't |
| 427 - UTC 2009nov decision: skip all Case_Ignorable regardless of whether they are Cased or not |
| 428 - change ucase.icu to be able to store any combination of Cased and Case_Ignorable |
| 429 |
| 430 *** Implement Changes_When_Xyz properties |
| 431 - without stored data |
| 432 |
| 433 *** Implement Name_Alias property |
| 434 - add it as another name field in unames.icu |
| 435 - make it available via u_charName() and UCharNameChoice and |
| 436 - consider it in u_charFromName() |
| 437 |
| 438 *** Break iterators |
| 439 |
| 440 * Update break iterator rules to new UAX versions and new property values |
| 441 * Update source/test/testdata/<boundary>Test.txt files from <unicode.org ucd>/ucd/auxiliary |
| 442 |
| 443 *** new BidiTest file |
| 444 - review format and data |
| 445 - copy BidiTest.txt to source/test/testdata |
| 446 - write test code using this data |
| 447 - fix ICU code where it fails the conformance test |
| 448 |
| 449 *** Java |
| 450 - generally, find and update code corresponding to C/C++ |
| 451 - UCharacter.UnicodeBlock constants: |
| 452 a) add an _ID integer per new block, update COUNT |
| 453 b) add a class instance per new block |
| 454 Visual Studio regex: |
| 455 find UBLOCK_{[^ ]+} = [0-9]+, {/.+} |
| 456 replace with public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
| 457 - CHAR_NAME_ALIAS -> UCharacter.getNameAlias() and getCharFromNameAlias() |
| 458 |
| 459 - port test changes to Java |
| 460 |
| 461 *** LayoutEngine script information |
| 462 |
| 463 (For comparison, see the Unicode 5.1 update: http://bugs.icu-project.org/trac/changeset/23833) |
| 464 |
| 465 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguages.h, |
| 466 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (It also generates |
| 467 ScriptRunData.cpp, which is no longer needed.) |
| 468 |
| 469 The generated files have a current copyright date and "@draft" statement. |
| 470 |
| 471 -> Eric Mader wrote in email on 20090930: |
| 472 "I think the tool has been modified to update @draft to @stable for |
| 473 older scripts and to add @draft for new scripts. |
| 474 (I worked with an intern on this last year.) |
| 475 You should check the output after you run it." |
| 476 |
| 477 * copy the above files into <icu>/source/layout, replacing the old files. |
| 478 * fix mixed line endings |
| 479 * review the diffs and fix incorrect @draft and missing aliases |
| 480 * manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
| 481 |
| 482 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
| 483 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
| 484 |
| 485 -> Eric Mader wrote in email on 20090930: |
| 486 "This is just a matter of making sure that all the per-script tables have |
| 487 entries for any new scripts that were added. |
| 488 If any new Indic characters were added, then the class tables in |
| 489 IndicClassTables.cpp should be updated to reflect this. |
| 490 John Emmons should know how to do this if it's required." |
| 491 |
| 492 * rebuild the layout and layoutex libraries. |
| 493 |
| 494 *** Documentation |
| 495 - Update User Guide |
| 496 + Jamo_Short_Name, sfc->scf, binary property value aliases |
| 497 |
| 498 ---------------------------------------------------------------------------- *** |
| 499 |
| 500 Unicode 5.1 update |
| 501 |
| 502 *** related ICU Trac tickets |
| 503 |
| 504 5696 Update to Unicode 5.1 |
| 505 |
| 506 *** Unicode version numbers |
| 507 - makedata.mak |
| 508 - uchar.h |
| 509 - configure.in & configure |
| 510 - update ucdVersion in gennames.c if an algorithmic range changes |
| 511 |
| 512 *** data files & enums & parser code |
| 513 |
| 514 * file preparation |
| 515 - ucdstrip: |
| 516 DerivedCoreProperties.txt |
| 517 DerivedNormalizationProps.txt |
| 518 NormalizationTest.txt |
| 519 PropList.txt |
| 520 Scripts.txt |
| 521 GraphemeBreakProperty.txt |
| 522 SentenceBreakProperty.txt |
| 523 WordBreakProperty.txt |
| 524 - ucdstrip and ucdmerge: |
| 525 EastAsianWidth.txt |
| 526 LineBreak.txt |
| 527 |
| 528 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
| 529 copy 5.1.0\ucd\BidiMirroring.txt ..\unidata\ |
| 530 copy 5.1.0\ucd\Blocks.txt ..\unidata\ |
| 531 copy 5.1.0\ucd\CaseFolding.txt ..\unidata\ |
| 532 copy 5.1.0\ucd\DerivedAge.txt ..\unidata\ |
| 533 copy 5.1.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
| 534 copy 5.1.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
| 535 copy 5.1.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
| 536 copy 5.1.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
| 537 copy 5.1.0\ucd\NormalizationCorrections.txt ..\unidata\ |
| 538 copy 5.1.0\ucd\PropertyAliases.txt ..\unidata\ |
| 539 copy 5.1.0\ucd\PropertyValueAliases.txt ..\unidata\ |
| 540 copy 5.1.0\ucd\SpecialCasing.txt ..\unidata\ |
| 541 copy 5.1.0\ucd\UnicodeData.txt ..\unidata\ |
| 542 |
| 543 ucdstrip < 5.1.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
| 544 ucdstrip < 5.1.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
| 545 ucdstrip < 5.1.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
| 546 ucdstrip < 5.1.0\ucd\PropList.txt > ..\unidata\PropList.txt |
| 547 ucdstrip < 5.1.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
| 548 ucdstrip < 5.1.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
| 549 ucdstrip < 5.1.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
| 550 ucdstrip < 5.1.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
| 551 ucdstrip < 5.1.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
| 552 ucdstrip < 5.1.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
| 553 |
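For readers without the ucdstrip and ucdmerge binaries, the processing in the batch file above can be approximated in Python. This sketch rests on two assumptions that are not confirmed by these notes: that ucdstrip drops trailing '#' comments from data lines, and that ucdmerge merges adjacent code point ranges with identical values. The real tools may differ in detail, and the function names are hypothetical.

```python
def strip_comments(lines):
    """Drop trailing '#' comments on data lines; keep other lines unchanged."""
    out = []
    for line in lines:
        data, sep, _ = line.partition('#')
        if sep and data.strip():
            out.append(data.rstrip())   # data line: keep only the data part
        else:
            out.append(line.rstrip())   # comment-only or comment-free line
    return out


def merge_ranges(lines):
    """Merge adjacent code point ranges that carry the same value."""
    merged = []  # list of [start, end, value]
    for line in lines:
        if ';' not in line or line.lstrip().startswith('#'):
            continue  # skip comment-only and empty lines
        rng, value = [f.strip() for f in line.split(';', 1)]
        first, _, last = rng.partition('..')
        start = int(first, 16)
        end = int(last, 16) if last else start
        if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
            merged[-1][1] = end
        else:
            merged.append([start, end, value])
    return ['%04X;%s' % (s, v) if s == e else '%04X..%04X;%s' % (s, e, v)
            for s, e, v in merged]


# Made-up sample in the general "range;value # comment" shape.
sample = ['0020;Na # SPACE', '0021..0023;Na # more', '0024;Na', '00A1;A', '# note']
print(merge_ranges(strip_comments(sample)))
```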
| 554 * genpname |
| 555 - run preparse.pl |
| 556 + cd \svn\icuproj\icu\uni51\source\tools\genpname |
| 557 + make sure that data.h is writable |
| 558 + perl preparse.pl \svn\icuproj\icu\uni51 > out.txt |
| 559 + preparse.pl complains with errors like the following: |
| 560 Error: sc:Cari already set to Carian, cannot set to Cari at preparse.pl line 1308, <GEN6> line 30. |
| 561 This is because ICU 3.8 had scripts from ISO 15924 which are now |
| 562 added to Unicode 5.1, and the script shows a conflict between SyntheticPropertyValueAliases.txt |
| 563 and PropertyValueAliases.txt. |
| 564 -> Removed duplicate script entries from SyntheticPropertyValueAliases.txt: |
| 565 Cari, Cham, Kali, Lepc, Lyci, Lydi, Olck, Rjng, Saur, Sund, Vaii |
| 566 + PropertyValueAliases.txt now explicitly contains values for boolean properties: |
| 567 N/Y, No/Yes, F/T, False/True |
| 568 -> Added N/No and Y/Yes to preparse.pl function read_PropertyValueAliases. |
| 569 It will use further values from the file if present. |
| 570 |
| 571 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
| 572 - new block & script values |
| 573 + 17 new blocks |
| 574 + 11 new script values already added in ICU 3.8 for ISO 15924 coverage |
| 575 (removed from SyntheticPropertyValueAliases.txt) |
| 576 + 14 new script values added for ISO 15924 coverage (not in Unicode 5.1) |
| 577 (added to SyntheticPropertyValueAliases.txt) |
| 578 - uprops.icu (uprops.h) provides only 7 bits for script codes. |
| 579 ICU 4.0 now has USCRIPT_CODE_LIMIT=130 script codes. |
| 580 No script code above 127 is assigned to any Unicode character yet, |
| 581 so ICU 4.0 uprops.icu does not store any |
| 582 script code values greater than 127. |
| 583 However, it does need to store the maximum script value=USCRIPT_CODE_LIMIT-1=129 |
| 584 in a parallel bit field, and that overflows now. |
| 585 Also, future values >=128 would be incompatible anyway. |
| 586 uprops.h is modified to move around several of the bit fields |
| 587 in the properties vector words, and now uses 8 bits for the script code. |
| 588 Two other bit fields also grow to accommodate future growth: |
| 589 Block (current count: 172) grows from 8 to 9 bits, |
| 590 and Word_Break grows from 4 to 5 bits. |
| 591 - renamed property Simple_Case_Folding (sfc->scf) |
| 592 + nothing to be done: handled as normal alias |
| 593 - new property JSN Jamo_Short_Name |
| 594 + no new API: only contributes to the Name property |
| 595 - new Grapheme_Cluster_Break (GCB) value: SM=SpacingMark |
| 596 - new Joining Group (JG) value: Burushashki_Yeh_Barree |
| 597 - new Sentence_Break (SB) values: |
| 598 SB ; CR ; CR |
| 599 SB ; EX ; Extend |
| 600 SB ; LF ; LF |
| 601 SB ; SC ; SContinue |
| 602 - new Word_Break (WB) values: |
| 603 WB ; CR ; CR |
| 604 WB ; Extend ; Extend |
| 605 WB ; LF ; LF |
| 606 WB ; MB ; MidNumLet |
| 607 |
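The bit-field overflow described above comes down to simple arithmetic, shown here as a short Python check. The helper name is hypothetical; the numbers are the ones quoted in the notes.

```python
def bits_needed(max_value):
    """Number of bits needed to store all values in 0..max_value."""
    return max(1, max_value.bit_length())


USCRIPT_CODE_LIMIT = 130   # from the notes above
BLOCK_COUNT = 172          # current block count from the notes above

print(bits_needed(127))                     # 7: the largest value a 7-bit field holds
print(bits_needed(USCRIPT_CODE_LIMIT - 1))  # 8: the maximum script value 129 overflows 7 bits
print(bits_needed(BLOCK_COUNT - 1))         # 8: blocks still fit, 9 bits leave headroom
```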
| 608 * Further changes in the 2008-02-29 update: |
| 609 - Default_Ignorable_Code_Point: The new file removes Cc, Cs, noncharacters from DICP |
| 610 because they should not normally be invisible. |
| 611 - new Joining Group (JG) value Burushashki_Yeh_Barree was renamed to Burushaski_Yeh_Barree (one 'h' removed) |
| 612 - new Grapheme_Cluster_Break (GCB) value: PP=Prepend |
| 613 - new Word_Break (WB) value: NL=Newline |
| 614 |
| 615 * hardcoded Unihan range end/limit (see Unicode 4.1 update for comparison) |
| 616 - Unihan range end moves from 9FBB to 9FC3 |
| 617 search for both 9FBB (end) and 9FBC (limit) (regex 9FB[BC], case-insensitive) |
| 618 + do change gennames.c |
| 619 |
| 620 * build Unicode data source code for hardcoding core data |
| 621 C:\svn\icuproj\icu\uni51\source\data>NMAKE /f makedata.mak ICUMAKE=\svn\icuproj\icu\uni51\source\data\ CFG=debug uni-core-data |
| 622 |
| 623 ICU data make path is \svn\icuproj\icu\uni51\source\data\ |
| 624 ICU root path is \svn\icuproj\icu\uni51 |
| 625 Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
| 626 Information: cannot find "brklocal.mk". Not building user-additional break iterator files. |
| 627 Information: cannot find "reslocal.mk". Not building user-additional resource bundle files. |
| 628 Information: cannot find "collocal.mk". Not building user-additional resource bundle files. |
| 629 Information: cannot find "rbnflocal.mk". Not building user-additional resource bundle files. |
| 630 Information: cannot find "trnslocal.mk". Not building user-additional transliterator files. |
| 631 Information: cannot find "misclocal.mk". Not building user-additional miscellaenous files. |
| 632 Creating data file for Unicode Character Properties |
| 633 Creating data file for Unicode Case Mapping Properties |
| 634 Creating data file for Unicode BiDi/Shaping Properties |
| 635 Creating data file for Unicode Normalization |
| 636 Unicode .icu files built to "\svn\icuproj\icu\uni51\source\data\out\build\icudt39l" |
| 637 Unicode .c source files built to "\svn\icuproj\icu\uni51\source\data\out\tmp" |
| 638 |
| 639 - copy the .c source files to C:\svn\icuproj\icu\uni51\source\common |
| 640 and rebuild the common library |
| 641 |
| 642 *** Break iterators |
| 643 |
| 644 * Update break iterator rules to new UAX versions and new property values |
| 645 |
| 646 *** UCA |
| 647 |
| 648 * update FractionalUCA.txt and UCARules.txt with new canonical closure |
| 649 |
| 650 *** Test suites |
| 651 - Test that APIs using Unicode property value aliases (like UnicodeSet) |
| 652 support all of the boolean values N/Y, No/Yes, F/T, False/True |
| 653 -> TestBinaryValues() tests in both cintltst and intltest |
| 654 |
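The check named above amounts to accepting eight spellings for a binary property value. A minimal sketch of such a lookup follows; it is not the ICU implementation, the function name is hypothetical, and matching case-insensitively is an assumption of this sketch.

```python
# The four spellings for each boolean value listed in the test description.
_TRUE_ALIASES = {"y", "yes", "t", "true"}
_FALSE_ALIASES = {"n", "no", "f", "false"}


def parse_binary_value(alias):
    """Map a binary property value alias to a bool (hypothetical helper)."""
    key = alias.strip().lower()
    if key in _TRUE_ALIASES:
        return True
    if key in _FALSE_ALIASES:
        return False
    raise ValueError("not a binary property value alias: %r" % alias)
```

A test in the spirit of TestBinaryValues() would loop over all eight aliases and compare the resulting sets.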
| 655 *** LayoutEngine script information |
| 656 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h, |
| 657 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates |
| 658 ScriptRunData.cpp, which is no longer needed.) |
| 659 |
| 660 The generated files have a current copyright date and "@draft" statement. |
| 661 |
| 662 * copy the above files into <icu>/source/layout, replacing the old files. |
| 663 |
| 664 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
| 665 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
| 666 |
| 667 * rebuild the layout and layoutex libraries. |
| 668 |
| 669 *** Documentation |
| 670 - Update User Guide |
| 671 + Jamo_Short_Name, sfc->scf, binary property value aliases |
| 672 |
| 673 ---------------------------------------------------------------------------- *** |
| 674 |
| 675 Unicode 5.0 update |
| 676 |
| 677 *** related Jitterbugs |
| 678 |
| 679 5084 RFE: Update to Unicode 5.0 |
| 680 |
| 681 *** data files & enums & parser code |
| 682 |
| 683 * file preparation |
| 684 - ucdstrip: |
| 685 DerivedCoreProperties.txt |
| 686 DerivedNormalizationProps.txt |
| 687 NormalizationTest.txt |
| 688 PropList.txt |
| 689 Scripts.txt |
| 690 GraphemeBreakProperty.txt |
| 691 SentenceBreakProperty.txt |
| 692 WordBreakProperty.txt |
| 693 - ucdstrip and ucdmerge: |
| 694 EastAsianWidth.txt |
| 695 LineBreak.txt |
| 696 |
| 697 * my ucd2unidata.bat (needs to be updated each time with UCD and file version numbers) |
| 698 copy 5.0.0\ucd\BidiMirroring.txt ..\unidata\ |
| 699 copy 5.0.0\ucd\Blocks.txt ..\unidata\ |
| 700 copy 5.0.0\ucd\CaseFolding.txt ..\unidata\ |
| 701 copy 5.0.0\ucd\DerivedAge.txt ..\unidata\ |
| 702 copy 5.0.0\ucd\extracted\DerivedBidiClass.txt ..\unidata\ |
| 703 copy 5.0.0\ucd\extracted\DerivedJoiningGroup.txt ..\unidata\ |
| 704 copy 5.0.0\ucd\extracted\DerivedJoiningType.txt ..\unidata\ |
| 705 copy 5.0.0\ucd\extracted\DerivedNumericValues.txt ..\unidata\ |
| 706 copy 5.0.0\ucd\NormalizationCorrections.txt ..\unidata\ |
| 707 copy 5.0.0\ucd\PropertyAliases.txt ..\unidata\ |
| 708 copy 5.0.0\ucd\PropertyValueAliases.txt ..\unidata\ |
| 709 copy 5.0.0\ucd\SpecialCasing.txt ..\unidata\ |
| 710 copy 5.0.0\ucd\UnicodeData.txt ..\unidata\ |
| 711 |
| 712 ucdstrip < 5.0.0\ucd\DerivedCoreProperties.txt > ..\unidata\DerivedCoreProperties.txt |
| 713 ucdstrip < 5.0.0\ucd\DerivedNormalizationProps.txt > ..\unidata\DerivedNormalizationProps.txt |
| 714 ucdstrip < 5.0.0\ucd\NormalizationTest.txt > ..\unidata\NormalizationTest.txt |
| 715 ucdstrip < 5.0.0\ucd\PropList.txt > ..\unidata\PropList.txt |
| 716 ucdstrip < 5.0.0\ucd\Scripts.txt > ..\unidata\Scripts.txt |
| 717 ucdstrip < 5.0.0\ucd\auxiliary\GraphemeBreakProperty.txt > ..\unidata\GraphemeBreakProperty.txt |
| 718 ucdstrip < 5.0.0\ucd\auxiliary\SentenceBreakProperty.txt > ..\unidata\SentenceBreakProperty.txt |
| 719 ucdstrip < 5.0.0\ucd\auxiliary\WordBreakProperty.txt > ..\unidata\WordBreakProperty.txt |
| 720 ucdstrip < 5.0.0\ucd\EastAsianWidth.txt | ucdmerge > ..\unidata\EastAsianWidth.txt |
| 721 ucdstrip < 5.0.0\ucd\LineBreak.txt | ucdmerge > ..\unidata\LineBreak.txt |
| 722 |
| 723 * update FractionalUCA.txt and UCARules.txt with new canonical closure |
| 724 |
| 725 * genpname |
| 726 - run preparse.pl |
| 727 + make sure that data.h is writable |
| 728 + perl preparse.pl \cvs\oss\icu > out.txt |
| 729 |
| 730 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
| 731 - new block & script values |
| 732 + script values already added in ICU 3.6 because all of ISO 15924 is now covered |
| 733 |
| 734 * build Unicode data source code for hardcoding core data |
| 735 C:\cvs\oss\icu\source\data>NMAKE /f makedata.mak ICUMAKE=\cvs\oss\icu\source\data\ CFG=debug uni-core-data |
| 736 |
| 737 ICU data make path is \cvs\oss\icu\source\data\ |
| 738 ICU root path is \cvs\oss\icu |
| 739 Information: cannot find "ucmlocal.mk". Not building user-additional converter files. |
| 740 [etc.] |
| 741 Creating data file for Unicode Character Properties |
| 742 Creating data file for Unicode Case Mapping Properties |
| 743 Creating data file for Unicode BiDi/Shaping Properties |
| 744 Creating data file for Unicode Normalization |
| 745 Unicode .icu files built to "\cvs\oss\icu\source\data\out\build\icudt35l" |
| 746 Unicode .c source files built to "\cvs\oss\icu\source\data\out\tmp" |
| 747 |
| 748 - copy the .c source files to C:\cvs\oss\icu\source\common |
| 749 and rebuild the common library |
| 750 |
| 751 *** Unicode version numbers |
| 752 - makedata.mak |
| 753 - uchar.h |
| 754 - configure.in |
| 755 |
| 756 *** LayoutEngine script information |
| 757 * Run ICU4J com.ibm.icu.dev.tool.layout.ScriptNameBuilder. This generates LEScripts.h, LELanguage.h, |
| 758 ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp in the working directory. (it also generates |
| 759 ScriptRunData.cpp, which is no longer needed.) |
| 760 |
| 761 The generated files have a current copyright date and "@draft" statement. |
| 762 |
| 763 * copy the above files into <icu>/source/layout, replacing the old files. |
| 764 |
| 765 Add new default entries to the indicClassTables array in <icu>/source/layout/IndicClassTables.cpp |
| 766 and the complexTable array in <icu>/source/layoutex/ParagraphLayout.cpp. (This step should be automated...) |
| 767 |
| 768 * rebuild the layout and layoutex libraries. |
| 769 |
| 770 ---------------------------------------------------------------------------- *** |
| 771 |
| 772 Unicode 4.1 update |
| 773 |
| 774 *** related Jitterbugs |
| 775 |
| 776 4332 RFE: Update to Unicode 4.1 |
| 777 4157 RBBI, TR29 4.1 updates |
| 778 |
| 779 *** data files & enums & parser code |
| 780 |
| 781 * file preparation |
| 782 - ucdstrip: |
| 783 DerivedCoreProperties.txt |
| 784 DerivedNormalizationProps.txt |
| 785 NormalizationTest.txt |
| 786 GraphemeBreakProperty.txt |
| 787 SentenceBreakProperty.txt |
| 788 WordBreakProperty.txt |
| 789 - ucdstrip and ucdmerge: |
| 790 EastAsianWidth.txt |
| 791 LineBreak.txt |
| 792 |
| 793 * add new files to the repository |
| 794 GraphemeBreakProperty.txt |
| 795 SentenceBreakProperty.txt |
| 796 WordBreakProperty.txt |
| 797 |
| 798 * update FractionalUCA.txt and UCARules.txt with new canonical closure |
| 799 |
| 800 * genpname |
| 801 - handle new enumerated properties in sub read_uchar |
| 802 - run preparse.pl |
| 803 |
| 804 * uchar.h & uscript.h & uprops.h & uprops.c & genprops |
| 805 - new binary properties |
| 806 + Pattern_Syntax |
| 807 + Pattern_White_Space |
| 808 - new enumerated properties |
| 809 + Grapheme_Cluster_Break |
| 810 + Sentence_Break |
| 811 + Word_Break |
| 812 - new block & script & line break values |
| 813 |
| 814 * gencase |
| 815 - case-ignorable changes |
| 816 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
| 817 now: (D47a) Word_Break=MidLetter or Mn, Me, Cf, Lm, Sk |
| 818 |
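The case-ignorable definition quoted above (Word_Break=MidLetter, or general category Mn, Me, Cf, Lm, Sk) can be sketched as a predicate. This is an illustration only, not the gencase code. Python's unicodedata module reflects the Unicode version bundled with the interpreter rather than 4.1, and it does not expose Word_Break, so SAMPLE_MIDLETTER below is a small hand-picked subset assumed for the example. The real set comes from WordBreakProperty.txt.

```python
import unicodedata

# General categories named in the (D47a) definition quoted above.
CASE_IGNORABLE_CATEGORIES = {"Mn", "Me", "Cf", "Lm", "Sk"}

# Illustrative subset only: U+00B7 MIDDLE DOT and U+2027 HYPHENATION POINT.
# The full Word_Break=MidLetter set must be read from WordBreakProperty.txt.
SAMPLE_MIDLETTER = {0x00B7, 0x2027}


def is_case_ignorable(ch):
    """Approximate the case-ignorable test for a single character."""
    return (
        ord(ch) in SAMPLE_MIDLETTER
        or unicodedata.category(ch) in CASE_IGNORABLE_CATEGORIES
    )
```

For example, U+0301 COMBINING ACUTE ACCENT (Mn) is case-ignorable under this predicate, while a plain letter such as "a" is not.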
| 819 *** Unicode version numbers |
| 820 - makedata.mak |
| 821 - uchar.h |
| 822 - configure.in |
| 823 |
| 824 *** tests |
| 825 - verify that u_charMirror() round-trips |
| 826 - test all new properties and some new values of old properties |
| 827 |
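The round-trip check on u_charMirror() listed above can be sketched as follows. The code does not call ICU; char_mirror() is a hypothetical stand-in driven by a tiny sample table, and only the shape of the check is meant to carry over.

```python
# Tiny sample mirror table for illustration; the real data comes from
# BidiMirroring.txt via the ICU property data.
SAMPLE_MIRROR = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "<": ">", ">": "<",
}


def char_mirror(ch):
    """Stand-in for u_charMirror(): characters without a mapping map to themselves."""
    return SAMPLE_MIRROR.get(ch, ch)


def non_round_tripping(chars):
    """Return the characters for which mirroring twice does not give the original."""
    return [ch for ch in chars if char_mirror(char_mirror(ch)) != ch]
```

An empty result means every tested character round-trips. A one-sided table entry, such as a mapping from "{" to "}" without the reverse, would show up in the returned list.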
| 828 *** other code |
| 829 |
| 830 * hardcoded Unihan range end/limit |
| 831 - Unihan range end moves from 9FA5 to 9FBB |
| 832 search for both 9FA5 (end) and 9FA6 (limit) (regex 9FA[56], case-insensitive) |
| 833 + do not modify BOCU/BOCSU code because that would change the encoding |
| 834 and break binary compatibility! |
| 835 + similarly, do not change the GB 18030 range data (ucnvmbcs.c), |
| 836 NamePrepProfile.txt |
| 837 + ignore trietest.c: test data is arbitrary |
| 838 + ignore tstnorm.cpp: test optimization, not important |
| 839 + ignore collation: 9FA[56] only appears in comments; swapCJK() uses the whole block up to 9FFF |
| 840 + do change line_th.txt and word_th.txt |
| 841 by replacing hardcoded ranges with the new property values |
| 842 + do change gennames.c |
| 843 |
| 844 source\data\brkitr\line_th.txt(229): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
| 845 source\data\brkitr\word_th.txt(23): \u33E0-\u33FE \u3400-\u4DB5 \u4E00-\u9FA5 \uA000-\uA48C \uA490-\uA4C6 |
| 846 source\tools\gennames\gennames.c(971): 0x4e00, 0x9fa5, |
| 847 |
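The reason for searching for two values is that a hardcoded range can be written with an inclusive end (9FBB) or with an exclusive limit (9FBC, which is end + 1). A minimal sketch of the two equivalent forms follows. The start value 0x4E00 is taken from the gennames.c line above, and the function names are hypothetical.

```python
UNIHAN_START = 0x4E00
UNIHAN_END = 0x9FBB             # last code point in the range (inclusive), Unicode 4.1
UNIHAN_LIMIT = UNIHAN_END + 1   # first code point after the range (exclusive)


def in_range_by_end(c):
    """Range test written with an inclusive end."""
    return UNIHAN_START <= c <= UNIHAN_END


def in_range_by_limit(c):
    """The same range test written with an exclusive limit."""
    return UNIHAN_START <= c < UNIHAN_LIMIT
```

Code that uses the first form mentions 9FBB and code that uses the second mentions 9FBC, so an update has to look for both.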
| 848 * case mappings |
| 849 - compare new special casing context conditions with previous ones |
| 850 see http://www.unicode.org/versions/Unicode4.1.0/#CaseMods |
| 851 |
| 852 * genpname |
| 853 - consider storing only the short name if it is the same as the long name |
| 854 |
| 855 *** other reviews |
| 856 - UAX #29 changes (grapheme/word/sentence breaks) |
| 857 - UAX #14 changes (line breaks) |
| 858 - Pattern_Syntax & Pattern_White_Space |
| 859 |
| 860 ---------------------------------------------------------------------------- *** |
| 861 |
| 862 Unicode 4.0.1 update |
| 863 |
| 864 *** related Jitterbugs |
| 865 |
| 866 3170 RFE: Update to Unicode 4.0.1 |
| 867 3171 Add new Unicode 4.0.1 properties |
| 868 3520 use Unicode 4.0.1 updates for break iteration |
| 869 |
| 870 *** data files & enums & parser code |
| 871 |
| 872 * file preparation |
| 873 - ucdstrip: DerivedNormalizationProps.txt, NormalizationTest.txt, DerivedCoreProperties.txt |
| 874 - ucdstrip and ucdmerge: EastAsianWidth.txt, LineBreak.txt |
| 875 |
| 876 * file fixes |
| 877 - fix UnicodeData.txt general categories of Ethiopic digits Nd->No |
| 878 according to PRI #26 |
| 879 http://www.unicode.org/review/resolved-pri.html#pri26 |
| 880 - undone again because no corrigendum in sight; |
| 881 instead modified tests to not check consistency on this for Unicode 4.0.1 |
| 882 |
| 883 * ucdterms.txt |
| 884 - update from http://www.unicode.org/copyright.html |
| 885 formatted for plain text |
| 886 |
| 887 * uchar.h & uprops.h & uprops.c & genprops |
| 888 - add UBLOCK_CYRILLIC_SUPPLEMENT because the block is renamed |
| 889 - add U_LB_INSEPARABLE due to a spelling fix |
| 890 + put short name comment only on line with new constant |
| 891 for genpname perl script parser |
| 892 - new binary properties |
| 893 + STerm |
| 894 + Variation_Selector |
| 895 |
| 896 * genpname |
| 897 - fix genpname perl script so that it doesn't choke on more than 2 names per property value |
| 898 - perl script: correctly calculate the maximum number of fields per row |
| 899 |
| 900 * uscript.h |
| 901 - new script code Hrkt=Katakana_Or_Hiragana |
| 902 |
| 903 * gennorm.c track changes in DerivedNormalizationProps.txt |
| 904 - "FNC" -> "FC_NFKC" |
| 905 - single field "NFD_NO" -> two fields "NFD_QC; N" etc. |
| 906 |
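The field change noted above (single field "NFD_NO" versus two fields "NFD_QC; N") can be sketched as a parser that accepts both shapes and returns the same tuple. This is not the gennorm.c code. The function name is hypothetical, and treating a "_MAYBE" suffix like "_NO" is an assumption of this sketch.

```python
# Old-style suffixes and the quick-check values they correspond to.
_OLD_SUFFIXES = {"NO": "N", "MAYBE": "M"}


def parse_quick_check(line):
    """Return (code_points, property, value), or None for other lines."""
    data = line.split("#", 1)[0].strip()   # drop trailing comments
    if not data:
        return None
    fields = [field.strip() for field in data.split(";")]
    if len(fields) == 3:
        # New format: property name and value are already separate fields.
        return tuple(fields)
    if len(fields) == 2 and "_" in fields[1]:
        # Old format: split "NFD_NO" into the form and its suffix.
        form, suffix = fields[1].rsplit("_", 1)
        if suffix in _OLD_SUFFIXES:
            return (fields[0], form + "_QC", _OLD_SUFFIXES[suffix])
    return None
```

With this, a line such as "00C0..00C5 ; NFD_NO" and a line such as "00C0..00C5 ; NFD_QC; N" both yield ("00C0..00C5", "NFD_QC", "N").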
| 907 * genprops/props2.c track changes in DerivedNumericValues.txt |
| 908 - changed from 3 columns to 2, dropping the numeric type |
| 909 + assume that the type is always numeric for Han characters, |
| 910 and that only those are added in addition to what UnicodeData.txt lists |
| 911 |
| 912 *** Unicode version numbers |
| 913 - makedata.mak |
| 914 - uchar.h |
| 915 - configure.in |
| 916 |
| 917 *** tests |
| 918 - update test of default bidi classes according to PRI #28 |
| 919 /tsutil/cucdtst/TestUnicodeData |
| 920 http://www.unicode.org/review/resolved-pri.html#pri28 |
| 921 - bidi tests: change exemplar character for ES depending on Unicode version |
| 922 - change hardcoded expected property values where they change |
| 923 |
| 924 *** other code |
| 925 |
| 926 * name matching |
| 927 - read UCD.html |
| 928 |
| 929 * scripts |
| 930 - use new Hrkt=Katakana_Or_Hiragana |
| 931 |
| 932 * ZWJ & ZWNJ |
| 933 - are now part of combining character sequences |
| 934 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ |