OLD | NEW |
1 * Copyright (C) 2004-2013, International Business Machines | 1 * Copyright (C) 2004-2014, International Business Machines |
2 * Corporation and others. All Rights Reserved. | 2 * Corporation and others. All Rights Reserved. |
3 * | 3 * |
4 * file name: changes.txt | 4 * file name: changes.txt |
5 * encoding: US-ASCII | 5 * encoding: US-ASCII |
6 * tab size: 8 (not used) | 6 * tab size: 8 (not used) |
7 * indentation:4 | 7 * indentation:4 |
8 * | 8 * |
9 * created on: 2004may06 | 9 * created on: 2004may06 |
10 * created by: Markus W. Scherer | 10 * created by: Markus W. Scherer |
11 * | 11 * |
12 * change log for Unicode updates | 12 * change log for Unicode updates |
13 | 13 |
14 ---------------------------------------------------------------------------- *** | 14 ---------------------------------------------------------------------------- *** |
15 | 15 |
| 16 Unicode 8.0 update for ICU ?? |
| 17 |
| 18 * UCA issue from 7.0 |
| 19 |
| 20 - U+1DE9 COMBINING LATIN SMALL LETTER BETA |
| 21 sorts with Greek Beta, should sort with Latin B? |
| 22 + Ken says: |
| 23 No, it was deliberate: |
| 24 |
| 25 03B2;GREEK SMALL LETTER BETA;Ll;;;;0392;;0392 |
| 26 1D5D;MODIFIER LETTER SMALL BETA;Lm;<super> 03B2;;;;; |
| 27 1DE9;COMBINING LATIN SMALL LETTER BETA;Mn;<sort> 03B2;;;;; |
| 28 1D66;GREEK SUBSCRIPT SMALL LETTER BETA;Ll;<sub> 03B2;;;;; |
| 29 |
| 30 Note the relationship to U+1D5D. |
| 31 |
| 32 When the disunified *Latin* beta base letter shows up in Unicode 8.0: |
| 33 |
| 34 U+A7B4 LATIN CAPITAL LETTER BETA |
| 35 U+A7B5 LATIN SMALL LETTER BETA |
| 36 |
| 37 we could re-evaluate what U+1DE9 equates to, for collation, |
| 38 but currently there isn’t any Latin beta to serve that function |
| 39 in Unicode 7.0. |
| 40 |
| 41 - ICU_ROOT=~/svn.icu/trunk |
| 42 - ICU_SRC_DIR=$ICU_ROOT/src |
| 43 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder implicit $ICU_SRC_DIR |
| 44 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca --hanOrder radical-stroke $ICU_SRC_DIR |
| 45 |
| 46 |
| 47 ---------------------------------------------------------------------------- *** |
| 48 |
| 49 Unicode 7.0 update for ICU 54 |
| 50 |
| 51 http://www.unicode.org/review/pri271/ -- beta review |
| 52 http://www.unicode.org/reports/uax-proposed-updates.html |
| 53 http://www.unicode.org/versions/beta-7.0.0.html#notable_issues |
| 54 http://www.unicode.org/reports/tr44/tr44-13.html |
| 55 |
| 56 *** ICU Trac |
| 57 |
| 58 - ticket 10821: Unicode 7.0, UCA 7.0 |
| 59 - C++ branches/markus/uni70 at r35584 from trunk at r35580 |
| 60 - Java branches/markus/uni70 at r35587 from trunk at r35545 |
| 61 |
| 62 *** CLDR Trac |
| 63 |
| 64 - ticket 7195: UCA 7.0 CLDR root collation |
| 65 - branches/markus/uni70 at r10062 from trunk at r10061 |
| 66 |
| 67 - ticket 6762: script metadata for Unicode 7.0 new scripts |
| 68 |
| 69 *** Unicode version numbers |
| 70 - makedata.mak |
| 71 - uchar.h |
| 72 - com.ibm.icu.util.VersionInfo |
| 73 - com.ibm.icu.dev.test.lang.UCharacterTest.VERSION_ |
| 74 |
| 75 - Run ICU4C "configure" _after_ updating the Unicode version number in uchar.h |
| 76 so that the makefiles see the new version number. |
| 77 |
| 78 *** data files & enums & parser code |
| 79 |
| 80 * file preparation |
| 81 |
| 82 - download UCD & IDNA files |
| 83 - make sure that the Unicode data folder passed into preparseucd.py |
| 84 includes a copy of the latest IdnaMappingTable.txt (can be in some subfolder) |
| 85 - only for manual diffs: remove version suffixes from the file names |
| 86 ~/unidata/uni70/20140403$ ../../desuffixucd.py . |
| 87 (see https://sites.google.com/site/unicodetools/inputdata) |
| 88 - only for manual diffs: extract Unihan.zip to "here" (.../ucd/Unihan/*.txt), delete Unihan.zip |
| 89 - ~/svn.icutools/trunk/src/unicode$ py/preparseucd.py ~/unidata/uni70/20140403 $ICU_SRC_DIR ~/svn.icutools/trunk/src |
| 90 - This writes files (especially ppucd.txt) to the ICU4C unidata and testdata subfolders. |
| 91 - Restore TODO diffs in source/data/unidata/UCARules.txt |
| 92 cd $ICU_SRC_DIR |
| 93 meld ../../trunk/src/source/data/unidata/UCARules.txt source/data/unidata/UCARules.txt |
| 94 - Restore ICU patches for ticket #10176 in source/test/testdata/LineBreakTest.txt |
| 95 |
| 96 - also: from http://unicode.org/Public/security/7.0.0/ download new |
| 97 confusables.txt & confusablesWholeScript.txt |
| 98 and copy to $ICU_ROOT/src/source/data/unidata/ |
| 99 |
| 100 * initial preparseucd.py changes |
| 101 - remove new Unicode scripts from the |
| 102 only-in-ISO-15924 list according to the error message: |
| 103 ValueError: remove ['Hmng', 'Lina', 'Perm', 'Mani', 'Phlp', 'Bass', |
| 104 'Dupl', 'Elba', 'Gran', 'Mend', 'Narb', 'Nbat', 'Palm', |
| 105 'Sind', 'Wara', 'Mroo', 'Khoj', 'Tirh', 'Aghb', 'Mahj'] |
| 106 from _scripts_only_in_iso15924 |
| 107 -> fix expectedLong names in cucdapi.c/TestUScriptCodeAPI() |
| 108 and in com.ibm.icu.dev.test.lang.TestUScript.java |
| 109 - NamesList.txt now has a heading with a non-ASCII character |
| 110 + keep ppucd.txt in platform charset, rather than changing tool/test parsers |
| 111 + escape non-ASCII characters in heading comments |
| 112 - gets Unicode copyright line from PropertyAliases.txt which is currently still at 2013 |
| 113 + get the copyright from the first file whose copyright line contains the current year |
| 114 |
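The copyright selection described above could look roughly like the following. This is only a sketch, not the code in preparseucd.py: the function name `pick_copyright_line`, the (file name, header text) input shape, and the sample headers are all assumptions made for illustration.

```python
import re


def pick_copyright_line(headers, year):
    """Return the first copyright line that mentions `year`, or None.

    `headers` is an iterable of (file_name, header_text) pairs.
    This data shape is hypothetical, chosen only for the sketch.
    """
    year_pattern = re.compile(r"\b%d\b" % year)
    for _name, text in headers:
        for line in text.splitlines():
            if "Copyright" in line and year_pattern.search(line):
                return line.strip()
    return None


# Hypothetical header snippets, not copied from real UCD files.
sample_headers = [
    ("PropertyAliases.txt", "# Copyright (c) 1991-2013 Unicode, Inc."),
    ("SomeOtherFile.txt", "# Copyright (c) 1991-2014 Unicode, Inc."),
]

print(pick_copyright_line(sample_headers, 2014))
```

Scanning files in order and stopping at the first match keeps the behavior predictable when some files still carry the previous year.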
| 115 * PropertyValueAliases.txt changes |
| 116 - 32 new Block (blk) values: |
| 117 blk; Bassa_Vah ; Bassa_Vah |
| 118 blk; Caucasian_Albanian ; Caucasian_Albanian |
| 119 blk; Coptic_Epact_Numbers ; Coptic_Epact_Numbers |
| 120 blk; Diacriticals_Ext ; Combining_Diacritical_Marks_Extended |
| 121 blk; Duployan ; Duployan |
| 122 blk; Elbasan ; Elbasan |
| 123 blk; Geometric_Shapes_Ext ; Geometric_Shapes_Extended |
| 124 blk; Grantha ; Grantha |
| 125 blk; Khojki ; Khojki |
| 126 blk; Khudawadi ; Khudawadi |
| 127 blk; Latin_Ext_E ; Latin_Extended_E |
| 128 blk; Linear_A ; Linear_A |
| 129 blk; Mahajani ; Mahajani |
| 130 blk; Manichaean ; Manichaean |
| 131 blk; Mende_Kikakui ; Mende_Kikakui |
| 132 blk; Modi ; Modi |
| 133 blk; Mro ; Mro |
| 134 blk; Myanmar_Ext_B ; Myanmar_Extended_B |
| 135 blk; Nabataean ; Nabataean |
| 136 blk; Old_North_Arabian ; Old_North_Arabian |
| 137 blk; Old_Permic ; Old_Permic |
| 138 blk; Ornamental_Dingbats ; Ornamental_Dingbats |
| 139 blk; Pahawh_Hmong ; Pahawh_Hmong |
| 140 blk; Palmyrene ; Palmyrene |
| 141 blk; Pau_Cin_Hau ; Pau_Cin_Hau |
| 142 blk; Psalter_Pahlavi ; Psalter_Pahlavi |
| 143 blk; Shorthand_Format_Controls ; Shorthand_Format_Controls |
| 144 blk; Siddham ; Siddham |
| 145 blk; Sinhala_Archaic_Numbers ; Sinhala_Archaic_Numbers |
| 146 blk; Sup_Arrows_C ; Supplemental_Arrows_C |
| 147 blk; Tirhuta ; Tirhuta |
| 148 blk; Warang_Citi ; Warang_Citi |
| 149 -> add to uchar.h |
| 150 use long property names for enum constants |
| 151 -> add to UCharacter.UnicodeBlock IDs |
| 152 Eclipse find UBLOCK_([^ ]+) = ([0-9]+), (/.+) |
| 153 replace public static final int \1_ID = \2; \3 |
| 154 -> add to UCharacter.UnicodeBlock objects |
| 155 Eclipse find UBLOCK_([^ ]+) = [0-9]+, (/.+) |
| 156 replace public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2 |
| 157 - 28 new Joining_Group (jg) values: |
| 158 jg ; Manichaean_Aleph ; Manichaean_Aleph |
| 159 jg ; Manichaean_Ayin ; Manichaean_Ayin |
| 160 jg ; Manichaean_Beth ; Manichaean_Beth |
| 161 jg ; Manichaean_Daleth ; Manichaean_Daleth |
| 162 jg ; Manichaean_Dhamedh ; Manichaean_Dhamedh |
| 163 jg ; Manichaean_Five ; Manichaean_Five |
| 164 jg ; Manichaean_Gimel ; Manichaean_Gimel |
| 165 jg ; Manichaean_Heth ; Manichaean_Heth |
| 166 jg ; Manichaean_Hundred ; Manichaean_Hundred |
| 167 jg ; Manichaean_Kaph ; Manichaean_Kaph |
| 168 jg ; Manichaean_Lamedh ; Manichaean_Lamedh |
| 169 jg ; Manichaean_Mem ; Manichaean_Mem |
| 170 jg ; Manichaean_Nun ; Manichaean_Nun |
| 171 jg ; Manichaean_One ; Manichaean_One |
| 172 jg ; Manichaean_Pe ; Manichaean_Pe |
| 173 jg ; Manichaean_Qoph ; Manichaean_Qoph |
| 174 jg ; Manichaean_Resh ; Manichaean_Resh |
| 175 jg ; Manichaean_Sadhe ; Manichaean_Sadhe |
| 176 jg ; Manichaean_Samekh ; Manichaean_Samekh |
| 177 jg ; Manichaean_Taw ; Manichaean_Taw |
| 178 jg ; Manichaean_Ten ; Manichaean_Ten |
| 179 jg ; Manichaean_Teth ; Manichaean_Teth |
| 180 jg ; Manichaean_Thamedh ; Manichaean_Thamedh |
| 181 jg ; Manichaean_Twenty ; Manichaean_Twenty |
| 182 jg ; Manichaean_Waw ; Manichaean_Waw |
| 183 jg ; Manichaean_Yodh ; Manichaean_Yodh |
| 184 jg ; Manichaean_Zayin ; Manichaean_Zayin |
| 185 jg ; Straight_Waw ; Straight_Waw |
| 186 -> uchar.h & UCharacter.JoiningGroup |
| 187 - 23 new Script (sc) values: |
| 188 sc ; Aghb ; Caucasian_Albanian |
| 189 sc ; Bass ; Bassa_Vah |
| 190 sc ; Dupl ; Duployan |
| 191 sc ; Elba ; Elbasan |
| 192 sc ; Gran ; Grantha |
| 193 sc ; Hmng ; Pahawh_Hmong |
| 194 sc ; Khoj ; Khojki |
| 195 sc ; Lina ; Linear_A |
| 196 sc ; Mahj ; Mahajani |
| 197 sc ; Mani ; Manichaean |
| 198 sc ; Mend ; Mende_Kikakui |
| 199 sc ; Modi ; Modi |
| 200 sc ; Mroo ; Mro |
| 201 sc ; Narb ; Old_North_Arabian |
| 202 sc ; Nbat ; Nabataean |
| 203 sc ; Palm ; Palmyrene |
| 204 sc ; Pauc ; Pau_Cin_Hau |
| 205 sc ; Perm ; Old_Permic |
| 206 sc ; Phlp ; Psalter_Pahlavi |
| 207 sc ; Sidd ; Siddham |
| 208 sc ; Sind ; Khudawadi |
| 209 sc ; Tirh ; Tirhuta |
| 210 sc ; Wara ; Warang_Citi |
| 211 -> uscript.h (many were added before) |
| 212 comment "Mende Kikakui" for USCRIPT_MENDE |
| 213 add USCRIPT_KHUDAWADI, make USCRIPT_SINDHI an alias |
| 214 -> com.ibm.icu.lang.UScript |
| 215 find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
| 216 replace public static final int \1 = \2; \3 |
| 217 - 6 new script codes from ISO 15924 http://www.unicode.org/iso15924/codechanges.html |
| 218 (added 2012-11-01) |
| 219 Ahom 338 Ahom |
| 220 Hatr 127 Hatran |
| 221 Mult 323 Multani |
| 222 (added 2013-10-12) |
| 223 Modi 324 Modi |
| 224 Pauc 263 Pau Cin Hau |
| 225 Sidd 302 Siddham |
| 226 -> uscript.h (some overlap with additions from Unicode) |
| 227 -> com.ibm.icu.lang.UScript |
| 228 find USCRIPT_([^ ]+) *= ([0-9]+),(.+) |
| 229 replace public static final int \1 = \2; \3 |
| 230 -> add Ahom, Hatr, Mult to preparseucd.py _scripts_only_in_iso15924 |
| 231 -> add to expectedLong and expectedShort names in cintltst/cucdapi.c/TestUScriptCodeAPI() |
| 232 and in com.ibm.icu.dev.test.lang.TestUScript.java |
| 233 |
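The Eclipse find/replace patterns noted above for UCharacter.UnicodeBlock can be tried outside the IDE. The sketch below applies the same regular expressions with Python's `re` module; the helper names and the sample enum line (name and value) are hypothetical and not taken from uchar.h.

```python
import re

# Same find/replace pairs as in the notes, written in Python syntax.
ID_FIND = re.compile(r"UBLOCK_([^ ]+) = ([0-9]+), (/.+)")
ID_REPLACE = r"public static final int \1_ID = \2; \3"

OBJECT_FIND = re.compile(r"UBLOCK_([^ ]+) = [0-9]+, (/.+)")
OBJECT_REPLACE = (
    r'public static final UnicodeBlock \1 = new UnicodeBlock("\1", \1_ID); \2'
)


def to_java_ids(c_enum_text):
    """Turn C enum constants into Java int ID declarations."""
    return ID_FIND.sub(ID_REPLACE, c_enum_text)


def to_java_objects(c_enum_text):
    """Turn C enum constants into Java UnicodeBlock object declarations."""
    return OBJECT_FIND.sub(OBJECT_REPLACE, c_enum_text)


# Hypothetical input line; the block name and number are made up.
sample = "UBLOCK_EXAMPLE_BLOCK = 999, /* example */"
print(to_java_ids(sample))
print(to_java_objects(sample))
```

Both patterns require a comment starting with `/` after the comma, so enum lines without a trailing comment are left unchanged and need manual handling.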
| 234 * update Script metadata: SCRIPT_PROPS[] in uscript_props.cpp & UScript.ScriptMetadata |
| 235 (not strictly necessary for NOT_ENCODED scripts) |
| 236 ~/svn.icutools/trunk/src/unicode$ py/parsescriptmetadata.py $ICU_SRC_DIR/source/common/unicode/uscript.h ~/svn.cldr/trunk/common/properties/scriptMetadata.txt |
| 237 |
| 238 * generate normalization data files |
| 239 - cd $ICU_ROOT/dbg |
| 240 - export LD_LIBRARY_PATH=$ICU_ROOT/dbg/lib |
| 241 - SRC_DATA_IN=$ICU_SRC_DIR/source/data/in |
| 242 - UNIDATA=$ICU_SRC_DIR/source/data/unidata |
| 243 - bin/gennorm2 -o $ICU_SRC_DIR/source/common/norm2_nfc_data.h -s $UNIDATA/norm2 nfc.txt --csource |
| 244 - bin/gennorm2 -o $SRC_DATA_IN/nfc.nrm -s $UNIDATA/norm2 nfc.txt |
| 245 - bin/gennorm2 -o $SRC_DATA_IN/nfkc.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt |
| 246 - bin/gennorm2 -o $SRC_DATA_IN/nfkc_cf.nrm -s $UNIDATA/norm2 nfc.txt nfkc.txt nfkc_cf.txt |
| 247 - bin/gennorm2 -o $SRC_DATA_IN/uts46.nrm -s $UNIDATA/norm2 nfc.txt uts46.txt |
| 248 |
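The five gennorm2 invocations above follow one pattern: an output file, the shared norm2 source directory, and a growing list of input files. As a sketch, they can be generated from a single table. The function name `gennorm2_commands` is hypothetical; the paths and the `-o`, `-s` and `--csource` options are the ones listed in the steps above.

```python
import posixpath


def gennorm2_commands(icu_src_dir):
    """Build the argument lists for the gennorm2 runs listed in the notes.

    Only constructs the commands; it does not execute anything.
    """
    norm2_dir = posixpath.join(icu_src_dir, "source/data/unidata/norm2")
    data_in = posixpath.join(icu_src_dir, "source/data/in")
    c_header = posixpath.join(icu_src_dir, "source/common/norm2_nfc_data.h")

    # (output file, input files, extra options)
    jobs = [
        (c_header, ["nfc.txt"], ["--csource"]),
        (posixpath.join(data_in, "nfc.nrm"), ["nfc.txt"], []),
        (posixpath.join(data_in, "nfkc.nrm"), ["nfc.txt", "nfkc.txt"], []),
        (posixpath.join(data_in, "nfkc_cf.nrm"),
         ["nfc.txt", "nfkc.txt", "nfkc_cf.txt"], []),
        (posixpath.join(data_in, "uts46.nrm"), ["nfc.txt", "uts46.txt"], []),
    ]
    return [
        ["bin/gennorm2", "-o", output, "-s", norm2_dir] + inputs + extra
        for output, inputs, extra in jobs
    ]


for command in gennorm2_commands("/path/to/icu/src"):
    print(" ".join(command))
```

Each list could be passed to `subprocess.run` from the $ICU_ROOT/dbg directory, with LD_LIBRARY_PATH set as in the steps above.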
| 249 * build ICU (make install) |
| 250 so that the tools build can pick up the new definitions from the installed header files. |
| 251 |
| 252 ~/svn.icu/uni70/dbg$ echo;echo;make -j5 install > out.txt 2>&1 ; tail -n 20 out.txt |
| 253 |
| 254 * build Unicode tools using CMake+make |
| 255 |
| 256 ~/svn.icutools/trunk/src/unicode/c/icudefs.txt: |
| 257 |
| 258 # Location (--prefix) of where ICU was installed. |
| 259 set(ICU_INST_DIR /home/mscherer/svn.icu/uni70/inst) |
| 260 # Location of the ICU source tree. |
| 261 set(ICU_SRC_DIR /home/mscherer/svn.icu/uni70/src) |
| 262 |
| 263 ~/svn.icutools/trunk/dbg/unicode/c$ cmake ../../../src/unicode/c |
| 264 ~/svn.icutools/trunk/dbg/unicode/c$ make |
| 265 |
| 266 * genprops work |
| 267 - new code point range for Joining_Group values: 10AC0..10AFF Manichaean |
| 268 + add second array of Joining_Group values for at most 10800..10FFF |
| 269 icutools: unicode/c/genprops/bidipropsbuilder.cpp |
| 270 icu: source/common/ubidi_props.h/.c/_data.h |
| 271 icu4j: main/classes/core/src/com/ibm/icu/impl/UBiDiProps.java |
| 272 |
| 273 * generate core properties data files |
| 274 - ~/svn.icutools/trunk/dbg/unicode/c$ genprops/genprops $ICU_SRC_DIR |
| 275 - ~/svn.icutools/trunk/dbg/unicode/c$ genuca/genuca $ICU_SRC_DIR |
| 276 - rebuild ICU (make install) & tools |
| 277 - run genuca again (see step above) so that it picks up the new nfc.nrm |
| 278 - rebuild ICU (make install) & tools |
| 279 |
| 280 * update uts46test.cpp and UTS46Test.java if there are new characters that are equivalent to |
| 281 sequences with non-LDH ASCII (that is, their decompositions contain '=' or similar) |
| 282 - grep IdnaMappingTable.txt or uts46.txt for "disallowed_STD3_valid" on non-ASCII characters |
| 283 - Unicode 6.0..7.0: U+2260, U+226E, U+226F |
| 284 - nothing new in 7.0, no test file to update |
| 285 |
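The grep step above can also be scripted. The sketch below scans lines in the IdnaMappingTable.txt layout (code point or range, then status, separated by semicolons, with `#` comments) and reports the non-ASCII entries whose status is disallowed_STD3_valid. The function name and the abbreviated sample lines are assumptions for illustration, not an excerpt of the real file.

```python
def non_ascii_std3_valid(lines):
    """Return (first, last) code point ranges above U+007F whose
    status field is disallowed_STD3_valid."""
    found = []
    for line in lines:
        data = line.split("#", 1)[0]          # drop trailing comment
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != "disallowed_STD3_valid":
            continue
        start, _, end = fields[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        if last > 0x7F:
            # Clip ranges that begin in ASCII to their non-ASCII part.
            found.append((max(first, 0x80), last))
    return found


# Abbreviated, illustrative lines in the style of IdnaMappingTable.txt.
sample_lines = [
    "# comment line",
    "0000..002C    ; disallowed_STD3_valid   # ASCII range",
    "00E0..00F6    ; valid                   # letters",
    "2260          ; disallowed_STD3_valid   # NOT EQUAL TO",
    "226E..226F    ; disallowed_STD3_valid   # NOT LESS-THAN..NOT GREATER-THAN",
]

for first, last in non_ascii_std3_valid(sample_lines):
    print("U+%04X..U+%04X" % (first, last))
```

With these sample lines the result covers U+2260, U+226E and U+226F, the three characters named in the notes.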
| 286 * run & fix ICU4C tests |
| 287 |
| 288 * update Java data files |
| 289 - refresh just the UCD-related files, just to be safe |
| 290 - see (ICU4C)/source/data/icu4j-readme.txt |
| 291 - mkdir /tmp/icu4j |
| 292 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
| 293 output: |
| 294 ... |
| 295 Unicode .icu files built to ./out/build/icudt53l |
| 296 echo timestamp > uni-core-data |
| 297 mkdir -p ./out/icu4j/com/ibm/icu/impl/data/icudt53b |
| 298 mkdir -p ./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b |
| 299 echo pnames.icu ubidi.icu ucase.icu uprops.icu > ./out/icu4j/add.txt |
| 300 LD_LIBRARY_PATH=../lib:../stubdata:../tools/ctestfw:$LD_LIBRARY_PATH ../bin/icupkg ./out/tmp/icudt53l.dat ./out/icu4j/icudt53b.dat -a ./out/icu4j/add.txt -s ./out/build/icudt53l -x '*' -tb -d ./out/icu4j/com/ibm/icu/impl/data/icudt53b |
| 301 mv ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/zoneinfo64.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/metaZones.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/timezoneTypes.res" ./out/icu4j/"com/ibm/icu/impl/data/icudt53b/windowsZones.res" "./out/icu4j/tzdata/com/ibm/icu/impl/data/icudt53b" |
| 302 jar cf ./out/icu4j/icudata.jar -C ./out/icu4j com/ibm/icu/impl/data/icudt53b/ |
| 303 mkdir -p /tmp/icu4j/main/shared/data |
| 304 cp ./out/icu4j/icudata.jar /tmp/icu4j/main/shared/data |
| 305 jar cf ./out/icu4j/icutzdata.jar -C ./out/icu4j/tzdata com/ibm/icu/impl/data/icudt53b/ |
| 306 mkdir -p /tmp/icu4j/main/shared/data |
| 307 cp ./out/icu4j/icutzdata.jar /tmp/icu4j/main/shared/data |
| 308 make[1]: Leaving directory `/home/mscherer/svn.icu/uni70/dbg/data' |
| 309 - copy the big-endian Unicode data files to another location, |
| 310 separate from the other data files |
| 311 ICUDT=icudt54b |
| 312 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll |
| 313 mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr |
| 314 cd ~/svn.icu/uni70/dbg/data/out/icu4j |
| 315 cp com/ibm/icu/impl/data/$ICUDT/confusables.cfu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT |
| 316 cp com/ibm/icu/impl/data/$ICUDT/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT |
| 317 rm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/cnvalias.icu |
| 318 cp com/ibm/icu/impl/data/$ICUDT/*.nrm /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT |
| 319 cp com/ibm/icu/impl/data/$ICUDT/coll/*.icu /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll |
| 320 cp com/ibm/icu/impl/data/$ICUDT/brkitr/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/brkitr |
| 321 - refresh ICU4J |
| 322 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT |
| 323 |
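The copy steps above (everything Unicode-related except cnvalias.icu) can be mirrored in a small script, which may help when the file selection has to be repeated. This is a sketch under assumptions: the function name `copy_unicode_data` and its parameters are hypothetical, and it reproduces only the cp/rm/mkdir commands listed above, not the jar update.

```python
import glob
import os
import shutil


def copy_unicode_data(src_root, dst_root, icudt):
    """Copy the Unicode data files selected in the notes.

    Returns the copied paths relative to the destination data folder.
    """
    src = os.path.join(src_root, "com/ibm/icu/impl/data", icudt)
    dst = os.path.join(dst_root, "com/ibm/icu/impl/data", icudt)
    for subdir in ("coll", "brkitr"):
        os.makedirs(os.path.join(dst, subdir), exist_ok=True)

    copied = []

    def copy_matching(pattern, subdir=""):
        for path in sorted(glob.glob(os.path.join(src, subdir, pattern))):
            if os.path.isfile(path):
                target = os.path.join(dst, subdir, os.path.basename(path))
                shutil.copy(path, target)
                copied.append(os.path.relpath(target, dst))

    copy_matching("confusables.cfu")
    copy_matching("*.icu")
    # Mirror "rm .../cnvalias.icu": converter aliases are not Unicode data.
    cnvalias = os.path.join(dst, "cnvalias.icu")
    if os.path.exists(cnvalias):
        os.remove(cnvalias)
        copied.remove("cnvalias.icu")
    copy_matching("*.nrm")
    copy_matching("*.icu", "coll")
    copy_matching("*", "brkitr")
    return copied
```

A call such as `copy_unicode_data(".", "/tmp/icu4j", "icudt54b")` from the data/out/icu4j directory would correspond to the commands above.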
| 324 * update CollationFCD.java |
| 325 + copy & paste the initializers of lcccIndex[] etc. from |
| 326 ICU4C/source/i18n/collationfcd.cpp to |
| 327 ICU4J/main/classes/collate/src/com/ibm/icu/impl/coll/CollationFCD.java |
| 328 |
| 329 * refresh Java test .txt files |
| 330 - copy new .txt files into ICU4J's main/tests/core/src/com/ibm/icu/dev/data/unicode |
| 331 cd $ICU_SRC_DIR/source/data/unidata |
| 332 cp confusables.txt confusablesWholeScript.txt NormalizationCorrections.txt NormalizationTest.txt SpecialCasing.txt UnicodeData.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode |
| 333 cd ../../test/testdata |
| 334 cp BidiCharacterTest.txt BidiTest.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode |
| 335 cp ~/unidata/uni70/20140409/ucd/CompositionExclusions.txt ~/svn.icu4j/trunk/src/main/tests/core/src/com/ibm/icu/dev/data/unicode |
| 336 |
| 337 * UCA |
| 338 |
| 339 - download UCA files (mostly allkeys.txt) from http://www.unicode.org/Public/UCA/<beta version>/ |
| 340 - run desuffixucd.py (see https://sites.google.com/site/unicodetools/inputdata) |
| 341 - update the input files for Mark's UCA tools, in ~/svn.unitools/trunk/data/uca/7.0.0/ |
| 342 - run Mark's UCA Main: https://sites.google.com/site/unicodetools/home#TOC-UCA |
| 343 - output files are in ~/svn.unitools/Generated/uca/7.0.0/ |
| 344 - review data; compare files, use blankweights.sed or similar |
| 345 ~/svn.unitools$ sed -r -f blankweights.sed Generated/uca/7.0.0/CollationAuxiliary/FractionalUCA.txt > frac-7.0.txt |
| 346 - cd ~/svn.unitools/Generated/uca/7.0.0/ |
| 347 - update source/data/unidata/FractionalUCA.txt with FractionalUCA_SHORT.txt |
| 348 cp CollationAuxiliary/FractionalUCA_SHORT.txt $ICU_SRC_DIR/source/data/unidata/FractionalUCA.txt |
| 349 - update source/data/unidata/UCARules.txt with UCA_Rules_SHORT.txt |
| 350 (note removing the underscore before "Rules") |
| 351 cp CollationAuxiliary/UCA_Rules_SHORT.txt $ICU_SRC_DIR/source/data/unidata/UCARules.txt |
| 352 - update (ICU4C)/source/test/testdata/CollationTest_*.txt |
| 353 and (ICU4J)/main/tests/collate/src/com/ibm/icu/dev/data/CollationTest_*.txt |
| 354 with output from Mark's Unicode tools (..._CLDR_..._SHORT.txt) |
| 355 cp CollationAuxiliary/CollationTest_CLDR_NON_IGNORABLE_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_NON_IGNORABLE_SHORT.txt |
| 356 cp CollationAuxiliary/CollationTest_CLDR_SHIFTED_SHORT.txt $ICU_SRC_DIR/source/test/testdata/CollationTest_SHIFTED_SHORT.txt |
| 357 cp $ICU_SRC_DIR/source/test/testdata/CollationTest_*.txt ~/svn.icu4j/trunk/src/main/tests/collate/src/com/ibm/icu/dev/data |
| 358 - run genuca, see command line above |
| 359 - rebuild ICU4C |
| 360 - refresh ICU4J collation data: |
| 361 (subset of instructions above for properties data refresh, except copies all coll/*) |
| 362 ICUDT=icudt54b |
| 363 ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
| 364 ~/svn.icu/uni70/dbg$ mkdir -p /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll |
| 365 ~/svn.icu/uni70/dbg/data/out/icu4j$ cp com/ibm/icu/impl/data/$ICUDT/coll/* /tmp/icu4j/com/ibm/icu/impl/data/$ICUDT/coll |
| 366 ~/svn.icu/uni70/dbg/data/out/icu4j$ jar uf ~/svn.icu4j/trunk/src/main/shared/data/icudata.jar -C /tmp/icu4j com/ibm/icu/impl/data/$ICUDT |
| 367 - run all tests with the *_SHORT.txt or the full files (the full ones have comments, useful for debugging) |
| 368 - note on intltest: if collate/UCAConformanceTest fails, then |
| 369 utility/MultithreadTest/TestCollators will fail as well; |
| 370 fix the conformance test before looking into the multi-thread test |
| 371 - copy all output from Mark's UCA tool to unicode.org for review & staging by Ken & editors |
| 372 - copy most of ~/svn.unitools/Generated/uca/7.0.0/CollationAuxiliary/* to CLDR branch |
| 373 ~/svn.unitools$ cp Generated/uca/7.0.0/CollationAuxiliary/* ~/svn.cldr/trunk/common/uca/ |
| 374 |
| 375 * When refreshing all of ICU4J data from ICU4C |
| 376 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=/tmp/icu4j icu4j-data-install |
| 377 - cp /tmp/icu4j/main/shared/data/icudata.jar ~/svn.icu4j/trunk/src/main/shared/data |
| 378 or |
| 379 - ~/svn.icu/uni70/dbg$ make ICU4J_ROOT=~/svn.icu4j/trunk/src icu4j-data-install |
| 380 |
| 381 * run & fix ICU4J tests |
| 382 |
| 383 *** LayoutEngine script information |
| 384 |
| 385 (For details see the Unicode 5.2 change log below.) |
| 386 |
| 387 * Run icu4j-tools: com.ibm.icu.dev.tool.layout.ScriptNameBuilder. |
| 388 This generates LEScripts.h, LELanguages.h, ScriptAndLanguageTags.h and ScriptAndLanguageTags.cpp |
| 389 in the working directory. |
| 390 (It also generates ScriptRunData.cpp, which is no longer needed.) |
| 391 |
| 392 The generated files have a current copyright date and "@stable" statement. |
| 393 ICU 54: Fixed tools/misc/src/com/ibm/icu/dev/tool/layout/ScriptIDModuleWriter.java |
| 394 for "born stable" Unicode API constants, and to stop parsing ICU version numbers |
| 395 which may not contain dots any more. |
| 396 |
| 397 - diff current <icu>/source/layout files vs. generated ones |
| 398 ~/svn.icu4j/trunk/src$ meld $ICU_SRC_DIR/source/layout tools/misc/src/com/ibm/icu/dev/tool/layout |
| 399 review and manually merge desired changes; |
| 400 fix gratuitous changes, incorrect @draft/@stable and missing aliases; |
| 401 Unicode-derived script codes should be "born stable" like constants in uchar.h, uscript.h etc. |
| 402 - if you just copy the above files, then |
| 403 fix mixed line endings, review the diffs as above and restore changes to API tags etc.; |
| 404 manually re-add the "Indic script xyz v.2" tags in ScriptAndLanguageTags.h |
| 405 |
| 406 *** API additions |
| 407 - send notice to icu-design about new born-@stable API (enum constants etc.) |
| 408 |
| 409 *** merge the Unicode update branches back onto the trunk |
| 410 - do not merge the icudata.jar and testdata.jar, |
| 411 instead rebuild them from merged & tested ICU4C |
| 412 |
| 413 ---------------------------------------------------------------------------- *** |
| 414 |
16 Unicode 6.3 update | 415 Unicode 6.3 update |
17 | 416 |
18 http://www.unicode.org/review/pri249/ -- beta review | 417 http://www.unicode.org/review/pri249/ -- beta review |
19 http://www.unicode.org/reports/uax-proposed-updates.html | 418 http://www.unicode.org/reports/uax-proposed-updates.html |
20 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues | 419 http://www.unicode.org/versions/beta-6.3.0.html#notable_issues |
21 http://www.unicode.org/reports/tr44/tr44-11.html | 420 http://www.unicode.org/reports/tr44/tr44-11.html |
22 | 421 |
23 *** ICU Trac | 422 *** ICU Trac |
24 | 423 |
25 - ticket 10128: update ICU to Unicode 6.3 beta | 424 - ticket 10128: update ICU to Unicode 6.3 beta |
(...skipping 1658 matching lines...)
1684 | 2083 |
1685 * name matching | 2084 * name matching |
1686 - read UCD.html | 2085 - read UCD.html |
1687 | 2086 |
1688 * scripts | 2087 * scripts |
1689 - use new Hrkt=Katakana_Or_Hiragana | 2088 - use new Hrkt=Katakana_Or_Hiragana |
1690 | 2089 |
1691 * ZWJ & ZWNJ | 2090 * ZWJ & ZWNJ |
1692 - are now part of combining character sequences | 2091 - are now part of combining character sequences |
1693 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ | 2092 - break iteration used to assume that LB classes did not overlap; now they do for ZWJ & ZWNJ |