OLD | NEW |
1 # Copyright (c) 2001-2014 International Business Machines | 1 # Copyright (c) 2001-2015 International Business Machines |
2 # Corporation and others. All Rights Reserved. | 2 # Corporation and others. All Rights Reserved. |
3 # | 3 # |
4 # RBBI Test Data | 4 # RBBI Test Data |
5 # | 5 # |
6 # File: rbbitst.txt | 6 # File: rbbitst.txt |
7 # | 7 # |
8 # The format of this file looks vaguely like some kind of xml-ish markup, | 8 # The format of this file looks vaguely like some kind of xml-ish markup, |
9 # but it is NOT. The syntax is this.. | 9 # but it is NOT. The syntax is this.. |
10 # | 10 # |
11 # <word> any following data is for word break testing | 11 # <word> any following data is for word break testing |
(...skipping 13 matching lines...)
25 # | 25 # |
26 # There are two copies of this file in the source repository, | 26 # There are two copies of this file in the source repository, |
27 # [ICU4C] source/test/testdata/rbbitst.txt | 27 # [ICU4C] source/test/testdata/rbbitst.txt |
28 # [ICU4J] main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt | 28 # [ICU4J] main/tests/core/src/com/ibm/icu/dev/test/rbbi/rbbitst.txt |
29 # | 29 # |
30 # ICU4C's copy is the master. If any changes are made to ICU4J's copy, make sure they | 30 # ICU4C's copy is the master. If any changes are made to ICU4J's copy, make sure they |
31 # are merged back into ICU4C's copy of the file, lest they get overwritten later. | 31 # are merged back into ICU4C's copy of the file, lest they get overwritten later. |
32 # TODO: figure out how to have a single copy of the file for use by both C and Java. | 32 # TODO: figure out how to have a single copy of the file for use by both C and Java. |
33 | 33 |
34 | 34 |
| 35 ## FILTERED BREAK TESTS |
| 36 |
| 37 # (William Bradford, public domain. http://catalog.hathitrust.org/Record/008651224 ) - edited. |
| 38 <locale en> |
| 39 <sent> |
| 40 <data>\ |
| 41 •In the meantime Mr. •Weston arrived with his small ship, which he had now recovered. •Capt. •Gorges, who informed the Sgt. here that one purpose of his going east was to meet with Mr. •Weston, took this opportunity to call him to account for some abuses he had to lay to his charge.•</data> |
| 42 |
| 43 <locale en@ss=standard> |
| 44 <sent> |
| 45 <data>\ |
| 46 •In the meantime Mr. Weston arrived with his small ship, which he had now recovered. •Capt. Gorges, who informed the Sgt. here that one purpose of his going east was to meet with Mr. Weston, took this opportunity to call him to account for some abuses he had to lay to his charge.•</data> |
| 47 |
| 48 ## END FILTERED BREAK TESTS |
| 49 |
| 50 <locale> |
| 51 |
35 # Temp debugging tests | 52 # Temp debugging tests |
36 <sent> | 53 <sent> |
37 <data>•\u00c0.•</data> | 54 <data>•\u00c0.•</data> |
38 | 55 |
39 #<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•</data> | 56 #<data>•\u5487\u67ff\ue591\u5017\u61b3\u60a1\u9510\u8165:"JAVA\u821c\u8165\u7fc8\u51ce\u306d,\u2494\u56d8\u4ec0\u60b1\u8560\u51ba\u611d\u57b6\u2510\u5d46".\u2029•</data> |
40 ######################################################################################## | 57 ######################################################################################## |
41 # | 58 # |
42 # | 59 # |
43 # G r a p h e m e C l u s t e r T e s t s | 60 # G r a p h e m e C l u s t e r T e s t s |
44 # | 61 # |
(...skipping 111 matching lines...)
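The filtered break tests added at the top of this revision use the same passage twice. Under plain `en`, the data expects a sentence boundary after each period that is followed by a space and a capital letter, so `Mr.` and `Capt.` end "sentences" while `Sgt. here` does not; under `en@ss=standard`, only the boundary after `recovered.` survives. The Python sketch below illustrates that suppression idea on a shorter sentence. It is a toy, not ICU's implementation: the period-space-capital candidate rule, the `sentence_breaks` and `segments` helpers, and the `ABBREVIATIONS` set (taken from the passage) are all assumptions for illustration, whereas ICU's real behavior comes from its UAX #29 sentence rules plus locale suppression data.

```python
import re

# Toy candidate rule (an assumption, far simpler than ICU's sentence rules):
# a sentence may end after ".", then whitespace, before an uppercase ASCII letter.
_CANDIDATE = re.compile(r"\.\s+(?=[A-Z])")


def sentence_breaks(text: str, suppress: frozenset[str] = frozenset()) -> list[int]:
    """Return break offsets, skipping candidates that follow a suppressed word."""
    breaks = [0]
    for m in _CANDIDATE.finditer(text):
        # Last whitespace-delimited word up to and including the period, e.g. "Mr."
        preceding = text[: m.start() + 1].split()[-1]
        if preceding not in suppress:
            # Trailing spaces stay with the earlier sentence, as in the test data.
            breaks.append(m.end())
    breaks.append(len(text))
    return breaks


def segments(text: str, breaks: list[int]) -> list[str]:
    """Slice text into the pieces between consecutive break offsets."""
    return [text[a:b] for a, b in zip(breaks, breaks[1:])]


# Hypothetical suppression set, taken from the abbreviations in the test passage.
ABBREVIATIONS = frozenset({"Mr.", "Capt.", "Sgt."})

if __name__ == "__main__":
    s = "In the meantime Mr. Weston arrived. Capt. Gorges met the Sgt. here."
    print(segments(s, sentence_breaks(s)))
    # ['In the meantime Mr. ', 'Weston arrived. ', 'Capt. ', 'Gorges met the Sgt. here.']
    print(segments(s, sentence_breaks(s, ABBREVIATIONS)))
    # ['In the meantime Mr. Weston arrived. ', 'Capt. Gorges met the Sgt. here.']
```

On this particular passage the toy rule lines up with both expected outputs, but it would mishandle ordinary text such as quotations or sentences that start with a digit or a lowercase letter.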
156 #Hindi Numbers | 173 #Hindi Numbers |
157 <data>• •\u0968\u0966.\u0969\u096f<100> •\u0967\u0966\u0966.\u0966\u0966<100> •\N{RUPEE SIGN}•\u0967,\u0967\u0966\u0966.\u0966\u0966<100> • •\u0905\u092e\u091c<200>\n•</data> | 174 <data>• •\u0968\u0966.\u0969\u096f<100> •\u0967\u0966\u0966.\u0966\u0966<100> •\N{RUPEE SIGN}•\u0967,\u0967\u0966\u0966.\u0966\u0966<100> • •\u0905\u092e\u091c<200>\n•</data> |
158 | 175 |
159 <data>•\u0938\u094d\u200d\u0935\u0924\u0902deadTA\u0930<200>\r•It's<200> •$•30.10<100> •12,34<100>¢•£•¤•¥•alpha\u05f3beta\u05f4gamma<200> •</data> | 176 <data>•\u0938\u094d\u200d\u0935\u0924\u0902deadTA\u0930<200>\r•It's<200> •$•30.10<100> •12,34<100>¢•£•¤•¥•alpha\u05f3beta\u05f4gamma<200> •</data> |
160 | 177 |
161 <data>•Badges<200>?• •BADGES<200>!•?•!• •We<200> •don't<200> •need<200> •no<200> •STINKING<200> •BADGES<200>!•!•1000,233,456.000<100> •1,23.322<100>%•123.1222<100>$•123,000.20<100> •179.01<100>%•X<200> •Now<200>\r•is<200>\n•the<200>\r\n•time<200> •</data> | 178 <data>•Badges<200>?• •BADGES<200>!•?•!• •We<200> •don't<200> •need<200> •no<200> •STINKING<200> •BADGES<200>!•!•1000,233,456.000<100> •1,23.322<100>%•123.1222<100>$•123,000.20<100> •179.01<100>%•X<200> •Now<200>\r•is<200>\n•the<200>\r\n•time<200> •</data> |
162 | 179 |
163 #Hangul | 180 #Hangul |
164 <data>•\uc5f0\ud569<200> •\uc7a5\ub85c\uad50\ud68c<200> •\u1109\u1161\u11bc\u1112\u1161\u11bc<200> •\u1112\u1161\u11ab\u110b\u1175\u11ab<200> •Hello<200>,• •how<200> •are<200> •you<200> •</data> | 181 <data>•\uc5f0\ud569<200> •\uc7a5\ub85c\uad50\ud68c<200> •\u1109\u1161\u11bc\u1112\u1161\u11bc<200> •\u1112\u1161\u11ab\u110b\u1175\u11ab<200> •Hello<200>,• •how<200> •are<200> •you<200> •</data> |
165 | 182 |
| 183 <data>•Hello<200>,• •how<200> •are<200> •you<200> •\uc5f0\ud569<200> •\uc7a5\ub85c\uad50\ud68c<200> •\u1109\u1161\u11bc\u1112\u1161\u11bc<200> •\u1112\u1161\u11ab\u110b\u1175\u11ab<200> •</data> |
166 | 184 |
167 # Words containing non-BMP letters | 185 # Words containing non-BMP letters |
168 <data>•abc\U00010300<200> •abc\N{DESERET SMALL LETTER ENG}<200> •abc\N{MATHEMATICAL BOLD SMALL Z}<200> •abc\N{MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL}<200> •</data> | 186 <data>•abc\U00010300<200> •abc\N{DESERET SMALL LETTER ENG}<200> •abc\N{MATHEMATICAL BOLD SMALL Z}<200> •abc\N{MATHEMATICAL SANS-SERIF BOLD ITALIC PI SYMBOL}<200> •</data> |
169 | 187 |
170 # Unassigned code points | 188 # Unassigned code points |
171 <data>•abc<200>\U0001D800•def<200>\U0001D3FF• •</data> | 189 <data>•abc<200>\U0001D800•def<200>\U0001D3FF• •</data> |
172 | 190 |
173 # Hiragana & Katakana stay together, but separates from each other and Latin. | 191 # Hiragana & Katakana stay together, but separates from each other and Latin. |
174 # *** what to do about theoretical combos of chars? i.e. hiragana + accent | 192 # *** what to do about theoretical combos of chars? i.e. hiragana + accent |
175 #<data>•abc<200>\N{HIRAGANA LETTER SMALL A}<400>\N{HIRAGANA LETTER VU}\N{COMBINING ACUTE ACCENT}<400>\N{HIRAGANA ITERATION MARK}<400>\N{KATAKANA LETTER SMALL A}\N{KATAKANA ITERATION MARK}\N{HALFWIDTH KATAKANA LETTER WO}\N{HALFWIDTH KATAKANA LETTER N}<400>def<200>#•</data> | 193 #<data>•abc<200>\N{HIRAGANA LETTER SMALL A}<400>\N{HIRAGANA LETTER VU}\N{COMBINING ACUTE ACCENT}<400>\N{HIRAGANA ITERATION MARK}<400>\N{KATAKANA LETTER SMALL A}\N{KATAKANA ITERATION MARK}\N{HALFWIDTH KATAKANA LETTER WO}\N{HALFWIDTH KATAKANA LETTER N}<400>def<200>#•</data> |
(...skipping 69 matching lines...)
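Each `<data>` line in the word-break tests above mixes the text under test with its expected boundaries. The syntax section of the file header is collapsed in this view, so the following reading is an assumption drawn from how the rows use it: `•` marks a boundary, `<nnn>` marks a boundary whose rule status is nnn (in the word tests, 100 corresponds to numbers, 200 to letters, and 400 to ideographs), and `\uXXXX`, `\UXXXXXXXX`, `\N{NAME}`, `\r` and `\n` are escapes. A minimal Python sketch of a decoder, with the hypothetical name `parse_data`, turns one body into plain text plus a map from code-point offset to expected status. It does not handle trailing-backslash line continuations, and ICU's own harnesses index UTF-16 code units, which differ from code points for non-BMP rows such as the Deseret and mathematical letters.

```python
import re
import unicodedata

# Markup handled by this sketch (our reading of the file, not a spec):
#   •                     expected boundary, rule status 0
#   <nnn>                 expected boundary with rule status nnn ("<>" -> 0, assumed)
#   \uXXXX \UXXXXXXXX     escaped code points
#   \N{NAME}              code point by Unicode character name
#   \r \n \t              control characters
_TOKEN = re.compile(
    r"\u2022"                  # bullet: plain boundary
    r"|<(\d*)>"                # <nnn>: boundary with status
    r"|\\u([0-9A-Fa-f]{4})"    # \uXXXX
    r"|\\U([0-9A-Fa-f]{8})"    # \UXXXXXXXX
    r"|\\N\{([^}]*)\}"         # \N{UNICODE NAME}
    r"|\\([rnt])"              # \r, \n, \t
    r"|(.)",                   # any other literal character
    re.DOTALL,
)
_CONTROLS = {"r": "\r", "n": "\n", "t": "\t"}


def parse_data(body: str) -> tuple[str, dict[int, int]]:
    """Decode one <data> body into (text, {code-point offset: expected status})."""
    chars: list[str] = []
    breaks: dict[int, int] = {}
    for m in _TOKEN.finditer(body):
        status, u4, u8, name, control, literal = m.groups()
        if m.group(0) == "\u2022":
            breaks[len(chars)] = 0
        elif status is not None:
            breaks[len(chars)] = int(status) if status else 0
        elif u4 is not None or u8 is not None:
            chars.append(chr(int(u4 or u8, 16)))
        elif name is not None:
            chars.append(unicodedata.lookup(name))
        elif control is not None:
            chars.append(_CONTROLS[control])
        else:
            chars.append(literal)
    return "".join(chars), breaks
```

For example, `parse_data("•abc<200> •12<100>,•")` decodes to the text `"abc 12,"` with expected boundaries `{0: 0, 3: 200, 4: 0, 6: 100, 7: 0}`. An escaped bullet (`\u2022` written out) becomes an ordinary character rather than a boundary, because only the literal `•` is treated as markup.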
245 <data>•\u8527<400>\u02ba<200>\u0027\u0d42•\u00b7•\u09ea<100></data> | 263 <data>•\u8527<400>\u02ba<200>\u0027\u0d42•\u00b7•\u09ea<100></data> |
246 | 264 |
247 # | 265 # |
248 # Jitterbug 5276 - treat Japanese half width voicing marks as Grapheme Extend | 266 # Jitterbug 5276 - treat Japanese half width voicing marks as Grapheme Extend |
249 # | 267 # |
250 <data>•A\uff9e\uff9fBC<200> •1\uff9e\uff9f23<100></data> | 268 <data>•A\uff9e\uff9fBC<200> •1\uff9e\uff9f23<100></data> |
251 | 269 |
252 # User guide example: | 270 # User guide example: |
253 <data>•Parlez<200>-•vous<200> •français<200> •?•</data> | 271 <data>•Parlez<200>-•vous<200> •français<200> •?•</data> |
254 | 272 |
| 273 # Test for #11673 |
| 274 <word> |
| 275 <data>•ジョージア<400> •</data> |
| 276 |
255 ######################################################################################## | 277 ######################################################################################## |
256 # | 278 # |
257 # | 279 # |
258 # S e n t e n c e B o u n d a r y T e s t s | 280 # S e n t e n c e B o u n d a r y T e s t s |
259 # | 281 # |
260 # | 282 # |
261 ########################################################################################## | 283 ########################################################################################## |
262 | 284 |
263 | 285 |
264 # | 286 # |
(...skipping 432 matching lines...)
697 <data>•เล่น•ผ่าน•ทาง•บลูทูธ•บน•อุปกรณ์•</data> | 719 <data>•เล่น•ผ่าน•ทาง•บลูทูธ•บน•อุปกรณ์•</data> |
698 | 720 |
699 # Test for city names #10691 | 721 # Test for city names #10691 |
700 <line> | 722 <line> |
701 <data>•ไป•ที่•ซานฟรานซิสโก•</data> | 723 <data>•ไป•ที่•ซานฟรานซิสโก•</data> |
702 | 724 |
703 # Test for #10630, #10631 | 725 # Test for #10630, #10631 |
704 <line> | 726 <line> |
705 <data>•แท็ก•แอปพลิเคชัน•เป็น•พิเศษ•</data> | 727 <data>•แท็ก•แอปพลิเคชัน•เป็น•พิเศษ•</data> |
706 | 728 |
| 729 # Test for #11019 |
| 730 <line> |
| 731 <data>•เบ•เบราว์เซอร์•โพ•โพสต์•โพสท์•</data> |
| 732 |
| 733 # Test for #11688 |
| 734 <line> |
| 735 <data>•อัปเดต•อีเวนต์•</data> |
| 736 |
707 ########################################################################################## | 737 ########################################################################################## |
708 # | 738 # |
709 # Lao Tests | 739 # Lao Tests |
710 # | 740 # |
711 ########################################################################################## | 741 ########################################################################################## |
712 <locale en> | 742 <locale en> |
713 # Basic check for #7647 | 743 # Basic check for #7647 |
714 <line> | 744 <line> |
715 <data>•ສະບາຍດີ•</data> | 745 <data>•ສະບາຍດີ•</data> |
716 <data>•ດີ•ຂອບໃຈ•</data> | 746 <data>•ດີ•ຂອບໃຈ•</data> |
(...skipping 174 matching lines...)
891 | 921 |
892 <data>•abc •- •def •abc •-def •abc- •def •</data> # With ASCII hyphen | 922 <data>•abc •- •def •abc •-def •abc- •def •</data> # With ASCII hyphen |
893 <data>•abc •‐ •def •abc •‐def •abc‐ •def •</data> # With Unicode u2010 hyphen | 923 <data>•abc •‐ •def •abc •‐def •abc‐ •def •</data> # With Unicode u2010 hyphen |
894 | 924 |
895 # Test for #10176 (in fi) | 925 # Test for #10176 (in fi) |
896 <line> | 926 <line> |
897 <data>•abc/•s •def•</data> | 927 <data>•abc/•s •def•</data> |
898 <data>•abc/\u05D9 •def•</data> | 928 <data>•abc/\u05D9 •def•</data> |
899 <data>•\u05E7\u05D7/\u05D9 •\u05DE\u05E2\u05D9\u05DC•</data> | 929 <data>•\u05E7\u05D7/\u05D9 •\u05DE\u05E2\u05D9\u05DC•</data> |
900 <data>•\u05D3\u05E8\u05D5\u05E9\u05D9\u05DD •\u05E9\u05D7\u05E7\u05E0\u05D9\u05DD/\u05D9\u05D5\u05EA•</data> | 930 <data>•\u05D3\u05E8\u05D5\u05E9\u05D9\u05DD •\u05E9\u05D7\u05E7\u05E0\u05D9\u05DD/\u05D9\u05D5\u05EA•</data> |
| 931 |
| 932 #################################################################################### |
| 933 # |
| 934 # Test CSS line break variants: strict, normal, loose |
| 935 # |
| 936 #################################################################################### |
| 937 |
| 938 <locale ja@lb=strict> |
| 939 <line> |
| 940 # •no brk before 3063 •no brk before 301C•no brk btw 2026 •no brk before FF01• |
| 941 <data>•\u3084\u3063•\u3071•\u308A\u0020•\u0031\u301C\u0020•\u2026\u2026\u0020•\u30A2\uFF01\u0020•</data> |
| 942 |
| 943 <locale ja@lb=normal> |
| 944 <line> |
| 945 # •brk OK before 3063 •brk OK before 301C •no brk btw 2026 •no brk before FF01• |
| 946 <data>•\u3084•\u3063•\u3071•\u308A\u0020•\u0031•\u301C\u0020•\u2026\u2026\u0020•\u30A2\uFF01\u0020•</data> |
| 947 |
| 948 <locale ja@lb=loose> |
| 949 <line> |
| 950 # •brk OK before 3063 •brk OK before 301C •brk OK btw 2026 •brk OK before FF01• |
| 951 <data>•\u3084•\u3063•\u3071•\u308A\u0020•\u0031•\u301C\u0020•\u2026•\u2026\u0020•u30A2•\uFF01\u0020•</data> |
| 952 |
| 953 <locale en@lb=strict> |
| 954 <line> |
| 955 # •no brk before 3063 •no brk before 301C•no brk btw 2026 •no brk before FF01• |
| 956 <data>•\u3084\u3063•\u3071•\u308A\u0020•\u0031\u301C\u0020•\u2026\u2026\u0020•\u30A2\uFF01\u0020•</data> |
| 957 |
| 958 <locale en@lb=normal> |
| 959 <line> |
| 960 # •brk OK before 3063 •no brk before 301C •no brk btw 2026 •no brk before FF01• |
| 961 <data>•\u3084•\u3063•\u3071•\u308A\u0020•\u0031\u301C\u0020•\u2026\u2026\u0020•\u30A2\uFF01\u0020•</data> |
| 962 |
| 963 <locale en@lb=loose> |
| 964 <line> |
| 965 # •brk OK before 3063 •no brk before 301C •brk OK btw 2026 •no brk before FF01• |
| 966 <data>•\u3084•\u3063•\u3071•\u308A\u0020•\u0031\u301C\u0020•\u2026•\u2026\u0020•u30A2\uFF01\u0020•</data> |
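The CSS line-break tests at the end of this diff encode each variant as a different set of `•` positions over the same text, and the `#` comment above each `<data>` row spells out what is expected to change. A small Python sketch that decodes two variants and lists the boundaries the looser one adds makes that comparison explicit. `expected_breaks` and `added_breaks` are hypothetical helpers, and the decoder handles only `•` and `\uXXXX`, which is all the `ja@lb=strict` and `ja@lb=normal` rows use. The `lb=loose` rows are left out because, as committed, they contain `u30A2` without its leading backslash, so they would not decode to the same text.

```python
import re


def expected_breaks(body: str) -> tuple[str, set[int]]:
    """Decode a body that uses only bullets and \\uXXXX escapes; return (text, boundaries)."""
    chars: list[str] = []
    breaks: set[int] = set()
    for m in re.finditer(r"\u2022|\\u([0-9A-Fa-f]{4})|(.)", body, re.DOTALL):
        if m.group(0) == "\u2022":
            breaks.add(len(chars))  # boundary before the next decoded character
        elif m.group(1) is not None:
            chars.append(chr(int(m.group(1), 16)))
        else:
            chars.append(m.group(2))
    return "".join(chars), breaks


def added_breaks(stricter: str, looser: str) -> list[str]:
    """Describe each boundary allowed by `looser` but not by `stricter`."""
    text_a, breaks_a = expected_breaks(stricter)
    text_b, breaks_b = expected_breaks(looser)
    if text_a != text_b:
        raise ValueError("variants must test the same text")
    return [f"before U+{ord(text_b[i]):04X}"
            for i in sorted(breaks_b - breaks_a) if i < len(text_b)]


# Copied from the ja@lb=strict and ja@lb=normal rows of the test data.
JA_STRICT = r"•\u3084\u3063•\u3071•\u308A\u0020•\u0031\u301C\u0020•\u2026\u2026\u0020•\u30A2\uFF01\u0020•"
JA_NORMAL = r"•\u3084•\u3063•\u3071•\u308A\u0020•\u0031•\u301C\u0020•\u2026\u2026\u0020•\u30A2\uFF01\u0020•"

if __name__ == "__main__":
    print(added_breaks(JA_STRICT, JA_NORMAL))  # ['before U+3063', 'before U+301C']
```

The result matches the comments on those rows ("brk OK before 3063", "brk OK before 301C"). Comparing the `en@lb=strict` row with the `en@lb=normal` row the same way yields only the boundary before U+3063, which again agrees with their comments.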