source/test/testdata/collationtest.txt - Issue 2435373002: Delete source/test

Unified Diff: source/test/testdata/collationtest.txt

Issue 2435373002: Delete source/test (Closed)

Patch Set: Created 4 years, 2 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Index: source/test/testdata/collationtest.txt

diff --git a/source/test/testdata/collationtest.txt b/source/test/testdata/collationtest.txt

deleted file mode 100644

index 3a703cb10b6a091cb39eb33b96a4ce7b18d0ae53..0000000000000000000000000000000000000000

--- a/source/test/testdata/collationtest.txt

+++ /dev/null

@@ -1,2540 +0,0 @@

-# This file should be in UTF-8 with a signature byte sequence ("BOM").

-# collationtest.txt: Collation test data.

-# created on: 2012apr13

-# created by: Markus W. Scherer

-# A line with "** test: description" is used for verbose and error output.

-# A collator can be set with "@ root" or "@ locale language-tag",

-# for example "@ locale de-u-co-phonebk".

-# An old-style locale ID can also be used, for example "@ locale de@collation=phonebook".

-# A collator can be built with "@ rules".

-# An "@ rules" line is followed by one or more lines with the tailoring rules.

-# A collator can be modified with "% attribute=value".

-# "* compare" tests the order (= or <) of the following strings.

-# The relation can be "=" or "<" (the level of the difference is not specified)

-# or "<1", "<2", "<c", "<3", "<4" (indicating the level of the difference).

-# Test sections ("* compare") are terminated by

-# definitions of new collators, changing attributes, or new test sections.

-** test: simple CEs & expansions

-# Many types of mappings are tested elsewhere, including via the UCA conformance tests.

-# Here we mostly cover a few unusual mappings.

-@ rules

-&\x01 # most control codes are ignorable

-<<<\u0300 # tertiary CE

-&9<\x00 # NUL not ignorable

-&\uA00A\uA00B=\uA002 # two long-primary CEs

-&\uA00A\uA00B\u00050005=\uA003 # three CEs, require 64 bits

-* compare

-= \x01

-= \x02

-<3 \u0300

-<1 9

-<1 \x00

-= \x01\x00\x02

-<1 a

-<3 a\u0300

-<2 a\u0308

-= ä

-<1 b

-<1 か # Hiragana Ka (U+304B)

-<2 か\u3099 # plus voiced sound mark

-= が # Hiragana Ga (U+304C)

-<1 \uA00A\uA00B

-= \uA002

-<1 \uA00A\uA00B\u00050004

-<1 \uA00A\uA00B\u00050005

-= \uA003

-<1 \uA00A\uA00B\u00050006

-** test: contractions

-# Create some interesting mappings, and map some normalization-inert characters

-# (which are not subject to canonical reordering)

-# to some of the same CEs to check the sequence of CEs.

-@ rules

-# Contractions starting with 'a' should not continue with any character < U+0300

-# so that we can test a shortcut for that.

-&a=ⓐ

-&b<bz=ⓑ

-&d<dz\u0301=ⓓ # d+z+acute

-&z

-<a\u0301=Ⓐ # a+acute sorts after z

-<a\u0301\u0301=Ⓑ # a+acute+acute

-<a\u0301\u0301\u0358=Ⓒ # a+acute+acute+dot above right

-<a\u030a=Ⓓ # a+ring

-<a\u0323=Ⓔ # a+dot below

-<a\u0323\u0358=Ⓕ # a+dot below+dot above right

-<a\u0327\u0323\u030a=Ⓖ # a+cedilla+dot below+ring

-<a\u0327\u0323bz=Ⓗ # a+cedilla+dot below+b+z

-&\U0001D158=⁰ # musical notehead black (has a symbol primary)

-<\U0001D158\U0001D165=¼ # musical quarter note

-# deliberately missing prefix contractions:

-# dz

-# a\u0327

-# a\u0327\u0323

-# a\u0327\u0323b

-&\x01

-<<<\U0001D165=¹ # musical stem (ccc=216)

-<<<\U0001D16D=² # musical augmentation dot (ccc=226)

-<<<\U0001D165\U0001D16D=³ # stem+dot (ccc=216 226)

-&\u0301=❶ # acute (ccc=230)

-&\u030a=❷ # ring (ccc=230)

-&\u0308=❸ # diaeresis (ccc=230)

-<<\u0308\u0301=❹ # diaeresis+acute (=dialytika tonos) (ccc=230 230)

-&\u0327=❺ # cedilla (ccc=202)

-&\u0323=❻ # dot below (ccc=220)

-&\u0331=❼ # macron below (ccc=220)

-<<\u0331\u0358=❽ # macron below+dot above right (ccc=220 232)

-&\u0334=❾ # tilde overlay (ccc=1)

-&\u0358=❿ # dot above right (ccc=232)

-&\u0f71=① # tibetan vowel sign aa

-&\u0f72=② # tibetan vowel sign i

-# \u0f71\u0f72 # tibetan vowel sign aa + i = ii = U+0F73

-&\u0f73=③ # tibetan vowel sign ii (ccc=0 but lccc=129)

-** test: simple contractions

-# Some strings are chosen to cause incremental contiguous contraction matching to

-# go into partial matches for prefixes of contractions

-# (where the prefixes are deliberately not also contractions).

-# When there is no complete match, then the matching code must back out of those

-# so that discontiguous contractions work as specified.

-* compare

-# contraction starter with no following text, or mismatch, or blocked

-<1 a

-= ⓐ

-<1 aa

-= ⓐⓐ

-<1 ab

-= ⓐb

-<1 az

-= ⓐz

-* compare

-<1 a

-<2 a\u0308\u030a # ring blocked by diaeresis

-= ⓐ❸❷

-<2 a\u0327

-= ⓐ❺

-* compare

-<2 \u0308

-= ❸

-<2 \u0308\u030a\u0301 # acute blocked by ring

-= ❸❷❶

-* compare

-<1 \U0001D158

-= ⁰

-<1 \U0001D158\U0001D165

-= ¼

-# no discontiguous contraction because of missing prefix contraction d+z,

-# and a starter ('z') after the 'd'

-* compare

-<1 dz\u0323\u0301

-= dz❻❶

-# contiguous contractions

-* compare

-<1 abz

-= ⓐⓑ

-<1 abzz

-= ⓐⓑz

-* compare

-<1 a

-<1 z

-<1 a\u0301

-= Ⓐ

-<1 a\u0301\u0301

-= Ⓑ

-<1 a\u0301\u0301\u0358

-= Ⓒ

-<1 a\u030a

-= Ⓓ

-<1 a\u0323\u0358

-= Ⓕ

-<1 a\u0327\u0323\u030a # match despite missing prefix

-= Ⓖ

-<1 a\u0327\u0323bz

-= Ⓗ

-* compare

-<2 \u0308\u0308\u0301 # acute blocked from first diaeresis, contracts with second

-= ❸❹

-* compare

-<1 \U0001D158\U0001D165

-= ¼

-* compare

-<3 \U0001D165\U0001D16D

-= ³

-** test: discontiguous contractions

-* compare

-<1 a\u0327\u030a # a+ring skips cedilla

-= Ⓓ❺

-<2 a\u0327\u0327\u030a # a+ring skips 2 cedillas

-= Ⓓ❺❺

-<2 a\u0327\u0327\u0327\u030a # a+ring skips 3 cedillas

-= Ⓓ❺❺❺

-<2 a\u0334\u0327\u0327\u030a # a+ring skips tilde overlay & 2 cedillas

-= Ⓓ❾❺❺

-<1 a\u0327\u0323 # a+dot below skips cedilla

-= Ⓔ❺

-<1 a\u0323\u0301\u0358 # a+dot below+dot ab.r.: 2-char match, then skips acute

-= Ⓕ❶

-<2 a\u0334\u0323\u0358 # a+dot below skips tilde overlay

-= Ⓕ❾

-* compare

-<2 \u0331\u0331\u0358 # macron below+dot ab.r. skips the second macron below

-= ❽❼

-* compare

-<1 a\u0327\u0331\u0323\u030a # a+ring skips cedilla, macron below, dot below (dot blocked by macron)

-= Ⓓ❺❼❻

-<1 a\u0327\u0323\U0001D16D\u030a # a+dot below skips cedilla

-= Ⓔ❺²❷

-<2 a\u0327\u0327\u0323\u030a # a+dot below skips 2 cedillas

-= Ⓔ❺❺❷

-<2 a\u0327\u0323\u0323\u030a # a+dot below skips cedilla

-= Ⓔ❺❻❷

-<2 a\u0334\u0327\u0323\u030a # a+dot below skips tilde overlay & cedilla

-= Ⓔ❾❺❷

-* compare

-<1 \U0001D158\u0327\U0001D165 # quarter note skips cedilla

-= ¼❺

-<1 a\U0001D165\u0323 # a+dot below skips stem

-= Ⓔ¹

-# partial contiguous match, backs up, matches discontiguous contraction

-<1 a\u0327\u0323b

-= Ⓔ❺b

-<1 a\u0327\u0323ba

-= Ⓔ❺bⓐ

-# a+acute+acute+dot above right skips cedilla, continues matching 2 same-ccc combining marks

-* compare

-<1 a\u0327\u0301\u0301\u0358

-= Ⓒ❺

-# FCD but not NFD

-* compare

-<1 a\u0f73\u0301 # a+acute skips tibetan ii

-= Ⓐ③

-# FCD but the 0f71 inside the 0f73 must be skipped

-# to match the discontiguous contraction of the first 0f71 with the trailing 0f72 inside the 0f73

-* compare

-<1 \u0f71\u0f73 # == \u0f73\u0f71 == \u0f71\u0f71\u0f72

-= ③①

-** test: discontiguous contractions with nested contractions

-* compare

-<1 a\u0323\u0308\u0301\u0358

-= Ⓕ❹

-<2 a\u0323\u0308\u0301\u0308\u0301\u0358

-= Ⓕ❹❹

-** test: discontiguous contractions with interleaved contractions

-* compare

-# a+ring & cedilla & macron below+dot above right

-<1 a\u0327\u0331\u030a\u0358

-= Ⓓ❺❽

-# a+ring & 1x..3x macron below+dot above right

-<2 a\u0331\u030a\u0358

-= Ⓓ❽

-<2 a\u0331\u0331\u030a\u0358\u0358

-= Ⓓ❽❽

-# also skips acute

-<2 a\u0331\u0331\u0331\u030a\u0301\u0358\u0358\u0358

-= Ⓓ❽❽❽❶

-# a+dot below & stem+augmentation dot, followed by contiguous d+z+acute

-<1 a\U0001D165\u0323\U0001D16Ddz\u0301

-= Ⓔ³ⓓ

-** test: some simple string comparisons

-@ root

-* compare

-# first string compares against ""

-= \u0000

-< a

-<1 b

-<3 B

-= \u0000B\u0000

-** test: compare with strength=primary

-% strength=primary

-* compare

-<1 a

-<1 b

-= B

-** test: compare with strength=secondary

-% strength=secondary

-* compare

-<1 a

-<1 b

-= B

-** test: compare with strength=tertiary

-% strength=tertiary

-* compare

-<1 a

-<1 b

-<3 B

-** test: compare with strength=quaternary

-% strength=quaternary

-* compare

-<1 a

-<1 b

-<3 B

-** test: compare with strength=identical

-% strength=identical

-* compare

-<1 a

-<1 b

-<3 B

-** test: côté with forwards secondary

-@ root

-* compare

-<1 cote

-<2 coté

-<2 côte

-<2 côté

-** test: côté with forwards secondary vs. U+FFFE merge separator

-# Merged sort keys: On each level, any difference in the first segment

-# must trump any further difference.

-* compare

-<1 cote\uFFFEcôté

-<2 coté\uFFFEcôte

-<2 côte\uFFFEcoté

-<2 côté\uFFFEcote

-** test: côté with backwards secondary

-% backwards=on

-* compare

-<1 cote

-<2 côte

-<2 coté

-<2 côté

-** test: côté with backwards secondary vs. U+FFFE merge separator

-# Merged sort keys: On each level, any difference in the first segment

-# must trump any further difference.

-* compare

-<1 cote\uFFFEcôté

-<2 côte\uFFFEcoté

-<2 coté\uFFFEcôte

-<2 côté\uFFFEcote

-** test: U+FFFE on identical level

-@ root

-% strength=identical

-* compare

-# All of these control codes are completely-ignorable, so that

-# their low code points are compared with the merge separator.

-# The merge separator must compare less than any other character.

-<1 \uFFFE\u0001\u0002\u0003

-<i \u0001\uFFFE\u0002\u0003

-<i \u0001\u0002\uFFFE\u0003

-<i \u0001\u0002\u0003\uFFFE

-* compare

-# The merge separator must even compare less than U+0000.

-<1 \uFFFE\u0000\u0000

-<i \u0000\uFFFE\u0000

-<i \u0000\u0000\uFFFE

-** test: Hani < surrogates < U+FFFD

-# Note: compareUTF8() treats unpaired surrogates like U+FFFD,

-# so with that the strings with surrogates will compare equal to each other

-# and equal to the string with U+FFFD.

-@ root

-% strength=identical

-* compare

-<1 abz

-<1 a\u4e00z

-<1 a\U00020000z

-<1 a\ud800z

-<1 a\udbffz

-<1 a\udc00z

-<1 a\udfffz

-<1 a\ufffdz

-** test: script reordering

-@ root

-% reorder Hani Zzzz digit

-* compare

-<1 ?

-<1 +

-<1 丂

-<1 a

-<1 α

-<1 5

-% reorder default

-* compare

-<1 ?

-<1 +

-<1 5

-<1 a

-<1 α

-<1 丂

-** test: empty rules

-@ rules

-* compare

-<1 a

-<2 ä

-<3 Ä

-<1 b

-** test: very simple rules

-@ rules

-&a=e<<<<q<<<<r<x<<<X<<y<<<Y;z,Z

-% strength=quaternary

-* compare

-<1 a

-= e

-<4 q

-<4 r

-<1 x

-<3 X

-<2 y

-<3 Y

-<2 z

-<3 Z

-** test: tailoring twice before a root position: primary

-@ rules

-&[before 1]b<p

-&[before 1]b<q

-* compare

-<1 a

-<1 p

-<1 q

-<1 b

-** test: tailoring twice before a root position: secondary

-@ rules

-&[before 2]ſ<<p

-&[before 2]ſ<<q

-* compare

-<1 s

-<2 p

-<2 q

-<2 ſ

-# secondary-before common weight

-@ rules

-&[before 2]b<<p

-&[before 2]b<<q

-* compare

-<1 a

-<1 p

-<2 q

-<2 b

-** test: tailoring twice before a root position: tertiary

-@ rules

-&[before 3]B<<<p

-&[before 3]B<<<q

-* compare

-<1 b

-<3 p

-<3 q

-<3 B

-# tertiary-before common weight

-@ rules

-&[before 3]b<<<p

-&[before 3]b<<<q

-* compare

-<1 a

-<1 p

-<3 q

-<3 b

-@ rules

-&[before 2]b<<s

-&[before 3]s<<<p

-&[before 3]s<<<q

-* compare

-<1 a

-<1 p

-<3 q

-<3 s

-<2 b

-** test: tailor after completely ignorable

-@ rules

-&\x00<<<x<<y

-* compare

-= \x00

-= \x1F

-<3 x

-<2 y

-** test: secondary tailoring gaps, ICU ticket 9362

-@ rules

-&[before 2]s<<'_'

-&s<<r # secondary between s and ſ (long s)

-&ſ<<*a-q # more than 15 between ſ and secondary CE boundary

-&[before 2][first primary ignorable]<<u<<v # between secondary CE boundary & lowest secondary CE

-&[last primary ignorable]<<y<<z

-* compare

-<2 u

-<2 v

-<2 \u0332 # lowest secondary CE

-<2 \u0308

-<2 y

-<2 z

-<1 s_

-<2 ss

-<2 sr

-<2 sſ

-<2 sa

-<2 sb

-<2 sp

-<2 sq

-<2 sus

-<2 svs

-<2 rs

-** test: tertiary tailoring gaps, ICU ticket 9362

-@ rules

-&[before 3]t<<<'_'

-&t<<<r # tertiary between t and fullwidth t

-&ᵀ<<<*a-q # more than 15 between ᵀ (modifier letter T) and tertiary CE boundary

-&[before 3][first secondary ignorable]<<<u<<<v # between tertiary CE boundary & lowest tertiary CE

-&[last secondary ignorable]<<<y<<<z

-* compare

-<3 u

-<3 v

-# Note: The root collator currently does not map any characters to tertiary CEs.

-<3 y

-<3 z

-<1 t_

-<3 tt

-<3 tr

-<3 tｔ

-<3 tᵀ

-<3 ta

-<3 tb

-<3 tp

-<3 tq

-<3 tut

-<3 tvt

-<3 rt

-** test: secondary & tertiary around root character

-@ rules

-&[before 2]m<<r

-&m<<s

-&[before 3]m<<<u

-&m<<<v

-* compare

-<1 l

-<1 r

-<2 u

-<3 m

-<3 v

-<2 s

-<1 n

-** test: secondary & tertiary around tailored item

-@ rules

-&m<x

-&[before 2]x<<r

-&x<<s

-&[before 3]x<<<u

-&x<<<v

-* compare

-<1 m

-<1 r

-<2 u

-<3 x

-<3 v

-<2 s

-<1 n

-** test: more nesting of secondary & tertiary before

-@ rules

-&[before 3]m<<<u

-&[before 2]m<<r

-&[before 3]r<<<q

-&m<<<w

-&m<<t

-&[before 3]w<<<v

-&w<<<x

-&w<<s

-* compare

-<1 l

-<1 q

-<3 r

-<2 u

-<3 m

-<3 v

-<3 w

-<3 x

-<2 s

-<2 t

-<1 n

-** test: case bits

-@ rules

-&w<x # tailored CE getting case bits

- =uv=uV=Uv=UV # 2 chars -> 1 CE

-&ae=ch=cH=Ch=CH # 2 chars -> 2 CEs

-&rst=yz=yZ=Yz=YZ # 2 chars -> 3 CEs

-% caseFirst=lower

-* compare

-<1 ae

-= ch

-<3 cH

-<3 Ch

-<3 CH

-<1 rst

-= yz

-<3 yZ

-<3 Yz

-<3 YZ

-<1 w

-<1 x

-= uv

-<3 uV

-= Uv # mixed case on single CE cannot distinguish variations

-<3 UV

-** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=lower

-@ rules

-&\u0001<<<t<<<T # tertiary CEs

-% caseFirst=lower

-* compare

-<1 aa

-<3 aat

-<3 aaT

-<3 aA

-<3 aAt

-<3 ata

-<3 aTa

-** test: tertiary CEs, tertiary, caseLevel=off, caseFirst=upper

-% caseFirst=upper

-* compare

-<1 aA

-<3 aAt

-<3 aa

-<3 aat

-<3 aaT

-<3 ata

-<3 aTa

-** test: reset on expansion, ICU tickets 9415 & 9593

-@ rules

-&æ<x # tailor the last primary CE so that x sorts between ae and af

-&æb=bæ # copy all reset CEs to make bæ sort the same

-&각<h # copy/tailor 3 CEs to make h sort before the next Hangul syllable 갂

-&⒀<<y # copy/tailor 4 CEs to make y sort with only a secondary difference

-&l·=z # handle the pre-context for · when fetching reset CEs

- <<u # copy/tailor 2 CEs

-* compare

-<1 ae

-<2 æ

-<1 x

-<1 af

-* compare

-<1 aeb

-<2 æb

-= bæ

-* compare

-<1 각

-<1 h

-<1 갂

-<1 갃

-* compare

-<1 · # by itself: primary CE

-<1 l

-<2 l· # l+middle dot has only a secondary difference from l

-= z

-<2 u

-* compare

-<1 (13)

-<3 ⒀ # DUCET sets special tertiary weights in all CEs

-<2 y

-<1 (13[

-% alternate=shifted

-* compare

-<1 (13)

-= 13

-<3 ⒀

-= y # alternate=shifted removes the tailoring difference on the last CE

-<1 14

-** test: contraction inside extension, ICU ticket 9378

-@ rules

-&а<<х/й # all letters are Cyrillic

-* compare

-<1 ай

-<2 х

-** test: no duplicate tailored CEs for different reset positions with same CEs, ICU ticket 10104

-@ rules

-&t<x &ᵀ<y # same primary weights

-&q<u &[before 1]ꝗ<v # q and ꝗ are primary adjacent

-* compare

-<1 q

-<1 u

-<1 v

-<1 ꝗ

-<1 t

-<3 ᵀ

-<1 y

-<1 x

-# Principle: Each rule builds on the state of preceding rules and ignores following rules.

-** test: later rule does not affect earlier reset position, ICU ticket 10105

-@ rules

-&a < u < v < w &ov < x &b < v

-* compare

-<1 oa

-<1 ou

-<1 x # CE(o) followed by CE between u and w

-<1 ow

-<1 ob

-<1 ov

-** test: later rule does not affect earlier extension (1), ICU ticket 10105

-@ rules

-&a=x/b &v=b

-% strength=secondary

-* compare

-<1 B

-<1 c

-<1 v

-= b

-* compare

-<1 AB

-= x

-<1 ac

-<1 av

-= ab

-** test: later rule does not affect earlier extension (2), ICU ticket 10105

-@ rules

-&a <<< c / e &g <<< e / l

-% strength=secondary

-* compare

-<1 AE

-= c

-<2 æ

-<1 agl

-= ae

-** test: later rule does not affect earlier extension (3), ICU ticket 10105

-@ rules

-&a = b / c &d = c / e

-% strength=secondary

-* compare

-<1 AC # C is still only tertiary different from the original c

-= b

-<1 ade

-= ac

-** test: extension contains tailored character, ICU ticket 10105

-@ rules

-&a=e &b=u/e

-* compare

-<1 a

-= e

-<1 ba

-= be

-= u

-** test: add simple mappings for characters with root context

-@ rules

-&z=· # middle dot has a prefix mapping in the CLDR root

-&n=и # и (U+0438) has contractions in the root

-* compare

-<1 l

-<2 l· # root mapping for l|· still works

-<1 z

-= ·

-* compare

-<1 n

-= и

-<1 И

-<1 и\u0306 # root mapping for й=и\u0306 still works

-= й

-<3 Й

-** test: add context mappings around characters with root context

-@ rules

-&z=·h # middle dot has a prefix mapping in the CLDR root

-&n=ә|и # и (U+0438) has contractions in the root

-* compare

-<1 l

-<2 l· # root mapping for l|· still works

-<1 z

-= ·h

-* compare

-<1 и

-<3 И

-<1 и\u0306 # root mapping for й=и\u0306 still works

-= й

-* compare

-<1 әn

-= әи

-<1 әo

-** test: many secondary CEs at the top of their range

-@ rules

-&[last primary ignorable]<<*\u2801-\u28ff

-* compare

-<2 \u0308

-<2 \u2801

-<2 \u2802

-<2 \u2803

-<2 \u2804

-<2 \u28fd

-<2 \u28fe

-<2 \u28ff

-<1 \x20

-** test: many tertiary CEs at the top of their range

-@ rules

-&[last secondary ignorable]<<<*a-z

-* compare

-<3 a

-<3 b

-<3 c

-<3 d

-# e..w

-<3 x

-<3 y

-<3 z

-<2 \u0308

-** test: tailor contraction together with nearly equivalent prefix, ICU ticket 10101

-@ rules

-&a=p|x &b=px &c=op

-* compare

-<1 b

-= px

-<3 B

-<1 c

-= op

-<3 C

-* compare

-<1 ca

-= opx # first contraction op, then prefix p|x

-<3 cA

-<3 Ca

-** test: reset position with prefix (pre-context), ICU ticket 10102

-@ rules

-&a=p|x &px=y

-* compare

-<1 pa

-= px

-= y

-<3 pA

-<1 q

-<1 x

-** test: prefix+contraction together (1), ICU ticket 10071

-@ rules

-&x=a|bc

-* compare

-<1 ab

-<1 Abc

-<1 abd

-<1 ac

-<1 aw

-<1 ax

-= abc

-<3 aX

-<3 Ax

-<1 b

-<1 bb

-<1 bc

-<3 bC

-<3 Bc

-<1 bd

-** test: prefix+contraction together (2), ICU ticket 10071

-@ rules

-&w=bc &x=a|b

-* compare

-<1 w

-= bc

-<3 W

-* compare

-<1 aw

-<1 ax

-= ab

-<3 aX

-<1 axb

-<1 axc

-= abc # prefix match a|b takes precedence over contraction match bc

-<3 abC

-<1 abd

-<1 ay

-** test: prefix+contraction together (3), ICU ticket 10071

-@ rules

-&x=a|b &w=bc # reverse order of rules as previous test, order should not matter here

-* compare # same "compare" sequences as previous test

-<1 w

-= bc

-<3 W

-* compare

-<1 aw

-<1 ax

-= ab

-<3 aX

-<1 axb

-<1 axc

-= abc # prefix match a|b takes precedence over contraction match bc

-<3 abC

-<1 abd

-<1 ay

-** test: no mapping p|c, falls back to contraction ch, CLDR ticket 5962

-@ rules

-&d=ch &v=p|ci

-* compare

-<1 pc

-<3 pC

-<1 pcH

-<1 pcI

-<1 pd

-= pch # no-prefix contraction ch matches

-<3 pD

-<1 pv

-= pci # prefix+contraction p|ci matches

-<3 pV

-** test: tailor in & around compact ranges of root primaries

-# The Ogham characters U+1681..U+169A are in simple ascending order of primary CEs

-# which should be reliably encoded as one range in the root elements data.

-@ rules

-&[before 1]ᚁ<a

-&ᚁ<b

-&[before 1]ᚂ<c

-&ᚂ<d

-&[before 1]ᚚ<y

-&ᚚ<z

-&[before 2]ᚁ<<r

-&ᚁ<<s

-&[before 3]ᚚ<<<t

-&ᚚ<<<u

-* compare

-<1 ᣵ # U+18F5 last Canadian Aboriginal

-<1 a

-<1 r

-<2 ᚁ

-<2 s

-<1 b

-<1 c

-<1 ᚂ

-<1 d

-<1 ᚃ

-<1 ᚙ

-<1 y

-<1 t

-<3 ᚚ

-<3 u

-<1 z

-<1 ᚠ # U+16A0 first Runic

-** test: suppressContractions

-@ rules

-&z<ch<әж [suppressContractions [·cә]]

-* compare

-<1 ch

-<3 cH # ch was suppressed

-<1 l

-<1 l· # primary difference, not secondary, because l|· was suppressed

-<1 ә

-<2 ә\u0308 # secondary difference, not primary, because contractions for ә were suppressed

-<1 әж

-<3 әЖ

-** test: Hangul & Jamo

-@ rules

-&L=\u1100 # first Jamo L

-&V=\u1161 # first Jamo V

-&T=\u11A8 # first Jamo T

-&\uAC01<<*\u4E00-\u4EFF # first Hangul LVT syllable & lots of secondary diffs

-* compare

-<1 Lv

-<3 LV

-= \u1100\u1161

-= \uAC00

-<1 LVt

-<3 LVT

-= \u1100\u1161\u11A8

-= \uAC00\u11A8

-= \uAC01

-<2 LVT\u0308

-<2 \u4E00

-<2 \u4E01

-<2 \u4E80

-<2 \u4EFF

-<2 LV\u0308T

-<1 \uAC02

-** test: adjust special reset positions according to previous rules, CLDR ticket 6070

-@ rules

-&[last variable]<x

-[maxVariable space] # has effect only after building, no effect on following rules

-&[last variable]<y

-&[before 1][first regular]<z

-* compare

-<1 ? # some punctuation

-<1 x

-<1 y

-<1 z

-<1 $ # some symbol

-@ rules

-&[last primary ignorable]<<x<<<y

-&[last primary ignorable]<<z

-* compare

-<2 \u0358

-<2 x

-<3 y

-<2 z

-<1 \x20

-@ rules

-&[last secondary ignorable]<<<x

-&[last secondary ignorable]<<<y

-* compare

-<3 x

-<3 y

-<2 \u0358

-@ rules

-&[before 2][first variable]<<z

-&[before 2][first variable]<<y

-&[before 3][first variable]<<<x

-&[before 3][first variable]<<<w

-&[before 1][first variable]<v

-&[before 2][first variable]<<u

-&[before 3][first variable]<<<t

-&[before 2]\uFDD1\xA0<<s # FractionalUCA.txt: FDD1 00A0, SPACE first primary

-* compare

-<2 \u0358

-<1 s

-<2 \uFDD1\xA0

-<1 t

-<3 u

-<2 v

-<1 w

-<3 x

-<3 y

-<2 z

-<2 \t

-@ rules

-&[before 2][first regular]<<z

-&[before 3][first regular]<<<y

-&[before 1][first regular]<x

-&[before 3][first regular]<<<w

-&[before 2]\uFDD1\u263A<<v # FractionalUCA.txt: FDD1 263A, SYMBOL first primary

-&[before 3][first regular]<<<u

-&[before 1][first regular]<p # primary before the boundary: becomes variable

-&[before 3][first regular]<<<t # not affected by p

-&[last variable]<q # after p!

-* compare

-<1 ?

-<1 p

-<1 q

-<1 t

-<3 u

-<3 v

-<1 w

-<3 x

-<1 y

-<3 z

-<1 $

-# check that p & q are indeed variable

-% alternate=shifted

-* compare

-= ?

-= p

-= q

-<1 t

-<3 u

-<3 v

-<1 w

-<3 x

-<1 y

-<3 z

-<1 $

-@ rules

-&[before 2][first trailing]<<z

-&[before 1][first trailing]<y

-&[before 3][first trailing]<<<x

-* compare

-<1 \u4E00 # first Han, first implicit

-<1 \uFDD1\uFDD0 # FractionalUCA.txt: unassigned first primary

-# Note: The root collator currently does not map any characters to the trailing first boundary primary.

-<1 x

-<3 y

-<1 z

-<2 \uFFFD # The root collator currently maps U+FFFD to the first real trailing primary.

-@ rules

-&[before 2][first primary ignorable]<<z

-&[before 2][first primary ignorable]<<y

-&[before 3][first primary ignorable]<<<x

-&[before 3][first primary ignorable]<<<w

-* compare

-= \x01

-<2 w

-<3 x

-<3 y

-<2 z

-<2 \u0301

-@ rules

-&[before 3][first secondary ignorable]<<<y

-&[before 3][first secondary ignorable]<<<x

-* compare

-= \x01

-<3 x

-<3 y

-<2 \u0301

-** test: canonical closure

-@ rules

-&X=A &U=Â

-* compare

-<1 U

-= Â

-= A\u0302

-<2 Ú # U with acute

-= U\u0301

-= Ấ # A with circumflex & acute

-= Â\u0301

-= A\u0302\u0301

-<1 X

-= A

-<2 X\u030A # with ring above

-= Å

-= A\u030A

-= \u212B # Angstrom sign

-@ rules

-&x=\u5140\u55C0

-* compare

-<1 x

-= \u5140\u55C0

-= \u5140\uFA0D

-= \uFA0C\u55C0

-= \uFA0C\uFA0D # CJK compatibility characters

-<3 X

-# canonical closure on prefix rules, ICU ticket 9444

-@ rules

-&x=ä|ŝ

-* compare

-<1 äs # not tailored

-<1 äx

-= äŝ

-= a\u0308s\u0302

-= a\u0308ŝ

-= äs\u0302

-<3 äX

-** test: conjoining Jamo map to expansions

-@ rules

-&gg=\u1101 # Jamo Lead consonant GG

-&nj=\u11AC # Jamo Trail consonant NJ

-* compare

-<1 gg\u1161nj

-= \u1101\u1161\u11AC

-= \uAE4C\u11AC

-= \uAE51

-<3 gg\u1161nJ

-<1 \u1100\u1100

-** test: canonical tail closure, ICU ticket 5913

-@ rules

-&a<â

-* compare

-<1 a

-<1 â # tailored

-= a\u0302

-<2 a\u0323\u0302 # discontiguous contraction

-= ạ\u0302 # equivalent

-= ậ # equivalent

-<1 b

-@ rules

-&a<ạ

-* compare

-<1 a

-<1 ạ # tailored

-= a\u0323

-<2 a\u0323\u0302 # contiguous contraction plus extra diacritic

-= ạ\u0302 # equivalent

-= ậ # equivalent

-<1 b

-# Tail closure should work even if there is a prefix and/or contraction.

-@ rules

-&a<\u5140|câ

-# In order to find discontiguous contractions for \u5140|câ

-# there must exist a mapping for \u5140|ca, regardless of what it maps to.

-# (This follows from the UCA spec.)

-&x=\u5140|ca

-* compare

-<1 \u5140a

-= \uFA0Ca

-<1 \u5140câ # tailored

-= \uFA0Ccâ

-= \u5140ca\u0302

-= \uFA0Cca\u0302

-<2 \u5140ca\u0323\u0302 # discontiguous contraction

-= \uFA0Cca\u0323\u0302

-= \u5140cạ\u0302

-= \uFA0Ccạ\u0302

-= \u5140cậ

-= \uFA0Ccậ

-<1 \u5140b

-= \uFA0Cb

-<1 \u5140x

-= \u5140ca

-# Double-check that without the extra mapping there will be no discontiguous match.

-@ rules

-&a<\u5140|câ

-* compare

-<1 \u5140a

-= \uFA0Ca

-<1 \u5140câ # tailored

-= \uFA0Ccâ

-= \u5140ca\u0302

-= \uFA0Cca\u0302

-<1 \u5140b

-= \uFA0Cb

-<1 \u5140ca\u0323\u0302 # no discontiguous contraction

-= \uFA0Cca\u0323\u0302

-= \u5140cạ\u0302

-= \uFA0Ccạ\u0302

-= \u5140cậ

-= \uFA0Ccậ

-@ rules

-&a<cạ

-* compare

-<1 a

-<1 cạ # tailored

-= ca\u0323

-<2 ca\u0323\u0302 # contiguous contraction plus extra diacritic

-= cạ\u0302 # equivalent

-= cậ # equivalent

-<1 b

-# ᾢ = U+1FA2 GREEK SMALL LETTER OMEGA WITH PSILI AND VARIA AND YPOGEGRAMMENI

-# = 03C9 0313 0300 0345

-# ccc = 0, 230, 230, 240

-@ rules

-&δ=αῳ

-# In order to find discontiguous contractions for αῳ

-# there must exist a mapping for αω, regardless of what it maps to.

-# (This follows from the UCA spec.)

-&ε=αω

-* compare

-<1 δ

-= αῳ

-= αω\u0345

-<2 αω\u0313\u0300\u0345 # discontiguous contraction

-= αὠ\u0300\u0345

-= αὢ\u0345

-= αᾢ

-<2 αω\u0300\u0313\u0345

-= αὼ\u0313\u0345

-= αῲ\u0313 # not FCD

-<1 ε

-= αω

-# Double-check that without the extra mapping there will be no discontiguous match.

-@ rules

-&δ=αῳ

-* compare

-<1 αω\u0313\u0300\u0345 # no discontiguous contraction

-= αὠ\u0300\u0345

-= αὢ\u0345

-= αᾢ

-<2 αω\u0300\u0313\u0345

-= αὼ\u0313\u0345

-= αῲ\u0313 # not FCD

-<1 δ

-= αῳ

-= αω\u0345

-# Add U+0315 COMBINING COMMA ABOVE RIGHT which has ccc=232.

-# Tests code paths where the tailored string has a combining mark

-# that does not occur in any composite's decomposition.

-@ rules

-&δ=αὼ\u0315

-* compare

-<1 αω\u0313\u0300\u0315 # Not tailored: The grave accent blocks the comma above.

-= αὠ\u0300\u0315

-= αὢ\u0315

-<1 δ

-= αὼ\u0315

-= αω\u0300\u0315

-<2 αω\u0300\u0315\u0345

-= αὼ\u0315\u0345

-= αῲ\u0315 # not FCD

-** test: danish a+a vs. a-umlaut, ICU ticket 9319

-@ rules

-&z<aa

-* compare

-<1 z

-<1 aa

-<2 aa\u0308

-= aä

-** test: Jamo L with and in prefix

-# Useful for the Korean "searchjl" tailoring (instead of contractions of pairs of Jamo L).

-@ rules

-# Jamo Lead consonant G after G or GG

-&[last primary ignorable]<<\u1100|\u1100=\u1101|\u1100

-# Jamo Lead consonant GG sorts like G+G

-&\u1100\u1100=\u1101

-# Note: Making G|GG and GG|GG sort the same as G|G+G

-# would require the ability to reset on G|G+G,

-# or we could make G-after-G equal to some secondary-CE character,

-# and reset on a pair of those.

-# (It does not matter much if there are at most two G in a row in real text.)

-* compare

-<1 \u1100

-<2 \u1100\u1100 # only one primary from a sequence of G lead consonants

-= \u1101

-<2 \u1100\u1100\u1100

-= \u1101\u1100

-# but not = \u1100\u1101, see above

-<1 \u1100\u1161

-= \uAC00

-<2 \u1100\u1100\u1161

-= \u1100\uAC00 # prefix match from the L of the LV syllable

-= \u1101\u1161

-= \uAE4C

-** test: proposed Korean "searchjl" tailoring with prefixes, CLDR ticket 6546

-@ rules

-# Low secondary CEs for Jamo V & T.

-# Note: T should sort before V for proper syllable order.

-&\u0332 # COMBINING LOW LINE (first primary ignorable)

-<<\u1161<<\u1162

-# Korean Jamo lead consonant search rules, part 2:

-# Make modern compound L jamo primary equivalent to non-compound forms.

-# Secondary CEs for Jamo L-after-L, greater than Jamo V & T.

-&\u0313 # COMBINING COMMA ABOVE (second primary ignorable)

-=\u1100|\u1100

-=\u1103|\u1103

-=\u1107|\u1107

-=\u1109|\u1109

-=\u110C|\u110C

-# Compound L Jamo map to equivalent expansions of primary+secondary CE.

-&\u1100\u0313=\u1101<<<\u3132 # HANGUL CHOSEONG SSANGKIYEOK, HANGUL LETTER SSANGKIYEOK

-&\u1103\u0313=\u1104<<<\u3138 # HANGUL CHOSEONG SSANGTIKEUT, HANGUL LETTER SSANGTIKEUT

-&\u1107\u0313=\u1108<<<\u3143 # HANGUL CHOSEONG SSANGPIEUP, HANGUL LETTER SSANGPIEUP

-&\u1109\u0313=\u110A<<<\u3146 # HANGUL CHOSEONG SSANGSIOS, HANGUL LETTER SSANGSIOS

-&\u110C\u0313=\u110D<<<\u3149 # HANGUL CHOSEONG SSANGCIEUC, HANGUL LETTER SSANGCIEUC

-* compare

-<1 \u1100\u1161

-= \uAC00

-<2 \u1100\u1162

-= \uAC1C

-<2 \u1100\u1100\u1161

-= \u1100\uAC00

-= \u1101\u1161

-= \uAE4C

-<3 \u3132\u1161

-** test: Hangul syllables in prefix & in the interior of a contraction

-@ rules

-&x=\u1100\u1161|a\u1102\u1162z

-* compare

-<1 \u1100\u1161x

-= \u1100\u1161a\u1102\u1162z

-= \u1100\u1161a\uB0B4z

-= \uAC00a\u1102\u1162z

-= \uAC00a\uB0B4z

-** test: digits are unsafe-backwards when numeric=on

-@ root

-% numeric=on

-* compare

-# If digits are not unsafe, then numeric collation sees "1"=="01" and "b">"a".

-# We need to back up before the identical prefix "1" and compare the full numbers.

-<1 11b

-<1 101a

-** test: simple locale data test

-@ locale de

-* compare

-<1 a

-<2 ä

-<1 ae

-<2 æ

-@ locale de-u-co-phonebk

-* compare

-<1 a

-<1 ae

-<2 ä

-<2 æ

-# The following test cases were moved here from ICU 52's DataDrivenCollationTest.txt.

-** test: DataDrivenCollationTest/TestMorePinyin

-# Testing the primary strength.

-@ locale zh

-% strength=primary

-* compare

-< lā

-= lĀ

-= Lā

-= LĀ

-< lān

-= lĀn

-< lē

-= lĒ

-= Lē

-= LĒ

-< lēn

-= lĒn

-** test: DataDrivenCollationTest/TestLithuanian

-# Lithuanian sort order.

-@ locale lt

-* compare

-< cz

-< č

-< d

-< iz

-< j

-< sz

-< š

-< t

-< zz

-< ž

-** test: DataDrivenCollationTest/TestLatvian

-# Latvian sort order.

-@ locale lv

-* compare

-< cz

-< č

-< d

-< gz

-< ģ

-< h

-< iz

-< j

-< kz

-< ķ

-< l

-< lz

-< ļ

-< m

-< nz

-< ņ

-< o

-< rz

-< ŗ

-< s

-< sz

-< š

-< t

-< zz

-< ž

-** test: DataDrivenCollationTest/TestEstonian

-# Estonian sort order.

-@ locale et

-* compare

-< sy

-< š

-< šy

-< z

-< zy

-< ž

-< v

-< va

-< w

-< õ

-< õy

-< ä

-< äy

-< ö

-< öy

-< ü

-< üy

-< x

-** test: DataDrivenCollationTest/TestAlbanian

-# Albanian sort order.

-@ locale sq

-* compare

-< cz

-< ç

-< d

-< dz

-< dh

-< e

-< ez

-< ë

-< f

-< gz

-< gj

-< h

-< lz

-< ll

-< m

-< nz

-< nj

-< o

-< rz

-< rr

-< s

-< sz

-< sh

-< t

-< tz

-< th

-< u

-< xz

-< xh

-< y

-< zz

-< zh

-** test: DataDrivenCollationTest/TestSimplifiedChineseOrder

-# Sorted file has different order.

-@ root

-# normalization=on turned on & off automatically.

-* compare

-< \u5F20

-< \u5F20\u4E00\u8E3F

-** test: DataDrivenCollationTest/TestTibetanNormalizedIterativeCrash

-# This pretty much crashes.

-@ root

-* compare

-< \u0f71\u0f72\u0f80\u0f71\u0f72

-< \u0f80

-** test: DataDrivenCollationTest/TestThaiPartialSortKeyProblems

-# These are examples of strings that caused trouble in partial sort key testing.

-@ locale th-TH

-* compare

-< \u0E01\u0E01\u0E38\u0E18\u0E20\u0E31\u0E13\u0E11\u0E4C

-< \u0E01\u0E01\u0E38\u0E2A\u0E31\u0E19\u0E42\u0E18

-* compare

-< \u0E01\u0E07\u0E01\u0E32\u0E23

-< \u0E01\u0E07\u0E42\u0E01\u0E49

-* compare

-< \u0E01\u0E23\u0E19\u0E17\u0E32

-< \u0E01\u0E23\u0E19\u0E19\u0E40\u0E0A\u0E49\u0E32

-* compare

-< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E22\u0E27

-< \u0E01\u0E23\u0E30\u0E40\u0E08\u0E35\u0E4A\u0E22\u0E27

-* compare

-< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E2D

-< \u0E01\u0E23\u0E23\u0E40\u0E0A\u0E49\u0E32

-** test: DataDrivenCollationTest/TestJavaStyleRule

-# java.text allows rules to start as '<<<x<<<y...'

-# we emulate this by assuming a &[first tertiary ignorable] in this case.

-@ rules

-&\u0001=equal<<<z<<x<<<w &[first tertiary ignorable]=a &[first primary ignorable]=b

-* compare

-= a

-= equal

-< z

-< x

-= b # x had become the new first primary ignorable

-< w

-** test: DataDrivenCollationTest/TestShiftedIgnorable

-# The UCA states that primary ignorables should be completely

-# ignorable when following a shifted code point.

-@ root

-% alternate=shifted

-% strength=quaternary

-* compare

-< a\u0020b

-= a\u0020\u0300b

-= a\u0020\u0301b

-< a_b

-= a_\u0300b

-= a_\u0301b

-< A\u0020b

-= A\u0020\u0300b

-= A\u0020\u0301b

-< A_b

-= A_\u0300b

-= A_\u0301b

-< a\u0301b

-< A\u0301b

-< a\u0300b

-< A\u0300b

-** test: DataDrivenCollationTest/TestNShiftedIgnorable

-# The UCA states that primary ignorables should be completely

-# ignorable when following a shifted code point.

-@ root

-% alternate=non-ignorable

-% strength=tertiary

-* compare

-< a\u0020b

-< A\u0020b

-< a\u0020\u0301b

-< A\u0020\u0301b

-< a\u0020\u0300b

-< A\u0020\u0300b

-< a_b

-< A_b

-< a_\u0301b

-< A_\u0301b

-< a_\u0300b

-< A_\u0300b

-< a\u0301b

-< A\u0301b

-< a\u0300b

-< A\u0300b

-** test: DataDrivenCollationTest/TestSafeSurrogates

-# It turned out that surrogates were not skipped properly

-# when iterating backwards if they were in the middle of a

-# contraction. This test assures that this is fixed.

-@ rules

-&a < x\ud800\udc00b

-* compare

-< a

-< x\ud800\udc00b

-** test: DataDrivenCollationTest/da_TestPrimary

-# This test goes through primary strength cases

-@ locale da

-% strength=primary

-* compare

-< Lvi

-< Lwi

-* compare

-< L\u00e4vi

-< L\u00f6wi

-* compare

-< L\u00fcbeck

-= Lybeck

-** test: DataDrivenCollationTest/da_TestTertiary

-# This test goes through tertiary strength cases

-@ locale da

-% strength=tertiary

-* compare

-< Luc

-< luck

-* compare

-< luck

-< L\u00fcbeck

-* compare

-< lybeck

-< L\u00fcbeck

-* compare

-< L\u00e4vi

-< L\u00f6we

-* compare

-< L\u00f6ww

-< mast

-* compare

-< A/S

-< ANDRE

-< ANDR\u00c9

-< ANDREAS

-< AS

-< CA

-< \u00c7A

-< CB

-< \u00c7C

-< D.S.B.

-< DA

-< \u00d0A

-< DB

-< \u00d0C

-< DSB

-< DSC

-< EKSTRA_ARBEJDE

-< EKSTRABUD0

-< H\u00d8ST

-< HAAG

-< H\u00c5NDBOG

-< HAANDV\u00c6RKSBANKEN

-< Karl

-< karl

-< NIELS\u0020J\u00d8RGEN

-< NIELS-J\u00d8RGEN

-< NIELSEN

-< R\u00c9E,\u0020A

-< REE,\u0020B

-< R\u00c9E,\u0020L

-< REE,\u0020V

-< SCHYTT,\u0020B

-< SCHYTT,\u0020H

-< SCH\u00dcTT,\u0020H

-< SCHYTT,\u0020L

-< SCH\u00dcTT,\u0020M

-< SS

-< \u00df

-< SSA

-< STORE\u0020VILDMOSE

-< STOREK\u00c6R0

-< STORM\u0020PETERSEN

-< STORMLY

-< THORVALD

-< THORVARDUR

-< \u00feORVAR\u00d0UR

-< THYGESEN

-< VESTERG\u00c5RD,\u0020A

-< VESTERGAARD,\u0020A

-< VESTERG\u00c5RD,\u0020B

-< \u00c6BLE

-< \u00c4BLE

-< \u00d8BERG

-< \u00d6BERG

-* compare

-< andere

-< chaque

-< chemin

-< cote

-< cot\u00e9

-< c\u00f4te

-< c\u00f4t\u00e9

-< \u010du\u010d\u0113t

-< Czech

-< hi\u0161a

-< irdisch

-< lie

-< lire

-< llama

-< l\u00f5ug

-< l\u00f2za

-< lu\u010d

-< luck

-< L\u00fcbeck

-< lye

-< l\u00e4vi

-< L\u00f6wen

-< m\u00e0\u0161ta

-< m\u00eer

-< myndig

-< M\u00e4nner

-< m\u00f6chten

-< pi\u00f1a

-< pint

-< pylon

-< \u0161\u00e0ran

-< savoir

-< \u0160erb\u016bra

-< Sietla

-< \u015blub

-< subtle

-< symbol

-< s\u00e4mtlich

-< verkehrt

-< vox

-< v\u00e4ga

-< waffle

-< wood

-< yen

-< yuan

-< yucca

-< \u017eal

-< \u017eena

-< \u017den\u0113va

-< zoo0

-< Zviedrija

-< Z\u00fcrich

-< zysk0

-< \u00e4ndere

-** test: DataDrivenCollationTest/hi_TestNewRules

-# This test goes through new rules and tests against old rules

-@ locale hi

-* compare

-< कॐ

-< कं

-< कँ

-< कः

-** test: DataDrivenCollationTest/ro_TestNewRules

-# This test goes through new rules and tests against old rules

-@ locale ro

-* compare

-< xAx

-< xă

-< xĂ

-< Xă

-< XĂ

-< xăx

-< xĂx

-< xâ

-< xÂ

-< Xâ

-< XÂ

-< xâx

-< xÂx

-< xb

-< xIx

-< xî

-< xÎ

-< Xî

-< XÎ

-< xîx

-< xÎx

-< xj

-< xSx

-< xș

-= xş

-< xȘ

-= xŞ

-< Xș

-= Xş

-< XȘ

-= XŞ

-< xșx

-= xşx

-< xȘx

-= xŞx

-< xT

-< xTx

-< xț

-= xţ

-< xȚ

-= xŢ

-< Xț

-= Xţ

-< XȚ

-= XŢ

-< xțx

-= xţx

-< xȚx

-= xŢx

-< xU

-** test: DataDrivenCollationTest/testOffsets

-# This tests cases where forwards and backwards iteration get different offsets

-@ locale en

-% strength=tertiary

-* compare

-< a\uD800\uDC00\uDC00

-< b\uD800\uDC00\uDC00

-* compare

-< \u0301A\u0301\u0301

-< \u0301B\u0301\u0301

-* compare

-< abcd\r\u0301

-< abce\r\u0301

-# TODO: test offsets in new CollationTest

-# End of test cases moved here from ICU 52's DataDrivenCollationTest.txt.

-** test: was ICU 52 cmsccoll/TestRedundantRules

-@ rules

-& a < b < c < d& [before 1] c < m

-* compare

-<1 a

-<1 b

-<1 m

-<1 c

-<1 d

-@ rules

-& a < b <<< c << d <<< e& [before 3] e <<< x

-* compare

-<1 a

-<1 b

-<3 c

-<2 d

-<3 x

-<3 e

-@ rules

-& a < b <<< c << d <<< e <<< f < g& [before 1] g < x

-* compare

-<1 a

-<1 b

-<3 c

-<2 d

-<3 e

-<3 f

-<1 x

-<1 g

-@ rules

-& a <<< b << c < d& a < m

-* compare

-<1 a

-<3 b

-<2 c

-<1 m

-<1 d

-@ rules

-&a<b<<b\u0301 &z<b

-* compare

-<1 a

-<1 b\u0301

-<1 z

-<1 b

-@ rules

-&z<m<<<q<<<m

-* compare

-<1 z

-<1 q

-<3 m

-@ rules

-&z<<<m<q<<<m

-* compare

-<1 z

-<1 q

-<3 m

-@ rules

-& a < b < c < d& r < c

-* compare

-<1 a

-<1 b

-<1 d

-<1 r

-<1 c

-@ rules

-& a < b < c < d& c < m

-* compare

-<1 a

-<1 b

-<1 c

-<1 m

-<1 d

-@ rules

-& a < b < c < d& a < m

-* compare

-<1 a

-<1 m

-<1 b

-<1 c

-<1 d

-** test: was ICU 52 cmsccoll/TestExpansionSyntax

-# The following two rules should sort the particular list of strings the same.

-@ rules

-&AE <<< a << b <<< c &d <<< f

-* compare

-<1 AE

-<3 a

-<2 b

-<3 c

-<1 d

-<3 f

-@ rules

-&A <<< a / E << b / E <<< c /E &d <<< f

-* compare

-<1 AE

-<3 a

-<2 b

-<3 c

-<1 d

-<3 f

-# The following two rules should sort the particular list of strings the same.

-@ rules

-&AE <<< a <<< b << c << d < e < f <<< g

-* compare

-<1 AE

-<3 a

-<3 b

-<2 c

-<2 d

-<1 e

-<1 f

-<3 g

-@ rules

-&A <<< a / E <<< b / E << c / E << d / E < e < f <<< g

-* compare

-<1 AE

-<3 a

-<3 b

-<2 c

-<2 d

-<1 e

-<1 f

-<3 g

-# The following two rules should sort the particular list of strings the same.

-@ rules

-&AE <<< B <<< C / D <<< F

-* compare

-<1 AE

-<3 B

-<3 F

-<1 AED

-<3 C

-@ rules

-&A <<< B / E <<< C / ED <<< F / E

-* compare

-<1 AE

-<3 B

-<3 F

-<1 AED

-<3 C

-** test: never reorder trailing primaries

-@ root

-% reorder Zzzz Grek

-* compare

-<1 L

-<1 字

-<1 Ω

-<1 \uFFFD

-<1 \uFFFF

-** test: fall back to mappings with shorter prefixes, not immediately to ones with no prefixes

-@ rules

-&u=ab|cd

-&v=b|ce

-* compare

-<1 abc

-<1 abcc

-<1 abcf

-<1 abcd

-= abu

-<1 abce

-= abv

-# With the following rules, there is only one prefix per composite ĉ or ç,

-# but both prefixes apply to just c in NFD form.

-# We would get different results for composed vs. NFD input

-# if we fell back directly from longest-prefix mappings to no-prefix mappings.

-@ rules

-&x=op|ĉ

-&y=p|ç

-* compare

-<1 opc

-<2 opć

-<1 opcz

-<1 opd

-<1 opĉ

-= opc\u0302

-= opx

-<1 opç

-= opc\u0327

-= opy

-# The mapping is used which has the longest matching prefix for which

-# there is also a suffix match, with the longest suffix match among several for that prefix.

-@ rules

-&❶=d

-&❷=de

-&❸=def

-&①=c|d

-&②=c|de

-&③=c|def

-&④=bc|d

-&⑤=bc|de

-&⑥=bc|def

-&⑦=abc|d

-&⑧=abc|de

-&⑨=abc|def

-* compare

-<1 9aadzz

-= 9aa❶zz

-<1 9aadez

-= 9aa❷z

-<1 9aadef

-= 9aa❸

-<1 9acdzz

-= 9ac①zz

-<1 9acdez

-= 9ac②z

-<1 9acdef

-= 9ac③

-<1 9bcdzz

-= 9bc④zz

-<1 9bcdez

-= 9bc⑤z

-<1 9bcdef

-= 9bc⑥

-<1 abcdzz

-= abc⑦zz

-<1 abcdez

-= abc⑧z

-<1 abcdef

-= abc⑨

-** test: prefix + discontiguous contraction with missing prefix contraction

-# Unfortunate terminology: The first "prefix" here is the pre-context,

-# the second "prefix" refers to the contraction/relation string that is

-# one shorter than the one being tested.

-@ rules

-&x=p|e

-&y=p|ê

-&z=op|ê

-# No mapping for op|e:

-# Discontiguous contraction matching should not match op|ê in opệ

-# because it would have to skip the dot below and extend a match on op|e by the circumflex,

-# but there is no match on op|e.

-* compare

-<1 oPe

-<1 ope

-= opx

-<1 opệ

-= opy\u0323 # y not z

-<1 opê

-= opz

-# We cannot test for fallback by whether the contraction default CE32

-# is for another contraction. With the following rules, there is no mapping for op|e,

-# and the fallback to prefix p has no contractions.

-@ rules

-&x=p|e

-&z=op|ê

-* compare

-<1 oPe

-<1 ope

-= opx

-<2 opệ

-= opx\u0323\u0302 # x not z

-<1 opê

-= opz

-# One more variation: Fallback to the simple code point, no shorter non-empty prefix.

-@ rules

-&x=e

-&z=op|ê

-* compare

-<1 ope

-= opx

-<3 oPe

-= oPx

-<2 opệ

-= opx\u0323\u0302 # x not z

-<1 opê

-= opz

-** test: maxVariable via rules

-@ rules

-[maxVariable space][alternate shifted]

-* compare

-= \u0020

-= \u000A

-<1 .

-<1 ° # degree sign

-<1 $

-<1 0

-** test: maxVariable via setting

-@ root

-% maxVariable=currency

-% alternate=shifted

-* compare

-= \u0020

-= \u000A

-= .

-= ° # degree sign

-= $

-<1 0

-** test: ICU4J CollationMiscTest/TestContractionClosure (ää)

-# This tests canonical closure, but it also tests that CollationFastLatin

-# bails out properly for contractions with combining marks.

-# For that we need pairs of strings that remain in the Latin fastpath

-# long enough, hence the extra "= b" lines.

-@ rules

-&b=\u00e4\u00e4

-* compare

-<1 b

-= \u00e4\u00e4

-= b

-= a\u0308a\u0308

-= b

-= \u00e4a\u0308

-= b

-= a\u0308\u00e4

-** test: ICU4J CollationMiscTest/TestContractionClosure (Å)

-@ rules

-&b=\u00C5

-* compare

-<1 b

-= \u00C5

-= b

-= A\u030A

-= b

-= \u212B

-** test: reset-before on already-tailored characters, ICU ticket 10108

-@ rules

-&a<w<<x &[before 2]x<<y

-* compare

-<1 a

-<1 w

-<2 y

-<2 x

-@ rules

-&a<<w<<<x &[before 2]x<<y

-* compare

-<1 a

-<2 y

-<2 w

-<3 x

-@ rules

-&a<w<x &[before 2]x<<y

-* compare

-<1 a

-<1 w

-<1 y

-<2 x

-@ rules

-&a<w<<<x &[before 2]x<<y

-* compare

-<1 a

-<1 y

-<2 w

-<3 x

-** test: numeric collation with other settings, ICU ticket 9092

-@ root

-% strength=identical

-% caseFirst=upper

-% numeric=on

-* compare

-<1 100\u0020a

-<1 101

-** test: collation type fallback from unsupported type, ICU ticket 10149

-@ locale fr-CA-u-co-phonebk

-# Expect the same result as with fr-CA, using backwards-secondary order.

-# That is, we should fall back from the unsupported collation type

-# to the locale's default collation type.

-* compare

-<1 cote

-<2 côte

-<2 coté

-<2 côté

-** test: @ is equivalent to [backwards 2], ICU ticket 9956

-@ rules

-&b<a @ &v<<w

-* compare

-<1 b

-<1 a

-<1 cote

-<2 côte

-<2 coté

-<2 côté

-<1 v

-<2 w

-<1 x

-** test: shifted+reordering, ICU ticket 9507

-@ root

-% reorder Grek punct space

-% alternate=shifted

-% strength=quaternary

-# Which primaries are "variable" should be determined without script reordering,

-# and then primaries should be reordered whether they are shifted to quaternary or not.

-* compare

-<4 ( # punctuation

-<4 )

-<4 \u0020 # space

-<1 ` # symbol

-<1 ^

-<1 $ # currency symbol

-<1 €

-<1 0 # numbers

-<1 ε # Greek

-<1 e # Latin

-<1 e(e

-<4 e)e

-<4 e\u0020e

-<4 ee

-<3 e(E

-<4 e)E

-<4 e\u0020E

-<4 eE

-** test: "uppercase first" could sort a string before its prefix, ICU ticket 9351

-@ rules

-&\u0001<<<b<<<B

-% caseFirst=upper

-* compare

-<1 aaa

-<3 aaaB

-** test: secondary+case ignores secondary ignorables, ICU ticket 9355

-@ rules

-&\u0001<<<b<<<B

-% strength=secondary

-% caseLevel=on

-* compare

-<1 a

-= ab

-= aB

-** test: custom collation rules involving tail of a contraction in Malayalam, ICU ticket 6328

-@ rules

-&[before 2] ൌ << ൗ # U+0D57 << U+0D4C == 0D46+0D57

-* compare

-<1 ൗx

-<2 ൌx

-<1 ൗy

-<2 ൌy

-** test: quoted apostrophe in compact syntax, ICU ticket 8204

-@ rules

-&q<<*a''c

-* compare

-<1 d

-<1 p

-<1 q

-<2 a

-<2 \u0027

-<2 c

-<1 r

-# ICU ticket #8260 "Support all collation-related keywords in Collator.getInstance()"

-** test: locale -u- with collation keywords, ICU ticket 8260

-@ locale de-u-kv-sPace-ka-shifTed-kn-kk-falsE-kf-Upper-kc-tRue-ks-leVel4

-* compare

-<4 \u0020 # space is shifted, strength=quaternary

-<1 ! # punctuation is regular

-<1 2

-<1 12 # numeric sorting

-<1 B

-<c b # uppercase first on case level

-<1 x\u0301\u0308

-<2 x\u0308\u0301 # normalization off

-** test: locale @ with collation keywords, ICU ticket 8260

-@ locale fr@colbAckwards=yes;ColStrength=Quaternary;kv=currencY;colalternate=shifted

-* compare

-<4 $ # currency symbols are shifted, strength=quaternary

-<1 àla

-<2 alà # backwards secondary level

-** test: locale -u- with script reordering, ICU ticket 8260

-@ locale el-u-kr-kana-SYMBOL-Grek-hani-cyrl-latn-digit-armn-deva-ethi-thai

-* compare

-<1 \u0020

-<1 あ

-<1 ☂

-<1 Ω

-<1 丂

-<1 ж

-<1 L

-<1 4

-<1 Ձ

-<1 अ

-<1 ሄ

-<1 ฉ

-** test: locale @collation=type should be case-insensitive

-@ locale de@coLLation=PhoneBook

-* compare

-<1 ae

-<2 ä

-<3 Ä

-** test: import root search rules plus German phonebook rules, ICU ticket 8962

-@ locale de-u-co-search

-* compare

-<1 =

-<1 ≠

-<1 a

-<1 ae

-<2 ä

-# Once more, but with runtime builder.

-@ rules

-[import und-u-co-search][import de-u-co-phonebk]

-* compare

-<1 =

-<1 ≠

-<1 a

-<1 ae

-<2 ä

-# Once again, with import from "root" not "und" (as in a proper language tag).

-@ rules

-[import root-u-co-search][import de-u-co-phonebk]

-* compare

-<1 =

-<1 ≠

-<1 a

-<1 ae

-<2 ä

-** test: import rules from a language with non-Latin native script, and reset the reordering, ICU ticket 10998

-# Greek should sort Greek first.

-@ rules

-[import el]

-* compare

-<1 4

-<1 Ω

-<1 L

-# Import Greek, and then reset the reordering.

-@ rules

-[import el][reorder Zzzz]

-* compare

-<1 4

-<1 L

-<1 Ω

-# "others" is a synonym for Zzzz.

-@ rules

-[import el][reorder others]

-* compare

-<1 4

-<1 L

-<1 Ω

-** test: regression test for CollationFastLatinBuilder, ICU ticket 11388

-@ rules

-&x<<aa<<<Aa<<<AA

-% strength=secondary

-* compare

-<1 AA

-<2 Aẩ

-<2 aą

-* compare

-<1 AA

-<2 aą

-** test: tailor tertiary-after a common tertiary where there is a lower one

-# Assume that Hiragana small A has a below-common tertiary, and Hiragana A has a common one.

-# See ICU ticket 11448 & CLDR ticket 7222.

-@ rules

-&あ<<<x<<<y<<<z

-* compare

-<1 ぁ

-<3 あ

-<3 x

-<3 y

-<3 z

-<3 ァ

-<1 い

-** test: tailor tertiary-after a below-common tertiary

-@ rules

-&ぁ<<<x<<<y<<<z

-* compare

-<1 ぁ

-<3 x

-<3 y

-<3 z

-<3 あ

-<3 ァ

-<1 い

-** test: tailor tertiary-before a common tertiary where there is a lower one

-@ rules

-&[before 3]あ<<<x<<<y<<<z

-* compare

-<1 ぁ

-<3 x

-<3 y

-<3 z

-<3 あ

-<3 ァ

-<1 い

-** test: tailor tertiary-before a below-common tertiary

-@ rules

-&[before 3]ぁ<<<x<<<y<<<z

-* compare

-<1 x

-<3 y

-<3 z

-<3 ぁ

-<3 あ

-<3 ァ

-<1 い

-** test: reorder single scripts not groups, ICU ticket 11449

-@ root

-% reorder Goth Latn

-* compare

-<1 4

-<1 𐌰 # Gothic

-<1 L

-<1 Ω

-# Before ICU 55, the following reordered together with Gothic.

-<1 𐌈 # Old Italic

-<1 𐑐 # Shavian

« no previous file with comments | « source/test/testdata/casing.txt ('k') | source/test/testdata/conversion.txt » ('j') | no next file with comments »