icu46/source/i18n/regexcst.txt - Issue 5516007: Check in the pristine copy of ICU 4.6...

Side by Side Diff: icu46/source/i18n/regexcst.txt

Issue 5516007: Check in the pristine copy of ICU 4.6... (Closed) Base URL: svn://chrome-svn/chrome/trunk/deps/third_party/

Patch Set: Created 10 years ago

Property Changes:

Added: svn:eol-style
+ LF

OLD	NEW
(Empty)
	1

	2 #*****************************************************************************

	3 #

	4 # Copyright (C) 2002-2007, International Business Machines Corporation and others.

	5 # All Rights Reserved.

	6 #

	7 #*****************************************************************************

	8 #

	9 # file: regexcst.txt

	10 # ICU Regular Expression Parser State Table

	11 #

	12 # This state table is used when reading and parsing a regular expression pattern

	13 # The pattern parser uses a state machine; the data in this file define the

	14 # state transitions that occur for each input character.

	15 #

	16 # *** This file defines the regex pattern grammar. This is it.

	17 # *** The determination of what is accepted is here.

	18 #

	19 # This file is processed by a perl script "regexcst.pl" to produce initialized C arrays

	20 # that are then built with the rule parser.

	21 #

	22

	23 #

	24 # Here is the syntax of the state definitions in this file:

	25 #

	26 #

	27 #StateName:
	28 #   input-char           n next-state           ^push-state     action
	29 #   input-char           n next-state           ^push-state     action
	30 #       |                |   |                      |             |
	31 #       |                |   |                      |             |--- action to be performed by state machine
	32 #       |                |   |                      |                  See function RBBIRuleScanner::doParseActions()
	33 #       |                |   |                      |
	34 #       |                |   |                      |--- Push this named state onto the state stack.
	35 #       |                |   |                           Later, when next state is specified as "pop",
	36 #       |                |   |                           the pushed state will become the current state.
	37 #       |                |   |
	38 #       |                |   |--- Transition to this state if the current input character matches the input
	39 #       |                |        character or char class in the left hand column. "pop" causes the next
	40 #       |                |        state to be popped from the state stack.
	41 #       |                |
	42 #       |                |--- When making the state transition specified on this line, advance to the next
	43 #       |                     character from the input only if 'n' appears here.
	44 #       |
	45 #       |--- Character or named character classes to test for. If the current character being scanned
	46 #            matches, perform the actions and go to the state specified on this line.
	47 #            The input character is tested sequentially, in the order written. The characters and
	48 #            character classes tested for do not need to be mutually exclusive. The first match wins.

	49 #
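Reviewer sketch (not part of the patch): the row semantics described in the header comment can be modelled with a small table-driven loop. The Python below is a minimal illustration under stated assumptions. `Row`, `TABLE`, `run`, and the pseudo-state `"exit"` are hypothetical names invented for this sketch, the table is a toy subset of the real one, and actions are only recorded by name instead of being executed as the generated C arrays and ICU's parser would do.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    test: str               # a literal character, or "eof" / "default"
    advance: bool           # True when 'n' appears in the row
    next_state: str         # a state name, or "pop"
    push: Optional[str]     # state named after '^', if any
    action: Optional[str]   # action name, if any


# Toy subset of the table. "exit" is an assumption of this sketch; the real
# table stays in "term" on eof and lets the action stop the machine.
TABLE = {
    "start": [Row("default", False, "term", None, "doPatStart")],
    "term": [
        Row("(", True, "term", "expr-cont", "doOpenCaptureParen"),
        Row(")", True, "pop", None, "doCloseParen"),
        Row("eof", False, "exit", None, "doPatFinish"),
        Row("default", True, "expr-cont", None, "doLiteralChar"),
    ],
    "expr-cont": [
        Row(")", True, "pop", None, "doCloseParen"),
        Row("default", False, "term", None, None),
    ],
}


def run(pattern: str) -> List[str]:
    """Walk the toy table over `pattern` and return the action names fired."""
    state, pos = "start", 0
    stack: List[str] = []
    actions: List[str] = []
    while state != "exit":
        ch = pattern[pos] if pos < len(pattern) else None
        # Rows are tested in the order written; the first match wins.
        for row in TABLE[state]:
            if (row.test == "default"
                    or (row.test == "eof" and ch is None)
                    or row.test == ch):
                break
        else:
            raise ValueError("no matching row in state " + state)
        if row.action:
            actions.append(row.action)
        if row.push:
            stack.append(row.push)
        if row.advance:
            pos += 1
        if row.next_state == "pop":
            if not stack:
                raise ValueError("unbalanced ')'")
            state = stack.pop()
        else:
            state = row.next_state
    return actions
```

For example, `run("a(b)c")` records `doPatStart`, then a literal, an open paren, a literal, a close paren, a literal, and finally `doPatFinish`. The point of the sketch is only to show how the `n` column, the `^push-state` column, and `pop` interact.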

	50

	51

	52

	53

	54 #

	55 # start state, scan position is at the beginning of the pattern.

	56 #

	57 start:

	58 default term doPatStart

	59

	60

	61

	62

	63 #

	64 # term. At a position where we can accept the start of most items in a pattern.

	65 #

	66 term:

	67 quoted n expr-quant doLiteralChar

	68 rule_char n expr-quant doLiteralChar

	69 '[' n set-open ^set-finish doSetBegin

	70 '(' n open-paren

	71 '.' n expr-quant doDotAny

	72 '^' n expr-quant doCaret

	73 '$' n expr-quant doDollar

	74 '\' n backslash

	75 '\|' n term doOrOperator

	76 ')' n pop doCloseParen

	77 eof term doPatFinish

	78 default errorDeath doRuleError

	79

	80

	81

	82 #

	83 # expr-quant We've just finished scanning a term, now look for the optional

	84 # trailing quantifier - *, +, ?, *?, etc.

	85 #

	86 expr-quant:

	87 '*' n quant-star

	88 '+' n quant-plus

	89 '?' n quant-opt

	90 '{' n interval-open doIntervalInit

	91 '(' n open-paren-quant

	92 default expr-cont

	93

	94

	95 #

	96 # expr-cont Expression, continuation. At a point where additional terms are

	97 # allowed, but not required. No Quantifiers

	98 #

	99 expr-cont:

	100 '\|' n term doOrOperator

	101 ')' n pop doCloseParen

	102 default term

	103

	104

	105 #

	106 # open-paren-quant Special case handling for comments appearing before a quantifier,

	107 # e.g. x(?#comment )*

	108 # Open parens from expr-quant come here; anything but a (?# comment

	109 # branches into the normal parenthesis sequence as quickly as possible.

	110 #

	111 open-paren-quant:

	112 '?' n open-paren-quant2 doSuppressComments

	113 default open-paren

	114

	115 open-paren-quant2:

	116 '#' n paren-comment ^expr-quant

	117 default open-paren-extended

	118

	119

	120 #

	121 # open-paren We've got an open paren. We need to scan further to

	122 # determine what kind of quantifier it is - plain (, (?:, (?>, or whatever.

	123 #

	124 open-paren:

	125 '?' n open-paren-extended doSuppressComments

	126 default term ^expr-quant doOpenCaptureParen

	127

	128 open-paren-extended:

	129 ':' n term ^expr-quant doOpenNonCaptureParen # (?:

	130 '>' n term ^expr-quant doOpenAtomicParen # (?>

	131 '=' n term ^expr-cont doOpenLookAhead # (?=

	132 '!' n term ^expr-cont doOpenLookAheadNeg # (?!

	133 '<' n open-paren-lookbehind

	134 '#' n paren-comment ^term

	135 'i' paren-flag doBeginMatchMode

	136 'd' paren-flag doBeginMatchMode

	137 'm' paren-flag doBeginMatchMode

	138 's' paren-flag doBeginMatchMode

	139 'u' paren-flag doBeginMatchMode

	140 'w' paren-flag doBeginMatchMode

	141 'x' paren-flag doBeginMatchMode

	142 '-' paren-flag doBeginMatchMode

	143 '(' n errorDeath doConditionalExpr

	144 '{' n errorDeath doPerlInline

	145 default errorDeath doBadOpenParenType

	146

	147 open-paren-lookbehind:

	148 '=' n term ^expr-cont doOpenLookBehind # (?<=

	149 '!' n term ^expr-cont doOpenLookBehindNeg # (?<!

	150 default errorDeath doBadOpenParenType

	151

	152

	153 #

	154 # paren-comment We've got a (?# ... ) style comment. Eat pattern text till we get to the ')'

	155 #

	156 paren-comment:

	157 ')' n pop

	158 eof errorDeath doMismatchedParenErr

	159 default n paren-comment

	160

	161 #

	162 # paren-flag Scanned a (?ismx-ismx flag setting

	163 #

	164 paren-flag:

	165 'i' n paren-flag doMatchMode

	166 'd' n paren-flag doMatchMode

	167 'm' n paren-flag doMatchMode

	168 's' n paren-flag doMatchMode

	169 'u' n paren-flag doMatchMode

	170 'w' n paren-flag doMatchMode

	171 'x' n paren-flag doMatchMode

	172 '-' n paren-flag doMatchMode

	173 ')' n term doSetMatchMode

	174 ':' n term ^expr-quant doMatchModeParen

	175 default errorDeath doBadModeFlag

	176

	177

	178 #

	179 # quant-star Scanning a '*' quantifier. Need to look ahead to decide

	180 # between plain '*', '*?', '*+'

	181 #

	182 quant-star:

	183 '?' n expr-cont doNGStar # *?

	184 '+' n expr-cont doPossessiveStar # *+

	185 default expr-cont doStar

	186

	187

	188 #

	189 # quant-plus Scanning a '+' quantifier. Need to look ahead to decide

	190 # between plain '+', '+?', '++'

	191 #

	192 quant-plus:

	193 '?' n expr-cont doNGPlus # +?

	194 '+' n expr-cont doPossessivePlus # ++

	195 default expr-cont doPlus

	196

	197

	198 #

	199 # quant-opt Scanning a '?' quantifier. Need to look ahead to decide

	200 # between plain '?', '??', '?+'

	201 #

	202 quant-opt:

	203 '?' n expr-cont doNGOpt # ??

	204 '+' n expr-cont doPossessiveOpt # ?+

	205 default expr-cont doOpt # ?
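Reviewer sketch (not part of the patch): the three states quant-star, quant-plus and quant-opt share one shape. After the quantifier character, a following '?' selects the non-greedy action, a following '+' selects the possessive action, and anything else selects the plain action without consuming input. The helper below is a hypothetical Python illustration of that lookahead; `scan_quantifier` is not an ICU function, and it only returns the action name that the table would fire.

```python
from typing import Tuple

# Maps the quantifier character to the suffix used in the table's action names.
_BASE = {"*": "Star", "+": "Plus", "?": "Opt"}


def scan_quantifier(text: str, pos: int) -> Tuple[str, int]:
    """text[pos] must be '*', '+' or '?'. Return (action_name, new_pos)."""
    base = _BASE[text[pos]]
    pos += 1                                   # expr-quant rows carry 'n'
    nxt = text[pos] if pos < len(text) else None
    if nxt == "?":
        return "doNG" + base, pos + 1          # e.g. *?  +?  ??
    if nxt == "+":
        return "doPossessive" + base, pos + 1  # e.g. *+  ++  ?+
    return "do" + base, pos                    # default row: no 'n', no advance
```

The default branch leaves the position unchanged, which mirrors the rows that have no `n` in the second column.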

	206

	207

	208 #

	209 # Interval scanning a '{', the opening delimiter for an interval specification

	210 # {number} or {min, max} or {min,}

	211 #

	212 interval-open:

	213 digit_char interval-lower

	214 default errorDeath doIntervalError

	215

	216 interval-lower:

	217 digit_char n interval-lower doIntevalLowerDigit

	218 ',' n interval-upper

	219 '}' n interval-type doIntervalSame # {n}

	220 default errorDeath doIntervalError

	221

	222 interval-upper:

	223 digit_char n interval-upper doIntervalUpperDigit

	224 '}' n interval-type

	225 default errorDeath doIntervalError

	226

	227 interval-type:

	228 '?' n expr-cont doNGInterval # {n,m}?

	229 '+' n expr-cont doPossessiveInterval # {n,m}+

	230 default expr-cont doInterval # {m,n}
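Reviewer sketch (not part of the patch): the four interval states accept {n}, {n,m} and {n,}, then apply the same '?' / '+' lookahead as the other quantifiers. The Python below is a hand-written approximation of that path, not generated from the table. `scan_interval` is a hypothetical name, digits are restricted to ASCII for simplicity, and an open upper bound is represented as `None`, which is an assumption of this sketch and not how ICU stores it.

```python
from typing import Optional, Tuple

DIGITS = "0123456789"  # simplification: the table's digit_char class may be wider


def scan_interval(text: str, pos: int) -> Tuple[int, Optional[int], str, int]:
    """text[pos] is the character right after '{'.

    Returns (lower, upper, mode, new_pos); upper is None for {n,}.
    Raises ValueError where the table would go to errorDeath.
    """
    n = len(text)
    # interval-open: the first character must be a digit.
    if pos >= n or text[pos] not in DIGITS:
        raise ValueError("doIntervalError")
    # interval-lower: accumulate digits of the lower bound.
    lower = 0
    while pos < n and text[pos] in DIGITS:
        lower = lower * 10 + int(text[pos])
        pos += 1
    upper: Optional[int] = lower               # {n} means exactly n
    if pos < n and text[pos] == ",":
        pos += 1
        upper = None                           # {n,} unless digits follow
        # interval-upper: accumulate digits of the upper bound.
        while pos < n and text[pos] in DIGITS:
            upper = (upper or 0) * 10 + int(text[pos])
            pos += 1
    if pos >= n or text[pos] != "}":
        raise ValueError("doIntervalError")
    pos += 1
    # interval-type: optional '?' or '+' suffix.
    nxt = text[pos] if pos < n else None
    if nxt == "?":
        return lower, upper, "non-greedy", pos + 1
    if nxt == "+":
        return lower, upper, "possessive", pos + 1
    return lower, upper, "greedy", pos
```

Note that a leading comma, as in `{,5}`, is rejected, matching the interval-open state, which only accepts a digit.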

	231

	232

	233 #

	234 # backslash # Backslash. Figure out which of the \thingies we have encountered.

	235 # The low level next-char function will have preprocessed

	236 # some of them already; those won't come here.

	237 backslash:

	238 'A' n term doBackslashA

	239 'B' n term doBackslashB

	240 'b' n term doBackslashb

	241 'd' n expr-quant doBackslashd

	242 'D' n expr-quant doBackslashD

	243 'G' n term doBackslashG

	244 'N' expr-quant doNamedChar # \N{NAME} named char

	245 'p' expr-quant doProperty # \p{Lu} style property

	246 'P' expr-quant doProperty

	247 'Q' n term doEnterQuoteMode

	248 'S' n expr-quant doBackslashS

	249 's' n expr-quant doBackslashs

	250 'W' n expr-quant doBackslashW

	251 'w' n expr-quant doBackslashw

	252 'X' n expr-quant doBackslashX

	253 'Z' n term doBackslashZ

	254 'z' n term doBackslashz

	255 digit_char n expr-quant doBackRef # Will scan multiple digits

	256 eof errorDeath doEscapeError

	257 default n expr-quant doEscapedLiteralChar

	258

	259

	260

	261 #

	262 # [set expression] parsing,

	263 # All states involved in parsing set expressions have names beginning with "set-"

	264 #

	265

	266 set-open:

	267 '^' n set-open2 doSetNegate

	268 ':' set-posix doSetPosixProp

	269 default set-open2

	270

	271 set-open2:

	272 ']' n set-after-lit doSetLiteral

	273 default set-start

	274

	275 # set-posix:

	276 # scanned a '[:' If it really is a [:property:], doSetPosixProp will have

	277 # moved the scan to the closing ']'. If it wasn't a property

	278 # expression, the scan will still be at the opening ':', which should

	279 # be interpreted as a normal set expression.

	280 set-posix:

	281 ']' n pop doSetEnd

	282 ':' set-start

	283 default errorDeath doRuleError # should not be possible.

	284

	285 #

	286 # set-start after the [ and special case leading characters (^ and/or ]) but before

	287 # everything else. A '-' is literal at this point.

	288 #

	289 set-start:

	290 ']' n pop doSetEnd

	291 '[' n set-open ^set-after-set doSetBeginUnion

	292 '\' n set-escape

	293 '-' n set-start-dash

	294 '&' n set-start-amp

	295 default n set-after-lit doSetLiteral

	296

	297 # set-start-dash Turn "[--" into a syntax error.

	298 # "[-x" is good, - and x are literals.

	299 #

	300 set-start-dash:

	301 '-' errorDeath doRuleError

	302 default set-after-lit doSetAddDash

	303

	304 # set-start-amp Turn "[&&" into a syntax error.

	305 # "[&x" is good, & and x are literals.

	306 #

	307 set-start-amp:

	308 '&' errorDeath doRuleError

	309 default set-after-lit doSetAddAmp

	310

	311 #

	312 # set-after-lit The last thing scanned was a literal character within a set.

	313 # Can be followed by anything. Single '-' or '&' are

	314 # literals in this context, not operators.

	315 set-after-lit:

	316 ']' n pop doSetEnd

	317 '[' n set-open ^set-after-set doSetBeginUnion

	318 '-' n set-lit-dash

	319 '&' n set-lit-amp

	320 '\' n set-escape

	321 eof errorDeath doSetNoCloseError

	322 default n set-after-lit doSetLiteral

	323

	324 set-after-set:

	325 ']' n pop doSetEnd

	326 '[' n set-open ^set-after-set doSetBeginUnion

	327 '-' n set-set-dash

	328 '&' n set-set-amp

	329 '\' n set-escape

	330 eof errorDeath doSetNoCloseError

	331 default n set-after-lit doSetLiteral

	332

	333 set-after-range:

	334 ']' n pop doSetEnd

	335 '[' n set-open ^set-after-set doSetBeginUnion

	336 '-' n set-range-dash

	337 '&' n set-range-amp

	338 '\' n set-escape

	339 eof errorDeath doSetNoCloseError

	340 default n set-after-lit doSetLiteral

	341

	342

	343 # set-after-op

	344 # After a -- or &&

	345 # It is an error to close a set at this point.

	346 #

	347 set-after-op:

	348 '[' n set-open ^set-after-set doSetBeginUnion

	349 ']' errorDeath doSetOpError

	350 '\' n set-escape

	351 default n set-after-lit doSetLiteral

	352

	353 #

	354 # set-set-amp

	355 # Have scanned [[set]&

	356 # Could be a '&' intersection operator, if a set follows.

	357 # Could be the start of a '&&' operator.

	358 # Otherwise is a literal.

	359 set-set-amp:

	360 '[' n set-open ^set-after-set doSetBeginIntersection1

	361 '&' n set-after-op doSetIntersection2

	362 default set-after-lit doSetAddAmp

	363

	364

	365 # set-lit-amp Have scanned "[literals&"

	366 # Could be a start of "&&" operator or a literal

	367 # In [abc&[def]], the '&' is a literal

	368 #

	369 set-lit-amp:

	370 '&' n set-after-op doSetIntersection2

	371 default set-after-lit doSetAddAmp

	372

	373

	374 #

	375 # set-set-dash

	376 # Have scanned [set]-

	377 # Could be a '-' difference operator, if a [set] follows.

	378 # Could be the start of a '--' operator.

	379 # Otherwise is a literal.

	380 set-set-dash:

	381 '[' n set-open ^set-after-set doSetBeginDifference1

	382 '-' n set-after-op doSetDifference2

	383 default set-after-lit doSetAddDash

	384

	385

	386 #

	387 # set-range-dash

	388 # scanned a-b- or \w-

	389 # any set or range like item where the trailing single '-' should

	390 # be literal, not a set difference operation.

	391 # A trailing "--" is still a difference operator.

	392 set-range-dash:

	393 '-' n set-after-op doSetDifference2

	394 default set-after-lit doSetAddDash

	395

	396

	397 set-range-amp:

	398 '&' n set-after-op doSetIntersection2

	399 default set-after-lit doSetAddAmp

	400

	401

	402 # set-lit-dash

	403 # Have scanned "[literals-" Could be a range or a -- operator or a literal

	404 # In [abc-[def]], the '-' is a literal (confirmed with a Java test)

	405 # [abc-\p{xx} the '-' is an error

	406 # [abc-] the '-' is a literal

	407 # [ab-xy] the '-' is a range

	408 #

	409 set-lit-dash:

	410 '-' n set-after-op doSetDifference2

	411 '[' set-after-lit doSetAddDash

	412 ']' set-after-lit doSetAddDash

	413 '\' n set-lit-dash-escape

	414 default n set-after-range doSetRange
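Reviewer sketch (not part of the patch): the set-lit-dash rows decide what a '-' after a literal means by looking at the next character only. The hypothetical Python helper below restates those five rows as a lookup. `classify_dash` is an illustrative name, and for the backslash case it returns the name of the follow-up state because that row has no action of its own.

```python
from typing import Optional


def classify_dash(next_char: Optional[str]) -> str:
    """Given the character after "[literals-", name the row that would fire."""
    if next_char == "-":
        return "doSetDifference2"       # "--" set difference operator
    if next_char in ("[", "]"):
        return "doSetAddDash"           # the '-' is a literal; char not consumed
    if next_char == "\\":
        return "set-lit-dash-escape"    # decided by the escape that follows
    return "doSetRange"                 # e.g. the 'x' in [ab-xy] closes a range
```

This mirrors the examples in the comment block above the state: `[abc-]` and `[abc-[def]]` treat the dash as a literal, while `[ab-xy]` forms a range.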

	415

	416 # set-lit-dash-escape

	417 #

	418 # scanned "[literal-\"

	419 # Could be a range, if the \ introduces an escaped literal char or a named char.

	420 # Otherwise it is an error.

	421 #

	422 set-lit-dash-escape:

	423 's' errorDeath doSetOpError

	424 'S' errorDeath doSetOpError

	425 'w' errorDeath doSetOpError

	426 'W' errorDeath doSetOpError

	427 'd' errorDeath doSetOpError

	428 'D' errorDeath doSetOpError

	429 'N' set-after-range doSetNamedRange

	430 default n set-after-range doSetRange

	431

	432

	433 #

	434 # set-escape

	435 # Common back-slash escape processing within set expressions

	436 #

	437 set-escape:

	438 'p' set-after-set doSetProp

	439 'P' set-after-set doSetProp

	440 'N' set-after-lit doSetNamedChar

	441 's' n set-after-range doSetBackslash_s

	442 'S' n set-after-range doSetBackslash_S

	443 'w' n set-after-range doSetBackslash_w

	444 'W' n set-after-range doSetBackslash_W

	445 'd' n set-after-range doSetBackslash_d

	446 'D' n set-after-range doSetBackslash_D

	447 default n set-after-lit doSetLiteralEscaped

	448

	449 #

	450 # set-finish

	451 # Have just encountered the final ']' that completes a [set], and

	452 # arrived here via a pop. From here, we exit the set parsing world, and go

	453 # back to generic regular expression parsing.

	454 #

	455 set-finish:

	456 default expr-quant doSetFinish

	457

	458

	459 #

	460 # errorDeath. This state is specified as the next state whenever a syntax error

	461 # in the source rules is detected. Barring bugs, the state machine will never

	462 # actually get here, but will stop because of the action associated with the error.

	463 # But, just in case, this state asks the state machine to exit.

	464 errorDeath:

	465 default n errorDeath doExit

	466

	467
