OLD | NEW |
(Empty) | |
| 1 Version 3.1 |
| 2 ----------------------------- |
| 3 02/28/09: beazley |
| 4 Fixed broken start argument to yacc(). PLY-3.0 broke this |
| 5 feature by accident. |
| 6 |
| 7 02/28/09: beazley |
| 8 Fixed debugging output. yacc() no longer reports shift/reduce |
| 9 or reduce/reduce conflicts if debugging is turned off. This |
| 10 restores behavior similar to PLY-2.5. Reported by Andrew Waters.
| 11 |
| 12 Version 3.0 |
| 13 ----------------------------- |
| 14 02/03/09: beazley |
| 15 Fixed missing lexer attribute on certain tokens when |
| 16 invoking the parser p_error() function. Reported by |
| 17 Bart Whiteley. |
| 18 |
| 19 02/02/09: beazley |
| 20 The lex() command now does all error-reporting and diagnostics
| 21 using the logging module interface. Pass in a Logger object |
| 22 using the errorlog parameter to specify a different logger. |
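
        A minimal sketch of this interface (the logger name is made up, and
        the usual tokens/t_ rules are assumed to be defined in the calling
        module):

            import logging
            import ply.lex as lex

            log = logging.getLogger('lexlog')
            log.setLevel(logging.WARNING)     # only warnings and errors
            lexer = lex.lex(errorlog=log)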
| 23 |
| 24 02/02/09: beazley |
| 25 Refactored ply.lex to use a more object-oriented and organized |
| 26 approach to collecting lexer information. |
| 27 |
| 28 02/01/09: beazley |
| 29 Removed the nowarn option from lex(). All output is controlled |
| 30 by passing in a logger object. Just pass in a logger with a high |
| 31 level setting to suppress output. This argument was never |
| 32 documented to begin with so hopefully no one was relying upon it. |
| 33 |
| 34 02/01/09: beazley |
| 35 Discovered and removed a dead if-statement in the lexer. This |
| 36 resulted in a 6-7% speedup in lexing when I tested it. |
| 37 |
| 38 01/13/09: beazley |
| 39 Minor change to the procedure for signalling a syntax error in a |
| 40 production rule. A normal SyntaxError exception should be raised |
| 41 instead of yacc.SyntaxError. |
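
        For example, a production might now signal an error like this (the
        grammar rule and the valid_target() check are hypothetical):

            def p_assignment(p):
                'assignment : ID EQUALS expr'
                if not valid_target(p[1]):    # hypothetical semantic check
                    raise SyntaxError         # built-in exception, not yacc.SyntaxError
                p[0] = ('assign', p[1], p[3])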
| 42 |
| 43 01/13/09: beazley |
| 44 Added a new method p.set_lineno(n,lineno) that can be used to set the |
| 45 line number of symbol n in grammar rules. This simplifies manual |
| 46 tracking of line numbers. |
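
        A sketch of how this might be used (the rule is hypothetical):

            def p_expr_binop(p):
                'expr : expr PLUS expr'
                p[0] = ('+', p[1], p[3])
                p.set_lineno(0, p.lineno(1))  # result inherits the left operand's line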
| 47 |
| 48 01/11/09: beazley |
| 49 Vastly improved debugging support for yacc.parse(). Instead of passing
| 50 debug as an integer, you can supply a Logging object (see the logging |
| 51 module). Messages will be generated at the ERROR, INFO, and DEBUG |
| 52 logging levels, each level providing progressively more information. |
| 53 The debugging trace also shows states, grammar rule, values passed |
| 54 into grammar rules, and the result of each reduction. |
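
        A sketch of enabling a debugging trace this way:

            import logging
            logging.basicConfig(level=logging.DEBUG, filename='parse.log')
            log = logging.getLogger()
            result = parser.parse(data, debug=log)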
| 55 |
| 56 01/09/09: beazley |
| 57 The yacc() command now does all error-reporting and diagnostics using |
| 58 the interface of the logging module. Use the errorlog parameter to |
| 59 specify a logging object for error messages. Use the debuglog parameter
| 60 to specify a logging object for the 'parser.out' output. |
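
        A sketch of supplying both loggers (the logger names are made up):

            import logging
            elog = logging.getLogger('yacc.errors')
            dlog = logging.getLogger('yacc.debug')
            parser = yacc.yacc(errorlog=elog, debuglog=dlog)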
| 61 |
| 62 01/09/09: beazley |
| 63 *HUGE* refactoring of the ply.yacc() implementation. The high-level
| 64 user interface is backwards compatible, but the internals are completely
| 65 reorganized into classes. No more global variables. The internals |
| 66 are also more extensible. For example, you can use the classes to |
| 67 construct a LALR(1) parser in an entirely different manner than |
| 68 what is currently the case. Documentation is forthcoming. |
| 69 |
| 70 01/07/09: beazley |
| 71 Various cleanup and refactoring of yacc internals. |
| 72 |
| 73 01/06/09: beazley |
| 74 Fixed a bug with precedence assignment. yacc was assigning the precedence
| 75 of each rule based on the left-most token, when in fact, it should have been
| 76 using the right-most token. Reported by Bruce Frederiksen. |
| 77 |
| 78 11/27/08: beazley |
| 79 Numerous changes to support Python 3.0 including removal of deprecated
| 80 statements (e.g., has_key) and the addition of compatibility code
| 81 to emulate features from Python 2 that have been removed, but which |
| 82 are needed. Fixed the unit testing suite to work with Python 3.0. |
| 83 The code should be backwards compatible with Python 2. |
| 84 |
| 85 11/26/08: beazley |
| 86 Loosened the rules on what kind of objects can be passed in as the |
| 87 "module" parameter to lex() and yacc(). Previously, you could only us
e |
| 88 a module or an instance. Now, PLY just uses dir() to get a list of |
| 89 symbols on whatever the object is without regard for its type. |
| 90 |
| 91 11/26/08: beazley |
| 92 Changed all except: statements to be compatible with Python2.x/3.x syntax.
| 93 |
| 94 11/26/08: beazley |
| 95 Changed all raise Exception, value statements to raise Exception(value) for
| 96 forward compatibility. |
| 97 |
| 98 11/26/08: beazley |
| 99 Removed all print statements from lex and yacc, using sys.stdout and sys.stderr
| 100 directly. Preparation for Python 3.0 support. |
| 101 |
| 102 11/04/08: beazley |
| 103 Fixed a bug with referring to symbols on the parsing stack using negative
| 104 indices. |
| 105 |
| 106 05/29/08: beazley |
| 107 Completely revamped the testing system to use the unittest module for everything.
| 108 Added additional tests to cover new errors/warnings. |
| 109 |
| 110 Version 2.5 |
| 111 ----------------------------- |
| 112 05/28/08: beazley |
| 113 Fixed a bug with writing lex-tables in optimized mode and start states.
| 114 Reported by Kevin Henry. |
| 115 |
| 116 Version 2.4 |
| 117 ----------------------------- |
| 118 05/04/08: beazley |
| 119 A version number is now embedded in the table file signature so that |
| 120 yacc can more gracefully accommodate changes to the output format
| 121 in the future. |
| 122 |
| 123 05/04/08: beazley |
| 124 Removed undocumented .pushback() method on grammar productions. I'm |
| 125 not sure this ever worked and can't recall ever using it. Might have |
| 126 been an abandoned idea that never really got fleshed out. This |
| 127 feature was never described or tested so removing it is hopefully |
| 128 harmless. |
| 129 |
| 130 05/04/08: beazley |
| 131 Added extra error checking to yacc() to detect precedence rules defined
| 132 for undefined terminal symbols. This allows yacc() to detect a potential
| 133 problem that can be really tricky to debug if no warning message or error
| 134 message is generated about it. |
| 135 |
| 136 05/04/08: beazley |
| 137 lex() now has an outputdir parameter that specifies the output directory for
| 138 tables when running in optimize mode. For example: |
| 139 |
| 140 lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar") |
| 141 |
| 142 The behavior of specifying a table module and output directory is now
| 143 more closely aligned with the behavior of yacc().
| 144 |
| 145 05/04/08: beazley |
| 146 [Issue 9] |
| 147 Fixed a filename bug when specifying the modulename in lex() and yacc().
| 148 If you specified options such as the following: |
| 149 |
| 150 parser = yacc.yacc(tabmodule="foo.bar.parsetab",outputdir="foo/bar")
| 151 |
| 152 yacc would create a file "foo.bar.parsetab.py" in the given directory. |
| 153 Now, it simply generates a file "parsetab.py" in that directory. |
| 154 Bug reported by cptbinho. |
| 155 |
| 156 05/04/08: beazley |
| 157 Slight modification to lex() and yacc() to allow their table files |
| 158 to be loaded from a previously loaded module. This might make |
| 159 it easier to load the parsing tables from a complicated package |
| 160 structure. For example: |
| 161 |
| 162 import foo.bar.spam.parsetab as parsetab |
| 163 parser = yacc.yacc(tabmodule=parsetab) |
| 164 |
| 165 Note: lex and yacc will never regenerate the table file if used |
| 166 in this form---you will get a warning message instead.
| 167 This idea suggested by Brian Clapper. |
| 168 |
| 169 |
| 170 04/28/08: beazley |
| 171 Fixed a bug where p_error() functions were not being picked up correctly
| 172 when running in yacc(optimize=1) mode. Patch contributed by |
| 173 Bart Whiteley. |
| 174 |
| 175 02/28/08: beazley |
| 176 Fixed a bug with 'nonassoc' precedence rules. Basically the |
| 177 non-associativity was being ignored and not producing the correct
| 178 run-time behavior in the parser. |
| 179 |
| 180 02/16/08: beazley |
| 181 Slight relaxation of what the input() method to a lexer will |
| 182 accept as a string. Instead of testing the input to see |
| 183 if the input is a string or unicode string, it checks to see |
| 184 if the input object looks like it contains string data. |
| 185 This change makes it possible to pass string-like objects |
| 186 in as input. For example, the object returned by mmap. |
| 187 |
| 188 import mmap, os |
| 189 data = mmap.mmap(os.open(filename,os.O_RDONLY), |
| 190 os.path.getsize(filename), |
| 191 access=mmap.ACCESS_READ) |
| 192 lexer.input(data) |
| 193 |
| 194 |
| 195 11/29/07: beazley |
| 196 Modification of ply.lex to allow token functions to be aliased.
| 197 This is subtle, but it makes it easier to create libraries and |
| 198 to reuse token specifications. For example, suppose you defined |
| 199 a function like this: |
| 200 |
| 201 def number(t): |
| 202 r'\d+' |
| 203 t.value = int(t.value) |
| 204 return t |
| 205 |
| 206 This change would allow you to define a token rule as follows: |
| 207 |
| 208 t_NUMBER = number |
| 209 |
| 210 In this case, the token type will be set to 'NUMBER' and the
| 211 associated number() function will be used to process tokens.
| 212 |
| 213 11/28/07: beazley |
| 214 Slight modification to lex and yacc to grab symbols from both |
| 215 the local and global dictionaries of the caller. This |
| 216 modification allows lexers and parsers to be defined using |
| 217 inner functions and closures. |
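
        For example, a lexer can now be built inside a factory function
        (a sketch; the token set is made up):

            import ply.lex as lex

            def make_lexer():
                tokens = ('NUMBER',)
                t_ignore = ' \t'

                def t_NUMBER(t):
                    r'\d+'
                    t.value = int(t.value)
                    return t

                def t_error(t):
                    t.lexer.skip(1)

                return lex.lex()    # picks up the enclosing local definitions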
| 218 |
| 219 11/28/07: beazley |
| 220 Performance optimization: The lexer.lexmatch and t.lexer |
| 221 attributes are no longer set for lexer tokens that are not |
| 222 defined by functions. The only normal use of these attributes |
| 223 would be in lexer rules that need to perform some kind of |
| 224 special processing. Thus, it doesn't make any sense to set |
| 225 them on every token. |
| 226 |
| 227 *** POTENTIAL INCOMPATIBILITY *** This might break code |
| 228 that is mucking around with internal lexer state in some |
| 229 sort of magical way. |
| 230 |
| 231 11/27/07: beazley |
| 232 Added the ability to put the parser into error-handling mode |
| 233 from within a normal production. To do this, simply raise |
| 234 a yacc.SyntaxError exception like this: |
| 235 |
| 236 def p_some_production(p): |
| 237 'some_production : prod1 prod2' |
| 238 ... |
| 239 raise yacc.SyntaxError # Signal an error |
| 240 |
| 241 A number of things happen after this occurs: |
| 242 |
| 243 - The last symbol shifted onto the symbol stack is discarded |
| 244 and the parser state backed up to what it was before
| 245 the rule reduction.
| 246 |
| 247 - The current lookahead symbol is saved and replaced by |
| 248 the 'error' symbol. |
| 249 |
| 250 The parser enters error recovery mode, where it either
| 251 tries to reduce the 'error' rule or starts
| 252 discarding items off of the stack until the parser
| 253 resets. |
| 254 |
| 255 When an error is manually set, the parser does *not* call |
| 256 the p_error() function (if any is defined). |
| 257 *** NEW FEATURE *** Suggested on the mailing list |
| 258 |
| 259 11/27/07: beazley |
| 260 Fixed structure bug in examples/ansic. Reported by Dion Blazakis. |
| 261 |
| 262 11/27/07: beazley |
| 263 Fixed a bug in the lexer related to start conditions and ignored |
| 264 token rules. If a rule was defined that changed state, but |
| 265 returned no token, the lexer could be left in an inconsistent |
| 266 state. Reported by |
| 267 |
| 268 11/27/07: beazley |
| 269 Modified setup.py to support Python Eggs. Patch contributed by |
| 270 Simon Cross. |
| 271 |
| 272 11/09/07: beazley
| 273 Fixed a bug in error handling in yacc. If a syntax error occurred and the
| 274 parser rolled the entire parse stack back, the parser would be left in an
| 275 inconsistent state that would cause it to trigger incorrect actions on
| 276 subsequent input. Reported by Ton Biegstraaten, Justin King, and others.
| 277 |
| 278 11/09/07: beazley |
| 279 Fixed a bug when passing empty input strings to yacc.parse(). This |
| 280 would result in an error message about "No input given". Reported |
| 281 by Andrew Dalke. |
| 282 |
| 283 Version 2.3 |
| 284 ----------------------------- |
| 285 02/20/07: beazley |
| 286 Fixed a bug with character literals if the literal '.' appeared as the |
| 287 last symbol of a grammar rule. Reported by Ales Smrcka. |
| 288 |
| 289 02/19/07: beazley |
| 290 Warning messages are now redirected to stderr instead of being printed |
| 291 to standard output. |
| 292 |
| 293 02/19/07: beazley |
| 294 Added a warning message to lex.py if it detects a literal backslash |
| 295 character inside the t_ignore declaration. This is to help avoid
| 296 problems that might occur if someone accidentally defines t_ignore
| 297 as a Python raw string. For example: |
| 298 |
| 299 t_ignore = r' \t' |
| 300 |
| 301 The idea for this is from an email I received from David Cimimi who |
| 302 reported bizarre behavior in lexing as a result of defining t_ignore |
| 303 as a raw string by accident. |
| 304 |
| 305 02/18/07: beazley |
| 306 Performance improvements. Made some changes to the internal |
| 307 table organization and LR parser to improve parsing performance. |
| 308 |
| 309 02/18/07: beazley |
| 310 Automatic tracking of line number and position information must now be
| 311 enabled by a special flag to parse(). For example: |
| 312 |
| 313 yacc.parse(data,tracking=True) |
| 314 |
| 315 In many applications, it's just not that important to have the |
| 316 parser automatically track all line numbers. By making this an |
| 317 optional feature, it allows the parser to run significantly faster |
| 318 (more than a 20% speed increase in many cases). Note: positional |
| 319 information is always available for raw tokens---this change only |
| 320 applies to positional information associated with nonterminal |
| 321 grammar symbols. |
| 322 *** POTENTIAL INCOMPATIBILITY *** |
| 323 |
| 324 02/18/07: beazley |
| 325 Yacc no longer supports extended slices of grammar productions. |
| 326 However, it does support regular slices. For example: |
| 327 |
| 328 def p_foo(p): |
| 329 '''foo: a b c d e''' |
| 330 p[0] = p[1:3] |
| 331 |
| 332 This change is a performance improvement to the parser--it streamlines |
| 333 normal access to the grammar values since slices are now handled in |
| 334 a __getslice__() method as opposed to __getitem__(). |
| 335 |
| 336 02/12/07: beazley |
| 337 Fixed a bug in the handling of token names when combined with |
| 338 start conditions. Bug reported by Todd O'Bryan. |
| 339 |
| 340 Version 2.2 |
| 341 ------------------------------ |
| 342 11/01/06: beazley |
| 343 Added lexpos() and lexspan() methods to grammar symbols. These |
| 344 mirror the functionality of lineno() and linespan(). For
| 345 example: |
| 346 |
| 347 def p_expr(p): |
| 348 'expr : expr PLUS expr' |
| 349 p.lexpos(1) # Lexing position of left-hand-expression |
| 350 p.lexpos(2) # Lexing position of PLUS
| 351 start,end = p.lexspan(3) # Lexing range of right hand expression |
| 352 |
| 353 11/01/06: beazley |
| 354 Minor change to error handling. The recommended way to skip characters
| 355 in the input is to use t.lexer.skip() as shown here: |
| 356 |
| 357 def t_error(t): |
| 358 print "Illegal character '%s'" % t.value[0] |
| 359 t.lexer.skip(1) |
| 360 |
| 361 The old approach of just using t.skip(1) will still work, but won't |
| 362 be documented. |
| 363 |
| 364 10/31/06: beazley |
| 365 Discarded tokens can now be specified as simple strings instead of |
| 366 functions. To do this, simply include the text "ignore_" in the |
| 367 token declaration. For example: |
| 368 |
| 369 t_ignore_cppcomment = r'//.*' |
| 370 |
| 371 Previously, this had to be done with a function. For example: |
| 372 |
| 373 def t_ignore_cppcomment(t): |
| 374 r'//.*' |
| 375 pass |
| 376 |
| 377 If start conditions/states are being used, state names should appear |
| 378 before the "ignore_" text. |
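
        For example (assuming a declared start state named 'foo'):

            t_foo_ignore_cppcomment = r'//.*'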
| 379 |
| 380 10/19/06: beazley |
| 381 The Lex module now provides support for flex-style start conditions |
| 382 as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
| 383 Please refer to this document to understand this change note. Refer to
| 384 the PLY documentation for PLY-specific explanation of how this works. |
| 385 |
| 386 To use start conditions, you first need to declare a set of states in |
| 387 your lexer file: |
| 388 |
| 389 states = ( |
| 390 ('foo','exclusive'), |
| 391 ('bar','inclusive') |
| 392 ) |
| 393 |
| 394 This serves the same role as the %s and %x specifiers in flex. |
| 395 |
| 396 Once a state has been declared, tokens for that state can be
| 397 declared by defining rules of the form t_state_TOK. For example: |
| 398 |
| 399 t_PLUS = '\+' # Rule defined in INITIAL state |
| 400 t_foo_NUM = '\d+' # Rule defined in foo state |
| 401 t_bar_NUM = '\d+' # Rule defined in bar state |
| 402 |
| 403 t_foo_bar_NUM = '\d+' # Rule defined in both foo and bar |
| 404 t_ANY_NUM = '\d+' # Rule defined in all states |
| 405 |
| 406 In addition to defining tokens for each state, the t_ignore and t_error
| 407 specifications can be customized for specific states. For example: |
| 408 |
| 409 t_foo_ignore = " " # Ignored characters for foo state |
| 410 def t_bar_error(t): |
| 411 # Handle errors in bar state |
| 412 |
| 413 With token rules, the following methods can be used to change states |
| 414 |
| 415 def t_TOKNAME(t): |
| 416 t.lexer.begin('foo') # Begin state 'foo' |
| 417 t.lexer.push_state('foo') # Begin state 'foo', push old state |
| 418 # onto a stack |
| 419 t.lexer.pop_state() # Restore previous state |
| 420 t.lexer.current_state() # Returns name of current state |
| 421 |
| 422 These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and |
| 423 yy_top_state() functions in flex. |
| 424 |
| 425 Start states can be used as one way to write sub-lexers.
| 426 For example, the lexer or parser might instruct the lexer to start |
| 427 generating a different set of tokens depending on the context. |
| 428 |
| 429 example/yply/ylex.py shows the use of start states to grab C/C++ |
| 430 code fragments out of traditional yacc specification files. |
| 431 |
| 432 *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also |
| 433 discussed various aspects of the design. |
| 434 |
| 435 10/19/06: beazley |
| 436 Minor change to the way in which yacc.py was reporting shift/reduce |
| 437 conflicts. Although the underlying LALR(1) algorithm was correct, |
| 438 PLY was under-reporting the number of conflicts compared to yacc/bison |
| 439 when precedence rules were in effect. This change should make PLY |
| 440 report the same number of conflicts as yacc. |
| 441 |
| 442 10/19/06: beazley |
| 443 Modified yacc so that grammar rules could also include the '-' |
| 444 character. For example: |
| 445 |
| 446 def p_expr_list(p): |
| 447 'expression-list : expression-list expression' |
| 448 |
| 449 Suggested by Oldrich Jedlicka. |
| 450 |
| 451 10/18/06: beazley |
| 452 Attribute lexer.lexmatch added so that token rules can access the re |
| 453 match object that was generated. For example: |
| 454 |
| 455 def t_FOO(t): |
| 456 r'some regex' |
| 457 m = t.lexer.lexmatch |
| 458 # Do something with m |
| 459 |
| 460 |
| 461 This may be useful if you want to access named groups specified within |
| 462 the regex for a specific token. Suggested by Oldrich Jedlicka. |
| 463 |
| 464 10/16/06: beazley |
| 465 Changed the error message that results if an illegal character |
| 466 is encountered and no default error function is defined in lex. |
| 467 The exception is now more informative about the actual cause of |
| 468 the error. |
| 469 |
| 470 Version 2.1 |
| 471 ------------------------------ |
| 472 10/02/06: beazley |
| 473 The last Lexer object built by lex() can be found in lex.lexer. |
| 474 The last Parser object built by yacc() can be found in yacc.parser. |
| 475 |
| 476 10/02/06: beazley |
| 477 New example added: examples/yply |
| 478 |
| 479 This example uses PLY to convert Unix-yacc specification files to |
| 480 PLY programs with the same grammar. This may be useful if you |
| 481 want to convert a grammar from bison/yacc to use with PLY. |
| 482 |
| 483 10/02/06: beazley |
| 484 Added support for a start symbol to be specified in the yacc |
| 485 input file itself. Just do this: |
| 486 |
| 487 start = 'name' |
| 488 |
| 489 where 'name' matches some grammar rule. For example: |
| 490 |
| 491 def p_name(p): |
| 492 'name : A B C' |
| 493 ... |
| 494 |
| 495 This mirrors the functionality of the yacc %start specifier. |
| 496 |
| 497 09/30/06: beazley |
| 498 Some new examples added:
| 499 |
| 500 examples/GardenSnake : A simple indentation based language similar |
| 501 to Python. Shows how you might handle |
| 502 whitespace. Contributed by Andrew Dalke. |
| 503 |
| 504 examples/BASIC : An implementation of 1964 Dartmouth BASIC. |
| 505 Contributed by Dave against his better |
| 506 judgement. |
| 507 |
| 508 09/28/06: beazley |
| 509 Minor patch to allow named groups to be used in lex regular |
| 510 expression rules. For example: |
| 511 |
| 512 t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)''' |
| 513 |
| 514 Patch submitted by Adam Ring. |
| 515 |
| 516 09/28/06: beazley |
| 517 LALR(1) is now the default parsing method. To use SLR, use |
| 518 yacc.yacc(method="SLR"). Note: there is no performance impact |
| 519 on parsing when using LALR(1) instead of SLR. However, constructing |
| 520 the parsing tables will take a little longer. |
| 521 |
| 522 09/26/06: beazley |
| 523 Change to line number tracking. To modify line numbers, modify |
| 524 the line number of the lexer itself. For example: |
| 525 |
| 526 def t_NEWLINE(t): |
| 527 r'\n' |
| 528 t.lexer.lineno += 1 |
| 529 |
| 530 This modification is both cleanup and a performance optimization. |
| 531 In past versions, lex was monitoring every token for changes in |
| 532 the line number. This extra processing is unnecessary for the vast
| 533 majority of tokens. Thus, this new approach cleans it up a bit. |
| 534 |
| 535 *** POTENTIAL INCOMPATIBILITY *** |
| 536 You will need to change code in your lexer that updates the line |
| 537 number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1" |
| 538 |
| 539 09/26/06: beazley |
| 540 Added the lexing position to tokens as an attribute lexpos. This |
| 541 is the raw index into the input text at which a token appears. |
| 542 This information can be used to compute column numbers and other |
| 543 details (e.g., scan backwards from lexpos to the first newline |
| 544 to get a column position). |
| 545 |
| 546 09/25/06: beazley |
| 547 Changed the name of the __copy__() method on the Lexer class |
| 548 to clone(). This is used to clone a Lexer object (e.g., if |
| 549 you're running different lexers at the same time). |
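
        A sketch of using clone():

            lexer = lex.lex()
            lexer2 = lexer.clone()        # independent lexer sharing the same rules
            lexer.input("some text")
            lexer2.input("some other text")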
| 550 |
| 551 09/21/06: beazley |
| 552 Limitations related to the use of the re module have been eliminated. |
| 553 Several users reported problems with regular expressions exceeding |
| 554 more than 100 named groups. To solve this, lex.py is now capable |
| 555 of automatically splitting its master regular expression into
| 556 smaller expressions as needed. This should, in theory, make it |
| 557 possible to specify an arbitrarily large number of tokens. |
| 558 |
| 559 09/21/06: beazley |
| 560 Improved error checking in lex.py. Rules that match the empty string |
| 561 are now rejected (otherwise they cause the lexer to enter an infinite |
| 562 loop). An extra check for rules containing '#' has also been added. |
| 563 Since lex compiles regular expressions in verbose mode, '#' is interpreted
| 564 as a regex comment; it is critical to use '\#' instead.
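
        For example, a rule matching a literal '#' (the token name is
        hypothetical):

            t_HASH = r'\#'    # '\#' keeps '#' from starting a verbose-mode comment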
| 565 |
| 566 09/18/06: beazley |
| 567 Added a @TOKEN decorator function to lex.py that can be used to |
| 568 define token rules where the documentation string might be computed |
| 569 in some way. |
| 570 |
| 571 digit = r'([0-9])' |
| 572 nondigit = r'([_A-Za-z])' |
| 573 identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'
| 574 |
| 575 from ply.lex import TOKEN |
| 576 |
| 577 @TOKEN(identifier) |
| 578 def t_ID(t): |
| 579 # Do whatever |
| 580 |
| 581 The @TOKEN decorator merely sets the documentation string of the |
| 582 associated token function as needed for lex to work. |
| 583 |
| 584 Note: An alternative solution is the following: |
| 585 |
| 586 def t_ID(t): |
| 587 # Do whatever |
| 588 |
| 589 t_ID.__doc__ = identifier |
| 590 |
| 591 Note: Decorators require the use of Python 2.4 or later. If compatibility
| 592 with old versions is needed, use the latter solution. |
| 593 |
| 594 The need for this feature was suggested by Cem Karan. |
| 595 |
| 596 09/14/06: beazley |
| 597 Support for single-character literal tokens has been added to yacc. |
| 598 These literals must be enclosed in quotes. For example: |
| 599 |
| 600 def p_expr(p): |
| 601 "expr : expr '+' expr" |
| 602 ... |
| 603 |
| 604 def p_expr(p): |
| 605 'expr : expr "-" expr' |
| 606 ... |
| 607 |
| 608 In addition to this, it is necessary to tell the lexer module about |
| 609 literal characters. This is done by defining the variable 'literals' |
| 610 as a list of characters. This should be defined in the module that |
| 611 invokes the lex.lex() function. For example: |
| 612 |
| 613 literals = ['+','-','*','/','(',')','='] |
| 614 |
| 615 or simply |
| 616 |
| 617 literals = '+-*/()='
| 618 |
| 619 It is important to note that literals can only be a single character. |
| 620 When the lexer fails to match a token using its normal regular expression
| 621 rules, it will check the current character against the literal list.
| 622 If found, it will be returned with a token type set to match the literal
| 623 character. Otherwise, an illegal character will be signalled. |
| 624 |
| 625 |
| 626 09/14/06: beazley |
| 627 Modified PLY to install itself as a proper Python package called 'ply'.
| 628 This will make it a little more friendly to other modules. This |
| 629 changes the usage of PLY only slightly. Just do this to import the |
| 630 modules:
| 631 |
| 632 import ply.lex as lex |
| 633 import ply.yacc as yacc |
| 634 |
| 635 Alternatively, you can do this: |
| 636 |
| 637 from ply import * |
| 638 |
| 639 Which imports both the lex and yacc modules. |
| 640 Change suggested by Lee June. |
| 641 |
| 642 09/13/06: beazley |
| 643 Changed the handling of negative indices when used in production rules.
| 644 A negative production index now accesses already parsed symbols on the |
| 645 parsing stack. For example, |
| 646 |
| 647 def p_foo(p): |
| 648 "foo: A B C D" |
| 649 print p[1] # Value of 'A' symbol |
| 650 print p[2] # Value of 'B' symbol |
| 651 print p[-1] # Value of whatever symbol appears before A |
| 652 # on the parsing stack. |
| 653 |
| 654 p[0] = some_val # Sets the value of the 'foo' grammar symbol
| 655 |
| 656 This behavior makes it easier to work with embedded actions within the |
| 657 parsing rules. For example, in C-yacc, it is possible to write code like
| 658 this: |
| 659 |
| 660 bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; } |
| 661 |
| 662 In this example, the printf() code executes immediately after A has been
| 663 parsed. Within the embedded action code, $1 refers to the A symbol on |
| 664 the stack. |
| 665 |
| 666 To perform this equivalent action in PLY, you need to write a pair |
| 667 of rules like this: |
| 668 |
| 669 def p_bar(p): |
| 670 "bar : A seen_A B" |
| 671 do_stuff |
| 672 |
| 673 def p_seen_A(p): |
| 674 "seen_A :" |
| 675 print "seen an A =", p[-1] |
| 676 |
| 677 The second rule "seen_A" is merely an empty production which should be
| 678 reduced as soon as A is parsed in the "bar" rule above. The
| 679 negative index p[-1] is used to access whatever symbol appeared
| 680 before the seen_A symbol. |
| 681 |
| 682 This feature also makes it possible to support inherited attributes. |
| 683 For example: |
| 684 |
| 685 def p_decl(p): |
| 686 "decl : scope name" |
| 687 |
| 688 def p_scope(p): |
| 689 """scope : GLOBAL |
| 690 | LOCAL""" |
| 691 p[0] = p[1] |
| 692 |
| 693 def p_name(p): |
| 694 "name : ID" |
| 695 if p[-1] == "GLOBAL": |
| 696 # ... |
| 697 elif p[-1] == "LOCAL":
| 698 #... |
| 699 |
| 700 In this case, the name rule is inheriting an attribute from the |
| 701 scope declaration that precedes it. |
| 702 |
| 703 *** POTENTIAL INCOMPATIBILITY *** |
| 704 If you are currently using negative indices within existing grammar rules,
| 705 your code will break. This should be extremely rare, if not non-existent,
| 706 in most cases. The argument to various grammar rules is not usually
| 707 processed in the same way as a list of items.
| 708 |
| 709 Version 2.0 |
| 710 ------------------------------ |
| 711 09/07/06: beazley |
| 712 Major cleanup and refactoring of the LR table generation code. Both SLR
| 713 and LALR(1) table generation is now performed by the same code base with
| 714 only minor extensions for extra LALR(1) processing. |
| 715 |
| 716 09/07/06: beazley |
| 717 Completely reimplemented the entire LALR(1) parsing engine to use the |
| 718 DeRemer and Pennello algorithm for calculating lookahead sets. This |
| 719 significantly improves the performance of generating LALR(1) tables |
| 720 and has the added feature of actually working correctly! If you |
| 721 experienced weird behavior with LALR(1) in prior releases, this should |
| 722 hopefully resolve all of those problems. Many thanks to |
| 723 Andrew Waters and Markus Schoepflin for submitting bug reports |
| 724 and helping me test out the revised LALR(1) support. |
| 725 |
| 726 Version 1.8 |
| 727 ------------------------------ |
| 728 08/02/06: beazley |
| 729 Fixed a problem related to the handling of default actions in LALR(1) |
| 730 parsing. If you experienced subtle and/or bizarre behavior when trying
| 731 to use the LALR(1) engine, this may correct those problems. Patch
| 732 contributed by Russ Cox. Note: This patch has been superseded by
| 733 revisions for LALR(1) parsing in Ply-2.0. |
| 734 |
| 735 08/02/06: beazley |
| 736 Added support for slicing of productions in yacc. |
| 737 Patch contributed by Patrick Mezard. |
| 738 |
| 739 Version 1.7 |
| 740 ------------------------------ |
| 741 03/02/06: beazley |
| 742 Fixed an infinite recursion problem in the ReduceToTerminals() function that
| 743 would sometimes come up in LALR(1) table generation. Reported by |
| 744 Markus Schoepflin. |
| 745 |
| 746 03/01/06: beazley |
| 747 Added "reflags" argument to lex(). For example: |
| 748 |
| 749 lex.lex(reflags=re.UNICODE) |
| 750 |
| 751 This can be used to specify optional flags to the re.compile() function
| 752 used inside the lexer. This may be necessary for special situations such
| 753 as processing Unicode (e.g., if you want escapes like \w and \b to consult
| 754 the Unicode character property database). The need for this was suggested by
| 755 Andreas Jung. |
| 756 |
| 757 03/01/06: beazley |
| 758 Fixed a bug with an uninitialized variable on repeated instantiations of parser
| 759 objects when the write_tables=0 argument was used. Reported by Michael Brown.
| 760 |
| 761 03/01/06: beazley |
| 762 Modified lex.py to accept Unicode strings both as the regular expressions for
| 763 tokens and as input. Hopefully this is the only change needed for Unicode support.
| 764 Patch contributed by Johan Dahl. |
| 765 |
| 766 03/01/06: beazley |
| 767 Modified the class-based interface to work with new-style or old-style classes.
| 768 Patch contributed by Michael Brown (although I tweaked it slightly so it would work
| 769 with older versions of Python). |
| 770 |
| 771 Version 1.6 |
| 772 ------------------------------ |
| 773 05/27/05: beazley |
| 774 Incorporated patch contributed by Christopher Stawarz to fix an extremely
| 775 devious bug in LALR(1) parser generation. This patch should fix problems
| 776 numerous people reported with LALR parsing. |
| 777 |
| 778 05/27/05: beazley |
| 779 Fixed a problem with the lex.py copy constructor. Reported by Dave Aitel, Aaron Lav,
| 780 and Thad Austin. |
| 781 |
| 782 05/27/05: beazley |
| 783 Added outputdir option to yacc() to control output directory. Contributed
| 784 by Christopher Stawarz. |
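
        For example (the directory name is made up):

            parser = yacc.yacc(outputdir="output")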
| 785 |
| 786 05/27/05: beazley |
| 787 Added rununit.py test script to run tests using the Python unittest module.
| 788 Contributed by Miki Tebeka. |
| 789 |
| 790 Version 1.5 |
| 791 ------------------------------ |
| 792 05/26/04: beazley |
| 793 Major enhancement. LALR(1) parsing support is now working. |
| 794 This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
| 795 and optimized by David Beazley. To use LALR(1) parsing do |
| 796 the following: |
| 797 |
| 798 yacc.yacc(method="LALR") |
| 799 |
| 800 Computing LALR(1) parsing tables takes about twice as long as |
| 801 the default SLR method. However, LALR(1) allows you to handle |
| 802 more complex grammars. For example, the ANSI C grammar |
| 803 (in example/ansic) has 13 shift-reduce conflicts with SLR, but |
| 804 only has 1 shift-reduce conflict with LALR(1). |
| 805 |
| 806 05/20/04: beazley |
| 807 Added a __len__ method to parser production lists. Can |
| 808 be used in parser rules like this: |
| 809 |
| 810 def p_somerule(p): |
| 811 """a : B C D |
| 812 | E F" |
| 813 if (len(p) == 3): |
| 814 # Must have been first rule |
| 815 elif (len(p) == 2): |
| 816 # Must be second rule |
| 817 |
| 818 Suggested by Joshua Gerth and others. |
| 819 |
| 820 Version 1.4 |
| 821 ------------------------------ |
| 822 04/23/04: beazley |
| 823 Incorporated a variety of patches contributed by Eric Raymond. |
| 824 These include: |
| 825 |
| 826 0. Cleans up some comments so they don't wrap on an 80-column display.
| 827 1. Directs compiler errors to stderr where they belong.
| 828 2. Implements and documents automatic line counting when \n is ignored.
| 829 3. Changes the way progress messages are dumped when debugging is on.
| 830 The new format is both less verbose and conveys more information than
| 831 the old, including shift and reduce actions. |
| 832 |
| 833 04/23/04: beazley |
| 834 Added a Python setup.py file to simplify installation. Contributed
| 835 by Adam Kerrison. |
| 836 |
| 837 04/23/04: beazley |
| 838 Added patches contributed by Adam Kerrison. |
| 839 |
| 840 - Some output is now only shown when debugging is enabled. This |
| 841 means that PLY will be completely silent when not in debugging mode.
| 842 |
| 843 - An optional parameter "write_tables" can be passed to yacc() to |
| 844 control whether or not parsing tables are written. By default, |
| 845 it is true, but it can be turned off if you don't want the yacc |
| 846 table file. Note: disabling this will cause yacc() to regenerate |
| 847 the parsing table each time. |
| 848 |
| 849 04/23/04: beazley |
| 850 Added patches contributed by David McNab. This patch adds two
| 851 features: |
| 852 |
| 853 - The parser can be supplied as a class instead of a module. |
| 854 For an example of this, see the example/classcalc directory. |
| 855 |
| 856 - Debugging output can be directed to a filename of the user's |
| 857 choice. Use |
| 858 |
| 859 yacc(debugfile="somefile.out") |
| 860 |
| 861 |
| 862 Version 1.3 |
| 863 ------------------------------ |
| 864 12/10/02: jmdyck |
| 865 Various minor adjustments to the code that Dave checked in today. |
| 866 Updated test/yacc_{inf,unused}.exp to reflect today's changes. |
| 867 |
| 868 12/10/02: beazley |
| 869 Incorporated a variety of minor bug fixes to empty production |
| 870 handling and infinite recursion checking. Contributed by |
| 871 Michael Dyck. |
| 872 |
| 873 12/10/02: beazley |
| 874 Removed bogus recover() method call in yacc.restart() |
| 875 |
| 876 Version 1.2 |
| 877 ------------------------------ |
| 878 11/27/02: beazley |
| 879 Lexer and parser objects are now available as an attribute |
| 880 of tokens and slices respectively. For example: |
| 881 |
| 882 def t_NUMBER(t): |
| 883 r'\d+' |
| 884 print t.lexer |
| 885 |
| 886 def p_expr_plus(t): |
| 887 'expr: expr PLUS expr' |
| 888 print t.lexer |
| 889 print t.parser |
| 890 |
| 891 This can be used for state management (if needed). |
| 892 |
| 893 10/31/02: beazley |
| 894 Modified yacc.py to work with Python optimize mode. To make |
| 895 this work, you need to use |
| 896 |
| 897 yacc.yacc(optimize=1) |
| 898 |
| 899 Furthermore, you need to first run Python in normal mode |
| 900 to generate the necessary parsetab.py files. After that, |
| 901 you can use python -O or python -OO. |
| 902 |
| 903 Note: optimized mode turns off a lot of error checking. |
| 904 Only use when you are sure that your grammar is working. |
| 905 Make sure parsetab.py is up to date! |
| 906 |
| 907 10/30/02: beazley |
| 908 Added cloning of Lexer objects. For example: |
| 909 |
| 910 import copy |
| 911 l = lex.lex() |
| 912 lc = copy.copy(l) |
| 913 |
| 914 l.input("Some text") |
| 915 lc.input("Some other text") |
| 916 ... |
| 917 |
| 918 This might be useful if the same "lexer" is meant to |
| 919 be used in different contexts---or if multiple lexers |
| 920 are running concurrently. |
| 921 |
| 922 10/30/02: beazley |
| 923 Fixed subtle bug with first set computation and empty productions. |
| 924 Patch submitted by Michael Dyck. |
| 925 |
| 926 10/30/02: beazley |
| 927 Fixed error messages to use "filename:line: message" instead |
| 928 of "filename:line. message". This makes error reporting more |
| 929 friendly to emacs. Patch submitted by François Pinard. |
| 930 |
| 931 10/30/02: beazley |
| 932 Improvements to parser.out file. Terminals and nonterminals |
| 933 are sorted instead of being printed in random order. |
| 934 Patch submitted by François Pinard. |
| 935 |
| 936 10/30/02: beazley |
| 937 Improvements to parser.out file output. Rules are now printed |
| 938 in a way that's easier to understand. Contributed by Russ Cox. |
| 939 |
| 940 10/30/02: beazley |
| 941 Added 'nonassoc' associativity support. This can be used |
| 942 to disable the chaining of operators like a < b < c. |
| 943 To use, simply specify 'nonassoc' in the precedence table |
| 944 |
| 945 precedence = ( |
| 946 ('nonassoc', 'LESSTHAN', 'GREATERTHAN'), # Nonassociative operators |
| 947 ('left', 'PLUS', 'MINUS'), |
| 948 ('left', 'TIMES', 'DIVIDE'), |
| 949 ('right', 'UMINUS'), # Unary minus operator |
| 950 ) |
| 951 |
| 952 Patch contributed by Russ Cox. |
| 953 |
| 954 10/30/02: beazley |
| 955 Modified the lexer to provide optional support for Python -O and -OO |
| 956 modes. To make this work, Python *first* needs to be run in |
| 957 unoptimized mode. This reads the lexing information and creates a |
| 958 file "lextab.py". Then, run lex like this: |
| 959 |
| 960 # module foo.py |
| 961 ... |
| 962 ... |
| 963 lex.lex(optimize=1) |
| 964 |
| 965 Once the lextab file has been created, subsequent calls to |
| 966 lex.lex() will read data from the lextab file instead of using |
| 967 introspection. In optimized mode (-O, -OO) everything should |
| 968 work normally despite the loss of doc strings. |
| 969 |
| 970 To change the name of the file 'lextab.py' use the following: |
| 971 |
| 972 lex.lex(lextab="footab") |
| 973 |
| 974 (this creates a file footab.py) |
| 975 |
| 976 |
| 977 Version 1.1 October 25, 2001 |
| 978 ------------------------------ |
| 979 |
| 980 10/25/01: beazley |
| 981 Modified the table generator to produce much more compact data. |
| 982 This should greatly reduce the size of the parsetab.py[c] file. |
| 983 Caveat: the tables still need to be constructed so a little more |
| 984 work is done in parsetab on import. |
| 985 |
| 986 10/25/01: beazley |
| 987 There may be a possible bug in the cycle detector that reports errors |
| 988 about infinite recursion. I'm having a little trouble tracking it |
| 989 down, but if you get this problem, you can disable the cycle |
| 990 detector as follows: |
| 991 |
| 992 yacc.yacc(check_recursion = 0) |
| 993 |
| 994 10/25/01: beazley |
| 995 Fixed a bug in lex.py that sometimes caused illegal characters to be |
| 996 reported incorrectly. Reported by Sverre Jørgensen. |
| 997 |
| 998 7/8/01 : beazley |
| 999 Added a reference to the underlying lexer object when tokens are handled by
| 1000 functions. The lexer is available as the 'lexer' attribute. This
| 1001 was added to provide better lexing support for languages such as Fortran
| 1002 where certain types of tokens can't be conveniently expressed as regular
| 1003 expressions (and where the tokenizing function may want to perform a |
| 1004 little backtracking). Suggested by Pearu Peterson. |
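
        A sketch of a token function consulting the underlying lexer (the
        rule itself is hypothetical):

            def t_NUMBER(t):
                r'\d+'
                pos = t.lexer.lexpos    # current position in the input text
                t.value = int(t.value)
                return t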
| 1005 |
| 1006 6/20/01 : beazley |
| 1007 Modified yacc() function so that an optional starting symbol can be specified.
| 1008 For example: |
| 1009 |
| 1010 yacc.yacc(start="statement") |
| 1011 |
| 1012 Normally yacc always treats the first production rule as the starting symbol.
| 1013 However, if you are debugging your grammar it may be useful to specify |
| 1014 an alternative starting symbol. Idea suggested by Rich Salz. |
| 1015 |
| 1016 Version 1.0 June 18, 2001 |
| 1017 -------------------------- |
| 1018 Initial public offering |
| 1019 |