Index: tools/nixysa/third_party/ply-3.1/CHANGES
===================================================================
--- tools/nixysa/third_party/ply-3.1/CHANGES (revision 0)
+++ tools/nixysa/third_party/ply-3.1/CHANGES (revision 0)
@@ -0,0 +1,1019 @@
+Version 3.1
+-----------------------------
+02/28/09: beazley
+          Fixed broken start argument to yacc(). PLY-3.0 broke this
+          feature by accident.
+
+02/28/09: beazley
+          Fixed debugging output. yacc() no longer reports shift/reduce
+          or reduce/reduce conflicts if debugging is turned off. This
+          restores the behavior of PLY-2.5. Reported by Andrew Waters.
+
+Version 3.0
+-----------------------------
+02/03/09: beazley
+          Fixed a missing lexer attribute on certain tokens when
+          invoking the parser p_error() function. Reported by
+          Bart Whiteley.
+
+02/02/09: beazley
+          The lex() command now does all error reporting and diagnostics
+          using the logging module interface. Pass in a Logger object
+          using the errorlog parameter to specify a different logger.
+
+02/02/09: beazley
+          Refactored ply.lex to use a more object-oriented and organized
+          approach to collecting lexer information.
+
+02/01/09: beazley
+          Removed the nowarn option from lex(). All output is controlled
+          by passing in a logger object. Just pass in a logger with a high
+          level setting to suppress output. This argument was never
+          documented to begin with, so hopefully no one was relying on it.
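+
+          As a minimal sketch (the logger name is illustrative), suppressing
+          all lexer diagnostics might look like this:
+
+              import logging
+              import ply.lex as lex
+
+              quiet = logging.getLogger('ply.quiet')
+              quiet.setLevel(logging.CRITICAL)  # Only CRITICAL messages get through
+              # Token rules are assumed to be defined in the calling module
+              lexer = lex.lex(errorlog=quiet)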
+
+02/01/09: beazley
+          Discovered and removed a dead if-statement in the lexer. This
+          resulted in a 6-7% speedup in lexing when I tested it.
+
+01/13/09: beazley
+          Minor change to the procedure for signalling a syntax error in a
+          production rule. A normal SyntaxError exception should be raised
+          instead of yacc.SyntaxError.
+
+01/13/09: beazley
+          Added a new method p.set_lineno(n, lineno) that can be used to set the
+          line number of symbol n in grammar rules. This simplifies manual
+          tracking of line numbers.
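+
+          A short sketch (the grammar symbols are hypothetical):
+
+              def p_expr(p):
+                  'expr : expr PLUS term'
+                  p[0] = p[1] + p[3]
+                  # Give the result the line number of its left operand
+                  p.set_lineno(0, p.lineno(1))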
+
+01/11/09: beazley
+          Vastly improved debugging support for yacc.parse(). Instead of passing
+          debug as an integer, you can supply a Logger object (see the logging
+          module). Messages will be generated at the ERROR, INFO, and DEBUG
+          logging levels, each level providing progressively more information.
+          The debugging trace also shows states, grammar rules, values passed
+          into grammar rules, and the result of each reduction.
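+
+          For example, a sketch of requesting a full parse trace (the logging
+          setup is illustrative):
+
+              import logging
+              logging.basicConfig(level=logging.DEBUG, filename='parse.log')
+              log = logging.getLogger()
+              result = yacc.parse(data, debug=log)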
+
+01/09/09: beazley
+          The yacc() command now does all error reporting and diagnostics using
+          the interface of the logging module. Use the errorlog parameter to
+          specify a logging object for error messages. Use the debuglog parameter
+          to specify a logging object for the 'parser.out' output.
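+
+          A minimal sketch (the logger names are illustrative):
+
+              import logging
+              errlog = logging.getLogger('ply.errors')
+              dbglog = logging.getLogger('ply.debug')
+              parser = yacc.yacc(errorlog=errlog, debuglog=dbglog)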
+
+01/09/09: beazley
+          *HUGE* refactoring of the ply.yacc() implementation. The high-level
+          user interface is backwards compatible, but the internals are completely
+          reorganized into classes. No more global variables. The internals
+          are also more extensible. For example, you can use the classes to
+          construct a LALR(1) parser in an entirely different manner than
+          what is currently the case. Documentation is forthcoming.
+
+01/07/09: beazley
+          Various cleanup and refactoring of yacc internals.
+
+01/06/09: beazley
+          Fixed a bug with precedence assignment. yacc was assigning the precedence
+          of each rule based on the left-most token, when in fact it should have
+          been using the right-most token. Reported by Bruce Frederiksen.
+
+11/27/08: beazley
+          Numerous changes to support Python 3.0, including removal of deprecated
+          statements (e.g., has_key) and the addition of compatibility code
+          to emulate features from Python 2 that have been removed, but which
+          are needed. Fixed the unit testing suite to work with Python 3.0.
+          The code should be backwards compatible with Python 2.
+
+11/26/08: beazley
+          Loosened the rules on what kind of objects can be passed in as the
+          "module" parameter to lex() and yacc(). Previously, you could only use
+          a module or an instance. Now, PLY just uses dir() to get a list of
+          symbols on whatever the object is, without regard for its type.
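+
+          For example, a sketch of building a lexer from a class instance
+          (the class is hypothetical):
+
+              import ply.lex as lex
+
+              class MyLexer(object):
+                  tokens = ('NUMBER',)
+                  t_NUMBER = r'\d+'
+                  t_ignore = ' \t'
+                  def t_error(self, t):
+                      t.lexer.skip(1)
+
+              lexer = lex.lex(module=MyLexer())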
+
+11/26/08: beazley
+          Changed all except: statements to be compatible with Python 2.x/3.x
+          syntax.
+
+11/26/08: beazley
+          Changed all raise Exception, value statements to raise Exception(value)
+          for forward compatibility.
+
+11/26/08: beazley
+          Removed all print statements from lex and yacc, using sys.stdout and
+          sys.stderr directly. Preparation for Python 3.0 support.
+
+11/04/08: beazley
+          Fixed a bug with referring to symbols on the parsing stack using
+          negative indices.
+
+05/29/08: beazley
+          Completely revamped the testing system to use the unittest module for
+          everything. Added additional tests to cover new errors/warnings.
+
+Version 2.5
+-----------------------------
+05/28/08: beazley
+          Fixed a bug with writing lex tables in optimized mode and start states.
+          Reported by Kevin Henry.
+
+Version 2.4
+-----------------------------
+05/04/08: beazley
+          A version number is now embedded in the table file signature so that
+          yacc can more gracefully accommodate changes to the output format
+          in the future.
+
+05/04/08: beazley
+          Removed the undocumented .pushback() method on grammar productions. I'm
+          not sure this ever worked and can't recall ever using it. Might have
+          been an abandoned idea that never really got fleshed out. This
+          feature was never described or tested, so removing it is hopefully
+          harmless.
+
+05/04/08: beazley
+          Added extra error checking to yacc() to detect precedence rules defined
+          for undefined terminal symbols. This allows yacc() to detect a potential
+          problem that can be really tricky to debug if no warning or error
+          message is generated about it.
+
+05/04/08: beazley
+          lex() now has an outputdir argument that specifies the output directory
+          for tables when running in optimize mode. For example:
+
+              lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")
+
+          The behavior of specifying a table module and output directory is now
+          more closely aligned with the behavior of yacc().
+
+05/04/08: beazley
+          [Issue 9]
+          Fixed a filename bug when specifying the module name in lex() and
+          yacc(). If you specified options such as the following:
+
+              parser = yacc.yacc(tabmodule="foo.bar.parsetab", outputdir="foo/bar")
+
+          yacc would create a file "foo.bar.parsetab.py" in the given directory.
+          Now, it simply generates a file "parsetab.py" in that directory.
+          Bug reported by cptbinho.
+
+05/04/08: beazley
+          Slight modification to lex() and yacc() to allow their table files
+          to be loaded from a previously loaded module. This might make
+          it easier to load the parsing tables from a complicated package
+          structure. For example:
+
+              import foo.bar.spam.parsetab as parsetab
+              parser = yacc.yacc(tabmodule=parsetab)
+
+          Note: lex and yacc will never regenerate the table file if used
+          in this form---you will get a warning message instead.
+          This idea was suggested by Brian Clapper.
+
+04/28/08: beazley
+          Fixed a bug with p_error() functions not being picked up correctly
+          when running in yacc(optimize=1) mode. Patch contributed by
+          Bart Whiteley.
+
+02/28/08: beazley
+          Fixed a bug with 'nonassoc' precedence rules. Basically, the
+          nonassoc precedence was being ignored and not producing the correct
+          run-time behavior in the parser.
+
+02/16/08: beazley
+          Slight relaxation of what the input() method of a lexer will
+          accept as a string. Instead of testing the input to see
+          if the input is a string or unicode string, it checks to see
+          if the input object looks like it contains string data.
+          This change makes it possible to pass string-like objects
+          in as input. For example, the object returned by mmap:
+
+              import mmap, os
+              data = mmap.mmap(os.open(filename, os.O_RDONLY),
+                               os.path.getsize(filename),
+                               access=mmap.ACCESS_READ)
+              lexer.input(data)
+
+11/29/07: beazley
+          Modification of ply.lex to allow token functions to be aliased.
+          This is subtle, but it makes it easier to create libraries and
+          to reuse token specifications. For example, suppose you defined
+          a function like this:
+
+              def number(t):
+                  r'\d+'
+                  t.value = int(t.value)
+                  return t
+
+          This change would allow you to define a token rule as follows:
+
+              t_NUMBER = number
+
+          In this case, the token type will be set to 'NUMBER' and it will use
+          the associated number() function to process tokens.
+
+11/28/07: beazley
+          Slight modification to lex and yacc to grab symbols from both
+          the local and global dictionaries of the caller. This
+          modification allows lexers and parsers to be defined using
+          inner functions and closures.
+
+11/28/07: beazley
+          Performance optimization: the lexer.lexmatch and t.lexer
+          attributes are no longer set for lexer tokens that are not
+          defined by functions. The only normal use of these attributes
+          would be in lexer rules that need to perform some kind of
+          special processing. Thus, it doesn't make any sense to set
+          them on every token.
+
+          *** POTENTIAL INCOMPATIBILITY *** This might break code
+          that is mucking around with internal lexer state in some
+          sort of magical way.
+
+11/27/07: beazley
+          Added the ability to put the parser into error-handling mode
+          from within a normal production. To do this, simply raise
+          a yacc.SyntaxError exception like this:
+
+              def p_some_production(p):
+                  'some_production : prod1 prod2'
+                  ...
+                  raise yacc.SyntaxError      # Signal an error
+
+          A number of things happen after this occurs:
+
+          - The last symbol shifted onto the symbol stack is discarded
+            and the parser state is backed up to what it was before the
+            rule reduction.
+
+          - The current lookahead symbol is saved and replaced by
+            the 'error' symbol.
+
+          - The parser enters error recovery mode, where it tries
+            to either reduce the 'error' rule or it starts
+            discarding items off of the stack until the parser
+            resets.
+
+          When an error is manually set, the parser does *not* call
+          the p_error() function (if any is defined).
+          *** NEW FEATURE *** Suggested on the mailing list.
+
+11/27/07: beazley
+          Fixed structure bug in examples/ansic. Reported by Dion Blazakis.
+
+11/27/07: beazley
+          Fixed a bug in the lexer related to start conditions and ignored
+          token rules. If a rule was defined that changed state, but
+          returned no token, the lexer could be left in an inconsistent
+          state. Reported by
+
+11/27/07: beazley
+          Modified setup.py to support Python Eggs. Patch contributed by
+          Simon Cross.
+
+11/09/07: beazley
+          Fixed a bug in error handling in yacc. If a syntax error occurred and
+          the parser rolled the entire parse stack back, the parser would be left
+          in an inconsistent state that would cause it to trigger incorrect
+          actions on subsequent input. Reported by Ton Biegstraaten, Justin King,
+          and others.
+
+11/09/07: beazley
+          Fixed a bug when passing empty input strings to yacc.parse(). This
+          would result in an error message about "No input given". Reported
+          by Andrew Dalke.
+
+Version 2.3
+-----------------------------
+02/20/07: beazley
+          Fixed a bug with character literals if the literal '.' appeared as the
+          last symbol of a grammar rule. Reported by Ales Smrcka.
+
+02/19/07: beazley
+          Warning messages are now redirected to stderr instead of being printed
+          to standard output.
+
+02/19/07: beazley
+          Added a warning message to lex.py if it detects a literal backslash
+          character inside the t_ignore declaration. This is to help catch
+          problems that might occur if someone accidentally defines t_ignore
+          as a Python raw string. For example:
+
+              t_ignore = r' \t'
+
+          The idea for this came from an email I received from David Cimimi, who
+          reported bizarre behavior in lexing as a result of accidentally
+          defining t_ignore as a raw string.
+
+02/18/07: beazley
+          Performance improvements. Made some changes to the internal
+          table organization and LR parser to improve parsing performance.
+
+02/18/07: beazley
+          Automatic tracking of line number and position information must now be
+          enabled by a special flag to parse(). For example:
+
+              yacc.parse(data, tracking=True)
+
+          In many applications, it's just not that important to have the
+          parser automatically track all line numbers. By making this an
+          optional feature, it allows the parser to run significantly faster
+          (more than a 20% speed increase in many cases). Note: positional
+          information is always available for raw tokens---this change only
+          applies to positional information associated with nonterminal
+          grammar symbols.
+          *** POTENTIAL INCOMPATIBILITY ***
+
+02/18/07: beazley
+          Yacc no longer supports extended slices of grammar productions.
+          However, it does support regular slices. For example:
+
+              def p_foo(p):
+                  '''foo : a b c d e'''
+                  p[0] = p[1:3]
+
+          This change is a performance improvement to the parser---it streamlines
+          normal access to the grammar values since slices are now handled in
+          a __getslice__() method as opposed to __getitem__().
+
+02/12/07: beazley
+          Fixed a bug in the handling of token names when combined with
+          start conditions. Bug reported by Todd O'Bryan.
+
+Version 2.2
+------------------------------
+11/01/06: beazley
+          Added lexpos() and lexspan() methods to grammar symbols. These
+          mirror the same functionality of lineno() and linespan(). For
+          example:
+
+              def p_expr(p):
+                  'expr : expr PLUS expr'
+                  p.lexpos(1)                # Lexing position of left-hand expression
+                  p.lexpos(2)                # Lexing position of PLUS
+                  start, end = p.lexspan(3)  # Lexing range of right-hand expression
+
+11/01/06: beazley
+          Minor change to error handling. The recommended way to skip characters
+          in the input is to use t.lexer.skip() as shown here:
+
+              def t_error(t):
+                  print "Illegal character '%s'" % t.value[0]
+                  t.lexer.skip(1)
+
+          The old approach of just using t.skip(1) will still work, but won't
+          be documented.
+
+10/31/06: beazley
+          Discarded tokens can now be specified as simple strings instead of
+          functions. To do this, simply include the text "ignore_" in the
+          token declaration. For example:
+
+              t_ignore_cppcomment = r'//.*'
+
+          Previously, this had to be done with a function. For example:
+
+              def t_ignore_cppcomment(t):
+                  r'//.*'
+                  pass
+
+          If start conditions/states are being used, state names should appear
+          before the "ignore_" text.
+
+10/19/06: beazley
+          The lex module now provides support for flex-style start conditions
+          as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
+          Please refer to this document to understand this change note. Refer to
+          the PLY documentation for a PLY-specific explanation of how this works.
+
+          To use start conditions, you first need to declare a set of states in
+          your lexer file:
+
+              states = (
+                  ('foo', 'exclusive'),
+                  ('bar', 'inclusive')
+              )
+
+          This serves the same role as the %s and %x specifiers in flex.
+
+          Once a state has been declared, tokens for that state can be
+          declared by defining rules of the form t_state_TOK. For example:
+
+              t_PLUS = r'\+'          # Rule defined in INITIAL state
+              t_foo_NUM = r'\d+'      # Rule defined in foo state
+              t_bar_NUM = r'\d+'      # Rule defined in bar state
+
+              t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
+              t_ANY_NUM = r'\d+'      # Rule defined in all states
+
+          In addition to defining tokens for each state, the t_ignore and t_error
+          specifications can be customized for specific states. For example:
+
+              t_foo_ignore = " "      # Ignored characters for foo state
+
+              def t_bar_error(t):
+                  # Handle errors in bar state
+                  pass
+
+          With token rules, the following methods can be used to change states:
+
+              def t_TOKNAME(t):
+                  t.lexer.begin('foo')        # Begin state 'foo'
+                  t.lexer.push_state('foo')   # Begin state 'foo', push old state
+                                              # onto a stack
+                  t.lexer.pop_state()         # Restore previous state
+                  t.lexer.current_state()     # Returns name of current state
+
+          These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
+          yy_top_state() functions in flex.
+
+          Start states can be used as one way to write sub-lexers.
+          For example, the lexer or parser might instruct the lexer to start
+          generating a different set of tokens depending on the context.
+
+          example/yply/ylex.py shows the use of start states to grab C/C++
+          code fragments out of traditional yacc specification files.
+
+          *** NEW FEATURE *** Suggested by Daniel Larraz, with whom I also
+          discussed various aspects of the design.
+
+10/19/06: beazley
+          Minor change to the way in which yacc.py was reporting shift/reduce
+          conflicts. Although the underlying LALR(1) algorithm was correct,
+          PLY was under-reporting the number of conflicts compared to yacc/bison
+          when precedence rules were in effect. This change should make PLY
+          report the same number of conflicts as yacc.
+
+10/19/06: beazley
+          Modified yacc so that grammar rules could also include the '-'
+          character. For example:
+
+              def p_expr_list(p):
+                  'expression-list : expression-list expression'
+
+          Suggested by Oldrich Jedlicka.
+
+10/18/06: beazley
+          Attribute lexer.lexmatch added so that token rules can access the re
+          match object that was generated. For example:
+
+              def t_FOO(t):
+                  r'some regex'
+                  m = t.lexer.lexmatch
+                  # Do something with m
+
+          This may be useful if you want to access named groups specified within
+          the regex for a specific token. Suggested by Oldrich Jedlicka.
+
+10/16/06: beazley
+          Changed the error message that results if an illegal character
+          is encountered and no default error function is defined in lex.
+          The exception is now more informative about the actual cause of
+          the error.
+
+Version 2.1
+------------------------------
+10/02/06: beazley
+          The last Lexer object built by lex() can be found in lex.lexer.
+          The last Parser object built by yacc() can be found in yacc.parser.
+
+10/02/06: beazley
+          New example added: examples/yply
+
+          This example uses PLY to convert Unix-yacc specification files to
+          PLY programs with the same grammar. This may be useful if you
+          want to convert a grammar from bison/yacc to use with PLY.
+
+10/02/06: beazley
+          Added support for a start symbol to be specified in the yacc
+          input file itself. Just do this:
+
+              start = 'name'
+
+          where 'name' matches some grammar rule. For example:
+
+              def p_name(p):
+                  'name : A B C'
+                  ...
+
+          This mirrors the functionality of the yacc %start specifier.
+
+09/30/06: beazley
+          Some new examples added:
+
+          examples/GardenSnake : A simple indentation-based language similar
+                                 to Python. Shows how you might handle
+                                 whitespace. Contributed by Andrew Dalke.
+
+          examples/BASIC       : An implementation of 1964 Dartmouth BASIC.
+                                 Contributed by Dave against his better
+                                 judgement.
+
+09/28/06: beazley
+          Minor patch to allow named groups to be used in lex regular
+          expression rules. For example:
+
+              t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''
+
+          Patch submitted by Adam Ring.
+
+09/28/06: beazley
+          LALR(1) is now the default parsing method. To use SLR, use
+          yacc.yacc(method="SLR"). Note: there is no performance impact
+          on parsing when using LALR(1) instead of SLR. However, constructing
+          the parsing tables will take a little longer.
+
+09/26/06: beazley
+          Change to line number tracking. To modify line numbers, modify
+          the line number of the lexer itself. For example:
+
+              def t_NEWLINE(t):
+                  r'\n'
+                  t.lexer.lineno += 1
+
+          This modification is both a cleanup and a performance optimization.
+          In past versions, lex was monitoring every token for changes in
+          the line number. This extra processing is unnecessary for a vast
+          majority of tokens. Thus, this new approach cleans it up a bit.
+
+          *** POTENTIAL INCOMPATIBILITY ***
+          You will need to change code in your lexer that updates the line
+          number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1".
+
+09/26/06: beazley
+          Added the lexing position to tokens as an attribute lexpos. This
+          is the raw index into the input text at which a token appears.
+          This information can be used to compute column numbers and other
+          details (e.g., scan backwards from lexpos to the first newline
+          to get a column position).
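+
+          A sketch of such a column computation (the helper name is
+          illustrative):
+
+              def find_column(text, token):
+                  # Index of the newline before the token, or -1 on the first line
+                  last_cr = text.rfind('\n', 0, token.lexpos)
+                  return token.lexpos - last_cr   # 1-based column number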
+
+09/25/06: beazley
+          Changed the name of the __copy__() method on the Lexer class
+          to clone(). This is used to clone a Lexer object (e.g., if
+          you're running different lexers at the same time).
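+
+          For example (sketch):
+
+              lexer2 = lexer.clone()          # Independent lexer, same rules
+              lexer.input("first input")
+              lexer2.input("second input")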
+
+09/21/06: beazley
+          Limitations related to the use of the re module have been eliminated.
+          Several users reported problems with regular expressions exceeding
+          more than 100 named groups. To solve this, lex.py is now capable
+          of automatically splitting its master regular expression into
+          smaller expressions as needed. This should, in theory, make it
+          possible to specify an arbitrarily large number of tokens.
+
+09/21/06: beazley
+          Improved error checking in lex.py. Rules that match the empty string
+          are now rejected (otherwise they cause the lexer to enter an infinite
+          loop). An extra check for rules containing '#' has also been added.
+          Since lex compiles regular expressions in verbose mode, where '#' is
+          interpreted as a regex comment, it is critical to use '\#' instead.
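+
+          For example, a rule for a literal '#' token would be written as
+          (a sketch; the token name is hypothetical):
+
+              t_HASH = r'\#'    # r'#' would be read as a regex comment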
+
+09/18/06: beazley
+          Added a @TOKEN decorator function to lex.py that can be used to
+          define token rules where the documentation string might be computed
+          in some way.
+
+              digit = r'([0-9])'
+              nondigit = r'([_A-Za-z])'
+              identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'
+
+              from ply.lex import TOKEN
+
+              @TOKEN(identifier)
+              def t_ID(t):
+                  # Do whatever
+                  pass
+
+          The @TOKEN decorator merely sets the documentation string of the
+          associated token function as needed for lex to work.
+
+          Note: An alternative solution is the following:
+
+              def t_ID(t):
+                  # Do whatever
+                  pass
+
+              t_ID.__doc__ = identifier
+
+          Note: Decorators require the use of Python 2.4 or later. If compatibility
+          with old versions is needed, use the latter solution.
+
+          The need for this feature was suggested by Cem Karan.
+
+09/14/06: beazley
+          Support for single-character literal tokens has been added to yacc.
+          These literals must be enclosed in quotes. For example:
+
+              def p_expr_add(p):
+                  "expr : expr '+' expr"
+                  ...
+
+              def p_expr_sub(p):
+                  'expr : expr "-" expr'
+                  ...
+
+          In addition to this, it is necessary to tell the lexer module about
+          literal characters. This is done by defining the variable 'literals'
+          as a list of characters. This should be defined in the module that
+          invokes the lex.lex() function. For example:
+
+              literals = ['+', '-', '*', '/', '(', ')', '=']
+
+          or simply
+
+              literals = '+-*/()='
+
+          It is important to note that literals can only be a single character.
+          When the lexer fails to match a token using its normal regular expression
+          rules, it will check the current character against the literal list.
+          If found, it will be returned with a token type set to match the literal
+          character. Otherwise, an illegal character will be signalled.
+
+09/14/06: beazley
+          Modified PLY to install itself as a proper Python package called 'ply'.
+          This will make it a little more friendly to other modules. This
+          changes the usage of PLY only slightly. Just do this to import the
+          modules:
+
+              import ply.lex as lex
+              import ply.yacc as yacc
+
+          Alternatively, you can do this:
+
+              from ply import *
+
+          which imports both the lex and yacc modules.
+          Change suggested by Lee June.
+
+09/13/06: beazley
+          Changed the handling of negative indices when used in production rules.
+          A negative production index now accesses already parsed symbols on the
+          parsing stack. For example:
+
+              def p_foo(p):
+                  "foo : A B C D"
+                  print p[1]       # Value of 'A' symbol
+                  print p[2]       # Value of 'B' symbol
+                  print p[-1]      # Value of whatever symbol appears before A
+                                   # on the parsing stack.
+
+                  p[0] = some_val  # Sets the value of the 'foo' grammar symbol
+
+          This behavior makes it easier to work with embedded actions within the
+          parsing rules. For example, in C-yacc, it is possible to write code like
+          this:
+
+              bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }
+
+          In this example, the printf() code executes immediately after A has been
+          parsed. Within the embedded action code, $1 refers to the A symbol on
+          the stack.
+
+          To perform the equivalent action in PLY, you need to write a pair
+          of rules like this:
+
+              def p_bar(p):
+                  "bar : A seen_A B"
+                  do_stuff
+
+              def p_seen_A(p):
+                  "seen_A :"
+                  print "seen an A =", p[-1]
+
+          The second rule "seen_A" is merely an empty production which should be
+          reduced as soon as A is parsed in the "bar" rule above. The negative
+          index p[-1] is used to access whatever symbol appeared before the
+          seen_A symbol.
+
+          This feature also makes it possible to support inherited attributes.
+          For example:
+
+              def p_decl(p):
+                  "decl : scope name"
+
+              def p_scope(p):
+                  """scope : GLOBAL
+                           | LOCAL"""
+                  p[0] = p[1]
+
+              def p_name(p):
+                  "name : ID"
+                  if p[-1] == "GLOBAL":
+                      # ...
+                  elif p[-1] == "LOCAL":
+                      # ...
+
+          In this case, the name rule is inheriting an attribute from the
+          scope declaration that precedes it.
+
+          *** POTENTIAL INCOMPATIBILITY ***
+          If you are currently using negative indices within existing grammar rules,
+          your code will break. This should be extremely rare, if not non-existent,
+          in most cases. The argument to various grammar rules is not usually
+          processed in the same way as a list of items.
+
+Version 2.0
+------------------------------
+09/07/06: beazley
+          Major cleanup and refactoring of the LR table generation code. Both SLR
+          and LALR(1) table generation is now performed by the same code base with
+          only minor extensions for extra LALR(1) processing.
+
+09/07/06: beazley
+          Completely reimplemented the entire LALR(1) parsing engine to use the
+          DeRemer and Pennello algorithm for calculating lookahead sets. This
+          significantly improves the performance of generating LALR(1) tables
+          and has the added feature of actually working correctly! If you
+          experienced weird behavior with LALR(1) in prior releases, this should
+          hopefully resolve all of those problems. Many thanks to
+          Andrew Waters and Markus Schoepflin for submitting bug reports
+          and helping me test out the revised LALR(1) support.
+
+Version 1.8
+------------------------------
+08/02/06: beazley
+          Fixed a problem related to the handling of default actions in LALR(1)
+          parsing. If you experienced subtle and/or bizarre behavior when trying
+          to use the LALR(1) engine, this may correct those problems. Patch
+          contributed by Russ Cox. Note: This patch has been superseded by
+          revisions for LALR(1) parsing in PLY-2.0.
+
+08/02/06: beazley
+          Added support for slicing of productions in yacc.
+          Patch contributed by Patrick Mezard.
+
+Version 1.7
+------------------------------
+03/02/06: beazley
+          Fixed an infinite recursion problem in the ReduceToTerminals() function
+          that would sometimes come up in LALR(1) table generation. Reported by
+          Markus Schoepflin.
+
+03/01/06: beazley
+          Added "reflags" argument to lex(). For example:
+
+              lex.lex(reflags=re.UNICODE)
+
+          This can be used to specify optional flags to the re.compile() function
+          used inside the lexer. This may be necessary for special situations such
+          as processing Unicode (e.g., if you want escapes like \w and \b to consult
+          the Unicode character property database). The need for this was suggested
+          by Andreas Jung.
+
+03/01/06: beazley
+          Fixed a bug with an uninitialized variable on repeated instantiations of
+          parser objects when the write_tables=0 argument was used. Reported by
+          Michael Brown.
+
+03/01/06: beazley
+          Modified lex.py to accept Unicode strings both as the regular expressions
+          for tokens and as input. Hopefully this is the only change needed for
+          Unicode support. Patch contributed by Johan Dahl.
+
+03/01/06: beazley
+          Modified the class-based interface to work with new-style or old-style
+          classes. Patch contributed by Michael Brown (although I tweaked it
+          slightly so it would work with older versions of Python).
+
+Version 1.6
+------------------------------
+05/27/05: beazley
+          Incorporated a patch contributed by Christopher Stawarz to fix an
+          extremely devious bug in LALR(1) parser generation. This patch should
+          fix problems numerous people reported with LALR parsing.
+
+05/27/05: beazley
+          Fixed a problem with the lex.py copy constructor. Reported by Dave
+          Aitel, Aaron Lav, and Thad Austin.
+
+05/27/05: beazley
+          Added an outputdir option to yacc() to control the output directory.
+          Contributed by Christopher Stawarz.
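+
+          For example (sketch):
+
+              parser = yacc.yacc(outputdir="generated")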
+
+05/27/05: beazley
+          Added a rununit.py test script to run tests using the Python unittest
+          module. Contributed by Miki Tebeka.
+
+Version 1.5
+------------------------------
+05/26/04: beazley
+          Major enhancement. LALR(1) parsing support is now working.
+          This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
+          and optimized by David Beazley. To use LALR(1) parsing do
+          the following:
+
+              yacc.yacc(method="LALR")
+
+          Computing LALR(1) parsing tables takes about twice as long as
+          the default SLR method. However, LALR(1) allows you to handle
+          more complex grammars. For example, the ANSI C grammar
+          (in example/ansic) has 13 shift-reduce conflicts with SLR, but
+          only 1 shift-reduce conflict with LALR(1).
+
+05/20/04: beazley
+          Added a __len__ method to parser production lists. It can
+          be used in parser rules like this:
+
+              def p_somerule(p):
+                  """a : B C D
+                       | E F"""
+                  if len(p) == 4:
+                      pass    # Must have been the first rule
+                  elif len(p) == 3:
+                      pass    # Must be the second rule
+
+          (Note that len(p) counts p[0] as well as the right-hand-side
+          symbols, so 'a : B C D' gives len(p) == 4.)
+
+          Suggested by Joshua Gerth and others.
+
+Version 1.4
+------------------------------
+04/23/04: beazley
+          Incorporated a variety of patches contributed by Eric Raymond.
+          These include:
+
+          0. Cleans up some comments so they don't wrap on an 80-column display.
+          1. Directs compiler errors to stderr where they belong.
+          2. Implements and documents automatic line counting when \n is ignored.
+          3. Changes the way progress messages are dumped when debugging is on.
+             The new format is both less verbose and conveys more information than
+             the old, including shift and reduce actions.
+
+04/23/04: beazley
+          Added a Python setup.py file to simplify installation. Contributed
+          by Adam Kerrison.
+
+04/23/04: beazley
+          Added patches contributed by Adam Kerrison.
+
+          - Some output is now only shown when debugging is enabled. This
+            means that PLY will be completely silent when not in debugging mode.
+
+          - An optional parameter "write_tables" can be passed to yacc() to
+            control whether or not parsing tables are written. By default,
+            it is true, but it can be turned off if you don't want the yacc
+            table file. Note: disabling this will cause yacc() to regenerate
+            the parsing table each time.
+
+04/23/04: beazley
+          Added patches contributed by David McNab. This patch adds two
+          features:
+
+          - The parser can be supplied as a class instead of a module.
+            For an example of this, see the example/classcalc directory.
+
+          - Debugging output can be directed to a filename of the user's
+            choice. Use:
+
+                yacc(debugfile="somefile.out")
+
+Version 1.3
+------------------------------
+12/10/02: jmdyck
+          Various minor adjustments to the code that Dave checked in today.
+          Updated test/yacc_{inf,unused}.exp to reflect today's changes.
+
+12/10/02: beazley
+          Incorporated a variety of minor bug fixes to empty production
+          handling and infinite recursion checking. Contributed by
+          Michael Dyck.
+
+12/10/02: beazley
+          Removed a bogus recover() method call in yacc.restart().
+
+Version 1.2
+------------------------------
+11/27/02: beazley
+          Lexer and parser objects are now available as an attribute
+          of tokens and slices respectively. For example:
+
+              def t_NUMBER(t):
+                  r'\d+'
+                  print t.lexer
+
+              def p_expr_plus(t):
+                  'expr : expr PLUS expr'
+                  print t.lexer
+                  print t.parser
+
+          This can be used for state management (if needed).
+
+10/31/02: beazley
+          Modified yacc.py to work with Python optimize mode. To make
+          this work, you need to use:
+
+              yacc.yacc(optimize=1)
+
+          Furthermore, you need to first run Python in normal mode
+          to generate the necessary parsetab.py files. After that,
+          you can use python -O or python -OO.
+
+          Note: optimized mode turns off a lot of error checking.
+          Only use it when you are sure that your grammar is working.
+          Make sure parsetab.py is up to date!
+
+10/30/02: beazley
+          Added cloning of Lexer objects. For example:
+
+              import copy
+              l = lex.lex()
+              lc = copy.copy(l)
+
+              l.input("Some text")
+              lc.input("Some other text")
+              ...
+
+          This might be useful if the same "lexer" is meant to
+          be used in different contexts---or if multiple lexers
+          are running concurrently.
+
+10/30/02: beazley
+          Fixed a subtle bug with first set computation and empty productions.
+          Patch submitted by Michael Dyck.
+
+10/30/02: beazley
+          Fixed error messages to use "filename:line: message" instead
+          of "filename:line. message". This makes error reporting more
+          friendly to emacs. Patch submitted by François Pinard.
+
+10/30/02: beazley
+          Improvements to the parser.out file. Terminals and nonterminals
+          are sorted instead of being printed in random order.
+          Patch submitted by François Pinard.
+
+10/30/02: beazley
+          Improvements to parser.out file output. Rules are now printed
+          in a way that's easier to understand. Contributed by Russ Cox.
+
+10/30/02: beazley
+          Added 'nonassoc' associativity support. This can be used
+          to disable the chaining of operators like a < b < c.
+          To use, simply specify 'nonassoc' in the precedence table:
+
+              precedence = (
+                  ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
+                  ('left', 'PLUS', 'MINUS'),
+                  ('left', 'TIMES', 'DIVIDE'),
+                  ('right', 'UMINUS'),                      # Unary minus operator
+              )
+
+          Patch contributed by Russ Cox.
+
+10/30/02: beazley
+          Modified the lexer to provide optional support for Python -O and -OO
+          modes. To make this work, Python *first* needs to be run in
+          unoptimized mode. This reads the lexing information and creates a
+          file "lextab.py". Then, run lex like this:
+
+              # module foo.py
+              ...
+              ...
+              lex.lex(optimize=1)
+
+          Once the lextab file has been created, subsequent calls to
+          lex.lex() will read data from the lextab file instead of using
+          introspection. In optimized mode (-O, -OO) everything should
+          work normally despite the loss of doc strings.
+
+          To change the name of the file 'lextab.py', use the following:
+
+              lex.lex(lextab="footab")
+
+          (this creates a file footab.py)
+
+Version 1.1 October 25, 2001
+------------------------------
+
+10/25/01: beazley
+          Modified the table generator to produce much more compact data.
+          This should greatly reduce the size of the parsetab.py[c] file.
+          Caveat: the tables still need to be constructed, so a little more
+          work is done in parsetab on import.
+
+10/25/01: beazley
+          There may be a possible bug in the cycle detector that reports errors
+          about infinite recursion. I'm having a little trouble tracking it
+          down, but if you get this problem, you can disable the cycle
+          detector as follows:
+
+              yacc.yacc(check_recursion=0)
+
+10/25/01: beazley
+          Fixed a bug in lex.py that sometimes caused illegal characters to be
+          reported incorrectly. Reported by Sverre Jørgensen.
+
+7/8/01  : beazley
+          Added a reference to the underlying lexer object when tokens are handled
+          by functions. The lexer is available as the 'lexer' attribute. This
+          was added to provide better lexing support for languages such as Fortran,
+          where certain types of tokens can't be conveniently expressed as regular
+          expressions (and where the tokenizing function may want to perform a
+          little backtracking). Suggested by Pearu Peterson.
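+
+          For example, a sketch of a token function that peeks ahead in the
+          raw input via the attached lexer (the lexdata/lexpos attributes
+          are those of the lex module; the token names are hypothetical):
+
+              def t_ID(t):
+                  r'[A-Za-z_][A-Za-z0-9_]*'
+                  rest = t.lexer.lexdata[t.lexer.lexpos:]
+                  if rest.startswith('('):
+                      t.type = 'FUNC_ID'   # Must also appear in the tokens list
+                  return t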
+
+6/20/01 : beazley
+          Modified the yacc() function so that an optional starting symbol can
+          be specified. For example:
+
+              yacc.yacc(start="statement")
+
+          Normally, yacc treats the first production rule as the starting symbol.
+          However, if you are debugging your grammar it may be useful to specify
+          an alternative starting symbol. Idea suggested by Rich Salz.
+
+Version 1.0 June 18, 2001
+--------------------------
+Initial public offering
+