| Index: tools/nixysa/third_party/ply-3.1/CHANGES
|
| ===================================================================
|
| --- tools/nixysa/third_party/ply-3.1/CHANGES (revision 0)
|
| +++ tools/nixysa/third_party/ply-3.1/CHANGES (revision 0)
|
| @@ -0,0 +1,1019 @@
|
| +Version 3.1
|
| +-----------------------------
|
| +02/28/09: beazley
|
| + Fixed broken start argument to yacc(). PLY-3.0 broke this
|
| + feature by accident.
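|
| +
|
| + For example, the following works again (illustrative; 'statement'
|
| + stands in for any rule defined in your grammar):
|
| +
|
| + parser = yacc.yacc(start='statement')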
|
| +
|
| +02/28/09: beazley
|
| + Fixed debugging output. yacc() no longer reports shift/reduce
|
| + or reduce/reduce conflicts if debugging is turned off. This
|
| + restores behavior similar to PLY-2.5. Reported by Andrew Waters.
|
| +
|
| +Version 3.0
|
| +-----------------------------
|
| +02/03/09: beazley
|
| + Fixed missing lexer attribute on certain tokens when
|
| + invoking the parser p_error() function. Reported by
|
| + Bart Whiteley.
|
| +
|
| +02/02/09: beazley
|
| + The lex() command now does all error-reporting and diagnostics
|
| + using the logging module interface. Pass in a Logger object
|
| + using the errorlog parameter to specify a different logger.
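|
| +
|
| + For example, a minimal sketch (the logger name is arbitrary):
|
| +
|
| + import logging
|
| + logging.basicConfig()
|
| + lexer = lex.lex(errorlog=logging.getLogger('ply.lex'))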
|
| +
|
| +02/02/09: beazley
|
| + Refactored ply.lex to use a more object-oriented and organized
|
| + approach to collecting lexer information.
|
| +
|
| +02/01/09: beazley
|
| + Removed the nowarn option from lex(). All output is controlled
|
| + by passing in a logger object. Just pass in a logger with a high
|
| + level setting to suppress output. This argument was never
|
| + documented to begin with so hopefully no one was relying upon it.
|
| +
|
| +02/01/09: beazley
|
| + Discovered and removed a dead if-statement in the lexer. This
|
| + resulted in a 6-7% speedup in lexing when I tested it.
|
| +
|
| +01/13/09: beazley
|
| + Minor change to the procedure for signalling a syntax error in a
|
| + production rule. A normal SyntaxError exception should be raised
|
| + instead of yacc.SyntaxError.
|
| +
|
| +01/13/09: beazley
|
| + Added a new method p.set_lineno(n,lineno) that can be used to set the
|
| + line number of symbol n in grammar rules. This simplifies manual
|
| + tracking of line numbers.
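|
| +
|
| + For example, a sketch that propagates a token's line number to the
|
| + result (the rule and token names are hypothetical):
|
| +
|
| + def p_statement(p):
|
| +     'statement : PRINT expr'
|
| +     p.set_lineno(0, p.lineno(1))  # Result gets the PRINT token's line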
|
| +
|
| +01/11/09: beazley
|
| + Vastly improved debugging support for yacc.parse(). Instead of passing
|
| + debug as an integer, you can supply a Logging object (see the logging
|
| + module). Messages will be generated at the ERROR, INFO, and DEBUG
|
| + logging levels, each level providing progressively more information.
|
| + The debugging trace also shows states, grammar rule, values passed
|
| + into grammar rules, and the result of each reduction.
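|
| +
|
| + For example, something along these lines (logger setup is
|
| + illustrative):
|
| +
|
| + import logging
|
| + logging.basicConfig(level=logging.DEBUG)
|
| + result = parser.parse(data, debug=logging.getLogger())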
|
| +
|
| +01/09/09: beazley
|
| + The yacc() command now does all error-reporting and diagnostics using
|
| + the interface of the logging module. Use the errorlog parameter to
|
| + specify a logging object for error messages. Use the debuglog parameter
|
| + to specify a logging object for the 'parser.out' output.
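|
| +
|
| + For example, a sketch (the logger names are arbitrary):
|
| +
|
| + import logging
|
| + parser = yacc.yacc(debug=True,
|
| +                    errorlog=logging.getLogger('ply.errors'),
|
| +                    debuglog=logging.getLogger('ply.debug'))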
|
| +
|
| +01/09/09: beazley
|
| + *HUGE* refactoring of the ply.yacc() implementation. The high-level
|
| + user interface is backwards compatible, but the internals are completely
|
| + reorganized into classes. No more global variables. The internals
|
| + are also more extensible. For example, you can use the classes to
|
| + construct a LALR(1) parser in an entirely different manner than
|
| + what is currently the case. Documentation is forthcoming.
|
| +
|
| +01/07/09: beazley
|
| + Various cleanup and refactoring of yacc internals.
|
| +
|
| +01/06/09: beazley
|
| + Fixed a bug with precedence assignment. yacc was assigning the precedence
|
| + of each rule based on the left-most token, when in fact, it should have been
|
| + using the right-most token. Reported by Bruce Frederiksen.
|
| +
|
| +11/27/08: beazley
|
| + Numerous changes to support Python 3.0 including removal of deprecated
|
| + statements (e.g., has_key) and the addition of compatibility code
|
| + to emulate features from Python 2 that have been removed, but which
|
| + are needed. Fixed the unit testing suite to work with Python 3.0.
|
| + The code should be backwards compatible with Python 2.
|
| +
|
| +11/26/08: beazley
|
| + Loosened the rules on what kind of objects can be passed in as the
|
| + "module" parameter to lex() and yacc(). Previously, you could only use
|
| + a module or an instance. Now, PLY just uses dir() to get a list of
|
| + symbols on whatever the object is without regard for its type.
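|
| +
|
| + For instance, a plain container object should now work (a sketch;
|
| + the class name is hypothetical):
|
| +
|
| + class LexerRules(object): pass
|
| + rules = LexerRules()
|
| + rules.tokens = ('NUMBER',)
|
| + rules.t_NUMBER = r'\d+'
|
| + rules.t_ignore = ' \t'
|
| + lexer = lex.lex(module=rules)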
|
| +
|
| +11/26/08: beazley
|
| + Changed all except: statements to be compatible with Python2.x/3.x syntax.
|
| +
|
| +11/26/08: beazley
|
| + Changed all raise Exception, value statements to raise Exception(value) for
|
| + forward compatibility.
|
| +
|
| +11/26/08: beazley
|
| + Removed all print statements from lex and yacc, using sys.stdout and sys.stderr
|
| + directly. Preparation for Python 3.0 support.
|
| +
|
| +11/04/08: beazley
|
| + Fixed a bug with referring to symbols on the parsing stack using negative
|
| + indices.
|
| +
|
| +05/29/08: beazley
|
| + Completely revamped the testing system to use the unittest module for everything.
|
| + Added additional tests to cover new errors/warnings.
|
| +
|
| +Version 2.5
|
| +-----------------------------
|
| +05/28/08: beazley
|
| + Fixed a bug with writing lex-tables in optimized mode and start states.
|
| + Reported by Kevin Henry.
|
| +
|
| +Version 2.4
|
| +-----------------------------
|
| +05/04/08: beazley
|
| + A version number is now embedded in the table file signature so that
|
| + yacc can more gracefully accommodate changes to the output format
|
| + in the future.
|
| +
|
| +05/04/08: beazley
|
| + Removed undocumented .pushback() method on grammar productions. I'm
|
| + not sure this ever worked and can't recall ever using it. Might have
|
| + been an abandoned idea that never really got fleshed out. This
|
| + feature was never described or tested so removing it is hopefully
|
| + harmless.
|
| +
|
| +05/04/08: beazley
|
| + Added extra error checking to yacc() to detect precedence rules defined
|
| + for undefined terminal symbols. This allows yacc() to detect a potential
|
| + problem that can be really tricky to debug if no warning message or error
|
| + message is generated about it.
|
| +
|
| +05/04/08: beazley
|
| + lex() now has an outputdir parameter that can specify the output directory for
|
| + tables when running in optimize mode. For example:
|
| +
|
| + lexer = lex.lex(optimize=True, lextab="ltab", outputdir="foo/bar")
|
| +
|
| + The behavior of specifying a table module and output directory is
|
| + more aligned with the behavior of yacc().
|
| +
|
| +05/04/08: beazley
|
| + [Issue 9]
|
| + Fixed a filename bug when specifying the modulename in lex() and yacc().
|
| + If you specified options such as the following:
|
| +
|
| + parser = yacc.yacc(tabmodule="foo.bar.parsetab",outputdir="foo/bar")
|
| +
|
| + yacc would create a file "foo.bar.parsetab.py" in the given directory.
|
| + Now, it simply generates a file "parsetab.py" in that directory.
|
| + Bug reported by cptbinho.
|
| +
|
| +05/04/08: beazley
|
| + Slight modification to lex() and yacc() to allow their table files
|
| + to be loaded from a previously loaded module. This might make
|
| + it easier to load the parsing tables from a complicated package
|
| + structure. For example:
|
| +
|
| + import foo.bar.spam.parsetab as parsetab
|
| + parser = yacc.yacc(tabmodule=parsetab)
|
| +
|
| + Note: lex and yacc will never regenerate the table file if used
|
| + in this form---you will get a warning message instead.
|
| + This idea suggested by Brian Clapper.
|
| +
|
| +
|
| +04/28/08: beazley
|
| + Fixed a bug with p_error() functions not being picked up correctly
|
| + when running in yacc(optimize=1) mode. Patch contributed by
|
| + Bart Whiteley.
|
| +
|
| +02/28/08: beazley
|
| + Fixed a bug with 'nonassoc' precedence rules. Basically the
|
| + 'nonassoc' precedence was being ignored and not producing the correct
|
| + run-time behavior in the parser.
|
| +
|
| +02/16/08: beazley
|
| + Slight relaxation of what the input() method to a lexer will
|
| + accept as a string. Instead of testing to see
|
| + if the input is a string or unicode string, it checks to see
|
| + if the input object looks like it contains string data.
|
| + This change makes it possible to pass string-like objects
|
| + in as input. For example, the object returned by mmap.
|
| +
|
| + import mmap, os
|
| + data = mmap.mmap(os.open(filename,os.O_RDONLY),
|
| +                  os.path.getsize(filename),
|
| +                  access=mmap.ACCESS_READ)
|
| + lexer.input(data)
|
| +
|
| +
|
| +11/29/07: beazley
|
| + Modification of ply.lex to allow token functions to be aliased.
|
| + This is subtle, but it makes it easier to create libraries and
|
| + to reuse token specifications. For example, suppose you defined
|
| + a function like this:
|
| +
|
| + def number(t):
|
| +     r'\d+'
|
| +     t.value = int(t.value)
|
| +     return t
|
| +
|
| + This change would allow you to define a token rule as follows:
|
| +
|
| + t_NUMBER = number
|
| +
|
| + In this case, the token type will be set to 'NUMBER' and the
|
| + associated number() function will be used to process tokens.
|
| +
|
| +11/28/07: beazley
|
| + Slight modification to lex and yacc to grab symbols from both
|
| + the local and global dictionaries of the caller. This
|
| + modification allows lexers and parsers to be defined using
|
| + inner functions and closures.
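|
| +
|
| + For example, a lexer built entirely inside a function (a sketch):
|
| +
|
| + def make_lexer():
|
| +     tokens = ('NUMBER',)
|
| +     t_NUMBER = r'\d+'
|
| +     t_ignore = ' \t'
|
| +     def t_error(t):
|
| +         t.lexer.skip(1)
|
| +     return lex.lex()
|
| +
|
| + lexer = make_lexer()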
|
| +
|
| +11/28/07: beazley
|
| + Performance optimization: The lexer.lexmatch and t.lexer
|
| + attributes are no longer set for lexer tokens that are not
|
| + defined by functions. The only normal use of these attributes
|
| + would be in lexer rules that need to perform some kind of
|
| + special processing. Thus, it doesn't make any sense to set
|
| + them on every token.
|
| +
|
| + *** POTENTIAL INCOMPATIBILITY *** This might break code
|
| + that is mucking around with internal lexer state in some
|
| + sort of magical way.
|
| +
|
| +11/27/07: beazley
|
| + Added the ability to put the parser into error-handling mode
|
| + from within a normal production. To do this, simply raise
|
| + a yacc.SyntaxError exception like this:
|
| +
|
| + def p_some_production(p):
|
| +     'some_production : prod1 prod2'
|
| +     ...
|
| +     raise yacc.SyntaxError # Signal an error
|
| +
|
| + A number of things happen after this occurs:
|
| +
|
| + - The last symbol shifted onto the symbol stack is discarded
|
| + and the parser state is backed up to what it was before
|
| + the rule reduction.
|
| +
|
| + - The current lookahead symbol is saved and replaced by
|
| + the 'error' symbol.
|
| +
|
| + - The parser enters error recovery mode where it tries
|
| + to either reduce the 'error' rule or it starts
|
| + discarding items off of the stack until the parser
|
| + resets.
|
| +
|
| + When an error is manually set, the parser does *not* call
|
| + the p_error() function (if any is defined).
|
| + *** NEW FEATURE *** Suggested on the mailing list
|
| +
|
| +11/27/07: beazley
|
| + Fixed structure bug in examples/ansic. Reported by Dion Blazakis.
|
| +
|
| +11/27/07: beazley
|
| + Fixed a bug in the lexer related to start conditions and ignored
|
| + token rules. If a rule was defined that changed state, but
|
| + returned no token, the lexer could be left in an inconsistent
|
| + state. Reported by
|
| +
|
| +11/27/07: beazley
|
| + Modified setup.py to support Python Eggs. Patch contributed by
|
| + Simon Cross.
|
| +
|
| +11/09/07: beazley
|
| + Fixed a bug in error handling in yacc. If a syntax error occurred and the
|
| + parser rolled the entire parse stack back, the parser would be left in in
|
| + inconsistent state that would cause it to trigger incorrect actions on
|
| + subsequent input. Reported by Ton Biegstraaten, Justin King, and others.
|
| +
|
| +11/09/07: beazley
|
| + Fixed a bug when passing empty input strings to yacc.parse(). This
|
| + would result in an error message about "No input given". Reported
|
| + by Andrew Dalke.
|
| +
|
| +Version 2.3
|
| +-----------------------------
|
| +02/20/07: beazley
|
| + Fixed a bug with character literals if the literal '.' appeared as the
|
| + last symbol of a grammar rule. Reported by Ales Smrcka.
|
| +
|
| +02/19/07: beazley
|
| + Warning messages are now redirected to stderr instead of being printed
|
| + to standard output.
|
| +
|
| +02/19/07: beazley
|
| + Added a warning message to lex.py if it detects a literal backslash
|
| + character inside the t_ignore declaration. This is to help catch
|
| + problems that might occur if someone accidentally defines t_ignore
|
| + as a Python raw string. For example:
|
| +
|
| + t_ignore = r' \t'
|
| +
|
| + The idea for this is from an email I received from David Cimimi who
|
| + reported bizarre behavior in lexing as a result of defining t_ignore
|
| + as a raw string by accident.
|
| +
|
| +02/18/07: beazley
|
| + Performance improvements. Made some changes to the internal
|
| + table organization and LR parser to improve parsing performance.
|
| +
|
| +02/18/07: beazley
|
| + Automatic tracking of line number and position information must now be
|
| + enabled by a special flag to parse(). For example:
|
| +
|
| + yacc.parse(data,tracking=True)
|
| +
|
| + In many applications, it's just not that important to have the
|
| + parser automatically track all line numbers. By making this an
|
| + optional feature, it allows the parser to run significantly faster
|
| + (more than a 20% speed increase in many cases). Note: positional
|
| + information is always available for raw tokens---this change only
|
| + applies to positional information associated with nonterminal
|
| + grammar symbols.
|
| + *** POTENTIAL INCOMPATIBILITY ***
|
| +
|
| +02/18/07: beazley
|
| + Yacc no longer supports extended slices of grammar productions.
|
| + However, it does support regular slices. For example:
|
| +
|
| + def p_foo(p):
|
| +     '''foo : a b c d e'''
|
| +     p[0] = p[1:3]
|
| +
|
| + This change is a performance improvement to the parser--it streamlines
|
| + normal access to the grammar values since slices are now handled in
|
| + a __getslice__() method as opposed to __getitem__().
|
| +
|
| +02/12/07: beazley
|
| + Fixed a bug in the handling of token names when combined with
|
| + start conditions. Bug reported by Todd O'Bryan.
|
| +
|
| +Version 2.2
|
| +------------------------------
|
| +11/01/06: beazley
|
| + Added lexpos() and lexspan() methods to grammar symbols. These
|
| + mirror the same functionality of lineno() and linespan(). For
|
| + example:
|
| +
|
| + def p_expr(p):
|
| +     'expr : expr PLUS expr'
|
| +     p.lexpos(1)               # Lexing position of left-hand-expression
|
| +     p.lexpos(2)               # Lexing position of PLUS
|
| +     start,end = p.lexspan(3)  # Lexing range of right hand expression
|
| +
|
| +11/01/06: beazley
|
| + Minor change to error handling. The recommended way to skip characters
|
| + in the input is to use t.lexer.skip() as shown here:
|
| +
|
| + def t_error(t):
|
| +     print "Illegal character '%s'" % t.value[0]
|
| +     t.lexer.skip(1)
|
| +
|
| + The old approach of just using t.skip(1) will still work, but won't
|
| + be documented.
|
| +
|
| +10/31/06: beazley
|
| + Discarded tokens can now be specified as simple strings instead of
|
| + functions. To do this, simply include the text "ignore_" in the
|
| + token declaration. For example:
|
| +
|
| + t_ignore_cppcomment = r'//.*'
|
| +
|
| + Previously, this had to be done with a function. For example:
|
| +
|
| + def t_ignore_cppcomment(t):
|
| +     r'//.*'
|
| +     pass
|
| +
|
| + If start conditions/states are being used, state names should appear
|
| + before the "ignore_" text.
|
| +
|
| +10/19/06: beazley
|
| + The Lex module now provides support for flex-style start conditions
|
| + as described at http://www.gnu.org/software/flex/manual/html_chapter/flex_11.html.
|
| + Please refer to this document to understand this change note. Refer to
|
| + the PLY documentation for PLY-specific explanation of how this works.
|
| +
|
| + To use start conditions, you first need to declare a set of states in
|
| + your lexer file:
|
| +
|
| + states = (
|
| +     ('foo','exclusive'),
|
| +     ('bar','inclusive')
|
| + )
|
| +
|
| + This serves the same role as the %s and %x specifiers in flex.
|
| +
|
| + Once a state has been declared, tokens for that state can be
|
| + declared by defining rules of the form t_state_TOK. For example:
|
| +
|
| + t_PLUS = r'\+'          # Rule defined in INITIAL state
|
| + t_foo_NUM = r'\d+'      # Rule defined in foo state
|
| + t_bar_NUM = r'\d+'      # Rule defined in bar state
|
| +
|
| + t_foo_bar_NUM = r'\d+'  # Rule defined in both foo and bar
|
| + t_ANY_NUM = r'\d+'      # Rule defined in all states
|
| +
|
| + In addition to defining tokens for each state, the t_ignore and t_error
|
| + specifications can be customized for specific states. For example:
|
| +
|
| + t_foo_ignore = " " # Ignored characters for foo state
|
| + def t_bar_error(t):
|
| +     pass    # Handle errors in bar state
|
| +
|
| + Within token rules, the following methods can be used to change states:
|
| +
|
| + def t_TOKNAME(t):
|
| +     t.lexer.begin('foo')       # Begin state 'foo'
|
| +     t.lexer.push_state('foo')  # Begin state 'foo', push old state
|
| +                                # onto a stack
|
| +     t.lexer.pop_state()        # Restore previous state
|
| +     t.lexer.current_state()    # Returns name of current state
|
| +
|
| + These methods mirror the BEGIN(), yy_push_state(), yy_pop_state(), and
|
| + yy_top_state() functions in flex.
|
| +
|
| + Start states can be used as one way to write sub-lexers.
|
| + For example, the lexer or parser might instruct the lexer to start
|
| + generating a different set of tokens depending on the context.
|
| +
|
| + example/yply/ylex.py shows the use of start states to grab C/C++
|
| + code fragments out of traditional yacc specification files.
|
| +
|
| + *** NEW FEATURE *** Suggested by Daniel Larraz with whom I also
|
| + discussed various aspects of the design.
|
| +
|
| +10/19/06: beazley
|
| + Minor change to the way in which yacc.py was reporting shift/reduce
|
| + conflicts. Although the underlying LALR(1) algorithm was correct,
|
| + PLY was under-reporting the number of conflicts compared to yacc/bison
|
| + when precedence rules were in effect. This change should make PLY
|
| + report the same number of conflicts as yacc.
|
| +
|
| +10/19/06: beazley
|
| + Modified yacc so that grammar rules could also include the '-'
|
| + character. For example:
|
| +
|
| + def p_expr_list(p):
|
| +     'expression-list : expression-list expression'
|
| +
|
| + Suggested by Oldrich Jedlicka.
|
| +
|
| +10/18/06: beazley
|
| + Attribute lexer.lexmatch added so that token rules can access the re
|
| + match object that was generated. For example:
|
| +
|
| + def t_FOO(t):
|
| +     r'some regex'
|
| +     m = t.lexer.lexmatch
|
| +     # Do something with m
|
| +
|
| +
|
| + This may be useful if you want to access named groups specified within
|
| + the regex for a specific token. Suggested by Oldrich Jedlicka.
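|
| +
|
| + For instance, a named group can be pulled out of the match (a sketch;
|
| + the group and token names are arbitrary):
|
| +
|
| + def t_ASSIGN(t):
|
| +     r'(?P<name>[A-Za-z_]\w*)\s*='
|
| +     t.value = t.lexer.lexmatch.group('name')
|
| +     return t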
|
| +
|
| +10/16/06: beazley
|
| + Changed the error message that results if an illegal character
|
| + is encountered and no default error function is defined in lex.
|
| + The exception is now more informative about the actual cause of
|
| + the error.
|
| +
|
| +Version 2.1
|
| +------------------------------
|
| +10/02/06: beazley
|
| + The last Lexer object built by lex() can be found in lex.lexer.
|
| + The last Parser object built by yacc() can be found in yacc.parser.
|
| +
|
| +10/02/06: beazley
|
| + New example added: examples/yply
|
| +
|
| + This example uses PLY to convert Unix-yacc specification files to
|
| + PLY programs with the same grammar. This may be useful if you
|
| + want to convert a grammar from bison/yacc to use with PLY.
|
| +
|
| +10/02/06: beazley
|
| + Added support for a start symbol to be specified in the yacc
|
| + input file itself. Just do this:
|
| +
|
| + start = 'name'
|
| +
|
| + where 'name' matches some grammar rule. For example:
|
| +
|
| + def p_name(p):
|
| +     'name : A B C'
|
| +     ...
|
| +
|
| + This mirrors the functionality of the yacc %start specifier.
|
| +
|
| +09/30/06: beazley
|
| + Some new examples added:
|
| +
|
| + examples/GardenSnake : A simple indentation based language similar
|
| +                        to Python. Shows how you might handle
|
| +                        whitespace. Contributed by Andrew Dalke.
|
| +
|
| + examples/BASIC : An implementation of 1964 Dartmouth BASIC.
|
| +                  Contributed by Dave against his better
|
| +                  judgement.
|
| +
|
| +09/28/06: beazley
|
| + Minor patch to allow named groups to be used in lex regular
|
| + expression rules. For example:
|
| +
|
| + t_QSTRING = r'''(?P<quote>['"]).*?(?P=quote)'''
|
| +
|
| + Patch submitted by Adam Ring.
|
| +
|
| +09/28/06: beazley
|
| + LALR(1) is now the default parsing method. To use SLR, use
|
| + yacc.yacc(method="SLR"). Note: there is no performance impact
|
| + on parsing when using LALR(1) instead of SLR. However, constructing
|
| + the parsing tables will take a little longer.
|
| +
|
| +09/26/06: beazley
|
| + Change to line number tracking. To modify line numbers, modify
|
| + the line number of the lexer itself. For example:
|
| +
|
| + def t_NEWLINE(t):
|
| +     r'\n'
|
| +     t.lexer.lineno += 1
|
| +
|
| + This modification is both cleanup and a performance optimization.
|
| + In past versions, lex was monitoring every token for changes in
|
| + the line number. This extra processing is unnecessary for a vast
|
| + majority of tokens. Thus, this new approach cleans it up a bit.
|
| +
|
| + *** POTENTIAL INCOMPATIBILITY ***
|
| + You will need to change code in your lexer that updates the line
|
| + number. For example, "t.lineno += 1" becomes "t.lexer.lineno += 1"
|
| +
|
| +09/26/06: beazley
|
| + Added the lexing position to tokens as an attribute lexpos. This
|
| + is the raw index into the input text at which a token appears.
|
| + This information can be used to compute column numbers and other
|
| + details (e.g., scan backwards from lexpos to the first newline
|
| + to get a column position).
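|
| +
|
| + For example, a column-computing helper along these lines (a sketch;
|
| + 'input' is the original string handed to the lexer):
|
| +
|
| + def find_column(input, token):
|
| +     line_start = input.rfind('\n', 0, token.lexpos) + 1
|
| +     return (token.lexpos - line_start) + 1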
|
| +
|
| +09/25/06: beazley
|
| + Changed the name of the __copy__() method on the Lexer class
|
| + to clone(). This is used to clone a Lexer object (e.g., if
|
| + you're running different lexers at the same time).
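|
| +
|
| + For example (a sketch):
|
| +
|
| + lexer2 = lexer.clone()      # Same rules, independent state
|
| + lexer.input("some text")
|
| + lexer2.input("other text")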
|
| +
|
| +09/21/06: beazley
|
| + Limitations related to the use of the re module have been eliminated.
|
| + Several users reported problems with regular expressions exceeding
|
| + more than 100 named groups. To solve this, lex.py is now capable
|
| + of automatically splitting its master regular expression into
|
| + smaller expressions as needed. This should, in theory, make it
|
| + possible to specify an arbitrarily large number of tokens.
|
| +
|
| +09/21/06: beazley
|
| + Improved error checking in lex.py. Rules that match the empty string
|
| + are now rejected (otherwise they cause the lexer to enter an infinite
|
| + loop). An extra check for rules containing '#' has also been added.
|
| + Since lex compiles regular expressions in verbose mode, '#' is interpreted
|
| + as a regex comment, so it is critical to use '\#' instead.
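|
| +
|
| + For example, to match a literal '#' (a sketch):
|
| +
|
| + t_HASH = r'\#'     # OK: escaped
|
| + # t_HASH = r'#'    # Wrong: starts a verbose-mode comment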
|
| +
|
| +09/18/06: beazley
|
| + Added a @TOKEN decorator function to lex.py that can be used to
|
| + define token rules where the documentation string might be computed
|
| + in some way.
|
| +
|
| + digit = r'([0-9])'
|
| + nondigit = r'([_A-Za-z])'
|
| + identifier = r'(' + nondigit + r'(' + digit + r'|' + nondigit + r')*)'
|
| +
|
| + from ply.lex import TOKEN
|
| +
|
| + @TOKEN(identifier)
|
| + def t_ID(t):
|
| +     pass    # Do whatever
|
| +
|
| + The @TOKEN decorator merely sets the documentation string of the
|
| + associated token function as needed for lex to work.
|
| +
|
| + Note: An alternative solution is the following:
|
| +
|
| + def t_ID(t):
|
| +     pass    # Do whatever
|
| +
|
| + t_ID.__doc__ = identifier
|
| +
|
| + Note: Decorators require the use of Python 2.4 or later. If compatibility
|
| + with old versions is needed, use the latter solution.
|
| +
|
| + The need for this feature was suggested by Cem Karan.
|
| +
|
| +09/14/06: beazley
|
| + Support for single-character literal tokens has been added to yacc.
|
| + These literals must be enclosed in quotes. For example:
|
| +
|
| + def p_expr(p):
|
| +     "expr : expr '+' expr"
|
| +     ...
|
| +
|
| + def p_expr(p):
|
| +     'expr : expr "-" expr'
|
| +     ...
|
| +
|
| + In addition to this, it is necessary to tell the lexer module about
|
| + literal characters. This is done by defining the variable 'literals'
|
| + as a list of characters. This should be defined in the module that
|
| + invokes the lex.lex() function. For example:
|
| +
|
| + literals = ['+','-','*','/','(',')','=']
|
| +
|
| + or simply
|
| +
|
| + literals = '+=*/()='
|
| +
|
| + It is important to note that literals can only be a single character.
|
| + When the lexer fails to match a token using its normal regular expression
|
| + rules, it will check the current character against the literal list.
|
| + If found, it will be returned with a token type set to match the literal
|
| + character. Otherwise, an illegal character will be signalled.
|
| +
|
| +
|
| +09/14/06: beazley
|
| + Modified PLY to install itself as a proper Python package called 'ply'.
|
| + This will make it a little more friendly to other modules. This
|
| + changes the usage of PLY only slightly. Just do this to import the
|
| + modules:
|
| +
|
| + import ply.lex as lex
|
| + import ply.yacc as yacc
|
| +
|
| + Alternatively, you can do this:
|
| +
|
| + from ply import *
|
| +
|
| + This imports both the lex and yacc modules.
|
| + Change suggested by Lee June.
|
| +
|
| +09/13/06: beazley
|
| + Changed the handling of negative indices when used in production rules.
|
| + A negative production index now accesses already parsed symbols on the
|
| + parsing stack. For example,
|
| +
|
| + def p_foo(p):
|
| +     "foo : A B C D"
|
| +     print p[1]      # Value of 'A' symbol
|
| +     print p[2]      # Value of 'B' symbol
|
| +     print p[-1]     # Value of whatever symbol appears before A
|
| +                     # on the parsing stack.
|
| +
|
| +     p[0] = some_val # Sets the value of the 'foo' grammar symbol
|
| +
|
| + This behavior makes it easier to work with embedded actions within the
|
| + parsing rules. For example, in C-yacc, it is possible to write code like
|
| + this:
|
| +
|
| + bar: A { printf("seen an A = %d\n", $1); } B { do_stuff; }
|
| +
|
| + In this example, the printf() code executes immediately after A has been
|
| + parsed. Within the embedded action code, $1 refers to the A symbol on
|
| + the stack.
|
| +
|
| + To perform this equivalent action in PLY, you need to write a pair
|
| + of rules like this:
|
| +
|
| + def p_bar(p):
|
| +     "bar : A seen_A B"
|
| +     do_stuff
|
| +
|
| + def p_seen_A(p):
|
| +     "seen_A :"
|
| +     print "seen an A =", p[-1]
|
| +
|
| + The second rule "seen_A" is merely a empty production which should be
|
| + reduced as soon as A is parsed in the "bar" rule above. The use
|
| + of the negative index p[-1] is used to access whatever symbol appeared
|
| + before the seen_A symbol.
|
| +
|
| + This feature also makes it possible to support inherited attributes.
|
| + For example:
|
| +
|
| + def p_decl(p):
|
| +     "decl : scope name"
|
| +
|
| + def p_scope(p):
|
| +     """scope : GLOBAL
|
| +              | LOCAL"""
|
| +     p[0] = p[1]
|
| +
|
| + def p_name(p):
|
| +     "name : ID"
|
| +     if p[-1] == "GLOBAL":
|
| +         pass    # ...
|
| +     elif p[-1] == "LOCAL":
|
| +         pass    # ...
|
| +
|
| + In this case, the name rule is inheriting an attribute from the
|
| + scope declaration that precedes it.
|
| +
|
| + *** POTENTIAL INCOMPATIBILITY ***
|
| + If you are currently using negative indices within existing grammar rules,
|
| + your code will break. This should be extremely rare, if not non-existent,
|
| + in most cases. The argument to various grammar rules is usually not
|
| + processed in the same way as a list of items.
|
| +
|
| +Version 2.0
|
| +------------------------------
|
| +09/07/06: beazley
|
| + Major cleanup and refactoring of the LR table generation code. Both SLR
|
| + and LALR(1) table generation are now performed by the same code base with
|
| + only minor extensions for extra LALR(1) processing.
|
| +
|
| +09/07/06: beazley
|
| + Completely reimplemented the entire LALR(1) parsing engine to use the
|
| + DeRemer and Pennello algorithm for calculating lookahead sets. This
|
| + significantly improves the performance of generating LALR(1) tables
|
| + and has the added feature of actually working correctly! If you
|
| + experienced weird behavior with LALR(1) in prior releases, this should
|
| + hopefully resolve all of those problems. Many thanks to
|
| + Andrew Waters and Markus Schoepflin for submitting bug reports
|
| + and helping me test out the revised LALR(1) support.
|
| +
|
| +Version 1.8
|
| +------------------------------
|
| +08/02/06: beazley
|
| + Fixed a problem related to the handling of default actions in LALR(1)
|
| + parsing. If you experienced subtle and/or bizarre behavior when trying
|
| + to use the LALR(1) engine, this may correct those problems. Patch
|
| + contributed by Russ Cox. Note: This patch has been superseded by
|
| + revisions for LALR(1) parsing in Ply-2.0.
|
| +
|
| +08/02/06: beazley
|
| + Added support for slicing of productions in yacc.
|
| + Patch contributed by Patrick Mezard.
|
| +
|
| +Version 1.7
|
| +------------------------------
|
| +03/02/06: beazley
|
| + Fixed an infinite recursion problem in the ReduceToTerminals() function that
|
| + would sometimes come up in LALR(1) table generation. Reported by
|
| + Markus Schoepflin.
|
| +
|
| +03/01/06: beazley
|
| + Added "reflags" argument to lex(). For example:
|
| +
|
| + lex.lex(reflags=re.UNICODE)
|
| +
|
| + This can be used to specify optional flags to the re.compile() function
|
| + used inside the lexer. This may be necessary for special situations such
|
| + as processing Unicode (e.g., if you want escapes like \w and \b to consult
|
| + the Unicode character property database). The need for this suggested by
|
| + Andreas Jung.
|
| +
|
| +03/01/06: beazley
|
| + Fixed a bug with an uninitialized variable on repeated instantiations of parser
|
| + objects when the write_tables=0 argument was used. Reported by Michael Brown.
|
| +
|
| +03/01/06: beazley
|
| + Modified lex.py to accept Unicode strings both as the regular expressions for
|
| + tokens and as input. Hopefully this is the only change needed for Unicode support.
|
| + Patch contributed by Johan Dahl.
|
| +
|
| +03/01/06: beazley
|
| + Modified the class-based interface to work with new-style or old-style classes.
|
| + Patch contributed by Michael Brown (although I tweaked it slightly so it would work
|
| + with older versions of Python).
|
| +
|
| +Version 1.6
|
| +------------------------------
|
| +05/27/05: beazley
|
| + Incorporated patch contributed by Christopher Stawarz to fix an extremely
|
| + devious bug in LALR(1) parser generation. This patch should fix problems
|
| + numerous people reported with LALR parsing.
|
| +
|
| +05/27/05: beazley
|
| + Fixed problem with lex.py copy constructor. Reported by Dave Aitel, Aaron Lav,
|
| + and Thad Austin.
|
| +
|
| +05/27/05: beazley
|
| + Added outputdir option to yacc() to control output directory. Contributed
|
| + by Christopher Stawarz.
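|
| +
|
| + For example (the directory name is illustrative):
|
| +
|
| + parser = yacc.yacc(outputdir="generated")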
|
| +
|
| +05/27/05: beazley
|
| + Added rununit.py test script to run tests using the Python unittest module.
|
| + Contributed by Miki Tebeka.
|
| +
|
| +Version 1.5
|
| +------------------------------
|
| +05/26/04: beazley
|
| + Major enhancement. LALR(1) parsing support is now working.
|
| + This feature was implemented by Elias Ioup (ezioup@alumni.uchicago.edu)
|
| + and optimized by David Beazley. To use LALR(1) parsing do
|
| + the following:
|
| +
|
| + yacc.yacc(method="LALR")
|
| +
|
| + Computing LALR(1) parsing tables takes about twice as long as
|
| + the default SLR method. However, LALR(1) allows you to handle
|
| + more complex grammars. For example, the ANSI C grammar
|
| + (in example/ansic) has 13 shift-reduce conflicts with SLR, but
|
| + only has 1 shift-reduce conflict with LALR(1).
|
| +
|
| +05/20/04: beazley
|
| + Added a __len__ method to parser production lists. Can
|
| + be used in parser rules like this:
|
| +
|
| + def p_somerule(p):
|
| +     """a : B C D
|
| +        | E F"""
|
| +     if len(p) == 4:
|
| +         pass    # Must have been first rule
|
| +     elif len(p) == 3:
|
| +         pass    # Must be second rule
|
| +
|
| + Suggested by Joshua Gerth and others.
|
| +
|
| +Version 1.4
|
| +------------------------------
|
| +04/23/04: beazley
|
| + Incorporated a variety of patches contributed by Eric Raymond.
|
| + These include:
|
| +
|
| + 0. Cleans up some comments so they don't wrap on an 80-column display.
|
| + 1. Directs compiler errors to stderr where they belong.
|
| + 2. Implements and documents automatic line counting when \n is ignored.
|
| + 3. Changes the way progress messages are dumped when debugging is on.
|
| + The new format is both less verbose and conveys more information than
|
| + the old, including shift and reduce actions.
|
| +
|
| +04/23/04: beazley
|
| + Added a Python setup.py file to simplify installation. Contributed
|
| + by Adam Kerrison.
|
| +
|
| +04/23/04: beazley
|
| + Added patches contributed by Adam Kerrison.
|
| +
|
| + - Some output is now only shown when debugging is enabled. This
|
| + means that PLY will be completely silent when not in debugging mode.
|
| +
|
| + - An optional parameter "write_tables" can be passed to yacc() to
|
| + control whether or not parsing tables are written. By default,
|
| + it is true, but it can be turned off if you don't want the yacc
|
| + table file. Note: disabling this will cause yacc() to regenerate
|
| + the parsing table each time.
|
| +
|
| +04/23/04: beazley
|
| + Added patches contributed by David McNab. This patch adds two
|
| + features:
|
| +
|
| + - The parser can be supplied as a class instead of a module.
|
| + For an example of this, see the example/classcalc directory.
|
| +
|
| + - Debugging output can be directed to a filename of the user's
|
| + choice. Use
|
| +
|
| + yacc(debugfile="somefile.out")
|
| +
|
| +
|
| +Version 1.3
|
| +------------------------------
|
| +12/10/02: jmdyck
|
| + Various minor adjustments to the code that Dave checked in today.
|
| + Updated test/yacc_{inf,unused}.exp to reflect today's changes.
|
| +
|
| +12/10/02: beazley
|
| + Incorporated a variety of minor bug fixes to empty production
|
| + handling and infinite recursion checking. Contributed by
|
| + Michael Dyck.
|
| +
|
| +12/10/02: beazley
|
| + Removed bogus recover() method call in yacc.restart()
|
| +
|
| +Version 1.2
|
| +------------------------------
|
| +11/27/02: beazley
|
| + Lexer and parser objects are now available as an attribute
|
| + of tokens and slices respectively. For example:
|
| +
|
| + def t_NUMBER(t):
|
| +     r'\d+'
|
| +     print t.lexer
|
| +
|
| + def p_expr_plus(t):
|
| +     'expr : expr PLUS expr'
|
| +     print t.lexer
|
| +     print t.parser
|
| +
|
| + This can be used for state management (if needed).
|
| +
|
| +10/31/02: beazley
|
| + Modified yacc.py to work with Python optimize mode. To make
|
| + this work, you need to use
|
| +
|
| + yacc.yacc(optimize=1)
|
| +
|
| + Furthermore, you need to first run Python in normal mode
|
| + to generate the necessary parsetab.py files. After that,
|
| + you can use python -O or python -OO.
|
| +
|
| + Note: optimized mode turns off a lot of error checking.
|
| + Only use when you are sure that your grammar is working.
|
| + Make sure parsetab.py is up to date!
|
| +
|
| +10/30/02: beazley
|
| + Added cloning of Lexer objects. For example:
|
| +
|
| + import copy
|
| + l = lex.lex()
|
| + lc = copy.copy(l)
|
| +
|
| + l.input("Some text")
|
| + lc.input("Some other text")
|
| + ...
|
| +
|
| + This might be useful if the same "lexer" is meant to
|
| + be used in different contexts---or if multiple lexers
|
| + are running concurrently.
|
| +
|
| +10/30/02: beazley
|
| + Fixed subtle bug with first set computation and empty productions.
|
| + Patch submitted by Michael Dyck.
|
| +
|
| +10/30/02: beazley
|
| + Fixed error messages to use "filename:line: message" instead
|
| + of "filename:line. message". This makes error reporting more
|
| + friendly to emacs. Patch submitted by François Pinard.
|
| +
|
| +10/30/02: beazley
|
| + Improvements to parser.out file. Terminals and nonterminals
|
| + are sorted instead of being printed in random order.
|
| + Patch submitted by François Pinard.
|
| +
|
| +10/30/02: beazley
|
| + Improvements to parser.out file output. Rules are now printed
|
| + in a way that's easier to understand. Contributed by Russ Cox.
|
| +
|
| +10/30/02: beazley
|
| + Added 'nonassoc' associativity support. This can be used
|
| + to disable the chaining of operators like a < b < c.
|
| + To use, simply specify 'nonassoc' in the precedence table
|
| +
|
| + precedence = (
|
| +     ('nonassoc', 'LESSTHAN', 'GREATERTHAN'),  # Nonassociative operators
|
| +     ('left', 'PLUS', 'MINUS'),
|
| +     ('left', 'TIMES', 'DIVIDE'),
|
| +     ('right', 'UMINUS'),                      # Unary minus operator
|
| + )
|
| +
|
| + Patch contributed by Russ Cox.
|
| +
|
| +10/30/02: beazley
|
| + Modified the lexer to provide optional support for Python -O and -OO
|
| + modes. To make this work, Python *first* needs to be run in
|
| + unoptimized mode. This reads the lexing information and creates a
|
| + file "lextab.py". Then, run lex like this:
|
| +
|
| + # module foo.py
|
| + ...
|
| + ...
|
| + lex.lex(optimize=1)
|
| +
|
| + Once the lextab file has been created, subsequent calls to
|
| + lex.lex() will read data from the lextab file instead of using
|
| + introspection. In optimized mode (-O, -OO) everything should
|
| + work normally despite the loss of doc strings.
|
| +
|
| + To change the name of the file 'lextab.py' use the following:
|
| +
|
| + lex.lex(lextab="footab")
|
| +
|
| + (this creates a file footab.py)
|
| +
|
| +
|
| +Version 1.1 October 25, 2001
|
| +------------------------------
|
| +
|
| +10/25/01: beazley
|
| + Modified the table generator to produce much more compact data.
|
| + This should greatly reduce the size of the parsetab.py[c] file.
|
| + Caveat: the tables still need to be constructed so a little more
|
| + work is done in parsetab on import.
|
| +
|
| +10/25/01: beazley
|
| + There may be a bug in the cycle detector that reports errors
|
| + about infinite recursion. I'm having a little trouble tracking it
|
| + down, but if you get this problem, you can disable the cycle
|
| + detector as follows:
|
| +
|
| + yacc.yacc(check_recursion = 0)
|
| +
|
| +10/25/01: beazley
|
| + Fixed a bug in lex.py that sometimes caused illegal characters to be
|
| + reported incorrectly. Reported by Sverre Jørgensen.
|
| +
|
| +7/8/01 : beazley
|
| + Added a reference to the underlying lexer object when tokens are handled by
|
| + functions. The lexer is available as the 'lexer' attribute. This
|
| + was added to provide better lexing support for languages such as Fortran
|
| + where certain types of tokens can't be conveniently expressed as regular
|
| + expressions (and where the tokenizing function may want to perform a
|
| + little backtracking). Suggested by Pearu Peterson.
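|
| +
|
| + For example, a rule can peek at upcoming input through the lexer
|
| + (a sketch; the token names are hypothetical):
|
| +
|
| + def t_ID(t):
|
| +     r'[A-Za-z_]\w*'
|
| +     if t.lexer.lexdata[t.lexer.lexpos:t.lexer.lexpos+1] == '(':
|
| +         t.type = 'FUNC'   # Reclassify based on one-character lookahead
|
| +     return t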
|
| +
|
| +6/20/01 : beazley
|
| + Modified yacc() function so that an optional starting symbol can be specified.
|
| + For example:
|
| +
|
| + yacc.yacc(start="statement")
|
| +
|
| + Normally yacc always treats the first production rule as the starting symbol.
|
| + However, if you are debugging your grammar it may be useful to specify
|
| + an alternative starting symbol. Idea suggested by Rich Salz.
|
| +
|
| +Version 1.0 June 18, 2001
|
| +--------------------------
|
| +Initial public offering
|
| +
|
|
|