OLD | NEW |
(Empty) | |
| 1 <html> |
| 2 <head> |
| 3 <title>PLY Internals</title> |
| 4 </head> |
| 5 <body bgcolor="#ffffff"> |
| 6 |
| 7 <h1>PLY Internals</h1> |
| 8 |
| 9 <b> |
| 10 David M. Beazley <br> |
| 11 dave@dabeaz.com<br> |
| 12 </b> |
| 13 |
| 14 <p> |
| 15 <b>PLY Version: 3.0</b> |
| 16 <p> |
| 17 |
| 18 <!-- INDEX --> |
| 19 <div class="sectiontoc"> |
| 20 <ul> |
| 21 <li><a href="#internal_nn1">Introduction</a> |
| 22 <li><a href="#internal_nn2">Grammar Class</a> |
| 23 <li><a href="#internal_nn3">Productions</a> |
| 24 <li><a href="#internal_nn4">LRItems</a> |
| 25 <li><a href="#internal_nn5">LRTable</a> |
| 26 <li><a href="#internal_nn6">LRGeneratedTable</a> |
| 27 <li><a href="#internal_nn7">LRParser</a> |
| 28 <li><a href="#internal_nn8">ParserReflect</a> |
| 29 <li><a href="#internal_nn9">High-level operation</a> |
| 30 </ul> |
| 31 </div> |
| 32 <!-- INDEX --> |
| 33 |
| 34 |
| 35 <H2><a name="internal_nn1"></a>1. Introduction</H2> |
| 36 |
| 37 |
| 38 This document describes classes and functions that make up the internal |
| 39 operation of PLY. Using this programming interface, it is possible to |
| 40 manually build an parser using a different interface specification |
| 41 than what PLY normally uses. For example, you could build a gramar |
| 42 from information parsed in a completely different input format. Some of |
| 43 these objects may be useful for building more advanced parsing engines |
| 44 such as GLR. |
| 45 |
| 46 <p> |
| 47 It should be stressed that using PLY at this level is not for the |
| 48 faint of heart. Generally, it's assumed that you know a bit of |
| 49 the underlying compiler theory and how an LR parser is put together. |
| 50 |
| 51 <H2><a name="internal_nn2"></a>2. Grammar Class</H2> |
| 52 |
| 53 |
| 54 The file <tt>ply.yacc</tt> defines a class <tt>Grammar</tt> that |
| 55 is used to hold and manipulate information about a grammar |
| 56 specification. It encapsulates the same basic information |
| 57 about a grammar that is put into a YACC file including |
| 58 the list of tokens, precedence rules, and grammar rules. |
| 59 Various operations are provided to perform different validations |
| 60 on the grammar. In addition, there are operations to compute |
| 61 the first and follow sets that are needed by the various table |
| 62 generation algorithms. |
| 63 |
| 64 <p> |
| 65 <tt><b>Grammar(terminals)</b></tt> |
| 66 |
| 67 <blockquote> |
| 68 Creates a new grammar object. <tt>terminals</tt> is a list of strings |
| 69 specifying the terminals for the grammar. An instance <tt>g</tt> of |
| 70 <tt>Grammar</tt> has the following methods: |
| 71 </blockquote> |
| 72 |
| 73 <p> |
| 74 <b><tt>g.set_precedence(term,assoc,level)</tt></b> |
| 75 <blockquote> |
| 76 Sets the precedence level and associativity for a given terminal <tt>term</tt>.
|
| 77 <tt>assoc</tt> is one of <tt>'right'</tt>, |
| 78 <tt>'left'</tt>, or <tt>'nonassoc'</tt> and <tt>level</tt> is a positive integer
. The higher |
| 79 the value of <tt>level</tt>, the higher the precedence. Here is an example of t
ypical |
| 80 precedence settings: |
| 81 |
| 82 <pre> |
| 83 g.set_precedence('PLUS', 'left',1) |
| 84 g.set_precedence('MINUS', 'left',1) |
| 85 g.set_precedence('TIMES', 'left',2) |
| 86 g.set_precedence('DIVIDE','left',2) |
| 87 g.set_precedence('UMINUS','left',3) |
| 88 </pre> |
| 89 |
| 90 This method must be called prior to adding any productions to the |
| 91 grammar with <tt>g.add_production()</tt>. The precedence of individual grammar |
| 92 rules is determined by the precedence of the right-most terminal. |
| 93 |
| 94 </blockquote> |
| 95 <p> |
| 96 <b><tt>g.add_production(name,syms,func=None,file='',line=0)</tt></b> |
| 97 <blockquote> |
| 98 Adds a new grammar rule. <tt>name</tt> is the name of the rule, |
| 99 <tt>syms</tt> is a list of symbols making up the right hand |
| 100 side of the rule, <tt>func</tt> is the function to call when |
| 101 reducing the rule. <tt>file</tt> and <tt>line</tt> specify |
| 102 the filename and line number of the rule and are used for |
| 103 generating error messages. |
| 104 |
| 105 <p> |
| 106 The list of symbols in <tt>syms</tt> may include character |
| 107 literals and <tt>%prec</tt> specifiers. Here are some |
| 108 examples: |
| 109 |
| 110 <pre> |
| 111 g.add_production('expr',['expr','PLUS','term'],func,file,line) |
| 112 g.add_production('expr',['expr','"+"','term'],func,file,line) |
| 113 g.add_production('expr',['MINUS','expr','%prec','UMINUS'],func,file,line) |
| 114 </pre> |
| 115 |
| 116 <p> |
| 117 If any kind of error is detected, a <tt>GrammarError</tt> exception |
| 118 is raised with a message indicating the reason for the failure. |
| 119 </blockquote> |
| 120 |
| 121 <p> |
| 122 <b><tt>g.set_start(start=None)</tt></b> |
| 123 <blockquote> |
| 124 Sets the starting rule for the grammar. <tt>start</tt> is a string |
| 125 specifying the name of the start rule. If <tt>start</tt> is omitted, |
| 126 the first grammar rule added with <tt>add_production()</tt> is taken to be |
| 127 the starting rule. This method must always be called after all |
| 128 productions have been added. |
| 129 </blockquote> |
| 130 |
| 131 <p> |
| 132 <b><tt>g.find_unreachable()</tt></b> |
| 133 <blockquote> |
| 134 Diagnostic function. Returns a list of all unreachable non-terminals |
| 135 defined in the grammar. This is used to identify inactive parts of |
| 136 the grammar specification. |
| 137 </blockquote> |
| 138 |
| 139 <p> |
| 140 <b><tt>g.infinite_cycle()</tt></b> |
| 141 <blockquote> |
| 142 Diagnostic function. Returns a list of all non-terminals in the |
| 143 grammar that result in an infinite cycle. This condition occurs if |
| 144 there is no way for a grammar rule to expand to a string containing |
| 145 only terminal symbols. |
| 146 </blockquote> |
| 147 |
| 148 <p> |
| 149 <b><tt>g.undefined_symbols()</tt></b> |
| 150 <blockquote> |
| 151 Diagnostic function. Returns a list of tuples <tt>(name, prod)</tt> |
| 152 corresponding to undefined symbols in the grammar. <tt>name</tt> is the |
| 153 name of the undefined symbol and <tt>prod</tt> is an instance of |
| 154 <tt>Production</tt> which has information about the production rule |
| 155 where the undefined symbol was used. |
| 156 </blockquote> |
| 157 |
| 158 <p> |
| 159 <b><tt>g.unused_terminals()</tt></b> |
| 160 <blockquote> |
| 161 Diagnostic function. Returns a list of terminals that were defined, |
| 162 but never used in the grammar. |
| 163 </blockquote> |
| 164 |
| 165 <p> |
| 166 <b><tt>g.unused_rules()</tt></b> |
| 167 <blockquote> |
| 168 Diagnostic function. Returns a list of <tt>Production</tt> instances |
| 169 corresponding to production rules that were defined in the grammar, |
| 170 but never used anywhere. This is slightly different |
| 171 than <tt>find_unreachable()</tt>. |
| 172 </blockquote> |
| 173 |
| 174 <p> |
| 175 <b><tt>g.unused_precedence()</tt></b> |
| 176 <blockquote> |
| 177 Diagnostic function. Returns a list of tuples <tt>(term, assoc)</tt> |
| 178 corresponding to precedence rules that were set, but never used the |
| 179 grammar. <tt>term</tt> is the terminal name and <tt>assoc</tt> is the |
| 180 precedence associativity (e.g., <tt>'left'</tt>, <tt>'right'</tt>, |
| 181 or <tt>'nonassoc'</tt>. |
| 182 </blockquote> |
| 183 |
| 184 <p> |
| 185 <b><tt>g.compute_first()</tt></b> |
| 186 <blockquote> |
| 187 Compute all of the first sets for all symbols in the grammar. Returns a diction
ary |
| 188 mapping symbol names to a list of all first symbols. |
| 189 </blockquote> |
| 190 |
| 191 <p> |
| 192 <b><tt>g.compute_follow()</tt></b> |
| 193 <blockquote> |
| 194 Compute all of the follow sets for all non-terminals in the grammar. |
| 195 The follow set is the set of all possible symbols that might follow a |
| 196 given non-terminal. Returns a dictionary mapping non-terminal names |
| 197 to a list of symbols. |
| 198 </blockquote> |
| 199 |
| 200 <p> |
| 201 <b><tt>g.build_lritems()</tt></b> |
| 202 <blockquote> |
| 203 Calculates all of the LR items for all productions in the grammar. This |
| 204 step is required before using the grammar for any kind of table generation. |
| 205 See the section on LR items below. |
| 206 </blockquote> |
| 207 |
| 208 <p> |
| 209 The following attributes are set by the above methods and may be useful |
| 210 in code that works with the grammar. All of these attributes should be |
| 211 assumed to be read-only. Changing their values directly will likely |
| 212 break the grammar. |
| 213 |
| 214 <p> |
| 215 <b><tt>g.Productions</tt></b> |
| 216 <blockquote> |
| 217 A list of all productions added. The first entry is reserved for |
| 218 a production representing the starting rule. The objects in this list |
| 219 are instances of the <tt>Production</tt> class, described shortly. |
| 220 </blockquote> |
| 221 |
| 222 <p> |
| 223 <b><tt>g.Prodnames</tt></b> |
| 224 <blockquote> |
| 225 A dictionary mapping the names of nonterminals to a list of all |
| 226 productions of that nonterminal. |
| 227 </blockquote> |
| 228 |
| 229 <p> |
| 230 <b><tt>g.Terminals</tt></b> |
| 231 <blockquote> |
| 232 A dictionary mapping the names of terminals to a list of the |
| 233 production numbers where they are used. |
| 234 </blockquote> |
| 235 |
| 236 <p> |
| 237 <b><tt>g.Nonterminals</tt></b> |
| 238 <blockquote> |
| 239 A dictionary mapping the names of nonterminals to a list of the |
| 240 production numbers where they are used. |
| 241 </blockquote> |
| 242 |
| 243 <p> |
| 244 <b><tt>g.First</tt></b> |
| 245 <blockquote> |
| 246 A dictionary representing the first sets for all grammar symbols. This is |
| 247 computed and returned by the <tt>compute_first()</tt> method. |
| 248 </blockquote> |
| 249 |
| 250 <p> |
| 251 <b><tt>g.Follow</tt></b> |
| 252 <blockquote> |
| 253 A dictionary representing the follow sets for all grammar rules. This is |
| 254 computed and returned by the <tt>compute_follow()</tt> method. |
| 255 </blockquote> |
| 256 |
| 257 <p> |
| 258 <b><tt>g.Start</tt></b> |
| 259 <blockquote> |
| 260 Starting symbol for the grammar. Set by the <tt>set_start()</tt> method. |
| 261 </blockquote> |
| 262 |
| 263 For the purposes of debugging, a <tt>Grammar</tt> object supports the <tt>__len_
_()</tt> and |
| 264 <tt>__getitem__()</tt> special methods. Accessing <tt>g[n]</tt> returns the nth
production |
| 265 from the grammar. |
| 266 |
| 267 |
| 268 <H2><a name="internal_nn3"></a>3. Productions</H2> |
| 269 |
| 270 |
| 271 <tt>Grammar</tt> objects store grammar rules as instances of a <tt>Production</t
t> class. This |
| 272 class has no public constructor--you should only create productions by calling <
tt>Grammar.add_production()</tt>. |
| 273 The following attributes are available on a <tt>Production</tt> instance <tt>p</
tt>. |
| 274 |
| 275 <p> |
| 276 <b><tt>p.name</tt></b> |
| 277 <blockquote> |
| 278 The name of the production. For a grammar rule such as <tt>A : B C D</tt>, this
is <tt>'A'</tt>. |
| 279 </blockquote> |
| 280 |
| 281 <p> |
| 282 <b><tt>p.prod</tt></b> |
| 283 <blockquote> |
| 284 A tuple of symbols making up the right-hand side of the production. For a gramm
ar rule such as <tt>A : B C D</tt>, this is <tt>('B','C','D')</tt>. |
| 285 </blockquote> |
| 286 |
| 287 <p> |
| 288 <b><tt>p.number</tt></b> |
| 289 <blockquote> |
| 290 Production number. An integer containing the index of the production in the gra
mmar's <tt>Productions</tt> list. |
| 291 </blockquote> |
| 292 |
| 293 <p> |
| 294 <b><tt>p.func</tt></b> |
| 295 <blockquote> |
| 296 The name of the reduction function associated with the production. |
| 297 This is the function that will execute when reducing the entire |
| 298 grammar rule during parsing. |
| 299 </blockquote> |
| 300 |
| 301 <p> |
| 302 <b><tt>p.callable</tt></b> |
| 303 <blockquote> |
| 304 The callable object associated with the name in <tt>p.func</tt>. This is <tt>No
ne</tt> |
| 305 unless the production has been bound using <tt>bind()</tt>. |
| 306 </blockquote> |
| 307 |
| 308 <p> |
| 309 <b><tt>p.file</tt></b> |
| 310 <blockquote> |
| 311 Filename associated with the production. Typically this is the file where the p
roduction was defined. Used for error messages. |
| 312 </blockquote> |
| 313 |
| 314 <p> |
| 315 <b><tt>p.lineno</tt></b> |
| 316 <blockquote> |
| 317 Line number associated with the production. Typically this is the line number i
n <tt>p.file</tt> where the production was defined. Used for error messages. |
| 318 </blockquote> |
| 319 |
| 320 <p> |
| 321 <b><tt>p.prec</tt></b> |
| 322 <blockquote> |
| 323 Precedence and associativity associated with the production. This is a tuple <t
t>(assoc,level)</tt> where |
| 324 <tt>assoc</tt> is one of <tt>'left'</tt>,<tt>'right'</tt>, or <tt>'nonassoc'</tt
> and <tt>level</tt> is |
| 325 an integer. This value is determined by the precedence of the right-most termi
nal symbol in the production |
| 326 or by use of the <tt>%prec</tt> specifier when adding the production. |
| 327 </blockquote> |
| 328 |
| 329 <p> |
| 330 <b><tt>p.usyms</tt></b> |
| 331 <blockquote> |
| 332 A list of all unique symbols found in the production. |
| 333 </blockquote> |
| 334 |
| 335 <p> |
| 336 <b><tt>p.lr_items</tt></b> |
| 337 <blockquote> |
| 338 A list of all LR items for this production. This attribute only has a meaningfu
l value if the |
| 339 <tt>Grammar.build_lritems()</tt> method has been called. The items in this list
are |
| 340 instances of <tt>LRItem</tt> described below. |
| 341 </blockquote> |
| 342 |
| 343 <p> |
| 344 <b><tt>p.lr_next</tt></b> |
| 345 <blockquote> |
| 346 The head of a linked-list representation of the LR items in <tt>p.lr_items</tt>.
|
| 347 This attribute only has a meaningful value if the <tt>Grammar.build_lritems()</t
t> |
| 348 method has been called. Each <tt>LRItem</tt> instance has a <tt>lr_next</tt> at
tribute |
| 349 to move to the next item. The list is terminated by <tt>None</tt>. |
| 350 </blockquote> |
| 351 |
| 352 <p> |
| 353 <b><tt>p.bind(dict)</tt></b> |
| 354 <blockquote> |
| 355 Binds the production function name in <tt>p.func</tt> to a callable object in |
| 356 <tt>dict</tt>. This operation is typically carried out in the last step |
| 357 prior to running the parsing engine and is needed since parsing tables are typic
ally |
| 358 read from files which only include the function names, not the functions themsel
ves. |
| 359 </blockquote> |
| 360 |
| 361 <P> |
| 362 <tt>Production</tt> objects support |
| 363 the <tt>__len__()</tt>, <tt>__getitem__()</tt>, and <tt>__str__()</tt> |
| 364 special methods. |
| 365 <tt>len(p)</tt> returns the number of symbols in <tt>p.prod</tt> |
| 366 and <tt>p[n]</tt> is the same as <tt>p.prod[n]</tt>. |
| 367 |
| 368 <H2><a name="internal_nn4"></a>4. LRItems</H2> |
| 369 |
| 370 |
| 371 The construction of parsing tables in an LR-based parser generator is primarily |
| 372 done over a set of "LR Items". An LR item represents a stage of parsing one |
| 373 of the grammar rules. To compute the LR items, it is first necessary to |
| 374 call <tt>Grammar.build_lritems()</tt>. Once this step, all of the productions |
| 375 in the grammar will have their LR items attached to them. |
| 376 |
| 377 <p> |
| 378 Here is an interactive example that shows what LR items look like if you |
| 379 interactively experiment. In this example, <tt>g</tt> is a <tt>Grammar</tt> |
| 380 object. |
| 381 |
| 382 <blockquote> |
| 383 <pre> |
| 384 >>> <b>g.build_lritems()</b> |
| 385 >>> <b>p = g[1]</b> |
| 386 >>> <b>p</b> |
| 387 Production(statement -> ID = expr) |
| 388 >>> |
| 389 </pre> |
| 390 </blockquote> |
| 391 |
| 392 In the above code, <tt>p</tt> represents the first grammar rule. In |
| 393 this case, a rule <tt>'statement -> ID = expr'</tt>. |
| 394 |
| 395 <p> |
| 396 Now, let's look at the LR items for <tt>p</tt>. |
| 397 |
| 398 <blockquote> |
| 399 <pre> |
| 400 >>> <b>p.lr_items</b> |
| 401 [LRItem(statement -> . ID = expr), |
| 402 LRItem(statement -> ID . = expr), |
| 403 LRItem(statement -> ID = . expr), |
| 404 LRItem(statement -> ID = expr .)] |
| 405 >>> |
| 406 </pre> |
| 407 </blockquote> |
| 408 |
| 409 In each LR item, the dot (.) represents a specific stage of parsing. In each LR
item, the dot |
| 410 is advanced by one symbol. It is only when the dot reaches the very end that a
production |
| 411 is successfully parsed. |
| 412 |
| 413 <p> |
| 414 An instance <tt>lr</tt> of <tt>LRItem</tt> has the following |
| 415 attributes that hold information related to that specific stage of |
| 416 parsing. |
| 417 |
| 418 <p> |
| 419 <b><tt>lr.name</tt></b> |
| 420 <blockquote> |
| 421 The name of the grammar rule. For example, <tt>'statement'</tt> in the above exa
mple. |
| 422 </blockquote> |
| 423 |
| 424 <p> |
| 425 <b><tt>lr.prod</tt></b> |
| 426 <blockquote> |
| 427 A tuple of symbols representing the right-hand side of the production, including
the |
| 428 special <tt>'.'</tt> character. For example, <tt>('ID','.','=','expr')</tt>. |
| 429 </blockquote> |
| 430 |
| 431 <p> |
| 432 <b><tt>lr.number</tt></b> |
| 433 <blockquote> |
| 434 An integer representing the production number in the grammar. |
| 435 </blockquote> |
| 436 |
| 437 <p> |
| 438 <b><tt>lr.usyms</tt></b> |
| 439 <blockquote> |
| 440 A set of unique symbols in the production. Inherited from the original <tt>Prod
uction</tt> instance. |
| 441 </blockquote> |
| 442 |
| 443 <p> |
| 444 <b><tt>lr.lr_index</tt></b> |
| 445 <blockquote> |
| 446 An integer representing the position of the dot (.). You should never use <tt>l
r.prod.index()</tt> |
| 447 to search for it--the result will be wrong if the grammar happens to also use (.
) as a character |
| 448 literal. |
| 449 </blockquote> |
| 450 |
| 451 <p> |
| 452 <b><tt>lr.lr_after</tt></b> |
| 453 <blockquote> |
| 454 A list of all productions that can legally appear immediately to the right of th
e |
| 455 dot (.). This list contains <tt>Production</tt> instances. This attribute |
| 456 represents all of the possible branches a parse can take from the current positi
on. |
| 457 For example, suppose that <tt>lr</tt> represents a stage immediately before |
| 458 an expression like this: |
| 459 |
| 460 <pre> |
| 461 >>> <b>lr</b> |
| 462 LRItem(statement -> ID = . expr) |
| 463 >>> |
| 464 </pre> |
| 465 |
| 466 Then, the value of <tt>lr.lr_after</tt> might look like this, showing all produc
tions that |
| 467 can legally appear next: |
| 468 |
| 469 <pre> |
| 470 >>> <b>lr.lr_after</b> |
| 471 [Production(expr -> expr PLUS expr), |
| 472 Production(expr -> expr MINUS expr), |
| 473 Production(expr -> expr TIMES expr), |
| 474 Production(expr -> expr DIVIDE expr), |
| 475 Production(expr -> MINUS expr), |
| 476 Production(expr -> LPAREN expr RPAREN), |
| 477 Production(expr -> NUMBER), |
| 478 Production(expr -> ID)] |
| 479 >>> |
| 480 </pre> |
| 481 |
| 482 </blockquote> |
| 483 |
| 484 <p> |
| 485 <b><tt>lr.lr_before</tt></b> |
| 486 <blockquote> |
| 487 The grammar symbol that appears immediately before the dot (.) or <tt>None</tt>
if |
| 488 at the beginning of the parse. |
| 489 </blockquote> |
| 490 |
| 491 <p> |
| 492 <b><tt>lr.lr_next</tt></b> |
| 493 <blockquote> |
| 494 A link to the next LR item, representing the next stage of the parse. <tt>None<
/tt> if <tt>lr</tt> |
| 495 is the last LR item. |
| 496 </blockquote> |
| 497 |
| 498 <tt>LRItem</tt> instances also support the <tt>__len__()</tt> and <tt>__getitem_
_()</tt> special methods. |
| 499 <tt>len(lr)</tt> returns the number of items in <tt>lr.prod</tt> including the d
ot (.). <tt>lr[n]</tt> |
| 500 returns <tt>lr.prod[n]</tt>. |
| 501 |
| 502 <p> |
| 503 It goes without saying that all of the attributes associated with LR |
| 504 items should be assumed to be read-only. Modifications will very |
| 505 likely create a small black-hole that will consume you and your code. |
| 506 |
| 507 <H2><a name="internal_nn5"></a>5. LRTable</H2> |
| 508 |
| 509 |
| 510 The <tt>LRTable</tt> class is used to represent LR parsing table data. This |
| 511 minimally includes the production list, action table, and goto table. |
| 512 |
| 513 <p> |
| 514 <b><tt>LRTable()</tt></b> |
| 515 <blockquote> |
| 516 Create an empty LRTable object. This object contains only the information neede
d to |
| 517 run an LR parser. |
| 518 </blockquote> |
| 519 |
| 520 An instance <tt>lrtab</tt> of <tt>LRTable</tt> has the following methods: |
| 521 |
| 522 <p> |
| 523 <b><tt>lrtab.read_table(module)</tt></b> |
| 524 <blockquote> |
| 525 Populates the LR table with information from the module specified in <tt>module<
/tt>. |
| 526 <tt>module</tt> is either a module object already loaded with <tt>import</tt> or |
| 527 the name of a Python module. If it's a string containing a module name, it is |
| 528 loaded and parsing data is extracted. Returns the signature value that was us
ed |
| 529 when initially writing the tables. Raises a <tt>VersionError</tt> exception if |
| 530 the module was created using an incompatible version of PLY. |
| 531 </blockquote> |
| 532 |
| 533 <p> |
| 534 <b><tt>lrtab.bind_callables(dict)</tt></b> |
| 535 <blockquote> |
| 536 This binds all of the function names used in productions to callable objects |
| 537 found in the dictionary <tt>dict</tt>. During table generation and when reading |
| 538 LR tables from files, PLY only uses the names of action functions such as <tt>'p
_expr'</tt>, |
| 539 <tt>'p_statement'</tt>, etc. In order to actually run the parser, these names |
| 540 have to be bound to callable objects. This method is always called prior to |
| 541 running a parser. |
| 542 </blockquote> |
| 543 |
| 544 After <tt>lrtab</tt> has been populated, the following attributes are defined. |
| 545 |
| 546 <p> |
| 547 <b><tt>lrtab.lr_method</tt></b> |
| 548 <blockquote> |
| 549 The LR parsing method used (e.g., <tt>'LALR'</tt>) |
| 550 </blockquote> |
| 551 |
| 552 |
| 553 <p> |
| 554 <b><tt>lrtab.lr_productions</tt></b> |
| 555 <blockquote> |
| 556 The production list. If the parsing tables have been newly |
| 557 constructed, this will be a list of <tt>Production</tt> instances. If |
| 558 the parsing tables have been read from a file, it's a list |
| 559 of <tt>MiniProduction</tt> instances. This, together |
| 560 with <tt>lr_action</tt> and <tt>lr_goto</tt> contain all of the |
| 561 information needed by the LR parsing engine. |
| 562 </blockquote> |
| 563 |
| 564 <p> |
| 565 <b><tt>lrtab.lr_action</tt></b> |
| 566 <blockquote> |
| 567 The LR action dictionary that implements the underlying state machine. |
| 568 The keys of this dictionary are the LR states. |
| 569 </blockquote> |
| 570 |
| 571 <p> |
| 572 <b><tt>lrtab.lr_goto</tt></b> |
| 573 <blockquote> |
| 574 The LR goto table that contains information about grammar rule reductions. |
| 575 </blockquote> |
| 576 |
| 577 |
| 578 <H2><a name="internal_nn6"></a>6. LRGeneratedTable</H2> |
| 579 |
| 580 |
| 581 The <tt>LRGeneratedTable</tt> class represents constructed LR parsing tables on
a |
| 582 grammar. It is a subclass of <tt>LRTable</tt>. |
| 583 |
| 584 <p> |
| 585 <b><tt>LRGeneratedTable(grammar, method='LALR',log=None)</tt></b> |
| 586 <blockquote> |
| 587 Create the LR parsing tables on a grammar. <tt>grammar</tt> is an instance of <
tt>Grammar</tt>, |
| 588 <tt>method</tt> is a string with the parsing method (<tt>'SLR'</tt> or <tt>'LALR
'</tt>), and |
| 589 <tt>log</tt> is a logger object used to write debugging information. The debugg
ing information |
| 590 written to <tt>log</tt> is the same as what appears in the <tt>parser.out</tt> f
ile created |
| 591 by yacc. By supplying a custom logger with a different message format, it is po
ssible to get |
| 592 more information (e.g., the line number in <tt>yacc.py</tt> used for issuing eac
h line of |
| 593 output in the log). The result is an instance of <tt>LRGeneratedTable</tt>. |
| 594 </blockquote> |
| 595 |
| 596 <p> |
| 597 An instance <tt>lr</tt> of <tt>LRGeneratedTable</tt> has the following attribute
s. |
| 598 |
| 599 <p> |
| 600 <b><tt>lr.grammar</tt></b> |
| 601 <blockquote> |
| 602 A link to the Grammar object used to construct the parsing tables. |
| 603 </blockquote> |
| 604 |
| 605 <p> |
| 606 <b><tt>lr.lr_method</tt></b> |
| 607 <blockquote> |
| 608 The LR parsing method used (e.g., <tt>'LALR'</tt>) |
| 609 </blockquote> |
| 610 |
| 611 |
| 612 <p> |
| 613 <b><tt>lr.lr_productions</tt></b> |
| 614 <blockquote> |
| 615 A reference to <tt>grammar.Productions</tt>. This, together with <tt>lr_action<
/tt> and <tt>lr_goto</tt> |
| 616 contain all of the information needed by the LR parsing engine. |
| 617 </blockquote> |
| 618 |
| 619 <p> |
| 620 <b><tt>lr.lr_action</tt></b> |
| 621 <blockquote> |
| 622 The LR action dictionary that implements the underlying state machine. The keys
of this dictionary are |
| 623 the LR states. |
| 624 </blockquote> |
| 625 |
| 626 <p> |
| 627 <b><tt>lr.lr_goto</tt></b> |
| 628 <blockquote> |
| 629 The LR goto table that contains information about grammar rule reductions. |
| 630 </blockquote> |
| 631 |
| 632 <p> |
| 633 <b><tt>lr.sr_conflicts</tt></b> |
| 634 <blockquote> |
| 635 A list of tuples <tt>(state,token,resolution)</tt> identifying all shift/reduce
conflicts. <tt>state</tt> is the LR state |
| 636 number where the conflict occurred, <tt>token</tt> is the token causing the conf
lict, and <tt>resolution</tt> is |
| 637 a string describing the resolution taken. <tt>resolution</tt> is either <tt>'sh
ift'</tt> or <tt>'reduce'</tt>. |
| 638 </blockquote> |
| 639 |
| 640 <p> |
| 641 <b><tt>lr.rr_conflicts</tt></b> |
| 642 <blockquote> |
| 643 A list of tuples <tt>(state,rule,rejected)</tt> identifying all reduce/reduce co
nflicts. <tt>state</tt> is the |
| 644 LR state number where the conflict occurred, <tt>rule</tt> is the production rul
e that was selected |
| 645 and <tt>rejected</tt> is the production rule that was rejected. Both <tt>rule<
/tt> and </tt>rejected</tt> are |
| 646 instances of <tt>Production</tt>. They can be inspected to provide the user wit
h more information. |
| 647 </blockquote> |
| 648 |
| 649 <p> |
| 650 There are two public methods of <tt>LRGeneratedTable</tt>. |
| 651 |
| 652 <p> |
| 653 <b><tt>lr.write_table(modulename,outputdir="",signature="")</tt></b> |
| 654 <blockquote> |
| 655 Writes the LR parsing table information to a Python module. <tt>modulename</tt>
is a string |
| 656 specifying the name of a module such as <tt>"parsetab"</tt>. <tt>outputdir</tt>
is the name of a |
| 657 directory where the module should be created. <tt>signature</tt> is a string re
presenting a |
| 658 grammar signature that's written into the output file. This can be used to detec
t when |
| 659 the data stored in a module file is out-of-sync with the the grammar specificati
on (and that |
| 660 the tables need to be regenerated). If <tt>modulename</tt> is a string <tt>"par
setab"</tt>, |
| 661 this function creates a file called <tt>parsetab.py</tt>. If the module name re
presents a |
| 662 package such as <tt>"foo.bar.parsetab"</tt>, then only the last component, <tt>"
parsetab"</tt> is |
| 663 used. |
| 664 </blockquote> |
| 665 |
| 666 |
| 667 <H2><a name="internal_nn7"></a>7. LRParser</H2> |
| 668 |
| 669 |
| 670 The <tt>LRParser</tt> class implements the low-level LR parsing engine. |
| 671 |
| 672 |
| 673 <p> |
| 674 <b><tt>LRParser(lrtab, error_func)</tt></b> |
| 675 <blockquote> |
| 676 Create an LRParser. <tt>lrtab</tt> is an instance of <tt>LRTable</tt> |
| 677 containing the LR production and state tables. <tt>error_func</tt> is the |
| 678 error function to invoke in the event of a parsing error. |
| 679 </blockquote> |
| 680 |
| 681 An instance <tt>p</tt> of <tt>LRParser</tt> has the following methods: |
| 682 |
| 683 <p> |
| 684 <b><tt>p.parse(input=None,lexer=None,debug=0,tracking=0,tokenfunc=None)</tt></b> |
| 685 <blockquote> |
| 686 Run the parser. <tt>input</tt> is a string, which if supplied is fed into the |
| 687 lexer using its <tt>input()</tt> method. <tt>lexer</tt> is an instance of the |
| 688 <tt>Lexer</tt> class to use for tokenizing. If not supplied, the last lexer |
| 689 created with the <tt>lex</tt> module is used. <tt>debug</tt> is a boolean flag |
| 690 that enables debugging. <tt>tracking</tt> is a boolean flag that tells the |
| 691 parser to perform additional line number tracking. <tt>tokenfunc</tt> is a call
able |
| 692 function that returns the next token. If supplied, the parser will use it to ge
t |
| 693 all tokens. |
| 694 </blockquote> |
| 695 |
| 696 <p> |
| 697 <b><tt>p.restart()</tt></b> |
| 698 <blockquote> |
| 699 Resets the parser state for a parse already in progress. |
| 700 </blockquote> |
| 701 |
| 702 <H2><a name="internal_nn8"></a>8. ParserReflect</H2> |
| 703 |
| 704 |
| 705 <p> |
| 706 The <tt>ParserReflect</tt> class is used to collect parser specification data |
| 707 from a Python module or object. This class is what collects all of the |
| 708 <tt>p_rule()</tt> functions in a PLY file, performs basic error checking, |
| 709 and collects all of the needed information to build a grammar. Most of the |
| 710 high-level PLY interface as used by the <tt>yacc()</tt> function is actually |
| 711 implemented by this class. |
| 712 |
| 713 <p> |
| 714 <b><tt>ParserReflect(pdict, log=None)</tt></b> |
| 715 <blockquote> |
| 716 Creates a <tt>ParserReflect</tt> instance. <tt>pdict</tt> is a dictionary |
| 717 containing parser specification data. This dictionary typically corresponds |
| 718 to the module or class dictionary of code that implements a PLY parser. |
| 719 <tt>log</tt> is a logger instance that will be used to report error |
| 720 messages. |
| 721 </blockquote> |
| 722 |
| 723 An instance <tt>p</tt> of <tt>ParserReflect</tt> has the following methods: |
| 724 |
| 725 <p> |
| 726 <b><tt>p.get_all()</tt></b> |
| 727 <blockquote> |
| 728 Collect and store all required parsing information. |
| 729 </blockquote> |
| 730 |
| 731 <p> |
| 732 <b><tt>p.validate_all()</tt></b> |
| 733 <blockquote> |
| 734 Validate all of the collected parsing information. This is a seprate step |
| 735 from <tt>p.get_all()</tt> as a performance optimization. In order to |
| 736 increase parser start-up time, a parser can elect to only validate the |
| 737 parsing data when regenerating the parsing tables. The validation |
| 738 step tries to collect as much information as possible rather than |
| 739 raising an exception at the first sign of trouble. The attribute |
| 740 <tt>p.error</tt> is set if there are any validation errors. The |
| 741 value of this attribute is also returned. |
| 742 </blockquote> |
| 743 |
| 744 <p> |
| 745 <b><tt>p.signature()</tt></b> |
| 746 <blockquote> |
| 747 Compute a signature representing the contents of the collected parsing |
| 748 data. The signature value should change if anything in the parser |
| 749 specification has changed in a way that would justify parser table |
| 750 regeneration. This method can be called after <tt>p.get_all()</tt>, |
| 751 but before <tt>p.validate_all()</tt>. |
| 752 </blockquote> |
| 753 |
| 754 The following attributes are set in the process of collecting data: |
| 755 |
| 756 <p> |
| 757 <b><tt>p.start</tt></b> |
| 758 <blockquote> |
| 759 The grammar start symbol, if any. Taken from <tt>pdict['start']</tt>. |
| 760 </blockquote> |
| 761 |
| 762 <p> |
| 763 <b><tt>p.error_func</tt></b> |
| 764 <blockquote> |
| 765 The error handling function or <tt>None</tt>. Taken from <tt>pdict['p_error']</t
t>. |
| 766 </blockquote> |
| 767 |
| 768 <p> |
| 769 <b><tt>p.tokens</tt></b> |
| 770 <blockquote> |
| 771 The token list. Taken from <tt>pdict['tokens']</tt>. |
| 772 </blockquote> |
| 773 |
| 774 <p> |
| 775 <b><tt>p.prec</tt></b> |
| 776 <blockquote> |
| 777 The precedence specifier. Taken from <tt>pdict['precedence']</tt>. |
| 778 </blockquote> |
| 779 |
| 780 <p> |
| 781 <b><tt>p.preclist</tt></b> |
| 782 <blockquote> |
| 783 A parsed version of the precedence specified. A list of tuples of the form |
| 784 <tt>(token,assoc,level)</tt> where <tt>token</tt> is the terminal symbol, |
| 785 <tt>assoc</tt> is the associativity (e.g., <tt>'left'</tt>) and <tt>level</tt> |
| 786 is a numeric precedence level. |
| 787 </blockquote> |
| 788 |
| 789 <p> |
| 790 <b><tt>p.grammar</tt></b> |
| 791 <blockquote> |
| 792 A list of tuples <tt>(name, rules)</tt> representing the grammar rules. <tt>name
</tt> is the |
| 793 name of a Python function or method in <tt>pdict</tt> that starts with <tt>"p_"<
/tt>. |
| 794 <tt>rules</tt> is a list of tuples <tt>(filename,line,prodname,syms)</tt> repres
enting |
| 795 the grammar rules found in the documentation string of that function. <tt>filena
me</tt> and <tt>line</tt> contain location |
| 796 information that can be used for debugging. <tt>prodname</tt> is the name of the
|
| 797 production. <tt>syms</tt> is the right-hand side of the production. If you have
a |
| 798 function like this |
| 799 |
| 800 <pre> |
| 801 def p_expr(p): |
| 802 '''expr : expr PLUS expr |
| 803 | expr MINUS expr |
| 804 | expr TIMES expr |
| 805 | expr DIVIDE expr''' |
| 806 </pre> |
| 807 |
| 808 then the corresponding entry in <tt>p.grammar</tt> might look like this: |
| 809 |
| 810 <pre> |
| 811 ('p_expr', [ ('calc.py',10,'expr', ['expr','PLUS','expr']), |
| 812 ('calc.py',11,'expr', ['expr','MINUS','expr']), |
| 813 ('calc.py',12,'expr', ['expr','TIMES','expr']), |
| 814 ('calc.py',13,'expr', ['expr','DIVIDE','expr']) |
| 815 ]) |
| 816 </pre> |
| 817 </blockquote> |
| 818 |
| 819 <p> |
| 820 <b><tt>p.pfuncs</tt></b> |
| 821 <blockquote> |
| 822 A sorted list of tuples <tt>(line, file, name, doc)</tt> representing all of |
| 823 the <tt>p_</tt> functions found. <tt>line</tt> and <tt>file</tt> give location |
| 824 information. <tt>name</tt> is the name of the function. <tt>doc</tt> is the |
| 825 documentation string. This list is sorted in ascending order by line number. |
| 826 </blockquote> |
| 827 |
| 828 <p> |
| 829 <b><tt>p.files</tt></b> |
| 830 <blockquote> |
| 831 A dictionary holding all of the source filenames that were encountered |
| 832 while collecting parser information. Only the keys of this dictionary have |
| 833 any meaning. |
| 834 </blockquote> |
| 835 |
| 836 <p> |
| 837 <b><tt>p.error</tt></b> |
| 838 <blockquote> |
| 839 An attribute that indicates whether or not any critical errors |
| 840 occurred in validation. If this is set, it means that that some kind |
| 841 of problem was detected and that no further processing should be |
| 842 performed. |
| 843 </blockquote> |
| 844 |
| 845 |
| 846 <H2><a name="internal_nn9"></a>9. High-level operation</H2> |
| 847 |
| 848 |
| 849 Using all of the above classes requires some attention to detail. The <tt>yacc(
)</tt> |
| 850 function carries out a very specific sequence of operations to create a grammar. |
| 851 This same sequence should be emulated if you build an alternative PLY interface. |
| 852 |
| 853 <ol> |
| 854 <li>A <tt>ParserReflect</tt> object is created and raw grammar specification dat
a is |
| 855 collected. |
| 856 <li>A <tt>Grammar</tt> object is created and populated with information |
| 857 from the specification data. |
| 858 <li>A <tt>LRGenerator</tt> object is created to run the LALR algorithm over |
| 859 the <tt>Grammar</tt> object. |
| 860 <li>Productions in the LRGenerator and bound to callables using the <tt>bind_cal
lables()</tt> |
| 861 method. |
| 862 <li>A <tt>LRParser</tt> object is created from from the information in the |
| 863 <tt>LRGenerator</tt> object. |
| 864 </ol> |
| 865 |
| 866 </body> |
| 867 </html> |
| 868 |
| 869 |
| 870 |
| 871 |
| 872 |
| 873 |
| 874 |
OLD | NEW |