Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(13)

Side by Side Diff: third_party/WebKit/Source/bindings/IDLCompiler.md

Issue 2195743003: Migrate IDL compiler documentation to Markdown. (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master
Patch Set: Created 4 years, 4 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
« no previous file with comments | « no previous file | no next file » | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
OLDNEW
(Empty)
1 The **IDL compiler** or **bindings generator** [transcompiles](http://en.wikiped ia.org/wiki/Transcompiler) [Web IDL](https://sites.google.com/a/chromium.org/dev /blink/webidl) to C++ code, specifically [bindings](http://en.wikipedia.org/wiki /Language_binding) between V8 (the [JavaScript engine](http://en.wikipedia.org/w iki/JavaScript_engine)) and Blink. That is, when an attribute or method in a [We b IDL interface](https://sites.google.com/a/chromium.org/dev/developers/web-idl- interfaces) is used from JavaScript, V8 calls the bindings code, which calls Bli nk code.
2
3 As of early 2014, there are almost 700 IDL files, over 2,000 attributes, and alm ost 4,000 methods. This bindings code is highly repetitive, primarily passing va lues (arguments and return values) between V8 and Blink and doing various conver sions and checks (e.g., type checking), hence it is mostly machine-generated.
4
5 This is complicated by numerous parameters and special cases, due to either want ing to automatically generate repetitive code (so Blink implementation is simple r), supporting quirks of Web interfaces, or Blink implementation details. The co mpiler is under active development, due to adding new features (bug [345506](htt ps://code.google.com/p/chromium/issues/detail?id=345506)), adjusting to changes in V8 or Blink, and reducing technical debt, particularly custom bindings (manua lly written bindings, bug [345519](https://code.google.com/p/chromium/issues/det ail?id=345519)). By far the majority of development occurs on the _back end_ (** code generator**); the _front end_ and overall structure is quite stable.
6
7 Users of the IDL compiler consist of:
8 * Blink (hence Chromium), which uses the whole compiler in the build (to gener ate V8 bindings) – this is the primary use.
9 * Other bindings generators (who want to compile Blink IDL files), which use t he front end and optionally parts of the top-level compiler and build scripts.
10 * Other IDL use: Web IDL is a convenient format for specifying JavaScript inte rfaces, and the front end can be used for reading these files, most simply using the IR object (`IdlDefinitions`).
11
12 The compiler is in [Source/bindings/scripts](https://code.google.com/p/chromium/ codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/), and bindin gs are generated as part of the build (see [IDL build](https://sites.google.com/ a/chromium.org/dev/developers/design-documents/idl-build)). The top-level file [ idl_compiler.py](https://code.google.com/p/chromium/codesearch#chromium/src/thir d_party/WebKit/Source/bindings/scripts/idl_compiler.py), can be invoked directly (if you want to compile a single file), or via [Tools/Scripts/run-bindings-test s](https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit /Tools/Scripts/run-bindings-tests). The rest of this document describes design, implementation, and use.
13
14 ## Overall structure
15 The compiler is factored into a pipeline, and the steps further divided into sep arate Python modules. While there are a number of components, each is simple and well-defined, and generally only one or two must be considered at a time.
16
17 There are complicated issues with global information such as dependencies (see b elow), but initially it is sufficient to consider compiling a single file.
18
19 The overall structure is standard for [compilers](http://en.wikipedia.org/wiki/C ompiler) (see [compiler construction](http://en.wikipedia.org/wiki/Compiler_cons truction)): a front end reads in input (IDL), yielding an [intermediate represen tation](http://en.wikipedia.org/wiki/Intermediate_representation) (IR), and the back end takes the IR and writes out output (C++).
20
21 ```
22 IDL (Foo.idl) --front end--> IR --back end--> C++ (V8Foo.h, V8Foo.cpp)
23 ```
24
25 The flow is a (per file) [pipeline](http://en.wikipedia.org/wiki/Pipeline_(softw are)): information flows one way, and, other than dependencies and global inform ation, files are processed independently. In detail:
26
27 ```
28 Front end: IDL --lexer--> tokens --parser--> AST --constructor--> IR
29 Back end: IR --logic--> context --template processor--> C++
30 ```
31
32 In terms of data types, objects, and modules, this is:
33
34 ```
35 IdlCompiler: idl_filename --IdlReader--> IdlDefinitions --CodeGeneratorV8--> header_text, cpp_text
36
37 IdlReader: idl_filename --BlinkIDLLexer--> (internal) --BlinkIDLParser--> ID LNode
38 --idl_definitions_builder--> IdlDefinitions
39
40 CodeGeneratorV8: IdlDefinitions --(extract member)--> IdlInterface --v8_*.py --> dict
41 --jinja2--> header_text, cpp_text
42 ```
43
44 One can trace the processing of each element of IDL input text through the pipel ine, essentially 1-to-1:
45 * Spec: [Interfaces](http://heycam.github.io/webidl/#idl-interfaces) (producti on rule: [Interface](http://heycam.github.io/webidl/#prod-Interface))
46 * Input IDL text is parsed by the `p_Interface` rule in `idl_parser.IDLParser` (base parser), producing an `IDLNode` (AST) named `'Interface'`.
47 * An `IdlInterface` is constructed from this tree in `idl_definitions_builder`
48 * Python logic produces a dict (Jinja context) via the `v8_interface.interface _context` function
49 * The context is rendered to `.cpp/.h` output text by Jinja, from the template s `interface.cpp` (derived from `interface_base.cpp`) and `interface.h`
50
51 The boundary between the stages of the pipeline can move, primarily in the CG be tween the context and the template (depending on logic complexity: can it fit in the template, or should it be a Python function?). Changing the boundary betwee n the FE and the CG is possible (e.g., moving "find indexed property getter" int o the `IdlInterface` constructor, instead of the CG `v8_interface` module), but rare. In principle the constructor could be entirely integrated into the parser (have the yacc actions construct the object directly, instead of an AST), but th is is unlikely, as it would require a significant amount of work for a relativel y minor simplification.
52
53 Note that filenames, not IDL text, are what is input to the reader. This is fund amentally because the compiler and the parser assume they are reading in files ( hence take a filename parameter, e.g. for reporting errors). Further, due to dep endencies, processing one IDL file may require reading others (hence going out t o the file system). This could be abstracted to taking in IDL text and passing i nterface names rather than filenames internally, but this isn't necessary at pre sent.
54
55 The front end is familiar ([regex](http://en.wikipedia.org/wiki/Regular_expressi on) [lexer](http://en.wikipedia.org/wiki/Lexical_analysis), [LALR parser](http:/ /en.wikipedia.org/wiki/LALR_parser), and object constructor). The lexer-parser u ses [PLY (Python Lex-Yacc)](http://www.dabeaz.com/ply/), a Python implementation of [lex](http://en.wikipedia.org/wiki/Lex_(software)) and [yacc](http://en.wiki pedia.org/wiki/Yacc); the constructor is lengthy but straightforward.
56
57 However, the back end ([code generator](http://en.wikipedia.org/wiki/Code_genera tion_(compiler))) differs from compilers that target machine code: since it is a [source-to-source compiler](http://en.wikipedia.org/wiki/Source-to-source_compi ler), the back end outputs C++ (source code, not machine code). Thus we use a [t emplate processor](http://en.wikipedia.org/wiki/Template_processor) to produce o utput; recall that _templating_ (writing formatted _output_) is complementary to _parsing_ (reading formatted _input_). The template processor is [Jinja](https: //sites.google.com/a/chromium.org/dev/developers/jinja), which is used in severa l places in Chromium. Given an interface, containing various attributes and meth ods, we fill in a template for the corresponding C++ class and each attribute an d method.
58
59 Further, the input language is declarative (the 'D' in IDL), so no optimizations of input code are necessary (there is no _middle end_): it's just filling in a template. The back end is itself divided into the templates themselves, and the Python code that fills in the templates (produces the _context_). There is also no separate [_semantic analysis_](http://en.wikipedia.org/wiki/Semantic_analysis _(compilers)) step, except for validation of extended attributes (see below): th e code generator assumes types are valid, and errors show up when the resulting C++ code fails to compile. This avoids the complexity and time cost of either a separate validation step, or of type-checking in the code generator, at the cost of typos showing up in compile failures instead of at IDL compile time. Notably , there is no type checking or name binding, since identifiers are assumed to re fer to Web IDL interfaces (C++ classes) and the Web IDL namespace is global, and there is no assignment, since Web IDL is declarative.
60
61 ## Code
62 Code-wise, the top-level file [`idl_compiler.py`](https://code.google.com/p/chro mium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_comp iler.py) imports two modules: [`idl_reader`](https://code.google.com/p/chromium/ codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_reader.py ) for the front end, [`code_generator_v8`](https://code.google.com/p/chromium/co desearch#chromium/src/third_party/WebKit/Source/bindings/scripts/code_generator_ v8.py) for the back end. Each of these is used to create an object (`IdlReader` and `CodeGeneratorV8`), which handles the library initialization (PLY or Jinja, which are slow, and thus reused if running multiple times during one run) and gl obal information. The objects are then called one time each, for the IDL --> IR and IR --> C++ steps, respectively. By contrast, `run-bindings-tests` creates th ese objects as well, then calls them multiple times, once for each test file. No te that in the actual build, compilation is parallelized, which is why only one file is compiled per process, and there is a pre-caching step which significantl y speeds up library initialization.
63
64 The code is mostly functional, except for a few module-level variables in the co de generator (discussed below) for simplicity, and some objects used for initial ization.
65
66 The main classes are as follows, with the module in which they are defined. Note that the relations are primarily composition, with inheritance significantly us ed for the Blink-specific lexer and parser. `IdlCompiler, IdlReader,` and `CodeG eneratorV8` are expected to be used as singletons, and are just classes for init ialization (avoids expensive re-initialization, and interface-wise separates ini tialization from execution). `IdlDefinitions` contains objects of a few other cl asses (for more minor data), and there is a bit of complex OO code in [`idl_defi nitions`](https://code.google.com/p/chromium/codesearch#chromium/src/third_party /WebKit/Source/bindings/scripts/idl_definitions.py) to simplify [typedef](http:/ /heycam.github.io/webidl/#idl-typedefs) resolution.
67
68 ```
69 IdlCompiler :: idl_compiler
70 IdlReader :: idl_reader
71 BlinkIDLParser < IDLParser
72 BlinkIDLLexer < IDLLexer
73 IDLExtendedAttributeValidator
74 CodeGeneratorV8 :: code_generator_v8
75 ```
76 ```
77 IdlDefinitions :: idl_definitions
78 IdlInterface
79 IdlAttribute
80 IdlConstant
81 IdlOperation
82 IdlArgument
83 ```
84
85 ## Front end
86 The front end is structured as a lexer --> parser --> object constructor pipelin e:
87
88 Front end: `IDL --lexer--> tokens --parser--> AST --constructor--> IR`
89
90 There are two other steps:
91 * extended attribute validation
92 * dependency resolution
93
94 The top-level module for the front end is [`idl_reader`](https://code.google.com /p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/i dl_reader.py). This implements a class, `IdlReader`, whose constructor also cons tructs a lexer, parser, validator, and resolver. `IdlReader` implements a method to construct an IR object (`IdlDefinitions`) from an IDL filename. Thus to conv ert IDL to IR one instantiates a reader, then call its method to read an IDL fil e.
95
96 The lexer-parser uses [PLY (Python Lex-Yacc)](http://www.dabeaz.com/ply/). In fa ct, the lexer and parser for the Blink IDL dialect of Web IDL derive from a base lexer and base parser for standard Web IDL (in [tools/idl_parser](https://code. google.com/p/chromium/codesearch#chromium/src/tools/idl_parser)), and thus only need to include deviations from the standard. The lexical syntax of Blink IDL is standard (though the base lexer is slightly non-standard), so the Blink IDL lex er is very small and slated for removal (Bug [345137](https://code.google.com/p/ chromium/issues/detail?id=345137)). The phrase syntax is slightly non-standard ( primarily in extended attributes) and expected to stay this way, as extended att ributes are implementation-dependent and the deviations are useful (see [Blink I DL: syntax](http://www.chromium.org/blink/webidl#TOC-Syntax)). We thus say that the base lexer/parser + Blink lexer/parser form 1 lexer and 1.1 parser (base + d erived).
97
98 The base lexer class, defined in [`idl_lexer`](https://code.google.com/p/chromiu m/codesearch#chromium/src/tools/idl_parser/idl_lexer.py), is straightforward and short: it is just a list of regular expressions and keywords, wrapped in a clas s, `IDLLexer`; a lexer (object) itself is an instance of this class. There is so me minor complexity with error handling (correct line count) and methods to assi st with derived classes, but it's quite simple. The derived lexer class is very simple: it's a class, `BlinkIDLLexer`, which derives from the base class. The on ly complexity is adding a method to remove a token class from the base class, an d to remove comments (base lexer is non-standard here: standard lexer does not p roduce `COMMENT` tokens).
99
100 The base parser class, defined in [`idl_parser`](https://code.google.com/p/chrom ium/codesearch#chromium/src/tools/idl_parser/idl_parser.py), is considerably lon ger, but consists almost entirely of production rules in (a form of) [BNF](http: //en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), together with yacc _actions_ that build an [abstract syntax tree](http://en.wikipedia.org/wiki/Abstract_synta x_tree) (AST, syntax tree). Recall that yacc _traverses_ the [concrete syntax tr ee](http://en.wikipedia.org/wiki/Concrete_syntax_tree) (CST, parse tree) and can take whatever actions it chooses; in this case it generates an AST, though it c ould also generate the IR directly. See [`blink_idl_parser`](https://code.google .com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scrip ts/blink_idl_parser.py) for a detailed explanation of the PLY yacc syntax and `I DLParser` methods used to construct the AST. The grammar is directly from the [W eb IDL grammar](http://heycam.github.io/webidl/#idl-grammar), which is an [LL(1) grammar](http://en.wikipedia.org/wiki/LL_grammar); the Blink IDL rules are exac tly the deviations from the standard grammar, or overrides to irregularities in the base parser. A parser object is an instance of a class, `IDLParser`, from wh ich `BlinkIDLParser` is derived. The parser constructor takes a lexer as an argu ment, and one passes the corresponding lexer.
101
102 The definitions classes, defined in [idl_definitions](https://code.google.com/p/ chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_ definitions.py), primarily consist of constructors, which take the AST and gener ate an intermediate representation (IR), in this case an object of type `IdlDefi nitions` and contained objects for individual definitions and definition's membe rs.
103
104 The classes are as follows (mostly composition, one case of inheritance); a few internal-use only classes are not shown:
105
106 ```
107 IdlDefinitions
108 IdlInterface
109 IdlAttribute
110 IdlConstant
111 IdlOperation
112 IdlArgument
113 IdlException < IdlInterface
114 IdlCallbackFunction
115 IdlEnum
116 ```
117
118 After reading an IDL file (producing an IR), the reader has two optional additio nal steps (these are run during the build, but aren't necessary for simpler use, such as compiling a test file without dependencies): validation and dependency resolution.
119
120 ### Extended attribute validation
121 The front end largely does not do [semantic analysis](http://en.wikipedia.org/wi ki/Semantic_analysis_%28compilers%29), as semantic errors (primarily name errors ) are largely caught by the build visibly failing, either during the IDL compile or during the C++ compile (name lookup errors). Notably, there is no symbol tab le or name binding in the front end: each name is simply looked up on use (e.g., type names), or passed through to the output (e.g., attribute names), and catch errors by the lookup failing or the generated code failing to compile, respecti vely.
122
123 However, extended attributes are validated, both the keys and values, based on t he list in [`IDLExtendedAttributes.txt`](https://code.google.com/p/chromium/code search#chromium/src/third_party/WebKit/Source/bindings/IDLExtendedAttributes.txt ). This is done because invalid extended attributes are _ignored_ by the compile r, specifically the code generator, and thus errors are easy to miss. The Python code checks for specific extended attributes, so errors just result in the attr ibute not being found; for example, writing `[EnforecRange]` for `[EnforceRange] ` would otherwise silently result in the range not being enforced. This is not p erfect: extended attributes may be valid but used incorrectly (e.g., specified o n an attribute when they only apply to methods), which is not caught; this is ho wever a minor problem.
124
125 This is done in the front end since that is the proper place for semantic analys is, and simplifies the code generator.
126
127 ### Dependency resolution
128 IDL interface definitions have two types of dependencies:
129 * An IDL interface definition can be spread across multiple files, via partial interfaces and (in effect) by `implements` (interface inheritance). We call the se _dependency interface definitions, dependency interfaces,_ or just _dependenc ies_.
130 * An IDL interface may use other interfaces as interface types: e.g., `interfa ce Foo { attribute Bar bar; };` We call these _referenced interfaces_.
131
132 The dependency resolution phase consists of merging the dependencies into the ma in interface definition, and adding referenced interfaces to the list of all int erfaces (available to the code generator). These dependencies are computed durin g the _Global information_ stage; see below.
133
134 Members defined in partial interfaces and in implemented interfaces are simply a ppended to the list of members of the main interface, with a few caveats (below) . Formally the difference between partial interfaces and implemented interfaces is that a partial interface is (external) type extension (it is part of and modi fies the type, but is just defined separately), and a partial interface is assoc iated with a _single_ main interface that it extends; while implemented interfac es are _interface inheritance_ (providing multiple inheritance in IDL) and _seve ral_ interfaces can implement a given (implemented) interface. This difference i s only for IDL: in JavaScript these are both just exposed as properties on the o bject implementing the interface; and thus we mostly don't distinguish them in t he code generator either, just merging them into one list.
135
136 The subtleties are:
137 * Members defined in partial interfaces are treated differently by the code ge nerator: they are assumed to be implemented as static members in separate classe s, not as methods on the object itself. This is because partial interfaces are u sually for optional or specialized parts of a basic interface, which might only be used by subtypes, and thus we do not want to clutter the main Blink C++ class with these.
138 * Some extended attributes are transferred from the dependency interface defin ition to its members, because otherwise there is no way to apply the attributes to "just the members in this definition", due to merging. These are extended att ributes for turning off a dependency interface, namely: `[Conditional], [PerCont extEnabled],` and `[RuntimeEnabled].`
139 * Interface-level `[TypeChecking]` extended attributes are also recognized on partial interfaces as a convenience. Its value is also transferred to its member s, appending it to any `[TypeChecking]` values that the member already declares.
140
141 Also, note that the compiler currently does not do significant type introspectio n of referenced interfaces: it mostly just uses certain global information about the interface (is it a callback interface?, `[ImplementedAs],` etc.). The only significant use of type introspection is in `[PutForwards],` where generating th e code for an attribute with `[PutForwards]` requires looking up the referenced attribute in another interface. Type introspection may increase in future, to si mplify complex cases and reduce hard-coding, but for now is in very limited use. </div>
142
143 ## Back end (code generator)
144 The back end (code generator, CG) is where the bulk of the complexity is. The st ructure is simple: the Python V8 CG modules (`v8_*`) generate a dictionary calle d the **context**, which is passed to Jinja, which uses this dict to fill in the appropriate templates ([bindings/templates](https://code.google.com/p/chromium/ codesearch#chromium/src/third_party/WebKit/Source/bindings/templates/)/*.{cpp,h} ). The modules correspond to the class structure in IR, with the following excep tions:
145 * the CG uses “method” where the front end uses “operation” (the term in the s pec), since “method” is a more standard term;
146 * constants are included in `v8_interface`, because very simple; and
147 * arguments are included in `v8_methods`, because they are closely related to methods.
148
149 Recall the overall flow of the CG:
150 ```
151 CodeGeneratorV8: IdlDefinitions --(extract member)--> IdlInterface --v8_*.py --> dict
152 --jinja2--> header_text, cpp_text
153 ```
154
155 The class structure of the IR, and corresponding Python module that processes it (in _italics_ if included in a previously mentioned module), are:
156 ```
157 IdlInterface :: v8_interface
158 IdlAttribute :: v8_attributes
159 IdlConstant :: _v8_interface_
160 IdlOperation :: v8_methods
161 IdlArgument :: _v8_methods_
162 ```
163 The CG modules each have a top-level `*_context` function_,_ which takes an IR o bject (`Idl*`) as a parameter and returns a context (`dict`), looping through co ntained IR objects as necessary.
164
165 Note that the context is nested – the context for an interfaces contains a list of contexts for the members: attributes, constants, and methods. The context for a given member is used throughout the corresponding template, generally to gene rate several methods; e.g., in `attributes.cpp` there are macros for getter, get ter callback, setter, and setter callback, which are called from a loop in `inte rface_base.cpp`. The context for members is _also_ used at the overall interface level, notably for DOM configuration.
166
167 ## Style
168 [`v8_attributes.py`](https://code.google.com/p/chromium/codesearch#chromium/src/ third_party/WebKit/Source/bindings/scripts/v8_attributes.py) (Python) and [`attr ibutes.cpp`](https://code.google.com/p/chromium/codesearch#chromium/src/third_pa rty/WebKit/Source/bindings/templates/attributes.cpp) (Jinja) are a good guide to style: the getter and setter methods themselves are quite complex, while the ca llbacks are simple.
169
170 See [Jinja: Style](https://sites.google.com/a/chromium.org/dev/developers/jinja# TOC-Style) for general Jinja style tips; below is bindings generator specific gu idance.
171
172 ### Jinja
173 If nesting macros (so macros that call macros), use top-down order: top-level ma cro, followed by macros it calls. This is most notable in [`methods.cpp`](https: //code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/b indings/templates/methods.cpp), where generating a method also requires generati ng code for the arguments, which comes after.
174
175 ### Python
176 Assertions should _not_ be used for validating input, and this includes IDL file s: if an IDL file is invalid (e.g., it uses an extended attribute improperly), r aise an exception explicitly. Assertions can be used for validating internal log ic, but in practice this is rare in the code generator; see [Using Assertions Ef fectively](https://wiki.python.org/moin/UsingAssertionsEffectively).
177
178 Deviations from the "One dictionary display per context" rule:
179 * If the presence of an extended attribute has side effects (namely includes), this code comes before the display.
180 * An early return after the main display, and then additional processing after wards, occurs for attributes, where certain attributes (constructors, custom att ributes) have limited processing, and the side effects of running the additional context-generation code (extra includes) are not desired.
181 * Similarly, the getter and setter logic for attributes is relative complicate d, and depends on previously computed variables, and thus this part of the conte xt is produced in other functions, then added to the main dictionary via `dict.u pdate()`.
182
183 **No extended attributes in templates:** A key point for the CG is that the raw extended attributes are _not_ passed to the Jinja templates, and literal extende d attribute names accordingly do not appear in templates. Instead, extended attr ibutes are all passed via some variable in the context, either a content variabl e or a (boolean) control variable (e.g., `deprecate_as` for the `[DeprecateAs]` extended attribute and `is_reflect` for the `[Reflect]` extended attribute (this is only boolean since the value is used elsewhere, in the function for the gett er/setter name)). This is primarily for consistency: in some cases variables _mu st_ be used, notably if the extended attribute requires a side effect (includes) , and thus putting all literal references to extended attributes in the Python l ogic keeps them together.
184
185 **Includes:** The CG code is primarily functional, with one significant exceptio n: _includes_ (for the `.cpp` file; the `.h` includes are simple). Includes are added to the `.cpp` file throughout the CG (as needed by various features), and account for a significant amount of the code bulk. This is handled by considerin g includes as a [side effect](http://en.wikipedia.org/wiki/Side_effect_(computer _science)), and storing them in a global variable (or rather a module level vari able, as Python modules are [singletons](http://en.wikipedia.org/wiki/Singleton_ pattern)), specifically in `v8_global`. Alternatives include: being purely funct ional, and returning the includes from the functions, which makes the calls and returns bulky (due to all the passing backwards and forwards); or having all CG functions be methods of some object (`CodeGeneratorV8`), which adds a large amou nt of OO boilerplate. The only complexity of using module variables is that one needs to import it into each module, and must clear the includes if compiling mu ltiple files in a single Python run (and can't compile multiple files concurrent ly in a single Python processes). This latter is ok because compilation is paral lelized by distributing across separate processes, not by compiling multiple fil es in a single process.
186
187 Note that the context being computed is _not_ a module-wide variable: for attrib utes, modules, and arguments, a context is computed for each of these (functiona lly), though for an interface only a single context is computed. In rare cases t he context is also passed into functions as an argument (when later values depen d on earlier computed values), which is primarily to flatten the module: it's cl earer to pass explicitly, instead of having a nested function.
188
189 There are a few other module-level variables for the IDL global context, which a re write-once or update-only: `v8_globals.interfaces`, for referenced interfaces , and a few in `v8_types`, for global and local type information (e.g., `[Implem entedAs]`, enumerations). These are set at `CodeGeneratorV8` construction (from `interfaces_info`), which is write-once, or `CodeGeneratorV8.generate_code` invo cation (from `definitions`), which is update-only (can add some new enumerations , for instance, but shouldn't need or refer to any from other files).
190
191 **Template rendering:** the context is then passed into a Jinja template, via th e [`Template.render`](http://jinja.pocoo.org/docs/api/#jinja2.Template.render) m ethod, which returns the generated code.
192
193 ## Global information
194 Global information is computed by a pre-processing step, dependency resolution a nd merging is done by the front end, and global information is used by the code generator in limited places.
195
196 `FIXME`
197
198 ## Dependencies
199 `FIXME`
200
201 ## Goals
202 Key goals of the compiler are as follows (in descending order of importance); th ese primarily affect the code generator:
203 * **Correctness** (including **security**) and **performance** of generated ob ject code (speed, memory usage, size), after compiling C++ to object code
204 * **Simplicity** of compiler code itself, particularly code generator
205 * **Build performance** (see [IDL build: performance](http://www.chromium.org/ developers/design-documents/idl-build#TOC-Performance) and [Performance](http:// www.chromium.org/developers/design-documents/idl-compiler#TOC-Performance), belo w)
206
207 Note that **legibility** of generated C++ source code is secondary to legibility of the code generator, as the CG is what is developed and reviewed, though the generated code should be legible if this does not conflict with other goals. As a rule, simple (one-line) changes to CG that simplify the generated source code are ok, but more complex changes that are only for source code legibility (that have no effect on object code) should be avoided. If you rely on the compiler (e .g., for dead code elimination), please write a comment to that effect.
208
209 Since the CG is complex enough as it is, a key part of the design is moving comp lexity _out_ of the code generator, notably to the front end (building an easy-t o-use IR) and to the global information step.
210
211 ## Alternatives
212 Fundamentally, why do we have our own compiler at all? Can't we reuse an existin g program?
213
214 The answer is: we're reusing as much code as possible (lexer/parser and template library, and indeed a lexer/parser for the Web IDL standard), but we need custo m code for 3 purposes:
215 * Blink IDL dialect, which differs from Web IDL standard: we inherit from a st andard parser, so this is small, but we need our own.
216 * IR constructor: the AST output from the base parser is hard to use directly (you need to walk a tree), so we construct an easy-to-use object so the CG can h ave simple reading code (instead of walking the tree itself).
217 * V8 code generator: we need to generate very specific C++ code to bind to V8 and implement Web IDL, with many options and special cases.
218
219 Of these, the V8 CG is the main reason; Blink IDL dialect and the IR constructor exist to reduce complexity of the CG.
220
221 ### Why not C++?
222 The compiler is written in Python, rather than C++, because hackability (ease of understanding and modification) is more important than build speed, and build s peed is good enough.
223
224 A high-level language like Python is the usual choice for such tasks, and Python is the standard high-level language in Chromium: other languages (except JavaSc ript) are not widely used in Chromium, and JavaScript offers no advantages over Python.
225
226 Also, C++ code that generates C++ code risks being very confusing. In principle one could imagine writing the CG in template metaprogramming, but that would lik ely be hard to understand.
227
228 In principle C/C++ code may be used in the front end (lexer, parser, and constru ctor), particularly as the lexer and parser can be generated by lex-yacc, but th e code generator is expected to be in Python.
229
230 ### Why not SWIG?
231 We can't use SWIG because there's in fact very little overlap. Essentially we us e Jinja templates+Python logic for the backend, rather than SWIG templates.
232
233 While SWIG is a general "high-level language to C++ bindings generator", and thu s seems a natural fit, it doesn't fill our needs. We need a front end because we 're taking IDL files as input, and we need a code generator that supports all th e specific features that Web IDL and Blink require: Web IDL has different semant ics from JavaScript generally, and Blink has many IDL extended attributes and ot her implementation details. Further, there is a prototype V8 backend for SWIG, b ut it is incomplete (the JavaScript backend primarily targets JavaScriptCore). I n principle we could have a compiler whose back end output SWIG templates, compl ete the general V8 backend for SWIG, and then modify this or fork it to make a W eb IDL to V8 backend, customized for Blink. As this description makes clear, tha t involves significant complexity and few advantages over generating the C++ cod e directly, using Jinja instead of SWIG.
234
235 ### Why not another lexer/parser or template processor?
236 PLY (built on lex-yacc) and Jinja are standard and widely-used libraries for the se tasks, and are used elsewhere in Chromium. We're open to other libraries if t hey are significantly superior (and we switch throughout Chromium), but these li braries are fit for purpose.
237
238 ## Performance
239 _For overall IDL build performance, see: [IDL build: Performance](http://www.chr omium.org/developers/design-documents/idl-build#TOC-Performance)_
240
241 The bindings are generated by running a separate compiler process for each inter face (main IDL file), for parallelization and simplicity of build. This section discusses optimization of individual IDL compiler runs.
242
243 For compilation of an individual IDL file, the key factor in startup time, both Python startup time, and initialization time of libraries. This is because most IDL files are quite short and simple, so processing time (variable cost) is shor t relative to startup time (fixed cost). Currently (March 2014) the average sequ ential compilation time of a single IDL file on a fast Linux workstation is ~80 milliseconds, compared to Python startup time of ~4 milliseconds (skipping impor t of `site` with `-S`).
244
245 Naively, the key limiting factor is initialization time of the libraries (PLY an d Jinja). This is expensive, as it requires compiling the lexer & parser or temp lates. Initialization is sped up by pre-caching these in a separate build step, which significantly improves build time. Note that `run-bindings-tests` will com pile multiple test files in a single Python process_,_ which is quite fast, sinc e initialization is done only once.
246
247 The compile has not been profiled (except by using [time(1)](http://en.wikipedia .org/wiki/Time_(Unix)) on a build), since the low-hanging fruit were obvious (ca che lexer and parser tables and templates), performance is now good enough, and significant additional improvements are unlikely without major work or reenginee ring.<
248
249 Processing time itself is presumably dominated by the main libraries (parsing vi a PLY and template rendering via Jinja), and possibly object construction (IdlDe finitions); the actual CG is likely quite cheap, as it's a simple iteration over the IR with a few conditionals and simple computations.
250
251 To our knowledge, there are no O(_n_<sup>2</sup>) or slower algorithms in the co mpiler – everything should be O(_n_) or faster – but some may be lurking somewhe re.
252
253 ### Further improvements
254 Potential future improvements (without substantial reengineering) would primaril y be to improve startup of some components, or rewrite some components in C.
255
256 * Python tweaks:
257 * execute bytecode directly (compile to `.pyc` or `.pyo` in separate step and execute that): v. small improvement (about .5 ms/process), would require other b uild step
258 * use Python optimized mode (`-O` or `-OO`): likely little improvement (just s hrinks size of bytecode), and currently causes serious regressions (likely due t o breaking caching in a library)
259 * Further optimization of Jinja startup requires changes in the Jinja library itself; see discussion at references.
260 * Replacing parts of the PLY lexer-parser with a C/C++ library (produced by le x or yacc) would speed up execution of the front end, as it would be C code, not Python code, and make it virtually instantaneous. C lex/C++ yacc code is about 50x faster than PLY, and the lexer and parser typically each account for 40% of PLY execution time; remainder is cost of running the yacc actions, in this case to build an AST ([Writing Parsers and Compilers with PLY](http://www.dabeaz.com/ ply/PLYTalk.pdf), slides 76 and 77).
261 * A C lexer would be simple and significant, while a C++ parser would be relat ively complicated, particularly due to the use of inheritance to derive a parser for the Blink IDL dialect from a parser for the standard Web IDL language.
262 * <div>In terms of the overall build system, initialization costs could be sig nificantly reduced by initializing a single process and then forking it, so it's just a copy. However, this is platform dependent (Windows does not have `fork() `), and would significantly complicate the compiler.</div>
263
264 ### References
265 * PLY: [Lex: Optimized mode](http://www.dabeaz.com/ply/ply.html#ply_nn15) and [Using Python's Optimized Mode](http://www.dabeaz.com/ply/ply.html#ply_nn38): we use optimization for the control over caching, not because we are running Pytho n in optimized mode (we aren't)
266 * [Jinja performance](http://www.chromium.org/developers/jinja#TOC-Performance )
267
268 ## References
269 * [Source/bindings/scripts](https://code.google.com/p/chromium/codesearch#chro mium/src/third_party/WebKit/Source/bindings/scripts/): the code
270 * [Issue <strike>239771</strike>: Rewrite the IDL compiler in Python](https:// groups.google.com/a/chromium.org/forum/#!topic/blink-dev/s1xTE428OAs): tracking bug for rewrite (2013/2014)
271 * [IDL compiler (V8 bindings generator) switched to Python](https://groups.goo gle.com/a/chromium.org/forum/#!topic/blink-dev/s1xTE428OAs): wrap-up email for r ewrite (Feb 2014)
272 * [followup](https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/ 5FSUhiugWsE) about front end
OLDNEW
« no previous file with comments | « no previous file | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698