OLD | NEW |
(Empty) | |
| 1 The **IDL compiler** or **bindings generator** [transcompiles](http://en.wikiped
ia.org/wiki/Transcompiler) [Web IDL](https://sites.google.com/a/chromium.org/dev
/blink/webidl) to C++ code, specifically [bindings](http://en.wikipedia.org/wiki
/Language_binding) between V8 (the [JavaScript engine](http://en.wikipedia.org/w
iki/JavaScript_engine)) and Blink. That is, when an attribute or method in a [We
b IDL interface](https://sites.google.com/a/chromium.org/dev/developers/web-idl-
interfaces) is used from JavaScript, V8 calls the bindings code, which calls Bli
nk code. |
| 2 |
| 3 As of early 2014, there are almost 700 IDL files, over 2,000 attributes, and alm
ost 4,000 methods. This bindings code is highly repetitive, primarily passing va
lues (arguments and return values) between V8 and Blink and doing various conver
sions and checks (e.g., type checking), hence it is mostly machine-generated. |
| 4 |
| 5 This is complicated by numerous parameters and special cases, due to either want
ing to automatically generate repetitive code (so Blink implementation is simple
r), supporting quirks of Web interfaces, or Blink implementation details. The co
mpiler is under active development, due to adding new features (bug [345506](htt
ps://code.google.com/p/chromium/issues/detail?id=345506)), adjusting to changes
in V8 or Blink, and reducing technical debt, particularly custom bindings (manua
lly written bindings, bug [345519](https://code.google.com/p/chromium/issues/det
ail?id=345519)). By far the majority of development occurs on the _back end_ (**
code generator**); the _front end_ and overall structure is quite stable. |
| 6 |
| 7 Users of the IDL compiler consist of: |
| 8 * Blink (hence Chromium), which uses the whole compiler in the build (to gener
ate V8 bindings) – this is the primary use. |
| 9 * Other bindings generators (who want to compile Blink IDL files), which use t
he front end and optionally parts of the top-level compiler and build scripts. |
| 10 * Other IDL use: Web IDL is a convenient format for specifying JavaScript inte
rfaces, and the front end can be used for reading these files, most simply using
the IR object (`IdlDefinitions`). |
| 11 |
| 12 The compiler is in [Source/bindings/scripts](https://code.google.com/p/chromium/
codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/), and bindin
gs are generated as part of the build (see [IDL build](https://sites.google.com/
a/chromium.org/dev/developers/design-documents/idl-build)). The top-level file [
idl_compiler.py](https://code.google.com/p/chromium/codesearch#chromium/src/thir
d_party/WebKit/Source/bindings/scripts/idl_compiler.py), can be invoked directly
(if you want to compile a single file), or via [Tools/Scripts/run-bindings-test
s](https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit
/Tools/Scripts/run-bindings-tests). The rest of this document describes design,
implementation, and use. |
| 13 |
| 14 ## Overall structure |
| 15 The compiler is factored into a pipeline, and the steps further divided into sep
arate Python modules. While there are a number of components, each is simple and
well-defined, and generally only one or two must be considered at a time. |
| 16 |
| 17 There are complicated issues with global information such as dependencies (see b
elow), but initially it is sufficient to consider compiling a single file. |
| 18 |
| 19 The overall structure is standard for [compilers](http://en.wikipedia.org/wiki/C
ompiler) (see [compiler construction](http://en.wikipedia.org/wiki/Compiler_cons
truction)): a front end reads in input (IDL), yielding an [intermediate represen
tation](http://en.wikipedia.org/wiki/Intermediate_representation) (IR), and the
back end takes the IR and writes out output (C++). |
| 20 |
| 21 ``` |
| 22 IDL (Foo.idl) --front end--> IR --back end--> C++ (V8Foo.h, V8Foo.cpp) |
| 23 ``` |
| 24 |
| 25 The flow is a (per file) [pipeline](http://en.wikipedia.org/wiki/Pipeline_(softw
are)): information flows one way, and, other than dependencies and global inform
ation, files are processed independently. In detail: |
| 26 |
| 27 ``` |
| 28 Front end: IDL --lexer--> tokens --parser--> AST --constructor--> IR |
| 29 Back end: IR --logic--> context --template processor--> C++ |
| 30 ``` |
| 31 |
| 32 In terms of data types, objects, and modules, this is: |
| 33 |
| 34 ``` |
| 35 IdlCompiler: idl_filename --IdlReader--> IdlDefinitions --CodeGeneratorV8-->
header_text, cpp_text |
| 36 |
| 37 IdlReader: idl_filename --BlinkIDLLexer--> (internal) --BlinkIDLParser--> ID
LNode |
| 38 --idl_definitions_builder--> IdlDefinitions |
| 39 |
| 40 CodeGeneratorV8: IdlDefinitions --(extract member)--> IdlInterface --v8_*.py
--> dict |
| 41 --jinja2--> header_text, cpp_text |
| 42 ``` |
| 43 |
| 44 One can trace the processing of each element of IDL input text through the pipel
ine, essentially 1-to-1: |
| 45 * Spec: [Interfaces](http://heycam.github.io/webidl/#idl-interfaces) (producti
on rule: [Interface](http://heycam.github.io/webidl/#prod-Interface)) |
| 46 * Input IDL text is parsed by the `p_Interface` rule in `idl_parser.IDLParser`
(base parser), producing an `IDLNode` (AST) named `'Interface'`. |
| 47 * An `IdlInterface` is constructed from this tree in `idl_definitions_builder` |
| 48 * Python logic produces a dict (Jinja context) via the `v8_interface.interface
_context` function |
| 49 * The context is rendered to `.cpp/.h` output text by Jinja, from the template
s `interface.cpp` (derived from `interface_base.cpp`) and `interface.h` |
| 50 |
| 51 The boundary between the stages of the pipeline can move, primarily in the CG be
tween the context and the template (depending on logic complexity: can it fit in
the template, or should it be a Python function?). Changing the boundary betwee
n the FE and the CG is possible (e.g., moving "find indexed property getter" int
o the `IdlInterface` constructor, instead of the CG `v8_interface` module), but
rare. In principle the constructor could be entirely integrated into the parser
(have the yacc actions construct the object directly, instead of an AST), but th
is is unlikely, as it would require a significant amount of work for a relativel
y minor simplification. |
| 52 |
| 53 Note that filenames, not IDL text, are what is input to the reader. This is fund
amentally because the compiler and the parser assume they are reading in files (
hence take a filename parameter, e.g. for reporting errors). Further, due to dep
endencies, processing one IDL file may require reading others (hence going out t
o the file system). This could be abstracted to taking in IDL text and passing i
nterface names rather than filenames internally, but this isn't necessary at pre
sent. |
| 54 |
| 55 The front end is familiar ([regex](http://en.wikipedia.org/wiki/Regular_expressi
on) [lexer](http://en.wikipedia.org/wiki/Lexical_analysis), [LALR parser](http:/
/en.wikipedia.org/wiki/LALR_parser), and object constructor). The lexer-parser u
ses [PLY (Python Lex-Yacc)](http://www.dabeaz.com/ply/), a Python implementation
of [lex](http://en.wikipedia.org/wiki/Lex_(software)) and [yacc](http://en.wiki
pedia.org/wiki/Yacc); the constructor is lengthy but straightforward. |
| 56 |
| 57 However, the back end ([code generator](http://en.wikipedia.org/wiki/Code_genera
tion_(compiler))) differs from compilers that target machine code: since it is a
[source-to-source compiler](http://en.wikipedia.org/wiki/Source-to-source_compi
ler), the back end outputs C++ (source code, not machine code). Thus we use a [t
emplate processor](http://en.wikipedia.org/wiki/Template_processor) to produce o
utput; recall that _templating_ (writing formatted _output_) is complementary to
_parsing_ (reading formatted _input_). The template processor is [Jinja](https:
//sites.google.com/a/chromium.org/dev/developers/jinja), which is used in severa
l places in Chromium. Given an interface, containing various attributes and meth
ods, we fill in a template for the corresponding C++ class and each attribute an
d method. |
| 58 |
| 59 Further, the input language is declarative (the 'D' in IDL), so no optimizations
of input code are necessary (there is no _middle end_): it's just filling in a
template. The back end is itself divided into the templates themselves, and the
Python code that fills in the templates (produces the _context_). There is also
no separate [_semantic analysis_](http://en.wikipedia.org/wiki/Semantic_analysis
_(compilers)) step, except for validation of extended attributes (see below): th
e code generator assumes types are valid, and errors show up when the resulting
C++ code fails to compile. This avoids the complexity and time cost of either a
separate validation step, or of type-checking in the code generator, at the cost
of typos showing up in compile failures instead of at IDL compile time. Notably
, there is no type checking or name binding, since identifiers are assumed to re
fer to Web IDL interfaces (C++ classes) and the Web IDL namespace is global, and
there is no assignment, since Web IDL is declarative. |
| 60 |
| 61 ## Code |
| 62 Code-wise, the top-level file [`idl_compiler.py`](https://code.google.com/p/chro
mium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_comp
iler.py) imports two modules: [`idl_reader`](https://code.google.com/p/chromium/
codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_reader.py
) for the front end, [`code_generator_v8`](https://code.google.com/p/chromium/co
desearch#chromium/src/third_party/WebKit/Source/bindings/scripts/code_generator_
v8.py) for the back end. Each of these is used to create an object (`IdlReader`
and `CodeGeneratorV8`), which handles the library initialization (PLY or Jinja,
which are slow, and thus reused if running multiple times during one run) and gl
obal information. The objects are then called one time each, for the IDL --> IR
and IR --> C++ steps, respectively. By contrast, `run-bindings-tests` creates th
ese objects as well, then calls them multiple times, once for each test file. No
te that in the actual build, compilation is parallelized, which is why only one
file is compiled per process, and there is a pre-caching step which significantl
y speeds up library initialization. |
| 63 |
| 64 The code is mostly functional, except for a few module-level variables in the co
de generator (discussed below) for simplicity, and some objects used for initial
ization. |
| 65 |
| 66 The main classes are as follows, with the module in which they are defined. Note
that the relations are primarily composition, with inheritance significantly us
ed for the Blink-specific lexer and parser. `IdlCompiler, IdlReader,` and `CodeG
eneratorV8` are expected to be used as singletons, and are just classes for init
ialization (avoids expensive re-initialization, and interface-wise separates ini
tialization from execution). `IdlDefinitions` contains objects of a few other cl
asses (for more minor data), and there is a bit of complex OO code in [`idl_defi
nitions`](https://code.google.com/p/chromium/codesearch#chromium/src/third_party
/WebKit/Source/bindings/scripts/idl_definitions.py) to simplify [typedef](http:/
/heycam.github.io/webidl/#idl-typedefs) resolution. |
| 67 |
| 68 ``` |
| 69 IdlCompiler :: idl_compiler |
| 70 IdlReader :: idl_reader |
| 71 BlinkIDLParser < IDLParser |
| 72 BlinkIDLLexer < IDLLexer |
| 73 IDLExtendedAttributeValidator |
| 74 CodeGeneratorV8 :: code_generator_v8 |
| 75 ``` |
| 76 ``` |
| 77 IdlDefinitions :: idl_definitions |
| 78 IdlInterface |
| 79 IdlAttribute |
| 80 IdlConstant |
| 81 IdlOperation |
| 82 IdlArgument |
| 83 ``` |
| 84 |
| 85 ## Front end |
| 86 The front end is structured as a lexer --> parser --> object constructor pipelin
e: |
| 87 |
| 88 Front end: `IDL --lexer--> tokens --parser--> AST --constructor--> IR` |
| 89 |
| 90 There are two other steps: |
| 91 * extended attribute validation |
| 92 * dependency resolution |
| 93 |
| 94 The top-level module for the front end is [`idl_reader`](https://code.google.com
/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/i
dl_reader.py). This implements a class, `IdlReader`, whose constructor also cons
tructs a lexer, parser, validator, and resolver. `IdlReader` implements a method
to construct an IR object (`IdlDefinitions`) from an IDL filename. Thus to conv
ert IDL to IR one instantiates a reader, then call its method to read an IDL fil
e. |
| 95 |
| 96 The lexer-parser uses [PLY (Python Lex-Yacc)](http://www.dabeaz.com/ply/). In fa
ct, the lexer and parser for the Blink IDL dialect of Web IDL derive from a base
lexer and base parser for standard Web IDL (in [tools/idl_parser](https://code.
google.com/p/chromium/codesearch#chromium/src/tools/idl_parser)), and thus only
need to include deviations from the standard. The lexical syntax of Blink IDL is
standard (though the base lexer is slightly non-standard), so the Blink IDL lex
er is very small and slated for removal (Bug [345137](https://code.google.com/p/
chromium/issues/detail?id=345137)). The phrase syntax is slightly non-standard (
primarily in extended attributes) and expected to stay this way, as extended att
ributes are implementation-dependent and the deviations are useful (see [Blink I
DL: syntax](http://www.chromium.org/blink/webidl#TOC-Syntax)). We thus say that
the base lexer/parser + Blink lexer/parser form 1 lexer and 1.1 parser (base + d
erived). |
| 97 |
| 98 The base lexer class, defined in [`idl_lexer`](https://code.google.com/p/chromiu
m/codesearch#chromium/src/tools/idl_parser/idl_lexer.py), is straightforward and
short: it is just a list of regular expressions and keywords, wrapped in a clas
s, `IDLLexer`; a lexer (object) itself is an instance of this class. There is so
me minor complexity with error handling (correct line count) and methods to assi
st with derived classes, but it's quite simple. The derived lexer class is very
simple: it's a class, `BlinkIDLLexer`, which derives from the base class. The on
ly complexity is adding a method to remove a token class from the base class, an
d to remove comments (base lexer is non-standard here: standard lexer does not p
roduce `COMMENT` tokens). |
| 99 |
| 100 The base parser class, defined in [`idl_parser`](https://code.google.com/p/chrom
ium/codesearch#chromium/src/tools/idl_parser/idl_parser.py), is considerably lon
ger, but consists almost entirely of production rules in (a form of) [BNF](http:
//en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), together with yacc _actions_
that build an [abstract syntax tree](http://en.wikipedia.org/wiki/Abstract_synta
x_tree) (AST, syntax tree). Recall that yacc _traverses_ the [concrete syntax tr
ee](http://en.wikipedia.org/wiki/Concrete_syntax_tree) (CST, parse tree) and can
take whatever actions it chooses; in this case it generates an AST, though it c
ould also generate the IR directly. See [`blink_idl_parser`](https://code.google
.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scrip
ts/blink_idl_parser.py) for a detailed explanation of the PLY yacc syntax and `I
DLParser` methods used to construct the AST. The grammar is directly from the [W
eb IDL grammar](http://heycam.github.io/webidl/#idl-grammar), which is an [LL(1)
grammar](http://en.wikipedia.org/wiki/LL_grammar); the Blink IDL rules are exac
tly the deviations from the standard grammar, or overrides to irregularities in
the base parser. A parser object is an instance of a class, `IDLParser`, from wh
ich `BlinkIDLParser` is derived. The parser constructor takes a lexer as an argu
ment, and one passes the corresponding lexer. |
| 101 |
| 102 The definitions classes, defined in [idl_definitions](https://code.google.com/p/
chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_
definitions.py), primarily consist of constructors, which take the AST and gener
ate an intermediate representation (IR), in this case an object of type `IdlDefi
nitions` and contained objects for individual definitions and definition's membe
rs. |
| 103 |
| 104 The classes are as follows (mostly composition, one case of inheritance); a few
internal-use only classes are not shown: |
| 105 |
| 106 ``` |
| 107 IdlDefinitions |
| 108 IdlInterface |
| 109 IdlAttribute |
| 110 IdlConstant |
| 111 IdlOperation |
| 112 IdlArgument |
| 113 IdlException < IdlInterface |
| 114 IdlCallbackFunction |
| 115 IdlEnum |
| 116 ``` |
| 117 |
| 118 After reading an IDL file (producing an IR), the reader has two optional additio
nal steps (these are run during the build, but aren't necessary for simpler use,
such as compiling a test file without dependencies): validation and dependency
resolution. |
| 119 |
| 120 ### Extended attribute validation |
| 121 The front end largely does not do [semantic analysis](http://en.wikipedia.org/wi
ki/Semantic_analysis_%28compilers%29), as semantic errors (primarily name errors
) are largely caught by the build visibly failing, either during the IDL compile
or during the C++ compile (name lookup errors). Notably, there is no symbol tab
le or name binding in the front end: each name is simply looked up on use (e.g.,
type names), or passed through to the output (e.g., attribute names), and catch
errors by the lookup failing or the generated code failing to compile, respecti
vely. |
| 122 |
| 123 However, extended attributes are validated, both the keys and values, based on t
he list in [`IDLExtendedAttributes.txt`](https://code.google.com/p/chromium/code
search#chromium/src/third_party/WebKit/Source/bindings/IDLExtendedAttributes.txt
). This is done because invalid extended attributes are _ignored_ by the compile
r, specifically the code generator, and thus errors are easy to miss. The Python
code checks for specific extended attributes, so errors just result in the attr
ibute not being found; for example, writing `[EnforecRange]` for `[EnforceRange]
` would otherwise silently result in the range not being enforced. This is not p
erfect: extended attributes may be valid but used incorrectly (e.g., specified o
n an attribute when they only apply to methods), which is not caught; this is ho
wever a minor problem. |
| 124 |
| 125 This is done in the front end since that is the proper place for semantic analys
is, and simplifies the code generator. |
| 126 |
| 127 ### Dependency resolution |
| 128 IDL interface definitions have two types of dependencies: |
| 129 * An IDL interface definition can be spread across multiple files, via partial
interfaces and (in effect) by `implements` (interface inheritance). We call the
se _dependency interface definitions, dependency interfaces,_ or just _dependenc
ies_. |
| 130 * An IDL interface may use other interfaces as interface types: e.g., `interfa
ce Foo { attribute Bar bar; };` We call these _referenced interfaces_. |
| 131 |
| 132 The dependency resolution phase consists of merging the dependencies into the ma
in interface definition, and adding referenced interfaces to the list of all int
erfaces (available to the code generator). These dependencies are computed durin
g the _Global information_ stage; see below. |
| 133 |
| 134 Members defined in partial interfaces and in implemented interfaces are simply a
ppended to the list of members of the main interface, with a few caveats (below)
. Formally the difference between partial interfaces and implemented interfaces
is that a partial interface is (external) type extension (it is part of and modi
fies the type, but is just defined separately), and a partial interface is assoc
iated with a _single_ main interface that it extends; while implemented interfac
es are _interface inheritance_ (providing multiple inheritance in IDL) and _seve
ral_ interfaces can implement a given (implemented) interface. This difference i
s only for IDL: in JavaScript these are both just exposed as properties on the o
bject implementing the interface; and thus we mostly don't distinguish them in t
he code generator either, just merging them into one list. |
| 135 |
| 136 The subtleties are: |
| 137 * Members defined in partial interfaces are treated differently by the code ge
nerator: they are assumed to be implemented as static members in separate classe
s, not as methods on the object itself. This is because partial interfaces are u
sually for optional or specialized parts of a basic interface, which might only
be used by subtypes, and thus we do not want to clutter the main Blink C++ class
with these. |
| 138 * Some extended attributes are transferred from the dependency interface defin
ition to its members, because otherwise there is no way to apply the attributes
to "just the members in this definition", due to merging. These are extended att
ributes for turning off a dependency interface, namely: `[Conditional], [PerCont
extEnabled],` and `[RuntimeEnabled].` |
| 139 * Interface-level `[TypeChecking]` extended attributes are also recognized on
partial interfaces as a convenience. Its value is also transferred to its member
s, appending it to any `[TypeChecking]` values that the member already declares. |
| 140 |
| 141 Also, note that the compiler currently does not do significant type introspectio
n of referenced interfaces: it mostly just uses certain global information about
the interface (is it a callback interface?, `[ImplementedAs],` etc.). The only
significant use of type introspection is in `[PutForwards],` where generating th
e code for an attribute with `[PutForwards]` requires looking up the referenced
attribute in another interface. Type introspection may increase in future, to si
mplify complex cases and reduce hard-coding, but for now is in very limited use.
</div> |
| 142 |
| 143 ## Back end (code generator) |
| 144 The back end (code generator, CG) is where the bulk of the complexity is. The st
ructure is simple: the Python V8 CG modules (`v8_*`) generate a dictionary calle
d the **context**, which is passed to Jinja, which uses this dict to fill in the
appropriate templates ([bindings/templates](https://code.google.com/p/chromium/
codesearch#chromium/src/third_party/WebKit/Source/bindings/templates/)/*.{cpp,h}
). The modules correspond to the class structure in IR, with the following excep
tions: |
| 145 * the CG uses “method” where the front end uses “operation” (the term in the s
pec), since “method” is a more standard term; |
| 146 * constants are included in `v8_interface`, because very simple; and |
| 147 * arguments are included in `v8_methods`, because they are closely related to
methods. |
| 148 |
| 149 Recall the overall flow of the CG: |
| 150 ``` |
| 151 CodeGeneratorV8: IdlDefinitions --(extract member)--> IdlInterface --v8_*.py
--> dict |
| 152 --jinja2--> header_text, cpp_text |
| 153 ``` |
| 154 |
| 155 The class structure of the IR, and corresponding Python module that processes it
(in _italics_ if included in a previously mentioned module), are: |
| 156 ``` |
| 157 IdlInterface :: v8_interface |
| 158 IdlAttribute :: v8_attributes |
| 159 IdlConstant :: _v8_interface_ |
| 160 IdlOperation :: v8_methods |
| 161 IdlArgument :: _v8_methods_ |
| 162 ``` |
| 163 The CG modules each have a top-level `*_context` function_,_ which takes an IR o
bject (`Idl*`) as a parameter and returns a context (`dict`), looping through co
ntained IR objects as necessary. |
| 164 |
| 165 Note that the context is nested – the context for an interfaces contains a list
of contexts for the members: attributes, constants, and methods. The context for
a given member is used throughout the corresponding template, generally to gene
rate several methods; e.g., in `attributes.cpp` there are macros for getter, get
ter callback, setter, and setter callback, which are called from a loop in `inte
rface_base.cpp`. The context for members is _also_ used at the overall interface
level, notably for DOM configuration. |
| 166 |
| 167 ## Style |
| 168 [`v8_attributes.py`](https://code.google.com/p/chromium/codesearch#chromium/src/
third_party/WebKit/Source/bindings/scripts/v8_attributes.py) (Python) and [`attr
ibutes.cpp`](https://code.google.com/p/chromium/codesearch#chromium/src/third_pa
rty/WebKit/Source/bindings/templates/attributes.cpp) (Jinja) are a good guide to
style: the getter and setter methods themselves are quite complex, while the ca
llbacks are simple. |
| 169 |
| 170 See [Jinja: Style](https://sites.google.com/a/chromium.org/dev/developers/jinja#
TOC-Style) for general Jinja style tips; below is bindings generator specific gu
idance. |
| 171 |
| 172 ### Jinja |
| 173 If nesting macros (so macros that call macros), use top-down order: top-level ma
cro, followed by macros it calls. This is most notable in [`methods.cpp`](https:
//code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/b
indings/templates/methods.cpp), where generating a method also requires generati
ng code for the arguments, which comes after. |
| 174 |
| 175 ### Python |
| 176 Assertions should _not_ be used for validating input, and this includes IDL file
s: if an IDL file is invalid (e.g., it uses an extended attribute improperly), r
aise an exception explicitly. Assertions can be used for validating internal log
ic, but in practice this is rare in the code generator; see [Using Assertions Ef
fectively](https://wiki.python.org/moin/UsingAssertionsEffectively). |
| 177 |
| 178 Deviations from the "One dictionary display per context" rule: |
| 179 * If the presence of an extended attribute has side effects (namely includes),
this code comes before the display. |
| 180 * An early return after the main display, and then additional processing after
wards, occurs for attributes, where certain attributes (constructors, custom att
ributes) have limited processing, and the side effects of running the additional
context-generation code (extra includes) are not desired. |
| 181 * Similarly, the getter and setter logic for attributes is relative complicate
d, and depends on previously computed variables, and thus this part of the conte
xt is produced in other functions, then added to the main dictionary via `dict.u
pdate()`. |
| 182 |
| 183 **No extended attributes in templates:** A key point for the CG is that the raw
extended attributes are _not_ passed to the Jinja templates, and literal extende
d attribute names accordingly do not appear in templates. Instead, extended attr
ibutes are all passed via some variable in the context, either a content variabl
e or a (boolean) control variable (e.g., `deprecate_as` for the `[DeprecateAs]`
extended attribute and `is_reflect` for the `[Reflect]` extended attribute (this
is only boolean since the value is used elsewhere, in the function for the gett
er/setter name)). This is primarily for consistency: in some cases variables _mu
st_ be used, notably if the extended attribute requires a side effect (includes)
, and thus putting all literal references to extended attributes in the Python l
ogic keeps them together. |
| 184 |
| 185 **Includes:** The CG code is primarily functional, with one significant exceptio
n: _includes_ (for the `.cpp` file; the `.h` includes are simple). Includes are
added to the `.cpp` file throughout the CG (as needed by various features), and
account for a significant amount of the code bulk. This is handled by considerin
g includes as a [side effect](http://en.wikipedia.org/wiki/Side_effect_(computer
_science)), and storing them in a global variable (or rather a module level vari
able, as Python modules are [singletons](http://en.wikipedia.org/wiki/Singleton_
pattern)), specifically in `v8_global`. Alternatives include: being purely funct
ional, and returning the includes from the functions, which makes the calls and
returns bulky (due to all the passing backwards and forwards); or having all CG
functions be methods of some object (`CodeGeneratorV8`), which adds a large amou
nt of OO boilerplate. The only complexity of using module variables is that one
needs to import it into each module, and must clear the includes if compiling mu
ltiple files in a single Python run (and can't compile multiple files concurrent
ly in a single Python processes). This latter is ok because compilation is paral
lelized by distributing across separate processes, not by compiling multiple fil
es in a single process. |
| 186 |
| 187 Note that the context being computed is _not_ a module-wide variable: for attrib
utes, modules, and arguments, a context is computed for each of these (functiona
lly), though for an interface only a single context is computed. In rare cases t
he context is also passed into functions as an argument (when later values depen
d on earlier computed values), which is primarily to flatten the module: it's cl
earer to pass explicitly, instead of having a nested function. |
| 188 |
| 189 There are a few other module-level variables for the IDL global context, which a
re write-once or update-only: `v8_globals.interfaces`, for referenced interfaces
, and a few in `v8_types`, for global and local type information (e.g., `[Implem
entedAs]`, enumerations). These are set at `CodeGeneratorV8` construction (from
`interfaces_info`), which is write-once, or `CodeGeneratorV8.generate_code` invo
cation (from `definitions`), which is update-only (can add some new enumerations
, for instance, but shouldn't need or refer to any from other files). |
| 190 |
| 191 **Template rendering:** the context is then passed into a Jinja template, via th
e [`Template.render`](http://jinja.pocoo.org/docs/api/#jinja2.Template.render) m
ethod, which returns the generated code. |
| 192 |
| 193 ## Global information |
| 194 Global information is computed by a pre-processing step, dependency resolution a
nd merging is done by the front end, and global information is used by the code
generator in limited places. |
| 195 |
| 196 `FIXME` |
| 197 |
| 198 ## Dependencies |
| 199 `FIXME` |
| 200 |
| 201 ## Goals |
| 202 Key goals of the compiler are as follows (in descending order of importance); th
ese primarily affect the code generator: |
| 203 * **Correctness** (including **security**) and **performance** of generated ob
ject code (speed, memory usage, size), after compiling C++ to object code |
| 204 * **Simplicity** of compiler code itself, particularly code generator |
| 205 * **Build performance** (see [IDL build: performance](http://www.chromium.org/
developers/design-documents/idl-build#TOC-Performance) and [Performance](http://
www.chromium.org/developers/design-documents/idl-compiler#TOC-Performance), belo
w) |
| 206 |
| 207 Note that **legibility** of generated C++ source code is secondary to legibility
of the code generator, as the CG is what is developed and reviewed, though the
generated code should be legible if this does not conflict with other goals. As
a rule, simple (one-line) changes to CG that simplify the generated source code
are ok, but more complex changes that are only for source code legibility (that
have no effect on object code) should be avoided. If you rely on the compiler (e
.g., for dead code elimination), please write a comment to that effect. |
| 208 |
| 209 Since the CG is complex enough as it is, a key part of the design is moving comp
lexity _out_ of the code generator, notably to the front end (building an easy-t
o-use IR) and to the global information step. |
| 210 |
| 211 ## Alternatives |
| 212 Fundamentally, why do we have our own compiler at all? Can't we reuse an existin
g program? |
| 213 |
| 214 The answer is: we're reusing as much code as possible (lexer/parser and template
library, and indeed a lexer/parser for the Web IDL standard), but we need custo
m code for 3 purposes: |
| 215 * Blink IDL dialect, which differs from Web IDL standard: we inherit from a st
andard parser, so this is small, but we need our own. |
| 216 * IR constructor: the AST output from the base parser is hard to use directly
(you need to walk a tree), so we construct an easy-to-use object so the CG can h
ave simple reading code (instead of walking the tree itself). |
| 217 * V8 code generator: we need to generate very specific C++ code to bind to V8
and implement Web IDL, with many options and special cases. |
| 218 |
| 219 Of these, the V8 CG is the main reason; Blink IDL dialect and the IR constructor
exist to reduce complexity of the CG. |
| 220 |
| 221 ### Why not C++? |
| 222 The compiler is written in Python, rather than C++, because hackability (ease of
understanding and modification) is more important than build speed, and build s
peed is good enough. |
| 223 |
| 224 A high-level language like Python is the usual choice for such tasks, and Python
is the standard high-level language in Chromium: other languages (except JavaSc
ript) are not widely used in Chromium, and JavaScript offers no advantages over
Python. |
| 225 |
| 226 Also, C++ code that generates C++ code risks being very confusing. In principle
one could imagine writing the CG in template metaprogramming, but that would lik
ely be hard to understand. |
| 227 |
| 228 In principle C/C++ code may be used in the front end (lexer, parser, and constru
ctor), particularly as the lexer and parser can be generated by lex-yacc, but th
e code generator is expected to be in Python. |
| 229 |
| 230 ### Why not SWIG? |
| 231 We can't use SWIG because there's in fact very little overlap. Essentially we us
e Jinja templates+Python logic for the backend, rather than SWIG templates. |
| 232 |
| 233 While SWIG is a general "high-level language to C++ bindings generator", and thu
s seems a natural fit, it doesn't fill our needs. We need a front end because we
're taking IDL files as input, and we need a code generator that supports all th
e specific features that Web IDL and Blink require: Web IDL has different semant
ics from JavaScript generally, and Blink has many IDL extended attributes and ot
her implementation details. Further, there is a prototype V8 backend for SWIG, b
ut it is incomplete (the JavaScript backend primarily targets JavaScriptCore). I
n principle we could have a compiler whose back end output SWIG templates, compl
ete the general V8 backend for SWIG, and then modify this or fork it to make a W
eb IDL to V8 backend, customized for Blink. As this description makes clear, tha
t involves significant complexity and few advantages over generating the C++ cod
e directly, using Jinja instead of SWIG. |
| 234 |
| 235 ### Why not another lexer/parser or template processor? |
| 236 PLY (built on lex-yacc) and Jinja are standard and widely-used libraries for the
se tasks, and are used elsewhere in Chromium. We're open to other libraries if t
hey are significantly superior (and we switch throughout Chromium), but these li
braries are fit for purpose. |
| 237 |
| 238 ## Performance |
| 239 _For overall IDL build performance, see: [IDL build: Performance](http://www.chr
omium.org/developers/design-documents/idl-build#TOC-Performance)_ |
| 240 |
| 241 The bindings are generated by running a separate compiler process for each inter
face (main IDL file), for parallelization and simplicity of build. This section
discusses optimization of individual IDL compiler runs. |
| 242 |
| 243 For compilation of an individual IDL file, the key factor in startup time, both
Python startup time, and initialization time of libraries. This is because most
IDL files are quite short and simple, so processing time (variable cost) is shor
t relative to startup time (fixed cost). Currently (March 2014) the average sequ
ential compilation time of a single IDL file on a fast Linux workstation is ~80
milliseconds, compared to Python startup time of ~4 milliseconds (skipping impor
t of `site` with `-S`). |
| 244 |
| 245 Naively, the key limiting factor is initialization time of the libraries (PLY an
d Jinja). This is expensive, as it requires compiling the lexer & parser or temp
lates. Initialization is sped up by pre-caching these in a separate build step,
which significantly improves build time. Note that `run-bindings-tests` will com
pile multiple test files in a single Python process_,_ which is quite fast, sinc
e initialization is done only once. |
| 246 |
| 247 The compile has not been profiled (except by using [time(1)](http://en.wikipedia
.org/wiki/Time_(Unix)) on a build), since the low-hanging fruit were obvious (ca
che lexer and parser tables and templates), performance is now good enough, and
significant additional improvements are unlikely without major work or reenginee
ring.< |
| 248 |
| 249 Processing time itself is presumably dominated by the main libraries (parsing vi
a PLY and template rendering via Jinja), and possibly object construction (IdlDe
finitions); the actual CG is likely quite cheap, as it's a simple iteration over
the IR with a few conditionals and simple computations. |
| 250 |
| 251 To our knowledge, there are no O(_n_<sup>2</sup>) or slower algorithms in the co
mpiler – everything should be O(_n_) or faster – but some may be lurking somewhe
re. |
| 252 |
| 253 ### Further improvements |
| 254 Potential future improvements (without substantial reengineering) would primaril
y be to improve startup of some components, or rewrite some components in C. |
| 255 |
| 256 * Python tweaks: |
| 257 * execute bytecode directly (compile to `.pyc` or `.pyo` in separate step and
execute that): v. small improvement (about .5 ms/process), would require other b
uild step |
| 258 * use Python optimized mode (`-O` or `-OO`): likely little improvement (just s
hrinks size of bytecode), and currently causes serious regressions (likely due t
o breaking caching in a library) |
| 259 * Further optimization of Jinja startup requires changes in the Jinja library
itself; see discussion at references. |
| 260 * Replacing parts of the PLY lexer-parser with a C/C++ library (produced by le
x or yacc) would speed up execution of the front end, as it would be C code, not
Python code, and make it virtually instantaneous. C lex/C++ yacc code is about
50x faster than PLY, and the lexer and parser typically each account for 40% of
PLY execution time; remainder is cost of running the yacc actions, in this case
to build an AST ([Writing Parsers and Compilers with PLY](http://www.dabeaz.com/
ply/PLYTalk.pdf), slides 76 and 77). |
| 261 * A C lexer would be simple and significant, while a C++ parser would be relat
ively complicated, particularly due to the use of inheritance to derive a parser
for the Blink IDL dialect from a parser for the standard Web IDL language. |
| 262 * <div>In terms of the overall build system, initialization costs could be sig
nificantly reduced by initializing a single process and then forking it, so it's
just a copy. However, this is platform dependent (Windows does not have `fork()
`), and would significantly complicate the compiler.</div> |
| 263 |
| 264 ### References |
| 265 * PLY: [Lex: Optimized mode](http://www.dabeaz.com/ply/ply.html#ply_nn15) and
[Using Python's Optimized Mode](http://www.dabeaz.com/ply/ply.html#ply_nn38): we
use optimization for the control over caching, not because we are running Pytho
n in optimized mode (we aren't) |
| 266 * [Jinja performance](http://www.chromium.org/developers/jinja#TOC-Performance
) |
| 267 |
| 268 ## References |
| 269 * [Source/bindings/scripts](https://code.google.com/p/chromium/codesearch#chro
mium/src/third_party/WebKit/Source/bindings/scripts/): the code |
| 270 * [Issue <strike>239771</strike>: Rewrite the IDL compiler in Python](https://
groups.google.com/a/chromium.org/forum/#!topic/blink-dev/s1xTE428OAs): tracking
bug for rewrite (2013/2014) |
| 271 * [IDL compiler (V8 bindings generator) switched to Python](https://groups.goo
gle.com/a/chromium.org/forum/#!topic/blink-dev/s1xTE428OAs): wrap-up email for r
ewrite (Feb 2014) |
| 272 * [followup](https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/
5FSUhiugWsE) about front end |
OLD | NEW |