third_party/WebKit/Source/bindings/IDLCompiler.md - Issue 2195743003: Migrate IDL compiler documentation to Markdown.

Side by Side Diff: third_party/WebKit/Source/bindings/IDLCompiler.md

Issue 2195743003: Migrate IDL compiler documentation to Markdown. (Closed) Base URL: https://chromium.googlesource.com/chromium/src.git@master

Patch Set: Created 4 years, 4 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch

OLD	NEW
(Empty)
	1 The IDL compiler or bindings generator [transcompiles](http://en.wikiped ia.org/wiki/Transcompiler) [Web IDL](https://sites.google.com/a/chromium.org/dev /blink/webidl) to C++ code, specifically [bindings](http://en.wikipedia.org/wiki /Language_binding) between V8 (the [JavaScript engine](http://en.wikipedia.org/w iki/JavaScript_engine)) and Blink. That is, when an attribute or method in a [We b IDL interface](https://sites.google.com/a/chromium.org/dev/developers/web-idl- interfaces) is used from JavaScript, V8 calls the bindings code, which calls Bli nk code.

	2

	3 As of early 2014, there are almost 700 IDL files, over 2,000 attributes, and alm ost 4,000 methods. This bindings code is highly repetitive, primarily passing va lues (arguments and return values) between V8 and Blink and doing various conver sions and checks (e.g., type checking), hence it is mostly machine-generated.

	4

	5 This is complicated by numerous parameters and special cases, due to either want ing to automatically generate repetitive code (so Blink implementation is simple r), supporting quirks of Web interfaces, or Blink implementation details. The co mpiler is under active development, due to adding new features (bug [345506](htt ps://code.google.com/p/chromium/issues/detail?id=345506)), adjusting to changes in V8 or Blink, and reducing technical debt, particularly custom bindings (manua lly written bindings, bug [345519](https://code.google.com/p/chromium/issues/det ail?id=345519)). By far the majority of development occurs on the _back end_ ( code generator); the _front end_ and overall structure is quite stable.

	6

	7 Users of the IDL compiler consist of:

	8 * Blink (hence Chromium), which uses the whole compiler in the build (to gener ate V8 bindings) – this is the primary use.

	9 * Other bindings generators (who want to compile Blink IDL files), which use t he front end and optionally parts of the top-level compiler and build scripts.

	10 * Other IDL use: Web IDL is a convenient format for specifying JavaScript inte rfaces, and the front end can be used for reading these files, most simply using the IR object (`IdlDefinitions`).

	11

	12 The compiler is in [Source/bindings/scripts](https://code.google.com/p/chromium/ codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/), and bindin gs are generated as part of the build (see [IDL build](https://sites.google.com/ a/chromium.org/dev/developers/design-documents/idl-build)). The top-level file [ idl_compiler.py](https://code.google.com/p/chromium/codesearch#chromium/src/thir d_party/WebKit/Source/bindings/scripts/idl_compiler.py), can be invoked directly (if you want to compile a single file), or via [Tools/Scripts/run-bindings-test s](https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit /Tools/Scripts/run-bindings-tests). The rest of this document describes design, implementation, and use.

	13

	14 ## Overall structure

	15 The compiler is factored into a pipeline, and the steps further divided into sep arate Python modules. While there are a number of components, each is simple and well-defined, and generally only one or two must be considered at a time.

	16

	17 There are complicated issues with global information such as dependencies (see b elow), but initially it is sufficient to consider compiling a single file.

	18

	19 The overall structure is standard for [compilers](http://en.wikipedia.org/wiki/C ompiler) (see [compiler construction](http://en.wikipedia.org/wiki/Compiler_cons truction)): a front end reads in input (IDL), yielding an [intermediate represen tation](http://en.wikipedia.org/wiki/Intermediate_representation) (IR), and the back end takes the IR and writes out output (C++).

	20

	21 ```

	22 IDL (Foo.idl) --front end--> IR --back end--> C++ (V8Foo.h, V8Foo.cpp)

	23 ```

	24

	25 The flow is a (per file) [pipeline](http://en.wikipedia.org/wiki/Pipeline_(softw are)): information flows one way, and, other than dependencies and global inform ation, files are processed independently. In detail:

	26

	27 ```

	28 Front end: IDL --lexer--> tokens --parser--> AST --constructor--> IR

	29 Back end: IR --logic--> context --template processor--> C++

	30 ```

	31

	32 In terms of data types, objects, and modules, this is:

	33

	34 ```

	35 IdlCompiler: idl_filename --IdlReader--> IdlDefinitions --CodeGeneratorV8--> header_text, cpp_text

	36

	37 IdlReader: idl_filename --BlinkIDLLexer--> (internal) --BlinkIDLParser--> ID LNode

	38 --idl_definitions_builder--> IdlDefinitions

	39

	40 CodeGeneratorV8: IdlDefinitions --(extract member)--> IdlInterface --v8_*.py --> dict

	41 --jinja2--> header_text, cpp_text

	42 ```

	43

	44 One can trace the processing of each element of IDL input text through the pipel ine, essentially 1-to-1:

	45 * Spec: [Interfaces](http://heycam.github.io/webidl/#idl-interfaces) (producti on rule: [Interface](http://heycam.github.io/webidl/#prod-Interface))

	46 * Input IDL text is parsed by the `p_Interface` rule in `idl_parser.IDLParser` (base parser), producing an `IDLNode` (AST) named `'Interface'`.

	47 * An `IdlInterface` is constructed from this tree in `idl_definitions_builder`

	48 * Python logic produces a dict (Jinja context) via the `v8_interface.interface _context` function

	49 * The context is rendered to `.cpp/.h` output text by Jinja, from the template s `interface.cpp` (derived from `interface_base.cpp`) and `interface.h`

	50

	51 The boundary between the stages of the pipeline can move, primarily in the CG be tween the context and the template (depending on logic complexity: can it fit in the template, or should it be a Python function?). Changing the boundary betwee n the FE and the CG is possible (e.g., moving "find indexed property getter" int o the `IdlInterface` constructor, instead of the CG `v8_interface` module), but rare. In principle the constructor could be entirely integrated into the parser (have the yacc actions construct the object directly, instead of an AST), but th is is unlikely, as it would require a significant amount of work for a relativel y minor simplification.

	52

	53 Note that filenames, not IDL text, are what is input to the reader. This is fund amentally because the compiler and the parser assume they are reading in files ( hence take a filename parameter, e.g. for reporting errors). Further, due to dep endencies, processing one IDL file may require reading others (hence going out t o the file system). This could be abstracted to taking in IDL text and passing i nterface names rather than filenames internally, but this isn't necessary at pre sent.

	54

	55 The front end is familiar ([regex](http://en.wikipedia.org/wiki/Regular_expressi on) [lexer](http://en.wikipedia.org/wiki/Lexical_analysis), [LALR parser](http:/ /en.wikipedia.org/wiki/LALR_parser), and object constructor). The lexer-parser u ses [PLY (Python Lex-Yacc)](http://www.dabeaz.com/ply/), a Python implementation of [lex](http://en.wikipedia.org/wiki/Lex_(software)) and [yacc](http://en.wiki pedia.org/wiki/Yacc); the constructor is lengthy but straightforward.

	56

	57 However, the back end ([code generator](http://en.wikipedia.org/wiki/Code_genera tion_(compiler))) differs from compilers that target machine code: since it is a [source-to-source compiler](http://en.wikipedia.org/wiki/Source-to-source_compi ler), the back end outputs C++ (source code, not machine code). Thus we use a [t emplate processor](http://en.wikipedia.org/wiki/Template_processor) to produce o utput; recall that _templating_ (writing formatted _output_) is complementary to _parsing_ (reading formatted _input_). The template processor is [Jinja](https: //sites.google.com/a/chromium.org/dev/developers/jinja), which is used in severa l places in Chromium. Given an interface, containing various attributes and meth ods, we fill in a template for the corresponding C++ class and each attribute an d method.

	58

	59 Further, the input language is declarative (the 'D' in IDL), so no optimizations of input code are necessary (there is no _middle end_): it's just filling in a template. The back end is itself divided into the templates themselves, and the Python code that fills in the templates (produces the _context_). There is also no separate [_semantic analysis_](http://en.wikipedia.org/wiki/Semantic_analysis _(compilers)) step, except for validation of extended attributes (see below): th e code generator assumes types are valid, and errors show up when the resulting C++ code fails to compile. This avoids the complexity and time cost of either a separate validation step, or of type-checking in the code generator, at the cost of typos showing up in compile failures instead of at IDL compile time. Notably , there is no type checking or name binding, since identifiers are assumed to re fer to Web IDL interfaces (C++ classes) and the Web IDL namespace is global, and there is no assignment, since Web IDL is declarative.

	60

	61 ## Code

	62 Code-wise, the top-level file [`idl_compiler.py`](https://code.google.com/p/chro mium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_comp iler.py) imports two modules: [`idl_reader`](https://code.google.com/p/chromium/ codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_reader.py ) for the front end, [`code_generator_v8`](https://code.google.com/p/chromium/co desearch#chromium/src/third_party/WebKit/Source/bindings/scripts/code_generator_ v8.py) for the back end. Each of these is used to create an object (`IdlReader` and `CodeGeneratorV8`), which handles the library initialization (PLY or Jinja, which are slow, and thus reused if running multiple times during one run) and gl obal information. The objects are then called one time each, for the IDL --> IR and IR --> C++ steps, respectively. By contrast, `run-bindings-tests` creates th ese objects as well, then calls them multiple times, once for each test file. No te that in the actual build, compilation is parallelized, which is why only one file is compiled per process, and there is a pre-caching step which significantl y speeds up library initialization.

	63

	64 The code is mostly functional, except for a few module-level variables in the co de generator (discussed below) for simplicity, and some objects used for initial ization.

	65

	66 The main classes are as follows, with the module in which they are defined. Note that the relations are primarily composition, with inheritance significantly us ed for the Blink-specific lexer and parser. `IdlCompiler, IdlReader,` and `CodeG eneratorV8` are expected to be used as singletons, and are just classes for init ialization (avoids expensive re-initialization, and interface-wise separates ini tialization from execution). `IdlDefinitions` contains objects of a few other cl asses (for more minor data), and there is a bit of complex OO code in [`idl_defi nitions`](https://code.google.com/p/chromium/codesearch#chromium/src/third_party /WebKit/Source/bindings/scripts/idl_definitions.py) to simplify [typedef](http:/ /heycam.github.io/webidl/#idl-typedefs) resolution.

	67

	68 ```

	69 IdlCompiler :: idl_compiler

	70 IdlReader :: idl_reader

	71 BlinkIDLParser < IDLParser

	72 BlinkIDLLexer < IDLLexer

	73 IDLExtendedAttributeValidator

	74 CodeGeneratorV8 :: code_generator_v8

	75 ```

	76 ```

	77 IdlDefinitions :: idl_definitions

	78 IdlInterface

	79 IdlAttribute

	80 IdlConstant

	81 IdlOperation

	82 IdlArgument

	83 ```

	84

	85 ## Front end

	86 The front end is structured as a lexer --> parser --> object constructor pipelin e:

	87

	88 Front end: `IDL --lexer--> tokens --parser--> AST --constructor--> IR`

	89

	90 There are two other steps:

	91 * extended attribute validation

	92 * dependency resolution

	93

	94 The top-level module for the front end is [`idl_reader`](https://code.google.com /p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/i dl_reader.py). This implements a class, `IdlReader`, whose constructor also cons tructs a lexer, parser, validator, and resolver. `IdlReader` implements a method to construct an IR object (`IdlDefinitions`) from an IDL filename. Thus to conv ert IDL to IR one instantiates a reader, then call its method to read an IDL fil e.

	95

	96 The lexer-parser uses [PLY (Python Lex-Yacc)](http://www.dabeaz.com/ply/). In fa ct, the lexer and parser for the Blink IDL dialect of Web IDL derive from a base lexer and base parser for standard Web IDL (in [tools/idl_parser](https://code. google.com/p/chromium/codesearch#chromium/src/tools/idl_parser)), and thus only need to include deviations from the standard. The lexical syntax of Blink IDL is standard (though the base lexer is slightly non-standard), so the Blink IDL lex er is very small and slated for removal (Bug [345137](https://code.google.com/p/ chromium/issues/detail?id=345137)). The phrase syntax is slightly non-standard ( primarily in extended attributes) and expected to stay this way, as extended att ributes are implementation-dependent and the deviations are useful (see [Blink I DL: syntax](http://www.chromium.org/blink/webidl#TOC-Syntax)). We thus say that the base lexer/parser + Blink lexer/parser form 1 lexer and 1.1 parser (base + d erived).

	97

	98 The base lexer class, defined in [`idl_lexer`](https://code.google.com/p/chromiu m/codesearch#chromium/src/tools/idl_parser/idl_lexer.py), is straightforward and short: it is just a list of regular expressions and keywords, wrapped in a clas s, `IDLLexer`; a lexer (object) itself is an instance of this class. There is so me minor complexity with error handling (correct line count) and methods to assi st with derived classes, but it's quite simple. The derived lexer class is very simple: it's a class, `BlinkIDLLexer`, which derives from the base class. The on ly complexity is adding a method to remove a token class from the base class, an d to remove comments (base lexer is non-standard here: standard lexer does not p roduce `COMMENT` tokens).

	99

	100 The base parser class, defined in [`idl_parser`](https://code.google.com/p/chrom ium/codesearch#chromium/src/tools/idl_parser/idl_parser.py), is considerably lon ger, but consists almost entirely of production rules in (a form of) [BNF](http: //en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form), together with yacc _actions_ that build an [abstract syntax tree](http://en.wikipedia.org/wiki/Abstract_synta x_tree) (AST, syntax tree). Recall that yacc _traverses_ the [concrete syntax tr ee](http://en.wikipedia.org/wiki/Concrete_syntax_tree) (CST, parse tree) and can take whatever actions it chooses; in this case it generates an AST, though it c ould also generate the IR directly. See [`blink_idl_parser`](https://code.google .com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scrip ts/blink_idl_parser.py) for a detailed explanation of the PLY yacc syntax and `I DLParser` methods used to construct the AST. The grammar is directly from the [W eb IDL grammar](http://heycam.github.io/webidl/#idl-grammar), which is an [LL(1) grammar](http://en.wikipedia.org/wiki/LL_grammar); the Blink IDL rules are exac tly the deviations from the standard grammar, or overrides to irregularities in the base parser. A parser object is an instance of a class, `IDLParser`, from wh ich `BlinkIDLParser` is derived. The parser constructor takes a lexer as an argu ment, and one passes the corresponding lexer.

	101

	102 The definitions classes, defined in [idl_definitions](https://code.google.com/p/ chromium/codesearch#chromium/src/third_party/WebKit/Source/bindings/scripts/idl_ definitions.py), primarily consist of constructors, which take the AST and gener ate an intermediate representation (IR), in this case an object of type `IdlDefi nitions` and contained objects for individual definitions and definition's membe rs.

	103

	104 The classes are as follows (mostly composition, one case of inheritance); a few internal-use only classes are not shown:

	105

	106 ```

	107 IdlDefinitions

	108 IdlInterface

	109 IdlAttribute

	110 IdlConstant

	111 IdlOperation

	112 IdlArgument

	113 IdlException < IdlInterface

	114 IdlCallbackFunction

	115 IdlEnum

	116 ```

	117

	118 After reading an IDL file (producing an IR), the reader has two optional additio nal steps (these are run during the build, but aren't necessary for simpler use, such as compiling a test file without dependencies): validation and dependency resolution.

	119

	120 ### Extended attribute validation

	121 The front end largely does not do [semantic analysis](http://en.wikipedia.org/wi ki/Semantic_analysis_%28compilers%29), as semantic errors (primarily name errors ) are largely caught by the build visibly failing, either during the IDL compile or during the C++ compile (name lookup errors). Notably, there is no symbol tab le or name binding in the front end: each name is simply looked up on use (e.g., type names), or passed through to the output (e.g., attribute names), and catch errors by the lookup failing or the generated code failing to compile, respecti vely.

	122

	123 However, extended attributes are validated, both the keys and values, based on t he list in [`IDLExtendedAttributes.txt`](https://code.google.com/p/chromium/code search#chromium/src/third_party/WebKit/Source/bindings/IDLExtendedAttributes.txt ). This is done because invalid extended attributes are _ignored_ by the compile r, specifically the code generator, and thus errors are easy to miss. The Python code checks for specific extended attributes, so errors just result in the attr ibute not being found; for example, writing `[EnforecRange]` for `[EnforceRange] ` would otherwise silently result in the range not being enforced. This is not p erfect: extended attributes may be valid but used incorrectly (e.g., specified o n an attribute when they only apply to methods), which is not caught; this is ho wever a minor problem.

	124

	125 This is done in the front end since that is the proper place for semantic analys is, and simplifies the code generator.

	126

	127 ### Dependency resolution

	128 IDL interface definitions have two types of dependencies:

	129 * An IDL interface definition can be spread across multiple files, via partial interfaces and (in effect) by `implements` (interface inheritance). We call the se _dependency interface definitions, dependency interfaces,_ or just _dependenc ies_.

	130 * An IDL interface may use other interfaces as interface types: e.g., `interfa ce Foo { attribute Bar bar; };` We call these _referenced interfaces_.

	131

	132 The dependency resolution phase consists of merging the dependencies into the ma in interface definition, and adding referenced interfaces to the list of all int erfaces (available to the code generator). These dependencies are computed durin g the _Global information_ stage; see below.

	133

	134 Members defined in partial interfaces and in implemented interfaces are simply a ppended to the list of members of the main interface, with a few caveats (below) . Formally the difference between partial interfaces and implemented interfaces is that a partial interface is (external) type extension (it is part of and modi fies the type, but is just defined separately), and a partial interface is assoc iated with a _single_ main interface that it extends; while implemented interfac es are _interface inheritance_ (providing multiple inheritance in IDL) and _seve ral_ interfaces can implement a given (implemented) interface. This difference i s only for IDL: in JavaScript these are both just exposed as properties on the o bject implementing the interface; and thus we mostly don't distinguish them in t he code generator either, just merging them into one list.

	135

	136 The subtleties are:

	137 * Members defined in partial interfaces are treated differently by the code ge nerator: they are assumed to be implemented as static members in separate classe s, not as methods on the object itself. This is because partial interfaces are u sually for optional or specialized parts of a basic interface, which might only be used by subtypes, and thus we do not want to clutter the main Blink C++ class with these.

	138 * Some extended attributes are transferred from the dependency interface defin ition to its members, because otherwise there is no way to apply the attributes to "just the members in this definition", due to merging. These are extended att ributes for turning off a dependency interface, namely: `[Conditional], [PerCont extEnabled],` and `[RuntimeEnabled].`

	139 * Interface-level `[TypeChecking]` extended attributes are also recognized on partial interfaces as a convenience. Its value is also transferred to its member s, appending it to any `[TypeChecking]` values that the member already declares.

	140

	141 Also, note that the compiler currently does not do significant type introspectio n of referenced interfaces: it mostly just uses certain global information about the interface (is it a callback interface?, `[ImplementedAs],` etc.). The only significant use of type introspection is in `[PutForwards],` where generating th e code for an attribute with `[PutForwards]` requires looking up the referenced attribute in another interface. Type introspection may increase in future, to si mplify complex cases and reduce hard-coding, but for now is in very limited use. </div>

	142

	143 ## Back end (code generator)

	144 The back end (code generator, CG) is where the bulk of the complexity is. The st ructure is simple: the Python V8 CG modules (`v8_`) generate a dictionary calle d the context, which is passed to Jinja, which uses this dict to fill in the appropriate templates ([bindings/templates](https://code.google.com/p/chromium/ codesearch#chromium/src/third_party/WebKit/Source/bindings/templates/)/.{cpp,h} ). The modules correspond to the class structure in IR, with the following excep tions:

	145 * the CG uses “method” where the front end uses “operation” (the term in the s pec), since “method” is a more standard term;

	146 * constants are included in `v8_interface`, because very simple; and

	147 * arguments are included in `v8_methods`, because they are closely related to methods.

	148

	149 Recall the overall flow of the CG:

	150 ```

	151 CodeGeneratorV8: IdlDefinitions --(extract member)--> IdlInterface --v8_*.py --> dict

	152 --jinja2--> header_text, cpp_text

	153 ```

	154

	155 The class structure of the IR, and corresponding Python module that processes it (in _italics_ if included in a previously mentioned module), are:

	156 ```

	157 IdlInterface :: v8_interface

	158 IdlAttribute :: v8_attributes

	159 IdlConstant :: _v8_interface_

	160 IdlOperation :: v8_methods

	161 IdlArgument :: _v8_methods_

	162 ```

	163 The CG modules each have a top-level `_context` function_,_ which takes an IR o bject (`Idl`) as a parameter and returns a context (`dict`), looping through co ntained IR objects as necessary.

	164

	165 Note that the context is nested – the context for an interfaces contains a list of contexts for the members: attributes, constants, and methods. The context for a given member is used throughout the corresponding template, generally to gene rate several methods; e.g., in `attributes.cpp` there are macros for getter, get ter callback, setter, and setter callback, which are called from a loop in `inte rface_base.cpp`. The context for members is _also_ used at the overall interface level, notably for DOM configuration.

	166

	167 ## Style

	168 [`v8_attributes.py`](https://code.google.com/p/chromium/codesearch#chromium/src/ third_party/WebKit/Source/bindings/scripts/v8_attributes.py) (Python) and [`attr ibutes.cpp`](https://code.google.com/p/chromium/codesearch#chromium/src/third_pa rty/WebKit/Source/bindings/templates/attributes.cpp) (Jinja) are a good guide to style: the getter and setter methods themselves are quite complex, while the ca llbacks are simple.

	169

	170 See [Jinja: Style](https://sites.google.com/a/chromium.org/dev/developers/jinja# TOC-Style) for general Jinja style tips; below is bindings generator specific gu idance.

	171

	172 ### Jinja

	173 If nesting macros (so macros that call macros), use top-down order: top-level ma cro, followed by macros it calls. This is most notable in [`methods.cpp`](https: //code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/b indings/templates/methods.cpp), where generating a method also requires generati ng code for the arguments, which comes after.

	174

	175 ### Python

	176 Assertions should _not_ be used for validating input, and this includes IDL file s: if an IDL file is invalid (e.g., it uses an extended attribute improperly), r aise an exception explicitly. Assertions can be used for validating internal log ic, but in practice this is rare in the code generator; see [Using Assertions Ef fectively](https://wiki.python.org/moin/UsingAssertionsEffectively).

	177

	178 Deviations from the "One dictionary display per context" rule:

	179 * If the presence of an extended attribute has side effects (namely includes), this code comes before the display.

	180 * An early return after the main display, and then additional processing after wards, occurs for attributes, where certain attributes (constructors, custom att ributes) have limited processing, and the side effects of running the additional context-generation code (extra includes) are not desired.

	181 * Similarly, the getter and setter logic for attributes is relative complicate d, and depends on previously computed variables, and thus this part of the conte xt is produced in other functions, then added to the main dictionary via `dict.u pdate()`.

	182

	183 No extended attributes in templates: A key point for the CG is that the raw extended attributes are _not_ passed to the Jinja templates, and literal extende d attribute names accordingly do not appear in templates. Instead, extended attr ibutes are all passed via some variable in the context, either a content variabl e or a (boolean) control variable (e.g., `deprecate_as` for the `[DeprecateAs]` extended attribute and `is_reflect` for the `[Reflect]` extended attribute (this is only boolean since the value is used elsewhere, in the function for the gett er/setter name)). This is primarily for consistency: in some cases variables _mu st_ be used, notably if the extended attribute requires a side effect (includes) , and thus putting all literal references to extended attributes in the Python l ogic keeps them together.

	184

	185 Includes: The CG code is primarily functional, with one significant exceptio n: _includes_ (for the `.cpp` file; the `.h` includes are simple). Includes are added to the `.cpp` file throughout the CG (as needed by various features), and account for a significant amount of the code bulk. This is handled by considerin g includes as a [side effect](http://en.wikipedia.org/wiki/Side_effect_(computer _science)), and storing them in a global variable (or rather a module level vari able, as Python modules are [singletons](http://en.wikipedia.org/wiki/Singleton_ pattern)), specifically in `v8_global`. Alternatives include: being purely funct ional, and returning the includes from the functions, which makes the calls and returns bulky (due to all the passing backwards and forwards); or having all CG functions be methods of some object (`CodeGeneratorV8`), which adds a large amou nt of OO boilerplate. The only complexity of using module variables is that one needs to import it into each module, and must clear the includes if compiling mu ltiple files in a single Python run (and can't compile multiple files concurrent ly in a single Python processes). This latter is ok because compilation is paral lelized by distributing across separate processes, not by compiling multiple fil es in a single process.

	186

	187 Note that the context being computed is _not_ a module-wide variable: for attrib utes, modules, and arguments, a context is computed for each of these (functiona lly), though for an interface only a single context is computed. In rare cases t he context is also passed into functions as an argument (when later values depen d on earlier computed values), which is primarily to flatten the module: it's cl earer to pass explicitly, instead of having a nested function.

	188

	189 There are a few other module-level variables for the IDL global context, which a re write-once or update-only: `v8_globals.interfaces`, for referenced interfaces , and a few in `v8_types`, for global and local type information (e.g., `[Implem entedAs]`, enumerations). These are set at `CodeGeneratorV8` construction (from `interfaces_info`), which is write-once, or `CodeGeneratorV8.generate_code` invo cation (from `definitions`), which is update-only (can add some new enumerations , for instance, but shouldn't need or refer to any from other files).

	190

	191 Template rendering: the context is then passed into a Jinja template, via th e [`Template.render`](http://jinja.pocoo.org/docs/api/#jinja2.Template.render) m ethod, which returns the generated code.

	192

	193 ## Global information

	194 Global information is computed by a pre-processing step, dependency resolution a nd merging is done by the front end, and global information is used by the code generator in limited places.

	195

	196 `FIXME`

	197

	198 ## Dependencies

	199 `FIXME`

	200

	201 ## Goals

	202 Key goals of the compiler are as follows (in descending order of importance); th ese primarily affect the code generator:

	203 * Correctness (including security) and performance of generated ob ject code (speed, memory usage, size), after compiling C++ to object code

	204 * Simplicity of compiler code itself, particularly code generator

	205 * Build performance (see [IDL build: performance](http://www.chromium.org/ developers/design-documents/idl-build#TOC-Performance) and [Performance](http:// www.chromium.org/developers/design-documents/idl-compiler#TOC-Performance), belo w)

	206

	207 Note that legibility of generated C++ source code is secondary to legibility of the code generator, as the CG is what is developed and reviewed, though the generated code should be legible if this does not conflict with other goals. As a rule, simple (one-line) changes to CG that simplify the generated source code are ok, but more complex changes that are only for source code legibility (that have no effect on object code) should be avoided. If you rely on the compiler (e .g., for dead code elimination), please write a comment to that effect.

	208

	209 Since the CG is complex enough as it is, a key part of the design is moving comp lexity _out_ of the code generator, notably to the front end (building an easy-t o-use IR) and to the global information step.

	210

	211 ## Alternatives

	212 Fundamentally, why do we have our own compiler at all? Can't we reuse an existin g program?

	213

	214 The answer is: we're reusing as much code as possible (lexer/parser and template library, and indeed a lexer/parser for the Web IDL standard), but we need custo m code for 3 purposes:

	215 * Blink IDL dialect, which differs from Web IDL standard: we inherit from a st andard parser, so this is small, but we need our own.

	216 * IR constructor: the AST output from the base parser is hard to use directly (you need to walk a tree), so we construct an easy-to-use object so the CG can h ave simple reading code (instead of walking the tree itself).

	217 * V8 code generator: we need to generate very specific C++ code to bind to V8 and implement Web IDL, with many options and special cases.

	218

	219 Of these, the V8 CG is the main reason; Blink IDL dialect and the IR constructor exist to reduce complexity of the CG.

	220

	221 ### Why not C++?

	222 The compiler is written in Python, rather than C++, because hackability (ease of understanding and modification) is more important than build speed, and build s peed is good enough.

	223

	224 A high-level language like Python is the usual choice for such tasks, and Python is the standard high-level language in Chromium: other languages (except JavaSc ript) are not widely used in Chromium, and JavaScript offers no advantages over Python.

	225

	226 Also, C++ code that generates C++ code risks being very confusing. In principle one could imagine writing the CG in template metaprogramming, but that would lik ely be hard to understand.

	227

	228 In principle C/C++ code may be used in the front end (lexer, parser, and constru ctor), particularly as the lexer and parser can be generated by lex-yacc, but th e code generator is expected to be in Python.

	229

	230 ### Why not SWIG?

	231 We can't use SWIG because there's in fact very little overlap. Essentially we us e Jinja templates+Python logic for the backend, rather than SWIG templates.

	232

	233 While SWIG is a general "high-level language to C++ bindings generator", and thu s seems a natural fit, it doesn't fill our needs. We need a front end because we 're taking IDL files as input, and we need a code generator that supports all th e specific features that Web IDL and Blink require: Web IDL has different semant ics from JavaScript generally, and Blink has many IDL extended attributes and ot her implementation details. Further, there is a prototype V8 backend for SWIG, b ut it is incomplete (the JavaScript backend primarily targets JavaScriptCore). I n principle we could have a compiler whose back end output SWIG templates, compl ete the general V8 backend for SWIG, and then modify this or fork it to make a W eb IDL to V8 backend, customized for Blink. As this description makes clear, tha t involves significant complexity and few advantages over generating the C++ cod e directly, using Jinja instead of SWIG.

	234

	235 ### Why not another lexer/parser or template processor?

	236 PLY (built on lex-yacc) and Jinja are standard and widely-used libraries for the se tasks, and are used elsewhere in Chromium. We're open to other libraries if t hey are significantly superior (and we switch throughout Chromium), but these li braries are fit for purpose.

	237

	238 ## Performance

	239 _For overall IDL build performance, see: [IDL build: Performance](http://www.chr omium.org/developers/design-documents/idl-build#TOC-Performance)_

	240

	241 The bindings are generated by running a separate compiler process for each inter face (main IDL file), for parallelization and simplicity of build. This section discusses optimization of individual IDL compiler runs.

	242

	243 For compilation of an individual IDL file, the key factor in startup time, both Python startup time, and initialization time of libraries. This is because most IDL files are quite short and simple, so processing time (variable cost) is shor t relative to startup time (fixed cost). Currently (March 2014) the average sequ ential compilation time of a single IDL file on a fast Linux workstation is ~80 milliseconds, compared to Python startup time of ~4 milliseconds (skipping impor t of `site` with `-S`).

	244

	245 Naively, the key limiting factor is initialization time of the libraries (PLY an d Jinja). This is expensive, as it requires compiling the lexer & parser or temp lates. Initialization is sped up by pre-caching these in a separate build step, which significantly improves build time. Note that `run-bindings-tests` will com pile multiple test files in a single Python process_,_ which is quite fast, sinc e initialization is done only once.

	246

	247 The compile has not been profiled (except by using [time(1)](http://en.wikipedia .org/wiki/Time_(Unix)) on a build), since the low-hanging fruit were obvious (ca che lexer and parser tables and templates), performance is now good enough, and significant additional improvements are unlikely without major work or reenginee ring.<

	248

	249 Processing time itself is presumably dominated by the main libraries (parsing vi a PLY and template rendering via Jinja), and possibly object construction (IdlDe finitions); the actual CG is likely quite cheap, as it's a simple iteration over the IR with a few conditionals and simple computations.

	250

	251 To our knowledge, there are no O(_n_<sup>2</sup>) or slower algorithms in the co mpiler – everything should be O(_n_) or faster – but some may be lurking somewhe re.

	252

	253 ### Further improvements

	254 Potential future improvements (without substantial reengineering) would primaril y be to improve startup of some components, or rewrite some components in C.

	255

	256 * Python tweaks:

	257 * execute bytecode directly (compile to `.pyc` or `.pyo` in separate step and execute that): v. small improvement (about .5 ms/process), would require other b uild step

	258 * use Python optimized mode (`-O` or `-OO`): likely little improvement (just s hrinks size of bytecode), and currently causes serious regressions (likely due t o breaking caching in a library)

	259 * Further optimization of Jinja startup requires changes in the Jinja library itself; see discussion at references.

	260 * Replacing parts of the PLY lexer-parser with a C/C++ library (produced by le x or yacc) would speed up execution of the front end, as it would be C code, not Python code, and make it virtually instantaneous. C lex/C++ yacc code is about 50x faster than PLY, and the lexer and parser typically each account for 40% of PLY execution time; remainder is cost of running the yacc actions, in this case to build an AST ([Writing Parsers and Compilers with PLY](http://www.dabeaz.com/ ply/PLYTalk.pdf), slides 76 and 77).

	261 * A C lexer would be simple and significant, while a C++ parser would be relat ively complicated, particularly due to the use of inheritance to derive a parser for the Blink IDL dialect from a parser for the standard Web IDL language.

	262 * <div>In terms of the overall build system, initialization costs could be sig nificantly reduced by initializing a single process and then forking it, so it's just a copy. However, this is platform dependent (Windows does not have `fork() `), and would significantly complicate the compiler.</div>

	263

	264 ### References

	265 * PLY: [Lex: Optimized mode](http://www.dabeaz.com/ply/ply.html#ply_nn15) and [Using Python's Optimized Mode](http://www.dabeaz.com/ply/ply.html#ply_nn38): we use optimization for the control over caching, not because we are running Pytho n in optimized mode (we aren't)

	266 * [Jinja performance](http://www.chromium.org/developers/jinja#TOC-Performance )

	267

	268 ## References

	269 * [Source/bindings/scripts](https://code.google.com/p/chromium/codesearch#chro mium/src/third_party/WebKit/Source/bindings/scripts/): the code

	270 * [Issue <strike>239771</strike>: Rewrite the IDL compiler in Python](https:// groups.google.com/a/chromium.org/forum/#!topic/blink-dev/s1xTE428OAs): tracking bug for rewrite (2013/2014)

	271 * [IDL compiler (V8 bindings generator) switched to Python](https://groups.goo gle.com/a/chromium.org/forum/#!topic/blink-dev/s1xTE428OAs): wrap-up email for r ewrite (Feb 2014)

	272 * [followup](https://groups.google.com/a/chromium.org/forum/#!topic/blink-dev/ 5FSUhiugWsE) about front end

OLD	NEW

« no previous file with comments | « no previous file | no next file » | no next file with comments »