OLD | NEW |
| (Empty) |
1 Subzero - Fast code generator for PNaCl bitcode | |
2 =============================================== | |
3 | |
4 Design | |
5 ------ | |
6 | |
7 See the accompanying DESIGN.rst file for a more detailed technical overview of | |
8 Subzero. | |
9 | |
10 Building | |
11 -------- | |
12 | |
13 Subzero is set up to be built within the Native Client tree. Follow the | |
14 `Developing PNaCl | |
15 <https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/developing-pnacl>`_ | |
16 instructions, in particular the section on building PNaCl sources. This will | |
17 prepare the necessary external headers and libraries that Subzero needs. | |
18 Checking out the Native Client project also gets the pre-built clang and LLVM | |
19 tools in ``native_client/../third_party/llvm-build/Release+Asserts/bin`` which | |
20 are used for building Subzero. | |
21 | |
22 The Subzero source is in ``native_client/toolchain_build/src/subzero``. From | |
23 within that directory, ``git checkout master && git pull`` to get the latest | |
24 version of Subzero source code. | |
25 | |
26 The Makefile is designed to be used as part of the higher level LLVM build | |
27 system. To build manually, use the ``Makefile.standalone``. There are several | |
28 build configurations from the command line:: | |
29 | |
30 make -f Makefile.standalone | |
31 make -f Makefile.standalone DEBUG=1 | |
32 make -f Makefile.standalone NOASSERT=1 | |
33 make -f Makefile.standalone DEBUG=1 NOASSERT=1 | |
34 make -f Makefile.standalone MINIMAL=1 | |
35 make -f Makefile.standalone ASAN=1 | |
36 make -f Makefile.standalone TSAN=1 | |
37 | |
38 ``DEBUG=1`` builds without optimizations and is good when running the translator | |
39 inside a debugger. ``NOASSERT=1`` disables assertions and is the preferred | |
40 configuration for performance testing the translator. ``MINIMAL=1`` attempts to | |
41 minimize the size of the translator by compiling out everything unnecessary. | |
42 ``ASAN=1`` enables AddressSanitizer, and ``TSAN=1`` enables ThreadSanitizer. | |
43 | |
44 The result of the ``make`` command is the target ``pnacl-sz`` in the current | |
45 directory. | |
46 | |
47 ``pnacl-sz`` | |
48 ------------ | |
49 | |
50 The ``pnacl-sz`` program parses a pexe or an LLVM bitcode file and translates it | |
51 into ICE (Subzero's intermediate representation). It then invokes the ICE | |
52 translate method to lower it to target-specific machine code, optionally dumping | |
53 the intermediate representation at various stages of the translation. | |
54 | |
55 The program can be run as follows:: | |
56 | |
57 ../pnacl-sz ./path/to/<file>.pexe | |
58 ../pnacl-sz ./tests_lit/pnacl-sz_tests/<file>.ll | |
59 | |
60 At this time, ``pnacl-sz`` accepts a number of arguments, including the | |
61 following: | |
62 | |
63 ``-help`` -- Show available arguments and possible values. (Note: this | |
64 unfortunately also pulls in some LLVM-specific options that are reported but | |
65 that Subzero doesn't use.) | |
66 | |
67 ``-notranslate`` -- Suppress the ICE translation phase, which is useful if | |
68 ICE is missing some support. | |
69 | |
70 ``-target=<TARGET>`` -- Set the target architecture. The default is x8632. | |
71 Future targets include x8664, arm32, and arm64. | |
72 | |
73 ``-filetype=obj|asm|iasm`` -- Select the output file type. ``obj`` is a | |
74 native ELF file, ``asm`` is a textual assembly file, and ``iasm`` is a | |
75 low-level textual assembly file demonstrating the integrated assembler. | |
76 | |
77 ``-O<LEVEL>`` -- Set the optimization level. Valid levels are ``2``, ``1``, | |
78 ``0``, ``-1``, and ``m1``. Levels ``-1`` and ``m1`` are synonyms, and | |
79 represent the minimum optimization and worst code quality, but fastest code | |
80 generation. | |
81 | |
82 ``-verbose=<list>`` -- Set verbosity flags. This argument allows a | |
83 comma-separated list of values. The default is ``none``, and the value | |
84 ``inst,pred`` will roughly match the .ll bitcode file. Of particular use | |
85 are ``all``, ``most``, and ``none``. | |
86 | |
87 ``-o <FILE>`` -- Set the assembly output file name. Default is stdout. | |
88 | |
89 ``-log <FILE>`` -- Set the file name for diagnostic output (whose level is | |
90 controlled by ``-verbose``). Default is stdout. | |
91 | |
92 ``-timing`` -- Dump some pass timing information after translating the input | |
93 file. | |
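
As an illustration of how the arguments above combine, here is a minimal Python sketch that assembles a ``pnacl-sz`` command line. The helper name ``pnacl_sz_command`` and its parameters are hypothetical and not part of Subzero; only the flags documented above are used, and the helper builds the argument list without running anything.

```python
def pnacl_sz_command(input_path, output_path, target="x8632",
                     filetype="obj", opt_level="2", binary="../pnacl-sz"):
    """Build an argument list for pnacl-sz (hypothetical helper).

    Only flags listed in this README are emitted; values are validated
    against the choices the README documents.
    """
    valid_filetypes = {"obj", "asm", "iasm"}
    valid_levels = {"2", "1", "0", "-1", "m1"}
    if filetype not in valid_filetypes:
        raise ValueError("unknown filetype: %s" % filetype)
    if opt_level not in valid_levels:
        raise ValueError("unknown optimization level: %s" % opt_level)
    return [
        binary,
        "-target=" + target,
        "-filetype=" + filetype,
        "-O" + opt_level,
        "-o", output_path,
        input_path,
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(pnacl_sz_command("test.pexe", "test.o")))
```

The resulting list could be handed to ``subprocess.run`` once ``pnacl-sz`` has been built.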
94 | |
95 Running the test suite | |
96 ---------------------- | |
97 | |
98 Subzero uses the LLVM ``lit`` testing tool for part of its test suite, which | |
99 lives in ``tests_lit``. To execute the test suite, first build Subzero, and then | |
100 run:: | |
101 | |
102 make -f Makefile.standalone check-lit | |
103 | |
104 There is also a suite of cross tests in the ``crosstest`` directory. A cross | |
105 test takes a test bitcode file implementing some unit tests, and translates it | |
106 twice, once with Subzero and once with LLVM's known-good ``llc`` translator. | |
107 The Subzero-translated symbols are specially mangled to avoid multiple | |
108 definition errors from the linker. Both translated versions are linked together | |
109 with a driver program that calls each version of each unit test with a variety | |
110 of interesting inputs and compares the results for equality. The cross tests | |
111 are currently invoked by running:: | |
112 | |
113 make -f Makefile.standalone check-xtest | |
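
The comparison at the heart of a cross test can be sketched in a few lines of Python. This is only a conceptual model, not the ``crosstest`` driver itself: the two "translations" are ordinary Python functions, and the names ``cross_test``, ``add32_reference``, and ``add32_buggy`` are made up for the example.

```python
import itertools


def cross_test(reference, candidate, interesting_values, arity):
    """Call both versions on every input combination; return mismatches."""
    mismatches = []
    for args in itertools.product(interesting_values, repeat=arity):
        expected = reference(*args)
        actual = candidate(*args)
        if expected != actual:
            mismatches.append((args, expected, actual))
    return mismatches


def add32_reference(a, b):
    # Stand-in for the known-good translation: 32-bit wrapping add.
    return (a + b) & 0xFFFFFFFF


def add32_buggy(a, b):
    # Stand-in for a faulty translation that forgets to wrap.
    return a + b


if __name__ == "__main__":
    values = [0, 1, 0x7FFFFFFF, 0xFFFFFFFF]
    for args, expected, actual in cross_test(
            add32_reference, add32_buggy, values, arity=2):
        print("mismatch", args, expected, actual)
```

Boundary values such as ``0x7FFFFFFF`` and ``0xFFFFFFFF`` play the role of the "interesting inputs" mentioned above.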
114 | |
115 Similarly, there is a suite of unit tests:: | |
116 | |
117 make -f Makefile.standalone check-unit | |
118 | |
119 A convenient way to run the lit, cross, and unit tests is:: | |
120 | |
121 make -f Makefile.standalone check | |
122 | |
123 Assembling ``pnacl-sz`` output as needed | |
124 ---------------------------------------- | |
125 | |
126 ``pnacl-sz`` can now produce a native ELF binary using ``-filetype=obj``. | |
127 | |
128 ``pnacl-sz`` can also produce textual assembly code in a structure suitable for | |
129 input to ``llvm-mc``, using ``-filetype=asm`` or ``-filetype=iasm``. An object | |
130 file can then be produced using the command:: | |
131 | |
132 llvm-mc -triple=i686 -filetype=obj -o=MyObj.o | |
133 | |
134 Building a translated binary | |
135 ---------------------------- | |
136 | |
137 There is a helper script, ``pydir/szbuild.py``, that translates a finalized pexe | |
138 into a fully linked executable. Run it with ``-help`` for extensive | |
139 documentation. | |
140 | |
141 By default, ``szbuild.py`` builds an executable using only Subzero translation, | |
142 but it can also be used to produce hybrid Subzero/``llc`` binaries (``llc`` is | |
143 the name of the LLVM translator) for bisection-based debugging. In bisection | |
144 debugging mode, the pexe is translated using both Subzero and ``llc``, and the | |
145 resulting object files are combined into a single executable using symbol | |
146 weakening and other linker tricks to control which Subzero symbols and which | |
147 ``llc`` symbols take precedence. This is controlled by the ``-include`` and | |
148 ``-exclude`` arguments. These can be used to rapidly find a single function | |
149 that Subzero translates incorrectly, leading to incorrect output. | |
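
The bisection idea can be sketched as follows, assuming exactly one function is mistranslated. The predicate ``runs_correctly`` is hypothetical: it stands in for building a hybrid binary in which only the given functions come from Subzero (the rest from ``llc``) and checking the program's output. How that maps onto concrete ``-include``/``-exclude`` values is not shown here; consult the script's own help for that.

```python
def find_miscompiled_function(functions, runs_correctly):
    """Bisect to the single function whose Subzero translation is wrong.

    functions: non-empty list of function names.
    runs_correctly(subset): hypothetical predicate; True if the program
        behaves correctly when only `subset` is translated by Subzero.
    Assumes exactly one function in `functions` is mistranslated.
    """
    candidates = list(functions)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        if runs_correctly(first_half):
            # The culprit is not in the first half; keep the second half.
            candidates = candidates[mid:]
        else:
            candidates = first_half
    return candidates[0]


if __name__ == "__main__":
    names = ["f%d" % i for i in range(10)]
    culprit = "f6"  # pretend this one is mistranslated
    print(find_miscompiled_function(names, lambda s: culprit not in s))
```

Each step halves the candidate set, so the number of rebuilds grows only logarithmically with the number of functions.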
150 | |
151 There is another helper script, ``pydir/szbuild_spec2k.py``, that runs | |
152 ``szbuild.py`` on one or more components of the Spec2K suite. This assumes that | |
153 Spec2K is set up in the usual place in the Native Client tree, and the finalized | |
154 pexe files have been built. (Note: for working with Spec2K and other pexes, | |
155 it's helpful to finalize the pexe using ``--no-strip-syms``, to preserve the | |
156 original function and global variable names.) | |
157 | |
158 Status | |
159 ------ | |
160 | |
161 Subzero currently fully supports the x86-32 architecture, for both native and | |
162 Native Client sandboxing modes. The x86-64 architecture is also supported in | |
163 native mode only, and only for the x32 flavor, because pointers and | |
164 32-bit integers are indistinguishable in PNaCl bitcode. Sandboxing support for | |
165 x86-64 is in progress. ARM and MIPS support is in progress. Two optimization | |
166 levels, ``-Om1`` and ``-O2``, are implemented. | |
167 | |
168 The ``-Om1`` configuration is designed to be the simplest and fastest possible, | |
169 with a minimal set of passes and transformations. | |
170 | |
171 * Simple Phi lowering before target lowering, by generating temporaries and | |
172 adding assignments to the end of predecessor blocks. | |
173 | |
174 * Simple register allocation limited to pre-colored or infinite-weight | |
175 Variables. | |
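
The simple Phi lowering described above can be modeled with a toy IR. This is a rough sketch rather than Subzero's code: blocks are plain dictionaries, instructions are ``(dest, src)`` assignments, terminators are not modeled, and the temporary naming scheme is invented for the example.

```python
def lower_phis(blocks):
    """Replace phis with a temporary assigned at the end of each predecessor.

    blocks: dict mapping block name to
        {"phis": [(dest, {pred_name: src})], "body": [(dest, src)]}
    """
    counter = 0
    for block in blocks.values():
        copies = []
        for dest, incoming in block["phis"]:
            temp = "%s_phi%d" % (dest, counter)  # hypothetical naming
            counter += 1
            for pred, src in incoming.items():
                # Assignment appended to the end of the predecessor block.
                blocks[pred]["body"].append((temp, src))
            # The phi itself becomes a copy from the temporary.
            copies.append((dest, temp))
        block["body"] = copies + block["body"]
        block["phis"] = []
    return blocks


if __name__ == "__main__":
    cfg = {
        "entry": {"phis": [], "body": [("a", "1")]},
        "other": {"phis": [], "body": [("b", "2")]},
        "merge": {"phis": [("x", {"entry": "a", "other": "b"})],
                  "body": [("y", "x")]},
    }
    for name, block in lower_phis(cfg).items():
        print(name, block["body"])
```

Routing every phi through its own temporary keeps the lowering trivially correct, at the cost of extra copies.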
176 | |
177 The ``-O2`` configuration is designed to use all optimizations available and | |
178 produce the best code. | |
179 | |
180 * Address mode inference to leverage the complex x86 addressing modes. | |
181 | |
182 * Compare/branch fusing based on liveness/last-use analysis. | |
183 | |
184 * Global, linear-scan register allocation. | |
185 | |
186 * Advanced phi lowering after target lowering and global register allocation, | |
187 via edge splitting, topological sorting of the parallel moves, and final local | |
188 register allocation. | |
189 | |
190 * Stack slot coalescing to reduce frame size. | |
191 | |
192 * Branch optimization to reduce the number of branches to the following block. | |
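
To illustrate the "topological sorting of the parallel moves" step in the advanced phi lowering item, here is a small Python sketch of a standard way to sequentialize parallel moves. It is not taken from Subzero: locations are plain strings, and ``tmp`` is an assumed scratch location that no move otherwise uses.

```python
def sequence_parallel_moves(moves, temp="tmp"):
    """Order moves that must appear to happen simultaneously.

    moves: dict mapping destination -> source.
    Returns a list of (dest, src) assignments to execute in order.
    """
    pending = {d: s for d, s in moves.items() if d != s}
    ordered = []
    while pending:
        # A move is ready when no other pending move still reads its dest.
        sources = set(pending.values())
        ready = [d for d in pending if d not in sources]
        if ready:
            for d in ready:
                ordered.append((d, pending.pop(d)))
        else:
            # Only cycles remain: save one destination in the scratch
            # location and redirect its readers there.
            d = next(iter(pending))
            ordered.append((temp, d))
            for k in pending:
                if pending[k] == d:
                    pending[k] = temp
    return ordered


if __name__ == "__main__":
    # Swap a and b while also copying the old a into c.
    for dest, src in sequence_parallel_moves({"a": "b", "b": "a", "c": "a"}):
        print("%s = %s" % (dest, src))
```

Moves without dependencies are emitted first; a scratch location is needed only when a cycle such as a swap remains.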