| OLD | NEW |
|---|---|
| 1 Subzero - Fast code generator for PNaCl bitcode | 1 Subzero - Fast code generator for PNaCl bitcode |
| 2 =============================================== | 2 =============================================== |
| 3 | 3 |
| | 4 Design |
| | 5 ------ |
| | 6 |
| | 7 See the accompanying DESIGN.rst file for a more detailed technical overview of |
| | 8 Subzero. |
| | 9 |
| 4 Building | 10 Building |
| 5 -------- | 11 -------- |
| 6 | 12 |
| 7 Subzero is set up to be built within the Native Client tree. Follow the | 13 Subzero is set up to be built within the Native Client tree. Follow the |
| 8 `Developing PNaCl | 14 `Developing PNaCl |
| 9 <https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/developing-pnacl>`_ | 15 <https://sites.google.com/a/chromium.org/dev/nativeclient/pnacl/developing-pnacl>`_ |
| 10 instructions, in particular the section on building PNaCl sources. This will | 16 instructions, in particular the section on building PNaCl sources. This will |
| 11 prepare the necessary external headers and libraries that Subzero needs. | 17 prepare the necessary external headers and libraries that Subzero needs. |
| 12 Checking out the Native Client project also gets the pre-built clang and LLVM | 18 Checking out the Native Client project also gets the pre-built clang and LLVM |
| 13 tools in ``native_client/../third_party/llvm-build/Release+Asserts/bin`` which | 19 tools in ``native_client/../third_party/llvm-build/Release+Asserts/bin`` which |
| 14 are used for building Subzero. | 20 are used for building Subzero. |
| 15 | 21 |
| 16 The Subzero source is in ``native_client/toolchain_build/src/subzero``. From | 22 The Subzero source is in ``native_client/toolchain_build/src/subzero``. From |
| 17 within that directory, ``git checkout master && git pull`` to get the latest | 23 within that directory, ``git checkout master && git pull`` to get the latest |
| 18 version of Subzero source code. | 24 version of Subzero source code. |
| 19 | 25 |
| 20 The Makefile is designed to be used as part of the higher level LLVM build | 26 The Makefile is designed to be used as part of the higher level LLVM build |
| 21 system. To build manually, use the ``Makefile.standalone``. There are several | 27 system. To build manually, use the ``Makefile.standalone``. There are several |
| 22 build configurations from the command line:: | 28 build configurations from the command line:: |
| 23 | 29 |
| 24 make -f Makefile.standalone | 30 make -f Makefile.standalone |
| 25 make -f Makefile.standalone DEBUG=1 | 31 make -f Makefile.standalone DEBUG=1 |
| 26 make -f Makefile.standalone NOASSERT=1 | 32 make -f Makefile.standalone NOASSERT=1 |
| 27 make -f Makefile.standalone DEBUG=1 NOASSERT=1 | 33 make -f Makefile.standalone DEBUG=1 NOASSERT=1 |
| 28 make -f Makefile.standalone MINIMAL=1 | 34 make -f Makefile.standalone MINIMAL=1 |
| | 35 make -f Makefile.standalone ASAN=1 |
| | 36 make -f Makefile.standalone TSAN=1 |
| 29 | 37 |
| 30 ``DEBUG=1`` builds without optimizations and is good when running the translator | 38 ``DEBUG=1`` builds without optimizations and is good when running the translator |
| 31 inside a debugger. ``NOASSERT=1`` disables assertions and is the preferred | 39 inside a debugger. ``NOASSERT=1`` disables assertions and is the preferred |
| 32 configuration for performance testing the translator. ``MINIMAL=1`` attempts to | 40 configuration for performance testing the translator. ``MINIMAL=1`` attempts to |
| 33 minimize the size of the translator by compiling out everything unnecessary. | 41 minimize the size of the translator by compiling out everything unnecessary. |
| | 42 ``ASAN=1`` enables AddressSanitizer, and ``TSAN=1`` enables ThreadSanitizer. |
| 34 | 43 |
| 35 The result of the ``make`` command is the target ``pnacl-sz`` in the current | 44 The result of the ``make`` command is the target ``pnacl-sz`` in the current |
| 36 directory. | 45 directory. |
| 37 | 46 |
| 38 ``pnacl-sz`` | 47 ``pnacl-sz`` |
| 39 ------------ | 48 ------------ |
| 40 | 49 |
| 41 The ``pnacl-sz`` program parses a pexe or an LLVM bitcode file and translates it | 50 The ``pnacl-sz`` program parses a pexe or an LLVM bitcode file and translates it |
| 42 into ICE (Subzero's intermediate representation). It then invokes the ICE | 51 into ICE (Subzero's intermediate representation). It then invokes the ICE |
| 43 translate method to lower it to target-specific machine code, optionally dumping | 52 translate method to lower it to target-specific machine code, optionally dumping |
| (...skipping 22 matching lines...) | |
| 66 low-level textual assembly file demonstrating the integrated assembler. | 75 low-level textual assembly file demonstrating the integrated assembler. |
| 67 | 76 |
| 68 ``-O<LEVEL>`` -- Set the optimization level. Valid levels are ``2``, ``1``, | 77 ``-O<LEVEL>`` -- Set the optimization level. Valid levels are ``2``, ``1``, |
| 69 ``0``, ``-1``, and ``m1``. Levels ``-1`` and ``m1`` are synonyms, and | 78 ``0``, ``-1``, and ``m1``. Levels ``-1`` and ``m1`` are synonyms, and |
| 70 represent the minimum optimization and worst code quality, but fastest code | 79 represent the minimum optimization and worst code quality, but fastest code |
| 71 generation. | 80 generation. |
| 72 | 81 |
| 73 ``-verbose=<list>`` -- Set verbosity flags. This argument allows a | 82 ``-verbose=<list>`` -- Set verbosity flags. This argument allows a |
| 74 comma-separated list of values. The default is ``none``, and the value | 83 comma-separated list of values. The default is ``none``, and the value |
| 75 ``inst,pred`` will roughly match the .ll bitcode file. Of particular use | 84 ``inst,pred`` will roughly match the .ll bitcode file. Of particular use |
| 76 are ``all`` and ``none``. | 85 are ``all``, ``most``, and ``none``. |
| 77 | 86 |
| 78 ``-o <FILE>`` -- Set the assembly output file name. Default is stdout. | 87 ``-o <FILE>`` -- Set the assembly output file name. Default is stdout. |
| 79 | 88 |
| 80 ``-log <FILE>`` -- Set the file name for diagnostic output (whose level is | 89 ``-log <FILE>`` -- Set the file name for diagnostic output (whose level is |
| 81 controlled by ``-verbose``). Default is stdout. | 90 controlled by ``-verbose``). Default is stdout. |
| 82 | 91 |
| 83 ``-timing`` -- Dump some pass timing information after translating the input | 92 ``-timing`` -- Dump some pass timing information after translating the input |
| 84 file. | 93 file. |
| 85 | 94 |
| 86 Running the test suite | 95 Running the test suite |
| 87 ---------------------- | 96 ---------------------- |
| 88 | 97 |
| 89 Subzero uses the LLVM ``lit`` testing tool for part of its test suite, which | 98 Subzero uses the LLVM ``lit`` testing tool for part of its test suite, which |
| 90 lives in ``tests_lit``. To execute the test suite, first build Subzero, and then | 99 lives in ``tests_lit``. To execute the test suite, first build Subzero, and then |
| 91 run:: | 100 run:: |
| 92 | 101 |
| 93 make -f Makefile.standalone check-lit | 102 make -f Makefile.standalone check-lit |
| 94 | 103 |
| 95 There is also a suite of cross tests in the ``crosstest`` directory. A cross | 104 There is also a suite of cross tests in the ``crosstest`` directory. A cross |
| 96 test takes a test bitcode file implementing some unit tests, and translates it | 105 test takes a test bitcode file implementing some unit tests, and translates it |
| 97 twice, once with Subzero and once with LLVM's known-good ``llc`` translator. | 106 twice, once with Subzero and once with LLVM's known-good ``llc`` translator. |
| 98 The Subzero-translated symbols are specially mangled to avoid multiple | 107 The Subzero-translated symbols are specially mangled to avoid multiple |
| 99 definition errors from the linker. Both translated versions are linked together | 108 definition errors from the linker. Both translated versions are linked together |
| 100 with a driver program that calls each version of each unit test with a variety | 109 with a driver program that calls each version of each unit test with a variety |
| 101 of interesting inputs and compares the results for equality. The cross tests | 110 of interesting inputs and compares the results for equality. The cross tests |
| 102 are currently invoked by running the ``runtests.sh`` script. | 111 are currently invoked by running:: |
| 103 | 112 |
| 104 A convenient way to run both the lit tests and the cross tests is:: | 113 make -f Makefile.standalone check-xtest |
| | 114 |
| | 115 Similarly, there is a suite of unit tests:: |
| | 116 |
| | 117 make -f Makefile.standalone check-unit |
| | 118 |
| | 119 A convenient way to run the lit, cross, and unit tests is:: |
| 105 | 120 |
| 106 make -f Makefile.standalone check | 121 make -f Makefile.standalone check |
| 107 | 122 |
| 108 Assembling ``pnacl-sz`` output as needed | 123 Assembling ``pnacl-sz`` output as needed |
| 109 ---------------------------------------- | 124 ---------------------------------------- |
| 110 | 125 |
| 111 ``pnacl-sz`` can now produce a native ELF binary using ``-filetype=obj``. | 126 ``pnacl-sz`` can now produce a native ELF binary using ``-filetype=obj``. |
| 112 | 127 |
| 113 ``pnacl-sz`` can also produce textual assembly code in a structure suitable for | 128 ``pnacl-sz`` can also produce textual assembly code in a structure suitable for |
| 114 input to ``llvm-mc``, using ``-filetype=asm`` or ``-filetype=iasm``. An object | 129 input to ``llvm-mc``, using ``-filetype=asm`` or ``-filetype=iasm``. An object |
| 115 file can then be produced using the command:: | 130 file can then be produced using the command:: |
| 116 | 131 |
| 117 llvm-mc -arch=x86 -filetype=obj -o=MyObj.o | 132 llvm-mc -triple=i686 -filetype=obj -o=MyObj.o |
| 118 | 133 |
| 119 Building a translated binary | 134 Building a translated binary |
| 120 ---------------------------- | 135 ---------------------------- |
| 121 | 136 |
| 122 There is a helper script, ``pydir/szbuild.py``, that translates a finalized pexe | 137 There is a helper script, ``pydir/szbuild.py``, that translates a finalized pexe |
| 123 into a fully linked executable. Run it with ``-help`` for extensive | 138 into a fully linked executable. Run it with ``-help`` for extensive |
| 124 documentation. | 139 documentation. |
| 125 | 140 |
| 126 By default, ``szbuild.py`` builds an executable using only Subzero translation, | 141 By default, ``szbuild.py`` builds an executable using only Subzero translation, |
| 127 but it can also be used to produce hybrid Subzero/``llc`` binaries (``llc`` is | 142 but it can also be used to produce hybrid Subzero/``llc`` binaries (``llc`` is |
| 128 the name of the LLVM translator) for bisection-based debugging. In bisection | 143 the name of the LLVM translator) for bisection-based debugging. In bisection |
| 129 debugging mode, the pexe is translated using both Subzero and ``llc``, and the | 144 debugging mode, the pexe is translated using both Subzero and ``llc``, and the |
| 130 resulting object files are combined into a single executable using symbol | 145 resulting object files are combined into a single executable using symbol |
| 131 weakening and other linker tricks to control which Subzero symbols and which | 146 weakening and other linker tricks to control which Subzero symbols and which |
| 132 ``llc`` symbols take precedence. This is controlled by the ``-include`` and | 147 ``llc`` symbols take precedence. This is controlled by the ``-include`` and |
| 133 ``-exclude`` arguments. These can be used to rapidly find a single function | 148 ``-exclude`` arguments. These can be used to rapidly find a single function |
| 134 that Subzero translates incorrectly leading to incorrect output. | 149 that Subzero translates incorrectly leading to incorrect output. |
| 135 | 150 |
| 136 There is another helper script, ``pydir/szbuild_spec2k.py``, that runs | 151 There is another helper script, ``pydir/szbuild_spec2k.py``, that runs |
| 137 ``szbuild.py`` on one or more components of the Spec2K suite. This assumes that | 152 ``szbuild.py`` on one or more components of the Spec2K suite. This assumes that |
| 138 Spec2K is set up in the usual place in the Native Client tree, and the finalized | 153 Spec2K is set up in the usual place in the Native Client tree, and the finalized |
| 139 pexe files have been built. (Note: for working with Spec2K and other pexes, | 154 pexe files have been built. (Note: for working with Spec2K and other pexes, |
| 140 it's helpful to finalize the pexe using ``--no-strip-syms``, to preserve the | 155 it's helpful to finalize the pexe using ``--no-strip-syms``, to preserve the |
| 141 original function and global variable names.) | 156 original function and global variable names.) |
| 142 | 157 |
| 143 Status | 158 Status |
| 144 ------ | 159 ------ |
| 145 | 160 |
| 146 Subzero currently translates only for the x86-32 architecture. Native Client | 161 Subzero currently fully supports the x86-32 architecture, for both native and |
| 147 sandboxing is not yet implemented. Two optimization levels, ``-Om1`` and | 162 Native Client sandboxing modes. The x86-64 architecture is also supported in |
| 148 ``-O2``, are implemented. | 163 native mode only, for the x32 flavor due to PNaCl bitcode restrictions. ARM and |
| | JF, 2015/08/31 21:08:02: "in native mode only" but sandboxing is in progres |
| | Jim Stichnoth, 2015/09/02 23:35:05: Done. |
| | 164 MIPS support is in progress. Two optimization levels, ``-Om1`` and ``-O2``, are |
| | 165 implemented. |
| 149 | 166 |
| 150 The ``-Om1`` configuration is designed to be the simplest and fastest possible, | 167 The ``-Om1`` configuration is designed to be the simplest and fastest possible, |
| 151 with a minimal set of passes and transformations. | 168 with a minimal set of passes and transformations. |
| 152 | 169 |
| 153 * Simple Phi lowering before target lowering, by generating temporaries and | 170 * Simple Phi lowering before target lowering, by generating temporaries and |
| 154 adding assignments to the end of predecessor blocks. | 171 adding assignments to the end of predecessor blocks. |
| 155 | 172 |
| 156 * Simple register allocation limited to pre-colored and infinite-weight | 173 * Simple register allocation limited to pre-colored or infinite-weight |
| 157 Variables. | 174 Variables. |
| 158 | 175 |
| 159 The ``-O2`` configuration is designed to use all optimizations available and | 176 The ``-O2`` configuration is designed to use all optimizations available and |
| 160 produce the best code. | 177 produce the best code. |
| 161 | 178 |
| 162 * Address mode inference to leverage the complex x86 addressing modes. | 179 * Address mode inference to leverage the complex x86 addressing modes. |
| 163 | 180 |
| 164 * Compare/branch fusing based on liveness/last-use analysis. | 181 * Compare/branch fusing based on liveness/last-use analysis. |
| 165 | 182 |
| 166 * Global, linear-scan register allocation. | 183 * Global, linear-scan register allocation. |
| 167 | 184 |
| 168 * Advanced phi lowering after target lowering and global register allocation, | 185 * Advanced phi lowering after target lowering and global register allocation, |
| 169 via edge splitting, topological sorting of the parallel moves, and final local | 186 via edge splitting, topological sorting of the parallel moves, and final local |
| 170 register allocation. | 187 register allocation. |
| 171 | 188 |
| 172 * Stack slot coalescing to reduce frame size. | 189 * Stack slot coalescing to reduce frame size. |
| 173 | 190 |
| 174 * Branch optimization to reduce the number of branches to the following block. | 191 * Branch optimization to reduce the number of branches to the following block. |
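
The ``-O2`` list in the Status section mentions "topological sorting of the parallel moves" as part of advanced phi lowering. For readers unfamiliar with the idea, here is a minimal Python sketch of ordering a set of parallel moves so that no source is overwritten before it is read. It is an illustration only and not Subzero's implementation: the function name, the ``{dest: src}`` dictionary representation, and the ``tmp`` scratch location are all assumptions made for this example.

```python
def sequentialize_parallel_moves(moves, temp="tmp"):
    """Turn parallel moves {dest: src} into an ordered list of (dest, src).

    Hypothetical helper for illustration. `temp` is an assumed scratch
    location that must not appear among the destinations or sources.
    """
    # Moves of a location onto itself do nothing and can be dropped.
    pending = {d: s for d, s in moves.items() if d != s}
    ordered = []
    while pending:
        # A destination is safe to write once no pending move still reads it.
        sources = set(pending.values())
        ready = [d for d in pending if d not in sources]
        if ready:
            for d in ready:
                ordered.append((d, pending.pop(d)))
            continue
        # Only cycles remain: save one destination in the scratch location
        # and redirect every move that reads it to read the scratch instead.
        d = next(iter(pending))
        ordered.append((temp, d))
        for k, s in pending.items():
            if s == d:
                pending[k] = temp
    return ordered


def apply_moves(ordered, env):
    """Execute the ordered moves one at a time on a copy of env."""
    env = dict(env)
    for dest, src in ordered:
        env[dest] = env[src]
    return env
```

A chain such as ``{"a": "b", "b": "c"}`` needs only reordering, while a swap such as ``{"a": "b", "b": "a"}`` forms a cycle and costs one extra move through the scratch location. A real lowering pass would work on register and stack operands rather than strings, and would need a scratch register to be available.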