Chromium Code Reviews

Side by Side Diff: docs/DESIGN.rst

Issue 2069953004: Subzero: Design doc: Include regalloc in Om1 pass list. (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master
Patch Set: Created 4 years, 6 months ago
OLD | NEW
1 Design of the Subzero fast code generator 1 Design of the Subzero fast code generator
2 ========================================= 2 =========================================
3 3
4 Introduction 4 Introduction
5 ------------ 5 ------------
6 6
7 The `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes 7 The `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes
8 compiler technology based on `LLVM <http://llvm.org/>`_. The developer uses the 8 compiler technology based on `LLVM <http://llvm.org/>`_. The developer uses the
9 PNaCl toolchain to compile their application to architecture-neutral PNaCl 9 PNaCl toolchain to compile their application to architecture-neutral PNaCl
10 bitcode (a ``.pexe`` file), using as much architecture-neutral optimization as 10 bitcode (a ``.pexe`` file), using as much architecture-neutral optimization as
(...skipping 31 matching lines...)
42 42
43 Or, improve translator performance to something more reasonable. 43 Or, improve translator performance to something more reasonable.
44 44
45 This document describes Subzero's attempt to improve translation speed by an 45 This document describes Subzero's attempt to improve translation speed by an
46 order of magnitude while rivaling LLVM's code quality. Subzero does this 46 order of magnitude while rivaling LLVM's code quality. Subzero does this
47 through minimal IR layering, lean data structures and passes, and a careful 47 through minimal IR layering, lean data structures and passes, and a careful
48 selection of fast optimization passes. It has two optimization recipes: full 48 selection of fast optimization passes. It has two optimization recipes: full
49 optimizations (``O2``) and minimal optimizations (``Om1``). The recipes are the 49 optimizations (``O2``) and minimal optimizations (``Om1``). The recipes are the
50 following (described in more detail below): 50 following (described in more detail below):
51 51
52 +-----------------------------------+-----------------------+ 52 +-----------------------------------+-----------------------------+
53 | O2 recipe | Om1 recipe | 53 | O2 recipe | Om1 recipe |
54 +===================================+=======================+ 54 +===================================+=============================+
55 | Parse .pexe file | Parse .pexe file | 55 | Parse .pexe file | Parse .pexe file |
56 +-----------------------------------+-----------------------+ 56 +-----------------------------------+-----------------------------+
57 | Loop nest analysis | | 57 | Loop nest analysis | |
58 +-----------------------------------+-----------------------+ 58 +-----------------------------------+-----------------------------+
59 | Address mode inference | | 59 | Address mode inference | |
60 +-----------------------------------+-----------------------+ 60 +-----------------------------------+-----------------------------+
61 | Read-modify-write (RMW) transform | | 61 | Read-modify-write (RMW) transform | |
62 +-----------------------------------+-----------------------+ 62 +-----------------------------------+-----------------------------+
63 | Basic liveness analysis | | 63 | Basic liveness analysis | |
64 +-----------------------------------+-----------------------+ 64 +-----------------------------------+-----------------------------+
65 | Load optimization | | 65 | Load optimization | |
66 +-----------------------------------+-----------------------+ 66 +-----------------------------------+-----------------------------+
67 | | Phi lowering (simple) | 67 | | Phi lowering (simple) |
68 +-----------------------------------+-----------------------+ 68 +-----------------------------------+-----------------------------+
69 | Target lowering | Target lowering | 69 | Target lowering | Target lowering |
70 +-----------------------------------+-----------------------+ 70 +-----------------------------------+-----------------------------+
71 | Full liveness analysis | | 71 | Full liveness analysis | |
72 +-----------------------------------+-----------------------+ 72 +-----------------------------------+-----------------------------+
73 | Register allocation | | 73 | Register allocation | Minimal register allocation |
74 +-----------------------------------+-----------------------+ 74 +-----------------------------------+-----------------------------+
75 | Phi lowering (advanced) | | 75 | Phi lowering (advanced) | |
76 +-----------------------------------+-----------------------+ 76 +-----------------------------------+-----------------------------+
77 | Post-phi register allocation | | 77 | Post-phi register allocation | |
78 +-----------------------------------+-----------------------+ 78 +-----------------------------------+-----------------------------+
79 | Branch optimization | | 79 | Branch optimization | |
80 +-----------------------------------+-----------------------+ 80 +-----------------------------------+-----------------------------+
81 | Code emission | Code emission | 81 | Code emission | Code emission |
82 +-----------------------------------+-----------------------+ 82 +-----------------------------------+-----------------------------+
83 83
84 Goals 84 Goals
85 ===== 85 =====
86 86
87 Translation speed 87 Translation speed
88 ----------------- 88 -----------------
89 89
90 We'd like to be able to translate a ``.pexe`` file as fast as download speed. 90 We'd like to be able to translate a ``.pexe`` file as fast as download speed.
91 Any faster is in a sense wasted effort. Download speed varies greatly, but 91 Any faster is in a sense wasted effort. Download speed varies greatly, but
92 we'll arbitrarily say 1 MB/sec. We'll pick the ARM A15 CPU as the example of a 92 we'll arbitrarily say 1 MB/sec. We'll pick the ARM A15 CPU as the example of a
(...skipping 1392 matching lines...)
1485 This could be mitigated by switching these to the CFG-local allocator. 1485 This could be mitigated by switching these to the CFG-local allocator.
1486 1486
1487 Third, multithreading may make the default allocator strategy more expensive. 1487 Third, multithreading may make the default allocator strategy more expensive.
1488 In a single-threaded environment, a pass will allocate its containers, run the 1488 In a single-threaded environment, a pass will allocate its containers, run the
1489 pass, and deallocate the containers. This results in stack-like allocation 1489 pass, and deallocate the containers. This results in stack-like allocation
1490 behavior and makes the heap free list easier to manage, with less heap 1490 behavior and makes the heap free list easier to manage, with less heap
1491 fragmentation. But when multithreading is added, the allocations and 1491 fragmentation. But when multithreading is added, the allocations and
1492 deallocations become much less stack-like, making allocation and deallocation 1492 deallocations become much less stack-like, making allocation and deallocation
1493 operations individually more expensive. Again, this could be mitigated by 1493 operations individually more expensive. Again, this could be mitigated by
1494 switching these to the CFG-local allocator. 1494 switching these to the CFG-local allocator.
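
Reviewer-style sketch (not part of the patch): the context lines at 1487-1494 argue that multithreading makes heap allocation less stack-like, and that moving containers to the CFG-local allocator could mitigate this. A minimal illustration of that idea follows, assuming a bump-pointer arena that is owned by one function's CFG and freed all at once. The class name ``BumpArena`` and its methods are hypothetical and are not taken from the Subzero sources.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical bump-pointer arena, standing in for a "CFG-local allocator".
// This is an illustration only, not Subzero's actual implementation.
class BumpArena {
public:
  explicit BumpArena(std::size_t SlabSize = 4096) : SlabSize(SlabSize) {}

  // Returns Size bytes aligned to Align, which must be a power of two.
  void *allocate(std::size_t Size,
                 std::size_t Align = alignof(std::max_align_t)) {
    assert(Align != 0 && (Align & (Align - 1)) == 0);
    std::uintptr_t Aligned = alignUp(Cur, Align);
    if (Slabs.empty() || Aligned + Size > End) {
      // The current slab is exhausted: start a new one that is large
      // enough for this request plus worst-case alignment padding.
      std::size_t Bytes = std::max(SlabSize, Size + Align);
      Slabs.push_back(std::make_unique<unsigned char[]>(Bytes));
      Cur = reinterpret_cast<std::uintptr_t>(Slabs.back().get());
      End = Cur + Bytes;
      Aligned = alignUp(Cur, Align);
    }
    Cur = Aligned + Size; // Bump the pointer; no free list is touched.
    ++NumAllocations;
    return reinterpret_cast<void *>(Aligned);
  }

  // Individual deallocation is a no-op. All slabs are released together
  // when the arena (and thus the owning CFG) is destroyed.
  void deallocate(void *, std::size_t) {}

  std::size_t numSlabs() const { return Slabs.size(); }
  std::size_t numAllocations() const { return NumAllocations; }

private:
  static std::uintptr_t alignUp(std::uintptr_t P, std::size_t Align) {
    return (P + Align - 1) & ~(static_cast<std::uintptr_t>(Align) - 1);
  }

  std::size_t SlabSize;
  std::vector<std::unique_ptr<unsigned char[]>> Slabs;
  std::uintptr_t Cur = 0;
  std::uintptr_t End = 0;
  std::size_t NumAllocations = 0;
};
```

Because each arena would be private to one CFG, the order in which other threads allocate and free does not affect it, and the cost of an allocation stays a pointer bump regardless of how the threads interleave. The trade-off in this sketch is that memory is only reclaimed when the whole arena is destroyed.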