OLD | NEW |
| (Empty) |
1 Object allocation and lifetime in ICE | |
2 ===================================== | |
3 | |
4 This document discusses object lifetime and scoping issues, starting with | |
5 bitcode parsing and ending with ELF file emission. | |
6 | |
7 Multithreaded translation model | |
8 ------------------------------- | |
9 | |
10 A single thread is responsible for parsing PNaCl bitcode (possibly concurrently | |
11 with downloading the bitcode file) and constructing the initial high-level ICE. | |
12 The result is a queue of Cfg pointers. The parser thread incrementally adds a | |
13 Cfg pointer to the queue after the Cfg is created, and then moves on to parse | |
14 the next function. | |
15 | |
16 Multiple translation worker threads draw from the queue of Cfg pointers as they | |
17 are added to the queue, such that several functions can be translated in paralle
l. | |
18 The result is a queue of assembler buffers, each of which consists of machine co
de | |
19 plus fixups. | |
20 | |
21 A single thread is responsible for writing the assembler buffers to an ELF file. | |
22 It consumes the assembler buffers from the queue that the translation threads | |
23 write to. | |
24 | |
25 This means that Cfgs are created by the parser thread and destroyed by the | |
26 translation thread (including Cfg nodes, instructions, and most kinds of | |
27 operands), and assembler buffers are created by the translation thread and | |
28 destroyed by the writer thread. | |
29 | |
30 Deterministic execution | |
31 ^^^^^^^^^^^^^^^^^^^^^^^ | |
32 | |
33 Although code randomization is a key aspect of security, deterministic and | |
34 repeatable translation is sometimes needed, e.g. for regression testing. | |
35 Multithreaded translation introduces potential for randomness that may need to | |
36 be made deterministic. | |
37 | |
38 * Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the | |
39 translation queue in deterministic order. But since translation is | |
40 multithreaded, FIFO order for the assembler buffer queue may not be | |
41 deterministic. The writer thread would be responsible for reordering the | |
42 buffers, potentially waiting for slower translations to complete even if other | |
43 assembler buffers are available. | |
44 | |
45 * Different translation threads may add new constant pool entries at different | |
46 times. Some constant pool entries are emitted as read-only data. This | |
47 includes floating-point constants for x86, as well as integer immediate | |
48 randomization through constant pooling. These constant pool entries are | |
49 emitted after all assembler buffers have been written. The writer needs to be | |
50 able to sort them deterministically before emitting them. | |
51 | |
52 Object lifetimes | |
53 ---------------- | |
54 | |
55 Objects of type Constant, or a subclass of Constant, are pooled globally. The | |
56 pooling is managed by the GlobalContext class. Since Constants are added or | |
57 looked up by translation threads and the parser thread, access to the constant | |
58 pools, as well as GlobalContext in general, need to be arbitrated by locks. | |
59 (It's possible that if there's too much contention, we can maintain a | |
60 thread-local cache for Constant pool lookups.) Constants live across all | |
61 function translations, and are destroyed only at the end. | |
62 | |
63 Several object types are scoped within the lifetime of the Cfg. These include | |
64 CfgNode, Inst, Variable, and any target-specific subclasses of Inst and Operand. | |
65 When the Cfg is destroyed, these scoped objects are destroyed as well. To keep | |
66 this cheap, the Cfg includes a slab allocator from which these objects are | |
67 allocated, and the objects should not contain fields with non-trivial | |
68 destructors. Most of these fields are POD, but in a couple of cases these | |
69 fields are STL containers. We deal with this, and avoid leaking memory, by | |
70 providing the container with an allocator that uses the Cfg-local slab | |
71 allocator. Since the container allocator generally needs to be stateless, we | |
72 store a pointer to the slab allocator in thread-local storage (TLS). This is | |
73 straightforward since on any of the threads, only one Cfg is active at a time, | |
74 and a given Cfg is only active in one thread at a time (either the parser | |
75 thread, or at most one translation thread, or the writer thread). | |
76 | |
77 Even though there is a one-to-one correspondence between Cfgs and assembler | |
78 buffers, they need to use different allocators. This is because the translation | |
79 thread wants to destroy the Cfg and reclaim all its memory after translation | |
80 completes, but possibly before the assembly buffer is written to the ELF file. | |
81 Ownership of the assembler buffer and its allocator are transferred to the | |
82 writer thread after translation completes, similar to the way ownership of the | |
83 Cfg and its allocator are transferred to the translation thread after parsing | |
84 completes. | |
85 | |
86 Allocators and TLS | |
87 ------------------ | |
88 | |
89 Part of the Cfg building, and transformations on the Cfg, include STL container | |
90 operations which may need to allocate additional memory in a stateless fashion. | |
91 This requires maintaining the proper slab allocator pointer in TLS. | |
92 | |
93 When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's | |
94 slab allocator into its own TLS. This is used as the Cfg is built within the | |
95 parser thread. After the Cfg is built, the parser thread clears its allocator | |
96 pointer, adds the new Cfg pointer to the translation queue, continues with the | |
97 next function. | |
98 | |
99 When the translation thread grabs a new Cfg pointer, it installs the Cfg's slab | |
100 allocator into its TLS and translates the function. When generating the | |
101 assembly buffer, it must take care not to use the Cfg's slab allocator. If | |
102 there is a slab allocator for the assembler buffer, a pointer to it can also be | |
103 installed in TLS if needed. | |
104 | |
105 The translation thread destroys the Cfg when it is done translating, including | |
106 the Cfg's slab allocator, and clears the allocator pointer from its TLS. | |
107 Likewise, the writer thread destroys the assembler buffer when it is finished | |
108 with it. | |
109 | |
110 Thread safety | |
111 ------------- | |
112 | |
113 The parse/translate/write stages of the translation pipeline are fairly | |
114 independent, with little opportunity for threads to interfere. The Subzero | |
115 design calls for all shared accesses to go through the GlobalContext, which adds | |
116 locking as appropriate. This includes the coarse-grain work queues for Cfgs and | |
117 assembler buffers. It also includes finer-grain access to constant pool | |
118 entries, as well as output streams for verbose debugging output. | |
119 | |
120 If locked access to constant pools becomes a bottleneck, we can investigate | |
121 thread-local caches of constants (as mentioned earlier). Also, it should be | |
122 safe though slightly less efficient to allow duplicate copies of constants | |
123 across threads (which could be de-dupped by the writer at the end). | |
124 | |
125 We will use ThreadSanitizer as a way to detect potential data races in the | |
126 implementation. | |
OLD | NEW |