ALLOCATION.rst - Issue 802183004: Subzero: Use CFG-local arena allocation for relevant containers.

Side by Side Diff: ALLOCATION.rst

Issue 802183004: Subzero: Use CFG-local arena allocation for relevant containers. (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master

Patch Set: Typo fix Created 6 years ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch

OLD	NEW
(Empty)
	1 Object allocation and lifetime in ICE

	2 =====================================

	3

	4 This document discusses object lifetime and scoping issues, starting with

	5 bitcode parsing and ending with ELF file emission.

	6

	7 Multithreaded translation model

	8 -------------------------------

	9

	10 A single thread is responsible for parsing PNaCl bitcode (possibly concurrently

	11 with downloading the bitcode file) and constructing the initial high-level ICE.

	12 The result is a queue of Cfg pointers. The parser thread incrementally adds a

	13 Cfg pointer to the queue after the Cfg is created, and then moves on to parse

	14 the next function.

	15

	16 Multiple translation worker threads draw from the queue of Cfg pointers as they

	17 are added to the queue, such that several functions can be translated in paralle l.

	18 The result is a queue of assembler buffers, each of which consists of machine co de

	19 plus fixups.

	20

	21 A single thread is responsible for writing the assembler buffers to an ELF file.

	22 It consumes the assembler buffers from the queue that the translation threads

	23 write to.

	24

	25 This means that Cfgs are created by the parser thread and destroyed by the

	26 translation thread (including Cfg nodes, instructions, and most kinds of

	27 operands), and assembler buffers are created by the translation thread and

	28 destroyed by the writer thread.

	29

	30 Deterministic execution

	31 ^^^^^^^^^^^^^^^^^^^^^^^

	32

	33 Although code randomization is a key aspect of security, deterministic and

	34 repeatable translation is sometimes needed, e.g. for regression testing.

	35 Multithreaded translation introduces potential for randomness that may need to

	36 be made deterministic.

	37

	38 * Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the

	39 translation queue in deterministic order. But since translation is

	40 multithreaded, FIFO order for the assembler buffer queue may not be

	41 deterministic. The writer thread would be responsible for reordering the

	42 buffers, potentially waiting for slower translations to complete even if other

	43 assembler buffers are available.

	44

	45 * Different translation threads may add new constant pool entries at different

	46 times. Some constant pool entries are emitted as read-only data. This

	47 includes floating-point constants for x86, as well as integer immediate

	48 randomization through constant pooling. These constant pool entries are

	49 emitted after all assembler buffers have been written. The writer needs to be

	50 able to sort them deterministically before emitting them.

	51

	52 Object lifetimes

	53 ----------------

	54

	55 Objects of type Constant, or a subclass of Constant, are pooled globally. The

	56 pooling is managed by the GlobalContext class. Since Constants are added or

	57 looked up by translation threads and the parser thread, access to the constant

	58 pools, as well as GlobalContext in general, need to be arbitrated by locks.

	59 (It's possible that if there's too much contention, we can maintain a

	60 thread-local cache for Constant pool lookups.) Constants live across all

	61 function translations, and are destroyed only at the end.

	62

	63 Several object types are scoped within the lifetime of the Cfg. These include

	64 CfgNode, Inst, Variable, and any target-specific subclasses of Inst and Operand.

	65 When the Cfg is destroyed, these scoped objects are destroyed as well. To keep

	66 this cheap, the Cfg includes a slab allocator from which these objects are

	67 allocated, and the objects should not contain fields with non-trivial

	68 destructors. Most of these fields are POD, but in a couple of cases these

	69 fields are STL containers. We deal with this, and avoid leaking memory, by

	70 providing the container with an allocator that uses the Cfg-local slab

	71 allocator. Since the container allocator generally needs to be stateless, we

	72 store a pointer to the slab allocator in thread-local storage (TLS). This is

	73 straightforward since on any of the threads, only one Cfg is active at a time,

	74 and a given Cfg is only active in one thread at a time (either the parser

	75 thread, or at most one translation thread, or the writer thread).

	76

	77 Even though there is a one-to-one correspondence between Cfgs and assembler

	78 buffers, they need to use different allocators. This is because the translation

	79 thread wants to destroy the Cfg and reclaim all its memory after translation

	80 completes, but possibly before the assembly buffer is written to the ELF file.

	81 Ownership of the assembler buffer and its allocator are transferred to the

	82 writer thread after translation completes, similar to the way ownership of the

	83 Cfg and its allocator are transferred to the translation thread after parsing

	84 completes.

	85

	86 Allocators and TLS

	87 ------------------

	88

	89 Part of the Cfg building, and transformations on the Cfg, include STL container

	90 operations which may need to allocate additional memory in a stateless fashion.

	91 This requires maintaining the proper slab allocator pointer in TLS.

	92

	93 When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's

	94 slab allocator into its own TLS. This is used as the Cfg is built within the

	95 parser thread. After the Cfg is built, the parser thread clears its allocator

	96 pointer, adds the new Cfg pointer to the translation queue, continues with the

	97 next function.

	98

	99 When the translation thread grabs a new Cfg pointer, it installs the Cfg's slab

	100 allocator into its TLS and translates the function. When generating the

	101 assembly buffer, it must take care not to use the Cfg's slab allocator. If

	102 there is a slab allocator for the assembler buffer, a pointer to it can also be

	103 installed in TLS if needed.

	104

	105 The translation thread destroys the Cfg when it is done translating, including

	106 the Cfg's slab allocator, and clears the allocator pointer from its TLS.

	107 Likewise, the writer thread destroys the assembler buffer when it is finished

	108 with it.

	109

	110 Thread safety

	111 -------------

	112

	113 The parse/translate/write stages of the translation pipeline are fairly

	114 independent, with little opportunity for threads to interfere. The Subzero

	115 design calls for all shared accesses to go through the GlobalContext, which adds

	116 locking as appropriate. This includes the coarse-grain work queues for Cfgs and

	117 assembler buffers. It also includes finer-grain access to constant pool

	118 entries, as well as output streams for verbose debugging output.

	119

	120 If locked access to constant pools becomes a bottleneck, we can investigate

	121 thread-local caches of constants (as mentioned earlier). Also, it should be

	122 safe though slightly less efficient to allow duplicate copies of constants

	123 across threads (which could be de-dupped by the writer at the end).

	124

	125 We will use ThreadSanitizer as a way to detect potential data races in the

	126 implementation.

OLD	NEW

« no previous file with comments | « no previous file | src/IceCfg.h » ('j') | no next file with comments »