ALLOCATION.rst - Issue 1562543002: move .rst to docs

Unified Diff: ALLOCATION.rst

Issue 1562543002: move .rst to docs (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master

Patch Set: changes suggested by stichnot Created 4 years, 11 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Download patch

Index: ALLOCATION.rst

diff --git a/ALLOCATION.rst b/ALLOCATION.rst

deleted file mode 100644

index b1e848f8afd25fccfffa8017bfdcf276183000a6..0000000000000000000000000000000000000000

--- a/ALLOCATION.rst

+++ /dev/null

@@ -1,126 +0,0 @@

-Object allocation and lifetime in ICE

-=====================================

-This document discusses object lifetime and scoping issues, starting with

-bitcode parsing and ending with ELF file emission.

-Multithreaded translation model

--------------------------------

-A single thread is responsible for parsing PNaCl bitcode (possibly concurrently

-with downloading the bitcode file) and constructing the initial high-level ICE.

-The result is a queue of Cfg pointers. The parser thread incrementally adds a

-Cfg pointer to the queue after the Cfg is created, and then moves on to parse

-the next function.

-Multiple translation worker threads draw from the queue of Cfg pointers as they

-are added to the queue, such that several functions can be translated in parallel.

-The result is a queue of assembler buffers, each of which consists of machine code

-plus fixups.

-A single thread is responsible for writing the assembler buffers to an ELF file.

-It consumes the assembler buffers from the queue that the translation threads

-write to.

-This means that Cfgs are created by the parser thread and destroyed by the

-translation thread (including Cfg nodes, instructions, and most kinds of

-operands), and assembler buffers are created by the translation thread and

-destroyed by the writer thread.

-Deterministic execution

-^^^^^^^^^^^^^^^^^^^^^^^

-Although code randomization is a key aspect of security, deterministic and

-repeatable translation is sometimes needed, e.g. for regression testing.

-Multithreaded translation introduces potential for randomness that may need to

-be made deterministic.

-* Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the

- translation queue in deterministic order. But since translation is

- multithreaded, FIFO order for the assembler buffer queue may not be

- deterministic. The writer thread would be responsible for reordering the

- buffers, potentially waiting for slower translations to complete even if other

- assembler buffers are available.

-* Different translation threads may add new constant pool entries at different

- times. Some constant pool entries are emitted as read-only data. This

- includes floating-point constants for x86, as well as integer immediate

- randomization through constant pooling. These constant pool entries are

- emitted after all assembler buffers have been written. The writer needs to be

- able to sort them deterministically before emitting them.

-Object lifetimes

-----------------

-Objects of type Constant, or a subclass of Constant, are pooled globally. The

-pooling is managed by the GlobalContext class. Since Constants are added or

-looked up by translation threads and the parser thread, access to the constant

-pools, as well as GlobalContext in general, need to be arbitrated by locks.

-(It's possible that if there's too much contention, we can maintain a

-thread-local cache for Constant pool lookups.) Constants live across all

-function translations, and are destroyed only at the end.

-Several object types are scoped within the lifetime of the Cfg. These include

-CfgNode, Inst, Variable, and any target-specific subclasses of Inst and Operand.

-When the Cfg is destroyed, these scoped objects are destroyed as well. To keep

-this cheap, the Cfg includes a slab allocator from which these objects are

-allocated, and the objects should not contain fields with non-trivial

-destructors. Most of these fields are POD, but in a couple of cases these

-fields are STL containers. We deal with this, and avoid leaking memory, by

-providing the container with an allocator that uses the Cfg-local slab

-allocator. Since the container allocator generally needs to be stateless, we

-store a pointer to the slab allocator in thread-local storage (TLS). This is

-straightforward since on any of the threads, only one Cfg is active at a time,

-and a given Cfg is only active in one thread at a time (either the parser

-thread, or at most one translation thread, or the writer thread).

-Even though there is a one-to-one correspondence between Cfgs and assembler

-buffers, they need to use different allocators. This is because the translation

-thread wants to destroy the Cfg and reclaim all its memory after translation

-completes, but possibly before the assembly buffer is written to the ELF file.

-Ownership of the assembler buffer and its allocator are transferred to the

-writer thread after translation completes, similar to the way ownership of the

-Cfg and its allocator are transferred to the translation thread after parsing

-completes.

-Allocators and TLS

-------------------

-Part of the Cfg building, and transformations on the Cfg, include STL container

-operations which may need to allocate additional memory in a stateless fashion.

-This requires maintaining the proper slab allocator pointer in TLS.

-When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's

-slab allocator into its own TLS. This is used as the Cfg is built within the

-parser thread. After the Cfg is built, the parser thread clears its allocator

-pointer, adds the new Cfg pointer to the translation queue, continues with the

-next function.

-When the translation thread grabs a new Cfg pointer, it installs the Cfg's slab

-allocator into its TLS and translates the function. When generating the

-assembly buffer, it must take care not to use the Cfg's slab allocator. If

-there is a slab allocator for the assembler buffer, a pointer to it can also be

-installed in TLS if needed.

-The translation thread destroys the Cfg when it is done translating, including

-the Cfg's slab allocator, and clears the allocator pointer from its TLS.

-Likewise, the writer thread destroys the assembler buffer when it is finished

-with it.

-Thread safety

--------------

-The parse/translate/write stages of the translation pipeline are fairly

-independent, with little opportunity for threads to interfere. The Subzero

-design calls for all shared accesses to go through the GlobalContext, which adds

-locking as appropriate. This includes the coarse-grain work queues for Cfgs and

-assembler buffers. It also includes finer-grain access to constant pool

-entries, as well as output streams for verbose debugging output.

-If locked access to constant pools becomes a bottleneck, we can investigate

-thread-local caches of constants (as mentioned earlier). Also, it should be

-safe though slightly less efficient to allow duplicate copies of constants

-across threads (which could be de-dupped by the writer at the end).

-We will use ThreadSanitizer as a way to detect potential data races in the

-implementation.

« no previous file with comments | « .gitignore ('k') | DESIGN.rst » ('j') | no next file with comments »