Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(26)

Unified Diff: ALLOCATION.rst

Issue 802183004: Subzero: Use CFG-local arena allocation for relevant containers. (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master
Patch Set: Typo fix Created 6 years ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
« no previous file with comments | « no previous file | src/IceCfg.h » ('j') | no next file with comments »
Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
Index: ALLOCATION.rst
diff --git a/ALLOCATION.rst b/ALLOCATION.rst
new file mode 100644
index 0000000000000000000000000000000000000000..b1e848f8afd25fccfffa8017bfdcf276183000a6
--- /dev/null
+++ b/ALLOCATION.rst
@@ -0,0 +1,126 @@
+Object allocation and lifetime in ICE
+=====================================
+
+This document discusses object lifetime and scoping issues, starting with
+bitcode parsing and ending with ELF file emission.
+
+Multithreaded translation model
+-------------------------------
+
+A single thread is responsible for parsing PNaCl bitcode (possibly concurrently
+with downloading the bitcode file) and constructing the initial high-level ICE.
+The result is a queue of Cfg pointers. The parser thread incrementally adds a
+Cfg pointer to the queue after the Cfg is created, and then moves on to parse
+the next function.
+
+Multiple translation worker threads draw from the queue of Cfg pointers as they
+are added to the queue, such that several functions can be translated in parallel.
+The result is a queue of assembler buffers, each of which consists of machine code
+plus fixups.
+
+A single thread is responsible for writing the assembler buffers to an ELF file.
+It consumes the assembler buffers from the queue that the translation threads
+write to.
+
+This means that Cfgs are created by the parser thread and destroyed by the
+translation thread (including Cfg nodes, instructions, and most kinds of
+operands), and assembler buffers are created by the translation thread and
+destroyed by the writer thread.
+
+Deterministic execution
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Although code randomization is a key aspect of security, deterministic and
+repeatable translation is sometimes needed, e.g. for regression testing.
+Multithreaded translation introduces potential for randomness that may need to
+be made deterministic.
+
+* Bitcode parsing is sequential, so it's easy to use a FIFO queue to keep the
+ translation queue in deterministic order. But since translation is
+ multithreaded, FIFO order for the assembler buffer queue may not be
+ deterministic. The writer thread would be responsible for reordering the
+ buffers, potentially waiting for slower translations to complete even if other
+ assembler buffers are available.
+
+* Different translation threads may add new constant pool entries at different
+ times. Some constant pool entries are emitted as read-only data. This
+ includes floating-point constants for x86, as well as integer immediate
+ randomization through constant pooling. These constant pool entries are
+ emitted after all assembler buffers have been written. The writer needs to be
+ able to sort them deterministically before emitting them.
+
+Object lifetimes
+----------------
+
+Objects of type Constant, or a subclass of Constant, are pooled globally. The
+pooling is managed by the GlobalContext class. Since Constants are added or
+looked up by translation threads and the parser thread, access to the constant
+pools, as well as GlobalContext in general, need to be arbitrated by locks.
+(It's possible that if there's too much contention, we can maintain a
+thread-local cache for Constant pool lookups.) Constants live across all
+function translations, and are destroyed only at the end.
+
+Several object types are scoped within the lifetime of the Cfg. These include
+CfgNode, Inst, Variable, and any target-specific subclasses of Inst and Operand.
+When the Cfg is destroyed, these scoped objects are destroyed as well. To keep
+this cheap, the Cfg includes a slab allocator from which these objects are
+allocated, and the objects should not contain fields with non-trivial
+destructors. Most of these fields are POD, but in a couple of cases these
+fields are STL containers. We deal with this, and avoid leaking memory, by
+providing the container with an allocator that uses the Cfg-local slab
+allocator. Since the container allocator generally needs to be stateless, we
+store a pointer to the slab allocator in thread-local storage (TLS). This is
+straightforward since on any of the threads, only one Cfg is active at a time,
+and a given Cfg is only active in one thread at a time (either the parser
+thread, or at most one translation thread, or the writer thread).
+
+Even though there is a one-to-one correspondence between Cfgs and assembler
+buffers, they need to use different allocators. This is because the translation
+thread wants to destroy the Cfg and reclaim all its memory after translation
+completes, but possibly before the assembly buffer is written to the ELF file.
+Ownership of the assembler buffer and its allocator are transferred to the
+writer thread after translation completes, similar to the way ownership of the
+Cfg and its allocator are transferred to the translation thread after parsing
+completes.
+
+Allocators and TLS
+------------------
+
+Part of the Cfg building, and transformations on the Cfg, include STL container
+operations which may need to allocate additional memory in a stateless fashion.
+This requires maintaining the proper slab allocator pointer in TLS.
+
+When the parser thread creates a new Cfg object, it puts a pointer to the Cfg's
+slab allocator into its own TLS. This is used as the Cfg is built within the
+parser thread. After the Cfg is built, the parser thread clears its allocator
+pointer, adds the new Cfg pointer to the translation queue, continues with the
+next function.
+
+When the translation thread grabs a new Cfg pointer, it installs the Cfg's slab
+allocator into its TLS and translates the function. When generating the
+assembly buffer, it must take care not to use the Cfg's slab allocator. If
+there is a slab allocator for the assembler buffer, a pointer to it can also be
+installed in TLS if needed.
+
+The translation thread destroys the Cfg when it is done translating, including
+the Cfg's slab allocator, and clears the allocator pointer from its TLS.
+Likewise, the writer thread destroys the assembler buffer when it is finished
+with it.
+
+Thread safety
+-------------
+
+The parse/translate/write stages of the translation pipeline are fairly
+independent, with little opportunity for threads to interfere. The Subzero
+design calls for all shared accesses to go through the GlobalContext, which adds
+locking as appropriate. This includes the coarse-grain work queues for Cfgs and
+assembler buffers. It also includes finer-grain access to constant pool
+entries, as well as output streams for verbose debugging output.
+
+If locked access to constant pools becomes a bottleneck, we can investigate
+thread-local caches of constants (as mentioned earlier). Also, it should be
+safe though slightly less efficient to allow duplicate copies of constants
+across threads (which could be de-dupped by the writer at the end).
+
+We will use ThreadSanitizer as a way to detect potential data races in the
+implementation.
« no previous file with comments | « no previous file | src/IceCfg.h » ('j') | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698