Chromium Code Reviews
| Index: docs/PNaClDeveloperGuide.rst |
| diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst |
| index 9c27ae5c14cb7a2215ee6f9363080c5d4f5b1449..2c5a4ba497df53846d4f67482182ac4e3bc81ba7 100644 |
| --- a/docs/PNaClDeveloperGuide.rst |
| +++ b/docs/PNaClDeveloperGuide.rst |
| @@ -14,126 +14,158 @@ TODO |
| Memory Model and Atomics |
| ======================== |
| -Volatile Memory Accesses |
| ------------------------- |
| - |
| -The C11/C++11 standards mandate that ``volatile`` accesses execute in program |
| -order (but are not fences, so other memory operations can reorder around them), |
| -are not necessarily atomic, and can’t be elided. They can be separated into |
| -smaller width accesses. |
| - |
| -The PNaCl toolchain applies regular LLVM optimizations along these guidelines, |
| -and it further prevents any load/store (even non-``volatile`` and non-atomic |
| -ones) from moving above or below a volatile operations: they act as compiler |
| -barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` |
| -accesses after optimizations into atomic accesses with sequentially consistent |
| -memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and |
| -combined with builtin fences these programs can do meaningful cross-thread |
| -communication without changing code. It also reflects the original code's intent |
| -and guarantees better portability. |
| - |
| -Relaxed ordering could be used instead, but for the first release it is more |
| -conservative to apply sequential consistency. Future releases may change what |
| -happens at compile-time, but already-released pexes will continue using |
| -sequential consistency. |
| - |
| -The PNaCl toolchain also requires that ``volatile`` accesses be at least |
| -naturally aligned, and tries to guarantee this alignment. |
| - |
| Memory Model for Concurrent Operations |
| -------------------------------------- |
| -The memory model offered by PNaCl relies on the same coding guidelines as the |
| -C11/C++11 one: concurrent accesses must always occur through atomic primitives |
| -(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and |
| -these accesses must always occur with the same size for the same memory |
| -location. Visibility of stores is provided on a happens-before basis that |
| -relates memory locations to each other as the C11/C++11 standards do. |
| - |
| -As in C11/C++11 some atomic accesses may be implemented with locks on certain |
| -platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying |
| -that all types are sometimes lock-free. The ``is_lock_free`` methods will return |
| -the current platform's implementation at translation time. |
| - |
| -The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style |
| -``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. |
| -``volatile`` memory accesses can also be used, though these are discouraged, and |
| -aren't present in bitcode. |
| +The memory model offered by PNaCl relies on the same coding guidelines |
| +as the C11/C++11 one: concurrent accesses must always occur through |
| +atomic primitives (offered by `atomic intrinsics |
| +<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always |
| +occur with the same size for the same memory location. Visibility of |
| +stores is provided on a happens-before basis that relates memory |
| +locations to each other as the C11/C++11 standards do. |
| + |
| +Non-atomic memory accesses may be reordered, separated, elided or fused |
| +according to C and C++'s memory model before the pexe is created as well |
| +as after its creation. |
| + |
| +As in C11/C++11, some atomic accesses may be implemented with locks on |
| +certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be |
| +``1``, signifying that all types are sometimes lock-free. The |
| +``is_lock_free`` methods and ``atomic_is_lock_free`` will return the |
| +current platform's implementation at translation time. These macros, |
| +methods and functions are in the C11 header ``<stdatomic.h>`` and the |
| +C++11 header ``<atomic>``. |
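The following is a minimal C11 sketch of the two ways to ask about lock-freedom. The helper names are hypothetical. A native toolchain typically reports `2` ("always") for `int`. According to the text above, PNaCl always reports `1` ("sometimes"), so under PNaCl only the run-time query tells you what the translated platform actually does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Compile-time answer: 0 = never, 1 = sometimes, 2 = always lock-free.
 * The PNaCl guide documents this as always being 1. */
int int_lock_free_macro(void) {
    return ATOMIC_INT_LOCK_FREE;
}

/* Run-time answer for a concrete atomic object. Under PNaCl this is
 * fixed when the pexe is translated for a particular platform. */
int int_object_is_lock_free(void) {
    static atomic_int probe;
    return atomic_is_lock_free(&probe) ? 1 : 0;
}

void print_lock_free_report(void) {
    printf("ATOMIC_INT_LOCK_FREE        = %d\n", int_lock_free_macro());
    printf("atomic_is_lock_free(&probe) = %d\n", int_object_is_lock_free());
}
```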
| + |
| +The PNaCl toolchain supports concurrent memory accesses through legacy |
| +GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic |
| +primitives. ``volatile`` memory accesses can also be used, though these |
| +are discouraged. See `Volatile Memory Accesses`_. |
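The sketch below puts the two non-``volatile`` styles side by side. The counter and helper names are hypothetical. It performs the same fetch-and-add and compare-and-swap once through legacy GCC-style `__sync_*` builtins and once through C11 `<stdatomic.h>`. Both forms here are sequentially consistent: GCC documents the `__sync_*` builtins as full barriers, and the C11 calls use the default ordering. That matches the ordering PNaCl promotes all atomic accesses to.

```c
#include <assert.h>
#include <stdatomic.h>

/* Legacy builtins operate on plain integer objects. */
static int legacy_counter = 0;

/* C11 atomics operate on objects of _Atomic type. */
static atomic_int c11_counter = 0;

/* Atomically add delta; return the previous value. */
int legacy_fetch_add(int delta) {
    return __sync_fetch_and_add(&legacy_counter, delta);
}

int c11_fetch_add(int delta) {
    return atomic_fetch_add(&c11_counter, delta);
}

/* Replace `expected` with `desired`; return nonzero on success. */
int legacy_cas(int expected, int desired) {
    return __sync_bool_compare_and_swap(&legacy_counter, expected, desired);
}

int c11_cas(int expected, int desired) {
    return atomic_compare_exchange_strong(&c11_counter, &expected, desired);
}

/* Atomic reads of the current values. */
int legacy_value(void) { return __sync_fetch_and_add(&legacy_counter, 0); }
int c11_value(void) { return atomic_load(&c11_counter); }
```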
| PNaCl supports concurrency and parallelism with some restrictions: |
| -* Threading is explicitly supported. |
| - |
| -* Inter-process communication through shared memory is limited to operations |
| - which are lock-free on the current platform (``is_lock_free`` methods). This |
| - may change at a later date. |
| - |
| -* Direct interaction with device memory isn't supported. |
| - |
| -* Signal handling isn't supported, PNaCl therefore promotes all primitives to |
| - cross-thread (instead of single-thread). This may change at a later date. Note |
| - that using atomic operations which aren't lock-free may lead to deadlocks when |
| - handling asynchronous signals. |
| +* Threading is explicitly supported, with no restrictions beyond those |
| +  of prevalent implementations. See `Threading`_. |
| -* ``volatile`` and atomic operations are address-free (operations on the same |
| - memory location via two different addresses work atomically), as intended by |
| - the C11/C++11 standards. This is critical for inter-process communication as |
| - well as synchronous "external modifications" such as mapping underlying memory |
| - at multiple locations. |
| - |
| -Setting up the above mechanisms requires assistance from the embedding sandbox's |
| -runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through |
| -regular C/C++ code. |
| - |
| -The PNaCl toolchain currently optimizes for memory ordering as LLVM normally |
| -does, but at pexe creation time it promotes all ``volatile`` accesses as well as |
| -all atomic accesses to be sequentially consistent. Other memory orderings will |
| -be supported in a future release, but pexes generated with the current toolchain |
| -will continue functioning with sequential consistency. Using sequential |
| -consistency provides a total ordering for all sequentially-consistent operations |
| -on all addresses. |
| - |
| -This means that ``volatile`` and atomic memory accesses can only be re-ordered |
| -in some limited way before the pexe is created, and will act as fences for all |
| -memory accesses (even non-atomic and non-``volatile``) after pexe creation. |
| -Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence |
| -intervenes), separated, elided or fused according to C and C++'s memory model |
| -before the pexe is created as well as after its creation. |
| +* ``volatile`` and atomic operations are address-free (operations on the |
| + same memory location via two different addresses work atomically), as |
| + intended by the C11/C++11 standards. This is critical for |
| + inter-process communication as well as synchronous "external |
| + modifications" such as mapping underlying memory at multiple |
| + locations. |
| + |
| +* Inter-process communication through shared memory is currently not |
| + supported. See `Future Direction`_. |
| + |
| +* Signal handling isn't supported; PNaCl therefore promotes all |
| + primitives to cross-thread (instead of single-thread). This may change |
| + at a later date. Note that using atomic operations which aren't |
| + lock-free may lead to deadlocks when handling asynchronous |
| + signals. See `Future Direction`_. |
| + |
| +* Direct interaction with device memory isn't supported, and there is no |
| + intent to support it. The embedding sandbox's runtime can offer APIs |
| + to indirectly access devices. |
| + |
| +Setting up the above mechanisms requires assistance from the embedding |
| +sandbox's runtime (e.g. NaCl's Pepper APIs), but once set up they can |
| +be used through regular C/C++ code. |
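The address-free property can be illustrated natively. This sketch maps the same page at two addresses using POSIX calls (`tmpfile`, `ftruncate`, `mmap`). These calls are an assumption standing in for whatever setup the embedding runtime would provide; they are not PNaCl APIs, and PNaCl does not currently support shared-memory IPC. Once the mappings exist, the atomic updates are ordinary C11 code. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one page of a temporary file at two different virtual addresses,
 * updates a counter atomically through both, and returns the value read
 * back through the first mapping (or -1 if setup fails). */
int address_free_demo(void) {
    int result = -1;
    void *a = MAP_FAILED;
    void *b = MAP_FAILED;
    atomic_int *via_a;
    atomic_int *via_b;
    size_t len = (size_t)sysconf(_SC_PAGESIZE);

    FILE *f = tmpfile();                 /* backing store for the page */
    if (f == NULL)
        return -1;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)len) != 0)
        goto done;

    /* Two independent mappings of the same bytes. */
    a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        goto done;

    via_a = a;
    via_b = b;
    atomic_store(via_a, 0);
    atomic_fetch_add(via_a, 41);         /* update through one address */
    atomic_fetch_add(via_b, 1);          /* update through the other */
    result = atomic_load(via_a);         /* observes both updates */

done:
    if (b != MAP_FAILED)
        munmap(b, len);
    if (a != MAP_FAILED)
        munmap(a, len);
    fclose(f);
    return result;
}
```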
| Atomic Memory Ordering Constraints |
| ---------------------------------- |
| -Atomics follow the same ordering constraints as in regular LLVM, but |
| -all accesses are promoted to sequential consistency (the strongest |
| -memory ordering) at pexe creation time. As more C11/C++11 code |
| -allows us to understand performance and portability needs we intend |
| -to support the full gamut of C11/C++11 memory orderings: |
| +Atomics follow the same ordering constraints as in regular C11/C++11, |
| +but all accesses are promoted to sequential consistency (the strongest |
| +memory ordering) at pexe creation time. As more C11/C++11 code allows us |
| +to understand performance and portability needs, we intend to support the |
| +full gamut of C11/C++11 memory orderings: |
| - Relaxed: no operation orders memory. |
| -- Consume: a load operation performs a consume operation on the affected memory |
| - location (currently unsupported by LLVM). |
| -- Acquire: a load operation performs an acquire operation on the affected memory |
| - location. |
| -- Release: a store operation performs a release operation on the affected memory |
| - location. |
| +- Consume: a load operation performs a consume operation on the affected |
| + memory location (note: currently unsupported by LLVM). |
| +- Acquire: a load operation performs an acquire operation on the |
| + affected memory location. |
| +- Release: a store operation performs a release operation on the |
| + affected memory location. |
| - Acquire-release: load and store operations perform acquire and release |
| operations on the affected memory. |
| -- Sequentially consistent: same as acquire-release, but providing a global total |
| - ordering for all affected locations. |
| +- Sequentially consistent: same as acquire-release, but providing a |
| + global total ordering for all affected locations. |
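Below is a minimal message-passing sketch in C11 with POSIX threads. The names `payload`, `ready`, `producer` and `consume_message` are hypothetical. It uses the explicit release and acquire orderings from the list above. A release store publishes the plain write to `payload`, and an acquire load that observes the flag is guaranteed to see it. The current PNaCl toolchain would promote both accesses to sequential consistency, which only strengthens this guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static int payload;          /* plain, non-atomic data */
static atomic_int ready;     /* atomic flag guarding it */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                           /* plain write */
    atomic_store_explicit(&ready, 1, memory_order_release); /* publish */
    return NULL;
}

int consume_message(void) {
    pthread_t t;
    payload = 0;
    atomic_store_explicit(&ready, 0, memory_order_relaxed);
    int rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);
    /* Spin until the release store is observed. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    int value = payload;     /* happens-after the producer's write */
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return value;
}
```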
| As in C11/C++11: |
| - Atomic accesses must at least be naturally aligned. |
| -- Some accesses may not actually be atomic on certain platforms, requiring an |
| - implementation that uses a global lock. |
| -- An atomic memory location must always be accessed with atomic primitives, and |
| - these primitives must always be of the same bit size for that location. |
| +- Some accesses may not actually be atomic on certain platforms, |
| + requiring an implementation that uses global lock(s). |
| +- An atomic memory location must always be accessed with atomic |
| + primitives, and these primitives must always be of the same bit size |
| + for that location. |
| - Not all memory orderings are valid for all atomic operations. |
| +Volatile Memory Accesses |
| +------------------------ |
| + |
| +The C11/C++11 standards mandate that ``volatile`` accesses execute in |
| +program order (but are not fences, so other memory operations can |
| +reorder around them), are not necessarily atomic, and can’t be |
| +elided. They can be separated into smaller width accesses. |
| + |
| +Before any optimizations occur, the PNaCl toolchain transforms |
| +``volatile`` loads and stores into sequentially consistent ``volatile`` |
| +atomic loads and stores, and applies regular compiler optimizations |
| +along the above guidelines. This orders ``volatile`` accesses according to the |
| +atomic rules, and means that fences (including ``__sync_synchronize``) |
| +act in a better-defined manner. Regular memory accesses still do not |
| +have ordering guarantees with ``volatile`` and atomic accesses, though |
| +the internal representation of ``__sync_synchronize`` attempts to |
| +prevent reordering of memory accesses to objects which may escape. |
| + |
| +Relaxed ordering could be used instead, but for the first release it is |
| +more conservative to apply sequential consistency. Future releases may |
| +change what happens at compile-time, but already-released pexes will |
| +continue using sequential consistency. |
| + |
| +The PNaCl toolchain also requires that ``volatile`` accesses be at least |
| +naturally aligned, and tries to guarantee this alignment. |
| + |
| +The above guarantees ease the support of legacy (i.e. non-C11/C++11) |
| +code, and combined with builtin fences these programs can do meaningful |
| +cross-thread communication without changing code. They also better |
| +reflect the original code's intent and guarantee better portability. |
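Here is a sketch of the legacy pattern this section has in mind: a ``volatile`` flag combined with the `__sync_synchronize()` builtin fence. The names are hypothetical. Under PNaCl, the ``volatile`` accesses become sequentially consistent atomics, so this hand-off is ordered as described above. A native C11 compiler makes no such promise: there the flag access is formally a data race, and the sketch is shown only to illustrate the code PNaCl is trying to keep working.

```c
#include <assert.h>
#include <pthread.h>

static int legacy_data;              /* plain payload */
static volatile int legacy_ready;    /* legacy-style volatile flag */

static void *legacy_writer(void *arg) {
    (void)arg;
    legacy_data = 7;
    __sync_synchronize();   /* full fence: data write before flag write */
    legacy_ready = 1;
    return NULL;
}

int legacy_handoff(void) {
    pthread_t t;
    legacy_data = 0;
    legacy_ready = 0;
    int rc = pthread_create(&t, NULL, legacy_writer, NULL);
    assert(rc == 0);
    while (!legacy_ready)
        ;                   /* volatile read is re-executed every iteration */
    __sync_synchronize();   /* full fence: flag read before data read */
    int value = legacy_data;
    rc = pthread_join(t, NULL);
    assert(rc == 0);
    (void)rc;
    return value;
}
```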
| + |
| +Threading |
| +========= |
| + |
| +Threading is explicitly supported through C11/C++11's threading |
| +libraries as well as POSIX threads. |
| + |
| +Communication between threads should use atomic primitives as described |
| +in `Memory Model and Atomics`_. |
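As a minimal sketch, the POSIX threads code below has worker threads communicate only through an atomic counter. The names `total`, `worker` and `run_workers`, and the thread and iteration counts, are hypothetical. No lock is needed because every shared access goes through an atomic primitive. `pthread_join` makes each worker's updates visible to the joining thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 10000

static atomic_long total;

/* Each worker bumps the shared counter with an atomic read-modify-write. */
static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; ++i)
        atomic_fetch_add(&total, 1);
    return NULL;
}

long run_workers(void) {
    pthread_t threads[NUM_THREADS];
    atomic_store(&total, 0);
    for (int i = 0; i < NUM_THREADS; ++i) {
        int rc = pthread_create(&threads[i], NULL, worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_THREADS; ++i) {
        int rc = pthread_join(threads[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    return atomic_load(&total);
}
```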
| + |
| Inline Assembly |
| =============== |
| Inline assembly isn't supported by PNaCl because it isn't portable. The |
| one current exception is the common compiler barrier idiom |
| ``asm("":::"memory")``, which gets transformed to a sequentially |
| -consistent memory barrier (equivalent to ``__sync_synchronize()``). |
| +consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
| +PNaCl this barrier is only guaranteed to order ``volatile`` and atomic |
| +memory accesses, though in practice the implementation attempts to also |
| +prevent reordering of memory accesses to objects which may escape. |
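The sketch below shows the barrier idiom next to its explicit equivalent. `COMPILER_BARRIER`, `publish_pair` and `publish_pair_fenced` are hypothetical names. The text writes the idiom as `asm("":::"memory")`. The sketch spells it `__asm__ __volatile__` so it also compiles in strict ISO C modes, which is an assumption about the reader's build flags. With native GCC/Clang this form only stops the compiler from reordering memory accesses across it. Under PNaCl it becomes a sequentially consistent fence, which is only guaranteed to order ``volatile`` and atomic accesses.

```c
#include <assert.h>

/* The one inline-assembly form PNaCl accepts. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Store two values with the barrier idiom between them. */
void publish_pair(int *first, int *second, int a, int b) {
    *first = a;
    COMPILER_BARRIER();
    *second = b;
}

/* Same shape using the builtin fence PNaCl treats as equivalent. */
void publish_pair_fenced(int *first, int *second, int a, int b) {
    *first = a;
    __sync_synchronize();
    *second = b;
}
```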
| + |
| +Future Direction |
|
eliben
2013/08/06 16:11:35
"Future Directions" ?
JF
2013/08/06 16:19:33
Done.
|
| +================ |
| + |
| +Inter-Process Communication |
| +--------------------------- |
| + |
| +Inter-process communication through shared memory is currently not |
| +supported by PNaCl. When implemented, it may be limited to operations |
| +which are lock-free on the current platform (``is_lock_free`` methods). |
| + |
| +Signal Handling |
| +--------------- |
| + |
| +Untrusted signal handling currently isn't supported by PNaCl. When |
| +supported, the impact of ``volatile`` and atomics for same-thread signal |
| +handling will need to be carefully detailed. |