Chromium Code Reviews| Index: docs/PNaClDeveloperGuide.rst |
| diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst |
| index 9c27ae5c14cb7a2215ee6f9363080c5d4f5b1449..a66e35109913b27176f706c033be05a52360331c 100644 |
| --- a/docs/PNaClDeveloperGuide.rst |
| +++ b/docs/PNaClDeveloperGuide.rst |
| @@ -14,126 +14,142 @@ TODO |
| Memory Model and Atomics |
| ======================== |
| -Volatile Memory Accesses |
| ------------------------- |
| - |
| -The C11/C++11 standards mandate that ``volatile`` accesses execute in program |
| -order (but are not fences, so other memory operations can reorder around them), |
| -are not necessarily atomic, and can’t be elided. They can be separated into |
| -smaller width accesses. |
| - |
| -The PNaCl toolchain applies regular LLVM optimizations along these guidelines, |
| -and it further prevents any load/store (even non-``volatile`` and non-atomic |
| -ones) from moving above or below a volatile operations: they act as compiler |
| -barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` |
| -accesses after optimizations into atomic accesses with sequentially consistent |
| -memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and |
| -combined with builtin fences these programs can do meaningful cross-thread |
| -communication without changing code. It also reflects the original code's intent |
| -and guarantees better portability. |
| - |
| -Relaxed ordering could be used instead, but for the first release it is more |
| -conservative to apply sequential consistency. Future releases may change what |
| -happens at compile-time, but already-released pexes will continue using |
| -sequential consistency. |
| - |
| -The PNaCl toolchain also requires that ``volatile`` accesses be at least |
| -naturally aligned, and tries to guarantee this alignment. |
| - |
| Memory Model for Concurrent Operations |
| -------------------------------------- |
| -The memory model offered by PNaCl relies on the same coding guidelines as the |
| -C11/C++11 one: concurrent accesses must always occur through atomic primitives |
| -(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and |
| -these accesses must always occur with the same size for the same memory |
| -location. Visibility of stores is provided on a happens-before basis that |
| -relates memory locations to each other as the C11/C++11 standards do. |
| - |
| -As in C11/C++11 some atomic accesses may be implemented with locks on certain |
| -platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying |
| -that all types are sometimes lock-free. The ``is_lock_free`` methods will return |
| -the current platform's implementation at translation time. |
| - |
| -The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style |
| -``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. |
| -``volatile`` memory accesses can also be used, though these are discouraged, and |
| -aren't present in bitcode. |
| +The memory model offered by PNaCl relies on the same coding guidelines |
| +as the C11/C++11 one: concurrent accesses must always occur through |
| +atomic primitives (offered by `atomic intrinsics |
| +<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always |
| +occur with the same size for the same memory location. Visibility of |
| +stores is provided on a happens-before basis that relates memory |
| +locations to each other as the C11/C++11 standards do. |
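As a rough illustration of these rules, the sketch below assumes a host C11 toolchain with POSIX threads. ``counter``, ``worker`` and ``run_counter`` are hypothetical names, not part of any PNaCl API. Every concurrent access to the shared location goes through a C11 atomic primitive, and always with the same size:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared counter: every concurrent access goes through an
 * atomic primitive, and always with the same size (atomic_int). */
static atomic_int counter;

enum { kThreads = 4, kIncrementsPerThread = 10000 };

static void *worker(void *arg) {
  (void)arg;
  for (int i = 0; i < kIncrementsPerThread; ++i)
    atomic_fetch_add(&counter, 1); /* atomic read-modify-write */
  return NULL;
}

/* Starts kThreads workers, waits for them, and returns the final count. */
int run_counter(void) {
  pthread_t threads[kThreads];
  atomic_store(&counter, 0);
  for (int i = 0; i < kThreads; ++i) {
    int rc = pthread_create(&threads[i], NULL, worker, NULL);
    assert(rc == 0);
    (void)rc;
  }
  for (int i = 0; i < kThreads; ++i)
    pthread_join(threads[i], NULL);
  return atomic_load(&counter);
}
```

Because no increment is lost, ``run_counter()`` returns ``kThreads * kIncrementsPerThread``.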
| + |
| +As in C11/C++11, some atomic accesses may be implemented with locks on |
| +certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be |
| +``1``, signifying that all types are sometimes lock-free. The |
| +``is_lock_free`` methods and ``atomic_is_lock_free`` report whether the |
| +current platform's implementation is lock-free, as determined at |
| +translation time. These macros, methods and functions are declared in |
| +the C11 header ``<stdatomic.h>`` and the C++11 header ``<atomic>``. |
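Here is a minimal sketch of querying lock-freedom through ``<stdatomic.h>``. The helper names are hypothetical. According to the text above, PNaCl defines the macro as ``1``, but a native toolchain may report ``2``. The sketch therefore relies only on the standard meaning of the values: ``0`` means never lock-free, ``1`` sometimes, and ``2`` always.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compile-time answer for the type: 0 = never, 1 = sometimes,
 * 2 = always lock-free. */
int int_lock_free_macro(void) { return ATOMIC_INT_LOCK_FREE; }

/* Run-time answer for one particular object of that type. */
bool int_is_lock_free(void) {
  static atomic_int probe;
  return atomic_is_lock_free(&probe);
}
```

If the macro reports ``2`` (always lock-free), the per-object query must also return ``true``.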
| + |
| +The PNaCl toolchain supports concurrent memory accesses through legacy |
| +GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic |
| +primitives. ``volatile`` memory accesses can also be used, though these |
| +are discouraged, and aren't present in bitcode. See `Volatile Memory |
|
eliben
2013/08/05 18:35:54
Remove the "aren't present in bitcode" - i don't t
jvoung (off chromium)
2013/08/05 18:59:41
Yeah, this is the summary for developers. What is
JF
2013/08/05 20:37:48
Done.
|
| +Accesses`_. |
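For comparison, the sketch below spells the same atomic increment two ways: with a legacy GCC-style ``__sync_*`` builtin and with a C11 atomic primitive. The helper names are hypothetical. Each function is meant to be used on its own memory location, accessed consistently with one style and one size.

```c
#include <stdatomic.h>

/* Legacy GCC-style builtin: atomically adds 1, returns the old value. */
int legacy_increment(int *p) {
  return __sync_fetch_and_add(p, 1);
}

/* C11 <stdatomic.h> primitive: atomically adds 1, returns the old value. */
int c11_increment(atomic_int *p) {
  return atomic_fetch_add(p, 1);
}
```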
| PNaCl supports concurrency and parallelism with some restrictions: |
| -* Threading is explicitly supported. |
| +* Threading is explicitly supported through C11/C++11's threading |
| + libraries as well as POSIX threads. |
| -* Inter-process communication through shared memory is limited to operations |
| - which are lock-free on the current platform (``is_lock_free`` methods). This |
| - may change at a later date. |
| +* Inter-process communication through shared memory is limited to |
|
eliben
2013/08/05 18:35:54
What does inter-process communication even mean in
JF
2013/08/05 20:37:48
Done.
|
| + operations which are lock-free on the current platform |
| + (``is_lock_free`` methods). This may change at a later date. |
| * Direct interaction with device memory isn't supported. |
| -* Signal handling isn't supported, PNaCl therefore promotes all primitives to |
| - cross-thread (instead of single-thread). This may change at a later date. Note |
| - that using atomic operations which aren't lock-free may lead to deadlocks when |
| - handling asynchronous signals. |
| +* Signal handling isn't supported; PNaCl therefore promotes all |
| + primitives to cross-thread (instead of single-thread) scope. This may |
| + change at a later date. Note that using atomic operations which aren't |
| + lock-free may lead to deadlocks when handling asynchronous signals. |
| -* ``volatile`` and atomic operations are address-free (operations on the same |
| - memory location via two different addresses work atomically), as intended by |
| - the C11/C++11 standards. This is critical for inter-process communication as |
| - well as synchronous "external modifications" such as mapping underlying memory |
| - at multiple locations. |
| - |
| -Setting up the above mechanisms requires assistance from the embedding sandbox's |
| -runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through |
| -regular C/C++ code. |
| - |
| -The PNaCl toolchain currently optimizes for memory ordering as LLVM normally |
| -does, but at pexe creation time it promotes all ``volatile`` accesses as well as |
| -all atomic accesses to be sequentially consistent. Other memory orderings will |
| -be supported in a future release, but pexes generated with the current toolchain |
| -will continue functioning with sequential consistency. Using sequential |
| -consistency provides a total ordering for all sequentially-consistent operations |
| -on all addresses. |
| - |
| -This means that ``volatile`` and atomic memory accesses can only be re-ordered |
| -in some limited way before the pexe is created, and will act as fences for all |
| -memory accesses (even non-atomic and non-``volatile``) after pexe creation. |
| -Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence |
| -intervenes), separated, elided or fused according to C and C++'s memory model |
| -before the pexe is created as well as after its creation. |
| +* ``volatile`` and atomic operations are address-free (operations on the |
| + same memory location via two different addresses work atomically), as |
| + intended by the C11/C++11 standards. This is critical for |
| + inter-process communication as well as synchronous "external |
| + modifications" such as mapping underlying memory at multiple |
| + locations. |
| + |
| +Setting up the above mechanisms requires assistance from the embedding |
| +sandbox's runtime (e.g. NaCl's Pepper APIs), but once they are set up |
| +they can be used through regular C/C++ code. |
| Atomic Memory Ordering Constraints |
| ---------------------------------- |
| -Atomics follow the same ordering constraints as in regular LLVM, but |
| -all accesses are promoted to sequential consistency (the strongest |
| -memory ordering) at pexe creation time. As more C11/C++11 code |
| -allows us to understand performance and portability needs we intend |
| -to support the full gamut of C11/C++11 memory orderings: |
| +Atomics follow the same ordering constraints as in regular LLVM, but all |
| +accesses are promoted to sequential consistency (the strongest memory |
| +ordering) at pexe creation time. As more C11/C++11 code allows us to |
|
jvoung (off chromium)
2013/08/05 18:59:41
Should the memory orderings change also be done by
JF
2013/08/05 20:37:48
I think the current implementation should offer a
|
| +understand performance and portability needs, we intend to support the |
| +full gamut of C11/C++11 memory orderings: |
| - Relaxed: no operation orders memory. |
|
eliben
2013/08/05 18:35:54
Maybe this list does not belong here? This is user
JF
2013/08/05 20:37:48
This is not an addition, I just moved it around. I
|
| -- Consume: a load operation performs a consume operation on the affected memory |
| - location (currently unsupported by LLVM). |
| -- Acquire: a load operation performs an acquire operation on the affected memory |
| - location. |
| -- Release: a store operation performs a release operation on the affected memory |
| - location. |
| +- Consume: a load operation performs a consume operation on the affected |
| + memory location (currently unsupported by LLVM). |
| +- Acquire: a load operation performs an acquire operation on the |
| + affected memory location. |
| +- Release: a store operation performs a release operation on the |
| + affected memory location. |
| - Acquire-release: load and store operations perform acquire and release |
| operations on the affected memory. |
| -- Sequentially consistent: same as acquire-release, but providing a global total |
| - ordering for all affected locations. |
| +- Sequentially consistent: same as acquire-release, but providing a |
| + global total ordering for all affected locations. |
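Acquire and release orderings from the list above are typically used for message passing. The sketch below shows this pattern with hypothetical names, assuming POSIX threads. Under the current toolchain, both orderings would be promoted to sequential consistency at pexe creation time. That is strictly stronger, so the sketch stays correct either way.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;       /* ordinary, non-atomic data */
static atomic_bool ready; /* publication flag */

static void *producer(void *arg) {
  (void)arg;
  payload = 42;
  /* Release: the write to payload happens-before any acquire load
   * that observes ready == true. */
  atomic_store_explicit(&ready, true, memory_order_release);
  return NULL;
}

/* Starts a producer thread and returns the payload it published. */
int consume_message(void) {
  pthread_t t;
  payload = 0;
  atomic_store(&ready, false);
  if (pthread_create(&t, NULL, producer, NULL) != 0)
    return -1;
  /* Acquire: once true is observed, the payload write is visible. */
  while (!atomic_load_explicit(&ready, memory_order_acquire))
    ; /* spin until the producer publishes */
  int value = payload;
  pthread_join(t, NULL);
  return value;
}
```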
| As in C11/C++11: |
| - Atomic accesses must at least be naturally aligned. |
| -- Some accesses may not actually be atomic on certain platforms, requiring an |
| - implementation that uses a global lock. |
| -- An atomic memory location must always be accessed with atomic primitives, and |
| - these primitives must always be of the same bit size for that location. |
| +- Some accesses may not actually be atomic on certain platforms, |
| + requiring an implementation that uses global lock(s). |
| +- An atomic memory location must always be accessed with atomic |
| + primitives, and these primitives must always be of the same bit size |
| + for that location. |
| - Not all memory orderings are valid for all atomic operations. |
| +Volatile Memory Accesses |
| +------------------------ |
| + |
| +The C11/C++11 standards mandate that ``volatile`` accesses execute in |
| +program order (but are not fences, so other memory operations can |
| +reorder around them), are not necessarily atomic, and can’t be |
| +elided. They can be separated into smaller-width accesses. |
| + |
| +Before any optimizations occur, the PNaCl toolchain transforms |
| +``volatile`` loads and stores into sequentially consistent ``volatile`` |
| +atomic loads and stores, and applies regular LLVM optimizations along |
| +the above guidelines. This orders ``volatile`` accesses according to |
| +the rules for atomics, and means that fences (including |
| +``__sync_synchronize``) act in a better-defined manner. Regular memory |
| +accesses still have no ordering guarantees relative to ``volatile`` and |
| +atomic accesses, though the internal representation of |
| +``__sync_synchronize`` attempts to prevent reordering of memory |
| +accesses to objects which may escape. |
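Roughly, the transformation described above makes a legacy ``volatile`` access behave like a C11 sequentially consistent access to a ``volatile`` atomic object. The sketch below writes both forms in source code for illustration only. The actual rewrite happens inside the toolchain, and the names are hypothetical.

```c
#include <stdatomic.h>

/* Legacy code, as written by the developer. */
static volatile int legacy_flag;

void legacy_set(void) { legacy_flag = 1; }
int legacy_get(void) { return legacy_flag; }

/* Approximate C11 spelling of what the accesses above become:
 * sequentially consistent atomic accesses to a volatile atomic object. */
static volatile atomic_int modern_flag;

void modern_set(void) {
  atomic_store_explicit(&modern_flag, 1, memory_order_seq_cst);
}

int modern_get(void) {
  return atomic_load_explicit(&modern_flag, memory_order_seq_cst);
}
```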
| + |
| +Relaxed ordering could be used instead, but for the first release it is |
| +more conservative to apply sequential consistency. Future releases may |
| +change what happens at compile time, but already-released pexes will |
| +continue using sequential consistency. |
| + |
| +The PNaCl toolchain also requires that ``volatile`` accesses be at least |
| +naturally aligned, and tries to guarantee this alignment. |
| + |
| +The above guarantees ease support for legacy (i.e. non-C11/C++11) |
| +code: combined with builtin fences, such programs can do meaningful |
| +cross-thread communication without code changes. The guarantees also |
| +better reflect the original code's intent and improve portability. |
| + |
| +Stable Transfer Format |
| +---------------------- |
| + |
| +The PNaCl toolchain freezes atomic and ``volatile`` memory accesses |
| +after optimizations into atomic accesses with sequentially consistent |
|
jvoung (off chromium)
2013/08/05 18:59:41
re "after optimizations": volatiles get converted
JF
2013/08/05 20:37:48
OK, I can remove this section.
|
| +memory ordering. Other memory orderings will be exposed in future |
| +releases, once we better understand existing code's needs and the |
| +portability implications, and are confident that implementation |
| +limits have been overcome. Future releases may change what happens at |
| +compile time, but already-released pexes will continue using |
| +sequential consistency. |
| + |
| +Non-atomic and non-``volatile`` memory accesses may be reordered, |
| +separated, elided or fused according to C and C++'s memory model before |
| +the pexe is created as well as after its creation. |
| + |
| Inline Assembly |
| =============== |
| Inline assembly isn't supported by PNaCl because it isn't portable. The |
| one current exception is the common compiler barrier idiom |
| ``asm("":::"memory")``, which gets transformed to a sequentially |
| -consistent memory barrier (equivalent to ``__sync_synchronize()``). |
| +consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
| +PNaCl, this barrier is only guaranteed to order ``volatile`` and atomic |
| +memory accesses, though in practice the implementation attempts to also |
| +prevent reordering of memory accesses to objects which may escape. |
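The sketch below shows the accepted barrier idiom next to its builtin equivalent. The names are hypothetical. ``__asm__`` and ``__volatile__`` are the GCC/Clang spellings of ``asm`` that also compile in strict ISO C modes. Both shared variables are ``volatile`` because the barrier is only guaranteed to order ``volatile`` and atomic accesses.

```c
static volatile int data;
static volatile int data_ready;

/* Compiler-barrier idiom; PNaCl turns it into a sequentially
 * consistent fence. */
void publish(int value) {
  data = value;
  __asm__ __volatile__("" ::: "memory");
  data_ready = 1;
}

/* Same intent, spelled with the equivalent full-barrier builtin. */
void publish_with_builtin(int value) {
  data = value;
  __sync_synchronize();
  data_ready = 1;
}

/* Returns the published value, or -1 if nothing was published yet. */
int read_published(void) {
  return data_ready ? data : -1;
}
```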