Index: docs/PNaClDeveloperGuide.rst |
diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst |
index 9c27ae5c14cb7a2215ee6f9363080c5d4f5b1449..a66e35109913b27176f706c033be05a52360331c 100644 |
--- a/docs/PNaClDeveloperGuide.rst |
+++ b/docs/PNaClDeveloperGuide.rst |
@@ -14,126 +14,142 @@ TODO |
Memory Model and Atomics |
======================== |
-Volatile Memory Accesses |
------------------------- |
- |
-The C11/C++11 standards mandate that ``volatile`` accesses execute in program |
-order (but are not fences, so other memory operations can reorder around them), |
-are not necessarily atomic, and can’t be elided. They can be separated into |
-smaller width accesses. |
- |
-The PNaCl toolchain applies regular LLVM optimizations along these guidelines, |
-and it further prevents any load/store (even non-``volatile`` and non-atomic |
-ones) from moving above or below a volatile operations: they act as compiler |
-barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` |
-accesses after optimizations into atomic accesses with sequentially consistent |
-memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and |
-combined with builtin fences these programs can do meaningful cross-thread |
-communication without changing code. It also reflects the original code's intent |
-and guarantees better portability. |
- |
-Relaxed ordering could be used instead, but for the first release it is more |
-conservative to apply sequential consistency. Future releases may change what |
-happens at compile-time, but already-released pexes will continue using |
-sequential consistency. |
- |
-The PNaCl toolchain also requires that ``volatile`` accesses be at least |
-naturally aligned, and tries to guarantee this alignment. |
- |
Memory Model for Concurrent Operations |
-------------------------------------- |
-The memory model offered by PNaCl relies on the same coding guidelines as the |
-C11/C++11 one: concurrent accesses must always occur through atomic primitives |
-(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and |
-these accesses must always occur with the same size for the same memory |
-location. Visibility of stores is provided on a happens-before basis that |
-relates memory locations to each other as the C11/C++11 standards do. |
- |
-As in C11/C++11 some atomic accesses may be implemented with locks on certain |
-platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying |
-that all types are sometimes lock-free. The ``is_lock_free`` methods will return |
-the current platform's implementation at translation time. |
- |
-The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style |
-``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. |
-``volatile`` memory accesses can also be used, though these are discouraged, and |
-aren't present in bitcode. |
+The memory model offered by PNaCl relies on the same coding guidelines |
+as the C11/C++11 one: concurrent accesses must always occur through |
+atomic primitives (offered by `atomic intrinsics |
+<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always |
+occur with the same size for the same memory location. Visibility of |
+stores is provided on a happens-before basis that relates memory |
+locations to each other as the C11/C++11 standards do. |
+ |
+As in C11/C++11, some atomic accesses may be implemented with locks on |
+certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be |
+``1``, signifying that all types are sometimes lock-free. The |
+``is_lock_free`` methods and ``atomic_is_lock_free`` will report whether |
+the current platform's implementation is lock-free, as determined at |
+translation time. These macros, methods and functions are in the C11 |
+header ``<stdatomic.h>`` and the C++11 header ``<atomic>``. |
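The lock-free queries described above can be sketched in C11 roughly as follows. This is only an illustration: the helper names (``int_lock_free_level``, ``counter_is_lock_free``, ``check_lock_free_consistency``) are hypothetical and not part of the patch. On a non-PNaCl toolchain the macro may well evaluate to ``2`` rather than ``1``.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: compile-time view of lock-freedom for atomic_int.
   0 = never, 1 = sometimes, 2 = always lock-free. The text above says
   PNaCl reports 1; other toolchains commonly report 2. */
int int_lock_free_level(void) {
    return ATOMIC_INT_LOCK_FREE;
}

/* Hypothetical helper: asks about one specific object, which is the
   query the text says is answered for the current platform. */
bool counter_is_lock_free(const atomic_int *counter) {
    return atomic_is_lock_free(counter);
}

/* The two views must agree at the extremes: "always" implies true,
   "never" implies false. A level of 1 promises nothing either way. */
void check_lock_free_consistency(const atomic_int *counter) {
    if (int_lock_free_level() == 2)
        assert(counter_is_lock_free(counter));
    if (int_lock_free_level() == 0)
        assert(!counter_is_lock_free(counter));
}
```

Code that only works when operations are lock-free (for example, anything touching memory shared outside the module) would presumably check the per-object query rather than rely on the macro, since a value of ``1`` leaves the answer open.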
+ |
+The PNaCl toolchain supports concurrent memory accesses through legacy |
+GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic |
+primitives. ``volatile`` memory accesses can also be used, though these |
+are discouraged, and aren't present in bitcode. See `Volatile Memory |
eliben
2013/08/05 18:35:54
Remove the "aren't present in bitcode" - i don't t
jvoung (off chromium)
2013/08/05 18:59:41
Yeah, this is the summary for developers. What is
JF
2013/08/05 20:37:48
Done.
|
+Accesses`_. |
PNaCl supports concurrency and parallelism with some restrictions: |
-* Threading is explicitly supported. |
+* Threading is explicitly supported through C11/C++11's threading |
+ libraries as well as POSIX threads. |
-* Inter-process communication through shared memory is limited to operations |
- which are lock-free on the current platform (``is_lock_free`` methods). This |
- may change at a later date. |
+* Inter-process communication through shared memory is limited to |
eliben
2013/08/05 18:35:54
What does inter-process communication even mean in
JF
2013/08/05 20:37:48
Done.
|
+ operations which are lock-free on the current platform |
+ (``is_lock_free`` methods). This may change at a later date. |
* Direct interaction with device memory isn't supported. |
-* Signal handling isn't supported, PNaCl therefore promotes all primitives to |
- cross-thread (instead of single-thread). This may change at a later date. Note |
- that using atomic operations which aren't lock-free may lead to deadlocks when |
- handling asynchronous signals. |
+* Signal handling isn't supported; PNaCl therefore promotes all |
+ primitives to cross-thread (instead of single-thread). This may change |
+ at a later date. Note that using atomic operations which aren't |
+ lock-free may lead to deadlocks when handling asynchronous signals. |
-* ``volatile`` and atomic operations are address-free (operations on the same |
- memory location via two different addresses work atomically), as intended by |
- the C11/C++11 standards. This is critical for inter-process communication as |
- well as synchronous "external modifications" such as mapping underlying memory |
- at multiple locations. |
- |
-Setting up the above mechanisms requires assistance from the embedding sandbox's |
-runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through |
-regular C/C++ code. |
- |
-The PNaCl toolchain currently optimizes for memory ordering as LLVM normally |
-does, but at pexe creation time it promotes all ``volatile`` accesses as well as |
-all atomic accesses to be sequentially consistent. Other memory orderings will |
-be supported in a future release, but pexes generated with the current toolchain |
-will continue functioning with sequential consistency. Using sequential |
-consistency provides a total ordering for all sequentially-consistent operations |
-on all addresses. |
- |
-This means that ``volatile`` and atomic memory accesses can only be re-ordered |
-in some limited way before the pexe is created, and will act as fences for all |
-memory accesses (even non-atomic and non-``volatile``) after pexe creation. |
-Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence |
-intervenes), separated, elided or fused according to C and C++'s memory model |
-before the pexe is created as well as after its creation. |
+* ``volatile`` and atomic operations are address-free (operations on the |
+ same memory location via two different addresses work atomically), as |
+ intended by the C11/C++11 standards. This is critical for |
+ inter-process communication as well as synchronous "external |
+ modifications" such as mapping underlying memory at multiple |
+ locations. |
+ |
+Setting up the above mechanisms requires assistance from the embedding |
+sandbox's runtime (e.g. NaCl's Pepper APIs), but once they are set up, |
+they can be used through regular C/C++ code. |
Atomic Memory Ordering Constraints |
---------------------------------- |
-Atomics follow the same ordering constraints as in regular LLVM, but |
-all accesses are promoted to sequential consistency (the strongest |
-memory ordering) at pexe creation time. As more C11/C++11 code |
-allows us to understand performance and portability needs we intend |
-to support the full gamut of C11/C++11 memory orderings: |
+Atomics follow the same ordering constraints as in regular LLVM, but all |
+accesses are promoted to sequential consistency (the strongest memory |
+ordering) at pexe creation time. As more C11/C++11 code allows us to |
jvoung (off chromium)
2013/08/05 18:59:41
Should the memory orderings change also be done by
JF
2013/08/05 20:37:48
I think the current implementation should offer a
|
+understand performance and portability needs, we intend to support the |
+full gamut of C11/C++11 memory orderings: |
- Relaxed: no operation orders memory. |
eliben
2013/08/05 18:35:54
Maybe this list does not belong here? This is user
JF
2013/08/05 20:37:48
This is not an addition, I just moved it around. I
|
-- Consume: a load operation performs a consume operation on the affected memory |
- location (currently unsupported by LLVM). |
-- Acquire: a load operation performs an acquire operation on the affected memory |
- location. |
-- Release: a store operation performs a release operation on the affected memory |
- location. |
+- Consume: a load operation performs a consume operation on the affected |
+ memory location (currently unsupported by LLVM). |
+- Acquire: a load operation performs an acquire operation on the |
+ affected memory location. |
+- Release: a store operation performs a release operation on the |
+ affected memory location. |
- Acquire-release: load and store operations perform acquire and release |
operations on the affected memory. |
-- Sequentially consistent: same as acquire-release, but providing a global total |
- ordering for all affected locations. |
+- Sequentially consistent: same as acquire-release, but providing a |
+ global total ordering for all affected locations. |
As in C11/C++11: |
- Atomic accesses must at least be naturally aligned. |
-- Some accesses may not actually be atomic on certain platforms, requiring an |
- implementation that uses a global lock. |
-- An atomic memory location must always be accessed with atomic primitives, and |
- these primitives must always be of the same bit size for that location. |
+- Some accesses may not actually be atomic on certain platforms, |
+ requiring an implementation that uses global lock(s). |
+- An atomic memory location must always be accessed with atomic |
+ primitives, and these primitives must always be of the same bit size |
+ for that location. |
- Not all memory orderings are valid for all atomic operations. |
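As a rough illustration of the ordering constraints listed above, the following C11 sketch publishes a plain value through a release store and reads it back through an acquire load. The names ``payload``, ``ready``, ``publish`` and ``try_consume`` are hypothetical. Per the text above, the toolchain would promote both orderings to sequential consistency at pexe creation time, which is stronger and keeps the code correct.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* plain data, ordered only via `ready` */
static atomic_bool ready;  /* always accessed atomically, same size */

/* Writer side: the release store orders the payload write before the
   flag becomes visible. Release is valid for stores, not for loads. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: the acquire load pairs with the release store, so a
   reader that sees ready == true also sees the payload write.
   Acquire is valid for loads, not for stores. */
bool try_consume(int *out) {
    assert(out != 0); /* caller must supply storage */
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

In real use ``publish`` and ``try_consume`` would run on different threads; the single flag is what establishes the happens-before edge between them.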
+Volatile Memory Accesses |
+------------------------ |
+ |
+The C11/C++11 standards mandate that ``volatile`` accesses execute in |
+program order (but are not fences, so other memory operations can |
+reorder around them), are not necessarily atomic, and can’t be |
+elided. They can be separated into smaller width accesses. |
+ |
+Before any optimizations occur, the PNaCl toolchain transforms |
+``volatile`` loads and stores into sequentially consistent ``volatile`` |
+atomic loads and stores, and applies regular LLVM optimizations along |
+the above guidelines. This orders ``volatile`` accesses according to |
+the atomic rules, and means that fences (including |
+``__sync_synchronize``) act in a better-defined manner. Regular memory |
+accesses still have no ordering guarantees relative to ``volatile`` and |
+atomic accesses, though the internal representation of |
+``__sync_synchronize`` attempts to prevent reordering of memory |
+accesses to objects which may escape. |
+ |
+Relaxed ordering could be used instead, but for the first release it is |
+more conservative to apply sequential consistency. Future releases may |
+change what happens at compile-time, but already-released pexes will |
+continue using sequential consistency. |
+ |
+The PNaCl toolchain also requires that ``volatile`` accesses be at least |
+naturally aligned, and tries to guarantee this alignment. |
+ |
+The above guarantees ease the support of legacy (i.e. non-C11/C++11) |
+code, and combined with builtin fences these programs can do meaningful |
+cross-thread communication without changing code. They also better |
+reflect the original code's intent and guarantee better portability. |
+ |
+Stable Transfer Format |
+---------------------- |
+ |
+The PNaCl toolchain freezes atomic and ``volatile`` memory accesses |
+after optimizations into atomic accesses with sequentially consistent |
jvoung (off chromium)
2013/08/05 18:59:41
re "after optimizations": volatiles get converted
JF
2013/08/05 20:37:48
OK, I can remove this section.
|
+memory ordering. Other memory orderings will be exposed in future |
+releases, once we better understand existing code's needs and |
+portability implications, and are confident that implementation limits |
+have been overcome. Future releases may change what happens at |
+compile-time, but already-released pexes will continue using |
+sequential consistency. |
+ |
+Non-atomic and non-``volatile`` memory accesses may be reordered, |
+separated, elided or fused according to C and C++'s memory model before |
+the pexe is created as well as after its creation. |
+ |
Inline Assembly |
=============== |
Inline assembly isn't supported by PNaCl because it isn't portable. The |
one current exception is the common compiler barrier idiom |
``asm("":::"memory")``, which gets transformed to a sequentially |
-consistent memory barrier (equivalent to ``__sync_synchronize()``). |
+consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
+PNaCl this barrier is only guaranteed to order ``volatile`` and atomic |
+memory accesses, though in practice the implementation attempts to also |
+prevent reordering of memory accesses to objects which may escape. |
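A minimal sketch of the legacy pattern this paragraph has in mind, assuming GCC/Clang extensions: ``volatile`` variables combined with the compiler barrier idiom and ``__sync_synchronize()``. The names ``legacy_flag``, ``legacy_data``, ``legacy_publish`` and ``legacy_read`` are hypothetical. The barrier is written as ``__asm__ __volatile__`` so that it also compiles in strict ISO C modes; it is the same idiom as ``asm("":::"memory")``.

```c
#include <assert.h>

/* Both variables are volatile, so they fall within what the text says
   the barrier is guaranteed to order. */
static volatile int legacy_flag;
static volatile int legacy_data;

/* Writer: data first, then the barrier idiom, then the flag. */
void legacy_publish(int value) {
    legacy_data = value;
    __asm__ __volatile__("" ::: "memory"); /* compiler barrier idiom */
    legacy_flag = 1;
}

/* Reader: check the flag, fence, then read the data. Returns 1 and
   fills *out when data has been published, 0 otherwise. */
int legacy_read(int *out) {
    assert(out != 0); /* caller must supply storage */
    if (!legacy_flag)
        return 0;
    __sync_synchronize(); /* full barrier builtin named in the text */
    *out = legacy_data;
    return 1;
}
```

Had ``legacy_data`` been a plain ``int``, the text above implies its ordering relative to the barrier would be best-effort only, so new code is probably better served by C11/C++11 atomics.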