Index: docs/PNaClLangRef.rst
diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst
index e48bebc84dad554e9dea8a2dde47414c4688f0da..5918456b1849fd1b2d43cd3b47fec72c3311a0dc 100644
--- a/docs/PNaClLangRef.rst
+++ b/docs/PNaClLangRef.rst
@@ -113,21 +113,133 @@ Volatile Memory Accesses
`LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_

-TODO: are we going to promote volatile to atomic?
+PNaCl bitcode does not support volatile memory accesses.
+
+.. note::
+
+   The C11/C++11 standards mandate that ``volatile`` accesses execute
+   in program order (but are not fences, so other memory operations can
+   reorder around them), are not necessarily atomic, and can't be
+   elided or fused.
+
+   The PNaCl toolchain applies regular LLVM optimizations along these
+   guidelines, and it further prevents any load/store (even
+   non-``volatile`` and non-atomic ones) from moving above or below a
+   ``volatile`` operation: ``volatile`` operations act as compiler
+   barriers before optimizations occur. After optimizations, when the
+   pexe is created, the PNaCl toolchain converts ``volatile`` accesses
+   into atomic accesses with sequentially consistent memory ordering.
+   This eases support of legacy (i.e. non-C11/C++11) code: combined
+   with builtin fences, such programs can do meaningful cross-thread
+   communication without code changes. It also reflects the original
+   code's intent and improves portability.
+
+   Relaxed ordering could be used instead, but for the first release it
+   is more conservative to apply sequential consistency. Future
+   releases may change what happens at compile-time, but
+   already-released pexes will continue using sequential consistency.
+
+   The PNaCl toolchain also requires that ``volatile`` accesses be at
+   least naturally aligned, and tries to guarantee this alignment.
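+
+As an illustration, here is a minimal C sketch of this conversion. It
+assumes a GCC- or Clang-compatible compiler. The hypothetical
+``legacy_*`` functions use plain ``volatile`` accesses. The
+hypothetical ``explicit_*`` functions use the ``__atomic_*`` builtins
+to spell out roughly the sequentially consistent accesses that such
+``volatile`` accesses become in a pexe. This is an analogy for the
+semantics, not the toolchain's literal output.
+

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flags, used only for illustration. */
static volatile int32_t legacy_flag = 0;
static int32_t explicit_flag = 0;

/* Legacy (pre-C11) style: plain volatile accesses. The PNaCl toolchain
   keeps these in program order and, at pexe creation, turns them into
   sequentially consistent atomic accesses. */
void legacy_publish(void) { legacy_flag = 1; }
int32_t legacy_poll(void) { return legacy_flag; }

/* Roughly equivalent explicit form, using the GCC/Clang __atomic builtins
   with the strongest (sequentially consistent) memory ordering. */
void explicit_publish(void) {
  __atomic_store_n(&explicit_flag, 1, __ATOMIC_SEQ_CST);
}
int32_t explicit_poll(void) {
  return __atomic_load_n(&explicit_flag, __ATOMIC_SEQ_CST);
}
```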

Memory Model for Concurrent Operations
--------------------------------------

`LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_

-TODO.
+The memory model offered by PNaCl relies on the same coding guidelines
+as the C11/C++11 one: concurrent accesses must always occur through
+atomic primitives, and these accesses must always occur with the same
+size for the same memory location. Visibility of stores is provided on
+a happens-before basis that relates memory locations to each other, as
+the C11/C++11 standards do.
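+
+Here is a minimal C11 sketch of these guidelines. It assumes a compiler
+that provides ``<stdatomic.h>``. The names ``payload``, ``ready``,
+``producer`` and ``consumer`` are hypothetical. The shared ``ready``
+flag is only ever accessed through 32-bit atomic operations. The
+non-atomic ``payload`` is only read after ``ready`` is observed as
+set, so the happens-before relationship established through the flag
+makes the ``payload`` store visible to the reader.
+

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical shared state. 'payload' is ordinary memory; 'ready' is an
   atomic flag that is always accessed as a 32-bit atomic, never through a
   wider, narrower or non-atomic access. */
static int32_t payload;
static _Atomic int32_t ready;

/* Producer side: write the data, then publish it with a sequentially
   consistent atomic store (the default ordering for atomic_store). */
void producer(int32_t value) {
  payload = value;
  atomic_store(&ready, 1);
}

/* Consumer side: returns false until the flag is set; once the atomic
   load observes the store above, the earlier payload write is visible. */
bool consumer(int32_t *out) {
  if (atomic_load(&ready) == 0) {
    return false;
  }
  *out = payload;
  return true;
}
```

+In a real program, ``producer`` and ``consumer`` would run on different
+threads, and the consumer would retry until it returns ``true``.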
+
+PNaCl bitcode requires all concurrency to occur through `atomic
+intrinsics`_.
+
+.. note::
+
+   As in C11/C++11, some atomic accesses may be implemented with locks
+   on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always
+   be ``1``, signifying that all types are sometimes lock-free. The
+   ``is_lock_free`` methods will report at runtime whether the current
+   platform's implementation is lock-free.
+
+   The PNaCl toolchain supports concurrent memory accesses through
+   legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11
+   atomic primitives. ``volatile`` memory accesses can also be used,
+   though this is discouraged, and they don't appear as ``volatile``
+   in bitcode.
+
+   PNaCl supports concurrency and parallelism with some restrictions:
+
+   * Threading is explicitly supported.
+   * Inter-process communication through shared memory is limited to
+     operations which are lock-free on the current platform
+     (``is_lock_free`` methods). This may change at a later date.
+   * Direct interaction with device memory isn't supported.
+   * Signal handling isn't supported; PNaCl therefore promotes all
+     primitives to cross-thread (instead of single-thread)
+     synchronization scope. This may change at a later date.
+
+   Setting up the above mechanisms requires assistance from the
+   embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but once set
+   up they can be used through regular C/C++ code.
+
+   The PNaCl toolchain currently optimizes for memory ordering as LLVM
+   normally does, but at pexe creation time it promotes all
+   ``volatile`` accesses as well as all atomic accesses to be
+   sequentially consistent. Other memory orderings will be supported in
+   a future release, but pexes generated with the current toolchain
+   will continue functioning with sequential consistency. Using
+   sequential consistency provides a total ordering for all
+   sequentially-consistent operations on all addresses.
+
+   This means that ``volatile`` and atomic memory accesses can only be
+   reordered in limited ways before the pexe is created, and will act
+   as fences for all memory accesses (even non-atomic and
+   non-``volatile``) after pexe creation. Non-atomic and
+   non-``volatile`` memory accesses may be reordered (unless a fence
+   intervenes), separated, elided or fused according to the C and C++
+   memory models, both before and after the pexe is created.
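+
+Here is a minimal C11 sketch of querying lock-freedom. It assumes
+``<stdatomic.h>``, and the helper names are hypothetical. In C, the
+runtime query is the ``atomic_is_lock_free`` function rather than the
+C++ ``is_lock_free`` methods. Under PNaCl the compile-time macro would
+be ``1``, so only the runtime query can tell whether a location is
+usable for inter-process communication through shared memory. On other
+toolchains the macro may also be ``0`` or ``2``.
+

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Compile-time answer: 0 = never, 1 = sometimes, 2 = always lock-free.
   Under PNaCl this is always 1 ("sometimes"). */
int compile_time_lock_free_level(void) { return ATOMIC_INT_LOCK_FREE; }

/* Hypothetical helper: decides at run time whether an atomic int could be
   shared with another process, which PNaCl only permits for operations
   that are lock-free on the current platform. */
bool shared_int_usable_across_processes(void) {
  _Atomic int probe = 0;
  return atomic_is_lock_free(&probe);
}
```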

Atomic Memory Ordering Constraints
----------------------------------

`LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_

-TODO.
+PNaCl bitcode currently supports sequential consistency only, through
+its `atomic intrinsics`_.
+
+.. note::
+
+   Atomics follow the same ordering constraints as in regular LLVM, but
+   all accesses are promoted to sequential consistency (the strongest
+   memory ordering) at pexe creation time. As more C11/C++11 code
+   allows us to understand performance and portability needs, we intend
+   to support the full gamut of C11/C++11 memory orderings:
+
+   - Relaxed: no operation orders memory.
+   - Consume: a load operation performs a consume operation on the
+     affected memory location (currently unsupported by LLVM).
+   - Acquire: a load operation performs an acquire operation on the
+     affected memory location.
+   - Release: a store operation performs a release operation on the
+     affected memory location.
+   - Acquire-release: load and store operations perform acquire and
+     release operations on the affected memory location.
+   - Sequentially consistent: same as acquire-release, but providing
+     a global total ordering for all affected locations.
+
+   As in C11/C++11:
+
+   - Atomic and ``volatile`` accesses must at least be naturally aligned.
+   - Some accesses may not actually be atomic on certain platforms,
+     requiring an implementation that uses a global lock.
+   - An atomic memory location must always be accessed with atomic
+     primitives, and these primitives must always be of the same bit
+     size for that location.
+   - Not all memory orderings are valid for all atomic operations.
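+
+Here is a minimal C11 sketch of requesting specific orderings. It
+assumes ``<stdatomic.h>``; ``slot`` and the functions are hypothetical.
+The source asks for acquire, release and relaxed orderings. The PNaCl
+toolchain accepts these and then promotes them to sequential
+consistency at pexe creation. The comments note which orderings C11
+forbids for each operation, which is one instance of not all memory
+orderings being valid for all atomic operations.
+

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared location. In a pexe, every access below would be
   promoted to sequential consistency regardless of the ordering named. */
static _Atomic int slot;

/* Release store (C11 forbids acquire, consume and acq_rel for stores). */
void publish(int v) { atomic_store_explicit(&slot, v, memory_order_release); }

/* Acquire load (C11 forbids release and acq_rel for loads). */
int observe(void) { return atomic_load_explicit(&slot, memory_order_acquire); }

/* Relaxed read-modify-write: atomic, but orders no other memory. Returns
   the value held before the addition. */
int bump(void) {
  return atomic_fetch_add_explicit(&slot, 1, memory_order_relaxed);
}

/* Compare-and-exchange with separate success and failure orderings; the
   failure ordering may not be release or acq_rel. */
bool try_replace(int expected, int desired) {
  return atomic_compare_exchange_strong_explicit(
      &slot, &expected, desired, memory_order_acq_rel, memory_order_acquire);
}
```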

Fast-Math Flags
---------------
@@ -277,14 +389,6 @@ Only the LLVM instructions listed here are supported by PNaCl bitcode.
  The pointer argument of these instructions must be a *normalized* pointer
  (see :ref:`pointer types <pointertypes>`).

-* ``fence``
-* ``cmpxchg``, ``atomicrmw``
-
-  The pointer argument of these instructions must be a *normalized* pointer
-  (see :ref:`pointer types <pointertypes>`).
-
-  TODO(jfb): this may change
-
* ``trunc``
* ``zext``
* ``sext``
@@ -323,8 +427,6 @@ Intrinsic Functions
The only intrinsics supported by PNaCl bitcode are the following.

-TODO(jfb): atomics
-
* ``llvm.memcpy``
* ``llvm.memmove``
* ``llvm.memset``
@@ -359,3 +461,56 @@ TODO(jfb): atomics
TODO: describe

+.. _atomic intrinsics:
+
+* ``llvm.nacl.atomic.store``
+* ``llvm.nacl.atomic.load``
+* ``llvm.nacl.atomic.rmw``
+* ``llvm.nacl.atomic.cmpxchg``
+* ``llvm.nacl.atomic.fence``
+
+  .. code-block:: llvm
+
+      declare iN @llvm.nacl.atomic.load(
+          iN* <source>, i32 <memory_order>)
+      declare void @llvm.nacl.atomic.store(
+          iN <operand>, iN* <destination>, i32 <memory_order>)
+      declare iN @llvm.nacl.atomic.rmw(
+          i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>)
+      declare iN @llvm.nacl.atomic.cmpxchg(
+          iN* <object>, iN <expected>, iN <desired>,
+          i32 <memory_order_success>, i32 <memory_order_failure>)
+      declare void @llvm.nacl.atomic.fence(i32 <memory_order>)
+
+  Each of these intrinsics, except ``@llvm.nacl.atomic.fence``, is
+  overloaded on the ``iN`` argument type. Integral types of 8, 16, 32
+  and 64-bit width are supported for these ``iN`` arguments.
+
+  The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following
+  read-modify-write operations, from the general and arithmetic sections
+  of the C11/C++11 standards:
+
+  - ``add``
+  - ``sub``
+  - ``or``
+  - ``and``
+  - ``xor``
+  - ``exchange``
+
+  For all of these read-modify-write operations, the returned value is
+  the value held at ``object`` before the computation. The
+  ``computation`` argument must be a compile-time constant.
+
+  All atomic intrinsics also support C11/C++11 memory orderings, which
+  must be compile-time constants. Those are detailed in `Atomic Memory
+  Ordering Constraints`_.
+
+  Integer values for these computations and memory orderings are defined
+  in ``"llvm/IR/NaClIntrinsics.h"``.
+
+  .. note::
+
+     These intrinsics allow PNaCl to support C11/C++11 style atomic
+     operations as well as some legacy GCC-style ``__sync_*`` builtins
+     while remaining stable as the LLVM codebase changes. The user
+     isn't expected to use these intrinsics directly.
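+
+Here is a minimal C sketch of the read-modify-write and
+compare-and-exchange semantics described above. It assumes a GCC- or
+Clang-compatible compiler with ``<stdatomic.h>``; the object and helper
+names are hypothetical. Each C11 call performs the same kind of
+operation as one of the listed ``computation`` values. Like the legacy
+``__sync_fetch_and_add`` builtin, each call returns the value the object
+held before the computation. The hypothetical ``cas`` helper returns the
+value it observed. This mirrors the ``iN`` result in the
+``@llvm.nacl.atomic.cmpxchg`` declaration, which is assumed here to be
+the value read. How the toolchain lowers these calls to the intrinsics
+is not shown.
+

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 32-bit objects, used only for illustration. */
static _Atomic uint32_t obj = 10;
static uint32_t legacy_obj = 10;

/* C11 read-modify-write calls; each returns the value held before it. */
uint32_t rmw_add(uint32_t v) { return atomic_fetch_add(&obj, v); }
uint32_t rmw_sub(uint32_t v) { return atomic_fetch_sub(&obj, v); }
uint32_t rmw_or(uint32_t v) { return atomic_fetch_or(&obj, v); }
uint32_t rmw_and(uint32_t v) { return atomic_fetch_and(&obj, v); }
uint32_t rmw_xor(uint32_t v) { return atomic_fetch_xor(&obj, v); }
uint32_t rmw_exchange(uint32_t v) { return atomic_exchange(&obj, v); }

/* Compare-and-exchange: stores 'desired' only if 'obj' equals 'expected',
   and returns the value it observed in 'obj' either way. */
uint32_t cas(uint32_t expected, uint32_t desired) {
  /* On failure, C11 writes the current value back into 'expected';
     on success, 'expected' already equals the value that was replaced. */
  atomic_compare_exchange_strong(&obj, &expected, desired);
  return expected;
}

/* Legacy GCC-style builtin with the same return-the-prior-value rule. */
uint32_t legacy_add(uint32_t v) { return __sync_fetch_and_add(&legacy_obj, v); }
```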