Chromium Code Reviews| Index: docs/PNaClLangRef.rst |
| diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst |
| index b1d39a7187806fc36a4dc8a54d38feb6441eb48d..488c61ba16a25ed4dfeb61c3220ce06170475cf8 100644 |
| --- a/docs/PNaClLangRef.rst |
| +++ b/docs/PNaClLangRef.rst |
| @@ -106,21 +106,99 @@ Volatile Memory Accesses |
| `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
| -TODO: are we going to promote volatile to atomic? |
| +The C and C++ standards mandate that ``volatile`` accesses execute in |
|
eliben
2013/06/27 19:32:24
The C11 and C++11 standards...
JF
2013/06/27 21:03:17
These restrictions are actually all in the previou
|
| +program order (but are not fences, so other memory operations can |
| +reorder around them), are not necessarily atomic, and can’t be elided or |
| +fused. |
| + |
| +The PNaCl toolchain applies regular LLVM optimizations along these |
| +guidelines, but prevents any load/store (even non-``volatile`` and |
| +non-atomic ones) from moving past a volatile operations: they act as |
|
eliben
2013/06/27 19:32:24
"past volatile operations"?
JF
2013/06/27 21:03:17
I mean: a regular load/store can't move above or b
|
| +compiler barriers before optimizations occur. The PNaCl toolchain |
| +freezes ``volatile`` accesses after optimizations into atomic accesses |
| +with sequential consistency memory ordering. This eases the support of |
|
eliben
2013/06/27 19:32:24
sequentially consistent memory ordering?
JF
2013/06/27 21:03:17
Done.
|
| +legacy (i.e. non-C11/C++11) code, and combined with builtin fences these |
| +programs can do meaningful cross-thread communication without changing |
| +code. It also reflects the original code's intent and guarantees better |
| +portability. |
| + |
| +Relaxed ordering could be used instead, but for the first release it is |
| +more conservative to apply sequential consistency. Future releases may |
| +change what happens at compile-time, but already-released pexes will |
| +continue using sequential consistency. |
| + |
| +The PNaCl toolchain also requires that ``volatile`` accesses be at least |
| +naturally aligned, and tries to guarantee this alignment. |
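For comparison, the behavior that frozen ``volatile`` accesses end up with can be approximated in portable C++11 by writing the flag as a sequentially consistent atomic directly. This is only a sketch, not PNaCl toolchain output; the names `payload`, `ready`, `publish` and `try_consume` are invented for the example.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names, chosen for this sketch only.
static int payload = 0;                 // ordinary, non-atomic data
static std::atomic<bool> ready(false);  // flag accessed with seq_cst ordering

// Writer side: fill in the data, then raise the flag.
void publish(int value) {
    payload = value;
    ready.store(true, std::memory_order_seq_cst);
}

// Reader side: copies the data out and returns true only once the flag is seen.
bool try_consume(int* out) {
    assert(out != nullptr);
    if (!ready.load(std::memory_order_seq_cst)) return false;
    *out = payload;
    return true;
}
```

Because the flag store and load are sequentially consistent, a reader that sees `ready == true` also sees the earlier write to `payload`.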
| Memory Model for Concurrent Operations |
| -------------------------------------- |
| `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
| -TODO. |
| +The memory model offered by PNaCl relies the same coding guidelines as |
|
eliben
2013/06/27 19:32:24
"relies on the same" ?
JF
2013/06/27 21:03:17
Done.
|
| +the C11/C++11 one: concurrent accesses must always occur through atomic |
| +primitives, and these accesses must always occur with the same size for |
| +the same memory locations. Visibility of stores is provided on a |
| +happens-before basis that relates memory locations to each other as the |
| +C11/C++11 standards do. |
| + |
| +As in C11/C++11 some atomic accesses may be implemented with locks on |
| +certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be |
| +``1``, signifying that all types are sometimes lock-free. The |
| +``is_lock_free`` methods will return the current platform's |
| +implementation at runtime. |
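The difference between the compile-time macros and the runtime query can be sketched in C++11 as follows. The helper names `lock_free_meaning` and `int_is_lock_free_here` are made up for this illustration; the macro values 0, 1 and 2 and their meanings come from the C11/C++11 standards.

```cpp
#include <atomic>
#include <cassert>
#include <string>

// Maps the value of an ATOMIC_*_LOCK_FREE macro to its C11/C++11 meaning.
std::string lock_free_meaning(int macro_value) {
    assert(macro_value >= 0 && macro_value <= 2);
    switch (macro_value) {
        case 0:  return "never lock-free";
        case 1:  return "sometimes lock-free";
        default: return "always lock-free";
    }
}

// Runtime query: what the platform executing this code actually provides.
bool int_is_lock_free_here() {
    std::atomic<int> probe(0);
    return probe.is_lock_free();
}
```

A value of ``1`` makes no promise either way at compile time, which is why portable code would have to call ``is_lock_free`` at runtime to learn what the host does.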
| + |
| +The PNaCl toolchain supports concurrent memory accesses through legacy |
| +GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic |
| +primitives. ``volatile`` memory accesses can also be used, though these |
| +are discouraged. |
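As a rough illustration of the two supported styles, the same atomic increment can be written with a legacy builtin or with a C++11 atomic. The wrapper names `legacy_fetch_add` and `modern_fetch_add` are hypothetical; `__sync_fetch_and_add` is the GCC/Clang builtin and `fetch_add` is the standard C++11 member.

```cpp
#include <atomic>
#include <cassert>

// Legacy GCC-style builtin (GCC and Clang); returns the value before the add.
int legacy_fetch_add(int* counter, int amount) {
    assert(counter != nullptr);
    return __sync_fetch_and_add(counter, amount);
}

// C++11 equivalent; also returns the value before the add.
int modern_fetch_add(std::atomic<int>& counter, int amount) {
    return counter.fetch_add(amount, std::memory_order_seq_cst);
}
```

Both forms act as full barriers here: the ``__sync_*`` builtins are documented as such, and the C++11 call requests sequential consistency explicitly.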
| + |
| +Note that PNaCl explicitly supports concurrency through threading and |
| +inter-process communication (shared memory), but doesn't support |
| +interacting with device memory. Setting these up requires assistance from |
| +the embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using |
| +them once set up can be done through regular C/C++ code. |
| + |
| +PNaCl also doesn't currently support signal handling, and therefore |
| +promotes all primitives to cross-thread (instead of single-thread). This |
| +may change at a later date. |
| + |
| +The PNaCl toolchain currently optimizes for memory ordering as LLVM |
| +normally does, but at pexe creation time it promotes all ``volatile`` |
| +accesses as well as all atomic accesses to be sequentially |
| +consistent. Other memory orderings will be supported in a future |
| +release, but pexes generated with the current toolchain will continue |
| +functioning with sequential consistency. Using sequential consistency |
| +provides a total ordering for all sequentially-consistent operations on |
| +all addresses. |
| + |
| +This means that ``volatile`` and atomic memory accesses can only be |
| +re-ordered in some limited way before the pexe is created, and will act |
| +as fences for all memory accesses (even non-atomic and non-``volatile``) |
| +after pexe creation. Non-atomic and non-``volatile`` memory accesses may |
| +be reordered (unless a fence intervenes), separated, elided or fused |
| +according to C and C++'s memory model before the pexe is created as well |
| +as after its creation. |
| Atomic Memory Ordering Constraints |
| ---------------------------------- |
| `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
| -TODO. |
| +Atomics follow the same ordering constraints as in regular LLVM, but all |
| +accesses are promoted to sequential consistency (the strongest memory |
| +ordering) at pexe creation time. We may relax these rules and honor the |
| +program's memory ordering constraints as more C11/C++11 code allows us |
| +to understand performance and portability needs. |
| + |
| +As in C11/C++11: |
| + |
| + - Atomic and volatile accesses must at least be naturally aligned. |
| + - Some accesses may not actually be atomic on certain platforms, |
| + requiring an implementation that uses a global lock. |
| + - An atomic memory location must always be accesses with atomic |
|
eliben
2013/06/27 19:32:24
accessed
JF
2013/06/27 21:03:17
Done.
|
| + primitives, and these primitives must always be of the same type for |
| + that location. |
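"Naturally aligned" here means that the address of an access is a multiple of the access size. A minimal check, with the hypothetical helper name `is_naturally_aligned`, could look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when `address` is a multiple of `size`, i.e. an access of `size`
// bytes at that address is naturally aligned.
bool is_naturally_aligned(const void* address, std::size_t size) {
    assert(size != 0);
    return reinterpret_cast<std::uintptr_t>(address) % size == 0;
}
```

For example, a 4-byte access at offset 4 of an 8-byte-aligned buffer is naturally aligned, while an 8-byte access at the same offset is not.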
| Fast-Math Flags |
| --------------- |
| @@ -270,14 +348,6 @@ Only the LLVM instructions listed here are supported by PNaCl bitcode. |
| The pointer argument of these instructions must be a *normalized* pointer |
| (see :ref:`pointer types <pointertypes>`). |
| -* ``fence`` |
| -* ``cmpxchg``, ``atomicrmw`` |
| - |
| - The pointer argument of these instructions must be a *normalized* pointer |
| - (see :ref:`pointer types <pointertypes>`). |
| - |
| - TODO(jfb): this may change |
| - |
| * ``trunc`` |
| * ``zext`` |
| * ``sext`` |
| @@ -316,8 +386,6 @@ Intrinsic Functions |
| The only intrinsics supported by PNaCl bitcode are the following. |
| -TODO(jfb): atomics |
| - |
| * ``llvm.memcpy`` |
| * ``llvm.memmove`` |
| * ``llvm.memset`` |
| @@ -346,3 +414,67 @@ TODO(jfb): atomics |
| TODO: describe |
| +* ``llvm.nacl.atomic.store`` |
| +* ``llvm.nacl.atomic.load`` |
| +* ``llvm.nacl.atomic.rmw`` |
| +* ``llvm.nacl.atomic.cmpxchg`` |
| +* ``llvm.nacl.atomic.fence`` |
| + |
| + These intrinsics allow PNaCl to support C11/C++11 style atomic |
| + operations as well as some legacy GCC-style ``__sync_*`` builtins |
| + while remaining stable as the LLVM codebase changes. The user isn't |
| + expected to use these intrinsics directly. |
| + |
| + :: |
| + |
| + declare void @llvm.nacl.atomic.store( |
| + iN* <dest>, iN <val>, i32 <memory_order>) |
| + declare iN @llvm.nacl.atomic.load( |
| + iN* <src>, i32 <memory_order>) |
| + declare iN @llvm.nacl.atomic.rmw( |
| + i32 <op>, iN* <loc>, iN <val>, i32 <memory_order>) |
| + declare iN @llvm.nacl.atomic.compare_exchange( |
|
eliben
2013/06/27 19:32:24
name mismatch with above
JF
2013/06/27 21:03:17
Done. I went with the C11/C++11 naming and forgot
|
| + iN* <loc>, iN <expected>, iN <desired>, |
| + i32 <memory_order_success>, i32 <memory_order_failure>) |
| + declare void @llvm.nacl.atomic.fence(i32 <memory_order>) |
| + |
| + Each of these intrinsic is overloaded on the ``iN`` argument. Integral |
|
eliben
2013/06/27 19:32:24
intrinsics
JF
2013/06/27 21:03:17
Done.
|
| + types of 8, 16, 32 and 64-bit width are currently supported for these |
| + ``iN`` arguments. |
| + |
| + The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following |
| + read-modify-write operations, from the general and arithmetic sections |
| + of the C11/C++11 standards: |
| + |
| + - ``add`` |
| + - ``sub`` |
| + - ``or`` |
| + - ``and`` |
| + - ``xor`` |
| + - ``exchange`` |
| + |
| + For all of these read-modify-write operations, the returned value is |
| + that at ``loc`` before the operation. The ``op`` argument must be a |
| + compile-time constant. |
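The "returns the value before the operation" rule can be modeled with C++11 atomics. This is a sketch of the semantics only, not of the intrinsic's implementation; the `RmwOp` enum and its enumerators are hypothetical stand-ins, and the real operation codes are not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical operation codes for this sketch only.
enum class RmwOp { Add, Sub, Or, And, Xor, Exchange };

// Applies `op` to `loc` atomically and returns the value held at `loc`
// before the operation. All calls default to seq_cst ordering.
std::uint32_t atomic_rmw(RmwOp op, std::atomic<std::uint32_t>& loc,
                         std::uint32_t val) {
    switch (op) {
        case RmwOp::Add:      return loc.fetch_add(val);
        case RmwOp::Sub:      return loc.fetch_sub(val);
        case RmwOp::Or:       return loc.fetch_or(val);
        case RmwOp::And:      return loc.fetch_and(val);
        case RmwOp::Xor:      return loc.fetch_xor(val);
        case RmwOp::Exchange: return loc.exchange(val);
    }
    assert(false && "unknown operation");
    return 0;
}
```

Starting from 6, an `Add` of 4 returns 6 and leaves 10 at the location; a following `Exchange` with 42 returns 10 and leaves 42.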
| + |
| + All atomic intrinsics also support C11/C++11 memory orderings, which |
| + must be compile-time constants: |
| + |
| + - Relaxed: no operation orders memory. |
| + - Consume: a load operation performs a consume operation on the |
| + affected memory location (currently unsupported by LLVM). |
| + - Acquire: a load operation performs an acquire operation on the |
| + affected memory location. |
| + - Release: a store operation performs a release operation on the |
| + affected memory location. |
| + - Acquire-release: load and store operations perform acquire and |
| + release operations on the affected memory. |
| + - Sequentially consistent: same as acquire-release, but providing a |
| + global total ordering for all affected locations. |
| + |
| + Note that PNaCl currently strengthens all memory ordering |
| + specifications to sequential consistency, the strongest form of memory |
| + ordering. |
| + |
| + Values for these operations and memory orderings are defined in |
| + llvm/IR/NaClIntrinsics.h. |
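The compare-and-exchange operation with separate success and failure orderings can be sketched with C++11 atomics. This assumes the ``iN`` result is the value observed at ``loc``, which the patch does not state explicitly; the function names are invented for the example. Both orderings are written as sequentially consistent to mirror the strengthening described above.

```cpp
#include <atomic>
#include <cassert>

// Returns the value observed at `loc`. The swap took place exactly when
// that value equals `expected`.
int compare_exchange(std::atomic<int>& loc, int expected, int desired) {
    // On failure, compare_exchange_strong overwrites the local `expected`
    // with the value it found; on success `expected` is left unchanged.
    loc.compare_exchange_strong(expected, desired,
                                std::memory_order_seq_cst,   // success
                                std::memory_order_seq_cst);  // failure
    return expected;
}

// Usage: the first call succeeds, the second fails and reports the
// current value without modifying the location.
void compare_exchange_example() {
    std::atomic<int> loc(5);
    assert(compare_exchange(loc, 5, 7) == 5);
    assert(compare_exchange(loc, 5, 9) == 7);
    assert(loc.load() == 7);
}
```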