Chromium Code Reviews| Index: docs/PNaClLangRef.rst |
| diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst |
| index b1d39a7187806fc36a4dc8a54d38feb6441eb48d..2c7f7f9743a2a353ec5e8675e82e745da26ee38c 100644 |
| --- a/docs/PNaClLangRef.rst |
| +++ b/docs/PNaClLangRef.rst |
| @@ -106,21 +106,128 @@ Volatile Memory Accesses |
| `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
| -TODO: are we going to promote volatile to atomic? |
| +PNaCl bitcode does not support volatile memory accesses. |
| + |
| +.. note:: |
| + |
| + The C11/C++11 standards mandate that ``volatile`` accesses execute |
| + in program order (but are not fences, so other memory operations can |
| + reorder around them), are not necessarily atomic, and can’t be |
| + elided or fused. |
| + |
| + The PNaCl toolchain applies regular LLVM optimizations along these |
| + guidelines, and it further prevents any load/store (even |
| + non-``volatile`` and non-atomic ones) from moving above or below a |
| + ``volatile`` operation: they act as compiler barriers before |
| + optimizations occur. The PNaCl toolchain freezes ``volatile`` |
| + accesses after optimizations into atomic accesses with sequentially |
| + consistent memory ordering. This eases support for legacy |
| + (i.e. non-C11/C++11) code: combined with builtin fences, such |
| + programs can do meaningful cross-thread communication without |
| + code changes. It also reflects the original code's intent and |
| + improves portability. |
| + |
| + Relaxed ordering could be used instead, but for the first release it |
| + is more conservative to apply sequential consistency. Future |
| + releases may change what happens at compile-time, but |
| + already-released pexes will continue using sequential consistency. |
| + |
| + The PNaCl toolchain also requires that ``volatile`` accesses be at |
| + least naturally aligned, and tries to guarantee this alignment. |
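A minimal sketch of the legacy pattern this note is about, assuming a C11 compiler with ``<stdatomic.h>``; all function and variable names are hypothetical. The first pair of functions publishes a value through a ``volatile`` flag. Standard C does not order those accesses across threads, but per the note the PNaCl toolchain freezes them into sequentially consistent atomic accesses. The second pair spells those accesses out explicitly, which is the portable form.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Legacy pattern (hypothetical names): a volatile flag guards a plain
 * payload. Standard C does not order these accesses across threads;
 * per the PNaCl note, the toolchain freezes the volatile accesses into
 * sequentially consistent atomics, which is what makes this work there. */
static int legacy_payload;
static volatile int legacy_ready;

void legacy_publish(int value) {
    legacy_payload = value;
    legacy_ready = 1;
}

bool legacy_try_consume(int *out) {
    if (!legacy_ready)
        return false;
    *out = legacy_payload;
    return true;
}

/* Portable C11 form: the flag is an atomic accessed with
 * memory_order_seq_cst, matching what PNaCl does with the volatile flag. */
static int payload;
static atomic_int ready;

void publish(int value) {
    payload = value; /* plain store, made visible by the flag store below */
    atomic_store_explicit(&ready, 1, memory_order_seq_cst);
}

bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_seq_cst))
        return false;
    *out = payload; /* safe: the flag load synchronizes with publish() */
    return true;
}
```

New code should prefer the second form: it expresses the intent directly and does not depend on PNaCl's treatment of ``volatile``.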
| Memory Model for Concurrent Operations |
| -------------------------------------- |
| `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
| -TODO. |
| +The memory model offered by PNaCl relies on the same coding guidelines |
| +as the C11/C++11 one: concurrent accesses must always occur through |
| +atomic primitives, and these accesses must always occur with the same |
| +size for the same memory location. Visibility of stores is provided on a |
| +happens-before basis that relates memory locations to each other as the |
| +C11/C++11 standards do. |
| + |
| +PNaCl bitcode requires all concurrency to occur through `atomic |
| +intrinsics`_. |
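For illustration, a sketch of a shared counter that follows these guidelines: every access to a given location goes through an atomic primitive of the same width. It assumes a C11 compiler, plus GCC or Clang for the ``__sync_fetch_and_add`` builtin; the counter and function names are hypothetical. Both the C11 spelling and the legacy GCC-style spelling are shown, since the note that follows lists both as supported by the toolchain.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* C11 spelling: every access to `requests` goes through an atomic
 * primitive of the same 32-bit width. Reading it through, say, a
 * uint16_t pointer would break the "same size" rule. */
static _Atomic uint32_t requests;

uint32_t count_request(void) {
    /* Returns the value before the increment. */
    return atomic_fetch_add_explicit(&requests, 1, memory_order_seq_cst);
}

uint32_t read_requests(void) {
    return atomic_load_explicit(&requests, memory_order_seq_cst);
}

/* Legacy GCC-style spelling on a separate, plain (non-_Atomic) object.
 * __sync_fetch_and_add is a full barrier and also returns the old value. */
static uint32_t legacy_requests;

uint32_t legacy_count_request(void) {
    return __sync_fetch_and_add(&legacy_requests, 1);
}

uint32_t legacy_read_requests(void) {
    /* Adding zero is a common legacy idiom for an atomic read. */
    return __sync_fetch_and_add(&legacy_requests, 0);
}
```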
| + |
| +.. note:: |
| + |
| + As in C11/C++11, some atomic accesses may be implemented with locks |
| + on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always |
| + be ``1``, signifying that all types are sometimes lock-free. The |
| + ``is_lock_free`` methods will report at runtime whether the current |
| + platform's implementation is lock-free. |
| + |
| + The PNaCl toolchain supports concurrent memory accesses through |
| + legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11 |
| + atomic primitives. ``volatile`` memory accesses can also be used, |
| + though these are discouraged, and aren't present in bitcode. |
| + |
| + Note that PNaCl explicitly supports concurrency through threading |
| + and inter-process communication (shared memory), but doesn't support |
|
Derek Schuff
2013/07/02 22:13:17
should probably remove the reference to shared mem
JF
2013/07/02 23:44:32
I clarified this entire section, please review aga
|
| + interacting with device memory. Setting these up requires assistance |
| + from the embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but |
| + once set up they can be used through regular C/C++ code. |
| + |
| + PNaCl also doesn't currently support signal handling, and therefore |
| + promotes all primitives to cross-thread (instead of |
| + single-thread). This may change at a later date. |
| + |
| + The PNaCl toolchain currently optimizes for memory ordering as LLVM |
| + normally does, but at pexe creation time it promotes all |
| + ``volatile`` accesses as well as all atomic accesses to be |
| + sequentially consistent. Other memory orderings will be supported in |
| + a future release, but pexes generated with the current toolchain will |
|
Derek Schuff
2013/07/02 22:13:17
s/generate/generated
JF
2013/07/02 23:44:32
Done.
|
| + continue functioning with sequential consistency. Using sequential |
| + consistency provides a total ordering for all |
| + sequentially-consistent operations on all addresses. |
| + |
| + This means that ``volatile`` and atomic memory accesses can only be |
| + reordered in limited ways before the pexe is created, and will |
| + act as fences for all memory accesses (even non-atomic and |
| + non-``volatile``) after pexe creation. Non-atomic and |
| + non-``volatile`` memory accesses may be reordered (unless a fence |
| + intervenes), separated, elided or fused according to C and C++'s |
| + memory model before the pexe is created as well as after its |
| + creation. |
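The lock-free queries mentioned in this note can be exercised as below. This is a sketch assuming a C11 ``<stdatomic.h>``; the helper names are hypothetical. Compiled natively, the macro typically reports ``2`` on common 64-bit targets; per the note, PNaCl reports ``1`` and leaves the real answer to the runtime query.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Compile-time answer for a type: 0 = never, 1 = sometimes,
 * 2 = always lock-free. Per the PNaCl note, PNaCl reports 1 for every type. */
int int_lock_free_macro(void) {
    return ATOMIC_INT_LOCK_FREE;
}

/* Runtime answer for one particular object on the current platform. */
bool int_object_is_lock_free(const atomic_int *obj) {
    return atomic_is_lock_free(obj);
}
```

A macro value of ``1`` is the only one that forces callers to consult the runtime query; ``0`` and ``2`` already fix the answer.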
| Atomic Memory Ordering Constraints |
| ---------------------------------- |
| `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
| -TODO. |
| +PNaCl bitcode currently supports sequential consistency only, through |
| +its `atomic intrinsics`_. |
| + |
| +.. note:: |
| + |
| + Atomics follow the same ordering constraints as in regular LLVM, but |
| + all accesses are promoted to sequential consistency (the strongest |
| + memory ordering) at pexe creation time. As more C11/C++11 code |
| + allows us to understand performance and portability needs, we intend |
| + to support the full gamut of C11/C++11 memory orderings: |
| + |
| + - Relaxed: no operation orders memory. |
| + - Consume: a load operation performs a consume operation on the |
| + affected memory location (currently unsupported by LLVM). |
| + - Acquire: a load operation performs an acquire operation on the |
| + affected memory location. |
| + - Release: a store operation performs a release operation on the |
| + affected memory location. |
| + - Acquire-release: load and store operations perform acquire and |
| + release operations, respectively, on the affected memory location. |
| + - Sequentially consistent: same as acquire-release, but providing |
| + a global total ordering for all affected locations. |
| + |
| + As in C11/C++11: |
| + |
| + - Atomic and volatile accesses must be at least naturally aligned. |
| + - Some accesses may not actually be atomic on certain platforms, |
| + requiring an implementation that uses a global lock. |
| + - An atomic memory location must always be accessed with atomic |
| + primitives, and these primitives must always be of the same bit |
| + size for that location. |
| + - Not all memory orderings are valid for all atomic operations. |
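As an illustration of orderings, and of the last point (not every ordering is valid for every operation), here is a sketch of a test-and-set spinlock built on C11 ``atomic_flag``; the type and function names are hypothetical. It requests acquire and release ordering. Under the current PNaCl toolchain both would be promoted to sequential consistency at pexe creation, which is stronger and therefore keeps the code correct.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* A minimal test-and-set spinlock (hypothetical names). */
typedef struct {
    atomic_flag held;
} spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

bool spin_trylock(spinlock *l) {
    /* Read-modify-write: any ordering is valid here; acquire keeps the
     * critical section from moving above the lock. */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_lock(spinlock *l) {
    while (!spin_trylock(l)) {
        /* busy-wait until the holder releases the flag */
    }
}

void spin_unlock(spinlock *l) {
    /* Clear behaves like a store: memory_order_acquire and
     * memory_order_acq_rel are not valid here, release is. */
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```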
| Fast-Math Flags |
| --------------- |
| @@ -270,14 +377,6 @@ Only the LLVM instructions listed here are supported by PNaCl bitcode. |
| The pointer argument of these instructions must be a *normalized* pointer |
| (see :ref:`pointer types <pointertypes>`). |
| -* ``fence`` |
| -* ``cmpxchg``, ``atomicrmw`` |
| - |
| - The pointer argument of these instructions must be a *normalized* pointer |
| - (see :ref:`pointer types <pointertypes>`). |
| - |
| - TODO(jfb): this may change |
| - |
| * ``trunc`` |
| * ``zext`` |
| * ``sext`` |
| @@ -316,8 +415,6 @@ Intrinsic Functions |
| The only intrinsics supported by PNaCl bitcode are the following. |
| -TODO(jfb): atomics |
| - |
| * ``llvm.memcpy`` |
| * ``llvm.memmove`` |
| * ``llvm.memset`` |
| @@ -346,3 +443,56 @@ TODO(jfb): atomics |
| TODO: describe |
| +.. _atomic intrinsics: |
| + |
| +* ``llvm.nacl.atomic.store`` |
| +* ``llvm.nacl.atomic.load`` |
| +* ``llvm.nacl.atomic.rmw`` |
| +* ``llvm.nacl.atomic.cmpxchg`` |
| +* ``llvm.nacl.atomic.fence`` |
| + |
| + .. code-block:: llvm |
| + |
| + declare iN @llvm.nacl.atomic.load( |
| + iN* <source>, i32 <memory_order>) |
| + declare void @llvm.nacl.atomic.store( |
| + iN <operand>, iN* <destination>, i32 <memory_order>) |
| + declare iN @llvm.nacl.atomic.rmw( |
| + i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>) |
| + declare iN @llvm.nacl.atomic.cmpxchg( |
| + iN* <object>, iN <expected>, iN <desired>, |
| + i32 <memory_order_success>, i32 <memory_order_failure>) |
| + declare void @llvm.nacl.atomic.fence(i32 <memory_order>) |
| + |
| + Each of these intrinsics except ``@llvm.nacl.atomic.fence`` is |
| + overloaded on the ``iN`` argument. Integer types of 8, 16, 32 and |
| + 64-bit width are supported for these ``iN`` arguments. |
| + |
| + The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following |
| + read-modify-write operations, from the general and arithmetic sections |
| + of the C11/C++11 standards: |
| + |
| + - ``add`` |
| + - ``sub`` |
| + - ``or`` |
| + - ``and`` |
| + - ``xor`` |
| + - ``exchange`` |
| + |
| + For all of these read-modify-write operations, the returned value is |
| + that at ``object`` before the computation. The ``computation`` |
| + argument must be a compile-time constant. |
| + |
| + All atomic intrinsics also support C11/C++11 memory orderings, which |
| + must be compile-time constants. Those are detailed in `Atomic Memory |
| + Ordering Constraints`_. |
| + |
| + Integer values for these computations and memory orderings are defined |
| + in ``"llvm/IR/NaClIntrinsics.h"``. |
| + |
| + .. note:: |
| + |
| + These intrinsics allow PNaCl to support C11/C++11 style atomic |
| + operations as well as some legacy GCC-style ``__sync_*`` builtins |
| + while remaining stable as the LLVM codebase changes. The user |
| + isn't expected to use these intrinsics directly. |
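To connect the ``rmw`` computations and ``cmpxchg`` to source code, the sketch below assumes that C11 ``<stdatomic.h>`` operations are what front ends lower onto these intrinsics (the note says users aren't expected to call the intrinsics directly). The comment lists the C11 counterpart of each computation. The hypothetical ``fetch_max_u16`` helper, built from a compare-exchange loop with separate success and failure orderings, shows how an operation missing from the list can still be expressed, here on a 16-bit type.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* The six rmw computations correspond to these C11 generic functions,
 * each returning the value held *before* the update:
 *   add      -> atomic_fetch_add     sub      -> atomic_fetch_sub
 *   or       -> atomic_fetch_or      and      -> atomic_fetch_and
 *   xor      -> atomic_fetch_xor     exchange -> atomic_exchange
 * There is no "max" computation, so this hypothetical helper builds one
 * from cmpxchg, which takes separate success and failure orderings. */
uint16_t fetch_max_u16(_Atomic uint16_t *obj, uint16_t value) {
    uint16_t expected = atomic_load_explicit(obj, memory_order_seq_cst);
    while (expected < value &&
           !atomic_compare_exchange_weak_explicit(
               obj, &expected, value,
               memory_order_seq_cst,    /* ordering if the swap happens */
               memory_order_seq_cst)) { /* ordering if it fails */
        /* On failure, `expected` now holds the current value; retry. */
    }
    return expected; /* value before the (possible) update */
}
```

The weak compare-exchange may fail spuriously, which is harmless inside a retry loop like this one.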