| Index: docs/PNaClLangRef.rst
|
| diff --git a/docs/PNaClLangRef.rst b/docs/PNaClLangRef.rst
|
| index b1d39a7187806fc36a4dc8a54d38feb6441eb48d..d3911d99a230346264bc926ba4da6c416ca890e0 100644
|
| --- a/docs/PNaClLangRef.rst
|
| +++ b/docs/PNaClLangRef.rst
|
| @@ -106,21 +106,133 @@ Volatile Memory Accesses
|
|
|
| `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_
|
|
|
| -TODO: are we going to promote volatile to atomic?
|
| +PNaCl bitcode does not support volatile memory accesses.
|
| +
|
| +.. note::
|
| +
|
| + The C11/C++11 standards mandate that ``volatile`` accesses execute
|
| + in program order (but are not fences, so other memory operations can
|
| + reorder around them), are not necessarily atomic, and can't be
|
| + elided or fused.
|
| +
|
| + The PNaCl toolchain applies regular LLVM optimizations along these
|
| + guidelines, and it further prevents any load/store (even
|
| + non-``volatile`` and non-atomic ones) from moving above or below a
|
| + ``volatile`` operation: such operations act as compiler barriers before
|
| + optimizations occur. The PNaCl toolchain freezes ``volatile``
|
| + accesses after optimizations into atomic accesses with sequentially
|
| + consistent memory ordering. This eases support for legacy
|
| + (i.e. non-C11/C++11) code, and combined with builtin fences these
|
| + programs can do meaningful cross-thread communication without
|
| + changing code. It also reflects the original code's intent and
|
| + guarantees better portability.
|
| +
|
| + Relaxed ordering could be used instead, but for the first release it
|
| + is more conservative to apply sequential consistency. Future
|
| + releases may change what happens at compile-time, but
|
| + already-released pexes will continue using sequential consistency.
|
| +
|
| + The PNaCl toolchain also requires that ``volatile`` accesses be at
|
| + least naturally aligned, and tries to guarantee this alignment.
|
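As a minimal sketch of the note above, the following C++11 code contrasts a legacy ``volatile`` flag with the explicit sequentially consistent atomic that expresses the same intent. The names ``legacy_ready``, ``ready``, ``publish_legacy``, ``publish_atomic`` and ``is_naturally_aligned`` are hypothetical and chosen for illustration only; the sketch shows source-level code, not what the toolchain emits.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Legacy style: a volatile flag. Per the note above, the toolchain would
// freeze accesses like these into sequentially consistent atomic accesses.
volatile int legacy_ready = 0;

// C11/C++11 style: the same intent, stated explicitly in the source.
std::atomic<int> ready(0);

// Hypothetical helper: true when the address is a multiple of the access
// size, which is what "naturally aligned" means for the access.
bool is_naturally_aligned(const volatile void* p, std::size_t size) {
    return reinterpret_cast<std::uintptr_t>(p) % size == 0;
}

// Store then load through the volatile object; neither access may be
// elided or fused by the compiler.
int publish_legacy(int value) {
    legacy_ready = value;
    return legacy_ready;
}

// Store then load through the atomic object with sequential consistency.
int publish_atomic(int value) {
    ready.store(value, std::memory_order_seq_cst);
    return ready.load(std::memory_order_seq_cst);
}
```

New code would normally prefer the ``std::atomic`` form, since it is atomic by definition of the language rather than by toolchain policy.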
|
|
| Memory Model for Concurrent Operations
|
| --------------------------------------
|
|
|
| `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_
|
|
|
| -TODO.
|
| +The memory model offered by PNaCl relies on the same coding guidelines
|
| +as the C11/C++11 one: concurrent accesses must always occur through
|
| +atomic primitives, and these accesses must always occur with the same
|
| +size for the same memory location. Visibility of stores is provided on a
|
| +happens-before basis that relates memory locations to each other as the
|
| +C11/C++11 standards do.
|
| +
|
| +PNaCl bitcode requires all concurrency to occur through `atomic
|
| +intrinsics`_.
|
| +
|
| +.. note::
|
| +
|
| + As in C11/C++11 some atomic accesses may be implemented with locks
|
| + on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always
|
| + be ``1``, signifying that all types are sometimes lock-free. The
|
| + ``is_lock_free`` methods report at runtime whether the current
|
| + platform's implementation is lock-free.
|
| +
|
| + The PNaCl toolchain supports concurrent memory accesses through
|
| + legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11
|
| + atomic primitives. ``volatile`` memory accesses can also be used,
|
| + though these are discouraged, and aren't present in bitcode.
|
| +
|
| + PNaCl has varying support for concurrency and parallelism:
|
| +
|
| + * Threading is explicitly supported.
|
| + * Inter-process communication through shared memory is limited to
|
| + operations which are lock-free on the current platform
|
| + (``is_lock_free`` methods). This may change at a later date.
|
| + * Direct interaction with device memory isn't supported.
|
| + * Signal handling isn't supported; PNaCl therefore promotes all
|
| + primitives to cross-thread (instead of single-thread). This may
|
| + change at a later date.
|
| +
|
| + Setting up the above mechanisms requires assistance from the
|
| + embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using
|
| + them once set up can be done through regular C/C++ code.
|
| +
|
| + The PNaCl toolchain currently optimizes for memory ordering as LLVM
|
| + normally does, but at pexe creation time it promotes all
|
| + ``volatile`` accesses as well as all atomic accesses to be
|
| + sequentially consistent. Other memory orderings will be supported in
|
| + a future release, but pexes generated with the current toolchain
|
| + will continue functioning with sequential consistency. Using
|
| + sequential consistency provides a total ordering for all
|
| + sequentially-consistent operations on all addresses.
|
| +
|
| + This means that ``volatile`` and atomic memory accesses can only be
|
| + reordered in some limited way before the pexe is created, and will
|
| + act as fences for all memory accesses (even non-atomic and
|
| + non-``volatile``) after pexe creation. Non-atomic and
|
| + non-``volatile`` memory accesses may be reordered (unless a fence
|
| + intervenes), separated, elided or fused according to C and C++'s
|
| + memory model before the pexe is created as well as after its
|
| + creation.
|
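The lock-free queries mentioned in the note above can be sketched in C++11 as follows. The names ``kIntLockFreeMacro`` and ``int_is_lock_free_here`` are hypothetical. On a native host compiler the macro may well be ``2`` (always lock-free); the text above states that under PNaCl it is always ``1``, so portable code should rely on the runtime query.

```cpp
#include <atomic>

// Compile-time answer from the standard macro:
// 0 = never lock-free, 1 = sometimes lock-free, 2 = always lock-free.
constexpr int kIntLockFreeMacro = ATOMIC_INT_LOCK_FREE;

// Runtime answer for the platform the program is actually running on.
bool int_is_lock_free_here() {
    std::atomic<int> probe(0);
    return probe.is_lock_free();
}
```

A program that shares memory across processes could, under these assumptions, check ``int_is_lock_free_here()`` before relying on atomic operations over that memory.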
|
|
| Atomic Memory Ordering Constraints
|
| ----------------------------------
|
|
|
| `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_
|
|
|
| -TODO.
|
| +PNaCl bitcode currently supports sequential consistency only, through
|
| +its `atomic intrinsics`_.
|
| +
|
| +.. note::
|
| +
|
| + Atomics follow the same ordering constraints as in regular LLVM, but
|
| + all accesses are promoted to sequential consistency (the strongest
|
| + memory ordering) at pexe creation time. As more C11/C++11 code
|
| + allows us to understand performance and portability needs, we intend
|
| + to support the full gamut of C11/C++11 memory orderings:
|
| +
|
| + - Relaxed: no operation orders memory.
|
| + - Consume: a load operation performs a consume operation on the
|
| + affected memory location (currently unsupported by LLVM).
|
| + - Acquire: a load operation performs an acquire operation on the
|
| + affected memory location.
|
| + - Release: a store operation performs a release operation on the
|
| + affected memory location.
|
| + - Acquire-release: load and store operations perform acquire and
|
| + release operations on the affected memory.
|
| + - Sequentially consistent: same as acquire-release, but providing
|
| + a global total ordering for all affected locations.
|
| +
|
| + As in C11/C++11:
|
| +
|
| + - Atomic and volatile accesses must at least be naturally aligned.
|
| + - Some accesses may not actually be atomic on certain platforms,
|
| + requiring an implementation that uses a global lock.
|
| + - An atomic memory location must always be accessed with atomic
|
| + primitives, and these primitives must always be of the same bit
|
| + size for that location.
|
| + - Not all memory orderings are valid for all atomic operations.
|
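The ordering rules above can be modeled with the C++11 ``std::memory_order`` enumerators. This is a sketch only: ``promote_for_pexe``, ``valid_for_load`` and ``valid_for_store`` are hypothetical helpers, not toolchain APIs. The first models the promotion to sequential consistency described above; the other two encode the C++11 rule that loads cannot use release orderings and stores cannot use acquire or consume orderings.

```cpp
#include <atomic>

// Hypothetical model of pexe-creation-time promotion: whatever ordering the
// source requested, the result is sequentially consistent.
std::memory_order promote_for_pexe(std::memory_order /*requested*/) {
    return std::memory_order_seq_cst;
}

// In C++11 an atomic load may not use release or acquire-release ordering.
bool valid_for_load(std::memory_order order) {
    return order != std::memory_order_release &&
           order != std::memory_order_acq_rel;
}

// In C++11 an atomic store may not use consume, acquire or acquire-release
// ordering.
bool valid_for_store(std::memory_order order) {
    return order != std::memory_order_consume &&
           order != std::memory_order_acquire &&
           order != std::memory_order_acq_rel;
}
```

Sequential consistency is valid for both loads and stores, which is one reason promoting every access to it is a safe conservative choice.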
|
|
| Fast-Math Flags
|
| ---------------
|
| @@ -270,14 +382,6 @@ Only the LLVM instructions listed here are supported by PNaCl bitcode.
|
| The pointer argument of these instructions must be a *normalized* pointer
|
| (see :ref:`pointer types <pointertypes>`).
|
|
|
| -* ``fence``
|
| -* ``cmpxchg``, ``atomicrmw``
|
| -
|
| - The pointer argument of these instructions must be a *normalized* pointer
|
| - (see :ref:`pointer types <pointertypes>`).
|
| -
|
| - TODO(jfb): this may change
|
| -
|
| * ``trunc``
|
| * ``zext``
|
| * ``sext``
|
| @@ -316,8 +420,6 @@ Intrinsic Functions
|
|
|
| The only intrinsics supported by PNaCl bitcode are the following.
|
|
|
| -TODO(jfb): atomics
|
| -
|
| * ``llvm.memcpy``
|
| * ``llvm.memmove``
|
| * ``llvm.memset``
|
| @@ -346,3 +448,56 @@ TODO(jfb): atomics
|
|
|
| TODO: describe
|
|
|
| +.. _atomic intrinsics:
|
| +
|
| +* ``llvm.nacl.atomic.store``
|
| +* ``llvm.nacl.atomic.load``
|
| +* ``llvm.nacl.atomic.rmw``
|
| +* ``llvm.nacl.atomic.cmpxchg``
|
| +* ``llvm.nacl.atomic.fence``
|
| +
|
| + .. code-block:: llvm
|
| +
|
| + declare iN @llvm.nacl.atomic.load(
|
| + iN* <source>, i32 <memory_order>)
|
| + declare void @llvm.nacl.atomic.store(
|
| + iN <operand>, iN* <destination>, i32 <memory_order>)
|
| + declare iN @llvm.nacl.atomic.rmw(
|
| + i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>)
|
| + declare iN @llvm.nacl.atomic.cmpxchg(
|
| + iN* <object>, iN <expected>, iN <desired>,
|
| + i32 <memory_order_success>, i32 <memory_order_failure>)
|
| + declare void @llvm.nacl.atomic.fence(i32 <memory_order>)
|
| +
|
| + Each of these intrinsics is overloaded on the ``iN``
|
| + argument. Integral types of 8, 16, 32 and 64-bit width are supported
|
| + for these ``iN`` arguments.
|
| +
|
| + The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following
|
| + read-modify-write operations, from the general and arithmetic sections
|
| + of the C11/C++11 standards:
|
| +
|
| + - ``add``
|
| + - ``sub``
|
| + - ``or``
|
| + - ``and``
|
| + - ``xor``
|
| + - ``exchange``
|
| +
|
| + For all of these read-modify-write operations, the returned value is
|
| + that at ``object`` before the computation. The ``computation``
|
| + argument must be a compile-time constant.
|
| +
|
| + All atomic intrinsics also support C11/C++11 memory orderings, which
|
| + must be compile-time constants. Those are detailed in `Atomic Memory
|
| + Ordering Constraints`_.
|
| +
|
| + Integer values for these computations and memory orderings are defined
|
| + in ``"llvm/IR/NaClIntrinsics.h"``.
|
| +
|
| + .. note::
|
| +
|
| + These intrinsics allow PNaCl to support C11/C++11 style atomic
|
| + operations as well as some legacy GCC-style ``__sync_*`` builtins
|
| + while remaining stable as the LLVM codebase changes. The user
|
| + isn't expected to use these intrinsics directly.
|
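Since users are expected to write C11/C++11 atomics or ``__sync_*`` builtins rather than call the intrinsics directly, the source-level operations can be sketched as below. The helper names and the ``RmwResult`` struct are hypothetical, the ``__sync_fetch_and_add`` builtin assumes a GCC-compatible compiler, and how each call maps onto a particular intrinsic is an assumption here rather than something the text specifies. Treating the compare-exchange result as the previously stored value is likewise an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Value observed before the operation, and value stored afterwards.
struct RmwResult {
    std::int32_t previous;
    std::int32_t current;
};

// C++11 read-modify-write: fetch_add returns the value before the addition.
RmwResult cxx11_add(std::int32_t initial, std::int32_t operand) {
    std::atomic<std::int32_t> object(initial);
    std::int32_t previous = object.fetch_add(operand, std::memory_order_seq_cst);
    return {previous, object.load(std::memory_order_seq_cst)};
}

// Legacy GCC-style builtin with the same "return the old value" contract.
RmwResult legacy_add(std::int32_t initial, std::int32_t operand) {
    std::int32_t object = initial;
    std::int32_t previous = __sync_fetch_and_add(&object, operand);
    return {previous, object};
}

// Compare-exchange: stores `desired` only if the object equals `expected`.
// After the call, `expected` holds the value that was in the object before.
RmwResult cxx11_cmpxchg(std::int32_t initial, std::int32_t expected,
                        std::int32_t desired) {
    std::atomic<std::int32_t> object(initial);
    object.compare_exchange_strong(expected, desired,
                                   std::memory_order_seq_cst,
                                   std::memory_order_seq_cst);
    return {expected, object.load(std::memory_order_seq_cst)};
}
```

For instance, ``cxx11_add(5, 3)`` observes ``5`` as the previous value and leaves ``8`` stored, matching the statement above that read-modify-write operations return the value at ``object`` before the computation.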
|
|