| Index: docs/PNaClDeveloperGuide.rst
|
| diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst
|
| index 9c27ae5c14cb7a2215ee6f9363080c5d4f5b1449..e807d572f77fa57b70f1cf82c94a1870c03b21ad 100644
|
| --- a/docs/PNaClDeveloperGuide.rst
|
| +++ b/docs/PNaClDeveloperGuide.rst
|
| @@ -14,126 +14,159 @@ TODO
|
| Memory Model and Atomics
|
| ========================
|
|
|
| -Volatile Memory Accesses
|
| -------------------------
|
| -
|
| -The C11/C++11 standards mandate that ``volatile`` accesses execute in program
|
| -order (but are not fences, so other memory operations can reorder around them),
|
| -are not necessarily atomic, and can’t be elided. They can be separated into
|
| -smaller width accesses.
|
| -
|
| -The PNaCl toolchain applies regular LLVM optimizations along these guidelines,
|
| -and it further prevents any load/store (even non-``volatile`` and non-atomic
|
| -ones) from moving above or below a volatile operations: they act as compiler
|
| -barriers before optimizations occur. The PNaCl toolchain freezes ``volatile``
|
| -accesses after optimizations into atomic accesses with sequentially consistent
|
| -memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and
|
| -combined with builtin fences these programs can do meaningful cross-thread
|
| -communication without changing code. It also reflects the original code's intent
|
| -and guarantees better portability.
|
| -
|
| -Relaxed ordering could be used instead, but for the first release it is more
|
| -conservative to apply sequential consistency. Future releases may change what
|
| -happens at compile-time, but already-released pexes will continue using
|
| -sequential consistency.
|
| -
|
| -The PNaCl toolchain also requires that ``volatile`` accesses be at least
|
| -naturally aligned, and tries to guarantee this alignment.
|
| -
|
| Memory Model for Concurrent Operations
|
| --------------------------------------
|
|
|
| -The memory model offered by PNaCl relies on the same coding guidelines as the
|
| -C11/C++11 one: concurrent accesses must always occur through atomic primitives
|
| -(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and
|
| -these accesses must always occur with the same size for the same memory
|
| -location. Visibility of stores is provided on a happens-before basis that
|
| -relates memory locations to each other as the C11/C++11 standards do.
|
| -
|
| -As in C11/C++11 some atomic accesses may be implemented with locks on certain
|
| -platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying
|
| -that all types are sometimes lock-free. The ``is_lock_free`` methods will return
|
| -the current platform's implementation at translation time.
|
| -
|
| -The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style
|
| -``__sync_*`` builtins, as well as through C11/C++11 atomic primitives.
|
| -``volatile`` memory accesses can also be used, though these are discouraged, and
|
| -aren't present in bitcode.
|
| +The memory model offered by PNaCl relies on the same coding guidelines
|
| +as the C11/C++11 one: concurrent accesses must always occur through
|
| +atomic primitives (offered by `atomic intrinsics
|
| +<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always
|
| +occur with the same size for the same memory location. Visibility of
|
| +stores is provided on a happens-before basis that relates memory
|
| +locations to each other as the C11/C++11 standards do.
|
| +
|
| +Non-atomic memory accesses may be reordered, separated, elided or fused
|
| +according to C and C++'s memory model before the pexe is created as well
|
| +as after its creation.
|
| +
|
| +As in C11/C++11 some atomic accesses may be implemented with locks on
|
| +certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be
|
| +``1``, signifying that all types are sometimes lock-free. The
|
| +``is_lock_free`` methods and ``atomic_is_lock_free`` will return the
|
| +current platform's implementation at translation time. These macros,
|
| +methods and functions are in the C11 header ``<stdatomic.h>`` and the
|
| +C++11 header ``<atomic>``.
|
| +
|
| +The PNaCl toolchain supports concurrent memory accesses through legacy
|
| +GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic
|
| +primitives. ``volatile`` memory accesses can also be used, though these
|
| +are discouraged. See `Volatile Memory Accesses`_.
|
|
|
| PNaCl supports concurrency and parallelism with some restrictions:
|
|
|
| -* Threading is explicitly supported.
|
| +* Threading is explicitly supported and adds no restrictions beyond
|
| + those of prevalent implementations. See `Threading`_.
|
| +
|
| +* ``volatile`` and atomic operations are address-free (operations on the
|
| + same memory location via two different addresses work atomically), as
|
| + intended by the C11/C++11 standards. This is critical in supporting
|
| + synchronous "external modifications" such as mapping underlying memory
|
| + at multiple locations.
|
|
|
| -* Inter-process communication through shared memory is limited to operations
|
| - which are lock-free on the current platform (``is_lock_free`` methods). This
|
| - may change at a later date.
|
| +* Inter-process communication through shared memory is currently not
|
| + supported. See `Future Directions`_.
|
|
|
| -* Direct interaction with device memory isn't supported.
|
| +* Signal handling isn't supported; PNaCl therefore promotes all
|
| + primitives to cross-thread (instead of single-thread). This may change
|
| + at a later date. Note that using atomic operations which aren't
|
| + lock-free may lead to deadlocks when handling asynchronous
|
| + signals. See `Future Directions`_.
|
|
|
| -* Signal handling isn't supported, PNaCl therefore promotes all primitives to
|
| - cross-thread (instead of single-thread). This may change at a later date. Note
|
| - that using atomic operations which aren't lock-free may lead to deadlocks when
|
| - handling asynchronous signals.
|
| -
|
| -* ``volatile`` and atomic operations are address-free (operations on the same
|
| - memory location via two different addresses work atomically), as intended by
|
| - the C11/C++11 standards. This is critical for inter-process communication as
|
| - well as synchronous "external modifications" such as mapping underlying memory
|
| - at multiple locations.
|
| +* Direct interaction with device memory isn't supported, and there is no
|
| + intent to support it. The embedding sandbox's runtime can offer APIs
|
| + to indirectly access devices.
|
|
|
| -Setting up the above mechanisms requires assistance from the embedding sandbox's
|
| -runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through
|
| -regular C/C++ code.
|
| -
|
| -The PNaCl toolchain currently optimizes for memory ordering as LLVM normally
|
| -does, but at pexe creation time it promotes all ``volatile`` accesses as well as
|
| -all atomic accesses to be sequentially consistent. Other memory orderings will
|
| -be supported in a future release, but pexes generated with the current toolchain
|
| -will continue functioning with sequential consistency. Using sequential
|
| -consistency provides a total ordering for all sequentially-consistent operations
|
| -on all addresses.
|
| -
|
| -This means that ``volatile`` and atomic memory accesses can only be re-ordered
|
| -in some limited way before the pexe is created, and will act as fences for all
|
| -memory accesses (even non-atomic and non-``volatile``) after pexe creation.
|
| -Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence
|
| -intervenes), separated, elided or fused according to C and C++'s memory model
|
| -before the pexe is created as well as after its creation.
|
| +Setting up the above mechanisms requires assistance from the embedding
|
| +sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up
|
| +can be done through regular C/C++ code.
|
|
|
| Atomic Memory Ordering Constraints
|
| ----------------------------------
|
|
|
| -Atomics follow the same ordering constraints as in regular LLVM, but
|
| -all accesses are promoted to sequential consistency (the strongest
|
| -memory ordering) at pexe creation time. As more C11/C++11 code
|
| -allows us to understand performance and portability needs we intend
|
| -to support the full gamut of C11/C++11 memory orderings:
|
| +Atomics follow the same ordering constraints as in regular C11/C++11,
|
| +but all accesses are promoted to sequential consistency (the strongest
|
| +memory ordering) at pexe creation time. As more C11/C++11 code allows us
|
| +to understand performance and portability needs, we intend to support the
|
| +full gamut of C11/C++11 memory orderings:
|
|
|
| - Relaxed: no operation orders memory.
|
| -- Consume: a load operation performs a consume operation on the affected memory
|
| - location (currently unsupported by LLVM).
|
| -- Acquire: a load operation performs an acquire operation on the affected memory
|
| - location.
|
| -- Release: a store operation performs a release operation on the affected memory
|
| - location.
|
| +- Consume: a load operation performs a consume operation on the affected
|
| + memory location (note: currently unsupported by LLVM).
|
| +- Acquire: a load operation performs an acquire operation on the
|
| + affected memory location.
|
| +- Release: a store operation performs a release operation on the
|
| + affected memory location.
|
| - Acquire-release: load and store operations perform acquire and release
|
| operations on the affected memory.
|
| -- Sequentially consistent: same as acquire-release, but providing a global total
|
| - ordering for all affected locations.
|
| +- Sequentially consistent: same as acquire-release, but providing a
|
| + global total ordering for all affected locations.
|
|
|
| As in C11/C++11:
|
|
|
| - Atomic accesses must at least be naturally aligned.
|
| -- Some accesses may not actually be atomic on certain platforms, requiring an
|
| - implementation that uses a global lock.
|
| -- An atomic memory location must always be accessed with atomic primitives, and
|
| - these primitives must always be of the same bit size for that location.
|
| +- Some accesses may not actually be atomic on certain platforms,
|
| + requiring an implementation that uses global lock(s).
|
| +- An atomic memory location must always be accessed with atomic
|
| + primitives, and these primitives must always be of the same bit size
|
| + for that location.
|
| - Not all memory orderings are valid for all atomic operations.
|
|
|
| +Volatile Memory Accesses
|
| +------------------------
|
| +
|
| +The C11/C++11 standards mandate that ``volatile`` accesses execute in
|
| +program order (but are not fences, so other memory operations can
|
| +reorder around them), are not necessarily atomic, and can’t be
|
| +elided. They can be separated into smaller width accesses.
|
| +
|
| +Before any optimizations occur, the PNaCl toolchain transforms
|
| +``volatile`` loads and stores into sequentially consistent ``volatile``
|
| +atomic loads and stores, and applies regular compiler optimizations
|
| +along the above guidelines. This orders ``volatile`` accesses according to the
|
| +atomic rules, and means that fences (including ``__sync_synchronize``)
|
| +act in a better-defined manner. Regular memory accesses still do not
|
| +have ordering guarantees with ``volatile`` and atomic accesses, though
|
| +the internal representation of ``__sync_synchronize`` attempts to
|
| +prevent reordering of memory accesses to objects which may escape.
|
| +
|
| +Relaxed ordering could be used instead, but for the first release it is
|
| +more conservative to apply sequential consistency. Future releases may
|
| +change what happens at compile-time, but already-released pexes will
|
| +continue using sequential consistency.
|
| +
|
| +The PNaCl toolchain also requires that ``volatile`` accesses be at least
|
| +naturally aligned, and tries to guarantee this alignment.
|
| +
|
| +The above guarantees ease the support of legacy (i.e. non-C11/C++11)
|
| +code, and combined with builtin fences these programs can do meaningful
|
| +cross-thread communication without changing code. They also better
|
| +reflect the original code's intent and guarantee better portability.
|
| +
|
| +Threading
|
| +=========
|
| +
|
| +Threading is explicitly supported through C11/C++11's threading
|
| +libraries as well as POSIX threads.
|
| +
|
| +Communication between threads should use atomic primitives as described
|
| +in `Memory Model and Atomics`_.
|
| +
|
| Inline Assembly
|
| ===============
|
|
|
| Inline assembly isn't supported by PNaCl because it isn't portable. The
|
| one current exception is the common compiler barrier idiom
|
| ``asm("":::"memory")``, which gets transformed to a sequentially
|
| -consistent memory barrier (equivalent to ``__sync_synchronize()``).
|
| +consistent memory barrier (equivalent to ``__sync_synchronize()``). In
|
| +PNaCl this barrier is only guaranteed to order ``volatile`` and atomic
|
| +memory accesses, though in practice the implementation attempts to also
|
| +prevent reordering of memory accesses to objects which may escape.
|
| +
|
| +Future Directions
|
| +=================
|
| +
|
| +Inter-Process Communication
|
| +---------------------------
|
| +
|
| +Inter-process communication through shared memory is currently not
|
| +supported by PNaCl. When implemented, it may be limited to operations
|
| +which are lock-free on the current platform (``is_lock_free``
|
| +methods). It will rely on the address-free property discussed in `Memory
|
| +Model for Concurrent Operations`_.
|
| +
|
| +Signal Handling
|
| +---------------
|
| +
|
| +Untrusted signal handling currently isn't supported by PNaCl. When
|
| +supported, the impact of ``volatile`` and atomics for same-thread signal
|
| +handling will need to be carefully detailed.
|
|
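The added text recommends that communication between threads go through atomic primitives, and notes that ``is_lock_free`` reports the current platform's implementation. A minimal C++11 sketch of that guidance might look like the following. The names ``payload``, ``ready``, ``run_handoff`` and ``flag_is_lock_free`` are hypothetical and do not come from the guide. The release/acquire orderings are what portable source code would request; per the guide, the PNaCl toolchain promotes them to sequential consistency at pexe creation time, which is stronger and therefore still correct.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration only; none of these appear in the guide.
static int payload = 0;                 // plain (non-atomic) data
static std::atomic<bool> ready{false};  // atomic flag that publishes `payload`

// Hands one value from a producer thread to a consumer thread.
// The non-atomic write to `payload` is made visible by the release store,
// and the consumer only reads it after its acquire load observes `true`.
int run_handoff() {
  payload = 0;
  ready.store(false);

  std::thread producer([] {
    payload = 42;                                  // ordinary write
    ready.store(true, std::memory_order_release);  // publish it
  });

  int seen = -1;
  std::thread consumer([&seen] {
    while (!ready.load(std::memory_order_acquire)) {
      std::this_thread::yield();  // wait until the flag is published
    }
    seen = payload;  // happens-after the producer's write
  });

  producer.join();
  consumer.join();
  return seen;
}

// Reports whether the flag's operations are lock-free on this platform.
bool flag_is_lock_free() { return ready.is_lock_free(); }
```

Calling ``run_handoff()`` from ``main`` returns the value written by the producer. Replacing the atomic flag with a plain ``bool`` would make the example a data race, which is the situation the guide's "always use atomic primitives" rule is meant to rule out.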
|