docs/PNaClDeveloperGuide.rst - Issue 22240002: Rework PNaCl memory ordering

Unified Diff: docs/PNaClDeveloperGuide.rst

Issue 22240002: Rework PNaCl memory ordering (Closed) Base URL: http://git.chromium.org/native_client/pnacl-llvm.git@master

Patch Set: Created 7 years, 4 months ago


Index: docs/PNaClDeveloperGuide.rst

diff --git a/docs/PNaClDeveloperGuide.rst b/docs/PNaClDeveloperGuide.rst

index 9c27ae5c14cb7a2215ee6f9363080c5d4f5b1449..a66e35109913b27176f706c033be05a52360331c 100644

--- a/docs/PNaClDeveloperGuide.rst

+++ b/docs/PNaClDeveloperGuide.rst

@@ -14,126 +14,142 @@ TODO

Memory Model and Atomics

========================

-Volatile Memory Accesses

-------------------------

-The C11/C++11 standards mandate that ``volatile`` accesses execute in program

-order (but are not fences, so other memory operations can reorder around them),

-are not necessarily atomic, and can’t be elided. They can be separated into

-smaller width accesses.

-The PNaCl toolchain applies regular LLVM optimizations along these guidelines,

-and it further prevents any load/store (even non-``volatile`` and non-atomic

-ones) from moving above or below a volatile operation: they act as compiler

-barriers before optimizations occur. The PNaCl toolchain freezes ``volatile``

-accesses after optimizations into atomic accesses with sequentially consistent

-memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and

-combined with builtin fences these programs can do meaningful cross-thread

-communication without changing code. It also reflects the original code's intent

-and guarantees better portability.

-Relaxed ordering could be used instead, but for the first release it is more

-conservative to apply sequential consistency. Future releases may change what

-happens at compile-time, but already-released pexes will continue using

-sequential consistency.

-The PNaCl toolchain also requires that ``volatile`` accesses be at least

-naturally aligned, and tries to guarantee this alignment.

Memory Model for Concurrent Operations

--------------------------------------

-The memory model offered by PNaCl relies on the same coding guidelines as the

-C11/C++11 one: concurrent accesses must always occur through atomic primitives

-(offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and

-these accesses must always occur with the same size for the same memory

-location. Visibility of stores is provided on a happens-before basis that

-relates memory locations to each other as the C11/C++11 standards do.

-As in C11/C++11 some atomic accesses may be implemented with locks on certain

-platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying

-that all types are sometimes lock-free. The ``is_lock_free`` methods will return

-the current platform's implementation at translation time.

-The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style

-``__sync_*`` builtins, as well as through C11/C++11 atomic primitives.

-``volatile`` memory accesses can also be used, though these are discouraged, and

-aren't present in bitcode.

+The memory model offered by PNaCl relies on the same coding guidelines

+as the C11/C++11 one: concurrent accesses must always occur through

+atomic primitives (offered by `atomic intrinsics

+<PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always

+occur with the same size for the same memory location. Visibility of

+stores is provided on a happens-before basis that relates memory

+locations to each other as the C11/C++11 standards do.

+As in C11/C++11 some atomic accesses may be implemented with locks on

+certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be

+``1``, signifying that all types are sometimes lock-free. The

+``is_lock_free`` methods and ``atomic_is_lock_free`` will return the

+current platform's implementation at translation time. These macros,

+methods and functions are in the C11 header ``<stdatomic.h>`` and the

+C++11 header ``<atomic>``.

+The PNaCl toolchain supports concurrent memory accesses through legacy

+GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic

+primitives. ``volatile`` memory accesses can also be used, though these

+are discouraged, and aren't present in bitcode. See `Volatile Memory

eliben 2013/08/05 18:35:54 Remove the "aren't present in bitcode" - i don't t

jvoung (off chromium) 2013/08/05 18:59:41 Yeah, this is the summary for developers. What is

JF 2013/08/05 20:37:48 Done.

+Accesses`_.

PNaCl supports concurrency and parallelism with some restrictions:

-* Threading is explicitly supported.

+* Threading is explicitly supported through C11/C++11's threading

+ libraries as well as POSIX threads.

-* Inter-process communication through shared memory is limited to operations

- which are lock-free on the current platform (``is_lock_free`` methods). This

- may change at a later date.

+* Inter-process communication through shared memory is limited to

eliben 2013/08/05 18:35:54 What does inter-process communication even mean in

JF 2013/08/05 20:37:48 Done.

+ operations which are lock-free on the current platform

+ (``is_lock_free`` methods). This may change at a later date.

* Direct interaction with device memory isn't supported.

-* Signal handling isn't supported; PNaCl therefore promotes all primitives to

- cross-thread (instead of single-thread). This may change at a later date. Note

- that using atomic operations which aren't lock-free may lead to deadlocks when

- handling asynchronous signals.

+* Signal handling isn't supported; PNaCl therefore promotes all

+ primitives to cross-thread (instead of single-thread). This may change

+ at a later date. Note that using atomic operations which aren't

+ lock-free may lead to deadlocks when handling asynchronous signals.

-* ``volatile`` and atomic operations are address-free (operations on the same

- memory location via two different addresses work atomically), as intended by

- the C11/C++11 standards. This is critical for inter-process communication as

- well as synchronous "external modifications" such as mapping underlying memory

- at multiple locations.

-Setting up the above mechanisms requires assistance from the embedding sandbox's

-runtime (e.g. NaCl's Pepper APIs), but using them once set up can be done through

-regular C/C++ code.

-The PNaCl toolchain currently optimizes for memory ordering as LLVM normally

-does, but at pexe creation time it promotes all ``volatile`` accesses as well as

-all atomic accesses to be sequentially consistent. Other memory orderings will

-be supported in a future release, but pexes generated with the current toolchain

-will continue functioning with sequential consistency. Using sequential

-consistency provides a total ordering for all sequentially-consistent operations

-on all addresses.

-This means that ``volatile`` and atomic memory accesses can only be re-ordered

-in some limited way before the pexe is created, and will act as fences for all

-memory accesses (even non-atomic and non-``volatile``) after pexe creation.

-Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence

-intervenes), separated, elided or fused according to C and C++'s memory model

-before the pexe is created as well as after its creation.

+* ``volatile`` and atomic operations are address-free (operations on the

+ same memory location via two different addresses work atomically), as

+ intended by the C11/C++11 standards. This is critical for

+ inter-process communication as well as synchronous "external

+ modifications" such as mapping underlying memory at multiple

+ locations.

+Setting up the above mechanisms requires assistance from the embedding

+sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up

+can be done through regular C/C++ code.

Atomic Memory Ordering Constraints

----------------------------------

-Atomics follow the same ordering constraints as in regular LLVM, but

-all accesses are promoted to sequential consistency (the strongest

-memory ordering) at pexe creation time. As more C11/C++11 code

-allows us to understand performance and portability needs we intend

-to support the full gamut of C11/C++11 memory orderings:

+Atomics follow the same ordering constraints as in regular LLVM, but all

+accesses are promoted to sequential consistency (the strongest memory

+ordering) at pexe creation time. As more C11/C++11 code allows us to

jvoung (off chromium) 2013/08/05 18:59:41 Should the memory orderings change also be done by

JF 2013/08/05 20:37:48 I think the current implementation should offer a

+understand performance and portability needs we intend to support the

+full gamut of C11/C++11 memory orderings:

- Relaxed: no operation orders memory.

eliben 2013/08/05 18:35:54 Maybe this list does not belong here? This is user

JF 2013/08/05 20:37:48 This is not an addition, I just moved it around. I

-- Consume: a load operation performs a consume operation on the affected memory

- location (currently unsupported by LLVM).

-- Acquire: a load operation performs an acquire operation on the affected memory

- location.

-- Release: a store operation performs a release operation on the affected memory

- location.

+- Consume: a load operation performs a consume operation on the affected

+ memory location (currently unsupported by LLVM).

+- Acquire: a load operation performs an acquire operation on the

+ affected memory location.

+- Release: a store operation performs a release operation on the

+ affected memory location.

- Acquire-release: load and store operations perform acquire and release

operations on the affected memory.

-- Sequentially consistent: same as acquire-release, but providing a global total

- ordering for all affected locations.

+- Sequentially consistent: same as acquire-release, but providing a

+ global total ordering for all affected locations.

As in C11/C++11:

- Atomic accesses must at least be naturally aligned.

-- Some accesses may not actually be atomic on certain platforms, requiring an

- implementation that uses a global lock.

-- An atomic memory location must always be accessed with atomic primitives, and

- these primitives must always be of the same bit size for that location.

+- Some accesses may not actually be atomic on certain platforms,

+ requiring an implementation that uses global lock(s).

+- An atomic memory location must always be accessed with atomic

+ primitives, and these primitives must always be of the same bit size

+ for that location.

- Not all memory orderings are valid for all atomic operations.

+Volatile Memory Accesses

+------------------------

+The C11/C++11 standards mandate that ``volatile`` accesses execute in

+program order (but are not fences, so other memory operations can

+reorder around them), are not necessarily atomic, and can’t be

+elided. They can be separated into smaller width accesses.

+Before any optimizations occur the PNaCl toolchain transforms

+``volatile`` loads and stores into sequentially consistent ``volatile``

+atomic loads and stores, and applies regular LLVM optimizations along

+the above guidelines. This orders ``volatile`` accesses according to the atomic

+rules, and means that fences (including ``__sync_synchronize``) act in a

+better-defined manner. Regular memory accesses still do not have

+ordering guarantees with ``volatile`` and atomic accesses, though the

+internal representation of ``__sync_synchronize`` attempts to prevent

+reordering of memory accesses to objects which may escape.

+Relaxed ordering could be used instead, but for the first release it is

+more conservative to apply sequential consistency. Future releases may

+change what happens at compile-time, but already-released pexes will

+continue using sequential consistency.

+The PNaCl toolchain also requires that ``volatile`` accesses be at least

+naturally aligned, and tries to guarantee this alignment.

+The above guarantees ease the support of legacy (i.e. non-C11/C++11)

+code, and combined with builtin fences these programs can do meaningful

+cross-thread communication without changing code. They also better

+reflect the original code's intent and guarantee better portability.

+Stable Transfer Format

+----------------------

+The PNaCl toolchain freezes atomic and ``volatile`` memory accesses

+after optimizations into atomic accesses with sequentially consistent

jvoung (off chromium) 2013/08/05 18:59:41 re "after optimizations": volatiles get converted

JF 2013/08/05 20:37:48 OK, I can remove this section.

+memory ordering. Other memory orderings will be exposed in future

+releases, when we have a better grasp of existing code's needs and

+portability implications, and are confident that implementation limits

+are overcome. Future releases may change what happens at compile-time,

+but already-released pexes will continue using sequential consistency.

+Non-atomic and non-``volatile`` memory accesses may be reordered,

+separated, elided or fused according to C and C++'s memory model before

+the pexe is created as well as after its creation.

Inline Assembly

===============

Inline assembly isn't supported by PNaCl because it isn't portable. The

one current exception is the common compiler barrier idiom

``asm("":::"memory")``, which gets transformed to a sequentially

-consistent memory barrier (equivalent to ``__sync_synchronize()``).

+consistent memory barrier (equivalent to ``__sync_synchronize()``). In

+PNaCl this barrier is only guaranteed to order ``volatile`` and atomic

+memory accesses, though in practice the implementation attempts to also

+prevent reordering of memory accesses to objects which may escape.
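
Illustrative note (not part of the patch): the guide text above says concurrent accesses should go through C11/C++11 atomic primitives, and that lock-freedom can be queried with ``atomic_is_lock_free``. A minimal C11 sketch of that pattern might look like the following. The names ``payload``, ``ready``, ``publish``, ``try_consume`` and ``flag_is_lock_free`` are hypothetical and chosen only for this example. The sketch requests release/acquire ordering; per the patch text, PNaCl would promote these to sequentially consistent at pexe creation time, which is stronger and so keeps the code correct.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical example data: a plain payload guarded by an atomic flag. */
static int payload;          /* non-atomic; only read after the flag is seen */
static atomic_bool ready;    /* static storage, so it starts out false */

/* Writer side: store the payload, then publish it with a release store. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader side: an acquire load of the flag orders the later payload read.
 * Returns false (and leaves *out untouched) if nothing was published yet. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}

/* Reports whether operations on the flag are lock-free on this platform. */
bool flag_is_lock_free(void) {
    return atomic_is_lock_free(&ready);
}
```

In a real program ``publish`` and ``try_consume`` would run on different threads (for example POSIX threads, which the patch text lists as supported); the reader would typically retry until ``try_consume`` returns true.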
