Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 ======================= | 1 ======================= |
| 2 PNaCl Developer's Guide | 2 PNaCl Developer's Guide |
| 3 ======================= | 3 ======================= |
| 4 | 4 |
| 5 .. contents:: | 5 .. contents:: |
| 6 :local: | 6 :local: |
| 7 :depth: 3 | 7 :depth: 3 |
| 8 | 8 |
| 9 Introduction | 9 Introduction |
| 10 ============ | 10 ============ |
| 11 | 11 |
| 12 TODO | 12 TODO |
| 13 | 13 |
| 14 Memory Model and Atomics | 14 Memory Model and Atomics |
| 15 ======================== | 15 ======================== |
| 16 | 16 |
| 17 Memory Model for Concurrent Operations | |
| 18 -------------------------------------- | |
| 19 | |
| 20 The memory model offered by PNaCl relies on the same coding guidelines | |
| 21 as the C11/C++11 one: concurrent accesses must always occur through | |
| 22 atomic primitives (offered by `atomic intrinsics | |
| 23 <PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always | |
| 24 occur with the same size for the same memory location. Visibility of | |
| 25 stores is provided on a happens-before basis that relates memory | |
| 26 locations to each other as the C11/C++11 standards do. | |
| 27 | |
| 28 As in C11/C++11 some atomic accesses may be implemented with locks on | |
| 29 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be | |
| 30 ``1``, signifying that all types are sometimes lock-free. The | |
| 31 ``is_lock_free`` methods and ``atomic_is_lock_free`` will return the | |
| 32 current platform's implementation at translation time. These macros, | |
| 33 methods and functions are in the C11 header ``<stdatomic.h>`` and the | |
| 34 C++11 header ``<atomic>``. | |
| 35 | |
| 36 The PNaCl toolchain supports concurrent memory accesses through legacy | |
| 37 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic | |
| 38 primitives. ``volatile`` memory accesses can also be used, though these | |
| 39 are discouraged, and aren't present in bitcode. See `Volatile Memory | |
|
eliben
2013/08/05 18:35:54
Remove the "aren't present in bitcode" - i don't t
jvoung (off chromium)
2013/08/05 18:59:41
Yeah, this is the summary for developers. What is
JF
2013/08/05 20:37:48
Done.
| |
| 40 Accesses`_. | |
| 41 | |
| 42 PNaCl supports concurrency and parallelism with some restrictions: | |
| 43 | |
| 44 * Threading is explicitly supported through C11/C++11's threading | |
| 45 libraries as well as POSIX threads. | |
| 46 | |
| 47 * Inter-process communication through shared memory is limited to | |
|
eliben
2013/08/05 18:35:54
What does inter-process communication even mean in
JF
2013/08/05 20:37:48
Done.
| |
| 48 operations which are lock-free on the current platform | |
| 49 (``is_lock_free`` methods). This may change at a later date. | |
| 50 | |
| 51 * Direct interaction with device memory isn't supported. | |
| 52 | |
| 53 * Signal handling isn't supported, PNaCl therefore promotes all | |
| 54 primitives to cross-thread (instead of single-thread). This may change | |
| 55 at a later date. Note that using atomic operations which aren't | |
| 56 lock-free may lead to deadlocks when handling asynchronous signals. | |
| 57 | |
| 58 * ``volatile`` and atomic operations are address-free (operations on the | |
| 59 same memory location via two different addresses work atomically), as | |
| 60 intended by the C11/C++11 standards. This is critical for | |
| 61 inter-process communication as well as synchronous "external | |
| 62 modifications" such as mapping underlying memory at multiple | |
| 63 locations. | |
| 64 | |
| 65 Setting up the above mechanisms requires assistance from the embedding | |
| 66 sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once setup | |
| 67 can be done through regular C/C++ code. | |
| 68 | |
| 69 Atomic Memory Ordering Constraints | |
| 70 ---------------------------------- | |
| 71 | |
| 72 Atomics follow the same ordering constraints as in regular LLVM, but all | |
| 73 accesses are promoted to sequential consistency (the strongest memory | |
| 74 ordering) at pexe creation time. As more C11/C++11 code allows us to | |
|
jvoung (off chromium)
2013/08/05 18:59:41
Should the memory orderings change also be done by
JF
2013/08/05 20:37:48
I think the current implementation should offer a
| |
| 75 understand performance and portability needs we intend to support the | |
| 76 full gamut of C11/C++11 memory orderings: | |
| 77 | |
| 78 - Relaxed: no operation orders memory. | |
|
eliben
2013/08/05 18:35:54
Maybe this list does not belong here? This is user
JF
2013/08/05 20:37:48
This is not an addition, I just moved it around. I
| |
| 79 - Consume: a load operation performs a consume operation on the affected | |
| 80 memory location (currently unsupported by LLVM). | |
| 81 - Acquire: a load operation performs an acquire operation on the | |
| 82 affected memory location. | |
| 83 - Release: a store operation performs a release operation on the | |
| 84 affected memory location. | |
| 85 - Acquire-release: load and store operations perform acquire and release | |
| 86 operations on the affected memory. | |
| 87 - Sequentially consistent: same as acquire-release, but providing a | |
| 88 global total ordering for all affected locations. | |
| 89 | |
| 90 As in C11/C++11: | |
| 91 | |
| 92 - Atomic accesses must at least be naturally aligned. | |
| 93 - Some accesses may not actually be atomic on certain platforms, | |
| 94 requiring an implementation that uses global lock(s). | |
| 95 - An atomic memory location must always be accessed with atomic | |
| 96 primitives, and these primitives must always be of the same bit size | |
| 97 for that location. | |
| 98 - Not all memory orderings are valid for all atomic operations. | |
| 99 | |
| 17 Volatile Memory Accesses | 100 Volatile Memory Accesses |
| 18 ------------------------ | 101 ------------------------ |
| 19 | 102 |
| 20 The C11/C++11 standards mandate that ``volatile`` accesses execute in program | 103 The C11/C++11 standards mandate that ``volatile`` accesses execute in |
| 21 order (but are not fences, so other memory operations can reorder around them), | 104 program order (but are not fences, so other memory operations can |
| 22 are not necessarily atomic, and can’t be elided. They can be separated into | 105 reorder around them), are not necessarily atomic, and can’t be |
| 23 smaller width accesses. | 106 elided. They can be separated into smaller width accesses. |
| 24 | 107 |
| 25 The PNaCl toolchain applies regular LLVM optimizations along these guidelines, | 108 Before any optimizations occur the PNaCl toolchain transforms |
| 26 and it further prevents any load/store (even non-``volatile`` and non-atomic | 109 ``volatile`` loads and stores into sequentially consistent ``volatile`` |
| 27 ones) from moving above or below a volatile operations: they act as compiler | 110 atomic loads and stores, and applies regular LLVM optimizations along |
| 28 barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` | 111 the above guidelines. This orders ``volatiles`` according to the atomic |
| 29 accesses after optimizations into atomic accesses with sequentially consistent | 112 rules, and means that fences (including ``__sync_synchronize``) act in a |
| 30 memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and | 113 better-defined manner. Regular memory accesses still do not have |
| 31 combined with builtin fences these programs can do meaningful cross-thread | 114 ordering guarantees with ``volatile`` and atomic accesses, though the |
| 32 communication without changing code. It also reflects the original code's intent | 115 internal representation of ``__sync_synchronize`` attempts to prevent |
| 33 and guarantees better portability. | 116 reordering of memory accesses to objects which may escape. |
| 34 | 117 |
| 35 Relaxed ordering could be used instead, but for the first release it is more | 118 Relaxed ordering could be used instead, but for the first release it is |
| 36 conservative to apply sequential consistency. Future releases may change what | 119 more conservative to apply sequential consistency. Future releases may |
| 37 happens at compile-time, but already-released pexes will continue using | 120 change what happens at compile-time, but already-released pexes will |
| 38 sequential consistency. | 121 continue using sequential consistency. |
| 39 | 122 |
| 40 The PNaCl toolchain also requires that ``volatile`` accesses be at least | 123 The PNaCl toolchain also requires that ``volatile`` accesses be at least |
| 41 naturally aligned, and tries to guarantee this alignment. | 124 naturally aligned, and tries to guarantee this alignment. |
| 42 | 125 |
| 43 Memory Model for Concurrent Operations | 126 The above guarantees ease the support of legacy (i.e. non-C11/C++11) |
| 44 -------------------------------------- | 127 code, and combined with builtin fences these programs can do meaningful |
| 128 cross-thread communication without changing code. They also better | |
| 129 reflect the original code's intent and guarantee better portability. | |
| 45 | 130 |
| 46 The memory model offered by PNaCl relies on the same coding guidelines as the | 131 Stable Transfer Format |
| 47 C11/C++11 one: concurrent accesses must always occur through atomic primitives | 132 ---------------------- |
| 48 (offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and | |
| 49 these accesses must always occur with the same size for the same memory | |
| 50 location. Visibility of stores is provided on a happens-before basis that | |
| 51 relates memory locations to each other as the C11/C++11 standards do. | |
| 52 | 133 |
| 53 As in C11/C++11 some atomic accesses may be implemented with locks on certain | 134 The PNaCl toolchain freezes atomic and ``volatile`` memory accesses |
| 54 platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying | 135 after optimizations into atomic accesses with sequentially consistent |
|
jvoung (off chromium)
2013/08/05 18:59:41
re "after optimizations": volatiles get converted
JF
2013/08/05 20:37:48
OK, I can remove this section.
| |
| 55 that all types are sometimes lock-free. The ``is_lock_free`` methods will return | 136 memory ordering. Other memory orderings will be exposed in future |
| 56 the current platform's implementation at translation time. | 137 releases, when we have a better grasp of existing code's needs, |
| 138 portability implications, and are confident that implementation limits | |
| 139 are overcome. Future releases may change what happens at compile-time, | |
| 140 but already-released pexes will continue using sequential consistency. | |
| 57 | 141 |
| 58 The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style | 142 Non-atomic and non-``volatile`` memory accesses may be reordered, |
| 59 ``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. | 143 separated, elided or fused according to C and C++'s memory model before |
| 60 ``volatile`` memory accesses can also be used, though these are discouraged, and | 144 the pexe is created as well as after its creation. |
| 61 aren't present in bitcode. | |
| 62 | |
| 63 PNaCl supports concurrency and parallelism with some restrictions: | |
| 64 | |
| 65 * Threading is explicitly supported. | |
| 66 | |
| 67 * Inter-process communication through shared memory is limited to operations | |
| 68 which are lock-free on the current platform (``is_lock_free`` methods). This | |
| 69 may change at a later date. | |
| 70 | |
| 71 * Direct interaction with device memory isn't supported. | |
| 72 | |
| 73 * Signal handling isn't supported, PNaCl therefore promotes all primitives to | |
| 74 cross-thread (instead of single-thread). This may change at a later date. Note | |
| 75 that using atomic operations which aren't lock-free may lead to deadlocks when | |
| 76 handling asynchronous signals. | |
| 77 | |
| 78 * ``volatile`` and atomic operations are address-free (operations on the same | |
| 79 memory location via two different addresses work atomically), as intended by | |
| 80 the C11/C++11 standards. This is critical for inter-process communication as | |
| 81 well as synchronous "external modifications" such as mapping underlying memory | |
| 82 at multiple locations. | |
| 83 | |
| 84 Setting up the above mechanisms requires assistance from the embedding sandbox's | |
| 85 runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through | |
| 86 regular C/C++ code. | |
| 87 | |
| 88 The PNaCl toolchain currently optimizes for memory ordering as LLVM normally | |
| 89 does, but at pexe creation time it promotes all ``volatile`` accesses as well as | |
| 90 all atomic accesses to be sequentially consistent. Other memory orderings will | |
| 91 be supported in a future release, but pexes generated with the current toolchain | |
| 92 will continue functioning with sequential consistency. Using sequential | |
| 93 consistency provides a total ordering for all sequentially-consistent operations | |
| 94 on all addresses. | |
| 95 | |
| 96 This means that ``volatile`` and atomic memory accesses can only be re-ordered | |
| 97 in some limited way before the pexe is created, and will act as fences for all | |
| 98 memory accesses (even non-atomic and non-``volatile``) after pexe creation. | |
| 99 Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence | |
| 100 intervenes), separated, elided or fused according to C and C++'s memory model | |
| 101 before the pexe is created as well as after its creation. | |
| 102 | |
| 103 Atomic Memory Ordering Constraints | |
| 104 ---------------------------------- | |
| 105 | |
| 106 Atomics follow the same ordering constraints as in regular LLVM, but | |
| 107 all accesses are promoted to sequential consistency (the strongest | |
| 108 memory ordering) at pexe creation time. As more C11/C++11 code | |
| 109 allows us to understand performance and portability needs we intend | |
| 110 to support the full gamut of C11/C++11 memory orderings: | |
| 111 | |
| 112 - Relaxed: no operation orders memory. | |
| 113 - Consume: a load operation performs a consume operation on the affected memory | |
| 114 location (currently unsupported by LLVM). | |
| 115 - Acquire: a load operation performs an acquire operation on the affected memory | |
| 116 location. | |
| 117 - Release: a store operation performs a release operation on the affected memory | |
| 118 location. | |
| 119 - Acquire-release: load and store operations perform acquire and release | |
| 120 operations on the affected memory. | |
| 121 - Sequentially consistent: same as acquire-release, but providing a global total | |
| 122 ordering for all affected locations. | |
| 123 | |
| 124 As in C11/C++11: | |
| 125 | |
| 126 - Atomic accesses must at least be naturally aligned. | |
| 127 - Some accesses may not actually be atomic on certain platforms, requiring an | |
| 128 implementation that uses a global lock. | |
| 129 - An atomic memory location must always be accessed with atomic primitives, and | |
| 130 these primitives must always be of the same bit size for that location. | |
| 131 - Not all memory orderings are valid for all atomic operations. | |
| 132 | 145 |
| 133 Inline Assembly | 146 Inline Assembly |
| 134 =============== | 147 =============== |
| 135 | 148 |
| 136 Inline assembly isn't supported by PNaCl because it isn't portable. The | 149 Inline assembly isn't supported by PNaCl because it isn't portable. The |
| 137 one current exception is the common compiler barrier idiom | 150 one current exception is the common compiler barrier idiom |
| 138 ``asm("":::"memory")``, which gets transformed to a sequentially | 151 ``asm("":::"memory")``, which gets transformed to a sequentially |
| 139 consistent memory barrier (equivalent to ``__sync_synchronize()``). | 152 consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
| 153 PNaCl this barrier is only guaranteed to order ``volatile`` and atomic | |
| 154 memory accesses, though in practice the implementation attempts to also | |
| 155 prevent reordering of memory accesses to objects which may escape. | |
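
The cross-thread guidelines in the new "Memory Model for Concurrent Operations" text (share data only through atomic primitives, always with the same access size) can be sketched in plain C++11. This is a minimal illustration and not code from the patch: the names `payload`, `ready`, `producer`, `consumer`, `run_handoff` and `flag_is_lock_free` are hypothetical, and a hosted C++11 toolchain with `<thread>` is assumed. The release/acquire orderings are what portable C++11 code would typically request; per the documentation under review, the PNaCl toolchain would promote them to sequential consistency at pexe creation time.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names for illustration; none of them come from the patch.
static int payload = 0;                 // plain data, published via the flag
static std::atomic<bool> ready(false);  // only ever accessed atomically

// Writer: fill in the data, then publish it with a release store.
static void producer() {
  payload = 42;
  ready.store(true, std::memory_order_release);
}

// Reader: spin until the flag is set, then read the data.
static int consumer() {
  while (!ready.load(std::memory_order_acquire)) {
    std::this_thread::yield();
  }
  // The acquire load that saw `true` happens-after the release store,
  // so the producer's write to `payload` is visible here.
  return payload;
}

// Runs one handoff between two threads and returns the value received.
int run_handoff() {
  payload = 0;
  ready.store(false);
  std::thread writer(producer);
  int seen = consumer();
  writer.join();
  assert(ready.load());  // the flag stays set once the handoff is done
  return seen;
}

// Reports whether the flag is lock-free on the platform running the code.
bool flag_is_lock_free() {
  return ready.is_lock_free();
}
```

Calling `run_handoff()` from a `main` function returns 42. A `volatile bool` in place of `std::atomic<bool>` would not give this guarantee in standard C++11, which is consistent with the text discouraging `volatile` for cross-thread communication.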