Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 ======================= | 1 ======================= |
| 2 PNaCl Developer's Guide | 2 PNaCl Developer's Guide |
| 3 ======================= | 3 ======================= |
| 4 | 4 |
| 5 .. contents:: | 5 .. contents:: |
| 6 :local: | 6 :local: |
| 7 :depth: 3 | 7 :depth: 3 |
| 8 | 8 |
| 9 Introduction | 9 Introduction |
| 10 ============ | 10 ============ |
| 11 | 11 |
| 12 TODO | 12 TODO |
| 13 | 13 |
| 14 Memory Model and Atomics | 14 Memory Model and Atomics |
| 15 ======================== | 15 ======================== |
| 16 | 16 |
| 17 Memory Model for Concurrent Operations | |
| 18 -------------------------------------- | |
| 19 | |
| 20 The memory model offered by PNaCl relies on the same coding guidelines | |
| 21 as the C11/C++11 one: concurrent accesses must always occur through | |
| 22 atomic primitives (offered by `atomic intrinsics | |
| 23 <PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always | |
| 24 occur with the same size for the same memory location. Visibility of | |
| 25 stores is provided on a happens-before basis that relates memory | |
| 26 locations to each other as the C11/C++11 standards do. | |
| 27 | |
| 28 Non-atomic memory accesses may be reordered, separated, elided or fused | |
| 29 according to C and C++'s memory model before the pexe is created as well | |
| 30 as after its creation. | |
| 31 | |
| 32 As in C11/C++11 some atomic accesses may be implemented with locks on | |
| 33 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be | |
| 34 ``1``, signifying that all types are sometimes lock-free. The | |
| 35 ``is_lock_free`` methods and ``atomic_is_lock_free`` will return the | |
| 36 current platform's implementation at translation time. These macros, | |
| 37 methods and functions are in the C11 header ``<stdatomic.h>`` and the | |
| 38 C++11 header ``<atomic>``. | |
| 39 | |
| 40 The PNaCl toolchain supports concurrent memory accesses through legacy | |
| 41 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic | |
| 42 primitives. ``volatile`` memory accesses can also be used, though these | |
| 43 are discouraged. See `Volatile Memory Accesses`_. | |
| 44 | |
| 45 PNaCl supports concurrency and parallelism with some restrictions: | |
| 46 | |
| 47 * Threading is explicitly supported and has no restrictions over what | |
| 48 prevalent implementations offer. See `Threading`_. | |
| 49 | |
| 50 * ``volatile`` and atomic operations are address-free (operations on the | |
| 51 same memory location via two different addresses work atomically), as | |
| 52 intended by the C11/C++11 standards. This is critical for | |
| 53 inter-process communication as well as synchronous "external | |
| 54 modifications" such as mapping underlying memory at multiple | |
| 55 locations. | |
| 56 | |
| 57 * Inter-process communication through shared memory is currently not | |
| 58 supported. See `Future Direction`_. | |
| 59 | |
| 60 * Signal handling isn't supported, PNaCl therefore promotes all | |
| 61 primitives to cross-thread (instead of single-thread). This may change | |
| 62 at a later date. Note that using atomic operations which aren't | |
| 63 lock-free may lead to deadlocks when handling asynchronous | |
| 64 signals. See `Future Direction`_. | |
| 65 | |
| 66 * Direct interaction with device memory isn't supported, and there is no | |
| 67 intent to support it. The embedding sandbox's runtime can offer APIs | |
| 68 to indirectly access devices. | |
| 69 | |
| 70 Setting up the above mechanisms requires assistance from the embedding | |
| 71 sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once setup | |
| 72 can be done through regular C/C++ code. | |
| 73 | |
| 74 Atomic Memory Ordering Constraints | |
| 75 ---------------------------------- | |
| 76 | |
| 77 Atomics follow the same ordering constraints as in regular C11/C++11, | |
| 78 but all accesses are promoted to sequential consistency (the strongest | |
| 79 memory ordering) at pexe creation time. As more C11/C++11 code allows us | |
| 80 to understand performance and portability needs we intend to support the | |
| 81 full gamut of C11/C++11 memory orderings: | |
| 82 | |
| 83 - Relaxed: no operation orders memory. | |
| 84 - Consume: a load operation performs a consume operation on the affected | |
| 85 memory location (note: currently unsupported by LLVM). | |
| 86 - Acquire: a load operation performs an acquire operation on the | |
| 87 affected memory location. | |
| 88 - Release: a store operation performs a release operation on the | |
| 89 affected memory location. | |
| 90 - Acquire-release: load and store operations perform acquire and release | |
| 91 operations on the affected memory. | |
| 92 - Sequentially consistent: same as acquire-release, but providing a | |
| 93 global total ordering for all affected locations. | |
| 94 | |
| 95 As in C11/C++11: | |
| 96 | |
| 97 - Atomic accesses must at least be naturally aligned. | |
| 98 - Some accesses may not actually be atomic on certain platforms, | |
| 99 requiring an implementation that uses global lock(s). | |
| 100 - An atomic memory location must always be accessed with atomic | |
| 101 primitives, and these primitives must always be of the same bit size | |
| 102 for that location. | |
| 103 - Not all memory orderings are valid for all atomic operations. | |
| 104 | |
| 17 Volatile Memory Accesses | 105 Volatile Memory Accesses |
| 18 ------------------------ | 106 ------------------------ |
| 19 | 107 |
| 20 The C11/C++11 standards mandate that ``volatile`` accesses execute in program | 108 The C11/C++11 standards mandate that ``volatile`` accesses execute in |
| 21 order (but are not fences, so other memory operations can reorder around them), | 109 program order (but are not fences, so other memory operations can |
| 22 are not necessarily atomic, and can’t be elided. They can be separated into | 110 reorder around them), are not necessarily atomic, and can’t be |
| 23 smaller width accesses. | 111 elided. They can be separated into smaller width accesses. |
| 24 | 112 |
| 25 The PNaCl toolchain applies regular LLVM optimizations along these guidelines, | 113 Before any optimizations occur the PNaCl toolchain transforms |
| 26 and it further prevents any load/store (even non-``volatile`` and non-atomic | 114 ``volatile`` loads and stores into sequentially consistent ``volatile`` |
| 27 ones) from moving above or below a volatile operations: they act as compiler | 115 atomic loads and stores, and applies regular compiler optimizations |
| 28 barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` | 116 along the above guidelines. This orders ``volatiles`` according to the |
| 29 accesses after optimizations into atomic accesses with sequentially consistent | 117 atomic rules, and means that fences (including ``__sync_synchronize``) |
| 30 memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and | 118 act in a better-defined manner. Regular memory accesses still do not |
| 31 combined with builtin fences these programs can do meaningful cross-thread | 119 have ordering guarantees with ``volatile`` and atomic accesses, though |
| 32 communication without changing code. It also reflects the original code's intent | 120 the internal representation of ``__sync_synchronize`` attempts to |
| 33 and guarantees better portability. | 121 prevent reordering of memory accesses to objects which may escape. |
| 34 | 122 |
| 35 Relaxed ordering could be used instead, but for the first release it is more | 123 Relaxed ordering could be used instead, but for the first release it is |
| 36 conservative to apply sequential consistency. Future releases may change what | 124 more conservative to apply sequential consistency. Future releases may |
| 37 happens at compile-time, but already-released pexes will continue using | 125 change what happens at compile-time, but already-released pexes will |
| 38 sequential consistency. | 126 continue using sequential consistency. |
| 39 | 127 |
| 40 The PNaCl toolchain also requires that ``volatile`` accesses be at least | 128 The PNaCl toolchain also requires that ``volatile`` accesses be at least |
| 41 naturally aligned, and tries to guarantee this alignment. | 129 naturally aligned, and tries to guarantee this alignment. |
| 42 | 130 |
| 43 Memory Model for Concurrent Operations | 131 The above guarantees ease the support of legacy (i.e. non-C11/C++11) |
| 44 -------------------------------------- | 132 code, and combined with builtin fences these programs can do meaningful |
| 133 cross-thread communication without changing code. They also better | |
| 134 reflect the original code's intent and guarantee better portability. | |
| 45 | 135 |
| 46 The memory model offered by PNaCl relies on the same coding guidelines as the | 136 Threading |
| 47 C11/C++11 one: concurrent accesses must always occur through atomic primitives | 137 ========= |
| 48 (offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and | |
| 49 these accesses must always occur with the same size for the same memory | |
| 50 location. Visibility of stores is provided on a happens-before basis that | |
| 51 relates memory locations to each other as the C11/C++11 standards do. | |
| 52 | 138 |
| 53 As in C11/C++11 some atomic accesses may be implemented with locks on certain | 139 Threading is explicitly supported through C11/C++11's threading |
| 54 platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying | 140 libraries as well as POSIX threads. |
| 55 that all types are sometimes lock-free. The ``is_lock_free`` methods will return | |
| 56 the current platform's implementation at translation time. | |
| 57 | 141 |
| 58 The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style | 142 Communication between threads should use atomic primitives as described |
| 59 ``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. | 143 in `Memory Model and Atomics`_. |
| 60 ``volatile`` memory accesses can also be used, though these are discouraged, and | |
| 61 aren't present in bitcode. | |
| 62 | |
| 63 PNaCl supports concurrency and parallelism with some restrictions: | |
| 64 | |
| 65 * Threading is explicitly supported. | |
| 66 | |
| 67 * Inter-process communication through shared memory is limited to operations | |
| 68 which are lock-free on the current platform (``is_lock_free`` methods). This | |
| 69 may change at a later date. | |
| 70 | |
| 71 * Direct interaction with device memory isn't supported. | |
| 72 | |
| 73 * Signal handling isn't supported, PNaCl therefore promotes all primitives to | |
| 74 cross-thread (instead of single-thread). This may change at a later date. Note | |
| 75 that using atomic operations which aren't lock-free may lead to deadlocks when | |
| 76 handling asynchronous signals. | |
| 77 | |
| 78 * ``volatile`` and atomic operations are address-free (operations on the same | |
| 79 memory location via two different addresses work atomically), as intended by | |
| 80 the C11/C++11 standards. This is critical for inter-process communication as | |
| 81 well as synchronous "external modifications" such as mapping underlying memory | |
| 82 at multiple locations. | |
| 83 | |
| 84 Setting up the above mechanisms requires assistance from the embedding sandbox's | |
| 85 runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through | |
| 86 regular C/C++ code. | |
| 87 | |
| 88 The PNaCl toolchain currently optimizes for memory ordering as LLVM normally | |
| 89 does, but at pexe creation time it promotes all ``volatile`` accesses as well as | |
| 90 all atomic accesses to be sequentially consistent. Other memory orderings will | |
| 91 be supported in a future release, but pexes generated with the current toolchain | |
| 92 will continue functioning with sequential consistency. Using sequential | |
| 93 consistency provides a total ordering for all sequentially-consistent operations | |
| 94 on all addresses. | |
| 95 | |
| 96 This means that ``volatile`` and atomic memory accesses can only be re-ordered | |
| 97 in some limited way before the pexe is created, and will act as fences for all | |
| 98 memory accesses (even non-atomic and non-``volatile``) after pexe creation. | |
| 99 Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence | |
| 100 intervenes), separated, elided or fused according to C and C++'s memory model | |
| 101 before the pexe is created as well as after its creation. | |
| 102 | |
| 103 Atomic Memory Ordering Constraints | |
| 104 ---------------------------------- | |
| 105 | |
| 106 Atomics follow the same ordering constraints as in regular LLVM, but | |
| 107 all accesses are promoted to sequential consistency (the strongest | |
| 108 memory ordering) at pexe creation time. As more C11/C++11 code | |
| 109 allows us to understand performance and portability needs we intend | |
| 110 to support the full gamut of C11/C++11 memory orderings: | |
| 111 | |
| 112 - Relaxed: no operation orders memory. | |
| 113 - Consume: a load operation performs a consume operation on the affected memory | |
| 114 location (currently unsupported by LLVM). | |
| 115 - Acquire: a load operation performs an acquire operation on the affected memory | |
| 116 location. | |
| 117 - Release: a store operation performs a release operation on the affected memory | |
| 118 location. | |
| 119 - Acquire-release: load and store operations perform acquire and release | |
| 120 operations on the affected memory. | |
| 121 - Sequentially consistent: same as acquire-release, but providing a global total | |
| 122 ordering for all affected locations. | |
| 123 | |
| 124 As in C11/C++11: | |
| 125 | |
| 126 - Atomic accesses must at least be naturally aligned. | |
| 127 - Some accesses may not actually be atomic on certain platforms, requiring an | |
| 128 implementation that uses a global lock. | |
| 129 - An atomic memory location must always be accessed with atomic primitives, and | |
| 130 these primitives must always be of the same bit size for that location. | |
| 131 - Not all memory orderings are valid for all atomic operations. | |
| 132 | 144 |
| 133 Inline Assembly | 145 Inline Assembly |
| 134 =============== | 146 =============== |
| 135 | 147 |
| 136 Inline assembly isn't supported by PNaCl because it isn't portable. The | 148 Inline assembly isn't supported by PNaCl because it isn't portable. The |
| 137 one current exception is the common compiler barrier idiom | 149 one current exception is the common compiler barrier idiom |
| 138 ``asm("":::"memory")``, which gets transformed to a sequentially | 150 ``asm("":::"memory")``, which gets transformed to a sequentially |
| 139 consistent memory barrier (equivalent to ``__sync_synchronize()``). | 151 consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
| 152 PNaCl this barrier is only guaranteed to order ``volatile`` and atomic | |
| 153 memory accesses, though in practice the implementation attempts to also | |
| 154 prevent reordering of memory accesses to objects which may escape. | |
| 155 | |
| 156 Future Direction | |
| eliben (2013/08/06 16:11:35): "Future Directions" ? | |
| JF (2013/08/06 16:19:33): Done. | |
| 157 ================ | |
| 158 | |
| 159 Inter-Process Communication | |
| 160 --------------------------- | |
| 161 | |
| 162 Inter-process communication through shared memory is currently not | |
| 163 supported by PNaCl. When implemented, it may be limited to operations | |
| 164 which are lock-free on the current platform (``is_lock_free`` methods). | |
| 165 | |
| 166 Signal Handling | |
| 167 --------------- | |
| 168 | |
| 169 Untrusted signal handling currently isn't supported by PNaCl. When | |
| 170 supported, the impact of ``volatile`` and atomics for same-thread signal | |
| 171 handling will need to be carefully detailed. | |
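
The "Memory Model for Concurrent Operations" text in the patch says cross-thread communication should go through C11/C++11 atomic primitives, and that all atomic accesses are currently promoted to sequential consistency. A minimal sketch of what that looks like in ordinary C++11 follows. The names (`payload`, `ready`, `Publish`, `Consume`, `RunMessagePassing`, `FlagIsLockFree`) are hypothetical and are not part of the patch. Only standard `<atomic>` and `<thread>` calls are used, and whether the flag is lock-free is assumed to depend on the target, as the patch describes.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration only; nothing here comes from the patch.
static int payload = 0;                 // ordinary, non-atomic data
static std::atomic<bool> ready(false);  // atomic flag that publishes payload

// Writer: fill in the data, then publish it with an atomic store.
void Publish(int value) {
  payload = value;
  ready.store(true);  // defaults to std::memory_order_seq_cst
}

// Reader: wait until the flag is set, then read the data.
int Consume() {
  while (!ready.load()) {  // defaults to std::memory_order_seq_cst
    std::this_thread::yield();
  }
  return payload;  // ordered after the writer's store by happens-before
}

// Runs the writer and the reader on two threads and returns what the
// reader observed.
int RunMessagePassing(int value) {
  ready.store(false);
  int seen = 0;
  std::thread reader([&seen] { seen = Consume(); });
  std::thread writer([value] { Publish(value); });
  writer.join();
  reader.join();
  return seen;
}

// Reports whether the flag is lock-free on the platform running the code.
// The result is platform-dependent, so callers should not hard-code it.
bool FlagIsLockFree() {
  return ready.is_lock_free();
}
```

Leaving the memory order arguments at their defaults matches the promotion to sequential consistency described in the patch, so weaker orderings written here would presumably behave the same way in current pexes. The non-atomic `payload` is safe to read only because the atomic flag establishes the happens-before edge; a plain `bool` flag would be a data race.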