OLD | NEW |
1 ======================= | 1 ======================= |
2 PNaCl Developer's Guide | 2 PNaCl Developer's Guide |
3 ======================= | 3 ======================= |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
11 | 11 |
12 TODO | 12 TODO |
13 | 13 |
14 Memory Model and Atomics | 14 Memory Model and Atomics |
15 ======================== | 15 ======================== |
16 | 16 |
| 17 Memory Model for Concurrent Operations |
| 18 -------------------------------------- |
| 19 |
| 20 The memory model offered by PNaCl relies on the same coding guidelines |
| 21 as the C11/C++11 one: concurrent accesses must always occur through |
| 22 atomic primitives (offered by `atomic intrinsics |
| 23 <PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always |
| 24 occur with the same size for the same memory location. Visibility of |
| 25 stores is provided on a happens-before basis that relates memory |
| 26 locations to each other as the C11/C++11 standards do. |
| 27 |
| 28 Non-atomic memory accesses may be reordered, separated, elided or fused |
| 29 according to C and C++'s memory model before the pexe is created as well |
| 30 as after its creation. |
| 31 |
| 32 As in C11/C++11 some atomic accesses may be implemented with locks on |
| 33 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be |
| 34 ``1``, signifying that all types are sometimes lock-free. The |
| 35 ``is_lock_free`` methods and ``atomic_is_lock_free`` report whether |
| 36 the current platform's implementation is lock-free, as determined at |
| 37 translation time. These macros, methods and functions are in the C11 |
| 38 header ``<stdatomic.h>`` and the C++11 header ``<atomic>``. |
| 39 |
| 40 The PNaCl toolchain supports concurrent memory accesses through legacy |
| 41 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic |
| 42 primitives. ``volatile`` memory accesses can also be used, though these |
| 43 are discouraged. See `Volatile Memory Accesses`_. |
| 44 |
| 45 PNaCl supports concurrency and parallelism with some restrictions: |
| 46 |
| 47 * Threading is explicitly supported, with no restrictions beyond those |
| 48 of prevalent implementations. See `Threading`_. |
| 49 |
| 50 * ``volatile`` and atomic operations are address-free (operations on the |
| 51 same memory location via two different addresses work atomically), as |
| 52 intended by the C11/C++11 standards. This is critical in supporting |
| 53 synchronous "external modifications" such as mapping underlying memory |
| 54 at multiple locations. |
| 55 |
| 56 * Inter-process communication through shared memory is currently not |
| 57 supported. See `Future Directions`_. |
| 58 |
| 59 * Signal handling isn't supported; PNaCl therefore promotes all |
| 60 primitives to cross-thread (instead of single-thread). This may change |
| 61 at a later date. Note that using atomic operations which aren't |
| 62 lock-free may lead to deadlocks when handling asynchronous |
| 63 signals. See `Future Directions`_. |
| 64 |
| 65 * Direct interaction with device memory isn't supported, and there is no |
| 66 intent to support it. The embedding sandbox's runtime can offer APIs |
| 67 to indirectly access devices. |
| 68 |
| 69 Setting up the above mechanisms requires assistance from the embedding |
| 70 sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up |
| 71 can be done through regular C/C++ code. |
| 72 |
| 73 Atomic Memory Ordering Constraints |
| 74 ---------------------------------- |
| 75 |
| 76 Atomics follow the same ordering constraints as in regular C11/C++11, |
| 77 but all accesses are promoted to sequential consistency (the strongest |
| 78 memory ordering) at pexe creation time. As more C11/C++11 code allows us |
| 79 to understand performance and portability needs we intend to support the |
| 80 full gamut of C11/C++11 memory orderings: |
| 81 |
| 82 - Relaxed: no operation orders memory. |
| 83 - Consume: a load operation performs a consume operation on the affected |
| 84 memory location (note: currently unsupported by LLVM). |
| 85 - Acquire: a load operation performs an acquire operation on the |
| 86 affected memory location. |
| 87 - Release: a store operation performs a release operation on the |
| 88 affected memory location. |
| 89 - Acquire-release: load and store operations perform acquire and release |
| 90 operations on the affected memory. |
| 91 - Sequentially consistent: same as acquire-release, but providing a |
| 92 global total ordering for all affected locations. |
| 93 |
| 94 As in C11/C++11: |
| 95 |
| 96 - Atomic accesses must at least be naturally aligned. |
| 97 - Some accesses may not actually be atomic on certain platforms, |
| 98 requiring an implementation that uses global lock(s). |
| 99 - An atomic memory location must always be accessed with atomic |
| 100 primitives, and these primitives must always be of the same bit size |
| 101 for that location. |
| 102 - Not all memory orderings are valid for all atomic operations. |
| 103 |
17 Volatile Memory Accesses | 104 Volatile Memory Accesses |
18 ------------------------ | 105 ------------------------ |
19 | 106 |
20 The C11/C++11 standards mandate that ``volatile`` accesses execute in program | 107 The C11/C++11 standards mandate that ``volatile`` accesses execute in |
21 order (but are not fences, so other memory operations can reorder around them), | 108 program order (but are not fences, so other memory operations can |
22 are not necessarily atomic, and can’t be elided. They can be separated into | 109 reorder around them), are not necessarily atomic, and can’t be |
23 smaller width accesses. | 110 elided. They can be separated into smaller width accesses. |
24 | 111 |
25 The PNaCl toolchain applies regular LLVM optimizations along these guidelines, | 112 Before any optimizations occur the PNaCl toolchain transforms |
26 and it further prevents any load/store (even non-``volatile`` and non-atomic | 113 ``volatile`` loads and stores into sequentially consistent ``volatile`` |
27 ones) from moving above or below a volatile operations: they act as compiler | 114 atomic loads and stores, and applies regular compiler optimizations |
28 barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` | 115 along the above guidelines. This orders ``volatile`` accesses according to the |
29 accesses after optimizations into atomic accesses with sequentially consistent | 116 atomic rules, and means that fences (including ``__sync_synchronize``) |
30 memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and | 117 act in a better-defined manner. Regular memory accesses still do not |
31 combined with builtin fences these programs can do meaningful cross-thread | 118 have ordering guarantees with ``volatile`` and atomic accesses, though |
32 communication without changing code. It also reflects the original code's intent | 119 the internal representation of ``__sync_synchronize`` attempts to |
33 and guarantees better portability. | 120 prevent reordering of memory accesses to objects which may escape. |
34 | 121 |
35 Relaxed ordering could be used instead, but for the first release it is more | 122 Relaxed ordering could be used instead, but for the first release it is |
36 conservative to apply sequential consistency. Future releases may change what | 123 more conservative to apply sequential consistency. Future releases may |
37 happens at compile-time, but already-released pexes will continue using | 124 change what happens at compile-time, but already-released pexes will |
38 sequential consistency. | 125 continue using sequential consistency. |
39 | 126 |
40 The PNaCl toolchain also requires that ``volatile`` accesses be at least | 127 The PNaCl toolchain also requires that ``volatile`` accesses be at least |
41 naturally aligned, and tries to guarantee this alignment. | 128 naturally aligned, and tries to guarantee this alignment. |
42 | 129 |
43 Memory Model for Concurrent Operations | 130 The above guarantees ease the support of legacy (i.e. non-C11/C++11) |
44 -------------------------------------- | 131 code, and combined with builtin fences these programs can do meaningful |
| 132 cross-thread communication without changing code. They also better |
| 133 reflect the original code's intent and guarantee better portability. |
45 | 134 |
46 The memory model offered by PNaCl relies on the same coding guidelines as the | 135 Threading |
47 C11/C++11 one: concurrent accesses must always occur through atomic primitives | 136 ========= |
48 (offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and | |
49 these accesses must always occur with the same size for the same memory | |
50 location. Visibility of stores is provided on a happens-before basis that | |
51 relates memory locations to each other as the C11/C++11 standards do. | |
52 | 137 |
53 As in C11/C++11 some atomic accesses may be implemented with locks on certain | 138 Threading is explicitly supported through C11/C++11's threading |
54 platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying | 139 libraries as well as POSIX threads. |
55 that all types are sometimes lock-free. The ``is_lock_free`` methods will return | |
56 the current platform's implementation at translation time. | |
57 | 140 |
58 The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style | 141 Communication between threads should use atomic primitives as described |
59 ``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. | 142 in `Memory Model and Atomics`_. |
60 ``volatile`` memory accesses can also be used, though these are discouraged, and | |
61 aren't present in bitcode. | |
62 | |
63 PNaCl supports concurrency and parallelism with some restrictions: | |
64 | |
65 * Threading is explicitly supported. | |
66 | |
67 * Inter-process communication through shared memory is limited to operations | |
68 which are lock-free on the current platform (``is_lock_free`` methods). This | |
69 may change at a later date. | |
70 | |
71 * Direct interaction with device memory isn't supported. | |
72 | |
73 * Signal handling isn't supported, PNaCl therefore promotes all primitives to | |
74 cross-thread (instead of single-thread). This may change at a later date. Note | |
75 that using atomic operations which aren't lock-free may lead to deadlocks when | |
76 handling asynchronous signals. | |
77 | |
78 * ``volatile`` and atomic operations are address-free (operations on the same | |
79 memory location via two different addresses work atomically), as intended by | |
80 the C11/C++11 standards. This is critical for inter-process communication as | |
81 well as synchronous "external modifications" such as mapping underlying memory | |
82 at multiple locations. | |
83 | |
84 Setting up the above mechanisms requires assistance from the embedding sandbox's | |
85 runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through | |
86 regular C/C++ code. | |
87 | |
88 The PNaCl toolchain currently optimizes for memory ordering as LLVM normally | |
89 does, but at pexe creation time it promotes all ``volatile`` accesses as well as | |
90 all atomic accesses to be sequentially consistent. Other memory orderings will | |
91 be supported in a future release, but pexes generated with the current toolchain | |
92 will continue functioning with sequential consistency. Using sequential | |
93 consistency provides a total ordering for all sequentially-consistent operations | |
94 on all addresses. | |
95 | |
96 This means that ``volatile`` and atomic memory accesses can only be re-ordered | |
97 in some limited way before the pexe is created, and will act as fences for all | |
98 memory accesses (even non-atomic and non-``volatile``) after pexe creation. | |
99 Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence | |
100 intervenes), separated, elided or fused according to C and C++'s memory model | |
101 before the pexe is created as well as after its creation. | |
102 | |
103 Atomic Memory Ordering Constraints | |
104 ---------------------------------- | |
105 | |
106 Atomics follow the same ordering constraints as in regular LLVM, but | |
107 all accesses are promoted to sequential consistency (the strongest | |
108 memory ordering) at pexe creation time. As more C11/C++11 code | |
109 allows us to understand performance and portability needs we intend | |
110 to support the full gamut of C11/C++11 memory orderings: | |
111 | |
112 - Relaxed: no operation orders memory. | |
113 - Consume: a load operation performs a consume operation on the affected memory | |
114 location (currently unsupported by LLVM). | |
115 - Acquire: a load operation performs an acquire operation on the affected memory | |
116 location. | |
117 - Release: a store operation performs a release operation on the affected memory | |
118 location. | |
119 - Acquire-release: load and store operations perform acquire and release | |
120 operations on the affected memory. | |
121 - Sequentially consistent: same as acquire-release, but providing a global total | |
122 ordering for all affected locations. | |
123 | |
124 As in C11/C++11: | |
125 | |
126 - Atomic accesses must at least be naturally aligned. | |
127 - Some accesses may not actually be atomic on certain platforms, requiring an | |
128 implementation that uses a global lock. | |
129 - An atomic memory location must always be accessed with atomic primitives, and | |
130 these primitives must always be of the same bit size for that location. | |
131 - Not all memory orderings are valid for all atomic operations. | |
132 | 143 |
133 Inline Assembly | 144 Inline Assembly |
134 =============== | 145 =============== |
135 | 146 |
136 Inline assembly isn't supported by PNaCl because it isn't portable. The | 147 Inline assembly isn't supported by PNaCl because it isn't portable. The |
137 one current exception is the common compiler barrier idiom | 148 one current exception is the common compiler barrier idiom |
138 ``asm("":::"memory")``, which gets transformed to a sequentially | 149 ``asm("":::"memory")``, which gets transformed to a sequentially |
139 consistent memory barrier (equivalent to ``__sync_synchronize()``). | 150 consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
| 151 PNaCl this barrier is only guaranteed to order ``volatile`` and atomic |
| 152 memory accesses, though in practice the implementation attempts to also |
| 153 prevent reordering of memory accesses to objects which may escape. |
| 154 |
| 155 Future Directions |
| 156 ================= |
| 157 |
| 158 Inter-Process Communication |
| 159 --------------------------- |
| 160 |
| 161 Inter-process communication through shared memory is currently not |
| 162 supported by PNaCl. When implemented, it may be limited to operations |
| 163 which are lock-free on the current platform (``is_lock_free`` |
| 164 methods). It will rely on the address-free property discussed in `Memory |
| 165 Model for Concurrent Operations`_. |
| 166 |
| 167 Signal Handling |
| 168 --------------- |
| 169 |
| 170 Untrusted signal handling currently isn't supported by PNaCl. When |
| 171 supported, the impact of ``volatile`` and atomics for same-thread signal |
| 172 handling will need to be carefully detailed. |
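
The coding guideline in the new "Memory Model for Concurrent Operations" text (cross-thread communication goes through atomic primitives, always with the same access size for a given location) can be illustrated with a minimal C++11 sketch. This is not taken from the PNaCl sources: the names ``payload``, ``ready`` and ``run_handoff`` are hypothetical, and only the ``std::atomic`` and ``std::thread`` facilities are standard. The default ordering of ``load`` and ``store`` is sequentially consistent, which is also the ordering the guide says all atomic accesses are promoted to at pexe creation time.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example state: `payload` is ordinary (non-atomic) data,
// `ready` is the only location the two threads race on, so it is atomic
// and always accessed with the same size.
static int payload = 0;
static std::atomic<bool> ready(false);

// Starts a producer and a consumer thread and returns the value the
// consumer observed after the handoff.
int run_handoff() {
  payload = 0;
  ready.store(false);
  int seen = -1;

  std::thread producer([] {
    payload = 42;        // non-atomic write
    ready.store(true);   // sequentially consistent store publishes it
  });

  std::thread consumer([&seen] {
    while (!ready.load()) {        // sequentially consistent load
      std::this_thread::yield();
    }
    seen = payload;      // ordered after the producer's write (happens-before)
  });

  producer.join();
  consumer.join();
  return seen;
}
```

Calling ``run_handoff()`` yields the published value because the store to ``ready`` happens-before the load that observes ``true``. Replacing ``ready`` with a plain ``bool`` would make this a data race under the C11/C++11 model that the guide defers to.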