OLD | NEW |
---|---|
1 ======================= | 1 ======================= |
2 PNaCl Developer's Guide | 2 PNaCl Developer's Guide |
3 ======================= | 3 ======================= |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
11 | 11 |
12 TODO | 12 TODO |
13 | 13 |
14 Memory Model and Atomics | 14 Memory Model and Atomics |
15 ======================== | 15 ======================== |
16 | 16 |
17 Memory Model for Concurrent Operations | |
18 -------------------------------------- | |
19 | |
20 The memory model offered by PNaCl relies on the same coding guidelines | |
21 as the C11/C++11 one: concurrent accesses must always occur through | |
22 atomic primitives (offered by `atomic intrinsics | |
23 <PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always | |
24 occur with the same size for the same memory location. Visibility of | |
25 stores is provided on a happens-before basis that relates memory | |
26 locations to each other as the C11/C++11 standards do. | |
27 | |
28 Non-atomic memory accesses may be reordered, separated, elided or fused | |
29 according to C and C++'s memory model before the pexe is created as well | |
30 as after its creation. | |
31 | |
32 As in C11/C++11, some atomic accesses may be implemented with locks on | |
33 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be | |
34 ``1``, signifying that all types are sometimes lock-free. The | |
35 ``is_lock_free`` methods and ``atomic_is_lock_free`` will report whether | |
36 the current platform's implementation is lock-free, as determined at | |
37 translation time. These macros, methods and functions are in the C11 | |
38 header ``<stdatomic.h>`` and the C++11 header ``<atomic>``. | |
39 | |
40 The PNaCl toolchain supports concurrent memory accesses through legacy | |
41 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic | |
42 primitives. ``volatile`` memory accesses can also be used, though these | |
43 are discouraged. See `Volatile Memory Accesses`_. | |
44 | |
45 PNaCl supports concurrency and parallelism with some restrictions: | |
46 | |
47 * Threading is explicitly supported and has no restrictions over what | |
48 prevalent implementations offer. See `Threading`_. | |
49 | |
50 * ``volatile`` and atomic operations are address-free (operations on the | |
51 same memory location via two different addresses work atomically), as | |
52 intended by the C11/C++11 standards. This is critical for | |
53 inter-process communication as well as synchronous "external | |
54 modifications" such as mapping underlying memory at multiple | |
55 locations. | |
56 | |
57 * Inter-process communication through shared memory is currently not | |
58 supported. See `Future Direction`_. | |
59 | |
60 * Signal handling isn't supported; PNaCl therefore promotes all | |
61 primitives to cross-thread (instead of single-thread). This may change | |
62 at a later date. Note that using atomic operations which aren't | |
63 lock-free may lead to deadlocks when handling asynchronous | |
64 signals. See `Future Direction`_. | |
65 | |
66 * Direct interaction with device memory isn't supported, and there is no | |
67 intent to support it. The embedding sandbox's runtime can offer APIs | |
68 to indirectly access devices. | |
69 | |
70 Setting up the above mechanisms requires assistance from the embedding | |
71 sandbox's runtime (e.g. NaCl's Pepper APIs), but once they are set up | |
72 they can be used through regular C/C++ code. | |
73 | |
74 Atomic Memory Ordering Constraints | |
75 ---------------------------------- | |
76 | |
77 Atomics follow the same ordering constraints as in regular C11/C++11, | |
78 but all accesses are promoted to sequential consistency (the strongest | |
79 memory ordering) at pexe creation time. As more C11/C++11 code allows us | |
80 to understand performance and portability needs, we intend to support the | |
81 full gamut of C11/C++11 memory orderings: | |
82 | |
83 - Relaxed: no operation orders memory. | |
84 - Consume: a load operation performs a consume operation on the affected | |
85 memory location (note: currently unsupported by LLVM). | |
86 - Acquire: a load operation performs an acquire operation on the | |
87 affected memory location. | |
88 - Release: a store operation performs a release operation on the | |
89 affected memory location. | |
90 - Acquire-release: load and store operations perform acquire and release | |
91 operations on the affected memory. | |
92 - Sequentially consistent: same as acquire-release, but providing a | |
93 global total ordering for all affected locations. | |
94 | |
95 As in C11/C++11: | |
96 | |
97 - Atomic accesses must at least be naturally aligned. | |
98 - Some accesses may not actually be atomic on certain platforms, | |
99 requiring an implementation that uses global lock(s). | |
100 - An atomic memory location must always be accessed with atomic | |
101 primitives, and these primitives must always be of the same bit size | |
102 for that location. | |
103 - Not all memory orderings are valid for all atomic operations. | |
104 | |
17 Volatile Memory Accesses | 105 Volatile Memory Accesses |
18 ------------------------ | 106 ------------------------ |
19 | 107 |
20 The C11/C++11 standards mandate that ``volatile`` accesses execute in program | 108 The C11/C++11 standards mandate that ``volatile`` accesses execute in |
21 order (but are not fences, so other memory operations can reorder around them), | 109 program order (but are not fences, so other memory operations can |
22 are not necessarily atomic, and can’t be elided. They can be separated into | 110 reorder around them), are not necessarily atomic, and can’t be |
23 smaller width accesses. | 111 elided. They can be separated into smaller width accesses. |
24 | 112 |
25 The PNaCl toolchain applies regular LLVM optimizations along these guidelines, | 113 Before any optimizations occur, the PNaCl toolchain transforms |
26 and it further prevents any load/store (even non-``volatile`` and non-atomic | 114 ``volatile`` loads and stores into sequentially consistent ``volatile`` |
27 ones) from moving above or below a volatile operations: they act as compiler | 115 atomic loads and stores, and then applies regular compiler optimizations |
28 barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` | 116 along the above guidelines. This orders ``volatile`` accesses according to |
29 accesses after optimizations into atomic accesses with sequentially consistent | 117 the atomic rules, and means that fences (including ``__sync_synchronize``) |
30 memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and | 118 act in a better-defined manner. Regular memory accesses still have no |
31 combined with builtin fences these programs can do meaningful cross-thread | 119 ordering guarantees relative to ``volatile`` and atomic accesses, though |
32 communication without changing code. It also reflects the original code's intent | 120 the internal representation of ``__sync_synchronize`` attempts to |
33 and guarantees better portability. | 121 prevent reordering of memory accesses to objects which may escape. |
34 | 122 |
35 Relaxed ordering could be used instead, but for the first release it is more | 123 Relaxed ordering could be used instead, but for the first release it is |
36 conservative to apply sequential consistency. Future releases may change what | 124 more conservative to apply sequential consistency. Future releases may |
37 happens at compile-time, but already-released pexes will continue using | 125 change what happens at compile-time, but already-released pexes will |
38 sequential consistency. | 126 continue using sequential consistency. |
39 | 127 |
40 The PNaCl toolchain also requires that ``volatile`` accesses be at least | 128 The PNaCl toolchain also requires that ``volatile`` accesses be at least |
41 naturally aligned, and tries to guarantee this alignment. | 129 naturally aligned, and tries to guarantee this alignment. |
42 | 130 |
43 Memory Model for Concurrent Operations | 131 The above guarantees ease the support of legacy (i.e. non-C11/C++11) |
44 -------------------------------------- | 132 code: combined with builtin fences, such programs can do meaningful |
133 cross-thread communication without changing code. They also better | |
134 reflect the original code's intent and improve portability. | |
45 | 135 |
46 The memory model offered by PNaCl relies on the same coding guidelines as the | 136 Threading |
47 C11/C++11 one: concurrent accesses must always occur through atomic primitives | 137 ========= |
48 (offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and | |
49 these accesses must always occur with the same size for the same memory | |
50 location. Visibility of stores is provided on a happens-before basis that | |
51 relates memory locations to each other as the C11/C++11 standards do. | |
52 | 138 |
53 As in C11/C++11 some atomic accesses may be implemented with locks on certain | 139 Threading is explicitly supported through C11/C++11's threading |
54 platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying | 140 libraries as well as POSIX threads. |
55 that all types are sometimes lock-free. The ``is_lock_free`` methods will return | |
56 the current platform's implementation at translation time. | |
57 | 141 |
58 The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style | 142 Communication between threads should use atomic primitives as described |
59 ``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. | 143 in `Memory Model and Atomics`_. |
60 ``volatile`` memory accesses can also be used, though these are discouraged, and | |
61 aren't present in bitcode. | |
62 | |
63 PNaCl supports concurrency and parallelism with some restrictions: | |
64 | |
65 * Threading is explicitly supported. | |
66 | |
67 * Inter-process communication through shared memory is limited to operations | |
68 which are lock-free on the current platform (``is_lock_free`` methods). This | |
69 may change at a later date. | |
70 | |
71 * Direct interaction with device memory isn't supported. | |
72 | |
73 * Signal handling isn't supported, PNaCl therefore promotes all primitives to | |
74 cross-thread (instead of single-thread). This may change at a later date. Note | |
75 that using atomic operations which aren't lock-free may lead to deadlocks when | |
76 handling asynchronous signals. | |
77 | |
78 * ``volatile`` and atomic operations are address-free (operations on the same | |
79 memory location via two different addresses work atomically), as intended by | |
80 the C11/C++11 standards. This is critical for inter-process communication as | |
81 well as synchronous "external modifications" such as mapping underlying memory | |
82 at multiple locations. | |
83 | |
84 Setting up the above mechanisms requires assistance from the embedding sandbox's | |
85 runtime (e.g. NaCl's Pepper APIs), but using them once setup can be done through | |
86 regular C/C++ code. | |
87 | |
88 The PNaCl toolchain currently optimizes for memory ordering as LLVM normally | |
89 does, but at pexe creation time it promotes all ``volatile`` accesses as well as | |
90 all atomic accesses to be sequentially consistent. Other memory orderings will | |
91 be supported in a future release, but pexes generated with the current toolchain | |
92 will continue functioning with sequential consistency. Using sequential | |
93 consistency provides a total ordering for all sequentially-consistent operations | |
94 on all addresses. | |
95 | |
96 This means that ``volatile`` and atomic memory accesses can only be re-ordered | |
97 in some limited way before the pexe is created, and will act as fences for all | |
98 memory accesses (even non-atomic and non-``volatile``) after pexe creation. | |
99 Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence | |
100 intervenes), separated, elided or fused according to C and C++'s memory model | |
101 before the pexe is created as well as after its creation. | |
102 | |
103 Atomic Memory Ordering Constraints | |
104 ---------------------------------- | |
105 | |
106 Atomics follow the same ordering constraints as in regular LLVM, but | |
107 all accesses are promoted to sequential consistency (the strongest | |
108 memory ordering) at pexe creation time. As more C11/C++11 code | |
109 allows us to understand performance and portability needs we intend | |
110 to support the full gamut of C11/C++11 memory orderings: | |
111 | |
112 - Relaxed: no operation orders memory. | |
113 - Consume: a load operation performs a consume operation on the affected memory | |
114 location (currently unsupported by LLVM). | |
115 - Acquire: a load operation performs an acquire operation on the affected memory | |
116 location. | |
117 - Release: a store operation performs a release operation on the affected memory | |
118 location. | |
119 - Acquire-release: load and store operations perform acquire and release | |
120 operations on the affected memory. | |
121 - Sequentially consistent: same as acquire-release, but providing a global total | |
122 ordering for all affected locations. | |
123 | |
124 As in C11/C++11: | |
125 | |
126 - Atomic accesses must at least be naturally aligned. | |
127 - Some accesses may not actually be atomic on certain platforms, requiring an | |
128 implementation that uses a global lock. | |
129 - An atomic memory location must always be accessed with atomic primitives, and | |
130 these primitives must always be of the same bit size for that location. | |
131 - Not all memory orderings are valid for all atomic operations. | |
132 | 144 |
133 Inline Assembly | 145 Inline Assembly |
134 =============== | 146 =============== |
135 | 147 |
136 Inline assembly isn't supported by PNaCl because it isn't portable. The | 148 Inline assembly isn't supported by PNaCl because it isn't portable. The |
137 one current exception is the common compiler barrier idiom | 149 one current exception is the common compiler barrier idiom |
138 ``asm("":::"memory")``, which gets transformed to a sequentially | 150 ``asm("":::"memory")``, which gets transformed to a sequentially |
139 consistent memory barrier (equivalent to ``__sync_synchronize()``). | 151 consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
152 PNaCl, this barrier is only guaranteed to order ``volatile`` and atomic | |
153 memory accesses, though in practice the implementation attempts to also | |
154 prevent reordering of memory accesses to objects which may escape. | |
155 | |
156 Future Direction | |
eliben
2013/08/06 16:11:35
"Future Directions" ?
JF
2013/08/06 16:19:33
Done.
| |
157 ================ | |
158 | |
159 Inter-Process Communication | |
160 --------------------------- | |
161 | |
162 Inter-process communication through shared memory is currently not | |
163 supported by PNaCl. When implemented, it may be limited to operations | |
164 which are lock-free on the current platform (``is_lock_free`` methods). | |
165 | |
166 Signal Handling | |
167 --------------- | |
168 | |
169 Untrusted signal handling currently isn't supported by PNaCl. When | |
170 supported, the impact of ``volatile`` and atomics for same-thread signal | |
171 handling will need to be carefully detailed. | |
OLD | NEW |
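The lock-free reporting described under Memory Model for Concurrent Operations can be queried from C11 as sketched below. This is a minimal illustration with hypothetical function names. Per the guide, a pexe always sees ``ATOMIC_INT_LOCK_FREE == 1`` and gets the per-platform answer from ``atomic_is_lock_free`` at translation time, whereas a native toolchain will typically report ``2`` for ``int``.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compile-time answer for int: 0 = never, 1 = sometimes, 2 = always
   lock-free. The guide says PNaCl always reports 1 here; native
   toolchains commonly report 2. */
int int_lock_free_macro(void) {
    return ATOMIC_INT_LOCK_FREE;
}

/* Per-object answer. Under PNaCl the guide says this reflects the
   platform the pexe is translated for, not the machine that built it. */
bool int_is_lock_free(void) {
    atomic_int probe = 0;
    return atomic_is_lock_free(&probe);
}
```

Code that needs a firm guarantee (for example, because a lock could deadlock in its context) should check ``atomic_is_lock_free`` on the object rather than rely on the macro, since under PNaCl the macro never promises more than "sometimes".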
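The Volatile Memory Accesses section contrasts legacy ``volatile``-plus-fence code with C11 atomics. The sketch below, using hypothetical names, writes the same publish/read pattern both ways. Per the guide, PNaCl rewrites the ``volatile`` flag accesses as sequentially consistent atomic accesses, which is what lets the legacy form communicate across threads; on a native C11 toolchain only the ``c11_*`` version is free of data races when shared between threads. Both are exercised single-threaded here.

```c
#include <stdatomic.h>

/* Legacy style: a volatile flag plus GCC's __sync_synchronize() full
   barrier. The guide says PNaCl turns the volatile accesses into
   sequentially consistent volatile atomic accesses. */
static int legacy_data;
static volatile int legacy_flag;

void legacy_publish(int value) {
    legacy_data = value;
    __sync_synchronize();   /* order the data write before the flag write */
    legacy_flag = 1;
}

int legacy_read(void) {
    if (!legacy_flag)
        return -1;          /* nothing published yet */
    __sync_synchronize();   /* order the flag read before the data read */
    return legacy_data;
}

/* C11 style: the atomic flag carries the ordering itself
   (atomic_store/atomic_load default to memory_order_seq_cst). */
static int c11_data;
static atomic_int c11_flag;

void c11_publish(int value) {
    c11_data = value;
    atomic_store(&c11_flag, 1);
}

int c11_read(void) {
    if (!atomic_load(&c11_flag))
        return -1;          /* nothing published yet */
    return c11_data;
}
```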
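The Threading section recommends atomic primitives for communication between threads. A minimal message-passing sketch with POSIX threads and ``<stdatomic.h>`` follows; all names are hypothetical. The explicit release/acquire orderings state the intended synchronization. Per the guide, a pexe promotes them to sequential consistency, which is strictly stronger, so the program stays correct either way.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static int payload;                 /* ordinary, non-atomic data */
static atomic_bool ready = false;   /* flag: only ever accessed atomically */

/* Producer: write the data, then publish it with a release store. */
static void *producer(void *arg) {
    payload = *(const int *)arg;
    atomic_store_explicit(&ready, true, memory_order_release);
    return NULL;
}

/* Starts a producer thread, waits for the flag with acquire loads, then
   reads the data. Returns -1 if the thread cannot be created. */
int publish_and_consume(int value) {
    pthread_t thread;
    atomic_store(&ready, false);
    if (pthread_create(&thread, NULL, producer, &value) != 0)
        return -1;
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ; /* spin until the release store becomes visible */
    int result = payload;   /* ordered after the producer's write */
    pthread_join(thread, NULL);
    return result;
}
```

Depending on the C library, a native build may need ``-pthread``. The flag is the only shared object accessed concurrently; ``payload`` is only read after the acquire load observes ``true``, so there is no data race on it.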
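The Inline Assembly section says ``asm("":::"memory")`` is the one accepted form and becomes a sequentially consistent fence. The sketch below places that idiom next to ``__sync_synchronize()`` and the portable C11 ``atomic_thread_fence(memory_order_seq_cst)``. Natively, the asm idiom only constrains the compiler, not the processor, so outside PNaCl only the latter two are hardware fences. ``write_pair`` is a hypothetical helper that simply exercises all three, and ``__asm__`` is spelled that way so the code also builds in strict ISO C mode.

```c
#include <stdatomic.h>

/* Compiler barrier idiom: the only inline assembly PNaCl accepts, per the
   guide. Natively it only stops compiler reordering; PNaCl turns it into a
   sequentially consistent fence like __sync_synchronize(). */
static inline void barrier_asm(void) { __asm__ __volatile__("" ::: "memory"); }

/* GCC-style full memory barrier. */
static inline void barrier_sync(void) { __sync_synchronize(); }

/* Portable C11 spelling of a sequentially consistent fence. */
static inline void barrier_c11(void) { atomic_thread_fence(memory_order_seq_cst); }

/* Writes two values with barriers between and after them, then returns
   their sum so the effect is observable. */
int write_pair(int *a, int *b, int x, int y) {
    *a = x;
    barrier_asm();
    *b = y;
    barrier_sync();
    barrier_c11();
    return *a + *b;
}
```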