OLD | NEW |
---|---|
1 ======================= | 1 ======================= |
2 PNaCl Developer's Guide | 2 PNaCl Developer's Guide |
3 ======================= | 3 ======================= |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
11 | 11 |
12 TODO | 12 TODO |
13 | 13 |
14 Memory Model and Atomics | 14 Memory Model and Atomics |
15 ======================== | 15 ======================== |
16 | 16 |
17 Memory Model for Concurrent Operations | |
18 -------------------------------------- | |
19 | |
20 The memory model offered by PNaCl relies on the same coding guidelines | |
21 as the C11/C++11 one: concurrent accesses must always occur through | |
22 atomic primitives (offered by `atomic intrinsics | |
23 <PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always | |
24 occur with the same size for the same memory location. Visibility of | |
25 stores is provided on a happens-before basis that relates memory | |
26 locations to each other as the C11/C++11 standards do. | |
27 | |
28 As in C11/C++11 some atomic accesses may be implemented with locks on | |
29 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be | |
30 ``1``, signifying that all types are sometimes lock-free. The | |
31 ``is_lock_free`` methods and ``atomic_is_lock_free`` will return the | |
32 current platform's implementation at translation time. These macros, | |
33 methods and functions are in the C11 header ``<stdatomic.h>`` and the | |
34 C++11 header ``<atomic>``. | |
35 | |
36 The PNaCl toolchain supports concurrent memory accesses through legacy | |
37 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic | |
38 primitives. ``volatile`` memory accesses can also be used, though these | |
39 are discouraged, and aren't present in bitcode. See `Volatile Memory | |
eliben
2013/08/05 18:35:54
Remove the "aren't present in bitcode" - i don't t
jvoung (off chromium)
2013/08/05 18:59:41
Yeah, this is the summary for developers. What is
JF
2013/08/05 20:37:48
Done.
| |
40 Accesses`_. | |
41 | |
42 PNaCl supports concurrency and parallelism with some restrictions: | |
43 | |
44 * Threading is explicitly supported through C11/C++11's threading | |
45 libraries as well as POSIX threads. | |
46 | |
47 * Inter-process communication through shared memory is limited to | |
eliben
2013/08/05 18:35:54
What does inter-process communication even mean in
JF
2013/08/05 20:37:48
Done.
| |
48 operations which are lock-free on the current platform | |
49 (``is_lock_free`` methods). This may change at a later date. | |
50 | |
51 * Direct interaction with device memory isn't supported. | |
52 | |
53 * Signal handling isn't supported; PNaCl therefore promotes all | |
54 primitives to cross-thread (instead of single-thread). This may change | |
55 at a later date. Note that using atomic operations which aren't | |
56 lock-free may lead to deadlocks when handling asynchronous signals. | |
57 | |
58 * ``volatile`` and atomic operations are address-free (operations on the | |
59 same memory location via two different addresses work atomically), as | |
60 intended by the C11/C++11 standards. This is critical for | |
61 inter-process communication as well as synchronous "external | |
62 modifications" such as mapping underlying memory at multiple | |
63 locations. | |
64 | |
65 Setting up the above mechanisms requires assistance from the embedding | |
66 sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up | |
67 can be done through regular C/C++ code. | |
68 | |
69 Atomic Memory Ordering Constraints | |
70 ---------------------------------- | |
71 | |
72 Atomics follow the same ordering constraints as in regular LLVM, but all | |
73 accesses are promoted to sequential consistency (the strongest memory | |
74 ordering) at pexe creation time. As more C11/C++11 code allows us to | |
jvoung (off chromium)
2013/08/05 18:59:41
Should the memory orderings change also be done by
JF
2013/08/05 20:37:48
I think the current implementation should offer a
| |
75 understand performance and portability needs we intend to support the | |
76 full gamut of C11/C++11 memory orderings: | |
77 | |
78 - Relaxed: no operation orders memory. | |
eliben
2013/08/05 18:35:54
Maybe this list does not belong here? This is user
JF
2013/08/05 20:37:48
This is not an addition, I just moved it around. I
| |
79 - Consume: a load operation performs a consume operation on the affected | |
80 memory location (currently unsupported by LLVM). | |
81 - Acquire: a load operation performs an acquire operation on the | |
82 affected memory location. | |
83 - Release: a store operation performs a release operation on the | |
84 affected memory location. | |
85 - Acquire-release: load and store operations perform acquire and release | |
86 operations on the affected memory. | |
87 - Sequentially consistent: same as acquire-release, but providing a | |
88 global total ordering for all affected locations. | |
89 | |
90 As in C11/C++11: | |
91 | |
92 - Atomic accesses must at least be naturally aligned. | |
93 - Some accesses may not actually be atomic on certain platforms, | |
94 requiring an implementation that uses global lock(s). | |
95 - An atomic memory location must always be accessed with atomic | |
96 primitives, and these primitives must always be of the same bit size | |
97 for that location. | |
98 - Not all memory orderings are valid for all atomic operations. | |
99 | |
17 Volatile Memory Accesses | 100 Volatile Memory Accesses |
18 ------------------------ | 101 ------------------------ |
19 | 102 |
20 The C11/C++11 standards mandate that ``volatile`` accesses execute in program | 103 The C11/C++11 standards mandate that ``volatile`` accesses execute in |
21 order (but are not fences, so other memory operations can reorder around them), | 104 program order (but are not fences, so other memory operations can |
22 are not necessarily atomic, and can’t be elided. They can be separated into | 105 reorder around them), are not necessarily atomic, and can’t be |
23 smaller width accesses. | 106 elided. They can be separated into smaller width accesses. |
24 | 107 |
25 The PNaCl toolchain applies regular LLVM optimizations along these guidelines, | 108 Before any optimizations occur the PNaCl toolchain transforms |
26 and it further prevents any load/store (even non-``volatile`` and non-atomic | 109 ``volatile`` loads and stores into sequentially consistent ``volatile`` |
27 ones) from moving above or below a volatile operations: they act as compiler | 110 atomic loads and stores, and applies regular LLVM optimizations along |
28 barriers before optimizations occur. The PNaCl toolchain freezes ``volatile`` | 111 the above guidelines. This orders ``volatile`` accesses according to the atomic |
29 accesses after optimizations into atomic accesses with sequentially consistent | 112 rules, and means that fences (including ``__sync_synchronize``) act in a |
30 memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and | 113 better-defined manner. Regular memory accesses still do not have |
31 combined with builtin fences these programs can do meaningful cross-thread | 114 ordering guarantees with ``volatile`` and atomic accesses, though the |
32 communication without changing code. It also reflects the original code's intent | 115 internal representation of ``__sync_synchronize`` attempts to prevent |
33 and guarantees better portability. | 116 reordering of memory accesses to objects which may escape. |
34 | 117 |
35 Relaxed ordering could be used instead, but for the first release it is more | 118 Relaxed ordering could be used instead, but for the first release it is |
36 conservative to apply sequential consistency. Future releases may change what | 119 more conservative to apply sequential consistency. Future releases may |
37 happens at compile-time, but already-released pexes will continue using | 120 change what happens at compile-time, but already-released pexes will |
38 sequential consistency. | 121 continue using sequential consistency. |
39 | 122 |
40 The PNaCl toolchain also requires that ``volatile`` accesses be at least | 123 The PNaCl toolchain also requires that ``volatile`` accesses be at least |
41 naturally aligned, and tries to guarantee this alignment. | 124 naturally aligned, and tries to guarantee this alignment. |
42 | 125 |
43 Memory Model for Concurrent Operations | 126 The above guarantees ease the support of legacy (i.e. non-C11/C++11) |
44 -------------------------------------- | 127 code, and combined with builtin fences these programs can do meaningful |
128 cross-thread communication without changing code. They also better | |
129 reflect the original code's intent and guarantee better portability. | |
45 | 130 |
46 The memory model offered by PNaCl relies on the same coding guidelines as the | 131 Stable Transfer Format |
47 C11/C++11 one: concurrent accesses must always occur through atomic primitives | 132 ---------------------- |
48 (offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and | |
49 these accesses must always occur with the same size for the same memory | |
50 location. Visibility of stores is provided on a happens-before basis that | |
51 relates memory locations to each other as the C11/C++11 standards do. | |
52 | 133 |
53 As in C11/C++11 some atomic accesses may be implemented with locks on certain | 134 The PNaCl toolchain freezes atomic and ``volatile`` memory accesses |
54 platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying | 135 after optimizations into atomic accesses with sequentially consistent |
jvoung (off chromium)
2013/08/05 18:59:41
re "after optimizations": volatiles get converted
JF
2013/08/05 20:37:48
OK, I can remove this section.
| |
55 that all types are sometimes lock-free. The ``is_lock_free`` methods will return | 136 memory ordering. Other memory orderings will be exposed in future |
56 the current platform's implementation at translation time. | 137 releases, when we have a better grasp of existing code's needs, |
138 portability implications, and are confident that implementation limits | |
139 are overcome. Future releases may change what happens at compile-time, | |
140 but already-released pexes will continue using sequential consistency. | |
57 | 141 |
58 The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style | 142 Non-atomic and non-``volatile`` memory accesses may be reordered, |
59 ``__sync_*`` builtins, as well as through C11/C++11 atomic primitives. | 143 separated, elided or fused according to C and C++'s memory model before |
60 ``volatile`` memory accesses can also be used, though these are discouraged, and | 144 the pexe is created as well as after its creation. |
61 aren't present in bitcode. | |
62 | |
63 PNaCl supports concurrency and parallelism with some restrictions: | |
64 | |
65 * Threading is explicitly supported. | |
66 | |
67 * Inter-process communication through shared memory is limited to operations | |
68 which are lock-free on the current platform (``is_lock_free`` methods). This | |
69 may change at a later date. | |
70 | |
71 * Direct interaction with device memory isn't supported. | |
72 | |
73 * Signal handling isn't supported; PNaCl therefore promotes all primitives to | |
74 cross-thread (instead of single-thread). This may change at a later date. Note | |
75 that using atomic operations which aren't lock-free may lead to deadlocks when | |
76 handling asynchronous signals. | |
77 | |
78 * ``volatile`` and atomic operations are address-free (operations on the same | |
79 memory location via two different addresses work atomically), as intended by | |
80 the C11/C++11 standards. This is critical for inter-process communication as | |
81 well as synchronous "external modifications" such as mapping underlying memory | |
82 at multiple locations. | |
83 | |
84 Setting up the above mechanisms requires assistance from the embedding sandbox's | |
85 runtime (e.g. NaCl's Pepper APIs), but using them once set up can be done through | |
86 regular C/C++ code. | |
87 | |
88 The PNaCl toolchain currently optimizes for memory ordering as LLVM normally | |
89 does, but at pexe creation time it promotes all ``volatile`` accesses as well as | |
90 all atomic accesses to be sequentially consistent. Other memory orderings will | |
91 be supported in a future release, but pexes generated with the current toolchain | |
92 will continue functioning with sequential consistency. Using sequential | |
93 consistency provides a total ordering for all sequentially-consistent operations | |
94 on all addresses. | |
95 | |
96 This means that ``volatile`` and atomic memory accesses can only be re-ordered | |
97 in some limited way before the pexe is created, and will act as fences for all | |
98 memory accesses (even non-atomic and non-``volatile``) after pexe creation. | |
99 Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence | |
100 intervenes), separated, elided or fused according to C and C++'s memory model | |
101 before the pexe is created as well as after its creation. | |
102 | |
103 Atomic Memory Ordering Constraints | |
104 ---------------------------------- | |
105 | |
106 Atomics follow the same ordering constraints as in regular LLVM, but | |
107 all accesses are promoted to sequential consistency (the strongest | |
108 memory ordering) at pexe creation time. As more C11/C++11 code | |
109 allows us to understand performance and portability needs we intend | |
110 to support the full gamut of C11/C++11 memory orderings: | |
111 | |
112 - Relaxed: no operation orders memory. | |
113 - Consume: a load operation performs a consume operation on the affected memory | |
114 location (currently unsupported by LLVM). | |
115 - Acquire: a load operation performs an acquire operation on the affected memory | |
116 location. | |
117 - Release: a store operation performs a release operation on the affected memory | |
118 location. | |
119 - Acquire-release: load and store operations perform acquire and release | |
120 operations on the affected memory. | |
121 - Sequentially consistent: same as acquire-release, but providing a global total | |
122 ordering for all affected locations. | |
123 | |
124 As in C11/C++11: | |
125 | |
126 - Atomic accesses must at least be naturally aligned. | |
127 - Some accesses may not actually be atomic on certain platforms, requiring an | |
128 implementation that uses a global lock. | |
129 - An atomic memory location must always be accessed with atomic primitives, and | |
130 these primitives must always be of the same bit size for that location. | |
131 - Not all memory orderings are valid for all atomic operations. | |
132 | 145 |
133 Inline Assembly | 146 Inline Assembly |
134 =============== | 147 =============== |
135 | 148 |
136 Inline assembly isn't supported by PNaCl because it isn't portable. The | 149 Inline assembly isn't supported by PNaCl because it isn't portable. The |
137 one current exception is the common compiler barrier idiom | 150 one current exception is the common compiler barrier idiom |
138 ``asm("":::"memory")``, which gets transformed to a sequentially | 151 ``asm("":::"memory")``, which gets transformed to a sequentially |
139 consistent memory barrier (equivalent to ``__sync_synchronize()``). | 152 consistent memory barrier (equivalent to ``__sync_synchronize()``). In |
153 PNaCl this barrier is only guaranteed to order ``volatile`` and atomic | |
154 memory accesses, though in practice the implementation attempts to also | |
155 prevent reordering of memory accesses to objects which may escape. | |
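For readers of this patch, the cross-thread hand-off that the "Memory Model for Concurrent Operations" text describes can be sketched in C++11 roughly as follows. This is a minimal sketch and not code from the patch: the names ``payload``, ``ready``, ``run_handoff`` and ``flag_is_lock_free`` are hypothetical, and it assumes a toolchain that provides ``<atomic>`` and ``<thread>``. It uses the default sequentially consistent ordering, which matches what the guide says all atomic accesses are promoted to at pexe creation time.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                 // regular (non-atomic) data
static std::atomic<bool> ready(false);  // atomic flag guarding the payload

// Publishes `payload` from one thread and reads it from another.
// The only concurrent accesses go through the atomic `ready`; the
// plain accesses to `payload` are ordered by happens-before.
int run_handoff() {
  payload = 0;
  ready.store(false);

  std::thread producer([] {
    payload = 42;       // plain store, sequenced before the atomic store
    ready.store(true);  // default ordering: sequentially consistent
  });

  // Spin until the flag is observed. The load that reads `true`
  // synchronizes with the store above, so `payload` is visible.
  while (!ready.load()) {
    std::this_thread::yield();
  }
  int seen = payload;

  producer.join();
  return seen;
}

// Reports whether the flag is lock-free on the platform this runs on.
// Per the guide, the answer is determined at translation time.
bool flag_is_lock_free() {
  return ready.is_lock_free();
}
```

Calling ``run_handoff()`` from ``main`` returns the value the producer wrote. Code that relies on lock-free behaviour (for example across shared memory) could check ``flag_is_lock_free()`` first, since the guide notes that ``ATOMIC_*_LOCK_FREE`` only promises "sometimes lock-free".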