OLD | NEW |
---|---|
1 ============================== | 1 ============================== |
2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
3 ============================== | 3 ============================== |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
(...skipping 95 matching lines...) | |
106 | 106 |
107 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 107 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
108 | 108 |
109 PNaCl bitcode does not support inline assembly. | 109 PNaCl bitcode does not support inline assembly. |
110 | 110 |
111 Volatile Memory Accesses | 111 Volatile Memory Accesses |
112 ------------------------ | 112 ------------------------ |
113 | 113 |
114 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 114 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
115 | 115 |
116 TODO: are we going to promote volatile to atomic? | 116 PNaCl bitcode does not support volatile memory accesses. |
jvoung (off chromium)
2013/07/03 17:50:26
Not sure what the original doc for this looked lik
JF
2013/07/03 22:28:30
The text originally explained more, and I moved th
| |
117 | |
118 .. note:: | |
119 | |
120 The C11/C++11 standards mandate that ``volatile`` accesses execute | |
121 in program order (but are not fences, so other memory operations can | |
122 reorder around them), are not necessarily atomic, and can’t be | |
123 elided or fused. | |
124 | |
125 The PNaCl toolchain applies regular LLVM optimizations along these | |
126 guidelines, and it further prevents any load/store (even | |
127 non-``volatile`` and non-atomic ones) from moving above or below a | |
128 volatile operation: they act as compiler barriers before | |
129 optimizations occur. The PNaCl toolchain freezes ``volatile`` | |
130 accesses after optimizations into atomic accesses with sequentially | |
131 consistent memory ordering. This eases the support of legacy | |
132 (i.e. non-C11/C++11) code, and combined with builtin fences these | |
133 programs can do meaningful cross-thread communication without | |
134 changing code. It also reflects the original code's intent and | |
135 guarantees better portability. | |
136 | |
137 Relaxed ordering could be used instead, but for the first release it | |
138 is more conservative to apply sequential consistency. Future | |
139 releases may change what happens at compile-time, but | |
140 already-released pexes will continue using sequential consistency. | |
141 | |
142 The PNaCl toolchain also requires that ``volatile`` accesses be at | |
143 least naturally aligned, and tries to guarantee this alignment. | |
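As an illustration of the note above, the following is a minimal C++11 sketch of a cross-thread hand-off that legacy code might have written with a ``volatile`` flag, expressed instead with a sequentially consistent atomic. The names ``payload``, ``ready``, ``produce``, ``consume`` and ``hand_off`` are hypothetical and do not come from the manual; the sketch only assumes a standard C++11 toolchain with thread support.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; none of them come from the manual.
static int payload = 0;                 // plain, non-atomic data
static std::atomic<bool> ready(false);  // flag legacy code might have declared volatile

// Producer: write the data, then publish it with a sequentially
// consistent store.
static void produce(int value) {
  payload = value;
  ready.store(true, std::memory_order_seq_cst);
}

// Consumer: spin until the flag is set. The load synchronizes with the
// store above, so the later read of payload is race-free.
static int consume() {
  while (!ready.load(std::memory_order_seq_cst)) {
    std::this_thread::yield();
  }
  return payload;
}

// Runs one producer thread and consumes on the calling thread.
int hand_off(int value) {
  ready.store(false, std::memory_order_seq_cst);
  std::thread producer(produce, value);
  int seen = consume();
  producer.join();
  return seen;
}
```

Using an explicit atomic states the intent directly in the source instead of relying on the toolchain's treatment of ``volatile``.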
117 | 144 |
118 Memory Model for Concurrent Operations | 145 Memory Model for Concurrent Operations |
119 -------------------------------------- | 146 -------------------------------------- |
120 | 147 |
121 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 148 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
122 | 149 |
123 TODO. | 150 The memory model offered by PNaCl relies on the same coding guidelines |
151 as the C11/C++11 one: concurrent accesses must always occur through | |
152 atomic primitives, and these accesses must always occur with the same | |
153 size for the same memory location. Visibility of stores is provided on a | |
154 happens-before basis that relates memory locations to each other as the | |
155 C11/C++11 standards do. | |
156 | |
157 PNaCl bitcode requires all concurrency to occur through `atomic | |
158 intrinsics`_. | |
159 | |
160 .. note:: | |
161 | |
162 As in C11/C++11 some atomic accesses may be implemented with locks | |
163 on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always | |
164 be ``1``, signifying that all types are sometimes lock-free. The | |
165 ``is_lock_free`` methods will return the current platform's | |
166 implementation at runtime. | |
167 | |
168 The PNaCl toolchain supports concurrent memory accesses through | |
169 legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11 | |
170 atomic primitives. ``volatile`` memory accesses can also be used, | |
171 though these are discouraged, and aren't present in bitcode. | |
172 | |
173 PNaCl has varying support for concurrency and parallelism: | |
174 | |
175 * Threading is explicitly supported. | |
176 * Inter-process communication through shared memory is limited to | |
177 operations which are lock-free on the current platform | |
178 (``is_lock_free`` methods). This may change at a later date. | |
179 * Direct interaction with device memory isn't supported. | |
180 * Signal handling isn't supported, so PNaCl promotes all | |
181 primitives to cross-thread (instead of single-thread). This may | |
182 change at a later date. | |
183 | |
184 Setting up the above mechanisms requires assistance from the | |
185 embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using | |
186 them once set up can be done through regular C/C++ code. | |
187 | |
188 The PNaCl toolchain currently optimizes for memory ordering as LLVM | |
189 normally does, but at pexe creation time it promotes all | |
190 ``volatile`` accesses as well as all atomic accesses to be | |
191 sequentially consistent. Other memory orderings will be supported in | |
192 a future release, but pexes generated with the current toolchain | |
193 will continue functioning with sequential consistency. Using | |
194 sequential consistency provides a total ordering for all | |
195 sequentially-consistent operations on all addresses. | |
196 | |
197 This means that ``volatile`` and atomic memory accesses can only be | |
198 re-ordered in some limited way before the pexe is created, and will | |
199 act as fences for all memory accesses (even non-atomic and | |
200 non-``volatile``) after pexe creation. Non-atomic and | |
201 non-``volatile`` memory accesses may be reordered (unless a fence | |
202 intervenes), separated, elided or fused according to C and C++'s | |
203 memory model before the pexe is created as well as after its | |
204 creation. | |
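The lock-free queries mentioned in the note can be sketched in standard C++11 as follows. The function names are hypothetical; the value reported by the macro depends on the compiler and target, so a native host build may well differ from what a PNaCl build reports.

```cpp
#include <atomic>

// Compile-time answer: 0 = never lock-free, 1 = sometimes, 2 = always.
// The manual states that PNaCl reports 1; a native host compiler may
// report a different value.
int compile_time_int_lock_free() { return ATOMIC_INT_LOCK_FREE; }

// Runtime answer for the platform the code is actually running on.
bool runtime_int_lock_free() {
  std::atomic<int> probe(0);
  return probe.is_lock_free();
}
```

Code that shares memory across processes would, under the restriction described above, consult the runtime query rather than the macro.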
124 | 205 |
125 Atomic Memory Ordering Constraints | 206 Atomic Memory Ordering Constraints |
126 ---------------------------------- | 207 ---------------------------------- |
127 | 208 |
128 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 209 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
129 | 210 |
130 TODO. | 211 PNaCl bitcode currently supports sequential consistency only, through |
212 its `atomic intrinsics`_. | |
213 | |
214 .. note:: | |
215 | |
216 Atomics follow the same ordering constraints as in regular LLVM, but | |
217 all accesses are promoted to sequential consistency (the strongest | |
218 memory ordering) at pexe creation time. As more C11/C++11 code | |
219 allows us to understand performance and portability needs we intend | |
220 to support the full gamut of C11/C++11 memory orderings: | |
221 | |
222 - Relaxed: no operation orders memory. | |
223 - Consume: a load operation performs a consume operation on the | |
224 affected memory location (currently unsupported by LLVM). | |
225 - Acquire: a load operation performs an acquire operation on the | |
226 affected memory location. | |
227 - Release: a store operation performs a release operation on the | |
228 affected memory location. | |
229 - Acquire-release: load and store operations perform acquire and | |
230 release operations on the affected memory. | |
231 - Sequentially consistent: same as acquire-release, but providing | |
232 a global total ordering for all affected locations. | |
233 | |
234 As in C11/C++11: | |
235 | |
236 - Atomic and volatile accesses must at least be naturally aligned. | |
237 - Some accesses may not actually be atomic on certain platforms, | |
238 requiring an implementation that uses a global lock. | |
239 - An atomic memory location must always be accessed with atomic | |
240 primitives, and these primitives must always be of the same bit | |
241 size for that location. | |
242 - Not all memory orderings are valid for all atomic operations. | |
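As one example of the last constraint in the list above, C++11 compare-and-exchange takes separate success and failure orderings, and the failure ordering may not be release or acquire-release because a failed exchange only performs a load. The sketch below uses sequential consistency for both, matching what the manual says PNaCl currently supports; ``try_claim`` is a hypothetical helper name.

```cpp
#include <atomic>

// Tries to replace `expected` with `desired`; returns true on success.
// On failure, `expected` is overwritten with the value actually found.
bool try_claim(std::atomic<int>& slot, int& expected, int desired) {
  return slot.compare_exchange_strong(
      expected, desired,
      std::memory_order_seq_cst,   // ordering if the exchange succeeds
      std::memory_order_seq_cst);  // ordering if it fails (load only)
}
```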
131 | 243 |
132 Fast-Math Flags | 244 Fast-Math Flags |
133 --------------- | 245 --------------- |
134 | 246 |
135 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 247 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
136 | 248 |
137 Fast-math mode is not currently supported by the PNaCl bitcode. | 249 Fast-math mode is not currently supported by the PNaCl bitcode. |
138 | 250 |
139 Type System | 251 Type System |
140 =========== | 252 =========== |
(...skipping 129 matching lines...) | |
270 | 382 |
271 .. code-block:: llvm | 383 .. code-block:: llvm |
272 | 384 |
273 %buf = alloca i8, i32 8, align 4 | 385 %buf = alloca i8, i32 8, align 4 |
274 | 386 |
275 * ``load``, ``store`` | 387 * ``load``, ``store`` |
276 | 388 |
277 The pointer argument of these instructions must be a *normalized* pointer | 389 The pointer argument of these instructions must be a *normalized* pointer |
278 (see :ref:`pointer types <pointertypes>`). | 390 (see :ref:`pointer types <pointertypes>`). |
279 | 391 |
280 * ``fence`` | |
281 * ``cmpxchg``, ``atomicrmw`` | |
282 | |
283 The pointer argument of these instructions must be a *normalized* pointer | |
284 (see :ref:`pointer types <pointertypes>`). | |
285 | |
286 TODO(jfb): this may change | |
287 | |
288 * ``trunc`` | 392 * ``trunc`` |
289 * ``zext`` | 393 * ``zext`` |
290 * ``sext`` | 394 * ``sext`` |
291 * ``fptrunc`` | 395 * ``fptrunc`` |
292 * ``fpext`` | 396 * ``fpext`` |
293 * ``fptoui`` | 397 * ``fptoui`` |
294 * ``fptosi`` | 398 * ``fptosi`` |
295 * ``uitofp`` | 399 * ``uitofp`` |
296 * ``sitofp`` | 400 * ``sitofp`` |
297 | 401 |
(...skipping 18 matching lines...) | |
316 * ``select`` | 420 * ``select`` |
317 * ``call`` | 421 * ``call`` |
318 | 422 |
319 Intrinsic Functions | 423 Intrinsic Functions |
320 =================== | 424 =================== |
321 | 425 |
322 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 426 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
323 | 427 |
324 The only intrinsics supported by PNaCl bitcode are the following. | 428 The only intrinsics supported by PNaCl bitcode are the following. |
325 | 429 |
326 TODO(jfb): atomics | |
327 | |
328 * ``llvm.memcpy`` | 430 * ``llvm.memcpy`` |
329 * ``llvm.memmove`` | 431 * ``llvm.memmove`` |
330 * ``llvm.memset`` | 432 * ``llvm.memset`` |
331 | 433 |
332 These intrinsics are only supported with an i32 ``len`` argument. | 434 These intrinsics are only supported with an i32 ``len`` argument. |
333 | 435 |
334 * ``llvm.bswap`` | 436 * ``llvm.bswap`` |
335 | 437 |
336 The overloaded ``llvm.bswap`` intrinsic is only supported with the following | 438 The overloaded ``llvm.bswap`` intrinsic is only supported with the following |
337 argument types: i16, i32, i64 (the types supported by C-style GCC builtins). | 439 argument types: i16, i32, i64 (the types supported by C-style GCC builtins). |
(...skipping 14 matching lines...) | |
352 TODO: describe | 454 TODO: describe |
353 | 455 |
354 * ``llvm.nacl.longjmp`` | 456 * ``llvm.nacl.longjmp`` |
355 | 457 |
356 TODO: describe | 458 TODO: describe |
357 | 459 |
358 * ``llvm.nacl.setjmp`` | 460 * ``llvm.nacl.setjmp`` |
359 | 461 |
360 TODO: describe | 462 TODO: describe |
361 | 463 |
464 .. _atomic intrinsics: | |
465 | |
466 * ``llvm.nacl.atomic.store`` | |
467 * ``llvm.nacl.atomic.load`` | |
468 * ``llvm.nacl.atomic.rmw`` | |
469 * ``llvm.nacl.atomic.cmpxchg`` | |
470 * ``llvm.nacl.atomic.fence`` | |
471 | |
472 .. code-block:: llvm | |
473 | |
474 declare iN @llvm.nacl.atomic.load( | |
475 iN* <source>, i32 <memory_order>) | |
476 declare void @llvm.nacl.atomic.store( | |
477 iN <operand>, iN* <destination>, i32 <memory_order>) | |
478 declare iN @llvm.nacl.atomic.rmw( | |
479 i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>) | |
480 declare iN @llvm.nacl.atomic.cmpxchg( | |
481 iN* <object>, iN <expected>, iN <desired>, | |
482 i32 <memory_order_success>, i32 <memory_order_failure>) | |
483 declare void @llvm.nacl.atomic.fence(i32 <memory_order>) | |
484 | |
485 Each of these intrinsics is overloaded on the ``iN`` | |
486 argument. Integral types of 8, 16, 32 and 64-bit width are supported | |
487 for these ``iN`` arguments. | |
488 | |
489 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following | |
490 read-modify-write operations, from the general and arithmetic sections | |
491 of the C11/C++11 standards: | |
492 | |
493 - ``add`` | |
494 - ``sub`` | |
495 - ``or`` | |
496 - ``and`` | |
497 - ``xor`` | |
498 - ``exchange`` | |
499 | |
500 For all of these read-modify-write operations, the returned value is | |
501 that at ``object`` before the computation. The ``computation`` | |
502 argument must be a compile-time constant. | |
503 | |
504 All atomic intrinsics also support C11/C++11 memory orderings, which | |
505 must be compile-time constants. Those are detailed in `Atomic Memory | |
506 Ordering Constraints`_. | |
507 | |
508 Integer values for these computations and memory orderings are defined | |
509 in ``"llvm/IR/NaClIntrinsics.h"``. | |
510 | |
511 .. note:: | |
512 | |
513 These intrinsics allow PNaCl to support C11/C++11 style atomic | |
514 operations as well as some legacy GCC-style ``__sync_*`` builtins | |
515 while remaining stable as the LLVM codebase changes. The user | |
516 isn't expected to use these intrinsics directly. | |
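Since users are expected to reach these intrinsics through C11/C++11 atomics, the read-modify-write operations listed above can be illustrated from the source side. The sketch below is plain C++11 and only assumes that each member function corresponds to one of the listed operations; how the toolchain lowers them is not shown. ``previous_values`` is a hypothetical name. Each call returns the value held before the operation, and the default memory ordering is sequentially consistent.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// Applies each read-modify-write operation to one object and records the
// value returned by each call, i.e. the value held before the operation.
std::vector<std::uint32_t> previous_values() {
  std::atomic<std::uint32_t> object(10);
  std::vector<std::uint32_t> seen;
  seen.push_back(object.fetch_add(5));    // add: object becomes 15
  seen.push_back(object.fetch_sub(3));    // sub: object becomes 12
  seen.push_back(object.fetch_or(0x1));   // or: object becomes 13
  seen.push_back(object.fetch_and(0x9));  // and: object becomes 9
  seen.push_back(object.fetch_xor(0xF));  // xor: object becomes 6
  seen.push_back(object.exchange(100));   // exchange: object becomes 100
  seen.push_back(object.load());          // final value
  return seen;
}
```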