OLD | NEW |
---|---|
1 ============================== | 1 ============================== |
2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
3 ============================== | 3 ============================== |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
(...skipping 88 matching lines...)
99 | 99 |
100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
101 | 101 |
102 PNaCl bitcode does not support inline assembly. | 102 PNaCl bitcode does not support inline assembly. |
103 | 103 |
104 Volatile Memory Accesses | 104 Volatile Memory Accesses |
105 ------------------------ | 105 ------------------------ |
106 | 106 |
107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
108 | 108 |
109 TODO: are we going to promote volatile to atomic? | 109 PNaCl bitcode does not support volatile memory accesses. |
110 | |
111 .. note:: | |
112 | |
113 The C11/C++11 standards mandate that ``volatile`` accesses execute | |
114 in program order (but are not fences, so other memory operations can | |
115 reorder around them), are not necessarily atomic, and can’t be | |
116 elided or fused. | |
117 | |
118 The PNaCl toolchain applies regular LLVM optimizations along these | |
119 guidelines, and it further prevents any load/store (even | |
120 non-``volatile`` and non-atomic ones) from moving above or below a | |
121 volatile operation: they act as compiler barriers before | |
122 optimizations occur. The PNaCl toolchain freezes ``volatile`` | |
123 accesses after optimizations into atomic accesses with sequentially | |
124 consistent memory ordering. This eases the support of legacy | |
125 (i.e. non-C11/C++11) code, and combined with builtin fences these | |
126 programs can do meaningful cross-thread communication without | |
127 changing code. It also reflects the original code's intent and | |
128 guarantees better portability. | |
129 | |
130 Relaxed ordering could be used instead, but for the first release it | |
131 is more conservative to apply sequential consistency. Future | |
132 releases may change what happens at compile-time, but | |
133 already-released pexes will continue using sequential consistency. | |
134 | |
135 The PNaCl toolchain also requires that ``volatile`` accesses be at | |
136 least naturally aligned, and tries to guarantee this alignment. | |
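The note above says legacy ``volatile`` accesses are frozen into sequentially consistent atomics at pexe creation. As a minimal sketch, here is a legacy ``volatile`` flag next to the C11 code that states the same intent explicitly. The names (``legacy_ready``, ``c11_publish`` and so on) are hypothetical, and the example only shows source-level intent, not the toolchain's actual lowering.

```c
#include <stdatomic.h>

/* Legacy (pre-C11) style: a volatile flag polled by another thread.
 * Per the note, the PNaCl toolchain turns these accesses into
 * sequentially consistent atomics when the pexe is created. */
static volatile int legacy_ready = 0;

void legacy_publish(void) { legacy_ready = 1; }
int legacy_is_ready(void) { return legacy_ready; }

/* Portable C11 equivalent: atomic_store/atomic_load without an explicit
 * ordering default to memory_order_seq_cst. */
static atomic_int c11_ready = 0;

void c11_publish(void) { atomic_store(&c11_ready, 1); }
int c11_is_ready(void) { return atomic_load(&c11_ready); }
```

New code should prefer the C11 form. The ``volatile`` form is what the toolchain accommodates for existing code.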
110 | 137 |
111 Memory Model for Concurrent Operations | 138 Memory Model for Concurrent Operations |
112 -------------------------------------- | 139 -------------------------------------- |
113 | 140 |
114 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 141 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
115 | 142 |
116 TODO. | 143 The memory model offered by PNaCl relies on the same coding guidelines |
144 as the C11/C++11 one: concurrent accesses must always occur through | |
145 atomic primitives, and these accesses must always occur with the same | |
146 size for the same memory location. Visibility of stores is provided on a | |
147 happens-before basis that relates memory locations to each other as the | |
148 C11/C++11 standards do. | |
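The guideline above has two parts: concurrent accesses go through atomic primitives, and a given location is always accessed at the same size. Both can be shown with a minimal two-thread counter. The sketch assumes a POSIX threads implementation on the host. ``run_counter`` and ``shared_counter`` are hypothetical names.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Shared location: every access is atomic and always 32 bits wide. */
static _Atomic uint32_t shared_counter = 0;

static void *increment_many(void *arg) {
    uint32_t n = *(const uint32_t *)arg;
    for (uint32_t i = 0; i < n; ++i)
        atomic_fetch_add(&shared_counter, 1u);  /* seq_cst by default */
    return NULL;
}

/* Starts two threads that each add `per_thread`, then returns the total. */
uint32_t run_counter(uint32_t per_thread) {
    pthread_t a, b;
    atomic_store(&shared_counter, 0u);
    pthread_create(&a, NULL, increment_many, &per_thread);
    pthread_create(&b, NULL, increment_many, &per_thread);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&shared_counter);
}
```

Replacing ``atomic_fetch_add`` with a plain ``shared_counter++`` on a non-atomic variable would be a data race, and updates could be lost.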
149 | |
150 PNaCl bitcode requires all concurrency to occur through `atomic | |
151 intrinsics`_. | |
152 | |
153 .. note:: | |
154 | |
155 As in C11/C++11, some atomic accesses may be implemented with locks | |
156 on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always | |
157 be ``1``, signifying that all types are sometimes lock-free. The | |
158 ``is_lock_free`` methods will return the current platform's | |
159 implementation at runtime. | |
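The lock-freedom queries from the note can be written as follows. This is a sketch: the macro value is only ``1`` under PNaCl as described above, and a native toolchain may report ``2`` ("always lock-free"). The helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Compile-time answer: 0 = never, 1 = sometimes, 2 = always lock-free. */
int int_lock_free_macro(void) { return ATOMIC_INT_LOCK_FREE; }

/* Run-time answer for a specific object on the current platform. */
bool int_is_lock_free(void) {
    static atomic_int probe;
    return atomic_is_lock_free(&probe);
}
```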
160 | |
161 The PNaCl toolchain supports concurrent memory accesses through | |
162 legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11 | |
163 atomic primitives. ``volatile`` memory accesses can also be used, | |
164 though these are discouraged, and aren't present in bitcode. | |
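The two supported source-level styles can express the same increment. Both forms below return the value held before the update. This is a minimal sketch, and the wrapper names are hypothetical.

```c
#include <stdatomic.h>

/* Legacy GCC-style builtin: acts as a full barrier, returns the old value. */
int legacy_increment(int *p) { return __sync_fetch_and_add(p, 1); }

/* C11 equivalent on an atomic object, also returning the old value. */
int c11_increment(atomic_int *p) { return atomic_fetch_add(p, 1); }
```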
165 | |
166 Note that PNaCl explicitly supports concurrency through threading | |
167 and inter-process communication (shared memory), but doesn't support | |
Derek Schuff
2013/07/02 22:13:17
should probably remove the reference to shared mem
JF
2013/07/02 23:44:32
I clarified this entire section, please review aga
| |
168 interacting with device memory. Setting these up requires assistance | |
169 from the embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but | |
170 using them once set up can be done through regular C/C++ code. | |
171 | |
172 PNaCl also doesn't currently support signal handling, and therefore | |
173 promotes all primitives to cross-thread (instead of | |
174 single-thread). This may change at a later date. | |
175 | |
176 The PNaCl toolchain currently optimizes for memory ordering as LLVM | |
177 normally does, but at pexe creation time it promotes all | |
178 ``volatile`` accesses as well as all atomic accesses to be | |
179 sequentially consistent. Other memory orderings will be supported in | |
180 a future release, but pexes generated with the current toolchain will | |
Derek Schuff
2013/07/02 22:13:17
s/generate/generated
JF
2013/07/02 23:44:32
Done.
| |
181 continue functioning with sequential consistency. Using sequential | |
182 consistency provides a total ordering for all | |
183 sequentially-consistent operations on all addresses. | |
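Sequential consistency gives a single total order over all sequentially consistent operations. The classic store-buffering litmus test makes that order visible. Each thread stores to one variable and then loads the other, so a total order forbids both loads from reading ``0``. The sketch assumes POSIX threads on the host. All names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int x, y;
static int r0, r1;  /* read by the caller only after pthread_join */

static void *thread0(void *arg) {
    (void)arg;
    atomic_store(&x, 1);   /* seq_cst store */
    r0 = atomic_load(&y);  /* seq_cst load */
    return NULL;
}

static void *thread1(void *arg) {
    (void)arg;
    atomic_store(&y, 1);
    r1 = atomic_load(&x);
    return NULL;
}

/* Returns true if no run produced the forbidden r0 == 0 && r1 == 0. */
bool store_buffering_holds(int runs) {
    for (int i = 0; i < runs; ++i) {
        pthread_t a, b;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&a, NULL, thread0, NULL);
        pthread_create(&b, NULL, thread1, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        if (r0 == 0 && r1 == 0)
            return false;
    }
    return true;
}
```

With weaker orderings (for example release stores and acquire loads), the outcome ``r0 == r1 == 0`` would be permitted.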
184 | |
185 This means that ``volatile`` and atomic memory accesses can only be | |
186 reordered only in limited ways before the pexe is created, and will | |
187 act as fences for all memory accesses (even non-atomic and | |
188 non-``volatile``) after pexe creation. Non-atomic and | |
189 non-``volatile`` memory accesses may be reordered (unless a fence | |
190 intervenes), separated, elided or fused according to C and C++'s | |
191 memory model before the pexe is created as well as after its | |
192 creation. | |
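The paragraph above notes that non-atomic accesses may be reordered unless a fence intervenes. The following sketch shows fences, rather than the flag accesses, ordering a non-atomic payload. In C11 terms, the plain write can't move past the release fence and the plain read can't be hoisted above the acquire fence. Under the current toolchain these operations would be promoted to sequential consistency anyway. The sketch assumes POSIX threads, and the names are hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>

static int payload;       /* non-atomic data */
static atomic_int ready;  /* flag, accessed with relaxed ordering */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

/* Waits for the producer's flag, then returns the payload it saw. */
int consume_message(void) {
    pthread_t t;
    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&t, NULL, producer, NULL);
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;  /* spin until the producer publishes */
    atomic_thread_fence(memory_order_acquire);
    int seen = payload;
    pthread_join(t, NULL);
    return seen;
}
```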
117 | 193 |
118 Atomic Memory Ordering Constraints | 194 Atomic Memory Ordering Constraints |
119 ---------------------------------- | 195 ---------------------------------- |
120 | 196 |
121 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 197 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
122 | 198 |
123 TODO. | 199 PNaCl bitcode currently supports sequential consistency only, through |
200 its `atomic intrinsics`_. | |
201 | |
202 .. note:: | |
203 | |
204 Atomics follow the same ordering constraints as in regular LLVM, but | |
205 all accesses are promoted to sequential consistency (the strongest | |
206 memory ordering) at pexe creation time. As more C11/C++11 code | |
207 allows us to understand performance and portability needs we intend | |
208 to support the full gamut of C11/C++11 memory orderings: | |
209 | |
210 - Relaxed: no operation orders memory. | |
211 - Consume: a load operation performs a consume operation on the | |
212 affected memory location (currently unsupported by LLVM). | |
213 - Acquire: a load operation performs an acquire operation on the | |
214 affected memory location. | |
215 - Release: a store operation performs a release operation on the | |
216 affected memory location. | |
217 - Acquire-release: load and store operations perform acquire and | |
218 release operations on the affected memory location. | |
219 - Sequentially consistent: same as acquire-release, but providing | |
220 a global total ordering for all affected locations. | |
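Each ordering in the list above can be applied in C11 to an operation for which it is valid. This is a single-threaded sketch that only shows the spelling of each ordering, not its cross-thread effect. Per the note, PNaCl currently promotes all of them to ``memory_order_seq_cst``. ``orderings_demo`` is a hypothetical name.

```c
#include <stdatomic.h>

/* Expects *obj to start at 0; leaves it at 3. */
int orderings_demo(atomic_int *obj) {
    atomic_store_explicit(obj, 1, memory_order_relaxed);       /* Relaxed */
    int a = atomic_load_explicit(obj, memory_order_consume);   /* Consume */
    int b = atomic_load_explicit(obj, memory_order_acquire);   /* Acquire */
    atomic_store_explicit(obj, a + b, memory_order_release);   /* Release: 2 */
    int c = atomic_fetch_add_explicit(obj, 1,
                                      memory_order_acq_rel);   /* Acq-rel: old 2 */
    return c + atomic_load_explicit(obj, memory_order_seq_cst); /* Seq-cst: 3 */
}
```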
221 | |
222 As in C11/C++11: | |
223 | |
224 - Atomic and volatile accesses must be at least naturally aligned. | |
225 - Some accesses may not actually be atomic on certain platforms, | |
226 requiring an implementation that uses a global lock. | |
227 - An atomic memory location must always be accessed with atomic | |
228 primitives, and these primitives must always be of the same bit | |
229 size for that location. | |
230 - Not all memory orderings are valid for all atomic operations. | |
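Compare-exchange is a common case of the last rule above. In C11, the failure ordering of a compare-exchange applies to a load only. It therefore may not be ``memory_order_release`` or ``memory_order_acq_rel``, and it may not be stronger than the success ordering. Here is a minimal sketch with a valid pair of orderings. ``try_swap`` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Replaces *obj with `desired` if it currently holds `expected`. */
bool try_swap(atomic_int *obj, int expected, int desired) {
    return atomic_compare_exchange_strong_explicit(
        obj, &expected, desired,
        memory_order_acq_rel,   /* success: read and write */
        memory_order_acquire);  /* failure: load only, so no release */
}
```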
124 | 231 |
125 Fast-Math Flags | 232 Fast-Math Flags |
126 --------------- | 233 --------------- |
127 | 234 |
128 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 235 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
129 | 236 |
130 Fast-math mode is not currently supported by the PNaCl bitcode. | 237 Fast-math mode is not currently supported by the PNaCl bitcode. |
131 | 238 |
132 Type System | 239 Type System |
133 =========== | 240 =========== |
(...skipping 129 matching lines...)
263 | 370 |
264 .. code-block:: llvm | 371 .. code-block:: llvm |
265 | 372 |
266 %buf = alloca i8, i32 8, align 4 | 373 %buf = alloca i8, i32 8, align 4 |
267 | 374 |
268 * ``load``, ``store`` | 375 * ``load``, ``store`` |
269 | 376 |
270 The pointer argument of these instructions must be a *normalized* pointer | 377 The pointer argument of these instructions must be a *normalized* pointer |
271 (see :ref:`pointer types <pointertypes>`). | 378 (see :ref:`pointer types <pointertypes>`). |
272 | 379 |
273 * ``fence`` | |
274 * ``cmpxchg``, ``atomicrmw`` | |
275 | |
276 The pointer argument of these instructions must be a *normalized* pointer | |
277 (see :ref:`pointer types <pointertypes>`). | |
278 | |
279 TODO(jfb): this may change | |
280 | |
281 * ``trunc`` | 380 * ``trunc`` |
282 * ``zext`` | 381 * ``zext`` |
283 * ``sext`` | 382 * ``sext`` |
284 * ``fptrunc`` | 383 * ``fptrunc`` |
285 * ``fpext`` | 384 * ``fpext`` |
286 * ``fptoui`` | 385 * ``fptoui`` |
287 * ``fptosi`` | 386 * ``fptosi`` |
288 * ``uitofp`` | 387 * ``uitofp`` |
289 * ``sitofp`` | 388 * ``sitofp`` |
290 | 389 |
(...skipping 18 matching lines...)
309 * ``select`` | 408 * ``select`` |
310 * ``call`` | 409 * ``call`` |
311 | 410 |
312 Intrinsic Functions | 411 Intrinsic Functions |
313 =================== | 412 =================== |
314 | 413 |
315 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 414 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
316 | 415 |
317 The only intrinsics supported by PNaCl bitcode are the following. | 416 The only intrinsics supported by PNaCl bitcode are the following. |
318 | 417 |
319 TODO(jfb): atomics | |
320 | |
321 * ``llvm.memcpy`` | 418 * ``llvm.memcpy`` |
322 * ``llvm.memmove`` | 419 * ``llvm.memmove`` |
323 * ``llvm.memset`` | 420 * ``llvm.memset`` |
324 * ``llvm.bswap`` | 421 * ``llvm.bswap`` |
325 | 422 |
326 The llvm.bswap intrinsic is only supported with the following argument types: | 423 The llvm.bswap intrinsic is only supported with the following argument types: |
327 i16, i32, i64. | 424 i16, i32, i64. |
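C sources typically reach ``llvm.bswap`` through the GCC-style byte-swap builtins. This sketch provides one hypothetical helper per supported width.

```c
#include <stdint.h>

uint16_t swap16(uint16_t v) { return __builtin_bswap16(v); }  /* i16 */
uint32_t swap32(uint32_t v) { return __builtin_bswap32(v); }  /* i32 */
uint64_t swap64(uint64_t v) { return __builtin_bswap64(v); }  /* i64 */
```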
328 | 425 |
329 * ``llvm.ctlz`` | 426 * ``llvm.ctlz`` |
330 * ``llvm.cttz`` | 427 * ``llvm.cttz`` |
331 * ``llvm.ctpop`` | 428 * ``llvm.ctpop`` |
332 | 429 |
333 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support | 430 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support |
334 i32 and i64 argument types (the types supported by C-style GCC builtins). | 431 i32 and i64 argument types (the types supported by C-style GCC builtins). |
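The 32- and 64-bit GCC-style builtins correspond to these intrinsics on ``i32`` and ``i64``. In this sketch the helper names are hypothetical. ``__builtin_clz`` and ``__builtin_ctz*`` are undefined for a zero argument, so callers must pass a non-zero value.

```c
#include <stdint.h>

int clz32(uint32_t v) { return __builtin_clz(v); }            /* ctlz, i32 */
int ctz64(uint64_t v) { return __builtin_ctzll(v); }          /* cttz, i64 */
int popcount64(uint64_t v) { return __builtin_popcountll(v); } /* ctpop, i64 */
```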
335 | 432 |
336 * ``llvm.trap`` | 433 * ``llvm.trap`` |
337 * ``llvm.nacl.read.tp`` | 434 * ``llvm.nacl.read.tp`` |
338 | 435 |
339 TODO: describe | 436 TODO: describe |
340 | 437 |
341 * ``llvm.nacl.longjmp`` | 438 * ``llvm.nacl.longjmp`` |
342 | 439 |
343 TODO: describe | 440 TODO: describe |
344 | 441 |
345 * ``llvm.nacl.setjmp`` | 442 * ``llvm.nacl.setjmp`` |
346 | 443 |
347 TODO: describe | 444 TODO: describe |
348 | 445 |
446 .. _atomic intrinsics: | |
447 | |
448 * ``llvm.nacl.atomic.store`` | |
449 * ``llvm.nacl.atomic.load`` | |
450 * ``llvm.nacl.atomic.rmw`` | |
451 * ``llvm.nacl.atomic.cmpxchg`` | |
452 * ``llvm.nacl.atomic.fence`` | |
453 | |
454 .. code-block:: llvm | |
455 | |
456 declare iN @llvm.nacl.atomic.load( | |
457 iN* <source>, i32 <memory_order>) | |
458 declare void @llvm.nacl.atomic.store( | |
459 iN <operand>, iN* <destination>, i32 <memory_order>) | |
460 declare iN @llvm.nacl.atomic.rmw( | |
461 i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>) | |
462 declare iN @llvm.nacl.atomic.cmpxchg( | |
463 iN* <object>, iN <expected>, iN <desired>, | |
464 i32 <memory_order_success>, i32 <memory_order_failure>) | |
465 declare void @llvm.nacl.atomic.fence(i32 <memory_order>) | |
466 | |
467 Each of these intrinsics is overloaded on the ``iN`` | |
468 argument. Integral types of 8, 16, 32 and 64-bit width are supported | |
469 for these ``iN`` arguments. | |
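At the source level, the supported widths correspond to C11 atomic objects of 8, 16, 32 and 64 bits. This sketch assumes the host supports 64-bit atomics without extra libraries. ``bump_all_widths`` is a hypothetical name.

```c
#include <stdatomic.h>
#include <stdint.h>

static _Atomic uint8_t  a8;
static _Atomic uint16_t a16;
static _Atomic uint32_t a32;
static _Atomic uint64_t a64;

/* Increments each object and sums the values they held before. */
uint64_t bump_all_widths(void) {
    uint64_t sum = 0;
    sum += atomic_fetch_add(&a8, 1);
    sum += atomic_fetch_add(&a16, 1);
    sum += atomic_fetch_add(&a32, 1);
    sum += atomic_fetch_add(&a64, 1);
    return sum;
}
```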
470 | |
471 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following | |
472 read-modify-write operations, from the general and arithmetic sections | |
473 of the C11/C++11 standards: | |
474 | |
475 - ``add`` | |
476 - ``sub`` | |
477 - ``or`` | |
478 - ``and`` | |
479 - ``xor`` | |
480 - ``exchange`` | |
481 | |
482 For all of these read-modify-write operations, the returned value is | |
483 that at ``object`` before the computation. The ``computation`` | |
484 argument must be a compile-time constant. | |
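The six computations listed above correspond to the C11 generic read-modify-write functions, each of which returns the value held before the update. The exact lowering to ``@llvm.nacl.atomic.rmw`` is up to the toolchain; this sketch only shows the C11 semantics. ``rmw_tour`` is a hypothetical name.

```c
#include <stdatomic.h>

/* Expects *obj to start at 0; returns the sum of all old values. */
int rmw_tour(atomic_int *obj) {
    int olds = 0;
    olds += atomic_fetch_add(obj, 5);   /* add:  0 -> 5,   old 0  */
    olds += atomic_fetch_sub(obj, 2);   /* sub:  5 -> 3,   old 5  */
    olds += atomic_fetch_or(obj, 8);    /* or:   3 -> 11,  old 3  */
    olds += atomic_fetch_and(obj, 9);   /* and:  11 -> 9,  old 11 */
    olds += atomic_fetch_xor(obj, 1);   /* xor:  9 -> 8,   old 9  */
    olds += atomic_exchange(obj, 100);  /* exchange: 8 -> 100, old 8 */
    return olds;
}
```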
485 | |
486 All atomic intrinsics also support C11/C++11 memory orderings, which | |
487 must be compile-time constants. Those are detailed in `Atomic Memory | |
488 Ordering Constraints`_. | |
489 | |
490 Integer values for these computations and memory orderings are defined | |
491 in ``"llvm/IR/NaClIntrinsics.h"``. | |
492 | |
493 .. note:: | |
494 | |
495 These intrinsics allow PNaCl to support C11/C++11 style atomic | |
496 operations as well as some legacy GCC-style ``__sync_*`` builtins | |
497 while remaining stable as the LLVM codebase changes. The user | |
498 isn't expected to use these intrinsics directly. | |
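``@llvm.nacl.atomic.cmpxchg`` is declared to return the value observed at ``object``. Both source styles can produce that value, as this sketch shows. It assumes ``__sync_val_compare_and_swap`` is among the legacy builtins the note refers to. The wrapper names are hypothetical.

```c
#include <stdatomic.h>

/* Legacy GCC builtin: returns the value observed at *p. */
int legacy_cas(int *p, int expected, int desired) {
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* C11 form: on failure the observed value is written into `expected`;
 * on success `expected` already equals it. Either way, it is returned. */
int c11_cas(atomic_int *p, int expected, int desired) {
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}
```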