OLD | NEW |
1 ============================== | 1 ============================== |
2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
3 ============================== | 3 ============================== |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
(...skipping 88 matching lines...)
99 | 99 |
100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
101 | 101 |
102 PNaCl bitcode does not support inline assembly. | 102 PNaCl bitcode does not support inline assembly. |
103 | 103 |
104 Volatile Memory Accesses | 104 Volatile Memory Accesses |
105 ------------------------ | 105 ------------------------ |
106 | 106 |
107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
108 | 108 |
109 TODO: are we going to promote volatile to atomic? | 109 PNaCl bitcode does not support volatile memory accesses. |
| 110 |
| 111 .. note:: |
| 112 |
| 113 The C11/C++11 standards mandate that ``volatile`` accesses execute |
| 114 in program order (but are not fences, so other memory operations can |
| 115 reorder around them), are not necessarily atomic, and can’t be |
| 116 elided or fused. |
| 117 |
| 118 The PNaCl toolchain applies regular LLVM optimizations along these |
| 119 guidelines, and it further prevents any load/store (even |
| 120 non-``volatile`` and non-atomic ones) from moving above or below a |
| 121 ``volatile`` operation, which acts as a compiler barrier before |
| 122 optimizations occur. The PNaCl toolchain freezes ``volatile`` |
| 123 accesses after optimizations into atomic accesses with sequentially |
| 124 consistent memory ordering. This eases the support of legacy |
| 125 (i.e. non-C11/C++11) code, and combined with builtin fences these |
| 126 programs can do meaningful cross-thread communication without |
| 127 changing code. It also reflects the original code's intent and |
| 128 guarantees better portability. |
| 129 |
| 130 Relaxed ordering could be used instead, but for the first release it |
| 131 is more conservative to apply sequential consistency. Future |
| 132 releases may change what happens at compile-time, but |
| 133 already-released pexes will continue using sequential consistency. |
| 134 |
| 135 The PNaCl toolchain also requires that ``volatile`` accesses be at |
| 136 least naturally aligned, and tries to guarantee this alignment. |
110 | 137 |
111 Memory Model for Concurrent Operations | 138 Memory Model for Concurrent Operations |
112 -------------------------------------- | 139 -------------------------------------- |
113 | 140 |
114 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 141 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
115 | 142 |
116 TODO. | 143 The memory model offered by PNaCl relies on the same coding guidelines |
| 144 as the C11/C++11 one: concurrent accesses must always occur through |
| 145 atomic primitives, and these accesses must always occur with the same |
| 146 size for the same memory location. Visibility of stores is provided on a |
| 147 happens-before basis that relates memory locations to each other as the |
| 148 C11/C++11 standards do. |
| 149 |
| 150 PNaCl bitcode requires all concurrency to occur through `atomic |
| 151 intrinsics`_. |
| 152 |
| 153 .. note:: |
| 154 |
| 155 As in C11/C++11 some atomic accesses may be implemented with locks |
| 156 on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always |
| 157 be ``1``, signifying that all types are sometimes lock-free. The |
| 158 ``is_lock_free`` methods will report at runtime whether the current |
| 159 platform's implementation is lock-free. |
| 160 |
| 161 The PNaCl toolchain supports concurrent memory accesses through |
| 162 legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11 |
| 163 atomic primitives. ``volatile`` memory accesses can also be used, |
| 164 though these are discouraged, and aren't present in bitcode. |
| 165 |
| 166 PNaCl has varying support for concurrency and parallelism: |
| 167 |
| 168 * Threading is explicitly supported. |
| 169 * Inter-process communication through shared memory is limited to |
| 170 operations which are lock-free on the current platform |
| 171 (``is_lock_free`` methods). This may change at a later date. |
| 172 * Direct interaction with device memory isn't supported. |
| 173 * Signal handling isn't supported; PNaCl therefore promotes all |
| 174 primitives to cross-thread (instead of single-thread). This may |
| 175 change at a later date. |
| 176 |
| 177 Setting up the above mechanisms requires assistance from the |
| 178 embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using |
| 179 them once set up can be done through regular C/C++ code. |
| 180 |
| 181 The PNaCl toolchain currently optimizes for memory ordering as LLVM |
| 182 normally does, but at pexe creation time it promotes all |
| 183 ``volatile`` accesses as well as all atomic accesses to be |
| 184 sequentially consistent. Other memory orderings will be supported in |
| 185 a future release, but pexes generated with the current toolchain |
| 186 will continue functioning with sequential consistency. Using |
| 187 sequential consistency provides a total ordering for all |
| 188 sequentially-consistent operations on all addresses. |
| 189 |
| 190 This means that ``volatile`` and atomic memory accesses can only be |
| 191 re-ordered in some limited way before the pexe is created, and will |
| 192 act as fences for all memory accesses (even non-atomic and |
| 193 non-``volatile``) after pexe creation. Non-atomic and |
| 194 non-``volatile`` memory accesses may be reordered (unless a fence |
| 195 intervenes), separated, elided or fused according to C and C++'s |
| 196 memory model before the pexe is created as well as after its |
| 197 creation. |
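A minimal C++11 sketch of the coding guideline described above, under stated assumptions: ``Mailbox``, ``publish`` and ``try_consume`` are hypothetical names introduced only for illustration and are not part of PNaCl. All cross-thread signalling goes through one atomic flag of a fixed size, and sequential consistency is requested explicitly to match what the current toolchain produces at pexe creation time.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical example type: plain data published through an atomic flag.
struct Mailbox {
  int payload = 0;                 // non-atomic data, written before the flag
  std::atomic<bool> ready{false};  // the only location accessed concurrently
};

// Producer side: write the payload, then publish it with a
// sequentially consistent store to the flag.
inline void publish(Mailbox& box, int value) {
  box.payload = value;
  box.ready.store(true, std::memory_order_seq_cst);
}

// Consumer side: read the payload only after observing the flag with a
// sequentially consistent load, so the payload write happens-before the read.
inline bool try_consume(Mailbox& box, int& out) {
  if (!box.ready.load(std::memory_order_seq_cst)) {
    return false;  // nothing published yet
  }
  out = box.payload;
  return true;
}
```

In a real program ``publish`` and ``try_consume`` would run on different threads; the flag is always accessed atomically and always with the same size, as the guideline requires.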
117 | 198 |
118 Atomic Memory Ordering Constraints | 199 Atomic Memory Ordering Constraints |
119 ---------------------------------- | 200 ---------------------------------- |
120 | 201 |
121 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 202 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
122 | 203 |
123 TODO. | 204 PNaCl bitcode currently supports sequential consistency only, through |
| 205 its `atomic intrinsics`_. |
| 206 |
| 207 .. note:: |
| 208 |
| 209 Atomics follow the same ordering constraints as in regular LLVM, but |
| 210 all accesses are promoted to sequential consistency (the strongest |
| 211 memory ordering) at pexe creation time. As more C11/C++11 code |
| 212 allows us to understand performance and portability needs, we intend |
| 213 to support the full gamut of C11/C++11 memory orderings: |
| 214 |
| 215 - Relaxed: no operation orders memory. |
| 216 - Consume: a load operation performs a consume operation on the |
| 217 affected memory location (currently unsupported by LLVM). |
| 218 - Acquire: a load operation performs an acquire operation on the |
| 219 affected memory location. |
| 220 - Release: a store operation performs a release operation on the |
| 221 affected memory location. |
| 222 - Acquire-release: load and store operations perform acquire and |
| 223 release operations on the affected memory. |
| 224 - Sequentially consistent: same as acquire-release, but providing |
| 225 a global total ordering for all affected locations. |
| 226 |
| 227 As in C11/C++11: |
| 228 |
| 229 - Atomic and volatile accesses must at least be naturally aligned. |
| 230 - Some accesses may not actually be atomic on certain platforms, |
| 231 requiring an implementation that uses a global lock. |
| 232 - An atomic memory location must always be accessed with atomic |
| 233 primitives, and these primitives must always be of the same bit |
| 234 size for that location. |
| 235 - Not all memory orderings are valid for all atomic operations. |
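The promotion described in the note above can be modelled in C++11 as follows. This is only an illustrative sketch: ``promote_for_pexe`` and ``load_with`` are hypothetical helpers, not toolchain functions, and the real promotion happens inside the toolchain at pexe creation time rather than in user code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical model of the current behaviour: whatever ordering the source
// requested, the access ends up sequentially consistent in the pexe.
inline std::memory_order promote_for_pexe(std::memory_order requested) {
  (void)requested;  // the requested ordering is intentionally ignored
  return std::memory_order_seq_cst;
}

// An atomic load written with a weaker ordering behaves as if it had been
// written with the promoted (sequentially consistent) ordering.
inline int load_with(const std::atomic<int>& object,
                     std::memory_order requested) {
  return object.load(promote_for_pexe(requested));
}
```

Code written with weaker orderings stays correct under this model, since sequential consistency is at least as strong as every other ordering.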
124 | 236 |
125 Fast-Math Flags | 237 Fast-Math Flags |
126 --------------- | 238 --------------- |
127 | 239 |
128 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 240 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
129 | 241 |
130 Fast-math mode is not currently supported by the PNaCl bitcode. | 242 Fast-math mode is not currently supported by the PNaCl bitcode. |
131 | 243 |
132 Type System | 244 Type System |
133 =========== | 245 =========== |
(...skipping 129 matching lines...)
263 | 375 |
264 .. code-block:: llvm | 376 .. code-block:: llvm |
265 | 377 |
266 %buf = alloca i8, i32 8, align 4 | 378 %buf = alloca i8, i32 8, align 4 |
267 | 379 |
268 * ``load``, ``store`` | 380 * ``load``, ``store`` |
269 | 381 |
270 The pointer argument of these instructions must be a *normalized* pointer | 382 The pointer argument of these instructions must be a *normalized* pointer |
271 (see :ref:`pointer types <pointertypes>`). | 383 (see :ref:`pointer types <pointertypes>`). |
272 | 384 |
273 * ``fence`` | |
274 * ``cmpxchg``, ``atomicrmw`` | |
275 | |
276 The pointer argument of these instructions must be a *normalized* pointer | |
277 (see :ref:`pointer types <pointertypes>`). | |
278 | |
279 TODO(jfb): this may change | |
280 | |
281 * ``trunc`` | 385 * ``trunc`` |
282 * ``zext`` | 386 * ``zext`` |
283 * ``sext`` | 387 * ``sext`` |
284 * ``fptrunc`` | 388 * ``fptrunc`` |
285 * ``fpext`` | 389 * ``fpext`` |
286 * ``fptoui`` | 390 * ``fptoui`` |
287 * ``fptosi`` | 391 * ``fptosi`` |
288 * ``uitofp`` | 392 * ``uitofp`` |
289 * ``sitofp`` | 393 * ``sitofp`` |
290 | 394 |
(...skipping 18 matching lines...)
309 * ``select`` | 413 * ``select`` |
310 * ``call`` | 414 * ``call`` |
311 | 415 |
312 Intrinsic Functions | 416 Intrinsic Functions |
313 =================== | 417 =================== |
314 | 418 |
315 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 419 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
316 | 420 |
317 The only intrinsics supported by PNaCl bitcode are the following. | 421 The only intrinsics supported by PNaCl bitcode are the following. |
318 | 422 |
319 TODO(jfb): atomics | |
320 | |
321 * ``llvm.memcpy`` | 423 * ``llvm.memcpy`` |
322 * ``llvm.memmove`` | 424 * ``llvm.memmove`` |
323 * ``llvm.memset`` | 425 * ``llvm.memset`` |
324 * ``llvm.bswap`` | 426 * ``llvm.bswap`` |
325 | 427 |
326 The llvm.bswap intrinsic is only supported with the following argument types: | 428 The llvm.bswap intrinsic is only supported with the following argument types: |
327 i16, i32, i64. | 429 i16, i32, i64. |
328 | 430 |
329 * ``llvm.ctlz`` | 431 * ``llvm.ctlz`` |
330 * ``llvm.cttz`` | 432 * ``llvm.cttz`` |
331 * ``llvm.ctpop`` | 433 * ``llvm.ctpop`` |
332 | 434 |
333 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support | 435 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support |
334 i32 and i64 argument types (the types supported by C-style GCC builtins). | 436 i32 and i64 argument types (the types supported by C-style GCC builtins). |
335 | 437 |
336 * ``llvm.trap`` | 438 * ``llvm.trap`` |
337 * ``llvm.nacl.read.tp`` | 439 * ``llvm.nacl.read.tp`` |
338 | 440 |
339 TODO: describe | 441 TODO: describe |
340 | 442 |
341 * ``llvm.nacl.longjmp`` | 443 * ``llvm.nacl.longjmp`` |
342 | 444 |
343 TODO: describe | 445 TODO: describe |
344 | 446 |
345 * ``llvm.nacl.setjmp`` | 447 * ``llvm.nacl.setjmp`` |
346 | 448 |
347 TODO: describe | 449 TODO: describe |
348 | 450 |
| 451 .. _atomic intrinsics: |
| 452 |
| 453 * ``llvm.nacl.atomic.store`` |
| 454 * ``llvm.nacl.atomic.load`` |
| 455 * ``llvm.nacl.atomic.rmw`` |
| 456 * ``llvm.nacl.atomic.cmpxchg`` |
| 457 * ``llvm.nacl.atomic.fence`` |
| 458 |
| 459 .. code-block:: llvm |
| 460 |
| 461 declare iN @llvm.nacl.atomic.load( |
| 462 iN* <source>, i32 <memory_order>) |
| 463 declare void @llvm.nacl.atomic.store( |
| 464 iN <operand>, iN* <destination>, i32 <memory_order>) |
| 465 declare iN @llvm.nacl.atomic.rmw( |
| 466 i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>) |
| 467 declare iN @llvm.nacl.atomic.cmpxchg( |
| 468 iN* <object>, iN <expected>, iN <desired>, |
| 469 i32 <memory_order_success>, i32 <memory_order_failure>) |
| 470 declare void @llvm.nacl.atomic.fence(i32 <memory_order>) |
| 471 |
| 472 Each of these intrinsics is overloaded on the ``iN`` |
| 473 argument. Integral types of 8-, 16-, 32- and 64-bit width are supported |
| 474 for these ``iN`` arguments. |
| 475 |
| 476 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following |
| 477 read-modify-write operations, from the general and arithmetic sections |
| 478 of the C11/C++11 standards: |
| 479 |
| 480 - ``add`` |
| 481 - ``sub`` |
| 482 - ``or`` |
| 483 - ``and`` |
| 484 - ``xor`` |
| 485 - ``exchange`` |
| 486 |
| 487 For all of these read-modify-write operations, the returned value is |
| 488 that at ``object`` before the computation. The ``computation`` |
| 489 argument must be a compile-time constant. |
| 490 |
| 491 All atomic intrinsics also support C11/C++11 memory orderings, which |
| 492 must be compile-time constants. Those are detailed in `Atomic Memory |
| 493 Ordering Constraints`_. |
| 494 |
| 495 Integer values for these computations and memory orderings are defined |
| 496 in ``"llvm/IR/NaClIntrinsics.h"``. |
| 497 |
| 498 .. note:: |
| 499 |
| 500 These intrinsics allow PNaCl to support C11/C++11 style atomic |
| 501 operations as well as some legacy GCC-style ``__sync_*`` builtins |
| 502 while remaining stable as the LLVM codebase changes. The user |
| 503 isn't expected to use these intrinsics directly. |
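The read-modify-write semantics listed above (each operation returns the value at ``object`` before the computation) can be sketched at the C++11 source level as shown below. This is an assumption-laden illustration, not the intrinsic's implementation: ``RmwOp`` and ``atomic_rmw`` are hypothetical names, and the enumerator values are not the integer encodings defined in ``"llvm/IR/NaClIntrinsics.h"``.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical operation tags mirroring the list above; these are NOT the
// integer values from NaClIntrinsics.h.
enum class RmwOp { Add, Sub, Or, And, Xor, Exchange };

// Applies the selected operation atomically with sequential consistency and
// returns the value held by `object` before the computation.
template <typename T>
T atomic_rmw(RmwOp op, std::atomic<T>& object, T operand) {
  const std::memory_order order = std::memory_order_seq_cst;
  switch (op) {
    case RmwOp::Add:      return object.fetch_add(operand, order);
    case RmwOp::Sub:      return object.fetch_sub(operand, order);
    case RmwOp::Or:       return object.fetch_or(operand, order);
    case RmwOp::And:      return object.fetch_and(operand, order);
    case RmwOp::Xor:      return object.fetch_xor(operand, order);
    case RmwOp::Exchange: return object.exchange(operand, order);
  }
  return object.load(order);  // not reached for valid RmwOp values
}
```

For example, starting from an object holding 10, an ``Add`` with operand 5 returns 10 and leaves 15 in the object.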