OLD | NEW |
1 ============================== | 1 ============================== |
2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
3 ============================== | 3 ============================== |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
(...skipping 95 matching lines...) |
106 | 106 |
107 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 107 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
108 | 108 |
109 PNaCl bitcode does not support inline assembly. | 109 PNaCl bitcode does not support inline assembly. |
110 | 110 |
111 Volatile Memory Accesses | 111 Volatile Memory Accesses |
112 ------------------------ | 112 ------------------------ |
113 | 113 |
114 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 114 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
115 | 115 |
116 TODO: are we going to promote volatile to atomic? | 116 PNaCl bitcode does not support volatile memory accesses. |
| 117 |
| 118 .. note:: |
| 119 |
| 120 The C11/C++11 standards mandate that ``volatile`` accesses execute |
| 121 in program order (but are not fences, so other memory operations can |
| 122 reorder around them), are not necessarily atomic, and can’t be |
| 123 elided. They can be separated into smaller width accesses. |
| 124 |
| 125 The PNaCl toolchain applies regular LLVM optimizations along these |
| 126 guidelines, and it further prevents any load/store (even |
| 127 non-``volatile`` and non-atomic ones) from moving above or below a |
| 128 ``volatile`` operation: these act as compiler barriers before |
| 129 optimizations occur. The PNaCl toolchain freezes ``volatile`` |
| 130 accesses after optimizations into atomic accesses with sequentially |
| 131 consistent memory ordering. This eases the support of legacy |
| 132 (i.e. non-C11/C++11) code, and combined with builtin fences these |
| 133 programs can do meaningful cross-thread communication without |
| 134 changing code. It also reflects the original code's intent and |
| 135 guarantees better portability. |
| 136 |
| 137 Relaxed ordering could be used instead, but for the first release it |
| 138 is more conservative to apply sequential consistency. Future |
| 139 releases may change what happens at compile-time, but |
| 140 already-released pexes will continue using sequential consistency. |
| 141 |
| 142 The PNaCl toolchain also requires that ``volatile`` accesses be at |
| 143 least naturally aligned, and tries to guarantee this alignment. |
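
The freezing step described in the note above can be pictured in C++ terms. This is only a sketch under stated assumptions: ``legacy_flag``, ``frozen_flag`` and the helper functions are hypothetical names, and the real rewrite happens on bitcode at pexe creation time rather than in source code.

```cpp
#include <atomic>
#include <cassert>

// What legacy (non-C11/C++11) source code typically contains.
volatile int legacy_flag = 0;

// Hypothetical stand-in for how the same location behaves once the
// toolchain has frozen the access: a sequentially consistent atomic.
std::atomic<int> frozen_flag(0);

// Plain volatile store, as written by the programmer.
void legacy_publish(int value) { legacy_flag = value; }

// Roughly equivalent store after freezing (assumption: seq_cst ordering).
void frozen_publish(int value) {
  frozen_flag.store(value, std::memory_order_seq_cst);
}

// Roughly equivalent load after freezing.
int frozen_read() { return frozen_flag.load(std::memory_order_seq_cst); }

// Single-threaded sanity check of the helpers above.
void check_publish() {
  legacy_publish(1);
  frozen_publish(1);
  assert(legacy_flag == 1);
  assert(frozen_read() == 1);
}
```

The sketch only illustrates the ordering that the text says is applied; it does not model alignment handling or the compiler-barrier behaviour before optimizations.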
117 | 144 |
118 Memory Model for Concurrent Operations | 145 Memory Model for Concurrent Operations |
119 -------------------------------------- | 146 -------------------------------------- |
120 | 147 |
121 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 148 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
122 | 149 |
123 TODO. | 150 The memory model offered by PNaCl relies on the same coding guidelines |
| 151 as the C11/C++11 one: concurrent accesses must always occur through |
| 152 atomic primitives (offered by `atomic intrinsics`_), and these accesses |
| 153 must always occur with the same size for the same memory |
| 154 location. Visibility of stores is provided on a happens-before basis |
| 155 that relates memory locations to each other as the C11/C++11 standards |
| 156 do. |
| 157 |
| 158 .. note:: |
| 159 |
| 160 As in C11/C++11 some atomic accesses may be implemented with locks |
| 161 on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always |
| 162 be ``1``, signifying that all types are sometimes lock-free. The |
| 163 ``is_lock_free`` methods will return the current platform's |
| 164 implementation at runtime. |
| 165 |
| 166 The PNaCl toolchain supports concurrent memory accesses through |
| 167 legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11 |
| 168 atomic primitives. ``volatile`` memory accesses can also be used, |
| 169 though these are discouraged, and aren't present in bitcode. |
| 170 |
| 171 PNaCl supports concurrency and parallelism with some restrictions: |
| 172 |
| 173 * Threading is explicitly supported. |
| 174 * Inter-process communication through shared memory is limited to |
| 175 operations which are lock-free on the current platform |
| 176 (``is_lock_free`` methods). This may change at a later date. |
| 177 * Direct interaction with device memory isn't supported. |
| 178 * Signal handling isn't supported; PNaCl therefore promotes all |
| 179 primitives to cross-thread (instead of single-thread). This may |
| 180 change at a later date. |
| 181 |
| 182 Setting up the above mechanisms requires assistance from the |
| 183 embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using |
| 184 them once set up can be done through regular C/C++ code. |
| 185 |
| 186 The PNaCl toolchain currently optimizes for memory ordering as LLVM |
| 187 normally does, but at pexe creation time it promotes all |
| 188 ``volatile`` accesses as well as all atomic accesses to be |
| 189 sequentially consistent. Other memory orderings will be supported in |
| 190 a future release, but pexes generated with the current toolchain |
| 191 will continue functioning with sequential consistency. Using |
| 192 sequential consistency provides a total ordering for all |
| 193 sequentially-consistent operations on all addresses. |
| 194 |
| 195 This means that ``volatile`` and atomic memory accesses can only be |
| 196 re-ordered in some limited way before the pexe is created, and will |
| 197 act as fences for all memory accesses (even non-atomic and |
| 198 non-``volatile``) after pexe creation. Non-atomic and |
| 199 non-``volatile`` memory accesses may be reordered (unless a fence |
| 200 intervenes), separated, elided or fused according to C and C++'s |
| 201 memory model before the pexe is created as well as after its |
| 202 creation. |
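
The lock-free remarks in the note above distinguish a compile-time macro from a runtime query. A minimal C++ sketch of that distinction follows; the helper names are hypothetical, and the macro value seen when compiling this with a host compiler may differ from the ``1`` that the text describes for the PNaCl toolchain.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>

// Interprets the value of a C11/C++11 ATOMIC_*_LOCK_FREE macro.
const char* describe_lock_free(int macro_value) {
  switch (macro_value) {
    case 0: return "never";
    case 1: return "sometimes";  // the value the text describes for PNaCl
    case 2: return "always";
    default: return "unknown";
  }
}

// Asks the current platform, at runtime, whether int atomics are lock-free.
bool int_atomics_are_lock_free() {
  std::atomic<int> probe(0);
  return probe.is_lock_free();
}

// Sanity check: a macro value of 2 ("always") implies the runtime query
// must report lock-free; a value of 1 leaves the runtime answer open.
void check_lock_free() {
  assert(std::strcmp(describe_lock_free(1), "sometimes") == 0);
  if (ATOMIC_INT_LOCK_FREE == 2) {
    assert(int_atomics_are_lock_free());
  }
}
```

Code that shares memory across processes would consult the runtime query, since the text limits such communication to operations that are lock-free on the current platform.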
124 | 203 |
125 Atomic Memory Ordering Constraints | 204 Atomic Memory Ordering Constraints |
126 ---------------------------------- | 205 ---------------------------------- |
127 | 206 |
128 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 207 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
129 | 208 |
130 TODO. | 209 PNaCl bitcode currently supports sequential consistency only, through |
| 210 its `atomic intrinsics`_. |
| 211 |
| 212 .. note:: |
| 213 |
| 214 Atomics follow the same ordering constraints as in regular LLVM, but |
| 215 all accesses are promoted to sequential consistency (the strongest |
| 216 memory ordering) at pexe creation time. As more C11/C++11 code |
| 217 allows us to understand performance and portability needs, we intend |
| 218 to support the full gamut of C11/C++11 memory orderings: |
| 219 |
| 220 - Relaxed: no operation orders memory. |
| 221 - Consume: a load operation performs a consume operation on the |
| 222 affected memory location (currently unsupported by LLVM). |
| 223 - Acquire: a load operation performs an acquire operation on the |
| 224 affected memory location. |
| 225 - Release: a store operation performs a release operation on the |
| 226 affected memory location. |
| 227 - Acquire-release: load and store operations perform acquire and |
| 228 release operations on the affected memory. |
| 229 - Sequentially consistent: same as acquire-release, but providing |
| 230 a global total ordering for all affected locations. |
| 231 |
| 232 As in C11/C++11: |
| 233 |
| 234 - Atomic accesses must at least be naturally aligned. |
| 235 - Some accesses may not actually be atomic on certain platforms, |
| 236 requiring an implementation that uses a global lock. |
| 237 - An atomic memory location must always be accessed with atomic |
| 238 primitives, and these primitives must always be of the same bit |
| 239 size for that location. |
| 240 - Not all memory orderings are valid for all atomic operations. |
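
The last point above, that not all memory orderings are valid for all atomic operations, can be sketched with the C++11 rules for loads and stores. The ``AtomicOp`` enum and both functions are hypothetical illustrations and are not part of the PNaCl toolchain; ``is_supported_in_pexe`` simply restates the text's claim that bitcode currently carries sequential consistency only.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical classification of atomic operations, for illustration only.
enum class AtomicOp { Load, Store, ReadModifyWrite };

// C++11 rules: a load may not use release or acq_rel; a store may only
// use relaxed, release or seq_cst; read-modify-write accepts any ordering.
bool is_valid_order(AtomicOp op, std::memory_order order) {
  switch (op) {
    case AtomicOp::Load:
      return order != std::memory_order_release &&
             order != std::memory_order_acq_rel;
    case AtomicOp::Store:
      return order == std::memory_order_relaxed ||
             order == std::memory_order_release ||
             order == std::memory_order_seq_cst;
    case AtomicOp::ReadModifyWrite:
      return true;
  }
  return false;
}

// Assumption drawn from the text: only seq_cst is present in a pexe today.
bool is_supported_in_pexe(std::memory_order order) {
  return order == std::memory_order_seq_cst;
}

// Sanity check of the two predicates.
void check_orders() {
  assert(is_valid_order(AtomicOp::Load, std::memory_order_acquire));
  assert(!is_valid_order(AtomicOp::Load, std::memory_order_release));
  assert(!is_supported_in_pexe(std::memory_order_relaxed));
}
```

Sequential consistency is valid for every operation kind, which is consistent with the text promoting all accesses to it.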
131 | 241 |
132 Fast-Math Flags | 242 Fast-Math Flags |
133 --------------- | 243 --------------- |
134 | 244 |
135 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 245 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
136 | 246 |
137 Fast-math mode is not currently supported by the PNaCl bitcode. | 247 Fast-math mode is not currently supported by the PNaCl bitcode. |
138 | 248 |
139 Type System | 249 Type System |
140 =========== | 250 =========== |
(...skipping 129 matching lines...) |
270 | 380 |
271 .. code-block:: llvm | 381 .. code-block:: llvm |
272 | 382 |
273 %buf = alloca i8, i32 8, align 4 | 383 %buf = alloca i8, i32 8, align 4 |
274 | 384 |
275 * ``load``, ``store`` | 385 * ``load``, ``store`` |
276 | 386 |
277 The pointer argument of these instructions must be a *normalized* pointer | 387 The pointer argument of these instructions must be a *normalized* pointer |
278 (see :ref:`pointer types <pointertypes>`). | 388 (see :ref:`pointer types <pointertypes>`). |
279 | 389 |
280 * ``fence`` | |
281 * ``cmpxchg``, ``atomicrmw`` | |
282 | |
283 The pointer argument of these instructions must be a *normalized* pointer | |
284 (see :ref:`pointer types <pointertypes>`). | |
285 | |
286 TODO(jfb): this may change | |
287 | |
288 * ``trunc`` | 390 * ``trunc`` |
289 * ``zext`` | 391 * ``zext`` |
290 * ``sext`` | 392 * ``sext`` |
291 * ``fptrunc`` | 393 * ``fptrunc`` |
292 * ``fpext`` | 394 * ``fpext`` |
293 * ``fptoui`` | 395 * ``fptoui`` |
294 * ``fptosi`` | 396 * ``fptosi`` |
295 * ``uitofp`` | 397 * ``uitofp`` |
296 * ``sitofp`` | 398 * ``sitofp`` |
297 | 399 |
(...skipping 18 matching lines...) |
316 * ``select`` | 418 * ``select`` |
317 * ``call`` | 419 * ``call`` |
318 | 420 |
319 Intrinsic Functions | 421 Intrinsic Functions |
320 =================== | 422 =================== |
321 | 423 |
322 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 424 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
323 | 425 |
324 The only intrinsics supported by PNaCl bitcode are the following. | 426 The only intrinsics supported by PNaCl bitcode are the following. |
325 | 427 |
326 TODO(jfb): atomics | |
327 | |
328 * ``llvm.memcpy`` | 428 * ``llvm.memcpy`` |
329 * ``llvm.memmove`` | 429 * ``llvm.memmove`` |
330 * ``llvm.memset`` | 430 * ``llvm.memset`` |
331 | 431 |
332 These intrinsics are only supported with an i32 ``len`` argument. | 432 These intrinsics are only supported with an i32 ``len`` argument. |
333 | 433 |
334 * ``llvm.bswap`` | 434 * ``llvm.bswap`` |
335 | 435 |
336 The overloaded ``llvm.bswap`` intrinsic is only supported with the following | 436 The overloaded ``llvm.bswap`` intrinsic is only supported with the following |
337 argument types: i16, i32, i64 (the types supported by C-style GCC builtins). | 437 argument types: i16, i32, i64 (the types supported by C-style GCC builtins). |
(...skipping 14 matching lines...) |
352 TODO: describe | 452 TODO: describe |
353 | 453 |
354 * ``llvm.nacl.longjmp`` | 454 * ``llvm.nacl.longjmp`` |
355 | 455 |
356 TODO: describe | 456 TODO: describe |
357 | 457 |
358 * ``llvm.nacl.setjmp`` | 458 * ``llvm.nacl.setjmp`` |
359 | 459 |
360 TODO: describe | 460 TODO: describe |
361 | 461 |
| 462 .. _atomic intrinsics: |
| 463 |
| 464 * ``llvm.nacl.atomic.store`` |
| 465 * ``llvm.nacl.atomic.load`` |
| 466 * ``llvm.nacl.atomic.rmw`` |
| 467 * ``llvm.nacl.atomic.cmpxchg`` |
| 468 * ``llvm.nacl.atomic.fence`` |
| 469 |
| 470 .. code-block:: llvm |
| 471 |
| 472 declare iN @llvm.nacl.atomic.load.<size>( |
| 473 iN* <source>, i32 <memory_order>) |
| 474 declare void @llvm.nacl.atomic.store.<size>( |
| 475 iN <operand>, iN* <destination>, i32 <memory_order>) |
| 476 declare iN @llvm.nacl.atomic.rmw.<size>( |
| 477 i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>) |
| 478 declare iN @llvm.nacl.atomic.cmpxchg.<size>( |
| 479 iN* <object>, iN <expected>, iN <desired>, |
| 480 i32 <memory_order_success>, i32 <memory_order_failure>) |
| 481 declare void @llvm.nacl.atomic.fence(i32 <memory_order>) |
| 482 |
| 483 Each of these intrinsics except the fence is overloaded on the ``iN`` |
| 484 argument, which is reflected through ``<size>`` in the overload's name. |
| 485 Integral types of 8, 16, 32 and 64-bit width are supported for these arguments. |
| 486 |
| 487 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following |
| 488 read-modify-write operations, from the general and arithmetic sections |
| 489 of the C11/C++11 standards: |
| 490 |
| 491 - ``add`` |
| 492 - ``sub`` |
| 493 - ``or`` |
| 494 - ``and`` |
| 495 - ``xor`` |
| 496 - ``exchange`` |
| 497 |
| 498 For all of these read-modify-write operations, the returned value is |
| 499 that at ``object`` before the computation. The ``computation`` |
| 500 argument must be a compile-time constant. |
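
The "returns the value before the computation" rule matches the behaviour of the C++11 ``fetch_*`` and ``exchange`` members, which gives a convenient way to sketch it. The ``RmwOp`` enum and ``atomic_rmw`` function below are hypothetical; the actual integer encodings live in ``"llvm/IR/NaClAtomicIntrinsics.h"`` and are not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names for the six read-modify-write computations.
enum class RmwOp { Add, Sub, Or, And, Xor, Exchange };

// Applies one computation with sequential consistency and returns the
// value held by `object` before the computation took place.
std::uint32_t atomic_rmw(RmwOp op, std::atomic<std::uint32_t>& object,
                         std::uint32_t operand) {
  const std::memory_order order = std::memory_order_seq_cst;
  switch (op) {
    case RmwOp::Add:      return object.fetch_add(operand, order);
    case RmwOp::Sub:      return object.fetch_sub(operand, order);
    case RmwOp::Or:       return object.fetch_or(operand, order);
    case RmwOp::And:      return object.fetch_and(operand, order);
    case RmwOp::Xor:      return object.fetch_xor(operand, order);
    case RmwOp::Exchange: return object.exchange(operand, order);
  }
  return 0;  // not reached for valid RmwOp values
}

// Sanity check: the old value is returned, the new value is stored.
void check_rmw() {
  std::atomic<std::uint32_t> object(5);
  assert(atomic_rmw(RmwOp::Add, object, 3) == 5);
  assert(object.load() == 8);
}
```

In the intrinsic the computation is a compile-time constant argument; the runtime ``switch`` here is only a convenience for showing all six operations in one place.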
| 501 |
| 502 All atomic intrinsics also support C11/C++11 memory orderings, which |
| 503 must be compile-time constants. Those are detailed in `Atomic Memory |
| 504 Ordering Constraints`_. |
| 505 |
| 506 Integer values for these computations and memory orderings are defined |
| 507 in ``"llvm/IR/NaClAtomicIntrinsics.h"``. |
| 508 |
| 509 .. note:: |
| 510 |
| 511 These intrinsics allow PNaCl to support C11/C++11 style atomic |
| 512 operations as well as some legacy GCC-style ``__sync_*`` builtins |
| 513 while remaining stable as the LLVM codebase changes. The user |
| 514 isn't expected to use these intrinsics directly. |