OLD | NEW |
1 ============================== | 1 ============================== |
2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
3 ============================== | 3 ============================== |
4 | 4 |
5 .. contents:: | 5 .. contents:: |
6 :local: | 6 :local: |
7 :depth: 3 | 7 :depth: 3 |
8 | 8 |
9 Introduction | 9 Introduction |
10 ============ | 10 ============ |
(...skipping 95 matching lines...)
106 | 106 |
107 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 107 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
108 | 108 |
109 PNaCl bitcode does not support inline assembly. | 109 PNaCl bitcode does not support inline assembly. |
110 | 110 |
111 Volatile Memory Accesses | 111 Volatile Memory Accesses |
112 ------------------------ | 112 ------------------------ |
113 | 113 |
114 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 114 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
115 | 115 |
116 TODO: are we going to promote volatile to atomic? | 116 PNaCl bitcode does not support volatile memory accesses. |
| 117 |
| 118 .. note:: |
| 119 |
| 120 The C11/C++11 standards mandate that ``volatile`` accesses execute |
| 121 in program order (but are not fences, so other memory operations can |
| 122 reorder around them), are not necessarily atomic, and can't be |
| 123 elided. They can be separated into smaller width accesses. |
| 124 |
| 125 The PNaCl toolchain applies regular LLVM optimizations along these |
| 126 guidelines, and it further prevents any load/store (even |
| 127 non-``volatile`` and non-atomic ones) from moving above or below |
| 128 ``volatile`` operations: they act as compiler barriers before |
| 129 optimizations occur. The PNaCl toolchain freezes ``volatile`` |
| 130 accesses after optimizations into atomic accesses with sequentially |
| 131 consistent memory ordering. This eases the support of legacy |
| 132 (i.e. non-C11/C++11) code, and combined with builtin fences these |
| 133 programs can do meaningful cross-thread communication without |
| 134 changing code. It also reflects the original code's intent and |
| 135 guarantees better portability. |
| 136 |
| 137 Relaxed ordering could be used instead, but for the first release it |
| 138 is more conservative to apply sequential consistency. Future |
| 139 releases may change what happens at compile-time, but |
| 140 already-released pexes will continue using sequential consistency. |
| 141 |
| 142 The PNaCl toolchain also requires that ``volatile`` accesses be at |
| 143 least naturally aligned, and tries to guarantee this alignment. |
117 | 144 |
118 Memory Model for Concurrent Operations | 145 Memory Model for Concurrent Operations |
119 -------------------------------------- | 146 -------------------------------------- |
120 | 147 |
121 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 148 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
122 | 149 |
123 TODO. | 150 The memory model offered by PNaCl relies on the same coding guidelines |
| 151 as the C11/C++11 one: concurrent accesses must always occur through |
| 152 atomic primitives (offered by `atomic intrinsics`_), and these accesses |
| 153 must always occur with the same size for the same memory |
| 154 location. Visibility of stores is provided on a happens-before basis |
| 155 that relates memory locations to each other as the C11/C++11 standards |
| 156 do. |
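The happens-before rule above can be sketched in C++11 as a message-passing pattern. This is an illustrative sketch, not code from the PNaCl toolchain: the names ``payload``, ``ready``, ``producer``, ``consumer`` and ``run_message_passing`` are hypothetical, and sequential consistency is spelled out explicitly because it is the only ordering the bitcode currently keeps.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Plain (non-atomic) data that is published through an atomic flag.
static int payload = 0;
static std::atomic<bool> ready(false);

// Writes the payload, then publishes it with a sequentially consistent store.
void producer() {
  payload = 42;
  ready.store(true, std::memory_order_seq_cst);
}

// Waits until the flag is observed set, then reads the payload. The load that
// observes `true` synchronizes with the store above, so the earlier write to
// `payload` happens-before this read.
int consumer() {
  while (!ready.load(std::memory_order_seq_cst)) {
    std::this_thread::yield();
  }
  return payload;
}

// Runs one producer and one consumer thread and returns what the consumer saw.
int run_message_passing() {
  payload = 0;
  ready.store(false, std::memory_order_seq_cst);
  int seen = 0;
  std::thread reader([&seen] { seen = consumer(); });
  std::thread writer(producer);
  reader.join();
  writer.join();
  return seen;
}
```

Every concurrent access to ``ready`` goes through the same atomic object and therefore has the same size, which is what the guideline asks for. ``payload`` is never accessed concurrently because the flag orders the two threads.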
| 157 |
| 158 .. note:: |
| 159 |
| 160 As in C11/C++11 some atomic accesses may be implemented with locks |
| 161 on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always |
| 162 be ``1``, signifying that all types are sometimes lock-free. The |
| 163 ``is_lock_free`` methods report at runtime whether the current |
| 164 platform's implementation is lock-free. |
| 165 |
| 166 The PNaCl toolchain supports concurrent memory accesses through |
| 167 legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11 |
| 168 atomic primitives. ``volatile`` memory accesses can also be used, |
| 169 though these are discouraged, and aren't present in bitcode. |
| 170 |
| 171 PNaCl supports concurrency and parallelism with some restrictions: |
| 172 |
| 173 * Threading is explicitly supported. |
| 174 * Inter-process communication through shared memory is limited to |
| 175 operations which are lock-free on the current platform |
| 176 (``is_lock_free`` methods). This may change at a later date. |
| 177 * Direct interaction with device memory isn't supported. |
| 178 * Signal handling isn't supported; PNaCl therefore promotes all |
| 179 primitives to cross-thread (instead of single-thread). This may |
| 180 change at a later date. Note that using atomic operations which |
| 181 aren't lock-free may lead to deadlocks when handling asynchronous |
| 182 signals. |
| 183 * ``volatile`` and atomic operations are address-free (operations on |
| 184 the same memory location via two different addresses work |
| 185 atomically), as intended by the C11/C++11 standards. This is |
| 186 critical for inter-process communication as well as synchronous |
| 187 "external modifications" such as mapping underlying memory at |
| 188 multiple locations. |
| 189 |
| 190 Setting up the above mechanisms requires assistance from the |
| 191 embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using |
| 192 them once set up can be done through regular C/C++ code. |
| 193 |
| 194 The PNaCl toolchain currently optimizes for memory ordering as LLVM |
| 195 normally does, but at pexe creation time it promotes all |
| 196 ``volatile`` accesses as well as all atomic accesses to be |
| 197 sequentially consistent. Other memory orderings will be supported in |
| 198 a future release, but pexes generated with the current toolchain |
| 199 will continue functioning with sequential consistency. Using |
| 200 sequential consistency provides a total ordering for all |
| 201 sequentially-consistent operations on all addresses. |
| 202 |
| 203 This means that ``volatile`` and atomic memory accesses can only be |
| 204 re-ordered in some limited way before the pexe is created, and will |
| 205 act as fences for all memory accesses (even non-atomic and |
| 206 non-``volatile``) after pexe creation. Non-atomic and |
| 207 non-``volatile`` memory accesses may be reordered (unless a fence |
| 208 intervenes), separated, elided or fused according to C and C++'s |
| 209 memory model before the pexe is created as well as after its |
| 210 creation. |
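The relationship between the ``ATOMIC_*_LOCK_FREE`` macros and the ``is_lock_free`` methods mentioned in the note can be checked with a few lines of standard C++11. This is a sketch under assumptions: ``macro_and_runtime_agree`` is a hypothetical helper, and a host compiler will often report ``2`` (always lock-free) for ``ATOMIC_INT_LOCK_FREE`` where the note says PNaCl reports ``1``.

```cpp
#include <atomic>
#include <cassert>

// Compares the compile-time macro with the runtime query for `int`.
// Macro values defined by C11/C++11: 0 = never lock-free,
// 1 = sometimes lock-free, 2 = always lock-free.
bool macro_and_runtime_agree() {
  std::atomic<int> value(0);
  const bool runtime_lock_free = value.is_lock_free();
  if (ATOMIC_INT_LOCK_FREE == 2) {
    return runtime_lock_free;   // "always" must match the runtime answer
  }
  if (ATOMIC_INT_LOCK_FREE == 0) {
    return !runtime_lock_free;  // "never" must match the runtime answer
  }
  return true;                  // "sometimes": only the runtime query decides
}
```

Portable code that shares memory across processes would rely on the runtime query rather than the macro, since a macro value of ``1`` leaves the answer to the platform.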
124 | 211 |
125 Atomic Memory Ordering Constraints | 212 Atomic Memory Ordering Constraints |
126 ---------------------------------- | 213 ---------------------------------- |
127 | 214 |
128 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 215 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
129 | 216 |
130 TODO. | 217 PNaCl bitcode currently supports sequential consistency only, through |
| 218 its `atomic intrinsics`_. |
| 219 |
| 220 .. note:: |
| 221 |
| 222 Atomics follow the same ordering constraints as in regular LLVM, but |
| 223 all accesses are promoted to sequential consistency (the strongest |
| 224 memory ordering) at pexe creation time. As more C11/C++11 code |
| 225 allows us to understand performance and portability needs we intend |
| 226 to support the full gamut of C11/C++11 memory orderings: |
| 227 |
| 228 - Relaxed: no operation orders memory. |
| 229 - Consume: a load operation performs a consume operation on the |
| 230 affected memory location (currently unsupported by LLVM). |
| 231 - Acquire: a load operation performs an acquire operation on the |
| 232 affected memory location. |
| 233 - Release: a store operation performs a release operation on the |
| 234 affected memory location. |
| 235 - Acquire-release: load and store operations perform acquire and |
| 236 release operations on the affected memory. |
| 237 - Sequentially consistent: same as acquire-release, but providing |
| 238 a global total ordering for all affected locations. |
| 239 |
| 240 As in C11/C++11: |
| 241 |
| 242 - Atomic accesses must at least be naturally aligned. |
| 243 - Some accesses may not actually be atomic on certain platforms, |
| 244 requiring an implementation that uses a global lock. |
| 245 - An atomic memory location must always be accessed with atomic |
| 246 primitives, and these primitives must always be of the same bit |
| 247 size for that location. |
| 248 - Not all memory orderings are valid for all atomic operations. |
131 | 249 |
132 Fast-Math Flags | 250 Fast-Math Flags |
133 --------------- | 251 --------------- |
134 | 252 |
135 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 253 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
136 | 254 |
137 Fast-math mode is not currently supported by the PNaCl bitcode. | 255 Fast-math mode is not currently supported by the PNaCl bitcode. |
138 | 256 |
139 Type System | 257 Type System |
140 =========== | 258 =========== |
(...skipping 129 matching lines...)
270 | 388 |
271 .. code-block:: llvm | 389 .. code-block:: llvm |
272 | 390 |
273 %buf = alloca i8, i32 8, align 4 | 391 %buf = alloca i8, i32 8, align 4 |
274 | 392 |
275 * ``load``, ``store`` | 393 * ``load``, ``store`` |
276 | 394 |
277 The pointer argument of these instructions must be a *normalized* pointer | 395 The pointer argument of these instructions must be a *normalized* pointer |
278 (see :ref:`pointer types <pointertypes>`). | 396 (see :ref:`pointer types <pointertypes>`). |
279 | 397 |
280 * ``fence`` | |
281 * ``cmpxchg``, ``atomicrmw`` | |
282 | |
283 The pointer argument of these instructions must be a *normalized* pointer | |
284 (see :ref:`pointer types <pointertypes>`). | |
285 | |
286 TODO(jfb): this may change | |
287 | |
288 * ``trunc`` | 398 * ``trunc`` |
289 * ``zext`` | 399 * ``zext`` |
290 * ``sext`` | 400 * ``sext`` |
291 * ``fptrunc`` | 401 * ``fptrunc`` |
292 * ``fpext`` | 402 * ``fpext`` |
293 * ``fptoui`` | 403 * ``fptoui`` |
294 * ``fptosi`` | 404 * ``fptosi`` |
295 * ``uitofp`` | 405 * ``uitofp`` |
296 * ``sitofp`` | 406 * ``sitofp`` |
297 | 407 |
(...skipping 18 matching lines...)
316 * ``select`` | 426 * ``select`` |
317 * ``call`` | 427 * ``call`` |
318 | 428 |
319 Intrinsic Functions | 429 Intrinsic Functions |
320 =================== | 430 =================== |
321 | 431 |
322 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 432 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
323 | 433 |
324 The only intrinsics supported by PNaCl bitcode are the following. | 434 The only intrinsics supported by PNaCl bitcode are the following. |
325 | 435 |
326 TODO(jfb): atomics | |
327 | |
328 * ``llvm.memcpy`` | 436 * ``llvm.memcpy`` |
329 * ``llvm.memmove`` | 437 * ``llvm.memmove`` |
330 * ``llvm.memset`` | 438 * ``llvm.memset`` |
331 | 439 |
332 These intrinsics are only supported with an i32 ``len`` argument. | 440 These intrinsics are only supported with an i32 ``len`` argument. |
333 | 441 |
334 * ``llvm.bswap`` | 442 * ``llvm.bswap`` |
335 | 443 |
336 The overloaded ``llvm.bswap`` intrinsic is only supported with the following | 444 The overloaded ``llvm.bswap`` intrinsic is only supported with the following |
337 argument types: i16, i32, i64 (the types supported by C-style GCC builtins). | 445 argument types: i16, i32, i64 (the types supported by C-style GCC builtins). |
(...skipping 20 matching lines...)
358 TODO: describe | 466 TODO: describe |
359 | 467 |
360 * ``llvm.nacl.longjmp`` | 468 * ``llvm.nacl.longjmp`` |
361 | 469 |
362 TODO: describe | 470 TODO: describe |
363 | 471 |
364 * ``llvm.nacl.setjmp`` | 472 * ``llvm.nacl.setjmp`` |
365 | 473 |
366 TODO: describe | 474 TODO: describe |
367 | 475 |
| 476 .. _atomic intrinsics: |
| 477 |
| 478 * ``llvm.nacl.atomic.store`` |
| 479 * ``llvm.nacl.atomic.load`` |
| 480 * ``llvm.nacl.atomic.rmw`` |
| 481 * ``llvm.nacl.atomic.cmpxchg`` |
| 482 * ``llvm.nacl.atomic.fence`` |
| 483 |
| 484 .. code-block:: llvm |
| 485 |
| 486 declare iN @llvm.nacl.atomic.load.<size>( |
| 487 iN* <source>, i32 <memory_order>) |
| 488 declare void @llvm.nacl.atomic.store.<size>( |
| 489 iN <operand>, iN* <destination>, i32 <memory_order>) |
| 490 declare iN @llvm.nacl.atomic.rmw.<size>( |
| 491 i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>) |
| 492 declare iN @llvm.nacl.atomic.cmpxchg.<size>( |
| 493 iN* <object>, iN <expected>, iN <desired>, |
| 494 i32 <memory_order_success>, i32 <memory_order_failure>) |
| 495 declare void @llvm.nacl.atomic.fence(i32 <memory_order>) |
| 496 |
| 497 Each of these intrinsics except ``llvm.nacl.atomic.fence`` is overloaded |
| 498 on the ``iN`` argument, which is reflected through ``<size>`` in the |
| 499 overload's name. Integer types 8, 16, 32 and 64 bits wide are supported. |
| 500 |
| 501 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following |
| 502 read-modify-write operations, from the general and arithmetic sections |
| 503 of the C11/C++11 standards: |
| 504 |
| 505 - ``add`` |
| 506 - ``sub`` |
| 507 - ``or`` |
| 508 - ``and`` |
| 509 - ``xor`` |
| 510 - ``exchange`` |
| 511 |
| 512 For all of these read-modify-write operations, the returned value is |
| 513 that at ``object`` before the computation. The ``computation`` |
| 514 argument must be a compile-time constant. |
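The "returns the value before the computation" rule has the same shape as the C++11 ``fetch_*`` and ``exchange`` members, which front-end code would presumably use instead of the intrinsic. The sketch below exercises the six operations in the order listed above. ``rmw_previous_values`` is a hypothetical helper, and the mapping from these members to ``llvm.nacl.atomic.rmw`` is an assumption about the toolchain, not something this diff shows.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Applies each read-modify-write operation once and records the value that
// was stored in `object` *before* the operation took effect.
std::vector<std::uint32_t> rmw_previous_values() {
  std::atomic<std::uint32_t> object(0xF0);
  std::vector<std::uint32_t> previous;
  previous.push_back(object.fetch_add(0x0F));  // add:      0xF0 -> 0xFF
  previous.push_back(object.fetch_sub(0x0F));  // sub:      0xFF -> 0xF0
  previous.push_back(object.fetch_or(0x0F));   // or:       0xF0 -> 0xFF
  previous.push_back(object.fetch_and(0x3C));  // and:      0xFF -> 0x3C
  previous.push_back(object.fetch_xor(0xFF));  // xor:      0x3C -> 0xC3
  previous.push_back(object.exchange(0x00));   // exchange: 0xC3 -> 0x00
  previous.push_back(object.load());           // final stored value
  return previous;
}
```

All calls use the default ``std::memory_order_seq_cst``, matching the only ordering the bitcode currently keeps.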
| 515 |
| 516 All atomic intrinsics also support C11/C++11 memory orderings, which |
| 517 must be compile-time constants. Those are detailed in `Atomic Memory |
| 518 Ordering Constraints`_. |
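The two ordering arguments of the compare-and-exchange intrinsic correspond to the success and failure orderings of C++11's ``compare_exchange_strong``. The sketch below passes both as compile-time constants. ``cmpxchg_observed`` is a hypothetical wrapper, and it assumes the intrinsic returns the previously stored value the way LLVM's ``cmpxchg`` instruction does.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Attempts to replace `expected` with `desired` in `object` and returns the
// value that `object` held before the attempt, whether or not it succeeded.
std::int32_t cmpxchg_observed(std::atomic<std::int32_t>& object,
                              std::int32_t expected,
                              std::int32_t desired) {
  // On failure, compare_exchange_strong overwrites `expected` with the value
  // actually found; on success `expected` already equals that value.
  object.compare_exchange_strong(expected, desired,
                                 std::memory_order_seq_cst,   // on success
                                 std::memory_order_seq_cst);  // on failure
  return expected;
}
```

A caller can tell success from failure by comparing the returned value with the ``expected`` value it passed in.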
| 519 |
| 520 Integer values for these computations and memory orderings are defined |
| 521 in ``"llvm/IR/NaClAtomicIntrinsics.h"``. |
| 522 |
| 523 .. note:: |
| 524 |
| 525 These intrinsics allow PNaCl to support C11/C++11 style atomic |
| 526 operations as well as some legacy GCC-style ``__sync_*`` builtins |
| 527 while remaining stable as the LLVM codebase changes. The user |
| 528 isn't expected to use these intrinsics directly. |