Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 ============================== | 1 ============================== |
| 2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
| 3 ============================== | 3 ============================== |
| 4 | 4 |
| 5 .. contents:: | 5 .. contents:: |
| 6 :local: | 6 :local: |
| 7 :depth: 3 | 7 :depth: 3 |
| 8 | 8 |
| 9 Introduction | 9 Introduction |
| 10 ============ | 10 ============ |
| (...skipping 88 matching lines...) | |
| 99 | 99 |
| 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
| 101 | 101 |
| 102 PNaCl bitcode does not support inline assembly. | 102 PNaCl bitcode does not support inline assembly. |
| 103 | 103 |
| 104 Volatile Memory Accesses | 104 Volatile Memory Accesses |
| 105 ------------------------ | 105 ------------------------ |
| 106 | 106 |
| 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
| 108 | 108 |
| 109 TODO: are we going to promote volatile to atomic? | 109 PNaCl bitcode does not support volatile memory accesses. |
| 110 | |
| 111 .. note:: | |
| 112 | |
| 113 The C11/C++11 standards mandate that ``volatile`` accesses execute | |
| 114 in program order (but are not fences, so other memory operations can | |
| 115 reorder around them), are not necessarily atomic, and can’t be | |
| 116 elided or fused. | |
| 117 | |
| 118 The PNaCl toolchain applies regular LLVM optimizations along these | |
| 119 guidelines, and it further prevents any load/store (even | |
| 120 non-``volatile`` and non-atomic ones) from moving above or below a | |
| 121 ``volatile`` operation: such operations act as compiler barriers before | |
| 122 optimizations occur. The PNaCl toolchain freezes ``volatile`` | |
| 123 accesses after optimizations into atomic accesses with sequentially | |
| 124 consistent memory ordering. This eases the support of legacy | |
| 125 (i.e. non-C11/C++11) code, and combined with builtin fences these | |
| 126 programs can do meaningful cross-thread communication without | |
| 127 changing code. It also reflects the original code's intent and | |
| 128 guarantees better portability. | |
| 129 | |
| 130 Relaxed ordering could be used instead, but for the first release it | |
| 131 is more conservative to apply sequential consistency. Future | |
| 132 releases may change what happens at compile-time, but | |
| 133 already-released pexes will continue using sequential consistency. | |
| 134 | |
| 135 The PNaCl toolchain also requires that ``volatile`` accesses be at | |
| 136 least naturally aligned, and tries to guarantee this alignment. | |
| 110 | 137 |
| 111 Memory Model for Concurrent Operations | 138 Memory Model for Concurrent Operations |
| 112 -------------------------------------- | 139 -------------------------------------- |
| 113 | 140 |
| 114 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 141 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
| 115 | 142 |
| 116 TODO. | 143 The memory model offered by PNaCl relies on the same coding guidelines |
| 144 as the C11/C++11 one: concurrent accesses must always occur through | |
| 145 atomic primitives, and these accesses must always occur with the same | |
| 146 size for the same memory location. Visibility of stores is provided on a | |
| 147 happens-before basis that relates memory locations to each other as the | |
| 148 C11/C++11 standards do. | |
| 149 | |
| 150 PNaCl bitcode requires all concurrency to occur through `atomic | |
| 151 intrinsics`_. | |
| 152 | |
| 153 .. note:: | |
| 154 | |
| 155 As in C11/C++11 some atomic accesses may be implemented with locks | |
| 156 on certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always | |
| 157 be ``1``, signifying that all types are sometimes lock-free. The | |
| 158 ``is_lock_free`` methods will return the current platform's | |
| 159 implementation at runtime. | |
| 160 | |
| 161 The PNaCl toolchain supports concurrent memory accesses through | |
| 162 legacy GCC-style ``__sync_*`` builtins, as well as through C11/C++11 | |
| 163 atomic primitives. ``volatile`` memory accesses can also be used, | |
| 164 though these are discouraged, and aren't present in bitcode. | |
| 165 | |
| 166 Note that PNaCl explicitly supports concurrency through threading | |
| 167 and inter-process communication (shared memory), but doesn't support | |
|
Derek Schuff
2013/07/02 22:13:17
should probably remove the reference to shared mem
JF
2013/07/02 23:44:32
I clarified this entire section, please review aga
| |
| 168 interacting with device memory. Setting these up requires assistance | |
| 169 from the embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but | |
| 170 using them once set up can be done through regular C/C++ code. | |
| 171 | |
| 172 PNaCl also doesn't currently support signal handling, and therefore | |
| 173 promotes all primitives to cross-thread (instead of | |
| 174 single-thread). This may change at a later date. | |
| 175 | |
| 176 The PNaCl toolchain currently optimizes for memory ordering as LLVM | |
| 177 normally does, but at pexe creation time it promotes all | |
| 178 ``volatile`` accesses as well as all atomic accesses to be | |
| 179 sequentially consistent. Other memory orderings will be supported in | |
| 180 a future release, but pexes generated with the current toolchain will | |
|
Derek Schuff
2013/07/02 22:13:17
s/generate/generated
JF
2013/07/02 23:44:32
Done.
| |
| 181 continue functioning with sequential consistency. Using sequential | |
| 182 consistency provides a total ordering for all | |
| 183 sequentially-consistent operations on all addresses. | |
| 184 | |
| 185 This means that ``volatile`` and atomic memory accesses can only be | |
| 186 re-ordered in some limited way before the pexe is created, and will | |
| 187 act as fences for all memory accesses (even non-atomic and | |
| 188 non-``volatile``) after pexe creation. Non-atomic and | |
| 189 non-``volatile`` memory accesses may be reordered (unless a fence | |
| 190 intervenes), separated, elided or fused according to C and C++'s | |
| 191 memory model before the pexe is created as well as after its | |
| 192 creation. | |
| 117 | 193 |
| 118 Atomic Memory Ordering Constraints | 194 Atomic Memory Ordering Constraints |
| 119 ---------------------------------- | 195 ---------------------------------- |
| 120 | 196 |
| 121 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 197 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
| 122 | 198 |
| 123 TODO. | 199 PNaCl bitcode currently supports sequential consistency only, through |
| 200 its `atomic intrinsics`_. | |
| 201 | |
| 202 .. note:: | |
| 203 | |
| 204 Atomics follow the same ordering constraints as in regular LLVM, but | |
| 205 all accesses are promoted to sequential consistency (the strongest | |
| 206 memory ordering) at pexe creation time. As more C11/C++11 code | |
| 207 allows us to understand performance and portability needs we intend | |
| 208 to support the full gamut of C11/C++11 memory orderings: | |
| 209 | |
| 210 - Relaxed: no operation orders memory. | |
| 211 - Consume: a load operation performs a consume operation on the | |
| 212 affected memory location (currently unsupported by LLVM). | |
| 213 - Acquire: a load operation performs an acquire operation on the | |
| 214 affected memory location. | |
| 215 - Release: a store operation performs a release operation on the | |
| 216 affected memory location. | |
| 217 - Acquire-release: load and store operations perform acquire and | |
| 218 release operations on the affected memory. | |
| 219 - Sequentially consistent: same as acquire-release, but providing | |
| 220 a global total ordering for all affected locations. | |
| 221 | |
| 222 As in C11/C++11: | |
| 223 | |
| 224 - Atomic and volatile accesses must at least be naturally aligned. | |
| 225 - Some accesses may not actually be atomic on certain platforms, | |
| 226 requiring an implementation that uses a global lock. | |
| 227 - An atomic memory location must always be accessed with atomic | |
| 228 primitives, and these primitives must always be of the same bit | |
| 229 size for that location. | |
| 230 - Not all memory orderings are valid for all atomic operations. | |
| 124 | 231 |
| 125 Fast-Math Flags | 232 Fast-Math Flags |
| 126 --------------- | 233 --------------- |
| 127 | 234 |
| 128 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 235 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
| 129 | 236 |
| 130 Fast-math mode is not currently supported by the PNaCl bitcode. | 237 Fast-math mode is not currently supported by the PNaCl bitcode. |
| 131 | 238 |
| 132 Type System | 239 Type System |
| 133 =========== | 240 =========== |
| (...skipping 129 matching lines...) | |
| 263 | 370 |
| 264 .. code-block:: llvm | 371 .. code-block:: llvm |
| 265 | 372 |
| 266 %buf = alloca i8, i32 8, align 4 | 373 %buf = alloca i8, i32 8, align 4 |
| 267 | 374 |
| 268 * ``load``, ``store`` | 375 * ``load``, ``store`` |
| 269 | 376 |
| 270 The pointer argument of these instructions must be a *normalized* pointer | 377 The pointer argument of these instructions must be a *normalized* pointer |
| 271 (see :ref:`pointer types <pointertypes>`). | 378 (see :ref:`pointer types <pointertypes>`). |
| 272 | 379 |
| 273 * ``fence`` | |
| 274 * ``cmpxchg``, ``atomicrmw`` | |
| 275 | |
| 276 The pointer argument of these instructions must be a *normalized* pointer | |
| 277 (see :ref:`pointer types <pointertypes>`). | |
| 278 | |
| 279 TODO(jfb): this may change | |
| 280 | |
| 281 * ``trunc`` | 380 * ``trunc`` |
| 282 * ``zext`` | 381 * ``zext`` |
| 283 * ``sext`` | 382 * ``sext`` |
| 284 * ``fptrunc`` | 383 * ``fptrunc`` |
| 285 * ``fpext`` | 384 * ``fpext`` |
| 286 * ``fptoui`` | 385 * ``fptoui`` |
| 287 * ``fptosi`` | 386 * ``fptosi`` |
| 288 * ``uitofp`` | 387 * ``uitofp`` |
| 289 * ``sitofp`` | 388 * ``sitofp`` |
| 290 | 389 |
| (...skipping 18 matching lines...) | |
| 309 * ``select`` | 408 * ``select`` |
| 310 * ``call`` | 409 * ``call`` |
| 311 | 410 |
| 312 Intrinsic Functions | 411 Intrinsic Functions |
| 313 =================== | 412 =================== |
| 314 | 413 |
| 315 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 414 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
| 316 | 415 |
| 317 The only intrinsics supported by PNaCl bitcode are the following. | 416 The only intrinsics supported by PNaCl bitcode are the following. |
| 318 | 417 |
| 319 TODO(jfb): atomics | |
| 320 | |
| 321 * ``llvm.memcpy`` | 418 * ``llvm.memcpy`` |
| 322 * ``llvm.memmove`` | 419 * ``llvm.memmove`` |
| 323 * ``llvm.memset`` | 420 * ``llvm.memset`` |
| 324 * ``llvm.bswap`` | 421 * ``llvm.bswap`` |
| 325 | 422 |
| 326 The llvm.bswap intrinsic is only supported with the following argument types: | 423 The llvm.bswap intrinsic is only supported with the following argument types: |
| 327 i16, i32, i64. | 424 i16, i32, i64. |
| 328 | 425 |
| 329 * ``llvm.ctlz`` | 426 * ``llvm.ctlz`` |
| 330 * ``llvm.cttz`` | 427 * ``llvm.cttz`` |
| 331 * ``llvm.ctpop`` | 428 * ``llvm.ctpop`` |
| 332 | 429 |
| 333 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support | 430 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support |
| 334 i32 and i64 argument types (the types supported by C-style GCC builtins). | 431 i32 and i64 argument types (the types supported by C-style GCC builtins). |
| 335 | 432 |
| 336 * ``llvm.trap`` | 433 * ``llvm.trap`` |
| 337 * ``llvm.nacl.read.tp`` | 434 * ``llvm.nacl.read.tp`` |
| 338 | 435 |
| 339 TODO: describe | 436 TODO: describe |
| 340 | 437 |
| 341 * ``llvm.nacl.longjmp`` | 438 * ``llvm.nacl.longjmp`` |
| 342 | 439 |
| 343 TODO: describe | 440 TODO: describe |
| 344 | 441 |
| 345 * ``llvm.nacl.setjmp`` | 442 * ``llvm.nacl.setjmp`` |
| 346 | 443 |
| 347 TODO: describe | 444 TODO: describe |
| 348 | 445 |
| 446 .. _atomic intrinsics: | |
| 447 | |
| 448 * ``llvm.nacl.atomic.store`` | |
| 449 * ``llvm.nacl.atomic.load`` | |
| 450 * ``llvm.nacl.atomic.rmw`` | |
| 451 * ``llvm.nacl.atomic.cmpxchg`` | |
| 452 * ``llvm.nacl.atomic.fence`` | |
| 453 | |
| 454 .. code-block:: llvm | |
| 455 | |
| 456 declare iN @llvm.nacl.atomic.load( | |
| 457 iN* <source>, i32 <memory_order>) | |
| 458 declare void @llvm.nacl.atomic.store( | |
| 459 iN <operand>, iN* <destination>, i32 <memory_order>) | |
| 460 declare iN @llvm.nacl.atomic.rmw( | |
| 461 i32 <computation>, iN* <object>, iN <operand>, i32 <memory_order>) | |
| 462 declare iN @llvm.nacl.atomic.cmpxchg( | |
| 463 iN* <object>, iN <expected>, iN <desired>, | |
| 464 i32 <memory_order_success>, i32 <memory_order_failure>) | |
| 465 declare void @llvm.nacl.atomic.fence(i32 <memory_order>) | |
| 466 | |
| 467 Each of these intrinsics is overloaded on the ``iN`` | |
| 468 argument. Integral types of 8, 16, 32 and 64-bit width are supported | |
| 469 for these ``iN`` arguments. | |
| 470 | |
| 471 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following | |
| 472 read-modify-write operations, from the general and arithmetic sections | |
| 473 of the C11/C++11 standards: | |
| 474 | |
| 475 - ``add`` | |
| 476 - ``sub`` | |
| 477 - ``or`` | |
| 478 - ``and`` | |
| 479 - ``xor`` | |
| 480 - ``exchange`` | |
| 481 | |
| 482 For all of these read-modify-write operations, the returned value is | |
| 483 that at ``object`` before the computation. The ``computation`` | |
| 484 argument must be a compile-time constant. | |
| 485 | |
| 486 All atomic intrinsics also support C11/C++11 memory orderings, which | |
| 487 must be compile-time constants. Those are detailed in `Atomic Memory | |
| 488 Ordering Constraints`_. | |
| 489 | |
| 490 Integer values for these computations and memory orderings are defined | |
| 491 in ``"llvm/IR/NaClIntrinsics.h"``. | |
| 492 | |
| 493 .. note:: | |
| 494 | |
| 495 These intrinsics allow PNaCl to support C11/C++11 style atomic | |
| 496 operations as well as some legacy GCC-style ``__sync_*`` builtins | |
| 497 while remaining stable as the LLVM codebase changes. The user | |
| 498 isn't expected to use these intrinsics directly. | |
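The patch states that the read-modify-write operations return the value at ``object`` before the computation, and that compare-and-exchange takes separate success and failure memory orderings. The same behavior can be observed at the source level with C++11 ``std::atomic``. The following is a minimal sketch, assuming an ordinary C++11 host compiler rather than the PNaCl toolchain. The helper names ``fetch_add_old`` and ``compare_exchange_observed`` are hypothetical and appear nowhere in the patch, and how the toolchain lowers these calls to the intrinsics is not shown.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of PNaCl): atomically adds `operand` and
// returns the value that was stored *before* the addition.
inline int32_t fetch_add_old(std::atomic<int32_t>& object, int32_t operand) {
  // Sequential consistency is the only ordering the patch says pexes keep.
  return object.fetch_add(operand, std::memory_order_seq_cst);
}

// Hypothetical helper: tries to replace `expected` with `desired` and
// returns the value actually observed at `object`.
inline int32_t compare_exchange_observed(std::atomic<int32_t>& object,
                                         int32_t expected, int32_t desired) {
  // C++11 takes one ordering for success and one for failure, like the two
  // memory_order arguments in the cmpxchg declaration in the patch.
  object.compare_exchange_strong(expected, desired,
                                 std::memory_order_seq_cst,
                                 std::memory_order_seq_cst);
  // On failure `expected` was overwritten with the observed value; on
  // success it is unchanged and equals the previous value.
  return expected;
}
```

A caller that starts with a counter holding 5 gets 5 back from ``fetch_add_old(counter, 3)`` and finds 8 stored afterwards. A compare-and-exchange whose expected value does not match leaves the stored value untouched.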
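The note in the memory model section says that the ``ATOMIC_*_LOCK_FREE`` macros are always ``1`` under PNaCl, while ``is_lock_free`` reports the actual platform's behavior at runtime. As a rough illustration of how source code might query both, here is a sketch assuming a standard C++11 host compiler, where the macro is commonly ``2`` rather than ``1``. The helper names ``describe_lock_free`` and ``int_is_lock_free_now`` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: maps a C11/C++11 ATOMIC_*_LOCK_FREE value to the
// meaning the standards assign to it.
inline const char* describe_lock_free(int macro_value) {
  switch (macro_value) {
    case 0: return "never lock-free";
    case 1: return "sometimes lock-free";
    case 2: return "always lock-free";
    default: return "unknown";
  }
}

// Hypothetical helper: asks the runtime whether a 32-bit atomic is
// lock-free on the platform the code is actually running on.
inline bool int_is_lock_free_now() {
  std::atomic<int32_t> probe(0);
  return probe.is_lock_free();
}
```

Portable code would typically branch on ``int_is_lock_free_now()`` rather than on the compile-time macro, since a value of ``1`` only promises that the type is sometimes lock-free.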