Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 ============================== | 1 ============================== |
| 2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
| 3 ============================== | 3 ============================== |
| 4 | 4 |
| 5 .. contents:: | 5 .. contents:: |
| 6 :local: | 6 :local: |
| 7 :depth: 3 | 7 :depth: 3 |
| 8 | 8 |
| 9 Introduction | 9 Introduction |
| 10 ============ | 10 ============ |
| (...skipping 88 matching lines...) | |
| 99 | 99 |
| 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
| 101 | 101 |
| 102 PNaCl bitcode does not support inline assembly. | 102 PNaCl bitcode does not support inline assembly. |
| 103 | 103 |
| 104 Volatile Memory Accesses | 104 Volatile Memory Accesses |
| 105 ------------------------ | 105 ------------------------ |
| 106 | 106 |
| 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
| 108 | 108 |
| 109 TODO: are we going to promote volatile to atomic? | 109 The C and C++ standards mandate that ``volatile`` accesses execute in |
|
eliben
2013/06/27 19:32:24
The C11 and C++11 standards...
JF
2013/06/27 21:03:17
These restrictions are actually all in the previou
| |
| 110 program order (but are not fences, so other memory operations can | |
| 111 reorder around them), are not necessarily atomic, and can’t be elided or | |
| 112 fused. | |
| 113 | |
| 114 The PNaCl toolchain applies regular LLVM optimizations along these | |
| 115 guidelines, but prevents any load/store (even non-``volatile`` and | |
| 116 non-atomic ones) from moving past a volatile operations: they act as | |
|
eliben
2013/06/27 19:32:24
"past volatile operations"?
JF
2013/06/27 21:03:17
I mean: a regular load/store can't move above or b
| |
| 117 compiler barriers before optimizations occur. The PNaCl toolchain | |
| 118 freezes ``volatile`` accesses after optimizations into atomic accesses | |
| 119 with sequential consistency memory ordering. This eases the support of | |
|
eliben
2013/06/27 19:32:24
sequentially consistent memory ordering?
JF
2013/06/27 21:03:17
Done.
| |
| 120 legacy (i.e. non-C11/C++11) code, and combined with builtin fences these | |
| 121 programs can do meaningful cross-thread communication without changing | |
| 122 code. It also reflects the original code's intent and guarantees better | |
| 123 portability. | |
| 124 | |
| 125 Relaxed ordering could be used instead, but for the first release it is | |
| 126 more conservative to apply sequential consistency. Future releases may | |
| 127 change what happens at compile-time, but already-released pexes will | |
| 128 continue using sequential consistency. | |
| 129 | |
| 130 The PNaCl toolchain also requires that ``volatile`` accesses be at least | |
| 131 naturally aligned, and tries to guarantee this alignment. | |
| 110 | 132 |
| 111 Memory Model for Concurrent Operations | 133 Memory Model for Concurrent Operations |
| 112 -------------------------------------- | 134 -------------------------------------- |
| 113 | 135 |
| 114 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 136 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
| 115 | 137 |
| 116 TODO. | 138 The memory model offered by PNaCl relies the same coding guidelines as |
|
eliben
2013/06/27 19:32:24
"relies on the same" ?
JF
2013/06/27 21:03:17
Done.
| |
| 139 the C11/C++11 one: concurrent accesses must always occur through atomic | |
| 140 primitives, and these accesses must always occur with the same size for | |
| 141 the same memory locations. Visibility of stores is provided on a | |
| 142 happens-before basis that relates memory locations to each other as the | |
| 143 C11/C++11 standards do. | |
| 144 | |
| 145 As in C11/C++11, some atomic accesses may be implemented with locks on | |
| 146 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be | |
| 147 ``1``, signifying that all types are sometimes lock-free. The | |
| 148 ``is_lock_free`` methods will return the current platform's | |
| 149 implementation at runtime. | |
| 150 | |
| 151 The PNaCl toolchain supports concurrent memory accesses through legacy | |
| 152 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic | |
| 153 primitives. ``volatile`` memory accesses can also be used, though these | |
| 154 are discouraged. | |
| 155 | |
| 156 Note that PNaCl explicitly supports concurrency through threading and | |
| 157 inter-process communication (shared memory), but doesn't support | |
| 158 interacting with device memory. Setting these up requires assistance from | |
| 159 the embedding sandbox's runtime (e.g. NaCl's Pepper APIs), but using | |
| 160 them once set up can be done through regular C/C++ code. | |
| 161 | |
| 162 PNaCl also doesn't currently support signal handling, and therefore | |
| 163 promotes all primitives to cross-thread (instead of single-thread). This | |
| 164 may change at a later date. | |
| 165 | |
| 166 The PNaCl toolchain currently optimizes for memory ordering as LLVM | |
| 167 normally does, but at pexe creation time it promotes all ``volatile`` | |
| 168 accesses as well as all atomic accesses to be sequentially | |
| 169 consistent. Other memory orderings will be supported in a future | |
| 170 release, but pexes generated with the current toolchain will continue | |
| 171 functioning with sequential consistency. Using sequential consistency | |
| 172 provides a total ordering for all sequentially-consistent operations on | |
| 173 all addresses. | |
| 174 | |
| 175 This means that ``volatile`` and atomic memory accesses can only be | |
| 176 re-ordered in some limited way before the pexe is created, and will act | |
| 177 as fences for all memory accesses (even non-atomic and non-``volatile``) | |
| 178 after pexe creation. Non-atomic and non-``volatile`` memory accesses may | |
| 179 be reordered (unless a fence intervenes), separated, elided or fused | |
| 180 according to C and C++'s memory model before the pexe is created as well | |
| 181 as after its creation. | |
| 117 | 182 |
| 118 Atomic Memory Ordering Constraints | 183 Atomic Memory Ordering Constraints |
| 119 ---------------------------------- | 184 ---------------------------------- |
| 120 | 185 |
| 121 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 186 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
| 122 | 187 |
| 123 TODO. | 188 Atomics follow the same ordering constraints as in regular LLVM, but all |
| 189 accesses are promoted to sequential consistency (the strongest memory | |
| 190 ordering) at pexe creation time. We may relax these rules and honor the | |
| 191 program's memory ordering constraints as more C11/C++11 code allows us | |
| 192 to understand performance and portability needs. | |
| 193 | |
| 194 As in C11/C++11: | |
| 195 | |
| 196 - Atomic and volatile accesses must at least be naturally aligned. | |
| 197 - Some accesses may not actually be atomic on certain platforms, | |
| 198 requiring an implementation that uses a global lock. | |
| 199 - An atomic memory location must always be accesses with atomic | |
|
eliben
2013/06/27 19:32:24
accessed
JF
2013/06/27 21:03:17
Done.
| |
| 200 primitives, and these primitives must always be of the same type for | |
| 201 that location. | |
| 124 | 202 |
| 125 Fast-Math Flags | 203 Fast-Math Flags |
| 126 --------------- | 204 --------------- |
| 127 | 205 |
| 128 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 206 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
| 129 | 207 |
| 130 Fast-math mode is not currently supported by the PNaCl bitcode. | 208 Fast-math mode is not currently supported by the PNaCl bitcode. |
| 131 | 209 |
| 132 Type System | 210 Type System |
| 133 =========== | 211 =========== |
| (...skipping 129 matching lines...) | |
| 263 | 341 |
| 264 .. code-block:: llvm | 342 .. code-block:: llvm |
| 265 | 343 |
| 266 %buf = alloca i8, i32 8, align 4 | 344 %buf = alloca i8, i32 8, align 4 |
| 267 | 345 |
| 268 * ``load``, ``store`` | 346 * ``load``, ``store`` |
| 269 | 347 |
| 270 The pointer argument of these instructions must be a *normalized* pointer | 348 The pointer argument of these instructions must be a *normalized* pointer |
| 271 (see :ref:`pointer types <pointertypes>`). | 349 (see :ref:`pointer types <pointertypes>`). |
| 272 | 350 |
| 273 * ``fence`` | |
| 274 * ``cmpxchg``, ``atomicrmw`` | |
| 275 | |
| 276 The pointer argument of these instructions must be a *normalized* pointer | |
| 277 (see :ref:`pointer types <pointertypes>`). | |
| 278 | |
| 279 TODO(jfb): this may change | |
| 280 | |
| 281 * ``trunc`` | 351 * ``trunc`` |
| 282 * ``zext`` | 352 * ``zext`` |
| 283 * ``sext`` | 353 * ``sext`` |
| 284 * ``fptrunc`` | 354 * ``fptrunc`` |
| 285 * ``fpext`` | 355 * ``fpext`` |
| 286 * ``fptoui`` | 356 * ``fptoui`` |
| 287 * ``fptosi`` | 357 * ``fptosi`` |
| 288 * ``uitofp`` | 358 * ``uitofp`` |
| 289 * ``sitofp`` | 359 * ``sitofp`` |
| 290 | 360 |
| (...skipping 18 matching lines...) | |
| 309 * ``select`` | 379 * ``select`` |
| 310 * ``call`` | 380 * ``call`` |
| 311 | 381 |
| 312 Intrinsic Functions | 382 Intrinsic Functions |
| 313 =================== | 383 =================== |
| 314 | 384 |
| 315 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 385 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
| 316 | 386 |
| 317 The only intrinsics supported by PNaCl bitcode are the following. | 387 The only intrinsics supported by PNaCl bitcode are the following. |
| 318 | 388 |
| 319 TODO(jfb): atomics | |
| 320 | |
| 321 * ``llvm.memcpy`` | 389 * ``llvm.memcpy`` |
| 322 * ``llvm.memmove`` | 390 * ``llvm.memmove`` |
| 323 * ``llvm.memset`` | 391 * ``llvm.memset`` |
| 324 * ``llvm.bswap`` | 392 * ``llvm.bswap`` |
| 325 | 393 |
| 326 The llvm.bswap intrinsic is only supported with the following argument types: | 394 The llvm.bswap intrinsic is only supported with the following argument types: |
| 327 i16, i32, i64. | 395 i16, i32, i64. |
| 328 | 396 |
| 329 * ``llvm.ctlz`` | 397 * ``llvm.ctlz`` |
| 330 * ``llvm.cttz`` | 398 * ``llvm.cttz`` |
| 331 * ``llvm.ctpop`` | 399 * ``llvm.ctpop`` |
| 332 | 400 |
| 333 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support | 401 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support |
| 334 i32 and i64 argument types (the types supported by C-style GCC builtins). | 402 i32 and i64 argument types (the types supported by C-style GCC builtins). |
| 335 | 403 |
| 336 * ``llvm.trap`` | 404 * ``llvm.trap`` |
| 337 * ``llvm.nacl.read.tp`` | 405 * ``llvm.nacl.read.tp`` |
| 338 | 406 |
| 339 TODO: describe | 407 TODO: describe |
| 340 | 408 |
| 341 * ``llvm.nacl.longjmp`` | 409 * ``llvm.nacl.longjmp`` |
| 342 | 410 |
| 343 TODO: describe | 411 TODO: describe |
| 344 | 412 |
| 345 * ``llvm.nacl.setjmp`` | 413 * ``llvm.nacl.setjmp`` |
| 346 | 414 |
| 347 TODO: describe | 415 TODO: describe |
| 348 | 416 |
| 417 * ``llvm.nacl.atomic.store`` | |
| 418 * ``llvm.nacl.atomic.load`` | |
| 419 * ``llvm.nacl.atomic.rmw`` | |
| 420 * ``llvm.nacl.atomic.cmpxchg`` | |
| 421 * ``llvm.nacl.atomic.fence`` | |
| 422 | |
| 423 These intrinsics allow PNaCl to support C11/C++11 style atomic | |
| 424 operations as well as some legacy GCC-style ``__sync_*`` builtins | |
| 425 while remaining stable as the LLVM codebase changes. The user isn't | |
| 426 expected to use these intrinsics directly. | |
| 427 | |
| 428 :: | |
| 429 | |
| 430 declare void @llvm.nacl.atomic.store( | |
| 431 iN* <dest>, iN <val>, i32 <memory_order>) | |
| 432 declare iN @llvm.nacl.atomic.load( | |
| 433 iN* <src>, i32 <memory_order>) | |
| 434 declare iN @llvm.nacl.atomic.rmw( | |
| 435 i32 <op>, iN* <loc>, iN <val>, i32 <memory_order>) | |
| 436 declare iN @llvm.nacl.atomic.compare_exchange( | |
|
eliben
2013/06/27 19:32:24
name mismatch with above
JF
2013/06/27 21:03:17
Done. I went with the C11/C++11 naming and forgot
| |
| 437 iN* <loc>, iN <expected>, iN <desired>, | |
| 438 i32 <memory_order_success>, i32 <memory_order_failure>) | |
| 439 declare void @llvm.nacl.atomic.fence(i32 <memory_order>) | |
| 440 | |
| 441 Each of these intrinsic is overloaded on the ``iN`` argument. Integral | |
|
eliben
2013/06/27 19:32:24
intrinsics
JF
2013/06/27 21:03:17
Done.
| |
| 442 types of 8, 16, 32 and 64-bit width are currently supported for these | |
| 443 ``iN`` arguments. | |
| 444 | |
| 445 The ``@llvm.nacl.atomic.rmw`` intrinsic implements the following | |
| 446 read-modify-write operations, from the general and arithmetic sections | |
| 447 of the C11/C++11 standards: | |
| 448 | |
| 449 - ``add`` | |
| 450 - ``sub`` | |
| 451 - ``or`` | |
| 452 - ``and`` | |
| 453 - ``xor`` | |
| 454 - ``exchange`` | |
| 455 | |
| 456 For all of these read-modify-write operations, the returned value is | |
| 457 that at ``loc`` before the operation. The ``op`` argument must be a | |
| 458 compile-time constant. | |
| 459 | |
| 460 All atomic intrinsics also support C11/C++11 memory orderings, which | |
| 461 must be compile-time constants: | |
| 462 | |
| 463 - Relaxed: no operation orders memory. | |
| 464 - Consume: a load operation performs a consume operation on the | |
| 465 affected memory location (currently unsupported by LLVM). | |
| 466 - Acquire: a load operation performs an acquire operation on the | |
| 467 affected memory location. | |
| 468 - Release: a store operation performs a release operation on the | |
| 469 affected memory location. | |
| 470 - Acquire-release: load and store operations perform acquire and | |
| 471 release operations on the affected memory. | |
| 472 - Sequentially consistent: same as acquire-release, but providing a | |
| 473 global total ordering for all affected locations. | |
| 474 | |
| 475 Note that PNaCl currently strengthens all memory ordering | |
| 476 specifications to sequential consistency, the strongest form of memory | |
| 477 ordering. | |
| 478 | |
| 479 Values for these operations and memory orderings are defined in | |
| 480 llvm/IR/NaClIntrinsics.h. | |
| OLD | NEW |
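
The read-modify-write semantics described in the new text (the returned value is the one at ``loc`` before the operation, and every ordering is currently strengthened to sequential consistency) can be modelled with standard C++11 atomics. The following is only a sketch under stated assumptions: ``RmwOp`` and ``rmw_model`` are hypothetical names invented for illustration, the enum values are not the ones defined in llvm/IR/NaClIntrinsics.h, and the code does not call the ``llvm.nacl.atomic.*`` intrinsics, which users are not expected to use directly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical operation codes, for illustration only. The real values are
// defined in llvm/IR/NaClIntrinsics.h and are not reproduced here.
enum class RmwOp { Add, Sub, Or, And, Xor, Exchange };

// Models the documented behaviour of a read-modify-write operation:
// apply `op` to `loc` with sequentially consistent ordering and return
// the value that was stored at `loc` before the operation.
inline std::uint32_t rmw_model(RmwOp op, std::atomic<std::uint32_t>& loc,
                               std::uint32_t val) {
  // PNaCl currently strengthens every ordering to sequential consistency,
  // so the model uses seq_cst unconditionally.
  constexpr auto order = std::memory_order_seq_cst;
  switch (op) {
    case RmwOp::Add:      return loc.fetch_add(val, order);
    case RmwOp::Sub:      return loc.fetch_sub(val, order);
    case RmwOp::Or:       return loc.fetch_or(val, order);
    case RmwOp::And:      return loc.fetch_and(val, order);
    case RmwOp::Xor:      return loc.fetch_xor(val, order);
    case RmwOp::Exchange: return loc.exchange(val, order);
  }
  return 0;  // Not reached for valid RmwOp values.
}
```

A caller would write, for example, ``rmw_model(RmwOp::Add, counter, 1)`` and receive the counter's previous value. In ordinary application code the C11/C++11 atomic primitives or the ``__sync_*`` builtins would be used directly, and the toolchain would lower them.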