Chromium Code Reviews

| OLD | NEW |
|---|---|
| 1 ============================== | 1 ============================== |
| 2 PNaCl Bitcode Reference Manual | 2 PNaCl Bitcode Reference Manual |
| 3 ============================== | 3 ============================== |
| 4 | 4 |
| 5 .. contents:: | 5 .. contents:: |
| 6 :local: | 6 :local: |
| 7 :depth: 3 | 7 :depth: 3 |
| 8 | 8 |
| 9 Introduction | 9 Introduction |
| 10 ============ | 10 ============ |
| (...skipping 88 matching lines...) | |
| 99 | 99 |
| 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ | 100 `LLVM LangRef: Module-Level Inline Assembly <LangRef.html#moduleasm>`_ |
| 101 | 101 |
| 102 PNaCl bitcode does not support inline assembly. | 102 PNaCl bitcode does not support inline assembly. |
| 103 | 103 |
| 104 Volatile Memory Accesses | 104 Volatile Memory Accesses |
| 105 ------------------------ | 105 ------------------------ |
| 106 | 106 |
| 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ | 107 `LLVM LangRef: Volatile Memory Accesses <LangRef.html#volatile>`_ |
| 108 | 108 |
| 109 TODO: are we going to promote volatile to atomic? | 109 We recommend that C11/C++11 atomics be used instead of ``volatile``. |
| 110 | |
| 111 The C and C++ standards mandate that ``volatile`` accesses execute in | |
| 112 program order (but are not fences, so other memory operations can | |
| 113 reorder around them), are not necessarily atomic, and can’t be elided or | |
| 114 fused. | |
| 115 | |
| 116 The PNaCl toolchain applies regular LLVM optimizations along these | |
| 117 guidelines, and the PNaCl toolchain then freezes ``volatile`` accesses | |
| 118 into atomic accesses with sequential consistency memory ordering. This | |
| 119 eases the support of legacy (i.e. non-C11/C++11) code, and combined with | |
| 120 builtin fences these programs can do meaningful cross-thread | |
| 121 communication without changing code. | |
| 122 | |
| 123 Relaxed ordering could be used instead, but for the first release it is | |
| 124 more conservative to apply sequential consistency. Future releases may | |
| 125 change what happens at compile-time, but already-release pexes will | |
| Derek Schuff (2013/06/26 17:03:29): release->released | |
| JF (2013/06/26 23:41:12): Done. | |
| 126 continue using sequential consistency. | |
| 127 | |
| 128 The PNaCl toolchain also tries to guarantee natural alignment of | |
| 129 ``volatile`` accesses, a requirement for atomicity on some platforms. | |
| 110 | 130 |
| 111 Memory Model for Concurrent Operations | 131 Memory Model for Concurrent Operations |
| 112 -------------------------------------- | 132 -------------------------------------- |
| 113 | 133 |
| 114 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ | 134 `LLVM LangRef: Memory Model for Concurrent Operations <LangRef.html#memmodel>`_ |
| 115 | 135 |
| 116 TODO. | 136 The PNaCl toolchain currently supports concurrent memory accesses |
| 137 through legacy GCC-style ``__sync_*`` builtins, as well as through | |
| 138 C11/C++11 atomic primitives. ``volatile`` memory accesses can also be | |
| 139 used, though these are discouraged. | |
| 140 | |
| 141 Note that PNaCl explicitly supports concurrency through threading, but | |
| 142 doesn't support interacting with device memory, nor does it attempt to | |
| 143 support cross-program communication, including through shared | |
| 144 memory. These concerns are left up to the embedding sandbox's runtime | |
| 145 (e.g. NaCl's Pepper APIs). | |
| 146 | |
| 147 PNaCl also doesn't currently support signal handling, and therefore | |
| 148 promotes all primitives to cross-thread (instead of single-thread). This | |
| 149 may change at a later date. | |
| 150 | |
| 151 The PNaCl toolchain currently optimizes for memory ordering as LLVM | |
| 152 normally does, but at pexe creation time it promotes all ``volatile`` | |
| 153 accesses as well as all atomic accesses to be sequentially consistent. | |
| 154 | |
| 155 This means that ``volatile`` and atomic memory accesses can only be | |
| 156 re-ordered before the pexe is created, and will act as fences for all | |
| 157 memory accesses (even non-atomic and non-``volatile``) after pexe | |
| 158 creation. Non-atomic and non-``volatile`` memory accesses may be | |
| 159 reordered (unless a fence intervenes), separate, elided or fused | |
| Derek Schuff (2013/06/26 17:03:29): separate->separated? | |
| JF (2013/06/26 23:41:12): Done. | |
| 160 according to C and C++'s memory model before the pexe is created as well | |
| 161 as after its creation. | |
| 117 | 162 |
| 118 Atomic Memory Ordering Constraints | 163 Atomic Memory Ordering Constraints |
| 119 ---------------------------------- | 164 ---------------------------------- |
| 120 | 165 |
| 121 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ | 166 `LLVM LangRef: Atomic Memory Ordering Constraints <LangRef.html#ordering>`_ |
| 122 | 167 |
| 123 TODO. | 168 Atomics follow the same ordering constraints as in regular LLVM, but all |
| 169 accesses are promoted to sequential consistency (the strongest memory | |
| 170 ordering) at pexe creation time. We may relax these rules and honor the | |
| 171 program's memory ordering constraints as more C11/C++11 code allows us | |
| 172 to understand performance and portability needs. | |
| 173 | |
| 174 As in C11/C++11: | |
| 175 | |
| 176 - Atomic accesses must at least be naturally aligned. | |
| 177 - Some accesses may not actually be atomic on certain platforms, | |
| 178 requiring an implementation that uses a global lock. | |
| 124 | 179 |
| 125 Fast-Math Flags | 180 Fast-Math Flags |
| 126 --------------- | 181 --------------- |
| 127 | 182 |
| 128 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ | 183 `LLVM LangRef: Fast-Math Flags <LangRef.html#fastmath>`_ |
| 129 | 184 |
| 130 Fast-math mode is not currently supported by the PNaCl bitcode. | 185 Fast-math mode is not currently supported by the PNaCl bitcode. |
| 131 | 186 |
| 132 Type System | 187 Type System |
| 133 =========== | 188 =========== |
| (...skipping 129 matching lines...) | |
| 263 | 318 |
| 264 .. code-block:: llvm | 319 .. code-block:: llvm |
| 265 | 320 |
| 266 %buf = alloca i8, i32 8, align 4 | 321 %buf = alloca i8, i32 8, align 4 |
| 267 | 322 |
| 268 * ``load``, ``store`` | 323 * ``load``, ``store`` |
| 269 | 324 |
| 270 The pointer argument of these instructions must be a *normalized* pointer | 325 The pointer argument of these instructions must be a *normalized* pointer |
| 271 (see :ref:`pointer types <pointertypes>`). | 326 (see :ref:`pointer types <pointertypes>`). |
| 272 | 327 |
| 273 * ``fence`` | |
| 274 * ``cmpxchg``, ``atomicrmw`` | |
| 275 | |
| 276 The pointer argument of these instructions must be a *normalized* pointer | |
| 277 (see :ref:`pointer types <pointertypes>`). | |
| 278 | |
| 279 TODO(jfb): this may change | |
| 280 | |
| 281 * ``trunc`` | 328 * ``trunc`` |
| 282 * ``zext`` | 329 * ``zext`` |
| 283 * ``sext`` | 330 * ``sext`` |
| 284 * ``fptrunc`` | 331 * ``fptrunc`` |
| 285 * ``fpext`` | 332 * ``fpext`` |
| 286 * ``fptoui`` | 333 * ``fptoui`` |
| 287 * ``fptosi`` | 334 * ``fptosi`` |
| 288 * ``uitofp`` | 335 * ``uitofp`` |
| 289 * ``sitofp`` | 336 * ``sitofp`` |
| 290 | 337 |
| (...skipping 18 matching lines...) | |
| 309 * ``select`` | 356 * ``select`` |
| 310 * ``call`` | 357 * ``call`` |
| 311 | 358 |
| 312 Intrinsic Functions | 359 Intrinsic Functions |
| 313 =================== | 360 =================== |
| 314 | 361 |
| 315 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ | 362 `LLVM LangRef: Intrinsic Functions <LangRef.html#intrinsics>`_ |
| 316 | 363 |
| 317 The only intrinsics supported by PNaCl bitcode are the following. | 364 The only intrinsics supported by PNaCl bitcode are the following. |
| 318 | 365 |
| 319 TODO(jfb): atomics | |
| 320 | |
| 321 * ``llvm.memcpy`` | 366 * ``llvm.memcpy`` |
| 322 * ``llvm.memmove`` | 367 * ``llvm.memmove`` |
| 323 * ``llvm.memset`` | 368 * ``llvm.memset`` |
| 324 * ``llvm.bswap`` | 369 * ``llvm.bswap`` |
| 325 | 370 |
| 326 The llvm.bswap intrinsic is only supported with the following argument types: | 371 The llvm.bswap intrinsic is only supported with the following argument types: |
| 327 i16, i32, i64. | 372 i16, i32, i64. |
| 328 | 373 |
| 329 * ``llvm.ctlz`` | 374 * ``llvm.ctlz`` |
| 330 * ``llvm.cttz`` | 375 * ``llvm.cttz`` |
| 331 * ``llvm.ctpop`` | 376 * ``llvm.ctpop`` |
| 332 | 377 |
| 333 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support | 378 The llvm.ctlz, llvm.cttz, and llvm.ctpop intrinsics only support |
| 334 i32 and i64 argument types (the types supported by C-style GCC builtins). | 379 i32 and i64 argument types (the types supported by C-style GCC builtins). |
| 335 | 380 |
| 336 * ``llvm.trap`` | 381 * ``llvm.trap`` |
| 337 * ``llvm.nacl.read.tp`` | 382 * ``llvm.nacl.read.tp`` |
| 338 | 383 |
| 339 TODO: describe | 384 TODO: describe |
| 340 | 385 |
| 341 * ``llvm.nacl.longjmp`` | 386 * ``llvm.nacl.longjmp`` |
| 342 | 387 |
| 343 TODO: describe | 388 TODO: describe |
| 344 | 389 |
| 345 * ``llvm.nacl.setjmp`` | 390 * ``llvm.nacl.setjmp`` |
| 346 | 391 |
| 347 TODO: describe | 392 TODO: describe |
| 348 | 393 |
| 394 * ``llvm.nacl.atomic.8`` | |
| 395 * ``llvm.nacl.atomic.16`` | |
| 396 * ``llvm.nacl.atomic.32`` | |
| 397 * ``llvm.nacl.atomic.64`` | |
| 398 | |
| 399 These intrinsics provide support for atomic accesses at 8, 16, 32 and | |
| 400 64-bit sizes for primitives required to implement C11/C++11 atomic | |
| 401 accesses: | |
| 402 | |
| 403 - load | |
| 404 - store | |
| 405 - add | |
| 406 - sub | |
| 407 - or | |
| 408 - and | |
| 409 - xor | |
| 410 - xchg | |
| 411 - cmpxchg | |
| 412 - fence | |
| 413 | |
| 414 They also support C11/C++11 memory orderings: | |
| 415 | |
| 416 - Relaxed: no operation orders memory. | |
| 417 - Consume: a load operation performs a consume operation on the | |
| 418 affected memory location (currently unsupported by LLVM). | |
| 419 - Acquire: a load operation performs an acquire operation on the | |
| 420 affected memory location. | |
| 421 - Release: a store operation performs a release operation on the | |
| 422 affected memory location. | |
| 423 - Acquire-release: load and store operations perform acquire and | |
| 424 release operations on the affected memory. | |
| 425 - Sequentially consistent: same as acquire-release, but providing a | |
| 426 total ordering for all affected locations. | |
| 427 | |
| 428 Note that PNaCl currently strengthens all memory ordering | |
| 429 specifications to sequential consistency, the strongest form of memory | |
| 430 ordering. | |
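
Illustration only, not part of the patch: the new text lists the primitives needed to implement C11/C++11 atomic accesses (load, store, add, sub, or, and, xor, xchg, cmpxchg, fence) and mentions legacy GCC-style ``__sync_*`` builtins. The sketch below shows what those primitives look like at the C++11 source level. The function and variable names are hypothetical, and the patch does not show the signatures of the ``llvm.nacl.atomic.*`` intrinsics, so nothing here should be read as describing how these calls are lowered. The memory orderings are written out to show the six C11/C++11 choices; per the patch text, PNaCl would currently strengthen all of them to sequential consistency.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper: exercises each primitive from the list on a 32-bit
// atomic object and returns the final value.
uint32_t exercise_atomic_primitives() {
  std::atomic<uint32_t> cell(0);

  cell.store(5, std::memory_order_release);            // store: cell == 5
  uint32_t v = cell.load(std::memory_order_acquire);   // load:  v == 5
  cell.fetch_add(3, std::memory_order_relaxed);        // add:   cell == 8
  cell.fetch_sub(1, std::memory_order_relaxed);        // sub:   cell == 7
  cell.fetch_or(0x8, std::memory_order_acq_rel);       // or:    cell == 15
  cell.fetch_and(0xC, std::memory_order_acq_rel);      // and:   cell == 12
  cell.fetch_xor(0x5, std::memory_order_seq_cst);      // xor:   cell == 9

  // xchg: write v (5) and get the previous value (9) back.
  uint32_t old = cell.exchange(v, std::memory_order_seq_cst);

  // cmpxchg: replace 5 with old + 1 (10) only if cell still holds 5.
  uint32_t expected = 5;
  cell.compare_exchange_strong(expected, old + 1, std::memory_order_seq_cst);

  // fence: a standalone sequentially consistent fence.
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return cell.load();
}

// Hypothetical helper: the same kind of read-modify-write through a legacy
// GCC-style builtin (a GCC/Clang extension, not standard C++).
uint32_t legacy_sync_add() {
  uint32_t counter = 40;
  __sync_fetch_and_add(&counter, 2);  // returns the old value, counter == 42
  return counter;
}
```

Both helpers are single-threaded on purpose, so the values in the comments can be checked by hand; they show the shape of the operations, not a concurrent algorithm.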