docs/PNaClDeveloperGuide.rst - Issue 22240002: Rework PNaCl memory ordering

Side by Side Diff: docs/PNaClDeveloperGuide.rst

Issue 22240002: Rework PNaCl memory ordering (Closed) Base URL: http://git.chromium.org/native_client/pnacl-llvm.git@master

Patch Set: Created 7 years, 4 months ago


OLD	NEW
1 =======================	1 =======================

2 PNaCl Developer's Guide	2 PNaCl Developer's Guide

3 =======================	3 =======================

4	4

5 .. contents::	5 .. contents::

6 :local:	6 :local:

7 :depth: 3	7 :depth: 3

8	8

9 Introduction	9 Introduction

10 ============	10 ============

11	11

12 TODO	12 TODO

13	13

14 Memory Model and Atomics	14 Memory Model and Atomics

15 ========================	15 ========================

16	16

	17 Memory Model for Concurrent Operations

	18 --------------------------------------

	19

	20 The memory model offered by PNaCl relies on the same coding guidelines

	21 as the C11/C++11 one: concurrent accesses must always occur through

	22 atomic primitives (offered by `atomic intrinsics

	23 <PNaClLangRef.html#atomicintrinsics>`_), and these accesses must always

	24 occur with the same size for the same memory location. Visibility of

	25 stores is provided on a happens-before basis that relates memory

	26 locations to each other as the C11/C++11 standards do.

	27

	28 As in C11/C++11 some atomic accesses may be implemented with locks on

	29 certain platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be

	30 ``1``, signifying that all types are sometimes lock-free. The

	31 ``is_lock_free`` methods and ``atomic_is_lock_free`` will return the

	32 current platform's implementation at translation time. These macros,

	33 methods and functions are in the C11 header ``<stdatomic.h>`` and the

	34 C++11 header ``<atomic>``.
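A minimal sketch of the lock-free queries described above, assuming a C++11 toolchain. The function names are hypothetical and only illustrate where the macro, the method and the free function come from; the values they return depend on the platform.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Compile-time answer for `int`: 0 = never, 1 = sometimes, 2 = always
// lock-free. Per the text above, PNaCl reports 1 for these macros.
int int_lock_free_macro() { return ATOMIC_INT_LOCK_FREE; }

// Answer for one concrete object, via the member function.
bool counter_is_lock_free() {
  std::atomic<std::int32_t> counter(0);
  return counter.is_lock_free();
}

// Same query through the free function from <atomic>.
bool counter_is_lock_free_c_style() {
  std::atomic<std::int32_t> counter(0);
  return std::atomic_is_lock_free(&counter);
}

// Hypothetical sanity check: both query styles must agree.
void check_queries_agree() {
  assert(counter_is_lock_free() == counter_is_lock_free_c_style());
}
```

Code that needs a lock-free guarantee (for example before sharing memory across processes) would branch on the per-object query rather than on the macro.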

	35

	36 The PNaCl toolchain supports concurrent memory accesses through legacy

	37 GCC-style ``__sync_*`` builtins, as well as through C11/C++11 atomic

	38 primitives. ``volatile`` memory accesses can also be used, though these

	39 are discouraged, and aren't present in bitcode. See `Volatile Memory
	eliben 2013/08/05 18:35:54: Remove the "aren't present in bitcode" - I don't think this is relevant. The developer's guide should be read by devs who neither care about nor understand the bitcode ABI.
	jvoung (off chromium) 2013/08/05 18:59:41: Yeah, this is the summary for developers. What is relevant is that volatile accesses are promoted to be sequentially consistent atomic accesses, unlike normal C/C++, not the bitcode ABI.
	JF 2013/08/05 20:37:48: Done.
	40 Accesses`_.
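As an illustration of the two supported styles, the sketch below increments a counter once with a legacy GCC-style builtin and once with a C++11 atomic. The variable and function names are hypothetical; both calls return the value held before the increment.

```cpp
#include <atomic>
#include <cassert>

// Legacy style: a plain int that is only ever touched through __sync_*.
int legacy_counter = 0;

// C++11 style: the type itself documents that accesses are atomic.
std::atomic<int> modern_counter(0);

// Returns the previous value; __sync_* builtins are full barriers.
int legacy_increment() { return __sync_fetch_and_add(&legacy_counter, 1); }

// Returns the previous value; the default ordering is sequentially consistent.
int modern_increment() { return modern_counter.fetch_add(1); }
```

New code would normally prefer the C++11 form, since the atomic type prevents accidental non-atomic access to the same location.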

	41

	42 PNaCl supports concurrency and parallelism with some restrictions:

	43

	44 * Threading is explicitly supported through C11/C++11's threading

	45 libraries as well as POSIX threads.

	46

	47 * Inter-process communication through shared memory is limited to
	eliben 2013/08/05 18:35:54: What does inter-process communication even mean in PNaCl? We should have a section here about it, at least as a place-holder.
	JF 2013/08/05 20:37:48: Done.
	48 operations which are lock-free on the current platform

	49 (``is_lock_free`` methods). This may change at a later date.

	50

	51 * Direct interaction with device memory isn't supported.

	52

	53 * Signal handling isn't supported; PNaCl therefore promotes all

	54 primitives to cross-thread (instead of single-thread). This may change

	55 at a later date. Note that using atomic operations which aren't

	56 lock-free may lead to deadlocks when handling asynchronous signals.

	57

	58 * ``volatile`` and atomic operations are address-free (operations on the

	59 same memory location via two different addresses work atomically), as

	60 intended by the C11/C++11 standards. This is critical for

	61 inter-process communication as well as synchronous "external

	62 modifications" such as mapping underlying memory at multiple

	63 locations.

	64

	65 Setting up the above mechanisms requires assistance from the embedding

	66 sandbox's runtime (e.g. NaCl's Pepper APIs), but using them once set up

	67 can be done through regular C/C++ code.

	68

	69 Atomic Memory Ordering Constraints

	70 ----------------------------------

	71

	72 Atomics follow the same ordering constraints as in regular LLVM, but all

	73 accesses are promoted to sequential consistency (the strongest memory

	74 ordering) at pexe creation time. As more C11/C++11 code allows us to
	jvoung (off chromium) 2013/08/05 18:59:41: Should the memory orderings change also be done by the frontend, and not at the pexe freeze point, after optimization?
	JF 2013/08/05 20:37:48: I think the current implementation should offer a good medium between caution and potential optimization. I'm relatively certain that our second release will be able to offer most memory orderings (except consume, and maybe relaxed which may have some implementation issues on weakly-ordered machines). Realistically most code today uses volatile or ``__sync_*``, which are all sequentially consistent, a PNaCl user has to do extra work to properly use other memory orderings, I think they can therefore be trusted to have tested this code fairly well.
	75 understand performance and portability needs we intend to support the

	76 full gamut of C11/C++11 memory orderings:

	77

	78 - Relaxed: no operation orders memory.
	eliben 2013/08/05 18:35:54: Maybe this list does not belong here? This is user facing documentation; let's not spend too much time discussing things that are not actually supported.
	JF 2013/08/05 20:37:48: This is not an addition, I just moved it around. I think we should list these or redirect to LLVM's own developer documentation, since they are actually supported but rewritten post-opt.
	79 - Consume: a load operation performs a consume operation on the affected

	80 memory location (currently unsupported by LLVM).

	81 - Acquire: a load operation performs an acquire operation on the

	82 affected memory location.

	83 - Release: a store operation performs a release operation on the

	84 affected memory location.

	85 - Acquire-release: load and store operations perform acquire and release

	86 operations on the affected memory.

	87 - Sequentially consistent: same as acquire-release, but providing a

	88 global total ordering for all affected locations.

	89

	90 As in C11/C++11:

	91

	92 - Atomic accesses must at least be naturally aligned.

	93 - Some accesses may not actually be atomic on certain platforms,

	94 requiring an implementation that uses global lock(s).

	95 - An atomic memory location must always be accessed with atomic

	96 primitives, and these primitives must always be of the same bit size

	97 for that location.

	98 - Not all memory orderings are valid for all atomic operations.
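The orderings listed above can be sketched with a release/acquire hand-off, assuming C++11 ``<atomic>``. ``Mailbox``, ``publish`` and ``try_consume`` are hypothetical names. In a real program the two functions would run on different threads; per the text above, the current toolchain would promote both orderings to sequential consistency, which is stronger and keeps the code correct.

```cpp
#include <atomic>
#include <cassert>

struct Mailbox {
  int payload = 0;                 // plain data, published through `ready`
  std::atomic<bool> ready{false};  // always accessed atomically, same size
};

// Writer side: the release store makes the earlier payload write visible
// to any reader whose acquire load observes `true`.
// (A store cannot use memory_order_acquire: not every ordering is valid
// for every operation.)
void publish(Mailbox& box, int value) {
  box.payload = value;
  box.ready.store(true, std::memory_order_release);
}

// Reader side: returns false until the writer has published.
bool try_consume(Mailbox& box, int& out) {
  if (!box.ready.load(std::memory_order_acquire)) return false;
  out = box.payload;
  return true;
}
```

The plain ``payload`` field is only safe to read because the acquire load on ``ready`` happens-after the release store that followed the write.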

	99

17 Volatile Memory Accesses	100 Volatile Memory Accesses

18 ------------------------	101 ------------------------

19	102

20 The C11/C++11 standards mandate that ``volatile`` accesses execute in program	103 The C11/C++11 standards mandate that ``volatile`` accesses execute in

21 order (but are not fences, so other memory operations can reorder around them),	104 program order (but are not fences, so other memory operations can

22 are not necessarily atomic, and can’t be elided. They can be separated into	105 reorder around them), are not necessarily atomic, and can’t be

23 smaller width accesses.	106 elided. They can be separated into smaller width accesses.

24	107

25 The PNaCl toolchain applies regular LLVM optimizations along these guidelines,	108 Before any optimizations occur the PNaCl toolchain transforms

26 and it further prevents any load/store (even non-``volatile`` and non-atomic	109 ``volatile`` loads and stores into sequentially consistent ``volatile``

27 ones) from moving above or below a volatile operation: they act as compiler	110 atomic loads and stores, and applies regular LLVM optimizations along

28 barriers before optimizations occur. The PNaCl toolchain freezes ``volatile``	111 the above guidelines. This orders ``volatile`` accesses according to the atomic

29 accesses after optimizations into atomic accesses with sequentially consistent	112 rules, and means that fences (including ``__sync_synchronize``) act in a

30 memory ordering. This eases the support of legacy (i.e. non-C11/C++11) code, and	113 better-defined manner. Regular memory accesses still do not have

31 combined with builtin fences these programs can do meaningful cross-thread	114 ordering guarantees with ``volatile`` and atomic accesses, though the

32 communication without changing code. It also reflects the original code's intent	115 internal representation of ``__sync_synchronize`` attempts to prevent

33 and guarantees better portability.	116 reordering of memory accesses to objects which may escape.

34	117

35 Relaxed ordering could be used instead, but for the first release it is more	118 Relaxed ordering could be used instead, but for the first release it is

36 conservative to apply sequential consistency. Future releases may change what	119 more conservative to apply sequential consistency. Future releases may

37 happens at compile-time, but already-released pexes will continue using	120 change what happens at compile-time, but already-released pexes will

38 sequential consistency.	121 continue using sequential consistency.

39	122

40 The PNaCl toolchain also requires that ``volatile`` accesses be at least	123 The PNaCl toolchain also requires that ``volatile`` accesses be at least

41 naturally aligned, and tries to guarantee this alignment.	124 naturally aligned, and tries to guarantee this alignment.

42	125

43 Memory Model for Concurrent Operations	126 The above guarantees ease the support of legacy (i.e. non-C11/C++11)

44 --------------------------------------	127 code, and combined with builtin fences these programs can do meaningful

	128 cross-thread communication without changing code. They also better

	129 reflect the original code's intent and guarantee better portability.
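A sketch of the kind of legacy code this paragraph has in mind: a ``volatile`` flag and ``volatile`` data combined with ``__sync_synchronize()``. All names are hypothetical. In standard C/C++ ``volatile`` does not provide inter-thread synchronization; the pattern is shown only because, per the text above, PNaCl treats these accesses as sequentially consistent atomics.

```cpp
#include <cassert>

// Both locations are volatile so that the fence is guaranteed to order
// them under the rules described above.
volatile int legacy_data = 0;
volatile int legacy_ready = 0;

// Writer: store the data, fence, then raise the flag.
void legacy_publish(int value) {
  legacy_data = value;
  __sync_synchronize();
  legacy_ready = 1;
}

// Reader: returns false until the flag has been raised.
bool legacy_try_read(int* out) {
  if (!legacy_ready) return false;
  __sync_synchronize();
  *out = legacy_data;
  return true;
}
```

New code would express the same hand-off with C11/C++11 atomics, which is the approach the guide recommends over ``volatile``.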

45	130

46 The memory model offered by PNaCl relies on the same coding guidelines as the	131 Stable Transfer Format

47 C11/C++11 one: concurrent accesses must always occur through atomic primitives	132 ----------------------

48 (offered by `atomic intrinsics <PNaClLangRef.html#atomicintrinsics>`_), and

49 these accesses must always occur with the same size for the same memory

50 location. Visibility of stores is provided on a happens-before basis that

51 relates memory locations to each other as the C11/C++11 standards do.

52	133

53 As in C11/C++11 some atomic accesses may be implemented with locks on certain	134 The PNaCl toolchain freezes atomic and ``volatile`` memory accesses

54 platforms. The ``ATOMIC_*_LOCK_FREE`` macros will always be ``1``, signifying	135 after optimizations into atomic accesses with sequentially consistent
	jvoung (off chromium) 2013/08/05 18:59:41: re "after optimizations": volatiles get converted before optimizations. It seems like this has already been said earlier. Instead of focusing on when the conversion happens, perhaps we can say that pexes created and shipped now are created with sequential consistency, and the shipped pexes will remain that way. ... then talk about the future possibilities.
	JF 2013/08/05 20:37:48: OK, I can remove this section.
55 that all types are sometimes lock-free. The ``is_lock_free`` methods will return	136 memory ordering. Other memory orderings will be exposed in future

56 the current platform's implementation at translation time.	137 releases, when we have a better grasp of existing code's needs,

	138 portability implications, and are confident that implementation limits

	139 are overcome. Future releases may change what happens at compile-time,

	140 but already-released pexes will continue using sequential consistency.

57	141

58 The PNaCl toolchain supports concurrent memory accesses through legacy GCC-style	142 Non-atomic and non-``volatile`` memory accesses may be reordered,

59 ``__sync_*`` builtins, as well as through C11/C++11 atomic primitives.	143 separated, elided or fused according to C and C++'s memory model before

60 ``volatile`` memory accesses can also be used, though these are discouraged, and	144 the pexe is created as well as after its creation.

61 aren't present in bitcode.

62

63 PNaCl supports concurrency and parallelism with some restrictions:

64

65 * Threading is explicitly supported.

66

67 * Inter-process communication through shared memory is limited to operations

68 which are lock-free on the current platform (``is_lock_free`` methods). This

69 may change at a later date.

70

71 * Direct interaction with device memory isn't supported.

72

73 * Signal handling isn't supported; PNaCl therefore promotes all primitives to

74 cross-thread (instead of single-thread). This may change at a later date. Note

75 that using atomic operations which aren't lock-free may lead to deadlocks when

76 handling asynchronous signals.

77

78 * ``volatile`` and atomic operations are address-free (operations on the same

79 memory location via two different addresses work atomically), as intended by

80 the C11/C++11 standards. This is critical for inter-process communication as

81 well as synchronous "external modifications" such as mapping underlying memory

82 at multiple locations.

83

84 Setting up the above mechanisms requires assistance from the embedding sandbox's

85 runtime (e.g. NaCl's Pepper APIs), but using them once set up can be done through

86 regular C/C++ code.

87

88 The PNaCl toolchain currently optimizes for memory ordering as LLVM normally

89 does, but at pexe creation time it promotes all ``volatile`` accesses as well as

90 all atomic accesses to be sequentially consistent. Other memory orderings will

91 be supported in a future release, but pexes generated with the current toolchain

92 will continue functioning with sequential consistency. Using sequential

93 consistency provides a total ordering for all sequentially-consistent operations

94 on all addresses.

95

96 This means that ``volatile`` and atomic memory accesses can only be re-ordered

97 in some limited way before the pexe is created, and will act as fences for all

98 memory accesses (even non-atomic and non-``volatile``) after pexe creation.

99 Non-atomic and non-``volatile`` memory accesses may be reordered (unless a fence

100 intervenes), separated, elided or fused according to C and C++'s memory model

101 before the pexe is created as well as after its creation.

102

103 Atomic Memory Ordering Constraints

104 ----------------------------------

105

106 Atomics follow the same ordering constraints as in regular LLVM, but

107 all accesses are promoted to sequential consistency (the strongest

108 memory ordering) at pexe creation time. As more C11/C++11 code

109 allows us to understand performance and portability needs we intend

110 to support the full gamut of C11/C++11 memory orderings:

111

112 - Relaxed: no operation orders memory.

113 - Consume: a load operation performs a consume operation on the affected memory

114 location (currently unsupported by LLVM).

115 - Acquire: a load operation performs an acquire operation on the affected memory

116 location.

117 - Release: a store operation performs a release operation on the affected memory

118 location.

119 - Acquire-release: load and store operations perform acquire and release

120 operations on the affected memory.

121 - Sequentially consistent: same as acquire-release, but providing a global total

122 ordering for all affected locations.

123

124 As in C11/C++11:

125

126 - Atomic accesses must at least be naturally aligned.

127 - Some accesses may not actually be atomic on certain platforms, requiring an

128 implementation that uses a global lock.

129 - An atomic memory location must always be accessed with atomic primitives, and

130 these primitives must always be of the same bit size for that location.

131 - Not all memory orderings are valid for all atomic operations.

132	145

133 Inline Assembly	146 Inline Assembly

134 ===============	147 ===============

135	148

136 Inline assembly isn't supported by PNaCl because it isn't portable. The	149 Inline assembly isn't supported by PNaCl because it isn't portable. The

137 one current exception is the common compiler barrier idiom	150 one current exception is the common compiler barrier idiom

138 ``asm("":::"memory")``, which gets transformed to a sequentially	151 ``asm("":::"memory")``, which gets transformed to a sequentially

139 consistent memory barrier (equivalent to ``__sync_synchronize()``).	152 consistent memory barrier (equivalent to ``__sync_synchronize()``). In

	153 PNaCl this barrier is only guaranteed to order ``volatile`` and atomic

	154 memory accesses, though in practice the implementation attempts to also

	155 prevent reordering of memory accesses to objects which may escape.
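For illustration, the barrier idiom and its builtin equivalent could be used as below, assuming a GCC- or Clang-compatible compiler. The variable and function names are hypothetical. Both locations are ``volatile`` because, per the text above, those are the accesses the barrier is guaranteed to order.

```cpp
#include <cassert>

volatile int first_write = 0;
volatile int second_write = 0;

// The one inline-assembly form the text above says PNaCl accepts.
void ordered_writes_asm() {
  first_write = 1;
  asm("" ::: "memory");  // treated as a sequentially consistent barrier
  second_write = 2;
}

// Equivalent spelling without inline assembly.
void ordered_writes_builtin() {
  first_write = 3;
  __sync_synchronize();
  second_write = 4;
}
```

The builtin form avoids inline assembly entirely, which may be the simpler choice for code that is being ported.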
