Issue 342763004: Add atomic load/store, fetch_add, fence, and is-lock-free lowering. (Closed)

Created:
6 years, 6 months ago by jvoung (off chromium)

Modified:
6 years, 6 months ago

Reviewers:
Jim Stichnoth, JF

CC:
native-client-reviews_googlegroups.com

Base URL:
https://chromium.googlesource.com/native_client/pnacl-subzero.git@master

Visibility:
Public.

Description

Add atomic load/store, fetch_add, fence, and is-lock-free lowering.

Loads/stores w/ type i8, i16, and i32 are converted to plain load/store instructions and lowered w/ the plain lowerLoad/lowerStore. Atomic stores are followed by an mfence for sequential consistency.

For 64-bit types, use movq to do 64-bit memory loads/stores (vs the usual load/store being broken into separate 32-bit load/stores). For a store, this means first bitcasting the i64 -> f64 (which splits the load of the value to be stored into two 32-bit ops), then storing in a single op. For a load, load into f64, then bitcast back to i64 (which splits after the atomic load). This follows what GCC does for C++11 std::atomic<uint64_t> load/store methods (uses movq when -mfpmath=sse). This introduces some redundancy between movq and movsd, but the convention seems to be to use movq when working with integer quantities. Otherwise, movsd could work too. The difference seems to be in whether or not the XMM register's upper 64 bits are filled with 0. Zero-extending could help avoid partial register stalls.

Handle up to i32 fetch_add. TODO: add i64 via a cmpxchg loop.

TODO: add some runnable crosstests to make sure that this doesn't do funny things to integer bit patterns that happen to look like signaling NaNs and quiet NaNs. However, the system clang would not know how to handle "llvm.nacl.*" if we choose to target that level directly via .ll files. Or, (a) we use old-school __sync methods (sync_fetch_and_add w/ 0 to load) or (b) require buildbot's clang/gcc to support C++11...

BUG= https://code.google.com/p/nativeclient/issues/detail?id=3882
R=stichnot@chromium.org

Committed: https://gerrit.chromium.org/gerrit/gitweb?p=native_client/pnacl-subzero.git;a=commit;h=5cd240d
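As a rough illustration of the 64-bit path and the NaN concern in the TODO, the following sketch shows the source-level operations (a sequentially consistent std::atomic<uint64_t> store and load) together with a bit-preserving i64 -> f64 -> i64 round trip. The function names are hypothetical and are not part of the CL; this is a sketch of the property a crosstest might check, not the lowering itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: view an i64 bit pattern as an f64 and back again,
// copying bytes only. No floating-point arithmetic is performed, so the
// bits are expected to survive even when they look like a NaN.
inline uint64_t roundTripThroughF64(uint64_t Bits) {
  static_assert(sizeof(double) == sizeof(uint64_t), "f64 must be 64 bits");
  double AsF64;
  std::memcpy(&AsF64, &Bits, sizeof(AsF64));
  uint64_t Back;
  std::memcpy(&Back, &AsF64, sizeof(Back));
  return Back;
}

// Source-level form of the operations being lowered: a seq_cst 64-bit
// store followed by a seq_cst 64-bit load of the same cell.
inline uint64_t atomicStoreThenLoad64(uint64_t Val) {
  std::atomic<uint64_t> Cell(0);
  Cell.store(Val, std::memory_order_seq_cst);
  return Cell.load(std::memory_order_seq_cst);
}

// Illustrative check using bit patterns that look like a signaling NaN
// and a quiet NaN when interpreted as f64.
inline void demoNaNLikePatterns() {
  const uint64_t SignalingNaNBits = 0x7FF0000000000001ULL;
  const uint64_t QuietNaNBits = 0x7FF8000000000000ULL;
  assert(roundTripThroughF64(SignalingNaNBits) == SignalingNaNBits);
  assert(roundTripThroughF64(QuietNaNBits) == QuietNaNBits);
  assert(atomicStoreThenLoad64(SignalingNaNBits) == SignalingNaNBits);
}
```

A real crosstest would need to run the translated code rather than the host compiler's output, which is the difficulty the TODO describes.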

Patch Set 1 #

Patch Set 2 : Handle atomic rmw add up to i32 for now #

Patch Set 3 : use movq for cast #

Patch Set 4 : beef up test a bit #

Total comments: 1

Patch Set 5 : make sure atomic loads aren't optimized out #

Patch Set 6 : test atomic rmw is not elided also #

Total comments: 10

Patch Set 7 : review #

Patch Set 8 : add a comment about xadd #

Patch Set 9 : move _xadd fakedef to common _xadd code #

Total comments: 7

Patch Set 10 : review #

Patch Set 11 : cleanup more #

Patch Set 12 : couple more daring tests #

Patch Set 13 : fix test todo now that separate fix landed #

Patch Set 14 : test 64 errors more #

Total comments: 2

Patch Set 15 : check width #

Total comments: 3

Patch Set 16 : add LOCK prefix to the usage part of comment #

Patch Set 17 : change comment #

Created: 6 years, 6 months ago

Stats: +1257 lines, -42 lines

Status	File	Delta from patch set	Chunks	Lines	Comments
M	src/IceInstX8632.h	1 2 3	5 chunks	+87 lines, -0 lines	0 comments
M	src/IceInstX8632.cpp	1 2 3 4 5 6 7 8 9	7 chunks	+112 lines, -0 lines	0 comments
M	src/IceIntrinsics.h	1	1 chunk	+33 lines, -0 lines	0 comments
M	src/IceIntrinsics.cpp	1 2 3 4 5 6 7 8 9	2 chunks	+6 lines, -1 line	0 comments
M	src/IceTargetLoweringX8632.h	1 2 3 4 5 6 7 8	6 chunks	+20 lines, -0 lines	0 comments
M	src/IceTargetLoweringX8632.cpp	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16	14 chunks	+205 lines, -39 lines	0 comments
M	src/llvm2ice.cpp		1 chunk	+0 lines, -2 lines	0 comments
A	tests_lit/llvm2ice_tests/nacl-atomic-errors.ll	1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16	1 chunk	+169 lines, -0 lines	0 comments
A	tests_lit/llvm2ice_tests/nacl-atomic-fence-all.ll	1 2 3 4 5 6 7 8 9 10 11	1 chunk	+216 lines, -0 lines	0 comments
A	tests_lit/llvm2ice_tests/nacl-atomic-intrinsics.ll	1 2 3 4 5 6 7 8 9 10 11 12 13 14	1 chunk	+409 lines, -0 lines	0 comments

Messages

Total messages: 14 (0 generated)

jvoung (off chromium)

https://codereview.chromium.org/342763004/diff/100001/tests_lit/llvm2ice_tests/nacl-atomic-intrinsics.ll File tests_lit/llvm2ice_tests/nacl-atomic-intrinsics.ll (right): https://codereview.chromium.org/342763004/diff/100001/tests_lit/llvm2ice_tests/nacl-atomic-intrinsics.ll#newcode142 tests_lit/llvm2ice_tests/nacl-atomic-intrinsics.ll:142: ; TODO(jvoung): the 64-bit constant materialization Started looking into ...

6 years, 6 months ago (2014-06-20 23:21:07 UTC) #2

Jim Stichnoth

https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8632.cpp File src/IceTargetLoweringX8632.cpp (left): https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8632.cpp#oldcode2009 src/IceTargetLoweringX8632.cpp:2009: Src0 = OperandX8632Mem::create(Func, Ty, Base, Offset); Do you think ...

6 years, 6 months ago (2014-06-23 18:13:02 UTC) #3

jvoung (off chromium)

6 years, 6 months ago (2014-06-23 22:41:43 UTC) #4

https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8...
File src/IceTargetLoweringX8632.cpp (left):

https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8...
src/IceTargetLoweringX8632.cpp:2009: Src0 = OperandX8632Mem::create(Func, Ty,
Base, Offset);
On 2014/06/23 18:13:01, stichnot wrote:
> Do you think we need a line after this one like:
>   Src0 = legalize(Src0);
> as is done in lowerStore()?  Otherwise a non register allocated Base would
> create a problem for code emission?

Yes, this does need to be legalized. As do the other uses of FormMemoryOperand
that I added in this CL -- didn't notice that some of the Om1 code was not legal
=( earlier.

Did some common legalization at the end of FormMemoryOperand().

https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8...
File src/IceTargetLoweringX8632.cpp (right):

https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8...
src/IceTargetLoweringX8632.cpp:1956: void TargetX8632::lowerAtomicRMW(Variable
*Dest, uint32_t Operation,
On 2014/06/23 18:13:01, stichnot wrote:
> Operation's type should be uint64_t to match the type of
> ConstantInteger::getValue().  Or else the getValue() result should be cast to
> uint32_t in the caller.

Done.

https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8...
src/IceTargetLoweringX8632.cpp:1974: Context.insert(InstFakeDef::create(Func,
T));
On 2014/06/23 18:13:01, stichnot wrote:
> What is the purpose of the FakeDef here?

I added this to more precisely model the "exchanging" part of the xadd.

After the xadd, the value of T is no longer Val (from the earlier _mov(T, Val)).
It is instead the old value stored in *Addr. So the later "_mov(Dest, T)" puts
the value of *Addr into Dest instead of Val into Dest.

I suppose that since T is a short-lived temporary it doesn't matter much for
register allocation. Remove then?
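To make the "exchanging" part concrete, here is a small non-atomic model of what xadd does to its two operands, next to the std::atomic call with the same observable result. The struct and function names are hypothetical and only illustrate the semantics described above; they are not Subzero code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical model of the operand state after `lock xadd [Addr], T`.
struct XaddResult {
  uint32_t Mem; // new contents of *Addr
  uint32_t Reg; // new contents of T
};

// Non-atomic model: memory receives the sum, and the register receives
// the OLD memory value. This is why T no longer holds Val afterwards.
inline XaddResult modelXadd(uint32_t Mem, uint32_t Reg) {
  const uint32_t Old = Mem;
  return XaddResult{Old + Reg, Old};
}

// Source-level equivalent: fetch_add returns the value that was in memory
// before the addition, matching the later _mov(Dest, T).
inline uint32_t fetchAdd32(std::atomic<uint32_t> &Addr, uint32_t Val) {
  return Addr.fetch_add(Val, std::memory_order_seq_cst);
}

inline void demoXadd() {
  const XaddResult R = modelXadd(/*Mem=*/10, /*Reg=*/5);
  assert(R.Mem == 15 && R.Reg == 10);
  std::atomic<uint32_t> Cell(10);
  assert(fetchAdd32(Cell, 5) == 10);
  assert(Cell.load() == 15);
}
```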

https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8...
src/IceTargetLoweringX8632.cpp:1993: Func->setError("Unhandled AtomicRMW
operation");
On 2014/06/23 18:13:01, stichnot wrote:
> Leave a TODO here?  (same for the other instances of "Unhandled AtomicRMW
> operation")

Done.
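For reference, the i64 case that the description leaves as a TODO ("add i64 via a cmpxchg loop") has roughly the following shape at the source level. This is only a sketch under the assumption that the lowering would mirror a compare-exchange retry loop; the function name is hypothetical and the CL itself does not implement this.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sketch: 64-bit fetch_add built from a compare-exchange loop.
// On 32-bit x86 a compiler would typically implement the exchange with
// lock cmpxchg8b, since xadd has no 64-bit form there.
inline uint64_t fetchAdd64ViaCmpxchg(std::atomic<uint64_t> &Addr,
                                     uint64_t Val) {
  uint64_t Expected = Addr.load(std::memory_order_relaxed);
  // On failure, compare_exchange_weak writes the current memory value into
  // Expected, so each retry recomputes the sum from fresh data.
  while (!Addr.compare_exchange_weak(Expected, Expected + Val,
                                     std::memory_order_seq_cst,
                                     std::memory_order_relaxed)) {
  }
  return Expected; // the value observed just before the successful update
}

inline void demoFetchAdd64() {
  std::atomic<uint64_t> Cell(0xFFFFFFFFULL);
  // The add carries out of the low 32 bits, which a split 32-bit
  // implementation could not do atomically.
  assert(fetchAdd64ViaCmpxchg(Cell, 1) == 0xFFFFFFFFULL);
  assert(Cell.load() == 0x100000000ULL);
}
```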

Jim Stichnoth

LGTM https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8632.cpp File src/IceTargetLoweringX8632.cpp (right): https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8632.cpp#newcode1974 src/IceTargetLoweringX8632.cpp:1974: Context.insert(InstFakeDef::create(Func, T)); On 2014/06/23 22:41:43, jvoung wrote: > ...

6 years, 6 months ago (2014-06-23 23:55:54 UTC) #5

jvoung (off chromium)

Thanks! https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8632.cpp File src/IceTargetLoweringX8632.cpp (right): https://codereview.chromium.org/342763004/diff/140001/src/IceTargetLoweringX8632.cpp#newcode1974 src/IceTargetLoweringX8632.cpp:1974: Context.insert(InstFakeDef::create(Func, T)); On 2014/06/23 23:55:54, stichnot wrote: > ...

6 years, 6 months ago (2014-06-24 00:35:57 UTC) #6

Quick drive-by, I haven't read this thoroughly but a few things jumped out. I think ...

6 years, 6 months ago (2014-06-24 01:23:29 UTC) #7

jvoung (off chromium)

6 years, 6 months ago (2014-06-24 21:16:55 UTC) #8

Thanks! Good points all around. Let me know if there are other issues.

https://codereview.chromium.org/342763004/diff/150009/src/IceIntrinsics.cpp
File src/IceIntrinsics.cpp (right):

https://codereview.chromium.org/342763004/diff/150009/src/IceIntrinsics.cpp#n...
src/IceIntrinsics.cpp:202: // static
On 2014/06/24 01:23:29, JF wrote:
> // static?

It's chromium style to indicate a static member function. E.g., see this base/
file:

https://code.google.com/p/chromium/codesearch#chromium/src/base/files/file_pa...

but this is LLVM style -- I'll leave it out.

https://codereview.chromium.org/342763004/diff/150009/src/IceTargetLoweringX8...
File src/IceTargetLoweringX8632.cpp (right):

https://codereview.chromium.org/342763004/diff/150009/src/IceTargetLoweringX8...
src/IceTargetLoweringX8632.cpp:1801: _mfence();
On 2014/06/24 01:23:29, JF wrote:
> FenceAll is meant as an equivalent to asm(:::"m"), so there should be a TODO
> here to ensure no memory operations move past it in Subzero (not just atomic
> accesses, but regular memory too).

Added a comment, and a couple of speculative tests using globals and allocas.
Otherwise, I don't think we really have the aliasing information (or budget to
run alias analysis) on pointers to move loads/stores.

We also don't really have any code that moves loads and stores at the moment, so
there's no existing flag to check, but I've marked mfence w/ the coarse-grained
"HasSideEffects" flag.

Well, there is some code that could possibly move a load. That is the load + add
fusing. It could move the load to where the add is, which would be dangerous if
split by a fence. The test tests this case. Also added a comment to the load+add
fusing code to be more careful if it is beefed up any more.
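The load + add hazard can be pictured with a source-level analogy. In the sketch below, std::atomic_thread_fence stands in for the fence-all intrinsic (an assumption made for illustration; the two are not identical in what they guarantee for plain memory), and the function name is hypothetical. The comments show the two instruction shapes in question.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical example of a load and an add separated by a fence.
//
// Intended shape (load stays before the fence):
//   mov  eax, [P]
//   mfence
//   add  eax, Addend
//
// Shape that fusing the load into the add would produce (the memory read
// now happens after the fence, which is the dangerous case):
//   mfence
//   mov  eax, Addend
//   add  eax, [P]
inline uint32_t loadFenceAdd(const uint32_t *P, uint32_t Addend) {
  assert(P != nullptr);
  const uint32_t Loaded = *P; // plain load, textually before the fence
  std::atomic_thread_fence(std::memory_order_seq_cst);
  return Loaded + Addend;     // must use the value read before the fence
}
```

In a single-threaded run both shapes give the same answer, so a lit test for this has to check the emitted instruction order rather than the computed value.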

https://codereview.chromium.org/342763004/diff/150009/src/IceTargetLoweringX8...
src/IceTargetLoweringX8632.cpp:1804: Constant *One =
Ctx->getConstantInt(IceType_i32, 1);
On 2014/06/24 01:23:29, JF wrote:
> It would be good to explain that x86 is always lock free for 8/16/32/64 bit
> accesses. A check for wider bitwidth would be useful, to make sure this fails
> if/when we add support for them.

Ah!  I actually misread what the arguments were... Added a comment, the checks
(for exactly those bit widths), and test.

> 
> This should also be used to key DCE in Subzero, since it's not known at
> compilation time but is known at translation time.

Added a TODO and a test that should illustrate what can be optimized.

I think that such DCE will likely be a separate pass. Depends on tradeoff of
time spent by the compiler looking for and marking dead code, and the benefit of
skipping code generation for the dead code.
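A minimal sketch of the width check discussed here, assuming the intrinsic's first argument is an access size in bytes and that only the 8/16/32/64-bit widths mentioned in the review are accepted. The function name is hypothetical and this is not the code in the CL.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical translation-time answer for is-lock-free on x86-32:
// 1-, 2-, 4-, and 8-byte accesses are treated as lock free; every other
// size is rejected so that wider widths fail loudly if support is added
// later without revisiting this check.
inline bool isLockFreeByteSize(uint64_t ByteSize) {
  switch (ByteSize) {
  case 1:
  case 2:
  case 4:
  case 8:
    return true;
  default:
    return false;
  }
}

// Because the answer is a constant once the target is known, a branch on
// it has one dead side. That is the DCE opportunity mentioned above.
inline int pickPath(uint64_t ByteSize) {
  return isLockFreeByteSize(ByteSize) ? /*lock-free path*/ 1
                                      : /*fallback path*/ 0;
}

inline void demoIsLockFree() {
  assert(isLockFreeByteSize(4));
  assert(!isLockFreeByteSize(16));
  assert(pickPath(8) == 1);
}
```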

https://codereview.chromium.org/342763004/diff/150009/tests_lit/llvm2ice_test...
File tests_lit/llvm2ice_tests/nacl-atomic-intrinsics.ll (right):

https://codereview.chromium.org/342763004/diff/150009/tests_lit/llvm2ice_test...
tests_lit/llvm2ice_tests/nacl-atomic-intrinsics.ll:298: ;
https://code.google.com/p/nativeclient/issues/detail?id=2981

I did notice the 16-bit hack (see this reference).  I was hoping that the
validator would actually be fixed at some point (soonish) rather than adding the
hack here.  There are currently other things to fix before we can run the
NaCl validator (no jump masking or alignment right now -- might also want to
have an integrated assembler first), so there is still a little bit of time
before we pull the switch and start running the NaCl validator. Otherwise, I
have a feeling it would need to be done as a separate pass from the rest of the
lowering code.

Looks good overall, one last comment. https://codereview.chromium.org/342763004/diff/280001/src/IceTargetLoweringX8632.cpp File src/IceTargetLoweringX8632.cpp (right): https://codereview.chromium.org/342763004/diff/280001/src/IceTargetLoweringX8632.cpp#newcode1839 src/IceTargetLoweringX8632.cpp:1839: Result = Ctx->getConstantZero(IceType_i32); ...

6 years, 6 months ago (2014-06-24 23:50:37 UTC) #9

jvoung (off chromium)

https://codereview.chromium.org/342763004/diff/280001/src/IceTargetLoweringX8632.cpp File src/IceTargetLoweringX8632.cpp (right): https://codereview.chromium.org/342763004/diff/280001/src/IceTargetLoweringX8632.cpp#newcode1839 src/IceTargetLoweringX8632.cpp:1839: Result = Ctx->getConstantZero(IceType_i32); On 2014/06/24 23:50:37, JF wrote: > ...

6 years, 6 months ago (2014-06-25 01:31:42 UTC) #10

https://codereview.chromium.org/342763004/diff/300001/src/IceTargetLoweringX8632.cpp File src/IceTargetLoweringX8632.cpp (right): https://codereview.chromium.org/342763004/diff/300001/src/IceTargetLoweringX8632.cpp#newcode1843 src/IceTargetLoweringX8632.cpp:1843: // have cmpxchg16b, which can make 16-byte operations lock ...

6 years, 6 months ago (2014-06-25 01:44:03 UTC) #11

jvoung (off chromium)

6 years, 6 months ago (2014-06-25 15:32:45 UTC) #12

6 years, 6 months ago (2014-06-25 15:41:25 UTC) #13

jvoung (off chromium)

6 years, 6 months ago (2014-06-25 17:37:01 UTC) #14

Message was sent while issue was closed.

Committed patchset #17 manually as r5cd240d (presubmit successful).
