|
|
Created:
5 years, 10 months ago by Richard Diamond Modified:
5 years, 4 months ago Reviewers:
JF CC:
native-client-reviews_googlegroups.com Base URL:
https://chromium.googlesource.com/native_client/pnacl-llvm.git@master Target Ref:
refs/heads/master Visibility:
Public. |
DescriptionPNaCl: Impl the other atomicrmw operations: nand, max, min, umax, and umin.
Originally, only Nand was implemented, as that seems to be the only extra
operation used in practice by Rust. However, rewriting the rest in addition was
easy, so I've included those as well.
R= jfb@chromium.org
TEST= (cd toolchain_build/out/llvm_x86_64_linux_work && make check)
BUG=
Patch Set 1 #
Total comments: 2
Patch Set 2 : #
Total comments: 8
Patch Set 3 : #Patch Set 4 : Rebase #
Total comments: 2
Patch Set 5 : Reduce big-O cost. #
Total comments: 1
Patch Set 6 : #Patch Set 7 : #
Total comments: 12
Patch Set 8 : #
Total comments: 8
Patch Set 9 : #Patch Set 10 : #Patch Set 11 : #
Total comments: 2
Patch Set 12 : #Patch Set 13 : #Patch Set 14 : #
Total comments: 7
Patch Set 15 : #Patch Set 16 : #
Messages
Total messages: 34 (0 generated)
https://codereview.chromium.org/927493002/diff/1/lib/Transforms/NaCl/RewriteA... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/1/lib/Transforms/NaCl/RewriteA... lib/Transforms/NaCl/RewriteAtomics.cpp:132: Instruction::CastOps castOp(Instruction &I, Value *Src, Type *Dst, Twine Name) const; Lines > 80. https://codereview.chromium.org/927493002/diff/1/lib/Transforms/NaCl/RewriteA... lib/Transforms/NaCl/RewriteAtomics.cpp:471: /// Lifted from X86AtomicExpandPass: I like the approach overall, but since LLVM already contains code to do this I'd rather reuse it directly than copy it in. Do you think you could instead make X86AtomicExpandPass a utility that's usable for non-x86 code? I would do this change in LLVM itself, and PNaCl will soon contain it (we're trying to be ~2 weeks behind top-of-tree, sync'ing at "stable" points). We can also cherry-pick it if we want to get the change faster, but in the end it's better for this to be in LLVM rather than PNaCl. I'd be happy to help with the code reviews in upstream LLVM. The process is similar, but uses Phabricator instead.
On 2015/02/13 17:14:26, JF wrote: > https://codereview.chromium.org/927493002/diff/1/lib/Transforms/NaCl/RewriteA... > File lib/Transforms/NaCl/RewriteAtomics.cpp (right): > > https://codereview.chromium.org/927493002/diff/1/lib/Transforms/NaCl/RewriteA... > lib/Transforms/NaCl/RewriteAtomics.cpp:132: Instruction::CastOps > castOp(Instruction &I, Value *Src, Type *Dst, Twine Name) const; > Lines > 80. > > https://codereview.chromium.org/927493002/diff/1/lib/Transforms/NaCl/RewriteA... > lib/Transforms/NaCl/RewriteAtomics.cpp:471: /// Lifted from X86AtomicExpandPass: > I like the approach overall, but since LLVM already contains code to do this I'd > rather reuse it directly than copy it in. Do you think you could instead make > X86AtomicExpandPass a utility that's usable for non-x86 code? > > I would do this change in LLVM itself, and PNaCl will soon contain it (we're > trying to be ~2 weeks behind top-of-tree, sync'ing at "stable" points). We can > also cherry-pick it if we want to get the change faster, but in the end it's > better for this to be in LLVM rather than PNaCl. > > I'd be happy to help with the code reviews in upstream LLVM. The process is > similar, but uses Phabricator instead. Actually, such generalization has already been completed and is present in the `merge_36` branch. I've rebased this patch off that branch and have added code to run -atomic-expand to expand those extra atomic RMW instructions. Naturally, this shouldn't be committed until `merge_36` is merged into `master`.
> Actually, such generalization has already been completed and is present in the > `merge_36` branch. I've rebased this patch off that branch and have added code > to run -atomic-expand to expand those extra atomic RMW instructions. Hmm, I forgot that I had reviewed that code... :-s Good thing it's already there! https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:179: Triple TheTriple = Triple("i686-none-nacl"); Does it need to be i686? Can you just have unknown or pnacl instead? https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:211: } Could you invoke the -atomic-expand pass after this one, and then re-run this one, from the pass registry instead of manually? It also looks like the Changed flag won't be quite right (well, technically it is because any -atomic-expand rewrite will trigger a rewrite atomic after, but still...). https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:395: default: return; // -atomic-expand will handle it. It would be good to assert that the second pass of this never hits this case. https://codereview.chromium.org/927493002/diff/20001/test/Transforms/NaCl/ato... File test/Transforms/NaCl/atomic/extra-rmw-operations.ll (right): https://codereview.chromium.org/927493002/diff/20001/test/Transforms/NaCl/ato... test/Transforms/NaCl/atomic/extra-rmw-operations.ll:16: ; CHECK: %3 = call i8 @llvm.nacl.atomic.cmpxchg.i8(i8* %ptr, i8 %loaded, i8 %new, i32 6, i32 6) Could you check that there's a loop at least for one of the cases?
https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:179: Triple TheTriple = Triple("i686-none-nacl"); On 2015/02/16 21:09:01, JF wrote: > Does it need to be i686? Can you just have unknown or pnacl instead? The other option is `x86_64`. Arm expands into some kind of target specific llvm.arm.* combo (I didn't really examine it closely), and mips doesn't need to expand anything (supported natively, I assume). AtomicExpandPass queries the TargetMachine to determine what to expand/what to expand to/etc and without one it won't do anything. Of the three mentioned, only X86 expands those atomic RMWs to target independant IR. (If this seems smelly to you, I 100% agree). https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:211: } On 2015/02/16 21:09:01, JF wrote: > Could you invoke the -atomic-expand pass after this one, and then re-run this > one, from the pass registry instead of manually? > > It also looks like the Changed flag won't be quite right (well, technically it > is because any -atomic-expand rewrite will trigger a rewrite atomic after, but > still...). No, because it would force this kluge onto users. This hack requires a specific TargetMachine to work. Apropos to Changed flag: whoops, fixed. https://codereview.chromium.org/927493002/diff/20001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:395: default: return; // -atomic-expand will handle it. On 2015/02/16 21:09:01, JF wrote: > It would be good to assert that the second pass of this never hits this case. The second pass never reaches this. The other RMW ops are expanded to cmpxchg instructions. https://codereview.chromium.org/927493002/diff/20001/test/Transforms/NaCl/ato... File test/Transforms/NaCl/atomic/extra-rmw-operations.ll (right): https://codereview.chromium.org/927493002/diff/20001/test/Transforms/NaCl/ato... test/Transforms/NaCl/atomic/extra-rmw-operations.ll:16: ; CHECK: %3 = call i8 @llvm.nacl.atomic.cmpxchg.i8(i8* %ptr, i8 %loaded, i8 %new, i32 6, i32 6) On 2015/02/16 21:09:01, JF wrote: > Could you check that there's a loop at least for one of the cases? Done.
I think this approach is OK as a temporary fix, but I think medium term we'll want something cleaner. Since we're waiting for the 3.6 merge before we can land this patch, I propose cleaning things up in upstream LLVM first, so that this patch not be needed at all: (1) Modify LLVM's TargetLowering.cpp to add a new method shouldExpandNonStandardAtomicRMWInIR, which is a superset by shouldExpandAtomicRMWInIR and which allows a TargetLowering to opt in to expanding just nand/[u]min/[u]max. (2) Create lib/Target/Le32/ and create an Le32TargetLowering, which for now only implements the basic functionality needed for atomic expansion. This would do everything you want, but it would also be upstream and would set the basis for a real Le32 target, which we've been wanting to do for a while. I can help you do (1) if you're up for it, it shouldn't be too hard: http://llvm.org/docs/DeveloperPolicy.html Otherwise I can do it in the next few days. I think the PNaCl team should do (2), since we're committing to maintaining the code we're creating, and we're going to create an actual backend. What do you think?
On 2015/02/17 17:30:16, JF wrote: > I think this approach is OK as a temporary fix, but I think medium term we'll > want something cleaner. Since we're waiting for the 3.6 merge before we can land > this patch, I propose cleaning things up in upstream LLVM first, so that this > patch not be needed at all: > > (1) Modify LLVM's TargetLowering.cpp to add a new method > shouldExpandNonStandardAtomicRMWInIR, which is a superset by > shouldExpandAtomicRMWInIR and which allows a TargetLowering to opt in to > expanding just nand/[u]min/[u]max. + changes to AtomicExpandPass. Though I'll make TargetLowering check a bitfield of supported ops instead of a coarse all-or-nothing value, if that's alright. > > (2) Create lib/Target/Le32/ and create an Le32TargetLowering, which for now > only implements the basic functionality needed for atomic expansion. > > This would do everything you want, but it would also be upstream and would set > the basis for a real Le32 target, which we've been wanting to do for a while. That sounds excellent. Let me know if I can assist. > I can help you do (1) if you're up for it, it shouldn't be too hard: > http://llvm.org/docs/DeveloperPolicy.html > Otherwise I can do it in the next few days. > > I think the PNaCl team should do (2), since we're committing to maintaining the > code we're creating, and we're going to create an actual backend. > > What do you think? Happy to help, I'll submit a patch for (1) to llvm-commits soonish.
> Happy to help, I'll submit a patch for (1) to llvm-commits soonish. Great! Please use Phabricator, it's much easier to review than emailing patches.
On 2015/02/17 22:02:52, JF wrote: > On 2015/02/17 17:30:16, JF wrote: > > I think this approach is OK as a temporary fix, but I think medium term we'll > > want something cleaner. Since we're waiting for the 3.6 merge before we can > land > > this patch, I propose cleaning things up in upstream LLVM first, so that this > > patch not be needed at all: > > > > (1) Modify LLVM's TargetLowering.cpp to add a new method > > shouldExpandNonStandardAtomicRMWInIR, which is a superset by > > shouldExpandAtomicRMWInIR and which allows a TargetLowering to opt in to > > expanding just nand/[u]min/[u]max. > > + changes to AtomicExpandPass. Though I'll make TargetLowering check a bitfield > of supported ops instead of a coarse all-or-nothing value, if that's alright. > Now that I've read the surrounding code (to shouldExpandAtomicRMWInIR), nvm the bitfield thing. Also, presumably, a target specific (ie Le32TargetLowering) TargetLowering could choose to only return true for specific ops (the function gets the whole instruction), so I'm not sure what would be gained by adding shouldExpandNonStandardAtomicRMWInIR as you said. Currently, I'm adding a new enum to TargetLowering, AtomicRMWExpansionKind, and changing shouldExpandAtomicRMWInIR to return a value indicating to AtomicExpandPass (or anybody I suppose) how to expand the atomic (ie via loadlinked/storeconditional combos, or via cmpxchg), if at all. Specific targets can then look at the instruction to determine expansion. So, presumably, Le32TargetLowering can override shouldExpandAtomicRMWInIR and return AtomicRMWExpansionKind::CmpXChg only for Nand, (u)max/min RMW ops.
On 2015/02/17 22:02:52, JF wrote: > > Happy to help, I'll submit a patch for (1) to llvm-commits soonish. > > Great! Please use Phabricator, it's much easier to review than emailing patches. Okay, how does one do that?
On 2015/02/17 23:02:25, Richard Diamond wrote: > On 2015/02/17 22:02:52, JF wrote: > > > Happy to help, I'll submit a patch for (1) to llvm-commits soonish. > > > > Great! Please use Phabricator, it's much easier to review than emailing > patches. > > Okay, how does one do that? Nvm, found http://llvm.org/docs/Phabricator.html.
> Now that I've read the surrounding code (to shouldExpandAtomicRMWInIR), nvm the > bitfield thing. Also, presumably, a target specific (ie Le32TargetLowering) > TargetLowering could choose to only return true for specific ops (the function > gets the whole instruction), so I'm not sure what would be gained by adding > shouldExpandNonStandardAtomicRMWInIR as you said. Currently, I'm adding a new > enum to TargetLowering, AtomicRMWExpansionKind, and changing > shouldExpandAtomicRMWInIR to return a value indicating to AtomicExpandPass (or > anybody I suppose) how to expand the atomic (ie via loadlinked/storeconditional > combos, or via cmpxchg), if at all. Specific targets can then look at the > instruction to determine expansion. So, presumably, Le32TargetLowering can > override shouldExpandAtomicRMWInIR and return AtomicRMWExpansionKind::CmpXChg > only for Nand, (u)max/min RMW ops. Yeah that sounds better than what I suggested! I like it.
Now that `pnacl-llvm` is more up-to-date w.r.t. upstream, this can be reviewed and committed.
https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:188: (createAtomicExpandPass(Target.get())); Could the pass instead go in lib/Transforms/NaCl/PNaClABISimplify.cpp, just before createRewriteAtomicsPass? It'll need a target, which for now would still be the hacky x86 one, but that seems simpler than calling the pass from here.
https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:188: (createAtomicExpandPass(Target.get())); On 2015/07/10 at 18:07:32, JF wrote: > Could the pass instead go in lib/Transforms/NaCl/PNaClABISimplify.cpp, just before createRewriteAtomicsPass? It'll need a target, which for now would still be the hacky x86 one, but that seems simpler than calling the pass from here. No, because -atomic-expand will expand stuff that this pass is capable of expanding, and will expand it to code that's less efficient. Not that we couldn't run this pass, then -atomic-expand, then this pass again, but that would result in three whole passes over the module. By running the pass "inline", we reduce the computational cost of the IR simplification passes by only running ourselves over the functions that AtomicExpandPass modifies. I would argue that use of most of these extra ops are the exceptional case, rather than the norm, so this saves us a few cycles. That said, I've modified the visitor to save even more cycles by selectively running -atomic-expand and this pass on only those functions which we know will need it. It might not be simpler, but as I've said before, Rust generates fairly obese modules and I'd like Rust's tests to not take hours upon hours to run.
On 2015/07/10 21:08:17, Richard Diamond wrote: > https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... > File lib/Transforms/NaCl/RewriteAtomics.cpp (right): > > https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... > lib/Transforms/NaCl/RewriteAtomics.cpp:188: > (createAtomicExpandPass(Target.get())); > On 2015/07/10 at 18:07:32, JF wrote: > > Could the pass instead go in lib/Transforms/NaCl/PNaClABISimplify.cpp, just > before createRewriteAtomicsPass? It'll need a target, which for now would still > be the hacky x86 one, but that seems simpler than calling the pass from here. > > No, because -atomic-expand will expand stuff that this pass is capable of > expanding, and will expand it to code that's less efficient. Not that we > couldn't run this pass, then -atomic-expand, then this pass again, but that > would result in three whole passes over the module. By running the pass > "inline", we reduce the computational cost of the IR simplification passes by > only running ourselves over the functions that AtomicExpandPass modifies. I > would argue that use of most of these extra ops are the exceptional case, rather > than the norm, so this saves us a few cycles. That said, I've modified the > visitor to save even more cycles by selectively running -atomic-expand and this > pass on only those functions which we know will need it. > > It might not be simpler, but as I've said before, Rust generates fairly obese > modules and I'd like Rust's tests to not take hours upon hours to run. My main worry is maintainability: the LLVM pass code is changing quite a bit upstream, and this will be a headache to merge and fix in the future. IIUC where things are going the eventual solution will be to run this pass separately, so I'd rather do that now. Good idea on caching which functions need to be analyzed. I think a good solution would be to have an analysis pass which figures out which functions have atomic operations, and then proceed with what you've described (rewrite atomics, expand odd ones, and rewrite what's left). I don't think that doing this will be particularly expensive either.
On 2015/07/10 at 21:46:53, jfb wrote: > On 2015/07/10 21:08:17, Richard Diamond wrote: > > https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... > > File lib/Transforms/NaCl/RewriteAtomics.cpp (right): > > > > https://codereview.chromium.org/927493002/diff/60001/lib/Transforms/NaCl/Rewr... > > lib/Transforms/NaCl/RewriteAtomics.cpp:188: > > (createAtomicExpandPass(Target.get())); > > On 2015/07/10 at 18:07:32, JF wrote: > > > Could the pass instead go in lib/Transforms/NaCl/PNaClABISimplify.cpp, just > > before createRewriteAtomicsPass? It'll need a target, which for now would still > > be the hacky x86 one, but that seems simpler than calling the pass from here. > > > > No, because -atomic-expand will expand stuff that this pass is capable of > > expanding, and will expand it to code that's less efficient. Not that we > > couldn't run this pass, then -atomic-expand, then this pass again, but that > > would result in three whole passes over the module. By running the pass > > "inline", we reduce the computational cost of the IR simplification passes by > > only running ourselves over the functions that AtomicExpandPass modifies. I > > would argue that use of most of these extra ops are the exceptional case, rather > > than the norm, so this saves us a few cycles. That said, I've modified the > > visitor to save even more cycles by selectively running -atomic-expand and this > > pass on only those functions which we know will need it. > > > > It might not be simpler, but as I've said before, Rust generates fairly obese > > modules and I'd like Rust's tests to not take hours upon hours to run. > > My main worry is maintainability: the LLVM pass code is changing quite a bit upstream, and this will be a headache to merge and fix in the future. IIUC where things are going the eventual solution will be to run this pass separately, so I'd rather do that now. Fair enough. (After starting the change) I've realized we can't create the target machine in `PNaClABISimplifyAddPostOptPasses` because then the lifetime of the TargetMachine will end when `PNaClABISimplifyAddPostOptPasses`s returns, whereas -atomic-expand expects it to live until at least after -atomic-expand has ran. I went ahead and modified -nacl-rewrite-atomics to be a function pass, so at least that part won't require maintenance in the future to take advantage of pass manager improvements in LLVM. > > Good idea on caching which functions need to be analyzed. I think a good solution would be to have an analysis pass which figures out which functions have atomic operations, and then proceed with what you've described (rewrite atomics, expand odd ones, and rewrite what's left). I don't think that doing this will be particularly expensive either. I agree and it would also benefit other passes. I'm at the very least willing to begin this work, but I'll have to dig into LLVM's analysis infrastructure before I can, so it should probably be done in another PR; until this PR is merged, Rust will always crash pnacl-opt.
JF, I've added the analysis pass we've discussed.
https://codereview.chromium.org/927493002/diff/80001/lib/Transforms/NaCl/Rewr... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/80001/lib/Transforms/NaCl/Rewr... lib/Transforms/NaCl/RewriteAtomics.cpp:93: /// the expection rather than the rule). "exception" https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... File include/llvm/Analysis/NaCl/SimplificationAnalyses.h (right): https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/SimplificationAnalyses.h:36: const SmallVectorImpl<FenceInst *> &getFences() const { return Fences; } Are these ever used in a non-const context by non-analysis code? https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/SimplificationAnalyses.h:55: } Do these ever get used? It doesn't seem like you should ever copy this class. https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/SimplificationAnalyses.h:96: class AtomicAnalysisWrapperPass : public FunctionPass { Why not just put everything in this class, instead of having 3 classes? https://codereview.chromium.org/927493002/diff/120001/include/llvm/Transforms... File include/llvm/Transforms/NaCl.h (right): https://codereview.chromium.org/927493002/diff/120001/include/llvm/Transforms... include/llvm/Transforms/NaCl.h:41: #undef PNACLPASS_LEGACY_ONLY Could you do this in a separate CL? https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/CMake... File lib/Analysis/NaCl/CMakeLists.txt (right): https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/CMake... lib/Analysis/NaCl/CMakeLists.txt:7: SimplificationAnalyses.cpp Prefix with PNaCl to be consistent. https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/Simpl... File lib/Analysis/NaCl/SimplificationAnalyses.cpp (right): https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/Simpl... lib/Analysis/NaCl/SimplificationAnalyses.cpp:23: for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) { Use inst_range instead? Or just make this an InstVisitor.
https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... File include/llvm/Analysis/NaCl/SimplificationAnalyses.h (right): https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/SimplificationAnalyses.h:36: const SmallVectorImpl<FenceInst *> &getFences() const { return Fences; } On 2015/07/23 at 22:44:38, JF wrote: > Are these ever used in a non-const context by non-analysis code? Probably not, removed. https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/SimplificationAnalyses.h:55: } On 2015/07/23 at 22:44:38, JF wrote: > Do these ever get used? It doesn't seem like you should ever copy this class. Yes, the legacy wrapper uses them when it's run over a new function and MSVC doesn't generate them automatically. https://codereview.chromium.org/927493002/diff/120001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/SimplificationAnalyses.h:96: class AtomicAnalysisWrapperPass : public FunctionPass { On 2015/07/23 at 22:44:38, JF wrote: > Why not just put everything in this class, instead of having 3 classes? `AtomicAnalysis` && `AtomicInfo` are for the new pass managers and AIUI must be split (the managers will cache AtomicInfo). Additionally, the new managers use templates to discover what type of pass something is (ie what IRUnitT it uses) and doesn't use any sort of inheritance. https://codereview.chromium.org/927493002/diff/120001/include/llvm/Transforms... File include/llvm/Transforms/NaCl.h (right): https://codereview.chromium.org/927493002/diff/120001/include/llvm/Transforms... include/llvm/Transforms/NaCl.h:41: #undef PNACLPASS_LEGACY_ONLY On 2015/07/23 at 22:44:38, JF wrote: > Could you do this in a separate CL? Done. https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/CMake... File lib/Analysis/NaCl/CMakeLists.txt (right): https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/CMake... lib/Analysis/NaCl/CMakeLists.txt:7: SimplificationAnalyses.cpp On 2015/07/23 at 22:44:38, JF wrote: > Prefix with PNaCl to be consistent. Done. https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/Simpl... File lib/Analysis/NaCl/SimplificationAnalyses.cpp (right): https://codereview.chromium.org/927493002/diff/120001/lib/Analysis/NaCl/Simpl... lib/Analysis/NaCl/SimplificationAnalyses.cpp:23: for (inst_iterator I = inst_begin(F), E = inst_end(F); I != E; ++I) { On 2015/07/23 at 22:44:38, JF wrote: > Use inst_range instead? Or just make this an InstVisitor. Done.
https://codereview.chromium.org/927493002/diff/140001/include/llvm/Analysis/N... File include/llvm/Analysis/NaCl/PNaClSimplificationAnalyses.h (right): https://codereview.chromium.org/927493002/diff/140001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/PNaClSimplificationAnalyses.h:86: AtomicAnalysis &operator=(AtomicAnalysis &&) { return *this; } Do you need these? The default should be enough. https://codereview.chromium.org/927493002/diff/140001/lib/Analysis/NaCl/PNaCl... File lib/Analysis/NaCl/PNaClSimplificationAnalyses.cpp (right): https://codereview.chromium.org/927493002/diff/140001/lib/Analysis/NaCl/PNaCl... lib/Analysis/NaCl/PNaClSimplificationAnalyses.cpp:38: NeedsAtomicExpand = true; Don't leave a fallthrough here, usual LLVM style tries to avoid it (or points out where it's used when necessary) because it's easier to read. https://codereview.chromium.org/927493002/diff/140001/lib/Transforms/NaCl/Rew... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/140001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:84: AtomicRMWExpander.reset(createAtomicExpandPass(Target.get())); Now that you have an analysis pass, can you move this code to the part that invokes the passes, so that we can create the triple there and pass it to other passes? https://codereview.chromium.org/927493002/diff/140001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:253: const bool Expanded = AtomicRMWExpander->runOnFunction(F); Once the target is present, this can be done automatically by the pass manager (assuming you add it as a pass).
https://codereview.chromium.org/927493002/diff/140001/include/llvm/Analysis/N... File include/llvm/Analysis/NaCl/PNaClSimplificationAnalyses.h (right): https://codereview.chromium.org/927493002/diff/140001/include/llvm/Analysis/N... include/llvm/Analysis/NaCl/PNaClSimplificationAnalyses.h:86: AtomicAnalysis &operator=(AtomicAnalysis &&) { return *this; } On 2015/07/27 at 19:49:28, JF wrote: > Do you need these? The default should be enough. All the other analysis have them, citing MSVC as the culprit. https://codereview.chromium.org/927493002/diff/140001/lib/Analysis/NaCl/PNaCl... File lib/Analysis/NaCl/PNaClSimplificationAnalyses.cpp (right): https://codereview.chromium.org/927493002/diff/140001/lib/Analysis/NaCl/PNaCl... lib/Analysis/NaCl/PNaClSimplificationAnalyses.cpp:38: NeedsAtomicExpand = true; On 2015/07/27 at 19:49:28, JF wrote: > Don't leave a fallthrough here, usual LLVM style tries to avoid it (or points out where it's used when necessary) because it's easier to read. That wasn't intended. I think Rust's lack of fall through got to me. Fixed. https://codereview.chromium.org/927493002/diff/140001/lib/Transforms/NaCl/Rew... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/140001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:65: // For rewritting nand, (u)max, (u)min rmw atomics: Note to (re)views: once http://reviews.llvm.org/D11422 is accepted upstream and cherry picked into `pnacl-llvm`, this kluge will be removed infavor of `llvm::expandAtomicRMWToCmpXchg`.
https://codereview.chromium.org/927493002/diff/140001/lib/Transforms/NaCl/Rew... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/140001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:65: // For rewritting nand, (u)max, (u)min rmw atomics: On 2015/07/30 at 11:47:08, Richard Diamond wrote: > Note to (re)views: once http://reviews.llvm.org/D11422 is accepted upstream and cherry picked into `pnacl-llvm`, this kluge will be removed infavor of `llvm::expandAtomicRMWToCmpXchg`. reviewers*
Updated for the AtomicExpand refactor.
https://codereview.chromium.org/927493002/diff/200001/lib/Transforms/NaCl/Rew... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/200001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:407: auto Factory = [&] (IRBuilder<> &Builder, Value *Addr, I'd rather explicitly list the captures: [this, &I] https://codereview.chromium.org/927493002/diff/200001/test/Transforms/NaCl/at... File test/Transforms/NaCl/atomic/extra-rmw-operations.ll (right): https://codereview.chromium.org/927493002/diff/200001/test/Transforms/NaCl/at... test/Transforms/NaCl/atomic/extra-rmw-operations.ll:14: ; CHECK: @llvm.nacl.atomic.cmpxchg Can you keep one of them expanded, so we can check what the code is?
Done.
Wait, check doesn't pass.
On 2015/08/04 at 19:08:56, Richard Diamond wrote: > Wait, check doesn't pass. Fixed.
https://codereview.chromium.org/927493002/diff/260001/include/llvm/CodeGen/At... File include/llvm/CodeGen/AtomicExpandUtils.h (right): https://codereview.chromium.org/927493002/diff/260001/include/llvm/CodeGen/At... include/llvm/CodeGen/AtomicExpandUtils.h:1: //===-- AtomicExpandUtils.h - Utilities for expanding atomic instructions -===// This file should go away from the diff once you rebase? https://codereview.chromium.org/927493002/diff/260001/lib/Transforms/NaCl/Rew... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/260001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:98: freezeMemoryOrder(const Instruction &I, AtomicOrdering S, Why change the signature? The only valid input is cmpxchg because of the success/failure ordering. https://codereview.chromium.org/927493002/diff/260001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:525: class RewriteAtomics : public FunctionPass { Doesn't this still have to be a ModulePass because it can add intrinsic declarations into the module? https://codereview.chromium.org/927493002/diff/260001/test/Transforms/NaCl/at... File test/Transforms/NaCl/atomic/extra-rmw-operations.ll (right): https://codereview.chromium.org/927493002/diff/260001/test/Transforms/NaCl/at... test/Transforms/NaCl/atomic/extra-rmw-operations.ll:14: ; CHECK-NEXT: %1 = load i8, i8* %ptr, align 8 This load needs to be atomic, or we need a fence before it. The alignment is also wrong, since the input is just 1 byte.
https://codereview.chromium.org/927493002/diff/260001/lib/Transforms/NaCl/Rew... File lib/Transforms/NaCl/RewriteAtomics.cpp (right): https://codereview.chromium.org/927493002/diff/260001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:98: freezeMemoryOrder(const Instruction &I, AtomicOrdering S, On 2015/08/05 at 16:25:21, JF wrote: > Why change the signature? The only valid input is cmpxchg because of the success/failure ordering. In the callback provided to `llvm::expandAtomicRMWToCmpXchg`, we don't have a `AtomicCmpXchgInst` to use. https://codereview.chromium.org/927493002/diff/260001/lib/Transforms/NaCl/Rew... lib/Transforms/NaCl/RewriteAtomics.cpp:525: class RewriteAtomics : public FunctionPass { On 2015/08/05 at 16:25:21, JF wrote: > Doesn't this still have to be a ModulePass because it can add intrinsic declarations into the module? No, because `AddPNaClExternalDecls` does that. https://codereview.chromium.org/927493002/diff/260001/test/Transforms/NaCl/at... File test/Transforms/NaCl/atomic/extra-rmw-operations.ll (right): https://codereview.chromium.org/927493002/diff/260001/test/Transforms/NaCl/at... test/Transforms/NaCl/atomic/extra-rmw-operations.ll:14: ; CHECK-NEXT: %1 = load i8, i8* %ptr, align 8 On 2015/08/05 at 16:25:21, JF wrote: > This load needs to be atomic, or we need a fence before it. > > The alignment is also wrong, since the input is just 1 byte. Hm, indeed this is wrong. This patch isn't the culprit, though. I'll have to submit a patch upstream fixing `llvm::expandAtomicRMWToCmpXchg`.
> https://codereview.chromium.org/927493002/diff/260001/test/Transforms/NaCl/at... > test/Transforms/NaCl/atomic/extra-rmw-operations.ll:14: ; CHECK-NEXT: %1 = load i8, i8* %ptr, align 8 > On 2015/08/05 at 16:25:21, JF wrote: > > This load needs to be atomic, or we need a fence before it. I've been giving this some thought. I've come to the conclusion that the load on the incoming edge of the loop doesn't *need* to be atomic. The loop doesn't contain any stores other than the cmpxchg and that's guarded with the `lock` prefix (or a fence or whatever). Therefore it doesn't violate any memory order semantics with it's absence. I'll submit a patch upstream to fix the comment, but that shouldn't block this. > > > > The alignment is also wrong, since the input is just 1 byte. > > Hm, indeed this is wrong. This patch isn't the culprit, though. I'll have to submit a patch upstream fixing `llvm::expandAtomicRMWToCmpXchg`. Now that the upstream patch has been cherry picked into `pnacl-llvm`, this patch should be ready to land, assuming you have no more issues with it.
On 2015/08/22 06:51:42, Richard Diamond wrote: > > > https://codereview.chromium.org/927493002/diff/260001/test/Transforms/NaCl/at... > > test/Transforms/NaCl/atomic/extra-rmw-operations.ll:14: ; CHECK-NEXT: %1 = > load i8, i8* %ptr, align 8 > > On 2015/08/05 at 16:25:21, JF wrote: > > > This load needs to be atomic, or we need a fence before it. > > I've been giving this some thought. I've come to the conclusion that the load on > the incoming edge of the loop doesn't *need* to be atomic. The loop doesn't > contain any stores other than the cmpxchg and that's guarded with the `lock` > prefix (or a fence or whatever). Therefore it doesn't violate any memory order > semantics with it's absence. > > I'll submit a patch upstream to fix the comment, but that shouldn't block this. It does need to be atomic: on weak memory systems the CPU can speculate on subsequent loads (e.g. the cmpxchg) and observe them without honoring the happens-before ordering of the corresponding stores. This is the "load buffering" problem in literature, and occurs on ARM and POWER. > > > The alignment is also wrong, since the input is just 1 byte. > > > > Hm, indeed this is wrong. This patch isn't the culprit, though. I'll have to > submit a patch upstream fixing `llvm::expandAtomicRMWToCmpXchg`. > > Now that the upstream patch has been cherry picked into `pnacl-llvm`, this patch > should be ready to land, assuming you have no more issues with it. Could you add a TODO in the test, to denote that `atomic` will be fixed separately, as will alignment? The test will start failing once your fix lands upstream and we cherry-pick it, so it'll be nice to have a comment over the failure explaining that this is expected! Looks good otherwise.
On 2015/08/25 at 19:37:46, jfb wrote: > On 2015/08/22 06:51:42, Richard Diamond wrote: > > > > > https://codereview.chromium.org/927493002/diff/260001/test/Transforms/NaCl/at... > > > test/Transforms/NaCl/atomic/extra-rmw-operations.ll:14: ; CHECK-NEXT: %1 = > > load i8, i8* %ptr, align 8 > > > On 2015/08/05 at 16:25:21, JF wrote: > > > > This load needs to be atomic, or we need a fence before it. > > > > I've been giving this some thought. I've come to the conclusion that the load on > > the incoming edge of the loop doesn't *need* to be atomic. The loop doesn't > > contain any stores other than the cmpxchg and that's guarded with the `lock` > > prefix (or a fence or whatever). Therefore it doesn't violate any memory order > > semantics with it's absence. > > > > I'll submit a patch upstream to fix the comment, but that shouldn't block this. > > It does need to be atomic: on weak memory systems the CPU can speculate on subsequent loads (e.g. the cmpxchg) and observe them without honoring the happens-before ordering of the corresponding stores. This is the "load buffering" problem in literature, and occurs on ARM and POWER. Hm. Which is a problem if the cmpxchg's are subsequently expanded during bitcode translation. But don't both ARM && PPC expand the resulting cmpxchg's into linked loads && conditional stores with memory fences? > > > > > The alignment is also wrong, since the input is just 1 byte. > > > > > > Hm, indeed this is wrong. This patch isn't the culprit, though. I'll have to > > submit a patch upstream fixing `llvm::expandAtomicRMWToCmpXchg`. > > > > Now that the upstream patch has been cherry picked into `pnacl-llvm`, this patch > > should be ready to land, assuming you have no more issues with it. > > Could you add a TODO in the test, to denote that `atomic` will be fixed separately, as will alignment? The test will start failing once your fix lands upstream and we cherry-pick it, so it'll be nice to have a comment over the failure explaining that this is expected! > > Looks good otherwise.
> > It does need to be atomic: on weak memory systems the CPU can speculate on > subsequent loads (e.g. the cmpxchg) and observe them without honoring the > happens-before ordering of the corresponding stores. This is the "load > buffering" problem in literature, and occurs on ARM and POWER. > > Hm. Which is a problem if the cmpxchg's are subsequently expanded during bitcode > translation. > > But don't both ARM && PPC expand the resulting cmpxchg's into linked loads && > conditional stores with memory fences? What ARM and PPC do isn't known by LLVM at this point in time (or rather, the IR transform doesn't ask the target if a non-atomic load was OK here, it just assumes it is!). This break LLVM's memory model without an explicit opt-in from the target: other optimizations could occur in between this and further lowering. |