Chromium Code Reviews| OLD | NEW |
|---|---|
| 1 // | 1 // |
| 2 // The Subzero Code Generator | 2 // The Subzero Code Generator |
| 3 // | 3 // |
| 4 // This file is distributed under the University of Illinois Open Source | 4 // This file is distributed under the University of Illinois Open Source |
| 5 // License. See LICENSE.TXT for details. | 5 // License. See LICENSE.TXT for details. |
| 6 // | 6 // |
| 7 //===----------------------------------------------------------------------===// | 7 //===----------------------------------------------------------------------===// |
| 8 /// | 8 /// |
| 9 /// \file | 9 /// \file |
| 10 /// \brief Implements the TargetLoweringMIPS32 class, which consists almost | 10 /// \brief Implements the TargetLoweringMIPS32 class, which consists almost |
| (...skipping 98 matching lines...) Expand 10 before | Expand all | Expand 10 after Loading... | |
| 109 void TargetMIPS32::assignVarStackSlots(VarList &SortedSpilledVariables, | 109 void TargetMIPS32::assignVarStackSlots(VarList &SortedSpilledVariables, |
| 110 size_t SpillAreaPaddingBytes, | 110 size_t SpillAreaPaddingBytes, |
| 111 size_t SpillAreaSizeBytes, | 111 size_t SpillAreaSizeBytes, |
| 112 size_t GlobalsAndSubsequentPaddingSize) { | 112 size_t GlobalsAndSubsequentPaddingSize) { |
| 113 const VariablesMetadata *VMetadata = Func->getVMetadata(); | 113 const VariablesMetadata *VMetadata = Func->getVMetadata(); |
| 114 size_t GlobalsSpaceUsed = SpillAreaPaddingBytes; | 114 size_t GlobalsSpaceUsed = SpillAreaPaddingBytes; |
| 115 size_t NextStackOffset = SpillAreaPaddingBytes; | 115 size_t NextStackOffset = SpillAreaPaddingBytes; |
| 116 CfgVector<size_t> LocalsSize(Func->getNumNodes()); | 116 CfgVector<size_t> LocalsSize(Func->getNumNodes()); |
| 117 const bool SimpleCoalescing = !callsReturnsTwice(); | 117 const bool SimpleCoalescing = !callsReturnsTwice(); |
| 118 | 118 |
| 119 for (Variable *Var : SortedSpilledVariables) { | 119 // We have sorted the spilled variables based on their alignment. |
|
Jim Stichnoth
2016/10/19 18:41:16
Reflow comment to 80-col. (M-q if using emacs...)
| |
| 120 // However, when alignments are equal, the variable with the lower | |
| 121 // index gets precedence. In the case of i64 variables (which we | |
| 122 // split into Hi and Lo parts), the Lo part is created first and | |
| 123 // thus gets a lower index than the Hi part. Because of this, the | |
| 124 // Lo part is allocated on the stack before its corresponding Hi part. | |
| 125 // As per the MIPS32 ABI, Hi part must be stored in the high address. | |
| 126 // We assign stack slots to the variables in reverse order | |
| 127 // so that Hi part can get high address. | |
| 128 for (Variable *Var : reverse_range(SortedSpilledVariables)) { | |
|
Jim Stichnoth
2016/10/19 18:41:16
This fix concerns me a bit.
The first concern is
jaydeep.patil
2016/10/20 04:35:40
The issue occurs when a spilled i64 is moved to f6
| |
| 120 size_t Increment = typeWidthInBytesOnStack(Var->getType()); | 129 size_t Increment = typeWidthInBytesOnStack(Var->getType()); |
| 121 if (SimpleCoalescing && VMetadata->isTracked(Var)) { | 130 if (SimpleCoalescing && VMetadata->isTracked(Var)) { |
| 122 if (VMetadata->isMultiBlock(Var)) { | 131 if (VMetadata->isMultiBlock(Var)) { |
| 123 GlobalsSpaceUsed += Increment; | 132 GlobalsSpaceUsed += Increment; |
| 124 NextStackOffset = GlobalsSpaceUsed; | 133 NextStackOffset = GlobalsSpaceUsed; |
| 125 } else { | 134 } else { |
| 126 SizeT NodeIndex = VMetadata->getLocalUseNode(Var)->getIndex(); | 135 SizeT NodeIndex = VMetadata->getLocalUseNode(Var)->getIndex(); |
| 127 LocalsSize[NodeIndex] += Increment; | 136 LocalsSize[NodeIndex] += Increment; |
| 128 NextStackOffset = SpillAreaPaddingBytes + | 137 NextStackOffset = SpillAreaPaddingBytes + |
| 129 GlobalsAndSubsequentPaddingSize + | 138 GlobalsAndSubsequentPaddingSize + |
| (...skipping 1638 matching lines...) Expand 10 before | Expand all | Expand 10 after Loading... | |
| 1768 } | 1777 } |
| 1769 MovInstr->setDeleted(); | 1778 MovInstr->setDeleted(); |
| 1770 return; | 1779 return; |
| 1771 } | 1780 } |
| 1772 } | 1781 } |
| 1773 | 1782 |
| 1774 if (!Dest->hasReg()) { | 1783 if (!Dest->hasReg()) { |
| 1775 auto *SrcR = llvm::cast<Variable>(Src); | 1784 auto *SrcR = llvm::cast<Variable>(Src); |
| 1776 assert(SrcR->hasReg()); | 1785 assert(SrcR->hasReg()); |
| 1777 assert(!SrcR->isRematerializable()); | 1786 assert(!SrcR->isRematerializable()); |
| 1778 int32_t Offset = 0; | 1787 int32_t Offset = Dest->getStackOffset(); |
| 1779 | |
| 1780 if (MovInstr->getDestHi() != nullptr) | |
| 1781 Offset = MovInstr->getDestHi()->getStackOffset(); | |
| 1782 else | |
| 1783 Offset = Dest->getStackOffset(); | |
| 1784 | 1788 |
| 1785 // This is a _mov(Mem(), Variable), i.e., a store. | 1789 // This is a _mov(Mem(), Variable), i.e., a store. |
| 1786 auto *Base = Target->getPhysicalRegister(Target->getFrameOrStackReg()); | 1790 auto *Base = Target->getPhysicalRegister(Target->getFrameOrStackReg()); |
| 1787 | 1791 |
| 1788 OperandMIPS32Mem *Addr = OperandMIPS32Mem::create( | 1792 OperandMIPS32Mem *Addr = OperandMIPS32Mem::create( |
| 1789 Target->Func, DestTy, Base, | 1793 Target->Func, DestTy, Base, |
| 1790 llvm::cast<ConstantInteger32>(Target->Ctx->getConstantInt32(Offset))); | 1794 llvm::cast<ConstantInteger32>(Target->Ctx->getConstantInt32(Offset))); |
| 1791 | 1795 |
| 1792 // FP arguments are passed in GP reg if first argument is in GP. In this | 1796 // FP arguments are passed in GP reg if first argument is in GP. In this |
| 1793 // case type of the SrcR is still FP thus we need to explicitly generate sw | 1797 // case type of the SrcR is still FP thus we need to explicitly generate sw |
| (...skipping 12 matching lines...) Expand all Loading... | |
| 1806 Target->Func, DestTy, Base, | 1810 Target->Func, DestTy, Base, |
| 1807 llvm::cast<ConstantInteger32>( | 1811 llvm::cast<ConstantInteger32>( |
| 1808 Target->Ctx->getConstantInt32(Offset + 4))); | 1812 Target->Ctx->getConstantInt32(Offset + 4))); |
| 1809 Target->_sw(SrcGPRLo, Addr); | 1813 Target->_sw(SrcGPRLo, Addr); |
| 1810 Target->_sw(SrcGPRHi, AddrHi); | 1814 Target->_sw(SrcGPRHi, AddrHi); |
| 1811 } else if (DestTy == IceType_f64 && IsSrcGPReg) { | 1815 } else if (DestTy == IceType_f64 && IsSrcGPReg) { |
| 1812 const auto FirstReg = | 1816 const auto FirstReg = |
| 1813 (llvm::cast<Variable>(MovInstr->getSrc(0)))->getRegNum(); | 1817 (llvm::cast<Variable>(MovInstr->getSrc(0)))->getRegNum(); |
| 1814 const auto SecondReg = | 1818 const auto SecondReg = |
| 1815 (llvm::cast<Variable>(MovInstr->getSrc(1)))->getRegNum(); | 1819 (llvm::cast<Variable>(MovInstr->getSrc(1)))->getRegNum(); |
| 1816 Variable *SrcGPRHi = Target->makeReg(IceType_i32, SecondReg); | 1820 Variable *SrcGPRHi = Target->makeReg(IceType_i32, FirstReg); |
| 1817 Variable *SrcGPRLo = Target->makeReg(IceType_i32, FirstReg); | 1821 Variable *SrcGPRLo = Target->makeReg(IceType_i32, SecondReg); |
| 1818 OperandMIPS32Mem *AddrHi = OperandMIPS32Mem::create( | 1822 OperandMIPS32Mem *AddrHi = OperandMIPS32Mem::create( |
| 1819 Target->Func, DestTy, Base, | 1823 Target->Func, DestTy, Base, |
| 1820 llvm::cast<ConstantInteger32>( | 1824 llvm::cast<ConstantInteger32>( |
| 1821 Target->Ctx->getConstantInt32(Offset + 4))); | 1825 Target->Ctx->getConstantInt32(Offset + 4))); |
| 1822 Target->_sw(SrcGPRLo, Addr); | 1826 Target->_sw(SrcGPRLo, Addr); |
| 1823 Target->_sw(SrcGPRHi, AddrHi); | 1827 Target->_sw(SrcGPRHi, AddrHi); |
| 1824 } else { | 1828 } else { |
| 1825 Target->_sw(SrcR, Addr); | 1829 Target->_sw(SrcR, Addr); |
| 1826 } | 1830 } |
| 1827 | 1831 |
| (...skipping 701 matching lines...) Expand 10 before | Expand all | Expand 10 after Loading... | |
| 2529 _mul(T, Src0R, Src1R); | 2533 _mul(T, Src0R, Src1R); |
| 2530 _mov(Dest, T); | 2534 _mov(Dest, T); |
| 2531 return; | 2535 return; |
| 2532 } | 2536 } |
| 2533 case InstArithmetic::Shl: { | 2537 case InstArithmetic::Shl: { |
| 2534 _sllv(T, Src0R, Src1R); | 2538 _sllv(T, Src0R, Src1R); |
| 2535 _mov(Dest, T); | 2539 _mov(Dest, T); |
| 2536 return; | 2540 return; |
| 2537 } | 2541 } |
| 2538 case InstArithmetic::Lshr: { | 2542 case InstArithmetic::Lshr: { |
| 2539 _srlv(T, Src0R, Src1R); | 2543 auto *T0R = Src0R; |
| 2544 auto *T1R = Src1R; | |
| 2545 if (Dest->getType() != IceType_i32) { | |
| 2546 T0R = makeReg(IceType_i32); | |
| 2547 lowerCast(InstCast::create(Func, InstCast::Zext, T0R, Src0R)); | |
| 2548 T1R = makeReg(IceType_i32); | |
| 2549 lowerCast(InstCast::create(Func, InstCast::Zext, T1R, Src1R)); | |
| 2550 } | |
| 2551 _srlv(T, T0R, T1R); | |
| 2540 _mov(Dest, T); | 2552 _mov(Dest, T); |
| 2541 return; | 2553 return; |
| 2542 } | 2554 } |
| 2543 case InstArithmetic::Ashr: { | 2555 case InstArithmetic::Ashr: { |
| 2544 _srav(T, Src0R, Src1R); | 2556 _srav(T, Src0R, Src1R); |
| 2545 _mov(Dest, T); | 2557 _mov(Dest, T); |
| 2546 return; | 2558 return; |
| 2547 } | 2559 } |
| 2548 case InstArithmetic::Udiv: { | 2560 case InstArithmetic::Udiv: { |
| 2549 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); | 2561 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); |
| 2550 _divu(T_Zero, Src0R, Src1R); | 2562 auto *T0R = Src0R; |
| 2551 _teq(Src1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero | 2563 auto *T1R = Src1R; |
| 2564 if (Dest->getType() != IceType_i32) { | |
| 2565 T0R = makeReg(IceType_i32); | |
| 2566 lowerCast(InstCast::create(Func, InstCast::Zext, T0R, Src0R)); | |
| 2567 T1R = makeReg(IceType_i32); | |
| 2568 lowerCast(InstCast::create(Func, InstCast::Zext, T1R, Src1R)); | |
| 2569 } | |
| 2570 _divu(T_Zero, T0R, T1R); | |
| 2571 _teq(T1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero | |
| 2552 _mflo(T, T_Zero); | 2572 _mflo(T, T_Zero); |
| 2553 _mov(Dest, T); | 2573 _mov(Dest, T); |
| 2554 return; | 2574 return; |
| 2555 } | 2575 } |
| 2556 case InstArithmetic::Sdiv: { | 2576 case InstArithmetic::Sdiv: { |
| 2557 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); | 2577 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); |
| 2558 _div(T_Zero, Src0R, Src1R); | 2578 _div(T_Zero, Src0R, Src1R); |
| 2559 _teq(Src1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero | 2579 _teq(Src1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero |
| 2560 _mflo(T, T_Zero); | 2580 _mflo(T, T_Zero); |
| 2561 _mov(Dest, T); | 2581 _mov(Dest, T); |
| 2562 return; | 2582 return; |
| 2563 } | 2583 } |
| 2564 case InstArithmetic::Urem: { | 2584 case InstArithmetic::Urem: { |
| 2565 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); | 2585 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); |
| 2566 _divu(T_Zero, Src0R, Src1R); | 2586 auto *T0R = Src0R; |
| 2567 _teq(Src1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero | 2587 auto *T1R = Src1R; |
| 2588 if (Dest->getType() != IceType_i32) { | |
| 2589 T0R = makeReg(IceType_i32); | |
| 2590 lowerCast(InstCast::create(Func, InstCast::Zext, T0R, Src0R)); | |
| 2591 T1R = makeReg(IceType_i32); | |
| 2592 lowerCast(InstCast::create(Func, InstCast::Zext, T1R, Src1R)); | |
| 2593 } | |
| 2594 _divu(T_Zero, T0R, T1R); | |
| 2595 _teq(T1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero | |
| 2568 _mfhi(T, T_Zero); | 2596 _mfhi(T, T_Zero); |
| 2569 _mov(Dest, T); | 2597 _mov(Dest, T); |
| 2570 return; | 2598 return; |
| 2571 } | 2599 } |
| 2572 case InstArithmetic::Srem: { | 2600 case InstArithmetic::Srem: { |
| 2573 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); | 2601 auto *T_Zero = I32Reg(RegMIPS32::Reg_ZERO); |
| 2574 _div(T_Zero, Src0R, Src1R); | 2602 _div(T_Zero, Src0R, Src1R); |
| 2575 _teq(Src1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero | 2603 _teq(Src1R, T_Zero, DivideByZeroTrapCode); // Trap if divide-by-zero |
| 2576 _mfhi(T, T_Zero); | 2604 _mfhi(T, T_Zero); |
| 2577 _mov(Dest, T); | 2605 _mov(Dest, T); |
| (...skipping 763 matching lines...) Expand 10 before | Expand all | Expand 10 after Loading... | |
| 3341 UnimplementedLoweringError(this, Instr); | 3369 UnimplementedLoweringError(this, Instr); |
| 3342 break; | 3370 break; |
| 3343 } | 3371 } |
| 3344 case InstCast::Sitofp: | 3372 case InstCast::Sitofp: |
| 3345 case InstCast::Uitofp: { | 3373 case InstCast::Uitofp: { |
| 3346 if (llvm::isa<Variable64On32>(Dest)) { | 3374 if (llvm::isa<Variable64On32>(Dest)) { |
| 3347 llvm::report_fatal_error("i64-to-fp should have been prelowered."); | 3375 llvm::report_fatal_error("i64-to-fp should have been prelowered."); |
| 3348 return; | 3376 return; |
| 3349 } | 3377 } |
| 3350 if (Src0Ty != IceType_i64) { | 3378 if (Src0Ty != IceType_i64) { |
| 3379 Variable *Src0R = legalizeToReg(Src0); | |
| 3380 auto *T0R = Src0R; | |
| 3381 if (Src0Ty != IceType_i32 && CastKind == InstCast::Uitofp) { | |
| 3382 T0R = makeReg(IceType_i32); | |
| 3383 lowerCast(InstCast::create(Func, InstCast::Zext, T0R, Src0R)); | |
| 3384 } | |
| 3351 if (isScalarIntegerType(Src0Ty) && DestTy == IceType_f32) { | 3385 if (isScalarIntegerType(Src0Ty) && DestTy == IceType_f32) { |
| 3352 Variable *Src0R = legalizeToReg(Src0); | |
| 3353 Variable *FTmp1 = makeReg(IceType_f32); | 3386 Variable *FTmp1 = makeReg(IceType_f32); |
| 3354 Variable *FTmp2 = makeReg(IceType_f32); | 3387 Variable *FTmp2 = makeReg(IceType_f32); |
| 3355 _mtc1(FTmp1, Src0R); | 3388 _mtc1(FTmp1, T0R); |
| 3356 _cvt_s_w(FTmp2, FTmp1); | 3389 _cvt_s_w(FTmp2, FTmp1); |
| 3357 _mov(Dest, FTmp2); | 3390 _mov(Dest, FTmp2); |
| 3358 return; | 3391 return; |
| 3359 } | 3392 } |
| 3360 if (isScalarIntegerType(Src0Ty) && DestTy == IceType_f64) { | 3393 if (isScalarIntegerType(Src0Ty) && DestTy == IceType_f64) { |
| 3361 Variable *Src0R = legalizeToReg(Src0); | |
| 3362 Variable *FTmp1 = makeReg(IceType_f64); | 3394 Variable *FTmp1 = makeReg(IceType_f64); |
| 3363 Variable *FTmp2 = makeReg(IceType_f64); | 3395 Variable *FTmp2 = makeReg(IceType_f64); |
| 3364 _mtc1(FTmp1, Src0R); | 3396 _mtc1(FTmp1, T0R); |
| 3365 _cvt_d_w(FTmp2, FTmp1); | 3397 _cvt_d_w(FTmp2, FTmp1); |
| 3366 _mov(Dest, FTmp2); | 3398 _mov(Dest, FTmp2); |
| 3367 return; | 3399 return; |
| 3368 } | 3400 } |
| 3369 } | 3401 } |
| 3370 UnimplementedLoweringError(this, Instr); | 3402 UnimplementedLoweringError(this, Instr); |
| 3371 break; | 3403 break; |
| 3372 } | 3404 } |
| 3373 case InstCast::Bitcast: { | 3405 case InstCast::Bitcast: { |
| 3374 Operand *Src0 = Instr->getSrc(0); | 3406 Operand *Src0 = Instr->getSrc(0); |
| (...skipping 680 matching lines...) Expand 10 before | Expand all | Expand 10 after Loading... | |
| 4055 lowerAssign(Assign); | 4087 lowerAssign(Assign); |
| 4056 return; | 4088 return; |
| 4057 } | 4089 } |
| 4058 llvm::report_fatal_error("InsertElement requires a constant index"); | 4090 llvm::report_fatal_error("InsertElement requires a constant index"); |
| 4059 } | 4091 } |
| 4060 | 4092 |
| 4061 void TargetMIPS32::lowerIntrinsicCall(const InstIntrinsicCall *Instr) { | 4093 void TargetMIPS32::lowerIntrinsicCall(const InstIntrinsicCall *Instr) { |
| 4062 Variable *Dest = Instr->getDest(); | 4094 Variable *Dest = Instr->getDest(); |
| 4063 Type DestTy = (Dest == nullptr) ? IceType_void : Dest->getType(); | 4095 Type DestTy = (Dest == nullptr) ? IceType_void : Dest->getType(); |
| 4064 switch (Instr->getIntrinsicInfo().ID) { | 4096 switch (Instr->getIntrinsicInfo().ID) { |
| 4065 default: | |
| 4066 llvm::report_fatal_error("Unexpected intrinsic"); | |
| 4067 return; | |
| 4068 case Intrinsics::AtomicCmpxchg: { | 4097 case Intrinsics::AtomicCmpxchg: { |
| 4069 UnimplementedLoweringError(this, Instr); | 4098 UnimplementedLoweringError(this, Instr); |
| 4070 return; | 4099 return; |
| 4071 } | 4100 } |
| 4072 case Intrinsics::AtomicFence: | 4101 case Intrinsics::AtomicFence: |
| 4073 UnimplementedLoweringError(this, Instr); | 4102 UnimplementedLoweringError(this, Instr); |
| 4074 return; | 4103 return; |
| 4075 case Intrinsics::AtomicFenceAll: | 4104 case Intrinsics::AtomicFenceAll: |
| 4076 // NOTE: FenceAll should prevent any load/store from being moved across the | 4105 // NOTE: FenceAll should prevent any load/store from being moved across the |
| 4077 // fence (both atomic and non-atomic). The InstMIPS32Mfence instruction is | 4106 // fence (both atomic and non-atomic). The InstMIPS32Mfence instruction is |
| (...skipping 1167 matching lines...) Expand 10 before | Expand all | Expand 10 after Loading... | |
| 5245 Str << "\t.set\t" | 5274 Str << "\t.set\t" |
| 5246 << "nomips16\n"; | 5275 << "nomips16\n"; |
| 5247 } | 5276 } |
| 5248 | 5277 |
| 5249 SmallBitVector TargetMIPS32::TypeToRegisterSet[RCMIPS32_NUM]; | 5278 SmallBitVector TargetMIPS32::TypeToRegisterSet[RCMIPS32_NUM]; |
| 5250 SmallBitVector TargetMIPS32::TypeToRegisterSetUnfiltered[RCMIPS32_NUM]; | 5279 SmallBitVector TargetMIPS32::TypeToRegisterSetUnfiltered[RCMIPS32_NUM]; |
| 5251 SmallBitVector TargetMIPS32::RegisterAliases[RegMIPS32::Reg_NUM]; | 5280 SmallBitVector TargetMIPS32::RegisterAliases[RegMIPS32::Reg_NUM]; |
| 5252 | 5281 |
| 5253 } // end of namespace MIPS32 | 5282 } // end of namespace MIPS32 |
| 5254 } // end of namespace Ice | 5283 } // end of namespace Ice |
| OLD | NEW |