Issue 1655313002: Subzero: ARM32: lowering of vector insert and extract.

Eric Holk

eholk@chromium.org changed reviewers: + jpp@chromium.org, kschimpf@google.com, sehr@chromium.org, stichnot@chromium.org

4 years, 10 months ago (2016-02-02 20:51:07 UTC) #1

Eric Holk

This CL enables lowering of ARM32 vector insert and extract instructions. The integer variants lower ...

4 years, 10 months ago (2016-02-02 20:51:08 UTC) #2

Jim Stichnoth

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcode986 src/IceInstARM32.cpp:986: auto SrcReg = Src->getRegNum(); You will get some disagreement ...

4 years, 10 months ago (2016-02-03 15:28:38 UTC) #3

John

It would be nice if insert/extract also handled non-infinite-weight temporaries, and then just loaded/stored the ...

4 years, 10 months ago (2016-02-03 16:06:52 UTC) #4

It would be nice if insert/extract also handled non-infinite-weight temporaries,
and then just loaded/stored the desired element to memory.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp
File src/IceInstARM32.cpp (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:986: auto SrcReg = Src->getRegNum();
const auto

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:991: // This code assumes the Aliases list goes Q_n, S_2n,
S_2n+1. The asserts in
On 2016/02/03 15:28:37, stichnot wrote:
> I would really like to stay away from any assumptions on ordering of the
> register enum values or the Aliases list.
> 
> IIUC, the purpose of mapping from Q register to D registers is for syntactic
> correctness of the emit() output, but ultimately in the emitIAS() output you
> will just take the encoded register value of the Q register and apply a simple
> transformation that has nothing to do with our various orderings.
> 
> So I'm thinking maybe this method should return the encoded D register value
> instead of the enum value, and the emit() code could then call another
function
> that returns a register name given the encoded value and register class/type
> (presumably f64 in this case because we know that's how to get at D
registers).

he will actually need to provide a D register to the underlying assembler. I am
concerned about this as well, so I asked him to leave a TODO behind (see below)
until we come up with something better that does not require assumptions or
clever tricks.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:993: auto HalfWidth = VectorSize / 2;
optional: I am very auto-lenient, but this is probably too much

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1023: // For floating point values, we hope we got
allocated to Q0 - Q7, so we can
I would rephrase this. "hoping" conveys the idea that we cross our fingers that
Src was allocated the right register.

also, won't this assertion fail for non-fp types if, e.g., q15 is allocated? (I
had to search for the uses of getSRegisters() below to figure out that this will
not happen... maybe assert(Src->getType() == v4f32)?)

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1031: return
(RegARM32::AllRegisters)RegARM32::RegTable[SrcReg].Aliases[Index + 3];
I fear what will happen if the alias declaration ever changes. Please leave a
TODO(jpp), and describe that we need to make this more resilient to changes in
the alias declaration.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1050: auto BitSize = typeWidthInBits(DestTy);
instead of adding this new method, why don't you just do

auto BitSize = typeWidthInBytes() * CHAR_BIT

?

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h
File src/IceInstARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1371
src/IceInstARM32.h:1371: uint32_t Index;
const

also, why don't you add a

const uint32_t NumElements

?

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1376
src/IceInstARM32.h:1376: uint32_t getDIndex(uint32_t VectorSize) const;
VectorSize to me is always 128-bit. Maybe rename these to NumElements?

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32.cpp
File src/IceTargetLoweringARM32.cpp (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.cpp:3841: Context.insert<InstFakeDef>(Dest);
On 2016/02/03 15:28:37, stichnot wrote:
> Maybe I haven't thought this through well enough, but this just seems wrong.

Yup, this is wrong. you have to handle these (which are just stack 'alloca'ed
arrays.)

If I am not mistaken, legalizeToReg() below will emit code to load from memory

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32.h
File src/IceTargetLoweringARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.h:424: void _extract(Variable *Dest, Variable *Src,
uint32_t Index,
On 2016/02/03 15:28:38, stichnot wrote:
> I would probably rename Src to Src0 for consistency.

It's been common practice (at least in the ARM32 backend) to name something Src
if it is the only Src. Unfortunately, this is not always followed, but I think
it makes moar sense -- if I see Src0 in the middle of a function I am tempted to
think there might be a Src1.

regardless of parameter naming, I would name this method _extract_element (and
_insert_element) simply because _insert and _extract are too generic.

Eric Holk

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcode986 src/IceInstARM32.cpp:986: auto SrcReg = Src->getRegNum(); On 2016/02/03 15:28:37, stichnot wrote: ...

4 years, 10 months ago (2016-02-03 21:02:22 UTC) #5

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp
File src/IceInstARM32.cpp (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:986: auto SrcReg = Src->getRegNum();
On 2016/02/03 15:28:37, stichnot wrote:
> You will get some disagreement within the team, but I would like you to back
off
> on use of auto.
> 
> My approximate rules for when to use auto:
> 
> 1. Something like a cast operation (static_cast, llvm::dyn_cast) where the
> explicit type is completely redundant.
> 
>   const auto *Var = llvm::dyn_cast<Variable>(Opnd);
> 
> 2. Other operations where the the type is pretty clear and practically
redundant
> from context.
> 
>   auto *Instr = InstFoo::create();
> 
> 3. Situations where the actual type is obnoxiously verbose for no good reason.
> 
>   auto *Iter = MyContainer.begin();
>   auto Func = /* some lambda expression */;
> 
> Otherwise, code starts to become write-only and it gets really hard to make
> sense of the actual types without continual scrutinizing of header files.

I meant to remove most of my uses of auto before submitting this CL. I'm
generally a fan of auto since I've also spent a lot of time writing in
type-inferred languages, but the local style here is definitely to avoid using
it.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:986: auto SrcReg = Src->getRegNum();
On 2016/02/03 16:06:51, John wrote:
> const auto

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:991: // This code assumes the Aliases list goes Q_n, S_2n,
S_2n+1. The asserts in
On 2016/02/03 16:06:51, John wrote:
> On 2016/02/03 15:28:37, stichnot wrote:
> > I would really like to stay away from any assumptions on ordering of the
> > register enum values or the Aliases list.
> > 
> > IIUC, the purpose of mapping from Q register to D registers is for syntactic
> > correctness of the emit() output, but ultimately in the emitIAS() output you
> > will just take the encoded register value of the Q register and apply a
simple
> > transformation that has nothing to do with our various orderings.
> > 
> > So I'm thinking maybe this method should return the encoded D register value
> > instead of the enum value, and the emit() code could then call another
> function
> > that returns a register name given the encoded value and register class/type
> > (presumably f64 in this case because we know that's how to get at D
> registers).
> 
> he will actually need to provide a D register to the underlying assembler. I
am
> concerned about this as well, so I asked him to leave a TODO behind (see
below)
> until we come up with something better that does not require assumptions or
> clever tricks.

I've added the TODO.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:993: auto HalfWidth = VectorSize / 2;
On 2016/02/03 16:06:51, John wrote:
> optional: I am very auto-lenient, but this is probably too much

I removed it. HalfWidth was only used in one place, so I just replaced it with
VectorSize / 2.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1019: InstARM32ExtractInsert::getSRegister(Variable *Src)
const {
On 2016/02/03 15:28:37, stichnot wrote:
> Same comment as above.

Acknowledged.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1023: // For floating point values, we hope we got
allocated to Q0 - Q7, so we can
On 2016/02/03 16:06:51, John wrote:
> I would rephrase this. "hoping" conveys the idea that we cross our fingers
that
> Src was allocated the right register.
> 
> also, won't this assertion fail for non-fp types if, e.g., q15 is allocated?
(I
> had to search for the uses of getSRegisters() below to figure out that this
will
> not happen... maybe assert(Src->getType() == v4f32)?)

I fixed the comment. I wrote that before the higher up code was actually
guaranteeing that invariant.

I strengthened the assert too. This function should only be called on the v4f32
version anyway.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1031: return
(RegARM32::AllRegisters)RegARM32::RegTable[SrcReg].Aliases[Index + 3];
On 2016/02/03 16:06:51, John wrote:
> I fear what will happen if the alias declaration ever changes. Please leave a
> TODO(jpp), and describe that we need to make this more resilient to changes in
> the alias declaration.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1045: auto Src = llvm::dyn_cast<Variable>(getSrc(0));
On 2016/02/03 15:28:37, stichnot wrote:
> You can probably just use llvm::cast<> here and remove the assert.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.cpp#newcod...
src/IceInstARM32.cpp:1050: auto BitSize = typeWidthInBits(DestTy);
On 2016/02/03 16:06:51, John wrote:
> instead of adding this new method, why don't you just do
> 
> auto BitSize = typeWidthInBytes() * CHAR_BIT
> 
> ?

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h
File src/IceInstARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1368
src/IceInstARM32.h:1368: : InstARM32Pred(Func, InstARM32::Extract, 2, Dest,
Predicate),
On 2016/02/03 15:28:37, stichnot wrote:
> Since both Insert and Extract have a single source Operand (i.e., a single
call
> to addSource()), I would pass it in to this base class ctor and do the
> addSource() here.
> 
> Also, "2" is the max number of addSource() calls, so make it "1" instead.

Done.

Originally I thought there was going to be more divergence between the two
subclasses, but I guess now.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1371
src/IceInstARM32.h:1371: uint32_t Index;
On 2016/02/03 15:28:37, stichnot wrote:
> const uint32_t Index

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1371
src/IceInstARM32.h:1371: uint32_t Index;
On 2016/02/03 16:06:51, John wrote:
> const
> 
> also, why don't you add a
> 
> const uint32_t NumElements
> 
> ?

It wasn't immediately clear what the right NumElements was at this point,
although the higher level constructor knows, so that's not a big problem.

The floating point version of this instruction doesn't need NumElements, because
it's always 4.

It doesn't matter much to me, but it's not hard to compute NumElements and it
isn't always needed, so it seems better not to bother storing it here.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375
src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src,
uint32_t VectorSize) const;
On 2016/02/03 15:28:37, stichnot wrote:
> This isn't absolutely clear-cut, but I think it would be a bit better if these
> three functions can figure out on their own what Src and VectorSize should be,
> and not have to pass them as arguments.

I wasn't totally thrilled with the design I used here.

The challenge is that the source and vector size are determined by the
destination for insertelement, but by the source for extractelement. The common
code doesn't know which version it is, so you'd basically have to duplicate this
code in each of the subclasses.

Perhaps the better design is to make a function that's not part of the class
that takes a register, numelements and an index. As it is right now, it's
strange that the register and numelements are given explicitly but the index is
implicit.

It would actually be pretty easy to compute NumElements from the Src though,
which would at least make it a little cleaner. I went ahead and made this
change, but I'm open to making more changes.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1376
src/IceInstARM32.h:1376: uint32_t getDIndex(uint32_t VectorSize) const;
On 2016/02/03 16:06:51, John wrote:
> VectorSize to me is always 128-bit. Maybe rename these to NumElements?

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1381
src/IceInstARM32.h:1381: };
On 2016/02/03 15:28:37, stichnot wrote:
> Implement classof() for completeness.

Does it make sense to have classof here, given that this class should never be
instantiated except as a base of InstARM32Extract and InstARM32Insert (which
both implement classof)? For example, InstArm32Pred does not include classof,
that I can see.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1421
src/IceInstARM32.h:1421: InstARM32Insert(Cfg *Func, Variable *Dest, Variable
*Src0, uint32_t Index,
On 2016/02/03 15:28:37, stichnot wrote:
> This parameter is named Src0, the one for Extract is name Src; make
consistent.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceRegistersARM32.h
File src/IceRegistersARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceRegistersARM32.h#new...
src/IceRegistersARM32.h:222: // Extend enum RegClass with ARM32-specific
register classes (if any).
On 2016/02/03 15:28:37, stichnot wrote:
> I guess the "(if any)" comment can be dropped.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32.cpp
File src/IceTargetLoweringARM32.cpp (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.cpp:301: llvm::SmallBitVector
QtoSRegisters(RegARM32::Reg_NUM);
On 2016/02/03 15:28:37, stichnot wrote:
> I would list this before InvalidRegisters.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.cpp:3854: Src0->setRegClass(RegARM32::RCARM32_QtoS);
On 2016/02/03 15:28:38, stichnot wrote:
> It's possible (though unlikely) that legalizeToReg(Instr->getSrc(0)) just
> returns its argument without creating a new temporary, if src0 is already
known
> to be a pre-colored or infinite-weight variable.  It's also possible that it
> returns an existing temporary from the previous instruction lowering due to
the
> local availability optimization.
> 
> In either of these cases, you really don't want to override Src0's register
> class.
> 
> Instead, move Src0 into a new temporary with the desired register class.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.cpp:3861: assert(false && "extract requires a
constant index");
On 2016/02/03 15:28:38, stichnot wrote:
> s/extract/extractelement
> would make a more instantly understandable message.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32.h
File src/IceTargetLoweringARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.h:93: assert(RegARM32::RCARM32_QtoS <
RegARM32::RCARM32_NUM);
On 2016/02/03 15:28:38, stichnot wrote:
> I think this assert is pretty much guaranteed to be true and you can remove
it. 
> (Another possibility is to use static_assert, but it's trivially true enough
> that I just wouldn't bother.)
> 

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.h:424: void _extract(Variable *Dest, Variable *Src,
uint32_t Index,
On 2016/02/03 15:28:38, stichnot wrote:
> I would probably rename Src to Src0 for consistency.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.h:424: void _extract(Variable *Dest, Variable *Src,
uint32_t Index,
On 2016/02/03 16:06:52, John wrote:
> On 2016/02/03 15:28:38, stichnot wrote:
> > I would probably rename Src to Src0 for consistency.
> 
> It's been common practice (at least in the ARM32 backend) to name something
Src
> if it is the only Src. Unfortunately, this is not always followed, but I think
> it makes moar sense -- if I see Src0 in the middle of a function I am tempted
to
> think there might be a Src1.
> 
> regardless of parameter naming, I would name this method _extract_element (and
> _insert_element) simply because _insert and _extract are too generic.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTargetLoweringARM32....
src/IceTargetLoweringARM32.h:431: void _insert(Variable *Dest, Variable *Val,
uint32_t Index,
On 2016/02/03 15:28:38, stichnot wrote:
> I would probably rename Val to Src0 for consistency.

Done.

https://codereview.chromium.org/1655313002/diff/1/src/IceTypes.cpp
File src/IceTypes.cpp (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceTypes.cpp#newcode148
src/IceTypes.cpp:148: size_t typeWidthInBits(Type Ty) { return
typeWidthInBytes(Ty) * 8; }
On 2016/02/03 15:28:38, stichnot wrote:
> Probably CHAR_BIT would be more appropriate than 8.

Agreed. I removed this function and inlined it where needed, following John's
suggestion.

https://codereview.chromium.org/1655313002/diff/1/tests_lit/assembler/arm32/i...
File tests_lit/assembler/arm32/insert-extract.ll (right):

https://codereview.chromium.org/1655313002/diff/1/tests_lit/assembler/arm32/i...
tests_lit/assembler/arm32/insert-extract.ll:1: ; Show that we know how ot
translate insertelement and extractelement.
On 2016/02/03 15:28:38, stichnot wrote:
> to

Done.

Jim Stichnoth

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h File src/IceInstARM32.h (right): https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375 src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src, uint32_t VectorSize) const; On 2016/02/03 21:02:21, ...

4 years, 10 months ago (2016-02-03 22:32:22 UTC) #6

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h
File src/IceInstARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375
src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src,
uint32_t VectorSize) const;
On 2016/02/03 21:02:21, Eric Holk wrote:
> On 2016/02/03 15:28:37, stichnot wrote:
> > This isn't absolutely clear-cut, but I think it would be a bit better if
these
> > three functions can figure out on their own what Src and VectorSize should
be,
> > and not have to pass them as arguments.
> 
> I wasn't totally thrilled with the design I used here.
> 
> The challenge is that the source and vector size are determined by the
> destination for insertelement, but by the source for extractelement. The
common
> code doesn't know which version it is, so you'd basically have to duplicate
this
> code in each of the subclasses.
> 
> Perhaps the better design is to make a function that's not part of the class
> that takes a register, numelements and an index. As it is right now, it's
> strange that the register and numelements are given explicitly but the index
is
> implicit.
> 
> It would actually be pretty easy to compute NumElements from the Src though,
> which would at least make it a little cleaner. I went ahead and made this
> change, but I'm open to making more changes.

I was thinking that the common code could determine whether to use Dest or Src
based on whether the instruction kind is Insert or Extract.

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1381
src/IceInstARM32.h:1381: };
On 2016/02/03 21:02:21, Eric Holk wrote:
> On 2016/02/03 15:28:37, stichnot wrote:
> > Implement classof() for completeness.
> 
> Does it make sense to have classof here, given that this class should never be
> instantiated except as a base of InstARM32Extract and InstARM32Insert (which
> both implement classof)? For example, InstArm32Pred does not include classof,
> that I can see.

If you aren't already aware, have a look at
http://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html .

In this case, the implementation should look something like:

static bool classof(const Inst *Inst) {
  return isClassof(Inst, Insert) || isClassof(Inst, Extract);
}

I didn't know InstArm32Pred was incomplete, maybe it should be fixed? :)

For example, see InstTarget, Constant, Variable.

https://codereview.chromium.org/1655313002/diff/20001/src/IceInstARM32.cpp
File src/IceInstARM32.cpp (right):

https://codereview.chromium.org/1655313002/diff/20001/src/IceInstARM32.cpp#ne...
src/IceInstARM32.cpp:1042: auto &Str = Func->getContext()->getStrEmit();
Ostream &Str

https://codereview.chromium.org/1655313002/diff/20001/src/IceInstARM32.cpp#ne...
src/IceInstARM32.cpp:1043: auto DestTy = getDest()->getType();
const Type DestTy

https://codereview.chromium.org/1655313002/diff/20001/src/IceInstARM32.cpp#ne...
src/IceInstARM32.cpp:1045: auto Src = llvm::dyn_cast<Variable>(getSrc(0));
auto *

https://codereview.chromium.org/1655313002/diff/20001/src/IceTargetLoweringAR...
File src/IceTargetLoweringARM32.cpp (right):

https://codereview.chromium.org/1655313002/diff/20001/src/IceTargetLoweringAR...
src/IceTargetLoweringARM32.cpp:3851: Variable *TSrc0 = makeReg(Src0->getType());
Bikeshed: the name "TSrc0" strikes me as unlike any other variable name here. 
I'm too lazy to figure out a better name, any ideas John?

https://codereview.chromium.org/1655313002/diff/20001/src/IceTargetLoweringAR...
src/IceTargetLoweringARM32.cpp:3855: TSrc0->setRegClass(RegARM32::RCARM32_QtoS);
Another possibility is to move more stuff into this conditional:
if (isFloatingType(DestTy)) {
  Variable *TSrc0 = makeReg(Src0->getType());
  TSrc0->setRegClass(RegARM32::RCARM32_QtoS);
  _mov(TSrc0, Src0);
  Src0 = TSrc0;
}

John

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h File src/IceInstARM32.h (right): https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375 src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src, uint32_t VectorSize) const; On 2016/02/03 22:32:21, ...

4 years, 10 months ago (2016-02-04 15:23:57 UTC) #7

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h
File src/IceInstARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375
src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src,
uint32_t VectorSize) const;
On 2016/02/03 22:32:21, stichnot wrote:
> On 2016/02/03 21:02:21, Eric Holk wrote:
> > On 2016/02/03 15:28:37, stichnot wrote:
> > > This isn't absolutely clear-cut, but I think it would be a bit better if
> these
> > > three functions can figure out on their own what Src and VectorSize should
> be,
> > > and not have to pass them as arguments.
> > 
> > I wasn't totally thrilled with the design I used here.
> > 
> > The challenge is that the source and vector size are determined by the
> > destination for insertelement, but by the source for extractelement. The
> common
> > code doesn't know which version it is, so you'd basically have to duplicate
> this
> > code in each of the subclasses.
> > 
> > Perhaps the better design is to make a function that's not part of the class
> > that takes a register, numelements and an index. As it is right now, it's
> > strange that the register and numelements are given explicitly but the index
> is
> > implicit.
> > 
> > It would actually be pretty easy to compute NumElements from the Src though,
> > which would at least make it a little cleaner. I went ahead and made this
> > change, but I'm open to making more changes.
> 
> I was thinking that the common code could determine whether to use Dest or Src
> based on whether the instruction kind is Insert or Extract.

what about defining a virtual method in this class:

Variable *getVectorOperand() const;

and implement getDRegister in terms of that? I believe that's all you'd need.

Alternatively, you could have a

const Variable *VectorOperand;

and let the base classes set it to the right operand. (I'd prefer this approach
to avoid the virtual call.)

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375
src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src,
uint32_t VectorSize) const;
Another issue I'd like to point out here. if this method takes no parameter,
then I'd say getDRegister does not convey the idea of what this does. I would
provably implement this helper (and all the other helpers in this base class) as
global functions in anonymous namespaces in the .cpp file.

https://codereview.chromium.org/1655313002/diff/20001/src/IceInstARM32.h
File src/IceInstARM32.h (right):

https://codereview.chromium.org/1655313002/diff/20001/src/IceInstARM32.h#newc...
src/IceInstARM32.h:1380: // For floating point values, we can read directly from
an S register. This
read/write ... from/to?

https://codereview.chromium.org/1655313002/diff/20001/src/IceTargetLoweringAR...
File src/IceTargetLoweringARM32.cpp (right):

https://codereview.chromium.org/1655313002/diff/20001/src/IceTargetLoweringAR...
src/IceTargetLoweringARM32.cpp:3855: TSrc0->setRegClass(RegARM32::RCARM32_QtoS);
On 2016/02/03 22:32:21, stichnot wrote:
> Another possibility is to move more stuff into this conditional:
> if (isFloatingType(DestTy)) {
>   Variable *TSrc0 = makeReg(Src0->getType());
>   TSrc0->setRegClass(RegARM32::RCARM32_QtoS);
>   _mov(TSrc0, Src0);
>   Src0 = TSrc0;
> }

+1

Jim Stichnoth

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h File src/IceInstARM32.h (right): https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375 src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src, uint32_t VectorSize) const; On 2016/02/04 15:23:57, ...

4 years, 10 months ago (2016-02-04 16:44:12 UTC) #8

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h
File src/IceInstARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375
src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src,
uint32_t VectorSize) const;
On 2016/02/04 15:23:57, John wrote:
> On 2016/02/03 22:32:21, stichnot wrote:
> > On 2016/02/03 21:02:21, Eric Holk wrote:
> > > On 2016/02/03 15:28:37, stichnot wrote:
> > > > This isn't absolutely clear-cut, but I think it would be a bit better if
> > these
> > > > three functions can figure out on their own what Src and VectorSize
should
> > be,
> > > > and not have to pass them as arguments.
> > > 
> > > I wasn't totally thrilled with the design I used here.
> > > 
> > > The challenge is that the source and vector size are determined by the
> > > destination for insertelement, but by the source for extractelement. The
> > common
> > > code doesn't know which version it is, so you'd basically have to
duplicate
> > this
> > > code in each of the subclasses.
> > > 
> > > Perhaps the better design is to make a function that's not part of the
class
> > > that takes a register, numelements and an index. As it is right now, it's
> > > strange that the register and numelements are given explicitly but the
index
> > is
> > > implicit.
> > > 
> > > It would actually be pretty easy to compute NumElements from the Src
though,
> > > which would at least make it a little cleaner. I went ahead and made this
> > > change, but I'm open to making more changes.
> > 
> > I was thinking that the common code could determine whether to use Dest or
Src
> > based on whether the instruction kind is Insert or Extract.
> 
> what about defining a virtual method in this class:
> 
> Variable *getVectorOperand() const;
> 
> and implement getDRegister in terms of that? I believe that's all you'd need.

I recalled that in general, we had tried to avoid virtual methods in Inst and
Operand (particularly Variable), to keep object sizes smaller by avoiding
vtables.

Unfortunately I see that Inst has a bunch of integral virtual methods, so oh
well.  Adding a virtual method is probably a fine approach.  Also, this would
not be a virtual method in the base class, so even if we were generally free and
clear of virtual methods, the impact here would be small.

> 
> Alternatively, you could have a
> 
> const Variable *VectorOperand;
> 
> and let the base classes set it to the right operand. (I'd prefer this
approach
> to avoid the virtual call.)

I don't think precisely this is a good idea.  There is essentially an assumption
that all Variable* are captured by FOREACH_VAR_IN_INST().  Probably nothing
would break today by adding a "hidden" Variable*, but...  Maybe better would be
Variable **VectorOperand, with appropriate const's thrown in?

John

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h File src/IceInstARM32.h (right): https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375 src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src, uint32_t VectorSize) const; On 2016/02/04 16:44:12, ...

4 years, 10 months ago (2016-02-04 16:52:24 UTC) #9

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h
File src/IceInstARM32.h (right):

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1375
src/IceInstARM32.h:1375: RegARM32::AllRegisters getDRegister(Variable *Src,
uint32_t VectorSize) const;
On 2016/02/04 16:44:12, stichnot wrote:
> On 2016/02/04 15:23:57, John wrote:
> > On 2016/02/03 22:32:21, stichnot wrote:
> > > On 2016/02/03 21:02:21, Eric Holk wrote:
> > > > On 2016/02/03 15:28:37, stichnot wrote:
> > > > > This isn't absolutely clear-cut, but I think it would be a bit better
if
> > > these
> > > > > three functions can figure out on their own what Src and VectorSize
> should
> > > be,
> > > > > and not have to pass them as arguments.
> > > > 
> > > > I wasn't totally thrilled with the design I used here.
> > > > 
> > > > The challenge is that the source and vector size are determined by the
> > > > destination for insertelement, but by the source for extractelement. The
> > > common
> > > > code doesn't know which version it is, so you'd basically have to
> duplicate
> > > this
> > > > code in each of the subclasses.
> > > > 
> > > > Perhaps the better design is to make a function that's not part of the
> class
> > > > that takes a register, numelements and an index. As it is right now,
it's
> > > > strange that the register and numelements are given explicitly but the
> index
> > > is
> > > > implicit.
> > > > 
> > > > It would actually be pretty easy to compute NumElements from the Src
> though,
> > > > which would at least make it a little cleaner. I went ahead and made
this
> > > > change, but I'm open to making more changes.
> > > 
> > > I was thinking that the common code could determine whether to use Dest or
> Src
> > > based on whether the instruction kind is Insert or Extract.
> > 
> > what about defining a virtual method in this class:
> > 
> > Variable *getVectorOperand() const;
> > 
> > and implement getDRegister in terms of that? I believe that's all you'd
need.
> 
> I recalled that in general, we had tried to avoid virtual methods in Inst and
> Operand (particularly Variable), to keep object sizes smaller by avoiding
> vtables.
> 
> Unfortunately I see that Inst has a bunch of integral virtual methods, so oh
> well.  Adding a virtual method is probably a fine approach.  Also, this would
> not be a virtual method in the base class, so even if we were generally free
and
> clear of virtual methods, the impact here would be small.
> 
> > 
> > Alternatively, you could have a
> > 
> > const Variable *VectorOperand;
> > 
> > and let the base classes set it to the right operand. (I'd prefer this
> approach
> > to avoid the virtual call.)
> 
> I don't think precisely this is a good idea.  There is essentially an
assumption
> that all Variable* are captured by FOREACH_VAR_IN_INST().  Probably nothing
> would break today by adding a "hidden" Variable*, but...  Maybe better would
be
> Variable **VectorOperand, with appropriate const's thrown in?

I am **not** suggesting this member in lieu of the canonical Srcs array; See,
for example, X86OperandMem: those classes (X8632 and X8664) will cache Base, and
Index, but they will **also** add them to Srcs.

You don't need a double pointer; a single pointer:

Variable *const Operand;

Eric Holk

https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h File src/IceInstARM32.h (right): https://codereview.chromium.org/1655313002/diff/1/src/IceInstARM32.h#newcode1381 src/IceInstARM32.h:1381: }; On 2016/02/03 22:32:21, stichnot wrote: > On 2016/02/03 ...

4 years, 10 months ago (2016-02-04 20:57:25 UTC) #10

Jim Stichnoth

https://codereview.chromium.org/1655313002/diff/60001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/60001/src/IceInstARM32.cpp#newcode1046 src/IceInstARM32.cpp:1046: auto Src = llvm::dyn_cast<Variable>(getSrc(0)); auto * https://codereview.chromium.org/1655313002/diff/60001/src/IceInstARM32.cpp#newcode1086 src/IceInstARM32.cpp:1086: const ...

4 years, 10 months ago (2016-02-05 13:27:20 UTC) #11

Eric Holk

https://codereview.chromium.org/1655313002/diff/60001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/60001/src/IceInstARM32.cpp#newcode1046 src/IceInstARM32.cpp:1046: auto Src = llvm::dyn_cast<Variable>(getSrc(0)); On 2016/02/05 13:27:20, stichnot wrote: ...

4 years, 10 months ago (2016-02-05 17:52:01 UTC) #12

Jim Stichnoth

https://codereview.chromium.org/1655313002/diff/80001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/80001/src/IceInstARM32.cpp#newcode1033 src/IceInstARM32.cpp:1033: // This part assumes the register alias list is ...

4 years, 10 months ago (2016-02-08 18:08:29 UTC) #13

Eric Holk

https://codereview.chromium.org/1655313002/diff/80001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/80001/src/IceInstARM32.cpp#newcode1033 src/IceInstARM32.cpp:1033: // This part assumes the register alias list is ...

4 years, 10 months ago (2016-02-08 18:44:11 UTC) #14

Jim Stichnoth

https://codereview.chromium.org/1655313002/diff/100001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/100001/src/IceInstARM32.cpp#newcode1122 src/IceInstARM32.cpp:1122: auto VectorSize = typeNumElements(Src->getType()); no auto here, I think ...

4 years, 10 months ago (2016-02-08 20:13:59 UTC) #15

Eric Holk

https://codereview.chromium.org/1655313002/diff/100001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/100001/src/IceInstARM32.cpp#newcode1122 src/IceInstARM32.cpp:1122: auto VectorSize = typeNumElements(Src->getType()); On 2016/02/08 20:13:59, stichnot wrote: ...

4 years, 10 months ago (2016-02-08 21:24:17 UTC) #16

Jim Stichnoth

lgtm https://codereview.chromium.org/1655313002/diff/140001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/140001/src/IceInstARM32.cpp#newcode1051 src/IceInstARM32.cpp:1051: const Register SrcReg = (Register)Src->getRegNum(); Here and the ...

4 years, 10 months ago (2016-02-08 23:08:58 UTC) #17

Eric Holk

Thanks! https://codereview.chromium.org/1655313002/diff/140001/src/IceInstARM32.cpp File src/IceInstARM32.cpp (right): https://codereview.chromium.org/1655313002/diff/140001/src/IceInstARM32.cpp#newcode1051 src/IceInstARM32.cpp:1051: const Register SrcReg = (Register)Src->getRegNum(); On 2016/02/08 23:08:58, ...

4 years, 10 months ago (2016-02-08 23:22:12 UTC) #18

Eric Holk

Description was changed from ========== Subzero: ARM32: lowering of vector insert and extract. BUG= https://bugs.chromium.org/p/nativeclient/issues/detail?id=4076 ...

4 years, 10 months ago (2016-02-08 23:22:21 UTC) #19

Eric Holk

4 years, 10 months ago (2016-02-08 23:22:22 UTC) #20

Message was sent while issue was closed.

Committed patchset #9 (id:160001) manually as
658bae2023cace91f8489f597c1c82dfe95aaaa2 (presubmit successful).

Issue 1655313002: Subzero: ARM32: lowering of vector insert and extract. (Closed)

Description

Patch Set 1 #

Patch Set 2 : Incorporating review feedback" #

Patch Set 3 : More code cleanup. #

Patch Set 4 : Removing incorrect use of rematerializable check" #

Patch Set 5 : Incorporating review feedback. #

Patch Set 6 : Code review feedback. #

Patch Set 7 : #

Patch Set 8 : No more auto. #

Patch Set 9 : Fixes #

Messages