|
|
Created:
6 years, 9 months ago by Fritz Modified:
6 years, 6 months ago CC:
v8-dev Visibility:
Public. |
DescriptionDo away with variable length memcpy to Set/Get registers in simulator
About a 32% boost.
Before - 5:31
Richards: 84.5
DeltaBlue: 128
Crypto: 65.3
RayTrace: 203
EarleyBoyer: 149
RegExp: 23.4
Splay: 121
NavierStokes: 98.9
----
Score (version 7): 93.8
After - 4:10
Richards: 107
DeltaBlue: 175
Crypto: 93.9
RayTrace: 258
EarleyBoyer: 186
RegExp: 32.7
Splay: 165
NavierStokes: 124
----
Score (version 7): 124
R=jacob.bramley@arm.com, svenpanne@chromium.org
Committed: https://code.google.com/p/v8/source/detail?r=21448
Committed: https://code.google.com/p/v8/source/detail?r=21804
Patch Set 1 #Patch Set 2 : #
Total comments: 11
Patch Set 3 : #
Total comments: 2
Patch Set 4 : #Patch Set 5 : #
Total comments: 27
Patch Set 6 : #
Total comments: 5
Patch Set 7 : #Patch Set 8 : #
Total comments: 1
Patch Set 9 : #
Total comments: 1
Messages
Total messages: 34 (0 generated)
https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:165: ASSERT(size/8 <= kXRegSize); There are actually only two supported sizes at the moment; T in the original may have had other sizes of course, but the _access_ is always 32- or 64-bits. Perhaps this should assert that size is either kXRegSizeInBits or kWRegSizeInBits (or kDRegSizeInBits or kSRegSizeInBits). The use of kSizeInBytes in the original was to allow for NEON Q registers in the future; this would require a 128-bit backing store and the ability to store 128 bits at a time. https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:169: if (size == kSRegSizeInBits) { Why kSRegSize here? These can be any register type (currently X, W, D, S). I think we would use kWRegSize by default, perhaps with a static assertion as you used to check that kSRegSize == kWRegSize. https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:179: } reg = { new_value }; This isn't safe for type-aliasing rules. In another recent patch, we decided not to worry about that, but it's worth noting it here. On the other hand, a memcpy here should be free if the size is known at compile time. Perhaps we could keep this type-aliasing-safe and still have the speed advantage: void Set(T new_value) { value_ = 0; memcpy(&value_, &new_value, sizeof(new_value)); } void Set(T new_value, unsigned size) { if (size == sizeof(new_value)) { Set(new_value); } else { // Slower case. } } https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:180: STATIC_ASSERT(sizeof(union SetUnion) == sizeof(value_)); That assertion isn't sufficient; if T is smaller than int64_t, sizeof(reg) will still match sizeof(value_). https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:182: value_ = reg.value; Again, if T is smaller than value_, this won't correctly zero-extend it.
https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:165: ASSERT(size/8 <= kXRegSize); On 2014/03/27 17:50:42, jbramley wrote: > There are actually only two supported sizes at the moment; T in the original may > have had other sizes of course, but the _access_ is always 32- or 64-bits. > Perhaps this should assert that size is either kXRegSizeInBits or > kWRegSizeInBits (or kDRegSizeInBits or kSRegSizeInBits). > > The use of kSizeInBytes in the original was to allow for NEON Q registers in the > future; this would require a 128-bit backing store and the ability to store 128 > bits at a time. Is there a plan to use Q registers? I think this method can safely extend to 128-bit registers. uint128_t is available in gcc 4.8. Not sure what compilers are required to be supported for the compiler. https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:169: if (size == kSRegSizeInBits) { On 2014/03/27 17:50:42, jbramley wrote: > Why kSRegSize here? These can be any register type (currently X, W, D, S). I > think we would use kWRegSize by default, perhaps with a static assertion as you > used to check that kSRegSize == kWRegSize. I choose kSRegSize because I was following the logic in fpreg. I now see this was incorrect. Done. https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:179: } reg = { new_value }; On 2014/03/27 17:50:42, jbramley wrote: > This isn't safe for type-aliasing rules. In another recent patch, we decided not > to worry about that, but it's worth noting it here. > > On the other hand, a memcpy here should be free if the size is known at compile > time. Perhaps we could keep this type-aliasing-safe and still have the speed > advantage: > > void Set(T new_value) { > value_ = 0; > memcpy(&value_, &new_value, sizeof(new_value)); > } > > void Set(T new_value, unsigned size) { > if (size == sizeof(new_value)) { > Set(new_value); > } else { > // Slower case. > } > } I tried this method earlier. https://codereview.chromium.org/203263017/ and it had to be reverted due to compiler bugs. I think this method is closer to what would be expected from the hardware, and faster. I'm specifically trying to type-alias here in a safe way. This is defining a union that takes both a 32 or a 64 bit number and stores it at the same location that is 64 bits in size. https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:180: STATIC_ASSERT(sizeof(union SetUnion) == sizeof(value_)); On 2014/03/27 17:50:42, jbramley wrote: > That assertion isn't sufficient; if T is smaller than int64_t, sizeof(reg) will > still match sizeof(value_). Which is what I want. From a conceptual standpoint I think that a union fits what is going on with the register. It has 64 bits, but in 32 bit mode only half is accessible. But they have the same address. Just like a union. I'll agree the initialization was incorrect. C89 only initializes the first element. I can't find documentation on what happens if only the smaller element is initialized. You are probably right that the top half is then undefined.
NOT LGTM. I think we have to approach the whole problem differently... https://codereview.chromium.org/213943002/diff/40001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/40001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:185: value_ = reg.value; I am slightly worried about using the union trick here: This is undefined behavior in C++ and we've actually seen compilers (clang IIRC) generating not the code one wants here. v8 has the BitCast template for such tricks, but this contains a memcpy for just that reason. https://codereview.chromium.org/213943002/diff/40001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:207: return reg.result; Same concern here as above: The compiler is free to elide the assignment of value_ to reg.value, because according to the rules, reading reg.result can't depend on that assignment.
Ooops, for some reasons I didn't see your previous comments (temporary blindness due to sheriffing duties?), but I think we are in the same boat here that the union trick doesn't really work...
> https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm64.h > File src/arm64/simulator-arm64.h (right): > > https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... > src/arm64/simulator-arm64.h:165: ASSERT(size/8 <= kXRegSize); > On 2014/03/27 17:50:42, jbramley wrote: > > There are actually only two supported sizes at the moment; T in the original > may > > have had other sizes of course, but the _access_ is always 32- or 64-bits. > > Perhaps this should assert that size is either kXRegSizeInBits or > > kWRegSizeInBits (or kDRegSizeInBits or kSRegSizeInBits). > > > > The use of kSizeInBytes in the original was to allow for NEON Q registers in > the > > future; this would require a 128-bit backing store and the ability to store > 128 > > bits at a time. > > Is there a plan to use Q registers? I think this method can safely extend to > 128-bit registers. uint128_t is available in gcc 4.8. Not sure what compilers > are required to be supported for the compiler. ARM has them, and whatever optimisation they're used for there ought to be applicable to A64 too. I don't know, but I suspect that we need to be able to support much older GCCs than that. My stdint.h doesn't have int128_t, but GCC 4.8 seems to have a built-in __int128_t type. However, support for __int128_t is also target-specific, so there's a good chance that it won't be available for A64 anyway. > https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... > src/arm64/simulator-arm64.h:179: } reg = { new_value }; > On 2014/03/27 17:50:42, jbramley wrote: > > This isn't safe for type-aliasing rules. In another recent patch, we decided > not > > to worry about that, but it's worth noting it here. > > > > On the other hand, a memcpy here should be free if the size is known at > compile > > time. Perhaps we could keep this type-aliasing-safe and still have the speed > > advantage: > > > > void Set(T new_value) { > > value_ = 0; > > memcpy(&value_, &new_value, sizeof(new_value)); > > } > > > > void Set(T new_value, unsigned size) { > > if (size == sizeof(new_value)) { > > Set(new_value); > > } else { > > // Slower case. > > } > > } > > I tried this method earlier. > https://codereview.chromium.org/203263017/ > and it had to be reverted due to compiler bugs. I think this method is closer > to what would be expected from the hardware, and faster. > > I'm specifically trying to type-alias here in a safe way. This is defining a > union that takes both a 32 or a 64 bit number and stores it at the same location > that is 64 bits in size. Yes, sorry, I think this is safe here, provided that T has 64 bits. (I mis-remembered the rules surrounding unions; there are some type-punning uses of unions that are not safe.) Having said that, Sven mentioned that Clang doesn't do this right anyway. Interestingly, it looks like C++ is somewhat more relaxed about type punning than C is. In particular, it suggests that reinterpret_cast can be used in some cases that are definitely not safe in C. I think I need to spend some time reading up on C++ aliasing rules. > https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... > src/arm64/simulator-arm64.h:180: STATIC_ASSERT(sizeof(union SetUnion) == > sizeof(value_)); > On 2014/03/27 17:50:42, jbramley wrote: > > That assertion isn't sufficient; if T is smaller than int64_t, sizeof(reg) > will > > still match sizeof(value_). > > Which is what I want. From a conceptual standpoint I think that a union fits > what is going on with the register. It has 64 bits, but in 32 bit mode only > half is accessible. But they have the same address. Just like a union. Sort-of, yes, but a 32-bit write to a 64-bit register does actually write 64 bits, so it's not a perfect fit. > I'll agree the initialization was incorrect. C89 only initializes the first > element. I can't find documentation on what happens if only the smaller element > is initialized. You are probably right that the top half is then undefined. C99 also says that the other bits are undefined. C++ initialization is a little more complicated, but I think it has the same effect in this case.
https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:179: } reg = { new_value }; On 2014/03/27 19:58:33, Fritz wrote: > I'm specifically trying to type-alias here in a safe way. [...] I don't think that such a thing is possible: IIRC, the only defined way of using a union is accessing its fields consistently, i.e. read only out of the field you have last written into, everything else is totally undefined (at least this was the definition for C). Or in other words: The only safe way to use a union is to save space, not to do some funny casting/conversion tricks. Has this changed in C++? If yes, could somebody point me to the right section of the spec? It would be good to know, because I'm quite sure we already had trouble in this area.
https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:179: } reg = { new_value }; On 2014/03/28 11:46:04, Sven Panne wrote: > On 2014/03/27 19:58:33, Fritz wrote: > > I'm specifically trying to type-alias here in a safe way. [...] > > I don't think that such a thing is possible: IIRC, the only defined way of using > a union is accessing its fields consistently, i.e. read only out of the field > you have last written into, everything else is totally undefined (at least this > was the definition for C). Or in other words: The only safe way to use a union > is to save space, not to do some funny casting/conversion tricks. It seems to be allowed in C99 TC3 (dated the September 2007-09-07). See section 6.5.2.3, page 73: "If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type [...]" I don't think it was allowed in TC2, but I don't have it to hand. > Has this changed in C++? If yes, could somebody point me to the right section of > the spec? It would be good to know, because I'm quite sure we already had > trouble in this area. I don't think it's allowed in C++98 but I can't find a clear bit of the specification that says that. However, it does have a rather surprising note on reinterpret_cast suggesting that reinterpret_cast<T&>(x) can be used to safely do type-punning. I'd not seen this before, and every resource I've seen until now has said that this is _not_ safe. This (draft) standard is dated 2005-10-19 and I'm not sure where I got it from. I still think that the only _portable_ way to do it is to use memcpy; that's safe in every version of the standards, but the rules are rather more relaxed than I previously thought.
On 2014/03/28 11:59:58, jbramley wrote: > [...] However, it does have a rather surprising note on > reinterpret_cast suggesting that reinterpret_cast<T&>(x) can be used to safely > do type-punning. I'd not seen this before, and every resource I've seen until > now has said that this is _not_ safe. This (draft) standard is dated 2005-10-19 > and I'm not sure where I got it from. Hmmm interesting, I didn't know that. I have a Working Draft (N3242=11-0012) from 2011-02-28 which says: ---------------------------------------------- 5.2.10 Reinterpret cast ... 11 An lvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. That is, a reference cast reinterpret_cast<T&>(x) has the same effect as the conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for reinterpret_cast<T&&>(x)). The result refers to the same object as the source lvalue, but with a different type. The result is an lvalue for an lvalue reference type or an rvalue reference to function type and an xvalue for an rvalue reference to object type. No temporary is created, no copy is made, and constructors (12.1) or conversion functions (12.3) are not called. (71) (This is sometimes referred to as a type pun.) ---------------------------------------------- > I still think that the only _portable_ way to do it is to use memcpy; that's > safe in every version of the standards, but the rules are rather more relaxed > than I previously thought. One can read the section above in the sense that a reference cast is safe for our purposes, too, but the next question is: Do all compilers implement that correctly? :-}
On 2014/03/28 12:16:14, Sven Panne wrote: > On 2014/03/28 11:59:58, jbramley wrote: > > [...] However, it does have a rather surprising note on > > reinterpret_cast suggesting that reinterpret_cast<T&>(x) can be used to safely > > do type-punning. I'd not seen this before, and every resource I've seen until > > now has said that this is _not_ safe. This (draft) standard is dated > 2005-10-19 > > and I'm not sure where I got it from. > > Hmmm interesting, I didn't know that. I have a Working Draft (N3242=11-0012) > from 2011-02-28 which says: > > ---------------------------------------------- > 5.2.10 Reinterpret cast > ... 11 An lvalue expression of type T1 can be cast to the type “reference to T2” > if an expression of type “pointer to T1” can be explicitly converted to the type > “pointer to T2” using a reinterpret_cast. That is, a reference cast > reinterpret_cast<T&>(x) has the same effect as the conversion > *reinterpret_cast<T*>(&x) with the built-in & and * operators (and similarly for > reinterpret_cast<T&&>(x)). The result refers to the same object as the source > lvalue, but with a different type. The result is an lvalue for an lvalue > reference type or an rvalue reference to function type and an xvalue for an > rvalue reference to object type. No temporary is created, no copy is made, and > constructors (12.1) or conversion functions (12.3) are not called. (71) (This is > sometimes referred to as a type pun.) > ---------------------------------------------- That's very similar to mine (N1905=05-0165). > > I still think that the only _portable_ way to do it is to use memcpy; that's > > safe in every version of the standards, but the rules are rather more relaxed > > than I previously thought. > > One can read the section above in the sense that a reference cast is safe for > our purposes, too, but the next question is: Do all compilers implement that > correctly? :-} That is an good question, especially if a C++ compiler is based on C behaviour, where such casts are not safe.
On 2014/03/28 11:59:58, jbramley wrote: > https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm64.h > File src/arm64/simulator-arm64.h (right): > > https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... > src/arm64/simulator-arm64.h:179: } reg = { new_value }; > On 2014/03/28 11:46:04, Sven Panne wrote: > > On 2014/03/27 19:58:33, Fritz wrote: > > > I'm specifically trying to type-alias here in a safe way. [...] > > > > I don't think that such a thing is possible: IIRC, the only defined way of > using > > a union is accessing its fields consistently, i.e. read only out of the field > > you have last written into, everything else is totally undefined (at least > this > > was the definition for C). Or in other words: The only safe way to use a union > > is to save space, not to do some funny casting/conversion tricks. > > It seems to be allowed in C99 TC3 (dated the September 2007-09-07). See section > 6.5.2.3, page 73: "If the member used to access the contents of a union object > is not the same as the member last used to store a value in the object, the > appropriate part of the object representation of the value is reinterpreted as > an object representation in the new type [...]" from [...] == "as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation." Which I read as not allowed. I can't find any reference in the C++ spec, so I will assume that it is also not defined. > > I don't think it was allowed in TC2, but I don't have it to hand. > > > Has this changed in C++? If yes, could somebody point me to the right section > of > > the spec? It would be good to know, because I'm quite sure we already had > > trouble in this area. > > I don't think it's allowed in C++98 but I can't find a clear bit of the > specification that says that. However, it does have a rather surprising note on > reinterpret_cast suggesting that reinterpret_cast<T&>(x) can be used to safely > do type-punning. I'd not seen this before, and every resource I've seen until > now has said that this is _not_ safe. This (draft) standard is dated 2005-10-19 > and I'm not sure where I got it from. > > I still think that the only _portable_ way to do it is to use memcpy; that's > safe in every version of the standards, but the rules are rather more relaxed > than I previously thought. void Set(T new_value) { int64_t* value = reinterpret_cast<int64_t*>(&new_value); value_ = value; } won't work because the size of the read is different. If it was originally a float, I'm now stuck with the upper 32 bits having random data. Get() will work fine with this construct because the original value was 64 bits, so all reads 64bits or less are going to be correct. memcpy, even with fixed size, is still a slower. base : 5:31 memcpy : 4:03 union : 3:52 Does this adversely impact the possible use of Q registers in the future to not be worthwhile?
On 2014/03/28 07:57:54, Sven Panne wrote: > NOT LGTM. I think we have to approach the whole problem differently... > How would you like to see this done then? The other approach that I can see is to change the functions that pass in a size value so that they don't. This is more invasive, but possible. The underlying problem of having to write different types to some type of backing store exists. Leaving them as a memcpy is the most portable, but there are still compiler issues with trying to optimize it. Or maybe I'm missing what you think the problem is entirely. > https://codereview.chromium.org/213943002/diff/40001/src/arm64/simulator-arm64.h > File src/arm64/simulator-arm64.h (right): > > https://codereview.chromium.org/213943002/diff/40001/src/arm64/simulator-arm6... > src/arm64/simulator-arm64.h:185: value_ = reg.value; > I am slightly worried about using the union trick here: This is undefined > behavior in C++ and we've actually seen compilers (clang IIRC) generating not > the code one wants here. v8 has the BitCast template for such tricks, but this > contains a memcpy for just that reason. > > https://codereview.chromium.org/213943002/diff/40001/src/arm64/simulator-arm6... > src/arm64/simulator-arm64.h:207: return reg.result; > Same concern here as above: The compiler is free to elide the assignment of > value_ to reg.value, because according to the rules, reading reg.result can't > depend on that assignment.
On 2014/03/28 11:46:04, Sven Panne wrote: > https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm64.h > File src/arm64/simulator-arm64.h (right): > > https://codereview.chromium.org/213943002/diff/20001/src/arm64/simulator-arm6... > src/arm64/simulator-arm64.h:179: } reg = { new_value }; > On 2014/03/27 19:58:33, Fritz wrote: > > I'm specifically trying to type-alias here in a safe way. [...] > > I don't think that such a thing is possible: IIRC, the only defined way of using > a union is accessing its fields consistently, i.e. read only out of the field > you have last written into, everything else is totally undefined (at least this > was the definition for C). Or in other words: The only safe way to use a union > is to save space, not to do some funny casting/conversion tricks. > > Has this changed in C++? If yes, could somebody point me to the right section of > the spec? It would be good to know, because I'm quite sure we already had > trouble in this area. The best way I can read it is that it is not defined. Here is a GCC doc calling it out explicitly as legal. But I'm sure that's just one of many compilers that need to be supported. http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Optimize-Options.html under -fstrict-aliasing Is there a minimum c++ spec that v8 adheres to?
Browsing through the simulator code a bit, I think the fundamental problem is that passing around the register size at runtime is a bad idea from a performance point of view. This leads to tons of dynamic checks and the casting problem at hand, and we're just trying to cure the symptoms. The "real cure" IMHO would be templatizing the size of the register in question, i.e. make it a compile time parameter, not a runtime parameter. With this, the need for the funky type puns plus a lot of dynamic checks vanishes. Depending on how one does it, it increases the code size of the simulator, but given how slow it is currently, this might be a very good trade-off.
On 2014/03/31 13:32:53, Sven Panne wrote: > Browsing through the simulator code a bit, I think the fundamental problem is > that passing around the register size at runtime is a bad idea from a > performance point of view. This leads to tons of dynamic checks and the casting > problem at hand, and we're just trying to cure the symptoms. The "real cure" > IMHO would be templatizing the size of the register in question, i.e. make it a > compile time parameter, not a runtime parameter. With this, the need for the > funky type puns plus a lot of dynamic checks vanishes. Depending on how one does > it, it increases the code size of the simulator, but given how slow it is > currently, this might be a very good trade-off. I agree that passing register size around is not ideal. But I disagree that changing that will solve this problem. There is one register file that is 64 bits. This can be accessed by both 32 bit op codes and 64 bit op codes. The register access for those 32 bit operations are going to need to read/write to the 64 bit register file because it's shared. So there needs to be some sort of memcpy/type pun to accomplish this. Or I remove most of my changes and leave in the register size and mask the access to the register like Set(int64_t new_value, unsigned size) is doing now. That is still going to need some casting to write/read floats and doubles. I was trying to avoid branches in the template as I had done in my previous cl because I thought that was where the compiler was failing.
On 2014/03/31 17:33:38, Fritz wrote: > I agree that passing register size around is not ideal. But I disagree that > changing that will solve this problem. There is one register file that is 64 > bits. This can be accessed by both 32 bit op codes and 64 bit op codes. The > register access for those 32 bit operations are going to need to read/write to > the 64 bit register file because it's shared. So there needs to be some sort of > memcpy/type pun to accomplish this. [...] Nope, that's exactly the fallacy behind all our problems: If you are *explicit* about what you're doing, there is *no need at all* to do some funky casts or memcpy. Making the register size explicit as a template parameter is achieving exactly this: The backing store is of course always 64bit, so the 64bit operations have to do nothing special, while the 32bit operations have to do some zero/sign-extension. But via template specialization one can easily make the distinction a compile-time decision. You can even get away without this template magic by providing 2 operations Foo32 and Foo64 instead of a Foo<size> template, this is essentially the same as using template specialization for exactly 2 parameter values.
I've started work on templatizing functions that access registers. This will get away from the controversial type punning. Before going through and doing all the functions, I wanted to get a quick check to see if this is an acceptable solution. I also didn't realize that git cl upload would squash my commits. What would be the preferred way of landing this? A lot of code is going to change. Multiple CLs? Or one giant one. One of the problems with multiple is that there are dependencies, so there would be functions not totally optimized. (See op2 &= kWRegMask)
I started to review this, but I had trouble seeing the direction you want to go. Could you outline a bit how you want to do it? It's a bit hard to reconstruct the intention when it is unclear what is only transitional code and what is code which should stay. Regarding the "how-to-land": If it is a mechanical changes (which it should be), I have no problem with a big CL. If you could find a way to land things incrementally (either by "layers" or by features), it would be nice, but not really necessary from my POV. Again, I'd like to hear some feedback from the ARM people...
https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:152: value_ = new_value; A ternary is a bit more readable here: value_ = (size == kWRegSizeInBits) ? (new_value & kWRegMask) : new_value; https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:160: void Set(T new_value) { This method and the corresponding templatized Get is temporary only, right? Or is it the other way round? I don't fully understand the direction we're going here... https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:169: int64_t result = value_; I think a ternary is more readable here, too: return (size == kSRegSizeInBits) ? (value_ & kSRegMask) : value; Furthermore, why do we use kSRegSizeInBits here and not kWRegSizeInBits (see Set). Granted, they have to be the same, but we somehow should do things consistently.
In general, I like the approach, but I have a few comments. Thanks, Jacob https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... File src/arm64/simulator-arm64.cc (right): https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:888: public: Is 'public' necessary here? https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:895: typedef typename make_unsigned<T>::type unsignedT; I don't like this mechanism I'm afraid; it's not a common idiom (as far as I know) and it's pretty confusing at first. If you can't replace it with something cleaner, at least put the main declaration in the .cc file with the specialisations and then add a very clear comment explaining what's going on. Another option would be to put it in the global 'utils.h'. In this case, can't you just operate on uint64_t values and then cast to <T> at the end? (I presume that the purpose is to get ShiftOperand to return the proper type, instead of just int64_t.) https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1306: instr->ImmDPShift()); This would be clearer (and fit on one line) if it declared shift_type and shift_amount, like in VisitLogicalShifted. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1359: op2 &= kWRegMask; Doesn't the new ShiftOperand apply that mask? If so, that would be a good assertion. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1934: const T kMinimumInteger = static_cast<T>(1) << (sizeof(T) * 8 - 1); Consider adding a helper function to do this (perhaps in utils.h): template <typename T> T MinInt(); template <> int64_t MinInt<int64_t>() { return INT64_MIN; } template <> int32_t MinInt<int32_t>() { return INT32_MIN; } We do that for FPDefaultNaN for example, for doubles and floats. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:152: value_ = new_value; You probably need to set the value just once, to avoid temporarily writing a sign-extended value into 'value_'. (This kind of thing has upset the sampling profiler in the past as registers occasionally appear to have unexpected values.) https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:156: } else { ASSERT(size == (sizeof(value_) * kByteSize)); } https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:160: void Set(T new_value) { On 2014/05/16 11:15:47, Sven Panne wrote: > This method and the corresponding templatized Get is temporary only, right? Or > is it the other way round? I don't fully understand the direction we're going > here... Indeed; I'd like to see what you're aiming for. I wouldn't expect this to be temporary though; this method allows a constant size for memcpy, so it's probably the best way to implement a simple 'put value X in a register' case. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:170: if (size == kSRegSizeInBits) { Use of SRegSize is surprising here; use WRegSize instead (as in the other 'Set'). https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:184: int64_t value_; Have you thought about how NEON Q registers might be represented? We discussed it earlier in this issue but I don't think there was a conclusion. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:187: STATIC_ASSERT(kXRegSize == kDRegSize); It'd be better to put this next to the declaration of value_, but using sizeof: STATIC_ASSERT(kXRegSize == sizeof(value_)); STATIC_ASSERT(kDRegSize == sizeof(value_)); https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:356: bool Reg31ZeroMode(unsigned code, Reg31Mode r31mode) const { `IsZeroRegister(...)` would be a better name.
Thanks for the feedback. Direction: I am replacing reg(reg_size, ... functions with either reg<int32/64_t> or wreg, xreg access. Getting rid of reg_size will allow for memcpy calls to have a constant length, which should be optimized out. class SimRegisterBase should have the Set/Get (unsigned size) member functions removed when this is all done. CL: There are less occurrences of the reg(reg_size, ... idiom than I originally thought. Locally I am trying to keep the changes spread across git commits, but I'll squash them into this cl for review. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... File src/arm64/simulator-arm64.cc (right): https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:888: public: On 2014/05/16 14:05:50, jbramley wrote: > Is 'public' necessary here? Yes. Below it is accessed make_unsigned<T>::type https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:895: typedef typename make_unsigned<T>::type unsignedT; On 2014/05/16 14:05:50, jbramley wrote: > I don't like this mechanism I'm afraid; it's not a common idiom (as far as I > know) and it's pretty confusing at first. If you can't replace it with something > cleaner, at least put the main declaration in the .cc file with the > specialisations and then add a very clear comment explaining what's going on. > > Another option would be to put it in the global 'utils.h'. > > In this case, can't you just operate on uint64_t values and then cast to <T> at > the end? (I presume that the purpose is to get ShiftOperand to return the proper > type, instead of just int64_t.) I haven't found away around this yet. Propagating a int32_t to a uint64_t sign extends it first. int32_t value = 0x80000000; int32_t result32 = static_cast<uint64_t>(value) >> 16; // = 0xffff8000 int32_t result64 = static_cast<uint32_t>(value) >> 16; // = 0x00008000 Register read/write access is signed at the moment. I could templatize read_wreg<uint32_t> or do read_wreg_unsigned. But I was trying to be simpler than that. I was trying to do a drop in replacement without changing the algorithm. std::make_unsigned is present in c++11 (http://www.cplusplus.com/reference/type_traits/make_unsigned/) So the idea isn't mine. I was just trying to backport it. And the implementation isn't mine either, it's cribbed together mostly from here (http://stackoverflow.com/questions/16461561/c-conversion-from-t-to-unsigned-t) I will look for a better alternative. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1306: instr->ImmDPShift()); On 2014/05/16 14:05:50, jbramley wrote: > This would be clearer (and fit on one line) if it declared shift_type and > shift_amount, like in VisitLogicalShifted. Done. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1359: op2 &= kWRegMask; On 2014/05/16 14:05:50, jbramley wrote: > Doesn't the new ShiftOperand apply that mask? If so, that would be a good > assertion. ShiftOperand was returning a signed int64_t. It's returned as a int32_t and propagated to a int64_t. This has been an iterative process of trying to replace one function at a time. Looking at this no, ShiftOperand can probably return an unsigned. LogicalHelper writes the register and needs to be templatized so that it isn't writing using reg_size. This is what I was referring to in regards to transitory functions. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1934: const T kMinimumInteger = static_cast<T>(1) << (sizeof(T) * 8 - 1); On 2014/05/16 14:05:50, jbramley wrote: > Consider adding a helper function to do this (perhaps in utils.h): > > template <typename T> T MinInt(); > template <> int64_t MinInt<int64_t>() { return INT64_MIN; } > template <> int32_t MinInt<int32_t>() { return INT32_MIN; } > > We do that for FPDefaultNaN for example, for doubles and floats. Thanks, great idea. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm64.h File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:160: void Set(T new_value) { On 2014/05/16 14:05:50, jbramley wrote: > On 2014/05/16 11:15:47, Sven Panne wrote: > > This method and the corresponding templatized Get is temporary only, right? Or > > is it the other way round? I don't fully understand the direction we're going > > here... > > Indeed; I'd like to see what you're aiming for. I wouldn't expect this to be > temporary though; this method allows a constant size for memcpy, so it's > probably the best way to implement a simple 'put value X in a register' case. My direction is to only use the templatized Get/Set. This results in a memcpy with a constant length. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:184: int64_t value_; On 2014/05/16 14:05:50, jbramley wrote: > Have you thought about how NEON Q registers might be represented? We discussed > it earlier in this issue but I don't think there was a conclusion. I don't have a fully formed solution yet. I do realize that it will be necessary to land this cl. One thing I have considered is declaring all registers as 128 bit. All access to registers go through this class, there are no native operations on the backing store. So it will waste space. 64 registers * extra 8 bytes. And it doesn't exactly follow the hw model. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.h:356: bool Reg31ZeroMode(unsigned code, Reg31Mode r31mode) const { On 2014/05/16 14:05:50, jbramley wrote: > `IsZeroRegister(...)` would be a better name. Done.
Re: Direction A little more info. The main reason I was trying to get a Set/Get without specifying a size was to get around the possibility of compiler issues that came about previously (https://codereview.chromium.org/203263017/). I was hoping to be able to eventually templatize the Visit* functions so that the SixtyFourBits() decision could be made once. This is a bit harder than I thought due to a lot of shared code and the way the Visit* functions work. My next approach has been putting the branch only once in the Visit* functions and then templatizing any helper functions that access the registers. This however is a decent amount of code. Eventually the Set/Get size functions can be removed and only the fixed memcpy function will be necessary. I could be wrong about why the other cl fails gcc under optimization and the best approach could to just do the changes to SimRegisterBase. It would be a lot less code. Part of that depends on changing value_ from an array to a single entity.
2 tiny drive-by comments... https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... File src/arm64/simulator-arm64.cc (right): https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:888: public: On 2014/05/16 18:11:46, Fritz wrote: > On 2014/05/16 14:05:50, jbramley wrote: > > Is 'public' necessary here? > Yes. Below it is accessed make_unsigned<T>::type Huh? struct = class + public, so this shouldn't be needed, unless there is some arcane template/namespace/overloading/... resolution rule which I don't know. :-) https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1934: const T kMinimumInteger = static_cast<T>(1) << (sizeof(T) * 8 - 1); Even better idea: Use <limits>: http://www.cplusplus.com/reference/limits/numeric_limits/
This should be feature complete. It get's rid of all variable length memcpy when accessing the registers in the simulator. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... File src/arm64/simulator-arm64.cc (right): https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:888: public: On 2014/05/19 08:12:26, Sven Panne wrote: > On 2014/05/16 18:11:46, Fritz wrote: > > On 2014/05/16 14:05:50, jbramley wrote: > > > Is 'public' necessary here? > > Yes. Below it is accessed make_unsigned<T>::type > > Huh? struct = class + public, so this shouldn't be needed, unless there is some > arcane template/namespace/overloading/... resolution rule which I don't know. > :-) Your correct. My mistake. Done. https://codereview.chromium.org/213943002/diff/80001/src/arm64/simulator-arm6... src/arm64/simulator-arm64.cc:1934: const T kMinimumInteger = static_cast<T>(1) << (sizeof(T) * 8 - 1); On 2014/05/19 08:12:26, Sven Panne wrote: > Even better idea: Use <limits>: > http://www.cplusplus.com/reference/limits/numeric_limits/ Done. https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... src/arm64/simulator-arm64.h:76: template<typename T> utils-arm64.h or utils.h? https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... src/arm64/simulator-arm64.h:159: int64_t value_; for the SIMD registers I considered templating this class so that this could be a uint64_t or a uint128_t. Is that acceptable?
https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... src/arm64/simulator-arm64.h:76: template<typename T> On 2014/05/20 19:44:24, Fritz wrote: > utils-arm64.h or utils.h? It's not ARM64-specific, so I'd say utils.h, but there may be conventions about these headers that I'm not aware of. The specialisations in simulator-arm64.cc should be there too. Finally, could you add a comment saying that this is like C++11's `std::make_unsigned` please? https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... src/arm64/simulator-arm64.h:159: int64_t value_; On 2014/05/20 19:44:24, Fritz wrote: > for the SIMD registers I considered templating this class so that this could be > a uint64_t or a uint128_t. > > Is that acceptable? I don't think we can assume that uint128_t is available. However, this should probably work if we pass in a struct of two uint64_t values; you can still memcpy its data. The initialisation list would need updating but that's a problem we can deal with later.
That should be everything. Please let me know if anything needs work. https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... File src/arm64/simulator-arm64.h (right): https://codereview.chromium.org/213943002/diff/120001/src/arm64/simulator-arm... src/arm64/simulator-arm64.h:76: template<typename T> On 2014/05/21 08:02:30, jbramley wrote: > On 2014/05/20 19:44:24, Fritz wrote: > > utils-arm64.h or utils.h? > > It's not ARM64-specific, so I'd say utils.h, but there may be conventions about > these headers that I'm not aware of. > > The specialisations in simulator-arm64.cc should be there too. > > Finally, could you add a comment saying that this is like C++11's > `std::make_unsigned` please? Done.
LGTM from my side, but I'll leave the final word to Jacob and Rodolph (and I'm on vacation the next week and after that, so it would be nice if you could take care of Fritz' work in the meantime).
On 2014/05/22 07:56:42, Sven Panne wrote: > LGTM from my side, but I'll leave the final word to Jacob and Rodolph (and I'm > on vacation the next week and after that, so it would be nice if you could take > care of Fritz' work in the meantime). Right-o, have a good holiday. Thanks for bearing with us, Fritz! lgtm
On 2014/05/22 08:18:57, jbramley wrote: > On 2014/05/22 07:56:42, Sven Panne wrote: > > LGTM from my side, but I'll leave the final word to Jacob and Rodolph (and I'm > > on vacation the next week and after that, so it would be nice if you could > take > > care of Fritz' work in the meantime). > > Right-o, have a good holiday. > > Thanks for bearing with us, Fritz! > > lgtm I really appreciate all the help in getting this to a better and accepted place.
Message was sent while issue was closed.
Committed patchset #8 manually as r21448 (presubmit successful).
Message was sent while issue was closed.
I had to revert this, it broke Webkit tests and arm64.debug.check (note: "debug", not "optdebug"). Please have a look at these and try to reland, the speedup was really nice (2:06min => 1:34min for arm64.release.check).
Message was sent while issue was closed.
Will work on getting this fixed up before Sven gets back. https://codereview.chromium.org/213943002/diff/160001/src/arm64/simulator-arm... File src/arm64/simulator-arm64.cc (right): https://codereview.chromium.org/213943002/diff/160001/src/arm64/simulator-arm... src/arm64/simulator-arm64.cc:843: ((std::numeric_limits<unsignedT>::max() - u1 - carry_in) < u2); The bug introduced is in the computation of C. It errors now when u1 == u2 == max and carry_in == 1 as the computation is not promoted to uint64_t. max - u1 == 0, but u2 + carry_in also == 0 as it outranges max - u1 - carry_in == max which is equal to u2, not less than so C is false. If the promotion occurs then (u2 + carry_in) > (max - u1) and so C is true. I'll work on getting a fix up for this. I'm in and out this week so it might not be immediate.
Can I just open this up? Or do I need to create another CL? https://codereview.chromium.org/213943002/diff/180001/src/arm64/simulator-arm... File src/arm64/simulator-arm64.cc (right): https://codereview.chromium.org/213943002/diff/180001/src/arm64/simulator-arm... src/arm64/simulator-arm64.cc:839: // Compute the C flag Using the logic from the arm simulator here. I don't think it would have changed for arm64.
Message was sent while issue was closed.
Committed patchset #9 manually as r21804 (presubmit successful). |