|
|
Created:
5 years, 6 months ago by Mircea Trofin Modified:
5 years, 4 months ago CC:
v8-dev, titzer, Benedikt Meurer, JF Base URL:
https://chromium.googlesource.com/v8/v8.git@master Target Ref:
refs/pending/heads/master Project:
v8 Visibility:
Public. |
DescriptionWhile working on tuning, I realized that the initial implementation was
suffering from an inefficiency. Turns out, if a range had multiple
conflicts, only the first 2 were considered, after which (regardless of
weight) the remaining would win by default. This would lead in some
scenarios to inefficient (while correct) codegen. To correct this, and
avoid disproportionately large compile time regressions, I implemented
an alternate data structure for storing allocations.
There were a few more important changes from the initial implementation:
- up-front grouping of related live ranges (e.g. phi inputs and output),
and a first-attempt allocation of groups on the same register;
- conflict resolution: I initially believed that splitting blocked
live ranges should rely on conflicts, but after more analysis (including
reference implementations and the scarce documentation around the LLVM
implementation) I changed that to relying on "what's best for the range
itself", letting the weights mechanics of the algorithm converge to the
right decision. As a result, a few benchmarks that were before regressed
are now improved, and a few with serious regressions (15-20%) are now under
the 10% range. I expect these remaining regressions to be easier to
understand and address, given the changes I'm introducing here.
The change includes a few other opportunistic changes. The optimized code printer was outputting instruction offsets in decimal, changed it to hex, to match what perf outputs them as. Also, when producing printouts from the instruction scheduler, blocks now each indicate which loop (if any) they belong to, which helps with perf analysis.
BUG=
Patch Set 1 #
Total comments: 2
Patch Set 2 : signed/unsigned implicit cast fix #
Total comments: 12
Patch Set 3 : addressed feedback #
Total comments: 51
Patch Set 4 : #Patch Set 5 : #
Total comments: 40
Patch Set 6 : #Patch Set 7 : Small fix #
Total comments: 39
Patch Set 8 : #Patch Set 9 : #
Total comments: 36
Messages
Total messages: 20 (4 generated)
mtrofin@chromium.org changed reviewers: + jarin@chromium.org, jvoung@chromium.org, stichnot@chromium.org
https://codereview.chromium.org/1157663007/diff/1/src/compiler/schedule.cc File src/compiler/schedule.cc (right): https://codereview.chromium.org/1157663007/diff/1/src/compiler/schedule.cc#ne... src/compiler/schedule.cc:340: os << " (in loop: B" << block->loop_header()->rpo_number() << ")"; much easier to profile when information whether block is in loop is locally available. https://codereview.chromium.org/1157663007/diff/1/src/disassembler.cc File src/disassembler.cc (right): https://codereview.chromium.org/1157663007/diff/1/src/disassembler.cc#newcode149 src/disassembler.cc:149: out.AddFormatted("%p %4X ", prev_pc, prev_pc - begin); perf outputs offsets in hex. I changed our output to hex, too, makes it easier to profile and peek at output code (with block labels) https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:3477: static_cast<size_t>(code()->InstructionBlockCount())) Is there a reason InstructionBlockCount() is signed? I noticed its implementation intentionally casts the unsigned "size()" of the underlying collection to signed.
First batch of comments. https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.h:485: float weight_; Could you please split the fields into groups by their use in the allocator (three groups: (1) used by linear scan only, (2) used by greedy only, (3) used by both). I am worried that too much bloat might hurt performance of linear scan, the split could help with experiments measuring impact of the bloat. It would also be good for documentation purposes. https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.h:940: LiveRange* operator*() { return pos_->range; } Should not the star operator return a reference rather than a pointer? https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.h:950: bool operator!=(const conflict_iterator& other) { Why is not this is simply negation of the equality? Even if there is reason to do it this way, please, do not use bitwise xor for bools. You can use != instead. https://codereview.chromium.org/1157663007/diff/20001/src/disassembler.cc File src/disassembler.cc (right): https://codereview.chromium.org/1157663007/diff/20001/src/disassembler.cc#new... src/disassembler.cc:149: out.AddFormatted("%p %4X ", prev_pc, prev_pc - begin); Why did you change this? I am worried that tools relying on formatting here might be unhappy.
https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.h:485: float weight_; On 2015/06/03 15:06:16, jarin wrote: > Could you please split the fields into groups by their use in the allocator > (three groups: (1) used by linear scan only, (2) used by greedy only, (3) used > by both). I am worried that too much bloat might hurt performance of linear > scan, the split could help with experiments measuring impact of the bloat. It > would also be good for documentation purposes. Actually, the greedy introduces 3 fields only, the rest end up being used by both. So there's no category 1, at least just yet. https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.h:940: LiveRange* operator*() { return pos_->range; } On 2015/06/03 15:06:16, jarin wrote: > Should not the star operator return a reference rather than a pointer? It returns the item "pointed at" by the iterator, so LiveRange*. https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.h:950: bool operator!=(const conflict_iterator& other) { On 2015/06/03 15:06:16, jarin wrote: > Why is not this is simply negation of the equality? > > Even if there is reason to do it this way, please, do not use bitwise xor for > bools. You can use != instead. Done. https://codereview.chromium.org/1157663007/diff/20001/src/disassembler.cc File src/disassembler.cc (right): https://codereview.chromium.org/1157663007/diff/20001/src/disassembler.cc#new... src/disassembler.cc:149: out.AddFormatted("%p %4X ", prev_pc, prev_pc - begin); On 2015/06/03 15:06:16, jarin wrote: > Why did you change this? > > I am worried that tools relying on formatting here might be unhappy. "perf" (the tool) outputs offsets as hex, so this makes perf analysis much easier. I share your worry though. I'll change to having a flag perhaps?
Another batch of comments. https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:3477: static_cast<size_t>(code()->InstructionBlockCount())) On 2015/06/03 05:25:51, Mircea Trofin wrote: > Is there a reason InstructionBlockCount() is signed? I noticed its > implementation intentionally casts the unsigned "size()" of the underlying > collection to signed. For historical reasons. Previously, we used hand-rolled collections (with int sizes), we moved towards STL later. We only converted to size_t where it was painless. https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/20001/src/compiler/register-a... src/compiler/register-allocator.h:940: LiveRange* operator*() { return pos_->range; } On 2015/06/04 03:38:39, Mircea Trofin wrote: > On 2015/06/03 15:06:16, jarin wrote: > > Should not the star operator return a reference rather than a pointer? > > It returns the item "pointed at" by the iterator, so LiveRange*. Normally, operator* for T::iterator returns T&; operator-> returns T*. https://codereview.chromium.org/1157663007/diff/20001/src/disassembler.cc File src/disassembler.cc (right): https://codereview.chromium.org/1157663007/diff/20001/src/disassembler.cc#new... src/disassembler.cc:149: out.AddFormatted("%p %4X ", prev_pc, prev_pc - begin); On 2015/06/04 03:38:39, Mircea Trofin wrote: > On 2015/06/03 15:06:16, jarin wrote: > > Why did you change this? > > > > I am worried that tools relying on formatting here might be unhappy. > > "perf" (the tool) outputs offsets as hex, so this makes perf analysis much > easier. I share your worry though. I'll change to having a flag perhaps? Maybe use this only locally, or change the formatting in a separate change list. It looks like your change does not format jump targets consistently, so we should not really commit it as is. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:166: os << "H:N|U"; Any reason why these cases are not separated? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:172: os << "H:Use(R x"; How about changing "x" to "?"? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:853: unsigned ipow(unsigned b, unsigned e) { Since you are casting the result to float anyway, you might as well use std::pow. As a bonus, you will avoid overflow problems and recursion. If you really see this high in the profile, then you should rewrite this not to use recursion. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:865: static int GetHintedRegister(LiveRange* range) { Google style guide recommends anonymous namespaces over statics. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:888: DCHECK_NE(0U, size_); Should not it be always positive? Why are you comparing unsigned value to a signed one? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:917: if (pos->pos() > end) break; I am a bit confused - how could it happen that the position is outside the start-end range. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:920: use_count += static_cast<float>(ipow(10, loop_count)); Could you pull the various magic constants to the header file? (And give them some informative names? E.g., LoopNestingWeightMultiplier or some such.) https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:929: // if (is_phi()) { Remove the commented out code. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2148: bool cont = false; Why do you use this awkward assignment to bool variable here? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2680: if (same_storage_and_query && first.IsValid()) Any reason why the operator is not symmetric?
Some very-surface-level comments. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2101: range_count++; How is range_count used? Seems to be a local variable that is only incremented. Was it previously part of debug-only traces? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2715: // is reflow block of comments? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2751: // still reflow comment https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2770: // by reflow comments https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2792: // contained can put "ranges." on the same line https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2808: bool empty() { return storage_.empty(); } const? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:3081: CHECK(!range->IsFixed()); Just checking -- DCHECK or CHECK ? https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:3282: CHECK_EQ(0, queue_.size()); Do you need CHECK or DCHECK is okay? Similar below. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:3947: moves_counter++; Similar to the range_count earlier. This local variable seems to only get incremented but not observed. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:542: bool IsSlotAssigned() { return assigned_slot_ != kUnassignedSlot; } const https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:925: typedef ZoneSet<AllocatedInterval, AllocatedInterval::Comparer>::const_iterator "ZoneSet<AllocatedInterval, AllocatedInterval::Comparer>" also shows up a couple times, so if it would help you could also typedef that? If you want to typedef, may want to narrow its scope to the conflict_iterator though since it's only referenced in the private parts. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:956: conflict_iterator(UseInterval* q, Is this ctor variant used? Not knowing much it seemed a bit odd to take q, s parameters but then ignore them and not assign them to the query_ and storage_ fields. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:1006: bool IsGroup() { return is_grp_; } const https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:1015: can remove some an extra newline here and line 1020/1021 https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:1016: LiveRange* get_range() { const
Incorporated feedback, and also retracted the cosmetic formatting in both disassembler.cc and scheduler.cc, which fit better in a different CL (plus, the dec->hex change should probably hide behind a flag, and definitely extend to other places offsets are used, e.g. jumps) https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:166: os << "H:N|U"; On 2015/06/08 13:32:50, jarin wrote: > Any reason why these cases are not separated? You are right, they should be distinguished, esp. since this is a pretty printer. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:172: os << "H:Use(R x"; On 2015/06/08 13:32:50, jarin wrote: > How about changing "x" to "?"? Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:853: unsigned ipow(unsigned b, unsigned e) { On 2015/06/08 13:32:50, jarin wrote: > Since you are casting the result to float anyway, you might as well use > std::pow. As a bonus, you will avoid overflow problems and recursion. > > If you really see this high in the profile, then you should rewrite this not to > use recursion. Makes sense. I needed it as an int in an earlier incarnation, but not anymore. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:865: static int GetHintedRegister(LiveRange* range) { On 2015/06/08 13:32:49, jarin wrote: > Google style guide recommends anonymous namespaces over statics. Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:888: DCHECK_NE(0U, size_); On 2015/06/08 13:32:49, jarin wrote: > Should not it be always positive? Why are you comparing unsigned value to a > signed one? It could erroneously be 0, if the range has no interval, but the signed and unsigned observation is correct. The reason size_ is an int is so that I can use the negative value as "invalid". I suppose I could use "0" as invalid marker (taking off the table an extra bool, why bloat); but "0" could also be the size of an empty range, so for clarity, I prefer the "-1 -> invalid, 0-> empty, should never be calculated" https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:917: if (pos->pos() > end) break; On 2015/06/08 13:32:49, jarin wrote: > I am a bit confused - how could it happen that the position is outside the > start-end range. Oh! this and the line above are from an incarnation of the weight calculation where I could pass starts and ends within the range. Neither lines are needed anymore. Thanks for the catch! https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:920: use_count += static_cast<float>(ipow(10, loop_count)); On 2015/06/08 13:32:49, jarin wrote: > Could you pull the various magic constants to the header file? (And give them > some informative names? E.g., LoopNestingWeightMultiplier or some such.) Yes - I avoided that because I believe I'll evolve the weight from a scalar to a polynomial. But for now, I'll pull all the constants out, just in case this current model turns out to be more adequate. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:929: // if (is_phi()) { On 2015/06/08 13:32:50, jarin wrote: > Remove the commented out code. Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2101: range_count++; On 2015/06/08 18:59:21, jvoung wrote: > How is range_count used? Seems to be a local variable that is only incremented. > Was it previously part of debug-only traces? Yes :) thanks for the catch, removed it. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2148: bool cont = false; On 2015/06/08 13:32:50, jarin wrote: > Why do you use this awkward assignment to bool variable here? point in time. I'll revert this part of this change https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2680: if (same_storage_and_query && first.IsValid()) On 2015/06/08 13:32:49, jarin wrote: > Any reason why the operator is not symmetric? I believe it is: if same_storage_and_query is true, it means first.IsValid()N and second.IsValid(). No point checking the second.IsValid(). But I can add it if we feel it improved readability. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2715: // is On 2015/06/08 18:59:21, jvoung wrote: > reflow block of comments? I lost my faith in git cl format :) https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2751: // still On 2015/06/08 18:59:21, jvoung wrote: > reflow comment Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2770: // by On 2015/06/08 18:59:21, jvoung wrote: > reflow comments Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2792: // contained On 2015/06/08 18:59:21, jvoung wrote: > can put "ranges." on the same line Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2808: bool empty() { return storage_.empty(); } On 2015/06/08 18:59:21, jvoung wrote: > const? Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:3081: CHECK(!range->IsFixed()); On 2015/06/08 18:59:21, jvoung wrote: > Just checking -- DCHECK or CHECK ? CHECK, because attempting to evict a fixed range is very very flawed. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:3282: CHECK_EQ(0, queue_.size()); On 2015/06/08 18:59:21, jvoung wrote: > Do you need CHECK or DCHECK is okay? Similar below. I prefer check here, and so below. This executes once per function, so the cost shouldn't register, so I prefer to "err" towards better bug detection. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:3947: moves_counter++; On 2015/06/08 18:59:21, jvoung wrote: > Similar to the range_count earlier. This local variable seems to only get > incremented but not observed. Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:542: bool IsSlotAssigned() { return assigned_slot_ != kUnassignedSlot; } On 2015/06/08 18:59:22, jvoung wrote: > const Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:925: typedef ZoneSet<AllocatedInterval, AllocatedInterval::Comparer>::const_iterator On 2015/06/08 18:59:22, jvoung wrote: > "ZoneSet<AllocatedInterval, AllocatedInterval::Comparer>" also shows up a couple > times, so if it would help you could also typedef that? > > If you want to typedef, may want to narrow its scope to the conflict_iterator > though since it's only referenced in the private parts. Done. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:956: conflict_iterator(UseInterval* q, On 2015/06/08 18:59:22, jvoung wrote: > Is this ctor variant used? Not knowing much it seemed a bit odd to take q, s > parameters but then ignore them and not assign them to the query_ and storage_ > fields. Good catch, it serves no purpose. Removed it.
Next batch of comments. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:888: DCHECK_NE(0U, size_); On 2015/06/09 02:56:02, Mircea Trofin wrote: > On 2015/06/08 13:32:49, jarin wrote: > > Should not it be always positive? Why are you comparing unsigned value to a > > signed one? > > It could erroneously be 0, if the range has no interval, but the signed and > unsigned observation is correct. The reason size_ is an int is so that I can use > the negative value as "invalid". I suppose I could use "0" as invalid marker > (taking off the table an extra bool, why bloat); but "0" could also be the size > of an empty range, so for clarity, I prefer the "-1 -> invalid, 0-> empty, > should never be calculated" I understand that size_ == -1 means invalid, but in this particular place, it should be always valid, no? I thought this should say DCHECK(0 < size_) (or DCHECK_LT(0, size_)). https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:2653: MoveToEnd(); Can this ever happen? It seems that all methods invalidate the iterator if at end. https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.h:926: interval_iterator; The interval_iterator typedef seems to be local to the conflict_iterator class. Can we move it there? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2630: if (pos_ == storage_->end()) { Overall, this gymnastics of MoveToEnd seem to be quite messy. Why could not you always insist that storage_ is not null, and consider the iterator finished if pos_ == storage_->end(). (Moreover, the invariant should be that the current allocation interval is overlapping with the current use interval or the allocation interval is storage_->end() and query_ is null; this would mean that pos_ is at end iff query is null). In this particular case, the caller should make sure that the iterator is not finished yet, so perhaps we could just DCHECK here. At a higher level, this whole class is quite complex for what it does. E.g., its is not clear to me what is the invariant for entering the ++ operator. The last loop of InitializeForNewQuery suggests that the query should be the last use interval that overlaps with pos(?). Indeed if it was not then the InitializeForNewQuery method could set the same pos as the current pos, so the iterator would return the same allocation interval twice. However, neither the ++ operator nor the InitializeForNewQuery seem to maintain this invariant: - when the ++ operator advances pos and it still intersects, it does not try to move the query as far as possible. - when InitializeForNewQuery's lower_bound lookup succeeds with the same start (line 2699), it also does not try to advance the query. Am I missing something? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2657: first.storage_ == second.storage_ && first.query_ == second.query_; Does it even make sense to compare iterators with different storage? Could not you just DCHECK that the storage is the same. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2697: pos_ = storage_->lower_bound(AsAllocatedInterval(q_start)); Would not the upper_bound be easier to handle here? Then you can just MoveToEnd and return if pos_ is at the beginning, or take --pos_ otherwise. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2716: break; Why can't you just say "if (Intersects(...)) break;"? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2765: void insert(LiveRange* range) { Style nit: why did you change the name to lower case? Only trivial getters should have lower case names. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2941: bool CanAddToGroup(LiveRange* r, LiveRangeGroup* grp) { Style nit: move to an anonymous name space (or make this a method of LiveRangeGroup, where this seems to belong). https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2945: ret = false; How about "return false" here and get rid of the "ret" variable? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2953: void TryMergeGroups(LiveRange* r1, LiveRange* r2) { It is a bit funny that its name is TryMergeGroups but it takes live ranges. Could not it take groups instead? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2969: r1->SetGroup(r2->group()); Is this necessary? r1 should be part of r1->group()->ranges(), so it should be set by the SetGroup call above. Perhaps DCHECK(r1->group() == r2->group()) would be enough? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3004: if (r == nullptr || r->IsEmpty() || r->kind() != mode()) continue; Are all these cases really possible? It seems to me that the nullptr case should not happen. If that is the case, please DCHECK rather than silently skip. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3070: // now that we have groups? I think some hinting is important for both allocators, e.g., hinting from fixed register use. However, most of the hinting is indeed useless for the greedy algorithm because it assumes the hints propagate forward. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3201: int i = static_cast<int>(a); This code is both brittle and hard to read. Please, use switch-case. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3216: if (HandleSpillOperands(range, &reminder)) { reminder -> remainder https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3227: Next(state)) { You mean "state = Next(state)"? I am puzzled why this code does not hang. Also, it seems the Attempt enum is not used in any interesting way; basically, it just makes the loop run two iterations in a very roundabout way. Perhaps we could just have a loop from 0 to 2? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3254: CHECK_EQ(0, allocations_.size()); CHECK(queue_.empty()); CHECK(allocations_.empty()); https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3276: auto current_pair = queue_.top(); Could you replace the "auto" with a real type here (and elsewhere where the type is not immediately obvious)? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3309: } I am a bit confused about what this is doing. It looks like it collects all spilled sub-ranges of live range. If that is the case, it is quite dangerous because we only spill in the beginning of the top level range, so if a non-spilled subrange is reused for a different live range, it could overwrite the spill slot, which would be bad. Perhaps I am missing something? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3368: if (next_mandatory_use == nullptr) Please, put the then- and else- clauses into braces. We generally omit the braces only if the then-clause fits on the same line as the condition. https://codereview.chromium.org/1157663007/diff/80001/src/disassembler.cc File src/disassembler.cc (right): https://codereview.chromium.org/1157663007/diff/80001/src/disassembler.cc#new... src/disassembler.cc:297: } Please, do not change the end of the scope, it will confuse git cl format.
https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/40001/src/compiler/register-a... src/compiler/register-allocator.cc:888: DCHECK_NE(0U, size_); On 2015/06/09 15:00:41, jarin wrote: > On 2015/06/09 02:56:02, Mircea Trofin wrote: > > On 2015/06/08 13:32:49, jarin wrote: > > > Should not it be always positive? Why are you comparing unsigned value to a > > > signed one? > > > > It could erroneously be 0, if the range has no interval, but the signed and > > unsigned observation is correct. The reason size_ is an int is so that I can > use > > the negative value as "invalid". I suppose I could use "0" as invalid marker > > (taking off the table an extra bool, why bloat); but "0" could also be the > size > > of an empty range, so for clarity, I prefer the "-1 -> invalid, 0-> empty, > > should never be calculated" > > I understand that size_ == -1 means invalid, but in this particular place, it > should be always valid, no? I thought this should say DCHECK(0 < size_) (or > DCHECK_LT(0, size_)). It could stay 0 if the loop is never traversed. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2630: if (pos_ == storage_->end()) { On 2015/06/09 15:00:41, jarin wrote: > Overall, this gymnastics of MoveToEnd seem to be quite messy. Why could not you > always insist that storage_ is not null, and consider the iterator finished if > pos_ == storage_->end(). (Moreover, the invariant should be that the current > allocation interval is overlapping with the current use interval or the > allocation interval is storage_->end() and query_ is null; this would mean that > pos_ is at end iff query is null). > > In this particular case, the caller should make sure that the iterator is not > finished yet, so perhaps we could just DCHECK here. > > At a higher level, this whole class is quite complex for what it does. > > E.g., its is not clear to me what is the invariant for entering the ++ operator. > The last loop of InitializeForNewQuery suggests that the query should be the > last use interval that overlaps with pos(?). Indeed if it was not then the > InitializeForNewQuery method could set the same pos as the current pos, so the > iterator would return the same allocation interval twice. However, neither the > ++ operator nor the InitializeForNewQuery seem to maintain this invariant: > - when the ++ operator advances pos and it still intersects, it does not try to > move the query as far as possible. > - when InitializeForNewQuery's lower_bound lookup succeeds with the same start > (line 2699), it also does not try to advance the query. > > Am I missing something? I'll take the suggestion with storage_, some complexity here is likely due to an earlier design that I canned since, where "end()" was singular. I think what you're suggesting will simplify things. For the invariants question. When we enter ++, we know query_ is the last "query" use interval to overlap with pos. It could be that this query overlaps with the next allocated interval, ++pos. So we don't move query just yet. I think you're correct about the second point, that's a bug. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2657: first.storage_ == second.storage_ && first.query_ == second.query_; On 2015/06/09 15:00:41, jarin wrote: > Does it even make sense to compare iterators with different storage? Could not > you just DCHECK that the storage is the same. That makes sense, thanks! https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2697: pos_ = storage_->lower_bound(AsAllocatedInterval(q_start)); On 2015/06/09 15:00:42, jarin wrote: > Would not the upper_bound be easier to handle here? Then you can just MoveToEnd > and return if pos_ is at the beginning, or take --pos_ otherwise. upper_bound would never return a pos_ starting at q_start, if I understand your suggestion correctly, but regardless, your suggestion led me to something cleaner than the unhappy mess before. Thanks! https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2716: break; On 2015/06/09 15:00:42, jarin wrote: > Why can't you just say "if (Intersects(...)) break;"? No reason - good catch. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2765: void insert(LiveRange* range) { On 2015/06/09 15:00:42, jarin wrote: > Style nit: why did you change the name to lower case? Only trivial getters > should have lower case names. to emulate the collections' casing - but happy to go either way. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2941: bool CanAddToGroup(LiveRange* r, LiveRangeGroup* grp) { On 2015/06/09 15:00:42, jarin wrote: > Style nit: move to an anonymous name space (or make this a method of > LiveRangeGroup, where this seems to belong). Done. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2945: ret = false; On 2015/06/09 15:00:41, jarin wrote: > How about "return false" here and get rid of the "ret" variable? Done. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2953: void TryMergeGroups(LiveRange* r1, LiveRange* r2) { On 2015/06/09 15:00:41, jarin wrote: > It is a bit funny that its name is TryMergeGroups but it takes live ranges. > Could not it take groups instead? Done. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2969: r1->SetGroup(r2->group()); On 2015/06/09 15:00:42, jarin wrote: > Is this necessary? r1 should be part of r1->group()->ranges(), so it should be > set by the SetGroup call above. Perhaps DCHECK(r1->group() == r2->group()) would > be enough? Done. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3004: if (r == nullptr || r->IsEmpty() || r->kind() != mode()) continue; On 2015/06/09 15:00:42, jarin wrote: > Are all these cases really possible? It seems to me that the nullptr case should > not happen. If that is the case, please DCHECK rather than silently skip. A similar test happens in LinearScanAllocator::AllocateRanges. I believe it may have to do with how the live ranges get built and joined - I haven't investigated deeper this area, I remember having seen null values though. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3070: // now that we have groups? On 2015/06/09 15:00:42, jarin wrote: > I think some hinting is important for both allocators, e.g., hinting from fixed > register use. However, most of the hinting is indeed useless for the greedy > algorithm because it assumes the hints propagate forward. I wonder if more elaborate grouping will help there - I'll leave the comment in so I remember to tackle this in a next change. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3201: int i = static_cast<int>(a); On 2015/06/09 15:00:41, jarin wrote: > This code is both brittle and hard to read. Please, use switch-case. Done. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3216: if (HandleSpillOperands(range, &reminder)) { On 2015/06/09 15:00:41, jarin wrote: > reminder -> remainder Done. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3227: Next(state)) { On 2015/06/09 15:00:42, jarin wrote: > You mean "state = Next(state)"? > > I am puzzled why this code does not hang. > > Also, it seems the Attempt enum is not used in any interesting way; basically, > it just makes the loop run two iterations in a very roundabout way. Perhaps we > could just have a loop from 0 to 2? Ya, there's a bug, it should be state = Next(state); it probably got lucky and all scenarios had allocating groups. Thanks for the catch. As for a for (0..2) instead - it'd be functionally equivalent, but I believe harder to read. E.g. why 2? I'd argue that the enum-based design makes it clear: it's not a "loop" per se, it's just 2 states. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3254: CHECK_EQ(0, allocations_.size()); On 2015/06/09 15:00:41, jarin wrote: > CHECK(queue_.empty()); > CHECK(allocations_.empty()); Done. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3276: auto current_pair = queue_.top(); On 2015/06/09 15:00:42, jarin wrote: > Could you replace the "auto" with a real type here (and elsewhere where the type > is not immediately obvious)? Done. I'll keep an eye for other instances of this. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3309: } On 2015/06/09 15:00:41, jarin wrote: > I am a bit confused about what this is doing. It looks like it collects all > spilled sub-ranges of live range. If that is the case, it is quite dangerous > because we only spill in the beginning of the top level range, so if a > non-spilled subrange is reused for a different live range, it could overwrite > the spill slot, which would be bad. Perhaps I am missing something? It merges the spill ranges of live ranges belonging to a group. It's the moral equivalent of TryReuseSpillForPhi. While it does visit all (including split) ranges, it picks the spill ranges from the parent always. But I may be missing something? https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3368: if (next_mandatory_use == nullptr) On 2015/06/09 15:00:41, jarin wrote: > Please, put the then- and else- clauses into braces. We generally omit the > braces only if the then-clause fits on the same line as the condition. Ugh... yes... llvm-ism
https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3227: Next(state)) { On 2015/06/09 15:00:42, jarin wrote: > You mean "state = Next(state)"? > > I am puzzled why this code does not hang. > > Also, it seems the Attempt enum is not used in any interesting way; basically, > it just makes the loop run two iterations in a very roundabout way. Perhaps we > could just have a loop from 0 to 2? Actually, there is a return at the end of the loop (and no continue), so I am wondering why there is a loop at all.
bmeurer@chromium.org changed reviewers: + bmeurer@chromium.org
First round of comments. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:127: auto hint = range->FirstHintPosition(®); Don't use auto here. Use the correct type, and ideally even use const. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:354: InvalidateWeightAndSize(); Can we initialize properly instead of (mis)using an InvalidateSomething method in the constructor? https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:888: auto start = Start(); Don't use auto here. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:909: for (; pos != nullptr; pos = pos->next()) { That looks kinda inefficient to me. Seems to be linear in the size of the program in the worst case. What does LLVM do here? https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2309: UsePosition* hint = current->FirstHintPosition(&hint_register); Why did you change this? https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2633: Invalidate(); Why invalidate here? https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2639: Invalidate(); Same here https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2681: auto q_start = query_->start(); No auto here. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:436: LiveRangeGroup* group() { return group_; } Nit: Make this method const. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:440: DCHECK(weight_ >= 0.0f); Trivial getters should not have DCHECK's in them. Check on assignment or introduce a setter. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:444: unsigned size() { This getter does perform work, so it's not a simple getter, hence it must not look like one. Please rename to GetSize(). https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:521: ZoneSet<LiveRange*> ranges_; I'm not sure ZoneSet is such a great datastructure for this purpose. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:924: struct Comparer : public std::binary_function<AllocatedInterval, Why do you need this Comparer class? Can't you just define operator< on AllocatedInterval directly? https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:937: // Iterator over allocated live ranges (CoalescedLiveRanges) that conflict with This whole iterator looks quite complex. Maybe we should have a better data structure that does not require such a complex iterator instead? https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:941: friend class CoalescedLiveRanges; friend classes should be listed in the private section. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:947: LiveRange* operator*() { return pos_->range; } This should return a LiveRange&, not a LiveRange*. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:950: bool IsEnd() const { STL iterators compare against some end. Don't try to add a wrapper for that, which might get out of sync. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:989: Invalidate(); Initialize properly instead of calling Invalidate in the constructor. https://codereview.chromium.org/1157663007/diff/120001/src/disassembler.cc File src/disassembler.cc (right): https://codereview.chromium.org/1157663007/diff/120001/src/disassembler.cc#ne... src/disassembler.cc:297: } Please undo this change, especially since this file is otherwise unchanged.
Thanks for the feedback - updated. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:354: InvalidateWeightAndSize(); On 2015/06/11 05:30:29, Benedikt Meurer wrote: > Can we initialize properly instead of (mis)using an InvalidateSomething method > in the constructor? It would be inefficient to perform weight and size calculations at this stage. In fact, this reminds me, certain live range operations - like handling spill operands - can be done upfront before even worrying about (calculating) size and weights. I plan on working on that in upcoming changes, so I'd prefer not to change something that sets me in the right direction. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:888: auto start = Start(); On 2015/06/11 05:30:28, Benedikt Meurer wrote: > Don't use auto here. Done. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:909: for (; pos != nullptr; pos = pos->next()) { On 2015/06/11 05:30:28, Benedikt Meurer wrote: > That looks kinda inefficient to me. Seems to be linear in the size of the > program in the worst case. What does LLVM do here? Mozilla, at least, does something similar. LLVM's code is more complicated, but the high level description of the algorithm calls for a calculation of use densities, and factoring in loop-ness is important in ensuring hot ranges don't lose to colder ones. We could, eventually, change the LiveRange model and the stages computing them to pre-calculate certain components, for example the loop-ness of use positions, if that information is readily available earlier. But I believe that would be too much churn at this stage. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2309: UsePosition* hint = current->FirstHintPosition(&hint_register); On 2015/06/11 05:30:28, Benedikt Meurer wrote: > Why did you change this? Probably some experimentations, thanks for the catch - there's no reason for this to be changed. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2633: Invalidate(); On 2015/06/11 05:30:28, Benedikt Meurer wrote: > Why invalidate here? Because we're at the end, we're out of matching conflicts. I want the end iterator to have a canonical shape - valid storage, null query and past-end pos - so that == and != work well. Hence calling Invalidate. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2639: Invalidate(); On 2015/06/11 05:30:28, Benedikt Meurer wrote: > Same here See above. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.cc:2681: auto q_start = query_->start(); On 2015/06/11 05:30:29, Benedikt Meurer wrote: > No auto here. Done. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:436: LiveRangeGroup* group() { return group_; } On 2015/06/11 05:30:29, Benedikt Meurer wrote: > Nit: Make this method const. Done. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:440: DCHECK(weight_ >= 0.0f); On 2015/06/11 05:30:29, Benedikt Meurer wrote: > Trivial getters should not have DCHECK's in them. Check on assignment or > introduce a setter. The point is that I want to make sure no attempt to access weight is made without pre-calculating it. I'll call it GetWeight then. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:444: unsigned size() { On 2015/06/11 05:30:30, Benedikt Meurer wrote: > This getter does perform work, so it's not a simple getter, hence it must not > look like one. Please rename to GetSize(). Done. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:521: ZoneSet<LiveRange*> ranges_; On 2015/06/11 05:30:30, Benedikt Meurer wrote: > I'm not sure ZoneSet is such a great datastructure for this purpose. What would you propose as an alternative? Full disclosure, I did consider a R x I matrix - where "R" is the number of physical registers and "I" is the number of positions. That would simplify a number of things, especially the conflict iterator. Lookup would be O(1), too - rather than O(log(n)) right now. However, the memory cost could be high. Using some codebases we compile with subzero - so a similar in spirit scenario (C code - if you squint and think of asm.js as really "C"), we saw programs with 100K IR instructions coming into the allocator. Granted, different IR language, but I feel we could use it as a reasonable hint. So for that scenario, since we here have 4 positions per instruction, it'd mean ~400K per register, so then a few megs just for this structure. That worried me a bit. I didn't continue with the analysis - there is a cutoff point after which a set may also become more expensive than a dumb matrix. My priority still is getting the codegen to a good place where it shows the value of the algorithm, so I postponed these decisions, and I think the APIs through which the rest of the allocator uses this storage are abstract enough for me to swap implementations later and analyze this area better. The alternative I had with the initial checkin before was inspired by Mozilla and turned out to be *really* bad. Mozilla uses a splay tree, but then they chance on there being no more than 1 or 2 conflicts per candidate live range. If that's not the case, the range loses automatically. Seemed suboptimal to me, and also a choice that makes understanding the effects of the algorithm on large programs hard - you can't rely on "best choice wins", it is rather "best choice, or sometimes not, if more than a few worse choices are conflicting" :) Using that implementation choice, compile time for programs like zlib ballooned to over 5 minutes (!!) from a few hundred ms. I was planning actually to talk to the author (on the Mozilla side), at our last social, to get a better understanding of the tradeoffs he found - but he had a conflict, so next time. I haven't analyzed much LLVM's, they call it a Matrix though, interestingly enough. At the end of the day, with this structure, I get a compile time in the neighborhood of what linear does - for complicated programs (like zlib), the difference is noticeable (100 vs 148 ms spent in the turbofan pipeline, with regalloc alone being 3x more expensive - from 18ms to 66ms), but it's not 5 minutes, and at this stage, all I care about is codegen quality - gathering data to allow us to analyse the value of this allocator, for a scenario that's mainly AoT compilation. Not saying compile time doesn't matter - just prioritizing it after we feel happy enough with the codegen. Sorry for the long rant :) But if you have suggestions for improvements here, let's chat, but I'd prefer separating that from this particular CL - I need to land this to unblock more changes in the code quality side of things - but this structure, I feel, will be the battleground for faster compile time, so I care about it quite a lot. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:924: struct Comparer : public std::binary_function<AllocatedInterval, On 2015/06/11 05:30:30, Benedikt Meurer wrote: > Why do you need this Comparer class? Can't you just define operator< on > AllocatedInterval directly? Oh, I see - because std::less would then just use that? Sure - this way though I could have other intrinsic comparers to AllocatedInterval, while in the scenario of the storage, I want this one. I admit, that's all hypothetical, though. At the same time, I see no inefficiency with the current design, and, subjectively, I find it clearer. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:937: // Iterator over allocated live ranges (CoalescedLiveRanges) that conflict with On 2015/06/11 05:30:29, Benedikt Meurer wrote: > This whole iterator looks quite complex. Maybe we should have a better data > structure that does not require such a complex iterator instead? Possibly, please see long rant above :) https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:941: friend class CoalescedLiveRanges; On 2015/06/11 05:30:30, Benedikt Meurer wrote: > friend classes should be listed in the private section. Done. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:947: LiveRange* operator*() { return pos_->range; } On 2015/06/11 05:30:29, Benedikt Meurer wrote: > This should return a LiveRange&, not a LiveRange*. The element of the set if "LiveRange*", so dereferencing the iterator produces a value typed as that. It's akin to how for a std::set<LiveRange*>, dereferencing the iterator also returns LiveRange*. https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:950: bool IsEnd() const { On 2015/06/11 05:30:29, Benedikt Meurer wrote: > STL iterators compare against some end. Don't try to add a wrapper for that, > which might get out of sync. I'm not sure how it could it get out of sync (and what it's sync-ed to). The trade-off here is: similarity with reusable collections, vs some code simplicity savings in a few places. If we removed IsEnd, we'd have to carry around the "right" end, however all scenarios we're using this iterator use it in a very constrained manner - no mutations for example. But, if we feel similarity with STL trumps that, happy to change it accordingly (removing IsEnd). I went back and forth on this one, so any nudge will do :) https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:989: Invalidate(); On 2015/06/11 05:30:29, Benedikt Meurer wrote: > Initialize properly instead of calling Invalidate in the constructor. Not sure what you mean by "proper" initialization, but the goal here is to have an "end" constructor for a given storage. I'm using "Invalidate" as a canonical way of achieving that. What trade-offs am I missing? https://codereview.chromium.org/1157663007/diff/120001/src/disassembler.cc File src/disassembler.cc (right): https://codereview.chromium.org/1157663007/diff/120001/src/disassembler.cc#ne... src/disassembler.cc:297: } On 2015/06/11 05:30:30, Benedikt Meurer wrote: > Please undo this change, especially since this file is otherwise unchanged. This (well, undoing it) makes git cl format complain, actually... I'll figure out the git gymnastics to get out of that.
bmeurer@chromium.org changed reviewers: + danno@chromium.org
This is actually a big CL, which I'm not sure we can get into landable state soonish. I suspect that we could proceed a lot faster if we split the work into manageable parts, something along these lines: 1.) Do the minimal amount of work required to make the "greedy allocator bot" green (i.e. passing all our current tests). 2.) Implement the basic algorithm (as mentioned in http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html) without any fancy heuristics, but with the interface introduced by LLVM to make the splitting/spilling decisions. 3.) Stabilize the basic algorithm and do measurements against LS. 4.) Based on that, split the basic allocator from LS and use appropriate data structures instead of trying to share all datastructures with LS. 5.) Start implementing the greedy heuristics on top of the basic allocator, just like LLVM did. 6.) Weaponize. WDYT?
On 2015/06/11 08:15:43, Benedikt Meurer wrote: > This is actually a big CL, which I'm not sure we can get into landable state > soonish. I suspect that we could proceed a lot faster if we split the work into > manageable parts, something along these lines: > > 1.) Do the minimal amount of work required to make the "greedy allocator bot" > green (i.e. passing all our current tests). > 2.) Implement the basic algorithm (as mentioned in > http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html) without > any fancy heuristics, but with the interface introduced by LLVM to make the > splitting/spilling decisions. > 3.) Stabilize the basic algorithm and do measurements against LS. > 4.) Based on that, split the basic allocator from LS and use appropriate data > structures instead of trying to share all datastructures with LS. > 5.) Start implementing the greedy heuristics on top of the basic allocator, just > like LLVM did. > 6.) Weaponize. > > WDYT? One more thing - it would be very useful if we could have a similar split between the mechanics of managing live ranges and the prioritization/splitting decisions, similar to what LLVM has (see http://llvm.org/doxygen/classllvm_1_1RegAllocBase.html for the interface).
Some comments I accumulated yesterday. I realize that a lot of this code will be changed, so these are more like suggestions for how the new code should like. https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:2697: pos_ = storage_->lower_bound(AsAllocatedInterval(q_start)); On 2015/06/10 05:37:11, Mircea Trofin wrote: > On 2015/06/09 15:00:42, jarin wrote: > > Would not the upper_bound be easier to handle here? Then you can just > MoveToEnd > > and return if pos_ is at the beginning, or take --pos_ otherwise. > > upper_bound would never return a pos_ starting at q_start, if I understand your > suggestion correctly, but regardless, your suggestion led me to something > cleaner than the unhappy mess before. Thanks! Actually, you did what I had in mind (and fixed the gaps in my thinking). https://codereview.chromium.org/1157663007/diff/80001/src/compiler/register-a... src/compiler/register-allocator.cc:3301: CHECK(top_sp_range != nullptr); CHECK -> DCHECK top_sp_range -> top_spill_range We generally do not like abbreviations unless it is crystal clear what they mean. (Style guide says: "Function names, variable names, and filenames should be descriptive; eschew abbreviation.") https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:1028: bool TryAllocateGroupAtRegister(unsigned reg, LiveRangeGroup* grp); grp -> group https://codereview.chromium.org/1157663007/diff/120001/src/compiler/register-... src/compiler/register-allocator.h:1031: bool IsGroup() { return is_grp_; } is_grp_ -> is_group_ https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:349: group_(nullptr) { Initialize size_ and weight_ here, please. It is safer. (It will be compiled away if you worry about performance.) https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:852: for (auto c = first_pos(); c != nullptr; c = c->next()) { Use the real type here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:854: if (CanBeSpilled(c->pos())) { I suppose you wanted to use your new CanBeSplit method here (and below). https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:910: auto loop_count = GetContainingLoopCount( Use the real type here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:2631: if (pos_ == storage_->end()) { The iterator should always be in canonical form, no? Moreover, the ++ operator should not be called when already at the end, so perhaps just do "DCHECK(pos_ != storage_->end())" here? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:2796: ZoneSet<AllocatedInterval, AllocatedInterval::Comparer> storage_; storage_ -> intervals_, perhaps? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3017: if (range->group() != nullptr) { Package the range->group() == nullptr check into the "if (...) continue;" above. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3018: auto grp = range->group(); grp -> group https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3044: while (!c.IsEnd() && *c == range) ++c; When can it happen that the iterator gives the same element several times in a row? This just should not happen. (If it does happen, the iterator should be fixed.) https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3066: // now that we have groups? Does it make any performance difference if you keep this code? If no, please remove? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3099: free_reg = FindRegisterToEvictForRange(all_conflicts, my_weight); What happens if you do not evict here? (And spill instead, just like the basic algorithm from LLVM does?) Could you measure the performance impact? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3120: bool GreedyAllocator::TryAllocateGroup(LiveRangeGroup* grp) { Do you have any performance numbers on group allocation? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3134: auto first_conflict = allocations_[reg]->conflicts_begin(r); Could we have a method on CoalescedLiveRanges that just returns bool if a given range conflicts with the supplied live range? That way, you would not have to mess with the conflict_iterator here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3147: conflict_iterator all_conflicts[RegisterConfiguration::kMaxDoubleRegisters], Sized array argument is a bit esoteric construct in C++ (it passes just the reference, not the entire array), and in fact some earlier versions of Visual C++ had trouble with this. Could you omit the size of the array here? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3152: float w = CalculateMaxSpillWeight(all_conflicts[i], w -> weight https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3170: float grp_counter_weight = 0.0; grp -> group. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3174: auto w = CalculateMaxSpillWeight(first_conflict, Do not use auto here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3179: if (grp_counter_weight >= competing) continue; Could you fold the grp_counter_weight >= competing into the next if statement (i.e., say "if (grp_counter_weight < competing && grp_counter_weight < smallest_weight)" there? In fact, looking at this you should be able to start with smallest_weight = competing and get rid of the check above, no? The same question applies to the method above. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3190: void GreedyAllocator::AllocateGroup(LiveRangeGroup* grp) { grp -> group https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3206: float grp_weight = -1.0; grp_weight -> group_weight https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3212: if (grp_weight < 0.0) { If you really insist on having the same types for left and right side, use 0.0f here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3263: auto current = element.get_range(); No need for the intermediate variable. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3277: if (!r->spilled()) continue; Fold into the previous if? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3278: auto top = r->TopLevel(); Use the real type here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3282: CHECK(top_sp_range != nullptr); DCHECK? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3303: auto start = current->Start(); Explicit types, please. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.cc:3389: Enqueue(range); I am wondering whether queuing a group for tail and range would help with our spilling around calls problem (the hope is that the group would get the same register, so moving stuff around would not be necessary). https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:941: conflict_iterator() : storage_(nullptr) {} Initialize all fields here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:969: if (a_start < b_start) return a_end > b_start; Out of curiosity, cannot you just leave out the "if (a_start == b_start) return true" line and replace "a_start < b_start" with "a_start <= b_start"? This special casing for equality seems a bit funny. Perhaps you can even expand the recursive call, so the body would be: if (a_start <= b_start) return b_start < a_end; return b_start <= a_start && a_start < b_end; https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:1031: struct PQueueElem { PQueueElem -> QueueElement (or PriorityQueueElement). I am actually wondering whether you could not include the priority field in the struct, so that it is really the element. This would save you some std::pair exercises. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:1043: if (is_grp_) return nullptr; The callers should (and do) always consult IsGroup, so it is safer to just do "DCHECK(!IsGroup()); return storage_.range_;" here. https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:1049: if (is_grp_) return storage_.grp_; Similarly, "DCHECK(IsGroup()); return storage_.grp_;". https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:1061: void GroupAndEnqueue(); GroupAndEnqueue -> EnqueuePhiGroups? https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:1075: PQueue; PQueue -> PriorityQueue https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:1094: float CalculateMaxSpillWeight(const TIter& begin, const TIter& end); Static?
svenpanne@chromium.org changed reviewers: + svenpanne@chromium.org
Quick DBC... https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... File src/compiler/register-allocator.h (right): https://codereview.chromium.org/1157663007/diff/160001/src/compiler/register-... src/compiler/register-allocator.h:969: if (a_start < b_start) return a_end > b_start; On 2015/06/12 04:09:12, jarin wrote: > Out of curiosity, cannot you just leave out the "if (a_start == b_start) return > true" line and replace "a_start < b_start" with "a_start <= b_start"? > > This special casing for equality seems a bit funny. > > Perhaps you can even expand the recursive call, so the body would be: > > if (a_start <= b_start) return b_start < a_end; > return b_start <= a_start && a_start < b_end; Hmmm, this looks still too complicated, normally just a single line is needed: return a_start <= b_end && b_start <= a_end; I'm not sure about the "open-ness" of our intervals, but I'm quite sure that this condition can be tweaked to any convention. |