|
|
Description[turbofan] Fix phi-hinting problem with deferred blocks
Previously, turbofan selected the gap use from first predecessor block when
hinting a phi, unless that block was deferred, in which case the gap move from
the first non-deferred predecessor block was chosen.
This strategy didn't guarantee that an important invariant was maintained: the
predecessor blocks chosen for hinting phis must preceed the phi's block in the
rpo ordering. In most cases the strategy worked, since graphs generated by the
AstGraphBuilder and existing stubs just happened to always generate schedules
where this rpo ordering property for the first predecessor block, but it is
quite possible to generate a code stub by hand that doesn't have this property
(see included test case).
After this CL, the allocator chooses either the the first non-deferred
"rpo-preceeding" block to be the hinting block, or the first deferred
"rpo-preceeding" block if that doesn't exist. In all previously-existing code,
this behavior is the same as the original algorithm, but has the benefit of not
failing in the register allocator in hand-crafted stubs where all the
"rpo-preceeding" predecessors are all in deferred code.
Committed: https://crrev.com/afb0e7a4bded2b0d79558c1acdb039cb8955220c
Cr-Commit-Position: refs/heads/master@{#36689}
Patch Set 1 #Patch Set 2 : Tweaks #
Total comments: 9
Patch Set 3 : Review feedback #Patch Set 4 : Rebase #Patch Set 5 : Feedback #Patch Set 6 : Fix stuff #Patch Set 7 : Rebase #Messages
Total messages: 20 (7 generated)
danno@chromium.org changed reviewers: + bmeurer@chromium.org, mtrofin@chromium.org
PTAL
LGTM, after comments are addressed. https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:2188: // in the rpo ordering. Prefer non-deferred blocks. Could you add a comment as to why the rpo ordering matters? https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:2196: if (!predecessor_block->IsDeferred()) break; Before, if the phi block was deferred, and all its predecessors were also deferred, there would still be a hint. That may matter for code size. Perhaps add to the condition here a "block->IsDeferred() || " ? https://codereview.chromium.org/2030463003/diff/20001/test/cctest/compiler/te... File test/cctest/compiler/test-code-stub-assembler.cc (right): https://codereview.chromium.org/2030463003/diff/20001/test/cctest/compiler/te... test/cctest/compiler/test-code-stub-assembler.cc:414: USE(m.GenerateCode()); Why not CHECK there was code generated?
please take another look https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:2188: // in the rpo ordering. Prefer non-deferred blocks. On 2016/06/01 15:02:58, Mircea Trofin wrote: > Could you add a comment as to why the rpo ordering matters? Done. https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:2196: if (!predecessor_block->IsDeferred()) break; On 2016/06/01 15:02:58, Mircea Trofin wrote: > Before, if the phi block was deferred, and all its predecessors were also > deferred, there would still be a hint. That may matter for code size. Perhaps > add to the condition here a "block->IsDeferred() || " ? Hmmm. Sorry, perhaps I'm missing something here, but I think the case you mention remains unchanged. There is always a hint. If all the predecessor blocks are deferred, then now you get a hint from the last deferred block that is a rpo predecessor, not the first one like in the old version, but you still always get a hint. https://codereview.chromium.org/2030463003/diff/20001/test/cctest/compiler/te... File test/cctest/compiler/test-code-stub-assembler.cc (right): https://codereview.chromium.org/2030463003/diff/20001/test/cctest/compiler/te... test/cctest/compiler/test-code-stub-assembler.cc:414: USE(m.GenerateCode()); On 2016/06/01 15:02:58, Mircea Trofin wrote: > Why not CHECK there was code generated? Done.
lgtm https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:2196: if (!predecessor_block->IsDeferred()) break; On 2016/06/02 01:29:37, danno wrote: > On 2016/06/01 15:02:58, Mircea Trofin wrote: > > Before, if the phi block was deferred, and all its predecessors were also > > deferred, there would still be a hint. That may matter for code size. Perhaps > > add to the condition here a "block->IsDeferred() || " ? > > Hmmm. Sorry, perhaps I'm missing something here, but I think the case you > mention remains unchanged. There is always a hint. If all the predecessor blocks > are deferred, then now you get a hint from the last deferred block that is a rpo > predecessor, not the first one like in the old version, but you still always get > a hint. Oh! You are right, sorry. I now see instr is always not null. In fact, the DCHECK just below exposes that invariant :) Could you kick off a perf try, so we see the impact of this change (the hint being last rather than first in deferred subgraphs, that is). I know we don't expect any, but code size => various alignments, who knows.
https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:2188: // in the rpo ordering. Prefer non-deferred blocks. On 2016/06/02 01:29:37, danno wrote: > On 2016/06/01 15:02:58, Mircea Trofin wrote: > > Could you add a comment as to why the rpo ordering matters? > > Done. (nit, sorry) what's the motivation, though? Why does the hint need to be in that order?
The rpo order was always an implicit requirement (see the new comment), and we just got lucky to always generate graphs with that property. The new test case shows a graph where that properly has to be more carefully enforced, exposing the existing bug. https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... File src/compiler/register-allocator.cc (right): https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... src/compiler/register-allocator.cc:2188: // in the rpo ordering. Prefer non-deferred blocks. On 2016/06/02 01:42:43, Mircea Trofin wrote: > On 2016/06/02 01:29:37, danno wrote: > > On 2016/06/01 15:02:58, Mircea Trofin wrote: > > > Could you add a comment as to why the rpo ordering matters? > > > > Done. > > (nit, sorry) what's the motivation, though? Why does the hint need to be in that > order? Whoops, lost the edit. Updated the comment in a later patch set with the explanation.
On 2016/06/02 13:17:06, danno wrote: > The rpo order was always an implicit requirement (see the new comment), and we > just got lucky to always generate graphs with that property. The new test case > shows a graph where that properly has to be more carefully enforced, exposing > the existing bug. > > https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... > File src/compiler/register-allocator.cc (right): > > https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... > src/compiler/register-allocator.cc:2188: // in the rpo ordering. Prefer > non-deferred blocks. > On 2016/06/02 01:42:43, Mircea Trofin wrote: > > On 2016/06/02 01:29:37, danno wrote: > > > On 2016/06/01 15:02:58, Mircea Trofin wrote: > > > > Could you add a comment as to why the rpo ordering matters? > > > > > > Done. > > > > (nit, sorry) what's the motivation, though? Why does the hint need to be in > that > > order? > > Whoops, lost the edit. Updated the comment in a later patch set with the > explanation. Perf jobs seems to show negligible change: "Octane2.1-TF-pr/Total": { "result with patch ": " 3080.73 +- 3.42 (-0.3% regression)", "result without patch": " 3088.60 +- 1.12", "runs": 3, "units": "score" }, "Emscripten": { "Emscripten/Box2d": { "result with patch ": " 8836.00 +- 0.00 (-0.7% regression)", "result without patch": " 8775.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Bullet": { "result with patch ": " 13461.00 +- 0.00 (+0.4% improvement)", "result without patch": " 13521.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Copy": { "result with patch ": " 2123.00 +- 0.00 (-0.1% regression)", "result without patch": " 2121.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Corrections": { "result with patch ": " 2307.00 +- 0.00 (+0.0% improvement)", "result without patch": " 2308.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Fannkuch": { "result with patch ": " 10598.00 +- 0.00 (+0.1% improvement)", "result without patch": " 10605.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Fasta": { "result with patch ": " 6277.00 +- 0.00 (+0.3% improvement)", "result without patch": " 6294.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Life": { "result with patch ": " 2991.00 +- 0.00 (-0.1% regression)", "result without patch": " 2988.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/LuaBinaryTrees": { "result with patch ": " 33533.00 +- 0.00 (+2.5% improvement)", "result without patch": " 34379.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/MemOps": { "result with patch ": " 4249.00 +- 0.00 (+0.8% improvement)", "result without patch": " 4285.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/NBodyJava": { "result with patch ": " 8934.00 +- 0.00 (-0.3% regression)", "result without patch": " 8903.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Primes": { "result with patch ": " 2537.00 +- 0.00 (+0.0% improvement)", "result without patch": " 2537.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/Skinning": { "result with patch ": " 8643.00 +- 0.00 (-1.1% regression)", "result without patch": " 8553.00 +- 0.00", "runs": 1, "units": "ms" }, "Emscripten/ZLib": { "result with patch ": " 8298.00 +- 0.00 (+0.1% improvement)", "result without patch": " 8304.00 +- 0.00", "runs": 1, "units": "ms" }
The CQ bit was checked by danno@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mtrofin@chromium.org Link to the patchset: https://codereview.chromium.org/2030463003/#ps80001 (title: "Feedback")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2030463003/80001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2030463003/80001
On 2016/06/02 15:17:32, danno wrote: > On 2016/06/02 13:17:06, danno wrote: > > The rpo order was always an implicit requirement (see the new comment), and we > > just got lucky to always generate graphs with that property. The new test case > > shows a graph where that properly has to be more carefully enforced, exposing > > the existing bug. > > > > > https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... > > File src/compiler/register-allocator.cc (right): > > > > > https://codereview.chromium.org/2030463003/diff/20001/src/compiler/register-a... > > src/compiler/register-allocator.cc:2188: // in the rpo ordering. Prefer > > non-deferred blocks. > > On 2016/06/02 01:42:43, Mircea Trofin wrote: > > > On 2016/06/02 01:29:37, danno wrote: > > > > On 2016/06/01 15:02:58, Mircea Trofin wrote: > > > > > Could you add a comment as to why the rpo ordering matters? > > > > > > > > Done. > > > > > > (nit, sorry) what's the motivation, though? Why does the hint need to be in > > that > > > order? > > > > Whoops, lost the edit. Updated the comment in a later patch set with the > > explanation. > > Perf jobs seems to show negligible change: > > "Octane2.1-TF-pr/Total": { > "result with patch ": " 3080.73 +- 3.42 (-0.3% regression)", > "result without patch": " 3088.60 +- 1.12", > "runs": 3, > "units": "score" > }, > > "Emscripten": { > "Emscripten/Box2d": { > "result with patch ": " 8836.00 +- 0.00 (-0.7% regression)", > "result without patch": " 8775.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Bullet": { > "result with patch ": " 13461.00 +- 0.00 (+0.4% improvement)", > "result without patch": " 13521.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Copy": { > "result with patch ": " 2123.00 +- 0.00 (-0.1% regression)", > "result without patch": " 2121.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Corrections": { > "result with patch ": " 2307.00 +- 0.00 (+0.0% improvement)", > "result without patch": " 2308.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Fannkuch": { > "result with patch ": " 10598.00 +- 0.00 (+0.1% improvement)", > "result without patch": " 10605.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Fasta": { > "result with patch ": " 6277.00 +- 0.00 (+0.3% improvement)", > "result without patch": " 6294.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Life": { > "result with patch ": " 2991.00 +- 0.00 (-0.1% regression)", > "result without patch": " 2988.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/LuaBinaryTrees": { > "result with patch ": " 33533.00 +- 0.00 (+2.5% improvement)", > "result without patch": " 34379.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/MemOps": { > "result with patch ": " 4249.00 +- 0.00 (+0.8% improvement)", > "result without patch": " 4285.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/NBodyJava": { > "result with patch ": " 8934.00 +- 0.00 (-0.3% regression)", > "result without patch": " 8903.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Primes": { > "result with patch ": " 2537.00 +- 0.00 (+0.0% improvement)", > "result without patch": " 2537.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/Skinning": { > "result with patch ": " 8643.00 +- 0.00 (-1.1% regression)", > "result without patch": " 8553.00 +- 0.00", > "runs": 1, > "units": "ms" > }, > "Emscripten/ZLib": { > "result with patch ": " 8298.00 +- 0.00 (+0.1% improvement)", > "result without patch": " 8304.00 +- 0.00", > "runs": 1, > "units": "ms" > } Thanks!
The CQ bit was unchecked by commit-bot@chromium.org
Try jobs failed on following builders: v8_presubmit on tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_presubmit/builds/16393)
The CQ bit was checked by danno@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from mtrofin@chromium.org Link to the patchset: https://codereview.chromium.org/2030463003/#ps120001 (title: "Rebase")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/patch-status/2030463003/120001 View timeline at https://chromium-cq-status.appspot.com/patch-timeline/2030463003/120001
Message was sent while issue was closed.
Committed patchset #7 (id:120001)
Message was sent while issue was closed.
Description was changed from ========== [turbofan] Fix phi-hinting problem with deferred blocks Previously, turbofan selected the gap use from first predecessor block when hinting a phi, unless that block was deferred, in which case the gap move from the first non-deferred predecessor block was chosen. This strategy didn't guarantee that an important invariant was maintained: the predecessor blocks chosen for hinting phis must preceed the phi's block in the rpo ordering. In most cases the strategy worked, since graphs generated by the AstGraphBuilder and existing stubs just happened to always generate schedules where this rpo ordering property for the first predecessor block, but it is quite possible to generate a code stub by hand that doesn't have this property (see included test case). After this CL, the allocator chooses either the the first non-deferred "rpo-preceeding" block to be the hinting block, or the first deferred "rpo-preceeding" block if that doesn't exist. In all previously-existing code, this behavior is the same as the original algorithm, but has the benefit of not failing in the register allocator in hand-crafted stubs where all the "rpo-preceeding" predecessors are all in deferred code. ========== to ========== [turbofan] Fix phi-hinting problem with deferred blocks Previously, turbofan selected the gap use from first predecessor block when hinting a phi, unless that block was deferred, in which case the gap move from the first non-deferred predecessor block was chosen. This strategy didn't guarantee that an important invariant was maintained: the predecessor blocks chosen for hinting phis must preceed the phi's block in the rpo ordering. In most cases the strategy worked, since graphs generated by the AstGraphBuilder and existing stubs just happened to always generate schedules where this rpo ordering property for the first predecessor block, but it is quite possible to generate a code stub by hand that doesn't have this property (see included test case). After this CL, the allocator chooses either the the first non-deferred "rpo-preceeding" block to be the hinting block, or the first deferred "rpo-preceeding" block if that doesn't exist. In all previously-existing code, this behavior is the same as the original algorithm, but has the benefit of not failing in the register allocator in hand-crafted stubs where all the "rpo-preceeding" predecessors are all in deferred code. Committed: https://crrev.com/afb0e7a4bded2b0d79558c1acdb039cb8955220c Cr-Commit-Position: refs/heads/master@{#36689} ==========
Message was sent while issue was closed.
Patchset 7 (id:??) landed as https://crrev.com/afb0e7a4bded2b0d79558c1acdb039cb8955220c Cr-Commit-Position: refs/heads/master@{#36689} |