|
|
Created:
4 years ago by Leszek Swirski Modified:
4 years ago CC:
v8-reviews_googlegroups.com, rmcilroy Target Ref:
refs/pending/heads/master Project:
v8 Visibility:
Public. |
Description[turbofan] Add and use bytecode loop assigment analysis
Adds assignment tracking to the bytecode analysis pass, and updates
bytecode graph builder to only create LoopExitValues for assigned
values.
Review-Url: https://codereview.chromium.org/2558093005
Cr-Commit-Position: refs/heads/master@{#41719}
Committed: https://chromium.googlesource.com/v8/v8/+/bcb38979f732f282f52876c3412b99bb48d02fcc
Patch Set 1 #Patch Set 2 : Store parent loop and assignments together, to save a map lookup #Patch Set 3 : Use assignment analysis in PrepareForLoop #
Total comments: 6
Patch Set 4 : Fix OSR check and address Jaro's comments. #
Total comments: 15
Patch Set 5 : Address Ross's comments #Patch Set 6 : Fix build #
Messages
Total messages: 45 (30 generated)
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
leszeks@chromium.org changed reviewers: + jarin@chromium.org, rmcilroy@chromium.org
Hi Jaro and Ross, Could you take a look at this first-pass of bytecode loop assignment analysis? I haven't finished running performance comparisons yet, but at first look it seems like it doesn't regress, but doesn't improve much either, so ¯\_(ツ)_/¯
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
High level comment before I look in more detail - the loop analysis phase is used by the ASTGraphBuilder to avoid assigning everything as a Phi on loop entry - it looks like you are doing this here unless I'm missing something. Any reason?
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux_dbg_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_dbg_ng/builds/17656) v8_linux_dbg_ng_triggered on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_dbg_ng_triggered/b...)
jarin@chromium.org changed reviewers: + mstarzinger@chromium.org
lgtm with nits. However, if it does not help anywhere, we should not add this. https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-a... File src/compiler/bytecode-analysis.h (right): https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.h:65: void Analyze(BailoutId osr_bailout_id); Please explain in the comment what is the parameter. https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:252: BytecodeGraphBuilder::Environment* BytecodeGraphBuilder::Environment::Copy() { Any reason why you restructured environment management. I remember Michi had some arguments to keep it the way it is. (I am adding him as a reviewer.) https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:404: if (true) { ???
Thanks Jaro, I'm addressing Ross's comment, which has brought up some test breakages, and then I'll measure performance again. I did see a small (1-2%) improvement on Box2D with the original version of this patch, and I expect bigger savings from addressing Ross's comment, plus a followup which should reduce the assignment set further. There should also be a memory savings, I haven't measured this but I can if you like. https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:252: BytecodeGraphBuilder::Environment* BytecodeGraphBuilder::Environment::Copy() { On 2016/12/12 06:53:37, Jarin wrote: > Any reason why you restructured environment management. I remember Michi had > some arguments to keep it the way it is. (I am adding him as a reviewer.) Ah, yes, I'd discussed this with him offline before changing it. Long story short, the previous implementation had (IMO) a bug that happened to work. The current environment would be prepared for looping, copied for loop end, and then the loop end copy would be prepared for OSR, which would serendipitously also mutate the current environment's values because they were all Phi nodes. Now that some of those Phi nodes are gone, we have to make sure we prepare for OSR before copying. https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:404: if (true) { On 2016/12/12 06:53:37, Jarin wrote: > ??? Ah, this was left over from preparations for a follow up CL (which subtracts dead registers from the assignment set). I'll move some of those refactors back into here.
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
On 2016/12/12 10:01:42, Leszek Swirski wrote: > Thanks Jaro, > > I'm addressing Ross's comment, which has brought up some test breakages, and > then I'll measure performance again. I did see a small (1-2%) improvement on > Box2D with the original version of this patch, and I expect bigger savings from > addressing Ross's comment, plus a followup which should reduce the assignment > set further. There should also be a memory savings, I haven't measured this but > I can if you like. Oh, that is disappointing. The memory savings should be small in the big picture of things, what matters is compilation time. Could you measure compilation time improvements in benchmarks with --turbo-stats? If we only get 1-2% on one Octane benchmark, then I'd say it is not worth the extra complexity. > https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... > File src/compiler/bytecode-graph-builder.cc (right): > > https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... > src/compiler/bytecode-graph-builder.cc:252: BytecodeGraphBuilder::Environment* > BytecodeGraphBuilder::Environment::Copy() { > On 2016/12/12 06:53:37, Jarin wrote: > > Any reason why you restructured environment management. I remember Michi had > > some arguments to keep it the way it is. (I am adding him as a reviewer.) > > Ah, yes, I'd discussed this with him offline before changing it. > > Long story short, the previous implementation had (IMO) a bug that happened to > work. The current environment would be prepared for looping, copied for loop > end, and then the loop end copy would be prepared for OSR, which would > serendipitously also mutate the current environment's values because they were > all Phi nodes. Now that some of those Phi nodes are gone, we have to make sure > we prepare for OSR before copying. > > https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... > src/compiler/bytecode-graph-builder.cc:404: if (true) { > On 2016/12/12 06:53:37, Jarin wrote: > > ??? > > Ah, this was left over from preparations for a follow up CL (which subtracts > dead registers from the assignment set). I'll move some of those refactors back > into here.
https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/40001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:252: BytecodeGraphBuilder::Environment* BytecodeGraphBuilder::Environment::Copy() { On 2016/12/12 10:01:42, Leszek Swirski wrote: > On 2016/12/12 06:53:37, Jarin wrote: > > Any reason why you restructured environment management. I remember Michi had > > some arguments to keep it the way it is. (I am adding him as a reviewer.) > > Ah, yes, I'd discussed this with him offline before changing it. > > Long story short, the previous implementation had (IMO) a bug that happened to > work. The current environment would be prepared for looping, copied for loop > end, and then the loop end copy would be prepared for OSR, which would > serendipitously also mutate the current environment's values because they were > all Phi nodes. Now that some of those Phi nodes are gone, we have to make sure > we prepare for OSR before copying. Acknowledged. Yes, restructuring as discussed is fine with me. As long as the logic stays in the {BytecodeGraphBuilder} but outside of the {BytecodeGraphBuilder::Environment} class. That way the conditions of when and in which order the mutations of the "environment" are performed are fully determined by the graph-builder. The "environment" only deals with local information (i.e. local to the current point within the control-flow). https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:1955: if (!osr_ast_id_.IsNone() && osr_loop_offset_ == current_offset) { This essentially duplicates the translation of the "osr_ast_id" into a bytecode-offset within the analysis and here. Since we need to have it available inside the analysis and pass the "osr_ast_id" value already, would it be possible to just expose a {BytecodeAnalysis::IsOSREntryPoint(int)} predicate? That way the translation is fully done inside the analysis and doesn't need to be duplicated. We can then remove the {osr_loop_offset} field. WDYT?
On 2016/12/12 13:11:49, Jarin wrote: > Oh, that is disappointing. The memory savings should be small in the big picture > of things, what matters is compilation time. Could you measure compilation time > improvements in benchmarks with --turbo-stats? If we only get 1-2% on one Octane > benchmark, then I'd say it is not worth the extra complexity. Ok, so I've run a full analysis over 1000 runs each of Octane and Kraken, with and without --turbo-stats. It's too much for a comment here, so here's a doc: https://docs.google.com/a/google.com/document/d/1HzomptW7Cp1EvP_UwOnkpYqFV5ug... Highlights are: * 1.47% median improvement in Box2D * 4.48% median improvement in imaging-gaussian-blur * 36Mb (1%) median decrease in total space use on Octane * 5Mb (3%) median decrease in total space use on Kraken * 3.28% median time decrease for loop peeling on Octane * 4.19% median time decrease for loop peeling on Kraken * 11.92%(!) median time decrease for load elimination on Kraken There are some regressions, but there's a lot more green than red.
https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:1955: if (!osr_ast_id_.IsNone() && osr_loop_offset_ == current_offset) { On 2016/12/12 13:16:09, Michael Starzinger wrote: > This essentially duplicates the translation of the "osr_ast_id" into a > bytecode-offset within the analysis and here. Since we need to have it available > inside the analysis and pass the "osr_ast_id" value already, would it be > possible to just expose a {BytecodeAnalysis::IsOSREntryPoint(int)} predicate? > That way the translation is fully done inside the analysis and doesn't need to > be duplicated. We can then remove the {osr_loop_offset} field. WDYT? I can do this, but it's not quite the code duplication that you're implying as one is calculating the loop end offset, and the other is calculating the loop header offset. Do you mind if I punt this to another CL, just to avoid additional complication in this one?
https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:1955: if (!osr_ast_id_.IsNone() && osr_loop_offset_ == current_offset) { On 2016/12/13 13:52:34, Leszek Swirski wrote: > On 2016/12/12 13:16:09, Michael Starzinger wrote: > > This essentially duplicates the translation of the "osr_ast_id" into a > > bytecode-offset within the analysis and here. Since we need to have it > available > > inside the analysis and pass the "osr_ast_id" value already, would it be > > possible to just expose a {BytecodeAnalysis::IsOSREntryPoint(int)} predicate? > > That way the translation is fully done inside the analysis and doesn't need to > > be duplicated. We can then remove the {osr_loop_offset} field. WDYT? > > I can do this, but it's not quite the code duplication that you're implying as > one is calculating the loop end offset, and the other is calculating the loop > header offset. Do you mind if I punt this to another CL, just to avoid > additional complication in this one? Acknowledged. Works for me.
Looks like some nice potential potential perf wins :). A couple of comments and a question for Michi, but LGTM other than these if Michi and Jaro are happy. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... File src/compiler/bytecode-analysis.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:266: // bytecode ignoring the prefix. Michi question (not for this CL) - should we use the prefix offset for OSR bailoutIDs instead? In most other places we treat a prefix + BC as a single unit, with the bytecode offset being the prefix bytecode's offset. Not sure whether this would cause compilation for other parts of OSR though. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:279: } Could you just pull this out into a separate helper function and DCHECK the result of that function rather than adding a #if DEBUG block. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:283: loop_stack_.top().loop_info->assignments().AddAll(); Could you add a comment on why this is necessary https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:294: if (!is_osr_loop) { Do we need this check here? If we AddAll for the OSR loop above then I think it would still be fine to UpdateAssignements here just to simplify the code. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... File src/compiler/bytecode-analysis.h (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.h:107: BytecodeLivenessMap liveness_map_; nit - fields go below private functions. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:388: // Rename the environment values. nit - ...values, if they were assigned in the loop.
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: Try jobs failed on following builders: v8_linux_gcc_compile_rel on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_gcc_compile_rel/bu...) v8_linux_mips64el_compile_rel on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_mips64el_compile_r...) v8_linux_nodcheck_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_linux_nodcheck_rel_ng/bu...) v8_mac_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_mac_rel_ng/builds/14182) v8_win64_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win64_rel_ng/builds/19386) v8_win_compile_dbg on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_compile_dbg/builds/2...) v8_win_rel_ng on master.tryserver.v8 (JOB_FAILED, http://build.chromium.org/p/tryserver.v8/builders/v8_win_rel_ng/builds/19500)
The CQ bit was checked by leszeks@chromium.org to run a CQ dry run
Dry run: CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
Thanks Ross. Jaro, does your lgtm still hold? https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... File src/compiler/bytecode-analysis.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:279: } On 2016/12/14 10:27:55, rmcilroy wrote: > Could you just pull this out into a separate helper function and DCHECK the > result of that function rather than adding a #if DEBUG block. Done. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:283: loop_stack_.top().loop_info->assignments().AddAll(); On 2016/12/14 10:27:55, rmcilroy wrote: > Could you add a comment on why this is necessary Done. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:294: if (!is_osr_loop) { On 2016/12/14 10:27:55, rmcilroy wrote: > Do we need this check here? If we AddAll for the OSR loop above then I think it > would still be fine to UpdateAssignements here just to simplify the code. Sure, it's not performance critical I think. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... File src/compiler/bytecode-analysis.h (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.h:107: BytecodeLivenessMap liveness_map_; On 2016/12/14 10:27:55, rmcilroy wrote: > nit - fields go below private functions. No idea why I moved these. Done. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... File src/compiler/bytecode-graph-builder.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-g... src/compiler/bytecode-graph-builder.cc:388: // Rename the environment values. On 2016/12/14 10:27:55, rmcilroy wrote: > nit - ...values, if they were assigned in the loop. Done.
The CQ bit was unchecked by commit-bot@chromium.org
Dry run: This issue passed the CQ dry run.
LGTM from my end. https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... File src/compiler/bytecode-analysis.cc (right): https://codereview.chromium.org/2558093005/diff/60001/src/compiler/bytecode-a... src/compiler/bytecode-analysis.cc:266: // bytecode ignoring the prefix. On 2016/12/14 10:27:55, rmcilroy wrote: > Michi question (not for this CL) - should we use the prefix offset for OSR > bailoutIDs instead? In most other places we treat a prefix + BC as a single > unit, with the bytecode offset being the prefix bytecode's offset. Not sure > whether this would cause compilation for other parts of OSR though. Acknowledged. As discussed offline: Doing this before we enter the bytecode generator would require an additional iteration over the bytecode stream. I think the cleanest solution (as mentioned in an earlier comment in this CL) would be to keep passing the {BailoutId} into {BytecodeAnalysis::Analyze} as done in this CL, but not store it in the {BytecodeGenerator} in any form. Exposing a predicate {BytecodeAnalysis::IsOSREntryPoint(int)} would completely encapsulate any calculation involving the OSR BailoutId in this class. I would very much vote for going for such an interface. Doing that in a separate follow-up CL is totally fine with me.
As discussed offline, lgtm from my end.
The CQ bit was checked by leszeks@chromium.org
The patchset sent to the CQ was uploaded after l-g-t-m from rmcilroy@chromium.org Link to the patchset: https://codereview.chromium.org/2558093005/#ps100001 (title: "Fix build")
CQ is trying da patch. Follow status at https://chromium-cq-status.appspot.com/v2/patch-status/codereview.chromium.or...
CQ is committing da patch. Bot data: {"patchset_id": 100001, "attempt_start_ts": 1481808149590260, "parent_rev": "01de216fd7eadedc95bc4b8a772e97a2968905c0", "commit_rev": "bcb38979f732f282f52876c3412b99bb48d02fcc"}
Message was sent while issue was closed.
Description was changed from ========== [turbofan] Add and use bytecode loop assigment analysis Adds assignment tracking to the bytecode analysis pass, and updates bytecode graph builder to only create LoopExitValues for assigned values. ========== to ========== [turbofan] Add and use bytecode loop assigment analysis Adds assignment tracking to the bytecode analysis pass, and updates bytecode graph builder to only create LoopExitValues for assigned values. Review-Url: https://codereview.chromium.org/2558093005 Cr-Commit-Position: refs/heads/master@{#41719} Committed: https://chromium.googlesource.com/v8/v8/+/bcb38979f732f282f52876c3412b99bb48d... ==========
Message was sent while issue was closed.
Committed patchset #6 (id:100001) as https://chromium.googlesource.com/v8/v8/+/bcb38979f732f282f52876c3412b99bb48d... |