Chromium Code Reviews

Issue 1385433002: Subzero: Use register availability during lowering to improve the code. (Closed)

Created:
5 years, 2 months ago by Jim Stichnoth
Modified:
5 years, 2 months ago
CC:
native-client-reviews_googlegroups.com
Base URL:
https://chromium.googlesource.com/native_client/pnacl-subzero.git@master
Target Ref:
refs/heads/master
Visibility:
Public.

Description

Subzero: Use register availability during lowering to improve the code.

The problem is that given code like this:

  a = b + c
  d = a + e
  ...
  ... (use of a) ...

Lowering may produce code like this, at least on x86:

  T1 = b
  T1 += c
  a = T1
  T2 = a
  T2 += e
  d = T2
  ...
  ... (use of a) ...

If "a" has a long live range, it may not get a register, resulting in clumsy code in the middle of the sequence like "a=reg; reg=a".

Normally one might expect store forwarding to make the clumsy code fast, but it does presumably add an extra instruction-retirement cycle to the critical path in a pointer-chasing loop, and makes a big difference on some benchmarks.

The simple fix here is, at the end of lowering "a=b+c", keep track of the final "a=T1" assignment. Then, when lowering "d=a+e" and we look up "a", we can substitute "T1".

This slightly increases the live range of T1, but it does a great job of avoiding the redundant reload of the register from the stack location.

A more general fix (in the future) might be to do live range splitting and let the register allocator handle it.

BUG= https://code.google.com/p/nativeclient/issues/detail?id=4095
R=kschimpf@google.com

Committed: https://gerrit.chromium.org/gerrit/gitweb?p=native_client/pnacl-subzero.git;a=commit;h=318f4cdaa21eac5ef1d16731e51cd7adb3083d3b
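To make the substitution concrete, here is a minimal, self-contained sketch of the idea described above. It is not the code from this patch: the names Availability, Lowering, lowerAdd, and lowerExample are hypothetical, instructions are modeled as plain strings, and the tracker remembers only the single most recent "Dest = Temp" assignment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical one-entry availability record. After "Dest = Src" is emitted,
// Src (a temporary expected to live in a register) still holds Dest's value.
struct Availability {
  std::string Dest, Src;
  // A real lowering pass would presumably call this at basic-block
  // boundaries; the sketch handles a single straight-line sequence only.
  void reset() { Dest.clear(); Src.clear(); }
  void update(const std::string &D, const std::string &S) { Dest = D; Src = S; }
  // Returns the temporary known to hold Var's value, or Var itself.
  std::string get(const std::string &Var) const {
    return (!Dest.empty() && Var == Dest) ? Src : Var;
  }
};

struct Lowering {
  bool UseAvailability;
  Availability Avail;
  std::vector<std::string> Out; // emitted pseudo-instructions
  int NextTemp = 1;

  explicit Lowering(bool Use) : UseAvailability(Use) {}

  // Lowers "Dest = A + B" into two-address form: T = A; T += B; Dest = T.
  void lowerAdd(const std::string &Dest, std::string A, std::string B) {
    if (UseAvailability) {
      // Substitute the still-live temporary for a just-assigned variable.
      A = Avail.get(A);
      B = Avail.get(B);
    }
    std::string T = "T" + std::to_string(NextTemp++);
    Out.push_back(T + " = " + A);
    Out.push_back(T + " += " + B);
    Out.push_back(Dest + " = " + T);
    // Remember the final "Dest = T" assignment for the next instruction.
    Avail.update(Dest, T);
  }
};

// Lowers the two-instruction example from the description.
std::vector<std::string> lowerExample(bool UseAvailability) {
  Lowering L(UseAvailability);
  L.lowerAdd("a", "b", "c");
  L.lowerAdd("d", "a", "e");
  return L.Out;
}
```

Calling lowerExample(false) reproduces the sequence from the description, including "T2 = a". Calling lowerExample(true) emits "T2 = T1" in its place, so the second add reads the temporary instead of reloading "a". Every temporary in the sketch is fresh, so a remembered entry never goes stale; a real implementation would also have to drop the entry whenever the remembered temporary or destination could be overwritten.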

Patch Set 1

Patch Set 2 : Update phi test per new O2 optimizations

Patch Set 3 : Add comments

Stats (+68 lines, -9 lines)
M src/IceCfgNode.h: 1 chunk, +1 line, -1 line
M src/IceCfgNode.cpp: 2 chunks, +2 lines, -1 line
M src/IceRegAlloc.h: 1 chunk, +1 line, -1 line
M src/IceRegAlloc.cpp: 1 chunk, +1 line, -1 line
M src/IceTargetLowering.h: 2 chunks, +8 lines, -0 lines
M src/IceTargetLowering.cpp: 2 chunks, +26 lines, -0 lines
M src/IceTargetLoweringX86BaseImpl.h: 2 chunks, +19 lines, -0 lines
M tests_lit/llvm2ice_tests/callindirect.pnacl.ll: 1 chunk, +3 lines, -1 line
M tests_lit/llvm2ice_tests/phi.ll: 1 chunk, +7 lines, -4 lines

Messages

Total messages: 5 (2 generated)
Jim Stichnoth
This gives a 14% improvement on our dear friend ammp. 4% spec2k geomean improvement overall.
5 years, 2 months ago (2015-10-01 15:06:13 UTC) #3
Karl
lgtm
5 years, 2 months ago (2015-10-01 21:34:28 UTC) #4
Jim Stichnoth
5 years, 2 months ago (2015-10-02 04:02:42 UTC) #5
Message was sent while issue was closed.
Committed patchset #3 (id:40001) manually as
318f4cdaa21eac5ef1d16731e51cd7adb3083d3b (presubmit successful).
