Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(133)

Issue 23654026: Use xorps to break the cvtsi2sd unnecessary dependence due to its partially written (Closed)

Created:
7 years, 3 months ago by weiliang.lin2
Modified:
7 years, 2 months ago
Reviewers:
Yang
CC:
v8-dev
Base URL:
git://github.com/v8/v8.git@master
Visibility:
Public.

Description

Use xorps to break the cvtsi2sd unnecessary dependence due to its partially written BUG= R=yangguo@chromium.org Committed: https://code.google.com/p/v8/source/detail?r=16702

Patch Set 1 #

Patch Set 2 : bundle xorps and cvtsi2sd into a MacroAssembler #

Patch Set 3 : x64 port #

Total comments: 1

Patch Set 4 : rebase to master and address comments #

Unified diffs Side-by-side diffs Delta from patch set Stats (+80 lines, -50 lines) Patch
M src/ia32/code-stubs-ia32.cc View 1 2 3 12 chunks +14 lines, -14 lines 0 comments Download
M src/ia32/codegen-ia32.cc View 1 1 chunk +1 line, -1 line 0 comments Download
M src/ia32/lithium-codegen-ia32.cc View 1 2 3 5 chunks +5 lines, -5 lines 0 comments Download
M src/ia32/macro-assembler-ia32.h View 1 2 3 1 chunk +6 lines, -0 lines 0 comments Download
M src/ia32/macro-assembler-ia32.cc View 1 2 3 5 chunks +10 lines, -4 lines 0 comments Download
M src/ia32/stub-cache-ia32.cc View 1 2 3 2 chunks +2 lines, -2 lines 0 comments Download
M src/x64/code-stubs-x64.cc View 1 2 3 11 chunks +13 lines, -13 lines 0 comments Download
M src/x64/codegen-x64.cc View 1 2 1 chunk +1 line, -1 line 0 comments Download
M src/x64/lithium-codegen-x64.cc View 1 2 3 4 chunks +5 lines, -5 lines 0 comments Download
M src/x64/macro-assembler-x64.h View 1 2 3 1 chunk +6 lines, -0 lines 0 comments Download
M src/x64/macro-assembler-x64.cc View 1 2 3 4 chunks +15 lines, -3 lines 0 comments Download
M src/x64/stub-cache-x64.cc View 1 2 3 2 chunks +2 lines, -2 lines 0 comments Download

Messages

Total messages: 13 (0 generated)
weiliang.lin2
There is obvious performance improvement on IA32 when using xorps to solve the cvtsi2sd partially ...
7 years, 3 months ago (2013-09-10 14:39:13 UTC) #1
Jakob Kummerow
Redirecting to Yang.
7 years, 3 months ago (2013-09-11 11:34:14 UTC) #2
Yang
On 2013/09/11 11:34:14, Jakob wrote: > Redirecting to Yang. We use cvtsi2sd in a couple ...
7 years, 3 months ago (2013-09-11 13:37:02 UTC) #3
Sven Panne
Could you describe the underlying reason why zeroing out things first makes this faster? The ...
7 years, 3 months ago (2013-09-11 13:43:06 UTC) #4
weiliang.lin2
On 2013/09/11 13:37:02, Yang wrote: > On 2013/09/11 11:34:14, Jakob wrote: > > Redirecting to ...
7 years, 3 months ago (2013-09-12 01:44:43 UTC) #5
weiliang.lin2
bundle xorps and cvtsi2sd into a MacroAssembler. Currently use it everywhere. There is not any ...
7 years, 3 months ago (2013-09-12 08:06:51 UTC) #6
weiliang.lin2
x64 port done
7 years, 3 months ago (2013-09-12 08:51:03 UTC) #7
Yang
I can confirm the performance improvements. This is great! Could you add a comment on ...
7 years, 3 months ago (2013-09-12 11:56:49 UTC) #8
weiliang.lin2
On 2013/09/12 11:56:49, Yang wrote: > I can confirm the performance improvements. This is great! ...
7 years, 3 months ago (2013-09-12 14:15:03 UTC) #9
Yang
On 2013/09/12 14:15:03, weiliang.lin2 wrote: > On 2013/09/12 11:56:49, Yang wrote: > > I can ...
7 years, 3 months ago (2013-09-12 15:28:57 UTC) #10
Yang
Committed patchset #4 manually as r16702 (presubmit successful).
7 years, 3 months ago (2013-09-13 08:00:02 UTC) #11
Sven Panne
[ Reviving the dead... :-] While the performance improvement on kraken's audio-oscillator is great, I've ...
7 years, 2 months ago (2013-10-16 08:58:17 UTC) #12
Weiliang
7 years, 2 months ago (2013-10-17 04:02:14 UTC) #13
Message was sent while issue was closed.
On 2013/10/16 08:58:17, Sven Panne wrote:
> [ Reviving the dead... :-]
> While the performance improvement on kraken's audio-oscillator is great, I've
> just seen that we actually regress SunSpider's math-spectral-norm by roughly
25%
> on linux-x64 (Core 2 and Core i5). I am not sure what the actual reason is,
> bmeurer@ has the theory that it might be related to single vs. double mode.
> Anyway, we might have to investigate this further, any help from the Intel
side
> would be highly appreciated.

Hi Sven, 
Thanks a lot for your sharing. 

I did a quick experiment on my PC (Core 2) by simply not using xorps before
cvtsi2sd as below:

diff --git a/src/x64/macro-assembler-x64.cc b/src/x64/macro-assembler-x64.cc
index 9dcb9d1..05ae973 100644
--- a/src/x64/macro-assembler-x64.cc
+++ b/src/x64/macro-assembler-x64.cc
@@ -936,13 +936,11 @@ void MacroAssembler::PopCallerSaved(SaveFPRegsMode
fp_mode,


 void MacroAssembler::Cvtlsi2sd(XMMRegister dst, Register src) {
-  xorps(dst, dst);
   cvtlsi2sd(dst, src);
 }


 void MacroAssembler::Cvtlsi2sd(XMMRegister dst, const Operand& src) {
-  xorps(dst, dst);
   cvtlsi2sd(dst, src);
 }



Then I got the following result for SunSpider's math-spectral-norm. Because
math-spectral-norm is a quite small case (only 4-5 ms in my machine), I load it
ten time to measure its performance.

Case: (run.js)
var _start = new Date();
for (var ii = 0; ii < 10; ii += 1) {
    load("math-spectral-norm.js");
}
var elapsed = new Date() - _start;
print(elapsed + " ms");

Result:
[~]./d8.x64.no.xorps run.js
28 ms
[~]./d8.x64.xorps run.js
23 ms

So it seems there is ~17% improvement by using xorps. :)

Powered by Google App Engine
This is Rietveld 408576698