DescriptionSpeed up A64 simulator by removing useless memcpy.
The addresses involved should always be aligned, so we can simply use
a cast, just like the ARM simulator. Even if the alignment assumption
did not hold and the platform we are running on couldn't handle
unaligned access, some #ifdefs would be much more preferable. The
affected member functions were the top 2 in a profile (18% and 15%),
so basically every hack is allowed here to speed things up. :-)
Removed some dead code for literals on the way. If we need to
resurrect it, we should do it without double(!) memcpys.
Generally, I still don't understand why we need the Instr/Instruction
distinction or simply wrap Instr within Instruction, this seems to
be much simpler and cleaner, but this would involve heavier changes.
The overall speedup of this CL is roughly 37%, see the numbers below
for a reduced Octane suite and the check targets:
------------------------------------------------------------
With memcpy:
------------------------------------------------------------
make -j32 a64.release.quickcheck => 03:29
make -j32 a64.release.check => 11:30
Reduced Octane suite => 05:16
Richards: 35.1
DeltaBlue: 64.1
RayTrace: 130
Splay: 66.1
SplayLatency: 619
NavierStokes: 58.7
PdfJS: 89.6
Mandreel: 58.5
MandreelLatency: 242
CodeLoad: 5103
Box2D: 124
----
Score (version 9): 144
------------------------------------------------------------
With casts:
------------------------------------------------------------
make -j32 a64.release.quickcheck => 02:14
make -j32 a64.release.check => 07:21
Reduced Octane suite => 03:21
Richards: 53.3
DeltaBlue: 103
RayTrace: 205
Splay: 95.9
SplayLatency: 859
NavierStokes: 103
PdfJS: 136
Mandreel: 94.8
MandreelLatency: 386
CodeLoad: 6493
Box2D: 179
----
Score (version 9): 219
R=ulan@chromium.org
Committed: https://code.google.com/p/v8/source/detail?r=19929
Patch Set 1 #Patch Set 2 : Use V8_INLINE #Patch Set 3 : Rebased. #Patch Set 4 : Rebased #Messages
Total messages: 6 (0 generated)
|