Issue 860283004: Improve the x64 Load and Store access instruction selection to include immediate offsets.

Created:
5 years, 10 months ago by dougc

Modified:
4 years, 8 months ago

Reviewers:
Jarin, danno

CC:
v8-dev

Base URL:
https://chromium.googlesource.com/v8/v8.git@master

Target Ref:
refs/pending/heads/master

Project:
v8

Visibility:
Public.

More Reviews

Description

Improve the x64 Load and Store access instruction selection to include immediate offsets. Currently GetEffectiveAddressMemoryOperand() uses BaseWithIndexAndDisplacement64Match. However this does not recognize the ChangeUint32ToUint64 operation which is applied to the index. It looks like a lot of work to rework BaseWithIndexAndDisplacement64Match, and BaseWithIndexAndDisplacement32Match seems ok as-is, so it seems more appropriate to add x64 backend specific code to match the patterns which is done inline in GetEffectiveAddressMemoryOperand(). Some of the code in VisitChangeUint32ToUint64() has been refactored into OpZeroExtends() and is needed in the pattern matching. The BaseWithIndexAndDisplacementMatch code has some OwnedBy() tests. It's not immediately clear if these are to maintain some invariant or if they are just a heuristic? It seems generally better to dehoist the index offsets aggressively so the OwnedBy() test is omitted in this patch. This could use some reviewer scrutiny? A TODO note is included to explore the scaled-index patterns matched by BaseWithIndexAndDisplacement64Match. Such patterns are less likely to be proven within bounds and thus not taken by these paths anyway, so they are left for future work. This patch gives a useful x0.92 improvement in run time for an asm.js zlib benchmark when using index masking to avoid bounds checks. BUG=

		Unified diffs	Side-by-side diffs	Delta from patch set	Stats (+89 lines, -37 lines)			Patch
	M	src/compiler/x64/instruction-selector-x64.cc	View	1 2 3 4	3 chunks	+89 lines, -37 lines	0 comments	Download

Messages

Total messages: 16 (4 generated)

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages

Benedikt Meurer

I think this problem is not limited to x64, but affects all 64bit platforms, so ...

5 years, 10 months ago (2015-01-28 05:55:34 UTC) #4

dougc

Thank you for the feedback. There was a problem with the difference between the int32 ...

5 years, 10 months ago (2015-01-29 08:14:08 UTC) #5

Benedikt Meurer

On 2015/01/29 at 08:14:08, dtc-v8 wrote: > Thank you for the feedback. > > There ...

5 years, 10 months ago (2015-01-29 08:17:28 UTC) #6

dougc

On 2015/01/29 08:17:28, Benedikt Meurer wrote: > On 2015/01/29 at 08:14:08, dtc-v8 wrote: ... > ...

5 years, 10 months ago (2015-01-29 09:23:43 UTC) #7

Benedikt Meurer

I think this is something that can only be solved properly by adding support for ...

5 years, 10 months ago (2015-01-30 04:50:09 UTC) #8

dougc

On 2015/01/30 04:50:09, Benedikt Meurer wrote: > I think this is something that can only ...

5 years, 10 months ago (2015-01-30 12:13:20 UTC) #10

Benedikt Meurer

> The index is constrained to be a 31-bit unsigned integer because: it is an ...

5 years, 10 months ago (2015-01-30 12:15:03 UTC) #11

dougc

On 2015/01/30 12:15:03, Benedikt Meurer wrote: > > The index is constrained to be a ...

5 years, 10 months ago (2015-01-30 12:36:57 UTC) #12

Benedikt Meurer

On 2015/01/30 12:36:57, dougc wrote: > On 2015/01/30 12:15:03, Benedikt Meurer wrote: > > > ...

5 years, 10 months ago (2015-02-04 18:39:34 UTC) #13

dougc

On 2015/02/04 18:39:34, Benedikt Meurer wrote: > On 2015/01/30 12:36:57, dougc wrote: > > On ...

5 years, 10 months ago (2015-02-05 13:36:07 UTC) #14

On 2015/02/04 18:39:34, Benedikt Meurer wrote:
> On 2015/01/30 12:36:57, dougc wrote:
> > On 2015/01/30 12:15:03, Benedikt Meurer wrote:
> > > > The index is constrained to be a 31-bit unsigned integer because: it is
an
> > > array buffer index, and the maximum array buffer size is 2GB, and at this
> > point
> > > in the code the index is known to be within bounds?
> > > 
> > > That's only true for asm.js heap access, not for Load/Store in general.
> > 
> > Could you help me understand what aspect of asm.js this is limited to?
> > 
> > This code path is only taken for typed arrays, when
> > IsExternalArrayElementsKind() is true? E.g. See the logic in
> > JSTypedLowering::ReduceJSLoadProperty().
> 
> As said, that is the current implementation, which only considers asm.js. But
we
> will be targeting regular JavaScript soon(ish), and that means you can have
> abitrary Loads/Stores, i.e. even Loads/Stores for Blink DOM objects or other
> arrays/objects/whatever. You cannot make any assumptions about the index in
that
> case.

It's true that asm.js heap access patterns are a subset of the code this path
supports, but this path is not limited to asm.js, rather it deals with typed
array element access. For example u32[i+K] is supported by this path but is not
asm.js.

It would appear that LoadElement() and StoreElement() are only used when the
index is within bounds, as they do no bounds checking, so even if these were
used for other objects they would only be used when the index is known to be
within bounds of the object?  Further ReduceJSLoadProperty() would be checking
that the index is within the 2G limit. Thus even in future the index will be an
unsigned 31-bit integer if the LoadElement() or StoreElement() paths are taken?

Transforming the addition to a 64 bit operation earlier might also frustrate
later optimizations. For example the right and left shift are optimized away via
some patterns etc, and the addition moved over a low bit masking operation. A 64
bit op might require more general pattern matching and might not even capture
the constrains needed to satisfy the transforms. It might be sound to lower to
32 bit ops when possible.

If the code is in flux then I can wait and rework the patch in future.

Benedikt Meurer

5 years, 10 months ago (2015-02-05 13:53:33 UTC) #15

On 2015/02/05 at 13:36:07, dtc-v8 wrote:
> On 2015/02/04 18:39:34, Benedikt Meurer wrote:
> > On 2015/01/30 12:36:57, dougc wrote:
> > > On 2015/01/30 12:15:03, Benedikt Meurer wrote:
> > > > > The index is constrained to be a 31-bit unsigned integer because: it
is an
> > > > array buffer index, and the maximum array buffer size is 2GB, and at
this
> > > point
> > > > in the code the index is known to be within bounds?
> > > > 
> > > > That's only true for asm.js heap access, not for Load/Store in general.
> > > 
> > > Could you help me understand what aspect of asm.js this is limited to?
> > > 
> > > This code path is only taken for typed arrays, when
> > > IsExternalArrayElementsKind() is true? E.g. See the logic in
> > > JSTypedLowering::ReduceJSLoadProperty().
> > 
> > As said, that is the current implementation, which only considers asm.js.
But we
> > will be targeting regular JavaScript soon(ish), and that means you can have
> > abitrary Loads/Stores, i.e. even Loads/Stores for Blink DOM objects or other
> > arrays/objects/whatever. You cannot make any assumptions about the index in
that
> > case.
> 
> It's true that asm.js heap access patterns are a subset of the code this path
supports, but this path is not limited to asm.js, rather it deals with typed
array element access. For example u32[i+K] is supported by this path but is not
asm.js.
> 
> It would appear that LoadElement() and StoreElement() are only used when the
index is within bounds, as they do no bounds checking, so even if these were
used for other objects they would only be used when the index is known to be
within bounds of the object?  Further ReduceJSLoadProperty() would be checking
that the index is within the 2G limit. Thus even in future the index will be an
unsigned 31-bit integer if the LoadElement() or StoreElement() paths are taken?
> 
> Transforming the addition to a 64 bit operation earlier might also frustrate
later optimizations. For example the right and left shift are optimized away via
some patterns etc, and the addition moved over a low bit masking operation. A 64
bit op might require more general pattern matching and might not even capture
the constrains needed to satisfy the transforms. It might be sound to lower to
32 bit ops when possible.
> 
> If the code is in flux then I can wait and rework the patch in future.

Yes, let's first get the missing bits for JavaScript in.

Expand Messages | Collapse Messages | Show Generated Messages | Hide Generated Messages