src/README.SIMD.rst - Issue 477773003: Subzero: Start a list of SIMD improvement ideas.

Unified Diff: src/README.SIMD.rst

Issue 477773003: Subzero: Start a list of SIMD improvement ideas. (Closed) Base URL: https://gerrit.chromium.org/gerrit/p/native_client/pnacl-subzero.git@master

Patch Set: Change name (created 6 years, 4 months ago)


Index: src/README.SIMD.rst

diff --git a/src/README.SIMD.rst b/src/README.SIMD.rst

new file mode 100644

index 0000000000000000000000000000000000000000..58f25d96b1fa4285267267eaa56434df0afe31f0

--- /dev/null

+++ b/src/README.SIMD.rst

@@ -0,0 +1,67 @@

+Missing support

+===============

+* The PNaCl LLVM backend expands shufflevector operations into

+ sequences of insertelement and extractelement operations. For

+ instance:

+ define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {

+ entry:

+ %res = shufflevector <4 x i32> %arg1, <4 x i32> %arg2, <4 x i32> <i32 4, i32 5, i32 0, i32 1>

+ ret <4 x i32> %res

+ }

+ gets expanded into:

+ define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {

+ entry:

+ %0 = extractelement <4 x i32> %arg2, i32 0

+ %1 = insertelement <4 x i32> undef, i32 %0, i32 0

+ %2 = extractelement <4 x i32> %arg2, i32 1

+ %3 = insertelement <4 x i32> %1, i32 %2, i32 1

+ %4 = extractelement <4 x i32> %arg1, i32 0

+ %5 = insertelement <4 x i32> %3, i32 %4, i32 2

+ %6 = extractelement <4 x i32> %arg1, i32 1

+ %7 = insertelement <4 x i32> %5, i32 %6, i32 3

+ ret <4 x i32> %7

+ }

+ Subzero should recognize these sequences and recombine them into

+ shuffle operations where appropriate.

+* Add support for vector constants in the backend. The current code

+ materializes the vector constants it needs (e.g. for performing icmp

+ on unsigned operands) using register operations, but this should be

+ changed to loading them from a constant pool if the register

+ initialization is too complicated (such as in

+ TargetX8632::makeVectorOfHighOrderBits()).

+* [x86 specific] llvm-mc does not allow lea to take a mem128 memory

+ operand when assembling x86-32 code. The current

+ InstX8632Lea::emit() code uses Variable::asType() to convert any

+ mem128 Variables into a compatible memory operand type. However, the

+ emit code does not do any conversions of OperandX8632Mem, so if an

+ OperandX8632Mem is passed to lea as mem128 the resulting code will

+ not assemble. One way to fix this is by implementing

+ OperandX8632Mem::asType().

+* [x86 specific] Lower shl with <4 x i32> using some clever float

+ conversion:

+http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html

+* [x86 specific] Add support for using aligned mov operations

+ (movaps). This will require passing alignment information to loads

+ and stores.

+x86 SIMD Diversification

+========================

+* Vector "bitwise" operations have several variant instructions: the

+ AND operation can be implemented with pand, andpd, or andps. This

+ pattern also holds for ANDN, OR, and XOR.

+* Vector "mov" instructions can be diversified (e.g. movdqu instead of

+ movups) at the cost of a possible performance penalty.

+* Scalar FP arithmetic can be diversified by performing the operations

+ with the vector version of the instructions.
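
Note on the first item under "Missing support": the recombination of insertelement/extractelement chains into a shuffle could be sketched roughly as follows. This is a minimal illustration, assuming each extractelement/insertelement pair has already been matched and summarized as a small record. The names `LaneCopy`, `ShuffleMask`, and `recoverShuffleMask` are hypothetical and are not part of Subzero, and the sketch covers only the 4-lane, two-operand case from the example.

```cpp
#include <array>
#include <vector>

// Hypothetical summary of one extractelement/insertelement pair:
// "read element SrcIndex of operand SrcVec and write it to lane DstLane".
struct LaneCopy {
  int SrcVec;   // 0 for the first shuffle operand, 1 for the second
  int SrcIndex; // element index within the source vector
  int DstLane;  // lane written by the insertelement
};

constexpr int NumLanes = 4;
using ShuffleMask = std::array<int, NumLanes>;

// Tries to rebuild a shufflevector mask from a chain of lane copies.
// Returns false if an index is out of range or if some lane is never
// written (i.e. the result would still contain undef lanes).
bool recoverShuffleMask(const std::vector<LaneCopy> &Chain, ShuffleMask &Mask) {
  Mask.fill(-1); // -1 marks a lane that has not been written yet
  for (const LaneCopy &C : Chain) {
    if (C.SrcVec < 0 || C.SrcVec > 1)
      return false;
    if (C.SrcIndex < 0 || C.SrcIndex >= NumLanes)
      return false;
    if (C.DstLane < 0 || C.DstLane >= NumLanes)
      return false;
    // shufflevector numbers the elements of both operands consecutively:
    // 0..NumLanes-1 select from the first operand, the rest from the second.
    // A later insert into the same lane overwrites an earlier one.
    Mask[C.DstLane] = C.SrcVec * NumLanes + C.SrcIndex;
  }
  for (int M : Mask)
    if (M < 0)
      return false;
  return true;
}
```

For the expanded IR in the README example, the chain (arg2[0] to lane 0, arg2[1] to lane 1, arg1[0] to lane 2, arg1[1] to lane 3) yields the mask <4, 5, 0, 1>, which matches the original shufflevector. A real implementation would also have to check that the intermediate values have no other uses before replacing the chain.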
