src/README.SIMD.rst - Issue 477773003: Subzero: Start a list of SIMD improvement ideas.

Issue 477773003: Subzero: Start a list of SIMD improvement ideas. (Closed) Base URL: https://gerrit.chromium.org/gerrit/p/native_client/pnacl-subzero.git@master

Patch Set: Change name Created 6 years, 4 months ago

Missing support
===============

* The PNaCl LLVM backend expands shufflevector operations into
  sequences of insertelement and extractelement operations. For
  instance, this function::

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %res = shufflevector <4 x i32> %arg1, <4 x i32> %arg2, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
      ret <4 x i32> %res
    }

  is expanded into::

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %0 = extractelement <4 x i32> %arg2, i32 0
      %1 = insertelement <4 x i32> undef, i32 %0, i32 0
      %2 = extractelement <4 x i32> %arg2, i32 1
      %3 = insertelement <4 x i32> %1, i32 %2, i32 1
      %4 = extractelement <4 x i32> %arg1, i32 0
      %5 = insertelement <4 x i32> %3, i32 %4, i32 2
      %6 = extractelement <4 x i32> %arg1, i32 1
      %7 = insertelement <4 x i32> %5, i32 %6, i32 3
      ret <4 x i32> %7
    }

  Subzero should recognize these sequences and recombine them into
  shuffle operations where appropriate.

* Add support for vector constants in the backend. The current code
  materializes the vector constants it needs (e.g. for performing icmp
  on unsigned operands) using register operations, but this should be
  changed to loading them from a constant pool when the register
  initialization is too complicated (as in
  TargetX8632::makeVectorOfHighOrderBits()).

* [x86 specific] llvm-mc does not allow lea to take a mem128 memory
  operand when assembling x86-32 code. The current
  InstX8632Lea::emit() code uses Variable::asType() to convert any
  mem128 Variables into a compatible memory operand type. However, the
  emit code does not convert OperandX8632Mem operands, so if an
  OperandX8632Mem is passed to lea as a mem128 operand, the resulting
  code will not assemble. One way to fix this is to implement
  OperandX8632Mem::asType().

* [x86 specific] Lower shl on <4 x i32> operands using a float
  conversion trick, as described in:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html

* [x86 specific] Add support for using aligned mov operations
  (movaps). This will require passing alignment information to loads
  and stores.


x86 SIMD Diversification
========================

* Vector "bitwise" operations have several variant instructions: the
  AND operation can be implemented with pand, andpd, or andps. This
  pattern also holds for ANDN, OR, and XOR.

* Vector "mov" instructions can be diversified (e.g. movdqu instead of
  movups) at the cost of a possible performance penalty (some
  processors add a bypass delay when a value moves between the integer
  and floating-point execution domains).

* Scalar FP arithmetic can be diversified by performing the operations
  with the vector version of the instructions (e.g. addps instead of
  addss).
