Chromium Code Reviews

Side by Side Diff: src/README.SIMD.rst

Issue 477773003: Subzero: Start a list of SIMD improvement ideas. (Closed) Base URL: https://gerrit.chromium.org/gerrit/p/native_client/pnacl-subzero.git@master
Patch Set: Change name Created 6 years, 4 months ago
1 Missing support
2 ===============
3
4 * The PNaCl LLVM backend expands shufflevector operations into
5 sequences of insertelement and extractelement operations. For
6 instance:
7
8 define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
9 entry:
10 %res = shufflevector <4 x i32> %arg1, <4 x i32> %arg2, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
11 ret <4 x i32> %res
12 }
13
14 gets expanded into:
15
16 define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
17 entry:
18 %0 = extractelement <4 x i32> %arg2, i32 0
19 %1 = insertelement <4 x i32> undef, i32 %0, i32 0
20 %2 = extractelement <4 x i32> %arg2, i32 1
21 %3 = insertelement <4 x i32> %1, i32 %2, i32 1
22 %4 = extractelement <4 x i32> %arg1, i32 0
23 %5 = insertelement <4 x i32> %3, i32 %4, i32 2
24 %6 = extractelement <4 x i32> %arg1, i32 1
25 %7 = insertelement <4 x i32> %5, i32 %6, i32 3
26 ret <4 x i32> %7
27 }
28
29 Subzero should recognize these sequences and recombine them into
30 shuffle operations where appropriate.
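
One way the recognition step could work is sketched below in C++. The ``Inst`` record, the ``Kind`` enum, and ``recoverShuffleMask()`` are hypothetical stand-ins, not Subzero's actual instruction classes. The sketch walks an insertelement chain backwards from its last instruction and, if every inserted element was extracted from one of two candidate source vectors, rebuilds the equivalent shufflevector mask.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified IR used only for this sketch.
enum class Kind { Arg, Undef, Extract, Insert };

struct Inst {
  Kind kind;
  int vec;       // Extract: vector read from; Insert: vector being updated
  int scalar;    // Insert: index of the instruction producing the element
  uint32_t lane; // Extract: lane read; Insert: lane written
};

// Walks the insertelement chain that ends at insts[root]. On success, fills
// `mask` with shufflevector-style indices (0..numLanes-1 select from arg1,
// numLanes..2*numLanes-1 select from arg2) and returns true.
bool recoverShuffleMask(const std::vector<Inst> &insts, int root, int arg1,
                        int arg2, uint32_t numLanes,
                        std::vector<uint32_t> &mask) {
  mask.assign(numLanes, 0);
  std::vector<bool> filled(numLanes, false);
  int cur = root;
  while (insts[cur].kind == Kind::Insert) {
    const Inst &ins = insts[cur];
    if (ins.lane >= numLanes || ins.scalar < 0 ||
        static_cast<std::size_t>(ins.scalar) >= insts.size())
      return false;
    const Inst &ext = insts[ins.scalar];
    if (ext.kind != Kind::Extract || ext.lane >= numLanes)
      return false;
    uint32_t index;
    if (ext.vec == arg1)
      index = ext.lane;
    else if (ext.vec == arg2)
      index = numLanes + ext.lane;
    else
      return false; // element comes from some unrelated vector
    // The walk runs from the last insert to the first, so the first write
    // seen for a lane is the one that survives in the final value.
    if (!filled[ins.lane]) {
      mask[ins.lane] = index;
      filled[ins.lane] = true;
    }
    cur = ins.vec;
  }
  // The chain must start from undef and define every lane.
  if (insts[cur].kind != Kind::Undef)
    return false;
  for (uint32_t i = 0; i < numLanes; ++i)
    if (!filled[i])
      return false;
  return true;
}
```

Applied to the expanded function above, with the instructions encoded in order, this sketch recovers the original mask ``<4, 5, 0, 1>``. A real pass would also need to check that the intermediate values have no other uses before deleting them.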
31
32 * Add support for vector constants in the backend. The current code
33 materializes the vector constants it needs (e.g. for performing icmp
34 on unsigned operands) using register operations, but this should be
35 changed to loading them from a constant pool if the register
36 initialization is too complicated (such as in
37 TargetX8632::makeVectorOfHighOrderBits()).
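
For context on why such a constant is needed: SSE2 provides only signed packed integer comparisons, so an unsigned comparison is commonly done by flipping the high-order bit of every lane in both operands and then comparing as signed. The scalar C++ model below illustrates that identity. It assumes this is the purpose of the high-order-bits vector mentioned above; the names ``Vec4``, ``HighOrderBits``, and ``unsignedGreaterThan()`` are illustrative and do not come from Subzero.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

using Vec4 = std::array<uint32_t, 4>;

// The constant under discussion: only the high-order bit set in each lane.
// Loading this from a constant pool is the alternative to building it with
// register operations.
const Vec4 HighOrderBits = {{0x80000000u, 0x80000000u, 0x80000000u,
                             0x80000000u}};

// Lane-wise unsigned a > b, computed with a signed comparison only, which is
// what a signed packed compare such as pcmpgtd offers.
Vec4 unsignedGreaterThan(const Vec4 &a, const Vec4 &b) {
  Vec4 result{};
  for (std::size_t i = 0; i < a.size(); ++i) {
    // Flipping the top bit maps unsigned order onto signed order. The cast
    // relies on two's complement conversion (guaranteed since C++20 and
    // universal in practice before that).
    int32_t sa = static_cast<int32_t>(a[i] ^ HighOrderBits[i]);
    int32_t sb = static_cast<int32_t>(b[i] ^ HighOrderBits[i]);
    result[i] = sa > sb ? 0xFFFFFFFFu : 0u; // all-ones mask when true
  }
  return result;
}
```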
38
39 * [x86 specific] llvm-mc does not allow lea to take a mem128 memory
40 operand when assembling x86-32 code. The current
41 InstX8632Lea::emit() code uses Variable::asType() to convert any
42 mem128 Variables into a compatible memory operand type. However, the
43 emit code does not do any conversions of OperandX8632Mem, so if an
44 OperandX8632Mem is passed to lea as mem128 the resulting code will
45 not assemble. One way to fix this is by implementing
46 OperandX8632Mem::asType().
47
48 * [x86 specific] Lower shl on <4 x i32> operands via a float
49 conversion trick; see:
50 http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html
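
The linked message is not reproduced here. Assuming it refers to the well-known exponent-construction technique, the idea is to build the float 2^n directly from the shift amount n, convert it to an integer, and multiply. The scalar C++ model below shows one lane of that computation; ``shlViaFloat()`` is a hypothetical name, and the instruction names in the comments indicate roughly which vector operation each step would correspond to.

```cpp
#include <cstdint>
#include <cstring>

// Scalar model of one <4 x i32> lane. Valid for shift amounts 0..31.
uint32_t shlViaFloat(uint32_t value, uint32_t amount) {
  // Place the shift amount in the float exponent field and add the bias:
  // the result is the bit pattern of the float 2^amount (cf. pslld, paddd).
  uint32_t bits = (amount << 23) + 0x3f800000u;
  float powerOfTwo;
  std::memcpy(&powerOfTwo, &bits, sizeof powerOfTwo); // reinterpret the bits
  // Convert 2^amount to an integer (cf. cvttps2dq); powers of two are exact.
  uint32_t multiplier = static_cast<uint32_t>(powerOfTwo);
  // Multiplying by 2^amount modulo 2^32 is the same as shifting left.
  return value * multiplier;
}
```

For example, ``shlViaFloat(3, 4)`` yields 48. The benefit on x86 is that each lane can use its own shift amount, which the SSE2 packed shift instructions do not support directly.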
51
52 * [x86 specific] Add support for using aligned mov operations
53 (movaps). This will require passing alignment information to loads
54 and stores.
55
56 x86 SIMD Diversification
57 ========================
58
59 * Vector "bitwise" operations have several variant instructions: the
60 AND operation can be implemented with pand, andpd, or andps. This
61 pattern also holds for ANDN, OR, and XOR.
62
63 * Vector "mov" instructions can be diversified (e.g. movdqu instead of
64 movups) at the cost of a possible performance penalty.
65
66 * Scalar FP arithmetic can be diversified by performing the operations
67 with the vector version of the instructions.
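
As a rough illustration of the first item, the C++ sketch below picks one of the three equivalent mnemonics for a bitwise operation using a seeded random number generator. The ``BitwiseOp`` enum, ``pickBitwiseMnemonic()``, and the selection policy are all hypothetical; only the mnemonics themselves are real x86 instructions.

```cpp
#include <cstddef>
#include <random>
#include <string>

enum class BitwiseOp { And, Andn, Or, Xor };

// Returns one of the interchangeable encodings for `op`. Using the same seed
// reproduces the same sequence of choices, which keeps builds repeatable.
std::string pickBitwiseMnemonic(BitwiseOp op, std::mt19937 &rng) {
  static const char *const Variants[4][3] = {
      {"pand", "andps", "andpd"},    // AND
      {"pandn", "andnps", "andnpd"}, // ANDN
      {"por", "orps", "orpd"},       // OR
      {"pxor", "xorps", "xorpd"},    // XOR
  };
  std::size_t row = static_cast<std::size_t>(op);
  std::size_t column = rng() % 3; // slight modulo bias is harmless here
  return Variants[row][column];
}
```

All three variants in a row compute the same bit pattern; they differ in encoding and in which execution domain the processor treats them as belonging to, which is where a performance penalty can arise.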