Chromium Code Reviews

Side by Side Diff: src/README.SIMD.rst

Issue 1341423002: Reflow comments to use the full width. (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master
Patch Set: Fix spelling and rebase Created 5 years, 3 months ago
OLD NEW
1 Missing support 1 Missing support
2 =============== 2 ===============
3 3
4 * The PNaCl LLVM backend expands shufflevector operations into 4 * The PNaCl LLVM backend expands shufflevector operations into sequences of
5 sequences of insertelement and extractelement operations. For 5 insertelement and extractelement operations. For instance:
6 instance:
7 6
8 define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) { 7 define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
9 entry: 8 entry:
10 %res = shufflevector <4 x i32> %arg1, <4 x i32> %arg2, <4 x i32> <i32 4, i32 5, i32 0, i32 1> 9 %res = shufflevector <4 x i32> %arg1,
10 <4 x i32> %arg2,
11 <4 x i32> <i32 4, i32 5, i32 0, i32 1>
11 ret <4 x i32> %res 12 ret <4 x i32> %res
12 } 13 }
13 14
14 gets expanded into: 15 gets expanded into:
15 16
16 define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) { 17 define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
17 entry: 18 entry:
18 %0 = extractelement <4 x i32> %arg2, i32 0 19 %0 = extractelement <4 x i32> %arg2, i32 0
19 %1 = insertelement <4 x i32> undef, i32 %0, i32 0 20 %1 = insertelement <4 x i32> undef, i32 %0, i32 0
20 %2 = extractelement <4 x i32> %arg2, i32 1 21 %2 = extractelement <4 x i32> %arg2, i32 1
21 %3 = insertelement <4 x i32> %1, i32 %2, i32 1 22 %3 = insertelement <4 x i32> %1, i32 %2, i32 1
22 %4 = extractelement <4 x i32> %arg1, i32 0 23 %4 = extractelement <4 x i32> %arg1, i32 0
23 %5 = insertelement <4 x i32> %3, i32 %4, i32 2 24 %5 = insertelement <4 x i32> %3, i32 %4, i32 2
24 %6 = extractelement <4 x i32> %arg1, i32 1 25 %6 = extractelement <4 x i32> %arg1, i32 1
25 %7 = insertelement <4 x i32> %5, i32 %6, i32 3 26 %7 = insertelement <4 x i32> %5, i32 %6, i32 3
26 ret <4 x i32> %7 27 ret <4 x i32> %7
27 } 28 }
28 29
29 Subzero should recognize these sequences and recombine them into 30 Subzero should recognize these sequences and recombine them into
30 shuffle operations where appropriate. 31 shuffle operations where appropriate.
31 32
32 * Add support for vector constants in the backend. The current code 33 * Add support for vector constants in the backend. The current code
33 materializes the vector constants it needs (eg. for performing icmp 34 materializes the vector constants it needs (eg. for performing icmp on
34 on unsigned operands) using register operations, but this should be 35 unsigned operands) using register operations, but this should be changed to
35 changed to loading them from a constant pool if the register 36 loading them from a constant pool if the register initialization is too
36 initialization is too complicated (such as in 37 complicated (such as in TargetX8632::makeVectorOfHighOrderBits()).
37 TargetX8632::makeVectorOfHighOrderBits()).
38 38
39 * [x86 specific] llvm-mc does not allow lea to take a mem128 memory 39 * [x86 specific] llvm-mc does not allow lea to take a mem128 memory operand
40 operand when assembling x86-32 code. The current 40 when assembling x86-32 code. The current InstX8632Lea::emit() code uses
41 InstX8632Lea::emit() code uses Variable::asType() to convert any 41 Variable::asType() to convert any mem128 Variables into a compatible memory
42 mem128 Variables into a compatible memory operand type. However, the 42 operand type. However, the emit code does not do any conversions of
43 emit code does not do any conversions of OperandX8632Mem, so if an 43 OperandX8632Mem, so if an OperandX8632Mem is passed to lea as mem128 the
44 OperandX8632Mem is passed to lea as mem128 the resulting code will 44 resulting code will not assemble. One way to fix this is by implementing
45 not assemble. One way to fix this is by implementing
46 OperandX8632Mem::asType(). 45 OperandX8632Mem::asType().
47 46
48 * [x86 specific] Lower shl with <4 x i32> using some clever float 47 * [x86 specific] Lower shl with <4 x i32> using some clever float conversion:
49 conversion:
50 http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html 48 http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html
51 49
52 * [x86 specific] Add support for using aligned mov operations 50 * [x86 specific] Add support for using aligned mov operations (movaps). This
53 (movaps). This will require passing alignment information to loads 51 will require passing alignment information to loads and stores.
54 and stores.
55 52
56 x86 SIMD Diversification 53 x86 SIMD Diversification
57 ======================== 54 ========================
58 55
59 * Vector "bitwise" operations have several variant instructions: the 56 * Vector "bitwise" operations have several variant instructions: the AND
60 AND operation can be implemented with pand, andpd, or andps. This 57 operation can be implemented with pand, andpd, or andps. This pattern also
61 pattern also holds for ANDN, OR, and XOR. 58 holds for ANDN, OR, and XOR.
62 59
63 * Vector "mov" instructions can be diversified (eg. movdqu instead of 60 * Vector "mov" instructions can be diversified (eg. movdqu instead of movups)
64 movups) at the cost of a possible performance penalty. 61 at the cost of a possible performance penalty.
65 62
66 * Scalar FP arithmetic can be diversified by performing the operations 63 * Scalar FP arithmetic can be diversified by performing the operations with the
67 with the vector version of the instructions. 64 vector version of the instructions.
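
The first item under "Missing support" asks Subzero to recognize the expanded insertelement/extractelement sequences and recombine them into shuffles. The following is only a minimal sketch of the mask-recovery step, not Subzero code: `LaneCopy` and `recoverShuffleMask` are hypothetical names, and the sketch assumes the sequence has already been matched into (source vector, source lane, destination lane) triples that start from `undef` and draw from at most two source vectors.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one matched pair of instructions:
//   %e = extractelement <N x T> %src, i32 SrcLane
//   %v = insertelement  <N x T> %prev, T %e, i32 DstLane
// SrcVector is 0 for the first shuffle operand and 1 for the second.
struct LaneCopy {
  uint32_t SrcVector;
  uint32_t SrcLane;
  uint32_t DstLane;
};

// Rebuilds a shufflevector-style mask from an insert/extract chain.
// Returns false when the chain is not a complete two-operand shuffle:
// an index is out of range, or some destination lane is never written.
bool recoverShuffleMask(const std::vector<LaneCopy> &Chain, uint32_t NumLanes,
                        std::vector<uint32_t> &Mask) {
  assert(NumLanes > 0 && "vector types have at least one lane");
  const uint32_t Unset = UINT32_MAX; // marks a lane that is still undef
  std::vector<uint32_t> Lanes(NumLanes, Unset);
  for (const LaneCopy &C : Chain) {
    if (C.SrcVector > 1 || C.SrcLane >= NumLanes || C.DstLane >= NumLanes)
      return false;
    // Mask indices 0..N-1 select from the first operand and N..2N-1 from
    // the second. A later insert into the same lane overwrites an earlier one.
    Lanes[C.DstLane] = C.SrcVector * NumLanes + C.SrcLane;
  }
  for (uint32_t L : Lanes)
    if (L == Unset)
      return false;
  Mask = Lanes;
  return true;
}
```

For the expanded `@shuffle` function in the diff, the chain is (%arg2 lane 0 to lane 0), (%arg2 lane 1 to lane 1), (%arg1 lane 0 to lane 2), (%arg1 lane 1 to lane 3), which gives back the original mask `<4, 5, 0, 1>`. A real implementation would also have to check that the intermediate values have no other uses before replacing the sequence.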