Missing support
===============

* The PNaCl LLVM backend expands shufflevector operations into sequences of
  insertelement and extractelement operations. For instance:

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %res = shufflevector <4 x i32> %arg1,
                           <4 x i32> %arg2,
                           <4 x i32> <i32 4, i32 5, i32 0, i32 1>
      ret <4 x i32> %res
    }

  gets expanded into:

    define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
    entry:
      %0 = extractelement <4 x i32> %arg2, i32 0
      %1 = insertelement <4 x i32> undef, i32 %0, i32 0
      %2 = extractelement <4 x i32> %arg2, i32 1
      %3 = insertelement <4 x i32> %1, i32 %2, i32 1
      %4 = extractelement <4 x i32> %arg1, i32 0
      %5 = insertelement <4 x i32> %3, i32 %4, i32 2
      %6 = extractelement <4 x i32> %arg1, i32 1
      %7 = insertelement <4 x i32> %5, i32 %6, i32 3
      ret <4 x i32> %7
    }

  Subzero should recognize these sequences and recombine them into shuffle
  operations where appropriate.

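  As a rough sketch (assuming %arg1 arrives in xmm0 and %arg2 in xmm1, which is
  really up to the register allocator), the recombined shuffle above could
  become a single SSE shuffle instead of eight scalar element moves:

    ; Illustrative Intel-syntax assembly, not Subzero's actual output.
    ; imm8 = 0x44 selects dst[0], dst[1], src[0], src[1].
    shufps xmm1, xmm0, 0x44   ; xmm1 = { arg2[0], arg2[1], arg1[0], arg1[1] }
    movaps xmm0, xmm1         ; return value in xmm0
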
* Add support for vector constants in the backend. The current code
  materializes the vector constants it needs (e.g. for performing icmp on
  unsigned operands) using register operations, but this should be changed to
  loading them from a constant pool if the register initialization is too
  complicated (such as in TargetX8632::makeVectorOfHighOrderBits()).

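  For example, a vector with only the sign bit set in each i32 lane could
  either be built in registers or loaded from the pool (illustrative
  Intel-syntax assembly; the constant-pool label is made up for this sketch):

    ; Register materialization:
    pcmpeqd xmm0, xmm0                     ; all bits set
    pslld   xmm0, 31                       ; 0x80000000 in every lane
    ; Constant-pool alternative:
    movups  xmm0, xmmword ptr [L$80000000] ; four copies of 0x80000000
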
* [x86 specific] llvm-mc does not allow lea to take a mem128 memory operand
  when assembling x86-32 code. The current InstX8632Lea::emit() code uses
  Variable::asType() to convert any mem128 Variables into a compatible memory
  operand type. However, the emit code does not do any conversions of
  OperandX8632Mem, so if an OperandX8632Mem is passed to lea as mem128, the
  resulting code will not assemble. One way to fix this is by implementing
  OperandX8632Mem::asType().

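  At the assembly level the mismatch looks roughly like this (illustrative
  Intel-syntax operands):

    lea eax, xmmword ptr [esp + 16]   ; mem128 operand: rejected by llvm-mc
    lea eax, dword ptr [esp + 16]     ; same address, scalar width: assembles
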
* [x86 specific] Lower shl with <4 x i32> using some clever float conversion:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html

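  The trick (sketched below with illustrative registers; the setup of the bias
  constant is omitted) builds the float 2^amount in each lane and multiplies
  by it:

    ; xmm0 = values, xmm1 = per-lane shift amounts in [0, 31],
    ; xmm2 = 0x3f800000 in every lane (the bit pattern of 1.0f)
    pslld     xmm1, 23        ; move each amount into the float exponent field
    paddd     xmm1, xmm2      ; each lane is now the bit pattern of 2.0^amount
    cvttps2dq xmm1, xmm1      ; convert to the integer 2^amount
    pmulld    xmm0, xmm1      ; x << amount == x * 2^amount (pmulld is SSE4.1;
                              ; SSE2 would need a pmuludq-based multiply)
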
* [x86 specific] Add support for using aligned mov operations (movaps). This
  will require passing alignment information to loads and stores.

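  For example, a 16-byte load could pick the aligned form whenever the address
  is known to be 16-byte aligned (illustrative Intel-syntax assembly):

    movups xmm0, xmmword ptr [eax]   ; always legal, no alignment requirement
    movaps xmm0, xmmword ptr [eax]   ; faults unless eax is 16-byte aligned
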
x86 SIMD Diversification
========================

* Vector "bitwise" operations have several variant instructions: the AND
  operation can be implemented with pand, andpd, or andps. This pattern also
  holds for ANDN, OR, and XOR.

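  All three forms compute the same 128-bit result; they differ only in encoding
  and in the execution domain (integer vs. floating point) they nominally
  belong to (illustrative syntax):

    pand  xmm0, xmm1   ; integer-domain bitwise AND
    andps xmm0, xmm1   ; single-precision-domain bitwise AND
    andpd xmm0, xmm1   ; double-precision-domain bitwise AND
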
* Vector "mov" instructions can be diversified (e.g. movdqu instead of movups)
  at the cost of a possible performance penalty.

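  For instance (illustrative syntax):

    movups xmm0, xmmword ptr [eax]   ; unaligned load, floating-point domain
    movdqu xmm0, xmmword ptr [eax]   ; equivalent unaligned load, integer domain
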
* Scalar FP arithmetic can be diversified by performing the operations with the
  vector version of the instructions.
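
  For example, when only element 0 of the result is used and FP exceptions are
  masked, the packed form yields the same value in that element (illustrative
  syntax):

    addss xmm0, xmm1   ; scalar single-precision add
    addps xmm0, xmm1   ; packed form; element 0 holds the same result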