Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(2)

Unified Diff: src/README.SIMD.rst

Issue 477773003: Subzero: Start a list of SIMD improvement ideas. (Closed) Base URL: https://gerrit.chromium.org/gerrit/p/native_client/pnacl-subzero.git@master
Patch Set: Change name Created 6 years, 4 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
« no previous file with comments | « no previous file | no next file » | no next file with comments »
Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
Index: src/README.SIMD.rst
diff --git a/src/README.SIMD.rst b/src/README.SIMD.rst
new file mode 100644
index 0000000000000000000000000000000000000000..58f25d96b1fa4285267267eaa56434df0afe31f0
--- /dev/null
+++ b/src/README.SIMD.rst
@@ -0,0 +1,67 @@
+Missing support
+===============
+
+* The PNaCl LLVM backend expands shufflevector operations into
+ sequences of insertelement and extractelement operations. For
+ instance:
+
+ define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
+ entry:
+ %res = shufflevector <4 x i32> %arg1, <4 x i32> %arg2, <4 x i32> <i32 4, i32 5, i32 0, i32 1>
+ ret <4 x i32> %res
+ }
+
+ gets expanded into:
+
+ define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
+ entry:
+ %0 = extractelement <4 x i32> %arg2, i32 0
+ %1 = insertelement <4 x i32> undef, i32 %0, i32 0
+ %2 = extractelement <4 x i32> %arg2, i32 1
+ %3 = insertelement <4 x i32> %1, i32 %2, i32 1
+ %4 = extractelement <4 x i32> %arg1, i32 0
+ %5 = insertelement <4 x i32> %3, i32 %4, i32 2
+ %6 = extractelement <4 x i32> %arg1, i32 1
+ %7 = insertelement <4 x i32> %5, i32 %6, i32 3
+ ret <4 x i32> %7
+ }
+
+ Subzero should recognize these sequences and recombine them into
+ shuffle operations where appropriate.
+
+* Add support for vector constants in the backend. The current code
+ materializes the vector constants it needs (eg. for performing icmp
+ on unsigned operands) using register operations, but this should be
+ changed to loading them from a constant pool if the register
+ initialization is too complicated (such as in
+ TargetX8632::makeVectorOfHighOrderBits()).
+
+* [x86 specific] llvm-mc does not allow lea to take a mem128 memory
+ operand when assembling x86-32 code. The current
+ InstX8632Lea::emit() code uses Variable::asType() to convert any
+ mem128 Variables into a compatible memory operand type. However, the
+ emit code does not do any conversions of OperandX8632Mem, so if an
+ OperandX8632Mem is passed to lea as mem128 the resulting code will
+ not assemble. One way to fix this is by implementing
+ OperandX8632Mem::asType().
+
+* [x86 specific] Lower shl with <4 x i32> using some clever float
+ conversion:
+http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html
+
+* [x86 specific] Add support for using aligned mov operations
+ (movaps). This will require passing alignment information to loads
+ and stores.
+
+x86 SIMD Diversification
+========================
+
+* Vector "bitwise" operations have several variant instructions: the
+ AND operation can be implemented with pand, andpd, or andps. This
+ pattern also holds for ANDN, OR, and XOR.
+
+* Vector "mov" instructions can be diversified (eg. movdqu instead of
+ movups) at the cost of a possible performance penalty.
+
+* Scalar FP arithmetic can be diversified by performing the operations
+ with the vector version of the instructions.
« no previous file with comments | « no previous file | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698