Missing support
===============

* The PNaCl LLVM backend expands shufflevector operations into sequences of
  insertelement and extractelement operations. For instance:

  define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
  entry:
    %res = shufflevector <4 x i32> %arg1,
                         <4 x i32> %arg2,
                         <4 x i32> <i32 4, i32 5, i32 0, i32 1>
    ret <4 x i32> %res
  }

  gets expanded into:

  define <4 x i32> @shuffle(<4 x i32> %arg1, <4 x i32> %arg2) {
  entry:
    %0 = extractelement <4 x i32> %arg2, i32 0
    %1 = insertelement <4 x i32> undef, i32 %0, i32 0
    %2 = extractelement <4 x i32> %arg2, i32 1
    %3 = insertelement <4 x i32> %1, i32 %2, i32 1
    %4 = extractelement <4 x i32> %arg1, i32 0
    %5 = insertelement <4 x i32> %3, i32 %4, i32 2
    %6 = extractelement <4 x i32> %arg1, i32 1
    %7 = insertelement <4 x i32> %5, i32 %6, i32 3
    ret <4 x i32> %7
  }

  Subzero should recognize these sequences and recombine them into shuffle
  operations where appropriate.
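
  To make the recombination step concrete, here is a minimal standalone sketch
  (plain C++ over a simplified representation, not Subzero's actual IR
  classes, which are assumptions here): once the insertelement/extractelement
  chain has been walked and reduced to a (source vector, source lane) pair per
  destination lane, the shufflevector mask falls out directly:

  #include <array>
  #include <cstdio>

  // One destination lane: which input vector (0 = %arg1, 1 = %arg2) and
  // which lane of it the extract/insert pair copied.
  struct LaneSource {
    int SrcVec;
    int SrcLane;
  };

  // Build a shufflevector-style mask: lanes of the second source are
  // numbered after the lanes of the first (LLVM's convention).
  std::array<int, 4> buildShuffleMask(const std::array<LaneSource, 4> &Chain) {
    std::array<int, 4> Mask;
    for (int I = 0; I < 4; ++I)
      Mask[I] = Chain[I].SrcVec * 4 + Chain[I].SrcLane;
    return Mask;
  }

  int main() {
    // The chain from the example above: lanes 0-1 come from %arg2,
    // lanes 2-3 from %arg1.
    std::array<LaneSource, 4> Chain = {{{1, 0}, {1, 1}, {0, 0}, {0, 1}}};
    for (int M : buildShuffleMask(Chain))
      std::printf("%d ", M); // prints: 4 5 0 1
    return 0;
  }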

* Add support for vector constants in the backend. The current code
  materializes the vector constants it needs (e.g. for performing icmp on
  unsigned operands) using register operations, but this should be changed to
  loading them from a constant pool if the register initialization is too
  complicated (such as in TargetX8632::makeVectorOfHighOrderBits()).
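
  For illustration, the register-based materialization in question can be
  written with SSE intrinsics (a sketch; that makeVectorOfHighOrderBits()
  emits exactly this sequence is an assumption):

  #include <emmintrin.h>

  // Build <0x80000000 x 4> in a register without touching memory:
  // pcmpeqd sets all bits, then pslld by 31 leaves only the sign
  // (high-order) bit of each 32-bit lane.
  __m128i makeVectorOfHighOrderBits() {
    __m128i AllOnes =
        _mm_cmpeq_epi32(_mm_setzero_si128(), _mm_setzero_si128());
    return _mm_slli_epi32(AllOnes, 31);
  }

  Loading the same value from a 16-byte constant-pool slot would instead be a
  single aligned move from memory.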

* [x86 specific] llvm-mc does not allow lea to take a mem128 memory operand
  when assembling x86-32 code. The current InstX8632Lea::emit() code uses
  Variable::asType() to convert any mem128 Variables into a compatible memory
  operand type. However, the emit code does not do any conversions of
  OperandX8632Mem, so if an OperandX8632Mem is passed to lea as mem128 the
  resulting code will not assemble. One way to fix this is by implementing
  OperandX8632Mem::asType().
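
  A minimal sketch of what that could look like, assuming the create() and
  accessor signatures below mirror Subzero's actual OperandX8632Mem interface
  (an assumption, not the real API):

  // Hypothetical: clone this memory operand with the same base, index,
  // scale, and offset, but a different type (e.g. mem128 re-typed to i32)
  // so that llvm-mc accepts it as an lea operand.
  OperandX8632Mem *OperandX8632Mem::asType(Cfg *Func, Type Ty) {
    return OperandX8632Mem::create(Func, Ty, getBase(), getOffset(),
                                   getIndex(), getShift());
  }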

* [x86 specific] Lower shl with <4 x i32> using some clever float conversion:
  http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20100726/105087.html
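
  One version of the trick, sketched here with intrinsics (that this matches
  the linked post exactly is an assumption), synthesizes 2^s per lane through
  the float exponent field and then multiplies, since x << s == x * 2^s:

  #include <emmintrin.h>
  #include <smmintrin.h> // SSE4.1, for _mm_mullo_epi32

  // Per-lane variable shift left for <4 x i32>, valid for shift amounts
  // 0..30 (a shift of 31 overflows the float-to-int conversion).
  __m128i shl_v4i32(__m128i X, __m128i S) {
    // (S << 23) + 0x3f800000 is the IEEE-754 bit pattern of 2.0^S.
    __m128i ExpBits =
        _mm_add_epi32(_mm_slli_epi32(S, 23), _mm_set1_epi32(0x3f800000));
    // Reinterpret as float and truncate back to integer: each lane is 2^S.
    __m128i Pow2 = _mm_cvttps_epi32(_mm_castsi128_ps(ExpBits));
    // X * 2^S == X << S.
    return _mm_mullo_epi32(X, Pow2);
  }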

* [x86 specific] Add support for using aligned mov operations (movaps). This
  will require passing alignment information to loads and stores.
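
  For context, a small intrinsics sketch (not Subzero code) of why the
  lowering needs to know each access's alignment before it can pick the
  aligned forms:

  #include <emmintrin.h>

  __m128i loadVec(const void *P, bool Aligned16) {
    // movdqa requires a 16-byte-aligned address and faults otherwise;
    // movdqu is always safe but may be slower on older hardware.
    return Aligned16 ? _mm_load_si128(static_cast<const __m128i *>(P))
                     : _mm_loadu_si128(static_cast<const __m128i *>(P));
  }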

x86 SIMD Diversification
========================

* Vector "bitwise" operations have several variant instructions: the AND
  operation can be implemented with pand, andpd, or andps. This pattern also
  holds for ANDN, OR, and XOR.

* Vector "mov" instructions can be diversified (e.g. movdqu instead of movups)
  at the cost of a possible performance penalty.

* Scalar FP arithmetic can be diversified by performing the operations with
  the vector version of the instructions.
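
All three points above can be illustrated with a short intrinsics sketch
(functional equivalences only; how a randomized emitter would choose among
the variants is not shown):

  #include <emmintrin.h>

  // AND has three encodings with identical bit-level results:
  // pand (integer domain), andps (float domain), andpd (double domain).
  __m128i and_variants(__m128i A, __m128i B) {
    __m128i V1 = _mm_and_si128(A, B);             // pand
    __m128 V2 = _mm_and_ps(_mm_castsi128_ps(A),   // andps
                           _mm_castsi128_ps(B));
    __m128d V3 = _mm_and_pd(_mm_castsi128_pd(A),  // andpd
                            _mm_castsi128_pd(B));
    (void)V2; (void)V3; // all three hold the same 128-bit value
    return V1;
  }

  // Unaligned vector moves: movdqu and movups copy the same 16 bytes.
  __m128i mov_variants(const void *P) {
    return _mm_loadu_si128(static_cast<const __m128i *>(P)); // movdqu
    // equivalently: movups, via _mm_loadu_ps on the same address
  }

  // Scalar FP: addss only writes lane 0; addps computes the same result
  // in lane 0 (plus extra lanes), so it can stand in for the scalar op.
  float scalar_fp_variants(float A, float B) {
    __m128 Scalar = _mm_add_ss(_mm_set_ss(A), _mm_set_ss(B)); // addss
    __m128 Vector = _mm_add_ps(_mm_set_ss(A), _mm_set_ss(B)); // addps
    (void)Vector;
    return _mm_cvtss_f32(Scalar);
  }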