docs/DESIGN.rst - Issue 2069953004: Subzero: Design doc: Include regalloc in Om1 pass list.

Side by Side Diff: docs/DESIGN.rst

Issue 2069953004: Subzero: Design doc: Include regalloc in Om1 pass list. (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master

Patch Set: Created 4 years, 6 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View unified diff | Download patch

« no previous file with comments | « no previous file | no next file » | no next file with comments »

OLD	NEW
1 Design of the Subzero fast code generator	1 Design of the Subzero fast code generator

2 =========================================	2 =========================================

3	3

4 Introduction	4 Introduction

5 ------------	5 ------------

6	6

7 The `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes	7 The `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes

8 compiler technology based on `LLVM <http://llvm.org/>`_. The developer uses the	8 compiler technology based on `LLVM <http://llvm.org/>`_. The developer uses the

9 PNaCl toolchain to compile their application to architecture-neutral PNaCl	9 PNaCl toolchain to compile their application to architecture-neutral PNaCl

10 bitcode (a ``.pexe`` file), using as much architecture-neutral optimization as	10 bitcode (a ``.pexe`` file), using as much architecture-neutral optimization as

(...skipping 31 matching lines...) Expand 10 before \| Expand all \| Expand 10 after Loading...
42	42

43 Or, improve translator performance to something more reasonable.	43 Or, improve translator performance to something more reasonable.

44	44

45 This document describes Subzero's attempt to improve translation speed by an	45 This document describes Subzero's attempt to improve translation speed by an

46 order of magnitude while rivaling LLVM's code quality. Subzero does this	46 order of magnitude while rivaling LLVM's code quality. Subzero does this

47 through minimal IR layering, lean data structures and passes, and a careful	47 through minimal IR layering, lean data structures and passes, and a careful

48 selection of fast optimization passes. It has two optimization recipes: full	48 selection of fast optimization passes. It has two optimization recipes: full

49 optimizations (``O2``) and minimal optimizations (``Om1``). The recipes are the	49 optimizations (``O2``) and minimal optimizations (``Om1``). The recipes are the

50 following (described in more detail below):	50 following (described in more detail below):

51	51

52 +-----------------------------------+-----------------------+	52 +-----------------------------------+-----------------------------+

53 \| O2 recipe \| Om1 recipe \|	53 \| O2 recipe \| Om1 recipe \|

54 +===================================+=======================+	54 +===================================+=============================+

55 \| Parse .pexe file \| Parse .pexe file \|	55 \| Parse .pexe file \| Parse .pexe file \|

56 +-----------------------------------+-----------------------+	56 +-----------------------------------+-----------------------------+

57 \| Loop nest analysis \| \|	57 \| Loop nest analysis \| \|

58 +-----------------------------------+-----------------------+	58 +-----------------------------------+-----------------------------+

59 \| Address mode inference \| \|	59 \| Address mode inference \| \|

60 +-----------------------------------+-----------------------+	60 +-----------------------------------+-----------------------------+

61 \| Read-modify-write (RMW) transform \| \|	61 \| Read-modify-write (RMW) transform \| \|

62 +-----------------------------------+-----------------------+	62 +-----------------------------------+-----------------------------+

63 \| Basic liveness analysis \| \|	63 \| Basic liveness analysis \| \|

64 +-----------------------------------+-----------------------+	64 +-----------------------------------+-----------------------------+

65 \| Load optimization \| \|	65 \| Load optimization \| \|

66 +-----------------------------------+-----------------------+	66 +-----------------------------------+-----------------------------+

67 \| \| Phi lowering (simple) \|	67 \| \| Phi lowering (simple) \|

68 +-----------------------------------+-----------------------+	68 +-----------------------------------+-----------------------------+

69 \| Target lowering \| Target lowering \|	69 \| Target lowering \| Target lowering \|

70 +-----------------------------------+-----------------------+	70 +-----------------------------------+-----------------------------+

71 \| Full liveness analysis \| \|	71 \| Full liveness analysis \| \|

72 +-----------------------------------+-----------------------+	72 +-----------------------------------+-----------------------------+

73 \| Register allocation \| \|	73 \| Register allocation \| Minimal register allocation \|

74 +-----------------------------------+-----------------------+	74 +-----------------------------------+-----------------------------+

75 \| Phi lowering (advanced) \| \|	75 \| Phi lowering (advanced) \| \|

76 +-----------------------------------+-----------------------+	76 +-----------------------------------+-----------------------------+

77 \| Post-phi register allocation \| \|	77 \| Post-phi register allocation \| \|

78 +-----------------------------------+-----------------------+	78 +-----------------------------------+-----------------------------+

79 \| Branch optimization \| \|	79 \| Branch optimization \| \|

80 +-----------------------------------+-----------------------+	80 +-----------------------------------+-----------------------------+

81 \| Code emission \| Code emission \|	81 \| Code emission \| Code emission \|

82 +-----------------------------------+-----------------------+	82 +-----------------------------------+-----------------------------+

83	83

84 Goals	84 Goals

85 =====	85 =====

86	86

87 Translation speed	87 Translation speed

88 -----------------	88 -----------------

89	89

90 We'd like to be able to translate a ``.pexe`` file as fast as download speed.	90 We'd like to be able to translate a ``.pexe`` file as fast as download speed.

91 Any faster is in a sense wasted effort. Download speed varies greatly, but	91 Any faster is in a sense wasted effort. Download speed varies greatly, but

92 we'll arbitrarily say 1 MB/sec. We'll pick the ARM A15 CPU as the example of a	92 we'll arbitrarily say 1 MB/sec. We'll pick the ARM A15 CPU as the example of a

(...skipping 1392 matching lines...) Expand 10 before \| Expand all \| Expand 10 after Loading...
1485 This could be mitigated by switching these to the CFG-local allocator.	1485 This could be mitigated by switching these to the CFG-local allocator.

1486	1486

1487 Third, multithreading may make the default allocator strategy more expensive.	1487 Third, multithreading may make the default allocator strategy more expensive.

1488 In a single-threaded environment, a pass will allocate its containers, run the	1488 In a single-threaded environment, a pass will allocate its containers, run the

1489 pass, and deallocate the containers. This results in stack-like allocation	1489 pass, and deallocate the containers. This results in stack-like allocation

1490 behavior and makes the heap free list easier to manage, with less heap	1490 behavior and makes the heap free list easier to manage, with less heap

1491 fragmentation. But when multithreading is added, the allocations and	1491 fragmentation. But when multithreading is added, the allocations and

1492 deallocations become much less stack-like, making allocation and deallocation	1492 deallocations become much less stack-like, making allocation and deallocation

1493 operations individually more expensive. Again, this could be mitigated by	1493 operations individually more expensive. Again, this could be mitigated by

1494 switching these to the CFG-local allocator.	1494 switching these to the CFG-local allocator.

OLD	NEW