Index: docs/DESIGN.rst |
diff --git a/docs/DESIGN.rst b/docs/DESIGN.rst |
index 8264a7aeac74bb408850e665ef0032c5476dca40..73dcf23d0710a7f0fef8696d6ceea5abfd3dc99f 100644 |
--- a/docs/DESIGN.rst |
+++ b/docs/DESIGN.rst |
@@ -56,6 +56,8 @@ following (described in more detail below): |
+-----------------------------------+-----------------------------+ |
| Loop nest analysis | | |
+-----------------------------------+-----------------------------+ |
+| Local common sub-exp elimination | | |
Jim Stichnoth
2016/08/05 06:20:00
I think you can spell out "subexpression" and (pai
manasijm
2016/08/05 06:57:09
Won't be that painful!
I might not be an emacs nin
manasijm
2016/08/05 17:34:44
Done.
|
++-----------------------------------+-----------------------------+ |
| Address mode inference | | |
+-----------------------------------+-----------------------------+ |
| Read-modify-write (RMW) transform | | |
@@ -518,6 +520,8 @@ Currently, the ``O2`` lowering recipe is the following: |
- Loop nest analysis |
+- Local common sub-expression elimination |
Jim Stichnoth
2016/08/05 06:20:00
"Subexpression" is usually written without the hyp
manasijm
2016/08/05 17:34:44
Done.
|
+ |
- Address mode inference |
- Read-modify-write (RMW) transformation |
@@ -698,6 +702,102 @@ that the Greedy allocator also improved maintainability because a lot of hacks |
could be removed, but Subzero is probably not yet to that level of hacks, and is |
less likely to see that particular benefit.) |
+Local common sub-expression elimination |
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
+ |
+The Local CSE implementation goes through each instruction and record ``Seen`` |
Jim Stichnoth
2016/08/05 06:20:00
records
manasijm
2016/08/05 17:34:44
Done.
|
+instructions in a ``CfgUnorderedSet``. The equality comparator for this |
Jim Stichnoth
2016/08/05 06:20:00
I don't think the level of detail in this document
manasijm
2016/08/05 17:34:44
Done.
|
+container ignores the destination of the instructions. That means ``A = X + Y`` |
+and ``B = X + Y`` will be considered to be 'equal' for this purpose. This allows |
+us to detect common sub-expressions. |
+ |
+Whenever a repetition is detected, the redundant variables are stored in a |
+container mapping the replacee to the replacement. In the case above, it would |
+be ``MAP[B] = A`` assuming ``B = X + Y`` comes after ``A = X + Y``. |
+ |
+At any point if a variable that has an entry in the replacement table is |
+encountered, it is replaced with the variable it is mapped to. This ensures that |
+the redundant variables will not have any uses in the basic block, allowing |
+dead code elimination to clean up the redundant instruction. |
+ |
+With SSA, the information stored is never invalidated. However, non-SSA input is |
+supported with the -lcse=no-ssa option. This has to maintain some extra |
Jim Stichnoth
2016/08/05 06:19:59
``-lcse=no-ssa``
manasijm
2016/08/05 17:34:43
Done.
|
+dependency information to ensure proper invalidation on variable assignment. |
+This is not rigorously tested because this pass is run at an early stage where |
+it is safe to assume variables have a single definition. This is not enabled by |
+default because it bumps the compile time overhead from 2% to 6%. |
+ |
+Loop invariant code motion |
Jim Stichnoth
2016/08/05 06:20:00
I think it's more common to hyphenate "loop-invari
manasijm
2016/08/05 17:34:44
Done.
|
+^^^^^^^^^^^^^^^^^^^^^^^^^^ |
+ |
+This pass utilizes the loop analysis information to hoist invariant instructions |
+to loop pre-headers. A loop must have a single entry node (header) and that node |
+must have a single external predecessor for this optimization to work, as it is |
+currently implemented. |
+ |
+The pass works by iterating over all instructions in the loop until the set of |
+invariant instructions converge. In each iteration, a non-invariant instruction |
Jim Stichnoth
2016/08/05 06:20:00
converges
manasijm
2016/08/05 17:34:44
Done.
|
+involving only constants or a variable known to be invariant is added to the |
+result set. The destination variable of that instruction is added to the set of |
+variables known to be invariant (which is initialized with the constant args). |
+ |
+Improving the loop-analysis infrastructure is likely to have significant effect |
Jim Stichnoth
2016/08/05 06:20:00
maybe s/effect/impact/ ?
manasijm
2016/08/05 17:34:43
Done.
|
+on this optimization. Inserting an extra node to act as the pre-header when the |
+header has multiple incoming edges from outside could also be good idea. |
Jim Stichnoth
2016/08/05 06:20:00
be a good
manasijm
2016/08/05 17:34:44
Done.
|
+Expanding the initial invariant variable set to contain all variables that do |
+not have definitions inside the loop does not seem to improve anything. |
+ |
+Short Circuit Evaluation |
Jim Stichnoth
2016/08/05 06:20:00
For consistency, only capitalize the first word of
manasijm
2016/08/05 17:34:43
Done.
|
+^^^^^^^^^^^^^^^^^^^^^^^^ |
+ |
+Short Circuit Evaluation splits nodes and introduces early jumps when the result |
Jim Stichnoth
2016/08/05 06:19:59
I would also not capitalize all these words. To s
manasijm
2016/08/05 17:34:44
Done.
|
+of a logical operation can be determined early and there are no side effects of |
Jim Stichnoth
2016/08/05 06:20:00
maybe "no observable side effects" ?
manasijm
2016/08/05 17:34:44
Done.
|
+skipping the rest of the computation. The instructions considered backwards from |
+the end of the basic blocks. |
+When a definition of a variable involved in a conditional jump is found, an |
Jim Stichnoth
2016/08/05 06:20:00
reflow paragraph, or make it a new paragraph
manasijm
2016/08/05 17:34:44
Done.
|
+extra jump can be inserted in that location, moving the rest of the instructions |
+in the node to a newly inserted node. Consider this example:: |
+ |
+ __N : |
+ a = <something> |
+ Instruction 1 without side effect |
+ ... b = <something> ... |
+ Instruction N without side effect |
+ t1 = or a b |
+ br t1 __X __Y |
+ |
+is transformed to :: |
+ |
+ __N : |
+ a = <something> |
+ br a __X __N_ext |
+ |
+ __N_ext : |
+ Instruction 1 without side effect |
+ ... b = <something> ... |
+ Instruction N without side effect |
+ br b __X __Y |
+ |
+The logic for AND is analogous, the only difference is that the early jump is |
+facilitated by a ``false`` value instead of ``true``. |
+ |
+Global Variable Splitting |
+^^^^^^^^^^^^^^^^^^^^^^^^^ |
+ |
+Global variable splitting (-split-global-vars) is run after register allocation. |
Jim Stichnoth
2016/08/05 06:19:59
``-split-global-vars``
manasijm
2016/08/05 17:34:44
Done.
|
+It works on the variables that did not manage to get registers (but are allowed |
+to) and decomposes their live ranges into the individual segments (which span a |
+single node at most). New variables are created (but not yet used) with these |
+smaller live ranges and the register allocator is run again. This is not |
+inefficient as the old variables that already had registers are now considered |
+pre-colored. |
+ |
+The new variables that get registers replace their parent variables for their |
+portion of its (parent's) live range. A copy from the old variable to the new |
+is introduced before the first use and the reverse after the last def in the |
+live range. |
+ |
+ |
Basic phi lowering |
^^^^^^^^^^^^^^^^^^ |