Chromium Code Reviews| Index: docs/DESIGN.rst |
| diff --git a/docs/DESIGN.rst b/docs/DESIGN.rst |
| index 8264a7aeac74bb408850e665ef0032c5476dca40..73dcf23d0710a7f0fef8696d6ceea5abfd3dc99f 100644 |
| --- a/docs/DESIGN.rst |
| +++ b/docs/DESIGN.rst |
| @@ -56,6 +56,8 @@ following (described in more detail below): |
| +-----------------------------------+-----------------------------+ |
| | Loop nest analysis | | |
| +-----------------------------------+-----------------------------+ |
| +| Local common sub-exp elimination | | |
|
Jim Stichnoth
2016/08/05 06:20:00
I think you can spell out "subexpression" and (pai
manasijm
2016/08/05 06:57:09
Won't be that painful!
I might not be an emacs nin
manasijm
2016/08/05 17:34:44
Done.
|
| ++-----------------------------------+-----------------------------+ |
| | Address mode inference | | |
| +-----------------------------------+-----------------------------+ |
| | Read-modify-write (RMW) transform | | |
| @@ -518,6 +520,8 @@ Currently, the ``O2`` lowering recipe is the following: |
| - Loop nest analysis |
| +- Local common sub-expression elimination |
|
Jim Stichnoth
2016/08/05 06:20:00
"Subexpression" is usually written without the hyp
manasijm
2016/08/05 17:34:44
Done.
|
| + |
| - Address mode inference |
| - Read-modify-write (RMW) transformation |
| @@ -698,6 +702,102 @@ that the Greedy allocator also improved maintainability because a lot of hacks |
| could be removed, but Subzero is probably not yet to that level of hacks, and is |
| less likely to see that particular benefit.) |
| +Local common sub-expression elimination |
| +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| + |
| +The Local CSE implementation goes through each instruction and record ``Seen`` |
|
Jim Stichnoth
2016/08/05 06:20:00
records
manasijm
2016/08/05 17:34:44
Done.
|
| +instructions in a ``CfgUnorderedSet``. The equality comparator for this |
|
Jim Stichnoth
2016/08/05 06:20:00
I don't think the level of detail in this document
manasijm
2016/08/05 17:34:44
Done.
|
| +container ignores the destination of the instructions. That means ``A = X + Y`` |
| +and ``B = X + Y`` will be considered to be 'equal' for this purpose. This allows |
| +us to detect common sub-expressions. |
| + |
| +Whenever a repetition is detected, the redundant variables are stored in a |
| +container mapping the replacee to the replacement. In the case above, it would |
| +be ``MAP[B] = A`` assuming ``B = X + Y`` comes after ``A = X + Y``. |
| + |
| +At any point if a variable that has an entry in the replacement table is |
| +encountered, it is replaced with the variable it is mapped to. This ensures that |
| +the redundant variables will not have any uses in the basic block, allowing |
| +dead code elimination to clean up the redundant instruction. |
| + |
| +With SSA, the information stored is never invalidated. However, non-SSA input is |
| +supported with the -lcse=no-ssa option. This has to maintain some extra |
|
Jim Stichnoth
2016/08/05 06:19:59
``-lcse=no-ssa``
manasijm
2016/08/05 17:34:43
Done.
|
| +dependency information to ensure proper invalidation on variable assignment. |
| +This is not rigorously tested because this pass is run at an early stage where |
| +it is safe to assume variables have a single definition. This is not enabled by |
| +default because it bumps the compile time overhead from 2% to 6%. |
| + |
| +Loop invariant code motion |
|
Jim Stichnoth
2016/08/05 06:20:00
I think it's more common to hyphenate "loop-invari
manasijm
2016/08/05 17:34:44
Done.
|
| +^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| + |
| +This pass utilizes the loop analysis information to hoist invariant instructions |
| +to loop pre-headers. A loop must have a single entry node (header) and that node |
| +must have a single external predecessor for this optimization to work, as it is |
| +currently implemented. |
| + |
| +The pass works by iterating over all instructions in the loop until the set of |
| +invariant instructions converge. In each iteration, a non-invariant instruction |
|
Jim Stichnoth
2016/08/05 06:20:00
converges
manasijm
2016/08/05 17:34:44
Done.
|
| +involving only constants or a variable known to be invariant is added to the |
| +result set. The destination variable of that instruction is added to the set of |
| +variables known to be invariant (which is initialized with the constant args). |
| + |
| +Improving the loop-analysis infrastructure is likely to have significant effect |
|
Jim Stichnoth
2016/08/05 06:20:00
maybe s/effect/impact/ ?
manasijm
2016/08/05 17:34:43
Done.
|
| +on this optimization. Inserting an extra node to act as the pre-header when the |
| +header has multiple incoming edges from outside could also be good idea. |
|
Jim Stichnoth
2016/08/05 06:20:00
be a good
manasijm
2016/08/05 17:34:44
Done.
|
| +Expanding the initial invariant variable set to contain all variables that do |
| +not have definitions inside the loop does not seem to improve anything. |
| + |
| +Short Circuit Evaluation |
|
Jim Stichnoth
2016/08/05 06:20:00
For consistency, only capitalize the first word of
manasijm
2016/08/05 17:34:43
Done.
|
| +^^^^^^^^^^^^^^^^^^^^^^^^ |
| + |
| +Short Circuit Evaluation splits nodes and introduces early jumps when the result |
|
Jim Stichnoth
2016/08/05 06:19:59
I would also not capitalize all these words. To s
manasijm
2016/08/05 17:34:44
Done.
|
| +of a logical operation can be determined early and there are no side effects of |
|
Jim Stichnoth
2016/08/05 06:20:00
maybe "no observable side effects" ?
manasijm
2016/08/05 17:34:44
Done.
|
| +skipping the rest of the computation. The instructions considered backwards from |
| +the end of the basic blocks. |
| +When a definition of a variable involved in a conditional jump is found, an |
|
Jim Stichnoth
2016/08/05 06:20:00
reflow paragraph, or make it a new paragraph
manasijm
2016/08/05 17:34:44
Done.
|
| +extra jump can be inserted in that location, moving the rest of the instructions |
| +in the node to a newly inserted node. Consider this example:: |
| + |
| + __N : |
| + a = <something> |
| + Instruction 1 without side effect |
| + ... b = <something> ... |
| + Instruction N without side effect |
| + t1 = or a b |
| + br t1 __X __Y |
| + |
| +is transformed to :: |
| + |
| + __N : |
| + a = <something> |
| + br a __X __N_ext |
| + |
| + __N_ext : |
| + Instruction 1 without side effect |
| + ... b = <something> ... |
| + Instruction N without side effect |
| + br b __X __Y |
| + |
| +The logic for AND is analogous, the only difference is that the early jump is |
| +facilitated by a ``false`` value instead of ``true``. |
| + |
| +Global Variable Splitting |
| +^^^^^^^^^^^^^^^^^^^^^^^^^ |
| + |
| +Global variable splitting (-split-global-vars) is run after register allocation. |
|
Jim Stichnoth
2016/08/05 06:19:59
``-split-global-vars``
manasijm
2016/08/05 17:34:44
Done.
|
| +It works on the variables that did not manage to get registers (but are allowed |
| +to) and decomposes their live ranges into the individual segments (which span a |
| +single node at most). New variables are created (but not yet used) with these |
| +smaller live ranges and the register allocator is run again. This is not |
| +inefficient as the old variables that already had registers are now considered |
| +pre-colored. |
| + |
| +The new variables that get registers replace their parent variables for their |
| +portion of its (parent's) live range. A copy from the old variable to the new |
| +is introduced before the first use and the reverse after the last def in the |
| +live range. |
| + |
| + |
| Basic phi lowering |
| ^^^^^^^^^^^^^^^^^^ |