docs/DESIGN.rst - Issue 2217773003: Documentation for LCSE, LICM, Short-Circuit, Global-Splitting

Unified Diff: docs/DESIGN.rst

Issue 2217773003: Documentation for LCSE, LICM, Short-Circuit, Global-Splitting (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master

Patch Set: Created 4 years, 4 months ago

Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.

Jump to:

View side-by-side diff with in-line comments

Download patch

Index: docs/DESIGN.rst

diff --git a/docs/DESIGN.rst b/docs/DESIGN.rst

index 8264a7aeac74bb408850e665ef0032c5476dca40..73dcf23d0710a7f0fef8696d6ceea5abfd3dc99f 100644

--- a/docs/DESIGN.rst

+++ b/docs/DESIGN.rst

@@ -56,6 +56,8 @@ following (described in more detail below):

+-----------------------------------+-----------------------------+

| Loop nest analysis | |

+-----------------------------------+-----------------------------+

+| Local common sub-exp elimination | |

Jim Stichnoth 2016/08/05 06:20:00 I think you can spell out "subexpression" and (pai

manasijm 2016/08/05 06:57:09 Won't be that painful! I might not be an emacs nin

manasijm 2016/08/05 17:34:44 Done.

++-----------------------------------+-----------------------------+

| Address mode inference | |

+-----------------------------------+-----------------------------+

| Read-modify-write (RMW) transform | |

@@ -518,6 +520,8 @@ Currently, the ``O2`` lowering recipe is the following:

- Loop nest analysis

+- Local common sub-expression elimination

Jim Stichnoth 2016/08/05 06:20:00 "Subexpression" is usually written without the hyp

manasijm 2016/08/05 17:34:44 Done.

- Address mode inference

- Read-modify-write (RMW) transformation

@@ -698,6 +702,102 @@ that the Greedy allocator also improved maintainability because a lot of hacks

could be removed, but Subzero is probably not yet to that level of hacks, and is

less likely to see that particular benefit.)

+Local common sub-expression elimination

+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+The Local CSE implementation goes through each instruction and record ``Seen``

Jim Stichnoth 2016/08/05 06:20:00 records

manasijm 2016/08/05 17:34:44 Done.

+instructions in a ``CfgUnorderedSet``. The equality comparator for this

Jim Stichnoth 2016/08/05 06:20:00 I don't think the level of detail in this document

manasijm 2016/08/05 17:34:44 Done.

+container ignores the destination of the instructions. That means ``A = X + Y``

+and ``B = X + Y`` will be considered to be 'equal' for this purpose. This allows

+us to detect common sub-expressions.

+Whenever a repetition is detected, the redundant variables are stored in a

+container mapping the replacee to the replacement. In the case above, it would

+be ``MAP[B] = A`` assuming ``B = X + Y`` comes after ``A = X + Y``.

+At any point if a variable that has an entry in the replacement table is

+encountered, it is replaced with the variable it is mapped to. This ensures that

+the redundant variables will not have any uses in the basic block, allowing

+dead code elimination to clean up the redundant instruction.

+With SSA, the information stored is never invalidated. However, non-SSA input is

+supported with the -lcse=no-ssa option. This has to maintain some extra

Jim Stichnoth 2016/08/05 06:19:59 ``-lcse=no-ssa``

manasijm 2016/08/05 17:34:43 Done.

+dependency information to ensure proper invalidation on variable assignment.

+This is not rigorously tested because this pass is run at an early stage where

+it is safe to assume variables have a single definition. This is not enabled by

+default because it bumps the compile time overhead from 2% to 6%.

+Loop invariant code motion

Jim Stichnoth 2016/08/05 06:20:00 I think it's more common to hyphenate "loop-invari

manasijm 2016/08/05 17:34:44 Done.

+^^^^^^^^^^^^^^^^^^^^^^^^^^

+This pass utilizes the loop analysis information to hoist invariant instructions

+to loop pre-headers. A loop must have a single entry node (header) and that node

+must have a single external predecessor for this optimization to work, as it is

+currently implemented.

+The pass works by iterating over all instructions in the loop until the set of

+invariant instructions converge. In each iteration, a non-invariant instruction

Jim Stichnoth 2016/08/05 06:20:00 converges

manasijm 2016/08/05 17:34:44 Done.

+involving only constants or a variable known to be invariant is added to the

+result set. The destination variable of that instruction is added to the set of

+variables known to be invariant (which is initialized with the constant args).

+Improving the loop-analysis infrastructure is likely to have significant effect

Jim Stichnoth 2016/08/05 06:20:00 maybe s/effect/impact/ ?

manasijm 2016/08/05 17:34:43 Done.

+on this optimization. Inserting an extra node to act as the pre-header when the

+header has multiple incoming edges from outside could also be good idea.

Jim Stichnoth 2016/08/05 06:20:00 be a good

manasijm 2016/08/05 17:34:44 Done.

+Expanding the initial invariant variable set to contain all variables that do

+not have definitions inside the loop does not seem to improve anything.

+Short Circuit Evaluation

Jim Stichnoth 2016/08/05 06:20:00 For consistency, only capitalize the first word of

manasijm 2016/08/05 17:34:43 Done.

+^^^^^^^^^^^^^^^^^^^^^^^^

+Short Circuit Evaluation splits nodes and introduces early jumps when the result

Jim Stichnoth 2016/08/05 06:19:59 I would also not capitalize all these words. To s

manasijm 2016/08/05 17:34:44 Done.

+of a logical operation can be determined early and there are no side effects of

Jim Stichnoth 2016/08/05 06:20:00 maybe "no observable side effects" ?

manasijm 2016/08/05 17:34:44 Done.

+skipping the rest of the computation. The instructions considered backwards from

+the end of the basic blocks.

+When a definition of a variable involved in a conditional jump is found, an

Jim Stichnoth 2016/08/05 06:20:00 reflow paragraph, or make it a new paragraph

manasijm 2016/08/05 17:34:44 Done.

+extra jump can be inserted in that location, moving the rest of the instructions

+in the node to a newly inserted node. Consider this example::

+ __N :

+ a = <something>

+ Instruction 1 without side effect

+ ... b = <something> ...

+ Instruction N without side effect

+ t1 = or a b

+ br t1 __X __Y

+is transformed to ::

+ __N :

+ a = <something>

+ br a __X __N_ext

+ __N_ext :

+ Instruction 1 without side effect

+ ... b = <something> ...

+ Instruction N without side effect

+ br b __X __Y

+The logic for AND is analogous, the only difference is that the early jump is

+facilitated by a ``false`` value instead of ``true``.

+Global Variable Splitting

+^^^^^^^^^^^^^^^^^^^^^^^^^

+Global variable splitting (-split-global-vars) is run after register allocation.

Jim Stichnoth 2016/08/05 06:19:59 ``-split-global-vars``

manasijm 2016/08/05 17:34:44 Done.

+It works on the variables that did not manage to get registers (but are allowed

+to) and decomposes their live ranges into the individual segments (which span a

+single node at most). New variables are created (but not yet used) with these

+smaller live ranges and the register allocator is run again. This is not

+inefficient as the old variables that already had registers are now considered

+pre-colored.

+The new variables that get registers replace their parent variables for their

+portion of its (parent's) live range. A copy from the old variable to the new

+is introduced before the first use and the reverse after the last def in the

+live range.

Basic phi lowering

^^^^^^^^^^^^^^^^^^

« no previous file with comments | « no previous file | no next file » | no next file with comments »