Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(841)

Unified Diff: docs/DESIGN.rst

Issue 2217773003: Documentation for LCSE, LICM, Short-Circuit, Global-Splitting (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master
Patch Set: Created 4 years, 4 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View side-by-side diff with in-line comments
Download patch
« no previous file with comments | « no previous file | no next file » | no next file with comments »
Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
Index: docs/DESIGN.rst
diff --git a/docs/DESIGN.rst b/docs/DESIGN.rst
index 8264a7aeac74bb408850e665ef0032c5476dca40..73dcf23d0710a7f0fef8696d6ceea5abfd3dc99f 100644
--- a/docs/DESIGN.rst
+++ b/docs/DESIGN.rst
@@ -56,6 +56,8 @@ following (described in more detail below):
+-----------------------------------+-----------------------------+
| Loop nest analysis | |
+-----------------------------------+-----------------------------+
+| Local common sub-exp elimination | |
Jim Stichnoth 2016/08/05 06:20:00 I think you can spell out "subexpression" and (pai
manasijm 2016/08/05 06:57:09 Won't be that painful! I might not be an emacs nin
manasijm 2016/08/05 17:34:44 Done.
++-----------------------------------+-----------------------------+
| Address mode inference | |
+-----------------------------------+-----------------------------+
| Read-modify-write (RMW) transform | |
@@ -518,6 +520,8 @@ Currently, the ``O2`` lowering recipe is the following:
- Loop nest analysis
+- Local common sub-expression elimination
Jim Stichnoth 2016/08/05 06:20:00 "Subexpression" is usually written without the hyp
manasijm 2016/08/05 17:34:44 Done.
+
- Address mode inference
- Read-modify-write (RMW) transformation
@@ -698,6 +702,102 @@ that the Greedy allocator also improved maintainability because a lot of hacks
could be removed, but Subzero is probably not yet to that level of hacks, and is
less likely to see that particular benefit.)
+Local common sub-expression elimination
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Local CSE implementation goes through each instruction and record ``Seen``
Jim Stichnoth 2016/08/05 06:20:00 records
manasijm 2016/08/05 17:34:44 Done.
+instructions in a ``CfgUnorderedSet``. The equality comparator for this
Jim Stichnoth 2016/08/05 06:20:00 I don't think the level of detail in this document
manasijm 2016/08/05 17:34:44 Done.
+container ignores the destination of the instructions. That means ``A = X + Y``
+and ``B = X + Y`` will be considered to be 'equal' for this purpose. This allows
+us to detect common sub-expressions.
+
+Whenever a repetition is detected, the redundant variables are stored in a
+container mapping the replacee to the replacement. In the case above, it would
+be ``MAP[B] = A`` assuming ``B = X + Y`` comes after ``A = X + Y``.
+
+At any point if a variable that has an entry in the replacement table is
+encountered, it is replaced with the variable it is mapped to. This ensures that
+the redundant variables will not have any uses in the basic block, allowing
+dead code elimination to clean up the redundant instruction.
+
+With SSA, the information stored is never invalidated. However, non-SSA input is
+supported with the -lcse=no-ssa option. This has to maintain some extra
Jim Stichnoth 2016/08/05 06:19:59 ``-lcse=no-ssa``
manasijm 2016/08/05 17:34:43 Done.
+dependency information to ensure proper invalidation on variable assignment.
+This is not rigorously tested because this pass is run at an early stage where
+it is safe to assume variables have a single definition. This is not enabled by
+default because it bumps the compile time overhead from 2% to 6%.
+
+Loop invariant code motion
Jim Stichnoth 2016/08/05 06:20:00 I think it's more common to hyphenate "loop-invari
manasijm 2016/08/05 17:34:44 Done.
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This pass utilizes the loop analysis information to hoist invariant instructions
+to loop pre-headers. A loop must have a single entry node (header) and that node
+must have a single external predecessor for this optimization to work, as it is
+currently implemented.
+
+The pass works by iterating over all instructions in the loop until the set of
+invariant instructions converge. In each iteration, a non-invariant instruction
Jim Stichnoth 2016/08/05 06:20:00 converges
manasijm 2016/08/05 17:34:44 Done.
+involving only constants or a variable known to be invariant is added to the
+result set. The destination variable of that instruction is added to the set of
+variables known to be invariant (which is initialized with the constant args).
+
+Improving the loop-analysis infrastructure is likely to have significant effect
Jim Stichnoth 2016/08/05 06:20:00 maybe s/effect/impact/ ?
manasijm 2016/08/05 17:34:43 Done.
+on this optimization. Inserting an extra node to act as the pre-header when the
+header has multiple incoming edges from outside could also be good idea.
Jim Stichnoth 2016/08/05 06:20:00 be a good
manasijm 2016/08/05 17:34:44 Done.
+Expanding the initial invariant variable set to contain all variables that do
+not have definitions inside the loop does not seem to improve anything.
+
+Short Circuit Evaluation
Jim Stichnoth 2016/08/05 06:20:00 For consistency, only capitalize the first word of
manasijm 2016/08/05 17:34:43 Done.
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Short Circuit Evaluation splits nodes and introduces early jumps when the result
Jim Stichnoth 2016/08/05 06:19:59 I would also not capitalize all these words. To s
manasijm 2016/08/05 17:34:44 Done.
+of a logical operation can be determined early and there are no side effects of
Jim Stichnoth 2016/08/05 06:20:00 maybe "no observable side effects" ?
manasijm 2016/08/05 17:34:44 Done.
+skipping the rest of the computation. The instructions considered backwards from
+the end of the basic blocks.
+When a definition of a variable involved in a conditional jump is found, an
Jim Stichnoth 2016/08/05 06:20:00 reflow paragraph, or make it a new paragraph
manasijm 2016/08/05 17:34:44 Done.
+extra jump can be inserted in that location, moving the rest of the instructions
+in the node to a newly inserted node. Consider this example::
+
+ __N :
+ a = <something>
+ Instruction 1 without side effect
+ ... b = <something> ...
+ Instruction N without side effect
+ t1 = or a b
+ br t1 __X __Y
+
+is transformed to ::
+
+ __N :
+ a = <something>
+ br a __X __N_ext
+
+ __N_ext :
+ Instruction 1 without side effect
+ ... b = <something> ...
+ Instruction N without side effect
+ br b __X __Y
+
+The logic for AND is analogous, the only difference is that the early jump is
+facilitated by a ``false`` value instead of ``true``.
+
+Global Variable Splitting
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Global variable splitting (-split-global-vars) is run after register allocation.
Jim Stichnoth 2016/08/05 06:19:59 ``-split-global-vars``
manasijm 2016/08/05 17:34:44 Done.
+It works on the variables that did not manage to get registers (but are allowed
+to) and decomposes their live ranges into the individual segments (which span a
+single node at most). New variables are created (but not yet used) with these
+smaller live ranges and the register allocator is run again. This is not
+inefficient as the old variables that already had registers are now considered
+pre-colored.
+
+The new variables that get registers replace their parent variables for their
+portion of its (parent's) live range. A copy from the old variable to the new
+is introduced before the first use and the reverse after the last def in the
+live range.
+
+
Basic phi lowering
^^^^^^^^^^^^^^^^^^
« no previous file with comments | « no previous file | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698