Chromium Code Reviews
chromiumcodereview-hr@appspot.gserviceaccount.com (chromiumcodereview-hr) | Please choose your nickname with Settings | Help | Chromium Project | Gerrit Changes | Sign out
(73)

Side by Side Diff: docs/DESIGN.rst

Issue 2217773003: Documentation for LCSE, LICM, Short-Circuit, Global-Splitting (Closed) Base URL: https://chromium.googlesource.com/native_client/pnacl-subzero.git@master
Patch Set: Created 4 years, 4 months ago
Use n/p to move between diff chunks; N/P to move between comments. Draft comments are only viewable by you.
Jump to:
View unified diff | Download patch
« no previous file with comments | « no previous file | no next file » | no next file with comments »
Toggle Intra-line Diffs ('i') | Expand Comments ('e') | Collapse Comments ('c') | Show Comments Hide Comments ('s')
OLDNEW
1 Design of the Subzero fast code generator 1 Design of the Subzero fast code generator
2 ========================================= 2 =========================================
3 3
4 Introduction 4 Introduction
5 ------------ 5 ------------
6 6
7 The `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes 7 The `Portable Native Client (PNaCl) <http://gonacl.com>`_ project includes
8 compiler technology based on `LLVM <http://llvm.org/>`_. The developer uses the 8 compiler technology based on `LLVM <http://llvm.org/>`_. The developer uses the
9 PNaCl toolchain to compile their application to architecture-neutral PNaCl 9 PNaCl toolchain to compile their application to architecture-neutral PNaCl
10 bitcode (a ``.pexe`` file), using as much architecture-neutral optimization as 10 bitcode (a ``.pexe`` file), using as much architecture-neutral optimization as
(...skipping 38 matching lines...) Expand 10 before | Expand all | Expand 10 after
49 optimizations (``O2``) and minimal optimizations (``Om1``). The recipes are the 49 optimizations (``O2``) and minimal optimizations (``Om1``). The recipes are the
50 following (described in more detail below): 50 following (described in more detail below):
51 51
52 +-----------------------------------+-----------------------------+ 52 +-----------------------------------+-----------------------------+
53 | O2 recipe | Om1 recipe | 53 | O2 recipe | Om1 recipe |
54 +===================================+=============================+ 54 +===================================+=============================+
55 | Parse .pexe file | Parse .pexe file | 55 | Parse .pexe file | Parse .pexe file |
56 +-----------------------------------+-----------------------------+ 56 +-----------------------------------+-----------------------------+
57 | Loop nest analysis | | 57 | Loop nest analysis | |
58 +-----------------------------------+-----------------------------+ 58 +-----------------------------------+-----------------------------+
59 | Local common sub-exp elimination | |
Jim Stichnoth 2016/08/05 06:20:00 I think you can spell out "subexpression" and (pai
manasijm 2016/08/05 06:57:09 Won't be that painful! I might not be an emacs nin
manasijm 2016/08/05 17:34:44 Done.
60 +-----------------------------------+-----------------------------+
59 | Address mode inference | | 61 | Address mode inference | |
60 +-----------------------------------+-----------------------------+ 62 +-----------------------------------+-----------------------------+
61 | Read-modify-write (RMW) transform | | 63 | Read-modify-write (RMW) transform | |
62 +-----------------------------------+-----------------------------+ 64 +-----------------------------------+-----------------------------+
63 | Basic liveness analysis | | 65 | Basic liveness analysis | |
64 +-----------------------------------+-----------------------------+ 66 +-----------------------------------+-----------------------------+
65 | Load optimization | | 67 | Load optimization | |
66 +-----------------------------------+-----------------------------+ 68 +-----------------------------------+-----------------------------+
67 | | Phi lowering (simple) | 69 | | Phi lowering (simple) |
68 +-----------------------------------+-----------------------------+ 70 +-----------------------------------+-----------------------------+
(...skipping 442 matching lines...) Expand 10 before | Expand all | Expand 10 after
511 register for ``T.inf``. This leads to ``T.inf=B`` being a redundant register 513 register for ``T.inf``. This leads to ``T.inf=B`` being a redundant register
512 copy, which is removed as an emission-time peephole optimization. 514 copy, which is removed as an emission-time peephole optimization.
513 515
514 O2 lowering 516 O2 lowering
515 ----------- 517 -----------
516 518
517 Currently, the ``O2`` lowering recipe is the following: 519 Currently, the ``O2`` lowering recipe is the following:
518 520
519 - Loop nest analysis 521 - Loop nest analysis
520 522
523 - Local common sub-expression elimination
Jim Stichnoth 2016/08/05 06:20:00 "Subexpression" is usually written without the hyp
manasijm 2016/08/05 17:34:44 Done.
524
521 - Address mode inference 525 - Address mode inference
522 526
523 - Read-modify-write (RMW) transformation 527 - Read-modify-write (RMW) transformation
524 528
525 - Basic liveness analysis 529 - Basic liveness analysis
526 530
527 - Load optimization 531 - Load optimization
528 532
529 - Target lowering 533 - Target lowering
530 534
(...skipping 160 matching lines...) Expand 10 before | Expand all | Expand 10 after
691 this second-chance mechanism is disabled. 695 this second-chance mechanism is disabled.
692 696
693 Future work is to implement LLVM's `Greedy 697 Future work is to implement LLVM's `Greedy
694 <http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html>`_ 698 <http://blog.llvm.org/2011/09/greedy-register-allocation-in-llvm-30.html>`_
695 register allocator as a replacement for the basic linear-scan algorithm, given 699 register allocator as a replacement for the basic linear-scan algorithm, given
696 LLVM's experience with its improvement in code quality. (The blog post claims 700 LLVM's experience with its improvement in code quality. (The blog post claims
697 that the Greedy allocator also improved maintainability because a lot of hacks 701 that the Greedy allocator also improved maintainability because a lot of hacks
698 could be removed, but Subzero is probably not yet to that level of hacks, and is 702 could be removed, but Subzero is probably not yet to that level of hacks, and is
699 less likely to see that particular benefit.) 703 less likely to see that particular benefit.)
700 704
705 Local common sub-expression elimination
706 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
707
708 The Local CSE implementation goes through each instruction and record ``Seen``
Jim Stichnoth 2016/08/05 06:20:00 records
manasijm 2016/08/05 17:34:44 Done.
709 instructions in a ``CfgUnorderedSet``. The equality comparator for this
Jim Stichnoth 2016/08/05 06:20:00 I don't think the level of detail in this document
manasijm 2016/08/05 17:34:44 Done.
710 container ignores the destination of the instructions. That means ``A = X + Y``
711 and ``B = X + Y`` will be considered to be 'equal' for this purpose. This allows
712 us to detect common sub-expressions.
713
714 Whenever a repetition is detected, the redundant variables are stored in a
715 container mapping the replacee to the replacement. In the case above, it would
716 be ``MAP[B] = A`` assuming ``B = X + Y`` comes after ``A = X + Y``.
717
718 At any point if a variable that has an entry in the replacement table is
719 encountered, it is replaced with the variable it is mapped to. This ensures that
720 the redundant variables will not have any uses in the basic block, allowing
721 dead code elimination to clean up the redundant instruction.
722
723 With SSA, the information stored is never invalidated. However, non-SSA input is
724 supported with the -lcse=no-ssa option. This has to maintain some extra
Jim Stichnoth 2016/08/05 06:19:59 ``-lcse=no-ssa``
manasijm 2016/08/05 17:34:43 Done.
725 dependency information to ensure proper invalidation on variable assignment.
726 This is not rigorously tested because this pass is run at an early stage where
727 it is safe to assume variables have a single definition. This is not enabled by
728 default because it bumps the compile time overhead from 2% to 6%.
729
730 Loop invariant code motion
Jim Stichnoth 2016/08/05 06:20:00 I think it's more common to hyphenate "loop-invari
manasijm 2016/08/05 17:34:44 Done.
731 ^^^^^^^^^^^^^^^^^^^^^^^^^^
732
733 This pass utilizes the loop analysis information to hoist invariant instructions
734 to loop pre-headers. A loop must have a single entry node (header) and that node
735 must have a single external predecessor for this optimization to work, as it is
736 currently implemented.
737
738 The pass works by iterating over all instructions in the loop until the set of
739 invariant instructions converge. In each iteration, a non-invariant instruction
Jim Stichnoth 2016/08/05 06:20:00 converges
manasijm 2016/08/05 17:34:44 Done.
740 involving only constants or a variable known to be invariant is added to the
741 result set. The destination variable of that instruction is added to the set of
742 variables known to be invariant (which is initialized with the constant args).
743
744 Improving the loop-analysis infrastructure is likely to have significant effect
Jim Stichnoth 2016/08/05 06:20:00 maybe s/effect/impact/ ?
manasijm 2016/08/05 17:34:43 Done.
745 on this optimization. Inserting an extra node to act as the pre-header when the
746 header has multiple incoming edges from outside could also be good idea.
Jim Stichnoth 2016/08/05 06:20:00 be a good
manasijm 2016/08/05 17:34:44 Done.
747 Expanding the initial invariant variable set to contain all variables that do
748 not have definitions inside the loop does not seem to improve anything.
749
750 Short Circuit Evaluation
Jim Stichnoth 2016/08/05 06:20:00 For consistency, only capitalize the first word of
manasijm 2016/08/05 17:34:43 Done.
751 ^^^^^^^^^^^^^^^^^^^^^^^^
752
753 Short Circuit Evaluation splits nodes and introduces early jumps when the result
Jim Stichnoth 2016/08/05 06:19:59 I would also not capitalize all these words. To s
manasijm 2016/08/05 17:34:44 Done.
754 of a logical operation can be determined early and there are no side effects of
Jim Stichnoth 2016/08/05 06:20:00 maybe "no observable side effects" ?
manasijm 2016/08/05 17:34:44 Done.
755 skipping the rest of the computation. The instructions considered backwards from
756 the end of the basic blocks.
757 When a definition of a variable involved in a conditional jump is found, an
Jim Stichnoth 2016/08/05 06:20:00 reflow paragraph, or make it a new paragraph
manasijm 2016/08/05 17:34:44 Done.
758 extra jump can be inserted in that location, moving the rest of the instructions
759 in the node to a newly inserted node. Consider this example::
760
761 __N :
762 a = <something>
763 Instruction 1 without side effect
764 ... b = <something> ...
765 Instruction N without side effect
766 t1 = or a b
767 br t1 __X __Y
768
769 is transformed to ::
770
771 __N :
772 a = <something>
773 br a __X __N_ext
774
775 __N_ext :
776 Instruction 1 without side effect
777 ... b = <something> ...
778 Instruction N without side effect
779 br b __X __Y
780
781 The logic for AND is analogous, the only difference is that the early jump is
782 facilitated by a ``false`` value instead of ``true``.
783
784 Global Variable Splitting
785 ^^^^^^^^^^^^^^^^^^^^^^^^^
786
787 Global variable splitting (-split-global-vars) is run after register allocation.
Jim Stichnoth 2016/08/05 06:19:59 ``-split-global-vars``
manasijm 2016/08/05 17:34:44 Done.
788 It works on the variables that did not manage to get registers (but are allowed
789 to) and decomposes their live ranges into the individual segments (which span a
790 single node at most). New variables are created (but not yet used) with these
791 smaller live ranges and the register allocator is run again. This is not
792 inefficient as the old variables that already had registers are now considered
793 pre-colored.
794
795 The new variables that get registers replace their parent variables for their
796 portion of its (parent's) live range. A copy from the old variable to the new
797 is introduced before the first use and the reverse after the last def in the
798 live range.
799
800
701 Basic phi lowering 801 Basic phi lowering
702 ^^^^^^^^^^^^^^^^^^ 802 ^^^^^^^^^^^^^^^^^^
703 803
704 The simplest phi lowering strategy works as follows (this is how LLVM ``-O0`` 804 The simplest phi lowering strategy works as follows (this is how LLVM ``-O0``
705 implements it). Consider this example:: 805 implements it). Consider this example::
706 806
707 L1: 807 L1:
708 ... 808 ...
709 br L3 809 br L3
710 L2: 810 L2:
(...skipping 774 matching lines...) Expand 10 before | Expand all | Expand 10 after
1485 This could be mitigated by switching these to the CFG-local allocator. 1585 This could be mitigated by switching these to the CFG-local allocator.
1486 1586
1487 Third, multithreading may make the default allocator strategy more expensive. 1587 Third, multithreading may make the default allocator strategy more expensive.
1488 In a single-threaded environment, a pass will allocate its containers, run the 1588 In a single-threaded environment, a pass will allocate its containers, run the
1489 pass, and deallocate the containers. This results in stack-like allocation 1589 pass, and deallocate the containers. This results in stack-like allocation
1490 behavior and makes the heap free list easier to manage, with less heap 1590 behavior and makes the heap free list easier to manage, with less heap
1491 fragmentation. But when multithreading is added, the allocations and 1591 fragmentation. But when multithreading is added, the allocations and
1492 deallocations become much less stack-like, making allocation and deallocation 1592 deallocations become much less stack-like, making allocation and deallocation
1493 operations individually more expensive. Again, this could be mitigated by 1593 operations individually more expensive. Again, this could be mitigated by
1494 switching these to the CFG-local allocator. 1594 switching these to the CFG-local allocator.
OLDNEW
« no previous file with comments | « no previous file | no next file » | no next file with comments »

Powered by Google App Engine
This is Rietveld 408576698