Compiler Optimization - Introduction

CSCD70 Week1

Posted by Dino on January 12, 2021

Purpose of compilers

  1. Translate one language into another
    • e.g., convert C++ into x86 object code
  2. Improve (i.e. “optimize”) the code
    • e.g., make the code run 3 times faster
      • or more energy efficient, more robust, etc.

How can the compiler improve performance

Execution time = Operation count * Machine cycles per operation

  • Minimize the number of operations
    • Arithmetic operations, memory accesses
  • Replace expensive operations with simpler ones
    • e.g., replace 4-cycle multiplication with 1-cycle shift
  • Minimize cache misses
    • both data and instruction accesses
  • Perform work in parallel
    • Instruction scheduling within a thread
    • Parallel execution across multiple threads

Optimization Formalization

  1. Formulate optimization problem
    • Identify opportunities of optimization (problem domain of the optimization)
      • Applicable across many programs
      • Affect key parts of the program (loops/recursions)
      • Amenable to “efficient enough” algorithm
  2. Representation
    • Must abstract essential details relevant to optimization
  3. Analysis
    • Detect when it is desirable and safe to apply transformation
  4. Code Transformation

Representation: Instructions

  • Three-address code
    • A := B op C
    • LHS: name of variable e.g., x, A[t] (address of A + contents of t)
    • RHS: value
    • There is at most one operator on the right side of an instruction
      • t2 = a + b + 3 will be replaced to
        • t1 = a + b
        • t2 = t1 + 3
    • Each 3AC contains at most 3 addresses
      • Address can be one of the following
        • Names: a, b
        • Constant: 3
        • Compiler-generated temporary: t1, t2
    • Examples

      code operator
      x = y bop z bop: binary arithmetic or logical operation
      x = uop z uop: unary operation (minus, negation, casting)
      x = y  
      goto L goto L: unconditional jump to a program location L
      if x goto L: if … goto L: conditional jump
      if x ropy goto L: rop: relational operator (>, <, ==, >=, <=, etc.)

Optimization Example

Bubblesort program that sorts an array A that is allocated in static storage:

  • an element of A uses four bytes of a byte-addressed machine
  • elements of A are number 1 through n (n is a variable)
  • A[j] is in location &A + 4 * (j - 1)
1
2
3
4
5
6
7
FOR i := n-1 DOWNTO 1 DO
    FOR j := 1 TO i DO
        IF A[j] > A[j + 1] THEN BEGIN
            temp := A[j];
            A[j] := A[j+1];
            A[j+1] := temp;
        END

Translated code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
    i := n - 1
S5: if i < 1 goto s1 ; boundary condition for outer loop
    j := 1
S4: if j > i goto s2 ; boundary condition for inner loop
    t1 := j - 1
    t2 := 4 * t1
    t3 := A[t2] ; Address for A[j]
    t4 := j + 1
    t5 := t4 - 1
    t6 := 4 * t5
    t7 := A[t6] ; Address for A[j + 1]
    if t3 <= t7 goto s3 ; Exit condition for comparision
    t8 := j - 1
    t9 := 4 * t8
    temp := A[t9] ; Address temp with A[j]
    t10 := j + 1
    t11 := t10 - 1
    t12 := 4 * t11
    t13 := A[t12] ; Address for A[j + 1]
    t14 := j - 1
    t15 := 4 * t14
    A[t15] := t13 ; Assign A[j] with A[j + 1]
    t16 := j + 1
    t17 := t16 - 1
    t18 := 4 * t17
    A[t18] := temp ; Assign A[j + 1] with temp / old value of A[j]
S3: j := j + 1
    goto S4 ; Start new iteration of the inner loop
S2: i := i - 1
    goto S5 ; Start new iteration of the outer loop
S1:

Representation: Basic Block

  • Basic Block: Maximal sequences of consecutive three-address instructions with the properties
    • Only the first statement can be reached from outside the block (no branches into middle of block)
    • It can be exited only at the end
  • Algorithm for building Basic Block

      INPUT: A sequence of three-address instructions of P
      OUTPUT: A list of basic blocks of P
      METHOD: (1) Determine the leads in  P
              - The first instruction in P is a leader
              - Any target instruction of a conditional / unconditional jump is a leader
              - Any instruction that immediatedly follows a jump is a leader
                
              (2) Build BBs for P
              - A BB consists of a leader and all its subsequent instructions 
              - until the next leader 
    
  • Example

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    
      x = input
      y = x - 1
      z = x * y
      if z < x goto (7)
      p = x / y
      q = p + y
      a = q
      b = x + a
      c = 2a - b
      if p == q goto (12)
      goto (3)
      return
    
    1. Determine leaders (line number):
      • The first line: 1
      • The target of jump: 3, 7, 12
      • Line follow after the jump: 5, 11, 12
      • 1, 3, 5, 7, 11, 12
    2. Build BBs for P:
      • B1: (1, 2)
      • B2: (3, 4)
      • B3: (5, 6)
      • B4: (7, 8, 9, 10)
      • B5: (11)
      • B6: (12)

Flow Graphs

  • Nodes: Basic Blocks

  • Edges: Block A to B if and only if

    • There is a conditional or unconditional jump from the end of A to the beginning of B
    • B immediately follows A in the original order of instuctions and A does not end in anunconditional jump

Sources of Optimizations

  • Algorithm optimization

  • algebraic optimization
    • A := B + 0 => A := B
  • Local optimizations
    • within a basic block – across instructions
  • Global optimizations
    • within a flow graph – across basic blocks
  • Interprocedural analysis
    • within a program – across procedures (flow graphs)

Local Optimizations

  • Analysis & transformation performed within a basic block

  • No control flow information is considered

  • Examples of local optimizations:
    • local common subexpression elimination
      • analysis: same expression evaluated more than once in b
      • transform: replace with single calculation
    • local constant folding or elimination
      • analysis: expression can be evaluated at compile time
      • transform: replace by constant, compile-time value
    • dead code elimination
  • Example
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    
      B1: i := n - 1
      B2: if i < 1 goto out ; boundary condition for outer loop
      B3: j := 1
      B4: if j > i goto B5 ; boundary condition for inner loop
      B6: t1 := j - 1
          t2 := 4 * t1
          t3 := A[t2] ; Address for A[j]
          -------------
          |t4 := j + 1 |
          |t5 := t4 - 1| ; t5 can be replaced as j
          |t6 := 4 * t5| ; t6 can be replaced as 4 * j
          -------------
          t7 := A[t6] ; Address for A[j + 1]
          if t3 <= t7 goto B8 ; Exit condition for comparision
      B7: t8 := j - 1
          t9 := 4 * t8
          temp := A[t9] ; Address temp with A[j]
          ---------------
          |t10 := j + 1  |
          |t11 := t10 - 1| ; t11 can be replaced as j
          |t12 := 4 * t11| ; t12 can be replace as 4 * j
          ---------------
          t13 := A[t12] ; Address for A[j + 1]
          --------------
          |t14 := j - 1  | ; t14 is literally same as t8
          |t15 := 4 * t14| ; t15 is literally same as t9
          |A[t15] := t13 |; Assign A[j] with A[j + 1]
          ---------------
          |t16 := j + 1  | ; t16 is literally same as t10
          |t17 := t16 - 1| ; t17 is literally same as t11
          |t18 := 4 * t17| ; t18 is literally same as t12
          |A[t18] := temp| ; Assign A[j + 1] with temp / old value of A[j]
          ---------------
      B8: j := j + 1
          goto B4 ; Start new iteration of the inner loop
      B5: i := i - 1
          goto B2 ; Start new iteration of the outer loop
      out:
    
  • Optimized Code
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    
      B1: i := n - 1
      B2: if i < 1 goto out ; boundary condition for outer loop
      B3:    j := 1
      B4: if j > i goto B5 ; boundary condition for inner loop
      B6: t1 := j - 1
          t2 := 4 * t1
          t3 := A[t2] ; Address for A[j]
          -------------
          |t6 := 4 * j|
          -------------
          t7 := A[t6] ; Address for A[j + 1]
          if t3 <= t7 goto B8 ; Exit condition for comparision
      B7: t8 := j - 1
          t9 := 4 * t8
          temp := A[t9] ; Address temp with A[j]
          ---------------
          |t12 := 4 * j |
          ---------------
          t13 := A[t12] ; Address for A[j + 1]
          --------------
          |A[t9] := t13 |; Assign A[j] with A[j + 1]
          ---------------
          |A[t12] := temp| ; Assign A[j + 1] with temp / old value of A[j]
          ---------------
      B8: j := j + 1
          goto B4 ; Start new iteration of the inner loop
      B5: i := i - 1
          goto B2 ; Start new iteration of the outer loop
      out:
    

(Intraprocedural) Global Optimizations

  • Global versions of local optimizations
    • global common subexpression elimination
    • global constant propagation
    • dead code elimination
  • Loop optimizations
    • reduce code to be executed in each iteration
    • code motion
    • induction variable elimination
  • Other control structures
    • Code hoisting: eliminates copies of identical code on parallel paths in a flow graph to reduce code size
  • Example
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    
      B1: i := n - 1
      B2: if i < 1 goto out ; boundary condition for outer loop
      B3:    j := 1
      B4: if j > i goto B5 ; boundary condition for inner loop
      B6: t1 := j - 1
          t2 := 4 * t1
          t3 := A[t2] ; Address for A[j]
          t6 := 4 * j
          t7 := A[t6] ; Address for A[j + 1]
          if t3 <= t7 goto B8 ; Exit condition for comparision
          ----------------
      B7: |t8 := j - 1   |  ; t8 is same as t1 in B2
          |t9 := 4 * t8  |  ; t9 is same as t2 in B2
          |temp := A[t9] |  ; temp is same as t2 in B2, Address temp with A[j]
          |t12 := 4 * j  |  ; t12 is same as t6 in B2
          |t13 := A[t12] |  ; t13 is same as t7 in B2, Address for A[j + 1]
          |A[t9] := t13  |  ; Assign A[j] with A[j + 1]
          |A[t12] := temp|  ; Assign A[j + 1] with temp / old value of A[j]
          ---------------
      B8: j := j + 1
          goto B4 ; Start new iteration of the inner loop
      B5: i := i - 1
          goto B2 ; Start new iteration of the outer loop
      out:
    
  • Optimized Code (After Global Common Subexpression Elimination)
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    
      B1: i := n - 1
      B2: if i < 1 goto out ; boundary condition for outer loop
      B3: j := 1
      B4: if j > i goto B5 ; boundary condition for inner loop
      B6: t1 := j - 1
          t2 := 4 * t1
          t3 := A[t2] ; Address for A[j]
          t6 := 4 * j
          t7 := A[t6] ; Address for A[j + 1]
          if t3 <= t7 goto B8 ; Exit condition for comparision
          --------------
      B7: |A[t2] := t7  | ; Assign A[j] with A[j + 1]
          |A[t6] := t2  | ; Assign A[j + 1] with temp / old value of A[j]
          ---------------
      B8: j := j + 1
          goto B4 ; Start new iteration of the inner loop
      B5: i := i - 1
          goto B2 ; Start new iteration of the outer loop
      out:
    

Induction Variable Elimination

  • Intuitively
    • Loop indices are induction variables
      • (Counting iterations)
    • Linear functions of the loop indices are also induction variables
      • (For accessing arrays)
  • Analysis: Detection of induction variable

  • Optimizations:
    • Strength reduction:
      • Replace multiplication by additions
    • Elimination of loop index:
      • Replace termination by test on other induction variables
  • Example:
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    
      B1: i := n - 1
      B2: if i < 1 goto out ; boundary condition for outer loop
          -----------------   
      B3: |j := 1          | ; Loop variable for accessing array
      B4: |if j > i goto B5| ; boundary condition for inner loop
      B6: |t1 := j - 1     |
          |t2 := 4 * t1    |
          -----------------
          t3 := A[t2] ; Address for A[j]
          ------------
          |t6 := 4 * j| ; Strength reduction
          ------------
          t7 := A[t6] ; Address for A[j + 1]
          if t3 <= t7 goto B8 ; Exit condition for comparision
      B7: A[t2] := t7   ; Assign A[j] with A[j + 1]
          A[t6] := t2   ; Assign A[j + 1] with temp / old value of A[j]
          -----------
      B8: |j := j + 1|
          -----------
          goto B4 ; Start new iteration of the inner loop
      B5: i := i - 1
          goto B2 ; Start new iteration of the outer loop
      out:
    
  • Code after optimization
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    
      B1: i := n - 1
      B2: if i < 1 goto out ; boundary condition for outer loop
          --------------------   
      B3: |t2 := 0            | ; replace t2 with 0 and use addition for each iteration
          |t6 := 4            | ; replace t6 with 4 and use addition for each iteration
      B4: |t19 := 4 * i       | ; Since we are placing an addition of 4 for j each time, to compare with i we need to multiply i with 4
          |if t6 > t19 goto B5| ; boundary condition for inner loop
          --------------------
      B6: t3 := A[t2] ; Address for A[j]
          t7 := A[t6] ; Address for A[j + 1]
          if t3 <= t7 goto B8 ; Exit condition for comparision
      B7: A[t2] := t7   ; Assign A[j] with A[j + 1]
          A[t6] := t2   ; Assign A[j + 1] with temp / old value of A[j]
          -------------
      B8: |t2 := t2 + 4|
          |t6 := t6 + 4|
          --------------
          goto B4 ; Start new iteration of the inner loop
      B5: i := i - 1
          goto B2 ; Start new iteration of the outer loop
      out:
    

Loop Invariant Code Motion

  • Analysis:
    • A computation is done within a loop and
    • result of the computation is the same as long as we keep going around the loop
  • Transformation:
    • move the computation outside the loop

Machine Dependent Optimizations

  • Register allocation
  • Instruction scheduling
  • Memory hierarchy optimizations
  • etc.

Local Optimizations (More Details)

  • Common subexpression elimination
    • array expressions
    • field access in records
    • access to parameters

Grpah Abstractions

Example 1

  • grammar (for bottom-up parsing):
    • E -> E + T | E - T | T, T -> T * F | F, F -> (E) | id
  • expression: a + a * (b - c) + (b - c) * d

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    
      # Parse Tree                                          # DAG
              +                                              +
           /       \                                       /   \ 
          +         *                                     +      *
        /   \      /   \                                 /\      / \
      a      *    -     d                               a--*    /   d
            / \  / \                                        \  /
           a  -  b c                                         -
             / \                                            / \ 
             b  c                                          b   c
    
  • Optimized code:
    1
    2
    3
    4
    5
    
      t1 = b - c
      t2 = a * t1
      t3 = a + t2
      t4 = t1 * d
      t5 = t3 + t4
    

Example 2 (DAG is not correct all the time)

  • Code:
    1
    2
    3
    4
    
      a = b + c
      b = a - d
      c = b + c
      d = a - d
    
  • The resulting DAG

  • Optimized Code
    1
    2
    3
    
          a = b + c;
          d = a - d;
          c = d + c;
    

    In fact, this code is incorrect

  • Critiques of DAGs
    • Cause of problems
      • Assignment statements
      • Value of variable depends on TIME
    • How to fix problem?
      • build graph in order of execution
      • attach variable name to lastest value
    • Final graph created is not very interesting
      • Key: variable -> value mapping across time
      • loses appeal of abstraction

Value Numbering(VN)

  • More explicit with respect to VALUES, and TIME

  • Each value has its own “number”
    • Common subexpression means same value number
  • var2value: current map of variable to value
    • Used to determine the value number of current expression
    • r1 + r2 $\rightarrow$ var2value(r1) + var2value(r2)

  • Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Data Structure: VALUES = 
    Table of expression //[OP, valnum1, valnum2]
             var // name of variable currently holding expression
                                  
For each instruction (dst = src1 OP src2) in execution order

    valnum1 = var2value(src1);
    varnum2 = var2value(src2);
    
    IF[OP, valnum1, valnum2] is in VALUES
        v = the index of expression
        Replace instruction with CPY dst = VALUES[v].var
    ELSE
        Add
            expression = [OP, valnum1, valnum2]
            var = dst
        to VALUES
        v = index of new entry; tv is new temporary for v
        Replace instruction with: 
            tv = VALUES[valnum1].var OP VALUES[valnum2].var
        CPY dst = tv
    set_var2value(dst, v)
  • More Details
  • What are the initial values of the variables?
    • Values at beginning of the basic block
  • Possible implementations:
    • Initialization: Create “initial values” for all variables
    • Or dynamically create them as they are used
  • Implementation of VALUES and var2value: hash tables

Example Assign: a $\rightarrow r1$ b $\rightarrow r2$ c $\rightarrow r3$ d $\rightarrow r4$

   
a = b + c ADD t1 = r2, r3
CPY r1 = t1
b = a - d SUB t2 = r1, r4
CPY r2 = t2
c = b + c ADD t3 = r2, r3
CPY r3 = t3
Since b is reassigned in the previous it is not the same as the first one thus it requires a new insert
d = a - d SUB t4 = r1, r4
CPY r4 = t2