Purpose of compilers
- Translate one language into another
- e.g., convert C++ into x86 object code
- Improve (i.e. “optimize”) the code
- e.g., make the code run 3 times faster
- or more energy efficient, more robust, etc.
- e.g., make the code run 3 times faster
How can the compiler improve performance
Execution time = Operation count * Machine cycles per operation
- Minimize the number of operations
- Arithmetic operations, memory accesses
- Replace expensive operations with simpler ones
- e.g., replace 4-cycle multiplication with 1-cycle shift
- Minimize cache misses
- both data and instruction accesses
- Perform work in parallel
- Instruction scheduling within a thread
- Parallel execution across multiple threads
Optimization Formalization
- Formulate optimization problem
- Identify opportunities of optimization (problem domain of the optimization)
- Applicable across many programs
- Affect key parts of the program (loops/recursions)
- Amenable to “efficient enough” algorithm
- Identify opportunities of optimization (problem domain of the optimization)
- Representation
- Must abstract essential details relevant to optimization
- Analysis
- Detect when it is desirable and safe to apply transformation
- Code Transformation
Representation: Instructions
- Three-address code
- A := B op C
- LHS: name of variable e.g., x, A[t] (address of A + contents of t)
- RHS: value
- There is at most one operator on the right side of an instruction
- t2 = a + b + 3 will be replaced to
- t1 = a + b
- t2 = t1 + 3
- t2 = a + b + 3 will be replaced to
- Each 3AC contains at most 3 addresses
- Address can be one of the following
- Names: a, b
- Constant: 3
- Compiler-generated temporary: t1, t2
- Address can be one of the following
-
Examples
code operator x = y bop z bop: binary arithmetic or logical operation x = uop z uop: unary operation (minus, negation, casting) x = y goto L goto L: unconditional jump to a program location L if x goto L: if … goto L: conditional jump if x ropy goto L: rop: relational operator (>, <, ==, >=, <=, etc.)
Optimization Example
Bubblesort program that sorts an array A that is allocated in static storage:
- an element of A uses four bytes of a byte-addressed machine
- elements of A are number 1 through n (n is a variable)
- A[j] is in location &A + 4 * (j - 1)
1
2
3
4
5
6
7
FOR i := n-1 DOWNTO 1 DO
FOR j := 1 TO i DO
IF A[j] > A[j + 1] THEN BEGIN
temp := A[j];
A[j] := A[j+1];
A[j+1] := temp;
END
Translated code
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
i := n - 1
S5: if i < 1 goto s1 ; boundary condition for outer loop
j := 1
S4: if j > i goto s2 ; boundary condition for inner loop
t1 := j - 1
t2 := 4 * t1
t3 := A[t2] ; Address for A[j]
t4 := j + 1
t5 := t4 - 1
t6 := 4 * t5
t7 := A[t6] ; Address for A[j + 1]
if t3 <= t7 goto s3 ; Exit condition for comparision
t8 := j - 1
t9 := 4 * t8
temp := A[t9] ; Address temp with A[j]
t10 := j + 1
t11 := t10 - 1
t12 := 4 * t11
t13 := A[t12] ; Address for A[j + 1]
t14 := j - 1
t15 := 4 * t14
A[t15] := t13 ; Assign A[j] with A[j + 1]
t16 := j + 1
t17 := t16 - 1
t18 := 4 * t17
A[t18] := temp ; Assign A[j + 1] with temp / old value of A[j]
S3: j := j + 1
goto S4 ; Start new iteration of the inner loop
S2: i := i - 1
goto S5 ; Start new iteration of the outer loop
S1:
Representation: Basic Block
- Basic Block: Maximal sequences of consecutive three-address instructions with the properties
- Only the first statement can be reached from outside the block (no branches into middle of block)
- It can be exited only at the end
-
Algorithm for building Basic Block
INPUT: A sequence of three-address instructions of P OUTPUT: A list of basic blocks of P METHOD: (1) Determine the leads in P - The first instruction in P is a leader - Any target instruction of a conditional / unconditional jump is a leader - Any instruction that immediatedly follows a jump is a leader (2) Build BBs for P - A BB consists of a leader and all its subsequent instructions - until the next leader
-
Example
1 2 3 4 5 6 7 8 9 10 11 12
x = input y = x - 1 z = x * y if z < x goto (7) p = x / y q = p + y a = q b = x + a c = 2a - b if p == q goto (12) goto (3) return
- Determine leaders (line number):
- The first line: 1
- The target of jump: 3, 7, 12
- Line follow after the jump: 5, 11, 12
- 1, 3, 5, 7, 11, 12
- Build BBs for P:
- B1: (1, 2)
- B2: (3, 4)
- B3: (5, 6)
- B4: (7, 8, 9, 10)
- B5: (11)
- B6: (12)
- Determine leaders (line number):
Flow Graphs
-
Nodes: Basic Blocks
-
Edges: Block A to B if and only if
- There is a conditional or unconditional jump from the end of A to the beginning of B
- B immediately follows A in the original order of instuctions and A does not end in anunconditional jump
Sources of Optimizations
-
Algorithm optimization
- algebraic optimization
- A := B + 0 => A := B
- Local optimizations
- within a basic block – across instructions
- Global optimizations
- within a flow graph – across basic blocks
- Interprocedural analysis
- within a program – across procedures (flow graphs)
Local Optimizations
-
Analysis & transformation performed within a basic block
-
No control flow information is considered
- Examples of local optimizations:
- local common subexpression elimination
- analysis: same expression evaluated more than once in b
- transform: replace with single calculation
- local constant folding or elimination
- analysis: expression can be evaluated at compile time
- transform: replace by constant, compile-time value
- dead code elimination
- local common subexpression elimination
- Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
B1: i := n - 1 B2: if i < 1 goto out ; boundary condition for outer loop B3: j := 1 B4: if j > i goto B5 ; boundary condition for inner loop B6: t1 := j - 1 t2 := 4 * t1 t3 := A[t2] ; Address for A[j] ------------- |t4 := j + 1 | |t5 := t4 - 1| ; t5 can be replaced as j |t6 := 4 * t5| ; t6 can be replaced as 4 * j ------------- t7 := A[t6] ; Address for A[j + 1] if t3 <= t7 goto B8 ; Exit condition for comparision B7: t8 := j - 1 t9 := 4 * t8 temp := A[t9] ; Address temp with A[j] --------------- |t10 := j + 1 | |t11 := t10 - 1| ; t11 can be replaced as j |t12 := 4 * t11| ; t12 can be replace as 4 * j --------------- t13 := A[t12] ; Address for A[j + 1] -------------- |t14 := j - 1 | ; t14 is literally same as t8 |t15 := 4 * t14| ; t15 is literally same as t9 |A[t15] := t13 |; Assign A[j] with A[j + 1] --------------- |t16 := j + 1 | ; t16 is literally same as t10 |t17 := t16 - 1| ; t17 is literally same as t11 |t18 := 4 * t17| ; t18 is literally same as t12 |A[t18] := temp| ; Assign A[j + 1] with temp / old value of A[j] --------------- B8: j := j + 1 goto B4 ; Start new iteration of the inner loop B5: i := i - 1 goto B2 ; Start new iteration of the outer loop out:
- Optimized Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
B1: i := n - 1 B2: if i < 1 goto out ; boundary condition for outer loop B3: j := 1 B4: if j > i goto B5 ; boundary condition for inner loop B6: t1 := j - 1 t2 := 4 * t1 t3 := A[t2] ; Address for A[j] ------------- |t6 := 4 * j| ------------- t7 := A[t6] ; Address for A[j + 1] if t3 <= t7 goto B8 ; Exit condition for comparision B7: t8 := j - 1 t9 := 4 * t8 temp := A[t9] ; Address temp with A[j] --------------- |t12 := 4 * j | --------------- t13 := A[t12] ; Address for A[j + 1] -------------- |A[t9] := t13 |; Assign A[j] with A[j + 1] --------------- |A[t12] := temp| ; Assign A[j + 1] with temp / old value of A[j] --------------- B8: j := j + 1 goto B4 ; Start new iteration of the inner loop B5: i := i - 1 goto B2 ; Start new iteration of the outer loop out:
(Intraprocedural) Global Optimizations
- Global versions of local optimizations
- global common subexpression elimination
- global constant propagation
- dead code elimination
- Loop optimizations
- reduce code to be executed in each iteration
- code motion
- induction variable elimination
- Other control structures
- Code hoisting: eliminates copies of identical code on parallel paths in a flow graph to reduce code size
- Example
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
B1: i := n - 1 B2: if i < 1 goto out ; boundary condition for outer loop B3: j := 1 B4: if j > i goto B5 ; boundary condition for inner loop B6: t1 := j - 1 t2 := 4 * t1 t3 := A[t2] ; Address for A[j] t6 := 4 * j t7 := A[t6] ; Address for A[j + 1] if t3 <= t7 goto B8 ; Exit condition for comparision ---------------- B7: |t8 := j - 1 | ; t8 is same as t1 in B2 |t9 := 4 * t8 | ; t9 is same as t2 in B2 |temp := A[t9] | ; temp is same as t2 in B2, Address temp with A[j] |t12 := 4 * j | ; t12 is same as t6 in B2 |t13 := A[t12] | ; t13 is same as t7 in B2, Address for A[j + 1] |A[t9] := t13 | ; Assign A[j] with A[j + 1] |A[t12] := temp| ; Assign A[j + 1] with temp / old value of A[j] --------------- B8: j := j + 1 goto B4 ; Start new iteration of the inner loop B5: i := i - 1 goto B2 ; Start new iteration of the outer loop out:
- Optimized Code (After Global Common Subexpression Elimination)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
B1: i := n - 1 B2: if i < 1 goto out ; boundary condition for outer loop B3: j := 1 B4: if j > i goto B5 ; boundary condition for inner loop B6: t1 := j - 1 t2 := 4 * t1 t3 := A[t2] ; Address for A[j] t6 := 4 * j t7 := A[t6] ; Address for A[j + 1] if t3 <= t7 goto B8 ; Exit condition for comparision -------------- B7: |A[t2] := t7 | ; Assign A[j] with A[j + 1] |A[t6] := t2 | ; Assign A[j + 1] with temp / old value of A[j] --------------- B8: j := j + 1 goto B4 ; Start new iteration of the inner loop B5: i := i - 1 goto B2 ; Start new iteration of the outer loop out:
Induction Variable Elimination
- Intuitively
- Loop indices are induction variables
- (Counting iterations)
- Linear functions of the loop indices are also induction variables
- (For accessing arrays)
- Loop indices are induction variables
-
Analysis: Detection of induction variable
- Optimizations:
- Strength reduction:
- Replace multiplication by additions
- Elimination of loop index:
- Replace termination by test on other induction variables
- Strength reduction:
- Example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
B1: i := n - 1 B2: if i < 1 goto out ; boundary condition for outer loop ----------------- B3: |j := 1 | ; Loop variable for accessing array B4: |if j > i goto B5| ; boundary condition for inner loop B6: |t1 := j - 1 | |t2 := 4 * t1 | ----------------- t3 := A[t2] ; Address for A[j] ------------ |t6 := 4 * j| ; Strength reduction ------------ t7 := A[t6] ; Address for A[j + 1] if t3 <= t7 goto B8 ; Exit condition for comparision B7: A[t2] := t7 ; Assign A[j] with A[j + 1] A[t6] := t2 ; Assign A[j + 1] with temp / old value of A[j] ----------- B8: |j := j + 1| ----------- goto B4 ; Start new iteration of the inner loop B5: i := i - 1 goto B2 ; Start new iteration of the outer loop out:
- Code after optimization
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
B1: i := n - 1 B2: if i < 1 goto out ; boundary condition for outer loop -------------------- B3: |t2 := 0 | ; replace t2 with 0 and use addition for each iteration |t6 := 4 | ; replace t6 with 4 and use addition for each iteration B4: |t19 := 4 * i | ; Since we are placing an addition of 4 for j each time, to compare with i we need to multiply i with 4 |if t6 > t19 goto B5| ; boundary condition for inner loop -------------------- B6: t3 := A[t2] ; Address for A[j] t7 := A[t6] ; Address for A[j + 1] if t3 <= t7 goto B8 ; Exit condition for comparision B7: A[t2] := t7 ; Assign A[j] with A[j + 1] A[t6] := t2 ; Assign A[j + 1] with temp / old value of A[j] ------------- B8: |t2 := t2 + 4| |t6 := t6 + 4| -------------- goto B4 ; Start new iteration of the inner loop B5: i := i - 1 goto B2 ; Start new iteration of the outer loop out:
Loop Invariant Code Motion
- Analysis:
- A computation is done within a loop and
- result of the computation is the same as long as we keep going around the loop
- Transformation:
- move the computation outside the loop
Machine Dependent Optimizations
- Register allocation
- Instruction scheduling
- Memory hierarchy optimizations
- etc.
Local Optimizations (More Details)
- Common subexpression elimination
- array expressions
- field access in records
- access to parameters
Grpah Abstractions
Example 1
- grammar (for bottom-up parsing):
E -> E + T | E - T | T, T -> T * F | F, F -> (E) | id
-
expression: a + a * (b - c) + (b - c) * d
1 2 3 4 5 6 7 8 9 10
# Parse Tree # DAG + + / \ / \ + * + * / \ / \ /\ / \ a * - d a--* / d / \ / \ \ / a - b c - / \ / \ b c b c
- Optimized code:
1 2 3 4 5
t1 = b - c t2 = a * t1 t3 = a + t2 t4 = t1 * d t5 = t3 + t4
Example 2 (DAG is not correct all the time)
- Code:
1 2 3 4
a = b + c b = a - d c = b + c d = a - d
-
The resulting DAG
- Optimized Code
1 2 3
a = b + c; d = a - d; c = d + c;
In fact, this code is incorrect
- Critiques of DAGs
- Cause of problems
- Assignment statements
- Value of variable depends on TIME
- How to fix problem?
- build graph in order of execution
- attach variable name to lastest value
- Final graph created is not very interesting
- Key: variable -> value mapping across time
- loses appeal of abstraction
- Cause of problems
Value Numbering(VN)
-
More explicit with respect to VALUES, and TIME
- Each value has its own “number”
- Common subexpression means same value number
- var2value: current map of variable to value
- Used to determine the value number of current expression
- r1 + r2 $\rightarrow$ var2value(r1) + var2value(r2)
- Algorithm
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
Data Structure: VALUES =
Table of expression //[OP, valnum1, valnum2]
var // name of variable currently holding expression
For each instruction (dst = src1 OP src2) in execution order
valnum1 = var2value(src1);
varnum2 = var2value(src2);
IF[OP, valnum1, valnum2] is in VALUES
v = the index of expression
Replace instruction with CPY dst = VALUES[v].var
ELSE
Add
expression = [OP, valnum1, valnum2]
var = dst
to VALUES
v = index of new entry; tv is new temporary for v
Replace instruction with:
tv = VALUES[valnum1].var OP VALUES[valnum2].var
CPY dst = tv
set_var2value(dst, v)
- More Details
- What are the initial values of the variables?
- Values at beginning of the basic block
- Possible implementations:
- Initialization: Create “initial values” for all variables
- Or dynamically create them as they are used
- Implementation of VALUES and var2value: hash tables
Example Assign: a $\rightarrow r1$ b $\rightarrow r2$ c $\rightarrow r3$ d $\rightarrow r4$
a = b + c | ADD t1 = r2, r3 CPY r1 = t1 |
b = a - d | SUB t2 = r1, r4 CPY r2 = t2 |
c = b + c | ADD t3 = r2, r3 CPY r3 = t3 Since b is reassigned in the previous it is not the same as the first one thus it requires a new insert |
d = a - d | SUB t4 = r1, r4 CPY r4 = t2 |