Optimization
- Goals for Optimization
- Make the object program faster and/or smaller, and/or use less power without changing the meaning of the program
- To produce object code that is optimal, i.e., has minimum execution time or occupies minimum space. This is usually an undecidable problem.
- Why optimization is needed
- To improve on inefficient implementation of programming language constructs.
- To mitigate the mismatch between high level programming language constructs and low-level machine instructions
- To compensate for sloppy programming practices
- To improve programs in ways that can’t be expressed in high level programming languages
- Optimization is not a substitute for good algorithm selection
When Do Optimizations Happen?
- Machine Independent Optimizations: Operate on the IR and happen after the IR code generation
- Architecture independent optimizations only
- Could be reused for different backend targets
- Machine Dependent Optimizations: Operate on the generated assembly code and happen after the backend code generation.
- Can do architecture dependent optimizations
- Local (peephole) optimizations only, because reasoning about global properties of assembly code is hard.
Scope of Optimizations
- peephole: Optimize over a few machine instructions
- local: Optimize over a few statements
- Intra-procedural: Optimize over the body of one routine
- Inter-procedural: Optimize over some collection of routines, e.g. all the member functions in a Class
- global: Optimize over an entire program
When is Optimization Worthwhile
- Most expressions in typical programs are very simple
- Big optimization gains come from optimizing loops and subscript calculation
- Generating locally good code to begin with always wins
- Most interesting optimization problems are NP-complete. There are useful heuristics in many cases.
- Historically, optimization is notorious for introducing bugs into a compiler
Classical Optimizations
- Machine Independent Optimizations
- Constant Folding
- Common Subexpression Elimination
- Memory to Register Promotion
- Algebraic Simplification
- Code Motion (hoisting/sinking)
- Strength Reduction
- Dead Code Elimination
- Loop Unrolling, Loop Fusion
- Procedure/Function Integration
- Machine Dependent Optimizations
- Register Allocation: allocating variables to registers
- Use of hardware idioms
- Peephole optimization
- Instruction scheduling
- Cache-sensitive code generation
- Modern compilers will chain multiple optimizations together
- Constant folding, loop unrolling, and algebraic simplification may create more common subexpressions
- Inlining functions will create more opportunities for common subexpression elimination
Optimization Inhibitors
- Side Effects: Any construct that changes the value of a variable has a side effect on that variable. An optimizer needs to know when the value of a variable might change in order to optimize the use of variables.
- Exception Handlers: An exception handling mechanism can be invoked asynchronously during the execution of a program and can in general cause side effects on variables. The occurrence of exception-caused side effects is extremely difficult to predict.
- Aliasing: An alias exists when two identifiers refer to the same storage location. An optimizer needs to know the effect of every assignment statement; an alias can cause unexpected side effects.
How to Write Optimizations for LLVM IR
- In LLVM, optimizations are organized as transformation passes.
- Define a class that inherits FunctionPass/ModulePass
- The pass takes the LLVM IR of a module or a function as the input and produces the transformed IR of the module or the function.
- Implement runOnFunction() or runOnModule() to manipulate the IR
- PassManager in LLVM will chain multiple optimization and analysis passes to run together
- -O3 actually runs a predefined set of optimization passes in LLVM
- opt is an LLVM tool that directly invokes a pass on LLVM IR bitcode
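The pass structure described above can be sketched as a legacy-pass-manager FunctionPass. The pass name CountStoresPass and the registration string count-stores are hypothetical examples, not an existing LLVM pass, and this sketch only compiles inside an LLVM development environment:

```cpp
// Sketch of a legacy-pass-manager FunctionPass; requires LLVM headers and
// libraries, so it is not compilable as a standalone file.
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {
// CountStoresPass is a hypothetical pass name used for illustration.
struct CountStoresPass : public FunctionPass {
  static char ID;
  CountStoresPass() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    unsigned Stores = 0;
    for (BasicBlock &BB : F)        // walk every basic block
      for (Instruction &I : BB)     // walk every instruction
        if (isa<StoreInst>(I))
          ++Stores;
    errs() << F.getName() << ": " << Stores << " stores\n";
    return false;                   // false: we did not modify the IR
  }
};
} // namespace

char CountStoresPass::ID = 0;
static RegisterPass<CountStoresPass> X("count-stores",
                                       "Count store instructions");
```

Built as a shared library, it could be run with something like `opt -load ./CountStores.so -count-stores input.bc`. Note that recent LLVM releases default to the new pass manager, where a pass instead derives from PassInfoMixin and implements run().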
Register Promotion Optimization
- Memory loads/stores are significantly more expensive than register accesses on all architectures. It is therefore desirable to promote variables held in memory to registers.
- Challenges:
- Aliasing and the address-taken (&) operator
- LLVM pass mem2reg performs register promotion for LLVM IR.
Safely Promote Local Variable to Registers
- Code translation passes often use alloca instructions to hold local variables.
- Step 1: Detect all alloca instructions that could be promoted. We do not try to promote an alloca instruction if:
- It is used by instructions other than as the pointer operand of a load/store (address-taken or aliasing risk)
- It is an aggregate type (i.e., do not promote arrays/structs)
- Step 2: Iterate over all basic blocks of the function. For each one:
- For every load instruction that fetches a local variable to be promoted, we replace its value with the stored value of the last store instruction in the same basic block
- If there is no store instruction preceding the load in the same block, create a PHI instruction for the variable in the basic block and replace the value of the load instruction with the PHI. (For an entry block with no preceding blocks, it will be an undef value.)
- Use Post[BB, v] to hold the representative value of the variable v at the end of the basic block BB.
- Post[BB, v] equals the last stored value of v, if any.
- If there is no store instruction for v, Post[BB, v] equals the PHI instruction created for v in BB
- Step 3: Fill incoming edges of all created PHI instructions based on the computed Post table
- If the PHI instruction for variable v has an incoming edge from the preceding block BB, the incoming value should be Post[BB, v]
- Step 4: Remove unused PHI instructions
- Step 5: Remove all load/store instructions accessing promoted variables. They now should have no other use.
- Step 6: Remove all alloca instructions for promoted variables; they now should have no other use as well.
- Step 7: Eliminate redundant PHI instructions
- If all incoming edges of a PHI instruction point to the same value (or are self-referencing), remove the PHI instruction and replace its references with this value throughout the program.
- Iteratively perform the above step until no more PHI instructions can be eliminated.
- Note that this last step does not affect correctness but removes redundant IR for speed.
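The intra-block part of Step 2, forwarding the last stored value to later loads, can be sketched over a toy straight-line IR. The Inst type and forwardLoads name are invented for illustration, and the PHI creation for unresolved loads is omitted:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy straight-line IR: each instruction either stores a constant into a
// named local variable or loads from one. Only the intra-block forwarding
// of Step 2 is shown: a load is replaced by the value of the last
// preceding store to the same variable in this block.
struct Inst {
  enum { Store, Load } kind;
  std::string var;
  int value = 0;          // stored value, or the value a load resolves to
  bool resolved = false;  // did forwarding replace this load?
};

void forwardLoads(std::vector<Inst> &block) {
  std::map<std::string, int> last;  // last stored value per variable
  for (Inst &in : block) {
    if (in.kind == Inst::Store) {
      last[in.var] = in.value;
    } else if (auto it = last.find(in.var); it != last.end()) {
      in.value = it->second;  // replace the load with the stored value
      in.resolved = true;     // in real mem2reg the load is then deleted
    }
    // An unresolved load would get a PHI in the full algorithm (Step 2).
  }
}
```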
Notation for Optimization Examples
- We use the operator @ as a C-style pointer dereferencing operator in these examples to avoid confusion with *, which always means multiplication
- We use a C-style assignment operator = that expects an l-value as its left operand and an r-value as its right operand
- &X is used for the address of X
- Variables with names ending in P (e.g., AP, BP) are pointer variables
- Most data items are assumed to be 4 bytes wide
- For most optimizations we will preferentially force address items into the form @P + constant because this form fits well with the common register + displacement addressing mode on many machines.
Constant Folding
- If an expression involves only constants or other values known to the compiler, calculate the value of the expression at compile time
- Challenges:
- Compiler must contain a calculator for expressions. A cross compiler must calculate in the arithmetic of the target machine.
- May want to extend to common builtin functions, e.g., max, sqrt
- Beware of arithmetic faults during constant expression evaluation, e.g., overflow, underflow, divide by zero
- May need to reorder expressions to facilitate constant folding, e.g., 3 + A + 4
- Example:
  Source Code:
    #define aSize 100
    int A[aSize];
    A[aSize-1] = aSize * aSize;
  Optimized Code:
    @(&A[0] + 396) = 10000
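The compile-time calculator mentioned above can be sketched as a folder over a toy expression tree. The Expr type and helper names are invented for illustration; a real folder must compute in target-machine arithmetic and guard against faults:

```cpp
#include <cassert>
#include <memory>

// Minimal constant folder over a toy expression tree supporting + and *.
// Overflow checking and target-machine arithmetic are omitted.
struct Expr {
  char op;                 // '+', '*', or 'c' for a constant leaf
  long value = 0;
  std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> constant(long v) {
  auto e = std::make_unique<Expr>();
  e->op = 'c'; e->value = v;
  return e;
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
  auto e = std::make_unique<Expr>();
  e->op = op; e->lhs = std::move(l); e->rhs = std::move(r);
  return e;
}

// Fold bottom-up: if both children are constants, evaluate at "compile time"
// and collapse the node into a constant leaf.
void fold(Expr &e) {
  if (e.op == 'c') return;
  fold(*e.lhs); fold(*e.rhs);
  if (e.lhs->op == 'c' && e.rhs->op == 'c') {
    e.value = (e.op == '+') ? e.lhs->value + e.rhs->value
                            : e.lhs->value * e.rhs->value;
    e.op = 'c';
    e.lhs.reset(); e.rhs.reset();
  }
}
```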
Common Subexpression Elimination
- If the same expression is calculated more than once, save the value of the first calculation and use it in place of the repeated calculations
- Challenges:
- Detection of common subexpressions. Use heuristics, canonical ordering
- Must be able to detect when the values of the variables in an expression could be changed. The presence of functions with side effects, aliasing and exception handling makes this more difficult.
- Example:
  Source Code:
    A[J] = A[J] + 1;
    A[J + 1] = A[J];
  Optimized Code:
    AP = &A[0] + 4 * J
    @AP = @AP + 1
    @(AP + 4) = @AP
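The equivalence of the two versions in the example above can be checked by writing both as C++; the pointer temporary plays the role of AP:

```cpp
#include <cassert>

// "Before" version of the A[J] example: the address polynomial for A[J]
// is implicitly recomputed at each subscript.
void source(int *A, int J) {
  A[J] = A[J] + 1;
  A[J + 1] = A[J];
}

// "After" version: the common subexpression &A[0] + J is computed once
// into AP and reused, as in the optimized column above.
void optimized(int *A, int J) {
  int *AP = A + J;    // common subexpression saved once
  *AP = *AP + 1;
  *(AP + 1) = *AP;
}
```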
Variable Folding
- If a variable is assigned a “computationally simple” value, replace subsequent uses of the variable by the value. Creates new opportunities for constant folding and common subexpression elimination.
- Challenges: aliasing, side effects
- Example:
  Source Code:
    J = 13;
    A[J] = B[J] * C[J];
    K = J + 1;
    A[K] = B[K];
  Optimized Code:
    A[13] = B[13] * C[13];
    A[J + 1] = B[J + 1];
Algebraic Simplification
- Use algebraic identities to simplify or reorder expressions. Use commutativity and associativity of operators. Often used to facilitate constant folding and common subexpression elimination.
- Challenges:
- Must be careful not to change the meaning of the program
- Consider the corner cases of overflow and underflow
- Examples:
  X + 0  =>  X
  X - 0  =>  X
  X * 1  =>  X
  Y / 1  =>  Y
  X * 0  =>  0
  -X + -Y  =>  -(X + Y)
  P and true  =>  P
  P and false  =>  false
  P or true  =>  true
  P or false  =>  P
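A table-driven simplifier for a few of the arithmetic identities above might look like the following sketch (the simplify signature is invented; it handles a binary operator with one constant operand and returns the simplified form as a string):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Apply a handful of algebraic identities to "x <op> c". Returns the
// simplified expression, or nullopt if no identity applies. Note the
// X * 0 -> 0 rule is only safe if evaluating X cannot fault and has no
// side effects -- exactly the "meaning of the program" caveat above.
std::optional<std::string> simplify(const std::string &x, char op, long c) {
  if ((op == '+' || op == '-') && c == 0) return x;  // X + 0, X - 0 -> X
  if ((op == '*' || op == '/') && c == 1) return x;  // X * 1, X / 1 -> X
  if (op == '*' && c == 0) return std::string("0");  // X * 0 -> 0
  return std::nullopt;                               // no identity applies
}
```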
Code Motion
- Move code (statements, expressions or fragments) out of loops (forward or backward) to places where they are executed less frequently
- Code that is moved must be loop invariant, i.e. independent of the loop indices
- Challenges:
- Correctness (safety) of code motion.
- Influenced by side effects, aliasing and exception handling
- Must preserve semantics of loops that never execute
- May need fixup code after the loop.
- Some optimizers do code motion by copy insertion followed by common subexpression elimination
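A before/after sketch of moving a loop-invariant expression (function names invented). Here the hoisted expression cannot fault, so the motion is safe even for loops that never execute:

```cpp
#include <cassert>
#include <vector>

// "Before": the loop-invariant product limit * scale is recomputed on
// every trip through the loop.
long sumSource(const std::vector<int> &v, int limit, int scale) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    total += v[i] * (limit * scale);  // loop-invariant subexpression
  return total;
}

// "After": the invariant expression is hoisted above the loop and
// computed exactly once.
long sumHoisted(const std::vector<int> &v, int limit, int scale) {
  const int k = limit * scale;        // hoisted out of the loop
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    total += v[i] * k;
  return total;
}
```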
Strength Reduction
- Strength reduction is the replacement of “slow” operations (e.g., multiply and divide) with “faster” operations (e.g., add and subtract)
- Challenges:
- Deciding which operations are sufficiently “slow” as to warrant this optimization. Strength reduction was more important when multiply and divide were slow relative to add and subtract.
- Need to be careful not to change the meaning of the program
- Strength reduction is often used to eliminate multiplications in array subscript polynomials. Often a precursor to machine dependent optimizations.
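A sketch of strength-reducing the 4*i subscript polynomial into a running address increment, assuming the notes' 4-byte data items (function names invented):

```cpp
#include <cassert>

static_assert(sizeof(int) == 4, "example assumes 4-byte data items");

// "Before": the byte offset 4*i is recomputed with a multiply each trip.
long sumMultiply(const int *A, int n) {
  long total = 0;
  for (int i = 0; i < n; ++i)
    total += *(const int *)((const char *)A + 4 * i);  // multiply per trip
  return total;
}

// "After": a running address (an induction variable) replaces the
// multiply with an add of 4 each trip.
long sumReduced(const int *A, int n) {
  long total = 0;
  const char *p = (const char *)A;  // induction variable replaces 4*i
  for (int i = 0; i < n; ++i, p += 4)
    total += *(const int *)p;
  return total;
}
```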
Dead Code Elimination
- Compiler detects and eliminates code that can never be executed (dead code) and computations of values that are never subsequently used (useless computations)
- Challenges:
- Detection of dead code requires analysis of the control flow graph
- Useless calculations may result from programmer error or from other optimizations. Test replacement is a form of useless calculation elimination
- Example:
  Source Code:
    if (true)
      X = Y;
    else
      X = Z;
    return;
  Optimized Code:
    X = Y;
    return;
Loop Unrolling
- Replicate the body of a loop and adjust the loop index. Reduces loop overhead relative to the loop body. Enables common subexpression and constant folding optimizations
- Challenges:
- Detecting unrollable loops and considering side effects
- Loop carried dependencies
- Handling loop remainder
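A 4x unrolling sketch with an explicit remainder loop (function names invented). This simple summation has no loop-carried dependency, so unrolling is clearly legal:

```cpp
#include <cassert>
#include <vector>

// "Before": one loop trip per element, paying the loop overhead each time.
long sumRolled(const std::vector<int> &v) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
  return total;
}

// "After": the body is replicated 4x; a remainder loop handles the
// leftover n % 4 elements.
long sumUnrolled(const std::vector<int> &v) {
  long total = 0;
  std::size_t i = 0, n = v.size();
  for (; i + 4 <= n; i += 4)          // main unrolled body
    total += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
  for (; i < n; ++i)                  // remainder loop
    total += v[i];
  return total;
}
```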
Loop Fusion (Jamming)
- Combine and simplify two or more loops that share a common index or range. Leads to common subexpression optimizations.
- Challenges:
- Detection of fusible loops
- Side effects and aliasing
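A fusion sketch for two loops over the same range (function names invented, and a and b are assumed to be the same size and non-aliased). Fusion is legal here because each trip of the second loop reads only a value the same trip of the first loop has already produced:

```cpp
#include <cassert>
#include <vector>

// "Before": two separate loops over the same index range.
void separate(std::vector<int> &a, std::vector<int> &b) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = a[i] * 2;
  for (std::size_t i = 0; i < b.size(); ++i) b[i] = a[i] + 1;
}

// "After": one fused loop; a[i] is already updated within the same trip,
// so the result is unchanged.
void fused(std::vector<int> &a, std::vector<int> &b) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = a[i] * 2;
    b[i] = a[i] + 1;   // reads the value just written this trip
  }
}
```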
Routine Inlining
- Replace the call of a procedure or function with an inline expansion of the routine body, with actual parameters substituted for the formal parameters.
- Eliminates routine call and return overhead
- Challenges:
- Aliasing and side effects.
- Recursive procedures
- Extreme care must be taken to preserve the semantics of parameter passing
- Only possible for leaf routines
- Must rename variables / virtual registers to avoid name collision
- This is an important optimization for OOP languages where objects export a lot of very small object access routines
Tail Recursion Elimination
- If the body of a recursive routine ends with a recursive call to the routine, replace the call with a setting of the parameters followed by a branch to the start of the routine body.
- Save routine call and return overhead
- Saves activation record allocation/de-allocation overhead, and a huge amount of stack space.
- Challenges:
- Needs extreme care to preserve parameter passing semantics
- Often done where parameters are passed by value
- This is a really important optimization in functional languages
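The transformation can be sketched on a tail-recursive summation (function names invented): the recursive call becomes parameter reassignment plus a branch back to the top of the body:

```cpp
#include <cassert>

// Tail-recursive sum 1 + 2 + ... + n with an accumulator parameter.
long sumRec(long n, long acc) {
  if (n == 0) return acc;
  return sumRec(n - 1, acc + n);  // tail call: last action of the body
}

// The same routine after tail recursion elimination: no new activation
// record per step, so stack usage is constant.
long sumLoop(long n, long acc) {
  for (;;) {
    if (n == 0) return acc;
    acc = acc + n;  // set the parameters for the "next call"...
    n = n - 1;
    // ...and branch back to the start of the routine body
  }
}
```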
Code Hoisting
- If some expression is calculated on all paths leading from some point P in the program's CFG, and the expression satisfies the conditions for common subexpression elimination, then code for the expression can be moved to point P
- Challenges:
- Safety, aliasing, side effects
- Computationally expensive to detect hoisting opportunities
- Mostly useful for moving out of loops
Machine Dependent Optimizations
- Recognize specific programming language constructs that can be more efficiently implemented using specialized machine instructions
- Challenges:
- Must maintain programming language semantics
- Be careful about interactions with other instructions, i.e., setting of condition codes.