Optimization
- Goals for Optimization
- Make the object program faster and/or smaller, and/or use less power without changing the meaning of the program
- To produce object code that is optimal, i.e., has minimum execution time or occupies minimum space. This is usually an undecidable problem.
- Why optimization is needed
- To improve on inefficient implementation of programming language constructs.
- To mitigate the mismatch between high level programming language constructs and low-level machine instructions
- To compensate for sloppy programming practices
- To improve programs in ways that can’t be expressed in high level programming languages
- Optimization is not a substitute for good algorithm selection
When Do Optimizations Happen?
- Machine Independent Optimizations: Operate on the IR and happen after the IR code generation
- Architecture independent optimizations only
- Could be reused for different backend targets
- Machine Dependent Optimizations: Operate on the generated assembly code and happen after the backend code generation.
- Can do architecture dependent optimizations
- Local (peephole) optimizations only, because reasoning about global properties of assembly code is hard.
Scope of Optimizations
- peephole: Optimize over a few machine instructions
- local: Optimize over a few statements
- Intra-procedural: Optimize over the body of one routine
- Inter-procedural: Optimize over some collection of routines, e.g. all the member functions in a Class
- global: Optimize over an entire program
When is Optimization Worthwhile
- Most expressions in typical programs are very simple
- Big optimization gains come from optimizing loops and subscript calculation
- Generating locally good code to begin with always wins
- Most interesting optimization problems are NP-complete. There are useful heuristics in many cases.
- Historically, optimization is notorious for introducing bugs into a compiler
Classical Optimizations
- Machine Independent Optimizations
- Constant Folding
- Common Subexpression Elimination
- Memory to Register Promotion
- Algebraic Simplification
- Code Motion (hoisting/sinking)
- Strength Reduction
- Dead Code Elimination
- Loop Unrolling, Loop Fusion
- Procedure/Function Integration
- Machine Dependent Optimizations
- Register Allocation: allocating variables to registers
- Use of hardware idioms
- Peephole optimization
- Instruction scheduling
- Cache-sensitive code generation
- Modern compilers will chain multiple optimizations together
- Constant folding, loop unrolling, and algebraic simplification may create more common subexpressions
- Inlining functions will create more opportunities for common subexpression elimination
Optimization Inhibitors
- Side Effects: Any construct that changes the value of a variable has a side effect on that variable. An optimizer needs to know when the value of a variable might change in order to optimize the use of variables.
- Exception Handlers: An exception handling mechanism can be invoked asynchronously during the execution of a program and can in general cause side effects on variables. The occurrence of exception-caused side effects is extremely difficult to predict.
- Aliasing: An alias exists when two identifiers refer to the same storage location. An optimizer needs to know the effect of every assignment statement; an alias can cause unexpected side effects.
How to Write Optimizations for LLVM IR
- In LLVM, optimizations are organized as transformation passes.
- Define a class that inherits FunctionPass/ModulePass
- The pass takes the LLVM IR of a module or a function as the input and produces the transformed IR of the module or the function.
- Implement runOnFunction() or runOnModule() to manipulate the IR
- PassManager in LLVM will chain multiple optimization and analysis passes to run together
- -O3 actually runs a predefined set of optimization passes in LLVM
- opt is an LLVM tool that directly invokes a pass on LLVM IR bitcode
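The pass structure described above can be sketched as a legacy-pass-manager FunctionPass. The pass name CountStoresPass and the registration string count-stores are hypothetical examples, not an existing LLVM pass, and this sketch only compiles inside an LLVM development environment:

```cpp
// Sketch of a legacy-pass-manager FunctionPass; requires LLVM headers and
// libraries, so it is not compilable as a standalone file.
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Pass.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

namespace {
// CountStoresPass is a hypothetical pass name used for illustration.
struct CountStoresPass : public FunctionPass {
  static char ID;
  CountStoresPass() : FunctionPass(ID) {}

  bool runOnFunction(Function &F) override {
    unsigned Stores = 0;
    for (BasicBlock &BB : F)        // walk every basic block
      for (Instruction &I : BB)     // walk every instruction
        if (isa<StoreInst>(I))
          ++Stores;
    errs() << F.getName() << ": " << Stores << " stores\n";
    return false;                   // false: we did not modify the IR
  }
};
} // namespace

char CountStoresPass::ID = 0;
static RegisterPass<CountStoresPass> X("count-stores",
                                       "Count store instructions");
```

Built as a shared library, it could be run with something like `opt -load ./CountStores.so -count-stores input.bc`. Note that recent LLVM releases default to the new pass manager, where a pass instead derives from PassInfoMixin and implements run().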
Register Promotion Optimization
- Memory loads/stores are significantly more expensive than register accesses on all architectures. It is therefore desirable to promote variables held in memory to registers.
- Challenges:
- Aliasing and the address-taken (&) operator
- LLVM pass mem2reg performs register promotion for LLVM IR.
Safely Promote Local Variable to Registers
- Code translation passes often use alloca instructions to hold local variables.
- Step 1: Detect all alloca instructions that could be promoted. We do not try to promote an alloca instruction if:
- It is used by instructions other than as the pointer operand of a load/store (address-taken or aliasing risk)
- It is an aggregate type (i.e., do not promote arrays/structs)
- Step 2: Iterate over all basic blocks of the function. For each one:
- For every load instruction that fetches a local variable to be promoted, we replace its value with the stored value of the last store instruction in the same basic block
- If there is no store instruction preceding the load in the same block, create a PHI instruction for the variable in the basic block and replace the value of the load instruction with the PHI. (For an entry block with no preceding blocks, it will be an undef value.)
- Use Post[BB, v] to hold the representative value of the variable v at the end of the basic block BB.
- Post[BB, v] equals the last stored value of v, if any.
- If there is no store instruction for v, Post[BB, v] equals the PHI instruction created for v in BB
- Step 3: Fill incoming edges of all created PHI instructions based on the computed Post table
- If the PHI instruction for variable v has an incoming edge from the preceding block BB, the incoming value should be Post[BB, v]
- Step 4: Remove unused PHI instructions
- Step 5: Remove all load/store instructions accessing promoted variables. They now should have no other use.
- Step 6: Remove all alloca instructions for promoted variables; they now should have no other use as well.
- Step 7: Eliminate redundant PHI instructions
- If all incoming edges of a PHI instruction point to the same value (or are self-referencing), remove the PHI instruction and replace its references with this value throughout the program.
- Iteratively perform the above step until no more PHI instructions can be eliminated.
- Note that this last step does not affect correctness but removes redundant IR for speed.
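The intra-block part of Step 2, forwarding the last stored value to later loads, can be sketched over a toy straight-line IR. The Inst type and forwardLoads name are invented for illustration, and the PHI creation for unresolved loads is omitted:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy straight-line IR: each instruction either stores a constant into a
// named local variable or loads from one. Only the intra-block forwarding
// of Step 2 is shown: a load is replaced by the value of the last
// preceding store to the same variable in this block.
struct Inst {
  enum { Store, Load } kind;
  std::string var;
  int value = 0;          // stored value, or the value a load resolves to
  bool resolved = false;  // did forwarding replace this load?
};

void forwardLoads(std::vector<Inst> &block) {
  std::map<std::string, int> last;  // last stored value per variable
  for (Inst &in : block) {
    if (in.kind == Inst::Store) {
      last[in.var] = in.value;
    } else if (auto it = last.find(in.var); it != last.end()) {
      in.value = it->second;  // replace the load with the stored value
      in.resolved = true;     // in real mem2reg the load is then deleted
    }
    // An unresolved load would get a PHI in the full algorithm (Step 2).
  }
}
```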
Notation for Optimization Examples
- We use the operator @ as a C-style pointer dereferencing operator in these examples to avoid confusion with *, which always means multiplication
- We use a C-style assignment operator = that expects an l-value as its left operand and an r-value as its right operand
- &X is used for the address of X
- Variables with names ending in P (e.g., AP, BP) are pointer variables
- Most data items are assumed to be 4 bytes wide
- For most optimizations we will preferentially force address items into the form @P + constant because this form fits well with the common register + displacement addressing mode on many machines.
Constant Folding
- If an expression involves only constants or other values known to the compiler, calculate the value of the expression at compile time
- Challenges:
- Compiler must contain a calculator for expressions. A cross compiler must calculate in the arithmetic of the target machine.
- May want to extend to common builtin functions, e.g., max, sqrt
- Beware of arithmetic faults during constant expression evaluation, e.g., overflow, underflow, divide by zero
- May need to reorder expressions to facilitate constant folding, e.g., 3 + A + 4
- Example:
  Source Code:
    #define aSize 100
    int A[aSize];
    A[aSize-1] = aSize * aSize;
  Optimized Code:
    @(&A[0] + 396) = 10000
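The compile-time calculator mentioned above can be sketched as a folder over a toy expression tree. The Expr type and helper names are invented for illustration; a real folder must compute in target-machine arithmetic and guard against faults:

```cpp
#include <cassert>
#include <memory>

// Minimal constant folder over a toy expression tree supporting + and *.
// Overflow checking and target-machine arithmetic are omitted.
struct Expr {
  char op;                 // '+', '*', or 'c' for a constant leaf
  long value = 0;
  std::unique_ptr<Expr> lhs, rhs;
};

std::unique_ptr<Expr> constant(long v) {
  auto e = std::make_unique<Expr>();
  e->op = 'c'; e->value = v;
  return e;
}

std::unique_ptr<Expr> binary(char op, std::unique_ptr<Expr> l,
                             std::unique_ptr<Expr> r) {
  auto e = std::make_unique<Expr>();
  e->op = op; e->lhs = std::move(l); e->rhs = std::move(r);
  return e;
}

// Fold bottom-up: if both children are constants, evaluate at "compile time"
// and collapse the node into a constant leaf.
void fold(Expr &e) {
  if (e.op == 'c') return;
  fold(*e.lhs); fold(*e.rhs);
  if (e.lhs->op == 'c' && e.rhs->op == 'c') {
    e.value = (e.op == '+') ? e.lhs->value + e.rhs->value
                            : e.lhs->value * e.rhs->value;
    e.op = 'c';
    e.lhs.reset(); e.rhs.reset();
  }
}
```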
Common Subexpression Elimination
- If the same expression is calculated more than once, save the value of the first calculation and use it in place of the repeated calculations
- Challenges:
- Detection of common subexpressions. Use heuristics, canonical ordering
- Must be able to detect when the values of the variables in an expression could be changed. The presence of functions with side effects, aliasing and exception handling makes this more difficult.
- Example:
  Source Code:
    A[J] = A[J] + 1;
    A[J + 1] = A[J];
  Optimized Code:
    AP = &A[0] + 4 * J
    @AP = @AP + 1
    @(AP + 4) = @AP
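The equivalence of the two versions in the example above can be checked by writing both as C++; the pointer temporary plays the role of AP:

```cpp
#include <cassert>

// "Before" version of the A[J] example: the address polynomial for A[J]
// is implicitly recomputed at each subscript.
void source(int *A, int J) {
  A[J] = A[J] + 1;
  A[J + 1] = A[J];
}

// "After" version: the common subexpression &A[0] + J is computed once
// into AP and reused, as in the optimized column above.
void optimized(int *A, int J) {
  int *AP = A + J;    // common subexpression saved once
  *AP = *AP + 1;
  *(AP + 1) = *AP;
}
```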
Variable Folding
- If a variable is assigned a “computationally simple” value, replace subsequent uses of the variable by the value. Creates new opportunities for constant folding and common subexpression elimination.
- Challenges: aliasing, side effects
- Example:
  Source Code:
    J = 13;
    A[J] = B[J] * C[J];
    K = J + 1;
    A[K] = B[K];
  Optimized Code:
    A[13] = B[13] * C[13];
    A[J + 1] = B[J + 1];
Algebraic Simplification
- Use algebraic identities to simplify or reorder expressions. Use commutativity and associativity of operators. Often used to facilitate constant folding and common subexpression elimination.
- Challenges:
- Must be careful not to change the meaning of the program
- Consider the corner cases of overflow and underflow
- Examples:
  X + 0  =>  X
  X - 0  =>  X
  X * 1  =>  X
  Y / 1  =>  Y
  X * 0  =>  0
  -X + -Y  =>  -(X + Y)
  P and true  =>  P
  P and false  =>  false
  P or true  =>  true
  P or false  =>  P
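A table-driven simplifier for a few of the arithmetic identities above might look like the following sketch (the simplify signature is invented; it handles a binary operator with one constant operand and returns the simplified form as a string):

```cpp
#include <cassert>
#include <optional>
#include <string>

// Apply a handful of algebraic identities to "x <op> c". Returns the
// simplified expression, or nullopt if no identity applies. Note the
// X * 0 -> 0 rule is only safe if evaluating X cannot fault and has no
// side effects -- exactly the "meaning of the program" caveat above.
std::optional<std::string> simplify(const std::string &x, char op, long c) {
  if ((op == '+' || op == '-') && c == 0) return x;  // X + 0, X - 0 -> X
  if ((op == '*' || op == '/') && c == 1) return x;  // X * 1, X / 1 -> X
  if (op == '*' && c == 0) return std::string("0");  // X * 0 -> 0
  return std::nullopt;                               // no identity applies
}
```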
Code Motion
- Move code (statements, expressions or fragments) out of loops (forward or backward) to places where they are executed less frequently
- Code that is moved must be loop invariant, i.e. independent of the loop indices
- Challenges:
- Correctness (safety) of code motion.
- Influenced by side effects, aliasing and exception handling
- Must preserve semantics of loops that never execute
- May need fixup code after the loop.
- Some optimizers do code motion by copy insertion followed by common subexpression elimination
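A before/after sketch of moving a loop-invariant expression (function names invented). Here the hoisted expression cannot fault, so the motion is safe even for loops that never execute:

```cpp
#include <cassert>
#include <vector>

// "Before": the loop-invariant product limit * scale is recomputed on
// every trip through the loop.
long sumSource(const std::vector<int> &v, int limit, int scale) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    total += v[i] * (limit * scale);  // loop-invariant subexpression
  return total;
}

// "After": the invariant expression is hoisted above the loop and
// computed exactly once.
long sumHoisted(const std::vector<int> &v, int limit, int scale) {
  const int k = limit * scale;        // hoisted out of the loop
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i)
    total += v[i] * k;
  return total;
}
```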
Strength Reduction
- Strength reduction is the replacement of “slow” operations (e.g., multiply and divide) with “faster” operations (e.g., add and subtract)
- Challenges:
- Deciding which operations are sufficiently “slow” as to warrant this optimization. Strength reduction was more important when multiply and divide were slow relative to add and subtract.
- Need to be careful not to change the meaning of the program
- Strength reduction is often used to eliminate multiplications in array subscript polynomials. Often a precursor to machine dependent optimizations.
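A sketch of strength-reducing the 4*i subscript polynomial into a running address increment, assuming the notes' 4-byte data items (function names invented):

```cpp
#include <cassert>

static_assert(sizeof(int) == 4, "example assumes 4-byte data items");

// "Before": the byte offset 4*i is recomputed with a multiply each trip.
long sumMultiply(const int *A, int n) {
  long total = 0;
  for (int i = 0; i < n; ++i)
    total += *(const int *)((const char *)A + 4 * i);  // multiply per trip
  return total;
}

// "After": a running address (an induction variable) replaces the
// multiply with an add of 4 each trip.
long sumReduced(const int *A, int n) {
  long total = 0;
  const char *p = (const char *)A;  // induction variable replaces 4*i
  for (int i = 0; i < n; ++i, p += 4)
    total += *(const int *)p;
  return total;
}
```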
Dead Code Elimination
- Compiler detects and eliminates code that can never be executed (dead code) and computations of values that are never subsequently used (useless computations)
- Challenges:
- Detection of dead code requires analysis of the control flow graph
- Useless calculations may result from programmer error or from other optimizations. Test replacement is a form of useless calculation elimination
- Example:
  Source Code:
    if (true)
      X = Y;
    else
      X = Z;
    return;
  Optimized Code:
    X = Y;
    return;
Loop Unrolling
- Replicate the body of a loop and adjust the loop index. Reduces loop overhead relative to the loop body. Enables common subexpression and constant folding optimizations
- Challenges:
- Detecting unrollable loops and considering side effects
- Loop carried dependencies
- Handling loop remainder
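A 4x unrolling sketch with an explicit remainder loop (function names invented). This simple summation has no loop-carried dependency, so unrolling is clearly legal:

```cpp
#include <cassert>
#include <vector>

// "Before": one loop trip per element, paying the loop overhead each time.
long sumRolled(const std::vector<int> &v) {
  long total = 0;
  for (std::size_t i = 0; i < v.size(); ++i) total += v[i];
  return total;
}

// "After": the body is replicated 4x; a remainder loop handles the
// leftover n % 4 elements.
long sumUnrolled(const std::vector<int> &v) {
  long total = 0;
  std::size_t i = 0, n = v.size();
  for (; i + 4 <= n; i += 4)          // main unrolled body
    total += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
  for (; i < n; ++i)                  // remainder loop
    total += v[i];
  return total;
}
```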
Loop Fusion (Jamming)
- Combine and simplify two or more loops that share a common index or range. Leads to common subexpression optimizations.
- Challenges:
- Detection of fusible loops
- Side effects and aliasing
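A fusion sketch for two loops over the same range (function names invented, and a and b are assumed to be the same size and non-aliased). Fusion is legal here because each trip of the second loop reads only a value the same trip of the first loop has already produced:

```cpp
#include <cassert>
#include <vector>

// "Before": two separate loops over the same index range.
void separate(std::vector<int> &a, std::vector<int> &b) {
  for (std::size_t i = 0; i < a.size(); ++i) a[i] = a[i] * 2;
  for (std::size_t i = 0; i < b.size(); ++i) b[i] = a[i] + 1;
}

// "After": one fused loop; a[i] is already updated within the same trip,
// so the result is unchanged.
void fused(std::vector<int> &a, std::vector<int> &b) {
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = a[i] * 2;
    b[i] = a[i] + 1;   // reads the value just written this trip
  }
}
```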
Routine Inlining
- Replace the call of a procedure or function with an inline expansion of the routine body, with actual parameters substituted for the formal parameters.
- Eliminates routine call and return overhead
- Challenges:
- Aliasing and side effects.
- Recursive procedures
- Extreme care must be taken to preserve the semantics of parameter passing
- Only possible for leaf routines
- Must rename variables / virtual registers to avoid name collision
- This is an important optimization for OOP languages where objects export a lot of very small object access routines
Tail Recursion Elimination
- If the body of a recursive routine ends with a recursive call to the routine, replace the call with a setting of the parameters followed by a branch to the start of the routine body.
- Save routine call and return overhead
- Saves activation record allocation/de-allocation overhead, and a huge amount of stack space.
- Challenges:
- Needs extreme care to preserve parameter passing semantics
- Often done where parameters are passed by value
- This is a really important optimization in functional languages
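The transformation can be sketched on a tail-recursive summation (function names invented): the recursive call becomes parameter reassignment plus a branch back to the top of the body:

```cpp
#include <cassert>

// Tail-recursive sum 1 + 2 + ... + n with an accumulator parameter.
long sumRec(long n, long acc) {
  if (n == 0) return acc;
  return sumRec(n - 1, acc + n);  // tail call: last action of the body
}

// The same routine after tail recursion elimination: no new activation
// record per step, so stack usage is constant.
long sumLoop(long n, long acc) {
  for (;;) {
    if (n == 0) return acc;
    acc = acc + n;  // set the parameters for the "next call"...
    n = n - 1;
    // ...and branch back to the start of the routine body
  }
}
```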
Code Hoisting
- If some expression is calculated on all paths leading from some point P in the program's CFG, and the expression satisfies the conditions for common subexpression elimination, then code for the expression can be moved to point P
- Challenges:
- Safety, aliasing, side effects
- Computationally expensive to detect hoisting opportunities
- Mostly useful for moving out of loops
Machine Dependent Optimizations
- Recognize specific programming language constructs that can be more efficiently implemented using specialized machine instructions
- Challenges:
- Must maintain programming language semantics
- Be careful about interactions with other instructions, i.e., setting of condition codes.