Pointer Analysis

Pointer Analysis Basics: Aliases

Two variables are aliases if:
- They reference the same memory location
More useful:
- Prove variables reference different location

E.g.

  int x, y;
  int *p = &x;
  int *q = &y;
  int *r = p;
  int **s = &q;

Alias Sets:
- {x, *p. *r}
- {y, *q, **s}
- {q, *s}

The Pointer Alias analysis Problem

Decide for every pair of pointers at every program point
- Do they point to the same memory location?
Correctness:
- Report all pairs of pointers which do/may alias
Ambiguous:
- Two pointers which may or may not alias
Accuracy/Precision:
- How few pairs of pointers are reported while remaining correct
- i.e., reduce ambiguity to improve accuracy

Pointer Analysis: Design Options

Representation
Heap Modeling
Aggregate modeling
Flow sensitivity
Context sensitivity

Alias Representation

Track pointer alias
- More Precise, Less efficient
- {*a, b}, {*a, e}, {b, e}, {*a, c}, {*a, d}
Track points-to info
- Less Precise, More Efficient
- {a, b}, {b, c}, {b, d}, {e, c}, {e, d}

Heap Modeling Options

Heap merged
- i.e. “no heap modeling”
Allocation sit (any call to malloc/calloc)
- Consider each to be a unique location
- Does not differentiate between multiple objects allocated by the same allocation site
Shape Analysis
- Recognize linked lists, trees, DAGs, etc

Aggregate Modeling Options

Arrays
1. Elements are treated as individual locations
2. Treat entire array as a single location
3. Treat first element separate from others
Structures
1. Elements are treated as individual locations (Field sensitive)
2. Treat entire structure as a single location

Flow Sensitivity Options

Flow Insensitive
- The order of statements does not matter
  - Result of analysis is the same regardless of statement order
- Uses a single global state to store results as they are computed
- Not very accurate
Flow Sensitive
- The order of the statements matter
- Need a control flow graph
- Must store results for each program point
- Improve accuracy
Path Sensitive
- Each path in a control flow graph is considered

S1: a = malloc();
S2: b = malloc();
S3: a = b;
S4: a = malloc();
S5: if(c)
    a = b;
S6: if(!c)
    a = malloc();
S7: ... = *a;        

Option	Result	Explanation
Flow Insensitive	{heapS1, heapS2, heapS4, heapS6}	Order does not matter, union of all possibilities
Flow Sensitive	{heapS2, heapS4, heapS6}	In order, does not aware exclusive
Path Sensitive	S2, S6	In order, aware exclusive

Context Sensitivity Options

Context insensitive/sensitive
- whether to consider different calling contexts

Consider the context of p at S6

Option	Result	Explanation
Context Insensitive	{a, b}	Union all the possibilities
Context Sensitive	Called from S3:{a} Called from S5:{b}	Consider each individual path

Address Taken Algorithm

Basic, fast, ultra-conservative algorithm
- Flow-insensitive, context-insensitive
Algorithm
- Generate the set of all variables whose addresses are assigned to another variable
- Assume that any pointer can potentially point to any variable in that set
Complexity: O(n) - linear in size of program
Accuracy: Very imprecise

$P_{s5}$ = {$heapS_1, heapS_4, heapS_6, p, q, local$}

Andersen’s Algorithm

Flow-insensitive, context-insensitive
Representation:
- One points-to graph for entire program
- Each node represents exactly one location

For each statement, build the points-to graph:

Assignment	Edge
`y = &x`	y points to x
`y = x`	if x points to w then y points to w
`*y = x`	if y points to z and x points to w then z points to w
`y = *x`	if x points to z and z points to w then y points to w

Iterate until graph no longer changes
Worst case complexity: O($n^3$), where n = program size

$P_{s5}$ = {$heapS_1, heapS_4, local$}

Steensgaard’s Algorithm

Flow-insensitive, context-insensitive
Representation
- A compact points-to graph for entire program
  - Each node can represent multiple locations
  - But can only point to one other node
    - i.e. Every node has a fan-out of 1 or 0
union0find data structure implements fan-out
- unioning while finding eliminates need to iterate
Worst case complexity: O(n)
Precision: Less precise than Andersen’s

$P_{s5}$ = {$heapS_1, heapS_4, heapS_6, local$}

Pointer Analysis Using BDDs: Binary Decision Diagrams

Use a BDD to represent transfer functions
- Encode procedure as a function of its calling context
- Compact and efficient representation
Perform context-sensitive, inter-procedural analysis
- Similar to dataflow analysis
- But across the procedure call graph
Gives accurate results
- Scales up to large programs

Pointer Analysis: Yes, No, & Maybe

Do pointers a and b point to the same location?
- Repeat for every pair of pointers at every program point
How can we optimize the maybe cases?

Pointer Analysis: Data Speculative Optimization

Implement a potentially unsafe optimization
- Verify and Recover if necessary

EPIC Instruction sets
- Support for speculative load/store instructions
Speculative compiler optimizations
- Dead store elimination
- Redundancy elimination
- Copy propagation
- Strength reduction
- Register promotion
Thread-level speculation(TLS)
- Hardware and compiler support for speculative parallel threads
Transactional programming
- Hardware and software support for speculative parallel transactions

Probabilistic Pointer Analysis

Potential advantage of Probabilistic Pointer Analysis:
- It does not to be safe
PRA Research Objectives:
- Accurate points-to probability information
  - At every static pointer deference
- Scalable analysis
  - Goal: Entire SPEC integer benchmark suite
- Understand scalability/accuracy tradeoff
  - Through flexible static memory mode
Algorithm Design Choices
- Fixed:
  - Bottom Up / Top Down Approach
  - Linear transfer functions (for scalability)
  - One-level context and flow sensitive
- Flexible:
  - Edge profiling (or static prediction)
  - Safe (or unsafe)
  - Field sensitive (or field insensitive)

Traditional Points-To Graph	Probabilistic Points-To Graph

PPA Result Summary:
- Matrix-based, transfer function approach
  - SUIF/Matlab implementation
- Scales to the SPECint 95/2000 benchmarks
  - One-level context and flow sensitive
- As accurate as the most precise algorithms
- Interesting result:
  - Approximately 90% of pointers tend to point to only one thing

CSCD70 Lec7