Dataflow Analysis 2 & Loop

CSCD70 Lec3

Posted by Dino on January 25, 2021

Framework

  Reaching Definitions Live Variables Available Expression
Domain Sets of definitions Sets of variables Sets of expressions
Direction forward
$out[b] = f_b([in[b]])$
$in[b] = \wedge out[pred(b)]$
backward
$in[b] = f_b([out[b]])$
$out = \wedge in[succ[b]]$
forward
$out[b] = f_b([in[b]])$
$in[b] = \wedge out[pred(b)]$
Transfer Function $out = Gen_b \cup (In - Kill_b)$ $In = Use_b \cup (Out - Def_b)$ $f_B := Gen_B \cup (In - Kill_B)$
Meet Operator $\cup$ $\cup$ $\cap$
Boundary Condition out[entry] = $\emptyset$ in[exit] = $\emptyset$ out[entry] = $\emptyset$
Initial Interior Points out[b] = $\emptyset$ in[b] = $\emptyset$ out[b] = $\mathbb U$

Foundations of Data Flow Analysis

  1. Meet operator
  2. Transfer Functions
  3. Correctness, Precision, Convergence
  4. Efficiency

A Unified Framework

  • Data flow problems are defined by
    • Domains of values: V
    • Meet operator (V $\wedge$ V $\rightarrow$ V)
    • A set of transfer functions (V $\rightarrow$ V)
  • Usefulness of unified framework
    • To answer questions such as correctness, precision, convergence, speed of convergence for a family of problems
    • Reuse code

Meet Operator

  • Properties of the meet operator
    • commutative: $x \wedge y = y \wedge x$
    • idempotent: $x \wedge x = x$
    • associative: $x \wedge (y \wedge z) = (x\wedge y) \wedge z$
    • There is a Top element T such that $x \wedge T = x$
  • Meet operator defines a partial ordering on values
    • Transitivity: if $x \leq y$ and $y \leq z$ then $x \leq z$
    • Antisymmetry: if $x \leq y$ and $y \leq x$ then $x = y$
    • Reflexivity: $x \leq x$

Partial Order

  • Top and Bottom elements
    • Top $\top$ such that: $x \wedge T = x$
    • Bottom $\bot$ such that : $x \wedge \bot = \bot$
  • Values and meet operator in a data flow problem defines a semi-lattice:
    • There exists a $\top$, but not necessarily a $\bot$
    • (The actual definition in the book is either $\top$ or $\bot$ exists)
  • x, y are ordered: $x \leq y$ then $x \wedge y = x$

Descending Chain

  • Definition
    • The height of a lattice is the largest number of > relations that will fit in a descending chain
    • e.g. $x_o > x_1 > x_2 > …$
  • Height of values in reaching definitions?
    • n - number of definitions
  • Important property: finite descending chain

  • Can an infinity lattice have a finite descending chain?
    • yes

Transfer Functions

  • Basic Properties f: $V \rightarrow V$
    • Has an identity function
      • There exists an f such that f(x) = x, for all x
    • Closed under composition
      • if $f_1, f_2 \in F$, then $f_1 \cdot f_2 \in F$

Monotonicity

  • A framework (F, V, $\wedge$) is monotone iff
    • $x \leq y$ implies $f(x) \leq f(y)$
    • i.e. a “smaller or equal” input to the same function will always give a “smaller or equal” output
  • Equivalently, a framework (F, V, $\wedge$) is monotone iff
    • $f(x \wedge y) \leq f(x) \wedge f(y)$
    • i.e. merge input, then apply f is **smaller than or equal to ** apply the transfer function individually and then merge the result

Distributivity

  • A framework (F, V, $\wedge$) is distributive iff
    • $f(x \wedge y) = f(x) \wedge f(y)$
    • i.e. merge input, then apply f is equal to apply the transfer function individually then merge result

Data Flow Analysis

  • Definition
    • Let $f_1, …. f_m : \in F$, where $f_i$ is the transfer function for the node i
      • $f_p = f_{n_k} \cdot … \cdot f_{n_1}$, where p is a path through nodes $n_1$, …, $n_k$
      • $f_p$ = identify function, if p is an empty path
  • Ideal data flow answer:
    • For each node n
      • $\wedge f_{p_i}(T)$, for all possibly executed paths $p_i$ reaching n.
    • But determining all possibly executed paths is undecidable

Meet-Over-Paths (MOP)

  • Error in the conservative direction

  • Meet-Over-Paths (MOP):
    • For each node n:
      • MOP(n) = $\wedge f_{pi}(T)$, for all paths $p_i$ reaching n
      • a path exists as long there is an edge in the code
      • consider more paths than necessary
      • MOP = Perfect Solution $\wedge$ Solution to Unexecuted Paths
      • MOP $\leq$ Perfect Solution
      • Potentially more constrained, solution is small
        • Hence conservative
      • It is not safe to be > Perfect Solution
  • Desirable solution: as close to MOP as possible

Mop Example

Assume B2 and B4 do not update x

  • Ideal Solution
    • $B1 \rightarrow B2 \rightarrow B4 \rightarrow B6 \rightarrow B7 $ (i.e. x = 1)
    • $B1 \rightarrow B3 \rightarrow B4 \rightarrow B5 \rightarrow B7 $ (i.e. x = 0)
  • MOP
    • $B1 \rightarrow B2 \rightarrow B4 \rightarrow B6 \rightarrow B7 $ (i.e. x = 1)
    • $B1 \rightarrow B2 \rightarrow B4 \rightarrow B5 \rightarrow B7 $ (Unexecuted path)
    • $B1 \rightarrow B3 \rightarrow B4 \rightarrow B5 \rightarrow B7 $ (i.e. x = 0)
    • $B1 \rightarrow B3 \rightarrow B4 \rightarrow B6 \rightarrow B7 $ (Unexecuted path)

Solving Data Flow Equations

  • Example: Reaching definitions
    • out[entry] = {}
    • Values = {subsets of definitions}
    • Meet operator: $\cup$
    • Transfer Function: out[b] = $gen_b \cup (in[b] - kill_b)$
  • Any solution satisfying equations = Fixed Point Solution (FP)

  • Iterative algorithm
    • initializes out[b] to {}
    • If converges, then it computes Maximum Fixed Point(MFP):
      • MFP is the largest of all solutions to equations
  • Properties:
    • $FP \leq MFP \leq MOP \leq Perfect Solution$
    • FP & MFP are safe (because they are conservative)

Precision

  • If data flow framework is distributive, then if the algorithm converges, IN[b] = MOP[b]

  • Monotone but not distributive: behaves as if there are additional paths

Additional Property to Guarantee Convergence

  • Data flow framework (monotone) converges if there is a finite descending chain

  • For each variable IN[b], OUT[b], consider the sequence of values set to each variable across iterations:

    • if sequence for in[b] is monotonically decreasing
      • sequence for out[b] is monotonically decreasing
    • if sequence for out[b] os monotonically decreasing
      • sequence for in[b] is monotonically decreasing

Speed of Convergence

  • Speed of convergence depends on order of node visits

  • Reverse “direction” for backward flow problems

Reverse Postorder

  • Step 1: depth-first post order
1
2
3
4
5
6
7
8
9
10
11
Visit(n){
  for each successor s that has not been visited
    Visit(s);
    PostOrder(n) = count;
    count = count + 1;
}

main(){
  count = 1;
  Visit(root);
}
  • Step 2: reverse order
1
2
For each node i
  rPostOrder = NumNodes - PostOrder(i);

Depth-First Iterative Algorithm (forward)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
input: control flow graph CFG = (N, E, Entry, Exit)
/* Initialize */
out[entry] = init_value
For all nodes i
    out[i] = T
Change = True

/* Iterate */
While Change {
  Change = False
  For each node i in rPostOrder {
    in[i] = Meet(out[p]) // For all predecessor of i
    oldout = out[i]
    out[i] = Transfer(in[i])
    if oldout != out[i]
      Change = True
  }
}

Speed of Convergence

  • If cycles do not add information
    • information can flow in one pass down a series of nodes of increasing order number:
      • e.g. $1 \rightarrow 4 \rightarrow 5 \rightarrow 7 \rightarrow 2 \rightarrow 4$
    • passes determined by number of back edges in the path
      • Essentially the nesting depth of the graph
    • # iterations = # back edges in any acyclic path + 2

Check List for Data Flow Problems

  • Semi-lattice
    • set of values
    • meet operator
    • top, bottom
    • finite descending chain?
  • Transfer functions
    • function of each basic block
    • monotone
    • distribute?
  • Algorithm
    • initialization step(entry/exit, other nodes)
    • visit order: rPostOrder
    • depth of the graph

Loops

Intuitive properties of a loop

  • single entry point
    • You don’t want to start in the middle of loop
  • edges must form at least a cycle
    • The meaning/purpose of the single entry point

Formal Definitions

  • Dominators
    • Node d dominates node n in a graph (d dom n) if every path from the start node to n goes through d (Informally the lowest common ancestor)
  • Dominators can be organized as a tree
    • a $\rightarrow$ b in the dominator tree iff a immediately dominates b

Dominance

  • x strictly dominates w (x sdom w) iff x dom w and x $\neq$ w

Natural Loops

  • Definitions
    • Single entry-point: header
      • a header dominates all nodes in the loop
    • A back edge is an arc whose head dominates its tail (tail $\rightarrow$ head)
      • a back edge must be a part of at least one loop
    • The natural loop of a back edge is the smallest set of nodes that includes the head and tail of the back edge and has no predecessors outside the set, except for the predecessors of the header

    • Algorithm to find natural loops
      1. Find the dominator relations in a flow graph
      2. Identify the back edges
      3. Find the natural loop associated with the back edge

1. Finding Dominators

  • Definition
    • Node d dominates node n in a graph (d dom n) if every path from the start node to n goes through d (Informally the lowest common ancestor)
  • Formulated as MOP problem:
    • node d lies on all possible paths reaching node n $\rightarrow$ d dom n
    Direction Forward
    Values Set of Basic Blocks
    Meet Operator $\cap$
    Top All BBs
    Bottom $\emptyset$
    In Equation $\cap out[pre[b]]$
    Boundary Condition out[entry] = entry
    Initialization out[B] = T
    Finite descending chain # bbs
    Transfer function $f_b(x) = b \cup {x}$
  • Speed:
    • With reverse postorder, most flow graphs(reducible flow graphs) converge in 1 pass
  • Example

    CFG Dominator Tree
    PRE[1] {entry, 9}
    OUT[1] {1}
    PRE[2] {1}
    OUT[2] {1, 2}
    PRE[3] {1, 2, 8}
    OUT[3] {1, 3}
    PRE[4] {3, 7}
    OUT[4] {1, 3, 4}
    PRE[5] {4}
    OUT[5] {1, 3, 4, 5}
    PRE[6] {4}
    OUT[6] {1, 3, 4, 6}
    PRE[7] {5, 6, 10}
    OUT[7] {1, 3, 4, 7}
    PRE[8] {7}
    OUT[8] {1, 3, 4, 7, 8}
    PRE[9] {8}
    OUT[9] {1, 3, 4, 7, 8, 9}
    PRE[10] {8}
    OUT[10] {1, 3, 4, 7, 8, 10}

2. Finding Back Edges

  • Depth-first spanning tree
    • Edges traversed in a depth-first search of the flow graph form a depth-first spanning tree

  • Categorizing edges in graph
    • Advancing (A) edges: from ancestor to proper descendant
    • Cross (C) edges: from right to left
    • Retreating (R) edges: from descendant to ancestor
  • Algorithm
    1. Perform a depth first search
    2. For each retreating edge t $\rightarrow$ h, check if h is in t’s dominator list
  • Example

3. Constructing Natural Loops

  • The natural loop of a back edge is the smallest set of nodes that includes the head and tail of the back edge and has no predecessors outside the set, except for the predecessors of the header

  • Algorithm

    • Delete h from the flow graph
    • Find nodes that can reach t
    • Those nodes plus h form the natural loop of $t \rightarrow h$

Inner Loop

  • If two loops do not have the same header:
    • they are either disjoint, or
    • one is entirely contained (nested within) the other
      • inner loop: one that contains no other loop
  • If two loops share the same header:
    • Hard to tell which is the inner loop
    • Combine as one

Preheader

  • Optimizations often required code to be executed once before the loop

  • Create a preheader basic block for every loop