0%

南大《软件分析》05.Data Flow Analysis Foundation I

前言

本文是南京大学李樾、谭添老师的《软件分析》课程的笔记。内容来自课程ppt以及课程讲解。

课程主页:Static Program Analysis | Tai-e (pascal-lab.net)

授课视频:视频南京大学《软件分析》课程01(Introduction)_哔哩哔哩_bilibili

课程实验repo:pascal-lab/Tai-e-assignments: Tai-e assignments for static program analysis (github.com)

Contents

  1. Iterative Algorithm, Another View
  2. Partial Order
  3. Upper and Lower Bounds
  4. Lattice, Semilattice, Complete and Product Lattice
  5. Data Flow Analysis Framework via Lattice
  6. Monotonicity and Fixed Point Theorem
  7. Relate Iterative Algorithm to Fixed Point Theorem
  8. May/Must Analysis, A Lattice View
  9. MOP and Distributivity
  10. Constant Propagation
  11. Worklist Algorithm

Iterative Algorithm, Anthor View

  • Given a CFG (program) with k nodes, the iterative algorithm updates OUT[n] for every node n in each iteration.

  • Assume the domain of the values in data flow analysis is V, then we can define a k-tuple

    $(OUT[n_1], OUT[n_2], …, OUT[n_k])$

    as an element of set $(V_1 × V_2 × … × V_3)$denoted as $V^k$ , to hold the values of the analysis after each iteration

  • Each iteration can be considered as taking an action to map an element of Vk to a new element of Vk , through applying the transfer functions and control-flow handing, abstracted as a function$F: V^k \rightarrow V^k$

  • Then the algorithm outputs a series of k-tuples iteratively until a k-tuple is the same as the last one in two consecutive iterations

Given a CFG (program) with $k$ nodes, the iterative algorithm updates $OUT[n]$ for every node n in each iteration.

Each iteration takes an action$F: V^k \rightarrow V^k$

  • init → $(\bot, \bot …, \bot) = X_0$
  • iter 1 → $(v_1^1, v_2^1, …, v_k^1) = X_1 = F(X_0)$
  • iter 2 → $(v_1^2, v_2^2, …, v_k^2) = X_2 = F(X_1)$
  • iter i → $(v1^i, v_2^i, …, v_k^i) = X_i = F(X{i-1})\because Xi=X{i+1}$
  • iter i+1 → $(v1^i, v_2^i, …, v_k^i) = X{i+1} = F(X{i})\therefore X_i=X{i+1}=F(X_i)$

X is a fied of function F if $X=F(X)$

The iterative algorithm reaches a fixed point

The iterative algorithm (or the IN/OUT equation system) produces a solution to a data flow analysis

  • Is the algorithm guaranteed to terminate or reach the fixed point, or does it always have a solution?
  • If so, is there only one solution or only one fixed point? If more than one, is our solution the best one (most precise)?
  • When will the algorithm reach the fixed point, or when can we get the solution?

Partial Order

Definition

We define poset as a pair (P, ⊑) where ⊑ is a binary relation that defines a partial ordering over P, and ⊑ has the following properties:

  1. ∀x ∈ P, x ⊑ x (Reflexivity)
  2. ∀x, y ∈ P, x ⊑ y ∧ y ⊑ x ⟹ x = y (Antisymmetry)
  3. ∀x, y, z ∈ P, x ⊑ y ∧ y ⊑ z ⟹ x ⊑ z (Transitivity)

Example

Example 1

Is (S, ⊑) a poset where S is a set of integers and ⊑ represents ≤ (less than or equal to)?

  1. Reflexivity 1 ≤ 1, 2 ≤ 2 √
  2. Antisymmetry x ≤ y ∧ y ≤ x then x = y √
  3. Transitivity 1 ≤ 2 ∧ 2 ≤ 3 then 1 ≤ 3 √

Example 2

Is (S, ⊑) a poset where S is a set of integers and ⊑ represents < (less than)?

(1) Reflexivity 1 < 1, 2< 2 ×

Example 3

Is (S, ⊑) a poset where S is a set of English words and ⊑ represents the substring relation, i.e., s1 ⊑ s2 means s1 is a substring of s2?

  1. Reflexivity √
  2. Antisymmetry √
  3. Transitivity √

Example 4

Is (S, ⊑) a poset where S is the power set of set {a,b,c} and ⊑ represents ⊆ (subset)?

  1. Reflexivity √
  2. Antisymmetry √
  3. Transitivity √

Upper and Lower Bounds

Given a poset (P, ⊑) and its subset S that S ⊆ P, we say that u ∈ P is an upper bound of S, if ∀x ∈ S, x ⊑ u. Similarly, l ∈ P is an lower bound of S, if ∀x ∈ S, l ⊑ x.

We define the least upper bound (lub or join) of S, written ⊔S, if for every upper bound of S, say u, ⊔S ⊑ u. Similarly, We define the greatest lower bound (glb, or meet) of S, written ⊓S, if for every lower bound of S, say l, l ⊑ ⊓S.

Usually, if S contains only two elements a and b (S = {a, b}), then ⊔S can be written a ⊔ b (the join of a and b) ⊓S can be written a ⊓ b (the meet of a and b)

Some Properties

  • Not every poset has lub or glb

  • But if a poset has lub or glb, it will be unique

    Proof:

    assume g1 and g2 are both glbs of poset P

    according to the definition of glb

    g1 ⊑ (g2 = ⊓P) and g2 ⊑ (g1 = ⊓P)

    by the antisymmetry of partial order ⊑

    g1 = g2

Lattice, Semilattice, Complete and Product Lattice

Lattice

Given a poset (P, ⊑), ∀a, b ∈ P, if a ⊔ b and a ⊓ b exist, then (P, ⊑) is called a lattice

A poset is a lattice if every pair of its elements has a least upper bound and a greatest lower bound

Example

Example 1

Is (S, ⊑) a lattice where S is a set of integers and ⊑ represents ≤ (less than or equal to)?

The ⊔ operator means “max” and ⊓ operator means “min” √

Example 2

s (S, ⊑) a lattice where S is a set of English words and ⊑ represents the substring relation, i.e., s1 ⊑ s2 means s1 is a substring of s2?

pin ⊔ sin = ? ×

Example 3

Is (S, ⊑) a lattice where S is the power set of set {a,b,c} and ⊑ represents ⊆ (subset)?

The ⊔ operator means ∪ and ⊓ operator means ∩ √

Semilattice

Given a poset (P, ⊑), ∀a, b ∈ P, if only a ⊔ b exists, then (P, ⊑) is called a join semilattice if only a ⊓ b exists, then (P, ⊑) is called a meet semilattice

Complete Lattice

Given a lattice (P, ⊑), for arbitrary subset S of P, if ⊔S and ⊓S exist, then (P, ⊑) is called a complete lattice

All subsets of a lattice have a least upper bound and a greatest lower bound

Example

Example 1

Is (S, ⊑) a complete lattice where S is a set of integers and ⊑ represents ≤ (less than or equal to)?

For a subset $S^+$ including all positive integers, it has no ⊔$S^+$ (+∞)

Example 2

Is (S, ⊑) a complete lattice where S is the power set of set {a,b,c} and ⊑ represents ⊆ (subset)?

Note: the definition of bounds implies that the bounds are not necessarily in the subsets (but they must be in the lattice)

Every complete lattice (P, ⊑) has a greatest element = ⊔P called top and a least element ⊥ = ⊓P called bottom

Every finite lattice (P is finite) is a complete lattice

Product Lattice

Given lattices L1 = (P1, $⊑_1$), L2 = (P2, $⊑_2$), …, Ln = (Pn, $⊑_n$), if for all i, (Pi , $⊑_i$ ) has $⊔_i$ (least upper bound) and $⊓_i$ (greatest lower bound), then we can have a product lattice $L^n = (P, ⊑)$ that is defined by:

  • $P = P_1 × … × P_n$
  • $(x_1,…,x_n)⊑(y_1,…,y_n)\Leftrightarrow(x_1⊑y_1)∧…∧(x_n⊑y_n)$
  • $(x_1,…,x_n)⊔(y_1,…,y_n)=(x_1⊔_1y_1,…,x_n⊔_ny_n)$
  • $(x_1,…,x_n)⊓(y_1,…,y_n)=(x_1⊓ _1y_1,…,x_n⊓ _ny_n)$

  • A product lattice is a lattice

  • If a product lattice L is a product of complete lattices, then L is also complete

Data Flow Analysis Framework via Lattice

A data flow analysis framework (D, L, F) consists of:

  • D: a direction of data flow: forwards or backwards
  • L: a lattice including domain of the values V and a meet ⊓ or join ⊔ operator
  • F: a family of transfer functions from V to V

Data flow analysis can be seen as iteratively applying transfer functions and meet/join operations on the values of a lattice

Monotonicity and Fixed Point Theorem

Monotonicity

A function f: L → L (L is a lattice) is monotonic if ∀x, y ∈ L, x ⊑ y ⟹ f(x) ⊑ f(y)

Fixed-Point Theorem

Given a complete lattice (L, ⊑), if

(1) f: L → L is monotonic and (2) L is finite,

then the least fixed point of f can be found by iterating $f(⊥), f(f(⊥)), …, f^k(⊥)$ until a fixed point is reached the greatest fixed point of f can be found by iterating $f( ), f(f( )), …, f^k( )$ until a fixed point is reached

Prove: Existence of fixed point

Proof:

By the definition of ⊥ and f: L → L, we have $⊥ ⊑ f(⊥)$

As f is monotonic, we have $f(⊥) ⊑ f(f(⊥)) = f^2(⊥)$

By repeatedly applying f, we have an ascending chain $⊥ ⊑ f(⊥) ⊑ f^2(⊥) ⊑ … ⊑ f^i (⊥)$

As L is finite (its height is H), the values are bounded among $⊥ , f(⊥) , f^2(⊥) … f^H(⊥)$

When i > H, by pigeonhole principle, there exists k and j that $f^k(⊥) = f^j (⊥) (assume k < j ≤ H+1)$

Further as fk(⊥) ⊑ … ⊑ fj (⊥), we have $f^{Fix} = f^k(⊥) = f^{k+1}(⊥) = f^j (⊥)$

Thus, the fixed point exists.

Prove: Least Fixed Point

Proof:

Assume we have another fixed point x, i.e., x = f(x)

By the definition of ⊥, we have ⊥ ⊑ x

Induction begins:

As f is monotonic, we have f(⊥) ⊑ f(x)

Assume $f^i (⊥) ⊑ f^i (x)$, as f is monotonic, we have $f^{i+1}(⊥) ⊑ f{i+1}(x)$

Thus by induction, we have $f^i (⊥) ⊑ f^i (x)$

Thus $f^i (⊥) ⊑ f^i (x) = x$, then we have $f^{Fix} = f^k(⊥) ⊑ x$

Thus the fixed point is the least

Now what we have just seen is the property (fixed point theorem) for the function on a lattice. We cannot say our iterative algorithm also has that property unless we can relate the algorithm to the fixed point theorem, if possible