See also: https://arxiv.org/pdf/2012.10086

Dataflow analysis

  • A dataflow framework (read: a dataflow problem) consists of:
    • A direction , forwards or backwards.
    • A semilattice with domain and meet operator .
      • In dataflow analysis, more precise analyses are always “bigger,” since meet is the combination of two paths.
        • We’re going down the lattice as we combine more paths
      • while in abstract interpretation this is the other way around (though in abstraction, this is just convention and isn’t required).
    • A set of transfer functions .
      • For forward, meet on entry, transfer goes from enter to exit.
      • Meet in between statements/blocks, transfer within a block.
      • Monotone if preserves lattice order, distributive if continuous, i.e.,
  • Strategies for computing dataflow
    • Ideal: find meet between all paths from entry that actually reach in real computation
      • Smaller (less precise) solutions are conservative, greater solutions are incorrect.
      • Undecidable
    • Meet-over-paths (MOP): find meet between all paths, even if not executed.
      • No direct algorithm: “all paths” is unbounded with cycles
    • Maximum fixpoint (MFP): algorithm
      • At each node, compute meet, then transfer function. Repeat til fixed point.
      • If distributive, this is MOP. If not, this is less than MOP.
  • Example analyses
    • Liveness: will this variable be used later before modification? Backwards
    • Availability: has this expression been used before as-is? Forwards
    • Reaching definitions: what definitions are relevant here? Forwards
    • Constant propagation: i.e., constant folding. Not distributive.
    • Alias analysis: do two pointers point to same object? See Alias analysis.
  • Optimizations
    • Constant folding: const eval. Constant propagation is just substitution.
    • Dead-code elimination: liveness analysis.
    • Partial-redundancy elimination: don’t repeat expression multiple times.
      • Incorporates loop-invariant code motion, eliminates redundant expressions, computes expressions as late as possible
      • Find anticipated (i.e., very busy) expressions (will x + y always be computed between here and end): backwards dataflow.
      • Find points where each expr is postponable: points where expr where anticipated but not available (i.e., before they’re used). Forward analysis.
      • Eliminate single-use temp var assignment with used (liveness for expressions) analysis. Backwards.
      • Place expr definition accordingly and replace use if needed.
  • Loops
    • Dominator: start to must go through . Post-dominator is converse.
    • Preorder: node before children ltr. Postorder: children ltr, node.
    • Depth-first ordering: reverse postorder of CFG.
    • Depth-first spanning tree: visit CFG in depth-first order, add edge if not seen before.
      • Dataflow algorithm will converge in depth of DFST + 2 iters.
    • Edge of a CFG is:
      • Advancing: if descendant of in DFST
      • Retreating: if opposite
      • Cross: if neither
      • Back edge: dominates . Always retreating.
    • CFG reducible if retreating edges is back edges regardless of DFST build order doesn’t matter.
      • Non-reducible:

2024-07-10 • optimizations and gc (Dragon Book 9: Machine-independent optimization), lattice, inlining

Abstract interpretation

  • Abstract interpretation is any case where we have abstract semantics approximating concrete semantics for analysis purposes.
    • Dataflow analysis, control flow analysis are kinds of abstract interpretation.
  • Defined by two functions:
  • Other stuff
    • Galois insertion: must be injective: one element of per .
    • Composing Galois connections
      • Sequential composition
      • Products (regular, relational: )
      • Monotone functions
      • Direct product: abstraction is element-wise, concretization is join of items
      • Direct tensor product: is to direct product what relational is to independent items. Requires two , similar construction to relational.
  • Hierarchy of semantics. Trace semantics at very bottom (most precise), axiomatic semantics at top.
  • Fixed points
    • Widening to get to an over-approximate fixpoint…
      • Define operation such that
      • Given a chain , the chain eventually stabilizes.
    • …then narrowing to get closer to least fixpoint.
        • gives us something less than , but not less than .
      • For decreasing chains , the chain eventually stabilizes

See 2024-08-07 • control flow analysis & abstract interpretation (Principles of Program Analysis 4: Abstract interpretation), lattice, 2024-07-31 • Justin, prelim targeted - abstract interpretation

Control flow analysis

  • Control flow analysis asks: what functions could this expr evaluate into?
    • A kind of abstract interpretation that works well for functional programs with no program points, nested computation, and first-class functions
    • Dispatch is dynamic. How do we know where control flow goes?
  • Analysis maps from variable or labeled subexpression to lattice:
    • We can divide this into execution environment and abstract environment
    • In CFA case, ,
  • Acceptability relation: iff is an acceptable CFA for .
    • The relation generates constraints.
      • For application, arg flows to parameter, body flows to application. Into function through arg, out through body.
    • Can be solved in with graph approach, with edges between constraints
      • Syntax-directed constraint gen.
      • Generate and constraints, meaning if , and
      • Can also be solved in with fixed-point calculation.
  • Adding dataflow information
    • We add dataflow info alongside our CFA info, mapping from / to dataflow lattice .
    • . Constraints mirror regular CFA constraints, but with .
  • Adding context
    • Different call sites aren’t distinguished from each other.
    • Since bodies flow to applications, all applications inherit analysis of all other applications, since they’re tied to the same body.
    • e.g., let f = (λx -> x) in (f f) (λy -> y)
    • -CFA
      • is context of last function calls.
        • is a context environment, recording context where each var was bound. (i.e., when we analyze lambda introduction)
        • and are a kind of state, changed during analysis
      • Exponential space; 0-CFA is polynomial.
    • -CFA
      • Polynomial time by not allowing captures. All variables bound in same context, so context env is just .
    • Cartesian Product Algorithm
      • In a language with multiple args, apply them all at once.
      • Kind of getting -CFA benefits!

Types of analyses

  • Flow sensitivity: do we take order of execution and control flow into account while executing analysis?
    • “if we have an application , even if our analysis of is empty (i.e., it can’t take on any lambda values), we still perform analysis of anyways.”
  • Path sensitivity: when encountering conditionals, does each branch’s analysis incorporate conditional?
  • Context sensitivity: do we treat function calls differently based on where function is called?

See Principles of Program Analysis 3: Constraint-based analysis.

Alias analysis

  • Points-to analysis: keep track of all heap locations each variable could refer to at each program point. May-alias analysis
    • Forward dataflow with domain , union meet.
    • Can also use constraint-based repr:
      • We can add context to this analysis: domain is now , where is the call-stack (think -CFAs) at the point of allocation.
      • Object sensitivity: context is self
      • Cartesian Product Algorithm: context is self and all arguments. Used for type inference, but slow
      • Can use unification (union-find) to solve. Equality solving.
    • Sucks for big libraries, reflection.
    • Binary decision diagrams: exist, can be used for equivalence checking/SAT solving
  • Access-path tracking: Abstract interpreter that keeps track of tuples of info. must-alias
    • Weak update = maintains old state as part of updated state. Strong update = decisively choose state
    • First, do points-to. Each abstract memory location is an instance key.
    • An access path consists of all of x.y. ... .z.
    • Statements transform abstract tuples: instance key, allocation site has single concrete live object, access paths that must point to instance key, are there other access paths that could point to key, access paths that don’t point to instance key.
    • Good for underapproximate analyses. “This must be an issue, and there might be more”

See 2024-08-07 • control flow analysis & abstract interpretation (Alias Analysis for Object-Oriented Programs)

Interprocedural analysis

  • Build a call graph. But no context sensitivity means dataflow info from diff call sites is confused.
  • Inlining into CFG works, but suffers from exponential space blowup, can’t do recursion.
  • Context-sensitive analyses! See Control flow analysis.
  • (Context-free language) graph reachability a good fit in this context.

See interprocedural analysis, Program analysis via graph reachability.