Summary

SSA is the same as CPS. The algorithm for finding SSA is the same as finding optimal nesting in CPS. In this algorithm, wherever a variable is defined, we need s for that variable in the dominator frontier of that node. The dominator frontier consists of all nodes who are successors of dominated nodes, but are not themselves dominated.

The core thesis: SSA and lambda calculus are doing the same thing. Imperative languages compile down to basic blocks and flow edges, functional languages compile to lexically nested functions. They’re the same in different notation!

Motivation for SSA

Sometimes we want to find the use-sites of each defined variable, or conversely, the definition-sites of each variable in an expression. The def-use chain is a data structure that holds, for each statement, all use sites of variables defined there, and all definition sites of the variables used there. But if a variable has definitions and uses, this will require pointers! Not good.

Hence, SSA:

§ 6.2.4. Single static assignment (SSA)

Single static assignment (SSA) form is a form of a program with two key attributes:

  • Every variable is only defined once. No variable is redefined or shadowed or mutated, ever.
  • Where two diverging control paths rejoin (e.g., after a branch), we use the function to denote “pick the value based on which branch was taken.”

Example:

a = 2
b = 3
b = 4
if a > b:
	c = 9
else:
	c = 4

becomes

a1 = 2
b1 = 3
b2 = 4
if a1 > b2:
	c1 = 9
else:
	c2 = 4
c = phi(c1, c2)
Link to original

Where to place s?

So how do we convert into SSA? Specifically, how do we rename variables and where do we put s?

One dumb approach: split every variable at every basic-block boundary, and use for every variable in every block:

Example of crude placement

Lot of useless copies… what can we do about that?

Taking inspo from functional programming

We can instead express the program as a set of mutually recursive functions, where each function represents a basic block, and its arguments represents the locals in the program.

In this representation, a basic block might correspond to a function: . Transferring control from basic block to basic block is represented by calling function. This is a form of continuation-passing style.

If, in that function, for instance, that would correspond to two different calls into that assign different values of : and .

When actually writing programs in this style though, normally we don’t define a ton of mutually recursive functions; instead we take advantage of nested scope to reduce repetition:

In this case, since we only ever use one value of , we can just leave that in the outermost scope.

Here’s the key: the best way to nest definitions is the same as the optimal SSA form!

The algorithm for optimal s

We only need a where two different definitions reach the same point. How do we know?

Well, let’s look at blocks 1 and 2, for instance. Only one definition of makes it to block 2, even though it has two inputs. This is because block 1 dominates block 2! All paths to block 2 must pass through block 1, and so the definition of always comes from there.

Additionally, we say that if there’s an edge , and a node that strictly dominates (i.e., does not strictly dominate itself) but not , then is in the dominance frontier of .

Let’s assume that at the start node, we initializing all variables with some definition. If a node defines a variable, then by definition, the nodes in the dominance frontier of are reachable (and could thus inherit variable definitions) from two different places: (by definition), and the start node (since it dominates everything and doesn’t dominate this node).

Thus, the rule is: whenever node defines variable x, all nodes in the dominance frontier of need a -function for .

Because of the equivalence above, we can say that this algorithm converts imperative procedures into well-structured functional programs with nested scope.

What we can learn

For SSA users: one important property of SSA is that a variable definition dominates all uses (or for a use within a -function, dominates a predecessor of the use). This is made implicit in SSA explanations, but is made explicit when we represent SSA as nested scopes—function nesting encodes domination!

For functional programmers: learn how to use flow charts and boxes lmao:

Functional programmers often get lost in the notation of functional programming, which is a shame.