A program dependence graph (PDG) is an alternative representation of a program, where nodes are statements or conditional predicates, and edges are drawn between nodes that depend on each other, either through data or through control.
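As a small, hypothetical illustration (the statements below are invented for this sketch, not taken from the original example), consider:

```python
# Hypothetical snippet, annotated with the PDG edges it would produce.
x = 7          # S1
y = x * 2      # S2: data dependence S1 -> S2 (uses x)
if y > 10:     # S3: data dependence S2 -> S3 (uses y)
    z = y + 1  # S4: control dependence S3 -> S4, data dependence S2 -> S4
else:
    z = 0      # S5: control dependence S3 -> S5
print(z)       # S6: data dependences S4 -> S6 and S5 -> S6
```

Why build this representation at all? A few motivations: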
- Auto-parallelization.
- This representation is unlike a control flow graph, since for nodes whose concrete values or execution don’t depend on each other, we don’t enforce any sort of ordering on them!
- Thus, this representation only encodes necessary data dependence/sequencing, exposing opportunities for parallel computation!
- This was the original motivation for this development
- Improving performance of program optimization.
- Lots of program transformations can be computed with a single walk of these dependences, since computationally relevant parts of the program are directly connected.
- Also, unlike a data dependence graph, we also expose control dependencies (e.g., an edge from a conditional predicate to the statements whose execution it controls).
- Thus, transformations like vectorization and code motion that rely on knowing both data and control dependencies are easily handled!
- Allowing for incremental dataflow analysis.
- This representation allows for an incremental dataflow algorithm that permits incremental optimization as dataflow information is updated.
The program dependence graph
When actually implementing this, we construct two sub-graphs: a graph for control dependence and a graph for data dependence. We can then combine the edges of these two graphs to get our full PDG.
Control dependence
We define control dependence in terms of a control flow graph and dominators! Remember that we say a node dominates another if every path from ENTRY to the second passes through the first; likewise, a node post-dominates another if every path from that other node to EXIT passes through the first.
Given two nodes $A$ and $B$, we say $B$ is control dependent on $A$ if $B$ post-dominates all nodes in any path from $A$ to $B$ (not including $A$ and $B$ themselves), but $B$ doesn’t post-dominate $A$.
Basically, $A$ is a branch point that decides whether $B$ executes: taking one edge out of $A$ guarantees that $B$ runs, while taking another may bypass it.
A diagram of this
Here’s a diagram from CSE 231 @ UCSD, FA12’s lecture slides that makes this clear:
Building a control dependence graph
Identifying control dependence
To actually build this, we first build a post-dominator tree (parent immediately post-dominates children) using a modified CFG, where we add a node ENTRY with a `T` edge to START and an `F` edge to STOP. To get the post-dominator tree, we build the dominator tree for the reverse CFG, which is just a dataflow problem! (See the notes above.)
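A minimal sketch of that dataflow formulation, assuming a dict-based CFG (node → list of successors) where every node has a path to the exit; this isn’t the paper’s algorithm, just the iterative fixed-point version:

```python
# Post-dominators of n = {n} ∪ intersection of post-dominators of n's successors.
def post_dominators(cfg, exit_node):
    nodes = set(cfg)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for n in nodes - {exit_node}:  # assumes each such n has >= 1 successor
            new = {n} | set.intersection(*(pdom[s] for s in cfg[n]))
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom

def immediate_post_dominator(pdom, n):
    # The immediate post-dominator is the strict post-dominator of n that
    # every other strict post-dominator of n also post-dominates.
    strict = pdom[n] - {n}
    return next(c for c in strict if all(o in pdom[c] for o in strict))

# Example: diamond CFG  A -> B, A -> C, B -> D, C -> D
cfg = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
pdom = post_dominators(cfg, "D")
assert immediate_post_dominator(pdom, "A") == "D"
```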
Let $S$ be the set of CFG edges $(A, B)$ such that $B$ does not post-dominate $A$ (equivalently, $B$ is not an ancestor of $A$ in the post-dominator tree).
Example
As an example, here’s an unmodified CFG for a program:
If we build the post-dominator tree for this program, we get:
Building the set $S$ from this, we have: the intuition is that $S$ consists of all edges $(A, B)$ where $B$ may or may not be executed depending on the execution of $A$. Note that an edge isn’t in $S$ when its target post-dominates its source: either the source goes directly to the target, or it’ll go through the rest of program execution and end up there eventually.
Each edge $(A, B) \in S$ is then processed as follows:
- If $A$ is an ancestor of $B$ in the post-dominator tree, we mark all nodes on the path up from $B$ to $A$ (including $A$ and $B$) as being control dependent on $A$.
  - This occurs when we have loops: when $A$ is the last node in a loop body, and loops back to $B$ depending on a condition. $B$ comes first in the CFG, but $A$ post-dominates it.
  - The post-dominator tree is kind of backwards: STOP is always on top!
- Otherwise, we mark all nodes on the path up from $B$ to the parent of $A$ (not including that parent) as being control dependent on $A$.
  - This parent node is guaranteed to be the least common ancestor of $A$ and $B$.
    - We know the least common ancestor must be an ancestor of $A$; if it weren’t, it wouldn’t be able to be an ancestor of $B$, since TODO.
    - It must be the parent of $A$, since that is $A$’s immediate post-dominator.
Note
Note that above, we stated by definition that $B$ can’t post-dominate $A$ (that’s exactly why the edge is in $S$). But we never said anything about the other way around!
This first step takes $O(|E| \cdot |N|)$ time in the worst case:
- Post-dominator tree construction depends on the algorithm used; $S$ is then determined in $O(|E|)$ time.
- Each edge in $S$ takes time proportional to the height of the post-dominator tree to walk; worst case, the post-dominator tree is a straight line, meaning $O(|N|)$ time per edge.
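Putting the pieces together, a rough sketch of the identification step (it reuses the hypothetical `post_dominators` and `immediate_post_dominator` helpers from the earlier sketch, and omits the true/false edge labels):

```python
def control_dependences(cfg, exit_node):
    """Return a dict mapping A -> set of nodes control dependent on A (sketch)."""
    pdom = post_dominators(cfg, exit_node)
    ipdom = {n: immediate_post_dominator(pdom, n) for n in cfg if n != exit_node}
    cd = {n: set() for n in cfg}
    # S = all CFG edges (A, B) where B does not post-dominate A.
    s_edges = [(a, b) for a in cfg for b in cfg[a] if b not in pdom[a]]
    for a, b in s_edges:
        # Walk up from B in the post-dominator tree, stopping at A's parent
        # (or at A itself in the loop case, where A post-dominates B).
        stop = a if a in pdom[b] else ipdom[a]
        runner = b
        while runner != stop:
            cd[a].add(runner)
            runner = ipdom[runner]
        if stop == a:
            cd[a].add(a)  # loop case: A ends up control dependent on itself
    return cd
```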
Adding region nodes
Now, let’s say based on the analysis above, we have a bunch of nodes with the same control dependences: say, they all hang off the false edge of the same predicate node.
Thus, every region node corresponds to a set of control dependences, each of which is a node potentially labeled as true or false.
This is a kind of common subexpression elimination: if multiple different nodes have the same set of control dependences, we factor out those control dependences into a single region node that gets pointed to instead. Additionally, because of this, predicates only have two successors, like in a CFG: the true case and the false case, instead of “all of the trues” and “all of the falses.”
Eventually, our goal is to create a bunch of region nodes, where each region node’s successors are exactly the nodes that share its set of control dependences. To create these region nodes, let’s look at our control dependences and perform the following steps:
- Examine every node $N$ that has other than a single unlabeled control dependence predecessor. This node has a set of control dependence predecessors $CD$.
  - We do a postorder traversal over the post-dominator tree, visiting children left-to-right, then parents.
    - This means linear runtime with respect to the number of nodes.
  - Remember that the sets of control dependences of ancestors in the post-dominator tree don’t necessarily have a subset relationship with respect to those of their children!
- Extract $CD$ into its own region node $R$. $R$’s predecessors are now $CD$, and $N$’s predecessor is $R$.
  - We keep a map from sets of CDs to region nodes.
- Check the intersection $INT$ between $CD$ and each of $N$’s children’s CDs (children in the post-dominator tree):
  - If $INT = CD$, we can make the child just depend on $R$ instead.
  - If $INT$ equals the child’s set of control dependences, replace whatever those dependences are in $CD$ with the child’s predecessor.
    - After processing a node, it will only have one predecessor.
    - And children are processed first!
- Set $R$’s successors to all nodes whose control dependences are exactly $CD$.
  - It’s assigned $CD$ as its predecessors, and its successors are all the nodes that are control dependent on exactly this set.
- Finally, for every predicate node, we want to make sure it only has one false edge and one true edge.
  - If there are multiple false edges, create a new region node, have the false edge point to that, and have all the former false successors hang off that new region node instead.
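Here’s a heavily simplified sketch of just the factoring idea: one region node per distinct set of labeled control dependences, keyed by a frozenset. (The postorder traversal and the intersection-based refinement above are intentionally omitted, and the data layout is assumed for illustration.)

```python
def add_region_nodes(cd_preds):
    """cd_preds: node -> frozenset of (controlling_node, label) pairs."""
    region_for = {}  # map from a CD set to its region node name
    edges = []       # resulting CDG edges as (src, dst, label)
    for node, cds in cd_preds.items():
        if len(cds) == 1 and next(iter(cds))[1] is None:
            src, label = next(iter(cds))
            edges.append((src, node, label))  # single unlabeled pred: leave as-is
            continue
        region = region_for.setdefault(cds, f"R{len(region_for)}")
        edges.append((region, node, None))
    # each region node takes over the dependences it factored out
    for cds, region in region_for.items():
        for src, label in cds:
            edges.append((src, region, label))
    return edges
```

Nodes that share the same set of control dependences now share a single region node, which is exactly the CSE analogy above.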
Example
The left image shows the whole process up until the last “Finally” step. The right image shows the completed process.
Approximating control dependence
Danger
I haven’t fully understood this part. I don’t fucking get it. Do another pass at some point.
It’s difficult to recreate the CFG from the control dependence graph above, which is important for source-to-source translators and generating sequential code. For instance, a region node’s set of successors can correspond to more than one valid ordering of those nodes in a CFG.
Instead, we’re gonna create an approximate control dependence graph. This graph has two properties:
- It’s much better suited for generating efficient control flow
- For “structured programs” (i.e., no gotos?), this is equivalent to the actual control dependence graph.
- This paper was written in 1987. This was a concern they had back then
- Even when it’s not, it’s still better than the CFG, lmao
At a high level, the issue with our previous version of CDGs was what information region nodes encoded. Before, region nodes stood in for a set of conditional predicates that were true at that point; they had a single predecessor—usually? or always? a true/false edge from a predicate—and all of its successors were items where the conditional predicates were at least what was contained in this region node. It created hierarchical structure in the CDG.
However, this says nothing about how its descendants should be ordered when going back to a CFG! To ameliorate this, we want regions to hold some extra information. Specifically, they can now have two kinds of control dependence successors:
- First, a region node can point to the entry node of a hammock, a subgraph within the CFG that has single entry and exit nodes.
- A hammock is a subgraph within the graph with associated entry and exit nodes.
- Its properties, formally (see the checker sketch after this list): a hammock $H$ is a subgraph of the CFG $G$ with an entry node $V \in H$ and an exit node $W \notin H$ such that:
  - All edges from $G - H$ to $H$ go to the entry node $V$.
  - All edges from $H$ to $G - H$ go to the exit node $W$.
- More informally:
- All edges into the hammock must go to the entry node.
- All edges out of the hammock must go to the exit node.
- Basically, the hammock is like an island in the control flow graph, where all inputs go through the entry, and all outputs go through the exit
- The exit is considered not part of the hammock.
- If $A \in H$ and $B \notin H$, $B$ isn’t control dependent on $A$.
  - If $B$ is the exit node $W$, it necessarily post-dominates $A$.
  - Otherwise, we know all paths from $A$ to EXIT must go through the exit node $W$ (so $W$ post-dominates $A$). If $B$ were to be control dependent on $A$, $B$ would post-dominate $W$. This would mean $B$ post-dominates $A$, which is a contradiction.
- Note that hammocks can have sub-hammocks!
- These edges will always point to ordinary (non-region) nodes.
- Second, region nodes can have a number of exit edges.
- These are like exit edges of a basic block; they can optionally be labeled with true/false.
- After we do the translation, these edges will point to other regions.
- Additionally, each region now contains state about the subhammocks (all nodes and edges) it “corresponds” to.
- This correspondence can be exploited for generating code from the control dependence graph, as we’ll see below
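To make the hammock properties above concrete, here’s a small property checker (a sketch over the same assumed dict-based CFG; `nodes` is the candidate hammock, `entry` is inside it, `exit_node` is outside it):

```python
def is_hammock(cfg, nodes, entry, exit_node):
    """Check the two hammock edge properties for a candidate node set."""
    nodes = set(nodes)
    assert entry in nodes and exit_node not in nodes
    outside = set(cfg) - nodes
    for a in cfg:
        for b in cfg[a]:
            if a in outside and b in nodes and b != entry:
                return False  # an edge into the hammock that skips the entry
            if a in nodes and b in outside and b != exit_node:
                return False  # an edge out of the hammock that skips the exit
    return True
```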
This approximation is computed by repeatedly transforming the CFG in place:
- First, we detect hammocks in the control flow graph.
- Second, add a new region node and a new edge from it to the start of the CFG.
- Region nodes now contain a list of “subhammocks” as state: this is initialized to the empty set.
- Fourth, apply some transformations to turn the CFG into a control dependence graph!
- Process in inverse topological order: end-to-beginning.
- These transformations do two things:
- Move some nodes/subhammocks around.
- Add the subhammock to the region node involved.
- Type 1: make an exit node with no non-hammock predecessors a predecessor of the hammock’s entry node $V$.
  - Because we’re going in inverse topological order, the exit node is guaranteed to be a region.
  - We also add the hammock into the state of the region node being pointed to ($V$ is the entry node of the hammock).
- Type 2: for an exit node with other predecessors, create a region node with the subhammock as its state and an exit edge to the exit node.
We can exploit this construction for efficient code generation:
- To generate code for a hammock, generate code corresponding to its region nodes in topological order of control dependencies.
- To generate code for a region, generate code for the “corresponding” set of hammocks of that region in topological order of inter-hammock data dependencies.
Data dependence
Prerequisite: what is an "upwards exposed use"
The text makes mention of this without defining it. Here’s a slide from some Georgia Tech notes:
This is also known as a reachable use: a use of a variable in a block that isn’t preceded by a definition of that variable within the same block, so its value must come from a definition that reaches the block from outside. It’s kind of like availability, but it’s not “has this been already referenced.”
To figure out data dependence, we make a DAG for each basic block during program parsing. Each upwards exposed use in a block corresponds to a DAG leaf node, called a merge node. We then do reaching definition analysis, and connect reaching definitions to their use sites. Thus, a merge node represents the set of reaching definitions at that point—all the predecessors are all the reaching definitions.
The concept is pretty simple, but there are some complications for common language features:
- I/O operations act on an implicit file object.
- Array operations are assumed to act on the entire array.
- We can try doing vectorization, etc. by rephrasing all loop indices as operations on a loop index from 1 to the size of the array
- Check dependence analysis
- Aliasing and pointers complicate things, often requiring interprocedural analysis
Def-use chains
We can also define data dependence in terms of two sets: $DEF(S)$ (variables defined in this statement) and $REF(S)$ (variables referenced in this statement). If a variable in $DEF(S_1)$ is later used in $REF(S_2)$, there’s a flow/data dependence from $S_1$ to $S_2$. This is transitive: if $S_1 \to S_2$ and $S_2 \to S_3$, we say there is a def-use chain from $S_1$ to $S_3$. See https://piazza.com/class_profile/get_resource/hy7enxf648g7me/i2qodoub2q73x for more.
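As a rough sketch of how DEF/REF sets turn into dependences within a single straight-line block (a simplification of the per-block DAG construction above; the statement representation is assumed, and cross-block reaching definitions are ignored):

```python
def flow_dependences(stmts):
    """stmts: list of (DEF set, REF set) per statement, in execution order."""
    edges, upward_exposed = [], set()
    last_def = {}  # variable -> index of its most recent definition in the block
    for i, (defs, refs) in enumerate(stmts):
        for v in refs:
            if v in last_def:
                edges.append((last_def[v], i))  # local def-use (flow) edge
            else:
                upward_exposed.add((v, i))      # fed by reaching defs from outside
        for v in defs:
            last_def[v] = i
    return edges, upward_exposed

# e.g. for the block:  x = 1; y = x + z; x = y
stmts = [({"x"}, set()), ({"y"}, {"x", "z"}), ({"x"}, {"y"})]
print(flow_dependences(stmts))  # ([(0, 1), (1, 2)], {('z', 1)})
```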
These dependences can be categorized as loop-carried or loop-independent.
- Loop-carried data dependences are caused by loop iterations; e.g., a definition from one iteration is referenced during the next iteration.
- Loop-independent data dependences occur because of execution order, regardless of how many times any loop is iterated.
Note that there can be self-edges in a data dependence graph!
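A tiny made-up loop that shows all three situations in one place:

```python
# Hypothetical example: loop-carried vs. loop-independent dependences, plus a self-edge.
a = [0] * 10
s = 0
for i in range(1, 10):
    a[i] = a[i - 1] + 1  # loop-carried: a[i - 1] was written by the previous iteration
    t = a[i] * 2         # loop-independent: a[i] was written earlier in the same iteration
    s = s + t            # self-edge: this statement reads the value it wrote last time
print(s)
```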
Practical use cases
Okay. What can we use a PDG for?
- Detecting parallelism.
- Any node not in a strongly connected region of both data and control dependencies can be vectorized.
- i.e., if some nodes don’t depend on one another through either data or control, they can be vectorized!
- With only a data dependence graph, you have to do “if-conversion” to turn the control structure of the program into data dependencies, which makes it difficult to recover the original control structure afterwards.
- See Dragon §11.8 for more details.
- Node splitting
- We purposefully duplicate nodes to enable better parallelism, less IPC.
- Here’s an example control flow graph and associated PDG:
- Here, `X = P`. We could duplicate the `X`, `D`, and `E` on both branches of `P` and eliminate `X`.
  - But this isn’t clear in the CFG.
    - Either we have to duplicate `A`, `B`, and `C` going all the way down, or we have to try and move `X` earlier in the control flow graph.
- In the PDG, successors of a region node only need to be ordered in data-dependency order when running.
- We immediately know that `X` can follow `P`, since there’s no data dependency between them.
  - Thus, we know we can move it up earlier in the CFG, and then do scalar propagation to eliminate `X` on both branches.
- Code motion
- We can do loop-invariant code motion faster, and with arbitrary control flow!
- In general, we can move whatever wherever as long as we don’t change the PDG!
- Loop fusion
- We can join loops if:
- They’re executed under same conditions
- There’s no data dependence between loops
- They’re executed the same number of times
- The first two are trivially checkable in a PDG.
- The last is also accomplished by checking the hierarchical PDG for, e.g., whether `do` loops iterate the same number of times.
- Program slicing
- A slice shows a set of statements that influence the value of a variable at a particular program point.
- We need to see what data and control flow dependencies lead to this variable!
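A backward slice is just reverse reachability over the PDG’s dependence edges; a minimal sketch (adjacency representation assumed, not from the paper):

```python
from collections import deque

def backward_slice(pdg_preds, criterion):
    """pdg_preds: node -> set of nodes it data- or control-depends on.
    Returns every node that can influence `criterion`."""
    seen, work = {criterion}, deque([criterion])
    while work:
        node = work.popleft()
        for dep in pdg_preds.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                work.append(dep)
    return seen
```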
Incremental data flow update & optimization
Compilers will often transform source programs for optimization purposes. When we say incremental dataflow updates and incremental optimization, we mean incremental in this sense: updating in response to the compiler’s own transformations of the program, not to arbitrary edits by the programmer. The former is much more limited in scope; the latter is not.
Basically, we’re describing ways to incrementally update dataflow information while doing compiler optimizations. Additionally, we can do incremental optimization: applying additional optimizations during an incremental dataflow update. Here, we describe how this algorithm works with respect to two specific optimizations.
Branch deletion
Sometimes a compiler can delete a conditional branch that it determines will never be taken. If a def-use path goes through this deleted branch, we can only delete the corresponding data dependence edge if no other def-use path avoids that branch.
Our incremental solution first adds output dependence edges to the PDG: treating a definition as a use of a variable. Next, we update our control dependencies:
- Anything in the region pointed to by the “predicate is false” edge is deleted.
- The original predicate computation node is deleted.
- Everything that used to be in the “predicate is true” region is placed in a region dependent on P’s CDs.
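A rough sketch of these three steps over a dict-based control dependence map (the representation is assumed for illustration, and nested regions inside the false branch are ignored):

```python
def fold_always_true_branch(cd_preds, pred, true_region, false_region):
    """cd_preds: node -> set of control dependence predecessors (regions/predicates)."""
    # 1. Anything in the region pointed to by the "predicate is false" edge is deleted.
    for node in [n for n, preds in cd_preds.items() if false_region in preds]:
        del cd_preds[node]
    cd_preds.pop(false_region, None)
    # 2. The original predicate computation node is deleted; remember its own CDs.
    pred_cds = cd_preds.pop(pred, set())
    # 3. Everything in the "predicate is true" region now depends on P's CDs instead.
    for preds in cd_preds.values():
        if true_region in preds:
            preds.discard(true_region)
            preds |= pred_cds
    cd_preds.pop(true_region, None)
    return cd_preds
```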
Antidependence
There is also the concept of an antidependence edge: the reverse of a regular data dependence, where we add an edge from use to definition.
Next, we incrementally update our dataflow information. This is hard to explain in the abstract, so we’re gonna walk through an example. Here’s a short pseudocode program:
B = ???     // 1
if P then   // 2
  A = ???   // 3
  B = ???   // 4
else        // 5
  A = ???   // 6
endif       // 7
??? = A     // 8
??? = B     // 9
And here’s the corresponding PDG for this program:
First, note that we’ve added output dependence edges, labeled with a circle: `B` is defined at both 1 and 4, and we want to enforce that 4 comes after 1. Now, assuming the true branch is always taken, we need to delete the predicate and its regions as described above; ENTRY would then point straight to nodes 1, 3, and 4.
So how can we formalize this intuition? Well first, some notation:
- Let $cp(N)$ be the control predecessor of a node $N$, and let $lca(A, B)$ be the least common ancestor of $A$ and $B$ in the depth-first spanning tree of the CFG.
And now, here’s the formalization:
- First, let $D_2$ be a node in the “predicate is true” region, and consider pairs $(D_1, D_2)$ where $D_2$ is output dependent on $D_1$ and both have a data dependence successor $U$.
  - If $lca(cp(D_1), cp(D_2)) = cp(D_2)$, we have an issue: this implies $D_1$ can reach $U$ without being killed along the way, which is incorrect; delete the edge $D_1 \to U$.
  - How does this apply to our current situation?
    - $cp(1)$ is ENTRY. After the update, $cp(4)$ is also ENTRY. Both 1 and 4 point to 9 (data dependence).
    - This implies we can get to 9 through either just 1, or through 1 and then 4; i.e., the graph claims 1’s definition can still reach 9.
    - We should delete the edge $1 \to 9$, since $4 \to 9$ should be the only path.
There’s another case we have to consider too! For instance, see:
loop                 // 1
  if P then A = ???  // 2
  ??? = A            // 3
  A = ???            // 4
endloop              // 5
If `P` is always true, the definition on line 2 will always kill the definition on line 4 (with respect to the use on line 3). The dependence from the line-2 definition is loop-independent, because it holds regardless of the number of iterations. The dependence from the line-4 definition is loop-carried: the definition carries across loop iterations. Here, the rule becomes:
- Let $D_2$ be a node in the “predicate is true” region, and consider pairs $(D_1, D_2)$, where $D_2$ is output dependent on $D_1$, and both have a data dependence successor $U$.
  - If $lca(cp(D_1), cp(D_2)) = cp(D_2)$, and the edge $D_2 \to U$ is a loop-independent data dependence edge, but $D_1 \to U$ is a loop-carried data dependence edge, $D_2$ should prevent $D_1$ from reaching $U$. Delete $D_1 \to U$.
One last wrinkle: in the general case, several definitions along several paths might kill another definition. To solve this, we create a pseudo-definition for a variable in a region, and a corresponding dependence edge, if all paths in that region have that definition and that dependence.
In this CFG and PDG example, we can group the assignments to the same variable along the different paths under a single pseudo-definition.
We add pseudo-definitions at a region if some successor has a definition. Additionally, we propagate backwards up until we reach a least common ancestor of:
- A definition in the altered “predicate is true” region
- Another definition of the same variable that the previous one is output dependent on
During propagation, we only examine definitions that are output dependent on the definition in the “predicate is true” region.
Loop peeling & unrolling
Loop peeling is one step of loop unrolling. We can move dependences to point to the peeled iteration. Additionally, if you have weird GOTO stuff happening, you can reduce control dependencies into the loop:
To modify PDG, add output dependence from peeled iteration to loop. Loop body’s control dependences are only on the loop predicate; all others moved to peeled iteration. Data dependences within loop body updated.
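A small, made-up before/after illustration of peeling the first iteration (not the paper’s example):

```python
# Before peeling: a simple counted loop.
def before(xs):
    total = 0
    for i in range(len(xs)):
        total += xs[i]
    return total

# After peeling: iteration 0 runs straight-line, and the loop starts at 1.
# PDG dependences that targeted the first iteration can now point at the
# peeled copy instead of at the loop body.
def after(xs):
    total = 0
    if len(xs) > 0:  # peeled first iteration (guarded: the loop might be empty)
        total += xs[0]
    for i in range(1, len(xs)):
        total += xs[i]
    return total

assert before([1, 2, 3]) == after([1, 2, 3]) == 6
```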
Incremental dependence update requires an additional case to deal with labels and gotos:
- If $D_2$ is output dependent on $D_1$, and both have $U$ as a data dependence successor:
  - If the edge $D_2 \to U$ is loop-independent, $D_2$ should prevent $D_1$ from reaching $U$. Remove $D_1 \to U$.