See also

These notes incorporate several other resources, in order of used-ness:

Additionally, for practice talking about abstract interpretation, see 2024-07-31 • Justin, prelim targeted - abstract interpretation

Abstract interpretation is a general framework

A high-level overview

Before we dive into the textbook, let’s provide a very high-level overview of what abstract interpretation is.

Oftentimes, we have some correctness property we want to prove about our program—for instance, “the program never reaches some bad state $s$.” However, using the concrete semantics of our language—the possible executions of our program—and checking that the property actually holds for our exact program behavior is undecidable.

However, when proving properties about our programs, oftentimes we don’t need to know the exact execution traces of our programs; instead, we can approximate the behavior of our program by recording more general properties about it, and then having our semantics act on those general properties. For instance, if we want to weed out “divides by 0” bugs, we could record the sign or zero-ness of every variable, and propagate this analysis by defining what analysis values should be returned from expressions and function calls. Or, if we care about numeric ranges, we could use interval analysis as our analysis instead.

In other words, we can use an abstract semantics—a conservative over-approximation of our program’s possible execution behavior. If we can prove that our correctness property holds for this over-approximation, then it must hold for the actual program!

This is abstract interpretation, and it provides a general framework for program analyses, just as dataflow analysis does. As we’ve alluded to in the example above, an abstract interpretation consists of two pieces (sketched in code after this list):

  • the abstract domain: the general property that we record in place of the concrete values themselves.
  • the abstract semantics, which defines how programs transform these properties.
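
To make this concrete, here’s a minimal sketch of both pieces for a toy sign domain (all names here, like Sign and abs_plus, are ours, not from any textbook or library):

```python
from enum import Enum

class Sign(Enum):
    NEG = "-"
    ZERO = "0"
    POS = "+"
    TOP = "?"   # unknown sign: safely over-approximates everything

def abs_plus(a: Sign, b: Sign) -> Sign:
    """Abstract semantics of `+` on the sign domain (the abstract domain)."""
    if a == Sign.ZERO:
        return b
    if b == Sign.ZERO:
        return a
    if a == b:           # (+) + (+) = (+),  (-) + (-) = (-)
        return a
    return Sign.TOP      # mixed or unknown signs: we can't say anything

print(abs_plus(Sign.POS, Sign.POS))  # Sign.POS
print(abs_plus(Sign.POS, Sign.NEG))  # Sign.TOP
```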

The upshot

We have two questions: why does this matter, and how does it relate to other static analysis approaches like dataflow analysis and control-flow analysis?

As alluded to in the above section, abstract interpretation is another framework for static analysis, similar to dataflow analysis. In particular, they both use lattices, though in different ways, as you’ll see below!

However, unlike dataflow analysis, abstract interpretation guarantees that our program analysis results are consistent with a program’s semantics! In abstract interpretation, we establish key relationships between the static analysis and the formal semantics. Additionally, we can iteratively build up more complex or more approximate analyses using simpler or more precise analyses, and still maintain guarantees that the analysis agrees with the semantics.

In dataflow analysis, we never establish this, and so we don’t have this guarantee.

A control flow analysis is a form of abstract interpretation!

Danger

Note that denotational semantics and domain theory also use partial orders/posets in their formalizations, but in a completely different way than we use them here. Don’t mix them up!

Abstract interpretations in depth

In the concrete world, a program $p$ might transform a concrete value $v_1$ into $v_2$. If the concrete semantics of our language dictates that $v_1$ evaluates into $v_2$ under $p$, we denote that as:

$$p \vdash v_1 \rightsquigarrow v_2$$

Example: imperative statements

For example, in IMP, a statement transforms one state into another state. In operational semantics land, we might say that a statement and a state eventually evaluate to a new state: $\langle S, \sigma \rangle \to^* \sigma'$. Similarly, for a single statement $S$, we can state the following relation:

$$S \vdash \sigma \rightsquigarrow \sigma'$$

Additionally, for a sequence of statements $S_1; S_2; \ldots; S_n$, we could say:

$$S_1; S_2; \ldots; S_n \vdash \sigma \rightsquigarrow \sigma'$$

We’ll say that these concrete values in our language are drawn from sets $V_1$ (inputs) and $V_2$ (outputs). Since evaluation may be non-deterministic and may not terminate (remember—exact behavior is undecidable!), $\rightsquigarrow$ is a relation between $V_1$ and $V_2$, not a function. Our concrete semantics defines which programs and values are in this relation.

Note that we never enforce $V_1 = V_2$. In functional programs, for instance, this isn’t necessarily the case!

Additionally, we’ll have a program analysis on expressions in our program, with analysis values drawn from lattices $L_1$ and $L_2$. For instance, this might be an interval analysis that says “this expression must be within these two ints.” Maybe it’s a constant propagation analysis that says “this expression must have constant value $c$.” Either way, elements of $L_1$ and $L_2$ are an abstraction of expression behavior—they record some property about our expression.

In our abstract semantics, programs act on a property $l_1 \in L_1$ and return another property $l_2 \in L_2$. To denote this, we write:

$$p \vdash l_1 \triangleright l_2$$

We can define a deterministic function $f_p : L_1 \to L_2$ such that $p \vdash l_1 \triangleright f_p(l_1)$. This family of functions is our abstract semantics: how do we symbolically execute statements/expressions in the abstract domain? In some literature, $f_p$ is called the abstract transformer.

In the context of abstract interpretation, we insist that $L_1$ and $L_2$ are lattices—having both a meet ($\sqcap$) and a join ($\sqcup$) that agree on a definition of partial ordering (the absorption laws: $a \sqcup (a \sqcap b) = a$, $a \sqcap (a \sqcup b) = a$).

In short, this means that we’re creating an abstract interpreter for our language. Instead of defining semantics for how concrete values get transformed into concrete values, we define semantics for how abstract values get transformed into other abstract values. This could be a range if we’re doing interval analysis, or information about the signs of different expressions, or things like that.

To be honest, that’s all you really need to understand the technique. The next sections go through a ton of math, but most of this is used to formalize what we mean by “abstract interpreter,” and thus establish the important correctness properties of abstract interpretation:

  • If we can define a representation function $\beta : V \to L$ that represents concrete values in terms of analysis values, and this function fulfills a certain correctness condition, we know that our analysis will preserve the semantics of our programs!
  • If we have a precise analysis $L$ and a coarser analysis $M$, and they fulfill certain properties with respect to each other, we can establish a correspondence between them, which gives us the ability to move between them and combine analyses with respect to this correspondence.
  • For sound analyses that need to run iteratively to a fixpoint, we can use widening and narrowing to ensure that we get relatively good results in a reasonable amount of time.

Soundness and correctness

Note

For the next few sections, we’re gonna get pretty abstract. Hold on! We’ll bring back concrete examples very soon.

So how do we know that an abstract semantics “corresponds” to a concrete semantics? That is to say, how do we know that if a property holds for the abstract semantics, it also holds for the concrete semantics? This is formalized as soundness (also called correctness in the textbook).

To formalize this, we first define representation functions $\beta_1 : V_1 \to L_1$ and $\beta_2 : V_2 \to L_2$, which map values from the concrete domains $V_1$ and $V_2$ to the best values describing them in the abstract domains $L_1$ and $L_2$, respectively.

We can formulate “correctness” or “soundness” of an analysis like so:

$$(p \vdash v_1 \rightsquigarrow v_2) \;\land\; (p \vdash l_1 \triangleright l_2) \implies \boxed{\,\beta_1(v_1) \sqsubseteq l_1 \implies \beta_2(v_2) \sqsubseteq l_2\,}$$

The most important part of this equation is boxed: if $v_1$ is safely described by $l_1$, then the final value $v_2$ is safely described by $l_2$.

Alternative framing around relations

Alternatively, we can frame correctness in terms of a correctness relation $R : V \times L \to \{\mathit{true}, \mathit{false}\}$, which returns whether a value $v$ is described by the property $l$. This relation must have the following properties:

$$v \mathbin{R} l_1 \;\land\; l_1 \sqsubseteq l_2 \implies v \mathbin{R} l_2$$

$$(\forall l \in L' \subseteq L : v \mathbin{R} l) \implies v \mathbin{R} \left(\sqcap L'\right)$$

Due to the first condition, if $l_1$ approximates $v$ and the analysis is only able to compute an upper approximation $l_2 \sqsupseteq l_1$, then $l_2$ is also an approximation of $v$. Therefore, in the property lattice, if $l_1 \sqsubseteq l_2$, then $l_1$ is more precise than $l_2$ ($l_2$ approximates all the values approximated by $l_1$ and possibly some other ones).

If a value $v$ is approximated by more than one abstract value, the second condition allows us to deterministically select a single abstract value to use in our analysis: we take the smallest (i.e., most precise) of them all, $\sqcap \{l \mid v \mathbin{R} l\}$, which the second condition guarantees is itself a valid approximation of $v$. This condition excludes correctness relations where a value is approximated by two incomparable abstract values, and allows us to work with a single analysis instead of considering one analysis for each candidate abstract value.
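
One standard way to get such a relation is to generate it from a representation function: $v \mathbin{R} l \iff \beta(v) \sqsubseteq l$. A tiny sketch for the sign domain (names ours):

```python
def sign(n: int) -> str:
    return "-" if n < 0 else ("0" if n == 0 else "+")

def beta(v: int) -> set:
    """Representation function: the best (smallest) property describing v."""
    return {sign(v)}

def R(v: int, l: set) -> bool:
    """Correctness relation generated by beta: v R l iff beta(v) ⊑ l."""
    return beta(v) <= l   # ⊑ on a powerset domain is just subset inclusion

print(R(5, {"0", "+"}))   # True: 5 is safely described by "non-negative"
print(R(-5, {"0", "+"}))  # False
```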

On precision and the meaning of partial orderings

Notice that when discussing lattices here, we say $l_1 \sqsubseteq l_2$ to mean that $l_1$ is more precise than $l_2$. However, some literature in dataflow has this flipped—the greater value $l_2$ would be the better, more precise analysis. For instance, this is the case in the Dragon book.


For more information, see the note on lattices, which covers some of these differences and links to different references of lattices.

Galois connections

Sometimes, if we have a complex analysis lattice $L$, we want to replace it with a simpler lattice $M$. Instead of performing the analysis in $L$, we find a description of the elements of $L$ in $M$, then perform the analysis in $M$. We can define their relationship in terms of two functions:

  • An abstraction function $\alpha : L \to M$ provides a representation of $L$ elements as $M$ elements.
  • A concretization function $\gamma : M \to L$ expresses the meaning of elements of $M$ as elements of $L$.

There are two equivalent ways of formalizing the notion that these are comparable analyses.

  • In the first approach, we say $(L, \alpha, \gamma, M)$ is a Galois connection iff $\alpha$ and $\gamma$ are monotone and the following holds: $\gamma \circ \alpha \sqsupseteq \mathit{id}_L$ and $\alpha \circ \gamma \sqsubseteq \mathit{id}_M$.
      • Applying and undoing abstraction ($l \sqsubseteq \gamma(\alpha(l))$) will yield something at most as strong as the original. It might be weaker (which is okay, since we don’t want abstraction to introduce spurious strength).
      • Applying and undoing concretization ($\alpha(\gamma(m)) \sqsubseteq m$) will yield something at least as strong as the original.
    • In both cases, the operations may lose precision—we might not get back what we started with, but we can get back a safe approximation.
  • Alternatively, we say $(L, \alpha, \gamma, M)$ is an adjunction iff $\alpha(l) \sqsubseteq m \iff l \sqsubseteq \gamma(m)$.
    • $\alpha$ and $\gamma$ should respect the orderings of $L$ and $M$.
    • If $m$ safely describes the abstraction $\alpha(l)$, the concrete version $\gamma(m)$ should over-approximate $l$.

Example

We can describe a set of integers $S \in \mathcal{P}(\mathbb{Z})$ as an interval $[a, b]$. The abstraction function returns the smallest interval that contains all the integers in the set, and the concretization function gets the set of all integers within the interval.

Note that this relies on a partial ordering of intervals, defined as $[a_1, b_1] \sqsubseteq [a_2, b_2]$ iff $[a_1, b_1]$ is entirely inside $[a_2, b_2]$ (i.e., $a_2 \le a_1$ and $b_1 \le b_2$). Remember that in abstract interpretation, lesser analyses are more precise!
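
Here’s a runnable sketch of this interval connection (names ours; we restrict $\gamma$ to finite intervals so it stays computable):

```python
def alpha(s: set) -> tuple:
    """Abstraction: the smallest interval [lo, hi] containing the set."""
    return (min(s), max(s))

def gamma(iv: tuple) -> set:
    """Concretization: every integer inside the interval (finite case only)."""
    lo, hi = iv
    return set(range(lo, hi + 1))

s = {1, 3, 7}
print(alpha(s))              # (1, 7)
print(s <= gamma(alpha(s)))  # True: l ⊑ γ(α(l)) — we only lose precision
```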

To relate it back to our discussion of program analyses above, we can define Galois connections using extraction functions! Given an extraction function $\eta : V \to L$, a program analysis can be seen as a coarser description of sets of possible values, thus giving rise to the Galois connection $(\mathcal{P}(V), \alpha, \gamma, L)$:

…where abstraction identifies the least upper bound of all the representations of the values in a set:

$$\alpha(V') = \bigsqcup \{\eta(v) \mid v \in V'\}$$

…and concretization correspondingly finds all the concrete values whose representation is less than the abstract value:

$$\gamma(l) = \{v \mid \eta(v) \sqsubseteq l\}$$

Soundness in Galois connections

A worked example: sign analysis

As an example, for integer expressions, we might want to figure out if an integer expression could be positive, negative, or zero. For instance, if we see x = y + z, and we know y is positive and z is non-negative, what can we say about the sign of x? This is the analysis discussed in the Isil slides (though with less precision than we’re doing here)!

In this case, let’s say we have two analyses. The first analysis (where $L = \mathcal{P}(\mathbb{Z})$) yields the set of possible integers that every expression could evaluate to. The second analysis (also a powerset: $M = \mathcal{P}(\mathbf{Sign})$, where $\mathbf{Sign} = \{-, 0, +\}$) gives a set of the possible signs that an expression could have. We record a set of signs so that we can say things like “x is non-negative”—$\{0, +\}$. Thus, our Galois connection is:

$$(\mathcal{P}(\mathbb{Z}), \alpha, \gamma, \mathcal{P}(\mathbf{Sign}))$$

So how might we define $\alpha$, $\gamma$, and the extraction function? Well, to start, it’s pretty easy to define a function $\mathit{sign} : \mathbb{Z} \to \mathbf{Sign}$ that gets the sign of a number. In general, it’s pretty common to have a Galois connection between two powersets $\mathcal{P}(V)$ and $\mathcal{P}(D)$ and an extraction function $\eta : V \to D$ that gets us from one underlying set to the other. In this case, we can define our functions as follows:

$$\alpha(V') = \{\eta(v) \mid v \in V'\} \qquad \gamma(D') = \{v \mid \eta(v) \in D'\}$$

What this means is that the abstraction function just collects $\eta(v)$ for each value $v$ in its input set, and the concretization function gets every value whose extraction is in the input set.

In the sign example, we’d read this as: abstraction gets the signs of all of the values in the input set, and concretization gets every integer with a sign contained in our sign set.
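
A sketch of this construction with $\eta = \mathit{sign}$ (names ours; $\gamma$ is cut off at a finite universe so the demo actually runs):

```python
def sign(n: int) -> str:
    """The extraction function eta: get the sign of an integer."""
    return "-" if n < 0 else ("0" if n == 0 else "+")

def alpha(ints: set) -> set:
    """alpha(V') = { eta(v) | v in V' } — join in a powerset is union."""
    return {sign(n) for n in ints}

def gamma(signs: set, universe=range(-100, 101)) -> set:
    """gamma(D') = { v | eta(v) in D' }, cut to a finite universe for demo."""
    return {n for n in universe if sign(n) in signs}

print(alpha({-3, 5, 8}))        # {'-', '+'}
print(alpha({0, 1}))            # {'0', '+'} — "non-negative"
print(0 in gamma({"0", "+"}))   # True
```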

Extra properties of Galois connections

There are some properties we can establish about Galois connections:

  • $\alpha$ is completely additive; $\gamma$ is completely multiplicative.
    • $\alpha(\bigsqcup L') = \bigsqcup \{\alpha(l) \mid l \in L'\}$, $\alpha(\bot) = \bot$. $\alpha$ preserves least upper bounds—lots of joins—gets less precise (remember that we’re backwards vs. dataflow here).
    • $\gamma(\bigsqcap M') = \bigsqcap \{\gamma(m) \mid m \in M'\}$, $\gamma(\top) = \top$. $\gamma$ preserves greatest lower bounds—lots of meets—gets more precise.
  • We can define $\alpha$ and $\gamma$ in terms of each other!
    • Let’s say we’ve already defined $\gamma$ somewhere.
    • Start from $\alpha(l) = \sqcap \{m \mid \alpha(l) \sqsubseteq m\}$ and substitute by adjunction: $\alpha(l) = \sqcap \{m \mid l \sqsubseteq \gamma(m)\}$. This defines $\alpha$ using $\gamma$.
      • If we had left the condition written as $\alpha(l) \sqsubseteq m$, the definition would be circular—the adjunction is what lets us eliminate $\alpha$ from the right-hand side!
    • Same in the other direction: $\gamma(m) = \bigsqcup \{l \mid \alpha(l) \sqsubseteq m\}$.

Strengthening Galois connections with Galois insertions

In a Galois connection, $\gamma$ doesn’t have to be injective or one-to-one: multiple elements of $M$, when concretized, might map to the same element of $L$. This is because in the definition of a Galois connection, we only guaranteed that $\alpha(\gamma(m)) \sqsubseteq m$. Since multiple inputs to $\gamma$ can map to the same output, $\alpha(\gamma(m))$ may or may not be the same value as the original $m$!

In a Galois insertion, we now require:

$$\alpha \circ \gamma = \mathit{id}_M$$

thus guaranteeing that $\gamma$ is injective. Thus, $M$ doesn’t contain superfluous elements that describe $L$. This also means that $\alpha$ is surjective—for every possible value $m \in M$, there exists some value $l \in L$ where $\alpha(l) = m$. Finally, from this, we know that $\gamma$ is an order-embedding: $\gamma(m_1) \sqsubseteq \gamma(m_2)$ iff $m_1 \sqsubseteq m_2$.

Note that if we’re using an extraction function $\eta$, since $\alpha$ just applies $\eta$ to all the values in the set, $\alpha$ is surjective if and only if $\eta$ is!

Alternatively, if we have a reduction operator $\varsigma : M \to M$ that picks the representative for each element, we can use that. This reduction operator just gets the smallest/most precise value among equivalent values (values with the same concretization). This operation is defined as:

$$\varsigma(m) = \sqcap \{m' \mid \gamma(m') = \gamma(m)\}$$

If we have an extraction function, we can get the reduction operator by keeping only the properties that are actually the extraction of some value—which works out to $\varsigma = \alpha \circ \gamma$:

$$\varsigma(D') = \{\eta(v) \mid v \in V,\ \eta(v) \in D'\}$$

Composing Galois connections

The power of defining all this structure is that we can define simpler analyses, prove the core properties of Soundness and correctness on those analyses, then compose them together to get more complex analyses. Additionally, if we can show correspondences between more precise and more approximate analyses, we can move between precise and approximate analyses too, all while maintaining our guarantees about soundness and correctness!

Functionally (i.e., sequentially)

If $(L, \alpha_1, \gamma_1, M)$ and $(M, \alpha_2, \gamma_2, N)$ are Galois connections, we can compose them together sequentially! $(L, \alpha_2 \circ \alpha_1, \gamma_1 \circ \gamma_2, N)$ is also a Galois connection. This is useful for moving to a coarser analysis.
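
A tiny sketch of sequential composition (names ours): compose the int → sign abstraction with a coarser sign → zero-ness abstraction:

```python
def sign(n: int) -> str:
    return "-" if n < 0 else ("0" if n == 0 else "+")

def zeroness(s: str) -> str:
    """A coarser extraction on signs: only track zero vs. nonzero."""
    return "0" if s == "0" else "!0"

def alpha1(ints: set) -> set:     # P(Z) -> P(Sign)
    return {sign(n) for n in ints}

def alpha2(signs: set) -> set:    # P(Sign) -> P({0, !0})
    return {zeroness(s) for s in signs}

# The abstraction of the composed connection is alpha2 ∘ alpha1:
print(alpha2(alpha1({-2, 5})))    # {'!0'} — definitely nonzero
```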

Independent attributes

A product! We have a Galois connection $(L_1 \times L_2, \alpha, \gamma, M_1 \times M_2)$ where we apply $\alpha_i$ and $\gamma_i$ element-wise:

$$\alpha(l_1, l_2) = (\alpha_1(l_1), \alpha_2(l_2)) \qquad \gamma(m_1, m_2) = (\gamma_1(m_1), \gamma_2(m_2))$$

This can lose us precision. For instance, let’s say we have an expression (x, -x). This has possible values $\{(z, -z) \mid z \in \mathbb{Z}\}$. If we use a product $\mathcal{P}(\mathbb{Z}) \times \mathcal{P}(\mathbb{Z})$ to represent pairs of integers, and we have a sign analysis $\mathcal{P}(\mathbf{Sign}) \times \mathcal{P}(\mathbf{Sign})$, the most precise description of this expression is now $(\{-, 0, +\}, \{-, 0, +\})$, meaning we lose all information about the relative signs of the two components. This is a tuple of sets, not a set of tuples. (A small demo follows.)
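
Here’s that demo (names ours), abstracting the concrete values of (x, -x) for x ∈ {-1, 1} both ways:

```python
def sign(n: int) -> str:
    return "-" if n < 0 else ("0" if n == 0 else "+")

pairs = {(-1, 1), (1, -1)}                       # a set of tuples (relational)

# Relational abstraction keeps the correlation between components:
relational = {(sign(a), sign(b)) for (a, b) in pairs}
print(relational)                                # {('-', '+'), ('+', '-')}

# Independent attributes: abstract each component separately — a tuple of
# sets, which also (spuriously) admits (+, +) and (-, -):
independent = ({sign(a) for a, _ in pairs}, {sign(b) for _, b in pairs})
print(independent)                               # ({'-', '+'}, {'-', '+'})
```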

Relational method

If we have $(\mathcal{P}(V_1), \alpha_1, \gamma_1, \mathcal{P}(D_1))$ and $(\mathcal{P}(V_2), \alpha_2, \gamma_2, \mathcal{P}(D_2))$, the relational method lets us do products within the powerset: $(\mathcal{P}(V_1 \times V_2), \alpha, \gamma, \mathcal{P}(D_1 \times D_2))$. If we have extraction functions $\eta_i : V_i \to D_i$, we can define:

$$\alpha(S) = \{(\eta_1(v_1), \eta_2(v_2)) \mid (v_1, v_2) \in S\} \qquad \gamma(S') = \{(v_1, v_2) \mid (\eta_1(v_1), \eta_2(v_2)) \in S'\}$$

This is precise enough to have a sign analysis of (x, -x): we can represent its value exactly as $\{(-, +), (0, 0), (+, -)\}$!

Total function space

If $(L, \alpha, \gamma, M)$ is a Galois connection and $S$ is a set, we have $(S \to L, \alpha', \gamma', S \to M)$, where $\alpha'(f) = \alpha \circ f$ and $\gamma'(g) = \gamma \circ g$: apply the input function, returning an $L$ or $M$ element respectively, and then use $\alpha$ and $\gamma$ from the original definition to get to where we need to go.
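
For instance, a sketch of lifting the int → sign abstraction pointwise over states $\mathbf{Var} \to \mathcal{P}(\mathbb{Z})$ (names ours):

```python
def sign(n: int) -> str:
    return "-" if n < 0 else ("0" if n == 0 else "+")

def alpha(ints: set) -> set:
    """The int -> sign abstraction from before."""
    return {sign(n) for n in ints}

def alpha_state(state: dict) -> dict:
    """alpha'(sigma) = alpha ∘ sigma: abstract every variable pointwise."""
    return {var: alpha(vals) for var, vals in state.items()}

print(alpha_state({"y": {1, 2}, "z": {0, 3}}))
# {'y': {'+'}, 'z': {'0', '+'}}
```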

Monotone function space

Note

Remember when we said we’d deal with different lattice value types? Here we are :)

If we have $(L_1, \alpha_1, \gamma_1, M_1)$ and $(L_2, \alpha_2, \gamma_2, M_2)$ as Galois connections, we can get the Galois connection $((L_1 \to L_2), \alpha, \gamma, (M_1 \to M_2))$ on monotone functions:

$$\alpha(f) = \alpha_2 \circ f \circ \gamma_1 \qquad \gamma(g) = \gamma_2 \circ g \circ \alpha_1$$

Basically, to turn an $L_1 \to L_2$ into an $M_1 \to M_2$, apply $\gamma_1$ to turn the new $M_1$ input into an $L_1$, apply $f$, then apply $\alpha_2$ to go from $f$’s $L_2$ output to $M_2$. The same applies in the other direction for $\gamma$.

Other combinations

  • Direct product: abstraction applies both $\alpha_i$ to the same element; concretization is the meet of the items: $\gamma(m_1, m_2) = \gamma_1(m_1) \sqcap \gamma_2(m_2)$.
  • Direct tensor product: is to the direct product what the relational method is to independent attributes. Requires two extraction functions; similar construction to relational.
  • Reduced product and reduced tensor product: define a reduction operator to get a Galois insertion.

Taking a step back

Okay. That was. A ton of theory. Let’s take a step back and come up for some air. What does all of this theory get us? How can we express the intuition of abstract interpretation using the precise mathematical tools we now have?

Squaring with Isil Dillig’s lecture slides

As we alluded to at the start of this chapter, in an imperative language like IMP, we can interpret a statement as a function from state to state. A state is just a mapping from variables to concrete values: $\mathbf{State} = \mathbf{Var} \to \mathbb{Z}$. This is expressed in the notation of this chapter like so:

$$S \vdash \sigma \rightsquigarrow \sigma'$$

In the Isil Dillig lecture notes, our abstract semantics maps abstract states to abstract states, where abstract states map variables to abstract values:

$$\widehat{\mathbf{State}} = \mathbf{Var} \to A$$

We might express this relationship in our notation as:

$$S \vdash \hat{\sigma} \triangleright \hat{\sigma}'$$

where $\hat{\sigma}, \hat{\sigma}' \in \mathbf{Var} \to A$ for some lattice of abstract analysis values $A$.

Now, let’s say we want to do sign analysis, as the lecture slides do. Thus, our analysis values are $A = \mathcal{P}(\mathbf{Sign})$. And let’s say we want to perform analysis on a statement x = y + z. The concrete semantics of that statement are as follows:

$$\sigma' = \sigma[x \mapsto \sigma(y) + \sigma(z)]$$

Now we want to go about defining an analysis such that we can intuit the sign of x based on the signs of y and z. In other words, we want to define an interpreter that takes some abstract state and returns a new state of that same type.

Let’s first start by looking at plus expressions themselves. The semantics of a plus expression is to add together a pair of numbers:

$$+ : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$$

So how can we get an analysis of the sign of $y + z$?

Well, we can start by lifting numbers into a very precise analysis that records sets of numbers: $\mathcal{P}(\mathbb{Z})$. The lecture slides do this implicitly.

Notice how the lecture slides refer to the concrete meaning of our abstract domain as sets of integers, not just integers! We do this lifting with our representation function $\beta$. This function just wraps the number in a set: $\beta(n) = \{n\}$. Note that a representation function like this always generates a Galois connection! Our concrete semantics is also a kind of program analysis: the most concrete, precise possible analysis!

Now, since we’re operating on sets of numbers, we can define a precise analysis of $+$ that says: given the possible input values to $+$, what are the possible output values?

$$S_1 \mathbin{\hat{+}} S_2 = \{n_1 + n_2 \mid n_1 \in S_1,\ n_2 \in S_2\}$$
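
In code, this lifted addition is a one-liner (names ours):

```python
def hat_plus(s1: set, s2: set) -> set:
    """The most precise analysis of +: all sums of one element from each set."""
    return {n1 + n2 for n1 in s1 for n2 in s2}

print(hat_plus({1, 2}, {0, 10}))   # {1, 2, 11, 12}
```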

Now, the Galois connection in the lecture slides corresponds to what we worked on in A worked example: sign analysis. We defined a function $\mathit{sign} : \mathbb{Z} \to \mathbf{Sign}$, returning the sign of an integer. This gives us the Galois connection $(\mathcal{P}(\mathbb{Z}), \alpha, \gamma, \mathcal{P}(\mathbf{Sign}))$, letting us get the set of signs in a set of integers.

Additionally, using the relational method, we can get the Galois connection $(\mathcal{P}(\mathbb{Z} \times \mathbb{Z}), \alpha, \gamma, \mathcal{P}(\mathbf{Sign} \times \mathbf{Sign}))$.

With these different pieces, we can compose together everything we need to get a sign-approximate analysis of $+$. Using the monotone function space, we can compose together our different functions:

$$S'_1 \mathbin{+^\sharp} S'_2 = \alpha(\gamma(S'_1) \mathbin{\hat{+}} \gamma(S'_2))$$
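
And here’s a runnable sketch of the composed analysis (names ours; we stand in for $\mathbb{Z}$ with a finite universe so $\gamma$ is computable—fine for a demo, not for a real analyzer):

```python
def sign(n: int) -> str:
    return "-" if n < 0 else ("0" if n == 0 else "+")

UNIVERSE = range(-50, 51)  # finite stand-in for Z, to keep gamma computable

def alpha(ints: set) -> set:
    return {sign(n) for n in ints}

def gamma(signs: set) -> set:
    return {n for n in UNIVERSE if sign(n) in signs}

def hat_plus(s1: set, s2: set) -> set:
    return {n1 + n2 for n1 in s1 for n2 in s2}

def abs_plus(signs1: set, signs2: set) -> set:
    """+♯ = alpha ∘ +̂ ∘ (gamma × gamma): concretize, add, re-abstract."""
    return alpha(hat_plus(gamma(signs1), gamma(signs2)))

# y positive, z non-negative  =>  y + z is positive:
print(abs_plus({"+"}, {"0", "+"}))   # {'+'}
```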

The benefit of abstract interpretation is that at every step of the way, we know that this analysis is valid with respect to our concrete semantics!

Further up, our analysis for the statement x = y + z can call into $+^\sharp$.

Finally, remember that our actual analysis is a mapping from abstract states to abstract states: $(\mathbf{Var} \to \mathcal{P}(\mathbf{Sign})) \to (\mathbf{Var} \to \mathcal{P}(\mathbf{Sign}))$. We can lift the concrete $\mathbf{Var} \to \mathbb{Z}$ into our precise analysis $\mathbf{Var} \to \mathcal{P}(\mathbb{Z})$ with the Galois connection between total function spaces. We can do this again to get to $\mathbf{Var} \to \mathcal{P}(\mathbf{Sign})$, our full abstract environment.

Types of analyses

So what can we do with abstract interpretation? A ton.

  • Relational domains
    • Karr’s domain: linear equalities between variables (e.g., $x = 2y + z$)
    • Octagon domain: constraints of the form $\pm x \pm y \le c$
    • Polyhedra domain: constraints of the form $a_1 x_1 + \cdots + a_n x_n \le c$
  • Rounding error analysis (egglog, Herbie)
  • Interval analysis (also egglog, Herbie)
  • Symbolic analysis
  • Strictness analysis: will a parameter in a (lazy) function call definitely be evaluated? Does this call terminate?
  • Aliasing and pointer analysis
  • Escape analysis: does data lifetime exceed scope?
  • Heap analysis: memory leak detection, etc.
  • Control flow analysis

With respect to MOP in dataflow

The dataflow world has something called a meet-over-paths (MOP) solution, where we combine (meet) the results of applying transfer functions along every program path, as opposed to the iterative fixpoint solution. In the abstract interpretation world, we specify static analysis algorithms as approximations of program semantics, and use our soundness/correctness rules to ensure that these are valid approximations.

The abstract interpretation fixpoint solution is the same as the MOP solution for distributive abstract interpretations—when abstract transformers (i.e., abstract interpreters) preserve union.

Danger

I don’t know what “disjunctive completion” means, but this is on this page about abstract interpretation:

Otherwise, the abstract domain used for the fixpoint solution can be minimally enriched into its disjunctive completion to get exactly the “MOP solution” in fixpoint form (on the completed abstract domain)

Fixed points

As we’ve elaborated above, we express the relation that our abstract interpreter transforms a state $l$ into a new state $l'$ when executing program $p$ with the notation $p \vdash l \triangleright l'$. Simultaneously, we denote this as $f_p(l) = l'$ for a monotone function $f_p$ dependent on the program $p$.

However, for recursive or iterative program constructs, we want to obtain the least fixed point, $\mathit{lfp}(f)$, as the result of a finite iterative process. In denotational semantics and domain theory, we do this by calling $f$ over and over again, starting with $\bot$; this is denoted $\bigsqcup_n f^n(\bot)$. Because of Tarski’s fixed point theorem, we have a guarantee that the least fixed point exists, and that the least pre-fixed point, $\sqcap \{l \mid f(l) \sqsubseteq l\}$, is equal to $\mathit{lfp}(f)$.

However, equating this iteration with $\mathit{lfp}(f)$ relies on $f$ being continuous, which we didn’t guarantee! In general, if our lattice has infinite height, we can’t make the same guarantee. And practically, when implementing this in the real world, the iteration might just take too damn long to converge!

Thus, we don’t know if this infinite sequence stabilizes, and even if it does, whether that is truly the least fixed point of the function.

Some notation

Before we move on, let’s establish some notation.

Since a function can have many different fixed points—values where applying the function returns the same thing—we denote the set of all these fixed points as $\mathit{Fix}(f)$:

$$\mathit{Fix}(f) = \{l \mid f(l) = l\}$$

We can also define two other sets: $\mathit{Red}(f)$, where the function is reductive and results in a value smaller according to the partial order:

$$\mathit{Red}(f) = \{l \mid f(l) \sqsubseteq l\}$$

…and $\mathit{Ext}(f)$, where the function is extensive and results in a value larger according to the partial order:

$$\mathit{Ext}(f) = \{l \mid l \sqsubseteq f(l)\}$$

The different possible fixed points for , and how they relate to each other, is summarized in this diagram:

Note that the greatest lower bound of $\mathit{Fix}(f)$ is equal to that of $\mathit{Red}(f)$; it’s $\mathit{lfp}(f)$.

If we started with $\top$, we would be starting with a hella over-approximation! Not precise enough.

Widening

Okay. So we can’t say that $(f^n(\bot))_n$ eventually stabilizes, and we can’t say that if it does, the stabilizing point is $\mathit{lfp}(f)$. And sometimes stabilizing just takes way too long. What do we do?

At a broad level, we first calculate an over-approximation of a fixed point that we know will converge. We can then later iteratively improve this approximation toward the least fixed point. To calculate this over-approximation, we introduce a widening operator $\nabla : L \times L \to L$. This operator has two properties that will be helpful for us:

  • $l_1 \sqcup l_2 \sqsubseteq l_1 \mathbin{\nabla} l_2$
    • It must provide an “upper bound” on $l_1$ and $l_2$.
    • We don’t require $\nabla$ to be monotone, commutative, associative, or absorptive.
    • It literally just has to return something bigger than $l_1$ and $l_2$.
    • We don’t guarantee that $l_1 \mathbin{\nabla} l_2 = l_1 \sqcup l_2$!
  • Given an ascending chain $(l_n)_n$, the chain $(l_n^\nabla)_n$ defined by $l_0^\nabla = l_0$ and $l_{n+1}^\nabla = l_n^\nabla \mathbin{\nabla} l_{n+1}$ eventually stabilizes.
    • Essentially, build the next item of the chain by $\nabla$-ing with the previous item.
    • Basically, we widen! At each point in the chain, the next item has to be at least as big as that point in the original chain, as well as every previous item in the new chain.
    • Think: $[0,1], [0,2], [0,3], \ldots$ becomes $[0,1], [0,\infty], [0,\infty], \ldots$
    • Forcing items to become bigger!

Normally, to show that $(l_n^\nabla)_n$ eventually stabilizes for some definition of $\nabla$, you assume it doesn’t and do a proof by contradiction.

Example

We can define a simple widening operator for intervals:

$$[a_1, b_1] \mathbin{\nabla} [a_2, b_2] = \big[\ \text{if } a_2 < a_1 \text{ then } {-\infty} \text{ else } a_1,\quad \text{if } b_2 > b_1 \text{ then } \infty \text{ else } b_1\ \big]$$

Wherever the second interval pokes outside the first, extend that bound to infinity; otherwise, keep the first interval’s bound.
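
A sketch in code (names ours; None stands for $\bot$):

```python
INF = float("inf")

def widen(a, b):
    """a ∇ b on intervals (lo, hi); None stands for bottom (no info yet)."""
    if a is None:
        return b
    if b is None:
        return a
    lo = -INF if b[0] < a[0] else a[0]   # lower bound dropped: jump to -inf
    hi = INF if b[1] > a[1] else a[1]    # upper bound grew: jump to +inf
    return (lo, hi)

print(widen((0, 1), (0, 2)))   # (0, inf) — upper bound was growing
print(widen((0, 5), (1, 4)))   # (0, 5)   — second inside first: keep first
```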

With this operator, we can define $f^\nabla$, the version of iterating $f$ that uses our widening operation:

$$f^\nabla_n = \begin{cases} \bot & \text{if } n = 0 \\ f^\nabla_{n-1} & \text{if } f(f^\nabla_{n-1}) \sqsubseteq f^\nabla_{n-1} \\ f^\nabla_{n-1} \mathbin{\nabla} f(f^\nabla_{n-1}) & \text{otherwise} \end{cases}$$

Essentially, if applying $f$ again would cause the value to shrink or stay the same, that’s our final result. Otherwise, we call the widening operator on the previous value and the new value. We know that $(f^\nabla_n)_n$ will stabilize at some point $f^\nabla_m$, since by the definition of $\nabla$ we know that repeated widening calls will eventually stabilize (there’s a proof on page 226).
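
And a runnable sketch of this loop for intervals (names ours; here $f$ models a loop counter that starts at 0 and only increments):

```python
INF = float("inf")

def widen(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return (-INF if b[0] < a[0] else a[0], INF if b[1] > a[1] else a[1])

def leq(a, b):
    """Interval ordering: a ⊑ b iff a is inside b (None = bottom)."""
    if a is None:
        return True
    if b is None:
        return False
    return b[0] <= a[0] and a[1] <= b[1]

def f_nabla(f, bottom=None):
    """Iterate f from bottom, widening until f(x) ⊑ x (a post-fixed point)."""
    x = bottom
    while True:
        fx = f(x)
        if leq(fx, x):      # applying f again wouldn't grow x: done
            return x
        x = widen(x, fx)

# f models `i = 0; while ...: i += 1`; its least fixed point is [0, inf)
def f(iv):
    if iv is None:
        return (0, 0)
    lo, hi = iv
    return (min(lo, 0), hi + 1)

print(f_nabla(f))   # (0, inf) — a safe over-approximation of lfp(f)
```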

We know that $f(f^\nabla_m) \sqsubseteq f^\nabla_m$—otherwise, the second clause wouldn’t have kicked in and we wouldn’t have stabilized—so $f^\nabla_m \in \mathit{Red}(f)$. And since Tarski’s theorem states $\mathit{lfp}(f) = \sqcap \mathit{Red}(f) \sqsubseteq f^\nabla_m$, we take this value as our safe approximation of $\mathit{lfp}(f)$.