We’re looking at effect type systems as a means to region-based memory management. Effect types describe the observable effects of a computation, while region-based memory management is, in essence, arena allocation.

Value flow by typing with labels

They talk about BL—basically the lambda calculus with fix, bools, and call-by-value semantics. They then introduce TL, where values are tagged with a label variable p. We denote t tagged with p using the syntax t at p. The operation t ! p' returns t back if p matches p', becoming stuck otherwise. Typing rules guarantee that stuckness never happens with well-typed terms.

A TL-term t' is a completion of a BL-term t if t' with its tagging information erased—denoted ⌊t'⌋—is equal to t. A constructor/deconstructor completion (or con/decon completion) is one where every value is tagged on creation, and untagged immediately before it is eliminated (i.e., right before function application or if-condition evaluation).

This form can give us insight into where values flow into each other. For instance, with the following program:

How do we know where fst will be used? We can annotate explicitly!

Subscripts denote tag labels, superscripts denote untag operations.
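To make the tag/untag dynamics concrete, here’s a minimal sketch in Python—not the paper’s formalism, just an illustration where `tag` models `t at p` and `untag` models `t ! p'`:

```python
# A sketch of TL's tag/untag dynamics: values carry a label;
# untagging returns the value if labels match and gets "stuck" otherwise.

from dataclasses import dataclass

@dataclass
class Tagged:
    value: object
    label: str  # the label variable the value was tagged with, e.g. "p1"

def tag(value, label):
    """t at p: tag a value on creation."""
    return Tagged(value, label)

def untag(tagged, label):
    """t ! p': return the underlying value if the labels match, else stuck."""
    if tagged.label != label:
        raise RuntimeError("stuck: label mismatch")  # ruled out for well-typed terms
    return tagged.value

# A con/decon completion tags every value on creation and untags it
# right before elimination:
v = tag(True, "p1")            # an if-condition value, tagged at creation
assert untag(v, "p1") is True  # untagged immediately before the if branches
```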

Correctness

Correctness of TL means evaluating TL terms gives the “same” results as evaluating their underlying BL-terms. This has two parts:

  1. Conditional correctness: TL-terms produce the same results as their underlying BL-terms unless they get stuck. This is a property of the evaluation rules.
  2. Soundness: well-typed TL-terms don’t get stuck. Shown via progress & preservation.

TODO

There’s a proof here; I skipped it.

Inference of value flow info

What if we don’t want to manually annotate this? Can we infer value flow?

Yes, but there are infinitely many con/decon completions for a term. The most useless completion is where we assign every value the same label—any value can be used anywhere. We want the completion with the most distinct labels, since this gives us the most specific information. This is the principal completion—it is unique, and all other completions can be obtained with variable substitution. To infer, we do the following:

  1. Start with a con/decon completion template where each label variable occurs once
  2. To satisfy typing rules, we need a substitution for label variables.
  3. This involves solving equational constraints between template labels
  4. Finding and solving the set of constraints both take linear time.
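The equational-constraint step can be sketched with union-find, which is what gives the (near-)linear-time bound. The variable numbering and constraints below are hypothetical, purely for illustration:

```python
# Step 3 as union-find: solve equational constraints between label variables.

def solve(num_vars, constraints):
    """constraints: pairs (i, j) asserting label variable i = j.
    Returns a substitution mapping each variable to its class representative."""
    parent = list(range(num_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for i, j in constraints:
        parent[find(i)] = find(j)

    return [find(v) for v in range(num_vars)]

# A template with 4 fresh labels; suppose typing forces p0 = p2 and p1 = p3:
subst = solve(4, [(0, 2), (1, 3)])
assert subst[0] == subst[2] and subst[1] == subst[3]
assert subst[0] != subst[1]  # distinct classes stay distinct: most-specific labels
```

Because only the equalities forced by the typing rules are applied, the result keeps as many distinct labels as possible—the principal completion.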

Labels as regions

A label is a memory region. We can add lexically-scoped regions as follows:

Things to note:

  • E-New takes precedence, evaluating the term inside the lexical scope with respect to the region.
  • Once evaluated, E-NewBeta replaces the region with a dummy value, modeling deallocation. This models the intuition that if the final evaluated term still holds a pointer into the region, outside of that region’s scope, that value won’t make any sense!
    • (new p. x at p) is invalid, since when we leave the scope, the region is deallocated, and p becomes dangling.
    • We model this by evaluating (new p. x at p) -> x at INVALID, since p has been deallocated.
  • The semantics are: evaluate region, run item within scope, deallocate region.
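The three-step semantics can be sketched as follows. This is an illustration, not the paper’s rules: `eval_new` plays the role of E-New/E-NewBeta, and `INVALID` is the dummy place:

```python
# Lexical-region dynamics: `new` runs its body with the region live, then
# replaces the region with a dummy place on exit, modeling deallocation.

INVALID = "•"  # dummy place standing for a deallocated region

def eval_new(region, body):
    """new region. body : run body with `region` live, then deallocate."""
    result_value, result_place = body(region)  # evaluate inside the scope
    if result_place == region:                 # result still points into the region:
        result_place = INVALID                 # it dangles once we leave the scope
    return result_value, result_place

# (new p. x at p) evaluates to the value at the dummy place:
val, place = eval_new("p", lambda p: ("x", p))
assert place == INVALID

# A value allocated in an outer region survives:
val, place = eval_new("p", lambda p: ("x", "outer"))
assert place == "outer"
```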

This is unsound as-is: a term that returns a value allocated in p out of new p.’s scope is still well-typed.

Effects

The form of the judgment is:

Γ ⊢ t : τ, φ

where:

  • φ is an effect expression (i.e., an effect)
  • τ is an effect type

meaning: “Under Γ, evaluating term t produces observable effect φ, yielding a value of type τ.”

In a call-by-name (lazy) language, a variable’s type carries a latent effect, since the variable might be an unevaluated thunk, and arrow types likewise carry latent effects for both argument and body. In call-by-value (strict), variables have plain types, and arrow types carry only the latent effect of the function body.

In region-land, our effects are sets of regions that are accessed!

Important notes:

  • Region expressions yield constraints on the effect φ.
  • With lexical scoping, the type of the expression as a whole cannot contain the bound region variable ρ, allowing us to codify at the type level the intuition of “this expression can’t reference ρ once the lexical scope returns.”
  • The latent effect φ on an arrow type can be read as: “applying this function yields effect φ.”
  • Our base bool values don’t make any prescription on what set of effects they type-check under—they may be typed with any effect.
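A tiny checker sketches how effects-as-region-sets compose. The term encoding here is made up for illustration (tuples marking allocations and an arrow’s latent effect):

```python
# Compute the effect (set of regions accessed) of a small term.

def effect(term):
    """Return the set of regions the term touches."""
    kind = term[0]
    if kind == "bool":                # base values touch no region
        return set()
    if kind == "at":                  # ("at", subterm, region): allocate in region
        return effect(term[1]) | {term[2]}
    if kind == "app":                 # ("app", latent_effect, arg_term):
        _, latent, arg = term         # application forces the arrow's latent effect
        return set(latent) | effect(arg)
    raise ValueError(kind)

t = ("app", {"r1"}, ("at", ("bool", True), "r2"))
assert effect(t) == {"r1", "r2"}
# With lexical scoping, `new r. t` is only well-typed if r does not appear
# in t's result type, so r can be dropped from the observable effect.
```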

Region-based memory management

Compile-time management of the heap—no GC required—by annotating which region each pointer is allocated in. With region inference we can even do this automatically, without annotations! Rust does a variation of this!!

A region is a sub-heap containing heap allocated values. The heap is divided into distinct regions.

Versus the tagged language, we have some changes:

  • We’ve added region abstraction—we can now bind a region and pass it in as an argument.
    • This allows for region polymorphism.
    • Since we have lexical region allocation only, region parameters must outlive functions.
    • Region polymorphic recursion means that the body of a recursive function can pass regions to recursive invocations different from the ones passed to itself at the top.
  • Untagging is implicit: see RE-RBeta and RE-Beta.
    • Untagging will get stuck if we try to untag at a deallocated region, since the rules require the place to be a live region ρ, which is separate from the dummy place in the syntax.
    • Additionally, allocation at a non-existent region will get stuck for the same reason: see RE-Clos and RE-RClos.

Deallocated memory can be reused. In other words, if t is a term with some deallocated values, and t' is constructed from t by replacing some of these with new values, then if t doesn’t get stuck, neither does t'.

This satisfies conditional correctness.

The Tofte-Talpin calculus & type system

The Tofte-Talpin (TT) region language is the full, original version of what we’ve been building to. One small difference with the calculus is that it only allows region abstraction in the definition of recursive functions:

letrec f[rho_i...](x) at rho = t1 in ...

One of the distinguishing factors about TT is that it has a type system at all! Back in our land, we adapt the TT language’s type system to work with our own:

h o o h b o y. Let’s go through this:

  • We have type polymorphism like System F, but also effect polymorphism. This allows us to define functions that don’t hardcode a specific set of regions.
    • Map normally has type (α → β) → α list → β list.
    • Taking into account regions, every closure and list type is additionally annotated with the region it lives in, and arrows carry latent effects.
    • But that requires us to lock in a specific selection of regions and effects.
    • We can be polymorphic over those!
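As a sketch of what that looks like, here is map’s region-annotated type in approximated TT-style notation (the exact latent effects depend on the formulation, so treat the effect sets below as illustrative):

```latex
% map's type once regions and effects are tracked, before quantifying:
\mathit{map} :
  \Bigl(
    (\alpha \xrightarrow{\epsilon} \beta,\; \rho_1)
    \rightarrow
    \bigl((\alpha\,\mathit{list},\, \rho_2)
      \xrightarrow{\epsilon \,\cup\, \{\rho_1, \rho_2, \rho_3\}}
    (\beta\,\mathit{list},\, \rho_3)\bigr)
  ,\; \rho_4\Bigr)
% Quantifying over \rho_1 \ldots \rho_4 and \epsilon gives region and
% effect polymorphism: callers aren't locked into one choice of regions.
```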

This system is sound: progress & preservation means well-typed programs don’t get stuck.

We can add extensions, e.g., list datatypes! For lists, we enforce that all cons cells have same region.

Region inference

Like type reconstruction (inference), but for regions! This broadly follows the template algorithm in Inference of value flow info.

We start with an optimistic assumption where we introduce new variables for everything—none of the effects or regions depend on each other. Let’s say a function has type:

We then analyze the expression from the inside out, building a typing tree as we go. These typing judgments collect constraints on our effects.

This tree yields a proof of… something, depending on what we substitute for the template variables. However, some of those variables aren’t mentioned in the conclusion, so they don’t change what we’re proving. Thus, we can remove mentions of them from our constraints.

These constraints imply that ρ4 and ρ5 appear in the final effect. However, they’re not mentioned explicitly in the environment (see the sentence “where … is …”). This means we can add a new ρ4. new ρ5. binding before this expression, making them invisible to the outside of the expression and dropping them from the constraint.

For any effects not connecting to the environment with subset constraints, we quantify over that effect.

We repeat this process until fixpoint.
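The fixpoint step can be sketched as iterating subset constraints to their least solution. The constraint shape and variable names below are hypothetical, just to show the mechanics:

```python
# Solve collected subset constraints by iterating to a fixpoint: each
# constraint says an effect variable contains some regions plus the
# contents of other effect variables.

def solve_effects(constraints):
    """constraints: {var: (set_of_regions, set_of_included_effect_vars)}.
    Returns the least solution mapping each effect var to a region set."""
    solution = {v: set(regions) for v, (regions, _) in constraints.items()}
    changed = True
    while changed:                       # iterate until nothing grows
        changed = False
        for v, (_, included) in constraints.items():
            for w in included:
                if not solution[w] <= solution[v]:
                    solution[v] |= solution[w]
                    changed = True
    return solution

# e1 ⊇ {r1}, and e2 ⊇ {r2} ∪ e1:
sol = solve_effects({"e1": ({"r1"}, set()), "e2": ({"r2"}, {"e1"})})
assert sol["e2"] == {"r1", "r2"}
```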

Alternatives

Lexical scoping for regions is still too limiting. For example, consider the recursive game of life:

We have some issues:

  1. This isn’t tail recursive anymore, because we have to deallocate after returning
  2. The nextgen function has to construct its result in the same region that contains its input: space leak!

How can we fix this?

Region resetting in ML Kit

Add the ability to allocate into a region after first removing all the items that were previously allocated in it. We can infer when to do this with a local liveness analysis—we check whether other values in that region are still live at this point. This works, but it obscures the original algorithm and isn’t obvious.

Aiken-Fähndrich-Levien’s analysis for early deallocation

Change: a region variable introduced by new can be unallocated or deallocated! Dataflow analysis for region variables introduces annotations indicating when a region should be allocated or deallocated.

Imperative regions

Instead of lexical scope, new and release (i.e., deallocation) are statements!

Cyclone

Rust is based on Cyclone’s model! Lifetimes: region ρ1 outlives region ρ2 if the lifetime of ρ1 encompasses that of ρ2. We have region subtyping. Instead of effect variables, Cyclone introduces a regions_of operator, representing the region variables that occur free in a type. This operator is left abstract until the type variable is instantiated, thus emulating effect polymorphism! Here’s the type of map:

Other approaches

Regions with CPS, type system where region references can be stored in data structures in the presence of linear types.