We’re looking at effect type systems as a means to region-based memory management. Effect types describe important effects of computation, while region-based memory management refers to arenas basically.
Value flow by typing with labels
They talk about BL—basically fix
, bools, and call-by-value semantics. They then introduce TL, where values are tagged with a label variable t at p
. The operation t ! p'
returns back
We denote
This form can give us insight into where values flow into each other. For instance, with the following program:
How do we know where fst
will be used? We can annotate explicitly!
Subscripts denote tag labels, superscripts denote untag operations.
Correctness
Correctness of TL means evaluating TL terms gives the “same” results as evaluating their underlying BL-terms. This has two parts:
- Conditional correctness: TL-terms produce same results as BL-terms unless they get stuck. Property of eval rules
- Soundness: TL-terms don’t get stuck. Progress & preservation.
TODO
There’s a proof here; I skipped it.
Inference of value flow info
What if we don’t want to manually annotate this? Can we infer value flow?
Yes, but there are infinitely many con/decon completions for a term. The most useless completion is where we assign every value the same label—any value can be used anywhere. We want the completion with the most distinct labels, since this gives us the most specific information. This is the principal completion—it is unique, and all other completions can be obtained with variable substitution. To infer, we do the following:
- Start with a con/decon completion template where each label variable occurs once
- To satisfy typing rules, we need a substitution for label variables.
- This involves solving equational constraints between template labels
- Finding and solving the set of constraints both take linear time.
Labels as regions
A label is a memory region. We can add lexically-scoped regions as follows:
Things to note:
- E-New takes precedence, evaluating the term inside the lexical scope with respect to the region.
- Once evaluated, E-NewBeta replaces the region with a dummy value, modeling deallocation. This models the intuition that if the final evaluated term still holds a pointer into the region, outside of that region’s scope, that value won’t make any sense!
(new p. x at p)
is invalid, since when we leave the scope, the region is deallocated, andp
becomes dangling.- We model this by evaluating
(new p. x at p) -> p at INVALID
, sincep
has been deallocated.
- The semantics are: evaluate region, run item within scope, deallocate region.
This is unsound.
Effects
The form of the judgment is:
where:
is an effect expression (i.e., effect) is an effect type
meaning: “Under
In a call-by-name (lazy) language, a variable has type
In region-land, our effects are sets of regions that are accessed!
Important notes:
- Region expressions yield constraints on
. - With lexical scoping, the expression as a whole cannot contain
, allowing us to codify at the type level the intuition of “this expression can’t reference once the lexical scope returns” - The arrow type can be read as: “if evaluating
results in effect , evaluating the lambda expression as a whole yields effect .” - Our base bool values don’t make any prescription on what set of effects they type-check under—it may produce any effect.
Region-based memory management
Compile-time management of the heap, no GC required, by annotating which regions a pointer is allocated in With region inference we can even do this manually without annotations! Rust does a variation of this!!
A region is a sub-heap containing heap allocated values. The heap is divided into distinct regions.
Versus the tagged language, we have some changes:
- We’ve added region abstraction—we can now bind a region and pass it in as an argument.
- This allows for region polymorphism.
- Since we have lexical region allocation only, region parameters must outlive functions.
- Region polymorphic recursion means that the body of a recursive function can use pass in regions to recursive invocations different than the ones passed to itself at the top.
- Untagging is implicit: see RE-RBeta and RE-Beta.
- Untagging will get stuck if we try to untag with
, since the rules require the place to be , which is separate from in the syntax. - Additionally, allocation at a non-existent region will get stuck for the same reason: see RE-Clos and RE-RClos.
- Untagging will get stuck if we try to untag with
Deallocated memory
This satisfies conditional correctness.
The Tofte-Talpin calculus & type system
The Tofte-Talpin (TT) region language is the full, original version of what we’ve been building to. One small difference with the calculus is that it only allows region abstraction in the definition of recursive functions:
letrec f[rho_i...](x) at rho = t1 in ...
One of the distinguishing factors about TT is that it has a type system at all! Back in our land, we adapt the TT language’s type system to work with our own:
h o o h b o y. Let’s go through this:
- We have type polymorphism like System F, but also effect polymorphism
. This allows us to define functions that don’t specifically hardcode the set of regions. - Map normally has type
- Taking into account regions:
- But that requires us to lock in to a specific selection of
. - We can be polymorphic over that!
- Map normally has type
This system is sound: progress & preservation means well-typed programs don’t get stuck.
We can add extensions, e.g., list datatypes! For lists, we enforce that all cons cells have same region.
Region inference
Like type reconstruction (inference), but for regions! This broadly follows the template algorithm in Inference of value flow info.
We start with an optimistic assumption where we introduce new variables for everything—none of the effects or regions depend on each other. Let’s say a function
We then analyze the expression from the inside out, building a typing tree as we go. These typing judgments collect constraints on our effects.
This tree yields a proof of… something, depending on what we subsitute for
These constraints imply that new rho4, rho 5.
statement before this expression, making them invisible to the outside of the expression and dropping them from the constraint.
For any effects not connecting to the environment with subset constraints, we quantify over that effect.
We repeat this process until fixpoint.
Alternatives
Lexical scoping for regions is still too limiting. For example, consider the recursive game of life:
We have some issues:
- This isn’t tail recursive anymore, cause we have to deallocate after returning
- The
nextgen
function has to construct its result in the same region that contains its input: space leak!
How can we fix this?
Region resetting in ML Kit
Adding the ability to allocate, but to remove all the items that were previously allocated in that region. We can infer when to do this with a local liveness analysis—we see if other values in that region are still live at this point. This works, but obscures the original algorithm and isn’t obvious.
Aiken-Fähndrich-Levien’s analysis for early deallocation
Change: a region variable introduced by new
can be unallocated or deallocated! Dataflow analysis for region variables introduces annotations indicating when a region should be allocated or deallocated.
Imperative regions
Instead of lexical scope, new
and release
(i.e., deallocation) are statements!
Cyclone
Rust is based on Cyclone’s model! Lifetimes: region regions_of
operator, representing the region variables that occur free in a type. This operator is left abstract until the type variable is instantiated, thus emulating effect polymorphism! Here’s the type of map:
Other approaches
Regions with CPS, type system where region references can be stored in data structures in the presence of linear types.