Towards a Finer-Grained Incremental Typechecker for F*
A main obstacle to scaling F* towards verifying large developments is its usability. Usability covers many dimensions, ranging from documentation to better error reports. This note focuses on just a few of them: notably, the latency of typechecking; better error localization; and control over proof contexts.
So far, perhaps the biggest step towards improving the usability of F* has been its support for incrementally typechecking the top-level elements of a module via its Emacs interface. The main value of this feature is that it significantly improves the latency of typechecking results compared to batch-mode verification. This has allowed us to develop large verified codebases composed of many modules, with each module consisting of dozens of small top-level elements.
But, since incremental typechecking is only supported for top-level elements, there are still significant limitations:
- Programs must be broken into several smaller top-level fragments to benefit from incremental typechecking.
  - This comes at the cost of destroying the structure of the original code, copying specifications, manually closure-converting, etc.
- A single verification condition is produced for a top-level term, potentially containing 100s of separate assertions that are all checked together.
  - This can lead to brittleness of SMT-based proofs, since a single proof needs to cover many different aspects of a program, with little control over which hypotheses are relevant for which parts of a proof (a local variant of the "context pollution" problem).
  - When SMT proofs time out, this leads to poor error localization, since without a model from the SMT solver, F* can do little but blame the entire top-level term.
  - This leads to further ad hoc strategies, e.g., the "sliding admit" style (https://github.com/FStarLang/FStar/wiki/Sliding-admit-verification-style), to localize verification errors. However, these remain non-incremental: each time the admit is advanced, a VC for the whole prefix of the program is recomputed and must be proven by the SMT solver, with, say, just one new assertion.
- Aside from SMT solving, just running F*'s VC generator on a large term can itself be expensive. This adds latency when re-typechecking large terms. Worse, for enormous terms (e.g., a large nest of mutually recursive functions spanning 1000s of lines), the VC generator is likely to just explode when computing a single VC for all of it.
- Certain forms of abstraction that are easily expressed within F*'s theory are not actually easily expressible in practice, e.g., a variable whose scope spans a large definition (e.g., a "section variable" encompassing several top-level functions), since this requires repeatedly abstracting and applying it for each small top-level function.
To address these issues, this note proposes
- a revised syntactic structure of F* programs, more amenable to incremental parsing
- a revision to the structure of the F* typechecker to provide incremental typechecking for partial programs
- revisions to the F* IDE to exploit these features
What makes the top-level syntactic structure amenable to incremental processing? Well, the top-level of a module is simply a sequence of declarations.
- Processing a partial module amounts to just processing a prefix of its declarations.
- Extending a partially processed module with some more declarations involves two steps (see the sketch after this list):
  - First, persisting the state of the compiler after it has processed a prefix of a module. This is easy, since the top-level processing loop is just a fold over the list of declarations. So, we just persist the accumulated state of the fold.
  - Then, when more declarations arrive, we simply continue the fold over the additional declarations, persisting the resulting accumulated state for further incremental processing.
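As a minimal sketch, the top-level loop and its incremental extension might look like this, where env, decl, and check_decl are hypothetical stand-ins for the compiler's actual state, declaration, and per-declaration checking functions:

let process_prefix (initial:env) (prefix:list decl) : env =
  //the batch-mode top-level loop: a fold over the declarations
  List.fold_left check_decl initial prefix

let extend_prefix (persisted:env) (more:list decl) : env =
  //incremental extension: resume the fold from the persisted state
  List.fold_left check_decl persisted more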
Applying this structure to the syntax of F* terms is not so easy, since the terms are in an applicative syntax of lambda terms and it's hard to represent a partial term, to extend a partial term, and to persist the intermediate state of the compiler while it is processing a partial term so that it can be resumed.
However, revising the syntax of terms to follow a similar sequential structure as the top-level may allow us to process terms incrementally, just as we do top-level terms.
Towards that end, here's a first sketch of a revised syntax, presented in a stratified form that separates complete terms from extensible term fragments.
But, before presenting the syntax, two disclaimers. First, I am intentionally proposing a rather heavy concrete syntax: I can imagine many ways in which this could be made lighter, and I welcome suggestions for the nicest concrete notation. Second, I will gloss over the type annotation convention until the next section.
The main idea of the revised syntax is to introduce two new syntactic forms, for sequential and branching control flow, that are easily extensible with additional fragments. For example, one could write the following block of sequentially composed terms:
begin_seq
var x1 = e1
...
var xn = en
return e
end_seq
where the block of var-lines is processed simply by folding the state of the compiler over each var-line. Given an incomplete block of vars, one can easily add additional var lines and continue processing, until the end_seq is encountered.
So, here's the core of the syntax. It should be entirely familiar, except that in several places in the grammar, we use a non-terminal E1..En to allow for term fragments where previously we only allowed terms.
constants
c ::= () | 0 | 1 ...
computation types
C ::= M t e1..en
terms
t, e ::=
| c //constant
| x //variable
| e1 e2 //application
| e <: C //ascription
| fun (x1:t1) (xn:tn) -> E1..En //abstraction
| let x = e in E1..En //atomic sequential composition
| match e with | Pi -> E1..En //atomic branching
Term fragments are defined quite loosely. Not all sequences of term fragments can be reconstituted into well-formed terms. But, sequences of term fragments are trivial to parse incrementally.
fragment
E ::=
| e //promote an expression as a fragment
| begin_seq : C //start incremental seq
| var x = E1..En //incremental seq op
| end_seq //end incremental seq
| begin_match e return C //start incremental branching
| case P1 -> E1..En //incremental branch case
| end_match //end incremental branch
Here are some examples of programs, extended incrementally to complete programs in stages.
- Double:
Every line is a valid partial term. It can be extended one line at a time, producing a fully defined term.
1. let double = fun (x:ref int) -> begin_seq : _
2. var v = !x
3. (x := 2*v)
4. end_seq
- Append
Here, the first three lines are processed initially, then lines 4, 5, and 6 incrementally.
1. let rec append =
2. fun (x y : list 'a) ->
3. begin_match x return _
4. case [] -> y
5. case hd::tl -> hd :: append tl y
6. end_match
- Invalid programs
Not all sequences of fragments yield valid programs, e.g.,
begin_seq
case [] -> []
will be parsed as a sequence of fragments, but will be rejected as syntactically invalid.
- Nested fragments
Every line is processed incrementally
fun (x y z:ref int) -> begin_seq : _
var xy = begin_seq : _
var x0 = !x
var y0 = !y
x0 + y0
end_seq
var z0 = !z
return (z := z0 + xy)
end_seq
- Controlling the granularity of checking
Where larger blocks of code are to be typechecked together, the
programmer can always just use the existing functional syntax, e.g.,
in this variant of the program in 4, the expression bound to xy
is
typechecked atomically.
fun (x y z:ref int) -> begin_seq : _
var xy =
let x0 = !x in
let y0 = !y in
x0 + y0
var z0 = !z
z := z0 + xy
end_seq
- Some non-examples
Incremental processing of program fragments is about more than just incrementally parsing fragments. Internally, as already explained for top-level declarations, we need to persist the state of the typechecker as it processes a sequence of fragments.
Conceptually, checking the following block of code
let go () =
var x1 = e1
...
var xn = en
var y =
begin_seq : C
var y1 = f1
...
var ym = fm
ym
end_seq
var z1 = ...
...
var zk = ...
should be equivalent to checking
let hoist_y (x1:t1) (xn:tn) : C =
let y1 = f1 in
...
let ym = fm in
ym
let go () =
let x1 = e1 in
...
let xn = en in
let y = hoist_y x1 ... xn in
let z1 = ..
...
let zk = ...
In other words, when checking a begin_seq : C block, the typechecker only makes use of the types of the other variables in scope (x1:t1, ..., xn:tn), and whatever effectful preconditions are available from the annotated computation type C. Dually, the context of the seq block (z1...zk in this case) can only make use of the computation type proven about the seq block, not its definition.
Internally, to support incrementally checking a seq block, the accumulated state of the typechecker is something like
type seq_state = {
env: TcEnv.env; //all variable bindings in scope
goal: Syntax.comp; //the goal type to be eventually checked
fragments: list (term * comp); //type-checked fragments seen so far
guard : guard_t; //accumulated typechecking guard
}
The seq_state is initialized with the enclosing environment, the goal, an empty guard, and empty fragments. To extend the state with another fragment, we use something like
let extend (s:seq_state) (f:fragment) : seq_state =
  match f with
  | Var(y, e) ->
    let e, g, c = tc_term s.env e in
    let fs = (e, c) :: s.fragments in
    let check =
      //check that the partial computation with an `admit`
      //continuation validates the goal
      let c' = bind_l ((admit, admit_c) :: fs) in
      force_trivial_guard (sub_comp c' s.goal)
    in
    let env = Env.push_bv s.env (y, c.result_typ) in
    { env = env;
      goal = s.goal;
      fragments = fs;
      guard = conj_guard g s.guard }
  | EndSeq ->
    let c = bind_l s.fragments in
    force_trivial_guard (conj_guard s.guard (sub_comp c s.goal));
    s
  | _ -> fail "Invalid seq fragment"
This is simple and incrementalizes type inference, VC generation, etc.
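Pulling this together, incrementally checking a seq block would then just be a fold of extend over the fragments as they arrive, persisting the intermediate state in between; a hypothetical driver (not actual compiler code) might look like:

let check_fragments (s:seq_state) (fs:list fragment) : seq_state =
  //fold the persisted state over the newly arrived fragments,
  //mirroring the top-level processing loop over declarations
  List.fold_left extend s fs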
But, the check in the first branch repeatedly checks that the partial computations validate the goal, e.g., perhaps repeatedly proving some assertions in prior fragments already processed. We could try to be finer-grained about this and turn those prior assertions into assumes when checking the continuation. Or, perhaps support for "effectful calc" statements would help here.
Similarly, when checking a begin_match e return C, each case is checked in isolation, and when the end_match is processed, we emit an exhaustiveness check.
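One could imagine an analogue of extend for match blocks along the following lines; this is only a sketch, and match_state, tc_case, and check_exhaustiveness are hypothetical names, not the actual typechecker's API:

type match_state = {
  env: TcEnv.env; //all variable bindings in scope
  scrutinee: Syntax.term; //the term being matched
  goal: Syntax.comp; //the annotated computation type C
  cases: list (pat * comp); //type-checked cases seen so far
  guard: guard_t; //accumulated typechecking guard
}

let extend_match (s:match_state) (f:fragment) : match_state =
  match f with
  | Case(p, e) ->
    //each case is checked in isolation against the annotated goal C
    let e, g, c = tc_case s.env s.scrutinee p e in
    force_trivial_guard (sub_comp c s.goal);
    { s with cases = (p, c) :: s.cases; guard = conj_guard g s.guard }
  | EndMatch ->
    //on end_match, emit the exhaustiveness check for the cases seen
    check_exhaustiveness s.env s.scrutinee (List.map fst s.cases);
    s
  | _ -> fail "Invalid match fragment"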
For hiding facts, would it be enough to do something like this?
let wrap (pre post : prop)
         (proof : unit -> Lemma (requires pre) (ensures post))
  : Lemma (requires pre) (ensures post)
  = proof ()

assume val p : prop
assume val q : prop
assume val r : prop
assume val pq : unit -> Lemma (requires p) (ensures q)
assume val qr : unit -> Lemma (requires q) (ensures r)

let test () : Lemma (requires p) (ensures r) =
  assert p;
  wrap p r (fun () ->
    wrap p q pq;
    wrap q r qr
  );
  //assert q; // fails, `q` is hidden here
  assert r;
  ()
I think this can easily be turned into a calc-like syntax for doing proofs. This example is just for Lemma, but it should generalize to other effects too (though it cannot be effect-polymorphic, sadly).
Here's an idea for obtaining incrementality without introducing a new construct for it. Start with this context:
assume val g : unit -> Pure int (requires pre) (ensures post)
// which is really:
assume val g : unit -> PURE int (fun p -> pre /\ (forall x. post x ==> p x))
val f : unit -> PURE int WPf // some concrete WPf
Let's now check f incrementally, which begins with a call to g.
let f () =
let x = g () in
CONT
For some body CONT, which we might not even know at this stage. The well-typedness of f is given by the formula:
WPf `stronger` WP(body_of_f)
= WPf `stronger` bind_wp WPg WPCONT
= forall p. WPf p ==> WPg (fun x -> WPCONT x p)
We'll now unfold WPg, which happens to come from a requires/ensures, but it doesn't have to be.
forall p. WPf p ==> pre /\ (forall x. post x ==> WPCONT x p)
Now, we cannot check this formula yet, since we don't know WPCONT, but we can already check part of it: the pre part. We can split it into:
p, WPf p |= pre
p, WPf p, x, post x |= WPCONT x p
The first VC is the precondition of the g call in the proper logical context; the second one is the VC for the continuation.
Once CONT gets further defined (or we take another step), we can repeat this procedure and break down the VC more and more.
Note that this is completely generic in the effect; we are only doing some VC magic by splitting. This might not work so well for some effects (e.g., angelic nondeterminism... I think?), but we should be safe for all effects currently in use.
This involves many SMT queries. However, the contexts for each subquery are incrementally growing, so we could imagine an interaction such as:
;; common prefix
(push)
(declare-const p)
(assert (WPf p))
;; check pre_g
(prove pre_g)
;; continue with context
(declare-const x)
(assert (post_g x))
;; more until we discharge all pieces ....
(pop)
(This is also different from the current interaction in that we assert/declare parts of the VC incrementally instead of just checking it in one go. This change should not be consequential, I think.)
However, the (prove pre_g) is really:
(push)
(assert (not pre_g))
(check-sat)
(pop)
And I think the pop will discard all of the state that the SMT solver accumulated in the proof of pre_g... I was hoping that doing this in an incremental style would allow for reuse, but sadly it doesn't look like it at first glance. The double negation of the formula we really want to prove makes these contexts not really monotonic...
As described so far, we need no particular support from the editor environment to support incrementality.
However, the form of incrementality that we get is limited in that programs are built only by textually appending fragments. As such, incomplete program fragments only appear in a suffix of the program text.
This may be fine for many of our uses. But, with more work on the editor side, one should be able to see a
begin_seq : C
prefix
[edit here]
end_seq
as a hole in the program that can be filled in and type-checked incrementally.
I am not sure about the editor support needed to handle this (Agda does something like it already, right?).
But on the compiler side of things, given support for "linear incrementality", supporting programs with multiple holes that are filled incrementally doesn't seem too much harder: we just need a way to focus on a hole within an expression tree, perhaps using a zipper-like structure on our syntax trees, as sketched below.
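For instance, here is a minimal sketch of such a zipper over a toy term syntax; all of the names below are hypothetical stand-ins for the compiler's actual syntax trees:

type term =
  | Hole : term
  | Var : string -> term
  | App : term -> term -> term

//the path from the focused subterm back to the root
type ctx =
  | Root : ctx
  | AppL : ctx -> term -> ctx //focused on the left child; right sibling stored
  | AppR : term -> ctx -> ctx //focused on the right child; left sibling stored

type zipper = { focus: term; path: ctx }

//fill the focused hole with a (possibly still partial) term
let fill (z:zipper) (t:term) : zipper = { z with focus = t }

//rebuild the complete term from a zipper
let rec plug (z:zipper) : Tot term (decreases z.path) =
  match z.path with
  | Root -> z.focus
  | AppL c r -> plug { focus = App z.focus r; path = c }
  | AppR l c -> plug { focus = App l z.focus; path = c }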
At least for cases where the top-level type annotation is inferred anyway, we could make it easier to switch code from non-incremental to incremental mode and back by considering a more lightweight syntax. For instance, code written within { and } could be automatically switched from non-incremental (let) to incremental (var).
A less ambitious idea is that we could make the current sliding admit less heavy by introducing syntax for "admitted blocks": any code within, say, {{ and }} is desugared to admit().
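For example, with this hypothetical surface syntax (and made-up lemma names), one could admit the remainder of a proof-in-progress without deleting it:

let test (x:nat) : Lemma (ensures x * x >= x) =
  a_finished_proof_step x; //checked as usual
  {{ //everything in this block is desugared to admit (),
     //playing the role of a sliding admit
     an_unfinished_proof_step x;
     another_unfinished_step x }}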
Sections à la Coq support a subset of the features above, but go a long way towards alleviating most of the issues (latency of typechecking; error localization; and control over SMT context). What I like most about sections in Coq is that the syntax is non-intrusive and that it fits naturally with the style of interactive verification in editors, in that a partially typechecked module is always a prefix of a file.
Example:
begin_section Double

var x : ref int

local
val read (#a:Type) (r:ref a) : ST a
  (fun _ -> True)
  (fun h0 v h1 -> h0 == h1 /\ v == sel h0 r)
let read #a r = !r

local
val write (#a:Type) (r:ref a) (v:a) : ST unit
  (fun _ -> True)
  (fun h0 _ h1 -> modifies (Set.singleton (addr_of r)) h0 h1 /\ sel h1 r == v)
let write #a r v = r := v

val double: unit -> ST unit
  (fun _ -> True)
  (fun h0 _ h1 -> modifies (Set.singleton (addr_of x)) h0 h1 /\ sel h1 x == 2 * sel h0 x)
let double () =
  let v = read x in
  write x (v + v)

end_section Double
x, read, and write are only bound inside the section. When the section is closed, definitions without a local qualifier are generalized over the variables declared in the section, while local definitions are inlined. E.g., the above would be equivalent to writing
val double: x:ref int -> unit -> ST unit
  (fun _ -> True)
  (fun h0 _ h1 -> modifies (Set.singleton (addr_of x)) h0 h1 /\ sel h1 x == 2 * sel h0 x)
let double x () =
  let v = !x in
  x := v + v
A more ambitious proposal would be to support something similar to Agda holes. This radically changes the interactive verification experience.
One can use ? as a placeholder for any expression (including one with a computational type):
val double: x:ref int -> unit -> ST unit
  (fun _ -> True)
  (fun h0 _ h1 -> modifies (Set.singleton (addr_of x)) h0 h1 /\ sel h1 x == 2 * sel h0 x)
let double x () =
  let v = ? in
  x := v + v
Typechecking this replaces ? with a hole of an appropriate type. The editor should provide support for viewing this type and filling it in, potentially with expressions containing additional placeholders:
?1 : x:ref int -> unit -> ST a
  (fun _ -> True)
  (fun h0 v h1 -> modifies (Set.singleton (addr_of x)) h0 h1 /\ 2 * v == 2 * sel h1 x)
Doing this for computational types is challenging, but perhaps restricting where placeholders can appear (e.g., as the right-hand side of a let or a match branch) makes it manageable?