The example below is a code snippet in this example file.
First some bytecode is created using the Code.Create
function from the EVM module.
/** Code snippet that detects overflow in addition. */
const OVERFLOW_CHECK := Code.Create(
[
// 0x00 0x01 // stack
DUP2, ADD, // [y + x, y] .
// 0x02 0x03
LT, // [x + y < y?1:0]
// 0x03
PUSH1, 0x07, JUMPI, // [0x07, x < x + y?1:0]
// If Peek(1) is 0 no overflow, STOP, otherwise JUMP and revert.
// 0x06
STOP,
// 0x07 0x08 0x09 0x10 0x11, 0x12
JUMPDEST, PUSH1, 0x00, PUSH1, 0x00, REVERT
]
);
The bytecode is in the form of opcodes defined in the Opcodes module. The more friendly layout of the code is:
-------------------
Address|Instruction
-------------------
0x00: DUP2
0x01: ADD
0x02: LT
0x03: PUSH1 0x07
0x05: JUMPI
0x06: STOP
0x07: JUMPDEST
0x08: PUSH1 0x00
0x10: PUSH1 0x00
0x12: REVERT
The intention is that this code snippet:
- adds the two values at the top of the stack,
- detects under/overflow,
- normally terminates if no overflow happens and reverts otherwise.
What we are going to do is write a Dafny program that verifies that this code snippet does the job, for all possible inputs at the top of stack.
Detecting overflows in addition modulo a given number can be done by first computing the result of an addition modulo, and then checking a property of the result against one of the inputs. In other words, there is a necessary and sufficient condition to detect overflows on addition modulo. This is formally defined in the following Dafny lemma:
/** Necessary and sufficient condition for detecting overflow in ADD. */
lemma AddOverflowNSC(x: u256, y: u256)
ensures /* C1 */ x as nat + y as nat > MAX_U256
<==> /* C2 */ (x as nat + y as nat) % TWO_256 < x as nat
{
// Proof should be here, but Dafny can prove it on its own. Thanks Dafny!
}
This lemma asserts (postcondition ensures
) that the following conditions are equivalent (<==>
):
- C1: the addition of
x
andy
as if there were natural numbers (no modulo) is larger thanMAX_U256 = 2^256 - 1
, - C2: the addition of
x
andy
modulo2^256
is less thanx
(y
would work too as the+
is symmetric).
Dafny successfully verifies this lemma, using clever automated reasoning and an SMT-solver in the backend.
What we end up with is a machine-checkable proof (the lemma is written as a program, and the program is verified by Dafny) that the lemma is valid for all inputs x
and y
of type u256
.
We can now use this fact to write an overflow checker and this is exactly what the OVERFLOW_CHECK
code snippet does.
In order to verify that the overflow checker is correct we need a few things:
- a formal specification of what the correctness property is,
- a mechanism to reason about the bytecode.
To do so we are going to write a Dafny program, a method OverflowCheck
(below) that captures the specification and the correctness criteria:
- the inputs
x
andy
: there are input parameters ofOverflowCheck
together with a statest:State
of an EVM. The inputs can take arbitrary values but we add some reasonable contraints on them defined by preconditions (requires
):Pre0
indicates that we start in a non-failure state, with the Program CounterPC
pointing to instruction at address0x00
.Pre1
: we assume the amount ofgas
in the EVM is larger than a certain amount. The values used here are defined for the Berlin version of the EVM semantics and available in the Gas module.Pre2
: the initial stack contains the two valuesx
andy
withx
at the top withx
andy
arbitrary values of typeu256
.Pre3
: the code that is currently executed in the EVM is theOVERFLOW_CHECK
code above.
- the correctness criteria: this is captured by postconditions (
ensures
). The postconditions formally and logically define our correctness criteria:- the body always terminates; this is enforced by Dafny and when the body terminates, the following postconditions hold:
Post0
: the final state is either a normal stop or a revert. This excludesout-of-gas
exception (we intend to provide enough gas inPre1
to avoidout-of-gas
exceptions).Post1
: the final state is a revert if and only if an overflow occurs. Note that overflow is defined mathematically with unbounded natural numbers (nat
).Post2
: the final state is a normal state if and only if no overflow occurs. That postcondition logically follows fromPost0
andPost1
.
- the instrumented code: to run the code we use the
ExecuteN(s, k)
function in the EVM module. This function executes (at most)k
steps of the code ins
. If an invalid state (e.g. stack under/overflow, out-of-gas) is encountered before thek
-th step the execution prematurely stops and returns the invalid state. The body of the methodOverflowCheck
below somehow monitors what the bytecode is doing. If executes the first 4 instructions, resulting in statest' == ExecuteN(st, 4)
. Depending on the result (second element of the stackst'.Peek(1)
) of the comparison operatorLT
(semantics in the bytecode module) what is left to execute next is: either one instructionSTOP
or the section of the code that reverts (from0x07
). The instrumentation of the execution of the bytecode with a Dafnyif-then-else
statement enables to track these two different paths.
/**
* This is a pattern that is frequently used to detect overflows for ADD.
*
* @param st A state.
* @param x A u256.
* @param y A u256.
* @returns A normal state if no overflow, a revert state otherwise.
* @note The check relies on the property specified by lemma AddOverflowNSC.
* @note The overflow is specified as x + y exceeding MAX_U256.
*/
method OverflowCheck(st: State, x: u256, y: u256) returns (st': State)
/** OK state and initial PC. */
requires /* Pre0 */ st.OK? && st.PC() == 0
/** Enough gas. Longest path gas-wise is via JUMPI. */
requires /* Pre1 */ st.Gas() >= 6 * Gas.G_VERYLOW + Gas.G_HIGH + Gas.G_JUMPDEST
/** Initial stack is [x, y]. */
requires /* Pre2 */ st.GetStack() == Stack.Make([x, y]);
/** The code is the snippet to detect overflow. */
requires /* Pre3 */ st.evm.code == OVERFLOW_CHECK
/** The contract never runs out of gas thanks to Pre1. */
ensures /* Post0 */ st'.REVERTS? || st'.RETURNS?
/** Should revert iff overflow. */
ensures /* Post1 */ st'.REVERTS? <==> x as nat + y as nat > MAX_U256
/** Normal termination iff no overflow. */
ensures /* Post2 */ st'.RETURNS? <==> x as nat + y as nat <= MAX_U256
{
// Execute 4 steps -- DUP2 ADD LT PUSH1 0x07
st' := ExecuteN(st,4);
// Depending on result of LT comparison overflow or not
if st'.Peek(1) == 0 {
st':= Execute(st');
assert st'.PC() == 0x06;
st' := ExecuteN(st',1);
assert st'.RETURNS?;
} else {
st':= Execute(st');
assert st'.PC() == 0x07;
st' := ExecuteN(st',4);
assert st'.REVERTS?;
}
}
So the question is now: How does the verification work and what does it verify?
What is verified is the following correctness property: for all inputs that satisfy the preconditions,
- the body always terminates (as there is no loop this check is easy),
- the
assert
statements are never violated, and - when the body of the method
OverflowCheck
terminates, theensures
statement hold.
As for the How, there several things that happen during this verification phase under the hood. In a nutshell, Dafny builds big verification conditions (logical predicates), and it turns out that they are valid if and only if the correctness property above is true. Whether the verification conditions are valid is outsourced to a backend SMT-solver (Z3).
If we check the method OverflowCheck
, the Dafny verifier indeed confirms that it is correct.
So what have we proved with this process?
Several interesting properties:
- first, we have proved correctness for all inputs satisfying the preconditions. It is a symbolic and logical reasoning process, not a testing process. This is equivalent to exhaustively testing all inputs (and there are more than
2^257
possible inputs). - second, with
Post0
we have a guarantee that the code snippet never runs out of gas. Of course this is true provided there is enough gas initially, as perPre1
which gives us a lower bound on what is needed to avoid an out-of-gas exception. - third, we have proved that no exceptions related to stack under/overflow occur: otherwise we would
reach an
INVALID
state which is ruled out byPost0
- finally, we have proved that the code snippet correctly detects overflows in addition modulo (
Post1
andPost2
), and this for any values ofx
andy
initially on the stack.
The previous bytecode OVERFLOW_CHECK
does not exercise a loop which makes relatively easy to verify.
The code below iterates a loop c
times where c
is an arbitrary u8
, unsigned integer over 8 bits.
We want to guarantee that this bytecode always terminates for any value of c
and,
a suitable initial value of gas in the EVM.
What we are doing here is building a proof for a family of bytecodes as c
is a paramater of the bytecode itself.
The layout of the bytecode is as follows:
-------------------
Address|Instruction
-------------------
0x00: PUSH1 c
0x02: JUMPDEST <-------
0x03: DUP1 |
0x04: PUSH1 0x08 |
0x06: JUMPI ---- |
0x07: STOP | |
0x08: JUMPDEST <--- |
0x09: PUSH1, 0x01 |
0x0b: SWAP1 |
0x0c: SUB |
0x0d: PUSH1, 0x02 |
0x0f: JUMP --------
The bytecode initially pushes c
at the top of the stack.
Let's assume the stack is initially empty. Then after executing 0x00
it is [c]
.
The bytecode then iterates the following sequence of instructions:
- duplicates top of the stack (
0x03
) and pushes aJUMP
target0x08
; the stack is[0x08, c, c]
. - if
stack[1] > 0
then jump tostack[0]
and otherwise go to next instruction0x07
; pop two elements off the stack. This all happens withJUMPI
at0x06
. After this instruction the stack is[c]
. - if we have not jumped in the previous step, the next instruction is
0x07
and we halt. - otherwise we are at
0x08
. From0x08
to0x0d
, decrement the top of the stack by one resulting in a stack[c - 1]
and push aJUMP
target0x02
. JUMP
to0x02
and pop top of the stack, i.e. go to step 1.
We assume that the starting state st
of the EVM is such that:
- the state is a non-failure state, with
PC=0
and there is room for pushing 3 elements on the stack (in the EVM the stack has limited capcity of 1024). This is captured by preconditionPre0
; - we initially load the EVM with some gas, precondition
Pre1
. The amount of gas needed to terminate the computation depends on the value ofc
. It may take a while to figure out the formula but Dafny will verify that it is indeed enough to run the program until completion. - the code to execute in the EVM is the bytecode above: this is enforced precondition
Pre2
.
/**
* A program with a loop.
*
* @param c The number of times to iterate the loop.
*
* @note The code iterates a loop `c` times by decremeting
* a copy of `c` until it is zero.
* We prove termination on this program and also
* that it ends in a RETURN state.
*
* The stack content is unconstrained but there must be
* enough capacity (3) to perform this computation.
*/
method Loopy(st: State, c: u8) returns (st': State)
requires /* Pre0 */ st.OK? && st.PC() == 0 && st.Capacity() >= 3
requires /* Pre1 */ st.Gas() >=
3 * Gas.G_VERYLOW + Gas.G_JUMPDEST +
c as nat * (2 * Gas.G_HIGH + 2 * Gas.G_JUMPDEST + 6 * Gas.G_VERYLOW)
+ Gas.G_HIGH
requires /* Pre2 */ st.evm.code == Code.Create(
[
// 0x00 0x01
PUSH1, c,
// 0x02, 0x03 0x04 0x05
JUMPDEST, DUP1, PUSH1, 0x08,
// 0x06
JUMPI,
// 0x07
STOP,
// 0x08 0x09 0x0a
JUMPDEST, PUSH1, 0x01,
// 0x0b 0x0c
SWAP1, SUB,
// 0x0d 0x0e 0x0f
PUSH1, 0x02, JUMP
]
);
ensures /* Post0 */ st'.RETURNS?
{
// Execute PUSH1, c, JUMPDEST, DUP1, PUSH1, 0x08
st' := ExecuteN(st, 4);
// verification variable to track decreasing counter
ghost var count : u256 := c as u256;
// number of times we get into the loop.
ghost var n: nat := 0;
while st'.Peek(2) > 0
invariant st'.OK?
invariant st'.Gas() >= count as nat * (2 * Gas.G_HIGH + 2 * Gas.G_JUMPDEST + 6 * Gas.G_VERYLOW) + Gas.G_HIGH
invariant st'.PC() == 0x06
invariant st'.Operands() > 2
invariant count == st'.Peek(2) == st'.Peek(1)
invariant st'.Peek(0) == 0x08;
invariant st'.evm.code == st.evm.code
invariant n == c as nat - count as nat
decreases st'.Peek(2)
{
assert st'.PC() == 0x06;
// Execute body of the loop
st':= ExecuteN(st' ,10);
count := count - 1;
n := n + 1;
}
assert st'.PC() == 0x06;
// Check we iterated the loop c times.
assert n == c as nat;
// JUMPI, STOP
st' := ExecuteN(st', 2);
}
To verify that the loop iterates c
times we use a verification ghost
variable n
to record
the number of times we hit the instruction at 0x08
.
We also use a ghost
variable count
to keep track of the value of the loop counter which is actually on the stack.
Ghost variables cannot be used to influence the control flow.
The fact that we iterate the loop c
times is specified as assert c == n
at the end the method Loopy
.
And normal termination of the bytecode is specified by postcondition Post0
.
So How does it work?
To verify this bytecode we instrument it.
The execution of the bytecode is monitored by some Dafny code.
This includes tracking whether we are iterating the loop body or not.
We can think about the instrumented code in Loopy
as if we had two programs: the bytecode on the EVM executed with ExecuteN
function, and in parallel a monitor that keeps track of where we are and can observe the state of the EVM but not influence it.
For instance, our monitor can check the value of the third element of the stack, st'.Peek(2)
and
according to its value, predicts the next instruction to be executed.
If st'.Peek(2) > 0
it predicts that PC=0x06
and the result of the JUMPI
will be the loop body.
We then execute 9 more steps.
Otherwise it predicts that the next two steps will end the program (exit the while
loop).
What if our predictions are wrong? The method will not verify as we won't be able to execute 10 instructions without
encountering an INVALID
state.
And what do we gain from this instrumentation?
We can define loop invariants which are predicates that must hold on the loop entry, and must be maintained by the loop body.
By doing so we instrument the bytecode with some checks at regular intervals.
We can ensure that the loop terminates by providing a ranking function (decreases
).
Dafny successfully verifies our method Loopy
which means that the invariants are actually loop invariants, and the postcondition and assert
statements always hold.
As can be readily seen, the instrumentation and specification of the correctness criteria (pre- and postconditions, invariants) is much larger than the code to execute the bytecode, but this is expected in a formal verification world.
What we have proved in this bytecode is:
- the execution always terminates,for all values of
c
, and for any initial state of the EVM that satisfies the preconditions. - the loop is iterated
c
times. - there is no stack under/overflow.
- if we start with enough gas (
Pre1
) there is no out of gas exception.
Every computation on the EVM incurs some cost measured in gas.
To save on gas, some optimisations can be applied to the bytecode.
A correct optimisation of a sequence of instructions xi
into opt(xi)
should satisfy at least two properties:
- if executing
xi
from states
results in states'
then executingopt(xi)
froms
should result ins''
that may differ froms'
only for the values ofgas
andpc
- from any state
s
, the gas cost of executingopt(xi)
should be less than the gas cost of executingxi
.
There are several ways of optimising EVM bytecode ranging from constant propagation, detecting/replacing some patterns, etc. Some optimisations may look trivial and correct but there are some known bugs related to optimisations that can result in observing outdated memory states Overly Optimistic Optimizer — Certora Bug Disclosure.
Using our framework, we can formalise the correctness criterion above and also prove that some optimisations are correct. We demonstrate this on a simple example from this Thesis.
Proposition 12 in this Thesis proposes to replace any sequence of the form SWAPN POP^(n+1)
by POP^(n+1)
(we use regular-language notation so a^k
stands for a
repeated k
times).
That seems plausible as:
SWAPN
for1 <= N <= 16
swaps the element at index0
(top of the stack) with the element at indexN
,- because we
POP
N+1
times right after theSWAPN
, the SWAP somehow does not matter: the firstN+1
elements are popped and the elements at indexN+2
and above are left unchanged.
We want to formally prove that replacing SWAPN POP^(n+1)
by POP^(n+1)
is correct, and this for any (valid) input state.
Moreover, we provide a machine-checkable proofs of this fact.
To build a machine-checkable proof of Proposition 12 we write a Dafny program, Proposition12a
below, with the following features:
- we start from an arbitrary state of an EVM,
vm
, with the constraints that the stack has more than 2 elements (and less than the maximum 1024). We also require that there is enough gas in the machine. This is specified by therequires
clauses in the Dafny code below. If these constraints are not satisfied we may reach andINVALID
state and won't be able to prove any useful property. - we execute
POP POP
in one EVM,vm1
, starting from the initial statevm
; - then we execute
SWAP1 POP POP
in another EVM,vm2
, starting from the same initial statevm
; - we then assert that the states after executing
SWAP1 POP POP
andPOP POP
are the same upto the values ofgas
andpc
.
/**
* Proposition 12: n + 1 consecutive Pop <_gas SwapN n + 1 consecutive Pop
* Gas cost for POP is G_BASE and for SWAP G_VERYLOW
*
* @param s An initial stack content.
* @param g An initial value of gas.
*
* @note Case n = 1
*/
method Proposition12a(s: seq<u256>, g: nat)
/** Stack must have at least 2 elements. */
requires 2 <= |s| <= 1024
/** Minimum gas needed. */
requires g >= 2 * Gas.G_BASE + Gas.G_VERYLOW
{
// Get an EVM with some gas and an arbitrary stack `s`
var vm := EvmBerlin.Init(gas := g, stk := s, code := []);
// execute 2 POPs
var vm1 := vm;
vm1 := ExecuteOP(vm1, POP);
vm1 := ExecuteOP(vm1, POP);
// execute SWAP1 followed by 2 POPs
var vm2 := vm;
vm2 := ExecuteOP(vm2, SWAP1);
vm2 := ExecuteOP(vm2, POP);
vm2 := ExecuteOP(vm2, POP);
// States are the same except for gas and PC
assert vm1.evm.(pc := 0, gas := 0) == vm2.evm.(pc := 0, gas :=0);
// Gas saved is Gas.G_VERYLOW
assert vm2.Gas() == vm1.Gas() - Gas.G_VERYLOW;
assert vm2.Gas() < vm1.Gas();
}
First, we use an input parameter s
to define the content of the EVM at the beginning.
Because it is an input parameter it can take any arbitrary value, but must satisfy the size requirements 2 <= |s| <= 1024
.
The same holds for the gas value in the initial state: we require that this is more than a certain amount that corresponds to one SWAP and 2 POPS (we could actually require two different values for the two EVMs).
Second, the assert
statements in Dafny are verification statements and must provably hold for any inputs.
If you use this code in VSCode, you may change requires 2 <= |s| <= 1024
into requires 1 <= |s| <= 1024
and see the effects: Dafny cannot prove that a valid state is reached after the computations.
Dafny can successfully verify (without any extra proof hints) the previous method
and what we have proved is the following:
for any state of the EVM with more than 2 elements on the stack, and enough gas,
- the two sequences
SWAP1 POP POP
andPOP POP
lead to the same states (except for the value ofgas
andpc
), - the sequence
POP POP
is strictly less expensive thanSWAP1 POP POP
, - the
method
Proposition12a
is a machine-checkable proof.
The general case is more tricky as we have to prove it for any 1 <= N <= 16
. We could write separate methods
to prove each case, but we show here how to use loop invariants
to reason about EVM bytecode.
The Dafny method
below specifies the proof of the general case.
As for the case N = 1
, we run the code and the optimised code in two different EVMs starting from the same state.
What is more complicated here is that we have to execute the POP
n times where n
is an input parameter of the proof.
The result is that the constraint on the initial state of the EVM must satisfy some requirements that take into account the value of n
. You can check that n = 1
yields the constraints we had for the Case N = 1
above.
/**
*
* Proposition 12: n + 1 consecutive Pop <_gas SwapN n + 1 consecutive Pop
* Gas cost for POP is G_BASE and for SWAP G_VERYLOW
*
* @param n As described above.
* @param s An initial stack content.
* @param g An initial value of gas.
*
* @note General case.
*
*/
method Proposition12b(n: nat, s: seq<u256>, g: nat)
requires 1 <= n <= 16
/** Stack must have at least n + 1 elements. */
requires n + 1 <= |s| <= 1024
/** Minimum gas needed. */
requires g >= (n + 1) * Gas.G_BASE + Gas.G_VERYLOW
{
var vm := EvmBerlin.Init(gas := g, stk := s, code := []);
// Execute n + 1 POPs in vm1.
var vm1 := vm;
for i := 0 to n + 1
invariant vm1.OK?
invariant vm1.Gas() == g - i * Gas.G_BASE
invariant vm1.Operands() >= n - i
invariant vm1.GetStack() == vm.SlicePeek(i, |s|)
{
vm1 := ExecuteOP(vm1, POP);
assert vm1.OK?;
}
assert vm1.Gas() >= Gas.G_VERYLOW;
// Stack after n + 1 POPs is suffix of initial stack starting at index n + 1
assert vm1.GetStack() == vm.SlicePeek(n + 1, |s|);
// Execute SWAPn and then n + 1 POPs in vm2.
var vm2 := vm;
vm2 := Swap(vm2, n).UseGas(Gas.G_VERYLOW);
for i := 0 to n + 1
invariant vm2.OK?
invariant vm2.Gas() == g - i * Gas.G_BASE - Gas.G_VERYLOW
invariant vm2.Operands() >= n + 1 - i
invariant vm2.Operands() == vm.Operands() - i == |s| - i
invariant vm2.SlicePeek(n + 1 - i, |s| - i) == vm.SlicePeek(n + 1, |s|)
{
vm2 := ExecuteOP(vm2, POP);
assert vm2.OK?;
}
assert vm2.SlicePeek(0, |s| - n - 1) == vm.SlicePeek(n + 1, |s|);
assert vm1.GetStack() == vm2.GetStack();
assert vm2.Gas() == vm1.Gas() - Gas.G_VERYLOW;
assert vm2.Gas() < vm1.Gas();
}
The code snippet for the first loop shows that it is a bit harder to prove that the two sequences SWAPN POP^(n+1)
by POP^(n+1)
are equivalent.
First we have to prove that we can perform n + 1
POPs.
This requires us to maintain a loop invariant, invariant vm1.Operands() >= n - i
, that ensures that there are enough elements of the stack at each iteration of the loop.
The same holds for gas: we have to prove that there is enough gas after each iteration, which is captured by the loop invariant invariant vm2.Gas() == g - i * Gas.G_BASE - Gas.G_VERYLOW
.
The other invariants invariant vm2.OK?
specifies that at each iteration we are in non-failure state.
Second we have to show that the resulting states are the same. This requires us to prove another loop invariant
invariant vm2.SlicePeek(n + 1 - i, |s| - i) == vm.SlicePeek(n + 1, |s|)
where:
x.SlicePeek(l, u)
is a stack made out of the elements ofx
from indexl
tou - 1
,- the invariant specifies that, in
vm2
, the stack content after the nextn + 1 - i
elements is the same as the initial stack content invm
.
The code snippet for SWAPN POP^(n+1)
is split into SWAPN
and then a loop that pops n + 1
times.
The invariants for the vm2
states are similar to the ones for vm1
.
Overall, Dafny checks that the predicate tagged with attribute invariant
are indeed maintained by the loops, and valid on entry.
Using the fact that the loop invariants are true after exiting the loop, and in conjunction with the exit condition i == n + 1
, this enables us to obtain the two facts:
assert vm1.GetStack() == vm.SlicePeek(n + 1, |s|);
frominvariant vm1.GetStack() == vm.SlicePeek(i, |s|)
andi == n + 1
assert vm2.SlicePeek(0, |s| - n - 1) == vm.SlicePeek(n + 1, |s|);
frominvariant vm2.SlicePeek(n + 1 - i, |s| - i) == vm.SlicePeek(n + 1, |s|)
andi == n + 1
.
As vm2.SlicePeek(0, |s| - n - 1)
is the stack in vm2
,
if we combine them we can prove that assert vm1.GetStack() == vm2.GetStack();
i.e. the two stacks are equal after executing the sequences SWAPN POP^(n+1)
and POP^(n+1)
.
Dafny successfully verifies this program (without any extra proof hints), and we obtain the following results:
for any state of the EVM with more than n + 1
elements on the stack, and enough gas,
- the two sequences
SWAPN POP^(N + 1)
andPOP^(N + 1)
lead to the same states (except for the value ofgas
andpc
), - the sequence
POP^(N + 1)
is strictly less expensive thanSWAPN POP^(N + 1)
. - the
method
Proposition12b
is a machine-checkable proof.