Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update eip-3779.md #4212

Merged
merged 1 commit into from
Sep 24, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
69 changes: 33 additions & 36 deletions EIPS/eip-3779.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ category: Core
author: Greg Colvin (@gcolvin), Greg Colvin <[email protected]>, Brooklyn Zelenka (@expede)
discussions-to: https://ethereum-magicians.org/t/eip-3779-safe-control-flow-for-the-evm/6975
created: 2021-08-30
requires: 3540
requires: 3540, 3670
---

## Abstract
Expand All @@ -21,7 +21,7 @@ This EIP specifies validation rules for some important safety properties, includ

Valid contracts will not halt with an exception unless they either run out of gas or overflow stack during a recursive subroutine call.

Code must be validated at contract creation time – not runtime – by the provided algorithm or its equivalent. Therefore, a table of valid dynamic jumps is specified in an EOF container _section_. This allows for a one-pass algorithm that is (and must be) linear in the size of the _code_ plus the _section_, so as not to be a DoS vulnerability.
Code must be validated at contract creation time – not runtime – by the provided algorithm or its equivalent. Therefore, tables of valid dynamic jumps are specified in an EOF container _section_. This allows for a one-pass algorithm that is (and must be) linear in the size of the _code_ plus the _section_, so as not to be a DoS vulnerability.

## Motivation

Expand All @@ -30,7 +30,7 @@ Validating safe control flow at creation time has a few important advantages.
* Jump destination validity does not always need to be checked for at runtime, improving performance.
* Stack underflow does not ever need to be checked for at runtime, improving performance.

Dynamic jumps, where the destination of a JUMP or JUMPI is not known until runtime, can be an obstacle to statically proving this sort of safety, but have also been seen as necessary to implement the return jump from a subroutine. But consider this example of calling a minimal subroutine
Dynamic jumps, where the destination of a JUMP or JUMPI is not known until runtime, can be an obstacle to statically proving this sort of safety in linear time, but have also been seen as necessary to implement the return jump from a subroutine. But consider this example of calling a minimal subroutine
```
ADD:
RTN_ADD
Expand All @@ -49,26 +49,22 @@ ADDITION:
swap1
jump
```
Note that the return address and the destination address are pushed on the stack as constants -- the `JUMP` instructions are in fact static, not dynamic, in the sense that they jump to the same `PC` on every run. SO we do not need (nor typically use) dynamic jumps to implement subroutines.
Note that the return address and the destination address are pushed on the stack as constants -- the `JUMP` instructions are in fact static, not dynamic, in the sense that they jump to the same `PC` on every run. So we do not need (nor typically use) dynamic jumps to implement subroutines.

Since many of the jumps we need in practice are static we can validate their safety with a static analysis of the _code_. And since can, we should.

Still, providing for the safe use of dynamic jumps makes for concise and efficient implementations of language constructs like switches and virtual functions. Dynamic jumps can be an obstacle to linear-time validation of EVM bytecode. But even where jumps are dynamic it is possible to tabulate valid destinations in advance, and the Ethereum Object Format gives us a place to store such tables.
Even where jumps are dynamic it is possible to tabulate valid destinations in advance, and the Ethereum Object Format gives us a place to store such tables. Providing for the safe use of dynamic jumps makes for concise and efficient implementations of language constructs like switches and virtual functions.

So again, since we can validate the safety of tabulated dynamic jumps with a static analysis of the code, we should.

## Specification

### Dependencies

We need [EIP-3540: EVM Object Format (EOF)](./eip-3540.md) to support container sections and [EIP-3670: EOF - Code Validation](./eip-3670.md) to support validation of instructions.

### EOF container changes

1. A new EOF section called `vjumptable` (`section_kind = 4`) is introduced. It contains a sequence of *n* tuples *(jumpsrc, jumpdest<sub>i</sub>*, sorted in ascending lexicographic order. Each tuple represents a valid jump from one location in the _code_ to another.
2. A new EOF section called `vtraptable` (`section_kind = 5`) is introduced. It contains a sequence of *n* tuples *(jumpsrc, jumpdst<sub>i</sub>*, sorted in ascending lexicographic order. Each tuple represents a valid jump from one location in the _code_ to another.
1. A new EOF section called `vjumptable` (`section_kind = 4`) is introduced. It contains a sequence of _n_ tuples _(jumpsrc, jumpdest<sub>i</sub>__, sorted in ascending lexicographic order. Each tuple represents a valid jump from one location in the _code_ to another.
2. A new EOF section called `vtraptable` (`section_kind = 5`) is introduced. It contains a sequence of _n_ tuples _(jumpsrc, jumpdst<sub>i</sub>_, sorted in ascending lexicographic order. Each tuple represents a valid jump from one location in the _code_ to another. There is one and only one trap table entry for each _jumpsrc_.

At runtime, a dynamic jump cause a search for a match in the `vjumptable.` if found, the jump proceeds to the _jumpdest_. If not, the jump proceeds to the matching _jumpdest_ in the _vtraptable_. In this way dynamic jumps always succeed.
At runtime, a dynamic jump cause a search for a matching _jumpsrc_ and _jumpdst_ in the `vjumptable.` if found, the jump proceeds to the _jumpdest_. If not, the jump proceeds to the _jumpdest_ for the matching _jumpsrc_ in the _vtraptable_. In this way dynamic jumps always succeed, but it is left to the programmer to implement a default case for the trap table.

### Validity

Expand All @@ -88,12 +84,11 @@ _Execution_ is as defined in the [Yellow Paper](https://ethereum.github.io/yello
4. Invalid jump destination
5. Invalid instruction

We would like to consider EVM _code_ valid iff no execution of the program can lead to an exceptional halting state, but we must be able to validate _code_ in linear time to avoid denial of service attacks. So in practice, we can only partially meet these requirements. Our validation algorithm does not consider the codes data and computations, only its control flow and stack use. This means we will reject programs with any invalid _code_ paths, even if those paths are not reachable at runtime.
We would like to consider EVM _code_ valid iff no execution of the program can lead to an exceptional halting state, but we must be able to validate _code_ in linear time to avoid denial of service attacks. So in practice, we can only partially meet these requirements. Our validation rules do not consider the _code's_ data and computations, only its control flow and stack use. This means we will reject programs with any invalid _code_ paths, even if those paths are not reachable at runtime.

### The Rules
### Validation Rules

> This section extends the contact creation validation rules (as defined in EIP-3540.)
0. All instructions are valid.
1. Every `JUMP` and `JUMPI` either
* matches at least one tuple in the `vjumptable` or the `vtraptable`, or
* is _static_
Expand All @@ -102,19 +97,33 @@ We would like to consider EVM _code_ valid iff no execution of the program can l
* the same on every path through an opcode.
3. The `stack pointer` is always positive and at most 1024.

A _JUMP_ or _JUMPI_ instruction matches a tuple in a table if the first, `jumpsrc` element equals that instructions offset in the _code_.
A _JUMP_ or _JUMPI_ instruction matches a tuple in a table if the first, `jumpsrc` element equals that instruction's offset in the _code_.

We consider a _JUMP_ or _JUMPI_ instruction to be _static_ if its `jumpsrc` argument was first placed on the stack as a constant (e.g. via a `PUSH…`), and its value has not changed since, other than by a `DUP…` or `SWAP…`.

We need to define `stack depth`. The Yellow Paper has the `stack pointer` (`SP`) pointing just past the top item on the `data stack`. We define the `stack base` (`BP`)as the element that the `SP` addressed at the entry to the current _basic block_, or `0` on program entry. So we can define the `stack depth` as the number of stack elements between the current `SP` and the current `BP`.
The Yellow Paper has the `stack pointer` (`SP`) pointing just past the top item on the `data stack`. We define the `stack base` (`BP`) as the element that the `SP` addressed at the entry to the current _basic block_, or `0` on program entry, and the `stack depth` as the number of stack elements between the current `SP` and the current `BP`.

Taken together, these rules allow for code to be validated by traversing the control-flow graph, following each edge only once.

## Rationale

The alternative to checking validity at creation time is checking it at runtime. This hurts performance and is a denial of service vulnerability. Thus the above rules and accompanying one-pass validation algorithm.

_Rule 1_ – requiring static or previously tabulated destinations for `JUMP` and `JUMPI` – simplifies static jumps and constrains dynamic jumps.
* Jump destinations are currently checked at runtime, but static jumps can be validated at creation time.
* Requiring the possible destinations of dynamic jumps to be tabulated in advance allows for tractable bytecode traversal for creation-time validation: a jump table takes up space proportional to the number of jump destinations, so attempting to attack the validation algorithm with large numbers of jump destinations will also reduce the available space for _code_ to be validated.

_Rule 2_ – requiring positive, consistent stack depth – ensures sufficient stack. Exceptions can be caused by some irreducible paths like jumping into loops and subroutines, and by calling subroutines with insufficient numbers of arguments.

_Rule 3_ – bounding the `stack pointer` – catches all stack overflows that occur without recursion.

#### Validation Algorithm
## Validation Algorithm

> This section specifies an algorithm for checking the above the rules. Equivalent code must be run at creation time. We assume that the validation defined in EIP-3540 and EIP-3670 has already run to check _Rule 0_, although in practice the algorithms can be merged.

The following is a pseudo-Go implementation of an algorithm for enforcing program validity. This algorithm is a symbolic execution of the program that recursively traverses the bytecode, following its control flow and stack use and checking for violations of the rules above. It uses a stack to track the slots that hold `PUSHed` constants, from which it pops the destinations to validate during the analysis.

This algorithm runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where edges represent control flow and the vertices represent basic blocks – thus the algorithm takes time proportional to the size of the bytecode.
This algorithm runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where edges represent control flow and the vertices represent _basic blocks_ – thus the algorithm takes time proportional to the size of the bytecode.

For simplicity's sake we assume a few helper functions.
* `advance_pc()` advances the `PC`, skipping any immediate data.
Expand Down Expand Up @@ -186,7 +195,7 @@ func validate(pc := 0, depth := 0) boolean {
return false
}

// will enter block at destination
// will enter basic block at destination
bp = sp

// reset pc to destination of jump
Expand All @@ -202,7 +211,7 @@ func validate(pc := 0, depth := 0) boolean {
}
cond := stack[sp++]

// will enter block at destination or next instruction
// will enter basic block at destination or next instruction
bp = sp

// false side of conditional -- continue to next instruction
Expand Down Expand Up @@ -231,23 +240,11 @@ func validate(pc := 0, depth := 0) boolean {
}
```

## Rationale

The alternative to checking validity at creation time is checking it at runtime. This hurts performance and is a denial of service vulnerability. Thus the above rules and accompanying one-pass validation algorithm.

_Rule 1_ – requiring static or previously tabulated destinations for `JUMP` and `JUMPI` – simplifies static jumps and constrains dynamic jumps.
* Jump destinations are currently checked at runtime, but static jumps can be validated at creation time.
* Requiring the possible destinations of dynamic jumps to be tabulated in advance allows for tractable bytecode traversal for creation-time validation: a jump table takes up space proportional to the number of jump destinations, so attempting to attack the validation algorithm with large numbers of jump destinations will also reduce the available space for _code_ to be validated.

_Rule 2_ – requiring positive, consistent stack depth – ensures sufficient stack. Exceptions can be caused by irreducible paths like jumping into loops and subroutines, and by calling subroutines with insufficient numbers of arguments.

_Rule 3_ – bounding the `stack pointer` – catches all stack overflows that occur without recursion.

Taken together, these rules allow for code to be validated by traversing the control-flow graph, following each edge only once.

## Backwards Compatibility

These changes affect the semantics of existing EVM code – the use of JUMP, JUMPI, and the stack are restricted, such that some _code_ that would always run correctly will nonetheless be invalid EVM _code_.
These changes affect the semantics of EVM code – the use of `JUMP`, `JUMPI`, and the stack are restricted, such that some _code_ that would otherwise run correctly will nonetheless be invalid EVM _code_.

These changes and associated security guarantees are compatible with the valid use of the control-flow operations introduced by [EIP-2315: Simple Subroutines for the EVM](./eip-2315.md).

## Security Considerations

Expand Down