Ethereum block delay too low #354

Open
andrey-kuprianov opened this issue Oct 13, 2021 · 0 comments
Surfaced from @informalsystems audit of Althea Gravity Bridge at commit 19a4cfe

severity: High
type: Protocol / Implementation bug
difficulty: Unknown

Involved artifacts

Description

In the Orchestrator's Ethereum event watcher, the function get_block_delay() determines the block processing delay with respect to the latest block observed on Ethereum:

/// Let's make a conservative assumption of 1% chance of an uncle being a two block deep reorg
/// (actual is closer to 0.3%) and assume that continues as we increase the depth.
/// Given an uncle every 2.8 minutes, a 6 deep reorg would be 2.8 minutes * (100^4) or one
/// 6 deep reorg every 53,272 years.
///
pub async fn get_block_delay(web3: &Web3) -> Uint256 {
    let net_version = get_net_version_with_retry(web3).await;

    match net_version {
        // Mainline Ethereum, Ethereum classic, or the Ropsten, Kotti, Mordor testnets
        // all POW Chains
        1 | 3 | 6 | 7 => 6u8.into(),
        // Dev, our own Gravity Ethereum testnet, and Hardhat respectively
        // all single signer chains with no chance of any reorgs
        2018 | 15 | 31337 => 0u8.into(),
        // Rinkeby and Goerli use Clique (POA) Consensus, finality takes
        // up to num validators blocks. Number is higher than Ethereum based
        // on experience with operational issues
        4 | 5 => 10u8.into(),
        // assume the safe option (POW) where we don't know
        _ => 6u8.into(),
    }
}

It is used uniformly in the function check_for_events(), in the following way (shown here only for deposits):

    let latest_block = get_block_number_with_retry(web3).await;
    let latest_block = latest_block - get_block_delay(web3).await;
    let deposits = web3.check_for_events(
            starting_block.clone(), Some(latest_block.clone()), ...
        ).await;    

As can be seen, the block delay is subtracted from the latest block number, and events are then retrieved and processed only up to the adjusted block number.

There are three problems with the above approach.

The Ethereum block delay is too low

The above calculation of probabilities is not well-justified: probability theory doesn't straightforwardly apply here, as Ethereum reorganizations are not independent random events. Correctly estimating a block delay is a risk assessment, and a very complicated process. We are not in a position to advise on the correct block delay number, but it is instructive to compare against the Ethereum block delays (confirmation counts) used by crypto exchanges.

The minimal block delay used by major crypto exchanges is 12, and the average is around 22, which is much larger than the 6 used by Gravity Bridge.

No detection / mitigation plan in case of a long reorg

No block delay can guarantee that a longer reorg won't happen eventually, as the DAO / Ethereum Classic case demonstrates well. While nothing can fully protect users against long reorgs, such events should at least be detected and mitigated. Gravity Bridge, however, doesn't include any detection or mitigation mechanism for such events, and will simply keep operating, possibly causing more damage than necessary.

No testing against reorgs

As can be seen in the above code, the testnets do not exercise reorgs at all, even up to the given bound: the block delay for the dev/test chains (Dev, the Gravity Ethereum testnet, Hardhat) is taken to be 0. This provides no confidence, not even a minimal one, that the code will behave correctly when deployed on the Ethereum main net.
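
To make this gap concrete: on a local dev node such as Hardhat or Ganache, a reorg can be forced through the standard evm_snapshot / evm_revert / evm_mine RPC methods. The sketch below is only illustrative and deliberately client-agnostic: force_reorg and its rpc closure are hypothetical helpers, not part of the audited code.

/// Illustrative sketch only: force a `depth`-block reorg on a Hardhat/Ganache
/// dev node. `rpc` is any JSON-RPC transport, given as a closure so that no
/// particular client library is assumed; it takes a method name and a JSON
/// params string and returns the raw JSON result.
fn force_reorg<R>(mut rpc: R, depth: u64) -> Result<(), String>
where
    R: FnMut(&str, &str) -> Result<String, String>,
{
    // Remember the current chain tip.
    let snapshot_id = rpc("evm_snapshot", "[]")?;

    // Mine `depth` blocks; an Orchestrator with a smaller block delay will
    // treat events in them as settled.
    for _ in 0..depth {
        rpc("evm_mine", "[]")?;
    }

    // Roll the chain back to the snapshot (snapshot_id is the raw JSON
    // result, e.g. "0x1", so it can be spliced into the params directly)...
    rpc("evm_revert", &format!("[{}]", snapshot_id))?;

    // ...and mine a replacement history of the same depth. From the
    // Orchestrator's point of view this is a `depth`-deep reorg.
    for _ in 0..depth {
        rpc("evm_mine", "[]")?;
    }
    Ok(())
}

Running the Orchestrator against such a node, with depths both below and above its assumed block delay, would exercise exactly the cases that are currently untested.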

Problem Scenarios

  • A reorg happens on the Ethereum main net with a length larger than the block delay currently assumed by Gravity Bridge (6)
    • This creates a possibility of double spends or other undesirable consequences;
    • Without a detection / mitigation plan, Gravity Bridge will proceed as if nothing has happened, possibly causing more damage.
  • The testnets use a 0 block delay
    • It is not clear how the code will behave even in the presence of reorgs up to the currently assumed block delay.

Recommendation

Short term

  • Undertake a risk assessment for the Gravity Bridge. At the very least, use the Ethereum block delays employed by crypto exchanges as a reference point, and increase the value used by Gravity Bridge to roughly their average (~20 blocks); see the sketch after this list.
  • In the testnets, undertake testing against Ethereum reorgs both up to and above the assumed block delay. Make sure that the code behaves correctly for reorgs up to the assumed block delay, and thoroughly understand the possible effects of longer reorgs.
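
As an illustration of the first point, here is a minimal sketch of how get_block_delay() could look after such an adjustment. The values (20 for POW chains, a non-zero delay even for the dev chains so that reorg handling is exercised in tests) are only indicative of the exchange average quoted above; they should come out of a proper risk assessment, not from this sketch.

pub async fn get_block_delay(web3: &Web3) -> Uint256 {
    let net_version = get_net_version_with_retry(web3).await;

    match net_version {
        // Mainline Ethereum, Ethereum Classic, or the Ropsten, Kotti, Mordor
        // testnets (all POW chains): align with the exchange average (~20)
        // instead of 6.
        1 | 3 | 6 | 7 => 20u8.into(),
        // Dev, the Gravity Ethereum testnet, and Hardhat: keep a non-zero
        // delay so that the reorg-handling code path is exercised in tests.
        2018 | 15 | 31337 => 6u8.into(),
        // Rinkeby and Goerli (Clique / POA consensus).
        4 | 5 => 10u8.into(),
        // Unknown network: assume the safe (POW) option.
        _ => 20u8.into(),
    }
}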

Long term

  • Implement detection / mitigation mechanisms for the case when an Ethereum reorg longer than the assumed block delay happens. The detection mechanism should ensure, at the very least, that such an event is detected and communicated to validators. The mitigation mechanism should probably simply halt Gravity Bridge operation; it should also provide a way to resume operation after damage assessment and repair (e.g. in the form of a special message signed by a quorum of validators). A sketch of such a detection mechanism follows below.
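
A minimal sketch of such a detection mechanism is given below. It is deliberately independent of the web30 API: the canonical-hash lookup is passed in as a closure, since we do not want to presume any particular client call here. The idea is to remember the hashes of blocks whose events have already been processed, and to periodically re-check that those blocks are still canonical; if any remembered hash changes, a reorg deeper than the block delay has happened.

use std::collections::VecDeque;

/// Illustrative sketch only, not part of the audited code.
/// Remembers (height, hash) pairs for blocks whose events have already been
/// processed, and re-checks them against the chain currently reported by the
/// Ethereum node.
struct ReorgDetector {
    /// Recently processed blocks, oldest first.
    seen: VecDeque<(u64, [u8; 32])>,
    /// How many processed blocks to keep for re-checking.
    window: usize,
}

impl ReorgDetector {
    fn new(window: usize) -> Self {
        Self { seen: VecDeque::new(), window }
    }

    /// Record a block whose events have been processed and attested to.
    fn record(&mut self, height: u64, hash: [u8; 32]) {
        self.seen.push_back((height, hash));
        while self.seen.len() > self.window {
            self.seen.pop_front();
        }
    }

    /// Re-check all remembered blocks. `canonical_hash` is any function that
    /// returns the hash the node currently reports for a given height.
    /// Returns the first height at which the remembered hash no longer
    /// matches, i.e. evidence of a reorg deeper than the block delay.
    fn check<F>(&self, canonical_hash: F) -> Option<u64>
    where
        F: Fn(u64) -> [u8; 32],
    {
        for &(height, hash) in &self.seen {
            if canonical_hash(height) != hash {
                return Some(height);
            }
        }
        None
    }
}

If check() ever returns a height, the Orchestrator should stop attesting to Ethereum events and surface the incident to validators; resuming operation could then be gated on the quorum-signed message mentioned above.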