Non-determinism unit test #2414

evan-forbes · 2023-09-04T11:30:42Z

Description

Write a non-determinism unit test for the state machine. The test will involve creating a set of transactions, executing them, and comparing the resulting apphash with an expected value.

Acceptance Criteria

Submit and execute txs using txsim
Should cover v1 and v2
Assert expected state and data roots

evan-forbes · 2024-01-12T21:24:47Z

the goal of this test is to ensure that whatever branch is capable of syncing from scratch. In theory, we can do this with just unit tests, however there are so many different hyper werid edge cases that we don't know what all of those are yet.

for example, we could execute one transaction per block, for every possible transaction type with the v1 binary, save those roots somewhere, and then compare the roots against whatever the result is of the other branch.

we could maybe also run some deterministic (same inputs to each) fuzzing script on each version, and save those results. whatever branch would have to have identical results to process proposal.

rootulp · 2024-03-12T02:13:29Z

Just noting the OP proposes two distinct tests so we could split this into two issues:

Create a test that compares the results of processing and executing txs on the main branch for each app version.
Create a significantly longer test that syncs mainnet from scratch using a block data archive node.

IMO the first one is more valuable than the second because I expect we'll be able to run the first one in CI and make it a required check prior to PR merge. I expect the second test will take much longer to run and may only be able to run via the nightly (or weekly 😨) CI jobs. That means we'll get delayed signal via the second test. If we're committing to single binary syncs then IMO we still need the second test. But ideally the first test gives us confidence to merge PRs and the second test gives us confidence that new release candidates still preserve the single binary feature.

for example, we could execute one transaction per block, for every possible transaction type with the v1 binary, save those roots somewhere, and then compare the roots against whatever the result is of the other branch.

I really like this idea.

rootulp · 2024-03-12T15:30:19Z

We discussed this in a sync and came to the conclusion that this test:

Create a test that compares the results of processing and executing txs on the main branch for each app version.

is a nice to have.

Create a significantly longer test that syncs mainnet from scratch using a block data archive node.

is release blocking but it can be performed manually via the mainnet.sh script.

rootulp · 2024-03-15T15:06:52Z

Evan and I paired on a prototype for the first test: https://github.com/celestiaorg/celestia-app/tree/rp/pair-with-evan

evan-forbes · 2024-05-13T16:09:33Z

If we take the route of expanding the existing knuu infrastructure, we could run this test directly after a PR is merged instead of nightly.

rootulp · 2024-05-16T19:27:05Z

Spun out #3490 so that this issue can focus on the first test in the OP:

Create a test that compares the results of processing and executing txs on the main branch for each app version.

rootulp · 2024-05-17T18:12:59Z

Notes:

We've observed flakiness with knuu tests so we're less inclined to add an additional test using that infra. cc: @ninabarbakadze

Ideas:

Consider writing a manual version of this test using local_devnet. Note: if we do this, we should add a guide on how to create releases and add running the manual test as a step in the release guide.
Consider exploring how other chains (Cosmos Hub, Osmosis) test for version compatibility. However it's likely they don't have coverage for our use case (rolling binary upgrades).
Consider continuing with https://github.com/celestiaorg/celestia-app/tree/rp/pair-with-evan

cmwaters · 2024-05-23T10:54:47Z

So my take on this is that we already have a script for syncing a node from genesis. Which we should use before every release. If we want it to be faster then we can have some beefy machine we run and just spin up a new node in the same machine (to skip the p2p overhead). Or if we wanted this to be a unit test we could persists only the block and state stores and run the application over the set of transactions (kind of like a local block sync) and compare the app hashes.

The second thing I would advocate for is to add the main docker image to the TestMinorVersionCompatibility. That way we have main (using v1) and all our other v1 tags working together. If we feel txsim is not doing a sufficient job we can expand on the amount of different messages it sends.

When we upgrade to v2, we will need to modify TestMinorVersionCompatibility to move from v1 state machine to v2 state machine for future v2 minor releases. We could create more elaborate tests but I want to review our single binary syncs approach before continuing.

## Overview adding genesis to utils as part of #2414

## Overview Fixes #2414 ### This test is split up in a few parts: - Execute transactions deterministically on v1 and save the AppHash. #3522 ([includes genesis folder backport from utils](#3520)) - Set up the same testApp environment on main, execute transactions and assert that the AppHash matches the one from v1 ### Verifying the AppHash: If you want to make sure that the expected AppHash is correct, you can run the [non determinism test on v1.x](#3522) that generates it. ### Follow-ups: #3509 #3540 --------- Co-authored-by: Rootul P <[email protected]>

evan-forbes added testing items that are strictly related to adding or extending test coverage compatability labels Sep 4, 2023

This was referenced Sep 20, 2023

Version compatibility testing #313

Closed

EPIC: Knuu e2e tests #2061

Closed

evan-forbes added this to the v2 milestone Nov 13, 2023

evan-forbes changed the title ~~Sync mainnet once a week using the latest commit from the relevant release branch~~ Test for compatability between main and all previous versions Jan 29, 2024

evan-forbes added the WS: V2 ✌️ lemongrass hardfork related label Feb 27, 2024

rootulp added the nice to have item is not blocking or required. label Mar 12, 2024

rootulp added needs:triage priority:low optional label to track the relative priority of planned items and removed needs:triage labels Apr 9, 2024

evan-forbes mentioned this issue May 13, 2024

EPIC: Preventing and preparing for non-determinism between app versions #3475

Closed

4 tasks

rootulp added priority:normal optional label to track the relative priority of planned items and removed priority:low optional label to track the relative priority of planned items labels May 16, 2024

rootulp mentioned this issue May 16, 2024

Automate the sync from genesis test #3490

Open

rootulp removed the compatability label May 24, 2024

ninabarbakadze self-assigned this May 27, 2024

ninabarbakadze changed the title ~~Test for compatability between main and all previous versions~~ Non-determinism unit test May 27, 2024

evan-forbes added the WS: Maintenance 🔧 includes bugs, refactors, flakes, and tech debt etc label May 27, 2024

This was referenced Jun 3, 2024

test: consistent appHash between commits #3513

Merged

chore: add test/util/genesis on v1.x #3520

Merged

ninabarbakadze added a commit that referenced this issue Jun 10, 2024

chore: add test/util/genesis on v1.x (#3520)

9c6322c

## Overview adding genesis to utils as part of #2414

ninabarbakadze closed this as completed in #3513 Jun 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-determinism unit test #2414

Non-determinism unit test #2414

evan-forbes commented Sep 4, 2023 •

edited by ninabarbakadze

Loading

evan-forbes commented Jan 12, 2024

rootulp commented Mar 12, 2024 •

edited

Loading

rootulp commented Mar 12, 2024

rootulp commented Mar 15, 2024

evan-forbes commented May 13, 2024 •

edited

Loading

rootulp commented May 16, 2024

rootulp commented May 17, 2024 •

edited

Loading

cmwaters commented May 23, 2024

Non-determinism unit test #2414

Non-determinism unit test #2414

Comments

evan-forbes commented Sep 4, 2023 • edited by ninabarbakadze Loading

Description

Acceptance Criteria

evan-forbes commented Jan 12, 2024

rootulp commented Mar 12, 2024 • edited Loading

rootulp commented Mar 12, 2024

rootulp commented Mar 15, 2024

evan-forbes commented May 13, 2024 • edited Loading

rootulp commented May 16, 2024

rootulp commented May 17, 2024 • edited Loading

cmwaters commented May 23, 2024

evan-forbes commented Sep 4, 2023 •

edited by ninabarbakadze

Loading

rootulp commented Mar 12, 2024 •

edited

Loading

evan-forbes commented May 13, 2024 •

edited

Loading

rootulp commented May 17, 2024 •

edited

Loading