Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Non-determinism unit test #2414

Closed
3 tasks
Tracked by #2061 ...
evan-forbes opened this issue Sep 4, 2023 · 8 comments · Fixed by #3513
Closed
3 tasks
Tracked by #2061 ...

Non-determinism unit test #2414

evan-forbes opened this issue Sep 4, 2023 · 8 comments · Fixed by #3513
Assignees
Labels
nice to have item is not blocking or required. priority:normal optional label to track the relative priority of planned items testing items that are strictly related to adding or extending test coverage WS: Maintenance 🔧 includes bugs, refactors, flakes, and tech debt etc WS: V2 ✌️ lemongrass hardfork related
Milestone

Comments

@evan-forbes
Copy link
Member

evan-forbes commented Sep 4, 2023

Description

Write a non-determinism unit test for the state machine. The test will involve creating a set of transactions, executing them, and comparing the resulting apphash with an expected value.

Acceptance Criteria

  • Submit and execute txs using txsim
  • Should cover v1 and v2
  • Assert expected state and data roots
@evan-forbes evan-forbes added testing items that are strictly related to adding or extending test coverage compatability labels Sep 4, 2023
This was referenced Sep 20, 2023
@evan-forbes evan-forbes added this to the v2 milestone Nov 13, 2023
@evan-forbes
Copy link
Member Author

the goal of this test is to ensure that whatever branch is capable of syncing from scratch. In theory, we can do this with just unit tests, however there are so many different hyper werid edge cases that we don't know what all of those are yet.

for example, we could execute one transaction per block, for every possible transaction type with the v1 binary, save those roots somewhere, and then compare the roots against whatever the result is of the other branch.

we could maybe also run some deterministic (same inputs to each) fuzzing script on each version, and save those results. whatever branch would have to have identical results to process proposal.

@evan-forbes evan-forbes changed the title Sync mainnet once a week using the latest commit from the relevant release branch Test for compatability between main and all previous versions Jan 29, 2024
@evan-forbes evan-forbes added the WS: V2 ✌️ lemongrass hardfork related label Feb 27, 2024
@rootulp
Copy link
Collaborator

rootulp commented Mar 12, 2024

Just noting the OP proposes two distinct tests so we could split this into two issues:

  1. Create a test that compares the results of processing and executing txs on the main branch for each app version.
  2. Create a significantly longer test that syncs mainnet from scratch using a block data archive node.

IMO the first one is more valuable than the second because I expect we'll be able to run the first one in CI and make it a required check prior to PR merge. I expect the second test will take much longer to run and may only be able to run via the nightly (or weekly 😨) CI jobs. That means we'll get delayed signal via the second test. If we're committing to single binary syncs then IMO we still need the second test. But ideally the first test gives us confidence to merge PRs and the second test gives us confidence that new release candidates still preserve the single binary feature.

for example, we could execute one transaction per block, for every possible transaction type with the v1 binary, save those roots somewhere, and then compare the roots against whatever the result is of the other branch.

I really like this idea.

@rootulp rootulp added the nice to have item is not blocking or required. label Mar 12, 2024
@rootulp
Copy link
Collaborator

rootulp commented Mar 12, 2024

We discussed this in a sync and came to the conclusion that this test:

  1. Create a test that compares the results of processing and executing txs on the main branch for each app version.

is a nice to have.

  1. Create a significantly longer test that syncs mainnet from scratch using a block data archive node.

is release blocking but it can be performed manually via the mainnet.sh script.

@rootulp
Copy link
Collaborator

rootulp commented Mar 15, 2024

Evan and I paired on a prototype for the first test: https://github.com/celestiaorg/celestia-app/tree/rp/pair-with-evan

@rootulp rootulp added needs:triage priority:low optional label to track the relative priority of planned items and removed needs:triage labels Apr 9, 2024
@evan-forbes
Copy link
Member Author

evan-forbes commented May 13, 2024

If we take the route of expanding the existing knuu infrastructure, we could run this test directly after a PR is merged instead of nightly.

@rootulp rootulp added priority:normal optional label to track the relative priority of planned items and removed priority:low optional label to track the relative priority of planned items labels May 16, 2024
@rootulp
Copy link
Collaborator

rootulp commented May 16, 2024

Spun out #3490 so that this issue can focus on the first test in the OP:

Create a test that compares the results of processing and executing txs on the main branch for each app version.

@rootulp
Copy link
Collaborator

rootulp commented May 17, 2024

Notes:

  • We've observed flakiness with knuu tests so we're less inclined to add an additional test using that infra. cc: @ninabarbakadze

Ideas:

  1. Consider writing a manual version of this test using local_devnet. Note: if we do this, we should add a guide on how to create releases and add running the manual test as a step in the release guide.
  2. Consider exploring how other chains (Cosmos Hub, Osmosis) test for version compatibility. However it's likely they don't have coverage for our use case (rolling binary upgrades).
  3. Consider continuing with https://github.com/celestiaorg/celestia-app/tree/rp/pair-with-evan

@cmwaters
Copy link
Contributor

So my take on this is that we already have a script for syncing a node from genesis. Which we should use before every release. If we want it to be faster then we can have some beefy machine we run and just spin up a new node in the same machine (to skip the p2p overhead). Or if we wanted this to be a unit test we could persists only the block and state stores and run the application over the set of transactions (kind of like a local block sync) and compare the app hashes.

The second thing I would advocate for is to add the main docker image to the TestMinorVersionCompatibility. That way we have main (using v1) and all our other v1 tags working together. If we feel txsim is not doing a sufficient job we can expand on the amount of different messages it sends.

When we upgrade to v2, we will need to modify TestMinorVersionCompatibility to move from v1 state machine to v2 state machine for future v2 minor releases. We could create more elaborate tests but I want to review our single binary syncs approach before continuing.

@ninabarbakadze ninabarbakadze self-assigned this May 27, 2024
@ninabarbakadze ninabarbakadze changed the title Test for compatability between main and all previous versions Non-determinism unit test May 27, 2024
@evan-forbes evan-forbes added the WS: Maintenance 🔧 includes bugs, refactors, flakes, and tech debt etc label May 27, 2024
ninabarbakadze added a commit that referenced this issue Jun 10, 2024
## Overview

adding genesis to utils as part of #2414
ninabarbakadze added a commit that referenced this issue Jun 10, 2024
## Overview

Fixes #2414 

### This test is split up in a few parts:

- Execute transactions deterministically on v1 and save the AppHash.
#3522 ([includes genesis
folder backport from
utils](#3520))
- Set up the same testApp environment on main, execute transactions and
assert that the AppHash matches the one from v1

### Verifying the AppHash: 

If you want to make sure that the expected AppHash is correct, you can
run the [non determinism test on
v1.x](#3522) that
generates it.

### Follow-ups:
#3509 
#3540

---------

Co-authored-by: Rootul P <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
nice to have item is not blocking or required. priority:normal optional label to track the relative priority of planned items testing items that are strictly related to adding or extending test coverage WS: Maintenance 🔧 includes bugs, refactors, flakes, and tech debt etc WS: V2 ✌️ lemongrass hardfork related
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants