MTG-1057 Add consistency check tools #343

Open
wants to merge 16 commits into base: new-main
583 changes: 568 additions & 15 deletions Cargo.lock

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions Cargo.toml
@@ -12,8 +12,8 @@ members = [
"tests/setup",
"backfill_rpc",
"integrity_verification",
-"integration_tests"
-]
+"integration_tests",
+"consistency_check"]

[workspace.dependencies]

41 changes: 41 additions & 0 deletions consistency_check/Cargo.toml
@@ -0,0 +1,41 @@
[package]
name = "consistency_check"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
clap = { workspace = true, features = ["env"] }
tokio = { workspace = true, features = ["sync"] }
tokio-util = { workspace = true }
solana-sdk = "~1.18.13"
Contributor

Why not the version from the workspace? Same question applies to other versions below

solana-client = "~1.18.13"
solana-accounts-db = "1.18.13"
solana-runtime = "~1.18.13"
solana-frozen-abi-macro = "~1.18.13"
mpl-bubblegum = { workspace = true }
csv = { workspace = true}
tempfile = { workspace = true }
rocks-db = { path = "../rocks-db" }
metrics-utils = { path = "../metrics_utils" }
nft_ingester = { path = "../nft_ingester" }
spl-concurrent-merkle-tree = { version = "0.4.0" }
usecase = { path = "../usecase" }
indicatif = { workspace = true }
serde_json = { workspace = true }
serde = { workspace = true }
bincode = { workspace = true }
tar = { workspace = true }
zstd = "0.12.4"
memmap2 = "0.9.0"
thiserror = { workspace = true }
lazy_static = { workspace = true }
tracing = { workspace = true }
tracing-subscriber = { workspace = true}

[[bin]]
name = "compressed_assets"

[[bin]]
name = "regular_assets"
63 changes: 63 additions & 0 deletions consistency_check/README.md
@@ -0,0 +1,63 @@
# Consistency check tools

This crate has two binaries to check data consistency in the DB.

## Compressed assets

Binary `compressed_assets` is taking `csv` file with tree keys and check proof for each minted asset in a tree.
Contributor

Suggested change
- Binary `compressed_assets` is taking `csv` file with tree keys and check proof for each minted asset in a tree.
+ The binary `compressed_assets` takes a `csv` file with tree keys and checks the proof for each minted asset in a tree.


Here is example of `csv` file it expects to receive:
Contributor

Suggested change
- Here is example of `csv` file it expects to receive:
+ Here is an example of the `csv` file it expects to receive:


```csv
5wmXasetQTJJ54L3MJta8a4TNPX9piBDneRsC2m2x3Lw
5c5GTTkDVHerDyvXM3gb8bF63AT1ejQ1KzRkJYF7YUnL
DyxaLr1TwhQxD39jdgCYcScZouT815tuHpcLbjEz7ejo
5mEWS3Nzi4seDVLTm8eozdYzMb9vmADzgaT5mG9hfbHm
6B1xTnmCY7naTCJaxKsT6GCRUpZ6NTvJezSowEespk8a
ErmSicq5YrwGdhsvKbzvcotb11ygvHJbVQ1XAJTpmBYc
EDR6ywjZy9pQqz7UCCx3jzCeMQcoks231URFDizJAUNq
Ude9FcHfavnXhPWAUjvPZQ2sbVwJ6bJowQs7FA7nVJg
BrDVQPfTAUCFo5YtJSALM8jt71cW3fGCKYGPDsCSp1rS
```
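
As a rough illustration of the input format above, the following Rust sketch reads such a file's contents, one tree key per line. It is not the crate's actual loader (the manifest lists the `csv` crate for that). The names `parse_tree_keys` and `looks_like_pubkey` are hypothetical, and the check only confirms that each line has the shape of a base58-encoded 32-byte key (32 to 44 characters from the base58 alphabet) without decoding it.

```rust
/// Base58 alphabet used by Solana addresses (no `0`, `O`, `I`, `l`).
const BASE58: &str = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";

/// Shape check only: length and alphabet. This does not decode the key.
fn looks_like_pubkey(s: &str) -> bool {
    (32..=44).contains(&s.len()) && s.chars().all(|c| BASE58.contains(c))
}

/// Hypothetical helper: parses the contents of a trees file,
/// one key per line, skipping blank lines.
fn parse_tree_keys(input: &str) -> Result<Vec<String>, String> {
    let mut keys = Vec::new();
    for (idx, line) in input.lines().enumerate() {
        let key = line.trim();
        if key.is_empty() {
            continue;
        }
        if !looks_like_pubkey(key) {
            // Report the 1-based line number so the file is easy to fix.
            return Err(format!("line {}: not a base58 public key: {}", idx + 1, key));
        }
        keys.push(key.to_string());
    }
    Ok(keys)
}
```

Rejecting a malformed line before any RPC call is a design choice of this sketch; the tool itself may instead record such trees in its failed checks output.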

Launch command:

```
cargo r --bin compressed_assets -- --rpc-endpoint https://solana.rpc --db-path /path/to/rocksdb --trees-file-path ./trees.csv --workers 50 --inner-workers 300
```

Workers parameter points how many trees will be processed in parallel.
Contributor

Suggested change
- Workers parameter points how many trees will be processed in parallel.
+ The `workers` parameter points to how many trees will be processed in parallel.


Inner workers parameter points how many threads each worker will use to process the tree.
Contributor

Suggested change
- Inner workers parameter points how many threads each worker will use to process the tree.
+ The `inner-workers` parameter points to how many threads each worker will use to process the tree.

After launch it will show progress bar with information about how much assets and trees already processed.
Contributor

Suggested change
- After launch it will show progress bar with information about how much assets and trees already processed.
+ After launch, it will show a progress bar with information about how many assets and trees have already been processed.


Once it finishes its job it will create two `csv` files: `failed_checks.csv` and `failed_proofs.csv`.

Failed checks file contains tree addresses which were not processed because of some errors, it could RPC error or tree config account deserialisation error.
Contributor

Suggested change
- Failed checks file contains tree addresses which were not processed because of some errors, it could RPC error or tree config account deserialisation error.
+ The Failed checks file contains tree addresses that were not processed because of some errors, which could be an RPC error or tree config account deserialization error.


Failed proofs file contains data like `treeID,assetID`, it shows assets which has invalid proofs.
Contributor

Suggested change
- Failed proofs file contains data like `treeID,assetID`, it shows assets which has invalid proofs.
+ The Failed proofs file contains data in format `treeID, assetID`, and it shows assets that have invalid proofs.
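
For anyone post-processing the report, a minimal sketch of reading `treeID,assetID` rows and grouping the failed assets per tree could look like the following. It assumes one pair per line, no header row and no quoted fields, none of which is confirmed by the README. `group_failed_proofs` is a hypothetical name.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: groups `treeID,assetID` rows by tree.
/// Lines without a comma (including blank lines) are skipped.
fn group_failed_proofs(input: &str) -> BTreeMap<String, Vec<String>> {
    let mut by_tree: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for line in input.lines() {
        if let Some((tree, asset)) = line.trim().split_once(',') {
            by_tree
                .entry(tree.trim().to_string())
                .or_default()
                .push(asset.trim().to_string());
        }
    }
    by_tree
}
```

Trimming both fields means the sketch accepts rows written with or without a space after the comma.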


## Regular assets

Binary `regular_assets` is taking Solana accounts snapshot and verifies that DB is not missing any key from snapshot.
Contributor

Suggested change
- Binary `regular_assets` is taking Solana accounts snapshot and verifies that DB is not missing any key from snapshot.
+ The binary `regular_assets` takes a Solana accounts snapshot and verifies that the DB is not missing any key from the snapshot.
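
The core of that verification is a set-membership test: every key found in the snapshot should also be present in the DB. A minimal sketch, with an in-memory `HashSet` standing in for the RocksDB lookup and plain strings standing in for account keys (both are assumptions made for illustration, as is the name `find_missing_keys`):

```rust
use std::collections::HashSet;

/// Hypothetical helper: returns the snapshot keys that the DB does not contain,
/// in the order they appear in the snapshot.
fn find_missing_keys<'a, I>(snapshot_keys: I, db_keys: &HashSet<String>) -> Vec<&'a str>
where
    I: IntoIterator<Item = &'a str>,
{
    snapshot_keys
        .into_iter()
        // Keep only the keys the DB lookup cannot find.
        .filter(|key| !db_keys.contains(*key))
        .collect()
}
```

The check is one-directional: keys that exist in the DB but not in the snapshot are not reported by this sketch.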


Launch command:

```
cargo r --bin regular_assets -- --db-path /path/to/rocksdb --snapshot-path /path/to/snapshot.tar.zst --inner-workers 100
```

There are two threads spawned. One to check NFTs and one to check fungible tokens.

Parameter inner workers points how many threads each of that worker going to use to check if account is in DB.
Contributor

Suggested change
- Parameter inner workers points how many threads each of that worker going to use to check if account is in DB.
+ The parameter `inner-workers` points to how many threads each of those workers is going to use to check if the account is in the DB.


After launch it will show progress bar which shows how many assets iterated over. So that counter shows amount of keys in a snapshot but not number of NFTs.
Contributor

Suggested change
- After launch it will show progress bar which shows how many assets iterated over. So that counter shows amount of keys in a snapshot but not number of NFTs.
+ After launch, it will show a progress bar which shows how many assets have been iterated over. So that counter shows the number of keys in a snapshot but not the number of NFTs.


Once it finishes its job it will create three files: `missed_asset_data.csv`, `missed_mint_info.csv`, `missed_token_acc.csv`.

Missed asset data file contains NFTs for which we missed asset data. Asset data is complete details about NFT.
Contributor

Suggested change
- Missed asset data file contains NFTs for which we missed asset data. Asset data is complete details about NFT.
+ The Missed asset data file contains NFTs for which we missed asset data. Asset data is complete details about the NFT.


Missed mint info file contains mint addresses which we missed.
Contributor

Suggested change
- Missed mint info file contains mint addresses which we missed.
+ The Missed mint info file contains mint addresses that we missed.


Missed token acc file contains token accounts addresses which we missed.
Contributor

Suggested change
- Missed token acc file contains token accounts addresses which we missed.
+ The Missed token acc file contains token account addresses that we missed.
