Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

verify_bank_hash failed: MismatchedAccountHash: Tip of master is still not compatible with devnet/mainnet-beta #10163

Closed
mvines opened this issue May 21, 2020 · 12 comments
Assignees
Milestone

Comments

@mvines
Copy link
Member

mvines commented May 21, 2020

I believe the last remaining rolling update to v1.1 has now activated and master/v1.1 should be compatible again. However starting a master validator, thusly:

solana-validator \
    --trusted-validator 5D1fNXzvv5NjV1ysLjirC4WY92RNsVH18vjmcszZd8on \
    --no-untrusted-rpc \
    --ledger ledger-testnet \
    --rpc-port 8899 \
    --dynamic-port-range 8000-8010 \
    --entrypoint 35.203.170.30:8001 \
    --expected-genesis-hash 4uhcVJyU9pJkvQyS88uRDiswHXSCkY3zQawwpjk2NsNY \
    --expected-shred-version 56096 \
    --enable-rpc-transaction-history \
    -o -

produces a bank hash mismatch on snapshot loading, eg:

WARN  solana_runtime::accounts] verify_bank_hash failed: MismatchedAccountHash
thread 'main' panicked at 'Snapshot bank for slot 14786165 failed to verify', ledger/src/snapshot_utils.rs:478:9

Status update is here: #10163 (comment)

@ryoqun
Copy link
Member

ryoqun commented May 21, 2020

Sorry, my bad.... I found a hunch for this: https://github.com/solana-labs/solana/pull/10166/files#r428900953

@CriesofCarrots
Copy link
Contributor

I saw this against devnet last night as well

@ryoqun
Copy link
Member

ryoqun commented May 21, 2020

@CriesofCarrots Thanks for the heads-up. I'll look into it as well!

@ryoqun
Copy link
Member

ryoqun commented May 22, 2020

First, our sanity is still held; recent tds and devnet snapshots are good on v1.1 branch:

$ ./target/release/solana-ledger-tool verify --ledger data/tds
[2020-05-22T08:12:14.170893233Z INFO  solana_ledger::hardened_unpack] Extracting "/home/ubuntu/solana-git/data/solana-git/data/tds/genesis.tar.bz2"...
[2020-05-22T08:12:14.180453803Z INFO  solana_ledger::hardened_unpack] Extracted "/home/ubuntu/solana-git/data/solana-git/data/tds/genesis.tar.bz2" in 9.522254ms
4uhcVJyU9pJkvQyS88uRDiswHXSCkY3zQawwpjk2NsNY
[2020-05-22T08:12:14.183034063Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-05-22T08:12:14.183054402Z INFO  solana_ledger::blockstore] Opening database at "/home/ubuntu/solana-git/data/solana-git/data/tds/rocksdb"
[2020-05-22T08:12:14.268429827Z INFO  solana_ledger::blockstore] "/home/ubuntu/solana-git/data/solana-git/data/tds/rocksdb" open took 85ms
[2020-05-22T08:12:14.268569481Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "/home/ubuntu/solana-git/data/solana-git/data/tds/snapshot"
[2020-05-22T08:12:14.269152281Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "/home/ubuntu/solana-git/data/solana-git/data/tds/snapshot-14911381-35nyeZohe4QHAuPkK8EzfnFjpMAPkrBgoj2Sv6bb6dcU.tar.bz2"
[2020-05-22T08:12:25.523066085Z INFO  solana_ledger::snapshot_utils] snapshot untar took 11.3s
[2020-05-22T08:12:25.523118679Z INFO  solana_ledger::snapshot_utils] snapshot version: 1.1.0
[2020-05-22T08:12:25.523149667Z INFO  solana_ledger::snapshot_utils] Loading bank from "/home/ubuntu/solana-git/data/solana-git/data/tds/snapshot/.tmp8HOgkN/snapshots/14911381/14911381"
[2020-05-22T08:12:25.538092833Z INFO  solana_ledger::snapshot_utils] Rebuilding accounts...
[2020-05-22T08:12:27.382789411Z INFO  solana_ledger::snapshot_utils] Rebuilding status cache...
[2020-05-22T08:12:27.398165177Z INFO  solana_ledger::snapshot_utils] Loaded bank for slot: 14911381
[2020-05-22T08:12:27.398536535Z INFO  solana_runtime::accounts_db] total_stores: 23867, newest_slot: 14911381, oldest_slot: 0, max_slot: 14911380 (num=21), min_slot: 14849535 (num=1)
[2020-05-22T08:12:27.398576622Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2020-05-22T08:12:27.398781801Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=23867i
[2020-05-22T08:12:27.792689270Z INFO  solana_runtime::accounts_db] scan took 120us merge took 216ns accumulate took 30us
[2020-05-22T08:12:27.792751935Z INFO  solana_runtime::bank] bank frozen: 14911381 hash: EydkG6hN2ViSC61WyUxKKKbZCNQyArGQbsnsiStvJJmV accounts_delta: 3XPAEf6TSEuLhgn7TjBkrPqjpn5y1TPppz5vL2TQ1H3X signature_count: 9 last_blockhash: E8qsL6Rx1SjPKGDz8A5kqjoQjhxfwkmiVbJTCpNAYpaw capitalization: 1111022875239950860
[2020-05-22T08:12:27.792782676Z INFO  solana_runtime::bank] accounts hash slot: 14911381 stats: BankHashStats { num_removed_accounts: 18, num_added_accounts: 0, num_lamports_stored: 7872659212535785, total_data_len: 180519, num_executable_accounts: 0 }
[2020-05-22T08:12:27.792794507Z INFO  solana_ledger::snapshot_utils] bank rebuild from snapshot took 2.3s
[2020-05-22T08:12:27.829506487Z INFO  solana_ledger::blockstore_processor] processing ledger from slot 14911381...
[2020-05-22T08:12:27.928011940Z INFO  solana_ledger::blockstore_processor] ledger processed in 97ms. 92 MB allocated. 1 fork at 14911381, with 1 frozen bank
Ok
$ stat data/tds/snapshot-14911381-35nyeZohe4QHAuPkK8EzfnFjpMAPkrBgoj2Sv6bb6dcU.tar.bz2
  File: data/tds/snapshot-14911381-35nyeZohe4QHAuPkK8EzfnFjpMAPkrBgoj2Sv6bb6dcU.tar.bz2
  Size: 18621942        Blocks: 36376      IO Block: 4096   regular file
Device: 34h/52d Inode: 13071788    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  ubuntu)   Gid: ( 1000/  ubuntu)
Access: 2020-05-22 17:12:14.268868021 +0900
Modify: 2020-05-22 17:11:33.822432815 +0900
Change: 2020-05-22 17:11:33.822432815 +0900
 Birth: -


$ ./target/release/solana-ledger-tool verify --ledger data/devnet-eager-rent-collection-test/
HzyuivuNXMHJKjM6q6BE2qBsR3etqW21BSvuJTpJFj9A
[2020-05-22T08:18:29.967602097Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-05-22T08:18:29.967624826Z INFO  solana_ledger::blockstore] Opening database at "/home/ubuntu/solana-git/data/solana-git/data/devnet-eager-rent-collection-test/rocksdb"
[2020-05-22T08:18:30.021307397Z INFO  solana_ledger::blockstore] "/home/ubuntu/solana-git/data/solana-git/data/devnet-eager-rent-collection-test/rocksdb" open took 53ms
[2020-05-22T08:18:30.021344282Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "/home/ubuntu/solana-git/data/solana-git/data/devnet-eager-rent-collection-test/snapshot"
[2020-05-22T08:18:30.022330374Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "/home/ubuntu/solana-git/data/solana-git/data/devnet-eager-rent-collection-test/snapshot-5356748-58qxTSuyc8FzymrEUGg75jfKZkYySptn1JP8HJntzkJZ.tar.bz2"
[2020-05-22T08:26:14.007303556Z INFO  solana_ledger::snapshot_utils] snapshot untar took 464.0s
[2020-05-22T08:26:14.007537662Z INFO  solana_ledger::snapshot_utils] snapshot version: 1.1.0
[2020-05-22T08:26:14.007583539Z INFO  solana_ledger::snapshot_utils] Loading bank from "/home/ubuntu/solana-git/data/solana-git/data/devnet-eager-rent-collection-test/snapshot/.tmpq7N7rW/snapshots/5356748/5356748"
[2020-05-22T08:26:14.010409585Z INFO  solana_ledger::snapshot_utils] Rebuilding accounts...
[2020-05-22T08:35:35.217878230Z INFO  solana_ledger::snapshot_utils] Rebuilding status cache...
[2020-05-22T08:35:35.219352411Z INFO  solana_ledger::snapshot_utils] Loaded bank for slot: 5356748
[2020-05-22T08:35:35.224417284Z INFO  solana_runtime::accounts_db] total_stores: 418688, newest_slot: 5356748, oldest_slot: 4656136, max_slot: 5290563 (num=5), min_slot: 4930967 (num=1)
[2020-05-22T08:35:35.224486213Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2020-05-22T08:35:35.224726627Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=418688i
[2020-05-22T08:35:54.444833443Z INFO  solana_runtime::accounts_db] scan took 131us merge took 436ns accumulate took 12us
[2020-05-22T08:35:54.444882582Z INFO  solana_runtime::bank] bank frozen: 5356748 hash: EMKvfP998kTWaqGK6iRQ4jsEiNbS9fcBN7osWTsF5Pnv accounts_delta: FHGUC6ANa7GkLd71iAHnmnxMhw3icJCuskKpjPmcjMoM signature_count: 1 last_blockhash: G4mBNuJhMX9hXTP7c2NrLdphUy2gv6gSUHBxbmokdVFr capitalization: 1009655446319737000
[2020-05-22T08:35:54.444913085Z INFO  solana_runtime::bank] accounts hash slot: 5356748 stats: BankHashStats { num_removed_accounts: 12, num_added_accounts: 0, num_lamports_stored: 8640062342062327, total_data_len: 161372, num_executable_accounts: 0 }
[2020-05-22T08:35:54.444932268Z INFO  solana_ledger::snapshot_utils] bank rebuild from snapshot took 580.4s
[2020-05-22T08:35:54.449225268Z INFO  solana_ledger::blockstore_processor] processing ledger from slot 5356748...
[2020-05-22T08:35:54.451551941Z INFO  solana_ledger::blockstore_processor] ledger processed in 1ms. 1 MB allocated. 1 fork at 5356748, with 1 frozen bank
Ok
$ stat /home/ubuntu/solana-git/data/solana-git/data/devnet-eager-rent-collection-test/snapshot-5356748-58qxTSuyc8FzymrEUGg75jfKZkYySptn1JP8HJntzkJZ.tar.bz2
  File: /home/ubuntu/solana-git/data/solana-git/data/devnet-eager-rent-collection-test/snapshot-5356748-58qxTSuyc8FzymrEUGg75jfKZkYySptn1JP8HJntzkJZ.tar.bz2
  Size: 1725684446      Blocks: 3370480    IO Block: 4096   regular file
Device: 34h/52d Inode: 13096345    Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1000/  ubuntu)   Gid: ( 1000/  ubuntu)
Access: 2020-05-22 08:18:30.019510962 +0000
Modify: 2020-05-22 08:17:39.040437626 +0000
Change: 2020-05-22 08:17:39.040437626 +0000
 Birth: -

@ryoqun
Copy link
Member

ryoqun commented May 22, 2020

Phew, I've summarized about v1.1 and master snapshot hash compatibility:

devnet: not compatibile at the moment because account.owner hashing is effectively disabled: https://github.com/solana-labs/solana/pull/9918/files#r429132997
tds: not compatible at the moment because account.owner hashing is enabled in rolling fashion: https://github.com/solana-labs/solana/pull/10166/files#r428900953 (this finished)

Anyway, we can restore the compatibility again in a week (yeah, longer than expected as this issue description is pointing out) with flow of (tedious) events:

  1. (DONE) recently enabled eager rent collection automatically replaces all pre-account.owner hashing slots in an epoch or two (luckily, it's now weekend; so that will buy us time) for tds
  2. (DONE) Ensure all of appendvecs in tds's snapshot is above 14_000_000.
  3. (DONE) master, v1.1 and v1.2 will be compatible for tds at this point.
  4. (DONE) Then, we can land this pr by lowering the slot for devnet: v1.1: Always enable nonce slot/owner-in-hash #10166 => Lower owner hashing activation slot for devnet #10244
  5. (DONE) Release 1.2.0.
  6. (DONE) devnet is updated to the 1.2.0, NOT immediately taking effect of account.owner hashing; otherwise it would make it impossible to restart from the last snapshot.. (ouch...)
  7. (DONE) eager rent collection cycles for an epoch or two on devnet
  8. (DONE) master and v1.2 will be compatible for devnet finally.

As a afterthought, rolling update of account.hash causes tedious process but is needed to avoid bad snapshot issue.. Anyway, sorry for that.

@mvines

@ryoqun
Copy link
Member

ryoqun commented May 22, 2020

I also checked master vs v1.1 snapshot compatibility can be restored both for devnet and tds like this patch locally:

For tds

$ git diff
diff --git a/runtime/src/accounts_db.rs b/runtime/src/accounts_db.rs
index 697212c6c..5bbdd443a 100644
--- a/runtime/src/accounts_db.rs
+++ b/runtime/src/accounts_db.rs
@@ -1425,7 +1425,9 @@ impl AccountsDB {
             hasher.hash(&[0u8; 1]);
         }

-        hasher.hash(&owner.as_ref());
+        if slot >= 14_000_000 {
+            hasher.hash(&owner.as_ref());
+        }
         hasher.hash(&pubkey.as_ref());

         hasher.result()

for devnet

$ git diff
diff --git a/runtime/src/accounts_db.rs b/runtime/src/accounts_db.rs
index 697212c6c..140068fae 100644
--- a/runtime/src/accounts_db.rs
+++ b/runtime/src/accounts_db.rs
@@ -1425,7 +1425,7 @@ impl AccountsDB {
             hasher.hash(&[0u8; 1]);
         }   

-        hasher.hash(&owner.as_ref());
+        //hasher.hash(&owner.as_ref());
         hasher.hash(&pubkey.as_ref());

         hasher.result()

@mvines mvines modified the milestones: v1.2.0, v1.3.0 May 25, 2020
@mvines
Copy link
Member Author

mvines commented May 25, 2020

I'm working on this via https://github.com/solana-labs/solana/projects/54

@mvines mvines self-assigned this May 25, 2020
@ryoqun ryoqun changed the title verify_bank_hash failed: MismatchedAccountHash: Tip of master is still not compatible with v1.1 verify_bank_hash failed: MismatchedAccountHash: Tip of master is still not compatible with v1.2 May 26, 2020
@mvines mvines changed the title verify_bank_hash failed: MismatchedAccountHash: Tip of master is still not compatible with v1.2 verify_bank_hash failed: MismatchedAccountHash: Tip of master is still not compatible with devnet v1.2 May 28, 2020
@mvines mvines changed the title verify_bank_hash failed: MismatchedAccountHash: Tip of master is still not compatible with devnet v1.2 verify_bank_hash failed: MismatchedAccountHash: Tip of master is still not compatible with devnet/mainnet-beta May 28, 2020
@mvines
Copy link
Member Author

mvines commented May 28, 2020

Update:

  • devnet: as of slot 5_958_049, master is still incompatible. (include_owner_in_hash activates on devnet as of slot 5_800_000).

  • mainnet-beta: as of slot 13_453_151, master is still incompatible. (include_owner_in_hash activates on devnet as of slot 12_500_000), and eager rent enables on epoch 34.

  • testnet: master<->testnet v1.1 ok!

Next steps: check again on June 2nd as this is just a waiting game

@ryoqun
Copy link
Member

ryoqun commented May 29, 2020

@mvines Thanks for the update! Your understanding is correct!

Also, note that the compatibility with mainnet-beta is blocked by the enablement of eager-rent collection because otherwise old account state in pre-hasing-change still remains (this still hasn't for the cluster..)

For devnet, it looks like it's gradually transitioning, you can check the progress by decreasing count printed at: https://github.com/solana-labs/solana/blob/v1.2.0/runtime/src/accounts_db.rs#L1503

@mvines
Copy link
Member Author

mvines commented Jun 1, 2020

  • devnet: as of slot 5_958_049, master is still incompatible. (include_owner_in_hash activates on devnet as of slot 5_800_000).

devnet is now compatible with master!

  • mainnet-beta: as of slot 13_453_151, master is still incompatible. (include_owner_in_hash activates on devnet as of slot 12_500_000), and eager rent enables on epoch 34.

No change to status. Retest mainnet-beta<->master compat on epoch 35.

@ryoqun
Copy link
Member

ryoqun commented Jun 6, 2020

Now, mainnet-beta is on epoch 35.

And, I confirmed that mainnet-beta <-> master compat is finally restored.

@mvines
Copy link
Member Author

mvines commented Jun 6, 2020

Looks good here too, we're finally done!

@mvines mvines closed this as completed Jun 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants