tip of master panic!()-ed with mainnet-beta's snapshot #16570

Closed
ryoqun opened this issue Apr 15, 2021 · 12 comments

@ryoqun
Member

ryoqun commented Apr 15, 2021

Problem

Maybe caused by @jeffwashington's #16310?

thread 'main' panicked at 'out of range', runtime/src/accounts_index.rs:208:9
stack backtrace:
   0: std::panicking::begin_panic
             at /home/ryoqun/.rustup/toolchains/nightly-2021-02-18-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/std/src/panicking.rs:519:12
   1: solana_runtime::accounts_index::RollingBitField::check_range
             at /home/ryoqun/work/solana/solana/runtime/src/accounts_index.rs:208:9
   2: solana_runtime::accounts_index::RollingBitField::insert
             at /home/ryoqun/work/solana/solana/runtime/src/accounts_index.rs:217:9
   3: solana_runtime::accounts_index::AccountsIndex<T>::add_root
             at /home/ryoqun/work/solana/solana/runtime/src/accounts_index.rs:1174:9
   4: solana_runtime::accounts_db::AccountsDb::generate_index
             at /home/ryoqun/work/solana/solana/runtime/src/accounts_db.rs:4911:13
   5: solana_runtime::serde_snapshot::reconstruct_accountsdb_from_fields
             at /home/ryoqun/work/solana/solana/runtime/src/serde_snapshot.rs:365:5
   6: solana_runtime::serde_snapshot::reconstruct_bank_from_fields
             at /home/ryoqun/work/solana/solana/runtime/src/serde_snapshot.rs:244:27
   7: solana_runtime::serde_snapshot::bank_from_stream
             at /home/ryoqun/work/solana/solana/runtime/src/serde_snapshot.rs:156:30
   8: solana_runtime::snapshot_utils::rebuild_bank_from_snapshots::{{closure}}
             at /home/ryoqun/work/solana/solana/runtime/src/snapshot_utils.rs:801:40
   9: solana_runtime::snapshot_utils::deserialize_snapshot_data_file_capped
             at /home/ryoqun/work/solana/solana/runtime/src/snapshot_utils.rs:488:15
  10: solana_runtime::snapshot_utils::deserialize_snapshot_data_file
             at /home/ryoqun/work/solana/solana/runtime/src/snapshot_utils.rs:436:5
  11: solana_runtime::snapshot_utils::rebuild_bank_from_snapshots
             at /home/ryoqun/work/solana/solana/runtime/src/snapshot_utils.rs:799:16
  12: solana_runtime::snapshot_utils::bank_from_archive
             at /home/ryoqun/work/solana/solana/runtime/src/snapshot_utils.rs:617:16
  13: solana_ledger::bank_forks_utils::load
             at /home/ryoqun/work/solana/solana/ledger/src/bank_forks_utils.rs:62:41
  14: solana_core::validator::new_banks_from_ledger
             at /home/ryoqun/work/solana/solana/core/src/validator.rs:1109:70
  15: solana_core::validator::Validator::new
             at /home/ryoqun/work/solana/solana/core/src/validator.rs:397:13
  16: solana_validator::main
             at /home/ryoqun/work/solana/solana/validator/src/main.rs:2481:21
  17: core::ops::function::FnOnce::call_once
             at /home/ryoqun/.rustup/toolchains/nightly-2021-02-18-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Proposed Solution

@jeffwashington
Contributor

yep, sounds like me.

@jeffwashington jeffwashington self-assigned this Apr 15, 2021
@jeffwashington
Contributor

I have so far been unable to reproduce this. I got a snapshot from a gce machine that failed for @ryoqun. I started a validator with and without --no-snapshot-fetch. I ran ledger-tool verify on the snapshot.

@jeffwashington
Contributor

I have prepared this pr to revert the change.
This pr improves the assert.

@jeffwashington
Contributor

Ok, I was able to reproduce it. user error.
count: 246891, min: 71484741, max: 73581892, width: 2097152, key: 73581893
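
The numbers in this output can be checked with a small sketch of a width-limited range check. This is not the code in runtime/src/accounts_index.rs: the function names `fits_in_window` and `check_range_sketch` are hypothetical, and the rule that the span is counted as `max - min + 1` is an assumption read off the logged values.

```rust
/// Illustrative sketch only. Returns true if `key` can be added while the
/// tracked span (largest key - smallest key + 1) stays within `max_width`.
/// `min` and `max` are the smallest and largest keys currently tracked.
fn fits_in_window(min: u64, max: u64, max_width: u64, key: u64) -> bool {
    let new_min = min.min(key);
    let new_max = max.max(key);
    new_max - new_min + 1 <= max_width
}

/// Hypothetical stand-in for a hard range check: panic instead of
/// silently accepting a key that would stretch the span too far.
fn check_range_sketch(min: u64, max: u64, max_width: u64, key: u64) {
    assert!(
        fits_in_window(min, max, max_width, key),
        "out of range: min: {}, max: {}, width: {}, key: {}",
        min,
        max,
        max_width,
        key
    );
}
```

With the logged values (min 71484741, max 73581892, width 2097152), the span is already exactly 2097152 keys wide, so key 73581893 is the first one that no longer fits.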

@jeffwashington
Contributor

root from snapshot: 73790273 (snapshot head: what is this called?)

slots with ONLY zero lamport accounts:
min: 71,500,402 (2,289,871 behind snapshot head)
max: 73,358,267 (432,006 behind snapshot head)
width: 1,857,865
slot count: 73,008
zero lamport account count in this old root range (could be dups): 156,880

first non-zero lamport account:
slot: 73,358,273 (432,000 behind snapshot head)

root info at the point of adding a 2M width root:
count: 288,896
min: 71,500,402
max: 73,597,554 (192,719 behind snapshot head)
width: 2,097,152
trying to add root: 73,597,554
This would exceed 2M width (73,597,554 + 1 - min)

2 accounts in the min root:
min root: 71,500,402
7jEfU57R2sV2B1DddKdsqZsdHaHm3B15REb4abvP6Me2, slot: 71,500,402, index: 0, lamports: 0
C57GmZLsPviiHZqWjYHo9is8QnMNq1Fc7SkYvLxLts24, slot: 71,500,402, index: 0, lamports: 0

C57GmZLsPviiHZqWjYHo9is8QnMNq1Fc7SkYvLxLts24 ONLY shows up at root 71,500,402

7jEfU57R2sV2B1DddKdsqZsdHaHm3B15REb4abvP6Me2 shows up in 2 subsequent slots:
7jEfU57R2sV2B1DddKdsqZsdHaHm3B15REb4abvP6Me2, slot: 73,365,276, index: 79377, lamports: 0
7jEfU57R2sV2B1DddKdsqZsdHaHm3B15REb4abvP6Me2, slot: 73,466,436, index: 171049, lamports: 0

@jeffwashington
Contributor

jeffwashington commented Apr 16, 2021

Talked with @sakridge to confirm some things:
Zero lamport accounts in rooted slots, with no earlier slot containing a non-zero lamport instance of the same account, can be ignored/removed.
In this case, the first 73k slots we encounter in the snapshot contain only zero lamport accounts, about 156k of them. All of these accounts can be (and could have been) cleaned, removed, or ignored. Ideally, the validator that created the snapshot would have cleaned them before creating it.
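
As a rough illustration of that rule, here is a minimal sketch that picks out the zero lamport entries with no earlier non-zero instance of the same account. It is not how clean works in accounts_db.rs; the `Entry` struct, the `removable_zero_lamport` function, and the `hypothetical_funded_account` key used in the usage note are all made up for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of one stored account version.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    pubkey: &'static str,
    slot: u64,
    lamports: u64,
}

/// Returns the zero lamport entries for which no earlier slot holds a
/// non-zero lamport version of the same pubkey.
fn removable_zero_lamport(entries: &[Entry]) -> Vec<Entry> {
    // Earliest slot at which each pubkey has a non-zero balance.
    let mut first_nonzero: HashMap<&str, u64> = HashMap::new();
    for e in entries.iter().filter(|e| e.lamports > 0) {
        first_nonzero
            .entry(e.pubkey)
            .and_modify(|s| *s = (*s).min(e.slot))
            .or_insert(e.slot);
    }
    entries
        .iter()
        .filter(|e| e.lamports == 0)
        .filter(|e| match first_nonzero.get(e.pubkey) {
            // A non-zero version exists, but not in an earlier slot.
            Some(&s) => s >= e.slot,
            // The account never held lamports in this data set.
            None => true,
        })
        .cloned()
        .collect()
}
```

Fed the four zero lamport entries listed earlier in this thread (both accounts at slot 71,500,402 plus the two later 7jEf... entries), the sketch reports all four as removable. A zero lamport entry for `hypothetical_funded_account` that follows a funded version in an earlier slot would be kept, since it still shadows that older version.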

@jeffwashington
Contributor

currently, 4m accounts are allowed, so this should not panic anymore.

@jeffwashington
Contributor

#16838
#16830

@jeffwashington
Contributor

logging:
#16636

@sakridge
Member

sakridge commented May 5, 2021

I think we can close now?

@jeffwashington
Contributor

the limit for roots tracker was set wider, avoiding the panic.
roots tracker is now stretchy, eliminating the possibility of panic.
there are metrics tracking how wide of a range is currently active.
clean is now erasing old accounts, keeping the range of roots as expected.
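
To make "stretchy" concrete, here is one possible shape for such a tracker: a fixed-width window backed by a bit vector, plus an overflow set for keys that would not fit, so an insert never has to panic. This is a sketch under assumptions, not the change that landed; the `StretchyRoots` type, its fields, and its methods are hypothetical names, and removal is left out to keep it short.

```rust
use std::collections::HashSet;

/// Hypothetical sketch: fixed-width window plus an overflow set.
struct StretchyRoots {
    max_width: u64,
    min: u64,           // smallest key in the window (inclusive)
    max_exclusive: u64, // one past the largest key in the window
    bits: Vec<bool>,    // indexed by key % max_width
    excess: HashSet<u64>, // keys that did not fit in the window
}

impl StretchyRoots {
    fn new(max_width: u64) -> Self {
        assert!(max_width > 0);
        Self {
            max_width,
            min: 0,
            max_exclusive: 0,
            bits: vec![false; max_width as usize],
            excess: HashSet::new(),
        }
    }

    fn insert(&mut self, key: u64) {
        if self.contains(key) {
            return;
        }
        // An empty window (min == max_exclusive) simply starts at `key`.
        let (new_min, new_max) = if self.min == self.max_exclusive {
            (key, key + 1)
        } else {
            (self.min.min(key), self.max_exclusive.max(key + 1))
        };
        if new_max - new_min <= self.max_width {
            self.bits[(key % self.max_width) as usize] = true;
            self.min = new_min;
            self.max_exclusive = new_max;
        } else {
            // Too far from the window: store it separately, do not panic.
            self.excess.insert(key);
        }
    }

    fn contains(&self, key: u64) -> bool {
        let in_window = key >= self.min
            && key < self.max_exclusive
            && self.bits[(key % self.max_width) as usize];
        in_window || self.excess.contains(&key)
    }

    /// Width of the active window, the kind of value a metric could report.
    fn window_width(&self) -> u64 {
        self.max_exclusive - self.min
    }
}
```

The trade-off in this design is that keys inside the window stay cheap to test, while outliers cost a hash lookup. It only stays efficient if clean keeps the overflow set small.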
