Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic in core/src/heaviest_subtree_fork_choice.rs, thread solReplayStage #32880

Closed
ruuda opened this issue Aug 18, 2023 · 7 comments · Fixed by #32944
Closed

Panic in core/src/heaviest_subtree_fork_choice.rs, thread solReplayStage #32880

ruuda opened this issue Aug 18, 2023 · 7 comments · Fixed by #32944
Labels
community Community contribution

Comments

@ruuda
Copy link
Contributor

ruuda commented Aug 18, 2023

Problem

One of our Solana mainnet nodes panicked. It was not validating at the time.

thread 'solReplayStage' panicked at 'a bank at last_voted_slot((212118116, ugZWxPNYKgnkE1h71oek7Xvr9vDvcwRS1PZ8HcgcRGv)) is a frozen bank so must have been added to heaviest_subtree_fork_choice at time of freezing', core/src/heaviest_subtree_fork_choice.rs:874:29
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
   2: <solana_core::heaviest_subtree_fork_choice::HeaviestSubtreeForkChoice as solana_core::fork_choice::ForkChoice>::select_forks
   3: solana_core::replay_stage::ReplayStage::new::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2023-08-18T07:20:50.607088657Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solReplayStage" one=1i message="panicked at 'a bank at last_voted_slot((212118116, ugZWxPNYKgnkE1h71oek7Xvr9vDvcwRS1PZ8HcgcRGv)) is a frozen bank so must have been added to heaviest_subtree_fork_choice at time of freezing', core/src/heaviest_subtree_fork_choice.rs:874:29" location="core/src/heaviest_subtree_fork_choice.rs:874:29" version="\"1.14.23 (src:devbuild; feat:1879391783)\""

This is running Solana 1.14.23.

Full validator logs since the earliest occurrence of ugZWxPNYKgnkE1h71oek7Xvr9vDvcwRS1PZ8HcgcRGv.

The code is this:

// Unless last vote is stray and stale, self.is_candidate(last_voted_slot_hash) must return
// Some(_), justifying to panic! here.
// Also, adjust_lockouts_after_replay() correctly makes last_voted_slot None,
// if all saved votes are ancestors of replayed_root_slot. So this code shouldn't be
// touched in that case as well.
// In other words, except being stray, all other slots have been voted on while this
// validator has been running, so we must be able to fetch best_slots for all of
// them.
panic!(
"a bank at last_voted_slot({:?}) is a frozen bank so must have been \
added to heaviest_subtree_fork_choice at time of freezing",
last_voted_slot_hash,
)

It was originally introduced in cb8661b#diff-f1f89ae82910cdfebacc27d6ef29a9a55dc89eea426a8e4fcb1586f4ca8da55cR501-R513 by @ryoqun, although the conditions around it have changed since then. @ryoqun, do you have any clue what might be going on here?

Proposed Solution

I am not familiar enough with the replay stage to investigate what went wrong here. Let me know if there is anything I can do to help diagnose this further.

@ruuda ruuda added the community Community contribution label Aug 18, 2023
@ruuda
Copy link
Contributor Author

ruuda commented Aug 18, 2023

It happened a second time to the same node:

solReplayStage' panicked at 'a bank at last_voted_slot((212139600, EQ1VXBxKTLePTpAwfYTWFusgGRzswiepwZntDrKXqG3R)) is a frozen bank so must have been added to heaviest_subtree_fork_choice at time of freezing', [2023-08-18T09:58:55.224186191Z INFO  solana_metrics::metrics] datapoint: bank_weight slot=212139603i weight="0"
core/src/heaviest_subtree_fork_choice.rs:874:29
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
   2: <solana_core::heaviest_subtree_fork_choice::HeaviestSubtreeForkChoice as solana_core::fork_choice::ForkChoice>::select_forks
   3: solana_core::replay_stage::ReplayStage::new::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2023-08-18T09:58:55.240753496Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solReplayStage" one=1i message="panicked at 'a bank at last_voted_slot((212139600, EQ1VXBxKTLePTpAwfYTWFusgGRzswiepwZntDrKXqG3R)) is a frozen bank so must have been added to heaviest_subtree_fork_choice at time of freezing', core/src/heaviest_subtree_fork_choice.rs:874:29" location="core/src/heaviest_subtree_fork_choice.rs:874:29" version="\"1.14.23 (src:devbuild; feat:1879391783)\""

Logs since first occurrence of EQ1VXBxKTLePTpAwfYTWFusgGRzswiepwZntDrKXqG3R.

@ruuda
Copy link
Contributor Author

ruuda commented Aug 18, 2023

It happened a third time to the same node:

thread 'solReplayStage' panicked at 'a bank at last_voted_slot((212165388, 4he27KybCzcVqhPkcL8ZG4WXTg2ZkspsJk3ZBdHHV9au)) is a frozen bank so must have been added to heaviest_subtree_fork_choice at time of freezing', core/src/heaviest_subtree_fork_choice.rs:874:29
stack backtrace:
   0: rust_begin_unwind
             at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
   2: <solana_core::heaviest_subtree_fork_choice::HeaviestSubtreeForkChoice as solana_core::fork_choice::ForkChoice>::select_forks
   3: solana_core::replay_stage::ReplayStage::new::{{closure}}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
[2023-08-18T13:07:02.218784854Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solReplayStage" one=1i message="panicked at 'a bank at last_voted_slot((212165388, 4he27KybCzcVqhPkcL8ZG4WXTg2ZkspsJk3ZBdHHV9au)) is a frozen bank so must have been added to heaviest_subtree_fork_choice at time of freezing', core/src/heaviest_subtree_fork_choice.rs:874:29" location="core/src/heaviest_subtree_fork_choice.rs:874:29" version="\"1.14.23 (src:devbuild; feat:1879391783)\""

Logs since slightly before the first occurrence of 4he27KybCzcVqhPkcL8ZG4WXTg2ZkspsJk3ZBdHHV9au.

We run multiple nodes and it’s happening only on this machine, but so far we have no evidence it’s a problem with the host — no signs of data corruption etc.

@ruuda
Copy link
Contributor Author

ruuda commented Aug 21, 2023

The same issue occurred many more times over the weekend, now also on a different machine.

@ruuda
Copy link
Contributor Author

ruuda commented Aug 21, 2023

Some people are posting the same panic on Discord (1, 2). Trent says:

it's going to crash every time the secondary observes the primary voting at the tip

@ruuda
Copy link
Contributor Author

ruuda commented Aug 21, 2023

#32894 references the same problem.

@AshwinSekar
Copy link
Contributor

Fix has been released in 1.14.25 and 1.16.10 .

@Szymongib
Copy link

It is more rare, but we still see it on 1.14.27:

Sep 28 13:32:02 solana-validator[746621]: [2023-09-28T13:32:02.878695407Z ERROR solana_metrics::metrics] datapoint: panic program="validator" thread="solReplayStage" one=1i message="panicked at 'a bank at last_voted_slot((220308112, ArmtDGoACSv4jSwsY1XyqWpj9eAFhUsVM7j6n3EBHRrz)) is a frozen bank so must have been added to heaviest_subtree_fork_choice at time of freezing', core/src/heaviest_subtree_fork_choice.rs:874:29" location="core/src/heaviest_subtree_fork_choice.rs:874:29" version="\"1.14.27 (src:devbuild; feat:3351361749)\""
Sep 28 13:32:02 solana-validator[746621]: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
Sep 28 13:32:02 solana-validator[746621]:    3: solana_core::replay_stage::ReplayStage::new::{{closure}}
Sep 28 13:32:02 solana-validator[746621]:    2: <solana_core::heaviest_subtree_fork_choice::HeaviestSubtreeForkChoice as solana_core::fork_choice::ForkChoice>::select_forks
Sep 28 13:32:02 solana-validator[746621]:              at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/core/src/panicking.rs:143:14
Sep 28 13:32:02 solana-validator[746621]:    1: core::panicking::panic_fmt
Sep 28 13:32:02 solana-validator[746621]:              at ./rustc/7737e0b5c4103216d6fd8cf941b7ab9bdbaace7c/library/std/src/panicking.rs:584:5
Sep 28 13:32:02 solana-validator[746621]:    0: rust_begin_unwind

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Community contribution
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants