Check poh_recorder.start_slot() hasn't been dumped previously before checking it in ProgressMap. #2676

Merged (7 commits) on Aug 22, 2024
151 changes: 146 additions & 5 deletions core/src/replay_stage.rs
@@ -34,6 +34,7 @@ use {
},
crossbeam_channel::{Receiver, RecvTimeoutError, Sender},
rayon::{prelude::*, ThreadPool},
solana_accounts_db::contains::Contains,
solana_entry::entry::VerifyRecyclers,
solana_geyser_plugin_manager::block_metadata_notifier_interface::BlockMetadataNotifierArc,
solana_gossip::cluster_info::ClusterInfo,
@@ -1306,6 +1307,13 @@ impl ReplayStage {
) {
let start_slot = poh_recorder.read().unwrap().start_slot();

// It is possible that bank corresponding to `start_slot` has been
// dumped, so we need to double check it exists before proceeding
if !progress.contains(&start_slot) {
warn!("Poh start slot {start_slot}, is missing from progress map. This indicates that we are in the middle of a dump and repair. Skipping retransmission of unpropagated leader slots");
return;
}

Comment on lines +1312 to +1316

@carllin (Aug 21, 2024):
hmm if we're about to dump the slot, shouldn't we be resetting PoH to an ancestor of that slot? @AshwinSekar

Seems like select_vote_and_reset_forks should be handling the reset to account for latest duplicate we reset to before we get here

Author:

[2024-08-20T07:07:42.150220136Z WARN  solana_core::repair::cluster_slot_state_verifier] Cluster duplicate confirmed slot 288668257 with hash 2kQ6z6XvYiFc3hnw5PqhwREa9yDChBDqneVrjYvcabeu, but our version has hash AytETM4WSS9AheCCDLdiBwXVWkkmQY59EEZzdUKzDuBo
[2024-08-20T07:07:42.150226519Z INFO  solana_core::consensus::heaviest_subtree_fork_choice] marking fork starting at: (288668257, AytETM4WSS9AheCCDLdiBwXVWkkmQY59EEZzdUKzDuBo) invalid candidate
[2024-08-20T07:07:42.150244693Z INFO  solana_core::consensus::heaviest_subtree_fork_choice] Fork choice for (288668259, D4281472KYmdbJg8YFUHuJc59oGMTzJhNFeEUUtaAKuH) setting latest invalid ancestor from None to 288668257
[2024-08-20T07:07:42.150252709Z INFO  solana_core::consensus::heaviest_subtree_fork_choice] Fork choice for (288668258, Eq2j5UoAQFsJftEp7X5gtvF7MiykGznFySLT3mku3zwy) setting latest invalid ancestor from None to 288668257
[2024-08-20T07:07:42.150259652Z INFO  solana_core::consensus::heaviest_subtree_fork_choice] Fork choice for (288668257, AytETM4WSS9AheCCDLdiBwXVWkkmQY59EEZzdUKzDuBo) setting latest invalid ancestor from None to 288668257
[2024-08-20T07:07:42.162790246Z INFO  solana_core::replay_stage] 4mJV1MwoVG1Qztpwagnw396QGALvWQxbGXw5cChWKaJJ slot_weight: 288668260 0.0% 288668254
[2024-08-20T07:07:42.162816507Z INFO  solana_core::replay_stage] validator fork confirmed 288668225 21957ms
[2024-08-20T07:07:42.162821536Z INFO  solana_core::replay_stage] validator fork confirmed 288668219 25771ms
[2024-08-20T07:07:42.162825554Z INFO  solana_core::replay_stage] validator fork confirmed 288668238 13693ms
[2024-08-20T07:07:42.165829839Z INFO  solana_core::replay_stage] Waiting to switch vote to 288668260,
                    resetting to slot Some(288668259) for now,
                    switch proof stake: 0,
                    threshold stake: 106674872336469650,
                    total stake: 280723348253867482

It looks like in this case we shouldn't reset to 288668259, which links to an invalid ancestor?

@AshwinSekar (Aug 21, 2024):

dump_then_repair happens in between the reset and maybe_start_leader.
It does have a check to ensure that it doesn't dump a slot that is an ancestor of a working_bank we currently have; however, that doesn't help if maybe_start_leader is just about to tick to our leader slot.

Seems like select_vote_and_reset_forks should be handling the reset to account for latest duplicate we reset to before we get here

We do reset to the duplicate fork deepest descended from our last vote in the case of switching, such as the example wen posted above. To achieve what you're hinting at, we should modify dump_then_repair to reset to the heaviest block descended from the last vote that is not dumped (or, in the case of lockout, from the switch slot), OR have select forks talk to the state machine in order to exclude blocks that could potentially be dumped.

I have little confidence in making a fork choice modification that we can backport to 2.0 at this time. I would prefer to hold off on starting leader until dump and repair has succeeded or we have reset to another bank. In the meantime we can make the fork choice change on master.
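
For reference, the interim approach ("hold off on starting leader") is what the maybe_start_leader() hunk further down in this diff does; condensed, the guard is roughly:

// If the parent bank was dumped by dump_then_repair, don't start the leader
// slot yet; warn and retry on a later iteration instead of panicking.
let Some(parent) = bank_forks.read().unwrap().get(parent_slot) else {
    warn!(
        "Poh recorder parent slot {parent_slot} is missing from bank_forks. This indicates \
        that we are in the middle of a dump and repair. Unable to start leader");
    return false;
};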

Reply:

hmm so the current order of events is:

  1. Ingest new duplicate and update state machine and thus fork choice in process_duplicate_slots()
  2. select_vote_and_reset_forks runs and picks the reset bank, which currently does not account for the new duplicates from 1?
  3. Reset PoH happens to a block that will get dumped
  4. dump and repair dumps block from progress map
  5. Panic happens in retransmit_latest_unpropagated_leader_slot

So the issue is that even though we've updated fork choice in 1, the reset-bank selection in 2 is still picking an invalid fork to reset to, right?
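
Sketched as one replay loop iteration (a schematic using the function names from this thread, not the actual code):

// (1) process_duplicate_slots(): duplicate-confirmed hash observed; fork choice
//     marks our version of the slot as an invalid candidate.
// (2) select_vote_and_reset_forks(): picks the reset bank; when the switch
//     threshold fails this can still pick the soon-to-be-dumped block.
// (3) PoH is reset to that block.
// (4) dump_then_repair_correct_slots(): removes the block from bank_forks and
//     the progress map.
// (5) retransmit_latest_unpropagated_leader_slot() / maybe_start_leader():
//     previously panicked on the missing entry; with this PR they warn and
//     bail out instead.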

@carllin (Aug 21, 2024):

I thought select_vote_and_reset_forks in 2 already accounted for slots marked duplicate by 1 that should be marked as invalid in fork choice.

Author:

As far as I see, yes. We currently use heaviest_bank_on_same_voted_fork for the reset bank if the switch threshold fails, and we don't seem to set heaviest_bank_on_same_voted_fork to a valid block; instead we change to the deepest slot:
https://github.com/anza-xyz/agave/blob/master/core/src/consensus/heaviest_subtree_fork_choice.rs#L1138

Did I read that correctly?

if let (false, Some(latest_leader_slot)) =
progress.get_leader_propagation_slot_must_exist(start_slot)
{
@@ -1962,6 +1970,9 @@ impl ReplayStage {
// `poh_slot` and `parent_slot`, because they're in the same
// `NUM_CONSECUTIVE_LEADER_SLOTS` block, we still skip the propagated
// check because it's still within the propagation grace period.
//
// We've already checked in start_leader() that parent_slot hasn't been
// dumped, so we should get it in the progress map.
if let Some(latest_leader_slot) =
progress_map.get_latest_leader_slot_must_exist(parent_slot)
{
@@ -2034,11 +2045,12 @@ impl ReplayStage {

trace!("{} reached_leader_slot", my_pubkey);

let parent = bank_forks
.read()
.unwrap()
.get(parent_slot)
.expect("parent_slot doesn't exist in bank forks");
let Some(parent) = bank_forks.read().unwrap().get(parent_slot) else {
warn!(
"Poh recorder parent slot {parent_slot} is missing from bank_forks. This indicates \
that we are in the middle of a dump and repair. Unable to start leader");
return false;
};

assert!(parent.is_frozen());

@@ -3587,6 +3599,8 @@ impl ReplayStage {
vote_tracker: &VoteTracker,
cluster_slots: &ClusterSlots,
) {
// We would only reach here if the bank is in bank_forks, so it
// isn't dumped and should exist in progress map.
// If propagation has already been confirmed, return
if progress.get_leader_propagation_slot_must_exist(slot).0 {
return;
@@ -3902,6 +3916,8 @@ impl ReplayStage {
)
};

// If we reach here, the candidate_vote_bank exists in the bank_forks, so it isn't
// dumped and should exist in progress map.
let propagation_confirmed = is_leader_slot
|| progress
.get_leader_propagation_slot_must_exist(candidate_vote_bank.slot())
@@ -3979,6 +3995,8 @@ impl ReplayStage {
fork_tip: Slot,
bank_forks: &BankForks,
) {
// We would only reach here if the bank is in bank_forks, so it
// isn't dumped and should exist in progress map.
let mut current_leader_slot = progress.get_latest_leader_slot_must_exist(fork_tip);
let mut did_newly_reach_threshold = false;
let root = bank_forks.root();
@@ -4405,6 +4423,9 @@ impl ReplayStage {
for failure in heaviest_fork_failures {
match failure {
HeaviestForkFailures::NoPropagatedConfirmation(slot, ..) => {
// If failure is NoPropagatedConfirmation, then inside select_vote_and_reset_forks
// we already confirmed it's in progress map, we should see it in progress map
// here because we don't have dump and repair in between.
if let Some(latest_leader_slot) =
progress.get_latest_leader_slot_must_exist(*slot)
{
@@ -8527,6 +8548,126 @@ pub(crate) mod tests {
assert_eq!(received_slots, vec![8, 9, 11]);
}

#[test]
fn test_dumped_slot_not_causing_panic() {
solana_logger::setup();
let ReplayBlockstoreComponents {
validator_node_to_vote_keys,
leader_schedule_cache,
poh_recorder,
vote_simulator,
rpc_subscriptions,
ref my_pubkey,
ref blockstore,
..
} = replay_blockstore_components(None, 10, None::<GenerateVotes>);

let VoteSimulator {
mut progress,
ref bank_forks,
..
} = vote_simulator;

let poh_recorder = Arc::new(poh_recorder);
let (retransmit_slots_sender, _) = unbounded();

// Use a bank slot when I was not leader to avoid panic for dumping my own slot
let slot_to_dump = (1..100)
.find(|i| leader_schedule_cache.slot_leader_at(*i, None) != Some(*my_pubkey))
.unwrap();
let bank_to_dump = Bank::new_from_parent(
bank_forks.read().unwrap().get(0).unwrap(),
&leader_schedule_cache
.slot_leader_at(slot_to_dump, None)
.unwrap(),
slot_to_dump,
);
progress.insert(
slot_to_dump,
ForkProgress::new_from_bank(
&bank_to_dump,
bank_to_dump.collector_id(),
validator_node_to_vote_keys
.get(bank_to_dump.collector_id())
.unwrap(),
Some(0),
0,
0,
),
);
assert!(progress.get_propagated_stats(slot_to_dump).is_some());
bank_to_dump.freeze();
bank_forks.write().unwrap().insert(bank_to_dump);
let bank_to_dump = bank_forks
.read()
.unwrap()
.get(slot_to_dump)
.expect("Just inserted");

progress.get_retransmit_info_mut(0).unwrap().retry_time = Instant::now();
poh_recorder
.write()
.unwrap()
.reset(bank_to_dump, Some((slot_to_dump + 1, slot_to_dump + 1)));
assert_eq!(poh_recorder.read().unwrap().start_slot(), slot_to_dump);

// Now dump and repair slot_to_dump
let (mut ancestors, mut descendants) = {
let r_bank_forks = bank_forks.read().unwrap();
(r_bank_forks.ancestors(), r_bank_forks.descendants())
};
let mut duplicate_slots_to_repair = DuplicateSlotsToRepair::default();
let bank_to_dump_bad_hash = Hash::new_unique();
duplicate_slots_to_repair.insert(slot_to_dump, bank_to_dump_bad_hash);
let mut purge_repair_slot_counter = PurgeRepairSlotCounter::default();
let (dumped_slots_sender, dumped_slots_receiver) = unbounded();

ReplayStage::dump_then_repair_correct_slots(
&mut duplicate_slots_to_repair,
&mut ancestors,
&mut descendants,
&mut progress,
bank_forks,
blockstore,
None,
&mut purge_repair_slot_counter,
&dumped_slots_sender,
my_pubkey,
&leader_schedule_cache,
);
assert_eq!(
dumped_slots_receiver.recv_timeout(Duration::from_secs(1)),
Ok(vec![(slot_to_dump, bank_to_dump_bad_hash)])
);

// Now check it doesn't cause panic in the following functions.
ReplayStage::retransmit_latest_unpropagated_leader_slot(
&poh_recorder,
&retransmit_slots_sender,
&mut progress,
);

let (banking_tracer, _) = BankingTracer::new(None).unwrap();
// A vote has not technically been rooted, but it doesn't matter for
// this test to use true to avoid skipping the leader slot
let has_new_vote_been_rooted = true;
let track_transaction_indexes = false;

assert!(!ReplayStage::maybe_start_leader(
my_pubkey,
bank_forks,
&poh_recorder,
&leader_schedule_cache,
&rpc_subscriptions,
&mut progress,
&retransmit_slots_sender,
&mut SkippedSlotsInfo::default(),
&banking_tracer,
has_new_vote_been_rooted,
track_transaction_indexes,
));
}

#[test]
#[should_panic(expected = "We are attempting to dump a block that we produced")]
fn test_dump_own_slots_fails() {