forked from solana-labs/solana
Check poh_recorder.start_slot() hasn't been dumped previously before checking it in ProgressMap. #2676
Merged: wen-coding merged 7 commits into anza-xyz:master from wen-coding:replay_audit_must_exist on Aug 22, 2024
1c34597 Check poh_recorder.start_slot() hasn't been dumped previously before … (wen-coding)
8f36a5c Merge branch 'master' into replay_audit_must_exist (wen-coding)
9f96749 Add more comments and put in checks for maybe_start_leader. (wen-coding)
d2263d1 Update core/src/replay_stage.rs (wen-coding)
5a50220 Use a slot which I am not leader to avoid dumping my own slot panic. (wen-coding)
6591ddf Address reviewer comments. (wen-coding)
ee936a6 Address reviewer comments. (wen-coding)
hmm, if we're about to dump the slot, shouldn't we be resetting PoH to an ancestor of that slot? @AshwinSekar
Seems like `select_vote_and_reset_forks` should be handling the reset to account for the latest duplicate we reset to before we get here.
It looks like in this case we shouldn't reset to 288668259 which links to an invalid ancestor?
`dump_then_repair` happens in between the reset and `maybe_start_leader`. It does have a check to ensure that it doesn't dump a slot that is an ancestor of a `working_bank` we currently have; however, that doesn't help if `maybe_start_leader` is just about to tick to our leader slot.

We do reset to the duplicate fork deepest descended from our last vote in case of switching, such as the example wen posted above. To achieve what you're hinting at, we should modify `dump_then_repair` to reset to the heaviest block descended from the last vote that is not dumped (or in case of lockout, from the switch slot) OR have select forks talk to the state machine in order to exclude blocks that could potentially be dumped.

I have little confidence in making a fork choice modification that we can backport to 2.0 at this time. I would prefer to hold off on starting leader until dump and repair has succeeded or we have reset to another bank. In the meantime we can make the fork choice change on master.
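To make the "hold off on starting leader" option concrete, here is a minimal sketch, not the actual change in `core/src/replay_stage.rs`. Everything in it is hypothetical and for illustration only: `LeaderDecision`, `decide_leader_start`, the `dumped` set, and the `progress` map (a slot-to-"is propagated" stand-in for `ProgressMap`). The idea is just that the start slot is checked against the dumped set before the progress lookup is trusted.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;

/// Hypothetical outcome of the leader-start check.
#[derive(Debug, PartialEq, Eq)]
enum LeaderDecision {
    /// The start slot is known and propagated, so we may start our leader slot.
    Start,
    /// The start slot was dumped (or is unknown); wait for repair or a new reset.
    WaitForRepairOrReset,
    /// The start slot is known but not yet confirmed as propagated.
    WaitForPropagation,
}

/// Sketch only. `dumped` is an assumed set of slots removed by dump-and-repair
/// that have not been replayed again; `progress` is an assumed stand-in for the
/// progress map, mapping a slot to whether it has propagated.
fn decide_leader_start(
    start_slot: Slot,
    dumped: &HashSet<Slot>,
    progress: &HashMap<Slot, bool>,
) -> LeaderDecision {
    // Check the dumped set first, so a dumped slot is never looked up in progress.
    if dumped.contains(&start_slot) {
        return LeaderDecision::WaitForRepairOrReset;
    }
    match progress.get(&start_slot) {
        Some(true) => LeaderDecision::Start,
        Some(false) => LeaderDecision::WaitForPropagation,
        // This sketch treats a missing entry as "wait" rather than an error.
        None => LeaderDecision::WaitForRepairOrReset,
    }
}
```

With this shape, a slot that was dumped between the reset and the leader check yields `WaitForRepairOrReset` instead of reaching the progress lookup at all.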
hmm, so the current order of events is:
1. `process_duplicate_slots()`
2. `select_vote_and_reset_forks` runs and picks the reset bank, which currently does not account for the new duplicates from 1?
3. `retransmit_latest_unpropagated_leader_slot`

So the issue is that even though we've updated fork choice in 1, the reset bank is still picking an invalid fork to reset to in 2, right?
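The question above can be pictured with a toy model, sketched under loud assumptions: `Forks`, `mark_duplicate`, `has_invalid_ancestry`, and `select_reset_slot` are hypothetical names, and "highest slot" stands in for fork weight. It is not how agave's fork choice is implemented; it only shows what it would mean for step 2 to respect the slots marked in step 1.

```rust
use std::collections::{HashMap, HashSet};

type Slot = u64;

/// Toy fork structure (hypothetical): child slot -> parent slot,
/// plus the set of slots marked invalid as duplicates.
struct Forks {
    parent: HashMap<Slot, Slot>,
    invalid: HashSet<Slot>,
}

impl Forks {
    /// Step 1 in the toy model: record a duplicate slot as invalid.
    fn mark_duplicate(&mut self, slot: Slot) {
        self.invalid.insert(slot);
    }

    /// True if `slot` or any of its ancestors is marked invalid.
    /// Assumes the parent map is acyclic, so the walk terminates.
    fn has_invalid_ancestry(&self, mut slot: Slot) -> bool {
        loop {
            if self.invalid.contains(&slot) {
                return true;
            }
            match self.parent.get(&slot) {
                Some(&p) => slot = p,
                None => return false,
            }
        }
    }

    /// Step 2 in the toy model: pick the highest candidate whose ancestry
    /// is fully valid. Highest slot is a stand-in for fork weight.
    fn select_reset_slot(&self, candidates: &[Slot]) -> Option<Slot> {
        candidates
            .iter()
            .copied()
            .filter(|s| !self.has_invalid_ancestry(*s))
            .max()
    }
}
```

In this model a reset candidate that links to an invalid ancestor is never returned, which is the behavior being asked about for the reset bank.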
I thought select vote and reset in 2 already accounted for slots marked duplicate by 1 that should be marked as invalid in fork choice.
As far as I can see, yes. We currently use `heaviest_bank_on_same_voted_fork` for the reset bank if the switch threshold fails, and we don't seem to set `heaviest_bank_on_same_voted_fork` to a valid block; instead we change to the deepest slot: https://github.com/anza-xyz/agave/blob/master/core/src/consensus/heaviest_subtree_fork_choice.rs#L1138
Did I read that correctly?