-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix get_first_available_block for genesis; also make get_blocks(_with_limit) consistent #24760
Fix get_first_available_block for genesis; also make get_blocks(_with_limit) consistent #24760
Conversation
Fresh boot:
After first root:
After first prune:
|
let first_root = root_iterator.next().unwrap_or_default(); | ||
// If the first root is slot 0, it is genesis. Genesis is always complete, so it is correct | ||
// to return it as first-available. | ||
if first_root == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will we hit this when restarting a node from snapshot and empty rocksdb dir? @joeaba had reported BT upload breakage when restarting warehouse nodes in this configuration as first available was reporting zero (and last the snapshot slot IIRC). I figured it was the original unwrap_or_default()
which makes it impossible to discern between starting from genesis and simply having no ledger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my testing on a node booting from snapshot and empty ledger, once the RPC service was available the first_root was set to the snapshot slot. This was true both before and after this PR.
If there were no other roots, first-available would report 0, though, because we are looking at the index-1 item in the iterator, which would be None
. I'm guessing this was what happened in the case you're referring to?
As you say, this was and is due to the unwrap_or_default()
. Should we return an error instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like "node still booting"...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to decide what the least wrong thing to do is here. Error seems right, but I'm not sure what the downstream effects might be. We can punt to another PR though I think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can punt to another PR though I think
Sg. Possibly we also need to look at error-handling in the bigtable uploader as well
c81c40f
to
73adca1
Compare
Codecov Report
@@ Coverage Diff @@
## master #24760 +/- ##
===========================================
+ Coverage 70.1% 82.0% +11.9%
===========================================
Files 38 596 +558
Lines 2303 165370 +163067
Branches 325 0 -325
===========================================
+ Hits 1615 135759 +134144
- Misses 573 29611 +29038
+ Partials 115 0 -115 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
let first_root = root_iterator.next().unwrap_or_default(); | ||
// If the first root is slot 0, it is genesis. Genesis is always complete, so it is correct | ||
// to return it as first-available. | ||
if first_root == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been trying to decide what the least wrong thing to do is here. Error seems right, but I'm not sure what the downstream effects might be. We can punt to another PR though I think
…_limit) consistent (#24760) (#24769) * Handle genesis more appropriately in get_first_available_block * Add unit test * Make get_blocks starting-slots consistent with other methods (cherry picked from commit b6a18e0) Co-authored-by: Tyera Eulberg <[email protected]>
…_limit) consistent (solana-labs#24760) * Handle genesis more appropriately in get_first_available_block * Add unit test * Make get_blocks starting-slots consistent with other methods
Problem
After #23403, the first two blocks of a fresh validator become inaccessible to RPC after the first root is set. This is because of some assumptions that are true for a ledger that has had at least one prune or was booted from a snapshot, but not true for a ledger fresh from genesis.
For one,
Blockstore::lowest_slot()
(whichBlockstore::get_first_available_block()
uses to seed its root iterator) explicitly skips slot 0, even though it is populated and has SlotMeta. Secondly, the genesis block doesn't need a populated parent block to calculate parent blockhash, because it is genesis.Summary of Changes
Blockstore::lowest_slot_with_genesis()
with the same logic asBlockstore::lowest_slot()
, except the return slot may be zeroBlockstore::get_first_available_block()
to return 0 when that is the first rootlowest_slot()
instead ofget_first_available_block()
. So I fixed the glitch.Fixes #23853