Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix get_first_available_block for genesis; also make get_blocks(_with_limit) consistent #24760

Merged
merged 3 commits into from
Apr 28, 2022

Conversation

CriesofCarrots
Copy link
Contributor

Problem

After #23403, the first two blocks of a fresh validator become inaccessible to RPC after the first root is set. This is because of some assumptions that are true for a ledger that has had at least one prune or was booted from a snapshot, but not true for a ledger fresh from genesis.
For one, Blockstore::lowest_slot() (which Blockstore::get_first_available_block() uses to seed its root iterator) explicitly skips slot 0, even though it is populated and has SlotMeta. Secondly, the genesis block doesn't need a populated parent block to calculate parent blockhash, because it is genesis.

Summary of Changes

  • Add Blockstore::lowest_slot_with_genesis() with the same logic as Blockstore::lowest_slot(), except the return slot may be zero
  • Rework logic in Blockstore::get_first_available_block() to return 0 when that is the first root
  • Also, I noticed (recollected?) that the Rpc endpoints getBlocks and getBlocksWithLimit aren't consistent with other endpoints in their starting block because they use lowest_slot() instead of get_first_available_block(). So I fixed the glitch.
  • Added unit test to demonstrate how the relevant Blockstore methods work

Fixes #23853

@CriesofCarrots
Copy link
Contributor Author

Fresh boot:

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getSlot","params":[{"commitment":"confirmed"]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":1,"id":1}

// Master

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getFirstAvailableBlock","params":[]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":0,"id":1}

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getBlocksWithLimit","params":[0,5,{"commitment":"confirmed"}]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":[1,2,3,4,5],"id":1}

// This PR
$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getFirstAvailableBlock","params":[]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":0,"id":1}

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getBlocksWithLimit","params":[0,5,{"commitment":"confirmed"}]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":[0,1,2,3,4],"id":1}

After first root:

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getSlot","params":["commitment":"confirmed"]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":50,"id":1}

// Master

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getFirstAvailableBlock","params":[]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":2,"id":1}

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getBlocksWithLimit","params":[0,5,{"commitment":"confirmed"}]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":[1,2,3,4,5],"id":1}

// This PR:

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getFirstAvailableBlock","params":[]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":0,"id":1}

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getBlocksWithLimit","params":[0,5,{"commitment":"confirmed"}]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":[0,1,2,3,4],"id":1}

After first prune:

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getSlot","params":["commitment":"confirmed"]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":503,"id":1}

// Master

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getFirstAvailableBlock","params":[]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":363,"id":1}

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getBlocksWithLimit","params":[0,5,{"commitment":"confirmed"}]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":[362,363,364,365,366],"id":1}

// This PR

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getFirstAvailableBlock","params":[]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":363,"id":1}

$ curl -X POST -H "Content-Type: application/json" -d '{"jsonrpc": "2.0","id":1,"method":"getBlocksWithLimit","params":[0,5,{"commitment":"confirmed"}]}' 127.0.0.1:8899
{"jsonrpc":"2.0","result":[363,364,365,366,367],"id":1}

let first_root = root_iterator.next().unwrap_or_default();
// If the first root is slot 0, it is genesis. Genesis is always complete, so it is correct
// to return it as first-available.
if first_root == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will we hit this when restarting a node from snapshot and empty rocksdb dir? @joeaba had reported BT upload breakage when restarting warehouse nodes in this configuration as first available was reporting zero (and last the snapshot slot IIRC). I figured it was the original unwrap_or_default() which makes it impossible to discern between starting from genesis and simply having no ledger

Copy link
Contributor Author

@CriesofCarrots CriesofCarrots Apr 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my testing on a node booting from snapshot and empty ledger, once the RPC service was available the first_root was set to the snapshot slot. This was true both before and after this PR.

If there were no other roots, first-available would report 0, though, because we are looking at the index-1 item in the iterator, which would be None. I'm guessing this was what happened in the case you're referring to?
As you say, this was and is due to the unwrap_or_default(). Should we return an error instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like "node still booting"...?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to decide what the least wrong thing to do is here. Error seems right, but I'm not sure what the downstream effects might be. We can punt to another PR though I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can punt to another PR though I think

Sg. Possibly we also need to look at error-handling in the bigtable uploader as well

@codecov
Copy link

codecov bot commented Apr 28, 2022

Codecov Report

Merging #24760 (73adca1) into master (7b5aee7) will increase coverage by 11.9%.
The diff coverage is 97.5%.

@@             Coverage Diff             @@
##           master   #24760       +/-   ##
===========================================
+ Coverage    70.1%    82.0%    +11.9%     
===========================================
  Files          38      596      +558     
  Lines        2303   165370   +163067     
  Branches      325        0      -325     
===========================================
+ Hits         1615   135759   +134144     
- Misses        573    29611    +29038     
+ Partials      115        0      -115     

Copy link
Contributor

@t-nelson t-nelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

let first_root = root_iterator.next().unwrap_or_default();
// If the first root is slot 0, it is genesis. Genesis is always complete, so it is correct
// to return it as first-available.
if first_root == 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to decide what the least wrong thing to do is here. Error seems right, but I'm not sure what the downstream effects might be. We can punt to another PR though I think

@CriesofCarrots CriesofCarrots merged commit b6a18e0 into solana-labs:master Apr 28, 2022
mergify bot pushed a commit that referenced this pull request Apr 28, 2022
…_limit) consistent (#24760)

* Handle genesis more appropriately in get_first_available_block

* Add unit test

* Make get_blocks starting-slots consistent with other methods

(cherry picked from commit b6a18e0)
mergify bot added a commit that referenced this pull request Apr 28, 2022
…_limit) consistent (#24760) (#24769)

* Handle genesis more appropriately in get_first_available_block

* Add unit test

* Make get_blocks starting-slots consistent with other methods

(cherry picked from commit b6a18e0)

Co-authored-by: Tyera Eulberg <[email protected]>
jeffwashington pushed a commit to jeffwashington/solana that referenced this pull request Jun 29, 2022
…_limit) consistent (solana-labs#24760)

* Handle genesis more appropriately in get_first_available_block

* Add unit test

* Make get_blocks starting-slots consistent with other methods
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Blocks zero and one become inaccessible after the first root is set
2 participants