
Create bank snapshots #4244

Merged: 10 commits merged into solana-labs:master on May 31, 2019

Conversation

@sambley (Contributor) commented May 10, 2019

Problem

Full node startup is slow when many transactions have been processed, because startup currently involves querying the full ledger and replaying it.

Summary of Changes

Serialize bank state into a snapshot and try to restore from that on boot.

Fixes #2475
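
As context for the change, here is a minimal sketch of the snapshot idea (an editor's illustration only, not the actual solana_runtime code; the `BankSnapshot` struct, its fields, and the file layout are assumptions): serialize the bank state for a rooted slot with serde/bincode, and on boot try to deserialize it before falling back to full ledger replay.

```rust
use std::fs::File;
use std::io::{BufReader, BufWriter};

use serde::{Deserialize, Serialize};

// Hypothetical, heavily simplified stand-in for the serialized bank state.
#[derive(Serialize, Deserialize)]
struct BankSnapshot {
    slot: u64,
    blockhash: [u8; 32],
    // ... accounts, status cache signatures, etc. in the real implementation
}

// Write the bank state for a rooted slot to disk.
fn write_snapshot(path: &str, bank: &BankSnapshot) -> Result<(), Box<dyn std::error::Error>> {
    let file = File::create(path)?;
    bincode::serialize_into(BufWriter::new(file), bank)?;
    Ok(())
}

// On boot: return Some(snapshot) if a usable snapshot exists, otherwise None,
// in which case the node falls back to replaying the full ledger.
fn load_snapshot(path: &str) -> Option<BankSnapshot> {
    let file = File::open(path).ok()?;
    bincode::deserialize_from(BufReader::new(file)).ok()
}
```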

codecov bot commented May 10, 2019

Codecov Report

Merging #4244 into master will increase coverage by <.1%.
The diff coverage is 87.8%.

@@           Coverage Diff            @@
##           master   #4244     +/-   ##
========================================
+ Coverage    78.7%   78.7%   +<.1%     
========================================
  Files         165     166      +1     
  Lines       28639   29223    +584     
========================================
+ Hits        22557   23017    +460     
- Misses       6082    6206    +124

codecov bot commented May 10, 2019

Codecov Report

Merging #4244 into master will increase coverage by 1.6%.
The diff coverage is 88.2%.

@@           Coverage Diff            @@
##           master   #4244     +/-   ##
========================================
+ Coverage    77.5%   79.2%   +1.6%     
========================================
  Files         178     179      +1     
  Lines       32049   32051      +2     
========================================
+ Hits        24857   25403    +546     
+ Misses       7192    6648    -544

@sambley (Contributor, Author) commented May 10, 2019

@aeyakovenko, regarding serializing the BankStatusCache, could you clarify what you mean by "Only the signatures per block are needed"?

@aeyakovenko (Member) commented:

@sambley the status codes are not necessary; we just need the signatures to do a duplicate check. It might be possible to shrink the signatures to 16 random bytes without loss of security.

@aeyakovenko (Member) commented:

@sambley 20 bytes would definitely be secure
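
A sketch of what this "signatures only, truncated" idea could look like (the type and method names are illustrative, not the real status cache API; the 20-byte prefix length follows the discussion above):

```rust
use std::collections::HashSet;

// Prefix length discussed above: 20 bytes is judged sufficient for duplicate detection.
const SIG_PREFIX_LEN: usize = 20;

// Per-block record of the transaction signatures seen; no status codes are stored.
#[derive(Default)]
struct BlockSignatures {
    seen: HashSet<[u8; SIG_PREFIX_LEN]>,
}

impl BlockSignatures {
    /// Record a full 64-byte signature; returns true if it was already present (a duplicate).
    fn insert(&mut self, full_sig: &[u8; 64]) -> bool {
        let mut prefix = [0u8; SIG_PREFIX_LEN];
        prefix.copy_from_slice(&full_sig[..SIG_PREFIX_LEN]);
        !self.seen.insert(prefix)
    }
}
```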

@sambley (Contributor, Author) commented May 13, 2019

@aeyakovenko, to reduce the snapshot size, I have split the status cache entries into two sets: one holding the items already covered by a prior snapshot, and the current active set that still needs to be snapshotted. This allows incremental snapshots of the status cache; do you see any issues with this approach? We can cut the size down further by keeping only 20 bytes of each signature, as you indicated.
https://github.com/solana-labs/solana/pull/4244/files#diff-92c739d9ad61135b886d1a44957fe485
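
A rough illustration of that split (an editor's sketch with illustrative names, not the actual StatusCache layout): entries already covered by an earlier snapshot are kept apart from the active set, so each new snapshot only has to serialize the delta.

```rust
use std::collections::HashSet;
use std::mem;

type SigPrefix = [u8; 20];

#[derive(Default)]
struct IncrementalStatusCache {
    snapshotted: HashSet<SigPrefix>, // already captured by a prior snapshot
    active: HashSet<SigPrefix>,      // accumulated since the last snapshot
}

impl IncrementalStatusCache {
    // Duplicate check consults both halves of the cache.
    fn contains(&self, sig: &SigPrefix) -> bool {
        self.active.contains(sig) || self.snapshotted.contains(sig)
    }

    // Snapshot time: hand back only the active delta and fold it into the
    // "already snapshotted" set for future duplicate checks.
    fn take_snapshot_delta(&mut self) -> HashSet<SigPrefix> {
        let delta = mem::take(&mut self.active);
        self.snapshotted.extend(delta.iter().copied());
        delta
    }
}
```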

@aeyakovenko (Member) commented:

@sambley, I just thought of this, but blocktree already stores all the signatures. @rob-solana @carllin wouldn't it be easier to just scan the last 32 blocks to recover them?

@carllin (Contributor) commented May 13, 2019

@aeyakovenko, blocktree only stores blobs, so you would have to copy the blobs out of storage, deserialize them, and then scan each transaction for its signature, which seems slow. Moreover, somebody could have sent you a bogus transaction that never actually made it through the bank or updated the status cache, which we don't currently track.

@aeyakovenko (Member) commented May 13, 2019

@carllin don't we already know the root fork? We just need to read all the blocks chained to it to rebuild the signature cache. The snapshot for the cache is read from disk anyway.

@carllin (Contributor) commented May 13, 2019

@aeyakovenko, yes, the root fork is stored in blocktree. You can get all the blocks that are children of this root, but how do you propose to rebuild the signature cache from these blocks?

@aeyakovenko (Member) commented:

@carllin read all the transactions and collect their signatures. The status itself is not important.
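
For concreteness, a sketch of what this proposed recovery could look like (the `Blocktree` trait and the entry/transaction types below are placeholders chosen for illustration, not the real blocktree API): walk the frozen blocks chained to the root and collect only the transaction signatures, ignoring statuses.

```rust
use std::collections::HashSet;

struct Transaction {
    signature: [u8; 64],
}

struct Entry {
    transactions: Vec<Transaction>,
}

// Placeholder for the subset of the blocktree API this sketch needs.
trait Blocktree {
    fn parent(&self, slot: u64) -> Option<u64>;
    fn entries(&self, slot: u64) -> Vec<Entry>;
}

// Walk up to `depth` blocks ending at `root` (e.g. the last 32 frozen blocks)
// and collect every transaction signature; statuses are not recovered.
fn recover_signatures(blocktree: &dyn Blocktree, root: u64, depth: usize) -> HashSet<[u8; 64]> {
    let mut sigs = HashSet::new();
    let mut slot = Some(root);
    for _ in 0..depth {
        let s = match slot {
            Some(s) => s,
            None => break,
        };
        for entry in blocktree.entries(s) {
            for tx in entry.transactions {
                sigs.insert(tx.signature);
            }
        }
        slot = blocktree.parent(s);
    }
    sigs
}
```

As the rest of the thread points out, this only works if every transaction stored in blocktree also belongs in the status cache, which is the invariant debated below.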

@carllin (Contributor) commented May 13, 2019

Currently, there could potentially be transactions that get included in blocktree, but their signatures don't get included in the status cache (for instance if they cause transaction errors).

@aeyakovenko (Member) commented:

@carllin anything that is recorded into the ledger should be in the cache, and vice versa. Otherwise it's a RAM spam attack vector.

@carllin (Contributor) commented May 14, 2019

We don't currently mark blobs as invalid or purge them if they cause unexpected errors, and we also don't mark those forks as dead; that's something that still needs to be done. But even if that work were completed, a blob that causes such unexpected errors could still exist in blocktree on restart if the node shuts down and restarts before the purging logic completes.

@aeyakovenko (Member) commented:

@carllin given a root fork, we can’t reconstruct the ledger that got us there?

@carllin (Contributor) commented May 14, 2019

@aeyakovenko, you can't currently reliably derive all the transactions that went into that root fork. You know that some subset of the transactions in the blocks that chain to the root fork were included.

@aeyakovenko (Member) commented:

@carllin, I don't understand what is missing. A root fork depends on frozen blocks, so the entire frozen block path to genesis is available. Each block contains exactly the set of transactions that compose that block. That's all we need.

@carllin (Contributor) commented May 14, 2019

I think the problem is the assumption that every transaction in a block makes it into the status cache, which is not currently true. The signatures of transactions that cause certain types of errors do not get recorded into the status cache.

@aeyakovenko (Member) commented:

@carllin, those transactions don’t need to be present in the cache.

@carllin (Contributor) commented May 14, 2019

Currently blocktree cannot differentiate those transactions that cause errors from those that would succeed without replaying those transactions.

@aeyakovenko (Member) commented:

@carllin we don’t need the status, just the signatures to avoid encoding a dup

@carllin (Contributor) commented May 14, 2019

Right, but some of those signatures wouldn't make it into the cache if we were to replay them, and you can't tell which ones those are just by looking at them. I think the danger is that if we were to include every signature in blocktree, then a validator booting from a snapshot could end up with a signature that the other validators don't have in their signature caches.

@aeyakovenko (Member) commented May 14, 2019

@carllin all signatures that are in the ledger must be in the cache. All signatures that are not in the ledger, must not be in the cache.

@rob-solana (Contributor) commented:

we use the cache to return TX status to RPC clients, though, right?

@aeyakovenko (Member) commented:

@rob-solana yes, but it’s not critical

@sakridge (Member) commented:

I think it's a known issue at the moment.

@sambley force-pushed the snapshot branch 2 times, most recently from 862d7ea to c5fe5d4 on May 28, 2019 00:54
@sambley (Contributor, Author) commented May 28, 2019

@sakridge, @rob-solana, @carllin, could you help review the changes and let me know if you have any comments.

@rob-solana (Contributor) commented:

> @sambley, but these AccountNotFounds are not happening on the other validators, correct? If everybody is playing the same transactions, then all the accounts should match, so the behavior should be the same. So there is some mismatch in accounts happening on the validator that is restarting from the snapshots? Can we figure out what these zero-balance accounts look like on validators that aren't failing?
>
> @carllin, I identified a fix in the restore and have not been able to reproduce the AccountNotFound (ANF) so far (I would have to test more to see if it's really gone). I am noticing a different issue now where, after restore, a few votes get confirmed and then the message below starts appearing and all further votes get dropped. Do you have any clues as to what could be causing it?
>
> [2019-05-25T05:38:22.705442839Z INFO solana::replay_stage] bank frozen 559
> [2019-05-25T05:38:22.706442201Z INFO solana::replay_stage] validator fork confirmed 558 826
> [2019-05-25T05:38:22.707536546Z INFO solana::replay_stage] new fork:568 parent:559
> [2019-05-25T05:38:22.710977032Z INFO solana::replay_stage] bank frozen 568
> [2019-05-25T05:38:22.712022085Z INFO solana::replay_stage] new fork:569 parent:568
> [2019-05-25T05:38:23.298403567Z WARN solana_vote_api::vote_state] dropping vote Vote { slot: 568, hash: 5aotmPDFuLYztKPhyzwgE39VsciMMJwgYasxnLRbhpKC }, no matching slot/hash combination
> [2019-05-25T05:38:23.298434102Z WARN solana_vote_api::vote_state] dropping vote Vote { slot: 569, hash: G2BS8cZFkAYNgPduXkNWTRz7gq4BApq7Ypjypejbc3AM }, no matching slot/hash combination

is this showing up in a single-node test or on a restarted validator?

    map: MmapMut,
    // This mutex forces append to be single threaded, but concurrent with reads
    append_offset: Mutex<usize>,
    current_len: AtomicUsize,
    file_size: u64,
}

impl Drop for AppendVec {
    fn drop(&mut self) {
        let _ = std::fs::remove_dir_all(&self.path.parent().unwrap());
Review comment (Contributor):

convention for this is `_ignored =`

Reply (Contributor Author):

will update accordingly
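
For reference, the requested convention applied to the hunk above would look roughly like this (AppendVec is reduced here to the single field the example needs):

```rust
use std::path::PathBuf;

struct AppendVec {
    path: PathBuf,
}

impl Drop for AppendVec {
    fn drop(&mut self) {
        // Bind the discarded Result to `_ignored` instead of `let _ =`,
        // making the intent to ignore the cleanup error explicit.
        let _ignored = std::fs::remove_dir_all(self.path.parent().unwrap());
    }
}
```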

#[derive(Default)]
pub struct Bank {
pub struct BankRc {
Review comment (Contributor):

what's the Rc suffix mean?

Reply (Contributor Author):

> what's the Rc suffix mean?

This just holds the Arc members of the bank, as they need to be serialized in a special manner; the rest of the members of the bank can be serialized directly.
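
A simplified sketch of that split (illustrative fields only, not the actual Bank definition): the Arc-held members live in BankRc and are handled by dedicated snapshot code, while the plain fields of Bank can derive serde directly.

```rust
use std::sync::Arc;

use serde::{Deserialize, Serialize};

#[derive(Default)]
struct Accounts {
    // account storage; serialized by its own snapshot routines
}

// Groups the Arc members that need special-case serialization.
#[derive(Default)]
struct BankRc {
    accounts: Arc<Accounts>,
    // ... status cache and other shared members
}

#[derive(Default, Serialize, Deserialize)]
struct Bank {
    slot: u64,
    tick_height: u64,
    // Skipped by the derive; the snapshot code serializes BankRc separately.
    #[serde(skip)]
    rc: BankRc,
}
```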

@rob-solana (Contributor) commented:

@sambley I don't see anything obviously wrong with this patch, still looking

@sambley (Contributor, Author) commented May 28, 2019

> is this showing up in a single-node test or on a restarted validator?

I have not seen this happen in a single-node test, but I have seen the issue on testnet when running with 6 nodes. It happens even if the validator has not been restarted. It seems to happen at the point of a fork; when dumping the individual account hashes at that point, I noticed that the hash values differed for 2 or 3 accounts.

@sambley (Contributor, Author) commented May 30, 2019

Let me know if there are any additional comments; if not, I would like to go ahead and merge this one.

@mvines (Member) commented May 31, 2019

@sambley - yeah let's merge it when you're ready. We can always have follow up PRs as needed. Are you planning to look at sharing snapshots between validator nodes next?

run.sh (outdated):
solana-keygen -o "$dataDir"/config/leader-vote-account-keypair.json
solana-keygen -o "$dataDir"/config/leader-stake-account-keypair.json
leader_keypair="$dataDir/config/leader-keypair.json"
if [ -e "$leader_keypair" ]
Review comment (Member):

Oh, instead of

if [ -e "$leader_keypair" ]
then

please use this form:

if [[ -e "$leader_keypair" ]]; then

because

  1. fewer lines of code
  2. [[ is a bash built-in

Review comment (Contributor):

also, with the [[ built-in, the quotes around $leader_keypair are unnecessary

@sambley (Contributor, Author) commented May 31, 2019

> @sambley - yeah let's merge it when you're ready. We can always have follow up PRs as needed. Are you planning to look at sharing snapshots between validator nodes next?

@mvines, yes, I can start looking at sharing the snapshots and accounts between validator nodes. Please let me know if you have ideas on how best to share the data between nodes.

@sambley merged commit 182096d into solana-labs:master on May 31, 2019
@mvines (Member) commented May 31, 2019

Awesome. I don't have anything super concrete to offer. Ideally a validator that's just starting up could download the most recent snapshot from the cluster entrypoint it was given (as v1; later it could potentially try to fetch snapshots from other nodes it finds over gossip too). HTTP seems like a reasonable choice. I'm not sure if we can serve normal HTTP responses over the existing JSON RPC API, so perhaps it would be better to just have a new pure HTTP port that a validator node exports too.
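
One possible shape for that follow-up, sketched here for illustration (the endpoint path, archive name, and use of the `reqwest` crate are assumptions, not anything defined in this PR): a booting validator fetches the latest snapshot archive from its entrypoint over plain HTTP before falling back to full ledger replay.

```rust
use std::fs::File;
use std::path::{Path, PathBuf};

// Download the most recent snapshot from the cluster entrypoint over plain HTTP,
// returning the local path of the archive so boot can restore from it.
fn fetch_snapshot(entrypoint: &str, out_dir: &Path) -> Result<PathBuf, Box<dyn std::error::Error>> {
    // Hypothetical endpoint served from a dedicated HTTP port on the entrypoint node.
    let url = format!("http://{}/snapshot.tar.bz2", entrypoint);
    let mut response = reqwest::blocking::get(url.as_str())?.error_for_status()?;

    let out_path = out_dir.join("snapshot.tar.bz2");
    let mut file = File::create(&out_path)?;
    response.copy_to(&mut file)?;
    Ok(out_path)
}
```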

@aeyakovenko (Member) commented:

@sambley I think the best way to sync the checkpoint would be to paginate the index and the accounts across many validators.

@brooksprumo mentioned this pull request on May 6, 2021.
Successfully merging this pull request may close these issues.

Fullnode startup is slow