Create bank snapshots #4244
Conversation
Codecov Report
@@ Coverage Diff @@
## master #4244 +/- ##
========================================
+ Coverage 78.7% 78.7% +<.1%
========================================
Files 165 166 +1
Lines 28639 29223 +584
========================================
+ Hits 22557 23017 +460
- Misses 6082 6206 +124
Codecov Report
@@ Coverage Diff @@
## master #4244 +/- ##
========================================
+ Coverage 77.5% 79.2% +1.6%
========================================
Files 178 179 +1
Lines 32049 32051 +2
========================================
+ Hits 24857 25403 +546
+ Misses 7192 6648 -544
@aeyakovenko, regarding serializing the BankStatusCache, could you clarify more on what you mean by "Only the signatures per block are needed"?
@sambley the status code is not necessary. We just need the signatures to do a duplicate check. It might be possible to shrink the signature size to 16 random bytes without loss to security.
@sambley 20 bytes would definitely be secure
@aeyakovenko, to reduce the snapshot size, I have split the cache entries in the status cache into two sets: one holding the items for which a snapshot has already been taken, and the current active set for which a snapshot still needs to be taken. This allows for incremental snapshots of the status cache; do you see any issues with this approach? We can cut this down further by using only 20 bytes of the signature like you had indicated.
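For illustration, a minimal sketch of that layout (the names and types here are hypothetical, not the actual StatusCache code): keys are the first 20 bytes of each signature, and entries are split into an already-snapshotted set plus an active set so snapshots can be incremental.

```rust
// Hypothetical sketch, not the real StatusCache: a duplicate-check cache keyed
// by the first 20 bytes of each signature, split into entries already captured
// by a prior snapshot and the active entries that still need one.
use std::collections::HashSet;

type SignatureKey = [u8; 20];

#[derive(Default)]
struct DupCheckCache {
    snapshotted: HashSet<SignatureKey>, // covered by an earlier snapshot
    active: HashSet<SignatureKey>,      // added since the last snapshot
}

impl DupCheckCache {
    // Truncate a full 64-byte signature to a 20-byte key (assumes sig.len() >= 20).
    fn key(sig: &[u8]) -> SignatureKey {
        let mut key = [0u8; 20];
        key.copy_from_slice(&sig[..20]);
        key
    }

    // Returns false if the signature was already seen (a duplicate).
    fn insert(&mut self, sig: &[u8]) -> bool {
        let key = Self::key(sig);
        if self.snapshotted.contains(&key) {
            return false;
        }
        self.active.insert(key)
    }

    // Incremental snapshot: emit only the active set, then fold it into the
    // snapshotted set so the next snapshot skips it.
    fn take_incremental_snapshot(&mut self) -> Vec<SignatureKey> {
        let snapshot: Vec<SignatureKey> = self.active.iter().copied().collect();
        self.snapshotted.extend(self.active.drain());
        snapshot
    }
}
```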
@sambley, I just thought of this, but blocktree already stores all the signatures. @rob-solana @carllin wouldn’t it be easier to just scan the last 32 blocks to recover them?
@aeyakovenko, blocktree only stores blobs, so you would have to copy the blobs out of storage, deserialize them, then scan each transaction for the signature, seems slow. Moreover, somebody could have sent you a bogus transaction that didn't actually make it successfully through the bank/update the status cache, which we don't currently track
@carllin don't we already know the root fork? we just need to read all the blocks chained to it to get the signature cache. the snapshot for the cache is read from disk anyways.
@aeyakovenko, yes, the root fork is stored in blocktree. You can get all the blocks that are children of this root, but how do you propose to rebuild the signature cache from these blocks?
@carllin read all the transactions and read their signatures. The status itself is not important.
Currently, there could potentially be transactions that get included in blocktree, but their signatures don't get included in the status cache (for instance if they cause transaction errors).
@carllin anything that is recorded into the ledger should be in the cache. And vice versa. Otherwise it’s a ram spam attack vector.
We don't currently mark blobs as invalid or purge them if they cause unexpected errors. We also don't mark those forks as dead. That's something that still needs to be done. But even if this work was completed, it's possible that a blob that causes such unexpected errors exists in blocktree on restart if the node shuts down/restarts before the purging logic can complete.
@carllin given a root fork, we can’t reconstruct the ledger that got us there?
@aeyakovenko, you can't currently reliably derive all the transactions that went into that root fork. You know that some subset of the transactions in the blocks that chain to the root fork were included.
@carllin, I don’t understand what is missing. A root fork depends on frozen blocks. So the entire frozen block path to genesis is available. Each block contains exactly the set of transactions that compose that block. That’s all we need.
I think the problem is the assumption that every transaction in a block makes it into the status cache, which is not currently true. The signatures of transactions that cause certain types of errors do not get recorded into the status cache.
@carllin, those transactions don’t need to be present in the cache.
Currently blocktree cannot differentiate those transactions that cause errors from those that would succeed without replaying those transactions.
@carllin we don’t need the status, just the signatures to avoid encoding a dup
Right, but some of those signatures wouldn't make it into the cache if we were to replay them, but you can't tell which those are by just looking at them. I think the danger here is if we were to just include every signature in blocktree, then a validator booting from a snapshot could potentially include a signature that the other validators don't have in their signature caches
@carllin all signatures that are in the ledger must be in the cache. All signatures that are not in the ledger must not be in the cache.
we use the cache to return TX status to RPC clients, though, right?
@rob-solana yes, but it’s not critical
I think it's a known issue at the moment.
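To make the rebuild-from-blocktree idea discussed above concrete, here is a hedged sketch using hypothetical Block and Transaction types rather than the real blocktree API; it only works if the invariant above holds, i.e. every signature recorded in the ledger is exactly the set that belongs in the cache.

```rust
// Sketch only: `Block` and `Transaction` are placeholders, not the blocktree
// types. Walk the frozen blocks chaining from the root back to genesis and
// collect every transaction signature; the transaction status is irrelevant,
// since the cache is only used here for duplicate detection.
use std::collections::HashSet;

struct Transaction {
    signature: [u8; 64],
}

struct Block {
    transactions: Vec<Transaction>,
}

fn rebuild_signature_set(frozen_blocks_to_genesis: &[Block]) -> HashSet<[u8; 64]> {
    let mut signatures = HashSet::new();
    for block in frozen_blocks_to_genesis {
        for tx in &block.transactions {
            signatures.insert(tx.signature);
        }
    }
    signatures
}
```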
@sakridge, @rob-solana, @carllin, could you help review the changes and let me know if you have any comments.
is this showing up in a single-node test or on a restarted validator?
runtime/src/append_vec.rs
    map: MmapMut,
    // This mutex forces append to be single threaded, but concurrent with reads
    append_offset: Mutex<usize>,
    current_len: AtomicUsize,
    file_size: u64,
}

impl Drop for AppendVec {
    fn drop(&mut self) {
        let _ = std::fs::remove_dir_all(&self.path.parent().unwrap());
convention for this is _ignored =
will update this
will update accordingly
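For clarity, the suggested rename applied to the drop above would look roughly like this (a sketch of the convention, not the final patch):

```rust
impl Drop for AppendVec {
    fn drop(&mut self) {
        // Name the discarded Result `_ignored` instead of `_` so the intent
        // to drop any error is explicit.
        let _ignored = std::fs::remove_dir_all(&self.path.parent().unwrap());
    }
}
```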
#[derive(Default)]
pub struct Bank {
pub struct BankRc {
what's the Rc suffix mean?
This just holds the Arc members of the bank, as they need to be serialized in a special manner, while the rest of the members of Bank can be serialized directly.
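A rough sketch of that split (field names and types are placeholders, not the actual Bank layout): the Arc-held members live in BankRc and get custom serialization, while the remaining plain fields of Bank can be serialized directly.

```rust
// Illustrative only -- placeholder fields, not the real Bank definition.
use std::sync::Arc;

#[derive(Default)]
struct Accounts;       // placeholder for the accounts store
#[derive(Default)]
struct StatusCache;    // placeholder for the status cache

// Holds the Arc members that need special handling when snapshotting.
#[derive(Default)]
struct BankRc {
    accounts: Arc<Accounts>,
    status_cache: Arc<StatusCache>,
}

#[derive(Default)]
struct Bank {
    rc: BankRc,       // serialized via custom logic
    slot: u64,        // plain fields serialize directly
    tick_height: u64,
}
```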
@sambley I don't see anything obviously wrong with this patch, still looking
I have not seen this happen on a single-node test, but have seen the issue on testnet when running with 6 nodes. It happens even if the validator has not been restarted. It seems to happen at the point of a fork; when dumping the individual account hashes after this happened, I noticed that the hash values differed for 2 or 3 accounts.
Let me know if there are any additional comments; if not, I would like to go ahead and merge this one.
@sambley - yeah let's merge it when you're ready. We can always have follow up PRs as needed. Are you planning to look at sharing snapshots between validator nodes next?
run.sh
solana-keygen -o "$dataDir"/config/leader-vote-account-keypair.json
solana-keygen -o "$dataDir"/config/leader-stake-account-keypair.json
leader_keypair="$dataDir/config/leader-keypair.json"
if [ -e "$leader_keypair" ]
Oh, instead of

if [ -e "$leader_keypair" ]
then

please use this form:

if [[ -e "$leader_keypair" ]]; then

because:
- fewer lines of code
- [[ is a bash built-in
also, in the built-in, the quotes around $leader_keypair are unnecessary
@mvines, yes I can start looking at sharing the snapshots and accounts between validator nodes. Please let me know if you have ideas of how best to share the data between the nodes.
Awesome. I don't have anything super concrete to offer. Ideally a validator that's just starting up could download the most recent snapshot from the cluster entrypoint it was given (as v1; later it could potentially try to fetch snapshots from other nodes it finds over gossip too, perhaps). HTTP seems like a reasonable choice. I'm not sure if we can serve normal HTTP responses over the existing JSON RPC API, so perhaps it would be better to just have a new pure HTTP port that a validator node exports too.
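A rough sketch of that v1 flow, assuming a hypothetical plain-HTTP snapshot endpoint on the entrypoint and the reqwest crate (neither exists in this PR):

```rust
// Sketch only: the /snapshot.tar endpoint and the use of `reqwest` are
// assumptions, not part of this patch.
use std::fs::File;
use std::io::Write;

fn download_snapshot(entrypoint: &str, out_path: &str) -> Result<(), Box<dyn std::error::Error>> {
    // Fetch the most recent snapshot from the cluster entrypoint over HTTP.
    let url = format!("http://{}/snapshot.tar", entrypoint);
    let bytes = reqwest::blocking::get(url)?.bytes()?;
    // Write it to disk so the validator can restore the bank from it on boot.
    let mut file = File::create(out_path)?;
    file.write_all(&bytes)?;
    Ok(())
}
```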
@sambley I think the best way to sync the checkpoint would be to paginate the index and the accounts across many validators.
Problem
Full node startup is slow when many transactions have been processed, as this involves querying for the full ledger and replaying it.
Summary of Changes
Serialize bank state into a snapshot and try to restore from that on boot.
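A minimal sketch of the idea (using serde and bincode; the struct and on-disk format here are illustrative, not the exact layout in this patch):

```rust
// Illustrative only: serialize a root bank's state to a snapshot file, and on
// boot try to restore from it instead of replaying the full ledger.
use serde::{Deserialize, Serialize};
use std::fs;

#[derive(Serialize, Deserialize, Default)]
struct BankSnapshot {
    slot: u64,
    // ... account state and status-cache signatures would go here ...
}

fn save_snapshot(bank: &BankSnapshot, path: &str) -> Result<(), Box<dyn std::error::Error>> {
    fs::write(path, bincode::serialize(bank)?)?;
    Ok(())
}

fn load_snapshot(path: &str) -> Result<BankSnapshot, Box<dyn std::error::Error>> {
    Ok(bincode::deserialize(&fs::read(path)?)?)
}
```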
Fixes #2475