Incremental Snapshots #17088
Comments
Actually, the only problem this fixes is that startup takes longer because the node might have to download a large snapshot. Now, if the node has the large 'full' snapshot, then it just needs to download the correct incremental one to apply on top of that. The snapshot extraction speed would be exactly the same, maybe worse, because it now has to combine both snapshots, and the amount of data it is ingesting will be the same at the end of the computation. The incremental snapshot would be on top of a full snapshot. The idea is that a validator creates a full snapshot maybe every 100,000 slots, and then creates an incremental snapshot every 100 slots. A node joining the network from nothing would have to download the full snapshot and then the latest incremental that applies on top of it. Some solution steps/ideas:
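As a rough sketch of the schedule described above (a full snapshot every 100,000 slots and an incremental every 100 slots), a hypothetical helper could decide which kind of snapshot a root slot gets; the names and constants below are illustrative only, not actual validator code:

```rust
// Hypothetical sketch: deciding which kind of snapshot to take at a root slot,
// assuming a full-snapshot interval of 100,000 slots and an incremental
// interval of 100 slots.
type Slot = u64;

const FULL_SNAPSHOT_INTERVAL: Slot = 100_000;
const INCREMENTAL_SNAPSHOT_INTERVAL: Slot = 100;

#[derive(Debug, PartialEq)]
enum SnapshotKind {
    Full,
    /// Incremental on top of the given full-snapshot slot
    Incremental { base_slot: Slot },
    None,
}

fn snapshot_kind_for_slot(slot: Slot, last_full_snapshot_slot: Option<Slot>) -> SnapshotKind {
    if slot % FULL_SNAPSHOT_INTERVAL == 0 {
        SnapshotKind::Full
    } else if slot % INCREMENTAL_SNAPSHOT_INTERVAL == 0 {
        match last_full_snapshot_slot {
            // An incremental snapshot only makes sense on top of an existing full one
            Some(base_slot) => SnapshotKind::Incremental { base_slot },
            None => SnapshotKind::None,
        }
    } else {
        SnapshotKind::None
    }
}

fn main() {
    assert_eq!(snapshot_kind_for_slot(200_000, Some(100_000)), SnapshotKind::Full);
    assert_eq!(
        snapshot_kind_for_slot(100_300, Some(100_000)),
        SnapshotKind::Incremental { base_slot: 100_000 }
    );
    assert_eq!(snapshot_kind_for_slot(100_050, Some(100_000)), SnapshotKind::None);
}
```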
There are a couple of strategies here. One is a fixed interval, having all nodes in the network try to create at the same block height. That is nice because a node can get a full snapshot from one node and then likely find an incremental one later from another node which used the same full snapshot slot. Incremental snapshots with different parents will obviously be incompatible. Another could be to create it dynamically based on how quickly the incremental grows. It's expected the incremental will keep getting larger and larger, potentially as large as the full snapshot. At that point, or maybe at 50% of the size of the full snapshot, you roll up the state and just create a new full snapshot. This may be necessary if the fixed interval doesn't work well. If a node ends up downloading a full snapshot and an incremental one just as big, then that's not great.
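To illustrate the dynamic roll-up idea (assuming the 50% threshold mentioned above), a minimal sketch of the size check might look like this; the function and constant names are made up for the example:

```rust
// Sketch of the "dynamic" strategy: keep taking incremental snapshots until
// the incremental archive grows past some fraction (e.g. 50%) of the full
// snapshot's size, then roll up into a new full snapshot.
const ROLL_UP_RATIO: f64 = 0.5;

fn should_roll_up_to_full(incremental_archive_bytes: u64, full_archive_bytes: u64) -> bool {
    // Guard against an empty full snapshot, which should never happen in practice.
    full_archive_bytes > 0
        && (incremental_archive_bytes as f64) >= ROLL_UP_RATIO * (full_archive_bytes as f64)
}

fn main() {
    // A 6 GB incremental on top of a 10 GB full snapshot would trigger a new full snapshot.
    assert!(should_roll_up_to_full(6_000_000_000, 10_000_000_000));
    assert!(!should_roll_up_to_full(1_000_000_000, 10_000_000_000));
}
```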
Another idea is that we re-store part of the account state on each slot to keep the append-vec count down to under the number of slots in an epoch. This isn't great, because it will bloat the size of the incremental snapshots: those re-stores will look like new updates to the accounts system. I think it would be good to remove this and then support combining accounts from different slots into a single append-vec. This is the account store code as part of the rent collection: Line 3571 in fa86a33
@sakridge Are you envisioning just a single incremental snapshot between full snapshots (a new incremental snapshot replaces the existing one), or multiple? I was assuming multiple, but it doesn't need to be that way.

Multiple: Incremental snapshot B is the diff from A to B, incremental snapshot C is the diff from B to C, etc.

Single: Same picture as above, but when incremental snapshot C is created, it is the diff from A to C, and then B is deleted. Same for incremental snapshot S, which is the diff from A to S, and all other incremental snapshots are removed.
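A tiny sketch of the difference between the two designs, in terms of which slot a new incremental snapshot diffs against; the types and names are illustrative, not actual code:

```rust
// "Multiple": the base is the previous incremental snapshot.
// "Single": the base is always the last full snapshot, and the previous
// incremental is discarded.
type Slot = u64;

enum Design {
    Multiple,
    Single,
}

fn diff_base_slot(
    design: &Design,
    last_full_snapshot_slot: Slot,
    last_incremental_snapshot_slot: Option<Slot>,
) -> Slot {
    match design {
        Design::Multiple => last_incremental_snapshot_slot.unwrap_or(last_full_snapshot_slot),
        Design::Single => last_full_snapshot_slot,
    }
}

fn main() {
    // Full snapshot at A = 100_000, previous incremental at B = 100_100.
    // "Multiple": C diffs from B; "Single": C diffs from A (and B is deleted).
    assert_eq!(diff_base_slot(&Design::Multiple, 100_000, Some(100_100)), 100_100);
    assert_eq!(diff_base_slot(&Design::Single, 100_000, Some(100_100)), 100_000);
}
```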
I was thinking the node would just have a single incremental which would replace all previous incremental ones, but I'm open to arguments for the multiple design (or others?). I just think with the multiple design you will have a lot of duplicated state updates in the subsequent snapshots, and only one is useful, so there will be a lot of overlap. I think there is some small set of accounts updated a lot, like once per slot, and there are many more accounts which are only updated very infrequently, like once in a million or more slots. Hopefully we capture those infrequently updated accounts with the full snapshot, and the incremental captures the frequently updated ones. edit: Maybe keep 2 incrementals, kind of like we have today, so if the newest one is bad, you can fall back to the old one.
Sounds good!
I like it.
I was thinking about the interval as well, and thought maybe, if feasible, it would be cool to have snapshots cascading into different intervals. For instance, a couple that are 100 from the tip, a couple that are 200, then 400, 800, etc. You could even have different threads packaging these at different intervals, or maybe different validators package snapshots at these varying intervals, i.e. some package at a faster rate than others. The benefit here is that I think most validators who shut down recently don't need to download a large incremental snapshot that's 1/2 the size of the full snapshot, and can instead grab a few smaller ones to catch up. This might also be useful for nodes trying to catch up if they can fast-sync small incremental snapshots from near the tip of the network.
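A quick illustration of the cascading-interval idea, assuming intervals doubling from 100 slots; the helpers below are hypothetical and only show how a restarting node might pick the smallest incremental that covers its gap:

```rust
// Snapshots kept at distances of roughly 100, 200, 400, 800, ... slots from
// the tip: a node that is only a little behind grabs the smallest incremental
// that still covers its gap, instead of one half the size of the full snapshot.
fn cascading_intervals(max_interval: u64) -> Vec<u64> {
    std::iter::successors(Some(100u64), |i| Some(i * 2))
        .take_while(|&i| i <= max_interval)
        .collect()
}

fn smallest_covering_interval(slots_behind: u64, max_interval: u64) -> Option<u64> {
    cascading_intervals(max_interval)
        .into_iter()
        .find(|&interval| interval >= slots_behind)
}

fn main() {
    assert_eq!(cascading_intervals(800), vec![100, 200, 400, 800]);
    // A node 150 slots behind the tip only needs the 200-slot incremental.
    assert_eq!(smallest_covering_interval(150, 800), Some(200));
    // A node 5000 slots behind needs something bigger than any cascade here.
    assert_eq!(smallest_covering_interval(5_000, 800), None);
}
```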
@carllin This design sounds like it falls under the "Multiple" category of number-of-incremental-snapshots-between-full-snapshots. Is that right?
Another way to do it: split the snapshot across a range of slots, then keep track of every instance of clean_accounts and shrink_slots, and keep a dirty set of stores to regenerate the new snapshot slices for only those slots that had changed data.
I think one of the nice things about how we've organized storage entries by slot currently is that the set of storage entries forms a natural diff tracker. So for instance if we were taking a snapshot every 100 slots:
Then as long as you guarantee clean doesn't progress past the slot while you're grabbing a copy of the storage entries, this should work. And this can be expanded to arbitrary intervals: 200, 400, etc. Also, while you're generating the 100-slot diff, let's say from (200, 300], you could use that to then generate the 200-slot diff from (100, 300]. For v1, one approach is to just focus on using a fixed, configurable interval.
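A minimal sketch of the "storages as a natural diff tracker" point, assuming storages keyed by slot; selecting the storages for a half-open range like (200, 300] is just a range query. The Storage type here is a stand-in for the real AppendVec-backed entries:

```rust
use std::collections::BTreeMap;

type Slot = u64;

#[derive(Debug, Clone)]
struct Storage {
    slot: Slot,
    // ... account data would live here in the real AccountsDb
}

/// Storages for the half-open range (base_slot, top_slot], e.g. (200, 300].
fn storages_for_incremental(
    storages_by_slot: &BTreeMap<Slot, Storage>,
    base_slot: Slot,
    top_slot: Slot,
) -> Vec<Storage> {
    storages_by_slot
        .range(base_slot + 1..=top_slot)
        .map(|(_, storage)| storage.clone())
        .collect()
}

fn main() {
    let storages: BTreeMap<Slot, Storage> =
        (100..=300).step_by(50).map(|slot| (slot, Storage { slot })).collect();
    // The (200, 300] diff picks up only the storages for slots 250 and 300;
    // the (100, 300] diff can be built by extending it with (100, 200].
    let diff = storages_for_incremental(&storages, 200, 300);
    assert_eq!(diff.iter().map(|s| s.slot).collect::<Vec<_>>(), vec![250, 300]);
}
```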
Hey, late to the party. :) I thought about this a bit, and I may be able to add some more color. Still inheriting the delta slot interval design discussed so far, I think we can forgo generating those delta snapshots at some guessed intervals; then we can realize even faster startup. My idea is to reuse. This design can come later, but it mostly overlaps with the delta snapshot archive generation and download code. The new restart flow:
my assumptions and observations:
OK, it looks like there are a few different ways I could go to implement incremental snapshots (or whatever we call it). I don't know how to quantify which way is the right way to go, though. Who should be the one to pick? Or are there additional data points I can gather that would make the decision clear?
@carllin @sakridge could you share initial thoughts on my alternative idea? #17088 (comment)
As long as you build in a mechanism to ensure this serialization happens at a consistent point in time between clean/shrink, this should work.
From my understanding, this is an on-demand streaming of AccountsDb storages. However, it seems like you'll still need to support a full snapshot fetch (not just the accounts storage, but also the status cache, bank, etc.) for nodes that crashed/corrupted their serialized/saved state. As for how fast on-demand snapshotting is compared to incremental snapshots that happen at regular intervals, I think on-demand snapshotting is strictly worse if the time to take a snapshot is longer than the snapshot interval. For instance, if we have some nodes packaging incremental snapshots every 100 slots, and the packaging process takes 500 slots, then:
This implies to me that having some nodes doing regular snapshotting at different varying intervals still seems preferable.
thanks for confirming! yeah, hopefully restarting by itself should make this synchronization requirement easy. :)
Yeah, but I think this should be rare. As for the delta snapshot, I think the status cache and bank should be small compared to the accounts storage, so I omitted to mention them. Maybe the on-demand snapshot endpoint processing needs to briefly grab the root bank and stash those binaries and include them in the delta snapshot archive.
Thanks for the great analysis. :) Firstly, I just noticed that avoiding purging accounts and recomputing the index can be realized for both incremental and on-demand snapshotting if both just restart with proper coordination, like saving the incremental snapshot locally before exiting (CC: @brooksprumo). Originally, I thought this was only possible because on-demand can freely specify the start slot for fetching a snapshot dynamically. Maybe you can select and download an incremental snapshot which slightly overlaps with the local root to get the same optimization?
Oh, I'm getting a clue. The
Yeah, that's true. However, I thought we could offset that delay with vastly reduced network bandwidth by de-duplicating data across equivalent multiple incremental snapshots. As said above, if we can realize one delta snapshot download per restart in the normal case, the merit of on-demand snapshots is small, though. :) Also, I don't think an on-demand snapshot takes so long: recent AppendVecs should generally be in the page cache, and index creation is basically linear processing, so it's bound by disk read bandwidth (unlike an incremental snapshot, there is no need to write an archive; it only needs to grab the mmap).
Handling the zero-lamport account updates sounds hard to me for @ryoqun's idea. The target node would have to have the same clean state as the source node, or some way to reconcile it. I think initially starting from a known state is an easier solution. These on-demand ideas might be good to explore once we have the basic mechanism working.
Thanks for the input, everyone. I'm going ahead with the single incremental snapshot, set at an interval of 100 slots, with full snapshots at 100,000 slots. One of the implementation details is that I'll need to pass around a slightly different set of parameters for an incremental snapshot vs a full snapshot. I was thinking I could do this either by:
or
There are some pros and cons to both ways, but neither seems like the clear better way. Given your knowledge of the codebase, and also Rust, does one way sound better than the other?
Also, thinking about this a bit more: since the current snapshot logic is based on multiples of the accounts hash interval, I could also piggyback on it and do an incremental snapshot at every accounts hash interval. The existing snapshot functions would need parameters added to handle incremental snapshots. Then AccountsHashVerifier would need some more logic to do either an incremental snapshot or a full snapshot based on the snapshot interval. Is that easier/better than the other two ways?
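For what it's worth, one way to model the "slightly different set of parameters" would be a single enum threaded through the snapshot code paths, which AccountsHashVerifier could match on; this is only a sketch of the idea, not the code that eventually landed:

```rust
// Hypothetical: a single request type distinguishing full vs incremental
// snapshots, carrying the base slot only in the incremental case.
type Slot = u64;

#[derive(Debug, Clone, Copy, PartialEq)]
enum SnapshotRequest {
    Full,
    /// Incremental on top of the full snapshot taken at `full_snapshot_slot`
    Incremental { full_snapshot_slot: Slot },
}

fn describe(request: SnapshotRequest, slot: Slot) -> String {
    match request {
        SnapshotRequest::Full => format!("full snapshot at slot {slot}"),
        SnapshotRequest::Incremental { full_snapshot_slot } => {
            format!("incremental snapshot at slot {slot}, based on slot {full_snapshot_slot}")
        }
    }
}

fn main() {
    println!("{}", describe(SnapshotRequest::Full, 200_000));
    println!("{}", describe(SnapshotRequest::Incremental { full_snapshot_slot: 200_000 }, 200_300));
}
```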
Very late to the party here :) From my understanding, this will require new values to be gossiped between nodes. So we will save some time and bandwidth when starting a validator, but then add to gossip bandwidth usage (which is already a problem) by continuously sending new values across the entire cluster each time a node has a new incremental snapshot. So a one-time payload between only 2 nodes is replaced by continuous gossip traffic across all nodes all the time. Is it still a given that the trade-off is positive here? I am in particular worried about how much this is going to make gossip worse.
The incremental snapshot values would be updated at roughly the same speed as snapshot values today. The regular snapshot rate will be reduced significantly. |
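For illustration, the gossiped value could look roughly like the following, where the full-snapshot entry changes rarely and only the incremental entries update at today's snapshot-hash rate; the type and field names here are assumptions, not the actual CrdsData variant:

```rust
// Hypothetical shape of a gossiped incremental-snapshot-hashes value.
type Slot = u64;
type Hash = [u8; 32];

#[derive(Debug, Clone)]
struct IncrementalSnapshotHashes {
    full: (Slot, Hash),
    incremental: Vec<(Slot, Hash)>,
}

fn main() {
    // The full entry below would change only every ~100,000 slots, while the
    // incremental entries churn at roughly today's snapshot-hash gossip rate.
    let advertised = IncrementalSnapshotHashes {
        full: (200_000, [0u8; 32]),
        incremental: vec![(200_100, [1u8; 32]), (200_200, [2u8; 32])],
    };
    println!("{advertised:?}");
}
```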
Can we consider this alternative implementation of incremental snapshots:
So, effectively the difference is that:
Now, to save disk space, under the hood |
How does the node know that whatever data it got for diff of slot |
This commit builds on PR #18504 by adding a test to core/tests/snapshot.rs for Incremental Snapshots. The test adds banks to bank forks in a loop and takes both full snapshots and incremental snapshots at intervals, and validates they are rebuild-able. For background info about Incremental Snapshots, see #17088. Fixes #18829 and #18972
When reconstructing the AccountsDb, if the storages came from full and incremental snapshots generated on different nodes, it's possible that the AppendVec IDs could overlap/have duplicates, which would cause the reconstruction to fail. This commit handles this issue by unconditionally remapping the AppendVec ID for every AppendVec. Fixes solana-labs#17088
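A small sketch of the remapping this commit describes, assuming a shared counter hands out fresh AppendVec IDs during reconstruction; the names are simplified for the example:

```rust
// When storages from a full snapshot and an incremental snapshot (possibly
// generated on different nodes) are combined, their AppendVec IDs may collide,
// so every AppendVec is unconditionally assigned a fresh ID.
use std::collections::HashMap;
use std::sync::atomic::{AtomicU32, Ordering};

type AppendVecId = u32;

fn remap_append_vec_ids(
    original_ids: &[AppendVecId],
    next_id: &AtomicU32,
) -> HashMap<AppendVecId, AppendVecId> {
    original_ids
        .iter()
        .map(|&old_id| (old_id, next_id.fetch_add(1, Ordering::Relaxed)))
        .collect()
}

fn main() {
    let next_id = AtomicU32::new(0);
    // IDs 3 and 7 from the full snapshot, 3 and 9 from the incremental: the
    // duplicate 3 would otherwise clash, so everything gets remapped.
    let full = remap_append_vec_ids(&[3, 7], &next_id);
    let incremental = remap_append_vec_ids(&[3, 9], &next_id);
    assert_ne!(full[&3], incremental[&3]);
    println!("full: {full:?}, incremental: {incremental:?}");
}
```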
DONE!
Problem
Startup time for nodes is slow when having to download a large snapshot.
Proposed Solution
Incremental snapshots! Instead of always having to download large, full snapshots, download a full snapshot once (or less often), then download a small incremental snapshot. The expectation/hope is that only a small number of accounts are touched often, so incremental snapshots optimize for that behavior. At startup, a node with an existing full snapshot now only needs to download a new, small incremental snapshot.
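To make the startup flow concrete, here is a hypothetical sketch of how a node with a full snapshot on disk would pick the one incremental snapshot it still needs; the types and selection logic are assumptions for the example:

```rust
// A bootstrapping node only needs the newest incremental snapshot whose base
// matches the slot of the full snapshot it already has.
type Slot = u64;

#[derive(Debug, Clone, Copy, PartialEq)]
struct IncrementalSnapshot {
    base_slot: Slot, // slot of the full snapshot it applies on top of
    slot: Slot,      // slot this incremental snapshot was taken at
}

fn pick_incremental(
    full_snapshot_slot: Slot,
    available: &[IncrementalSnapshot],
) -> Option<IncrementalSnapshot> {
    available
        .iter()
        .filter(|iss| iss.base_slot == full_snapshot_slot)
        .max_by_key(|iss| iss.slot)
        .copied()
}

fn main() {
    let available = [
        IncrementalSnapshot { base_slot: 100_000, slot: 100_500 },
        IncrementalSnapshot { base_slot: 200_000, slot: 200_300 },
        IncrementalSnapshot { base_slot: 200_000, slot: 200_400 },
    ];
    // With the full snapshot from slot 200_000 on disk, only the 200_400
    // incremental needs to be downloaded at startup.
    assert_eq!(
        pick_incremental(200_000, &available),
        Some(IncrementalSnapshot { base_slot: 200_000, slot: 200_400 })
    );
}
```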
SLOTS_PER_EPOCH / 2 - 1?

Example
ISS C = diff(A, C), and so on.

Details
Storing an Incremental Snapshot
Loading from an Incremental Snapshot
Validator
Background Services
AccountsDb
clean_accounts() to add a new parameter, last_full_snapshot_slot, to not clean zero-lamport accounts above the last FSS slot (see the sketch after this Details section)
Ledger Tool
RPC
Gossip
CrdsData
Bootstrap
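Referring back to the clean_accounts()/last_full_snapshot_slot item under AccountsDb above, a minimal sketch of the intended check, with simplified stand-in types: zero-lamport accounts newer than the last full snapshot slot are kept so the incremental snapshot still carries their "tombstones":

```rust
// Simplified stand-ins for the real AccountsDb structures; only the decision
// of whether a zero-lamport account may be purged is modeled here.
type Slot = u64;

struct AccountEntry {
    slot: Slot,
    lamports: u64,
}

fn can_purge_zero_lamport_account(
    entry: &AccountEntry,
    last_full_snapshot_slot: Option<Slot>,
) -> bool {
    entry.lamports == 0
        && match last_full_snapshot_slot {
            // Only purge zero-lamport accounts at or below the last FSS slot.
            Some(fss_slot) => entry.slot <= fss_slot,
            // No full snapshot taken yet: keep the previous cleaning behavior.
            None => true,
        }
}

fn main() {
    let old = AccountEntry { slot: 199_900, lamports: 0 };
    let new = AccountEntry { slot: 200_150, lamports: 0 };
    assert!(can_purge_zero_lamport_account(&old, Some(200_000)));
    assert!(!can_purge_zero_lamport_account(&new, Some(200_000)));
}
```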
Testing
Unit Tests
snapshot_utils: roundtrip bank to snapshot to bank for FSS
snapshot_utils: roundtrip bank to snapshot to bank for ISS
snapshot_utils: cleanup zero-lamport accounts in slots after FSS
Integration Tests
core/tests/snapshots.rs
local_cluster
Questions
how often should incremental snapshots be created?
  more is good when (re)joining (faster startup time?), but less is good for the running node (less resource utilization?)
does it matter when incremental snapshots would be made? Like after/before certain cleanup code?
should incremental snapshots only exist locally, or should they also be sent to new nodes?
  i'm guessing we want to send incremental snapshots to new nodes as well, so they start up faster
what goes in the incremental snapshot?
  is it all the same data types as a full snapshot, just the delta since the last snapshot?
should full snapshots be created less/same/more frequently now?
  likely not more... but there for completeness
  still need full snapshots for a new node joining the network
what tests are needed?
  obviously a test to make sure it works
  ensure fallback to full snapshot works if an incremental snapshot is borked
Related Work
Original Snapshot Work
Future Work
Tasks
snapshot_utils::bank_to_xxx_snapshot_archive() and core/tests/snapshots.rs::make_xxx_snapshot_archive() #18972