Make Snapshots streamable by not requiring a known size up front. #601

otoolep · 2024-06-21T00:35:45Z

https://pkg.go.dev/github.com/hashicorp/raft#SnapshotMeta

I am implementing (not for the first time) my own Snapshot Store in rqlite. I am reading Snapshot data from an uncompressed source, but would like to compress it on the fly as the data is read from the io.ReadCloser returned by SnapshotStore.Open().

Since the source data is not itself compressed, but compressed bytes will be served by the io.ReadCloser, I do not know the size of the Snapshot data ahead of time. This forces me to compress the data first in a scratch area.

Any ideas for avoiding this?


banks · 2024-06-21T13:17:00Z

Yeah, we jump through similar hoops in Vault today: when we send a snapshot to a follower we actually read through it twice, just because we don't know the size up front but have to fill it in the header.

It would be nicer if the InstallSnapshot RPC allowed for something like chunked encoding so that we could stream a large snapshot without knowing the size up front, but that would be a decent amount of work, so the team that integrated raft into Vault chose to do the more expensive double read. It's not in a super performance-critical part of the code, so it's a little harder for us to prioritize the more complete fix. But if you'd be interested in proposing something, I think we could make a strong case for making time to support and review it, since it would impact Vault directly too!

I'm going to edit the issue title to reflect that this is a useful enhancement/feature request.

otoolep · 2024-06-21T13:46:56Z

Two paths occur to me:

1. Some sort of chunked streaming, as you say. This is probably the most polished implementation, and would be the most robust.
2. A simple flag in https://pkg.go.dev/github.com/hashicorp/raft#InstallSnapshotRequest which indicates that the size should be ignored, and that the transfer should be considered complete when the receiver receives EOF. In that case it would be up to the Snapshot store itself (specifically the Sink implementation) to validate the data received.

otoolep · 2024-06-21T14:02:58Z

@banks, in practice does Vault snapshot data ever get to 10+ GB? Do you see that in the field?

banks · 2024-06-21T15:00:56Z

Yep, for sure we do. Reading the snapshot from BoltDB also means a lot of disk IO if the file is larger than the RAM available for page cache, so this would be a solid improvement for Vault... but users with DBs that large typically run large servers that probably still hold the entire file in page cache, so it's not something they perceive as a big performance problem. Snapshot install is also quite rare, and they typically have disks with plenty of spare IOPS, because raft can only consume a relatively small number with a single serial writer.

banks · 2024-06-21T15:03:22Z

> should be ignored, and that transfer should be considered complete when the receiver receives EOF

This could be an option. I think we'd need to design it carefully, with a trailer containing a checksum, so that a transient network EOF isn't mistaken for the actual end of the content. We'd also need careful buffer size handling so it's not a DoS vector, but that could work.

otoolep · 2024-06-21T16:10:39Z

OK, thanks, let me think more about this.

banks changed the title ~~Any way to avoid setting SnapshotMeta.Size?~~ Make Snapshots streamable by not requiring a known size up front. Jun 21, 2024

banks added the enhancement label Jun 21, 2024
