Reduce snapshot disk I/O #346
Comments
Hey @suizman, I'm a little confused by the second question. Could you give me more insight into your use case and the problem you're encountering?
@s-christoff sure. What we needed was to skip the periodic snapshots and take them only on demand. In our project we already implemented a custom snapshot strategy on top of HashiCorp's raft. By streaming snapshots, I mean that right now a snapshot must be written to the leader's disk before it is replayed to the follower nodes. It would be great to be able to stream it directly to the followers instead of waiting for it to be written to disk first. For now, our implementation works for our use case with RocksDB, but it would be great to see this functionality in this library.
Streaming snapshots would be great. Here's a use case: suppose the data managed by the FSM is already compacted. If you snapshot to a file, you need double the storage capacity of what you actually store, whereas if you stream the snapshot, you theoretically need no more disk than what you store.
Though I suppose you'd get many of the benefits by using an S3 snapshot store or something similar.
I'm reluctant to provide another option to opt out of periodic snapshots, since their side effect (compaction) is needed for a healthy raft cluster. It is already possible to set SnapshotInterval and SnapshotThreshold high enough that periodic snapshots don't happen. We're open to the idea of reducing disk I/O for snapshots. Vault does take some steps in this direction, but there's more work to be done. I'm going to recast this issue to focus on that side of things.
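For readers looking for the workaround described here, a rough sketch follows. It assumes the `github.com/hashicorp/raft` package; the helper names `onDemandSnapshotConfig` and `snapshotNow` are hypothetical, and the specific values are only examples of "high enough", not recommendations.

```go
package main

import (
	"math"
	"time"

	"github.com/hashicorp/raft"
)

// onDemandSnapshotConfig returns a config whose periodic snapshot triggers
// are set so high that they should effectively never fire.
// The values below are illustrative assumptions.
func onDemandSnapshotConfig() *raft.Config {
	conf := raft.DefaultConfig()
	conf.SnapshotInterval = 365 * 24 * time.Hour // how often the check runs
	conf.SnapshotThreshold = math.MaxUint64      // outstanding logs required
	return conf
}

// snapshotNow asks a running node for a snapshot and waits for the result.
func snapshotNow(r *raft.Raft) error {
	return r.Snapshot().Error()
}
```

With periodic snapshots suppressed this way, the application becomes responsible for calling something like `snapshotNow` often enough that the log still gets compacted.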
In our project QED, the FSM persists its data on disk. Under high load this is a very disk-intensive task. It would be great to be able to take snapshots on demand instead of at recurring intervals.
We'd also like to stream the snapshot directly to the nodes instead of waiting for it to be written to disk first and then sent over the network.
Are there any plans to add this functionality?