Improve snapshot building #823
I agree with you. It seems like the snapshot issues we had stem from worrying about the timing of snapshots instead of aligning them with our internal state. In a nutshell, our state uses a snapshot persistence pattern: it is ephemeral, but a snapshot is taken at regular intervals. To load the state, one must restore the snapshot and then replay the inputs (the Raft normal entries) that follow, up to the desired point. The problem is that our state snapshots are time-based (for example, the state may be backed up every 10 minutes) while openraft's policy is based on entries. Our internal state snapshots are more frequent than the Raft snapshots need to be. Also, there is asynchronous execution between the openraft state machine and the final state, which complicates the alignment. To work around these differences, when openraft's Perhaps the easiest way to align openraft snapshots with the state is simply to add a snapshot policy that takes a closure like
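The snapshot-persistence pattern described above can be sketched in a few lines: restore the last snapshot, then replay the log entries (inputs) that follow it. This is a minimal, self-contained illustration with hypothetical types, not openraft's actual state machine API.

```rust
// Hypothetical snapshot type: the materialized state plus the index of
// the last log entry it covers.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    last_applied: u64,
    value: i64,
}

/// Rebuild state by restoring a snapshot and replaying the entries
/// (index, delta) that come after it, up to `upto`.
fn restore(snapshot: &Snapshot, log: &[(u64, i64)], upto: u64) -> i64 {
    let mut value = snapshot.value;
    for (index, delta) in log {
        if *index > snapshot.last_applied && *index <= upto {
            value += *delta;
        }
    }
    value
}

fn main() {
    let snap = Snapshot { last_applied: 2, value: 10 };
    let log = [(1, 3), (2, 7), (3, 5), (4, -2)];
    // Entries 1 and 2 are already folded into the snapshot; replaying
    // entries 3 and 4 yields 10 + 5 - 2 = 13.
    println!("{}", restore(&snap, &log, 4));
}
```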
+1, we have a very similar snapshot pattern: snapshots are triggered not by the amount of log, but by time plus the amount of dirty data.
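A trigger condition like the one described here (time plus dirty data) could be expressed as a small predicate; the function and its thresholds below are hypothetical, just to make the combined condition concrete.

```rust
use std::time::Duration;

/// Hypothetical trigger condition: snapshot when either enough wall-clock
/// time has elapsed since the last snapshot, or enough dirty data has
/// accumulated, whichever comes first.
fn should_snapshot(
    elapsed: Duration,
    dirty_bytes: u64,
    max_age: Duration,
    max_dirty: u64,
) -> bool {
    elapsed >= max_age || dirty_bytes >= max_dirty
}

fn main() {
    let max_age = Duration::from_secs(600); // e.g. every 10 minutes
    let max_dirty = 64 * 1024 * 1024; // or once 64 MiB is dirty

    // Fires on elapsed time alone:
    println!("{}", should_snapshot(Duration::from_secs(700), 0, max_age, max_dirty));
    // Fires on dirty data alone:
    println!("{}", should_snapshot(Duration::from_secs(10), 128 << 20, max_age, max_dirty));
    // Fires on neither:
    println!("{}", should_snapshot(Duration::from_secs(10), 0, max_age, max_dirty));
}
```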
I notice the following:

```rust
/// Trigger to build a snapshot at once and return at once.
///
/// Returns error when RaftCore has Fatal error, e.g. shut down or having storage error.
pub async fn trigger_snapshot(&self) -> Result<(), Fatal<C::NodeId>> {
    self.send_external_command(ExternalCommand::Snapshot, "trigger_snapshot").await
}
```

This seems good enough for me. Basically, turn off the scheduled snapshot and trigger one at the opportune time. @drmingdrmer Does this make sense or is it a bad idea?
Seems good. I would like to give the application more control so it can decouple its logic from Openraft. The only limitation is that, from outside, it cannot accurately determine when the snapshot build actually starts. The command If this limitation doesn't impact your situation, then this solution will work well.
I believe that's ok. The openraft state machine holds the inputs to my program's executor, which asynchronously writes the final state, including the internal clock mentioned above. Basically, there's inherent complexity in the snapshot logic. Each snapshot contains a state backup created in advance by the executor as a precondition. This approach would allow me to trigger Can I reasonably do this with the current code? Or should there be a
To be considered legal, a snapshot must meet the following criteria:
It is permissible to build a snapshot from a former state machine. The build-snapshot command does not have to be executed from the current state of the machine. In my opinion, this seems suitable for your scenario. No need for
Yes, that's what I meant. Otherwise, I could perhaps configure the existing policy with something like
Adding another variant would be fine. Changing the default value might break user applications :( For now, you can just set the policy to a very large value.
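The workaround suggested here might look like the fragment below: a sketch that assumes openraft's `Config` and `SnapshotPolicy::LogsSinceLast` as discussed in this thread, and is not a complete program.

```rust
// Sketch: effectively disable automatic snapshots by setting the
// entries-based policy to an enormous threshold, then build snapshots
// only on the application's own schedule.
let config = Config {
    snapshot_policy: SnapshotPolicy::LogsSinceLast(u64::MAX),
    ..Default::default()
};

// Later, e.g. from a 10-minute timer task:
// raft.trigger_snapshot().await?;
```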
In the above scenario, where we want to manually trigger snapshots because we prefer a time-based approach instead of snapshotting based on the number of entries, if we set the policy to a very large value, what are the implications for log purging? My understanding is that when snapshots are built or installed, `calc_purge_upto` decides what can be purged:

```rust
pub(crate) fn calc_purge_upto(&self) -> Option<LogId<C::NodeId>> {
    let st = &self.state;
    let max_keep = self.config.max_in_snapshot_log_to_keep;
    let batch_size = self.config.purge_batch_size;
    let purge_end = self.state.snapshot_meta.last_log_id.next_index().saturating_sub(max_keep);

    tracing::debug!(
        snapshot_last_log_id = debug(self.state.snapshot_meta.last_log_id),
        max_keep,
        "try purge: (-oo, {})",
        purge_end
    );

    if st.last_purged_log_id().next_index() + batch_size > purge_end {
        tracing::debug!(
            snapshot_last_log_id = debug(self.state.snapshot_meta.last_log_id),
            max_keep,
            last_purged_log_id = display(st.last_purged_log_id().summary()),
            batch_size,
            purge_end,
            "no need to purge",
        );
        return None;
    }

    let log_id = self.state.log_ids.get(purge_end - 1);
    // ...
}
```

So I think if we set
@kevlu93 I'll add SnapshotPolicy::Never and Manual.
Logs won't be purged if they are not included in a snapshot.
No. If your application purges logs but Isn't
Well, currently snapshotting and purging with
@kevlu93 And I do not know what a snapshot buffer is. Did you mean to keep the last two snapshots?
Yup, exactly: we want to keep the last two snapshots, so being able to specify the log index manually would be helpful if we switch to a time-based snapshot trigger.
In my opinion, giving you the ability to manually initiate snapshots and log purging would suffice.
Yup, I think this should work well!
We appreciate your offer to contribute some long-running snapshot cases to our test suite. :D
The conditions for testing snapshot building are defined using `BlockOperation`, which simulates different delays. You can add more entries to `BlockOperation` to emulate other conditions in your case.
https://github.com/datafuselabs/openraft/blob/fb5bb36b025aa0bd27014ea79315788acbd5bd80/memstore/src/lib.rs#L238-L262
Right.
Yes, allowing an application to abort snapshot building can help. However, it introduces more interaction between Openraft and the application, and I believe what you require is a more adaptable snapshot policy configuration.
At present, Openraft offers a very basic policy that merely checks whether the last snapshot is lagging behind the last applied log index. What criteria does your application use to build a snapshot? It would be advantageous to use a user-defined `Fn() -> bool` function to inform Openraft when to create a snapshot.
https://github.com/datafuselabs/openraft/blob/fb5bb36b025aa0bd27014ea79315788acbd5bd80/openraft/src/config/config.rs#L154
https://github.com/datafuselabs/openraft/blob/fb5bb36b025aa0bd27014ea79315788acbd5bd80/openraft/src/config/config.rs#L25-L40
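The shape of such a user-defined policy might look like the sketch below. The struct, field, and closure parameters are all hypothetical; the idea is only that Openraft would evaluate the closure and build a snapshot whenever it returns true.

```rust
/// Hypothetical custom snapshot policy: the application supplies a
/// predicate that Openraft would consult to decide when to snapshot.
struct CustomPolicy<F: Fn(u64, u64) -> bool> {
    should_snapshot: F,
}

fn main() {
    // Example predicate: snapshot once 5000 entries have been applied
    // since the last snapshot. Parameters (both hypothetical):
    // last_applied index, and the snapshot's last covered index.
    let policy = CustomPolicy {
        should_snapshot: |last_applied, snapshot_last| last_applied - snapshot_last >= 5000,
    };

    println!("{}", (policy.should_snapshot)(12_000, 6_000));  // far behind: snapshot
    println!("{}", (policy.should_snapshot)(12_000, 10_000)); // close enough: don't
}
```

A closure-based policy like this would also subsume the time-plus-dirty-data triggers discussed earlier in the thread, since the application can close over whatever clock or counters it likes.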
Originally posted by @fredfortier in #596 (comment)