-
Notifications
You must be signed in to change notification settings - Fork 158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor snapshot API #606
Comments
👋 Thanks for opening this issue! Get help or engage by:
|
/assignme |
Can't be believe it's been almost a year since I've had time to look at this. Are the previous discussions around what is needed still valid? |
Yes, the previous discussion remains valid since no changes have been made to the snapshot part. :) GoalThe general idea is to introduce a more generic snapshot API (primarily for storage) to:
StorageTo achieve these goals, the storage API should treat the snapshot as a Upon receiving the The
During transport, a CompatibilityIn cases where the snapshot is a single file and is transported in The new snapshot API introduces several breaking changes. To address these changes, let's first finalize the design in a documentation section located at |
Looking forward to read the design :-). If I understand it correctly, the retry unit would be either a Since the Just my 2c. |
The retry unit is a Yes the
The following pseudo code shows my perception of how snapshot transmision ought to be done:
Opinions? // - A follower calls `Default::default()` to initiate a session to receive snapshot.
// - A follower calls `PartialEq::cmp()` to compare the leader manifest and its local manifest to
// ensure a snapshot is received completely.
trait Manifest : Default + PartialEq {
type Chunk;
// Leader call this method to build chunks to send to a follower.
fn chunks_to_send(&self, received: &Self) -> impl Iterator<Item=Chunk> + 'static;
// The follower add a chunk to its temporary manifest, when a chunk is received.
fn receive(&mut self, c: Chunk);
}
enum InstallSnapshotRequest<M: Manifest> {
Chunk(M::Chunk),
Manifest(M),
}
// Pseudo code for the leader to send snapshot.
fn leader_sending_snapshot() {
let snapshot = self.get_current_snapshot()
let manifest: MyManifest = &snapshot.manifest;
let chunks = manifest.chunks_to_send(&Manifest::default());
for c in chunks {
self.send_install_snapshot(InstallSnapshotRequest::Chunk(c));
}
self.send_install_snapshot(InstallSnapshotRequest::Manifest(manifest.clone()));
}
// Pseudo code for a follower to receive snapshot.
fn follower_receive_snapshot() {
let received: MyManifest = Default::default();
loop {
let req: InstallSnapshotRequest = self.receive_install_snapshot();
match req {
InstallSnapshotRequest::Chunk(c) => received.receive(c);
InstallSnapshotRequest::Manifest(m) => {
if m == received {
self.do_install_snapshot();
return;
} else {
panic!("incomplete snapshot")
}
}
}
}
} |
OK, that might work. Regarding restart of the sending upon error:
Still wondering how the retries will work. Let's see when it's done. |
A follower should always be able to handle a duplicated message, in case the ack of a successful chunk sending is lost.
Yes, the leader should do the best not to send a duplicated chunk. But in a distributed system duplicated message is inevitable. |
Sure, that was the background of the comment. We will get duplicate messages, so they have to be handled gracefully. :-) |
One issue that was brought up when I was looking at just removing |
One of the solution would be: Another way is to make // Leader call this method to build chunks to send to a follower.
fn chunks_to_send(&self, received: &Self) -> impl Iterator<Item=Chunk> + 'static; |
Another option would be to require some kind of monotonically increasing ID for each |
@drmingdrmer I've been starting on an impl of the above. Wouldn't this check be valid on all InstallSnap requests? |
@zach-schoenberger These two cases are about managing raft-logs when a |
My bad, I was really trying to ask if we should error out on a chunk if it isn’t meeting the above check. |
I do not quite understand what a If you meant to cancel a snapshot sending on the leader: If the leader decide to send a snapshot to a follower, then sending a snapshot is the only way to replicate data, e.g., when the raft-log the follower lacks of have already purged on the leader. And the leader expect the follower to be able to handle such situation. A follower should not install the snapshot for the first case: because it is older than its local snapshot(only in-snapshot logs will be purged). For the second case the follower should install the snapshot. |
@drmingdrmer So I'm making some progress with the changes, and I've tried to stick to the main ideas discussed above as much as I could. The notable differences right now are:
Do you see any issue with these? |
I have reviewed your proposal and do not see any issues with the modifications. Thank you for your input. Do you have any concerns regarding Zach's proposal? |
I suppose, you are referring to the change in the fork: https://github.com/zach-schoenberger/openraft/tree/refactor_snapshot? I quickly scanned the change and aside from some cosmetics it seems OK to me. Unfortunately we are still not as far in our project as to send/install snapshots, so I still can't 100% assess the impact of the changes, but the API seems good enough for our purposes (definitely better than before, since it allows us to send user-defined chunks instead of byte data with byte offsets).
This is a verb Ivan |
Sorry, I forgot to clear up that. Thanks @schreter. I just pushed the last bit of cleanup in terms of the actual implementation logic. The hard part was really just updating to use the new traits. If it wouldn't be too much trouble it would be awesome if someone could review the changes to make sure they align with what the project wants. After which I'll update all the other parts of the repo. That in itself is a lot so I don't really want to start it until the api looks right. |
@zach-schoenberger BWT the following PR can be closed? |
Sounds good. Just opened it here. And I closed the other branch. |
Using this method, the application provides a full snapshot to Openraft, which is then used to install and replace the state machine. It is entirely the responsibility of the application to acquire a snapshot through any means: be it in chunks, as a stream, or via shared storage solutions like S3. This method necessitates that the caller supplies a valid `Vote` to confirm the legitimacy of the leader, mirroring the requirements of other Raft protocol APIs such as `append_entries` and `vote`. - Part of databendlabs#606
Using this method, the application provides a full snapshot to Openraft, which is then used to install and replace the state machine. It is entirely the responsibility of the application to acquire a snapshot through any means: be it in chunks, as a stream, or via shared storage solutions like S3. This method necessitates that the caller supplies a valid `Vote` to confirm the legitimacy of the leader, mirroring the requirements of other Raft protocol APIs such as `append_entries` and `vote`. - Part of databendlabs#606
Using this method, the application provides a full snapshot to Openraft, which is then used to install and replace the state machine. It is entirely the responsibility of the application to acquire a snapshot through any means: be it in chunks, as a stream, or via shared storage solutions like S3. This method necessitates that the caller supplies a valid `Vote` to confirm the legitimacy of the leader, mirroring the requirements of other Raft protocol APIs such as `append_entries` and `vote`. - Part of databendlabs#606
Using this method, the application provides a full snapshot to Openraft, which is then used to install and replace the state machine. It is entirely the responsibility of the application to acquire a snapshot through any means: be it in chunks, as a stream, or via shared storage solutions like S3. This method necessitates that the caller supplies a valid `Vote` to confirm the legitimacy of the leader, mirroring the requirements of other Raft protocol APIs such as `append_entries` and `vote`. - Part of databendlabs#606
Using this method, the application provides a full snapshot to Openraft, which is then used to install and replace the state machine. It is entirely the responsibility of the application to acquire a snapshot through any means: be it in chunks, as a stream, or via shared storage solutions like S3. This method necessitates that the caller supplies a valid `Vote` to confirm the legitimacy of the leader, mirroring the requirements of other Raft protocol APIs such as `append_entries` and `vote`. - Part of #606
Handling snapshot receiving is moved out of state-machine worker task. Now it is in implemented outside the `RaftCore`. Receiving snapshot could be totally application specific and should not be part of Openraft. The in sm-worker snapshot receiving is removed. - Part of databendlabs#606
Handling snapshot receiving is moved out of state-machine worker task. Now it is in implemented outside the `RaftCore`. Receiving snapshot could be totally application specific and should not be part of Openraft. The in sm-worker snapshot receiving is removed. - Part of databendlabs#606
`Raft::begin_receiving_snapshot()` request the state machine to return a `SnapshotData` for receiving snapshot from the leader. Internally it calls `RaftStateMachine::begin_receiving_snapshot()` Handling snapshot receiving is moved out of state-machine worker task. Now it is in implemented outside the `RaftCore`. Receiving snapshot could be totally application specific and should not be part of Openraft. The in sm-worker snapshot receiving is removed. - Part of databendlabs#606
`Raft::begin_receiving_snapshot()` request the state machine to return a `SnapshotData` for receiving snapshot from the leader. Internally it calls `RaftStateMachine::begin_receiving_snapshot()` Handling snapshot receiving is moved out of state-machine worker task. Now it is in implemented outside the `RaftCore`. Receiving snapshot could be totally application specific and should not be part of Openraft. The in sm-worker snapshot receiving is removed. - Part of #606
To enable a fully customizable implementation of snapshot transmission tailored to the application's needs, this commit relocates the chunk-by-chunk transmission logic from `ReplicationCore` to a new sub mod, `crate::network::stream_snapshot`. The `stream_snapshot` mod provides a default chunk-based snapshot transmission mechanism, which can be overridden by creating a custom implementation of the `RaftNetwork::snapshot()` method. As part of this commit, `RaftNetwork::snapshot()` simply delegates to `stream_snapshot`. Developers may use `stream_snapshot` as a reference when implementing their own snapshot transmission strategy. Snapshot transmission is internally divided into two distinct phases: 1. Upon request for snapshot transmission, `ReplicationCore` initiates a new task `RaftNetwork::snapshot()` dedicated to sending a complete `Snapshot`. This task should be able to be terminated gracefully by subscribing the `cancel` future. 2. Once the snapshot has been fully transmitted by `RaftNetwork::snapshot()`, the task signals an event back to `ReplicationCore`. Subsequently, `ReplicationCore` informs `RaftCore` of the event, allowing it to acknowledge the completion of the snapshot transmission. Other changes: - `ReplicationCore` has two `RaftNetwork`s, one for log replication and heartbeat, the other for snapshot only. - `ReplicationClosed` becomes a public error for notifying the application implemented sender that a snapshot replication is canceled. - `StreamingError` is introduced as a container of errors that may occur in application defined snapshot transmission, including local IO error, network errors, errors returned by remote peer and `ReplicationClosed`. - The `SnapshotResponse` type is introduced to differentiate it from the `InstallSnapshotResponse`, which is used for chunk-based responses. --- - Part of databendlabs#606
Add `RaftNetwork::snapshot()` to send a complete snapshot and move sending snapshot by chunks out of ReplicationCore. To enable a fully customizable implementation of snapshot transmission tailored to the application's needs, this commit relocates the chunk-by-chunk transmission logic from `ReplicationCore` to a new sub mod, `crate::network::stream_snapshot`. The `stream_snapshot` mod provides a default chunk-based snapshot transmission mechanism, which can be overridden by creating a custom implementation of the `RaftNetwork::snapshot()` method. As part of this commit, `RaftNetwork::snapshot()` simply delegates to `stream_snapshot`. Developers may use `stream_snapshot` as a reference when implementing their own snapshot transmission strategy. Snapshot transmission is internally divided into two distinct phases: 1. Upon request for snapshot transmission, `ReplicationCore` initiates a new task `RaftNetwork::snapshot()` dedicated to sending a complete `Snapshot`. This task should be able to be terminated gracefully by subscribing the `cancel` future. 2. Once the snapshot has been fully transmitted by `RaftNetwork::snapshot()`, the task signals an event back to `ReplicationCore`. Subsequently, `ReplicationCore` informs `RaftCore` of the event, allowing it to acknowledge the completion of the snapshot transmission. Other changes: - `ReplicationCore` has two `RaftNetwork`s, one for log replication and heartbeat, the other for snapshot only. - `ReplicationClosed` becomes a public error for notifying the application implemented sender that a snapshot replication is canceled. - `StreamingError` is introduced as a container of errors that may occur in application defined snapshot transmission, including local IO error, network errors, errors returned by remote peer and `ReplicationClosed`. - The `SnapshotResponse` type is introduced to differentiate it from the `InstallSnapshotResponse`, which is used for chunk-based responses. --- - Part of databendlabs#606
Add `RaftNetwork::snapshot()` to send a complete snapshot and move sending snapshot by chunks out of ReplicationCore. To enable a fully customizable implementation of snapshot transmission tailored to the application's needs, this commit relocates the chunk-by-chunk transmission logic from `ReplicationCore` to a new sub mod, `crate::network::stream_snapshot`. The `stream_snapshot` mod provides a default chunk-based snapshot transmission mechanism, which can be overridden by creating a custom implementation of the `RaftNetwork::snapshot()` method. As part of this commit, `RaftNetwork::snapshot()` simply delegates to `stream_snapshot`. Developers may use `stream_snapshot` as a reference when implementing their own snapshot transmission strategy. Snapshot transmission is internally divided into two distinct phases: 1. Upon request for snapshot transmission, `ReplicationCore` initiates a new task `RaftNetwork::snapshot()` dedicated to sending a complete `Snapshot`. This task should be able to be terminated gracefully by subscribing the `cancel` future. 2. Once the snapshot has been fully transmitted by `RaftNetwork::snapshot()`, the task signals an event back to `ReplicationCore`. Subsequently, `ReplicationCore` informs `RaftCore` of the event, allowing it to acknowledge the completion of the snapshot transmission. Other changes: - `ReplicationCore` has two `RaftNetwork`s, one for log replication and heartbeat, the other for snapshot only. - `ReplicationClosed` becomes a public error for notifying the application implemented sender that a snapshot replication is canceled. - `StreamingError` is introduced as a container of errors that may occur in application defined snapshot transmission, including local IO error, network errors, errors returned by remote peer and `ReplicationClosed`. - The `SnapshotResponse` type is introduced to differentiate it from the `InstallSnapshotResponse`, which is used for chunk-based responses. --- - Part of databendlabs#606
Add `RaftNetwork::snapshot()` to send a complete snapshot and move sending snapshot by chunks out of ReplicationCore. To enable a fully customizable implementation of snapshot transmission tailored to the application's needs, this commit relocates the chunk-by-chunk transmission logic from `ReplicationCore` to a new sub mod, `crate::network::stream_snapshot`. The `stream_snapshot` mod provides a default chunk-based snapshot transmission mechanism, which can be overridden by creating a custom implementation of the `RaftNetwork::snapshot()` method. As part of this commit, `RaftNetwork::snapshot()` simply delegates to `stream_snapshot`. Developers may use `stream_snapshot` as a reference when implementing their own snapshot transmission strategy. Snapshot transmission is internally divided into two distinct phases: 1. Upon request for snapshot transmission, `ReplicationCore` initiates a new task `RaftNetwork::snapshot()` dedicated to sending a complete `Snapshot`. This task should be able to be terminated gracefully by subscribing the `cancel` future. 2. Once the snapshot has been fully transmitted by `RaftNetwork::snapshot()`, the task signals an event back to `ReplicationCore`. Subsequently, `ReplicationCore` informs `RaftCore` of the event, allowing it to acknowledge the completion of the snapshot transmission. Other changes: - `ReplicationCore` has two `RaftNetwork`s, one for log replication and heartbeat, the other for snapshot only. - `ReplicationClosed` becomes a public error for notifying the application implemented sender that a snapshot replication is canceled. - `StreamingError` is introduced as a container of errors that may occur in application defined snapshot transmission, including local IO error, network errors, errors returned by remote peer and `ReplicationClosed`. - The `SnapshotResponse` type is introduced to differentiate it from the `InstallSnapshotResponse`, which is used for chunk-based responses. --- - Part of #606
Add feature flag `general-snapshot-data`: when enabled, `SnapshotData` does not have `AsyncSeek + AsyncRead + AsyncWrite` bound. This enables application to define their own snapshot format. If this feature flag is not eabled, no changes are required for application to upgrade Openraft. On the sending end(leader that sends snapshot to follower): - Without `general-snapshot-data`: `RaftNetwork::snapshot()` provides a default implementation that invokes the chunk based API `RaftNetwork::install_snapshot()` for transmit. - With `general-snapshot-data` enabled: `RaftNetwork::snapshot()` must be implemented to provide application customized snapshot transmission. Application does not also use `RaftNetwork::install_snapshot()` for On the receiving end(follower): - `Raft::install_snapshot()` is available only when `general-snapshot-data` is disabled. Add an example `examples/raft-kv-memstore-general-snapshot-data` with `general-snapshot-data` enabled. In this example snapshot is transmitted without fragmentation, i.e., via `RaftNetwork::snapshot()`. The chunk based API `RaftNetwork::install_snapshot()` is not used. In a production scenario, a snapshot can be transmitted in arbitrary manner. - Fix: databendlabs#606
Add feature flag `general-snapshot-data`: when enabled, `SnapshotData` does not have `AsyncSeek + AsyncRead + AsyncWrite` bound. This enables application to define their own snapshot format. If this feature flag is not eabled, no changes are required for application to upgrade Openraft. On the sending end(leader that sends snapshot to follower): - Without `general-snapshot-data`: `RaftNetwork::snapshot()` provides a default implementation that invokes the chunk based API `RaftNetwork::install_snapshot()` for transmit. - With `general-snapshot-data` enabled: `RaftNetwork::snapshot()` must be implemented to provide application customized snapshot transmission. Application does not also use `RaftNetwork::install_snapshot()` for On the receiving end(follower): - `Raft::install_snapshot()` is available only when `general-snapshot-data` is disabled. Add an example `examples/raft-kv-memstore-general-snapshot-data` with `general-snapshot-data` enabled. In this example snapshot is transmitted without fragmentation, i.e., via `RaftNetwork::snapshot()`. The chunk based API `RaftNetwork::install_snapshot()` is not used. In a production scenario, a snapshot can be transmitted in arbitrary manner. - Fix: databendlabs#606
Add feature flag `general-snapshot-data`: when enabled, `SnapshotData` does not have `AsyncSeek + AsyncRead + AsyncWrite` bound. This enables application to define their own snapshot format and transmission protocol. If this feature flag is not eabled, no changes are required for application to upgrade Openraft. On the sending end(leader that sends snapshot to follower): - Without `general-snapshot-data`: `RaftNetwork::snapshot()` provides a default implementation that invokes the chunk based API `RaftNetwork::install_snapshot()` for transmit. - With `general-snapshot-data` enabled: `RaftNetwork::snapshot()` must be implemented to provide application customized snapshot transmission. Application does not also use `RaftNetwork::install_snapshot()` for On the receiving end(follower): - `Raft::install_snapshot()` is available only when `general-snapshot-data` is disabled. Add an example `examples/raft-kv-memstore-general-snapshot-data` with `general-snapshot-data` enabled. In this example snapshot is transmitted without fragmentation, i.e., via `RaftNetwork::snapshot()`. The chunk based API `RaftNetwork::install_snapshot()` is not used. In a production scenario, a snapshot can be transmitted in arbitrary manner. - Fix: databendlabs#606
Add feature flag `general-snapshot-data`: when enabled, `SnapshotData` does not have `AsyncSeek + AsyncRead + AsyncWrite` bound. This enables application to define their own snapshot format and transmission protocol. If this feature flag is not eabled, no changes are required for application to upgrade Openraft. On the sending end(leader that sends snapshot to follower): - Without `general-snapshot-data`: `RaftNetwork::snapshot()` provides a default implementation that invokes the chunk based API `RaftNetwork::install_snapshot()` for transmit. - With `general-snapshot-data` enabled: `RaftNetwork::snapshot()` must be implemented to provide application customized snapshot transmission. Application does not also use `RaftNetwork::install_snapshot()` for On the receiving end(follower): - `Raft::install_snapshot()` is available only when `general-snapshot-data` is disabled. Add an example `examples/raft-kv-memstore-general-snapshot-data` with `general-snapshot-data` enabled. In this example snapshot is transmitted without fragmentation, i.e., via `RaftNetwork::snapshot()`. The chunk based API `RaftNetwork::install_snapshot()` is not used. In a production scenario, a snapshot can be transmitted in arbitrary manner. - Fix: databendlabs#606 - Fix: databendlabs#209
Add feature flag `general-snapshot-data`: when enabled, `SnapshotData` does not have `AsyncSeek + AsyncRead + AsyncWrite` bound. This enables application to define their own snapshot format and transmission protocol. If this feature flag is not eabled, no changes are required for application to upgrade Openraft. On the sending end(leader that sends snapshot to follower): - Without `general-snapshot-data`: `RaftNetwork::snapshot()` provides a default implementation that invokes the chunk based API `RaftNetwork::install_snapshot()` for transmit. - With `general-snapshot-data` enabled: `RaftNetwork::snapshot()` must be implemented to provide application customized snapshot transmission. Application does not need to implement `RaftNetwork::install_snapshot()`. On the receiving end(follower): - `Raft::install_snapshot()` is available only when `general-snapshot-data` is disabled. Add an example `examples/raft-kv-memstore-general-snapshot-data` with `general-snapshot-data` enabled. In this example snapshot is transmitted without fragmentation, i.e., via `RaftNetwork::snapshot()`. The chunk based API `RaftNetwork::install_snapshot()` is not used. In a production scenario, a snapshot can be transmitted in arbitrary manner. - Fix: databendlabs#606 - Fix: databendlabs#209
Add feature flag `generic-snapshot-data`: when enabled, `SnapshotData` does not have `AsyncSeek + AsyncRead + AsyncWrite` bound. This enables application to define their own snapshot format and transmission protocol. If this feature flag is not eabled, no changes are required for application to upgrade Openraft. On the sending end(leader that sends snapshot to follower): - Without `generic-snapshot-data`: `RaftNetwork::snapshot()` provides a default implementation that invokes the chunk based API `RaftNetwork::install_snapshot()` for transmit. - With `generic-snapshot-data` enabled: `RaftNetwork::snapshot()` must be implemented to provide application customized snapshot transmission. Application does not need to implement `RaftNetwork::install_snapshot()`. On the receiving end(follower): - `Raft::install_snapshot()` is available only when `generic-snapshot-data` is disabled. Add an example `examples/raft-kv-memstore-generic-snapshot-data` with `generic-snapshot-data` enabled. In this example snapshot is transmitted without fragmentation, i.e., via `RaftNetwork::snapshot()`. The chunk based API `RaftNetwork::install_snapshot()` is not used. In a production scenario, a snapshot can be transmitted in arbitrary manner. - Fix: databendlabs#606 - Fix: databendlabs#209
Add feature flag `generic-snapshot-data`: when enabled, `SnapshotData` does not have `AsyncSeek + AsyncRead + AsyncWrite` bound. This enables application to define their own snapshot format and transmission protocol. If this feature flag is not eabled, no changes are required for application to upgrade Openraft. On the sending end(leader that sends snapshot to follower): - Without `generic-snapshot-data`: `RaftNetwork::snapshot()` provides a default implementation that invokes the chunk based API `RaftNetwork::install_snapshot()` for transmit. - With `generic-snapshot-data` enabled: `RaftNetwork::snapshot()` must be implemented to provide application customized snapshot transmission. Application does not need to implement `RaftNetwork::install_snapshot()`. On the receiving end(follower): - `Raft::install_snapshot()` is available only when `generic-snapshot-data` is disabled. Add an example `examples/raft-kv-memstore-generic-snapshot-data` with `generic-snapshot-data` enabled. In this example snapshot is transmitted without fragmentation, i.e., via `RaftNetwork::snapshot()`. The chunk based API `RaftNetwork::install_snapshot()` is not used. In a production scenario, a snapshot can be transmitted in arbitrary manner. - Fix: #606 - Fix: #209
Introduce a new set of snapshot API based on the discussion in:
The text was updated successfully, but these errors were encountered: