Refactor snapshot API #606

Closed
Tracked by #857
drmingdrmer opened this issue Nov 22, 2022 · 22 comments · Fixed by #1016
Assignees
Labels
A-replication (Area: replication protocol), A-snapshot (Area: snapshot creation/transport/management), A-store (Area: storage)

Comments

@drmingdrmer
Member

Introduce a new set of snapshot APIs based on the discussion in:

@github-actions

👋 Thanks for opening this issue!

Get help or engage by:

  • /help : to print help messages.
  • /assignme : to assign this issue to you.

@zach-schoenberger
Contributor

/assignme

@drmingdrmer added the A-store (Area: storage), A-replication (Area: replication protocol), and A-snapshot (Area: snapshot creation/transport/management) labels on Jan 4, 2023
@zach-schoenberger
Contributor

Can't believe it's been almost a year since I've had time to look at this. Are the previous discussions around what is needed still valid?

@drmingdrmer
Member Author

Yes, the previous discussion remains valid since no changes have been made to the snapshot part. :)

Goal

The general idea is to introduce a more generic snapshot API (primarily for storage) to:

  • Accurately and easily describe single-file and multi-file based snapshots using the new API.
  • Support incremental snapshot transport.

Storage

To achieve these goals, the storage API should treat the snapshot as a
manifest file and several data files. When sending a snapshot, the leader
sends data files sequentially and sends the manifest file last.

Upon receiving the manifest file, a follower checks the snapshot's integrity,
i.e., verifies that every data file mentioned in the manifest has been
received, and then installs the snapshot.

The manifest and data should be entirely user-defined:

  • A data file can be a regular on-disk file or a structured data type.
  • A manifest file can be a list of data files.

During transport, a data file may be divided into smaller chunks, similar to the current implementation.

Compatibility

In cases where the snapshot is a single file and is transported in chunks, the
manifest is simply the snapshot file's size in bytes.
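
For illustration, a minimal sketch of what such a user-defined manifest could look like; ExampleManifest and FileMeta are hypothetical names, not part of any Openraft API:

// A hypothetical user-defined manifest, sketching both the single-file
// and multi-file cases described above.
enum ExampleManifest {
    // Single-file snapshot: the manifest is just the file size in bytes,
    // so received chunks can be checked against it.
    SingleFile { size: u64 },
    // Multi-file snapshot: a list of data files, so the follower can
    // verify that every file mentioned here has been received.
    MultiFile { files: Vec<FileMeta> },
}

struct FileMeta {
    // Path of the data file, relative to the snapshot directory.
    path: String,
    // Size in bytes, used to detect missing or truncated files.
    size: u64,
}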


The new snapshot API introduces several breaking changes. To address these changes, let's first finalize the design in a documentation section located at openraft/src/docs/data/.

@schreter
Collaborator

schreter commented Jun 5, 2023

To address these changes, let's first finalize the design in a documentation section located at openraft/src/docs/data/.

Looking forward to reading the design :-).

If I understand it correctly, the retry unit would be either a data file or a chunk? I.e., when sending the snapshot breaks and restarts, it would resume sending from the last confirmed data file or confirmed chunk?

Since the manifest is opaque, how do you know how many data files, with how many chunks, to transfer? Also, since it's opaque, do we need the differentiation between data and manifest files? We just need to tell the follower "that's it, check & install it", right? Alternatively, the last data file might carry a flag marking it as the last one, and then some implementations might interpret it as the manifest?

Just my 2c.

@drmingdrmer
Member Author

@schreter

The retry unit is a chunk.

Yes, the manifest is opaque. Therefore there should be a trait Manifest defining its behavior: how to build a series of chunks for the leader to send, and how to store information about already-received chunks on the follower.

Chunks (not data files) and the manifest should be differentiated during transport.

The following pseudo code shows my perception of how snapshot transmission ought to be done:

  • The leader builds a series of chunks by calling Manifest::chunks_to_send().
  • The follower initializes an empty Manifest, fills it with the chunks it receives, and finally compares its local manifest with the one the leader sends, to ensure that the snapshot has been transmitted completely:

Opinions?

// - A follower calls `Default::default()` to initiate a session to receive a snapshot.
// - A follower calls `PartialEq::eq()` to compare the leader's manifest with its local manifest to
//   ensure a snapshot is received completely.
trait Manifest: Default + PartialEq {
    type Chunk;

    // The leader calls this method to build the chunks to send to a follower.
    fn chunks_to_send(&self, received: &Self) -> impl Iterator<Item = Self::Chunk> + 'static;

    // The follower adds a chunk to its temporary manifest when the chunk is received.
    fn receive(&mut self, c: Self::Chunk);
}

enum InstallSnapshotRequest<M: Manifest> {
    Chunk(M::Chunk),
    Manifest(M),
}

// Pseudo code for the leader to send a snapshot.
fn leader_sending_snapshot() {
    let snapshot = self.get_current_snapshot();
    let manifest: &MyManifest = &snapshot.manifest;
    let chunks = manifest.chunks_to_send(&MyManifest::default());

    for c in chunks {
        self.send_install_snapshot(InstallSnapshotRequest::Chunk(c));
    }

    self.send_install_snapshot(InstallSnapshotRequest::Manifest(manifest.clone()));
}

// Pseudo code for a follower to receive a snapshot.
fn follower_receive_snapshot() {
    let mut received = MyManifest::default();
    loop {
        let req: InstallSnapshotRequest<MyManifest> = self.receive_install_snapshot();

        match req {
            InstallSnapshotRequest::Chunk(c) => received.receive(c),
            InstallSnapshotRequest::Manifest(m) => {
                if m == received {
                    self.do_install_snapshot();
                    return;
                } else {
                    panic!("incomplete snapshot")
                }
            }
        }
    }
}
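
For the single-file case discussed under "Compatibility" above, a minimal sketch of how this draft trait could be implemented might look like the following; the type names and chunk size are hypothetical, not part of any existing API:

// Hypothetical implementation of the draft `Manifest` trait for a single-file
// snapshot, where a chunk is just a byte range of the file.
const CHUNK_SIZE: u64 = 1024 * 1024;

#[derive(Clone, PartialEq)]
struct ByteRange {
    offset: u64,
    len: u64,
}

#[derive(Default, PartialEq)]
struct SingleFileManifest {
    // On the leader: the total file size.
    // On the follower: the high-water mark of received bytes
    // (this simplified sketch assumes chunks arrive in order),
    // so equality with the leader's manifest means the file is complete.
    size: u64,
}

impl Manifest for SingleFileManifest {
    type Chunk = ByteRange;

    // Yield only the byte ranges the follower has not acknowledged yet.
    fn chunks_to_send(&self, received: &Self) -> impl Iterator<Item = Self::Chunk> + 'static {
        let total = self.size;
        (received.size..total)
            .step_by(CHUNK_SIZE as usize)
            .map(move |offset| ByteRange {
                offset,
                len: CHUNK_SIZE.min(total - offset),
            })
    }

    // Advance the high-water mark as each chunk arrives.
    fn receive(&mut self, c: Self::Chunk) {
        self.size = self.size.max(c.offset + c.len);
    }
}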

@schreter
Collaborator

schreter commented Jun 6, 2023

OK, that might work.

Regarding restart of the sending upon error:

  • I assume that the follower should gracefully handle retry of chunk sending (i.e., if a chunk is received twice due to a network error)
  • I assume the follower responds to each chunk it received with some form of ack, so the leader would not re-send the entire snapshot upon retry

Still wondering how the retries will work. Let's see when it's done.

@drmingdrmer
Member Author

OK, that might work.

Regarding restart of the sending upon error:

  • I assume that the follower should gracefully handle retry of chunk sending (i.e., if a chunk is received twice due to a network error)

A follower should always be able to handle a duplicated message, in case the ack of a successful chunk sending is lost.

  • I assume the follower responds to each chunk it received with some form of ack, so the leader would not re-send the entire snapshot upon retry

Yes, the leader should do its best not to send duplicated chunks. But in a distributed system, duplicated messages are inevitable.

@schreter
Collaborator

schreter commented Jun 6, 2023

in a distributed system, duplicated messages are inevitable

Sure, that was the background of the comment. We will get duplicate messages, so they have to be handled gracefully. :-)

@zach-schoenberger
Contributor

One issue that was brought up when I was looking at just removing AsyncSeek from the snapshot trait bounds was that it is used to resend the data. I think the above pseudo code runs into the same problem. Other than that, I like the break away from a set of bytes for the chunk to something user-defined.

@drmingdrmer
Member Author

@zach-schoenberger

One of the solutions would be:
The leader maintains a temporary Manifest that tracks the chunks already received and acknowledged by the follower.
When resending, the leader passes this received manifest to chunks_to_send(); chunks_to_send() calculates the difference between self and received and then builds the series of chunks to send.

Another way is to make Chunk implement Clone, so the leader just sends a copy of a chunk to a follower.

    // The leader calls this method to build the chunks to send to a follower.
    fn chunks_to_send(&self, received: &Self) -> impl Iterator<Item = Self::Chunk> + 'static;
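
A rough sketch of what such a resend pass could look like, using the draft trait above (the helper function and its names are hypothetical):

// Hypothetical helper: resend only the chunks the follower has not acknowledged.
// `received` is the leader-side copy of the follower's acknowledged manifest.
fn resend_missing<M: Manifest>(leader_manifest: &M, received: &M, send: impl Fn(M::Chunk)) {
    // chunks_to_send() computes the difference between the full manifest
    // and what the follower has already acknowledged.
    for chunk in leader_manifest.chunks_to_send(received) {
        send(chunk);
    }
}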

@schreter
Collaborator

schreter commented Jun 7, 2023

Another option would be to require some kind of monotonically increasing ID for each Chunk and tell chunks_to_send the last successfully received ID (for our project, this would fit quite well, since we have sparsely-indexed segments 🙂).
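
For illustration, an ID-based variant could look like this sketch (hypothetical names, not any existing API):

// Each chunk carries a monotonically increasing id; the leader resends only
// the chunks after the last id the follower has acknowledged.
struct IdChunk {
    id: u64,
    data: Vec<u8>,
}

fn chunks_after(all: Vec<IdChunk>, last_received_id: Option<u64>) -> impl Iterator<Item = IdChunk> {
    all.into_iter()
        .filter(move |c| last_received_id.map_or(true, |last| c.id > last))
}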

@zach-schoenberger
Contributor

@drmingdrmer I've been starting on an impl of the above. Wouldn't this check be valid on all InstallSnap requests?

https://github.com/datafuselabs/openraft/blob/add2f7b1cd6754ca566006d67e81bc9f59258403/openraft/src/engine/handler/following_handler/mod.rs#L295-L311

@drmingdrmer
Member Author

@zach-schoenberger
Yes, these two cases are both valid states and will be handled when installing the snapshot on a follower.

These two cases are about managing raft-logs when a SnapshotMeta is received by a follower. The handling of these two cases has nothing to do with the data type that represents the snapshot data.

@zach-schoenberger
Contributor

My bad, I was really trying to ask if we should error out on a chunk if it doesn't meet the above check.

@drmingdrmer
Member Author

My bad, I was really trying to ask if we should error out on a chunk if it doesn't meet the above check.

I do not quite understand what "error out" means.

If you meant canceling a snapshot sending on the leader: if the leader decides to send a snapshot to a follower, then sending a snapshot is the only way to replicate data, e.g., when the raft-logs the follower lacks have already been purged on the leader. The leader expects the follower to be able to handle such a situation.

A follower should not install the snapshot in the first case, because it is older than its local snapshot (only in-snapshot logs will be purged). In the second case, the follower should install the snapshot.

@zach-schoenberger
Contributor

@drmingdrmer So I'm making some progress with the changes, and I've tried to stick to the main ideas discussed above as much as I could. The notable differences right now are:

  • Sending the manifest first (see the sketch below)
    • allows for handling out-of-order chunks more easily (potentially)
    • doesn't require a "finalize" message, but that might be nice API-wise (not in the code at this time)
  • Manifest is generated from the snapshot, not from default
  • Removing logic that requires ordering of the chunks: since we define a manifest, this should be handled gracefully and errors should not be needed in this case

Do you see any issue with these?
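
For reference, a minimal sketch (hypothetical, not the final API) of what the manifest-first ordering could look like on the sending side, reusing the draft types from the pseudo code above:

// Sketch: the leader sends the manifest first, then the chunks it describes.
// With the manifest already known to the follower, out-of-order chunks can be
// slotted into place and no separate "finalize" message is required.
fn leader_send_manifest_first<M: Manifest + Clone>(
    manifest: &M,
    send: impl Fn(InstallSnapshotRequest<M>),
) {
    send(InstallSnapshotRequest::Manifest(manifest.clone()));

    for chunk in manifest.chunks_to_send(&M::default()) {
        send(InstallSnapshotRequest::Chunk(chunk));
    }
}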

@drmingdrmer
Member Author

@zach-schoenberger,

I have reviewed your proposal and do not see any issues with the modifications. Thank you for your input.

@schreter,

Do you have any concerns regarding Zach's proposal?

@schreter
Collaborator

@drmingdrmer

Do you have any concerns regarding Zach's proposal?

I suppose you are referring to the change in the fork: https://github.com/zach-schoenberger/openraft/tree/refactor_snapshot?

I quickly scanned the change and aside from some cosmetics it seems OK to me.

Unfortunately, our project is not yet at the point of sending/installing snapshots, so I still can't fully assess the impact of the changes, but the API seems good enough for our purposes (definitely better than before, since it allows us to send user-defined chunks instead of byte data with byte offsets).

I do not quite understand what "error out" means.

It's a verb: to "error out" == to bail out == to return with an error :-).
I'm also not a native speaker. People sometimes use funny constructs, but "to error out" is actually quite frequently used.

Ivan

@zach-schoenberger
Contributor

Sorry, I forgot to clear that up. Thanks @schreter. I just pushed the last bit of cleanup in terms of the actual implementation logic. The hard part was really just updating everything to use the new traits. If it wouldn't be too much trouble, it would be awesome if someone could review the changes to make sure they align with what the project wants. After that I'll update all the other parts of the repo. That in itself is a lot, so I don't really want to start it until the API looks right.

@drmingdrmer
Member Author

it would be awesome if someone could review the changes to make sure they align with what the project wants.

@zach-schoenberger
You can just open a draft pull request so that anyone interested can participate in the review process. :)

BTW, can the following PR be closed?

@zach-schoenberger
Contributor

Sounds good. Just opened it here. And I closed the other branch.

drmingdrmer added a commit that referenced this issue Feb 11, 2024
Using this method, the application provides a full snapshot to
Openraft, which is then used to install and replace the state machine.
It is entirely the responsibility of the application to acquire a
snapshot through any means: be it in chunks, as a stream, or via shared
storage solutions like S3.

This method necessitates that the caller supplies a valid `Vote` to
confirm the legitimacy of the leader, mirroring the requirements of
other Raft protocol APIs such as `append_entries` and `vote`.

- Part of #606
drmingdrmer added a commit that referenced this issue Feb 16, 2024
`Raft::begin_receiving_snapshot()` requests the state machine to return a
`SnapshotData` for receiving a snapshot from the leader.
Internally it calls `RaftStateMachine::begin_receiving_snapshot()`.

Handling snapshot receiving is moved out of the state-machine worker task.
Now it is implemented outside the `RaftCore`.
Receiving a snapshot can be totally application-specific and should not
be part of Openraft.

The in-sm-worker snapshot receiving is removed.

- Part of #606
drmingdrmer added a commit that referenced this issue Feb 19, 2024
Add `RaftNetwork::snapshot()` to send a complete snapshot and move
sending snapshot by chunks out of ReplicationCore.

To enable a fully customizable implementation of snapshot transmission
tailored to the application's needs, this commit relocates the
chunk-by-chunk transmission logic from `ReplicationCore` to a new
sub mod, `crate::network::stream_snapshot`.

The `stream_snapshot` mod provides a default chunk-based snapshot
transmission mechanism, which can be overridden by creating a custom
implementation of the `RaftNetwork::snapshot()` method. As part of this
commit, `RaftNetwork::snapshot()` simply delegates to `stream_snapshot`.
Developers may use `stream_snapshot` as a reference when implementing
their own snapshot transmission strategy.

Snapshot transmission is internally divided into two distinct phases:

1. Upon request for snapshot transmission, `ReplicationCore` initiates a
   new task running `RaftNetwork::snapshot()`, dedicated to sending a complete
   `Snapshot`. This task should be able to be terminated gracefully by
   subscribing to the `cancel` future.

2. Once the snapshot has been fully transmitted by
   `RaftNetwork::snapshot()`, the task signals an event back to
   `ReplicationCore`. Subsequently, `ReplicationCore` informs `RaftCore`
   of the event, allowing it to acknowledge the completion of the
   snapshot transmission.

Other changes:

- `ReplicationCore` has two `RaftNetwork`s, one for log replication and
  heartbeat, the other for snapshot only.

- `ReplicationClosed` becomes a public error for notifying the
  application-implemented sender that a snapshot replication is
  canceled.

- `StreamingError` is introduced as a container of errors that may occur
  in application-defined snapshot transmission, including local IO
  errors, network errors, errors returned by the remote peer, and `ReplicationClosed`.

- The `SnapshotResponse` type is introduced to differentiate it from the
  `InstallSnapshotResponse`, which is used for chunk-based responses.

---

- Part of #606
drmingdrmer added a commit that referenced this issue Feb 19, 2024
Add feature flag `generic-snapshot-data`: when enabled, `SnapshotData`
does not have the `AsyncSeek + AsyncRead + AsyncWrite` bound.
This enables applications to define their own snapshot format and
transmission protocol.

If this feature flag is not enabled, no changes are required for
applications to upgrade Openraft.

On the sending end (the leader that sends a snapshot to a follower):

- Without `generic-snapshot-data`: `RaftNetwork::snapshot()`
  provides a default implementation that invokes the chunk-based API
  `RaftNetwork::install_snapshot()` for transmission.

- With `generic-snapshot-data` enabled: `RaftNetwork::snapshot()` must be
  implemented to provide application-customized snapshot transmission.
  The application does not need to implement `RaftNetwork::install_snapshot()`.

On the receiving end (follower):

- `Raft::install_snapshot()` is available only when
  `generic-snapshot-data` is disabled.

Add an example, `examples/raft-kv-memstore-generic-snapshot-data`, with
`generic-snapshot-data` enabled.
In this example the snapshot is transmitted without fragmentation, i.e., via
`RaftNetwork::snapshot()`. The chunk-based API
`RaftNetwork::install_snapshot()` is not used.
In a production scenario, a snapshot can be transmitted in an arbitrary
manner.

- Fix: #606
- Fix: #209