
Thinking about disaster recovery #534

Open
bingkunyangvungle opened this issue Apr 8, 2024 · 5 comments

Comments

@bingkunyangvungle

What is currently missing?

Since all the data needed is already in S3 (both the log data and the metadata), it would be a great help if we could recover from a cluster shutdown by launching a new cluster that uses the same topic folder (prefix_randomstring) in S3. The new cluster could then consume from the oldest offset in the old topic folder. This way we could fully achieve disaster recovery.
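(For reference, the "topic folder" here is the per-topic prefix the plugin writes to, which embeds the broker-assigned topic ID. Roughly, and only as an illustrative sketch since the exact key format is defined by the plugin's ObjectKeyFactory:)

```
<prefix><topic>-<topicId>/<partition>/<startOffset>-<segmentId>.<suffix>
```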

How could this be improved?

Some suggestions:
1. Allow a topic in the new cluster to be created with a designated folder (with the random string) in S3;
2. Store new messages for the topic in the old folder.

Is this a feature you would work on yourself?

I haven't taken a look at the actual code yet, but if someone can point me to the relevant part of the code, I'd be happy to start looking.

@ivanyu
Contributor

ivanyu commented Apr 9, 2024

Hi @bingkunyangvungle.
Such restoration is certainly possible. Do you see particular changes that would need to be made to this RSM plugin?

@ivanyu
Contributor

ivanyu commented Apr 9, 2024

Kafka doesn't really support this out of the box, and quite a bit of work, some of it hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. To start with, topic IDs must match between the remote storage and the cluster, but the user doesn't control the IDs of the topics being created.
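For illustration, the topic ID is baked into how the broker identifies remote segments through the remote storage API, so metadata written by the old cluster references a topic ID the new cluster cannot reproduce. A minimal sketch using the public Kafka classes (illustrative values only):

```java
import org.apache.kafka.common.TopicIdPartition;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.Uuid;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentId;

public class TopicIdMismatchSketch {
    public static void main(String[] args) {
        // The broker assigns the topic ID at creation time; the user cannot choose it.
        Uuid topicIdAssignedByNewCluster = Uuid.randomUuid();

        // Every remote segment is identified by (topicId, partition) plus a segment UUID,
        // so segment metadata produced by the old cluster points at the *old* topic ID.
        TopicIdPartition tip = new TopicIdPartition(
                topicIdAssignedByNewCluster, new TopicPartition("my-topic", 0));
        RemoteLogSegmentId segmentId = new RemoteLogSegmentId(tip, Uuid.randomUuid());

        // Unless the IDs match, the fresh cluster will not associate the old remote
        // segments with the newly created topic.
        System.out.println(segmentId);
    }
}
```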

@bingkunyangvungle
Author

bingkunyangvungle commented Apr 9, 2024

This is the part I think might be tricky. One proposal I can think of is to have an internal mapping between the Kafka-created topic ID <------> the plugin-managed topic ID. The plugin would then manage the remote storage folder using the plugin-managed topic ID.
The user could also provide/configure the plugin-managed topic ID when provisioning the topic. Of course, the plugin would also need to check whether the remote storage folder exists or not.

Just tossing some ideas out here to share.
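Very roughly, the idea would be for the plugin to translate the broker-assigned ID into a stable, user-provided one before building object keys. A hypothetical sketch only; the class and field names here are made up for illustration:

```java
import java.util.Map;
import org.apache.kafka.common.Uuid;

// Hypothetical sketch of the proposed mapping, not actual plugin code.
class RemoteTopicIdResolver {
    // e.g. loaded from plugin configuration: Kafka-assigned ID -> plugin-managed ID
    private final Map<Uuid, Uuid> kafkaToPluginManagedId;

    RemoteTopicIdResolver(Map<Uuid, Uuid> kafkaToPluginManagedId) {
        this.kafkaToPluginManagedId = kafkaToPluginManagedId;
    }

    // Used wherever the plugin needs a topic ID to build the remote folder name.
    Uuid resolve(Uuid kafkaTopicId) {
        // Fall back to the broker-assigned ID when no mapping is configured,
        // preserving today's behaviour for topics without pre-existing remote data.
        return kafkaToPluginManagedId.getOrDefault(kafkaTopicId, kafkaTopicId);
    }
}
```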

@funky-eyes
Contributor

> Kafka doesn't really support this out of the box, and quite a bit of work, some of it hacky, needs to be done on a fresh cluster to make it correctly pick up an "attached" remote storage. To start with, topic IDs must match between the remote storage and the cluster, but the user doesn't control the IDs of the topics being created.

Perhaps we could ignore the topic ID in ObjectKeyFactory#mainPath, adding some configuration to achieve this?
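Something along these lines, perhaps; a sketch only, not the plugin's actual ObjectKeyFactory code, and the method signature here is simplified for illustration:

```java
import org.apache.kafka.common.Uuid;

// Sketch of a config switch that drops the broker-assigned topic ID from the
// object key path, so a re-created topic can keep using the old prefix.
class ObjectKeyPathSketch {
    static String mainPath(String topic, Uuid topicId, int partition, boolean includeTopicId) {
        String topicDir = includeTopicId
                ? topic + "-" + topicId   // current behaviour: path depends on the topic ID
                : topic;                  // proposed: path is stable across topic re-creation
        return topicDir + "/" + partition;
    }
}
```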

@ivanyu
Contributor

ivanyu commented Jun 6, 2024

This probably also doesn't have much to do with the plugin. The plugin is a pretty passive component and performs only a limited set of operations at the broker's request. I think this should be a separate tool, and/or certain broker modifications would be needed to support restoration like this in the first place.

We're exploring this idea at Aiven, but for the reasons mentioned above, I don't think the plugin will change much in the course of this.
