Etcd version should be easier to access when performing snapshot/restore #13070

Closed
serathius opened this issue Jun 2, 2021 · 3 comments

@serathius
Member

serathius commented Jun 2, 2021

Background

Etcd allows executing a remote snapshot command that streams the snapshot data to the client. The client can upload the received data to any cloud storage of its preference (S3, GCS, etc.). A snapshot can be created by any 3.*-compatible client; however, restoring it requires a server whose minor version (3.X) matches that of the server that created the snapshot (for example, a 3.4 snapshot can only be restored into a 3.4 server).
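The restore rule above can be expressed as a small compatibility check. The following is a minimal Go sketch, assuming the version of the server that created the snapshot is already known (for example, recorded alongside the snapshot) and that versions are plain `major.minor.patch` strings. `canRestore` and `parseMajorMinor` are hypothetical helpers, not part of etcd or etcdutl.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMajorMinor extracts the major and minor components from a version
// string such as "3.4.16" or "v3.5.0". The patch component is ignored.
func parseMajorMinor(v string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("malformed version %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("malformed major version in %q: %w", v, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("malformed minor version in %q: %w", v, err)
	}
	return major, minor, nil
}

// canRestore reports whether a snapshot created by snapshotVersion may be
// restored with a server binary of serverVersion. Following the rule above,
// major and minor must match exactly; patch versions may differ.
func canRestore(snapshotVersion, serverVersion string) (bool, error) {
	sMajor, sMinor, err := parseMajorMinor(snapshotVersion)
	if err != nil {
		return false, err
	}
	bMajor, bMinor, err := parseMajorMinor(serverVersion)
	if err != nil {
		return false, err
	}
	return sMajor == bMajor && sMinor == bMinor, nil
}

func main() {
	// Hypothetical example: a snapshot taken from a 3.4.13 server.
	for _, server := range []string{"3.4.16", "3.5.0"} {
		ok, err := canRestore("3.4.13", server)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("restore 3.4.13 snapshot with %s binary: %v\n", server, ok)
	}
}
```

Patch versions are deliberately ignored, since the rule above only constrains the minor version. Such a check is only possible once the snapshot's version is reliably available, which is what this issue asks for.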

Restoring a snapshot is usually done as emergency mitigation when a cluster is down. In such cases it is preferable to restore cluster behavior first and find the root cause afterwards. If the downtime was caused by a cluster upgrade, or an upgrade has happened since the snapshot was taken, the etcd binary must be switched to one matching the minor version that created the snapshot.

When working with larger fleets of clusters, it is much harder to correlate information about snapshots and upgrades. Responding quickly when restoring critical clusters therefore requires pairing etcd snapshot data with information about the server version.

Problem

Information about the etcd version of a particular node, which is necessary for restoring a snapshot, is not easily accessible when creating the snapshot. The version can be retrieved from the etcd binary or from an external system (docker image, node metadata, etc.). This undermines the ability to execute snapshots remotely, as it requires access either to the local node or to some external system.

Proposal

I propose implementing the following improvements:

The first change would remove the dependence on inspecting the etcd binary, thus making snapshotting fully remote. The remaining two changes would improve the process of restoring a snapshot by allowing verification that the snapshot version matches the expected one.

cc @ptabor

@serathius serathius changed the title Snapshot should return Etcd version Etcd version should be accessible when performing snapshot/restore Jun 2, 2021
@serathius serathius changed the title Etcd version should be accessible when performing snapshot/restore Etcd version easier to access when performing snapshot/restore Jun 2, 2021
@serathius serathius changed the title Etcd version easier to access when performing snapshot/restore Etcd version should be easier to access when performing snapshot/restore Jun 2, 2021
@ptabor
Contributor

ptabor commented Jun 9, 2021

LGTM

"Store information about local etcd version within the database itself, therefore making it available within the snapshot."

Nit: I would phrase it: 'Store etcd file format version in the backend/bbolt itself, therefore making it available within the snapshot'.
Technically, it means the version of etcd that wrote the file. For a bbolt file used by a running etcd (as opposed to a backup snapshot written from scratch), the version should be upgraded only once the properties of the new format have been upgraded.

E.g. etcd-3.5 added a new field: meta/term.
etcd-3.4 does not have it.

When we run etcd-3.5 for the first time on a '3.4' file, some logic should ensure that the meta/term field is populated, and only once that is fulfilled should the file be marked as version 3.5.
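The upgrade gating described above could look roughly like the sketch below. It is a hypothetical illustration, not etcd's implementation: the bbolt meta bucket is modeled as a plain Go map so the sketch runs without bbolt, and the `storageVersion` key name and the "3.4" default for files with no recorded version are assumptions. Only the meta/term field comes from the discussion.

```go
package main

import "fmt"

// metaBucket stands in for etcd's bbolt "meta" bucket. A map is used so the
// sketch runs without bbolt.
type metaBucket map[string][]byte

const (
	// keyTerm is the meta/term field that, per the discussion, etcd-3.5 added.
	keyTerm = "term"
	// keyStorageVersion is a hypothetical key recording the file-format version.
	keyStorageVersion = "storageVersion"
	// defaultVersion is assumed for files that carry no recorded version.
	defaultVersion = "3.4"
)

// storageVersion returns the file-format version recorded in the bucket.
func storageVersion(m metaBucket) string {
	if v, ok := m[keyStorageVersion]; ok {
		return string(v)
	}
	return defaultVersion
}

// maybeUpgradeTo35 marks the file as version 3.5 only after every field the
// 3.5 format requires (here only meta/term) is populated. It reports whether
// it changed the recorded version.
func maybeUpgradeTo35(m metaBucket) bool {
	if storageVersion(m) == "3.5" {
		return false // already marked as 3.5
	}
	if _, ok := m[keyTerm]; !ok {
		return false // 3.5 fields not yet populated; keep reporting the old version
	}
	m[keyStorageVersion] = []byte("3.5")
	return true
}

func main() {
	m := metaBucket{} // a file last written by etcd-3.4: no term, no version key
	fmt.Println("upgraded:", maybeUpgradeTo35(m), "version:", storageVersion(m))

	// Once etcd-3.5 has written meta/term, the next check can bump the version.
	m[keyTerm] = []byte{0, 0, 0, 0, 0, 0, 0, 2}
	fmt.Println("upgraded:", maybeUpgradeTo35(m), "version:", storageVersion(m))
}
```

The intent of the gate is that the recorded version never claims a format the file does not yet fully satisfy, so a snapshot taken partway through an upgrade still reports the older version.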

@serathius
Member Author

ping @gyuho

@serathius
Member Author

With both PRs merged, this issue can be closed.
