Etcd version should be easier to access when performing snapshot/restore #13070
Comments
LGTM
Nit: I would phrase it: 'Store the etcd file format version in the backend/bbolt itself, therefore making it available within the snapshot'. I.e. etcd-3.5 added a new field: meta/term. When we run etcd-3.5 for the first time on a '3.4' file, some logic should ensure that the meta/term field is populated, and only once that is fulfilled should the file be marked as version 3.5.
ping @gyuho
With both PRs merged this issue can be closed.
Background
Etcd allows executing a remote snapshot command that streams the snapshot data to the client. The client can upload the received data to any cloud storage of their preference (S3, GCS, etc.). A snapshot can be created by any 3.* compatible client; however, restoring requires the server to match the minor 3.X version of the server that created the snapshot (for example, a 3.4 snapshot can only be restored into a 3.4 server).
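The restore rule above can be sketched as a small check, assuming version strings of the usual `3.4.16` form. The helper names `majorMinor` and `canRestore` are made up for this example and are not part of etcd.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorMinor extracts the major and minor components from a version
// string such as "3.4.16" or "v3.5.0".
func majorMinor(v string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimPrefix(v, "v"), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("version %q has no minor component", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", v, err)
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", v, err)
	}
	return major, minor, nil
}

// canRestore reports whether a snapshot created by snapshotVersion may be
// restored with a server binary of serverVersion: major and minor must
// match, the patch level is ignored.
func canRestore(snapshotVersion, serverVersion string) (bool, error) {
	sMajor, sMinor, err := majorMinor(snapshotVersion)
	if err != nil {
		return false, err
	}
	bMajor, bMinor, err := majorMinor(serverVersion)
	if err != nil {
		return false, err
	}
	return sMajor == bMajor && sMinor == bMinor, nil
}

func main() {
	for _, server := range []string{"3.4.3", "3.5.0"} {
		ok, err := canRestore("3.4.16", server)
		fmt.Println("snapshot 3.4.16 on server", server, "restorable:", ok, "err:", err)
	}
}
```

The check is only useful if the snapshot's version is known at restore time, which is what the rest of this issue is about.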
Restoring a snapshot is usually done as emergency mitigation when a cluster is down. In such cases it is preferable to restore cluster behavior before root-causing the failure. If the downtime was caused by a cluster upgrade, or an upgrade has happened since the snapshot was taken, the etcd binary must be switched.
When working with larger fleets of clusters, it is much harder to correlate information about snapshots and upgrades. Providing a fast response to restore critical clusters necessitates pairing etcd snapshot data with information about the server version.
Problem
Information about the etcd version of a particular node, necessary for restoring a snapshot, is not easily accessible when creating the snapshot. The version can be retrieved from the etcd binary or from an external system (docker image, node metadata, etc.). This undermines the ability to execute snapshots remotely, as it requires access either to the local node or to some external system.
Proposal
I propose to implement the following improvements:
- SnapshotResponse proto to include information about etcd version ([Version in Snapshot] SnapshotResponse includes local etcd version #13073)
- etcdutl snapshot status to print information about the etcd server that created it ([Version in Snapshot] Preserve etcd version in backend allowing etcdutl to read it from snapshot #13094)
The first change would remove the dependence on inspecting the etcd binary, thus making snapshotting fully remote. The remaining two changes would improve the process of restoring the snapshot by allowing verification that the snapshot version matches the expected one.
cc @ptabor