
Etcdctl snapshot restore doesn't actually restore data #13763

Closed
pcgeek86 opened this issue Mar 7, 2022 · 3 comments

pcgeek86 commented Mar 7, 2022

What happened?

Running etcdctl snapshot restore doesn't actually restore any data to the cluster: keys that were saved in the snapshot don't reappear in the cluster after the restore.

root@etcd01:~# docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 snapshot restore /etc/kubernetes/snapshot01
Deprecated: Use `etcdutl snapshot restore` instead.

2022-03-07T23:28:26Z    info    snapshot/v3_snapshot.go:251     restoring snapshot      {"path": "/etc/kubernetes/snapshot01", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/[email protected]/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/[email protected]/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2022-03-07T23:28:26Z    info    membership/store.go:141 Trimming membership information from the backend...
2022-03-07T23:28:26Z    info    membership/cluster.go:421       added member    {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-03-07T23:28:26Z    info    snapshot/v3_snapshot.go:272     restored snapshot       {"path": "/etc/kubernetes/snapshot01", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap"}

What did you expect to happen?

etcd restores the data from the snapshot to the cluster.

How can we reproduce it (as minimally and precisely as possible)?

  1. Set up an etcd cluster
  2. Add a couple of keys to the cluster (e.g. put FirstName Trevor, put LastName Sullivan)
  3. Run etcdctl snapshot save
  4. Run etcdctl snapshot restore (see the sketch below)
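
For reference, step 3 was a snapshot save with the same flags as the restore invocation shown above (the snapshot path and the ${ETCD_TAG}/${HOST0} variables come from my environment); step 4 is the restore command whose output is pasted above.

# Step 3: take the snapshot (written to /etc/kubernetes/snapshot01 on the host via the bind mount)
docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl \
  --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key \
  --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 \
  snapshot save /etc/kubernetes/snapshot01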

I followed the document below to set up a cluster, and tried using etcdctl to back up and restore it.
Even though the restore appears to succeed, the data isn't actually restored to the cluster.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/

Anything else we need to know?

etcdctl says that restoring snapshots through it is deprecated, but the etcdutl command it points to doesn't work either.

Etcd version (please run commands below)

docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 version
etcdctl version: 3.5.1
API version: 3.5

Etcd configuration (command line flags or environment variables)

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.31.61.35:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.31.61.35:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://172.31.61.35:2380
    - --initial-cluster=etcd01=https://172.31.61.35:2380,etcd02=https://172.31.55.170:2380,etcd03=https://172.31.60.240:2380
    - --initial-cluster-state=new
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://172.31.61.35:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.31.61.35:2380
    - --name=etcd01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.5.1-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 member list -w table
+------------------+---------+--------+----------------------------+----------------------------+------------+
|        ID        | STATUS  |  NAME  |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------+----------------------------+----------------------------+------------+
| 4ff49da3ad1aed7e | started | etcd03 | https://172.31.60.240:2380 | https://172.31.60.240:2379 |      false |
| 5d857b9f00d3be7d | started | etcd02 | https://172.31.55.170:2380 | https://172.31.55.170:2379 |      false |
| cb90006b7c1c5478 | started | etcd01 |  https://172.31.61.35:2380 |  https://172.31.61.35:2379 |      false |
+------------------+---------+--------+----------------------------+----------------------------+------------+

$ etcdctl --endpoints=<member list> endpoint status -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.31.61.35:2379 | cb90006b7c1c5478 |   3.5.1 |   20 kB |      true |      false |         2 |         18 |                 18 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

No response


moonovo commented Mar 8, 2022

The snapshot restore command does not send requests to etcd. Before using it, you need to stop all the etcd pods, add the --data-dir flag pointing at your data directory (maybe /var/lib/etcd/default.etcd), and run the command to rebuild the data directory.
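
For example, on one of the kubeadm nodes above it might look roughly like this. This is only a sketch: the directory names, the mv-based stop/start of the static pod, and having etcdutl available on the node (e.g. copied out of the etcd image) are assumptions, not an official procedure. The member flags reuse the values from the etcd01 manifest above.

# Stop the etcd static pod by moving its manifest out of the pod directory
mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/

# Rebuild a data directory from the snapshot (this step never contacts the cluster)
etcdutl snapshot restore /etc/kubernetes/snapshot01 \
  --data-dir /var/lib/etcd-restored \
  --name etcd01 \
  --initial-advertise-peer-urls https://172.31.61.35:2380 \
  --initial-cluster etcd01=https://172.31.61.35:2380,etcd02=https://172.31.55.170:2380,etcd03=https://172.31.60.240:2380

# Swap the restored directory into the path etcd is configured with (--data-dir=/var/lib/etcd)
mv /var/lib/etcd /var/lib/etcd.bak
mv /var/lib/etcd-restored /var/lib/etcd

# Put the manifest back so kubelet restarts etcd on the restored data
mv /etc/kubernetes/etcd.yaml /etc/kubernetes/manifests/

The same restore would need to be repeated on etcd02 and etcd03 with their own --name and --initial-advertise-peer-urls values before bringing those members back up.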

kkkkun (Contributor) commented Mar 8, 2022

I restored my local etcd using ./etcdutl snapshot restore ./snapshot.db --data-dir=./default.etcd, and it worked.

It does print a stack trace, which you can ignore; that was fixed by #13767.

serathius (Member) commented

Sorry for the confusing naming: etcdctl snapshot restore doesn't restore the cluster to the snapshot, it rebuilds a data directory from it, so you can run an etcd instance on that directory.

etcd $ ./bin/etcdctl snapshot restore --help
NAME:
        snapshot restore - Restores an etcd member snapshot to an etcd directory
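
For example, a minimal single-node sketch of "restore to a directory, then run etcd on it" (the file and directory names and the localhost URLs are just illustrative):

# Rebuild a data directory from the snapshot; nothing is written to the running cluster
etcdutl snapshot restore snapshot.db --data-dir ./restored.etcd

# Start an etcd instance on top of the restored directory
etcd --data-dir ./restored.etcd \
  --listen-client-urls http://127.0.0.1:2379 \
  --advertise-client-urls http://127.0.0.1:2379

# In another shell: the keys captured in the snapshot are served by this instance
etcdctl --endpoints http://127.0.0.1:2379 get FirstName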
