
Etcdctl snapshot restore doesn't actually restore data #13763

Closed
pcgeek86 opened this issue Mar 7, 2022 · 3 comments

pcgeek86 commented Mar 7, 2022

What happened?

Running etcdctl snapshot restore doesn't actually restore any data to the cluster: keys that were saved in the snapshot don't reappear in the cluster after the restore.

root@etcd01:~# docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 snapshot restore /etc/kubernetes/snapshot01
Deprecated: Use `etcdutl snapshot restore` instead.

2022-03-07T23:28:26Z    info    snapshot/v3_snapshot.go:251     restoring snapshot      {"path": "/etc/kubernetes/snapshot01", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap", "stack": "go.etcd.io/etcd/etcdutl/v3/snapshot.(*v3Manager).Restore\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdutl/snapshot/v3_snapshot.go:257\ngo.etcd.io/etcd/etcdutl/v3/etcdutl.SnapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdutl/etcdutl/snapshot_command.go:147\ngo.etcd.io/etcd/etcdctl/v3/ctlv3/command.snapshotRestoreCommandFunc\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/command/snapshot_command.go:128\ngithub.com/spf13/cobra.(*Command).execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/[email protected]/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/[email protected]/command.go:960\ngithub.com/spf13/cobra.(*Command).Execute\n\t/home/remote/sbatsche/.gvm/pkgsets/go1.16.3/global/pkg/mod/github.com/spf13/[email protected]/command.go:897\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.Start\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/ctl.go:107\ngo.etcd.io/etcd/etcdctl/v3/ctlv3.MustStart\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/ctlv3/ctl.go:111\nmain.main\n\t/tmp/etcd-release-3.5.1/etcd/release/etcd/etcdctl/main.go:59\nruntime.main\n\t/home/remote/sbatsche/.gvm/gos/go1.16.3/src/runtime/proc.go:225"}
2022-03-07T23:28:26Z    info    membership/store.go:141 Trimming membership information from the backend...
2022-03-07T23:28:26Z    info    membership/cluster.go:421       added member    {"cluster-id": "cdf818194e3a8c32", "local-member-id": "0", "added-peer-id": "8e9e05c52164694d", "added-peer-peer-urls": ["http://localhost:2380"]}
2022-03-07T23:28:26Z    info    snapshot/v3_snapshot.go:272     restored snapshot       {"path": "/etc/kubernetes/snapshot01", "wal-dir": "default.etcd/member/wal", "data-dir": "default.etcd", "snap-dir": "default.etcd/member/snap"}

What did you expect to happen?

etcd restores the data from the snapshot to the cluster.

How can we reproduce it (as minimally and precisely as possible)?

  1. Set up an etcd cluster
  2. Add a couple of keys to the cluster (e.g. put FirstName Trevor, put LastName Sullivan)
  3. Run etcdctl snapshot save
  4. Run etcdctl snapshot restore (see the sketch below)
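
For reference, step 3 was a snapshot save with the same flags as the restore invocation shown above (the snapshot path and the ${ETCD_TAG}/${HOST0} variables come from my environment); step 4 is the restore command whose output is pasted above.

# Step 3: take the snapshot (written to /etc/kubernetes/snapshot01 on the host via the bind mount)
docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl \
  --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key \
  --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 \
  snapshot save /etc/kubernetes/snapshot01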

I followed the document below to set up a cluster, and tried using etcdctl to back up and restore it.
Even though the restore appears to succeed, the data isn't actually restored to the cluster.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/setup-ha-etcd-with-kubeadm/

Anything else we need to know?

etcdctl says that restoring snapshots through it is deprecated, but the etcdutl command it points to doesn't work either.

Etcd version (please run commands below)

docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 version
etcdctl version: 3.5.1
API version: 3.5

Etcd configuration (command line flags or environment variables)

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.31.61.35:2379
  creationTimestamp: null
  labels:
    component: etcd
    tier: control-plane
  name: etcd
  namespace: kube-system
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.31.61.35:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --initial-advertise-peer-urls=https://172.31.61.35:2380
    - --initial-cluster=etcd01=https://172.31.61.35:2380,etcd02=https://172.31.55.170:2380,etcd03=https://172.31.60.240:2380
    - --initial-cluster-state=new
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://172.31.61.35:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.31.61.35:2380
    - --name=etcd01
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: k8s.gcr.io/etcd:3.5.1-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: etcd
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /health
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  hostNetwork: true
  priorityClassName: system-node-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status: {}

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ docker run -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:${ETCD_TAG} etcdctl --cert /etc/kubernetes/pki/etcd/peer.crt --key /etc/kubernetes/pki/etcd/peer.key --cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://${HOST0}:2379 member list -w table
+------------------+---------+--------+----------------------------+----------------------------+------------+
|        ID        | STATUS  |  NAME  |         PEER ADDRS         |        CLIENT ADDRS        | IS LEARNER |
+------------------+---------+--------+----------------------------+----------------------------+------------+
| 4ff49da3ad1aed7e | started | etcd03 | https://172.31.60.240:2380 | https://172.31.60.240:2379 |      false |
| 5d857b9f00d3be7d | started | etcd02 | https://172.31.55.170:2380 | https://172.31.55.170:2379 |      false |
| cb90006b7c1c5478 | started | etcd01 |  https://172.31.61.35:2380 |  https://172.31.61.35:2379 |      false |
+------------------+---------+--------+----------------------------+----------------------------+------------+

$ etcdctl --endpoints=<member list> endpoint status -w table
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.31.61.35:2379 | cb90006b7c1c5478 |   3.5.1 |   20 kB |      true |      false |         2 |         18 |                 18 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+

Relevant log output

No response


moonovo commented Mar 8, 2022

The snapshot restore command does not send requests to etcd. Before using it, you need to stop all the etcd pods, add the --data-dir flag pointing at your data directory (maybe /var/lib/etcd/default.etcd), and run the command to rebuild the data directory.
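
For example, on one of the kubeadm nodes above it might look roughly like this. This is only a sketch: the directory names, the mv-based stop/start of the static pod, and having etcdutl available on the node (e.g. copied out of the etcd image) are assumptions, not an official procedure. The member flags reuse the values from the etcd01 manifest above.

# Stop the etcd static pod by moving its manifest out of the pod directory
mv /etc/kubernetes/manifests/etcd.yaml /etc/kubernetes/

# Rebuild a data directory from the snapshot (this step never contacts the cluster)
etcdutl snapshot restore /etc/kubernetes/snapshot01 \
  --data-dir /var/lib/etcd-restored \
  --name etcd01 \
  --initial-advertise-peer-urls https://172.31.61.35:2380 \
  --initial-cluster etcd01=https://172.31.61.35:2380,etcd02=https://172.31.55.170:2380,etcd03=https://172.31.60.240:2380

# Swap the restored directory into the path etcd is configured with (--data-dir=/var/lib/etcd)
mv /var/lib/etcd /var/lib/etcd.bak
mv /var/lib/etcd-restored /var/lib/etcd

# Put the manifest back so kubelet restarts etcd on the restored data
mv /etc/kubernetes/etcd.yaml /etc/kubernetes/manifests/

The same restore would need to be repeated on etcd02 and etcd03 with their own --name and --initial-advertise-peer-urls values before bringing those members back up.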

kkkkun (Contributor) commented Mar 8, 2022

I restored my local etcd using ./etcdutl snapshot restore ./snapshot.db --data-dir=./default.etcd, and it worked.

It does print a stack trace, which you can ignore; that was fixed by #13767.

serathius (Member) commented

Sorry for the confusing naming: etcdctl snapshot restore doesn't restore the cluster to the snapshot, it rebuilds a data directory from it, so you can run an etcd instance on that directory.

etcd $ ./bin/etcdctl snapshot restore --help
NAME:
        snapshot restore - Restores an etcd member snapshot to an etcd directory
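
For example, a minimal single-node sketch of "restore to a directory, then run etcd on it" (the file and directory names and the localhost URLs are just illustrative):

# Rebuild a data directory from the snapshot; nothing is written to the running cluster
etcdutl snapshot restore snapshot.db --data-dir ./restored.etcd

# Start an etcd instance on top of the restored directory
etcd --data-dir ./restored.etcd \
  --listen-client-urls http://127.0.0.1:2379 \
  --advertise-client-urls http://127.0.0.1:2379

# In another shell: the keys captured in the snapshot are served by this instance
etcdctl --endpoints http://127.0.0.1:2379 get FirstName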
