Describe the bug:
We observed in one of our production clusters that when etcd's data-dir was removed, backup-restore failed to detect this as a single-member restoration scenario for an etcd pod when snapstore is not configured. Instead, backup-restore falsely detected it as a bootstrap case. As a result, the etcd-events-0 pod did not start up, because it failed to join the cluster due to a memberID mismatch.
❯ k get pods etcd-events-0
etcd-events-0 1/2 Running 0 2m14s
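To make the expected behaviour concrete, here is a minimal sketch of the decision described above. It is not the etcd-backup-restore implementation: the `Scenario` and `State` types, the `Classify` function, and the idea of a `MemberInCluster` flag (for example, derived from asking the remaining peers for their member list) are all hypothetical names and assumptions used only for illustration. The point it shows is that whether a snapstore is configured should not change the outcome when the data-dir is lost on a member the cluster already knows.

```go
package main

import "fmt"

// Scenario is a hypothetical label for what the sidecar decides at startup.
type Scenario string

const (
	Bootstrap           Scenario = "bootstrap"
	SingleMemberRestore Scenario = "single-member-restoration"
	NoAction            Scenario = "no-action"
)

// State holds the hypothetical inputs to the decision.
type State struct {
	DataDirPresent      bool // a usable etcd data-dir exists on disk
	SnapstoreConfigured bool // a backup bucket / storage provider is configured
	MemberInCluster     bool // assumption: the remaining peers still list this member
}

// Classify returns the scenario the report expects. SnapstoreConfigured is
// deliberately not consulted: a lost data-dir on a member that the cluster
// already knows is a single-member restoration, with or without a snapstore.
func Classify(s State) Scenario {
	switch {
	case s.DataDirPresent:
		return NoAction
	case s.MemberInCluster:
		return SingleMemberRestore
	default:
		return Bootstrap
	}
}

func main() {
	// The situation from this report: data-dir removed, no snapstore,
	// and the other two members are still running.
	reported := State{DataDirPresent: false, SnapstoreConfigured: false, MemberInCluster: true}
	fmt.Println(Classify(reported))

	// A genuinely new cluster: no data-dir and no existing membership.
	fresh := State{DataDirPresent: false, SnapstoreConfigured: false, MemberInCluster: false}
	fmt.Println(Classify(fresh))
}
```

Under these assumptions the reported situation is classified as a single-member restoration, while only a member with no data-dir and no existing membership is treated as a bootstrap.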
How To Reproduce (as minimally and precisely as possible):
Start a 3-member etcd cluster with no snapstore configured.
Attach a debug container to the etcd-0 pod, then remove the data-dir completely.
Kill the etcd container to force a restart and trigger the restoration.
Logs:
backup-restore logs of etcd-events-0 pod:
2024-08-07 23:59:36 | {"log":"Served config for ETCD instance.","severity":"INFO"}
2024-08-07 23:59:36 | {"log":"checking the presence of a learner in a cluster...","severity":"INFO"}
2024-08-07 23:59:35 | {"log":{"attempt":0,"caller":"clientv3/retry_interceptor.go:62","error":"rpc error: code = DeadlineExceeded desc = latest balancer error: connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\"","level":"warn","msg":"retrying of unary invoker failed","target":"passthrough:///https://etcd-events-local:2379","ts":"2024-08-07T23:59:35.845Z"}}
2024-08-07 23:59:35 | {"log":"failed to get status of etcd endPoint: https://etcd-events-local:2379 with error: context deadline exceeded","severity":"ERR"}
2024-08-07 23:59:35 | {"log":"Updating status from Successful to New","severity":"INFO"}
2024-08-07 23:59:35 | {"log":"Responding to status request with: Successful","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Successfully initialized data directory for etcd.","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Removing directory(/var/etcd/data/new.etcd) since snapstore is empty.","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"storage provider name not specified","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Checking whether the backup bucket is empty or not...","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Validation mode: full","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Validation failBelowRevision: ","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Setting status to : 503","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Updating status from New to Progress","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Received start initialization request.","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"Responding to status request with: New","severity":"INFO"}
2024-08-07 23:59:34 | {"log":"No snapstore storage provider configured.","severity":"WARN"}
2024-08-07 23:59:33 | {"log":"TLS enabled. Starting HTTPS server.","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Starting HTTP server at addr: :8080","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Checking if etcd is running","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Starting the http server...","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Registering the http request handlers...","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Setting status to : 503","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"compressionConfig:\\n enabled: true\\n policy: gzip\\ndefragmentationSchedule: 17 1 */3 * *\\netcdConnectionConfig:\\n caFile: /var/etcd/ssl/client/ca/bundle.crt\\n certFile: /var/etcd/ssl/client/client/tls.crt\\n connectionTimeout: 5m0s\\n defragTimeout: 15m0s\\n endpoints:\\n - https://etcd-events-local:2379\\n keyFile: /var/etcd/ssl/client/client/tls.key\\n serviceEndpoints:\\n - https://etcd-events-client:2379\\n snapshotTimeout: 15m0s\\nexponentialBackoffConfig:\\n attemptLimit: 6\\n multiplier: 2\\n thresholdTime: 2m8s\\nhealthConfig:\\n deltaSnapshotLeaseName: delta-snapshot-revisions\\n fullSnapshotLeaseName: full-snapshot-revisions\\n heartbeatDuration: 10s\\n memberGCDuration: 1m0s\\n memberLeaseRenewalEnabled: true\\nleaderElectionConfig:\\n etcdConnectionTimeout: 5s\\n reelectionPeriod: 5s\\nrestorationConfig:\\n MaxRequestBytes: 10485760\\n MaxTxnOps: 10240\\n autoCompactionMode: periodic\\n autoCompactionRetention: 30m\\n dataDir: /var/etcd/data/new.etcd\\n embeddedEtcdQuotaBytes: 8589934592\\n initialAdvertisePeerURLs:\\n - http://localhost:2380\\n initialCluster: default=http://localhost:2380\\n initialClusterToken: etcd-cluster\\n maxCallSendMsgSize: 10485760\\n maxFetchers: 6\\n name: default\\n tempDir: /var/etcd/data/restoration.temp\\nserverConfig:\\n port: 8080\\n server-cert: /var/etcd/ssl/client/server/tls.crt\\n server-key: /var/etcd/ssl/client/server/tls.key\\nsnapshotterConfig:\\n deltaSnapshotMemoryLimit: 104857600\\n deltaSnapshotPeriod: 20s\\n deltaSnapshotRetentionPeriod: 0s\\n garbageCollectionPeriod: 12h0m0s\\n garbageCollectionPolicy: Exponential\\n maxBackups: 7\\n schedule: 0 */1 * * *\\nsnapstoreConfig:\\n container: \\\"\\\"\\n maxParallelChunkUploads: 5\\n minChunkSize: 5242880\\n prefix: v2\\n tempDir: /var/etcd/data/temp\\n","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Go OS/Arch: linux/amd64","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Go Version: go1.20.3","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"Git SHA: 6a8f2198","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"etcd-backup-restore Version: v0.28.2","severity":"INFO"}
2024-08-07 23:59:33 | {"log":"No snapstore storage provider configured. Will not start backup schedule.","severity":"WARN"}
2024-08-07 23:17:38 | {"log":"HTTPS server closed gracefully.","severity":"INFO"}
2024-08-07 23:17:38 | {"log":"Shutting down LeaderElection...","severity":"INFO"}
Screenshots (if applicable):
Environment (please complete the following information):
Etcd version/commit ID :
Etcd-backup-restore version/commit ID:
Cloud Provider [All/AWS/GCS/ABS/Swift/OSS]:
Anything else we need to know?:
This issue can only occur for the 0th pod.