Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

etcd panic on startup (auth enabled) #13937

Closed
Tracked by #13960
ahrtr opened this issue Apr 13, 2022 · 2 comments · Fixed by #13942
Closed
Tracked by #13960

etcd panic on startup (auth enabled) #13937

ahrtr opened this issue Apr 13, 2022 · 2 comments · Fixed by #13942

Comments

@ahrtr
Copy link
Member

ahrtr commented Apr 13, 2022

The issue is reproduced on main branch.

Steps to reproduce this issue:

  1. Start an etcd cluster with 3 members using the Procfile;
  2. Enable Auth per the guide ;
  3. Run load test (2000 threads) against the cluster using Jmeter (see the attached test plan). The command: jmeter -n -t etcd_data_inconsistency.jmx (Please remove the suffix .txt )
  4. Some etcd instances may crash;
  5. Stop all etcd members;
  6. Run goreman start to start the cluster again. Then the etcd panic and I saw the following error log.

wachao-a01:etcd wachao$ goreman start
17:03:01 etcd2 | Starting etcd2 on port 5100
17:03:01 etcd3 | Starting etcd3 on port 5200
17:03:01 etcd1 | Starting etcd1 on port 5000
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.862+0800","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","infra3","--listen-metrics-urls","http://0.0.0.0:32381","--listen-client-urls","http://127.0.0.1:32379","--advertise-client-urls","http://127.0.0.1:32379","--listen-peer-urls","http://127.0.0.1:32380","--initial-advertise-peer-urls","http://127.0.0.1:32380","--initial-cluster-token","etcd-cluster-1","--initial-cluster","infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380","--initial-cluster-state","new","--enable-pprof","--logger=zap","--log-outputs=stderr","--snapshot-count","3"]}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:100","msg":"failed to detect default host","error":"default host not supported on darwin_amd64"}
17:03:01 etcd3 | {"level":"warn","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:105","msg":"'data-dir' was empty; using default","data-dir":"infra3.etcd"}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","infra2","--listen-metrics-urls","http://0.0.0.0:22381","--listen-client-urls","http://127.0.0.1:22379","--advertise-client-urls","http://127.0.0.1:22379","--listen-peer-urls","http://127.0.0.1:22380","--initial-advertise-peer-urls","http://127.0.0.1:22380","--initial-cluster-token","etcd-cluster-1","--initial-cluster","infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380","--initial-cluster-state","new","--enable-pprof","--logger=zap","--log-outputs=stderr","--snapshot-count","3"]}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:116","msg":"server has already been initialized","data-dir":"infra3.etcd","dir-type":"member"}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:100","msg":"failed to detect default host","error":"default host not supported on darwin_amd64"}
17:03:01 etcd2 | {"level":"warn","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:105","msg":"'data-dir' was empty; using default","data-dir":"infra2.etcd"}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:121","msg":"configuring peer listeners","listen-peer-urls":["http://127.0.0.1:32380"]}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:116","msg":"server has already been initialized","data-dir":"infra2.etcd","dir-type":"member"}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:129","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:32379"]}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:606","msg":"pprof is enabled","path":"/debug/pprof"}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:121","msg":"configuring peer listeners","listen-peer-urls":["http://127.0.0.1:22380"]}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:129","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:22379"]}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:606","msg":"pprof is enabled","path":"/debug/pprof"}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:73","msg":"Running: ","args":["etcd","--name","infra1","--listen-metrics-urls","http://0.0.0.0:2381","--listen-client-urls","http://127.0.0.1:2379","--advertise-client-urls","http://127.0.0.1:2379","--listen-peer-urls","http://127.0.0.1:12380","--initial-advertise-peer-urls","http://127.0.0.1:12380","--initial-cluster-token","etcd-cluster-1","--initial-cluster","infra1=http://127.0.0.1:12380,infra2=http://127.0.0.1:22380,infra3=http://127.0.0.1:32380","--initial-cluster-state","new","--enable-pprof","--logger=zap","--log-outputs=stderr","--snapshot-count","3"]}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:100","msg":"failed to detect default host","error":"default host not supported on darwin_amd64"}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:307","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"b018742c4","go-version":"go1.17","go-os":"darwin","go-arch":"amd64","max-cpu-set":16,"max-cpu-available":16,"member-initialized":true,"name":"infra3","data-dir":"infra3.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"infra3.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","wait-cluster-ready-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":3,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://127.0.0.1:32380"],"listen-peer-urls":["http://127.0.0.1:32380"],"advertise-client-urls":["http://127.0.0.1:32379"],"listen-client-urls":["http://127.0.0.1:32379"],"listen-metrics-urls":["http://0.0.0.0:32381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
17:03:01 etcd1 | {"level":"warn","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:105","msg":"'data-dir' was empty; using default","data-dir":"infra1.etcd"}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"etcdmain/etcd.go:116","msg":"server has already been initialized","data-dir":"infra1.etcd","dir-type":"member"}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:121","msg":"configuring peer listeners","listen-peer-urls":["http://127.0.0.1:12380"]}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.863+0800","caller":"embed/etcd.go:307","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"b018742c4","go-version":"go1.17","go-os":"darwin","go-arch":"amd64","max-cpu-set":16,"max-cpu-available":16,"member-initialized":true,"name":"infra2","data-dir":"infra2.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"infra2.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","wait-cluster-ready-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":3,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://127.0.0.1:22380"],"listen-peer-urls":["http://127.0.0.1:22380"],"advertise-client-urls":["http://127.0.0.1:22379"],"listen-client-urls":["http://127.0.0.1:22379"],"listen-metrics-urls":["http://0.0.0.0:22381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.864+0800","caller":"embed/etcd.go:129","msg":"configuring client listeners","listen-client-urls":["http://127.0.0.1:2379"]}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.864+0800","caller":"embed/etcd.go:606","msg":"pprof is enabled","path":"/debug/pprof"}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.864+0800","caller":"embed/etcd.go:307","msg":"starting an etcd server","etcd-version":"3.6.0-alpha.0","git-sha":"b018742c4","go-version":"go1.17","go-os":"darwin","go-arch":"amd64","max-cpu-set":16,"max-cpu-available":16,"member-initialized":true,"name":"infra1","data-dir":"infra1.etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"infra1.etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","wait-cluster-ready-timeout":"5s","initial-election-tick-advance":true,"snapshot-count":3,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://127.0.0.1:12380"],"listen-peer-urls":["http://127.0.0.1:12380"],"advertise-client-urls":["http://127.0.0.1:2379"],"listen-client-urls":["http://127.0.0.1:2379"],"listen-metrics-urls":["http://0.0.0.0:2381"],"cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-size-bytes":2147483648,"pre-vote":true,"initial-corrupt-check":false,"corrupt-check-time-interval":"0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.865+0800","caller":"storage/backend.go:81","msg":"opened backend db","path":"infra3.etcd/member/snap/db","took":"202.338µs"}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.865+0800","caller":"storage/backend.go:81","msg":"opened backend db","path":"infra2.etcd/member/snap/db","took":"207.292µs"}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.865+0800","caller":"storage/backend.go:81","msg":"opened backend db","path":"infra1.etcd/member/snap/db","took":"183.284µs"}
17:03:01 etcd3 | {"level":"info","ts":"2022-04-13T17:03:01.865+0800","caller":"etcdserver/bootstrap.go:409","msg":"recovered v2 store from snapshot","snapshot-index":292,"snapshot-size":"9.0 kB"}
17:03:01 etcd2 | {"level":"info","ts":"2022-04-13T17:03:01.866+0800","caller":"etcdserver/bootstrap.go:409","msg":"recovered v2 store from snapshot","snapshot-index":292,"snapshot-size":"9.0 kB"}
17:03:01 etcd1 | {"level":"info","ts":"2022-04-13T17:03:01.866+0800","caller":"etcdserver/bootstrap.go:409","msg":"recovered v2 store from snapshot","snapshot-index":68,"snapshot-size":"9.0 kB"}
17:03:01 etcd3 | {"level":"warn","ts":"2022-04-13T17:03:01.963+0800","caller":"snap/db.go:87","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":292,"snapshot-file-path":"infra3.etcd/member/snap/0000000000000124.snap.db","error":"snap: snapshot file doesn't exist"}
17:03:01 etcd3 | {"level":"panic","ts":"2022-04-13T17:03:01.963+0800","caller":"etcdserver/bootstrap.go:416","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:416\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:228\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrap\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:79\ngo.etcd.io/etcd/server/v3/etcdserver.NewServer\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/server.go:302\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/embed/etcd.go:244\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/main.go:40\nmain.main\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/main.go:32\nruntime.main\n\t/Users/wachao/software/go/src/runtime/proc.go:255"}
17:03:01 etcd3 | panic: failed to recover v3 backend from snapshot
17:03:01 etcd3 | goroutine 1 [running]:
17:03:01 etcd3 | go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc0006b63c0, {0xc000148b80, 0x1, 0x1})
17:03:01 etcd3 | 	/Users/wachao/go/gopath/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:234 +0x499
17:03:01 etcd3 | go.uber.org/zap.(*Logger).Panic(0x7ffeefbff7fc, {0x1bfe548, 0x0}, {0xc000148b80, 0x1, 0x1})
17:03:01 etcd3 | 	/Users/wachao/go/gopath/pkg/mod/go.uber.org/[email protected]/logger.go:227 +0x59
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...}, ...)
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:416 +0x974
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...}, ...)
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:228 +0x455
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/etcdserver.bootstrap({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:79 +0x7d2
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/etcdserver.NewServer({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/server.go:302 +0x98
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc0002db500)
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/embed/etcd.go:244 +0x1258
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0x7ffeefbff7fc)
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:228 +0x1d
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2({0xc00003a480, 0x18, 0x18})
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:123 +0x14e5
17:03:01 etcd3 | go.etcd.io/etcd/server/v3/etcdmain.Main({0xc00003a480, 0x1a31720, 0x18})
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/main.go:40 +0x10c
17:03:01 etcd3 | main.main()
17:03:01 etcd3 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/main.go:32 +0x2e
17:03:01 etcd3 | Terminating etcd3
17:03:01 etcd2 | {"level":"warn","ts":"2022-04-13T17:03:01.990+0800","caller":"snap/db.go:87","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":292,"snapshot-file-path":"infra2.etcd/member/snap/0000000000000124.snap.db","error":"snap: snapshot file doesn't exist"}
17:03:01 etcd2 | {"level":"panic","ts":"2022-04-13T17:03:01.990+0800","caller":"etcdserver/bootstrap.go:416","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:416\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:228\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrap\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:79\ngo.etcd.io/etcd/server/v3/etcdserver.NewServer\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/server.go:302\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/embed/etcd.go:244\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/main.go:40\nmain.main\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/main.go:32\nruntime.main\n\t/Users/wachao/software/go/src/runtime/proc.go:255"}
17:03:01 etcd2 | panic: failed to recover v3 backend from snapshot
17:03:01 etcd2 | goroutine 1 [running]:
17:03:01 etcd2 | go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc000196600, {0xc0007a05c0, 0x1, 0x1})
17:03:01 etcd2 | 	/Users/wachao/go/gopath/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:234 +0x499
17:03:01 etcd2 | go.uber.org/zap.(*Logger).Panic(0x7ffeefbff7fc, {0x1bfe548, 0x0}, {0xc0007a05c0, 0x1, 0x1})
17:03:01 etcd2 | 	/Users/wachao/go/gopath/pkg/mod/go.uber.org/[email protected]/logger.go:227 +0x59
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...}, ...)
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:416 +0x974
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...}, ...)
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:228 +0x455
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/etcdserver.bootstrap({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:79 +0x7d2
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/etcdserver.NewServer({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/server.go:302 +0x98
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc0002f2700)
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/embed/etcd.go:244 +0x1258
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0x7ffeefbff7fc)
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:228 +0x1d
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2({0xc000120180, 0x18, 0x18})
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:123 +0x14e5
17:03:01 etcd2 | go.etcd.io/etcd/server/v3/etcdmain.Main({0xc000120180, 0x1a31720, 0x18})
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/main.go:40 +0x10c
17:03:01 etcd2 | main.main()
17:03:01 etcd2 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/main.go:32 +0x2e
17:03:01 etcd2 | Terminating etcd2
17:03:02 etcd1 | {"level":"warn","ts":"2022-04-13T17:03:02.009+0800","caller":"snap/db.go:87","msg":"failed to find [SNAPSHOT-INDEX].snap.db","snapshot-index":68,"snapshot-file-path":"infra1.etcd/member/snap/0000000000000044.snap.db","error":"snap: snapshot file doesn't exist"}
17:03:02 etcd1 | {"level":"panic","ts":"2022-04-13T17:03:02.009+0800","caller":"etcdserver/bootstrap.go:416","msg":"failed to recover v3 backend from snapshot","error":"failed to find database snapshot file (snap: snapshot file doesn't exist)","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:416\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:228\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrap\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:79\ngo.etcd.io/etcd/server/v3/etcdserver.NewServer\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/server.go:302\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/embed/etcd.go:244\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:228\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:123\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/main.go:40\nmain.main\n\t/Users/wachao/go/src/github.com/ahrtr/etcd/server/main.go:32\nruntime.main\n\t/Users/wachao/software/go/src/runtime/proc.go:255"}
17:03:02 etcd1 | panic: failed to recover v3 backend from snapshot
17:03:02 etcd1 | goroutine 1 [running]:
17:03:02 etcd1 | go.uber.org/zap/zapcore.(*CheckedEntry).Write(0xc00068e540, {0xc000148900, 0x1, 0x1})
17:03:02 etcd1 | 	/Users/wachao/go/gopath/pkg/mod/go.uber.org/[email protected]/zapcore/entry.go:234 +0x499
17:03:02 etcd1 | go.uber.org/zap.(*Logger).Panic(0x7ffeefbff7fc, {0x1bfe548, 0x0}, {0xc000148900, 0x1, 0x1})
17:03:02 etcd1 | 	/Users/wachao/go/gopath/pkg/mod/go.uber.org/[email protected]/logger.go:227 +0x59
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...}, ...)
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:416 +0x974
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...}, ...)
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:228 +0x455
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/etcdserver.bootstrap({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/bootstrap.go:79 +0x7d2
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/etcdserver.NewServer({{0x7ffeefbff7fc, 0x6}, {0x0, 0x0}, {0x0, 0x0}, {{{0x24fae48, 0x0, 0x0}, 0x12a05f200, ...}, ...}, ...})
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdserver/server.go:302 +0x98
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/embed.StartEtcd(0xc0005a6700)
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/embed/etcd.go:244 +0x1258
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/etcdmain.startEtcd(0x7ffeefbff7fc)
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:228 +0x1d
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2({0xc00003a480, 0x18, 0x18})
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/etcd.go:123 +0x14e5
17:03:02 etcd1 | go.etcd.io/etcd/server/v3/etcdmain.Main({0xc00003a480, 0x1a31720, 0x18})
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/etcdmain/main.go:40 +0x10c
17:03:02 etcd1 | main.main()
17:03:02 etcd1 | 	/Users/wachao/go/src/github.com/ahrtr/etcd/server/main.go:32 +0x2e
17:03:02 etcd1 | Terminating etcd1

The current implementation looks too fragile once the Auth is enabled. Will continue to investigate this. Anyone feels free to jump in and investigate & debug.

etcd_data_inconsistency.jmx.txt

@serathius
Copy link
Member

I didn't manage to reproduce it, all I get is "failed to apply request" and "user name is empty".

@ahrtr ahrtr changed the title etcd panic on startup [3.5.3] etcd panic on startup (auth enabled) Apr 14, 2022
@ahrtr ahrtr changed the title [3.5.3] etcd panic on startup (auth enabled) etcd panic on startup (auth enabled) Apr 14, 2022
@ahrtr
Copy link
Member Author

ahrtr commented Apr 14, 2022

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging a pull request may close this issue.

2 participants