ETCD with TLS showing warning "transport: authentication handshake failed: remote error: tls: bad certificate" #9785
When I replaced the server certificate with the peer certificate, the warning was gone. Why?

# -profile=peer
$ cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer server.json | cfssljson -bare server
$ etcd --name infra0 --data-dir infra0 \
--client-cert-auth --trusted-ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem \
--advertise-client-urls https://127.0.0.1:2379 --listen-client-urls https://127.0.0.1:2379
2018-05-29 11:21:09.053070 I | etcdmain: etcd Version: 3.3.5
2018-05-29 11:21:09.053133 I | etcdmain: Git SHA: 70c872620
2018-05-29 11:21:09.053141 I | etcdmain: Go Version: go1.9.6
2018-05-29 11:21:09.053146 I | etcdmain: Go OS/Arch: linux/amd64
2018-05-29 11:21:09.053152 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-05-29 11:21:09.053557 I | embed: listening for peers on http://localhost:2380
2018-05-29 11:21:09.053597 I | embed: listening for client requests on 127.0.0.1:2379
2018-05-29 11:21:09.055180 I | etcdserver: name = infra0
2018-05-29 11:21:09.055195 I | etcdserver: data dir = infra0
2018-05-29 11:21:09.055202 I | etcdserver: member dir = infra0/member
2018-05-29 11:21:09.055207 I | etcdserver: heartbeat = 100ms
2018-05-29 11:21:09.055212 I | etcdserver: election = 1000ms
2018-05-29 11:21:09.055220 I | etcdserver: snapshot count = 100000
2018-05-29 11:21:09.055230 I | etcdserver: advertise client URLs = https://127.0.0.1:2379
2018-05-29 11:21:09.055237 I | etcdserver: initial advertise peer URLs = http://localhost:2380
2018-05-29 11:21:09.055246 I | etcdserver: initial cluster = infra0=http://localhost:2380
2018-05-29 11:21:09.056700 I | etcdserver: starting member 8e9e05c52164694d in cluster cdf818194e3a8c32
2018-05-29 11:21:09.056732 I | raft: 8e9e05c52164694d became follower at term 0
2018-05-29 11:21:09.056747 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
2018-05-29 11:21:09.056753 I | raft: 8e9e05c52164694d became follower at term 1
2018-05-29 11:21:09.059841 W | auth: simple token is not cryptographically signed
2018-05-29 11:21:09.061318 I | etcdserver: starting server... [version: 3.3.5, cluster version: to_be_decided]
2018-05-29 11:21:09.061669 I | etcdserver: 8e9e05c52164694d as single-node; fast-forwarding 9 ticks (election ticks 10)
2018-05-29 11:21:09.062072 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-05-29 11:21:09.063469 I | embed: ClientTLS: cert = server.pem, key = server-key.pem, ca = , trusted-ca = ca.pem, client-cert-auth = true, crl-file =
2018-05-29 11:21:09.657081 I | raft: 8e9e05c52164694d is starting a new election at term 1
2018-05-29 11:21:09.657149 I | raft: 8e9e05c52164694d became candidate at term 2
2018-05-29 11:21:09.657179 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 2
2018-05-29 11:21:09.657203 I | raft: 8e9e05c52164694d became leader at term 2
2018-05-29 11:21:09.657215 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 2
2018-05-29 11:21:09.657608 I | etcdserver: setting up the initial cluster version to 3.3
2018-05-29 11:21:09.658381 N | etcdserver/membership: set the initial cluster version to 3.3
2018-05-29 11:21:09.658457 I | etcdserver/api: enabled capabilities for version 3.3
2018-05-29 11:21:09.658520 I | etcdserver: published {Name:infra0 ClientURLs:[https://127.0.0.1:2379]} to cluster cdf818194e3a8c32
2018-05-29 11:21:09.658536 I | embed: ready to serve client requests
2018-05-29 11:21:09.658751 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-05-29 11:21:09.712055 I | embed: serving client requests on 127.0.0.1:2379
@JinsYin your config defines the server profile with the server auth usage only, while the peer profile has both the server auth and client auth extensions. I can see how this is confusing, since the example uses "server" in the file name.
So it seems that as soon as client auth is attempted, it fails, because the server profile does not produce certificates that can be used for client auth. That is how I read it, at least. Ref: https://github.com/cloudflare/cfssl/blob/master/doc/cmd/cfssl.txt
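To make the profile difference concrete, here is a self-contained sketch that swaps in OpenSSL for cfssl (an assumption; the demo CA, the `issue`/`check` helper names and the file names are hypothetical). cfssl's "server auth" and "client auth" usages correspond to the serverAuth and clientAuth Extended Key Usages. The script issues one certificate with serverAuth only, like the "server" profile, and one with both, like the "peer" profile, then asks OpenSSL whether each is acceptable as a TLS client certificate:

```shell
# Throwaway demo in a temp dir; requires the openssl CLI.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Self-signed demo CA (CA:TRUE, allowed to sign certificates).
printf 'basicConstraints=critical,CA:TRUE\nkeyUsage=critical,keyCertSign,cRLSign\n' > ca.ext
openssl req -new -newkey rsa:2048 -nodes -keyout ca-key.pem \
  -subj '/CN=demo-ca' -out ca.csr 2>/dev/null
openssl x509 -req -in ca.csr -signkey ca-key.pem -days 1 \
  -extfile ca.ext -out ca.pem 2>/dev/null

# issue NAME EKU: sign a leaf for 127.0.0.1 with the given extendedKeyUsage.
issue() {
  printf 'basicConstraints=CA:FALSE\nkeyUsage=critical,digitalSignature,keyEncipherment\nextendedKeyUsage=%s\nsubjectAltName=IP:127.0.0.1\n' "$2" > "$1.ext"
  openssl req -new -newkey rsa:2048 -nodes -keyout "$1-key.pem" \
    -subj '/CN=etcd' -out "$1.csr" 2>/dev/null
  openssl x509 -req -in "$1.csr" -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
    -days 1 -extfile "$1.ext" -out "$1.pem" 2>/dev/null
}
issue server-only serverAuth               # like cfssl's "server" profile
issue server-client serverAuth,clientAuth  # like cfssl's "peer" profile

# check NAME: is the cert acceptable for the TLS-client purpose? This is
# roughly what a server started with --client-cert-auth asks of a client.
check() {
  if out=$(openssl verify -CAfile ca.pem -purpose sslclient "$1.pem" 2>&1) &&
     printf '%s\n' "$out" | grep -q ': OK$'; then
    echo accepted
  else
    echo rejected
  fi
}
server_only=$(check server-only)
server_client=$(check server-client)
echo "serverAuth only:       $server_only"
echo "serverAuth+clientAuth: $server_client"
```

Only the second certificate should pass, which lines up with the observation that swapping in the peer certificate made the warning disappear.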
@hexfusion I agree. My confusion is why the etcd server needs client auth.
When I set --client-cert-auth=false with the server-auth certificate:
# server auth & --client-cert-auth=false
$ etcd --name infra0 --data-dir infra0 \
--client-cert-auth=false --trusted-ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem \
--advertise-client-urls https://127.0.0.1:2379 --listen-client-urls https://127.0.0.1:2379
2018-05-30 11:43:23.150450 I | etcdmain: etcd Version: 3.3.5
2018-05-30 11:43:23.150561 I | etcdmain: Git SHA: 70c872620
2018-05-30 11:43:23.150577 I | etcdmain: Go Version: go1.9.6
2018-05-30 11:43:23.150590 I | etcdmain: Go OS/Arch: linux/amd64
2018-05-30 11:43:23.150602 I | etcdmain: setting maximum number of CPUs to 4, total number of available CPUs is 4
2018-05-30 11:43:23.150699 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-05-30 11:43:23.151409 I | embed: listening for peers on http://localhost:2380
2018-05-30 11:43:23.151494 I | embed: listening for client requests on 127.0.0.1:2379
2018-05-30 11:43:23.152450 I | etcdserver: name = infra0
2018-05-30 11:43:23.152471 I | etcdserver: data dir = infra0
2018-05-30 11:43:23.152484 I | etcdserver: member dir = infra0/member
2018-05-30 11:43:23.152496 I | etcdserver: heartbeat = 100ms
2018-05-30 11:43:23.152516 I | etcdserver: election = 1000ms
2018-05-30 11:43:23.152529 I | etcdserver: snapshot count = 100000
2018-05-30 11:43:23.152550 I | etcdserver: advertise client URLs = https://127.0.0.1:2379
2018-05-30 11:43:23.153964 I | etcdserver: restarting member 8e9e05c52164694d in cluster cdf818194e3a8c32 at commit index 14
2018-05-30 11:43:23.154047 I | raft: 8e9e05c52164694d became follower at term 7
2018-05-30 11:43:23.154074 I | raft: newRaft 8e9e05c52164694d [peers: [], term: 7, commit: 14, applied: 0, lastindex: 14, lastterm: 7]
2018-05-30 11:43:23.158976 W | auth: simple token is not cryptographically signed
2018-05-30 11:43:23.161144 I | etcdserver: starting server... [version: 3.3.5, cluster version: to_be_decided]
2018-05-30 11:43:23.162710 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32
2018-05-30 11:43:23.163138 N | etcdserver/membership: set the initial cluster version to 3.3
2018-05-30 11:43:23.163261 I | etcdserver/api: enabled capabilities for version 3.3
2018-05-30 11:43:23.165712 I | embed: ClientTLS: cert = server.pem, key = server-key.pem, ca = , trusted-ca = ca.pem, client-cert-auth = false, crl-file =
2018-05-30 11:43:25.054746 I | raft: 8e9e05c52164694d is starting a new election at term 7
2018-05-30 11:43:25.054839 I | raft: 8e9e05c52164694d became candidate at term 8
2018-05-30 11:43:25.054875 I | raft: 8e9e05c52164694d received MsgVoteResp from 8e9e05c52164694d at term 8
2018-05-30 11:43:25.054908 I | raft: 8e9e05c52164694d became leader at term 8
2018-05-30 11:43:25.054930 I | raft: raft.node: 8e9e05c52164694d elected leader 8e9e05c52164694d at term 8
2018-05-30 11:43:25.056827 I | etcdserver: published {Name:infra0 ClientURLs:[https://127.0.0.1:2379]} to cluster cdf818194e3a8c32
2018-05-30 11:43:25.056909 I | embed: ready to serve client requests
2018-05-30 11:43:25.057110 E | etcdmain: forgot to set Type=notify in systemd service file?
2018-05-30 11:43:25.113424 I | embed: serving client requests on 127.0.0.1:2379
I found this issue while troubleshooting problems that arose during an etcd upgrade from 3.1.x to 3.2.x using kubeadm. After some debugging, I determined that the new (as of etcd 3.2.x) requirement for client usage on the serving certificate exists because the server certificate is also used as a client certificate by the gRPC gateway. This requirement doesn't appear to be documented in any of the places I would expect, such as:

Ideally, I would expect a configuration option to specify a separate client certificate for the gRPC gateway (and, tangentially, to specify separate client and server certificates for peer communication as well).
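A quick way to check whether a serving certificate meets this requirement is to inspect its Extended Key Usage. The sketch below assumes the openssl CLI is installed; `has_eku` and `has_any_eku` are hypothetical helper names, and falling back to `server.pem` is only illustrative. Note that Go's crypto/x509 treats a certificate with no EKU extension at all as valid for any usage, so the problem case is a certificate that lists EKUs but not clientAuth:

```shell
# has_eku CERT NAME: succeed if CERT's Extended Key Usage lists NAME, using
# the names `openssl x509 -text` prints (e.g. "TLS Web Client Authentication").
has_eku() {
  openssl x509 -in "$1" -noout -text 2>/dev/null |
    grep -A1 'X509v3 Extended Key Usage' |
    grep -q "$2"
}

# has_any_eku CERT: succeed if CERT carries an Extended Key Usage extension.
has_any_eku() {
  openssl x509 -in "$1" -noout -text 2>/dev/null |
    grep -q 'X509v3 Extended Key Usage'
}

# The gateway presents the serving cert (--cert-file) as its client cert, so
# with --client-cert-auth that cert must be usable for client auth too.
cert=${ETCD_CERT_FILE:-server.pem}   # illustrative default path
if [ -f "$cert" ]; then
  if ! has_any_eku "$cert"; then
    echo "$cert: no EKU extension (Go treats it as valid for any usage)"
  elif has_eku "$cert" 'TLS Web Client Authentication'; then
    echo "$cert: clientAuth present, usable by the gateway's internal client"
  else
    echo "$cert: clientAuth missing; expect 'tls: bad certificate' with --client-cert-auth"
  fi
fi
```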
TL;DR, how to fix the issue:
ca-config.json: add "client auth" to the usages of the "server" profile
Regenerate the cert
Check server certificate: (I copied it to /etc/etcd/server.pem)
Environment vars:
Run etcd
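The steps above might look roughly like the following sketch. The cfssl commands mirror the ones at the top of this issue; the CSR file name `server.json`, the `/etc/etcd/` paths and the expiry values are assumptions, and the `ETCD_*` variables rely on etcd reading each `--some-flag` from an `ETCD_SOME_FLAG` environment variable:

```shell
# 1. ca-config.json: the "server" profile now also carries "client auth",
#    because etcd 3.2+ reuses the serving cert as a client cert (gRPC gateway).
cat > ca-config.json <<'EOF'
{
  "signing": {
    "default": { "expiry": "43800h" },
    "profiles": {
      "server": { "expiry": "43800h", "usages": ["signing", "key encipherment", "server auth", "client auth"] },
      "peer": { "expiry": "43800h", "usages": ["signing", "key encipherment", "server auth", "client auth"] }
    }
  }
}
EOF

# 2. Regenerate the server certificate (only if cfssl and the inputs exist).
if command -v cfssl >/dev/null 2>&1 && [ -f ca.pem ] && [ -f ca-key.pem ] && [ -f server.json ]; then
  cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json \
    -profile=server server.json | cfssljson -bare server
  # 3. Check: both "TLS Web Server Authentication" and
  #    "TLS Web Client Authentication" should be listed.
  openssl x509 -in server.pem -noout -text | grep -A1 'Extended Key Usage'
fi

# 4. Environment variables (assumed paths under /etc/etcd/).
export ETCD_CERT_FILE=/etc/etcd/server.pem
export ETCD_KEY_FILE=/etc/etcd/server-key.pem
export ETCD_TRUSTED_CA_FILE=/etc/etcd/ca.pem
export ETCD_CLIENT_CERT_AUTH=true
export ETCD_LISTEN_CLIENT_URLS=https://127.0.0.1:2379
export ETCD_ADVERTISE_CLIENT_URLS=https://127.0.0.1:2379

# 5. Run etcd; the TLS settings come from the variables above, e.g.:
#   etcd --name infra0 --data-dir infra0
```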
Btw, even after the issue was fixed, I still see a lot of messages like this in the log:
embed: rejected connection from "35.111.222.111:41886" (error "EOF", ServerName "")
I suspect they could be related to health checks from a Network Load Balancer.
@JinsYin Regarding your confusion about server and client auth, here is the up-to-date documentation on etcd TLS setup: example 1 covers the case without "client-cert-auth", and example 2 covers "client-cert-auth" set to true. Thanks to @KIVagant for the detailed demo! @KIVagant, regarding the "embed: rejected connection from "35.111.222.111:41886" (error "EOF", ServerName "")" messages in your comment, may I ask whether you are running etcd in k8s? There is a bug in k8s that can lead to that. If you are, I will add more details; never mind if not.
@wenjiaswe, I'm preparing etcd for K8s, but there is nothing else except etcd, its network load balancer and a bastion host in my test Google Cloud environment. Still, I see 4 or 5 different IP addresses trying to connect to etcd, so I still don't really know where they come from.
Yes, it's just a clean, isolated installation of etcd.
I'm also seeing a ton of "rejected connection" errors running vanilla etcd (likewise preparing for k8s) on EC2. I've been following the instructions here: https://github.com/kelseyhightower/kubernetes-the-hard-way/blob/master/docs/04-certificate-authority.md and I suspect that one or more of my certificates are missing something or need to be tweaked. Still digging into it, but any advice would be much appreciated.
@mindcrime first of all, check whether the cluster works. I believe there is a big difference between a working cluster where something external tries to connect to the port, and a cluster whose nodes really can't join. In my case all nodes operate normally, and I can get member info and put messages.
Avoid issues like [1]:

WARNING: 2018/05/29 11:17:10 Failed to dial 127.0.0.1:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.

In the discussion there, the issue seems to be that etcd 3.2 started requiring the client usage for the server cert, which is (for some reason) used when connecting to the gRPC gateway [2,3].

[1]: etcd-io/etcd#9785 (comment)
[2]: etcd-io/etcd#9785 (comment)
[3]: https://github.com/etcd-io/etcd/blob/v3.3.10/Documentation/dev-guide/api_grpc_gateway.md
I ran into this as well. Adding client usage fixed it. I agree that there should be an option for a separate client certificate for this purpose, instead of hijacking the server certificate!
@KIVagant Edit: figured it out. Use the documentation from Kubernetes here: You want to utilize the
Having trouble deploying etcd-cluster on k8s using Bitnami charts; I'm getting a lot of
It seems that I made a mistake when addressing etcd from etcdctl within the pod, I
This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.
Yes, you need to specify the --cacert ./ca.crt, --cert ./server.crt and --key ./server.key flags for it to work. Looks like you figured it out. I am closing this issue.
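For reference, the client side of that could look like the sketch below. With etcd 3.3, `etcdctl` defaults to the v2 API, so `ETCDCTL_API=3` is needed for the `--cacert`/`--cert`/`--key` flags; the endpoint and file paths are the ones used in this thread, and `etcdctl_tls` is a hypothetical wrapper name:

```shell
# etcd 3.3's etcdctl defaults to the v2 API; the flags below are v3 flags.
export ETCDCTL_API=3

# Hypothetical wrapper: every call presents the CA, cert and key named above.
etcdctl_tls() {
  etcdctl --endpoints=https://127.0.0.1:2379 \
    --cacert ./ca.crt --cert ./server.crt --key ./server.key "$@"
}

# Only try to talk to a cluster if etcdctl is installed.
if command -v etcdctl >/dev/null 2>&1; then
  etcdctl_tls endpoint health || echo "endpoint health check failed"
  etcdctl_tls member list || echo "member list failed"
fi
```

The same wrapper also covers the earlier advice to first check whether the cluster itself works before chasing "rejected connection" log lines.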
I still have this problem. I have added client auth usage to the etcd server certificate so that it can be used for client certificate authentication, but the following error still occurs:
I have the issue |
I refer to the following two articles:
Initialize a certificate authority
Generate server certificate
Etcd Server