etcd 3.2.x upgrade fails with tls: bad certificate"; please retry. #910

Closed
neolit123 opened this issue Jun 11, 2018 · 5 comments · Fixed by kubernetes/kubernetes#65020
Labels: area/upgrades, kind/bug, lifecycle/active, priority/critical-urgent, sig/cluster-lifecycle
Milestone: v1.11

Comments

@neolit123
Member

The etcd upgrade at HEAD is failing with:
`WARNING: 2018/06/11 19:24:21 Failed to dial 127.0.0.1:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.`

The initial hint was about the SAN and CN fields in the generated certificates.

refs:
https://github.com/coreos/etcd/blob/master/CHANGELOG-3.2.md#security-authentication-1
https://github.com/coreos/etcd/blob/master/CHANGELOG-3.2.md#security-authentication-8
etcd-io/etcd#8603
https://bugzilla.redhat.com/show_bug.cgi?id=1565762

@neolit123 neolit123 added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. area/upgrades lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Jun 11, 2018
@neolit123 neolit123 added this to the v1.11 milestone Jun 11, 2018
@detiber
Member

detiber commented Jun 11, 2018

/assign

@detiber
Member

detiber commented Jun 11, 2018

Using kubeadm from master, I can replicate this warning in the etcd pod logs, but the cluster is otherwise fully functional: `sudo ./kubeadm init --kubernetes-version=v1.11.0-beta.2`

@neolit123 neolit123 added the kind/bug Categorizes issue or PR as related to a bug. label Jun 11, 2018
@timothysc
Member

xref: kubernetes/kubernetes#64988

@neolit123
Member Author

@kubernetes/sig-cluster-lifecycle-bugs

is this related to the problem here?
etcd-io/etcd#9785 (comment)

@k8s-ci-robot k8s-ci-robot added the sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. label Jun 12, 2018
@neolit123
Member Author

from @detiber on Slack:

the tls warning is coming from the embedded grpc gateway that etcd configures... more info in a bit

Ok, so the grpc gateway is attempting to use the server certificate as a client certificate to interact with etcd, which causes the tls warning in the logs (and also leaves the grpc gateway non-functional).
The fix is to create the etcd server certificate with both server and client usages, similar to the peer certificate.
This shouldn't affect anything within k8s or etcdctl because they use grpc directly, but it could affect how end users interact with and debug their etcd deployment.

I've also verified that the grpc gateway is non-functional before the fix and is functional after the fix

here is the offending code if you are curious:
https://github.com/coreos/etcd/blob/v3.2.18/embed/serve.go#L103-L104

Adding both the server auth and client auth usages to the etcd server certificate profile seems to fix this.

k8s-github-robot pushed a commit to kubernetes/kubernetes that referenced this issue Jun 12, 2018
Automatic merge from submit-queue (batch tested with PRs 64862, 65020). If you want to cherry-pick this change to another branch, please follow the instructions [here](https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md).

kubeadm - fix local etcd grpc gateway

**What this PR does / why we need it**:
etcd 3.2 uses the server certificate as the client cert for the grpc
gateway. This PR updates the generation of the etcd server certificate to
add the client usage, which resolves the issue.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes/kubeadm#910

**Release note**:
```release-note
NONE
```