[kubeadm control plane]: etcd communication errors are being swallowed #2454

sethp-nr · 2020-02-26T17:02:59Z

What steps did you take and what happened:

While testing an upgrade, the etcd health checks were failing repeatedly. With the code from #2451 in place, I could narrow it down by one level:

failed to create etcd client: unable to create etcd client: context deadline exceeded

After some work, I found that my etcd CA secret had been regenerated, changing the private key (see #2455). It seems that gRPC reports exactly one error message when the connection is misconfigured: "context deadline exceeded". I haven't yet found a way to get more information about what happened via the API, but I'm continuing to dig.

What did you expect to happen:

When I reproduced the same condition with etcdctl and kubectl port-forward, I got a helpful error message:

{"level":"warn","ts":"2020-02-25T20:50:29.757-0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-f67a0407-f684-4682-bb79-ec33c94b2178/127.0.0.1:63477","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = latest connection error: connection error: desc = \"transport: authentication handshake failed: remote error: tls: bad certificate\""}
Error: context deadline exceeded

Note that "Error: context deadline exceeded" is what came back from clientv3.New; the other line is a log statement printed to stderr.

Anything else you would like to add:

I found that any error returned from the proxy dial function was being swallowed in the same way. It also looks like we're not using the errorStream we set up with the API server, so we may be missing important information about the proxy connection.

Environment:

Cluster-api version: master
Minikube/KIND version: kind v0.7.0 go1.13.6 darwin/amd64
Kubernetes version (use kubectl version): a mix of v1.15 and v1.16 control plane nodes
OS (e.g. from /etc/os-release): ubuntu

/kind bug
/assign
/lifecycle active


vincepri · 2020-02-26T17:59:57Z

/milestone v0.3.0

vincepri · 2020-03-09T20:34:50Z

/milestone v0.3.x

vincepri · 2020-03-09T20:35:11Z

Bumping this, given that it seems the upstream gRPC fix might not go in soon.

vincepri · 2020-04-30T23:12:31Z

@sethp-nr Not sure if you saw this response, should we leave things as they are if the upstream fix isn't merged?

sethp-nr · 2020-05-01T23:53:57Z

It ended up going in as an option under a different PR, after some discussion in the linked thread (culminating in a comment on grpc/grpc-go#2031).

There's still some work left: getting it into a released version, getting etcd to be compatible with that version, and picking that version up here. Then we can finally replace WithBlock with WithReturnConnectionError and see clearer errors.

I'm working that in fits and starts when I have time, but if someone else wanted to do it I would not get in their way.

fejta-bot · 2020-07-30T23:58:01Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

vincepri · 2020-07-31T15:09:49Z

/lifecycle frozen

vincepri · 2020-07-31T15:11:35Z

The PR adding WithReturnConnectionError has been merged and is available as of grpc-go v1.30.0.

/milestone v0.4.0

vincepri · 2020-07-31T15:11:49Z

/help

k8s-ci-robot · 2020-07-31T15:11:50Z

@vincepri:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

vincepri · 2020-07-31T15:11:58Z

/priority important-longterm

vincepri · 2021-10-19T14:13:58Z

/milestone v1.0

sbueringer · 2022-02-18T18:35:33Z

/assign @killianmuldoon
to re-assess the current state

timoreimann · 2022-02-19T07:12:10Z

FWIW, some time ago I submitted #4997, which addressed at least one occurrence of error hiding/swallowing. I'm not sure whether this bug report covers more than that.

fabriziopandini · 2022-09-30T19:18:01Z

/close

until we get more evidence that there are still other occurrences of this error after #4997 merged

k8s-ci-robot · 2022-09-30T19:18:05Z

@fabriziopandini: Closing this issue.

In response to this:

/close

until we get more evidence that there are still other occurrences of this error after #4997 merged

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k8s-ci-robot assigned sethp-nr Feb 26, 2020

k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. labels Feb 26, 2020

sethp-nr mentioned this issue Feb 26, 2020

[kubeadm control plane] upgrade: etcd CA was regenerated #2455

Closed

sethp-nr changed the title from "KubeadmControlPlane: etcd communication errors are being swallowed" to "[kubeadm control plane]: etcd communication errors are being swallowed" Feb 26, 2020

k8s-ci-robot added this to the v0.3.0 milestone Feb 26, 2020

This was referenced Feb 27, 2020

🏃 refactor the etcd client in the cluster object #2470

Merged

🏃 Add unit tests for health check function #2472

Merged

This was referenced Feb 28, 2020

grpc provides uninformative error messages, even when set to "block" grpc/grpc-go#3406

Closed

🐛 etcd client terseness #2486

Closed

k8s-ci-robot modified the milestones: v0.3.0, v0.3.x Mar 9, 2020

vincepri removed the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Apr 30, 2020

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 30, 2020

k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 31, 2020

k8s-ci-robot modified the milestones: v0.3.x, v0.4.0 Jul 31, 2020

k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Jul 31, 2020

k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jul 31, 2020

k8s-ci-robot modified the milestones: v0.4, v1.0 Oct 19, 2021

vincepri modified the milestones: v1.0, v1.1 Oct 22, 2021

fabriziopandini modified the milestones: v1.1, v1.2 Feb 3, 2022

k8s-ci-robot assigned killianmuldoon Feb 18, 2022

fabriziopandini added the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022

fabriziopandini removed this from the v1.2 milestone Jul 29, 2022

fabriziopandini removed the triage/accepted Indicates an issue or PR is ready to be actively worked on. label Jul 29, 2022

k8s-ci-robot closed this as completed Sep 30, 2022
