
Improvements for etcd liveness probes #2567

Closed
randomvariable opened this issue Sep 10, 2021 · 25 comments · Fixed by kubernetes/kubernetes#110744
Labels: area/etcd, kind/bug, kind/feature, lifecycle/active, priority/important-longterm

Comments

@randomvariable
Member

randomvariable commented Sep 10, 2021

What keywords did you search in kubeadm issues before filing this one?

This is related to kubernetes/kubernetes#96886
and etcd-io/etcd#13340

Is this a BUG REPORT or FEATURE REQUEST?

FEATURE REQUEST

Versions

kubeadm version (use kubeadm version): v1.22, etcd v3.5.0

What happened?

Under certain cluster conditions, such as an entire cluster being powered off and on again, you may not want etcd pods restarted while no raft leader is present, since leader election is still taking place. There are more details in etcd-io/etcd#13340, where we have requested either that a lightweight /ready endpoint be added to etcd or that /health accept additional query parameters allowing us to relax the constraints.

Once that's in place, downstream consumers can use the JSON patch method to patch the etcd static pod (e.g. kubernetes-sigs/cluster-api#4874).
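
As a rough illustration (not a confirmed implementation), such a patch could look like the sketch below. The file name etcd+json.yaml, the container index, and the /health?serializable=true value are assumptions for the example; the query parameter is the form this thread later settles on:

    # etcd+json.yaml -- hypothetical patch file for kubeadm's --patches directory
    # (an RFC 6902 JSON patch written as YAML). It assumes the etcd container is
    # the first container in the static pod and that the relaxed health check is
    # exposed as /health?serializable=true.
    - op: replace
      path: /spec/containers/0/livenessProbe/httpGet/path
      value: /health?serializable=true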

However, we may also want to change the defaults for kubeadm, but we should first do some modelling of which state transitions and cluster conditions we care about.

Additionally, I think we will also want to relax the consistency constraints as part of learner mode adoption.

This is mostly a tracking issue for possible changes to etcd that we can consume.

What you expected to happen?

Turned-off clusters can be restarted

How to reproduce it (as minimally and precisely as possible)?

I have a feeling this is also related to kubernetes-sigs/kind#1689, so improvements may be testable there.


TODOs

1.25:

@neolit123 neolit123 added area/etcd kind/feature Categorizes issue or PR as related to a new feature. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Sep 10, 2021
@neolit123 neolit123 added this to the v1.23 milestone Sep 10, 2021
@pacoxu
Member

pacoxu commented Sep 13, 2021

Adding /health?exclude=NORAFTLEADER&consistency=s seems to be the current workaround for kubeadm.

@neolit123
Member

neolit123 commented Sep 13, 2021 via email

@pacoxu
Member

pacoxu commented Sep 13, 2021

How to easily reproduce it with a simple script?

@randomvariable
Member Author

I think starting an HA kind cluster, and then shutting down the containers and restarting them, might be sufficient. I have a feeling it's related to kubernetes-sigs/kind#1689
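
A rough sketch of that reproduction path (the file name and docker filters are assumptions, not a verified recipe):

    # kind-ha.yaml -- hypothetical config for a 3 control-plane kind cluster
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
    - role: control-plane
    - role: control-plane
    - role: worker
    # create the cluster:        kind create cluster --config kind-ha.yaml
    # stop all node containers:  docker stop $(docker ps -q --filter "name=control-plane")
    # start them again:          docker start $(docker ps -aq --filter "name=control-plane")
    # then watch whether the etcd static pods get restarted by their liveness probes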

@neolit123
Member

looks like etcd 3.6 will have the missing feature that we wanted.
etcd-io/etcd#13340

we are in code freeze for 1.23, but this seems like something that can be added once 1.24 starts and backported to older releases (assuming it's not a big diff in kubeadm).

@neolit123 neolit123 modified the milestones: v1.23, v1.24 Nov 22, 2021
@palnabarun
Member

palnabarun commented Nov 23, 2021

It also depends on when etcd 3.6 would be released. Does etcd follow a defined release timeline? Unless I am missing something, I couldn't find much by searching the web.

PS: If etcd 3.6 gets released before Code Freeze for Kubernetes 1.24, or whenever the next version of etcd ships the above feature, I would like to work on this.

@neolit123
Member

Etcd does not have a fixed cadence for minors AFAIK.
It took two years between 3.4 and 3.5 apparently.
But prior releases were only one year apart.

Judging from that I don't think this will align with 1.24 k8s.

@palnabarun
Member

Can we ask the etcd team if they would like to cut a patch release with the said feature?

@neolit123
Member

I don't think they will agree to that, but it's worth a try if someone wants to do it.

@ahrtr
Member

ahrtr commented Feb 17, 2022

FYI: backporting the PR to etcd 3.5: etcd-io/etcd#13706

@ahrtr
Member

ahrtr commented Feb 17, 2022

cc @serathius

@neolit123
Member

@ahrtr great. If the etcd backport is accepted we can try backporting a kubeadm etcd bump.

@pacoxu
Member

pacoxu commented Feb 22, 2022

The backport PR to v3.5, etcd-io/etcd#13706, was merged, so now we can wait for etcd v3.5.3.

@ahrtr
Member

ahrtr commented Feb 22, 2022

Yes, the PR was merged, and I just submitted another PR, etcd-io/etcd#13725, to update the 3.5 changelog.

@neolit123
Member

etcd bump to 3.5.3 merged in 1.24 /master.
kubernetes/kubernetes#109471

here are backports for 1.23 and 1.22:
kubernetes/kubernetes#109533
kubernetes/kubernetes#109532

After/if these merge, we would want to backport and enable the new probes conditionally, for kubeadm versions that use 3.5.3.

@neolit123
Member

neolit123 commented Apr 18, 2022

I think what we have to do to enable the new check is the following change in the kubeadm etcd manifest:

    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
-        path: /health
+        path: /health/serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15

but it must be done only for k8s control plane version >= 1.22 (if the above cherry picks merge, that is)

@neolit123
Member

neolit123 commented May 11, 2022

once we add etcd 3.5.3 to the support skew, would anyone like to work on this?

3.5.3 backports:
kubernetes/kubernetes#109471
kubernetes/kubernetes#109532
kubernetes/kubernetes#109533

@neolit123 neolit123 self-assigned this May 16, 2022
@neolit123 neolit123 added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label May 16, 2022
@neolit123
Member

1.25 PR is here:
kubernetes/kubernetes#110072

@neolit123 neolit123 added the kind/bug Categorizes issue or PR as related to a bug. label May 16, 2022
@neolit123
Member

neolit123 commented May 16, 2022

@ahrtr

I've tested the new HTTP endpoint with etcd 3.5.3 and it seems to be failing.

EDIT: NVM, as @VirrageS pointed out on the PR, it should be /health?serializable=true, i.e. serializable is a query parameter.
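
For reference, a sketch of the corrected probe stanza with the query-parameter form (port and thresholds copied from the kubeadm manifest shown above):

    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        # serializable is a query parameter, not a path segment
        path: /health?serializable=true
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15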

@ahrtr
Member

ahrtr commented May 16, 2022

The reason is that the PR etcd-io/etcd#13525 isn't backported to 3.5.

It's a little subtle. Let's work through an example, assuming an etcd cluster with 3 members. There are two cases here:

  1. In the beginning, all 3 members (or at least 2 members) are working well. Somehow 1 or 2 members go down for whatever reason. The remaining member(s) can still serve serializable requests.
  2. In the beginning, all 3 members (or at least 2 members) are working well. You stop all 3 members and only restart 1 of them afterwards. Since the quorum isn't satisfied, the member will not serve client requests, not even serializable requests, until the quorum is restored. I believe this is your case. Please double-confirm.

Probably we need to backport the PR to 3.5 as well. But it's an enhancement, so I need to discuss it with the other etcd maintainers. Please also let me know whether you really need it, or whether you can adjust/update the K8s test to adapt to case 1.

@neolit123
Member

@ahrtr

The reason is that the PR etcd-io/etcd#13525 isn't backported to 3.5.

The issue that I saw earlier was due to a mistake on my end: using /health/serializable=true instead of /health?serializable=true. After switching to /health?serializable=true, the startup / liveness probes work as expected, at least for a new cluster with healthy members.

In the beginning, all 3 members ( or at least 2 members) are working well. If you stop all the 3 members, and only restart 1 of them afterwards. Since the quorum isn't satisfied, so the member will not serve client requests, even serializable requests, until the quorum is satisfied. I believe this should be your case. Please double confirm.

that sounds like the scenario the OP describes here.

Probably we need to backport the PR to 3.5 as well. But it's an enhancement, I need to discuss with other etcd maintainers. Please also let me know whether you really need it, or probably you adjust/update K8s test to adapt to case 1?

it's unclear to me if these additional changes are needed or not.

@ahrtr
Member

ahrtr commented May 16, 2022

The key point is etcd can't finish the bootstrap/startup process if the quorum isn't satisfied. So it can't serve any client requests, even serializable requests. It is exactly what the PR 13525 fixed.

But once etcd finishes the bootstrap/startup process, it can continue to serve serializable requests even if the quorum isn't satisfied.

@neolit123
Member

neolit123 commented May 16, 2022

The key point is etcd can't finish the bootstrap/startup process if the quorum isn't satisfied. So it can't serve any client requests, even serializable requests. It is exactly what the PR etcd-io/etcd#13525 fixed.

this sounds like a good argument for the backport of etcd-io/etcd#13525

@ahrtr
Member

ahrtr commented May 16, 2022

Let me submit a PR for the backport and get feedback from the other maintainers.

@neolit123
Member

neolit123 commented Jun 23, 2022

@ahrtr
PTAL at this for LGTM:
kubernetes/kubernetes#110744

NOTE: it closes this issue because there isn't much else we can do here.

it's following your recommendation here:
etcd-io/etcd#14048 (comment)
