workcluster (one control plane + one node) upgrade #4493
@chymy: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
@chymy It does. Our upgrade test uses 1 control plane node, a MachineDeployment with 3 replicas, and a MachinePool with 3 replicas. This is tested for:
Test runs: https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api#capi-e2e-main-1-18-1-19. I think we'll need more details to find out why it fails in your case.
Thanks @sbueringer, it should be related to the infrastructure provider.
Hi @sbueringer, regarding cluster-api/controlplane/kubeadm/controllers/scale.go, lines 118 to 142 at 46dc43a:
I created a workcluster. Thanks.
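For readers without the snippet inline: a minimal Go sketch of the scale-down ordering the quoted scale.go span is discussing, i.e. forward etcd leadership, remove the etcd member, update the kubeadm ConfigMap, then delete the machine. The interface and helper names below are illustrative, not the actual CAPI API:

```go
package controlplane

import (
	"context"
	"fmt"
)

// workloadCluster is an illustrative stand-in for the client that the
// controller uses to talk to the workload cluster through the load balancer.
type workloadCluster interface {
	ForwardEtcdLeadership(ctx context.Context, machine string) error
	RemoveEtcdMember(ctx context.Context, machine string) error
	RemoveMachineFromKubeadmConfigMap(ctx context.Context, machine string) error
}

// scaleDown mirrors the order of operations under discussion. Every call
// after RemoveEtcdMember still goes through the load balancer, so it can be
// routed to the apiserver whose local etcd member was just removed.
func scaleDown(ctx context.Context, w workloadCluster, machine string, deleteMachine func(context.Context, string) error) error {
	if err := w.ForwardEtcdLeadership(ctx, machine); err != nil {
		return fmt.Errorf("forwarding etcd leadership: %w", err)
	}
	if err := w.RemoveEtcdMember(ctx, machine); err != nil {
		return fmt.Errorf("removing etcd member: %w", err)
	}
	// This API call can fail if the LB routes it to the apiserver that just
	// lost its etcd member -- the failure mode reported in this issue.
	if err := w.RemoveMachineFromKubeadmConfigMap(ctx, machine); err != nil {
		return fmt.Errorf("removing machine from kubeadm ConfigMap: %w", err)
	}
	// Machine deletion is last, after all component-level cleanup.
	return deleteMachine(ctx, machine)
}
```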
While removing the etcd member, we should also stop the apiserver.
Let's reopen the issue for now to discuss this point. I cannot really say why the code is as it is right now, so let's see what others will say. /reopen
@sbueringer: Reopened this issue. In response to this:
After testing: if the static pod YAML of the apiserver is removed first, before removing the etcd member, then deleting the control plane node succeeds. So as long as we make sure the apiserver stops and doesn't start again, the load balancer will no longer connect to the old control plane node.
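As a rough illustration of this workaround (assuming the standard kubeadm static pod path; this is not CAPI code), moving the manifest out of the directory the kubelet watches stops the apiserver for good, so the load balancer's health checks can start failing for the node before the etcd member is removed:

```go
package main

import (
	"fmt"
	"os"
)

// stopStaticAPIServer moves the kube-apiserver static pod manifest out of
// the kubelet's manifests directory. The kubelet watches this directory and
// stops the static pod when its manifest disappears, so the apiserver stays
// down and the load balancer can drop the node on failed health checks.
func stopStaticAPIServer() error {
	const manifest = "/etc/kubernetes/manifests/kube-apiserver.yaml"
	if err := os.Rename(manifest, manifest+".disabled"); err != nil {
		return fmt.Errorf("moving apiserver manifest: %w", err)
	}
	return nil
}

func main() {
	if err := stopStaticAPIServer(); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```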
/retitle workcluster (one control plane + one node) upgrade
After testing, the relevant code is cluster-api/controlplane/kubeadm/controllers/scale.go, lines 118 to 142 at 46dc43a.
/assign @sbueringer A good opportunity here would also be to write a single-node control plane upgrade e2e test in our periodics.
@vincepri I just reviewed the whole issue again. We already have e2e tests which cover this: our regular upgrade tests (#4493 (comment)). I don't think I've ever seen this issue in CI. Short summary (from my point of view) of what our code does right now:
The problem seems to be that after removing the etcd member, the API server on the same machine goes down, and the load balancer still sends requests to that node even though the API server is down (which I guess can always happen until the machine is dropped from the LB pool). So I assume this maybe doesn't happen in CI because CAPD is relatively fast. I wonder why this isn't simply retried. We could also change the order; I think that's safe, although I'm not entirely sure. Overall, I think our code has to be robust enough to handle occasionally failed requests (and it usually is, because we just retry failed reconciles; see the sketch below). So changing the order would just be hacking around the problem (imho). @fabriziopandini @vincepri WDYT?
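For context, the retry behavior referred to here comes from the controller-runtime contract: a Reconcile that returns an error is requeued with exponential backoff, so a transient apiserver failure through the LB is retried on a later reconcile. A minimal sketch (the reconciler and helper names are illustrative):

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
)

// Reconciler is an illustrative reconciler; removeEtcdMember stands in for
// the real call that goes to the workload cluster through the load balancer.
type Reconciler struct{}

func (r *Reconciler) removeEtcdMember(ctx context.Context, req ctrl.Request) error {
	// ... talk to the workload cluster API via the LB ...
	return nil
}

// Reconcile shows the controller-runtime contract: returning a non-nil error
// requeues the request with exponential backoff, which is how a transient
// apiserver failure gets retried on a later reconcile.
func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	if err := r.removeEtcdMember(ctx, req); err != nil {
		return ctrl.Result{}, err
	}
	return ctrl.Result{}, nil
}
```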
I don't see a problem in changing the order (also, kubeadm doesn't use the list of etcd endpoints in the ConfigMap anymore since 1.18, and it was removed entirely in 1.22; see #4423). However, I agree we should understand where our code is not re-entrant.
After removing the etcd member, the API server on the same machine goes down, but the API server status is:

As can be seen from the above, the etcd member has been removed, and the API server pod status is:
@chymy Looks like the retry works as expected. I would have expected a 50% success ratio with 2 control plane nodes. Which provider are you using? I'm a bit confused:
Is that because the API server is still up but all requests fail because it cannot communicate with the etcd on the same host?
I use OpenStack. Sorry, yes, the API server is still up (not down). The connections between the LB and the API server are long-lived, so it seems that retrying the request does not work.
@fabriziopandini What do you think? I would have assumed our retries would succeed eventually, because we should hit the still-working API server at some point, but that doesn't seem to be the case. Ideally the load balancer would either drop the non-working API server at some point because of failing health checks, or we would sooner or later get the request through to the working API server. As that's not the case, it would be fine for me to change the order to mitigate the issue (as it's not a problem for us to change it).
In fact, by changing the order, the LB can detect that the old apiserver is no longer working, so it can connect to the new apiserver.
Just out of curiosity (I don't really know how this stuff works): how does that work? I thought we only remove the server from the ConfigMap. How does that signal to the load balancer that the API server no longer works?
I think |
TBH, I think something should be tuned in the LB configuration, given that according to the log above it can't detect that the API server failed within 4 minutes, and all the calls go to a single endpoint (stickiness?). This is a problem no matter the order of the three operations we are discussing, because other API server calls are going to happen for sure immediately after. However, as per #4493 (comment), I don't see particular problems in the proposed change, and if this helps in making our code a little bit more robust for such use cases...
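To make the LB-tuning point concrete: a load balancer with active health checks probes each apiserver and evicts failing backends, instead of pinning long-lived sticky connections to a dead one. A toy Go prober sketch (the endpoints are made up, not from this issue; a real LB would do this through its own health check configuration):

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
	"time"
)

// probe checks one apiserver's /healthz endpoint. A real load balancer would
// run this on a short interval and evict a backend after a few consecutive
// failures, rather than keeping sticky long-lived connections to it.
func probe(endpoint string) bool {
	client := &http.Client{
		Timeout: 2 * time.Second,
		// Verification is skipped only because this is a toy prober; a real
		// health check would trust the cluster CA instead.
		Transport: &http.Transport{TLSClientConfig: &tls.Config{InsecureSkipVerify: true}},
	}
	resp, err := client.Get(endpoint + "/healthz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	// Hypothetical control plane endpoints, for illustration only.
	for _, ep := range []string{"https://10.0.0.10:6443", "https://10.0.0.11:6443"} {
		fmt.Printf("%s healthy: %v\n", ep, probe(ep))
	}
}
```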
Sorry, I missed something before. I think it might be dangerous to change the order. The new code would be:
This means (imho) that we have a race condition between deletion of the server and moving etcd leadership. I'm not sure that's really a good idea. Forwarding etcd leadership is probably really fast in most cases (not sure what might slow it down), but if the server deletion is faster, this can probably lead to some really bad effects (?). EDIT: an obvious option I missed:
But I'm not sure that's a good idea either. It feels safer/cleaner to first remove the etcd member and then the machine/server (when using kubeadm in the usual case without CAPI, you also first remove the member via
@fabriziopandini what do you think, should we still change the order? |
The only change of order that makes sense IMO is
Because deleting the machine should always be called after completing all the actions on the components hosted on that machine. However, the context and limitations of this fix described in #4493 (comment) still hold (most probably there is something to be fixed in the LB configuration, because if it takes a long time to detect problems we are going to incur errors on delete or on subsequent API calls).
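In code terms, the invariant stated here is simply that machine deletion is the strictly final step, gated on every action against the machine's hosted components having completed. A hedged sketch (names illustrative, not CAPI code):

```go
package controlplane

import (
	"context"
	"fmt"
)

// scaleDownMachine encodes the invariant: all actions on components hosted
// on the machine (etcd member removal, kubeadm ConfigMap update, ...) must
// complete before the machine itself is deleted.
func scaleDownMachine(
	ctx context.Context,
	cleanupSteps []func(context.Context) error,
	deleteMachine func(context.Context) error,
) error {
	for i, step := range cleanupSteps {
		if err := step(ctx); err != nil {
			// Bail out before deletion; the failed reconcile is retried.
			return fmt.Errorf("cleanup step %d failed: %w", i, err)
		}
	}
	// Deletion is strictly last, avoiding a race between removing the server
	// and in-flight component actions such as moving etcd leadership.
	return deleteMachine(ctx)
}
```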
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs. This bot triages issues and PRs according to the following rules:
You can:
Please send feedback to sig-contributor-experience at kubernetes/community. /lifecycle stale
/area control-plane |
Given that this does not seem to be a problem, and the issue in this thread is probably related to LB misconfiguration
@fabriziopandini: Closing this issue. In response to this:
I created a workcluster: one master + one node. Upgrading the workcluster always fails. Does CAPI not support the upgrade for a single master?

Environment:
- Kubernetes version (use `kubectl version`):
- OS (e.g. from `/etc/os-release`):

/kind bug
/help