
MachineDeployment Status should take MachineHealthCheck into account when computing final status #7533

Closed
nehagjain15 opened this issue Nov 10, 2022 · 9 comments · Fixed by #8464
Labels
kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API.
kind/bug Categorizes issue or PR as related to a bug.
triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@nehagjain15

What steps did you take and what happened:
The cluster was provisioned with version 1.22.9+.
The cluster was then upgraded to v1.23.8+. At the time the cluster was upgraded, the MachineDeployment status was Running, but MachineHealthCheck was still reporting some nodes as unhealthy.
After the k8s version upgrade, the control plane nodes were upgraded to the new version, but the MD was not updated with the new k8s version because of the following:

Seen in the Cluster Description:

  - lastTransitionTime: "2022-10-31T19:58:16Z"
    message: MachineDeployment(s) tkc-wc-default-nodepool-87m2h, tkc-wc-wcm-nodepool-8f399767-n854h
      upgrade to version v1.22.9+vmware.1 on hold. MachineDeployment(s) tkc-wc-wcm-nodepool-8f399767-n854h
      are rolling out
    reason: MachineDeploymentsUpgradePending
# kubectl get kcp -A
NAMESPACE   NAME           CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
testns      tkc-wc-rqh9z   tkc-wc    true          true                   1          1       1         0             44m   v1.22.9+vmware.1

# kubectl get md -A
NAMESPACE   NAME                                 CLUSTER   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
testns      tkc-wc-default-nodepool-87m2h        tkc-wc    2          2       2         0             Running   44m   v1.21.6+vmware.1
testns      tkc-wc-wcm-nodepool-8f399767-n854h   tkc-wc    2          1       1         1             Running   28m   v1.21.6+vmware.1

## Snippet from logs

I1031 15:22:37.003006       1 machinehealthcheck_controller.go:426] controller/machinehealthcheck "msg"="Target has failed health check, marking for remediation" "cluster"="tkc-wc" "name"="tkc-wc-wcm-nodepool-8f399767-n854h" "namespace"="testns" "reconciler group"="cluster.x-k8s.io" "reconciler kind"="MachineHealthCheck" "message"="Node failed to report startup in &Duration{Duration:2h0m0s,}" "reason"="NodeStartupTimeout" "target"="testns/tkc-wc-wcm-nodepool-8f399767-n854h/tkc-wc-wcm-nodepool-8f399767-n854h-6bdbcf6578-5t8hs/"

Currently, the MachineDeployment status reports Running for nodepool tkc-wc-wcm-nodepool-8f399767-n854h, but MHC clearly shows that not all expected machines are ready.

What did you expect to happen:
The MachineDeployment status should not be Running in the above scenario.
If the upgrade for a MachineDeployment is blocked based on MachineHealthCheck, then the status of the MachineDeployment should take that into account.

Anything else you would like to add:

Environment:

  • Cluster-api version: 1.1.3
  • minikube/kind version:
  • Kubernetes version: (use kubectl version):
  • OS (e.g. from /etc/os-release):

/kind bug

@k8s-ci-robot added the kind/bug and needs-triage labels on Nov 10, 2022
@sbueringer
Member

sbueringer commented Nov 11, 2022

Thx for opening the issue.

I think we might have an additional problem: in this situation, afaik, we can't create a new Machine with the old version to remediate.

@fabriziopandini
Member

fabriziopandini commented Nov 11, 2022

/triage accepted
Let's focus this issue on the status and what it should represent. (What @sbueringer highlighted is correct, but it's worth a separate discussion given that this limitation is rooted in kubeadm/image builder.)

I have some questions on the use case, because it shows the cluster status reporting an issue, while the ask is to consider MHC, which surfaces at the machine level with an annotation.

According to the current API, a machine is considered ready when the node has been created and is "Ready" (which does not include MHC failures at the current state).
Can we get the yaml of machines under tkc-wc-default-nodepool-87m2h?

Changing this is probably an API change.
/kind api-change
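
To make the current semantics concrete, here is a simplified sketch (not the actual Machine controller code) of what "ready" means today: the Node has been created and reports Ready, with no input from MHC.

```go
// Simplified, illustrative sketch of the semantics described above
// (not the actual Machine controller code): "ready" is derived from the
// Node existing and reporting Ready, without consulting MHC results.
package readiness

import (
	corev1 "k8s.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

// machineIsReady returns true once the Machine's Node has been created
// and the Node reports the Ready condition as True.
func machineIsReady(machine *clusterv1.Machine, node *corev1.Node) bool {
	if machine.Status.NodeRef == nil || node == nil {
		return false
	}
	for _, c := range node.Status.Conditions {
		if c.Type == corev1.NodeReady {
			return c.Status == corev1.ConditionTrue
		}
	}
	return false
}
```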

@k8s-ci-robot added the triage/accepted label and removed the needs-triage label on Nov 11, 2022
@fabriziopandini added the kind/api-change label on Nov 11, 2022
@nehagjain15
Author

@fabriziopandini I will repro the issue and attach the cluster and machine description.

@nehagjain15
Author

Reproduced the issue:
Steps:

  1. Create cluster with 1.21.6 version
  2. Add a new nodepool to this cluster tkc-wc-wcm-nodepool
  3. Upgrade cluster to 1.22.9
  4. Observed that the new node pool was still rolling out, so the MD was not upgraded, but the KCP was.

I will try to explain the problem I see; there could be different ways to fix it other than what has been asked for in the issue.

  1. At step 3, the MD was not updated because one nodepool was still being rolled out. Maybe the desired version for the CP should not be reconciled if an MD is still rolling out, as it can cause issues like in this case, where the MD is stuck in the rollout step because the new machine cannot join the cluster due to a kubeadm version mismatch:
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to decode cluster configuration data: no kind "ClusterConfiguration" is registered for version "kubeadm.k8s.io/v1beta3" in scheme "k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/scheme/scheme.go:31"
  2. If a rolling deployment can block the version upgrade for an MD, then the MD status should take that into account when reporting the Running status. If an end user looks at just the MachineDeployment, they might think everything is ready and that it is OK to perform other operations, whereas in reality it is not.
root@422cd89b861cec670c12eed9bd2b03b3 [ ~ ]# kubectl get cluster,kcp,md,machine -A
NAMESPACE   NAME                              PHASE         AGE   VERSION
testns      cluster.cluster.x-k8s.io/tkc-wc   Provisioned   16h   v1.22.9+vmware.1

NAMESPACE   NAME                                                             CLUSTER   INITIALIZED   API SERVER AVAILABLE   REPLICAS   READY   UPDATED   UNAVAILABLE   AGE   VERSION
testns      kubeadmcontrolplane.controlplane.cluster.x-k8s.io/tkc-wc-hvl7p   tkc-wc    true          true                   1          1       1         0             16h   v1.22.9+vmware.1

NAMESPACE   NAME                                                                    CLUSTER   REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE     AGE   VERSION
testns      machinedeployment.cluster.x-k8s.io/tkc-wc-default-nodepool-m86kj        tkc-wc    2          2       2         0             Running   16h   v1.21.6+vmware.1
testns      machinedeployment.cluster.x-k8s.io/tkc-wc-wcm-nodepool-fb5d4258-sbl26   tkc-wc    2          1       1         1             Running   16h   v1.21.6+vmware.1

NAMESPACE   NAME                                                                           CLUSTER   NODENAME                                              PROVIDERID                                       PHASE         AGE   VERSION
testns      machine.cluster.x-k8s.io/tkc-wc-default-nodepool-m86kj-6784966b8c-mvsz6        tkc-wc    tkc-wc-default-nodepool-m86kj-6784966b8c-mvsz6        vsphere://422c7fbd-6d78-8f9c-bf73-64e85ba7fd7f   Running       16h   v1.21.6+vmware.1
testns      machine.cluster.x-k8s.io/tkc-wc-default-nodepool-m86kj-6784966b8c-wl8kj        tkc-wc    tkc-wc-default-nodepool-m86kj-6784966b8c-wl8kj        vsphere://422c80a1-ad81-c332-f63b-984d251c75d4   Running       16h   v1.21.6+vmware.1
testns      machine.cluster.x-k8s.io/tkc-wc-hvl7p-hhckm                                    tkc-wc    tkc-wc-hvl7p-hhckm                                    vsphere://422ce03d-19dc-6987-c4ad-7c31337eb119   Running       16h   v1.22.9+vmware.1
testns      machine.cluster.x-k8s.io/tkc-wc-wcm-nodepool-fb5d4258-sbl26-6cbf96df98-gb28b   tkc-wc    tkc-wc-wcm-nodepool-fb5d4258-sbl26-6cbf96df98-gb28b   vsphere://422c768a-a24a-d572-f8fc-e3b4e5eabd06   Running       16h   v1.21.6+vmware.1
testns      machine.cluster.x-k8s.io/tkc-wc-wcm-nodepool-fb5d4258-sbl26-6dd79d9ff-6qckr    tkc-wc                                                          vsphere://422cb1c3-d022-b308-4409-e56bd5d0944b   Provisioned   29m   v1.21.6+vmware.1

Attaching the machine/cluster YAML files.
machine.yaml.txt

cluster.yaml.txt

@randomvariable
Member

randomvariable commented Nov 18, 2022

I guess one of the key things in this particular issue is that kubeadm uses ConfigMaps rather than CRDs, and there are no conversion webhooks. That means that if kubeadm does an API revision bump and up-converts the cluster config to it in the same release, then the immediately preceding release no longer understands the config. There's no supportable version skew in this scenario.

@fabriziopandini
Member

Echoing from above (trying to avoid scope creep on the issue):

Let's focus this issue on the status and what it should represent. (What @sbueringer/@randomvariable highlighted - the kubeadm issue - is correct, but it's worth a separate discussion given that this limitation is rooted in kubeadm/image builder.)

I'm investigating the kubeadm issue; ASAP I will follow up with a separate issue and possibly some ideas.

@randomvariable
Member

randomvariable commented Nov 18, 2022

Fair enough.

I don't know if there's much we want to do with Phase. It's very tricky to convey a sensible meaning in a single status field. @nehagjain15, as much as possible I would avoid relying on Phase in favour of conditions.

What might be more interesting, on the above basis, is to set a Condition. In this error state we have:

Status:
  Available Replicas:  1
  Conditions:
    Last Transition Time:  2022-11-18T07:26:08Z
    Status:                True
    Type:                  Ready
    Last Transition Time:  2022-11-18T07:26:08Z
    Status:                True
    Type:                  Available
  Observed Generation:     2
  Phase:                   Running
  Ready Replicas:          1
  Replicas:                2
  Selector:                cluster.x-k8s.io/cluster-name=tkc-wc,topology.cluster.x-k8s.io/deployment-name=wcm-nodepool-fb5d4258,topology.cluster.x-k8s.io/owned=
  Unavailable Replicas:    1
  Updated Replicas:        1

Maybe we want an explicit Condition indicating there is at least one bad replica.
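
On the point above about preferring conditions over Phase, a minimal sketch of what a consumer-side check could look like (assuming the v1beta1 Ready/Available condition types on the MD):

```go
// Minimal sketch: check the MD's summary conditions instead of Status.Phase
// (assumes the v1beta1 Ready and Available condition types).
package mdcheck

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// mdLooksHealthy reports whether the MachineDeployment's conditions are green,
// which is more expressive than the single Phase string.
func mdLooksHealthy(md *clusterv1.MachineDeployment) bool {
	return conditions.IsTrue(md, clusterv1.ReadyCondition) &&
		conditions.IsTrue(md, clusterv1.MachineDeploymentAvailableCondition)
}
```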

@sbueringer
Member

sbueringer commented Nov 21, 2022

I have some questions on the use case, because it shows the cluster status reporting an issue, while the ask is to consider MHC, which surfaces at the machine level with an annotation.

@fabriziopandini I'm probably wrong but I thought the MHC controller sets the OwnerRemediated (false) condition on the Machine to indicate the Machine should be remediated. Which annotation do you mean?
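
If that's right, the marking would look roughly like this - a rough sketch of how I understand it, using the conditions util, with reason strings taken from the log above; not copied from the MHC controller:

```go
// Rough sketch of how I understand the MHC marking works (not copied from
// the MachineHealthCheck controller): the Machine gets conditions, not an
// annotation, and the owning controller reacts to OwnerRemediated=False.
package remediation

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// markForRemediation records the failed health check and asks the owner
// (e.g. the MachineSet controller) to remediate the Machine.
func markForRemediation(machine *clusterv1.Machine) {
	// HealthCheckSucceeded=False captures why the target is unhealthy ...
	conditions.MarkFalse(machine, clusterv1.MachineHealthCheckSucceededCondition,
		"NodeStartupTimeout", clusterv1.ConditionSeverityWarning,
		"Node failed to report startup in time")
	// ... and OwnerRemediated=False signals that remediation is needed.
	conditions.MarkFalse(machine, clusterv1.MachineOwnerRemediatedCondition,
		"WaitingForRemediation", clusterv1.ConditionSeverityWarning,
		"MachineHealthCheck failed, waiting for remediation")
}
```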

Independent of that, to me it seems we have a mismatch between when the Cluster topology controller considers an MD as "rolling out" vs. what you can see on the MD (specifically the phase).

Topology controller code: https://github.com/kubernetes-sigs/cluster-api/blob/main/internal/controllers/topology/cluster/scope/state.go#L96-L98
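
Roughly, that check boils down to the replica counters on the MD status (a paraphrased sketch, not the exact upstream code):

```go
// Paraphrased sketch of the "is this MD rolling out?" signal in the linked
// state.go (not the exact upstream code): it is purely replica-counter based,
// so a Machine held back by MHC remediation keeps the MD "rolling out".
package scope

import clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"

// mdIsRollingOut returns true while the MachineDeployment has not yet
// converged on its desired replicas.
func mdIsRollingOut(md *clusterv1.MachineDeployment) bool {
	if md.Spec.Replicas == nil {
		return true
	}
	return *md.Spec.Replicas != md.Status.UpdatedReplicas ||
		*md.Spec.Replicas != md.Status.ReadyReplicas ||
		md.Status.UnavailableReplicas > 0
}
```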

@nehagjain15 Can you please provide the MachineDeployment YAML at the time when the topology controller is stuck?

EDIT: I assume what Naadir posted is the status of the MD at the time where a Machine is remediated? (sorry it's hard to guess :))

I think on a high-level we have the following options:

  1. Bubble up the Machine condition in some way to the MD, as we already discussed a few times in the past (assuming I'm correct that MHC adds a condition to the Machine)
  2. Consider more information in the calculation of Phase: Running
    • It can be argued that "// MachineDeploymentPhaseRunning indicates scaling has completed and all Machines are running." is not true when there are unavailable replicas. But on the other hand, we don't really have a phase which fits this use case.

I think option 2 is not really viable, as our current idea is to surface things like this through conditions and not through phase.

Looking a bit closer into how remediation in the MS controller works: it looks like the "OwnerRemediated" condition is only there for a few seconds before the Machine is deleted. So the better approach is probably to take the information we have about spec replicas, available replicas, ... and surface a condition which expresses whether the entire MD is ready/running/... (not sure what the right wording is).
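
A minimal sketch of what deriving such a condition from the replica counters could look like (the condition type and reason below are made-up names for illustration, not existing API constants):

```go
// Illustrative sketch only: derive an MD-level condition from the replica
// counters. The condition type and reason below are hypothetical names.
package mdstatus

import (
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/cluster-api/util/conditions"
)

// allReplicasAvailableCondition is a made-up condition type for illustration.
const allReplicasAvailableCondition clusterv1.ConditionType = "AllReplicasAvailable"

// setReplicasAvailableCondition surfaces on the MD whether all desired
// replicas are currently available, instead of hiding that behind Phase.
func setReplicasAvailableCondition(md *clusterv1.MachineDeployment) {
	desired := int32(0)
	if md.Spec.Replicas != nil {
		desired = *md.Spec.Replicas
	}
	if md.Status.AvailableReplicas < desired || md.Status.UnavailableReplicas > 0 {
		// At least one replica is missing or unavailable
		// (e.g. a Machine waiting for MHC remediation).
		conditions.MarkFalse(md, allReplicasAvailableCondition, "ReplicasUnavailable",
			clusterv1.ConditionSeverityWarning,
			"%d of %d replicas available", md.Status.AvailableReplicas, desired)
		return
	}
	conditions.MarkTrue(md, allReplicasAvailableCondition)
}
```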

@sbueringer
Member

sbueringer commented Apr 25, 2023

@nehagjain15 The issue should be resolved now. Basically, we no longer block MD upgrades if other MachineDeployments are rolling out for other reasons (e.g. MHC remediation).

This will be released in v1.4.2.
