Changing vmSize/kubernetesVersion on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state #2972

primeroz · 2022-12-29T15:15:04Z

/kind bug

What steps did you take and what happened:

create a Cluster with a single machine pool ( see below for the actual manifests ) with 3 nodes
Once the pool is stable change the vmSize ( from Standard_D2s_v5 to Standard_D4s_v5 )
the pool goes into ScaleSetOufOfDate state and one node gets replaced
- The pool is grown to 4 nodes ( by the capz controller i guess )
- when the new machine is ready the pool is shrunk to 3 nodes ( with a default strategy of deleting the oldest node )
This only happens for one node instead of repeating for every node in the pool that is still running on the old instance type

What did you expect to happen:
All nodes to be replaced using the same process that was used for the first node

Anything else you would like to add:
The docs suggests that when changing vmSize or kubernetesVersion a full refresh of nodes in the nodepool should be performed

Machine pool manifests ( pre-change )

---
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: "fctest1"
  name: fctest1-def00
  namespace: org-multi-project
spec:
  clusterName: fctest1
  replicas: 3
  template:
    spec:
      bootstrap:
        configRef:
          apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
          kind: KubeadmConfig
          name: fctest1-def00
      clusterName: fctest1
      infrastructureRef:
        apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
        kind: AzureMachinePool
        name: fctest1-def00
      version: 1.24.8
---
apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
kind: AzureMachinePool
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: "fctest1"
  name: fctest1-def00
  namespace: org-multi-project
spec:
  location: westeurope
  strategy:
    rollingUpdate:
      deletePolicy: Oldest
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    osDisk:
      diskSizeGB: 50
      managedDisk:
        storageAccountType: Premium_LRS
      osType: Linux
    sshPublicKey: XXXXXX
    vmSize: Standard_D2s_v5
---
apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
kind: KubeadmConfig
metadata:
  labels:
    cluster.x-k8s.io/cluster-name: "fctest1"
  name: fctest1-def00
  namespace: org-multi-project
spec:
  joinConfiguration:
    nodeRegistration:
      kubeletExtraArgs:
        cloud-config: /etc/kubernetes/azure.json
        cloud-provider: external
      name: '{{ ds.meta_data.local_hostname }}'
  files:
  - contentFrom:
      secret:
        key: worker-node-azure.json
        name: fctest1-def00-azure-json
    owner: root:root
    path: /etc/kubernetes/azure.json
    permissions: "0644"

Before changing the instance type

➜ kubectl --kubeconfig=/dev/shm/fctest1 get node -o yaml | grep node.kubernetes.io/instance-type
      node.kubernetes.io/instance-type: Standard_D4s_v3 <--- This is the control plane
      node.kubernetes.io/instance-type: Standard_D2s_v5
      node.kubernetes.io/instance-type: Standard_D2s_v5
      node.kubernetes.io/instance-type: Standard_D2s_v5

After Changing the instance type

➜ kubectl --kubeconfig=/dev/shm/fctest1 get node -o yaml | grep node.kubernetes.io/instance-type
      node.kubernetes.io/instance-type: Standard_D4s_v3 <-- This is the Control plane
      node.kubernetes.io/instance-type: Standard_D2s_v5
      node.kubernetes.io/instance-type: Standard_D2s_v5
      node.kubernetes.io/instance-type: Standard_D4s_v5 <-- This is the node that was replaced

Everything is reported stable and ready, nothing is modelOutOfDate

➜ kubectl get machinepool
NAME            CLUSTER   REPLICAS   PHASE     AGE   VERSION
fctest1-def00   fctest1   3          Running   27m   v1.24.8

➜ kubectl get azuremachinepool
NAME            REPLICAS   READY   STATE
fctest1-def00   3          true    Succeeded

➜ kubectl get azuremachinepoolmachine
NAME              VERSION   READY   STATE
fctest1-def00-1   v1.24.8   true    Succeeded
fctest1-def00-2   v1.24.8   true    Succeeded
fctest1-def00-3   v1.24.8   true    Succeeded

➜ clusterctl describe ....

└─Workers                                                                                                                         
  └─MachinePool/fctest1-def00                                                    True                     15m                          
    ├─BootstrapConfig - KubeadmConfig/fctest1-def00                              True                     25m                           
    └─MachineInfrastructure - AzureMachinePool/fctest1-def00                     True                     15m

logs only show the standard reconciling regular messages

I1229 15:15:53.012865       1 azuremachinepoolmachine_controller.go:248] controllers.AzureMachinePoolMachineController.reconcileNormal "msg"="Reconciling AzureMachinePoolMachine" "azureMachinePoolMachine"={"name":"fctest1-def00-3","namespace":"org-multi-project"} "controller"="azuremachinepoolmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachinePoolMachine" "name"="fctest1-def00-3" "namespace"="org-multi-project" "reconcileID"="2a1ee6d5-9f7e-43de-aae6-960166e965c6" "x-ms-correlation-request-id"="faaf9cc7-2980-46f6-93db-84b6aa12b538" 
I1229 15:15:53.013230       1 azuremachinepoolmachine_controller.go:248] controllers.AzureMachinePoolMachineController.reconcileNormal "msg"="Reconciling AzureMachinePoolMachine" "azureMachinePoolMachine"={"name":"fctest1-def00-1","namespace":"org-multi-project"} "controller"="azuremachinepoolmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachinePoolMachine" "name"="fctest1-def00-1" "namespace"="org-multi-project" "reconcileID"="ab358c05-4524-45f9-bb3e-cf119c829bb3" "x-ms-correlation-request-id"="17d16756-4544-4d75-98f9-c017514a10b6" 
I1229 15:15:53.013629       1 azuremachinepoolmachine_controller.go:248] controllers.AzureMachinePoolMachineController.reconcileNormal "msg"="Reconciling AzureMachinePoolMachine" "azureMachinePoolMachine"={"name":"fctest1-def00-2","namespace":"org-multi-project"} "controller"="azuremachinepoolmachine" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachinePoolMachine" "name"="fctest1-def00-2" "namespace"="org-multi-project" "reconcileID"="4a0a03ef-496c-4c05-bfa2-e85b23ff1e48" "x-ms-correlation-request-id"="b613f8ac-1bcd-47b5-a24d-e49f8a73f564" 
I1229 15:16:49.498054       1 azurejson_machinepool_controller.go:195] controllers.AzureJSONMachinePoolReconciler.Reconcile "msg"="WARNING, You are using Service Principal authentication for Cloud Provider Azure which is less secure than Managed Identity. Your Service Principal credentials will be written to a file on the disk of each VM in order to be accessible by Cloud Provider. To learn more, see https://capz.sigs.k8s.io/topics/identities-use-cases.html#azure-host-identity " "azureMachinePool"="fctest1-def00" "cluster"="fctest1" "controller"="azuremachinepool" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachinePool" "kind"="AzureMachinePool" "machinePool"="fctest1-def00" "name"="fctest1-def00" "namespace"="org-multi-project" "reconcileID"="9158e1ac-a8c2-4084-91d2-4616ed8c924e" "x-ms-correlation-request-id"="37735e0f-fda4-420f-872b-976dc142e1e5" 
I1229 15:16:49.498371       1 azuremachinepool_controller.go:260] controllers.AzureMachinePoolReconciler.reconcileNormal "msg"="Reconciling AzureMachinePool" "azureMachinePool"={"name":"fctest1-def00","namespace":"org-multi-project"} "controller"="azuremachinepool" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachinePool" "name"="fctest1-def00" "namespace"="org-multi-project" "reconcileID"="197951b6-13ed-4f20-a152-da99e620d85b" "x-ms-correlation-request-id"="4c9cd63e-9274-4412-a7ba-62efa892ea2f" 
I1229 15:16:49.499700       1 helpers.go:583] controllers.AzureMachinePoolMachineController.SetupWithManager "msg"="machine pool predicate" "azureMachinePool"="fctest1-def00" "controller"="AzureMachinePoolMachine" "eventType"="update" "namespace"="org-multi-project" "predicate"="MachinePoolModelHasChanged" "x-ms-correlation-request-id"="093d7495-213a-460e-a306-107961f0febf" "shouldUpdate"=false
I1229 15:17:06.598662       1 azurejson_machinetemplate_controller.go:186] controllers.AzureJSONTemplateReconciler.Reconcile "msg"="WARNING, You are using Service Principal authentication for Cloud Provider Azure which is less secure than Managed Identity. Your Service Principal credentials will be written to a file on the disk of each VM in order to be accessible by Cloud Provider. To learn more, see https://capz.sigs.k8s.io/topics/identities-use-cases.html#azure-host-identity " "azureMachineTemplate"={"name":"fctest1-control-plane-2016ea16","namespace":"org-multi-project"} "cluster"="fctest1" "controller"="azuremachinetemplate" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachineTemplate" "kind"="AzureMachineTemplate" "name"="fctest1-control-plane-2016ea16" "namespace"="org-multi-project" "reconcileID"="c47badc6-254b-4c5f-a15c-49b4392a3791" "x-ms-correlation-request-id"="d7cba8ae-7a6e-493f-8805-cae56e8c1f24" 
I1229 15:17:13.182767       1 azuremachinepool_controller.go:260] controllers.AzureMachinePoolReconciler.reconcileNormal "msg"="Reconciling AzureMachinePool" "azureMachinePool"={"name":"fctest1-def00","namespace":"org-multi-project"} "controller"="azuremachinepool" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachinePool" "name"="fctest1-def00" "namespace"="org-multi-project" "reconcileID"="085ac5da-a54b-443f-a050-e3225b80d307" "x-ms-correlation-request-id"="4917f01a-d070-46df-b322-d0dadb7d99df"

Environment:

cluster-api-provider-azure version: 1.6.0
Kubernetes version: (use kubectl version): 1.24.8
OS (e.g. from /etc/os-release):

The text was updated successfully, but these errors were encountered:

primeroz · 2023-01-02T11:25:26Z

Looking a bit more into this using Tilt and Extra logging i can see that after the first node is replaced with a surge the controller thinks that all remaining nodes are actually with the latestmodel and so no node needs to be replaced

capz-controller-manager-787d4f7996-cfklq manager I0102 10:46:09.084865      48 machinepool_deployment_strategy.go:200]  "msg"="nothing more to do since all the AzureMachinePoolMachine(s) are the latest model and
 not over-provisioned" "azureMachinePool"={"name":"tilt-mp-0","namespace":"default"} "controller"="azuremachinepool" "controllerGroup"="infrastructure.cluster.x-k8s.io" "controllerKind"="AzureMachinePool" "name"
 ="tilt-mp-0" "namespace"="default" "reconcileID"="69cef20e-3e57-4bf7-b150-feaed24f5ec0"

Following the code

Check which machines are Not Latest Model by checking the status field on the azureMachinePoolMachine
- cluster-api-provider-azure/azure/scope/strategies/machinepool_deployments/machinepool_deployment_strategy.go
  
  Line 134 in e5063d1
  
  machinesWithoutLatestModel = order(getMachinesWithoutLatestModel(machinesByProviderID))
This value is set by only comparing the image . any other difference will not set this value to false
- cluster-api-provider-azure/azure/scope/machinepoolmachine.go
  
  Line 505 in a38912d
  
  func (s *MachinePoolMachineScope) hasLatestModelApplied(ctx context.Context) (bool, error) {

Is this what we expect ? should we actually look at more to determine if the machine is the latestmodel ? ( Hopefully we can look at what the Azure UI is reporting )

primeroz · 2023-01-10T10:22:01Z

I closed the PR i opened because it does conflict with the MachinPool bootstrap token not getting refreshed automatically issue #2683 as it was pointed out by @dthorsen.

Pinging @mweibel for his work into the bootstrap token issue.

Context

I will summarize here what i understand the issue to be, then i will bring this up in slack for feedback / discussion

example Machinepool mp0 has 3 replicas, max surge 25% ( 1 )
Reconcile loop of scaleset checks if any patching of the VMSS in Azure is needed
If hasModelChanges=TRUE or Number of nodes with latestModelApplied!=MachinePool.replicas
- the patch to the VMSS to be applied is updated so that the Capacity is increased by maxSurge
the VMSS is updated
- VMSS is grown by number of nodes in maxSurge : 3 -> 4
  - 1 Node gets started with updated spec
- Decrease is handled by Delete function
  - the replicas in machinepool was not changed so 1 node , the oldest, will be removed
Staus of mp0: 1 node is up-to-date , 2 nodes are out of date ( and marked as latestModelNotApplied in azure )
Next Reconcile loop of scaleset
- there are no changes to be applied to the VMSS
- the latestModelApplied in the azureMachinePool will return true when is actually false because at the moment it is only based on the image of the machine rather than the status reported by azure about the VM
- -> Nothing happens , no more machines get updated

This will cause the nodes in the VMSS to have inconsistent configurations between them and, the more changes are applied over time the more mixed the set of VMs in the VMSS will be.

The Problem

the PR #2975 I proposed does sync the latestModelApplied status field of the AzureMachinePoolMachines with the one reported by Azure.
The effect of this would be to let the reconcile loop of the scaleset refresh all nodes by Surging then Deleting nodes at every loop.

Until the Bootstrap Token refresh issue is not addressed this change could exacerbate the problem , potentially causing a set of working but out-of-date nodes break by being replaced with nodes that are up-to-date but can't join the cluster

Further Complication

With the proposed fix for the Bootstrap Token Refresh issue another problem might arise though.
When the token is refreshed the VMSS gets updated causing all current nodes to be marked as out of date , with the current logic this would trigger a roll restart of all nodes even though the only change was the bootstrap token and so there would be no need to recreate those nodes.

Questions

Is using the azure LatestModelApplied to detect when to roll nodes just too blind ?
should we just extend the hasLatestModelApplied to check for more than just the image but not to blindly follow what azure says ?
- In this case, how could we track the change of customData to trigger a restart excluding the case where only the token was updated ?

mweibel · 2023-01-10T11:42:20Z

thanks for looking into this - I'm glad not being the only one to experience it 🙂

Your summary is interesting and clarified a few things for me. Thanks!
Updating the customData which could lead to marking all nodes as out of date - that would be very bad. I previously assumed a customData diff would not trigger a latestModelApplied change but never verified.

The change in #2975 does make sense from my point of view. In our case we often see out of date instances at the moment. Using the data from the API as the indicator makes sense to me because otherwise we might always see inconsistent state in the portal, for example.

To continue with #2803 I wanted to check/verify how e.g. CAPA handles this. If the issue with not updating bootstrap tokens is consistent across providers, we might want a solution (or at least a documentation on how to handle it) on the main CAPI project.

I didn't investigate more into this due to time constraints (this didn't change yet unfortunately). If anyone is willing to take this over that'd be great!

Questions

Do you happen to know if there are details on which properties have an effect on latestModelApplied? I haven't yet found related documentation.
Does somebody remember how AKS engine handled updating bootstrap tokens? It worked on VMSS too and most likely tokens got refreshed in some way or another.

primeroz · 2023-01-10T15:28:20Z

Do you happen to know if there are details on which properties have an effect on latestModelApplied? I haven't yet found related documentation.

I do not know , I will try and do a bit of reasearch.

Updating the customData which could lead to marking all nodes as out of date - that would be very bad. I previously assumed a customData diff would not trigger a latestModelApplied change but never verified.

I might need to double check this . I came to this conclusion because when i do a

kind with CAPZ -> create cluster in azure
install capi in azure k8s
clusterctl mv to azure k8s

the instances are latestModel TRUE for a short period of time then switch to FALSE. If i delete one azuremachinepoolmachine and let it re-create it stays latestModel TRUE permanently.

Looking at the data for cluster-api resources the only difference i saw was the token had changed so i thought that would trigger it. I Should confirm that. if that is not true then no problem here

but looking at the actual output of

az vmss list-instances -n gs-fciocchetti-def00 -g gs-fciocchetti | jq '.[0]' - VM not with latestModel
az vmss list-instances -n gs-fciocchetti-def00 -g gs-fciocchetti | jq '.[2]' - VM with latestModel after a delete

I see a weird difference which makes no sense to me modelDefinitionApplied changed from VirtualMachine to VirtualMachineScaleSet ( for a machine that was created by the VMSS and replaced by the VMSS 🤷 )

-  "latestModelApplied": false,
+  "latestModelApplied": true,
   "licenseType": null,
   "location": "westeurope",
-  "modelDefinitionApplied": "VirtualMachine",
-  "name": "gs-fciocchetti-def00_0",
+  "modelDefinitionApplied": "VirtualMachineScaleSet",
+  "name": "gs-fciocchetti-def00_3",

I didn't investigate more into this due to time constraints (this didn't change yet unfortunately). If anyone is willing to take this over that'd be great!

I was planning to give your branch a try , i saw you mentioned in your draft pr that is not working. Mind adding a bit more notes about what's not working there ?

I will also use it to confirm if changing of CustomData does affect the latestmodel or not , either way we still need that fix before implementing this one

primeroz · 2023-01-19T15:09:58Z

For the moment internally we decided to move back to MachineDeployments to focus on other things so i won't do any work on fixing this issue for now.

I am planning to get back to it as soon as it becomes relevant for us again

CecileRobertMichon · 2023-03-10T22:09:32Z

@mweibel was this fixed with #3134 or is that different?

mweibel · 2023-03-13T07:49:16Z

Unsure, I think that would need to be checked again. I plan on verifying the join issues are gone in the next couple of days. However our use case doesn't usually involve adjusting vmSize for existing MachinePools - we usually create new ones when we iterate on the specs so I'm not sure I can verify if this issue is fixed.

primeroz · 2023-03-13T08:42:49Z

Unsure

Same here. This issue, at least its original version when i noticed it, would not be solved by the fix of customData .

the fix of the customData was a requirement to get a fix for this issue though to avoid refreshing a MachinePool/VMSS would cause to refresh nodes with ones that could not join the cluster due to expired token

the logic i looked at when opened this issue is here #2972 (comment)

our use case doesn't usually involve adjusting vmSize for existing MachinePools - we usually create new ones when we iterate on the specs so I'm not sure I can verify if this issue is fixed.

The same problem would happen on change of kubernetesVersion which is our main use case , upgrade version of a MachinePool/VMSS

I am currently not looking at this issue since we reverted to MachineDeployments since then , but what i found at the time was that

Using the Azure latestModelApplied as a signal to CAPZ to trigger the replacement of a node is not realiable enough since there seem to be a lot of events that cause that flag to be set on azureSide even when no real changes have happened
Today i would probably try to track the generation of the MachinePool and AzureMachineTemplate with the one of the node and let capz decide if the node need to be refreshed based on that

k8s-triage-robot · 2024-01-26T05:27:58Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-02-25T06:25:25Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

nawazkh · 2024-02-26T17:47:47Z

/remove-lifecycle rotten

k8s-triage-robot · 2024-05-26T18:42:21Z

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle stale
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot · 2024-06-25T18:56:35Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Mark this issue as fresh with /remove-lifecycle rotten
Close this issue with /close
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot · 2024-07-25T19:31:35Z

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied
After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen
Mark this issue as fresh with /remove-lifecycle rotten
Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot · 2024-07-25T19:31:39Z

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to this:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

After 90d of inactivity, lifecycle/stale is applied

After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied

After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

Reopen this issue with /reopen

Mark this issue as fresh with /remove-lifecycle rotten

Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Dec 29, 2022

primeroz mentioned this issue Jan 2, 2023

Better Detect when Machines in MachinePool VMSS are not running on the latest model #2975

Closed

3 tasks

primeroz changed the title ~~Changing vmSize on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state~~ Changing vmSize/kubernetesVersion on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state Jan 10, 2023

primeroz mentioned this issue Jan 10, 2023

refresh custom data automatically #2803

Closed

3 tasks

jackfrancis added this to the v1.8 milestone Jan 19, 2023

jackfrancis modified the milestones: v1.8, next Mar 2, 2023

CecileRobertMichon added this to CAPZ Planning Apr 5, 2023

jackfrancis removed this from the next milestone May 11, 2023

dtzar added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Aug 3, 2023

CecileRobertMichon added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Aug 10, 2023

AndiDog mentioned this issue Jan 23, 2024

KubeadmConfig changes should be reconciled for machine pools, triggering instance recreation kubernetes-sigs/cluster-api#8858

Closed

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 26, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 25, 2024

k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 26, 2024

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 26, 2024

k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 25, 2024

k8s-ci-robot closed this as not planned Won't fix, can't repro, duplicate, stale Jul 25, 2024

github-project-automation bot moved this to Done in CAPZ Planning Jul 25, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Changing vmSize/kubernetesVersion on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state #2972

Changing vmSize/kubernetesVersion on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state #2972

primeroz commented Dec 29, 2022 •

edited

Loading

primeroz commented Jan 2, 2023

primeroz commented Jan 10, 2023

mweibel commented Jan 10, 2023

primeroz commented Jan 10, 2023

primeroz commented Jan 19, 2023

CecileRobertMichon commented Mar 10, 2023

mweibel commented Mar 13, 2023

primeroz commented Mar 13, 2023

k8s-triage-robot commented Jan 26, 2024

k8s-triage-robot commented Feb 25, 2024

nawazkh commented Feb 26, 2024

k8s-triage-robot commented May 26, 2024

k8s-triage-robot commented Jun 25, 2024

k8s-triage-robot commented Jul 25, 2024

k8s-ci-robot commented Jul 25, 2024

Changing vmSize/kubernetesVersion on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state #2972

Changing vmSize/kubernetesVersion on a MachinePool only refresh one node in the VMSS leaving the pool in an inconsistent state #2972

Comments

primeroz commented Dec 29, 2022 • edited Loading

primeroz commented Jan 2, 2023

primeroz commented Jan 10, 2023

Context

The Problem

Further Complication

Questions

mweibel commented Jan 10, 2023

Questions

primeroz commented Jan 10, 2023

primeroz commented Jan 19, 2023

CecileRobertMichon commented Mar 10, 2023

mweibel commented Mar 13, 2023

primeroz commented Mar 13, 2023

k8s-triage-robot commented Jan 26, 2024

k8s-triage-robot commented Feb 25, 2024

nawazkh commented Feb 26, 2024

k8s-triage-robot commented May 26, 2024

k8s-triage-robot commented Jun 25, 2024

k8s-triage-robot commented Jul 25, 2024

k8s-ci-robot commented Jul 25, 2024

primeroz commented Dec 29, 2022 •

edited

Loading