Defer template deletion for MachineSets in managed topologies #5176

Closed
fabriziopandini opened this issue Aug 30, 2021 · 8 comments
Labels
area/clusterclass Issues or PRs related to clusterclass kind/bug Categorizes issue or PR as related to a bug.

@fabriziopandini
Member

What steps did you take and what happened:
While testing template rotation for a MachineDeployment in a managed topology, the deletion of the old MachineSet got stuck, and the MachineSet controller started failing because of missing templates.

What did you expect to happen:
The MachineDeployment in a managed topology should rotate templates properly.

Anything else you would like to add:
This is a synchronization problem between the managed topology controller and the MachineSet controller.

The managed topology controller is responsible for deleting old templates when templates are rotated, but it has to wait until the MachineSet has actually deleted all of its machines before removing the template.

The current working assumption to address this issue is to implement a MachineSet topology controller that watches only MachineSets with the cluster.x-k8s.io/topology label; this controller is going to:

  • add a finalizer to the MachineSet if it is missing
  • detect when the MachineSet is being deleted and all of its machines have been removed
  • delete the corresponding templates if they were rotated (i.e. if they differ from the ones in the MachineDeployment)
  • remove the finalizer
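
The four steps above can be sketched roughly as follows. This is a minimal, self-contained Go simulation and not the cluster-api implementation: the `MachineSet` and `TemplateRef` types, the `reconcile` function, and the finalizer name are hypothetical stand-ins, and a real controller would work with the Kubernetes API types and a client instead of in-memory structs.

```go
package main

import "fmt"

// topologyFinalizer is a hypothetical finalizer name used only in this sketch.
const topologyFinalizer = "example.topology/machineset"

// TemplateRef identifies a template by kind and name (simplified assumption).
type TemplateRef struct {
	Kind string
	Name string
}

// MachineSet is a simplified stand-in for the real API type.
type MachineSet struct {
	Name       string
	Finalizers []string
	Deleting   bool // stands in for a non-zero deletionTimestamp
	Machines   int  // machines still owned by this MachineSet
	Templates  []TemplateRef
}

func hasFinalizer(ms *MachineSet) bool {
	for _, f := range ms.Finalizers {
		if f == topologyFinalizer {
			return true
		}
	}
	return false
}

func removeFinalizer(ms *MachineSet) {
	kept := make([]string, 0, len(ms.Finalizers))
	for _, f := range ms.Finalizers {
		if f != topologyFinalizer {
			kept = append(kept, f)
		}
	}
	ms.Finalizers = kept
}

func contains(refs []TemplateRef, ref TemplateRef) bool {
	for _, r := range refs {
		if r == ref {
			return true
		}
	}
	return false
}

// reconcile performs one pass over a MachineSet and returns the templates
// that should be deleted in this pass (nil if nothing should be deleted yet).
func reconcile(ms *MachineSet, mdTemplates []TemplateRef) []TemplateRef {
	if !ms.Deleting {
		// Step 1: make sure the finalizer is present.
		if !hasFinalizer(ms) {
			ms.Finalizers = append(ms.Finalizers, topologyFinalizer)
		}
		return nil
	}
	// Step 2: wait until every machine is gone.
	if ms.Machines > 0 {
		return nil
	}
	// Step 3: only templates that differ from the MachineDeployment's
	// current ones count as rotated.
	var rotated []TemplateRef
	for _, t := range ms.Templates {
		if !contains(mdTemplates, t) {
			rotated = append(rotated, t)
		}
	}
	// Step 4: release the MachineSet so its deletion can complete.
	removeFinalizer(ms)
	return rotated
}

func main() {
	md := []TemplateRef{{Kind: "InfraMachineTemplate", Name: "new"}}
	ms := &MachineSet{
		Name:      "ms-old",
		Templates: []TemplateRef{{Kind: "InfraMachineTemplate", Name: "old"}},
	}
	reconcile(ms, md) // adds the finalizer
	ms.Deleting, ms.Machines = true, 0
	fmt.Println(reconcile(ms, md), ms.Finalizers)
}
```

Returning the list of templates instead of deleting them inside `reconcile` keeps the sketch free of side effects; in a real controller the delete calls would happen before the finalizer is removed.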

Environment:

  • Cluster-api version: master

/kind bug

@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 30, 2021
@fabriziopandini
Member Author

/area topology
/milestone v0.4

@k8s-ci-robot k8s-ci-robot added this to the v0.4 milestone Aug 30, 2021
@sbueringer
Member

sbueringer commented Aug 30, 2021

@fabriziopandini Do you think we should also check that no other MachineSet is using the templates and only delete them if the "current" MachineSet is the last one? (not sure if we can get that 100% race condition free)

I think otherwise we run into problems if two MachineSets are using the same templates and both are in deletion.
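
To make the check concrete, here is a rough Go sketch of it. Everything in it is an assumption for illustration (the simplified `MachineSet` type, template names as plain strings, and the `templatesSafeToDelete` helper), and it does not solve the race condition mentioned above, since the list of MachineSets could change between the check and the delete.

```go
package main

import "fmt"

// MachineSet is a simplified, hypothetical stand-in for the real API type.
type MachineSet struct {
	Name      string
	Templates []string // names of the templates this MachineSet references
}

// templatesSafeToDelete returns the templates of current that are referenced
// neither by the MachineDeployment nor by any other MachineSet that still
// exists (including ones that are themselves being deleted).
func templatesSafeToDelete(current MachineSet, existing []MachineSet, mdTemplates []string) []string {
	inUse := map[string]bool{}
	for _, t := range mdTemplates {
		inUse[t] = true
	}
	for _, ms := range existing {
		if ms.Name == current.Name {
			continue // the MachineSet under deletion does not count as a user
		}
		for _, t := range ms.Templates {
			inUse[t] = true
		}
	}
	var safe []string
	for _, t := range current.Templates {
		if !inUse[t] {
			safe = append(safe, t)
		}
	}
	return safe
}

func main() {
	a := MachineSet{Name: "ms-a", Templates: []string{"old"}}
	b := MachineSet{Name: "ms-b", Templates: []string{"old"}}
	md := []string{"new"}

	// While ms-b still exists, ms-a must leave the shared template alone.
	fmt.Println(templatesSafeToDelete(a, []MachineSet{a, b}, md))
	// Once ms-a is gone, ms-b is the last user and may delete it.
	fmt.Println(templatesSafeToDelete(b, []MachineSet{b}, md))
}
```

With this shape, when two MachineSets sharing a template are both in deletion, whichever one finishes last is the one that removes the template.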

@sbueringer
Member

Do we need a similar mechanism for MachineDeployments? (or maybe use the one for MachineSets for both)

@fabriziopandini
Member Author

We probably need both, but let's start with MachineSets because they turn over more frequently than MachineDeployments.

@sbueringer
Member

/assign

@fabriziopandini
Member Author

@sbueringer should we close this issue now that #5191 is merged?

@sbueringer
Member

Yup, let's close it.
/close

@k8s-ci-robot
Contributor

@sbueringer: Closing this issue.


@killianmuldoon killianmuldoon added the area/clusterclass Issues or PRs related to clusterclass label May 4, 2023