Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Implement template deletion for topology-owned MD and MS #5191

Merged

Conversation

sbueringer
Copy link
Member

@sbueringer sbueringer commented Sep 2, 2021

Signed-off-by: Stefan Büringer [email protected]

What this PR does / why we need it:

Before this PR, our topology code:

  • deleted "old" templates after template rotation of MachineDeployments immediately
    • thus reconciliation of the old MachineSets was immediately broken as the MachineSet controller requires the templates
  • didn't delete corresponding templates when deleting MachineDeployments

This PR aims to solve this problem. The high-level concept is:

  • topology code doesn't delete templates in "successful" reconciles (template rotation without errors / MachineDeployment deletions)
    • Note: we will have to introduce another separate mechanism which cleans up templates when the template rotation fails (e.g. new templates were never referenced in any MachineDeployment or MachineSet)
  • we add new MD and MS topology reconcilers which on MD/MS delete:
    • calculate which templates are still used by MachineDeployments or MachineSets not in deleting
    • deletes unused templates

Note: this is only done for topology-owned MD and MS

tl;dr we cleanup templates which would otherwise be orphaned during deletion of topology-owned MachineDeployments and MachineSets

I tested the following locally:

  • rotate infra/bootstrap templates in MDs and ensure old templates are properly deleted
  • delete MDs and ensure old templates are properly deleted

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 2, 2021
@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Sep 2, 2021
@sbueringer
Copy link
Member Author

sbueringer commented Sep 2, 2021

/assign @vincepri
(feedback about the general approach would be great, it's more than a bit tricky and full of assumptions)

@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch 3 times, most recently from 6ed296b to a087030 Compare September 3, 2021 07:38
@sbueringer sbueringer changed the title [WIP] 🐛 Implement template deletion for topology-owned MD and MS 🐛 Implement template deletion for topology-owned MD and MS Sep 3, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 3, 2021
@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch from d5f8a8b to 9313b24 Compare September 3, 2021 08:35
@sbueringer
Copy link
Member Author

Removed WIP. Fixed a few minor things and tested it locally (for details see PR description)

@vincepri
Copy link
Member

vincepri commented Sep 3, 2021

Reviewing now

controllers/topology/desired_state.go Outdated Show resolved Hide resolved
controllers/topology/desired_state.go Outdated Show resolved Hide resolved
controllers/topology/machinedeployment_controller.go Outdated Show resolved Hide resolved
controllers/topology/machinedeployment_controller.go Outdated Show resolved Hide resolved
controllers/topology/machinedeployment_controller.go Outdated Show resolved Hide resolved
main.go Outdated Show resolved Hide resolved
@sbueringer sbueringer changed the title 🐛 Implement template deletion for topology-owned MD and MS [WIP] 🐛 Implement template deletion for topology-owned MD and MS Sep 6, 2021
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 6, 2021
@sbueringer
Copy link
Member Author

sbueringer commented Sep 6, 2021

A few notes about why the current implementation is only one MD reconciler:

  • Initially it was not possible to detect if a MachineSet is topology owned. That's why I wanted to always reconcile MDs, but later on I found out that during MD deletion MDs are deleted first, so we had to propagated the topology owned label to MS anyway.
  • I though it's a good idea to "synchronize" reconciliation of MD and its related MS to avoid race conditions. Given what I know now about how MD/MS deletion works, I'm not 100% sure, but I think there are no race conditions. For example if multiple MS are deleted concurrently we always have one reconciliation loop which will see the deletionTimestamp on both MS and can then delete templates accordingly.

Overall, I now think it would be better to split up the reconciler in one MD and one MS reconciler to simplify the logic. The new logic would just be:

  • MD reconciler
    • if MD in deletion and not paused: check if templates are used in corresponding MS, otherwise delete them
  • MS reconciler
    • if MS in deletion and not paused: check if templates are used in MD/ corresponding MS, otherwise delete them

I think it would be simpler because in the current implementation one problem is that I'm reconciling a MachineDeployment which could be already gone.

Give @vincepri's suggestion here. I would also inline this behaviour in the current MD/MS controllers.

@fabriziopandini / @vincepri WDYT

@fabriziopandini
Copy link
Member

(in case we are not moving this into exiting controllers)

WRT to the discussion one controller/two controllers,I'm ok with both approaches; in this case if doable having separated controllers, this gives a nice symmetry with existing controllers and a cleaner separation of concerns.

@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch from 9313b24 to 4438fa3 Compare September 6, 2021 15:10
@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch from e9b6986 to 19d5e60 Compare September 7, 2021 18:18
@sbueringer
Copy link
Member Author

@enxebre @vincepri @fabriziopandini PTAL. I separated the reconcilers as discussed in Slack. Now it should be pretty clear how the reconciliation logic looks like.

@sbueringer sbueringer changed the title [WIP] 🐛 Implement template deletion for topology-owned MD and MS 🐛 Implement template deletion for topology-owned MD and MS Sep 7, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 7, 2021
@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch from 77624b9 to 8d90dc1 Compare September 9, 2021 07:19
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Sep 9, 2021
@sbueringer
Copy link
Member Author

sbueringer commented Sep 9, 2021

Findings should be addressed. PTAL @enxebre @fabriziopandini @vincepri

@fabriziopandini After moving the template utils back into the package I'm now down to 3 TODOs caused by #5153. So I guess depending on which PR gets ready first we could also merge the current PR before #5153.

#5153 has been merged, rebased on top of it and also squashed the commits

@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch from 8d90dc1 to 75fcfc1 Compare September 9, 2021 08:58
@sbueringer
Copy link
Member Author

/hold cancel (#5153 merged + resolved open TODOs accordingly)

Copy link
Member

@fabriziopandini fabriziopandini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a nit then lgtm for me

controllers/topology/util.go Outdated Show resolved Hide resolved
@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch from 3c31410 to 41017ee Compare September 9, 2021 16:49
@sbueringer
Copy link
Member Author

@enxebre @vincepri @fabriziopandini Should work again (ci works without topology, and locally template rotation and MD deletion works - we really need topology e2e tests :))

@fabriziopandini
Copy link
Member

changes lgtm, pending squash / others feedback after latest changes

@enxebre
Copy link
Member

enxebre commented Sep 10, 2021

lgtm after squashing

Signed-off-by: Stefan Büringer [email protected]

Co-authored-by: fabriziopandini <[email protected]>
@sbueringer sbueringer force-pushed the pr-improve-template-deletion branch from 41017ee to ae31c7c Compare September 10, 2021 11:12
Copy link
Member

@vincepri vincepri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 10, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 10, 2021
@vincepri
Copy link
Member

@enxebre Over to you to unhold

@enxebre
Copy link
Member

enxebre commented Sep 13, 2021

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 13, 2021
@k8s-ci-robot k8s-ci-robot merged commit a24cbf2 into kubernetes-sigs:master Sep 13, 2021
@k8s-ci-robot k8s-ci-robot added this to the v0.4 milestone Sep 13, 2021
@sbueringer sbueringer deleted the pr-improve-template-deletion branch September 13, 2021 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants