Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement reconcile MachineDeployments for managed topologies #5064

Closed
fabriziopandini opened this issue Aug 9, 2021 · 9 comments · Fixed by #5072
Closed

Implement reconcile MachineDeployments for managed topologies #5064

fabriziopandini opened this issue Aug 9, 2021 · 9 comments · Fixed by #5072
Assignees
Labels
area/clusterclass Issues or PRs related to clusterclass kind/feature Categorizes issue or PR as related to a new feature.
Milestone

Comments

@fabriziopandini
Copy link
Member

Detailed Description

This is part of the activities for the implementation of the ClusterClass proposal.

The ClusterTopologyReconciler, after reading the cluster class, the current cluster and computing the desired state, is required to reconcile current and desired state.

This issue is about implementing reconcile current and desired state for the MachineDeployments object.

The operation could be divided into two steps:

  1. Comparing desired MachineDeployments and current MachineDeployments and identify
    a. desired MachineDeployments that does not exists yet
    b. desired MachineDeployments that already exists as a current Machine deployments
    c. current MachineDeployments that does not exists as desired

  2. Handle the three above use cases, executing create & update (reconcile) or delete MachineDeployment

create & update (reconcile) MachineDeployment is similar to what already discussed in #5036 or in #5049, with the difference that it is required to reconcile the BootstrapConfigTemplate and the InfrastructureMachineTemplate before reconciling the MachineDeployment (Note: both BootstrapConfigTemplate and the InfrastructureMachineTemplate should handle template rotation similarly to what discussed in #5049)

delete MachineDeployment instead is straight forward (it should delete the machine deployment only, this will trigger template deletion as well).

Anything else you would like to add:

NOTE: Given how reading the cluster class, the current cluster and computing the desired state are implemented it is possible to do following assumptions:

  • Cluster class and related templates have been read and are all in the clusterTopologyClass struct.
  • Current state of the cluster has been successfully read; please not that, at the first reconciliation, the current state is empty (with the exception of Cluster).
  • The desired state has been fully computed and it is in the clusterTopologyState struct.

NOTE: The func must provide logs at different level to allow debug

/kind feature

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Aug 9, 2021
@sbueringer
Copy link
Member

/assign

@fabriziopandini
Copy link
Member Author

/milestone v0.4

@k8s-ci-robot k8s-ci-robot added this to the v0.4 milestone Aug 9, 2021
@sbueringer
Copy link
Member

sbueringer commented Aug 9, 2021

@srm09 @fabriziopandini A question regarding "How can I correlate MachineDeployment?"

Based on the proposal I can identify MachineDeployments based on cluster.x-k8s.io/topology: <cluster-name>-<worker-set-name>. What is the current state of that? In case we move away from it, what alternative way is there to correlate current/desired MachineDeployments?

@sbueringer
Copy link
Member

@srm09 @fabriziopandini A question regarding "How can I correlate MachineDeployment?"

Based on the proposal I can identify MachineDeployments based on cluster.x-k8s.io/topology: <cluster-name>-<worker-set-name>. What is the current state of that? In case we move away from it, what alternative way is there to correlate current/desired MachineDeployments?

/cc @killianmuldoon

@sbueringer
Copy link
Member

sbueringer commented Aug 9, 2021

@fabriziopandini A few questions:

  1. In case we want to update a MachineDeployment with template rotation on one of the templates. The templates in current & desired have intentionally the same name. Should we just generate a new name analog to how we did it in computeDesiredState to avoid a name conflict? (I'm not sure if there are good/any alternatives)

  2. The naive way to implement template rotation is:

    • update MachineDeployment
      • if InfrastructureMachineTemplate changed:
        • create new InfrastructureMachineTemplate and update ref in local desired MachineDeployment
      • if BootstrapTemplate changed:
        • create new BootstrapTemplate and update ref in local desired MachineDeployment
      • if MachineDeployment changed (either through a ref change or through another field in the MachineDeployment)
        • update the MachineDeployment

Unfortunately, this will leak templates in case something fails after the first template creation. Do we already have a way to avoid leaking templates?

Possible solutions:

  • try to find out if a correct new template already exists before creating a new one and in case it does, use that one
    • this covers some template leaks but not all. E.g. after we updated the MachineDeployment we have to potentially delete old BootstrapTemplates and InfrastructureMachineTemplates, this is not re-entrant in case the deletion fails as the MachineDeployment will then already only have references to the new templates.
  • cleanup leaked templates at some point, e.g. listing all "cluster-related" templates at some point and deleting all that are not referenced by the current cluster resource. This is also probably hard to do as the bootstrap provider can change and then we don't know which template type we should list (or we could exclude that edge case)
  • ignore for now and open follow-up issue

@killianmuldoon
Copy link
Contributor

@srm09 @fabriziopandini A question regarding "How can I correlate MachineDeployment?"
Based on the proposal I can identify MachineDeployments based on cluster.x-k8s.io/topology: <cluster-name>-<worker-set-name>. What is the current state of that? In case we move away from it, what alternative way is there to correlate current/desired MachineDeployments?

/cc @killianmuldoon

This needs to be updated in the proposal is my understanding. In the getCurrentState PR we're using two labels to get the machineDeployments related to the managed cluster: the topology label to show that the cluster is actually managed and then the clusterName label to show which cluster it's a part of. The code for this check is:
err := r.Client.List(ctx, md, client.MatchingLabels{ clusterv1.ClusterLabelName: cluster.Name, clusterv1.ClusterTopologyLabelName: "", })

@sbueringer
Copy link
Member

fyi Fabrizio added the comment for the issue for the "underlying" implementation in #4999 (comment) (which fulfills my requirement to be able to correlate MachineDeployments)

@fabriziopandini
Copy link
Member Author

@sbueringer

Should we just generate a new name analog to how we did it in computeDesiredState to avoid a name conflict?

Yes, in case of template rotation, we should generate a new name and then update the reference accordingly

Do we already have a way to avoid leaking templates?

I have explored a couple of options with @vincepri , but at the end there is always risk of orphaning templates when doing template rotation. I'm considering to add a final step in the controller cleaning up orphan templates; it is on my agenda to open an issue for this point.

@sbueringer
Copy link
Member

/area topology

@killianmuldoon killianmuldoon added the area/clusterclass Issues or PRs related to clusterclass label May 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/clusterclass Issues or PRs related to clusterclass kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants