
KubeadmConfig owner references not restored #6980

Closed
killianmuldoon opened this issue Jul 26, 2022 · 7 comments
Assignees
killianmuldoon
Labels
area/bootstrap: Issues or PRs related to bootstrap providers
kind/bug: Categorizes issue or PR as related to a bug.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.

Comments

@killianmuldoon
Contributor

When Cluster API is backed up and restored, the ownerReferences on the KubeadmConfig objects are not restored. As the KubeadmConfig reconciler checks for the ownerReference, these KubeadmConfigs are not reconciled after a restore is done.
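
For context, here is a minimal sketch of the kind of owner check involved (an assumption about how the check behaves, not the actual upstream reconciler code; the package and function names are illustrative only): the controller looks up the owning Machine through the object's ownerReferences and returns early when none is found, so a restored KubeadmConfig with empty ownerReferences is never processed.

package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"

	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

// reconcileOwnerCheck illustrates the early return: without an owning Machine
// in ownerReferences, the KubeadmConfig is skipped and its status is never set.
func reconcileOwnerCheck(config *bootstrapv1.KubeadmConfig) (ctrl.Result, error) {
	owner := metav1.GetControllerOf(config) // nil on a restored object whose ownerReferences were dropped
	if owner == nil || owner.Kind != "Machine" {
		// No owning Machine found: nothing to do, skip reconciliation.
		return ctrl.Result{}, nil
	}
	// ...normal reconciliation (certificates, bootstrap data secret, status) would continue here...
	return ctrl.Result{}, nil
}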

I'm not sure whether this has a practical impact on the functioning of the cluster, because:

  1. New Machines will create new KubeadmConfigs, so new machine bootstrapping should not be impacted.

  2. If the machines being restored are already bootstrapped, the status fields set during reconciliation shouldn't be used in the future.

It's possible that the lack of a reconcile on restored KubeadmConfigs will have a different impact for MachinePools.

Regardless, I think we should attempt to restore the ownerReferences on KubeadmConfig objects if they don't have them. We could rebuild the reference from the Cluster name plus the ControlPlane / MachineDeployment name labels.
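
A minimal sketch of how such a rebuild could look, assuming one possible variant (filter Machines by the cluster.x-k8s.io/cluster-name label, then match on each Machine's bootstrap configRef); this is not necessarily how it will eventually be fixed, and the function name is hypothetical:

package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// restoreOwnerRef re-creates the Machine ownerReference on a restored
// KubeadmConfig by finding the Machine in the same cluster whose bootstrap
// configRef points at this config.
func restoreOwnerRef(ctx context.Context, c client.Client, config *bootstrapv1.KubeadmConfig) error {
	machines := &clusterv1.MachineList{}
	if err := c.List(ctx, machines,
		client.InNamespace(config.Namespace),
		client.MatchingLabels{"cluster.x-k8s.io/cluster-name": config.Labels["cluster.x-k8s.io/cluster-name"]},
	); err != nil {
		return err
	}
	isController := true
	for i := range machines.Items {
		m := &machines.Items[i]
		ref := m.Spec.Bootstrap.ConfigRef
		if ref == nil || ref.Kind != "KubeadmConfig" || ref.Name != config.Name {
			continue
		}
		// Re-attach the Machine as controller owner, mirroring what is normally
		// set on a freshly created KubeadmConfig.
		config.OwnerReferences = append(config.OwnerReferences, metav1.OwnerReference{
			APIVersion:         clusterv1.GroupVersion.String(),
			Kind:               "Machine",
			Name:               m.Name,
			UID:                m.UID,
			Controller:         &isController,
			BlockOwnerDeletion: &isController,
		})
		return c.Update(ctx, config)
	}
	return nil
}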

/area bootstrap
/kind bug
/kind cleanup
Possibly related: #3134

@k8s-ci-robot added the area/bootstrap, kind/bug and kind/cleanup labels on Jul 26, 2022
@ywk253100

Here is the KubeadmConfig before the backup, taken during my testing:

{
  "apiVersion":"bootstrap.cluster.x-k8s.io/v1beta1",
  "kind":"KubeadmConfig",
  "metadata":{
    "annotations":{
      "cluster.x-k8s.io/cloned-from-groupkind":"KubeadmConfigTemplate.bootstrap.cluster.x-k8s.io",
      "cluster.x-k8s.io/cloned-from-name":"tkg-vc-antrea-md-0"
    },
    "creationTimestamp":"2022-07-26T02:00:44Z",
    "generation":2,
    "labels":{
      "cluster.x-k8s.io/cluster-name":"tkg-vc-antrea",
      "cluster.x-k8s.io/deployment-name":"tkg-vc-antrea-md-0",
      "machine-template-hash":"2318501170",
      "node-pool":"tkg-vc-antrea-worker-pool"
    },
    "name":"tkg-vc-antrea-md-0-g9vlq",
    "namespace":"default",
    "ownerReferences":[
      {
        "apiVersion":"cluster.x-k8s.io/v1beta1",
        "kind":"MachineSet",
        "name":"tkg-vc-antrea-md-0-675d9455c4",
        "uid":"aaf0f32e-30bf-48ec-98b2-3460025abf79"
      },
      {
        "apiVersion":"cluster.x-k8s.io/v1beta1",
        "blockOwnerDeletion":true,
        "controller":true,
        "kind":"Machine",
        "name":"tkg-vc-antrea-md-0-675d9455c4-crnkg",
        "uid":"79e13399-c348-4752-82d5-1aa09370a284"
      }
    ],
    "resourceVersion":"33144",
    "uid":"988556f1-7566-4eff-8470-ecf52c400455"
  },
  "spec":{
    "files":[

    ],
    "format":"cloud-config",
    "joinConfiguration":{
      "discovery":{
        "bootstrapToken":{
          "apiServerEndpoint":"10.180.130.83:6443",
          "caCertHashes":[
            "sha256:ea50162f9a80682561413b2dac768e7b1de60350adb30c090f71ae5645203ec7"
          ],
          "token":"6p9lrl.9f56wnkq55vkjgro"
        }
      },
      "nodeRegistration":{
        "criSocket":"/var/run/containerd/containerd.sock",
        "kubeletExtraArgs":{
          "cloud-provider":"external",
          "tls-cipher-suites":"xxx"
        },
        "name":"{{ ds.meta_data.hostname }}"
      }
    },
    "preKubeadmCommands":[
      "hostname \"{{ ds.meta_data.hostname }}\"",
      "echo \"::1         ipv6-localhost ipv6-loopback\" \u003e/etc/hosts",
      "echo \"127.0.0.1   localhost\" \u003e\u003e/etc/hosts",
      "echo \"127.0.0.1   {{ ds.meta_data.hostname }}\" \u003e\u003e/etc/hosts",
      "echo \"{{ ds.meta_data.hostname }}\" \u003e/etc/hostname",
      "sed -i 's|\".*/pause|\"projects-stg.registry.vmware.com/tkg/pause|' /etc/containerd/config.toml",
      "systemctl restart containerd"
    ],
    "useExperimentalRetryJoin":true,
    "users":[
      {
        "name":"capv",
        "sshAuthorizedKeys":[
          "ssh-rsa xxx"
        ],
        "sudo":"ALL=(ALL) NOPASSWD:ALL"
      }
    ]
  },
  "status":{
    "conditions":[
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"Ready"
      },
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"CertificatesAvailable"
      },
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"DataSecretAvailable"
      }
    ],
    "dataSecretName":"tkg-vc-antrea-md-0-g9vlq",
    "observedGeneration":2,
    "ready":true
  }
}

The same one after restoring:

{
    "apiVersion": "bootstrap.cluster.x-k8s.io/v1beta1",
    "kind": "KubeadmConfig",
    "metadata": {
        "annotations": {
            "cluster.x-k8s.io/cloned-from-groupkind": "KubeadmConfigTemplate.bootstrap.cluster.x-k8s.io",
            "cluster.x-k8s.io/cloned-from-name": "tkg-vc-antrea-md-0"
        },
        "creationTimestamp": "2022-07-26T03:06:52Z",
        "generation": 1,
        "labels": {
            "cluster.x-k8s.io/cluster-name": "tkg-vc-antrea",
            "cluster.x-k8s.io/deployment-name": "tkg-vc-antrea-md-0",
            "machine-template-hash": "2318501170",
            "node-pool": "tkg-vc-antrea-worker-pool",
            "velero.io/backup-name": "prod-backup-include-146",
            "velero.io/restore-name": "prod-restore-include-146"
        },
        "name": "tkg-vc-antrea-md-0-g9vlq",
        "namespace": "default",
        "resourceVersion": "15925",
        "uid": "a348bd01-fbbc-42a6-bc11-178f4630cf82"
    },
    "spec": {
        "files": [],
        "format": "cloud-config",
        "joinConfiguration": {
            "discovery": {
                "bootstrapToken": {
                    "apiServerEndpoint": "10.180.130.83:6443",
                    "caCertHashes": [
                        "sha256:ea50162f9a80682561413b2dac768e7b1de60350adb30c090f71ae5645203ec7"
                    ],
                    "token": "6p9lrl.9f56wnkq55vkjgro"
                }
            },
            "nodeRegistration": {
                "criSocket": "/var/run/containerd/containerd.sock",
                "kubeletExtraArgs": {
                    "cloud-provider": "external",
                    "tls-cipher-suites": "xxx"
                },
                "name": "{{ ds.meta_data.hostname }}"
            }
        },
        "preKubeadmCommands": [
            "hostname \"{{ ds.meta_data.hostname }}\"",
            "echo \"::1         ipv6-localhost ipv6-loopback\" \u003e/etc/hosts",
            "echo \"127.0.0.1   localhost\" \u003e\u003e/etc/hosts",
            "echo \"127.0.0.1   {{ ds.meta_data.hostname }}\" \u003e\u003e/etc/hosts",
            "echo \"{{ ds.meta_data.hostname }}\" \u003e/etc/hostname",
            "sed -i 's|\".*/pause|\"projects-stg.registry.vmware.com/tkg/pause|' /etc/containerd/config.toml",
            "systemctl restart containerd"
        ],
        "useExperimentalRetryJoin": true,
        "users": [
            {
                "name": "capv",
                "sshAuthorizedKeys": [
                    "ssh-rsa xxx"
                ],
                "sudo": "ALL=(ALL) NOPASSWD:ALL"
            }
        ]
    }
}

@fabriziopandini added and then removed the triage/accepted label on Jul 29, 2022
@sbueringer
Member

I guess one impact of the current behavior is that cluster deletion would ignore the restored KubeadmConfigs without ownerRefs?

@ywk253100

There is no status section if the restored KubeadmConfig cannot be adopted by its owner.

This may impact downstream consumers, e.g. Tanzu considers the machine to still be configuring because it cannot determine the status.

@fabriziopandini
Member

IMO fixing this behavior is mostly about ensuring proper cleanup.

WRT downstream consumers, I don't think they should check the status of the BootstrapConfig if machines are already provisioned; it is redundant and not representative of the current machine state (bootstrap has already completed).

@killianmuldoon
Contributor Author

/assign

@killianmuldoon
Contributor Author

/close

This is fixed in #7394

@k8s-ci-robot
Contributor

@killianmuldoon: Closing this issue.

In response to this:

/close

This is fixed in #7394

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
