
KubeadmConfig owner references not restored #6980

Closed
killianmuldoon opened this issue Jul 26, 2022 · 7 comments
Assignees
killianmuldoon
Labels
area/bootstrap: Issues or PRs related to bootstrap providers
kind/bug: Categorizes issue or PR as related to a bug.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.

Comments

@killianmuldoon
Contributor

When Cluster API is backed up and restored, the ownerReferences on the KubeadmConfig objects are not restored. As the KubeadmConfig reconciler checks for the ownerReference, these KubeadmConfigs are not reconciled after a restore is done.
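
For context, here is a minimal sketch of the kind of owner check involved (an assumption about how the check behaves, not the actual upstream reconciler code; the package and function names are illustrative only): the controller looks up the owning Machine through the object's ownerReferences and returns early when none is found, so a restored KubeadmConfig with empty ownerReferences is never processed.

package sketch

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	ctrl "sigs.k8s.io/controller-runtime"

	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
)

// reconcileOwnerCheck illustrates the early return: without an owning Machine
// in ownerReferences, the KubeadmConfig is skipped and its status is never set.
func reconcileOwnerCheck(config *bootstrapv1.KubeadmConfig) (ctrl.Result, error) {
	owner := metav1.GetControllerOf(config) // nil on a restored object whose ownerReferences were dropped
	if owner == nil || owner.Kind != "Machine" {
		// No owning Machine found: nothing to do, skip reconciliation.
		return ctrl.Result{}, nil
	}
	// ...normal reconciliation (certificates, bootstrap data secret, status) would continue here...
	return ctrl.Result{}, nil
}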

I'm not sure whether this has a practical impact on the functioning of the cluster, because:

  1. New Machines will create new KubeadmConfigs, so new machine bootstrapping should not be impacted.

  2. If the machines being restored are already bootstrapped, the status fields set during reconciliation shouldn't be used in the future.

It's possible that the lack of a reconcile on restored KubeadmConfigs will have a different impact for MachinePools.

Regardless, I think we should attempt to restore the ownerReferences on KubeadmConfig objects if they don't have them. We could rebuild the reference from the Cluster name plus the ControlPlane / MachineDeployment name labels.
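
A minimal sketch of how such a rebuild could look, assuming one possible variant (filter Machines by the cluster.x-k8s.io/cluster-name label, then match on each Machine's bootstrap configRef); this is not necessarily how it will eventually be fixed, and the function name is hypothetical:

package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	bootstrapv1 "sigs.k8s.io/cluster-api/bootstrap/kubeadm/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// restoreOwnerRef re-creates the Machine ownerReference on a restored
// KubeadmConfig by finding the Machine in the same cluster whose bootstrap
// configRef points at this config.
func restoreOwnerRef(ctx context.Context, c client.Client, config *bootstrapv1.KubeadmConfig) error {
	machines := &clusterv1.MachineList{}
	if err := c.List(ctx, machines,
		client.InNamespace(config.Namespace),
		client.MatchingLabels{"cluster.x-k8s.io/cluster-name": config.Labels["cluster.x-k8s.io/cluster-name"]},
	); err != nil {
		return err
	}
	isController := true
	for i := range machines.Items {
		m := &machines.Items[i]
		ref := m.Spec.Bootstrap.ConfigRef
		if ref == nil || ref.Kind != "KubeadmConfig" || ref.Name != config.Name {
			continue
		}
		// Re-attach the Machine as controller owner, mirroring what is normally
		// set on a freshly created KubeadmConfig.
		config.OwnerReferences = append(config.OwnerReferences, metav1.OwnerReference{
			APIVersion:         clusterv1.GroupVersion.String(),
			Kind:               "Machine",
			Name:               m.Name,
			UID:                m.UID,
			Controller:         &isController,
			BlockOwnerDeletion: &isController,
		})
		return c.Update(ctx, config)
	}
	return nil
}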

/area bootstrap
/kind bug
/kind cleanup
Possibly related: #3134

@k8s-ci-robot added the area/bootstrap, kind/bug and kind/cleanup labels on Jul 26, 2022
@ywk253100

Here is the KubeadmConfig before the backup, taken during my testing:

{
  "apiVersion":"bootstrap.cluster.x-k8s.io/v1beta1",
  "kind":"KubeadmConfig",
  "metadata":{
    "annotations":{
      "cluster.x-k8s.io/cloned-from-groupkind":"KubeadmConfigTemplate.bootstrap.cluster.x-k8s.io",
      "cluster.x-k8s.io/cloned-from-name":"tkg-vc-antrea-md-0"
    },
    "creationTimestamp":"2022-07-26T02:00:44Z",
    "generation":2,
    "labels":{
      "cluster.x-k8s.io/cluster-name":"tkg-vc-antrea",
      "cluster.x-k8s.io/deployment-name":"tkg-vc-antrea-md-0",
      "machine-template-hash":"2318501170",
      "node-pool":"tkg-vc-antrea-worker-pool"
    },
    "name":"tkg-vc-antrea-md-0-g9vlq",
    "namespace":"default",
    "ownerReferences":[
      {
        "apiVersion":"cluster.x-k8s.io/v1beta1",
        "kind":"MachineSet",
        "name":"tkg-vc-antrea-md-0-675d9455c4",
        "uid":"aaf0f32e-30bf-48ec-98b2-3460025abf79"
      },
      {
        "apiVersion":"cluster.x-k8s.io/v1beta1",
        "blockOwnerDeletion":true,
        "controller":true,
        "kind":"Machine",
        "name":"tkg-vc-antrea-md-0-675d9455c4-crnkg",
        "uid":"79e13399-c348-4752-82d5-1aa09370a284"
      }
    ],
    "resourceVersion":"33144",
    "uid":"988556f1-7566-4eff-8470-ecf52c400455"
  },
  "spec":{
    "files":[

    ],
    "format":"cloud-config",
    "joinConfiguration":{
      "discovery":{
        "bootstrapToken":{
          "apiServerEndpoint":"10.180.130.83:6443",
          "caCertHashes":[
            "sha256:ea50162f9a80682561413b2dac768e7b1de60350adb30c090f71ae5645203ec7"
          ],
          "token":"6p9lrl.9f56wnkq55vkjgro"
        }
      },
      "nodeRegistration":{
        "criSocket":"/var/run/containerd/containerd.sock",
        "kubeletExtraArgs":{
          "cloud-provider":"external",
          "tls-cipher-suites":"xxx"
        },
        "name":"{{ ds.meta_data.hostname }}"
      }
    },
    "preKubeadmCommands":[
      "hostname \"{{ ds.meta_data.hostname }}\"",
      "echo \"::1         ipv6-localhost ipv6-loopback\" \u003e/etc/hosts",
      "echo \"127.0.0.1   localhost\" \u003e\u003e/etc/hosts",
      "echo \"127.0.0.1   {{ ds.meta_data.hostname }}\" \u003e\u003e/etc/hosts",
      "echo \"{{ ds.meta_data.hostname }}\" \u003e/etc/hostname",
      "sed -i 's|\".*/pause|\"projects-stg.registry.vmware.com/tkg/pause|' /etc/containerd/config.toml",
      "systemctl restart containerd"
    ],
    "useExperimentalRetryJoin":true,
    "users":[
      {
        "name":"capv",
        "sshAuthorizedKeys":[
          "ssh-rsa xxx"
        ],
        "sudo":"ALL=(ALL) NOPASSWD:ALL"
      }
    ]
  },
  "status":{
    "conditions":[
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"Ready"
      },
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"CertificatesAvailable"
      },
      {
        "lastTransitionTime":"2022-07-26T02:02:45Z",
        "status":"True",
        "type":"DataSecretAvailable"
      }
    ],
    "dataSecretName":"tkg-vc-antrea-md-0-g9vlq",
    "observedGeneration":2,
    "ready":true
  }
}

The same one after restoring:

{
    "apiVersion": "bootstrap.cluster.x-k8s.io/v1beta1",
    "kind": "KubeadmConfig",
    "metadata": {
        "annotations": {
            "cluster.x-k8s.io/cloned-from-groupkind": "KubeadmConfigTemplate.bootstrap.cluster.x-k8s.io",
            "cluster.x-k8s.io/cloned-from-name": "tkg-vc-antrea-md-0"
        },
        "creationTimestamp": "2022-07-26T03:06:52Z",
        "generation": 1,
        "labels": {
            "cluster.x-k8s.io/cluster-name": "tkg-vc-antrea",
            "cluster.x-k8s.io/deployment-name": "tkg-vc-antrea-md-0",
            "machine-template-hash": "2318501170",
            "node-pool": "tkg-vc-antrea-worker-pool",
            "velero.io/backup-name": "prod-backup-include-146",
            "velero.io/restore-name": "prod-restore-include-146"
        },
        "name": "tkg-vc-antrea-md-0-g9vlq",
        "namespace": "default",
        "resourceVersion": "15925",
        "uid": "a348bd01-fbbc-42a6-bc11-178f4630cf82"
    },
    "spec": {
        "files": [],
        "format": "cloud-config",
        "joinConfiguration": {
            "discovery": {
                "bootstrapToken": {
                    "apiServerEndpoint": "10.180.130.83:6443",
                    "caCertHashes": [
                        "sha256:ea50162f9a80682561413b2dac768e7b1de60350adb30c090f71ae5645203ec7"
                    ],
                    "token": "6p9lrl.9f56wnkq55vkjgro"
                }
            },
            "nodeRegistration": {
                "criSocket": "/var/run/containerd/containerd.sock",
                "kubeletExtraArgs": {
                    "cloud-provider": "external",
                    "tls-cipher-suites": "xxx"
                },
                "name": "{{ ds.meta_data.hostname }}"
            }
        },
        "preKubeadmCommands": [
            "hostname \"{{ ds.meta_data.hostname }}\"",
            "echo \"::1         ipv6-localhost ipv6-loopback\" \u003e/etc/hosts",
            "echo \"127.0.0.1   localhost\" \u003e\u003e/etc/hosts",
            "echo \"127.0.0.1   {{ ds.meta_data.hostname }}\" \u003e\u003e/etc/hosts",
            "echo \"{{ ds.meta_data.hostname }}\" \u003e/etc/hostname",
            "sed -i 's|\".*/pause|\"projects-stg.registry.vmware.com/tkg/pause|' /etc/containerd/config.toml",
            "systemctl restart containerd"
        ],
        "useExperimentalRetryJoin": true,
        "users": [
            {
                "name": "capv",
                "sshAuthorizedKeys": [
                    "ssh-rsa xxx"
                ],
                "sudo": "ALL=(ALL) NOPASSWD:ALL"
            }
        ]
    }
}

@fabriziopandini added and then removed the triage/accepted label on Jul 29, 2022
@sbueringer
Member

I guess one impact of the current behavior is that cluster deletion would ignore the restored KubeadmConfigs without ownerRefs?

@ywk253100

There is no status section if the restored KubeadmConfig cannot be adopted by its owner.

This may impact downstream consumers, e.g. Tanzu considers the machine to still be configuring because it cannot determine the status.

@fabriziopandini
Member

IMO fixing this behavior is mostly about ensuring proper cleanup.

WRT downstream consumers, I don't think they should check the status of the BootstrapConfig if machines are already provisioned; it is redundant and not representative of the current machine state (bootstrap has already completed).

@killianmuldoon
Contributor Author

/assign

@killianmuldoon
Contributor Author

/close

This is fixed in #7394

@k8s-ci-robot
Contributor

@killianmuldoon: Closing this issue.

In response to this:

/close

This is fixed in #7394

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
