Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛Requeue while cluster's control plane object is being deleted #2519

Merged
merged 1 commit into from
Mar 4, 2020

Conversation

vincepri
Copy link
Member

@vincepri vincepri commented Mar 4, 2020

Signed-off-by: Vince Prignano [email protected]

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2507

/assign @sedefsavas @ncdc
/milestone v0.3.0-rc.3

@k8s-ci-robot k8s-ci-robot added this to the v0.3.0-rc.3 milestone Mar 4, 2020
@k8s-ci-robot
Copy link
Contributor

@vincepri: GitHub didn't allow me to assign the following users: sedefsavas.

Note that only kubernetes-sigs members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

Signed-off-by: Vince Prignano [email protected]

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #2507

/assign @sedefsavas @ncdc
/milestone v0.3.0-rc.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Mar 4, 2020
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vincepri

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 4, 2020
@k8s-ci-robot k8s-ci-robot requested review from chuckha and detiber March 4, 2020 01:29
@@ -219,6 +219,9 @@ func (r *ClusterReconciler) reconcileDelete(ctx context.Context, cluster *cluste
"failed to delete %v %q for Cluster %q in namespace %q",
obj.GroupVersionKind(), obj.GetName(), cluster.Name, cluster.Namespace)
}
default:
logger.Info("Cluster still has descendants - need to requeue", "controlPlaneRef", cluster.Spec.ControlPlaneRef.Name)
return ctrl.Result{RequeueAfter: deleteRequeueAfter}, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we can do this here - we have code in the KCP controller that defers deletion of the control plane machines if there are any non-control plane machines still around. Making this change here would cause a deadlock, I think.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forget why the ControlPlaneRef is handled first, shouldn't that be the last object to be deleted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure we must have had a reason for doing it first...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original PR #2094, trying to dig through the comments, I might just test it moving it down though

@vincepri
Copy link
Member Author

vincepri commented Mar 4, 2020

/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2020
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 4, 2020
@vincepri vincepri force-pushed the requeue-cp branch 2 times, most recently from 71d1218 to 3720fde Compare March 4, 2020 03:16
@vincepri
Copy link
Member Author

vincepri commented Mar 4, 2020

The latest code change behaves as we want, the main changes are:

  • Avoid adding a Cluster OwnerReference when there is a controller OwnerRef, we already do this for MachineSets, seems fair to do the same here.
  • Only count ControlPlaneMachines when listing the Cluster's descendants if there is no control plane provider.

Deleting a cluster with $ k delete cluster test1 results in the Machines to be deleted first:

$ k get machines -A
NAMESPACE   NAME                         PROVIDERID                    PHASE
default     test1-control-plane-828k8    aws:////i-00710d0d692170b68   Running
default     test1-control-plane-ktbmh    aws:////i-0e531be42f37cb0a9   Running
default     test1-control-plane-lnnx8    aws:////i-0092f77ca43ce27d0   Running
default     test1-md-0-7bdfd9f44-k79d7   aws:////i-03d60eff678a27f0b   Deleting

Then after all the workers are gone, the control plane machines in KCP gets a deletion timestamp:

$ k get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io -o json | jq .items[0].metadata.deletionTimestamp
"2020-03-04T02:40:00Z"

And machines associated with it get deleted:

default     test1-control-plane-828k8   aws:////i-00710d0d692170b68   Deleting
default     test1-control-plane-ktbmh   aws:////i-0e531be42f37cb0a9   Deleting
default     test1-control-plane-lnnx8   aws:////i-0092f77ca43ce27d0   Deleting

After a little, the cluster is gone without errors:

$ k get cluster
No resources found in default namespace.

controllers/cluster_controller.go Outdated Show resolved Hide resolved
controllers/machine_controller.go Outdated Show resolved Hide resolved
@vincepri
Copy link
Member Author

vincepri commented Mar 4, 2020

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2020
@ncdc
Copy link
Contributor

ncdc commented Mar 4, 2020

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2020
@k8s-ci-robot k8s-ci-robot merged commit 3ccb9b9 into kubernetes-sigs:master Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Could not delete KubeadmControlPlane object if owner cluster has already being deleted
3 participants