Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve IBMPowerVSCluster deletion #1825

Merged
merged 1 commit into from
Jun 21, 2024

Conversation

dharaneeshvrd
Copy link
Contributor

What this PR does / why we need it:

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1805

Special notes for your reviewer:

/area provider/ibmcloud

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.

Release note:


@k8s-ci-robot k8s-ci-robot added area/provider/ibmcloud Issues or PRs related to ibmcloud provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 4, 2024
Copy link

netlify bot commented Jun 4, 2024

Deploy Preview for kubernetes-sigs-cluster-api-ibmcloud ready!

Name Link
🔨 Latest commit 726be6b
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-ibmcloud/deploys/66729e2be7e23a00081c2bfd
😎 Deploy Preview https://deploy-preview-1825--kubernetes-sigs-cluster-api-ibmcloud.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 4, 2024
@@ -311,7 +311,8 @@ func (r *IBMPowerVSClusterReconciler) reconcileDelete(ctx context.Context, clust

clusterScope.Info("Deleting DHCP server")
if err := clusterScope.DeleteDHCPServer(); err != nil {
allErrs = append(allErrs, errors.Wrapf(err, "failed to delete DHCP server"))
clusterScope.Error(err, "failed to delete DHCP server")
return reconcile.Result{RequeueAfter: 30 * time.Second}, nil
Copy link
Contributor

@Amulyam24 Amulyam24 Jun 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dharaneeshvrd, if the DHCP server deletion constantly fails, won't this keep requeuing the request and not proceed with PowerVS service instance deletion causing cluster deletion to be stuck?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently also same thing will happen with this block https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/blob/main/cloud/scope/powervs_cluster.go#L2543toL2554
We need to delete before proceeding to the powervs instance right? or we can try deleting powervs instance after few retries anyway?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, currently also same thing will happen with this block https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/blob/main/cloud/scope/powervs_cluster.go#L2543toL2554

Yes, currently it's doing the same. I'm trying to understand how this change will help

we can try deleting powervs instance after few retries anyway?

right, I feel we need to proceed with PowerVS service instance deletion despite failure with DHCP, wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not tested the scenario but in case of DHCP failure, will the service instance deletion also get stuck?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes if the dhcp server did not get deleted, I don't think powervs instance will get deleted successfully.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, Thanks. @Amulyam24 was there any particular reason for explicitly deleting DHCP server in CAPI?

Copy link
Contributor

@Amulyam24 Amulyam24 Jun 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the initial flow before adding proper state management of resources, we added the DHCP deletion and check to make sure PowerVS workspace deletion succeeds. We can consider removing it now and proceed with workspace deletion and retry until it is succeeds. We need to test it thoroughly to make sure it is cleaned up. @dharaneeshvrd can you please try this out and let us know?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure will try and see!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried many times in different region, the deletion went through without explicitly deleting the DHCP server.
Modified the code accordingly, please check!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, Thanks for verifying.

@Prajyot-Parab
Copy link
Contributor

/cc @Karthik-K-N

@k8s-ci-robot k8s-ci-robot requested a review from Karthik-K-N June 13, 2024 17:16
@dharaneeshvrd dharaneeshvrd force-pushed the improve-deletion branch 2 times, most recently from 644791b to 0730fa7 Compare June 17, 2024 12:51
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jun 17, 2024
@@ -2492,32 +2490,6 @@ func (s *PowerVSClusterScope) deleteTransitGatewayConnections(tg *tgapiv1.Transi
return requeue, nil
}

// DeleteDHCPServer deletes DHCP server.
func (s *PowerVSClusterScope) DeleteDHCPServer() error {
if !s.isResourceCreatedByController(infrav1beta2.ResourceTypeDHCPServer) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think there will be scenario where user provides existing workspace and expects the DHCP server to be created? If so in that case we need to delete only DHCP server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, we need to handle that. Will do the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed this, ptal!

@dharaneeshvrd dharaneeshvrd force-pushed the improve-deletion branch 2 times, most recently from 06c5ad5 to c795492 Compare June 18, 2024 04:05
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jun 18, 2024
@@ -2498,6 +2498,10 @@ func (s *PowerVSClusterScope) DeleteDHCPServer() error {
s.Info("Skipping DHP server deletion as resource is not created by controller")
return nil
}
if s.isResourceCreatedByController(infrav1beta2.ResourceTypeServiceInstance) {
s.Info("Skipping DHCP server deletion as PowerVS service instance is created by controller, will directly delete the PowerVS service instance")
Copy link
Contributor

@Karthik-K-N Karthik-K-N Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think user may not know PowerVS service instance delete will also delete DHCP server and may get confused after seeing word Skipping, I think its better to either use V(3) or update the log message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the log, let's have this log IMO, since its skipping a major action in delete flow.

ID: subnet.ID,
})

if err != nil {
if strings.Contains(err.Error(), string(VPCSubnetNotFound)) {
if resp.StatusCode == ResourceNotFoundCode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So is it safe to assume that resp wont be nil?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there is a possibility for it to be nil, added a nil check.

@dharaneeshvrd dharaneeshvrd force-pushed the improve-deletion branch 2 times, most recently from 2b7a3fd to dbe9618 Compare June 18, 2024 13:14
Copy link
Contributor

@Karthik-K-N Karthik-K-N left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2024
Copy link
Contributor

@Amulyam24 Amulyam24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit, otherwise LGTM. Thanks for verifying the deletion with DHCP server!


// TransitGatewayNotFound is the error returned when a transit gateway is not found.
TransitGatewayNotFound = ResourceNotFound("gateway was not found")
// ResourceNotFoundCode indicates the http status code used to denote that the resource not exists.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about simplifying as?

Suggested change
// ResourceNotFoundCode indicates the http status code used to denote that the resource not exists.
// ResourceNotFoundCode indicates the http status code when a resource does not exist.

Skip DHCP server deletion when PowerVS service instance is created by controller
Since PowerVS service instance deletion will take care of deleting the DHCP server as well
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 19, 2024
@dharaneeshvrd
Copy link
Contributor Author

@Karthik-K-N @Amulyam24 addressed the comments, please take a look!

Copy link
Contributor

@Amulyam24 Amulyam24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 21, 2024
Copy link
Contributor

@Karthik-K-N Karthik-K-N left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Member

@mkumatag mkumatag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Amulyam24, dharaneeshvrd, Karthik-K-N, mkumatag

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 21, 2024
@k8s-ci-robot k8s-ci-robot merged commit af94a98 into kubernetes-sigs:main Jun 21, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/provider/ibmcloud Issues or PRs related to ibmcloud provider cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhance deletion logic
6 participants