Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🐛 Remove bastion security group when disabling the bastion #2114

Merged
merged 1 commit into from
Jun 7, 2024

Conversation

EmilienM
Copy link
Contributor

@EmilienM EmilienM commented Jun 5, 2024

What this PR does / why we need it:

We reconcile the security groups before the bastion, because the bastion
needs its security group to be created first when managed security groups are enabled.
When the bastion is disabled, we will try to delete the security group if it exists.
In the first attempt, the security group will still be in-use by the bastion instance
but then the bastion instance will be deleted in the next reconcile loop.
We do that here because we don't want to manage the bastion security group from
elsewhere, that could cause infinite loops between ReconcileSecurityGroups and ReconcileBastion.
Therefore we try to delete the bastion security group as a best effort here
and also when the cluster is deleted so we're sure it will be deleted at some point.

Also, we're trying to remove it when the cluster is deleted in case it
wasn't done before. This doesn't trigger an error if the security group
didn't exist.

Adding e2e tests to cover the scenarios:

  • Disabling the bastion should reduce the total number of managed SGs to 2.
  • Re-enabling it should make it to 3 SGs.

Which issue(s) this PR fixes:
Fixes #2113

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 5, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 5, 2024
Copy link

netlify bot commented Jun 5, 2024

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

Name Link
🔨 Latest commit 489961a
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/6661b1f1094e2100089a329a
😎 Deploy Preview https://deploy-preview-2114--kubernetes-sigs-cluster-api-openstack.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@EmilienM EmilienM changed the title Remove bastion security group when disabling the bastion 🐛 Remove bastion security group when disabling the bastion Jun 5, 2024
@EmilienM
Copy link
Contributor Author

EmilienM commented Jun 5, 2024

/hold
I need to fix unit tests

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 5, 2024
@EmilienM EmilienM marked this pull request as draft June 6, 2024 01:58
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2024
@jichenjc
Copy link
Contributor

jichenjc commented Jun 6, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2024
We reconcile the security groups before the bastion, because the bastion
needs its security group to be created first when managed security groups are enabled.
When the bastion is disabled, we will try to delete the security group if it exists.
In the first attempt, the security group will still be in-use by the bastion instance
but then the bastion instance will be deleted in the next reconcile loop.
We do that here because we don't want to manage the bastion security group from
elsewhere, that could cause infinite loops between ReconCileSecurityGroups and ReconcileBastion.
Therefore we try to delete the bastion security group as a best effort here
and also when the cluster is deleted so we're sure it will be deleted at some point.

Also, we're trying to remove it when the cluster is deleted in case it
wasn't done before. This doesn't trigger an error if the security group
didn't exist.

Adding e2e tests to cover the scenarios:
* Disabling the bastion should reduce the total number of managed SGs to
  2.
* Re-enabling it should make it to 3 SGs.
@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 6, 2024
@EmilienM EmilienM marked this pull request as ready for review June 6, 2024 12:56
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 6, 2024
@k8s-ci-robot k8s-ci-robot requested a review from mdbooth June 6, 2024 12:56
@EmilienM
Copy link
Contributor Author

EmilienM commented Jun 6, 2024

/hold cancel
/cc mdbooth

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 6, 2024
// Therefore we try to delete the bastion security group as a best effort here
// and also when the cluster is deleted so we're sure it will be deleted at some point.
// https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/2113
if err := s.deleteSecurityGroup(openStackCluster, secBastionGroupName); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was looking for ways to avoid often calling neutron in every reconciliation, but I don't think there is any way how to check if the sg should be retrieved and deleted as the sg reference is not present on the cluster obj anymore. Do you see any possible way?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't think on a way to not make these calls, unless we expose the risk of race conditions if we would do it once early in the function, to get all SGs, and later in the function consume its data, but seconds could happen between the calls... Not idea either but worth considering the risk anyway, since the deletion isn't fatal and creation would fail if it exists already.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong opinion here. In general the solution looks good

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only slight worry here is that we're potentially short-cutting everything else that this reconciliation does for something we're expecting to fail sometimes.

I've tried to think through all the ordering scenarios and I suspect this is ok. However, I wonder if it would have been safer to add this delete step at the bottom of the reconcile immediately before setting BastionSecurityGroup to nil.

@MaysaMacedo
Copy link
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 6, 2024
@jichenjc
Copy link
Contributor

jichenjc commented Jun 7, 2024

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jichenjc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 7, 2024
@k8s-ci-robot k8s-ci-robot merged commit 0b793c1 into kubernetes-sigs:main Jun 7, 2024
9 checks passed
@EmilienM EmilienM deleted the issue_2113 branch June 7, 2024 01:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Bastion security group is not deleted when removing a cluster that had bastion and then disabled
5 participants