🐛 Remove bastion security group when disabling the bastion #2114
/hold
/lgtm
We reconcile the security groups before the bastion, because the bastion needs its security group to be created first when managed security groups are enabled.
When the bastion is disabled, we try to delete the security group if it exists. On the first attempt, the security group will still be in use by the bastion instance, but the bastion instance will then be deleted in the next reconcile loop.
We do that here because we don't want to manage the bastion security group from elsewhere, which could cause infinite loops between ReconcileSecurityGroups and ReconcileBastion. Therefore we try to delete the bastion security group as a best effort here, and again when the cluster is deleted in case it wasn't done before, so we're sure it will be deleted at some point. This doesn't trigger an error if the security group doesn't exist.
Adding e2e tests to cover the scenarios:
* Disabling the bastion should reduce the total number of managed SGs to 2.
* Re-enabling it should bring the total back to 3 SGs.
/hold cancel
// Therefore we try to delete the bastion security group as a best effort here
// and also when the cluster is deleted so we're sure it will be deleted at some point.
// https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/2113
if err := s.deleteSecurityGroup(openStackCluster, secBastionGroupName); err != nil {
I was looking for a way to avoid calling Neutron in every reconciliation, but I don't think there is any way to check whether the SG should be retrieved and deleted, since the SG reference is no longer present on the cluster object. Do you see any possible way?
I can't think of a way to avoid these calls, unless we accept the risk of race conditions: we could fetch all SGs once early in the function and consume that data later in the function, but seconds could pass between the two. Not ideal either, but the risk is worth considering anyway, since the deletion isn't fatal and creation would fail if the SG already exists.
I don't have a strong opinion here. In general the solution looks good.
My only slight worry here is that we're potentially short-cutting everything else that this reconciliation does for something we're expecting to fail sometimes.
I've tried to think through all the ordering scenarios and I suspect this is ok. However, I wonder if it would have been safer to add this delete step at the bottom of the reconcile immediately before setting BastionSecurityGroup to nil.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: jichenjc
What this PR does / why we need it:
We reconcile the security groups before the bastion, because the bastion
needs its security group to be created first when managed security groups are enabled.
When the bastion is disabled, we try to delete the security group if it exists.
On the first attempt, the security group will still be in use by the bastion instance,
but the bastion instance will then be deleted in the next reconcile loop.
We do that here because we don't want to manage the bastion security group from
elsewhere, which could cause infinite loops between ReconcileSecurityGroups and ReconcileBastion.
Therefore we try to delete the bastion security group as a best effort here,
and again when the cluster is deleted in case it wasn't done before,
so we're sure it will be deleted at some point. This doesn't trigger an error
if the security group doesn't exist.
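Adding e2e tests to cover the scenarios:
* Disabling the bastion should reduce the total number of managed SGs to 2.
* Re-enabling it should bring the total back to 3 SGs.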
Which issue(s) this PR fixes:
Fixes #2113