🐛 Remove bastion security group when disabling the bastion #2114

Merged: 1 commit, merged on Jun 7, 2024
25 changes: 20 additions & 5 deletions pkg/cloud/services/networking/securitygroups.go
@@ -56,9 +56,24 @@ func (s *Service) ReconcileSecurityGroups(openStackCluster *infrav1.OpenStackClu
 		workerSuffix: secWorkerGroupName,
 	}
 
+	secBastionGroupName := getSecBastionGroupName(clusterResourceName)
 	if bastionEnabled {
-		secBastionGroupName := getSecBastionGroupName(clusterResourceName)
 		suffixToNameMap[bastionSuffix] = secBastionGroupName
+	} else {
+		// We reconcile the security groups before the bastion, because the bastion
+		// needs its security group to be created first when managed security groups are enabled.
+		// When the bastion is disabled, we will try to delete the security group if it exists.
+		// In the first attempt, the security group will still be in-use by the bastion instance
+		// but then the bastion instance will be deleted in the next reconcile loop.
+		// We do that here because we don't want to manage the bastion security group from
+		// elsewhere, that could cause infinite loops between ReconcileSecurityGroups and ReconcileBastion.
+		// Therefore we try to delete the bastion security group as a best effort here
+		// and also when the cluster is deleted so we're sure it will be deleted at some point.
+		// https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/2113
+		if err := s.deleteSecurityGroup(openStackCluster, secBastionGroupName); err != nil {

Contributor

I was looking for ways to avoid calling Neutron on every reconciliation, but I don't think there is any way to check whether the SG should be retrieved and deleted, since the SG reference is no longer present on the cluster object. Do you see any possible way?

Contributor Author

I can't think of a way to avoid these calls without exposing us to race conditions: we could list all SGs once, early in the function, and consume that data later in the function, but seconds could pass between the list and its use. Not ideal either, but the risk is worth considering anyway, since a failed deletion isn't fatal and creation would fail if the group already exists.

Contributor

I don't have a strong opinion here. In general, the solution looks good.

Contributor

My only slight worry here is that we're potentially short-cutting everything else that this reconciliation does for something we're expecting to fail sometimes.

I've tried to think through all the ordering scenarios and I suspect this is ok. However, I wonder if it would have been safer to add this delete step at the bottom of the reconcile immediately before setting BastionSecurityGroup to nil.

+			s.scope.Logger().Info("Non-fatal error when deleting the bastion security group", "name", secBastionGroupName, "error", err)
+			return nil
+		}
 	}
 
 	// create security groups first, because desired rules use group ids.
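The best-effort deletion described in the code comment above can be sketched in isolation. This is a minimal Go sketch under stated assumptions: `SecGroupAPI`, `ErrInUse`, `deleteByName`, `reconcileBastionGroup`, and `fakeAPI` are hypothetical names, not the provider's or gophercloud's API. It mirrors the merged control flow: when the bastion is disabled the delete is attempted, and a failure (for example, the group is still attached to the bastion instance) is logged and ends the pass early, like the `return nil` in the diff, so a later reconcile can retry.

```go
package main

import (
	"errors"
	"log"
)

// SecGroup is a minimal, hypothetical stand-in for a Neutron security group.
type SecGroup struct {
	ID   string
	Name string
}

// SecGroupAPI is a hypothetical client interface (not the gophercloud API).
type SecGroupAPI interface {
	ListByName(name string) ([]SecGroup, error)
	Delete(id string) error
}

// ErrInUse is a hypothetical error for a group still attached to an instance.
var ErrInUse = errors.New("security group in use")

// deleteByName deletes every group with the given name. Finding no group is
// not an error, so the call is safe to repeat on every reconcile.
func deleteByName(api SecGroupAPI, name string) error {
	groups, err := api.ListByName(name)
	if err != nil {
		return err
	}
	for _, g := range groups {
		if err := api.Delete(g.ID); err != nil {
			return err
		}
	}
	return nil
}

// reconcileBastionGroup follows the shape of the merged change. When the
// bastion is disabled it attempts the delete; on failure it logs and returns
// false, meaning "stop this pass early", so a later pass can retry.
func reconcileBastionGroup(api SecGroupAPI, bastionEnabled bool, name string) bool {
	if bastionEnabled {
		return true // the group is kept and reconciled with the others
	}
	if err := deleteByName(api, name); err != nil {
		log.Printf("non-fatal error when deleting the bastion security group: name=%s error=%v", name, err)
		return false
	}
	return true
}

// fakeAPI is an in-memory SecGroupAPI for exercising the sketch.
type fakeAPI struct {
	groups map[string]SecGroup // keyed by ID
	inUse  map[string]bool     // IDs still attached to an instance
}

func (f *fakeAPI) ListByName(name string) ([]SecGroup, error) {
	var out []SecGroup
	for _, g := range f.groups {
		if g.Name == name {
			out = append(out, g)
		}
	}
	return out, nil
}

func (f *fakeAPI) Delete(id string) error {
	if f.inUse[id] {
		return ErrInUse
	}
	delete(f.groups, id)
	return nil
}
```

The reviewer's alternative in the thread above would run this delete after the rest of the security-group reconcile, immediately before `BastionSecurityGroup` is set to nil, so a failing delete could not skip the other steps; in this sketch, that corresponds to calling `reconcileBastionGroup` last, when there is nothing left for its result to cut short.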
@@ -351,10 +366,10 @@ func (s *Service) DeleteSecurityGroups(openStackCluster *infrav1.OpenStackCluste
 	secGroupNames := []string{
 		getSecControlPlaneGroupName(clusterResourceName),
 		getSecWorkerGroupName(clusterResourceName),
-	}
-
-	if openStackCluster.Spec.Bastion.IsEnabled() {
-		secGroupNames = append(secGroupNames, getSecBastionGroupName(clusterResourceName))
+		// Even if the bastion might be disabled, we still try to delete the security group in case
+		// we had a bastion before and for some reason we didn't delete its security group.
+		// https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/2113
+		getSecBastionGroupName(clusterResourceName),
 	}
 
 	for _, secGroupName := range secGroupNames {
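This hunk drops the `IsEnabled()` guard, so cluster deletion always includes the bastion group. That is only safe if deleting a group that does not exist is a no-op, which the unit-test change below suggests (a list that returns nothing and no delete call). A minimal sketch under that assumption: `groupNames`, `deleteSecurityGroups`, the in-memory name-to-ID `store`, and the `-controlplane`/`-worker`/`-bastion` suffixes are all hypothetical, not the provider's naming scheme.

```go
package main

import "fmt"

// groupNames returns every security group a cluster may own. The bastion
// group is always included, whether or not the bastion is enabled now.
// The naming scheme is hypothetical.
func groupNames(cluster string) []string {
	return []string{
		fmt.Sprintf("%s-controlplane", cluster),
		fmt.Sprintf("%s-worker", cluster),
		fmt.Sprintf("%s-bastion", cluster),
	}
}

// deleteSecurityGroups removes each named group from store (name -> ID) and
// returns the names it actually deleted. Absent groups are skipped, so the
// call is idempotent and tolerates a bastion group that never existed.
func deleteSecurityGroups(store map[string]string, cluster string) []string {
	var deleted []string
	for _, name := range groupNames(cluster) {
		if _, ok := store[name]; !ok {
			continue // never created, or already deleted
		}
		delete(store, name)
		deleted = append(deleted, name)
	}
	return deleted
}
```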
1 change: 1 addition & 0 deletions pkg/cloud/services/networking/securitygroups_test.go
@@ -568,6 +568,7 @@ func TestService_ReconcileSecurityGroups(t *testing.T) {
 				Return([]groups.SecGroup{{ID: "0", Name: controlPlaneSGName}}, nil)
 			m.ListSecGroup(groups.ListOpts{Name: workerSGName}).
 				Return([]groups.SecGroup{{ID: "1", Name: workerSGName}}, nil)
+			m.ListSecGroup(groups.ListOpts{Name: bastionSGName}).Return(nil, nil)
 
 			// We expect a total of 12 rules to be created.
 			// Nothing actually looks at the generated
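The extra expectation is needed because, after this change, the reconcile also looks up the bastion group when the bastion is disabled (to delete it if it still exists). A strict test double rejects calls it was not told to expect, so the old expectations alone would make the test fail. The sketch below uses a hypothetical hand-written fake (`strictLister`, `reconcileLookups`), not the generated mock the real test uses; registering `nil` for the bastion name plays the role of `Return(nil, nil)`.

```go
package main

import "fmt"

// strictLister is a hypothetical fake: any List call without a registered
// expectation is reported as an error, similar in spirit to a strict mock.
type strictLister struct {
	expect map[string][]string // group name -> IDs to return
}

func (s *strictLister) List(name string) ([]string, error) {
	ids, ok := s.expect[name]
	if !ok {
		return nil, fmt.Errorf("unexpected call: List(%q)", name)
	}
	return ids, nil
}

// reconcileLookups is a hypothetical simplification of the reconcile: it
// lists the control-plane, worker, and bastion groups, stopping at the
// first failed lookup.
func reconcileLookups(l *strictLister, controlPlane, worker, bastion string) error {
	for _, name := range []string{controlPlane, worker, bastion} {
		if _, err := l.List(name); err != nil {
			return err
		}
	}
	return nil
}
```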
6 changes: 6 additions & 0 deletions test/e2e/suites/e2e/e2e_test.go
@@ -198,6 +198,9 @@ var _ = Describe("e2e tests [PR-Blocking]", func() {
 				return false, errors.New("Bastion was not removed in OpenStackCluster.Status")
 			}, e2eCtx.E2EConfig.GetIntervals(specName, "wait-bastion")...,
 			).Should(BeTrue())
+			securityGroupsList, err = shared.DumpOpenStackSecurityGroups(e2eCtx, groups.ListOpts{Tags: clusterName})
+			Expect(err).NotTo(HaveOccurred())
+			Expect(securityGroupsList).To(HaveLen(2))
 
 			shared.Logf("Delete the bastion")
 			openStackCluster, err = shared.ClusterForSpec(ctx, e2eCtx, namespace)
@@ -242,6 +245,9 @@ var _ = Describe("e2e tests [PR-Blocking]", func() {
 			Expect(err).NotTo(HaveOccurred())
 			Expect(openStackCluster.Spec.Bastion).To(Equal(openStackClusterWithNewBastionFlavor.Spec.Bastion))
 			Expect(openStackCluster.Status.Bastion).NotTo(BeNil(), "OpenStackCluster.Status.Bastion with new flavor has not been populated")
+			securityGroupsList, err = shared.DumpOpenStackSecurityGroups(e2eCtx, groups.ListOpts{Tags: clusterName})
+			Expect(err).NotTo(HaveOccurred())
+			Expect(securityGroupsList).To(HaveLen(3))
 		})
 	})
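The two new e2e assertions encode the expected number of security groups tagged with the cluster name: two while the bastion is disabled and three once it is re-enabled. Assuming those are the control-plane, worker, and bastion groups (the diff does not list them), the expectation reduces to a tag filter plus a count. `SecGroup`, `taggedWith`, and `expectedCount` are hypothetical helpers for illustration, not part of the e2e framework.

```go
package main

// SecGroup is a minimal, hypothetical stand-in for a listed security group.
type SecGroup struct {
	Name string
	Tags []string
}

// taggedWith returns the groups carrying tag, mirroring a list filtered by
// the cluster-name tag.
func taggedWith(groups []SecGroup, tag string) []SecGroup {
	var out []SecGroup
	for _, g := range groups {
		for _, t := range g.Tags {
			if t == tag {
				out = append(out, g)
				break
			}
		}
	}
	return out
}

// expectedCount assumes control-plane and worker groups always exist and the
// bastion group exists only while the bastion is enabled.
func expectedCount(bastionEnabled bool) int {
	if bastionEnabled {
		return 3
	}
	return 2
}
```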
