Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable deletion of floating IP on cluster deletion #1615

Closed
seanrmurphy opened this issue Jul 21, 2023 · 28 comments · Fixed by #1728
Closed

Disable deletion of floating IP on cluster deletion #1615

seanrmurphy opened this issue Jul 21, 2023 · 28 comments · Fixed by #1728
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@seanrmurphy
Copy link

seanrmurphy commented Jul 21, 2023

/kind feature

Describe the solution you'd like
I would like to have the option of not deleting the floating IP when deleting the cluster.

I have a scenario in which the floating ip is generated a priori and provided as input to Cluster API cluster
creation mechanisms (and also to a cilium helm chart). I am still experimenting with Cluster API so I'm going
through workflows of creating clusters and removing them - things generally work fine and when the cluster
is created, the floating ip is attached to the load balancer and the cluster is accessible.

One specific issue that I have is that when the cluster is deleted, the floating IP is removed from my Openstack project - this means that every time I want to recreate the cluster, I need to generate a new floating ip and modify my config to use this new floating ip. It would be preferable if I could have some flag which could control whether the floating ip is deleted or not in the case that it is provided as an input.

Anything else you would like to add:
We are a user of an Openstack service; we don't manage the Openstack cluster ourselves, so we prob
don't have authorizations to demand a specific floating ip all the time; in any case, removing the floating ip
from the project gives someone else the opportunity to use it and even if we did have higher level privileges,
we may not be easily able to take it from a user in the case that a user had taken this IP and attached it
to some resource.

@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Jul 21, 2023
@jichenjc
Copy link
Contributor

have you tried this https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/api/v1alpha7/openstackcluster_types.go#L90C2-L90C22 ?
e.g the floating ip is created on the fly so you don't need manually create every time?

@seanrmurphy
Copy link
Author

Thanks for that @jichenjc - yep, I have been using that and it basically works fine.

However, I'm trying to automate stuff as much as possible; more specifically, I'm automating cluster creation with cluster API and addition of the cilium based network plumbing once the cluster has been created. To do this in an automated manner, I need to provide the cilium helm chart with the floating IP of the cluster; I have not found a way to do this an auto-generated floating IP. Consequently, my solution is to create the floating IP a priori and specify it for cluster creation and pass it to the cilium helm chart.

In case you're interested in more detaills, the scenario is one in which we have flux on the mgmt cluster, a new workload cluster is created in a git repo (including the cluster itself and the cilium helm chart) and it should all come up in one go.

Do you think this is reasonable? Can you think of alternative approaches which could achieve the same?

@jichenjc
Copy link
Contributor

um.. maybe I don't fully understand the scenario but I Think it's more like a gitops scearnio so everything need to be auto

@seanrmurphy
Copy link
Author

That's more or less the scenario; however, we do not control the Openstack cluster and as such, we are not certain that we can obtain a specific floating IP.

More specifically, if the floating IP is deleted, it is no longer associated with our project and consequently another user could take this floating IP and hence it would not be available to us.

What could be good is if there was say a 'donotdelete' tag on a given floating ip, CAPO would not delete the floating ip when removing a cluster - would this make sense?

@dulek
Copy link
Contributor

dulek commented Jul 26, 2023

I can confirm that by default non-admin users cannot create a FIP with a specific address as indicated here.

The use case makes sense to me, but I think I'd rather make it possible to provide an ID of a floating IP to use and then not delete it.

@jichenjc
Copy link
Contributor

jichenjc commented Jul 27, 2023

The use case makes sense to me, but I think I'd rather make it possible to provide an ID of a floating IP to use and then not delete it.

I think it might be ok ,so if we are given a uuid ,then we will use it and not delete it , if it's ip , we create /use then later delete it
something like

APIServerFloatingIP .is_uuid():
              use floating ip directly
else 
               // same to before logic
               get or create floating ip 

or we can create a new variable which might be clear ..both fine to me

@dulek
Copy link
Contributor

dulek commented Jul 27, 2023

The use case makes sense to me, but I think I'd rather make it possible to provide an ID of a floating IP to use and then not delete it.

I think it might be ok ,so if we are given a uuid ,then we will use it and not delete it , if it's ip , we create /use then later delete it something like

APIServerFloatingIP .is_uuid():
              use floating ip directly
else 
               // same to before logic
               get or create floating ip 

or we can create a new variable which might be clear ..both fine to me

Yeah, I had that idea and I don't see anything in API guidelines that prevent that, but it still might be a bad API practice. @mdbooth, what do you think?

@seanrmurphy
Copy link
Author

I can think of three different scenarios here:

  • fip is generated a priori and given as input to the cluster API generation mechanisms
    • in this case, it's more natural that the fip is not removed in case of cluster deletion
  • fip is not generated a priori but specific ip addr is given as input to cluster API generation mechanisms (requires elevated privileges)
    • I guess it is prob more natural to delete the fip in this case
  • no fip specified
    • i guess in this case it is prob more natural to remove the fip

For me, it's a bit unnatural to allow a field to be either an ip addr or a uuid but of course if it provides a workable solution for me it's perfectly fine.

If this is the solution, then I think it's important that this non-obvious (to me!) behaviour is well documented.

@mdbooth
Copy link
Contributor

mdbooth commented Jul 27, 2023

In general, I'm not currently inclined to add features to CAPO which require elevated OpenStack privileges. There may come a time where that changes, but if it does I would like to be extremely well considered. For that reason I'd like to exclude the 'passing an IP address' option for now.

I agree with @dulek about heuristic semantics for APIServerFloatingIP: I'd prefer not to do that either.

The remaining option, passing in an explicitly externally-managed floating IP, would follow a pattern we already have for networks, subnets, and routers. The mechanism we have there (IIRC: testing if NodeCIDR is set or not) isn't great either, but the use case is at least consistent.

I wonder if we're nearly there already. How about:

  • Spec.APIServerFloatingIP is set iff1 it was pre-created. This is presumably already true?
  • We add a new field to cluster status, something like:
type FloatingIPStatus struct {
    IP String
    UUID String
}

OpenStackClusterStatus {
    ...
    APIServerFloatingIP *FloatingIPStatus
    ...
}
  • On reconciliation of APIServerFloatingIP we populate Status.APIServerFloatingIP either with details of Spec.APIServerFloatingIP if it was specified, or a newly created floating IP if it was not
  • Subsequently, controllers consider Status.APIServerFloatingIP to be canonical
  • On deletion, we delete the floating IP in Status.APIServerFloatingIP iff it is not specified in Spec.APIServerFloatingIP.

Footnotes

  1. https://en.wikipedia.org/wiki/If_and_only_if

@seanrmurphy
Copy link
Author

Thanks for that @mdbooth.

In general, I'm not currently inclined to add features to CAPO which require elevated OpenStack privileges. There may come a time where that changes, but if it does I would like to be extremely well considered. For that reason I'd like to exclude the 'passing an IP address' option for now.

Currently the modus operandi is that it is possible to specify a FIP using an IP addr; if this FIP already exists, then it is used, but if no FIP exists with the given IP addr, then one is created. As noted above creating a FIP specifying an IP requires elevated privileges so use of elevated privileges is already (to some extent) baked in to the current design.

@mdbooth
Copy link
Contributor

mdbooth commented Jul 31, 2023

Thanks for that @mdbooth.

In general, I'm not currently inclined to add features to CAPO which require elevated OpenStack privileges. There may come a time where that changes, but if it does I would like to be extremely well considered. For that reason I'd like to exclude the 'passing an IP address' option for now.

Currently the modus operandi is that it is possible to specify a FIP using an IP addr; if this FIP already exists, then it is used, but if no FIP exists with the given IP addr, then one is created. As noted above creating a FIP specifying an IP requires elevated privileges so use of elevated privileges is already (to some extent) baked in to the current design.

I suspect this was accidental rather than deliberate. Either way I would not be inclined to add to it.

Is there any reason you can't rely on a pre-created FIP with the modifications to CAPO outlined above?

@seanrmurphy
Copy link
Author

Is there any reason you can't rely on a pre-created FIP with the modifications to CAPO outlined above?

I'm not sure I understand correctly.

You say:

Spec.APIServerFloatingIP is set iff it was pre-created. This is presumably already true?

This is not true in general. For my scenario, it is true as I don't have elevated privileges on the Openstack cluster but it is possible to specify an IP addr for which no FIP exists and CAPO will try to create an FIP using this IP addr and presumably fail somehow if this is not possible - see https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/api/v1alpha7/openstackcluster_types.go#L87

More generally, regarding your proposal: I don't understand the specifics. If Spec.APIServerFloatingIP maps to a FIP uuid, then I guess it will work; if it maps to an IP addr, then I think it's not possible to determine if the FIP was created by CAPO or if it was generated a priori.

@dulek
Copy link
Contributor

dulek commented Jul 31, 2023

Summing up - currently you can only specify an IP as APIServerFloatingIP. If it exists - it'll be used, but also removed after the cluster gets deleted. If it doesn't, CAPO will try to create it, but creating a FIP with specified address is an admin operation with default Neutron policies.

My point was that we should allow specifying FIP by an UUID and then make sure we're not removing it.

Now thinking about this more - I wonder if we shouldn't just abstain from removing the FIP if it was specified by address and was existing, so CAPO just used an existing resource. That way we're probably solving @seanrmurphy's use case.

@seanrmurphy
Copy link
Author

@dulek - that would be fine with me; I guess it would simply involve adding a check to determine if the FIP with the specified addr exists at cluster creation time; if so, it somehow (?) tracks that the FIP existed a priori and was not created by CAPO. This tracking info can then be used in case of cluster deletion to determine if the FIP should be removed.

@jichenjc
Copy link
Contributor

jichenjc commented Aug 1, 2023

I guess it would simply involve adding a check to determine if the FIP with the specified addr exists at cluster creation time; if so, it somehow (?) tracks that the FIP existed a priori and was not created by CAPO. This tracking info can then be used in case of cluster deletion to determine if the FIP should be removed.

in other discussion(e.g keep port from deletion during VM delete) I Think we want to put into status.xxx (e.g status.createdExternalFIP) if we create external FIP and during delete VM, we should be able to keep the FIP with the status field true/false?

@seanrmurphy
Copy link
Author

I also like the idea which I proposed above of adding DONOTDELETE via tags - there would then be some default logic but it could be overridden if the user adds the above tag. More specifically, I could imagine a scenario where someone brings up a cluster with cluster API and the FIP gets created by CAPO - the default behaviour if, say, the cluster was torn down would be to delete the FIP. However, the FIP could be registered in some DNS servers and people may realize that they don't want the FIP to be deleted by CAPO - in this case, they could add this tag as a mechanism to tell CAPO to not delete the FIP.

WDYT?

@jichenjc
Copy link
Contributor

jichenjc commented Aug 2, 2023

I am ok with above proposal, maybe need think about the persistent flag for the resource CAPO
e.g resource created by CAPO or CAPO use but created from external ,so make those consistent ..

@dulek
Copy link
Contributor

dulek commented Aug 2, 2023

I don't really like adding arbitrary tags to the OpenStack resources. It feels to me that the source of truth for how CAPO should operate is the Cluster object, not OpenStack.

I understand the implementation is simple though. It's just it doesn't feel right, I'd rather have it as an option on the CLuster.

@seanrmurphy
Copy link
Author

So I guess it might make sense to have some kind of removeFloatingIPOnClusterDeletion field which can be changed after the cluster is created - would that make sense?

@dulek
Copy link
Contributor

dulek commented Aug 7, 2023

I'd be more willing to accept such a solution. Not sure about the others.

@jichenjc
Copy link
Contributor

jichenjc commented Aug 7, 2023

+1 to me , as long as we have something can solve this issue and if we have strong objection later we can adjust the design

@mdbooth
Copy link
Contributor

mdbooth commented Aug 30, 2023

I also like the idea which I proposed above of adding DONOTDELETE via tags - there would then be some default logic but it could be overridden if the user adds the above tag. More specifically, I could imagine a scenario where someone brings up a cluster with cluster API and the FIP gets created by CAPO - the default behaviour if, say, the cluster was torn down would be to delete the FIP. However, the FIP could be registered in some DNS servers and people may realize that they don't want the FIP to be deleted by CAPO - in this case, they could add this tag as a mechanism to tell CAPO to not delete the FIP.

If I paraphrase, can you confirm this is what you're describing:

  • A user brings up a new cluster and CAPO creates a new FIP for it
  • The user subsequently decides that the FIP should be persistent and wants to ensure that the FIP will not be deleted with the cluster

Despite being a user 'whoops', I think this is a reasonable use case for the FIP and possibly other resources.

I have 2 unrelated thoughts about this. I think we could reasonably do either or both.

Tags

We've been talking for some time about using tags to resolve the problem of accidentally deleting objects from other clusters when there are naming clashes. Leaving aside upgrade considerations for a moment, how about we decide that any resource which should be deleted with the cluster should be tagged with something like capo-owner=<k8s object uid>. An advanced user could then have a get out of jail free card by removing the tag from anything they didn't want deleted.

It's a bit janky as an API and probably not very discoverable, but it would basically be free if we implemented tags for resource deletion, which we're hopefully going to do anyway.

More explicit adoption of BYO resources in the spec

In an earlier comment I suggested that we could delete the FIP if (and only if) it was not explicitly specified in the spec. I still very much like this ideal in principal.

We currently support BYO1 for a number of network resources:

  • Router
  • Network
  • Subnet
  • ExternalNetwork (although this is always BYO)

I don't like the way these are gated on the setting of NodeCIDR because it's not at all obvious. I thought of collecting these in an 'externalResources' field or something to make it super obvious what they mean. This is somewhat orthogonal to this discussion, although I think it would make what I propose below more obvious to the user.

How about we adopt this principal for resources: If it was specified as BYO in the spec then we don't delete it.

I think this would allow us to solve both the whoops and the non-whoops case with the cluster FIP:

Non-whoops (the user knew to specify it up front): This is as described in my comment above. The user specifies the FIP in an explicitly BYO field of the spec. It is an error if the resource doesn't already exist2. On deletion we don't delete BYO resources, so it is not deleted.

Whoops (the user decides later they want to keep it): They can add the existing FIP to the cluster spec as a BYO resource. We can validate this because it will also be in the cluster status and we can assert that if setting the FIP in the cluster spec from empty it can only be set to exactly what is specified in the status.

Footnotes

  1. Bring Your Own

  2. I am not at all keen on adding anything which requires admin permissions.

@jichenjc
Copy link
Contributor

jichenjc commented Sep 1, 2023

How about we adopt this principal for resources: If it was specified as BYO in the spec then we don't delete it.

I like the idea as it's clear msg we can deliver to user and follow such rule we should make code easier to handle logic
so the question become check existing logic to see whether we have any logic conflict to this proposed rule

@seanrmurphy
Copy link
Author

If I paraphrase, can you confirm this is what you're describing:

  • A user brings up a new cluster and CAPO creates a new FIP for it
  • The user subsequently decides that the FIP should be persistent and wants to ensure that the FIP will not be deleted with the cluster

Despite being a user 'whoops', I think this is a reasonable use case for the FIP and possibly other resources.

Yes - this is the scenario I envisaged; it's also the scenario which I have encountered
(in some sense) and having the FIP deleted each time has caused significant friction in
my testing.

Either (or both) of the proposals are fine with me.

@alexandrevilain
Copy link
Contributor

alexandrevilain commented Sep 14, 2023

Hi!

I'm facing the same issue, but for the bastion floating IP.

When I'm forcing the bastion rebuild (basically by emptying the infrastructure.cluster.x-k8s.io/bastion-hash annotation), the floating IP is deleted and as I have don't have authorization to demand the same floating IP it fails to spawn a new bastion.

My two cents:

  • The idea of removeFloatingIPOnClusterDeletion is great and easy to understand, but we would have to add the removeFloatingIPOnBastionDeletion too ..
  • @mdbooth's proposal: I suggested that we could delete the FIP if (and only if) it was not explicitly specified in the spec. is a good idea.

By looking at the deleteBastion function (https://github.com/kubernetes-sigs/cluster-api-provider-openstack/blob/main/controllers/openstackcluster_controller.go#L208), the deletion condition could be updated from:

if address.Type == corev1.NodeExternalIP {
 // call to networkingService.DeleteFloatingIP()
}

to:

if address.Type == corev1.NodeExternalIP &&  address.Address != openStackCluster.Spec.Bastion.Instance.FloatingIP  {
 // call to networkingService.DeleteFloatingIP()
}

What do you think ?

@mdbooth
Copy link
Contributor

mdbooth commented Sep 15, 2023

This is more complicated for the Bastion. In fact, that's for bringing this up, because that FloatingIP field isn't used by the machine controller, only for the Bastion. We should remove it from the MachineSpec before v1beta1.

Unlike the API server's floating IP, the Bastion's floating IP doesn't have such an obvious lifetime. If I create a Bastion then I'm going to create a floating IP for it. If I delete the Bastion, I should presumably delete the floating IP for it. If I force a rebuild of the Bastion, is keeping the floating IP a useful optimisation?

I think we should remove MachineSpec.FloatingIP and add a FloatingIP to Cluster.Spec.Bastion. We can then apply the same deletion rule for Bastion as the the apiserver: if bastion.floatingIP is defined we:

  • Require the floating IP to exist already (on creation)
  • Don't delete it on bastion deletion

@mdbooth
Copy link
Contributor

mdbooth commented Sep 15, 2023

#1674

@alexandrevilain
Copy link
Contributor

Hi @mdbooth

I proposed a PR which supports both your "Non-whoops" and "Whoops" use-cases.

I hope it can fix this issue as I'm also impacted (I don't have authorization to demand the same floating IP).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature.
Projects
None yet
6 participants