Ephemeral IP should ideally be released once an instance stops running #2715

Open
askfongjojo opened this issue Mar 30, 2023 · 3 comments

@askfongjojo

Currently, an ephemeral IP address seems to be released only after its instance has been destroyed (probably the expected behavior per #1333). Customers will likely want to optimize external IP resources and will expect the public cloud experience, i.e. the IP address is freed up for the next guest once the instance holding it has been stopped for longer than a certain time threshold.

@bnaecker
Collaborator

As you noted, this behavior is intentional. The issue is that what you're describing is both confusing as a user and hard for us to implement. Suppose we assign an IP address to a guest, and then claw it back after the instance has been stopped for some time. What happens if the customer then restarts that exact same instance? Do we assign a new address? If no new addresses are available, does the instance fail to start?

We've generally been taking the path that resources assigned to an instance are reserved as long as the instance exists. That "wastes" resources on stopped instances. It's also much easier for the customer to reason about and plan for, and easier for us to implement.

We currently support deleting a NIC when the instance is stopped, which would remove the external IP address. We can also add support for updating a NIC by removing the external IP address. That doesn't exist, but seems like a good path forward. Thoughts?

@smklein
Collaborator

smklein commented Mar 31, 2023

To be clear: I am not proposing we modify this before MVP, but this seems like an area where we could benefit from some consistency.

| Resource Used By Instance | Still in-use when instance stopped |
| --- | --- |
| Sled Placement | Yes, but maybe no later: #2315 |
| CPU and RAM usage ("what is provisioned to the sled") | No accounting yet, but we do stop the instance |
| CPU and RAM usage (virtual_provisioning_collection_insert_instance) | Yes |
| Attached Disks | Yes |
| Ephemeral IPs | Currently Yes |

@rmustacc

> As you noted, this behavior is intentional. The issue is that what you're describing is both confusing as a user and hard for us to implement. Suppose we assign an IP address to a guest, and then claw it back after the instance has been stopped for some time. What happens if the customer then restarts that exact same instance? Do we assign a new address? If no new addresses are available, does the instance fail to start?

So, this came up in a conversation I had with @askfongjojo yesterday. The point is actually that the documentation and experience from other clouds are that this isn't locked in permanently. Reserving the resource permanently is a short-term thing that we did to speed up initial implementation and is still the right choice in the short term. But in this model the only difference between the ephemeral and static IP is the naming and movability. The ephemeral IP concept comes directly from AWS, where the behavior, and the answers to your questions, are:

  • Instance stop releases the IP. The address assigned on a subsequent instance start is random.
  • If there are no new addresses, yes, the instance fails to start.
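
As a rough illustration of the two policies being compared here (reserve the address for as long as the instance exists, versus release it on stop), a toy model might look like the sketch below. It is not Omicron's implementation: `EphemeralIpPool`, `Policy`, and `PoolExhausted` are hypothetical names invented for this example, and the pool hands out addresses in FIFO order for determinism, whereas the behavior described above picks the new address at random.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Policy(Enum):
    # Hypothetical policy names, used only for this illustration.
    RESERVE_WHILE_EXISTS = "reserve"  # address held until the instance is destroyed
    RELEASE_ON_STOP = "release"       # address returned to the pool on stop


class PoolExhausted(Exception):
    """Raised when an instance needs an address and none are free."""


@dataclass
class EphemeralIpPool:
    free: List[str]
    policy: Policy
    assigned: Dict[str, str] = field(default_factory=dict)  # instance id -> address

    def start(self, instance: str) -> str:
        # Under RESERVE_WHILE_EXISTS a stopped instance still holds its address.
        if instance in self.assigned:
            return self.assigned[instance]
        if not self.free:
            # The "instance fails to start" case from the list above.
            raise PoolExhausted(f"no address available for {instance}")
        addr = self.free.pop(0)  # FIFO here; assumed simplification of "random"
        self.assigned[instance] = addr
        return addr

    def stop(self, instance: str) -> None:
        # Only the release-on-stop policy gives the address back at this point.
        if self.policy is Policy.RELEASE_ON_STOP and instance in self.assigned:
            self.free.append(self.assigned.pop(instance))

    def destroy(self, instance: str) -> None:
        # Both policies free the address when the instance is destroyed.
        addr = self.assigned.pop(instance, None)
        if addr is not None:
            self.free.append(addr)
```

With a one-address pool under `RELEASE_ON_STOP`, stopping instance `a` lets instance `b` start, and a later restart of `a` raises `PoolExhausted`. Under `RESERVE_WHILE_EXISTS` it is `b` that cannot start until `a` is destroyed.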

There are a lot of mixed tensions here. We know, for example, that the SNAT IP allocation is wasteful. It simplifies what we need, but over time we're going to need to be more complex because we're wasting a lot of resources. Similarly, stopped instances having a hotel room reservation is going to be confusing. Most people don't think of stopped instances taking up a slot that stops other things from running. It's the right call for now.

Ultimately both behaviors are confusing, and the current one is what confused @askfongjojo when reading the docs and seeing the behavior. It's not that different from a static IP. It does make me wonder if there's a reason to have the ephemeral vs. static distinction now, given that we have to educate users on something and this is the kind of behavior people will start to rely on given the lack of DNS.

> To be clear: I am not proposing we modify this before MVP, but this seems like an area where we could benefit from some consistency.

There are other networking things that will go away when instances are stopped, in particular when we get to all the DNS-related things. Importantly, to further distinguish networking from other resources, you can't reach the instance externally or from other instances while it is stopped. External addressing feels different from volumes because volumes generally only make sense in the context of the instance, whereas an external address is trivially swapped one for another unless you hardcode it.

I assigned the unscheduled milestone as per the original intent not to change for MVP. Though I think we should probably reconsider what we want to do with static vs. ephemeral given the fact that it is weird.
