Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SNAT IP should come from chosen pool instead of default pool #5043

Closed
david-crespo opened this issue Feb 12, 2024 · 12 comments
Closed

SNAT IP should come from chosen pool instead of default pool #5043

david-crespo opened this issue Feb 12, 2024 · 12 comments
Assignees
Labels
networking Related to the networking. nexus Related to nexus
Milestone

Comments

@david-crespo
Copy link
Contributor

david-crespo commented Feb 12, 2024

Right now the SNAT IP always comes from the default pool for the current silo, even when the user has specified they want an external IP from a different pool. We want to change the following logic to use the specified pool instead.

That change is simple, but there’s another wrinkle: what to do about existing instances where the SNAT IP and ephemeral IP come from different pools. This could lead to confusion later when an operator tries to delete an IP range with outstanding IPs. There’s a good chance there aren’t any such instances because nearly all instances are created with the default pool. But it might be worth checking somehow, just so we know. Longer term we will want (and have plans to build) ways of telling exactly where allocated IPs actually are.

let (.., pool) = datastore
.ip_pools_fetch_default(&opctx)
.await
.map_err(ActionError::action_failed)?;
let pool_id = pool.identity.id;
datastore
.allocate_instance_snat_ip(&opctx, ip_id, instance_id, pool_id)
.await
.map_err(ActionError::action_failed)?;

@david-crespo david-crespo added this to the 7 milestone Feb 12, 2024
@david-crespo david-crespo self-assigned this Feb 12, 2024
@david-crespo david-crespo added networking Related to the networking. nexus Related to nexus labels Feb 12, 2024
@david-crespo david-crespo modified the milestones: 7, 8 Mar 20, 2024
@david-crespo david-crespo modified the milestones: 8, 9 Apr 15, 2024
@david-crespo
Copy link
Contributor Author

david-crespo commented May 1, 2024

After taking a looking at this at @luqmana's suggestion as part of #5674, I'm realizing this is a bit weird, though it still might be worthwhile. When we say we want to follow the "specified pool", there is no general specified pool for the instance at create time: the pool is specified for the ephemeral IP. So is it right to say that the SNAT IP should always come from the same pool as the ephemeral IP (if there is one)? And if there is no ephemeral IP asked for (which you can do by leaving the external IPs array empty) we will use the default pool.

@david-crespo
Copy link
Contributor Author

#4317 points out that if you have an ephemeral IP, OPTE will use that and you don't really need a SNAT IP. To me that points in the opposite direction of this issue: If there is an ephemeral IP specified, don't allocate a SNAT IP. Only if there is no ephemeral IP specified would you need the SNAT IP, in which case you would use the default pool. So SNAT IPs would always come from the default pool.

@luqmana
Copy link
Contributor

luqmana commented May 2, 2024

I guess then it comes down to do we want to allow operators a choice of pool for SNAT IPs?

@david-crespo
Copy link
Contributor Author

I have been assuming that 99+% of the time there will only be one pool linked to a silo and it will be the default, so the question of picking pools is usually moot. If that is not true and we expect real use cases for multiple pools in a silo, getting a better picture of the reason for that will help a lot.

@luqmana
Copy link
Contributor

luqmana commented May 2, 2024

There was some question recently in chat about quotas for IPs and pools sorta being a way of doing that.

(cc @augustuswm who's been managing this on colo and might have some perspective)

@augustuswm
Copy link
Contributor

augustuswm commented May 2, 2024

So an immediate use case for this would be with the colo rack where we want users to have a choice between our public IPs and VPN IPs. The plan is to have at least two pools here linked to the same silo.

If we can use the selected ephemeral IP for SNAT that would be ideal as it would be a large reduction in our current IP usage. It would also keep traffic to those instances on the IPs we expect.

While this is our solution for today, I'm not sure if this is what we want in the future state to expose to end users. IP pools seem like a very coarse grained tool and in a silo we would want to provide some way to carve up those available IPs amongst projects, vpcs, or maybe subnets.

@rmustacc
Copy link

rmustacc commented May 2, 2024

So let me give a bit of context to how this was meant to fit together a bit as some of what was intended doesn't really exist:

So to start with in the original take on RFD 21 we had a notion of the Internet Gateway. The Internet Gateway critically was meant to represent the thing that got you to the Internet like how AWS has the NAT gateway. Internet gateways semantically represent a few things:

  • They represent the default route out to the Internet in the VPC system routing table
  • They have an IP pool associated with them or a series of explicitly allocated IPs
  • If there is no Internet Gateway the instance cannot reach the outside world at all, not even floating or ephemeral IPs would work

That's the first bit. The second bit that's relevant is that the original intent was that if you had an ephemeral or floating IP you would have no SNAT assigned to you. As a speed up to avoid dealing with allocating this at instance start time, this was originally implemented as instance create time.

So from an original design perspective, no SNAT shouldn't come from the chosen pool, it should fall back to the IP pool on the Internet gateway which is something explicit, but probably would inherit the silo default at creation time. However, I want to acknowledge that we're not in that original design. Additionally, in early discussions my assumptions were that we'd end up with the following additional concepts that I think in part get to @augustuswm is hinting at: critically that projects would have their own allow or deny lists of the IP Pools, images, and resources that they can use and their own defaults.

What @augustuswm is wanting to do would work well in the original model where the Internet gateway's default IP pool is the VPN IP pool; however, instances could ask for an external address from the public IP pool.

I'm not sure how much this makes sense or not, but hopefully provides some context. I'm less certain what the current answer should be. I think the reason this feels odd is due to the lack of Internet Gateway.

@askfongjojo
Copy link

@rmustacc - In the use case @augustuswm mentioned, if we don't change the current implementation, assuming that the public IP pool is the default pool and the user has chosen to provision a VM with its ephemeral IP allocated from the VPN IP pool - with the intention of it being to able to reach other VMs accessible via VPN, would it be able to do so if its SNAT IP is allocated from the default/public IP pool?

@david-crespo
Copy link
Contributor Author

david-crespo commented May 3, 2024

According to this #4317 it should use the ephemeral IP in that case anyway. We need to go and confirm that in the code.

If that’s true, I’m thinking the simpler and more future-oriented version of this change is to not bother allocating a SNAT IP when there is an ephemeral IP.

@rmustacc
Copy link

rmustacc commented May 3, 2024

@rmustacc - In the use case @augustuswm mentioned, if we don't change the current implementation, assuming that the public IP pool is the default pool and the user has chosen to provision a VM with its ephemeral IP allocated from the VPN IP pool - with the intention of it being to able to reach other VMs accessible via VPN, would it be able to do so if its SNAT IP is allocated from the default/public IP pool?

The SNAT should never be used if any external IP is allocated and assigned to the instance whether ephemeral or floating.

If that’s true, I’m thinking the simpler and more future-oriented version of this change is to not bother allocating a SNAT IP when there is an ephemeral IP.

Yeah, I think that's fine as long as when that IP is removed we do the right thing and allocate the SNAT then. That was the original long term plan. Similarly when an instance has a floating or ephemeral IP assigned then the SNAT should be removed (of course this is all assuming that it can happen while the instance is up).

@david-crespo
Copy link
Contributor Author

Ok, that's interesting. A bit more complicated, but very doable. I will update the issues accordingly and probably close my PR.

@david-crespo
Copy link
Contributor Author

Ok, closing this because I think the right direction will be to instead

  • Not allocate a SNAT IP at create time if there is an ephemeral or floating IP
  • Allocate a SNAT IP from the default pool when the user detaches all ephemeral and floating IPs
  • Remove SNAT IP when an ephemeral IP or floating IP is added

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
networking Related to the networking. nexus Related to nexus
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants