
Silo-scoped IP pools #3926

Closed
david-crespo opened this issue Aug 21, 2023 · 5 comments · Fixed by #3985
@david-crespo commented Aug 21, 2023

Details to be added. Discussion in https://github.com/oxidecomputer/customer-support/issues/18.

Currently we have fleet-scoped IP pools with CRUD ops. We want to be able to do the same thing for each silo so instances created in a given silo are only drawing from the IP pools for that silo.

@david-crespo added the api, networking, and nexus labels Aug 21, 2023
@askfongjojo added this to the 1.0.2 milestone Aug 21, 2023
@david-crespo commented Aug 25, 2023

I have been doing some research. The work plan isn't complete, especially around allocation. I will be adding more as I figure it out.

Work plan

This work has two broad parts:

  • Data modeling and CRUD on the IP pools (similar to images and NICs; see Implement silo level images #2772 and Generalize network_interface table to allow for new kinds of NICs #2767)
    • Add nullable silo_id UUID column to ip_pool table
    • Add DB views fleet_ip_pool (null silo_id) and silo_ip_pool (non-null silo_id) over ip_pool (rough SQL sketch after this list)
    • Add table! calls in schema.rs for fleet_ip_pool and silo_ip_pool (ip_pool is already in there)
    • Add models for FleetIpPool and SiloIpPool (imitate SiloImage and ProjectImage in db-model/src/image.rs)
    • Add datastore functions and nexus service level wrappers
    • Add CRUD endpoints for silo-scoped IP pools at /v1/system/ip-pools?silo=<name_or_id>
      • As noted in answered questions, we do not want this to be silo-scoped in the usual authz sense. It is a fleet-level administration concept.
    • Add CRUD endpoints for silo-scoped IP pools at /v1/ip-pools
      • This endpoint already exists as a project-scoped IP pools concept, but it’s a dummy thing — it just lists system pools. It’s easy to make it silo-scoped by removing the project selector query param. And we can always add the query param back in when we want to do project-scoped pools.
      • This is simplest from an API design point of view, but if we decide we don’t want authz behavior to map to the fleet/silo distinction (i.e., silo admins can create silo pools), then for consistency we’ll probably have to put both sets of endpoints under /v1/system, which means coming up with an ugly scheme to distinguish the two, whether it’s two sets of paths like /v1/system/ip-pools and /v1/system/silo-ip-pools (do not like) or sharing /v1/system/ip-pools and using a query param to distinguish (which is weird for POSTs but tolerable; we do that in other spots, like creating a VPC in a project with POST /v1/vpcs?project=my-project)
  • Instance IP allocation logic
    • Source NAT should use default silo pool instead of default system pool… if it exists?
    • Ephemeral IP pool is specified by name at instance create time. We have to change that if we want to allow someone to choose from both silo and fleet IP pools. On the other hand, maybe we don’t want to allow that.
    • More to be added
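
A rough SQL sketch of the data-modeling piece, following the silo/project image pattern. Names match the plan above, but this is not the real migration: the actual dbinit.sql and migration would also deal with the usual description/timestamp columns, soft deletes, and so on.

```sql
-- Sketch: a nullable silo_id distinguishes fleet pools (NULL) from silo pools.
ALTER TABLE ip_pool ADD COLUMN silo_id UUID;

-- Views over ip_pool, analogous to the silo/project image views.
CREATE VIEW fleet_ip_pool AS
    SELECT id, name, description
    FROM ip_pool
    WHERE silo_id IS NULL;

CREATE VIEW silo_ip_pool AS
    SELECT id, name, description, silo_id
    FROM ip_pool
    WHERE silo_id IS NOT NULL;
```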

What we do during instance create

Right now we have a list of fully global IP pools, each with a list of child IP pool ranges. Each IP pool range has a parent pool ID and a start and end address.
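
Roughly, the current shape of those tables (paraphrased from memory, leaving out the usual description/timestamp columns; not the literal dbinit.sql):

```sql
CREATE TABLE ip_pool (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL   -- unique across the fleet; the one named 'default' is special
);

CREATE TABLE ip_pool_range (
    id UUID PRIMARY KEY,
    ip_pool_id UUID NOT NULL,     -- parent pool
    first_address INET NOT NULL,
    last_address INET NOT NULL
);
```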

When an instance is created, it seems there are at least two external IP address allocations: one for the “source NAT” and one for the instance itself.

These two functions are pretty similar. The only real differences:

  • kind: IpKind::SNat vs. kind: IpKind::Ephemeral
  • The source NAT IP can only come from the IP pool named default, whereas the instance external IP can come from another pool if specified in the external_ips vec in InstanceCreate.

source NAT IPs are not discoverable in the API at all, and only provide outbound connectivity to instances, not inbound. (source)

The updated data model might be a little annoying, but the allocation story here is the complicated part. Or maybe it’s not so bad: each of the two allocation functions takes a pool identifier, so if we can simply determine the appropriate pool in the calling code and pass it in, we shouldn’t need to change anything else. The two datastore functions take their pool parameter in stupidly different ways: one takes a pool ID and the other takes a pool name. If they both took an ID, they would be identical, and at that point they do not need to be two functions at all: all they do is call different constructor methods on IncompleteExternalIp.

I thought maybe I had misunderstood the app code because of this comment (2022-06-30) on the DB query, which should theoretically take the specified pool and pick an IP address from that pool:

Selects the next available IP address and port range from any IP Pool

But the WHERE clause in the query itself says otherwise; that logic was added in this PR (2022-07-25). I have a to-do item to fix the comment.

Questions

  • Can I refactor datastore.allocate_instance_snat_ip and datastore.allocate_instance_ephemeral_ip? They don’t really both need to exist. The logic for constructing the SNAT vs. ephemeral IP could be done in the calling code, and they’re only called in one spot each. There are tests that require things to be factored this way; maybe those don’t need to exist either.
  • What do we currently do if a pool is out of IPs? Should we change that logic?
    • Maybe we are punting on this and relying on the customer to set up ranges that are big enough. It’s not like we can run 10,000 instances at once.

Answered questions

  • Should we let silo-scoped IP pools follow the silo permission model? i.e., silo admins can CRUD them, silo readers can read, etc.
    • @askfongjojo: "No, we already settled on the assumption that IP addresses are rack resources that should be managed by fleet admin only (especially if they are the ones who deal with the team who owns corporate firewall configurations)." (Silo-scoped IP pools #3926 (comment))
  • Should we do the same thing in silos as we do currently for system IP pools, namely automatically create one at silo create time with name default?
    • I’m thinking no, because the simplest configuration remains a single system-wide IP pool with no silo IP pools, and we want that to keep working
    • @askfongjojo: No. "That was my original proposal in the linked customer ticket but with the most recent discussion, we won't need something like that until we implement project-level IP pools." (Silo-scoped IP pools #3926 (comment))
  • Do we create the default IP pool automatically?
    • Yes. We create an internal pool called SERVICE_IP_POOL_NAME == "oxide-service-pool" and we create an external pool called default.
    • Source NAT IP is always allocated from the default pool. Is it a requirement we need to enforce that a pool named default exists? Seems like no, if we create it automatically. But can it get deleted?
  • Is there a risk of deleting the default IP pool?
    • Not really? In ip_pool_delete, we refuse to delete an IP pool containing any ranges. And in ip_pool_delete_range we refuse to delete any range if it has any IPs. So as long as there are any IPs allocated in the default pool, you can’t delete it. And because instance source NATs only get their IPs from the default pool, it seems that if you have any instances you can’t delete the default pool. But it doesn’t seem like there’s any explicit check preventing it.
  • If we have silo IP pools, do we still always need a fleet pool to exist?
    • Probably, as there are things other than instance IP allocation that use the fleet pools. Though those probably use the internal service pool, not the default external pool. So we could probably live without the default external pool, though it’s also not hurting anyone. If it contains no ranges and isn’t used because the silos have their own, that’s fine.
  • Can we implement this by adding a nullable silo_id column to IP pools?
    • This fits the pattern of how we did silo and project images. Seems like the easiest way.
    • Most of the SQL work is actually done on the IP pool range table, but it is nice to have only one kind of thing that the ip_pool_id column can refer to. Otherwise we have all this pain of having to mark on an IP range both the pool ID and the pool kind, so we can go look for it in the right table if we need to.
    • Does that require too much fiddly changing of the existing queries? The alternative is probably worse: duplication with small modifications.
    • Is it too hard to extend later to allow project-scoped pools? For projects, we could add a second column for project_id and enforce that you never have both a silo_id and a project_id… or maybe that you always have a silo_id if you have a project_id.
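
On that last point, a hypothetical sketch of what the project_id constraint could look like, if we decide to enforce it in the database at all (illustrative only; the constraint name and exact rule here are made up):

```sql
-- Hypothetical: a project-scoped pool must also carry a silo_id, so a pool's scope
-- is always the fleet (both NULL), a silo, or a project within a silo.
ALTER TABLE ip_pool
    ADD CONSTRAINT ip_pool_scope CHECK (project_id IS NULL OR silo_id IS NOT NULL);
```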

@askfongjojo commented

@david-crespo: On the two "functional" questions,

Should we let silo-scoped IP pools follow the silo permission model? i.e., silo admins can CRUD them, silo readers can read, etc.

No, we already settled on the assumption that IP addresses are rack resources that should be managed by fleet admin only (especially if they are the ones who deal with the team who owns corporate firewall configurations).

Should we do the same thing in silos as we do currently for system IP pools, namely automatically create one at silo create time with name default?

That was my original proposal in the linked customer ticket but with the most recent discussion, we won't need something like that until we implement project-level IP pools.

@askfongjojo commented

Sorry, I meant to address this one earlier as well:

What do we currently do if a pool is out of IPs? Should we change that logic? (Maybe we are punting on this and relying on the customer to set up ranges that are big enough. It’s not like we can run 10,000 instances at once.)

I think there is no need to change the current logic. The odds of running out of addresses may be lower once ephemeral IP addresses are released from stopped instances some day.

Btw, these are not in your list of questions but it sounds like they could use some feedback:

Source NAT should use default silo pool instead of default system pool… if it exists?

Yes

Ephemeral IP pool is specified by name at instance create time. We have to change that if we want to allow someone to choose from both silo and fleet IP pools. On the other hand, maybe we don’t want to allow that.

There are still use cases that can leverage non-default, fleet-wide pools (e.g. a special pool that has routes to another cloud). I think we can leave the functionality there.

Relatedly, we probably want to fix the issue of requiring the user to set the pool name explicitly in API/CLI requests. When the user specifies default as the pool name or leaves out the attribute altogether, the API can apply the same logic for IP pool lookup (i.e. prefer the silo-specific pool; if not found, use the fleet default pool). I hope this makes sense.

@david-crespo commented

Yes, all helpful.

@david-crespo commented

Talked through the plan with @ahl and we were able to narrow scope further in a couple of ways:

  • The operator API does not need to filter by silo; it can just list all the pools, and each either has a silo or not
  • There does not need to be a silo-scoped API for this at all yet: instance allocation uses the silo's default pool, and end users do not need to enumerate their silo's IP pools

We also realized we might as well add the project_id column already to minimize database migrations and make the future direction clear. This doesn't narrow scope but it's trivial to do.

david-crespo added a commit that referenced this issue Aug 29, 2023
This is quite small, but I'm pretty confident it's a sufficient basis
for the rest of the work in #3926, so we might as well review and merge
already.

A couple of minor pain points I ran into while doing this:

* Needed guidance on version number (3.0.0 it is) — it will be more
obvious when there are more, but I added a line to the readme anyway
* Diff output for
[`dbinit_equals_sum_of_all_up`](https://github.com/oxidecomputer/omicron/blob/fb16870de8c0dba92d0868a984dd715749141b73/nexus/tests/integration_tests/schema.rs#L601)
is terrible — thousands of lines with only a few relevant ones. It turns
out the column order matters, so if you add columns to a table in a
migration, you have to add them to the _end_ of the `create table` in
`dbinit.sql`.
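
To illustrate that last point with a hypothetical table (not the real `ip_pool` definition): columns added by a migration have to be appended after the existing ones in `dbinit.sql` so the column order matches what the migration produces.

```sql
CREATE TABLE example (
    id UUID PRIMARY KEY,
    name TEXT NOT NULL,
    -- ...existing columns stay where they are...
    silo_id UUID,       -- added by the migration, so it is appended here
    project_id UUID     -- likewise
);
```
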
david-crespo added a commit that referenced this issue Aug 30, 2023
Closes #3926 

A lot going on here. I will update this description as I finish things
out.

## Important background

* #3981 added `silo_id` and `project_id` columns to the IP pools table
in the hope that that would be sufficient for the rest of the work
schema-wise, but it turns out more was needed.
* Before this change, the concept of a **default** IP pool was
implemented through the `name` on the IP pool, i.e., the IP pool named
`default` is the one used by default for instance IP allocation if no
pool name is specified as part of the instance create POST
* Before this change the concept of an **internal** IP pool was
implemented through a boolean `internal` column on the `ip_pool` table

## IP pool selection logic

There are two situations where we pick an IP pool to allocate IPs from.
For an instance's **source NAT**, we always use the default pool. Before
this change, with only fleet pools, this simply meant picking the one
named `default` (analogous now to the one with `is_default == true`).
With the possibility of silo pools, we now pick the most specific
default available. That means that if there is a silo-scoped pool marked
default _matching the current silo_, we use that. If not, we pick the
fleet-level pool marked default, which should always exist (see possible
todos at the bottom — we might want to take steps to guarantee this).

For instance ephemeral IPs, the instance create POST body takes an
optional pool name. If none is specified, we follow the same defaulting
logic as above — the most-specific pool marked `is_default`. We are
leaving pool names globally unique (as opposed to per-scope) which IMO
makes the following lookup logic easy to understand and implement: given
a pool name, look up the pool by name. (That part we were already
doing.) The difference now with scopes is that we need to make sure that
the scope of the selected pool (assuming it exists) **does not
conflict** with the current scope, i.e., the current silo. In this
situation, we do not care about what's marked default, and we are not
trying to get an exact match on scope. We just need to disallow an
instance from using an IP pool reserved for a different silo. We can
revisit this, but as implemented here you can, for example, specify a
non-default pool scoped to fleet or silo (if one exists) even if there
exists a default pool scoped to your silo.
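
Approximately, in SQL terms (a sketch only: the real code goes through the
datastore and authz layers, and the column names here follow the schema
changes described below):

```sql
-- 1. Defaulting, used for source NAT and for ephemeral IPs when no pool name is
--    given: prefer a default pool scoped to the instance's silo ($1), falling
--    back to the fleet-level default.
SELECT id
FROM ip_pool
WHERE is_default = true
  AND time_deleted IS NULL
  AND (silo_id = $1 OR silo_id IS NULL)
ORDER BY silo_id IS NULL ASC   -- a silo-scoped default sorts before the fleet one
LIMIT 1;

-- 2. Lookup of an explicitly named pool ($2): no defaulting, just refuse pools
--    reserved for a different silo.
SELECT id
FROM ip_pool
WHERE name = $2
  AND time_deleted IS NULL
  AND (silo_id = $1 OR silo_id IS NULL);
```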

## DB migrations on `ip_pool` table

There are three migrations here, based on guidance from @smklein drawing on
[CRDB docs about limitations to online schema
changes](https://www.cockroachlabs.com/docs/stable/online-schema-changes#limitations)
and some conversations he had with the CRDB folks. It's possible they could be
made into two; I don't think it can be done in one.

* Add `is_default` column and a unique index ensuring there is only one
default IP pool per "scope" (unique `silo_id`, including null as a
distinct value)
* Populate correct data in the new columns
  * Set `is_default = true` for any IP pools named `default` (there
should be exactly one, but nothing depends on that)
  * Set `silo_id = INTERNAL_SILO_ID` for any IP pools marked `internal` (there
should be exactly one, but nothing depends on that)
* Drop the `internal` column
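
Loosely, in SQL, the three steps look something like this. It is a sketch
rather than the literal migration files: the COALESCE-based partial index is
one way to get "one default per scope" when a plain unique index would treat
each NULL `silo_id` as distinct (my reading of the index fix checked off
below), and the second UUID literal is just a stand-in for the real
`INTERNAL_SILO_ID` constant.

```sql
-- 1. Add the column and the one-default-per-scope index.
ALTER TABLE ip_pool ADD COLUMN is_default BOOL NOT NULL DEFAULT false;
CREATE UNIQUE INDEX one_default_per_scope ON ip_pool (
    COALESCE(silo_id, '00000000-0000-0000-0000-000000000000'::uuid)
) WHERE is_default = true AND time_deleted IS NULL;

-- 2. Backfill the new columns from the old conventions.
UPDATE ip_pool SET is_default = true
    WHERE name = 'default' AND time_deleted IS NULL;
UPDATE ip_pool SET silo_id = '11111111-1111-1111-1111-111111111111'  -- INTERNAL_SILO_ID stand-in
    WHERE internal = true AND time_deleted IS NULL;

-- 3. Drop the old flag.
ALTER TABLE ip_pool DROP COLUMN internal;
```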


## Code changes

- [x] Add [`similar-asserts`](https://crates.io/crates/similar-asserts)
so we can get a usable diff when the schema migration tests fail.
Without this you could get a 20k+ line diff with 4 relevant lines.
- [x] Use `silo_id == INTERNAL_SILO_ID` everywhere we were previously
looking at the `internal` flag (thanks @luqmana for the
[suggestion](#3981 (comment)))
- [x] Add `silo_id` and `default` to `IpPoolCreate` (create POST params)
and plumb them down the create chain
  * Leave off `project_id` for now, we can add that later
- [x] Fix index that is failing to prevent multiple `default` pools for
a given scope (see comment
#3985 (comment))
- [x] Source NAT IP allocation uses new defaulting logic to get the most
specific available default IP Pool
- [x] Instance ephemeral IP allocation uses that defaulting logic if no
pool name is specified in the create POST; otherwise it looks up the pool by
the specified name (but can only get pools matching its scope, i.e., its
project, silo, or the whole fleet)

### Limitations that we might want to turn into to-dos

* You can't update `default` on a pool, i.e., you can't make an existing pool
the default; you have to delete it and make a new one. This one isn't that
hard: I would think of it like image promotion, where it's not a regular pool
update but a special endpoint for making a pool the default, one that can
unset the current default if it's a different pool.
* Project-scoped IP pools endpoints are fake — they just return all IP
pools. They were made in anticipation of being able to make them real.
I'm thinking we should remove them or make them work, but I don't think
we have time to make them work.
* Ensure somehow that there is always a fleet-level default pool to fall
back to