Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addresses for propolis instances at SLED_PREFIX + 0xFFFF #4777

Merged
merged 7 commits into from
Jan 10, 2024

Conversation

andrewjstone
Copy link
Contributor

@andrewjstone andrewjstone commented Jan 8, 2024

Rather than allocating instance IPs starting at SLED_PREFIX + RSS_RESERVED_ADDRESSES + 1 where the 1 is the sled-agent allocated address of the GZ, we begin allocation from a larger block: SLED_PREFIX + CP_SERVICES_RESERVED_ADDRESSES. This gives us more room for nexus to allocate control plane services.

Implements #4765

Rather than allocating instance IPs starting at `sLED_PREFIX`
+ `RSS_RESERVED_ADDRESSES` + 1 where the `1` is the sled-agent
allocated address of the GZ, we begin allocation from a larger block:
`SLED_PREFIX` + `CP_SERVICES_RESERVED_ADDRESSES`. This gives us more room
for nexus to allocate control plane services.

Implements #4765

TODO: This is a wip, as a DB migration is needed to change
`last_used_address` to start at the right value for existing DBs.
start at `SLED_PREFIX + 0xFFFF + 1`.

This still requires testing
@andrewjstone andrewjstone changed the title WIP: Addresses for propolis instances at SLED_PREFIX + 0xFFFF Addresses for propolis instances at SLED_PREFIX + 0xFFFF Jan 10, 2024
@andrewjstone andrewjstone marked this pull request as ready for review January 10, 2024 18:01
@smklein
Copy link
Collaborator

smklein commented Jan 10, 2024

At a high-level, I see how this bumps up the last_used_address value in the sled table, but I'm not sure how this resolves #4765.

To me, that issue implies that we basically have a "separate pool" for services and instance addresses.

First off, I'd recommend the following:

  • schema/crdb/dbinit.sql - there's a comment on last_used_address suggesting this is for an "Oxide service" - now it isn't, it's for instances exclusively.
  • nexus/db-model/src/sled.rs - there's a similar comment on last_used_address that needs updating
  • nexus/db-queries/src/db/datastore/mod.rs - there's another comment on next_ipv6_address that indicates "pulling from last_used_address" is for Oxide services. This isn't true anymore, again, it's service-only.
  • That next_ipv6_address method is called by a function called allocate_sled_ipv6, which again, is kinda a misnomer, since this should now only be used for instances.

But before we merge this:

  • Suppose someone does want to provision a new IP for an "Oxide service" (and not an instance!) on a sled. How should they do this? Previously, they could have called next_ipv6_address / allocate_sled_ipv6 on the datastore. By using this field as a counter, there was no additional overhead. Now, this PR seems like it does not provide a mechanism for actually doing this allocation. Since we're inserting a schema change here, are we sure this is sufficient? Don't we need to now check for conflicts in the "reserved-for-control-plane-services" address space?

@davepacheco
Copy link
Collaborator

davepacheco commented Jan 10, 2024

Suppose someone does want to provision a new IP for an "Oxide service" (and not an instance!) on a sled. How should they do this? Previously, they could have called next_ipv6_address / allocate_sled_ipv6 on the datastore. By using this field as a counter, there was no additional overhead. Now, this PR seems like it does not provide a mechanism for actually doing this allocation. Since we're inserting a schema change here, are we sure this is sufficient? Don't we need to now check for conflicts in the "reserved-for-control-plane-services" address space?

The plan I've been working from (which is really just an idea we've talked about, plus an implementation in my branch, so still somewhat speculative) is that the store of record for IP addresses in use by Omicron is the current target blueprint. Thus in generating a new blueprint the planner can pick any addresses it wants as long as they don't overlap with each other.

@smklein
Copy link
Collaborator

smklein commented Jan 10, 2024

The plan I've been working from (which is really just an idea we've talked about, plus an implementation in my branch, so still somewhat speculative) is that the store of record for IP addresses in use by Omicron is the current target blueprint. Thus in generating a new blueprint the planner can pick any addresses it wants as long as they don't overlap with each other.

Gotcha, so this conflict resolution could all happen in-memory then, and we wouldn't need to do anything in the DB to help us do the allocation? That makes sense to me.

@smklein
Copy link
Collaborator

smklein commented Jan 10, 2024

  • schema/crdb/dbinit.sql - there's a comment on last_used_address suggesting this is for an "Oxide service" - now it isn't, it's for instances exclusively.
  • nexus/db-model/src/sled.rs - there's a similar comment on last_used_address that needs updating
  • nexus/db-queries/src/db/datastore/mod.rs - there's another comment on next_ipv6_address that indicates "pulling from last_used_address" is for Oxide services. This isn't true anymore, again, it's service-only.
  • That next_ipv6_address method is called by a function called allocate_sled_ipv6, which again, is kinda a misnomer, since this should now only be used for instances.

@andrewjstone , if we can resolve these nits, this PR LGTM!

@andrewjstone
Copy link
Contributor Author

  • schema/crdb/dbinit.sql - there's a comment on last_used_address suggesting this is for an "Oxide service" - now it isn't, it's for instances exclusively.
  • nexus/db-model/src/sled.rs - there's a similar comment on last_used_address that needs updating
  • nexus/db-queries/src/db/datastore/mod.rs - there's another comment on next_ipv6_address that indicates "pulling from last_used_address" is for Oxide services. This isn't true anymore, again, it's service-only.
  • That next_ipv6_address method is called by a function called allocate_sled_ipv6, which again, is kinda a misnomer, since this should now only be used for instances.

@andrewjstone , if we can resolve these nits, this PR LGTM!

These are great catches! I will fix them. Thank you very much for the thorough review!

@andrewjstone andrewjstone merged commit 0c7be4c into main Jan 10, 2024
20 checks passed
@andrewjstone andrewjstone deleted the ajs/underlay-ip-ranges branch January 10, 2024 21:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants