background task for service zone nat (#4857)
Currently the logic for configuring NAT for service zones is deeply
nested and crosses sled-agent HTTP API boundaries. The cleanest way to
deliver eventual consistency for service zone NAT entries was to pull
the zone information from inventory and use it to generate NAT entries
to reconcile against the `ipv4_nat_entry` table (a sketch of the
reconcile step follows the scenarios below). This covers us in the
following scenarios:

### RSS:
* User provides configuration to RSS
* RSS process ultimately creates a sled plan and service plan
* Application of the service plan by sled-agents creates zones
* Zone creation makes direct calls to dendrite to configure NAT (it is the
only way it can be done at this time)
* Eventually the Nexus zones are launched and handoff to Nexus is
complete
* Inventory task is run, recording zone locations to the db
* Service zone NAT background task reads inventory from the db and uses the
data to generate records for the `ipv4_nat_entry` table, then triggers a
dendrite sync
* The sync is ultimately a no-op because the NAT entries already exist in
dendrite (dendrite operations are idempotent)
 
### Cold boot:
* Sled-agents create switch zones if they are managing a scrimlet, and
subsequently create the zones recorded in their ledgers. This may result in
direct calls to dendrite.
* Once Nexus is back up, inventory collection resumes
* Service zone NAT background task reads inventory from the db to
reconcile entries in the `ipv4_nat_entry` table and then triggers a dendrite
sync
* If NAT is out of date on dendrite, it will be updated on trigger

### Dendrite crash
* If dendrite crashes and restarts, it will immediately contact Nexus
for a re-sync (pre-existing logic from earlier NAT RPW work)
* Service zone and instance NAT entries are now present in the RPW table, so
all NAT entries will be restored

### Migration / Relocation of service zone
* A new zone gets created on a sled in the rack. A direct call to dendrite
is made (zone creation uses the same pre-Nexus logic).
* The inventory task records the new location of the service zone
* The service zone NAT background task uses inventory to update the table,
adding and removing the necessary NAT entries and triggering a dendrite
update
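
The reconcile step itself amounts to a set difference between the entries
derived from inventory and the entries currently in the `ipv4_nat_entry`
table. A minimal sketch of that idea, with hypothetical types standing in for
the real db-model structs (this is not the actual task code):

```rust
use std::collections::HashSet;

/// Hypothetical, simplified stand-in for a row in `ipv4_nat_entry`.
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
struct NatEntry {
    external_address: std::net::Ipv4Addr,
    first_port: u16,
    last_port: u16,
    sled_address: std::net::Ipv6Addr,
    vni: u32,
    mac: [u8; 6],
}

/// Sketch of the reconcile step: diff the desired entries (generated from
/// inventory) against what is currently in the table, returning what to add
/// and what to remove. The real task applies the result to the database and
/// then triggers a dendrite sync via the existing NAT RPW machinery.
fn reconcile(
    desired_from_inventory: &HashSet<NatEntry>,
    current_in_table: &HashSet<NatEntry>,
) -> (Vec<NatEntry>, Vec<NatEntry>) {
    let to_add =
        desired_from_inventory.difference(current_in_table).cloned().collect();
    let to_remove =
        current_in_table.difference(desired_from_inventory).cloned().collect();
    (to_add, to_remove)
}
```

Because the dendrite side is idempotent, applying the resulting set repeatedly
is safe; a run that finds nothing to add or remove degenerates into the no-op
sync described in the RSS scenario above.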


Considerations
---
Because this relies on data from the inventory task, which runs on a
periodic timer (600s), and because this task also runs on a periodic
timer (30s), there may be some latency for picking up changes: in the worst
case a change waits roughly one full inventory interval plus one task
interval, on the order of 600s + 30s ≈ 10.5 minutes. A few
potential avenues for improvement:

* Plumb additional logic into service zone NAT configuration that
enables direct updates to the `ipv4_nat_entry` table once Nexus is
online. Of note, this would further bifurcate the logic of pre-Nexus and
post-Nexus state management. At this moment, this seems like the
most painful approach. An argument can be made that we ultimately should
be lifting the NAT configuration logic _out_ of service zone
creation instead.

* Decrease the period of the inventory task. This is the simplest
change; however, it would result in more frequent collection,
increasing overhead. I do not know _how much_ overhead this would
add. Maybe it is negligible.

* Plumb in the ability to trigger the inventory collection task for
interesting control plane events. This would let us keep the
_relatively_ infrequent timing intervals while still refreshing
on-demand when needed.
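
A minimal illustration of that periodic-plus-on-demand pattern, using plain
tokio primitives rather than the actual Nexus background-task plumbing (the
real driver has its own activation mechanism; this is just the shape of the
idea):

```rust
use std::sync::Arc;
use std::time::Duration;
use tokio::sync::Notify;

/// Run a collection on a timer, but also allow other parts of the control
/// plane to wake it immediately by calling `trigger.notify_one()`.
async fn run_collector(trigger: Arc<Notify>, period: Duration) {
    loop {
        // Wake on whichever comes first: the timer or an explicit trigger.
        tokio::select! {
            _ = tokio::time::sleep(period) => {}
            _ = trigger.notified() => {}
        }
        collect_inventory().await;
    }
}

async fn collect_inventory() {
    // Placeholder for the actual inventory collection work.
}
```

An "interesting" control plane event (e.g. a service zone being relocated)
would then just hold a clone of the `Arc<Notify>` and nudge the collector
without waiting for the next timer tick.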

Related
---
Closes #4650 
Extracted from #4822
internet-diglett authored Jan 26, 2024
1 parent 80cc001 commit 5215d85
Showing 21 changed files with 770 additions and 31 deletions.
6 changes: 6 additions & 0 deletions common/src/address.rs
@@ -18,6 +18,12 @@ pub const AZ_PREFIX: u8 = 48;
pub const RACK_PREFIX: u8 = 56;
pub const SLED_PREFIX: u8 = 64;

/// maximum possible value for a tcp or udp port
pub const MAX_PORT: u16 = u16::MAX;

/// minimum possible value for a tcp or udp port
pub const MIN_PORT: u16 = u16::MIN;

/// The amount of redundancy for internal DNS servers.
///
/// Must be less than or equal to MAX_DNS_REDUNDANCY.
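For context on the two constants added above: a service zone that owns its
external address outright claims the entire port range, so the generated NAT
entry spans `MIN_PORT..=MAX_PORT`. A hedged sketch (the helper name is an
assumption for illustration, not part of this diff):

```rust
use omicron_common::address::{MAX_PORT, MIN_PORT};

/// Hypothetical helper: the port span claimed by a service zone NAT entry
/// that owns its external IP exclusively.
fn full_port_range() -> (u16, u16) {
    (MIN_PORT, MAX_PORT) // (0, 65535)
}
```
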
16 changes: 16 additions & 0 deletions common/src/nexus_config.rs
@@ -334,6 +334,8 @@ pub struct BackgroundTaskConfig {
pub inventory: InventoryConfig,
/// configuration for phantom disks task
pub phantom_disks: PhantomDiskConfig,
/// configuration for service zone nat sync task
pub sync_service_zone_nat: SyncServiceZoneNatConfig,
}

#[serde_as]
@@ -376,6 +378,14 @@ pub struct NatCleanupConfig {
pub period_secs: Duration,
}

#[serde_as]
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
pub struct SyncServiceZoneNatConfig {
/// period (in seconds) for periodic activations of this background task
#[serde_as(as = "DurationSeconds<u64>")]
pub period_secs: Duration,
}

#[serde_as]
#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
pub struct InventoryConfig {
@@ -517,6 +527,7 @@ mod test {
};
use crate::address::{Ipv6Subnet, RACK_PREFIX};
use crate::api::internal::shared::SwitchLocation;
use crate::nexus_config::SyncServiceZoneNatConfig;
use camino::{Utf8Path, Utf8PathBuf};
use dropshot::ConfigDropshot;
use dropshot::ConfigLogging;
@@ -665,6 +676,7 @@
inventory.nkeep = 11
inventory.disable = false
phantom_disks.period_secs = 30
sync_service_zone_nat.period_secs = 30
[default_region_allocation_strategy]
type = "random"
seed = 0
@@ -769,6 +781,9 @@
phantom_disks: PhantomDiskConfig {
period_secs: Duration::from_secs(30),
},
sync_service_zone_nat: SyncServiceZoneNatConfig {
period_secs: Duration::from_secs(30)
}
},
default_region_allocation_strategy:
crate::nexus_config::RegionAllocationStrategy::Random {
@@ -827,6 +842,7 @@
inventory.nkeep = 3
inventory.disable = false
phantom_disks.period_secs = 30
sync_service_zone_nat.period_secs = 30
[default_region_allocation_strategy]
type = "random"
"##,
12 changes: 12 additions & 0 deletions dev-tools/omdb/tests/env.out
@@ -70,6 +70,10 @@ task: "phantom_disks"
detects and un-deletes phantom disks


task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


---------------------------------------------
stderr:
note: using Nexus URL http://127.0.0.1:REDACTED_PORT
@@ -139,6 +143,10 @@ task: "phantom_disks"
detects and un-deletes phantom disks


task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


---------------------------------------------
stderr:
note: Nexus URL not specified. Will pick one from DNS.
@@ -195,6 +203,10 @@ task: "phantom_disks"
detects and un-deletes phantom disks


task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


---------------------------------------------
stderr:
note: Nexus URL not specified. Will pick one from DNS.
11 changes: 11 additions & 0 deletions dev-tools/omdb/tests/successes.out
@@ -264,6 +264,10 @@ task: "phantom_disks"
detects and un-deletes phantom disks


task: "service_zone_nat_tracker"
ensures service zone nat records are recorded in NAT RPW table


---------------------------------------------
stderr:
note: using Nexus URL http://127.0.0.1:REDACTED_PORT/
@@ -369,6 +373,13 @@ task: "phantom_disks"
number of phantom disks deleted: 0
number of phantom disk delete errors: 0

task: "service_zone_nat_tracker"
configured period: every 30s
currently executing: no
last completed activation: iter 2, triggered by an explicit signal
started at <REDACTED TIMESTAMP> (<REDACTED DURATION>s ago) and ran for <REDACTED DURATION>ms
last completion reported error: inventory collection is None

---------------------------------------------
stderr:
note: using Nexus URL http://127.0.0.1:REDACTED_PORT/
102 changes: 77 additions & 25 deletions docs/how-to-run.adoc
@@ -498,41 +498,93 @@ Follow the instructions to set up the https://github.com/oxidecomputer/oxide.rs[
oxide auth login --host http://192.168.1.21
----

=== Configure quotas for your silo

Setting resource quotas is required before you can begin uploading images, provisioning instances, etc.
In this example we'll update the recovery silo so we can provision instances directly from it:

[source, console]
----
$ oxide api /v1/system/silos/recovery/quotas --method PUT --input - <<EOF
{
"cpus": 9999999999,
"memory": 999999999999999999,
"storage": 999999999999999999
}
EOF
# example response
{
"cpus": 9999999999,
"memory": 999999999999999999,
"silo_id": "fa12b74d-30f8-4d5a-bc0e-4d229f13c6e5",
"storage": 999999999999999999
}
----

=== Create an IP pool

An IP pool is needed to provide external connectivity to Instances. The addresses you use here should be addresses you've reserved from the external network (see <<_external_networking>>).

Here we will first create an ip pool for the recovery silo:
[source,console]
----
$ oxide ip-pool range add --pool default --first 192.168.1.31 --last 192.168.1.40
success
IpPoolRange {
id: 4a61e65a-d96d-4c56-9cfd-dc1e44d9e99b,
ip_pool_id: 1b1289a7-cefe-4a7e-a8c9-d93330846301,
range: V4(
Ipv4Range {
first: 192.168.1.31,
last: 192.168.1.40,
},
),
time_created: 2023-08-02T16:31:43.679785Z,
---
$ oxide api /v1/system/ip-pools --method POST --input - <<EOF
{
"name": "default",
"description": "default ip-pool"
}
----
EOF

# example response
{
"description": "default ip-pool",
"id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
"name": "default",
"time_created": "2024-01-16T22:51:54.679751Z",
"time_modified": "2024-01-16T22:51:54.679751Z"
}
---

Now we will associate the pool with the recovery silo.
[source,console]
---
$ oxide api /v1/system/ip-pools/default/silos --method POST --input - <<EOF
{
"silo": "recovery",
"is_default": true
}
EOF

# example response
{
"ip_pool_id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
"is_default": true,
"silo_id": "5c0aca09-d7ee-4be6-b7b1-060655659f74"
}
---

With SoftNPU you will generally also need to configure Proxy ARP. Below, `IP_POOL_START` and `IP_POOL_END` are the first and last addresses you used in the previous command:
Now we will add an address range to the recovery silo:

[source,console]
----
# dladm won't return leading zeroes but `scadm` expects them
$ SOFTNPU_MAC=$(dladm show-vnic sc0_1 -p -o macaddress | gsed 's/\b\(\w\)\b/0\1/g')
$ pfexec zlogin sidecar_softnpu /softnpu/scadm \
--server /softnpu/server \
--client /softnpu/client \
standalone \
add-proxy-arp \
$IP_POOL_START \
$IP_POOL_END \
$SOFTNPU_MAC
oxide api /v1/system/ip-pools/default/ranges/add --method POST --input - <<EOF
{
"first": "$IP_POOL_START",
"last": "$IP_POOL_END"
}
EOF
# example response
{
"id": "6209516e-2b38-4cbd-bff4-688ffa39d50b",
"ip_pool_id": "1c3dfa5c-7b00-46ff-987a-4e59e512b250",
"range": {
"first": "192.168.1.35",
"last": "192.168.1.40"
},
"time_created": "2024-01-16T22:53:43.179726Z"
}
----

=== Create a Project and Image
2 changes: 1 addition & 1 deletion nexus/db-model/src/ipv4_nat_entry.rs
@@ -10,7 +10,7 @@ use serde::Serialize;
use uuid::Uuid;

/// Values used to create an Ipv4NatEntry
#[derive(Insertable, Debug, Clone)]
#[derive(Insertable, Debug, Clone, Eq, PartialEq)]
#[diesel(table_name = ipv4_nat_entry)]
pub struct Ipv4NatValues {
pub external_address: Ipv4Net,
1 change: 1 addition & 0 deletions nexus/db-model/src/ipv4net.rs
@@ -19,6 +19,7 @@ use std::net::Ipv4Addr;
Clone,
Copy,
Debug,
Eq,
PartialEq,
AsExpression,
FromSqlRow,
1 change: 1 addition & 0 deletions nexus/db-model/src/ipv6net.rs
@@ -21,6 +21,7 @@ use crate::RequestAddressError;
Clone,
Copy,
Debug,
Eq,
PartialEq,
AsExpression,
FromSqlRow,
1 change: 1 addition & 0 deletions nexus/db-model/src/macaddr.rs
@@ -15,6 +15,7 @@ use serde::Serialize;
Clone,
Copy,
Debug,
Eq,
PartialEq,
AsExpression,
FromSqlRow,
2 changes: 1 addition & 1 deletion nexus/db-model/src/schema.rs
@@ -13,7 +13,7 @@ use omicron_common::api::external::SemverVersion;
///
/// This should be updated whenever the schema is changed. For more details,
/// refer to: schema/crdb/README.adoc
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(28, 0, 0);
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(29, 0, 0);

table! {
disk (id) {
10 changes: 9 additions & 1 deletion nexus/db-model/src/vni.rs
@@ -14,7 +14,15 @@ use serde::Deserialize;
use serde::Serialize;

#[derive(
Clone, Debug, Copy, AsExpression, FromSqlRow, Serialize, Deserialize,
Clone,
Debug,
Copy,
AsExpression,
FromSqlRow,
Serialize,
Deserialize,
Eq,
PartialEq,
)]
#[diesel(sql_type = sql_types::Int4)]
pub struct Vni(pub external::Vni);