Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

User data support #911

Merged
merged 11 commits into from
Apr 15, 2022
Merged
13 changes: 13 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 3 additions & 0 deletions common/src/sql/dbinit.sql
Original file line number Diff line number Diff line change
Expand Up @@ -303,6 +303,9 @@ CREATE TABLE omicron.public.instance (
/* Every Instance is in exactly one Project at a time. */
project_id UUID NOT NULL,

/* user data for instance initialization systems (e.g. cloud-init) */
user_data BYTES NOT NULL,

/*
* TODO Would it make sense for the runtime state to live in a separate
* table?
Expand Down
1 change: 1 addition & 0 deletions nexus/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ cookie = "0.16"
crucible-agent-client = { git = "https://github.com/oxidecomputer/crucible", rev = "945daedb88cefa790f1d994b3a038b8fa9ac514a" }
# Tracking pending 2.0 version.
diesel = { git = "https://github.com/diesel-rs/diesel", rev = "ce77c382", features = ["postgres", "r2d2", "chrono", "serde_json", "network-address", "uuid"] }
fatfs = "0.3.5"
futures = "0.3.21"
headers = "0.3.7"
hex = "0.4.3"
Expand Down
93 changes: 93 additions & 0 deletions nexus/src/cidata.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,93 @@
use crate::db::{identity::Resource, model::Instance};
use fatfs::{FileSystem, FormatVolumeOptions, FsOptions};
use omicron_common::api::external::Error;
use serde::Serialize;
use std::io::{Cursor, Write};
use uuid::Uuid;

impl Instance {
pub fn generate_cidata(&self) -> Result<Vec<u8>, Error> {
// cloud-init meta-data is YAML, but YAML is a strict superset of JSON.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙏

let meta_data = serde_json::to_vec(&MetaData {
instance_id: self.id(),
local_hostname: &self.runtime().hostname,
public_keys: "", // TODO
})
.map_err(|_| Error::internal_error("failed to serialize meta-data"))?;
let cidata =
build_vfat(&meta_data, &self.user_data).map_err(|err| {
Error::internal_error(&format!(
"failed to create cidata volume: {}",
err
))
})?;
Ok(cidata)
}
}

#[derive(Serialize)]
#[serde(rename_all = "kebab-case")]
struct MetaData<'a> {
instance_id: Uuid,
local_hostname: &'a str,
public_keys: &'a str,
iliana marked this conversation as resolved.
Show resolved Hide resolved
}

fn build_vfat(meta_data: &[u8], user_data: &[u8]) -> std::io::Result<Vec<u8>> {
// requires #![feature(int_roundings)].
// https://github.com/rust-lang/rust/issues/88581
let file_sectors = meta_data.len().unstable_div_ceil(512)
+ user_data.len().unstable_div_ceil(512);
// this was reverse engineered by making the numbers go lower until the
// code failed (usually because of low disk space). this only works for
// FAT12 filesystems
let sectors = 42.max(file_sectors + 35 + ((file_sectors + 1) / 341 * 2));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do the numbers in this calculation represent? my FAT* filesystem ignorant read is "the lowest number of sectors we support is 42, and otherwise we have to compute enough space for FAT* formatting / metadata plus our file data", but that implies that FAT* needs 35 sectors for something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually have no idea because the only reference I could easily find is [[Design of the FAT file system]], which I repeatedly tried to read and my eyes kept falling off the page. I wrote some code using fatfs to find out what the smallest numbers are that work with it.

This more or less spells out to:

  • No device less than 42 sectors can be formatted as FAT12
  • Given the total number of sectors taken up by your files, 35 are overhead
  • ... plus an additional 2 sectors of overhead based on some linear relationship that has numbers I don't understand in it (why is it 341???).

Copy link
Collaborator

@smklein smklein Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend https://www.win.tue.nl/~aeb/linux/fs/fat/fatgen103.pdf as a reference - it is the FAT spec.

Page 14 has a section on FAT type determination:

The FAT type— one of FAT12, FAT16, or FAT32—is determined by the count of clusters on the volume and nothing
else.

Since it seems we're seeing bytes_per_cluster to 512, and the default bytes_per_sector is also 512, the choice of FAT filesystem type seems dependent on "how many clusters we need", which seems dependent on the total input data size.

It also mentions the calculation:

// Calculate the Root Directory Sectors
RootDirSectors = ((BPB_RootEntCnt * 32) + (BPB_BytsPerSec – 1)) / BPB_BytsPerSec;

// Calculate the sectors in the data region
If(BPB_FATSz16 != 0)
  FATSz = BPB_FATSz16;
Else
  FATSz = BPB_FATSz32;
If(BPB_TotSec16 != 0)
  TotSec = BPB_TotSec16;
Else
  TotSec = BPB_TotSec32;
DataSec = TotSec – (BPB_ResvdSecCnt + (BPB_NumFATs * FATSz) + RootDirSectors);

// Calculate the cluster count
CountofClusters = DataSec / BPB_SecPerClus;

// Figure out FAT type
If(CountofClusters < 4085) {
  /* Volume is FAT12 */
} else if(CountofClusters < 65525) {
  /* Volume is FAT16 */
} else {
  /* Volume is FAT32 */
}

I kinda despise this calculation, because to solve the equation for "am I using a FAT12 filesystem", you must first provide input that knows if you are using FAT12/FAT16/FAT32. For example, BPB_RootEntCnt is zero on FAT32, but is not on FAT12 / FAT16.

(This is literally what the fatfs crate does - guess, then check)

They even acknowledge this is kinda fucked up on page 19, under "FAT Volume Initialization":

At this point, the careful reader should have one very interesting question. Given that the FAT type
(FAT12, FAT16, or FAT32) is dependant on the number of clusters—and that the sectors available in
the data area of a FAT volume is dependant on the size of the FAT—when handed an unformatted
volume that does not yet have a BPB, how do you determine all this and compute the proper values to
put in BPB_SecPerClus and either BPB_FATSz16 or BPB_FATSz32? The way Microsoft operating
systems do this is with a fixed value, several tables, and a clever piece of arithmetic.

Microsoft operating systems only do FAT12 on floppy disks. Because there is a limited number of
floppy formats that all have a fixed size, this is done with a simple table:
“If it is a floppy of this type, then the BPB looks like this.”

There is no dynamic computation for FAT12. For the FAT12 formats, all the computation for
BPB_SecPerClus and BPB_FATSz16 was worked out by hand on a piece of paper and recorded in the
table (being careful of course that the resultant cluster count was always less than 4085). If your media
is larger than 4 MB, do not bother with FAT12. Use smaller BPB_SecPerClus values so that the
volume will be FAT16.

So, anyway, "worked out by hand on a piece of paper" aside, we can make some assumptions and try calculating this for a limited size:

  • If we assume we're using FAT12 (which we can validate later),
  • and we assume sector and cluster size are both 512
  • and we assume "max_root_dir_entries" is 512,
  • and we assume "BPB_ResvdSecCnt = 1"
  • and we assume "# of FATs = 2" (we could change this, but that's documented as the default of the rust crate you're using)
RootDirSectors = ((512 * 32) + (512 – 1)) / 512 = 32;
DataSec = TotSec - (1 + 2 * FATSz + 32); 
DataSec = TotSec - (33 + 2 * FATSz)
// The filesystem stays FAT12 if DataSec < 4085

So, anyway, how do we calculate the size of the actual "FAT"s on the FS? With a "simple" calculation (see: determine_sectors_per_fat in fatfs). According to page 16 of the FAT spec, this is "more complicated" for FAT12 because "there are 1.5 bytes (12-bits) per FAT entry".

So, recalling our calculation earlier:

DataSec = TotSec - (33 + 2 * FATSz)
  • If FATSz = 1 => 512 bytes => 4096 bits -> can represent 341 clusters (4096 / 12), or ~170 KiB worth of clusters
  • If FATSz = 2 => 1024 bytes => 8192 bits -> can represent 682 clusters, or ~341 KiB worth of clusters.

So:

TotalSec = 35 + DataSec, if we're storing < 170 KiB of clusters
TotalSec = 37 + DataSec, if we're storing < 341 KiB of clusters

TL:DR:

If we think it's okay to put a bound on user_data + meta_data, my inclination would be:

  • Calculate: file_sectors = user_data.len().div_ceil(512) + meta_data.len().div_ceil(512)
  • Return an error if file_sectors > 512 (implying a combined sum of 256KiB)
  • Expect that 37 sectors will be reserved for overhead
  • Use a disk size of 37 + file_sectors

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though reading your comment earlier, it seems like you experimentally noticed the minimum FS size was 42, no? Any idea where that's coming from?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea where the 42 is coming from, no. But if you try to give fatfs fewer than 42 sectors, it gives you an "unfortunate disk size" error.


let mut disk = Cursor::new(vec![0; sectors * 512]);
fatfs::format_volume(
&mut disk,
FormatVolumeOptions::new()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can explicitly specify that this needs to be FAT12 using .fat_type(FatType::Fat12), if that's necessary to assert our comment above. However, I'm not sure it's safe to assume we'll be using FAT12 - I think that's dependent on the size of userdata + metadata.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit; Unless we think it's okay to impose an upper size bound, then setting the fat type would be totally reasonable here.

.bytes_per_cluster(512)
.volume_label(*b"CIDATA "),
)?;

{
let fs = FileSystem::new(&mut disk, FsOptions::new())?;
let root_dir = fs.root_dir();
for (file, data) in [("meta-data", meta_data), ("user-data", user_data)]
{
if !data.is_empty() {
let mut file = root_dir.create_file(file)?;
file.write_all(data)?;
}
}
}

Ok(disk.into_inner())
}

#[cfg(test)]
mod tests {
/// the fatfs crate has some unfortunate panics if you ask it to do
/// incredibly stupid things, like format a filesystem with an invalid
/// cluster size.
///
/// to ensure that our math for the filesystem size is correct, and also to
/// ensure that we don't ask fatfs to do incredibly stupid things, this
/// test checks that `build_vfat` works on a representative sample of weird
/// file sizes. (32 KiB is our enforced limit for user_data, so push it a
/// little further.)
#[test]
fn build_vfat_works_with_arbitrarily_sized_input() {
// somewhat arbitrarily-chosen prime numbers near 1 KiB and 256 bytes
for md_size in (0..36 * 1024).step_by(1019) {
for ud_size in (0..36 * 1024).step_by(269) {
assert!(super::build_vfat(
&vec![0x5a; md_size],
&vec![0xa5; ud_size]
)
.is_ok());
}
}
}
}
10 changes: 9 additions & 1 deletion nexus/src/db/model.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1118,6 +1118,9 @@ pub struct Instance {
/// id for the project containing this Instance
pub project_id: Uuid,

/// user data for instance initialization systems (e.g. cloud-init)
pub user_data: Vec<u8>,

/// runtime state of the Instance
#[diesel(embed)]
pub runtime_state: InstanceRuntimeState,
Expand All @@ -1132,7 +1135,12 @@ impl Instance {
) -> Self {
let identity =
InstanceIdentity::new(instance_id, params.identity.clone());
Self { identity, project_id, runtime_state: runtime }
Self {
identity,
project_id,
user_data: params.user_data.clone(),
runtime_state: runtime,
}
}

pub fn runtime(&self) -> &InstanceRuntimeState {
Expand Down
1 change: 1 addition & 0 deletions nexus/src/db/schema.rs
Original file line number Diff line number Diff line change
Expand Up @@ -68,6 +68,7 @@ table! {
time_modified -> Timestamptz,
time_deleted -> Nullable<Timestamptz>,
project_id -> Uuid,
user_data -> Binary,
state -> crate::db::model::InstanceStateEnum,
time_state_updated -> Timestamptz,
state_generation -> Int8,
Expand Down
46 changes: 46 additions & 0 deletions nexus/src/external_api/params.rs
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,15 @@ pub struct InstanceCreate {
pub memory: ByteCount,
pub hostname: String, // TODO-cleanup different type?

/// User data for instance initialization systems (such as cloud-init).
/// Must be a Base64-encoded string, as specified in RFC 4648 § 4 (+ and /
/// characters with padding). Maximum 32 KiB unencoded data.
// TODO: this should emit `"format": "byte"`, but progenitor doesn't
// understand that yet.
#[schemars(default, with = "String")]
#[serde(with = "serde_user_data")]
pub user_data: Vec<u8>,

/// The network interfaces to be created for this instance.
#[serde(default)]
pub network_interfaces: InstanceNetworkInterfaceAttachment,
Expand All @@ -147,6 +156,43 @@ pub struct InstanceCreate {
pub disks: Vec<InstanceDiskAttachment>,
}

mod serde_user_data {
use serde::{de::Error, Deserialize, Deserializer, Serialize, Serializer};

pub fn serialize<S>(
data: &Vec<u8>,
serializer: S,
) -> Result<S::Ok, S::Error>
where
S: Serializer,
{
base64::encode(data).serialize(serializer)
}

pub fn deserialize<'de, D>(deserializer: D) -> Result<Vec<u8>, D::Error>
where
D: Deserializer<'de>,
{
match base64::decode(<&str>::deserialize(deserializer)?) {
Ok(buf) => {
// if you change this, also update the stress test in crate::cidata
if buf.len() > 32 * 1024 {
iliana marked this conversation as resolved.
Show resolved Hide resolved
Err(D::Error::invalid_length(
buf.len(),
&"less than 32 KiB",
))
} else {
Ok(buf)
}
}
Err(_) => Err(D::Error::invalid_value(
serde::de::Unexpected::Other("invalid base64 string"),
&"a valid base64 string",
)),
}
}
}

/// Migration parameters for an [`Instance`](omicron_common::api::external::Instance)
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct InstanceMigrate {
Expand Down
2 changes: 2 additions & 0 deletions nexus/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
// License, v. 2.0. If a copy of the MPL was not distributed with this
// file, You can obtain one at https://mozilla.org/MPL/2.0/.
#![feature(async_closure)]
#![feature(int_roundings)] // used in crate::cidata
iliana marked this conversation as resolved.
Show resolved Hide resolved

//! Library interface to the Nexus, the heart of the control plane

Expand All @@ -15,6 +16,7 @@

pub mod authn; // Public only for testing
pub mod authz; // Public for documentation examples
mod cidata;
pub mod config; // Public for testing
pub mod context; // Public for documentation examples
pub mod db; // Public for documentation examples
Expand Down
3 changes: 3 additions & 0 deletions nexus/src/nexus.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1697,6 +1697,9 @@ impl Nexus {
),
nics: nics.iter().map(|nic| nic.clone().into()).collect(),
disks: disk_reqs,
cloud_init_bytes: Some(base64::encode(
db_instance.generate_cidata()?,
)),
};

let sa = self.instance_sled(&db_instance).await?;
Expand Down
2 changes: 2 additions & 0 deletions nexus/src/sagas.rs
Original file line number Diff line number Diff line change
Expand Up @@ -973,6 +973,8 @@ async fn sim_instance_migrate(
nics: vec![],
// TODO: populate disks
disks: vec![],
// TODO: populate cloud init bytes
cloud_init_bytes: None,
};
let target = InstanceRuntimeStateRequested {
run_state: InstanceStateRequested::Migrating,
Expand Down
3 changes: 3 additions & 0 deletions nexus/test-utils/src/resource_helpers.rs
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,9 @@ pub async fn create_instance(
ncpus: InstanceCpuCount(4),
memory: ByteCount::from_mebibytes_u32(256),
hostname: String::from("the_host"),
user_data:
b"#cloud-config\nsystem_info:\n default_user:\n name: oxide"
.to_vec(),
network_interfaces:
params::InstanceNetworkInterfaceAttachment::Default,
disks: vec![],
Expand Down
1 change: 1 addition & 0 deletions nexus/tests/integration_tests/endpoints.rs
Original file line number Diff line number Diff line change
Expand Up @@ -181,6 +181,7 @@ lazy_static! {
ncpus: InstanceCpuCount(1),
memory: ByteCount::from_gibibytes_u32(16),
hostname: String::from("demo-instance"),
user_data: vec![],
network_interfaces:
params::InstanceNetworkInterfaceAttachment::Default,
disks: vec![],
Expand Down
11 changes: 11 additions & 0 deletions nexus/tests/integration_tests/instances.rs
Original file line number Diff line number Diff line change
Expand Up @@ -146,6 +146,7 @@ async fn test_instances_create_reboot_halt(
ncpus: instance.ncpus,
memory: instance.memory,
hostname: instance.hostname.clone(),
user_data: vec![],
network_interfaces:
params::InstanceNetworkInterfaceAttachment::Default,
disks: vec![],
Expand Down Expand Up @@ -563,6 +564,7 @@ async fn test_instance_create_saga_removes_instance_database_record(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("inst"),
user_data: vec![],
network_interfaces: interface_params.clone(),
disks: vec![],
};
Expand All @@ -584,6 +586,7 @@ async fn test_instance_create_saga_removes_instance_database_record(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("inst2"),
user_data: vec![],
network_interfaces: interface_params,
disks: vec![],
};
Expand Down Expand Up @@ -669,6 +672,7 @@ async fn test_instance_with_single_explicit_ip_address(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nic-test"),
user_data: vec![],
network_interfaces: interface_params,
disks: vec![],
};
Expand Down Expand Up @@ -785,6 +789,7 @@ async fn test_instance_with_new_custom_network_interfaces(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nic-test"),
user_data: vec![],
network_interfaces: interface_params,
disks: vec![],
};
Expand Down Expand Up @@ -878,6 +883,7 @@ async fn test_instance_create_delete_network_interface(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nic-test"),
user_data: vec![],
network_interfaces: params::InstanceNetworkInterfaceAttachment::None,
disks: vec![],
};
Expand Down Expand Up @@ -1061,6 +1067,7 @@ async fn test_instance_with_multiple_nics_unwinds_completely(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nic-test"),
user_data: vec![],
network_interfaces: interface_params,
disks: vec![],
};
Expand Down Expand Up @@ -1134,6 +1141,7 @@ async fn test_attach_one_disk_to_instance(cptestctx: &ControlPlaneTestContext) {
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nfs"),
user_data: vec![],
network_interfaces: params::InstanceNetworkInterfaceAttachment::Default,
disks: vec![params::InstanceDiskAttachment::Attach(
params::InstanceDiskAttach {
Expand Down Expand Up @@ -1229,6 +1237,7 @@ async fn test_attach_eight_disks_to_instance(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nfs"),
user_data: vec![],
network_interfaces: params::InstanceNetworkInterfaceAttachment::Default,
disks: (0..8)
.map(|i| {
Expand Down Expand Up @@ -1334,6 +1343,7 @@ async fn test_cannot_attach_nine_disks_to_instance(
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nfs"),
user_data: vec![],
network_interfaces: params::InstanceNetworkInterfaceAttachment::Default,
disks: (0..9)
.map(|i| {
Expand Down Expand Up @@ -1460,6 +1470,7 @@ async fn test_cannot_attach_faulted_disks(cptestctx: &ControlPlaneTestContext) {
ncpus: InstanceCpuCount::try_from(2).unwrap(),
memory: ByteCount::from_mebibytes_u32(4),
hostname: String::from("nfs"),
user_data: vec![],
network_interfaces: params::InstanceNetworkInterfaceAttachment::Default,
disks: (0..8)
.map(|i| {
Expand Down
1 change: 1 addition & 0 deletions nexus/tests/integration_tests/subnet_allocation.rs
Original file line number Diff line number Diff line change
Expand Up @@ -39,6 +39,7 @@ async fn create_instance_expect_failure(
ncpus: InstanceCpuCount(1),
memory: ByteCount::from_mebibytes_u32(256),
hostname: name.to_string(),
user_data: vec![],
network_interfaces: params::InstanceNetworkInterfaceAttachment::Default,
disks: vec![],
};
Expand Down
4 changes: 4 additions & 0 deletions openapi/nexus.json
Original file line number Diff line number Diff line change
Expand Up @@ -5358,6 +5358,10 @@
"$ref": "#/components/schemas/InstanceNetworkInterfaceAttachment"
}
]
},
"user_data": {
"description": "User data for instance initialization systems (such as cloud-init). Must be a Base64-encoded string, as specified in RFC 4648 § 4 (+ and / characters with padding). Maximum 32 KiB unencoded data.",
"type": "string"
Copy link
Contributor

@david-crespo david-crespo Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we leave this field out of the request and everything still works the same? I see that it's not in the required list. I also don't see the logic that makes that work, but it might be serde magic or dropshot magic or who knows what.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. There's a #[serde(default)] here, so it defaults to an empty string if no data is provided.

(With this change in place there will always be a cloud-init data device attached to an instance with at least the metadata, which is hostname/instance ID/SSH keys.)

}
},
"required": [
Expand Down
Loading