User data support #911
Conversation
Haven't tested this yet, but will shortly! Some questions:
```rust
impl Instance {
    pub fn generate_cidata(&self) -> Result<Vec<u8>, Error> {
        // cloud-init meta-data is YAML, but YAML is a strict superset of JSON.
```
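(That superset trick means meta-data can be emitted with a JSON serializer and still parse as YAML. A minimal sketch of the idea, with key names from cloud-init's NoCloud datasource and a made-up function shape — this is not the actual `generate_cidata` body:)

```rust
use serde_json::json;

// JSON output is valid YAML 1.2, so cloud-init's YAML parser accepts it.
fn meta_data_sketch(instance_id: &str, hostname: &str) -> String {
    json!({
        "instance-id": instance_id,
        "local-hostname": hostname,
    })
    .to_string()
}
```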
🙏
nexus/src/cidata.rs (Outdated)

```rust
// this was reverse engineered by making the numbers go lower until the
// code failed (usually because of low disk space). this only works for
// FAT12 filesystems
let sectors = 42.max(file_sectors + 35 + ((file_sectors + 1) / 341 * 2));
```
what do the numbers in this calculation represent? my FAT*-filesystem-ignorant read is "the lowest number of sectors we support is 42, and otherwise we have to compute enough space for FAT* formatting/metadata plus our file data", but that implies that FAT* needs 35 sectors for something?
I actually have no idea, because the only reference I could easily find is the Wikipedia article "Design of the FAT file system", which I repeatedly tried to read and my eyes kept falling off the page. I wrote some code using fatfs to find out what the smallest numbers are that work with it.
This more or less spells out to:
- No device with fewer than 42 sectors can be formatted as FAT12
- Given the total number of sectors taken up by your files, 35 more are overhead
- ...plus an additional 2 sectors of overhead based on some linear relationship that has numbers I don't understand in it (why is it 341???)
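(Spelled out as an annotated helper — same expression as the diff; the constants' roles are the guesses above, which get pinned down later in this thread:)

```rust
/// Empirically-derived FAT12 image sizing (constants reverse engineered).
fn cidata_sectors(file_sectors: usize) -> usize {
    // - 42: the smallest sector count fatfs will format at all
    // - 35: fixed overhead when one sector per FAT suffices
    // - each FAT sector holds 341 12-bit entries, and there are 2 FATs,
    //   so every extra 341 clusters costs 2 more sectors
    42.max(file_sectors + 35 + ((file_sectors + 1) / 341 * 2))
}
```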
I recommend https://www.win.tue.nl/~aeb/linux/fs/fat/fatgen103.pdf as a reference - it is the FAT spec.
Page 14 has a section on FAT type determination:
> The FAT type—one of FAT12, FAT16, or FAT32—is determined by the count of clusters on the volume and nothing else.
Since it seems we're setting `bytes_per_cluster` to 512, and the default `bytes_per_sector` is also 512, the choice of FAT filesystem type seems dependent on "how many clusters we need", which seems dependent on the total input data size.
It also mentions the calculation:
```c
// Calculate the Root Directory Sectors
RootDirSectors = ((BPB_RootEntCnt * 32) + (BPB_BytsPerSec - 1)) / BPB_BytsPerSec;

// Calculate the sectors in the data region
If (BPB_FATSz16 != 0)
    FATSz = BPB_FATSz16;
Else
    FATSz = BPB_FATSz32;
If (BPB_TotSec16 != 0)
    TotSec = BPB_TotSec16;
Else
    TotSec = BPB_TotSec32;
DataSec = TotSec - (BPB_ResvdSecCnt + (BPB_NumFATs * FATSz) + RootDirSectors);

// Calculate the cluster count
CountofClusters = DataSec / BPB_SecPerClus;

// Figure out FAT type
If (CountofClusters < 4085) {
    /* Volume is FAT12 */
} else if (CountofClusters < 65525) {
    /* Volume is FAT16 */
} else {
    /* Volume is FAT32 */
}
```
I kinda despise this calculation, because to solve the equation for "am I using a FAT12 filesystem", you must first provide input that knows if you are using FAT12/FAT16/FAT32. For example, `BPB_RootEntCnt` is zero on FAT32, but is not on FAT12/FAT16.

(This is literally what the fatfs crate does - guess, then check.)
They even acknowledge this is kinda fucked up on page 19, under "FAT Volume Initialization":
> At this point, the careful reader should have one very interesting question. Given that the FAT type (FAT12, FAT16, or FAT32) is dependant on the number of clusters—and that the sectors available in the data area of a FAT volume is dependant on the size of the FAT—when handed an unformatted volume that does not yet have a BPB, how do you determine all this and compute the proper values to put in BPB_SecPerClus and either BPB_FATSz16 or BPB_FATSz32? The way Microsoft operating systems do this is with a fixed value, several tables, and a clever piece of arithmetic.
>
> Microsoft operating systems only do FAT12 on floppy disks. Because there is a limited number of floppy formats that all have a fixed size, this is done with a simple table: "If it is a floppy of this type, then the BPB looks like this."
>
> There is no dynamic computation for FAT12. For the FAT12 formats, all the computation for BPB_SecPerClus and BPB_FATSz16 was worked out by hand on a piece of paper and recorded in the table (being careful of course that the resultant cluster count was always less than 4085). If your media is larger than 4 MB, do not bother with FAT12. Use smaller BPB_SecPerClus values so that the volume will be FAT16.
So, anyway, "worked out by hand on a piece of paper" aside, we can make some assumptions and try calculating this for a limited size:
- If we assume we're using FAT12 (which we can validate later),
- and we assume sector and cluster size are both 512,
- and we assume `max_root_dir_entries` is 512,
- and we assume `BPB_ResvdSecCnt = 1`,
- and we assume "# of FATs = 2" (we could change this, but that's documented as the default of the Rust crate you're using):

```c
RootDirSectors = ((512 * 32) + (512 - 1)) / 512 = 32;
DataSec = TotSec - (1 + 2 * FATSz + 32)
        = TotSec - (33 + 2 * FATSz);
// With one sector per cluster, the filesystem stays FAT12 if DataSec < 4085
```
So, anyway, how do we calculate the size of the actual "FAT"s on the FS? With a "simple" calculation (see `determine_sectors_per_fat` in fatfs). According to page 16 of the FAT spec, this is "more complicated" for FAT12 because "there are 1.5 bytes (12-bits) per FAT entry".
So, recalling our calculation earlier, `DataSec = TotSec - (33 + 2 * FATSz)`:

- If FATSz = 1 => 512 bytes => 4096 bits => can represent 341 clusters (4096 / 12), or ~170 KiB worth of clusters
- If FATSz = 2 => 1024 bytes => 8192 bits => can represent 682 clusters, or ~341 KiB worth of clusters
So:

- `TotalSec = 35 + DataSec`, if we're storing < 170 KiB of clusters
- `TotalSec = 37 + DataSec`, if we're storing < 341 KiB of clusters
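As a quick check of this derivation against the reverse-engineered expression in the diff (example numbers mine): with `file_sectors = 400` (~200 KiB of files), one FAT sector only covers 341 clusters, so FATSz = 2 and the overhead is 37 sectors; the diff's `400 + 35 + ((400 + 1) / 341) * 2 = 400 + 35 + 2 = 437` agrees with `37 + 400`.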
TL;DR: If we think it's okay to put a bound on user_data + meta_data, my inclination would be:

- Calculate `file_sectors = user_data.len().div_ceil(512) + meta_data.len().div_ceil(512)`
- Return an error if `file_sectors > 512` (implying a combined sum of 256 KiB)
- Expect that 37 sectors will be reserved for overhead
- Use a disk size of `37 + file_sectors`
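(A sketch of that sizing rule as a hypothetical helper — not the code in this PR; `div_ceil` on unsigned integers is stable since Rust 1.73, or can be written out by hand:)

```rust
fn cidata_disk_sectors(user_data: &[u8], meta_data: &[u8]) -> Result<usize, String> {
    let file_sectors =
        user_data.len().div_ceil(512) + meta_data.len().div_ceil(512);
    if file_sectors > 512 {
        // 512 file sectors == 256 KiB of combined (unencoded) data
        return Err("user_data + meta_data exceeds 256 KiB".into());
    }
    // 37 sectors of overhead: 1 reserved + 2 FATs x 2 sectors + 32 root-dir
    Ok(37 + file_sectors)
}
```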
Though reading your comment earlier, it seems like you experimentally noticed the minimum FS size was 42, no? Any idea where that's coming from?
No idea where the 42 is coming from, no. But if you try to give fatfs fewer than 42 sectors, it gives you an "unfortunate disk size" error.
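(For the record, that failure is easy to reproduce with something like the following — my reconstruction, not the original probe code:)

```rust
use std::io::Cursor;
use fatfs::{format_volume, FormatVolumeOptions};

fn main() {
    // Per the observation above, 41 sectors should error and 42 succeed.
    for sectors in [41usize, 42] {
        let mut disk = Cursor::new(vec![0u8; sectors * 512]);
        let result = format_volume(&mut disk, FormatVolumeOptions::new());
        println!("{sectors} sectors: {result:?}");
    }
}
```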
beware, YAML carries a terrible curse, but we can use JSON in a FAT filesystem
FAT is also cursed but at least we can use it for cloud-init data
normalize_pubkey_data is also cursed
yooooo
the new curse is that YAML 1.2 is the version of the spec that made it a strict superset, and PyYAML (what cloud-init ultimately uses) does not yet have YAML 1.2 support. this could get interesting. edit: nvm, we're probably fine:
```
[ 10.766464] cloud-init[578]: Cloud-init v. 20.4.1 finished at Thu, 14 Apr 2022 22:09:45 +0000. Datasource DataSourceNoCloud [seed=/dev/vdb][dsmode=net]. Up 10.76 seconds
```

🎉
I've verified this works on a Debian 11 image with the following user data:

```yaml
#cloud-config
system_info:
  default_user:
    name: iliana
    plain_text_passwd: "this is my password"
    lock_passwd: false
```

which correctly set the username and password of the default account, allowing me to log in over the serial console.
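(In the API request itself, that blob travels base64-encoded, per the `user_data` schema quoted below. A sketch using the base64 crate's standard alphabet, which matches the schema's "RFC 4648 § 4, + and / with padding" wording; a 0.21-style API is assumed:)

```rust
use base64::{engine::general_purpose::STANDARD, Engine as _};

// RFC 4648 § 4 alphabet (+ and /) with padding, as the schema requires.
fn encode_user_data(cloud_config: &str) -> String {
    STANDARD.encode(cloud_config.as_bytes())
}
```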
```json
},
"user_data": {
  "description": "User data for instance initialization systems (such as cloud-init). Must be a Base64-encoded string, as specified in RFC 4648 § 4 (+ and / characters with padding). Maximum 32 KiB unencoded data.",
  "type": "string"
```
Can we leave this field out of the request and everything still works the same? I see that it's not in the `required` list. I also don't see the logic that makes that work, but it might be serde magic or dropshot magic or who knows what.
Yep. There's a `#[serde(default)]` here, so it defaults to an empty string if no data is provided.

(With this change in place there will always be a cloud-init data device attached to an instance with at least the metadata, which is hostname/instance ID/SSH keys.)
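(A minimal sketch of that mechanism — field and struct shapes assumed, not the actual omicron definitions, which decode base64 into bytes:)

```rust
use serde::Deserialize;

#[derive(Deserialize)]
struct InstanceCreateSketch {
    // Absent from the request body => String::default(), i.e. ""
    #[serde(default)]
    user_data: String,
}

fn main() {
    let params: InstanceCreateSketch = serde_json::from_str("{}").unwrap();
    assert_eq!(params.user_data, "");
}
```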
```rust
let mut disk = Cursor::new(vec![0; sectors * 512]);
fatfs::format_volume(
    &mut disk,
    FormatVolumeOptions::new()
```
We can explicitly specify that this needs to be FAT12 using `.fat_type(FatType::Fat12)`, if that's necessary to assert our comment above. However, I'm not sure it's safe to assume we'll be using FAT12 - I think that's dependent on the size of userdata + metadata.
Edit: if we think it's okay to impose an upper size bound, then setting the FAT type would be totally reasonable here.
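(Something like this, say — a sketch against fatfs's FormatVolumeOptions; the label and error handling are placeholders:)

```rust
use std::io::Cursor;
use fatfs::{format_volume, FatType, FormatVolumeOptions};

fn format_cidata(sectors: usize) -> std::io::Result<Vec<u8>> {
    let mut disk = Cursor::new(vec![0u8; sectors * 512]);
    format_volume(
        &mut disk,
        FormatVolumeOptions::new()
            // Fail instead of silently becoming FAT16 if the volume
            // outgrows FAT12's cluster limit.
            .fat_type(FatType::Fat12)
            .volume_label(*b"CIDATA     "), // 11 bytes, space-padded
    )?;
    Ok(disk.into_inner())
}
```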
This is a draft because I haven't run this outside of integration tests yet, but in theory it works! (Edit: It works! Please go ahead and review.)

cloud-init is the usual instance initialization system for cloud images of common Linux distributions. It is a huge ball of Python code that runs at each boot and helps set things up in an expected way. There are two main pieces of data cloud-init looks for on boot: "meta-data", provided by the cloud provider, and "user-data", provided by whoever launched the actual instance.
This PR:

- adds `user_data` to the `InstanceCreate` struct, a base64-encoded string
- generates a `CIDATA`-labeled vfat volume, as specified in the cloud-init docs, with the meta-data and user-data

I considered whether to add a `user_data` field to the `Instance` struct in omicron_common, but chose against it. I think we want to come up with an alternate means of making user-data accessible from the API rather than add a potentially-long string to the response object for all instance-related actions.

Existing tests that launch instances with the simulated sled-agent seem to exercise these changes, but recommendations for additional tests are welcome; I suspect that the full end-to-end "the user-data matches on the actual instance" test will require having a CI system that runs actual VMs.