Skip to content

Commit

Permalink
teach propolis-server to understand configurable boot order (#756)
Browse files Browse the repository at this point in the history
while we accept a list of boot options here, note that Nexus intentionally has a
more reductive view, maintaining only a single boot_disk. this is heavily based
on what i've learned writing tests for how OVMF+guests behave with different
mixes of boot device orders and sources of boot configuration.

this conversation on RFD 470:
  oxidecomputer/rfd#751 (comment)

gets at the crux of it - we don't have like a platform NVRAM device
for guests to persist EFI variables in, so boot configuration is persisted in
the EFI system partition (so, associated with a disk, not an instance). because
we're not going to go modify the EFI system partition in user disks to effect
boot order preferences, we're using the QEMU-style fw_cfg mechanism to indicate
boot order. this, in turn, means that guest settings are blown away and replaced
with what OVMF determines at boot time.

taken together, this isn't an ideal spot to be in if we were to support more
complex boot behavior logic, and it's probably not how we want to work
multi-device orderings into propolis-server. it also means that guest-managed
boot ordering just don't reliably persist if the instance is configured with a
specific boot device!

in the future with managed nonvolatile storage for guest firmware to persist
settings in, we'll be in a somewhat better position to extend boot logic, and so
this PR should leave us able to do so without much API hassle. that will
probably be more interesting on the control plane side, which is why i'm
permissive on the Propolis side, but restrictive on the Nexus side.

along the way, this change brings PHD device/backend names in line with
propolis-server.
  • Loading branch information
iximeow authored Sep 28, 2024
1 parent fae5334 commit 11371b0
Show file tree
Hide file tree
Showing 27 changed files with 1,555 additions and 99 deletions.
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

4 changes: 4 additions & 0 deletions bin/propolis-cli/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -266,6 +266,7 @@ async fn new_instance(
// TODO: Allow specifying NICs
nics: vec![],
disks,
boot_settings: None,
migrate: None,
cloud_init_bytes,
};
Expand Down Expand Up @@ -517,6 +518,9 @@ async fn migrate_instance(
// TODO: Handle migrating NICs
nics: vec![],
disks,
// TODO: Handle retaining boot settings? Or extant boot settings
// forwarded along outside InstanceEnsure anyway.
boot_settings: None,
migrate: Some(InstanceMigrateInitiateRequest {
migration_id: Uuid::new_v4(),
src_addr: src_addr.to_string(),
Expand Down
86 changes: 84 additions & 2 deletions bin/propolis-server/src/lib/initializer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};

use crate::serial::Serial;
use crate::spec::{self, Spec, StorageBackend};
use crate::spec::{self, Spec, StorageBackend, StorageDevice};
use crate::stats::{
track_network_interface_kstats, track_vcpu_kstats, VirtualDiskProducer,
VirtualMachine,
Expand All @@ -34,7 +34,11 @@ use propolis::hw::ibmpc;
use propolis::hw::pci;
use propolis::hw::ps2::ctrl::PS2Ctrl;
use propolis::hw::qemu::pvpanic::QemuPvpanic;
use propolis::hw::qemu::{debug::QemuDebugPort, fwcfg, ramfb};
use propolis::hw::qemu::{
debug::QemuDebugPort,
fwcfg::{self, Entry},
ramfb,
};
use propolis::hw::uart::LpcUart;
use propolis::hw::{nvme, virtio};
use propolis::intr_pins;
Expand Down Expand Up @@ -1001,6 +1005,80 @@ impl<'a> MachineInitializer<'a> {
smb_tables.commit()
}

fn generate_bootorder(&self) -> Result<Option<Entry>, Error> {
info!(
self.log,
"Generating bootorder with order: {:?}",
self.spec.boot_order.as_ref()
);
let Some(boot_names) = self.spec.boot_order.as_ref() else {
return Ok(None);
};

let mut order = fwcfg::formats::BootOrder::new();

let parse_bdf =
|pci_path: &propolis_api_types::instance_spec::PciPath| {
let bdf: Result<pci::Bdf, Error> =
pci_path.to_owned().try_into().map_err(|e| {
Error::new(
ErrorKind::InvalidInput,
format!(
"Couldn't get PCI BDF for {}: {}",
pci_path, e
),
)
});

bdf
};

for boot_entry in boot_names.iter() {
// Theoretically we could support booting from network devices by
// matching them here and adding their PCI paths, but exactly what
// would happen is ill-understood. So, only check disks here.
if let Some(spec) = self.spec.disks.get(boot_entry.name.as_str()) {
match &spec.device_spec {
StorageDevice::Virtio(disk) => {
let bdf = parse_bdf(&disk.pci_path)?;
if bdf.bus.get() != 0 {
return Err(Error::new(
ErrorKind::InvalidInput,
"Boot device currently must be on PCI bus 0",
));
}

order.add_disk(bdf.location);
}
StorageDevice::Nvme(disk) => {
let bdf = parse_bdf(&disk.pci_path)?;
if bdf.bus.get() != 0 {
return Err(Error::new(
ErrorKind::InvalidInput,
"Boot device currently must be on PCI bus 0",
));
}

// TODO: separately, propolis-standalone passes an eui64
// of 0, so do that here too. is that.. ok?
order.add_nvme(bdf.location, 0);
}
};
} else {
// This should be unreachable - we check that the boot disk is
// valid when constructing the spec we're initializing from.
let message = format!(
"Instance spec included boot entry which does not refer \
to an existing disk: `{}`",
boot_entry.name.as_str(),
);
return Err(Error::new(ErrorKind::InvalidInput, message));
}
}

Ok(Some(order.finish()))
}

/// Initialize qemu `fw_cfg` device, and populate it with data including CPU
/// count, SMBIOS tables, and attached RAM-FB device.
///
Expand Down Expand Up @@ -1032,6 +1110,10 @@ impl<'a> MachineInitializer<'a> {
)
.unwrap();

if let Some(boot_order) = self.generate_bootorder()? {
fwcfg.insert_named("bootorder", boot_order).unwrap();
}

let ramfb = ramfb::RamFb::create(
self.log.new(slog::o!("component" => "ramfb")),
);
Expand Down
6 changes: 6 additions & 0 deletions bin/propolis-server/src/lib/server.rs
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,12 @@ fn instance_spec_from_request(
spec_builder.add_disk_from_request(disk)?;
}

if let Some(boot_settings) = request.boot_settings.as_ref() {
for item in boot_settings.order.iter() {
spec_builder.add_boot_option(item)?;
}
}

if let Some(base64) = &request.cloud_init_bytes {
spec_builder.add_cloud_init_from_request(base64.clone())?;
}
Expand Down
6 changes: 6 additions & 0 deletions bin/propolis-server/src/lib/spec/api_spec_v0.rs
Original file line number Diff line number Diff line change
Expand Up @@ -247,6 +247,12 @@ impl TryFrom<InstanceSpecV0> for Spec {
return Err(ApiSpecError::BackendNotUsed(backend.to_owned()));
}

if let Some(boot_settings) = value.devices.boot_settings.as_ref() {
for item in boot_settings.order.iter() {
builder.add_boot_option(item)?;
}
}

for (name, serial_port) in value.devices.serial_ports {
builder.add_serial_port(name, serial_port.num)?;
}
Expand Down
22 changes: 21 additions & 1 deletion bin/propolis-server/src/lib/spec/builder.rs
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,7 @@ use propolis_api_types::{
},
PciPath,
},
DiskRequest, InstanceProperties, NetworkInterfaceRequest,
BootOrderEntry, DiskRequest, InstanceProperties, NetworkInterfaceRequest,
};
use thiserror::Error;

Expand Down Expand Up @@ -57,6 +57,9 @@ pub(crate) enum SpecBuilderError {

#[error("pvpanic device already specified")]
PvpanicInUse,

#[error("Boot option {0} is not an attached device")]
BootOptionMissing(String),
}

#[derive(Debug, Default)]
Expand Down Expand Up @@ -110,6 +113,23 @@ impl SpecBuilder {
Ok(())
}

/// Add a boot option to the boot option list of the spec under construction.
pub fn add_boot_option(
&mut self,
item: &BootOrderEntry,
) -> Result<(), SpecBuilderError> {
if !self.spec.disks.contains_key(item.name.as_str()) {
return Err(SpecBuilderError::BootOptionMissing(item.name.clone()));
}

let boot_order = self.spec.boot_order.get_or_insert(Vec::new());

boot_order
.push(crate::spec::BootOrderEntry { name: item.name.clone() });

Ok(())
}

/// Converts an HTTP API request to add a cloud-init disk to an instance
/// into device/backend entries in the spec under construction.
pub fn add_cloud_init_from_request(
Expand Down
6 changes: 6 additions & 0 deletions bin/propolis-server/src/lib/spec/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,7 @@ pub(crate) struct Spec {
pub board: Board,
pub disks: HashMap<String, Disk>,
pub nics: HashMap<String, Nic>,
pub boot_order: Option<Vec<BootOrderEntry>>,

pub serial: HashMap<String, SerialPort>,

Expand All @@ -67,6 +68,11 @@ pub(crate) struct Spec {
pub softnpu: SoftNpu,
}

#[derive(Clone, Debug, Default)]
pub(crate) struct BootOrderEntry {
pub name: String,
}

/// Describes the device half of a [`Disk`].
#[derive(Clone, Debug)]
pub enum StorageDevice {
Expand Down
29 changes: 18 additions & 11 deletions bin/propolis-standalone/src/main.rs
Original file line number Diff line number Diff line change
Expand Up @@ -959,8 +959,19 @@ fn generate_smbios(params: SmbiosParams) -> anyhow::Result<smbios::TableBytes> {
Ok(smb_tables.commit())
}

fn generate_bootorder(config: &config::Config) -> anyhow::Result<fwcfg::Entry> {
let names = config.main.boot_order.as_ref().unwrap();
fn generate_bootorder(
config: &config::Config,
log: &slog::Logger,
) -> anyhow::Result<Option<fwcfg::Entry>> {
let Some(names) = config.main.boot_order.as_ref() else {
return Ok(None);
};

slog::info!(
log,
"Bootorder declared as {:?}",
config.main.boot_order.as_ref()
);

let mut order = fwcfg::formats::BootOrder::new();
for name in names.iter() {
Expand Down Expand Up @@ -994,7 +1005,7 @@ fn generate_bootorder(config: &config::Config) -> anyhow::Result<fwcfg::Entry> {
}
}
}
Ok(order.finish())
Ok(Some(order.finish()))
}

fn setup_instance(
Expand Down Expand Up @@ -1306,14 +1317,10 @@ fn setup_instance(

// It is "safe" to generate bootorder (if requested) now, given that PCI
// device configuration has been validated by preceding logic
if config.main.boot_order.is_some() {
fwcfg
.insert_named(
"bootorder",
generate_bootorder(&config)
.context("Failed to generate boot order")?,
)
.unwrap();
if let Some(boot_config) = generate_bootorder(&config, log)
.context("Failed to generate boot order")?
{
fwcfg.insert_named("bootorder", boot_config).unwrap();
}

fwcfg.attach(pio, &machine.acc_mem);
Expand Down
5 changes: 5 additions & 0 deletions crates/propolis-api-types/src/instance_spec/v0.rs
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,11 @@ pub struct DeviceSpecV0 {
#[serde(skip_serializing_if = "Option::is_none")]
pub qemu_pvpanic: Option<components::devices::QemuPvpanic>,

// Same backwards compatibility reasoning as above.
#[serde(default)]
#[serde(skip_serializing_if = "Option::is_none")]
pub boot_settings: Option<crate::BootSettings>,

#[cfg(feature = "falcon")]
pub softnpu_pci_port: Option<components::devices::SoftNpuPciPort>,
#[cfg(feature = "falcon")]
Expand Down
15 changes: 15 additions & 0 deletions crates/propolis-api-types/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,9 @@ pub struct InstanceEnsureRequest {
#[serde(default)]
pub disks: Vec<DiskRequest>,

#[serde(default)]
pub boot_settings: Option<BootSettings>,

pub migrate: Option<InstanceMigrateInitiateRequest>,

// base64 encoded cloud-init ISO
Expand Down Expand Up @@ -385,6 +388,18 @@ pub struct DiskAttachment {
pub state: DiskAttachmentState,
}

#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct BootSettings {
pub order: Vec<BootOrderEntry>,
}

/// An entry in a list of boot options.
#[derive(Clone, Debug, Deserialize, Serialize, JsonSchema)]
pub struct BootOrderEntry {
/// The name of the device to attempt booting from.
pub name: String,
}

/// A stable index which is translated by Propolis
/// into a PCI BDF, visible to the guest.
#[derive(Copy, Clone, Debug, Deserialize, Serialize, JsonSchema)]
Expand Down
28 changes: 28 additions & 0 deletions lib/propolis-client/src/instance_spec.rs
Original file line number Diff line number Diff line change
Expand Up @@ -184,6 +184,34 @@ impl SpecBuilderV0 {
}
}

/// Sets a boot order. Names here refer to devices included in this spec.
///
/// Permissible to not set this if the implicit boot order is desired, but
/// the implicit boot order may be unstable across device addition and
/// removal.
///
/// If any devices named in this order are not actually present in the
/// constructed spec, Propolis will return an error when the spec is
/// provided.
///
/// XXX: this should certainly return `&mut Self` - all the builders here
/// should. check if any of these are chained..?
pub fn set_boot_order(
&mut self,
boot_order: Vec<String>,
) -> Result<&Self, SpecBuilderError> {
let boot_order = boot_order
.into_iter()
.map(|name| crate::types::BootOrderEntry { name })
.collect();

let settings = crate::types::BootSettings { order: boot_order };

self.spec.devices.boot_settings = Some(settings);

Ok(self)
}

/// Yields the completed spec, consuming the builder.
pub fn finish(self) -> InstanceSpecV0 {
self.spec
Expand Down
2 changes: 2 additions & 0 deletions lib/propolis-client/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,8 @@ progenitor::generate_api!(

// Some Crucible-related bits are re-exported through simulated
// sled-agent and thus require JsonSchema
BootOrderEntry = { derives = [schemars::JsonSchema] },
BootSettings = { derives = [schemars::JsonSchema] },
DiskRequest = { derives = [schemars::JsonSchema] },
VolumeConstructionRequest = { derives = [schemars::JsonSchema] },
CrucibleOpts = { derives = [schemars::JsonSchema] },
Expand Down
Loading

0 comments on commit 11371b0

Please sign in to comment.