Skip to content

Commit

Permalink
Prevent unintentional Upstairs takeovers (#5221)
Browse files Browse the repository at this point in the history
Volumes are "checked out" from Nexus for many reasons, some of which
include sending to another service for use in `Volume::construct`. When
that service activates the resulting Volume, this will forcibly take
over any existing downstairs connections based on the Upstairs'
generation number. This is intentional, and was designed so Nexus, in
handing out Volumes with increasing generation numbers, can be sure that
the resulting Volume works no matter what (for example, even if a
previous Upstairs is wedged somehow, even if the service that is running
the previous Upstairs is no longer accepting network connections).

Up until now, Nexus wouldn't allow checking out a Volume if there is any
chance a Propolis could be running that may use that Volume. This meant
restricting certain operations, like creating a snapshot when a disk is
attached to an instance that is stopped: any action Nexus would take to
attempt a snapshot using a Pantry would race with a user's request to
start that instance, and if the Volume checkouts occur in the wrong
order the Pantry would take over connections from Propolis, resulting in
guest OS errors.

Nexus _can_ do this safely though: it has all the information required
to know when a checkout is safe to do, and when it may not be safe. This
commit adds checks to the Volume checkout transaction that are based on
the reason that checkout is occurring, and requires call sites that are
performing a checkout to say why they are. Because these checks are
performed inside a transaction, Nexus can say for sure when it is safe
to allow a Volume to be checked out for a certain reason.

For example, in the scenario of taking a snapshot of a disk attached to
an instance that is stopped, there are two checkout operations that have
the possiblity of racing:

1) the one that Nexus will send to a Pantry during a snapshot create
   saga.

2) the one that Nexus will send to a Propolis during an instance start
   saga.

If 1 occurs before 2, then Propolis will take over the downstairs
connections that the Pantry has established, and the snapshot create
saga will fail, but the guest OS for that Propolis will not see any
errors. If 2 occurs before 1, then the 1 checkout will fail due to one
of the conditions added in this commit: the checkout is being performed
for use with a Pantry, and a Propolis _may_ exist, so reject the
checkout attempt.

Fixes #3289.
  • Loading branch information
jmpesp authored Apr 3, 2024
1 parent 04b0e34 commit 35dfd9a
Show file tree
Hide file tree
Showing 16 changed files with 764 additions and 118 deletions.
3 changes: 2 additions & 1 deletion nexus/db-model/src/schema_versions.rs
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ use std::collections::BTreeMap;
///
/// This must be updated when you change the database schema. Refer to
/// schema/crdb/README.adoc in the root of this repository for details.
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(49, 0, 0);
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(50, 0, 0);

/// List of all past database schema versions, in *reverse* order
///
Expand All @@ -29,6 +29,7 @@ static KNOWN_VERSIONS: Lazy<Vec<KnownVersion>> = Lazy::new(|| {
// | leaving the first copy as an example for the next person.
// v
// KnownVersion::new(next_int, "unique-dirname-with-the-sql-files"),
KnownVersion::new(50, "add-lookup-disk-by-volume-id-index"),
KnownVersion::new(49, "physical-disk-state-and-policy"),
KnownVersion::new(48, "add-metrics-producers-time-modified-index"),
KnownVersion::new(47, "add-view-for-bgp-peer-configs"),
Expand Down
1 change: 1 addition & 0 deletions nexus/db-queries/src/db/datastore/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,7 @@ pub use virtual_provisioning_collection::StorageType;
pub use volume::read_only_resources_associated_with_volume;
pub use volume::CrucibleResources;
pub use volume::CrucibleTargets;
pub use volume::VolumeCheckoutReason;

// Number of unique datasets required to back a region.
// TODO: This should likely turn into a configuration option.
Expand Down
394 changes: 383 additions & 11 deletions nexus/db-queries/src/db/datastore/volume.rs

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion nexus/src/app/image.rs
Original file line number Diff line number Diff line change
Expand Up @@ -121,7 +121,10 @@ impl super::Nexus {

let image_volume = self
.db_datastore
.volume_checkout_randomize_ids(db_snapshot.volume_id)
.volume_checkout_randomize_ids(
db_snapshot.volume_id,
db::datastore::VolumeCheckoutReason::ReadOnlyCopy,
)
.await?;

db::model::Image {
Expand Down
23 changes: 21 additions & 2 deletions nexus/src/app/instance.rs
Original file line number Diff line number Diff line change
Expand Up @@ -173,6 +173,13 @@ enum InstanceStateChangeRequestAction {
SendToSled(Uuid),
}

/// What is the higher level operation that is calling
/// `instance_ensure_registered`?
pub(crate) enum InstanceRegisterReason {
Start { vmm_id: Uuid },
Migrate { vmm_id: Uuid, target_vmm_id: Uuid },
}

impl super::Nexus {
pub fn instance_lookup<'a>(
&'a self,
Expand Down Expand Up @@ -1010,6 +1017,7 @@ impl super::Nexus {
db_instance: &db::model::Instance,
propolis_id: &Uuid,
initial_vmm: &db::model::Vmm,
operation: InstanceRegisterReason,
) -> Result<(), Error> {
opctx.authorize(authz::Action::Modify, authz_instance).await?;

Expand Down Expand Up @@ -1065,8 +1073,19 @@ impl super::Nexus {
}
};

let volume =
self.db_datastore.volume_checkout(disk.volume_id).await?;
let volume = self
.db_datastore
.volume_checkout(
disk.volume_id,
match operation {
InstanceRegisterReason::Start { vmm_id } =>
db::datastore::VolumeCheckoutReason::InstanceStart { vmm_id },
InstanceRegisterReason::Migrate { vmm_id, target_vmm_id } =>
db::datastore::VolumeCheckoutReason::InstanceMigrate { vmm_id, target_vmm_id },
}
)
.await?;

disk_reqs.push(sled_agent_client::types::DiskRequest {
name: disk.name().to_string(),
slot: sled_agent_client::types::Slot(slot.0),
Expand Down
5 changes: 4 additions & 1 deletion nexus/src/app/sagas/common_storage.rs
Original file line number Diff line number Diff line change
Expand Up @@ -769,7 +769,10 @@ pub(crate) async fn call_pantry_attach_for_disk(

let disk_volume = nexus
.datastore()
.volume_checkout(disk.volume_id)
.volume_checkout(
disk.volume_id,
db::datastore::VolumeCheckoutReason::Pantry,
)
.await
.map_err(ActionError::action_failed)?;

Expand Down
10 changes: 8 additions & 2 deletions nexus/src/app/sagas/disk_create.rs
Original file line number Diff line number Diff line change
Expand Up @@ -390,7 +390,10 @@ async fn sdc_regions_ensure(

let volume = osagactx
.datastore()
.volume_checkout(db_snapshot.volume_id)
.volume_checkout(
db_snapshot.volume_id,
db::datastore::VolumeCheckoutReason::ReadOnlyCopy,
)
.await
.map_err(ActionError::action_failed)?;

Expand Down Expand Up @@ -433,7 +436,10 @@ async fn sdc_regions_ensure(

let volume = osagactx
.datastore()
.volume_checkout(image.volume_id)
.volume_checkout(
image.volume_id,
db::datastore::VolumeCheckoutReason::ReadOnlyCopy,
)
.await
.map_err(ActionError::action_failed)?;

Expand Down
3 changes: 2 additions & 1 deletion nexus/src/app/sagas/finalize_disk.rs
Original file line number Diff line number Diff line change
Expand Up @@ -79,7 +79,8 @@ impl NexusSaga for SagaFinalizeDisk {
silo_id: params.silo_id,
project_id: params.project_id,
disk_id: params.disk_id,
attached_instance_and_sled: None,
attach_instance_id: None,
use_the_pantry: true,
create_params: params::SnapshotCreate {
identity: external::IdentityMetadataCreateParams {
name: snapshot_name.clone(),
Expand Down
7 changes: 6 additions & 1 deletion nexus/src/app/sagas/instance_migrate.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,8 @@

use super::{NexusActionContext, NexusSaga, ACTION_GENERATE_ID};
use crate::app::instance::{
InstanceStateChangeError, InstanceStateChangeRequest,
InstanceRegisterReason, InstanceStateChangeError,
InstanceStateChangeRequest,
};
use crate::app::sagas::{
declare_saga_actions, instance_common::allocate_vmm_ipv6,
Expand Down Expand Up @@ -356,6 +357,10 @@ async fn sim_ensure_destination_propolis(
&db_instance,
&vmm.id,
&vmm,
InstanceRegisterReason::Migrate {
vmm_id: params.src_vmm.id,
target_vmm_id: vmm.id,
},
)
.await
.map_err(ActionError::action_failed)?;
Expand Down
2 changes: 2 additions & 0 deletions nexus/src/app/sagas/instance_start.rs
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ use super::{
instance_common::allocate_vmm_ipv6, NexusActionContext, NexusSaga,
SagaInitError, ACTION_GENERATE_ID,
};
use crate::app::instance::InstanceRegisterReason;
use crate::app::instance::InstanceStateChangeError;
use crate::app::sagas::declare_saga_actions;
use chrono::Utc;
Expand Down Expand Up @@ -527,6 +528,7 @@ async fn sis_ensure_registered(
&db_instance,
&propolis_id,
&vmm_record,
InstanceRegisterReason::Start { vmm_id: propolis_id },
)
.await
.map_err(ActionError::action_failed)?;
Expand Down
Loading

0 comments on commit 35dfd9a

Please sign in to comment.