Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sled-agent] Avoid causing UUID conflicts #7266

Merged
merged 5 commits into from
Dec 18, 2024
Merged

[sled-agent] Avoid causing UUID conflicts #7266

merged 5 commits into from
Dec 18, 2024

Conversation

smklein
Copy link
Collaborator

@smklein smklein commented Dec 18, 2024

  • Avoids overwriting the value of "dataset UUID" when creating datasets from omicron_zones_ensure. Instead, don't set any dataset UUID, which lets subsequent calls to datasets_ensure set the right value here.

Fixes #7265

@@ -526,7 +524,7 @@ impl StorageHandle {
// and ask for the set of all datasets from Nexus.
pub async fn upsert_filesystem(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, this is the only call to upsert_filesystem. Once we remove the invocation in omicron_zones_ensure, we can remove this function (and this "optional dataset UUID" argument) altogether.

@smklein smklein marked this pull request as ready for review December 18, 2024 18:11
@smklein smklein requested a review from jgallagher December 18, 2024 18:16
//
// This means that:
// - If the dataset exists and has a UUID, this will be a no-op
// - If the dataset doesn't exist, it'll be created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add something like

Suggested change
// - If the dataset doesn't exist, it'll be created
// - If the dataset doesn't exist, it'll be created without its oxide:uuid zfs property set

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

additional_options: None,
})?;
// Ensure the dataset has a usable UUID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a little weirded out by this block of code. Was it strictly unnecessary before? If there's any chance it wasn't, should we keep it but inside an if let Some(dataset_id) { ... } block?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a historical artifact -- I'd prefer to remove it if we can, because I think sled agent is plagued by too many of these "two ways to set a value" paths already, and I'd prefer to reduce it to one.

From 256dca4 (2021) -

// Idempotently ensure the named dataset exists as a filesystem with a UUID.
//
// Returns the UUID attached to the ZFS filesystem.
fn ensure_dataset_with_id(fs_name: &str) -> Result<Uuid, Error> {
Zfs::ensure_filesystem(&fs_name, Mountpoint::Legacy)?;
// Ensure the dataset has a usable UUID.
if let Ok(id_str) = Zfs::get_oxide_value(&fs_name, "uuid") {
if let Ok(id) = id_str.parse::<Uuid>() {
return Ok(id);
}
}
let id = Uuid::new_v4();
Zfs::set_oxide_value(&fs_name, "uuid", &id.to_string())?;
Ok(id)
}
This is one of the earliest forms of "ensure filesystem, then set a value".

From ccc28fe (2023) - this was renamed, but still had the structure of "sled agent calls Zfs::ensure_filesystem, and then sets the UUID:

fn ensure_dataset(
&mut self,
dataset_id: Uuid,
dataset_name: &DatasetName,
) -> Result<(), Error> {
let zoned = true;
let fs_name = &dataset_name.full();
let do_format = true;
Zfs::ensure_filesystem(
&dataset_name.full(),
Mountpoint::Path(PathBuf::from("/data")),
zoned,
do_format,
)?;
// Ensure the dataset has a usable UUID.
if let Ok(id_str) = Zfs::get_oxide_value(&fs_name, "uuid") {
if let Ok(id) = id_str.parse::<Uuid>() {
if id != dataset_id {
return Err(Error::UuidMismatch {
name: dataset_name.clone(),
old: id,
new: dataset_id,
});
}
return Ok(());
}
}
Zfs::set_oxide_value(&fs_name, "uuid", &dataset_id.to_string())?;
Ok(())
}

From de8bc93 (last week) - Zfs::ensure_filesystem (later renamed to ensure_dataset) started being able to explicitly set properties, as part of the optimization to avoid calling zfs set individually for each property.

let props = build_zfs_set_key_value_pairs(size_details, id);
if exists {
Self::set_values(name, props.as_slice()).map_err(|err| {
EnsureFilesystemError {
name: name.to_string(),
err: err.err.into(),
}
})?;

This was added recently, because (using dtrace) I noticed the callstack was basically:

zfs set <size properties , one at a time>
zfs get uuid
zfs set uuid

And by moving all properties into Zfs::ensure_filesystem, this could be a single call to get/set.

All this is to say:

  • After de8bc93, any property-managing code after calling Zfs::ensure_filesystem should not be necessary. If the UUID was supplied, it should have already been checked/set at this point.
  • I'd prefer to remove this block of code, rather than diverge the code which mutates dataset UUIDs further.

Does that sound reasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hah! Yes, I was doing the same digging and came to the same conclusion (i.e., #7236 made this code unnecessary). 👍 on the removal.

let dataset_id = Uuid::new_v4();
// Create a filesystem on the newly formatted U.2.
//
// We can call "upsert_filesystem" both with and without a UUID.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a test for calling with None and seeing the dataset come back without an ID?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added a new test: upsert_filesystem_no_uuid

Copy link
Contributor

@jgallagher jgallagher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM; happy to go ahead and approve so this is ready to go once the testing from @iliana / @leftwo comes back.

Copy link
Contributor

@iliana iliana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Blueprint execution now works on a newly-initialized rack (rel/v12 + this PR backport).

@smklein smklein merged commit 510ef5a into main Dec 18, 2024
16 checks passed
@smklein smklein deleted the less-uuid-conflict branch December 18, 2024 21:35
jclulow pushed a commit that referenced this pull request Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dataset UUID conflicts (blueprint vs sled agent)
3 participants