Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Target replacement support for VCRs with multiple sub_volumes #1551

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

leftwo
Copy link
Contributor

@leftwo leftwo commented Nov 5, 2024

Updated the volume layer code to support replacement of a target in a VCR that has multiple sub_volumes in it.
Before this we only supported a single sub_volume.

Most of the changes here are for tests. I've added a loop around the VCR replacement tests to create multiple sub volumes and iterate through the test for different VCR configurations.

In addition, it is now required that the generation number change for all sub-volumes when any target is changed.

Some tests that were duplicates have been removed.

Updated the volume layer code to support replacement of a target in
a VCR that has multiple sub_volumes in it.  Before this we only
supported a single sub_volume.

Most of the changes here are for tests.  I've added a loop around vcr
replacement tests to create multiple sub volumes and iterate through
the test for different VCR configurations.

In addition, it is now required that the generation number change for
all sub-volumes when any target is changed.
@leftwo leftwo requested review from jmpesp and mkeeter November 5, 2024 21:42
upstairs/src/volume.rs Outdated Show resolved Hide resolved
let read_only_parent_compare =
Box::new(match (o_read_only_parent, n_read_only_parent) {
(.., None) => {
// The New VCR is none. When comparing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"The New VCR is none" is confusing. I think it means "The new VCR has no read-only parent"? Also, saying why this is okay would be helpful (I think we need to support the case because it occurs after scrubbing is complete?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment in ab9e022

for (o_sv, new_sv) in
o_sub_volumes.iter().zip(n_sub_volumes.iter())
{
let sv_res = Self::compare_vcr_for_replacement(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any worries about unbounded recursion blowing up the stack here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But, I'm not sure what to do about it.
We currently don't have any limits in Nexus, so you could craft VCRs forever until something broke. I don't know what the limit is currently and what will break first.

upstairs/src/volume.rs Outdated Show resolved Hide resolved
read_only_parent: None,
};
sub_volumes.push(sv);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this could be slightly shorter + immutable

        let sub_volumes = (0..sv_count)
            .map(|_| VolumeConstructionRequest::Region {
                block_size,
                blocks_per_extent,
                extent_count,
                opts: opts.clone(),
                gen: 2,
            })
            .collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ab9e022 has this immutable update, plus I made a specific function and called it wherever I could.

id: vol_id,
block_size,
sub_volumes: replacement_sub_volumes,
read_only_parent: rop.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: similarly, this could be

        let replacement_sub_volumes = (0..sv_count)
            .map(|i| {
                let mut replacement_opts = opts.clone();
                let replacement_gen = 3;

                if i == sv_changed {
                    replacement_opts.target[cid] = new_target;
                }

                VolumeConstructionRequest::Region {
                    block_size,
                    blocks_per_extent,
                    extent_count,
                    opts: replacement_opts.clone(),
                    gen: replacement_gen,
                }
            })
            .collect();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 1949ae7 I did this as well as made a few into a common function.

sv_count: usize,
sv_changed: usize,
cid: usize,
add_rop: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: add_rop is confusing, because we don't change the ROP state. Maybe with_rop?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, should we test removing the read-only parent? We could use a three-variant enum in these tests, e.g.

enum ReadOnlyParentMode {
    Absent,
    Present,
    Remove,
}

Copy link
Contributor Author

@leftwo leftwo Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the loops are doing all the work here, we could even do:

ReadOnlyParentMode {
    Both,
    OnlyOriginal,
    OnlyReplacement,
    Neither,
}

The OnlyReplacement should fail 100% of the time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this in 2032eea

id: vol_id,
block_size,
sub_volumes: vec![VolumeConstructionRequest::Region {
let mut sub_volumes = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same suggestion about building this immutably with (0..sv_count).map(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in ab9e022

}
}

fn test_volume_replace_self_inner(sv_count: usize, add_rop: bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about add_rop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 2032eea, no more add_rop.

fn test_volume_replace_no_gen_inner(
sv_count: usize,
sv_changed: usize,
add_rop: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about add_rop

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With 2032eea, no more add_rop.

let opts = generic_crucible_opts(vol_id);

let mut sub_volumes = Vec::new();
for _ in 0..sv_count {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know, we should probably just have a build_subvolume_vcr helper function instead of copy-pasting this into every single function...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in ab9e022

Cleaned up some comments.
Created a function build_replacement_subvol that will create
Vec<VolumeConstructionRequest> of Regions based on input provided.

This should reduce a bunch of common code.

Converted a bunch of other replacement subvolume creation to be done
immutably.
Copy link
Contributor

@mkeeter mkeeter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

The only follow-up suggestion (optional) is to consolidate "test on every ReadOnlyParentMode", e.g.

impl ReadOnlyParentMode
    fn test_all<F: FnMut(Self)>(mut f: F) {
        f(ReadOnlyParentMode::OnlyOriginal);
        f(ReadOnlyParentMode::OnlyReplacement);
        f(ReadOnlyParentMode::Neither);
        f(ReadOnlyParentMode::Both);
    }
}

...which you could then use like

        for sv in 1..4 {
            for sv_changed in 0..sv {
                for cid in 0..3 {
                    ReadOnlyParentMode::test_all(|mode|
                        test_volume_replace_inner(
                            sv,
                            sv_changed,
                            cid,
                            mode,
                        )
                    );
                }
            }
        }

VCRDelta::Generation | VCRDelta::Target { .. } => {}
},
_ => {
panic!("Unsupported multi level CompareResult");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should return ReplaceRequestInvalid instead of panicking

block_size,
blocks_per_extent,
extent_count,
opts: opts.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The targets should be randomized here so that all sub volumes are not the same, using something like https://docs.rs/ipgen/1.0.2/ipgen/fn.ip.html

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm holding it wrong, somehow...

I get back:
Err(InvalidIpNetwork("[BUG] failed to parse the generated IP address a00:67 as IPv4"))

And, the source says:

    // This error should never happen but I'm not a fan of panicking in libraries
    .map_err(|_| Error::InvalidIpNetwork(format!("[BUG] failed to parse the generated IP address `{}` as IPv4", ipv6_addr.trim_start_matches(':'))))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it is a bug in the latest release 1.0.2.
Fixed in main, but they have not rolled a new release out that contains the fix.

block_size,
blocks_per_extent,
extent_count,
opts: replacement_opts.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This too will have multiple subvolume targets that are the same

// A valid replacement VCR is provided with a larger generation
// number and only one target being different.
let vol_id = Uuid::new_v4();
let opts = generic_crucible_opts(vol_id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe generic_crucible_opts should also randomize the addresses?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants