RandomWithDistinctSleds region allocation strategy
PR #3650 introduced the Random region allocation strategy to allocate
regions randomly across the rack. This expands on that with the addition
of the RandomWithDistinctSleds region allocation strategy. This strategy
is the same, but requires the 3 crucible regions be allocated on 3
different sleds to improve resiliency against a whole-sled failure.

The Random strategy still exists, and does not require 3 distinct sleds.
This is useful in one-sled environments such as the integration tests,
and lab setups. This PR adds the ability to configure the allocation
strategy in the Nexus PackageConfig toml. Anyone running in a one-sled
setup will need to configure it to use the Random strategy (as is done
for the integration test environment).
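
As a rough illustration of the config values involved: the enum in this commit is tagged with `#[serde(tag = "type", rename_all = "snake_case")]`, so the `type` key should accept `"random"` (as used in the tests in this commit) and, by the snake_case rule, `"random_with_distinct_sleds"`. The sketch below is a std-only stand-in for what serde does; `strategy_from_config` is a hypothetical helper, not part of this change.

```rust
/// Mirrors the enum added in this commit, minus the serde derives.
#[derive(Debug, Clone, PartialEq, Eq)]
enum RegionAllocationStrategy {
    Random { seed: Option<u64> },
    RandomWithDistinctSleds { seed: Option<u64> },
}

/// Hypothetical helper: maps the `type` value from the
/// `[default_region_allocation_strategy]` table to a variant.
/// In the real code serde performs this mapping; this is only a sketch.
fn strategy_from_config(
    type_name: &str,
    seed: Option<u64>,
) -> Option<RegionAllocationStrategy> {
    match type_name {
        "random" => Some(RegionAllocationStrategy::Random { seed }),
        "random_with_distinct_sleds" => {
            Some(RegionAllocationStrategy::RandomWithDistinctSleds { seed })
        }
        // Anything else is not a known strategy name.
        _ => None,
    }
}
```

Under this assumption, a one-sled setup would use `type = "random"`, with `seed` left out unless a deterministic ordering is wanted.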

This also fixes a shortcoming of #3650 whereby multiple datasets on a
single zpool could be selected. That fix applies to both the old Random
strategy and the new RandomWithDistinctSleds strategy.
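
The selection rules described above can be sketched in memory as follows. The actual change implements them inside a database query (see the `one_dataset_per_zpool` and `one_zpool_per_sled` tables in the diff); the `Candidate` type, the integer IDs, and `select_regions` here are assumptions made for illustration only.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a candidate dataset row. IDs are plain
/// integers here rather than UUIDs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Candidate {
    dataset_id: u32,
    pool_id: u32,
    sled_id: u32,
}

/// Walk an already-shuffled candidate list and keep at most one dataset
/// per zpool. When `distinct_sleds` is true, also keep at most one zpool
/// per sled. Returns `None` if fewer than `needed` candidates survive.
fn select_regions(
    shuffled: &[Candidate],
    needed: usize,
    distinct_sleds: bool,
) -> Option<Vec<Candidate>> {
    let mut seen_pools = HashSet::new();
    let mut seen_sleds = HashSet::new();
    let mut chosen = Vec::with_capacity(needed);

    for c in shuffled {
        if chosen.len() == needed {
            break;
        }
        // Applies to both strategies: never two datasets on one zpool.
        if seen_pools.contains(&c.pool_id) {
            continue;
        }
        // Applies only to the distinct-sleds strategy.
        if distinct_sleds && seen_sleds.contains(&c.sled_id) {
            continue;
        }
        seen_pools.insert(c.pool_id);
        seen_sleds.insert(c.sled_id);
        chosen.push(*c);
    }

    if chosen.len() == needed {
        Some(chosen)
    } else {
        None
    }
}
```

With three regions requested and every candidate on the same sled, the sketch succeeds with `distinct_sleds = false` and returns `None` with `distinct_sleds = true`, which is why one-sled environments need the Random strategy.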

`smf/nexus/config-partial.toml` is configured for
RandomWithDistinctSleds, as that is what we want to use on prod.

As I mentioned, the integration tests are not using the distinct sleds
allocation strategy. I attempted to add 2 extra sleds to the simulated
environment but found that this broke more things than I had the
understanding to fix in this PR. It would be nice in the future for the
sim environment to have 3 sleds in it though, not just for this but for
anything else that might have different behaviors in a multi-sled setup.

For now, I have unit tests that verify the allocation behavior
works correctly with cockroachdb, and we can try it out on dogfood.
faithanalog committed Sep 2, 2023
1 parent de1fa54 commit 35f2a45
Showing 11 changed files with 442 additions and 171 deletions.
38 changes: 38 additions & 0 deletions common/src/nexus_config.rs
@@ -372,6 +372,8 @@ pub struct PackageConfig {
pub dendrite: HashMap<SwitchLocation, DpdConfig>,
/// Background task configuration
pub background_tasks: BackgroundTaskConfig,
/// Default Crucible region allocation strategy
pub default_region_allocation_strategy: RegionAllocationStrategy,
}

#[derive(Clone, Debug, PartialEq, Deserialize, Serialize)]
@@ -594,6 +596,9 @@ mod test {
dns_external.period_secs_propagation = 7
dns_external.max_concurrent_server_updates = 8
external_endpoints.period_secs = 9
[default_region_allocation_strategy]
type = "random"
seed = 0
"##,
)
.unwrap();
@@ -677,6 +682,10 @@ mod test {
period_secs: Duration::from_secs(9),
}
},
default_region_allocation_strategy:
crate::nexus_config::RegionAllocationStrategy::Random {
seed: Some(0)
}
},
}
);
@@ -724,6 +733,8 @@ mod test {
dns_external.period_secs_propagation = 7
dns_external.max_concurrent_server_updates = 8
external_endpoints.period_secs = 9
[default_region_allocation_strategy]
type = "random"
"##,
)
.unwrap();
@@ -894,3 +905,30 @@ mod test {
);
}
}

/// Defines a strategy for choosing what physical disks to use when allocating
/// new crucible regions.
///
/// NOTE: More strategies can - and should! - be added.
///
/// See <https://rfd.shared.oxide.computer/rfd/0205> for a more
/// complete discussion.
///
/// Longer-term, we should consider:
/// - Storage size + remaining free space
/// - Sled placement of datasets
/// - What sort of loads we'd like to create (even split across all disks
/// may not be preferable, especially if maintenance is expected)
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum RegionAllocationStrategy {
/// Choose disks pseudo-randomly. An optional seed may be provided to make
/// the ordering deterministic, otherwise the current time in nanoseconds
/// will be used. Ordering is based on sorting the output of `md5(UUID of
/// candidate dataset + seed)`. The seed does not need to come from a
/// cryptographically secure source.
Random { seed: Option<u64> },

/// Like Random, but ensures that each region is allocated on its own sled.
RandomWithDistinctSleds { seed: Option<u64> },
}
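
The seeded ordering described in the doc comment on `Random` can be sketched as below. This is an illustration, not the commit's implementation: the comment describes sorting by `md5(UUID of candidate dataset + seed)`, while this sketch swaps in `std`'s `DefaultHasher` to stay dependency-free, represents UUIDs as `u128`, and uses the hypothetical names `sort_key` and `shuffle_candidates`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::{SystemTime, UNIX_EPOCH};

/// Sort key for one candidate: a hash of (dataset id, seed).
/// DefaultHasher is a std-only stand-in for the md5 mentioned above.
fn sort_key(dataset_id: u128, seed: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    dataset_id.hash(&mut hasher);
    seed.hash(&mut hasher);
    hasher.finish()
}

/// Returns the dataset ids in a pseudo-random order determined by the
/// seed. With `None`, falls back to the current time in nanoseconds.
fn shuffle_candidates(mut ids: Vec<u128>, seed: Option<u64>) -> Vec<u128> {
    let seed = seed.unwrap_or_else(|| {
        SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_nanos() as u64)
            .unwrap_or(0)
    });
    ids.sort_by_key(|id| sort_key(*id, seed));
    ids
}
```

Passing the same `Some(seed)` twice yields the same ordering, which is what makes a fixed seed useful in tests; the seed only needs to vary between calls, not be unpredictable.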
22 changes: 22 additions & 0 deletions nexus/db-model/src/queries/region_allocation.rs
@@ -47,6 +47,13 @@ table! {
}
}

table! {
shuffled_candidate_datasets {
id -> Uuid,
pool_id -> Uuid,
}
}

table! {
candidate_regions {
id -> Uuid,
@@ -89,6 +96,19 @@ table! {
}
}

table! {
one_zpool_per_sled (pool_id) {
pool_id -> Uuid
}
}

table! {
one_dataset_per_zpool {
id -> Uuid,
pool_id -> Uuid
}
}

table! {
inserted_regions {
id -> Uuid,
@@ -141,6 +161,7 @@ diesel::allow_tables_to_appear_in_same_query!(
);

diesel::allow_tables_to_appear_in_same_query!(old_regions, dataset,);
diesel::allow_tables_to_appear_in_same_query!(old_regions, zpool,);

diesel::allow_tables_to_appear_in_same_query!(
inserted_regions,
@@ -149,6 +170,7 @@ diesel::allow_tables_to_appear_in_same_query!(

diesel::allow_tables_to_appear_in_same_query!(candidate_zpools, dataset,);
diesel::allow_tables_to_appear_in_same_query!(candidate_zpools, zpool,);
diesel::allow_tables_to_appear_in_same_query!(candidate_datasets, dataset);

// == Needed for random region allocation ==
