-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RandomnWithDistinctSleds region allocation strategy #3858
Conversation
I can't find it in the logs but I am assuming that the |
} | ||
|
||
#[tokio::test] | ||
/// Note that this test is currently non-deterministic. It can be made | ||
/// deterministic by generating deterministic *dataset* Uuids. The sled and | ||
/// pool IDs should not matter. | ||
async fn test_region_allocation() { | ||
async fn test_region_allocation_strat_random() { | ||
let logctx = dev::test_setup_log("test_region_allocation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: log doesn't match test name
/// It should always pick datasets where no two datasets are on the same | ||
/// zpool and no two zpools are on the same sled. | ||
async fn test_region_allocation_strat_random_with_distinct_sleds() { | ||
let logctx = dev::test_setup_log("test_region_allocation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: log doesn't match test name
@@ -798,8 +863,8 @@ mod test { | |||
assert!(disk_zpools.insert(dataset.pool_id)); | |||
|
|||
// Must be 3 unique sleds | |||
// TODO: When allocation chooses 3 distinct sleds, uncomment this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yay!
/// Ensure the [`RegionAllocationStrategy::RandomWithDistinctSleds`] | ||
/// strategy fails when there aren't enough distinct sleds. | ||
async fn test_region_allocation_strat_random_with_distinct_sleds_fails() { | ||
let logctx = dev::test_setup_log("test_region_allocation"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit log name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for catching these, i hadnt even noticed the log initialization was taking this string at all
let seed_bytes = seed.to_le_bytes(); | ||
|
||
let query: Box<dyn CteQuery<SqlType = candidate_zpools::SqlType>> = | ||
Box::new( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(unrelated to this specific spot) I just wanna say, thanks again for going so deep into making this CTE magic work. This is a non-trivial query, and I like the way you've done it. Great work.
|
||
[default_region_allocation_strategy] | ||
# we only have one sled in the test environment, so we need to use the | ||
# `Random` strategy, instead of `RandomWithDistinctSleds` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For most of our tests, I think this makes sense.
smf/nexus/config-partial.toml
Outdated
[default_region_allocation_strategy] | ||
# by default, allocate across 3 distinct sleds | ||
# seed is omitted so a new seed will be chosen with every allocation. | ||
type = "random_with_distinct_sleds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I chatted with @iliana about this earlier, and here's the TL;DR of our conversation:
- Most of the "non-deploy" tests work using the config from
nexus/tests/config.test.toml
- The "real prod config" is using this file (
smf/nexus/config-partial.toml
), and uses the distinct sled strategy, as we want
So the problem cases are the following:
- Developers trying to run a single sled
- The "deploy job", which typically runs a "non-gimlet" standalone sled (similar to the developer workflow)
These cases are painful, because we want some config options to be like prod (e.g., we use internal DNS! We have an underlay, kinda!) but others we don't want to look like our prod environment (most obviously: we're only running with a single sled).
iliana and I came to the conclusion that a couple ways to solve this would be the following:
Add a new target option to omicron-package
(my slight preference)
- Add a new feature to the omicron-package target type named
cluster
, which can be eithersingle-sled
ormulti-sled
. This would involve updatingpackage/src/lib.rs
andpackage/src/target.rs
. There is some overlap with thegimlet
vsgimlet-standalone
vsnon-gimlet
options, but I think it's probably worth just adding a new feature explicitly. - Split the Nexus config files into two: what previously was
smf/nexus/config-partial.toml
becomessmf/nexus/multi-sled/config-partial.toml
andsmf/nexus/single-sled/config-partial.toml
. We can provide a different region allocation strategy in each, through thePackageConfig
. - Update
package-manifest.toml
to reference the rightsmf
subdirectory for Nexus, based on thecluster
value (you can look at how we do this for thesled-agent
target using themachine
target).
Pros: Plumbs Nexus-specific options directly to Nexus
Cons: Possible to select confusing combinations of targets, like "gimlet-standalone" + "multi-sled"?
Try to overload an existing target?
- Use the
non-gimlet
andgimlet-standalone
options, and "infer" that these are single-sled - Pass an argument from sled agent to Nexus (via
DeploymentConfig
) when this is the case
Pros: One fewer target?
Cons: Prevents us from effectively having a multi-sled setup without real gimlets (I think this would break the Canada cluster), kinda confusing that a Nexus-level option is plumbed through the sled agent config
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you've sold me on adding a new target option to omicron-package i think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What cluster
option will I want to pass if I want to set up a multi-sled dev cluster with exactly two sleds? It seems like the single-sled
option should still work for now (though it makes me wonder if there's a clearer label we could put on it); is that correct?
abc7293
to
960489f
Compare
@smklein I named it |
oh boy, CI failures! ili and us tried these tests locally just running |
The
|
9a5c954
to
2f41c2e
Compare
215a0d5
to
e266136
Compare
finally CI is happy. but at what cost |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies on the delay for this - LGTM, thanks for working through the config issues.
PR #3650 introduced the Random region allocation strategy to allocate regions randomly across the rack. This expands on that with the addition of the RandomWithDistinctSleds region allocation strategy. This strategy is the same, but requires the 3 crucible regions be allocated on 3 different sleds to improve resiliency against a whole-sled failure. The Random strategy still exists, and does not require 3 distinct sleds. This is useful in one-sled environments such as the integration tests, and lab setups. This PR adds the ability to configure the allocation strategy in the Nexus PackageConfig toml. Anyone running in a one-sled setup will need to configure that to one-sled mode (as is done for the integration test environment). This also fixes a shortcoming of #3650 whereby multiple datasets on a single zpool could be selected. That fix applies to both the old Random strategy and the new RandomWithDistinctSleds strategy. `smf/nexus/config-partial.toml` is configured for RandomWithDistinctSleds, as that is what we want to use on prod. As I mentioned, the integration tests are not using the distinct sleds allocation strategy. I attempted to add 2 extra sleds to the simulated environment but found that this broke more things than I had the understanding to fix in this PR. It would be nice in the future for the sim environment to have 3 sleds in it though, not just for this but for anything else that might have different behaviors in a multi-sled setup. In the present, I have unit tests that verify the allocation behavior works correctly with cockroachdb, and we can try it out on dogfood.
This adds the rack-topology package feature with possible values of single-sled or multi-sled. single-sled is intended for dev/CI deployments, while multi-sled is intended for dogfood/prod. The value of this determines which nexus config-partial.toml is packaged. Right now the only difference single/multi is the crucible region allocation strategy.
e266136
to
e689108
Compare
alright ill merge this monday |
I am submitting this draft now, but I am not 100% sure that the configuration location I have selected for the allocation strategy (PackageConfig toml) is the correct one. It would work for getting stuff onto dogfood/prod, but I don't know how to use a different one when compiling a build to run locally for dev purposes. That's a problem, because as devs we're all mostly running "1-sled" setups, so region allocation would just Not Work for any of us.
If you know how to configure this differently for a local dev build, then maybe we need docs for that.
If that's just not in the cards, then maybe it should go into DeploymentConfig instead.
Either way, the core logic of this is fundamentally done. The tests are in there. The integration tests pass. And configuration works right now using PackageConfig, so it is in a state where someone other than me can adjust the way configuration works or add docs if necessary. I will be gone for about a week, so I invite anyone to make changes around that config if they want to do that to get this merged while I'm out.
PR #3650 introduced the Random region allocation strategy to allocate regions randomly across the rack. This expands on that with the addition of the RandomWithDistinctSleds region allocation strategy. This strategy is the same, but requires the 3 crucible regions be allocated on 3 different sleds to improve resiliency against a whole-sled failure.
The Random strategy still exists, and does not require 3 distinct sleds. This is useful in one-sled environments such as the integration tests, and lab setups. This PR adds the ability to configure the allocation strategy in the Nexus PackageConfig toml. Anyone running in a one-sled setup will need to configure that to one-sled mode (as is done for the integration test environment).
This also fixes a shortcoming of #3650 whereby multiple datasets on a single zpool could be selected. That fix applies to both the old Random strategy and the new RandomWithDistinctSleds strategy.
smf/nexus/config-partial.toml
is configured forRandomWithDistinctSleds, as that is what we want to use on prod.
As I mentioned, the integration tests are not using the distinct sleds allocation strategy. I attempted to add 2 extra sleds to the simulated environment but found that this broke more things than I had the understanding to fix in this PR. It would be nice in the future for the sim environment to have 3 sleds in it though, not just for this but for anything else that might have different behaviors in a multi-sled setup.
In the present, I have unit tests that verify the allocation behavior works correctly with cockroachdb, and we can try it out on dogfood.
Fixes #3702