Start integrating clickhouse clusters to blueprints #6627
Conversation
@@ -143,9 +137,27 @@ impl ClickhouseAllocator {
// If we fail to retrieve any inventory for keepers in the current
// collection then we must not modify our keeper config, as we don't
// know whether a configuration is ongoing or not.
//
// There is an exception to this rule: on *new* clusters that have
// keeper zones deployed but do not have any keepers running we must
This actually has me thinking that we should gate removal of the last `ClickhouseKeeper` node by not allowing it to be expunged. However, that seems a lot easier said than done. I think in general this is not really a problem, as we will have 5 of them and we won't expunge them all at once. Each time one is expunged a new one will be provisioned to reach the target, in the same way we deal with CRDB.
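If we ever did want that guard, a minimal sketch of what it could look like in the planner, assuming hypothetical `ZoneKind`/`Disposition` types (the real blueprint types are richer than this):

```rust
use std::collections::BTreeMap;

// Hypothetical stand-ins for the real blueprint zone types.
#[derive(Clone, Copy, PartialEq, Eq)]
enum ZoneKind {
    ClickhouseKeeper,
    Other,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum Disposition {
    InService,
    Expunged,
}

/// Refuse to expunge the last in-service `ClickhouseKeeper` zone, since
/// doing so would destroy the keeper cluster rather than shrink it.
fn check_keeper_expunge(
    zones: &BTreeMap<u32, (ZoneKind, Disposition)>,
    zone_to_expunge: u32,
) -> Result<(), String> {
    let live_keepers = zones
        .values()
        .filter(|(kind, disp)| {
            *kind == ZoneKind::ClickhouseKeeper
                && *disp == Disposition::InService
        })
        .count();
    let target_is_live_keeper = matches!(
        zones.get(&zone_to_expunge),
        Some((ZoneKind::ClickhouseKeeper, Disposition::InService))
    );
    if target_is_live_keeper && live_keepers == 1 {
        return Err(
            "refusing to expunge the last in-service keeper zone".to_string()
        );
    }
    Ok(())
}

fn main() {
    let mut zones = BTreeMap::new();
    zones.insert(1, (ZoneKind::ClickhouseKeeper, Disposition::InService));
    zones.insert(2, (ZoneKind::Other, Disposition::InService));
    assert!(check_keeper_expunge(&zones, 1).is_err());
}
```

As noted above, the expunge-and-replace behavior makes this mostly a belt-and-suspenders check.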
nexus/test-utils/src/lib.rs
@@ -808,6 +808,8 @@ impl<'a, N: NexusServer> ControlPlaneTestContextBuilder<'a, N> {
cockroachdb_fingerprint: String::new(),
cockroachdb_setting_preserve_downgrade:
    CockroachDbPreserveDowngrade::DoNotModify,
// TODO(ajs): Should we create this in RSS? it relies on policy
I believe we have three options here:

1. Start the clickhouse clusters by enabling the policy with OMDB (not implemented yet) once the control plane boots. Reconfigurator will then bring the clusters up via manual OMDB blueprint generation. This requires no change to RSS, but will take a few rounds to bring up all nodes, as shown by the tests in this PR. We'll have to do this for deployed systems anyway, as they won't rerun RSS.
2. Add a policy knob to `rss_config.toml`, manually start all the zones with the right configuration, and put them in the blueprint if the policy is enabled. It's actually unclear to me how this would be better, since Nexus still has to generate and push the configs down to `clickhouse-admin` inside the `ClickhouseServer` and `ClickhouseKeeper` zones.
3. Always enable the policy and do the zone config as in option 2.

I'm currently leaning towards 1 as it's the least amount of code and we have to do the manual work to bring up the nodes anyway via reconfigurator. If we wanted to get fancy, we may be able to figure out a way to generate the configuration for the clickhouse keeper and server nodes and trigger `clickhouse-admin` from RSS. That would at least make the startup automatic, but it would duplicate the work and complicate things a bit. Once we automate reconfigurator it will also be unnecessary.
1 seems reasonable to me, with the caveat of needing to document / notify folks that once we're in a multinode clickhouse world, RSS alone isn't enough to set up the rack and manual omdb steps are required after initialization. (Assuming we don't automate reconfigurator before that, and assuming I'm understanding correctly?)
So, last week I actually changed the reconfigurator bits so that new cluster configurations can be brought up all at once rather than keeper node by keeper node. This will reduce the iterations.
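A minimal sketch of that planning rule, assuming a hypothetical `keepers_to_add` helper with bare integer IDs in place of the real allocator types:

```rust
use std::collections::BTreeSet;

// Hypothetical keeper identifier; the real allocator uses richer types.
type KeeperId = u64;

/// Decide which keepers to add in the next planning round. Raft membership
/// changes must normally happen one node at a time, but a brand-new cluster
/// (no current members) can start with its full target membership at once.
fn keepers_to_add(
    current: &BTreeSet<KeeperId>,
    target: &BTreeSet<KeeperId>,
) -> Vec<KeeperId> {
    let missing: Vec<KeeperId> = target.difference(current).copied().collect();
    if current.is_empty() {
        // New cluster: bring up everything in one round.
        missing
    } else {
        // Existing cluster: at most one membership change per round.
        missing.into_iter().take(1).collect()
    }
}

fn main() {
    let target: BTreeSet<KeeperId> = (1..=5).collect();
    // New cluster: all five keepers in one round.
    assert_eq!(keepers_to_add(&BTreeSet::new(), &target).len(), 5);
    // Existing cluster: only one keeper per round.
    let current: BTreeSet<KeeperId> = (1..=3).collect();
    assert_eq!(keepers_to_add(&current, &target).len(), 1);
}
```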
sled-agent/src/rack_setup/service.rs
@@ -1486,6 +1486,9 @@ pub(crate) fn build_initial_blueprint_from_sled_configs(
cockroachdb_fingerprint: String::new(),
cockroachdb_setting_preserve_downgrade:
    CockroachDbPreserveDowngrade::DoNotModify,
// TODO: Allocate keepers and servers from the plan if there is a policy,
See my other comment regarding RSS.
We decided not to configure clusters in RSS. I updated the comment.
Really excited to see this coming along!
// The planner should expunge a zone based on the sled being expunged. Since this
// is a clickhouse keeper zone, the clickhouse keeper configuration should change
// to reflect this.
Just for my own understanding. Let's say we want to move a keeper node (from a cluster of 3) from one zone to another newly created zone. In this scenario we would:

1. Change the XML configuration files to a 2 keeper node cluster and deploy it.
2. Change the XML configuration files to a 3 keeper node cluster (including the host address of the new zone and a new keeper ID) and deploy it.

This means we would not:

- Change the XML configuration files to a 3 keeper node cluster by removing the old keeper node's host address and ID and adding the new keeper node's host address and ID, and deploy that in one step.

Is this accurate?
Correct. However, I'm still not sure if we'll end up using file configuration or the `reconfig` command for the keepers. I need to do some more testing once I get to the execution phase.
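For concreteness, the move above is two discrete membership states, each confirmed via inventory before the next; a toy sketch with made-up keeper IDs:

```rust
use std::collections::BTreeSet;

fn main() {
    // Moving keeper 1 to a new zone that gets a fresh keeper ID (4).
    let start: BTreeSet<u64> = BTreeSet::from([1, 2, 3]);
    // Step 1: shrink to 2 nodes by removing the old keeper, deploy the new
    // config, and wait for inventory to confirm the raft membership change.
    let step1: BTreeSet<u64> = BTreeSet::from([2, 3]);
    // Step 2: grow back to 3 nodes with the new keeper ID, deploy, confirm.
    let step2: BTreeSet<u64> = BTreeSet::from([2, 3, 4]);
    // Each step changes membership by exactly one node.
    assert_eq!(start.symmetric_difference(&step1).count(), 1);
    assert_eq!(step1.symmetric_difference(&step2).count(), 1);
}
```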
Sadly, it appears that our current version of the keeper CLI does not have the `reconfig` command :(
```console
$ clickhouse keeper-client -h localhost -p 20001 --q help
cd [path] -- Change the working path (default `.`)
create <path> <value> [mode] -- Creates new node with the set value
delete_stale_backups -- Deletes ClickHouse nodes used for backups that are now inactive
find_big_family [path] [n] -- Returns the top n nodes with the biggest family in the subtree (default path = `.` and n = 10)
find_super_nodes <threshold> [path] -- Finds nodes with number of children larger than some threshold for the given path (default `.`)
flwc <command> -- Executes four-letter-word command
get <path> -- Returns the node's value
get_stat [path] -- Returns the node's stat (default `.`)
help -- Prints this message
ls [path] -- Lists the nodes for the given path (default: cwd)
rm <path> -- Remove the node
rmr <path> -- Recursively deletes path. Confirmation required
set <path> <value> [version] -- Updates the node's value. Only update if version matches (default: -1)
touch <path> -- Creates new node with an empty string as value. Doesn't throw an exception if the node already exists

$ clickhouse keeper-client -h localhost -p 20001 --q reconfig
Syntax error: failed at position 1 ('reconfig'):

reconfig

Expected one of: Keeper client query, cd, create, delete_stale_backups, find_big_family, find_super_nodes, flwc, get, get_stat, help, ls, rm, rmr, set, touch, lgif, csnp, dump, wchp, rqld, wchc, isro, crst, dirs, cons, srst, envi, conf, stat, ruok, srvr, wchs, mntr
```
We may have to use the configuration files for now until we're able to get onto a newer version.
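If we do fall back to config files, every membership change means regenerating the keeper's `<raft_configuration>` stanza. A rough sketch of that generation, with a hypothetical `KeeperNode` type (real keeper configs carry many more settings, and in omicron this would presumably live behind `clickhouse-admin`):

```rust
use std::fmt::Write;

// Hypothetical representation of one keeper in the cluster.
struct KeeperNode {
    id: u64,
    hostname: String,
    raft_port: u16,
}

/// Render the `<raft_configuration>` stanza of a keeper config file.
fn render_raft_configuration(nodes: &[KeeperNode]) -> String {
    let mut xml = String::from("<raft_configuration>\n");
    for node in nodes {
        writeln!(
            xml,
            "    <server>\n        <id>{}</id>\n        \
             <hostname>{}</hostname>\n        <port>{}</port>\n    </server>",
            node.id, node.hostname, node.raft_port
        )
        .expect("writing to a String cannot fail");
    }
    xml.push_str("</raft_configuration>\n");
    xml
}

fn main() {
    let nodes = vec![KeeperNode {
        id: 1,
        hostname: "keeper1.example".to_string(),
        raft_port: 9234, // keeper's default raft port
    }];
    println!("{}", render_raft_configuration(&nodes));
}
```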
Good to know. Thanks for pointing this out @karencfv.
Thanks for addressing my comments! Everything looks good from my side, but I'd love it if someone who has a better grasp of reconfigurator than me took a look as well
The reconfigurator bits LGTM. Just a couple questions, mostly out of clickhouse ignorance.
// Updating the inventory to reflect the keepers
// should result in the same state, except for the
// `highest_seen_keeper_leader_committed_log_index`
Just making sure I understand this - does this mean we expect the blueprint contents to change regularly in a way that doesn't really matter while the system is in a steady state? I think we kind of already have something like that in `{internal,external}_dns_version` (which can change in response to silo creation or deletion, for example, even in a steady state). I don't think there's anything wrong here; I want to understand for when we start talking about automating blueprint generation and deciding when to set a new target (e.g., if the only thing that's changed from blueprint A to blueprint B is `highest_seen_keeper_leader_committed_log_index`, then we'd say there's no need to advance the target from A to B?).
> does this mean we expect the blueprint contents to change regularly in a way that doesn't really matter while the system is in a steady state?

Yep. The keeper will continue to commit entries to the raft log and inventory will continue to pick up the update.

> I want to understand for when we start talking about automating blueprint generation and deciding when to set a new target (e.g., if the only thing that's changed from blueprint A to blueprint B is `highest_seen_keeper_leader_committed_log_index`, then we'd say there's no need to advance the target from A to B?).

Yes, this log index is mainly used as an inventory generation number that maps to the configuration. It solely allows us to know when a changed configuration is newer than another configuration, since we pull configurations from multiple nodes and different nodes can be online or offline at a given time. If the configuration in the inventory isn't different from what's in the blueprint, then we don't need to generate a new blueprint.
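To make that concrete, a sketch of "ignore the log index when diffing," using a hypothetical pared-down struct (the real blueprint type has more fields):

```rust
/// Hypothetical, pared-down view of the clickhouse cluster state carried
/// in a blueprint.
#[derive(Clone, PartialEq, Eq, Debug)]
struct ClickhouseClusterConfig {
    generation: u64,
    highest_seen_keeper_leader_committed_log_index: u64,
    keepers: Vec<u64>,
    servers: Vec<u64>,
}

/// Does blueprint `b` differ meaningfully from blueprint `a`? The committed
/// log index advances constantly in a healthy cluster, so compare everything
/// *except* that field before deciding to advance the target.
fn needs_new_target(
    a: &ClickhouseClusterConfig,
    b: &ClickhouseClusterConfig,
) -> bool {
    let mut b_masked = b.clone();
    b_masked.highest_seen_keeper_leader_committed_log_index =
        a.highest_seen_keeper_leader_committed_log_index;
    *a != b_masked
}

fn main() {
    let a = ClickhouseClusterConfig {
        generation: 1,
        highest_seen_keeper_leader_committed_log_index: 100,
        keepers: vec![1, 2, 3],
        servers: vec![1, 2],
    };
    let mut b = a.clone();
    b.highest_seen_keeper_leader_committed_log_index = 250;
    // Only the log index moved: no need to advance the target.
    assert!(!needs_new_target(&a, &b));
}
```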
This PR integrates clickhouse keeper and server node allocation via `ClickhouseAllocator` into the `BlueprintBuilder`. It tests that allocation works as expected via the planner. There are many more specific tests for the `ClickhouseAllocator`; the tests here mainly check that everything fits together and the same results occur when run through the planner. There is an additional test and code to allow us to disable the cluster policy and expunge all clickhouse keeper and server zones in one shot. This code is safe to merge because it is currently inert. There is no way to enable the `ClickhousePolicy` outside of tests yet. This will come in one or two follow-up PRs where we add an internal nexus endpoint for enabling the policy and then an OMDB command to trigger the endpoint. Further OMDB support will be added for monitoring. We expect that for the foreseeable future we will always deploy with `ClickhousePolicy::deploy_with_standalone = true`. This is stage 1 of RFD 468, where we run replicated clickhouse and the existing single node clickhouse together. Lastly, there will be other PRs to plug in the actual inventory collection and execution phases for clickhouse cluster reconfiguration. We shouldn't bother even implementing the OMDB policy enablement until all that is complete, as it just won't work.
Clickhouse clusters are only provisioned via reconfigurator as of #6627. This drastically simplifies things, since we don't have to put more policy knobs in RSS and wicket. It also matches the fact that these zones always need nexus running in order to get their configurations and run the appropriate processes anyway. This commit removes the vestigial zone creation that was never actually used.
This PR integrates clickhouse keeper and server node allocation via `ClickhouseAllocator` into the `BlueprintBuilder`. It tests that allocation works as expected via the planner. There are many more specific tests for the `ClickhouseAllocator`; the tests here mainly check that everything fits together and the same results occur when run through the planner. I've left a few comments regarding things discovered while implementing this and am very open to suggestions.

This code is safe to merge because it is currently inert. There is no way to enable the `ClickhousePolicy` outside of tests yet. This will come in one or two follow-up PRs where we add an internal nexus endpoint for enabling the policy and then an OMDB command to trigger the endpoint. Further OMDB support will be added for monitoring. We expect that for the foreseeable future we will always deploy with `ClickhousePolicy::deploy_with_standalone = true`. This is stage 1 of RFD 468, where we run replicated clickhouse and the existing single node clickhouse together. The OMDB command will not allow us to disable this at first. It's still an open question whether we want to be able to disable the policy altogether and expunge any clickhouse zones. I can add that functionality and a test to this PR or a future one if desired, as mentioned in a comment below.

The inertness of this PR will not necessarily change if we decide to deploy zones via RSS. I'm currently leaning against doing this, as it adds complexity; please see the related comment below. One thing I did realize while writing that out, and that should have been relatively obvious, is that deploying keepers will currently take a long time without some extra work to trigger inventory collection. This is because keepers must be added or removed one at a time, and we must retrieve the keeper membership status to reflect that before moving on. However, this status lives in inventory, which is currently only collected every 10 minutes. I think we have a way to force collection in OMDB, so we'll have to interleave that with our blueprint reconfigurations, unless that's already done automatically. I honestly don't remember at this point.
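In other words, bring-up looks like a loop where inventory collection gates each round; a schematic sketch with hypothetical stub types and functions (nothing here is omicron's real API):

```rust
// Hypothetical stand-ins for the real planner inputs and outputs.
struct Inventory {
    keepers_seen: usize,
}
struct Blueprint {
    keepers_wanted: usize,
}

fn collect_inventory(round: usize) -> Inventory {
    // Stub: in reality this is the inventory task (every ~10 minutes unless
    // a collection is forced), reporting observed keeper membership.
    Inventory { keepers_seen: round }
}

fn plan(inv: &Inventory) -> Blueprint {
    // Stub: grow membership by at most one keeper per round.
    Blueprint { keepers_wanted: (inv.keepers_seen + 1).min(5) }
}

fn execute(_bp: &Blueprint) {
    // Stub: push configs down to clickhouse-admin in the keeper zones.
}

fn main() {
    // Each membership change must be observed in inventory before the next
    // planning round, so wall-clock time is dominated by collection cadence.
    for round in 0usize.. {
        let inventory = collect_inventory(round);
        let blueprint = plan(&inventory);
        execute(&blueprint);
        if inventory.keepers_seen == 5 {
            break;
        }
    }
}
```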
Lastly, there will be other PRs to plug in the actual inventory collection and execution phases for clickhouse cluster reconfiguration. We shouldn't bother even implementing the OMDB policy enablement until all of that is complete, as it just won't work.