Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[reconfigurator] Endpoint to retrieve keeper cluster membership #6758

Merged

Conversation

karencfv
Copy link
Contributor

@karencfv karencfv commented Oct 2, 2024

Overview

This commit implements a new clickhouse-admin endpoint to retrieve information from the keeper cluster. It returns ClickhouseKeeperClusterMembership

Manual testing

$ curl http://[::1]:8888/keeper/cluster-membership
{"queried_keeper":2,"leader_committed_log_index":38875,"raft_config":[1,2,3]}

@karencfv karencfv requested a review from andrewjstone October 2, 2024 19:42
Comment on lines -515 to -528

/// The configuration of the clickhouse keeper raft cluster returned from a
/// single keeper node
///
/// Each keeper is asked for its known raft configuration via `clickhouse-admin`
/// dropshot servers running in `ClickhouseKeeper` zones. state. We include the
/// leader committed log index known to the current keeper node (whether or not
/// it is the leader) to determine which configuration is newest.
#[derive(Clone, Debug, PartialEq, Eq, Deserialize, Serialize)]
pub struct ClickhouseKeeperClusterMembership {
pub queried_keeper: KeeperId,
pub leader_committed_log_index: u64,
pub raft_config: BTreeSet<KeeperId>,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move this to avoid a circular dependency

raft_output.keeper_servers.iter().map(|s| s.server_id).collect();

Ok(ClickhouseKeeperClusterMembership {
queried_keeper: conf_output.server_id,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andrewjstone I do remember you mentioned we could retrieve the queried_keeper from somewhere else.

I kinda like the idea of the keeper reporting its own server ID though. This way in case we called the wrong address for some reason or whatever, we will always be 100% sure we have the ID of the keeper we queried, not the keeper we think we queried. WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works fine, but I think it will be unnecessary in the near future.

What I was talking about is that when we write the keeper and server configs, we should also store an extra file that contains the generation number from the KeeperConfigurableSettings and ServerConfigurableSettings of the last written config. We would cache that generation number in the clickhouse-admin server and load it from the file on startup. We need to do this as an optimization so that we don't constantly overwrite the clickhouse config files and force them to be reloaded when nothing has changed.

I was thinking that instead of just saving the generation number we could save the entire ServerConfigurableSettings and KeeperConfigurableSettings and cache those as well. Doing that would allow us to just look in memory of the clickhouse-admin server for the keeper-id. And since we use these settings to write the keeper config files, the values would always be the same.

Copy link
Contributor

@andrewjstone andrewjstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

Just some nits and comments about future work from me.

/// single keeper node
///
/// Each keeper is asked for its known raft configuration via `clickhouse-admin`
/// dropshot servers running in `ClickhouseKeeper` zones. state. We include the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you just moved this struct, but there's a typo I made that you may as well cleanup while you're here.

Suggested change
/// dropshot servers running in `ClickhouseKeeper` zones. state. We include the
/// dropshot servers running in `ClickhouseKeeper` zones. We include the

raft_config.insert(clickhouse_admin_types::KeeperId(i));
}

let expected_keeper_cluster_membership =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally, I wouldn't even bother defining the whole struct just to pull out the 2 fields we care about. I'd just compare what we get back to the values directly.

raft_output.keeper_servers.iter().map(|s| s.server_id).collect();

Ok(ClickhouseKeeperClusterMembership {
queried_keeper: conf_output.server_id,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works fine, but I think it will be unnecessary in the near future.

What I was talking about is that when we write the keeper and server configs, we should also store an extra file that contains the generation number from the KeeperConfigurableSettings and ServerConfigurableSettings of the last written config. We would cache that generation number in the clickhouse-admin server and load it from the file on startup. We need to do this as an optimization so that we don't constantly overwrite the clickhouse config files and force them to be reloaded when nothing has changed.

I was thinking that instead of just saving the generation number we could save the entire ServerConfigurableSettings and KeeperConfigurableSettings and cache those as well. Doing that would allow us to just look in memory of the clickhouse-admin server for the keeper-id. And since we use these settings to write the keeper config files, the values would always be the same.

Copy link
Contributor Author

@karencfv karencfv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @andrewjstone 🙇‍♀️ I'll merge this PR, and as we talked about, we'll make the small changes in a follow up PR

@karencfv karencfv merged commit abac284 into oxidecomputer:main Oct 2, 2024
17 checks passed
@karencfv karencfv deleted the endpoint-keeper-cluster-membership branch October 2, 2024 21:55
iliana added a commit that referenced this pull request Oct 2, 2024
iliana added a commit that referenced this pull request Oct 3, 2024
#6758 moved `ClickhouseKeeperClusterMembership` from `nexus-types` to
`clickhouse-admin-types`, but #6627 added new references to the
`ClickhouseKeeperClusterMembership` in `nexus-types` earlier.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants