Reconfigurator: Tracking issue for multi-node clickhouse deployment #5999
Comments
karencfv
added a commit
that referenced
this issue
Aug 14, 2024
…6304) ## Overview New SMF service in `clickhouse` and `clickhouse_keeper` zones which runs a dropshot server. The API contains a single `/node/address` endpoint to retrieve the node's listen address. Other endpoints will be added in future PRs. ## Purpose This server will be used to manage ClickHouse server and Keeper nodes. For now it performs a single basic action to keep the size of this PR small, but this server will perform other actions like generating the XML config files, retrieving the state of the node etc. ## Testing I've deployed locally with the following results: ```console root@oxz_switch:~# curl http://[fd00:1122:3344:101::e]:8888/node/address {"clickhouse_address":"[fd00:1122:3344:101::e]:8123"} ``` ```console root@oxz_clickhouse_2c213ff2:~# cat /var/svc/log/oxide-clickhouse-admin:default.log [ Aug 14 06:54:42 Enabled. ] [ Aug 14 06:54:42 Rereading configuration. ] [ Aug 14 06:54:45 Rereading configuration. ] [ Aug 14 06:54:46 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/clickhouse-admin/bin/clickhouse-admin run -c /var/svc/manifest/site/clickhouse-admin/config.toml -a [fd00:1122:3344:101::e]:8123 -H [fd00:1122:3344:101::e]:8888 &"). ] [ Aug 14 06:54:46 Method "start" exited with status 0. 
] note: configured to log to "/dev/stdout" {"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-08-14T06:54:46.721122327Z","hostname":"oxz_clickhouse_2c213ff2-6544-4316-939f-b51749cf3222","pid":5169,"local_addr":"[fd00:1122:3344:101::e]:8888","component":"dropshot","file":"/home/coatlicue/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:205"} {"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-08-14T06:56:17.908877036Z","hostname":"oxz_clickhouse_2c213ff2-6544-4316-939f-b51749cf3222","pid":5169,"local_addr":"[fd00:1122:3344:101::e]:8888","component":"dropshot","file":"/home/coatlicue/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:775","remote_addr":"[fd00:1122:3344:101::2]:37268"} {"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-08-14T06:56:17.91734856Z","hostname":"oxz_clickhouse_2c213ff2-6544-4316-939f-b51749cf3222","pid":5169,"uri":"/node/address","method":"GET","req_id":"62a3d8fc-e37e-42aa-a715-52dbce8aa493","remote_addr":"[fd00:1122:3344:101::2]:37268","local_addr":"[fd00:1122:3344:101::e]:8888","component":"dropshot","file":"/home/coatlicue/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:914","latency_us":3151,"response_code":"200"} ``` Related: #5999
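A caller of `/node/address` gets back a small JSON object like the one in the `curl` output above. As a minimal sketch of consuming that response, the helper below pulls out the `clickhouse_address` value and parses it into a socket address. The function name is hypothetical and the string search is a stand-in for a real JSON parser; it is not taken from the `clickhouse-admin` code.

```rust
use std::net::SocketAddrV6;

/// Extracts the `clickhouse_address` value from a `/node/address` response body.
/// Hypothetical helper: a hand-rolled string search for illustration only;
/// real code would use a JSON parser.
fn parse_clickhouse_address(body: &str) -> Option<SocketAddrV6> {
    let key = "\"clickhouse_address\"";
    let after_key = &body[body.find(key)? + key.len()..];
    // Skip the colon and any whitespace, then expect the opening quote.
    let after_colon = after_key.trim_start().strip_prefix(':')?.trim_start();
    let value = after_colon.strip_prefix('"')?;
    let end = value.find('"')?;
    // `SocketAddrV6` parses the bracketed `[addr]:port` form used in the response.
    value[..end].parse().ok()
}
```

Called with the body shown above, this yields port `8123` on address `fd00:1122:3344:101::e`; a body without the key yields `None`.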
karencfv
added a commit
that referenced
this issue
Aug 20, 2024
…ted mode (#6343)

## Overview

This commit introduces a few changes:

- a new `clickhouse_server` SMF service which runs the old "replicated" mode from the `clickhouse` service
- a new `replicated` field for the oximeter configuration file which is consumed by the `oximeter` binary that runs the replicated SQL against a database. It now connects to the listen address from `ServiceName::ClickhouseServer` or `ServiceName::Clickhouse`, depending on which zone has been deployed.
- a new `--clickhouse-topology` build target flag which builds artifacts based on either a `single-node` or `replicated-cluster` setup. The difference between the two is whether the `oximeter` SMF service is executing the `oximeter` CLI with the `--replicated` flag or not.

__CAVEAT:__ It's still necessary to manually change the RSS [node count constants](https://github.com/oxidecomputer/omicron/blob/ffc8807caf04ca3f81b543c520ddbe26b3284264/sled-agent/src/rack_setup/plan/service.rs#L57-L77) to the specified amount for each clickhouse topology mode. This requirement will be short-lived as we are moving to use reconfigurator.

## Usage

To run single node ClickHouse nothing changes; artifacts can be built the same way as before.
To run replicated ClickHouse set the [node count constants](https://github.com/oxidecomputer/omicron/blob/ffc8807caf04ca3f81b543c520ddbe26b3284264/sled-agent/src/rack_setup/plan/service.rs#L57-L77) to the specified amount, and set the build target in the following manner:

```console
$ cargo run --locked --release --bin omicron-package -- -t <NAME> target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster
    Finished `release` profile [optimized] target(s) in 1.03s
     Running `target/release/omicron-package -t <NAME> target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster`
Logging to: /home/coatlicue/src/omicron/out/LOG
Created new build target 'centzon' and set it as active
$ cargo run --locked --release --bin omicron-package -- -t <NAME> package
<...>
$ pfexec ./target/release/omicron-package -t <NAME> install
```

## Purpose

As laid out in [RFD 468](https://rfd.shared.oxide.computer/rfd/0468), to roll out replicated ClickHouse we will need the ability to roll out either replicated or single node ClickHouse for an undetermined amount of time. This commit is a step in that direction. We need to have separate services for running replicated or single-node ClickHouse servers.

## Testing

Deploying omicron on a helios box with both modes.
Single node: ```console $ cargo run --locked --release --bin omicron-package -- -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled Finished `release` profile [optimized] target(s) in 0.94s Running `target/release/omicron-package -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled` Logging to: /home/coatlicue/src/omicron/out/LOG Created new build target 'centzon' and set it as active $ cargo run --locked --release --bin omicron-package -- -t centzon package <...> $ pfexec ./target/release/omicron-package -t centzon install Logging to: /home/coatlicue/src/omicron/out/LOG $ zoneadm list | grep clickhouse oxz_clickhouse_7ce86c8b-2c9e-4d02-a857-269cb0a99c2e root@oxz_clickhouse_7ce86c8b:~# /opt/oxide/clickhouse/clickhouse client --host fd00:1122:3344:101::e ClickHouse client version 23.8.7.1. Connecting to fd00:1122:3344:101::e:9000 as user default. Connected to ClickHouse server version 23.8.7 revision 54465. oxz_clickhouse_7ce86c8b-2c9e-4d02-a857-269cb0a99c2e.local :) SHOW TABLES FROM oximeter SHOW TABLES FROM oximeter Query id: 5e91fafb-4d70-4a27-a188-75fb83bb7e5e ┌─name───────────────────────┐ │ fields_bool │ │ fields_i16 │ │ fields_i32 │ │ fields_i64 │ │ fields_i8 │ │ fields_ipaddr │ │ fields_string │ │ fields_u16 │ │ fields_u32 │ │ fields_u64 │ │ fields_u8 │ │ fields_uuid │ │ measurements_bool │ │ measurements_bytes │ │ measurements_cumulativef32 │ │ measurements_cumulativef64 │ │ measurements_cumulativei64 │ │ measurements_cumulativeu64 │ │ measurements_f32 │ │ measurements_f64 │ │ measurements_histogramf32 │ │ measurements_histogramf64 │ │ measurements_histogrami16 │ │ measurements_histogrami32 │ │ measurements_histogrami64 │ │ measurements_histogrami8 │ │ measurements_histogramu16 │ │ measurements_histogramu32 │ │ measurements_histogramu64 │ │ measurements_histogramu8 │ │ measurements_i16 │ │ measurements_i32 │ │ measurements_i64 │ │ measurements_i8 │ │ measurements_string │ │ measurements_u16 │ │ 
measurements_u32 │ │ measurements_u64 │ │ measurements_u8 │ │ timeseries_schema │ │ version │ └────────────────────────────┘ 41 rows in set. Elapsed: 0.014 sec. oxz_clickhouse_7ce86c8b-2c9e-4d02-a857-269cb0a99c2e.local :) SELECT * FROM oximeter.fields_i64 SELECT * FROM oximeter.fields_i64 Query id: 4bbcec72-101f-4cf4-9966-680381f5b62c ┌─timeseries_name────────────────────────┬───────timeseries_key─┬─field_name──┬─field_value─┐ │ http_service:request_latency_histogram │ 8326032694586838023 │ status_code │ 200 │ <...> $ pfexec zlogin oxz_oximeter_b235200f-f0ad-4218-9184-d995df5acaf0 [Connected to zone 'oxz_oximeter_b235200f-f0ad-4218-9184-d995df5acaf0' pts/3] The illumos Project helios-2.0.22784 July 2024 root@oxz_oximeter_b235200f:~# cat /var/svc/manifest/site/oximeter/config.toml # Example configuration file for running an oximeter collector server [db] batch_size = 1000 batch_interval = 5 # In seconds replicated = false [log] level = "debug" mode = "file" path = "/dev/stdout" if_exists = "append" ``` Replicated cluster: ```console $ cargo run --locked --release --bin omicron-package -- -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster Finished `release` profile [optimized] target(s) in 1.03s Running `target/release/omicron-package -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster` Logging to: /home/coatlicue/src/omicron/out/LOG Created new build target 'centzon' and set it as active $ cargo run --locked --release --bin omicron-package -- -t centzon package <...> $ pfexec ./target/release/omicron-package -t centzon install Logging to: /home/coatlicue/src/omicron/out/LOG $ zoneadm list | grep clickhouse oxz_clickhouse_keeper_73e7fda2-20af-4a90-9a61-c89ceed47c1a oxz_clickhouse_server_74876663-5337-4d9b-85cb-99d1e88bdf8a oxz_clickhouse_keeper_8eaac4f9-d9e0-4d56-b269-eab7da0c73a3 oxz_clickhouse_keeper_01f3a6af-5249-4dff-b9a4-f1076e467c9a 
oxz_clickhouse_server_bc6010bf-507c-4b5a-ad4c-3a7af889a6c0 $ pfexec zlogin oxz_clickhouse_server_74876663-5337-4d9b-85cb-99d1e88bdf8a [Connected to zone 'oxz_clickhouse_server_74876663-5337-4d9b-85cb-99d1e88bdf8a' pts/3] The illumos Project helios-2.0.22784 July 2024 root@oxz_clickhouse_server_74876663:~# /opt/oxide/clickhouse_server/clickhouse client --host fd00:1122:3344:101::e ClickHouse client version 23.8.7.1. Connecting to fd00:1122:3344:101::e:9000 as user default. Connected to ClickHouse server version 23.8.7 revision 54465. oximeter_cluster node 1 :) SHOW TABLES FROM oximeter SHOW TABLES FROM oximeter Query id: a5603063-1cbc-41a5-bfbd-33c986764e92 ┌─name─────────────────────────────┐ │ fields_bool │ │ fields_bool_local │ │ fields_i16 │ │ fields_i16_local │ │ fields_i32 │ │ fields_i32_local │ │ fields_i64 │ │ fields_i64_local │ │ fields_i8 │ │ fields_i8_local │ │ fields_ipaddr │ │ fields_ipaddr_local │ │ fields_string │ │ fields_string_local │ │ fields_u16 │ │ fields_u16_local │ │ fields_u32 │ │ fields_u32_local │ │ fields_u64 │ │ fields_u64_local │ │ fields_u8 │ │ fields_u8_local │ │ fields_uuid │ │ fields_uuid_local │ │ measurements_bool │ │ measurements_bool_local │ │ measurements_bytes │ │ measurements_bytes_local │ │ measurements_cumulativef32 │ │ measurements_cumulativef32_local │ │ measurements_cumulativef64 │ │ measurements_cumulativef64_local │ │ measurements_cumulativei64 │ │ measurements_cumulativei64_local │ │ measurements_cumulativeu64 │ │ measurements_cumulativeu64_local │ │ measurements_f32 │ │ measurements_f32_local │ │ measurements_f64 │ │ measurements_f64_local │ │ measurements_histogramf32 │ │ measurements_histogramf32_local │ │ measurements_histogramf64 │ │ measurements_histogramf64_local │ │ measurements_histogrami16 │ │ measurements_histogrami16_local │ │ measurements_histogrami32 │ │ measurements_histogrami32_local │ │ measurements_histogrami64 │ │ measurements_histogrami64_local │ │ measurements_histogrami8 │ │ 
measurements_histogrami8_local │ │ measurements_histogramu16 │ │ measurements_histogramu16_local │ │ measurements_histogramu32 │ │ measurements_histogramu32_local │ │ measurements_histogramu64 │ │ measurements_histogramu64_local │ │ measurements_histogramu8 │ │ measurements_histogramu8_local │ │ measurements_i16 │ │ measurements_i16_local │ │ measurements_i32 │ │ measurements_i32_local │ │ measurements_i64 │ │ measurements_i64_local │ │ measurements_i8 │ │ measurements_i8_local │ │ measurements_string │ │ measurements_string_local │ │ measurements_u16 │ │ measurements_u16_local │ │ measurements_u32 │ │ measurements_u32_local │ │ measurements_u64 │ │ measurements_u64_local │ │ measurements_u8 │ │ measurements_u8_local │ │ timeseries_schema │ │ timeseries_schema_local │ │ version │ └──────────────────────────────────┘ 81 rows in set. Elapsed: 0.010 sec. oximeter_cluster node 1 :) SELECT * FROM oximeter.fields_i64 SELECT * FROM oximeter.fields_i64 Query id: 14f07468-0e33-4de1-8893-df3e11eb7660 ┌─timeseries_name────────────────────────┬───────timeseries_key─┬─field_name──┬─field_value─┐ │ http_service:request_latency_histogram │ 436117616059041516 │ status_code │ 200 │ <...> $ pfexec zlogin oxz_oximeter_bcba1c06-1ca5-49cf-b277-8c2387975274 [Connected to zone 'oxz_oximeter_bcba1c06-1ca5-49cf-b277-8c2387975274' pts/3] The illumos Project helios-2.0.22784 July 2024 root@oxz_oximeter_bcba1c06:~# cat /var/svc/manifest/site/oximeter/config.toml # Example configuration file for running an oximeter collector server [db] batch_size = 1000 batch_interval = 5 # In seconds replicated = true [log] level = "debug" mode = "file" path = "/dev/stdout" if_exists = "append" ``` Related: #5999
Related: #6407
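The `--clickhouse-topology` flag described in #6343 selects between `single-node` and `replicated-cluster`, and the only stated difference is whether `oximeter` runs with `--replicated`. A minimal sketch of that decision follows. The enum and function names are hypothetical and are not taken from `omicron-package`.

```rust
use std::str::FromStr;

/// Hypothetical model of the two topology values accepted by the build target flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ClickhouseTopology {
    SingleNode,
    ReplicatedCluster,
}

impl FromStr for ClickhouseTopology {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "single-node" => Ok(Self::SingleNode),
            "replicated-cluster" => Ok(Self::ReplicatedCluster),
            other => Err(format!("unknown clickhouse topology: {other}")),
        }
    }
}

/// Extra arguments the `oximeter` CLI would receive for a given topology.
/// Assumption: `--replicated` is the only difference, as the commit message states.
fn oximeter_extra_args(topology: ClickhouseTopology) -> Vec<&'static str> {
    match topology {
        ClickhouseTopology::SingleNode => vec![],
        ClickhouseTopology::ReplicatedCluster => vec!["--replicated"],
    }
}
```

Parsing the flag value up front keeps an unknown topology string from silently falling back to single-node mode.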
karencfv
added a commit
that referenced
this issue
Aug 28, 2024
## Overview

This commit adds a library to generate ClickHouse replica server and keeper configuration files. A lot of the code in the `clickhouse-admin/types` directory has been copied over from [clickward](https://github.com/oxidecomputer/clickward), but there are several additions and modifications:

- New `new()` and `default()` methods that set default Oxide values.
- File generation is per node, as opposed to all files generated in a single directory like clickward.

## Usage

To generate a replica server configuration file:

```rust
let keepers = vec![
    KeeperNodeConfig::new("ff::01".to_string()),
    KeeperNodeConfig::new("ff::02".to_string()),
    KeeperNodeConfig::new("ff::03".to_string()),
];

let servers = vec![
    ServerNodeConfig::new("ff::08".to_string()),
    ServerNodeConfig::new("ff::09".to_string()),
];

let config = ClickhouseServerConfig::new(
    Utf8PathBuf::from(config_dir.path()),
    ServerId(1),
    Utf8PathBuf::from_str("./").unwrap(),
    Ipv6Addr::from_str("ff::08").unwrap(),
    keepers,
    servers,
);

config.generate_xml_file().unwrap();
```

To generate a keeper configuration file:

```rust
let keepers = vec![
    RaftServerConfig::new(KeeperId(1), "ff::01".to_string()),
    RaftServerConfig::new(KeeperId(2), "ff::02".to_string()),
    RaftServerConfig::new(KeeperId(3), "ff::03".to_string()),
];

let config = ClickhouseKeeperConfig::new(
    Utf8PathBuf::from(config_dir.path()),
    KeeperId(1),
    keepers,
    Utf8PathBuf::from_str("./").unwrap(),
    Ipv6Addr::from_str("ff::08").unwrap(),
);

config.generate_xml_file().unwrap();
```

## Purpose

As part of the work to roll out replicated ClickHouse, we'll need to dynamically generate the node configuration files via the `clickhouse-admin` service. This commit is part of the work necessary to do so.

Related: #5999 , #3824
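To give a feel for what per-node file generation involves, here is a rough sketch that renders the `<raft_configuration>` fragment of a keeper XML file from a list of Raft servers. It is not the library's code: `RaftServer`, `RAFT_PORT` and `render_raft_configuration` are hypothetical names, and the port value is an assumption for illustration.

```rust
/// Hypothetical stand-in for one member of the keeper cluster.
struct RaftServer {
    id: u64,
    host: String,
}

/// Raft port assumed for this sketch.
const RAFT_PORT: u16 = 9234;

/// Renders the `<raft_configuration>` section of a keeper config file,
/// with one `<server>` element per cluster member.
fn render_raft_configuration(servers: &[RaftServer]) -> String {
    let mut xml = String::from("<raft_configuration>\n");
    for server in servers {
        xml.push_str(&format!(
            "  <server>\n    <id>{}</id>\n    <hostname>{}</hostname>\n    <port>{}</port>\n  </server>\n",
            server.id, server.host, RAFT_PORT
        ));
    }
    xml.push_str("</raft_configuration>\n");
    xml
}
```

Every keeper gets the same server list; only the node's own ID and paths differ between files, which is why generating one file per node is a natural fit.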
karencfv
added a commit
that referenced
this issue
Sep 5, 2024
…nfig files (#6476)

## Overview

This commit introduces two new endpoints to the `clickhouse-admin` API: one to generate a ClickHouse server XML configuration file and the other to generate a ClickHouse keeper XML configuration file.

## Purpose

"Reconfigurator" will need to call these endpoints to generate the necessary files when bringing up new `clickhouse_server` and `clickhouse_keeper` zones. They will also be necessary to update existing zones with the keeper and server information from new zones.

## API structure

Moving forward, the API endpoints will follow this structure:

- `/keeper/` : For endpoints that will perform actions solely on keeper nodes.
- `/server/` : For endpoints that will perform actions solely on server nodes.

## Usage

To generate a server XML config file:

```console
$ curl -X put http://example/server/config \
-H "Content-Type: application/json" \
-d '{"generation": {GENERATION_NUM}, "settings": {"id": {ID}, "keepers": [{"ipv6|ipv4|domain_name": "{ADDRESS}"}], "remote_servers": [{"ipv6|ipv4|domain_name": "{ADDRESS}"}], "config_dir": "{CONFIG_PATH}", "datastore_path": "{DATA_PATH}", "listen_addr": "{LISTEN_ADDRESS}" }}'
```

To generate a keeper XML config file:

```console
$ curl -X put http://example/keeper/config \
-H "Content-Type: application/json" \
-d '{"generation": {GENERATION_NUM}, "settings": {"id": {ID}, "raft_servers": [{"id": {NODE_ID}, "host": {"ipv6|ipv4|domain_name": "{ADDRESS}"}}], "config_dir": "{CONFIG_PATH}", "datastore_path": "{DATA_PATH}", "listen_addr": "{LISTEN_ADDRESS}"}}'
```

## Caveats

For the time being the generation number isn't used for anything. In a follow-up PR we will use it to keep track of the configuration version to make sure we're not generating an incorrect XML configuration file.
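The host objects in the request bodies are tagged by address kind. The testing output for this commit uses the keys `ipv6`, `ipv4` and `domain_name`, and shows `ff::01` echoed back as `ff::1`. The sketch below models that with a small enum; the type and method names are hypothetical, and the JSON is built by hand only to keep the example dependency-free.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Hypothetical model of a host entry in a request body.
#[derive(Debug, PartialEq, Eq)]
enum Host {
    Ipv6(Ipv6Addr),
    Ipv4(Ipv4Addr),
    DomainName(String),
}

impl Host {
    /// Classifies a string: try IPv6 first, then IPv4, else treat it as a domain name.
    fn parse(s: &str) -> Host {
        if let Ok(addr) = s.parse::<Ipv6Addr>() {
            Host::Ipv6(addr)
        } else if let Ok(addr) = s.parse::<Ipv4Addr>() {
            Host::Ipv4(addr)
        } else {
            Host::DomainName(s.to_string())
        }
    }

    /// Renders the tagged JSON object.
    /// Assumption: domain names contain no characters that need JSON escaping.
    fn to_json(&self) -> String {
        match self {
            Host::Ipv6(addr) => format!(r#"{{"ipv6":"{addr}"}}"#),
            Host::Ipv4(addr) => format!(r#"{{"ipv4":"{addr}"}}"#),
            Host::DomainName(name) => format!(r#"{{"domain_name":"{name}"}}"#),
        }
    }
}
```

Storing parsed addresses rather than raw strings explains the normalization: `Ipv6Addr` prints `ff::01` in its canonical short form, `ff::1`.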
## Testing Dropshot server: ```console $ cargo run --bin=clickhouse-admin -- run -c ./smf/clickhouse-admin/config.toml -a [::1]:8888 Compiling omicron-clickhouse-admin v0.1.0 (/Users/karcar/src/omicron/clickhouse-admin) Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.52s Running `target/debug/clickhouse-admin run -c ./smf/clickhouse-admin/config.toml -a '[::1]:8888'` note: configured to log to "/dev/stdout" {"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:32:00.106783Z","hostname":"ixchel","pid":11318,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:205"} {"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:32:17.626437Z","hostname":"ixchel","pid":11318,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:51896"} Path { inner: GenerationNum { generation: Generation(123) } } {"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:32:17.628323Z","hostname":"ixchel","pid":11318,"uri":"/node/server/generate-config/123","method":"post","req_id":"0b623f6b-441c-4418-9189-99504fbea5bf","remote_addr":"[::1]:51896","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":1044,"response_code":"201"} {"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:34:07.829199Z","hostname":"ixchel","pid":11318,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:52117"} Path { inner: GenerationNum { generation: Generation(123) } } {"msg":"request 
completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:34:07.830355Z","hostname":"ixchel","pid":11318,"uri":"/node/keeper/generate-config/123","method":"post","req_id":"1c729e0c-04bf-4e75-af0a-b23eef40f20f","remote_addr":"[::1]:52117","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":985,"response_code":"201"} ``` Generate server XML file: ```console $ curl -X put http://[::1]:8888/server/config -H "Content-Type: application/json" -d '{"generation": 3, "settings": {"id": 45, "keepers": [{"ipv6": "ff::01"}, {"ipv4": "127.0.0.1"}, {"domain_name": "hi.there"}], "remote_servers": [{"ipv6": "ff::08"}, {"ipv4": "127.0.0.2"}], "config_dir": "./", "datastore_path": "./", "listen_addr": "::1" }}' | jq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 858 100 602 100 256 210k 91789 --:--:-- --:--:-- --:--:-- 418k { "logger": { "level": "trace", "log": "./log/clickhouse.log", "errorlog": "./log/clickhouse.err.log", "size": 100, "count": 1 }, "macros": { "shard": 1, "replica": 45, "cluster": "oximeter_cluster" }, "listen_host": "::1", "http_port": 8123, "tcp_port": 9000, "interserver_http_port": 9009, "remote_servers": { "cluster": "oximeter_cluster", "secret": "some-unique-value", "replicas": [ { "host": { "ipv6": "ff::8" }, "port": 9000 }, { "host": { "ipv4": "127.0.0.2" }, "port": 9000 } ] }, "keepers": { "nodes": [ { "host": { "ipv6": "ff::1" }, "port": 9181 }, { "host": { "ipv4": "127.0.0.1" }, "port": 9181 }, { "host": { "domain_name": "hi.there" }, "port": 9181 } ] }, "data_path": "./data" } ``` Generate keeper XML file: ```console $ curl -X put http://[::1]:8888/keeper/config -H "Content-Type: application/json" -d '{"generation": 3, "settings": {"id": 1, "raft_servers": [{"id": 1, "host": {"ipv6": "ff::01"}}, {"id": 2, "host":{"ipv4": "127.0.0.1"}}, {"id": 3, 
"host":{"domain_name": "hi.there"}}], "config_dir": "./", "datastore_path": "./", "listen_addr": "::1" }}' | jq % Total % Received % Xferd Average Speed Time Time Time Current Dload Upload Total Spent Left Speed 100 820 100 568 100 252 301k 133k --:--:-- --:--:-- --:--:-- 800k { "logger": { "level": "trace", "log": "./log/clickhouse-keeper.log", "errorlog": "./log/clickhouse-keeper.err.log", "size": 100, "count": 1 }, "listen_host": "::1", "tcp_port": 9181, "server_id": 1, "log_storage_path": "./coordination/log", "snapshot_storage_path": "./coordination/snapshots", "coordination_settings": { "operation_timeout_ms": 10000, "session_timeout_ms": 30000, "raft_logs_level": "trace" }, "raft_config": { "servers": [ { "id": 1, "hostname": { "ipv6": "ff::1" }, "port": 9234 }, { "id": 2, "hostname": { "ipv4": "127.0.0.1" }, "port": 9234 }, { "id": 3, "hostname": { "domain_name": "hi.there" }, "port": 9234 } ] } } ``` Related: #5999 , #3824
karencfv
added a commit
that referenced
this issue
Sep 9, 2024
## Overview This commit replaces the old replicated ClickHouse server and keeper configuration templates with calls to the `clickhouse-admin` API that generate said configuration files. ## Purpose While the end goal is to have Nexus make the API calls to generate the configuration files, we'd like to have a working implementation of the `clickhouse-admin` API via the SMF services. Using `curl` is not what the finished work will look like, but rather it is the simplest way to have a working implementation in the mean time. ## Testing Deployed this branch on a Helios machine with the following results Replica 1 ```console root@oxz_clickhouse_server_a9d02cd3:~# /opt/oxide/clickhouse_server/clickhouse client --host fd00:1122:3344:101::f ClickHouse client version 23.8.7.1. Connecting to fd00:1122:3344:101::f:9000 as user default. Connected to ClickHouse server version 23.8.7 revision 54465. oximeter_cluster_1 :) SHOW TABLES FROM oximeter SHOW TABLES FROM oximeter Query id: 06867649-f49e-451f-b9f1-5a574e12ce5b ┌─name─────────────────────────────┐ │ fields_bool │ │ fields_bool_local │ │ fields_i16 │ │ fields_i16_local │ │ fields_i32 │ │ fields_i32_local │ │ fields_i64 │ │ fields_i64_local │ │ fields_i8 │ │ fields_i8_local │ │ fields_ipaddr │ │ fields_ipaddr_local │ │ fields_string │ │ fields_string_local │ │ fields_u16 │ │ fields_u16_local │ │ fields_u32 │ │ fields_u32_local │ │ fields_u64 │ │ fields_u64_local │ │ fields_u8 │ │ fields_u8_local │ │ fields_uuid │ │ fields_uuid_local │ │ measurements_bool │ │ measurements_bool_local │ │ measurements_bytes │ │ measurements_bytes_local │ │ measurements_cumulativef32 │ │ measurements_cumulativef32_local │ │ measurements_cumulativef64 │ │ measurements_cumulativef64_local │ │ measurements_cumulativei64 │ │ measurements_cumulativei64_local │ │ measurements_cumulativeu64 │ │ measurements_cumulativeu64_local │ │ measurements_f32 │ │ measurements_f32_local │ │ measurements_f64 │ │ measurements_f64_local │ │ measurements_histogramf32 
│ │ measurements_histogramf32_local │ │ measurements_histogramf64 │ │ measurements_histogramf64_local │ │ measurements_histogrami16 │ │ measurements_histogrami16_local │ │ measurements_histogrami32 │ │ measurements_histogrami32_local │ │ measurements_histogrami64 │ │ measurements_histogrami64_local │ │ measurements_histogrami8 │ │ measurements_histogrami8_local │ │ measurements_histogramu16 │ │ measurements_histogramu16_local │ │ measurements_histogramu32 │ │ measurements_histogramu32_local │ │ measurements_histogramu64 │ │ measurements_histogramu64_local │ │ measurements_histogramu8 │ │ measurements_histogramu8_local │ │ measurements_i16 │ │ measurements_i16_local │ │ measurements_i32 │ │ measurements_i32_local │ │ measurements_i64 │ │ measurements_i64_local │ │ measurements_i8 │ │ measurements_i8_local │ │ measurements_string │ │ measurements_string_local │ │ measurements_u16 │ │ measurements_u16_local │ │ measurements_u32 │ │ measurements_u32_local │ │ measurements_u64 │ │ measurements_u64_local │ │ measurements_u8 │ │ measurements_u8_local │ │ timeseries_schema │ │ timeseries_schema_local │ │ version │ └──────────────────────────────────┘ 81 rows in set. Elapsed: 0.005 sec. oximeter_cluster_1 :) SELECT * FROM oximeter.measurements_u64 SELECT * FROM oximeter.measurements_u64 Query id: 2e13d330-8f0b-4346-afc0-ba3c21ea7674 ┌─timeseries_name─────────────────────────┬───────timeseries_key─┬─────────────────────timestamp─┬─datum─┐ │ ddm_router:originated_tunnel_endpoints │ 2085026407707057203 │ 2024-09-09 07:16:47.241835734 │ 0 │ │ ddm_router:originated_tunnel_endpoints │ 2085026407707057203 │ 2024-09-09 07:16:48.241091831 │ 0 │ │ ddm_router:originated_tunnel_endpoints │ 2085026407707057203 │ 2024-09-09 07:16:49.241294398 │ 0 │ <...> ``` Replica 2 ```console root@oxz_clickhouse_server_ba1601d3:~# /opt/oxide/clickhouse_server/clickhouse client --host fd00:1122:3344:101::e ClickHouse client version 23.8.7.1. Connecting to fd00:1122:3344:101::e:9000 as user default. 
Connected to ClickHouse server version 23.8.7 revision 54465. oximeter_cluster_2 :) SHOW TABLES FROM oximeter SHOW TABLES FROM oximeter Query id: 33dd1d4d-1596-44e3-90ea-c755a1e3ae24 ┌─name─────────────────────────────┐ │ fields_bool │ │ fields_bool_local │ │ fields_i16 │ │ fields_i16_local │ │ fields_i32 │ │ fields_i32_local │ │ fields_i64 │ │ fields_i64_local │ │ fields_i8 │ │ fields_i8_local │ │ fields_ipaddr │ │ fields_ipaddr_local │ │ fields_string │ │ fields_string_local │ │ fields_u16 │ │ fields_u16_local │ │ fields_u32 │ │ fields_u32_local │ │ fields_u64 │ │ fields_u64_local │ │ fields_u8 │ │ fields_u8_local │ │ fields_uuid │ │ fields_uuid_local │ │ measurements_bool │ │ measurements_bool_local │ │ measurements_bytes │ │ measurements_bytes_local │ │ measurements_cumulativef32 │ │ measurements_cumulativef32_local │ │ measurements_cumulativef64 │ │ measurements_cumulativef64_local │ │ measurements_cumulativei64 │ │ measurements_cumulativei64_local │ │ measurements_cumulativeu64 │ │ measurements_cumulativeu64_local │ │ measurements_f32 │ │ measurements_f32_local │ │ measurements_f64 │ │ measurements_f64_local │ │ measurements_histogramf32 │ │ measurements_histogramf32_local │ │ measurements_histogramf64 │ │ measurements_histogramf64_local │ │ measurements_histogrami16 │ │ measurements_histogrami16_local │ │ measurements_histogrami32 │ │ measurements_histogrami32_local │ │ measurements_histogrami64 │ │ measurements_histogrami64_local │ │ measurements_histogrami8 │ │ measurements_histogrami8_local │ │ measurements_histogramu16 │ │ measurements_histogramu16_local │ │ measurements_histogramu32 │ │ measurements_histogramu32_local │ │ measurements_histogramu64 │ │ measurements_histogramu64_local │ │ measurements_histogramu8 │ │ measurements_histogramu8_local │ │ measurements_i16 │ │ measurements_i16_local │ │ measurements_i32 │ │ measurements_i32_local │ │ measurements_i64 │ │ measurements_i64_local │ │ measurements_i8 │ │ measurements_i8_local │ │ 
measurements_string │ │ measurements_string_local │ │ measurements_u16 │ │ measurements_u16_local │ │ measurements_u32 │ │ measurements_u32_local │ │ measurements_u64 │ │ measurements_u64_local │ │ measurements_u8 │ │ measurements_u8_local │ │ timeseries_schema │ │ timeseries_schema_local │ │ version │ └──────────────────────────────────┘ 81 rows in set. Elapsed: 0.010 sec. oximeter_cluster_2 :) SELECT * FROM oximeter.measurements_u64 SELECT * FROM oximeter.measurements_u64 Query id: 06da0f16-3055-47cb-9984-94dc78f99afc ┌─timeseries_name─────────────────────────┬───────timeseries_key─┬─────────────────────timestamp─┬─datum─┐ │ ddm_router:originated_tunnel_endpoints │ 2085026407707057203 │ 2024-09-09 07:22:02.443983562 │ 0 │ │ ddm_router:originated_tunnel_endpoints │ 2085026407707057203 │ 2024-09-09 07:22:03.444346219 │ 0 │ │ ddm_router:originated_tunnel_endpoints │ 2085026407707057203 │ 2024-09-09 07:22:04.444356384 │ 0 │ <...> ``` Keeper 1 ```console root@oxz_clickhouse_keeper_8cb0de91:~# echo mntr | nc fd00:1122:3344:101::12 9181 zk_version v23.8.7.1-lts-077df679bed122ad45c8b105d8916ccfec85ae64 zk_avg_latency 4 zk_max_latency 103 zk_min_latency 0 zk_packets_received 27769 zk_packets_sent 29290 zk_num_alive_connections 1 zk_outstanding_requests 0 zk_server_state leader zk_znode_count 6535 zk_watch_count 83 zk_ephemerals_count 82 zk_approximate_data_size 2330794 zk_key_arena_size 1044480 zk_latest_snapshot_size 0 zk_followers 2 zk_synced_followers 2 ``` Keeper 2 ```console root@oxz_clickhouse_keeper_a6c18bd2:~# echo mntr | nc fd00:1122:3344:101::10 9181 zk_version v23.8.7.1-lts-077df679bed122ad45c8b105d8916ccfec85ae64 zk_avg_latency 10 zk_max_latency 139 zk_min_latency 0 zk_packets_received 22278 zk_packets_sent 23922 zk_num_alive_connections 1 zk_outstanding_requests 0 zk_server_state follower zk_znode_count 7015 zk_watch_count 83 zk_ephemerals_count 82 zk_approximate_data_size 2512980 zk_key_arena_size 1044480 zk_latest_snapshot_size 0 ``` Keeper 3 ```console 
root@oxz_clickhouse_keeper_45d3e6ef:~# echo mntr | nc fd00:1122:3344:101::11 9181 zk_version v23.8.7.1-lts-077df679bed122ad45c8b105d8916ccfec85ae64 zk_avg_latency 0 zk_max_latency 0 zk_min_latency 0 zk_packets_received 0 zk_packets_sent 0 zk_num_alive_connections 0 zk_outstanding_requests 0 zk_server_state follower zk_znode_count 7188 zk_watch_count 0 zk_ephemerals_count 82 zk_approximate_data_size 2575631 zk_key_arena_size 1044480 zk_latest_snapshot_size 0 ``` Related: #5999 Closes: #3824
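The `mntr` checks above were read by eye. As a sketch of how the same check could be automated, the helpers below parse `mntr` output into a map and answer two questions: is this node the leader, and does the leader report all followers as synced. The function names are hypothetical, and the parser assumes one whitespace-separated key/value pair per line.

```rust
use std::collections::HashMap;

/// Parses `mntr` output into a key/value map.
/// Assumption: one `key value` pair per line, separated by tabs or spaces.
fn parse_mntr(output: &str) -> HashMap<String, String> {
    output
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            let key = parts.next()?;
            let value = parts.next()?;
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}

/// True if the node reports itself as the Raft leader.
fn is_leader(stats: &HashMap<String, String>) -> bool {
    stats.get("zk_server_state").map(String::as_str) == Some("leader")
}

/// Compares `zk_followers` with `zk_synced_followers`.
/// Returns `None` when either key is absent, as on the follower nodes above.
fn followers_in_sync(stats: &HashMap<String, String>) -> Option<bool> {
    let followers: u64 = stats.get("zk_followers")?.parse().ok()?;
    let synced: u64 = stats.get("zk_synced_followers")?.parse().ok()?;
    Some(followers == synced)
}
```

For the Keeper 1 output above, `is_leader` would be true and `followers_in_sync` would be `Some(true)`, since both follower counts are 2.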
karencfv
added a commit
that referenced
this issue
Sep 19, 2024
## Overview

This commit introduces a new `clickhouse-admin` API endpoint: `/keeper/lgif`. This endpoint uses the ClickHouse CLI internally to retrieve and parse the logically grouped information file from the ClickHouse keepers.

## Purpose

Reconfigurator will need this information to reliably manage and operate a ClickHouse replicated cluster. Additional endpoints to retrieve other information from ClickHouse servers or keepers will be added in follow up PRs.

## Testing

In addition to the unit tests, I have manually tested with the following results:

```console
$ cargo run --bin=clickhouse-admin -- run -c ./smf/clickhouse-admin/config.toml -a [::1]:8888 -l [::1]:20001 -b /Users/karcar/src/omicron/out/clickhouse/clickhouse
Compiling omicron-clickhouse-admin v0.1.0 (/Users/karcar/src/omicron/clickhouse-admin)
Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.46s
Running `target/debug/clickhouse-admin run -c ./smf/clickhouse-admin/config.toml -a '[::1]:8888' -l '[::1]:20001' -b /Users/karcar/src/omicron/out/clickhouse/clickhouse`
note: configured to log to "/dev/stdout"
{"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-12T02:37:19.383597Z","hostname":"ixchel","pid":3115,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:205"}
{"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-12T02:37:23.843325Z","hostname":"ixchel","pid":3115,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:54455"}
{"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-12T02:37:24.302588Z","hostname":"ixchel","pid":3115,"uri":"/keeper/lgif","method":"GET","req_id":"64b232d0-d6ac-4cae-8f0a-f14cf6d1dfba","remote_addr":"[::1]:54455","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":458301,"response_code":"200"}
```

```console
$ curl http://[::1]:8888/keeper/lgif
{"first_log_idx":1,"first_log_term":1,"last_log_idx":11717,"last_log_term":20,"last_committed_log_idx":11717,"leader_committed_log_idx":11717,"target_committed_log_idx":11717,"last_snapshot_idx":9465}
```

Related: #5999
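
The parsing step behind `/keeper/lgif` might look roughly like the sketch below. This is an illustration only, not the code from the PR: `Lgif` and `parse_lgif` are hypothetical names, the field names are taken from the JSON response shown in the commit message, and the raw keeper output is assumed to contain one whitespace-separated `key value` pair per line.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the JSON fields returned by `/keeper/lgif`.
#[derive(Debug, PartialEq, Eq)]
pub struct Lgif {
    pub first_log_idx: u64,
    pub first_log_term: u64,
    pub last_log_idx: u64,
    pub last_log_term: u64,
    pub last_committed_log_idx: u64,
    pub leader_committed_log_idx: u64,
    pub target_committed_log_idx: u64,
    pub last_snapshot_idx: u64,
}

/// Parse raw `lgif` text, assumed to be `key<whitespace>value` lines.
pub fn parse_lgif(raw: &str) -> Result<Lgif, String> {
    let mut fields: HashMap<&str, u64> = HashMap::new();
    for line in raw.lines().filter(|l| !l.trim().is_empty()) {
        let mut parts = line.split_whitespace();
        // Exactly two tokens are expected on each non-empty line.
        let (Some(key), Some(value), None) =
            (parts.next(), parts.next(), parts.next())
        else {
            return Err(format!("malformed line: {line:?}"));
        };
        let value = value
            .parse::<u64>()
            .map_err(|e| format!("bad value for {key}: {e}"))?;
        fields.insert(key, value);
    }
    // Every field is required; a missing key is reported by name.
    let get = |name: &str| {
        fields
            .get(name)
            .copied()
            .ok_or_else(|| format!("missing field: {name}"))
    };
    Ok(Lgif {
        first_log_idx: get("first_log_idx")?,
        first_log_term: get("first_log_term")?,
        last_log_idx: get("last_log_idx")?,
        last_log_term: get("last_log_term")?,
        last_committed_log_idx: get("last_committed_log_idx")?,
        leader_committed_log_idx: get("leader_committed_log_idx")?,
        target_committed_log_idx: get("target_committed_log_idx")?,
        last_snapshot_idx: get("last_snapshot_idx")?,
    })
}
```

Returning an error for a missing or non-numeric field, rather than defaulting to zero, keeps a partially failed query from looking like a valid but empty log.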
karencfv added a commit that referenced this issue Sep 23, 2024
## Overview

This commit implements a new clickhouse-admin endpoint to retrieve and parse information from the ClickHouse virtual node `/keeper/config`, which contains the last committed cluster configuration.

## Purpose

The main purpose of retrieving this information is to be able to populate the inventory's `raft_config` in `ClickhouseKeeperClusterMembership`.

https://github.com/oxidecomputer/omicron/blob/453311a880075b9f89626bb20cca1c1cd85ffb4f/nexus/types/src/inventory.rs#L499-L503

A follow-up PR will add an endpoint that specifically retrieves all the information needed to populate `ClickhouseKeeperClusterMembership`. This will be done by making several calls to the `clickhouse keeper-client` and using the parsing function here to populate `raft_config`. The endpoint itself will also be useful for retrieving information when debugging.

## Manual testing

```console
$ curl http://[::1]:8888/keeper/raft-config
{"keeper_servers":[{"server_id":1,"host":{"ipv6":"::1"},"raft_port":21001,"server_type":"participant","priority":1},{"server_id":2,"host":{"ipv6":"::1"},"raft_port":21002,"server_type":"participant","priority":1},{"server_id":3,"host":{"ipv6":"::1"},"raft_port":21003,"server_type":"participant","priority":1}]}
```

Related: #5999
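
As a rough illustration of how the contents of `/keeper/config` could be turned into the `keeper_servers` entries shown in the response, here is a minimal sketch. It is not the parsing function from the PR: `RaftServer`, `parse_raft_line` and `parse_raft_config` are hypothetical names, and the input is assumed to use one `server.<id>=<host>:<port>;<type>;<priority>` entry per line, with IPv6 hosts either bare or in square brackets. The host is kept as a plain string here rather than the typed value seen in the JSON.

```rust
/// Hypothetical flat version of one `keeper_servers` entry.
#[derive(Debug, PartialEq, Eq)]
pub struct RaftServer {
    pub server_id: u64,
    pub host: String,
    pub raft_port: u16,
    pub server_type: String,
    pub priority: u16,
}

/// Parse one line, assumed to look like `server.1=::1:21001;participant;1`.
pub fn parse_raft_line(line: &str) -> Result<RaftServer, String> {
    let rest = line
        .trim()
        .strip_prefix("server.")
        .ok_or_else(|| format!("missing `server.` prefix: {line:?}"))?;
    let (id, rest) = rest
        .split_once('=')
        .ok_or_else(|| format!("missing `=`: {line:?}"))?;
    let mut parts = rest.split(';');
    // Exactly three `;`-separated fields are expected after the `=`.
    let (Some(addr), Some(server_type), Some(priority), None) =
        (parts.next(), parts.next(), parts.next(), parts.next())
    else {
        return Err(format!("expected 3 `;`-separated fields: {line:?}"));
    };
    // Split on the last `:` so that IPv6 hosts such as `::1` stay intact.
    let (host, port) = addr
        .rsplit_once(':')
        .ok_or_else(|| format!("missing port: {addr:?}"))?;
    let host = host.trim_start_matches('[').trim_end_matches(']');
    Ok(RaftServer {
        server_id: id.parse::<u64>().map_err(|e| format!("bad id: {e}"))?,
        host: host.to_string(),
        raft_port: port.parse::<u16>().map_err(|e| format!("bad port: {e}"))?,
        server_type: server_type.to_string(),
        priority: priority
            .parse::<u16>()
            .map_err(|e| format!("bad priority: {e}"))?,
    })
}

/// Parse every non-empty line; the first malformed line aborts the parse.
pub fn parse_raft_config(raw: &str) -> Result<Vec<RaftServer>, String> {
    raw.lines()
        .filter(|l| !l.trim().is_empty())
        .map(parse_raft_line)
        .collect()
}
```

Splitting the address on the last colon is the main design choice: a left-to-right split would cut an IPv6 host at its first `:`.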
karencfv added a commit that referenced this issue Sep 26, 2024
## Overview

This commit implements a new clickhouse-admin endpoint to retrieve and parse information from the keeper node configuration.

## Purpose

The main purpose of retrieving this information is to be able to populate the inventory's `queried_keeper` in `ClickhouseKeeperClusterMembership`.

https://github.com/oxidecomputer/omicron/blob/453311a880075b9f89626bb20cca1c1cd85ffb4f/nexus/types/src/inventory.rs#L499-L503

A follow-up PR will add an endpoint that specifically retrieves all the information needed to populate `ClickhouseKeeperClusterMembership`. This will be done by making several calls to the `clickhouse keeper-client` and using the parsing function here to populate `queried_keeper`. The endpoint itself will also be useful for retrieving information when debugging.

## Manual testing

```console
$ cargo run --bin=clickhouse-admin -- run -c ./smf/clickhouse-admin/config.toml -a [::1]:8888 -l [::1]:20001 -b ./out/clickhouse/clickhouse
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s
Running `target/debug/clickhouse-admin run -c ./smf/clickhouse-admin/config.toml -a '[::1]:8888' -l '[::1]:20001' -b ./out/clickhouse/clickhouse`
note: configured to log to "/dev/stdout"
{"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:39.529734Z","hostname":"ixchel","pid":61269,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:205"}
{"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:46.767686Z","hostname":"ixchel","pid":61269,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:57461"}
{"msg":"Retrieved data from `clickhouse keeper-client --q conf`","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:47.224265Z","hostname":"ixchel","pid":61269,"component":"ClickhouseCli","file":"clickhouse-admin/types/src/lib.rs:605","output":"\"server_id=1\\nenable_ipv6=true\\ntcp_port=20001\\nfour_letter_word_allow_list=conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro,rcvr,apiv,csnp,lgif,rqld,rclc,clrs,ftfl\\nmax_requests_batch_size=100\\nmin_session_timeout_ms=10000\\nsession_timeout_ms=30000\\noperation_timeout_ms=10000\\ndead_session_check_period_ms=500\\nheart_beat_interval_ms=500\\nelection_timeout_lower_bound_ms=1000\\nelection_timeout_upper_bound_ms=2000\\nreserved_log_items=100000\\nsnapshot_distance=100000\\nauto_forwarding=true\\nshutdown_timeout=5000\\nstartup_timeout=180000\\nraft_logs_level=trace\\nsnapshots_to_keep=3\\nrotate_log_storage_interval=100000\\nstale_log_gap=10000\\nfresh_log_gap=200\\nmax_requests_batch_size=100\\nmax_requests_batch_bytes_size=102400\\nmax_request_queue_size=100000\\nmax_requests_quick_batch_size=100\\nquorum_reads=false\\nforce_sync=true\\ncompress_logs=true\\ncompress_snapshots_with_zstd_format=true\\nconfiguration_change_tries_count=20\\nraft_limits_reconnect_limit=50\\nlog_storage_path=./deployment/keeper-1/coordination/log\\nlog_storage_disk=LocalLogDisk\\nsnapshot_storage_path=./deployment/keeper-1/coordination/snapshots\\nsnapshot_storage_disk=LocalSnapshotDisk\\n\\n\""}
{"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:47.22448Z","hostname":"ixchel","pid":61269,"uri":"/keeper/conf","method":"GET","req_id":"847f0baa-3b16-4273-a84a-fcfd5acd6b49","remote_addr":"[::1]:57461","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":455407,"response_code":"200"}
```

```console
$ curl http://[::1]:8888/keeper/conf
{"server_id":1,"enable_ipv6":true,"tcp_port":20001,"four_letter_word_allow_list":"conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro,rcvr,apiv,csnp,lgif,rqld,rclc,clrs,ftfl","max_requests_batch_size":100,"min_session_timeout_ms":10000,"session_timeout_ms":30000,"operation_timeout_ms":10000,"dead_session_check_period_ms":500,"heart_beat_interval_ms":500,"election_timeout_lower_bound_ms":1000,"election_timeout_upper_bound_ms":2000,"reserved_log_items":100000,"snapshot_distance":100000,"auto_forwarding":true,"shutdown_timeout":5000,"startup_timeout":180000,"raft_logs_level":"trace","snapshots_to_keep":3,"rotate_log_storage_interval":100000,"stale_log_gap":10000,"fresh_log_gap":200,"max_requests_batch_bytes_size":102400,"max_request_queue_size":100000,"max_requests_quick_batch_size":100,"quorum_reads":false,"force_sync":true,"compress_logs":true,"compress_snapshots_with_zstd_format":true,"configuration_change_tries_count":20,"raft_limits_reconnect_limit":50,"log_storage_path":"./deployment/keeper-1/coordination/log","log_storage_disk":"LocalLogDisk","snapshot_storage_path":"./deployment/keeper-1/coordination/snapshots","snapshot_storage_disk":"LocalSnapshotDisk"}
```

Related: #5999
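
The logged `conf` output is a sequence of `key=value` lines, and `max_requests_batch_size` appears twice in it but only once in the JSON response. A minimal sketch of a parser with that behaviour is shown below. It is an assumption-laden illustration rather than the PR's code: `parse_keeper_conf` and `get_parsed` are hypothetical names, the input is assumed to be already unescaped (real newlines instead of `\\n`), and repeated keys are collapsed by keeping the last value.

```rust
use std::collections::BTreeMap;
use std::str::FromStr;

/// Parse `key=value` lines into an ordered map.
/// A repeated key keeps its last value, so duplicates collapse to one entry.
pub fn parse_keeper_conf(raw: &str) -> Result<BTreeMap<String, String>, String> {
    let mut conf = BTreeMap::new();
    for line in raw.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // Split on the first `=` only, so values may themselves contain `=`.
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("expected key=value, got {line:?}"))?;
        conf.insert(key.to_string(), value.to_string());
    }
    Ok(conf)
}

/// Look up a key and convert it to a typed value.
/// Returns `None` if the key is absent or the value does not parse as `T`.
pub fn get_parsed<T: FromStr>(
    conf: &BTreeMap<String, String>,
    key: &str,
) -> Option<T> {
    conf.get(key)?.parse().ok()
}
```

A caller could then read individual settings with, for example, `get_parsed::<u16>(&conf, "tcp_port")`. A typed struct with one field per setting would catch missing keys at parse time, at the cost of needing an update whenever the keeper adds a setting.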
andrewjstone pushed a commit that referenced this issue Oct 11, 2024
## Overview

This commit makes a few changes to the way the `clickhouse_server` and `clickhouse_keeper` SMF services are launched:

- They are now disabled by default on zone boot.
- They are enabled by clickhouse-admin after the configuration files have been generated.

Related: #5999
This is a tracking issue for all the work related to deploying multi-node ClickHouse on a running system via reconfigurator. We expect each system to run both single-node and multi-node ClickHouse for a while, per RFD 468, and reconfigurator should allow us to transition between the two at runtime. We will likely also need to change RSS to deploy both single-node and multi-node ClickHouse to support this.
Reconfigurator:

- `OmicronZoneConfig` to support `ClickhouseServer` for replication #6298
- `ClickhouseAllocator`
- `ClickhouseAllocator` into blueprint and db queries

Sled-agent:

- `clickhouse_server` zone and move all replicated clickhouse related code there
- [reconfigurator] `clickhouse_server` SMF service and oximeter replicated mode #6343
- `clickhouse_server` SMF service and oximeter replicated mode #6343
- `clickhouse-admin` SMF service with dropshot server #6304