Reconfigurator: Tracking issue for multi-node clickhouse deployment #5999

Open
18 of 20 tasks
andrewjstone opened this issue Jul 3, 2024 · 1 comment


andrewjstone commented Jul 3, 2024

This is a tracking issue for all the work related to deploying multi-node ClickHouse on a running system via reconfigurator. Per RFD 468, we expect each system to run both single-node and multi-node ClickHouse for a while, and reconfigurator should allow us to transition between the two at runtime. To support this, we will likely also need to change RSS so that it can deploy both single-node and multi-node ClickHouse.

Reconfigurator:

Sled-agent:

andrewjstone added this to the 11 milestone Jul 3, 2024
karencfv added a commit that referenced this issue Aug 14, 2024
…6304)

## Overview

New SMF service in `clickhouse` and `clickhouse_keeper` zones which runs
a dropshot server. The API contains a single `/node/address` endpoint to
retrieve the node's listen address. Other endpoints will be added in
future PRs.

## Purpose

This server will be used to manage ClickHouse server and Keeper nodes.
For now it performs a single basic action to keep the size of this PR
small, but this server will perform other actions like generating the
XML config files, retrieving the state of the node etc.

## Testing

I've deployed locally with the following results:

```console
root@oxz_switch:~# curl http://[fd00:1122:3344:101::e]:8888/node/address
{"clickhouse_address":"[fd00:1122:3344:101::e]:8123"}
```

```console
root@oxz_clickhouse_2c213ff2:~# cat /var/svc/log/oxide-clickhouse-admin:default.log
[ Aug 14 06:54:42 Enabled. ]
[ Aug 14 06:54:42 Rereading configuration. ]
[ Aug 14 06:54:45 Rereading configuration. ]
[ Aug 14 06:54:46 Executing start method ("ctrun -l child -o noorphan,regent /opt/oxide/clickhouse-admin/bin/clickhouse-admin run -c /var/svc/manifest/site/clickhouse-admin/config.toml -a [fd00:1122:3344:101::e]:8123 -H [fd00:1122:3344:101::e]:8888 &"). ]
[ Aug 14 06:54:46 Method "start" exited with status 0. ]
note: configured to log to "/dev/stdout"
{"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-08-14T06:54:46.721122327Z","hostname":"oxz_clickhouse_2c213ff2-6544-4316-939f-b51749cf3222","pid":5169,"local_addr":"[fd00:1122:3344:101::e]:8888","component":"dropshot","file":"/home/coatlicue/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:205"}
{"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-08-14T06:56:17.908877036Z","hostname":"oxz_clickhouse_2c213ff2-6544-4316-939f-b51749cf3222","pid":5169,"local_addr":"[fd00:1122:3344:101::e]:8888","component":"dropshot","file":"/home/coatlicue/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:775","remote_addr":"[fd00:1122:3344:101::2]:37268"}
{"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-08-14T06:56:17.91734856Z","hostname":"oxz_clickhouse_2c213ff2-6544-4316-939f-b51749cf3222","pid":5169,"uri":"/node/address","method":"GET","req_id":"62a3d8fc-e37e-42aa-a715-52dbce8aa493","remote_addr":"[fd00:1122:3344:101::2]:37268","local_addr":"[fd00:1122:3344:101::e]:8888","component":"dropshot","file":"/home/coatlicue/.cargo/git/checkouts/dropshot-a4a923d29dccc492/52d900a/dropshot/src/server.rs:914","latency_us":3151,"response_code":"200"}
```
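
A client of this endpoint has to turn the JSON body into a usable socket address. The following is a minimal sketch of that step using only the standard library; `parse_node_address` is a hypothetical helper, not part of `clickhouse-admin`, and the hand-rolled string extraction only handles the flat response shape shown above. A real client would more likely use a generated API client or a JSON library.

```rust
use std::net::SocketAddrV6;

/// Extracts the string value of `clickhouse_address` from a flat JSON object
/// and parses it as an IPv6 socket address.
/// Illustration only: it does not handle escapes or nested objects.
fn parse_node_address(body: &str) -> Option<SocketAddrV6> {
    let key = "\"clickhouse_address\"";
    // Skip past the key, then past the colon that follows it.
    let rest = &body[body.find(key)? + key.len()..];
    let rest = rest.trim_start().strip_prefix(':')?.trim_start();
    // The value is a JSON string: take everything between the quotes.
    let rest = rest.strip_prefix('"')?;
    let value = &rest[..rest.find('"')?];
    value.parse().ok()
}
```

The function returns `None` both when the key is missing and when the value is not a bracketed IPv6 address with a port.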


Related: #5999
karencfv added a commit that referenced this issue Aug 20, 2024
…ted mode (#6343)

## Overview

This commit introduces a few changes: 
- a new `clickhouse_server` SMF service which runs the old "replicated"
mode from the `clickhouse` service
- a new `replicated` field in the oximeter configuration file, which is
consumed by the `oximeter` binary that runs the replicated SQL against a
database. The binary now connects to the listen address from
`ServiceName::ClickhouseServer` or `ServiceName::Clickhouse`, depending
on which zone has been deployed.
- a new `--clickhouse-topology` build target flag which builds artifacts
based on either a `single-node` or `replicated-cluster` setup. The
difference between the two is whether the `oximeter` SMF service is
executing the `oximeter` CLI with the `--replicated` flag or not.
__CAVEAT:__ It's still necessary to manually change the RSS [node count
constants](https://github.com/oxidecomputer/omicron/blob/ffc8807caf04ca3f81b543c520ddbe26b3284264/sled-agent/src/rack_setup/plan/service.rs#L57-L77)
to the specified amount for each clickhouse topology mode. This
requirement will be short lived as we are moving to use reconfigurator.
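
The relationship between the topology flag and what `oximeter` ends up doing can be sketched as follows. This is an illustration under assumptions, not the code from this commit: the `ClickhouseTopology` type and its helper methods are hypothetical, and the mapping of `single-node` to `ServiceName::Clickhouse` and of `replicated-cluster` to `ServiceName::ClickhouseServer` is my reading of the description above.

```rust
use std::str::FromStr;

/// The two values accepted by `--clickhouse-topology` in the text above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ClickhouseTopology {
    SingleNode,
    ReplicatedCluster,
}

impl FromStr for ClickhouseTopology {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "single-node" => Ok(Self::SingleNode),
            "replicated-cluster" => Ok(Self::ReplicatedCluster),
            other => Err(format!("unknown clickhouse topology: {other}")),
        }
    }
}

impl ClickhouseTopology {
    /// Extra arguments passed to the `oximeter` CLI (hypothetical helper).
    fn oximeter_extra_args(self) -> Vec<&'static str> {
        match self {
            Self::SingleNode => vec![],
            Self::ReplicatedCluster => vec!["--replicated"],
        }
    }

    /// Name of the service whose listen address oximeter would look up
    /// (assumed mapping, mirroring the `ServiceName` variants named above).
    fn service_name(self) -> &'static str {
        match self {
            Self::SingleNode => "Clickhouse",
            Self::ReplicatedCluster => "ClickhouseServer",
        }
    }
}
```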

## Usage

To run single-node ClickHouse, nothing changes; artifacts can be built
the same way as before.

To run replicated ClickHouse, set the [node count
constants](https://github.com/oxidecomputer/omicron/blob/ffc8807caf04ca3f81b543c520ddbe26b3284264/sled-agent/src/rack_setup/plan/service.rs#L57-L77)
to the specified amount, then create the build target as follows:

```console
$ cargo run --locked --release --bin omicron-package -- -t <NAME> target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster
    Finished `release` profile [optimized] target(s) in 1.03s
     Running `target/release/omicron-package -t <NAME> target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster`
Logging to: /home/coatlicue/src/omicron/out/LOG
Created new build target 'centzon' and set it as active
$ cargo run --locked --release --bin omicron-package -- -t <NAME> package
<...>
$ pfexec ./target/release/omicron-package -t <NAME> install
```

## Purpose

As laid out in [RFD 468](https://rfd.shared.oxide.computer/rfd/0468), to
roll out replicated ClickHouse we will need the ability to roll out
either replicated or single node ClickHouse for an undetermined amount
of time. This commit is a step in that direction. We need to have
separate services for running replicated or single-node ClickHouse
servers.

## Testing

Deployed omicron on a Helios box in both modes.

Single node:

```console
$ cargo run --locked --release --bin omicron-package -- -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled
    Finished `release` profile [optimized] target(s) in 0.94s
     Running `target/release/omicron-package -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled`
Logging to: /home/coatlicue/src/omicron/out/LOG
Created new build target 'centzon' and set it as active
$ cargo run --locked --release --bin omicron-package -- -t centzon package
<...>
$ pfexec ./target/release/omicron-package -t centzon install
Logging to: /home/coatlicue/src/omicron/out/LOG
$ zoneadm list | grep clickhouse
oxz_clickhouse_7ce86c8b-2c9e-4d02-a857-269cb0a99c2e
root@oxz_clickhouse_7ce86c8b:~# /opt/oxide/clickhouse/clickhouse client --host fd00:1122:3344:101::e
ClickHouse client version 23.8.7.1.
Connecting to fd00:1122:3344:101::e:9000 as user default.
Connected to ClickHouse server version 23.8.7 revision 54465.

oxz_clickhouse_7ce86c8b-2c9e-4d02-a857-269cb0a99c2e.local :) SHOW TABLES FROM oximeter

SHOW TABLES FROM oximeter

Query id: 5e91fafb-4d70-4a27-a188-75fb83bb7e5e

┌─name───────────────────────┐
│ fields_bool                │
│ fields_i16                 │
│ fields_i32                 │
│ fields_i64                 │
│ fields_i8                  │
│ fields_ipaddr              │
│ fields_string              │
│ fields_u16                 │
│ fields_u32                 │
│ fields_u64                 │
│ fields_u8                  │
│ fields_uuid                │
│ measurements_bool          │
│ measurements_bytes         │
│ measurements_cumulativef32 │
│ measurements_cumulativef64 │
│ measurements_cumulativei64 │
│ measurements_cumulativeu64 │
│ measurements_f32           │
│ measurements_f64           │
│ measurements_histogramf32  │
│ measurements_histogramf64  │
│ measurements_histogrami16  │
│ measurements_histogrami32  │
│ measurements_histogrami64  │
│ measurements_histogrami8   │
│ measurements_histogramu16  │
│ measurements_histogramu32  │
│ measurements_histogramu64  │
│ measurements_histogramu8   │
│ measurements_i16           │
│ measurements_i32           │
│ measurements_i64           │
│ measurements_i8            │
│ measurements_string        │
│ measurements_u16           │
│ measurements_u32           │
│ measurements_u64           │
│ measurements_u8            │
│ timeseries_schema          │
│ version                    │
└────────────────────────────┘

41 rows in set. Elapsed: 0.014 sec.

oxz_clickhouse_7ce86c8b-2c9e-4d02-a857-269cb0a99c2e.local :) SELECT * FROM oximeter.fields_i64

SELECT *
FROM oximeter.fields_i64

Query id: 4bbcec72-101f-4cf4-9966-680381f5b62c

┌─timeseries_name────────────────────────┬───────timeseries_key─┬─field_name──┬─field_value─┐
│ http_service:request_latency_histogram │  8326032694586838023 │ status_code │         200 │
<...>

$ pfexec zlogin oxz_oximeter_b235200f-f0ad-4218-9184-d995df5acaf0
[Connected to zone 'oxz_oximeter_b235200f-f0ad-4218-9184-d995df5acaf0' pts/3]
The illumos Project     helios-2.0.22784        July 2024
root@oxz_oximeter_b235200f:~# cat /var/svc/manifest/site/oximeter/config.toml 
# Example configuration file for running an oximeter collector server

[db]
batch_size = 1000
batch_interval = 5 # In seconds
replicated = false

[log]
level = "debug"
mode = "file"
path = "/dev/stdout"
if_exists = "append"
```

Replicated cluster:

```console
$ cargo run --locked --release --bin omicron-package -- -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster
    Finished `release` profile [optimized] target(s) in 1.03s
     Running `target/release/omicron-package -t centzon target create -i standard -m non-gimlet -s softnpu -r single-sled -c replicated-cluster`
Logging to: /home/coatlicue/src/omicron/out/LOG
Created new build target 'centzon' and set it as active
$ cargo run --locked --release --bin omicron-package -- -t centzon package
<...>
$ pfexec ./target/release/omicron-package -t centzon install
Logging to: /home/coatlicue/src/omicron/out/LOG
$ zoneadm list | grep clickhouse
oxz_clickhouse_keeper_73e7fda2-20af-4a90-9a61-c89ceed47c1a
oxz_clickhouse_server_74876663-5337-4d9b-85cb-99d1e88bdf8a
oxz_clickhouse_keeper_8eaac4f9-d9e0-4d56-b269-eab7da0c73a3
oxz_clickhouse_keeper_01f3a6af-5249-4dff-b9a4-f1076e467c9a
oxz_clickhouse_server_bc6010bf-507c-4b5a-ad4c-3a7af889a6c0
$ pfexec zlogin oxz_clickhouse_server_74876663-5337-4d9b-85cb-99d1e88bdf8a
[Connected to zone 'oxz_clickhouse_server_74876663-5337-4d9b-85cb-99d1e88bdf8a' pts/3]
The illumos Project     helios-2.0.22784        July 2024
root@oxz_clickhouse_server_74876663:~# /opt/oxide/clickhouse_server/clickhouse client --host fd00:1122:3344:101::e
ClickHouse client version 23.8.7.1.
Connecting to fd00:1122:3344:101::e:9000 as user default.
Connected to ClickHouse server version 23.8.7 revision 54465.

oximeter_cluster node 1 :) SHOW TABLES FROM oximeter

SHOW TABLES FROM oximeter

Query id: a5603063-1cbc-41a5-bfbd-33c986764e92

┌─name─────────────────────────────┐
│ fields_bool                      │
│ fields_bool_local                │
│ fields_i16                       │
│ fields_i16_local                 │
│ fields_i32                       │
│ fields_i32_local                 │
│ fields_i64                       │
│ fields_i64_local                 │
│ fields_i8                        │
│ fields_i8_local                  │
│ fields_ipaddr                    │
│ fields_ipaddr_local              │
│ fields_string                    │
│ fields_string_local              │
│ fields_u16                       │
│ fields_u16_local                 │
│ fields_u32                       │
│ fields_u32_local                 │
│ fields_u64                       │
│ fields_u64_local                 │
│ fields_u8                        │
│ fields_u8_local                  │
│ fields_uuid                      │
│ fields_uuid_local                │
│ measurements_bool                │
│ measurements_bool_local          │
│ measurements_bytes               │
│ measurements_bytes_local         │
│ measurements_cumulativef32       │
│ measurements_cumulativef32_local │
│ measurements_cumulativef64       │
│ measurements_cumulativef64_local │
│ measurements_cumulativei64       │
│ measurements_cumulativei64_local │
│ measurements_cumulativeu64       │
│ measurements_cumulativeu64_local │
│ measurements_f32                 │
│ measurements_f32_local           │
│ measurements_f64                 │
│ measurements_f64_local           │
│ measurements_histogramf32        │
│ measurements_histogramf32_local  │
│ measurements_histogramf64        │
│ measurements_histogramf64_local  │
│ measurements_histogrami16        │
│ measurements_histogrami16_local  │
│ measurements_histogrami32        │
│ measurements_histogrami32_local  │
│ measurements_histogrami64        │
│ measurements_histogrami64_local  │
│ measurements_histogrami8         │
│ measurements_histogrami8_local   │
│ measurements_histogramu16        │
│ measurements_histogramu16_local  │
│ measurements_histogramu32        │
│ measurements_histogramu32_local  │
│ measurements_histogramu64        │
│ measurements_histogramu64_local  │
│ measurements_histogramu8         │
│ measurements_histogramu8_local   │
│ measurements_i16                 │
│ measurements_i16_local           │
│ measurements_i32                 │
│ measurements_i32_local           │
│ measurements_i64                 │
│ measurements_i64_local           │
│ measurements_i8                  │
│ measurements_i8_local            │
│ measurements_string              │
│ measurements_string_local        │
│ measurements_u16                 │
│ measurements_u16_local           │
│ measurements_u32                 │
│ measurements_u32_local           │
│ measurements_u64                 │
│ measurements_u64_local           │
│ measurements_u8                  │
│ measurements_u8_local            │
│ timeseries_schema                │
│ timeseries_schema_local          │
│ version                          │
└──────────────────────────────────┘

81 rows in set. Elapsed: 0.010 sec. 

oximeter_cluster node 1 :) SELECT * FROM oximeter.fields_i64

SELECT *
FROM oximeter.fields_i64

Query id: 14f07468-0e33-4de1-8893-df3e11eb7660

┌─timeseries_name────────────────────────┬───────timeseries_key─┬─field_name──┬─field_value─┐
│ http_service:request_latency_histogram │   436117616059041516 │ status_code │         200 │
<...>

$ pfexec zlogin oxz_oximeter_bcba1c06-1ca5-49cf-b277-8c2387975274
[Connected to zone 'oxz_oximeter_bcba1c06-1ca5-49cf-b277-8c2387975274' pts/3]
The illumos Project     helios-2.0.22784        July 2024
root@oxz_oximeter_bcba1c06:~# cat /var/svc/manifest/site/oximeter/config.toml
# Example configuration file for running an oximeter collector server

[db]
batch_size = 1000
batch_interval = 5 # In seconds
replicated = true

[log]
level = "debug"
mode = "file"
path = "/dev/stdout"
if_exists = "append"

```

Related: #5999
@karencfv
Contributor

Related #6407

karencfv added a commit that referenced this issue Aug 28, 2024
## Overview

This commit adds a library to generate ClickHouse replica server and
keeper configuration files. A lot of the code in the
`clickhouse-admin/types` directory has been copied over from
[clickward](https://github.com/oxidecomputer/clickward), but there are
several additions and modifications:

- New `new()` and `default()` methods that set default Oxide values.
- Files are generated per node, rather than all in a single directory
as clickward does.

## Usage

To generate a replica server configuration file:

```rust
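use std::net::Ipv6Addr;
use std::str::FromStr;

use camino::Utf8PathBuf;

// Keeper nodes the replica server coordinates through.
let keepers = vec![
    KeeperNodeConfig::new("ff::01".to_string()),
    KeeperNodeConfig::new("ff::02".to_string()),
    KeeperNodeConfig::new("ff::03".to_string()),
];

// All replica servers in the cluster, including this one.
let servers = vec![
    ServerNodeConfig::new("ff::08".to_string()),
    ServerNodeConfig::new("ff::09".to_string()),
];

// `config_dir` is assumed to be a directory handle created by the caller.
let config = ClickhouseServerConfig::new(
    Utf8PathBuf::from(config_dir.path()),
    ServerId(1),
    Utf8PathBuf::from_str("./").unwrap(),
    Ipv6Addr::from_str("ff::08").unwrap(),
    keepers,
    servers,
);

// Write the XML configuration file for this node.
config.generate_xml_file().unwrap();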
```

To generate a keeper configuration file:

```rust
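use std::net::Ipv6Addr;
use std::str::FromStr;

use camino::Utf8PathBuf;

// Every member of the keeper Raft cluster.
let keepers = vec![
    RaftServerConfig::new(KeeperId(1), "ff::01".to_string()),
    RaftServerConfig::new(KeeperId(2), "ff::02".to_string()),
    RaftServerConfig::new(KeeperId(3), "ff::03".to_string()),
];

// `config_dir` is assumed to be a directory handle created by the caller.
let config = ClickhouseKeeperConfig::new(
    Utf8PathBuf::from(config_dir.path()),
    KeeperId(1),
    keepers,
    Utf8PathBuf::from_str("./").unwrap(),
    Ipv6Addr::from_str("ff::08").unwrap(),
);

// Write the XML configuration file for this node.
config.generate_xml_file().unwrap();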
```

## Purpose

As part of the work to roll out replicated ClickHouse, we'll need to
dynamically generate the node configuration files via the
`clickhouse-admin` service. This commit is part of the work necessary to
do so.
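
To give a feel for what generating a node configuration involves, here is a minimal sketch that renders the `<raft_configuration>` fragment of a keeper config from a list of Raft members. It is not the library's code: `RaftServer` and `raft_configuration_xml` are hypothetical names, the port 9234 used in the usage note is taken from sample output elsewhere in this thread, and hostnames are written without XML escaping.

```rust
/// One keeper in the Raft cluster (illustrative type, not the library's).
struct RaftServer {
    id: u64,
    hostname: String,
    port: u16,
}

/// Renders the `<raft_configuration>` section of a ClickHouse Keeper config.
/// Hostnames are inserted verbatim, so they must not contain XML metacharacters.
fn raft_configuration_xml(servers: &[RaftServer]) -> String {
    let mut xml = String::from("<raft_configuration>\n");
    for s in servers {
        xml.push_str("    <server>\n");
        xml.push_str(&format!("        <id>{}</id>\n", s.id));
        xml.push_str(&format!("        <hostname>{}</hostname>\n", s.hostname));
        xml.push_str(&format!("        <port>{}</port>\n", s.port));
        xml.push_str("    </server>\n");
    }
    xml.push_str("</raft_configuration>\n");
    xml
}
```

Calling it with three `RaftServer` values (ids 1 to 3, port 9234) yields one `<server>` element per keeper; each node would receive the same member list but its own `server_id` elsewhere in the file.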

Related: #5999, #3824
karencfv added a commit that referenced this issue Sep 5, 2024
…nfig files (#6476)

## Overview

This commit introduces two new endpoints to the `clickhouse-admin` API.
One to generate a ClickHouse server XML configuration file and the other
to generate a ClickHouse keeper XML configuration file.

## Purpose

"Reconfigurator" will need to call these endpoints to generate the
necessary files when bringing up new `clickhouse_server` and
`clickhouse_keeper` zones. They will also be necessary to update
existing zones with the keeper and server information from new zones.

## API structure

Going forward, the API endpoints will be structured as follows:

- `/keeper/` : For endpoints that act solely on keeper nodes.
- `/server/` : For endpoints that act solely on server nodes.
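
The prefix convention can be illustrated with a small classifier that maps a request path to the kind of node it targets. This is only a sketch of the naming scheme; `NodeKind` and `node_kind_for_path` are hypothetical and do not reflect how the dropshot server actually registers its endpoints.

```rust
/// The kind of node an endpoint acts on, derived from its path prefix.
#[derive(Debug, PartialEq, Eq)]
enum NodeKind {
    Keeper,
    Server,
}

/// Returns the node kind for paths under `/keeper/` or `/server/`,
/// and `None` for anything else.
fn node_kind_for_path(path: &str) -> Option<NodeKind> {
    // Look only at the first path segment after the leading slash.
    let first = path.trim_start_matches('/').split('/').next()?;
    match first {
        "keeper" => Some(NodeKind::Keeper),
        "server" => Some(NodeKind::Server),
        _ => None,
    }
}
```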

## Usage

To generate a server XML config file:

```console
$ curl -X PUT http://example/server/config \
-H "Content-Type: application/json" \
-d '{"generation": {GENERATION_NUM}, "settings": {"id": {ID}, "keepers": [{"ipv6|ipv4|domain_name": "{ADDRESS}"}], "remote_servers": [{"ipv6|ipv4|domain_name": "{ADDRESS}"}], "config_dir": "{CONFIG_PATH}", "datastore_path": "{DATA_PATH}", "listen_addr": "{LISTEN_ADDRESS}" }}'
```

To generate a keeper XML config file:

```console
$ curl -X PUT http://example/keeper/config \
-H "Content-Type: application/json" \
-d '{"generation": {GENERATION_NUM}, "settings": {"node_id": {ID}, "raft_servers": [{"id": {NODE_ID}, "host": {"ipv6|ipv4|domain_name": "{ADDRESS}"}}], "config_dir": "{CONFIG_PATH}", "datastore_path": "{DATA_PATH}", "listen_addr": "{LISTEN_ADDRESS}"}}'
```
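
Each host in these request bodies is a single-key object whose key names the address type. The sketch below builds the `raft_servers` array in that shape using only the standard library. The `Host` enum and both functions are hypothetical stand-ins for the real API types, the `domain_name` key follows the sample requests in the testing section of this commit, and domain names are not JSON-escaped here.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// A host as it appears in the request body: exactly one of three forms.
enum Host {
    Ipv6(Ipv6Addr),
    Ipv4(Ipv4Addr),
    DomainName(String),
}

impl Host {
    /// Serializes the host as a single-key JSON object.
    fn to_json(&self) -> String {
        match self {
            Host::Ipv6(addr) => format!(r#"{{"ipv6": "{addr}"}}"#),
            Host::Ipv4(addr) => format!(r#"{{"ipv4": "{addr}"}}"#),
            Host::DomainName(name) => format!(r#"{{"domain_name": "{name}"}}"#),
        }
    }
}

/// Builds the JSON array for `raft_servers` from (id, host) pairs.
fn raft_servers_json(servers: &[(u64, Host)]) -> String {
    let items: Vec<String> = servers
        .iter()
        .map(|(id, host)| format!(r#"{{"id": {id}, "host": {}}}"#, host.to_json()))
        .collect();
    format!("[{}]", items.join(", "))
}
```

Note that `Ipv6Addr` prints in canonical form, so an input of `ff::01` is emitted as `ff::1`.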

## Caveats

For the time being, the generation number isn't used for anything. In a
follow-up PR we will use it to keep track of the configuration version
to make sure we're not generating an incorrect XML configuration file.
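
One way such a check could work is to remember the last generation applied and reject any request that does not advance it. The sketch below is an assumption about the follow-up work, not a description of it: the `ConfigState` type and the "strictly newer" rule are hypothetical.

```rust
/// A configuration generation number (illustrative newtype).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Generation(u64);

/// Tracks the generation of the most recently generated config file.
struct ConfigState {
    current: Option<Generation>,
}

impl ConfigState {
    fn new() -> Self {
        Self { current: None }
    }

    /// Accepts `requested` only if it is strictly newer than the last one applied.
    fn try_advance(&mut self, requested: Generation) -> Result<(), String> {
        match self.current {
            Some(current) if requested <= current => Err(format!(
                "stale generation {} (current is {})",
                requested.0, current.0
            )),
            _ => {
                self.current = Some(requested);
                Ok(())
            }
        }
    }
}
```

With this rule, replaying an old request or repeating the current one leaves the stored generation unchanged and returns an error instead of overwriting the file.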

## Testing

Dropshot server:

```console
$ cargo run --bin=clickhouse-admin -- run -c ./smf/clickhouse-admin/config.toml -a [::1]:8888
   Compiling omicron-clickhouse-admin v0.1.0 (/Users/karcar/src/omicron/clickhouse-admin)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.52s
     Running `target/debug/clickhouse-admin run -c ./smf/clickhouse-admin/config.toml -a '[::1]:8888'`
note: configured to log to "/dev/stdout"
{"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:32:00.106783Z","hostname":"ixchel","pid":11318,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:205"}
{"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:32:17.626437Z","hostname":"ixchel","pid":11318,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:51896"}
Path { inner: GenerationNum { generation: Generation(123) } }
{"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:32:17.628323Z","hostname":"ixchel","pid":11318,"uri":"/node/server/generate-config/123","method":"post","req_id":"0b623f6b-441c-4418-9189-99504fbea5bf","remote_addr":"[::1]:51896","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":1044,"response_code":"201"}
{"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:34:07.829199Z","hostname":"ixchel","pid":11318,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:52117"}
Path { inner: GenerationNum { generation: Generation(123) } }
{"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-02T07:34:07.830355Z","hostname":"ixchel","pid":11318,"uri":"/node/keeper/generate-config/123","method":"post","req_id":"1c729e0c-04bf-4e75-af0a-b23eef40f20f","remote_addr":"[::1]:52117","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":985,"response_code":"201"}

```

Generate server XML file:

```console
$ curl -X put http://[::1]:8888/server/config -H "Content-Type: application/json" -d '{"generation": 3, "settings": {"id": 45, "keepers": [{"ipv6": "ff::01"}, {"ipv4": "127.0.0.1"}, {"domain_name": "hi.there"}], "remote_servers": [{"ipv6": "ff::08"}, {"ipv4": "127.0.0.2"}], "config_dir": "./", "datastore_path": "./", "listen_addr": "::1" }}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   858  100   602  100   256   210k  91789 --:--:-- --:--:-- --:--:--  418k
{
  "logger": {
    "level": "trace",
    "log": "./log/clickhouse.log",
    "errorlog": "./log/clickhouse.err.log",
    "size": 100,
    "count": 1
  },
  "macros": {
    "shard": 1,
    "replica": 45,
    "cluster": "oximeter_cluster"
  },
  "listen_host": "::1",
  "http_port": 8123,
  "tcp_port": 9000,
  "interserver_http_port": 9009,
  "remote_servers": {
    "cluster": "oximeter_cluster",
    "secret": "some-unique-value",
    "replicas": [
      {
        "host": {
          "ipv6": "ff::8"
        },
        "port": 9000
      },
      {
        "host": {
          "ipv4": "127.0.0.2"
        },
        "port": 9000
      }
    ]
  },
  "keepers": {
    "nodes": [
      {
        "host": {
          "ipv6": "ff::1"
        },
        "port": 9181
      },
      {
        "host": {
          "ipv4": "127.0.0.1"
        },
        "port": 9181
      },
      {
        "host": {
          "domain_name": "hi.there"
        },
        "port": 9181
      }
    ]
  },
  "data_path": "./data"
}
```

Generate keeper XML file:

```console
$ curl -X put http://[::1]:8888/keeper/config -H "Content-Type: application/json" -d '{"generation": 3, "settings": {"id": 1, "raft_servers": [{"id": 1, "host": {"ipv6": "ff::01"}}, {"id": 2, "host":{"ipv4": "127.0.0.1"}}, {"id": 3, "host":{"domain_name": "hi.there"}}], "config_dir": "./", "datastore_path": "./", "listen_addr": "::1" }}' | jq
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   820  100   568  100   252   301k   133k --:--:-- --:--:-- --:--:--  800k
{
  "logger": {
    "level": "trace",
    "log": "./log/clickhouse-keeper.log",
    "errorlog": "./log/clickhouse-keeper.err.log",
    "size": 100,
    "count": 1
  },
  "listen_host": "::1",
  "tcp_port": 9181,
  "server_id": 1,
  "log_storage_path": "./coordination/log",
  "snapshot_storage_path": "./coordination/snapshots",
  "coordination_settings": {
    "operation_timeout_ms": 10000,
    "session_timeout_ms": 30000,
    "raft_logs_level": "trace"
  },
  "raft_config": {
    "servers": [
      {
        "id": 1,
        "hostname": {
          "ipv6": "ff::1"
        },
        "port": 9234
      },
      {
        "id": 2,
        "hostname": {
          "ipv4": "127.0.0.1"
        },
        "port": 9234
      },
      {
        "id": 3,
        "hostname": {
          "domain_name": "hi.there"
        },
        "port": 9234
      }
    ]
  }
}
```


Related: #5999, #3824
karencfv added a commit that referenced this issue Sep 9, 2024
## Overview

This commit replaces the old replicated ClickHouse server and keeper
configuration templates with calls to the `clickhouse-admin` API that
generate said configuration files.

## Purpose

While the end goal is to have Nexus make the API calls to generate the
configuration files, we'd like to have a working implementation of the
`clickhouse-admin` API via the SMF services in the meantime. Calling the
API with `curl` is not what the finished work will look like; it is
simply the easiest way to get a working implementation for now.

## Testing

Deployed this branch on a Helios machine with the following results:

Replica 1
```console
root@oxz_clickhouse_server_a9d02cd3:~# /opt/oxide/clickhouse_server/clickhouse client --host fd00:1122:3344:101::f
ClickHouse client version 23.8.7.1.
Connecting to fd00:1122:3344:101::f:9000 as user default.
Connected to ClickHouse server version 23.8.7 revision 54465.

oximeter_cluster_1 :) SHOW TABLES FROM oximeter

SHOW TABLES FROM oximeter

Query id: 06867649-f49e-451f-b9f1-5a574e12ce5b

┌─name─────────────────────────────┐
│ fields_bool                      │
│ fields_bool_local                │
│ fields_i16                       │
│ fields_i16_local                 │
│ fields_i32                       │
│ fields_i32_local                 │
│ fields_i64                       │
│ fields_i64_local                 │
│ fields_i8                        │
│ fields_i8_local                  │
│ fields_ipaddr                    │
│ fields_ipaddr_local              │
│ fields_string                    │
│ fields_string_local              │
│ fields_u16                       │
│ fields_u16_local                 │
│ fields_u32                       │
│ fields_u32_local                 │
│ fields_u64                       │
│ fields_u64_local                 │
│ fields_u8                        │
│ fields_u8_local                  │
│ fields_uuid                      │
│ fields_uuid_local                │
│ measurements_bool                │
│ measurements_bool_local          │
│ measurements_bytes               │
│ measurements_bytes_local         │
│ measurements_cumulativef32       │
│ measurements_cumulativef32_local │
│ measurements_cumulativef64       │
│ measurements_cumulativef64_local │
│ measurements_cumulativei64       │
│ measurements_cumulativei64_local │
│ measurements_cumulativeu64       │
│ measurements_cumulativeu64_local │
│ measurements_f32                 │
│ measurements_f32_local           │
│ measurements_f64                 │
│ measurements_f64_local           │
│ measurements_histogramf32        │
│ measurements_histogramf32_local  │
│ measurements_histogramf64        │
│ measurements_histogramf64_local  │
│ measurements_histogrami16        │
│ measurements_histogrami16_local  │
│ measurements_histogrami32        │
│ measurements_histogrami32_local  │
│ measurements_histogrami64        │
│ measurements_histogrami64_local  │
│ measurements_histogrami8         │
│ measurements_histogrami8_local   │
│ measurements_histogramu16        │
│ measurements_histogramu16_local  │
│ measurements_histogramu32        │
│ measurements_histogramu32_local  │
│ measurements_histogramu64        │
│ measurements_histogramu64_local  │
│ measurements_histogramu8         │
│ measurements_histogramu8_local   │
│ measurements_i16                 │
│ measurements_i16_local           │
│ measurements_i32                 │
│ measurements_i32_local           │
│ measurements_i64                 │
│ measurements_i64_local           │
│ measurements_i8                  │
│ measurements_i8_local            │
│ measurements_string              │
│ measurements_string_local        │
│ measurements_u16                 │
│ measurements_u16_local           │
│ measurements_u32                 │
│ measurements_u32_local           │
│ measurements_u64                 │
│ measurements_u64_local           │
│ measurements_u8                  │
│ measurements_u8_local            │
│ timeseries_schema                │
│ timeseries_schema_local          │
│ version                          │
└──────────────────────────────────┘

81 rows in set. Elapsed: 0.005 sec. 

oximeter_cluster_1 :) SELECT * FROM oximeter.measurements_u64

SELECT *
FROM oximeter.measurements_u64

Query id: 2e13d330-8f0b-4346-afc0-ba3c21ea7674

┌─timeseries_name─────────────────────────┬───────timeseries_key─┬─────────────────────timestamp─┬─datum─┐
│ ddm_router:originated_tunnel_endpoints  │  2085026407707057203 │ 2024-09-09 07:16:47.241835734 │     0 │
│ ddm_router:originated_tunnel_endpoints  │  2085026407707057203 │ 2024-09-09 07:16:48.241091831 │     0 │
│ ddm_router:originated_tunnel_endpoints  │  2085026407707057203 │ 2024-09-09 07:16:49.241294398 │     0 │
<...>
```

Replica 2
```console
root@oxz_clickhouse_server_ba1601d3:~# /opt/oxide/clickhouse_server/clickhouse client --host fd00:1122:3344:101::e
ClickHouse client version 23.8.7.1.
Connecting to fd00:1122:3344:101::e:9000 as user default.
Connected to ClickHouse server version 23.8.7 revision 54465.

oximeter_cluster_2 :) SHOW TABLES FROM oximeter

SHOW TABLES FROM oximeter

Query id: 33dd1d4d-1596-44e3-90ea-c755a1e3ae24

┌─name─────────────────────────────┐
│ fields_bool                      │
│ fields_bool_local                │
│ fields_i16                       │
│ fields_i16_local                 │
│ fields_i32                       │
│ fields_i32_local                 │
│ fields_i64                       │
│ fields_i64_local                 │
│ fields_i8                        │
│ fields_i8_local                  │
│ fields_ipaddr                    │
│ fields_ipaddr_local              │
│ fields_string                    │
│ fields_string_local              │
│ fields_u16                       │
│ fields_u16_local                 │
│ fields_u32                       │
│ fields_u32_local                 │
│ fields_u64                       │
│ fields_u64_local                 │
│ fields_u8                        │
│ fields_u8_local                  │
│ fields_uuid                      │
│ fields_uuid_local                │
│ measurements_bool                │
│ measurements_bool_local          │
│ measurements_bytes               │
│ measurements_bytes_local         │
│ measurements_cumulativef32       │
│ measurements_cumulativef32_local │
│ measurements_cumulativef64       │
│ measurements_cumulativef64_local │
│ measurements_cumulativei64       │
│ measurements_cumulativei64_local │
│ measurements_cumulativeu64       │
│ measurements_cumulativeu64_local │
│ measurements_f32                 │
│ measurements_f32_local           │
│ measurements_f64                 │
│ measurements_f64_local           │
│ measurements_histogramf32        │
│ measurements_histogramf32_local  │
│ measurements_histogramf64        │
│ measurements_histogramf64_local  │
│ measurements_histogrami16        │
│ measurements_histogrami16_local  │
│ measurements_histogrami32        │
│ measurements_histogrami32_local  │
│ measurements_histogrami64        │
│ measurements_histogrami64_local  │
│ measurements_histogrami8         │
│ measurements_histogrami8_local   │
│ measurements_histogramu16        │
│ measurements_histogramu16_local  │
│ measurements_histogramu32        │
│ measurements_histogramu32_local  │
│ measurements_histogramu64        │
│ measurements_histogramu64_local  │
│ measurements_histogramu8         │
│ measurements_histogramu8_local   │
│ measurements_i16                 │
│ measurements_i16_local           │
│ measurements_i32                 │
│ measurements_i32_local           │
│ measurements_i64                 │
│ measurements_i64_local           │
│ measurements_i8                  │
│ measurements_i8_local            │
│ measurements_string              │
│ measurements_string_local        │
│ measurements_u16                 │
│ measurements_u16_local           │
│ measurements_u32                 │
│ measurements_u32_local           │
│ measurements_u64                 │
│ measurements_u64_local           │
│ measurements_u8                  │
│ measurements_u8_local            │
│ timeseries_schema                │
│ timeseries_schema_local          │
│ version                          │
└──────────────────────────────────┘

81 rows in set. Elapsed: 0.010 sec. 

oximeter_cluster_2 :) SELECT * FROM oximeter.measurements_u64

SELECT *
FROM oximeter.measurements_u64

Query id: 06da0f16-3055-47cb-9984-94dc78f99afc

┌─timeseries_name─────────────────────────┬───────timeseries_key─┬─────────────────────timestamp─┬─datum─┐
│ ddm_router:originated_tunnel_endpoints  │  2085026407707057203 │ 2024-09-09 07:22:02.443983562 │     0 │
│ ddm_router:originated_tunnel_endpoints  │  2085026407707057203 │ 2024-09-09 07:22:03.444346219 │     0 │
│ ddm_router:originated_tunnel_endpoints  │  2085026407707057203 │ 2024-09-09 07:22:04.444356384 │     0 │
<...>
```

Keeper 1

```console
root@oxz_clickhouse_keeper_8cb0de91:~# echo mntr | nc fd00:1122:3344:101::12 9181
zk_version      v23.8.7.1-lts-077df679bed122ad45c8b105d8916ccfec85ae64
zk_avg_latency  4
zk_max_latency  103
zk_min_latency  0
zk_packets_received     27769
zk_packets_sent 29290
zk_num_alive_connections        1
zk_outstanding_requests 0
zk_server_state leader
zk_znode_count  6535
zk_watch_count  83
zk_ephemerals_count     82
zk_approximate_data_size        2330794
zk_key_arena_size       1044480
zk_latest_snapshot_size 0
zk_followers    2
zk_synced_followers     2
```

Keeper 2

```console
root@oxz_clickhouse_keeper_a6c18bd2:~# echo mntr | nc fd00:1122:3344:101::10 9181
zk_version      v23.8.7.1-lts-077df679bed122ad45c8b105d8916ccfec85ae64
zk_avg_latency  10
zk_max_latency  139
zk_min_latency  0
zk_packets_received     22278
zk_packets_sent 23922
zk_num_alive_connections        1
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  7015
zk_watch_count  83
zk_ephemerals_count     82
zk_approximate_data_size        2512980
zk_key_arena_size       1044480
zk_latest_snapshot_size 0
```

Keeper 3

```console
root@oxz_clickhouse_keeper_45d3e6ef:~# echo mntr | nc fd00:1122:3344:101::11 9181
zk_version      v23.8.7.1-lts-077df679bed122ad45c8b105d8916ccfec85ae64
zk_avg_latency  0
zk_max_latency  0
zk_min_latency  0
zk_packets_received     0
zk_packets_sent 0
zk_num_alive_connections        0
zk_outstanding_requests 0
zk_server_state follower
zk_znode_count  7188
zk_watch_count  0
zk_ephemerals_count     82
zk_approximate_data_size        2575631
zk_key_arena_size       1044480
zk_latest_snapshot_size 0
```
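
The three `mntr` outputs above can be compared by hand, but the same check is easy to script. The following is a minimal sketch, not code from this commit: it assumes each `mntr` line is a key followed by whitespace and a value, and the names `parse_mntr` and `is_healthy_leader` are hypothetical.

```rust
use std::collections::BTreeMap;

/// Parse `mntr` output into a key -> value map.
/// Assumption: each line is a key, then a tab or spaces, then the value.
fn parse_mntr(output: &str) -> BTreeMap<String, String> {
    let mut stats = BTreeMap::new();
    for line in output.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        // Split at the first whitespace character; the value may be padded.
        if let Some((key, value)) = line.split_once(char::is_whitespace) {
            stats.insert(key.to_string(), value.trim().to_string());
        }
    }
    stats
}

/// True when the node reports itself as leader and all of its
/// followers are reported as synced.
fn is_healthy_leader(stats: &BTreeMap<String, String>) -> bool {
    stats.get("zk_server_state").map(String::as_str) == Some("leader")
        && stats.get("zk_followers").is_some()
        && stats.get("zk_followers") == stats.get("zk_synced_followers")
}
```

Fed the Keeper 1 output, `is_healthy_leader` returns true because `zk_followers` and `zk_synced_followers` are both 2. For Keepers 2 and 3 it returns false, since they report `follower` and do not print follower counts.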

Related: #5999
Closes: #3824
karencfv added a commit that referenced this issue Sep 19, 2024
## Overview

This commit introduces a new `clickhouse-admin` API endpoint:
`/keeper/lgif`.

This endpoint uses the ClickHouse CLI internally to retrieve and parse
the Keeper log information (the output of the `lgif` four-letter-word
command) from the ClickHouse keepers.

## Purpose

Reconfigurator will need this information to reliably manage and operate
a ClickHouse replicated cluster. Additional endpoints to retrieve other
information from ClickHouse servers or keepers will be added in
follow-up PRs.

## Testing

In addition to the unit tests, I have manually tested with the following
results:

```console
$ cargo run --bin=clickhouse-admin -- run -c ./smf/clickhouse-admin/config.toml -a [::1]:8888 -l [::1]:20001 -b /Users/karcar/src/omicron/out/clickhouse/clickhouse
   Compiling omicron-clickhouse-admin v0.1.0 (/Users/karcar/src/omicron/clickhouse-admin)
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 2.46s
     Running `target/debug/clickhouse-admin run -c ./smf/clickhouse-admin/config.toml -a '[::1]:8888' -l '[::1]:20001' -b /Users/karcar/src/omicron/out/clickhouse/clickhouse`
note: configured to log to "/dev/stdout"
{"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-12T02:37:19.383597Z","hostname":"ixchel","pid":3115,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:205"}
{"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-12T02:37:23.843325Z","hostname":"ixchel","pid":3115,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:54455"}
{"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-12T02:37:24.302588Z","hostname":"ixchel","pid":3115,"uri":"/keeper/lgif","method":"GET","req_id":"64b232d0-d6ac-4cae-8f0a-f14cf6d1dfba","remote_addr":"[::1]:54455","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":458301,"response_code":"200"}
```

```console
$ curl http://[::1]:8888/keeper/lgif
{"first_log_idx":1,"first_log_term":1,"last_log_idx":11717,"last_log_term":20,"last_committed_log_idx":11717,"leader_committed_log_idx":11717,"target_committed_log_idx":11717,"last_snapshot_idx":9465}
```
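
To illustrate the parsing step, here is a minimal sketch of turning raw `lgif` text into a typed value with the same fields as the JSON response above. It is not the implementation in this commit. The raw input format (one `name<TAB>value` pair per line) is an assumption, and `Lgif` and `parse_lgif` are hypothetical names.

```rust
use std::collections::HashMap;

/// Mirrors the fields of the JSON response shown above.
#[derive(Debug, PartialEq)]
struct Lgif {
    first_log_idx: u64,
    first_log_term: u64,
    last_log_idx: u64,
    last_log_term: u64,
    last_committed_log_idx: u64,
    leader_committed_log_idx: u64,
    target_committed_log_idx: u64,
    last_snapshot_idx: u64,
}

/// Parse whitespace-separated `name value` lines (assumed format).
fn parse_lgif(raw: &str) -> Result<Lgif, String> {
    let mut fields: HashMap<&str, u64> = HashMap::new();
    for line in raw.lines().filter(|l| !l.trim().is_empty()) {
        let mut parts = line.split_whitespace();
        let key = parts
            .next()
            .ok_or_else(|| format!("missing key in {line:?}"))?;
        let value = parts
            .next()
            .ok_or_else(|| format!("missing value for {key}"))?
            .parse::<u64>()
            .map_err(|e| format!("bad value for {key}: {e}"))?;
        fields.insert(key, value);
    }
    // Every field is required; report the first one that is absent.
    let get = |name: &str| {
        fields
            .get(name)
            .copied()
            .ok_or_else(|| format!("missing field {name}"))
    };
    Ok(Lgif {
        first_log_idx: get("first_log_idx")?,
        first_log_term: get("first_log_term")?,
        last_log_idx: get("last_log_idx")?,
        last_log_term: get("last_log_term")?,
        last_committed_log_idx: get("last_committed_log_idx")?,
        leader_committed_log_idx: get("leader_committed_log_idx")?,
        target_committed_log_idx: get("target_committed_log_idx")?,
        last_snapshot_idx: get("last_snapshot_idx")?,
    })
}
```

Returning an error for a missing or non-numeric field, rather than defaulting to zero, keeps a partial response from being mistaken for a real log index.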

Related: #5999
karencfv added a commit that referenced this issue Sep 23, 2024
## Overview

This commit implements a new clickhouse-admin endpoint to retrieve and
parse information from the ClickHouse virtual node `/keeper/config`
which contains the last committed cluster configuration.

## Purpose

The main purpose of retrieving this information is to have the ability
to populate the inventory's `raft_config` in
`ClickhouseKeeperClusterMembership`.


https://github.com/oxidecomputer/omicron/blob/453311a880075b9f89626bb20cca1c1cd85ffb4f/nexus/types/src/inventory.rs#L499-L503

In a follow-up PR, an endpoint will be added that specifically retrieves
all the information needed to populate `ClickhouseKeeperClusterMembership`.
This will be done by making several calls to the `clickhouse keeper-client`
and using the parsing function here to populate `raft_config`.

The endpoint itself will be useful to retrieve information for
debugging.

## Manual testing

```console
$ curl http://[::1]:8888/keeper/raft-config
{"keeper_servers":[{"server_id":1,"host":{"ipv6":"::1"},"raft_port":21001,"server_type":"participant","priority":1},{"server_id":2,"host":{"ipv6":"::1"},"raft_port":21002,"server_type":"participant","priority":1},{"server_id":3,"host":{"ipv6":"::1"},"raft_port":21003,"server_type":"participant","priority":1}]}
```
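
As a rough illustration of the parsing involved, the sketch below converts raft configuration lines into entries with the same fields as the JSON above. It is not the parsing function from this commit. It assumes each line looks like `server.<id>=<host>:<port>;<type>;<priority>`, which is not shown in this message, and it simplifies `host` to a plain string. `KeeperServer`, `parse_raft_line`, and `parse_raft_config` are hypothetical names.

```rust
/// One raft configuration entry, mirroring the JSON fields above.
/// `host` is kept as a plain string here for simplicity.
#[derive(Debug, PartialEq)]
struct KeeperServer {
    server_id: u64,
    host: String,
    raft_port: u16,
    server_type: String,
    priority: u16,
}

/// Parse one line such as `server.1=::1:21001;participant;1`.
/// The line format is an assumption, not taken from the commit message.
fn parse_raft_line(line: &str) -> Result<KeeperServer, String> {
    let line = line.trim();
    let (name, rest) = line
        .split_once('=')
        .ok_or_else(|| format!("no '=' in {line:?}"))?;
    let server_id = name
        .strip_prefix("server.")
        .ok_or_else(|| format!("unexpected key {name:?}"))?
        .parse::<u64>()
        .map_err(|e| format!("bad server id in {name:?}: {e}"))?;

    let mut parts = rest.split(';');
    let addr = parts.next().unwrap_or_default();
    // Split on the LAST ':' so IPv6 hosts like `::1` stay intact.
    let (host, port) = addr
        .rsplit_once(':')
        .ok_or_else(|| format!("no port in {addr:?}"))?;
    let raft_port = port
        .parse::<u16>()
        .map_err(|e| format!("bad port {port:?}: {e}"))?;
    let server_type = parts
        .next()
        .ok_or_else(|| format!("missing server type in {line:?}"))?;
    let priority = parts
        .next()
        .ok_or_else(|| format!("missing priority in {line:?}"))?
        .parse::<u16>()
        .map_err(|e| format!("bad priority in {line:?}: {e}"))?;

    Ok(KeeperServer {
        server_id,
        host: host.to_string(),
        raft_port,
        server_type: server_type.to_string(),
        priority,
    })
}

/// Parse every non-empty line, failing on the first malformed one.
fn parse_raft_config(raw: &str) -> Result<Vec<KeeperServer>, String> {
    raw.lines()
        .filter(|l| !l.trim().is_empty())
        .map(parse_raft_line)
        .collect()
}
```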

Related: #5999
karencfv added a commit that referenced this issue Sep 26, 2024
## Overview

This commit implements a new clickhouse-admin endpoint to retrieve and
parse information from the keeper node configuration.

## Purpose

The main purpose of retrieving this information is to have the ability
to populate the inventory's `queried_keeper` in
`ClickhouseKeeperClusterMembership`.


https://github.com/oxidecomputer/omicron/blob/453311a880075b9f89626bb20cca1c1cd85ffb4f/nexus/types/src/inventory.rs#L499-L503

In a follow-up PR, an endpoint will be added that specifically retrieves
all the information needed to populate `ClickhouseKeeperClusterMembership`.
This will be done by making several calls to the `clickhouse keeper-client`
and using the parsing function here to populate `queried_keeper`.

The endpoint itself will be useful to retrieve information for
debugging.

## Manual testing

```console
$ cargo run --bin=clickhouse-admin -- run -c ./smf/clickhouse-admin/config.toml -a [::1]:8888 -l [::1]:20001 -b ./out/clickhouse/clickhouse
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.43s
     Running `target/debug/clickhouse-admin run -c ./smf/clickhouse-admin/config.toml -a '[::1]:8888' -l '[::1]:20001' -b ./out/clickhouse/clickhouse`
note: configured to log to "/dev/stdout"
{"msg":"listening","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:39.529734Z","hostname":"ixchel","pid":61269,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:205"}
{"msg":"accepted connection","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:46.767686Z","hostname":"ixchel","pid":61269,"local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:775","remote_addr":"[::1]:57461"}
{"msg":"Retrieved data from `clickhouse keeper-client --q conf`","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:47.224265Z","hostname":"ixchel","pid":61269,"component":"ClickhouseCli","file":"clickhouse-admin/types/src/lib.rs:605","output":"\"server_id=1\\nenable_ipv6=true\\ntcp_port=20001\\nfour_letter_word_allow_list=conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro,rcvr,apiv,csnp,lgif,rqld,rclc,clrs,ftfl\\nmax_requests_batch_size=100\\nmin_session_timeout_ms=10000\\nsession_timeout_ms=30000\\noperation_timeout_ms=10000\\ndead_session_check_period_ms=500\\nheart_beat_interval_ms=500\\nelection_timeout_lower_bound_ms=1000\\nelection_timeout_upper_bound_ms=2000\\nreserved_log_items=100000\\nsnapshot_distance=100000\\nauto_forwarding=true\\nshutdown_timeout=5000\\nstartup_timeout=180000\\nraft_logs_level=trace\\nsnapshots_to_keep=3\\nrotate_log_storage_interval=100000\\nstale_log_gap=10000\\nfresh_log_gap=200\\nmax_requests_batch_size=100\\nmax_requests_batch_bytes_size=102400\\nmax_request_queue_size=100000\\nmax_requests_quick_batch_size=100\\nquorum_reads=false\\nforce_sync=true\\ncompress_logs=true\\ncompress_snapshots_with_zstd_format=true\\nconfiguration_change_tries_count=20\\nraft_limits_reconnect_limit=50\\nlog_storage_path=./deployment/keeper-1/coordination/log\\nlog_storage_disk=LocalLogDisk\\nsnapshot_storage_path=./deployment/keeper-1/coordination/snapshots\\nsnapshot_storage_disk=LocalSnapshotDisk\\n\\n\""}
{"msg":"request completed","v":0,"name":"clickhouse-admin","level":30,"time":"2024-09-24T23:15:47.22448Z","hostname":"ixchel","pid":61269,"uri":"/keeper/conf","method":"GET","req_id":"847f0baa-3b16-4273-a84a-fcfd5acd6b49","remote_addr":"[::1]:57461","local_addr":"[::1]:8888","component":"dropshot","file":"/Users/karcar/.cargo/git/checkouts/dropshot-a4a923d29dccc492/06c8dab/dropshot/src/server.rs:914","latency_us":455407,"response_code":"200"}
```

```console
$ curl http://[::1]:8888/keeper/conf
{"server_id":1,"enable_ipv6":true,"tcp_port":20001,"four_letter_word_allow_list":"conf,cons,crst,envi,ruok,srst,srvr,stat,wchs,dirs,mntr,isro,rcvr,apiv,csnp,lgif,rqld,rclc,clrs,ftfl","max_requests_batch_size":100,"min_session_timeout_ms":10000,"session_timeout_ms":30000,"operation_timeout_ms":10000,"dead_session_check_period_ms":500,"heart_beat_interval_ms":500,"election_timeout_lower_bound_ms":1000,"election_timeout_upper_bound_ms":2000,"reserved_log_items":100000,"snapshot_distance":100000,"auto_forwarding":true,"shutdown_timeout":5000,"startup_timeout":180000,"raft_logs_level":"trace","snapshots_to_keep":3,"rotate_log_storage_interval":100000,"stale_log_gap":10000,"fresh_log_gap":200,"max_requests_batch_bytes_size":102400,"max_request_queue_size":100000,"max_requests_quick_batch_size":100,"quorum_reads":false,"force_sync":true,"compress_logs":true,"compress_snapshots_with_zstd_format":true,"configuration_change_tries_count":20,"raft_limits_reconnect_limit":50,"log_storage_path":"./deployment/keeper-1/coordination/log","log_storage_disk":"LocalLogDisk","snapshot_storage_path":"./deployment/keeper-1/coordination/snapshots","snapshot_storage_disk":"LocalSnapshotDisk"}
```
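
The log line above shows the raw `conf` output as `key=value` lines, with `max_requests_batch_size` appearing twice, while the JSON response carries it once. A minimal sketch of that kind of parsing follows. It is not the implementation in this commit, and `parse_conf` and `get_parsed` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::str::FromStr;

/// Parse `key=value` lines into a map. Lines without '=' (such as the
/// trailing blank lines) are skipped, and a repeated key keeps its
/// last value.
fn parse_conf(raw: &str) -> BTreeMap<String, String> {
    raw.lines()
        .filter_map(|line| line.trim().split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// Look up a key and parse it into the requested type, returning
/// `None` if the key is absent or the value does not parse.
fn get_parsed<T: FromStr>(conf: &BTreeMap<String, String>, key: &str) -> Option<T> {
    conf.get(key)?.parse().ok()
}
```

For example, `get_parsed::<u64>(&conf, "server_id")` yields `Some(1)` for the output shown in the log, and `get_parsed::<bool>(&conf, "enable_ipv6")` yields `Some(true)`.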

Related: #5999
@davepacheco davepacheco modified the milestones: 11, 12 Oct 4, 2024
andrewjstone pushed a commit that referenced this issue Oct 11, 2024
## Overview

This commit makes a few changes to the way the `clickhouse_server` and
`clickhouse_keeper` SMF services are launched:

- They are now disabled by default on zone boot.
- They are enabled by clickhouse-admin after the configuration files
  have been generated.
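
The ordering described above (write the configuration first, enable the service second) might be sketched as follows. This is an illustration only, not the code in this commit: the function name, the config path, and the service FMRI are hypothetical, and the sketch only builds the `svcadm enable` command instead of running it.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::process::Command;

/// Write the generated configuration, then build the command that
/// would enable the (initially disabled) SMF service.
/// The caller decides when to run the returned command.
fn write_config_then_build_enable(
    config_path: &Path,
    config_xml: &str,
    fmri: &str, // hypothetical FMRI supplied by the caller
) -> io::Result<Command> {
    // Step 1: the config file must exist before the service starts.
    fs::write(config_path, config_xml)?;

    // Step 2: only after a successful write, prepare `svcadm enable`.
    let mut cmd = Command::new("svcadm");
    cmd.arg("enable").arg(fmri);
    Ok(cmd)
}
```

Because the write error is propagated with `?`, no enable command is produced if the configuration could not be written, which matches the intent of keeping the service disabled until its config exists.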

Related: #5999
@morlandi7 morlandi7 removed this from the 12 milestone Nov 22, 2024
@morlandi7 morlandi7 added this to the 13 milestone Nov 22, 2024