Scale out Clickhouse to a multinode cluster #3494

Merged Sep 5, 2023 (69 commits; changes shown from 63 commits).

Commits (69, all authored by karencfv):
0b21df2  Initial functional 2 replica 3 coordinator cluster (Jul 5, 2023)
3281cbd  Create config templates and pseudocode for updated init config (Jul 5, 2023)
b628be6  Dynamically build configs for servers and keepers (Jul 6, 2023)
cdfb1d1  Create a separate service for keepers (Jul 10, 2023)
41518c8  Update manifest and file location (Jul 11, 2023)
795327e  clean up (Jul 11, 2023)
73e5e22  make linter happy (Jul 11, 2023)
83449a2  Zone image is clickhouse-keeper.tar.gz not clickhouse_keeper.tar.gz (Jul 11, 2023)
38fb86d  Merge branch 'main' into ch-replicated-engine (Jul 19, 2023)
06b8f21  Only use underscores to simplify (Jul 20, 2023)
b5dd484  Merge remote-tracking branch 'upstream' into ch-replicated-engine (Jul 20, 2023)
ca7ad33  Create composite packages to include internal-dns tar (Jul 24, 2023)
bbbbd28  Get internal DNS working (Jul 25, 2023)
7b7b245  Add datastore to keeper service (Jul 26, 2023)
72cc038  Append default and custom configs (Jul 31, 2023)
34f370b  Give keepers dynamic discoverable IDs (Jul 31, 2023)
79dd329  Clean up scripts and configs (Aug 1, 2023)
94f8376  Clean up (Aug 1, 2023)
cea0612  First pass at making tests pass (Aug 1, 2023)
95be228  gargh linter (Aug 1, 2023)
e24a2dd  Add additional zpools for dev envs (Aug 2, 2023)
f6aac77  Add flag to internal-dns-cli to output host name only (Aug 2, 2023)
758fd39  Revert testing configuration and clean up (Aug 2, 2023)
ef914b1  Run oximeter on replicated or single node set ups (Aug 2, 2023)
7eb06dd  fmt (Aug 2, 2023)
1abe9dd  Merge branch 'main' into ch-replicated-engine (Aug 2, 2023)
e2a4060  Small fix after merge with main branch (Aug 3, 2023)
9c759e6  expectoration (Aug 3, 2023)
bc33e97  Address comments (Aug 4, 2023)
1ebcf14  fmt (Aug 4, 2023)
23df4ef  address review comments (Aug 7, 2023)
80eb1d1  Merge branch 'main' into ch-replicated-engine (Aug 8, 2023)
2b1edd9  save config env vars to file (Aug 8, 2023)
5fd1e75  fix scripts and configuration for bench gimlet (Aug 9, 2023)
3bda6b3  Explicitly declare if a database is single node or replicated (Aug 9, 2023)
2492ae8  foundation to test replicated nodes (Aug 9, 2023)
8541d17  Testing utils (Aug 10, 2023)
148dda9  Test replicated nodes (Aug 10, 2023)
cb0cd66  First try at testing (Aug 11, 2023)
b9e64cd  Keeper doesn't like absolute paths :( (Aug 11, 2023)
9d8d019  Get test keepers going (Aug 14, 2023)
81ab2ad  Make the test work (Aug 14, 2023)
a8a02d4  Correct way to check whether a replicated server is ready for connect… (Aug 14, 2023)
c116370  Clean up (Aug 14, 2023)
28354be  Rename test config directories (Aug 14, 2023)
9562f0e  fmt (Aug 15, 2023)
9520449  fix tests (Aug 15, 2023)
8872b09  Refine testing (Aug 16, 2023)
3af0769  Revert bench gimlet configuration and fmt (Aug 16, 2023)
f621b80  Bump clickhouse readyness testing timeout and make clippy happy (Aug 16, 2023)
e51bd0f  Merge branch 'main' into ch-replicated-engine (Aug 17, 2023)
298ca4e  Give end to end tests more time to bring up nexus (Aug 21, 2023)
1930e14  Merge branch 'main' into ch-replicated-engine (Aug 21, 2023)
e7a4635  Automatically detect whether ClickHouse set up is replicated or singl… (Aug 23, 2023)
b8ccf29  Works on my machine, increase timeout (Aug 23, 2023)
d83dc85  Merge branch 'main' into ch-replicated-engine (Aug 29, 2023)
691d9d5  Update CRDB with new service enums (Aug 29, 2023)
4a1c179  Disable replicated ClickHouse (Aug 30, 2023)
fe124fd  Make clippy happy (Aug 30, 2023)
7f67c7b  Merge branch 'main' into ch-replicated-engine (Aug 31, 2023)
251df8a  Small fix after merge (Aug 31, 2023)
c58c8fe  Revert e2e timeout duration (Aug 31, 2023)
4832dda  Address review comments (Aug 31, 2023)
c7e3598  make the linter happy (Aug 31, 2023)
0adffb7  Address comments (Sep 1, 2023)
98c705b  Create distributed tables (Sep 1, 2023)
77a3492  Stop forgetting to run cargo fmt before pushing the commit (Sep 1, 2023)
5933f51  Also don't forget about clippy :facepalm: (Sep 1, 2023)
6c82c8a  Small fix to referenced macro in SQL (Sep 5, 2023)
1 change: 1 addition & 0 deletions .github/buildomat/jobs/package.sh
@@ -104,6 +104,7 @@ ptime -m ./tools/build-global-zone-packages.sh "$tarball_src_dir" /work
mkdir -p /work/zones
zones=(
out/clickhouse.tar.gz
out/clickhouse_keeper.tar.gz
out/cockroachdb.tar.gz
out/crucible-pantry.tar.gz
out/crucible.tar.gz
1 change: 1 addition & 0 deletions common/src/address.rs
@@ -36,6 +36,7 @@ pub const PROPOLIS_PORT: u16 = 12400;
pub const COCKROACH_PORT: u16 = 32221;
pub const CRUCIBLE_PORT: u16 = 32345;
pub const CLICKHOUSE_PORT: u16 = 8123;
pub const CLICKHOUSE_KEEPER_PORT: u16 = 9181;
pub const OXIMETER_PORT: u16 = 12223;
pub const DENDRITE_PORT: u16 = 12224;
pub const DDMD_PORT: u16 = 8000;
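Note: 9181 appears to match ClickHouse Keeper's upstream default client port, so the new keeper zones listen on the stock port rather than a custom one.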
2 changes: 1 addition & 1 deletion dev-tools/src/bin/omicron-dev.rs
@@ -268,7 +268,7 @@ async fn cmd_clickhouse_run(args: &ChRunArgs) -> Result<(), anyhow::Error> {

// Start the database server process, possibly on a specific port
let mut db_instance =
dev::clickhouse::ClickHouseInstance::new(args.port).await?;
dev::clickhouse::ClickHouseInstance::new_single_node(args.port).await?;
println!(
"omicron-dev: running ClickHouse with full command:\n\"clickhouse {}\"",
db_instance.cmdline().join(" ")
18 changes: 16 additions & 2 deletions internal-dns-cli/src/bin/dnswait.rs
@@ -23,21 +23,31 @@ struct Opt {
#[clap(long, action)]
nameserver_addresses: Vec<SocketAddr>,

/// service name to be resolved (should be the target of a DNS name)
/// Service name to be resolved (should be the target of a DNS name)
#[arg(value_enum)]
srv_name: ServiceName,

/// Output service host names only, omitting the port
#[clap(long, short = 'H', action)]
hostname_only: bool,
}

#[derive(Debug, Clone, Copy, ValueEnum)]
#[value(rename_all = "kebab-case")]
enum ServiceName {
Cockroach,
Clickhouse,
ClickhouseKeeper,
}

impl From<ServiceName> for internal_dns::ServiceName {
fn from(value: ServiceName) -> Self {
match value {
ServiceName::Cockroach => internal_dns::ServiceName::Cockroach,
ServiceName::Clickhouse => internal_dns::ServiceName::Clickhouse,
ServiceName::ClickhouseKeeper => {
internal_dns::ServiceName::ClickhouseKeeper
}
}
}
}
@@ -91,7 +101,11 @@ async fn main() -> Result<()> {
.context("unexpectedly gave up")?;

for (target, port) in result {
println!("{}:{}", target, port)
if opt.hostname_only {
println!("{}", target)
} else {
println!("{}:{}", target, port)
}
}

Ok(())
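The new --hostname-only flag exists presumably because the generated keeper and replica configuration files reference peers by host name rather than by host:port pairs. A hypothetical invocation from a zone setup script, with the flag names as clap derives them from the options above and the nameserver arguments omitted for brevity:

dnswait clickhouse-keeper --hostname-only

With --hostname-only set, the loop at the end of main prints one resolved host name per line instead of target:port pairs.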
4 changes: 4 additions & 0 deletions internal-dns/src/config.rs
@@ -422,6 +422,10 @@ mod test {
#[test]
fn display_srv_service() {
assert_eq!(ServiceName::Clickhouse.dns_name(), "_clickhouse._tcp",);
assert_eq!(
ServiceName::ClickhouseKeeper.dns_name(),
"_clickhouse-keeper._tcp",
);
assert_eq!(ServiceName::Cockroach.dns_name(), "_cockroach._tcp",);
assert_eq!(ServiceName::InternalDns.dns_name(), "_nameservice._tcp",);
assert_eq!(ServiceName::Nexus.dns_name(), "_nexus._tcp",);
3 changes: 3 additions & 0 deletions internal-dns/src/names.rs
@@ -17,6 +17,7 @@ pub const DNS_ZONE_EXTERNAL_TESTING: &str = "oxide-dev.test";
#[derive(Clone, Debug, Hash, Eq, Ord, PartialEq, PartialOrd)]
pub enum ServiceName {
Clickhouse,
ClickhouseKeeper,
Cockroach,
InternalDns,
ExternalDns,
@@ -38,6 +39,7 @@ impl ServiceName {
fn service_kind(&self) -> &'static str {
match self {
ServiceName::Clickhouse => "clickhouse",
ServiceName::ClickhouseKeeper => "clickhouse-keeper",
ServiceName::Cockroach => "cockroach",
ServiceName::ExternalDns => "external-dns",
ServiceName::InternalDns => "nameservice",
@@ -61,6 +63,7 @@
pub(crate) fn dns_name(&self) -> String {
match self {
ServiceName::Clickhouse
| ServiceName::ClickhouseKeeper
| ServiceName::Cockroach
| ServiceName::InternalDns
| ServiceName::ExternalDns
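With these arms in place, keeper zones are advertised under the SRV service label _clickhouse-keeper._tcp within the internal DNS zone (presumably _clickhouse-keeper._tcp.control-plane.oxide.internal on a real rack), which is the name the dnswait change above resolves.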
2 changes: 1 addition & 1 deletion nexus/benches/setup_benchmark.rs
@@ -29,7 +29,7 @@ async fn do_crdb_setup() {
// Wraps exclusively the ClickhouseDB portion of setup/teardown.
async fn do_clickhouse_setup() {
let mut clickhouse =
dev::clickhouse::ClickHouseInstance::new(0).await.unwrap();
dev::clickhouse::ClickHouseInstance::new_single_node(0).await.unwrap();
clickhouse.cleanup().await.unwrap();
}

4 changes: 4 additions & 0 deletions nexus/db-model/src/dataset_kind.rs
@@ -19,6 +19,7 @@ impl_enum_type!(
Crucible => b"crucible"
Cockroach => b"cockroach"
Clickhouse => b"clickhouse"
ClickhouseKeeper => b"clickhouse_keeper"
ExternalDns => b"external_dns"
InternalDns => b"internal_dns"
);
@@ -35,6 +36,9 @@ impl From<internal_api::params::DatasetKind> for DatasetKind {
internal_api::params::DatasetKind::Clickhouse => {
DatasetKind::Clickhouse
}
internal_api::params::DatasetKind::ClickhouseKeeper => {
DatasetKind::ClickhouseKeeper
}
internal_api::params::DatasetKind::ExternalDns => {
DatasetKind::ExternalDns
}
2 changes: 1 addition & 1 deletion nexus/db-model/src/schema.rs
@@ -1130,7 +1130,7 @@ table! {
///
/// This should be updated whenever the schema is changed. For more details,
/// refer to: schema/crdb/README.adoc
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(3, 0, 3);
pub const SCHEMA_VERSION: SemverVersion = SemverVersion::new(4, 0, 0);

allow_tables_to_appear_in_same_query!(
system_update,
4 changes: 4 additions & 0 deletions nexus/db-model/src/service_kind.rs
@@ -18,6 +18,7 @@ impl_enum_type!(

// Enum values
Clickhouse => b"clickhouse"
ClickhouseKeeper => b"clickhouse_keeper"
Cockroach => b"cockroach"
Crucible => b"crucible"
CruciblePantry => b"crucible_pantry"
@@ -54,6 +55,9 @@ impl From<internal_api::params::ServiceKind> for ServiceKind {
internal_api::params::ServiceKind::Clickhouse => {
ServiceKind::Clickhouse
}
internal_api::params::ServiceKind::ClickhouseKeeper => {
ServiceKind::ClickhouseKeeper
}
internal_api::params::ServiceKind::Cockroach => {
ServiceKind::Cockroach
}
4 changes: 3 additions & 1 deletion nexus/test-utils/src/lib.rs
@@ -321,7 +321,9 @@ impl<'a, N: NexusServer> ControlPlaneTestContextBuilder<'a, N> {
let log = &self.logctx.log;
debug!(log, "Starting Clickhouse");
let clickhouse =
dev::clickhouse::ClickHouseInstance::new(0).await.unwrap();
dev::clickhouse::ClickHouseInstance::new_single_node(0)
.await
.unwrap();
let port = clickhouse.port();

let zpool_id = Uuid::new_v4();
5 changes: 4 additions & 1 deletion nexus/tests/integration_tests/oximeter.rs
@@ -110,7 +110,10 @@ async fn test_oximeter_reregistration() {
);
let client =
oximeter_db::Client::new(ch_address.into(), &context.logctx.log);
client.init_db().await.expect("Failed to initialize timeseries database");
client
.init_single_node_db()
.await
.expect("Failed to initialize timeseries database");

// Helper to retrieve the timeseries from ClickHouse
let timeseries_name = "integration_target:integration_metric";
4 changes: 4 additions & 0 deletions nexus/types/src/internal_api/params.rs
@@ -125,6 +125,7 @@ pub enum DatasetKind {
Crucible,
Cockroach,
Clickhouse,
ClickhouseKeeper,
ExternalDns,
InternalDns,
}
@@ -136,6 +137,7 @@ impl fmt::Display for DatasetKind {
Crucible => "crucible",
Cockroach => "cockroach",
Clickhouse => "clickhouse",
ClickhouseKeeper => "clickhouse_keeper",
ExternalDns => "external_dns",
InternalDns => "internal_dns",
};
@@ -168,6 +170,7 @@ pub struct ServiceNic {
#[serde(rename_all = "snake_case", tag = "type", content = "content")]
pub enum ServiceKind {
Clickhouse,
ClickhouseKeeper,
Cockroach,
Crucible,
CruciblePantry,
@@ -186,6 +189,7 @@ impl fmt::Display for ServiceKind {
use ServiceKind::*;
let s = match self {
Clickhouse => "clickhouse",
ClickhouseKeeper => "clickhouse_keeper",
Cockroach => "cockroach",
Crucible => "crucible",
ExternalDns { .. } => "external_dns",
15 changes: 15 additions & 0 deletions openapi/nexus-internal.json
@@ -922,6 +922,7 @@
"crucible",
"cockroach",
"clickhouse",
"clickhouse_keeper",
"external_dns",
"internal_dns"
]
@@ -2803,6 +2804,20 @@
"type"
]
},
{
"type": "object",
"properties": {
"type": {
"type": "string",
"enum": [
"clickhouse_keeper"
]
}
},
"required": [
"type"
]
},
{
"type": "object",
"properties": {
33 changes: 33 additions & 0 deletions openapi/sled-agent.json
@@ -1091,6 +1091,20 @@
"type"
]
},
{
"type": "object",
"properties": {
"type": {
"type": "string",
"enum": [
"clickhouse_keeper"
]
}
},
"required": [
"type"
]
},
{
"type": "object",
"properties": {
@@ -2524,6 +2538,24 @@
"type"
]
},
{
"type": "object",
"properties": {
"address": {
"type": "string"
},
"type": {
"type": "string",
"enum": [
"clickhouse_keeper"
]
}
},
"required": [
"address",
"type"
]
},
{
"type": "object",
"properties": {
@@ -3115,6 +3147,7 @@
"type": "string",
"enum": [
"clickhouse",
"clickhouse_keeper",
"cockroach_db",
"crucible_pantry",
"crucible",
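For illustration only (not taken from this PR), a sled-agent request using the new variant would carry a payload shaped like {"type": "clickhouse_keeper", "address": "[fd00:1122:3344:101::e]:9181"}: both fields are required by the schema fragment above, the address value is hypothetical, and the port matches CLICKHOUSE_KEEPER_PORT from common/src/address.rs.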
7 changes: 6 additions & 1 deletion oximeter/collector/src/lib.rs
@@ -321,7 +321,12 @@ impl OximeterAgent {
)
};
let client = Client::new(db_address, &log);
client.init_db().await?;
let replicated = client.is_oximeter_cluster().await?;
if !replicated {
client.init_single_node_db().await?;
} else {
client.init_replicated_db().await?;
}

// Spawn the task for aggregating and inserting all metrics
tokio::spawn(async move {
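The branching above is the collector-side contract introduced by this PR: probe the server, then initialize the matching schema. A minimal sketch of the same pattern for any other consumer of the client, assuming only the three Client methods shown in this diff and that the client's error type composes with anyhow:

use anyhow::Context;

// Pick the replicated or single-node timeseries schema based on what
// the ClickHouse server reports about itself.
async fn init_timeseries_db(client: &oximeter_db::Client) -> anyhow::Result<()> {
    if client.is_oximeter_cluster().await.context("probing cluster mode")? {
        client.init_replicated_db().await.context("initializing replicated db")
    } else {
        client.init_single_node_db().await.context("initializing single-node db")
    }
}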
8 changes: 4 additions & 4 deletions oximeter/db/src/bin/oxdb.rs
@@ -148,7 +148,7 @@ async fn make_client(
let address = SocketAddr::new(address, port);
let client = Client::new(address, &log);
client
.init_db()
.init_single_node_db()
Review comment (Collaborator): What happens if the server this client connects to has created the database on a cluster, instead of single node?

Reply (karencfv, PR author): hmmmm, I'll try it out and report back

.await
.context("Failed to initialize timeseries database")?;
Ok(client)
@@ -261,13 +261,13 @@ async fn populate(
Ok(())
}

async fn wipe_db(
async fn wipe_single_node_db(
address: IpAddr,
port: u16,
log: Logger,
) -> Result<(), anyhow::Error> {
let client = make_client(address, port, &log).await?;
client.wipe_db().await.context("Failed to wipe database")
client.wipe_single_node_db().await.context("Failed to wipe database")
}

async fn query(
@@ -313,7 +313,7 @@ async fn main() {
.unwrap();
}
Subcommand::Wipe => {
wipe_db(args.address, args.port, log).await.unwrap()
wipe_single_node_db(args.address, args.port, log).await.unwrap()
}
Subcommand::Query {
timeseries_name,