
Initial work for replicated ClickHouse testing #4149

Merged
merged 18 commits into oxidecomputer:main from oximeter-test-refactor on Oct 26, 2023

Conversation

karencfv
Contributor

Structural changes

Instead of spinning up a ClickHouse server per test, now only two servers are spun up to run all tests against: one for single node mode and another for replicated testing. Even though it may look odd to do this now, there is a good reason for it. ClickHouse servers that have been configured to be part of a cluster cannot be spun up in the same server as single node servers. This means that if we spin up several servers, the replicated test becomes incredibly flaky. It's important to note that this setup will not live forever. Once we can remove single node mode and start the replicated ClickHouse cluster as part of the build process, the replicated tests will be able to run independently.

All tests now run sequentially against a single server. For correctness, after each test the database is wiped out to give us a clean slate for testing. This only adds a few more seconds (<10s) to the overall testing time. Again, this will not be the final state of things, but it is necessary to be able to test alongside replicated clusters until we remove single node functionality.
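
As a rough sketch of the pattern (placeholder types and case names below, not the actual test code):

```rust
// Sketch only: every case runs against the same shared ClickHouse server,
// and the database is wiped before the next case starts.
struct TestDb;

impl TestDb {
    async fn run_case(&self, name: &str) {
        // The real tests insert samples, query schema, etc.; this just marks
        // where each case would execute against the shared server.
        println!("running `{name}` against the shared ClickHouse server");
    }

    async fn wipe(&self) {
        // In the real client this is a DROP DATABASE followed by re-running
        // the init SQL, giving the next case a clean slate.
    }
}

async fn run_single_node_tests(db: &TestDb) {
    // Hypothetical case names, for illustration only.
    for case in ["insert_samples", "list_schema", "field_queries"] {
        db.run_case(case).await;
        db.wipe().await; // adds only a few seconds (<10s) overall
    }
}
```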

Caveat

GitHub does some really weird things with the diffs on oximeter/db/src/client.rs. Frankly, it looks like a mess. I would probably check out the branch itself to see the final state of the tests.

Follow up work

Given the size of this PR, I've not added all of the tests to run against the replicated cluster. There will be a follow up PR that does this.

Related: #4001
Closes: #4037

@bnaecker
Collaborator

Thanks for starting on this @karencfv!

Can you explain in more detail what you mean when you say the tests become "flaky"? It's been really useful to have each test run against an independent database, and so I think we should try to keep that insofar as possible.

ClickHouse servers that have been configured to be part of a cluster cannot be spun up in the same server as single node servers.

This part is a bit confusing to me. The documentation indicates that it's possible to have replicated and non-replicated tables in the same server. Each server also watches its configuration file for changes, and so one can modify the cluster in various ways after each program starts. From that last link:

The server tracks changes in config files, as well as files and ZooKeeper nodes that were used when performing substitutions and overrides, and reloads the settings for users and clusters on the fly. This means that you can modify the cluster, users, and their settings without restarting the server.

Given all that, and the fact that serial_test doesn't work with cargo nextest, it seems like we really want to try to support multiple simultaneous clusters. One way to do that would be:

  • Start each program, keeper and data server, independently. The configuration file would not specify any details of the cluster, such as port numbers or replica IDs.
  • For each program, collect any necessary details such as OS-assigned port numbers, after the program begins.
  • Write the cluster details into the configuration file for each program.

If I'm understanding all the documentation correctly, this would let us start each server and then describe the cluster after the OS has assigned port numbers. We can avoid hard-coding anything, and keep running the tests in parallel.
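
As a sketch of the last two steps (assuming we've already collected the OS-assigned ports from each data server somehow, e.g. by parsing its log output), writing out the cluster description could look roughly like this; the struct and file layout are made up, but the `<remote_servers>` fragment is the standard ClickHouse cluster description:

```rust
use std::{fs, io};

// Hypothetical: ports discovered only after each server has started.
struct ReplicaPorts {
    host: String,
    tcp_port: u16,
}

// Render a `<remote_servers>` fragment describing one shard with N replicas,
// so the cluster is described *after* startup instead of hard-coded up front.
fn render_cluster_xml(cluster: &str, replicas: &[ReplicaPorts]) -> String {
    let mut xml = format!("<remote_servers>\n  <{cluster}>\n    <shard>\n");
    for r in replicas {
        xml.push_str(&format!(
            "      <replica>\n        <host>{}</host>\n        <port>{}</port>\n      </replica>\n",
            r.host, r.tcp_port
        ));
    }
    xml.push_str(&format!("    </shard>\n  </{cluster}>\n</remote_servers>\n"));
    xml
}

// The server watches its config files, so rewriting this fragment should be
// picked up without restarting anything (per the docs quoted above).
fn write_cluster_config(path: &str, xml: &str) -> io::Result<()> {
    fs::write(path, xml)
}
```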

@@ -610,7 +610,7 @@ impl DbWrite for Client {
/// Wipe the ClickHouse database entirely from a replicated set up.
async fn wipe_replicated_db(&self) -> Result<(), Error> {
Collaborator

This makes me realize something. I'm not sure we want the database initialization to be done by the client at all. Consider the case where we start 2 data servers, but only one instance of oximeter. The ClickHouse client that oximeter instance uses will initialize the one replica that it happens to connect to, but there's nothing to initialize the other replica.

I'm really thinking that we would like a way to manage the cluster more holistically. Rather than starting individual server programs, and relying on the client to run the SQL initializing the tables, it might help to have a notion of a ClickHouse deployment. This could describe the IP addresses and ports (if they're known); the replica IDs; and probably the SQL we want to run at startup. It might support single and multi-node, if we still want that.

I would imagine a "runner" program would be helpful here too, which would start the ClickHouse program itself; write out or update the XML based on the cluster configuration; and probably also initialize the database tables itself, again based on the cluster deployment. This would obviate the need for the client to initialize the tables at all.

This is all pretty pie-in-the-sky at this point. But I'm reasonably certain we can strip down the configuration to the minimal set; write out the needed XML; and initialize the database tables, with everything derived from the cluster configuration.
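
To make that a bit more concrete, here's the rough shape I'm imagining (all names hypothetical):

```rust
use std::net::SocketAddr;

// Hypothetical "deployment" description a runner could consume: enough to
// derive each node's XML config plus the SQL to run once the cluster is up.
struct ClickhouseDeployment {
    cluster_name: String,
    keepers: Vec<Node>,
    data_servers: Vec<Node>,
    // Table/database definitions to run once against one replica after
    // everything reports healthy.
    init_sql: Vec<String>,
}

struct Node {
    id: u32,
    listen: SocketAddr,
}

impl ClickhouseDeployment {
    // A runner would call something like this per node: render its config,
    // start the binary, and finally run `init_sql` exactly once.
    fn config_for_data_server(&self, id: u32) -> Option<String> {
        self.data_servers.iter().find(|n| n.id == id).map(|n| {
            format!("<!-- config for replica {} listening on {} -->", n.id, n.listen)
        })
    }
}
```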

Contributor Author

That sounds like a good idea! I'll link this comment to the relevant issue

Collaborator

So one additional note to add here, from a conversation I had with @davepacheco. It's probably a good idea in the long run to have this front-end runner have very little configuration itself, and instead get all of that from Nexus. It would only need to know how to reach Nexus (via DNS) and probably an ID of some kind, and then hit an endpoint in Nexus with that ID. That request could return the configuration that node needs to run (addresses, ports, whether to start keepers and/or data servers, etc.).
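
Purely as a sketch of the shape (none of this is a real Nexus API today), the configuration the runner fetches might carry something like:

```rust
use std::net::SocketAddr;

// Hypothetical response body the runner could fetch from Nexus at startup,
// knowing only a Nexus address (via DNS) and its own ID.
struct ClickhouseNodeConfig {
    // Whether this node should run a keeper, a data server, or both.
    run_keeper: bool,
    run_data_server: bool,
    listen_address: SocketAddr,
    keeper_id: Option<u32>,
    replica_id: Option<u32>,
    // Peers needed to render this node's keeper/cluster config.
    keeper_peers: Vec<SocketAddr>,
    replica_peers: Vec<SocketAddr>,
}
```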

@karencfv
Contributor Author

karencfv commented Sep 27, 2023

Thanks for the review @bnaecker!

Can you explain in more detail what you mean when you say the tests become "flaky"? It's been really useful to have each test run against an independent database, and so I think we should try to keep that insofar as possible.

ClickHouse servers that have been configured to be part of a cluster cannot be spun up in the same server as single node servers.

This part is a bit confusing to me. The documentation indicates that it's possible to have replicated and non-replicated tables in the same server. Each server also watches its configuration file for changes, and so one can modify the cluster in various ways after each program starts. From that last link:

The server tracks changes in config files, as well as files and ZooKeeper nodes that were used when performing substitutions and overrides, and reloads the settings for users and clusters on the fly. This means that you can modify the cluster, users, and their settings without restarting the server.

The thing is, it's the servers themselves that clash. Single node ClickHouse servers cannot run alongside those that are part of a cluster on the same host (I've tried this manually and the clustered servers just hang and don't start up). This means that if we are spinning up several single node data servers in parallel, the moment the replicated cluster begins to be built, it may or may not start up if the single node tests are still running (this is what was happening on the Ubuntu build-and-test job).

Given all that, and the fact that serial_test doesn't work with cargo nextest, it seems like we really want to try to support multiple simultaneous clusters. One way to do that would be:

  • Start each program, keeper and data server, independently. The configuration file would not specify any details of the cluster, such as port numbers or replica IDs.
  • For each program, collect any necessary details such as OS-assigned port numbers, after the program begins.
  • Write the cluster details into the configuration file for each program.

If I'm understanding all the documentation correctly, this would let us start each server and then describe the cluster after the OS has assigned port numbers. We can avoid hard-coding anything, and keep running the tests in parallel.

I would absolutely love to do that, and thought of several approaches, but the issue here is that the moment we port all of the current tests to replicated clusters, the test times are going to go through the roof. Unlike single node servers, the ClickHouse cluster takes much longer to spin up. Because of this, it's not as feasible to have a bunch of clusters being spun up. My plan for this is the following:

  1. Until we can remove all single node functionality, spin up only two ClickHouse deployments (one single node, one clustered) and run tests serially against them.
  2. When we remove single node functionality, spin up a single ClickHouse cluster as a pre-test task which we can run all tests against in parallel if possible.

What do you think?

@bnaecker
Collaborator

The thing is, it's the servers themselves that clash. Single node ClickHouse servers cannot run alongside those that are part of a cluster on the same host (I've tried this manually and the clustered servers just hang and don't start up).

Do we understand what's actually causing the servers that are part of the cluster to start slowly?

Unlike single node servers, The ClickHouse cluster takes much longer to spin up

How much longer? The real question I think is whether the additional startup time is more than the cost of running the tests serially against a single cluster.

@karencfv
Contributor Author

The thing is, it's the servers themselves that clash. Single node ClickHouse servers cannot run alongside those that are part of a cluster on the same host (I've tried this manually and the clustered servers just hang and don't start up).

Do we understand what's actually causing the servers that are part of the cluster to start slowly?

Ha! There was a rogue port that was being set automatically in the single node servers, which I can tweak in the clustered config. The two tests don't need to run serially anymore.

Unlike single node servers, The ClickHouse cluster takes much longer to spin up

How much longer? The real question I think is whether the additional startup time is more than the cost of running the tests serially against a single cluster.

Quite a bit: a single node server takes ~0.537s to start up, whereas a cluster takes ~3.128s, as it checks that all keeper/server nodes are ready and listening for connections.

Additionally, I just realised that we would have an issue with ports as well. While we can have OS-assigned HTTP ports for single node servers, clustered servers have several other ports that we need to define and share across configs (TCP, interserver port, keeper ports, etc.). I can see this port allocation becoming quite complex and potentially unreliable if it were to be set dynamically.

That said, I believe we can find a way to run all of these tests in parallel once we work on #3982. When the init SQL is dynamically created, it will give us greater flexibility to test. For example: To avoid consistency issues, I can see us using a different database name for each test (e.g. CREATE DATABASE IF NOT EXISTS oximeter_test_h5f74 ON CLUSTER oximeter_cluster;)
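
As a rough sketch of that idea (the suffix scheme and helper names here are made up):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static TEST_DB_COUNTER: AtomicU64 = AtomicU64::new(0);

// Derive a unique database name per test, e.g. `oximeter_test_12345_0`.
fn unique_test_db_name() -> String {
    let n = TEST_DB_COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("oximeter_test_{}_{}", std::process::id(), n)
}

// Each test would create its own database on the cluster up front...
fn create_db_sql(db: &str, cluster: &str) -> String {
    format!("CREATE DATABASE IF NOT EXISTS {db} ON CLUSTER {cluster};")
}

// ...and drop it on the way out, so tests never see each other's data.
fn drop_db_sql(db: &str, cluster: &str) -> String {
    format!("DROP DATABASE IF EXISTS {db} ON CLUSTER {cluster};")
}
```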

@karencfv karencfv requested a review from bnaecker September 28, 2023 02:32
@karencfv
Contributor Author

Heya @bnaecker 👋 😄

Just wanted to follow up on this and see if you have any thoughts on my last comment?

@bnaecker bnaecker left a comment

I've a few more questions / comments, but I think this is mostly good to go!


async fn test_build_replicated() {
let log = slog::Logger::root(slog::Discard, o!());
// Tests that data can be inserted via the client
insert_samples_test(address, &log).await.unwrap();
Collaborator

IIUC, a lot of this diff is moving the implementation of the tests themselves into one actual #[test] function. I was under the impression that was to work around an issue with port conflicts for the servers. Has that conflict been addressed? Do we still need to have all tests inside a single #[test]?

Contributor Author

What I have in mind here is to get the single node tests ready to run against the replicated cluster.

Until single node ClickHouse is removed entirely, and the init SQL file is created dynamically, the replicated cluster tests will still need to run serially. This means that if we want to run the same tests on both single node and replicated clusters, it'll have to be this way temporarily.

In a follow up PR to this one, I will be running every one of these tests against the replicated cluster. I divided this work into two PRs so it doesn't become a monster like my last PR 😅

Comment on lines +2441 to +2444

// TODO(https://github.com/oxidecomputer/omicron/issues/4001): With distributed
// engine, it can take a long time to sync the data. This means it's tricky to
// test that the data exists on both nodes.
Collaborator

Looks like we have some control over this from here, which might be helpful in test contexts.
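
To be clear, I'm only guessing at which knobs that page covers, but ClickHouse does have ways to force distributed data to land before asserting on it, e.g. the insert_distributed_sync setting or SYSTEM FLUSH DISTRIBUTED. A tiny sketch of the latter:

```rust
// Assumption: `SYSTEM FLUSH DISTRIBUTED` pushes any data queued behind a
// Distributed table out to the remote nodes, so a test could run it before
// checking that both replicas hold the rows. Whether this is the mechanism
// the linked docs describe is a guess.
fn flush_distributed_sql(table: &str) -> String {
    format!("SYSTEM FLUSH DISTRIBUTED {table}")
}
```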

Contributor Author

Nice! I'd like to punt on this item until my follow up PR, which will deal more directly with replicated testing if that's OK with you

@karencfv
Contributor Author

Thanks for the review @bnaecker! I think I've addressed all of your comments, let me know what you think :)

@karencfv karencfv requested a review from bnaecker October 25, 2023 02:41
@karencfv karencfv merged commit 3c6ae58 into oxidecomputer:main Oct 26, 2023
15 checks passed
@karencfv karencfv deleted the oximeter-test-refactor branch October 26, 2023 00:03
karencfv added a commit that referenced this pull request Oct 31, 2023
…up (#4372)

This is a follow up to
#4149 . In this commit all
single node tests are now run against a clustered set up. Additionally,
a few bugs on the replicated init SQL file have been fixed.

Note:

All of the individual field and measurement tests now run under a
single test. The reason for this is test times: each test needs to
wipe out its database before the next test runs, and in a replicated
installation this means that all nodes must sync, which takes time.
Before I merged all the field/measurement tests into a single one, the
testing time for a replicated set up took about 10 minutes, which
really is too much. Now it takes ~3.5 minutes. That is still a lot,
but it is only a temporary measure until we can run these tests in
parallel again. An unintended bonus of this approach is that it also
verifies the data is being inserted into the correct table.

Related: #4148
Related: #4001