
Initial work for replicated ClickHouse testing #4149

Merged
merged 18 commits into oxidecomputer:main from oximeter-test-refactor on Oct 26, 2023

Conversation

karencfv
Contributor

Structural changes

Instead of spinning up a ClickHouse server per test, now only two servers are spun up to run all tests against: one for single node mode and another for replicated testing. Even though it may look odd to do this now, there is a good reason for it. ClickHouse servers that have been configured to be part of a cluster cannot be spun up in the same server as single node servers. This means that if we spin up several servers, the replicated test becomes incredibly flaky. It's important to note that this setup will not live forever. Once we can remove single node mode and start the replicated ClickHouse cluster as part of the build process, the replicated tests will be able to run independently.

All tests now run sequentially against a single server. For correctness, after each test the database is wiped out to give us a clean slate for testing. This only adds a few more seconds (<10s) to the overall testing time. Again, this will not be the final state of things, but it is necessary to be able to test alongside replicated clusters until we remove single node functionality.
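
As a rough sketch of the pattern (placeholder types and case names below, not the actual test code):

```rust
// Sketch only: every case runs against the same shared ClickHouse server,
// and the database is wiped before the next case starts.
struct TestDb;

impl TestDb {
    async fn run_case(&self, name: &str) {
        // The real tests insert samples, query schema, etc.; this just marks
        // where each case would execute against the shared server.
        println!("running `{name}` against the shared ClickHouse server");
    }

    async fn wipe(&self) {
        // In the real client this is a DROP DATABASE followed by re-running
        // the init SQL, giving the next case a clean slate.
    }
}

async fn run_single_node_tests(db: &TestDb) {
    // Hypothetical case names, for illustration only.
    for case in ["insert_samples", "list_schema", "field_queries"] {
        db.run_case(case).await;
        db.wipe().await; // adds only a few seconds (<10s) overall
    }
}
```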

Caveat

GitHub does some really weird things with the diffs on oximeter/db/src/client.rs. Frankly, it looks like a mess. I would probably check out the branch itself to see the final state of the tests.

Follow up work

Given the size of this PR, I've not added all of the tests to run against the replicated cluster. There will be a follow up PR that does this.

Related: #4001
Closes: #4037

@bnaecker
Collaborator

Thanks for starting on this @karencfv!

Can you explain in more detail what you mean when you say the tests become "flaky"? It's been really useful to have each test run against an independent database, and so I think we should try to keep that insofar as possible.

ClickHouse servers that have been configured to be part of a cluster cannot be spun up in the same server as single node servers.

This part is a bit confusing to me. The documentation indicates that it's possible to have replicated and non-replicated tables in the same server. Each server also watches its configuration file for changes, and so one can modify the cluster in various ways after each program starts. From that last link:

The server tracks changes in config files, as well as files and ZooKeeper nodes that were used when performing substitutions and overrides, and reloads the settings for users and clusters on the fly. This means that you can modify the cluster, users, and their settings without restarting the server.

Given all that, and the fact that serial_test doesn't work with cargo nextest, it seems like we really want to try to support multiple simultaneous clusters. One way to do that would be:

  • Start each program, keeper and data server, independently. The configuration file would not specify any details of the cluster, such as port numbers or replica IDs.
  • For each program, collect any necessary details such as OS-assigned port numbers, after the program begins.
  • Write the cluster details into the configuration file for each program.

If I'm understanding all the documentation correctly, this would let us start each server and then describe the cluster after the OS has assigned port numbers. We can avoid hard-coding anything, and keep running the tests in parallel.
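
As a sketch of the last two steps (assuming we've already collected the OS-assigned ports from each data server somehow, e.g. by parsing its log output), writing out the cluster description could look roughly like this; the struct and file layout are made up, but the `<remote_servers>` fragment is the standard ClickHouse cluster description:

```rust
use std::{fs, io};

// Hypothetical: ports discovered only after each server has started.
struct ReplicaPorts {
    host: String,
    tcp_port: u16,
}

// Render a `<remote_servers>` fragment describing one shard with N replicas,
// so the cluster is described *after* startup instead of hard-coded up front.
fn render_cluster_xml(cluster: &str, replicas: &[ReplicaPorts]) -> String {
    let mut xml = format!("<remote_servers>\n  <{cluster}>\n    <shard>\n");
    for r in replicas {
        xml.push_str(&format!(
            "      <replica>\n        <host>{}</host>\n        <port>{}</port>\n      </replica>\n",
            r.host, r.tcp_port
        ));
    }
    xml.push_str(&format!("    </shard>\n  </{cluster}>\n</remote_servers>\n"));
    xml
}

// The server watches its config files, so rewriting this fragment should be
// picked up without restarting anything (per the docs quoted above).
fn write_cluster_config(path: &str, xml: &str) -> io::Result<()> {
    fs::write(path, xml)
}
```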

@@ -610,7 +610,7 @@ impl DbWrite for Client {
/// Wipe the ClickHouse database entirely from a replicated set up.
async fn wipe_replicated_db(&self) -> Result<(), Error> {
Collaborator

This makes me realize something. I'm not sure we want the database initialization to be done by the client at all. Consider the case where we start 2 data servers, but only one instance of oximeter. The ClickHouse client that oximeter instance uses will initialize the one replica that it happens to connect to, but there's nothing to initialize the other replica.

I'm really thinking that we would like a way to manage the cluster more holistically. Rather than starting individual server programs, and relying on the client to run the SQL initializing the tables, it might help to have a notion of a ClickHouse deployment. This could describe the IP addresses and ports (if they're known); the replica IDs; and probably the SQL we want to run at startup. It might support single and multi-node, if we still want that.

I would imagine a "runner" program would be helpful here too, which would start the ClickHouse program itself; write out or update the XML based on the cluster configuration; and probably also initialize the database tables itself, again based on the cluster deployment. This would obviate the need for the client to initialize the tables at all.

This is all pretty pie-in-the-sky at this point. But I'm reasonably certain we can strip down the configuration to the minimal set; write out the needed XML; and initialize the database tables, with everything derived from the cluster configuration.
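
To make that a bit more concrete, here's the rough shape I'm imagining (all names hypothetical):

```rust
use std::net::SocketAddr;

// Hypothetical "deployment" description a runner could consume: enough to
// derive each node's XML config plus the SQL to run once the cluster is up.
struct ClickhouseDeployment {
    cluster_name: String,
    keepers: Vec<Node>,
    data_servers: Vec<Node>,
    // Table/database definitions to run once against one replica after
    // everything reports healthy.
    init_sql: Vec<String>,
}

struct Node {
    id: u32,
    listen: SocketAddr,
}

impl ClickhouseDeployment {
    // A runner would call something like this per node: render its config,
    // start the binary, and finally run `init_sql` exactly once.
    fn config_for_data_server(&self, id: u32) -> Option<String> {
        self.data_servers.iter().find(|n| n.id == id).map(|n| {
            format!("<!-- config for replica {} listening on {} -->", n.id, n.listen)
        })
    }
}
```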

Contributor Author

That sounds like a good idea! I'll link this comment to the relevant issue

Collaborator

So one additional note to add here, from a conversation I had with @davepacheco. It's probably a good idea in the long run to have this front-end runner have very little configuration itself, and instead get all of that from Nexus. It would only need to know how to reach Nexus (via DNS) and probably an ID of some kind, and then hit an endpoint in Nexus with that ID. That request could return the configuration that node needs to run (addresses, ports, whether to start keepers and/or data servers, etc.).
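
Purely as a sketch of the shape (none of this is a real Nexus API today), the configuration the runner fetches might carry something like:

```rust
use std::net::SocketAddr;

// Hypothetical response body the runner could fetch from Nexus at startup,
// knowing only a Nexus address (via DNS) and its own ID.
struct ClickhouseNodeConfig {
    // Whether this node should run a keeper, a data server, or both.
    run_keeper: bool,
    run_data_server: bool,
    listen_address: SocketAddr,
    keeper_id: Option<u32>,
    replica_id: Option<u32>,
    // Peers needed to render this node's keeper/cluster config.
    keeper_peers: Vec<SocketAddr>,
    replica_peers: Vec<SocketAddr>,
}
```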

@karencfv
Contributor Author

karencfv commented Sep 27, 2023

Thanks for the review @bnaecker!

Can you explain in more detail what you mean when you say the tests become "flaky"? It's been really useful to have each test run against an independent database, and so I think we should try to keep that insofar as possible.

ClickHouse servers that have been configured to be part of a cluster cannot be spun up in the same server as single node servers.

This part is a bit confusing to me. The documentation indicates that it's possible to have replicated and non-replicated tables in the same server. Each server also watches its configuration file for changes, and so one can modify the cluster in various ways after each program starts. From that last link:

The server tracks changes in config files, as well as files and ZooKeeper nodes that were used when performing substitutions and overrides, and reloads the settings for users and clusters on the fly. This means that you can modify the cluster, users, and their settings without restarting the server.

The thing is, it's the servers themselves that clash. Single node ClickHouse servers cannot run alongside those that are part of a cluster on the same host (I've tried this manually and the clustered servers just hang and don't start up). This means that if we are spinning up several single node data servers in parallel, the moment the replicated cluster begins to be built, it may or may not start up if the single node tests are still running (this is what was happening on the Ubuntu build-and-test job).

Given all that, and the fact that serial_test doesn't work with cargo nextest, it seems like we really want to try to support multiple simultaneous clusters. One way to do that would be:

  • Start each program, keeper and data server, independently. The configuration file would not specify any details of the cluster, such as port numbers or replica IDs.
  • For each program, collect any necessary details such as OS-assigned port numbers, after the program begins.
  • Write the cluster details into the configuration file for each program.

If I'm understanding all the documentation correctly, this would let us start each server and then describe the cluster after the OS has assigned port numbers. We can avoid hard-coding anything, and keep running the tests in parallel.

I would absolutely love to do that, and thought of several approaches, but the issue here is that the moment we port all of the current tests to replicated clusters, the test times are going to go through the roof. Unlike single node servers, the ClickHouse cluster takes much longer to spin up. Because of this, it's not as feasible to have a bunch of clusters being spun up. My plan for this is the following:

  1. Until we can remove all single node functionality, spin up only two ClickHouse deployments (one single node, one clustered) and run tests serially against them.
  2. When we remove single node functionality, spin up a single ClickHouse cluster as a pre-test task which we can run all tests against in parallel if possible.

What do you think?

@bnaecker
Collaborator

The thing is, it's the servers themselves that clash. Single node ClickHouse servers cannot run alongside those that are part of a cluster on the same host (I've tried this manually and the clustered servers just hang and don't start up).

Do we understand what's actually causing the servers that are part of the cluster to start slowly?

Unlike single node servers, The ClickHouse cluster takes much longer to spin up

How much longer? The real question I think is whether the additional startup time is more than the cost of running the tests serially against a single cluster.

@karencfv
Contributor Author

The thing is, it's the servers themselves that clash. Single node ClickHouse servers cannot run alongside those that are part of a cluster on the same host (I've tried this manually and the clustered servers just hang and don't start up).

Do we understand what's actually causing the servers that are part of the cluster to start slowly?

Ha! There was a rogue port that was being set automatically in the single node servers, which I can tweak in the clustered config. The two tests don't need to run serially anymore.

Unlike single node servers, The ClickHouse cluster takes much longer to spin up

How much longer? The real question I think is whether the additional startup time is more than the cost of running the tests serially against a single cluster.

Quite a bit: a single node server takes ~0.537s to start up, whereas a cluster takes ~3.128s, as it checks that all keeper/server nodes are ready and listening for connections.

Additionally, I just realised that we would have an issue with ports as well. While we can have OS-assigned HTTP ports for single node servers, clustered servers have several other ports that we need to define and share across configs (TCP, interserver port, keeper ports, etc.). I can see this port allocation becoming quite complex and potentially unreliable if it were to be set dynamically.

That said, I believe we can find a way to run all of these tests in parallel once we work on #3982. When the init SQL is dynamically created, it will give us greater flexibility to test. For example: To avoid consistency issues, I can see us using a different database name for each test (e.g. CREATE DATABASE IF NOT EXISTS oximeter_test_h5f74 ON CLUSTER oximeter_cluster;)
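
As a rough sketch of that idea (the suffix scheme and helper names here are made up):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static TEST_DB_COUNTER: AtomicU64 = AtomicU64::new(0);

// Derive a unique database name per test, e.g. `oximeter_test_12345_0`.
fn unique_test_db_name() -> String {
    let n = TEST_DB_COUNTER.fetch_add(1, Ordering::Relaxed);
    format!("oximeter_test_{}_{}", std::process::id(), n)
}

// Each test would create its own database on the cluster up front...
fn create_db_sql(db: &str, cluster: &str) -> String {
    format!("CREATE DATABASE IF NOT EXISTS {db} ON CLUSTER {cluster};")
}

// ...and drop it on the way out, so tests never see each other's data.
fn drop_db_sql(db: &str, cluster: &str) -> String {
    format!("DROP DATABASE IF EXISTS {db} ON CLUSTER {cluster};")
}
```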

@karencfv karencfv requested a review from bnaecker September 28, 2023 02:32
@karencfv
Contributor Author

Heya @bnaecker 👋 😄

Just wanted to follow up on this and see if you have any thoughts on my last comment?

@bnaecker bnaecker left a comment

I've a few more questions / comments, but I think this is mostly good to go!


async fn test_build_replicated() {
let log = slog::Logger::root(slog::Discard, o!());
// Tests that data can be inserted via the client
insert_samples_test(address, &log).await.unwrap();
Collaborator

IIUC, a lot of this diff is moving the implementation of the tests themselves into one actual #[test] function. I was under the impression that was to work around an issue with port conflicts for the servers. Has that conflict been addressed? Do we still need to have all tests inside a single #[test]?

Contributor Author

What I have in mind here is to get the single node tests ready to run against the replicated cluster.

Until single node ClickHouse is removed entirely, and the init SQL file is created dynamically, the replicated cluster tests will still need to run serially. This means that if we want to run the same tests on both single node and replicated clusters, it'll have to be this way temporarily.

In a follow up PR to this one, I will be running every one of these tests against the replicated cluster. I divided this work into two PRs so it doesn't become a monster like my last PR 😅

Comment on lines +2441 to +2444

// TODO(https://github.com/oxidecomputer/omicron/issues/4001): With distributed
// engine, it can take a long time to sync the data. This means it's tricky to
// test that the data exists on both nodes.
Collaborator

Looks like we have some control over this from here, which might be helpful in test contexts.
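
To be clear, I'm only guessing at which knobs that page covers, but ClickHouse does have ways to force distributed data to land before asserting on it, e.g. the insert_distributed_sync setting or SYSTEM FLUSH DISTRIBUTED. A tiny sketch of the latter:

```rust
// Assumption: `SYSTEM FLUSH DISTRIBUTED` pushes any data queued behind a
// Distributed table out to the remote nodes, so a test could run it before
// checking that both replicas hold the rows. Whether this is the mechanism
// the linked docs describe is a guess.
fn flush_distributed_sql(table: &str) -> String {
    format!("SYSTEM FLUSH DISTRIBUTED {table}")
}
```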

Contributor Author

Nice! I'd like to punt on this item until my follow up PR, which will deal more directly with replicated testing if that's OK with you

@karencfv
Contributor Author

Thanks for the review @bnaecker! I think I've addressed all of your comments, let me know what you think :)

@karencfv karencfv requested a review from bnaecker October 25, 2023 02:41
@karencfv karencfv merged commit 3c6ae58 into oxidecomputer:main Oct 26, 2023
15 checks passed
@karencfv karencfv deleted the oximeter-test-refactor branch October 26, 2023 00:03
karencfv added a commit that referenced this pull request Oct 31, 2023
…up (#4372)

This is a follow up to
#4149 . In this commit all
single node tests are now run against a clustered set up. Additionally,
a few bugs on the replicated init SQL file have been fixed.

Note:

All of the individual field and measurement tests now run under a
single test. The reason for this is test times: each test needs to
wipe out its database before the next test runs, and in a replicated
installation this means that all nodes must sync, which takes time.
Before I merged all the field/measurement tests into a single one, the
testing time for a replicated set up took about 10 minutes, which
really is too much. Now it takes ~3.5 minutes. That is still a lot,
but it is only a temporary measure until we can run these tests in
parallel again. An unintended bonus of this approach is that it also
verifies the data is being inserted into the correct table.

Related: #4148
Related: #4001