
Revised integration test framework #12368

Merged (8 commits) on Aug 24, 2022
Conversation

@paul-rogers (Contributor) commented Mar 25, 2022

Description

Issue #12359 proposes an approach to simplify and streamline integration tests, especially around the developer experience, but also for Travis. See that issue for the background.

This PR is big, but most of that comes from creating revised versions of existing files. Unfortunately, GitHub offers no good way to compare two copies of the same file. For the most part, these are config files, and you can assume that the new versions work (because, when they didn't, the cluster stubbornly refused to start or stay up).

Developer Experience

With this framework, it is possible to:

  • Do a normal distribution build.
  • Build the Docker image in less than a minute. (Most of that is Maven determining what not to do. After the first build, you can use a script to rebuild in a few seconds, depending on what Docker must rebuild.)
  • Launch the cluster in a few seconds.
  • Debug an integration test as a JUnit test in your favorite IDE.

The result is that integration tests go from being a nightmare to being an efficient way to develop and test code changes. This author used it to create tests for PR #12222. The process was quick and easy. Not as efficient as just using unit tests (we still want the single-process server), but still pretty good. (By contrast, the new tests were ported to the existing framework, and that is still difficult for the reasons we're trying to address here.)

One huge win is that, with this approach, one can start a Docker cluster and leave it up indefinitely to try out APIs, to create or refactor tests, etc. Though there are many details to get right to use Docker and Docker Compose, once those are addressed, using the cluster becomes quite simple and productive.

Contents of this First Cut

This PR is a first draft of the approach which provides:

  • A new directory, integration-tests-ex that holds the new integration test structure. (For now, the existing integration-tests is left unchanged.)
  • Maven module druid-it-tools to hold code placed into the Docker image.
  • Maven module druid-it-image to build the Druid-only test image from the tarball produced in distribution. (Dependencies live in their "official" image.)
  • Maven module druid-it-cases that holds the revised tests and the framework itself. The framework includes file-based test configuration, test-specific clients, test initialization and updated versions of some of the common test support classes.

The integration test setup is primarily a huge mass of details. This approach refactors many of those details: from how the image is built and configured to how the Docker Compose scripts are structured to test configuration. An extensive set of "readme" files explains those details. Rather than repeat that material here, please consult those files for explanations.

Limitations

This version is the result of several months of iteration to work out details around builds on various systems. The framework itself is now pretty solid, as is the Druid image. This PR includes two converted tests, and lessons from several others which are in-flight. We expect to refine the framework as we create and convert other tests.

For now, the new framework is intended to exist in parallel with the current one so we can experiment. The new framework is ignored unless you select the Maven profiles which enable it. (See the docs for details.) Eventually we will retire the integration-tests versions in favor of the integration-tests-ex versions, but only after the new versions are rock-solid.

There are many other test groups not yet touched. A good approach is to use this framework for new integration tests, and to convert old ones when someone needs to modify them. The cost of converting to this framework is low, and the productivity gain is large.

Other limitations include:

  • The original tests appear to run not only in Docker, but also against a local QuickStart cluster and against Kubernetes. Neither of those other two modes has been tested in the new framework. (Though, it is now so easy to start and use a Docker cluster that it may be easier to use Docker than the QuickStart cluster.)
  • The original tests always have security enabled. While it is important to test security, having security enabled makes debugging far harder (by design). So, this draft has security disabled; the various scripts and configs are pulled aside. The thought is to enable security as an option when needed, and run without it when debugging things other than the security mechanism.
  • The supporting classes have the basics, but have been used for only the one integration test group.
  • This framework is not yet integrated into Travis. A test that exists only in the new framework won't run in the Travis build. We have a working version of the Travis build in a private branch, but that step will be commented out in this PR prior to merge; we'll enable Travis builds as a separate PR as we transition old tests to the new framework.

Next Steps

This PR itself will continue to evolve as some of the final details are sorted out. However, it is at the stage where it will benefit from others taking a look and making suggestions.

The thought is that this PR is large enough already: let's get it reviewed, then tackle the additional issues listed above step by step as the opportunity arises.

Alternatives

The approach in this PR is based on the existing approach, but rearranges the parts. Since the integration tests are pretty much "nothing but details", there are many approaches that could be taken. Here are a few that were considered.

  • Run the tests as-is in an AWS instance. Because the tests are very difficult to run on a developer machine, many folks set up an AWS instance to run them. While this can work, it is slow: one has to shuffle code from the laptop to the instance and back. Or, just do development on the instance. The tests are not really set up for debugging, so even on the instance, it is still tedious to make and debug test changes.
  • Run the tests in Travis as part of a PR. This is the default approach. However, it is akin to the development process of old: submit the changes to a batch run, wait many hours for the answers, plow through the logs, find issues, fix them, and repeat. That process was not efficient in the era of punch cards, and is still not very efficient today. A turnaround of a minute or less is the target, which the Travis approach cannot provide.
  • Modify the existing integration tests. This is the obvious approach. But, the set of existing ITs is so large that attempting to change everything in one go becomes overwhelming. The chosen approach allows incremental test-by-test conversion without breaking the great mass of existing tests.
  • Status-quo. I'm working on a project that requires many integration tests. It is faster to fix the test framework once, and do the tests quickly, than to fight with the framework for each of the required tests.

That said, this PR is all about details. Your thoughts, suggestions and corrections are encouraged to ensure we've got our bases covered.

Detailed Changes

A number of specific changes are worth calling out that do not appear in the docs.

  • The tests use Guice to create various Druid objects. However, they do not use the Druid extension mechanism: the tests don't have visibility to a Druid installation. Instead, any required extensions are expected to appear as normal jars on the class path. That is, they should be listed in the pom.xml file as dependencies.
  • Tests don't have access to the usual runtime.properties file. Instead, properties come from a new docker.yaml configuration file, from a binding to environment variables, or from command-line options. Of these, docker.yaml is preferred for fixed or default properties, environment variables for properties (such as credentials) that vary per run. Avoid use of the command line, as that makes tests hard to debug in an IDE.
  • The tests use "official" Docker images for dependencies such as MySQL, ZooKeeper and Kafka. A solution for Hadoop is under investigation.
  • A custom DruidTestRunner provides a way to add test-specific Guice modules, along with other configuration.
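
To make the configuration layering above concrete, here is a minimal sketch using Python's ChainMap. The keys, values, and the override order (command line over environment over docker.yaml) are illustrative assumptions, not the framework's actual implementation:

```python
from collections import ChainMap

# Hypothetical illustration of layered test configuration. Assumes
# command-line options override environment bindings, which override
# defaults from docker.yaml; all keys here are invented.
docker_yaml = {"routerUrl": "http://localhost:8888", "retries": "3"}
env_bindings = {"retries": "5"}  # e.g., a per-run credential or setting
cli_options = {}                 # avoided, so tests stay easy to debug in an IDE

# ChainMap performs lookups left to right, so earlier layers win.
config = ChainMap(cli_options, env_bindings, docker_yaml)

print(config["routerUrl"])  # default from the file
print(config["retries"])    # environment overrides the file default
```

The point of the layering is that a test can rely on stable defaults in docker.yaml while per-run values slot in from the environment without editing any file.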

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster (in the sense that this PR is for running such a cluster in Docker.)

@clintropolis added the Area - Testing and Area - Dev labels on Mar 28, 2022
@paul-rogers (Contributor, Author)

This is a big PR. It is all-in-one so folks can see the whole picture. If it helps, this can be broken into smaller chunks, at the loss of overall context.

Here's a quick summary of what's new vs. what's refactored:

  • docker-tests/docs is all new and is meant to capture all the learnings from this exercise, along with information needed to move forward. This is the main resource for understanding this PR.
  • docker-tests project is new so it does not conflict with the existing integration-tests project: both can exist.
  • docker-tests/base-test is mostly new. It contains the revised test config code and a new cluster client.
    • ClusterConfig is the YAML-based config mechanism.
    • Initializer has a bunch of complexity to force the server-minded Guice config to work in a client. Loads the DB. Etc.
  • docker-tests/test-image is a greatly refactored set of Docker build scripts. The Dockerfile is heavily refactored to remove the third-party dependencies and rearrange how Druid is laid out (unpacked from the distribution tarball). DB setup is removed.
  • docker-tests/testing-tools is mostly a copy/paste of extensions-core/testing-tools with the custom node role from integration-tests added.
  • docker-tests/high-availability is a refactor of one test from integration-tests. The Docker Compose script is specific to this one test, refactored from those in integration-tests. The idea is that this test contains just the files for this "group". Other groups will follow this pattern.
  • Other files are mostly clean-up uncovered while debugging. In some cases, code was refactored so the test "clients" could use code that was previously tightly coupled with the server.
  • yaml files: refactor of Docker Compose with new test config.
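
For orientation, a per-group Compose file in this refactored style might look roughly like the sketch below. The service names, image names, and tags are assumptions for illustration, not the PR's actual files:

```yaml
# Hypothetical sketch: dependencies come from "official" images,
# while Druid services use the image built from the distribution tarball.
services:
  zookeeper:
    image: zookeeper:3.5
  metadata:
    image: mysql:5.7
    environment:
      MYSQL_DATABASE: druid
  coordinator:
    image: apache/druid-test:local
    depends_on:
      - zookeeper
      - metadata
    volumes:
      - ./target/shared:/shared   # logs etc. land in the shared folder
```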

Also, as noted before, this PR moves authorization aside into separate files. Authorization is not yet enabled.

@paul-rogers (Contributor, Author)

Continuing to whack build details. Who knew that each task does a pre-build before the actual build, and that the pre-build builds everything except distribution? This causes test-image to fail when looking for the non-existent distribution dependency. Using a bit of profile magic to hide this dependency from the pre-build.

@paul-rogers (Contributor, Author)

Sorry, had to do some major surgery to the Maven module structure, which required renaming the modules and their directories. See the description in maven.md.

Other than that, only minor tweaks as I try to run the gauntlet of the zillions of checks run on the code.

@paul-rogers (Contributor, Author)

The new IT task passed, hooray! Whacked a few more static checking issues.

There is one I don't understand. It appears that we've got JS problems, but I didn't change anything in JS:

added 235 packages from 867 contributors and audited 235 packages in 12.562s
found 4 vulnerabilities (2 moderate, 1 high, 1 critical)
  run `npm audit fix` to fix them, or `npm audit` for details
events.js:183
      throw er; // Unhandled 'error' event
      ^
TypeError: Cannot read property 'forEach' of undefined
    at unpackage (/home/travis/build/apache/druid/node_modules/jacoco-parse/source/index.js:27:14)
    at /home/travis/build/apache/druid/node_modules/jacoco-parse/source/index.js:114:22
    at Parser.<anonymous> (/home/travis/build/apache/druid/node_modules/xml2js/lib/parser.js:304:18)
    at emitOne (events.js:116:13)
    at Parser.emit (events.js:211:7)
    at SAXParser.onclosetag (/home/travis/build/apache/druid/node_modules/xml2js/lib/parser.js:262:26)
    at emit (/home/travis/build/apache/druid/node_modules/sax/lib/sax.js:624:35)
    at emitNode (/home/travis/build/apache/druid/node_modules/sax/lib/sax.js:629:5)
    at closeTag (/home/travis/build/apache/druid/node_modules/sax/lib/sax.js:889:7)
    at SAXParser.write (/home/travis/build/apache/druid/node_modules/sax/lib/sax.js:1436:13)
    at Parser.exports.Parser.Parser.parseString (/home/travis/build/apache/druid/node_modules/xml2js/lib/parser.js:323:31)
    at Parser.parseString (/home/travis/build/apache/druid/node_modules/xml2js/lib/parser.js:5:59)
    at exports.parseString (/home/travis/build/apache/druid/node_modules/xml2js/lib/parser.js:369:19)
    at Object.parse.parseContent (/home/travis/build/apache/druid/node_modules/jacoco-parse/source/index.js:107:5)
    at /home/travis/build/apache/druid/node_modules/jacoco-parse/source/index.js:129:15
    at FSReqWrap.readFileAfterClose [as oncomplete] (fs.js:511:3)
****FAILED****

Is this saying that the build itself has broken code? If so, maybe it will go away on the next build?

@paul-rogers (Contributor, Author) commented Apr 13, 2022

Rebased on latest master to try to fix the prior issue. Unfortunately, the issue didn't resolve.

Now getting a different unrelated failure:

[ERROR] org.apache.druid.query.groupby.epinephelinae.BufferHashGrouperTest.testGrowingOverflowingInteger  Time elapsed: 0.003 s  <<< ERROR!
java.lang.OutOfMemoryError
	at sun.misc.Unsafe.allocateMemory(Native Method)
	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
	at org.apache.druid.query.groupby.epinephelinae.BufferHashGrouperTest.makeGrouper(BufferHashGrouperTest.java:187)

@paul-rogers (Contributor, Author)

New commit. We have to exclude the test code from Jacoco since it is not unit tested. That was painful because the test classes were in "generic" Druid packages. Moved the test code into a dedicated package so we can just exclude that one package.

Migrated the remainder of the batch index tests. This showed some redundancy in the required test code, so created a test runner to hide that boilerplate. Test conversion is now very easy -- at least for the sample of tests converted thus far.

Also includes other minor doc changes and build issue fixes.

@kfaraz (Contributor) left a comment

Thanks for re-organizing the content, @paul-rogers . It's much easier to follow now.
I have given a partial feedback, mostly minor nitpicks/suggestions.

I am going through the rest of it and will try to finish my review soon.

@@ -23,6 +23,19 @@
import java.util.Map;

/**
* Configuration for tests. Opinionated about the shape of the cluster:
Contributor:

Thanks for adding these!

@LazySingleton
@SuppressForbidden(reason = "System#err")
public CuratorFramework makeCurator(ZkEnablementConfig zkEnablementConfig, CuratorConfig config, EnsembleProvider ensembleProvider, Lifecycle lifecycle)
public static CuratorFramework createCurator(CuratorConfig config, EnsembleProvider ensembleProvider)
Contributor:

Nit: I guess this method can be private now. Also, does it need to be static?

Contributor Author:

This is a tricky one. The original code creates a curator framework via a Guice provider. We have to keep that as that's what Druid services require.

The test code wants to use ZK, via curator, but without Guice, since Guice adds extra complexity. It turns out that what we want is to use the builder from the two config items. Rather than copy/paste that code, this refactoring makes it available outside of Guice.

The new static methods have to be public so tests can reach them. I believe that the existing instance methods also have to be public so Guice can call them.

pom.xml
Comment on lines 1195 to 1208
<exclude>org/apache/druid/server/coordination/ServerManagerForQueryErrorTest.class</exclude>
<exclude>org/apache/druid/guice/SleepModule.class</exclude>
<exclude>org/apache/druid/guice/CustomNodeRoleClientModule.class</exclude>
<exclude>org/apache/druid/cli/CustomNodeRoleCommandCreator.class</exclude>
<exclude>org/apache/druid/cli/QueryRetryTestCommandCreator.class</exclude>
Contributor:

Now that you have put them in a separate package, I guess these exclusions are not needed anymore?

Contributor Author:

These are now for the "old" versions. Added a comment to clarify.

pom.xml
@@ -1505,6 +1525,7 @@
<!--@TODO After fixing https://github.com/apache/druid/issues/4964 remove this parameter-->
-Ddruid.indexing.doubleStorage=double
</argLine>
<skipTests>${skipUTs}</skipTests>
Contributor:

Is this okay? Wouldn't this end up skipping all tests and not just UTs?

@paul-rogers (Contributor, Author) commented May 6, 2022:

Added a comment. Surefire runs the UTs. Its sister plugin, Failsafe, runs the ITs. Here, we want to skip the Surefire tests only. Let me know if the comment makes this clear, else I'll add to it.
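
For readers unfamiliar with the split, the relationship looks roughly like this in a pom (a simplified sketch, not the exact configuration in this PR):

```xml
<!-- Simplified sketch. Surefire runs unit tests during the "test" phase
     and honors the skipUTs flag; Failsafe runs integration tests during
     "integration-test"/"verify", so ITs still run when UTs are skipped. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <skipTests>${skipUTs}</skipTests>
  </configuration>
</plugin>
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-failsafe-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>integration-test</goal>
        <goal>verify</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```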

@kfaraz (Contributor) left a comment

This is really great work, @paul-rogers !
Overall it looks great to me!
I ran the tests

  • it's pretty easy to build, start, stop the docker cluster now
  • running single tests from the IDE is also a lot of help
  • tests can easily be run and re-run any number of times (depending on whether they are performing the teardown cleanup)
  • logs etc are populated properly in the target/shared folders

I have some questions/requests:

  • As all the tests would now run in a single maven command, would there be a way to retry only failed tests after a failure happens in the first run?
  • The documents are fairly well detailed, but they seem to be more from the point of view of implementation details rather than usage. Given the size of this, it would be nice to have a usage doc which just lists out the steps (or points to another doc) for typical actions: writing a new test group, migrating a test group from old ITs, configuring the cluster for a test, debugging a test, running all tests, running all tests of a group, etc. Most of this stuff is already there but spread out.
  • As we start to migrate the existing tests, test flakiness is something we would need to detect and address. What would be an approach to do that? (maybe we could add a section in conversion.md for tips and pitfalls)

Comment on lines 32 to 34
import org.apache.druid.testing.IntegrationTestingConfig;
import org.apache.druid.testing.guice.TestClient;
import org.apache.druid.testing.utils.SqlTestQueryHelper;
Contributor:

There seem to be some imports from the original integration-tests. Do we want to retain these as is?

I guess this is why there is a dependency on druid-integration-tests in the pom.xml for this test group.

Contributor Author:

Right. The thought is to reuse the original code where possible. The classes that migrated to the new project are for the cases where something in the code needed to change, typically something about configuration. It is a bit awkward that all the IT "framework" code is mixed in with the actual IT tests in the old structure: that's why we have to depend on the entire druid-integration-tests module.

If we can migrate all the tests, then we can merge the old and new files to create a single base project.

# Starts the test-specific test cluster using Docker compose using
# versions and other settings gathered when building the images.

SCRIPT_DIR=$(cd $(dirname $0) && pwd)
@kfaraz (Contributor) commented Apr 22, 2022:

As I see it, the contents of cluster.sh would be the same for every test group. Only the cluster config changes. Is it possible to avoid the duplication of the cluster.sh?

Contributor:

Copy the cluster.sh script from an existing test. Add lines to copy
any test-specific files into the target/shared folder.

I see this mentioned in docs/conversion.md as something that might prevent this. Just guessing here, but couldn't that be done in some other way, say by putting them in src/test/resources/shared?

Contributor Author:

Yes, I've been thinking about how to do this. The two "groups" we have now ended up needing the same setup. I'm waiting to see how a few more groups work out to determine if these are special cases, or if the script really does end up being the same. If the same, we can bump it to the parent directory.

@paul-rogers (Contributor, Author)

@kfaraz, thank you for your thorough review, and for trying out the new setup. Always great to know it runs on a machine other than my own!

You mentioned flaky tests and how to retry them. Two thoughts on that.

First, we should not have flaky tests. IMHO, such tests either:

  • Are flaky because they start running before the cluster is stable,
  • Are not telling us anything if the tests themselves are flaky (because they depend on timing, or on behavior which is inherently non-deterministic, such as the ordering of events from different services), or
  • Are pointing out actual issues with Druid: that clients would have to retry operations. We should either a) fix that issue, or b) document it. Either way, the tests should be prepared for whatever race or non-deterministic condition is in question.

The new framework eliminates the first issue: it ensures that services are ready before launching tests. This means that either the test or Druid is flaky. Either way, we should fix the issue: remove the test if it is not useful, else fix it or fix Druid (perhaps adding a way to synchronize when needed for testing).
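
The "wait until services are ready" idea can be sketched as a generic polling helper. This is an illustration of the concept only, not the framework's actual code:

```python
import time

def wait_for_ready(check, timeout=60.0, interval=0.5):
    """Poll `check` until it returns True or `timeout` seconds pass.

    `check` might, for example, hit a service's health endpoint and
    return True once it responds. Raises TimeoutError on failure.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("service did not become ready in time")
        time.sleep(interval)

# Example with a fake check that succeeds on the third poll:
calls = {"n": 0}
def fake_check():
    calls["n"] += 1
    return calls["n"] >= 3

wait_for_ready(fake_check, timeout=5.0, interval=0.01)
```

Running the readiness gate once, before any test starts, removes the whole class of "cluster not up yet" flakiness.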

@paul-rogers (Contributor, Author)

All that said, there is the second issue: rerunning specific tests. This is a harder issue than one would think.

The reason to combine tests is that, in this new system, the bulk of the time for each "test group" is consumed with building Druid. If we keep the tests split up, we end up rebuilding Druid over and over and over. Allowing retries means retaining our current extravagant abuse of Travis resources.

The obvious solution to the redundancy issue is to build Druid and the image once, then run all the test groups that use that particular configuration. Since we have multiple configurations, the various configurations would still run in parallel, but the test "groups" would run in series within each configuration.

Of course, if we retain flaky tests, then we want to play "whack-a-mole" in builds: keep rerunning only those tests that failed until we get lucky and they pass. By combining tests, we decrease the probability of getting lucky. As mentioned above, the obvious answer is to fix the flaky tests, which we are starting to do.

Another constraint is how Travis seems to work. We can only rerun jobs within Travis's build matrix. It does not seem we can parameterize the job to say, "just run the ITs, with only these two projects." To be able to rerun one test "group" we have to let each group (for each configuration) build all of Druid, which gets us back to the redundancy issue.

Short term, I'm thinking to do an experiment in which each test "group" is triggered by a separate Maven profile. We can then also have an "all-its" profile that enables all the groups. Until we resolve flaky tests, we can opt to waste resources and build profile-by-profile (that is, group-by-group) as we do today. Later, when tests are fixed (or if we identify groups which are not flaky), we can combine them via profiles.
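
The profile-per-group idea could look roughly like this in the parent pom; the profile and module names here are illustrative guesses, not settled choices:

```xml
<!-- Hypothetical sketch: one profile per test group, plus an
     aggregate profile that enables all of them. -->
<profiles>
  <profile>
    <id>IT-HighAvailability</id>
    <modules>
      <module>it-high-availability</module>
    </modules>
  </profile>
  <profile>
    <id>all-its</id>
    <modules>
      <module>it-high-availability</module>
      <!-- one entry per converted test group -->
    </modules>
  </profile>
</profiles>
```

With something along these lines, `mvn verify -P IT-HighAvailability` would run just that group, while `-P all-its` would run every group against a single build.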

I'll try that in a separate commit so I can easily back it out if it does not work out.

@paul-rogers (Contributor, Author)

@kfaraz, good point on the docs. Yes, the docs started as my attempt to remember the many details of how the original tests worked, and what I changed as I created this new version. Per your suggestion, I created a quickstart.md page with usage info. We can expand that as we figure out what additional information is most often needed. I added links into the more detailed docs for when someone needs more information.

The idea on conversion is to try out a few groups here, then convert the others over time. I was perhaps lucky: the groups I converted so far mostly "just worked." I've encountered no flakiness in those tests, in my limited runs, after I made sure the cluster was up before running the tests.

We'll have to see, as we convert more, if the others are as easy, or if there will be tricky bits. If there are tests that are still flaky, we'll have to sort that out on a case-by-case basis, as suggested above.

Let's also remember that there is a big chunk of work not addressed in this PR: running a secured cluster. There is code in the old ITs to create certs, configure security, etc. Tests run that way will be very difficult to debug, by definition. That whole area is left as an open issue in this PR, in part because this one is already large enough.

@paul-rogers (Contributor, Author)

This branch has been open long enough that it drifted out of sync with master. Rebasing ran into the usual issues when a zillion files change, so I squashed commits so the rebase would succeed. Fortunately, the squashed commits are those that have already been reviewed; no additional changes were made before squashing. New changes show up as new commits on top of the squash. In this latest commit, I updated the project version from 0.23.0 to 0.24.0 so that the builds will work.

@paul-rogers (Contributor, Author) commented May 19, 2022

Getting an odd failure:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (process-resource-bundles) 
on project it-high-availability: 
Error resolving project artifact: 
Could not transfer artifact io.confluent:kafka-schema-registry-client:pom:5.5.1 
from/to sigar (https://repository.jboss.org/nexus/content/repositories/thirdparty-uploads/): 
Transfer failed for https://repository.jboss.org/nexus/content/repositories/thirdparty-uploads/io/confluent/kafka-schema-registry-client/5.5.1/kafka-schema-registry-client-5.5.1.pom 
for project io.confluent:kafka-schema-registry-client:jar:5.5.1: 
peer not authenticated -> [Help 1]

First, the project on which this fails does not include the artifact. Second, the project that does use it already built, so the artifact should be cached locally. Third, why is the peer not authenticated?

@kfaraz (Contributor) left a comment

Thanks for addressing the comments, @paul-rogers .
+1 after CI passes.

You might have mentioned this already but could you please confirm the phase of Travis CI where these new docker-tests would be executed?

.travis.yml
@@ -73,6 +73,19 @@ jobs:
stage: Tests - phase 1
script: ${MVN} animal-sniffer:check --fail-at-end

# Experimental run of the revised ITs. Done early to debug issues
# Move later once things work.
- name: "experimental docker tests"
Contributor:

Do these have to be moved to a later stage before we can merge this PR?

Contributor Author:

I went ahead and disabled this step for now: it has done its task of proving that the new ITs work in Travis. We'll revisit as we decide how to migrate from the existing IT groups to the new ones.

@paul-rogers (Contributor, Author) commented May 23, 2022

@kfaraz, thanks for the review. It's been a long slog to resolve the many Maven issues with all our many static checks.

You asked about the "experimental docker tests" task in this PR. Yes, it is experimental: I'll remove (or disable) it before we commit. For now, I envision we won't run the tests in the maven build since they duplicate existing tests. Instead, a good next step would be to migrate each IT one by one: convert it to the new framework, replace the current IT tasks with a new version (based on the "experimental" one), and verify the results.

The plan is to get a clean build. Once that is done, I'll remove the experimental step and we can commit this monster.

As we move ahead, the new framework will run in phase 2, in place of the existing items. During the interim, we can mix-and-match mechanisms: the Travis builds are all independent of one another. That is a problem in general (we redo the same work over and over) but turns out to be a help during the transition.

@paul-rogers
Contributor Author

Currently trying to track down a mysterious error. In `it-high-availability`, Maven is unable to find a particular jar file. It looks like the download works one time (in Java 8) but fails another time (in Java 11):

[INFO] Downloading from google-maven-central: https://maven-central.storage-download.googleapis.com/maven2/io/confluent/kafka-schema-registry-client/5.5.1/kafka-schema-registry-client-5.5.1.pom
[INFO] Downloading from sonatype: https://oss.sonatype.org/content/repositories/releases/io/confluent/kafka-schema-registry-client/5.5.1/kafka-schema-registry-client-5.5.1.pom
[INFO] Downloading from sonatype-apache: https://repository.apache.org/content/repositories/releases/io/confluent/kafka-schema-registry-client/5.5.1/kafka-schema-registry-client-5.5.1.pom
[INFO] Downloading from apache.snapshots: https://repository.apache.org/snapshots/io/confluent/kafka-schema-registry-client/5.5.1/kafka-schema-registry-client-5.5.1.pom
[INFO] Downloading from sigar: https://repository.jboss.org/nexus/content/repositories/thirdparty-uploads/io/confluent/kafka-schema-registry-client/5.5.1/kafka-schema-registry-client-5.5.1.pom
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (process-resource-bundles) on project it-high-availability: 
Error resolving project artifact: 
Could not transfer artifact io.confluent:kafka-schema-registry-client:pom:5.5.1 from/to sigar (https://repository.jboss.org/nexus/content/repositories/thirdparty-uploads/): 
Transfer failed for https://repository.jboss.org/nexus/content/repositories/thirdparty-uploads/io/confluent/kafka-schema-registry-client/5.5.1/kafka-schema-registry-client-5.5.1.pom for project io.confluent:kafka-schema-registry-client:jar:5.5.1: 
peer not authenticated -> [Help 1]

This is pushing against the edge of my Maven knowledge. I'm hoping it is just something silly I'm doing that shows up as the above mysterious error. Anyone seen something similar? For example, did the first attempt above succeed? Or, is the error a failure in that attempt, but the error is reported later? Or, is Maven trying to get the jar twice, the first worked, but the second failed?

Seems the 5.5.1 release is still available, so that isn't a problem. (It is old, and has vulnerabilities, so we should probably upgrade.)

This seems to be a transitive dependency brought in from druid-integration-tests. Yet, that module seems to compile. Still scratching my head...
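Transitive dependencies like this can usually be traced with the standard Maven dependency plugin. A minimal sketch, using the artifact coordinate from the error above (the actual invocation needs Maven and a Druid checkout, so it is left commented):

```shell
# Sketch: find which module pulls in the confluent client transitively.
# The artifact coordinate is taken from the error message above.
ARTIFACT="io.confluent:kafka-schema-registry-client"
CMD="mvn dependency:tree -Dincludes=${ARTIFACT}"
echo "${CMD}"
# To actually trace it, run from the repository root (requires Maven):
# eval "${CMD}"
```

Running this against each candidate module would show the dependency path from `druid-integration-tests` (or elsewhere) down to the confluent artifact.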

@paul-rogers
Contributor Author

Rebased on latest master to try to overcome the perpetual "confluent" jar errors. Let's see if this lets us get a clean build.

@paul-rogers
Contributor Author

Sad. This latest run should have worked, but it seems there are issues on Travis with finding the JRE. Sigh. We have to wait for those issues to be resolved; once they are, could some committer please restart the build?

@paul-rogers
Contributor Author

Getting a clean build is proving quite difficult. Out of desperation, we'll pull two groups of changes out of this PR into separate PRs so the build issues are easier to debug. In particular, it is hard at present to separate actual errors in the "old" ITs from the flaky ITs. Let's get those other two PRs done, then we'll rebase this on the updated master so that only the new IT code remains. That way, if an old IT fails, we'll have some confidence that it is just flaky, not broken.

kfaraz pushed a commit that referenced this pull request Jun 23, 2022
This commit contains the cleanup needed for the new integration test framework.

Changes:
- Fix log lines, misspellings, docs, etc.
- Allow the use of some of Druid's "JSON config" objects in tests
- Fix minor bug in `BaseNodeRoleWatcher`
paul-rogers added a commit to paul-rogers/druid that referenced this pull request Jun 24, 2022
@paul-rogers paul-rogers marked this pull request as draft June 25, 2022 18:43
kfaraz pushed a commit that referenced this pull request Jun 25, 2022
This commit contains changes made to the existing ITs to support the new ITs.

Changes:
- Make the "custom node role" code usable by the new ITs. 
- Use flag `-DskipITs` to skip the integration tests but run the unit tests.
- Use flag `-DskipUTs` to skip the unit tests but run the "new" integration tests.
- Expand the existing Druid profile, `-P skip-tests` to skip both ITs and UTs.
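The three skip options above can be sketched as Maven argument strings (illustrative only; actual invocations run against a Druid checkout):

```shell
# The three skip options described above, composed as Maven arguments.
unit_only="install -DskipITs"      # run unit tests, skip integration tests
its_only="install -DskipUTs"       # run the new ITs, skip unit tests
no_tests="install -P skip-tests"   # skip both
echo "mvn ${unit_only}"
echo "mvn ${its_only}"
echo "mvn ${no_tests}"
```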
Restructuring of the integration tests to make the code
simpler and much easier for developers to use on their
own machine.

* New project, docker-tests for this work
* Project to build the test Docker image
* Projects per test group
* Restructuring of configuration
* Two example test groups

For now, this version exists parallel to the original
version.
* Moved test code to a single module, as previously
* Initialization handles customization
* Shared cluster config across categories
* Revised to use JUnit categories
* Rebased on latest master
* Corresponding doc updates
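The "JUnit categories" item above maps onto Maven's standard mechanism for selecting tests: the surefire/failsafe `groups` property accepts a category class name. A sketch; the fully qualified class name here is hypothetical, not taken from this PR:

```shell
# Select ITs by JUnit 4 category via the failsafe `groups` property.
# The category class name below is a made-up example.
CATEGORY_CLASS="org.apache.druid.testsEx.categories.HighAvailability"
CMD="mvn verify -Dgroups=${CATEGORY_CLASS}"
echo "${CMD}"
```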
@paul-rogers paul-rogers changed the title First cut at restructuring the integration tests Revised integration test framework Aug 16, 2022
Support for all of the IntegrationTestingConfig properties
Wrapper script for IT operations
Additional documentation
Build fix
@paul-rogers paul-rogers marked this pull request as ready for review August 19, 2022 22:05
Improved env var-based configuration
Test creation guide
Enable new ITs in travis
Parameterized tests
@paul-rogers
Contributor Author

Revised to prepare for merge:

  • Parameterized tests
  • Test creation guide
  • Main IT script: it.sh
  • Enhanced configuration options: env vars, etc.
  • Test runner supports parameterized tests
  • Test runner allows test-specific code configuration (add Guice modules, etc.)
  • Other cleanup, bug fixes
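Put together, a developer session with the new framework might look like the following sketch. The `it.sh` subcommand names and the category name are assumptions for illustration, not taken from this PR:

```shell
# Hypothetical end-to-end developer workflow (all names are illustrative).
BUILD="mvn clean install -P dist -DskipTests"  # normal distribution build
IMAGE="./it.sh image"                          # build the test Docker image
UP="./it.sh up HighAvailability"               # start the cluster for a category
TEST="./it.sh test HighAvailability"           # run that category's ITs
DOWN="./it.sh down HighAvailability"           # tear the cluster down
for step in "$BUILD" "$IMAGE" "$UP" "$TEST" "$DOWN"; do
  echo "$step"
done
```

Because the cluster stays up between `up` and `down`, tests can also be run repeatedly from an IDE against the same cluster, which is the debugging workflow described at the top of this PR.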

@paul-rogers
Contributor Author

The latest PR converted two ITs to use the new framework. Both pass in Travis. (And there was much rejoicing.)

However, two of the old ITs fail for obscure reasons. A security test fails with an auth failure, but the same test was clean in a prior build. Another IT can't find its input file, though this PR changes none of the input files or paths in the old tests. These are not the usual flaky IT suspects, so we probably have to sort out what's what. Once we do, this should be good to go.

@paul-rogers
Contributor Author

@kfaraz, thanks for your approval of this PR. The final changes are in for this PR and the build is clean. Please take a quick final look, and merge the PR at your convenience.


<profiles>
<profile>
<id>IT-HighAvailability</id>
Contributor

stylistic nit: I think that the indentation looks a bit off in the <profile>...</profile> tags.

@kfaraz
Contributor

kfaraz commented Aug 24, 2022

Thanks for the update, @paul-rogers ! I will take another look today and merge this.

@kfaraz kfaraz merged commit cfed036 into apache:master Aug 24, 2022
@abhishekagarwal87 abhishekagarwal87 added this to the 24.0.0 milestone Aug 26, 2022
Labels: Area - Dev, Area - Testing