KAFKA-17767 Extract test catalog from JUnit output [1/n] #17397

mumrah · 2024-10-07T17:21:53Z

This patch extracts all of the test classes and methods from the JUnit XML files and creates a set of YAML files.

The intention of these YAML files is to be checked into an orphaned Git branch and serve as a historical test catalog. Since the files are sorted, there will only be a diff if a test is added, removed, or renamed. This should limit the storage requirements for this "database". By keeping these files in Git, we can easily do things like checkout the file set at an older timestamp.

git checkout test-catalog @{'7 days ago'}

The purpose of this test catalog is to allow us to easily and efficiently answer the question: "what test has been added recently". More precisely, we want to know which tests have been added in the last 7 days including new tests in a given PR.

The choice of YAML over plain text or JSON was primarily for compactness. Class names are not repeated, which makes the data set quite a bit smaller. The structure of YAML also gives us flexibility in the future for adding other data to this catalog.

mumrah · 2024-10-08T00:39:32Z

This needs #17399 to fix a test name parsing issue

mumrah · 2024-10-09T14:15:09Z

Here is what the test catalog looks like.

File listing:

-rw-r--r--@ 1 501  20   11677 Oct  8 20:08 api.yaml
-rw-r--r--@ 1 501  20     645 Oct  8 20:08 basic-auth-extension.yaml
-rw-r--r--@ 1 501  20  184030 Oct  8 20:08 clients.yaml
-rw-r--r--@ 1 501  20    3603 Oct  8 20:08 coordinator-common.yaml
-rw-r--r--@ 1 501  20  173313 Oct  8 20:08 core.yaml
-rw-r--r--@ 1 501  20     547 Oct  8 20:08 examples.yaml
-rw-r--r--@ 1 501  20    1372 Oct  8 20:08 file.yaml
-rw-r--r--@ 1 501  20    3756 Oct  8 20:08 generator.yaml
-rw-r--r--@ 1 501  20   31243 Oct  8 20:08 group-coordinator.yaml
-rw-r--r--@ 1 501  20    2257 Oct  8 20:08 json.yaml
-rw-r--r--@ 1 501  20     452 Oct  8 20:08 log4j-appender.yaml
-rw-r--r--@ 1 501  20   29454 Oct  8 20:08 metadata.yaml
-rw-r--r--@ 1 501  20     719 Oct  8 20:08 mirror-client.yaml
-rw-r--r--@ 1 501  20    9215 Oct  8 20:08 mirror.yaml
-rw-r--r--@ 1 501  20   22265 Oct  8 20:08 raft.yaml
-rw-r--r--@ 1 501  20   54940 Oct  8 20:08 runtime.yaml
-rw-r--r--@ 1 501  20   11227 Oct  8 20:08 server-common.yaml
-rw-r--r--@ 1 501  20    4511 Oct  8 20:08 server.yaml
-rw-r--r--@ 1 501  20    1647 Oct  8 20:08 share-coordinator.yaml
-rw-r--r--@ 1 501  20     329 Oct  8 20:08 share.yaml
-rw-r--r--@ 1 501  20     710 Oct  8 20:08 shell.yaml
-rw-r--r--@ 1 501  20     187 Oct  8 20:08 storage-api.yaml
-rw-r--r--@ 1 501  20   14613 Oct  8 20:08 storage.yaml
-rw-r--r--@ 1 501  20    3655 Oct  8 20:08 streams-scala.yaml
-rw-r--r--@ 1 501  20  288534 Oct  8 20:08 streams.yaml
-rw-r--r--@ 1 501  20     739 Oct  8 20:08 test-common-api.yaml
-rw-r--r--@ 1 501  20     349 Oct  8 20:08 test-common.yaml
-rw-r--r--@ 1 501  20    6006 Oct  8 20:08 test-utils.yaml
-rw-r--r--@ 1 501  20      76 Oct  8 20:08 tools-api.yaml
-rw-r--r--@ 1 501  20   23586 Oct  8 20:08 tools.yaml
-rw-r--r--@ 1 501  20    8309 Oct  8 20:08 transforms.yaml
-rw-r--r--@ 1 501  20    3750 Oct  8 20:08 trogdor.yaml

head of metadata.yaml

org.apache.kafka.controller.AclControlManagerTest:
- testAddAndDelete
- testCreateAclDeleteAcl
- testDeleteDedupe
- testLoadSnapshot
- testValidateFilter
- testValidateNewAcl
org.apache.kafka.controller.ActivationRecordsGeneratorTest:
- testActivationMessageForEmptyLog
- testActivationMessageForNonEmptyLogNoMigrations

mumrah · 2024-10-09T14:23:31Z

@chia7712 let me know what you think of this approach.

Once this catalog file is populated, I intend to use it like this:

In GHA, checkout the test-catalog branch from "7 days ago"
Run a Python script to convert the catalog into something easy to ingest in Java (single file of className#methodName)
Pass this file into JUnit ExecutionCondition plugin
The plugin compares each test being run with the catalog to determine if it should be quarantined or not

chia7712

@mumrah thanks for this patch. Could you please open a sub task for this PR as KAFKA-17629 has some sub tasks already. I've left a few questions for this PR, and they might be a bit silly. 😄

The intention of these YAML files is to be checked into an orphaned Git branch and serve as a historical test catalog.

Do you mean we'll have a specific test-catalog branch containing only YAML files?

Once this catalog file is populated, I intend to use it like this:

How, when, and who is allowed to push commits to test-catalog?

-rw-r--r--@ 1 501 20 11677 Oct 8 20:08 api.yaml

should we add prefix to those connect modules?

.github/scripts/junit.py

chia7712 · 2024-10-10T10:12:36Z

.github/workflows/build.yml

+        with:
+          name: junit-xml-${{ matrix.java }}
+          path: |
+            build/junit-xml/**/*.xml


Should we export the artifact url for this task?

I'm not sure we'll keep this step in long term since it is ~50Mb compressed (~1GB decompressed!).

mumrah · 2024-10-10T14:53:52Z

Do you mean we'll have a specific test-catalog branch containing only YAML files?

Yes, exactly. The branch will only have the test catalog files and no history otherwise (i.e., an orphaned branch). This is the same approach the GitHub Pages uses with the gh-pages branch. Here is an outdated example from my fork: https://github.com/mumrah/kafka/tree/test-metrics/

How, when, and who is allowed to push commits to test-catalog?

My plan is to have the trunk CI build run an additional job after "test" which checks out this branch, copies in the newly generated YAML files, commits changes (if any), and pushes. I don't anticipate committers needing to deal with this file directly.

Since it's a regular branch, any committer can modify it, but probably shouldn't.

should we add prefix to those connect modules?

~~Hm, seems the upload-artifact action is doing something weird to our paths. I'll investigate.~~ Yea, I'll fix

mumrah · 2024-10-10T18:14:40Z

Here's the latest catalog files:

Creating output directory ./test-catalog for test catalog export.
  Writing 101 tests for generator into ./test-catalog/generator.yaml.
  Writing 1 tests for tools-tools-api into ./test-catalog/tools-tools-api.yaml.
  Writing 764 tests for metadata into ./test-catalog/metadata.yaml.
  Writing 313 tests for connect-api into ./test-catalog/connect-api.yaml.
  Writing 20 tests for test-common-test-common-api into ./test-catalog/test-common-test-common-api.yaml.
  Writing 6 tests for test-common into ./test-catalog/test-common.yaml.
  Writing 224 tests for connect-mirror into ./test-catalog/connect-mirror.yaml.
  Writing 69 tests for streams-streams-scala into ./test-catalog/streams-streams-scala.yaml.
  Writing 85 tests for connect-json into ./test-catalog/connect-json.yaml.
  Writing 11 tests for streams-examples into ./test-catalog/streams-examples.yaml.
  Writing 2 tests for storage-storage-api into ./test-catalog/storage-storage-api.yaml.
  Writing 88 tests for trogdor into ./test-catalog/trogdor.yaml.
  Writing 10 tests for log4j-appender into ./test-catalog/log4j-appender.yaml.
  Writing 92 tests for coordinator-common into ./test-catalog/coordinator-common.yaml.
  Writing 1302 tests for connect-runtime into ./test-catalog/connect-runtime.yaml.
  Writing 17 tests for connect-basic-auth-extension into ./test-catalog/connect-basic-auth-extension.yaml.
  Writing 97 tests for server into ./test-catalog/server.yaml.
  Writing 159 tests for streams-test-utils into ./test-catalog/streams-test-utils.yaml.
  Writing 40 tests for share-coordinator into ./test-catalog/share-coordinator.yaml.
  Writing 317 tests for storage into ./test-catalog/storage.yaml.
  Writing 568 tests for tools into ./test-catalog/tools.yaml.
  Writing 17 tests for connect-mirror-client into ./test-catalog/connect-mirror-client.yaml.
  Writing 223 tests for connect-transforms into ./test-catalog/connect-transforms.yaml.
  Writing 5 tests for share into ./test-catalog/share.yaml.
  Writing 21 tests for shell into ./test-catalog/shell.yaml.
  Writing 582 tests for raft into ./test-catalog/raft.yaml.
  Writing 5710 tests for streams into ./test-catalog/streams.yaml.
  Writing 38 tests for connect-file into ./test-catalog/connect-file.yaml.
  Writing 3924 tests for core into ./test-catalog/core.yaml.
  Writing 1 tests for runtime into ./test-catalog/runtime.yaml.
  Writing 706 tests for group-coordinator into ./test-catalog/group-coordinator.yaml.
  Writing 4405 tests for clients into ./test-catalog/clients.yaml.
  Writing 318 tests for server-common into ./test-catalog/server-common.yaml.
  Wrote 20236 tests into test catalog.

Notice we now have "connect-file", "connect-api", "connect-transforms", etc.

chia7712 · 2024-10-11T03:54:55Z

build.gradle

-
+/**
+ * For a given Project, compute a nice dash separated directory name
+ * to store the JUnit XML files in. E.g., Project ":connect:api" -> "connect-api"


We don't have consistent sub-module naming. The connect-related modules don't have a prefix, while other sub-modules must include a prefix to avoid conflicts to connect's sub modules. This means the helper should only revise the module names for the connect modules.

I considered this, but didn't want to add special cases to the "module name finding" method.

The module name computed here (e.g., connect-api, clients, group-coordinator-group-coordinator-api) will determine the name of the directory in build/junit-xml. As long as its unique, it doesn't matter too much what it is.

Another option here would be to reflect the actual module paths in the output directory paths. So, :connect:api test results would end up in "build/junit-xml/connect/api/[test, quarantinedTest]/...". This would also cause the test catalog directories to mirror the actual layout of the project.

chia7712

@mumrah thanks for updates

chia7712 · 2024-10-11T14:20:54Z

build.gradle

@@ -503,7 +518,8 @@ subprojects {
    // were not run, but instead were restored via FROM-CACHE. See KAFKA-17479 for more details.
    doLast {
      if (ext.isGithubActions) {
-        def dest = rootProject.layout.buildDirectory.dir("junit-xml/${project.name}").get().asFile
+        def moduleDirPath = projectToJUnitXmlPath(project)


Should we apply this change to quarantinedTest as well?

mumrah · 2024-10-11T17:17:36Z

With latest changes, the files in the test catalog:

./clients
./clients/tests.yaml
./tools
./tools/tests.yaml
./tools/tools-api
./tools/tools-api/tests.yaml
./core
./core/tests.yaml
./group-coordinator
./group-coordinator/tests.yaml
./server-common
./server-common/tests.yaml
./trogdor
./trogdor/tests.yaml
./shell
./shell/tests.yaml
./streams
./streams/tests.yaml
./streams/test-utils
./streams/test-utils/tests.yaml
./streams/streams-scala
./streams/streams-scala/tests.yaml
./streams/examples
./streams/examples/tests.yaml
./runtime
./runtime/tests.yaml
./server
./server/tests.yaml
./test-common
./test-common/tests.yaml
./test-common/test-common-api
./test-common/test-common-api/tests.yaml
./storage
./storage/tests.yaml
./storage/storage-api
./storage/storage-api/tests.yaml
./share-coordinator
./share-coordinator/tests.yaml
./generator
./generator/tests.yaml
./coordinator-common
./coordinator-common/tests.yaml
./log4j-appender
./log4j-appender/tests.yaml
./raft
./raft/tests.yaml
./connect
./connect/file
./connect/file/tests.yaml
./connect/runtime
./connect/runtime/tests.yaml
./connect/basic-auth-extension
./connect/basic-auth-extension/tests.yaml
./connect/json
./connect/json/tests.yaml
./connect/transforms
./connect/transforms/tests.yaml
./connect/api
./connect/api/tests.yaml
./connect/mirror-client
./connect/mirror-client/tests.yaml
./connect/mirror
./connect/mirror/tests.yaml
./metadata
./metadata/tests.yaml
./share
./share/tests.yaml

mumrah · 2024-10-29T13:57:49Z

Verified that things look good on trunk.

When comparing the two test catalogs generated before and after this commit, we see:

diff --git a/streams/tests.yaml b/streams/tests.yaml
index 3f2250a..ded0055 100644
--- a/streams/tests.yaml
+++ b/streams/tests.yaml
@@ -2440,6 +2440,7 @@ org.apache.kafka.streams.processor.internals.PartitionGroupTest:
 - shouldThrowIllegalStateExceptionUponAddRecordsIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponGetHeadRecordOffsetIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponGetPartitionTimestampIfPartitionUnknown
+- shouldThrowIllegalStateExceptionUponHeadRecordLeaderEpochIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponNumBufferedIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponSetPartitionTimestampIfPartitionUnknown
 - shouldUpdateLags
@@ -2905,7 +2906,7 @@ org.apache.kafka.streams.processor.internals.StreamTaskTest:
 - shouldClearCommitStatusesInCloseDirty
 - shouldClearTaskTimeout
 - shouldCloseStateManagerEvenDuringFailureOnUncleanTaskClose
-- shouldCommitConsumerPositionIfRecordQueueIsEmpty
+- shouldCommitFetchedNextOffsetIfRecordQueueIsEmpty
 - shouldCommitNextOffsetAndProcessorMetadataFromQueueIfAvailable
 - shouldCommitOldProcessorMetadataWhenNotDirty
 - shouldFailOnCommitIfTaskIsClosed

which matches the changes in that commit

mumrah added 3 commits October 7, 2024 12:47

export run tests to yaml files

d44f547

generate test catalog

c87d942

add yaml dep

40c21ad

github-actions bot added the build Gradle build or GitHub Actions label Oct 7, 2024

revert unintended changes

0b79087

mumrah requested a review from chia7712 October 7, 2024 17:26

mumrah added 3 commits October 8, 2024 10:52

Merge remote-tracking branch 'origin/trunk' into KAFKA-17629-part-1

a1a8980

fix module path

d67731e

Merge remote-tracking branch 'origin/trunk' into KAFKA-17629-part-1

ce798ce

chia7712 reviewed Oct 10, 2024

View reviewed changes

.github/scripts/junit.py Outdated Show resolved Hide resolved

.github/scripts/junit.py Outdated Show resolved Hide resolved

chia7712 reviewed Oct 10, 2024

View reviewed changes

mumrah changed the title ~~KAFKA-17629 Extract test catalog from JUnit output [1/n]~~ KAFKA-17767 Extract test catalog from JUnit output [1/n] Oct 10, 2024

fix junit-xml paths

bc1aa94

mumrah requested a review from chia7712 October 10, 2024 20:24

chia7712 reviewed Oct 11, 2024

View reviewed changes

mumrah added 2 commits October 11, 2024 10:16

preserve module paths

c18f854

Merge remote-tracking branch 'origin/trunk' into KAFKA-17629-part-1

4a6a9b1

chia7712 reviewed Oct 11, 2024

View reviewed changes

fix output directories for quarantinedTest as well

889f7a0

chia7712 approved these changes Oct 17, 2024

View reviewed changes

chia7712 merged commit ef6c950 into apache:trunk Oct 17, 2024
6 checks passed

This was referenced Oct 29, 2024

KAFKA-17767 Store the test catalog in Git [2/n] #17624

Closed

KAFKA-17767 Store the test catalog in Git [2/n] (2nd attempt) #17627

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

KAFKA-17767 Extract test catalog from JUnit output [1/n] #17397

KAFKA-17767 Extract test catalog from JUnit output [1/n] #17397

mumrah commented Oct 7, 2024 •

edited

Loading

mumrah commented Oct 8, 2024

mumrah commented Oct 9, 2024

mumrah commented Oct 9, 2024

chia7712 left a comment

chia7712 Oct 10, 2024

mumrah Oct 10, 2024

mumrah commented Oct 10, 2024 •

edited

Loading

mumrah commented Oct 10, 2024

chia7712 Oct 11, 2024

mumrah Oct 11, 2024 •

edited

Loading

chia7712 left a comment

chia7712 Oct 11, 2024

mumrah commented Oct 11, 2024

mumrah commented Oct 29, 2024

KAFKA-17767 Extract test catalog from JUnit output [1/n] #17397

KAFKA-17767 Extract test catalog from JUnit output [1/n] #17397

Conversation

mumrah commented Oct 7, 2024 • edited Loading

mumrah commented Oct 8, 2024

mumrah commented Oct 9, 2024

mumrah commented Oct 9, 2024

chia7712 left a comment

Choose a reason for hiding this comment

chia7712 Oct 10, 2024

Choose a reason for hiding this comment

mumrah Oct 10, 2024

Choose a reason for hiding this comment

mumrah commented Oct 10, 2024 • edited Loading

mumrah commented Oct 10, 2024

chia7712 Oct 11, 2024

Choose a reason for hiding this comment

mumrah Oct 11, 2024 • edited Loading

Choose a reason for hiding this comment

chia7712 left a comment

Choose a reason for hiding this comment

chia7712 Oct 11, 2024

Choose a reason for hiding this comment

mumrah commented Oct 11, 2024

mumrah commented Oct 29, 2024

mumrah commented Oct 7, 2024 •

edited

Loading

mumrah commented Oct 10, 2024 •

edited

Loading

mumrah Oct 11, 2024 •

edited

Loading