Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-17767 Extract test catalog from JUnit output [1/n] #17397

Merged
merged 11 commits into from
Oct 17, 2024

Conversation

mumrah
Copy link
Member

@mumrah mumrah commented Oct 7, 2024

This patch extracts all of the test classes and methods from the JUnit XML files and creates a set of YAML files.

The intention of these YAML files is to be checked into an orphaned Git branch and serve as a historical test catalog. Since the files are sorted, there will only be a diff if a test is added, removed, or renamed. This should limit the storage requirements for this "database". By keeping these files in Git, we can easily do things like checkout the file set at an older timestamp.

git checkout test-catalog @{'7 days ago'}

The purpose of this test catalog is to allow us to easily and efficiently answer the question: "what test has been added recently". More precisely, we want to know which tests have been added in the last 7 days including new tests in a given PR.

The choice of YAML over plain text or JSON was primarily for compactness. Class names are not repeated, which makes the data set quite a bit smaller. The structure of YAML also gives us flexibility in the future for adding other data to this catalog.

@github-actions github-actions bot added the build Gradle build or GitHub Actions label Oct 7, 2024
@mumrah mumrah requested a review from chia7712 October 7, 2024 17:26
@mumrah
Copy link
Member Author

mumrah commented Oct 8, 2024

This needs #17399 to fix a test name parsing issue

@mumrah
Copy link
Member Author

mumrah commented Oct 9, 2024

Here is what the test catalog looks like.

File listing:

-rw-r--r--@ 1 501  20   11677 Oct  8 20:08 api.yaml
-rw-r--r--@ 1 501  20     645 Oct  8 20:08 basic-auth-extension.yaml
-rw-r--r--@ 1 501  20  184030 Oct  8 20:08 clients.yaml
-rw-r--r--@ 1 501  20    3603 Oct  8 20:08 coordinator-common.yaml
-rw-r--r--@ 1 501  20  173313 Oct  8 20:08 core.yaml
-rw-r--r--@ 1 501  20     547 Oct  8 20:08 examples.yaml
-rw-r--r--@ 1 501  20    1372 Oct  8 20:08 file.yaml
-rw-r--r--@ 1 501  20    3756 Oct  8 20:08 generator.yaml
-rw-r--r--@ 1 501  20   31243 Oct  8 20:08 group-coordinator.yaml
-rw-r--r--@ 1 501  20    2257 Oct  8 20:08 json.yaml
-rw-r--r--@ 1 501  20     452 Oct  8 20:08 log4j-appender.yaml
-rw-r--r--@ 1 501  20   29454 Oct  8 20:08 metadata.yaml
-rw-r--r--@ 1 501  20     719 Oct  8 20:08 mirror-client.yaml
-rw-r--r--@ 1 501  20    9215 Oct  8 20:08 mirror.yaml
-rw-r--r--@ 1 501  20   22265 Oct  8 20:08 raft.yaml
-rw-r--r--@ 1 501  20   54940 Oct  8 20:08 runtime.yaml
-rw-r--r--@ 1 501  20   11227 Oct  8 20:08 server-common.yaml
-rw-r--r--@ 1 501  20    4511 Oct  8 20:08 server.yaml
-rw-r--r--@ 1 501  20    1647 Oct  8 20:08 share-coordinator.yaml
-rw-r--r--@ 1 501  20     329 Oct  8 20:08 share.yaml
-rw-r--r--@ 1 501  20     710 Oct  8 20:08 shell.yaml
-rw-r--r--@ 1 501  20     187 Oct  8 20:08 storage-api.yaml
-rw-r--r--@ 1 501  20   14613 Oct  8 20:08 storage.yaml
-rw-r--r--@ 1 501  20    3655 Oct  8 20:08 streams-scala.yaml
-rw-r--r--@ 1 501  20  288534 Oct  8 20:08 streams.yaml
-rw-r--r--@ 1 501  20     739 Oct  8 20:08 test-common-api.yaml
-rw-r--r--@ 1 501  20     349 Oct  8 20:08 test-common.yaml
-rw-r--r--@ 1 501  20    6006 Oct  8 20:08 test-utils.yaml
-rw-r--r--@ 1 501  20      76 Oct  8 20:08 tools-api.yaml
-rw-r--r--@ 1 501  20   23586 Oct  8 20:08 tools.yaml
-rw-r--r--@ 1 501  20    8309 Oct  8 20:08 transforms.yaml
-rw-r--r--@ 1 501  20    3750 Oct  8 20:08 trogdor.yaml

head of metadata.yaml

org.apache.kafka.controller.AclControlManagerTest:
- testAddAndDelete
- testCreateAclDeleteAcl
- testDeleteDedupe
- testLoadSnapshot
- testValidateFilter
- testValidateNewAcl
org.apache.kafka.controller.ActivationRecordsGeneratorTest:
- testActivationMessageForEmptyLog
- testActivationMessageForNonEmptyLogNoMigrations

@mumrah
Copy link
Member Author

mumrah commented Oct 9, 2024

@chia7712 let me know what you think of this approach.

Once this catalog file is populated, I intend to use it like this:

  • In GHA, checkout the test-catalog branch from "7 days ago"
  • Run a Python script to convert the catalog into something easy to ingest in Java (single file of className#methodName)
  • Pass this file into JUnit ExecutionCondition plugin
  • The plugin compares each test being run with the catalog to determine if it should be quarantined or not

Copy link
Contributor

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mumrah thanks for this patch. Could you please open a sub task for this PR as KAFKA-17629 has some sub tasks already. I've left a few questions for this PR, and they might be a bit silly. 😄

The intention of these YAML files is to be checked into an orphaned Git branch and serve as a historical test catalog.

Do you mean we'll have a specific test-catalog branch containing only YAML files?

Once this catalog file is populated, I intend to use it like this:

How, when, and who is allowed to push commits to test-catalog?

-rw-r--r--@ 1 501 20 11677 Oct 8 20:08 api.yaml

should we add prefix to those connect modules?

.github/scripts/junit.py Outdated Show resolved Hide resolved
.github/scripts/junit.py Outdated Show resolved Hide resolved
with:
name: junit-xml-${{ matrix.java }}
path: |
build/junit-xml/**/*.xml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we export the artifact url for this task?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we'll keep this step in long term since it is ~50Mb compressed (~1GB decompressed!).

@mumrah mumrah changed the title KAFKA-17629 Extract test catalog from JUnit output [1/n] KAFKA-17767 Extract test catalog from JUnit output [1/n] Oct 10, 2024
@mumrah
Copy link
Member Author

mumrah commented Oct 10, 2024

Do you mean we'll have a specific test-catalog branch containing only YAML files?

Yes, exactly. The branch will only have the test catalog files and no history otherwise (i.e., an orphaned branch). This is the same approach the GitHub Pages uses with the gh-pages branch. Here is an outdated example from my fork: https://github.com/mumrah/kafka/tree/test-metrics/

How, when, and who is allowed to push commits to test-catalog?

My plan is to have the trunk CI build run an additional job after "test" which checks out this branch, copies in the newly generated YAML files, commits changes (if any), and pushes. I don't anticipate committers needing to deal with this file directly.

Since it's a regular branch, any committer can modify it, but probably shouldn't.

should we add prefix to those connect modules?

Hm, seems the upload-artifact action is doing something weird to our paths. I'll investigate. Yea, I'll fix

@mumrah
Copy link
Member Author

mumrah commented Oct 10, 2024

Here's the latest catalog files:

Creating output directory ./test-catalog for test catalog export.
  Writing 101 tests for generator into ./test-catalog/generator.yaml.
  Writing 1 tests for tools-tools-api into ./test-catalog/tools-tools-api.yaml.
  Writing 764 tests for metadata into ./test-catalog/metadata.yaml.
  Writing 313 tests for connect-api into ./test-catalog/connect-api.yaml.
  Writing 20 tests for test-common-test-common-api into ./test-catalog/test-common-test-common-api.yaml.
  Writing 6 tests for test-common into ./test-catalog/test-common.yaml.
  Writing 224 tests for connect-mirror into ./test-catalog/connect-mirror.yaml.
  Writing 69 tests for streams-streams-scala into ./test-catalog/streams-streams-scala.yaml.
  Writing 85 tests for connect-json into ./test-catalog/connect-json.yaml.
  Writing 11 tests for streams-examples into ./test-catalog/streams-examples.yaml.
  Writing 2 tests for storage-storage-api into ./test-catalog/storage-storage-api.yaml.
  Writing 88 tests for trogdor into ./test-catalog/trogdor.yaml.
  Writing 10 tests for log4j-appender into ./test-catalog/log4j-appender.yaml.
  Writing 92 tests for coordinator-common into ./test-catalog/coordinator-common.yaml.
  Writing 1302 tests for connect-runtime into ./test-catalog/connect-runtime.yaml.
  Writing 17 tests for connect-basic-auth-extension into ./test-catalog/connect-basic-auth-extension.yaml.
  Writing 97 tests for server into ./test-catalog/server.yaml.
  Writing 159 tests for streams-test-utils into ./test-catalog/streams-test-utils.yaml.
  Writing 40 tests for share-coordinator into ./test-catalog/share-coordinator.yaml.
  Writing 317 tests for storage into ./test-catalog/storage.yaml.
  Writing 568 tests for tools into ./test-catalog/tools.yaml.
  Writing 17 tests for connect-mirror-client into ./test-catalog/connect-mirror-client.yaml.
  Writing 223 tests for connect-transforms into ./test-catalog/connect-transforms.yaml.
  Writing 5 tests for share into ./test-catalog/share.yaml.
  Writing 21 tests for shell into ./test-catalog/shell.yaml.
  Writing 582 tests for raft into ./test-catalog/raft.yaml.
  Writing 5710 tests for streams into ./test-catalog/streams.yaml.
  Writing 38 tests for connect-file into ./test-catalog/connect-file.yaml.
  Writing 3924 tests for core into ./test-catalog/core.yaml.
  Writing 1 tests for runtime into ./test-catalog/runtime.yaml.
  Writing 706 tests for group-coordinator into ./test-catalog/group-coordinator.yaml.
  Writing 4405 tests for clients into ./test-catalog/clients.yaml.
  Writing 318 tests for server-common into ./test-catalog/server-common.yaml.
  Wrote 20236 tests into test catalog.

Notice we now have "connect-file", "connect-api", "connect-transforms", etc.

@mumrah mumrah requested a review from chia7712 October 10, 2024 20:24

/**
* For a given Project, compute a nice dash separated directory name
* to store the JUnit XML files in. E.g., Project ":connect:api" -> "connect-api"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have consistent sub-module naming. The connect-related modules don't have a prefix, while other sub-modules must include a prefix to avoid conflicts to connect's sub modules. This means the helper should only revise the module names for the connect modules.

Copy link
Member Author

@mumrah mumrah Oct 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered this, but didn't want to add special cases to the "module name finding" method.

The module name computed here (e.g., connect-api, clients, group-coordinator-group-coordinator-api) will determine the name of the directory in build/junit-xml. As long as its unique, it doesn't matter too much what it is.

Another option here would be to reflect the actual module paths in the output directory paths. So, :connect:api test results would end up in "build/junit-xml/connect/api/[test, quarantinedTest]/...". This would also cause the test catalog directories to mirror the actual layout of the project.

Copy link
Contributor

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mumrah thanks for updates

@@ -503,7 +518,8 @@ subprojects {
// were not run, but instead were restored via FROM-CACHE. See KAFKA-17479 for more details.
doLast {
if (ext.isGithubActions) {
def dest = rootProject.layout.buildDirectory.dir("junit-xml/${project.name}").get().asFile
def moduleDirPath = projectToJUnitXmlPath(project)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we apply this change to quarantinedTest as well?

@mumrah
Copy link
Member Author

mumrah commented Oct 11, 2024

With latest changes, the files in the test catalog:

./clients
./clients/tests.yaml
./tools
./tools/tests.yaml
./tools/tools-api
./tools/tools-api/tests.yaml
./core
./core/tests.yaml
./group-coordinator
./group-coordinator/tests.yaml
./server-common
./server-common/tests.yaml
./trogdor
./trogdor/tests.yaml
./shell
./shell/tests.yaml
./streams
./streams/tests.yaml
./streams/test-utils
./streams/test-utils/tests.yaml
./streams/streams-scala
./streams/streams-scala/tests.yaml
./streams/examples
./streams/examples/tests.yaml
./runtime
./runtime/tests.yaml
./server
./server/tests.yaml
./test-common
./test-common/tests.yaml
./test-common/test-common-api
./test-common/test-common-api/tests.yaml
./storage
./storage/tests.yaml
./storage/storage-api
./storage/storage-api/tests.yaml
./share-coordinator
./share-coordinator/tests.yaml
./generator
./generator/tests.yaml
./coordinator-common
./coordinator-common/tests.yaml
./log4j-appender
./log4j-appender/tests.yaml
./raft
./raft/tests.yaml
./connect
./connect/file
./connect/file/tests.yaml
./connect/runtime
./connect/runtime/tests.yaml
./connect/basic-auth-extension
./connect/basic-auth-extension/tests.yaml
./connect/json
./connect/json/tests.yaml
./connect/transforms
./connect/transforms/tests.yaml
./connect/api
./connect/api/tests.yaml
./connect/mirror-client
./connect/mirror-client/tests.yaml
./connect/mirror
./connect/mirror/tests.yaml
./metadata
./metadata/tests.yaml
./share
./share/tests.yaml

@chia7712 chia7712 merged commit ef6c950 into apache:trunk Oct 17, 2024
6 checks passed
@mumrah
Copy link
Member Author

mumrah commented Oct 29, 2024

Verified that things look good on trunk.

When comparing the two test catalogs generated before and after this commit, we see:

diff --git a/streams/tests.yaml b/streams/tests.yaml
index 3f2250a..ded0055 100644
--- a/streams/tests.yaml
+++ b/streams/tests.yaml
@@ -2440,6 +2440,7 @@ org.apache.kafka.streams.processor.internals.PartitionGroupTest:
 - shouldThrowIllegalStateExceptionUponAddRecordsIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponGetHeadRecordOffsetIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponGetPartitionTimestampIfPartitionUnknown
+- shouldThrowIllegalStateExceptionUponHeadRecordLeaderEpochIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponNumBufferedIfPartitionUnknown
 - shouldThrowIllegalStateExceptionUponSetPartitionTimestampIfPartitionUnknown
 - shouldUpdateLags
@@ -2905,7 +2906,7 @@ org.apache.kafka.streams.processor.internals.StreamTaskTest:
 - shouldClearCommitStatusesInCloseDirty
 - shouldClearTaskTimeout
 - shouldCloseStateManagerEvenDuringFailureOnUncleanTaskClose
-- shouldCommitConsumerPositionIfRecordQueueIsEmpty
+- shouldCommitFetchedNextOffsetIfRecordQueueIsEmpty
 - shouldCommitNextOffsetAndProcessorMetadataFromQueueIfAvailable
 - shouldCommitOldProcessorMetadataWhenNotDirty
 - shouldFailOnCommitIfTaskIsClosed

which matches the changes in that commit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Gradle build or GitHub Actions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants