KAFKA-17767 Extract test catalog from JUnit output [1/n] #17397
This needs #17399 to fix a test name parsing issue.
Here is what the test catalog looks like. File listing:
@chia7712 let me know what you think of this approach. Once this catalog file is populated, I intend to use it like this:
@mumrah thanks for this patch. Could you please open a subtask for this PR, since KAFKA-17629 already has some subtasks? I've left a few questions on this PR, and they might be a bit silly. 😄
The intention of these YAML files is to be checked into an orphaned Git branch and serve as a historical test catalog.
Do you mean we'll have a specific test-catalog branch containing only YAML files?
Once this catalog file is populated, I intend to use it like this:
How, when, and who is allowed to push commits to test-catalog?
-rw-r--r--@ 1 501 20 11677 Oct 8 20:08 api.yaml
Should we add a prefix to those connect modules?
with:
  name: junit-xml-${{ matrix.java }}
  path: |
    build/junit-xml/**/*.xml
Should we export the artifact URL for this task?
I'm not sure we'll keep this step in the long term, since it is ~50 MB compressed (~1 GB decompressed!).
Yes, exactly. The branch will only have the test catalog files and no history otherwise (i.e., an orphaned branch). This is the same approach GitHub Pages uses with the
My plan is to have the trunk CI build run an additional job after "test" which checks out this branch, copies in the newly generated YAML files, commits changes (if any), and pushes. I don't anticipate committers needing to deal with this file directly. Since it's a regular branch, any committer can modify it, but probably shouldn't.
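As a rough illustration of that publish step, here is a minimal Python sketch. The branch name `test-catalog`, the commit message, and the `publish_catalog` helper with its injectable `run` callable are all assumptions made for this example; the real job may simply be a few workflow steps. Note that the sketch only copies files in and does not delete stale ones.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, List


def git(args: List[str], cwd: Path) -> str:
    """Run a git command in cwd and return its stdout."""
    result = subprocess.run(["git", *args], cwd=cwd, check=True,
                            capture_output=True, text=True)
    return result.stdout


def publish_catalog(generated: Path, checkout: Path,
                    run: Callable[[List[str], Path], str] = git,
                    branch: str = "test-catalog") -> bool:
    """Copy generated YAML files into a checkout of the catalog branch,
    then commit and push only if something actually changed.

    Returns True if a commit was pushed, False if there was nothing to do.
    """
    for src in sorted(generated.rglob("*.yaml")):
        dest = checkout / src.relative_to(generated)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)

    run(["add", "--all"], checkout)
    # An empty porcelain status means the catalog is unchanged.
    if not run(["status", "--porcelain"], checkout).strip():
        return False
    run(["commit", "-m", "Update test catalog"], checkout)
    run(["push", "origin", branch], checkout)
    return True
```

Passing a fake `run` makes the "commit only if changed" logic easy to exercise without touching a real repository.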
Here are the latest catalog files:
Notice we now have "connect-file", "connect-api", "connect-transforms", etc.
/**
 * For a given Project, compute a nice dash separated directory name
 * to store the JUnit XML files in. E.g., Project ":connect:api" -> "connect-api"
We don't have consistent sub-module naming. The connect-related modules don't have a prefix, while other sub-modules must include a prefix to avoid conflicts with connect's sub-modules. This means the helper should only revise the module names for the connect modules.
I considered this, but didn't want to add special cases to the "module name finding" method.
The module name computed here (e.g., connect-api, clients, group-coordinator-group-coordinator-api) will determine the name of the directory in build/junit-xml. As long as it's unique, it doesn't matter too much what it is.
Another option here would be to reflect the actual module paths in the output directory paths. So, :connect:api test results would end up in "build/junit-xml/connect/api/[test, quarantinedTest]/...". This would also cause the test catalog directories to mirror the actual layout of the project.
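To make the two naming options concrete, here is a small Python sketch (not the Gradle helper from this PR). The function names `dashed_module_name` and `nested_module_dir` are hypothetical, and the sketch assumes a Gradle project path is a colon-separated string such as ":connect:api".

```python
from pathlib import PurePosixPath


def dashed_module_name(project_path: str) -> str:
    """Join the path segments with dashes, e.g. ':connect:api' -> 'connect-api'."""
    return "-".join(part for part in project_path.split(":") if part)


def nested_module_dir(project_path: str, task: str = "test") -> PurePosixPath:
    """Mirror the project layout instead, e.g.
    ':connect:api' -> 'build/junit-xml/connect/api/test'."""
    parts = [part for part in project_path.split(":") if part]
    return PurePosixPath("build/junit-xml", *parts, task)
```

Both variants stay unique as long as the project paths are unique; the dashed form just flattens the hierarchy into a single directory level.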
@mumrah thanks for updates
@@ -503,7 +518,8 @@ subprojects {
      // were not run, but instead were restored via FROM-CACHE. See KAFKA-17479 for more details.
      doLast {
        if (ext.isGithubActions) {
          def dest = rootProject.layout.buildDirectory.dir("junit-xml/${project.name}").get().asFile
          def moduleDirPath = projectToJUnitXmlPath(project)
Should we apply this change to quarantinedTest as well?
With the latest changes, the files in the test catalog are:
Verified that things look good on trunk. When comparing the two test catalogs generated before and after this commit, we see:
which matches the changes in that commit.
This patch extracts all of the test classes and methods from the JUnit XML files and creates a set of YAML files.
The intention of these YAML files is to be checked into an orphaned Git branch and serve as a historical test catalog. Since the files are sorted, there will only be a diff if a test is added, removed, or renamed. This should limit the storage requirements for this "database". By keeping these files in Git, we can easily do things like check out the file set as of an older timestamp.
The purpose of this test catalog is to allow us to easily and efficiently answer the question: "Which tests have been added recently?" More precisely, we want to know which tests have been added in the last 7 days, including new tests in a given PR.
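One way to picture that query is as a set difference between two catalog snapshots: a baseline (for example, the catalog as it was 7 days ago) and the current one (for example, the catalog generated for a PR). The sketch below assumes each catalog has already been loaded into a mapping of class name to test method names; the `Catalog` alias and the `new_tests` helper are hypothetical names for illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed in-memory shape: class name -> test method names.
Catalog = Dict[str, List[str]]


def flatten(catalog: Catalog) -> Set[Tuple[str, str]]:
    """Turn the nested catalog into a set of (class, method) pairs."""
    return {(cls, method) for cls, methods in catalog.items() for method in methods}


def new_tests(baseline: Catalog, current: Catalog) -> List[Tuple[str, str]]:
    """Return tests present in current but not in baseline, sorted."""
    return sorted(flatten(current) - flatten(baseline))
```

A renamed test shows up here as a new test, which matches the point above that renames produce a diff in the catalog.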
The choice of YAML over plain text or JSON was primarily for compactness. Class names are not repeated, which makes the data set quite a bit smaller. The structure of YAML also gives us flexibility in the future for adding other data to this catalog.
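The extraction and the resulting file shape can be sketched as follows. This is not the script from this PR: it assumes the usual JUnit XML layout where each `testcase` element carries `classname` and `name` attributes, and the helper names `collect_tests` and `render_yaml` are made up for the example. Method names are emitted as double-quoted scalars (via JSON string escaping) so that parameterized test names with special characters remain valid YAML.

```python
import json
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Iterable, List


def collect_tests(xml_documents: Iterable[str]) -> Dict[str, List[str]]:
    """Group test method names by class across several JUnit XML documents."""
    by_class = defaultdict(set)
    for doc in xml_documents:
        root = ET.fromstring(doc)
        for case in root.iter("testcase"):
            cls, name = case.get("classname"), case.get("name")
            if cls and name:
                by_class[cls].add(name)
    # Sort classes and methods so the output only changes when tests change.
    return {cls: sorted(by_class[cls]) for cls in sorted(by_class)}


def render_yaml(catalog: Dict[str, List[str]]) -> str:
    """Render one key per class, followed by its methods as a YAML list.

    Assumes class names are plain scalars that need no quoting.
    """
    lines = []
    for cls, methods in catalog.items():
        lines.append(f"{cls}:")
        lines.extend(f"- {json.dumps(method)}" for method in methods)
    return "\n".join(lines) + "\n"
```

Because each class name appears once as a key, a class with many test methods costs one line plus one short line per method, which is where the size advantage over a flat "class#method" listing would come from.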