-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ability to split integration tests into different groups #3544
Conversation
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
🌐 Coverage report
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this and it the changes are surprisingly simple. Thanks!
I think the hardest thing here will be making sure that it is well understood and used properly. At minimum we will need some explanation of this in https://github.com/elastic/elastic-agent/blob/main/docs/test-framework-dev-guide.md as well.
I have two thoughts to try to help this:
-
Consider making a
ShardID
mandatory and predefine a default shard ID for all tests to reuse. This is more likely to work properly if someone copy/pastes from an existing test IMO, because people will see there is always a shard ID and that some tests use a default. Without this it is harder to notice that ShardID is sometimes undefined, and you aren't always going to be lead to the ShardID concept depending on which test you start from. -
We should call this something other than
ShardID
. Giving framework users control over which tests run where makes sense, but the concept of sharding doesn't have a default definition for test frameworks. This also avoids requiring users to understand what sharding means in general. As for what else to call it, maybeTestMachineGroup
to try to communicate that all tests in the same group will run on the same VM.
@cmacknz I took your feedback and change it to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM, a bunch of typos and some small clarifications to make before approval though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aside from typos craig pointed out.
Co-authored-by: Craig MacKenzie <[email protected]>
We had some really old firewall rules that where created by OGC around in GCP. This was causing this to fail because it now creates more instances than it used to. OGC creates a firewall rule per instance and this was causing the quota to hit over 500 because there was around 400 old firewall rules around. I have cleaned those up, with hopes it allows the next run to pass. |
buildkite test this |
buildkite test this |
buildkite test this |
SonarQube Quality Gate 0 Bugs No Coverage information |
This pull request is now in conflicts. Could you fix it? 🙏
|
Bringing it back to draft as it has been open for a long time without any movement. |
This pull request does not have a backport label. Could you fix it @blakerouse? 🙏
NOTE: |
I can see that the runner got stuck similar to the issue we have been seeing where it just stays running forever. With the groups it makes it easier to narrow it down to which group of tests is getting stuck: I went through the logs and split out the required information to see what ran or still is running. Manually correlating each line to its runner. The breakdown is below and you can tell that
Last line from
|
SonarQube Quality Gate 0 Bugs No Coverage information |
* Add ability to shard integration tests across muliple hosts. * Change to Group. * Add to dev guide. * Fix magefile. * Apply suggestions from code review Co-authored-by: Craig MacKenzie <[email protected]> * Add define.Default constant. * Remove isolate. * Fix tests. * Update groups. * Adjust groups. * Adjust some groups, add debugging SSH key. * Only windows default group with all logs. * Remove windows specific. * Fix add_cloud_metadata allowed errors. * Another allowed error. * Yet another error. --------- Co-authored-by: Craig MacKenzie <[email protected]> (cherry picked from commit 8a8abd0) # Conflicts: # magefile.go # pkg/testing/runner/config.go # pkg/testing/runner/runner.go # testing/integration/beats_serverless_test.go # testing/integration/install_unprivileged_test.go # testing/integration/logs_ingestion_test.go # testing/integration/upgrade_fleet_test.go
…fferent groups (#3841) * Add ability to split integration tests into different groups (#3544) * Add ability to shard integration tests across muliple hosts. * Change to Group. * Add to dev guide. * Fix magefile. * Apply suggestions from code review Co-authored-by: Craig MacKenzie <[email protected]> * Add define.Default constant. * Remove isolate. * Fix tests. * Update groups. * Adjust groups. * Adjust some groups, add debugging SSH key. * Only windows default group with all logs. * Remove windows specific. * Fix add_cloud_metadata allowed errors. * Another allowed error. * Yet another error. --------- Co-authored-by: Craig MacKenzie <[email protected]> (cherry picked from commit 8a8abd0) # Conflicts: # magefile.go # pkg/testing/runner/config.go # pkg/testing/runner/runner.go # testing/integration/beats_serverless_test.go # testing/integration/install_unprivileged_test.go # testing/integration/logs_ingestion_test.go # testing/integration/upgrade_fleet_test.go * Fix conflicts. --------- Co-authored-by: Blake Rouse <[email protected]>
What does this PR do?
Adds a required field
Group
todefine.Requirements
. Adds ability to select a specific group of tests withTEST_GROUPS
environment variable. Groups batches byGroup
name splitting the testing load across multiple instances.Why is it important?
Allows tests to be grouped to different instances. Allowing more tests to run concurrently to speed up the time it takes for integration tests to run. Allows the selection of running only a group of tests.
Checklist
[ ] I have made corresponding change to the default configuration files[ ] I have added an entry in./changelog/fragments
using the changelog toolHow to test this PR locally
Run
mage integration:test
see that multiple instances are spawned instead of only one per platform OS/architecture.Select a specific group with
TEST_GROUPS="default" mage integration:test
.