Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ResponseOps][Task Manager][mget Claimer] finish jest, functional, integration tests #184942

Closed
pmuellr opened this issue Jun 6, 2024 · 6 comments · Fixed by #196399
Closed
Assignees
Labels
Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)

Comments

@pmuellr
Copy link
Member

pmuellr commented Jun 6, 2024

In PR implement task claiming strategy mget #180485 we implemented an alternative task claiming strategy, but it has the following problem:

The jest and function tests are incomplete.

The jest tests do not test all the conditions of the default claimer, which they should.

Some of the function tests added in #180485 were skipped as they were flaky or always failing - those obviously need to be fixed. The functional tests that are there were copied from the existing task manager tests in https://github.com/elastic/kibana/tree/main/x-pack/test/plugin_api_integration/test_suites/task_manager - we should see if there are other function tests we can copy.

We should also see if we can just reuse the existing function tests drectly, instead of copying them. We'll still need a separate config, but maybe we can at least remove the copies of the test modules.

We should also see if we can build a simple integration test that launches multiple Kibanas, and ensures the claimers are working as expected (unclear what that means). We should actually try to make this test work with both claimers!

@pmuellr pmuellr added Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jun 6, 2024
@elasticmachine
Copy link
Contributor

Pinging @elastic/response-ops (Team:ResponseOps)

@mikecote
Copy link
Contributor

@pmuellr when we switch the default task claimer to mget, and say we make the mget specific test suite now update_by_query specific (for legacy purposes), does this reduce or eliminate the list of tests we need to add?

I'm working on a draft PR to see what test failures will come up once we switch to mget (and a poll interval of 500) => #190148

@pmuellr
Copy link
Member Author

pmuellr commented Aug 22, 2024

My initial thought was ...

We should also see if we can just reuse the existing function tests drectly, instead of copying them. We'll still need a separate config, but maybe we can at least remove the copies of the test modules.

So likely the FT tests, which aren't sensitive to the claimer, should hopefully be reused - but we need to figure out how to do that :-). FTR work, but not new tests per-se.

@mikecote
Copy link
Contributor

Gotcha, so if we switch the claimer used for all the tests to mget, we would get all the coverage necessary? Leaving behind update_by_query that now wouldn't be as fully covered. Perhaps that's sufficient? 🤔

@pmuellr
Copy link
Member Author

pmuellr commented Sep 3, 2024

Gotcha, so if we switch the claimer used for all the tests to mget, we would get all the coverage necessary? Leaving behind update_by_query that now wouldn't be as fully covered. Perhaps that's sufficient? 🤔

Seems like to me it's the minimum anyway. We will likely want some specialized tests, if possible, to deal with multiple costs, cost left over after claim, etc, etc - that's different than UBQ claimer. But it would at least be good to see all the existing tests, including implicitly the ones for alerting rules, running with the new claimer, just to make sure we haven't broken anything we already have tests for.

Perhaps we should create a floating PR that just whacks all the FTR configs to use the new claimer, and see what happens in CI. We can rebase it when we want to test any changes, throw it away when we decide what to REALLY do to test both of the claimers ...

@mikecote
Copy link
Contributor

mikecote commented Sep 5, 2024

I have this PR #190148 that hits all the test suites using the mget task claimer. There was a few tests I had to modify to fix race conditions but the results are looking good so far!

@mikecote mikecote assigned mikecote and unassigned pmuellr Oct 8, 2024
mikecote added a commit that referenced this issue Oct 21, 2024
…aimer being the default (#196399)

Resolves #184942
Resolves #192023
Resolves #195573

In this PR, I'm improving the flakiness found in our functional tests in
preperation for mget being the default task claimer that all these tests
run with (#194625). Because the
mget task claimer works differently and also polls more frequently, we
end-up in situations where tasks run faster than they were with
update_by_query, creating more race conditions that are now fixed in
this PR.

Issues were surfaced via #190148
where I set `mget` as the default task claiming strategy.

Flaky test runs (some of these failed on other tests that are flaky):
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185
(for
0fcf1ae)
mikecote added a commit to mikecote/kibana that referenced this issue Oct 21, 2024
…aimer being the default (elastic#196399)

Resolves elastic#184942
Resolves elastic#192023
Resolves elastic#195573

In this PR, I'm improving the flakiness found in our functional tests in
preperation for mget being the default task claimer that all these tests
run with (elastic#194625). Because the
mget task claimer works differently and also polls more frequently, we
end-up in situations where tasks run faster than they were with
update_by_query, creating more race conditions that are now fixed in
this PR.

Issues were surfaced via elastic#190148
where I set `mget` as the default task claiming strategy.

Flaky test runs (some of these failed on other tests that are flaky):
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176
-
https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185
(for
elastic@0fcf1ae)

(cherry picked from commit 3b8cf12)

# Conflicts:
#	x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/alerts_as_data/alerts_as_data_flapping.ts
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams)
Projects
None yet
4 participants