[ResponseOps][Task Manager][mget Claimer] finish jest, functional, integration tests #184942

pmuellr · 2024-06-06T15:05:48Z

In PR implement task claiming strategy mget #180485 we implemented an alternative task claiming strategy, but it has the following problem:

The jest and function tests are incomplete.

The jest tests do not test all the conditions of the default claimer, which they should.

Some of the function tests added in #180485 were skipped as they were flaky or always failing - those obviously need to be fixed. The functional tests that are there were copied from the existing task manager tests in https://github.com/elastic/kibana/tree/main/x-pack/test/plugin_api_integration/test_suites/task_manager - we should see if there are other function tests we can copy.

We should also see if we can just reuse the existing function tests drectly, instead of copying them. We'll still need a separate config, but maybe we can at least remove the copies of the test modules.

We should also see if we can build a simple integration test that launches multiple Kibanas, and ensures the claimers are working as expected (unclear what that means). We should actually try to make this test work with both claimers!

elasticmachine · 2024-06-06T15:05:50Z

Pinging @elastic/response-ops (Team:ResponseOps)

mikecote · 2024-08-22T12:52:30Z

@pmuellr when we switch the default task claimer to mget, and say we make the mget specific test suite now update_by_query specific (for legacy purposes), does this reduce or eliminate the list of tests we need to add?

I'm working on a draft PR to see what test failures will come up once we switch to mget (and a poll interval of 500) => #190148

pmuellr · 2024-08-22T14:22:41Z

My initial thought was ...

We should also see if we can just reuse the existing function tests drectly, instead of copying them. We'll still need a separate config, but maybe we can at least remove the copies of the test modules.

So likely the FT tests, which aren't sensitive to the claimer, should hopefully be reused - but we need to figure out how to do that :-). FTR work, but not new tests per-se.

mikecote · 2024-08-22T15:10:04Z

Gotcha, so if we switch the claimer used for all the tests to mget, we would get all the coverage necessary? Leaving behind update_by_query that now wouldn't be as fully covered. Perhaps that's sufficient? 🤔

pmuellr · 2024-09-03T18:50:41Z

Gotcha, so if we switch the claimer used for all the tests to mget, we would get all the coverage necessary? Leaving behind update_by_query that now wouldn't be as fully covered. Perhaps that's sufficient? 🤔

Seems like to me it's the minimum anyway. We will likely want some specialized tests, if possible, to deal with multiple costs, cost left over after claim, etc, etc - that's different than UBQ claimer. But it would at least be good to see all the existing tests, including implicitly the ones for alerting rules, running with the new claimer, just to make sure we haven't broken anything we already have tests for.

Perhaps we should create a floating PR that just whacks all the FTR configs to use the new claimer, and see what happens in CI. We can rebase it when we want to test any changes, throw it away when we decide what to REALLY do to test both of the claimers ...

mikecote · 2024-09-05T16:51:05Z

I have this PR #190148 that hits all the test suites using the mget task claimer. There was a few tests I had to modify to fix race conditions but the results are looking good so far!

…aimer being the default (#196399) Resolves #184942 Resolves #192023 Resolves #195573 In this PR, I'm improving the flakiness found in our functional tests in preperation for mget being the default task claimer that all these tests run with (#194625). Because the mget task claimer works differently and also polls more frequently, we end-up in situations where tasks run faster than they were with update_by_query, creating more race conditions that are now fixed in this PR. Issues were surfaced via #190148 where I set `mget` as the default task claiming strategy. Flaky test runs (some of these failed on other tests that are flaky): - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185 (for 0fcf1ae)

…aimer being the default (elastic#196399) Resolves elastic#184942 Resolves elastic#192023 Resolves elastic#195573 In this PR, I'm improving the flakiness found in our functional tests in preperation for mget being the default task claimer that all these tests run with (elastic#194625). Because the mget task claimer works differently and also polls more frequently, we end-up in situations where tasks run faster than they were with update_by_query, creating more race conditions that are now fixed in this PR. Issues were surfaced via elastic#190148 where I set `mget` as the default task claiming strategy. Flaky test runs (some of these failed on other tests that are flaky): - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7151 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7169 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7172 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7175 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7176 - https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/7185 (for elastic@0fcf1ae) (cherry picked from commit 3b8cf12) # Conflicts: # x-pack/test/alerting_api_integration/spaces_only/tests/alerting/group4/alerts_as_data/alerts_as_data_flapping.ts

pmuellr added Feature:Task Manager Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) labels Jun 6, 2024

pmuellr mentioned this issue Jun 6, 2024

[ResponseOps] implement task claiming strategy mget #180485

Merged

doakalexi self-assigned this Jun 18, 2024

doakalexi removed their assignment Jul 15, 2024

mikecote assigned pmuellr Jul 16, 2024

mikecote mentioned this issue Aug 19, 2024

Scaling the alerting throughput ceiling from 3,200 to 32,000+ rules per minute #188194

Open

48 tasks

mikecote assigned mikecote and unassigned pmuellr Oct 8, 2024

mikecote mentioned this issue Oct 15, 2024

Improve task manager functional tests in preperation for mget task claimer being the default #196399

Merged

mikecote closed this as completed in #196399 Oct 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ResponseOps][Task Manager][mget Claimer] finish jest, functional, integration tests #184942

[ResponseOps][Task Manager][mget Claimer] finish jest, functional, integration tests #184942

pmuellr commented Jun 6, 2024

elasticmachine commented Jun 6, 2024

mikecote commented Aug 22, 2024

pmuellr commented Aug 22, 2024

mikecote commented Aug 22, 2024

pmuellr commented Sep 3, 2024

mikecote commented Sep 5, 2024

[ResponseOps][Task Manager][mget Claimer] finish jest, functional, integration tests #184942

[ResponseOps][Task Manager][mget Claimer] finish jest, functional, integration tests #184942

Comments

pmuellr commented Jun 6, 2024

elasticmachine commented Jun 6, 2024

mikecote commented Aug 22, 2024

pmuellr commented Aug 22, 2024

mikecote commented Aug 22, 2024

pmuellr commented Sep 3, 2024

mikecote commented Sep 5, 2024