Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor KibanaThreadPoolIT/SystemIndexThreadPoolTestCase for resiliency #108463

Merged
merged 11 commits into from
May 14, 2024

Conversation

pgomulka
Copy link
Contributor

@pgomulka pgomulka commented May 9, 2024

the in SystemIndexThreadPoolTestCase#testUserThreadPoolsAreBlocked sometimes was blocked instead of throwing an expected exception (due to a queue on a threadpool being full).
IT happens that submitting a busyTask does not guarantee that it will be executed immediately by a threadpool. It might be that some other task was executing at the time
This commit refactors the way threadpool is populated and makes sure that before the queue are filled, all the busyTasks are executed on threadpools

based on the test failure -> #107625 thread pool's threads were busy, but I cannot tell if a queue was full before the search request was submitted.

the in SystemIndexThreadPoolTestCase#testUserThreadPoolsAreBlocked sometimes was blocked instead
of throwing an expected exception (due to a queue on a threadpool being full).
This commit adds an extra debug & assertion to make sure the queue is indeed full and threadpool is full.
relates elastic#107625
@pgomulka pgomulka added >test Issues or PRs that are addressing/adding tests :Core/Infra/Core Core issues without another label labels May 9, 2024
@pgomulka pgomulka requested a review from a team May 9, 2024 13:53
@pgomulka pgomulka self-assigned this May 9, 2024
@elasticsearchmachine elasticsearchmachine added Team:Core/Infra Meta label for core/infra team v8.15.0 labels May 9, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (Team:Core/Infra)

@pgomulka pgomulka changed the title Extra debugging for KibanaThreadPoolIT/SystemIndexThreadPoolTestCase Refactor KibanaThreadPoolIT/SystemIndexThreadPoolTestCase for resiliency May 13, 2024
phaser.register();

blockThreadPool(phaser);
phaser.arriveAndAwaitAdvance();
Copy link
Contributor Author

@pgomulka pgomulka May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a new version of making sure all threadpools are blocked before the queues are being filled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch 🎉

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the key to fix the test was to block all, waiting for all to be blocked, and only then fill the queues (and proceed with the test), correct?
If so, maybe it's worth a couple of comments (to be honest, method names are already good, but since this was an edge case a comment on that would be a good addition)

Copy link
Contributor

@mosche mosche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm 🥇

Comment on lines 41 to 46
* <p>We can block thread pools by setting them to one thread and no queue, then submitting
* threads that wait on a countdown latch. This lets us verify that operations on system indices
* are being directed to other thread pools.</p>
*
* <p>When implementing this class, don't forget to override {@link ESIntegTestCase#nodePlugins()} if
* the relevant system index is defined in a plugin.</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Javadoc isn't accurate anymore, better to remove

phaser.register();

blockThreadPool(phaser);
phaser.arriveAndAwaitAdvance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch 🎉

@pgomulka
Copy link
Contributor Author

@elasticmachine run elasticsearch-ci/part-2

Copy link
Contributor

@ldematte ldematte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments, but looks good overall!

ThreadPool.Info info = threadPool.info(threadPoolName);

Runnable waitAction = () -> {
phaser.arriveAndAwaitAdvance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe add a comment to indicate which step each of the arriveAndAwait guards

phaser.register();

blockThreadPool(phaser);
phaser.arriveAndAwaitAdvance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me the key to fix the test was to block all, waiting for all to be blocked, and only then fill the queues (and proceed with the test), correct?
If so, maybe it's worth a couple of comments (to be honest, method names are already good, but since this was an edge case a comment on that would be a good addition)

@@ -61,4 +97,108 @@ public void testKibanaThreadPool() {
}
});
}

public void testBlockedThreadPoolsRejectUserRequests() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: is this the right place for this test? This one is not Kibana specific, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, in a way this is a 'negative test case' for what KibanaThreadPool test is doing.
previously it was part of a super class in a testframework, not sure if it is worthy at the moment to refactor it into its own test class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@pgomulka pgomulka merged commit 4e24cf9 into elastic:main May 14, 2024
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Core/Infra/Core Core issues without another label Team:Core/Infra Meta label for core/infra team >test Issues or PRs that are addressing/adding tests v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants