[BUG] Flaky org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT test #10154
Comments
Relates opensearch-project#10154 Signed-off-by: Andrew Ross <[email protected]>
Relates #10154 (cherry picked from commit c676479) Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…10752) Relates opensearch-project#10154 Signed-off-by: Andrew Ross <[email protected]>
On checking further, the circuit breaker for one of the threads is not getting reset back to zero. Following is the stacktrace:
The exception received on test failure is as follows: The random seed to reproduce the test faster is -Dtests.seed=F3F7FF2B9B35932E
|
The test fails only with concurrent search, and the above exception hints at a memory leak in one of the threads. On looking deeper, the following looks to be the problem leading to the memory leak: The test This theory is validated by adding a sleep before throwing the exception, which solves the issue: neetikasinghal@f38531a
The memory leak can be fixed in one of the following ways:
The above solution is also validated by 500 successful test runs. Hence, I would recommend option 2, which also looks to be a cleaner solution coming from upstream. @sohami @andrross @reta I would love to hear your thoughts on this.
Thanks @neetikasinghal for looking into this. Taking the Lucene-side changes makes sense to me instead of re-implementing them in OpenSearch. There is also a follow-up PR in Lucene to make it cancel already running tasks too. Ref here, which will further improve it. We can keep an eye on that for future releases as an improvement.
Thanks a lot @neetikasinghal, I side with you (and @sohami) here to rely on Apache Lucene 9.9.x (#11421 should be integrated soon).
The reported test has failed in one of the PR builds - https://build.ci.opensearch.org/job/gradle-check/32188/. Reopening this issue.
I am able to reproduce this with seed: -Dtests.seed=998015FD102898B9.
I will dive deeper into this to check further.
I was able to figure out the root cause of the memory leak. During the execution of the index search in the test, the collection strategy initialization flow is as follows:
In ReorganizingLongHash's constructor, two big arrays are initialized, and their memory is accounted for by the circuit breaker here. In the happy-case scenario, GlobalOrdinalsStringTermsAggregator is initialized, which initializes the collectionStrategy, and the arrays allocated in ReorganizingLongHash's constructor are accounted for by the circuit breaker. When a CircuitBreakingException is then hit on any other code flow, SearchContext.close() is called, which in turn calls close on GlobalOrdinalsStringTermsAggregator. Since the collectionStrategy is not null, close is also called on ReorganizingLongHash's arrays, which releases the memory accounted for by the circuit breaker, and hence there is no memory leak. The leak arises when the exception is encountered inside ReorganizingLongHash's constructor itself: construction never completes, so the close path above does not reach whatever the constructor had already allocated. To deal with this, close needs to be called explicitly in ReorganizingLongHash's constructor when an exception is encountered. This is done as part of PR #11953.
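To illustrate the constructor cleanup described above, here is a minimal, self-contained sketch. It is not the code from PR #11953 or from OpenSearch: `SketchBreaker`, `TrackedArray`, and `TwoArrayHash` are hypothetical stand-ins for the circuit breaker, the breaker-accounted big arrays, and ReorganizingLongHash, and `IllegalStateException` stands in for CircuitBreakingException. The point is only that a constructor allocating two accounted resources has to release the first one itself if the second allocation fails, because no caller ever receives an object to close.

```java
/** Hypothetical stand-in for a request circuit breaker (not thread-safe, illustration only). */
final class SketchBreaker {
    private final long limitBytes;
    private long usedBytes;

    SketchBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    /** Reserves bytes, or throws without reserving if the limit would be exceeded. */
    void reserve(long bytes) {
        if (usedBytes + bytes > limitBytes) {
            throw new IllegalStateException("breaker tripped: limit is " + limitBytes + " bytes");
        }
        usedBytes += bytes;
    }

    void release(long bytes) {
        usedBytes -= bytes;
    }

    long used() {
        return usedBytes;
    }
}

/** Hypothetical array whose memory stays accounted by the breaker until close() is called. */
final class TrackedArray implements AutoCloseable {
    private final SketchBreaker breaker;
    private final long bytes;
    private boolean closed;

    TrackedArray(SketchBreaker breaker, int length) {
        this.breaker = breaker;
        this.bytes = (long) length * Long.BYTES;
        breaker.reserve(bytes); // if this throws, nothing was reserved for this array
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            breaker.release(bytes);
        }
    }
}

/** Hypothetical two-array structure that cleans up after itself on constructor failure. */
final class TwoArrayHash implements AutoCloseable {
    private final TrackedArray table;
    private final TrackedArray keys;

    TwoArrayHash(SketchBreaker breaker, int tableLength, int keysLength) {
        TrackedArray first = null;
        TrackedArray second = null;
        boolean success = false;
        try {
            first = new TrackedArray(breaker, tableLength);
            second = new TrackedArray(breaker, keysLength); // may trip the breaker
            success = true;
        } finally {
            // Without this, a failure on the second allocation would leave the
            // first array accounted forever, since no caller can close this object.
            if (!success && first != null) {
                first.close();
            }
        }
        this.table = first;
        this.keys = second;
    }

    @Override
    public void close() {
        table.close();
        keys.close();
    }
}
```

With a 100-byte limit and two 8-element arrays (64 bytes each), the second allocation trips the breaker; after the exception propagates, `SketchBreaker.used()` is back at zero rather than stuck at 64.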
…10752) Relates opensearch-project#10154 Signed-off-by: Andrew Ross <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>
Describe the bug
org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT.testRequestBreaker is flaky.
To Reproduce
org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT.testRequestBreaker {p0={"search.concurrent_segment_search.enabled":"true"}}
org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT.testRequestBreaker {p0={"search.concurrent_segment_search.enabled":"false"}}
Expected behavior
Test should always pass.
Plugins
Please list all plugins currently enabled.
Screenshots
If applicable, add screenshots to help explain your problem.
Host/Environment (please complete the following information):
Additional context
CI - https://build.ci.opensearch.org/job/gradle-check/25992/