[CI] CardinalityWithRequestBreakerIT.testRequestBreaker fails on 7.9 #62439

Closed
cbuescher opened this issue Sep 16, 2020 · 8 comments · Fixed by #62685
Labels: :Distributed Indexing/Distributed, Team:Distributed (Obsolete), >test-failure, v7.9.2

Comments

@cbuescher
Member

Build scan:

https://gradle-enterprise.elastic.co/s/2coyhtowa22hi

Repro line:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT.testRequestBreaker" -Dtests.seed=ABEA4F928B9BAFCF -Dtests.security.manager=true -Dtests.locale=hr -Dtests.timezone=Asia/Ust-Nera -Druntime.java=8

Reproduces locally?:

yes

Applicable branches:

7.9

Failure history:

Another one on Sep 8th might be related: https://gradle-enterprise.elastic.co/s/e2ht4jwbic3dq

Failure excerpt:

java.lang.AssertionError: Request breaker not reset to 0 on node: node_s1
    Expected: <0L>
         but: was <524288L>
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
        at org.junit.Assert.assertThat(Assert.java:956)
        at org.elasticsearch.test.InternalTestCluster.lambda$ensureEstimatedStats$39(InternalTestCluster.java:2501)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:951)
        at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:924)
        at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2499)
        at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:94)
        at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2523)
        at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:598)
        at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2233)
java.lang.RuntimeException: 2 arrays have not been released
        at org.elasticsearch.common.util.MockBigArrays.ensureAllArraysAreReleased(MockBigArrays.java:70)
        at org.elasticsearch.test.ESTestCase.checkStaticState(ESTestCase.java:528)
        at org.elasticsearch.test.ESTestCase.after(ESTestCase.java:368)
cbuescher added the >test-failure, :Distributed Indexing/Distributed, and v7.9.2 labels on Sep 16, 2020
@elasticmachine
Collaborator

Pinging @elastic/es-distributed (:Distributed/Distributed)

elasticmachine added the Team:Distributed (Obsolete) label on Sep 16, 2020
@cbuescher
Member Author

@henningandersen since you seem to have authored this test, could you take a look?

@cbuescher
Member Author

Just found another one today: https://gradle-enterprise.elastic.co/s/2coyhtowa22hi

danhermann added the v7.9.3 label and removed the v7.9.2 label on Sep 18, 2020
henningandersen added a commit to henningandersen/elasticsearch that referenced this issue Sep 21, 2020
If HyperLogLogPlusPlus failed during construction, it would
not release already allocated resources, causing the request
circuit breaker to not be adjusted down.

Closes elastic#62439
henningandersen added a commit that referenced this issue Sep 21, 2020
If HyperLogLogPlusPlus failed during construction, it would
not release already allocated resources, causing the request
circuit breaker to not be adjusted down.

Closes #62439
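
The failure mode in the commit message above (a constructor that allocates several tracked resources and throws partway through) can be illustrated with a minimal sketch. This is not the actual HyperLogLogPlusPlus code or the Elasticsearch circuit breaker API: `SketchBreaker`, `TrackedBuffer`, and `TwoBufferSketch` are hypothetical names invented for illustration, and the byte sizes are arbitrary.

```java
/** Hypothetical stand-in for a request circuit breaker: tracks reserved bytes against a limit. */
final class SketchBreaker {
    private final long limit;
    private long used;

    SketchBreaker(long limit) { this.limit = limit; }

    void reserve(long bytes) {
        if (used + bytes > limit) {
            throw new IllegalStateException("breaker tripped: " + (used + bytes) + " > " + limit);
        }
        used += bytes;
    }

    void release(long bytes) { used -= bytes; }

    long used() { return used; }
}

/** Hypothetical tracked buffer: reserves bytes when created, returns them when closed. */
final class TrackedBuffer implements AutoCloseable {
    private final SketchBreaker breaker;
    private final long bytes;
    private boolean closed;

    TrackedBuffer(SketchBreaker breaker, long bytes) {
        breaker.reserve(bytes); // throws before any bytes are accounted if the limit is hit
        this.breaker = breaker;
        this.bytes = bytes;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            breaker.release(bytes);
        }
    }
}

/** Hypothetical structure owning two buffers; cleans up after itself if construction fails. */
final class TwoBufferSketch implements AutoCloseable {
    private final TrackedBuffer first;
    private final TrackedBuffer second;

    TwoBufferSketch(SketchBreaker breaker, long firstBytes, long secondBytes) {
        TrackedBuffer a = null;
        TrackedBuffer b = null;
        boolean success = false;
        try {
            a = new TrackedBuffer(breaker, firstBytes);
            b = new TrackedBuffer(breaker, secondBytes); // may trip the breaker
            success = true;
        } finally {
            if (!success) {
                // Without this cleanup, the bytes reserved for `a` would stay accounted forever.
                if (b != null) b.close();
                if (a != null) a.close();
            }
        }
        this.first = a;
        this.second = b;
    }

    @Override
    public void close() {
        second.close();
        first.close();
    }
}

final class ConstructionCleanupDemo {
    public static void main(String[] args) {
        SketchBreaker breaker = new SketchBreaker(100);
        try {
            new TwoBufferSketch(breaker, 60, 60); // second allocation exceeds the limit
        } catch (IllegalStateException e) {
            System.out.println("used after failed construction: " + breaker.used()); // prints 0
        }
    }
}
```

Removing the `finally` cleanup in this sketch leaves 60 bytes accounted after the failed construction, which is the same shape as the "Request breaker not reset to 0" assertion in the report.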
danhermann added the v7.9.2 label and removed the v7.9.3 label on Sep 23, 2020
@dimitris-athanasiou
Contributor

I just hit this type of failure today in a CI run for a PR: https://gradle-enterprise.elastic.co/s/yryepezu7wpfk

java.lang.AssertionError: Request breaker not reset to 0 on node: node_s1
Expected: <0L>
     but: was <32L>
	at __randomizedtesting.SeedInfo.seed([5B1C745938F188BC:E34E4919D7315AF6]:0)
	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)
	at org.junit.Assert.assertThat(Assert.java:956)
	at org.elasticsearch.test.InternalTestCluster.lambda$ensureEstimatedStats$41(InternalTestCluster.java:2542)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:1019)
	at org.elasticsearch.test.ESTestCase.assertBusy(ESTestCase.java:992)
	at org.elasticsearch.test.InternalTestCluster.ensureEstimatedStats(InternalTestCluster.java:2540)
	at org.elasticsearch.test.TestCluster.assertAfterTest(TestCluster.java:83)
	at org.elasticsearch.test.InternalTestCluster.assertAfterTest(InternalTestCluster.java:2564)
	at org.elasticsearch.test.ESIntegTestCase.afterInternal(ESIntegTestCase.java:596)
	at org.elasticsearch.test.ESIntegTestCase.cleanUpCluster(ESIntegTestCase.java:2255)
	at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)

@henningandersen
Contributor

This also failed recently in a regular 7.x build:

https://gradle-enterprise.elastic.co/s/kbti45zr4cp5s

Build history

@henningandersen
Contributor

This test failed in a master build with a leak:

[2021-05-11T09:19:14,003][ERROR][o.e.t.LeakTracker        ] [node_s3] LEAK: resource was not cleaned up before it was garbage-collected.

It can be reproduced using:

./gradlew ':server:internalClusterTest' --tests "org.elasticsearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT.testRequestBreaker" -Dtests.seed=D3BE36364C08A43A -Dtests.locale=mt-MT -Dtests.timezone=Asia/Baghdad -Druntime.java=16 -Dtests.fips.enabled=true

Fix is incoming.

henningandersen added a commit to henningandersen/elasticsearch that referenced this issue May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would
leak memory for aggs in buffered query results. Fixed.

Relates elastic#62439 and elastic#72309
henningandersen added a commit that referenced this issue May 12, 2021
If consuming a query result was disrupted by the circuit breaker, we would
leak memory for aggs in buffered query results. Fixed.

Relates #62439 and #72309

Closes #72923
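
The second leak described above (buffered partial results left accounted when the circuit breaker interrupts consumption) can be sketched in the same spirit. This is only an illustration under stated assumptions, not the actual fix: `ByteBudget` and `BufferingConsumer` are hypothetical names, and real query results are reduced here to a byte count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical byte budget standing in for a request circuit breaker. */
final class ByteBudget {
    private final long limit;
    private long used;

    ByteBudget(long limit) { this.limit = limit; }

    void reserve(long bytes) {
        if (used + bytes > limit) {
            throw new IllegalStateException("budget exceeded");
        }
        used += bytes;
    }

    void release(long bytes) { used -= bytes; }

    long used() { return used; }
}

/** Hypothetical consumer that buffers partial results and accounts for their size. */
final class BufferingConsumer implements AutoCloseable {
    private final ByteBudget budget;
    private final Deque<Long> bufferedSizes = new ArrayDeque<>();
    private boolean failed;

    BufferingConsumer(ByteBudget budget) { this.budget = budget; }

    /** Buffers one partial result; if the budget trips, everything buffered so far is released. */
    void consume(long resultBytes) {
        if (failed) {
            return; // results arriving after a failure are dropped, not buffered
        }
        try {
            budget.reserve(resultBytes);
            bufferedSizes.push(resultBytes);
        } catch (IllegalStateException e) {
            failed = true;
            releaseBuffered(); // the step whose absence would leak the buffered bytes
            throw e;
        }
    }

    private void releaseBuffered() {
        while (!bufferedSizes.isEmpty()) {
            budget.release(bufferedSizes.pop());
        }
    }

    @Override
    public void close() {
        releaseBuffered();
    }
}

final class BufferedResultsDemo {
    public static void main(String[] args) {
        ByteBudget budget = new ByteBudget(100);
        BufferingConsumer consumer = new BufferingConsumer(budget);
        consumer.consume(40);
        consumer.consume(40);
        try {
            consumer.consume(40); // exceeds the budget
        } catch (IllegalStateException e) {
            System.out.println("used after disrupted consume: " + budget.used()); // prints 0
        }
    }
}
```

The design choice in this sketch is to release on the failure path itself rather than rely on a later `close()`, so the accounting returns to zero even if the caller abandons the consumer after the exception.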
@henningandersen
Contributor

henningandersen commented Jun 9, 2021

The last test failure was against 7.12. It looks like this has stopped occurring; at least, I have not been able to reproduce it in a loop or with the repro lines. I will wait a bit longer before closing this.

Note that the previous leak and its fix were only in 7.x/master.

@henningandersen
Contributor

This has stopped occurring. The fix in #72966 is unlikely to have addressed the failure reported by Dimitris, but it is very likely that a resource-handling fix has since been made somewhere between the transport layer and cardinality aggregations, so I am closing this for now.
