
Elastic Coordination nodes dead due to heap memory #55618

Closed
alogishetty opened this issue Apr 22, 2020 · 8 comments
Assignees: nik9000
Labels: :Analytics/Aggregations (Aggregations), feedback_needed, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))

Comments

@alogishetty

alogishetty commented Apr 22, 2020

Elasticsearch version (bin/elasticsearch --version): 7.5.2

Plugins installed: []

JVM version (java -version):

OS version (uname -a if on a Unix-like system): CentOS

Description of the problem including expected versus actual behavior: 2 coordinating nodes that are part of our Elasticsearch cluster died about 3 minutes apart due to heap memory exhaustion.

Heap Memory: 16GB
Server Ram: 32GB

We upgraded to 7.5.2 two weeks ago. We started with 7.3.2 last year and had not faced any issue like this before. We would like to get more information on why this happened.

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make it for
us to reproduce, the more likely it is that somebody will take the time to look at it.

Provide logs (if relevant):

"json": {
      "component": "o.e.b.ElasticsearchUncaughtExceptionHandler",
      "cluster.uuid": "eUfRAIpMTviuB5jLJN6d7w",
      "stacktrace": [
        "java.lang.OutOfMemoryError: Java heap space",
        "at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:472) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:481) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.metrics.HyperLogLogPlusPlus.<init>(HyperLogLogPlusPlus.java:772) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.metrics.HyperLogLogPlusPlus.readFrom(HyperLogLogPlusPlus.java:1171) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.metrics.InternalCardinality.<init>(InternalCardinality.java:51) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.SearchModule$$Lambda$1534/0x0000000801125c40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.lambda$new$1(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations$$Lambda$4065/0x00000008017dfc40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.StreamInput.readCollection(StreamInput.java:1167) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.StreamInput.readList(StreamInput.java:1134) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.<init>(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite$InternalBucket.<init>(InternalComposite.java:270) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite.lambda$new$0(InternalComposite.java:82) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite$$Lambda$4458/0x000000080189c440.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.StreamInput.readCollection(StreamInput.java:1167) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.StreamInput.readList(StreamInput.java:1134) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite.<init>(InternalComposite.java:82) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.SearchModule$$Lambda$1680/0x0000000801178c40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.lambda$new$1(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations$$Lambda$4065/0x00000008017dfc40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.StreamInput.readCollection(StreamInput.java:1167) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.StreamInput.readList(StreamInput.java:1134) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.<init>(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.query.QuerySearchResult.readFromWithId(QuerySearchResult.java:276) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:72) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.action.search.SearchTransportService$$Lambda$4115/0x00000008017fe440.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.action.ActionListenerResponseHandler.read(ActionListenerResponseHandler.java:69) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.action.ActionListenerResponseHandler.read(ActionListenerResponseHandler.java:36) ~[elasticsearch-7.5.2.jar:7.5.2]"
      ],
      "node.id": "2TAdJcnET-OpGygqlvqUKQ",
      "timestamp": "2020-04-21T16:56:42,090-05:00",
      "message": "fatal error in thread [Thread-4], exiting",
      "level": "ERROR",
      "type": "server"
    }
@danielmitterdorfer
Member

Thanks for raising this issue. Do you have any specific steps to reproduce this?

Note: This one might be a duplicate of #46116 despite the versions being different.

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@nik9000 nik9000 self-assigned this May 4, 2020
@nik9000
Member

nik9000 commented May 4, 2020

I'm working on a thread that started with #54758 that should give us better control over memory usage on the coordinating node. I don't expect it to bear fruit super soon, though. In the meantime I suggest lowering the size of the composite aggregation and/or lowering the precision on the cardinality agg.
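
For reference, a minimal sketch of what those two knobs look like in a request body; the index and field names below are hypothetical and not taken from the reporter's workload. The composite size controls how many buckets come back per page (the default is 10), and precision_threshold on the cardinality agg bounds the size of the HyperLogLogPlusPlus sketches that appear in the stack trace above (default 3000, maximum 40000).

    POST /my-index/_search
    {
      "size": 0,
      "aggs": {
        "by_user": {
          "composite": {
            "size": 100,
            "sources": [
              { "user": { "terms": { "field": "user.id" } } }
            ]
          },
          "aggs": {
            "unique_sessions": {
              "cardinality": {
                "field": "session.id",
                "precision_threshold": 1000
              }
            }
          }
        }
      }
    }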

@alogishetty
Author

alogishetty commented May 4, 2020

We had an issue on the master nodes on 4-29-2020 due to many garbage collection cycles. Our cluster went Red and cycled between Red and Yellow for hours until it recovered by itself. It seems we had the same garbage collection issue on the coordinating nodes too.

Master node (1 of 3 nodes):
[screenshot: node heap/GC metrics]

Coordinating node (1 of 2 nodes):
[screenshot: node heap/GC metrics]

@nik9000
Member

nik9000 commented May 5, 2020

Master nodes

I can't really comment on what is up with the master nodes. The stack trace that you linked when creating the issue is a coordinating node issue. Are you sending queries to your master nodes? In general that is ok, but if you are having issues like this with agg reduction running you out of memory then I'd avoid it.
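
For context (general 7.x guidance, not something stated in this thread): a dedicated coordinating-only node is one with all roles disabled in elasticsearch.yml, and pointing client search traffic at those nodes keeps aggregation reduction off the master heap. A minimal sketch using the pre-7.9 role settings:

    # elasticsearch.yml on a coordinating-only node (Elasticsearch 7.x)
    node.master: false
    node.data: false
    node.ingest: false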

@alogishetty
Author

alogishetty commented May 5, 2020

We have a cluster of 18 nodes, with 3 master nodes, 12 data nodes, 1 ingest node, and 2 coordinating nodes. And our master nodes don't take any queries.

It seems like there is a memory management issue with Elasticsearch 7.5.2.

@nik9000
Member

nik9000 commented May 5, 2020

Your best bet for the master nodes is to get a heap dump, have a look at what is in there, and open a new issue once you have an idea of what is going on. The issue you've opened here is a coordinating node one and I'm working on it. Just, slowly.
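
One way to capture such a heap dump from a running master node; the PID and output path below are placeholders:

    # dump only live objects from the running Elasticsearch JVM
    jmap -dump:live,format=b,file=/tmp/es-master-heap.hprof <elasticsearch-pid>

The resulting .hprof file can then be opened in a heap analysis tool such as Eclipse MAT.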

@iverase
Contributor

iverase commented Jun 18, 2020

There have been several improvements in memory management in the latest releases, in particular for coordinating nodes, for example #46751 or #54758. I am closing this issue for the time being as there is no action to be taken at this moment. Please feel free to report the issue again if it keeps happening.

@iverase iverase closed this as completed Jun 18, 2020