
Elastic Coordination nodes dead due to heap memory #55618

Closed
alogishetty opened this issue Apr 22, 2020 · 8 comments
Assignees: nik9000
Labels: :Analytics/Aggregations (Aggregations), feedback_needed, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo))

Comments

@alogishetty

alogishetty commented Apr 22, 2020

Elasticsearch version (bin/elasticsearch --version): 7.5.2

Plugins installed: []

JVM version (java -version):

OS version (uname -a if on a Unix-like system): CentOS

Description of the problem including expected versus actual behavior: 2 coordinating nodes that are part of our Elasticsearch cluster died about 3 minutes apart due to heap memory exhaustion.

Heap Memory: 16GB
Server Ram: 32GB

We upgraded to 7.5.2 two weeks ago. We started with 7.3.2 last year and had not faced any issue like this before. We would like to get more information on why this happened.

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make it for
us to reproduce, the more likely it is that somebody will take the time to look at it.

Provide logs (if relevant):

"json": {
      "component": "o.e.b.ElasticsearchUncaughtExceptionHandler",
      "cluster.uuid": "eUfRAIpMTviuB5jLJN6d7w",
      "stacktrace": [
        "java.lang.OutOfMemoryError: Java heap space",
        "at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:472) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:481) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.metrics.HyperLogLogPlusPlus.<init>(HyperLogLogPlusPlus.java:772) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.metrics.HyperLogLogPlusPlus.readFrom(HyperLogLogPlusPlus.java:1171) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.metrics.InternalCardinality.<init>(InternalCardinality.java:51) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.SearchModule$$Lambda$1534/0x0000000801125c40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.lambda$new$1(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations$$Lambda$4065/0x00000008017dfc40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.StreamInput.readCollection(StreamInput.java:1167) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.StreamInput.readList(StreamInput.java:1134) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.<init>(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite$InternalBucket.<init>(InternalComposite.java:270) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite.lambda$new$0(InternalComposite.java:82) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite$$Lambda$4458/0x000000080189c440.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.StreamInput.readCollection(StreamInput.java:1167) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.StreamInput.readList(StreamInput.java:1134) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.bucket.composite.InternalComposite.<init>(InternalComposite.java:82) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.SearchModule$$Lambda$1680/0x0000000801178c40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:46) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.NamedWriteableAwareStreamInput.readNamedWriteable(NamedWriteableAwareStreamInput.java:39) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.lambda$new$1(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations$$Lambda$4065/0x00000008017dfc40.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.common.io.stream.StreamInput.readCollection(StreamInput.java:1167) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.common.io.stream.StreamInput.readList(StreamInput.java:1134) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.aggregations.InternalAggregations.<init>(InternalAggregations.java:74) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.query.QuerySearchResult.readFromWithId(QuerySearchResult.java:276) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.search.query.QuerySearchResult.<init>(QuerySearchResult.java:72) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.action.search.SearchTransportService$$Lambda$4115/0x00000008017fe440.read(Unknown Source) ~[?:?]",
        "at org.elasticsearch.action.ActionListenerResponseHandler.read(ActionListenerResponseHandler.java:69) ~[elasticsearch-7.5.2.jar:7.5.2]",
        "at org.elasticsearch.action.ActionListenerResponseHandler.read(ActionListenerResponseHandler.java:36) ~[elasticsearch-7.5.2.jar:7.5.2]"
      ],
      "node.id": "2TAdJcnET-OpGygqlvqUKQ",
      "timestamp": "2020-04-21T16:56:42,090-05:00",
      "message": "fatal error in thread [Thread-4], exiting",
      "level": "ERROR",
      "type": "server"
    }
@danielmitterdorfer
Member

Thanks for raising this issue. Do you have any specific steps to reproduce this?

Note: This one might be a duplicate of #46116 despite the versions being different.

@elasticmachine
Collaborator

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label May 4, 2020
@nik9000 nik9000 self-assigned this May 4, 2020
@nik9000
Member

nik9000 commented May 4, 2020

I'm working on a thread that started with #54758 that should give us better control over memory usage on the coordinating node. I don't expect it to bear fruit super soon, though. In the meantime I suggest lowering the size of the composite aggregation and/or lowering the precision on the cardinality agg.
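
For reference, a minimal sketch of what those two knobs look like in a request body; the index and field names below are hypothetical and not taken from the reporter's workload. The composite size controls how many buckets come back per page (the default is 10), and precision_threshold on the cardinality agg bounds the size of the HyperLogLogPlusPlus sketches that appear in the stack trace above (default 3000, maximum 40000).

    POST /my-index/_search
    {
      "size": 0,
      "aggs": {
        "by_user": {
          "composite": {
            "size": 100,
            "sources": [
              { "user": { "terms": { "field": "user.id" } } }
            ]
          },
          "aggs": {
            "unique_sessions": {
              "cardinality": {
                "field": "session.id",
                "precision_threshold": 1000
              }
            }
          }
        }
      }
    }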

@alogishetty
Author

alogishetty commented May 4, 2020

We had an issue on the master nodes on 4-29-2020 due to many garbage collection cycles. Our cluster went Red and cycled between Red and Yellow for hours until it recovered by itself. It seems we had the same garbage collection issue on the coordinating nodes too.

Master node (1 of 3 nodes):
[screenshot: node heap/GC metrics]

Coordinating node (1 of 2 nodes):
[screenshot: node heap/GC metrics]

@nik9000
Member

nik9000 commented May 5, 2020

Master nodes

I can't really comment on what is up with the master nodes. The stack trace that you linked when creating the issue is a coordinating node issue. Are you sending queries to your master nodes? In general that is ok, but if you are having issues like this with agg reduction running you out of memory then I'd avoid it.
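
For context (general 7.x guidance, not something stated in this thread): a dedicated coordinating-only node is one with all roles disabled in elasticsearch.yml, and pointing client search traffic at those nodes keeps aggregation reduction off the master heap. A minimal sketch using the pre-7.9 role settings:

    # elasticsearch.yml on a coordinating-only node (Elasticsearch 7.x)
    node.master: false
    node.data: false
    node.ingest: false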

@alogishetty
Author

alogishetty commented May 5, 2020

We have a cluster of 18 nodes, with 3 master nodes, 12 data nodes, 1 ingest node, and 2 coordinating nodes. And our master nodes don't take any queries.

It seems like there is a memory management issue with Elasticsearch 7.5.2.

@nik9000
Member

nik9000 commented May 5, 2020

Your best bet for the master nodes is to get a heap dump, have a look at what is in there, and open a new issue once you have an idea of what is going on. The issue you've opened here is a coordinating node one and I'm working on it. Just, slowly.
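
One way to capture such a heap dump from a running master node; the PID and output path below are placeholders:

    # dump only live objects from the running Elasticsearch JVM
    jmap -dump:live,format=b,file=/tmp/es-master-heap.hprof <elasticsearch-pid>

The resulting .hprof file can then be opened in a heap analysis tool such as Eclipse MAT.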

@iverase
Contributor

iverase commented Jun 18, 2020

There have been several improvements in memory management in the latest releases, in particular for coordinating nodes, for example #46751 or #54758. I am closing this issue for the time being as there is no action to be taken at this moment. Please feel free to report the issue again if it keeps happening.

@iverase iverase closed this as completed Jun 18, 2020