[o.e.b.ElasticsearchUncaughtExceptionHandler] [node01-es-dev] fatal error in thread [elasticsearch[node01-es-dev][search][T#6]], exiting java.lang.OutOfMemoryError: Java heap space #26525

Closed
hexinw opened this issue Sep 6, 2017 · 26 comments
Labels
:Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload

Comments

@hexinw

hexinw commented Sep 6, 2017

Elasticsearch version (bin/elasticsearch --version):
Version: 5.4.2, Build: 929b078/2017-06-15T02:29:28.122Z, JVM: 1.8.0_131

Plugins installed: []

JVM version (java -version):

openjdk version "1.8.0_131"
OpenJDK Runtime Environment (build 1.8.0_131-b12)
OpenJDK 64-Bit Server VM (build 25.131-b12, mixed mode)

OS version (uname -a if on a Unix-like system):
Linux dev-v1.0-akshay-es-1 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior:
The circuit breaker fails to guard against some aggregation searches, which causes an Elasticsearch out-of-memory exception and brings down the Elasticsearch node.

Stack trace of the out of memory exception:

[2017-09-01T13:51:20,838][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node01-es-dev] fatal error in thread [elasticsearch[node01-es-dev][search][T#6]], exiting
java.lang.OutOfMemoryError: Java heap space
        at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:99) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.PageCacheRecycler$1.newInstance(PageCacheRecycler.java:96) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:53) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.recycler.AbstractRecycler.obtain(AbstractRecycler.java:33) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.recycler.DequeRecycler.obtain(DequeRecycler.java:28) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.recycler.Recyclers$3.obtain(Recyclers.java:119) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.recycler.FilterRecycler.obtain(FilterRecycler.java:39) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.PageCacheRecycler.bytePage(PageCacheRecycler.java:147) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.AbstractBigArray.newBytePage(AbstractBigArray.java:117) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.BigByteArray.resize(BigByteArray.java:143) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.BigArrays.resizeInPlace(BigArrays.java:449) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.BigArrays.resize(BigArrays.java:496) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.BigArrays.grow(BigArrays.java:513) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.ensureCapacity(HyperLogLogPlusPlus.java:197) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.collect(HyperLogLogPlusPlus.java:232) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator$DirectCollector.collect(CardinalityAggregator.java:199) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:72) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregator$1.collect(DateHistogramAggregator.java:116) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.AggregatorFactory$MultiBucketAggregatorWrapper$1.collect(AggregatorFactory.java:136) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:80) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.collectGlobalOrd(GlobalOrdinalsStringTermsAggregator.java:129) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.access$100(GlobalOrdinalsStringTermsAggregator.java:58) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$2.collect(GlobalOrdinalsStringTermsAggregator.java:168) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.LeafBucketCollector.collect(LeafBucketCollector.java:82) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.apache.lucene.search.MultiCollector$MultiLeafCollector.collect(MultiCollector.java:174) ~[lucene-core-6.5.1.jar:6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - jimczi - 2017-04-21 12:17:15]
        at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:254) ~[lucene-core-6.5.1.jar:6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - jimczi - 2017-04-21 12:17:15]
        at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:197) ~[lucene-core-6.5.1.jar:6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - jimczi - 2017-04-21 12:17:15]
        at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39) ~[lucene-core-6.5.1.jar:6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - jimczi - 2017-04-21 12:17:15]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:669) ~[lucene-core-6.5.1.jar:6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - jimczi - 2017-04-21 12:17:15]
        at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473) ~[lucene-core-6.5.1.jar:6.5.1 cd1f23c63abe03ae650c75ec8ccb37762806cc75 - jimczi - 2017-04-21 12:17:15]
[2017-09-01T13:51:20,858][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node01-es-dev] fatal error in thread [elasticsearch[node01-es-dev][search][T#11]], exiting
java.lang.OutOfMemoryError: Java heap space
        at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:481) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:490) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.<init>(HyperLogLogPlusPlus.java:171) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.metrics.cardinality.CardinalityAggregator.buildAggregation(CardinalityAggregator.java:145) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramAggregator.buildAggregation(DateHistogramAggregator.java:129) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.AggregatorFactory$MultiBucketAggregatorWrapper.buildAggregation(AggregatorFactory.java:147) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.bucketAggregations(BucketsAggregator.java:116) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator.buildAggregation(GlobalOrdinalsStringTermsAggregator.java:239) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.aggregations.AggregationPhase.execute(AggregationPhase.java:129) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:114) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.indices.IndicesService.lambda$loadIntoContext$16(IndicesService.java:1107) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.indices.IndicesService$$Lambda$2157/565932108.accept(Unknown Source) ~[?:?]
        at org.elasticsearch.indices.IndicesService.lambda$cacheShardLevelResult$18(IndicesService.java:1188) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.indices.IndicesService$$Lambda$2158/111751494.get(Unknown Source) ~[?:?]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:160) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.indices.IndicesRequestCache$Loader.load(IndicesRequestCache.java:143) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.common.cache.Cache.computeIfAbsent(Cache.java:401) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.indices.IndicesRequestCache.getOrCompute(IndicesRequestCache.java:116) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.indices.IndicesService.cacheShardLevelResult(IndicesService.java:1194) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.indices.IndicesService.loadIntoContext(IndicesService.java:1106) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:245) ~[elasticsearch-5.4.2.jar:5.4.2]
        at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:262) ~[elasticsearch-5.4.2.jar:5.4.2]

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query, etc. The easier you make it for
us to reproduce the problem, the more likely it is that somebody will take the time to look at it.

Provide logs (if relevant):
Query DSL that brings down the Elasticsearch node (the excerpt captured in the log below is truncated):

            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "analyze_wildcard" : true,
            "escape" : false,
            "split_on_whitespace" : true,
            "boost" : 1.0
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "aggregations" : {
    "4" : {
      "terms" : {
        "field" : "node_uuid",
        "size" : 500,
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_term" : "desc"
          }
        ]
      },
      "aggregations" : {
        "2" : {
          "date_histogram" : {
            "field" : "timestamp",
            "format" : "epoch_millis",
            "interval" : "5m",
            "offset" : 0,
            "order" : {
              "_key" : "asc"
            },
            "keyed" : false,
            "min_doc_count" : 0,
            "extended_bounds" : {
              "min" : "1504126054711",
              "max" : "1504298854711"
            }
          },
          "aggregations" : {
            "1" : {
              "cardinality" : {
                "field" : "pid"
              }
            }
          }
        }
      }
    }
  }
}}] lastShard [true]
@dakrone
Member

dakrone commented Sep 6, 2017

I believe this is related to #26012

@costin
Member

costin commented Sep 7, 2017

updated the initial request and added minimal formatting to make the post readable

Thanks @dakrone, indeed it looks like a duplicate of #26012; @hexinw can you please confirm?

Cheers,

@costin costin added the :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload label Sep 7, 2017
@hexinw
Author

hexinw commented Sep 7, 2017

[screenshot: Eclipse MAT heap report, 2017-09-07 10:11 am]

This is the Eclipse MAT report on the heap usage when Elasticsearch crashed. Yes, it looks like a duplicate of #26012.

@hexinw
Author

hexinw commented Sep 11, 2017

I changed the "min_doc_count" to avoid empty bucks in date histogram. I still run into the OOM problem. So I am actually not sure if solution discussed in #26012 would solve this problem. @colings86 @jimczi, could you comment?

Note that I also tried out the "global_ordinals_hash" mentioned in #24359 and it didn't help, either.

GET nusights_metric_cfs_collector_memcpu_stats_2017_*/_search
{
  "size" : 0,
  "query" : {
    "bool" : {
      "filter" : [
        {
          "range" : {
            "timestamp" : {
              "from" : "1504126054711",
              "to" : "1504298854711",
              "include_lower" : true,
              "include_upper" : true,
              "format" : "epoch_millis",
              "boost" : 1.0
            }
          }
        },
        {
          "query_string" : {
            "query" : "(process: cfs)",
            "fields" : [ ],
            "use_dis_max" : true,
            "tie_breaker" : 0.0,
            "default_operator" : "or",
            "auto_generate_phrase_queries" : false,
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "analyze_wildcard" : true,
            "escape" : false,
            "split_on_whitespace" : true,
            "boost" : 1.0
          }
        }
      ],
      "disable_coord" : false,
      "adjust_pure_negative" : true,
      "boost" : 1.0
    }
  },
  "aggregations" : {
    "4" : {
      "terms" : {
        "field" : "node_uuid",
        "size" : 500,
        "execution_hint" : "global_ordinals_hash",
        "min_doc_count" : 1,
        "shard_min_doc_count" : 0,
        "show_term_doc_count_error" : false,
        "order" : [
          {
            "_term" : "desc"
          }
        ]
      },
      "aggregations" : {
        "2" : {
          "date_histogram" : {
            "field" : "timestamp",
            "format" : "epoch_millis",
            "interval" : "5m",
            "offset" : 0,
            "order" : {
              "_key" : "asc"
            },
            "keyed" : false,
            "min_doc_count" : 1,
            "extended_bounds" : {
              "min" : "1504126054711",
              "max" : "1504298854711"
            }
          },
          "aggregations" : {
            "1" : {
              "cardinality" : {
                "field" : "pid"
              }
            }
          }
        }
      }
    }
  }
}

@hexinw
Author

hexinw commented Sep 11, 2017

I noticed this description of the cardinality aggregator's memory usage in the current reference docs.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-cardinality-aggregation.html

Start excerpt >>>
For a precision threshold of c, the implementation that we are using requires about c * 8 bytes.
End excerpt >>>

If this is the case, the memory estimate for the default 'precision_threshold' would be 30000 * 8 ~= 240KB. This is way bigger than the 5KB default weight in AggregatorBase. Does it mean we need to tweak the circuit breaker adjustment in CardinalityAggregator as well?

@hexinw
Author

hexinw commented Sep 11, 2017

Also feel free to close this issue if you think it is a duplicate. I went through some of the existing issue reports on GitHub and the Elastic discussion forums before opening this ticket. I saw something similar but am not sure they are exactly the same.

@dakrone
Member

dakrone commented Sep 11, 2017

This is way bigger than the 5KB default weight in AggregatorBase. Does it mean we need to tweak the circuit breaker adjustment in CardinalityAggregator as well?

The CardinalityAggregator directly uses BigArrays, which has its own circuit-breaking accounting, so it should not need to be adjusted.

I believe this particular issue has to do with top-level histogram aggregations (of course I could be mistaken).

@colings86
Contributor

Adding global_ordinals_hash to the request will only prevent the OOME in cases where the query restricts the number of terms buckets created to a small number compared with the total number of terms in the field (i.e. the sparse terms case). If the query matches a lot of terms then it still has the potential to cause problems. I think we should try to find out if this is the case here.

@hexinw could you run the following request and paste the output on this issue?

GET nusights_metric_cfs_collector_memcpu_stats_2017_*/_search
{
   "size":0,
   "query":{  
      "bool":{  
         "filter":[  
            {  
               "range":{  
                  "timestamp":{  
                     "from":"1504126054711",
                     "to":"1504298854711",
                     "include_lower":true,
                     "include_upper":true,
                     "format":"epoch_millis",
                     "boost":1.0
                  }
               }
            },
            {  
               "query_string":{  
                  "query":"(process: cfs)",
                  "fields":[  

                  ],
                  "use_dis_max":true,
                  "tie_breaker":0.0,
                  "default_operator":"or",
                  "auto_generate_phrase_queries":false,
                  "max_determinized_states":10000,
                  "enable_position_increments":true,
                  "fuzziness":"AUTO",
                  "fuzzy_prefix_length":0,
                  "fuzzy_max_expansions":50,
                  "phrase_slop":0,
                  "analyze_wildcard":true,
                  "escape":false,
                  "split_on_whitespace":true,
                  "boost":1.0
               }
            }
         ],
         "disable_coord":false,
         "adjust_pure_negative":true,
         "boost":1.0
      }
   },
   "aggregations":{  
      "node_uuid_cardinality":{  
         "cardinality":{  
            "field":"node_uuid"
         }
      }
   }
}

@jimczi
Contributor

jimczi commented Sep 12, 2017

I don't think this is related to #26012.
At least I don't think the solution proposed in #26012 could catch this better than the circuit breaker.

The circuit breaker for requests (indices.breaker.request.limit) defaults to 60% of the heap.
What heap size are you using on your nodes? 60% is quite high, so there is a chance that you hit an OOME before the breaker trips simply because your node needs more than 40% of the total heap to survive.
If you want to protect against memory-intensive aggregations you could try to lower this value.
The aggregation in the example can create at most 144,000 buckets (500 for the terms agg * 288 for the five-minute intervals). That's big if every bucket needs to compute a cardinality.
There are other data structures that are not counted in the circuit breaker, but I doubt that they are the main cause. The cardinality agg alone would need 38GB of accounted memory to finish (with 144,000 buckets).
Can you try to adjust indices.breaker.request.limit and report whether it worked?
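
For reference, indices.breaker.request.limit is a dynamic cluster setting, so a minimal sketch of lowering it at runtime could look like the following (the 40% value is only an illustrative example, not a recommendation):

PUT _cluster/settings
{
  "transient" : {
    "indices.breaker.request.limit" : "40%"
  }
}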

@jimczi
Contributor

jimczi commented Sep 12, 2017

The default value for precision_threshold is 3000 and not 30000, so the cardinality agg would need 3.8GB and not 38GB as said in the previous comment. That is still big for a single query though, so I guess that you have a small heap size or a lot of concurrent requests?
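
As an aside, the per-bucket cost scales with the precision, so one possible mitigation (assuming lower-precision counts are acceptable here) is to set precision_threshold explicitly on the cardinality aggregation. A minimal sketch, with 1000 as an arbitrary example value:

GET nusights_metric_cfs_collector_memcpu_stats_2017_*/_search
{
  "size" : 0,
  "aggregations" : {
    "1" : {
      "cardinality" : {
        "field" : "pid",
        "precision_threshold" : 1000
      }
    }
  }
}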

@hexinw
Author

hexinw commented Sep 12, 2017

@colings86 Yes there are more than 500 distinct nodes in the indices.

The output of the node cardinality aggregation query:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 10,
    "successful": 10,
    "failed": 0
  },
  "hits": {
    "total": 237294,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "node_uuid_cardinality": {
      "value": 512
    }
  }
}

@hexinw
Author

hexinw commented Sep 12, 2017

@jimczi Yes, I have a pretty small single-node Elasticsearch cluster with a small memory footprint (8G memory and a 4G heap for ES), just for experimenting with the date histogram and cardinality aggregation stuff. It is not for production yet. I was just a bit unprepared for how much memory the aggregations can consume.

Sorry for the typo; with a precision of 3000, 24KB per bucket is not bad to me.

I think reducing indices.breaker.request.limit should help the circuit breaker kick in. Let me confirm.

@hexinw
Author

hexinw commented Sep 12, 2017

I changed indices.breaker.request.limit to 30% and it triggered the circuit breaker.

{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556064832/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556064832,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556080808/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556080808,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556130048/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556130048,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556097504/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556097504,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556080680/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556080680,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556064960/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556064960,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556064792/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556064792,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556097608/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556097608,
        "bytes_limit": 2556061286
      },
      {
        "type": "circuit_breaking_exception",
        "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556097280/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
        "bytes_wanted": 2556097280,
        "bytes_limit": 2556061286
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_08",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556064832/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556064832,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 0,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_09",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556080808/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556080808,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 1,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_08",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556130048/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556130048,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 1,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_09",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556097504/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556097504,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 2,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_09",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556080680/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556080680,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 3,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_08",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556064960/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556064960,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 3,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_09",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556064792/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556064792,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 4,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_08",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556097608/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556097608,
          "bytes_limit": 2556061286
        }
      },
      {
        "shard": 4,
        "index": "nusights_metric_cfs_collector_memcpu_stats_2017_09",
        "node": "bfIPOYfyTg-GTU5CKY9ggA",
        "reason": {
          "type": "circuit_breaking_exception",
          "reason": "[request] Data too large, data for [<reused_arrays>] would be [2556097280/2.3gb], which is larger than the limit of [2556061286/2.3gb]",
          "bytes_wanted": 2556097280,
          "bytes_limit": 2556061286
        }
      }
    ]
  },
  "status": 503
}

@hexinw
Author

hexinw commented Sep 14, 2017

I enabled trace logging to understand how the request circuit breaker does its heap usage estimation. I noticed that the OOM seems to happen well after the heap usage estimate has gone down to 0gb.
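
A sketch of how this kind of breaker trace logging can be enabled dynamically (the exact logger name here is an assumption, inferred from the o.e.i.b.request prefix in the log lines below):

PUT _cluster/settings
{
  "transient" : {
    "logger.org.elasticsearch.indices.breaker" : "trace"
  }
}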

The heap estimate from the circuit breaker reaches its peak of 3.9gb around 10:49:11. It goes down to 0 around 10:49:21,922, but the OOM happens at 10:50:42.

Does this mean some other circuit breaker logic is required to catch this case?

Heap estimation reaches peak

[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adding [568b][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188695184 [3.9gb]]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [-504] bytes, now [4188694680]
[2017-09-14T10:49:11,391][TRACE][o.e.i.e.E.TMP ] [node01-es-dev] [.monitoring-kibana-2-2017.09.14][0] elasticsearch[node01-es-dev][refresh][T#3] TMP: seg=_5hw(6.5.1):c1 size=0.013 MB [floored]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adding [328b][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188695008 [3.9gb]]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [-296] bytes, now [4188694712]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adding [328b][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188695040 [3.9gb]]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [-296] bytes, now [4188694744]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adding [568b][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188695312 [3.9gb]]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [-504] bytes, now [4188694808]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adding [144kb][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188842264 [3.9gb]]
[2017-09-14T10:49:11,388][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.8gb], limit: 5112122572 [4.7gb], estimate: 4184527824 [3.8gb]]
[2017-09-14T10:49:11,388][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.8gb], limit: 5112122572 [4.7gb], estimate: 4184527824 [3.8gb]]
[2017-09-14T10:49:11,388][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.8gb], limit: 5112122572 [4.7gb], estimate: 4184527824 [3.8gb]]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188842328 [3.9gb]]
[2017-09-14T10:49:11,391][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188842392 [3.9gb]]
[2017-09-14T10:49:11,392][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [-64] bytes, now [4188842328]
[2017-09-14T10:49:11,388][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.8gb], limit: 5112122572 [4.7gb], estimate: 4184527824 [3.8gb]]
[2017-09-14T10:49:11,392][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.9gb], limit: 5112122572 [4.7gb], estimate: 4188842328 [3.9gb]]
[2017-09-14T10:49:11,388][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.8gb], limit: 5112122572 [4.7gb], estimate: 4184527824 [3.8gb]]
[2017-09-14T10:49:11,392][TRACE][o.e.i.b.request ] [request] Adding [64b][<reused_arrays>] to used bytes [new used: [3.9gb],

Heap estimation goes down to 0
[2017-09-14T10:49:21,922][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [-96] bytes, now [0]
[2017-09-14T10:49:21,922][TRACE][o.e.t.TaskManager ] [node01-es-dev] unregister task for id: 875170
[2017-09-14T10:49:21,922][TRACE][o.e.t.T.tracer ] [node01-es-dev] [312235][indices:data/read/search[phase/query]] sent response
[2017-09-14T10:49:21,922][TRACE][o.e.t.T.tracer ] [node01-es-dev] [312235][indices:data/read/search[phase/query]] received response from [{node01-es-dev}{B_AYP0NDRD2LErSiyA1H9Q}{VktFHTHDSZeqjNiVZilxaw}{10.4.244.78}{10.4.244.78:9300}{ml.enabled=true}]
[2017-09-14T10:49:21,922][TRACE][o.e.a.s.TransportSearchAction] [node01-es-dev] got first-phase result from [B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_08][3]
[2017-09-14T10:49:21,922][TRACE][o.e.a.s.TransportSearchAction] [node01-es-dev] [query] Moving to next phase: [fetch], based on results from: [B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_08][0],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_09][0],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_08][1],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_09][1],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_08][2],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_09][2],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_08][3],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_09][3],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_08][4],[B_AYP0NDRD2LErSiyA1H9Q][nusights_metric_cfs_collector_memcpu_stats_2017_09][4] (cluster state version: 34)
[2017-09-14T10:49:21,929][TRACE][o.e.i.IndexService ] [node01-es-dev] [.monitoring-kibana-2-2017.09.14] scheduling refresh every 1s
[2017-09-14T10:49:22,429][TRACE][o.e.x.w.t.s.e.TickerScheduleTriggerEngine] [node01-es-dev] checking jobs [2017-09-14T17:49:22.429Z]

OOM happens
[2017-09-14T10:50:42,858][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [node01-es-dev] fatal error in thread [elasticsearch[node01-es-dev][search][T#9]], exiting
java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:481) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.common.util.BigArrays.newByteArray(BigArrays.java:490) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.HyperLogLogPlusPlus.<init>(HyperLogLogPlusPlus.java:171) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.metrics.cardinality.InternalCardinality.doReduce(InternalCardinality.java:90) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:119) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:139) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.bucket.histogram.InternalDateHistogram$Bucket.reduce(InternalDateHistogram.java:113) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.bucket.histogram.InternalDateHistogram.reduceBuckets(InternalDateHistogram.java:287) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.bucket.histogram.InternalDateHistogram.doReduce(InternalDateHistogram.java:378) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:119) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:139) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.bucket.terms.InternalTerms$Bucket.reduce(InternalTerms.java:142) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.bucket.terms.InternalTerms.doReduce(InternalTerms.java:271) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.InternalAggregation.reduce(InternalAggregation.java:119) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.search.aggregations.InternalAggregations.reduce(InternalAggregations.java:139) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.action.search.SearchPhaseController.reduceAggs(SearchPhaseController.java:513) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:490) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.action.search.SearchPhaseController.reducedQueryPhase(SearchPhaseController.java:408) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.action.search.SearchPhaseController$1.reduce(SearchPhaseController.java:725) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.action.search.FetchSearchPhase.innerRun(FetchSearchPhase.java:102) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.action.search.FetchSearchPhase.access$000(FetchSearchPhase.java:45) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:87) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.4.2.jar:5.4.2]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.4.2.jar:5.4.2]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_131]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_131]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-09-14T10:50:42,879][TRACE][o.e.i.b.request ] [request] Adjusted breaker by [16440] bytes, now [21904]

@jimczi
Contributor

jimczi commented Sep 14, 2017

Aggregations are performed in two phases. The first phase runs the aggregation on all shards and the second phase reduces the shard responses into a single result. The circuit breaker checks the memory during the first phase, but it is not used for the reduce phase on the coordinating node. In your example the node ran out of memory during the reduce phase, where memory is not checked.
All shards were able to perform the aggregation without reaching the circuit breaker limit, but the coordinating node was not able to load all the results at once to create the final response.
In 5.4 we introduced a new option, batched_reduce_size, to limit the number of shard results that are reduced at once on the coordinating node:
https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search-request-body.html
You can try to change the default value (255) in order to reduce the memory needed on the coordinating node.
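
A minimal sketch of overriding that option on a single search request, reusing the cardinality aggregation from earlier in the thread (64 is only an illustrative value, and exposing the option as a URL parameter on _search is an assumption here):

GET nusights_metric_cfs_collector_memcpu_stats_2017_*/_search?batched_reduce_size=64
{
  "size" : 0,
  "aggregations" : {
    "node_uuid_cardinality" : {
      "cardinality" : {
        "field" : "node_uuid"
      }
    }
  }
}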

@hexinw
Author

hexinw commented Sep 15, 2017

What is the reason that we don't put a circuit breaker in the reduce phase?
Or we could just catch the OOM exception in the reduce stage and return a failure to the user. That seems like a better solution than crashing the ES node.

The batched_reduce_size parameter is pretty coarse, as different data sets and query types allow different numbers of shards to be reduced at the same time.

@hexinw
Author

hexinw commented Sep 16, 2017

Do you see any issue with the following code change to catch the OOM in the reduce phase?

======> git diff core/src/main/java/org/elasticsearch/action/search/FetchSearchPhase.java

diff --git a/core/src/main/java/org/elasticsearch/action/search/FetchSearchPhase.java b/core/src/main/java/org/elasticsearch/action/search/FetchSearchPhase.java
index 25231ef..d91f5d3 100644
--- a/core/src/main/java/org/elasticsearch/action/search/FetchSearchPhase.java
+++ b/core/src/main/java/org/elasticsearch/action/search/FetchSearchPhase.java
@@ -84,7 +84,12 @@ final class FetchSearchPhase extends SearchPhase {
                 // we do the heavy lifting in this inner run method where we reduce aggs etc. that's why we fork this phase
                 // off immediately instead of forking when we send back the response to the user since there we only need
                 // to merge together the fetched results which is a linear operation.
-                innerRun();
+                try {
+                    innerRun();
+                } catch (OutOfMemoryError exc) {
+                    context.getLogger().warn("Catch OOM.");
+                    throw new SearchPhaseExecutionException("FetchSearch", "Out of memory error.", ShardSearchFailure.EMPTY_ARRAY);
+                }
             }
 
             @Override

====> Now the OutOfMemoryError is caught without bringing the ES node down.

[2017-09-16T00:11:12,851][WARN ][o.e.a.s.TransportSearchAction] [node01-es-dev] Catch OOM.
[2017-09-16T00:11:13,404][WARN ][o.e.x.w.e.ExecutionService] [node01-es-dev] Failed to execute watch [CsSKhUSkTPSLM8c56fVsNw_kibana_version_mismatch_5b73b833-8199-4e40-a620-fd7d4ffdff64-2017-09-16T07:10:11.309Z]
[2017-09-16T00:11:13,383][WARN ][r.suppressed             ] path: /nusights_metric_cfs_collector_memcpu_stats_2017_*/_search, params: {index=nusights_metric_cfs_collector_memcpu_stats_2017_*}
org.elasticsearch.action.search.SearchPhaseExecutionException: 
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:272) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:97) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:623) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Out of memory error.
        at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:91) ~[elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        ... 3 more
[2017-09-16T00:11:14,381][WARN ][o.e.m.j.JvmGcMonitorService] [node01-es-dev] [gc][227] overhead, spent [10.1s] collecting in the last [11.1s]

@jasontedor
Member

jasontedor commented Sep 16, 2017

We are not going to catch OOM; we worked hard to remove catching errors that cannot be recovered from: #19231. Dying when the node hits such an error is intentional; the JVM is suspect at that point: #19272.

@hexinw
Author

hexinw commented Sep 16, 2017

Sorry, I am late to this. Can you elaborate on the JVM statement? Do you mean that GC is slow in recycling the memory? Coming from the C/C++ world, I'd think it normal for an application to handle a malloc failure. Is there any doc or discussion thread that talks about the Elasticsearch memory management model in general?

Just to recap, I ran into two OOM problems in my test.

  1. An OOM is hit in the per-shard search phase, where the circuit breaker fails to kick in because the request breaker limit is higher than the heap space actually available.
  2. An OOM happens in the reduce phase on the coordinating node.

Both OOMs are bad, as they bring down the ES node and leave the ES cluster not quite usable. To me it would be better to fail the query rather than crash the ES node, either by estimating the memory requirement up front or by reacting to the memory pressure dynamically.

@jasontedor
Member

An out of memory exception can be thrown on any thread, whether executing:

  • JDK code
  • Elasticsearch server code
  • code in a third-party dependency

We have zero guarantees that a thread suffering an out of memory exception can gracefully clean up any shared data structures it was in the middle of manipulating; there might not even be enough memory to carry out such cleanup. With no guarantees, we have to assume that the state of the JVM is suspect at this point.

It gets worse. Imagine threads waiting on an object monitor or a lock held by a thread that suffers an out of memory exception. The offending thread might not be able to notify the monitor or unlock these threads, leaving them deadlocked. Trying to deal with this comes at the cost of significantly more complicated code. In fact, even if you tried to employ a strategy for dealing with this, because of the above remarks on shared data structures, there is no guarantee that it can be done safely; we cannot unblock a waiting thread and let it start messing with a shared data structure when it is not in a good state.

Similar reasoning applies to other errors (stack overflow, etc.).

In such a state, the JVM simply must die with dignity instead of limping along in an unknown, unreliable state.

@hexinw
Author

hexinw commented Sep 18, 2017

@jasontedor I agree with you on the OOM decision. Is there any improvement being discussed to pre-estimate memory consumption (I understand it would still be a best effort) and fail the query request rather than letting an OOM bring down the ES node?

Also, with respect to the circuit breaker setting, do you see any value in making the static limit dynamic, so that the limit is decided at run time based on how much heap is actually available?

@dakrone
Member

dakrone commented Sep 26, 2017

Is there any improvement being discussed to pre-estimate memory consumption (I understand it would still be a best effort) and fail the query request rather than letting an OOM bring down the ES node?

We're definitely always interested in improving the pre-estimation of memory consumption, and adding more circuit breakers as needed.

Also, with respect to the circuit breaker setting, do you see any value in making the static limit dynamic, so that the limit is decided at run time based on how much heap is actually available?

I think this would be too hard to debug, since whether the node could handle a request would vary widely. It's also impossible to know whether a heap at 79% usage could easily be GC'd down to 20% usage, or whether all the objects are live and cannot be GC'd. I'm in favor of a static (albeit configurable) limit on breakers.

@hexinw
Author

hexinw commented Sep 26, 2017

Thanks @dakrone. What is the reason we don't currently have a circuit breaker in the reduce phase? And why does the reduce phase take more memory?

@dakrone
Member

dakrone commented Sep 26, 2017

What is the reason we don't currently have a circuit breaker in the reduce phase?

It simply hasn't been added; I don't think anyone purposely chose not to add one there.

And why does the reduce phase take more memory?

The reduce phase usually does not take more memory, which is likely why a circuit breaker hasn't been proposed for it before. I haven't seen any other instances where a node was getting an OOME during the reduce phase.

@hexinw
Author

hexinw commented Sep 26, 2017

The stack trace I posted 11 days back is an OOM in the reduce phase.

[2017-09-16T00:11:12,851][WARN ][o.e.a.s.TransportSearchAction] [node01-es-dev] Catch OOM.
[2017-09-16T00:11:13,404][WARN ][o.e.x.w.e.ExecutionService] [node01-es-dev] Failed to execute watch [CsSKhUSkTPSLM8c56fVsNw_kibana_version_mismatch_5b73b833-8199-4e40-a620-fd7d4ffdff64-2017-09-16T07:10:11.309Z]
[2017-09-16T00:11:13,383][WARN ][r.suppressed             ] path: /nusights_metric_cfs_collector_memcpu_stats_2017_*/_search, params: {index=nusights_metric_cfs_collector_memcpu_stats_2017_*}
org.elasticsearch.action.search.SearchPhaseExecutionException: 
        at org.elasticsearch.action.search.AbstractSearchAsyncAction.onPhaseFailure(AbstractSearchAsyncAction.java:272) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.action.search.FetchSearchPhase$1.onFailure(FetchSearchPhase.java:97) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.onFailure(ThreadContext.java:623) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:39) [elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
Caused by: org.elasticsearch.action.search.SearchPhaseExecutionException: Out of memory error.
        at org.elasticsearch.action.search.FetchSearchPhase$1.doRun(FetchSearchPhase.java:91) ~[elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:638) ~[elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-5.4.2.jar:5.4.2-SNAPSHOT]
        ... 3 more
[2017-09-16T00:11:14,381][WARN ][o.e.m.j.JvmGcMonitorService] [node01-es-dev] [gc][227] overhead, spent [10.1s] collecting in the last [11.1s]

@danielmitterdorfer
Member

Part of the issue described here has been tackled in #27581. Enhancing request-level circuit breaking on the coordinating node is tracked in #37182, and thus I'm closing this issue in favor of #37182.
