
After Upgrade to 6.2.4 Circuitbreaker Exception Data too large #31197

Closed

r32rtb opened this issue Jun 8, 2018 · 32 comments
Labels
>bug :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload v6.2.4

Comments

@r32rtb

r32rtb commented Jun 8, 2018

Elasticsearch version (bin/elasticsearch --version): Version: 6.2.4, Build: ccec39f/2018-04-12T20:37:28.497551Z, JVM: 1.8.0_151

Plugins installed: discovery-file and xpack

JVM version (java -version): java version "1.8.0_151"
Java(TM) SE Runtime Environment (build 1.8.0_151-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.151-b12, mixed mode)

OS version (uname -a if on a Unix-like system): Linux 4.4.0-124-generic #148-Ubuntu SMP Wed May 2 13:00:18 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Description of the problem including expected versus actual behavior: After upgrade and restart of the cluster I get the following message:
[2018-06-08T06:51:56,825][WARN ][o.e.c.a.s.ShardStateAction] [a11-es0] [logstash-bro-2018.05.31][3] received shard failed for shard id [[logstash-bro-2018.05.31][3]], allocation id [dwSADk_rRhSqdRqqlG5_Qw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[logstash-bro-2018.05.31][3]: Recovery failed from {a16-es1}{YRp5l6bMSJKSjVD7_hC8aQ}{8tIAsMsKT_m5h5PP7aDjWg}{192.168.1.16}{192.168.1.16:9301}{box_type=hot} into {a25}{CNJlJBE0TQK2zYzu0Jcsxg}{Vs4C9bEWTtyF7CTOyvnbYA}{192.168.1.25}{192.168.1.25:9300}{box_type=warm}]; nested: RemoteTransportException[[a16-es1][192.168.1.16:9301][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [61] files with total size of [75.1gb]]; nested: RemoteTransportException[[a25][192.168.1.25:9300][internal:index/shard/recovery/file_chunk]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [23219239233/21.6gb], which is larger than the limit of [23178077798/21.5gb]]; ]

Steps to reproduce:

Please include a minimal but complete recreation of the problem, including
(e.g.) index creation, mappings, settings, query etc. The easier you make it for
us to reproduce, the more likely it is that somebody will take the time to look at it.

1. Upgraded from 6.1.1 to 6.2.4.
2. I have previously restarted the cluster without having this issue.
3. I attempted to increase the heap size to 40GB as a temporary solution to allow the reallocation to occur, but I am still seeing the issue.

Provide logs (if relevant):
[2018-06-08T06:51:56,825][WARN ][o.e.c.a.s.ShardStateAction] [a11-es0] [logstash-bro-2018.05.31][3] received shard failed for shard id [[logstash-bro-2018.05.31][3]], allocation id [dwSADk_rRhSqdRqqlG5_Qw], primary term [0], message [failed recovery], failure [RecoveryFailedException[[logstash-bro-2018.05.31][3]: Recovery failed from {sa16-es1}{YRp5l6bMSJKSjVD7_hC8aQ}{8tIAsMsKT_m5h5PP7aDjWg}{192.168.1.16}{192.168.1.16:9301}{box_type=hot} into {a25}{CNJlJBE0TQK2zYzu0Jcsxg}{Vs4C9bEWTtyF7CTOyvnbYA}{192.168.1.25}{192.168.1.25:9300}{box_type=warm}]; nested: RemoteTransportException[[a16-es1][192.168.1.16:9301][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [61] files with total size of [75.1gb]]; nested: RemoteTransportException[[a25][192.168.1.25:9300][internal:index/shard/recovery/file_chunk]]; nested: CircuitBreakingException[[parent] Data too large, data for [<transport_request>] would be [23219239233/21.6gb], which is larger than the limit of [23178077798/21.5gb]]; ]

@albertzaharovits albertzaharovits added >bug :Core/Infra/Circuit Breakers Track estimates of memory consumption to prevent overload v6.2.4 labels Jun 8, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

@adol-ch

adol-ch commented Jun 11, 2018

My cluster has the same issue.

@furkalor

I had this problem too. I worked around it by increasing the breaker limit on the fly:

curl -X PUT "localhost:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
    "transient" : {
        "indices.breaker.total.limit" : "80%"
    }
}
'

By default it's 70%.
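To verify that the transient setting was applied (and to see the defaults it overrides), you can read the cluster settings back; a minimal check, assuming the same localhost:9200 endpoint:

curl -X GET "localhost:9200/_cluster/settings?include_defaults=true&flat_settings=true&pretty"

Look for indices.breaker.total.limit in the output. Note that this only raises the threshold: if reserved memory keeps growing, the breaker will trip again at the new limit.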

@hrr888

hrr888 commented Aug 29, 2018

My cluster is also affected by this bug. The workaround helps only for a few days; then the circuit breaker hits the limit again and I bump it up a few percent more. A rolling restart doesn't help.

@jasontedor
Member

Elasticsearch 6.2.0 introduced the accounting circuit breaker to account for segment memory usage: #27116. This means that for installations with a large number of shards, it is expected that this breaker starts tripping where it previously did not. All that we have done here is a better job of accounting for memory, so we can now break in more situations. This is a good thing: it prevents us from going out of memory and completely blowing up. In 7.0.0 we are making further enhancements to the circuit breaker to account for the real memory usage: #33125

I do not see a bug here, I think that Elasticsearch and the circuit breaker are behaving as expected.
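For anyone who wants to check how much heap their segments are holding on each node (which is what the accounting breaker tracks), one rough check is the cat nodes API; a sketch, assuming curl access to any node on localhost:9200:

curl -s "localhost:9200/_cat/nodes?v&h=name,heap.percent,heap.max,segments.count,segments.memory"

If segments.memory is a large fraction of heap.max, the parent breaker has correspondingly less headroom left for requests.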

@hrr888

hrr888 commented Aug 30, 2018

Maybe I don't understand something, but how can such a simple request allocate so much memory?

curl -X GET 'http://localhost:9200/_cat/health?v'

{"error":{"root_cause":[{"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [18213502664/16.9gb], which is larger than the limit of [18193693409/16.9gb]","bytes_wanted":18213502664,"bytes_limit":18193693409}],"type":"circuit_breaking_exception","reason":"[parent] Data too large, data for [<http_request>] would be [18213502664/16.9gb], which is larger than the limit of [18193693409/16.9gb]","bytes_wanted":18213502664,"bytes_limit":18193693409},"status":503}

@danielmitterdorfer
Member

@hrr888 It is not that this particular request allocates so much memory. Circuit breakers are hierarchical, so the "parent" breaker accumulates the reserved memory of the other circuit breakers. It is likely that one of the other circuit breakers (see docs) is already reserving a lot of memory and this request just pushes it over the limit, so the parent circuit breaker trips due to the total reserved memory usage of all circuit breakers. You can use the node stats API (i.e. GET /_nodes/stats/breaker) to inspect the current circuit breaker status.
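For example, a minimal invocation of that API against a single node (assuming localhost:9200):

curl -s "localhost:9200/_nodes/stats/breaker?pretty"

In the response, the parent entry's estimated_size is roughly the sum of what the child breakers (request, fielddata, in_flight_requests, accounting) have reserved, so whichever child shows the largest estimated_size is usually the one to investigate.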

@hrr888

hrr888 commented Aug 30, 2018

OK, thanks for the explanation. But then this error message is somewhat misleading.

@jasontedor
Member

jasontedor commented Aug 30, 2018

@hrr888 We are open to suggestions on how we can improve any aspect of the system. Can you help us understand what would make it better?

Starting with:

{
  "error": {
    "root_cause": [
      {
        "type": "circuit_breaking_exception",
        "reason": "[parent] Data too large, data for [<http_request>] would be [18213502664/16.9gb], which is larger than the limit of [18193693409/16.9gb]",
        "bytes_wanted": 18213502664,
        "bytes_limit": 18193693409
      }
    ],
    "type": "circuit_breaking_exception",
    "reason": "[parent] Data too large, data for [<http_request>] would be [18213502664/16.9gb], which is larger than the limit of [18193693409/16.9gb]",
    "bytes_wanted": 18213502664,
    "bytes_limit": 18193693409
  },
  "status": 503
}

There was an error, its root_cause is a circuit_breaking_exception and the reason is because
[parent] Data too large, data for [<http_request>] would be [18213502664/16.9gb], which is larger than the limit of [18193693409/16.9gb].

This means it is the parent circuit breaker that tripped, and it is an HTTP request that tripped it; if we accepted the HTTP request, then the breaker would be at 18213502664 bytes (bytes_wanted), which is larger than the limit of 18193693409 bytes (bytes_limit). Therefore, the circuit breaker trips and the request is rejected.

The status code on the HTTP response is 503, service unavailable (see #31986).

Finally, root_cause and the top-level exception are the same (they would not be in the case of, say, a remote exception where on the remote side the cause was, for example, an illegal argument exception).

@r32rtb
Author

r32rtb commented Aug 31, 2018

Since upgrading to 6.4.0, I have not seen this issue again. 6.4.0 appears to be much more stable.

@hrr888

hrr888 commented Aug 31, 2018

@jasontedor The explanation you gave makes it all clear. Maybe it's my poor understanding of English. The docs say:

Each breaker specifies a limit for how much memory it can use. Additionally, there is a parent-level breaker that specifies the total amount of memory that can be used across all breakers.

I read "across" as "if any breaker has its own limit greater than the parent's, then the parent takes precedence", but it does not convey that the parent applies to the sum of all reserved memory, as @danielmitterdorfer explained (I found the root cause of my problem. Thanks, @danielmitterdorfer :) ).

Maybe the cited sentence should be rephrased or extended.

@wanthalf

After upgrading to ES7, I get this error for just about any request (including health status) after a few minutes of monitoring by the cerebro UI, with no queries or other requests at all. The heap is set to 16GB and it gets almost full right after initialisation of the 15 indexes (each having 5 shards). When I set the heap size to the default 1GB, ES does not even finish initialisation. So do I have to raise it even higher than 16GB?

@virtuman

Same in my case. I set up a cluster with 3 data, 3 master, and 3 ingest nodes.

@avloss

avloss commented May 29, 2019

Getting same error with a "toy" example - this is so strange - I would expect Elastic to just "work".

@jasontedor
Member

@avloss I'm sorry that you're experiencing trouble, but without details of the issue that you're experiencing it is not actionable for us. We would love to understand the issue that you're facing, and supply a fix if appropriate. Help us help you and the rest of our userbase.

@wanthalf

wanthalf commented May 29, 2019

Still the same with 7.1, and I only have a single node. After raising the Java heap to 48GB, ES seems to handle common traffic well.

On the other hand, there is one great improvement since ES6: no more crazy garbage collection causing heavy load on all 112 cores of the server every second, even when the server is completely idle...!?! :-)

@e-orz

e-orz commented May 30, 2019

I have the same issue. I think the parent memory calculation is wrong in 7.x (it doesn't happen with version 6); see this discussion:
https://discuss.elastic.co/t/parent-circuit-breaker-calculation-seems-to-wrong-with-version-7-x/183530
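For context: in 7.x the parent breaker is based on real heap usage by default (the static setting indices.breaker.total.use_real_memory), which is why it behaves differently from 6.x. If you want to compare against the 6.x behavior, that setting can be turned off; a sketch, not a recommendation:

# elasticsearch.yml (static setting, requires a node restart)
indices.breaker.total.use_real_memory: false

With real-memory accounting off, the parent limit falls back to the 6.x default of 70% of the heap instead of 95%.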

@hackerwin7

Same issue in 7.1.0; there is an internal server error in /_stats/fielddata:

"failures":[  
         {  
            "shard":6,
            "index":"ad_stat_sum_extend-2019.05.08",
            "status":"INTERNAL_SERVER_ERROR",
            "reason":{  
               "type":"failed_node_exception",
               "reason":"Failed node [PMeDcHfMS4ilMZxQbrSt1w]",
               "node_id":"PMeDcHfMS4ilMZxQbrSt1w",
               "caused_by":{  
                  "type":"circuit_breaking_exception",
                  "reason":"[parent] Data too large, data for [<transport_request>] would be [26445028500/24.6gb], which is larger than the limit of [25769803776/24gb], real usage: [26445014040/24.6gb], new bytes reserved: [14460/14.1kb]",
                  "bytes_wanted":26445028500,
                  "bytes_limit":25769803776,
                  "durability":"PERMANENT"
               }
            }
         },

@blueabysm

Same issue in 7.1.1.
I am new to Elasticsearch and use the ES cluster as a log center. Yesterday this issue made ES lose logs, and many related alerts were triggered. This morning I deleted indices from 5 days ago (my cluster has only been running for a week), and the ES cluster finally receives logs normally again.

After reading this thread, I understand that this is the expected behavior of ES. Is there any suggestion to avoid running into this issue again? Or is there any workaround? Thanks! @jasontedor

@Theoooooo

Theoooooo commented Jun 13, 2019

I got the same issue after upgrading from 6.6.0 to 7.1.0 on a single-node cluster with 3GB of heap. Every 3 requests (these are really small requests with very little data to retrieve), I hit the heap limit and have to wait.

But I also have another ES cluster with 3 nodes, a fresh install of 7.1.1, and I have never had this problem there.
It seems this issue appeared with the upgrade I did.

@e-orz

e-orz commented Jun 16, 2019

@Theoooooo In my case it happens with a fresh install.
Which GC type are you using, CMS or G1GC?

@wuxiangli91

Just use:
ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch

Since the default heap is 1G, you should set it higher if your data is big.
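The same can be set persistently in config/jvm.options instead of on the command line; a sketch, with the usual caveat to keep Xms equal to Xmx and to pick a size that fits your machine:

# config/jvm.options
-Xms10g
-Xmx10g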

@Theoooooo

Theoooooo commented Jun 18, 2019

@e-orz What's a GC type?

@wuxiangli91 And what if you have limited resources available? That makes no sense, but it's also right at the same time.

After testing many different configurations (in terms of resources), the answer to the problem is just to allocate more RAM. But with very small queries that demand very few resources and almost no cross-index searches, it's still a problem right now.

@e-orz

e-orz commented Jun 19, 2019

@Theoooooo I meant the garbage collection type. The default is CMS (concurrent mark and sweep); the newer one is G1GC, which should require fewer tweaks but might only be suited for larger heaps.
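For reference, the relevant lines live in config/jvm.options; the stock file enables CMS, and switching to G1GC means swapping those flags. A sketch of just the GC flags, assuming the defaults shipped with 6.x/7.x (the exact lines differ between versions):

# config/jvm.options -- default collector (CMS)
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly

# alternative collector (comment out the CMS lines above, then enable):
#-XX:+UseG1GC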

@Theoooooo

@e-orz How do you check which GC Java is using with Elasticsearch? (I assume it's Java ^^)

@e-orz

e-orz commented Jun 20, 2019

@Theoooooo you can check the jvm.options file.

@Theoooooo

Theoooooo commented Jun 21, 2019

@e-orz
[screenshot of jvm.options omitted]
It's the same content as in the default jvm.options generated by Elasticsearch at installation.

@32328254

I have the same issue.
I tested both the CMS GC and G1GC, but there are still the same problems.

@cawoodm
Contributor

cawoodm commented May 15, 2020

We just upgraded Elasticsearch and Kibana from 7.6.2 to 7.7 and now Kibana won't start, with the same message:

FATAL [circuit_breaking_exception] [parent] Data too large, data for [<http_request>] would be [987270888/941.5mb]...

@raghuchahar007

After working smoothly for more than 10 months, I suddenly started getting the same error in production while doing simple queries.

GET: http://localhost:8080/?pretty
Output:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "circuit_breaking_exception",
        "reason" : "[parent] Data too large, data for [<http_request>] would be [745522124/710.9mb], which is larger than the limit of [745517875/710.9mb]",
        "bytes_wanted" : 745522124,
        "bytes_limit" : 745517875
      }
    ],
    "type" : "circuit_breaking_exception",
    "reason" : "[parent] Data too large, data for [<http_request>] would be [745522124/710.9mb], which is larger than the limit of [745517875/710.9mb]",
    "bytes_wanted" : 745522124,
    "bytes_limit" : 745517875
  },
  "status" : 503
}

This happens with the simplest Elasticsearch query, i.e. just fetching the Elasticsearch metadata.

Here is my breaker info:

{
  "breakers" : {
    "request" : {
      "limit_size_in_bytes" : 639015321,
      "limit_size" : "609.4mb",
      "estimated_size_in_bytes" : 0,
      "estimated_size" : "0b",
      "overhead" : 1.0,
      "tripped" : 0
    },
    "fielddata" : {
      "limit_size_in_bytes" : 639015321,
      "limit_size" : "609.4mb",
      "estimated_size_in_bytes" : 406826332,
      "estimated_size" : "387.9mb",
      "overhead" : 1.03,
      "tripped" : 0
    },
    "in_flight_requests" : {
      "limit_size_in_bytes" : 1065025536,
      "limit_size" : "1015.6mb",
      "estimated_size_in_bytes" : 560,
      "estimated_size" : "560b",
      "overhead" : 1.0,
      "tripped" : 0
    },
    "accounting" : {
      "limit_size_in_bytes" : 1065025536,
      "limit_size" : "1015.6mb",
      "estimated_size_in_bytes" : 146387859,
      "estimated_size" : "139.6mb",
      "overhead" : 1.0,
      "tripped" : 0
    },
    "parent" : {
      "limit_size_in_bytes" : 745517875,
      "limit_size" : "710.9mb",
      "estimated_size_in_bytes" : 553214751,
      "estimated_size" : "527.5mb",
      "overhead" : 1.0,
      "tripped" : 0
    }
  }
}

What would be the better approach to solve this: disabling the circuit breaker or increasing the breaker limit?
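In the breaker dump above, fielddata (387.9mb) plus the accounting breaker (139.6mb) account for most of the parent's 527.5mb estimate against a 710.9mb limit, so even a small request can push it over. Rather than disabling the breaker, the usual options are a bigger heap, reducing fielddata usage, or clearing the fielddata cache as a stop-gap; a sketch, assuming the node is on localhost:9200:

curl -X POST "localhost:9200/_cache/clear?fielddata=true&pretty"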

@senvardarsemih

ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch

When I use this, it shows a "./bin/elasticsearch-env: line 81: /etc/sysconfig/elasticsearch: Permission denied" error.
