SearchShardFailure handling results in OOM #48910

Closed
altinp opened this issue Nov 7, 2019 · 4 comments · Fixed by #51885
Labels
:Core/Infra/Logging Log management and logging utilities

Comments


altinp commented Nov 7, 2019

This is either the same as, or closely related to, #27596 reported by @mattweber, which was hoped to be fixed in 6.1+ but was eventually closed for lack of feedback. It likely was not fixed.

  • ES 6.6.2 (and 5.6.x)
  • latest OpenJDK 1.8
  • CentOS 7
  • ES heap 31GB
  • log-level: info
    (This is at a client site and I don't have direct access to some details, but I can obtain them if needed.)

The problem is reproducible in various clusters, ranging from 8 to 50+ nodes, with thousands of shards. If a query fails (or, more precisely, is rejected) on every shard, e.g. because it trips too_many_clauses, an OOM will eventually occur when:
free_heap < 2 * serialized_query_string_size * num_shards
where serialized_query_string_size is at least as large as the user-submitted query.
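As a rough back-of-the-envelope check (the factor of 2 matches the two copies seen in the heap dumps, one in the cause and one in the reason; all numbers below are illustrative, not measured constants):

# Rough estimate of the heap retained by shard failures for a rejected query.
# Illustrative only; the factor of 2 reflects the two serialized copies
# (cause.detailMessage and reason) observed per ShardSearchFailure.

def failure_overhead_bytes(serialized_query_bytes: int, num_shards: int) -> int:
    return 2 * serialized_query_bytes * num_shards

def will_oom(free_heap_bytes: int, serialized_query_bytes: int, num_shards: int) -> bool:
    return free_heap_bytes < failure_overhead_bytes(serialized_query_bytes, num_shards)

MB, GB = 1024 ** 2, 1024 ** 3
# Numbers from the heap-dump analysis below: ~11.5 MB per string, 1061 shards.
print(failure_overhead_bytes(int(11.5 * MB), 1061) / GB)   # ~23.8 GB retained
print(will_oom(20 * GB, int(11.5 * MB), 1061))             # True for ~20 GB of free heap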

So this is a case of the remedy being worse than the disease: in ES 6.6 the terms query now allows up to 64K terms (and hence eventual clauses) by default, finally in line with the default terms-lookup limit, but if you write the query with separate term clauses (more than 1K but fewer than 64K), it is the protective mechanism itself that kills the coordinating node, just as in 5.6 with a terms query of more than 1K terms.
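For illustration (field name and values are placeholders, mirroring the example below), the shape that trips the per-shard clause limit versus the single terms query that stays within the 64K-terms limit looks roughly like this:

# Illustrative only: the same lookup written as N separate term clauses
# (subject to the per-shard max clause count) versus one terms query
# (subject to the much higher default terms limit mentioned above).
values = [f"foo{i}" for i in range(5000)]

# Shape that can trip too_many_clauses on every shard:
bool_of_term_clauses = {
    "query": {
        "bool": {
            "should": [{"term": {"from.email.keyword": v}} for v in values]
        }
    }
}

# Equivalent single terms query that avoids the per-clause limit:
single_terms_query = {
    "query": {
        "terms": {"from.email.keyword": values}
    }
}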

I have analyzed the heap dumps and can provide reports. The cause seems to be that ShardSearchFailure holds on to two error strings, each containing the whole pretty-printed query: one inside the cause (QueryShardException) and the other in the reason. In our ES 5.6 test these strings were about the same size as the user query; in the ES 6.6 test, a simple 4 MB user query (on-disk size) such as:

{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "from.email.keyword": "foo1"
          }
        }, 
        {
          "term": {
            "from.email.keyword": "foo2"
          }
...

led to an 11.5 MB in-heap string for the cause and for the reason (each):

2 * 11.5 MB * 1061 shards ≈ 24 GB => OOM

Questions:

  • If the query trips a global limit, can't this be short-circuited at the coordinating node before the query is sent out to all the data nodes (at least when that node is not coordinating-only)? If not, could the coordinating node detect that one of the incoming failures is of a global-tripwire type and drop the rest?
  • Trying to return more than the first X bytes of the user query is clearly not useful.
  • We have users and automated clients that can always submit (or programmatically generate!) queries with too many term clauses, etc.

I will next submit screenshots and reports from the heap analysis.

NOTE: I don't think the issue is specific to logging the error; it is about preparing the query response to return to the client. That response contains at least one shard failure per index, as can be seen when the query is limited to a small number of indices/shards and returns without an OOM.


altinp commented Nov 8, 2019

Here is a screenshot from Eclipse MAT:
MAT-1


altinp commented Nov 8, 2019

And zooming into one of the shard failures:
MAT-2-zoomin

imotov added the :Core/Infra/Logging Log management and logging utilities label Nov 8, 2019
@elasticmachine

Pinging @elastic/es-core-infra (:Core/Infra/Logging)


altinp commented Nov 18, 2019

crash-query.json.zip
Hi @imotov,
It looks like this can be reproduced locally quite easily, and on the latest release (7.4) as well:

  • Get and run the latest release, using defaults (1 GB heap)
    • a 2- or 3-node local cluster works too, with the same results
  • Create an index with 80 shards (it can stay empty)
  • POST a term query like the one above/attached; to create a sizable ~4 MB request I had to go up to 40K clauses (a generation sketch follows this list)
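A minimal reproduction sketch, assuming a default local node at http://localhost:9200 and an index named crash (both are placeholders; the field name and values mirror the example above, and real values are longer, which is what pushes the request toward 4 MB):

import json
import urllib.request

ES = "http://localhost:9200"  # assumption: default single local node

def request(method, path, payload=None):
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        f"{ES}{path}", data=data,
        headers={"Content-Type": "application/json"}, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status, resp.read()

# 1. Empty index with 80 primary shards.
request("PUT", "/crash",
        {"settings": {"number_of_shards": 80, "number_of_replicas": 0}})

# 2. Bool query with 40,000 term clauses.
clauses = [{"term": {"from.email.keyword": f"foo{i}"}} for i in range(40_000)]
body = {"query": {"bool": {"should": clauses}}}
print("request size ~", len(json.dumps(body)) // (1024 * 1024), "MB")

try:
    status, resp = request("POST", "/crash/_search", body)
    print(status, len(resp), "bytes returned")
except Exception as exc:  # the node may OOM or drop the connection first
    print("search failed:", exc)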

The same OOM is produced. Note that if you turn off all logging, the OOM is almost immediate.
The heap dump looks much the same: the 4 MB request I submitted is serialized to a ~6 MB string internally (once "boost":1.0 etc. are added to each clause), and two error strings containing it are attached to each shard failure, as cause.detailMessage and reason.

So 2 * 6 MB * 80 shards = 960 MB. In fact, the OOM occurs when the total from these strings reaches about 750 MB, given the system's own baseline heap usage.

VisualVM 1.4.4 screenshot, 2019-11-18 14:22:41

elasticsearch.log

jimczi added a commit to jimczi/elasticsearch that referenced this issue Feb 4, 2020
QueryBuilders that throw exceptions on shards when building the Lucene query
return the full serialization of the query builder in the exception message.
For large queries that fail to execute due to the max boolean clause limit, this means
that we keep a reference to these big messages for every shard that participates
in the request. In order to limit the memory needed to hold these query shard
exceptions on the coordinating node, this change removes the query builder
serialization from the shard exception. The query is known by the user, so
there should be no need to repeat it in every shard exception. We could also
omit the entire stack trace for known bad-request exceptions, but that would deserve
a separate issue/PR.

Closes elastic#51843
Closes elastic#48910
jimczi added a commit that referenced this issue Feb 6, 2020
…ge (#51885)

QueryBuilders that throw exceptions on shards when building the Lucene query
return the full serialization of the query builder in the exception message.
For large queries that fail to execute due to the max boolean clause limit, this means
that we keep a reference to these big messages for every shard that participates
in the request. In order to limit the memory needed to hold these query shard
exceptions on the coordinating node, this change removes the query builder
serialization from the shard exception. The query is known by the user, so
there should be no need to repeat it in every shard exception. We could also
omit the entire stack trace for known bad-request exceptions, but that would deserve
a separate issue/PR.

Closes #51843
Closes #48910
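The actual change is in Elasticsearch's Java code (#51885). Purely to illustrate the pattern the commit message describes, and not the real implementation, dropping the serialized query from each per-shard failure reason looks roughly like this:

# Generic illustration of the pattern, not the actual Elasticsearch change:
# build the per-shard failure reason without embedding the serialized query,
# so N failing shards no longer pin N multi-megabyte strings on the
# coordinating node.

def shard_failure_reason(shard_id, error, serialized_query=None):
    reason = f"[shard {shard_id}] failed to create query: {error}"
    if serialized_query is not None:
        # Old behaviour: append the whole query (the part the fix removes).
        reason += f" for query: {serialized_query}"
    return reason

big_query = "x" * 6_000_000  # stand-in for a ~6 MB serialized query
# Note: the "old" list below really does materialize ~480 MB of strings.
old = [shard_failure_reason(i, "too_many_clauses", big_query) for i in range(80)]
new = [shard_failure_reason(i, "too_many_clauses") for i in range(80)]
print(sum(map(len, old)) // 1_000_000, "MB vs", sum(map(len, new)), "bytes")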