Don't serialize the entire query when query parsing fails #51843
Comments
Pinging @elastic/es-search (:Search/Search)
Pinging @elastic/es-core-infra (:Core/Infra/Logging)
QueryBuilders that throw exceptions on shards when building the Lucene query return the full serialization of the query builder in the exception message. For large queries that fail to execute because of the max boolean clause limit, this means that we keep a reference to these big messages for every shard that participates in the request. To limit the memory needed to hold these query shard exceptions on the coordinating node, this change removes the query builder serialization from the shard exception. The query is known to the user, so there should be no need to repeat it in every shard exception. We could also omit the entire stack trace for known bad request exceptions, but that would deserve a separate issue/PR. Closes elastic#51843 Closes elastic#48910
Thanks @jimczi, this should sort out #48910 ! One question (and pardon my ignorance): is it not possible for the coordinating node to do some (basic/partial?) parsing just to ensure that global circuit breakers like |
…ge (#51885) QueryBuilders that throw exceptions on shards when building the Lucene query return the full serialization of the query builder in the exception message. For large queries that fail to execute because of the max boolean clause limit, this means that we keep a reference to these big messages for every shard that participates in the request. To limit the memory needed to hold these query shard exceptions on the coordinating node, this change removes the query builder serialization from the shard exception. The query is known to the user, so there should be no need to repeat it in every shard exception. We could also omit the entire stack trace for known bad request exceptions, but that would deserve a separate issue/PR. Closes #51843 Closes #48910
The coordinating node doesn't have the mapping, so it's not possible at the moment. The change I committed is just a small fix; there's a lot of follow-up that we could try. One thing that I'd like to test is whether we can de-duplicate the shard failures at the coordinator level. We shouldn't have to save every shard failure if they are all the same (parsing failure, too many clauses, ...). Another possibility would be to try to early terminate the query if a shard failure is reported and |
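To make the de-duplication idea above concrete, here is a minimal sketch of a coordinator-side collector that stores one entry per distinct failure plus a count, instead of one exception per shard. The class `ShardFailureDeduplicator` and its methods are hypothetical names chosen for illustration; they are not Elasticsearch classes, and keying on exception type plus message is only an assumption about what "the same" failure could mean.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical collector, not part of Elasticsearch.
final class ShardFailureDeduplicator {
    // Key: exception class name + message (an assumed notion of "same failure").
    // Value: how many shards reported that failure.
    private final Map<String, Integer> counts = new LinkedHashMap<>();

    // Synchronized because shard responses may arrive on different threads.
    synchronized void record(Exception failure) {
        String key = failure.getClass().getName() + ": " + failure.getMessage();
        counts.merge(key, 1, Integer::sum);
    }

    synchronized int distinctFailures() {
        return counts.size();
    }

    // Read-only snapshot of the failure counts, in first-seen order.
    synchronized Map<String, Integer> summary() {
        return Collections.unmodifiableMap(new LinkedHashMap<>(counts));
    }
}
```

With this shape, 50 shards failing with the same "too many clauses" error would cost one map entry rather than 50 retained exceptions. A real implementation would probably also need to keep at least one representative shard ID and cause per entry.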
Ah, true. Would this (and other?) use cases perhaps benefit from an integrated "schema" (mapping) registry? This could be replicated on all nodes for fast access, since the overall size would be small. I also do agree on deduping the shard failures as they stream in. Even without peeking too deep into the error strings and relying on string similarity etc., you could perhaps check Early termination when Thanks again! |
@jimczi just to reconfirm: current behavior is for coordinating node to hold on to all shard failures regardless of One remaining worry is that this is only a search param. We have users who go through a UI facade that we control tightly but also data scientists who have more freedom in writing queries directly to ES. In the latter case, we'd still be vulnerable if they omit this param in their searches. Perhaps this could also be made available as a global setting? Finally, would you still want/have to distinguish between shard failures that can be retried against another replica and those that should cause the query to be terminated right away? |
Today, when query parsing fails on a shard, we create a QueryShardException that serializes the entire QueryBuilder in the message. All these exceptions are kept on a per-shard basis by the coordinating node, so we expect them to be lightweight. For very large boolean queries (with lots of clauses that trip the max_clause_count, for instance), this can cause memory pressure on the coordinating node since we don't de-duplicate exceptions coming from different shards.

The simplest fix would be to limit the size of the QueryShardException's message by truncating the QueryBuilder serialization if it is above a certain threshold (1k ?).
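A minimal sketch of the truncation proposed above, assuming a plain character-count threshold. `MessageTruncator` and `MAX_MESSAGE_CHARS` are hypothetical names for illustration, not Elasticsearch APIs, and the 1024 value only mirrors the tentative "1k ?" figure from the issue.

```java
// Hypothetical helper, not part of Elasticsearch.
final class MessageTruncator {
    // Assumed threshold; the issue only floats "1k ?" as a rough figure.
    static final int MAX_MESSAGE_CHARS = 1024;

    private MessageTruncator() {}

    // Returns the message unchanged if it fits, otherwise its first maxChars
    // characters followed by a note saying how many characters were dropped.
    static String truncate(String message, int maxChars) {
        if (maxChars < 0) {
            throw new IllegalArgumentException("maxChars must be >= 0");
        }
        if (message == null || message.length() <= maxChars) {
            return message;
        }
        int end = maxChars;
        // Avoid cutting a surrogate pair in half.
        if (end > 0 && Character.isHighSurrogate(message.charAt(end - 1))) {
            end--;
        }
        int omitted = message.length() - end;
        return message.substring(0, end) + "... [" + omitted + " chars omitted]";
    }
}
```

A caller would apply something like `MessageTruncator.truncate(serializedQuery, MessageTruncator.MAX_MESSAGE_CHARS)` before building the exception message, so each per-shard exception holds at most about 1k characters of the query.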