Request-level circuit breaker support on coordinating nodes #37182
Pinging @elastic/es-analytics-geo

Pinging @elastic/es-core-infra
I think so yes, the issue you described here should not happen on master where the maximum number of buckets is set to
While #27581 should address issues with aggregations like DateHistogram with many small buckets, we don't currently account for internal memory consumed by more expensive aggregations like
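For context, the bucket limit mentioned here works roughly like the following standalone sketch. As I understand it the real implementation lives in Elasticsearch's `MultiBucketConsumerService`; the class, exception type, and call pattern below are simplified stand-ins, not the actual code:

```java
// Simplified sketch of a bucket-count guard like the one #27581 added.
// The real version is wired into aggregation reduction; this is standalone.
final class BucketLimiter {
    private final int maxBuckets; // e.g. the value of the search.max_buckets setting
    private int count;

    BucketLimiter(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    // Called each time buckets are created during reduction.
    void accept(int newBuckets) {
        count += newBuckets;
        if (count > maxBuckets) {
            throw new IllegalStateException(
                "Trying to create too many buckets: " + count + " > " + maxBuckets);
        }
    }
}
```

Note that a guard like this counts buckets, not bytes, which is why it cannot help with a small number of very large buckets.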
@jimczi The maximum number of buckets only limits the bucket count in the reduction phase. What if the coordinating node receives a huge number of first-phase aggregation bucket results from the data nodes before the reduction phase even starts, blowing up the coordinating node's memory? How do we limit the memory held by that temporary received data?
Follow "results.consumeResult(result);" may consume lots of memory and how to avoid OOM in huge number of nodes case aggregation request?
|
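One way to bound that buffering is to account each incoming shard result's size against a byte budget and fail fast once it is exceeded. A standalone sketch, not Elasticsearch code; the budget value and the idea that callers can supply a serialized size are assumptions:

```java
// Standalone sketch: byte-budgeted buffering of shard results on the
// coordinating node. T stands in for a shard-level aggregation result.
import java.util.ArrayList;
import java.util.List;

final class BudgetedResultBuffer<T> {
    private final long maxBytes; // memory budget for all buffered results
    private final List<T> results = new ArrayList<>();
    private long usedBytes;

    BudgetedResultBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // sizeInBytes would come from the result's serialized length.
    void consumeResult(T result, long sizeInBytes) {
        if (usedBytes + sizeInBytes > maxBytes) {
            throw new IllegalStateException(
                "buffered shard results would use " + (usedBytes + sizeInBytes)
                + " bytes, exceeding the " + maxBytes + " byte budget");
        }
        usedBytes += sizeInBytes;
        results.add(result);
    }
}
```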
Will the new real memory breaker in 7.0 resolve this, since it should break based on a real heap usage threshold even on coordinating nodes? Or do we still plan on implementing a coordinating-node-specific request breaker in addition to the real memory breaker?
@ppf2 I don't think the real memory circuit breaker is able to reliably prevent this. Circuit breakers are not actively observing system state but rather need to be invoked explicitly at certain points during request handling. Only then will they check current resource usage and potentially reject a request. If a request is past that check and then allocates a lot of memory, even the real memory circuit breaker cannot prevent this situation (although it might detect too-high memory usage and reject other requests that are sent concurrently). Therefore, a circuit breaker that checks the (expected) memory usage of aggregations on the coordinating node can make sense.
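To illustrate the point about explicit invocation, here is a minimal sketch against Elasticsearch's breaker API. `CircuitBreaker`, `CircuitBreakerService`, and their methods are real; the wrapping class and the call sites are hypothetical:

```java
// Sketch: breakers only act where code explicitly consults them.
import org.elasticsearch.common.breaker.CircuitBreaker;
import org.elasticsearch.indices.breaker.CircuitBreakerService;

final class ShardResultAccounting {
    private final CircuitBreaker breaker;

    ShardResultAccounting(CircuitBreakerService breakerService) {
        // The shared "request" breaker tracks transient per-request memory.
        this.breaker = breakerService.getBreaker(CircuitBreaker.REQUEST);
    }

    void onShardResult(long estimatedBytes) {
        // The breaker is consulted only at this explicit call site; it throws
        // CircuitBreakingException if the limit would be exceeded. Memory
        // allocated after this call succeeds is invisible to the breaker
        // until the next explicit check elsewhere in the code.
        breaker.addEstimateBytesAndMaybeBreak(estimatedBytes, "<buffer_shard_result>");
    }

    void onRequestDone(long accountedBytes) {
        // Release what this request reserved so other requests can proceed.
        breaker.addWithoutBreaking(-accountedBytes);
    }
}
```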
This is sort of linked in with a few things that @jimczi and I are working on slowly, in between a bunch of other things. In particular, we now have an accurate accounting of the memory usage of buffered aggregation results, which we expect to be the bulk of the "big" stuff in a request on the coordinating node. We're working on refactoring the partial reductions (in #58461). From there we hope to trigger these partial reductions based on memory usage. That isn't quite the same thing as this issue, but it is fairly close.
In my mind the follow-up of #58461 was to account for the memory used by buffered aggs in the circuit breaker, so that's closer to what this issue is about.
This commit allows the coordinating node to account for the memory used to perform partial and final reduces of aggregations in the request circuit breaker. The search coordinator adds the memory that it uses to save and reduce the results of shard aggregations to the request circuit breaker. Before any partial or final reduce, the memory needed to reduce the aggregations is estimated, and a CircuitBreakingException is thrown if it exceeds the maximum memory allowed by this breaker. This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced. This estimation can be completely off for some aggregations, but it is corrected with the real size after the reduce completes. If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations and replace the estimation with the serialized size of the newly reduced result. As a follow-up we could trigger partial reduces based on the memory accounted in the circuit breaker instead of relying on a static number of shard responses. A simpler follow-up that could be done in the meantime is to [reduce the default batch reduce size](#51857) of blocking search requests to a saner number. Closes #37182
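A rough sketch of the estimate-then-correct accounting this commit message describes, not the actual implementation: `reduceAggs` and `ramBytesUsed` are hypothetical stand-ins for the real reduce and sizing code, and the sketch assumes the serialized source aggregations were previously accounted in the same breaker:

```java
// Sketch: reserve ~1.5x the serialized source size before reducing, then
// replace the reservation with the real size of the reduced result.
import org.elasticsearch.common.breaker.CircuitBreaker;

final class ReduceAccounting {
    private final CircuitBreaker breaker; // the request circuit breaker

    ReduceAccounting(CircuitBreaker breaker) {
        this.breaker = breaker;
    }

    Object reduceWithBreaker(long serializedSourceBytes) {
        // Throws CircuitBreakingException if the reservation would push
        // the request breaker past its limit.
        long estimate = (long) (serializedSourceBytes * 1.5);
        breaker.addEstimateBytesAndMaybeBreak(estimate, "<reduce_aggs>");
        try {
            Object reduced = reduceAggs();       // hypothetical reduce step
            long actual = ramBytesUsed(reduced); // hypothetical real size
            // Correct the books: drop the source aggs and the estimate,
            // keep only the size of the newly reduced result.
            breaker.addWithoutBreaking(actual - estimate - serializedSourceBytes);
            return reduced;
        } catch (RuntimeException e) {
            breaker.addWithoutBreaking(-estimate); // release on failure
            throw e;
        }
    }

    private Object reduceAggs() { throw new UnsupportedOperationException(); }

    private long ramBytesUsed(Object aggs) { throw new UnsupportedOperationException(); }
}
```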
Currently we do not have circuit breaker support for search requests executed on the coordinating node. We have multi-phase reduction, which should help avoid OOMs, but it is still possible for abusive queries to take a node down.
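For reference, the batch size of that multi-phase reduction is adjustable per request. A sketch using the Java API as I understand it; the index pattern and the chosen value are illustrative:

```java
// Lowering batched_reduce_size makes the coordinating node reduce shard
// results in smaller batches, bounding how many it buffers at once.
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.search.builder.SearchSourceBuilder;

final class BatchedReduceExample {
    static SearchRequest build() {
        SearchRequest request = new SearchRequest("logs-*") // illustrative index pattern
            .source(new SearchSourceBuilder().size(0));
        // Default is 512 shard responses per partial reduce; a smaller value
        // trades extra reduce work for more memory headroom.
        request.setBatchedReduceSize(16);
        return request;
    }
}
```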
A recent example OOM was caused by date histograms with 5-minute intervals executed across many time-based indices. Each of the data nodes failed to trip a circuit breaker because they were only seeing a small part of the final result. The multi-phase reduction did nothing to reduce the final number of buckets required, and the final OOM occurred while rendering results in `toXContent`. This scenario was exacerbated by the fact that there was a top-level terms agg for `hostname` under which there were the date histograms.
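A hedged reconstruction of the aggregation shape described above; the field names and exact builder methods are assumptions and vary by version:

```java
// terms(hostname) x dateHistogram(5m) multiplies bucket counts. Each data
// node sees only its shards' slice, so per-node breakers never trip; the
// full cross-product materializes only on the coordinating node during the
// final reduce and toXContent rendering.
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.builder.SearchSourceBuilder;

final class AbusiveAggExample {
    static SearchSourceBuilder build() {
        return new SearchSourceBuilder()
            .size(0)
            .aggregation(
                AggregationBuilders.terms("per_host").field("hostname")
                    .subAggregation(
                        AggregationBuilders.dateHistogram("per_5m")
                            .field("@timestamp")
                            .fixedInterval(DateHistogramInterval.minutes(5))));
    }
}
```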