Request-level circuit breaker support on coordinating nodes #37182
Pinging @elastic/es-analytics-geo

Pinging @elastic/es-core-infra
I think so yes, the issue you described here should not happen on master where the maximum number of buckets is set to
While #27581 should address issues with aggregations like DateHistogram with many small buckets, we don't currently account for internal memory consumed by more expensive aggregations like
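For context, the bucket limit mentioned here works roughly like the following standalone sketch. As I understand it the real implementation lives in Elasticsearch's `MultiBucketConsumerService`; the class, exception type, and call pattern below are simplified stand-ins, not the actual code:

```java
// Simplified sketch of a bucket-count guard like the one #27581 added.
// The real version is wired into aggregation reduction; this is standalone.
final class BucketLimiter {
    private final int maxBuckets; // e.g. the value of the search.max_buckets setting
    private int count;

    BucketLimiter(int maxBuckets) {
        this.maxBuckets = maxBuckets;
    }

    // Called each time buckets are created during reduction.
    void accept(int newBuckets) {
        count += newBuckets;
        if (count > maxBuckets) {
            throw new IllegalStateException(
                "Trying to create too many buckets: " + count + " > " + maxBuckets);
        }
    }
}
```

Note that a guard like this counts buckets, not bytes, which is why it cannot help with a small number of very large buckets.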
@jimczi The maximum number of buckets only limits the bucket count in the reduction phase. What if the coordinating node receives a huge number of first-phase aggregation bucket results from the data nodes before the reduction phase even starts, blowing up the coordinating node's memory? How do we limit the memory held by that temporary received data?
Follow "results.consumeResult(result);" may consume lots of memory and how to avoid OOM in huge number of nodes case aggregation request?
|
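One way to bound that buffering is to account each incoming shard result's size against a byte budget and fail fast once it is exceeded. A standalone sketch, not Elasticsearch code; the budget value and the idea that callers can supply a serialized size are assumptions:

```java
// Standalone sketch: byte-budgeted buffering of shard results on the
// coordinating node. T stands in for a shard-level aggregation result.
import java.util.ArrayList;
import java.util.List;

final class BudgetedResultBuffer<T> {
    private final long maxBytes; // memory budget for all buffered results
    private final List<T> results = new ArrayList<>();
    private long usedBytes;

    BudgetedResultBuffer(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    // sizeInBytes would come from the result's serialized length.
    void consumeResult(T result, long sizeInBytes) {
        if (usedBytes + sizeInBytes > maxBytes) {
            throw new IllegalStateException(
                "buffered shard results would use " + (usedBytes + sizeInBytes)
                + " bytes, exceeding the " + maxBytes + " byte budget");
        }
        usedBytes += sizeInBytes;
        results.add(result);
    }
}
```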
Will the new real memory breaker in 7.0 resolve this, since it should break based on a real heap usage threshold even on coordinating nodes? Or do we still plan on implementing a coordinating-node-specific request breaker in addition to the real memory breaker?
@ppf2 I don't think the real memory circuit breaker is able to reliably prevent this. Circuit breakers are not actively observing system state but rather need to be invoked explicitly at certain points during request handling. Only then will they check current resource usage and potentially reject a request. If a request is past that check and then allocates a lot of memory, even the real memory circuit breaker cannot prevent this situation (although it might detect too-high memory usage and reject other requests that are sent concurrently). Therefore, a circuit breaker that checks the (expected) memory usage of aggregations on the coordinating node can make sense.
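To illustrate the point about explicit invocation, here is a minimal sketch against Elasticsearch's breaker API. `CircuitBreaker`, `CircuitBreakerService`, and their methods are real; the wrapping class and the call sites are hypothetical:

```java
// Sketch: breakers only act where code explicitly consults them.
import org.elasticsearch.common.breaker.CircuitBreaker;
import org.elasticsearch.indices.breaker.CircuitBreakerService;

final class ShardResultAccounting {
    private final CircuitBreaker breaker;

    ShardResultAccounting(CircuitBreakerService breakerService) {
        // The shared "request" breaker tracks transient per-request memory.
        this.breaker = breakerService.getBreaker(CircuitBreaker.REQUEST);
    }

    void onShardResult(long estimatedBytes) {
        // The breaker is consulted only at this explicit call site; it throws
        // CircuitBreakingException if the limit would be exceeded. Memory
        // allocated after this call succeeds is invisible to the breaker
        // until the next explicit check elsewhere in the code.
        breaker.addEstimateBytesAndMaybeBreak(estimatedBytes, "<buffer_shard_result>");
    }

    void onRequestDone(long accountedBytes) {
        // Release what this request reserved so other requests can proceed.
        breaker.addWithoutBreaking(-accountedBytes);
    }
}
```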
This is sort of linked in with a few things that @jimczi and I are working on slowly, in between a bunch of other things. In particular, we now have an accurate accounting of the memory usage of buffered aggregation results, which we expect to be the bulk of the "big" stuff in a request on the coordinating node. We're working on refactoring the partial reductions (in #58461). From there we hope to trigger these partial reductions based on memory usage. That isn't quite the same thing as this issue, but it is fairly close.
In my mind the follow-up of #58461 was to account for the memory used by buffered aggs in the circuit breaker, so that's closer to what this issue is about.
This commit allows the coordinating node to account for the memory used to perform partial and final reduces of aggregations in the request circuit breaker. The search coordinator adds the memory that it uses to save and reduce the results of shard aggregations to the request circuit breaker. Before any partial or final reduce, the memory needed to reduce the aggregations is estimated, and a CircuitBreakingException is thrown if it exceeds the maximum memory allowed by this breaker. This size is estimated as roughly 1.5 times the size of the serialized aggregations that need to be reduced. This estimation can be completely off for some aggregations, but it is corrected with the real size after the reduce completes. If the reduce is successful, we update the circuit breaker to remove the size of the source aggregations and replace the estimation with the serialized size of the newly reduced result. As a follow-up we could trigger partial reduces based on the memory accounted in the circuit breaker instead of relying on a static number of shard responses. A simpler follow-up that could be done in the meantime is to [reduce the default batch reduce size](#51857) of blocking search requests to a saner number. Closes #37182
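A rough sketch of the estimate-then-correct accounting this commit message describes, not the actual implementation: `reduceAggs` and `ramBytesUsed` are hypothetical stand-ins for the real reduce and sizing code, and the sketch assumes the serialized source aggregations were previously accounted in the same breaker:

```java
// Sketch: reserve ~1.5x the serialized source size before reducing, then
// replace the reservation with the real size of the reduced result.
import org.elasticsearch.common.breaker.CircuitBreaker;

final class ReduceAccounting {
    private final CircuitBreaker breaker; // the request circuit breaker

    ReduceAccounting(CircuitBreaker breaker) {
        this.breaker = breaker;
    }

    Object reduceWithBreaker(long serializedSourceBytes) {
        // Throws CircuitBreakingException if the reservation would push
        // the request breaker past its limit.
        long estimate = (long) (serializedSourceBytes * 1.5);
        breaker.addEstimateBytesAndMaybeBreak(estimate, "<reduce_aggs>");
        try {
            Object reduced = reduceAggs();       // hypothetical reduce step
            long actual = ramBytesUsed(reduced); // hypothetical real size
            // Correct the books: drop the source aggs and the estimate,
            // keep only the size of the newly reduced result.
            breaker.addWithoutBreaking(actual - estimate - serializedSourceBytes);
            return reduced;
        } catch (RuntimeException e) {
            breaker.addWithoutBreaking(-estimate); // release on failure
            throw e;
        }
    }

    private Object reduceAggs() { throw new UnsupportedOperationException(); }

    private long ramBytesUsed(Object aggs) { throw new UnsupportedOperationException(); }
}
```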
Currently we do not have circuit breaker support for search requests executed on the coordinating node. We have multi-phase reduction, which should help avoid OOMs, but it is still possible for abusive queries to take a node down.
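For reference, the batch size of that multi-phase reduction is adjustable per request. A sketch using the Java API as I understand it; the index pattern and the chosen value are illustrative:

```java
// Lowering batched_reduce_size makes the coordinating node reduce shard
// results in smaller batches, bounding how many it buffers at once.
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.search.builder.SearchSourceBuilder;

final class BatchedReduceExample {
    static SearchRequest build() {
        SearchRequest request = new SearchRequest("logs-*") // illustrative index pattern
            .source(new SearchSourceBuilder().size(0));
        // Default is 512 shard responses per partial reduce; a smaller value
        // trades extra reduce work for more memory headroom.
        request.setBatchedReduceSize(16);
        return request;
    }
}
```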
A recent example OOM was caused by date histograms with 5-minute intervals executed across many time-based indices. Each of the data nodes failed to trip a circuit breaker because they were only seeing a small part of the final result. The multi-phase reduction did nothing to reduce the final number of buckets required, and the final OOM occurred while rendering results in `toXContent`. This scenario was exacerbated by the fact that there was a top-level terms agg for `hostname` under which there were the date histograms.
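A hedged reconstruction of the aggregation shape described above; the field names and exact builder methods are assumptions and vary by version:

```java
// terms(hostname) x dateHistogram(5m) multiplies bucket counts. Each data
// node sees only its shards' slice, so per-node breakers never trip; the
// full cross-product materializes only on the coordinating node during the
// final reduce and toXContent rendering.
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.histogram.DateHistogramInterval;
import org.elasticsearch.search.builder.SearchSourceBuilder;

final class AbusiveAggExample {
    static SearchSourceBuilder build() {
        return new SearchSourceBuilder()
            .size(0)
            .aggregation(
                AggregationBuilders.terms("per_host").field("hostname")
                    .subAggregation(
                        AggregationBuilders.dateHistogram("per_5m")
                            .field("@timestamp")
                            .fixedInterval(DateHistogramInterval.minutes(5))));
    }
}
```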