Bucket aggregation circuit breaker optimization. #46751
Conversation
Branch updated: bad4d7c to d173ffe
Branch updated: 4d542ee to 468cf11
Ping @danielmitterdorfer @dakrone, could you please help check this PR? Thanks.
Pinging @elastic/es-search
Pinging @elastic/es-analytics-geo
Hi @howardhuanghua, thanks for the PR :) Is this specifically to address #37182, e.g. dealing with memory usage on the coordinating node (before the final reduce is invoked)? I'm asking because we do track memory usage with the

I'm not sure a new breaker is the right approach though. It would probably be better to re-use the existing

As an aside, I don't think deprecating
Hi @polyfractal, thanks for the comment. This PR checks real memory usage with a certain step size of buckets allocation in
For the aggregation buckets memory tracking issue, we now have the following solutions:
Thanks for the explanation, that helps me understand the purpose better. A few thoughts, mainly writing this so I have an overview of all the pieces: today, the parent breaker is checked any time we increment a breaker via

So this means the real memory breaker is being checked on most (but not all) shard-level aggregation operations today. It's not being checked on:
So I agree that we should add some checks to the parent breaker at those steps. I'm not sure we need

@danielmitterdorfer do you know if it is ok to call
IMHO the
Thanks @danielmitterdorfer, @polyfractal. To avoid calling

If we check the real-memory breaker every time we account for a new bucket, each check takes about a few hundred nanoseconds; if an aggregation result contains hundreds of thousands of buckets, that would add up to tens of milliseconds spent on parent limit checking. With

Meanwhile, I would like to confirm the unchecked levels you mentioned above:
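The amortization argument above can be sketched as follows. This is an illustrative model, not the actual Elasticsearch code; `STEP_SIZE` and `breakerChecks` are names I made up:

```java
// Sketch of amortized breaker checking: only every STEP_SIZE-th bucket triggers
// the (relatively costly, few-hundred-nanosecond) real-memory check, so the
// per-bucket overhead stays negligible even for huge aggregations.
public class StepSizeSketch {
    static final int STEP_SIZE = 1024; // power of two, chosen for illustration

    // Returns how many breaker checks a consumer would perform while
    // accounting for `totalBuckets` buckets one at a time.
    static int breakerChecks(int totalBuckets) {
        int count = 0;
        int checks = 0;
        for (int i = 0; i < totalBuckets; i++) {
            count++;
            if (count % STEP_SIZE == 0) {
                checks++; // stand-in for a zero-byte breaker call that triggers the parent check
            }
        }
        return checks;
    }

    public static void main(String[] args) {
        // 500,000 buckets cost only 488 checks instead of 500,000.
        System.out.println(breakerChecks(500_000));
    }
}
```

With a step of 1024, half a million buckets incur well under a thousand real-memory checks, which is why the per-bucket cost concern above goes away.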
Hi @polyfractal, would you please help check the updated commit again? Thanks a lot.
Hi @howardhuanghua, apologies for the delay. I became unexpectedly very busy this last week. I'll try to take a look at your new changes tomorrow!

Hi Adrien @jpountz, would you please help review this PR? Thanks a lot.
I like the idea a lot. I left some comments that might help simplify a bit.
@@ -109,6 +129,11 @@ public void accept(int value) {
                + "] but was [" + count + "]. This limit can be set by changing the [" +
                MAX_BUCKET_SETTING.getKey() + "] cluster level setting.", limit);
        }

        if (value > 0 && checkBucketsStepSizeLimit > 0 && count % checkBucketsStepSizeLimit == 0) {
            CircuitBreaker breaker = circuitBreakerService.getBreaker(CircuitBreaker.REQUEST);
Instead of only getting the request circuit breaker here, can you take a CircuitBreaker instead of a CircuitBreakerService in the constructor?
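A minimal sketch of what this suggested refactor might look like. The `CircuitBreaker` interface below is a stand-in of my own, not the real Elasticsearch classes:

```java
// Hypothetical sketch: inject the breaker itself in the constructor instead of
// a CircuitBreakerService, so accept() does not have to look it up every time.
public class BreakerInjection {
    interface CircuitBreaker { // stand-in for org.elasticsearch.common.breaker.CircuitBreaker
        void addEstimateBytesAndMaybeBreak(long bytes, String label);
    }

    static class MultiBucketConsumer {
        private final CircuitBreaker breaker; // injected once, used on every step
        private int count;

        MultiBucketConsumer(CircuitBreaker breaker) {
            this.breaker = breaker;
        }

        void accept(int value) {
            count += value;
            if (value > 0 && count % 1024 == 0) {
                // a zero-byte accounting call still triggers the parent-breaker check
                breaker.addEstimateBytesAndMaybeBreak(0, "allocated_buckets");
            }
        }

        int getCount() { return count; }
    }

    public static void main(String[] args) {
        MultiBucketConsumer c = new MultiBucketConsumer((bytes, label) -> {});
        for (int i = 0; i < 2048; i++) c.accept(1);
        System.out.println(c.getCount());
    }
}
```

The design point is plain constructor injection: the consumer depends only on the narrow interface it actually uses, which also makes it easier to test with a fake breaker.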
    private volatile int maxBucket;
    public static final Setting<Integer> CHECK_BUCKETS_STEP_SIZE_SETTING =
        Setting.intSetting("search.check_buckets_step_size", DEFAULT_CHECK_BUCKETS_STEP_SIZE,
            -1, Setting.Property.NodeScope, Setting.Property.Dynamic);
I'd be in favor of not making it configurable at all and checking every 1000 buckets all the time? (Or maybe 1024 so that the `% 1000` can be replaced with a lighter `& 0x3FF` mask.)
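To see why the mask works: for a power-of-two step, the low bits kept by the mask are exactly the remainder of the division, so the cheap bitwise AND can replace the integer division. A quick check (class and method names are mine):

```java
// Verifies that count % 1024 == (count & 0x3FF) for every non-negative count
// up to a bound. 0x3FF is 1023 == 1024 - 1, i.e. the low ten bits.
public class MaskVsModulo {
    static boolean equivalent(int upTo) {
        for (int count = 0; count <= upTo; count++) {
            if (count % 1024 != (count & 0x3FF)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equivalent(1_000_000));
    }
}
```

Note the equivalence only holds for non-negative values (Java's `%` can return negatives) and only when the divisor is a power of two, which is why 1024 is preferred over 1000 here.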
        final boolean forbidPrivateIndexSettings) {
        final Environment environment,
        final Collection<Class<? extends Plugin>> classpathPlugins,
        final boolean forbidPrivateIndexSettings) {
can you undo the above unrelated indentation changes?
        new MultiBucketConsumerService.MultiBucketConsumer(10000, 10000, service);

        long currentMemory = ((HierarchyCircuitBreakerService) service).currentMemoryUsage();
        if (currentMemory > parentLimitBytes) {
Can you maybe make the test a bit more predictable by calling `addWithoutBreaking` with a number of bytes that is greater than the limit?
        BigArrays bigArrays, FetchPhase fetchPhase) {
    super(clusterService, indicesService, threadPool, scriptService, bigArrays, fetchPhase, null);
        IndicesService indicesService, ThreadPool threadPool, ScriptService scriptService,
        BigArrays bigArrays, FetchPhase fetchPhase, CircuitBreakerService circuitBreakerService) {
same here
        .numberOfReplicas(0)
        .creationDate(System.currentTimeMillis())
        .build(),
    Settings.EMPTY
same here
Branch updated: 4d6ca85 to 998ff15
Hi @jpountz, thanks for your review. I have updated the code; would you please help check again?
Hi @jpountz, would you please help review the changes again? Thank you.
@elasticmachine update branch
@elasticmachine ok to test
I opened #51694 for the failure of elasticsearch-ci/1. @elasticmachine run elasticsearch-ci/1
@howardhuanghua Thanks again for your contribution! I opened #51731 to discuss the deprecation of
Co-authored-by: Howard <[email protected]>
@jpountz Thanks a lot for your help!
Thanks @howardhuanghua! ❤️
@howardhuanghua I always thought that with this new parameter, requests for too many buckets would be rejected. But I tested it today, making two requests that each created hundreds of thousands of buckets, and in the end the machine still went straight to OOM.

If JVM memory usage is already at 60% and a large request comes in that consumes another 50% of the heap, the circuit breaker will not trigger, since its default limit is 60%, and the consequence is an OOM.
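The scenario above can be put into numbers (the percentages are the commenter's and purely illustrative; the method names are mine):

```java
// A breaker only sees heap usage at check time: a request admitted at 60%
// usage can still push the heap past 100% before the next check fires.
public class OomScenario {
    // Stand-in for the breaker decision: trip only when usage exceeds the limit.
    static boolean admitted(int usedPercent, int limitPercent) {
        return usedPercent <= limitPercent;
    }

    public static void main(String[] args) {
        int used = 60;    // heap usage when the request arrives
        int limit = 60;   // hypothetical breaker limit
        int request = 50; // memory the aggregation will go on to allocate

        // The check passes, since 60% does not exceed the 60% limit...
        System.out.println(admitted(used, limit));
        // ...but the eventual footprint is 110% of the heap: OOM.
        System.out.println(used + request);
    }
}
```

This is why step-wise checks narrow the window but cannot fully close it: a single allocation larger than the remaining headroom can still OOM between two checks.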
Bucket aggregations can consume a lot of memory on the coordinating node when they produce a huge number of buckets. The search.max_buckets setting limits the maximum number of buckets allowed in a single response. Users sometimes raise this setting to get more buckets, but doing so also increases the risk of OOM, and it is hard to pick a suitable value for max_buckets.

With this PR, we introduce the search.check_buckets_step_size setting. Every time search.check_buckets_step_size new buckets are allocated, we run a parent circuit breaker check. With this setting, users can bound aggregation memory using a fixed step size. We are also considering whether search.max_buckets could be deprecated in the future.