Deprecation log indexing increases JVM pressure on coordinator nodes consuming all heap #85582
Comments
Pinging @elastic/es-core-infra (Team:Core/Infra)
We managed to disable deprecation logs completely by setting this:

    "logger" : {
      "org" : {
        "elasticsearch" : {
          "deprecation" : "ERROR"
        }
      }
    }

This solved the memory leak problem. However, it is suboptimal, as we do want to re-enable deprecation logging before the 8.x upgrade.
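For anyone else needing this mitigation, here is a minimal sketch of applying the same logger level dynamically through the cluster settings API. The endpoint and setting name are standard, but treat this as an illustration rather than the exact call used in the thread:

    curl -X PUT "http://localhost:9200/_cluster/settings" \
      -H 'Content-Type: application/json' -d'
    {
      "persistent": {
        "logger.org.elasticsearch.deprecation": "ERROR"
      }
    }'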
@gregolsen can you provide us with a sample of the deprecation logs? Can you see if they are duplicated/unique?
Hey @pgomulka, I queried all the deprecation logs emitted at the time I was reproducing the issue and they are duplicates of just a few different log lines (queried via CloudWatch Logs Insights).
@gregolsen the config looks correct.
@pgomulka I'd prefer not to share the whole file - it is over 60MB even in compressed form and I want to make sure I'm not sharing anything private that might be in there.
So basically the same data as in the plain-text deprecation log file (different counts because I grepped the logs on a different box). Hope this is helpful.
I suspect that you have a unique x-opaque-id on every request, which is causing deprecation log deduplication to fail.
I think you are correct - they are unique.
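For others hitting this, one way to confirm the suspicion is a cardinality aggregation over the indexed deprecation events. This is a sketch only: the data-stream pattern and the field name (elasticsearch.http.request.x_opaque_id) are assumptions and may differ between versions, so check GET _data_stream and the index mapping on your cluster first.

    # Sketch: count distinct x-opaque-id values in the indexed deprecation events.
    # The data-stream pattern and field name are assumptions; verify them on your cluster.
    curl -X GET "http://localhost:9200/*deprecation*elasticsearch*/_search" \
      -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "unique_opaque_ids": {
          "cardinality": { "field": "elasticsearch.http.request.x_opaque_id" }
        }
      }
    }'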
This looks very much like #82810
@pgomulka Just to make sure I understood you correctly - is the assumption that all of those deprecation log items were created as a result of requests coming from Kibana?
@gregolsen yes, I think they might originate from the different dashboards you have, or just plain Discover searches. Also, given the amount of logs being produced, I think you should disable the deprecation logs until you can upgrade to the latest 7.17.x.
It should be surfaced as
The volume of the deprecation logs followed the daily traffic seasonality. The Kibana audit log (excluding ELB health checks) is tiny in comparison with the amount of deprecation logs produced by the cluster (hundreds of millions of log lines in just a few days).
I also checked - we don't have a single dashboard created in Kibana for that cluster. Kibana is used rarely and only for manual queries. Based on the above, I'm still not convinced that Kibana requests generated these deprecation logs. Everything points towards the regular production traffic being responsible for generating these messages. While we upgraded to 7.17, we haven't done any work on changing the code to be compatible with 8.x - hence all the deprecation warnings about "Specifying types in document/search".
It all comes down to incorrect use of x-opaque-id, which is used for deduplication. If you use a unique id for each request, then deduplication won't work.
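To illustrate the point (my addition, not a snippet from the thread): since the deduplication key includes the X-Opaque-Id request header, a client should send a stable per-application identifier rather than a fresh value per request. A minimal sketch, assuming a local node on port 9200 and a placeholder index name:

    # Stable per-application id: repeated occurrences of the same warning are deduplicated.
    curl -H "X-Opaque-Id: billing-service" "http://localhost:9200/my-index/_search?q=status:active"

    # Unique id per request (e.g. a request UUID): every occurrence is treated as new and logged again.
    curl -H "X-Opaque-Id: 0b1c2d3e-unique-per-request" "http://localhost:9200/my-index/_search?q=status:active"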
Given this is fixed in 7.17.3 (as per #85582 (comment)), I will close the issue.
Sorry for the slow response - I was finally able to find where we indeed were setting a unique x-opaque-id on every request.
Elasticsearch Version
7.17.0
Installed Plugins
discovery-ec2,repository-s3
Java Version
17.0.1
OS Version
5.10.102-99.473.amzn2.x86_64
Problem Description
After upgrading from 7.9.3 to 7.17.0, cluster coordinator nodes are experiencing elevated JVM pressure.
This issue manifests across all the clusters we run. However, in one particular case, coordinator nodes were unable to garbage-collect their way out of this. Over a few hours those nodes consumed all the heap and started to trip the parent circuit breaker (the charts below show JVM memory pressure as a percentage of the "old" pool used).
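Not part of the original report, but for anyone wanting to watch the same symptom: old-generation heap usage and parent-breaker state are exposed via the node stats API. A minimal sketch (run it against a coordinator node; the filter_path is just to trim the output):

    # Watch old-gen pool usage and parent circuit-breaker stats on the nodes.
    curl -s "http://localhost:9200/_nodes/stats/jvm,breaker?filter_path=nodes.*.name,nodes.*.jvm.mem.pools.old,nodes.*.breakers.parent"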
We took a heap dump on a running node and it seems like

    class org.elasticsearch.action.bulk.BulkProcessor @ 0x4322cce48

is consuming most of the heap. Upon inspecting the objects, it seems the heap is full of deprecation log entries in the form of events written into the deprecation.elasticsearch data stream.
As a mitigation, we attempted to disable deprecation log indexing (which it seems was enabled by default in #76292). However, this didn't work and hasn't resulted in any reduction in JVM memory pressure.
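For reference (my addition, not from the report): the dynamic setting that controls deprecation log indexing is, to the best of my knowledge, cluster.deprecation_indexing.enabled. A sketch of turning it off via the cluster settings API follows; note that in this incident, disabling indexing did not relieve the heap pressure that had already built up.

    curl -X PUT "http://localhost:9200/_cluster/settings" \
      -H 'Content-Type: application/json' -d'
    {
      "persistent": {
        "cluster.deprecation_indexing.enabled": false
      }
    }'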
Steps to Reproduce
Unfortunately I don't have any generic steps to reproduce this apart from adding a coordinator node using 7.17.0 to one of our clusters.
That said, I was able to reproduce this issue easily, so it does seem like a genuine problem and not a fluke.
Logs (if relevant)
No response