[BUG] Jackson 2.17.0 LockFreePool causes memory issues #4729
Comments
I think I found the reason why it started to occur for version 2.8.0:
I analysed the issue together with @JannikBrand. We also pulled a thread dump while the circuit breaker was active and no data was being ingested. We could verify that all threads were waiting for data, either from the network or from a queue within Data Prepper. This underlines the issue with the
This bug might have been introduced through a transitive dependency of armeria v1.28.2, which depends on Jackson 2.17.0. This would explain why the issue with the LockFreePool for the
I downloaded the Linux distribution of Data Prepper 2.8.0 and found the vulnerable Jackson version 2.17.0 in the libs folder. This indicates a conflict with the explicit jackson-bom 2.16.1 in Line 72 in 2406edc
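The check described above (looking for Jackson jars in the libs folder whose version differs from the declared BOM version) could be automated roughly as follows. This is only a sketch: the class name, the method name, and the sample file names in main are hypothetical, and it assumes the jars follow the usual artifact-version.jar naming.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JacksonJarCheck {
    // Matches file names such as "jackson-core-2.17.0.jar" (assumed naming scheme).
    private static final Pattern JACKSON_JAR =
            Pattern.compile("^(jackson-[a-z0-9-]+?)-(\\d+\\.\\d+\\.\\d+)\\.jar$");

    /** Returns "artifact version" for every Jackson jar that is not on the expected version. */
    static List<String> findMismatches(List<String> jarNames, String expectedVersion) {
        List<String> mismatches = new ArrayList<>();
        for (String name : jarNames) {
            Matcher m = JACKSON_JAR.matcher(name);
            if (m.matches() && !m.group(2).equals(expectedVersion)) {
                mismatches.add(m.group(1) + " " + m.group(2));
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        List<String> jars;
        if (args.length > 0 && new File(args[0]).list() != null) {
            // Use the real contents of the given libs directory.
            jars = Arrays.asList(new File(args[0]).list());
        } else {
            // Hypothetical sample names, for illustration only.
            jars = Arrays.asList("jackson-core-2.17.0.jar",
                                 "jackson-databind-2.16.1.jar",
                                 "armeria-1.28.2.jar");
        }
        // 2.16.1 is the jackson-bom version mentioned in this thread.
        for (String mismatch : findMismatches(jars, "2.16.1")) {
            System.out.println("Version mismatch: " + mismatch);
        }
    }
}
```

Passing the path of the libs folder as the first argument would list every Jackson jar that deviates from the BOM version; without an argument the sample names are used.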
As a fix, the armeria version needs to be upgraded to at least 1.29.0. This has already been done by @dlvenable for the main branch. I suggest backporting #4629 to the 2.8 release. Furthermore, the mismatch between the
@KarstenSchnitter, @JannikBrand, thank you for reporting this issue and for the fantastic analysis! It does appear that Jackson 2.17.1 fixes this. I'm putting together some backport PRs to support a 2.8.1 release to fix this. Would you be able to test this using a locally-built Data Prepper on the
@dlvenable: I talked to @JannikBrand about testing your change. In principle, we are able to verify whether the upgrade is effective. But we both have a few days off, so we can only look into that next week.
I checked out the
So, I think the change took effect. I did not perform an actual performance test, since I would first have to reproduce this locally with 2.8.0 and afterwards again with the patched 2.8.1 version. I could still do it next week, or instead I could just confirm that the memory issues do not reoccur in our environment after upgrading to the patched 2.8.1 version.
@JannikBrand, we just released Data Prepper 2.8.1 if you'd like to try to verify that the issue is resolved.
@dlvenable I verified that the same aspect from my last comment is true for the released version:
From our side the issue can be closed. Thanks for processing and fixing it so quickly!
You're welcome @JannikBrand. And thank you for the great analysis that helped us resolve this so quickly.
Describe the bug
Data Prepper runs into heap OOM issues. This was observed when ingesting OTel metrics via Data Prepper into OpenSearch (~400 metric data points per second).
(The picture shows the summed up heap memory of 2 Data Prepper instances. The instances do not crash, since circuit breakers are configured and constantly open.)
The memory is consumed by objects sitting in the Old Gen space.
Possible trigger: The issue started to occur when updating from DP version 2.7.0 to 2.8.0.
I created a heap dump:
The org.opensearch.dataprepper.pipeline.Pipeline object is taking up almost all of the memory. Within the dominator tree, I can trace the memory consumption back to the Jackson LockFreePool.
There are some known issues with the LockFreePool, e.g. see "LockFreePool appears to cause unintended object retention (~= memory leak)", FasterXML/jackson-core#1260.
I am not sure exactly which Jackson version is used within the opensearch sink, but at least we can see that the LockFreePool is used.
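To illustrate the kind of retention described here, the following is a minimal sketch of a pool without a size cap. It is a hypothetical class, not Jackson's LockFreePool, and it does not reproduce the mechanism analysed in FasterXML/jackson-core#1260. It only shows the general pattern: once a burst of buffers has been released into an uncapped pool, they stay strongly reachable and can end up in the Old Gen space.

```java
import java.util.concurrent.ConcurrentLinkedDeque;

public class UnboundedPoolSketch {
    /** Hypothetical pool for illustration; it never caps or evicts pooled buffers. */
    static final class UnboundedPool {
        private final ConcurrentLinkedDeque<byte[]> free = new ConcurrentLinkedDeque<>();

        byte[] acquire() {
            byte[] buf = free.pollFirst();
            // Allocate a new buffer when the pool is empty.
            return buf != null ? buf : new byte[8 * 1024];
        }

        void release(byte[] buf) {
            // Every released buffer is kept, regardless of how many are already pooled.
            free.offerFirst(buf);
        }

        int pooledCount() {
            return free.size();
        }
    }

    /** Simulates one burst of simultaneous users followed by steady single use. */
    static int simulateBurst(int burstSize, int steadyStateCalls) {
        UnboundedPool pool = new UnboundedPool();

        // Burst: many buffers are in flight at the same time.
        byte[][] inFlight = new byte[burstSize][];
        for (int i = 0; i < burstSize; i++) {
            inFlight[i] = pool.acquire();
        }
        for (byte[] buf : inFlight) {
            pool.release(buf);
        }

        // Steady state: only one buffer is needed at a time from now on.
        for (int i = 0; i < steadyStateCalls; i++) {
            byte[] buf = pool.acquire();
            pool.release(buf);
        }
        // All burst buffers are still held by the pool.
        return pool.pooledCount();
    }

    public static void main(String[] args) {
        System.out.println("Buffers retained after burst: " + simulateBurst(100, 1000));
    }
}
```

In this sketch the pool never shrinks back after the burst, which is why a size cap or a different pooling strategy bounds the retained memory.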
To Reproduce
Steps to reproduce the behavior:
Expected behavior
For comparison this is how the heap utilization looks without this issue (same ingestion workload):
Environment (please complete the following information):