Excessively high memory usage when using client-side zstd compression in confighttp #8216
I managed to reproduce this in a test K8s cluster. I've also written some benchmarks which demonstrate the problem, although this is somewhat challenging for reasons I'll elaborate upon.

**Presentation**

The test cluster had a synthetic workload generating 10k log lines per second. The resource consumption reported for the two compression types was as follows.

With gzip compression
With zstd compression
Nothing about the code for confighttp indicates why there would be such a large difference, or why the zstd encoder would allocate so much memory.

**Root cause**

I believe the root cause is a combination of the zstd encoder allocating a fair amount of memory by default, and our pooling mechanism for encoders not working as expected. We put encoders in a `sync.Pool`, but that doesn't appear to keep the number of live encoders (and their buffers) bounded the way we expected.

I've sketched out and tested a solution using a different pooling mechanism here: main...swiatekm-sumo:opentelemetry-collector:fix/zstd-encoder-pooling. With this change, the memory usage is reasonable again.
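For illustration, here is a minimal Go sketch of the alternative pooling idea: a fixed-size, channel-based pool of zstd encoders instead of a `sync.Pool`. The `zstdPool` type, its size, and the helper names are assumptions made for this example, not the exact code from the linked branch.

```go
package compressor

import (
	"bytes"
	"io"

	"github.com/klauspost/compress/zstd"
)

// zstdPool holds a fixed number of reusable encoders. Unlike sync.Pool,
// entries are never dropped and re-created, so the number of live encoders
// (and the memory they hold) stays bounded.
type zstdPool struct {
	encoders chan *zstd.Encoder
}

func newZstdPool(size int) (*zstdPool, error) {
	p := &zstdPool{encoders: make(chan *zstd.Encoder, size)}
	for i := 0; i < size; i++ {
		// Created without a destination writer; Reset attaches one per use.
		enc, err := zstd.NewWriter(nil)
		if err != nil {
			return nil, err
		}
		p.encoders <- enc
	}
	return p, nil
}

// compress borrows an encoder, streams src into dst through it, and returns
// the encoder to the pool. Callers block if all encoders are in use, which
// is what bounds memory under concurrent load.
func (p *zstdPool) compress(dst *bytes.Buffer, src io.Reader) error {
	enc := <-p.encoders
	defer func() { p.encoders <- enc }()

	enc.Reset(dst)
	if _, err := io.Copy(enc, src); err != nil {
		return err
	}
	return enc.Close()
}
```

The channel acts as a hard cap: at most `size` encoders ever exist, whereas `sync.Pool` lets the number of live encoders grow with concurrent demand and only trims them at GC time.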
Would you like to submit a PR to fix the issue?
@atoulme I'll try when I get the time. I still don't understand why the problem is as severe as it is. My fix is effective, so I must be roughly correct about the root cause, but I'd like to try to understand it better before submitting.
@swiatekm-sumo @atoulme here are some of my findings on this. I deployed the otel collector on a k8s cluster today with at least 50 nginx pods generating a new log every second. The difference in memory usage between the two compression types is below:
The difference in memory usage wasn't 10x as seen previously. I used
@swiatekm-sumo @atoulme Here are more findings and results from tests on v0.92.0 and v0.94.0. There seems to be some improvement in memory usage between the two versions.

Apart from the above, there was a known memory leak when using concurrency in the zstd encoder. A possible change we could make here is to simply disable encoder concurrency.
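For reference, a minimal sketch of what disabling encoder concurrency with klauspost/compress could look like (not the exact confighttp change); the `WithEncoderConcurrency` option is part of the library, the rest of the snippet is illustrative:

```go
package main

import (
	"bytes"
	"fmt"

	"github.com/klauspost/compress/zstd"
)

func main() {
	var buf bytes.Buffer
	// By default the encoder uses up to GOMAXPROCS concurrent goroutines,
	// and memory use grows with that concurrency. WithEncoderConcurrency(1)
	// keeps encoding single-threaded and avoids the associated goroutines
	// and their buffers.
	enc, err := zstd.NewWriter(&buf, zstd.WithEncoderConcurrency(1))
	if err != nil {
		panic(err)
	}
	if _, err := enc.Write([]byte("example payload")); err != nil {
		panic(err)
	}
	if err := enc.Close(); err != nil {
		panic(err)
	}
	fmt.Printf("compressed to %d bytes\n", buf.Len())
}
```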
Thank you for this inquiry and for providing data, I appreciate it. Do we have appropriate benchmarks for zstd usage that we could use to test the simple change you mention?
I can work on the benchmarks to test zstd compression with and without concurrency enabled. I do want to emphasize that the
Ideally we'd have a benchmark showing the difference, though from trying to create one myself, this may not be so easy to do. The behaviour is timing-sensitive due to the use of `sync.Pool`.
Created the following draft PR for this, to show the difference in memory allocation with concurrency disabled - #9749
**Description:** zstd benchmark tests added

The goal of this PR is to disable concurrency in zstd compression to reduce its memory footprint and avoid a known issue with goroutine leaks. Please see klauspost/compress#264.

**Link to tracking Issue:** #8216

**Testing:** Benchmark test results below

```
BenchmarkCompression/zstdWithConcurrency/compress-10    21392    55855 ns/op    187732.88 MB/s    2329164 B/op    28 allocs/op
BenchmarkCompression/zstdNoConcurrency/compress-10      29526    39902 ns/op    262787.42 MB/s    1758988 B/op    15 allocs/op
input => 10.00 MB
```
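A simplified shape of such a benchmark (an assumed sketch, not the exact code from #9749); the file name, `benchmarkZstd` helper, and input size are illustrative:

```go
// zstd_bench_test.go (hypothetical file)
package compressbench

import (
	"bytes"
	"io"
	"testing"

	"github.com/klauspost/compress/zstd"
)

// benchmarkZstd compresses the same input on every iteration with the given
// encoder options and reports allocations, so the two variants can be compared.
func benchmarkZstd(b *testing.B, input []byte, opts ...zstd.EOption) {
	b.ReportAllocs()
	b.SetBytes(int64(len(input)))
	for i := 0; i < b.N; i++ {
		var buf bytes.Buffer
		enc, err := zstd.NewWriter(&buf, opts...)
		if err != nil {
			b.Fatal(err)
		}
		if _, err := io.Copy(enc, bytes.NewReader(input)); err != nil {
			b.Fatal(err)
		}
		if err := enc.Close(); err != nil {
			b.Fatal(err)
		}
	}
}

func BenchmarkZstdWithConcurrency(b *testing.B) {
	benchmarkZstd(b, bytes.Repeat([]byte("synthetic log line\n"), 1<<15))
}

func BenchmarkZstdNoConcurrency(b *testing.B) {
	benchmarkZstd(b, bytes.Repeat([]byte("synthetic log line\n"), 1<<15),
		zstd.WithEncoderConcurrency(1))
}
```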
Resolved in #9749
@djluck that's strange given that the server side appears to already have our client-side fix implemented. Can you open a new issue with your findings?
Sure thing @swiatekm
Describe the bug
We've added zstd support to the server side of confighttp in #7927. I rolled this out for OTLP traffic between an agent and a gateway in a K8s environment and saw a very significant increase in memory consumption on the client side.
Steps to reproduce
I have some memory profiles saved of this, and I'm planning to create a self-contained reproduction, ideally with just synthetic benchmarks.
What did you expect to see?
Memory consumption in the ballpark of what other compression methods use.
What did you see instead?
Memory consumption for the otlphttp exporter using zstd was more than 10x that with gzip.
What version did you use?
Version: 0.82.0
What config did you use?
The relevant part:
Environment
Kubernetes 1.24, EKS to be precise.
Additional context
I'm reporting this as is so it doesn't get lost and to consolidate reports in case other users experience this. Will update with more data once I'm able to.