Potential for an open connection leak #8654
Comments
Hi! Thanks for the report! Can you provide further steps to reproduce this, such as the Vault version you're on, the API calls you're making, and anything else that might help us reproduce this locally on our own machines? Also, are you seeing ongoing growth in memory usage, or does memory tend to stabilize like this?
I am observing the same memory leak on my Vault 1.4.0 server running on Kubernetes. You can see the memory usage graph below: It might be useful to note that this replica became the active server some time on April 22. I have a bunch of secret engines mounted:
And the following auth mounts:
I am also having Prometheus constantly scrape the Vault server for metrics. Is there anything more I can do to provide more information? The pod will probably OOM soon and be killed by Kubernetes.
@lawliet89 You can run Vault debug (https://www.vaultproject.io/docs/commands/debug) which will, among other things, capture heap usage stats. We can help analyze these logs (they may be emailed or otherwise shared outside of GitHub).
Thanks @kalafut. May I know who I should email the logs to?
@lawliet89 You can send them to me (email is in Vault's commits).
@lawliet89 Thanks for sending those. There is almost 1GB in use by the write buffer for GCS storage (in a function in the GCS SDK). I'm not familiar enough with the GCS storage details to know why, however. I do wonder what effect (if any) https://www.vaultproject.io/docs/configuration/storage/google-cloud-storage#chunk_size would have.
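For readers following the chunk_size thread: below is a minimal sketch, assuming Vault's `chunk_size` option ultimately feeds the `ChunkSize` field of the GCS SDK's writer, of where that write buffer lives in the `cloud.google.com/go/storage` package. The function, bucket/key names, and the 4 MiB value are illustrative only, not Vault's actual backend code.

```go
package sketch

import (
	"context"

	"cloud.google.com/go/storage"
)

// writeObject is an illustrative sketch, not Vault's GCS backend. The GCS
// SDK's storage.Writer buffers up to ChunkSize bytes per resumable upload,
// so many concurrent writers with a large chunk size can account for a lot
// of heap.
func writeObject(ctx context.Context, client *storage.Client, bucket, key string, data []byte) error {
	w := client.Bucket(bucket).Object(key).NewWriter(ctx)

	// Must be set before the first Write. A smaller value means a smaller
	// upload buffer per writer at the cost of more request round trips;
	// 0 disables chunking and sends the object in a single request.
	w.ChunkSize = 4 << 20 // 4 MiB (halving an assumed 8 MiB default)

	if _, err := w.Write(data); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}
```

If that assumption holds, halving the configured chunk size should roughly halve the per-writer buffer, which would make it a reasonable experiment.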
I haven't really looked into it, but maybe this can be fixed with a bump to the Google Cloud SDK. An update was recently reverted in ea1564b due to etcd-io/etcd#11154; the fix for that should be released shortly. I will try halving the chunk size to see if it makes any difference. FWIW, I'm in the process of migrating to Raft storage, so this might not be a fix for future readers, however.
Thanks, that is a useful datapoint. |
I may also be seeing similar memory usage growth, though we've been running into memory exhaustion issues generally that make the graphs not super clear. I am on version 1.1.0, though. @lawliet89 I assume you halved it to
@sidewinder12s Yes. |
For other folks: I suspect this issue may be compounding the performance issues I'm having with GCS (#8761). I've got thousands of new nodes doing GCP logins daily, which probably isn't helping my GCS performance.
@kalafut I'm investigating a somewhat related memory leak and noticed the following using
Seems like the string matching is very expensive here |
The line references on mem usage like the one above seem to be off, can you do |
@calvn my bad, I was on
I'm happy to provide more debugging context and info on the issue, can I email you directly? |
@elsesiy sure thing! You can send it to |
Looks like there were a couple of different investigations going on here. I'm not sure if there were any resolutions or not, but it's been a long time and I suspect people have moved on. If people think they're seeing connection or memory leaks in current Vault versions, please open a new issue. |
I was trying to analyze memory usage of my application when I noticed the following pprof output:
I am trying to understand why 29.31MB was accumulated at line 844. My hypothesis is that at line 846 (https://github.com/hashicorp/vault/blob/master/api/client.go#L846), in the function `RawRequestWithContext`, a response object is created whose body is to be closed by the caller of this function. However, consider the case of a redirect, where at line 890 the statement `goto START` is executed: line 846 will be executed again, creating another response object. The previously created response object's body is never closed, causing a connection and memory leak for every redirect.
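To make the hypothesized control flow concrete, here is a minimal sketch assuming a client that does not follow redirects on its own (e.g. via CheckRedirect); `doWithRedirects` is an illustrative stand-in, not Vault's actual `RawRequestWithContext`.

```go
package sketch

import (
	"io"
	"net/http"
)

// doWithRedirects sketches the leak pattern described above. Each pass
// through START produces a new *http.Response; if the previous response's
// body is not drained and closed before `goto START`, the connection behind
// it is never released, so every redirect leaks one open connection.
func doWithRedirects(client *http.Client, req *http.Request) (*http.Response, error) {
	var resp *http.Response
	var err error

START:
	resp, err = client.Do(req) // a fresh response object on every pass
	if err != nil {
		return nil, err
	}

	if resp.StatusCode == http.StatusTemporaryRedirect {
		loc, locErr := resp.Location()
		if locErr != nil {
			resp.Body.Close()
			return nil, locErr
		}

		// Hypothesized fix: release this response before retrying.
		// Omitting these two lines reproduces the per-redirect leak.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()

		req.URL = loc
		goto START
	}

	return resp, nil // caller is responsible for closing this body
}
```

Draining the body before closing it also lets the keep-alive connection be reused rather than torn down.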