Upserting Keys via Transit seems to leak Memory #5746
We cache keys. We don't expire the cache, except to update entries. You could either use mount tuning to turn off caching (at a serious performance penalty), or you may want to look into derived keys. Curious what your use case is here, as the policy management for that must be pretty tricky.
By mount tuning, are you referring to setting something like the force-no-cache param for the mount? I don't see that you can change that setting after the mount is enabled.

What is the expected memory footprint per key? The graph above shows 10GB of RAM used for 1 million keys, which means approximately 10KB per 256-bit key.

The use case is GDPR related and creates one key per user, so we can permanently delete the key once the user requests it. We are already using key derivation to cover different risk classes of PII, so we have already cut the number of keys by a factor of 20. The policies are actually pretty simple, as they act on the context parameter only and not on specific keys (for this specific transit mount).

The alternative is to use a data key and handle the encryption with specific keys outside of Vault, but we would really rather not go that route, as it would mean reimplementing convergent encryption, safely generating keys, and essentially solving problems that Vault already solves. From a functional point of view, as well as performance-wise, Vault is fully sufficient. We are well aware that the transit backend was not specifically designed for this number of keys, yet as far as we can see, nothing in Vault's architecture prohibits the transit backend from supporting many keys.
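For reference, the derived-key setup described above maps onto the standard transit API. A minimal sketch using the official Go client (the key name and context values are illustrative, not taken from this thread):

```go
package main

import (
	"encoding/base64"
	"fmt"
	"log"

	"github.com/hashicorp/vault/api"
)

func main() {
	// Assumes VAULT_ADDR and VAULT_TOKEN are set in the environment.
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Create a derived key: every encrypt call must then supply a
	// context parameter, from which the actual key is derived.
	if _, err := client.Logical().Write("transit/keys/pii", map[string]interface{}{
		"derived": true,
	}); err != nil {
		log.Fatal(err)
	}

	// Encrypt under a risk-class context; as described above, policies
	// can then act on the context rather than on individual key names.
	secret, err := client.Logical().Write("transit/encrypt/pii", map[string]interface{}{
		"plaintext": base64.StdEncoding.EncodeToString([]byte("user data")),
		"context":   base64.StdEncoding.EncodeToString([]byte("risk-class-a")),
	})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(secret.Data["ciphertext"])
}
```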
10KB per key is a lot. I wonder if you're hitting an open bug that affects various things in Go, but which people have specifically seen with GCS, where connections can leak, which could mean that allocated memory isn't being cleared. See #5419 (comment). One thing to try from that angle is the setting mentioned in that comment.

I also noticed that there are some plateaus and spikes in your graph, especially as time goes on. This could be consistent with the map we're using for the cache needing to grow. (We are using a sync.Map, if you're Go-inclined, which specifically isn't great for this use case, but we didn't expect it to be used this way.)

The use case makes sense. I see two options, basically:
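To illustrate the failure mode being described (a simplified sketch, not Vault's actual cache code): a sync.Map used as a key cache with no eviction keeps one live entry per distinct key ever seen, so memory grows monotonically with upserts.

```go
package main

import (
	"fmt"
	"sync"
)

// keyCache mimics an unbounded cache: entries are added or updated but
// never evicted, so memory grows with the number of distinct key names.
type keyCache struct{ m sync.Map }

func (c *keyCache) getOrLoad(name string, load func() []byte) []byte {
	if v, ok := c.m.Load(name); ok {
		return v.([]byte)
	}
	v := load()
	c.m.Store(name, v) // nothing is ever deleted
	return v
}

func main() {
	c := &keyCache{}
	// A million distinct key names -> a million live cache entries.
	for i := 0; i < 1_000_000; i++ {
		c.getOrLoad(fmt.Sprintf("user-key-%d", i), func() []byte {
			return make([]byte, 32) // stand-in for key material
		})
	}
	fmt.Println("cache populated")
}
```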
Unfortunately, the cache we usually use has no way to do a Get + Extend operation. I'm going to have some internal discussions about whether we want to maintain a fork or not.
We tested creating a new mount with the force-no-cache option:
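(The exact command was not preserved in the thread; a sketch of how such a mount can be created with the Go client, noting that force_no_cache must be set when the mount is enabled:)

```go
package main

import (
	"log"

	"github.com/hashicorp/vault/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	// force_no_cache is mount-time configuration; per the discussion
	// above, it cannot be changed after the mount is enabled.
	if err := client.Sys().Mount("transit-nocache", &api.MountInput{
		Type: "transit",
		Config: api.MountConfigInput{
			ForceNoCache: true,
		},
	}); err != nil {
		log.Fatal(err)
	}
}
```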
The result was that memory usage now plateaus below 1GB, tested by upserting 40 million keys.

We also found that version 0.10.1 did not plateau with force-no-cache, maybe because of the mentioned GCS leak: memory usage increased much more slowly, but eventually hit any limit we set. However, with the force-no-cache option on a new mount and Vault versions 0.11.5 or 1.0.0, memory plateaus, and the duration per 10,000 keys roughly doubled compared to running with the key cache enabled. I plan to give more detailed performance test data in a follow-up post, if I find the time for the tests.

Thanks for the help @jefferai. And of course we would like to see some options for caching strategies. It would be fantastic if you considered a use case similar to ours in your testing and development.
@craftey I think we could make the cache an interface, such that it can either be unlimited or an LRU, based on options configured on the mount.
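A rough sketch of what that interface could look like (names are illustrative, not from the Vault codebase), with the bounded variant built on the hashicorp/golang-lru package:

```go
package cache

import (
	"sync"

	lru "github.com/hashicorp/golang-lru"
)

// Cache is the interface the transit backend could program against,
// with the mount configuration selecting the implementation.
type Cache interface {
	Get(key string) (interface{}, bool)
	Put(key string, value interface{})
}

// unlimitedCache preserves the current behavior: nothing is evicted.
type unlimitedCache struct{ m sync.Map }

func (c *unlimitedCache) Get(key string) (interface{}, bool) { return c.m.Load(key) }
func (c *unlimitedCache) Put(key string, value interface{})  { c.m.Store(key, value) }

// lruCache bounds memory by evicting the least recently used entries.
type lruCache struct{ l *lru.Cache }

func (c *lruCache) Get(key string) (interface{}, bool) { return c.l.Get(key) }
func (c *lruCache) Put(key string, value interface{})  { c.l.Add(key, value) }

// New picks an implementation from a hypothetical mount option:
// size <= 0 means unlimited, otherwise an LRU of that capacity.
func New(size int) (Cache, error) {
	if size <= 0 {
		return &unlimitedCache{}, nil
	}
	l, err := lru.New(size)
	if err != nil {
		return nil, err
	}
	return &lruCache{l: l}, nil
}
```

An LRU keeps hot keys fast while bounding the footprint of workloads like the one in this issue, at the cost of re-reading cold keys from storage.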
Describe the bug
We are calling the transit backend's encrypt method (i.e. `encrypt/<key>`) in batch mode with a large number of keys (millions). The memory consumption of Vault just increases monotonically and never decreases again. (See attached picture from our monitoring.)

To Reproduce
Steps to reproduce the behavior:
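(The concrete steps were not preserved here; a minimal sketch of the upsert-by-encrypt loop described above, with illustrative key names, since writing to `encrypt/<key>` creates the key if it does not yet exist:)

```go
package main

import (
	"encoding/base64"
	"fmt"
	"log"

	"github.com/hashicorp/vault/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}
	plaintext := base64.StdEncoding.EncodeToString([]byte("pii"))
	// Each iteration upserts a new key, so the server-side key cache
	// gains one entry per iteration and memory grows accordingly.
	for i := 0; i < 1_000_000; i++ {
		path := fmt.Sprintf("transit/encrypt/user-key-%d", i)
		if _, err := client.Logical().Write(path, map[string]interface{}{
			"batch_input": []map[string]interface{}{
				{"plaintext": plaintext},
			},
		}); err != nil {
			log.Fatal(err)
		}
	}
}
```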
Expected behavior
Memory goes down again or plateaus.
Environment:
Vault server configuration file(s):
Default config with Google Cloud Storage as the storage backend. No other settings (like cache settings) have been changed from the defaults.
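(A minimal configuration of that shape, with an illustrative bucket name, assuming the standard `gcs` storage stanza:)

```hcl
storage "gcs" {
  bucket = "my-vault-data" # illustrative
}

listener "tcp" {
  address     = "127.0.0.1:8200"
  tls_disable = true # local testing only
}
```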