PKI Secret Engine auto-tidy #21041
Comments
@ser6iy This is indeed an issue, and a bit of a hard problem... Presently, PKI tidies are expensive for a couple of reasons:
We also have some customers who run a huge number (3k+) of PKI mounts, with lots of stored certificates per mount, making PKI tidies expensive not only in the local sense (of a mount) but also in the global, cluster-resource sense. While cron-based scheduling is desirable for some, with this many mounts, if all of them tidied at exactly the same time, it would definitely overwhelm the cluster. By using […]

If there are not a huge number of certificates in the mount (and thus the tidy is not starving the Vault cluster for memory), then (IMO) […]

Additionally, each mount type today has its own tidy operation with its own semantics. PKI's is one of the more complicated, hence the introduction of an automated, cancel-able tidy, but Transform has one, AppRole has another, &c. Here, IMO, we really need a Vault-wide tidy interface (hence the Core tag) that standardizes perhaps both mechanisms of running tidy (interval-based and time-of-day-based), allows mutual exclusion and perhaps time-boxed execution (to prevent tidy from continuing past some duration), and offers a standard UX to configure & enable tidies, regardless of the underlying mount type.

Note that this statement:
already occurs under the existing design.
@cipherboy The wrong concept was used from the beginning; it needs to be redone.
The PKI Secret Engine documentation for auto-tidy (https://developer.hashicorp.com/vault/api-docs/secret/pki#configure-automatic-tidy) describes the interval_duration parameter (https://developer.hashicorp.com/vault/api-docs/secret/pki#interval_duration). The documentation should explicitly state that the default value is 12 hours.
[interval_duration](https://developer.hashicorp.com/vault/api-docs/secret/pki#interval_duration) (string: "") - Specifies the duration between automatic tidy operations; note that this is from the end of one operation to the start of the next so the time of the operation itself does not need to be considered.
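To avoid depending on the undocumented default, the interval can be set explicitly. Below is a minimal sketch of the request, built but not sent. The Vault address, token, and the mount path `pki` are placeholders, and the `enabled` field is my assumption about the rest of the auto-tidy config; only `interval_duration` comes from the docs quoted above.

```python
import json
import urllib.request

def build_auto_tidy_request(vault_addr, token, mount="pki", interval="12h"):
    """Build (but do not send) a request that sets interval_duration explicitly.

    vault_addr, token and mount are placeholders for illustration.
    """
    payload = {
        "enabled": True,                # assumed field name, check the API docs
        "interval_duration": interval,  # documented parameter, set explicitly
    }
    return urllib.request.Request(
        url=f"{vault_addr}/v1/{mount}/config/auto-tidy",
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-Vault-Token": token, "Content-Type": "application/json"},
        method="POST",
    )

req = build_auto_tidy_request("https://vault.example.com:8200", "s.placeholder")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be `urllib.request.urlopen(req)`; that step is left out here on purpose.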
Because each tidy starts interval_duration after the previous one ends, its start time drifts over time and eventually lands in business hours, when Vault is already under load from users.
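The drift is easy to see with a small simulation. This is only a sketch: the 12-hour interval matches the default mentioned above, while the 3-hour tidy duration and the starting date are made-up numbers for illustration.

```python
from datetime import datetime, timedelta

def simulate_start_times(first_start, interval, run_durations):
    """Return the start time of each tidy when the next run begins
    `interval` after the previous run *ends* (end-to-start scheduling)."""
    starts = []
    start = first_start
    for duration in run_durations:
        starts.append(start)
        # Next start = end of this run + interval.
        start = start + duration + interval
    return starts

# Hypothetical numbers: first tidy on a Friday at 23:00, each run takes 3 hours.
first = datetime(2023, 6, 2, 23, 0)
starts = simulate_start_times(first, timedelta(hours=12), [timedelta(hours=3)] * 5)
for s in starts:
    print(s.strftime("%a %H:%M"))
```

With these assumed numbers each start shifts by 15 hours, so the fifth run begins on Monday at 11:00, in the middle of the working day.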
With a significant number of stored PKI certificates, a tidy puts heavy read load on the storage backend and uses far more RAM than usual (about 10x) to index or check them.
For these tasks, a cron-style schedule is therefore a better fit.
It should be possible to specify the day and time at which tidy starts, for example Friday at 11pm, so that it runs over the weekend without interfering with normal work and finishes faster because there is little or no user load.
There should also be a check so that a new tidy does not start while one is still in progress (for example, because there are many certificates or the previous tidy was missed).
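A rough sketch of what I mean, not a proposal for how Vault should implement it. `next_weekly_start` and `TidyGuard` are hypothetical names: the first computes the next fixed weekly start time, the second skips a run if one is already in progress.

```python
import threading
from datetime import datetime, timedelta

def next_weekly_start(now, weekday, hour):
    """Next datetime strictly after `now` on `weekday` (Mon=0 .. Sun=6) at `hour`:00."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate

class TidyGuard:
    """Allows at most one tidy at a time; a second attempt is skipped, not queued."""

    def __init__(self):
        self._lock = threading.Lock()

    def try_run(self, tidy_fn):
        # Non-blocking acquire: if a tidy is still running, do nothing.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            tidy_fn()
            return True
        finally:
            self._lock.release()

# Friday is weekday 4; 23 is 11pm.
print(next_weekly_start(datetime(2023, 6, 5, 11, 0), weekday=4, hour=23))
```

Because the start time is computed from the calendar and not from the end of the previous run, it does not drift, and the guard covers the case where the previous tidy is still running when the next slot arrives.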