When starting or changing the active pod, the Vault does not work for a very long time (many pki certs in backend storage) #21042
I do not understand this point:
When does PKI do this? We delay tidy until one full … Can you share your … Is this a limitation of your backend data store? Note that HashiCorp recommends using the Raft backend, which to my knowledge does not have this limitation.
Oh dear... it looks like listing all the certificates in the PKI store has been added in a place that blocks post-unseal processing: vault/builtin/logical/pki/backend.go, Line 441 in 6fa423e
Ahh yes, the count. I thought this was disabled by default: vault/builtin/logical/pki/backend.go, Line 755 in 6fa423e
I tried both options for "enabled", false and true. Thanks again everyone for the quick response ))
@ser6iy Hmm, looks like the config option to disable counting wasn't backported to 1.12. Could you upgrade and see if that fixes it? It seems like you tested on 1.13.2, where I would've expected it to be respected (regardless of …). And yes, we got customer requests for metrics around the number of certificates issued, which unfortunately has resulted in issues like this...
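For reference, on versions that actually include the option (1.14+, per the later comments), disabling the stored-certificate count might look like the following sketch. The `maintain_stored_certificate_counts` parameter name comes from this thread; the `pki/config/auto-tidy` path and mount name are assumptions, so check the PKI API docs for your version:

```shell
# Sketch: turn off the stored-certificate counter on the PKI mount's
# auto-tidy config (parameter available from Vault 1.14 per this thread).
# "pki" is the assumed mount path; adjust to your environment.
vault write pki/config/auto-tidy \
    maintain_stored_certificate_counts=false
```

With the counter disabled, the backend should no longer need to enumerate every stored certificate, which is the operation blocking post-unseal processing here.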
@cipherboy I will try to add this parameter tomorrow and check if it works on version 1.13.2 |
Ahh, I wonder if it got dropped in my reorg of that section for 1.13. Damn. I'll update that tomorrow, thank you!
@cipherboy
after apply:
maintain_stored_certificate_counts was only added in 1.14.
@maxb
I'm just a community contributor, and have no insight into HashiCorp's future intentions. Based on the public information in #18186 and linked PRs, it seems currently unplanned.
@maxb is a very trusted community contributor who keeps all of us, and especially me, honest, and we appreciate him for that. :-) That said, I've added 1.13 and 1.12 backport labels, so hopefully it should be in the next set of releases (1.13.3 was just cut like, yesterday or today, so this would be 1.13.4 I believe). But it looks like it doesn't apply cleanly to 1.12, so I'm curious how much work that will be... 1.14.0-rc1 seems to be available now, which, as Max points out, actually has the fix, but also apparently lacks documentation, so I shall fix that too.
OK, a known issue has been added for 1.12 and 1.13, and the fix backported to those branches. The fix should be present in the next set of releases for 1.12, 1.13, and 1.14+. Thank you for reporting this! :-)
When starting or changing the active pod, Vault does not work for a very long time (many PKI certs in backend storage).
Checked on versions 1.12.4, 1.12.6, 1.13.2.
Auto cleaning is enabled; it's just that certificates must be stored for a certain time, and there can be a lot of them.
DynamoDB is used as the backend. At the time of starting or switching the active pod, it uses all of its on-demand capacity to the maximum, and the active pod uses ~10x the memory of its normal state.
Checked with a small number of PKI certificates (up to 100): everything loads and switches instantly.
With 20,000,000 PKI certs, the active pod hangs in a non-working state for about ~25 minutes.
Debug logs from active pod:
Vault has enough resources.
The problem is that it reads all PKI certificates into memory from backend storage, and until it finishes, it does not work: it does not respond to requests, it just hangs.
This behavior in high-availability mode is strange, to say the least; the service should start and serve requests, and then index or check certificates for expiry in the background.