POTENTIAL DEADLOCK: When the rate limit quotas setup takes more time to load the quota config and rules #21338
Comments
I agree with most of your analysis, but I don't think this is quite right:
The way I would phrase it is: if you have a huge number of quota rules, then you may get a
My read of the situation is that everything is working as designed. Why do you have so many quota rules? Can you simplify them, e.g. by setting them at the mount level instead of the path level?
Hi @ncabatoff, thank you for the response. I have a question; can you please help me understand this? If the POTENTIAL DEADLOCK is just a log message from the library, Vault should at least be up and running after a couple of minutes, but in my case Vault crashes. Is this expected, or am I missing something here? On a side note:
Yes, configuring the rate limit quotas at the mount level is a good idea when we have a huge number of paths.
No, I'm forgetting that that's a side effect of the deadlock detector's default behaviour. That's probably a bug we should fix; I'll look into that.
Describe the bug
We tried to apply rate limit quotas on approximately 30K paths. When Vault restarts, its post-unseal process sets up the rate limit quota config and rules while holding a lock (quotas.(*Manager).Setup, quotas.go:1026 in the trace below), and this takes more than 30 seconds. Because Vault is already unsealed, other goroutines are serving API calls such as sys/health and sys/metrics, and each of them tries to acquire the same lock again (quotas.(*Manager).RateLimitPathExempt, quotas.go:731) to check whether the path is in the exempt list. This leads to a deadlock situation.
To Reproduce
Steps to reproduce the behavior:
Add a time.Sleep(2 * time.Minute) statement here
`POTENTIAL DEADLOCK:
Previous place where the lock was grabbed
goroutine 138 lock 0xc00190a618
/tmp/project/vault/quotas/quotas.go:1026 quotas.(*Manager).Setup ??? <<<<<
/tmp/project/vault/core.go:3272 vault.(*Core).setupQuotas ???
/tmp/project/vault/core.go:2332 vault.standardUnsealStrategy.unseal ???
/tmp/project/vault/core.go:2455 vault.(*Core).postUnseal ???
/tmp/project/vault/ha.go:659 vault.(*Core).waitForLeadership ???
/tmp/project/vault/ha.go:479 vault.(*Core).runStandby.func9 ???
/tmp/project/vendor/github.com/oklog/run/group.go:38 run.(*Group).Run.func1 ???
Have been trying to lock it again for more than 30s
goroutine 5159 lock 0xc00190a618
/tmp/project/vault/quotas/quotas.go:731 quotas.(*Manager).RateLimitPathExempt ??? <<<<<
/tmp/project/vault/core.go:3287 vault.(*Core).ApplyRateLimitQuota ???
/tmp/project/http/util.go:63 http.rateLimitQuotaWrapping.func1 ???
/usr/local/go/src/net/http/server.go:2122 http.HandlerFunc.ServeHTTP ???
/tmp/project/http/handler.go:440 http.wrapGenericHandler.func1 ???
/usr/local/go/src/net/http/server.go:2122 http.HandlerFunc.ServeHTTP ???
/tmp/project/vendor/github.com/hashicorp/go-cleanhttp/handlers.go:42 go-cleanhttp.PrintablePathCheckHandler.func1 ???
/usr/local/go/src/net/http/server.go:2122 http.HandlerFunc.ServeHTTP ???
/usr/local/go/src/net/http/server.go:2936 http.serverHandler.ServeHTTP ???
/usr/local/go/src/net/http/server.go:1995 http.(*conn).serve ???
`
Expected behavior
Vault should be able to restart without any issues and should load the rate limit quota config and rules in less time.
Environment:
Vault server version (from vault status): 1.13.1
Vault CLI version (from vault version):
Vault server configuration file(s):
Additional context