Performance issues with authrequest objects #1091

Closed
blakebarnett opened this issue Oct 2, 2017 · 10 comments

@blakebarnett

blakebarnett commented Oct 2, 2017

We're running Dex 2.4.1 with the LDAP connector and kubernetes storage backend. Kubernetes has RBAC enabled and is using Dex via the OIDC authorization plugin.

We're seeing large amounts of authrequest objects being created even though the cluster is quite small with very little usage:

kubectl get authrequests -n security | wc -l
   17359

Interestingly, 2 of our larger production clusters have roughly the same amount.

EDIT: All of the objects it returns are for the current day (expiration is working properly)
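
To check how the objects are spread over time, the JSON output of the same `kubectl get` call can be grouped by creation hour. The sketch below assumes only the standard `metadata.creationTimestamp` field that every Kubernetes object carries; the function name is made up for illustration and is not part of Dex or kubectl.

```python
import json
from collections import Counter


def count_by_creation_hour(kubectl_json: str) -> Counter:
    """Count objects per creation hour from `kubectl get ... -o json` output."""
    items = json.loads(kubectl_json).get("items", [])
    counts = Counter()
    for item in items:
        # creationTimestamp is RFC 3339, e.g. "2017-10-02T19:48:24Z";
        # its first 13 characters identify the date and the hour.
        stamp = item["metadata"]["creationTimestamp"]
        counts[stamp[:13]] += 1
    return counts
```

Pass it the text produced by `kubectl get authrequests -n security -o json`. A steady count per hour would suggest a periodic client rather than real logins. Note also that `kubectl get ... | wc -l` counts the header row, so the real object count is one lower than the number shown.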

When Dex restarts, etcd latency spikes and we get alerts, apparently because there are so many authrequest objects. (Still running etcd v2, k8s v1.7.5)

Is this expected? Should we move to a different backend?

@ericchiang
Contributor

The TPR/CRD backend is specifically stated to not be performance optimized. How many login events are you handling?

@blakebarnett
Author

Very few logins, but it seems like the authrequests are being created when k8s does an RBAC authorization?

@ericchiang
Contributor

@blakebarnett no, only when a user attempts to log in.

@blakebarnett
Author

So apparently it's creating an authrequest every time our ELB health check hits it over HTTPS (we changed it from HTTP because it was filling the logs with TLS handshake error messages).

It seems strange that it creates one of these records even just hitting HTTPS:<nodeport>/healthz, but as soon as I changed it back to HTTP, the garbage collection has started cleaning them up.

So, my question is, what's the proper way to health check Dex if it's terminating SSL? I noticed #753 was recommended to solve this, but we are using an ELB in front of Dex and would really like it to work like #682

Maybe it would make sense to expose a separate non-SSL port just for health checks, and maybe prometheus metrics? ;)
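
The idea of a separate plain-HTTP port can be sketched roughly as follows. This is only an illustration of the proposal, not how Dex is implemented: `HealthHandler` and `start_health_listener` are hypothetical names, and the handler deliberately touches no storage, so a probe cannot create any objects.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz and nothing else; touches no storage."""

    def do_GET(self):
        if self.path == "/healthz":
            body = b"ok\n"
            self.send_response(200)
        else:
            body = b"not found\n"
            self.send_response(404)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep frequent load-balancer probes out of the logs.
        pass


def start_health_listener(host="127.0.0.1", port=0):
    """Start the health listener in a daemon thread and return the server."""
    server = ThreadingHTTPServer((host, port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With `port=0` the operating system picks a free port, available afterwards as `server.server_address[1]`. A load balancer would then probe this port over plain HTTP while the TLS port keeps serving real traffic.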

@ericchiang
Contributor

ericchiang commented Oct 10, 2017 via email

@blakebarnett
Author

Yeah, as expected, we're seeing TLS handshake errors after switching the health check to plain TCP, as well as GC errors that seem to be related (though now that we aren't using HTTPS for the health check, they are slowly disappearing):

dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="garbage collection failed: failed to delete auth request: not found"
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 10.4.86.184:63503: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 10.4.94.141:41004: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 100.66.147.0:19626: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 100.66.237.128:4860: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:18 http: TLS handshake error from 10.4.86.184:63515: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:18 http: TLS handshake error from 100.66.247.64:26080: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:18 http: TLS handshake error from 100.66.237.128:4870: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:28 http: TLS handshake error from 10.4.86.184:63525: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:28 http: TLS handshake error from 10.4.68.129:29046: EOF

@gtorre

gtorre commented Oct 12, 2017

@blakebarnett we're thinking of running Dex with kubernetes storage backend as well. Would you mind sharing your opinion on it so far? How does it perform? How much storage does it take up? Roughly how many kubernetes users do you have? Thanks!

@blakebarnett
Author

It's been working great for us aside from this issue, which seems to be specific to an external LB doing a non-TLS health-check

@blakebarnett
Author

We upgraded to Dex 2.7.1 and migrated to CRDs, but we are still seeing ~17.5k authrequests at all times. We tried changing the way the health check works (TLS/non-TLS etc) with no luck.

We are on k8s 1.7.9 and seeing big spikes in etcd access latency for type: []unstructured.Unstructured, which seems to point to TPR/CRDs. (There's a bug in 1.8 related to this, but I don't think it affects 1.7.9? We don't see the symptoms described in kubernetes/kubernetes#53485.)

@nabokihms
Member

  1. Health checker was refactored to work in the background.
  2. To prevent Dex from creating auth requests on authentication, we recommend using a reverse proxy with rate limit capabilities, e.g., ingress-nginx with the nginx.ingress.kubernetes.io/limit-rps annotation.
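
As a rough illustration of what a per-client requests-per-second limit does, here is a simplified token-bucket sketch. It is not the mechanism ingress-nginx uses internally; the class name and parameters are hypothetical and only show the idea of rejecting a client that sends requests faster than the configured rate.

```python
import time


class PerClientRateLimiter:
    """Token bucket per client: `rate_per_second` refill, `burst` capacity."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = float(rate_per_second)
        self.burst = float(burst)
        self.clock = clock
        self._buckets = {}  # client -> (tokens, time of last request)

    def allow(self, client):
        now = self.clock()
        tokens, last = self._buckets.get(client, (self.burst, now))
        # Refill for the elapsed time, never above the burst capacity.
        tokens = min(self.burst, tokens + (now - last) * self.rate)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[client] = (tokens, now)
        return allowed
```

A proxy applying such a limit in front of the login endpoints would reject a misbehaving client before its requests reach Dex, so no auth request is stored for them.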
