Performance issues with authrequest objects #1091

blakebarnett · 2017-10-02T20:12:19Z

We're running Dex 2.4.1 with the LDAP connector and kubernetes storage backend. Kubernetes has RBAC enabled and is using Dex via the OIDC authorization plugin.

We're seeing large amounts of authrequest objects being created even though the cluster is quite small with very little usage:

kubectl get authrequests -n security | wc -l
   17359

Interestingly, two of our larger production clusters have roughly the same number.

EDIT: All of the objects it returns are for the current day (expiration is working properly)

When Dex gets restarted it makes etcd latency spike and we get alerts, which seems to be because there are so many authrequest objects. (Still running etcd v2, k8s v1.7.5)

Is this expected? Should we move to a different backend?

ericchiang · 2017-10-09T18:12:58Z

The TPR/CRD backend is specifically stated to not be performance optimized. How many login events are you handling?

blakebarnett · 2017-10-09T19:22:24Z

Very few logins, but it seems like the authrequests are being created when k8s does an RBAC authorization?

ericchiang · 2017-10-09T19:36:51Z

@blakebarnett no, only when a user attempts to login

blakebarnett · 2017-10-10T17:45:24Z

So apparently it's creating authrequests every time our ELB health-check hits it with HTTPS (we changed it from HTTP because it was filling the logs with TLS handshake error messages)

It seems strange that it creates one of these records even just hitting HTTPS:<nodeport>/healthz, but as soon as I changed it back to HTTP, the garbage collection has started cleaning them up.

So, my question is, what's the proper way to health check Dex if it's terminating SSL? I noticed #753 was recommended to solve this, but we are using an ELB in front of Dex and would really like it to work like #682

Maybe it would make sense to expose a separate non-SSL port just for health checks, and maybe prometheus metrics? ;)

ericchiang · 2017-10-10T19:34:02Z

The /healthz endpoint is the correct way. We'd be open to exporting prometheus metrics on a separate port though. What's weird is that the /healthz endpoint should be cleaning up the authrequests itself: https://github.com/coreos/dex/blob/f3c85e6936b064d2a7a6ef46fa4bb58d6e295051/server/handlers.go#L23-L51. Do you have any logs from your dex instance indicating errors during the health check?

blakebarnett · 2017-10-10T19:52:51Z

Yeah, as expected we're seeing TLS handshake errors after switching the health check to plain TCP, as well as GC errors that seem to be related (though now that we aren't using HTTPS for the health check, they are slowly disappearing):

dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="failed to delete auth request: not found"
dex-312619737-bcl35 dex time="2017-10-10T19:48:24Z" level=error msg="garbage collection failed: failed to delete auth request: not found"
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 10.4.86.184:63503: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 10.4.94.141:41004: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 100.66.147.0:19626: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:08 http: TLS handshake error from 100.66.237.128:4860: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:18 http: TLS handshake error from 10.4.86.184:63515: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:18 http: TLS handshake error from 100.66.247.64:26080: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:18 http: TLS handshake error from 100.66.237.128:4870: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:28 http: TLS handshake error from 10.4.86.184:63525: EOF
dex-312619737-bcl35 dex 2017/10/10 19:49:28 http: TLS handshake error from 10.4.68.129:29046: EOF

gtorre · 2017-10-12T21:02:38Z

@blakebarnett we're thinking of running Dex with kubernetes storage backend as well. Would you mind sharing your opinion on it so far? How does it perform? How much storage does it take up? Roughly how many kubernetes users do you have? Thanks!

blakebarnett · 2017-10-20T18:19:12Z

It's been working great for us aside from this issue, which seems to be specific to an external LB doing a non-TLS health-check

blakebarnett · 2017-11-11T19:23:34Z

We upgraded to Dex 2.7.1 and migrated to CRDs, but we are still seeing ~17.5k authrequests at all times. We tried changing the way the health check works (TLS/non-TLS etc) with no luck.

We are on k8s 1.7.9 and seeing big spikes in etcd access latency for type: []unstructured.Unstructured, which seems to point to TPR/CRDs. (There's a bug in 1.8 related to this, but I don't think it affects 1.7.9? We don't see the symptoms seen here: kubernetes/kubernetes#53485)

nabokihms · 2023-08-03T12:37:42Z

Health checker was refactored to work in the background.
To prevent Dex from creating auth requests on authentication, we recommend using a reverse proxy with rate limit capabilities, e.g., ingress-nginx with the nginx.ingress.kubernetes.io/limit-rps annotation.

srenatus mentioned this issue Feb 4, 2019

Healthcheck via /healthz endpoint is write-intensive operation for storage backend #1386

Closed

nabokihms closed this as completed Aug 3, 2023
