
Support for TokenIntrospection and UserInfo cache #20209

Merged

Conversation

@sberyozkin (Member) commented Sep 16, 2021

Fixes #12800.

The performance problem of making up to two remote calls per incoming access token has been known for a while: one call for the access token introspection (for an opaque bearer access token, or for an access token returned with the code flow response when it is configured as the source of roles with non-Keycloak providers), plus a UserInfo call if required, for all the providers. Several users are waiting for a fix, and in some cases SLAs can be affected. I believe @stianst and @pedroigor have also mentioned before the possibility of caching the token introspection results for a short, configurable period of time.

So this PR does the following:

  • Introduces Uni-friendly TokenIntrospectionCache and UserInfoCache interfaces.
  • Makes minor updates to TokenIntrospection and UserInfo so that custom cache implementations can recreate them from either a String or a javax.json.JsonObject.
  • Makes a minor update to the way UserInfo is prepared: earlier, a Vertx JsonObject was created first and then converted to javax.json.JsonObject; now it is the other way around, and the Vertx JsonObject is only created if UserInfo is the source of roles, which is rare enough. This is a bit more efficient and fits better with the new interfaces.
  • Updates OidcIdentityProvider to check TokenIntrospectionCache and UserInfoCache and to try adding new entries, but only if the current tenant allows caching.
  • Provides a simple default token cache implementing both TokenIntrospectionCache and UserInfoCache. In the vast majority of cases the same token is introspected and/or used to get UserInfo, so this cache uses a single cache entry to hold both objects (one of them can be null), enforces a maximum number of entries, and optionally starts a clean-up timer. It is a fairly basic implementation that might be tweaked a bit further, but it should be good enough to handle cases where, say, up to 3K requests arrive at nearly the same time. Users can of course register custom implementations.
  • Adds tests verifying that the default cache behaves correctly.
  • Updates the docs.
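The single-entry-per-token idea from the bullets above could look roughly like this (a minimal, illustrative sketch with invented names and String stand-ins for TokenIntrospection and UserInfo; this is not the actual Quarkus implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: one cache entry per token holds both the
// introspection result and the UserInfo (either may be null), and the
// map enforces a maximum number of entries. The size check here is
// approximate under heavy concurrency; it only sketches the idea.
public class SimpleTokenCache {
    static final class CacheEntry {
        volatile String introspection; // stand-in for TokenIntrospection
        volatile String userInfo;      // stand-in for UserInfo
    }

    private final int maxSize;
    private final Map<String, CacheEntry> cache = new ConcurrentHashMap<>();

    public SimpleTokenCache(int maxSize) {
        this.maxSize = maxSize;
    }

    public boolean addIntrospection(String token, String introspection) {
        CacheEntry entry = getOrCreate(token);
        if (entry == null) {
            return false; // cache full, skip caching
        }
        entry.introspection = introspection;
        return true;
    }

    public boolean addUserInfo(String token, String userInfo) {
        CacheEntry entry = getOrCreate(token);
        if (entry == null) {
            return false;
        }
        entry.userInfo = userInfo;
        return true;
    }

    public String getIntrospection(String token) {
        CacheEntry e = cache.get(token);
        return e == null ? null : e.introspection;
    }

    public String getUserInfo(String token) {
        CacheEntry e = cache.get(token);
        return e == null ? null : e.userInfo;
    }

    private CacheEntry getOrCreate(String token) {
        CacheEntry existing = cache.get(token);
        if (existing != null) {
            return existing; // same token already cached: reuse the entry
        }
        if (cache.size() >= maxSize) {
            return null;
        }
        return cache.computeIfAbsent(token, t -> new CacheEntry());
    }
}
```

Because both objects hang off the same entry, a token that is introspected and then used for UserInfo costs only one cache slot.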

@sberyozkin (Member, Author)

@stuartwdouglas @pedroigor it would be good if this PR could make it to 2.3.0.CR1 which is due next Wed as it is a significant improvement for OIDC users. There will be time to tweak a few things if needed before 2.3.0.Final.

@pedroigor (Contributor)

@sberyozkin I'm wondering if it makes sense to use a bounded LinkedHashMap and implement the removeEldestEntry method accordingly, to remove old entries when adding new ones. We would then have some wrapper in front of this map to also check whether an entry is expired when obtaining entries.

It should help to make things simpler and remove the overhead of having a background task?
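The suggestion can be sketched as follows (an illustrative example of the standard LinkedHashMap.removeEldestEntry hook; the class name is invented, this is not code from the PR):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a bounded map that automatically evicts its eldest
// (least recently inserted) entry when a put pushes it past the limit.
// Note that LinkedHashMap is not thread-safe on its own, so real use
// would need external synchronization.
public class BoundedMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedMap(int maxEntries) {
        super(16, 0.75f, false); // false = insertion-order iteration
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put/putAll after an insertion; returning true drops
        // the eldest entry, keeping the map bounded with no background task.
        return size() > maxEntries;
    }
}
```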

@sberyozkin (Member, Author) commented Sep 20, 2021

Hi Pedro @pedroigor Sure, that would be neat, but we'd need to synchronize around the LinkedHashMap? I recall thinking about LinkedHashMap (though not about removeEldestEntry, thanks for reminding me about it :-) ), but the fact that it is not concurrency-ready stopped me...

FYI, running the timer is optional; if it is not running, then when adding an entry to a full map, one of the oldest entries will be removed. The timer would only help with removing stale entries proactively, say every few minutes, but I guess if the max size is, for example, 500-1000, then it is probably not even worth starting it.
I was thinking of having, say, a concurrent blocking queue and keeping it sorted, but I was not sure it was worth it since the sorting would happen on every insert.

I reckon users would plug in custom cache providers in the really demanding production deployments...
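For reference, the JDK's coarse-grained wrapper is one way to make a bounded LinkedHashMap usable from concurrent callers (again an illustrative sketch, not the PR's code, and with the trade-off that every access takes a lock):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: wrap a bounded LinkedHashMap (removeEldestEntry
// evicts the oldest entry past the limit) with Collections.synchronizedMap,
// accepting a single lock on every get/put as the price of safety.
public class SynchronizedBoundedCache {
    public static <K, V> Map<K, V> create(int maxEntries) {
        LinkedHashMap<K, V> bounded = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        return Collections.synchronizedMap(bounded);
    }
}
```

Iteration over such a map still requires manually synchronizing on it, which is part of why the coarse lock can hurt under high concurrency.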

@sberyozkin (Member, Author)

@stuartwdouglas Do you think the default cache has to be reworked? See the comments above. I like Pedro's idea and would not mind exploring more options, but I'm not sure at the moment it is possible. The default cache is meant more to help users get started with experimenting with the cache quickly, as opposed to providing a high-quality implementation that will handle a massive amount of concurrent requests; I reckon users would use Infinispan, Caffeine, etc. if they really need to scale.

@sberyozkin (Member, Author)

Making sure the map is effectively sorted on every addition, such that we can just remove the first entry when the map is full, could be useful, but I'm not sure it would be free of side-effects: too many sorting attempts in a highly concurrent application...

@pedroigor (Contributor)

I'm not pushing for it. Would be nice to hear from Stuart what he thinks.

@stuartwdouglas (Member) left a comment


Oops, looks like I forgot to hit submit on my review.

There are some issues with the implementation, but I also question how useful this is in a Kubernetes environment, when there is no guarantee you will hit the same nodes each time. Basically the more you scale up, the less useful this becomes (e.g. if you have 2 nodes then it will work 50% of the time, if you have 100 it will be 1%).

@sberyozkin (Member, Author)

Hi Stuart, @stuartwdouglas, thanks for the review. I'll give it a try to make this default implementation a bit more robust. I was wondering if it was simpler to drop it totally and just let users register custom ones, but maybe it is worth trying to make it better; it will help in the less demanding cases...

@sberyozkin (Member, Author)

@stuartwdouglas @pedroigor I'll push to 2.4.x as I won't have time to complete it for 2.3.0.CR1, thanks

@sberyozkin sberyozkin force-pushed the oidc_introspection_userinfo_cache branch from d811c6a to d05abe0 Compare September 28, 2021 12:05
@quarkus-bot (bot) commented Sep 28, 2021

This workflow status is outdated as a new workflow run has been triggered.

Failing Jobs - Building d05abe0

Initial JDK 11 Build — failed in the Build step (⚠️ Check); see the logs for details.

@sberyozkin sberyozkin force-pushed the oidc_introspection_userinfo_cache branch from d05abe0 to 494da57 Compare September 28, 2021 12:22
@sberyozkin (Member, Author)

@stuartwdouglas I've tried to address all the change requests:

  • The default cache bean is only registered if it is enabled (true by default; the test cache bean uses @AlternativePriority, which is a safe way to take over the default one). The default cache can also be injected into a custom one if only one of the interfaces has to be customized (though I don't expect users to do that if they decide to provide a custom cache implementation).
  • Added OidcContext<T>, but I'd like to follow up with another PR to deprecate the interfaces in TokenStateManager and TenantConfigResolver, as that would make the recently added new methods deprecated, and I'd also like to remove the methods deprecated there earlier, so it makes sense to have a dedicated PR for that.
  • Vertx.setPeriodic is used.
  • Gave it a try with keeping a separate AtomicInteger cache size counter.

Have a look please
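The periodic clean-up mentioned above could be sketched outside of Vert.x roughly as follows (illustrative names and a plain-Java stand-in for the cached value; in the PR itself the sweep is driven by Vertx.setPeriodic):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: entries record their creation time, reads treat
// entries past the time-to-live as absent, and a periodic task sweeps
// the expired ones out. Time is passed in explicitly for testability.
public class ExpiringCache {
    static final class TimedEntry {
        final String value;
        final long createdAtMillis;
        TimedEntry(String value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final Map<String, TimedEntry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void put(String key, String value, long nowMillis) {
        cache.put(key, new TimedEntry(value, nowMillis));
    }

    public String get(String key, long nowMillis) {
        TimedEntry e = cache.get(key);
        if (e == null || nowMillis - e.createdAtMillis > ttlMillis) {
            return null; // missing or expired
        }
        return e.value;
    }

    // The periodic task (e.g. scheduled every few minutes) would call this.
    public void removeExpired(long nowMillis) {
        cache.entrySet().removeIf(
                en -> nowMillis - en.getValue().createdAtMillis > ttlMillis);
    }

    public int size() {
        return cache.size();
    }
}
```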

@sberyozkin sberyozkin force-pushed the oidc_introspection_userinfo_cache branch from 494da57 to f35be09 Compare September 28, 2021 18:20
The operator-update-based approach does not provide any real guarantees, as multiple threads can be incrementing and decrementing at the same time; compareAndSet must be used to guarantee sizing.

In addition, this removes the attempt to invalidate an entry if the cache is full. If the cache fills up, then every request would iterate over every entry, which has pretty horrible performance characteristics.
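The compareAndSet sizing described in this commit message can be sketched like this (a simplified illustration with invented names, not the actual PR code): the counter only moves from n to n+1 via a CAS loop that refuses to go past the limit, so concurrent adders cannot overshoot the bound the way independent increment/decrement operations can.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of CAS-based size accounting: tryAcquireSlot succeeds
// only if it can atomically advance the counter without exceeding maxSize.
public class CasBoundedCounter {
    private final AtomicInteger size = new AtomicInteger();
    private final int maxSize;

    public CasBoundedCounter(int maxSize) {
        this.maxSize = maxSize;
    }

    public boolean tryAcquireSlot() {
        while (true) {
            int current = size.get();
            if (current >= maxSize) {
                return false; // cache full; caller skips adding the entry
            }
            if (size.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread won the race; re-read the counter and retry.
        }
    }

    public void releaseSlot() {
        size.decrementAndGet(); // called when an entry is removed
    }

    public int size() {
        return size.get();
    }
}
```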
@stuartwdouglas (Member)

@sberyozkin I pushed some changes that fix some issues; however, I think we have missed the window for 2.3.0, so we will likely need to add back some of these methods with all the different context classes and deprecate them all.

@geoand I am guessing this is too late for 2.3?

@geoand (Contributor) commented Sep 29, 2021

You folks will need to decide within the next hour or so if this is going to be backported :)
Otherwise, I won't have enough time to get the release done today

@stuartwdouglas (Member)

I think backport it if possible, but if it is going to be a pain we can always go down the deprecation route.

@geoand (Contributor) commented Sep 29, 2021

Okay. Go ahead and merge then, and I'll cherry-pick it to my backports branch.

@geoand (Contributor) commented Sep 29, 2021

Just as an FYI, I am going to finish the backports once #20103 is in

@stuartwdouglas stuartwdouglas merged commit cca576a into quarkusio:main Sep 29, 2021
@quarkus-bot quarkus-bot bot added this to the 2.4 - main milestone Sep 29, 2021
@geoand geoand modified the milestones: 2.4 - main, 2.3.0.Final Sep 29, 2021
@sberyozkin sberyozkin deleted the oidc_introspection_userinfo_cache branch September 29, 2021 08:54

Successfully merging this pull request may close these issues.

OIDC Introspection and Userinfo Cache
4 participants