
Support for TokenIntrospection and UserInfo cache #20209

Merged

Conversation

@sberyozkin (Member) commented Sep 16, 2021

Fixes #12800.

The performance problem of making up to two remote calls per incoming access token has been known for a while: one call for the access token introspection (for an opaque bearer access token, or for an access token returned with the code flow response when it is configured as the source of roles with non-Keycloak providers), plus a UserInfo call if required, for all the providers. Several users are waiting for a fix, and in some cases SLAs can be affected. I believe @stianst and @pedroigor have also mentioned before the possibility of caching the token introspection results for a short, configurable period of time.

So this PR does the following:

  • Introduces Uni-friendly TokenIntrospectionCache and UserInfoCache interfaces.
  • Makes minor updates to TokenIntrospection and UserInfo so that custom cache implementations can recreate them from either a String or a javax.json.JsonObject.
  • Makes a minor update to the way UserInfo is prepared: earlier, a Vertx JsonObject was created first and then converted to javax.json.JsonObject; now it is the other way around, and the Vertx JsonObject is only created if UserInfo is the source of roles, which is rare enough. This is a bit more efficient and fits better with the new interfaces.
  • Updates OidcIdentityProvider to check TokenIntrospectionCache and UserInfoCache and to try adding new entries, but only if the current tenant allows caching.
  • Provides a simple default token cache implementing both TokenIntrospectionCache and UserInfoCache. In the vast majority of cases the same token is introspected and/or used to get UserInfo, so this cache uses a single cache entry to hold both objects (one of them can be null), enforces a maximum number of entries, and optionally starts a clean-up timer. It is a fairly basic implementation that might be tweaked a bit further, but it should be good enough to handle cases where, say, up to 3K requests arrive at nearly the same time. Users can of course register custom implementations.
  • Adds tests verifying that the default cache behaves correctly.
  • Updates the docs.
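The single-entry-per-token idea from the bullets above could look roughly like this (a minimal, illustrative sketch with invented names and String stand-ins for TokenIntrospection and UserInfo; this is not the actual Quarkus implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: one cache entry per token holds both the
// introspection result and the UserInfo (either may be null), and the
// map enforces a maximum number of entries. The size check here is
// approximate under heavy concurrency; it only sketches the idea.
public class SimpleTokenCache {
    static final class CacheEntry {
        volatile String introspection; // stand-in for TokenIntrospection
        volatile String userInfo;      // stand-in for UserInfo
    }

    private final int maxSize;
    private final Map<String, CacheEntry> cache = new ConcurrentHashMap<>();

    public SimpleTokenCache(int maxSize) {
        this.maxSize = maxSize;
    }

    public boolean addIntrospection(String token, String introspection) {
        CacheEntry entry = getOrCreate(token);
        if (entry == null) {
            return false; // cache full, skip caching
        }
        entry.introspection = introspection;
        return true;
    }

    public boolean addUserInfo(String token, String userInfo) {
        CacheEntry entry = getOrCreate(token);
        if (entry == null) {
            return false;
        }
        entry.userInfo = userInfo;
        return true;
    }

    public String getIntrospection(String token) {
        CacheEntry e = cache.get(token);
        return e == null ? null : e.introspection;
    }

    public String getUserInfo(String token) {
        CacheEntry e = cache.get(token);
        return e == null ? null : e.userInfo;
    }

    private CacheEntry getOrCreate(String token) {
        CacheEntry existing = cache.get(token);
        if (existing != null) {
            return existing; // same token already cached: reuse the entry
        }
        if (cache.size() >= maxSize) {
            return null;
        }
        return cache.computeIfAbsent(token, t -> new CacheEntry());
    }
}
```

Because both objects hang off the same entry, a token that is introspected and then used for UserInfo costs only one cache slot.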

@sberyozkin (Member, Author)

@stuartwdouglas @pedroigor it would be good if this PR could make it to 2.3.0.CR1 which is due next Wed as it is a significant improvement for OIDC users. There will be time to tweak a few things if needed before 2.3.0.Final.

@pedroigor (Contributor)

@sberyozkin I'm wondering if it makes sense to use a bounded LinkedHashMap and implement the removeEldestEntry method accordingly, to remove old entries when adding new ones. We would then have some wrapper in front of this map to also check whether an entry is expired when obtaining entries.

It should help to make things simpler and remove the overhead of having a background task?
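The suggestion can be sketched as follows (an illustrative example of the standard LinkedHashMap.removeEldestEntry hook; the class name is invented, this is not code from the PR):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a bounded map that automatically evicts its eldest
// (least recently inserted) entry when a put pushes it past the limit.
// Note that LinkedHashMap is not thread-safe on its own, so real use
// would need external synchronization.
public class BoundedMap<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedMap(int maxEntries) {
        super(16, 0.75f, false); // false = insertion-order iteration
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by put/putAll after an insertion; returning true drops
        // the eldest entry, keeping the map bounded with no background task.
        return size() > maxEntries;
    }
}
```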

@sberyozkin (Member, Author) commented Sep 20, 2021

Hi Pedro @pedroigor Sure, that would be neat, but we'd need to synchronize around the LinkedHashMap? I recall thinking about LinkedHashMap (though not about removeEldestEntry, thanks for reminding me about it :-) ), but the fact that it is not concurrency-ready stopped me...

FYI, running the timer is optional; if it is not running, then when adding an entry to a full map, one of the oldest entries will be removed. The timer would only help with removing stale entries proactively, say every few minutes, but I guess if the max size is, for example, 500-1000, then it is probably not even worth starting it.
I was thinking of having, say, a concurrent blocking queue and keeping it sorted, but I was not sure it was worth it since the sorting would happen on every insert.

I reckon users would plug in custom cache providers in the really demanding production deployments...
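For reference, the JDK's coarse-grained wrapper is one way to make a bounded LinkedHashMap usable from concurrent callers (again an illustrative sketch, not the PR's code, and with the trade-off that every access takes a lock):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: wrap a bounded LinkedHashMap (removeEldestEntry
// evicts the oldest entry past the limit) with Collections.synchronizedMap,
// accepting a single lock on every get/put as the price of safety.
public class SynchronizedBoundedCache {
    public static <K, V> Map<K, V> create(int maxEntries) {
        LinkedHashMap<K, V> bounded = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        return Collections.synchronizedMap(bounded);
    }
}
```

Iteration over such a map still requires manually synchronizing on it, which is part of why the coarse lock can hurt under high concurrency.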

@sberyozkin (Member, Author)

@stuartwdouglas Do you think the default cache has to be reworked? See the comments above. I like Pedro's idea and would not mind exploring more options, but I'm not sure at the moment it is possible. The default cache is meant more to help users get started with experimenting with the cache quickly, as opposed to providing a high-quality implementation that will handle a massive amount of concurrent requests; I reckon users would use Infinispan, Caffeine, etc. if they really need to scale.

@sberyozkin (Member, Author)

Making sure the map is effectively sorted on every addition, such that we can just remove the first entry when the map is full, could be useful, but I'm not sure it would be free of side-effects: too many sorting attempts in a highly concurrent application...

@pedroigor (Contributor)

I'm not pushing for it. Would be nice to hear from Stuart what he thinks.

@stuartwdouglas (Member) left a comment


Oops, looks like I forgot to hit submit on my review.

There are some issues with the implementation, but I also question how useful this is in a Kubernetes environment, when there is no guarantee you will hit the same nodes each time. Basically the more you scale up, the less useful this becomes (e.g. if you have 2 nodes then it will work 50% of the time, if you have 100 it will be 1%).

@sberyozkin (Member, Author)

Hi Stuart, @stuartwdouglas, thanks for the review. I'll give it a try to make this default implementation a bit more robust. I was wondering if it was simpler to drop it totally and just let users register custom ones, but maybe it is worth trying to make it better; it will help in the less demanding cases...

@sberyozkin (Member, Author)

@stuartwdouglas @pedroigor I'll push to 2.4.x as I won't have time to complete it for 2.3.0.CR1, thanks

@sberyozkin sberyozkin force-pushed the oidc_introspection_userinfo_cache branch from d811c6a to d05abe0 Compare September 28, 2021 12:05
@quarkus-bot (bot) commented Sep 28, 2021

This workflow status is outdated as a new workflow run has been triggered.

Failing Jobs - Building d05abe0

Initial JDK 11 Build — failed in the Build step (⚠️ Check); see the logs for details.

@sberyozkin sberyozkin force-pushed the oidc_introspection_userinfo_cache branch from d05abe0 to 494da57 Compare September 28, 2021 12:22
@sberyozkin (Member, Author)

@stuartwdouglas I've tried to address all the change requests:

  • The default cache bean is only registered if it is enabled (true by default; the test cache bean uses @AlternativePriority, which is a safe way to take over the default one). The default cache can also be injected into a custom one if only one of the interfaces has to be customized (though I don't expect users to do that if they decide to provide a custom cache implementation).
  • Added OidcContext<T>, but I'd like to follow up with another PR to deprecate the interfaces in TokenStateManager and TenantConfigResolver, as that would make the recently added new methods deprecated, and I'd also like to remove the methods deprecated there earlier, so it makes sense to have a dedicated PR for that.
  • Vertx.setPeriodic is used.
  • Gave it a try with keeping a separate AtomicInteger cache size counter.

Have a look please
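The periodic clean-up mentioned above could be sketched outside of Vert.x roughly as follows (illustrative names and a plain-Java stand-in for the cached value; in the PR itself the sweep is driven by Vertx.setPeriodic):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: entries record their creation time, reads treat
// entries past the time-to-live as absent, and a periodic task sweeps
// the expired ones out. Time is passed in explicitly for testability.
public class ExpiringCache {
    static final class TimedEntry {
        final String value;
        final long createdAtMillis;
        TimedEntry(String value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final Map<String, TimedEntry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public ExpiringCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    public void put(String key, String value, long nowMillis) {
        cache.put(key, new TimedEntry(value, nowMillis));
    }

    public String get(String key, long nowMillis) {
        TimedEntry e = cache.get(key);
        if (e == null || nowMillis - e.createdAtMillis > ttlMillis) {
            return null; // missing or expired
        }
        return e.value;
    }

    // The periodic task (e.g. scheduled every few minutes) would call this.
    public void removeExpired(long nowMillis) {
        cache.entrySet().removeIf(
                en -> nowMillis - en.getValue().createdAtMillis > ttlMillis);
    }

    public int size() {
        return cache.size();
    }
}
```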

@sberyozkin sberyozkin force-pushed the oidc_introspection_userinfo_cache branch from 494da57 to f35be09 Compare September 28, 2021 18:20
The operator-update-based approach does not provide any real guarantees, as multiple threads can be incrementing and decrementing at the same time; compareAndSet must be used to guarantee sizing.

In addition, this removes the attempt to invalidate an entry if the cache is full. If the cache fills up, then every request would iterate over every entry, which has pretty horrible performance characteristics.
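The compareAndSet sizing described in this commit message can be sketched like this (a simplified illustration with invented names, not the actual PR code): the counter only moves from n to n+1 via a CAS loop that refuses to go past the limit, so concurrent adders cannot overshoot the bound the way independent increment/decrement operations can.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified sketch of CAS-based size accounting: tryAcquireSlot succeeds
// only if it can atomically advance the counter without exceeding maxSize.
public class CasBoundedCounter {
    private final AtomicInteger size = new AtomicInteger();
    private final int maxSize;

    public CasBoundedCounter(int maxSize) {
        this.maxSize = maxSize;
    }

    public boolean tryAcquireSlot() {
        while (true) {
            int current = size.get();
            if (current >= maxSize) {
                return false; // cache full; caller skips adding the entry
            }
            if (size.compareAndSet(current, current + 1)) {
                return true;
            }
            // Another thread won the race; re-read the counter and retry.
        }
    }

    public void releaseSlot() {
        size.decrementAndGet(); // called when an entry is removed
    }

    public int size() {
        return size.get();
    }
}
```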
@stuartwdouglas (Member)

@sberyozkin I pushed some changes that fix some issues; however, I think we have missed the window for 2.3.0, so we will likely need to add back some of these methods with all the different context classes and deprecate them all.

@geoand I am guessing this is too late for 2.3?

@geoand (Contributor) commented Sep 29, 2021

You folks will need to decide within the next hour or so if this is going to be backported :)
Otherwise, I won't have enough time to get the release done today

@stuartwdouglas (Member)

I think backport it if possible, but if it is going to be a pain we can always go down the deprecation route.

@geoand (Contributor) commented Sep 29, 2021

Okay. Go ahead and merge then, and I'll cherry-pick it to my backports branch.

@geoand (Contributor) commented Sep 29, 2021

Just as an FYI, I am going to finish the backports once #20103 is in

@stuartwdouglas stuartwdouglas merged commit cca576a into quarkusio:main Sep 29, 2021
@quarkus-bot quarkus-bot bot added this to the 2.4 - main milestone Sep 29, 2021
@geoand geoand modified the milestones: 2.4 - main, 2.3.0.Final Sep 29, 2021
@sberyozkin sberyozkin deleted the oidc_introspection_userinfo_cache branch September 29, 2021 08:54

Successfully merging this pull request may close these issues.

OIDC Introspection and Userinfo Cache
4 participants