[Bug] Starvation: AAD Token Validation #2774
Comments
@serpentfabric, are you seeing the same behavior on the latest version, 8.0.2? In 8.0.2 we lock only on the first call to the metadata endpoint rather than on every call, which might affect what you are seeing. Thanks!
We have the same problem: we see a lot of requests going to the well-known endpoint, even though AutomaticRefreshInterval should be 12 hours and RefreshInterval is 5 minutes. It seems to happen more when the app gets a lot of incoming traffic. We use version 8.0.2 of the library. One thing we noticed in the codebase is that this line (Line 316 in a9380ab) always sets _syncAfter to 5 minutes, so the AutomaticRefreshInterval of 12 hours is used only once, after the initial retrieval, when the _currentConfiguration object is still null. After the first call to the UpdateCurrentConfiguration method, the automatic refresh happens every 5 minutes.
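To make the reported effect concrete, here is a toy model of the scheduling described in the comment above. It is not the library's code: the function names are hypothetical, the two intervals are the values quoted in the comment, and it assumes a retrieval happens exactly when the sync time is reached.

```python
from datetime import datetime, timedelta

AUTOMATIC_REFRESH_INTERVAL = timedelta(hours=12)  # value quoted in the comment
REFRESH_INTERVAL = timedelta(minutes=5)           # value quoted in the comment
START = datetime(2024, 1, 1)                      # arbitrary start time


def next_sync_reported(now, has_configuration):
    # Behavior as reported: the long interval applies only while no
    # configuration has been retrieved yet; afterwards the short one is used.
    if not has_configuration:
        return now + AUTOMATIC_REFRESH_INTERVAL
    return now + REFRESH_INTERVAL


def next_sync_expected(now, has_configuration):
    # What the commenter expected: automatic refreshes always use the
    # long interval.
    return now + AUTOMATIC_REFRESH_INTERVAL


def count_retrievals(next_sync, horizon):
    # Count metadata retrievals in [START, START + horizon], assuming one
    # retrieval happens each time the scheduled sync time is reached.
    now, end = START, START + horizon
    retrievals = 0
    has_configuration = False
    while now <= end:
        retrievals += 1
        now = next_sync(now, has_configuration)
        has_configuration = True
    return retrievals
```

Under these assumptions, a 24-hour window sees 146 retrievals with the reported scheduling and 3 with the expected one.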
Also, here are some logs from the app. They show a single call to an endpoint that requires an authenticated user, where the token is malformed, but as you can see it results in 3 separate calls to the metadata endpoint:
And this line of code seems to be incorrect.
According to the documentation, the comparand is in position 3 and the value to set is in position 2, so the call compares the current state with running and replaces it with idle when they are equal. Probably it should be the other way around?
The if case seems correct: if the current state is idle, it will run the update, and if it is running, it will not. But this code will never set the state to running, and there seems to be no other place in the codebase that sets the _configurationRetrieverState flag to running.
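The argument-order point can be illustrated with a small emulation. .NET's Interlocked.CompareExchange(ref location1, value, comparand) stores value only when the current value equals comparand, and returns the original value. The sketch below mimics that contract in Python; AtomicInt, try_start_swapped, and try_start_fixed are hypothetical names, and the guard conditions are a guess at the shape of the check, not a copy of the library's code.

```python
import threading

IDLE, RUNNING = 0, 1


class AtomicInt:
    """Tiny stand-in for an int updated with Interlocked.CompareExchange."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def compare_exchange(self, value, comparand):
        # Same parameter order as the .NET call: store `value` only if the
        # current value equals `comparand`; always return the original value.
        with self._lock:
            original = self._value
            if original == comparand:
                self._value = value
            return original

    def get(self):
        with self._lock:
            return self._value


def try_start_swapped(state):
    # Order described in the comment: value=IDLE, comparand=RUNNING.
    # An idle state is never changed, so every caller is allowed to start.
    return state.compare_exchange(IDLE, RUNNING) != RUNNING


def try_start_fixed(state):
    # value=RUNNING, comparand=IDLE: exactly one caller observes IDLE and
    # flips the state, so only that caller starts the refresh.
    return state.compare_exchange(RUNNING, IDLE) == IDLE
```

With the swapped order, two consecutive calls on an idle state both return True and the state stays idle; with the fixed order, the first call returns True and the second returns False.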
We found that after updating to version 8.0 of Microsoft.AspNetCore.Authentication.JwtBearer, we got a lot more configuration refresh requests from this line: Line 946 in c96b0d2
This occurs when receiving a token with an invalid signing key or issuer. Prior to dotnet/aspnetcore#49541, Line 928 in c96b0d2
The fix for this has been released in 8.1.1.
Which version of Microsoft.IdentityModel are you using?
Note that to get help, you need to run the latest version.
Where is the issue?
Is this a new or an existing app?
existing
Repro
If you send 200 simultaneous requests to an API when the application first starts, the application is unresponsive for up to a minute while, we believe, it makes requests to AAD to get the metadata required to validate the tokens on those requests. After "warming up", the issue goes away.
I think I found the cause. And I think I found a workaround. But I reckon it's something worthy of a fix.
Cause
The server is configured to use AAD here:
When the 200 requests arrive at once, the code in AadIssuerValidator calls GetBaseConfigurationAsync, which in turn calls GetConfigurationAsync, which in turn fails the following check for each of the first 200 (or however many concurrent requests came in) calls:
The first request takes the lock, and the remaining 199 requests all queue up waiting for it...
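The queueing described above, and the effect of re-checking the cache after acquiring the lock, can be sketched as follows. This is a generic illustration of double-checked locking, not the library's implementation; MetadataCache, fetches_for_burst, and the barrier hook are hypothetical. The barrier exists only to force every caller to observe an empty cache before any fetch completes, which reproduces the startup burst deterministically.

```python
import threading


class MetadataCache:
    def __init__(self, fetch, recheck_under_lock, barrier=None):
        self._fetch = fetch
        self._recheck_under_lock = recheck_under_lock
        self._barrier = barrier          # demo hook, see lead-in
        self._lock = threading.Lock()
        self._config = None

    def get(self):
        if self._config is not None:     # fast path once warmed up
            return self._config
        if self._barrier is not None:
            self._barrier.wait()         # all callers have now seen "empty"
        with self._lock:
            # Second check: a caller that waited for the lock reuses what
            # the previous lock holder just fetched.
            if self._recheck_under_lock and self._config is not None:
                return self._config
            self._config = self._fetch()
            return self._config


def fetches_for_burst(recheck_under_lock, callers=20):
    fetch_count = 0

    def fetch():
        nonlocal fetch_count
        fetch_count += 1                 # only called while holding the lock
        return {"signing_keys": ["placeholder"]}

    cache = MetadataCache(fetch, recheck_under_lock, threading.Barrier(callers))
    threads = [threading.Thread(target=cache.get) for _ in range(callers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fetch_count
```

Without the second check, each of the 20 callers performs its own fetch in turn; with it, only the first lock holder fetches and the others return the cached value.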
Workaround
We created a startup probe that runs before the container accepts traffic. It runs the following code, which hydrates the metadata using the config options:
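The shape of such a warm-up probe might look like the sketch below. It is a language-neutral illustration, not the code from the issue: WarmedMetadata, hydrate, and startup_probe are hypothetical names, and the fetch callable stands in for whatever retrieves the metadata from the configured authority.

```python
class WarmedMetadata:
    """Holds metadata that is loaded once, before traffic is accepted."""

    def __init__(self, fetch):
        self._fetch = fetch      # hypothetical callable that loads metadata
        self._config = None

    def hydrate(self):
        # Load the metadata eagerly; do nothing if it is already present.
        if self._config is None:
            self._config = self._fetch()

    def get(self):
        # Request path: hydrates lazily only if the probe never ran.
        self.hydrate()
        return self._config


def startup_probe(cache):
    # HTTP-style result for the orchestrator: 200 once the metadata is
    # loaded, 503 if loading failed and the probe should be retried.
    try:
        cache.hydrate()
    except Exception:
        return 503
    return 200
```

Because the orchestrator routes traffic only after the probe returns 200, the first real requests find the metadata already cached and never reach the fetch path.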
Expected behavior
The metadata is already available before the application can receive the requests or it re-uses the recently fetched metadata once subsequent requests get the lock.
Actual behavior
All requests waiting for the lock re-retrieve the token's metadata and signing keys, regardless of whether they were fetched recently enough.
Possible solution
i haven't given a lot of thought to how this system works, so i reckon my workaround isn't ideal, but i think it's enough to inspire the right kind of change?
Additional context / logs / screenshots / links to code
Here you can see the cache size growing for requests when it should just be a single one: