-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix responses for the token APIs #54532
Conversation
This commit fixes our behavior for the use of token related APIs. More concretely: - In the Get Token API with the `refresh` grant, when an invalid (already deleted, malformed, unknown) refresh token is used in the body of the request, we respond with 400 and an error_description header with the message "could not refresh the requested token" as opposed to sometimes doing that and sometimes returning `401` and a message "token malformed" - In the Invalidate Token API, when using an invalid (already deleted, malformed, unknown) access or refresh token, we respond with 200 and a body that shows that no tokens were invalidated: ``` { "invalidated_tokens":0, "previously_invalidated_tokens":0, "error_count":0 } ``` as opposed to the current behavior which was to throw an error with 400 or 401 ( depending on the case )
Pinging @elastic/es-security (:Security/Authentication) |
@@ -48,7 +48,7 @@ POST /_security/oidc/logout | |||
"refresh_token": "vLBPvmAB6KvwvJZr27cS" | |||
} | |||
-------------------------------------------------- | |||
// TEST[catch:unauthorized] | |||
// TEST[catch:request] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The (expected for this test) 500 was masked by the 401
that was thrown in
Line 55 in d029a13
invalidateRefreshToken(request.getRefreshToken(), ActionListener.wrap(ignore -> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary because the response changes to 200 and error now happens in the OIDC part as 500?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is necessary because we don't do the login flow so we don't have tokens to invalidate for the logout. We expect that this call would fail for that reason - but it failing with a 401
was a mistake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please help me understand the motivation of returning 200
for more "invalid" token invalidation requests. It feels like a form of leniency to me. For the same "not found" token, what is the benefit of returning a 200
for "invalidation requeset" compared to 400
for "refresh request"? Do we have a specific use case for it? Why not make them consistent?
Hiding internal errors could be misleading at times. I'd prefer more accurate and granular error responses, i.e. error code and description that precisely describe the underlying issue (unless forbidden by security policy, e.g. password enumeration). Elasticsearch in general has pretty elaborated error messages. We even have error_trace
parameter to show the entire stacktrace. So my understanding is that we are OK to tell users the actual issues. The current tokenService code sometimes hides the actual issue by a generic 401
malformed error. This can be improved. But I am not sure whether the answer is to replace all of them with 200
.
There are many things that can go wrong with a request. To return 200
for error situations, we need carefully define what error types qualify it, which I think is tricky. There are gray areas. For an example, this variant of invalidateAccessToken returns 400
for null
user and this is different from the new 200
logic. Another example is that I could even argue that an "empty string" should be treated the same as malformated ones and also gets a 200
response.
In summary, I am not convinced that more 200 responses are what we want. But I am open to discussions.
@@ -584,7 +585,8 @@ public void invalidateAccessToken(String accessToken, ActionListener<TokensInval | |||
final Iterator<TimeValue> backoff = DEFAULT_BACKOFF.iterator(); | |||
decodeToken(accessToken, ActionListener.wrap(userToken -> { | |||
if (userToken == null) { | |||
listener.onFailure(traceLog("invalidate token", accessToken, malformedTokenException())); | |||
logger.trace("The access token [{}] is expired and already deleted", accessToken); | |||
listener.onResponse(TokensInvalidationResult.emptyResult()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this change, a 200 response will be returned if the token index does not exist. This behaviour is different from the refresh token invalidation, which returns 400 invalidGrant when the index does not exist.
@@ -48,7 +48,7 @@ POST /_security/oidc/logout | |||
"refresh_token": "vLBPvmAB6KvwvJZr27cS" | |||
} | |||
-------------------------------------------------- | |||
// TEST[catch:unauthorized] | |||
// TEST[catch:request] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is necessary because the response changes to 200 and error now happens in the OIDC part as 500?
@@ -872,7 +891,7 @@ private void findTokenFromRefreshToken(String refreshToken, Iterator<TimeValue> | |||
} | |||
} catch (IOException e) { | |||
logger.debug(() -> new ParameterizedMessage("Could not decode refresh token [{}].", refreshToken), e); | |||
listener.onFailure(malformedTokenException()); | |||
listener.onResponse(SearchHits.empty()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All usage of malformedTokenException
seems to be deleted so itself can be removed as well.
Sure thing. This is not about leniency. So, in principal when we get a request to invalidate one/more tokens and we can't find the token in our token index:
Our current behavior is broken as we would reply with:
and I'm trying to figure out how to best solve it. I took the stance that we shouldn't care about why the token we received can't be invalidated since it's already invalid. I'm open to discuss this of course. Is your problem with the response code specifically ? I selected that because the request "succeeds in its desired outcome" and I felt the response body was an indication enough. Would you think a
Because it's not the same thing IMO. When the request is to refresh a token and the token is malformed, then we can't refresh the token and we cannot satisfy the user request so we return a 400 saying it's the callers fault because the token they sent was not valid. When the request is to invalidate the token the caller sends us a refresh token that they think is valid and their purpose is to make this token "not valid". If we determine that the token is not valid either way, the user request is satisfied (because the token is invalid) and we indicate in the response body that we did not invalidate the token now with
The more I think about it though, the more a 4xx seems better suited
I don't know if I agree.. Why would someone attempting to authenticate with an access token care if this failed because we couldn't base64 decode a string or because the version number of the encoded token is not what we expected or because we couldn't perform a search in an internal index. These are internal implementation details, are prone to change and offer no value to the caller. The caller should treat the access token as an opaque string with a binary status : valid or invalid. The information on why we failed might be interesting for an administrator and this is why we log it.
TBC, this is not what this PR attempts to do. It's not a blanket change to always return 200 everywhere, it affects the calls to the invalidate API.
Agreed :/
I didn't change that on purpose because it is called from TransportSamlInvalidateSessionAction with a token that we just found in our security index ourselves and not with a token that a user sent us via the API
This breaks the contract of our Invalidate Token API though ( that you must provide a non empty string ) and thus we fail on validation on the Request level. Maybe we could chat about this tomorrow too and save us some back 'n forth, I'll hit you up in my morning ! |
I had a discussion with @albertzaharovits now about this too and we came to an agreement on :
I'll adust the code, @ywangd we can still discuss this in your afternoon ! |
That sounds like it will be a breaking change for Kibana though, unless I'm missing something. We never expected
Are you still planning this change for 7.7.0? |
@azasypkin I was under the impression that it is broken now for kibana ( as you do get 4xx for already invalidated tokens - but with misleading errors ), and this is the motivation for changing this behavior. TBC, nothing we're doing here changes the current behavior when you'd get a
I'm treating this as a bug since it doesn't change a correct behavior but fixes an incorrect one. As such (and assuming that we don't figure out that it does change a behavior in a way that is breaking for kibana ) if this is merged in time to make it to a BC for 7.7, it will be in 7.7, otherwise it will be in 7.7.1 |
@jkakavas I understand that the intention is to bring consistency to invalidate token calls for both access and refresh tokens. My main question was about when and why we should return a Base on the analysis so far and existing code behaviours, the following status code seems to make sense to me:
As for the response content, I'd personally prefer to tell user the failure reason. Maybe not to the details of base64 decoding failure, rather something like "malformed" or "not found". But the status code already signifies it. So I won't cling to my idea. I am available for discussion tomorrow when you have time. Thanks @azasypkin Correct me if I am wrong, but I think Kibana should not be affected by these non-200 codes since I'd assume when Kibana invalidates the tokens, they should all fall into above cases where |
Ah, I must have misunderstood then, that's good to know! If in all the cases when we were returning 200 we'll continue to return 200 then we're good, thanks and sorry for the noise! |
And that is a valid question that made me rethink this through and decide that we shouldn't return 200. 💯
✔️ . The caveat is that we can never know for sure if this an access token that was at some point valid but now expired and removed OR a token that is not meant for us.
i disagree on that. I think 404 is appropriate. Access tokens should be opaque strings for the clients, they shouldn't care / know / be aware of whether it can be decoded on our side. It's not iike they are encoding or producing these tokens on the client side so that such an error message would be useful / an indication to do something differently. It's a server side implementation detail. We discussed this with Albert too and came down to that we could do it but don't see the benefit in it. I was also thinking we probably want to keep the 404 vs X distinction and only return X when we want to signal to the caller that they should retry the request. Now,
I guess
✔️ but more precisely, token that has already expired but not for more than 24hrs since we would have cleaned over the document. After ~24hrs from a token expiration this would fall in the "Token document not found" case.
✔️ With the same caveat as above. Most things make sense and corresponds to what we were laying out in #54532 (comment)
see my comments above, I'm still not persuaded this is something we should do. |
@jkakavas Thanks a lot for your detailed response and background knowledge. I appreciate that. 👍 I feel we are progressing here and have narrowed down the discussions.
This is a great one. We have two choices here: 1) Lastly, for |
It's definitely possible, but we haven't heard such complaints yet. Likely because there are just a few cases when user can get into such situation:
Kibana would definitely prefer 200 since we know that tokens we send are always well-formed and were created by Elasticsearch, but they can be:
To summarize:
|
I think this is up to Kibana to consume and not up to us to change. We will define the status code that we return in such cases and Kibana will make sure that the correct behavior is presented to the user.
We basically can't find a token document for that token. The reason why ( we can't decode the token string in order to make a search request) is - to me - an implementation detail.
As I mentioned before, end users do not care about this. If this comes up for troubleshooting, it will be from the perspective of an administrator that has access to logs. I still prefer the simplicity of 2 error codes (
We discussed whether we should do this also yesterday. We came down to the conclusion that we shouldn't. The reasoning is that this
it means there were 11 tokens and we could invalidate only 8 of them, so there are 3 that are still valid. If we return |
Let's go with this then. As for the Thanks for all the discussions. It was good learning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments and the mock needs to be fixed for the failed tests, otherwise LGTM. Thanks for the iterations.
I think a 400 might be better to signal that it's the caller's fault instead of 404 which can also be observed when a token is simply invalidated and deleted
Do you still plan to do this?
when(license.isTokenServiceAllowed()).thenReturn(true); | ||
} | ||
|
||
public void testInvalidateTokensWhenIndexUnavailable() throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worthwhile to test "unavailable" index in TokenAuthIntegTests
by closing the token index before invalidation?
I went back and forth to be honest, but I'll do this as it feels the right thing to do |
Feel free to leave it for the next round if you need more time to ponder on it. |
Wouldn't that mostly be testing |
Not entirely. A closed index makes My understanding is that you wanted to test the code behaviour when above failure happens? |
Yes, but what we check is what happens when Now, that I will be adding different behavior for exists vs available for the invalidate related methods, I'll add a test case for closed as this still triggers unavailable but should return 400 and not 503. If that's what you meant above +1, I didn't make the connection |
My original comment of "a closed index is counted as an unavailable index" was to your comment of
I thought your preference was to trigger the index to be unavailable in |
Gotcha, the original comment (#54532 (comment)) is presented out of context in UI :/ instead of as a response to my original comment ( that I had forgotten I had made up until now ) |
Thanks for all the comments and discussion @ywangd , I changed the behavior to differentiate between closed/not-existent and unavailable in the invalidate token API, take a look if you please to validate that this is what you had in mind too |
if (e instanceof IndexNotFoundException || e instanceof IndexClosedException) { | ||
listener.onFailure(new ElasticsearchSecurityException("failed to invalidate token", RestStatus.BAD_REQUEST)); | ||
} else { | ||
listener.onFailure(unableToPerformAction(e)); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to have the new logic encapsulated inside unableToPerformAction(e)
. Otherwise it looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unableToPerformAction
is meant to be a simple wrapper to throw an ESS with 503. If your issue is with the duplication of these 4 lines and you feel strongly about this, I can add method but I don't see so much value in it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I often apply rule of 3 and these code is duplicated 2 times, not 3 or more. So, no, I don't feel strongly about it.
This commit fixes our behavior regarding the responses we return in various cases for the use of token related APIs. More concretely: - In the Get Token API with the `refresh` grant, when an invalid (already deleted, malformed, unknown) refresh token is used in the body of the request, we respond with `400` HTTP status code and an `error_description` header with the message "could not refresh the requested token". Previously we would return erroneously return a `401` with "token malformed" message. - In the Invalidate Token API, when using an invalid (already deleted, malformed, unknown) access or refresh token, we respond with `404` and a body that shows that no tokens were invalidated: ``` { "invalidated_tokens":0, "previously_invalidated_tokens":0, "error_count":0 } ``` The previous behavior would be to erroneously return a `400` or `401` ( depending on the case ). - In the Invalidate Token API, when the tokens index doesn't exist or is closed, we return `400` because we assume this is a user issue either because they tried to invalidate a token when there is no tokens index yet ( i.e. no tokens have been created yet or the tokens index has been deleted ) or the index is closed. - In the Invalidate Token API, when the tokens index is unavailable, we return a `503` status code because we want to signal to the caller of the API that the token they tried to invalidate was not invalidated and we can't be sure if it is still valid or not, and that they should try the request again. Resolves: elastic#53323
This commit fixes our behavior regarding the responses we return in various cases for the use of token related APIs. More concretely: - In the Get Token API with the `refresh` grant, when an invalid (already deleted, malformed, unknown) refresh token is used in the body of the request, we respond with `400` HTTP status code and an `error_description` header with the message "could not refresh the requested token". Previously we would return erroneously return a `401` with "token malformed" message. - In the Invalidate Token API, when using an invalid (already deleted, malformed, unknown) access or refresh token, we respond with `404` and a body that shows that no tokens were invalidated: ``` { "invalidated_tokens":0, "previously_invalidated_tokens":0, "error_count":0 } ``` The previous behavior would be to erroneously return a `400` or `401` ( depending on the case ). - In the Invalidate Token API, when the tokens index doesn't exist or is closed, we return `400` because we assume this is a user issue either because they tried to invalidate a token when there is no tokens index yet ( i.e. no tokens have been created yet or the tokens index has been deleted ) or the index is closed. - In the Invalidate Token API, when the tokens index is unavailable, we return a `503` status code because we want to signal to the caller of the API that the token they tried to invalidate was not invalidated and we can't be sure if it is still valid or not, and that they should try the request again. Resolves: #53323
This commit fixes our behavior regarding the responses we return in various cases for the use of token related APIs. More concretely: - In the Get Token API with the `refresh` grant, when an invalid (already deleted, malformed, unknown) refresh token is used in the body of the request, we respond with `400` HTTP status code and an `error_description` header with the message "could not refresh the requested token". Previously we would return erroneously return a `401` with "token malformed" message. - In the Invalidate Token API, when using an invalid (already deleted, malformed, unknown) access or refresh token, we respond with `404` and a body that shows that no tokens were invalidated: ``` { "invalidated_tokens":0, "previously_invalidated_tokens":0, "error_count":0 } ``` The previous behavior would be to erroneously return a `400` or `401` ( depending on the case ). - In the Invalidate Token API, when the tokens index doesn't exist or is closed, we return `400` because we assume this is a user issue either because they tried to invalidate a token when there is no tokens index yet ( i.e. no tokens have been created yet or the tokens index has been deleted ) or the index is closed. - In the Invalidate Token API, when the tokens index is unavailable, we return a `503` status code because we want to signal to the caller of the API that the token they tried to invalidate was not invalidated and we can't be sure if it is still valid or not, and that they should try the request again. Resolves: elastic#53323
This commit fixes our behavior regarding the responses we return in various cases for the use of token related APIs. More concretely: - In the Get Token API with the `refresh` grant, when an invalid (already deleted, malformed, unknown) refresh token is used in the body of the request, we respond with `400` HTTP status code and an `error_description` header with the message "could not refresh the requested token". Previously we would return erroneously return a `401` with "token malformed" message. - In the Invalidate Token API, when using an invalid (already deleted, malformed, unknown) access or refresh token, we respond with `404` and a body that shows that no tokens were invalidated: ``` { "invalidated_tokens":0, "previously_invalidated_tokens":0, "error_count":0 } ``` The previous behavior would be to erroneously return a `400` or `401` ( depending on the case ). - In the Invalidate Token API, when the tokens index doesn't exist or is closed, we return `400` because we assume this is a user issue either because they tried to invalidate a token when there is no tokens index yet ( i.e. no tokens have been created yet or the tokens index has been deleted ) or the index is closed. - In the Invalidate Token API, when the tokens index is unavailable, we return a `503` status code because we want to signal to the caller of the API that the token they tried to invalidate was not invalidated and we can't be sure if it is still valid or not, and that they should try the request again. Backport of #54532
This commit fixes our behavior for the use of token related APIs.
More concretely:
In the Get Token API with the
refresh
grant, when an invalid(already deleted, malformed, unknown) refresh token is used in the
body of the request, we respond with
400
HTTP status codeand an
error_description
header with the message "could notrefresh the requested token".
Previously we would return erroneously return a
401
with "tokenmalformed" message.
In the Invalidate Token API, when using an invalid (already
deleted, malformed, unknown) access or refresh token, we respond
with
404
and a body that shows that no tokens were invalidated:The previous behavior would be to erroneously return
a
400
or401
( depending on the case ).In the Invalidate Token API, when the tokens index doesn't
exist or is closed, we return
400
because we assume this isa user issue either because they tried to invalidate a token
when there is no tokens index yet ( i.e. no tokens have
been created yet or the tokens index has been deleted ) or the
index is closed.
In the Invalidate Token API, when the tokens index is
unavailable, we return a
503
status code becausewe want to signal to the caller of the API that the token they
tried to invalidate was not invalidated and we can't be sure
if it is still valid or not, and that they should try the request
again.
Resolves: #53323