404 vs 403 for unauthorized requests to prevent data leaks #11

mkokotovich · 2022-05-05T18:02:59Z · Collaborator

We had a discussion in the ART 3 arch sync today about the return code for a request to an endpoint to retrieve quotes from SF:
GET /v1/opportunity/:opp-id/quotes

The endpoint performs authorization on these requests to make sure the user has access to the quotes.

It uses the org ID from the me response to get the SF account linked to the user's identity organization.
It then looks up the opportunity in SF and finds the account it is linked to.
If those two accounts match, then the user has access.
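A minimal sketch of the account-matching check described above, in Python. It assumes hypothetical in-memory dictionaries standing in for the org-to-SF-account mapping and the SF opportunity lookup; the real endpoint queries Salesforce, and all names here are illustrative only.

```python
from typing import Optional

# Hypothetical stand-ins for the real lookups; the actual service queries SF.
ORG_TO_SF_ACCOUNT = {"org-1": "acct-A", "org-2": "acct-B"}
OPPORTUNITY_TO_SF_ACCOUNT = {"opp-100": "acct-A", "opp-200": "acct-B"}


def sf_account_for_org(org_id: str) -> Optional[str]:
    """Return the SF account linked to the user's identity organization, if any."""
    return ORG_TO_SF_ACCOUNT.get(org_id)


def sf_account_for_opportunity(opp_id: str) -> Optional[str]:
    """Return the SF account an opportunity is linked to, or None if it doesn't exist."""
    return OPPORTUNITY_TO_SF_ACCOUNT.get(opp_id)


def user_can_view_quotes(org_id: str, opp_id: str) -> bool:
    """True only when both lookups succeed and the two accounts match."""
    user_account = sf_account_for_org(org_id)
    opp_account = sf_account_for_opportunity(opp_id)
    return user_account is not None and user_account == opp_account
```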

Our discussion centered on what to return if the two accounts don't match. A 403 makes sense, because the user is not authorized to view that account. However, even a bare 403 leaks some information to this user: it tells them that an opportunity with that ID exists. A user could use this information to do other nefarious things.

You may have tried to open something in GitHub in a new browser (where you aren't signed in) and noticed that it returns a 404.

They are following this same principle: don't leak the information that an entity exists to a user without access.
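Applied to the quotes endpoint, that principle might look like the sketch below. The data and the handler shape (returning a status code and a body) are hypothetical; the point is only that "does not exist" and "exists but belongs to another account" produce the same response.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical data: opportunity ID -> owning SF account, and its quotes.
OPPORTUNITY_TO_SF_ACCOUNT = {"opp-100": "acct-A", "opp-200": "acct-B"}
QUOTES = {"opp-100": ["quote-1", "quote-2"], "opp-200": ["quote-3"]}


def get_quotes(user_account: str, opp_id: str) -> tuple[int, dict]:
    """Sketch of GET /v1/opportunity/:opp-id/quotes that hides existence."""
    opp_account: Optional[str] = OPPORTUNITY_TO_SF_ACCOUNT.get(opp_id)
    if opp_account is None or opp_account != user_account:
        # Missing and foreign opportunities share one response,
        # so the caller cannot tell the two cases apart.
        return HTTPStatus.NOT_FOUND, {"error": "Opportunity not found"}
    return HTTPStatus.OK, {"quotes": QUOTES[opp_id]}
```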

Identity Service does this too with password resets, failed logins, etc. We make sure the "password is incorrect" errors use the same response as the "email not found" errors, because we don't want to leak whether or not an email is in our system.
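For illustration, a login check following that rule might look like this. It is a sketch assuming a hypothetical in-memory user store and PBKDF2 hashing, not necessarily what Identity Service actually does. Both failure paths return the same message, and unknown emails are hashed against dummy credentials so the amount of work (and therefore timing) is similar.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000
GENERIC_ERROR = "Invalid email or password"


def hash_password(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)


# Hypothetical user store: email -> (salt, password hash).
_salt = os.urandom(16)
USERS = {"alice@example.com": (_salt, hash_password("correct horse", _salt))}

# Dummy credentials so unknown emails cost the same hashing work as known ones.
_DUMMY_SALT = os.urandom(16)
_DUMMY_HASH = hash_password("not-a-real-password", _DUMMY_SALT)


def login(email: str, password: str) -> tuple[bool, str]:
    salt, expected = USERS.get(email, (_DUMMY_SALT, _DUMMY_HASH))
    candidate = hash_password(password, salt)
    # compare_digest avoids short-circuit timing differences in the comparison.
    matches = hmac.compare_digest(candidate, expected)
    if email in USERS and matches:
        return True, "ok"
    # Same message whether the email is unknown or the password is wrong.
    return False, GENERIC_ERROR
```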

I think we should update https://github.com/SPSCommerce/sps-api-standards/blob/main/standards/authentication.md to reflect this, and make it clear that 404 is probably preferred to 403 in most cases.

JamesStauffer · 2022-05-06T02:02:44Z · Collaborator

Should the suggestion to return 404 instead of 403 only apply when the path has identifiers?
Does it matter if we return a 403 when the request doesn't include any identifiers (e.g. /identity/users/me)?
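One way to express the rule being asked about, as a sketch only: it borrows the `:param` route-template syntax from the endpoint above, and the "mask with 404 only when the path has an identifier" policy is a hypothetical option, not an agreed standard.

```python
def has_path_identifier(route_template: str) -> bool:
    """True if any path segment is a parameter such as ':opp-id'."""
    return any(segment.startswith(":") for segment in route_template.split("/"))


def status_for_denied(route_template: str) -> int:
    """Hypothetical rule: hide existence (404) only when the path names a resource."""
    return 404 if has_path_identifier(route_template) else 403
```

With no identifier in the path (as in /identity/users/me) there is no entity whose existence could leak, so a plain 403 gives nothing away.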


travisgosselin · 2022-05-06T13:25:37Z · Maintainer

I definitely appreciate this perspective on making sure we are not leaking information. This exact question was discussed in the working group on Nov 8 last year, and I really wish we had detailed notes on the reasoning from that time; unfortunately, the meeting recording aged out about a month ago and has been removed. This is why I have been encouraging us to use GitHub Discussions going forward, so we can better track and refer back to the results.

If I recall correctly, @jwineinger, @eggilbert, and @alexander-ivakhnenko were all present and key contributors to that discussion. I'm hoping you can each take a minute to review and share your thoughts here.

From what I recall, one key difference with our API is that access is always authenticated and comes from subscribed or paying organizations. This is not a free API that anyone can sign up for and use, which does limit the pool of potential bad actors; nevertheless, that alone is likely a weak argument either way.

You could also reasonably reverse this, which I believe is how AWS S3-style resource lookups work: instead of returning a 404 for things you don't have permission to see (like your GitHub repo example), you return a 403 for everything you don't have permission to see, including resources that do not exist. In some business scenarios this might make more sense to the engineers consuming the API, because we are telling them they are forbidden from knowing whether something exists rather than saying it doesn't exist for them, which is a little easier to interpret.
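The reversed, "403 for everything" variant might be sketched like this. The data and handler are hypothetical, and no AWS API is involved; the S3 comparison is only as recalled above.

```python
from http import HTTPStatus

# Hypothetical data: opportunity ID -> owning SF account.
OPPORTUNITY_TO_SF_ACCOUNT = {"opp-100": "acct-A", "opp-200": "acct-B"}


def get_opportunity(user_account: str, opp_id: str) -> tuple[int, dict]:
    """Return 403 for anything the caller may not see, including IDs that don't exist."""
    if OPPORTUNITY_TO_SF_ACCOUNT.get(opp_id) != user_account:
        # Missing and foreign opportunities are indistinguishable: both are "forbidden".
        return HTTPStatus.FORBIDDEN, {"error": "Access denied"}
    return HTTPStatus.OK, {"id": opp_id}
```

Either variant hides existence equally well; the difference is only which status code the consumer sees for "not yours or not there".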

I think this may also be use-case specific, to a small degree, depending on how sensitive the content is; I wonder whether this poses a real risk in our domain. Perhaps there should be qualifiers on when to use this pattern, rather than applying it globally? In your example, can you expand on how knowing that an opportunity exists could be used nefariously? (I'm not implying that you're wrong at all; I just want to understand the specifics a bit more in this discussion.)

This particular passage from RFC 2616 (section 10.4.4, 403 Forbidden) is really helpful in pointing towards your suggestion, Koko, as the general industry intent:
https://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html#sec10.4.4

The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated. If the request method was not HEAD and the server wishes to make public why the request has not been fulfilled, it SHOULD describe the reason for the refusal in the entity. If the server does not wish to make this information available to the client, the status code 404 (Not Found) can be used instead.

1 reply

travisgosselin May 9, 2022
Maintainer

We had some discussion on this today. There is a sense that it may be difficult to formulate an all-encompassing statement, but adding a line about using a 403 to avoid leaking essential information may make sense. I lean towards a 403, since a 404 Not Found can be more confusing than a 403 (i.e. "you don't have access to know whether it exists" vs. assuming you don't have access because it wasn't found).

@jwineinger and @alexander-ivakhnenko had some opinions as well that are probably worth sharing.

omnipitous · 2022-08-25T16:07:55Z

omnipitous
Aug 25, 2022
Collaborator

We've talked a lot about 404s in the context of "object missing vs. invalid URL", but this aspect has come up too. For me this falls into the realm of "maybe better for security but worse for functionality, and does it actually make us any safer?" For support, debugging, and client functionality, accurate response codes are valuable, so we take a hit by making such a change.

On the other side of the coin: sure, an attacker can use response codes to farm the "shape" of an API and to enumerate certain identifiers. For the former, we're publishing the spec, so it's not really protected data anymore. For the latter, the Identity example could be qualified as "don't leak PII": if your query params will return a 403 or not based on some PII, then return a consistent response code so the endpoint can't be used to farm for PII.

I do prefer the suggestion to always return a 403 instead of a 404, since we already have a usage collision on 404s from raw HTTP, so I would rather overload 403. To Travis's comment on the nature of our APIs: since all of our APIs are secured, we are at least limited to "existing customers trying to suss out data on other customers" rather than "external bad actors trying to penetrate our systems".
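A sketch of the "don't leak PII" qualifier, using a hypothetical search-by-email endpoint (the endpoint and data are illustrative, not an existing SPS API). Results are scoped to the caller's org and the status code never varies, so an email belonging to another org looks exactly like an unknown email.

```python
# Hypothetical user records: (org_id, email).
USERS = [("org-1", "alice@example.com"), ("org-2", "bob@example.com")]


def search_users_by_email(caller_org: str, email: str) -> tuple[int, list[str]]:
    """Sketch of GET /users?email=... scoped to the caller's org."""
    matches = [e for org, e in USERS if org == caller_org and e == email]
    # Always 200: an email in another org and an unknown email both yield [],
    # so neither the status code nor the body can be used to farm for PII.
    return 200, matches
```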

1 reply

JamesStauffer Sep 3, 2022
Collaborator

Good points. The API spec will likely show the pattern for the ID, so I don't see much risk in an attacker knowing whether value x exists. If there is a risk there, maybe it is better handled by security testing than by security through obscurity.
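As a sketch of what such a security test could check, assuming a hypothetical handler that masks foreign IDs as 404: the test asserts that a foreign ID and a nonexistent ID are indistinguishable, while the owner still gets a 200.

```python
import unittest

# Hypothetical handler under test: 404 for both missing and foreign opportunities.
OPPORTUNITY_TO_SF_ACCOUNT = {"opp-100": "acct-A", "opp-200": "acct-B"}


def get_quotes(user_account: str, opp_id: str) -> tuple[int, dict]:
    if OPPORTUNITY_TO_SF_ACCOUNT.get(opp_id) != user_account:
        return 404, {"error": "Opportunity not found"}
    return 200, {"quotes": []}


class NoExistenceLeakTest(unittest.TestCase):
    def test_foreign_and_missing_ids_are_indistinguishable(self):
        foreign = get_quotes("acct-A", "opp-200")  # exists, belongs to another account
        missing = get_quotes("acct-A", "opp-999")  # does not exist
        self.assertEqual(foreign, missing)

    def test_owner_still_gets_quotes(self):
        self.assertEqual(get_quotes("acct-A", "opp-100")[0], 200)
```

Saved as a file named like `test_quotes.py`, this runs under `python -m unittest`. A real test would call the deployed endpoint and also compare headers, not just status and body.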
