Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add gateway authentication #1067

Merged
merged 153 commits into from
Dec 2, 2024
Merged

Add gateway authentication #1067

merged 153 commits into from
Dec 2, 2024

Conversation

afsalthaj
Copy link
Contributor

@afsalthaj afsalthaj commented Nov 16, 2024

Add worker gateway authentication resolving. Please excuse the verbose PR description, but I request you to go through it before reviewing code.

#1061
#1062
#1063
#1064
#1060
#1065

As part of these tickets, there are some important changes done here - mainly focussing on user experience with authentication, CORS and similar middleware(s) in API gateway, that can be very difficult to fix in future if not taken care now.

Notes on OpenAPI

This was also a learning as to how much we can keep up with the real OpenAPI spec. The native API definition (user-facing part) is now mostly inspired from OpenAPI spec itself.

  1. API Gateway is powerful enough, that you don't need all the details required in OpenAPI spec all the time. Trying to add them to align strongly with OpenAPI spec can be really confusing. Example: There is no need of openIdConnectUrl which is part of the OpenAPI spec - just the security scheme id is enough. The reasoning here is, not many API gateways really support this level of authentication as far as I see. While OpenAPI spec is designed and geared to be part of the actual application and not the gateway. Another example is, there is no clear path on how to setup cors in a standard OpenAPI spec - while it is aware of some headers, but not entirely.

  2. API Gateway doesn't have all of the features that an advanced OpenAPI spec may refer to. However, by the time we support the major ones (which we are already in progress with), we will end up having the above situation. Example: We support only OpenID based authentication, while OpenAPI can list a set of securities to be applied. I haven't done the sequential application of security yet. Supporting a list right now can only confuse developers.

Hints on what's in PR

  • HttpApiDefinitionRequest (user facing internal API definition structure which is formed either through direct native-api-definition upload or OpenAPI import). As an example, this will have concise information about security (there is more to it, which is further explained below)
  • Explicit Transformation Layer which makes extra routes, extra middlewares etc based on the user request to form the core internal HttpApiDefinition (explained below). Transformations are idempotent.
  • HttpApiDefinition . This will have detailed information about security, added auth call back endpoints based on the redirect-url of user's security scheme with Google (as an example), cors-header-additions based on preflight endpoints etc.
  • CompiledHttpApiDefinition(not much change here except compiler driven changes). This is HttpApiDefinition itself, but with pre compiled rib, input type information for each Rib used (worker name and response mapping)
  • HttpApiDefinitionResponse - This is user facing response (unexposed byte code, unexposed identity provider metadata, unexposed secrets etc). There is an extra HttpApiDefinitionRequestData which is almost 1-1 with HttpApiDefinitionRequest, however, I didn't focus on fixing this. Duplicate data came into existence for some reason, but such details can be dealt later, as this is not affecting user's hot path.
  • Security Scheme Create and Get. Delete and Update has implications and I would want to do this later.
  • Session Store mainly to correlate requests that went through multiple redirections as part of authentication. Currently a a redis also backed by a simple in-memory structure backed by existing cache mechanisms withe eviction policies. Currently session store is mainly for authentication, so I used the original state information (a random value) produced by the openidconnect crate
  • Session management are skipped for requests that doesn't require it. i.e, only those requests that goes through a auth middleware becomes aware of the need of the session information, and the response from middleware will have a session id, which is later used to lookup the auth details which gets fed to rib input. If requests don't go through this middleware, this sessionId is always a None. It can be None for certain other cases too such as if the middleware results in Unauthorized response or a redirect response.
  • Auth call back static endpoint - this is part of static binding which was introduced as part of cors
  • MiddlewareIn and MiddlewareOut. Authentication Middleware has and instance of MiddlewareIn (inspecting request) and Cors has an instance of MiddlewareOut (transforming response)
  • Authentication Middleware implementation. If the request doesn't have tokens (id token or access token, i.e, unauthenticated calls), then it sends a redirect response back to identity provider. Similarly, if the session was expired, it sends a redirect response. If the signature is mismatched, it is unauthorized response rather than redirect.
  • IdentityProvider trait implementations. There Is a default implementation and it's reused for various providers. The identity provider trait is focussed on easier testing:
  • Explicit Tests: Signature Mismatch, Expired Tokens, Expired Session and actual Successful authentication. This includes redirect flows, and this was a time killer however, making it work will allow escape weeks of debugging in future.
  • The generics such as ToResponse<R> was hindering development to a great extent and it only got worse in last couple of days, so I made it explicit for Http last day. Basically the workflow is much more easier to understand now. History: This was mainly done to reuse the workflow (resolve the binding and execute the binding while handling middlewares etc) between different protocols long time back, however, it was giving more pain than needed. We will figure out how other protocols can be incorporated later. It's not a concern of this PR. In other words, gateway_input_executor is changed to gateway_http_input_executor
  • I haven't given much love to golem-cli part yet, but works for adding a security scheme and use it in API definition. This is more of mechanical work, and we can keep improving
  • I tried to fix a few clones that happens at hot path. But I am hesitant to do it in a hurry (I tried, and it resulted in some changes that I am not convinced of yet). We are aware of it, and I raised this ticket

Experimental in nature

Being able to tag a security which is a one-liner in configuration (OpenAPI as well as native API definition), and then specify things like request.auth.email in a Rib script makes the whole workflow production grade. That is, not anyone can simply invoke a worker even in a publicly exposed API.

The changes done are still experimental as there are numerous possibilities of configurations and usage of these features. It' less ROI to go through all of those possibilities in the first attempt, and therefore the PR is more of getting one thing right with focus on extensibility, and not focussing much on minor details which can be added/fixed based on testing in coming weeks. Example: Adding cookies is just inlined straight away for the auth call back responses - we are not making it configurable and complicating it further - because we don't know if this is worth it yet.

I took an extra week for this work, but I strongly believe the details figured out are worth it, and offloading them to future would result in weeks of downtime.

@afsalthaj
Copy link
Contributor Author

I will squash before merge

@vigoo
Copy link
Contributor

vigoo commented Nov 27, 2024

First question, just based on the PR description. You mention:

Session Store mainly to correlate requests that went through multiple redirections as part of authentication. Currently a simple in-memory structure with an eviction policy.

How does this work with multiple load-balanced worker service nodes (which is the normal way to set up a real golem cluster)?

@afsalthaj
Copy link
Contributor Author

afsalthaj commented Nov 27, 2024

Yes. Absolutely. As I mentioned in the PR description the plan is to add redis backend. But as another PR. Cloud will not work without this anyway and I will be forced to add (and there won't be accidental skipping). Happy to push that into the current PR. Just stopped committing once the tests becoming green after fixing other corner cases.

Copy link
Contributor

@mschuwalow mschuwalow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will continue tomorrow

@afsalthaj
Copy link
Contributor Author

@mschuwalow Tried to resolve many of your comments thanks for the review. Transformers are top level functions and so for middleware in and out (initially I had troubles, propagation of some history there, but fixed it). I think these are useful comments.

I would prefer doing the rest (if any later), especially IdentityProvider. Initially provider is a trait, but not each one of them. Initially they were in the commit history, and later to moved to a single IdentityProvider implementation. If we have diverging implementations we can consider it as each provider as a trait. Or we can think about it later

@jdegoes jdegoes merged commit c183ea1 into main Dec 2, 2024
18 checks passed
@jdegoes jdegoes deleted the add_gateway_auth branch December 2, 2024 01:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment