Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add authentication to HTTP API #6431

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

jacobtomlinson
Copy link
Member

@jacobtomlinson jacobtomlinson commented May 24, 2022

Proposal for closing #6407. Feedback and suggestions very welcome.

This PR adds authentication to the HTTP API via a decorator that can be reused on any methods we want to put behind it.

To use API routes that modify cluster state the user must set the API key in the config at distributed.scheduler.http.api-key and then that same API key must be set in the HTTP Authorization header of the request curl -H "Authorization: Bearer {api-key}" http://localhost:8787/api/v1/retire_workers.

Auth can be disabled by setting distributed.scheduler.http.api-key to False which may be useful in deployments where auth is handled elsewhere.

But by default, the api-key is set to "" which will always result in a 403 Forbidden. So I've also re-enabled the HTTP API by default (reverting #6420) as the new authentication default is also secure.

I've added a number of tests to ensure this auth handling is robust, but suggestions on improving testing further are welcome.

Closes #6407

@github-actions
Copy link
Contributor

github-actions bot commented May 24, 2022

Unit Test Results

       15 files  ±  0         15 suites  ±0   6h 37m 32s ⏱️ + 3m 48s
  2 811 tests +  2    2 729 ✔️ +1    80 💤 +1  2 ±0 
20 839 runs  +13  19 905 ✔️ +9  930 💤 +2  4 +2 

For more details on these failures, see this check.

Results for commit 37eabda. ± Comparison against base commit 60f0886.

♻️ This comment has been updated with latest results.

@ntabris
Copy link
Contributor

ntabris commented May 24, 2022

But by default, the api-key is set to "" which will always result in a 403 Forbidden

This affects dashboard as well?

@jacobtomlinson
Copy link
Member Author

jacobtomlinson commented May 24, 2022

@ntabris no just the new HTTP API that was added in #6270, and only the routes that affect the scheduler, read-only routes will not require auth.

@jacobtomlinson
Copy link
Member Author

@fjetter @gjoseph92 would you have some time to review this?

@pentschev
Copy link
Member

rerun tests

Note: rerunning gpuCI tests since those errors should be fixed by #6434

@gjoseph92
Copy link
Collaborator

I wonder if the list of which routes do vs don't require auth should be in the config? (Note this is basically what we had in #6408, with a better name). Seems like something security-conscious users would like to be able to set themselves.

@jacobtomlinson
Copy link
Member Author

jacobtomlinson commented May 27, 2022

Thanks @gjoseph92. I think giving folks the ability to individually select which routes are authenticated is going too far in terms of configurability, folks will be easily able to shoot themselves in the foot by missing one out. This feels like unnecessary complexity.

My preference would be to leave read-only routes public and read-write routed with auth. Then if folks want the read-only routes to also have authentication then it is up to them to add another layer, in the same way we don't authenticate the dashboard today and folks commonly add something like Kong in front.

I could maybe be persuaded that we could add a simple config option for "use auth for all API routes", "use auth for read-write routes only" and "disable auth". But again this still feels like unnecessary complexity. This also opens us up to creep in terms of all the other routes on the HTTP server.

@gjoseph92
Copy link
Collaborator

Sure. This seems fine as an incremental step.

I have a little hesitancy about adding auth in general—it seems like it should be out of scope for dask, and as you mentioned adding a proxy in front seems like better practice anyway—but I suppose basic auth like this is somewhat common practice (prometheus does it, for example).

I guess I'm less into the read vs read-write distinction; I don't feel like that's a security guarantee we actually make or test against strongly enough to be sure the "read" routes are actually safe. An all-or-nothing approach would probably be more intuitive to me—if you're in a security environment where you'd need auth on the API, you probably should have it on everything. (And most likely, you should actually not have dask's auth on anything, and instead have a proxy in front with (m)TLS termination, connection limiting, auth, and whatever else you want.)

But that's a harder incremental change because:

  • we want the API to be secure
  • we want the API to be on by default (is this true? if not, that makes everything way easier.)
  • we don't want accessing the dashboard to suddenly require auth by default

So what you have here probably makes sense as an incremental step, because it lets use the API more securely, and iterate on it, without needing to update dask-cloudprovier, dask-k8s, dask-gateway, Coiled, and every other deployment to suddenly handle authenticating the dashboard.

But I think long-term, there shouldn't be a split between read and read-write routes. Auth should be all or nothing, and any deployment system exposing things to the internet should handle the "all" case. I think this is the ticket we should probably be discussing and have a plan to implement. I'd be a bit disappointed if things just stayed in this split case forever; if we don't think we'll ever have full auth, then I'm not sure we should actually add it incrementally just for the API.

@jacobtomlinson
Copy link
Member Author

I have a little hesitancy about adding auth in general—it seems like it should be out of scope for dask, and as you mentioned adding a proxy in front seems like better practice anyway

I'm a bit confused by this comment.

The Dask RPC has always been unauthenticated by default, with the option to enable mTLS to authenticate it. So auth is very much in-scope for Dask.

When I implemented the HTTP API I followed the same principle of having an unauthenticated interface by default, although I hadn't yet implemented any auth so we have disabled the API by default. This was a sensible precaution.

This PR adds that authentication and also makes it default secure, which is inconsistent with the RPC but I think a reasonable choice. One of the main worries raised in #6407 was dask-cloudprovider exposing clusters to the internet in the most basic deployment scenarios, to make that less risky the mTLS is enabled by default but we needed this PR to do the same for the HTTP API.

Ideally I'd like to rethink the whole approach of exposing to the internet in those scenarios though.

I guess I'm less into the read vs read-write distinction; I don't feel like that's a security guarantee we actually make or test against strongly enough to be sure the "read" routes are actually safe. An all-or-nothing approach would probably be more intuitive to me

Routes like the /health endpoint might cause us a headache here as they must be available without authentication.

You also mention "we don't want accessing the dashboard to suddenly require auth by default" which I totally agree with. But the dashboard is a read-only endpoint that returns HTML, the read-only routes here in this API return the same information as JSON. So those goals are in conflict.

I would be more than happy to move more API routes under authentication, but some things like health and the dashboard must remain outside of it.

Again this is another reason why I want to rethink the exposing to the internet defaults in dask-cloudprovider as we currently make the dashboard available to anyone.

we want the API to be on by default (is this true? if not, that makes everything way easier.)

Ideally, I would like it to be on by default just as the RPC is.

Alternatively, I would like it to be easily enabled via a boolean config option. Currently to enable it you must configure a full list of HTTP routes and include the HTTP API in this list. This requires looking up the defaults and keeping that list up to date which can be done in dask-kubernetes but is cumbersome.

This is extra unpleasant for users who want to enable it manually via their .yaml files.

we want the API to be secure

I would nuance this to say we want the API to be able to be secure, just as the RPC can have TLS enabled.

I'm struggling a little to reason here about the HTTP API given the current security posture of the RPC. Being consistent and being default-secure are conflicting goals.

I do think that either making the HTTP API secure by default or disabled by default is a good practice. Maybe we should consider doing the same with the RPC?

I'd be a bit disappointed if things just stayed in this split case forever; if we don't think we'll ever have full auth, then I'm not sure we should actually add it incrementally just for the API.

Again I'm very conflicted here. What do you mean by full auth? Auth on every HTTP endpoint? Auth on the RPC? A common auth on both? Kerberos and other third-party auth plugins like dask-gateway has?

Generally I think the approach here is more than adequate for simple use cases, and I'm happy to move more read-only routes under the auth if there are genuine security concerns about them. Then for advanced use cases folks can turn it off and put an auth proxy in front.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't expose insecure HTTP API
4 participants