Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If user is removed from GitHub team/org they should lose access (within X minutes) #67

Closed
simonw opened this issue Jan 23, 2021 · 16 comments
Labels
documentation Improvements or additions to documentation enhancement New feature or request wontfix This will not be worked on
Milestone

Comments

@simonw
Copy link
Owner

simonw commented Jan 23, 2021

Last remaining issue for #64 - if Datasette is configured to only allow access to users who are members of a specific GitHub organization, we need to ensure they lose access within a reasonable timeframe.

Their teams and orgs are baked into their actor cookie.

It's OK for them to maintain access for a little while, provided that's documented and administrators know to e.g. rotate the DATASETTE_SECRET.

@simonw simonw added documentation Improvements or additions to documentation enhancement New feature or request labels Jan 23, 2021
@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

We could invalidate the actor cookie after a short TTL but this will be frustrating for the users (unless we bring back auto-login, which was removed in #64).

Maybe a better option: embed a timestamp in the actor cookie showing when those teams/orgs were last fetched. Re-fetch them from the GitHub API any time that timestamp has expired, and re-issue the cookie.

@simonw simonw added this to the 1.0 milestone Jan 23, 2021
@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

I'm going to store int(time.time()) in the signed cookie as gh_ts.

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

Here's an interesting challenge: at what point should the re-fetch of teams and orgs from GitHub happen?

We've baked the results of those API calls into the actor cookie itself. This means Datasette's default permission checking code doesn't need to call anything relating to our plugin at all - it can work purely off those values.

But we want to re-fetch the results from that API every five minutes or so to avoid them becoming stale.

How do we know when to re-fetch the results? We potentially need to issue a fresh cookie at the same time, to update the timestamp - that's a bit gnarly.

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

I could try registering a custom permission_allowed() hook which does something special only if the allow blocks mention gh_orgs and gh_teams - but the Datasette default one is marked tryfirst=True for some reason so that may not work: https://github.com/simonw/datasette/blob/0.53/datasette/default_permissions.py

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

Maybe the solution here is to move away from the gh_orgs and gh_teams being baked into the actor cookie. They need to be stored somewhere though, ideally somewhere stateless so the plugin continues to work well on stateless hosting.

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

Actually I think I can solve this with a actor_from_request() plugin hook. https://docs.datasette.io/en/stable/plugin_hooks.html#plugin-hook-actor-from-request

It could work by itself dispatching to the default Datasette one - https://github.com/simonw/datasette/blob/0.53/datasette/actor_auth_cookie.py - and then applying some extra logic to see if the permissions need to be re-checked. Since this runs on every incoming request that should be the right place to do this.

It can return an await function too, so it can make outbound HTTP calls.

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

One catch: how can the actor_from_request() cause a fresh cookie to be set at the end of the request cycle?

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

One way could be for it to set a key on the actor called _needs_cookie - then have an asgi_wrapper() middleware look for that key and use it to re-set the actor cookie.

This is pretty fragile and weird though. Maybe Datasette itself needs to grow a mechanism that allows plugins to specify that the actor cookie should be set again?

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

An ASGI middleware wrapper could add a function key to scope called reset_actor_cookie - the actor_from_request hook could then execute that scope["reset_actor_cookie"](new_actor) function to alert the wrapping middleware that the actor cookie should be set again.

This is similar to the pattern used by asgi-csrf: https://github.com/simonw/asgi-csrf/blob/764914ad7931c29fd00e999a071b458fbef7b4d3/asgi_csrf.py#L33-L98

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

The problem I'm trying to solve here - mark that a cookie should be set at the end of the response - is the same problem I had to solve for the datasette.add_message() API. Here's how that works - it stashes a _messages property on the request object: https://github.com/simonw/datasette/blob/c38c42948cbfddd587729413fd6082ba352eaece/datasette/app.py#L503-L514

    def add_message(self, request, message, type=INFO):
        if not hasattr(request, "_messages"):
            request._messages = []
            request._messages_should_clear = False
        request._messages.append((message, type))

    def _write_messages_to_response(self, request, response):
        if getattr(request, "_messages", None):
            # Set those messages
            response.set_cookie("ds_messages", self.sign(request._messages, "messages"))
        elif getattr(request, "_messages_should_clear", False):
            response.set_cookie("ds_messages", "", expires=0, max_age=0)

Here's where _write_messages_to_response is called - by the route_path() method: https://github.com/simonw/datasette/blob/c38c42948cbfddd587729413fd6082ba352eaece/datasette/app.py#L1099-L1102

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

How about if the Datasette Request object grows a .set_cookie() method? It's on the Response class at the moment but maybe there's an argument that cookies should always be set against the global request object instead? Bit of a weird change though.

@simonw
Copy link
Owner Author

simonw commented Jan 23, 2021

I'm going to try to implement the scope["reset_actor_cookie"](new_actor) ASGI middleware pattern.

@simonw
Copy link
Owner Author

simonw commented Jan 24, 2021

I've hit a REALLY nasty problem: calling the GitHub API to check what teams and orgs the user is a member of requires an access token. But we've not stored the user's access token anywhere - which is good, because keeping hold of an access token for any longer than strictly necessary is never a good idea.

But this means that a design where we make ongoing calls to check the user's memberships won't actually work.

@simonw
Copy link
Owner Author

simonw commented Jan 24, 2021

Alternative solutions to this problem:

  • Set actor cookies that only last for a few minutes, and auto-login-bounce the user back through GitHub when they expire
    • This is pretty messy. It won't work properly for POST requests, so if a user is half way through submitting something and their actor expires they'll be out of luck
  • Tell Datasette administrators that if they kick someone out of an organisation they need to manually cycle the server's secrets.

I think that second option may be the way to go here. The case of needing to kick a user out of Datasette because they left a GitHub team or organization is going to be vanishingly rare, so leaving this to the administrators (as long as it is clearly documented) feels like an OK thing to do.

simonw added a commit that referenced this issue Jan 24, 2021
@simonw
Copy link
Owner Author

simonw commented Jan 24, 2021

Idea for an alternative mechanism: allow users to define a JSON file of GitHub user IDs that are allowed access. They can host this at a separate URL - e.g. in a gist. Datasette hits that URL and caches it for a few seconds between hits to figure out who is allowed access.

@simonw
Copy link
Owner Author

simonw commented Jan 24, 2021

For the moment I'm going to add the note about this to the README and ship the release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Improvements or additions to documentation enhancement New feature or request wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

1 participant