Design offboarding process that hub admins can use to offboard their users #8

Open
3 tasks
yuvipanda opened this issue Mar 11, 2022 · 8 comments

Comments

@yuvipanda
Member

From @rabernat in 2i2c-org/infrastructure#1050 (comment):

Over the 5-10 years of this project, people will exit the project. We need a sustainable approach to not only onboarding but offboarding.

Question for 2i2c: Beyond simply removing their access via the github group, what is the process for offboarding them and specifically purging their user data from storage so we don't continuously accumulate abandoned data?

When we offboard people, the following tasks need to be done:

  • Make sure they can no longer log in. Removing the user from the appropriate GitHub team ought to do this.
  • Delete their home directories, so they stop taking up space. There's not an easy way for hub admins to do this right now without involving 2i2c engineers.
  • Delete any data they might have on the scratch bucket.

We should design an interim offboarding process, as well as a self-service one.

@rabernat

rabernat commented Apr 7, 2022

I am checking in on this issue. Has there been any progress on defining and implementing an offboarding process? The LEAP executive committee is reluctant to begin using our hub until an offboarding process is in place. So I would like to be able to provide an update on this.

[ ] Delete any data they might have on the scratch bucket.

I believe the scratch buckets are configured with a finite retention time for data, so that should not be necessary.

@choldgraf
Member

No, there has not been progress on this (for future reference, this issue would reflect any discussion or plans we have around this topic).

Is there a specific set of concerns or questions that you need addressed? Looking at the list in the top comment, the only item with any uncertainty seems to be "Delete their home directories, so they stop taking up space. There's not an easy way for hub admins to do this right now without involving 2i2c engineers." Current practice is simply "the representative for the LEAP community asks us to delete a person's home directory, and we do it". That is not a scalable or long-term pattern, but it is at least a workable solution.

Is there something else that needs to be addressed and is causing hesitancy?

@rabernat

rabernat commented Apr 7, 2022

The concern is with home directories. Some of the LEAP EC members are concerned about data accumulating from temporary users (e.g. bootcamp participants), which will generate ever greater costs over the 5+ years of this project. This could be mitigated by enforcing quotas on home directories, but my understanding is that such quotas are not supported. Me personally emailing support for every offboarded user is not workable because:

  • I am a major bottleneck and won't personally be aware of every time a user is offboarded. There are potentially hundreds of users involved.
  • It creates manual toil for 2i2c engineers.

Over in leap-stc/leap-stc.github.io#1, I am proposing a user policy for our hub. In it I say the following:

Removing a user from the leap-pangeo-users group entirely will disable their access completely.
An automated process will delete user data from the hub one month after a user is removed from the leap-pangeo-users group.

Is it feasible to implement a cron job of some sort that will perform this deletion?


Conversely, if you can provide some arguments and evidence that this issue (accumulating home-directory data) is not going to be a major cost or concern for our project, I can take that back to the EC. Without either such arguments or a technical solution to the problem, my colleagues will not feel comfortable moving forward with the hub.
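For illustration, the timing rule in the proposed policy (delete data one month after removal from the group) could be sketched as below. This is only a sketch under assumptions: the `removed_at` record of when each user left the team is hypothetical (nothing in this thread says such a record exists), "one month" is approximated as 30 days, and the function only reports who is due; it deletes nothing.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "one month" from the proposed policy, approximated as 30 days.
GRACE_PERIOD = timedelta(days=30)


def users_due_for_deletion(removed_at, now=None):
    """Return usernames whose removal from the team is older than the grace period.

    removed_at: dict mapping username -> timezone-aware datetime of removal.
    This record is hypothetical; something would have to track when
    team membership ended.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(
        user for user, when in removed_at.items() if now - when >= GRACE_PERIOD
    )
```

A scheduled job could call this once a day and pass the result to whatever actually removes the home directories.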

@choldgraf
Member

Yeah - it is a balance between "removing data quickly will piss off users that want it retained", and "removing data slowly will incur extra storage costs". We don't have a strict policy for this because different communities have different preferences.

I think a reasonable approach is "if a user is explicitly deleted (by being removed from a GitHub team in this case) then it should be assumed their data will be deleted at the end of the month". @yuvipanda is that similar to how Berkeley does this?

@yuvipanda
Member Author

Berkeley's policy is that we archive user home directories to object storage if they haven't been used in 6 months, and users can manually request it back if they need to.

The question is how to figure out 'removed from the project for 1 month', as that can be a bit tricky. How about the following criteria:

  1. User home directory has not been modified in 1 month (or 2 months, my preference)
  2. User does not exist in the JupyterHub database.

So that means part of offboarding would require them to hit the 'delete' button in the hub control panel (https://leap.2i2c.cloud/hub/admin) and anyone designated hub admin can do so, along with removing them from the github team. It also means that user data could be gone sooner than 1 month after they are removed - as it is 1 month since last login. Hence my preference for that to be 2 months.
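The two criteria above could be checked with something like the following. It is a rough sketch, not an existing 2i2c tool: it assumes each home directory is named exactly after the hub username (real deployments may escape names), that the caller supplies the set of usernames currently in the JupyterHub database (`hub_users`, a hypothetical input), and that "not modified" means no file or directory under the home has a newer modification time than the cutoff. It only lists candidates and deletes nothing.

```python
import os
import time
from pathlib import Path


def latest_mtime(directory):
    """Most recent modification time of the directory or anything under it."""
    newest = os.stat(directory).st_mtime
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            try:
                newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                continue  # entry vanished or is unreadable; skip it
    return newest


def deletion_candidates(home_root, hub_users, max_idle_days=60, now=None):
    """List home directory names that meet both criteria.

    1. Nothing under the directory was modified in the last max_idle_days.
    2. The directory name is not in hub_users (assumed to be the set of
       usernames currently present in the JupyterHub database).
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    candidates = []
    for entry in sorted(Path(home_root).iterdir()):
        if not entry.is_dir() or entry.name in hub_users:
            continue
        if latest_mtime(entry) < cutoff:
            candidates.append(entry.name)
    return candidates
```

The default of 60 days reflects the 2-month preference stated above; passing `max_idle_days=30` gives the 1-month variant.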

How does that sound, @rabernat?

@yuvipanda
Member Author

The other alternative is we just add a 'delete home directory' button that the admin can press as part of offboarding. So the offboarding process becomes:

  1. Delete from GitHub teams
  2. Delete from hub control panel
  3. Press a button (location TBD) to delete their home directory
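Steps 1 and 2 above could in principle be scripted rather than clicked through. The sketch below builds, but does not send, the two HTTP requests, using the GitHub REST endpoint for removing a team membership and the JupyterHub REST endpoint for deleting a user. The organization name, team slug, hub URL, and both tokens are caller-supplied assumptions, and step 3 is left out because no home-directory deletion hook exists yet.

```python
from urllib.parse import quote
from urllib.request import Request


def offboarding_requests(username, org, team_slug, hub_url, github_token, hub_token):
    """Build (without sending) the requests for steps 1 and 2.

    All arguments are supplied by the caller; the tokens must belong to
    accounts with admin rights on the team and on the hub respectively.
    """
    user = quote(username, safe="")
    # Step 1: remove the user from the GitHub team.
    github = Request(
        f"https://api.github.com/orgs/{quote(org, safe='')}"
        f"/teams/{quote(team_slug, safe='')}/memberships/{user}",
        method="DELETE",
        headers={
            "Authorization": f"Bearer {github_token}",
            "Accept": "application/vnd.github+json",
        },
    )
    # Step 2: delete the user from the JupyterHub database.
    hub = Request(
        f"{hub_url.rstrip('/')}/hub/api/users/{user}",
        method="DELETE",
        headers={"Authorization": f"token {hub_token}"},
    )
    return [github, hub]
```

Each returned request can be sent with `urllib.request.urlopen`. Sending the GitHub request first seems the safer order, since a user who is deleted from the hub but still in the team can log in again.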

@rabernat

rabernat commented Apr 7, 2022

I am totally fine with the 2 months! 👍

Can you clarify what the current "delete user" button actually does?

  • Does it remove the home directory?
  • What if I delete a user and they are still in the github team. Can they still log in?
  • Conversely, what if I delete a user from the team but they are still in the hub database?

Can we find a way to avoid this manual step? My preference would be to have all membership managed via github, without having to have LEAP admins interact with the jupyterhub admin dashboard.

@yuvipanda
Member Author

@rabernat can we revisit the manual step for decommissioning in, say, 6 months or so? Given the limited dev resources we currently have, and that I have to focus on 2i2c-org/infrastructure#1146 too, I'd prefer to do this iteratively rather than all in one go.

Does it remove the home directory?

It does not right now, but maybe I can just hook this up and that should solve our problems?!

What if I delete a user and they are still in the github team. Can they still log in?

Yes they can.

Conversely, what if I delete a user from the team but they are still in the hub database?

This is actually an important question I don't know the answer to atm.


3 participants