Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block insecure ways of pulling from private repositories by default #85

Open
yuvipanda opened this issue May 22, 2019 · 7 comments
Open

Comments

@yuvipanda
Copy link
Contributor

yuvipanda commented May 22, 2019

See comment #85 (comment) for a lot more detail

Since we don't have good support for pulling from private repositories, folks often put their own personal access tokens in the nbgitpuller URL.

This is extremely dangerous, and the same as sharing your password. We should detect and block this, but only after making sure we have some easy way for folks to pull from private repositories.

@albertmichaelj
Copy link
Collaborator

I would strongly encourage against this. I have set up a bot github account with read only access and I use that token, and I distribute links to my students through private channels (the course information system at my university). I cannot have my course materials publicly available for IP reasons, so I need to be able to pull from a private repo using just a link. I do not see any way in which you can get around this without distributing something just as confidential to students in order to allow for cloning the repos.

I think that actively disallowing this completely normal and expected feature of github based workflows is both insulting to the users and, frankly, pointless. I think that recommending people not share access tokens is fine (and not documenting the method to do so except to warn people), but if I choose to share an access token, understanding the consequences and having taken precautions I feel comfortable with, then I would be quite disappointed if someone else told me I wasn't allowed to.

I would strongly encourage you to delete this as an issue.

@yuvipanda
Copy link
Contributor Author

First, I apologize if this was insulting, @albertmichaelj. I can see how the framing of this issue would feel like that, and I'll shortly reword it to feel less like that.

There are primarily two different ways to safely support pulling private repos:

  1. Create a readonly deploy key for your repo, and put it in your JupyterHub in a way that git will automatically use it when fetching.
  2. Create a bot account, grant it readonly access to your repo, and distribute the access token. You can distribute this in two ways:
    a. With a netrc file in the JupyterHub
    b. As part of the url itself

Both (1) and (2a) have the following advantages:

  1. The secret is in the JupyterHub, not in in the URL
  2. You can thus rotate the secret, revoke it / make a new one without causing issues for users
  3. It doesn't get logged in the various places a HTTP request URL might get logged - on disk, in a cloud service, in a end user's internet proxy, random folks running wireshark looking for secrets, etc.

(2b) has disadvantages that almost mirror the advantages of (1) and (2a):

  1. Can not rotate the secret without having to re-spread the URL
  2. Secret can get logged in various places in the network path. This makes it even more dangerous when used without HTTPS
  3. Most dangerously, the easiest way to accomplish this, and what most people are likely to do (and we have seen do), is to use your own personal access token. This is pretty bad, and is sortof hard to prevent. People are strapped for time, making a secure bot account (with a password you have to remember) takes time, and folks will use their personal access token.

Software should be secure by default, and if you want to do something insecure, you have to explicitly opt into it. Accessing private repos is an extremely common use case, so we should figure out how to explicitly support that well (#53). This would mean we should document the various ways you can do it, and the ways you should not. Otherwise folks will find and document insecure ways of doing this - for example, see discussion in #53. So not documenting it only gives us security by obscurity.

As software developers, we should make it easy to do the secure thing, hard to do things that are insecure but can not be accomplished in any other way, and practically impossible to do things that are insecure but easily doable in other ways.

I'd that method (1) is much easier to do than creating a secure bot account, and is what most folks should do. It also gives you per-repo control.

I'm interested in learning if options (1) or (2a) do not work for your use case. If you can expand on what your use case is, we can probably find another way of supporting that that doesn't involve spreading credentials in URLs. If not, we can figure out how to allow this with a big security warning and requiring explicit opt-in.

I hope this makes a little more sense on the rationale behind this issue, and I again apologize if it felt insulting.

@yuvipanda yuvipanda changed the title Block putting git passwords / tokens in the URL Block insecure ways of pulling from private repositories by default May 28, 2019
@albertmichaelj
Copy link
Collaborator

I appreciate your detailed response, and I respect the desire to provide a secure piece of software. I also agree that encouraging people to use the personal access token in the url is poor practice. However, as best I can tell, your suggestions make the pretty strong assumption that people will only be using nbgitpuller for JupyterHub. I use it for local installations as well.

I could definitely give all of my students netrc files, but I'm not sure this is actually safer (though your point about the url not being logged is a good one) since the students still have the credentials. However, I work with fairly low tech comfort level students, and I've done a lot of work to make the process as painless as possible for them. Trying to get them to correctly configure something through a hidden file would be a major pain point for my class.

As far as JupyterHub goes, I'm sure there are better ways to access private git repos. However, I still see no reason to take away an option that works perfect for my use case (if someone snoops on my bot accounts read only access token, it's not a big deal, I just have to make a good faith effort to keep the files secure, but the students can always do arbitrarily bad things with the files once they have them). I think that there should be a more secure and easier to use option as well, but I do not think that just forbidding including a private access token (like I could with any git clone command) is the right way to solve the problem.

@yuvipanda
Copy link
Contributor Author

Ah, very interesting. I had not considered the non-JupyterHub use case at all - thank you for that perspective. I agree that trying to get non-tech-savvy folks to set up a netrc or ssh key on their own isn't an acceptable solution. I also agree that including a token in the URL is probably the most realistic option there.

Another way to approach this would be:

  1. Provide a notebook option to enable this.
  2. Forbid enabling this when running inside a JupyterHub, since I still think the other options are a better fit on JupyterHub. The network logging concerns are also more acute there. I could probably be easily convinced otherwise the first time someone files a bug with a clear use case.
  3. Allow turning this on when starting the notebook in a non-JupyterHub setting. I'm torn on whether this should default to allow or deny.

This makes it impossible to do the insecure thing when alternatives are available (on a JupyterHub), and possible when alternatives aren't realistically available.

For (3), can you tell me how your users typically start a Jupyter Notebook process? I'd also generally love to hear a little more about your use case.

Thank you for helping make nbgitpuller better and more useful! <3

@albertmichaelj
Copy link
Collaborator

What I've done is create an environment.yaml file that the students import through the anaconda app that they get from installing the standard anaconda distribution. The students then can select the conda environment in a drop down menu on the anaconda app and just click launch on Jupyter Lab. So, the users do not have the ability to specify any kind of command line switch for the notebook server.

I have experimented with having a jupyter notebook that they have to run to finish installation that writes things like environmental variables in the conda environment that will be set when the environment is activated (though the conda activate.d directories), so there are ways that I could use this to set up some kind of key. However, running the installation notebook and maintaining it has been somewhat problematic. That is why I've been looking forward to pull request #42 being merged so that I can specify where the class files end up so they're not in the home directory.

However, philosophically I don't think the problem is with nbgitpuller. Github allows for the cloning of repos using personal access tokens. nbgitpuller is just a wrapper for git that handles things like local file changes and recloning a little better (and it provides a url that can be clicked). I don't think that it's really nbgitpuller's job to limit git's capabilities. I understand the argument against using it, but it is something that GitHub allows, and it has its use cases. I would argue they shouldn't even be disallowed for JupyterHub.

Instead, I think that there should be an easy alternative to the personal access token that is well documented, and then people will naturally choose that path. People aren't going to find an obscure way to use personal access tokens if they can do something much easier that is spelled out in the documentation. I don't see why this has to be explicitly forbidden when their may be unforeseen reasonable use cases for not disallowing built in GitHub functionality.

@yuvipanda yuvipanda removed this from the v1.0 milestone Jul 4, 2019
@danfulton
Copy link

danfulton commented Aug 9, 2019

EDIT: I figured out how to use SSH address in the nbgitpuller generated link, and seems to be working now. The documentation for JupyterLab does not seem accurate, as adding %3Fautodecode to a JupyterLab link does not prevent the workspace prompt.

@yuvipanda

Create a readonly deploy key for your repo, and put it in your JupyterHub in a way that git will automatically use it when fetching.

I've seen you mention this a few times in a few issue threads around nbgitpuller and tljh, but I can't seem to figure out how to do it.

I have a private repo in bitbucket, and I've generated an SSH key on my TLJH instance, and added the public key to bitbucket, but nbgitpuller always defaults to https. Can I force TLJH to use ssh globally? Or am I barking up the wrong tree with SSH, and this is better accomplished with OAuth (which I'm also unfamiliar with).

@abalbekov
Copy link

I imagine following workflow that will not include Access Token in nbgitpuller URL:

  1. Instructor creates Access Token in github
  2. Instructor then saves the AT in JupyterHub and gives it a token name. The JupyterHub needs to provide a web page for saving access tokens. Alternatively instructor saves the token somewhere on JupyterHub host.
  3. Instructor then generates a URL that instead of actual AT includes a token name.
  4. Student receives the URL, clicks on it and nbgitpuller finds the actual saved token by token name and uses it to access git.

I would also suggest another security improvement.
What if a student username could also be (optionally) included in the URL ?
Then nbgitpuller could make sure that only this JupyterHub user is able to use the URL.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants