Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support GitHub token for access to private registry repo #123

Open
mfremont opened this issue Mar 19, 2021 · 11 comments
Open

support GitHub token for access to private registry repo #123

mfremont opened this issue Mar 19, 2021 · 11 comments

Comments

@mfremont
Copy link

Problem Context

PkgServer makes requests to GitHub to validate the registry repo URL on startup and to validate tree SHAs when fetching updates from the storage server, but it does not provide a way to supply a GitHub access token or other credentials to be used when making these requests for a private repo. This prevents use of PkgServer for private registries. Use of PkgServer for private registries is helpful to simplify access for users and continuous integration.

Proposal

Based on a discussion with @staticfloat, I'd like to propose adding PkgServer support for GitHub personal access tokens and use GitHub REST API endpoints for the two repository checks that PkgServer makes:

  • add an optional access_token member to RegistryMeta
  • change the repository URL check to use HEAD /repos/{owner}/{repository}
  • change the tree SHA check to use HEAD /repos/{owner}/{repository}/git/trees/{sha}
  • if the token is not nothing, add an HTTP Authorization header to the HEAD requests

I've validated that the above HEAD requests can be made with HTTP package, using tokens for private repos and no token for a public repo such as JuliaRegistries/General.

Implementation sketch

Add new functions, potentially scoped to a PkgServer sub module named GitHub

  • api_repository_url(repo_url::String)::String

    Composes the GitHub API URL based on the GitHub repository URL. For example, for the General
    registry, the repo and its API URL are:

    https://github.com/JuliaRegistries/General.git -> https://api.github.com/repos/JuliaRegistries/General

  • api_tree_url(repo_base_url::String, sha::String)::String

    Composes the GitHub API URL for a tree SHA. This URL can be used to validate that a tree SHA
    exists in the upstream repository. repo_base_url is the API URL for the repository resource,
    such as composed by api_repository_url(repo_url)

  • resource_exists(url::String; token::Union{Nothing,String}=nothing) -> Bool

    Returns true if a HEAD request for the GitHub url returns a 200, false otherwise.
    For non-public resources, specify a GitHub personal access token with the optional token
    parameter (only GitHub REST API requests can be authorized with a token).

Add new members to RegistryMeta:

  • upstream_api_url::String - the base URL for the registry repo within the GitHub API path namespace
  • access_token::Union{Nothing,String} - the optional access token

Modify:

  • RegistryMeta constructor to accept an optional access token and generate/validate the upstream_api_url from the repository URL
  • verify_registry_hash(uuid::AbstractString, hash::AbstractString) to use the proposed functions to validate the tree SHA

Considerations

GitHub personal access tokens potentially grant broad privileges to multiple repos

  • GitHub UI does not offer a way to generate a token that grants read-only repo access to a specific repository
    • access is implicitly write if the user is authorized for write
    • access spans all repos the user has access to
  • to comply with principle of least privilege, an organization/individual setting up a private PkgServer would likely need to create a "machine account" with only read access to the private registry repo

OAuth or GitHub deployment keys might allow for finer-grained access control, but would require more restructuring of PkgServer.

Unit tests that do not perform network requests should be straightforward.

Integration tests that validate functionality with a private registry repo are desirable, but will likely require some
assistance/coordination

  • setup a private repo under an appropriate Julia project organization
  • write an integration test
  • setup and validate the CI job
@DilumAluthge
Copy link
Member

GitHub deployment keys might allow for finer-grained access control

I do think that SSH deploy keys (with read-only access) might be better for this use case:

  1. Each SSH deploy key is scoped to a single GitHub repository. This restriction is enforced by GitHub.
  2. You can set each SSH deploy key to be read-only access.

@johnnychen94
Copy link
Contributor

johnnychen94 commented Mar 20, 2021

Unsure if this is already considered, a side effect of adding such support is that it encourages private packages, I just hope we don't have private packages in the public General registry.

@staticfloat
Copy link
Member

Unsure if this is already considered, a side effect of adding such support is that it encourages private packages

There are already many, many private packages. This isn't something we want to discourage. People should feel free to write whatever private packages they want. There are a couple packages that have been taken down since they were registered, but that's always going to be the case.

I do think that SSH deploy keys (with read-only access) might be better for this use case:

The issue is that we use an optimization here; we use the GitHub API (not git) to ensure that a treehash exists within the repository. To use SSH keys, we would need to maintain a local git clone of the registry, then do a lookup within the repository for the requested treehash, and if it doesn't exist, do a fetch, then check again. This will likely take significantly longer, but it may not be that big of an issue.

I'm in favor of us continuing to use the GitHub API optimization (where we don't need to manage local registry state, and we can make a single HTTP request to ensure that a treehash exists). Once someone has a private registry that they want to host on something that isn't GitHub, we'll likely need to implement a fallback git clone-based version of this, so we'll just keep in mind that that will need to exist as a fallback eventually.

@staticfloat
Copy link
Member

@mfremont your sketch looks great to me. I say full steam ahead!

@johnnychen94
Copy link
Contributor

johnnychen94 commented Mar 20, 2021

There are already many, many private packages.

I know, my argument is that we should not encourage this. I do concern that filling the default public registry with too many private packages will be harmful to the ecosystem; wouldn't it be terrifying when installing a package from the default registry requires authentication? That really doesn't sound like an ecosystem around the MIT license.

No objection to the proposal, I'm just a little bit concerned about its side effect.

@staticfloat
Copy link
Member

I do concern that filling the default public registry with too many private packages will be harmful to the ecosystem

You can't register a private package in General; it won't be merged. If you register a package and then make it private later, users won't be able to dev it, but they'll still be able to add it because the historical versions will be available in the StorageServer archives. There's nothing about this PR that encourages people to register private packages in a public registry; it's the opposite. It allows users behind firewalls to make use of private packages in private registries.

The changes here aren't to support private packages, they're to support private registries.

@mfremont
Copy link
Author

Thanks for the feedback.

My initial investigation included considering deployment keys, but, as @staticfloat mentioned, that appears to require using the git CLI or Libgit2, and I did not see a straightforward way to use either to verify the repository URL or tree SHAs without cloning the repo and performing periodic fetch/pull requests, and that has more overhead than the GitHub REST API requests. At the point when there is demand for private registry support with a repo hosted by a provider other than GitHub, it may be worth reconsideration.

I have some basics of the implementation working based on the above sketch, and will devote some time today and tomorrow to filling out that implementation and adding unit tests.

@staticfloat any thoughts or guidance on integration tests that to verify functionality with a private registry?

@staticfloat
Copy link
Member

any thoughts or guidance on integration tests that to verify functionality with a private registry?

Should be simple enough to set up a private registry. We can fork the Test registry, make it private, maybe add a new package to it, and then ensure that the package server can fetch it. We can create a dummy GitHub user and embed a PAT for that user into the CI tests for PkgServer.

@StefanKarpinski
Copy link
Collaborator

How about we just add an option to turn off verification? That's really only intended for the situation where there are multiple upstream storage servers and we don't necessarily trust all of them. In a private registry setting, that seems unlikely to be an issue.

@mfremont
Copy link
Author

@StefanKarpinski are you suggesting that we add a boolean to RegistryMeta to disable verification for that registry only so that a package server for a private registry can disable checks for that private registry?

A few considerations:

  • in the current model, any storage server can serve any number of registries
  • because JULIA_PKG_SERVER can only refer to a single package server, a package server for a private registry also needs to serve requests for packages in General, so it must either have a private storage server that also maintains all of General, or it must include one or more of the Julia storage servers in its configuration

To the best of my understanding, there is not currently a way to configure a package server instance to associate specific storage servers with specific registries. Consequently, to safely turn off verification for an individual registry, we'd need to add the registry -> storage server association. Assuming this is desirable, it would require more discussion of the design changes to make the registry to storage server association and how that would be expressed in the configuration.

In terms of the code required, the changes required to specify and use optional tokens with the GitHub REST API have been straightforward and minimalist to implement.

@StefanKarpinski
Copy link
Collaborator

I mean having a global flag that turns off verification for all registries. A typical setup for a private pkg server is to use the public pkg servers as upstream storage servers. Since pkg.julialang.org verifies registries and the connection is secure, it seems fine in that situation to not verify registries. Since the private registries are being served by a server that you operate, presumably you trust that as well, so in this situation it seems like verification is unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants