feat(github): add documentation #163

Merged
merged 6 commits into from
May 15, 2024
43 changes: 41 additions & 2 deletions docs/source/auth-providers.md
Expand Up @@ -23,6 +23,7 @@ Giftless provides the following authentication and authorization modules by defa

* `giftless.auth.jwt:JWTAuthenticator` - uses [JWT tokens](https://jwt.io/) to both identify
the user and grant permissions based on scopes embedded in the token payload.
* `giftless.auth.github:GithubAuthenticator` - uses [GitHub Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) to both identify the user and grant permissions based on those for a GitHub repository of the same organization/name.
* `giftless.auth.allow_anon:read_only` - grants read-only permissions on everything to every
request; typically, this is only useful in testing environments or in very limited
deployments.
Expand Down Expand Up @@ -75,7 +76,7 @@ Basic HTTP authentication.

You can disable this functionality or change the expected username using the `basic_auth_user` configuration option.

### Configuration Options
### `giftless.auth.jwt` Configuration Options
The following options are available for the `jwt` auth module:

* `algorithm` (`str`): JWT algorithm to use, e.g. `HS256` (default) or `RS256`. Must match the algorithm
Expand Down Expand Up @@ -191,6 +192,37 @@ The `leeway` parameter allows for providing a leeway / grace time to be
considered when checking expiry times, to cover for clock skew between
servers.

## GitHub Authenticator
This authenticator lets you provide a frictionless LFS backend for existing GitHub repositories. It plays nicely with `git` credential helpers and allows you to use GitHub as the single authentication & authorization provider.

### Details
The authenticator uses [GitHub Personal Access Tokens](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens), the same ones used for cloning a GitHub repo over HTTPS. The provided token is used in a couple of GitHub API calls that determine the token's identity and [its permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) for the GitHub organization & repository. The token should be passed as the password in `Basic` HTTP auth (the username is ignored). `Bearer` token HTTP auth is also supported, although git clients are unlikely to use it.
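
As a rough illustration of the header handling described above (a sketch only, not Giftless's actual code), the following pulls a token out of an `Authorization` header value. The function name `extract_token` is hypothetical.

```python
import base64
import binascii
from typing import Optional


def extract_token(authorization: Optional[str]) -> Optional[str]:
    """Return the token from a Basic or Bearer Authorization header value."""
    if not authorization:
        return None
    scheme, _, payload = authorization.partition(" ")
    payload = payload.strip()
    if scheme.lower() == "bearer":
        return payload or None
    if scheme.lower() == "basic":
        try:
            decoded = base64.b64decode(payload, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return None
        # Basic credentials are "username:password"; the username is ignored.
        _username, sep, password = decoded.partition(":")
        return password if sep and password else None
    return None
```

With this sketch, `Basic` credentials such as `anyone:<token>` and a `Bearer <token>` header both yield the same token, while a malformed header yields `None`.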

For the authenticator to work properly, the token must have the `read:org` scope for "classic" tokens, or the `metadata:read` permission for fine-grained tokens.

Note: Authentication via SSH that could be used to verify the user is [not possible with GitHub at the time of writing](https://github.com/datopian/giftless/issues/128#issuecomment-2037190728).

The GitHub repository permissions are mapped to [Giftless permissions](#permissions) in the straightforward sense that those able to write will be able to write, same with read; invalid tokens or identities with no repository access will get rejected.

To minimize the traffic to GitHub for each LFS action, most of the auth data is temporarily cached in memory. This improves performance, but it also means that permission changes made on GitHub do not take effect until the corresponding cache entry expires.

### GitHub Auth Flow
Here's a description of the authentication & authorization flow. If any of these steps fails, the request gets rejected.

1. The URI of the primary git LFS (HTTP) [`batch` request](https://github.com/git-lfs/git-lfs/blob/main/docs/api/batch.md) is used (as usual) to determine which GitHub organization and repository is being targeted (e.g. `https://<server>/<org>/<repo>.git/info/lfs/...`). The request's `Authorization` header is also searched for the required GitHub personal access token.
2. The token is then used in a [`/user`](https://docs.github.com/en/rest/users/users?apiVersion=2022-11-28#get-the-authenticated-user) GitHub API call to get its identity data.
3. Next, the GitHub API is asked for the [user's permissions](https://docs.github.com/en/rest/collaborators/collaborators?apiVersion=2022-11-28#get-repository-permissions-for-a-user) on the org/repo in question.
4. Based on the information above, the user is granted or denied access.
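
Steps 2 to 4 above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authenticator's real implementation: `authorize` and the injected `fetch_json` callable are hypothetical names, and the mapping of GitHub's `permission` value onto plain `"read"`/`"write"` strings is a simplification. The two API paths are the `/user` and repository-permission endpoints linked in the list.

```python
from typing import Callable, Optional, Set

# Hypothetical callable: performs GET <api_url><path> with the token and
# returns the decoded JSON body, or None on any unsuccessful response.
FetchJson = Callable[[str, str], Optional[dict]]


def authorize(
    org: str, repo: str, token: str, fetch_json: FetchJson
) -> Optional[Set[str]]:
    """Return the granted permissions, or None if the request is rejected."""
    # Step 2: identify the token's owner.
    user = fetch_json("/user", token)
    if not user or "login" not in user:
        return None
    # Step 3: ask for the user's permission on the repository.
    perm = fetch_json(
        f"/repos/{org}/{repo}/collaborators/{user['login']}/permission", token
    )
    if not perm:
        return None
    # Step 4: map GitHub's permission level to read/write access (assumed mapping).
    level = perm.get("permission")
    if level in ("admin", "write"):
        return {"read", "write"}
    if level == "read":
        return {"read"}
    return None
```

Injecting `fetch_json` keeps the sketch free of any particular HTTP client and makes the flow easy to exercise with a fake.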

### `giftless.auth.github` Configuration Options
* `api_url` (`str` = `"https://api.github.com"`): Base URL for the GitHub API (enterprise servers have API at `"https://<custom-hostname>/api/v3/"`).
* `api_version` (`str | None` = `"2022-11-28"`): Target GitHub API version; set to `None` to use GitHub's latest (rather experimental).
* `cache` (`dict`): Cache configuration section
  * `token_max_size` (`int` = `32`): Max number of entries in the token -> user LRU cache. This cache holds the authentication data for a token. Evicted tokens will need to be re-authenticated.
  * `auth_max_size` (`int` = `32`): Max number of entries in each user's TTL (LRU) cache of [un]authorized org/repos. Evicted repos will need to get re-authorized.
  * `auth_write_ttl` (`float` = `15 * 60`): Max age [seconds] of user's org/repo authorizations able to `WRITE`. A repo writer will also need to be re-authorized after this period.
  * `auth_other_ttl` (`float` = `30`): Max age [seconds] of user's org/repo authorizations **not** able to `WRITE`. A repo reader or a rejected user will get a chance for a permission upgrade after this period.
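
To make the effect of the two TTL options more concrete, here is a minimal sketch of a cache that expires write-capable entries and all other entries at different ages. It is not the cache Giftless uses: the `AuthCache` class and its `clock` parameter are hypothetical, and the size limit (`auth_max_size`) is left out for brevity.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AuthCache:
    """Per-user cache of org/repo authorizations with two expiry times."""

    def __init__(
        self,
        auth_write_ttl: float = 15 * 60,
        auth_other_ttl: float = 30,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.auth_write_ttl = auth_write_ttl
        self.auth_other_ttl = auth_other_ttl
        self._clock = clock
        # (org, repo) -> (can_write, expiry timestamp)
        self._entries: Dict[Tuple[str, str], Tuple[bool, float]] = {}

    def put(self, org: str, repo: str, can_write: bool) -> None:
        """Store an authorization; writers are kept longer than others."""
        ttl = self.auth_write_ttl if can_write else self.auth_other_ttl
        self._entries[(org, repo)] = (can_write, self._clock() + ttl)

    def get(self, org: str, repo: str) -> Optional[bool]:
        """Return the cached write flag, or None if absent or expired."""
        entry = self._entries.get((org, repo))
        if entry is None:
            return None
        can_write, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(org, repo)]
            return None
        return can_write
```

The short default for non-writers means a reader or a rejected user who has just been granted more access on GitHub is re-checked soon, while established writers cause far fewer API calls.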

## Understanding Authentication and Authorization Providers

This part is more abstract, and will help you understand how Giftless handles
Expand Down Expand Up @@ -220,6 +252,10 @@ Very simply, an `Identity` object encapsulates information about the current use
request, and is expected to have the following interface:

```python
from typing import Optional
from giftless.auth.identity import Permission


class Identity:
    name: Optional[str] = None
    id: Optional[str] = None
Expand All @@ -244,9 +280,12 @@ Authorizer classes may use the default built-in `DefaultIdentity`, or implement
subclass of their own.

#### Permissions
Giftless defines the following permissions on entites:
Giftless defines the following permissions on entities:

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    READ_META = "read-meta"
Expand Down
2 changes: 1 addition & 1 deletion docs/source/conf.py
Expand Up @@ -11,7 +11,7 @@
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
import importlib
import importlib.metadata

from recommonmark.transform import AutoStructify

Expand Down
12 changes: 12 additions & 0 deletions docs/source/configuration.md
Expand Up @@ -126,5 +126,17 @@ clients using these URLs. By default, the JWT auth provider is used here.

There is typically no need to override the default behavior.

#### `LEGACY_ENDPOINTS`
This is a `bool` flag, default `true` (deprecated, use `false` where possible), that affects the base URI of all the service endpoints. Previously, the endpoints didn't adhere to the rules for [automatic LFS server discovery](https://github.com/git-lfs/git-lfs/blob/main/docs/api/server-discovery.md), which needed additional routing or client configuration.

The default base URI for all giftless endpoints is now `/<org_path>/<repo>.git/info/lfs` while the legacy one is `/<org>/<repo>`.

Collaborator: Cool. It may take me personally some time to convert all our repos, but having discovery work is awesome.

Collaborator Author: With this fast Giftless release pace... I'm really worried you might not make it in time! 😄

* `<org>` is a simple organization name not containing slashes (common for GitHub)
* `<org_path>` is a more versatile organization path which can contain slashes (common for GitLab)
* `<repo>` is a simple repository name not containing slashes

With `LEGACY_ENDPOINTS` set to `true`, **both the current and legacy** endpoints work simultaneously. When using the `basic_streaming` transfer adapter, the **legacy URI** is used for the object URLs in the batch API responses, for backward compatibility.

Setting `LEGACY_ENDPOINTS` to `false` makes everything use the current base URI; requests to the legacy URIs will get rejected.
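
The difference between the two base URIs can be illustrated with a small path classifier. This is a sketch under assumptions, not Giftless's routing code: `split_endpoint` and both regular expressions are hypothetical, written only from the URI shapes described above.

```python
import re
from typing import Optional, Tuple

# Current base URI: <org_path> may contain slashes, <repo> may not.
CURRENT_RE = re.compile(
    r"^/(?P<org_path>.+)/(?P<repo>[^/]+)\.git/info/lfs(?:/.*)?$"
)
# Legacy base URI: <org> and <repo> are single path segments.
LEGACY_RE = re.compile(r"^/(?P<org>[^/]+)/(?P<repo>[^/]+)(?:/.*)?$")


def split_endpoint(
    path: str, legacy_endpoints: bool = True
) -> Optional[Tuple[str, str, str]]:
    """Return (kind, organization, repository), or None if the path is rejected."""
    m = CURRENT_RE.match(path)
    if m:
        return ("current", m.group("org_path"), m.group("repo"))
    if legacy_endpoints:
        m = LEGACY_RE.match(path)
        if m:
            return ("legacy", m.group("org"), m.group("repo"))
    return None
```

The current form is tried first, so a path ending in `.git/info/lfs/...` is never mistaken for a legacy one; with `legacy_endpoints=False` anything else is rejected.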

#### `DEBUG`
If set to `true`, enables more verbose debugging output in logs.
58 changes: 58 additions & 0 deletions docs/source/github-lfs.md
@@ -0,0 +1,58 @@
Shadowing GitHub LFS
====================

This guide shows how to use Giftless as the LFS server for an existing GitHub repository (not using GitHub LFS). Thanks to a handful of tricks, it also acts as a full remote HTTPS-based `git` repository, making this a zero client configuration setup.

This guide uses `docker compose`, so you need to [install it](https://docs.docker.com/compose/install/). It also relies on you using HTTPS for cloning GitHub repos. The SSH way is not supported.

### Running docker containers
To run the setup, `git clone https://github.com/datopian/giftless`, step into the `examples/github-lfs` directory and run `docker compose up`.

This will run two containers:
- `giftless`: Locally built Giftless server configured to use solely the [GitHub authentication provider](auth-providers.md#github-authenticator) and a local docker compose volume as the storage backend.
- `proxy`: An [Envoy reverse proxy](https://www.envoyproxy.io/) which acts as the frontend listening on local port 5000, configured to route LFS traffic to `giftless` and pretty much anything else to `[api.]github.com`. **The proxy listens on unencrypted HTTP.** Setting the proxy up to provide TLS termination is very much possible, but isn't covered yet (your turn, thanks for the contribution!).

Feel free to explore the `compose.yaml`, which contains all the details.
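
To get a feel for the proxy's routing before reading the Envoy configuration, the sketch below reproduces its three route matchers in plain Python. The two regular expressions are copied from the route matchers in `compose.yaml`; the `route` function itself is a hypothetical stand-in for Envoy, which requires its `safe_regex` matchers to match the whole path (hence `fullmatch`).

```python
import re
from typing import Tuple

# Regexes copied from the proxy's route matchers in compose.yaml.
LFS_RE = re.compile(r"(?:/[^/]+){2,}\.git/info/lfs(?:/.*|$)")
API_RE = re.compile(r"/api/v\d(?:/(.*)|$)")


def route(path: str) -> Tuple[str, str]:
    """Return (upstream cluster, path sent upstream) for a request path."""
    if LFS_RE.fullmatch(path):
        return ("giftless", path)
    m = API_RE.fullmatch(path)
    if m:
        # Strip the /api/v# prefix, mirroring the regex_rewrite rule.
        return ("api_github_com", "/" + (m.group(1) or ""))
    return ("github_com", path)
```

Routes are tried in order, so only LFS paths reach `giftless`, GitHub Enterprise style `/api/v#/...` calls are rewritten for `api.github.com`, and everything else goes to `github.com` unchanged.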

### Cloning a GitHub repository via proxy
The frontend proxy forwards the usual `git` traffic to GitHub, so go there, pick or create a test repository where you have write access, and clone it via the proxy hostname (just change `github.com` for wherever you host):
```shell
git clone http://localhost:5000/$YOUR_ORG/$YOUR_REPO
```
When you don't use a credential helper, you might get asked a few times for the same credentials before the call gets through. [Make sure to get one](https://git-scm.com/doc/credential-helpers) before it drives you insane.

Thanks to the [automatic LFS server discovery](https://github.com/git-lfs/git-lfs/blob/main/docs/api/server-discovery.md) this is all you should need to become LFS-enabled!
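
The discovery rule the client applies is simple: it appends `.git/info/lfs` to the remote URL, or just `/info/lfs` if the URL already ends in `.git`. A minimal sketch of that rule follows; the function name `lfs_endpoint` is hypothetical, and stripping a trailing slash is an assumption of this sketch rather than something stated by the guide.

```python
def lfs_endpoint(remote_url: str) -> str:
    """Derive the default LFS server URL from a Git remote URL."""
    base = remote_url.rstrip("/")  # assumption: ignore a trailing slash
    if not base.endswith(".git"):
        base += ".git"
    return base + "/info/lfs"
```

For the clone above, this yields `http://localhost:5000/$YOUR_ORG/$YOUR_REPO.git/info/lfs`, which is exactly the path shape the proxy routes to `giftless`.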

### Pushing binary blobs
Let's try pushing some binary blobs then! See also [Quickstart](quickstart.md#create-a-local-repository-and-push-some-file).
```shell
# create some blob
dd if=/dev/urandom of=blob.bin bs=1M count=1
# make it tracked by LFS
git lfs track blob.bin
# the LFS tracking is written in .gitattributes, which you also want committed
git add .gitattributes blob.bin
git commit -m 'Hello LFS!'
# push it, assuming the local branch is main
# this might fail the first time, when git automatically runs 'git config lfs.locksverify false'
git push -u origin main
```

This should eventually succeed, and you will find the LFS digest in place of the blob on GitHub and the binary blob on your local storage:
```shell
docker compose exec -it giftless find /lfs-storage
/lfs-storage
/lfs-storage/$YOUR_ORG
/lfs-storage/$YOUR_ORG/$YOUR_REPO
/lfs-storage/$YOUR_ORG/$YOUR_REPO/deadbeefb10bb10bad40beaa8c68c4863e8b00b7e929efbc6dcdb547084b01
```
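
The last path component in the listing is the object's LFS ID, which Git LFS defines as the SHA-256 digest of the file contents. If you want to check which stored file corresponds to your blob, a sketch like the one below computes it locally. The helper names are hypothetical, and the `<root>/<org>/<repo>/<oid>` layout is taken from the listing above, not from a documented guarantee.

```python
import hashlib
from pathlib import PurePosixPath


def lfs_oid(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 hex digest Git LFS uses as the object ID."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large blobs do not have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_storage_path(
    org: str, repo: str, blob: str, root: str = "/lfs-storage"
) -> PurePosixPath:
    """Return where the blob should appear inside the giftless container."""
    return PurePosixPath(root) / org / repo / lfs_oid(blob)
```

Calling `expected_storage_path("<your org>", "<your repo>", "blob.bin")` should give a path that appears in the `find` output.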

Next time anyone clones the repo (via the proxy), the binary blob will get properly downloaded. Failing to use the proxy hostname will make `git` use GitHub's own LFS, which is a paid service you are obviously trying to avoid.

### Service teardown

Finally, to shut down your containers, break (`^C`) the current compose run and clean up dead containers with:
```shell
docker compose down [--volumes]
```
Using `--volumes` tears down the `lfs-storage` volume too, so make sure it's what you wanted.
1 change: 1 addition & 0 deletions docs/source/guides.rst
Expand Up @@ -9,3 +9,4 @@ This section includes several how-to guides designed to get you started with Gif
quickstart
using-gcs
jwt-auth-guide
github-lfs
6 changes: 6 additions & 0 deletions examples/github-lfs/.env
@@ -0,0 +1,6 @@
# listening (proxy) port on the host
SERVICE_PORT=5000
# inner port giftless listens on
GIFTLESS_PORT=5000
# inner port the reverse proxy listens on
PROXY_PORT=8080
162 changes: 162 additions & 0 deletions examples/github-lfs/compose.yaml
@@ -0,0 +1,162 @@
name: github-lfs

volumes:
  lfs-storage: {}

services:
  giftless:
    image: docker.io/datopian/giftless:latest

Collaborator: We might actually want to point to the lsst-sqre one for now. I haven't been making any headway on getting access to PyPi and Docker from Datopian. And honestly I don't want write access to their repositories, but I would like to put newer versions out there.

Collaborator Author (vit-zikmund, May 14, 2024): Well, that's a [Datopian] shame, but in this example it's just a local name (as the pull policy is cleverly set to Never 🤣) The image is always being built locally (only using that one as the cache source - and that itself is of questionable benefit at the moment - it's there to reflect exactly how the Makefile does it)

    volumes:
      - lfs-storage:/lfs-storage
    environment:
      GIFTLESS_DEBUG: "1"
      GIFTLESS_CONFIG_STR: |
        # use endpoints at /<org>/<repo>.git/info/lfs/ only
        LEGACY_ENDPOINTS: false
        AUTH_PROVIDERS:
          - factory: giftless.auth.github:factory
        TRANSFER_ADAPTERS:
          basic:
            factory: giftless.transfer.basic_streaming:factory
            options:
              # use the lfs-storage volume as local storage
              storage_class: giftless.storage.local_storage:LocalStorage
              storage_options:
                path: /lfs-storage
        # disable the default JWT pre-auth provider, object up/downloads get also authorized via GitHub
        PRE_AUTHORIZED_ACTION_PROVIDER: null
    command: "--http=0.0.0.0:$GIFTLESS_PORT -M -T --threads 2 -p 2 --manage-script-name --callable app"
    pull_policy: never # prefer local build
    build:
      cache_from:
        - docker.io/datopian/giftless:latest
      context: ../..

  proxy:
    image: docker.io/envoyproxy/envoy:v1.30-latest
    configs:
      - source: envoy
        target: /etc/envoy/envoy.yaml
    command: "/usr/local/bin/envoy -c /etc/envoy/envoy.yaml"
    ports:
      - "$SERVICE_PORT:$PROXY_PORT"
    depends_on:
      giftless:
        condition: service_started

configs:
  envoy:
    content: |
      static_resources:
        listeners:
          - address:
              socket_address:
                address: 0.0.0.0
                port_value: $PROXY_PORT # proxy port
            filter_chains:
              - filters:
                  - name: envoy.filters.network.http_connection_manager
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                      stat_prefix: ingress_http
                      http_filters:
                        - name: envoy.filters.http.router
                          typed_config:
                            "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                            suppress_envoy_headers: true
                      access_log:
                        - name: envoy.access_loggers.file
                          typed_config:
                            "@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
                            path: /dev/stdout
                      generate_request_id: false
                      preserve_external_request_id: true
                      route_config:
                        name: ingress_route
                        virtual_hosts:
                          - name: giftless
                            domains:
                              - "*"
                            routes:
                              - name: giftless
                                # Only this goes to the giftless service
                                match:
                                  safe_regex:
                                    regex: (?:/[^/]+){2,}\.git/info/lfs(?:/.*|$)
                                route:
                                  timeout: 0s # don't break long-running downloads
                                  cluster: giftless
                              - name: api_github_com
                                # Routing 3rd party tools assuming this is a GitHub Enterprise URL /api/v#/X to public api.github.com/X
                                match:
                                  safe_regex: &api_regex
                                    regex: /api/v\d(?:/(.*)|$)
                                route:
                                  regex_rewrite:
                                    pattern: *api_regex
                                    substitution: /\1
                                  host_rewrite_literal: api.github.com
                                  timeout: 3600s
                                  cluster: api_github_com
                                request_headers_to_remove:
                                  - x-forwarded-proto
                              - name: github_com
                                # Anything else is forwarded directly to GitHub
                                match:
                                  prefix: "/"
                                route:
                                  host_rewrite_literal: github.com
                                  timeout: 3600s
                                  cluster: github_com
                                request_headers_to_remove:
                                  - x-forwarded-proto
        clusters:
          - name: giftless
            connect_timeout: 0.25s
            type: strict_dns
            lb_policy: round_robin
            load_assignment:
              cluster_name: giftless
              endpoints:
                - lb_endpoints:
                    - endpoint:
                        address:
                          socket_address:
                            address: giftless # inner giftless hostname
                            port_value: $GIFTLESS_PORT # local giftless port
          - name: api_github_com
            type: logical_dns
            # Comment out the following line to test on v6 networks
            dns_lookup_family: v4_only
            load_assignment:
              cluster_name: api_github_com
              endpoints:
                - lb_endpoints:
                    - endpoint:
                        address:
                          socket_address:
                            address: api.github.com
                            port_value: 443
            transport_socket:
              name: envoy.transport_sockets.tls
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
                sni: api.github.com
          - name: github_com
            type: logical_dns
            # Comment out the following line to test on v6 networks
            dns_lookup_family: v4_only
            load_assignment:
              cluster_name: github_com
              endpoints:
                - lb_endpoints:
                    - endpoint:
                        address:
                          socket_address:
                            address: github.com
                            port_value: 443
            transport_socket:
              name: envoy.transport_sockets.tls
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
                sni: github.com