Merge pull request #55 from wtsi-npg/devel
Release to master
nerdstrike authored Oct 16, 2023
2 parents 8a6dbaf + bd9f0f0 commit 9a43979
Showing 38 changed files with 1,079 additions and 394 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-app.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Set up Python 3.10
uses: actions/setup-python@v2
with:
5 changes: 4 additions & 1 deletion .gitignore
@@ -3,4 +3,7 @@
__pycache__
*.egg-info
.vscode
.eggs
.eggs
build
.pytest_cache
.vscode
34 changes: 28 additions & 6 deletions README.md
@@ -40,13 +40,34 @@ cd server
mkdir -p logs
export DB_URL=postgresql+asyncpg://npg_rw:$PASS@npg_porch_db:$PORT/$DATABASE
export DB_SCHEMA='non_default'
uvicorn main:app --host 0.0.0.0 --port 8080 --reload --log-config logging.json
uvicorn npg.main:app --host 0.0.0.0 --port 8080 --reload --log-config logging.json
```

and open your browser at `http://localhost:8080` to see links to the docs.

The server will not start unless `DB_URL` is set in the environment.
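This fail-fast behaviour can be sketched as follows (an illustration only, not the server's actual startup code; the placeholder URL mirrors the `DB_URL` example above):

```python
import os

def require_db_url() -> str:
    """Mirror the fail-fast check: refuse to start without DB_URL."""
    db_url = os.environ.get("DB_URL")
    if db_url is None:
        raise RuntimeError(
            "DB_URL must be set, e.g. postgresql+asyncpg://user:pass@host:5432/db"
        )
    return db_url

# Example usage with a placeholder URL:
os.environ.setdefault(
    "DB_URL", "postgresql+asyncpg://npg_rw:secret@npg_porch_db:5432/npg_porch"
)
print(require_db_url())
```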

## Running in production

When you want HTTPS, logging and all that jazz:

```bash
uvicorn main:app --workers 2 --host 0.0.0.0 --port 8080 --log-config ~/logging.json --ssl-keyfile ~/.ssh/key.pem --ssl-certfile ~/.ssh/cert.pem --ssl-ca-certs /usr/local/share/ca-certificates/institute_ca.crt
```

Consider running with nohup or similar.

Some notes on the arguments:

- `--workers`: how many pre-forked worker processes to run. Async handling means few should be needed, but each worker directly increases memory consumption.
- `--host`: `0.0.0.0` binds to all network interfaces. Reliable, but greedy in some situations.
- `--log-config`: a JSON file for the Python logging library. An example file is found in `/server/logging.json`. Uvicorn provides its own logging configuration via the `uvicorn.access` and `uvicorn.error` loggers. These may behave undesirably, and can be overridden in the JSON file with an alternate config. Likewise, FastAPI logs to the `fastapi` logger if that needs filtering. When logging to files, set `use_colors = False` in the relevant handlers, or shell colour codes will appear as garbage in the logs.
- `--ssl-keyfile`: a PEM-format key for the server certificate.
- `--ssl-certfile`: a PEM-format certificate for securing HTTPS communications.
- `--ssl-ca-certs`: a CRT-format certificate authority file that satisfies picky clients. Uvicorn does not appear to find the system certificates automatically.
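To make the `--log-config` notes concrete, here is a minimal dictConfig-style sketch of the structure such a JSON file holds. This is illustrative only, not the repository's `server/logging.json`; with uvicorn's own `uvicorn.logging.DefaultFormatter` you would also set `"use_colors": false` when logging to files.

```python
import logging
import logging.config

# Minimal sketch of the structure --log-config expects (stored as JSON on disk).
config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        # With uvicorn's DefaultFormatter you would add "use_colors": False here.
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "formatter": "plain"},
    },
    "loggers": {
        # Override uvicorn's own loggers so their output goes through our handler.
        "uvicorn.access": {"handlers": ["console"], "level": "INFO", "propagate": False},
        "uvicorn.error": {"handlers": ["console"], "level": "INFO", "propagate": False},
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}

logging.config.dictConfig(config)
```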

## Testing

```bash
@@ -71,11 +92,15 @@ Create a schema on a postgres server:

```bash
psql --host=npg_porch_db --port=$PORT --username=npg_admin --password -d postgres
```

CREATE SCHEMA npg_porch
```sql
CREATE SCHEMA npg_porch;
SET search_path = npg_porch, public;
GRANT USAGE ON SCHEMA npg_porch TO npgtest_ro, npgtest_rw;
```

Then run a script that deploys the ORM to this schema
The `SET` command ensures that the new schema is visible _for one session only_ in the `\d*` commands you might use in psql. Then run a script that deploys the ORM to this schema:

```bash
DB=npg_porch
@@ -89,7 +114,6 @@ psql --host=npg_porch_db --port=$PORT --username=npg_admin --password -d $DB
Permissions must be granted to the npg_rw and npg_ro users to the newly created schema

```sql
GRANT USAGE ON SCHEMA npg_porch TO npgtest_ro, npgtest_rw;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA npg_porch TO npgtest_rw;
GRANT SELECT ON ALL TABLES IN SCHEMA npg_porch TO npgtest_ro;

@@ -98,6 +122,4 @@ GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA npg_porch TO npgtes

Note that granting usage on sequences is required to allow autoincrement columns to work during an insert. This is a quirk of newer Postgres versions.

It may prove necessary to `GRANT` to specific named tables and sequences. Under specific circumstances the `ALL TABLES` qualifier doesn't work.

Until token support is implemented, a row will need to be inserted manually into the token table. Otherwise none of the event logging works.
25 changes: 17 additions & 8 deletions docs/user_guide.md
@@ -18,7 +18,15 @@ Bash tools like `jq` and `jo` can be useful in working with the server, as all m

We have tried to make interactions with npg_porch as atomic as possible, so the data you send and the data you receive follow the same schema.

Security is necessary in order to prevent accidental misuse of npg_porch. An authorisation token can be provided to you by the maintainers, which you will then use to enable each request. Not implemented yet!
Security is necessary in order to prevent accidental misuse of npg_porch. An authorisation token can be provided to you by the maintainers, which you will then use to enable each request.

A note on HTTPS: client libraries like `requests`, certain GUIs, and Firefox will try to verify the server's certificate authority. System-administered software is already configured correctly, but packages installed via conda or pip may need to be told where to find the CA certificates, e.g. in `/usr/share/ca-certificates/`. It may also be useful to unset `https_proxy` and `HTTPS_PROXY` in your environment.
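The CA-verification point can be illustrated with the standard library (the bundle path is hypothetical; `requests` users can pass the same path via the `verify=` argument):

```python
import ssl

# Hypothetical institutional CA bundle -- substitute your site's file.
CA_BUNDLE = "/usr/share/ca-certificates/institute_ca.crt"

# A default context enforces certificate and hostname verification,
# which is what picky clients do when talking to the server.
ctx = ssl.create_default_context()
# ctx.load_verify_locations(CA_BUNDLE)  # uncomment once the bundle exists

print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```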

### Step 0 - get issued security tokens

Access to the service is loosely controlled with authorisation tokens. You will be issued an admin token that enables you to register pipelines, and further tokens for pipeline-specific communication. Please do not share tokens, or use them for purposes other than their designated pipeline. This will help us to monitor pipeline reliability and quality of service. Authorisation is achieved with an HTTP bearer token:

`curl -L -H "Authorization: Bearer $TOKEN" https://$SERVER:$PORT`

### Step 1 - register your pipeline with npg_porch

Expand All @@ -38,7 +46,7 @@ You can name your pipeline however you like, but the name must be unique, and be
}
```

`url='npg_porch_server.sanger.ac.uk/pipelines'; curl -L -XPOST ${url} -H "content-type: application/json" -w " %{http_code}" -d @pipeline-def.json`
`url='https://$SERVER:$PORT/pipelines'; curl -L -XPOST ${url} -H "content-type: application/json" -H "Authorization: Bearer $ADMIN_TOKEN" -w " %{http_code}" -d @pipeline-def.json`

Keep this pipeline definition with your data, as you will need it to tell npg_porch which pipeline you are acting on.

@@ -110,9 +118,9 @@ Note that it is possible to run the same `task_input` with a different `pipeline
Now you want the pipeline to run once per specification, and so register the documents with npg_porch.

```bash
url='npg_porch_server.sanger.ac.uk/tasks'
url='https://$SERVER:$PORT/tasks'
for DOC in *.json; do
response=$(curl -w '%{http_code}' -L -XPOST ${url} -H "content-type: application/json" -d @${DOC}`)
response=$(curl -w '%{http_code}' -L -XPOST ${url} -H "content-type: application/json" -H "Authorization: Bearer $TOKEN" -d @${DOC})
# parsing the response is left as an exercise for the reader...
if [[ "$response" -ne 201 ]]; then
@@ -128,7 +136,7 @@ use HTTP::Request;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(POST => 'npg_porch_server.sanger.ac.uk/tasks');
my $request = HTTP::Request->new(POST => 'https://$SERVER:$PORT/tasks');
$request->content_type('application/json');
$request->header(Accept => 'application/json');
$request->content($DOC);
@@ -180,8 +188,8 @@ Supposing there are new tasks created every 24 hours, we then also need a client
Using the "claim" interface, you can ask npg_porch to earmark tasks that you intend to run. Others will remain unclaimed until this script or another claims them. Generally speaking, tasks are first-in first-out, so the first task you get if you claim one is the first unclaimed task npg_porch was told about.
```bash
url='npg_porch_server.sanger.ac.uk/tasks/claim'
response=$(curl -L -I -XPOST ${url} -H "content-type: application/json" -d @pipeline-def.json)
url='https://$SERVER:$PORT/tasks/claim'
response=$(curl -L -I -XPOST ${url} -H "content-type: application/json" -H "Authorization: Bearer $TOKEN" -d @pipeline-def.json)
```
Response body:
@@ -216,9 +224,10 @@ or
use JSON qw/decode_json/;
my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(POST => 'npg_porch_server.sanger.ac.uk/tasks/claim');
my $request = HTTP::Request->new(POST => 'https://$SERVER:$PORT/tasks/claim');
$request->content_type('application/json');
$request->header(Accept => 'application/json');
$request->header(Authorization => "Bearer $TOKEN");
my $response = $ua->request($request);
if ($response->is_success) {
33 changes: 33 additions & 0 deletions pyproject.toml
@@ -0,0 +1,33 @@
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "npg_porch"
requires-python = ">=3.10"
authors = [{name="Marina Gourtovaia", email="[email protected]"}, {name="Kieron Taylor", email="[email protected]"}]
description = "API server for tracking unique workflow executions"
readme = "README.md"
license = {file = "LICENSE.md"}
dependencies = [
"aiosqlite",
"asyncpg",
"fastapi",
"pydantic > 2.0.0",
"pysqlite3",
"psycopg2-binary",
"sqlalchemy >2",
"ujson",
"uvicorn",
"uuid"
]
dynamic = ["version"]

[project.optional-dependencies]
test = [
"pytest",
"pytest-asyncio",
"requests",
"flake8",
"httpx"
]
7 changes: 6 additions & 1 deletion server/deploy_schema.py → scripts/deploy_schema.py
@@ -10,7 +10,12 @@
if schema_name is None:
schema_name = 'npg_porch'

engine = sqlalchemy.create_engine(db_url)
print(f'Deploying npg_porch tables to schema {schema_name}')

engine = sqlalchemy.create_engine(
db_url,
connect_args={'options': f'-csearch_path={schema_name}'}
)

npg.porchdb.models.Base.metadata.schema = schema_name
npg.porchdb.models.Base.metadata.create_all(engine)
75 changes: 75 additions & 0 deletions scripts/issue_token.py
@@ -0,0 +1,75 @@
#!/usr/bin/env python

import argparse
from sqlalchemy import create_engine, select
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.exc import NoResultFound

from npg.porchdb.models import Token, Pipeline

parser = argparse.ArgumentParser(
description='Creates a token in the backend DB and returns it'
)

parser.add_argument(
'-H', '--host', help='Postgres host', required=True
)
parser.add_argument(
'-d', '--database', help='Postgres database', default='npg_porch'
)
parser.add_argument(
'-s', '--schema', help='Postgres schema', default='npg_porch'
)
parser.add_argument(
'-u', '--user', help='Postgres rw user', required=True
)
parser.add_argument(
'-p', '--password', help='Postgres rw password', required=True
)
parser.add_argument(
'-P', '--port', help='Postgres port', required=True
)
parser.add_argument(
'-n', '--pipeline', help='Pipeline name. If given, create '
)
parser.add_argument(
'-D', '--description', help='Description of token purpose', required=True
)

args = parser.parse_args()


db_url = f'postgresql+psycopg2://{args.user}:{args.password}@{args.host}:{args.port}/{args.database}'

engine = create_engine(db_url, connect_args={'options': f'-csearch_path={args.schema}'})
SessionFactory = sessionmaker(bind=engine)
session = SessionFactory()

token = None
pipeline = None

if args.pipeline:
try:
pipeline = session.execute(
select(Pipeline)
.where(Pipeline.name == args.pipeline)
).scalar_one()
except NoResultFound:
raise Exception(
'Pipeline with name {} not found in database'.format(args.pipeline)
)

token = Token(
pipeline=pipeline,
description=args.description
)
else:
token = Token(description=args.description)

session.add(token)
session.commit()

print(token.token)

session.close()
engine.dispose()
File renamed without changes.
File renamed without changes.
File renamed without changes.
45 changes: 45 additions & 0 deletions server/npg/porch/auth/token.py
@@ -0,0 +1,45 @@
# Copyright (C) 2022 Genome Research Ltd.
#
# Author: Kieron Taylor [email protected]
# Author: Marina Gourtovaia [email protected]
#
# This file is part of npg_porch
#
# npg_porch is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 3 of the License, or (at your option) any
# later version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along with
# this program. If not, see <http://www.gnu.org/licenses/>.

import logging
from fastapi import Depends
from fastapi.security import HTTPBearer
from fastapi import HTTPException

from npg.porchdb.connection import get_CredentialsValidator
from npg.porchdb.auth import CredentialsValidationException

auth_scheme = HTTPBearer()

async def validate(
creds = Depends(auth_scheme),
validator = Depends(get_CredentialsValidator)
):

token = creds.credentials
p = None
try:
p = await validator.token2permission(token)
except CredentialsValidationException as e:
logger = logging.getLogger(__name__)
logger.warning(str(e))
raise HTTPException(status_code=403, detail="Invalid token")

return p
