
Release to master #55

Merged
merged 52 commits on Oct 16, 2023
7cf101d
Token-based authorization.
mgcam Mar 3, 2022
ce17035
correct name for the test file
mgcam Mar 9, 2022
60aa467
Bug fix: need left join to link tokens to pipelines
mgcam Mar 9, 2022
ced9197
Provide clue when testing mode not enabled
nerdstrike Mar 7, 2022
5adeeba
Remove spurious test mark
nerdstrike Mar 7, 2022
1192809
Document some settings for production use
nerdstrike Mar 9, 2022
6230a85
Improve regexp for the token string.
mgcam Mar 9, 2022
93dea5e
Use custom exception for token validation
mgcam Mar 9, 2022
908c868
Exception declaration and propagation
mgcam Mar 9, 2022
24b40ad
Copyright notices added/fixed
mgcam Mar 9, 2022
d314852
Token auth for request to create a pipeline.
mgcam Mar 10, 2022
45fa130
Merge pull request #33 from mgcam/auth4pipelines
nerdstrike Mar 10, 2022
e2c00c7
Merge pull request #34 from nerdstrike/quality_of_life
mgcam Mar 10, 2022
36f7271
Copyright notices added/fixed
mgcam Mar 9, 2022
ad032b9
Simple token auth for task endpoints
mgcam Mar 10, 2022
1d75aed
Pipeline-aware token auth for task endpoints
mgcam Mar 11, 2022
18a69b7
Fix typo
mgcam Mar 14, 2022
4f36cb6
Simplified class definitions for custom exceptions.
mgcam Mar 14, 2022
cfca748
Merge pull request #35 from mgcam/token_auth4task_endpoints
nerdstrike Mar 14, 2022
03e0da1
Update the user guide to describe token use.
nerdstrike Mar 15, 2022
10e518f
Addendum about proxies
nerdstrike Mar 15, 2022
75e1b15
Mask some tool baggage
nerdstrike Mar 15, 2022
97f05d6
Big refactor to make package install and be testable in develop and r…
nerdstrike Mar 17, 2022
253de7d
Make scripts and server correctly use postgres schema
nerdstrike Mar 22, 2022
5282e51
Prevent tests from receiving postgres driver options
nerdstrike Mar 23, 2022
59453b6
Make user guide refer to https wherever possible
nerdstrike Mar 24, 2022
fb82a89
Merge pull request #36 from nerdstrike/token_admin
mgcam Mar 24, 2022
9f4b71b
Complete get_tasks coverage and allow filtering by pipeline name and …
nerdstrike Apr 8, 2022
0e6384e
Merge pull request #42 from nerdstrike/filtered_tasks
mgcam Apr 9, 2022
90dc184
Newer starlette releases require httpx, and redirect handling has cha…
nerdstrike Dec 13, 2022
8661051
Merge pull request #46 from wtsi-npg/fix_starlette_dependency
mgcam Dec 16, 2022
cfab67a
Move from checkout@v2 to @v3
kjsanger Dec 16, 2022
23cdb35
Merge pull request #47 from wtsi-npg/kjsanger-patch-2
jmtcsngr Dec 16, 2022
6b36e5c
Prevent sqlalchemy 2, pydantic 2 from installing
nerdstrike Aug 15, 2023
f93107d
Merge branch 'devel' into stabilize_deps
nerdstrike Aug 15, 2023
2016f5d
Merge pull request #49 from nerdstrike/stabilize_deps
nerdstrike Aug 15, 2023
dc38619
Switch to pyproject file for package setup
nerdstrike Jul 26, 2023
36a996a
Use setup.cfg to find modules, remove redundant setup.py
nerdstrike Aug 22, 2023
c4ee822
Annoying httpx dependency absence
nerdstrike Aug 22, 2023
497acdd
Push to python 3.10
nerdstrike Aug 22, 2023
34b39ae
Merge pull request #48 from nerdstrike/update_packaging
mgcam Aug 22, 2023
010b0c0
Compatibility upgrade to use sqlachemy 2
nerdstrike Aug 22, 2023
f870e48
Adjust pydantic use until tests pass.
nerdstrike Aug 24, 2023
9c3e56d
Satisfy pydantic deprecation warnings
nerdstrike Aug 24, 2023
fe491c5
Merge pull request #50 from nerdstrike/sqlalchemy_upgrade
mgcam Aug 25, 2023
aae9ee3
Merge pull request #51 from nerdstrike/upgrade_pydantic
mgcam Aug 25, 2023
7432a8e
Use best/better practices for type hints and pydantic constraints.
nerdstrike Oct 11, 2023
3bae5f3
Merge pull request #52 from nerdstrike/pydantic_second_pass
mgcam Oct 11, 2023
3a0b662
Try to prevent dormant DB connection from causing 500 errors for the …
nerdstrike Oct 12, 2023
f96876d
Merge pull request #53 from nerdstrike/db_reconnect
mgcam Oct 12, 2023
80ca8fc
Version update
nerdstrike Oct 16, 2023
bd9f0f0
Merge pull request #54 from nerdstrike/devel
nerdstrike Oct 16, 2023
2 changes: 1 addition & 1 deletion .github/workflows/python-app.yml
@@ -15,7 +15,7 @@ jobs:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v3
- name: Set up Python 3.10
uses: actions/setup-python@v2
with:
5 changes: 4 additions & 1 deletion .gitignore
@@ -3,4 +3,7 @@
__pycache__
*.egg-info
.vscode
.eggs
.eggs
build
.pytest_cache
.vscode
34 changes: 28 additions & 6 deletions README.md
@@ -40,13 +40,34 @@ cd server
mkdir -p logs
export DB_URL=postgresql+asyncpg://npg_rw:$PASS@npg_porch_db:$PORT/$DATABASE
export DB_SCHEMA='non_default'
uvicorn main:app --host 0.0.0.0 --port 8080 --reload --log-config logging.json
uvicorn npg.main:app --host 0.0.0.0 --port 8080 --reload --log-config logging.json
```

and open your browser at `http://localhost:8080` to see links to the docs.

The server will not start without `DB_URL` set in the environment.

## Running in production

When you want HTTPS, logging and all that jazz:

```bash
uvicorn npg.main:app --workers 2 --host 0.0.0.0 --port 8080 --log-config ~/logging.json --ssl-keyfile ~/.ssh/key.pem --ssl-certfile ~/.ssh/cert.pem --ssl-ca-certs /usr/local/share/ca-certificates/institute_ca.crt
```

Consider running with nohup or similar.

Some notes on arguments:

- `--workers`: How many pre-forked processes to run. Async should mean we don't need many; each worker directly increases memory consumption.
- `--host`: `0.0.0.0` binds to all network interfaces. Reliable, but greedy in some situations.
- `--log-config`: Refers to a JSON config file for the Python logging library. An example file is found in `/server/logging.json`. Uvicorn provides its own logging configuration via `uvicorn.access` and `uvicorn.error`. These may behave undesirably, and can be overridden in the JSON file with an alternate config. Likewise, FastAPI logs to `fastapi` if that needs filtering. For logging to files, set `use_colors = False` in the relevant handlers, or shell colour codes will appear as garbage in the logs.
- `--ssl-keyfile`: A PEM-format key for the server certificate.
- `--ssl-certfile`: A PEM-format certificate for signing HTTPS communications.
- `--ssl-ca-certs`: A CRT-format certificate authority file that pleases picky clients. Uvicorn does not appear to find the system certificates automatically.
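For file logging without colour garbage, a minimal `--log-config` file might look like the sketch below. This is illustrative only, not the repository's actual `server/logging.json`; the handler and formatter names are made up, and it simply routes the uvicorn loggers to a plain file handler so no colour codes are emitted.

```json
{
  "version": 1,
  "disable_existing_loggers": false,
  "formatters": {
    "plain": {"format": "%(asctime)s %(levelname)s %(name)s %(message)s"}
  },
  "handlers": {
    "file": {
      "class": "logging.FileHandler",
      "filename": "logs/server.log",
      "formatter": "plain"
    }
  },
  "loggers": {
    "uvicorn.access": {"handlers": ["file"], "level": "INFO", "propagate": false},
    "uvicorn.error": {"handlers": ["file"], "level": "INFO", "propagate": false}
  }
}
```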

## Testing

@@ -71,11 +92,15 @@

Create a schema on a postgres server:

```bash
psql --host=npg_porch_db --port=$PORT --username=npg_admin --password -d postgres
```

CREATE SCHEMA npg_porch
```sql
CREATE SCHEMA npg_porch;
SET search_path = npg_porch, public;
GRANT USAGE ON SCHEMA npg_porch TO npgtest_ro, npgtest_rw;
```

Then run a script that deploys the ORM to this schema
The `SET` command ensures that the new schema is visible _for one session only_ in the `\d*` commands you might use in psql. Then run a script that deploys the ORM to this schema:

```bash
DB=npg_porch
psql --host=npg_porch_db --port=$PORT --username=npg_admin --password -d $DB
```
Permissions must be granted to the npg_rw and npg_ro users to the newly created schema

```sql
GRANT USAGE ON SCHEMA npg_porch TO npgtest_ro, npgtest_rw;
GRANT USAGE ON ALL SEQUENCES IN SCHEMA npg_porch TO npgtest_rw;
GRANT SELECT ON ALL TABLES IN SCHEMA npg_porch TO npgtest_ro;

GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA npg_porch TO npgtest_rw;
```

Note that granting usage on sequences is required to allow autoincrement columns to work during an insert. This is a quirk of newer Postgres versions.

It may prove necessary to `GRANT` to specific named tables and sequences. Under specific circumstances the `ALL TABLES` qualifier doesn't work.

Until token support is implemented, a row will need to be inserted manually into the token table. Otherwise none of the event logging works.
25 changes: 17 additions & 8 deletions docs/user_guide.md
@@ -18,7 +18,15 @@ Bash tools like `jq` and `jo` can be useful in working with the server, as all m

We have tried to make interactions with npg_porch as atomic as possible, so the data you send and the data you receive follow the same schema.

Security is necessary in order to prevent accidental misuse of npg_porch. An authorisation token can be provided to you by the maintainers, which you will then use to enable each request. Not implemented yet!
Security is necessary in order to prevent accidental misuse of npg_porch. An authorisation token can be provided to you by the maintainers, which you will then use to enable each request.

A note on HTTPS: client libraries like `requests`, certain GUIs, and Firefox will try to verify the server certificate authority. System-administered software is already configured correctly, but packages installed via conda or pip may need to be told how to verify the server, e.g. with a CA certificate contained in `/usr/share/ca-certificates/`. It may also be useful to unset `https_proxy` and `HTTPS_PROXY` in your environment.

### Step 0 - get issued security tokens

Access to the service is loosely controlled with authorisation tokens. You will be issued with an admin token that enables you to register pipelines, and further tokens for pipeline-specific communication. Please do not share the tokens, or use them for purposes other than their specific pipeline. This helps us to monitor pipeline reliability and quality of service. Authorisation is achieved with an HTTP Bearer token:

`curl -L -H "Authorization: Bearer $TOKEN" https://$SERVER:$PORT`
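Scripted clients attach the same Bearer header. A minimal Python sketch using only the standard library (the server URL, the environment variable names, and the fallback token value are placeholders, not part of npg_porch):

```python
import os
import urllib.request

# Hypothetical environment variables standing in for $SERVER:$PORT and $TOKEN
base_url = os.environ.get("NPG_PORCH_URL", "https://npg_porch_server.example:8080")
token = os.environ.get("NPG_PORCH_TOKEN", "0" * 32)

def bearer_request(path: str) -> urllib.request.Request:
    # Every request to the server carries the token in an Authorization header
    return urllib.request.Request(
        f"{base_url}{path}",
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )

# Build (but do not yet send) a request to the pipelines endpoint
req = bearer_request("/pipelines")
```

Sending it is then a matter of `urllib.request.urlopen(req)`, or the equivalent in `requests` if that is installed.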

### Step 1 - register your pipeline with npg_porch

@@ -38,7 +46,7 @@ You can name your pipeline however you like, but the name must be unique, and be

`url='npg_porch_server.sanger.ac.uk/pipelines'; curl -L -XPOST ${url} -H "content-type: application/json" -w " %{http_code}" -d @pipeline-def.json`
`url='https://$SERVER:$PORT/pipelines'; curl -L -XPOST ${url} -H "content-type: application/json" -H "Authorization: Bearer $ADMIN_TOKEN" -w " %{http_code}" -d @pipeline-def.json`

Keep this pipeline definition with your data, as you will need it to tell npg_porch which pipeline you are acting on.
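A pipeline definition file can also be generated programmatically rather than written by hand. A sketch follows; the field names `uri` and `version` are assumptions for illustration (only the unique `name` requirement is stated above):

```python
import json

# Hypothetical pipeline definition; the server requires a unique name
pipeline_def = {
    "name": "my_pipeline",
    "uri": "https://github.com/example/my_pipeline",
    "version": "1.0",
}

# Write it out for use with curl's -d @pipeline-def.json
with open("pipeline-def.json", "w") as fh:
    json.dump(pipeline_def, fh, indent=2)
```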

@@ -110,9 +118,9 @@ Note that it is possible to run the same `task_input` with a different `pipeline
Now you want the pipeline to run once per specification, and so register the documents with npg_porch.

```bash
url='npg_porch_server.sanger.ac.uk/tasks'
url='https://$SERVER:$PORT/tasks'
for DOC in *.json; do
response=$(curl -w '%{http_code}' -L -XPOST ${url} -H "content-type: application/json" -d @${DOC}`)
response=$(curl -w '%{http_code}' -L -XPOST ${url} -H "content-type: application/json" -H "Authorization: Bearer $TOKEN" -d @${DOC})

# parsing the response is left as an exercise for the reader...
if [[ "$response" -ne 201 ]]; then
```

```perl
use HTTP::Request;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(POST => 'npg_porch_server.sanger.ac.uk/tasks');
my $request = HTTP::Request->new(POST => 'https://$SERVER:$PORT/tasks');
$request->content_type('application/json');
$request->header(Accept => 'application/json');
$request->content($DOC);
```

Supposing there are new tasks created every 24 hours, we then also need a client
Using the "claim" interface, you can ask npg_porch to earmark tasks that you intend to run. Others will remain unclaimed until this script or another claims them. Generally speaking, tasks are first-in first-out, so the first task you get if you claim one is the first unclaimed task npg_porch was told about.

```bash
url='npg_porch_server.sanger.ac.uk/tasks/claim'
response=$(curl -L -I -XPOST ${url} -H "content-type: application/json" -d @pipeline-def.json)
url='https://$SERVER:$PORT/tasks/claim'
response=$(curl -L -I -XPOST ${url} -H "content-type: application/json" -H "Authorization: Bearer $TOKEN" -d @pipeline-def.json)
```

Response body:
or

```perl
use JSON qw/decode_json/;

my $ua = LWP::UserAgent->new;
my $request = HTTP::Request->new(POST => 'npg_porch_server.sanger.ac.uk/tasks/claim');
my $request = HTTP::Request->new(POST => 'https://$SERVER:$PORT/tasks/claim');
$request->content_type('application/json');
$request->header(Accept => 'application/json');
$request->header(Authorization => "Bearer $TOKEN");

my $response = $ua->request($request);
if ($response->is_success) {
```
33 changes: 33 additions & 0 deletions pyproject.toml
@@ -0,0 +1,33 @@
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "npg_porch"
requires-python = ">=3.10"
authors = [{name="Marina Gourtovaia", email="[email protected]"}, {name="Kieron Taylor", email="[email protected]"}]
description = "API server for tracking unique workflow executions"
readme = "README.md"
license = {file = "LICENSE.md"}
dependencies = [
"aiosqlite",
"asyncpg",
"fastapi",
"pydantic > 2.0.0",
"pysqlite3",
"psycopg2-binary",
"sqlalchemy >2",
"ujson",
"uvicorn",
"uuid"
]
dynamic = ["version"]

[project.optional-dependencies]
test = [
"pytest",
"pytest-asyncio",
"requests",
"flake8",
"httpx"
]
7 changes: 6 additions & 1 deletion server/deploy_schema.py → scripts/deploy_schema.py
@@ -10,7 +10,12 @@
if schema_name is None:
    schema_name = 'npg_porch'

engine = sqlalchemy.create_engine(db_url)
print(f'Deploying npg_porch tables to schema {schema_name}')

engine = sqlalchemy.create_engine(
    db_url,
    connect_args={'options': f'-csearch_path={schema_name}'}
)

npg.porchdb.models.Base.metadata.schema = schema_name
npg.porchdb.models.Base.metadata.create_all(engine)
75 changes: 75 additions & 0 deletions scripts/issue_token.py
@@ -0,0 +1,75 @@
#!/usr/bin/env python

import argparse
from sqlalchemy import create_engine, select
from sqlalchemy.orm import sessionmaker
from sqlalchemy.orm.exc import NoResultFound

from npg.porchdb.models import Token, Pipeline

parser = argparse.ArgumentParser(
    description='Creates a token in the backend DB and returns it'
)

parser.add_argument(
    '-H', '--host', help='Postgres host', required=True
)
parser.add_argument(
    '-d', '--database', help='Postgres database', default='npg_porch'
)
parser.add_argument(
    '-s', '--schema', help='Postgres schema', default='npg_porch'
)
parser.add_argument(
    '-u', '--user', help='Postgres rw user', required=True
)
parser.add_argument(
    '-p', '--password', help='Postgres rw password', required=True
)
parser.add_argument(
    '-P', '--port', help='Postgres port', required=True
)
parser.add_argument(
    '-n', '--pipeline', help='Pipeline name. If given, create '
)
parser.add_argument(
    '-D', '--description', help='Description of token purpose', required=True
)

args = parser.parse_args()


db_url = f'postgresql+psycopg2://{args.user}:{args.password}@{args.host}:{args.port}/{args.database}'

engine = create_engine(db_url, connect_args={'options': f'-csearch_path={args.schema}'})
SessionFactory = sessionmaker(bind=engine)
session = SessionFactory()

token = None
pipeline = None

if args.pipeline:
    try:
        pipeline = session.execute(
            select(Pipeline)
            .where(Pipeline.name == args.pipeline)
        ).scalar_one()
    except NoResultFound:
        raise Exception(
            'Pipeline with name {} not found in database'.format(args.pipeline)
        )

    token = Token(
        pipeline=pipeline,
        description=args.description
    )
else:
    token = Token(description=args.description)

session.add(token)
session.commit()

print(token.token)

session.close()
engine.dispose()
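The script prints the new token's string for the operator to distribute; the random value itself is generated by the `Token` model. A hedged sketch of one plausible scheme, assuming a 32-character hex string from `uuid4` (suggested by the `uuid` entry in `pyproject.toml`; the real model may differ):

```python
import uuid

def make_token_string() -> str:
    # uuid4().hex yields a random 32-character lowercase hex string
    return uuid.uuid4().hex

token = make_token_string()
```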
File renamed without changes.
File renamed without changes.
File renamed without changes.
45 changes: 45 additions & 0 deletions server/npg/porch/auth/token.py
@@ -0,0 +1,45 @@
# Copyright (C) 2022 Genome Research Ltd.
#
# Author: Kieron Taylor [email protected]
# Author: Marina Gourtovaia [email protected]
#
# This file is part of npg_porch
#
# npg_porch is free software: you can redistribute it and/or modify it
# under the terms of the GNU General Public License as published by the Free
# Software Foundation; either version 3 of the License, or (at your option) any
# later version.
#
# This program is distributed in the hope that it will be useful, but WITHOUT
# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
# FOR A PARTICULAR PURPOSE. See the GNU General Public License for more
# details.
#
# You should have received a copy of the GNU General Public License along with
# this program. If not, see <http://www.gnu.org/licenses/>.

import logging
from fastapi import Depends
from fastapi.security import HTTPBearer
from fastapi import HTTPException

from npg.porchdb.connection import get_CredentialsValidator
from npg.porchdb.auth import CredentialsValidationException

auth_scheme = HTTPBearer()

async def validate(
    creds = Depends(auth_scheme),
    validator = Depends(get_CredentialsValidator)
):
    token = creds.credentials
    p = None
    try:
        p = await validator.token2permission(token)
    except CredentialsValidationException as e:
        logger = logging.getLogger(__name__)
        logger.warning(str(e))
        raise HTTPException(status_code=403, detail="Invalid token")

    return p
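The `validate` dependency above follows a common pattern: validate the token, log the failure, and convert the domain exception into an HTTP 403. The same pattern can be sketched free of FastAPI; all class and function names below are illustrative stand-ins for the package's real ones, and the 32-character hex token format is an assumption based on the "Improve regexp for the token string" commit:

```python
import logging
import re

class CredentialsValidationError(Exception):
    """Stand-in for the package's CredentialsValidationException."""

class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException."""
    def __init__(self, status_code: int, detail: str):
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail

# Assumed format: 32 lowercase hex characters
TOKEN_RE = re.compile(r"^[0-9a-f]{32}$")

def check_token(token: str) -> str:
    # Reject malformed tokens before any database lookup
    if not TOKEN_RE.match(token):
        raise CredentialsValidationError("Token failed format check")
    return token

def guarded_validate(token: str) -> str:
    # Log the domain error, then surface a generic 403 to the client
    try:
        return check_token(token)
    except CredentialsValidationError as e:
        logging.getLogger(__name__).warning(str(e))
        raise HTTPError(status_code=403, detail="Invalid token")
```

Keeping the outward-facing detail generic ("Invalid token") avoids leaking why validation failed, while the warning log retains the specifics for operators.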