Restructure dependencies / image building #104

Open
wants to merge 4 commits into
base: main
3 changes: 2 additions & 1 deletion .dockerignore
@@ -2,4 +2,5 @@
.direnv
__pycache__
.mypy_cache
.pytest_cache
pctasks_frontend/node_modules
61 changes: 9 additions & 52 deletions Dockerfile.task_base
@@ -1,69 +1,26 @@
FROM ubuntu:20.04
FROM python:3.10.6-buster
ARG REQUIREMENTS_BASE=requirements.base.txt

# Setup timezone info
ENV PIP_NO_CACHE_DIR=1
ENV TZ=UTC

ENV LC_ALL=C.UTF-8
ENV LANG=C.UTF-8

RUN ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && echo $TZ > /etc/timezone

RUN apt-get update && apt-get install -y software-properties-common

RUN add-apt-repository ppa:ubuntugis/ppa && \
apt-get update && \
apt-get install -y build-essential python3-dev python3-pip \
jq unzip ca-certificates wget curl git && \
apt-get autoremove && apt-get autoclean && apt-get clean

RUN update-alternatives --install /usr/bin/python python /usr/bin/python3 10

# See https://github.com/mapbox/rasterio/issues/1289
ENV CURL_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt

# Install Python 3.8
RUN curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh" \
&& bash "Mambaforge-$(uname)-$(uname -m).sh" -b -p /opt/conda \
&& rm -rf "Mambaforge-$(uname)-$(uname -m).sh"

ENV PATH /opt/conda/bin:$PATH
ENV LD_LIBRARY_PATH /opt/conda/lib/:$LD_LIBRARY_PATH

RUN mamba install -y -c conda-forge python=3.8 gdal pip setuptools cython numpy
COPY ${REQUIREMENTS_BASE} /requirements.base.txt
RUN python3 -m pip install -U pip \
&& python3 -m pip install -r /requirements.base.txt

RUN python -m pip install --upgrade pip

# Install common packages
COPY requirements-task-base.txt /tmp/requirements.txt
RUN python -m pip install --no-build-isolation -r /tmp/requirements.txt

#
# Copy and install packages
#

COPY pctasks/core /opt/src/pctasks/core
RUN cd /opt/src/pctasks/core && \
pip install .

COPY pctasks/cli /opt/src/pctasks/cli
RUN cd /opt/src/pctasks/cli && \
pip install .

COPY pctasks/task /opt/src/pctasks/task
RUN cd /opt/src/pctasks/task && \
pip install .

COPY pctasks/client /opt/src/pctasks/client
RUN cd /opt/src/pctasks/client && \
pip install .

COPY pctasks/ingest /opt/src/pctasks/ingest
RUN cd /opt/src/pctasks/ingest && \
pip install .

COPY pctasks/dataset /opt/src/pctasks/dataset
RUN cd /opt/src/pctasks/dataset && \
pip install .

COPY requirements.pctasks.txt /opt/src/requirements.pctasks.txt
RUN cd /opt/src && python -m pip install -r requirements.pctasks.txt

# Setup Python Path to allow import of test modules
ENV PYTHONPATH=/opt/src:$PYTHONPATH
74 changes: 0 additions & 74 deletions datasets/goes/goes-glm/Dockerfile

This file was deleted.

8 changes: 8 additions & 0 deletions datasets/goes/goes-glm/README.md
@@ -23,4 +23,12 @@ And registered with

```console
$ pctasks workflow create datasets/goes/goes-glm/workflows/goes-glm-update.yaml
$ pctasks workflow create datasets/goes/goes-glm/workflows/goes-glm-update-blue.yaml
```

## Image building

```
./scripts/generate-requirements datasets/goes/goes-glm/requirements.txt
docker build -t <registry>/pctasks-goes-glm:<tag> -f datasets/goes/goes-glm/Dockerfile .
```
5 changes: 0 additions & 5 deletions datasets/noaa-mrms-qpe/Dockerfile

This file was deleted.

10 changes: 9 additions & 1 deletion datasets/noaa-mrms-qpe/README.md
@@ -36,4 +36,12 @@ They can be registered with

```bash
$ ls datasets/noaa-mrms-qpe/workflows/* | xargs -I {} pctasks workflow update {}
```

## Image building

```
./scripts/generate-requirements datasets/noaa-mrms-qpe/requirements.txt
docker build -t <registry>/pctasks-noaa-mrms-qpe:<tag> -f datasets/goes/goes-glm/Dockerfile .
```

36 changes: 34 additions & 2 deletions docs/user_guide/runtime.md
@@ -3,7 +3,11 @@
## Specifying requirements

In addition to the set of packages provided by the base docker image, you can specify a list of additional packages
to install with a `requirements.txt` file. This can be done in a dataset configuration or in a task configuration.
to install with a `requirements.txt` file.

```{note} Installing extra dependencies at runtime should only be done when developing a workflow. See [](#building-images) for transitioning to a production-ready workflow.
```
This can be done in a dataset configuration or in a task configuration.

```yaml
# file: naip/dataset.yaml
@@ -56,4 +60,32 @@ Behind the scenes, when you submit a workflow generated from this `dataset.yaml`
the module is uploaded to Azure Blob Storage. Before executing your task, the
worker downloads that module and places it in a location that's importable by
the Python interpreter. The uploaded module / package is prioritized over any
existing modules with the same import name.
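
As a rough illustration of that prioritization (a minimal sketch, not necessarily how the pctasks worker implements it), placing the downloaded module's directory at the front of `sys.path` makes Python resolve it before any other module with the same import name. The module name `pctasks_shadow_demo`, the `ORIGIN` attribute, and both helper functions are hypothetical and exist only for this example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

MODULE_NAME = "pctasks_shadow_demo"  # hypothetical module name, not part of pctasks


def import_with_priority(directory: str, module_name: str):
    """Import module_name, preferring the copy found in directory."""
    sys.path.insert(0, directory)  # earlier sys.path entries win during import
    importlib.invalidate_caches()
    sys.modules.pop(module_name, None)  # drop any previously imported copy
    return importlib.import_module(module_name)


def demo() -> str:
    """Create two same-named modules and report which one gets imported."""
    with tempfile.TemporaryDirectory() as existing, tempfile.TemporaryDirectory() as uploaded:
        Path(existing, f"{MODULE_NAME}.py").write_text("ORIGIN = 'existing'\n")
        Path(uploaded, f"{MODULE_NAME}.py").write_text("ORIGIN = 'uploaded'\n")
        sys.path.append(existing)  # stands in for an already-installed module
        try:
            module = import_with_priority(uploaded, MODULE_NAME)
            return module.ORIGIN
        finally:
            # Undo the path and module-cache changes made by the demo.
            for entry in (existing, uploaded):
                if entry in sys.path:
                    sys.path.remove(entry)
            sys.modules.pop(MODULE_NAME, None)


if __name__ == "__main__":
    print(demo())
```

Here `demo()` returns `'uploaded'`, because the directory inserted at position 0 is searched before the one appended at the end.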

## Building Images

`pctasks` lets you specify a `requirements.txt` with additional dependencies to
install at runtime. This is convenient for development, but installing
dependencies at runtime isn't appropriate for production environments that need
to run reliably at scale. For production, build a container image with the
requirements already installed.

First, use `./scripts/generate-requirements` to generate the `requirements.txt`
file, passing any additional requirements files you need as arguments:

```
$ ./scripts/generate-requirements datasets/goes/goes-glm/requirements.txt
```

Next, build and upload the container image:

```
$ docker build -t pctasks-goes-glm:latest -f datasets/goes/goes-glm/Dockerfile .
$ docker push ...
```

Alternatively, build the container image in Azure with `az acr build`:

```
$ az acr build -r "registry" -g "resource-group" -t 'pctasks-<dataset>:<tag>' -f Dockerfile.task_base .
```
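
The local build steps above could also be scripted. The following is a minimal sketch, assuming the three commands shown in this section are run in order from the repository root. The function names and the `registry.example.com` registry are hypothetical placeholders, not part of pctasks.

```python
import shlex
import subprocess
from typing import List


def image_build_commands(requirements: str, dockerfile: str, image: str) -> List[List[str]]:
    """Return the generate, build, and push commands as argv lists, in order."""
    return [
        ["./scripts/generate-requirements", requirements],
        ["docker", "build", "-t", image, "-f", dockerfile, "."],
        ["docker", "push", image],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises on the first failing step


if __name__ == "__main__":
    commands = image_build_commands(
        requirements="datasets/goes/goes-glm/requirements.txt",
        dockerfile="datasets/goes/goes-glm/Dockerfile",
        image="registry.example.com/pctasks-goes-glm:latest",  # hypothetical registry
    )
    run_all(commands, dry_run=True)  # only prints the commands
```

Keeping the commands as argv lists avoids shell quoting problems, and the dry-run default makes it possible to review the commands before anything is built or pushed.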
1 change: 1 addition & 0 deletions pctasks/cli/dev_requirements.txt
@@ -0,0 +1 @@
file:./pctasks/core#egg=pctasks.core
TomAugspurger marked this conversation as resolved.
2 changes: 2 additions & 0 deletions pctasks/client/dev_requirements.txt
@@ -0,0 +1,2 @@
file:./pctasks/core#egg=pctasks.core
file:./pctasks/cli#egg=pctasks.cli
3 changes: 3 additions & 0 deletions pctasks/dataset/dev_requirements.txt
@@ -0,0 +1,3 @@
file:./pctasks/task#egg=pctasks.task
file:./pctasks/client#egg=pctasks.client
file:./pctasks/ingest#egg=pctasks.ingest
5 changes: 5 additions & 0 deletions pctasks/dev/dev_requirements.txt
@@ -0,0 +1,5 @@
file:./pctasks/task#egg=pctasks.task
file:./pctasks/client#egg=pctasks.client
file:./pctasks/ingest#egg=pctasks.ingest
file:./pctasks/run#egg=pctasks.run
file:./pctasks/cli#egg=pctasks.cli
1 change: 1 addition & 0 deletions pctasks/ingest/dev_requirements.txt
@@ -0,0 +1 @@
file:./pctasks/client#egg=pctasks.client
2 changes: 2 additions & 0 deletions pctasks/ingest_task/dev_requirements.txt
@@ -0,0 +1,2 @@
file:./pctasks/task#egg=pctasks.task
file:./pctasks/ingest#egg=pctasks.ingest
1 change: 1 addition & 0 deletions pctasks/notify/dev_requirements.txt
@@ -0,0 +1 @@
file:./pctasks/core#egg=pctasks.core
1 change: 1 addition & 0 deletions pctasks/router/dev_requirements.txt
@@ -0,0 +1 @@
file:./pctasks/core#egg=pctasks.core
3 changes: 3 additions & 0 deletions pctasks/run/dev_requirements.txt
@@ -0,0 +1,3 @@
file:./pctasks/core#egg=pctasks.core
file:./pctasks/task#egg=pctasks.task
file:./pctasks/client#egg=pctasks.client
2 changes: 2 additions & 0 deletions pctasks/server/dev_requirements.txt
@@ -0,0 +1,2 @@
file:./pctasks/core#egg=pctasks.core
file:./pctasks/run#egg=pctasks.run
2 changes: 2 additions & 0 deletions pctasks/task/dev_requirements.txt
@@ -0,0 +1,2 @@
file:./pctasks/core#egg=pctasks.core
file:./pctasks/cli#egg=pctasks.cli
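
Taken together, the `dev_requirements.txt` files above describe a dependency graph between the local packages. The following is a minimal sketch, assuming only the entries shown in this diff, of how an install order that respects those dependencies could be derived. The `LOCAL_DEPS` mapping is transcribed by hand from the files above, and `install_order` is a hypothetical helper, not something this PR adds. It uses `graphlib` from the standard library, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Package -> local packages listed in its dev_requirements.txt (from this diff).
LOCAL_DEPS: Dict[str, Set[str]] = {
    "cli": {"core"},
    "client": {"core", "cli"},
    "dataset": {"task", "client", "ingest"},
    "dev": {"task", "client", "ingest", "run", "cli"},
    "ingest": {"client"},
    "ingest_task": {"task", "ingest"},
    "notify": {"core"},
    "router": {"core"},
    "run": {"core", "task", "client"},
    "server": {"core", "run"},
    "task": {"core", "cli"},
}


def install_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return the packages ordered so each one comes after its dependencies."""
    # TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for name in install_order(LOCAL_DEPS):
        print(f"pctasks/{name}")
```

Because `core` is the only package with no local dependencies in these files, it always comes first. The relative order of packages that do not depend on each other is not guaranteed.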
3 changes: 3 additions & 0 deletions requirements-dev.txt
@@ -11,6 +11,9 @@ pystac[validation]==1.*
azure-functions
azure-functions-durable

# for generating requirements files for Docker
pip-tools

# Mypy stubs

types-cachetools