This repository hosts Data & Technology Services' Airflow code. We use Airflow as our primary orchestration platform for scheduling and monitoring our automated scripts and integrations.
The production Airflow instance is available at https://airflow.austinmobility.io/. It requires COA network access.
Our Airflow instance is hosted on dts-int-data-p01 and can be found in /srv/atd-airflow. Local development is available, and instructions are below.
The stack is composed of:
- Airflow v2 (Docker image)
- HAProxy to distribute HTTP requests over the stack
- Flower workers dashboard to monitor remote workers
- DTS Airflow
- Clone this repository and start a new development branch based on the production branch.
- Create a .env file with the following variables:
AIRFLOW_UID=0
ENVIRONMENT=development
_AIRFLOW_WWW_USER_USERNAME=admin
_AIRFLOW_WWW_USER_PASSWORD=<Pick your initial admin password here>
AIRFLOW_PROJ_DIR=<The absolute path of your Airflow repository checkout>
# this fernet key is for testing purposes only
_AIRFLOW__CORE__FERNET_KEY=PTkIRwL-c46jgnaohlkkXfVikC-roKa95ipXfqST7JM=
_AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
OP_API_TOKEN=<Get from 1Password entry named "TPW DTS API Accessible Secrets 1Password Connect Server Access Token">
OP_CONNECT=<Get from 1Password entry named "TPW DTS API Accessible Secrets 1Password Connect Server Access Token">
OP_VAULT_ID=<Get from 1Password entry named "TPW DTS API Accessible Secrets 1Password Connect Server Access Token">
DOCKER_HUB_USERNAME=<Get from 1Password entry named "Docker Hub">
DOCKER_HUB_TOKEN=<A Docker Hub access token assigned specifically to you>
- Start the Docker stack (optionally use the -d flag to run containers in the background):
docker compose up -d
- Log in to the dashboard at http://localhost:8080 using the username and password set in your .env file.
- The Flower workers' status page is available at http://localhost:8081
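Before starting the stack, it can help to confirm that the .env file from the steps above defines every listed variable. The following is a minimal sketch, not part of this repository: the helper names parse_env and missing_keys are hypothetical, and the list of required keys is simply copied from the .env listing above.

```python
from pathlib import Path
from typing import Dict, List

# Variable names copied from the .env listing in this README.
REQUIRED_KEYS = [
    "AIRFLOW_UID",
    "ENVIRONMENT",
    "_AIRFLOW_WWW_USER_USERNAME",
    "_AIRFLOW_WWW_USER_PASSWORD",
    "AIRFLOW_PROJ_DIR",
    "_AIRFLOW__CORE__FERNET_KEY",
    "_AIRFLOW__WEBSERVER__BASE_URL",
    "OP_API_TOKEN",
    "OP_CONNECT",
    "OP_VAULT_ID",
    "DOCKER_HUB_USERNAME",
    "DOCKER_HUB_TOKEN",
]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as a Fernet key
        # that end in "=" are kept intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str, required: List[str] = REQUIRED_KEYS) -> List[str]:
    """Return the required keys that are absent or have an empty value."""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        missing = missing_keys(env_path.read_text())
        print("Missing variables:", ", ".join(missing) if missing else "none")
    else:
        print("No .env file found in the current directory")
```

Run it from the repository root, next to the .env file. It only checks that each variable has a value; it does not verify that the values are correct.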
Once the local stack is up and running, you can start writing a new DAG by adding a script to the dags/ folder. Any new utilities can be placed in the dags/utils/ folder. As you develop, you can check the local Airflow webserver for any errors that are encountered when loading the DAG.
If you place files that are not DAGs, or folders that do not contain DAGs, within the dags/ folder, add them to the dags/.airflowignore file so the stack doesn't log errors about files that are not recognized as DAGs.
Once a DAG is recognized by Airflow as valid, it will appear in the local webserver where you can trigger the DAG for testing.
You can also use this example command to execute a DAG in development. This is the CLI version of triggering the DAG manually in the web UI. Replace <dag-id> with the ID you've given your DAG in the DAG decorator or configuration.
docker compose run --rm airflow-cli dags test <dag-id>
If a DAG corresponds with another repo, be sure to add a tag with the naming convention of repo:name-of-the-repo.
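As an illustration of the repo: tag convention, the sketch below builds and checks such tags in plain Python. The helpers repo_tag and has_repo_tag are hypothetical and do not exist in this repository; the resulting list is the kind of value you would pass to the tags argument of a DAG.

```python
import re
from typing import Iterable

REPO_TAG_PREFIX = "repo:"


def repo_tag(repo_name: str) -> str:
    """Build a tag following the repo:name-of-the-repo convention."""
    name = repo_name.strip()
    if not name or re.search(r"\s", name):
        raise ValueError(f"Invalid repository name: {repo_name!r}")
    return f"{REPO_TAG_PREFIX}{name}"


def has_repo_tag(tags: Iterable[str]) -> bool:
    """Return True if at least one tag follows the repo: convention."""
    return any(
        tag.startswith(REPO_TAG_PREFIX) and len(tag) > len(REPO_TAG_PREFIX)
        for tag in tags
    )


# Example: tags for a DAG whose code lives in another repository.
# The second tag is a placeholder for any other tags the DAG carries.
tags = [repo_tag("name-of-the-repo"), "example"]
print(tags)
# In a DAG definition this list would be passed as DAG(..., tags=tags).
```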
Never commit directly to the production branch. Commit your changes to a development branch, push the branch to GitHub, and open a pull request against production. Once your PR is reviewed and approved, merge the branch to production.
Once merged, you will need to connect to our production Airflow host on the COA network, then pull down your changes from GitHub. Airflow will automatically load any DAG changes within five minutes. Activate your DAG through the Airflow web interface at https://airflow.austinmobility.io/.
# dts-int-data-p01
# become the superuser
su -;
# enter into the production airflow directory
cd /srv/atd-airflow;
# pull the changes
git pull;
# return to user-land
exit;
The production Airflow deployment uses a second Docker compose file which provides haproxy configuration overrides. To start the production docker compose stack, you must load both files in order:
# dts-int-data-p01
# become the superuser
su -;
# enter into the production airflow directory
cd /srv/atd-airflow;
# pull the repository changes
git pull;
# pull the fresh production image
docker compose pull;
# stop the Docker stack
docker compose stop;
# start the Docker stack
docker compose -f docker-compose.yaml -f docker-compose-production.yaml up -d;
# you can watch the logs as the stack starts (ctrl + c to exit logging without affecting the stack)
docker compose logs -f;
# return to user-land
exit;
Once the stack comes back up, you can monitor the scheduled Airflow DAGs.
Utilities used by multiple DAGs
Secrets stored in 1Password can be directly integrated into Airflow DAGs. As a best practice, DAGs should always use the 1Password utilities when accessing secrets.
The 1Password utility is a light wrapper around the 1Password Connect Python SDK methods. The utility communicates with our self-hosted 1Password Connect server using the OP environment variables (OP_API_TOKEN, OP_CONNECT, and OP_VAULT_ID) set in your .env file.
You can model your code off of existing DAGs which use our 1Password utility.
For example, this snippet fetches 1Password secrets from a task so that they can be used by subsequent tasks.
from airflow.models import DAG
from airflow.providers.docker.operators.docker import DockerOperator

from utils.onepassword import get_env_vars_task

REQUIRED_SECRETS = {
    "SOME_SECRET": {
        "opitem": "My Secret 1Pass Item",  # must match item name in 1Password vault
        "opfield": "My Secret Value",  # must match field name in 1Password item
    },
}

with DAG(
    dag_id="my_dag",
    # ...other DAG settings
) as dag:
    # fetch the secrets in their own task so later tasks can use them
    env_vars = get_env_vars_task(REQUIRED_SECRETS)

    task_1 = DockerOperator(
        task_id="my_docker_task",
        image="some-image-name",
        auto_remove=True,
        command="hello_world.py",
        environment=env_vars,
        tty=True,
        force_pull=True,
    )
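A mistyped key in REQUIRED_SECRETS would otherwise only surface when the task runs, so a quick shape check can be useful while developing. The sketch below is a hypothetical helper, not part of utils.onepassword; it assumes only what the snippet above shows, namely that each entry carries an opitem and an opfield string.

```python
from typing import Dict, List

# Keys each entry is expected to carry, based on the snippet above.
EXPECTED_KEYS = ("opitem", "opfield")


def find_secret_config_problems(
    required_secrets: Dict[str, Dict[str, str]]
) -> List[str]:
    """Return a message for every entry with a missing or empty key."""
    problems: List[str] = []
    for name, spec in required_secrets.items():
        for key in EXPECTED_KEYS:
            value = spec.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{name}: missing or empty '{key}'")
    return problems


REQUIRED_SECRETS = {
    "SOME_SECRET": {
        "opitem": "My Secret 1Pass Item",
        "opfield": "My Secret Value",
    },
}

problems = find_secret_config_problems(REQUIRED_SECRETS)
print(problems if problems else "REQUIRED_SECRETS looks well-formed")
```

This only checks the structure of the dictionary. It cannot confirm that the item or field actually exists in the 1Password vault.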
The Slack operator utility makes use of the integration between Airflow and a Slack app webhook. The purpose of the utility is to add Slack notifications to DAGs using the callback parameters. Failure, critical failure, and success notifications are implemented.
To configure the Slack operator in your local instance, from the Airflow UI go to Admin > Connections and choose Slack API as the connection type. You can find the remaining settings in 1Password under the Airflow - Slack Bot item.
To test the Slack operator locally, see the DAG named test_slack_notifier.
- 🐚 get a shell on a worker, for example
docker exec -it airflow-airflow-worker-1 bash
- ⛔ Stop all containers and execute this to reset your local database.
- Do not run in production unless you feel really great about your backups.
- This will reset the history of your dag runs and switch states.
docker compose down --volumes --remove-orphans
Start the production docker compose stack with haproxy overrides:
docker compose -f docker-compose.yaml -f docker-compose-production.yaml up -d
Follow these steps to update the Airflow Docker stack. Reasons for doing this include:
- Adding new requirements to the requirements.txt for the local stack
- Modifying the Dockerfile to upgrade the Airflow version
- Modifying the haproxy configuration
- Read the "Significant Changes" sections of the Airflow release notes between the versions in question: https://github.com/apache/airflow/releases/
- Apache Airflow is a very active project, and these release notes are pretty dense. Keeping a regular update cadence will help keep the task of updating Airflow from becoming an "information overload" job.
- Create a local branch with the Dockerfile modified to the version you intend to test
- In the docker-compose.yaml, replace image: atddocker/atd-airflow:production with build: .
- Build the Docker images locally:
docker compose build --no-cache
- Bring up the services and check the logging for errors and see that everything runs as expected:
docker compose up
- Check if you can reach the Airflow dashboard at http://localhost:8080
- If any updates affect the Slack notifier, see the instructions to test it
- Bring down the services:
docker compose down
- In the docker-compose.yaml, switch build: . back to image: atddocker/atd-airflow:production
- Push your branch and create a PR for review
- After approval, merge and update the stack using the instructions in the Moving to production section
This Airflow stack uses HAProxy as a reverse proxy to terminate incoming SSL/TLS connections and then to route the requests over HTTP to the appropriate backend web service. The SSL certificates are stored in the haproxy/ssl folder and are maintained by a bash script, executed monthly by cron. This script uses the EFF's Certbot service to renew and replace the SSL certificates used by HAProxy to secure the Airflow services.
The Airflow stack contains the following web services:
- The Airflow main web UI
- The Airflow workers dashboard
In local development, we don't have any special host names we can use to differentiate which backend service a request needs to be routed to, so we do this by listening on multiple local ports. Depending on which port you send a request to, the local HAProxy will pick the correct backend web service to route it to. Local development also does not require auth for the Flower service.
In production, however, we do have different host names assigned for each resource, so we're able to listen on a single port. Based on the hostname specified in the HTTP header, which is available to HAProxy after terminating the SSL connection, the proxy is able to pick which backend to route the request to. Production does require auth for the Flower service, and the username and password are set in docker-compose.yaml, where it reuses the Airflow username and password set in the stack's environment variables.
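The two routing strategies described above can be modeled with a small Python sketch. This is a toy illustration of the decision logic only, not the HAProxy configuration itself. The backend names and the Flower hostname flower.example.org are hypothetical placeholders; the local ports and the Airflow hostname come from this README.

```python
from typing import Optional

# Local development: one listening port per backend (ports from this README).
LOCAL_PORT_TO_BACKEND = {
    8080: "airflow-webserver",  # hypothetical backend name
    8081: "flower",             # hypothetical backend name
}

# Production: a single port, with the backend chosen by the Host header.
PRODUCTION_HOST_TO_BACKEND = {
    "airflow.austinmobility.io": "airflow-webserver",
    "flower.example.org": "flower",  # hypothetical hostname
}


def pick_backend(
    environment: str, port: int, host: Optional[str] = None
) -> Optional[str]:
    """Return the backend a request would be routed to, or None if no match."""
    if environment == "development":
        # Routing is decided purely by the port the request arrived on.
        return LOCAL_PORT_TO_BACKEND.get(port)
    if host is None:
        return None
    # Routing is decided by the hostname; drop any ":port" suffix first.
    hostname = host.split(":")[0].lower()
    return PRODUCTION_HOST_TO_BACKEND.get(hostname)


print(pick_backend("development", 8081))
print(pick_backend("production", 443, "airflow.austinmobility.io"))
```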
- The service needs to be restarted when the SSL certificates are rotated. This is normally handled by the automated renewal scripts.
- The service can also be restarted independently of the rest of the stack if needed, for example:
docker compose stop haproxy; docker compose build haproxy; docker compose up -d haproxy
- Make it disable all DAGs on start locally so it fails to safe
- Create remote worker image example
- Use docker compose new profile support