Docker Compose CeleryExecutor example #8621

Closed · wants to merge 4 commits

4 changes: 4 additions & 0 deletions templates/docker-compose/CeleryExecutor/.gitignore
@@ -0,0 +1,4 @@
# Created by .ignore support plugin (hsz.mobi)
.idea
airflow-data/logs
airflow-data/dags/__pycache__/
29 changes: 29 additions & 0 deletions templates/docker-compose/CeleryExecutor/README.md
@@ -0,0 +1,29 @@
# docker-compose-airflow
Docker Compose for Apache Airflow (official Docker images) with CeleryExecutor, InitDB, and InitUser.

Ideal for local development or small-scale personal deployments.

Prerequisites: Docker and docker-compose
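
A quick way to confirm the prerequisites are available (a sketch; any reasonably recent versions should work):

```sh
# Check that Docker and docker-compose are installed
docker --version
docker-compose --version
```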

### How to deploy:

**Step 1:** Clone this repo and open a terminal in this directory.

**Step 2:** Go through the airflow-env-variables.env, init_airflow_setup.sh, and docker-compose.yml files and change the settings to your preference, or keep them as they are for local development.

**Step 3:** Run `docker-compose up -d`

**Step 4:** Run `sh init_airflow_setup.sh` (run this only for the initial deployment; the Airflow containers will stay in restart mode until this script completes successfully).

**Step 5:** Go to http://localhost:8080 and log in with user _airflow_test_user_ and password _airflow_test_password_, as specified in the init_airflow_setup.sh script.

**Step 6:** Enable and run a few DAGs, and monitor the Celery workers at http://localhost:5555.
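
The steps above, condensed into a copy-paste sketch (assumes default settings and that your terminal is already in this template's directory; adjust paths to your clone):

```sh
# Start all services in the background: postgres, redis, webserver, scheduler, workers, flower
docker-compose up -d

# First deployment only: initialize the Airflow DB and create the admin user
sh init_airflow_setup.sh

# Web UI: http://localhost:8080   Flower: http://localhost:5555
```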


Airflow environment variables are maintained in the airflow-env-variables.env file, where you can modify the Airflow config as described in https://airflow.apache.org/docs/stable/howto/set-config.html

You can change the executor mode by setting the env variable AIRFLOW__CORE__EXECUTOR, e.g. AIRFLOW__CORE__EXECUTOR=CeleryExecutor

If you have an existing Airflow DB, you can connect to it by setting the env variable AIRFLOW__CORE__SQL_ALCHEMY_CONN, e.g. AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
(If you are using an existing Airflow DB, do NOT run init_airflow_setup.sh.)
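
To confirm that the values from airflow-env-variables.env actually reached the containers, one quick check (a sketch; airflow_cont is the webserver container name from docker-compose.yml) is:

```sh
# List the Airflow-related environment variables inside the webserver container
docker exec airflow_cont env | grep AIRFLOW__
```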

@@ -0,0 +1,80 @@
from datetime import timedelta
# The DAG object; we'll need this to instantiate a DAG
from airflow import DAG
# Operators; we need this to operate!
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago
# These args will get passed on to each operator
# You can override them on a per-task basis during operator initialization

default_args = {
'owner': 'airflow',
'depends_on_past': False,
'start_date': days_ago(2),
'email': ['airflow@example.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
# 'queue': 'bash_queue',
# 'pool': 'backfill',
# 'priority_weight': 10,
# 'end_date': datetime(2016, 1, 1),
# 'wait_for_downstream': False,
# 'dag': dag,
# 'sla': timedelta(hours=2),
# 'execution_timeout': timedelta(seconds=300),
# 'on_failure_callback': some_function,
# 'on_success_callback': some_other_function,
# 'on_retry_callback': another_function,
# 'sla_miss_callback': yet_another_function,
# 'trigger_rule': 'all_success'
}
dag = DAG(
'tutorial',
default_args=default_args,
description='A simple tutorial DAG',
schedule_interval=timedelta(days=1),
)

# t1, t2 and t3 are examples of tasks created by instantiating operators
t1 = BashOperator(
task_id='print_date',
bash_command='date',
dag=dag,
)

t2 = BashOperator(
task_id='sleep',
depends_on_past=False,
bash_command='sleep 5',
retries=3,
dag=dag,
)
dag.doc_md = __doc__

t1.doc_md = """\
#### Task Documentation
You can document your task using the attributes `doc_md` (markdown),
`doc` (plain text), `doc_rst`, `doc_json`, `doc_yaml` which gets
rendered in the UI's Task Instance Details page.
![img](http://montcs.bloomu.edu/~bobmon/Semesters/2012-01/491/import%20soul.png)
"""
templated_command = """
{% for i in range(5) %}
echo "{{ ds }}"
echo "{{ macros.ds_add(ds, 7)}}"
echo "{{ params.my_param }}"
{% endfor %}
"""

t3 = BashOperator(
task_id='templated',
depends_on_past=False,
bash_command=templated_command,
params={'my_param': 'Parameter I passed in'},
dag=dag,
)

t1 >> [t2, t3]
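
A quick way to exercise this example DAG once the stack is running (a sketch using the Airflow 1.10 CLI inside the webserver container; the execution date is arbitrary):

```sh
# Confirm the DAG is registered, then run a single task instance without the scheduler
docker exec -ti airflow_cont airflow list_dags
docker exec -ti airflow_cont airflow test tutorial print_date 2020-04-29
```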

Empty file.
Empty file.
18 changes: 18 additions & 0 deletions templates/docker-compose/CeleryExecutor/airflow-env-variables.env
@@ -0,0 +1,18 @@


# AIRFLOW CORE

AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://postgres:postgres@postgres:5432/airflow
AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
AIRFLOW__CORE__LOAD_EXAMPLES=True

#AIRFLOW WEBSERVER
AIRFLOW__WEBSERVER__RBAC=True
AIRFLOW__WEBSERVER__EXPOSE_CONFIG=True


# AIRFLOW CELERY
AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
AIRFLOW__CELERY__RESULT_BACKEND=db+postgresql://postgres:postgres@postgres:5432/airflow
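
Note that the Fernet key above is published with this template, so it is only suitable for local testing. One way to generate a fresh key (a sketch; assumes a local Python with the cryptography package installed):

```sh
# Generate a new Fernet key and set it as AIRFLOW__CORE__FERNET_KEY in airflow-env-variables.env
pip install cryptography
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
```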

97 changes: 97 additions & 0 deletions templates/docker-compose/CeleryExecutor/docker-compose.yml
@@ -0,0 +1,97 @@
version: '3'
services:
postgres:
image: postgres:latest
container_name: postgres_cont
environment:
- POSTGRES_USER=postgres
- POSTGRES_PASSWORD=postgres
- POSTGRES_DB=airflow
- POSTGRES_PORT=5432
ports:
- 5432:5432

redis:
image: redis:latest
container_name: redis_cont
restart: always
ports:
- 6379:6379

airflow:
image: apache/airflow:1.10.10
container_name: airflow_cont
env_file:
- airflow-env-variables.env
restart: always
command: webserver
ports:
- 8080:8080
volumes:
- ./airflow-data/dags:/opt/airflow/dags
- ./airflow-data/logs:/opt/airflow/logs
- ./airflow-data/plugins:/opt/airflow/plugins

airflow-scheduler:
image: apache/airflow:1.10.10
container_name: airflow_scheduler_cont
env_file:
- airflow-env-variables.env
restart: always
command: scheduler
volumes:
- ./airflow-data/dags:/opt/airflow/dags
- ./airflow-data/logs:/opt/airflow/logs
- ./airflow-data/plugins:/opt/airflow/plugins

airflow-worker1:
image: apache/airflow:1.10.10
container_name: airflow_worker1_cont
env_file:
- airflow-env-variables.env
restart: always
command: worker
volumes:
- ./airflow-data/dags:/opt/airflow/dags
- ./airflow-data/logs:/opt/airflow/logs
- ./airflow-data/plugins:/opt/airflow/plugins

airflow-worker2:
image: apache/airflow:1.10.10
container_name: airflow_worker2_cont
env_file:
- airflow-env-variables.env
restart: always
command: worker
volumes:
- ./airflow-data/dags:/opt/airflow/dags
- ./airflow-data/logs:/opt/airflow/logs
- ./airflow-data/plugins:/opt/airflow/plugins


airflow-worker3:
image: apache/airflow:1.10.10
container_name: airflow_worker3_cont
env_file:
- airflow-env-variables.env
command: worker
restart: always
volumes:
- ./airflow-data/dags:/opt/airflow/dags
- ./airflow-data/logs:/opt/airflow/logs
- ./airflow-data/plugins:/opt/airflow/plugins

airflow-flower:
image: apache/airflow:1.10.10
container_name: airflow_flower_cont
restart: always
volumes:

@resdevd (Author) commented on Apr 29, 2020:

I did consider using extension fields; however, IMO we shouldn't adopt extension fields for this use case, for the following reasons:

  • Most production environments, such as ECS/Fargate and Docker Swarm, don't support them, unlike Docker env variables.
  • Users might want to use different images and volumes for different Airflow services (Airflow workers might need Java, specific Python packages, etc.).
  • Much of the code duplication is already eliminated by the .env master file.
  • Readability and adoption of extension fields still lag behind among the majority of docker/docker-compose users.
  • Container names should be unique.
  • Advanced users can always adjust the config to their needs.

So I believe that, especially for this use case, extension fields fall under "premature optimization".

“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.” - Donald Knuth

- ./airflow-data/dags:/opt/airflow/dags
- ./airflow-data/logs:/opt/airflow/logs
- ./airflow-data/plugins:/opt/airflow/plugins
env_file:
- airflow-env-variables.env
command: flower
ports:
- 5555:5555
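
Once the stack is up, a quick way to check that every service started and to follow a service's output (a sketch using standard docker-compose commands run from this directory):

```sh
# Show the state of all services defined in this compose file
docker-compose ps

# Follow the scheduler logs; substitute any other service name, e.g. airflow-worker1
docker-compose logs -f airflow-scheduler
```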

20 changes: 20 additions & 0 deletions templates/docker-compose/CeleryExecutor/init_airflow_setup.sh
@@ -0,0 +1,20 @@
#!/bin/sh

# WARNING: Run this script only during initial db setup. DO NOT run this script on an existing Airflow DB.

IS_INITDB=True
AIRFLOW_USER=airflow_test_user
AIRFLOW_PASSWORD=airflow_test_password
[email protected]

if [ "$IS_INITDB" = "True" ]; then

echo "Initializing Airflow DB setup and Admin user setup because value of IS_INITDB is $IS_INITDB"
echo " Airflow admin username will be $AIRFLOW_USER"

docker exec -ti airflow_cont airflow initdb && echo "Initialized airflow DB"
docker exec -ti airflow_cont airflow create_user --role Admin --username $AIRFLOW_USER --password $AIRFLOW_PASSWORD -e $AIRFLOW_USER_EMAIL -f airflow -l airflow && echo "Created airflow Initial admin user with username $AIRFLOW_USER"

else
echo "Skipping InitDB and InitUser setup because value of IS_INITDB is $IS_INITDB"
fi
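
After the script completes, the new admin user can be verified with the Airflow 1.10 CLI (a sketch; airflow_cont is the webserver container name from docker-compose.yml):

```sh
# Confirm the admin user exists
docker exec -ti airflow_cont airflow list_users
```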