Docker Compose CeleryExecutor example #8621

Closed
wants to merge 4 commits into from

Conversation

@resdevd resdevd commented Apr 29, 2020

This is a docker-compose template, ideal for local development, that uses the official Airflow Docker images and includes easy deploy instructions. It addresses issue #8605.
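For context, a minimal quick-start sketch of the deploy flow (the file layout and service set are my assumptions, not necessarily those in this PR):

    # assuming the compose file and its .env file sit in the same directory
    docker-compose up -d     # start the database, broker and Airflow services
    docker-compose ps        # check that the containers are up
    docker-compose down      # tear everything down again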


Make sure to mark the boxes below before creating PR: [x]

  • Description above provides context of the change
  • Unit tests coverage for changes (not needed for documentation changes)
  • Target Github ISSUE in description if exists
  • Commits follow "How to write a good git commit message"
  • Relevant documentation is updated including usage instructions.
  • I will engage committers as explained in Contribution Workflow Example.

In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.

boring-cyborg bot commented Apr 29, 2020

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
Here are some useful points:

  • Pay attention to the quality of your code (flake8, pylint and type annotations). Our pre-commits will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide and consider adding an example DAG that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Be sure to read the Airflow Coding style.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: [email protected]
    Slack: https://apache-airflow-slack.herokuapp.com/

resdevd (Author) commented Apr 29, 2020

@potiuk FYI

image: apache/airflow:1.10.10
container_name: airflow_flower_cont
restart: always
volumes:

resdevd (Author) commented Apr 29, 2020

I did consider using extension fields, but IMO we shouldn't incorporate them for this use case, for the following reasons:

  • Most production environments, such as ECS/Fargate and Docker Swarm, don't support them, unlike Docker env variables.
  • Users might want to use different images and volumes for different Airflow services (Airflow workers might need Java, specific Python packages, etc.).
  • Much of the code duplication is already eliminated by the .env master file.
  • Readability and adoption of extension fields still lag behind among the majority of docker/docker-compose users.
  • Container names should be unique.
  • Advanced users can always adjust the config according to their needs.

So I believe that, especially for this use case, extension fields fall under "premature optimization".

“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.” - Donald Knuth

BasPH (Contributor) commented Apr 30, 2020

I think we should decide whether the docker-compose file will serve as a "hello world" example, or as something more complex but closer to a production deployment.

My ideas:

  • Users will likely want to use it as an example and change it for their specific deployment.
  • I think these should be as simple as possible to run, i.e. a one-liner in the readme.
  • I think the folder name "templates" is incorrect here, since no templating is being done. Something like "examples/docker", or even "docker-compose" in the root of the project, makes more sense IMO.
  • There are already plenty of example DAGs, let's not add yet another example.

To address your points about the extension fields:

Most production environments, such as ECS/Fargate and Docker Swarm, don't support them, unlike Docker env variables.

I think the goal should be a simple example docker-compose file. People will then extract bits and pieces from the example into their deployment. There's a ton of different ways to do deployment, so I would not aim for creating a docker-compose file with the goal of being close to one single deployment method. Instead, aim for something easy to comprehend.

Users might want to use different images and volumes for different Airflow services (Airflow workers might need Java, specific Python packages, etc.).

Okay, why does this change the argument for using the extension fields?

Much of the code duplication is already eliminated by the .env master file.

If we want the docker-compose file to serve as an example, I think it should be as self-explanatory as possible. Storing env vars in a separate file does not help that cause, I think using extension fields and having everything in the same file will be clearer.

Readability and adoption of extension fields still lag behind among the majority of docker/docker-compose users.

How so?

Container names should be unique.

You still have unique container names.

Advanced users can always adjust the config according to their needs.

Yes.

See the following docker-compose file using extension fields; I think it's pretty readable:

version: '3'

# ========================== AIRFLOW ENVIRONMENT VARIABLES ===========================
x-environment: &airflow_environment
  - AIRFLOW__CORE__EXECUTOR=LocalExecutor
  - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
  - AIRFLOW__CORE__LOAD_EXAMPLES=False
  - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@postgres:5432/airflow
  - AIRFLOW__CORE__STORE_DAG_CODE=True
  - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
  - AIRFLOW__WEBSERVER__RBAC=True
# ========================== /AIRFLOW ENVIRONMENT VARIABLES ==========================

services:
  postgres:
    image: postgres:12-alpine
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"

  webserver:
    image: apache/airflow:1.10.10-python3.7
    ports:
      - "8080:8080"
    environment: *airflow_environment
    command: webserver

  scheduler:
    image: apache/airflow:1.10.10-python3.7
    environment: *airflow_environment
    command: scheduler

For the initialization I've used the following in the past, and it worked quite okay. We could make it a bit more sophisticated, e.g. by checking some condition to be true once a second instead of sleep 5:

  initdb_adduser:
    image: apache/airflow:1.10.10-python3.7
    depends_on:
      - postgres
    environment: *airflow_environment
    entrypoint: /bin/bash
    # The webserver initializes permissions, so sleep for that to (approximately) be finished
    # No disaster if the webserver isn't finished by then, but create_user will start spitting out errors until the permissions exist
    command: -c 'airflow initdb && sleep 5 && airflow create_user --role Admin --username airflow --password airflow -e [email protected] -f airflow -l airflow'
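A sketch of that polling idea (untested; this is my assumption of how it could look, not code from the PR): keep the same one-off service but retry create_user once a second until the webserver has finished creating the permission tables:

  initdb_adduser:
    image: apache/airflow:1.10.10-python3.7
    depends_on:
      - postgres
    environment: *airflow_environment
    entrypoint: /bin/bash
    # Retry once a second instead of a fixed sleep; create_user keeps failing
    # until the webserver has initialized the permissions, then succeeds once
    command: -c 'airflow initdb && until airflow create_user --role Admin --username airflow --password airflow -e [email protected] -f airflow -l airflow; do sleep 1; done'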

resdevd (Author) commented Apr 30, 2020

I think we should decide whether the docker-compose file will serve as a "hello world" example, or as something more complex but closer to a production deployment.

IMO the simplest way to run Airflow is still pip install plus the airflow commands. A "hello world" example should not require users to install/learn Docker and docker-compose.

I think the folder name "templates" is incorrect here, since no templating is being done. Something like "examples/docker", or even "docker-compose" in the root of the project, makes more sense IMO.

Sure, as @turbaszek mentioned, we could use a name like "deploy" etc., where more docker-compose config with LocalExecutor, and potentially k8s/helm config, can be added in the future. IMO "examples" would be confusing, as it might refer to example DAGs and operators.

There are already plenty of example DAGs, let's not add yet another example

It wasn't about adding an example; the goal was to show the usage of Docker volumes for dags and plugins.

I think the goal should be a simple example docker-compose file. People will then extract bits and pieces from the example into their deployment. There's a ton of different ways to do deployment, so I would not aim for creating a docker-compose file with the goal of being close to one single deployment method. Instead, aim for something easy to comprehend.

This PR addresses #8605, "Add Production-ready docker compose for the production image".
IMO CeleryExecutor is probably the most advanced config in terms of docker-compose, and it's easier to change from CeleryExecutor to LocalExecutor than vice versa.

Okay, why does this change the argument for using the extension fields?

I was referring to the code context about extension fields that the original review comment pointed at.

If we want the docker-compose file to serve as an example, I think it should be as self-explanatory as possible. Storing env vars in a separate file does not help that cause, I think using extension fields and having everything in the same file will be clearer.

As mentioned above, this PR addresses an issue whose goal is a docker-compose setup close to production. Having Fernet keys and other sensitive info live in the docker-compose file is not a best practice either (in terms of .gitignore config and migration to ECS/Fargate, Docker Swarm, etc.).
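For illustration, a minimal sketch of that pattern (my assumption of the shape, not the exact contents of this PR): secrets live in a git-ignored .env file that docker-compose reads for variable substitution, and the compose file only references them:

    # .env (git-ignored)
    FERNET_KEY=<generated-fernet-key>

    # docker-compose.yml (excerpt)
      webserver:
        image: apache/airflow:1.10.10
        environment:
          - AIRFLOW__CORE__FERNET_KEY=${FERNET_KEY}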

Readability and adoption of extension fields still lag behind among the majority of docker/docker-compose users.

How so?

Glad you asked. Prior to Airflow's official Docker images, the two most popular Airflow docker-compose projects were https://github.com/puckel/docker-airflow and https://github.com/bitnami/bitnami-docker-airflow. I don't think extension fields are used there.
Not only that, I don't think you'd find them used extensively even in the official Docker repo https://github.com/docker/awesome-compose. You might notice the same behavior in some of the most popular docker-compose setups, such as https://github.com/deviantony/docker-elk and https://github.com/vegasbrianc/prometheus.

For context, extension fields were introduced in Compose file format 3.4 in 2017.

You still have unique container names.

I was referring to the code context about extension fields that the original review comment pointed at.

For the initialization I've used the following in the past, and it worked quite okay. We could make it a bit more sophisticated, e.g. by checking some condition to be true once a second instead of sleep 5:

This is a separate issue, #8606. In that case, if env variables were used for this purpose, we could get rid of the init script entirely. Besides, containers share CPU, memory, network and IO; I'm not sure adding another container for a one-time action is the best way to go.

@BasPH BasPH mentioned this pull request May 3, 2020
gpongracz commented
Looks like the entrypoint.sh is not doing an initdb or upgradedb prior to the command, as is done in puckel:
https://github.com/puckel/docker-airflow/blob/master/script/entrypoint.sh#L110
I think the puckel entrypoint script is very elegant and may help inform the design of this one.
In the meantime I have written a docker-compose file for a LocalExecutor setup that overcomes this by setting restart to on-failure and starting a container which runs an initdb and then stops after completing successfully. A bit of a kludge, but it gets the current image working.
This is certainly not a production script, but it works with the current image.
I've commented out a couple of optional mounts, for dags and (in my case) AWS keys.
If a feature similar to the puckel entrypoint above is developed, the webserver and scheduler definitions in the following docker-compose file could be collapsed into a single container.

docker-compose.yml (working example mitigating for entrypoint.sh lacking initdb):

version: '3.7'
services:
    postgres:
        image: postgres:10
        restart: on-failure
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
        logging:
            options:
                max-size: 10m
                max-file: "3"
        ports:
            - "5432:5432"
    initdb:
        image: apache/airflow:latest
        restart: on-failure
        depends_on:
            - postgres
        environment:
            - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
            - AIRFLOW__CORE__EXECUTOR=LocalExecutor
        logging:
            options:
                max-size: 10m
                max-file: "3"
        command: initdb
    webserver:
        image: apache/airflow:latest
        restart: on-failure
        depends_on:
            - postgres
        environment:
            - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
            - AIRFLOW__CORE__LOAD_EXAMPLES=True
            - AIRFLOW__CORE__EXECUTOR=LocalExecutor
            - AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
        logging:
            options:
                max-size: 10m
                max-file: "3"
        # optional mounts for dags and (in my case) AWS keys:
        # volumes:
        #     - ./dags:/opt/airflow/dags
        #     - ~/.aws:/home/airflow/.aws
        ports:
            - "8080:8080"
        command: webserver
    scheduler:
        image: apache/airflow:latest
        restart: on-failure
        depends_on:
            - postgres
        environment:
            - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
            - AIRFLOW__CORE__LOAD_EXAMPLES=True
            - AIRFLOW__CORE__EXECUTOR=LocalExecutor
            - AIRFLOW__WEBSERVER__BASE_URL=http://localhost:8080
        logging:
            options:
                max-size: 10m
                max-file: "3"
        # optional mounts for dags and (in my case) AWS keys:
        # volumes:
        #     - ./dags:/opt/airflow/dags
        #     - ~/.aws:/home/airflow/.aws
        command: scheduler
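Usage would presumably just be (assuming the file above is saved as docker-compose.yml):

    docker-compose up -d        # initdb runs once; webserver and scheduler restart on failure until the DB is ready
    docker-compose logs initdb  # confirm the one-off initdb container completed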

resdevd (Author) commented May 4, 2020

entrypoint.sh is maintained in the Dockerfile itself; this PR is a docker-compose setup using the official Airflow Docker image.
If you explore the code, you can see a one-time setup of initdb and inituser done via docker exec into the main Airflow container, using a shell script.
The idea is that users who need the initdb and inituser setup can run that script once after deploying docker-compose, or they can use docker-compose directly by replacing the env variables with an existing db connection string.
In this PR, the containers are in restart mode, so users can run that script while docker-compose is active. IMO we shouldn't waste resources by having a container run just for a one-time action. In this PR, users also won't have to rebuild images to change the entrypoint.
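For reference, that one-time init script is roughly of this shape (a sketch under the assumption that the main Airflow service is called webserver; the actual service name and script contents in the PR may differ):

    #!/usr/bin/env bash
    # run once after `docker-compose up -d`, against the already-running container
    docker-compose exec webserver airflow initdb
    docker-compose exec webserver airflow create_user --role Admin \
        --username airflow --password airflow \
        -e [email protected] -f airflow -l airflow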

stale bot commented Jun 20, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 20, 2020
@stale stale bot closed this Jun 28, 2020