Docker Compose CeleryExecutor example #8621
Conversation
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contribution Guide (https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst)
@potiuk FYI
    image: apache/airflow:1.10.10
    container_name: airflow_flower_cont
    restart: always
    volumes:
Can you use extension-fields to reduce code duplication?
https://medium.com/@kinghuang/docker-compose-anchors-aliases-extensions-a1e4105d70bd#f191
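For reference (not part of this PR), extension fields are top-level keys prefixed with x- that Compose ignores, which makes them a convenient place for a YAML anchor that services can then reuse. A minimal sketch, with illustrative service names:

version: '3.4'  # extension fields require compose file format 3.4+

# The x- prefix makes Compose skip this key; the &anchor lets services reuse it.
x-airflow-common: &airflow_common
  image: apache/airflow:1.10.10
  restart: always

services:
  webserver:
    <<: *airflow_common  # YAML merge key copies the shared settings into this service
    command: webserver
  scheduler:
    <<: *airflow_common
    command: scheduler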
I did consider using extension fields; however, IMO we shouldn't incorporate them for this use case, for the following reasons:
- Most production environments, like ECS/Fargate and Docker Swarm, don't support them, unlike Docker env variables.
- Users might want to use different images and volumes for different Airflow services (Airflow workers might need Java, specific Python packages, etc.).
- Much of the code duplication is already eliminated by the .env master file (see the sketch at the end of this comment).
- Readability and adoption of extension fields still lag behind among the majority of docker/docker-compose users.
- Container names should be unique.
- Advanced users can always adjust the config to their needs.
So I believe that, especially for this use case, extension fields fall under "premature optimization".
“The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming.” - Donald Knuth
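For context, the ".env master file" approach mentioned above works roughly like this (file contents illustrative, not taken from the PR): docker-compose reads variable definitions from a file referenced via env_file, so each service definition stays short:

# .env (shared by all services)
AIRFLOW__CORE__EXECUTOR=CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@postgres:5432/airflow

# docker-compose.yml
services:
  webserver:
    image: apache/airflow:1.10.10
    env_file:
      - .env  # every variable in .env is injected into the container
    command: webserver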
I think we should decide whether the docker-compose file should serve as a "hello world" example, or be something more complex but closer to a production deployment. My ideas:
To address your points about the extension fields:
I think the goal should be a simple example docker-compose file. People will then extract bits and pieces from the example into their deployment. There's a ton of different ways to do deployment, so I would not aim for creating a docker-compose file with the goal of being close to one single deployment method. Instead, aim for something easy to comprehend.
Okay, why does this change the argument for using the extension fields?
If we want the docker-compose file to serve as an example, I think it should be as self-explanatory as possible. Storing env vars in a separate file does not help that cause, I think using extension fields and having everything in the same file will be clearer.
How so?
You still have unique container names.
Yes. See the following docker-compose file using extension fields; I think it's pretty readable:

version: '3'

# ========================== AIRFLOW ENVIRONMENT VARIABLES ===========================
x-environment: &airflow_environment
  - AIRFLOW__CORE__EXECUTOR=LocalExecutor
  - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
  - AIRFLOW__CORE__LOAD_EXAMPLES=False
  - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgresql://airflow:airflow@postgres:5432/airflow
  - AIRFLOW__CORE__STORE_DAG_CODE=True
  - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
  - AIRFLOW__WEBSERVER__RBAC=True
# ========================== /AIRFLOW ENVIRONMENT VARIABLES ==========================

services:
  postgres:
    image: postgres:12-alpine
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    ports:
      - "5432:5432"
  webserver:
    image: apache/airflow:1.10.10-python3.7
    ports:
      - "8080:8080"
    environment: *airflow_environment
    command: webserver
  scheduler:
    image: apache/airflow:1.10.10-python3.7
    environment: *airflow_environment
    command: scheduler

For the initialization I've used this thing in the past, worked quite okay. We could make it a bit more sophisticated, e.g. by checking some condition to be True once a second, instead:

  initdb_adduser:
    image: apache/airflow:1.10.10-python3.7
    depends_on:
      - postgres
    environment: *airflow_environment
    entrypoint: /bin/bash
    # The webserver initializes permissions, so sleep for that to (approximately) be finished
    # No disaster if the webserver isn't finished by then, but create_user will start spitting out errors until the permissions exist
    command: -c 'airflow initdb && sleep 5 && airflow create_user --role Admin --username airflow --password airflow -e [email protected] -f airflow -l airflow'
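A sketch of that "check some condition once a second" variant could simply retry create_user until the webserver has finished creating the default permissions, instead of relying on a fixed sleep (untested, flags as in the Airflow 1.10 CLI above; the e-mail is a placeholder):

  initdb_adduser:
    image: apache/airflow:1.10.10-python3.7
    depends_on:
      - postgres
    environment: *airflow_environment
    entrypoint: /bin/bash
    # Retry once a second until the permissions exist and create_user succeeds,
    # rather than hoping a fixed sleep was long enough.
    command: -c 'airflow initdb && until airflow create_user --role Admin --username airflow --password airflow -e airflow@example.com -f airflow -l airflow; do sleep 1; done'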
IMO the simplest way to run Airflow is still pip install plus the airflow commands. A "hello world" example should not require users to install/learn Docker and docker-compose.
Sure, like @turbaszek mentioned, we could use a name like "deploy" etc., where more docker-compose configs with LocalExecutor and potentially k8s/helm configs can be added in the future. IMO "examples" would be confusing, as it might refer to example DAGs and operators.
It wasn't about adding an example, the goal was to show usage of docker volumes for dags and plugins.
This PR is addressing #8605 "Add Production-ready docker compose for the production image".
I was referring to the code context from the original review where extension fields were suggested.
As mentioned above, this PR addresses an issue whose goal is a docker-compose close to production. Having fernet keys and other sensitive info live in the docker-compose file is not a best practice either (in terms of .gitignore config and migration to ECS/Fargate, Docker Swarm, etc.).
Glad you asked. Prior to Airflow's official Docker images, the two most popular docker-compose Airflow projects were https://github.com/puckel/docker-airflow and https://github.com/bitnami/bitnami-docker-airflow. I don't think extension fields are used there. For context, extension fields were introduced in docker-compose file format 3.4, in 2017.
This is a separate issue, #8606. In that case, if env variables were used for this purpose, we could get rid of the init script entirely. Besides, containers share CPU, memory, network, and IO; I'm not sure adding another container for a one-time action is the best way to go.
Looks like the entrypoint.sh is not doing an initdb or upgradedb prior to the command, like is found in puckel's docker-compose.yml (a working example mitigating entrypoint.sh lacking initdb):

volumes:
  - ./dags:/opt/airflow/dags
  - ~/.aws:/home/airflow/.aws
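A minimal workaround, mirroring the entrypoint-override pattern shown earlier in this thread (a sketch, not the official image's documented interface), is to run the DB migration before the service command:

webserver:
  image: apache/airflow:1.10.10
  entrypoint: /bin/bash
  # Run the schema migration first, then exec the webserver so it becomes PID 1
  command: -c 'airflow upgradedb && exec airflow webserver'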
entrypoint.sh is maintained in the Dockerfile itself. This PR is a docker-compose setup using the official Airflow Docker image.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This is a docker-compose template, ideal for local development, using the official Airflow Docker images, with easy deploy instructions. It addresses issue #8605.
Make sure to mark the boxes below before creating PR:
- [x] In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
- [x] In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
- [x] In case of backwards incompatible changes please leave a note in UPDATING.md.
- [x] Read the Pull Request Guidelines for more information.