Apache Airflow version
Other Airflow 2 version (please specify below)
What happened
We run Airflow 2.2.3 via docker-compose (basically this). It's been running fine since we set it up 4 months ago. We have both a production and a test instance, with identical setups on different systems. Test is fine, but our production instance exhibited the following when we came in on Monday:
no dags had run on Sunday (coincidentally, the daylight saving time changeover)
the 'next run' dates for each dag have not been updated since the last dag run, and are now in the past.
changes to the dags folder (such as updating dag files, deleting dag files) are not reflected in the webui
However, changes to the dag folder ARE reflected when running airflow dags list in the scheduler's container!
our dag_processor_manager.log file no longer shows any dags being found, and instead just prints the following every few minutes:
[2022-11-11 20:28:30,545] {manager.py:495} INFO - Exiting gracefully upon receiving signal 15
[2022-11-11 20:28:30,630] {manager.py:514} INFO - Processing files using up to 2 processes at a time
[2022-11-11 20:28:30,630] {manager.py:515} INFO - Process each file at most once every 30 seconds
[2022-11-11 20:28:30,631] {manager.py:517} INFO - Checking for new files in /opt/airflow/dags every 300 seconds
[2022-11-11 20:28:30,631] {manager.py:663} INFO - Searching for files in /opt/airflow/dags
I confirmed that:
all containers can ping each other
all containers have access to read the dag dir
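For reference, these checks were run roughly like the sketch below (service names come from the docker-compose.yaml further down, and paths assume the default AIRFLOW_HOME of /opt/airflow; adjust for your setup):
# Sketch only: service names match the docker-compose.yaml below.
# Does the scheduler container still see the DAG files on disk?
docker-compose exec airflow-scheduler airflow dags list
# What is the DAG file processor doing? (default log location; may vary)
docker-compose exec airflow-scheduler tail -n 50 /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log
# Name resolution and read access from inside the scheduler container
# (ping may not be installed in the image; getent only checks resolution).
docker-compose exec airflow-scheduler getent hosts postgres redis
docker-compose exec airflow-scheduler ls -l /opt/airflow/dags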
I should note that a few of the dags in our dags folder have had import errors for a few weeks. Perhaps not addressing this was a mistake?
I tried deleting the entire postgres database, and upon re-initialization Airflow became functional again. Ideally, though, we wouldn't have to do this, since we lose metadata and still have the underlying issue of Airflow randomly stopping.
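The reset that gets things working again looks roughly like this. It is destructive (all run history, connections, and variables are lost); the data-directory path comes from the postgres volume mapping in the compose file below, with BUCKET_PATH and ENVIRONMENT being our .env values:
# DESTRUCTIVE sketch: wipes the Airflow metadata DB. Substitute the real
# BUCKET_PATH/ENVIRONMENT values from the .env file.
docker-compose down
sudo rm -rf "${BUCKET_PATH}/${ENVIRONMENT}/airflow_postgres-db-volume/"
docker-compose up airflow-init    # re-creates the schema and the admin user
docker-compose up -d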
Does the webui get its info about the dags dir directly from the postgres db? Could this be some sort of db issue?
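From what I can tell, the Airflow 2.x webserver renders DAGs from the serialized_dag table in the metadata DB rather than parsing the dags folder itself, so a stale UI points at the DAG file processor / DB rather than at the webserver. A sketch for poking at the relevant tables (table and column names are from the Airflow 2.2 schema; verify against your own DB):
# Sketch: inspect what the webserver would be reading from the metadata DB.
# When the serialized DAGs were last refreshed.
docker-compose exec postgres psql -U airflow -d airflow -c "SELECT dag_id, last_updated FROM serialized_dag ORDER BY last_updated DESC LIMIT 10;"
# When the scheduler last parsed each DAG file and what it thinks runs next.
docker-compose exec postgres psql -U airflow -d airflow -c "SELECT dag_id, is_paused, last_parsed_time, next_dagrun FROM dag ORDER BY last_parsed_time DESC LIMIT 10;"
# Any import errors recorded by the DAG file processor.
docker-compose exec postgres psql -U airflow -d airflow -c "SELECT timestamp, filename FROM import_error ORDER BY timestamp DESC;"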
What you think should happen instead
Dags should not have stopped, and the webui should accurately reflect the dag folder.
Our docker-compose.yaml:
---------------------------------
# This configuration requires an .env file in this directory
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME - Docker image name used to run Airflow.
# AIRFLOW_UID - User ID in Airflow containers
# AIRFLOW_INSTANCE - Docker instance name used on PSDI team
# AIRFLOW_FERNET_KEY - Fernet key used to encrypt connection details
# AIRFLOW_WEBSERVER_PORT - The webserver port you would like to use. Must not be used by anyone else on the team.
# AIRFLOW_REDIS_PORT - The redis port you would like to use. Must not be used by anyone else on the team.
# AIRFLOW_FLOWER_PORT - The flower port you would like to use. Must not be used by anyone else on the team.
# AIRFLOW_CORE_FILE_PATH - The relative path of the core Airflow files. (DAGs, Plugins, DDLs, etc..)
# _AIRFLOW_WWW_USER_USERNAME - Username for the administrator account (if requested).
# _AIRFLOW_WWW_USER_PASSWORD - Password for the administrator account (if requested).
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
#
# Feel free to adjust these variables in the .env file to suit your needs.
# If you do not have an .env file, or your .env file has issues, the application will not start.
# See the .env file in this directory for the variables referenced with '$' below!
# (An illustrative placeholder .env sketch is shown after this file.)
---
version: '3'
#This stanza represents the 'includes' that will be inserted whenever airflow-common is referenced below
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME}
  environment:
    &airflow-common-env
    ENVIRONMENT: ${ENVIRONMENT}
    TEAM_EMAIL: ${TEAM_EMAIL}
    AIRFLOW__WEBSERVER__INSTANCE_NAME: "${ENVIRONMENT} Environment"
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    #AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__LOAD_EXAMPLES: ${LOAD_EXAMPLES:-false}
  volumes:
    - ${AIRFLOW_CORE_FILE_PATH}/dags:/opt/airflow/dags
    - ${AIRFLOW_CORE_FILE_PATH}/credentials:/opt/airflow/credentials
    - ${AIRFLOW_CORE_FILE_PATH}/plugins:/opt/airflow/plugins
    - ${BUCKET_PATH}/${ENVIRONMENT}/airflow_app_logs/:/opt/airflow/logs
    - ${BUCKET_PATH}/${ENVIRONMENT}:/opt/airflow/dags/bucket
  user: "${AIRFLOW_UID}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy
services:
  postgres:
    image: ${POSTGRES_IMAGE_NAME}
    container_name: postgres_${AIRFLOW_INSTANCE}
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
      #- postgres-db-volume:/var/lib/postgresql/data
      - ${BUCKET_PATH}/${ENVIRONMENT}/airflow_postgres-db-volume/:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always
  redis:
    image: ${REDIS_IMAGE_NAME}
    container_name: redis_${AIRFLOW_INSTANCE}
    expose:
      - ${AIRFLOW_REDIS_PORT}
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always
  airflow-webserver:
    <<: *airflow-common
    container_name: airflow_webserver_${AIRFLOW_INSTANCE}
    command: airflow webserver
    ports:
      - ${AIRFLOW_WEBSERVER_PORT}:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-scheduler:
    <<: *airflow-common
    container_name: airflow_scheduler_${AIRFLOW_INSTANCE}
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-worker:
    <<: *airflow-common
    container_name: airflow_worker_${AIRFLOW_INSTANCE}
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      # Required to handle warm shutdown of the celery workers properly
      # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-triggerer:
    <<: *airflow-common
    container_name: airflow_triggerer_${AIRFLOW_INSTANCE}
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-init:
    <<: *airflow-common
    container_name: airflow_init_${AIRFLOW_INSTANCE}
    entrypoint: /bin/bash
    command:
      - -c
      - |
        function ver() {
          printf "%04d%04d%04d%04d" $${1//./ }
        }
        airflow_version=$$(gosu airflow airflow version)
        airflow_version_comparable=$$(ver $${airflow_version})
        min_airflow_version=2.2.0
        min_airflow_version_comparable=$$(ver $${min_airflow_version})
        echo "VERSION IS $${min_airflow_version}, $${min_airflow_version_comparable}"
        if (( airflow_version_comparable < min_airflow_version_comparable )); then
          echo
          echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
          echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
          echo
          exit 1
        fi
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo " See: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#setting-the-right-airflow-user"
          echo
        fi
        one_meg=1048576
        mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
        cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
        disk_available=$$(df / | tail -1 | awk '{print $$4}')
        warning_resources="false"
        if (( mem_available < 4000 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
          echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
          echo
          warning_resources="true"
        fi
        if (( cpus_available < 2 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
          echo "At least 2 CPUs recommended. You have $${cpus_available}"
          echo
          warning_resources="true"
        fi
        if (( disk_available < one_meg * 10 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
          echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
          echo
          warning_resources="true"
        fi
        if [[ $${warning_resources} == "true" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
          echo "Please follow the instructions to increase amount of resources available:"
          echo " https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#before-you-begin"
          echo
        fi
        mkdir -p /sources/logs /sources/dags /sources/plugins
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
        exec /entrypoint airflow version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD}
    user: "0:0"
    volumes:
      - .:/sources
  airflow-cli:
    <<: *airflow-common
    container_name: airflow_cli_test
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
  flower:
    <<: *airflow-common
    container_name: airflow_flower_${AIRFLOW_INSTANCE}
    command: celery flower
    ports:
      - ${AIRFLOW_FLOWER_PORT}:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-connections-import:
    <<: *airflow-common
    container_name: airflow_connections_import_${AIRFLOW_INSTANCE}
    command: connections import connections.json
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-variables-import:
    <<: *airflow-common
    container_name: airflow_variables_import_${AIRFLOW_INSTANCE}
    command: variables import variables.json
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  airflow-jupyter:
    <<: *airflow-common
    container_name: airflow_jupyter_${AIRFLOW_INSTANCE}
    command: bash -cx "jupyter notebook --ip 0.0.0.0 --NotebookApp.token='' --NotebookApp.password=''"
    ports:
      - ${JUPYTER_PORT}:8888
    restart: always
#volumes:
#  postgres-db-volume:
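The comments at the top of the compose file reference an .env file that supplies these variables. Purely for illustration, such a file might look like the sketch below; every value is a placeholder, not our real configuration:
# .env -- illustrative placeholders only
AIRFLOW_IMAGE_NAME=apache/airflow:2.2.3
POSTGRES_IMAGE_NAME=postgres:13    # pinning a major version; see "Anything else" below
REDIS_IMAGE_NAME=redis:latest
AIRFLOW_UID=50000
AIRFLOW_INSTANCE=prod
ENVIRONMENT=prod
TEAM_EMAIL=team@example.com
AIRFLOW_FERNET_KEY=<generated fernet key>
AIRFLOW_WEBSERVER_PORT=8080
AIRFLOW_REDIS_PORT=6379
AIRFLOW_FLOWER_PORT=5555
JUPYTER_PORT=8888
AIRFLOW_CORE_FILE_PATH=./airflow-core
BUCKET_PATH=/mnt/bucket
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=airflow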
Anything else
I believe a similar problem happened about 4 months ago, but I chalked it up to something I had done during development.
Having compared my docker-compose.yaml with https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml, I noticed that the official file specifies postgres:13 whereas I was using postgres:latest, which resulted in me pulling down 14.x. I mentioned above that my test instance did not see the issue, but that has changed since I reported it: both of my systems, both running postgres 14.x, have now seen the issue.
I'm chalking this up to the postgres driver on the 2.2.3 image being incompatible with the 14.x server in some subtle way. It worked for about 4 months, until it didn't...
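If the postgres major version is indeed the culprit, the fix on our side would look roughly like the sketch below. POSTGRES_IMAGE_NAME is our .env variable from the compose file above; note that a data directory initialized by postgres 14 cannot simply be reopened by 13, so downgrading implies a dump/restore or a fresh metadata DB:
# Confirm which server version the containers are actually talking to.
docker-compose exec postgres psql -U airflow -d airflow -c "SHOW server_version;"
# Pin the image instead of relying on :latest -- the official 2.2.3 compose
# file uses postgres:13. Edit the existing entry in .env:
#   POSTGRES_IMAGE_NAME=postgres:13
# Recreate the postgres container with the pinned image (see the caveat above
# about 14.x data directories before doing this against existing data).
docker-compose up -d --force-recreate postgres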
How to reproduce
No response
Operating System
NAME="Rocky Linux" VERSION="8.6 (Green Obsidian)" ID="rocky" ID_LIKE="rhel centos fedora" VERSION_ID="8.6" PLATFORM_ID="platform:el8" PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)" ANSI_COLOR="0;32" CPE_NAME="cpe:/o:rocky:rocky:8:GA" HOME_URL="https://rockylinux.org/" BUG_REPORT_URL="https://bugs.rockylinux.org/" ROCKY_SUPPORT_PRODUCT="Rocky Linux" ROCKY_SUPPORT_PRODUCT_VERSION="8" REDHAT_SUPPORT_PRODUCT="Rocky Linux" REDHAT_SUPPORT_PRODUCT_VERSION="8"
Versions of Apache Airflow Providers
Providers info
apache-airflow-providers-amazon | 2.4.0
apache-airflow-providers-celery | 2.1.0
apache-airflow-providers-cncf-kubernetes | 2.2.0
apache-airflow-providers-docker | 2.3.0
apache-airflow-providers-elasticsearch | 2.1.0
apache-airflow-providers-ftp | 2.0.1
apache-airflow-providers-google | 6.2.0
apache-airflow-providers-grpc | 2.0.1
apache-airflow-providers-hashicorp | 2.1.1
apache-airflow-providers-http | 2.0.1
apache-airflow-providers-imap | 2.0.1
apache-airflow-providers-jdbc | 2.0.1
apache-airflow-providers-microsoft-azure | 3.4.0
apache-airflow-providers-mysql | 2.1.1
apache-airflow-providers-odbc | 2.0.1
apache-airflow-providers-oracle | 2.0.1
apache-airflow-providers-postgres | 2.4.0
apache-airflow-providers-presto | 2.0.1
apache-airflow-providers-redis | 2.0.1
apache-airflow-providers-sendgrid | 2.0.1
apache-airflow-providers-sftp | 2.3.0
apache-airflow-providers-slack | 4.1.0
apache-airflow-providers-sqlite | 2.0.1
apache-airflow-providers-ssh | 2.3.0
Deployment
Docker-Compose
Deployment details
airflow info:
Apache Airflow
version | 2.2.3
executor | CeleryExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn | postgresql+psycopg2://airflow:airflow@postgres/airflow
dags_folder | /opt/airflow/dags
plugins_folder | /opt/airflow/plugins
base_log_folder | /opt/airflow/logs
remote_base_log_folder |
System info
OS | Linux
architecture | x86_64
uname | uname_result(system='Linux', node='c63071150ddd', release='4.18.0-372.26.1.el8_6.x86_64',
| version='#1 SMP Tue Sep 13 18:09:48 UTC 2022', machine='x86_64', processor='')
locale | ('en_US', 'UTF-8')
python_version | 3.7.12 (default, Nov 17 2021, 17:59:57) [GCC 8.3.0]
python_location | /usr/local/bin/python
Tools info
git | NOT AVAILABLE
ssh | OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1n 15 Mar 2022
kubectl | NOT AVAILABLE
gcloud | NOT AVAILABLE
cloud_sql_proxy | NOT AVAILABLE
mysql | mysql Ver 8.0.27 for Linux on x86_64 (MySQL Community Server - GPL)
sqlite3 | 3.27.2 2019-02-25 16:06:06 bd49a8271d650fa89e446b42e513b595a717b9212c91dd384aab871fc1d0alt1
psql | psql (PostgreSQL) 11.16 (Debian 11.16-0+deb10u1)
Paths info
airflow_home | /opt/airflow
system_path | /home/airflow/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sb
| in:/bin:/usr/local/bin:/root/.local/bin:/opt/oracle/instantclient_21_1
python_path | /home/airflow/.local/bin:/opt/airflow:/usr/local/lib/python37.zip:/usr/local/lib/python3.7:/u
| sr/local/lib/python3.7/lib-dynload:/home/airflow/.local/lib/python3.7/site-packages:/usr/loca
| l/lib/python3.7/site-packages:/opt/airflow/dags:/opt/airflow/config:/opt/airflow/plugins
airflow_on_path | True
our docker-compose.yaml (included in full above)
Are you willing to submit PR?
Code of Conduct