
Dags stopped running and webui not picking up changes to dag directory #27628

Closed
1 of 2 tasks
jason-brian-anderson opened this issue Nov 11, 2022 · 4 comments
Labels: area:core, kind:bug

Comments

jason-brian-anderson commented Nov 11, 2022

Apache Airflow version

Other Airflow 2 version (please specify below)

What happened

We run Airflow 2.2.3 via docker-compose (basically this). It had been running fine since we set it up 4 months ago. We have both a production and a test instance, with identical setups on different systems. Test is fine, but our production instance exhibited the following when we came in on Monday:

  • no DAGs had run Sunday (coincidentally, daylight savings time day)
  • the 'next run' dates for each DAG have not been updated since the last DAG run, and are now in the past
  • changes to the dags folder (such as updating or deleting DAG files) are not reflected in the web UI
  • However, changes to the dag folder ARE reflected by running airflow dags list in the scheduler's container (see the check sketched after the log excerpt below)!
  • our dag_processor_manager.log file no longer shows any DAGs found, and instead just prints the following every few minutes:
[2022-11-11 20:28:30,545] {manager.py:495} INFO - Exiting gracefully upon receiving signal 15
[2022-11-11 20:28:30,630] {manager.py:514} INFO - Processing files using up to 2 processes at a time 
[2022-11-11 20:28:30,630] {manager.py:515} INFO - Process each file at most once every 30 seconds
[2022-11-11 20:28:30,631] {manager.py:517} INFO - Checking for new files in /opt/airflow/dags every 300 seconds
[2022-11-11 20:28:30,631] {manager.py:663} INFO - Searching for files in /opt/airflow/dags
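
A quick way to confirm the mismatch between what the scheduler's DAG file processor sees and what the web UI shows (a minimal sketch, assuming docker-compose v1 syntax and the service names from the docker-compose.yaml below; the log path is the default location under base_log_folder):

# What the scheduler container can actually parse from the dags folder
docker-compose exec airflow-scheduler airflow dags list

# Is the DAG processor manager still scanning files, or just looping on startup messages?
docker-compose exec airflow-scheduler tail -n 50 /opt/airflow/logs/dag_processor_manager/dag_processor_manager.log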

I confirmed that:

  • all containers can ping each other
  • all containers have access to read the dag dir

I should note that a few of the DAGs in our dags folder have had import errors for a few weeks. Perhaps not addressing this was a mistake?
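
If the import errors are a suspect, they can also be listed from the CLI (a sketch, assuming the dags list-import-errors subcommand is available in this Airflow 2.x release):

# Show DAG files that currently fail to import, along with their tracebacks
docker-compose exec airflow-scheduler airflow dags list-import-errors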

I tried deleting the entire postgres database, and upon re-initialization Airflow became functional again. Ideally, though, we wouldn't have to do this, as we lose metadata and still have the underlying issue of Airflow randomly stopping.

Does the web UI get the info about the dags dir directly from the postgres DB? Would this be some sort of DB issue?
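
For context, the web UI reads DAG listings from the metadata database (populated by the DAG file processor via the serialized_dag and dag tables), so a stale UI usually means those rows are no longer being refreshed. A rough check against the metadata DB (a sketch, assuming the table and column names of Airflow 2.2's schema and the postgres service/credentials from the compose file):

# When was each DAG last re-serialized for the web UI?
docker-compose exec postgres psql -U airflow -d airflow \
  -c "SELECT dag_id, last_updated FROM serialized_dag ORDER BY last_updated DESC LIMIT 10;"

# When did the scheduler last parse each DAG file, and when is its next run due?
docker-compose exec postgres psql -U airflow -d airflow \
  -c "SELECT dag_id, last_parsed_time, next_dagrun FROM dag ORDER BY last_parsed_time DESC LIMIT 10;"

If those timestamps stop advancing while files keep changing on disk, the problem sits with the DAG file processor rather than the web server.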

What you think should happen instead

DAGs should not have stopped running, and the web UI should accurately reflect the contents of the DAG folder.

How to reproduce

No response

Operating System

NAME="Rocky Linux" VERSION="8.6 (Green Obsidian)" ID="rocky" ID_LIKE="rhel centos fedora" VERSION_ID="8.6" PLATFORM_ID="platform:el8" PRETTY_NAME="Rocky Linux 8.6 (Green Obsidian)" ANSI_COLOR="0;32" CPE_NAME="cpe:/o:rocky:rocky:8:GA" HOME_URL="https://rockylinux.org/" BUG_REPORT_URL="https://bugs.rockylinux.org/" ROCKY_SUPPORT_PRODUCT="Rocky Linux" ROCKY_SUPPORT_PRODUCT_VERSION="8" REDHAT_SUPPORT_PRODUCT="Rocky Linux" REDHAT_SUPPORT_PRODUCT_VERSION="8"

Versions of Apache Airflow Providers

Providers info
apache-airflow-providers-amazon | 2.4.0
apache-airflow-providers-celery | 2.1.0
apache-airflow-providers-cncf-kubernetes | 2.2.0
apache-airflow-providers-docker | 2.3.0
apache-airflow-providers-elasticsearch | 2.1.0
apache-airflow-providers-ftp | 2.0.1
apache-airflow-providers-google | 6.2.0
apache-airflow-providers-grpc | 2.0.1
apache-airflow-providers-hashicorp | 2.1.1
apache-airflow-providers-http | 2.0.1
apache-airflow-providers-imap | 2.0.1
apache-airflow-providers-jdbc | 2.0.1
apache-airflow-providers-microsoft-azure | 3.4.0
apache-airflow-providers-mysql | 2.1.1
apache-airflow-providers-odbc | 2.0.1
apache-airflow-providers-oracle | 2.0.1
apache-airflow-providers-postgres | 2.4.0
apache-airflow-providers-presto | 2.0.1
apache-airflow-providers-redis | 2.0.1
apache-airflow-providers-sendgrid | 2.0.1
apache-airflow-providers-sftp | 2.3.0
apache-airflow-providers-slack | 4.1.0
apache-airflow-providers-sqlite | 2.0.1
apache-airflow-providers-ssh | 2.3.0

Deployment

Deployment details

airflow info:

Apache Airflow
version | 2.2.3
executor | CeleryExecutor
task_logging_handler | airflow.utils.log.file_task_handler.FileTaskHandler
sql_alchemy_conn | postgresql+psycopg2://airflow:airflow@postgres/airflow
dags_folder | /opt/airflow/dags
plugins_folder | /opt/airflow/plugins
base_log_folder | /opt/airflow/logs
remote_base_log_folder |

System info
OS | Linux
architecture | x86_64
uname | uname_result(system='Linux', node='c63071150ddd', release='4.18.0-372.26.1.el8_6.x86_64',
| version='#1 SMP Tue Sep 13 18:09:48 UTC 2022', machine='x86_64', processor='')
locale | ('en_US', 'UTF-8')
python_version | 3.7.12 (default, Nov 17 2021, 17:59:57) [GCC 8.3.0]
python_location | /usr/local/bin/python

Tools info
git | NOT AVAILABLE
ssh | OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1n 15 Mar 2022
kubectl | NOT AVAILABLE
gcloud | NOT AVAILABLE
cloud_sql_proxy | NOT AVAILABLE
mysql | mysql Ver 8.0.27 for Linux on x86_64 (MySQL Community Server - GPL)
sqlite3 | 3.27.2 2019-02-25 16:06:06 bd49a8271d650fa89e446b42e513b595a717b9212c91dd384aab871fc1d0alt1
psql | psql (PostgreSQL) 11.16 (Debian 11.16-0+deb10u1)

Paths info
airflow_home | /opt/airflow
system_path | /home/airflow/.local/bin:/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sb
| in:/bin:/usr/local/bin:/root/.local/bin:/opt/oracle/instantclient_21_1
python_path | /home/airflow/.local/bin:/opt/airflow:/usr/local/lib/python37.zip:/usr/local/lib/python3.7:/u
| sr/local/lib/python3.7/lib-dynload:/home/airflow/.local/lib/python3.7/site-packages:/usr/loca
| l/lib/python3.7/site-packages:/opt/airflow/dags:/opt/airflow/config:/opt/airflow/plugins
airflow_on_path | True

Our docker-compose.yaml:
---------------------------------
# This configuration requires an .env file in this directory
# The following variables are supported:
#
# AIRFLOW_IMAGE_NAME           - Docker image name used to run Airflow.
# AIRFLOW_UID                  - User ID in Airflow containers
# AIRFLOW_INSTANCE             - Docker instance name used on PSDI team
# AIRFLOW_FERNET_KEY           - Fernet key used to encrypt connection details
# AIRFLOW_WEBSERVER_PORT       - The webserver port you would like to use. Must not be used by anyone else on the team.
# AIRFLOW_REDIS_PORT           - The redis port you would like to use. Must not be used by anyone else on the team.
# AIRFLOW_FLOWER_PORT          - The flower port you would like to use. Must not be used by anyone else on the team.
# AIRFLOW_CORE_FILE_PATH        - The relative path of the core Airflow files. (DAGs, Plugins, DDLs, etc..)
# _AIRFLOW_WWW_USER_USERNAME   - Username for the administrator account (if requested).
# _AIRFLOW_WWW_USER_PASSWORD   - Password for the administrator account (if requested).
# _PIP_ADDITIONAL_REQUIREMENTS - Additional PIP requirements to add when starting all containers.
#
# Feel free to adjust these settings to suit your needs via the .env file.
# If you do not have an .env file, or your .env file has issues, the application will not start.
# See the .env file in this directory for the variables referenced with '$' below.
---
version: '3'
#This stanza represents the 'includes' that will be inserted whenever airflow-common is referenced below
x-airflow-common:
  &airflow-common
  image: ${AIRFLOW_IMAGE_NAME}
  environment:
    &airflow-common-env
    ENVIRONMENT: ${ENVIRONMENT}
    TEAM_EMAIL: ${TEAM_EMAIL}
    AIRFLOW__WEBSERVER__INSTANCE_NAME: "${ENVIRONMENT} Environment"
    AIRFLOW__CORE__EXECUTOR: CeleryExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres/airflow
    AIRFLOW__CELERY__BROKER_URL: redis://:@redis:6379/0
    AIRFLOW__CORE__FERNET_KEY: ${AIRFLOW_FERNET_KEY}
    AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
    #AIRFLOW__CORE__LOAD_EXAMPLES: 'false'
    AIRFLOW__CORE__LOAD_EXAMPLES: ${LOAD_EXAMPLES:-false}
  volumes:
    - ${AIRFLOW_CORE_FILE_PATH}/dags:/opt/airflow/dags
    - ${AIRFLOW_CORE_FILE_PATH}/credentials:/opt/airflow/credentials
    - ${AIRFLOW_CORE_FILE_PATH}/plugins:/opt/airflow/plugins
    - ${BUCKET_PATH}/${ENVIRONMENT}/airflow_app_logs/:/opt/airflow/logs
    - ${BUCKET_PATH}/${ENVIRONMENT}:/opt/airflow/dags/bucket
  user: "${AIRFLOW_UID}:0"
  depends_on:
    &airflow-common-depends-on
    redis:
      condition: service_healthy
    postgres:
      condition: service_healthy

services:
  postgres:
    image: ${POSTGRES_IMAGE_NAME}
    container_name: postgres_${AIRFLOW_INSTANCE}
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    volumes:
     #- postgres-db-volume:/var/lib/postgresql/data
      - ${BUCKET_PATH}/${ENVIRONMENT}/airflow_postgres-db-volume/:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD", "pg_isready", "-U", "airflow"]
      interval: 5s
      retries: 5
    restart: always

  redis:
    image: ${REDIS_IMAGE_NAME}
    container_name: redis_${AIRFLOW_INSTANCE}
    expose:
      - ${AIRFLOW_REDIS_PORT}
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 5s
      timeout: 30s
      retries: 50
    restart: always

  airflow-webserver:
    <<: *airflow-common
    container_name: airflow_webserver_${AIRFLOW_INSTANCE}
    command: airflow webserver
    ports:
      - ${AIRFLOW_WEBSERVER_PORT}:8080
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-scheduler:
    <<: *airflow-common
    container_name: airflow_scheduler_${AIRFLOW_INSTANCE}
    command: scheduler
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-worker:
    <<: *airflow-common
    container_name: airflow_worker_${AIRFLOW_INSTANCE}
    command: celery worker
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
    environment:
      <<: *airflow-common-env
      # Required to handle warm shutdown of the celery workers properly
      # See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
      DUMB_INIT_SETSID: "0"
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-triggerer:
    <<: *airflow-common
    container_name: airflow_triggerer_${AIRFLOW_INSTANCE}
    command: triggerer
    healthcheck:
      test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-init:
    <<: *airflow-common
    container_name: airflow_init_${AIRFLOW_INSTANCE}
    entrypoint: /bin/bash
    command:
      - -c
      - |
        function ver() {
          printf "%04d%04d%04d%04d" $${1//./ }
        }
        airflow_version=$$(gosu airflow airflow version)
        airflow_version_comparable=$$(ver $${airflow_version})
        min_airflow_version=2.2.0
        min_airflow_version_comparable=$$(ver $${min_airflow_version})
        echo "VERSION IS $${min_airflow_version}, $${min_airflow_version_comparable}"
        if (( airflow_version_comparable < min_airflow_version_comparable )); then
          echo
          echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
          echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
          echo
          exit 1
        fi
        if [[ -z "${AIRFLOW_UID}" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
          echo "If you are on Linux, you SHOULD follow the instructions below to set "
          echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
          echo "For other operating systems you can get rid of the warning with manually created .env file:"
          echo "    See: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#setting-the-right-airflow-user"
          echo
        fi
        one_meg=1048576
        mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
        cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
        disk_available=$$(df / | tail -1 | awk '{print $$4}')
        warning_resources="false"
        if (( mem_available < 4000 )) ; then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
          echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
          echo
          warning_resources="true"
        fi
        if (( cpus_available < 2 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
          echo "At least 2 CPUs recommended. You have $${cpus_available}"
          echo
          warning_resources="true"
        fi
        if (( disk_available < one_meg * 10 )); then
          echo
          echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
          echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
          echo
          warning_resources="true"
        fi
        if [[ $${warning_resources} == "true" ]]; then
          echo
          echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
          echo "Please follow the instructions to increase amount of resources available:"
          echo "   https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#before-you-begin"
          echo
        fi
        mkdir -p /sources/logs /sources/dags /sources/plugins
        chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
        exec /entrypoint airflow version
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD}
    user: "0:0"
    volumes:
      - .:/sources

  airflow-cli:
    <<: *airflow-common
    container_name: airflow_cli_test
    profiles:
      - debug
    environment:
      <<: *airflow-common-env
      CONNECTION_CHECK_MAX_COUNT: "0"
    # Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252

  flower:
    <<: *airflow-common
    container_name: airflow_flower_${AIRFLOW_INSTANCE}
    command: celery flower
    ports:
      - ${AIRFLOW_FLOWER_PORT}:5555
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully
  
  airflow-connections-import:
    <<: *airflow-common
    container_name: airflow_connections_import_${AIRFLOW_INSTANCE}
    command: connections import connections.json
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-variables-import:
    <<: *airflow-common
    container_name: airflow_variables_import_${AIRFLOW_INSTANCE}
    command: variables import variables.json
    depends_on:
      <<: *airflow-common-depends-on
      airflow-init:
        condition: service_completed_successfully

  airflow-jupyter:
    <<: *airflow-common
    container_name: airflow_jupyter_${AIRFLOW_INSTANCE}
    command: bash -cx "jupyter notebook --ip 0.0.0.0 --NotebookApp.token='' --NotebookApp.password=''"
    ports:
      - ${JUPYTER_PORT}:8888
    restart: always
        
    #volumes:
    #postgres-db-volume:


Anything else

I believe a similar problem happened about 4 months ago, but I chalked it up to something I had done during development.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

jason-brian-anderson added the area:core and kind:bug labels on Nov 11, 2022
uranusjr (Member) commented:
What are the next run values you have? I wonder if this is actually related to DST.
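
One way to see the 'next run' values the scheduler has recorded, for comparison with what the UI shows (a minimal sketch; my_example_dag is a placeholder DAG id):

# Print the next scheduled execution date for a given DAG
docker-compose exec airflow-scheduler airflow dags next-execution my_example_dag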

jason-brian-anderson (Author) commented Nov 15, 2022

Having compared my docker-compose.yaml with https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml, I noticed that the official doc specifies postgres:13 whereas I was using postgres:latest. Using latest resulted in pulling down 14.x. I mentioned above that my test instance did not see the issue, but that has changed since I reported it: both of my systems, each using postgres 14.x, have now seen the issue.

I'm chalking this up to the psql driver on the 2.2.3 image being incompatible with the 14.x server in some subtle way. It worked for about 4 months, until it didn't...

My lesson: always beware of 'latest'.
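
One way to confirm which Postgres server version a 'latest' tag actually resolved to in a running stack (a sketch, assuming the postgres service name and credentials from the compose file above):

# Ask the running metadata DB for its server version
docker-compose exec postgres psql -U airflow -d airflow -c "SELECT version();"

# The official postgres image also exposes this via environment variables
docker-compose exec postgres env | grep -E 'PG_MAJOR|PG_VERSION'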

jason-brian-anderson (Author) commented:
TL;DR: use postgres:13 with Airflow 2.2.3, not postgres:latest.
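
In this setup, the pin belongs in the .env file that feeds the POSTGRES_IMAGE_NAME variable referenced by the compose file (a sketch of the relevant line only):

# .env (excerpt) - pin the metadata DB image instead of tracking 'latest'
POSTGRES_IMAGE_NAME=postgres:13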
