
Helm chart: All task pods are terminating with error while task succeed #16016

Closed
andormarkus opened this issue May 23, 2021 · 1 comment
Labels
kind:bug This is a clearly a bug

Comments


andormarkus commented May 23, 2021

Apache Airflow version: 2.0.2
Kubernetes version: 1.20
Helm chart version: 1.0.0

What happened:
Successful task pods are terminating with error.

I did not have this error with the old airflow-helm chart. The official helm chart was redeployed into a fresh EKS cluster.
Update: The old chart does not have this issue because it uses the 2.0.1-python3.8 image. If I use 2.0.1-python3.8 with this chart, then it is fine as well.

[Screenshot: 2021-05-23 at 23:53:20]

▶ kubectl -n airflow get pods

NAME                                                     READY   STATUS    RESTARTS   AGE
airflow-scheduler-649d7dfb7d-58k6h                       2/2     Running   0          14m
airflow-scheduler-649d7dfb7d-gvg5d                       2/2     Running   0          14m
airflow-scheduler-649d7dfb7d-m85n5                       2/2     Running   0          14m
airflow-statsd-84f4f9898-kwxjk                           1/1     Running   0          14m
airflow-webserver-687c54c99d-df76q                       1/1     Running   0          14m
airflow-webserver-687c54c99d-vnw24                       1/1     Running   0          14m
airflow-webserver-687c54c99d-xq8tc                       1/1     Running   0          14m
simplepipeparsing.7187c95facb6494f8538160889667df6       0/1     Error     0          8m11s
simplepipeprocessing0.ba48ee69f8434c5ab2a21d997b7509b5   0/1     Error     0          7m54s
simplepipeprocessing1.ede29dbf0ca243dfbdf89afffe15f586   0/1     Error     0          7m54s
simplepipeprocessing2.a093d6b74303401d859d2361742ea8f6   0/1     Error     0          7m53s
simplepipeprocessing3.6b358ac9cba546ac8cfeb07eec6730fb   0/1     Error     0          7m52s
simplepipeprocessing4.f411ed8a7553475db7603a995e6630d7   0/1     Error     0          7m51s
▶ kubectl -n airflow logs pod/simplepipeparsing.7187c95facb6494f8538160889667df6

BACKEND=postgresql
DB_HOST=dataeng-rds-airflowmetastore-dev.c20w6vrzbehx.eu-central-1.rds.amazonaws.com
DB_PORT=5432

[2021-05-23 21:44:28,189] {dagbag.py:451} INFO - Filling up the DagBag from /opt/airflow/dags/dags/simple_pipe.py
[2021-05-23 21:44:28,445] {base_aws.py:368} INFO - Airflow Connection: aws_conn_id=aws_default
[2021-05-23 21:44:29,033] {base_aws.py:391} WARNING - Unable to use Airflow Connection for credentials.
[2021-05-23 21:44:29,033] {base_aws.py:392} INFO - Fallback on boto3 credential strategy
[2021-05-23 21:44:29,033] {base_aws.py:397} INFO - Creating session using boto3 credential strategy region_name=eu-central-1
Running <TaskInstance: simple_pipe.parsing 2020-01-01T00:00:00+00:00 [queued]> on host simplepipeparsing.7187c95facb6494f8538160889667df6
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/task_command.py", line 235, in task_run
    _run_task_by_selected_method(args, dag, ti)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/task_command.py", line 64, in _run_task_by_selected_method
    _run_task_by_local_task_job(args, ti)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/task_command.py", line 120, in _run_task_by_local_task_job
    run_job.run()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/base_job.py", line 237, in run
    self._execute()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/local_task_job.py", line 142, in _execute
    self.on_kill()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/jobs/local_task_job.py", line 157, in on_kill
    self.task_runner.on_finish()
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/task/task_runner/base_task_runner.py", line 178, in on_finish
    self._error_file.close()
  File "/usr/local/lib/python3.6/tempfile.py", line 511, in close
    self._closer.close()
  File "/usr/local/lib/python3.6/tempfile.py", line 448, in close
    unlink(self.name)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmp_bh__su1'
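
The traceback shows `on_finish()` closing the task runner's temporary error file after the underlying file in `/tmp` has already been removed: on the Python versions in these images (3.6/3.8), closing a `delete=True` `NamedTemporaryFile` re-unlinks the file and raises `FileNotFoundError` if it is already gone. A minimal sketch of that failure mode and a defensive close (my illustration, not Airflow's code):

```python
import os
import tempfile
from contextlib import suppress

# Create a delete-on-close temp file, then simulate something else
# (e.g. pod teardown) removing the backing file first.
f = tempfile.NamedTemporaryFile(delete=True)
os.unlink(f.name)

# On Python 3.6/3.8, f.close() tries to unlink the already-missing
# file, which raises FileNotFoundError -- the error in the pod log.
# Suppressing it makes the close idempotent.
with suppress(FileNotFoundError):
    f.close()

assert not os.path.exists(f.name)
```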

How to reproduce it:

executor: KubernetesExecutor
fernetKey: "XXXXXXXXXX"

config:
  logging:
    colored_console_log: "True"
    remote_logging: "True"
    remote_base_log_folder: "cloudwatch://${log_group_arn}"
    remote_log_conn_id: "aws_default"
  core:
    load_examples: "False"
  webserver:
    base_url: "http://foobaa.com/airflow"
  secrets:
    backend: "airflow.contrib.secrets.aws_systems_manager.SystemsManagerParameterStoreBackend"
    backend_kwargs: '{"connections_prefix": "/airflow/connections", "variables_prefix": "/airflow/variables", "profile_name": null}'

webserver:
  replicas: 3
  nodeSelector:
    namespace: airflow
  serviceAccount:
    name: ${service_account_name}
    annotations:
      eks.amazonaws.com/role-arn: ${service_account_iamrole_arn}
  service:
    type: NodePort

ingress:
  enabled: true
  web:
    precedingPaths:
      - path: "/*"
        serviceName: "ssl-redirect"
        servicePort: "use-annotation"
    path: "/airflow/*"
    
    annotations:
      external-dns.alpha.kubernetes.io/hostname: ${web_url}
      kubernetes.io/ingress.class: alb
      alb.ingress.kubernetes.io/scheme: internal
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/target-group-attributes: stickiness.enabled=true,stickiness.lb_cookie.duration_seconds=3600

      alb.ingress.kubernetes.io/certificate-arn: ${aws_acm_certificate_arn}
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS":443}]'
      alb.ingress.kubernetes.io/actions.ssl-redirect: '{"Type": "redirect", "RedirectConfig": { "Protocol": "HTTPS", "Port": "443", "StatusCode": "HTTP_301"}}'

scheduler:
  replicas: 3
  nodeSelector:
    namespace: airflow
  serviceAccount:
    name: ${service_account_name}
    annotations:
      eks.amazonaws.com/role-arn: ${service_account_iamrole_arn}

workers:
  serviceAccount:
    name: ${service_account_name}
    annotations:
      eks.amazonaws.com/role-arn: ${service_account_iamrole_arn}

dags:
  persistence:
    enabled: true
    storageClassName: ${storage_class_dags}

logs:
  persistence:
    enabled: true
    storageClassName: ${storage_class_logs}

postgresql:
  enabled: false

data:
  metadataSecretName: ${metadata_secret_name}
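
For reference, the SSM secrets backend configured above resolves IDs by joining the configured prefix and the ID with a "/", so the `aws_default` connection from the logs is looked up under `/airflow/connections/aws_default` in Parameter Store. A small sketch of that mapping (illustrative helper, not Airflow's implementation):

```python
import json

# backend_kwargs exactly as configured in the values above
backend_kwargs = json.loads(
    '{"connections_prefix": "/airflow/connections", '
    '"variables_prefix": "/airflow/variables", "profile_name": null}'
)

def ssm_path(prefix: str, key: str) -> str:
    # Illustrative: the backend joins the prefix and the ID with "/"
    return f"{prefix}/{key}"

print(ssm_path(backend_kwargs["connections_prefix"], "aws_default"))
# -> /airflow/connections/aws_default
```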

Update:
I did further testing with different image versions:

  • 2.0.1-python3.8 - OK
  • 2.0.2-python3.8 - NOK
  • 2.0.2-python3.6 - NOK
  • 2.1.0-python3.8 - NOK
@andormarkus andormarkus added the kind:bug This is a clearly a bug label May 23, 2021
@andormarkus (Contributor, Author)

Closing this bug because it is KubernetesExecutor related; an updated bug report is filed in #16020.
