K8s executor failed to manually re-run the task #15960

Closed
yogyang opened this issue May 20, 2021 · 2 comments
Labels
kind:bug This is a clearly a bug

Comments

@yogyang
Contributor

yogyang commented May 20, 2021

Apache Airflow version: 2.0.2

Kubernetes version (if you are using kubernetes) (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.7", GitCommit:"1dd5338295409edcfff11505e7bb246f0d325d15", GitTreeState:"clean", BuildDate:"2021-01-13T13:23:52Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T20:55:23Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Cloud provider or hardware configuration: rancher ec2
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 10 (buster)
  • Kernel (e.g. uname -a): Linux airflow-webserver-57ccd474bc-8rfg5 5.4.0-1045-aws #47-Ubuntu SMP Tue Apr 13 07:02:25 UTC 2021 x86_64 GNU/Linux
  • Install tools:
  • Others: K8s executor mode

What happened:

Click the task -> Run with "Ignore state" -> the task is never executed.

The webserver logs the following error:

 WARNING - ApiException when attempting to run task, re-queueing. Message: pods is forbidden: User "system:serviceaccount:production-airflow:airflow-webserver" cannot create resource "pods" in API group "" in the namespace "production-airflow"
[2021-05-20 04:28:07,532] {kubernetes_executor.py:275} INFO - Kubernetes job is (TaskInstanceKey xxxxx)
[2021-05-20 04:28:07,534] {pod_launcher.py:86} ERROR - Exception when attempting to create Namespaced Pod:
 "metadata": {
    "annotations": {
      "dag_id": "xxxx",
      "task_id": "xxxx",
      "execution_date": "2021-05-20T03:00:00+00:00",
      "try_number": "3",
      "ad.datadoghq.com/tags": "{ \"type\": \"job\",\"task\": \"xx.xx\" }"
    },
    "labels": {
      "airflow-worker": "manual",
      "dag_id": "xxx",
      "task_id": "xxx",
      "execution_date": "2021-05-20T03_00_00_plus_00_00",
      "try_number": "3",
      "airflow_version": "2.0.2",
      "kubernetes_executor": "True"
    },
    "name": "xxxx.b2bd66df4f3b44198d7f23cdcaae07d1",
    "namespace": "production-airflow"
  },
  "spec": {...}
  Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.7/site-packages/airflow/kubernetes/pod_launcher.py", line 82, in run_pod_async
    body=sanitized_pod, namespace=pod.metadata.namespace, **kwargs
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 6174, in create_namespaced_pod
    (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api/core_v1_api.py", line 6265, in create_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 345, in call_api
    _preload_content, _request_timeout)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/api_client.py", line 388, in request
    body=body)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 278, in POST
    body=body)
  File "/home/airflow/.local/lib/python3.7/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'Date': 'Thu, 20 May 2021 04:28:07 GMT', 'Content-Length': '316'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods is forbidden: User \"system:serviceaccount:production-airflow:airflow-webserver\" cannot create resource \"pods\" in API group \"\" in the namespace \"production-airflow\"","reason":"Forbidden","details":{"kind":"pods"},"code":403}
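The 403 above uses the standard Kubernetes RBAC denial wording, which names the exact identity that was refused. As a minimal sketch for diagnosing this kind of failure (the regex and `parse_forbidden` are hypothetical helpers, not part of Airflow or the Kubernetes client), the plain-text message from the WARNING line can be split into user, verb, resource, API group and namespace:

```python
import re
from typing import Dict, Optional

# Hypothetical helper: matches the plain-text RBAC denial wording from the WARNING line above.
_FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\S+) resource "(?P<resource>[^"]*)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)

SERVICE_ACCOUNT_PREFIX = "system:serviceaccount:"


def parse_forbidden(message: str) -> Optional[Dict[str, str]]:
    """Extract the denied user, verb, resource, API group and namespace, or return None."""
    match = _FORBIDDEN_RE.search(message)
    if match is None:
        return None
    info = match.groupdict()
    # Service-account users are reported as system:serviceaccount:<namespace>:<name>.
    if info["user"].startswith(SERVICE_ACCOUNT_PREFIX):
        rest = info["user"][len(SERVICE_ACCOUNT_PREFIX):]
        sa_namespace, _, sa_name = rest.partition(":")
        info["service_account_namespace"] = sa_namespace
        info["service_account"] = sa_name
    return info


message = (
    'pods is forbidden: User "system:serviceaccount:production-airflow:airflow-webserver" '
    'cannot create resource "pods" in API group "" in the namespace "production-airflow"'
)
print(parse_forbidden(message))
```

Here the parsed `service_account` is `airflow-webserver`, which points at the webserver, not the scheduler, as the process attempting to create the pod.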

What you expected to happen:

The task runs.

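Currently, Airflow starts an executor inside the airflow-webserver process, but the airflow-webserver service account has no RoleBinding to the 'pod-launcher' Role, so its request to create the worker pod is rejected with 403 Forbidden.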
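The gap described above could in principle be closed by binding the webserver's service account to the same Role. The sketch below only builds such a RoleBinding manifest as a Python dict; `pod_launcher_rolebinding` is a hypothetical helper, and the Role name `pod-launcher` and the binding name are placeholders (the Role the chart actually creates depends on the release name). The printed JSON could be saved and applied with `kubectl apply -f`. As the reporter's follow-up comment points out, this binding alone is not sufficient.

```python
import json
from typing import Any, Dict, List


def pod_launcher_rolebinding(
    namespace: str, role_name: str, binding_name: str, service_accounts: List[str]
) -> Dict[str, Any]:
    """Build a RoleBinding manifest granting an existing Role to the given service accounts."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": binding_name, "namespace": namespace},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "Role",
            "name": role_name,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": sa, "namespace": namespace}
            for sa in service_accounts
        ],
    }


binding = pod_launcher_rolebinding(
    namespace="production-airflow",
    role_name="pod-launcher",  # placeholder: use the Role name your release actually created
    binding_name="airflow-webserver-pod-launcher",  # hypothetical binding name
    service_accounts=["airflow-webserver"],
)
print(json.dumps(binding, indent=2))
```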

airflow/airflow/www/views.py, lines 1432 to 1440 at commit 5bd6ea7:

executor.job_id = "manual"
executor.start()
executor.queue_task_instance(
    ti,
    ignore_all_deps=ignore_all_deps,
    ignore_task_deps=ignore_task_deps,
    ignore_ti_state=ignore_ti_state,
)
executor.heartbeat()
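The excerpt shows that the manual Run action starts an executor and calls `heartbeat()` inside the webserver process itself, so with the KubernetesExecutor the pod-creation call is made with the webserver pod's credentials. The toy model below illustrates that chain under stated assumptions: `FakeKubeApi`, `FakeKubernetesExecutor`, `Forbidden` and the `allowed` set are hypothetical stand-ins, not Airflow or Kubernetes client classes, and the real executor does far more.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


class Forbidden(Exception):
    """Stand-in for the 403 ApiException raised by the Kubernetes API."""


@dataclass
class FakeKubeApi:
    # (service_account, verb, resource) triples that RBAC permits; hypothetical.
    allowed: Set[Tuple[str, str, str]]
    created: List[str] = field(default_factory=list)

    def create_namespaced_pod(self, caller: str, pod_name: str) -> None:
        if (caller, "create", "pods") not in self.allowed:
            raise Forbidden(f'User "{caller}" cannot create resource "pods"')
        self.created.append(pod_name)


@dataclass
class FakeKubernetesExecutor:
    api: FakeKubeApi
    # Identity of the process hosting the executor, i.e. its pod's service account.
    process_identity: str
    job_id: str = ""
    queued: List[str] = field(default_factory=list)
    requeued: List[str] = field(default_factory=list)

    def start(self) -> None:
        """No-op here; the real executor sets up its queues and pod watcher."""

    def queue_task_instance(self, task_key: str) -> None:
        self.queued.append(task_key)

    def heartbeat(self) -> None:
        # Launch a pod per queued task with the hosting process's credentials;
        # on a 403 the task is put back, like the "re-queueing" WARNING in the log.
        pending, self.queued = self.queued, []
        for key in pending:
            try:
                self.api.create_namespaced_pod(self.process_identity, f"{key}-pod")
            except Forbidden:
                self.requeued.append(key)


# Hypothetical RBAC setup: the scheduler may create pods, the webserver may not.
api = FakeKubeApi(allowed={("airflow-scheduler", "create", "pods")})

# Same call sequence as the views.py excerpt, but hosted in the webserver.
executor = FakeKubernetesExecutor(api, process_identity="airflow-webserver")
executor.job_id = "manual"
executor.start()
executor.queue_task_instance("my_dag.my_task")
executor.heartbeat()
print("created:", api.created, "requeued:", executor.requeued)
```

The design point the model isolates is that RBAC is evaluated against whichever process runs `heartbeat()`, not against the component that normally schedules tasks.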

Subjects of the pod-launcher RoleBinding in the Helm chart:

subjects:
{{- if has .Values.executor $schedulerLaunchExecutors }}
  - kind: ServiceAccount
    name: {{ .Release.Name }}-scheduler
    namespace: {{ .Release.Namespace }}
{{- end }}
{{- if has .Values.executor $workerLaunchExecutors }}
  - kind: ServiceAccount
    name: {{ .Release.Name }}-worker
    namespace: {{ .Release.Namespace }}
{{- end }}
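The conditional logic of this template excerpt can be mirrored in plain Python to make the gap concrete. This is a sketch: `rolebinding_subjects` is a hypothetical helper, and the real contents of `$schedulerLaunchExecutors` and `$workerLaunchExecutors` are defined elsewhere in the chart, so the lists below are illustrative only. Within this excerpt, no branch ever adds the `{{ .Release.Name }}-webserver` account.

```python
from typing import Collection, List


def rolebinding_subjects(
    release_name: str,
    executor: str,
    scheduler_launch_executors: Collection[str],
    worker_launch_executors: Collection[str],
) -> List[str]:
    """Return the ServiceAccount names the template excerpt binds for a given executor."""
    subjects: List[str] = []
    # {{- if has .Values.executor $schedulerLaunchExecutors }}
    if executor in scheduler_launch_executors:
        subjects.append(f"{release_name}-scheduler")
    # {{- if has .Values.executor $workerLaunchExecutors }}
    if executor in worker_launch_executors:
        subjects.append(f"{release_name}-worker")
    return subjects


# Hypothetical list contents; the chart defines the real ones elsewhere.
scheduler_launch = ["KubernetesExecutor"]
worker_launch = ["KubernetesExecutor"]

for name in ["LocalExecutor", "CeleryExecutor", "KubernetesExecutor"]:
    subjects = rolebinding_subjects("airflow", name, scheduler_launch, worker_launch)
    # Whatever the executor, the webserver's ServiceAccount is never a subject here.
    print(name, subjects, "airflow-webserver" in subjects)
```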

How to reproduce it:

Deploy Airflow on Kubernetes using the apache/airflow Helm chart, then manually trigger a task run via task -> Run (Ignore state).

Anything else we need to know:

@yogyang yogyang added the kind:bug This is a clearly a bug label May 20, 2021
@yogyang
Contributor Author

yogyang commented May 20, 2021

Adding a RoleBinding for serviceaccount:airflow-webserver leads to another problem: the pod that airflow-webserver creates does not use the pod template defined in pod_template_file, because the Helm chart does not mount the pod_template_file into the webserver pod by default.

The webserver's extraVolumeMounts value has to be set to mount the pod_template_file.
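A minimal sketch of such a values override, built as a Python dict and printed as JSON (Helm parses values files as YAML, and JSON is valid YAML). It assumes the chart exposes `webserver.extraVolumes` alongside `webserver.extraVolumeMounts`; the ConfigMap name, key, mount path and volume name are placeholders, not the chart's actual names, and the `[kubernetes] pod_template_file` setting seen by the webserver would need to point at the same path.

```python
import json
from typing import Any, Dict


def webserver_pod_template_values(
    configmap_name: str, configmap_key: str, mount_path: str
) -> Dict[str, Any]:
    """Helm values override mounting one ConfigMap key as a file in the webserver pod."""
    volume_name = "pod-template-file"  # hypothetical volume name
    return {
        "webserver": {
            "extraVolumes": [
                {"name": volume_name, "configMap": {"name": configmap_name}},
            ],
            "extraVolumeMounts": [
                {
                    "name": volume_name,
                    # Full file path, because subPath selects a single key of the ConfigMap.
                    "mountPath": mount_path,
                    "subPath": configmap_key,
                    "readOnly": True,
                },
            ],
        },
    }


values = webserver_pod_template_values(
    configmap_name="airflow-pod-template",  # placeholder
    configmap_key="pod_template_file.yaml",  # placeholder
    mount_path="/opt/airflow/pod_templates/pod_template_file.yaml",  # placeholder
)
print(json.dumps(values, indent=2))
```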

@yogyang
Contributor Author

yogyang commented Nov 19, 2021

After upgrading to 2.1.2, this problem no longer seems to occur. Closing.

@yogyang yogyang closed this as completed Nov 19, 2021