CLI 'kubernetes cleanup-pods' fails on invalid label key #16013

Closed
andormarkus opened this issue May 23, 2021 · 16 comments · Fixed by #17298
Labels: affected_version:2.0, kind:bug, provider:cncf-kubernetes

@andormarkus
Contributor

Apache Airflow version: 2.0.2
Helm chart version: 1.0.0
Kubernetes version: 1.20

What happened:
The Airflow airflow-cleanup cronjob is failing with the error below. When I run the same command from the webserver or scheduler pod, I get the same error.

> airflow@airflow-webserver-7f9f7954c-p9vv9:/opt/airflow$ airflow kubernetes  cleanup-pods --namespace airflow

Loading Kubernetes configuration
Listing pods in namespace airflow
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 89, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/kubernetes_command.py", line 111, in cleanup_pods
    pod_list = kube_client.list_namespaced_pod(**list_kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 12803, in list_namespaced_pod
    (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 12905, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 345, in call_api
    _preload_content, _request_timeout)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': '53ee7655-f595-42a5-bdfb-689067a7fe02', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 'e14ece85-9601-4034-9a43-7872ebabcbc5', 'X-Kubernetes-Pf-Prioritylevel-Uid': '72601873-fd48-4405-99dc-b7c4cac03b5c', 'Date': 'Sun, 23 May 2021 16:07:37 GMT', 'Content-Length': '428'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to parse requirement: invalid label key \"{'matchExpressions':\": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')","reason":"BadRequest","code":400}

How to reproduce it:
Create an Airflow deployment with the Helm chart
Enable automatic cleanup

cleanup:
  enabled: true

Run command airflow kubernetes cleanup-pods --namespace airflow

@andormarkus andormarkus added the kind:bug This is a clearly a bug label May 23, 2021
@mik-laj
Member

mik-laj commented May 23, 2021

CC: @XD-DENG

@XD-DENG
Member

XD-DENG commented May 23, 2021

This is interesting, because it seems like the error arises because of selecting pods using an invalid label key. However, according to the Airflow 2.0.2 related CLI code, all keys we used align with the legal pattern ([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9] (and they are all the label keys created in PodGenerator.construct_pod)
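To make that check concrete, here is a minimal sketch that tests a few label keys against the regex quoted in the API server's error message. The helper name `is_valid_label_name` is hypothetical, and the sketch checks only the regex, not any other rule the API server may apply.

```python
import re

# Pattern copied from the API server's error message (the "name part" of a label key).
NAME_PART = re.compile(r"([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]")


def is_valid_label_name(name: str) -> bool:
    """Return True if `name` fully matches the quoted validation regex."""
    return NAME_PART.fullmatch(name) is not None


# Label keys mentioned in this thread for Airflow task pods.
airflow_keys = ["airflow-worker", "dag_id", "airflow_version", "kubernetes_executor"]
results = {key: is_valid_label_name(key) for key in airflow_keys}

# The "key" the API server actually rejected in the traceback.
rejected = "{'matchExpressions':"
results[rejected] = is_valid_label_name(rejected)

for key, ok in results.items():
    print(f"{key!r}: {'valid' if ok else 'invalid'}")
```

The Airflow keys pass, while the rejected string does not, which suggests the problem lies in how the selector is sent rather than in the labels themselves.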

I can take a further look later. If @andormarkus has any other potentially related information, please share here as well.

Thanks.

@andormarkus
Contributor Author

Hi @XD-DENG, I'm more than happy to provide related information; however, I need guidance on how to provide it.

@XD-DENG
Member

XD-DENG commented May 23, 2021

Thanks @andormarkus

Something I have in mind may be a sample pod yaml in your environment, i.e. the output of kubectl get pod your_pod_name -o yaml, something like that (status part in the yaml is not necessary; if you can provide it, please do remember to mask/remove the potentially sensitive information inside)

(I've been a bit busy recently. If I don't get back here in the next few days, please feel free to ping me here as a reminder. Thanks😄)

@andormarkus
Contributor Author

I have attached the YAML of the cleanup pod. The result is the same whether the command runs in a scheduler, webserver, or cleanup pod.

▶ kubectl -n airflow get pods airflow-cleanup-1621836900-qjqbt -o yaml

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
    sidecar.istio.io/inject: "false"
  creationTimestamp: "2021-05-24T06:15:08Z"
  generateName: airflow-cleanup-1621836900-
  labels:
    controller-uid: XXXXXXXXXXXXXXXX
    job-name: airflow-cleanup-1621836900
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:sidecar.istio.io/inject: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:controller-uid: {}
          f:job-name: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"XXXXXXXXXXXXXXXX"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:affinity: {}
        f:containers:
          k:{"name":"airflow-cleanup-pods"}:
            .: {}
            f:args: {}
            f:env:
              .: {}
              k:{"name":"AIRFLOW__CORE__FERNET_KEY"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"AIRFLOW__CORE__SQL_ALCHEMY_CONN"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"AIRFLOW_CONN_AIRFLOW_DB"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/opt/airflow/airflow.cfg"}:
                .: {}
                f:mountPath: {}
                f:name: {}
                f:readOnly: {}
                f:subPath: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"config"}:
            .: {}
            f:configMap:
              .: {}
              f:defaultMode: {}
              f:name: {}
            f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-05-24T06:15:08Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.10.13.37"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-05-24T06:15:21Z"
  name: airflow-cleanup-1621836900-qjqbt
  namespace: airflow
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: airflow-cleanup-1621836900
    uid: XXXXXXXXXXXXXXXX
  resourceVersion: "132358"
  uid: XXXXXXXXXXXXXXXX
spec:
  affinity: {}
  containers:
  - args:
    - kubernetes
    - cleanup-pods
    - --namespace=airflow
    env:
    - name: AIRFLOW__CORE__FERNET_KEY
      valueFrom:
        secretKeyRef:
          key: fernet-key
          name: airflow-fernet-key
    - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
      valueFrom:
        secretKeyRef:
          key: connection
          name: airflow-postgres-password
    - name: AIRFLOW_CONN_AIRFLOW_DB
      valueFrom:
        secretKeyRef:
          key: connection
          name: airflow-postgres-password
    image: apache/airflow:2.0.2
    imagePullPolicy: IfNotPresent
    name: airflow-cleanup-pods
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /opt/airflow/airflow.cfg
      name: config
      readOnly: true
      subPath: airflow.cfg
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: airflow-cleanup-token-r6vnk
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-10-13-20.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: airflow-cleanup
  serviceAccountName: airflow-cleanup
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: airflow-airflow-config
    name: config
  - name: airflow-cleanup-token-r6vnk
    secret:
      defaultMode: 420
      secretName: airflow-cleanup-token-r6vnk

@XD-DENG
Member

XD-DENG commented May 24, 2021

Hi @andormarkus, my bad, I should have made my request clearer: I meant a sample Pod running an Airflow task execution job (such pods should have labels like airflow-worker, dag_id, airflow_version, kubernetes_executor, etc.), so that we can better examine what's happening. Thanks.

@andormarkus
Contributor Author

Hi @XD-DENG, this might be related to version 2.0.2 and up. I have downgraded to 2.0.1 and it runs without problems. In #16020 I hit another issue with 2.0.2 and up, while 2.0.1 works without problems.

▶ kubectl -n airflow logs pod/airflow-cleanup-1621861200-m49j7

BACKEND=postgresql
DB_HOST=XXXXXXXXXXXXXXXXX
DB_PORT=5432

Loading Kubernetes configuration
Listing pods in namespace airflow
Inspecting pod airflow-cleanup-1621861200-m49j7
No action taken on pod airflow-cleanup-1621861200-m49j7
Inspecting pod airflow-s3-sync-1621861200-m9hqd
Deleting pod "airflow-s3-sync-1621861200-m9hqd" phase "succeeded" and reason "", restart policy "never"
Deleting POD "airflow-s3-sync-1621861200-m9hqd" from "airflow" namespace
{'api_version': 'v1',
 'code': None,
 'details': None,
 'kind': 'Pod',
 'message': None,
 'metadata': {'_continue': None,
              'remaining_item_count': None,
              'resource_version': '233459',
              'self_link': None},
 'reason': None,
 'status': "{'phase': 'Succeeded', 'conditions': [{'type': 'Initialized', "
           "'status': 'True', 'lastProbeTime': None, 'lastTransitionTime': "
           "'2021-05-24T13:00:02Z', 'reason': 'PodCompleted'}, {'type': "
           "'Ready', 'status': 'False', 'lastProbeTime': None, "
           "'lastTransitionTime': '2021-05-24T13:00:10Z', 'reason': "
           "'PodCompleted'}, {'type': 'ContainersReady', 'status': 'False', "
           "'lastProbeTime': None, 'lastTransitionTime': "
           "'2021-05-24T13:00:10Z', 'reason': 'PodCompleted'}, {'type': "
           "'PodScheduled', 'status': 'True', 'lastProbeTime': None, "
           "'lastTransitionTime': '2021-05-24T13:00:02Z'}], 'hostIP': "
           "'10.10.13.20', 'podIP': '10.10.13.126', 'podIPs': [{'ip': "
           "'10.10.13.126'}], 'startTime': '2021-05-24T13:00:02Z', "
           "'containerStatuses': [{'name': 'aws-cli', 'state': {'terminated': "
           "{'exitCode': 0, 'reason': 'Completed', 'startedAt': "
           "'2021-05-24T13:00:07Z', 'finishedAt': '2021-05-24T13:00:09Z', "
           "'containerID': "
           "'docker://e6a7d6adbac716cbd01ebd08d6a434121255280a3c3ef77706235ada7beb41e8'}}, "
           "'lastState': {}, 'ready': False, 'restartCount': 0, 'image': "
           "'amazon/aws-cli:2.2.5', 'imageID': "
           "'docker-pullable://amazon/aws-cli@sha256:0f5d7cd127969a6af0b2af62d7c111678dea1d9f10ef19e43a090c590bba6c77', "
           "'containerID': "
           "'docker://e6a7d6adbac716cbd01ebd08d6a434121255280a3c3ef77706235ada7beb41e8', "
           "'started': False}], 'qosClass': 'BestEffort'}"}
Inspecting pod airflow-scheduler-6757dffbf7-2pq6s
No action taken on pod airflow-scheduler-6757dffbf7-2pq6s
Inspecting pod airflow-scheduler-6757dffbf7-cwdts
No action taken on pod airflow-scheduler-6757dffbf7-cwdts
Inspecting pod airflow-scheduler-6757dffbf7-zdgjh
No action taken on pod airflow-scheduler-6757dffbf7-zdgjh
Inspecting pod airflow-statsd-84f4f9898-2hlrv
No action taken on pod airflow-statsd-84f4f9898-2hlrv
Inspecting pod airflow-webserver-7cd9f557d-7f4lf
No action taken on pod airflow-webserver-7cd9f557d-7f4lf
Inspecting pod airflow-webserver-7cd9f557d-qhdcp
No action taken on pod airflow-webserver-7cd9f557d-qhdcp
Inspecting pod airflow-webserver-7cd9f557d-r45th
No action taken on pod airflow-webserver-7cd9f557d-r45th
Inspecting pod etlkafkarawbronzeemarsyslisttables.9c9062c56081483eacf03ee92f07aaef
No action taken on pod etlkafkarawbronzeemarsyslisttables.9c9062c56081483eacf03ee92f07aaef

@eladkal eladkal added provider:cncf-kubernetes Kubernetes provider related issues affected_version:2.0 Issues Reported for 2.0 labels Jun 3, 2021
@jimmybchopps

jimmybchopps commented Jul 25, 2021

@XD-DENG Any update on this? We're seeing the exact same issue on version 2.0.2. Our Kubernetes version is 1.20.2 and we have airflow in its own namespace. Here is what the label section looks like from the webserver where the command was manually run from to reproduce the issue:

labels:
  component: webserver
  pod-template-hash: {Redacted but Valid}
  release: {Redacted but Valid}
  tier: airflow

I didn't see any other labels besides those, but maybe I missed them. Is there anything else to look at to further diagnose the issue? Can you share the exact API call it's trying to make?

@silvadenisaraujo

Hey, reporting the same issue with airflow 2.1.1:

PS C:\Code\airflow> kubectl exec pod/airflow-scheduler-57bb5db948-bbl8p -n airflow -- airflow kubernetes cleanup-pods
Defaulted container "scheduler" out of: scheduler, git-sync, scheduler-gc, wait-for-airflow-migrations (init)
Loading Kubernetes configuration
Listing pods in namespace airflow
Traceback (most recent call last):
  File "/home/airflow/.local/bin/airflow", line 8, in <module>
    sys.exit(main())
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/__main__.py", line 40, in main
    args.func(args)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/cli_parser.py", line 48, in command
    return func(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/utils/cli.py", line 91, in wrapper
    return f(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/airflow/cli/commands/kubernetes_command.py", line 111, in cleanup_pods
    pod_list = kube_client.list_namespaced_pod(**list_kwargs)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 12803, in list_namespaced_pod
    (data) = self.list_namespaced_pod_with_http_info(namespace, **kwargs)  # noqa: E501
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 12905, in list_namespaced_pod_with_http_info
    collection_formats=collection_formats)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 345, in call_api
    _preload_content, _request_timeout)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 176, in __call_api
    _request_timeout=_request_timeout)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 366, in request
    headers=headers)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 241, in GET
    query_params=query_params)
  File "/home/airflow/.local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 231, in request
    raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'c189a3fa-80f4-4ea2-86dd-6d09ccdf9bad', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 26 Jul 2021 15:45:24 GMT', 'Content-Length': '428'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"unable to parse requirement: invalid label key \"{'matchExpressions':\": name part must consist of alphanumeric characters, '-', '_' or '.', and must start and end with an alphanumeric character (e.g. 'MyName',  or 'my.name',  or '123-abc', regex used for validation is '([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9]')","reason":"BadRequest","code":400}
PS C:\Code\airflow> kubectl exec pod/airflow-scheduler-57bb5db948-bbl8p -n mti-algo-prd -- airflow version
Defaulted container "scheduler" out of: scheduler, git-sync, scheduler-gc, wait-for-airflow-migrations (init)
2.1.1

@jean-malo

jean-malo commented Jul 28, 2021

Same issue here running apache-airflow==2.1.2 and apache-airflow-providers-cncf-kubernetes==2.0.0

It is not an issue with the labels (they are all valid) but rather an issue with the usage of match_expressions in this line. I haven't investigated the kubernetes python client but it seems like it's sending label selectors that are not formatted correctly to the kubernetes API.

Using a standard string-based label selector seems to work fine; see this commit as an example.

I'd be happy to open a PR but I can't seem to fix the problem if still using client.V1LabelSelector rather than a plain string.
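As a rough illustration of the string-based approach, the sketch below builds a plain selector string from label keys. In Kubernetes label selector syntax a bare key means "this label exists", and commas combine requirements with AND. The helper `build_label_selector` and the chosen keys are hypothetical, and this is not necessarily how the linked commit or PR implements it.

```python
from typing import Iterable


def build_label_selector(keys: Iterable[str]) -> str:
    """Join label keys into a selector string: bare key = "exists", comma = AND."""
    return ",".join(keys)


# Hypothetical keys, chosen only for illustration.
selector = build_label_selector(["dag_id", "task_id"])
print(selector)

# With the Kubernetes Python client, such a string can be passed as `label_selector`
# (commented out so the sketch runs without a cluster):
#   from kubernetes import client, config
#   config.load_incluster_config()
#   pods = client.CoreV1Api().list_namespaced_pod("airflow", label_selector=selector)
```

Passing a string sidesteps any question of how the client serializes a selector object into the query parameter.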

@dlampa
Contributor

dlampa commented Jul 28, 2021

Hey @jean-malo I've reached the same conclusion re. V1LabelSelector and had the solution ready for a few days - please take a look at #17298 . I've created a simple script using the same approach as in the CLI outside the Airflow environment and had exactly the same error - this definitely looks like a kubernetes-client issue.
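One way to see how such an error could arise, assuming (not verified against the client source) that the selector object ends up stringified in its dict form when placed in the query string: the first whitespace-delimited token of that string is exactly the "label key" the API server rejected. The dict below is an illustrative stand-in, not the actual object the CLI builds.

```python
# Dict form of a label selector with one "Exists" expression (key name is illustrative).
selector_as_dict = {"matchExpressions": [{"key": "dag_id", "operator": "Exists"}]}

# What a naive str() conversion would put into the `labelSelector` query parameter.
query_value = str(selector_as_dict)

# The first whitespace-delimited token of that string.
first_token = query_value.split()[0]

print(query_value)
print(first_token)
```

The token matches the key quoted in the 400 response, which is consistent with the selector having been sent as a stringified object rather than in selector syntax.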

@XD-DENG
Member

XD-DENG commented Jul 29, 2021

Thanks all for your feedback, and sorry for the slow response (due to my day job schedule).

@dlampa, many thanks for helping look into the issue. I will find time to review the PR you prepared within this week, and share my comments there.

Thanks again!

@XD-DENG
Member

XD-DENG commented Jul 29, 2021

Hi @jean-malo, shall we continue the discussion at @dlampa's PR #17298?

I will add my comments there later.

Thanks!

@XD-DENG XD-DENG added this to the Airflow 2.1.3 milestone Jul 29, 2021
dlampa added a commit to dlampa/airflow that referenced this issue Jul 29, 2021
XD-DENG pushed a commit that referenced this issue Jul 29, 2021
…17298)

Fix for #16013 - CLI 'kubernetes cleanup-pods' fails on invalid label key
@XD-DENG
Member

XD-DENG commented Jul 29, 2021

Hi all, thanks to @dlampa, we have a fix for this issue now (#17298).

The fix is now merged into the main branch. I have marked it for milestone 2.1.3, so you can expect it in the next release, 2.1.3.

jhtimmins pushed a commit that referenced this issue Aug 5, 2021
…17298)

Fix for #16013 - CLI 'kubernetes cleanup-pods' fails on invalid label key

(cherry picked from commit 36bdfe8)
@timothyclarke

@XD-DENG When is 2.1.3 going to be released?

jhtimmins pushed a commit that referenced this issue Aug 13, 2021
…17298)

Fix for #16013 - CLI 'kubernetes cleanup-pods' fails on invalid label key

(cherry picked from commit 36bdfe8)
@potiuk
Member

potiuk commented Aug 14, 2021

~ a week from now if all goes well.

kaxil pushed a commit that referenced this issue Aug 17, 2021
…17298)

Fix for #16013 - CLI 'kubernetes cleanup-pods' fails on invalid label key

(cherry picked from commit 36bdfe8)
jhtimmins pushed a commit that referenced this issue Aug 17, 2021
…17298)

Fix for #16013 - CLI 'kubernetes cleanup-pods' fails on invalid label key

(cherry picked from commit 36bdfe8)