CLI 'kubernetes cleanup-pods' fails on invalid label key #16013
Comments
CC: @XD-DENG |
This is interesting, because the error seems to arise from selecting pods using an invalid label key. However, according to the related CLI code in Airflow 2.0.2, all the keys we use align with the legal pattern. I can take a further look later. If @andormarkus has any other potentially related information, please share it here as well. Thanks. |
Hi @XD-DENG, I'm more than happy to provide related information, but I need guidance on how to provide it. |
Thanks @andormarkus. What I have in mind is a sample pod YAML from your environment, i.e. the output of (I've been a bit busy recently. If I don't get back to this in the next few days, please feel free to ping me here as a reminder. Thanks 😄) |
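For reference, the "legal pattern" for a label key can be checked locally. The following is a minimal sketch, assuming the Kubernetes rule that a key is an optional DNS-subdomain prefix (at most 253 characters) plus `/`, followed by a name of at most 63 characters that starts and ends with an alphanumeric character. `is_valid_label_key` is a hypothetical helper, not part of Airflow or the Kubernetes client.

```python
import re

# Name segment: alphanumeric at both ends, with '-', '_' and '.' allowed inside.
_NAME_RE = re.compile(r"[A-Za-z0-9]([-A-Za-z0-9_.]*[A-Za-z0-9])?")
# Optional prefix: a DNS subdomain (lowercase labels separated by dots).
_PREFIX_RE = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def is_valid_label_key(key: str) -> bool:
    """Return True if `key` looks like a syntactically valid label key."""
    prefix, sep, name = key.rpartition("/")
    if sep:
        # A slash was present, so the prefix must be a non-empty DNS subdomain.
        if not prefix or len(prefix) > 253 or not _PREFIX_RE.fullmatch(prefix):
            return False
    return len(name) <= 63 and _NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["job-name", "kubernetes.io/psp", "-bad", "a/b/c"]:
        print(candidate, is_valid_label_key(candidate))
```

A key that passes this check can still be rejected by the API server if the selector string around it is malformed, which is worth keeping in mind when the labels themselves look fine.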
I have attached the output of the cleanup pod. It does not matter if the script is running in a scheduler/webserver/cleanup pod
▶ kubectl -n airflow get pods airflow-cleanup-1621836900-qjqbt -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
    sidecar.istio.io/inject: "false"
  creationTimestamp: "2021-05-24T06:15:08Z"
  generateName: airflow-cleanup-1621836900-
  labels:
    controller-uid: XXXXXXXXXXXXXXXX
    job-name: airflow-cleanup-1621836900
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:sidecar.istio.io/inject: {}
        f:generateName: {}
        f:labels:
          .: {}
          f:controller-uid: {}
          f:job-name: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"XXXXXXXXXXXXXXXX"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:affinity: {}
        f:containers:
          k:{"name":"airflow-cleanup-pods"}:
            .: {}
            f:args: {}
            f:env:
              .: {}
              k:{"name":"AIRFLOW__CORE__FERNET_KEY"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"AIRFLOW__CORE__SQL_ALCHEMY_CONN"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
              k:{"name":"AIRFLOW_CONN_AIRFLOW_DB"}:
                .: {}
                f:name: {}
                f:valueFrom:
                  .: {}
                  f:secretKeyRef:
                    .: {}
                    f:key: {}
                    f:name: {}
            f:image: {}
            f:imagePullPolicy: {}
            f:name: {}
            f:resources: {}
            f:terminationMessagePath: {}
            f:terminationMessagePolicy: {}
            f:volumeMounts:
              .: {}
              k:{"mountPath":"/opt/airflow/airflow.cfg"}:
                .: {}
                f:mountPath: {}
                f:name: {}
                f:readOnly: {}
                f:subPath: {}
        f:dnsPolicy: {}
        f:enableServiceLinks: {}
        f:restartPolicy: {}
        f:schedulerName: {}
        f:securityContext: {}
        f:serviceAccount: {}
        f:serviceAccountName: {}
        f:terminationGracePeriodSeconds: {}
        f:volumes:
          .: {}
          k:{"name":"config"}:
            .: {}
            f:configMap:
              .: {}
              f:defaultMode: {}
              f:name: {}
            f:name: {}
    manager: kube-controller-manager
    operation: Update
    time: "2021-05-24T06:15:08Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:status:
        f:conditions:
          k:{"type":"ContainersReady"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
          k:{"type":"Initialized"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:status: {}
            f:type: {}
          k:{"type":"Ready"}:
            .: {}
            f:lastProbeTime: {}
            f:lastTransitionTime: {}
            f:message: {}
            f:reason: {}
            f:status: {}
            f:type: {}
        f:containerStatuses: {}
        f:hostIP: {}
        f:phase: {}
        f:podIP: {}
        f:podIPs:
          .: {}
          k:{"ip":"10.10.13.37"}:
            .: {}
            f:ip: {}
        f:startTime: {}
    manager: kubelet
    operation: Update
    time: "2021-05-24T06:15:21Z"
  name: airflow-cleanup-1621836900-qjqbt
  namespace: airflow
  ownerReferences:
  - apiVersion: batch/v1
    blockOwnerDeletion: true
    controller: true
    kind: Job
    name: airflow-cleanup-1621836900
    uid: XXXXXXXXXXXXXXXX
  resourceVersion: "132358"
  uid: XXXXXXXXXXXXXXXX
spec:
  affinity: {}
  containers:
  - args:
    - kubernetes
    - cleanup-pods
    - --namespace=airflow
    env:
    - name: AIRFLOW__CORE__FERNET_KEY
      valueFrom:
        secretKeyRef:
          key: fernet-key
          name: airflow-fernet-key
    - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
      valueFrom:
        secretKeyRef:
          key: connection
          name: airflow-postgres-password
    - name: AIRFLOW_CONN_AIRFLOW_DB
      valueFrom:
        secretKeyRef:
          key: connection
          name: airflow-postgres-password
    image: apache/airflow:2.0.2
    imagePullPolicy: IfNotPresent
    name: airflow-cleanup-pods
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /opt/airflow/airflow.cfg
      name: config
      readOnly: true
      subPath: airflow.cfg
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: airflow-cleanup-token-r6vnk
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: ip-10-10-13-20.eu-central-1.compute.internal
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: airflow-cleanup
  serviceAccountName: airflow-cleanup
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - configMap:
      defaultMode: 420
      name: airflow-airflow-config
    name: config
  - name: airflow-cleanup-token-r6vnk
    secret:
      defaultMode: 420
      secretName: airflow-cleanup-token-r6vnk
|
Hi @andormarkus, my bad, I should have made my request clearer: I meant a sample Pod running the Airflow task execution job (e.g. they should have labels like |
Hi @XD-DENG, this might be related to version
▶ kubectl -n airflow logs pod/airflow-cleanup-1621861200-m49j7
BACKEND=postgresql
DB_HOST=XXXXXXXXXXXXXXXXX
DB_PORT=5432
Loading Kubernetes configuration
Listing pods in namespace airflow
Inspecting pod airflow-cleanup-1621861200-m49j7
No action taken on pod airflow-cleanup-1621861200-m49j7
Inspecting pod airflow-s3-sync-1621861200-m9hqd
Deleting pod "airflow-s3-sync-1621861200-m9hqd" phase "succeeded" and reason "", restart policy "never"
Deleting POD "airflow-s3-sync-1621861200-m9hqd" from "airflow" namespace
{'api_version': 'v1',
 'code': None,
 'details': None,
 'kind': 'Pod',
 'message': None,
 'metadata': {'_continue': None,
              'remaining_item_count': None,
              'resource_version': '233459',
              'self_link': None},
 'reason': None,
 'status': "{'phase': 'Succeeded', 'conditions': [{'type': 'Initialized', "
           "'status': 'True', 'lastProbeTime': None, 'lastTransitionTime': "
           "'2021-05-24T13:00:02Z', 'reason': 'PodCompleted'}, {'type': "
           "'Ready', 'status': 'False', 'lastProbeTime': None, "
           "'lastTransitionTime': '2021-05-24T13:00:10Z', 'reason': "
           "'PodCompleted'}, {'type': 'ContainersReady', 'status': 'False', "
           "'lastProbeTime': None, 'lastTransitionTime': "
           "'2021-05-24T13:00:10Z', 'reason': 'PodCompleted'}, {'type': "
           "'PodScheduled', 'status': 'True', 'lastProbeTime': None, "
           "'lastTransitionTime': '2021-05-24T13:00:02Z'}], 'hostIP': "
           "'10.10.13.20', 'podIP': '10.10.13.126', 'podIPs': [{'ip': "
           "'10.10.13.126'}], 'startTime': '2021-05-24T13:00:02Z', "
           "'containerStatuses': [{'name': 'aws-cli', 'state': {'terminated': "
           "{'exitCode': 0, 'reason': 'Completed', 'startedAt': "
           "'2021-05-24T13:00:07Z', 'finishedAt': '2021-05-24T13:00:09Z', "
           "'containerID': "
           "'docker://e6a7d6adbac716cbd01ebd08d6a434121255280a3c3ef77706235ada7beb41e8'}}, "
           "'lastState': {}, 'ready': False, 'restartCount': 0, 'image': "
           "'amazon/aws-cli:2.2.5', 'imageID': "
           "'docker-pullable://amazon/aws-cli@sha256:0f5d7cd127969a6af0b2af62d7c111678dea1d9f10ef19e43a090c590bba6c77', "
           "'containerID': "
           "'docker://e6a7d6adbac716cbd01ebd08d6a434121255280a3c3ef77706235ada7beb41e8', "
           "'started': False}], 'qosClass': 'BestEffort'}"}
Inspecting pod airflow-scheduler-6757dffbf7-2pq6s
No action taken on pod airflow-scheduler-6757dffbf7-2pq6s
Inspecting pod airflow-scheduler-6757dffbf7-cwdts
No action taken on pod airflow-scheduler-6757dffbf7-cwdts
Inspecting pod airflow-scheduler-6757dffbf7-zdgjh
No action taken on pod airflow-scheduler-6757dffbf7-zdgjh
Inspecting pod airflow-statsd-84f4f9898-2hlrv
No action taken on pod airflow-statsd-84f4f9898-2hlrv
Inspecting pod airflow-webserver-7cd9f557d-7f4lf
No action taken on pod airflow-webserver-7cd9f557d-7f4lf
Inspecting pod airflow-webserver-7cd9f557d-qhdcp
No action taken on pod airflow-webserver-7cd9f557d-qhdcp
Inspecting pod airflow-webserver-7cd9f557d-r45th
No action taken on pod airflow-webserver-7cd9f557d-r45th
Inspecting pod etlkafkarawbronzeemarsyslisttables.9c9062c56081483eacf03ee92f07aaef
No action taken on pod etlkafkarawbronzeemarsyslisttables.9c9062c56081483eacf03ee92f07aaef |
@XD-DENG Any update on this? We're seeing the exact same issue on version 2.0.2. Our Kubernetes version is 1.20.2 and we have Airflow in its own namespace. Here is what the label section looks like on the webserver pod where the command was run manually to reproduce the issue: labels: I didn't see any other labels outside of those, but maybe I missed them. Is there anything else to look at to further diagnose the issue? Can you share the exact API call it's trying to make? |
Hey, reporting the same issue with airflow 2.1.1:
|
Same issue here running It is not an issue with the labels (they are all valid) but rather an issue with the usage of Using a standard string-based label selector seems to work fine; see this commit as an example. I'd be happy to open a PR, but I can't seem to fix the problem if still using |
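To illustrate what a plain string label selector looks like with the Kubernetes Python client: `CoreV1Api.list_namespaced_pod` accepts `label_selector` as a string, where a bare key means "this label exists" and comma-separated terms are ANDed. The following is a minimal sketch only. The label keys and the `build_exists_selector` and `list_task_pods` helpers are assumptions for illustration and are not taken from the commit or PR mentioned in this thread.

```python
from typing import Iterable

# Assumed label keys on Airflow task pods; adjust to what your pods actually carry.
ASSUMED_TASK_POD_LABELS = ["dag_id", "task_id", "try_number", "airflow_version"]


def build_exists_selector(keys: Iterable[str]) -> str:
    """Join label keys into an 'exists' selector string, e.g. 'dag_id,task_id'."""
    return ",".join(keys)


def list_task_pods(namespace: str):
    """List pods in `namespace` that carry all of the assumed labels."""
    # Imported lazily so build_exists_selector works without the client installed.
    from kubernetes import client, config

    # In-cluster credentials; use config.load_kube_config() outside the cluster.
    config.load_incluster_config()
    return client.CoreV1Api().list_namespaced_pod(
        namespace=namespace,
        label_selector=build_exists_selector(ASSUMED_TASK_POD_LABELS),
    )
```

The same selector string can be tried from the command line with something like `kubectl -n airflow get pods -l dag_id,task_id`, which makes it easy to confirm that the API server accepts it before changing any code.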
Hey @jean-malo I've reached the same conclusion re. |
Thanks all for your feedback, and sorry for responding slowly (due to my day job schedule). @dlampa, many thanks for helping look into the issue. I will find time to review the PR you prepared within this week and share my comments there. Thanks again! |
Hi @jean-malo, let's continue the discussion at @dlampa's PR #17298? I will add my comments there later. Thanks! |
@XD-DENG When is |
~ a week from now if all goes well. |
Apache Airflow version: 2.0.2
Helm chart version: 1.0.0
Kubernetes version: 1.20
What happened:
The Airflow airflow-cleanup cronjob is failing with the error below. When I run the same command from the webserver or scheduler pod, I get the same error.
How to reproduce it:
Create an Airflow deployment with the Helm chart
Enable automatic cleanup
Run command
airflow kubernetes cleanup-pods --namespace airflow