Kubernetes Invalid executor_config, pod_override filled with Encoding.VAR #26101
Comments
+1, seeing the same thing on 2.3.4 with KubernetesExecutor. Wanted to provide more info. We run 40 DAGs with several thousand total daily tasks, and every day 5-10 tasks get stuck in the Queued state. We see the following sequence in the logs:
Then, in a repeated loop, we see logs like this:
for each stuck task, followed shortly by:
This continues until the task is marked as failed in the airflow UI. The executor_config on the invalid tasks looks like this:
Which is odd, because for the thousands of unaffected tasks it looks like this:
@dstandish this seems to introduce non-deterministic behavior for any pod that uses KubernetesExecutor and specifies a pod override. Only pods that specify an override seem to be impacted, and the issue seems to be that deserialization of the executor config is broken in some way in the new system. It can be easily reproduced in the Airflow web UI by repeatedly reloading the task instance details pane for a task that uses KubernetesExecutor and specifies an executor config. I was even able to crash the web UI this way.
The bind processor logic had the effect of applying serialization logic multiple times, which produced an incorrect serialization output. Resolves apache#26101.
The bind processor logic had the effect of applying serialization logic multiple times, which produced an incorrect serialization output. Resolves apache#26101. (cherry picked from commit 87108d7)
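The failure mode named in the fix, serialization applied more than once, can be illustrated with a toy serializer. This is a minimal sketch under stated assumptions, not Airflow's implementation: the `Encoding` enum, `serialize`, and `deserialize` below are hypothetical stand-ins, and the `executor_config` is reduced to plain dicts for illustration.

```python
from enum import Enum


class Encoding(Enum):
    """Toy stand-in for the marker keys seen in the logs (not Airflow's class)."""

    TYPE = "__type"
    VAR = "__var"


def serialize(value):
    """Wrap every dict in a {VAR: ..., TYPE: "dict"} envelope, recursively."""
    if isinstance(value, dict):
        return {
            Encoding.VAR: {k: serialize(v) for k, v in value.items()},
            Encoding.TYPE: "dict",
        }
    if isinstance(value, list):
        return [serialize(v) for v in value]
    return value


def deserialize(value):
    """Undo exactly one application of serialize()."""
    if isinstance(value, dict) and Encoding.TYPE in value:
        return {k: deserialize(v) for k, v in value[Encoding.VAR].items()}
    if isinstance(value, list):
        return [deserialize(v) for v in value]
    return value


# Hypothetical executor_config, reduced to plain dicts for illustration.
executor_config = {"pod_override": {"spec": {"containers": [{"name": "base"}]}}}

once = serialize(executor_config)
twice = serialize(once)  # the serialization step applied a second time

# A single pass restores the original only when serialization ran once.
round_trip_ok = deserialize(once) == executor_config
round_trip_broken = deserialize(twice)
```

In this sketch, one deserialization pass over the doubly serialized value returns a structure still keyed by `Encoding.VAR` and `Encoding.TYPE` instead of the original config, which resembles the Encoding.VAR-filled pod_override reported in this issue.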
Apache Airflow version
2.3.4
What happened
Trying to start Kubernetes tasks using a pod_override results in pods not starting after upgrading from 2.3.2 to 2.3.4. The pod_override looks very odd, filled with many Encoding.VAR objects; see the following scheduler log:
Looking in the UI, the task gets stuck in the scheduled state forever. Clicking instance details shows the pod_override in a similar state, with many Encoding.VAR objects.
This looks related to a recent addition in 2.3.4, via #24356.
@dstandish do you know whether this is connected?
What you think should happen instead
No response
How to reproduce
No response
Operating System
Debian GNU/Linux 11 (bullseye)
Versions of Apache Airflow Providers
apache-airflow-providers-celery==3.0.0
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-common-sql==1.1.0
apache-airflow-providers-docker==3.1.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-postgres==5.2.0
apache-airflow-providers-sqlite==3.2.0
kubernetes==23.6.0
Deployment
Other Docker-based deployment
Deployment details
No response
Anything else
No response
Are you willing to submit PR?
Code of Conduct