Cannot convert a non-kubernetes.client.models.V1Pod object into a KubernetesExecutorConfig #26008

Closed · 1 of 2 tasks
Smirn08 opened this issue Aug 27, 2022 · 11 comments

Labels
affected_version:2.3 (Issues Reported for 2.3), kind:bug (This is clearly a bug), pending-response, provider:cncf-kubernetes (Kubernetes provider related issues)

Milestone
Airflow 2.4.0

Comments

@Smirn08
Smirn08 commented Aug 27, 2022

Apache Airflow version

2.3.4

What happened

I have tasks running on Kubernetes with the KubernetesExecutor for which I need to set a unique pod label. After upgrading to 2.3.4, I started getting the error "Cannot convert a non-kubernetes.client.models.V1Pod object into a KubernetesExecutorConfig" when a task executes, even though the pods themselves are created successfully.

What you think should happen instead

I think this is the effect of these changes, but I'm not sure.

*** Reading local file: /airflow/logs/dag_id=my_dag/run_id=manual__2022-08-27T11:23:42.943920+00:00/task_id=example/attempt=1.log
[2022-08-27, 14:24:20 MSK] {taskinstance.py:1171} INFO - Dependencies all met for <TaskInstance: my_dag.example manual__2022-08-27T11:23:42.943920+00:00 [queued]>
[2022-08-27, 14:24:20 MSK] {taskinstance.py:1171} INFO - Dependencies all met for <TaskInstance: my_dag.example manual__2022-08-27T11:23:42.943920+00:00 [queued]>
[2022-08-27, 14:24:20 MSK] {taskinstance.py:1368} INFO - 
--------------------------------------------------------------------------------
[2022-08-27, 14:24:20 MSK] {taskinstance.py:1369} INFO - Starting attempt 1 of 1
[2022-08-27, 14:24:20 MSK] {taskinstance.py:1370} INFO - 
--------------------------------------------------------------------------------
[2022-08-27, 14:24:20 MSK] {taskinstance.py:1389} INFO - Executing <Task(PythonOperator): example> on 2022-08-27 11:23:42.943920+00:00
[2022-08-27, 14:24:20 MSK] {standard_task_runner.py:52} INFO - Started process 55 to run task
[2022-08-27, 14:24:20 MSK] {standard_task_runner.py:79} INFO - Running: ['airflow', 'tasks', 'run', 'my_dag', 'example', 'manual__2022-08-27T11:23:42.943920+00:00', '--job-id', '11062', '--raw', '--subdir', 'DAGS_FOLDER/my_dag.py', '--cfg-path', '/tmp/tmpo2fb44af', '--error-file', '/tmp/tmp9y8fjhzy']
[2022-08-27, 14:24:20 MSK] {standard_task_runner.py:80} INFO - Job 11062: Subtask example
[2022-08-27, 14:24:21 MSK] {task_command.py:371} INFO - Running <TaskInstance: my_dag.example manual__2022-08-27T11:23:42.943920+00:00 [running]> on host mydagexample-bf562e4f714543b0b1d8ee52c2e255ff
[2022-08-27, 14:24:21 MSK] {taskinstance.py:1902} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1463, in _run_raw_task
    self._execute_task_with_callbacks(context, test_mode)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1569, in _execute_task_with_callbacks
    rtif = RenderedTaskInstanceFields(ti=self, render_templates=False)
  File "<string>", line 4, in __init__
  File "/opt/conda/lib/python3.7/site-packages/sqlalchemy/orm/state.py", line 437, in _initialize_instance
    manager.dispatch.init_failure(self, args, kwargs)
  File "/opt/conda/lib/python3.7/site-packages/sqlalchemy/util/langhelpers.py", line 72, in __exit__
    with_traceback=exc_tb,
  File "/opt/conda/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/opt/conda/lib/python3.7/site-packages/sqlalchemy/orm/state.py", line 434, in _initialize_instance
    return manager.original_init(*mixed[1:], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/renderedtifields.py", line 90, in __init__
    self.k8s_pod_yaml = ti.render_k8s_pod_yaml()
  File "/opt/conda/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 2250, in render_k8s_pod_yaml
    pod_override_object=PodGenerator.from_obj(self.executor_config),
  File "/opt/conda/lib/python3.7/site-packages/airflow/kubernetes/pod_generator.py", line 180, in from_obj
    'Cannot convert a non-kubernetes.client.models.V1Pod object into a KubernetesExecutorConfig'
TypeError: Cannot convert a non-kubernetes.client.models.V1Pod object into a KubernetesExecutorConfig
[2022-08-27, 14:24:22 MSK] {taskinstance.py:1412} INFO - Marking task as FAILED. dag_id=my_dag, task_id=example, execution_date=20220827T112342, start_date=20220827T112420, end_date=20220827T112422
[2022-08-27, 14:24:22 MSK] {standard_task_runner.py:97} ERROR - Failed to execute job 11062 for task example (Cannot convert a non-kubernetes.client.models.V1Pod object into a KubernetesExecutorConfig; 55)
[2022-08-27, 14:24:22 MSK] {local_task_job.py:156} INFO - Task exited with return code 1
[2022-08-27, 14:24:22 MSK] {local_task_job.py:279} INFO - 0 downstream tasks scheduled from follow-on schedule check

How to reproduce

I came to the conclusion that this error is caused by the random name generated for the "svc_name" variable: if the "svc_name" variable is removed or given a constant value, everything works fine. The problem is not limited to the "metadata" parameter.

Here is a simple example.

template.py

import uuid
from typing import Dict

from kubernetes.client import models as k8s


def get_executor_config() -> Dict[str, k8s.V1Pod]:
    svc_name = str(uuid.uuid4()).replace("-", "")  # error
    # svc_name = "static_name"  # it's fine
    executor_config = {
        "pod_override": k8s.V1Pod(
            metadata=k8s.V1ObjectMeta(labels={"app": svc_name}),
            spec=k8s.V1PodSpec(
                containers=[
                    k8s.V1Container(name="base"),
                ],
            ),
        ),
    }
    return executor_config

my_dag.py

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago

from template import get_executor_config


def my_func():
    print("example")


default_args = {
    "owner": "Airflow",
    "start_date": days_ago(1),
}

with DAG(dag_id="my_dag", default_args=default_args, schedule_interval=None) as dag:
    task = PythonOperator(
        task_id="example",
        python_callable=my_func,
        executor_config=get_executor_config(),
        dag=dag,
    )
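
As a side note, get_executor_config() runs on every DAG-file parse, so a new uuid4-based label is produced each time. A minimal sketch (outside Airflow, purely to illustrate the observation above) showing that the value is not stable across parses:

# Illustration only: two consecutive calls (i.e. two DAG-file parses) yield
# different "app" labels, which is the randomness described above.
from template import get_executor_config

first = get_executor_config()["pod_override"].metadata.labels["app"]
second = get_executor_config()["pod_override"].metadata.labels["app"]
assert first != second  # a new uuid4-based label on every call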

Operating System

Debian GNU/Linux 9 (stretch)

Versions of Apache Airflow Providers

apache-airflow-providers-apache-hdfs==3.1.0
apache-airflow-providers-apache-hive==4.0.0
apache-airflow-providers-apache-spark==3.0.0
apache-airflow-providers-cncf-kubernetes==4.3.0
apache-airflow-providers-common-sql==1.1.0
apache-airflow-providers-ftp==3.1.0
apache-airflow-providers-http==4.0.0
apache-airflow-providers-imap==3.0.0
apache-airflow-providers-jdbc==3.2.0
apache-airflow-providers-postgres==5.2.0
apache-airflow-providers-sftp==4.0.0
apache-airflow-providers-sqlite==3.2.0
apache-airflow-providers-ssh==3.1.0

Deployment

Other Docker-based deployment

Deployment details

kubectl version

Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.8", GitCommit:"ec6eb119b81be488b030e849b9e64fda4caaf33c", GitTreeState:"clean", BuildDate:"2020-03-12T21:00:06Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.8", GitCommit:"35dc4cdc26cfcb6614059c4c6e836e5f0dc61dee", GitTreeState:"clean", BuildDate:"2020-06-26T03:36:03Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Python 3.7.7, with all Airflow constraints applied
DB - PostgreSQL v12

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Smirn08 added the area:core and kind:bug labels Aug 27, 2022
@boring-cyborg

boring-cyborg bot commented Aug 27, 2022

Thanks for opening your first issue here! Be sure to follow the issue template!

@Smirn08
Author

Smirn08 commented Aug 29, 2022

Hello @dstandish! Could you take a look at this if possible? Thank you.

Smirn08 closed this as not planned Aug 29, 2022
Smirn08 reopened this Aug 29, 2022
Smirn08 closed this as completed Aug 29, 2022
Smirn08 reopened this Aug 29, 2022
@uranusjr
Member

I was just seeing a separate case for this particular error, and the setup also has a similar pattern—the DAG file imports a helper module (template here) that implements a function that returns the custom executor_config. And the error is emitted from here:

if isinstance(k8s_object, k8s.V1Pod):
    return k8s_object
elif isinstance(k8s_legacy_object, dict):
    warnings.warn(
        'Using a dictionary for the executor_config is deprecated and will soon be removed.'
        'please use a `kubernetes.client.models.V1Pod` class with a "pod_override" key'
        ' instead. ',
        category=RemovedInAirflow3Warning,
    )
    return PodGenerator.from_legacy_obj(obj)
else:
    raise TypeError(
        'Cannot convert a non-kubernetes.client.models.V1Pod object into a KubernetesExecutorConfig'
    )

Since there’s an isinstance(k8s_object, k8s.V1Pod) check right at the top (that should cover this executor_config value), this means the (de-)serialisation process could be causing the issue, or the V1Pod import may have changed during the process. I’m not sure how this can be fixed yet, but a temporary workaround would be to use the old executor config format for now:

custom_executor_config = {
    "KubernetesExecutor": {
        # Put worker configuration here. Possible arguments see:
        # PodGenerator in airflow/kubernetes/pod_generator_deprecated.py
    },
}
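
For reference, a hedged sketch of how that old-style config might be attached to the task from the my_dag.py example above; the "labels" key here is an assumed example, and the full set of supported keys is the argument list of the deprecated PodGenerator:

# Sketch only, based on the my_dag.py example above. The "labels" key is an
# assumed example; see PodGenerator in
# airflow/kubernetes/pod_generator_deprecated.py for the arguments the legacy
# format actually supports.
task = PythonOperator(
    task_id="example",
    python_callable=my_func,
    executor_config={"KubernetesExecutor": {"labels": {"app": "my-unique-label"}}},
)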

If this is a serialisation issue, #26191 may be the fix for this.
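
To make the (de-)serialisation hypothesis above concrete, here is a small standalone sketch (using the kubernetes client's sanitize_for_serialization helper as a stand-in for whatever round-trip happens inside Airflow) showing that a V1Pod flattened into a plain dict no longer passes the isinstance check in from_obj:

# Illustration only: if the (de-)serialisation path hands from_obj a plain dict
# instead of a V1Pod, the isinstance check fails and, with no "KubernetesExecutor"
# key present either, the TypeError from this issue is raised.
from kubernetes.client import ApiClient
from kubernetes.client import models as k8s

pod = k8s.V1Pod(
    metadata=k8s.V1ObjectMeta(labels={"app": "example"}),
    spec=k8s.V1PodSpec(containers=[k8s.V1Container(name="base")]),
)
round_tripped = ApiClient().sanitize_for_serialization(pod)  # now a plain dict

print(isinstance(pod, k8s.V1Pod))            # True: handled by the first branch
print(isinstance(round_tripped, k8s.V1Pod))  # False: would fall through to the TypeError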

@hterik
Contributor

hterik commented Sep 16, 2022

Is there a way to avoid serializing and deserializing executor_config.pod_override in the first place? It should be enough for the scheduler to load it from the DAG right before launching the pod, so why does it need to be in the database at all?
There have been quite a few bugs around this lately; it would be good if it could be avoided entirely.

@uranusjr
Member

The scheduler never loads the DAG from file because it would enable arbitrary code execution that may break the scheduler internals. The documentation explains this: https://airflow.apache.org/docs/apache-airflow/stable/dag-serialization.html

@potiuk
Member

potiuk commented Sep 19, 2022

Yes. The scheduler ONLY works on the serialized version of the DAG, where only the structure and relations are kept. Actually, to some extent in 2.3, and in 2.4 where we completed the DagFileProcessor separation, you could technically run the scheduler similarly to the webserver without accessing the DAG folder at all, I believe (and possibly in the next chart we might support this case and not mount DAG files to the scheduler at all).

@dstandish
Contributor

I was able to reproduce this on 2.3.4, and it does appear to be the same issue resolved by #26191.

Now that 2.4.0rc1 is available for testing, can you try it out and confirm it's resolved, @hterik @Smirn08?

@george-zubrienko

george-zubrienko commented Sep 19, 2022

After we tried to upgrade to 2.3.4, tasks in some of our DAGs became stuck in the queued state, with the scheduler reporting an invalid executor config (assigned with pod_override: V1Pod). Not sure if this is the same issue though.

Upgrading to 2.4.0 doesn't help. Downgrading to 2.3.3 fixes the issue.

@dstandish
Contributor

dstandish commented Sep 19, 2022

@george-zubrienko can you share the executor config / sample DAG and the traceback? Please also comment on whether this affects an in-flight DAG run or new DAG runs created after the upgrade.

eladkal added the pending-response, provider:cncf-kubernetes, and affected_version:2.3 labels and removed the area:core label Sep 21, 2022
@dstandish
Contributor

Yeah, I can repro this on 2.3.4 but not on 2.4.0.
I believe #26191 does fix this.

@potiuk
Member

potiuk commented Sep 21, 2022

Closing then - we can always reopen if we get reports it is not fixed.

potiuk closed this as completed Sep 21, 2022
potiuk added this to the Airflow 2.4.0 milestone Sep 21, 2022

7 participants