Serialization of nested attr objects #38540

Closed
2 tasks done
tomrutter opened this issue Mar 27, 2024 · 0 comments · Fixed by #38591
Assignees
Labels
area:core kind:bug This is a clearly a bug

Comments

@tomrutter
Contributor

Apache Airflow version

2.8.4

If "Other Airflow 2 version" selected, which one?

No response

What happened?

I am passing values between tasks using XCom. These values are instances of an attrs class that has another attrs class as a member. Serialization currently fails to mark the inner class with the class tags that would allow it to be deserialized back into the attrs class, so it remains a dict after loading in the downstream task.

What you think should happen instead?

I think that the nested attrs objects should be returned unchanged after deserialization.

How to reproduce

The value of foo passed to task2 has inner_value as a dict rather than an InnerClass instance.

import attrs
from airflow.decorators import dag, task

# InnerClass must be defined before OuterClass references it in an annotation.
@attrs.define(kw_only=True, frozen=True, slots=False)
class InnerClass:
    x: str

@attrs.define(kw_only=True, frozen=True, slots=False)
class OuterClass:
    inner_value: InnerClass


@task
def task1():
    # kw_only=True means x has to be passed by keyword.
    return OuterClass(inner_value=InnerClass(x="test!"))

@task
def task2(foo: OuterClass):
    print(foo)  # foo here is OuterClass(inner_value={"x": "test!"})

@dag
def my_dag():
    task2(task1())

Operating System

linux (standard airflow slim images extended with airflow providers running on kubernetes)

Versions of Apache Airflow Providers

defaults for 2.8.4

Deployment

Official Apache Airflow Helm Chart

Deployment details

Airflow deployment on Azure Kubernetes using postgres backend db.

Anything else?

I think a simple solution would be to update the following code in airflow/serialization/serde.py (lines 182 to 187): change the value of recurse in the call to attr.asdict from True to False, so that nested attrs instances are left intact and their serialization is handled by the subsequent call to serialize.

Before:

    # attr annotated
    if attr.has(cls):
        # Only include attributes which we can pass back to the classes constructor
        data = attr.asdict(cast(attr.AttrsInstance, o), recurse=True, filter=lambda a, v: a.init)
        dct[DATA] = serialize(data, depth + 1)
        return dct

After:

    # attr annotated
    if attr.has(cls):
        # Only include attributes which we can pass back to the classes constructor
        data = attr.asdict(cast(attr.AttrsInstance, o), recurse=False, filter=lambda a, v: a.init)
        dct[DATA] = serialize(data, depth + 1)
        return dct

That is

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@tomrutter tomrutter added area:core kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Mar 27, 2024
@jscheffl jscheffl removed the needs-triage label for new issues that we didn't triage yet label Mar 27, 2024
2 participants