Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert AssetSelection subclasses from NamedTuples to Pydantic #19197

Conversation

maximearmstrong
Copy link
Contributor

@maximearmstrong maximearmstrong commented Jan 12, 2024

Summary & Motivation

This PR updates AssetSelection and its subclasses to inherit from pydantic.BaseModel instead of NamedTuple. NamedTuple._replace() was replaced by AssetSelection.replace(), a wrapper for copy (pydantic v1) and model_copy (pydantic v2)

After several attempts, to make this work

  • the super class AssetSelection must inherit from pydantic.BaseModel
  • the super class and subclasses must have the config frozen=True, which allows all of them to be immutable, like NamedTuple.

Because of that, this PR udpates:

  • all classes in asset_selection.py to reflect the changes.
  • the class DbtManifestAssetSelection in dagster-dbt, which inherits from AssetSelection. It is now immutable.
  • the code using the classes - when inheriting from pydantic.BaseModel, instantiating new objects require keyword-only arguments except if __init__() is redefined for the class, so the code was updated to use the keywords.
  • the tests to reflect the changes.

How I Tested These Changes

BK

@maximearmstrong maximearmstrong self-assigned this Jan 12, 2024
@maximearmstrong maximearmstrong marked this pull request as ready for review January 12, 2024 20:02
Copy link
Contributor

@sryza sryza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hallelujah!

Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a relatively light weight back-compat test would be to get a serialized string for one/some of these in master and check it in directly to a test ensuring it loads successfully. This can de done with serialize_value and deserialize_value from dagster._serdes

edit: There are general tests for NamedTuple -> dataclass in the serdes library, so adding a specific test here isn't necessary, just some extra validation and maybe a good learning exercise.

@@ -1,8 +1,9 @@
import collections.abc
import operator
from abc import ABC, abstractmethod
from dataclasses import dataclass, replace
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could consider using pydantic dataclass if we want these to be run time type checked

https://docs.pydantic.dev/latest/concepts/dataclasses/

but since it appears the original NamedTuple variants did not have a custom __new__ with checks that would be new additional behavior

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it was a good idea and wanted to do it, but I seems pydantic doesn't support replace, so I think for this use case we can keep dataclasses which does.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would model_copy work? pydantic/pydantic#3352

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We support both pydantic 1 & 2 and those comments make it seem like model_copy is 2 only, something to double check

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, interesting. That would most likely work. I will implement it before merging.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like copy -> model_copy was a name change in 1 -> 2
https://docs.pydantic.dev/latest/migration/#changes-to-pydanticbasemodel
so I think we would need to handle that

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could write a wrapper

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a few attempts to properly handle inheritance and frozen classes, the code and PR description are updated. @alangenfeld @sryza, lmk what you think!

@maximearmstrong maximearmstrong marked this pull request as draft January 16, 2024 22:57
@maximearmstrong maximearmstrong marked this pull request as ready for review January 17, 2024 15:29
@maximearmstrong maximearmstrong requested a review from sryza January 17, 2024 16:04
@maximearmstrong maximearmstrong changed the title Convert AssetSelection subclasses from NamedTuples to dataclasses Convert AssetSelection subclasses from NamedTuples to Pydantic Jan 17, 2024
Copy link
Member

@alangenfeld alangenfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe worth adding a test that demonstrates the new runtime type checking that this adds

@@ -396,9 +402,16 @@ def from_coercible(cls, selection: CoercibleToAssetSelection) -> "AssetSelection
def to_serializable_asset_selection(self, asset_graph: AssetGraph) -> "AssetSelection":
return AssetSelection.keys(*self.resolve(asset_graph))

def replace(self, update: dict):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could consider having this be **kwargs if we wanted to not have to change the replace callsite args

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 37aea77

@maximearmstrong
Copy link
Contributor Author

maximearmstrong commented Jan 24, 2024

maybe worth adding a test that demonstrates the new runtime type checking that this adds

@alangenfeld Good call, done in b9db7f0

@maximearmstrong maximearmstrong merged commit ccdfdea into master Jan 25, 2024
1 check passed
@maximearmstrong maximearmstrong deleted the maxime/convert-assetselection-subclasses-from-namedtuples-to-dataclasses branch January 25, 2024 00:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants