strict_dataclass #20461
💯
Based on brief googling I'm not persuaded that we should wrap `dataclass` rather than `BaseModel`. `BaseModel` is the main way that people use pydantic and I think our default position should be reflecting the default way that people use it.
I'm also a little wary of adding the dependency on `dataclass_transform` here.
How many of our namedtuples are public APIs and would need to support positionals? I also think it would be fairly straightforward to support positional by customizing
Do you have evidence for this? Also, my suspicion is that a lot more people are familiar with dataclasses than pydantic.
This is a sample from scanning through the first ~third of our public exports:
@schrockn it's certainly not the only benefit of pydantic, but probably the #1 thing that makes me excited about switching from NamedTuples to pydantic is the opportunity to get rid of all our boilerplate constructors. So I would be very sad to move to a pydantic approach that requires us to keep many of these.
Fair.
It doesn't do runtime type-checking, so it wouldn't be a replacement for all the type-checking code we have.
Ah yes of course.
Yes that will be nice. However, just to manage expectations: we will still need custom initializers for any serdes class that has changed, or for any class where we want to massage arguments (e.g. a `CoercibleToAssetKey` argument that ends up an `AssetKey` field). In terms of my comment "I also think it would be fairly straightforward to support positional by customizing `__init__` in our subclass", I want to be clear I'm talking about supporting this in our own `BaseModel` class, rather than forcing all subclasses to customize `__init__`.
Ah I misunderstood. If we can find a way to do this then I agree that's the best option. However I did a bunch of investigation on this this morning and couldn't figure out a way to do it that preserves static type checking. If you have ideas about avenues to investigate to get this to work, I'd be happy to explore them.
Yeah the version I imagine is probably similar to what you prototyped where it accepts `*args` and `**kwargs` and does some reflection-based remapping, which probably breaks static typing. @smackesey may have some additional ideas here.

My recommendation is a dagster-specific class inheriting from

In terms of those existing public classes, a good deal of them require custom initialization anyways. Any class that has been serialized and changed at all, and any class that does any manipulation (e.g. a

For new classes we can use the built-in init logic until we need to change serialization or do some transformation on inputs. This approach also has the advantage of being more minimal.

    from dagster._core.definitions.events import AssetKey, CoercibleToAssetKey
    from pydantic import BaseModel, Extra

    class Pedantic(BaseModel):
        class Config:
            frozen = True
            extra = Extra.forbid

    class ThingThatAcceptsCoercibleToAssetKey(Pedantic):
        def __init__(self, asset_key: CoercibleToAssetKey):
            super().__init__(**dict(asset_key=AssetKey.from_coercible(asset_key)))

        asset_key: AssetKey

    print(ThingThatAcceptsCoercibleToAssetKey("asset_key"))

I would be surprised if there weren't edge case bugs in the cross product of pydantic versions, python versions, and typing_extension versions in the dataclass wrapper variant. Whereas in the above there is less chance of that.
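For reference, the reflection-based remapping mentioned above (an `__init__` that accepts `*args` and `**kwargs` and maps positionals onto declared fields) could look roughly like the sketch below. It is stdlib-only and uses a plain class as a stand-in for a shared base model; `PositionalRemapBase`, `_field_names`, and `Point` are hypothetical names, not pydantic or Dagster APIs.

```python
from typing import Any, Dict, List


class PositionalRemapBase:
    """Hypothetical stand-in for a shared base model; not a pydantic class."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        names = self._field_names()
        if len(args) > len(names):
            raise TypeError(
                f"expected at most {len(names)} positional arguments, got {len(args)}"
            )
        # Map each positional argument onto the field declared at that index.
        for name, value in zip(names, args):
            if name in kwargs:
                raise TypeError(f"got multiple values for argument {name!r}")
            kwargs[name] = value
        # Reject anything that is not a declared field.
        unknown = sorted(set(kwargs) - set(names))
        if unknown:
            raise TypeError(f"unexpected arguments: {unknown}")
        missing = [name for name in names if name not in kwargs]
        if missing:
            raise TypeError(f"missing arguments: {missing}")
        for name in names:
            setattr(self, name, kwargs[name])

    @classmethod
    def _field_names(cls) -> List[str]:
        # Collect annotated names in declaration order, base classes first.
        seen: Dict[str, None] = {}
        for klass in reversed(cls.__mro__):
            for name in getattr(klass, "__annotations__", {}):
                seen.setdefault(name, None)
        return list(seen)


class Point(PositionalRemapBase):
    x: int
    y: int


point = Point(1, y=2)
print(point.x, point.y)
```

The catch raised in this thread shows up here too: because the signature is `(*args, **kwargs)`, a static type checker has no way to verify the arguments passed to `Point(...)`.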
Of the classes listed in my comment above, I believe the 6 whose names start with "Asset" all require special constructors, and the other 10 do not. When I sampled a few more further down in the file, I got a similar ratio.
Yeah this was the first thing that I tried and it has a lot of things going for it. But still concerned about the amount of boilerplate it implies.
Would definitely be bad if this were the case. FWIW the minimum version of pydantic that we support imports `typing_extensions.dataclass_transform`: https://github.com/pydantic/pydantic/blob/v1.10.1/pydantic/dataclasses.py#L39 I suspect you won't be super hot on this, but just to enumerate the space of options, another direction to consider is supporting both. E.g. we could have a subclass of
A point against the dataclass approach, which gives me pause, is that overriding `__init__` gets more complicated. Will need to play around with this but I think we might need to override `__new__` instead.
Don't think so. Any of them that have

    return super(In, cls).__new__(
        cls,
        dagster_type=(
            NoValueSentinel
            if dagster_type is NoValueSentinel
            else resolve_dagster_type(dagster_type)
        ),
Given the above (and the point about serializable classes), I think both approaches have similar amounts of boilerplate in practice, and in the cases where we do need a custom constructor we won't need the check calls, so it will feel quite a bit better.
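For readers less familiar with the constructor pattern being debated, here is a minimal stdlib-only sketch of a `NamedTuple` subclass whose `__new__` validates and coerces its arguments before delegating to the generated constructor. `_Thing`, `Thing`, and their fields are hypothetical and only illustrate the shape of the boilerplate; they are not Dagster classes.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class _Thing(NamedTuple):
    name: str
    tags: Tuple[str, ...]


class Thing(_Thing):
    """Hypothetical record type; the names are illustrative only."""

    def __new__(cls, name: str, tags: Optional[Iterable[str]] = None):
        # Hand-written runtime check, standing in for the "check" calls.
        if not isinstance(name, str):
            raise TypeError("name must be a str")
        # Coerce the optional iterable into the stored tuple form.
        return super().__new__(cls, name=name, tags=tuple(tags or ()))


thing = Thing("a", ["x", "y"])
print(thing)
```

Only the coercion line does real work here; the `isinstance` check is the kind of code that runtime validation in the model layer would make unnecessary.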
Serdes change POC: #20470
## Summary & Motivation

Introduces a `StrictModel` class, which is a subclass of `pydantic.BaseModel` that's frozen and doesn't allow default constructor arguments that aren't class members. The idea is for this to eventually replace all our uses of `NamedTuple`, `BaseModel`, and `@dataclass`.

This is a replacement for the unmerged #20461.

Downstream PRs that demonstrate its usage:
- #20641
- #20643
- #20638

## How I Tested These Changes
## Summary & Motivation

This one was a doozy.

Introduces a `@strict_dataclass` decorator, with the aim of replacing our usage of `NamedTuple` and `pydantic.BaseModel`.

Why not `pydantic.BaseModel`?

`frozen=True`, which is annoying and error-prone.

Why not `@pydantic.dataclass(frozen=True, config=dict(extra="forbid"))`?

This would require remembering to use `frozen=True, config=dict(extra="forbid")`, which is annoying and error-prone.

## How I Tested These Changes