-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: Delegate serialization to pydantic #8286
Python: Delegate serialization to pydantic #8286
Conversation
I hope that I understand what the issue is about. If not, I'd like to fix it. Thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aless10 Thanks for working on this. The idea is to delegate all the parsing to Pydantic, which is probably faster/smarter about this.
@@ -353,7 +356,7 @@ def check_sort_orders(cls, values: Dict[str, Any]) -> Dict[str, Any]: | |||
increasing long that tracks the order of snapshots in a table.""" | |||
|
|||
|
|||
TableMetadata = Union[TableMetadataV1, TableMetadataV2] | |||
TableMetadata = Annotated[Union[TableMetadataV1, TableMetadataV2], Field(discriminator="format_version")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love it
python/pyiceberg/table/metadata.py
Outdated
def parse_raw(data: str) -> TableMetadata: | ||
try: | ||
d = f'{{"table_metadata": {data}}}' | ||
metadata_wrapper = TableMetadataUtil._MetadataWrapper.parse_raw(d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of the wrapper doesn't make sense to me. Shouldn't
metadata_wrapper = TableMetadataUtil._MetadataWrapper.parse_raw(d) | |
metadata_wrapper = TableMetadata.parse_raw(d) |
work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, TableMetadata is a type, not a pydantic model. So it does not have the parse_raw method. I get that we want to achieve this. I'll try to study pydantic more to see if it is possible to achieve this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, I don't like the solution that I found, but it works
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I also looked into the discriminator to automatically deserialize into the right version. Would be great to get it working, but it is a bit rough it seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The discriminator works if it is a field in a model. But it does not work per se. We can use it as I did or implement a sort of factory method (as it is right now)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to look at what pydantic does and basically it takes the string and then calls json.load. So one of the thing that we can do is:
- move the json.load call inside this method
- call the parse_object: now we don't have to concatenate the string since the object is the correct one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also bumping Pydantic to 2.0: #7782 Maybe we can see if that offers something new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @aless10 🙌🏻
(It should) closes #7982
TableMetadataUtil has a
parse_raw
method where it receives a string that will be parsed by a pydantic model. This model will also return the correct TableMetadata version based on the fieldformat_version
.This is like a strategy method that is now handled with a pydantic model (before, there were a couple of if statements).