Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python: Delegate serialization to pydantic #8286

Merged
merged 3 commits into from
Aug 15, 2023

Conversation

aless10
Copy link
Contributor

@aless10 aless10 commented Aug 10, 2023

(It should) closes #7982

TableMetadataUtil has a parse_raw method where it receives a string that will be parsed by a pydantic model. This model will also return the correct TableMetadata version based on the field format_version.
This is like a strategy method that is now handled with a pydantic model (before, there were a couple of if statements).

@aless10
Copy link
Contributor Author

aless10 commented Aug 10, 2023

I hope that I understand what the issue is about. If not, I'd like to fix it. Thanks.

Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aless10 Thanks for working on this. The idea is to delegate all the parsing to Pydantic, which is probably faster/smarter about this.

python/pyiceberg/catalog/rest.py Show resolved Hide resolved
python/pyiceberg/table/metadata.py Outdated Show resolved Hide resolved
@@ -353,7 +356,7 @@ def check_sort_orders(cls, values: Dict[str, Any]) -> Dict[str, Any]:
increasing long that tracks the order of snapshots in a table."""


TableMetadata = Union[TableMetadataV1, TableMetadataV2]
TableMetadata = Annotated[Union[TableMetadataV1, TableMetadataV2], Field(discriminator="format_version")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it

python/tests/table/test_metadata.py Show resolved Hide resolved
def parse_raw(data: str) -> TableMetadata:
try:
d = f'{{"table_metadata": {data}}}'
metadata_wrapper = TableMetadataUtil._MetadataWrapper.parse_raw(d)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of the wrapper doesn't make sense to me. Shouldn't

Suggested change
metadata_wrapper = TableMetadataUtil._MetadataWrapper.parse_raw(d)
metadata_wrapper = TableMetadata.parse_raw(d)

work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, TableMetadata is a type, not a pydantic model. So it does not have the parse_raw method. I get that we want to achieve this. I'll try to study pydantic more to see if it is possible to achieve this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I don't like the solution that I found, but it works

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I also looked into the discriminator to automatically deserialize into the right version. Would be great to get it working, but it is a bit rough it seems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The discriminator works if it is a field in a model. But it does not work per se. We can use it as I did or implement a sort of factory method (as it is right now)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to look at what pydantic does and basically it takes the string and then calls json.load. So one of the thing that we can do is:

  1. move the json.load call inside this method
  2. call the parse_object: now we don't have to concatenate the string since the object is the correct one

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also bumping Pydantic to 2.0: #7782 Maybe we can see if that offers something new.

@Fokko Fokko changed the title [#7982] delegate serialization to pydantic Python: Delegate serialization to pydantic Aug 10, 2023
Copy link
Contributor

@Fokko Fokko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @aless10 🙌🏻

@Fokko Fokko merged commit 700127d into apache:master Aug 15, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Python: Delegate json deserialization to Pydantic
2 participants