[feat request] Make Table / TableMetadata JSON serializable #535

Open
kevinjqliu opened this issue Mar 20, 2024 · 5 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@kevinjqliu
Contributor

Feature Request / Improvement

The REST Catalog exposes Table and TableMetadata information through HTTP endpoints in JSON format (link). This information closely resembles the internal state of the Table and TableMetadata objects in Python.

It would be great to make these objects JSON serializable.

Example

from pyiceberg.catalog import load_catalog
import json
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
json.dumps(vars(tbl))

Error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Table is not JSON serializable
>>> json.dumps(vars(tbl))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type TableMetadataV1 is not JSON serializable
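
For context on the two errors above: json.dumps only knows built-in types, and it raises TypeError for anything else unless it is given a default= hook. Here is a minimal sketch of that behavior using only the standard library. TableStandIn, MetadataStandIn, and encode_unknown are hypothetical names made up for illustration, and the location is a placeholder; none of this is pyiceberg's actual API.

```python
import json
from dataclasses import asdict, dataclass, is_dataclass


@dataclass
class MetadataStandIn:
    # Hypothetical stand-in, NOT pyiceberg's TableMetadataV1.
    format_version: int
    location: str


@dataclass
class TableStandIn:
    # Hypothetical stand-in, NOT pyiceberg's Table.
    identifier: tuple
    metadata: MetadataStandIn


def encode_unknown(obj):
    """Called by json.dumps only for objects it cannot serialize itself."""
    if is_dataclass(obj) and not isinstance(obj, type):
        # asdict recurses into nested dataclass instances.
        return asdict(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Placeholder values, for illustration only.
tbl = TableStandIn(("default", "taxi_dataset"), MetadataStandIn(1, "s3://bucket/taxi"))

# Without a hook, the encoder rejects the custom object.
try:
    json.dumps(tbl)
    failed = False
except TypeError:
    failed = True

# With the hook, the same call succeeds.
encoded = json.dumps(tbl, default=encode_unknown)
```

A hook like this is one generic option for plain Python objects. Whether it suits Table depends on how its attributes are laid out, which this sketch does not try to model.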
@Fokko
Contributor

Fokko commented Mar 20, 2024

We should be able to (de)serialize it using Pydantic. That's probably also faster.

@kevinjqliu
Contributor Author

kevinjqliu commented Mar 20, 2024

Oh, thanks for the hint. It looks like the model_dump_json function works:

from pyiceberg.catalog import load_catalog
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
tbl.metadata.model_dump_json()

However, it only works on tbl.metadata, not on tbl itself.

@kevinjqliu
Contributor Author

There's already a __repr__ method defined on the Table object. @Fokko, what do you think about adding another method to Table that outputs the JSON representation?
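
To make the idea concrete, here is a rough sketch of what such a method might look like. It uses stand-in classes rather than pyiceberg's real ones: TableSketch, MetadataSketch, and to_json are hypothetical names, the field set is an assumption, and the paths are placeholders. The stand-in's model_dump_json only mimics the method name used earlier in this thread; it is implemented with the standard library here, not with Pydantic.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class MetadataSketch:
    """Hypothetical stand-in for TableMetadata (not the real Pydantic model)."""
    format_version: int
    location: str

    def model_dump_json(self) -> str:
        # Mimics the method name from the thread; stdlib implementation.
        return json.dumps(asdict(self))


@dataclass
class TableSketch:
    """Hypothetical stand-in for Table; the field set is an assumption."""
    identifier: tuple
    metadata_location: str
    metadata: MetadataSketch

    def to_json(self) -> str:
        # Reuse the metadata's own serializer, then wrap it with
        # the table-level fields.
        return json.dumps({
            "identifier": list(self.identifier),
            "metadata_location": self.metadata_location,
            "metadata": json.loads(self.metadata.model_dump_json()),
        })


# Placeholder values, for illustration only.
tbl = TableSketch(
    identifier=("default", "taxi_dataset"),
    metadata_location="s3://bucket/taxi/metadata/v1.metadata.json",
    metadata=MetadataSketch(format_version=1, location="s3://bucket/taxi"),
)
payload = json.loads(tbl.to_json())
```

Delegating to the metadata's existing serializer keeps the table-level method small. Handling of non-serializable attributes is left out of this sketch.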

@db-trin-life

@kevinjqliu if no one is working on this, I can take it on.

@kevinjqliu
Contributor Author

@db-trin-life yep assigned to you!
