Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Centralized Management of Table Properties #365

Closed
HonahX opened this issue Feb 4, 2024 · 3 comments · Fixed by #388
Closed

Implement Centralized Management of Table Properties #365

HonahX opened this issue Feb 4, 2024 · 3 comments · Fixed by #388
Assignees

Comments

@HonahX
Copy link
Contributor

HonahX commented Feb 4, 2024

Feature Request / Improvement

With the release of version 0.6.0, pyiceberg is set to expand its capabilities in handling a greater variety of table properties, encompassing both read and write operations. (like #358 #363 )

It would be beneficial to have a place to manage the names and default values of all table properties. Similar to Apache Iceberg Java's TableProperties and PropertyUtil classes.

We can create a new file named properties.py under pyiceberg.table to hold properties' name and default, and another file named property_util.py under pyiceberg.utils to parse properties into int, bool, float, or Optional[T].

This will ensure consistent parsing of properties across the codebase and simplify future updates or changes in property names and defaults.

@Fokko
Copy link
Contributor

Fokko commented Feb 5, 2024

I think this is a great idea @HonahX. I try to avoid creating a lot of new files, since imports in Python are slow. What do you think of adding a class TableProperties to table/__init__.py. This way we can call it TableProperties.MAX_PARQUET_DICTIONARY_SIZE.

@HonahX
Copy link
Contributor Author

HonahX commented Feb 6, 2024

@Fokko Sounds good. My initial thought was to use a new file because table/init.py already has a substantial amount of content, and it's likely to grow as we develop write support. However, considering the importance of import speed, integrating it into table/__init__.py seems like the more efficient approach.

@HonahX
Copy link
Contributor Author

HonahX commented Feb 6, 2024

I can take this if no one has started it :).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants