
Function behaving similarly to SHOW PARTITIONS in the Python API #2671

Closed
FrankPortman opened this issue Jul 15, 2024 · 5 comments · Fixed by #2816
Labels
enhancement New feature or request

Comments

@FrankPortman

FrankPortman commented Jul 15, 2024

I am wondering if there is something similar to SHOW PARTITIONS from the Spark world of interacting with Delta tables. It is a metadata-only query that returns all of the live partitions of a specific Delta table in tabular format. Functionality such as DeltaTable.files_by_partitions is super helpful for querying, but it is not quite the same thing. get_active_partitions is almost what I need, but (1) it's not exposed in the public API, which makes usage a bit clunky (but certainly nothing life-ruining), and (2) the struct it returns is not the most ergonomic.
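For context on why this can be metadata-only: in the Delta transaction log, each live data file is recorded as an `add` action carrying a `partitionValues` map, so the distinct live partitions can be derived without reading any data files. A minimal sketch, assuming the live `add` actions (removes already reconciled) have already been parsed from the log into plain dicts; the input shape, the sample paths, and the helper name `distinct_partitions` are hypothetical and not part of delta-rs:

```python
from typing import Dict, Iterable, List, Optional


def distinct_partitions(add_actions: Iterable[dict]) -> List[Dict[str, Optional[str]]]:
    """Return each distinct partition once, in order of first appearance.

    Hypothetical helper: `add_actions` is assumed to hold only *live* add
    actions, each with a `partitionValues` mapping of column -> string value.
    """
    seen = set()
    result = []
    for action in add_actions:
        values = action.get("partitionValues", {})
        # A frozenset of (column, value) items is hashable, so it can be deduplicated.
        key = frozenset(values.items())
        if key not in seen:
            seen.add(key)
            result.append(dict(values))
    return result


# Made-up sample: two files share a partition, so only two partitions come back.
live_adds = [
    {"path": "year=2024/month=1/part-0.parquet", "partitionValues": {"year": "2024", "month": "1"}},
    {"path": "year=2024/month=1/part-1.parquet", "partitionValues": {"year": "2024", "month": "1"}},
    {"path": "year=2024/month=2/part-0.parquet", "partitionValues": {"year": "2024", "month": "2"}},
]
print(distinct_partitions(live_adds))
# [{'year': '2024', 'month': '1'}, {'year': '2024', 'month': '2'}]
```

Each returned dict is one partition "row", which is roughly the tabular shape SHOW PARTITIONS produces.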

Any openness to a PR that does this?

No related issues from what I could tell.

@FrankPortman FrankPortman added the enhancement New feature or request label Jul 15, 2024
@FrankPortman FrankPortman changed the title Something behaving similarly to SHOW PARTITIONS in the Python API Function behaving similarly to SHOW PARTITIONS in the Python API Jul 15, 2024
@ion-elgreco
Collaborator

@FrankPortman feel free to open a PR for this; it's definitely useful as a proper public API.

@FrankPortman
Author

@ion-elgreco is your preference to just open up the API so that get_active_partitions is exposed on DeltaTable in Python? Or would you also be open to that, or to some helper method on DeltaTable, packaging the results in a more ergonomic format?

For example, right now a call to table._table.get_active_partitions() returns something like frozenset[frozenset[tuple[str, str]]], where each inner set contains one (pKey, pVal) tuple per partition column in the table. My use case would involve merging those inner sets into some struct or dict, so that a single partition "row" has all the partition columns pivoted out. I don't mind handling that last part in my business logic code, but if you think it would be useful, I can add it here as well.
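The pivot described above can be sketched in plain Python. This assumes the `frozenset[frozenset[tuple[str, str]]]` shape reported for `get_active_partitions()` in the comment, with string values; `pivot_partitions` is a hypothetical helper name, not a delta-rs API, and the sample data is made up:

```python
from typing import Dict, FrozenSet, List, Tuple

# Assumed shape: one inner frozenset per partition, holding one
# (partition_column, value) tuple per partition column.
RawPartitions = FrozenSet[FrozenSet[Tuple[str, str]]]


def pivot_partitions(raw: RawPartitions) -> List[Dict[str, str]]:
    """Turn each inner set of (column, value) pairs into a single dict "row"."""
    rows = [dict(inner) for inner in raw]
    # frozensets are unordered, so sort the rows for a deterministic result.
    rows.sort(key=lambda row: sorted(row.items()))
    return rows


raw = frozenset({
    frozenset({("year", "2024"), ("month", "1")}),
    frozenset({("year", "2024"), ("month", "2")}),
})
for row in pivot_partitions(raw):
    print(row)
```

Key order inside each printed dict may vary between runs because the inner sets are unordered, but dict equality ignores key order, so the rows compare equal either way.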

@ion-elgreco
Collaborator

Let's go for something more ergonomic, since it's a public API.

@omkar-foss
Contributor

Seems like a nice feature to have! I can pick this up and raise a PR if no one is working on it; let me know.

@FrankPortman
Author

I'd love that. I haven't had a chance to prioritize it yet.

omkar-foss added a commit to omkar-foss/delta-rs that referenced this issue Aug 22, 2024
This adds a public method `partitions()` to the `DeltaTable` class
to get properly formatted partitions (list of dicts) for the table.
Also provides an option to return partitions as a list of tuples,
and proxies the partition filters to Rust `get_active_partitions()`.

This also adds supporting tests for this feature.
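The commit describes partitions returned as a list of dicts, optionally as a list of tuples, with partition filters passed through to Rust. The sketch below only illustrates those shapes in plain Python and does not call delta-rs: `filter_partitions`, `as_tuples`, the handled operator subset, and the tuple layout are all assumptions, not necessarily what the PR implemented. In the library the filtering happens on the Rust side; with a recent `deltalake` release the call should look roughly like `DeltaTable(path).partitions(partition_filters=[...])`, but check the installed version's documentation for the exact signature.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical filter shape, modeled on the (column, op, value) tuples used
# for delta-rs partition filters; only a subset of operators is handled here.
Filter = Tuple[str, str, Any]


def _matches(row: Dict[str, str], flt: Filter) -> bool:
    column, op, value = flt
    actual = row.get(column)
    if op == "=":
        return actual == value
    if op == "!=":
        return actual != value
    if op == "in":
        return actual in value
    if op == "not in":
        return actual not in value
    raise ValueError(f"unsupported operator: {op!r}")


def filter_partitions(rows: List[Dict[str, str]], filters: Sequence[Filter] = ()) -> List[Dict[str, str]]:
    """Keep the rows that match every filter (filters are ANDed together)."""
    return [row for row in rows if all(_matches(row, f) for f in filters)]


def as_tuples(rows: List[Dict[str, str]]) -> List[Tuple[Tuple[str, str], ...]]:
    """Alternative list-of-tuples view: each row becomes sorted (column, value) pairs."""
    return [tuple(sorted(row.items())) for row in rows]


rows = [
    {"year": "2024", "month": "1"},
    {"year": "2024", "month": "2"},
    {"year": "2023", "month": "12"},
]
print(filter_partitions(rows, [("year", "=", "2024"), ("month", "in", ["1", "12"])]))
# [{'year': '2024', 'month': '1'}]
print(as_tuples(rows[:1]))
# [(('month', '1'), ('year', '2024'))]
```

The dict form is what makes each partition read like a SHOW PARTITIONS row; the tuple form trades readability for hashability and a fixed column order.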
omkar-foss added a commit to omkar-foss/delta-rs that referenced this issue Aug 22, 2024
omkar-foss added a commit to omkar-foss/delta-rs that referenced this issue Aug 23, 2024
omkar-foss added a commit to omkar-foss/delta-rs that referenced this issue Aug 23, 2024
ion-elgreco pushed a commit to omkar-foss/delta-rs that referenced this issue Aug 30, 2024
github-merge-queue bot pushed a commit that referenced this issue Aug 30, 2024