Allow projection of schemas/structs #38615
Comments
There's the I'm not quite sure casting is the right name to give this operation, but I think there is room to add an operation like what you're proposing (whether it's called casting or not).
Thanks again for the quick response, @lidavidm. The current operation that does this is called
Still running into this. I would expect something like the below to work:
In [1]: import pyarrow as pa
In [2]: current_schema = pa.schema([
   ...:     pa.field("x", pa.float32()),
   ...:     pa.field("y", pa.float32())
   ...: ])
   ...:
   ...: tbl = pa.Table.from_pylist(
   ...:     [
   ...:         {"x": 52.371807, "y": 4.896029},
   ...:         {"x": 52.387386, "y": 4.646219},
   ...:         {"x": 52.078663, "y": 4.288788},
   ...:     ],
   ...:     schema=current_schema,
   ...: )
   ...:
   ...: schema_with_z = pa.schema(
   ...:     [
   ...:         pa.field("x", pa.float32()),
   ...:         pa.field("y", pa.float32()),
   ...:         pa.field("x", pa.float32()),
   ...:     ]
   ...: )
   ...:
   ...: tbl.cast(schema_with_z)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 23
6 tbl = pa.Table.from_pylist(
7 [
8 {"x": 52.371807, "y": 4.896029},
(...)
12 schema=current_schema,
13 )
15 schema_with_z = pa.schema(
16 [
17 pa.field("x", pa.float32()),
(...)
20 ]
21 )
---> 23 tbl.cast(schema_with_z)
File /opt/homebrew/lib/python3.11/site-packages/pyarrow/table.pxi:3793, in pyarrow.lib.Table.cast()
ValueError: Target schema's field names are not matching the table's field names: ['x', 'y'], ['x', 'y', 'x']
Describe the enhancement requested
For PyIceberg, concatenation of tables was recently added: #36846. To add new fields, I concatenate the requested schema with the data that was loaded. However, I'm now hitting the next barrier: I'm unable to project the schemas of nested structs.
A bit of context: for the top-level schema this is not an issue, because we can select the columns that we need when reading in the table, but that doesn't allow selecting nested columns.
Selecting a subset:
Or in a nested struct:
Any thoughts on adding this? Or can we achieve this in another way?
Component(s)
Python