-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to build a field of type Struct with DataFusion SQL? #2043
Comments
Hello - thanks for raising this, im quite interested in operations on structs as well. Do you have any examples of how other engines such as spark / postgres / athena etc handle what you are proposing? |
Different approaches to represent a structure in the projection part of a SELECT. BigQuery syntax: Spark syntax: Postgres and Athena syntax: Snowflake syntax: |
BTW I started collecting known issues in struct support in #2326 |
@lquerel Are you taking this feature? If not i am glad to take this.❤️ |
@Ted-Jiang No unfortunately I can't right now. Thanks in advance. |
@Ted-Jiang thanks for your PR. Really nice to see that integrated so quickly. Does this PR allow to create nested structs ? |
Sorry for the delay, No we can't create a nested, I think we should modify it in |
@Ted-Jiang based on #2389 and this I was wondering what all is possible to create struct columns in datafusion SQL. for example I was trying to do something like the below but its broke,
|
For those looking for how to set the names, the |
DataFusion already gives us a way to access nested fields with this syntax field["nested_field"] (field being an arrow struct field). However I didn't find any example or documentation to do the reverse operation, i.e. creating a struct from multiple fields. A pseudo SQL query (inspired from BIGQUERY) will be something like:
SELECT STRUCT(a AS field_1, b AS field_2) AS struct_name FROM table
. Is there a way to express this type of construct with DataFusion SQL?Support for this type of functionality will be particularly helpful for dealing with data sources that support nested fields (e.g. parquet and JSON) and when the query output MUST also be stored in a nested form. Example of use cases:
Proposed solution:
STRUCT (expr AS field_name, ...) AS struct_field_name
SELECT STRUCT (struct_a.* EXCEPT (field_1, field_2)) FROM xyz
. This type of syntax is robust to schema evolution as new fields will be automatically captured by the wildcard operator.SELECT STRUCT(struct_a.*, struct_b.*) AS combined_struct FROM xyz
. This must fail if the structures a and b have fields with the same name.The text was updated successfully, but these errors were encountered: