[Proposal] Support User-Defined Types (UDT) #7923
Comments
Thank you for this @yukkit. I think the high-level idea would fit really nicely into DataFusion's extensibility story. I think the core challenge of implementing this feature is how to work it into the existing code. DataFusion uses … One way to model user-defined types in DataFusion would be as an Arrow extension type (which would need upstream support, as described in apache/arrow-rs#4472). Then the DataFusion codebase could treat all user-defined types as Arrow extension types, using the … There is also a somewhat related discussion in #7421 about how DataType encodes both the physical encoding and the logical type.
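For reference, the Arrow columnar format records an extension type as field metadata under the reserved keys `ARROW:extension:name` and `ARROW:extension:metadata`, while the field's data type stays the physical storage type. The sketch below illustrates that idea with a hypothetical, simplified `Field` struct (not arrow-rs's `Field`), since first-class arrow-rs support is what apache/arrow-rs#4472 tracks.

```rust
use std::collections::HashMap;

/// Metadata key the Arrow format reserves for an extension type's name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
/// Metadata key for the extension type's serialized parameters.
const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical, simplified stand-in for an Arrow field: a column name,
/// a physical (storage) type name, and string key/value metadata.
#[derive(Debug, Clone)]
pub struct Field {
    pub name: String,
    pub storage_type: String,
    pub metadata: HashMap<String, String>,
}

impl Field {
    pub fn new(name: &str, storage_type: &str) -> Self {
        Field {
            name: name.to_string(),
            storage_type: storage_type.to_string(),
            metadata: HashMap::new(),
        }
    }

    /// Tag this field as carrying a user-defined (extension) type.
    /// The storage type is left unchanged.
    pub fn with_extension(mut self, ext_name: &str, ext_metadata: &str) -> Self {
        self.metadata
            .insert(EXTENSION_NAME_KEY.to_string(), ext_name.to_string());
        self.metadata
            .insert(EXTENSION_METADATA_KEY.to_string(), ext_metadata.to_string());
        self
    }

    /// Returns the extension type name, or `None` for a plain Arrow type.
    pub fn extension_name(&self) -> Option<&str> {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
    }
}
```

With this approach, code that only understands physical types keeps working on `storage_type`, while UDT-aware code checks `extension_name()`.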
@alamb I read the discussion on data types in #7421. I understand that arrow-rs has no concept of logical types. Rather than adding an ExtensionType to Arrow's DataType, I am more inclined to introduce a LogicalType in DF, as discussed in #7421. My thoughts are as follows:
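A minimal sketch of what separating a DF-side logical type from the physical type could look like, assuming a hypothetical `PhysicalType` enum that stands in for a small subset of Arrow's `DataType`; neither enum is an existing DataFusion API.

```rust
/// Hypothetical physical types, standing in for a subset of Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalType {
    Int64,
    Float64,
    Utf8,
    FixedSizeList(Box<PhysicalType>, usize),
}

/// Hypothetical DataFusion-side logical type: either a built-in type or a
/// user-defined type that is stored using some physical type.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalType {
    Native(PhysicalType),
    UserDefined { name: String, storage: PhysicalType },
}

impl LogicalType {
    /// The physical representation that execution-time kernels operate on.
    pub fn physical_type(&self) -> &PhysicalType {
        match self {
            LogicalType::Native(p) => p,
            LogicalType::UserDefined { storage, .. } => storage,
        }
    }

    /// The name shown to users, e.g. in error messages or schema output.
    pub fn display_name(&self) -> String {
        match self {
            LogicalType::Native(p) => format!("{:?}", p),
            LogicalType::UserDefined { name, .. } => name.clone(),
        }
    }
}
```

The design point is that two UDTs sharing the same storage (for example two different `FixedSizeList<Float64, 2>` types) remain distinguishable during planning, while kernels only ever see `physical_type()`.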
Finally, in my opinion, both of the following plans are feasible:
I look forward to hearing more suggestions on which plan to implement in the end.
I think we should support ExtensionType for TableProvider (i.e. defining a column with some extension type), whichever plan we choose in the end.
TableProvider returns a Schema defined in Arrow, so would we need another Schema type with LogicalType in DF?
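To make the question concrete, here is a minimal sketch of a hypothetical DF-side `LogicalSchema` that lowers to a physical (Arrow-style) schema by replacing each user-defined type with its storage type. Type names are plain strings here for brevity; none of these types exist in DataFusion.

```rust
/// Hypothetical logical column type used by a DataFusion-side schema.
#[derive(Debug, Clone, PartialEq)]
pub enum LogicalType {
    Native(String),
    UserDefined { name: String, storage: String },
}

/// Hypothetical DataFusion-side schema: column names paired with logical types.
#[derive(Debug, Clone)]
pub struct LogicalSchema {
    pub fields: Vec<(String, LogicalType)>,
}

impl LogicalSchema {
    /// Lower to a physical schema, returned as (column name, physical type)
    /// pairs, by erasing every user-defined type down to its storage type.
    pub fn to_physical(&self) -> Vec<(String, String)> {
        self.fields
            .iter()
            .map(|(name, ty)| {
                let physical = match ty {
                    LogicalType::Native(t) => t.clone(),
                    LogicalType::UserDefined { storage, .. } => storage.clone(),
                };
                (name.clone(), physical)
            })
            .collect()
    }
}
```

Note that this lowering is lossy: the UDT name disappears, so round-tripping through an Arrow `Schema` would require carrying it separately (for example in field metadata).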
I think that …
I am not sure about how large the change is, but my intuition is that it would be substantial. However, the only way to find out I think would be to try.
I agree they are both feasible, though they come with different tradeoffs. I think the next step is probably to prototype one of the approaches with a technical spike (aka make a PR showing what it would look like, to get a sense of the API as well as what would need to be changed).
@alamb I very much agree with you. I will try it next, and if it proves feasible, I will submit draft PRs step by step.
New proposal: #12644
Is your feature request related to a problem or challenge?
I've noticed several existing issues about adding extension types to DataFusion.
Providing an interface for adding extension types to DataFusion would be valuable: it would allow applications built on DataFusion to easily incorporate business-specific data types.
I hope to promote the development of the UDT feature through this current proposal.
Describe the solution you'd like
User-Defined Types (UDT)
UDT stands for User-Defined Type. It is a feature in database systems that allows users to define their own custom data types based on existing data types provided by the database. This feature enables users to create data structures tailored to their specific needs, providing a higher level of abstraction and organization for complex data.
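As an illustration of "custom data types based on existing data types", the following is a minimal sketch of a composite UDT built from a hypothetical set of base types, with basic validation. The `BaseType` and `UserDefinedType` names are assumptions for illustration only.

```rust
use std::collections::HashSet;

/// Built-in types a UDT may be composed from (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum BaseType {
    Int64,
    Float64,
    Utf8,
}

/// A user-defined composite type, e.g. `point(x Float64, y Float64)`.
#[derive(Debug, Clone, PartialEq)]
pub struct UserDefinedType {
    pub name: String,
    pub fields: Vec<(String, BaseType)>,
}

impl UserDefinedType {
    /// Build a UDT, rejecting empty definitions and duplicate field names.
    pub fn new(name: &str, fields: Vec<(&str, BaseType)>) -> Result<Self, String> {
        if fields.is_empty() {
            return Err(format!("type {name} must have at least one field"));
        }
        let mut seen = HashSet::new();
        for (field, _) in &fields {
            // `insert` returns false if the field name was already present.
            if !seen.insert(*field) {
                return Err(format!("duplicate field {field} in type {name}"));
            }
        }
        Ok(UserDefinedType {
            name: name.to_string(),
            fields: fields
                .into_iter()
                .map(|(f, t)| (f.to_string(), t))
                .collect(),
        })
    }
}
```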
Syntax
Behaviors
Behaviors of Data Types
Behaviors of Data
Role of Data Types in the SQL Lifecycle
SQL Statement String -> AST
None
AST -> Logical Plan
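During the AST to logical plan step, a type name written in SQL has to be resolved to either a built-in type or a registered UDT. The sketch below shows one way this could work, assuming a hypothetical `TypeRegistry` populated by a `CREATE TYPE`-style statement, case-insensitive names, and a tiny hard-coded list of built-in names; it is not DataFusion's planner API.

```rust
use std::collections::HashMap;

/// Hypothetical resolved type: built-in, or a registered UDT with its storage type.
#[derive(Debug, Clone, PartialEq)]
pub enum ResolvedType {
    BuiltIn(String),
    UserDefined { name: String, storage: String },
}

/// Hypothetical registry consulted while converting the AST to a logical plan.
#[derive(Default)]
pub struct TypeRegistry {
    udts: HashMap<String, String>, // lower-cased UDT name -> storage type
}

impl TypeRegistry {
    /// Register a UDT, e.g. in response to a `CREATE TYPE` statement.
    pub fn register(&mut self, name: &str, storage: &str) -> Result<(), String> {
        let key = name.to_lowercase();
        if self.udts.contains_key(&key) {
            return Err(format!("type {name} already exists"));
        }
        self.udts.insert(key, storage.to_string());
        Ok(())
    }

    /// Resolve a type name written in SQL (e.g. the target of `CAST(c AS point)`).
    pub fn resolve(&self, sql_name: &str) -> Result<ResolvedType, String> {
        const BUILT_INS: [&str; 3] = ["bigint", "double", "varchar"];
        let key = sql_name.to_lowercase();
        if BUILT_INS.contains(&key.as_str()) {
            return Ok(ResolvedType::BuiltIn(key));
        }
        match self.udts.get(&key) {
            Some(storage) => Ok(ResolvedType::UserDefined {
                name: key,
                storage: storage.clone(),
            }),
            None => Err(format!("unknown type {sql_name}")),
        }
    }
}
```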
Logical Plan -> Execution Plan
None
Execution Plan -> ResultSet
Core Structures
Examples
create udt
geoarrow
https://github.com/geoarrow/geoarrow/blob/main/extension-types.md
Point
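As we understand the geoarrow specification linked above, a point column uses the extension name `geoarrow.point`, and one permitted storage layout is interleaved coordinates in a `FixedSizeList<Float64, 2>` (x, y). The sketch below models only that interleaved values buffer with plain vectors; the `Point` struct and helper functions are hypothetical, and the separated (struct-of-x/y) layout is not shown.

```rust
/// Extension name used for point geometries in the geoarrow spec.
pub const GEOARROW_POINT: &str = "geoarrow.point";

/// A 2-D point as seen by users of the UDT.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Point {
    pub x: f64,
    pub y: f64,
}

/// Flatten points into one interleaved buffer [x0, y0, x1, y1, ...],
/// mirroring the child values of a FixedSizeList<Float64, 2> storage array.
pub fn to_interleaved(points: &[Point]) -> Vec<f64> {
    points.iter().flat_map(|p| [p.x, p.y]).collect()
}

/// Rebuild points from an interleaved buffer; fails if a coordinate is missing.
pub fn from_interleaved(coords: &[f64]) -> Result<Vec<Point>, String> {
    if coords.len() % 2 != 0 {
        return Err(format!(
            "expected an even number of coordinates, got {}",
            coords.len()
        ));
    }
    Ok(coords
        .chunks_exact(2)
        .map(|c| Point { x: c[0], y: c[1] })
        .collect())
}
```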
Questions
Describe alternatives you've considered
No response
Additional context
@alamb I am particularly eager to receive your feedback or suggestions on this proposal. I also encourage anyone familiar with or interested in this feature to share ideas for improving it.