-
Is there any technique to return a complex type from a vector UDF? I've been reviewing the available "VectorUdf" functions defined in ArrowFunctions and DataFrameFunctions, but I cannot see how a complex type would be returned. I really like the way we can return a "Microsoft.Spark.Sql.GenericRow" from a standard Udf function, but that seems slow. I'd rather operate on larger batches of data if possible (i.e. Spark partitions). Any help would be appreciated. I haven't been able to find anything on my own yet.

I discovered an Apache Arrow data type called "Apache.Arrow.StructArray", and I'm wondering if it could be used by way of one of the implementations of VectorUdf in ArrowFunctions. However, there are virtually no references to "Apache.Arrow.StructArray" in the project, and no examples.

Another approach I've used to return complex data types from a large batch of a dataframe is relying on GroupBy/Apply in RelationalGroupedDataset. But I think I'm overusing/abusing that feature.
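For reference, the row-at-a-time approach mentioned above (a standard Udf returning a "Microsoft.Spark.Sql.GenericRow") can be sketched roughly as below. This is only a minimal sketch, assuming the `Udf<T>(Func<T, GenericRow>, StructType)` overload in `Microsoft.Spark.Sql.Functions`; the input data, the column names (`name`, `info`, `upper`, `length`) and the app name are made up for illustration. It does not address the vector UDF case asked about here.

```csharp
using System;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("row-udf-sketch") // hypothetical app name
            .GetOrCreate();

        // Hypothetical two-row input with a single string column "name".
        DataFrame df = spark.Sql(
            "SELECT 'alice' AS name UNION ALL SELECT 'bob' AS name");

        // Schema of the struct returned by the UDF (field names are illustrative).
        var schema = new StructType(new[]
        {
            new StructField("upper", new StringType()),
            new StructField("length", new IntegerType())
        });

        // Standard (non-vector) UDF: invoked once per row, returns a GenericRow
        // whose values line up with the fields of the schema above.
        Func<Column, Column> describe = Udf<string>(
            name => new GenericRow(
                new object[] { name.ToUpperInvariant(), name.Length }),
            schema);

        // Attach the struct column, then flatten its fields into plain columns.
        df.Select(Col("name"), describe(Col("name")).Alias("info"))
          .Select("name", "info.upper", "info.length")
          .Show();

        spark.Stop();
    }
}
```

Because the delegate runs once per row, this is the version that "seems slow" compared with operating on whole batches.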
Replies: 3 comments
-
Here is the error I get when attempting to use ArrowFunctions to return a complex type (Apache.Arrow.StructArray):
-
Can you offer an example of this?
I stumbled on a reference to the same thing in the issues list (issue #826).
It looks like this is a work in progress...
I'm guessing there isn't any other straightforward way to return complex data from a vector UDF. Will close this discussion for now.