Assuming I return a row from a UDF like so... `TestUdfWithReturnAsRowType()`: ... is there a way to cast the collection of row datatypes back into a DataFrame? The only thing I've managed to do is this weird workaround where I collect the column and then send the `GenericRow` collection back through `spark.CreateDataFrame`. See below.
This seems very inefficient. I'd rather not collect any data back to the driver at all if I can avoid it. I just want the `GenericRow` returned by my UDF to be treated as a distributed DataFrame. Hopefully this is clear. Any pointers would be appreciated.
Replies: 1 comment 3 replies
For the example in the test you shared:

`Row[] rows = _df.Select(udf(nameCol).As("col"), nameCol).Collect().ToArray();`

`_df.Select(udf(nameCol).As("col"), nameCol)` already returns a `DataFrame`, so I am not sure what you mean by "cast the collection of row datatypes back into a dataframe". The column type of `udf(nameCol).As("col")` is the schema you specify in the UDF, so you can keep operating on the DataFrame based on that schema. Try running `_df.Select(udf(nameCol).As("col"), nameCol).PrintSchema()` to see the schema of the DataFrame after applying the UDF.
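To illustrate the point, here is a minimal sketch of staying on the cluster: the UDF result is kept as a struct column, and its fields are then expanded into top-level columns with `Select("col.*")`, with no `Collect()` involved. The input data, the field names `col1` and `col2`, and the UDF body are made up for illustration and are not taken from the original test; the sketch also assumes a working .NET for Apache Spark setup.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Spark.Sql;
using Microsoft.Spark.Sql.Types;
using static Microsoft.Spark.Sql.Functions;

class Program
{
    static void Main()
    {
        SparkSession spark = SparkSession
            .Builder()
            .AppName("udf-row-example") // hypothetical app name
            .GetOrCreate();

        // Hypothetical input: a single string column called "name".
        var inputSchema = new StructType(new[]
        {
            new StructField("name", new StringType())
        });
        DataFrame df = spark.CreateDataFrame(
            new List<GenericRow>
            {
                new GenericRow(new object[] { "Michael" }),
                new GenericRow(new object[] { "Andy" })
            },
            inputSchema);

        // Schema of the row returned by the UDF (field names are assumptions).
        var resultSchema = new StructType(new[]
        {
            new StructField("col1", new IntegerType()),
            new StructField("col2", new StringType())
        });

        // The UDF returns a GenericRow that matches resultSchema.
        Func<Column, Column> udf = Udf<string>(
            name => new GenericRow(new object[] { name.Length, name.ToUpper() }),
            resultSchema);

        // The result is still a DataFrame; "col" is a struct column.
        DataFrame withStruct = df.Select(udf(df["name"]).As("col"), df["name"]);
        withStruct.PrintSchema();

        // Expand the struct fields into top-level columns, still distributed.
        DataFrame flattened = withStruct.Select("col.*", "name");
        flattened.Show();

        spark.Stop();
    }
}
```

Individual fields can also be addressed by their dotted path, for example `withStruct.Select("col.col1")`, if only some of the struct's fields are needed.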