Maybe you can do something like spark/src/csharp/Microsoft.Spark.UnitTest/WorkerFunctionTests.cs (lines 94 to 96 in a6f4e91)?
@pgovind, do you know an easy way to construct
-
Can you tell me how you are trying to construct an
Current options:
-
@pgovind Consider this example...
The example performs a group/apply that results in a dataframe with two integer values. Notice how those two integer columns were constructed using the "Append" method. Given that pattern, suppose a developer decided they wanted to return one additional column containing an arbitrary string (ASDFC). This is a surprisingly difficult thing to accomplish!
Without having a very deep understanding of how either works, I find myself wondering if RecordBatches are the preferred class (rather than FxDataFrames) for these group/apply vector UDFs. Today I initially started trying the FxDataFrames, but I should probably test the behavior of RecordBatches as well, since they seem fairly interchangeable in the context of group/apply.
In short, I do agree with you that a discoverable API is needed (rather than using Reflection and calling hidden members). Otherwise these string columns don't feel very approachable for the casual developer. I suspect the need for option 3 will come up quite regularly.
Today I started writing some code that called the Append() method on an ASDFC; it neither worked nor threw an exception. It turns out I was unintentionally running the extension method from IEnumerable! ;-) Maybe your proposed API could simply involve an additional builder class or something like that...
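For anyone who hits the same silent no-op, here is a minimal sketch of the pitfall. It uses a plain `List<string>` as a stand-in for any type that implements `IEnumerable<string>` and has no instance `Append(string)` of its own; that the column type behaves the same way is an assumption based on the behavior described above. The LINQ extension `Enumerable.Append` returns a new sequence instead of mutating the source, so discarding its result changes nothing and raises no error.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in for any IEnumerable<string> without its own Append(string) method.
var names = new List<string> { "a", "b" };

// Binds to the LINQ extension Enumerable.Append<string>. It builds a new lazy
// sequence and leaves the source untouched; the result is discarded here.
names.Append("c");
Console.WriteLine(names.Count); // prints 2

// The appended element only exists in the returned sequence.
IEnumerable<string> extended = names.Append("c");
Console.WriteLine(extended.Count()); // prints 3
```

So a call that compiles and runs without complaint can still leave the original collection unchanged.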
-
I'm trying to write a UDF that returns an ArrowStringDataFrameColumn with arbitrary strings. It doesn't seem straightforward.
Can someone please point me to some sample code?
Here I am able to just clone another string column:
but if I want to build my own results from scratch, I keep getting an exception that says the array is immutable.
The only thing I've been able to find in my googling is this code that initializes the string data in an obscure way (link below). It is hard to believe this would be the only way to prepare string data.
https://github.com/dotnet/corefxlab/blob/e0c8705741ad2b66a1bbc70ff5bf5e02886eca1e/tests/Microsoft.Data.Analysis.Tests/DataFrameTests.cs#L32
Hopefully I'm on the right track. I initially tried to use "StringDataFrameColumn", but that doesn't seem to be the right class, since my UDFs reject it in every way.
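For context on why the buffer-based initialization looks obscure: the Apache Arrow format stores a string column as three buffers (UTF-8 value bytes, 32-bit offsets, and a validity bitmap). Below is a rough sketch of packing arbitrary strings into that layout using only the base class library. `BuildArrowStringBuffers` is a hypothetical helper written for illustration, not part of any library, and the sketch skips the buffer padding that Arrow recommends. Whether and how the resulting buffers can be handed to an ArrowStringDataFrameColumn constructor is an assumption to verify against the package version in use.

```csharp
using System;
using System.Buffers.Binary;
using System.Collections.Generic;
using System.Text;

var buffers = BuildArrowStringBuffers(new[] { "foo", null, "quux" });
Console.WriteLine(buffers.Values.Length); // prints 7 (3 bytes + 0 bytes + 4 bytes)
Console.WriteLine(buffers.NullCount);     // prints 1
Console.WriteLine(buffers.NullBits[0]);   // prints 5 (bits 0 and 2 are set)

// Hypothetical helper: packs strings into Arrow's variable-length string layout.
static (byte[] Values, byte[] Offsets, byte[] NullBits, int NullCount)
    BuildArrowStringBuffers(IReadOnlyList<string> items)
{
    var data = new List<byte>();
    // One little-endian int32 per item plus a leading 0; offsets[i + 1] is the end of item i.
    var offsetBytes = new byte[(items.Count + 1) * sizeof(int)];
    // One validity bit per item, least significant bit first.
    var validity = new byte[(items.Count + 7) / 8];
    int nulls = 0;

    for (int i = 0; i < items.Count; i++)
    {
        if (items[i] is null)
        {
            nulls++; // validity bit stays 0 and no bytes are added
        }
        else
        {
            data.AddRange(Encoding.UTF8.GetBytes(items[i]));
            validity[i / 8] |= (byte)(1 << (i % 8));
        }
        BinaryPrimitives.WriteInt32LittleEndian(
            offsetBytes.AsSpan((i + 1) * sizeof(int)), data.Count);
    }

    return (data.ToArray(), offsetBytes, validity, nulls);
}
```

For the three inputs above, the offsets decode to 0, 3, 3, 7: the null entry contributes no bytes, so its start and end offsets coincide.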