GH-37653: [MATLAB] Add arrow.array.StructArray MATLAB class #37806
Thanks for the PR! It's really nice to see nested type support being added.
matlab/src/matlab/+arrow/+internal/+test/+tabular/createAllSupportedArrayTypes.m
44b2d89 to e913f15
+1

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7b30ba4. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.
…pache#37806)

### Rationale for this change

Now that many of the commonly used "primitive" array types have been added to the MATLAB Interface, we can implement the `arrow.array.StructArray` class.

### What changes are included in this PR?

Added the `arrow.array.StructArray` MATLAB class.

*Methods* of `arrow.array.StructArray` include:

- `fromArrays(arrays, nvpairs)`
- `field(i)` -> get field `i` as an `arrow.array.Array`. `i` can be a positive integer or a field name.
- `toMATLAB()` -> convert to a MATLAB `table`
- `table()` -> convert to a MATLAB `table`

*Properties* of `arrow.array.StructArray` include:

- `Type`
- `Length`
- `NumFields`
- `FieldNames`
- `Valid`

**Example Usage**

```matlab
>> a = arrow.array([1, 2, 3, 4]);
>> b = arrow.array(["A", "B", "C", "D"]);
>> s = arrow.array.StructArray.fromArrays(a, b, FieldNames=["A", "B"])

s =

-- is_valid: all not null
-- child 0 type: double
  [
    1,
    2,
    3,
    4
  ]
-- child 1 type: string
  [
    "A",
    "B",
    "C",
    "D"
  ]

% Convert StructArray to a MATLAB table
>> t = toMATLAB(s)

t =

  4×2 table

    A     B
    _    ___

    1    "A"
    2    "B"
    3    "C"
    4    "D"
```

### Are these changes tested?

Yes. Added a new test class `tStructArray.m`.

### Are there any user-facing changes?

Yes. Users can now construct an `arrow.array.StructArray` instance.

### Notes

1. Although [`struct`](https://www.mathworks.com/help/matlab/ref/struct.html) is a MATLAB datatype, `StructArray`'s `toMATLAB` method returns a MATLAB `table`. We went with this design because the layout of MATLAB `table`s more closely resembles that of `StructArray`s: MATLAB `table`s ensure a consistent schema, and the data is laid out in a columnar format. In a future PR, we plan on adding a `struct` method to `StructArray`, which will return a MATLAB `struct` array.
2. I removed the virtual `toMATLAB` method from `proxy::Array` because the nested array MATLAB classes will implement their `toMATLAB` methods by invoking the `toMATLAB` method on their field arrays. There's no need for the C++ proxy classes of nested arrays to have a `toMATLAB` method.

### Future Directions

1. Add a `fromMATLAB` static method to create `StructArray`s from MATLAB `table`s and MATLAB `struct` arrays.
2. Add a `fromTable` static method to create `StructArray`s from `arrow.tabular.Table`s.
3. Add a `fromRecordBatch` static method to create `StructArray`s from `arrow.tabular.RecordBatch`s.

* Closes: apache#37653

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
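The commit message lists `field(i)`, `table()`, and several properties but only demonstrates `fromArrays` and `toMATLAB`. The following is a minimal sketch of how the remaining members might be used, assuming the names and behavior are exactly as listed in the description above (later releases of the MATLAB interface may differ). The variable names are illustrative only.

```matlab
% Build the same two-field StructArray used in the commit message example.
a = arrow.array([1, 2, 3, 4]);
b = arrow.array(["A", "B", "C", "D"]);
s = arrow.array.StructArray.fromArrays(a, b, FieldNames=["A", "B"]);

% Read the properties named in the description.
numFields  = s.NumFields;   % number of child arrays
fieldNames = s.FieldNames;  % names passed via FieldNames
len        = s.Length;      % number of struct elements

% Fetch a child array by position or by field name.
firstChild  = s.field(1);
secondChild = s.field("B");

% Per the description, table(s) and toMATLAB(s) both return a MATLAB table.
t1 = table(s);
t2 = toMATLAB(s);
```

Because each child is itself an `arrow.array.Array`, calling `toMATLAB` on `firstChild` or `secondChild` converts just that column, which is the delegation that Note 2 describes.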