GH-37653: [MATLAB] Add arrow.array.StructArray MATLAB class #37806

Merged
merged 33 commits into from
Sep 20, 2023

Conversation

@sgilmore10 (Member) commented Sep 20, 2023

Rationale for this change

Now that many of the commonly-used "primitive" array types have been added to the MATLAB Interface, we can implement the arrow.array.StructArray class.

What changes are included in this PR?

Added arrow.array.StructArray MATLAB class.

Methods of arrow.array.StructArray include:

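  • fromArrays(arrays, nvpairs) -> construct a StructArray from one or more arrow.array.Array field arrays, with name-value options such as FieldNames
  • field(i) -> get a field as an arrow.array.Array. i can be a positive integer index or a field name.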
  • toMATLAB() -> convert to a MATLAB table
  • table() -> convert to a MATLAB table

Properties of arrow.array.StructArray include:

  • Type
  • Length
  • NumFields
  • FieldNames
  • Valid

Example Usage

>> a = arrow.array([1, 2, 3, 4]);
>> b = arrow.array(["A", "B", "C", "D"]);
>> s = arrow.array.StructArray.fromArrays(a, b, FieldNames=["A", "B"])
s = 

-- is_valid: all not null
-- child 0 type: double
  [
    1,
    2,
    3,
    4
  ]
-- child 1 type: string
  [
    "A",
    "B",
    "C",
    "D"
  ]

% Convert StructArray to a MATLAB table
>> t = toMATLAB(s)

t =

  4×2 table

    A     B 
    _    ___

    1    "A"
    2    "B"
    3    "C"
    4    "D"

Are these changes tested?

Yes. Added a new test class tStructArray.m

Are there any user-facing changes?

Yes. Users can now construct an arrow.array.StructArray instance.

Notes

  1. Although struct is a MATLAB datatype, StructArray's toMATLAB method returns a MATLAB table. We went with this design because the layout of MATLAB tables more closely resembles StructArrays. MATLAB tables ensure a consistent schema and the data is laid out in a columnar format. In a future PR, we plan on adding a struct method to StructArray, which will return a MATLAB struct array.
  2. I removed the virtual toMATLAB method from proxy::Array because the MATLAB classes for nested arrays will implement their toMATLAB methods by invoking the toMATLAB method on their field arrays. There's no need for the C++ proxy classes of nested arrays to have a toMATLAB method.
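Note 2 describes a delegation pattern. The MATLAB class for a nested array builds its toMATLAB result from the toMATLAB results of its field arrays, so the C++ proxy does not need its own conversion. The script below is a minimal illustration of that pattern. It uses only the public field, NumFields, and FieldNames members listed above. It is not the actual StructArray implementation, and the field names and data are hypothetical.

```matlab
% Hypothetical two-field StructArray for illustration.
s = arrow.array.StructArray.fromArrays(arrow.array([10, 20, 30]), ...
    arrow.array(["x", "y", "z"]), FieldNames=["Num", "Str"]);

% Convert each field array to MATLAB independently...
columns = cell(1, s.NumFields);
for ii = 1:s.NumFields
    columns{ii} = toMATLAB(s.field(ii));
end

% ...then assemble the per-field results into one table, one variable per field.
t = table(columns{:}, VariableNames=s.FieldNames);
```

Because each field handles its own conversion, a nested field would convert itself the same way. This is why the nested array's MATLAB class, rather than its C++ proxy, is the natural place for toMATLAB.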

Future Directions

  1. Add a fromMATLAB static method to create StructArrays from MATLAB tables and MATLAB struct arrays.
  2. Add a fromTable static method to create StructArrays from arrow.tabular.Table instances
  3. Add a fromRecordBatch static method to create StructArrays from arrow.tabular.RecordBatch instances

@sgilmore10 marked this pull request as ready for review September 20, 2023 16:29

@kevingurney (Member) left a comment

Thanks for the PR! It's really nice to see nested type support being added.

matlab/src/cpp/arrow/matlab/error/error.h (outdated, resolved)
matlab/src/matlab/+arrow/+array/StructArray.m (outdated, resolved)
matlab/src/matlab/+arrow/+array/StructArray.m (outdated, resolved)
matlab/src/matlab/+arrow/+internal/+validate/parseValid.m (outdated, resolved)
matlab/src/matlab/+arrow/+type/StructType.m (resolved)
matlab/test/arrow/array/tStructArray.m (outdated, resolved)
matlab/test/arrow/array/tStructArray.m (resolved)
@github-actions bot added the awaiting changes label and removed the awaiting review label Sep 20, 2023
@github-actions bot removed the awaiting changes label Sep 20, 2023
@github-actions bot added the awaiting change review label Sep 20, 2023
@github-actions bot added the awaiting merge label and removed the awaiting change review label Sep 20, 2023
@kevingurney (Member) commented

+1

@kevingurney merged commit 7b30ba4 into apache:main Sep 20, 2023
7 of 9 checks passed
@kevingurney deleted the GH-37653 branch September 20, 2023 19:43
@kevingurney removed the awaiting merge label Sep 20, 2023
@conbench-apache-arrow commented

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 7b30ba4.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…pache#37806)

### Rationale for this change

Now that many of the commonly-used "primitive" array types have been added to the MATLAB Interface, we can implement the `arrow.array.StructArray` class.

### What changes are included in this PR?

Added `arrow.array.StructArray` MATLAB class. 

*Methods* of `arrow.array.StructArray` include: 

- `fromArrays(arrays, nvpairs)`
- `field(i)` -> get a field as an `arrow.array.Array`. `i` can be a positive integer index or a field name.
- `toMATLAB()` -> convert to a MATLAB `table`
- `table()` -> convert to a MATLAB `table`

*Properties* of `arrow.array.StructArray` include:

- `Type`
- `Length`
- `NumFields`
- `FieldNames`
- `Valid`

**Example Usage**
```matlab
>> a = arrow.array([1, 2, 3, 4]);
>> b = arrow.array(["A", "B", "C", "D"]);
>> s = arrow.array.StructArray.fromArrays(a, b, FieldNames=["A", "B"])
s = 

-- is_valid: all not null
-- child 0 type: double
  [
    1,
    2,
    3,
    4
  ]
-- child 1 type: string
  [
    "A",
    "B",
    "C",
    "D"
  ]

% Convert StructArray to a MATLAB table
>> t = toMATLAB(s)

t =

  4×2 table

    A     B 
    _    ___

    1    "A"
    2    "B"
    3    "C"
    4    "D"
```

### Are these changes tested?

Yes. Added a new test class `tStructArray.m`

### Are there any user-facing changes?

Yes. Users can now construct an `arrow.array.StructArray` instance. 

### Notes

1. Although [`struct`](https://www.mathworks.com/help/matlab/ref/struct.html) is a MATLAB datatype, `StructArray`'s `toMATLAB` method returns a MATLAB `table`. We went with this design because the layout of MATLAB `table`s more closely resembles `StructArray`s. MATLAB `table`s ensure a consistent schema and the data is laid out in a columnar format. In a future PR, we plan on adding a `struct` method to `StructArray`, which will return a MATLAB `struct` array.
2. I removed the virtual `toMATLAB` method from `proxy::Array` because the MATLAB classes for nested arrays will implement their `toMATLAB` methods by invoking the `toMATLAB` method on their field arrays. There's no need for the C++ proxy classes of nested arrays to have a `toMATLAB` method.

### Future Directions
1. Add a `fromMATLAB` static method to create `StructArray`s from MATLAB `table`s and MATLAB `struct` arrays.
2. Add a `fromTable` static method to create `StructArray`s from `arrow.tabular.Table`s
3. Add a `fromRecordBatch` static method to create `StructArray`s from `arrow.tabular.RecordBatch`s

* Closes: apache#37653 

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
Development

Successfully merging this pull request may close these issues.

[MATLAB] Add arrow.array.StructArray MATLAB class
2 participants