GH-38417: [MATLAB] Implement a TableTypeValidator class that validates a MATLAB cell array contains only tables that share the same schema #38551

Merged: 21 commits into apache:main on Nov 2, 2023

@sgilmore10 (Member) commented on Nov 1, 2023

Rationale for this change

This is a follow-up to #38533.

Adding this TableValidator class is a step towards implementing the arrow.array.ListArray.fromMATLAB method for creating ListArrays whose underlying Values array is a StructArray.

This validator ensures that all tables in a cell array share the same schema when creating a ListArray of Structs. This requirement guarantees that the tables in the cell array can be vertically concatenated with vertcat. For example, two tables with different VariableNames cannot be concatenated:

```matlab
>> t1 = table(1, 2, VariableNames=["A", "B"]);
>> t2 = table(3, 4, VariableNames=["C", "D"]);
>> vertcat(t1, t2)
Error using tabular/vertcat
All tables being vertically concatenated must have the same variable names.
```
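
As a rough illustration in plain MATLAB (this is not the Arrow API), the snippet below checks that each table in a cell array has the same variable names as the first one and concatenates only the matching tables. It compares variable names only; checking variable types is the job of TableValidator's per-variable validators. The variable names tables, namesMatch, and combined are hypothetical.

```matlab
% Hypothetical sketch: compare every table's variable names to those of the
% first table, then vertcat only the tables whose names match.
tables = {table(1, 2, VariableNames=["A", "B"]), ...
          table(3, 4, VariableNames=["A", "B"]), ...
          table(5, 6, VariableNames=["C", "D"])};

expectedNames = string(tables{1}.Properties.VariableNames);
namesMatch = cellfun(@(t) isequal(string(t.Properties.VariableNames), expectedNames), tables);
% namesMatch is [true true false]: the third table uses C and D instead of A and B

combined = vertcat(tables{namesMatch});   % 2x2 table with variables A and B
```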

What changes are included in this PR?

Modified arrow.array.internal.list.Validator to inherit from matlab.mixin.Heterogeneous. Doing so enables creating an array whose elements are different subclasses of arrow.array.internal.list.Validator.

Added a new MATLAB class arrow.array.internal.list.TableValidator, which inherits from arrow.array.internal.list.Validator. This class has two properties: VariableNames and VariableValidators.

VariableNames is a string array containing the expected variable names of all tables.

VariableValidators is an array of arrow.array.internal.list.Validator objects, in which each element corresponds to one variable in a table. This array is used to validate that table variables have the expected type and configuration.
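
A simplified sketch of how per-variable validators might be derived from a prototype table. This is not how arrow.array.internal.list.TableValidator is implemented: here each validator is a plain function handle that only checks a variable's class with isa, whereas the validators in this PR are Validator objects that can also check configuration. The names prototype, candidate, and results are hypothetical.

```matlab
% Hypothetical sketch: build one class-checking function handle per variable
% of a prototype table, then apply them to a candidate table.
prototype = table([1; 2], ["a"; "b"], VariableNames=["Number", "Label"]);

variableNames = string(prototype.Properties.VariableNames);
variableValidators = cell(1, width(prototype));
for ii = 1:width(prototype)
    expectedClass = class(prototype.(ii));   % e.g. 'double' or 'string'
    variableValidators{ii} = @(value) isa(value, expectedClass);
end

candidate = table([3; 4], [5; 6], VariableNames=["Number", "Label"]);
results = cellfun(@(check, name) check(candidate.(name)), ...
                  variableValidators, cellstr(variableNames));
% results is [true false]: Label holds doubles instead of strings
```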

TableValidator's validateElement method uses both its VariableNames and VariableValidators properties to validate that its input is a table with the expected schema. If it is not, validateElement throws an error.
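
The checks described above can be sketched as a plain MATLAB loop. The expected class list, the sample elements, and the demo:* error identifiers are hypothetical; the actual validateElement method relies on its VariableValidators objects rather than a hard-coded list of class names.

```matlab
% Hypothetical sketch of validateElement-style checks: is the element a table,
% does it have the expected variable names, and does each variable have the
% expected class? Each failure throws an error, which is caught and recorded.
expectedNames = ["A", "B"];
expectedClasses = ["double", "string"];

elements = {table(1, "x", VariableNames=["A", "B"]), ...
            table(2, 3, VariableNames=["A", "B"]), ...
            table(4, "y", VariableNames=["C", "D"]), ...
            {5, "z"}};

messages = strings(1, numel(elements));
for ii = 1:numel(elements)
    element = elements{ii};
    try
        if ~istable(element)
            error('demo:NotTable', 'Expected a table.');
        end
        if ~isequal(string(element.Properties.VariableNames), expectedNames)
            error('demo:VariableNamesMismatch', 'Expected variable names: %s.', ...
                  char(strjoin(expectedNames, ", ")));
        end
        for jj = 1:width(element)
            if ~isa(element.(jj), char(expectedClasses(jj)))
                error('demo:ClassMismatch', 'Variable %s must be of class %s.', ...
                      char(expectedNames(jj)), char(expectedClasses(jj)));
            end
        end
        messages(ii) = "valid";
    catch err
        messages(ii) = err.identifier;   % record which check failed
    end
end
```

For these four elements, messages records "valid" for the first table, then the class, name, and not-a-table failures in that order.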

Lastly, I added a gateway function called arrow.array.internal.list.createValidator, which creates the appropriate Validator subclass based on its input. If no suitable Validator subclass exists for the input, createValidator throws an error.
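
A rough sketch of the dispatch idea behind a gateway function like createValidator: inspect a sample input, choose a validator kind, or throw an error when none applies. The kind labels ("table", "class"), the set of accepted classes, and the demo:UnsupportedType identifier are hypothetical; the actual function creates Validator subclass instances rather than returning labels.

```matlab
% Hypothetical sketch: map each sample input to a validator kind, or raise an
% error for inputs that no validator supports.
samples = {[1 2 3], "text", table(1, VariableNames="A"), {1, 2}};

kinds = strings(1, numel(samples));
for ii = 1:numel(samples)
    sample = samples{ii};
    try
        if istable(sample)
            kinds(ii) = "table";   % would need a schema-aware validator
        elseif isnumeric(sample) || islogical(sample) || isstring(sample)
            kinds(ii) = "class";   % a class check alone would suffice
        else
            error('demo:UnsupportedType', 'No validator exists for class "%s".', class(sample));
        end
    catch err
        kinds(ii) = err.identifier;
    end
end
% kinds is ["class", "class", "table", "demo:UnsupportedType"]
```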

Are these changes tested?

Yes. Added two new test classes: tTableValidator.m and tCreateValidator.m.

Are there any user-facing changes?

No.

Future Directions:

  1. [MATLAB] Implement fromMATLAB method for arrow.array.ListArray. #38354

@kevingurney (Member) left a comment

Looks good! Thank you for working on this!

github-actions bot added the awaiting changes label and removed the awaiting review label on Nov 2, 2023
github-actions bot removed the awaiting changes label on Nov 2, 2023
github-actions bot added the awaiting change review label on Nov 2, 2023
@kevingurney (Member) commented

+1

@kevingurney merged commit 56edb2d into apache:main on Nov 2, 2023 (9 checks passed)
@kevingurney deleted the GH-38417 branch on Nov 2, 2023, 19:23
@kevingurney removed the awaiting change review label on Nov 2, 2023
github-actions bot added the awaiting merge label on Nov 2, 2023

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 56edb2d.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…validates a MATLAB `cell` array contains only `table`s that share the same schema (apache#38551)

* Closes: apache#38417

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024