[MATLAB] Implement fromMATLAB method for arrow.array.ListArray. #38354

Closed
kevingurney opened this issue Oct 19, 2023 · 1 comment · Fixed by #38561

Comments

@kevingurney

Describe the enhancement requested

In support of #38353, we should implement a static fromMATLAB method for arrow.array.ListArray that takes in a MATLAB cell array and returns an instance of arrow.array.ListArray.

Component(s)

MATLAB

kevingurney added a commit that referenced this issue Oct 23, 2023
### Rationale for this change

Now that many of the commonly-used "primitive" array types have been added to the MATLAB interface, we can implement an `arrow.array.ListArray` class.

This pull request adds a new `arrow.array.ListArray` class, which can be converted to a MATLAB `cell` array by calling its `toMATLAB` method.

### What changes are included in this PR?

1. Added a new `arrow.array.ListArray` MATLAB class.

*Methods*

`cellArray = arrow.array.ListArray.toMATLAB()`
`listArray = arrow.array.ListArray.fromArrays(offsets, values)`

*Properties*

`Offsets` - `Int32Array` list offsets (uses zero-based indexing)
`Values` - Array of values in the list (supports nesting)

2. Added a new `arrow.type.traits.ListTraits` MATLAB class.

**Example**
```matlab
>> offsets = arrow.array(int32([0, 2, 3, 7]))

offsets = 

[
  0,
  2,
  3,
  7
]

>> values = arrow.array(["A", "B", "C", "D", "E", "F", "G"])

values = 

[
  "A",
  "B",
  "C",
  "D",
  "E",
  "F",
  "G"
]

>> arrowArray = arrow.array.ListArray.fromArrays(offsets, values)

arrowArray = 

[
  [
    "A",
    "B"
  ],
  [
    "C"
  ],
  [
    "D",
    "E",
    "F",
    "G"
  ]
]

>> matlabArray = arrowArray.toMATLAB()

matlabArray =

  3x1 cell array

    {2x1 string}
    {["C"     ]}
    {4x1 string}

>> matlabArray{:}

ans = 

  2x1 string array

    "A"
    "B"

ans = 

    "C"

ans = 

  4x1 string array

    "D"
    "E"
    "F"
    "G"

```
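
As a rough illustration of how the zero-based `Offsets` relate to the resulting `cell` array, the slicing can be sketched in plain MATLAB without the Arrow interface. This is only a sketch under the assumption that each list `ii` spans `values(offsets(ii)+1 : offsets(ii+1))` once the offsets are shifted to MATLAB's one-based indexing; the variable names `numLists`, `first`, and `last` are illustrative and not part of the interface.

```matlab
% Zero-based offsets and flat values, as in the example above.
offsets = int32([0, 2, 3, 7]);
values = ["A"; "B"; "C"; "D"; "E"; "F"; "G"];

% N + 1 offsets describe N lists.
numLists = numel(offsets) - 1;
C = cell(numLists, 1);

for ii = 1:numLists
    % Shift the zero-based start offset to one-based indexing.
    first = offsets(ii) + 1;
    % The zero-based exclusive end equals the one-based inclusive end.
    last = offsets(ii + 1);
    C{ii} = values(first:last);
end
```

Each element of `C` is a column vector, which matches the shapes shown in the `toMATLAB` output above.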

### Are these changes tested?

Yes.

1. Added a new `tListArray.m` test class.
2. Added a new `tListTraits.m` test class.
3. Updated `arrow.internal.test.tabular.createAllSupportedArrayTypes` to include `ListArray`.

### Are there any user-facing changes?

Yes.

1. Users can now create an `arrow.array.ListArray` from `offsets` and `values` arrays by calling the static `arrow.array.ListArray.fromArrays(offsets, values)` method. `ListArray`s can be converted into MATLAB `cell` arrays by calling the `toMATLAB` method.

### Notes

1. For the time being, we chose to use the "missing-class" `missing` value as the `NullSubstitutionValue` for `ListArray`. However, we eventually want to add `arrow.array.NullArray`, and will most likely want to use the `missing` value to represent `NullArray` values in MATLAB, which could cause ambiguity in the future. We have been considering whether to introduce a special "sentinel value", such as `arrow.Null`, to represent null values when converting to MATLAB `cell` arrays and so avoid this ambiguity. If we decide to do that, we may want to retroactively change the `NullSubstitutionValue` to `arrow.Null` and break compatibility. Since we are still pre-`0.1`, we don't think the impact of such a behavior change would be very large.
2. Implementing `ListArray` is fairly involved. So, in the spirit of incremental delivery, we chose not to include an implementation of `arrow.array.ListArray.fromMATLAB` in this initial pull request. We plan on following up with some more changes to `arrow.array.ListArray`. See #38353, #38354, and #38361.
3. Thank you @sgilmore10 for your help with this pull request!

### Future Directions

1. #38353
2. #38354
3. #38361
4. Consider adding a null sentinel value like `arrow.Null` for conversion to MATLAB `cell` arrays.
* Closes: #37815 

Lead-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this issue Oct 25, 2023
…ache#38357)

kevingurney pushed a commit that referenced this issue Oct 31, 2023
…tes a MATLAB `cell` array contains only values of the same class type. (#38530)

### Rationale for this change

Adding this `ClassTypeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method for creating `ListArray`s whose `ValueType` is a numeric, boolean, string, time32, or time64 array from a MATLAB `cell` array.

### What changes are included in this PR?

Added an abstract class `arrow.array.internal.list.ListTypeValidator` that defines three abstract methods: 
1. `validateElement(obj, element)`
2. `length = getElementLength(obj, element)` 
3. `C = reshapeCellElements(obj, C)`

These abstract methods will be used in `ListArray.fromMATLAB` to create `ListArray`s from MATLAB `cell` arrays. Below is a "pared-down" version of how the `fromMATLAB` algorithm will work:

```matlab
function listArray = fromMATLAB(C)

    % Create the appropriate ListTypeValidator from the
    % first element in the cell array C
    validator = createListTypeValidator(C{1});

    % Pre-allocate an int32 vector for the offsets. A list with
    % numRows elements needs numRows + 1 offsets; the first is 0.
    numRows = numel(C);
    offsets = zeros([numRows + 1, 1], "int32");

    for ii = 1:numRows
        cellElement = C{ii};

        % Validate that cellElement can be used to create
        % one row in the ListArray. For example, if the first
        % element in C was a double, verify that cellElement
        % is also a double.
        validator.validateElement(cellElement);

        % Determine how much to increment the last offset
        % value by to set the offset at index ii + 1.
        elementLength = validator.getElementLength(cellElement);
        offsets(ii + 1) = offsets(ii) + elementLength;
    end

    % Reshape the elements in cell array C so that they
    % can be vertically concatenated.
    C = validator.reshapeCellElements(C);

    % Extract the cell array elements and vertically concatenate
    % them into one array. Then pass this array to arrow.array().
    values = vertcat(C{:});
    valueArray = arrow.array(values);

    % Create an Int32Array from offsets
    offsetArray = arrow.array(offsets);

    % Build the ListArray from the offsets and values arrays.
    listArray = arrow.array.ListArray.fromArrays(offsetArray, valueArray);
end
```
The concrete type of the `validator` object is determined by the first element of the `cell` array `C`, which also determines what kind of `ListArray` is constructed from the input `cell` array.
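
To make the offset arithmetic concrete, here is a small worked example in plain MATLAB that does not use the Arrow interface or any validator class. It is a sketch of the same idea rather than the actual implementation: the variable names `lengths` and `columns` are illustrative, and it assumes every element of `C` is a numeric vector.

```matlab
C = {[1 2 3], [4 5], 6};

% Number of values contributed by each cell element.
lengths = cellfun(@numel, C(:));

% Zero-based offsets: a leading 0 followed by the running total.
offsets = int32([0; cumsum(lengths)]);

% Reshape every element into a column vector so that the
% elements can be vertically concatenated.
columns = cellfun(@(x) x(:), C(:), UniformOutput=false);
values = vertcat(columns{:});
```

Here `offsets` is `[0; 3; 5; 6]` and `values` is `[1; 2; 3; 4; 5; 6]`, so the three lists occupy the zero-based ranges `[0, 3)`, `[3, 5)`, and `[5, 6)` of the flat values array.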

--

Added a concrete class called `arrow.array.internal.list.ClassTypeValidator`, which inherits from `arrow.array.internal.list.ListTypeValidator`:

1. `validateElement(obj, element)` - Throws an error if the element's class type does not match the expected value.
2. `length = getElementLength(obj, element)` - Returns the number of elements in the input array.
3. `C = reshapeCellElements(obj, C)` - Reshapes all elements in the `cell` array `C` to be column vectors.

`ClassTypeValidator` will be used when creating `ListArray`s from MATLAB `cell` arrays containing "primitive types", such as numerics, strings, and durations.
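
The kind of check that `validateElement` performs can be approximated in plain MATLAB. The snippet below is only a sketch: it assumes the expected class is taken from the first cell element and compared by exact class name, and the variable names (`expectedClass`, `isExpectedClass`, `firstMismatch`) are illustrative rather than part of `ClassTypeValidator`.

```matlab
% The third element is an int32, not a double.
C = {[1 2 3], [4; 5], int32(6)};

% Use the class of the first element as the expected class.
expectedClass = class(C{1});

% Compare the class name of every element with the expected class.
isExpectedClass = cellfun(@(element) strcmp(class(element), expectedClass), C);

% Index of the first element that would fail validation, if any.
firstMismatch = find(~isExpectedClass, 1);
```

In this example `firstMismatch` is `3`, which is the point at which a validator of this kind would be expected to throw an error.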

### Are these changes tested?

Yes. I added a new class called `tClassTypeValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. #38420 
2. #38417
3. #38354 
* Closes: #38419

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
kevingurney pushed a commit that referenced this issue Oct 31, 2023
…es a MATLAB `cell` array contains only values of zoned or unzoned `datetime`s (#38533)

### Rationale for this change

This is a followup to #38419.

Adding this `DatetimeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method for creating `ListArray`s whose `ValueType` is a timestamp array from a MATLAB `cell` array.

This validator will ensure the `cell` array contains either only zoned `datetime`s or only unzoned `datetime`s. This is a requirement when creating a `List` of `Timestamp`s because two MATLAB `datetime`s can only be concatenated together if they are either both zoned or both unzoned:

```matlab
>> d1 = datetime(2023, 10, 31, TimeZone="America/New_York");
>> d2 = datetime(2023, 11, 1);
>> [d1; d2]
Error using datetime/vertcat
Unable to concatenate a datetime array that has a time zone with one that does not have a time
zone.
```

### What changes are included in this PR?

Added a new MATLAB class called `arrow.array.internal.list.DatetimeValidator`, which inherits from `arrow.array.internal.list.ClassTypeValidator`.

This new class defines one property called `HasTimeZone`, which is a scalar `logical` indicating whether the validator expects all `datetime`s to be zoned or not.

Additionally, `DatetimeValidator` overrides the `validateElement` method. It first calls `ClassTypeValidator`'s implementation of `validateElement` to verify the input element is a `datetime`. If so, it then confirms that the input `datetime`'s `TimeZone` property is empty or nonempty, based on the validator's `HasTimeZone` property value.
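
The zoned/unzoned check can be illustrated in plain MATLAB using the `TimeZone` property of `datetime`, which is empty for unzoned values. This is a sketch of the idea only, not the `DatetimeValidator` implementation; the variable names `hasTimeZone` and `allConsistent` are illustrative.

```matlab
d1 = datetime(2023, 10, 31, TimeZone="America/New_York");
d2 = datetime(2023, 11, 1);
C = {d1, d2};

% A datetime is zoned when its TimeZone property is nonempty.
hasTimeZone = cellfun(@(d) ~isempty(d.TimeZone), C);

% The elements can only be concatenated if they are all zoned
% or all unzoned, i.e. they all agree with the first element.
allConsistent = all(hasTimeZone == hasTimeZone(1));
```

For this input `hasTimeZone` is `[true false]` and `allConsistent` is `false`, which corresponds to the case where validation is expected to fail.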

### Are these changes tested?

Yes, I added a new test class called `tDatetimeValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. #38417 
2. #38354 
* Closes: #38420

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
@sgilmore10

take

kevingurney pushed a commit that referenced this issue Nov 2, 2023
…tes a MATLAB `cell` array contains only `table`s that share the same schema (#38551)

### Rationale for this change

This is a followup to #38533.

Adding this `TableValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB` method for creating `ListArray`s whose `ValueType` is a `StructArray`.

This validator will ensure all `table`s in a `cell` array have the same schema when attempting to make a `ListArray` of `Struct`s. This is required so that the `table`s in the `cell` array can be vertically concatenated. For example, two `table`s with different `VariableNames` cannot be concatenated together:

```matlab
>> t1 = table(1, 2, VariableNames=["A", "B"]);
>> t2 = table(3, 4, VariableNames=["C", "D"]);
>> vertcat(t1, t2)
Error using tabular/vertcat
All tables being vertically concatenated must have the same variable names.
```

### What changes are included in this PR?

Modified `arrow.array.internal.list.Validator` to inherit from `matlab.mixin.Heterogeneous`. Doing so enables creating an array whose elements are different subclasses of `arrow.array.internal.list.Validator`.

Added a new MATLAB class `arrow.array.internal.list.TableValidator`, which inherits from `arrow.array.internal.list.Validator`. This class has two properties: `VariableNames` and `VariableValidators`. 

`VariableNames` is a `string` array containing the expected variable names of all `table`s.

`VariableValidators` is an array of `arrow.array.internal.list.Validator`, in which each element represents one variable in a `table`. This array is used to validate that `table` variables have the expected type and configuration.

`TableValidator`'s `validateElement` method uses both its `VariableNames` and `VariableValidators` properties to validate that the input argument is a `table` with the expected schema. If not, it throws an error.

Lastly, I added a gateway function called `arrow.array.internal.list.createValidator`, which creates the appropriate `Validator` subclass based on the input. If no such `Validator` exists, an error is thrown.
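
A simplified version of the schema comparison can be written in plain MATLAB. The sketch below assumes that a "schema" means the variable names plus the class of each variable, which is narrower than what `TableValidator` and its per-variable validators may check; the helper names `namesOf`, `classesOf`, and `hasSameSchema` are hypothetical.

```matlab
t1 = table(1, true, VariableNames=["A", "B"]);
t2 = table(2, false, VariableNames=["A", "B"]);
t3 = table(3, false, VariableNames=["C", "D"]);   % different names
t4 = table(4, 5, VariableNames=["A", "B"]);       % different class for B

% Variable names of a table as a string array.
namesOf = @(t) string(t.Properties.VariableNames);

% Class of each table variable as a string array.
classesOf = @(t) string(cellfun(@(n) class(t.(n)), ...
    t.Properties.VariableNames, UniformOutput=false));

% Two tables share a schema if names and classes both match.
hasSameSchema = @(t, ref) isequal(namesOf(t), namesOf(ref)) && ...
    isequal(classesOf(t), classesOf(ref));

sameAsFirst = [hasSameSchema(t2, t1), hasSameSchema(t3, t1), hasSameSchema(t4, t1)];
```

Only `t2` matches `t1` here, so `sameAsFirst` is `[true false false]`.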

### Are these changes tested?

Yes. Added two new test classes: `tTableValidator.m` and `tCreateValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. #38354
* Closes: #38417

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
kevingurney pushed a commit that referenced this issue Nov 2, 2023
…tArray` (#38561)

### Rationale for this change

We should implement a static `fromMATLAB` method for `arrow.array.ListArray` that takes in a MATLAB `cell` array and returns an instance of `arrow.array.ListArray`. Adding this method enables users to create an `arrow.array.ListArray` by passing a MATLAB `cell` array to the `arrow.array` gateway function:

```matlab
>> C = {[1 2 3], [4 5], 6};
>> array = arrow.array(C)

array = 

  ListArray with 3 elements and 0 null values:

    [
        [
            1,
            2,
            3
        ],
        [
            4,
            5
        ],
        [
            6
        ]
    ]
```
Internally, the `arrow.array` gateway function will call `arrow.array.ListArray.fromMATLAB` to construct a `ListArray` from the given `cell` array.

### What changes are included in this PR?

1. Implemented `fromMATLAB` method on `arrow.array.ListArray`. This method accepts a MATLAB `cell` array and returns an instance of `arrow.array.ListArray`. 
2. Set the `ArrayStaticConstructor` property of `arrow.type.traits.ListTraits` to `@arrow.array.ListArray.fromMATLAB`.
3. Added a switch case for `"cell"` to the `arrow.array` gateway function that invokes `arrow.array.ListArray.fromMATLAB` with the input `cell` array.

### Are these changes tested?

Yes. I added a new test class to the `test/arrow/array/list` folder named `tFromMATLAB.m`.

### Are there any user-facing changes?

Yes. Users can now create instances of `arrow.array.ListArray` by passing `cell` arrays to `arrow.array`:

```matlab
>> C = {["A" "B"], ["C" "D" "E"], missing, ["F" "G"], string.empty(0, 1)};
>> array = arrow.array(C)

array = 

  ListArray with 5 elements and 1 null value:

    [
        [
            "A",
            "B"
        ],
        [
            "C",
            "D",
            "E"
        ],
        null,
        [
            "F",
            "G"
        ],
        []
    ]

```
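
The way the `missing` and empty elements in this example map onto nulls and zero-length lists can be sketched in plain MATLAB. This is an assumption-laden illustration, not the actual `fromMATLAB` code: it assumes a null list contributes no values and therefore repeats the previous offset, and the variable names `isNull` and `lengths` are illustrative.

```matlab
C = {["A" "B"], ["C" "D" "E"], missing, ["F" "G"], string.empty(0, 1)};

% Elements of class "missing" are treated as null lists.
isNull = cellfun(@(x) isa(x, "missing"), C);

% Null lists contribute no values; empty lists have length 0 already.
lengths = cellfun(@numel, C);
lengths(isNull) = 0;

% Zero-based offsets: a leading 0 followed by the running total.
offsets = int32([0, cumsum(lengths)]);
```

With this input `isNull` is `[false false true false false]` and `offsets` is `[0 2 5 5 7 7]`. The null list and the empty list both span an empty range, so under this assumption a separate validity mask is what distinguishes `null` from `[]`.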

* Closes: #38354

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
kevingurney added this to the 15.0.0 milestone Nov 2, 2023
loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…ache#38357)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…validates a MATLAB `cell` array contains only values of the same class type. (apache#38530)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…alidates a MATLAB `cell` array contains only values of zoned or unzoned `datetime`s (apache#38533)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…validates a MATLAB `cell` array contains only `table`s that share the same schema (apache#38551)

loicalleyne pushed a commit to loicalleyne/arrow that referenced this issue Nov 13, 2023
…ay.ListArray` (apache#38561)

dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…ache#38357)

### Rationale for this change

Now that many of the commonly-used "primitive" array types have been added to the MATLAB interface, we can implement an `arrow.array.ListArray` class.

This pull request adds a new `arrow.array.ListArray` class, which can be converted to a MATLAB `cell` array by calling its `toMATLAB` method.

### What changes are included in this PR?

1. Added a new `arrow.array.ListArray` MATLAB class.

*Methods*

`cellArray = listArray.toMATLAB()`
`listArray = arrow.array.ListArray.fromArrays(offsets, values)`

*Properties*

`Offsets` - `Int32Array` list offsets (uses zero-based indexing)
`Values` - Array of values in the list (supports nesting)

2. Added a new `arrow.type.traits.ListTraits` MATLAB class.

**Example**
```matlab
>> offsets = arrow.array(int32([0, 2, 3, 7]))

offsets = 

[
  0,
  2,
  3,
  7
]

>> values = arrow.array(["A", "B", "C", "D", "E", "F", "G"])

values = 

[
  "A",
  "B",
  "C",
  "D",
  "E",
  "F",
  "G"
]

>> arrowArray = arrow.array.ListArray.fromArrays(offsets, values)

arrowArray = 

[
  [
    "A",
    "B"
  ],
  [
    "C"
  ],
  [
    "D",
    "E",
    "F",
    "G"
  ]
]

>> matlabArray = arrowArray.toMATLAB()

matlabArray =

  3x1 cell array

    {2x1 string}
    {["C"     ]}
    {4x1 string}

>> matlabArray{:}

ans = 

  2x1 string array

    "A"
    "B"

ans = 

    "C"

ans = 

  4x1 string array

    "D"
    "E"
    "F"
    "G"

```
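
The way `Offsets` partitions `Values` in this example can be reproduced in plain MATLAB. This is only a sketch of the indexing arithmetic, assuming the zero-based, half-open ranges described above; it does not use the Arrow interface, and the variable names are illustrative.

```matlab
offsets = int32([0, 2, 3, 7]);   % zero-based, as in the example
values = ["A"; "B"; "C"; "D"; "E"; "F"; "G"];

numLists = numel(offsets) - 1;
lists = cell(numLists, 1);
for ii = 1:numLists
    % Convert the zero-based half-open range [offsets(ii), offsets(ii + 1))
    % to MATLAB's one-based inclusive indexing.
    first = offsets(ii) + 1;
    last = offsets(ii + 1);
    lists{ii} = values(first:last);
end

disp(lists)
```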

### Are these changes tested?

Yes.

1. Added a new `tListArray.m` test class.
2. Added a new `tListTraits.m` test class.
3. Updated `arrow.internal.test.tabular.createAllSupportedArrayTypes` to include `ListArray`.

### Are there any user-facing changes?

Yes.

1. Users can now create an `arrow.array.ListArray` from an `offsets` and `values` array by calling the static `arrow.array.ListArray.fromArrays(offsets, values)` method. `ListArray`s can be converted into MATLAB `cell` arrays by calling the `toMATLAB` method on a `ListArray` instance.

### Notes

1. We chose to use the "missing-class" `missing` value as the `NullSubstitutionValue` for the time being for `ListArray`. However, we eventually want to add `arrow.array.NullArray`, and will most likely want to use the "missing-class" `missing` value to represent `NullArray` values in MATLAB. So, this could cause some ambiguity in the future. We have been thinking about whether we should consider introducing some sort of special "sentinel value" to represent null values when converting to MATLAB `cell` arrays. Perhaps, something like `arrow.Null`, or something to that effect, in order to avoid this ambiguity. If we think it makes sense to do that, we may want to retroactively change the `NullSubstitutionValue` to be `arrow.Null` and break compatibility. Since we are still in pre-`0.1`, we don't think the impact of such a behavior change would be very large.
2. Implementing `ListArray` is fairly involved. So, in the spirit of incremental delivery, we chose not to include an implementation of `arrow.array.ListArray.fromMATLAB` in this initial pull request. We plan on following up with some more changes to `arrow.array.ListArray`. See apache#38353, apache#38354, and apache#38361.
3. Thank you @sgilmore10 for your help with this pull request!

### Future Directions

1. apache#38353
2. apache#38354
3. apache#38361
4. Consider adding a null sentinel value like `arrow.Null` for conversion to MATLAB `cell` arrays.
* Closes: apache#37815 

Lead-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…validates a MATLAB `cell` array contains only values of the same class type. (apache#38530)

### Rationale for this change

Adding this `ClassTypeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method for creating, from a MATLAB `cell` array, `ListArray`s whose `ValueType` is a numeric, boolean, string, time32, or time64 type.

### What changes are included in this PR?

Added an abstract class `arrow.array.internal.list.ListTypeValidator` that defines three abstract methods: 
1. `validateElement(obj, element)`
2. `length = getElementLength(obj, element)` 
3. `C = reshapeCellElements(obj, C)`

These abstract methods will be used in `ListArray.fromMATLAB` to create `ListArray`s from MATLAB `cell` arrays. Below is a "pared-down" version of how the `fromMATLAB` algorithm will work:

```matlab
function listArray = fromMATLAB(C)

    % Create the appropriate ListTypeValidator from the
    % first element in the cell array C
    validator = createListTypeValidator(C{1});

    % Pre-allocate an int32 vector for the offsets. A list with
    % numRows elements needs numRows + 1 offsets; the first is 0.
    numRows = numel(C);
    offsets = zeros([numRows + 1, 1], "int32");

    for ii = 1:numRows
        cellElement = C{ii};

        % Validate that cellElement can be used to create
        % one row in the ListArray. For example,
        % if the first element in C was a double, verify
        % cellElement is also a double.
        validator.validateElement(cellElement);

        % Determine how much to increment the
        % last offset value by to set the offset at index ii + 1.
        elementLength = validator.getElementLength(cellElement);
        offsets(ii + 1) = elementLength + offsets(ii);
    end

    % Reshape the elements in cell array C so that they
    % can be vertically concatenated.
    C = validator.reshapeCellElements(C);

    % Extract the cell array elements and vertically concatenate
    % them into one array. Then pass this array to arrow.array().
    values = vertcat(C{:});
    valueArray = arrow.array(values);

    % Create an Int32Array from offsets
    offsetArray = arrow.array(offsets);

    listArray = arrow.array.ListArray.fromArrays(offsetArray, valueArray);
end
```
The concrete type of the `validator` object is determined by the first element in the `cell` array `C`. We use the first element to decide what kind of `ListArray` to construct from the input `cell` array.
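
The offset and value computation at the core of this algorithm can be tried out in plain MATLAB, without validators or the Arrow interface. The following is a minimal sketch under that simplification; the variable names are illustrative.

```matlab
C = {[1 2 3], [4 5], 6};

% One more offset than there are list elements; the first offset is 0.
numRows = numel(C);
offsets = zeros([numRows + 1, 1], "int32");
for ii = 1:numRows
    offsets(ii + 1) = offsets(ii) + numel(C{ii});
end

% Reshape each element into a column vector so that vertcat succeeds.
columns = cellfun(@(x) x(:), C, UniformOutput=false);
values = vertcat(columns{:});

disp(offsets')
disp(values')
```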

--

Added a concrete class called `arrow.array.internal.list.ClassTypeValidator`, which inherits from `arrow.array.internal.list.ListTypeValidator`:

1. `validateElement(obj, element)` - Throws an error if the element's class type does not match the expected value.
2. `length = getElementLength(obj, element)` - Returns the number of elements in the input array.
3. `C = reshapeCellElements(obj, C)` - Reshapes all elements in the `cell` array `C` to be column vectors.

`ClassTypeValidator` will be used when creating `ListArray`s from MATLAB `cell` arrays containing "primitive types", such as numerics, strings, and durations.
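
The three operations can be sketched with local functions in a plain MATLAB script. This is an illustration of the described behavior only, not the `ClassTypeValidator` source: the functions take the expected class as an explicit argument instead of storing it on an object, and the error identifier is made up for the example.

```matlab
C = {[1 2 3], [4; 5], 6};
expectedClass = class(C{1});   % taken from the first element

for ii = 1:numel(C)
    validateElement(C{ii}, expectedClass);
end
lengths = cellfun(@getElementLength, C);
C = reshapeCellElements(C);

disp(lengths)

function validateElement(element, expectedClass)
    % Throw an error if the element's class differs from the expected class.
    if ~strcmp(class(element), expectedClass)
        error("example:ClassMismatch", ...
            "Expected %s but got %s.", expectedClass, class(element));
    end
end

function len = getElementLength(element)
    % Number of elements in the input array.
    len = numel(element);
end

function C = reshapeCellElements(C)
    % Reshape every element of the cell array into a column vector.
    C = cellfun(@(x) reshape(x, [], 1), C, UniformOutput=false);
end
```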

### Are these changes tested?

Yes. I added a new class called `tClassTypeValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. apache#38420 
2. apache#38417
3. apache#38354 
* Closes: apache#38419

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…alidates a MATLAB `cell` array contains only values of zoned or unzoned `datetime`s (apache#38533)

### Rationale for this change

This is a followup to apache#38419.

Adding this `DatetimeValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB()` method for creating, from a MATLAB `cell` array, `ListArray`s whose `ValueType` is a timestamp type.

This validator will ensure the `cell` array contains either only zoned `datetime`s or only unzoned `datetime`s. This is a requirement when creating a `List` of `Timestamp`s because two MATLAB `datetime`s can only be concatenated together if they are either both zoned or both unzoned:

```matlab
>> d1 = datetime(2023, 10, 31, TimeZone="America/New_York");
>> d2 = datetime(2023, 11, 1);
>> [d1; d2]
Error using datetime/vertcat
Unable to concatenate a datetime array that has a time zone with one that does not have a time
zone.
```

### What changes are included in this PR?

Added a new MATLAB class called `arrow.array.internal.list.DatetimeValidator`, which inherits from `arrow.array.internal.list.ClassTypeValidator`.

This new class defines one property called `HasTimeZone`, which is a scalar `logical` indicating whether the validator expects all `datetime`s to be zoned or not.

Additionally, `DatetimeValidator` overrides the `validateElement` method. It first calls `ClassTypeValidator`'s implementation of `validateElement` to verify that the input element is a `datetime`. If so, it then confirms that the input `datetime`'s `TimeZone` property is empty or nonempty, based on the validator's `HasTimeZone` property value.
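
The two-step check can be sketched in plain MATLAB. The helper `isValidDatetime` below is hypothetical: it returns a logical instead of throwing an error, and it takes `hasTimeZone` as an argument rather than reading it from a validator object.

```matlab
zoned = datetime(2023, 10, 31, TimeZone="America/New_York");
unzoned = datetime(2023, 11, 1);

% The expectation is derived from the first element, here the zoned one.
hasTimeZone = ~isempty(zoned.TimeZone);

okZoned = isValidDatetime(zoned, hasTimeZone);
okUnzoned = isValidDatetime(unzoned, hasTimeZone);
disp([okZoned, okUnzoned])

function tf = isValidDatetime(element, hasTimeZone)
    % Class check first, then compare the zoned/unzoned state.
    tf = isa(element, "datetime") && ...
        (~isempty(element.TimeZone) == hasTimeZone);
end
```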

### Are these changes tested?

Yes, I added a new test class called `tDatetimeValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. apache#38417 
2. apache#38354 
* Closes: apache#38420

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…validates a MATLAB `cell` array contains only `table`s that share the same schema (apache#38551)

### Rationale for this change

This is a followup to apache#38533.

Adding this `TableValidator` class is a step towards implementing the `arrow.array.ListArray.fromMATLAB` method for creating `ListArray`s whose `ValueType` is a `StructArray`.

This validator will ensure all `table`s in a `cell` array have the same schema when attempting to make a `ListArray` of `Struct`s. This is required so that the `table`s in the `cell` array can be vertically concatenated. For example, two `table`s with different `VariableNames` cannot be concatenated together:

```matlab
>> t1 = table(1, 2, VariableNames=["A", "B"]);
>> t2 = table(3, 4, VariableNames=["C", "D"]);
>> vertcat(t1, t2)
Error using tabular/vertcat
All tables being vertically concatenated must have the same variable names.
```

### What changes are included in this PR?

Modified `arrow.array.internal.list.Validator` to inherit from `matlab.mixin.Heterogeneous`. Doing so enables creating an array whose elements are different subclasses of `arrow.array.internal.list.Validator`.

Added a new MATLAB class `arrow.array.internal.list.TableValidator`, which inherits from `arrow.array.internal.list.Validator`. This class has two properties: `VariableNames` and `VariableValidators`. 

`VariableNames` is a `string` array containing the expected variable names of all `table`s.

`VariableValidators` is an array of `arrow.array.internal.list.Validator`, in which each element represents one variable in a `table`. This array is used to validate `table` variables have the expected type and configuration. 

`TableValidator`'s `validateElement` method uses both its `VariableNames` and `VariableValidators` properties to validate that the input argument is a `table` with the expected schema. If not, it throws an error.
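
A simplified version of this schema check can be written in plain MATLAB. In the sketch below, the helpers `hasSchema` and `getClasses` are hypothetical, the per-variable validators are reduced to a comparison of class names, and the check returns a logical instead of throwing an error.

```matlab
t1 = table(1, "a", VariableNames=["A", "B"]);
t2 = table(2, "b", VariableNames=["A", "B"]);
t3 = table(3, 4, VariableNames=["C", "D"]);

% The expected schema is taken from the first table.
variableNames = string(t1.Properties.VariableNames);
variableClasses = getClasses(t1);

sameAsFirst = [hasSchema(t2, variableNames, variableClasses), ...
               hasSchema(t3, variableNames, variableClasses)];
disp(sameAsFirst)

function tf = hasSchema(t, variableNames, variableClasses)
    % True if t is a table with the expected variable names and classes.
    tf = istable(t) ...
        && isequal(string(t.Properties.VariableNames), variableNames) ...
        && isequal(getClasses(t), variableClasses);
end

function classes = getClasses(t)
    % Class name of each table variable, as a string row vector.
    classes = string(cellfun(@(name) class(t.(name)), ...
        t.Properties.VariableNames, UniformOutput=false));
end
```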

Lastly, I added a gateway function called `arrow.array.internal.list.createValidator`, which creates the appropriate `Validator` subclass based on the input. If no such `Validator` exists, an error is thrown.

### Are these changes tested?

Yes. Added two new test classes: `tTableValidator.m` and `tCreateValidator.m`.

### Are there any user-facing changes?

No.

### Future Directions

1. apache#38354
* Closes: apache#38417

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this issue Feb 19, 2024
…ay.ListArray` (apache#38561)
