apacheGH-37815: [MATLAB] Add arrow.array.ListArray MATLAB class (apache#38357)

### Rationale for this change

Now that many of the commonly-used "primitive" array types have been added to the MATLAB interface, we can implement an `arrow.array.ListArray` class.

This pull request adds a new `arrow.array.ListArray` class, which can be converted to a MATLAB `cell` array by calling its `toMATLAB` method.

### What changes are included in this PR?

1. Added a new `arrow.array.ListArray` MATLAB class.

*Methods*

`cellArray = listArray.toMATLAB()` (instance method)
`listArray = arrow.array.ListArray.fromArrays(offsets, values)` (static method)

*Properties*

`Offsets` - `Int32Array` list offsets (uses zero-based indexing)
`Values` - Array of values in the list (supports nesting)

2. Added a new `arrow.type.traits.ListTraits` MATLAB class.

**Example**
```matlab
>> offsets = arrow.array(int32([0, 2, 3, 7]))

offsets = 

[
  0,
  2,
  3,
  7
]

>> values = arrow.array(["A", "B", "C", "D", "E", "F", "G"])

values = 

[
  "A",
  "B",
  "C",
  "D",
  "E",
  "F",
  "G"
]

>> arrowArray = arrow.array.ListArray.fromArrays(offsets, values)

arrowArray = 

[
  [
    "A",
    "B"
  ],
  [
    "C"
  ],
  [
    "D",
    "E",
    "F",
    "G"
  ]
]

>> matlabArray = arrowArray.toMATLAB()

matlabArray =

  3x1 cell array

    {2x1 string}
    {["C"     ]}
    {4x1 string}

>> matlabArray{:}

ans = 

  2x1 string array

    "A"
    "B"

ans = 

    "C"

ans = 

  4x1 string array

    "D"
    "E"
    "F"
    "G"

```
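The way `offsets` partitions `values` in the example above can also be sketched outside of MATLAB. The following C++ snippet is a minimal illustration only, not code from this pull request: `SliceByOffsets` is a hypothetical helper that uses just the standard library and assumes zero-based offsets where list `i` covers the half-open range `[offsets[i], offsets[i + 1])`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of Arrow or the MATLAB interface.
// Splits the flat `values` vector into lists using zero-based offsets,
// where list i spans the half-open range [offsets[i], offsets[i + 1]).
std::vector<std::vector<std::string>> SliceByOffsets(
    const std::vector<std::int32_t>& offsets,
    const std::vector<std::string>& values) {
  std::vector<std::vector<std::string>> lists;
  // N lists require N + 1 offsets, so the loop stops one short of the end.
  for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
    const std::int32_t start = offsets[i];
    const std::int32_t stop = offsets[i + 1];
    if (start < 0 || stop < start ||
        static_cast<std::size_t>(stop) > values.size()) {
      throw std::invalid_argument("offsets must be non-decreasing and in range");
    }
    // Copy the elements in [start, stop) into a new list.
    lists.emplace_back(values.begin() + start, values.begin() + stop);
  }
  // The number of lists is always one less than the number of offsets.
  assert(offsets.empty() || lists.size() + 1 == offsets.size());
  return lists;
}
```

Calling `SliceByOffsets({0, 2, 3, 7}, {"A", "B", "C", "D", "E", "F", "G"})` yields three lists with 2, 1, and 4 elements, which matches the shape of the `cell` array shown above.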

### Are these changes tested?

Yes.

1. Added a new `tListArray.m` test class.
2. Added a new `tListTraits.m` test class.
3. Updated `arrow.internal.test.tabular.createAllSupportedArrayTypes` to include `ListArray`.

### Are there any user-facing changes?

Yes.

1. Users can now create an `arrow.array.ListArray` from an `offsets` and `values` array by calling the static `arrow.array.ListArray.fromArrays(offsets, values)` method. `ListArray`s can be converted into MATLAB `cell` arrays by calling the `toMATLAB` method on a `ListArray` instance.

### Notes

1. For the time being, we chose to use the "missing-class" `missing` value as the `NullSubstitutionValue` for `ListArray`. However, we eventually want to add `arrow.array.NullArray`, and will most likely want to use the "missing-class" `missing` value to represent `NullArray` values in MATLAB, which could cause some ambiguity in the future. We have been thinking about whether to introduce a special "sentinel value", something like `arrow.Null`, to represent null values when converting to MATLAB `cell` arrays and avoid this ambiguity. If we think it makes sense to do that, we may want to retroactively change the `NullSubstitutionValue` to `arrow.Null` and break compatibility. Since we are still pre-`0.1`, we don't think the impact of such a behavior change would be very large.
2. Implementing `ListArray` is fairly involved. So, in the spirit of incremental delivery, we chose not to include an implementation of `arrow.array.ListArray.fromMATLAB` in this initial pull request. We plan on following up with some more changes to `arrow.array.ListArray`. See apache#38353, apache#38354, and apache#38361.
3. Thank you @sgilmore10 for your help with this pull request!
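
The null substitution described in note 1 can be sketched roughly as follows. This is an illustration under stated assumptions rather than the implementation in this pull request: `ApplyValidity` and `StringList` are hypothetical names, and `std::nullopt` merely stands in for whatever substitution value is chosen (`missing` today, possibly `arrow.Null` later).

```cpp
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using StringList = std::vector<std::string>;

// Hypothetical helper, not part of Arrow or the MATLAB interface.
// Replaces every list whose validity flag is false with std::nullopt,
// which plays the role of the null substitution value.
std::vector<std::optional<StringList>> ApplyValidity(
    const std::vector<StringList>& lists, const std::vector<bool>& valid) {
  if (lists.size() != valid.size()) {
    throw std::invalid_argument("need exactly one validity flag per list");
  }
  std::vector<std::optional<StringList>> out;
  out.reserve(lists.size());
  for (std::size_t i = 0; i < lists.size(); ++i) {
    if (valid[i]) {
      out.emplace_back(lists[i]);      // valid element: keep the list
    } else {
      out.emplace_back(std::nullopt);  // null element: substitute the sentinel
    }
  }
  return out;
}
```

A dedicated sentinel keeps a null list distinguishable from an empty list and from any value that could legitimately appear inside a list, which is the ambiguity the note is concerned about.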

### Future Directions

1. apache#38353
2. apache#38354
3. apache#38361
4. Consider adding a null sentinel value like `arrow.Null` for conversion to MATLAB `cell` arrays.
* Closes: apache#37815 

Lead-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
kevingurney and sgilmore10 authored Oct 23, 2023
1 parent 1d11df3 commit 3beb93a
Showing 12 changed files with 444 additions and 9 deletions.
103 changes: 103 additions & 0 deletions matlab/src/cpp/arrow/matlab/array/proxy/list_array.cc
@@ -0,0 +1,103 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#include "arrow/matlab/array/proxy/list_array.h"
#include "arrow/matlab/array/proxy/numeric_array.h"
#include "arrow/matlab/array/proxy/wrap.h"
#include "arrow/matlab/error/error.h"
#include "libmexclass/proxy/ProxyManager.h"

namespace arrow::matlab::array::proxy {

ListArray::ListArray(std::shared_ptr<arrow::ListArray> list_array) : proxy::Array{std::move(list_array)} {
REGISTER_METHOD(ListArray, getValues);
REGISTER_METHOD(ListArray, getOffsets);
}

libmexclass::proxy::MakeResult ListArray::make(const libmexclass::proxy::FunctionArguments& constructor_arguments) {
namespace mda = ::matlab::data;
using libmexclass::proxy::ProxyManager;
using Int32ArrayProxy = arrow::matlab::array::proxy::NumericArray<arrow::Int32Type>;
using ListArrayProxy = arrow::matlab::array::proxy::ListArray;
using ArrayProxy = arrow::matlab::array::proxy::Array;

mda::StructArray opts = constructor_arguments[0];
const mda::TypedArray<uint64_t> offsets_proxy_id_mda = opts[0]["OffsetsProxyID"];
const mda::TypedArray<uint64_t> values_proxy_id_mda = opts[0]["ValuesProxyID"];
const mda::TypedArray<bool> validity_bitmap_mda = opts[0]["Valid"];

const auto offsets_proxy_id = offsets_proxy_id_mda[0];
const auto values_proxy_id = values_proxy_id_mda[0];

const auto offsets_proxy = std::static_pointer_cast<Int32ArrayProxy>(ProxyManager::getProxy(offsets_proxy_id));
const auto values_proxy = std::static_pointer_cast<ArrayProxy>(ProxyManager::getProxy(values_proxy_id));

const auto offsets = offsets_proxy->unwrap();
const auto values = values_proxy->unwrap();

// Pack the validity bitmap values.
MATLAB_ASSIGN_OR_ERROR(auto validity_bitmap_buffer,
bit::packValid(validity_bitmap_mda),
error::BITPACK_VALIDITY_BITMAP_ERROR_ID);

// Create a ListArray from values and offsets.
MATLAB_ASSIGN_OR_ERROR(auto array,
arrow::ListArray::FromArrays(*offsets, *values, arrow::default_memory_pool(), validity_bitmap_buffer),
error::LIST_ARRAY_FROM_ARRAYS_FAILED);

// Return a ListArray Proxy.
auto list_array = std::static_pointer_cast<arrow::ListArray>(array);
return std::make_shared<ListArrayProxy>(std::move(list_array));
}

void ListArray::getValues(libmexclass::proxy::method::Context& context) {
namespace mda = ::matlab::data;
using libmexclass::proxy::ProxyManager;

auto list_array = std::static_pointer_cast<arrow::ListArray>(array);
auto value_array = list_array->values();

// Wrap the array within a proxy object if possible.
MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(auto value_array_proxy,
proxy::wrap(value_array),
context, error::UNKNOWN_PROXY_FOR_ARRAY_TYPE);
const auto value_array_proxy_id = ProxyManager::manageProxy(value_array_proxy);
const auto type_id = value_array->type_id();

// Return a struct with two fields: ProxyID and TypeID. The MATLAB
// layer will use these values to construct the appropriate MATLAB
// arrow.array.Array subclass.
mda::ArrayFactory factory;
mda::StructArray output = factory.createStructArray({1, 1}, {"ProxyID", "TypeID"});
output[0]["ProxyID"] = factory.createScalar(value_array_proxy_id);
output[0]["TypeID"] = factory.createScalar(static_cast<int32_t>(type_id));
context.outputs[0] = output;
}

void ListArray::getOffsets(libmexclass::proxy::method::Context& context) {
namespace mda = ::matlab::data;
using libmexclass::proxy::ProxyManager;
using Int32ArrayProxy = arrow::matlab::array::proxy::NumericArray<arrow::Int32Type>;
auto list_array = std::static_pointer_cast<arrow::ListArray>(array);
auto offsets_array = list_array->offsets();
auto offsets_int32_array = std::static_pointer_cast<arrow::Int32Array>(offsets_array);
auto offsets_int32_array_proxy = std::make_shared<Int32ArrayProxy>(offsets_int32_array);
const auto offsets_int32_array_proxy_id = ProxyManager::manageProxy(offsets_int32_array_proxy);
mda::ArrayFactory factory;
context.outputs[0] = factory.createScalar(offsets_int32_array_proxy_id);
}
}
38 changes: 38 additions & 0 deletions matlab/src/cpp/arrow/matlab/array/proxy/list_array.h
@@ -0,0 +1,38 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

#pragma once

#include "arrow/matlab/array/proxy/array.h"

namespace arrow::matlab::array::proxy {

class ListArray : public arrow::matlab::array::proxy::Array {

public:
ListArray(std::shared_ptr<arrow::ListArray> list_array);
~ListArray() {}

static libmexclass::proxy::MakeResult make(const libmexclass::proxy::FunctionArguments& constructor_arguments);

protected:
void getValues(libmexclass::proxy::method::Context& context);
void getOffsets(libmexclass::proxy::method::Context& context);

};

}
3 changes: 3 additions & 0 deletions matlab/src/cpp/arrow/matlab/array/proxy/wrap.cc
@@ -21,6 +21,7 @@
#include "arrow/matlab/array/proxy/boolean_array.h"
#include "arrow/matlab/array/proxy/numeric_array.h"
#include "arrow/matlab/array/proxy/string_array.h"
#include "arrow/matlab/array/proxy/list_array.h"
#include "arrow/matlab/array/proxy/struct_array.h"

namespace arrow::matlab::array::proxy {
@@ -62,6 +63,8 @@ namespace arrow::matlab::array::proxy {
return std::make_shared<proxy::NumericArray<arrow::Date64Type>>(std::static_pointer_cast<arrow::Date64Array>(array));
case ID::STRING:
return std::make_shared<proxy::StringArray>(std::static_pointer_cast<arrow::StringArray>(array));
case ID::LIST:
return std::make_shared<proxy::ListArray>(std::static_pointer_cast<arrow::ListArray>(array));
case ID::STRUCT:
return std::make_shared<proxy::StructArray>(std::static_pointer_cast<arrow::StructArray>(array));
default:
1 change: 1 addition & 0 deletions matlab/src/cpp/arrow/matlab/error/error.h
@@ -197,6 +197,7 @@ namespace arrow::matlab::error {
static const char* CHUNKED_ARRAY_NUMERIC_INDEX_WITH_EMPTY_CHUNKED_ARRAY = "arrow:chunkedarray:NumericIndexWithEmptyChunkedArray";
static const char* CHUNKED_ARRAY_INVALID_NUMERIC_CHUNK_INDEX = "arrow:chunkedarray:InvalidNumericChunkIndex";
static const char* STRUCT_ARRAY_MAKE_FAILED = "arrow:array:StructArrayMakeFailed";
static const char* LIST_ARRAY_FROM_ARRAYS_FAILED = "arrow:array:ListArrayFromArraysFailed";
static const char* INDEX_EMPTY_CONTAINER = "arrow:index:EmptyContainer";
static const char* INDEX_OUT_OF_RANGE = "arrow:index:OutOfRange";
static const char* BUFFER_VIEW_OR_COPY_FAILED = "arrow:buffer:ViewOrCopyFailed";
2 changes: 2 additions & 0 deletions matlab/src/cpp/arrow/matlab/proxy/factory.cc
@@ -22,6 +22,7 @@
#include "arrow/matlab/array/proxy/time32_array.h"
#include "arrow/matlab/array/proxy/time64_array.h"
#include "arrow/matlab/array/proxy/struct_array.h"
#include "arrow/matlab/array/proxy/list_array.h"
#include "arrow/matlab/array/proxy/chunked_array.h"
#include "arrow/matlab/tabular/proxy/record_batch.h"
#include "arrow/matlab/tabular/proxy/table.h"
@@ -61,6 +62,7 @@ libmexclass::proxy::MakeResult Factory::make_proxy(const ClassName& class_name,
REGISTER_PROXY(arrow.array.proxy.BooleanArray , arrow::matlab::array::proxy::BooleanArray);
REGISTER_PROXY(arrow.array.proxy.StringArray , arrow::matlab::array::proxy::StringArray);
REGISTER_PROXY(arrow.array.proxy.StructArray , arrow::matlab::array::proxy::StructArray);
REGISTER_PROXY(arrow.array.proxy.ListArray , arrow::matlab::array::proxy::ListArray);
REGISTER_PROXY(arrow.array.proxy.TimestampArray, arrow::matlab::array::proxy::NumericArray<arrow::TimestampType>);
REGISTER_PROXY(arrow.array.proxy.Time32Array , arrow::matlab::array::proxy::NumericArray<arrow::Time32Type>);
REGISTER_PROXY(arrow.array.proxy.Time64Array , arrow::matlab::array::proxy::NumericArray<arrow::Time64Type>);
111 changes: 111 additions & 0 deletions matlab/src/matlab/+arrow/+array/ListArray.m
@@ -0,0 +1,111 @@
% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements. See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License. You may obtain a copy of the License at
%
% http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied. See the License for the specific language governing
% permissions and limitations under the License.

classdef ListArray < arrow.array.Array

properties (Hidden, GetAccess=public, SetAccess=private)
NullSubstitutionValue = missing;
end

properties (Dependent, GetAccess=public, SetAccess=private)
Values
Offsets
end

methods

function obj = ListArray(proxy)
arguments
proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.array.proxy.ListArray")}
end
import arrow.internal.proxy.validate
obj@arrow.array.Array(proxy);
end

function values = get.Values(obj)
valueStruct = obj.Proxy.getValues();
traits = arrow.type.traits.traits(arrow.type.ID(valueStruct.TypeID));
proxy = libmexclass.proxy.Proxy(Name=traits.ArrayProxyClassName, ID=valueStruct.ProxyID);
values = traits.ArrayConstructor(proxy);
end

function offsets = get.Offsets(obj)
proxyID = obj.Proxy.getOffsets();
proxy = libmexclass.proxy.Proxy(Name="arrow.array.proxy.Int32Array", ID=proxyID);
offsets = arrow.array.Int32Array(proxy);
end

function matlabArray = toMATLAB(obj)
numElements = obj.NumElements;
matlabArray = cell(numElements, 1);

values = toMATLAB(obj.Values);
% Add one to Offsets array because MATLAB
% uses 1-based indexing.
offsets = toMATLAB(obj.Offsets) + 1;

startIndex = offsets(1);
for ii = 1:numElements
% Subtract 1 because ending offset value is exclusive.
endIndex = offsets(ii + 1) - 1;
matlabArray{ii} = values(startIndex:endIndex, :);
startIndex = endIndex + 1;
end

hasInvalid = ~all(obj.Valid);
if hasInvalid
matlabArray(~obj.Valid) = {obj.NullSubstitutionValue};
end
end

end

methods (Static)

function array = fromArrays(offsets, values, opts)
arguments
offsets (1, 1) arrow.array.Int32Array
values (1, 1) arrow.array.Array
opts.Valid
end

import arrow.internal.validate.parseValid

if nargin < 2
error("arrow:array:list:FromArraysValuesAndOffsets", ...
"Must supply both an offsets and values array to construct a ListArray.")
end

% Offsets should contain one more element than the number of elements in the output ListArray.
numElements = offsets.NumElements - 1;

validElements = parseValid(opts, numElements);
offsetsProxyID = offsets.Proxy.ID;
valuesProxyID = values.Proxy.ID;

args = struct(...
OffsetsProxyID=offsetsProxyID, ...
ValuesProxyID=valuesProxyID, ...
Valid=validElements ...
);

proxyName = "arrow.array.proxy.ListArray";
proxy = arrow.internal.proxy.create(proxyName, args);
array = arrow.array.ListArray(proxy);
end

end

end
@@ -70,6 +70,13 @@
stringArray = arrow.array(strings);
arrowArrays{ii} = StructArray.fromArrays(timestampArray, stringArray);
matlabData{ii} = table(dates, strings, VariableNames=["Field1", "Field2"]);
elseif name == "arrow.array.ListArray"
offsets = arrow.array(int32(0:opts.NumRows));
numbers = randomNumbers("double", opts.NumRows);
matlabData{ii} = num2cell(numbers);
values = arrow.array(numbers);
listArray = ListArray.fromArrays(offsets, values);
arrowArrays{ii} = listArray;
else
error("arrow:test:SupportedArrayCase", ...
"Missing if-branch for array class " + name);
10 changes: 6 additions & 4 deletions matlab/src/matlab/+arrow/+type/+traits/ListTraits.m
@@ -16,15 +16,17 @@
classdef ListTraits < arrow.type.traits.TypeTraits

properties (Constant)
-ArrayConstructor = missing
-ArrayClassName = missing
-ArrayProxyClassName = missing
+ArrayConstructor = @arrow.array.ListArray
+ArrayClassName = "arrow.array.ListArray"
+ArrayProxyClassName = "arrow.array.proxy.ListArray"
ArrayStaticConstructor = missing
TypeConstructor = @arrow.type.ListType
TypeClassName = "arrow.type.ListType"
TypeProxyClassName = "arrow.type.proxy.ListType"
+% The cell function works differently than other
+% "type construction functions" in MATLAB.
MatlabConstructor = missing
-MatlabClassName = missing
+MatlabClassName = "cell"
end

end
2 changes: 1 addition & 1 deletion matlab/src/matlab/+arrow/array.m
@@ -62,4 +62,4 @@
if ~ischar(data)
data = convertCharsToStrings(data);
end
end
end