forked from apache/arrow
Commit
…ache#38357)

### Rationale for this change

Now that many of the commonly-used "primitive" array types have been added to the MATLAB interface, we can implement an `arrow.array.ListArray` class.

This pull request adds a new `arrow.array.ListArray` class which can be converted to a MATLAB `cell` array by calling the `toMATLAB` method.

### What changes are included in this PR?

1. Added a new `arrow.array.ListArray` MATLAB class.

*Methods*

`cellArray = arrow.array.ListArray.toMATLAB()`
`listArray = arrow.array.ListArray.fromArrays(offsets, values)`

*Properties*

`Offsets` - `Int32Array` list offsets (uses zero-based indexing)
`Values` - Array of values in the list (supports nesting)

2. Added a new `arrow.type.traits.ListTraits` MATLAB class.

**Example**

```matlab
>> offsets = arrow.array(int32([0, 2, 3, 7]))

offsets =

[
  0,
  2,
  3,
  7
]

>> values = arrow.array(["A", "B", "C", "D", "E", "F", "G"])

values =

[
  "A",
  "B",
  "C",
  "D",
  "E",
  "F",
  "G"
]

>> arrowArray = arrow.array.ListArray.fromArrays(offsets, values)

arrowArray =

[
  [
    "A",
    "B"
  ],
  [
    "C"
  ],
  [
    "D",
    "E",
    "F",
    "G"
  ]
]

>> matlabArray = arrowArray.toMATLAB()

matlabArray =

  3x1 cell array

    {2x1 string}
    {["C"     ]}
    {4x1 string}

>> matlabArray{:}

ans =

  2x1 string array

    "A"
    "B"

ans =

    "C"

ans =

  4x1 string array

    "D"
    "E"
    "F"
    "G"
```

### Are these changes tested?

Yes.

1. Added a new `tListArray.m` test class.
2. Added a new `tListTraits.m` test class.
3. Updated `arrow.internal.test.tabular.createAllSupportedArrayTypes` to include `ListArray`.

### Are there any user-facing changes?

Yes.

1. Users can now create an `arrow.array.ListArray` from an `offsets` and `values` array by calling the static `arrow.array.ListArray.fromArrays(offsets, values)` method. `ListArray`s can be converted into MATLAB `cell` arrays by calling the `toMATLAB` method.

### Notes

1. We chose to use the "missing-class" `missing` value as the `NullSubstitutionValue` for the time being for `ListArray`. However, we eventually want to add `arrow.array.NullArray`, and will most likely want to use the "missing-class" `missing` value to represent `NullArray` values in MATLAB. So, this could cause some ambiguity in the future. We have been thinking about whether we should consider introducing some sort of special "sentinel value" to represent null values when converting to MATLAB `cell` arrays. Perhaps something like `arrow.Null`, or something to that effect, in order to avoid this ambiguity. If we think it makes sense to do that, we may want to retroactively change the `NullSubstitutionValue` to be `arrow.Null` and break compatibility. Since we are still in pre-`0.1`, we don't think the impact of such a behavior change would be very large.
2. Implementing `ListArray` is fairly involved. So, in the spirit of incremental delivery, we chose not to include an implementation of `arrow.array.ListArray.fromMATLAB` in this initial pull request. We plan on following up with some more changes to `arrow.array.ListArray`. See apache#38353, apache#38354, and apache#38361.
3. Thank you @sgilmore10 for your help with this pull request!

### Future Directions

1. apache#38353
2. apache#38354
3. apache#38361
4. Consider adding a null sentinel value like `arrow.Null` for conversion to MATLAB `cell` arrays.

* Closes: apache#37815

Lead-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
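
The offsets convention in the commit message's example (zero-based, with each list element spanning a half-open range of the values array) can be illustrated with a small standalone C++ sketch. This is not the Arrow API: `slice_by_offsets` is a hypothetical helper that uses only the standard library, and it assumes element `i` covers `[offsets[i], offsets[i + 1])`, which matches the example where offsets `0, 2, 3, 7` produce lists of length 2, 1, and 4.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper (not part of Arrow): split a flat values vector into
// list elements using zero-based offsets, where element i spans the
// half-open range [offsets[i], offsets[i + 1]).
std::vector<std::vector<std::string>> slice_by_offsets(
    const std::vector<int32_t>& offsets,
    const std::vector<std::string>& values) {
    std::vector<std::vector<std::string>> lists;
    // N list elements need N + 1 offsets, so fewer than two offsets
    // means there are no list elements.
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
        const int32_t begin = offsets[i];
        const int32_t end = offsets[i + 1];
        // Offsets must be non-decreasing and stay within the values.
        assert(0 <= begin && begin <= end);
        assert(static_cast<std::size_t>(end) <= values.size());
        lists.emplace_back(values.begin() + begin, values.begin() + end);
    }
    return lists;
}
```

Calling `slice_by_offsets({0, 2, 3, 7}, {"A", "B", "C", "D", "E", "F", "G"})` reproduces the three lists shown in the MATLAB example.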
1 parent 1d11df3 commit 3beb93a
Showing 12 changed files with 444 additions and 9 deletions.
@@ -0,0 +1,103 @@

```cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

#include "arrow/matlab/array/proxy/list_array.h"
#include "arrow/matlab/array/proxy/numeric_array.h"
#include "arrow/matlab/array/proxy/wrap.h"
#include "arrow/matlab/error/error.h"
#include "libmexclass/proxy/ProxyManager.h"

namespace arrow::matlab::array::proxy {

    ListArray::ListArray(std::shared_ptr<arrow::ListArray> list_array) : proxy::Array{std::move(list_array)} {
        REGISTER_METHOD(ListArray, getValues);
        REGISTER_METHOD(ListArray, getOffsets);
    }

    libmexclass::proxy::MakeResult ListArray::make(const libmexclass::proxy::FunctionArguments& constructor_arguments) {
        namespace mda = ::matlab::data;
        using libmexclass::proxy::ProxyManager;
        using Int32ArrayProxy = arrow::matlab::array::proxy::NumericArray<arrow::Int32Type>;
        using ListArrayProxy = arrow::matlab::array::proxy::ListArray;
        using ArrayProxy = arrow::matlab::array::proxy::Array;

        mda::StructArray opts = constructor_arguments[0];
        const mda::TypedArray<uint64_t> offsets_proxy_id_mda = opts[0]["OffsetsProxyID"];
        const mda::TypedArray<uint64_t> values_proxy_id_mda = opts[0]["ValuesProxyID"];
        const mda::TypedArray<bool> validity_bitmap_mda = opts[0]["Valid"];

        const auto offsets_proxy_id = offsets_proxy_id_mda[0];
        const auto values_proxy_id = values_proxy_id_mda[0];

        const auto offsets_proxy = std::static_pointer_cast<Int32ArrayProxy>(ProxyManager::getProxy(offsets_proxy_id));
        const auto values_proxy = std::static_pointer_cast<ArrayProxy>(ProxyManager::getProxy(values_proxy_id));

        const auto offsets = offsets_proxy->unwrap();
        const auto values = values_proxy->unwrap();

        // Pack the validity bitmap values.
        MATLAB_ASSIGN_OR_ERROR(auto validity_bitmap_buffer,
                               bit::packValid(validity_bitmap_mda),
                               error::BITPACK_VALIDITY_BITMAP_ERROR_ID);

        // Create a ListArray from values and offsets.
        MATLAB_ASSIGN_OR_ERROR(auto array,
                               arrow::ListArray::FromArrays(*offsets, *values, arrow::default_memory_pool(), validity_bitmap_buffer),
                               error::LIST_ARRAY_FROM_ARRAYS_FAILED);

        // Return a ListArray Proxy.
        auto list_array = std::static_pointer_cast<arrow::ListArray>(array);
        return std::make_shared<ListArrayProxy>(std::move(list_array));
    }

    void ListArray::getValues(libmexclass::proxy::method::Context& context) {
        namespace mda = ::matlab::data;
        using libmexclass::proxy::ProxyManager;

        auto list_array = std::static_pointer_cast<arrow::ListArray>(array);
        auto value_array = list_array->values();

        // Wrap the array within a proxy object if possible.
        MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(auto value_array_proxy,
                                            proxy::wrap(value_array),
                                            context, error::UNKNOWN_PROXY_FOR_ARRAY_TYPE);
        const auto value_array_proxy_id = ProxyManager::manageProxy(value_array_proxy);
        const auto type_id = value_array->type_id();

        // Return a struct with two fields: ProxyID and TypeID. The MATLAB
        // layer will use these values to construct the appropriate MATLAB
        // arrow.array.Array subclass.
        mda::ArrayFactory factory;
        mda::StructArray output = factory.createStructArray({1, 1}, {"ProxyID", "TypeID"});
        output[0]["ProxyID"] = factory.createScalar(value_array_proxy_id);
        output[0]["TypeID"] = factory.createScalar(static_cast<int32_t>(type_id));
        context.outputs[0] = output;
    }

    void ListArray::getOffsets(libmexclass::proxy::method::Context& context) {
        namespace mda = ::matlab::data;
        using libmexclass::proxy::ProxyManager;
        using Int32ArrayProxy = arrow::matlab::array::proxy::NumericArray<arrow::Int32Type>;
        auto list_array = std::static_pointer_cast<arrow::ListArray>(array);
        auto offsets_array = list_array->offsets();
        auto offsets_int32_array = std::static_pointer_cast<arrow::Int32Array>(offsets_array);
        auto offsets_int32_array_proxy = std::make_shared<Int32ArrayProxy>(offsets_int32_array);
        const auto offsets_int32_array_proxy_id = ProxyManager::manageProxy(offsets_int32_array_proxy);
        mda::ArrayFactory factory;
        context.outputs[0] = factory.createScalar(offsets_int32_array_proxy_id);
    }
}
```
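
In `make`, the MATLAB logical `Valid` array is packed into a validity bitmap before `arrow::ListArray::FromArrays` is called. The implementation of `bit::packValid` is not part of this diff, so the following is only a standalone sketch of the general idea, assuming Arrow's least-significant-bit-first bitmap layout (bit `i` lives in byte `i / 8` at position `i % 8`). The names `pack_valid` and `is_valid` are hypothetical and use only the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the packing step: one bit per element,
// least-significant bit first within each byte. Trailing padding bits
// in the last byte are left as zero.
std::vector<uint8_t> pack_valid(const std::vector<bool>& valid) {
    std::vector<uint8_t> bitmap((valid.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < valid.size(); ++i) {
        if (valid[i]) {
            bitmap[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
        }
    }
    return bitmap;
}

// Read bit i back out of a packed bitmap.
bool is_valid(const std::vector<uint8_t>& bitmap, std::size_t i) {
    assert(i / 8 < bitmap.size());
    return ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}
```

Packing eight flags into one byte is what lets the validity information travel as a compact buffer alongside the offsets and values.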
@@ -0,0 +1,38 @@

```cpp
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

#pragma once

#include "arrow/matlab/array/proxy/array.h"

namespace arrow::matlab::array::proxy {

class ListArray : public arrow::matlab::array::proxy::Array {

    public:
        ListArray(std::shared_ptr<arrow::ListArray> list_array);
        ~ListArray() {}

        static libmexclass::proxy::MakeResult make(const libmexclass::proxy::FunctionArguments& constructor_arguments);

    protected:
        void getValues(libmexclass::proxy::method::Context& context);
        void getOffsets(libmexclass::proxy::method::Context& context);

};

}
```
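
The header declares `getValues` and `getOffsets` as protected methods, and the constructor registers them with `REGISTER_METHOD` so the MATLAB side can call them by name (for example `obj.Proxy.getValues()`). How libmexclass implements that registration is not shown in this diff. The sketch below is a standalone illustration of name-based dispatch in general, under the assumption that registration stores a callable keyed by the method name. `Context`, `ProxySketch`, and `ListArraySketch` are hypothetical types, not libmexclass classes.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical call context: a single output slot filled in by a method.
struct Context {
    std::string output;
};

// Minimal sketch of a proxy base class that dispatches methods by name.
class ProxySketch {
public:
    virtual ~ProxySketch() = default;

    // Look up a registered method by name and run it.
    void invoke(const std::string& name, Context& context) {
        const auto it = methods_.find(name);
        if (it == methods_.end()) {
            throw std::invalid_argument("unknown method: " + name);
        }
        it->second(context);
    }

protected:
    // Store a callable under the given method name.
    void register_method(const std::string& name,
                         std::function<void(Context&)> fn) {
        assert(fn != nullptr);
        methods_[name] = std::move(fn);
    }

private:
    std::map<std::string, std::function<void(Context&)>> methods_;
};

// Subclass that registers its protected methods in the constructor.
class ListArraySketch : public ProxySketch {
public:
    ListArraySketch() {
        register_method("getValues", [this](Context& c) { getValues(c); });
        register_method("getOffsets", [this](Context& c) { getOffsets(c); });
    }

protected:
    void getValues(Context& context) { context.output = "values"; }
    void getOffsets(Context& context) { context.output = "offsets"; }
};
```

Keeping the methods protected while exposing them only through the name lookup means callers go through a single entry point, which is one plausible reason for the access level chosen in the header.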
@@ -0,0 +1,111 @@

```matlab
% Licensed to the Apache Software Foundation (ASF) under one or more
% contributor license agreements.  See the NOTICE file distributed with
% this work for additional information regarding copyright ownership.
% The ASF licenses this file to you under the Apache License, Version
% 2.0 (the "License"); you may not use this file except in compliance
% with the License.  You may obtain a copy of the License at
%
%   http://www.apache.org/licenses/LICENSE-2.0
%
% Unless required by applicable law or agreed to in writing, software
% distributed under the License is distributed on an "AS IS" BASIS,
% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
% implied.  See the License for the specific language governing
% permissions and limitations under the License.

classdef ListArray < arrow.array.Array

    properties (Hidden, GetAccess=public, SetAccess=private)
        NullSubstitutionValue = missing;
    end

    properties (Dependent, GetAccess=public, SetAccess=private)
        Values
        Offsets
    end

    methods

        function obj = ListArray(proxy)
            arguments
                proxy(1, 1) libmexclass.proxy.Proxy {validate(proxy, "arrow.array.proxy.ListArray")}
            end
            import arrow.internal.proxy.validate
            obj@arrow.array.Array(proxy);
        end

        function values = get.Values(obj)
            valueStruct = obj.Proxy.getValues();
            traits = arrow.type.traits.traits(arrow.type.ID(valueStruct.TypeID));
            proxy = libmexclass.proxy.Proxy(Name=traits.ArrayProxyClassName, ID=valueStruct.ProxyID);
            values = traits.ArrayConstructor(proxy);
        end

        function offsets = get.Offsets(obj)
            proxyID = obj.Proxy.getOffsets();
            proxy = libmexclass.proxy.Proxy(Name="arrow.array.proxy.Int32Array", ID=proxyID);
            offsets = arrow.array.Int32Array(proxy);
        end

        function matlabArray = toMATLAB(obj)
            numElements = obj.NumElements;
            matlabArray = cell(numElements, 1);

            values = toMATLAB(obj.Values);
            % Add one to Offsets array because MATLAB
            % uses 1-based indexing.
            offsets = toMATLAB(obj.Offsets) + 1;

            startIndex = offsets(1);
            for ii = 1:numElements
                % Subtract 1 because ending offset value is exclusive.
                endIndex = offsets(ii + 1) - 1;
                matlabArray{ii} = values(startIndex:endIndex, :);
                startIndex = endIndex + 1;
            end

            hasInvalid = ~all(obj.Valid);
            if hasInvalid
                matlabArray(~obj.Valid) = {obj.NullSubstitutionValue};
            end
        end

    end

    methods (Static)

        function array = fromArrays(offsets, values, opts)
            arguments
                offsets (1, 1) arrow.array.Int32Array
                values (1, 1) arrow.array.Array
                opts.Valid
            end

            import arrow.internal.validate.parseValid

            if nargin < 2
                error("arrow:array:list:FromArraysValuesAndOffsets", ...
                    "Must supply both an offsets and values array to construct a ListArray.")
            end

            % Offsets should contain one more element than the number of elements in the output ListArray.
            numElements = offsets.NumElements - 1;

            validElements = parseValid(opts, numElements);
            offsetsProxyID = offsets.Proxy.ID;
            valuesProxyID = values.Proxy.ID;

            args = struct(...
                OffsetsProxyID=offsetsProxyID, ...
                ValuesProxyID=valuesProxyID, ...
                Valid=validElements ...
            );

            proxyName = "arrow.array.proxy.ListArray";
            proxy = arrow.internal.proxy.create(proxyName, args);
            array = arrow.array.ListArray(proxy);
        end

    end

end
```
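
The index arithmetic in `toMATLAB` combines two adjustments: adding 1 to move from zero-based to one-based indexing, and subtracting 1 from each ending offset because Arrow's end offsets are exclusive while MATLAB's `start:end` ranges are inclusive. The following standalone C++ sketch mirrors that loop so the two adjustments can be checked in isolation. `one_based_ranges` is a hypothetical helper written for illustration, not part of the Arrow MATLAB interface.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper mirroring the loop in toMATLAB: convert zero-based,
// end-exclusive offsets into one-based, end-inclusive (start, end) pairs.
// An empty list element shows up as end == start - 1, which corresponds
// to an empty start:end range in MATLAB.
std::vector<std::pair<int32_t, int32_t>> one_based_ranges(
    const std::vector<int32_t>& offsets) {
    std::vector<std::pair<int32_t, int32_t>> ranges;
    if (offsets.empty()) {
        return ranges;
    }
    // Shift the first offset to one-based indexing.
    int32_t start = offsets[0] + 1;
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
        assert(offsets[i] <= offsets[i + 1]);
        // Add 1 for one-based indexing, then subtract 1 because the
        // ending offset is exclusive.
        const int32_t end = (offsets[i + 1] + 1) - 1;
        ranges.emplace_back(start, end);
        start = end + 1;
    }
    return ranges;
}
```

For offsets `0, 2, 3, 7` this yields the ranges `1:2`, `3:3`, and `4:7`, which select `"A","B"`, `"C"`, and `"D".."G"` from the values in the commit message's example.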
@@ -62,4 +62,4 @@

```matlab
if ~ischar(data)
    data = convertCharsToStrings(data);
end
end
end
```