forked from apache/arrow
Commit
…pache#37806)

### Rationale for this change

Now that many of the commonly used "primitive" array types have been added to the MATLAB Interface, we can implement the `arrow.array.StructArray` class.

### What changes are included in this PR?

Added the `arrow.array.StructArray` MATLAB class.

*Methods* of `arrow.array.StructArray` include:

- `fromArrays(arrays, nvpairs)`
- `field(i)` -> get the field identified by `i` as an `arrow.array.Array`. `i` can be a positive integer or a field name.
- `toMATLAB()` -> convert to a MATLAB `table`
- `table()` -> convert to a MATLAB `table`

*Properties* of `arrow.array.StructArray` include:

- `Type`
- `Length`
- `NumFields`
- `FieldNames`
- `Valid`

**Example Usage**

```matlab
>> a = arrow.array([1, 2, 3, 4]);
>> b = arrow.array(["A", "B", "C", "D"]);
>> s = arrow.array.StructArray.fromArrays(a, b, FieldNames=["A", "B"])

s =

-- is_valid: all not null
-- child 0 type: double
  [
    1,
    2,
    3,
    4
  ]
-- child 1 type: string
  [
    "A",
    "B",
    "C",
    "D"
  ]

% Convert StructArray to a MATLAB table
>> t = toMATLAB(s)

t =

  4×2 table

    A     B
    _    ___

    1    "A"
    2    "B"
    3    "C"
    4    "D"
```

### Are these changes tested?

Yes. Added a new test class `tStructArray.m`.

### Are there any user-facing changes?

Yes. Users can now construct an `arrow.array.StructArray` instance.

### Notes

1. Although [`struct`](https://www.mathworks.com/help/matlab/ref/struct.html) is a MATLAB datatype, `StructArray`'s `toMATLAB` method returns a MATLAB `table`. We went with this design because the layout of MATLAB `table`s more closely resembles that of `StructArray`s: a MATLAB `table` ensures a consistent schema, and its data is laid out in a columnar format. In a future PR, we plan on adding a `struct` method to `StructArray`, which will return a MATLAB `struct` array.
2. I removed the virtual `toMATLAB` method from `proxy::Array` because the nested array MATLAB classes will implement their `toMATLAB` methods by invoking the `toMATLAB` method on their field arrays. There's no need for the C++ proxy classes of nested arrays to have a `toMATLAB` method.

### Future Directions

1. Add a `fromMATLAB` static method to create `StructArray`s from MATLAB `table`s and MATLAB `struct` arrays.
2. Add a `fromTable` static method to create `StructArray`s from `arrow.tabular.Table`s.
3. Add a `fromRecordBatch` static method to create `StructArray`s from `arrow.tabular.RecordBatch`s.

* Closes: apache#37653

Authored-by: Sarah Gilmore <[email protected]>
Signed-off-by: Kevin Gurney <[email protected]>
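The `field(i)` behaviour described above (select a field by positive integer or by name) can be illustrated with a small, library-free sketch. This is not the code from this PR: `ColumnarStruct`, `FieldSelector`, and `resolveField` are hypothetical names, and every column is simplified to hold doubles. It assumes C++17 and only shows the lookup rule: a 1-based position is range-checked and shifted to a 0-based index, while a name is matched against the field names.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a struct array: one column per field,
// stored side by side (a columnar layout, like a MATLAB table).
struct ColumnarStruct {
    std::vector<std::string> field_names;
    std::vector<std::vector<double>> columns;  // simplified: every field holds doubles
};

// A selector is either a 1-based position (MATLAB convention) or a field name.
using FieldSelector = std::variant<std::size_t, std::string>;

// Resolve a selector to a 0-based column index. Returns std::nullopt when the
// position is out of the range [1, num_fields] or the name is not found.
std::optional<std::size_t> resolveField(const ColumnarStruct& s,
                                        const FieldSelector& selector) {
    assert(s.field_names.size() == s.columns.size());
    if (const auto* position = std::get_if<std::size_t>(&selector)) {
        if (*position < 1 || *position > s.columns.size()) {
            return std::nullopt;
        }
        return *position - 1;  // convert from 1-based to 0-based
    }
    const auto& name = std::get<std::string>(selector);
    for (std::size_t i = 0; i < s.field_names.size(); ++i) {
        if (s.field_names[i] == name) {
            return i;
        }
    }
    return std::nullopt;
}
```

Returning `std::optional` keeps "not found" distinct from a valid index; the proxy code in this commit reports the same conditions through error IDs instead.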
1 parent fc052a0, commit fe278f0
Showing 41 changed files with 803 additions and 85 deletions.
matlab/src/cpp/arrow/matlab/array/proxy/struct_array.cc (199 changes: 199 additions & 0 deletions)
@@ -0,0 +1,199 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

#include "arrow/matlab/array/proxy/struct_array.h"
#include "arrow/matlab/array/proxy/wrap.h"
#include "arrow/matlab/bit/pack.h"
#include "arrow/matlab/error/error.h"
#include "arrow/matlab/index/validate.h"

#include "arrow/util/utf8.h"

#include "libmexclass/proxy/ProxyManager.h"

namespace arrow::matlab::array::proxy {

StructArray::StructArray(std::shared_ptr<arrow::StructArray> struct_array)
    : proxy::Array{std::move(struct_array)} {
    REGISTER_METHOD(StructArray, getNumFields);
    REGISTER_METHOD(StructArray, getFieldByIndex);
    REGISTER_METHOD(StructArray, getFieldByName);
    REGISTER_METHOD(StructArray, getFieldNames);
}

libmexclass::proxy::MakeResult StructArray::make(const libmexclass::proxy::FunctionArguments& constructor_arguments) {
    namespace mda = ::matlab::data;
    using libmexclass::proxy::ProxyManager;

    mda::StructArray opts = constructor_arguments[0];
    const mda::TypedArray<uint64_t> arrow_array_proxy_ids = opts[0]["ArrayProxyIDs"];
    const mda::StringArray field_names_mda = opts[0]["FieldNames"];
    const mda::TypedArray<bool> validity_bitmap_mda = opts[0]["Valid"];

    std::vector<std::shared_ptr<arrow::Array>> arrow_arrays;
    arrow_arrays.reserve(arrow_array_proxy_ids.getNumberOfElements());

    // Retrieve all of the Arrow Array Proxy instances from the libmexclass ProxyManager.
    for (const auto& arrow_array_proxy_id : arrow_array_proxy_ids) {
        auto proxy = ProxyManager::getProxy(arrow_array_proxy_id);
        auto arrow_array_proxy = std::static_pointer_cast<proxy::Array>(proxy);
        auto arrow_array = arrow_array_proxy->unwrap();
        arrow_arrays.push_back(arrow_array);
    }

    // Convert the utf-16 encoded field names into utf-8 encoded strings
    std::vector<std::string> field_names;
    field_names.reserve(field_names_mda.getNumberOfElements());
    for (const auto& field_name : field_names_mda) {
        const auto field_name_utf16 = std::u16string(field_name);
        MATLAB_ASSIGN_OR_ERROR(const auto field_name_utf8,
                               arrow::util::UTF16StringToUTF8(field_name_utf16),
                               error::UNICODE_CONVERSION_ERROR_ID);
        field_names.push_back(field_name_utf8);
    }

    // Pack the validity bitmap values.
    MATLAB_ASSIGN_OR_ERROR(auto validity_bitmap_buffer,
                           bit::packValid(validity_bitmap_mda),
                           error::BITPACK_VALIDITY_BITMAP_ERROR_ID);

    // Create the StructArray
    MATLAB_ASSIGN_OR_ERROR(auto array,
                           arrow::StructArray::Make(arrow_arrays, field_names, validity_bitmap_buffer),
                           error::STRUCT_ARRAY_MAKE_FAILED);

    // Construct the StructArray Proxy
    auto struct_array = std::static_pointer_cast<arrow::StructArray>(array);
    return std::make_shared<proxy::StructArray>(std::move(struct_array));
}

void StructArray::getNumFields(libmexclass::proxy::method::Context& context) {
    namespace mda = ::matlab::data;

    mda::ArrayFactory factory;
    const auto num_fields = array->type()->num_fields();
    context.outputs[0] = factory.createScalar(num_fields);
}

void StructArray::getFieldByIndex(libmexclass::proxy::method::Context& context) {
    namespace mda = ::matlab::data;
    using namespace libmexclass::proxy;

    mda::StructArray args = context.inputs[0];
    const mda::TypedArray<int32_t> index_mda = args[0]["Index"];
    const auto matlab_index = int32_t(index_mda[0]);

    auto struct_array = std::static_pointer_cast<arrow::StructArray>(array);

    const auto num_fields = struct_array->type()->num_fields();

    // Validate there is at least 1 field
    MATLAB_ERROR_IF_NOT_OK_WITH_CONTEXT(
        index::validateNonEmptyContainer(num_fields),
        context, error::INDEX_EMPTY_CONTAINER);

    // Validate the matlab index provided is within the range [1, num_fields]
    MATLAB_ERROR_IF_NOT_OK_WITH_CONTEXT(
        index::validateInRange(matlab_index, num_fields),
        context, error::INDEX_OUT_OF_RANGE);

    // Note: MATLAB uses 1-based indexing, so subtract 1.
    const int32_t index = matlab_index - 1;

    auto field_array = struct_array->field(index);

    // Wrap the array within a proxy object if possible.
    MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(auto field_array_proxy,
                                        proxy::wrap(field_array),
                                        context, error::UNKNOWN_PROXY_FOR_ARRAY_TYPE);
    const auto field_array_proxy_id = ProxyManager::manageProxy(field_array_proxy);
    const auto type_id = field_array->type_id();

    // Return a struct with two fields: ProxyID and TypeID. The MATLAB
    // layer will use these values to construct the appropriate MATLAB
    // arrow.array.Array subclass.
    mda::ArrayFactory factory;
    mda::StructArray output = factory.createStructArray({1, 1}, {"ProxyID", "TypeID"});
    output[0]["ProxyID"] = factory.createScalar(field_array_proxy_id);
    output[0]["TypeID"] = factory.createScalar(static_cast<int32_t>(type_id));
    context.outputs[0] = output;
}

void StructArray::getFieldByName(libmexclass::proxy::method::Context& context) {
    namespace mda = ::matlab::data;
    using libmexclass::proxy::ProxyManager;

    mda::StructArray args = context.inputs[0];

    const mda::StringArray name_mda = args[0]["Name"];
    const auto name_utf16 = std::u16string(name_mda[0]);
    MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(const auto name,
                                        arrow::util::UTF16StringToUTF8(name_utf16),
                                        context, error::UNICODE_CONVERSION_ERROR_ID);

    auto struct_array = std::static_pointer_cast<arrow::StructArray>(array);
    auto field_array = struct_array->GetFieldByName(name);
    if (!field_array) {
        // Return an error if we could not query the field by name.
        const auto msg = "Could not find field named " + name + ".";
        context.error = libmexclass::error::Error{
            error::ARROW_TABULAR_SCHEMA_AMBIGUOUS_FIELD_NAME, msg};
        return;
    }

    // Wrap the array within a proxy object if possible.
    MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(auto field_array_proxy,
                                        proxy::wrap(field_array),
                                        context, error::UNKNOWN_PROXY_FOR_ARRAY_TYPE);
    const auto field_array_proxy_id = ProxyManager::manageProxy(field_array_proxy);
    const auto type_id = field_array->type_id();

    // Return a struct with two fields: ProxyID and TypeID. The MATLAB
    // layer will use these values to construct the appropriate MATLAB
    // arrow.array.Array subclass.
    mda::ArrayFactory factory;
    mda::StructArray output = factory.createStructArray({1, 1}, {"ProxyID", "TypeID"});
    output[0]["ProxyID"] = factory.createScalar(field_array_proxy_id);
    output[0]["TypeID"] = factory.createScalar(static_cast<int32_t>(type_id));
    context.outputs[0] = output;
}

void StructArray::getFieldNames(libmexclass::proxy::method::Context& context) {
    namespace mda = ::matlab::data;

    const auto& fields = array->type()->fields();
    const auto num_fields = fields.size();
    std::vector<mda::MATLABString> names;
    names.reserve(num_fields);

    for (size_t i = 0; i < num_fields; ++i) {
        auto str_utf8 = fields[i]->name();

        // MATLAB strings are UTF-16 encoded. Must convert UTF-8
        // encoded field names before returning to MATLAB.
        MATLAB_ASSIGN_OR_ERROR_WITH_CONTEXT(auto str_utf16,
                                            arrow::util::UTF8StringToUTF16(str_utf8),
                                            context, error::UNICODE_CONVERSION_ERROR_ID);
        const mda::MATLABString matlab_string = mda::MATLABString(std::move(str_utf16));
        names.push_back(matlab_string);
    }

    mda::ArrayFactory factory;
    context.outputs[0] = factory.createArray({1, num_fields}, names.begin(), names.end());
}
}
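`StructArray::make` above packs the logical `Valid` array into a validity bitmap via `bit::packValid` before calling `arrow::StructArray::Make`. The helper's implementation is not shown in this diff, so the following is only a standard-library sketch of the general technique, not the actual Arrow or MATLAB interface code. `packValidity` and `isValid` are hypothetical names; the one property taken from Arrow itself is the least-significant-bit-first ordering of validity bitmaps.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Pack one bool per element into an LSB-first bitmap: element i is stored in
// byte i / 8 at bit position i % 8. Unused trailing bits are left as zero.
std::vector<std::uint8_t> packValidity(const std::vector<bool>& valid) {
    std::vector<std::uint8_t> bitmap((valid.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < valid.size(); ++i) {
        if (valid[i]) {
            bitmap[i / 8] |= static_cast<std::uint8_t>(1u << (i % 8));
        }
    }
    return bitmap;
}

// Read element i back out of a packed bitmap.
bool isValid(const std::vector<std::uint8_t>& bitmap, std::size_t i) {
    assert(i / 8 < bitmap.size());
    return ((bitmap[i / 8] >> (i % 8)) & 1u) != 0;
}
```

For example, packing `{true, false, true, true}` sets bits 0, 2, and 3 of the first byte, giving `0x0D`. Eight elements fit in one byte, so a nine-element input needs two bytes.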
@@ -0,0 +1,44 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.

#pragma once

#include "arrow/matlab/array/proxy/array.h"

namespace arrow::matlab::array::proxy {

class StructArray : public arrow::matlab::array::proxy::Array {
    public:
        StructArray(std::shared_ptr<arrow::StructArray> struct_array);

        ~StructArray() {}

        static libmexclass::proxy::MakeResult make(const libmexclass::proxy::FunctionArguments& constructor_arguments);

    protected:

        void getNumFields(libmexclass::proxy::method::Context& context);

        void getFieldByIndex(libmexclass::proxy::method::Context& context);

        void getFieldByName(libmexclass::proxy::method::Context& context);

        void getFieldNames(libmexclass::proxy::method::Context& context);

};

}
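The proxy class above declares its methods as `protected` members and registers them in the constructor with `REGISTER_METHOD`, so that the MATLAB layer can call them by name. The diff does not show how libmexclass implements that macro; the sketch below is only one plausible shape for such a mechanism, assuming a name-to-callable table. `ProxyBase`, `StructProxy`, `registerMethod`, `invoke`, and the simplified `Context` are all hypothetical and are not libmexclass APIs.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical call context, reduced to one integer output and an error message.
struct Context {
    int output = 0;
    std::string error;
};

class ProxyBase {
  public:
    ProxyBase() = default;
    virtual ~ProxyBase() = default;
    // Registered callables capture `this`, so copying would leave them
    // pointing at the original object; forbid it.
    ProxyBase(const ProxyBase&) = delete;
    ProxyBase& operator=(const ProxyBase&) = delete;

    // Look up a method by name and call it; unknown names are reported
    // through the context rather than by throwing.
    void invoke(const std::string& name, Context& context) {
        const auto it = methods_.find(name);
        if (it == methods_.end()) {
            context.error = "Unknown method: " + name;
            return;
        }
        it->second(context);
    }

  protected:
    void registerMethod(const std::string& name, std::function<void(Context&)> fn) {
        assert(fn && "registered method must be callable");
        methods_[name] = std::move(fn);
    }

  private:
    std::unordered_map<std::string, std::function<void(Context&)>> methods_;
};

class StructProxy : public ProxyBase {
  public:
    explicit StructProxy(int num_fields) : num_fields_(num_fields) {
        // Plays the role that REGISTER_METHOD plays in the real constructor.
        registerMethod("getNumFields", [this](Context& c) { getNumFields(c); });
    }

  protected:
    void getNumFields(Context& context) { context.output = num_fields_; }

  private:
    int num_fields_;
};
```

Keeping the methods `protected` and exposing only `invoke` means callers can reach them solely through the name-based dispatch, which mirrors how the header above hides `getNumFields` and its siblings.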