-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-12730: [MATLAB] Update featherreadmex and featherwritemex to build against latest Arrow C++ APIs #10305
Conversation
I'm not too familiar with Travis CI, but I believe the build failure failure on ARM may be unrelated to my changes and/or a pre-existing build issue. It looks like other failed jobs have experienced a similar issue on ARM as well. If there's anything I need to do, please let me know. |
Sorry for the long delay - I kicked off the other CI pipelines here. If you rebase, that'll let you re-run the ARM build - note it's been somewhat flaky but should have gotten better recently (we've switched to a newer machine type, IIRC). Also, did you have someone in mind to review this? If not we can find someone. |
Thanks for looking at this @lidavidm! I rebased against the master branch. Ideally, we would like to get reviewers outside of MathWorks. I'm not sure what the precedent is for new language bindings in terms of finding the right reviewers. We understand it takes a fair bit of time and effort to review pull requests, so let us know if there's anything we can do to help. Best, |
Hey, sorry for the delay (again) - I can try to review this once I get a chance, but you may also want to poke the mailing list ([email protected]) to see if someone else is interested. |
No worries! I just messaged the mailing list to see if anyone would be interested in reviewing this. Best, |
@@ -15,7 +15,8 @@ | |||
# specific language governing permissions and limitations | |||
# under the License. | |||
|
|||
cmake_minimum_required(VERSION 3.2) | |||
cmake_minimum_required(VERSION 3.20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems a bit high, is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We wanted to use 3.20 because earlier versions of the FindMatlab.cmake module had a few bugs we ran into when trying to build our mex functions. I'm open to other approaches, but the hope was to build on the existing work of the cmake community to avoid reinventing the wheel. We also want to share improvements upstream where appropriate.
matlab/src/feather_reader.cc
Outdated
|
||
// TableReader expects a RandomAccessFile. | ||
std::shared_ptr<io::RandomAccessFile> random_access_file(readable_file); | ||
std::shared_ptr<io::RandomAccessFile> random_access_file{ | ||
maybe_readable_file.ValueOrDie()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use ARROW_ASSIGN_OR_RAISE
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
matlab/src/feather_reader.cc
Outdated
RETURN_NOT_OK(maybe_reader); | ||
|
||
// Set the internal reader_ object. | ||
(*feather_reader)->reader_ = maybe_reader.ValueOrDie(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARROW_ASSIGN_OR_RAISE
?
matlab/src/feather_reader.cc
Outdated
arrow::Status status = reader_->Read(&table); | ||
if (!status.ok()) { | ||
mexErrMsgIdAndTxt("MATLAB:arrow:FeatherReader::FailedToReadTable", | ||
"Failed to read arrow::Table from Feather file."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you want to append the Status message? You might consider having a utility to adorn a status with additional Matlab-specific error message to make this pattern simpler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I'll update the error to include the status message.
matlab/src/feather_writer.cc
Outdated
arrow::Result<std::shared_ptr<ResizableBuffer>> maybe_buffer = | ||
arrow::AllocateResizableBuffer(internal::BitPackedLength(num_rows_)); | ||
RETURN_NOT_OK(maybe_buffer); | ||
std::shared_ptr<ResizableBuffer> validity_bitmap = maybe_buffer.ValueOrDie(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARROW_ASSIGN_OR_RAISE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
matlab/src/feather_writer.cc
Outdated
schema_builder.Finish(); | ||
RETURN_NOT_OK(table_schema_result); | ||
|
||
std::shared_ptr<arrow::Schema> table_schema = table_schema_result.ValueOrDie(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARROW_ASSIGN_OR_RAISE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
matlab/src/feather_writer.h
Outdated
#include <matrix.h> | ||
|
||
#include <memory> | ||
#include <string> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to move these here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. See my comment below. I'll move them back to the top.
matlab/src/featherreadmex.cc
Outdated
#include <mex.h> | ||
|
||
#include <string> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reason for this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was trying to reorganize the include statements in a way that made sense to me, but I'd rather go for consistency with the rest of the code base. I'll move the stdlib includes up top.
matlab/src/featherwritemex.cc
Outdated
#include <mex.h> | ||
|
||
#include <string> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll move the include statement to the top.
Thanks for the feedback @wesm! I'll make some changes and push to the pr once I'm done. Best, |
Just pushed changes based on the code review feedback. |
Sorry. This is conflicted caused by #10571. Could you rebase on the master? |
Sorry about that. Just rebased on master. |
Thanks. Don't worry. It's not a your problem. |
LINK_TO | ||
arrow_shared) | ||
|
||
# Ensure the MEX binaries are placed in the src directory on all platforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just wondering. Why do we need to place the binaries in the source directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to execute the MEX function, it has to be discoverable on the MATLAB search path. We also have MATLAB code (featherread.m and featherwrite.m) that we also need to add to the MATLAB search path. Since we need to add the source directory to the path anyway, I thought it makes sense to put the MEX files there as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I understand. ( https://github.com/apache/arrow/blob/master/matlab/test/tfeathermex.m#L26 is the code that adds the source directory to the MATLAB search path.)
Generally, we should not change anything files in the source directory with out-of-source build. Can we also add the build directory to the MATLAB search path? Is it difficult?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could add the build folder to the MATLAB search path. However, this will make harder to run our unit tests automatically because we would require the user to explicitly tell us where their build files are located every time.
Additionally, as the MATLAB interface grows, we will have many MEX files. In order to keep things organized. We see two approaches:
- We can encode the relationship of MEX files to MATLAB classes via the MEX file name itself. For example, arrow_array_new.mexw64.
or
- The other option is to encode the relationship between the two via the MATLAB packaging mechanism. For example, the folder structure +matlab/+array/+mex exposes a package called matlab.array.mex to MATLAB. If we choose this option, we can reuse common names, such as "new", and keep the names short.
Option 2 seems more scalable, but if you're experience tells us this will lead to issues, we can revisit adding the build folder to the path.
Best,
Sarah
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for describing this.
Could you try one of them (or both)? If we can find a better approach, we can choose it.
It can be done in this pull request or a follow up task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we can definitely investigate both approaches and see which one is preferable. I created a jira task to look into this in a future pull request.
Co-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
Co-authored-by: Kevin Gurney <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did not detect the build failures due to the lack of CI integration. We hope to add CI support soon and will follow up with a mailing list discussion to talk through the details.
I hope that this is the next task. :-)
LINK_TO | ||
arrow_shared) | ||
|
||
# Ensure the MEX binaries are placed in the src directory on all platforms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
I understand. ( https://github.com/apache/arrow/blob/master/matlab/test/tfeathermex.m#L26 is the code that adds the source directory to the MATLAB search path.)
Generally, we should not change anything files in the source directory with out-of-source build. Can we also add the build directory to the MATLAB search path? Is it difficult?
matlab/src/feather_reader.cc
Outdated
std::static_pointer_cast<ArrowArrayType>(column); | ||
|
||
// Get a raw pointer to the Arrow array data. | ||
const MatlabType* source = integer_array->raw_values(); | ||
const MatlabType* source = | ||
reinterpret_cast<const MatlabType*>(arrow_numeric_array->values()->data()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this work with sliced array?
What is the problem of raw_values()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the documentation, it looks like the values() method doesn't account for slice offsets. I think we modified this line when we were trying to get the code to compile again, but didn't check to see if this change was really necessary. I just tried using raw_values() instead and it works. I'll undo this change.
matlab/src/feather_reader.cc
Outdated
// TableReader expects a RandomAccessFile. | ||
std::shared_ptr<io::RandomAccessFile> random_access_file(readable_file); | ||
ARROW_ASSIGN_OR_RAISE(std::shared_ptr<io::RandomAccessFile> random_access_file, | ||
maybe_readable_file); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this cast?
I think that arrow::ipc::feather::Reader::Open()
can accept arrow::io::ReadableFile
because arrow::io::ReadableFile
is a subclass of arrow::io::RandomAccessFile
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. We don't need this cast.
Hi @kou, Regarding CI, we're currently working on adding GoogleTest support in this branch here if you're interested. Once this pull request is accepted, we can submit a pull request for this branch. That branch requires the changes here to build, which is why we haven't opened pull request yet. Regarding MATLAB CI, we're actively working on this and will open a pull request as soon as possible. Best, |
Thanks! |
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
Co-authored-by: Sutou Kouhei <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Overview
featherreadmex
andfeatherwritemex
fail to build against the latest Arrow C++ APIs. These changes allow them to successfully build.Testing
featherreadmex
andfeatherwritemex
on Windows 10 with Visual Studio 2019featherreadmex
andfeatherwritemex
on macOS Big Sur (11.2.3) with GNU Make 3.81featherreadmex
andfeatherwritemex
on Debian 10 with GNU Make GNU 4.2.1tfeather
andtfeathermex
on all platforms in MATLAB R2021aFuture Directions
build_support
folder is out of date. We are not sure if we want to maintain a separate MATLAB based build system along side the CMake based one. We will follow up on this in the future via the mailing list or Jira.We acknowledge there is a lot of information in this pull request. In the future, we will work in smaller increments. We felt a bigger pull request was necessary to get back to a working state.
Thanks,
Sarah