Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Meson build system for nanoarrow-ipc extension #483

Merged
merged 20 commits into from
Jun 11, 2024

Conversation

WillAyd
Copy link
Contributor

@WillAyd WillAyd commented May 23, 2024

No description provided.

@WillAyd
Copy link
Contributor Author

WillAyd commented May 23, 2024

Meson's handling of subprojects is way easier I think than CMake's. If down the road we wanted to replace CMake with Meson there is a ton that we could simplify within these extensions, and maybe could even consider them as separate projects

@paleolimbot
Copy link
Member

Nice!

I've been wondering if the IPC and Device "extensions" should move into the main tree (e.g., #481 )...they started as separate projects but a few things like the "testing" module make it a little awkward to keep them separate. Any thoughts on that?

# under the License.

[wrap-git]
url = ../../..
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the main project right now we ask end users to run meson wrap install <lib>, but that doesn't make sense to do with this nanoarrow.wrap, so I ended up putting all the wrap files here

Maybe even in the main project we should just add the wrap files? That would technically pin a version for third party dependencies (if not installed elsewhere on the system) which may or may not be desired

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this complexity would be solved if the IPC extension just lived with the main sources!

# under the License.

[wrap-git]
method = cmake
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flatcc is working to replace (?) CMake with Meson. That would simplify the wrap / subproject usage for flatcc if and when it goes live

dvidelabs/flatcc#262

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the intention is to support both, although we vendor flatcc anyway for now. I should be more active contributing to that upstream since the IPC reader 100% relies on it!

@WillAyd
Copy link
Contributor Author

WillAyd commented May 23, 2024

I haven't used these in particular so I'm just kind of guessing, but it feels like it would make sense to combine them. I think for consumers having to potentially install three different libraries for nanoarrow instead of just one library that you can choose the dependencies out of is overkill. And as you've mentioned already, there appears to be a ton of overlap between these projects' dependencies

instead of -DNANOARROW_IPC_CODE_COVERAGE=true''')
endif

if get_option('NANOARROW_IPC_BUNDLE')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paleolimbot without being super knowledgeable on the end goal of these bundle options, are these used by consumers downstream or just for internal use? I see the Python build uses the bundle to auto generate a Cython header, but not sure of other uses cases and if its worth trying to replicate in Meson

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's any reason to expose this in Meson...this is currently how the files in dist/ are generated (e.g., the single-file bundles for each component). The fact that they are written in CMake is something that initially made it easier to test that the single-file distribution actually worked...I think it might make more sense to rewrite the amalgamation scripts in Python at some point though.

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't used these in particular so I'm just kind of guessing, but it feels like it would make sense to combine them.

Great! I will aim for that in a near-term PR. I am happy to work through this and merge it before that refactor or afterward (your call!)

# under the License.

[wrap-git]
method = cmake
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the intention is to support both, although we vendor flatcc anyway for now. I should be more active contributing to that upstream since the IPC reader 100% relies on it!

# under the License.

[wrap-git]
url = ../../..
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this complexity would be solved if the IPC extension just lived with the main sources!

@WillAyd
Copy link
Contributor Author

WillAyd commented May 24, 2024

Let's just wait to merge this until after then - there is no rush for this and I think waiting will make that consolidation easier

@@ -34,3 +34,14 @@ subdir('src/nanoarrow')
if get_option('benchmarks')
subdir('dev/benchmarks')
endif


if get_option('apps')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should really move the meson.build from the src/nanoarrow directory up to here to mirror what is done for CMake, but am leaving that for a separate PR. For now just put this apps piece up here since it does not live in src/nanoarrow

@@ -470,7 +470,7 @@ TEST(NanoarrowTestingTest, NanoarrowTestingTestFieldMetadata) {
NANOARROW_RETURN_NOT_OK(ArrowSchemaSetMetadata(schema, "\0\0\0\0"));
return NANOARROW_OK;
},
[](ArrowArray* array) { return NANOARROW_OK; }, &WriteFieldJSON,
[](ArrowArray*) { return NANOARROW_OK; }, &WriteFieldJSON,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly related but it showed as a warning we could control with the Meson warning configuration

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me!

@WillAyd
Copy link
Contributor Author

WillAyd commented Jun 10, 2024

Interesting to see the valgrind error in CI right now. I assume this is also why the windows build is failing, although surprised our CMake setup didn't catch it

https://github.com/apache/arrow-nanoarrow/actions/runs/9455933515/job/26046780100?pr=483#step:7:338

@WillAyd
Copy link
Contributor Author

WillAyd commented Jun 11, 2024

Seems like the error has to do with the fields being sent to ArrowErrorSet in this function:

if (array_view->storage_type == NANOARROW_TYPE_RUN_END_ENCODED) {

Not familiar with the run end encoded implementation but I think we are missing something with how fields are being initialized and accessed through that code block

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments but generally looking great!

@@ -23,8 +23,6 @@

void dump_schema_to_stdout(struct ArrowSchema* schema, int level, char* buf,
int buf_size) {
int n_chars = ArrowSchemaToString(schema, buf, buf_size, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is still needed (or else the top-level type will never appear in the dump). Maybe just remove int n_chars = (if the unused variable was a problem?)

auto result = memcmp(arrays[i].children[0]->buffers[1], one_two_three_le,
sizeof(one_two_three_le));
// discard result to silence -Wunused-value
(void)result;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would NANOARROW_UNUSED() work here?

@@ -470,7 +470,7 @@ TEST(NanoarrowTestingTest, NanoarrowTestingTestFieldMetadata) {
NANOARROW_RETURN_NOT_OK(ArrowSchemaSetMetadata(schema, "\0\0\0\0"));
return NANOARROW_OK;
},
[](ArrowArray* array) { return NANOARROW_OK; }, &WriteFieldJSON,
[](ArrowArray*) { return NANOARROW_OK; }, &WriteFieldJSON,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me!

@@ -18,3 +18,4 @@ python/src/nanoarrow/dlpack_abi.h
subprojects/google-benchmark.wrap
subprojects/gtest.wrap
subprojects/nlohmann_json.wrap
subprojects/zlib.wrap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason these are being ignored rather than just adding the Apache license header to the files? (It seems like those file types support comments and some of the wrap files already have the license header added)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ones I have excluded were auto generated by meson wrap install <library> whereas the one that I created from scratch (flatcc.wrap) included the Arrow licensing agreement.

Not sure if that distinction matters. Happy to add license agreement to even the auto generated ones

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be best to add the license headers rather than put it in the excluded files (although I think it is fine either way since it was generated for us).

Comment on lines +68 to +72
cmake = import('cmake')
cmake_opts = cmake.subproject_options()
cmake_opts.add_cmake_defines({'CMAKE_POSITION_INDEPENDENT_CODE': true})
flatcc_subproj = cmake.subproject('flatcc', options: cmake_opts)
flatcc_dep = flatcc_subproj.dependency('flatccrt')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would mean that CMake is required to build the flatcc runtime? I get that it is more "meson"ic to use the wrap file; however, it might also be a little strange that building nanoarrow_ipc via CMake will give you a slightly different result than using Meson. I think I'd prefer that the default build uses the vendored version (via a pure Meson compile of the four files in thirdparty/flatcc/src/runtime), and perhaps our dependency relationship with flatcc could be improved in both CMake/Meson (perhaps with some contributions to the upstream setup).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK cool will take a look. I believe flatcc is trying to switch to Meson as well, so yea can look and see if that makes for a good upstream contribution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is solved in https://github.com/apache/arrow-nanoarrow/pull/483/files/82a6b48e4218f24b320cfb9e05f972c59d2cdd37..b878e23bb721f07a058e9f63a4db3aa6588f0faa

Basically Meson allows you to provide your own "patch" file for systems that don't use Meson. The file(s) go in subprojects/packagefiles/<patch_dir> and the wrap system still takes care of fetching / downloading the library for you, rather than requiring you to vendor into your source tree

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh set up the subproject here correctly but forgot to come back and update the code block here. Will push up a follow up PR to correct

name + '-test',
name.replace('-', '_') + '_test.cc',
link_with: nanoarrow_ipc_lib,
dependencies: config['deps']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to factor more common dependencies like gtest_dep into link_with?

(Also: I don't think that nanoarrow_dep or flatcc_dep is needed because it should be handled as a transitive dependency of nanoarrow_ipc_lib? I might have put that in the CMake due to some weirdness about it being a library built on top of nanoarrow rather than part of nanoarrow itself)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the first question - what problem are you trying to solve by making that a direct link argument? The way the dependency is declared (see wrapdb source) I believe Meson will take care of that linkage for you, alongside having the proper include and threads dependency

As far as the second question goes the way I had this previously the dependencies were not transitive, but I've since updated the nanoarrow_ipc_dep to include those:

nanoarrow_ipc_dep = declare_dependency(..., dependencies: [nanoarrow_dep, flatcc_dep])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand the first question

It seemed like the loop was handling linking each test with nanoarrow_ipc_lib and I was wondering if the loop could also handle linking the dependencies that were common to every test executable. This is probably just me misunderstanding meson and feel free to ignore if it doesn't make sense 🙂

As far as the second question goes the way I had this previously the dependencies were not transitive, but I've since updated the nanoarrow_ipc_dep to include those

Great! I mostly just want to make sure my vauge mental model of what's going on here is correct 🙂

Copy link
Contributor Author

@WillAyd WillAyd Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when coming from CMake it is helpful to understand that the word "dependency" means something else in Meson. In CMake, dependency refers to "something that needs to be built before this target can be". In Meson, a dependency helps you transitively use include directories / link targets. You declare this via the declare_dependency function:

base_dep = declare_dependency(
    include_directories: ['foo_dir'],
    link_with: [threads_lib],
)

So when you end up using that dependency on another target, it will transitively carry over the include / link flags for you, i.e.

child_lib = library('child', sources: ['child.cc'], dependencies: [base_dep])

expands into

child_lib = library('child',
    sources: ['child.cc'],
    include_directories: ['foo_dir'],
    link_with: [threads_lib],
)

(at least as far as I understand things)

@@ -68,12 +68,12 @@ function main() {
show_header "Run test suite"
meson configure -Dtests=true -Db_coverage=true
meson compile
meson test --wrap valgrind
meson test --wrap='valgrind --track-origins=yes' --print-errorlogs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I often use --tool=memcheck --leak-check=full (but this seems to give great output so take or leave!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure can definitely add the leak-check=full - its just a tradeoff between performance and visibility. This runs pretty quick in CI as is

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Let's leave it (the memcheck CI job + verification both already do the memcheck/full version).

Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

The Meson build failure is a separate issue related to attempting to print very large numbers (unrelated to this PR).

@paleolimbot paleolimbot merged commit 41e9e71 into apache:main Jun 11, 2024
32 of 33 checks passed
@WillAyd WillAyd deleted the meson-for-ipc branch June 11, 2024 20:25
paleolimbot pushed a commit that referenced this pull request Jun 12, 2024
@paleolimbot paleolimbot added this to the nanoarrow 0.6.0 milestone Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants