-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-17868: [C++][Python] Restore the ARROW_PYTHON CMake option #14273
Conversation
|
@github-actions crossbow submit -g nightly-tests -g nightly-release -g nightly-packaging |
This comment was marked as outdated.
This comment was marked as outdated.
@github-actions crossbow submit -g nightly-tests -g nightly-release -g nightly-packaging |
This comment was marked as outdated.
This comment was marked as outdated.
ARROW_FLIGHT=OFF \ | ||
ARROW_GANDIVA=OFF \ | ||
ARROW_HDFS=ON \ | ||
ARROW_JSON=ON \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is all this required for R? AFAIR, Python is only used to test PyArrow-R interoperability here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nealrichardson Do you know what components are needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that we can disable ARROW_HDFS
. Other components seems necessary. I'll try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Added a few comments
@@ -96,25 +96,27 @@ cmake \ | |||
-DARROW_BUILD_SHARED=ON \ | |||
-DARROW_BUILD_STATIC=OFF \ | |||
-DARROW_BUILD_TESTS=OFF \ | |||
-DARROW_COMPUTE=ON \ | |||
-DARROW_CSV=ON \ | |||
-DARROW_DATASET=${ARROW_DATASET} \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just hardcode this to ON? (the ARROW_PYTHON=ON would have ensured that) I don't think we want to create wheels without dataset enabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to create wheels without dataset enabled
I think so but ${ARROW_DATASET}
is also used for export PYARROW_WITH_DATASET=${ARROW_DATASET}
below. So I think that we use ${ARROW_DATASET}
here for consistency. (ARROW_DATASET
is initialized by : ${ARROW_DATASET:=ON}
.)
@@ -129,9 +131,9 @@ cmake \ | |||
-DCMAKE_INSTALL_PREFIX=${build_dir}/install \ | |||
-DCMAKE_OSX_ARCHITECTURES=${CMAKE_OSX_ARCHITECTURES} \ | |||
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD} \ | |||
-DOPENSSL_USE_STATIC_LIBS=ON \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems unrelated to the other changes. Can you give a short reasoning for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This is unrelated to ARROW_PYTHON
. Sorry for including this to this pull request.
OPENSSL_USE_STATIC_LIBS
is redundant here because there is -DARROW_DEPENDENCY_USE_SHARED=OFF
in this command line. OPENSSL_USE_STATIC_LIBS
is set automatically in https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindOpenSSLAlt.cmake#L48-L52 .
I just found this by sorting CMake options. So I mix this change into this pull request. Sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to be sorry. I was just wondering about the reasoning
-DARROW_DATASET=${ARROW_DATASET} \ | ||
-DARROW_DATASET=ON \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-DARROW_DATASET=${ARROW_DATASET} \ | |
-DARROW_DATASET=ON \ | |
-DARROW_DATASET=ON \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks but I choose -DARROW_DATASET=${ARROW_DATASET}
for consistency.
"ARROW_COMPUTE": "ON", | ||
"ARROW_CSV": "ON", | ||
"ARROW_FILESYSTEM": "ON", | ||
"ARROW_JSON": "ON" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't yet used the presets myself, but shall we include ARROW_DATASET
here as well? (that was included with the previous features-python
preset.
Maybe you can also keep the exact name to keep it working for people that were using those presets?
cc @wjones127
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm fine not including datasets if it's not required for the Python build. In most workflows with CMakePresets.json, I think we expect developers to create their own presets in CMakeUserPresets.json, which will inherit from the provided once. (See an example here)
Once this is merged, I can send a notice to the mailing list with instructions on how to transition to the new presets. (I need to update my blog post anyways.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you can also keep the exact name to keep it working for people that were using those presets?
It makes sense. I added no -minimal
/-maximal
versions.
docs/source/developers/python.rst
Outdated
.. | ||
$ make -j4 | ||
$ make install | ||
$ cmake --build . --target install --config Debug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For my understanding: is the --config Debug
needed if you already use CMAKE_BUILD_TYPE=Debug
above?
And will this line also build in parallel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the --config Debug
is redundant.
In my experience, it seems to automatically build in parallel, but you can also add the option explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. I added this accidentally. I revert this.
I found -j4
isn't suitable here. We can use -j$(nproc)
on Linux and -j$(sysctl -n hw.ncpu)
on macOS. But I thought that it's better that we use Ninja here. And I started rewriting this for Ninja but I stopped it because this pull request isn't for the change. But this change remained.
Restore it but it's marked as a deprecated option. Because the Python component in Apache Arrow C++ was moved to PyArrow by ARROW-16340. It' removed in a feature release. Users should use CMake presets instead of ARROW_PYTHON but CMake presets requires CMake 3.19 or later.
Co-authored-by: Antoine Pitrou <[email protected]>
02b6faf
to
e6b084c
Compare
@github-actions crossbow submit -g nightly-tests -g nightly-release -g nightly-packaging |
Revision: 954f1ad Submitted crossbow builds: ursacomputing/crossbow @ actions-a153f6e909 |
I think that this is ready to merge. |
Benchmark runs are scheduled for baseline = 4dfa617 and contender = 89c0214. 89c0214 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…che#14273) Restore it but it's marked as a deprecated option. Because the Python component in Apache Arrow C++ was moved to PyArrow by ARROW-16340. It' removed in a feature release. Users should use CMake presets instead of ARROW_PYTHON but CMake presets requires CMake 3.19 or later. Lead-authored-by: Sutou Kouhei <[email protected]> Co-authored-by: Sutou Kouhei <[email protected]> Signed-off-by: Sutou Kouhei <[email protected]>
Restore it but it's marked as a deprecated option. Because the Python component in Apache Arrow C++ was moved to PyArrow by ARROW-16340. It' removed in a feature release.
Users should use CMake presets instead of ARROW_PYTHON but CMake presets requires CMake 3.19 or later.