ARROW-17868: [C++][Python] Restore the ARROW_PYTHON CMake option #14273

Merged (9 commits, Oct 1, 2022)
5 changes: 4 additions & 1 deletion ci/appveyor-cpp-build.bat
@@ -60,16 +60,19 @@ cmake -G "%GENERATOR%" %CMAKE_ARGS% ^
-DARROW_BUILD_EXAMPLES=ON ^
-DARROW_BUILD_STATIC=OFF ^
-DARROW_BUILD_TESTS=ON ^
-DARROW_COMPUTE=ON ^
-DARROW_CSV=ON ^
-DARROW_CXXFLAGS="%ARROW_CXXFLAGS%" ^
-DARROW_DATASET=ON ^
-DARROW_ENABLE_TIMING_TESTS=OFF ^
-DARROW_FILESYSTEM=ON ^
-DARROW_FLIGHT=%ARROW_BUILD_FLIGHT% ^
-DARROW_FLIGHT_SQL=%ARROW_BUILD_FLIGHT_SQL% ^
-DARROW_GANDIVA=%ARROW_BUILD_GANDIVA% ^
-DARROW_HDFS=ON ^
-DARROW_JSON=ON ^
-DARROW_MIMALLOC=ON ^
-DARROW_PARQUET=ON ^
-DARROW_PYTHON=ON ^
-DARROW_S3=%ARROW_S3% ^
-DARROW_SUBSTRAIT=ON ^
-DARROW_VERBOSE_THIRDPARTY_BUILD=OFF ^
16 changes: 10 additions & 6 deletions ci/docker/conda-python-hdfs.dockerfile
@@ -42,12 +42,16 @@ COPY ci/etc/hdfs-site.xml $HADOOP_HOME/etc/hadoop/
# build cpp with tests
ENV CC=gcc \
CXX=g++ \
ARROW_BUILD_TESTS=ON \
ARROW_COMPUTE=ON \
ARROW_CSV=ON \
ARROW_DATASET=ON \
ARROW_FILESYSTEM=ON \
ARROW_FLIGHT=OFF \
ARROW_GANDIVA=OFF \
ARROW_PLASMA=OFF \
ARROW_PARQUET=ON \
PARQUET_REQUIRE_ENCRYPTION=ON \
ARROW_ORC=OFF \
ARROW_HDFS=ON \
ARROW_PYTHON=ON \
ARROW_BUILD_TESTS=ON
ARROW_JSON=ON \
ARROW_ORC=OFF \
ARROW_PARQUET=ON \
ARROW_PLASMA=OFF \
PARQUET_REQUIRE_ENCRYPTION=ON
8 changes: 6 additions & 2 deletions ci/docker/conda-python-spark.dockerfile
@@ -37,7 +37,11 @@ RUN /arrow/ci/scripts/install_spark.sh ${spark} /spark
# build cpp with tests
ENV CC=gcc \
CXX=g++ \
ARROW_PYTHON=ON \
ARROW_HDFS=ON \
ARROW_BUILD_TESTS=OFF \
ARROW_COMPUTE=ON \
ARROW_CSV=ON \
ARROW_DATASET=ON \
ARROW_FILESYSTEM=ON \
ARROW_HDFS=ON \
ARROW_JSON=ON \
SPARK_VERSION=${spark}
12 changes: 8 additions & 4 deletions ci/docker/conda-python.dockerfile
@@ -37,10 +37,14 @@ RUN mamba install -q -y \
COPY ci/scripts/install_gcs_testbench.sh /arrow/ci/scripts
RUN /arrow/ci/scripts/install_gcs_testbench.sh default

ENV ARROW_PYTHON=ON \
ARROW_BUILD_STATIC=OFF \
ENV ARROW_BUILD_STATIC=OFF \
ARROW_BUILD_TESTS=OFF \
ARROW_BUILD_UTILITIES=OFF \
ARROW_COMPUTE=ON \
ARROW_CSV=ON \
ARROW_DATASET=ON \
ARROW_FILESYSTEM=ON \
ARROW_HDFS=ON \
ARROW_JSON=ON \
ARROW_TENSORFLOW=ON \
ARROW_USE_GLOG=OFF \
ARROW_HDFS=ON
ARROW_USE_GLOG=OFF
7 changes: 6 additions & 1 deletion ci/docker/linux-apt-docs.dockerfile
@@ -96,10 +96,15 @@ RUN /arrow/ci/scripts/r_deps.sh /arrow && \
ENV ARROW_BUILD_STATIC=OFF \
ARROW_BUILD_TESTS=OFF \
ARROW_BUILD_UTILITIES=OFF \
ARROW_COMPUTE=ON \
ARROW_CSV=ON \
ARROW_DATASET=ON \
ARROW_FILESYSTEM=ON \
ARROW_FLIGHT=ON \
ARROW_GCS=ON \
ARROW_GLIB_VAPI=false \
ARROW_PYTHON=ON \
ARROW_HDFS=ON \
ARROW_JSON=ON \
ARROW_S3=ON \
ARROW_USE_GLOG=OFF \
CMAKE_UNITY_BUILD=ON
11 changes: 8 additions & 3 deletions ci/docker/linux-apt-python-3.dockerfile
@@ -45,8 +45,13 @@ RUN if [ "${numba}" != "" ]; then \
/arrow/ci/scripts/install_numba.sh ${numba} \
; fi

ENV ARROW_PYTHON=ON \
ARROW_BUILD_STATIC=OFF \
ENV ARROW_BUILD_STATIC=OFF \
ARROW_BUILD_TESTS=OFF \
ARROW_BUILD_UTILITIES=OFF \
ARROW_USE_GLOG=OFF \
ARROW_COMPUTE=ON \
ARROW_CSV=ON \
ARROW_DATASET=ON \
ARROW_FILESYSTEM=ON \
ARROW_HDFS=ON \
ARROW_JSON=ON \
ARROW_USE_GLOG=OFF
7 changes: 6 additions & 1 deletion ci/docker/linux-apt-r.dockerfile
@@ -103,13 +103,18 @@ ENV \
ARROW_BUILD_STATIC=OFF \
ARROW_BUILD_TESTS=OFF \
ARROW_BUILD_UTILITIES=OFF \
ARROW_COMPUTE=ON \
ARROW_CSV=ON \
ARROW_DATASET=ON \
ARROW_FILESYSTEM=ON \
ARROW_FLIGHT=OFF \
ARROW_GANDIVA=OFF \
ARROW_HDFS=OFF \
ARROW_JSON=ON \
Member:

Is all this required for R? AFAIR, Python is only used to test PyArrow-R interoperability here.

Member Author:

@nealrichardson Do you know what components are needed here?

Member Author:

It seems that we can disable ARROW_HDFS. The other components seem necessary. I'll try it.

ARROW_NO_DEPRECATED_API=ON \
ARROW_ORC=OFF \
ARROW_PARQUET=ON \
ARROW_PLASMA=OFF \
ARROW_PYTHON=ON \
ARROW_S3=ON \
ARROW_USE_CCACHE=ON \
ARROW_USE_GLOG=OFF \
11 changes: 8 additions & 3 deletions ci/docker/linux-dnf-python-3.dockerfile
@@ -36,8 +36,13 @@ RUN pip install \
-r arrow/python/requirements-build.txt \
-r arrow/python/requirements-test.txt

ENV ARROW_PYTHON=ON \
ARROW_BUILD_STATIC=OFF \
ENV ARROW_BUILD_STATIC=OFF \
ARROW_BUILD_TESTS=OFF \
ARROW_BUILD_UTILITIES=OFF \
ARROW_USE_GLOG=OFF \
ARROW_COMPUTE=ON \
ARROW_CSV=ON \
ARROW_DATASET=ON \
ARROW_FILESYSTEM=ON \
ARROW_HDFS=ON \
ARROW_JSON=ON \
ARROW_USE_GLOG=OFF
1 change: 0 additions & 1 deletion ci/scripts/cpp_build.sh
@@ -104,7 +104,6 @@ cmake \
-DARROW_ORC=${ARROW_ORC:-OFF} \
-DARROW_PARQUET=${ARROW_PARQUET:-OFF} \
-DARROW_PLASMA=${ARROW_PLASMA:-OFF} \
-DARROW_PYTHON=${ARROW_PYTHON:-OFF} \
-DARROW_RUNTIME_SIMD_LEVEL=${ARROW_RUNTIME_SIMD_LEVEL:-MAX} \
-DARROW_S3=${ARROW_S3:-OFF} \
-DARROW_SKYHOOK=${ARROW_SKYHOOK:-OFF} \
10 changes: 6 additions & 4 deletions ci/scripts/python_wheel_macos_build.sh
@@ -96,25 +96,27 @@ cmake \
-DARROW_BUILD_SHARED=ON \
-DARROW_BUILD_STATIC=OFF \
-DARROW_BUILD_TESTS=OFF \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=${ARROW_DATASET} \
Member (@jorisvandenbossche, Sep 29, 2022):

Should we just hardcode this to ON? (The old ARROW_PYTHON=ON would have ensured that.) I don't think we want to create wheels without the dataset module enabled.

Member Author:

> I don't think we want to create wheels without dataset enabled

I think so, but ${ARROW_DATASET} is also used for export PYARROW_WITH_DATASET=${ARROW_DATASET} below, so we use ${ARROW_DATASET} here for consistency. (ARROW_DATASET is initialized with : ${ARROW_DATASET:=ON}.)
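A minimal shell sketch of the defaulting pattern described above (the two variable names come from the wheel build scripts; the final echo is purely illustrative):

```sh
#!/usr/bin/env bash
# Default ARROW_DATASET to ON unless the caller already exported a value.
: ${ARROW_DATASET:=ON}

# The same value later drives the PyArrow build flag, which is why the CMake
# invocation passes ${ARROW_DATASET} instead of a hard-coded ON.
export PYARROW_WITH_DATASET=${ARROW_DATASET}
echo "PYARROW_WITH_DATASET=${PYARROW_WITH_DATASET}"
```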

-DARROW_DEPENDENCY_SOURCE="VCPKG" \
-DARROW_DEPENDENCY_USE_SHARED=OFF \
-DARROW_FILESYSTEM=ON \
-DARROW_FLIGHT=${ARROW_FLIGHT} \
-DARROW_GANDIVA=${ARROW_GANDIVA} \
-DARROW_GCS=${ARROW_GCS} \
-DARROW_HDFS=${ARROW_HDFS} \
-DARROW_JEMALLOC=${ARROW_JEMALLOC} \
-DARROW_JSON=ON \
-DARROW_MIMALLOC=${ARROW_MIMALLOC} \
-DARROW_ORC=${ARROW_ORC} \
-DARROW_PACKAGE_KIND="python-wheel-macos" \
-DARROW_PARQUET=${ARROW_PARQUET} \
-DPARQUET_REQUIRE_ENCRYPTION=${PARQUET_REQUIRE_ENCRYPTION} \
-DARROW_PLASMA=${ARROW_PLASMA} \
-DARROW_PYTHON=ON \
-DARROW_RPATH_ORIGIN=ON \
-DARROW_SUBSTRAIT=${ARROW_SUBSTRAIT} \
-DARROW_S3=${ARROW_S3} \
-DARROW_SIMD_LEVEL=${ARROW_SIMD_LEVEL} \
-DARROW_SUBSTRAIT=${ARROW_SUBSTRAIT} \
-DARROW_TENSORFLOW=${ARROW_TENSORFLOW} \
-DARROW_USE_CCACHE=ON \
-DARROW_WITH_BROTLI=${ARROW_WITH_BROTLI} \
@@ -129,9 +131,9 @@ cmake \
-DCMAKE_INSTALL_PREFIX=${build_dir}/install \
-DCMAKE_OSX_ARCHITECTURES=${CMAKE_OSX_ARCHITECTURES} \
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD} \
-DOPENSSL_USE_STATIC_LIBS=ON \
Member:

This seems unrelated to the other changes. Can you give a short reasoning for it?

Member Author:

Yes, this is unrelated to ARROW_PYTHON. Sorry for including it in this pull request.

OPENSSL_USE_STATIC_LIBS is redundant here because -DARROW_DEPENDENCY_USE_SHARED=OFF is already on this command line; OPENSSL_USE_STATIC_LIBS is set automatically in https://github.com/apache/arrow/blob/master/cpp/cmake_modules/FindOpenSSLAlt.cmake#L48-L52 .

I just noticed this while sorting the CMake options, so I mixed the change into this pull request. Sorry.

Member:

No need to be sorry. I was just wondering about the reasoning.

-DORC_PROTOBUF_EXECUTABLE=${VCPKG_ROOT}/installed/${VCPKG_TARGET_TRIPLET}/tools/protobuf/protoc \
-DORC_SOURCE=BUNDLED \
-DPARQUET_REQUIRE_ENCRYPTION=${PARQUET_REQUIRE_ENCRYPTION} \
-DVCPKG_MANIFEST_MODE=OFF \
-DVCPKG_TARGET_TRIPLET=${VCPKG_TARGET_TRIPLET} \
-G ${CMAKE_GENERATOR} \
12 changes: 6 additions & 6 deletions ci/scripts/python_wheel_manylinux_build.sh
@@ -89,31 +89,31 @@ pushd /tmp/arrow-build
# https://github.com/aws/aws-sdk-cpp/issues/1809 is fixed and vcpkg
# ships the fix.
cmake \
-DARROW_BROTLI_USE_SHARED=OFF \
-DARROW_BUILD_SHARED=ON \
-DARROW_BUILD_STATIC=OFF \
-DARROW_BUILD_TESTS=OFF \
-DARROW_COMPUTE=ON \
-DARROW_CSV=ON \
-DARROW_DATASET=${ARROW_DATASET} \
-DARROW_DEPENDENCY_SOURCE="VCPKG" \
-DARROW_DEPENDENCY_USE_SHARED=OFF \
-DARROW_FILESYSTEM=ON \
-DARROW_FLIGHT=${ARROW_FLIGHT} \
-DARROW_GANDIVA=${ARROW_GANDIVA} \
-DARROW_GCS=${ARROW_GCS} \
-DARROW_HDFS=${ARROW_HDFS} \
-DARROW_JEMALLOC=${ARROW_JEMALLOC} \
-DARROW_JSON=ON \
-DARROW_MIMALLOC=${ARROW_MIMALLOC} \
-DARROW_ORC=${ARROW_ORC} \
-DARROW_PACKAGE_KIND="python-wheel-manylinux${MANYLINUX_VERSION}" \
-DARROW_PARQUET=${ARROW_PARQUET} \
-DPARQUET_REQUIRE_ENCRYPTION=${PARQUET_REQUIRE_ENCRYPTION} \
-DARROW_PLASMA=${ARROW_PLASMA} \
-DARROW_PYTHON=ON \
-DARROW_RPATH_ORIGIN=ON \
-DARROW_SUBSTRAIT=${ARROW_SUBSTRAIT} \
-DARROW_S3=${ARROW_S3} \
-DARROW_SUBSTRAIT=${ARROW_SUBSTRAIT} \
-DARROW_TENSORFLOW=${ARROW_TENSORFLOW} \
-DARROW_USE_CCACHE=ON \
-DARROW_UTF8PROC_USE_SHARED=OFF \
-DARROW_WITH_BROTLI=${ARROW_WITH_BROTLI} \
-DARROW_WITH_BZ2=${ARROW_WITH_BZ2} \
-DARROW_WITH_LZ4=${ARROW_WITH_LZ4} \
@@ -125,9 +125,9 @@ cmake \
-DCMAKE_INSTALL_LIBDIR=lib \
-DCMAKE_INSTALL_PREFIX=/tmp/arrow-dist \
-DCMAKE_UNITY_BUILD=${CMAKE_UNITY_BUILD} \
-DOPENSSL_USE_STATIC_LIBS=ON \
-DORC_PROTOBUF_EXECUTABLE=${VCPKG_ROOT}/installed/${VCPKG_TARGET_TRIPLET}/tools/protobuf/protoc \
-DORC_SOURCE=BUNDLED \
-DPARQUET_REQUIRE_ENCRYPTION=${PARQUET_REQUIRE_ENCRYPTION} \
-DVCPKG_MANIFEST_MODE=OFF \
-DVCPKG_TARGET_TRIPLET=${VCPKG_TARGET_TRIPLET} \
${ARROW_EXTRA_CMAKE_FLAGS} \
9 changes: 6 additions & 3 deletions ci/scripts/python_wheel_windows_build.bat
@@ -62,21 +62,23 @@ cmake ^
-DARROW_BUILD_SHARED=ON ^
-DARROW_BUILD_STATIC=OFF ^
-DARROW_BUILD_TESTS=OFF ^
-DARROW_COMPUTE=ON ^
-DARROW_CSV=ON ^
-DARROW_CXXFLAGS="/MP" ^
-DARROW_DATASET=%ARROW_DATASET% ^
-DARROW_DEPENDENCY_SOURCE=VCPKG ^
-DARROW_DEPENDENCY_USE_SHARED=OFF ^
-DARROW_FILESYSTEM=ON ^
-DARROW_FLIGHT=%ARROW_FLIGHT% ^
-DARROW_GANDIVA=%ARROW_GANDIVA% ^
-DARROW_HDFS=%ARROW_HDFS% ^
-DARROW_JSON=ON ^
-DARROW_MIMALLOC=%ARROW_MIMALLOC% ^
-DARROW_ORC=%ARROW_ORC% ^
-DARROW_PACKAGE_KIND="python-wheel-windows" ^
-DARROW_PARQUET=%ARROW_PARQUET% ^
-DPARQUET_REQUIRE_ENCRYPTION=%PARQUET_REQUIRE_ENCRYPTION% ^
-DARROW_PYTHON=ON ^
-DARROW_SUBSTRAIT=%ARROW_SUBSTRAIT% ^
-DARROW_S3=%ARROW_S3% ^
-DARROW_SUBSTRAIT=%ARROW_SUBSTRAIT% ^
-DARROW_TENSORFLOW=%ARROW_TENSORFLOW% ^
-DARROW_WITH_BROTLI=%ARROW_WITH_BROTLI% ^
-DARROW_WITH_BZ2=%ARROW_WITH_BZ2% ^
@@ -90,6 +92,7 @@ cmake ^
-DCMAKE_INSTALL_PREFIX=C:\arrow-dist ^
-DCMAKE_UNITY_BUILD=%CMAKE_UNITY_BUILD% ^
-DMSVC_LINK_VERBOSE=ON ^
-DPARQUET_REQUIRE_ENCRYPTION=%PARQUET_REQUIRE_ENCRYPTION% ^
-DVCPKG_MANIFEST_MODE=OFF ^
-DVCPKG_TARGET_TRIPLET=%VCGPK_TARGET_TRIPLET% ^
-G "%CMAKE_GENERATOR%" ^
84 changes: 82 additions & 2 deletions cpp/CMakePresets.json
@@ -117,17 +117,61 @@
"ARROW_GANDIVA": "ON"
}
},
{
"name": "features-python-minimal",
"inherits": [
"features-minimal"
],
"hidden": true,
"cacheVariables": {
"ARROW_COMPUTE": "ON",
"ARROW_CSV": "ON",
"ARROW_FILESYSTEM": "ON",
"ARROW_JSON": "ON"
Member:

I haven't used the presets myself yet, but shall we include ARROW_DATASET here as well? (It was included with the previous features-python preset.)
Maybe you can also keep the exact name so it keeps working for people who were using those presets?

cc @wjones127

Member:

I think I'm fine with not including datasets if it's not required for the Python build. In most workflows with CMakePresets.json, I think we expect developers to create their own presets in CMakeUserPresets.json, which inherit from the provided ones. (See an example here.)

Once this is merged, I can send a notice to the mailing list with instructions on how to transition to the new presets. (I need to update my blog post anyway.)
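A rough sketch of the configure-and-build workflow these presets enable (the preset name is one added in this PR; the out-of-source build directory and the assumption that the configure step runs from that directory are mine):

```sh
# Configure the Arrow C++ sources with one of the Python-oriented presets,
# then build. A CMakeUserPresets.json next to cpp/CMakePresets.json could
# inherit from these presets to add local overrides.
mkdir -p arrow/cpp/build
cd arrow/cpp/build
cmake .. --preset ninja-release-python   # common feature set for building PyArrow
cmake --build .
```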

Member Author:

> Maybe you can also keep the exact name to keep it working for people that were using those presets?

That makes sense. I also added versions without the -minimal/-maximal suffixes.

}
},
{
"name": "features-python",
"inherits": [
"features-main"
],
"hidden": true,
"cacheVariables": {
"ARROW_COMPUTE": "ON",
"ARROW_CSV": "ON",
"ARROW_DATASET": "ON",
"ARROW_FILESYSTEM": "ON",
"ARROW_JSON": "ON",
"ARROW_ORC": "ON"
}
},
{
"name": "features-python-maximal",
"inherits": [
"features-cuda",
"features-filesystems",
"features-flight",
"features-gandiva",
"features-main",
"features-python-minimal"
],
"hidden": true,
"cacheVariables": {
"ARROW_ORC": "ON",
"PARQUET_REQUIRE_ENCRYPTION": "ON"
}
},
{
"name": "features-maximal",
"inherits": [
"features-main",
"features-cuda",
"features-filesystems",
"features-flight",
"features-gandiva"
"features-gandiva",
"features-python-maximal"
],
"hidden": true,
"displayName": "Debug build with everything enabled (except benchmarks and CUDA)",
"cacheVariables": {
"ARROW_BUILD_EXAMPLES": "ON",
"ARROW_BUILD_UTILITIES": "ON",
@@ -185,6 +229,24 @@
"displayName": "Debug build with tests and Gandiva",
"cacheVariables": {}
},
{
"name": "ninja-debug-python-minimal",
"inherits": ["base-debug", "features-python-minimal"],
"displayName": "Debug build for PyArrow with minimal features",
"cacheVariables": {}
},
{
"name": "ninja-debug-python",
"inherits": ["base-debug", "features-python"],
"displayName": "Debug build for PyArrow with common features (for backward compatibility)",
"cacheVariables": {}
},
{
"name": "ninja-debug-python-maximal",
"inherits": ["base-debug", "features-python-maximal"],
"displayName": "Debug build for PyArrow with everything enabled (except CUDA)",
"cacheVariables": {}
},
{
"name": "ninja-debug-maximal",
"inherits": ["base-debug", "features-maximal"],
@@ -228,6 +290,24 @@
"displayName": "Release build with Gandiva",
"cacheVariables": {}
},
{
"name": "ninja-release-python-minimal",
"inherits": ["base-release", "features-python-minimal"],
"displayName": "Release build for PyArrow with minimal features",
"cacheVariables": {}
},
{
"name": "ninja-release-python",
"inherits": ["base-release", "features-python"],
"displayName": "Release build for PyArrow with common features (for backward compatibility)",
"cacheVariables": {}
},
{
"name": "ninja-release-python-maximal",
"inherits": ["base-release", "features-python-maximal"],
"displayName": "Release build for PyArrow with everything enabled (except CUDA)",
"cacheVariables": {}
},
{
"name": "ninja-release-maximal",
"inherits": ["base-release", "features-maximal"],