WIP: ARROW-2461: [Python] Build manylinux2010 wheels #4391

Closed
22 changes: 22 additions & 0 deletions docker-compose.yml
@@ -389,6 +389,28 @@ services:
      - ./python/manylinux1:/io:delegated
    command: /io/build_arrow.sh

  python-manylinux2010:
    # Usage:
    #   either build:
    #     $ docker-compose build python-manylinux2010
    #   or pull:
    #     $ docker-compose pull python-manylinux2010
    #   and then run:
    #     $ docker-compose run -e PYTHON_VERSION=3.7 python-manylinux2010
    image: quay.io/xhochy/arrow_manylinux2010_x86_64_base:latest
    build:
      context: python/manylinux2010
      dockerfile: Dockerfile-x86_64_base
    shm_size: 2G
    environment:
      PYARROW_PARALLEL: 3
      PYTHON_VERSION: ${PYTHON_VERSION:-3.6}
      UNICODE_WIDTH: ${UNICODE_WIDTH:-16}
    volumes:
      - .:/arrow:delegated
      - ./python/manylinux2010:/io:delegated
    command: /io/build_arrow.sh

######################### Integration Tests #################################

# impala:
1 change: 1 addition & 0 deletions python/manylinux2010/.dockerignore
@@ -0,0 +1 @@
dist
99 changes: 99 additions & 0 deletions python/manylinux2010/Dockerfile-x86_64_base
@@ -0,0 +1,99 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
FROM quay.io/pypa/manylinux2010_x86_64:latest

# Install dependencies
RUN yum install -y xz ccache flex wget && yum clean all

ADD scripts/build_zlib.sh /
RUN /build_zlib.sh

ADD scripts/build_openssl.sh /
RUN /build_openssl.sh

ADD scripts/build_boost.sh /
RUN /build_boost.sh

# Install cmake manylinux1 package
ADD scripts/install_cmake.sh /
RUN /install_cmake.sh

ADD scripts/build_gtest.sh /
RUN /build_gtest.sh
ENV GTEST_HOME /usr

ADD scripts/build_flatbuffers.sh /
RUN /build_flatbuffers.sh
ENV FLATBUFFERS_HOME /usr

ADD scripts/build_bison.sh /
RUN /build_bison.sh

ADD scripts/build_thrift.sh /
RUN /build_thrift.sh
ENV THRIFT_HOME /usr

ADD scripts/build_brotli.sh /
RUN /build_brotli.sh
ENV BROTLI_HOME /usr

ADD scripts/build_snappy.sh /
RUN /build_snappy.sh
ENV SNAPPY_HOME /usr

ADD scripts/build_lz4.sh /
RUN /build_lz4.sh
ENV LZ4_HOME /usr

ADD scripts/build_zstd.sh /
RUN /build_zstd.sh
ENV ZSTD_HOME /usr

ADD scripts/build_ccache.sh /
RUN /build_ccache.sh

ADD scripts/build_protobuf.sh /
RUN /build_protobuf.sh
ENV PROTOBUF_HOME /usr

ADD scripts/build_glog.sh /
RUN /build_glog.sh
ENV GLOG_HOME /usr

WORKDIR /
RUN git clone https://github.com/matthew-brett/multibuild.git && cd multibuild && git checkout ffe59955ad8690c2f8bb74766cb7e9b0d0ee3963

ADD scripts/build_virtualenvs.sh /
RUN /build_virtualenvs.sh

ADD scripts/build_llvm.sh /
RUN /build_llvm.sh

ADD scripts/build_clang.sh /
RUN /build_clang.sh

ADD scripts/build_double_conversion.sh /
RUN /build_double_conversion.sh

ADD scripts/build_rapidjson.sh /
RUN /build_rapidjson.sh

ADD scripts/build_re2.sh /
RUN /build_re2.sh

ADD scripts/build_gflags.sh /
RUN /build_gflags.sh
117 changes: 117 additions & 0 deletions python/manylinux2010/README.md
@@ -0,0 +1,117 @@
<!---
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

## Manylinux2010 wheels for Apache Arrow

This folder provides base Docker images and the infrastructure to build
`manylinux2010`-compatible Python wheels that should be installable on all
Linux distributions published in the last four years.

The process is split into two parts. Base Docker images build the native,
Python-independent dependencies; for these you can choose whether to also
build the dependencies used for Parquet support. On top of these images, a
bash script builds the pyarrow wheels for all supported Python versions and
places them in the `dist` folder.

### Build instructions

You can build the wheels with the following command. This example is for
Python 2.7 with unicode width 16; similarly, you can pass
`PYTHON_VERSION="3.5"`, `PYTHON_VERSION="3.6"` or `PYTHON_VERSION="3.7"`, or
use `PYTHON_VERSION="2.7"` with `UNICODE_WIDTH=32`:

```bash
# Build the python packages
docker run --env PYTHON_VERSION="2.7" --env UNICODE_WIDTH=16 --shm-size=2g --rm -t -i -v $PWD:/io -v $PWD/../../:/arrow quay.io/xhochy/arrow_manylinux2010_x86_64_base:latest /io/build_arrow.sh
# Now the new packages are located in the dist/ folder
ls -l dist/
```
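
The `PYTHON_VERSION`/`UNICODE_WIDTH` pairs that `build_arrow.sh` lists as possible values can also be built in one pass. The following is a minimal sketch, assuming the `python-manylinux2010` service defined in `docker-compose.yml`; the `build_all` helper and its dry-run prefix are hypothetical and not part of this repository.

```shell
#!/bin/bash
# Supported PYTHON_VERSION,UNICODE_WIDTH pairs, as listed in build_arrow.sh
COMBOS="2.7,16 2.7,32 3.5,16 3.6,16 3.7,16"

# build_all [PREFIX]: hypothetical helper that runs one build per pair.
# Pass "echo" as PREFIX to print the commands instead of running them.
build_all() {
  prefix="$1"
  for combo in $COMBOS; do
    py="${combo%,*}"      # text before the comma
    width="${combo#*,}"   # text after the comma
    $prefix docker-compose run \
      -e PYTHON_VERSION="$py" \
      -e UNICODE_WIDTH="$width" \
      python-manylinux2010
  done
}

# Dry run: print the five commands without starting any container
build_all echo
```

Calling `build_all` without an argument would run the builds for real, one container per pair.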

### Updating the build environment
The base Docker image is updated less often than the wheels are built. When a
dependency needs to be updated to a new version, the base image has to be
adjusted as well. You can rebuild this image using

```bash
docker build -t arrow_manylinux2010_x86_64_base -f Dockerfile-x86_64_base .
```

For each dependency, there is a bash script in the `scripts/` directory that
downloads the sources, builds them, and installs them. At the end of each
dependency build, the sources are removed so that only the binary installation
of the dependency persists in the Docker image. When you make local
adjustments to this image, you need to change the name of the Docker image in
the `docker run` command accordingly.
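
As a sketch of that last point: build the adjusted image under a local tag and pass the same tag to `docker run`. The tag `arrow_manylinux2010_local` and the two helper functions below are hypothetical names chosen for illustration; the flags mirror the `docker run` command from the build instructions. The helpers only print the commands, so they can be inspected before being run.

```shell
#!/bin/bash
# Hypothetical local tag; any name works as long as both commands agree
LOCAL_IMAGE="arrow_manylinux2010_local"

# Print the command that rebuilds the base image under the local tag
base_build_cmd() {
  echo "docker build -t ${LOCAL_IMAGE} -f Dockerfile-x86_64_base ."
}

# Print the wheel build command for a Python version ($1) and unicode width ($2)
wheel_build_cmd() {
  echo "docker run --env PYTHON_VERSION=$1 --env UNICODE_WIDTH=$2" \
       "--shm-size=2g --rm -t -i -v \$PWD:/io -v \$PWD/../../:/arrow" \
       "${LOCAL_IMAGE}:latest /io/build_arrow.sh"
}

base_build_cmd
wheel_build_cmd 3.6 16
```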

### Using quay.io to trigger and build the docker image

1. Make the change in the build scripts (e.g. to modify the boost build, update `scripts/build_boost.sh`).

2. Set up an account on quay.io and link it to your GitHub account.

3. In quay.io, add a new repository with the following settings:

    1. Link to GitHub repository push
    2. Trigger build on changes to a specific branch (e.g. `myquay`) of the repo (e.g. `pravindra/arrow`)
    3. Set Dockerfile location to `/python/manylinux2010/Dockerfile-x86_64_base`
    4. Set Context location to `/python/manylinux2010`

4. Push the change from step 1 to the branch specified in step 3.2.

    * This should trigger a build in quay.io; the build takes about 2 hours to finish.

5. Add a tag `latest` to the build after step 4 finishes, and save the build ID (e.g. `quay.io/pravindra/arrow_manylinux2010_x86_64_base:latest`).

6. In your arrow PR,

    * include the change from step 1.
    * modify `travis_script_manylinux.sh` to switch to the location from step 5 for the docker image.

## TensorFlow compatible wheels for Arrow

As TensorFlow is not compatible with the manylinux1 standard, the above
wheels can cause segfaults if they are used together with the TensorFlow wheels
from https://www.tensorflow.org/install/pip. We do not recommend using
TensorFlow wheels with pyarrow manylinux1 wheels until these incompatibilities
are addressed by the TensorFlow team [1]. For most end-users, the recommended
way to use Arrow together with TensorFlow is through conda.
If this is not an option for you, it is also possible to produce
TensorFlow-compatible Arrow wheels; these, however, do not conform to the
manylinux1 standard and are not officially supported by the Arrow community.

Similar to the manylinux1 wheels, there is a base image that can be built with

```bash
docker build -t arrow_linux_x86_64_base -f Dockerfile-x86_64_ubuntu .
```

Once the image has been built, you can build the wheels with the following
command. This example is for Python 2.7 with unicode width 16; similarly, you
can pass `PYTHON_VERSION="3.5"`, `PYTHON_VERSION="3.6"` or
`PYTHON_VERSION="3.7"`, or use `PYTHON_VERSION="2.7"` with `UNICODE_WIDTH=32`:

```bash
# Build the python packages
sudo docker run --env UBUNTU_WHEELS=1 --env PYTHON_VERSION="2.7" --env UNICODE_WIDTH=16 --rm -t -i -v $PWD:/io -v $PWD/../../:/arrow arrow_linux_x86_64_base:latest /io/build_arrow.sh
# Now the new packages are located in the dist/ folder
ls -l dist/
echo "Please note that these wheels are not manylinux1 compliant"
```
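
One way to tell the two kinds of wheels apart is the platform tag, which is the last dash-separated field of a wheel filename. The sketch below extracts it. The filenames are made-up examples, and it assumes that wheels built with `UBUNTU_WHEELS=1` keep the plain `linux_x86_64` tag, since `build_arrow.sh` skips `auditwheel repair` for them.

```shell
#!/bin/bash
# wheel_platform_tag FILE: print the platform tag of a wheel filename.
# Wheel names end in "-<python tag>-<abi tag>-<platform tag>.whl".
wheel_platform_tag() {
  base="$(basename "$1" .whl)"   # drop directory and extension
  echo "${base##*-}"             # keep the text after the last "-"
}

# Hypothetical filenames, for illustration only
wheel_platform_tag dist/pyarrow-0.13.0-cp27-cp27mu-linux_x86_64.whl
wheel_platform_tag dist/pyarrow-0.13.0-cp37-cp37m-manylinux2010_x86_64.whl
```

For these two names the function prints `linux_x86_64` and `manylinux2010_x86_64` respectively.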

[1] https://groups.google.com/a/tensorflow.org/d/topic/developers/TMqRaT-H2bI/discussion
143 changes: 143 additions & 0 deletions python/manylinux2010/build_arrow.sh
@@ -0,0 +1,143 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
# Usage:
# docker run --rm -v $PWD:/io arrow-base-x86_64 /io/build_arrow.sh

# Build upon the scripts in https://github.com/matthew-brett/manylinux-builds
# * Copyright (c) 2013-2016, Matt Terry and Matthew Brett (BSD 2-clause)

source /multibuild/manylinux_utils.sh

# Quit on failure
set -e

# Print commands for debugging
set -x

cd /arrow/python

# PyArrow build configuration
export PYARROW_BUILD_TYPE='release'
export PYARROW_CMAKE_GENERATOR='Ninja'
export PYARROW_WITH_ORC=1
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_PLASMA=1
export PYARROW_BUNDLE_ARROW_CPP=1
export PYARROW_BUNDLE_BOOST=1
export PYARROW_BOOST_NAMESPACE=arrow_boost
export PKG_CONFIG_PATH=/usr/lib/pkgconfig:/arrow-dist/lib/pkgconfig

export PYARROW_CMAKE_OPTIONS='-DTHRIFT_HOME=/usr -DBoost_NAMESPACE=arrow_boost -DBOOST_ROOT=/arrow_boost_dist'
# Ensure the target directory exists
mkdir -p /io/dist

# Must pass PYTHON_VERSION and UNICODE_WIDTH env variables
# possible values are: 2.7,16 2.7,32 3.5,16 3.6,16 3.7,16

CPYTHON_PATH="$(cpython_path ${PYTHON_VERSION} ${UNICODE_WIDTH})"
PYTHON_INTERPRETER="${CPYTHON_PATH}/bin/python"
PIP="${CPYTHON_PATH}/bin/pip"
PATH="${PATH}:${CPYTHON_PATH}"

#if [ "${PYTHON_VERSION}" != "2.7" ]; then
# # Gandiva is not supported on Python 2.7
# export PYARROW_WITH_GANDIVA=1
# export BUILD_ARROW_GANDIVA=ON
#else
export PYARROW_WITH_GANDIVA=0
export BUILD_ARROW_GANDIVA=OFF
#fi

echo "=== (${PYTHON_VERSION}) Building Arrow C++ libraries ==="
ARROW_BUILD_DIR=/tmp/build-PY${PYTHON_VERSION}-${UNICODE_WIDTH}
mkdir -p "${ARROW_BUILD_DIR}"
pushd "${ARROW_BUILD_DIR}"
#-DARROW_GANDIVA_PC_CXX_FLAGS="-isystem;/opt/rh/devtoolset-8/root/usr/include/c++/8/" \
PATH="${CPYTHON_PATH}/bin:${PATH}" cmake -DCMAKE_BUILD_TYPE=Release \
    -DARROW_DEPENDENCY_SOURCE="SYSTEM" \
    -DZLIB_ROOT=/usr/local \
    -DCMAKE_INSTALL_PREFIX=/arrow-dist \
    -DCMAKE_INSTALL_LIBDIR=lib \
    -DARROW_BUILD_TESTS=OFF \
    -DARROW_BUILD_SHARED=ON \
    -DARROW_BOOST_USE_SHARED=ON \
    -DARROW_JEMALLOC=ON \
    -DARROW_RPATH_ORIGIN=ON \
    -DARROW_PYTHON=ON \
    -DARROW_PARQUET=ON \
    -DPythonInterp_FIND_VERSION=${PYTHON_VERSION} \
    -DARROW_PLASMA=ON \
    -DARROW_TENSORFLOW=ON \
    -DARROW_ORC=ON \
    -DARROW_GANDIVA=${BUILD_ARROW_GANDIVA} \
    -DARROW_GANDIVA_JAVA=OFF \
    -DBoost_NAMESPACE=arrow_boost \
    -DBOOST_ROOT=/arrow_boost_dist \
    -GNinja /arrow/cpp
ninja install
popd

# Check that we don't expose any unwanted symbols
/io/scripts/check_arrow_visibility.sh

echo "=== (${PYTHON_VERSION}) Install the wheel build dependencies ==="
$PIP install -r requirements-wheel.txt

# Clear output directory
rm -rf dist/
echo "=== (${PYTHON_VERSION}) Building wheel ==="
# Remove build directory to ensure CMake gets a clean run
rm -rf build/
PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py build_ext \
    --inplace \
    --bundle-arrow-cpp \
    --bundle-boost \
    --boost-namespace=arrow_boost
PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py bdist_wheel
PATH="$PATH:${CPYTHON_PATH}/bin" $PYTHON_INTERPRETER setup.py sdist

if [ -n "$UBUNTU_WHEELS" ]; then
    echo "=== (${PYTHON_VERSION}) Wheels are not compatible with manylinux1 ==="
    mv dist/pyarrow-*.whl /io/dist
else
    echo "=== (${PYTHON_VERSION}) Tag the wheel with manylinux2010 ==="
    mkdir -p repaired_wheels/
    auditwheel -v repair --plat manylinux2010_x86_64 -L . dist/pyarrow-*.whl -w repaired_wheels/

    # Install the built wheels
    $PIP install repaired_wheels/*.whl

    # Test that the modules are importable
    $PYTHON_INTERPRETER -c "
import sys
import pyarrow
import pyarrow.orc
import pyarrow.parquet
import pyarrow.plasma

#if sys.version_info.major > 2:
# import pyarrow.gandiva
"

    # More thorough testing happens outside of the build to prevent
    # packaging issues like ARROW-4372
    mv dist/*.tar.gz /io/dist
    mv repaired_wheels/*.whl /io/dist
fi