ARROW-17891: [Docs][Python] Update and sync Win section of the developers/python page (#14350)

This PR updates Windows section of the Python Development page. Main changes:

- use Python 3.10 (also in instructions for Linux/macOS)
- definition of `PATH` not needed, as Python no longer searches `PATH` for DLLs ([3.8+](https://bugs.python.org/issue43173))
- use `CONDA_PREFIX` to define `ARROW_HOME` as in other parts of the docs
- remove **Running C++ unit tests for Python integration** section (C++ unit tests are part of the `pytest`-based test module as of #14117)
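
To make the `PATH` and `CONDA_PREFIX` points concrete, here is a minimal Python sketch. It is not part of the commit. `arrow_home_from_conda` and `register_arrow_dlls` are hypothetical helper names. The non-Windows branch assumes `ARROW_HOME` is simply `CONDA_PREFIX` there, and the `bin` subdirectory is taken from the `PATH` line that the diff removes. `os.add_dll_directory` is the standard-library call (Windows only, Python 3.8+) for registering an extra DLL search directory.

```python
import ntpath
import os
import sys


def arrow_home_from_conda(environ, windows):
    """Hypothetical helper: derive ARROW_HOME from CONDA_PREFIX.

    Returns None when no conda environment is active.
    """
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    if windows:
        # Mirrors `set ARROW_HOME=%CONDA_PREFIX%\Library` from the updated docs.
        return ntpath.join(prefix, "Library")
    # Assumption: on Linux/macOS ARROW_HOME is CONDA_PREFIX itself.
    return prefix


def register_arrow_dlls(arrow_home):
    """Hypothetical helper: register <arrow_home>/bin as a DLL directory.

    Since Python 3.8, PATH is not consulted when resolving DLL dependencies
    of extension modules on Windows, so a custom directory has to be
    registered explicitly. Returns None when there is nothing to register.
    """
    bin_dir = os.path.join(arrow_home, "bin")
    if sys.platform == "win32" and os.path.isdir(bin_dir):
        return os.add_dll_directory(bin_dir)
    return None


if __name__ == "__main__":
    home = arrow_home_from_conda(os.environ, sys.platform == "win32")
    print("ARROW_HOME would be:", home)
```

In a conda-based setup like the one described in the docs this registration step should not be needed; the sketch only shows why editing `PATH` no longer has the effect the old instructions relied on.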

cc @wjones127 @jorisvandenbossche 

Authored-by: Alenka Frim <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
AlenkaF authored and kou committed Oct 20, 2022
1 parent 8f03259 commit 6278176
Showing 1 changed file with 12 additions and 65 deletions.
77 changes: 12 additions & 65 deletions docs/source/developers/python.rst
@@ -198,7 +198,7 @@ dependencies for Arrow C++ and PyArrow as pre-built binaries, which can make
Arrow development easier and faster.

Let's create a conda environment with all the C++ build and Python dependencies
-from conda-forge, targeting development for Python 3.9:
+from conda-forge, targeting development for Python 3.10:

On Linux and macOS:

@@ -210,7 +210,7 @@ On Linux and macOS:
--file arrow/ci/conda_env_python.txt \
--file arrow/ci/conda_env_gandiva.txt \
compilers \
-python=3.9 \
+python=3.10 \
pandas
As of January 2019, the ``compilers`` package is needed on many Linux
@@ -495,31 +495,28 @@ First, starting from a fresh clone of Apache Arrow:
--file arrow\ci\conda_env_cpp.txt ^
--file arrow\ci\conda_env_python.txt ^
--file arrow\ci\conda_env_gandiva.txt ^
-python=3.9
+python=3.10
$ conda activate pyarrow-dev
Now, we build and install Arrow C++ libraries.

-We set a number of environment variables:
-
-- the path of the installation directory of the Arrow C++ libraries as
-``ARROW_HOME``
-- add the path of installed DLL libraries to ``PATH``
-- and the CMake generator to be used as ``PYARROW_CMAKE_GENERATOR``
+We set the path of the installation directory of the Arrow C++ libraries as
+``ARROW_HOME``. When using a conda environment, Arrow C++ is installed
+in the environment directory, which path is saved in the
+`CONDA_PREFIX <https://docs.conda.io/projects/conda-build/en/latest/user-guide/environment-variables.html#environment-variables-that-affect-the-build-process>`_
+environment variable.

.. code-block::
-$ set ARROW_HOME=%cd%\arrow-dist
-$ set PATH=%ARROW_HOME%\bin;%PATH%
-$ set PYARROW_CMAKE_GENERATOR=Visual Studio 15 2017 Win64
+$ set ARROW_HOME=%CONDA_PREFIX%\Library
Let's configure, build and install the Arrow C++ libraries:

.. code-block::
$ mkdir arrow\cpp\build
$ pushd arrow\cpp\build
-$ cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
+$ cmake -G "Ninja" ^
-DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-DCMAKE_UNITY_BUILD=ON ^
-DARROW_COMPUTE=ON ^
@@ -535,7 +532,7 @@ Let's configure, build and install the Arrow C++ libraries:
-DARROW_WITH_ZLIB=ON ^
-DARROW_WITH_ZSTD=ON ^
..
-$ cmake --build . --target INSTALL --config Release
+$ cmake --build . --target install --config Release
$ popd
Now, we can build pyarrow:
@@ -572,10 +569,6 @@ Then run the unit tests with:
the Python extension. This is recommended for development as it allows the
C++ libraries to be re-built separately.

-As a consequence however, ``python setup.py install`` will also not install
-the Arrow C++ libraries. Therefore, to use ``pyarrow`` in python, ``PATH``
-must contain the directory with the Arrow .dll-files.

If you want to bundle the Arrow C++ libraries with ``pyarrow``, add
the ``--bundle-arrow-cpp`` option when building:

@@ -586,56 +579,10 @@ Then run the unit tests with:
Important: If you combine ``--bundle-arrow-cpp`` with ``--inplace`` the
Arrow C++ libraries get copied to the source tree and are not cleared
by ``python setup.py clean``. They remain in place and will take precedence
-over any later Arrow C++ libraries contained in ``PATH``. This can lead to
+over any later Arrow C++ libraries contained in ``CONDA_PREFIX``. This can lead to
incompatibilities when ``pyarrow`` is later built without
``--bundle-arrow-cpp``.

-Running C++ unit tests for Python integration
----------------------------------------------
-
-Running C++ unit tests should not be necessary for most developers. If you do
-want to run them, you need to pass ``-DARROW_BUILD_TESTS=ON`` during
-configuration of the Arrow C++ library build:
-
-.. code-block::
-$ mkdir arrow\cpp\build
-$ pushd arrow\cpp\build
-$ cmake -G "%PYARROW_CMAKE_GENERATOR%" ^
--DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
--DARROW_BUILD_TESTS=ON ^
--DARROW_COMPUTE=ON ^
--DARROW_CSV=ON ^
--DARROW_CXXFLAGS="/WX /MP" ^
--DARROW_DATASET=ON ^
--DARROW_FILESYSTEM=ON ^
--DARROW_HDFS=ON ^
--DARROW_JSON=ON ^
--DARROW_PARQUET=ON ^
-..
-$ cmake --build . --target INSTALL --config Release
-$ popd
-Getting ``arrow-python-test.exe`` (C++ unit tests for python integration) to
-run is a bit tricky because your ``%PYTHONHOME%`` must be configured to point
-to the active conda environment:
-
-.. code-block::
-$ set PYTHONHOME=%CONDA_PREFIX%
-$ pushd arrow\cpp\build\release\Release
-$ arrow-python-test.exe
-$ popd
-To run all tests of the Arrow C++ library, you can also run ``ctest``:
-
-.. code-block::
-$ set PYTHONHOME=%CONDA_PREFIX%
-$ pushd arrow\cpp\build
-$ ctest
-$ popd
Caveats
-------
