Statically link `perspective-python` and libarrow #1290
- WIP windows flatbuffers
- Install flatbuffers using vcpkg
- link against rt for python2 manylinux
- use vcpkg toolchain file
- WIP why not try chocolatey
- pull in flatbuffers as an external dep
- wip fix windows build
- nocopy arrow dlls
…to install PyArrow 2
@texodus Tests now run against PyArrow 2.0.0. I checked locally and on the Azure Mac/Windows jobs. We just need to regenerate the Docker images, and then everything should be clear. @timkpaine, would you be able to pull down this branch and take a look if you have time? I don't have a Windows machine to verify on, but the build should work as long as Flatbuffers is installed before building.
We need to check that Arrow data passed in from Python is binary compatible with the version we built against. Expose the Arrow API version to Python and check it against whatever version attribute is available on the Python side; otherwise it will likely just segfault when the Arrow versions are incompatible. Right now, if the Arrow versions are incompatible, you get the error message that disables the C++ code, so we want something similar, but just for Arrow.
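The version check suggested above could look roughly like the following sketch. It assumes the Arrow version Perspective was built against would be exposed from the compiled extension as a string and compared with `pyarrow.__version__`. The names `ArrowVersionMismatchError`, `parse_version` and `check_pyarrow_compat`, and the minimum-version policy itself, are hypothetical placeholders, not Perspective's API.

```python
import re


class ArrowVersionMismatchError(RuntimeError):
    """Hypothetical error raised in Python instead of letting C++ crash."""


def parse_version(version):
    """Parse the leading 'X.Y.Z' of a version string into a tuple of ints.

    Suffixes such as '.dev12' or 'rc1' are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % (version,))
    return tuple(int(part) for part in match.groups())


def check_pyarrow_compat(pyarrow_version, minimum_version):
    """Raise ArrowVersionMismatchError if pyarrow_version < minimum_version.

    The minimum-version rule is a placeholder policy; the real compatibility
    rule against Perspective's built-in Arrow would still need to be decided.
    """
    if parse_version(pyarrow_version) < parse_version(minimum_version):
        raise ArrowVersionMismatchError(
            "pyarrow %s is older than the minimum supported version %s"
            % (pyarrow_version, minimum_version)
        )


# Usage: in practice the first argument would be pyarrow.__version__ and the
# second a value exposed by the compiled extension (both assumptions here).
check_pyarrow_compat("2.0.0", "0.15.0")  # passes silently
try:
    check_pyarrow_compat("0.14.1", "0.15.0")
except ArrowVersionMismatchError as err:
    print("caught:", err)
```

Because the mismatch surfaces as an ordinary Python exception, user code can catch it, which is the behavior the thread asks for.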
@timkpaine Thanks for the feedback! I don't think this is necessarily an issue, since
I dug up a very old
I concur this needs work. Let's aim to improve the quality of our error reporting in a PR that targets this specifically.
# Chocolatey for our Azure Windows job.
if (WIN32)
    psp_build_dep("flatbuffers" "${PSP_CMAKE_MODULE_PATH}/flatbuffers.txt.in")
endif()
Why do the Flatbuffers headers need to be available only on Windows?
you have installed. To disable this, pass the `--no-build-isolation` flag to pip.

- Flatbuffers not installed prior to installing Perspective
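For the "Flatbuffers not installed" case above, a build script could fail early with a readable message instead of a mid-build compiler error. This is a hypothetical sketch: it assumes the script knows which include directories to search, and it looks for `flatbuffers/flatbuffers.h`, the main header of the Flatbuffers C++ library. `find_flatbuffers_header` and `require_flatbuffers` are illustrative names, not part of Perspective's build.

```python
import os


def find_flatbuffers_header(include_dirs):
    """Return the first directory containing flatbuffers/flatbuffers.h, or None."""
    for include_dir in include_dirs:
        candidate = os.path.join(include_dir, "flatbuffers", "flatbuffers.h")
        if os.path.isfile(candidate):
            return include_dir
    return None


def require_flatbuffers(include_dirs):
    """Raise a readable error before building if the headers are missing."""
    found = find_flatbuffers_header(include_dirs)
    if found is None:
        raise RuntimeError(
            "Flatbuffers headers not found in %s; install Flatbuffers "
            "before building Perspective." % (list(include_dirs),)
        )
    return found


# Usage (the search paths are assumptions and vary by platform):
# require_flatbuffers(["/usr/include", "/usr/local/include"])
```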
Needs a newline here; Markdown is picky sometimes.
Looks good! Thanks for the PR!
So long as it fails in Python and not in C++ (i.e. no segfault, so the user can handle it in Python), it's "fine".
Note we won't need to re-add
I think I might have removed the
I've fixed that on my jlab 3 PR.
@timkpaine Just to re-emphasize: this PR changes no Python; it only removes the dependency shared
So long as the Arrow validation doesn't trigger uncatchable errors in Python, it should be a straight improvement. If you're shoving an Arrow table into Perspective in Python, you're probably using PyArrow. Right now we go PyArrow table -> byte string copy -> Perspective (2 copies total). With this change we can no longer do PyArrow table -> zero-copy Arrow table -> Perspective (1 copy total), but I think that's OK given the better install experience (and we never implemented it anyway). So the only downside is that if you shove an Arrow table from your version of PyArrow into our version of Arrow and the two are incompatible, you get an obscure error. We should make it a PerspectiveCPPException or whatever we called it.
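One way to keep Arrow validation catchable is to run a cheap header check in Python before the bytes ever reach C++. The sketch below inspects framing only: the Arrow IPC file format begins and ends with the `ARROW1` magic, stream messages written by Arrow 0.15 and later start with a `0xFFFFFFFF` continuation marker followed by a little-endian int32 metadata length, and older writers omit that marker. `sniff_arrow_ipc` and `InvalidArrowBufferError` are hypothetical names, and passing this check does not guarantee the buffer is compatible with Perspective's Arrow.

```python
import struct

ARROW_FILE_MAGIC = b"ARROW1"
CONTINUATION_MARKER = b"\xff\xff\xff\xff"


class InvalidArrowBufferError(ValueError):
    """Hypothetical catchable error for buffers that fail the header check."""


def sniff_arrow_ipc(data):
    """Classify an Arrow IPC buffer by its framing without parsing Flatbuffers.

    Returns "file", "stream", or "legacy-stream" (pre-0.15 framing without the
    continuation marker). Raises InvalidArrowBufferError otherwise. This is a
    shallow sanity check only, not validation of the schema or record batches.
    """
    data = bytes(data)
    if data.startswith(ARROW_FILE_MAGIC):
        # File format: magic at both ends, with the messages and footer between.
        too_short = len(data) <= 2 * len(ARROW_FILE_MAGIC)
        if too_short or not data.endswith(ARROW_FILE_MAGIC):
            raise InvalidArrowBufferError("truncated Arrow IPC file")
        return "file"
    if len(data) < 8:
        raise InvalidArrowBufferError("buffer too short for an IPC message header")
    if data[:4] == CONTINUATION_MARKER:
        kind, header_size = "stream", 8
        (metadata_size,) = struct.unpack("<i", data[4:8])
    else:
        kind, header_size = "legacy-stream", 4
        (metadata_size,) = struct.unpack("<i", data[:4])
    if metadata_size <= 0:
        # Zero is the end-of-stream marker, so there is no schema message.
        raise InvalidArrowBufferError("no schema message (metadata size %d)" % metadata_size)
    if header_size + metadata_size > len(data):
        raise InvalidArrowBufferError("metadata length exceeds buffer size")
    return kind
```

A caller could run this on the byte string before handing it to `perspective.Table`, letting `InvalidArrowBufferError` propagate to user code; deeper checks would still be needed to rule out every incompatibility.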
You mean this comment? #1157 (comment)
IIRC it should be a
I saw the fix for rpath on #1294. I was definitely heavy-handed in removing PyArrow-related code from our CMakeLists, and cut out the "set rpath to $ORIGIN" block as well. Thanks for the fix!
Yep, because we expose
This reverts commit 5032199.
This PR builds Apache Arrow as a static library and reconfigures `perspective-python` to no longer link against a version of PyArrow present on the system or installed as part of `perspective-python`'s dependencies.

Instead, Perspective's C++ library will link against the pre-compiled, minimal Arrow static library, and users are free to install any version of PyArrow they choose, as long as its Arrow binary format is compatible with the binaries output by Perspective's version of Arrow, which is v1.0.1 at this time.
At present, Perspective does not `import pyarrow` or use any of Arrow's Python-specific C++ code. If PyArrow is used in the same Python runtime, however (as it is in the test suite), Perspective's C++ version of Arrow continues to work as expected: `perspective-python` uses its own version of Arrow, and `pyarrow` uses the `libarrow.so` that shipped with the PyArrow install. This means further integration with PyArrow is not blocked or degraded; if anything, it should be accelerated, as we now have a consistent Arrow version across all Perspective runtimes.

In effect, we have taken a pinned version of PyArrow that depended on external assets we could not fully or consistently control, and turned it into a pinned version of Arrow that we fully manage and control.
Benefits
With this change, the complicated and admittedly brittle code we have in place to deal with detecting PyArrow's install location, managing PyArrow versions, copying Arrow DLLs (on Windows), etc. can be removed, as Perspective's build no longer depends on the existence and correct versioning of an asset we have little control over.
Additionally, users can install the newest version of PyArrow to gain access to features and fixes without worrying about Perspective compatibility beyond the binary format, which is expected to remain stable and compatible between versions. Finally, the developer experience should improve both for source builds and for configuring the `conda-forge` build, which can unpin its PyArrow dependency.

Testing
Perspective's WASM build has built and used the minimal version of Arrow since late 2019, and `perspective-python`'s test suite contains many tests that use PyArrow to generate Arrow binaries for Perspective to load and to read the binaries Perspective outputs. This PR updates the Manylinux Dockerfiles and the Python test suite to use PyArrow `v2.0.0` (except for Python 2, which continues to use `v0.16.0`). In effect, this covers both forwards and backwards compatibility between our static Arrow, pinned to `v1.0.1`, and the versions of PyArrow used by the tests.

TODO:
- `setup.py`, `pyproject.toml`, etc.
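For the `setup.py` item, one hedged option is to express the test-suite pins described under Testing as PEP 508 environment markers, so pip selects the right PyArrow for each interpreter. `TEST_REQUIRES` and `pyarrow_pin` are hypothetical names; `pyarrow_pin` simply mirrors what the markers select, for illustration.

```python
import sys

# Hypothetical test requirements mirroring the test matrix:
# PyArrow 0.16.0 on Python 2 and PyArrow 2.0.0 on Python 3. The PEP 508
# environment markers after ";" let pip choose the pin at install time.
TEST_REQUIRES = [
    'pyarrow==0.16.0; python_version < "3"',
    'pyarrow==2.0.0; python_version >= "3"',
]


def pyarrow_pin(version_info=sys.version_info):
    """Return the pin the markers above select for a given interpreter version."""
    return "pyarrow==0.16.0" if version_info[0] < 3 else "pyarrow==2.0.0"


print(pyarrow_pin())
```

A list like this would typically be passed through setuptools' `extras_require`, so the runtime install itself stays free of any PyArrow pin.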