-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade WebAssembly build to Arrow 1.0.1 #1207
Conversation
50d9237
to
229f764
Compare
if(${CMAKE_SYSTEM_NAME} MATCHES "Windows") | ||
# windows its just "arrow.dll" | ||
set(PYTHON_PYARROW_PYTHON_SHARED_LIBRARY "arrow_python") | ||
set(PYTHON_PYARROW_ARROW_SHARED_LIBRARY "arrow") | ||
set(PYTHON_PYARROW_LIBRARIES ${PYTHON_PYARROW_PYTHON_SHARED_LIBRARY} ${PYTHON_PYARROW_ARROW_SHARED_LIBRARY}) | ||
elseif (CMAKE_SYSTEM_NAME MATCHES "Darwin") | ||
# Link against pre-built libarrow on MacOS | ||
set(PYTHON_PYARROW_PYTHON_SHARED_LIBRARY ${PYTHON_PYARROW_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}arrow_python.${PYARROW_VERSION_MINOR}.dylib) | ||
set(PYTHON_PYARROW_ARROW_SHARED_LIBRARY ${PYTHON_PYARROW_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}arrow.${PYARROW_VERSION_MINOR}.dylib) | ||
set(PYTHON_PYARROW_PYTHON_SHARED_LIBRARY ${PYTHON_PYARROW_LIBRARY_DIR}/${CMAKE_SHARED_LIBRARY_PREFIX}arrow_python.100.dylib) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't hardcode the version, just use "arrow_python.dylib" etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are going to have to cope with the conditional naming for these libraries, as they are not disted in this version under this name (and don't seem to exhibit any consistent version convention at all).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's link this into an apache arrow issue, for pyarrow < 1 they had symlinks and we'll need those for some platforms
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wesm this makes linking against these libraries super ugly. Since the version is managed by python anyway (e.g. in the version of the python package), wouldn't it be better to ship just the unversioned binary (e.g. libarrow.dylib)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See http://arrow.apache.org/docs/python/extending.html#building-extensions-against-pypi-wheels. We had to make changes to prevent shared libraries from being duplicated in the wheels. It might be possible to change things to ship unversioned shared libraries instead, but it will require some surgery on the Cython modules to get them to link with the unversioned libs (because they look for the libraries with the ABI version when they link). Feel free to open a JIRA issue in Apache Arrow with a proposal -- I have definitely hit the limit for how much time I can personally spend on it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @wesm
@timkpaine I am not particularly bothered by this, it is easy to fix and this file must be manually updated for new Arrow versions regardless.
cpp/perspective/CMakeLists.txt
Outdated
target_link_libraries(binding psp) | ||
target_compile_options(psp PRIVATE -Wno-deprecated-register) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are these for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Silences python2 deprecation warnings which make our build logs un-readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're doing this on windows too though, are you sure this flag will work there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know ... our windows azure builds do not work ...
@@ -10,6 +10,7 @@ | |||
#include <perspective/emscripten.h> | |||
#include <perspective/arrow_loader.h> | |||
#include <perspective/arrow_writer.h> | |||
#include "arrow/csv/api.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match style with <>
python/perspective/setup.py
Outdated
"python-dateutil>=2.8.0", | ||
"six>=1.11.0", | ||
"traitlets>=4.3.2", | ||
] | ||
|
||
if sys.version_info.major < 3: | ||
requires.append("backports.shutil-which") | ||
requires.append("pyarrow<2") | ||
else: | ||
requires.append("pyarrow>=1.0.1,<2") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should match pyproject.toml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This version is conditional on python version - does this not make the looser bounds "pyarrow<2"
for pyproject.toml
correct? Perspective under this PR will compile with either pyarrow
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pyproject.toml
specifies it as pyarrow<2
regardless of python version. So it could satisfy with pyarrow<0.16
even though that won't work.
You have a couple of options, something like:
"pyarrow>0.16.0,<2; python_version < '3'", "pyarrow>1.0.1,<2; python_version >= '3'"
would work ok.
Stylistically, every place where we specify requirements should match exactly the behavior of pyproject.toml, otherwise we can very easily start to break our builds and wheels like the state we were in before.
azure-pipelines.yml
Outdated
@@ -181,7 +182,7 @@ jobs: | |||
# displayName: "Which python" | |||
|
|||
# - script: | | |||
# python -m pip install numpy "pyarrow>=0.16.0,<1" | |||
# python -m pip install numpy "pyarrow>=1.0.1,<2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match pyproject.toml
azure-pipelines.yml
Outdated
@@ -338,8 +344,14 @@ jobs: | |||
displayName: "Which python" | |||
|
|||
- script: | | |||
python -m pip install delocate wheel numpy "pyarrow>=0.16.0,<1" | |||
displayName: "Python deps" | |||
python -m pip install delocate wheel numpy "pyarrow>=1.0.1,<2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
match pyproject.toml
@texodus the only issue really is that |
@timkpaine I've moved the |
@texodus do you want me to make the python changes on that branch or do you want them in a separate PR? |
I already made most of them - just the |
Upgrades Apache Arrow to v1.0.1 for all targets except Python 2.7. For the WebAssembly/Emscripten build, CSV loading and writing (via
to_csv()
) has been rewritten to use Arrow over C++, resulting in a ~10x load performance increase, depending on whether you provide a schema and what data types your columns are.For Python 2.7, Arrow requirements are
pyarrow<2
only, which is 0.16.1. I've done some limited testing to ensure this is cross-compatible with Arrow Table data generated or read-from 1.0.1, but this is something we should probably keep an eye on; especially regarding the JupyterLab plugin@finos/perspective-jupyterlab
, whose client/server distributed mode would use engines compiled to both versions simultaneously when Python 2.7 runs the kernel.