Using pyarrow with pypy #2089
Comments
The failure does not seem to be related to pypy but rather to a version mismatch between the libarrow package and what you install as Python. For building with pypy, I would suggest using the plain build from source (see https://arrow.apache.org/docs/python/development.html#development). I'm not aware of anyone who uses pyarrow together with pypy, so it might or might not work. Probably the latter, but then we should have a quick look at whether it can be fixed easily or whether it's a larger task.
I semi-followed the instructions (with a few modifications, such as editing arrow/cpp/CMakeCache.txt to set PYTHON_EXECUTABLE). I was able to build both arrow and parquet. I am able to write a simple parquet file (I haven't yet tried a more advanced scenario):
So it might work. I will try to find some time for a more "real world" scenario.
Tests
Most of them pass (and a few fail), but there are 4 segfaults:
There might be more than one test that segfaults; I just took the one that aborted the test run (I didn't go in and manually exclude the segfaulting tests to see if there are more).
I have opened https://issues.apache.org/jira/browse/ARROW-2651 to track the PyPy support of Arrow. I must say that your outcome is better than I had expected. Regarding the segfaults, it is hard to estimate how difficult they are to fix. To provide more information on the segmentation faults, you could run the code with coredumps enabled, then inspect the coredump and post the backtrace here. This will work roughly as follows:
Then post the output of |
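A complementary way to collect crash information from inside the interpreter can be sketched in Python. This is only a sketch under assumptions: it presumes a Unix system and a Python 3 interpreter (CPython or a PyPy3 build) that ships the standard `faulthandler` and `resource` modules; the function name `enable_crash_diagnostics` and the log file name are hypothetical and not part of pyarrow.

```python
import faulthandler
import resource


def enable_crash_diagnostics(log_path="crash_traceback.log"):
    """Raise the core dump limit and log Python tracebacks on fatal signals.

    Hypothetical helper for illustration; not part of pyarrow or PyPy.
    """
    # Raise the soft core-file size limit to the hard limit so the kernel
    # is allowed to write a core file (roughly what `ulimit -c` controls).
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    # On SIGSEGV, SIGABRT and similar signals, faulthandler writes the
    # Python-level traceback of all threads to this file.
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)

    # The caller must keep the file object open for as long as
    # faulthandler is enabled.
    return log_file
```

Calling `enable_crash_diagnostics()` at the top of the failing script, before importing pyarrow, would leave a Python-level traceback in the log file if the process segfaults. The native backtrace still has to come from the core file and a debugger.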
I've also opened an issue on PyPy, https://bitbucket.org/pypy/pypy/issues/2842/running-pyarrow-on-pypy-segfaults, and created a reproducible sample at https://github.com/bivald/pyarrow-docker-test
Backtrace:
Closing this in favor of tracking progress in ARROW-2651.
As described in the [ARROW-2651](https://issues.apache.org/jira/browse/ARROW-2651) issue, this patch fixes the C datetime module import mechanism for PyPy. This is related to #2089 which was closed in favor of the JIRA issue. Authored-by: mattip <[email protected]> Signed-off-by: Antoine Pitrou <[email protected]>
Hi,
I'm trying to create parquet files with pypy (using pyarrow). After having spent quite a few hours on this I'm stuck. My base question is: can pyarrow be used with pypy? 😄
The built wheels can't be used for pypy (of course), so I'm trying to build pyarrow from source. To simplify, I'm using the libarrow Debian apt repository.
cp -r /arrow/cpp/src/arrow /arrow/python/build/temp.linux-x86_64-2.7/)
pypy setup.py build_ext --build-type=release
This fails with:
Full output on https://gist.github.com/bivald/01ab26bd6e5cedcf4d34354095d33bf2
I'm running all of this via Docker and can provide the Dockerfiles if anyone is interested. I guess my main question is:
Does anyone know of any fundamental issues that hinder pyarrow on pypy?
Nowadays pypy supports numpy and pandas (not sure about Cython).
Normally you can use pure Python implementations when you're on pypy, but the only one I found for parquet is read-only. Worst case I'll posix spawn a "normal" python process, but I would love to get it working properly.
The background is that I have several workers which run on pypy and I'm shifting them to produce parquet files over csv. The next step in the process uses CPython so parquet works great there.
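The worst-case fallback mentioned above, handing the parquet write to a "normal" CPython process, could be sketched roughly as follows. This is an illustration under assumptions, not a tested recipe: the helper `run_in_cpython`, the `WRITER_SCRIPT` string, and the JSON job layout are all hypothetical, and the script assumes the CPython side has a pyarrow version that provides `pa.table` and `pq.write_table`.

```python
import json
import subprocess
import sys

# Hypothetical script executed by CPython (not by pypy). It assumes pyarrow
# is installed in that interpreter. The job is read as JSON from stdin.
WRITER_SCRIPT = """
import json, sys
import pyarrow as pa
import pyarrow.parquet as pq

job = json.load(sys.stdin)
table = pa.table(job["columns"])      # dict of column name -> list of values
pq.write_table(table, job["path"])
"""


def run_in_cpython(script, job, python_exe="python3"):
    """Run `script` under `python_exe`, passing `job` as JSON on stdin.

    Returns the child's stdout as text and raises RuntimeError on failure.
    Uses Popen/communicate so it works on both Python 2 and Python 3.
    """
    proc = subprocess.Popen(
        [python_exe, "-c", script],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(json.dumps(job).encode("utf-8"))
    if proc.returncode != 0:
        # Surface the child's error output before failing.
        sys.stderr.write(err.decode("utf-8", "replace"))
        raise RuntimeError("child exited with code %d" % proc.returncode)
    return out.decode("utf-8")
```

A pypy worker could then call `run_in_cpython(WRITER_SCRIPT, {"columns": {"id": [1, 2, 3]}, "path": "out.parquet"})`. Passing the data as JSON keeps the sketch simple but is only suitable for small batches; larger workloads would probably want an intermediate file instead.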