Segfault with r-arrow=12 #1470

Closed
euklid321 opened this issue Aug 31, 2023 · 11 comments

@euklid321

Importing pyarrow.parquet with reticulate results in a segfault. The error below disappears if I downgrade to r-arrow=11.

Rscript -e "reticulate::py_run_string('import pyarrow.parquet')"

WARNING: Logging before InitGoogleLogging() is written to STDERR
W20230831 15:39:32.166492  2661 s3fs.cc:2598]  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit

 *** caught segfault ***
address 0x50, cause 'memory not mapped'
An irrecoverable exception occurred. R is aborting now ...
Segmentation fault (core dumped)

arrow                     1.2.3              pyhd8ed1ab_0           conda-forge
libarrow                  12.0.1             hb9dc469_9_cpu         conda-forge
pyarrow                   12.0.1             py310hf9e7431_9_cpu    conda-forge
r-arrow                   12.0.1             r43h59595ed_0          conda-forge
r-reticulate              1.31               r43ha503ecb_0          conda-forge
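
When scripting a reproduction like the one above, it can help to tell a real crash apart from a warning by looking at the exit status. The sketch below assumes the usual shell convention that a status above 128 means "terminated by signal (status - 128)", so a segfault (signal 11) shows up as 139. The helper names `describe_exit` and `run_repro` are made up for illustration and are not part of reticulate or arrow.

```shell
#!/bin/sh
# Hypothetical helper: turn a shell exit status into a short description.
# Statuses above 128 conventionally mean "killed by signal (status - 128)".
describe_exit() {
  status=$1
  if [ "$status" -eq 0 ]; then
    echo "ok"
  elif [ "$status" -gt 128 ]; then
    sig=$((status - 128))
    if [ "$sig" -eq 11 ]; then
      echo "killed by signal 11 (SIGSEGV)"
    else
      echo "killed by signal $sig"
    fi
  else
    echo "exited with status $status"
  fi
}

# Hypothetical wrapper: run the reproduction from this report and describe
# how the R process ended. Requires Rscript, reticulate and pyarrow.
run_repro() {
  Rscript -e "reticulate::py_run_string('import pyarrow.parquet')"
  describe_exit $?
}
```

Calling `run_repro` in the affected environment prints the R output followed by one summary line. A run that only prints the FinalizeS3 warning but exits cleanly would be reported as `ok`.
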
@t-kalinowski
Member

Does this error occur outside of conda? If you create a virtual environment, does the error go away?

@euklid321
Author

> Does this error occur outside of conda? If you create a virtual environment, does the error go away?

It also occurs in a virtual environment:

(venv) ➜  ~ pip freeze
numpy==1.25.2
pyarrow==12.0.0
(venv) ➜  ~ Rscript -e "reticulate::py_run_string('import pyarrow.parquet')"
/Users/runner/work/crossbow/crossbow/arrow/cpp/src/arrow/filesystem/s3fs.cc:2598:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit

 *** caught segfault ***
address 0x0, cause 'unknown'
An irrecoverable exception occurred. R is aborting now ...
[1]    8698 segmentation fault  Rscript -e "reticulate::py_run_string('import pyarrow.parquet')"

@t-kalinowski
Member

Hi, I tried but am unable to reproduce the error.

t-kalinowski added the "reprex Can't reproduce" label on Sep 6, 2023
@euklid321
Author

I encounter this issue with both r-arrow=12 and r-arrow=13, on osx-arm64 as well as on linux-64.

@t-kalinowski
Member

Can you please provide code/instructions I can copy-paste into R console for creating a venv and installing the necessary packages so that I can reproduce the error?

@euklid321
Author

I think the easiest way is the following:

conda create -n demo pyarrow=13 r-reticulate
conda run -n demo Rscript -e "reticulate::py_run_string('import pyarrow.parquet')"

Can you reproduce the error with these commands?

@t-kalinowski
Member

Is R originating from conda as well? If not, then binary incompatibilities are inevitable when mixing conda and non-conda binaries. You can easily install a pre-built wheel from pip that should work with your R installation, like this:

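library(reticulate)
# Install a standalone Python only if no suitable starter interpreter exists
if (is.null(virtualenv_starter(">=3.9")))
  install_python("3.9:latest")
# Create the virtualenv from a matching interpreter and install the pyarrow wheel
virtualenv_create("r-pyarrow", version = ">=3.9", packages = "pyarrow")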

Then in a fresh R session, this should just work:

library(reticulate)
pq <- import("pyarrow.parquet")

If you're initializing Python some other way (not with the import() call), you may want to call use_virtualenv("r-pyarrow") first.
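
To check whether R and Python actually come from the same place, a quick look at where each binary resolves can help. This is a minimal sketch that assumes an activated conda environment exports `CONDA_PREFIX`. The helper names `is_under` and `report_origin` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if path $1 lives under directory $2.
is_under() {
  case "$1" in
    "$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: report whether a tool on PATH resolves inside the
# active conda environment (CONDA_PREFIX is set by `conda activate`).
report_origin() {
  tool=$1
  path=$(command -v "$tool" 2>/dev/null) || { echo "$tool: not found"; return 0; }
  if [ -n "${CONDA_PREFIX:-}" ] && is_under "$path" "$CONDA_PREFIX"; then
    echo "$tool: from the active conda env ($path)"
  else
    echo "$tool: not from the active conda env ($path)"
  fi
}
```

Running `report_origin Rscript` and `report_origin python` in the same shell shows at a glance whether one of the two is picked up from outside the environment. Note that reticulate may still select a different Python than the one on `PATH`, so `reticulate::py_config()` remains the authoritative check.
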

@euklid321
Author

> Is R originating from conda as well?

Yes, in my example, everything originates from conda.

I also tried your approach with a standalone R installation. Now I only get the FinalizeS3 warning about a possible segfault, but no actual segfault:

➜  ~ /usr/local/bin/Rscript -e "library('reticulate'); use_virtualenv('r-pyarrow'); py_config(); py_list_packages(); py_run_string('import pyarrow.parquet')"
python:         /Users/sascha/.virtualenvs/r-pyarrow/bin/python
libpython:      /Users/sascha/.pyenv/versions/3.9.18/lib/libpython3.9.dylib
pythonhome:     /Users/sascha/.virtualenvs/r-pyarrow:/Users/sascha/.virtualenvs/r-pyarrow
version:        3.9.18 (main, Oct  4 2023, 19:14:29)  [Clang 15.0.0 (clang-1500.0.40.1)]
numpy:          /Users/sascha/.virtualenvs/r-pyarrow/lib/python3.9/site-packages/numpy
numpy_version:  1.26.0

NOTE: Python version was forced by use_python() function
  package version     requirement
1   numpy  1.26.0   numpy==1.26.0
2 pyarrow  13.0.0 pyarrow==13.0.0
/Users/voltrondata/github-actions-runner/_work/crossbow/crossbow/arrow/cpp/src/arrow/filesystem/s3fs.cc:2829:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit

@t-kalinowski
Member

Thanks, I can reproduce this in reticulate, and I don't see the warning when running under Python directly.

Looking upstream, there was a similar issue (same warning) with Arrow for Java: apache/arrow#36934. We'll probably need a similar PR upstream to fix this.

The warning is being emitted from here: https://github.com/apache/arrow/blob/02de3c1789460304e958936b78d60f824921c250/cpp/src/arrow/filesystem/s3fs.cc#L2993, likely being called in python here: https://github.com/apache/arrow/blob/02de3c1789460304e958936b78d60f824921c250/python/pyarrow/_s3fs.pyx#L70
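
One experiment that might be worth trying is to finalize S3 explicitly from the embedded Python session before R exits. `pyarrow.fs.finalize_s3()` is part of pyarrow's public API, but whether calling it by hand avoids the warning or the crash under reticulate is an untested assumption here, not a confirmed fix. The helper names `build_r_expr` and `run_with_finalize` are hypothetical.

```shell
#!/bin/sh
# Python statements: import the module that triggers S3 initialization,
# then finalize S3 explicitly instead of relying on an exit handler.
PY_SNIPPET='import pyarrow.parquet, pyarrow.fs; pyarrow.fs.finalize_s3()'

# Hypothetical helper: wrap a Python snippet (which must not contain single
# quotes) in an R expression for reticulate::py_run_string().
build_r_expr() {
  printf "reticulate::py_run_string('%s')" "$1"
}

# Hypothetical wrapper: run the experiment. Requires Rscript, reticulate
# and pyarrow in the active environment.
run_with_finalize() {
  Rscript -e "$(build_r_expr "$PY_SNIPPET")"
}
```

If `run_with_finalize` exits cleanly while the original one-liner does not, that would support the idea that the finalizer is not being run when Python is embedded in R.
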

@xhochy

xhochy commented Oct 26, 2023

@euklid321 Can you check whether conda-forge/arrow-cpp-feedstock#1211 helps with your problem? You would need to upgrade to r-arrow=13 for this though.

@euklid321
Author

> @euklid321 Can you check whether conda-forge/arrow-cpp-feedstock#1211 helps with your problem? You would need to upgrade to r-arrow=13 for this though.

I don't get the error any longer with the latest libarrow=13.0.0 build. 🎉
