[R] Segfault on flight test #35391
Comments
The flight tests are also segfaulting in Python (on OSX), see here

ICYMI @lidavidm @jorisvandenbossche

Other potentially related issue:

That sounds like the same thing; I didn't realize it had been happening for so long. It does seem more frequent now, though, or has re-emerged after being fixed.
paleolimbot added a commit that referenced this issue on May 30, 2023
… any Array references (#35812)

This was identified and 99% debugged by @lgautier on rpy2/rpy2-arrow#11. Thank you!

I have no idea why this does anything; however, the `RStringViewer` class *was* holding on to an unnecessary Array reference and this seemed to fix the crash for me. Maybe a circular reference?

The reprex I was using (provided by @lgautier) was:

Install fresh deps:

```bash
pip3 install pandas pyarrow rpy2-arrow
R -e 'install.packages("arrow", repos = "https://cloud.r-project.org/")'
```

Run this python script:

```python
import pandas as pd
import pyarrow
from rpy2.robjects.packages import importr
import rpy2.robjects
import rpy2_arrow.arrow as pyra

base = importr('base')
nanoarrow = importr('nanoarrow')

code = """
function(df) {
  # df$col1  # no segfault on exit
  # I(df$col1)  # no segfault on exit
  # df$col2  # no segfault on exit
  I(df$col2)  # segfault on exit
}
"""
rfunction = rpy2.robjects.r(code)

pd_df = pd.DataFrame({
    "col1": range(10),
    "col2": ["a" for num in range(10)]
})
pd_tbl = pyarrow.Table.from_pandas(pd_df)
r_tbl = pyra.pyarrow_table_to_r_table(pd_tbl)
r_df = base.as_data_frame(nanoarrow.as_nanoarrow_array_stream(r_tbl))
output = rfunction(r_df)
print(output)
```

Before this PR (installing R/arrow from main) I get:

```
(.venv) dewey@Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
[1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
zsh: segmentation fault  python reprex-arrow.py
```

After this PR I get:

```
(.venv) dewey@Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
[1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
```

(with no segfault)

I wonder if this also will help with #35391 since it's also a segfault involving the Python <-> R bridge.

* Closes: #34897

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
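Because the crash in the reprex above happens at interpreter exit, after the output has already been printed, it is easy to miss in an automated run. The following is a minimal sketch (not part of the commit) of one way to detect it from a wrapper script. It assumes a POSIX system, where `subprocess` reports a child killed by signal N as return code -N; the helper name `run_and_check_segfault` is made up for illustration.

```python
import signal
import subprocess
import sys


def run_and_check_segfault(argv):
    """Run a command and report whether it was killed by SIGSEGV.

    Returns a (crashed, stdout) tuple. Hypothetical helper; POSIX only.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    # On POSIX, a negative return code -N means the child died from signal N.
    crashed = proc.returncode == -signal.SIGSEGV
    return crashed, proc.stdout
```

For the reprex, this could be called as `run_and_check_segfault([sys.executable, "reprex-arrow.py"])`: if the "before" behaviour described above is reproduced, `crashed` would be true even though the printed output looks normal.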
thisisnic pushed a commit to thisisnic/arrow that referenced this issue on Jun 6, 2023
…ot own any Array references (apache#35812)
thisisnic pushed a commit to thisisnic/arrow that referenced this issue on Jun 13, 2023
…ot own any Array references (apache#35812)
I believe this has been fixed by #35812, but feel free to reopen if not.
Describe the bug, including details regarding any error messages, version, and platform.
This is happening pretty regularly on `main` now in the ubuntu "force tests" job. As a result, I haven't seen #35238 in a while because this segfault happens before we get to that point :/

cc @paleolimbot
Component(s)
R