Skip to content

Commit

Permalink
apacheGH-34897: [R] Ensure that the RStringViewer helper class does n…
Browse files Browse the repository at this point in the history
…ot own any Array references (apache#35812)

This was identified and 99% debugged by @ lgautier on rpy2/rpy2-arrow#11 . Thank you!

I have no idea why this does anything; however, the `RStringViewer` class *was* holding on to an unnecessary Array reference and this seemed to fix the crash for me. Maybe a circular reference? The reprex I was using (provided by @ lgautier) was:

Install fresh deps:

```bash
pip3 install pandas pyarrow rpy2-arrow
R -e 'install.packages("arrow", repos = "https://cloud.r-project.org/")'
```

Run this python script:

```python
import pandas as pd
import pyarrow
from rpy2.robjects.packages import importr
import rpy2.robjects
import rpy2_arrow.arrow as pyra
base = importr('base')
nanoarrow = importr('nanoarrow')

code = """
    function(df) {
        # df$col1  # no segfault on exit
        # I(df$col1)  # no segfault on exit
        # df$col2  # no segfault on exit
        I(df$col2)  # segfault on exit
    }
"""
rfunction = rpy2.robjects.r(code)

pd_df = pd.DataFrame({
    "col1": range(10),
    "col2":["a" for num in range(10)]
})
pd_tbl = pyarrow.Table.from_pandas(pd_df)
r_tbl = pyra.pyarrow_table_to_r_table(pd_tbl)
r_df = base.as_data_frame(nanoarrow.as_nanoarrow_array_stream(r_tbl))

output = rfunction(r_df)
print(output)
```

Before this PR (installing R/arrow from main) I get:

```
(.venv) dewey@ Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
 [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"

zsh: segmentation fault  python reprex-arrow.py
```

After this PR I get:

```
(.venv) dewey@ Deweys-Mac-mini 2023-05-29_rpy % python reprex-arrow.py
 [1] "a" "a" "a" "a" "a" "a" "a" "a" "a" "a"
```

(with no segfault)

I wonder if this also will help with apache#35391 since it's also a segfault involving the Python <-> R bridge.
* Closes: apache#34897

Authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
  • Loading branch information
paleolimbot authored and thisisnic committed Jun 13, 2023
1 parent 5014172 commit 91b885e
Showing 1 changed file with 4 additions and 2 deletions.
6 changes: 4 additions & 2 deletions r/src/altrep.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -743,6 +743,8 @@ struct AltrepVectorString : public AltrepVectorBase<AltrepVectorString<Type>> {

// Helper class to convert to R strings. We declare one of these for the
// class to avoid having to stack-allocate one for every STRING_ELT call.
// This class does not own a reference to any arrays: it is the caller's
// responsibility to ensure the Array lifetime exeeds that of the viewer.
struct RStringViewer {
RStringViewer() : strip_out_nuls_(false), nul_was_stripped_(false) {}

Expand Down Expand Up @@ -821,11 +823,11 @@ struct AltrepVectorString : public AltrepVectorBase<AltrepVectorString<Type>> {
}

void SetArray(const std::shared_ptr<Array>& array) {
array_ = array;
array_ = array.get();
string_array_ = internal::checked_cast<const StringArrayType*>(array.get());
}

std::shared_ptr<Array> array_;
const Array* array_;
const StringArrayType* string_array_;
std::string stripped_string_;
bool strip_out_nuls_;
Expand Down

0 comments on commit 91b885e

Please sign in to comment.