BUG: segmentation fault with pd.NA and np.ndarray.__contains__ #31922

Closed
jorisvandenbossche opened this issue Feb 12, 2020 · 7 comments · Fixed by numpy/numpy#15553 or #36283
Labels
good first issue Needs Tests Unit test(s) needed to prevent regressions

@jorisvandenbossche
Member

jorisvandenbossche commented Feb 12, 2020

Testing for membership in an ndarray segfaults:

In [1]: pd.NA in np.array(["a"], dtype=object) 
Segmentation fault (core dumped)

Now, this might be something that needs to be fixed in numpy, as for plain lists it "works":

In [3]: pd.NA in ['a']  
...
TypeError: boolean value of NA is ambiguous

but numpy might not expect the comparison to raise an error instead of returning True/False.
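The list behaviour can be reproduced without pandas. The sketch below is only an illustration: `AmbiguousNA` and `list_contains` are hypothetical names, and the class is a stand-in that mimics two properties of `pd.NA` (equality returns the NA object itself, and its truth value raises `TypeError`). It is not how pandas implements `pd.NA`.

```python
class AmbiguousNA:
    """Hypothetical stand-in for pd.NA, for illustration only."""

    def __eq__(self, other):
        # Like pd.NA, comparing with anything yields NA, not a bool.
        return self

    def __bool__(self):
        # Like pd.NA, the truth value is undefined.
        raise TypeError("boolean value of NA is ambiguous")


def list_contains(container, item):
    """Evaluate `item in container`, reporting an error instead of raising."""
    try:
        return ("ok", item in container)
    except TypeError as exc:
        return ("error", str(exc))


NA = AmbiguousNA()

# 'a' == NA falls back to NA.__eq__, which returns NA; the membership test
# then needs bool(NA), and that raises.
result = list_contains(["a"], NA)
print(result)

# An identical object is found by identity, so no comparison is made.
print(list_contains([NA], NA))
```

The point of the sketch is that membership testing has to turn the result of `==` into a bool, and that conversion can fail. A container implementation has to propagate that failure rather than assume a True/False result.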

@jorisvandenbossche jorisvandenbossche added Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Feb 12, 2020
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Feb 12, 2020
@jorisvandenbossche
Member Author

cc @seberg

@seberg
Contributor

seberg commented Feb 12, 2020

Thanks, looks like a missing error check in ndarray.__contains__ (that method is broken for other reasons):

    any = PyArray_Any((PyArrayObject *)res, NPY_MAXDIMS, NULL);
    Py_DECREF(res);
    ret = PyObject_IsTrue(any);
    Py_DECREF(any);

The code must test for any == NULL before passing it to PyObject_IsTrue.

@seberg
Contributor

seberg commented Feb 13, 2020

I hope this is fixed in master now. The whole issue is a bit more complex, but I think it should be generally safe now. We do have more issues around casting and ufunc execution when such errors occur; I hope you just do not notice them in practice, because those would not be easy to fix.

@jorisvandenbossche jorisvandenbossche added the Needs Tests Unit test(s) needed to prevent regressions label Feb 13, 2020
@jorisvandenbossche
Member Author

Thanks for the quick follow-up!
We run tests with numpy master, so we can add a test that is only run on that build, to check it is actually fixed.

@jbrockmendel jbrockmendel added the Segfault Non-Recoverable Error label Feb 14, 2020
charris pushed a commit to charris/numpy that referenced this issue Mar 2, 2020
@mroeschke mroeschke added good first issue and removed Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays Segfault Non-Recoverable Error labels Apr 4, 2020
ylin00 added a commit to ylin00/pandas that referenced this issue Sep 11, 2020
@jreback jreback modified the milestones: Contributions Welcome, 1.2 Sep 11, 2020
@jreback
Contributor

jreback commented Sep 21, 2020

reopening, it appears that this should be skipped on older numpies.
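A version gate for such a test could look roughly like the sketch below. Everything named here is an assumption made for illustration: `ASSUMED_FIRST_FIXED`, `release_tuple` and `should_skip` are hypothetical, and the threshold is a placeholder, since the thread does not say which NumPy release first contains the fix.

```python
import re

# Placeholder only: the thread does not state the first fixed NumPy release.
ASSUMED_FIRST_FIXED = (1, 19, 0)


def release_tuple(version):
    """Return the leading numeric part of a version string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def should_skip(numpy_version, first_fixed=ASSUMED_FIRST_FIXED):
    """Return True when the installed NumPy predates the assumed fix."""
    return release_tuple(numpy_version) < first_fixed


print(should_skip("1.18.1"))
print(should_skip("1.19.0.dev0+1234"))
```

In a pytest suite the result could feed `pytest.mark.skipif(should_skip(np.__version__), reason="...")`, so the membership check never executes on builds where it would crash the interpreter. This comparison ignores pre-release suffixes, so a development build is treated as if it were the final release of the same number.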

@jreback jreback reopened this Sep 21, 2020
@jreback
Contributor

jreback commented Sep 21, 2020

@ylin00 would you like to revise the PR to do this?

kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020
@jreback
Contributor

jreback commented Nov 19, 2020

hmm this looks ok now. closing as fixed in #36283

@jreback jreback closed this as completed Nov 19, 2020