
fix: include numpy._core imports for NumPy 2.0 #4857

Merged: 11 commits, Sep 27, 2023
28 changes: 22 additions & 6 deletions include/pybind11/numpy.h
@@ -120,6 +120,25 @@ inline numpy_internals &get_numpy_internals() {
return *ptr;
}

PYBIND11_NOINLINE module_ import_numpy_core_submodule(const char *submodule_name) {
    try {
        return module_::import((std::string("numpy._core.") + submodule_name).c_str());
Collaborator:

Could you please add a (terse) comment here explaining when this is expected to work, and similarly below?

Rough idea:

// numpy._core was established with numpy PR ...
// numpy.core was the import needed up until numpy release ...

Contributor Author:

Sure! I added a comment.

    } catch (error_already_set &ex) {
        if (!ex.matches(PyExc_ImportError)) {
            throw;
        }
        try {
            return module_::import((std::string("numpy.core.") + submodule_name).c_str());
        } catch (error_already_set &ex) {
            if (!ex.matches(PyExc_ImportError)) {
                throw;
            }
            throw import_error(std::string("pybind11 couldn't import ") + submodule_name
                               + " from numpy.");
        }
    }
}
Collaborator:

What you have here looks acceptable, but I wonder if we could make this more robust pretty easily, along the lines of:

  • import numpy
  • Inspect numpy.__version__ or similar to decide if core or _core is needed, and try only that one import. (I'm hoping this might be more feasible than usual because you own the numpy API.)

This is less likely to be surprising if there is some low-level issue in pybind11 or Python itself (refactoring, new major dev version), or numpy is changed further in the future. The error message then could be the original import error, re-raised using raise_from (very easy):

inline void raise_from(error_already_set &err, PyObject *type, const char *message) {
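A minimal Python sketch of the suggested "inspect the version, then try only one import" approach (helper names here are illustrative, not from the PR; the actual fix would be C++):

```python
import importlib


def numpy_core_prefix(version_string):
    """Pick the submodule prefix from a numpy version string.

    numpy._core replaced numpy.core as of NumPy 2.0, so the major
    version alone decides which import path should exist.
    """
    major = int(version_string.partition(".")[0])
    return "numpy._core." if major >= 2 else "numpy.core."


def import_numpy_core_submodule(submodule_name):
    # Only one import is attempted; an ImportError propagates unchanged,
    # which matches the idea of reporting the original error via raise_from.
    import numpy
    return importlib.import_module(
        numpy_core_prefix(numpy.__version__) + submodule_name)
```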

Contributor Author (@mtsokol, Sep 26, 2023):

So I made similar changes for JAX, and originally I also went with a numpy version check in an if statement followed by a single import. But I was advised that the try ... catch approach is more robust. Here's an explanation from Jake: jax-ml/jax#16972 (review)

Version checks are fine for tests, but I'd prefer to avoid them in package import paths – e.g. imagine a user who has some custom numpy build with a version string that we don't anticipate, and then import jax might fail on trying to cast the version to a tuple of ints.

But I'm happy to switch to a version check if you prefer it!
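For reference, the two-try fallback under discussion looks like this as a Python sketch (the real change is the C++ import_numpy_core_submodule in the diff above; this is only an analogue):

```python
import importlib


def import_numpy_core_submodule(submodule_name):
    # Try the NumPy 2.x location first, then fall back to the pre-2.0 one.
    try:
        return importlib.import_module("numpy._core." + submodule_name)
    except ImportError:
        try:
            return importlib.import_module("numpy.core." + submodule_name)
        except ImportError as err:
            # Chain the original error, similar to raise_from on the C++ side.
            raise ImportError(
                "couldn't import " + submodule_name + " from numpy") from err
```

The trade-off being debated: this version never inspects a version string (so an unusual custom build can't break the parse), at the cost of swallowing the first ImportError.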

Collaborator:

We're dealing with 1 try vs. 2 tries here; that's similar but not the same.

version check

I'm thinking of it more as a "capability check".

(Version numbers are a convenient way to answer "what exactly do I have?"; that's a different purpose.)

I'm still hoping that we can rise to a level of organization that allows us to have only one import and report the original error: does (or can) numpy provide a simple API that lets us try only once?

Contributor Author (@mtsokol, Sep 26, 2023):

Sure! I switched to an if statement that selects the package path based on the numpy version - now (_)core.multiarray or (_)core._internal is imported only once.
(It might look lengthy, but this is the recommended way to get the major version as an integer: https://numpy.org/doc/stable/release/1.9.0-notes.html#numpyversion-class-added)

Collaborator:

(I think I somehow lost a comment here that I thought I typed in already. Sorry if something appears twice.)

std::string numpy_core_path = major_version >= 2 ? "numpy._core" : "numpy.core";

What you have looks fine though. If you prefer, the only thing I'd change is to remove the explicit std::string() inside the two if branches; i.e., I think simply numpy_core_path = "numpy._core"; will work and is what people usually do.

Collaborator:

I can’t really check (I'm at a conference), but what are we using the core module for? If it’s private, that means NumPy is free to break us at any time. It would be very nice to start moving to whatever is supported instead, longer term.

Collaborator (@henryiii, Sep 27, 2023):

Is it only our tests? Ah, no: multiarray. Is that available directly? It doesn’t seem to be. Is it supposed to stick around?

Collaborator (@EthanSteinberg, Sep 27, 2023):

@henryiii
We are using the core module to obtain access to NumPy's private C API.

Yes, NumPy can break us at any time, since we are relying on a private API. @rwgk and I discuss a long-term solution in the main PR thread: moving to NumPy's public Python API instead of its private C API.
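Concretely, the "private C API" here is the _ARRAY_API PyCapsule that the diff's lookup() fetches from the multiarray submodule. A small Python-side peek at the same object (an illustrative helper, not pybind11 code; it returns None when numpy or the attribute is unavailable):

```python
def describe_array_api():
    """Return the type name of numpy's _ARRAY_API capsule, or None if
    numpy (or the private attribute) isn't available."""
    try:
        # On NumPy >= 2.0 this lives at numpy._core.multiarray; the
        # numpy.core name is a deprecated alias that still forwards.
        import numpy.core.multiarray as ma
        return type(ma._ARRAY_API).__name__  # "PyCapsule" when present
    except (ImportError, AttributeError):
        return None
```

The capsule wraps a table of C function pointers, which is exactly why any rename on NumPy's side (core to _core) breaks consumers like pybind11.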

Collaborator:

(NumPy has a public C API as well, but it's difficult for us to use because it would complicate our build process quite a bit.)


template <typename T>
struct same_size {
template <typename U>
@@ -263,7 +282,7 @@ struct npy_api {
};

static npy_api lookup() {
module_ m = module_::import("numpy.core.multiarray");
module_ m = detail::import_numpy_core_submodule("multiarray");
auto c = m.attr("_ARRAY_API");
void **api_ptr = (void **) PyCapsule_GetPointer(c.ptr(), nullptr);
Collaborator:

It's an existing problem, but could you please add this fix or similar while you're at it (untested)?

if (api_ptr == nullptr) {
    raise_from(PyExc_SystemError, "FAILURE obtaining numpy _ARRAY_API pointer.");
}

Contributor Author:

Sure, I added this if statement - looks good to me.

npy_api api;
@@ -626,11 +645,8 @@ class dtype : public object {

private:
static object _dtype_from_pep3118() {
static PyObject *obj = module_::import("numpy.core._internal")
.attr("_dtype_from_pep3118")
.cast<object>()
.release()
.ptr();
module_ m = detail::import_numpy_core_submodule("_internal");
Collaborator:

@rwgk I would bet this change right here is causing the deadlocking. Look at how the old code did it all in one static initializer, so the import happened only once (even if this function was called many times).

The new code imports _internal every time this function is called.

Collaborator:

That was my first suspicion, too, but that change by itself did not help.

Then I decided to roll back and debug after production is in a healthy state again.

Sorry I should have mentioned this before. (I'm currently battling multiple fires.)

Almost certainly we should change the code here back, but I want to get to the bottom of the timeouts first. (I only have an internal reproducer at the moment. It involves running a test 100 times; ~1/3 of the runs time out.)

static PyObject *obj = m.attr("_dtype_from_pep3118").cast<object>().release().ptr();
return reinterpret_borrow<object>(obj);
}
