-
Notifications
You must be signed in to change notification settings - Fork 758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for nogil Python #2885
Conversation
Windows does not have the SOABI sysconfig variable set.
To be honest, I am highly sceptical as if I understand this correctly and
We currently rely on the GIL protecting our data structures and global state in various places and protecting them separately would add overhead to the common usage with CPython or a much higher maintenance burden to abstract away the GIL-versus-separate-locks problem. (It would also very much complicate reasoning as we would need to assume separate locks when evaluating safety.) I think following our usual standards regarding soundness we could not release a version of the PyO3 crate based on the changes presented here. |
Agree with both of you. I'm definitely enthusiastic to support PEP 703, as I think that it would be a massive long-term win for Rust/Python interop if the GIL were removed. That said, @adamreichold is absolutely right that this is a long way from mergeable / releasable as-is. At a minimum, this is going to need additional CI jobs for the nogil fork on Windows, Mac and Linux. The soundness issues are also a major concern. One which springs to mind now is With passing CI jobs and a reasonable effort to identify what would be unsound and how it's mitigated, I can see us merging this. Certainly in the long term this work would be necessary to support PEP 703, and if supporting nogil helps push the PEP, as long as the maintenance burden here isn't enormous I'd be ok with merging nogil support too.
As a non-optimal short-term fallback we could always have a static global mutex which all PyO3 APIs could use internally 😋 (only half serious, if there are better options we should take them). |
#[inline] | ||
#[cfg(not(Py_NOGIL))] | ||
pub unsafe fn PyList_FetchItem(list: *mut PyObject, index: Py_ssize_t) -> *mut PyObject { | ||
_Py_XNewRef(PyList_GetItem(list, index)) | ||
} | ||
|
||
#[inline] | ||
#[cfg(Py_NOGIL)] | ||
pub unsafe fn PyList_FetchItem(list: *mut PyObject, index: Py_ssize_t) -> *mut PyObject { | ||
_PyList_FetchItem(list, index) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH, adding these provisional APIs and changing PyO3 to use them is a 👎 from me, the callsites which you modified should already be converting borrowed references to owned, and if they didn't, that's a separate bug.
EDIT: I reread the PEP and see now that it's necessary for thread-safety that the borrowed-to-owned conversion is done within the interpreter rather than PyO3. So I guess this would indeed be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea behind this proposed change is that, when running without the GIL, the retrieval from list and the acquisition of an owned reference needs to be performed atomically in relation to any concurrent modifications. The PEP 703 proposes PyList_FetchItem
and PyDict_GetItem
which have this behavior -- they return owned references and are safe in the face of concurrent modification.
I am not sure about the So for downstream projects to remain sound, we might end up needing to do our own locking to give the
which might end up defeating the purpose supporting nogil in the first place. (My scepticism is not so much about supporting nogil, but rather about doing it with a simple compile time switch to change things behind the scenes while keeping an API which is tailored towards the GIL-based CPython world.) |
Can you point me to a few instances of using the GIL to protect other data structures, such as in rust-numpy? @davidhewitt mentioned GILOnceCell. The conversion of borrowed references in dict.rs and list.rs is another example, but that seems easy to address. |
This is a very good point, and one that I hadn't considered the finer details of. I think in an ideal world this means we would want some PyO3 APIs to have external differences when running on nogil so that unsound uses would not compile. More consideration needed. At its most crude, a |
Thought - |
Examples are protecting cached capsule pointers in https://github.com/PyO3/rust-numpy/blob/main/src/npyffi/array.rs#L60 and https://github.com/PyO3/rust-numpy/blob/main/src/npyffi/ufunc.rs#L38 as well as access to the global state backing the dynamic borrow checking in https://github.com/PyO3/rust-numpy/blob/main/src/borrow/shared.rs#L107 But as @davidhewitt discussed above, this is not so much about changing the various usage sites to use an independent synchronization scheme, but rather about PyO3 providing a sound API contract for the So either have to remove the As for a contract that would work in both cases, I can only imagine that |
Thinking on this more overnight, I think the assessment above is correct - we'd need the A I think PyO3's "borrowed references" For the sake of supporting experimentation, how about the following:
I'd hope that by providing a way for experimentation in nogil that it encourages downstream packages to at least consider what they would look like in a nogil context, even if they have to declare that they don't support it for now. |
Another obvious soundness hole is our This also raises an argument for making progress on #1979 and move |
I think this case could be handled using a single On the other hand, we do not need full locks here as we do not need to provide blocking behaviour at all. Just as we fail now if the cell is already borrowed, we could just check an atomic usage counter and fail if it has the wrong value without risk of deadlocks. (The ECS hecs for example uses atomic borrow counts to allow multi-threaded usage without imposting locking costs onto all users. One just has to structure the systems accessing the data to not overlap by design but there is safety hazard besides a |
I'm generally not a fan of adding more conditional compilation, but one way to fix that is to just hide it behind
That's the direction I'd take - rather than implement something ourselves that might be rather implicit and error prone, users should implement their own interior mutability if they need it. |
As interacting with Python mandates shared ownership, I don't think it is reasonable to push this out to our users completely. We should at least provide a tool box of well tested generally used solutions. If some code needs a special mechanism, they can use a frozen pyclass and add their own layer of interior mutability as you suggest. But I still think we need to provide something that works out of the box. |
I considered this and am split. It would avoid deadlocks but I think with high concurrency it would be easy for users to hit racy errors and end up wanting to have some level of blocking to wait until the object can be written to.
I agree that |
I am not trying to argue for not changing the default. I am arguing for shipping and documenting something like
If |
#[inline] | ||
#[cfg(any(py_sys_config = "Py_REF_DEBUG", Py_NOGIL))] | ||
pub unsafe fn Py_INCREF(op: *mut PyObject) { | ||
Py_IncRef(op) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colesbury - I understood that for the CPython stable API there was a discussion about making refcounting details internal to CPython but it was deemed infeasible due to performance regression.
Here this patch seems to do exactly that. Are you able to comment on the estimated performance impact for extensions by changing refcounting to be through an FFI call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have heard something similar, but I can't find the original source and don't know for which extensions it was deemed too great a cost.
There didn't seem to be an noticeable performance impact for the cryptography extension, but I don't know enough to make any general estimates.
Can someone (like @adamreichold or @davidhewitt) explain how
|
I'd say a good summary would be that This is complicated somewhat by the borrow checking being pluggable to avoid its overhead for immutable types shared the Python heap. You can find the various implementations including code that looks exactly like |
Maybe this saying it better: Since in Python, basically all objects are basically always shared, we can only produce shared references and if your Rust type is fine with that (because it is immutable or already uses interior mutability), then you do not need the borrow checking. ( |
The basic principle is that Rust's references, by definition, for a given piece of data allow for at most one of the following to exist at a time:
(To violate this rule is UB.) The purpose of Rust's
The |
I would like to amend this: How this internal state is modified is exactly the same as for |
Thank you both for the explanations. If I understand correctly:
@davidhewitt mentioned the possibility of using Is my understanding correct? |
I'm re-reading the thread now that I better understand PyCell and see that the above is among the things you are considering. |
I think there is one issue here, mainly
because just producing a The above is also independent of So I think that before discussing how to enable interior mutability in a nogil world - "the To enable us to produce a shared reference into the Python heap, I think we need to ensure that the current thread owns at least one globally visible reference to the object in question. (I suspect this usually fulfilled automatically as it is currently hard to pass a Python object to Rust code without elevating its reference count, but I guess the nogil fork might change the globally visible part of the above requirement.) Furthermore, we probably need to enforce a Alternatively, if we do not want enforce These options are also not really exclusive: We could have a bit in the object header indicating thread safety and enforce the locking/pinning procedure only for types which do not have the sync bit set. In this approach, one might even get away with a global lock under the assumption that it is only a fallback and all performance-critical Python objects will enable the sync bit eventually. |
If you want to see the fallout of this, you could try making some more changes: You need change this PyClassPyO3Options's pyo3/pyo3-macros-backend/src/pyclass.rs Lines 44 to 46 in cb38ff0
and put a Sync bound on Line 849 in cb38ff0
Note that this isn't enough to ensure soundness, but you should be able to see how "reasonable" this would be to users. |
I suspect that would involve reference wrappers like
I think there are two directions in which a One is for implementing The other is for controlling what objects on the Python heap are accessed by shared references, i.e. basically whether As written above, personally I do not think so without reifying that property in the Python object header, i.e. having objects opt into being sync. Otherwise I fear for I also think this ties in directly with questions like
because I don't think this can answered by just considering the Of course, for me the converse position also holds, i.e. without something like a sync bit in the object header but using a global lock for PyO3, I would not produce shared references to anything stored in the Python heap but the Footnotes
|
Maybe just for the record: I would support adding a |
@adamreichold wrote:
I'm not sure I understand your position. PyO3 already makes thread safety assumptions about For example:
|
From a Rust PoV, these would be considered memory safety bugs, not something that we should design our API around.
Really? What's the mechanism for crashing here, since |
Can you clarify? Pyo3 does not let users do this. |
Indeed, not everything fully safe yet but that does not appear a good argument for making things worse to me8 and I agree with @birkenfeld's
Or if Python had something like For example, CPython has made significant progress in that direction in the past, e.g. when GIL is released, then
But indeed especially NumPy is still a big problem as it releases the GIL when operating on primitives without any other synchronization so our own dynamic borrow checking currently considers even pure Python code operating on NumPy arrays unsafe/trusted as it is able to produce data races without using any native code (besides NumPy itself). But as @birkenfeld said this is not something we should aim for when designing future contracts and interfaces. Ideally, we will get a cross-language protocol for borrowing the interior of large buffers like NumPy arrays and things like I also do understand that the nogil fork is mainly about being able to use all that existing Python software in a more scalable manner and I am not opposed to that. But I would prefer if we do not do it blindly and try to include more safety features into the approach so that e.g. the scientists using our code do not need to become experts on numerical stability or memory orderings to determine if their usage is correct. Someone said that Rust feels like doing parkour while hanging on strings and wearing protective gear and I think that this is indeed the goal we should strive for when designing our programming interfaces. |
The point of these examples is in response to:
Basically, I think I think it's good and practical to put extra constraints on projects built with PyO3 (like the @birkenfeld wrote:
I agree that PyO3 should not design its API around this. You can consider these "bugs" in the projects, but that's mostly a label of convenience. (OTOH, the
Nearly all the GIL thread-safety issues have the same form: you assume so invariant which is broken because some Python API call releases the GIL. The list issues are often of the form:
The general problem is that a huge number of Python API calls potentially releases the GIL. The most obvious causes are Py_DECREF, because destructors can call arbitrary code, and allocations because they may trigger a GC, which can call arbitrary code. These issues are generally both re-entrancy and thread-safety hazards. This is essentially the same issue that can lead to GILOnceCell initialization being called multiple times, although in that case its documented behavior, while in the list implementation is clearly a bug. The use of a single bytecode is a red herring. The implementations for most bytecodes are complex enough that they may end up releasing the GIL at some intermediate point. @mejrs wrote:
Here is the example I linked to in a previous comment. The Python environment is inherently shared so you can get the same https://gist.github.com/colesbury/b01e645546114977bff8d7babbb05f29 @adamreichold wrote:
I know
I generally agree, but I don't think this is a blind approach any more than todays approach is blind. |
Of course, you're right. It's a can of worms (or maybe, little snakelets). |
Maybe to clarify my intent: I am not arguing for reifying thread safety and the whole ecosystem to bend over backwards to protect the purity of our I am just not sure typical usage of the CPython API from C has enough control over references into the Python heap to enforce this, or rather to exploit it if all involved objects are indeed thread safe. I am also not sure if a flag in the object header is a sensible mechanism to achieve this. I am convinced that native code should explicitly opt into the increase in parallelism though.
As written above this is really a sore spot and the reason even pure Python code using NumPy is considered unsafe/trusted. It is also a reason why we put our borrow checking into a C API compatible capsule in the hope that it might see more widespread use outside of rust-numpy. But again, I don't think the main problem is rust-numpy's safe API being unsound strictly speaking but rather that plain Python code can produce data races without any safeguards in the first place and do so as easily as performing numerical operations like += on NumPy arrays. ... Switching gears, I suspect we should try to refocus the discussion away from the sync bit thing. We explained our positions and I think it is alright if we do not agree for now. Especially since we have already identified multiple things in PyO3 that we do need to work on:
|
This is totally fine though, this &PyAny cannot be used while the gil is not held. |
@adamreichold wrote:
Great -- I'll try to prototype these things, but probably won't get to it immediately. (Of course, if anyone else is interested in working on those things and has the time, that would be great too.) |
I would advise waiting for a commitment from @davidhewitt before investing significant amounts of your time into these things though for he is the final authority on the direction of PyO3's development and I am not sure if he has had time to read through and comment on this thread yet. |
I would also recommend you wait for davidhewitts thoughts before putting in a lot of effort, but I'd like to think we all share authority regarding direction here. |
Sorry for some delay from me - I've been reading the discussion and thinking a bit about this but have been AWOL for a couple days with a sick family. Better again for now. The discussion above sounds good to me. To re-spell a couple of the above in my own thoughts:
|
It would be nice to have something like crater for this as I would expect the breakage to be manageable if we would couple this with making Put differently, if we do go for a split of Of course, there is the argument (which is a big part of the PEP as well), that forcing everything to be thread-safe is imposing that cost on applications which are not interested in multi-threading. Personally, I have to say though that if that level of efficiency is relevant, I would probably recommend moving away from an interpreted language in any case. |
A crater-like thing would be cool though I fear it'd be a lot of work to set up? I wonder if there's a way we can set up a deprecation warning to nudge users a release or two prior. (Maybe using macro hackery deep in the guts of |
Hello @colesbury Thank you for contributing to PyO3! The project is undergoing changes on its licensing model to better align with the rest of the Rust crates ecosystem. Before your work gets merged, please express your consent by leaving a comment on this pull request. Thank you! |
Very excited to see discussion of adding
It's slightly off topic for this thread, but we have previously observed a case like this when using the |
@colesbury I just got this going in https://github.com/pantsbuild/pants which is a heavy user of Python+PyO3 in a multi-threaded application. We use Rust's I have a branch able to run it with an updated version of your fork of PyO3 (for some recent 0.18 fixes here) and the So far I've seen the occasional seg fault (first one I snipped below) and at least one "Already mutably borrowed". Oh and an ocean of Let me know if you want the cores or anything else that be be of assistance. |
Hi @thejcannon and @stuhood - thanks for taking a look at this. I've paused work on this PR until a decision is made on the PEP 703. I plan to submit it to the steering council soon, and will come back to it once there's a decision. |
Oh exciting! Absolute best of luck. Hopefully they say yes and we can reap the rewards 😈 |
Sounds positive for the future of nogil, how exciting! We'd better start figuring out the details here |
Closing this in favour of #4265 and all the work we're putting in to 0.23! |
Per the discussion in Discourse, these are the changes to PyO3 to work with the nogil proof-of-concept implementation. I've rebased these changes from 0.15.2 and done some light testing locally.
The primary purpose of this PR is to discuss the changes PyO3 would need to support PEP 703. Those changes would be slightly different because the PEP proposes an ABI flag like "cp312n" whereas the "nogil" fork currently looks something like "nogil39". Additionally, the reference count fields are slightly different sizes in the PEP vs. in the "nogil" fork (basically
Py_ssize_t
in the PEP andu32
in the fork). Otherwise, I think the changes would be similar.Additionally, if you're interested, it would be great for PyO3 to support the "nogil" fork (i.e., to get this PR in a state where it can be merged). That would make it easier to support projects that depend on PyO3 in "nogil" Python (and may help the PEP too). We've done something similar in [0.29] Add configuration for nogil Python (gh-4912) cython/cython#4914. (Of course, if it ever feels like too much of a maintenance burden in the future, you should feel free to drop it.)