Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider making it possible to borrow PyRef in #[pyclass] #1089

Open
davidhewitt opened this issue Aug 7, 2020 · 20 comments
Open

Consider making it possible to borrow PyRef in #[pyclass] #1089

davidhewitt opened this issue Aug 7, 2020 · 20 comments

Comments

@davidhewitt
Copy link
Member

At the moment, as #1088 notes (and also #1085), it's difficult to expose things like iterators to Python because of the restriction that #[pyclass] cannot take a lifetime.

This restriction obviously makes sense because Rust-managed data cannot be borrowed and then exposed to Python; Python will allow this data to live arbitrarily long.

However, data that's already owned by Python, e.g. in a PyCell<T>, can be safely "borrowed" and stored in another #[pyclass]. We do this already, but only for the lifetime-free case Py<T>.

It would be incredibly powerful if PyRef<'a, T> (or some similar construct) was able to be stored inside another #[pyclass]. This would make make wrapping arbitrary iterators, for example, much easier.

This playground shows that with some carefully-placed transmutes it's possible to hack the type system to at least achieve the desired affect: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=041f9a6f5dfa3a5fc59ece7105cc4dfd

That code is a wildly unsafe sketch that doesn't actually use the PyO3 types, so please don't try to use it as-is. You're responsible for all segfaults if you do 😄. If you're finding yourself in need of this feature, I can offer mentoring to help design how PyO3 could make such a trick safe and ergonomic.

@messense
Copy link
Member

messense commented Mar 9, 2021

Hitting #1088 when I'm trying to expose crfs::Model which has a lifetime parameter to Python. It would be great if this code works when data is a Python managed buffer.

#[pyclass]
struct Model<'a> {
    data: &'a [u8],  // or maybe &'a PyBytes
    model: crfs::Model<'a>,
}

@davidhewitt
Copy link
Member Author

Last time I looked at this I concluded that having the lifetime parameter on the #[pyclass] is impossible, however something along this kind of line could work:

#[pyclass]
struct PyModel {
    data: PyBytes
    accessor: Box<dyn for<'py> Fn(Python<'py>) -> crfs::Model<'py>>
}

Rather than Box<dyn Fn> it would be nice to have a trait, but I think it might require generic associated types.

I haven't thought too hard in this space. I'm sure that something can be possible but it just needs research!

@messense
Copy link
Member

That looks a lot like what ouroboros does for self-referential struct.

@kngwyu
Copy link
Member

kngwyu commented Mar 10, 2021

Looks interesting 👀
Maybe we can extend #[new] to be enabled to take Python<'py>.

@messense
Copy link
Member

messense commented Mar 10, 2021

I was able to get something working with ouroboros using self-referential struct to remove the lifetime parameter: messense/crfs-rs#3

@davidhewitt
Copy link
Member Author

davidhewitt commented Mar 13, 2021

That's really cool! I think we might be able to write an iterator example using that 🚀 (#1085)

@alex
Copy link
Contributor

alex commented Jun 26, 2021

So, the natural extension of this would be if, instead of copying to a Vec<u8>, we could simply do a borrow from a Py<PyBytes>. I believe this should be safe for the same reasons.

@davidhewitt
Copy link
Member Author

The hard bit is how to correctly constrain this to python-owned data. For example, &[u8] isn't enough; if we enabled something like the below I can easily write unsound code:

#[pyclass]
struct WithBorrowedData<'py> {
    py: Python<'py>,
    data: &'py [u8],
}

// trivial to create a function which invalidates lifetimes
#[pyfunction]
fn use_after_free(py: Python) -> PyResult<PyObject> {
    let data = vec![];
    Py::new(WithBorrowedData { py, data: data.as_slice() }.into()
}

// The return value of use_after_free contains an invalid data reference

I've been thinking that PyRef (or something similar) would need to be used as a wrapper. Maybe a PyRef::map function would be useful.

Tbh this potentially also ties in with #1308 and how we represent owned / borrowed python data in general.

@alex
Copy link
Contributor

alex commented Jun 27, 2021

Yes, I think the desired behavior is something like:

#[ouroboros::self_referencing]
struct Foo {
    owner: Py<PyBytes>,
    #[borrows(owner)]
    slice: &'this [u8]
}

#[pyfunction]
fn f(data: Py<PyBytes>) {
    Foo::new(data, |data| data.as_bytes())
}

Effectively the same pattern as https://github.com/pyca/cryptography/blob/main/src/rust/src/ocsp.rs#L32-L42, except without copying to a Vec.

This should be safe because if you have a Py<PyBytes> you own a reference to it, and PyBytes are immutable, so it will never be freed behind your back.

@davidhewitt
Copy link
Member Author

If I understand correctly, reason that the above doesn't currently work with #[ourobouros::self_referencing] is because the slice access closure doesn't have access to a Python GIL token? It seems like that should be solvable by providing an equivalent of ourobouros::self_referencing inside PyO3, but that is probably a lot of work...

@alex
Copy link
Contributor

alex commented Jun 27, 2021 via email

@jovenlin0527
Copy link
Contributor

This is my attempt at allowing a PyClass to hold a reference to another PyClass.
I think it is safe, but not sure if it has other issues.

https://gist.github.com/b05902132/de18debe9039473aa9b0bed6b48436c8

As an example, I wrote a simple class backed by Vec and its iterator. Hope this helps.

@davidhewitt
Copy link
Member Author

davidhewitt commented Nov 17, 2021

👍 thanks, that's very interesting and roughly along the lines of what I was thinking. It would be nice to be able to handle that mapped value as a real Python object too. That'll provide a way for us to eventually avoid cloning in #1358

(This will probably require lots of changes to the internal of PyCell.)

@messense
Copy link
Member

Hitting this again in https://github.com/messense/tree-sitter-py/blob/05d5adde312afd22c8ac9f59f07731e17cf9a71a/src/lib.rs#L116-L122 but this time ouroboros can't help. The error message when uncommented is

error[E0495]: cannot infer an appropriate lifetime for autoref due to conflicting requirements
   --> src/lib.rs:118:18
    |
118 |             self.borrow_cursor().node()
    |                  ^^^^^^^^^^^^^
    |
note: first, the lifetime cannot outlive the anonymous lifetime defined on the method body at 116:13...
   --> src/lib.rs:116:13
    |
116 |     fn node(&self) -> PyNode {
    |             ^^^^^
note: ...so that reference does not outlive borrowed content
   --> src/lib.rs:118:13
    |
118 |             self.borrow_cursor().node()
    |             ^^^^
note: but, the lifetime must be valid for the anonymous lifetime #1 defined on the body at 117:49...
   --> src/lib.rs:117:49
    |
117 |           PyNode::new(self.borrow_tree().clone(), |tree| {
    |  _________________________________________________^
118 | |             self.borrow_cursor().node()
119 | |         })
    | |_________^
note: ...so that the expression is assignable
   --> src/lib.rs:118:13
    |
118 |             self.borrow_cursor().node()
    |             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    = note: expected `Node<'_>`
               found `Node<'_>`

Basically tree-sitter has aTree struct that can produce a Node<'_> and a TreeCursor<'_>, then TreeCursor<'_> can also produce a Node<'_>, it use PhantomData to enforce lifetime variance.

@davidhewitt
Copy link
Member Author

Note to self: when looking at the internals of #[pyclass] to make this possible, also think about use of Box / Arc and how is might impact adding conversion implementations for them, xref #3014 #3729

@Alphare
Copy link

Alphare commented Feb 12, 2024

FYI other Mercurial developers and I added this feature (or at least something similar) in rust-cpython back in 2019: http://dgrunwald.github.io/rust-cpython/doc/cpython/struct.PySharedRefCell.html

I don't know if something similar can be achieved, but I figured you'd like to know. I would certainly need this feature to actually use PyO3 for non-trivial classes within the context of Mercurial.

@alex
Copy link
Contributor

alex commented Feb 12, 2024

Fwiw the thing I described above, borrowing from a PyByted works, and is what cryptography uses these Days

@adamreichold
Copy link
Member

From the linked docs, it seems that using this requires unsafe code? If so, one can already use unsafe code to replace lifetimes by 'static. I think this issue is about providing a safe interface to achieve this.

@alex
Copy link
Contributor

alex commented Feb 12, 2024 via email

@adamreichold
Copy link
Member

Sorry, this was us racing GitHub comments. I was referring to PySharedRefCell apparently requiring unsafe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants