Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

experimental: owned objects API #1300

Closed
wants to merge 7 commits into from
Closed

Conversation

davidhewitt
Copy link
Member

This is an experimental extension to the pyo3 API to provide a new set of types which do not use pyo3's internal objects store.

There's two objectives from this new API:

Design

The goal with this PR is to keep the existing pyo3 API 100% backwards-compatible. There should be no changes needed to existing code if we merge this API. This allows it to be opt-in for now while we collect feedback from the new design, and make changes if needed.

PyO3 can make use of the new API internally to benefit from speedups where possible.

Because this is 100% backwards-compatible, and it's also a very big amount of changes, I would like to merge this API and start getting feedback. The promise to users is that we can change (or remove) the entire pyo3::experimental module at any time until we're happy with it.

When we are ready to make this new API official, the next steps would be to deprecate owned references and finally remove the pyo3 object store completely.

New APIs

In the pyo3::experimental module, I've added some new APIs, explained below. I've also made a pyo3::experimental::prelude module. Anyone who wants to test this API can swap this for the original prelude (along with the other new imports).

I welcome feedback and alternative designs for all of these. I think it's important to solve this problem of pyo3's memory management. I don't claim to have the answer. I think what I have here is a pretty good start.

New PyOwned<'py, T> type

To be able to return owned pointers from the new objects API there is a new PyOwned<'py, T> type. It's basically Py<T> with a Python<'py>, so it's bound to the GIL.

New objects::PyAny<'py> etc.

These "objects" are very similar to the existing types::PyAny (etc.). They are also used as references. This time they are always borrowed (because PyOwned is the owned reference).

An owned object PyOwned<'py, Any> will deref to &'a PyAny<'py> :

  • The 'a lifetime is Rust's usual lifetime for borrowed data. This is essential for tracking that the borrowed reference is only used while the owned object is kept alive.
  • The 'py lifetime is the GIL lifetime. It is different to 'a because the GIL lifetime is the same all the way through functions. If it is mixed in with 'a (like in the types API) it becomes shorter and shorter each time a PyOwned is borrowed, which is very annoying.

The motivation with these borrowed objects is exactly the same for the same reason we changed &'a PyAny to have the same memory representation as *mut ffi::PyObject. It's the fastest and most direct representation we can have for Python objects.

It also means that migrating to the new API is easier. Where we currently have

use pyo3::types::PyAny;
fn foo(&PyAny) { }

switching to the new API compiles immediately for this:

use pyo3::experimental::objects::PyAny;
fn foo(&PyAny) { }

For functions with return values, there is a little more work:

// Before 
use pyo3::{Python, types::PyAny};
fn foo(py: Python) -> &PyBytes {
    PyBytes::new(py, "")
}

// After
use pyo3::{Python, experimental::{PyOwned, objects::PyAny}};
fn foo(py: Python) -> PyOwned<Bytes> {
    PyBytes::new(py, "")
}

New types::Any etc.

The new 'py lifetime on &'a PyAny<'py> is important for the reasons above. However it's really annoying in Py<T> and PyOwned<'py, T>, because:

  • in Py<PyAny<'py>> there's no GIL, so the 'py lifetime doesn't make sense.
  • in PyOwned<'py, PyAny<'py>> we already have the lifetime; it's annoying to have to repeat it.

The solution I've come up with in this API is for the type parameter in Py<T> to be a type tag. In fact, it's just the same types::PyAny which we already use. I wanted to give it a different name so that we don't confuse it with the new objects::PyAny.

So I re-exported the old types without the Py- prefix, e.g. Any, Dict, Bytes, and so on. I chose this because it removes repetition of Py aswell - we just have Py<Any>, Py<Dict>, PyOwned<'py, Any> etc.

experimental::FromPyObject and experimental::PyTryFrom

To be compatible with &'a PyAny<'py>, these two traits needed to be changed to also have input lifetimes 'a and 'py.

For the moment I just copied them into the new experimental module and added the extra lifetime. I'm tempted to make this opportunity to rework the existing traits (if we want to make any changes). On the other hand, keeping them as similar as possible to the existing ones will make migration easier.

Benchmarks

I added some new benchmarks for the experimental API. To make the reference-releasing comparable with the old API I added unsafe { py.new_pool() } to the existing benchmarks.

In most benchmarks the new api is either the same or a little faster. For some that currently put a lot of owned references into pyo3's store, like dict iteration, the new API is significantly faster. (~30%). Even better, with the new API the memory consumption remains flat during iteration. In the existing API it grows as the references are put into pyo3's store.

Here's a full table of comparison:

benchmark current api experimental api
bench_call_0 52,895 ns/iter (+/- 6,956) 54,954 ns/iter (+/- 2,175)
bench_call_method_0 153,977 ns/iter (+/- 9,671) 153,001 ns/iter (+/- 6,288)
dict_get_item 1,933,300 ns/iter (+/- 328,382) 1,780,027 ns/iter (+/- 79,625)
extract_btreemap 9,538,110 ns/iter (+/- 634,895) 8,688,905 ns/iter (+/- 590,359)
extract_btreeset 8,099,900 ns/iter (+/- 570,799) 7,842,395 ns/iter (+/- 535,694)
extract_hashmap 5,013,300 ns/iter (+/- 483,341) 4,292,595 ns/iter (+/- 358,152)
extract_hashset 5,418,065 ns/iter (+/- 440,259) 4,900,090 ns/iter (+/- 355,389)
iter_dict 1,759,862 ns/iter (+/- 209,509) 1,304,683 ns/iter (+/- 133,726)
iter_list 1,262,143 ns/iter (+/- 76,108) 1,109,227 ns/iter (+/- 52,926)
iter_set 1,257,083 ns/iter (+/- 100,680) 1,037,567 ns/iter (+/- 120,639)
iter_tuple 824,397 ns/iter (+/- 46,234) 753,351 ns/iter (+/- 125,318)
list_get_item 573,128 ns/iter (+/- 61,386) 531,383 ns/iter (+/- 21,875)
tuple_get_item 408,859 ns/iter (+/- 25,755) 407,605 ns/iter (+/- 21,552)

@davidhewitt davidhewitt marked this pull request as draft November 29, 2020 16:05
@kngwyu
Copy link
Member

kngwyu commented Nov 30, 2020

Wow, thank you for this big effort!

I'm sorry that I don't have time to give a thorough check, but let me give a design question:
Why borrowed-first? (I should have explained this more in the #1056 thread, sorry)...

I think that we should design new APIs centered on owned objects, and PyAny, PyList, ... etc. should be owned objects.

Currently, our APIs are built around borrowed objects such as &PyAny, &PyList, and so on. However, actually borrowed objects cannot be completely safe since droping PyObject can make the refcount 0. Then, why are we stuck on borrowed objects? This is because of the reference pool, which this PR is going to remove. Since the reference pool holds the reference, we can treat an owned object as a borrowed one.
But, what if we don't have the pool?
Then I think we should switch our APIs to be built on owned objects. There's no way to treat an owned as borrowed so we have much more usages of owned objects, and writing lots of PyOwned<T> is not fun. Owned objects should be easier to use than the borrowed one.
This is what I meant in the comment:

PyAny<'py> and PyAnyRaw. No borrowed type.

Also, I was thinking of PyAnyRaw as a counterpart of Py<PyAny>(i.e., objects not bounded by GIL).
As you showed previously, being not able to have type Bouned<'py> = PyAny<'py> can be a problem when grouping these objects by a trait (e.g., PyRawObject). But I think that just having a normal method (e.g., fn with_py<'py>(self) -> PyAny<'py>) with the common name could work.

@davidhewitt
Copy link
Member Author

davidhewitt commented Nov 30, 2020

@kngwyu good questions! Once I understand PyAnyRaw a little better, I will also make a branch with PyAny<'py> and PyAnyRaw. I think I can easily copy this branch and rework to the owned types. Then we can compare against this branch and see which we like most! 🚀

I agree that having the core types be owned can be nicer. The piece I am not sure I understand is PyAnyRaw to replace Py<T>. Some questions:

  • Do we have a PyAnyRaw for each type? So PyAnyRaw, PyDictRaw, PyTupleRaw... ?

  • Also, what about for a #[pyclass] struct MyClass? Often I see users do this kind of thing:

    #[pyclass] struct Type1 { }
    
    #[pyclass] struct Type2 {
        inner: Py<Type1>
    }

    with PyAnyRaw, what do we give users as a replacement for this? Would this maybe be PyCellRaw<Type1>? I think having Py<Type1> is really nice, so it would be a shame to lose it.

  • Alternative idea: what about make the same API split in this PR where we have types::Any, types::Dict etc. ? Then instead of PyAnyRaw we can have Py<types::Any>, PyDictRaw instead is Py<types::Dict> etc. This would also allow users to continue to have Py<Type1> for pyclasses. With an owned objects API, I think that the usage of Py will only be for when users want to store python values in Rust types, so I think it's ok if this is a little complicated.

    (With this alternative idea I would maybe also rename Py<T> -> PySend<T>.)

@kngwyu
Copy link
Member

kngwyu commented Dec 1, 2020

The only strong point I have is owned-first so I think there can be various ideas about Py<T> alternatives.

Do we have a PyAnyRaw for each type?

I meant so.

Also, what about for a #[pyclass] struct MyClass? Often I see users do this kind of thing:

It can be problematic, but

Alternative idea: what about make the same API split in this PR where we have types::Any, types::Dict etc.

But is Py<Any> easier than PyAnyRaw or so?
They look almost the same functionalities for me.

@davidhewitt davidhewitt mentioned this pull request Dec 8, 2020
3 tasks
@davidhewitt
Copy link
Member Author

But is Py easier than PyAnyRaw or so?

I think that both are about the same for PyAny, but Py<MyClass> definitely feels necessary. How does a user create MyClassRaw?

I wonder if for the moment while we are experimenting with this branch we can have

type PyAnyRaw = Py<Any>;

And then learn about which one we find is nicer to use in practice.

@davidhewitt
Copy link
Member Author

Also, closing this PR in favour of #1308 which is the branch we'll actually merge eventually.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants