experimental: owned objects API #1300

davidhewitt · 2020-11-29T15:58:12Z

This is an experimental extension to the pyo3 API to provide a new set of types which do not use pyo3's internal objects store.

There's two objectives from this new API:

stable memory consumption from pyo3 when embedding Python in Rust without manually managing memory using super-unsafe APIs. cc Unexpected memory management behavior #1056 PyBytes::new leaks memory #793 Documentaion about when we free memory #311 https://github.com/BenSchZA/radCAD/issues/2
faster performance overall because there's no overhead reading & writing from a thread-local vector.

Design

The goal with this PR is to keep the existing pyo3 API 100% backwards-compatible. There should be no changes needed to existing code if we merge this API. This allows it to be opt-in for now while we collect feedback from the new design, and make changes if needed.

PyO3 can make use of the new API internally to benefit from speedups where possible.

Because this is 100% backwards-compatible, and it's also a very big amount of changes, I would like to merge this API and start getting feedback. The promise to users is that we can change (or remove) the entire pyo3::experimental module at any time until we're happy with it.

When we are ready to make this new API official, the next steps would be to deprecate owned references and finally remove the pyo3 object store completely.

New APIs

In the pyo3::experimental module, I've added some new APIs, explained below. I've also made a pyo3::experimental::prelude module. Anyone who wants to test this API can swap this for the original prelude (along with the other new imports).

I welcome feedback and alternative designs for all of these. I think it's important to solve this problem of pyo3's memory management. I don't claim to have the answer. I think what I have here is a pretty good start.

New `PyOwned<'py, T>` type

To be able to return owned pointers from the new objects API there is a new PyOwned<'py, T> type. It's basically Py<T> with a Python<'py>, so it's bound to the GIL.

New `objects::PyAny<'py>` etc.

These "objects" are very similar to the existing types::PyAny (etc.). They are also used as references. This time they are always borrowed (because PyOwned is the owned reference).

An owned object PyOwned<'py, Any> will deref to &'a PyAny<'py> :

The 'a lifetime is Rust's usual lifetime for borrowed data. This is essential for tracking that the borrowed reference is only used while the owned object is kept alive.
The 'py lifetime is the GIL lifetime. It is different to 'a because the GIL lifetime is the same all the way through functions. If it is mixed in with 'a (like in the types API) it becomes shorter and shorter each time a PyOwned is borrowed, which is very annoying.

The motivation with these borrowed objects is exactly the same for the same reason we changed &'a PyAny to have the same memory representation as *mut ffi::PyObject. It's the fastest and most direct representation we can have for Python objects.

It also means that migrating to the new API is easier. Where we currently have

use pyo3::types::PyAny;
fn foo(&PyAny) { }

switching to the new API compiles immediately for this:

use pyo3::experimental::objects::PyAny;
fn foo(&PyAny) { }

For functions with return values, there is a little more work:

// Before 
use pyo3::{Python, types::PyAny};
fn foo(py: Python) -> &PyBytes {
    PyBytes::new(py, "")
}

// After
use pyo3::{Python, experimental::{PyOwned, objects::PyAny}};
fn foo(py: Python) -> PyOwned<Bytes> {
    PyBytes::new(py, "")
}

New `types::Any` etc.

The new 'py lifetime on &'a PyAny<'py> is important for the reasons above. However it's really annoying in Py<T> and PyOwned<'py, T>, because:

in Py<PyAny<'py>> there's no GIL, so the 'py lifetime doesn't make sense.
in PyOwned<'py, PyAny<'py>> we already have the lifetime; it's annoying to have to repeat it.

The solution I've come up with in this API is for the type parameter in Py<T> to be a type tag. In fact, it's just the same types::PyAny which we already use. I wanted to give it a different name so that we don't confuse it with the new objects::PyAny.

So I re-exported the old types without the Py- prefix, e.g. Any, Dict, Bytes, and so on. I chose this because it removes repetition of Py aswell - we just have Py<Any>, Py<Dict>, PyOwned<'py, Any> etc.

`experimental::FromPyObject` and `experimental::PyTryFrom`

To be compatible with &'a PyAny<'py>, these two traits needed to be changed to also have input lifetimes 'a and 'py.

For the moment I just copied them into the new experimental module and added the extra lifetime. I'm tempted to make this opportunity to rework the existing traits (if we want to make any changes). On the other hand, keeping them as similar as possible to the existing ones will make migration easier.

Benchmarks

I added some new benchmarks for the experimental API. To make the reference-releasing comparable with the old API I added unsafe { py.new_pool() } to the existing benchmarks.

In most benchmarks the new api is either the same or a little faster. For some that currently put a lot of owned references into pyo3's store, like dict iteration, the new API is significantly faster. (~30%). Even better, with the new API the memory consumption remains flat during iteration. In the existing API it grows as the references are put into pyo3's store.

Here's a full table of comparison:

benchmark	current api	experimental api
bench_call_0	`52,895 ns/iter (+/- 6,956)`	`54,954 ns/iter (+/- 2,175)`
bench_call_method_0	`153,977 ns/iter (+/- 9,671)`	`153,001 ns/iter (+/- 6,288)`
dict_get_item	`1,933,300 ns/iter (+/- 328,382)`	`1,780,027 ns/iter (+/- 79,625)`
extract_btreemap	`9,538,110 ns/iter (+/- 634,895)`	`8,688,905 ns/iter (+/- 590,359)`
extract_btreeset	`8,099,900 ns/iter (+/- 570,799)`	`7,842,395 ns/iter (+/- 535,694)`
extract_hashmap	`5,013,300 ns/iter (+/- 483,341)`	`4,292,595 ns/iter (+/- 358,152)`
extract_hashset	`5,418,065 ns/iter (+/- 440,259)`	`4,900,090 ns/iter (+/- 355,389)`
iter_dict	`1,759,862 ns/iter (+/- 209,509)`	`1,304,683 ns/iter (+/- 133,726)`
iter_list	`1,262,143 ns/iter (+/- 76,108)`	`1,109,227 ns/iter (+/- 52,926)`
iter_set	`1,257,083 ns/iter (+/- 100,680)`	`1,037,567 ns/iter (+/- 120,639)`
iter_tuple	`824,397 ns/iter (+/- 46,234)`	`753,351 ns/iter (+/- 125,318)`
list_get_item	`573,128 ns/iter (+/- 61,386)`	`531,383 ns/iter (+/- 21,875)`
tuple_get_item	`408,859 ns/iter (+/- 25,755)`	`407,605 ns/iter (+/- 21,552)`

kngwyu · 2020-11-30T05:55:59Z

Wow, thank you for this big effort!

I'm sorry that I don't have time to give a thorough check, but let me give a design question:
Why borrowed-first? (I should have explained this more in the #1056 thread, sorry)...

I think that we should design new APIs centered on owned objects, and PyAny, PyList, ... etc. should be owned objects.

Currently, our APIs are built around borrowed objects such as &PyAny, &PyList, and so on. However, actually borrowed objects cannot be completely safe since droping PyObject can make the refcount 0. Then, why are we stuck on borrowed objects? This is because of the reference pool, which this PR is going to remove. Since the reference pool holds the reference, we can treat an owned object as a borrowed one.
But, what if we don't have the pool?
Then I think we should switch our APIs to be built on owned objects. There's no way to treat an owned as borrowed so we have much more usages of owned objects, and writing lots of PyOwned<T> is not fun. Owned objects should be easier to use than the borrowed one.
This is what I meant in the comment:

PyAny<'py> and PyAnyRaw. No borrowed type.

Also, I was thinking of PyAnyRaw as a counterpart of Py<PyAny>(i.e., objects not bounded by GIL).
As you showed previously, being not able to have type Bouned<'py> = PyAny<'py> can be a problem when grouping these objects by a trait (e.g., PyRawObject). But I think that just having a normal method (e.g., fn with_py<'py>(self) -> PyAny<'py>) with the common name could work.

davidhewitt · 2020-11-30T09:01:50Z

@kngwyu good questions! Once I understand PyAnyRaw a little better, I will also make a branch with PyAny<'py> and PyAnyRaw. I think I can easily copy this branch and rework to the owned types. Then we can compare against this branch and see which we like most! 🚀

I agree that having the core types be owned can be nicer. The piece I am not sure I understand is PyAnyRaw to replace Py<T>. Some questions:

Do we have a PyAnyRaw for each type? So PyAnyRaw, PyDictRaw, PyTupleRaw... ?
Also, what about for a #[pyclass] struct MyClass? Often I see users do this kind of thing:
```
#[pyclass] struct Type1 { }

#[pyclass] struct Type2 {
    inner: Py<Type1>
}
```
with PyAnyRaw, what do we give users as a replacement for this? Would this maybe be PyCellRaw<Type1>? I think having Py<Type1> is really nice, so it would be a shame to lose it.
Alternative idea: what about make the same API split in this PR where we have types::Any, types::Dict etc. ? Then instead of PyAnyRaw we can have Py<types::Any>, PyDictRaw instead is Py<types::Dict> etc. This would also allow users to continue to have Py<Type1> for pyclasses. With an owned objects API, I think that the usage of Py will only be for when users want to store python values in Rust types, so I think it's ok if this is a little complicated.

(With this alternative idea I would maybe also rename Py<T> -> PySend<T>.)

kngwyu · 2020-12-01T14:11:54Z

The only strong point I have is owned-first so I think there can be various ideas about Py<T> alternatives.

Do we have a PyAnyRaw for each type?

I meant so.

Also, what about for a #[pyclass] struct MyClass? Often I see users do this kind of thing:

It can be problematic, but

Alternative idea: what about make the same API split in this PR where we have types::Any, types::Dict etc.

But is Py<Any> easier than PyAnyRaw or so?
They look almost the same functionalities for me.

davidhewitt · 2020-12-08T22:08:58Z

But is Py easier than PyAnyRaw or so?

I think that both are about the same for PyAny, but Py<MyClass> definitely feels necessary. How does a user create MyClassRaw?

I wonder if for the moment while we are experimenting with this branch we can have

type PyAnyRaw = Py<Any>;

And then learn about which one we find is nicer to use in practice.

davidhewitt · 2020-12-08T22:09:23Z

Also, closing this PR in favour of #1308 which is the branch we'll actually merge eventually.

davidhewitt added 7 commits November 29, 2020 13:46

wip: try to make owned objects...

92d833c

working benches

20e615c

fix all tests

66c85cf

all tests working

d173d71

start using type in extract

0a4a6b8

fix types import

30295b2

make benches fair comparison; fix tests and docs in objects

bbd30ed

davidhewitt marked this pull request as draft November 29, 2020 16:05

davidhewitt mentioned this pull request Dec 1, 2020

experimental: owned objects API (alternative) #1304

Closed

davidhewitt mentioned this pull request Dec 8, 2020

experimental: owned objects API #1308

Closed

3 tasks

davidhewitt closed this Dec 8, 2020

davidhewitt mentioned this pull request Jun 17, 2023

add pyo3::experimental namespace with new owned PyAny<'py> #3205

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

experimental: owned objects API #1300

experimental: owned objects API #1300

davidhewitt commented Nov 29, 2020

kngwyu commented Nov 30, 2020

davidhewitt commented Nov 30, 2020 •

edited

Loading

kngwyu commented Dec 1, 2020

davidhewitt commented Dec 8, 2020

davidhewitt commented Dec 8, 2020

experimental: owned objects API #1300

experimental: owned objects API #1300

Conversation

davidhewitt commented Nov 29, 2020

Design

New APIs

New PyOwned<'py, T> type

New objects::PyAny<'py> etc.

New types::Any etc.

experimental::FromPyObject and experimental::PyTryFrom

Benchmarks

kngwyu commented Nov 30, 2020

davidhewitt commented Nov 30, 2020 • edited Loading

kngwyu commented Dec 1, 2020

davidhewitt commented Dec 8, 2020

davidhewitt commented Dec 8, 2020

New `PyOwned<'py, T>` type

New `objects::PyAny<'py>` etc.

New `types::Any` etc.

`experimental::FromPyObject` and `experimental::PyTryFrom`

davidhewitt commented Nov 30, 2020 •

edited

Loading