Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add utility to share rust reference across Python objects #189

Merged
merged 4 commits into from
Jan 27, 2020

Conversation

yuja
Copy link
Contributor

@yuja yuja commented Nov 4, 2019

This was originally developed for the Mercurial's Rust extension. It allows
us to implement a Python iterator over a Rust iterator relatively safely.

The problem is that a Rust iterator typically has a reference to the
collection behind, and thus cannot be a data member of a Python object.
We get around this problem by casting &'a to &'static and add runtime
validation instead. The basic idea is described in the following blog post.

https://raphaelgomes.dev/blog/articles/2019-07-01-sharing-references-between-python-and-rust.html

In order to make the reference-sharing business safe, we have the following
structs:

  • PySharedRefCell defines a storage holding a value to be shared
  • PySharedRef is a lifetime-bound reference to the value above, which provides
    a safe interface on top of the PySharedRefCell and its owner PyObject
  • UnsafePyLeaked is the reference not bound to the real lifetime, and its
    validity is checked at runtime
  • PyLeakedRef/PyLeakedRefMut are borrowed references from UnsafePyLeaked

In order to create PySharedRef in safe manner, we plan to add py_class!()
macro like this:

py_class!(class List |py|
    shared_data vec: Vec<i32>;
    // the storage type is PySharedRefCell<Vec<i32>>, but its accessor
    // returns PySharedRef<'a, Vec<i32>>. It makes sure that the right
    // owner PyObject and its data member are paired.
});

The macro extension is not included in this PR since we'll probably need
some bikeshedding.

There are a couple of unsafe business in this module, and some public
functions are unfortunately unsafe. Please refer to the inline comments
why some of them can be safe and some can't.

Thanks to @gracinet for the basic ideas and experiments,
@Alphare for the core implementation,
@markbt for the generation-counter idea,
and Mercurial developers for reviewing the original patches.

@Alphare
Copy link
Contributor

Alphare commented Nov 4, 2019

Note that my original blog post is out of date and contains a few errors that I should really find time to fix. Not that it matters much for this PR.
Thanks a lot @yuja for all of your work and precious help.

@gracinet
Copy link
Collaborator

gracinet commented Dec 1, 2019

I started to read this, but I'm far from being done.

@markbt @dgrunwald how do you think we should proceed?
I didn't have the opportunity to closely look at this at the time it's been reviewed inside Mercurial, so I'm somewhat in a fresh mindset. It's true I made the first experiments on this topic, but this has come a long way since then.

@yuja Do you think it could be of use outside of py_class? I'm wondering for instance if we could use the same principles to improve PyDict (that doesn't currently provide iterators for Rust consumers), or add a standard exposition of HashMap and BTreeMap to the Python interpreter. Just being curious here, thinking of follow-ups.

@yuja
Copy link
Contributor Author

yuja commented Dec 1, 2019

if we could use the same principles to improve PyDict (that doesn't currently provide
iterators for Rust consumers),

That's probably unrelated topic since we can safely take a lifetime-bound iterator in Rust.
If we want an iterator not bound to lifetime, using PyIterator should suffice.

add a standard exposition of HashMap and BTreeMap to the Python interpreter

Might be doable, but I don't know if we can easily implement a generic PythonObject wrapper
for arbitrary types of arbitrary sizes. In any case, the container will have to be owned by
PySharedRefCell, and which will have to be wrapped into a PythonObject.

@markbt
Copy link
Collaborator

markbt commented Jan 20, 2020

I really like this idea, so I'm keen to get it merged. What's the status of the PR? I still see a bunch of TODOs in the code, which would be nice to get resolved. For sure we need to sort out the syntax before we can merge it as it will form part of the API.

@gracinet
Copy link
Collaborator

I was planning to get back to it some time in February, once things have hopefully settled down for me (Heptapod…) but if someone gets to it faster, I don't mind.

Big +1 in principle, of course.

@yuja
Copy link
Contributor Author

yuja commented Jan 21, 2020

we need to sort out the syntax

Yes. Some ideas:

py_class!(class List |py|
    shared_data vec: Vec<i32>;  // "shared_data" as keyword
    shared vec: Vec<i32>;       // easy to type
    shared data vec: Vec<i32>;  // like "const" qualifier in C
    data shared vec: Vec<i32>;  // like "mut" in Rust

Do we have any plan to add a syntax to define @property-like function?
I think the syntax should be somewhat similar to this, i.e. if we take shared data <name>: <type>,
the syntax for property function will be e.g. property_getter def <name>(&self)....

I have a PoC-level implementation of py_class! macro extension, so I can send it out in
a few days.

@markbt
Copy link
Collaborator

markbt commented Jan 21, 2020

I don't like underscores in keywords, so I would prefer

py_class!(class List |py| {
    shared data my_vec: Vec<i32>;

Arguably the data here is superfluous, but probably convenient to keep to make it clear.

For properties, I'd want something similar. We could prefix def, but I think it looks ok if we just replace it, e.g.:

    property getter my_prop(&self) -> PyResult<i32> {
        ...
    }
    property setter my_prop(&self, value: i32) -> PyResult<()> {
        ...
    }

@dgrunwald
Copy link
Owner

For getters I would expect the obvious @property def my_prop(&self) -- after all we already support @staticmethod.
Setters could be @my_prop.setter def set_my_prop(&self, value) but I feel like we're already stretching macro_rules! way too far. Now that proc_macro_hack is usable in stable rust, we should seriously consider rewriting py_class! in a more maintainable form.

@yuja
Copy link
Contributor Author

yuja commented Jan 22, 2020

Thanks for the info. I didn't know @staticmethod is already implemented.
So I think @shared data my_vec: Vec<i32>; (decorator-like) is close to the existing syntax.
I'll rework my local patches and update this PR.

Now that proc_macro_hack is usable in stable rust, we should seriously consider rewriting py_class! in a more maintainable form.

+1. Even though I also like the current py_class! syntax, I agree it's difficult to extend.

This was originally developed for the Mercurial's Rust extension. It allows
us to implement a Python iterator over a Rust iterator _relatively_ safely.

The problem is that a Rust iterator typically has a reference to the
collection behind, and thus cannot be a data member of a Python object.
We get around this problem by casting &'a to &'static and add runtime
validation instead. The basic idea is described in the following blog post.

https://raphaelgomes.dev/blog/articles/2019-07-01-sharing-references-between-python-and-rust.html

In order to make the reference-sharing business safe, we have the following
structs:

 - PySharedRefCell defines a storage holding a value to be shared
 - PySharedRef is a lifetime-bound reference to the value above, which provides
   a safe interface on top of the PySharedRefCell and its owner PyObject
 - UnsafePyLeaked is the reference not bound to the real lifetime, and its
   validity is checked at runtime
 - PyLeakedRef/PyLeakedRefMut are borrowed references from UnsafePyLeaked

In order to create PySharedRef in safe manner, py_class!() macro extension
will be added by later patches:

    py_class!(class List |py|
        @shared data vec: Vec<i32>;
        // the storage type is PySharedRefCell<Vec<i32>>, but its accessor
        // returns PySharedRef<'a, Vec<i32>>. It makes sure that the right
        // owner PyObject and its data member are paired.
    });

There are a couple of unsafe business in this module, and some public
functions are unfortunately unsafe. Please refer to the inline comments
why some of them can be safe and some can't.

Thanks to Georges Racinet for the basic ideas and experiments,
          Raphaël Gomès for the core implementation,
          Mark Thomas for the generation-counter idea,
          and Mercurial developers for reviewing the original patches.
I originally tried to extract a helper function from data_decl(), but it
didn't look nice since there were subtle, but many differences between
data_decl() and shared_data_decl().

shared_data_decl() will be updated by the subsequent patches.
This mostly encapsulates PySharedRef business. The main reason of introducing
a dedicated macro syntax is that we need a safe way to bind the correct
PyObject reference to PySharedRefCell data. This is handled as follows:

    $crate::PySharedRef::new(py, &self._unsafe_inner, data)

Here `data` is a PySharedRefCell instance, and `self._unsafe_inner` is the
owner PyObject.

It's still weird that create_instance() takes PySharedRefCell<T> instead of
the inner data type T, which will be fixed by the next patch.
Since the use of PySharedRefCell type is encapsulated, it shouldn't be the
type of create_instance() arguments. This patch adds $init_expr and $init_ty
to map $init_ty argument to the storage $data_ty object.

    ($data_name: $init_ty) -> $data_ty { $init_expr }
@yuja
Copy link
Contributor Author

yuja commented Jan 25, 2020

Added @shared data macro. Test cases are split to test_sharedref.rs to work around
resolution error of py_class_impl! macro.

@markbt markbt merged commit 8145d96 into dgrunwald:master Jan 27, 2020
@yuja yuja deleted the refsharing-core branch January 28, 2020 14:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants