Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bind_map/bind_vector: allow customizing the getitem return_value_policy #452

Merged
merged 3 commits into from
May 23, 2024

Conversation

oremanj
Copy link
Contributor

@oremanj oremanj commented Mar 6, 2024

I'm as much a fan as anyone of safe-by-default, but I also maintain some code that relied on the old reference semantics for bind_map/bind_vector, including taking the appropriate level of care not to store the item-reference Python objects past the lifetime of their C++ referent. In order to avoid needing to maintain a local copy of these binding functions just to change one word, I'm hoping you'll consider this PR, which keeps the safe default but allows it to be changed using a 2nd template argument to bind_map/bind_vector. It also includes some expanded discussion in the docs, which reflects that the reference semantics are much more unsafe for std::vector (where inserting or erasing any element can invalidate references to all elements) than they are for std::map/unordered_map (where the only thing that can invalidate references to an element is removing that specific element).

This changes the default getitem rv_policy from std::is_pointer_v<Value> ? rv_policy::take_ownership : rv_policy::copy to rv_policy::automatic_reference. The main difference is that a non-intrusive-refcounting pointer will use reference instead of take_ownership, which seems like an obvious win to me. (When intrusive refcounting is used, the rv_policy is always overridden to be take_ownership in nb_type_put_common(), so it doesn't matter what we pass in that case.) A secondary difference is that if the map iterator returns a temporary value (rather than a reference) we can now move out of it. (Such iterators don't currently work for other reasons, but I'm about to put up a PR that fixes those.)

While working on this, I noticed that the container iterators still use the reference_internal policy. I'm not sure if you intended this or not; I can see the argument that someone is less likely to hold onto an object they got during iteration, but it's certainly not impossible. I didn't make any changes here, but if you'd like to, I would suggest using the same getitem_policy for the iterator result types as well.

Copy link
Owner

@wjakob wjakob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me uncomfortable to ship something that dangerous, but I do understand the need, and you did document the caveats well. But I would appreciate if you could add concrete examples of very innocent-looking programs that cold crash the interpreter.

About iterators in bind_map/bind_vector: yes, these are also bad interfaces and will need to be done. I was thinking of having a custom iterator that reindexes into the arrays/maps instead of using original STL iterators. It would need to store a strong reference to the Python object along with the current index or key.

The nb::make_*_iterator() API will also need a big fat warning about the dangers of using it and concurrently modifying the underlying data structure.

docs/api_extra.rst Show resolved Hide resolved
docs/api_extra.rst Outdated Show resolved Hide resolved
@wjakob
Copy link
Owner

wjakob commented Mar 6, 2024

In the case that this API is used to create an unsafe container, should it be a sort of "structural static read-write view"? In the sense that we would simply not bind functions that are almost certainly going to segfault your process: insertion, deletion, etc.

@oremanj
Copy link
Contributor Author

oremanj commented Mar 6, 2024

I added some examples of innocuous-looking UB.

I was thinking of having a custom iterator that reindexes into the arrays/maps instead of using original STL iterators. It would need to store a strong reference to the Python object along with the current index or key.

That could guard against some of the dangers of concurrent modification during iteration, but it basically just reduces the iterator-safety problem to the __getitem__-safety problem. Imagine:

struct Inner { int v; };
struct Outer { Inner inner; };
nb::class_<Inner>(m, "Inner").def_rw("v", &Inner::v);
nb::class_<Outer>(m, "Outer").def_rw("inner", &Outer::inner);
nb::bind_vector<std::vector<Outer>>(m, "OuterVec");

After some Python code like:

vec = ext.OuterVec()
for i in range(10):
    elem = ext.Outer()
    elem.inner.v = i
    vec.append(elem)

values = [elem.inner for elem in vec if elem.inner % 2 == 0]

you now have an array of values that refer to memory owned by vec, which can be invalidated by various things you might later do to it. (This is because def_rw uses rv_policy::reference_internal by default, but it's kind of hard for it to do its job if it doesn't.)

Currently __getitem__ copies by default, but the iterator __next__ still returns a reference, so it presents the same sort of problems that __getitem__ used to present. I think it might be more consistent to have them all use the same RVP, but of course that would also increase the potential for confusion and breakage of existing code.

In the case that this API is used to create an unsafe container, should it be a sort of "structural static read-write view"? In the sense that we would simply not bind functions that are almost certainly going to segfault your process: insertion, deletion, etc.

If the user has read our extensive warnings and decided to do the unsafe thing anyway, I don't really think it's our place to try to prevent them from doing what they want to do. It's perfectly possible to use a reference-semantic vector in ways that don't crash your program, even without forbidding insertions and deletions; C++ developers do it all the time! I also think it's telling that bind_vector has been in production use for quite a while (certainly if we count pybind) without generating a notable volume of user complaints of the sort that prompted the change to copy semantics. It's not safe in reference mode, but I think "almost certainly going to segfault your process" is going a little far.

@kj4tmp
Copy link

kj4tmp commented May 7, 2024

voicing my support for this feature #564

@wjakob wjakob force-pushed the master branch 4 times, most recently from d7117a4 to 983d6c0 Compare May 22, 2024 15:28
@wjakob
Copy link
Owner

wjakob commented May 23, 2024

Sorry about taking a while to get to this. This change looks good, and I can't come up with a better way of doing things. Thank you for the nice documentation and helpful changes as always @oremanj .

@wjakob wjakob merged commit 3314fac into wjakob:master May 23, 2024
24 checks passed
@wjakob
Copy link
Owner

wjakob commented May 23, 2024

FYI: 0e1fe84 makes a similar change for the iterators. The STL vector/map bindings pass their Policy argument to make_*_iterator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants