Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pybind] Python bindings of pure C++ objects may lose keep_alive relationships when Python view is lost #20383

Open
EricCousineau-TRI opened this issue Oct 17, 2023 · 1 comment
Assignees
Labels
component: pydrake Python API and its supporting Starlark macros priority: medium type: bug

Comments

@EricCousineau-TRI
Copy link
Contributor

EricCousineau-TRI commented Oct 17, 2023

Summary

This is the core issue to #20131, with more minimal Drake example in #20380
See RobotLocomotion/pybind11#65 for pybind-only minimal reproduction case.

If we have a pure C++ object (it is not a Python class inheriting from a C++ base), then the Python "view" of the object can get garbage collected if we pass ownership solely to C++ and the Python view goes out of scope.

This is generally not an issue, as the C++ object itself stays alive.
However, this becomes a problem when the Python view of the object has additional information, such as keep_alive<> annotations - most importantly, it has patients that the Python view keeps alive.

Example

Adapted from #20131:

drake/tmp/repro.py

Lines 163 to 174 in 20a030a

def add_constraint(s):
context = diagram.CreateDefaultContext()
add_weakref(context)
plant_context = plant.GetMyContextFromRoot(context)
collision_constraint = MinimumDistanceConstraint(
plant, 0.001, plant_context, None, 0.01
)
add_weakref(collision_constraint)
trajopt.AddPathPositionConstraint(collision_constraint, s)
for s in evaluate_at_s:
add_constraint(s)

The sequence of events:

  • L164: Diagram context context created.
  • L166: plant_context created, keeps context alive as its patient.
  • L167: collision_constraint is created, keeps plant_context alive, which keeps context alive.
  • L171: collision_constraint is added to trajopt.
  • After this, the function exits scope:
    • collision_constraint has ref_count == 0, thus is gc'd, along with its keep_alive relationships.
    • Nothing is keeping context alive, so it gets gc'd.
  • Undefined behavior ensues, because plant_context has been freed memory

Related upstream issues

I have not searched for these yet.

Possible Solutions

1. Whenever we pass a shared_ptr<> from Python to C++, we still annotate it.

This is not bullet proof (we can still lose keep_alive relationships), but would at least improve things.

2. Ditch our pybind11 fork and use shared_ptr<> more (#5842)

Even if we wholesale switched to shared_ptr<> and ditched our pybind11 fork (#5842), I believe this would still be an issue unless we guarantee every C++ object shown to Python is only ever exposed as shared_ptr<> / (and possibly using enable_shared_from_this if we need to expose raw pointers).

If there is ever a raw pointer that has lifetime constraints, then that necessitates keep_alive<>, which may fall apart in situations like this.

3. When passing ownership from Python to pure C++, keep the object alive if it has any patients

This may keep things alive longer than intended, and may be slow to do the nurse-based lookup.

I am also not sure how hard this is for a pure Python object.
To prevent slicing, etc., we inject our own logic into tp_dealloc for the derived class.

I need to check if the pybind11 smart_holder branch has a mechanism for this, but I doubt it (this is kinda niche / silly).

4. Never delete anything.

This seems bad?


@jwnimmer-tri @SeanCurtis-TRI @sammy-tri @RussTedrake - do y'all have suggestions / alternatives?

Version

20a030a

@jwnimmer-tri
Copy link
Collaborator

Option (2) sounds best to me. I don't think we'll need enable_shared_from_this. Remember that the "managed object" and the "pointed-to value" of a shared_ptr need not be the same. Something like a shared_ptr<Context> could still be a subcontext alias into a larger context, where the managed object is the full context and the stored pointer is the subcontext.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: pydrake Python API and its supporting Starlark macros priority: medium type: bug
Projects
None yet
Development

No branches or pull requests

3 participants