You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
If we have a pure C++ object (it is not a Python class inheriting from a C++ base), then the Python "view" of the object can get garbage collected if we pass ownership solely to C++ and the Python view goes out of scope.
This is generally not an issue, as the C++ object itself stays alive.
However, this becomes a problem when the Python view of the object has additional information, such as keep_alive<> annotations - most importantly, it has patients that the Python view keeps alive.
L166: plant_context created, keeps context alive as its patient.
L167: collision_constraint is created, keeps plant_context alive, which keeps context alive.
L171: collision_constraint is added to trajopt.
After this, the function exits scope:
collision_constraint has ref_count == 0, thus is gc'd, along with its keep_alive relationships.
Nothing is keeping context alive, so it gets gc'd.
Undefined behavior ensues, because plant_context has been freed memory
Related upstream issues
I have not searched for these yet.
Possible Solutions
1. Whenever we pass a shared_ptr<> from Python to C++, we still annotate it.
This is not bullet proof (we can still lose keep_alive relationships), but would at least improve things.
2. Ditch our pybind11 fork and use shared_ptr<> more (#5842)
Even if we wholesale switched to shared_ptr<> and ditched our pybind11 fork (#5842), I believe this would still be an issue unless we guarantee every C++ object shown to Python is only ever exposed as shared_ptr<> / (and possibly using enable_shared_from_this if we need to expose raw pointers).
If there is ever a raw pointer that has lifetime constraints, then that necessitates keep_alive<>, which may fall apart in situations like this.
3. When passing ownership from Python to pure C++, keep the object alive if it has any patients
This may keep things alive longer than intended, and may be slow to do the nurse-based lookup.
I am also not sure how hard this is for a pure Python object.
To prevent slicing, etc., we inject our own logic into tp_dealloc for the derived class.
I need to check if the pybind11 smart_holder branch has a mechanism for this, but I doubt it (this is kinda niche / silly).
Option (2) sounds best to me. I don't think we'll need enable_shared_from_this. Remember that the "managed object" and the "pointed-to value" of a shared_ptr need not be the same. Something like a shared_ptr<Context> could still be a subcontext alias into a larger context, where the managed object is the full context and the stored pointer is the subcontext.
Summary
This is the core issue to #20131, with more minimal Drake example in #20380
See RobotLocomotion/pybind11#65 for pybind-only minimal reproduction case.
If we have a pure C++ object (it is not a Python class inheriting from a C++ base), then the Python "view" of the object can get garbage collected if we pass ownership solely to C++ and the Python view goes out of scope.
This is generally not an issue, as the C++ object itself stays alive.
However, this becomes a problem when the Python view of the object has additional information, such as
keep_alive<>
annotations - most importantly, it has patients that the Python view keeps alive.Example
Adapted from #20131:
drake/tmp/repro.py
Lines 163 to 174 in 20a030a
The sequence of events:
context
created.plant_context
created, keepscontext
alive as its patient.collision_constraint
is created, keepsplant_context
alive, which keepscontext
alive.collision_constraint
is added totrajopt
.collision_constraint
hasref_count == 0
, thus is gc'd, along with itskeep_alive
relationships.context
alive, so it gets gc'd.plant_context
has been freed memoryRelated upstream issues
I have not searched for these yet.
Possible Solutions
1. Whenever we pass a
shared_ptr<>
from Python to C++, we still annotate it.This is not bullet proof (we can still lose
keep_alive
relationships), but would at least improve things.2. Ditch our pybind11 fork and use
shared_ptr<>
more (#5842)Even if we wholesale switched to
shared_ptr<>
and ditched ourpybind11
fork (#5842), I believe this would still be an issue unless we guarantee every C++ object shown to Python is only ever exposed asshared_ptr<>
/ (and possibly usingenable_shared_from_this
if we need to expose raw pointers).If there is ever a raw pointer that has lifetime constraints, then that necessitates
keep_alive<>
, which may fall apart in situations like this.3. When passing ownership from Python to pure C++, keep the object alive if it has any patients
This may keep things alive longer than intended, and may be slow to do the nurse-based lookup.
I am also not sure how hard this is for a pure Python object.
To prevent slicing, etc., we inject our own logic into
tp_dealloc
for the derived class.I need to check if the pybind11
smart_holder
branch has a mechanism for this, but I doubt it (this is kinda niche / silly).4. Never delete anything.
This seems bad?
@jwnimmer-tri @SeanCurtis-TRI @sammy-tri @RussTedrake - do y'all have suggestions / alternatives?
Version
20a030a
The text was updated successfully, but these errors were encountered: