Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
gdb: have target_stack automate reference count handling
This commit changes the target_stack class from using a C style array of 'target_ops *' to using a C++ std::array<target_ops_ref, ...>. The benefit of this change is that some of the reference counting of target_ops objects is now done automatically. This commit fixes a crash in gdb.python/py-inferior.exp where GDB crashes at exit, leaving a core file behind. The crash occurs in connpy_connection_dealloc, and is actually triggered by this assert: gdb_assert (conn_obj->target == nullptr); Now a little aside... ... the assert is never actually printed, instead GDB crashes due to calling a pure virtual function. The backtrace at the point of crash looks like this: #7 0x00007fef7e2cf747 in std::terminate() () from /lib64/libstdc++.so.6 #8 0x00007fef7e2d0515 in __cxa_pure_virtual () from /lib64/libstdc++.so.6 #9 0x0000000000de334d in target_stack::find_beneath (this=0x4934d78, t=0x2bda270 <the_dummy_target>) at ../../s> #10 0x0000000000df4380 in inferior::find_target_beneath (this=0x4934b50, t=0x2bda270 <the_dummy_target>) at ../.> #11 0x0000000000de2381 in target_ops::beneath (this=0x2bda270 <the_dummy_target>) at ../../src/gdb/target.c:3047 #12 0x0000000000de68aa in target_ops::supports_terminal_ours (this=0x2bda270 <the_dummy_target>) at ../../src/gd> #13 0x0000000000dde6b9 in target_supports_terminal_ours () at ../../src/gdb/target.c:1112 #14 0x0000000000ee55f1 in internal_vproblem(internal_problem *, const char *, int, const char *, typedef __va_li> Notice in frame #12 we called target_ops::supports_terminal_ours, however, this is the_dummy_target, which is of type dummy_target, and so we should have called dummy_target::supports_terminal_ours. I believe the reason we ended up in the wrong implementation of supports_terminal_ours (which is a virtual function) is because we made the call during GDB's shut-down, and, I suspect, the vtables were in a weird state. Anyway, the point of this patch is not to fix GDB's ability to print an assert during exit, but to address the root cause of the assert. With that aside out of the way, we can return to the main story... Connections are represented in Python with gdb.TargetConnection objects (or its sub-classes). The assert in question confirms that when a gdb.TargetConnection is deallocated, the underlying GDB connection has itself been removed from GDB. If this is not true then we risk creating multiple different gdb.TargetConnection objects for the same connection, which would be bad. To ensure that we have one gdb.TargetConnection object for each connection, the all_connection_objects map exists, this maps the process_stratum_target object (the connection) to the gdb.TargetConnection object that represents the connection. When a connection is removed in GDB the connection_removed observer fires, which we catch with connpy_connection_removed, this function then sets conn_obj->target to nullptr, and removes the corresponding entry from the all_connection_objects map. The first issue here is that connpy_connection_dealloc is being called as part of GDB's exit code, which is run after the Python interpreter has been shut down. The connpy_connection_dealloc function is used to deallocate the gdb.TargetConnection Python object. Surely it is wrong for us to be deallocating Python objects after the interpreter has been shut down. The reason why connpy_connection_dealloc is called during GDB's exit is that the global all_connection_objects map is still holding a reference to the gdb.TargetConnection object. When the map is destroyed during GDB's exit, the gdb.TargetConnection objects within the map can finally be deallocated. The reason why all_connection_objects has contents when GDB exits, and the reason the assert fires, is that, when GDB exits, there are still some connections that have not yet been removed from GDB, that is, they have a non-zero reference count. If we take a look at quit_force (top.c) you can see that, for each inferior, we call pop_all_targets before we (later in the function) call do_final_cleanups. It is the do_final_cleanups call that is responsible for shutting down the Python interpreter. The pop_all_targets calls should, in theory, cause all the connections to be removed from GDB. That this isn't working indicates that some targets have a non-zero reference count even after this final pop_all_targets call, and indeed, when I debug GDB, that is what I see. I tracked the problem down to delete_inferior where we do some house keeping, and then delete the inferior object, which calls inferior::~inferior. In neither delete_inferior or inferior::~inferior do we call pop_all_targets, and it is this missing call that means we leak some references to the target_ops objects on the inferior's target_stack. In this commit I will provide a partial fix for the problem. I say partial fix, but this will actually be enough to resolve the crash. In a later commit I will provide the final part of the fix. As mentioned at the start of the commit message, this commit changes the m_stack in target_stack to hold target_ops_ref objects. This means that when inferior::~inferior is called, and m_stack is released, we automatically decrement the target_ops reference count. With this change in place we no longer leak any references, and now, in quit_force the final pop_all_targets calls will release the final references. This means that the targets will be correctly closed at this point, which means the connections will be removed from GDB and the Python objects deallocated before the Python interpreter shuts down. There's a slight oddity in target_stack::unpush, where we std::move the reference out of m_stack like this: auto ref = std::move (m_stack[stratum]); the `ref' isn't used explicitly, but it serves to hold the target_ops_ref until the end of the scope while allowing the m_stack entry to be reset back to nullptr. The alternative would be to directly set the m_stack entry to nullptr, like this: m_stack[stratum] = nullptr; The problem here is that when we set the m_stack entry to nullptr we first decrement the target_ops reference count, and then set the array entry to nullptr. If the decrement means that the target_ops object reaches a zero reference count then the target_ops object will be closed by calling target_close. In target_close we ensure that the target being closed is not in any inferiors target_stack. As we decrement before clearing, then this check in target_close will fail, and an assert will trigger. By using std::move to move the reference out of m_stack, this clears the m_stack entry, meaning the inferior no longer contains the target_ops in its target_stack. Now when the REF object goes out of scope and the reference count is decremented, target_close can run successfully. I've made use of the Python connection_removed listener API to add a test for this issue. The test installs a listener and then causes delete_inferior to be called, we can then see that the connection is then correctly removed (because the listener triggers).
- Loading branch information