Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #4459 (cpp atexit callbacks) without segfault (#4500) #4505

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Skylion007
Copy link
Collaborator

@Skylion007 Skylion007 commented Feb 8, 2023

Description

Fix #4459 without introducing the segfault discovered in #4500. This is a nasty hotfix, but it should support both use cases without crashing.

This segfault is due to some nasty behavior with the side effects of get_local_internals which currently relies on UB behavior. Full writeup can be found here: #4500 (comment)

Suggested changelog entry:

* Fix regression that prevented using atexit callbacks with cpp registered types in an embedded interpreter.

@Skylion007 Skylion007 requested a review from henryiii February 8, 2023 16:32
@Skylion007
Copy link
Collaborator Author

Hmm, looks like some compilers are being a bit too clever and delaying the initialization of the static variable or otherwise reordering it when optimizations are enabled. This does unfortunately complicate the fix a bit. Thoughts @rwgk?

@rwgk
Copy link
Collaborator

rwgk commented Feb 9, 2023

The failing test is Pass classes and data between modules defined in C++ and Python. Maybe one module still points to data in the other?

I spent 10ish minutes looking around but still feel pretty clueless.

Wild idea, potentially silly: What if we clear only before we initialize a new interpreter, e.g. near the top of initialize_interpreter()?

@Skylion007
Copy link
Collaborator Author

The failing test is Pass classes and data between modules defined in C++ and Python. Maybe one module still points to data in the other?

I spent 10ish minutes looking around but still feel pretty clueless.

Wild idea, potentially silly: What if we clear only before we initialize a new interpreter, e.g. near the top of initialize_interpreter()?

Same issue, we can't call get_local_internals without calling get_internals which requires a GIL lock and thread local storage. The GIL can only be locked if the Python interpreter is initialized.

@Skylion007
Copy link
Collaborator Author

The failing test is Pass classes and data between modules defined in C++ and Python. Maybe one module still points to data in the other?

I spent 10ish minutes looking around but still feel pretty clueless.

Wild idea, potentially silly: What if we clear only before we initialize a new interpreter, e.g. near the top of initialize_interpreter()?

@rwgk I don't think the error is necessarily specific to this test, I think it's just the first test which access local_internals and therefore initializes it.

// avoid undefined behaviors when initializing another interpreter
local_internals.registered_types_cpp.clear();
local_internals.registered_exception_translators.clear();

if (internals_ptr_ptr) {
delete *internals_ptr_ptr;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rwgk It just occurred to me, that the reason deleting the internals_ptr_ptr here works is that it effectively leaks the containers in internals since it just deallocates them without calling the dtors. This shouldn't be working here because if the dtors did run, then the capsule dtor for builtins object would be called which requires the CPython API which would segfault. What this is actually doing is just leaking the objects. If we also changed the behavior to improperly swap out pointers to the container, that would work, but it's not a longterm fix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

effectively leaks the containers in internals since it just deallocates them without calling the dtors

Yes, I was aware of that. Just to confirm.

Some background that may or may not matter here, sharing JIC:

Until Python 3.9 or 3.10 (not sure anymore) Python itself was leaking (https://github.com/rwgk/stuff/blob/f6549ecadb46169feb5f3c2456dcd13852382f7a/noddy/embedded_noddy_main_run.c), but with one of those versions that was cleaned up.

Before the cleanup in Python I was thinking: well, there is no point in being clean in pybind11. Therefore I wasn't concerned about any internals leaks.

That thinking has changed now. But who has the free energy to cleanup pybind11? And is it even doable without an ABI break? I don't know.

To come back to the problem here: I wouldn't worry about leaks, still. We "just" need to satisfy the situations of #3744 and #4500 simultaneously, with or without leaks.

Then worry about the leaks later.

Concretely: could we simply replicate the ptr_ptr tricked used by (non-local) internals? I.e. leak the local internals, too?

Copy link
Collaborator Author

@Skylion007 Skylion007 Feb 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we will need an ABI break at some point.

Concretely: could we simply replicate the ptr_ptr tricked used by (non-local) internals? I.e. leak the local internals, too?

Probably? wouldn't that be an ABI break in itself though?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably? wouldn't that be an ABI break in itself though?

I don't think so.

@lalaland for a 2nd opinion.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can change local internals without an ABI break?

I think the bigger issue is that it might be an API break, since adding leaking behavior is explicitly changing the behavior of existing code. And in memory tight situations, that can be quite bad.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the bigger issue is that it might be an API break, since adding leaking behavior is explicitly changing the behavior of existing code. And in memory tight situations, that can be quite bad.

Absolutely, it's a rock-and-a-hard place situation.

Ideally someone rolls up their sleeves and wrestles it down properly with whatever it takes.

I cannot do that: embedding has no critical use at Google. The only reason I'm looking is general pybind11 health.

So this here boils down to: what do we want to do given very limited resources?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would vote for leaking now to fix the segfaults as a quick patch, but actually try to think of a way to eliminate the leak as a more long term project.

There has got to be some way of fixing this in the long term.

@Skylion007
Copy link
Collaborator Author

@Kalakaarboyz1 Sadly, this doesn't work on all compilers currently, so we need to fix that.

@rwgk
Copy link
Collaborator

rwgk commented Mar 16, 2023

Slightly random thoughts & at a random time (from the back of my mind):

My ideas for the community of users having a vested interest in multiple interpreter initialize/finalize cycles in a given process:

0a. Just don't do it. Initialize the interpreter only once in a given process, and only finalize it once, when the process terminates.

0b. If you really must have multiple interpreter initialize/finalize cycles, write a full rationale for why it's worth putting up with the trouble. Add to the pybind11 documentation.

  1. Add a test for the atexit situation before doing anything else.
  2. Commit to whatever it takes to make the new test and all existing tests pass.

@Skylion007
Copy link
Collaborator Author

I could be in favor of 0a, atexit seems like a more common use case than multiple interperters. Honestly, it's our slopping handle of get_internals_state, and the fact we can't control when to try to regenerate it. Ideally, we should have seperated the Python Internals State from the C++ one in some way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG]: Python atexit broken with pybind11 version >= 2.9.2
3 participants