Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Double-nested gil_scoped_release in a pythread causes fatal interpreter error #1276

Open
KyleFromKitware opened this issue Feb 13, 2018 · 16 comments · May be fixed by #1322
Open

Double-nested gil_scoped_release in a pythread causes fatal interpreter error #1276

KyleFromKitware opened this issue Feb 13, 2018 · 16 comments · May be fixed by #1322

Comments

@KyleFromKitware
Copy link

Assume the following scenario:

  1. Create a new pythread
  2. Call a C++ function from within that pythread
  3. That C++ function calls some Python
  4. That Python calls some MORE C++

Here is the minimum working example:

#include <pybind11/pybind11.h>
#include <pybind11/embed.h>

#include <iostream>
#include <thread>

static void thread()
{
  pybind11::gil_scoped_acquire acquire;
  {
    // Pretend this block is nested inside Python
    pybind11::gil_scoped_release release;
    std::cout << "Calling C++ code" << std::endl;
  }
}

static void init_threads()
{
  // NOTE: This is a workaround for #1273
  if (!PyEval_ThreadsInitialized())
  {
    {
      pybind11::gil_scoped_acquire acquire;
    }
    PyEval_SaveThread();
  }
}

static const char thread_code[] =
"import threading\n"
"import testmod\n"
"t = threading.Thread(target=testmod.func)\n"
"t.start()\n"
"t.join()\n";

PYBIND11_EMBEDDED_MODULE(testmod, m)
{
  m.def("func", thread, pybind11::call_guard<pybind11::gil_scoped_release>());
}

int main()
{
  pybind11::scoped_interpreter interp;
  init_threads();

  {
    pybind11::gil_scoped_acquire acquire;
    pybind11::exec(thread_code);
  }

  return 0;
}

When the inner nested gil_scoped_release is destructed, it causes the Python interpreter to throw a fatal error:

Calling C++ code
Fatal Python error: Invalid thread state for this thread
Aborted (core dumped)

because there are two different thread states for the same thread, one created by the pythread and one created by pybind.

@KyleFromKitware
Copy link
Author

Note: the fatal error only happens if you are running a DEBUG build of the Python interpreter (VERY important).

KyleFromKitware added a commit to KyleFromKitware/pybind11 that referenced this issue Feb 14, 2018
Having pybind11 keep its own internal thread state can lead to an
inconsistent situation where the Python interpreter has a thread
state but pybind does not, and then when gil_scoped_acquire is
called, pybind creates a new thread state instead of using the one
created by the Python interpreter. This change gets rid of pybind's
internal thread state and always uses the one created by the Python
interpreter.
KyleFromKitware added a commit to KyleFromKitware/pybind11 that referenced this issue Feb 14, 2018
Having pybind11 keep its own internal thread state can lead to an
inconsistent situation where the Python interpreter has a thread
state but pybind does not, and then when gil_scoped_acquire is
called, pybind creates a new thread state instead of using the one
created by the Python interpreter. This change gets rid of pybind's
internal thread state and always uses the one created by the Python
interpreter.
@KyleFromKitware
Copy link
Author

Here's an even more concrete and tangible example that doesn't require a debug build of Python:

#include <pybind11/pybind11.h>
#include <pybind11/embed.h>

#include <iostream>
#include <thread>

static void thread()
{
  pybind11::gil_scoped_acquire acquire;
  {
    // Call some Python code
    // ...
    // Python code calls a 3rd party library which does this
    PyGILState_STATE state = PyGILState_Ensure();
    std::cout << "Calling 3rd party library" << std::endl;
    PyGILState_Release(state);
  }
}

static const char thread_code[] =
"import threading\n"
"import testmod\n"
"t = threading.Thread(target=testmod.func)\n"
"t.start()\n"
"t.join()\n";

PYBIND11_EMBEDDED_MODULE(testmod, m)
{
  m.def("func", thread, pybind11::call_guard<pybind11::gil_scoped_release>());
}

int main()
{
  pybind11::scoped_interpreter interp;
  pybind11::gil_scoped_release release;

  {
    pybind11::gil_scoped_acquire acquire;
    pybind11::exec(thread_code);
  }

  return 0;
}

This one deadlocks as soon as it reaches the PyGILState_Ensure() call in the hypothetical 3rd party library.

Normally, the PyGILState API is smart enough to handle recursive GIL acquisition. However, because gil_scoped_acquire and gil_scoped_release have their own bastardized version of this API with its own internal data structures, the two are getting out of sync, and the Python interpreter doesn't know about the pybind11 structures, so it fails to recognize the recursive GIL acquisition.

The solution to this problem is to get rid of pybind's internal thread states and use the PyGILState APIs like it did in older versions. However, this breaks the thread dissociation feature of gil_scoped_release, which some projects still use. I propose adding a configuration option that uses the pybind method by default, but falls back to the PyGILState API when you set a flag in order to make this use case work. I am currently working on a PR with this change.

@wjakob
Copy link
Member

wjakob commented Feb 17, 2018

Calling pybind11's implementation "bastardized" is quite inappropriate. It enables several useful applications that are impossible to realize with Python's restricted GIL state API. Naturally, the two are not supposed to be combined in this way.

@wjakob
Copy link
Member

wjakob commented Feb 17, 2018

There is really nothing forcing you to use pybind121::gil_scoped_release by the way. You can declare your own RAII wrapper that is appropriate for your project.

@KyleFromKitware
Copy link
Author

KyleFromKitware commented Feb 19, 2018

My apologies, I wasn't trying to be inappropriate. In English, we sometimes use "bastardized" as another word for "hacked" or "modified" - perhaps one of those would have been more appropriate. In any case, I do understand what this is trying to accomplish, but the behavior was not what I was expecting.

My biggest concern is that the PYBIND11_OVERLOAD() macros use gil_scoped_acquire, which uses this version of the API, though I suppose I could rewrite these macros too.

@Erotemic
Copy link

@wjakob continuing our conversation, let summarize the main technical points and respond to some issues here, and perhaps @jagerman can chime in with his thoughts.

There are currently two options for interfacing with the GIL

  1. CPython's PyGILState API
  2. pybind11's gil_scoped_acquire / gil_scoped_release API

These methods are incompatible and cannot be nested because they each maintain there own state and do not synchronize with each other. Unfortunately, as noted in pybind11.h, the standard PyGILState API is limited and does not support some reasonable (but advanced) operations. Fortunately, the nesting that occurs will happen fairly infrequently because most python projects are small scripts and not large long-running systems.

In my opinion, because this incompatibility exists and most programs will not use the advanced features of the current interface, I believe that the pybind11 API should be changed to be compatible by default and expose a simple way to use the current interface that allows for advanced thread control. This way it would be the responsibility of the developer explicitly using the advanced features to ensure that nested calls were safe (or at least provide some warning or a note in the docs).

However, as @wjakob points out

Simply switching to PyGILState is likely a show-stopper: it will break compatibility for software that uses any of the advanced features and thus will require a new major version (i.e. pybind11 3.x).

This is a valid point, breaking compatibility does require a major version change. I hope to convince you that its worthwhile to do so. My main points of my argument are:

  • Most projects will not use the advanced threading features
  • Most projects will not use nested calls
  • However, I think it is more likely (especially as Python grows and becomes used in larger and larger systems) for nested calls to occur. Kitware's KWIVER project is an example of one such system. Perhaps, I'm biased, but I don't view KWIVER as a project with special requirements. I view it as an example of a larger system that is emerging.
  • Pybind11 has become widely adopted by projects such as caffe2 and pytorch and is quickly becoming a standard in the community.
  • I believe standards should be compatible by default with the current overarching CPython standard.

Some of personal projects actually critically depend on the improved GIL state API, so my excitement for such a change is very limited.

I very much understand this. Ideally we can come up with a good solution that makes upgrading to a new API simple. Of course this means that the new version must have an API to behave like the current version. (Alternatively we can petition for Python 4 to integrate the improved GIL API or we can wait for @larryhastings to finish his gilectomy).

Hopefully, I've laid out a reasonable argument as to why changing the current behavior of pybind11 is desirable. As for discussing the specifics of those changes (maybe even with ones that don't require a major version bump) I'll make a post in #1276

@KyleFromKitware
Copy link
Author

@Erotemic's comments on this matter are spot on. Here are some of my additional thoughts.

We could simply write our own basic_gil_scoped_acquire/release RAII classes that use the "basic" PyGILState_* API, but the biggest problem we're having is that there are places in pybind that use the advanced API, and these can't be changed without changing pybind. Most notable of these is the PYBIND11_OVERLOAD() macros and error_already_set. I was able to write "basic" versions of the PYBIND11_OVERLOAD() macros, but because error_already_set is used in so many places and is thrown any time Python code throws an exception, trying to make a new version of error_already_set would involve redefining so many classes and functions that it would be equivalent to rewriting pybind. At that point, it would be a lot simpler to just fork the project entirely. In order to solve this issue, we will need to make some kind of change to pybind - simply redefining the problematic classes and macros won't be enough.

In short, there are two issues here:

  1. Come up with a way to select the desired GIL behavior for projects (such as ours) that don't want to use the advanced API and are negatively impacted by its use. (This is the higher priority concern.)
  2. What should the default be? Basic or advanced? Personally, I agree with @Erotemic that the pybind default should be compatible with the CPython standard, but changing this default would require a pybind major version update, so it is understandable why there would be resistance to such a change. When I wrote Add option to use the PyGILState API in gil_scoped_acquire (#1276) #1286, I designed it with the specific intention of leaving the default behavior alone (use the pybind11 GIL API) and only use the PyGILState_* API if the developer explicitly selects it, in order to avoid breaking compatibility.

@Erotemic forwarded me this from @wjakob:

Right now the solution looks unappealing to me due to the introduction of extra ‘options’ (which seems very heavy-handed). An optional define like PYBIND11_USE_PYTHON_GILSTATE that switches over to an alternative gil_scoped_acquire_basic / gil_scoped_release_basic RAII wapper may be more suitable.

Can you explain your thoughts here? What I'm thinking is that, at least to the end user, there isn't a great deal of difference between my proposal and adding a #define. Both would require code changes in the project using pybind to select what they want to do. There are only two major differences: my option adds a rather large amount of code whereas the #define would only be a few lines long (an understandable concern in any project), and the #define happens at compile time whereas my proposal with an option would happen at runtime. (@Erotemic pointed out to me that the runtime option would have a slight performance penalty since there would be a few extra branching instructions, but given that the GIL - and mutexes in general - already use expensive kernel system calls for lock contention, this doesn't seem like a major bottleneck to me.)

One concern that @jagerman had is that having a global runtime option could change the value for other libraries (if library A uses the proposed "basic" PyGILState_* API and library B uses the advanced API, one of them could clobber the other.) What I found during my testing is that this doesn't happen (in fact, since we have multiple shared objects in our project, I had to insert my option in more than one place.) This would make sense to me, because pybind11 is a header-only library, so each executable/shared library object would get its own copy of the static option variable, though relying on this behavior is not necessarily a good idea.

@Erotemic also came up with the idea of having both a runtime and compile-time option, where the compile-time option sets the default for the runtime option, but the runtime option can still be changed.

Another option: instead of simply having an either/or between the basic and advanced APIs, what if we created a template function to allow the user to select any class they want? Example:

pybind11::options options;
options.set_gil_scoped_acquire<basic_gil_scoped_acquire>();

Internally, this would use a virtual class with the selected class as a templated member.

We could also do something similar as a compile time option. Example:

#define PYBIND11_GIL_SCOPED_ACQUIRE basic_gil_scoped_acquire

And then inside pybind:

#ifndef PYBIND11_GIL_SCOPED_ACQUIRE
#define PYBIND11_GIL_SCOPED_ACQUIRE ::pybind11::gil_scoped_acquire
#endif

One thing is for sure: as pybind becomes more widespread, and larger and more complicated projects start using it, they too are going to encounter this issue. It is certainly having a major impact on KWIVER, so I will take it upon myself to implement any proposal we can come up with. I hope that we can find a solution that meets everyone's needs.

@jagerman
Copy link
Member

Some food for thought:

having a global runtime option could change the value for other libraries (if library A uses the proposed "basic" PyGILState_* API and library B uses the advanced API, one of them could clobber the other.) What I found during my testing is that this doesn't happen ...

Yeah, I was thinking ahead a bit with that concern: It doesn't happen because, right now, the options class state stack is via a static variable, as you found. I don't think, however, as a runtime option that this is the right approach—the options class was really meant to control how pybind handles bind definitions; its main purpose is to allow you to do something like { options local_opts; local_opts.enable_function_signatures(); /* ... module definitions ... */; } to control how those definitions are applied. (It should probably have been named definition_options in retrospect). With an options-based approach you'd run into trouble if, for example, one module has a options local; to set some unrelated options, but loads your module in its definition code: the localized scope gil choice option would be reset at the end of the first module's definition scope. I don't see that sort of scoped interface helping at all here (instead it just gets in the way).

A much more natural place for this sort of global runtime control is in internals. That data is shared across modules (at least for modules with the same layout, which typically means the same major/minor version). It might even be worthwhile storing similarly to how we store internals but outside internals itself so that it can apply even across modules compiled with different pybind versions.


But back to the core issue. The biggest issue, as I see it, is that we are forcing the use of pybind11::gil_scoped_release: most notably in the OVERLOAD macros, though I also see another use in functional.h. We should fix that.

As to what that fix looks like: I don't think that either the PYBIND11_GIL_SCOPED_ACQUIRE or set_gil_scoped_acquire approachs are particularly nice or that they entirely solve the issue: you can still get a deadlock and/or interpreter error fairly easily if two modules do different things w.r.t the GIL type to use. Your module may set the define, but if another module doesn't you're right back to the same issue.

I don't personally have much experience with the GIL intricacies, so maybe there's a reason this wouldn't work, but we ought to be able to detect whether a Python "simple" GIL or a pybind "advanced" GIL is already acquired, and if so, we could have a default pybind11::new_gil_scoped_acquire (a new class—and just a placeholder name) which just extends whichever approach is currently in effect. That would, I think, solve the issue for OVERLOAD and functional. As @Erotemic points out, most modules don't really care about the implementation, they just want a GIL, and that's what this new class would do.

We could then rename gil_scoped_acquire to pybind11::advanced_gil_acquire (or something along those lines) with a typedef for backwards compatibility. A new pybind11::basic_gil_acquire, as described earlier in the issue, would provide the CPython-compatibile implementation. We might still want something like pybind11::require_gil<basic_scoped_acquire>() or pybind11::require_gil<gil_scoped_acquire>() set the default—and also to double as a runtime-check against two modules requiring incompatible gil implementations.

@KyleFromKitware
Copy link
Author

but we ought to be able to detect whether a Python "simple" GIL or a pybind "advanced" GIL is already acquired, and if so, we could have a default pybind11::new_gil_scoped_acquire (a new class—and just a placeholder name) which just extends whichever approach is currently in effect.

This certainly seems doable. It would just be a matter of calling PyGILState_GetThisThreadState() for the basic API and PyThread_get_key_value(internals.tstate) for the advanced API - though if we are going to generalize it with templates, we will also need to add some sort of static method to advanced_gil_scoped_acquire and basic_gil_scoped_acquire - something like basic_gil_scoped_acquire::thread_has_gil(), and then at that point we would need some sort of function like pybind11::register_gil<basic_gil_scoped_acquire>(). I'm imagining an internal list of GIL implementations which has basic_gil_scoped_acquire and advanced_gil_scoped_acquire by default, and then a single implementation from that list is selected as the default. In fact, since the "release" and "acquire" classes need to be grouped together, they should be typedefs within a larger overarching "implementation" class, something like basic_gil_impl and advanced_gil_impl, so I'm going to refer to this impl class from here on out. The acquire class would be basic_gil_impl::acquire and the release class would be basic_gil_impl::release. These can of course be typedef-ed into basic_gil_scoped_acquire for simplicity.

If we're going to have something like new_gil_scoped_acquire (I'm imagining it as auto_gil_impl::acquire, and I think it's a good idea), then we also need a way to tell it what to do if no GIL state has been created for the thread - something like auto_gil_impl<basic_gil_impl>::acquire would select the basic API if nothing existing can be found.

Actually... as I'm thinking about this some more, maybe auto_gil_impl doesn't need to be parameterized, it can just use the "default" implementation that was selected. And I do like the idea of require_gil<advanced_gil_impl>() doubling as a runtime check and throwing an error for conflicting selections.

It might even be worthwhile storing similarly to how we store internals but outside internals itself so that it can apply even across modules compiled with different pybind versions.

I like this idea too. Unfortunately, it won't help for projects that have been compiled with a version of pybind prior to the implementation of this proposal, but it should at least help going forward. If we make some sort of new builtin object, like __pybind11_gil_internals__, the first field would need to be something like unsigned int gil_internals_version, and we would need to take care to maintain backwards compatibility here. I would also prefer to store this in a way that doesn't require us to acquire the GIL to get at it (PyEval_GetBuiltins() requires the GIL to be held!) I'm not sure what to do here - if anyone has any ideas they would be appreciated.

@KyleFromKitware
Copy link
Author

I have opened #1322. It's not finished, but it should be enough to get a discussion going.

@wjakob
Copy link
Member

wjakob commented Mar 14, 2018

Hi Kyle,

I took a look at PR #1322. For me, it is too complicated. At its core, the GIL is something straightforward (a reference-counted mutex), and I'd be hesitant to introduce such complex machinery to deal with it in pybind11. (I am interested in resolving this issue though, so let's try to figure out what can be done.)

It's been a long time since I wrote the GIL handling in pybind11, and I just took a moment to go through it again. This made me realize that CPython's GILState and pybind11's "advanced GIL" are really supposed to be compatible. In fact, the reason why the code is a bit complicated is because pybind11 has to jump through some hoops to get access to the CPython internal variables, which it then tries to update in exactly the same way.

You can take a look e.g. at https://github.com/python/cpython/blob/master/Python/pystate.c#L1040. The "tss key" corresponds exactly to internals.tstate in pybind11's implementation. The gilstate_counter in both codebases also refers to the same place in memory.

So rather than cooking up a completely new subsystem, I think the way to go is to ensure that whatever pybind11 does is compatible with CPython. It sounds to me like you have potentially run into a corner case where the handling slightly differs.

Now that said, I don't really "get" the code you posted before. Why can't you write the following? This works perfectly on my machine:

#include <pybind11/pybind11.h>
#include <pybind11/embed.h>

#include <iostream>
#include <thread>

static void thread()
{
  pybind11::gil_scoped_acquire acquire;
  {
    // Pretend this block is nested inside Python
    pybind11::gil_scoped_release release;
    std::cout << "Calling C++ code" << std::endl;
  }
}


static const char thread_code[] =
"import threading\n"
"import testmod\n"
"t = threading.Thread(target=testmod.func)\n"
"t.start()\n"
"t.join()\n";

PYBIND11_EMBEDDED_MODULE(testmod, m)
{
  m.def("func", thread, pybind11::call_guard<pybind11::gil_scoped_release>());
}

int main()
{
  pybind11::scoped_interpreter interp;
  pybind11::exec(thread_code);

  return 0;
}

Best,
Wenzel

@KyleFromKitware
Copy link
Author

The crux of this entire issue is that "tss key". In Python 2.7, it's called autoTLSkey (https://github.com/python/cpython/blob/baca85fcc7cdf70af4a76ea0966d4842c173de1a/Python/pystate.c#L599), and it's declared static inside pystate.c (https://github.com/python/cpython/blob/baca85fcc7cdf70af4a76ea0966d4842c173de1a/Python/pystate.c#L40), so external software can't get at it and know its value. (In practice, it is almost certainly going to be 0, but we can't rely on this behavior.) The internals.tstate is a separate key. The two APIs don't know about the other one's key, and so they end up creating two separate thread states for the same thread. Python has internal assert()s which check for that specific condition (they only trigger if you have debug build of Python.)

The first example I gave was when I was in the earlier stages of tracking down this issue. It gets triggered because Python internally uses the PyGILState_* API when it creates a pythread. The second example with PyGILState_* more accurately illustrates our use case. The software that uses PyGILState_* is 3rd party, so we don't have a good way of changing it. Our software is large and complex, and has long-running operations in C++ land, so it always releases the GIL when it leaves Python land and reacquires it when it reenters, hence the release followed by acquire that I had in the main thread in my original example. That release and acquire, coupled with the debug build in Python, was crucial to trigger the assert() error.

But back to my original point: none of this would be an issue if we had access to autoTLSkey. Before #1286, I did experiment with trying to fix the advanced API. I made it first check PyGILState_GetThisThreadState(). The problem is that this breaks the thread disassociation feature of the advanced API: since we don't have access to autoTLSkey, we can't cut the relationship between the thread and the thread state the way the advanced API does. At that point, I decided not to try to fix the advanced API and to instead just give the user the option to switch between the advanced API and PyGILState_*, because our team doesn't need any of the features of the advanced API, and we would prefer to just use PyGILState_*. So that's where we're at now.

If we can find a way to get the advanced API to use whatever thread state was created with autoTLSkey and also be able to cut that relationship to preserve the thread disassociation feature, we can put this whole thing to bed. I currently don't have any ideas on how to do that other than hard-coding a value of 0 for the predicted value of autoTLSkey.

@KyleFromKitware
Copy link
Author

Any more ideas on what we can do here?

fjourdes added a commit to InSimo/pybind11 that referenced this issue Oct 28, 2021
Chekov2k pushed a commit to Chekov2k/pybind11 that referenced this issue Oct 5, 2022
rwgk pushed a commit to Chekov2k/pybind11 that referenced this issue Oct 26, 2022
rwgk pushed a commit to Chekov2k/pybind11 that referenced this issue Oct 26, 2022
rwgk pushed a commit to Chekov2k/pybind11 that referenced this issue Oct 30, 2022
rwgk pushed a commit that referenced this issue Oct 30, 2022
* Add option to force the use of the PYPY GIL scoped acquire/release logic to support nested gil access, see #1276 and pytorch/pytorch#83101

* Apply suggestions from code review

* Update CMakeLists.txt

* docs: update upgrade guide

* Update docs/upgrade.rst

* All bells & whistles.

* Add Reminder to common.h, so that we will not forget to purge `!WITH_THREAD` branches when dropping Python 3.6

* New sentence instead of semicolon.

* Temporarily pull in snapshot of PR #4246

* Add `test_release_acquire`

* Add more unit tests for nested gil locking

* Add test_report_builtins_internals_keys

* Very minor enhancement: sort list only after filtering.

* Revert change in docs/upgrade.rst

* Add test_multi_acquire_release_cross_module, while also forcing unique PYBIND11_INTERNALS_VERSION for cross_module_gil_utils.cpp

* Hopefully fix apparently new ICC error.

```
2022-10-28T07:57:54.5187728Z -- The CXX compiler identification is Intel 2021.7.0.20220726
...
2022-10-28T07:58:53.6758994Z icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
2022-10-28T07:58:54.5801597Z In file included from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/type_caster_base.h(15),
2022-10-28T07:58:54.5803794Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../cast.h(15),
2022-10-28T07:58:54.5805740Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../attr.h(14),
2022-10-28T07:58:54.5809556Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/class.h(12),
2022-10-28T07:58:54.5812154Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/pybind11.h(13),
2022-10-28T07:58:54.5948523Z                  from /home/runner/work/pybind11/pybind11/tests/cross_module_gil_utils.cpp(13):
2022-10-28T07:58:54.5949009Z /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/internals.h(177): error #2282: unrecognized GCC pragma
2022-10-28T07:58:54.5949374Z       PYBIND11_TLS_KEY_INIT(tstate)
2022-10-28T07:58:54.5949579Z       ^
2022-10-28T07:58:54.5949695Z
```

* clang-tidy fixes

* Workaround for PYPY WIN exitcode None

* Revert "Temporarily pull in snapshot of PR #4246"

This reverts commit 23ac16e.

* Another workaround for PYPY WIN exitcode None

* Clean up how the tests are run "run in process" Part 1: uniformity

* Clean up how the tests are run "run in process" Part 2: use `@pytest.mark.parametrize` and clean up the naming.

* Skip some tests `#if defined(THREAD_SANITIZER)` (tested with TSAN using the Google-internal toolchain).

* Run all tests again but ignore ThreadSanitizer exitcode 66 (this is less likely to mask unrelated ThreadSanitizer issues in the future).

* bug fix: missing common.h include before using `PYBIND11_SIMPLE_GIL_MANAGEMENT`

For the tests in the github CI this does not matter, because
`PYBIND11_SIMPLE_GIL_MANAGEMENT` is always defined from the command line,
but when monkey-patching common.h locally, it matters.

* if process.exitcode is None: assert t_delta > 9.9

* More sophisiticated `_run_in_process()` implementation, clearly reporting `DEADLOCK`, additionally exercised via added `intentional_deadlock()`

* Wrap m.intentional_deadlock in a Python function, for `ForkingPickler` compatibility.

```
>       ForkingPickler(file, protocol).dump(obj)
E       TypeError: cannot pickle 'PyCapsule' object
```

Observed with all Windows builds including mingw but not PyPy, and macos-latest with Python 3.9, 3.10, 3.11 but not 3.6.

* Add link to potential solution for WOULD-BE-NICE-TO-HAVE feature.

* Add `SKIP_IF_DEADLOCK = True` option, to not pollute the CI results with expected `DEADLOCK` failures while we figure out what to do about them.

* Add COPY-PASTE-THIS: gdb ... command (to be used for debugging the detected deadlock)

* style: pre-commit fixes

* Do better than automatic pre-commit fixes.

* Add `PYBIND11_SIMPLE_GIL_MANAGEMENT` to `pytest_report_header()` (so that we can easily know when harvesting deadlock information from the CI logs).

Co-authored-by: Arnim Balzer <[email protected]>
Co-authored-by: Henry Schreiner <[email protected]>
Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
henryiii added a commit that referenced this issue Oct 31, 2022
* Add option to force the use of the PYPY GIL scoped acquire/release logic to support nested gil access, see #1276 and pytorch/pytorch#83101

* Apply suggestions from code review

* Update CMakeLists.txt

* docs: update upgrade guide

* Update docs/upgrade.rst

* All bells & whistles.

* Add Reminder to common.h, so that we will not forget to purge `!WITH_THREAD` branches when dropping Python 3.6

* New sentence instead of semicolon.

* Temporarily pull in snapshot of PR #4246

* Add `test_release_acquire`

* Add more unit tests for nested gil locking

* Add test_report_builtins_internals_keys

* Very minor enhancement: sort list only after filtering.

* Revert change in docs/upgrade.rst

* Add test_multi_acquire_release_cross_module, while also forcing unique PYBIND11_INTERNALS_VERSION for cross_module_gil_utils.cpp

* Hopefully fix apparently new ICC error.

```
2022-10-28T07:57:54.5187728Z -- The CXX compiler identification is Intel 2021.7.0.20220726
...
2022-10-28T07:58:53.6758994Z icpc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
2022-10-28T07:58:54.5801597Z In file included from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/type_caster_base.h(15),
2022-10-28T07:58:54.5803794Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../cast.h(15),
2022-10-28T07:58:54.5805740Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/../attr.h(14),
2022-10-28T07:58:54.5809556Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/detail/class.h(12),
2022-10-28T07:58:54.5812154Z                  from /home/runner/work/pybind11/pybind11/include/pybind11/pybind11.h(13),
2022-10-28T07:58:54.5948523Z                  from /home/runner/work/pybind11/pybind11/tests/cross_module_gil_utils.cpp(13):
2022-10-28T07:58:54.5949009Z /home/runner/work/pybind11/pybind11/include/pybind11/detail/../detail/internals.h(177): error #2282: unrecognized GCC pragma
2022-10-28T07:58:54.5949374Z       PYBIND11_TLS_KEY_INIT(tstate)
2022-10-28T07:58:54.5949579Z       ^
2022-10-28T07:58:54.5949695Z
```

* clang-tidy fixes

* Workaround for PYPY WIN exitcode None

* Revert "Temporarily pull in snapshot of PR #4246"

This reverts commit 23ac16e.

* Another workaround for PYPY WIN exitcode None

* Clean up how the tests are run "run in process" Part 1: uniformity

* Clean up how the tests are run "run in process" Part 2: use `@pytest.mark.parametrize` and clean up the naming.

* Skip some tests `#if defined(THREAD_SANITIZER)` (tested with TSAN using the Google-internal toolchain).

* Run all tests again but ignore ThreadSanitizer exitcode 66 (this is less likely to mask unrelated ThreadSanitizer issues in the future).

* bug fix: missing common.h include before using `PYBIND11_SIMPLE_GIL_MANAGEMENT`

For the tests in the github CI this does not matter, because
`PYBIND11_SIMPLE_GIL_MANAGEMENT` is always defined from the command line,
but when monkey-patching common.h locally, it matters.

* if process.exitcode is None: assert t_delta > 9.9

* More sophisiticated `_run_in_process()` implementation, clearly reporting `DEADLOCK`, additionally exercised via added `intentional_deadlock()`

* Wrap m.intentional_deadlock in a Python function, for `ForkingPickler` compatibility.

```
>       ForkingPickler(file, protocol).dump(obj)
E       TypeError: cannot pickle 'PyCapsule' object
```

Observed with all Windows builds including mingw but not PyPy, and macos-latest with Python 3.9, 3.10, 3.11 but not 3.6.

* Add link to potential solution for WOULD-BE-NICE-TO-HAVE feature.

* Add `SKIP_IF_DEADLOCK = True` option, to not pollute the CI results with expected `DEADLOCK` failures while we figure out what to do about them.

* Add COPY-PASTE-THIS: gdb ... command (to be used for debugging the detected deadlock)

* style: pre-commit fixes

* Do better than automatic pre-commit fixes.

* Add `PYBIND11_SIMPLE_GIL_MANAGEMENT` to `pytest_report_header()` (so that we can easily know when harvesting deadlock information from the CI logs).

Co-authored-by: Arnim Balzer <[email protected]>
Co-authored-by: Henry Schreiner <[email protected]>
Co-authored-by: Ralf W. Grosse-Kunstleve <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
@scdub
Copy link

scdub commented Jun 23, 2023

I think the work on PYBIND11_SIMPLE_GIL_MANAGEMENT and its eventual movement to become the default gil_scoped classes supercedes this work. For our use cases, setting that value successfully eliminates the crashes caused by the conflicts with the Python C API GIL calls.

@rwgk
Copy link
Collaborator

rwgk commented Jun 23, 2023

I think the work on PYBIND11_SIMPLE_GIL_MANAGEMENT and its eventual movement to become the default gil_scoped classes supercedes this work.

I couldn't find evidence that PYBIND11_SIMPLE_GIL_MANAGEMENT actually resolves any problem, even though I tried very hard (the massively expanded testing under #4216). I've paid attention to DEADLOCK flakes for a few months since, I've seen them with both PYBIND11_SIMPLE_GIL_MANAGEMENT defined or undefined, macOS only: #4373

For our use cases, setting that value successfully eliminates the crashes caused by the conflicts with the Python C API GIL calls.

That's interesting. Do you have a reliable reproducer?

Every time I took a closer look before, such observations turned out to be chance.

@scdub
Copy link

scdub commented Jun 25, 2023

Unfortunately I don't have an isolated test case to provide for this. We've observed crashes in a large application with an embedded Python interpreter, which already manages GIL state using the same Python C API primitives as PYBIND11_SIMPLE_GIL_MANAGEMENT. When we include third party dependencies which include pybind11 using the more complex gil_scoped classes, we've seen crashes. The change hasn't been installed long enough to be thoroughly tested, but I will update this with any further findings once I have more data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants