Update to pybind11 v2.3.0 which fixes 3.7-related GIL mismanagement #6
The root cause of this defect, which exists on Python 3.7 in the original project and on all subsequent versions in our fork, was a pybind11 defect related to the move from TLS (thread-local storage) to TSS (thread-specific storage) in Python 3.7. At the system-call level, the misbehaving software looked very much like a case of GIL mismanagement (stuck in a tight polling loop, presumably in deadlock), so it was easy to make the connection with that defect and try the proposed solution: upgrade to pybind11 version 2.3 or higher. Another hint of where to look came from other projects reporting similar issues on GitHub and pointing to the same root cause (e.g. pytorch #11419).