Fix Python GIL lock handling (Fixes #6524, Fixes #5631) #6525
As discussed in halide#6523 (comment) and later in halide#6524, pybind11 v2.8.1 added some defensive checks that fail for Halide, namely in `python_tutorial_lesson_04_debugging_2` and `python_tutorial_lesson_05_scheduling_1`.

halide#6524 (comment) notes:

> * Python calls a Halide-JIT-generated function, which runs with the GIL held.
> * Halide runtime spawns worker threads.
> * The worker threads try to call pybind11's py::print function to emit traces.
> * Pybind11 complains, correctly, that the worker thread doesn't hold the GIL.
>
> Trying to acquire the GIL hangs, because the main thread is still holding it. I tried teaching the main thread to release the GIL (as suggested in halide#5631), but I still saw hangs when I tried this.

I have tried both halves separately: just dropping the lock before calling into Halide, or just acquiring it in `halide_python_print`, doesn't work on its own. We need to do both.

I have verified that the two tests fail without this fix, and pass with it.
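The locking pattern described above can be sketched in plain Python. This is only an analogy under stated assumptions, not the actual patch: `fake_gil` is an ordinary `threading.Lock` standing in for the GIL, `traced_print` stands in for `halide_python_print`, and `run_pipeline`, `worker_blocked_while_caller_holds_lock`, and `realize_releasing_lock` are hypothetical names made up for illustration.

```python
import threading

# Stand-in for the GIL: an ordinary lock, NOT the real interpreter lock.
fake_gil = threading.Lock()


def traced_print(log, msg):
    # Analogue of halide_python_print: take the lock before touching shared state.
    with fake_gil:
        log.append(msg)


def run_pipeline(log, num_workers=4):
    # Analogue of a JIT-compiled pipeline whose runtime spawns worker threads.
    workers = [
        threading.Thread(target=traced_print, args=(log, f"worker {i}"))
        for i in range(num_workers)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()


def worker_blocked_while_caller_holds_lock(timeout=0.2):
    # The failure mode: the caller keeps the lock, so a worker cannot get it.
    # A timeout is used here so the sketch reports the problem instead of hanging.
    result = []

    def try_acquire():
        got = fake_gil.acquire(timeout=timeout)
        result.append(got)
        if got:
            fake_gil.release()

    with fake_gil:
        t = threading.Thread(target=try_acquire)
        t.start()
        t.join()
    return not result[0]


def realize_releasing_lock():
    # The fix: drop the lock around the pipeline call, then retake it.
    log = []
    fake_gil.acquire()      # the caller enters holding the lock
    fake_gil.release()      # drop it before calling into the pipeline
    try:
        run_pipeline(log)
    finally:
        fake_gil.acquire()  # retake it before returning to the caller
    fake_gil.release()      # the caller eventually gives it up
    return sorted(log)
```

In this sketch, `worker_blocked_while_caller_holds_lock()` shows why acquiring in the print callback alone is not enough, and `realize_releasing_lock()` combines both halves. In pybind11 itself, the corresponding RAII helpers are `py::gil_scoped_release` and `py::gil_scoped_acquire`; how the patch applies them is not shown here.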
6ed75cd to 0f96f29
I verified this patch fixes the failures I was seeing, and doesn't add any new ones.
Thanks!
Theoretically, we need to drop the lock every time we call the Halide API from Python, I only handled So I very much expect that this adds more brokenness.
For calls from Python into Halide that don't spawn threads, the easiest thing to do is leave the lock alone. Then it's fine if Halide calls back into Python, because it still holds the lock. Other cases can be reviewed on a case-by-case basis.
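A rough way to picture the single-threaded case, again as an analogy rather than Halide or pybind11 code: a reentrant lock stands in for the GIL, and a callback running on the thread that already owns it can take it again without blocking. The names `reentrant_gil`, `python_callback`, `single_threaded_call`, and `call_with_lock_held` are hypothetical, and the real GIL is not a `threading.RLock`.

```python
import threading

# Reentrant stand-in for the GIL (an assumption made for illustration only).
reentrant_gil = threading.RLock()


def python_callback(log):
    # Runs on the thread that already owns the lock, so a non-blocking
    # acquire succeeds immediately instead of deadlocking.
    acquired = reentrant_gil.acquire(blocking=False)
    try:
        log.append(("callback", acquired))
    finally:
        if acquired:
            reentrant_gil.release()


def single_threaded_call(log):
    # Analogue of a call that never spawns threads but calls back into Python.
    python_callback(log)


def call_with_lock_held():
    log = []
    with reentrant_gil:  # the caller leaves the lock alone for the whole call
        single_threaded_call(log)
    return log
```

Here `call_with_lock_held()` records that the callback obtained the lock while the caller still held it, which is the "no threads, no problem" situation described above.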
That's why I posted this yes, clearly if no one cared to fix this problem,
Well, my point was, no threads == no problem. I don't think that pipeline definitions, scheduling, AOT compilation, target definitions or buffer accesses spawn threads. The only entry points I can think of that do spawn threads are
I don't know enough about the GIL to be the ideal reviewer here, but I do think that we should update python_bindings/CMakeLists.txt to specify (at least) v2.8.1 for the version we pull for building. (Maybe 2.9, since that is apparently the most recent release?) If you update this PR the buildbots should test using the version specified.
Err, I do not follow. Why do you want to bump the required pybind version?
Because our version is out of date and we like to keep close to the latest release unless there's a reason not to.
True, but there isn't any mechanism for doing so right now -- we'd have to add one.
I see, but then this is a separate concern from what we have at hand. Roughly, with caveats, depending on the newest version of everything makes it hard while not impossible Case in point, if you bump it to v2.9, then I'm simply going to abandon this patch,
Not true for the CMake build, which always pulls and builds a specific, captive version, ignoring what is present on the system. (If this isn't what's happening, it's a bug in our CMake rules.) Are you using Make?
Hang on, we are talking about different things here, aren't we :)
Ah, ok, in that case I defer entirely to @alexreinking :-)
Bumped the bundled version, CI seems to be happy with it :)
python_bindings/CMakeLists.txt (Outdated)
@@ -4,13 +4,14 @@

find_package(Python3 REQUIRED COMPONENTS Interpreter Development)

set(PYBIND11_VER 2.6.2)
find_package(pybind11 ${PYBIND11_VER} QUIET)
set(PYBIND11_MIN_SUPPORTED_VER 2.6.2)
Nit: I wonder why we don't always FetchContent for this, so we always get a known version; IIUC, PyBind11 is 100% a header-only library, with zero runtime dependencies, and it isn't that large.
Downloading stuff during cmake/make time is a deal-breaker for packaging, for the record.
@steven-johnson - see #5092
…9.0)" This reverts commit 128b946.
I've opened #6531 to record that we need to test newer pybind11, too. With that backed out, this LGTM.
Failure is a performance test on llvm14 that does not use Python... cannot possibly be related.
@alexreinking thank you!
* Fix Python GIL lock handling (Fixes #6524, Fixes #5631)

  As discussed in #6523 (comment) and later in #6524, pybind11 v2.8.1 added some defensive checks that fail for Halide, namely in `python_tutorial_lesson_04_debugging_2` and `python_tutorial_lesson_05_scheduling_1`.

  #6524 (comment) notes:

  > * Python calls a Halide-JIT-generated function, which runs with the GIL held.
  > * Halide runtime spawns worker threads.
  > * The worker threads try to call pybind11's py::print function to emit traces.
  > * Pybind11 complains, correctly, that the worker thread doesn't hold the GIL.
  >
  > Trying to acquire the GIL hangs, because the main thread is still holding it. I tried teaching the main thread to release the GIL (as suggested in #5631), but I still saw hangs when I tried this.

  I have tried both halves separately: just dropping the lock before calling into Halide, or just acquiring it in `halide_python_print`, doesn't work on its own. We need to do both.

  I have verified that the two tests fail without this fix, and pass with it.

  (cherry picked from commit b8eb22d)