Fixed GIL release issue with Python System and Python TestFixture. #2618

Open
wants to merge 7 commits into
base: gz-sim9

AmalDevHaridevan

…gned-off-by: Amal Dev Haridevan [email protected]

🦟 Bug fix

Fixes #
Adds explicit scoped acquire and release of the GIL so that Python systems can run when the Server itself is started from Python (for example, through TestFixture).

Summary

A Python system plugin attached to my model cannot be run when the server is started from Python through TestFixture. This is because the GIL is not explicitly released in the PythonSystemLoader.
To work around this, the GIL is explicitly acquired with scoped_acquire and then released with scoped_release after each of the PythonSystem methods (PreUpdate, Update, and PostUpdate), so that TestFixture can take the GIL afterwards.
As a safeguard for the future, I also added the scoped acquire and release of the GIL to the pybind code for TestFixture.
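
The scoped acquire and release described above can be pictured with a small model. This is an analogy only, not pybind11 or gz-sim code: `fake_gil` is a plain `threading.Lock` standing in for the GIL, `scoped_acquire` is a hypothetical context manager playing the role of a scoped guard, and `pre_update` is a hypothetical system callback. The sketch only shows that a scoped guard hands the lock back when its scope ends, so the next caller can take it.

```python
import threading
from contextlib import contextmanager

# Stand-in for the GIL (assumption: a plain lock, not the real interpreter lock).
fake_gil = threading.Lock()


@contextmanager
def scoped_acquire(lock):
    """Hold `lock` for the duration of the with-block, like a scoped guard."""
    lock.acquire()
    try:
        yield
    finally:
        lock.release()


def pre_update(log):
    # Hypothetical system callback: records whether the lock is held while it runs.
    log.append(fake_gil.locked())


def run_system_step(log):
    # Loader side: take the lock only around the Python callback.
    with scoped_acquire(fake_gil):
        pre_update(log)


def demo():
    log = []
    worker = threading.Thread(target=run_system_step, args=(log,))
    worker.start()
    worker.join()
    # Once the guard's scope has ended, the lock is free for the next caller.
    return log, fake_gil.locked()
```

Calling `demo()` runs the callback on a worker thread and reports whether the lock was held during the callback and whether it is still held afterwards.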

Checklist

  • Signed all commits for DCO
  • Added tests
  • Updated documentation (as needed)
  • Updated migration guide (as needed)
  • Consider updating Python bindings (if the library has them)
  • codecheck passed (See contributing)
  • All tests passed (See test coverage)
  • While waiting for a review on your PR, please help review another open pull request to support the maintainers

Note to maintainers: Remember to use Squash-Merge and edit the commit message to match the pull request summary while retaining Signed-off-by messages.


Signed-off-by: sdcnlab <[email protected]>
Signed-off-by: Amal Dev Haridevan <[email protected]>
AmalDevHaridevan and others added 4 commits September 12, 2024 10:28
Co-authored-by: Alejandro Hernández Cordero <[email protected]>
Signed-off-by: AmaldevHaridevan <[email protected]>
Signed-off-by: AmaldevHaridevan <[email protected]>
Signed-off-by: AmaldevHaridevan <[email protected]>
self->OnPreUpdate(_cb);
pybind11::gil_scoped_release gilr;
Contributor

Is this necessary? I thought the scoped lock above would automatically release when going out of scope

// This acquire and release is only required from the PythonSystem code
// However, adding this here may prevent undefined or unintended behaviors
// in future
pybind11::gil_scoped_acquire gil;
Contributor

We have not run into this issue with our python systems in the examples. Are you doing something specific that's causing a problem? It would be great if we can pin down the actual problem

Author

Hi azeey,

Yes. I will explain the issue in detail. The following files from my branch are required to reproduce the issue:

  1. src/gz-sim/python/test/testFixture_TEST.py
  2. src/gz-sim/python/test/python_system_TEST.py
  3. src/gz-sim/python/test/gravity.sdf

The goal is to run the Server from Python with an SDF model that contains a Python system plugin. Before my proposed modifications, I ran python testFixture_TEST.py, and the output was as follows:

(2024-09-13 12:52:31.329) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 12:52:31.400) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 12:52:31.400) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 12:52:31.426) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 12:52:31.426) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 12:52:31.427) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 12:52:31.427) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 12:52:31.432) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 12:52:31.433) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 12:52:31.434) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 12:52:31.435) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 12:52:31.436) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 12:52:31.436) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 12:52:31.438) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 12:52:31.440) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 12:52:31.441) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 12:52:31.441) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 12:52:31.442) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 12:52:33.066) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 12:52:33.068) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 12:52:33.070) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREF to disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations. The failing pybind11::handle::inc_ref() call was triggered on a method object.
E(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
(2024-09-13 12:52:33.080) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)

======================================================================
ERROR: test_test_fixture (__main__.TestTestFixture)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/sdcnlab/gz_contrib/src/gz-sim/python/test/testFixture_TEST.py", line 58, in test_test_fixture
    server.run(True, 1000, False)
RuntimeError: pybind11::handle::inc_ref() PyGILState_Check() failure.

----------------------------------------------------------------------
Ran 1 test in 2.028s

FAILED (errors=1)

Specifically, the error is this:

pybind11::handle::inc_ref() is being called while the GIL is either not held or invalid. Please see https://pybind11.readthedocs.io/en/stable/advanced/misc.html#common-sources-of-global-interpreter-lock-errors for debugging advice.
If you are convinced there is no bug in your code, you can #define PYBIND11_NO_ASSERT_GIL_HELD_INCREF_DECREF to disable this check. In that case you have to ensure this #define is consistently used for all translation units linked into a given pybind11 extension, otherwise there will be ODR violations. The failing pybind11::handle::inc_ref() call was triggered on a method object.

I wasn't sure how to debug this. My best guess is that the Python system and the Python TestFixture are trying to execute at the same time.
Since the GIL is required to run the Python interpreter, this could cause the error above. Based on this assumption, I made the modification of explicitly acquiring and releasing the GIL so that the interpreter is not used by both at once.
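
The error message quoted above complains about a reference-count operation made while the GIL was not held. A toy model of that kind of check is sketched below. Everything in it is hypothetical and for illustration only: `FakeGil` is a lock that remembers its owning thread, `inc_ref` is a stand-in that raises when the calling thread does not hold the lock, and `call_from_worker` runs it on a separate thread. It is not the pybind11 implementation and does not confirm the root cause in this PR.

```python
import threading


class FakeGil:
    """Toy stand-in for the GIL that remembers which thread holds it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._owner = None

    def __enter__(self):
        self._lock.acquire()
        self._owner = threading.get_ident()
        return self

    def __exit__(self, *exc):
        self._owner = None
        self._lock.release()
        return False

    def held_by_current_thread(self):
        return self._owner == threading.get_ident()


def inc_ref(gil):
    # Hypothetical check, loosely modelled on the assertion in the log above.
    if not gil.held_by_current_thread():
        raise RuntimeError("inc_ref() called without holding the lock")
    return "ok"


def call_from_worker(gil, hold_lock):
    """Run inc_ref on a worker thread, with or without taking the lock first."""
    result = []

    def body():
        try:
            if hold_lock:
                with gil:
                    result.append(inc_ref(gil))
            else:
                result.append(inc_ref(gil))
        except RuntimeError as err:
            result.append(str(err))

    worker = threading.Thread(target=body)
    worker.start()
    worker.join()
    return result[0]
```

In this model, the worker succeeds only when it takes the lock before calling `inc_ref`, which mirrors the idea of acquiring the GIL around each Python callback.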

This is the result of running python testFixture_TEST.py after my proposed modifications:

(2024-09-13 13:03:42.006) [info] [Server.cc:145] Loading SDF world file[/home/sdcnlab/gz_contrib/src/gz-sim/python/test/gravity.sdf].
(2024-09-13 13:03:42.256) [info] [SystemManager.cc:54] Serving entity system service on [/entity/system/add]
(2024-09-13 13:03:42.313) [debug] [Physics.cc:877] Loaded [gz::physics::dartsim::Plugin] from library [/home/sdcnlab/gz_contrib/src/install/lib/gz-physics-8/engine-plugins/libgz-physics-dartsim-plugin.so]
(2024-09-13 13:03:42.314) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::Physics] for entity [1]
Configured on 4
sdf name: plugin
Applying 0.0 N on link link
(2024-09-13 13:03:42.338) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::PythonSystemLoader] for entity [4]
(2024-09-13 13:03:42.338) [info] [LevelManager.cc:641] Loaded level [default]
(2024-09-13 13:03:42.339) [debug] [ServerConfig.cc:1025] Loading (3) plugins from file [/home/sdcnlab/.gz/sim/9/server.config]
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1647] Additional plugins to load:
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::UserCommands gz-sim-user-commands-system
(2024-09-13 13:03:42.340) [debug] [SimulationRunner.cc:1650] gz::sim::systems::SceneBroadcaster gz-sim-scene-broadcaster-system
(2024-09-13 13:03:42.344) [info] [UserCommands.cc:664] Create service on [/world/gravity/create]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:671] Remove service on [/world/gravity/remove]
(2024-09-13 13:03:42.345) [info] [UserCommands.cc:678] Pose service on [/world/gravity/set_pose]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:686] Pose service on [/world/gravity/set_pose_vector]
(2024-09-13 13:03:42.346) [info] [UserCommands.cc:693] Light configuration service on [/world/gravity/light_config]
(2024-09-13 13:03:42.347) [info] [UserCommands.cc:719] Physics service on [/world/gravity/set_physics]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:727] SphericalCoordinates service on [/world/gravity/set_spherical_coordinates]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:736] Enable collision service on [/world/gravity/enable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:745] Disable collision service on [/world/gravity/disable_collision]
(2024-09-13 13:03:42.348) [info] [UserCommands.cc:754] Material service on [/world/gravity/visual_config]
(2024-09-13 13:03:42.349) [info] [UserCommands.cc:762] Material service on [/world/gravity/wheel_slip]
(2024-09-13 13:03:42.349) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::UserCommands] for entity [1]
(2024-09-13 13:03:42.351) [debug] [SystemManager.cc:80] Loaded system [gz::sim::systems::SceneBroadcaster] for entity [1]
(2024-09-13 13:03:42.352) [info] [SimulationRunner.cc:260] Serving world controls on [/world/gravity/control], [/world/gravity/control/state] and [/world/gravity/playback/control]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:286] Serving GUI information on [/world/gravity/gui/info]
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:289] World [gravity] initialized with [default_physics] physics profile.
(2024-09-13 13:03:42.353) [info] [SimulationRunner.cc:296] Serving world SDF generation service on [/world/gravity/generate_world_sdf]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:334] Serving world names on [/gazebo/worlds]
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:347] Resource path add service on [/gazebo/resource_paths/add].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:360] Resource path get service on [/gazebo/resource_paths/get].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:375] Resource path resolve service on [/gazebo/resource_paths/resolve].
(2024-09-13 13:03:42.354) [info] [ServerPrivate.cc:389] Resource paths published on [/gazebo/resource_paths].
(2024-09-13 13:03:42.355) [info] [ServerPrivate.cc:402] Server control service on [/server_control].
(2024-09-13 13:03:44.003) [info] [SimulationRunner.cc:743] Found no publishers on /stats, adding root stats topic
(2024-09-13 13:03:44.005) [info] [SimulationRunner.cc:777] Found no publishers on /clock, adding root clock topic
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:542] Creating PostUpdate worker threads: 4
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (0)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (1)
(2024-09-13 13:03:44.007) [debug] [SimulationRunner.cc:553] Creating postupdate worker thread (2)
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:634] Serving scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:643] Serving graph information on [/world/gravity/scene/graph]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:654] Serving full state on [/world/gravity/state]
(2024-09-13 13:03:44.011) [info] [SceneBroadcaster.cc:663] Serving full state (async) on [/world/gravity/state_async]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:671] Publishing scene information on [/world/gravity/scene/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:680] Publishing entity deletions on [/world/gravity/scene/deletion]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:689] Publishing state changes on [/world/gravity/state]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:700] Publishing pose messages on [/world/gravity/pose/info]
(2024-09-13 13:03:44.012) [info] [SceneBroadcaster.cc:711] Publishing dynamic pose messages on [/world/gravity/dynamic_pose/info]
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (0)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (1)
(2024-09-13 13:03:45.154) [debug] [SimulationRunner.cc:569] Exiting postupdate worker thread (2)
.
----------------------------------------------------------------------
Ran 1 test in 3.165s

OK

Contributor

azeey commented Oct 14, 2024

@AmalDevHaridevan just wanted to give you a heads-up that I won't be able to review this for the next couple of weeks due to ROSCon.

Labels
🏛️ ionic Gazebo Ionic
Projects
Status: In review

3 participants