## Overview

While building a custom robotic simulation tool on top of OIGE, we discovered that testing policies with `headless=False` behaves differently from `headless=True`. The issue is easy to reproduce even on standard OIGE tasks: testing the same trained policy with `headless=True` vs. `headless=False` yields a ~40% reward difference on the `Humanoid` and `Ant` tasks.
I am attaching a script that can be run on the latest commit in `main` (gist to reproduce: https://gist.github.com/Demetrio92/c986493cff3b4d791a42412179ec6264). It trains the `Humanoid` task with `headless=True`, then tests it with both `headless=True` and `headless=False`, and should produce the following results:
```
== Humanoid Test; headless=True
av reward: 6852.170803435147 av steps: 989.174072265625
== Humanoid Test; headless=False
av reward: 4273.75024558347 av steps: 984.9992679355784
```
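For context, the numbers above come from an evaluation loop of roughly the following shape (a minimal sketch; the function and variable names here are ours, not the gist's, and `env`/`policy` stand in for the OIGE vectorized environment and the rl_games-trained actor):

```python
import numpy as np

def evaluate(env, policy, num_episodes=32):
    """Roll a trained policy and aggregate `av reward` / `av steps`."""
    rewards, steps = [], []
    for _ in range(num_episodes):
        obs = env.reset()
        done, ep_reward, ep_steps = False, 0.0, 0
        while not done:
            action = policy(obs)
            obs, reward, done, _ = env.step(action)
            ep_reward += float(reward)
            ep_steps += 1
        rewards.append(ep_reward)
        steps.append(ep_steps)
    print("av reward:", np.mean(rewards), "av steps:", np.mean(steps))
```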
The same thing happens with `Ant`. And if training is done with `headless=False` (very slow, but doable), the test scores are entirely different again. See the extra outputs at the bottom of this post.
## Root-Cause Analysis
We were able to trace this behavior to the state of the internal `to_render` variable in `omniisaacgymenvs/envs/vec_env_rlgames.py` (link):

- There is some convoluted logic for when and how it is set, but if it is overridden to always be `False`, the results always match `headless=True`; if it is overridden to always be `True`, the results always match `headless=False` (see the sketch after this list).
- The combination `headless=True` & `to_render=True` can be tested by enabling cameras via `task.sim.enable_cameras=True`.
- `reproduce.sh` from the gist, as well as the outputs at the bottom of this post, shows that the results with `headless=True & task.sim.enable_cameras=True` are exactly equivalent to `headless=False`.
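To make the override experiment concrete, here is a hedged reconstruction of the effective condition. This is our summary of the traced behavior, not the actual OIGE source:

```python
def resolve_to_render(headless: bool, enable_cameras: bool) -> bool:
    """Our reconstruction of what `to_render` effectively evaluates to.

    Observed behavior: rendering is requested whenever the viewer is up
    (headless=False) or cameras are enabled, and that flag alone decides
    which of the two reward regimes you get.
    """
    return (not headless) or enable_cameras

# The override experiment from the list above:
#   to_render = False  -> results always match the headless=True numbers
#   to_render = True   -> results always match the headless=False numbers
```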
Unfortunately, from there the issue goes deep into Isaac Sim code via `self._world.step(render=to_render)` (link), so we stopped investigating at that point.
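A self-contained way to probe that boundary would be to roll the identical seeded episode once per render flag and diff the trajectories. This is a sketch under assumptions: `make_env` and `policy` are hypothetical stand-ins, and only the `world.step(render=...)` call mentioned above is from the actual code path:

```python
import numpy as np

def rollout(make_env, policy, render: bool, horizon: int = 200, seed: int = 0):
    """Run one seeded episode, forcing the render flag, and return observations."""
    env = make_env(seed=seed, force_render=render)  # hypothetical factory
    obs = env.reset()
    trace = [np.asarray(obs).copy()]
    for _ in range(horizon):
        obs, _, _, _ = env.step(policy(obs))  # internally: world.step(render=render)
        trace.append(np.asarray(obs).copy())
    return np.stack(trace)

# If physics were independent of rendering, this difference would be ~0.
# In our tests the trajectories diverge, matching the reward gap above.
# diff = np.abs(rollout(make_env, policy, True) - rollout(make_env, policy, False)).max()
```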
The issue has been tested on multiple machines with different hardware, using the latest drivers as well as the recommended driver version 525. But given that everything runs inside Docker, this probably should not matter too much.
## Resolution
It would be great if you could confirm the issue, or explain whether this behavior is expected and what the proper way to deal with it is.
Currently it seems that visually inspecting a trained policy is unreliable, since the policy behaves differently when rendered. This is extremely undesirable, as visual inspection is vital to debugging RL policies.
## Extra Results
### Humanoid trained with `headless=True`

```
== Humanoid Test; headless=True
av reward: 6852.170803435147 av steps: 989.174072265625
== Humanoid Test; headless=False
av reward: 4273.75024558347 av steps: 984.9992679355784
== Humanoid Test; headless=True enable_cameras=True ==
av reward: 4273.75024558347 av steps: 984.9992679355784
```
### Humanoid trained with `headless=False` (training takes 1.5h on an RTX 3070)

```
== Humanoid Test; headless=True
av reward: 4156.822625699561 av steps: 830.9899344569288
== Humanoid Test; headless=False
av reward: 3556.779703811363 av steps: 966.6001461988304
== Humanoid Test; headless=True enable_cameras=True ==
av reward: 3556.779703811363 av steps: 966.6001461988304
```
### Ant trained with `headless=True`

```
== Ant Test; headless=True
av reward: 7147.375523806955 av steps: 965.1955620580346
== Ant Test; headless=False
av reward: 3829.089754253626 av steps: 996.640625
== Ant Test; headless=True enable_cameras=True ==
av reward: 3829.089754253626 av steps: 996.640625
```
On request we can also provide complete training and testing logs.