4. Running Pretrained Policies
NOTE: Pretrained policies for v1.0 have not finished training yet. In the meantime, v0.1 policies are available, and the instructions below refer to those v0.1 policies. You can install v0.1 of Assistive Gym to use them.
We provide pretrained control policies for each robot and assistive task.
These controllers were trained using Proximal Policy Optimization (PPO) implemented in PyTorch.
The pretrained models were trained for 10,000,000 time steps (50,000 simulation rollouts) on a 36-virtual-core AWS machine.
The PyTorch library and pretrained policies can be downloaded using the commands below.
If you do not have wget installed on your machine, you can download the models directly from the GitHub release page.
You may also need to install OpenCV for OpenAI Baselines.
Ubuntu: sudo apt-get install python3-opencv
Mac: brew install opencv
# Install pytorch RL library
pip3 install git+https://github.com/Zackory/pytorch-a2c-ppo-acktr --no-cache-dir
# Install OpenAI Baselines 0.1.6
pip3 install git+https://github.com/openai/baselines.git
# Download pretrained policies
wget -O trained_models/ppo/pretrained_policies.zip https://github.com/Healthcare-Robotics/assistive-gym/releases/download/0.100/pretrained_policies.zip
unzip trained_models/ppo/pretrained_policies.zip -d trained_models/ppo
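As a quick sanity check, you can also create one of the assistive environments directly through the standard OpenAI Gym API and step it with random actions. This is a minimal sketch, not part of the official instructions; it assumes v0.1 of Assistive Gym is installed and that importing assistive_gym registers the environments.
# Sanity-check sketch: load an environment and step it with random actions
import gym
import assistive_gym

env = gym.make('ScratchItchBaxter-v0')
env.render()  # open the simulation GUI (called before reset, as in the v0.1 examples)
observation = env.reset()
for _ in range(200):
    action = env.action_space.sample()  # random actions, just to verify the install
    observation, reward, done, info = env.step(action)
    if done:
        observation = env.reset()
env.close()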
Here we evaluate a pretrained policy for a Baxter robot helping to scratch an itch on a person's right arm, while the person sits in a wheelchair with a static pose.
python3 -m ppo.enjoy --env-name "ScratchItchBaxter-v0"
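If you prefer to load a pretrained policy yourself rather than going through the enjoy script, the sketch below shows one way to do it with PyTorch. It assumes the checkpoint at trained_models/ppo/ScratchItchBaxter-v0.pt stores an (actor_critic, ob_rms) pair, as saved by the pytorch-a2c-ppo-acktr training code; check that repository's enjoy script for the exact checkpoint format and policy interface.
# Sketch (assumptions noted above): roll out a pretrained policy manually
import gym
import assistive_gym
import numpy as np
import torch

env = gym.make('ScratchItchBaxter-v0')
env.render()

# Assumed checkpoint format: (policy network, observation running-mean statistics)
actor_critic, ob_rms = torch.load('trained_models/ppo/ScratchItchBaxter-v0.pt')

obs = env.reset()
recurrent_hidden_states = torch.zeros(1, actor_critic.recurrent_hidden_state_size)
masks = torch.zeros(1, 1)
done = False
while not done:
    # Normalize the observation with the saved running statistics
    # (assumed to mirror the VecNormalize wrapper used during training)
    obs_norm = np.clip((obs - ob_rms.mean) / np.sqrt(ob_rms.var + 1e-8), -10.0, 10.0)
    obs_tensor = torch.tensor(obs_norm, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():
        _, action, _, recurrent_hidden_states = actor_critic.act(
            obs_tensor, recurrent_hidden_states, masks, deterministic=True)
    obs, reward, done, info = env.step(action.squeeze(0).numpy())
env.close()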
We also provide pretrained policies for a robot and human that have learned to collaborate on the same assistive task. The robot and human have separate control policies that are trained simultaneously via co-optimization.
python3 -m ppo.enjoy_coop --env-name "DrinkingSawyerHuman-v0"
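In these collaborative ('Human') environments, the environment expects a single action vector that covers both the robot's and the person's joints, and the enjoy_coop script splits it between the two policies. The sketch below simply compares the action and observation spaces of a robot-only environment and its collaborative counterpart; the robot-only name DrinkingSawyer-v0 is assumed to follow the v0.1 naming pattern.
# Sketch: compare spaces of a robot-only and a collaborative (robot + human) environment
import gym
import assistive_gym

robot_env = gym.make('DrinkingSawyer-v0')
coop_env = gym.make('DrinkingSawyerHuman-v0')

# The collaborative action vector spans both the robot's and the human's joints,
# so it is larger than the robot-only action vector
print('Robot-only action space:', robot_env.action_space.shape)
print('Robot + human action space:', coop_env.action_space.shape)
print('Robot-only observation space:', robot_env.observation_space.shape)
print('Robot + human observation space:', coop_env.observation_space.shape)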
We can also compare control policies for a given assistive task. Here we evaluate a policy over 100 simulation rollouts and compute the average reward and task success.
python3 -m ppo.enjoy_100trials --env-name "FeedingPR2-v0"
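The 100-trial evaluation amounts to the loop sketched below: run 100 rollouts, accumulate the reward in each, and average the reward and task success across rollouts. Random actions stand in for a trained policy here, and the 'task_success' key is an assumption about the environment's info dictionary; substitute a loaded policy and the correct key as needed.
# Evaluation sketch: average reward and task success over 100 rollouts
import gym
import assistive_gym
import numpy as np

env = gym.make('FeedingPR2-v0')
rewards, successes = [], []
for _ in range(100):
    obs = env.reset()
    done, total_reward, info = False, 0.0, {}
    while not done:
        obs, reward, done, info = env.step(env.action_space.sample())  # random stand-in policy
        total_reward += reward
    rewards.append(total_reward)
    successes.append(info.get('task_success', 0))

print('Average reward:', np.mean(rewards))
print('Average task success:', np.mean(successes))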
We can also compare policies in collaborative assistance environments, where both the robot and human take actions according to co-optimized policies.
python3 -m ppo.enjoy_coop_100trials --env-name "BedBathingJacoHuman-v0"