
4. Running Pretrained Policies

Zackory Erickson edited this page Dec 11, 2020 · 6 revisions

NOTE: Pretrained policies for v1.0 have not finished training yet. Until they do, the v0.1 policies are available, and the instructions below apply to those v0.1 policies. To use them, install v0.1 of Assistive Gym.

We provide pretrained control policies for each robot and assistive task.
These controllers are trained using Proximal Policy Optimization (PPO) implemented in PyTorch.
The pretrained models were trained for 10,000,000 time steps (50,000 simulation rollouts) on an AWS machine with 36 virtual cores.
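As a quick sanity check on the training budget, dividing the reported time steps by the reported number of rollouts gives the average rollout length. This is plain arithmetic on the two figures above, not code from the training scripts:

```python
# Training budget reported for each pretrained policy.
total_timesteps = 10_000_000
num_rollouts = 50_000

# Average number of simulation time steps per rollout (episode).
steps_per_rollout = total_timesteps // num_rollouts
print(steps_per_rollout)  # 200
```

So each simulation rollout averages 200 time steps.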

Download library and models

The PyTorch RL library and the pretrained policies can be downloaded with the commands below.
If wget is not installed on your machine, you can download the models directly from the GitHub release page.
You may also need to install OpenCV for OpenAI Baselines (Ubuntu: sudo apt-get install python3-opencv; Mac: brew install opencv).

# Install pytorch RL library
pip3 install git+https://github.com/Zackory/pytorch-a2c-ppo-acktr --no-cache-dir
# Install OpenAI Baselines 0.1.6
pip3 install git+https://github.com/openai/baselines.git
# Download pretrained policies (wget -O fails if the target directory does not exist)
mkdir -p trained_models/ppo
wget -O trained_models/ppo/pretrained_policies.zip https://github.com/Healthcare-Robotics/assistive-gym/releases/download/0.100/pretrained_policies.zip
unzip trained_models/ppo/pretrained_policies.zip -d trained_models/ppo
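If neither wget nor unzip is available, the same download-and-extract step can be done with the Python standard library. The sketch below reuses the release URL and the trained_models/ppo directory from the commands above; the helper names (download_file, extract_zip, fetch_pretrained_policies) are hypothetical and not part of Assistive Gym.

```python
import os
import urllib.request
import zipfile

# Release URL and target directory copied from the wget/unzip commands above.
POLICIES_URL = (
    "https://github.com/Healthcare-Robotics/assistive-gym/releases/"
    "download/0.100/pretrained_policies.zip"
)
TARGET_DIR = os.path.join("trained_models", "ppo")


def download_file(url, dest_path):
    """Download url to dest_path, creating parent directories as needed."""
    os.makedirs(os.path.dirname(dest_path) or ".", exist_ok=True)
    urllib.request.urlretrieve(url, dest_path)
    return dest_path


def extract_zip(zip_path, dest_dir):
    """Extract every member of zip_path into dest_dir and return the member names."""
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()


def fetch_pretrained_policies(target_dir=TARGET_DIR):
    """Download the release archive and unzip it into target_dir (hypothetical helper)."""
    zip_path = download_file(
        POLICIES_URL, os.path.join(target_dir, "pretrained_policies.zip")
    )
    return extract_zip(zip_path, target_dir)
```

Calling fetch_pretrained_policies() from the repository root should leave the archive and its extracted contents in trained_models/ppo, mirroring the wget and unzip commands.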

Robot assisting a static person

Here we evaluate a pretrained policy for a Baxter robot that scratches an itch on a person's right arm while the person sits in a wheelchair with a static pose.

python3 -m ppo.enjoy --env-name "ScratchItchBaxter-v0"
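The enjoy command loads a pretrained policy and runs it in the named environment. The loop below is a minimal sketch of what one such evaluation rollout looks like; it is not the code of ppo.enjoy. It assumes the classic Gym interface, where reset() returns an observation and step() returns (obs, reward, done, info), and run_episode and policy are hypothetical names.

```python
def run_episode(env, policy, render=False):
    """Roll out one episode with a fixed policy and return (total_reward, last_info).

    Assumes the classic Gym API: env.reset() -> obs and
    env.step(action) -> (obs, reward, done, info).
    """
    obs = env.reset()
    total_reward = 0.0
    done = False
    info = {}
    while not done:
        if render:
            env.render()
        # The policy maps the current observation to an action.
        action = policy(obs)
        obs, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward, info
```

With Assistive Gym v0.1 installed, env would be an environment such as "ScratchItchBaxter-v0" and policy a wrapper around the loaded PPO actor; both of those wiring details are assumptions here.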

Collaborative assistance - robot assisting an active human

We also provide pretrained policies for a robot and a human that learned to collaborate on the same assistive task. The robot and the human each have a separate control policy, and both policies are trained simultaneously via co-optimization.

python3 -m ppo.enjoy_coop --env-name "DrinkingSawyerHuman-v0"
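Because the robot and the human have separate policies, each policy only needs its own part of the observation, and the two actions must be combined before stepping the environment. The sketch below assumes the cooperative environment uses one observation vector with the robot's entries first, followed by the human's, and expects one concatenated action; split_observation, act_jointly and robot_obs_len are hypothetical names, not Assistive Gym API.

```python
def split_observation(obs, robot_obs_len):
    """Split a joint observation into the robot's part and the human's part."""
    return obs[:robot_obs_len], obs[robot_obs_len:]


def act_jointly(obs, robot_policy, human_policy, robot_obs_len):
    """Query both co-optimized policies and concatenate their actions (robot first)."""
    robot_obs, human_obs = split_observation(obs, robot_obs_len)
    robot_action = list(robot_policy(robot_obs))
    human_action = list(human_policy(human_obs))
    return robot_action + human_action
```

Keeping the two policies as separate callables mirrors the setup described above, where each agent has its own controller even though both are trained together.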

Evaluating and comparing policies over 100 trials

We can also compare control policies for a given assistive task. Each policy is evaluated over 100 simulation rollouts of the task, and we report the average reward and task success across those rollouts.

python3 -m ppo.enjoy_100trials --env-name "FeedingPR2-v0"
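The summary statistics for such an evaluation can be computed from one reward and one success value per rollout. The sketch below is an illustration, not the code of ppo.enjoy_100trials: it assumes a per-rollout task success value is available (for example, read from the environment's info dictionary at the end of each rollout), and it reports the population standard deviation, which may differ from what the script prints. summarize_trials is a hypothetical name.

```python
import statistics


def summarize_trials(episode_rewards, episode_successes):
    """Return the mean/std of reward and the mean task success over evaluation rollouts."""
    if len(episode_rewards) != len(episode_successes):
        raise ValueError("need exactly one success value per rollout")
    return {
        "trials": len(episode_rewards),
        "reward_mean": statistics.mean(episode_rewards),
        # Population standard deviation over the evaluated rollouts.
        "reward_std": statistics.pstdev(episode_rewards),
        "success_mean": statistics.mean(episode_successes),
    }
```

Running the same summary for each candidate policy on the same task gives directly comparable numbers; the same function applies to the collaborative environments, since they also produce one reward and one success value per rollout.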

We can also compare policies for collaborative assistance environments where both the robot and human take actions according to co-optimized policies.

python3 -m ppo.enjoy_coop_100trials --env-name "BedBathingJacoHuman-v0"