Runaway is a Reinforcement Learning environment developed in pygame. The object of the game is to avoid being captured by the evil minion chasing after you for as long as possible. The player wins if the minion is avoided for 1000 timesteps (roughly 20 seconds); otherwise, the game is lost. Additionally, as part of this project I demonstrate how tabular Q-learning can be used to learn an optimal policy for playing the game.
A short video presentation discussing how I trained an agent using Q-learning to play the game can be found here:
The state of the environment is represented as a vector of 4 numbers (a code sketch follows the list):
- the x position of the player
- the y position of the player
- the x position of the minion
- the y position of the minion
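For illustration, here is a minimal sketch of how such a state vector might be assembled; the attribute and function names are assumptions for the example, not the actual identifiers in `src/`:

```python
# Hypothetical sketch: the state packs both (x, y) positions into one tuple,
# which is hashable and therefore usable as a key in a tabular Q function.
def get_state(player, minion):
    """Return the state vector (player_x, player_y, minion_x, minion_y)."""
    return (player.x, player.y, minion.x, minion.y)
```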
The player avoids the minion chasing after it by moving around the 2D game board. The player can take the following actions (a sketch of the movement rules follows below):
- Move up (up keyboard arrow)
- Move down (down keyboard arrow)
- Move left (left keyboard arrow)
- Move right (right keyboard arrow)
- Stay put (no action)
If the player moves into a wall, the player remains in the same position.
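To make the movement rules concrete, here is a hedged sketch of how an action could be applied with that wall behavior; `BOARD_WIDTH`, `BOARD_HEIGHT`, and the action encoding are illustrative assumptions, not the environment's actual constants:

```python
# Illustrative action encoding; in pygame the y axis grows downward,
# so "up" decreases y.
ACTIONS = {
    0: (0, -1),  # up
    1: (0, 1),   # down
    2: (-1, 0),  # left
    3: (1, 0),   # right
    4: (0, 0),   # stay put
}

BOARD_WIDTH, BOARD_HEIGHT = 20, 20  # assumed board size for the sketch

def step_player(x, y, action):
    """Apply an action; moving into a wall leaves the player in place."""
    dx, dy = ACTIONS[action]
    new_x = min(max(x + dx, 0), BOARD_WIDTH - 1)
    new_y = min(max(y + dy, 0), BOARD_HEIGHT - 1)
    return new_x, new_y
```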
Rewards are given as follows (sketched in code after the list):
- A reward of 1 is given for each timestep the player stays alive
- A reward of 1000 is given if the player wins (survives for 1000 timesteps without being captured)
- A reward of -100 is given if the player is captured, i.e., the player loses
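As a sketch, the reward logic amounts to the following; the constant names are illustrative, and whether the win bonus replaces or stacks with the per-step reward is an assumption here:

```python
SURVIVE_REWARD = 1
WIN_REWARD = 1000
CAPTURE_REWARD = -100
MAX_TIMESTEPS = 1000

def reward(timestep, captured):
    """Per-step reward under the scheme described above."""
    if captured:
        return CAPTURE_REWARD
    if timestep >= MAX_TIMESTEPS:
        return WIN_REWARD  # the player survived long enough to win
    return SURVIVE_REWARD
```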
The minion chasing the player follows a policy that always takes the move that minimizes the distance between itself and the player.
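A minimal sketch of such a greedy chase policy, assuming Manhattan distance (the actual distance metric is defined in `src/`):

```python
def minion_move(minion_x, minion_y, player_x, player_y):
    """Pick the move that minimizes Manhattan distance to the player."""
    moves = [(0, -1), (0, 1), (-1, 0), (1, 0)]
    return min(
        moves,
        key=lambda m: abs(minion_x + m[0] - player_x)
                      + abs(minion_y + m[1] - player_y),
    )
```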
Built into the environment is a trainable Q-learning agent that can learn an optimal policy for playing the game. Pretrained models can be found in `models/`.
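For readers unfamiliar with tabular Q-learning, the core update the agent performs looks roughly like this; the hyperparameter values and function signature are illustrative assumptions, and the actual implementation lives in `src/`:

```python
from collections import defaultdict

ALPHA = 0.1   # learning rate (illustrative value)
GAMMA = 0.99  # discount factor (illustrative value)

# Q maps (state, action) pairs to estimated returns; states are the
# 4-number vectors described above.
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions, done):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
    target = reward + GAMMA * best_next
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```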
I trained the agent using a decaying epsilon-greedy policy: epsilon started at 1 and was decayed by a factor of 0.995 every episode until it reached a minimum of 0.001, where it stayed constant for the rest of training. I trained the agent for a total of 5000 episodes. Below is a plot showing the 50-episode moving average of the rewards obtained during training, an animation of the agent following the learned policy, and an animation of the agent following a completely random policy for comparison:
The agent has converged when the rewards obtained consistently reach above 1250. From the plot, the agent converges after about 2000 episodes.
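For concreteness, here is a sketch of epsilon-greedy action selection with the decay schedule described above; the function and variable names are illustrative:

```python
import random

def choose_action(Q, state, actions, epsilon):
    """Epsilon-greedy: explore with probability epsilon, else act greedily."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

# Start at 1.0, multiply by 0.995 each episode, floor at 0.001.
epsilon = 1.0
for episode in range(5000):
    # ... run one episode, selecting actions with choose_action ...
    epsilon = max(epsilon * 0.995, 0.001)
```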
To get started, run the following commands to download the repo, create a virtual environment, install the dependencies, and start up a human-controlled game:

```bash
git clone https://github.com/loftiskg/runaway.git
cd runaway
python -m venv env
source env/bin/activate
pip install -r requirements.txt
python play.py
```
To train a Q-learning agent to play the game, run the following:

```bash
# this will save the Q values in the models directory with the name Q_model_e5000_epsilon1.0_decay.pkl
python train.py --episodes 5000 --epsilon 1 --epsilon_decay .995 --min_epsilon .001 --save_model --save_plot --save_rewards --suffix "decay" --print_every 100
```

This will recreate the results seen above.
To render the trained agent, run the following command:

```bash
python play.py --agent q --agent_model_path models/Q_model_e5000_epsilon1.0_decay.pkl
```
- `src/`: contains all the code that defines the environment and agents
- `models/`: contains pretrained models and is where models are saved after training with the command-line tool
- `plots/`: contains training reward plots saved after training with the command-line tool
- `rewards/`: contains CSVs of the rewards at each episode of training when using the command-line tool
- `images/`: contains images used in markdown files
- `notebook.ipynb`: contains a demo of training an agent in a notebook
- `play.py`: a command-line tool that starts up a human- or trained-agent-controlled game. Run `python play.py --help` at the command line to learn more about how the CLI works
- `train.py`: a command-line tool that trains an agent using Q-learning. Run `python train.py --help` at the command line to learn more about how the CLI works
Feel free to reach out to me on LinkedIn.