Study of deep reinforcement learning approach and its application to the programming of intelligent agents to play retro video games
Miguel Enrique Játiva Jiménez
Inside the train folder you can find the following files:
- train.py: code for training DQN agents. It takes as argument the name of the environment.
- model.py: the implementation of the DQN architecture.
- wrappers.py: wrappers applied to the environment when training.
- keys.py: file where the AWS keys are stored.
Names of the environments:
- Columns-Genesis
- Flicky-Genesis
- BioHazardBattle-Genesis
- StreetsOfRage2-Genesis
- SonicTheHedgehog-Genesis
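As an illustration of passing one of these names to the training script, here is a minimal sketch of an argument parser. It is not the actual code of train.py: the function name `parse_args` and the choice of a positional argument are assumptions made for the example.

```python
import argparse

# Environment names listed in this README.
ENVIRONMENTS = [
    "Columns-Genesis",
    "Flicky-Genesis",
    "BioHazardBattle-Genesis",
    "StreetsOfRage2-Genesis",
    "SonicTheHedgehog-Genesis",
]


def parse_args(argv=None):
    """Parse the environment name (hypothetical interface, not the real train.py)."""
    parser = argparse.ArgumentParser(
        description="Train a DQN agent on a Gym Retro game."
    )
    # Assumption: the environment name is a single positional argument.
    parser.add_argument("env", choices=ENVIRONMENTS, help="name of the environment")
    return parser.parse_args(argv)
```

Under that assumption, a call such as `parse_args(["Flicky-Genesis"])` returns a namespace whose `env` attribute is the chosen name, and an unknown name makes argparse exit with an error message.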
Inside the evaluation folder you can find the following files:
- eval.py: code for evaluating DQN agents. It takes as arguments the agent, the name of the environment and whether you want to record it or not.
- random_eval.py: code for evaluating random agents.
- model.py: the implementation of the DQN architecture.
- wrappers_eval.py: wrappers applied to the environment when evaluating.
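To give an idea of what evaluating a random agent involves, the following is a rough sketch and not the contents of random_eval.py. The function name `evaluate_random` and its parameters are hypothetical. It assumes the classic Gym interface used by Gym Retro 0.8.0, where `reset()` returns an observation and `step()` returns a 4-tuple.

```python
def evaluate_random(env, episodes=5):
    """Play `episodes` episodes with random actions; return each episode's total reward."""
    returns = []
    for _ in range(episodes):
        env.reset()
        done = False
        total = 0.0
        while not done:
            # Sample an action uniformly from the environment's action space.
            action = env.action_space.sample()
            # Classic Gym API: step returns (observation, reward, done, info).
            _, reward, done, _ = env.step(action)
            total += reward
        returns.append(total)
    return returns


# Example with Gym Retro (requires the game ROM to be imported):
#   import retro
#   env = retro.make(game="Columns-Genesis")
#   print(evaluate_random(env, episodes=3))
#   env.close()
```

The per-episode totals returned here can serve as a baseline against which the scores of the trained DQN agents are compared.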
You can also find a folder containing the best DQN agent for each game. To see these agents in action you need to execute them using eval.py. The videos of their progress during training can be seen here:
- Columns: https://youtu.be/BWTVoRRe5KQ
- Flicky: https://youtu.be/qyLJ7IhasuE
- Bio-Hazard Battle: https://youtu.be/1ZIdxuhISDM
- Streets of Rage 2: https://youtu.be/_7QPvCW4j3g
- Sonic The Hedgehog: https://youtu.be/N2wQk5ypmA8
- Sonic The Hedgehog (rings): https://youtu.be/cKtCdReYTqg
- Sonic The Hedgehog (xpos): https://youtu.be/o6dYv10j_9E
Here is a video of the best agents playing:
- Best agents: https://youtu.be/FoARRAapR_Y
- Python 3.7.9
- PyTorch 1.6.0
- Gym Retro 0.8.0
- Kaggle was used at first for an informal search to select the final hyperparameters for training. The downside is that a notebook can only run for 9 hours straight, so the final training could not be performed on Kaggle.
- I used the Notebooks API from Google Cloud as an alternative to Kaggle, but the notebooks stopped executing after 24 hours.
- The final training was performed in a laboratory of the Escuela Superior de Ingeniería Informática in Albacete. To receive the training results on my computer, I used the S3 (Simple Storage Service) service from AWS, and I modified the training algorithm to upload the results to S3.
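
The upload step described in the last point could look roughly like the sketch below. It is an assumption-laden illustration, not the code used in train.py: the helper names `make_s3_client` and `upload_results`, the bucket name, and the `results` prefix are all hypothetical, and the credentials are assumed to come from something like keys.py. Only `boto3.client` and the client's `upload_file` method are real boto3 calls.

```python
import os


def make_s3_client(access_key, secret_key):
    """Create an S3 client from explicit credentials (requires the boto3 package)."""
    import boto3  # imported lazily so the upload helper can be used without boto3

    return boto3.client(
        "s3",
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )


def upload_results(client, paths, bucket, prefix="results"):
    """Upload each local file to <bucket>/<prefix>/<file name>; return the object keys."""
    keys = []
    for path in paths:
        key = f"{prefix}/{os.path.basename(path)}"
        # upload_file(Filename, Bucket, Key) copies a local file into the bucket.
        client.upload_file(path, bucket, key)
        keys.append(key)
    return keys


# Hypothetical usage at the end of a training run:
#   client = make_s3_client(ACCESS_KEY, SECRET_KEY)  # values stored in keys.py
#   upload_results(client, ["checkpoint.pt"], bucket="my-training-bucket")
```

Passing the client in as a parameter keeps the upload logic independent of how the credentials are loaded.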