Code for the paper Deep RL Agent for a Real-Time Strategy Game by Michał Warchalski, Dimitrije Radojević, and Miloš Milošević.
This is the codebase that was used to train and evaluate reinforcement learning agents that can play Heroic - Magic Duel, a real-time strategy player-versus-player mobile game.
There are two components to the system:
- Training/inference agent, which contains policy/value network, training configuration, Gym environment etc. This repository contains agent codebase.
- Training/inference server, which applies agent's actions to battle, and returns observed battle state. This component is provided as a Docker image.
Agents communicate with servers via HTTP. Server exposes a RESTful API for starting new battles, stepping battles, fetching replay etc. Observations and actions are exchanged in JSON format.
Heroic environment, including serialization and communication with server, is wrapped in a Gym environment on the agent side.
MLP policy and value network architecture is implemented in Tensorflow v1. Experimental RNN policy is also available. An implementation of PPO, inspired by OpenAI baselines and stable-baselines, is used. More details in the paper.
Agents can be trained with several training plans, against several kinds of adversaries of varying difficulty, e.g. classical AI bots that are provided with the server, or via selfplay, and can use several predefined reward functions. Training is highly configurable, and can be resumed if interrupted.
GPUs are used for rollouts and update steps, if available. MPI is used as backend for distributed Adam optimizers for policy and value functions, and for syncing TF variables after update step across subprocesses.
Agents can be run in inference mode, which is used for evaluation. There is also limited battle rendering capability, in form of terminal-based UI.
Run an instance of training/inference server with:
docker run -it -d quay.io/nordeus/heroic-rl-server:latest
Multiple instances can be spun up, which can make rollouts faster. When specifying multiple servers for agents to connect to, each agent subprocess will be assigned a single server instance.
In order to access this server from host machine, in case of running the agent from
the source, container port that server is listening on should be mapped to a port
on the host machine with -p <host_port>:8081
.
There are two ways to run the agent, as Docker container or directly from source.
Main agent script, heroic-rl
, can be run containerized with:
docker run -it --gpus all quay.io/nordeus/heroic-rl-agent
It is recommended to create a separate Docker network for communication between agent and server:
docker network create heroic-rl
This network can be specified in Docker run command for both agent and server
by adding --network heroic-rl
to command options.
If running for training, it is also recommended to mount a data
directory,
which is used to store experiment logs, config and checkpoints, by appending
-v $PWD/data:/app/data
to Docker run
command. This will mount data
dir in
current working dir on the host machine to /app/data
dir within the container.
As a result, all files will be written to the host directory.
Arguments can be passed to heroic-rl
script either as environment variables or
by directly appending them to Docker run command from above. Environment
variables are named like this: HEROIC_RL_<COMMAND>_<OPTION>
, and are provided
to run command with -e <ENVVAR>=<VALUE>
, e.g. -e HEROIC_RL_TRAIN_EPOCHS=1000
.
Python 3.6.1 or greater is required to run this project. GNU C compiler and a few more things are also needed - assuming Ubuntu 18.04, these can be installed by running:
sudo apt-get install python3-venv python3-setuptools python3-dev gcc libopenmpi-dev
This project uses Poetry for dependency management and as build tool. To install Poetry for current user, run:
curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3
In that same shell run source $HOME/.poetry/env
, or just open up a new shell.
After cloning the repository, run this from within project root:
poetry install
Poetry will create a new virtualenv, and install required dependencies and helper
scripts. In case CUDA-supported GPU(s) is available, please append -E gpu
to
the end of previous command. Otherwise, Tensorflow will be installed without
GPU support.
Note: if there is an issue with pip unable to find tensorflow
, please run the
following to upgrade pip to latest version - this will only affect pip within
virtualenv that Poetry automatically creates:
poetry run pip install -U pip
Agent entrypoint script can now be invoked by prepending it with poetry run
:
poetry run heroic-rl
Poetry can spawn a new shell within newly created virtualenv. Within this shell
there is no need to specify poetry run
:
poetry shell
heroic-rl
Exit the shell with Ctrl-D.
Agent provides CLI entrypoint called heroic-rl
that is used to invoke training,
inference and other commands.
Available commands can be listed by running heroic-rl
, and help for for each
command is displayed by running heroic-rl <command> --help
. Defaults for each
option are also displayed.
In a nutshell:
train
is used for starting a fresh training experiment,resume
resumes an existing experiment, which was interrupted before completion,simulate
is used to run inference on specified number of battles with a provided agent checkpoint, against specified adversaryrender
can display actual battle, using provided agent checkpoint as left player, in a terminal user interface, based oncurses
serve
starts a Flask service that exposes inference with provided agent checkpoint via RESTful API
To train an agent for 1000 epochs against utility-based AI adversary, using
simple
reward function (+1 for victory, -1 for defeat), running on a training
server listening at localhost:8081
, and sane defaults for
hyperparameters and other options, run:
heroic-rl train -e 1000 <exp_name>
This will start training in a directory called data/<exp_name>/<exp_name>_s<seed>
.
Default value for seed is current time, so running multiple experiments with
the same name will result in new subdirectories in data/<exp_name>
. For
further reference, let's call directory for our example experiment <exp_dir>
.
You can examine training progress with Tensorboard:
tensorboard --logdir <exp_dir>
All logs go to <exp_dir>/train.log
, including training config. Training
config is also serialized as YAML to <exp_dir>/training_cfg.yml
.
Training progress for each agent in single experiment is tracked separately.
Each agent has a directory called <exp_dir>/agent_<id>
, let's call it
<agent_dir>
. There is a tabular representation of training progress called
progress.txt
in each agent dir, which contains pretty much all data that is
displayed in Tensorboard. Checkpoints for each agent are saved in agent dir, in
SavedModel
format, each checkpoint having its own directory named after the epoch at which
model was saved, i.e. simple_save<epoch>
. Checkpoints are created every 10
epochs, by default, which can be changed with --save-freq
.
Training can be interrupted with Ctrl+C at any time. In order to resume an existing experiment, run:
heroic-rl resume <exp_dir>
To simulate 1000 battles against utility-based AI using trained agent checkpoint
saved at epoch 300, running on a training server listening at localhost:8081
, run:
heroic-rl simulate -n 1000 <exp_path>/agent_1/simple_save300
This will run inference and log final win rate at the end.
To run battle visualization in curses-based TUI, with trained agent checkpoint saved at
epoch 300, running on a training server listening at localhost:8081
, run:
heroic-rl render <exp_path>/agent_1/simple_save300
There are keyboard shortcuts that can speed up/down replay, reverse time
and simulate another battle. Press q
to exit the visualization.
Agent source code is published under the terms of GPL-v3 license. Full license text can be found in LICENSE.agent.md.
Server Docker image is published under the terms of Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. More details can be found in LICENSE.server.md.
@article{warchalski2020heroicrl,
title={Deep RL Agent for a Real-Time Strategy Game},
author={Warchalski, Michał and Radojević, Dimitrije and Milošević, Miloš},
journal={arXiv preprint arXiv:2002.06290},
year={2020}
}