agentMaze.py The agent: learns action values and generates actions for a given state.
envMaze.py The environment for this experiment: given a state and an action, it returns the next state.
expMaze.py The main file that runs the project; it interacts with the RL-Glue module and uses pygame to display progress.
rl_glue.py Contains the RL-Glue framework for reinforcement learning and the pygame maze implementation.
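These files fit together in the usual RL-Glue pattern: the experiment drives an episode loop in which the environment produces states and rewards and the agent returns actions. The sketch below only illustrates that loop; the actual class and method names in agentMaze.py, envMaze.py, and rl_glue.py may differ.

```python
def run_episode(agent, env, max_steps=10000):
    """Run one episode: the environment produces states, the agent chooses actions."""
    state = env.env_start()              # reset the maze, return the start state S
    action = agent.agent_start(state)    # agent picks its first action
    for step in range(max_steps):
        reward, state, terminal = env.env_step(action)    # apply the action
        if terminal:
            agent.agent_end(reward)      # final learning update of the episode
            return step + 1              # number of steps taken to reach the goal G
        action = agent.agent_step(reward, state)          # learn, then act again
    return max_steps
```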
The following instructions will get you a copy of the project up and running on your local machine.
- Clone this project:
git clone https://github.com/konantian/Dyna-Maze-Game.git
- Enter the project:
cd Dyna-Maze-Game/Codes
- Start the game:
python3 expMaze.py 5 (5 is the number of planning steps n; you can change this value)
You need to install the following software:
- Python3
- pygame
Install the required Python packages:
$ pip install -r requirements.txt
Consider the simple maze shown inset in Figure 8.2. In each of the 47 states there are four actions, up, down, right, and left, which take the agent deterministically to the corresponding neighboring states, except when movement is blocked by an obstacle or the edge of the maze, in which case the agent remains where it is. Reward is zero on all transitions, except those into the goal state, on which it is +1. After reaching the goal state (G), the agent returns to the start state (S) to begin a new episode. This is a discounted, episodic task with γ = 0.95.
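For reference, here is a minimal sketch of those dynamics, assuming one common (row, column) encoding of the Figure 8.2 maze: a 6×9 grid whose 54 cells minus 7 obstacle cells give the 47 states mentioned above. The obstacle coordinates and function names are illustrative, not necessarily those used in envMaze.py.

```python
# Illustrative encoding of the Dyna-Maze grid (row 0 at the top).
GOAL = (0, 8)
START = (2, 0)
OBSTACLES = {(1, 2), (2, 2), (3, 2), (4, 5), (0, 7), (1, 7), (2, 7)}
ACTIONS = [(-1, 0), (1, 0), (0, 1), (0, -1)]   # up, down, right, left

def step(state, action_index, rows=6, cols=9):
    """Apply one action; blocked moves leave the agent where it is."""
    dr, dc = ACTIONS[action_index]
    nxt = (state[0] + dr, state[1] + dc)
    if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or nxt in OBSTACLES:
        nxt = state                       # hit a wall or an obstacle: stay put
    reward = 1.0 if nxt == GOAL else 0.0  # +1 only on transitions into G
    terminal = nxt == GOAL                # episode ends at G, restart at S
    return reward, nxt, terminal
```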
The main part of Figure 8.2 shows average learning curves from an experiment in which Dyna-Q agents were applied to the maze task. The initial action values were zero, the step-size parameter was α = 0.1, and the exploration parameter was ε = 0.1. When selecting greedily among actions, ties were broken randomly. The agents varied in the number of planning steps, n, they performed per real step. For each n, the curves show the number of steps taken by the agent to reach the goal in each episode, averaged over 30 repetitions of the experiment. In each repetition, the initial seed for the random number generator was held constant across algorithms. Because of this, the first episode was exactly the same (about 1700 steps) for all values of n, and its data are not shown in the figure. After the first episode, performance improved for all values of n, but much more rapidly for larger values. Recall that the n = 0 agent is a nonplanning agent, using only direct reinforcement learning (one-step tabular Q-learning). This was by far the slowest agent on this problem, despite the fact that the parameter values (α and ε) were optimized for it. The nonplanning agent took about 25 episodes to reach (ε-)optimal performance, whereas the n = 5 agent took about five episodes, and the n = 50 agent took only three episodes.
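The planning parameter n is the heart of Dyna-Q: after each real step the agent makes one direct Q-learning update, records the transition in its model, and then replays n simulated transitions sampled from that model. The following sketch illustrates this, assuming tabular action values and the parameter values from the experiment (α = 0.1, ε = 0.1, γ = 0.95); it is not necessarily the exact code in agentMaze.py.

```python
import random
from collections import defaultdict

Q = defaultdict(lambda: [0.0, 0.0, 0.0, 0.0])   # action values, initialized to zero
model = {}                                      # learned model: (s, a) -> (r, s', terminal)

def epsilon_greedy(state, epsilon=0.1):
    """Random action with probability epsilon; otherwise greedy, ties broken randomly."""
    if random.random() < epsilon:
        return random.randrange(4)
    values = Q[state]
    best = max(values)
    return random.choice([a for a, v in enumerate(values) if v == best])

def dyna_q_update(state, action, reward, next_state, terminal,
                  n=5, alpha=0.1, gamma=0.95):
    """One direct Q-learning update, then n planning updates from the model."""
    target = reward + (0.0 if terminal else gamma * max(Q[next_state]))
    Q[state][action] += alpha * (target - Q[state][action])         # direct RL
    model[(state, action)] = (reward, next_state, terminal)         # model learning
    for _ in range(n):                                               # planning
        (s, a), (r, s2, done) = random.choice(list(model.items()))  # replay a seen (s, a)
        t = r + (0.0 if done else gamma * max(Q[s2]))
        Q[s][a] += alpha * (t - Q[s][a])
```

With n = 0 the planning loop never runs and the agent reduces to one-step tabular Q-learning, which is why the n = 0 curve in Figure 8.2 learns so much more slowly than the n = 5 and n = 50 agents.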