Multi-Agent DDPG on ml-agents environment

This repository implements maddpg (Multi-Agent Deep Deterministic Policy Gradients)[1] for solving multi-agent mixed cooperative-competitive environments.

Environment

The goal of this environment is control two agents to bounce a ball over a net with rackets and keep the ball in play.

Reward
- each agent receives its own score.
- +0.1 if an agent hits a ball over the net.
- -0.01 if an agent lets a ball hit the ground or hits the ball out of bounds.
Observation space
- each agent receives its own local observation.
- observation space of size 24 (of 3 frames)
  - single frame observation space size is 8.
    - size 4 for a ball position and velocity.
    - size 4 for a racket position and velocity.
Action space
- each agent is allowed to navigate and jump around in their court region.
- continuous action space of size 2
  - moving toward and bacward
  - jumping up and down
Solving condition
- the task is episodic, the agents must get an average score of +0.5 over 100 consecutive episodes.
- score calculation
  1. add up each agent score separately during a game play as described in the reward section
    - score1 and score2 for respective agents
  2. score for each episode = max(score1, score2)
  3. average over 100 consecutive episodes

Training a multi-agent
- Edit the env_filename, train_config and agent_config in the train.py for different configurations.
- Run python3 train.py to train the multi-agent.
Evaluate the multi-agent
- Edit the env_filename and agent_path in the test.py for different configurations.
- Pretrained models are located inside the examples/<environment>/model directory.
- Run python3 test.py to evaluate the trained models.

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
examples/tennis		examples/tennis
images		images
maddpg		maddpg
utils		utils
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
report.ipynb		report.ipynb
test.py		test.py
train.py		train.py