The Evo Runner optimizes the first agent using evolutionary learning.
See this experiment for an example of how to configure it.
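As a rough sketch of what "evolutionary learning" means here, the update below is in the spirit of OpenAI-ES: perturb the first agent's parameters, score each candidate, and step along the fitness-weighted direction. The population size, noise scale, learning rate, and `fitness_fn` (e.g. the first agent's mean episode return) are illustrative assumptions, not the runner's actual settings or strategy.

```python
import jax
import jax.numpy as jnp

def es_step(rng, params, fitness_fn, pop_size=64, sigma=0.1, lr=0.05):
    # One ES-style generation: perturb the first agent's parameters,
    # evaluate every candidate, and move along the fitness-weighted
    # average perturbation direction.
    noise = jax.random.normal(rng, (pop_size,) + params.shape)
    fitness = jax.vmap(fitness_fn)(params + sigma * noise)
    grad = jnp.tensordot(fitness, noise, axes=([0], [0])) / (pop_size * sigma)
    return params + lr * grad
```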
This runner extends the Evo Runner to N > 2 agents by letting the first and second agent assume multiple roles, which can be configured via `agent1_roles` and `agent2_roles` in the experiment configuration.
Both agents receive different sets of memories for each role that they assume but share the weights.
- For heterogeneous games, roles can be shuffled for each rollout using the `shuffle_players` flag.
- Using the `self_play_anneal` flag, one can anneal the self-play probability from 0 to 1 over the course of the experiment.
See this experiment for an example of how to configure it.
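As a rough illustration of how the role-related options fit together, a configuration for this runner might look like the sketch below. Only `agent1_roles`, `agent2_roles`, `shuffle_players`, and `self_play_anneal` come from the description above; the values and surrounding structure are assumptions, so refer to the linked experiment for the real configuration.

```python
# Hypothetical excerpt of an experiment configuration; values are illustrative.
config = {
    "agent1_roles": 2,         # number of roles assumed by the first agent
    "agent2_roles": 2,         # number of roles assumed by the second agent
    "shuffle_players": True,   # shuffle role assignment at each rollout (heterogeneous games)
    "self_play_anneal": True,  # anneal the self-play probability from 0 to 1
}
```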
A simple baseline for MARL experiments is having one agent assume multiple roles and share the weights between them (but not the memory). For this approach to work, the observation vector needs to include an entry that indicates the role of the agent (see Terry et al.).
See this experiment for an example of how to configure it.
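A minimal sketch of the role-indicator idea (assumed helper and 1-D observations, not the runner's actual code), showing how a one-hot role encoding can be appended so a single weight-shared policy knows which role it is playing:

```python
import jax.numpy as jnp

def add_role_to_obs(obs: jnp.ndarray, role: int, num_roles: int) -> jnp.ndarray:
    # Append a one-hot role indicator so the weight-shared policy can
    # condition its behaviour on the role it currently plays.
    role_onehot = jnp.zeros(num_roles).at[role].set(1.0)
    return jnp.concatenate([obs, role_onehot])
```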
The Evo Runner optimizes the first agent using evolutionary learning. This runner stops the learning of an opponent during training, which corresponds to the hardstop challenge of Shaper.
See this experiment for an example of how to configure it.
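A minimal sketch of the hardstop idea, using a hypothetical helper rather than the runner's actual code: opponent updates are applied normally before a cutoff step and discarded afterwards, which freezes the opponent mid-training.

```python
import jax
import jax.numpy as jnp

def maybe_update_opponent(old_params, new_params, step, hardstop_step):
    # Before the hardstop step the opponent trains as usual; afterwards its
    # parameters are frozen by keeping the old values.
    keep_training = step < hardstop_step
    return jax.tree_util.tree_map(
        lambda new, old: jnp.where(keep_training, new, old),
        new_params, old_params)
```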
The Evo Runner optimizes the first agent using evolutionary learning. Here we also scan over the evolutionary steps, which increases compilation time and shortens training time, but makes it impossible to log stats during the run.
See this experiment for an example of how to configure it.
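The trade-off comes from rolling the outer evolution loop into `jax.lax.scan`, so the whole run is traced and compiled once: compilation takes longer, execution is faster, and per-step Python-side logging cannot run inside the scanned body. A toy sketch of the pattern (the step function is a stand-in, not the runner's code):

```python
import jax
import jax.numpy as jnp

def evo_step(carry, _):
    # Dummy "generation": the real runner would mutate, evaluate, and
    # select candidate parameters here.
    params, rng = carry
    rng, sub = jax.random.split(rng)
    params = params + 0.01 * jax.random.normal(sub, params.shape)
    return (params, rng), None

init = (jnp.zeros(4), jax.random.PRNGKey(0))
(final_params, _), _ = jax.lax.scan(evo_step, init, xs=None, length=100)
```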
The Evo Runner optimizes the first agent using evolutionary learning. This runner randomly samples learning rates for the opponents.
See this experiment for an example of how to configure it.
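A minimal sketch of drawing a different learning rate per opponent, e.g. log-uniformly over an assumed range (the range and distribution are illustrative, not the runner's actual settings):

```python
import jax

def sample_opponent_lrs(rng, num_opponents, low=1e-4, high=1e-2):
    # Log-uniform samples so opponents learn at noticeably different speeds.
    u = jax.random.uniform(rng, (num_opponents,))
    return low * (high / low) ** u
```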
The Evo Runner optimizes the first agent using evolutionary learning. The payoff matrix is randomly sampled at each rollout, and each opponent has a different payoff matrix.
See this experiment for an example of how to configure it.
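An illustrative way to draw an independent payoff tensor per opponent at every rollout (the shape, with four joint actions and two players, and the payoff range are assumptions):

```python
import jax

def sample_opponent_payoffs(rng, num_opponents, low=-3.0, high=3.0):
    # One independent payoff tensor per opponent: four joint actions
    # (CC, CD, DC, DD), each with a payoff for both players.
    return jax.random.uniform(
        rng, (num_opponents, 4, 2), minval=low, maxval=high)
```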
The Evo Runner optimizes the first agent using evolutionary learning. The payoff matrix is randomly sampled at each rollout, and each opponent has the same payoff matrix.
See this experiment for an example of how to configure it.
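The only difference from the previous runner is that a single matrix is drawn and shared across the opponent population; continuing the illustrative sketch above:

```python
import jax
import jax.numpy as jnp

def sample_shared_payoff(rng, num_opponents, low=-3.0, high=3.0):
    # Draw one payoff matrix and replicate it so every opponent plays the
    # same randomly sampled game.
    shared = jax.random.uniform(rng, (4, 2), minval=low, maxval=high)
    return jnp.broadcast_to(shared, (num_opponents, 4, 2))
```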
The Evo Runner optimizes the first agent using evolutionary learning. This runner randomly samples payoffs that satisfy the Iterated Prisoner's Dilemma constraints.
See this experiment for an example of how to configure it.
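The standard Prisoner's Dilemma constraints are T > R > P > S and 2R > T + S for the temptation, reward, punishment, and sucker payoffs. One simple way to sample payoffs that respect them is rejection sampling, sketched below; this is an illustrative approach and range, not necessarily what the runner does.

```python
import numpy as np

def sample_ipd_payoffs(rng: np.random.Generator, low=-5.0, high=5.0):
    # Rejection-sample (T, R, P, S) until both Prisoner's Dilemma
    # conditions hold: T > R > P > S and 2R > T + S.
    while True:
        t, r, p, s = rng.uniform(low, high, size=4)
        if t > r > p > s and 2 * r > t + s:
            return t, r, p, s
```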
The Evo Runner optimizes the first agent using evolutionary learning. The payoff matrix is randomly sampled at each rollout, and each opponent has the same payoff matrix. The payoff matrix is observed by the agent as part of its input.
See this experiment for an example of how to configure it.
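A minimal sketch of how the sampled payoff matrix can be exposed to the agent, by flattening it and appending it to the observation (assumed helper and shapes):

```python
import jax.numpy as jnp

def obs_with_payoff(obs: jnp.ndarray, payoff: jnp.ndarray) -> jnp.ndarray:
    # Flatten the current payoff matrix and concatenate it onto the
    # observation so the policy can condition on the game it is playing.
    return jnp.concatenate([obs, payoff.reshape(-1)])
```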
The Evo Runner optimizes the first agent using evolutionary learning. Noise is added to the opponents' IPD-like payoff matrix at each rollout; each opponent has the same noise added.
See this experiment for an example of how to configure it.
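As an illustration, additive Gaussian noise on a fixed IPD-like payoff vector, with one draw per rollout shared by all opponents (the base payoffs, noise scale, and shapes are assumptions):

```python
import jax
import jax.numpy as jnp

# Classic IPD payoffs for the row player over (CC, CD, DC, DD), used here
# only as an illustrative base.
BASE_PAYOFF = jnp.array([3.0, 0.0, 5.0, 1.0])

def noisy_payoff(rng, scale=0.5):
    # One noise draw per rollout, shared across all opponents.
    return BASE_PAYOFF + scale * jax.random.normal(rng, BASE_PAYOFF.shape)
```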