Hi there, thank you very much for contributing this PyTorch version of PAIRED. I have two questions that I hope you can clarify for me. I really appreciate it.

First, the num_steps for rolling out the Protagonist's and Antagonist's policies in the grid env is set to 256 by default (paired/envs/runners/adversarial_runner.py, Line 373 in fd49543). I am not sure whether the env terminates once max_steps=256 is reached. If yes, then the two agents are rolled out in the env for only one episode, which is not enough to produce the max/mean return for the Antagonist/Protagonist. If no, then the two agents are rolled out for several episodes, depending on how many steps each episode takes in the env. But in that case, the Antagonist and Protagonist are not evaluated over the same number of episodes. So I am confused about this.
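To illustrate where my confusion comes from, here is how I understand the regret estimate from the paper: the Antagonist's return enters as a max over episodes and the Protagonist's as a mean over episodes, so the number of completed episodes inside a 256-step rollout matters. This is only a sketch of my reading, not the repo's code; the helper names and the auto-reset assumption are mine.

```python
import numpy as np

def episode_returns(rewards, dones):
    """Split a fixed-length rollout (e.g. num_steps=256) into per-episode
    returns, assuming the env auto-resets whenever an episode ends."""
    returns, ep_return = [], 0.0
    for r, d in zip(rewards, dones):
        ep_return += r
        if d:
            returns.append(ep_return)
            ep_return = 0.0
    return returns  # any trailing partial episode is dropped here

def estimate_regret(antagonist_returns, protagonist_returns):
    """Regret estimate as I read it from the paper: max Antagonist episodic
    return minus mean Protagonist episodic return, both measured on the
    same adversary-generated environment."""
    return np.max(antagonist_returns) - np.mean(protagonist_returns)
```

With a single fixed-length rollout, the two lists above can contain different numbers of episodes for the two agents, which is exactly what I am unsure about.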
Second, as stated in the first paragraph of Part 4 of the paper, the env_adversary first generates the env given the Protagonist with a fixed policy, and then the Antagonist is trained on this env to optimality. After training the Antagonist, the regret is computed from the trained Antagonist's policy and the pre-trained Protagonist's policy. However, in your implementation, in the run() function (paired/envs/runners/adversarial_runner.py, Line 356 in fd49543), the flow seems to be different, and I am not sure how it corresponds to the procedure described in the paper.
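To make my reading of Part 4 concrete, here is rough pseudocode of the procedure I expected. All of the callables below are placeholders I made up for illustration; none of them are names from this repository.

```python
def paired_iteration(generate_env, train_to_optimality, collect_returns,
                     protagonist, antagonist):
    # 1) The adversary generates an environment while the Protagonist's
    #    policy is held fixed.
    env = generate_env(protagonist)

    # 2) The Antagonist is trained on this environment to (approximate) optimality.
    train_to_optimality(antagonist, env)

    # 3) Regret is then estimated from the trained Antagonist and the fixed
    #    Protagonist: max Antagonist episodic return minus mean Protagonist
    #    episodic return on the same env.
    ant_returns = collect_returns(antagonist, env)
    pro_returns = collect_returns(protagonist, env)
    return max(ant_returns) - sum(pro_returns) / len(pro_returns)
```

My question is how the logic in run() maps onto these steps, in particular whether the Antagonist is trained to optimality on the generated env before the regret is computed.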
Again, thank you so much for your effort.