max_step in rollout and order of training #1

Open
wenjunli-0 opened this issue Oct 19, 2021 · 0 comments

Hi there, thank you very much for contributing this PyTorch version of PAIRED. I have two questions that I hope you can clarify. I really appreciate it.

  1. The num_steps for rolling out the Protagonist's and Antagonist's policies in the grid env is set to 256 by default (

    num_steps=self.agent_rollout_steps,
    ). I am not sure whether the env terminates when max_steps=256 is reached. If it does, then each agent is rolled out for only one episode, which is not enough to compute the max return for the Antagonist or the mean return for the Protagonist. If it does not, then each agent is rolled out for several episodes, and the number depends on how many steps each episode takes. In that case, however, the Antagonist and Protagonist are not evaluated over the same number of episodes. So I am confused about this.

  2. As stated in the first paragraph of Part 4 of the paper, the env is first generated by env_adversary given the Protagonist with a fixed policy, and the Antagonist is then trained on this env to optimality. After training the Antagonist, the regret is computed from the trained Antagonist's policy and the pre-trained Protagonist's policy. However, in the run() function of your implementation, I found that env_adversary, the Protagonist, and the Antagonist are run in that order. Could you also clarify this?
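To make the first question concrete, here is a minimal sketch of how I currently picture a fixed-step rollout with auto-reset, and the max/mean regret estimate computed from it. This is only my assumption, not your code: the toy environment, `rollout_fixed_steps`, `regret_estimate`, `success_prob`, and `max_episode_steps` are all hypothetical names I made up for illustration.

```python
import random


def rollout_fixed_steps(num_steps, success_prob, max_episode_steps, seed=0):
    """Step a toy env for exactly num_steps, auto-resetting at episode end.

    Hypothetical stand-in for an agent rollout: each step reaches the goal
    with probability success_prob; an episode also ends on timeout.
    Returns the list of returns of all episodes completed in the rollout.
    """
    rng = random.Random(seed)
    returns = []
    t = 0  # steps taken in the current episode
    for _ in range(num_steps):
        t += 1
        reached_goal = rng.random() < success_prob
        timed_out = t >= max_episode_steps
        if reached_goal or timed_out:
            # Toy reward: larger when the goal is reached sooner, 0 on timeout.
            ret = 1.0 - t / max_episode_steps if reached_goal else 0.0
            returns.append(ret)
            t = 0  # auto-reset: the next step starts a new episode
    return returns


def regret_estimate(antagonist_returns, protagonist_returns):
    """Max Antagonist return minus mean Protagonist return (assumed form)."""
    max_ant = max(antagonist_returns, default=0.0)
    if protagonist_returns:
        mean_pro = sum(protagonist_returns) / len(protagonist_returns)
    else:
        mean_pro = 0.0
    return max_ant - mean_pro


# An agent that never reaches the goal completes one episode in 256 steps,
# while an agent that always reaches it immediately completes 256 episodes.
slow = rollout_fixed_steps(256, success_prob=0.0, max_episode_steps=256)
fast = rollout_fixed_steps(256, success_prob=1.0, max_episode_steps=256)
print(len(slow), len(fast))
print(regret_estimate(fast, slow))
```

Under these assumptions the two agents share the same step budget but are evaluated over very different numbers of episodes, which is the situation I am asking about.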

Again, thank you so much for your effort.
