Hi there, thank you very much for contributing this PyTorch version of PAIRED. I have two questions that I hope you can clarify for me. I really appreciate it.

First, the num_steps for rolling out the Protagonist's and Antagonist's policies in the grid env is set to 256 by default (paired/envs/runners/adversarial_runner.py, Line 373 in fd49543). I am not sure whether the env terminates once max_steps=256 is reached. If yes, then the two agents are rolled out in the env for only one episode, which is not enough to produce the max/mean return for the Antagonist/Protagonist. If no, then the two agents are rolled out for several episodes, depending on how many steps each episode takes in the env. But in that case, the Antagonist and Protagonist are not evaluated over the same number of episodes. So I am confused about this.
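To illustrate where my confusion comes from, here is how I understand the regret estimate from the paper: the Antagonist's return enters as a max over episodes and the Protagonist's as a mean over episodes, so the number of completed episodes inside a 256-step rollout matters. This is only a sketch of my reading, not the repo's code; the helper names and the auto-reset assumption are mine.

```python
import numpy as np

def episode_returns(rewards, dones):
    """Split a fixed-length rollout (e.g. num_steps=256) into per-episode
    returns, assuming the env auto-resets whenever an episode ends."""
    returns, ep_return = [], 0.0
    for r, d in zip(rewards, dones):
        ep_return += r
        if d:
            returns.append(ep_return)
            ep_return = 0.0
    return returns  # any trailing partial episode is dropped here

def estimate_regret(antagonist_returns, protagonist_returns):
    """Regret estimate as I read it from the paper: max Antagonist episodic
    return minus mean Protagonist episodic return, both measured on the
    same adversary-generated environment."""
    return np.max(antagonist_returns) - np.mean(protagonist_returns)
```

With a single fixed-length rollout, the two lists above can contain different numbers of episodes for the two agents, which is exactly what I am unsure about.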
Second, as stated in the first paragraph of Part 4 of the paper, the env_adversary first generates the env given the Protagonist with a fixed policy, and then the Antagonist is trained on this env to optimality. After training the Antagonist, the regret is computed from the trained Antagonist's policy and the pre-trained Protagonist's policy. However, in your implementation, in the run() function (paired/envs/runners/adversarial_runner.py, Line 356 in fd49543), the flow seems to be different, and I am not sure how it corresponds to the procedure described in the paper.
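To make my reading of Part 4 concrete, here is rough pseudocode of the procedure I expected. All of the callables below are placeholders I made up for illustration; none of them are names from this repository.

```python
def paired_iteration(generate_env, train_to_optimality, collect_returns,
                     protagonist, antagonist):
    # 1) The adversary generates an environment while the Protagonist's
    #    policy is held fixed.
    env = generate_env(protagonist)

    # 2) The Antagonist is trained on this environment to (approximate) optimality.
    train_to_optimality(antagonist, env)

    # 3) Regret is then estimated from the trained Antagonist and the fixed
    #    Protagonist: max Antagonist episodic return minus mean Protagonist
    #    episodic return on the same env.
    ant_returns = collect_returns(antagonist, env)
    pro_returns = collect_returns(protagonist, env)
    return max(ant_returns) - sum(pro_returns) / len(pro_returns)
```

My question is how the logic in run() maps onto these steps, in particular whether the Antagonist is trained to optimality on the generated env before the regret is computed.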
Again, thank you so much for your effort.