
Regarding logic for first done indexes #17

Open
victor-psiori opened this issue Apr 7, 2021 · 1 comment

Comments

@victor-psiori

Hi,
Thanks for the code and the paper on using adaptive attention span in RL.
In train.py, I don't understand the logic for computing ind_first_done in the following line:

ind_first_done = padding_mask.long().argmin(0) + 1  # will be index of first 1 in each column

After going through the loss calculations and the learn function where ind_first_done is used, I believe this line:

ind_first_done = padding_mask.long().argmin(0) + 1  # will be index of first 1 in each column

should instead be:

ind_first_done = padding_mask.long().argmax(0) + 1

I think so because, according to the comment, ind_first_done denotes the final index of each trajectory, and argmin would return the index of the first 0 in each column, not the first 1.

Could you kindly explain the logic used for the mentioned snippet?
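For anyone reading along, a small NumPy sketch (a hypothetical mask, not the repository's actual tensors) illustrating the difference: on a 0/1 column, argmin returns the position of the first 0, while argmax returns the position of the first 1, i.e. the first padded step after done.

```python
import numpy as np

# Hypothetical padding mask for 3 trajectories of length T=5
# (rows = time steps, columns = trajectories).
# 1 marks a padded step after the episode is done.
padding_mask = np.array([
    [0, 0, 0],
    [0, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
], dtype=np.int64)

# argmin finds the first 0 in each column -- always index 0 here,
# so it does NOT locate the first done step.
print(padding_mask.argmin(0) + 1)  # [1 1 1]

# argmax finds the first 1 in each column, which is the first
# post-done step, matching the comment in train.py.
print(padding_mask.argmax(0) + 1)  # [3 2 5]
```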

@skkuai

skkuai commented May 3, 2022

I took a hint from your issue and modified the code like this:

all_zero = (~padding_mask).all(dim=0)               # columns where no step is padded (episode ran the full length)
ind_first_done = padding_mask.long().argmax(0) + 1  # index of the first padded step in each column
ind_first_done = (~all_zero) * ind_first_done + all_zero * T  # full-length episodes get index T

Then the model trained well. Thank you.
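To see why the extra all_zero term matters, here is a NumPy sketch (hypothetical mask, not the repository's tensors): when a trajectory never terminates, its column has no 1, so argmax falls back to index 0 and the naive formula wrongly yields 1; the correction replaces it with the full length T.

```python
import numpy as np

T = 4
# Hypothetical mask where the second trajectory never terminates
# (its column contains no padded step).
padding_mask = np.array([
    [0, 0],
    [1, 0],
    [1, 0],
    [1, 0],
], dtype=bool)

naive = padding_mask.astype(np.int64).argmax(0) + 1
print(naive)  # [2 1] -- the column with no done wrongly gets index 1

all_zero = (~padding_mask).all(axis=0)            # columns with no padded step
ind_first_done = (~all_zero) * naive + all_zero * T
print(ind_first_done)  # [2 4] -- the full-length episode correctly gets T
```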
