Hi,
Thanks for the code and the paper on using adaptive attention span in RL.
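In `train.py` (adaptive-transformers-in-rl/train.py, line 1240 in 6f75366), I don't understand the logic for calculating `ind_first_done` in the following line:
`ind_first_done = padding_mask.long().argmin(0) + 1  # will be index of first 1 in each column`
After going through the loss calculations and the `learn` function where `ind_first_done` is used, I feel this line should be as follows:
`ind_first_done = padding_mask.long().argmax(0) + 1`
I think so because, according to the comments, `ind_first_done` denotes the final index in each trajectory.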
Could you kindly explain the logic used for the mentioned snippet?
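To make the question concrete, here is a minimal sketch of how the two calls differ on a toy mask. It uses NumPy arrays as a stand-in for the tensors in `train.py`, and the mask values and the (time, batch) layout are my own assumptions for illustration, not taken from the repository. Which call is correct depends on whether a 1 in `padding_mask` marks a done step or a valid step, which is exactly what I am unsure about.

```python
import numpy as np

# Toy 0/1 mask of shape (time, batch); hypothetical values, not from train.py.
padding_mask = np.array([
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
])

# argmax along axis 0 returns the index of the first 1 in each column.
first_one = padding_mask.argmax(0)
# argmin along axis 0 returns the index of the first 0 in each column
# (and 0 for a column that contains no 0 at all).
first_zero = padding_mask.argmin(0)

print(first_one)   # [2 1 0]
print(first_zero)  # [0 0 0]

# If the mask used the opposite convention (1 = valid step, 0 = after done),
# argmin on that inverted mask would match argmax on the mask above.
inverted = 1 - padding_mask
print(inverted.argmin(0))  # [2 1 0]
```

So if the comment "index of first 1 in each column" describes the intent, `argmax` seems to match it, unless the mask is inverted somewhere before this line.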