Regarding logic for first done indexes #17

victor-psiori · 2021-04-07T06:53:28Z

Hi,
Thanks for the code and the paper on using adaptive attention span in RL.
In train.py, I don't understand the logic for calculating ind_first_done in the following line:

adaptive-transformers-in-rl/train.py, line 1240 in 6f75366:

    ind_first_done = padding_mask.long().argmin(0) + 1  # will be index of first 1 in each column

After going through the loss calculations and the learn function where ind_first_done is used, I think this line (train.py, line 1240 in 6f75366) should instead be:

    ind_first_done = padding_mask.long().argmax(0) + 1

I think so because, according to the comments, ind_first_done denotes the final index in each trajectory.

Could you kindly explain the logic used for the mentioned snippet?
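
To make the question concrete, here is a minimal pure-Python sketch of what the two variants return on a small mask. It is not the repository's code: it assumes padding_mask has shape (T, B) with one column per trajectory and 1 marking every step from the first done onward, and it assumes argmin/argmax return the first occurrence on ties (which current PyTorch documents). The helpers column_argmin and column_argmax are hypothetical stand-ins for argmin(0) and argmax(0).

```python
# Hypothetical (T=4, B=3) mask: rows are time steps, columns are trajectories.
# Assumption: 1 marks a step at or after the first "done" in that trajectory.
padding_mask = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
    [1, 1, 1],
]


def column_argmin(mask):
    """Index of the first minimum in each column (stand-in for argmin(0))."""
    return [col.index(min(col)) for col in zip(*mask)]


def column_argmax(mask):
    """Index of the first maximum in each column (stand-in for argmax(0))."""
    return [col.index(max(col)) for col in zip(*mask)]


# Original line: argmin(0) + 1 locates the first 0, not the first 1.
argmin_plus_one = [i + 1 for i in column_argmin(padding_mask)]

# Proposed line: argmax(0) + 1 locates the first 1 in each column.
argmax_plus_one = [i + 1 for i in column_argmax(padding_mask)]

print(argmin_plus_one)  # [1, 1, 1]
print(argmax_plus_one)  # [3, 2, 1]
```

Under these assumptions, only the argmax variant tracks where each trajectory's first 1 occurs, which is what the comment on the original line describes.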


skkuai · 2022-05-03T05:09:42Z

I took a hint from your issue and modified the code as follows:

    all_zero = (~padding_mask).all(dim=0)
    ind_first_done = padding_mask.long().argmax(0) + 1
    ind_first_done = (~all_zero) * ind_first_done + all_zero * T

With this change, the model trained well. Thank you.
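
For anyone wondering why the all_zero term is there, here is a small pure-Python sketch of the same idea. It is only an illustration under assumptions: padding_mask is taken to be a boolean (T, B) mask with one column per trajectory, and T is taken to be the number of time steps. A column with no True entry has its maximum at index 0, so argmax(0) + 1 alone would report 1 for a trajectory that never finished.

```python
T = 4  # assumed number of time steps (rows)

# Hypothetical boolean mask: column 0 finishes at step index 2,
# column 1 never finishes (no True entry).
padding_mask = [
    [False, False],
    [False, False],
    [True, False],
    [True, False],
]

columns = list(zip(*padding_mask))

# Stand-in for (~padding_mask).all(dim=0): True where a column has no True.
all_zero = [not any(col) for col in columns]

# Stand-in for padding_mask.long().argmax(0) + 1 (first occurrence on ties).
naive = [col.index(max(col)) + 1 for col in columns]

# Stand-in for (~all_zero) * ind_first_done + all_zero * T.
ind_first_done = [T if empty else idx for empty, idx in zip(all_zero, naive)]

print(naive)           # [3, 1]
print(ind_first_done)  # [3, 4]
```

The unfinished trajectory is mapped to T instead of 1, so its full length is used.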

skkuai mentioned this issue May 10, 2022

Stable Transformer on Pong #16

Open
