Thanks a lot for the great work.
I still have a few questions on implementation details.

First, what is the reason for partitioning the training procedure into powers of 2?
Second, I am confused about the normalization. For the source dataset you use maximum-absolute-value normalization, while for the sample dataset you use the scaling `lambda x: 10 * 1.0 / x.pow(2).sum().sqrt()`.
Can you give more insight into this choice?

Third, why did you choose to pad your sequences with small noise rather than zeros?
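To make the comparison concrete, here is a small PyTorch sketch of the two scalings as I understand them (the example tensor values are made up for illustration):

```python
import torch

x = torch.tensor([3.0, -4.0, 12.0])

# Maximum-absolute-value normalization (source dataset):
# divides by max|x_i|, so values land in [-1, 1].
x_maxabs = x / x.abs().max()

# L2-norm scaling (sample dataset):
# 10 * 1.0 / x.pow(2).sum().sqrt() rescales x to have L2 norm exactly 10.
scale = 10 * 1.0 / x.pow(2).sum().sqrt()
x_l2 = x * scale
```

So the first scheme bounds the range of the values, while the second fixes the overall energy (L2 norm) of each sample, which is what I would like to understand the rationale for.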