Embedding Layer is expanded every timestep in LSTM #722
Comments
@antinucleon probably this is one of the reasons why our LSTM isn't as fast as expected?
Will check it tomorrow. Today & tomorrow morning I am benchmarking ELU.
This could be the reason: the weight is bound to a shared variable, and the current graph optimizer is not yet smart enough to change the gradient summation into an addto on the original variable.
Has this problem been fixed?
Probably not. It seems that solving this will require diving into the mshadow code, which is harder.
@JianboTang @Kublai-Jing As tqchen pointed out, I think you can replace simple_bind with bind and directly provide the NDArrays as a temporary solution
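A minimal sketch of that workaround, assuming the pre-Gluon Symbol API; the shapes, variable names, and the `grad_req` choice below are illustrative, not taken from the thread:

```python
import mxnet as mx

vocab_size, embed_dim, batch_size, seq_len = 10000, 256, 32, 25

data = mx.sym.Variable('data')                  # integer token indices
embed_weight = mx.sym.Variable('embed_weight')  # the shared embedding table
embed = mx.sym.Embedding(data=data, weight=embed_weight,
                         input_dim=vocab_size, output_dim=embed_dim)

ctx = mx.cpu()
# Allocate the embedding table and its gradient once, then pass them to bind().
# simple_bind() would instead allocate fresh storage for every bound argument,
# which is what blows up memory when the symbol is unrolled over timesteps.
args = {
    'data': mx.nd.zeros((batch_size, seq_len), ctx=ctx),
    'embed_weight': mx.nd.zeros((vocab_size, embed_dim), ctx=ctx),
}
args_grad = {'embed_weight': mx.nd.zeros((vocab_size, embed_dim), ctx=ctx)}

# grad_req='add' accumulates gradients in place across backward passes,
# i.e. the "addto" behaviour mentioned earlier in the thread.
exe = embed.bind(ctx, args=args, args_grad=args_grad, grad_req='add')
exe.forward(is_train=True)
exe.backward(out_grads=mx.nd.ones((batch_size, seq_len, embed_dim), ctx=ctx))
```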
Yeah that's what I am doing now : ) |
Hopefully fixed by #1697
Hi mxnet,
Not sure if this is true, but the Embedding layer, say with shape (10000, 256), i.e. a 10000-word vocabulary and 256-dimensional embeddings, seems to be expanded at every timestep of the LSTM, even though the weights are shared. I don't know how to verify this, but from what I have observed I believe it is true, and it causes huge memory consumption. I thought that if we share the weights across timesteps, we only need to allocate memory for the actual embedding outputs, (256 x t) for a t-step LSTM in this case, rather than (10000 x 256 x t), which would be the expected cost for a FullyConnected layer. It seems that the current Embedding layer takes the same amount of memory as a FullyConnected layer, but I am not sure whether this is the issue that causes the large memory overhead in the LSTM. Where should I be looking to try to fix this?
Thanks!
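For concreteness, a back-of-the-envelope version of the memory argument above; float32 storage, batch size 1, and t = 50 timesteps are my own illustrative assumptions, not numbers from the issue:

```python
# Rough comparison of the two footprints described in the issue.
vocab, dim, t = 10000, 256, 50          # t (number of timesteps) is hypothetical

outputs_only = dim * 4 * t              # one 256-d float32 vector per step: ~51 KB
table_per_step = vocab * dim * 4 * t    # full 10000 x 256 table per step: ~512 MB

print('%.0f KB vs %.0f MB' % (outputs_only / 1e3, table_per_step / 1e6))
```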