Embedding Layer is expanded every timestep in LSTM #722

Closed

Kublai-Jing opened this issue Nov 26, 2015 · 8 comments

Comments

@Kublai-Jing
Contributor

Hi mxnet,

Not sure if this is true, but the Embedding layer, say with shape (10000, 256) (i.e. a 10000-word vocabulary and 256-dimensional embeddings), seems to be expanded at every timestep of the LSTM, even though the weights are shared. I don't know how to verify this, but from what I have observed I believe it is true, and it causes huge memory consumption. I thought that if we share weights across timesteps, we would only need to allocate memory for the actual embedding output, 256 x t for a t-step LSTM in this case, rather than 10000 x 256 x t, which would be the case for a FullyConnected layer. It seems that the current Embedding layer takes the same amount of memory as a FullyConnected layer, but I am not sure whether this is what causes the large memory overhead in the LSTM. Where should I be looking to try to fix this?
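For reference, the setup being described looks roughly like this in the symbolic API (a minimal sketch; `seq_len`, the variable names, and the explicit per-step unrolling are illustrative, not the reporter's actual code):

```python
import mxnet as mx

seq_len = 35  # illustrative unroll length

# One shared embedding weight, reused at every timestep of the unrolled LSTM.
embed_weight = mx.sym.Variable('embed_weight')  # conceptually (10000, 256)
embedded = []
for t in range(seq_len):
    data_t = mx.sym.Variable('data_%d' % t)  # token ids at timestep t
    # Since the same weight symbol is reused, ideally only the (batch, 256)
    # output needs fresh memory per step, not another (10000, 256) weight copy.
    embedded.append(mx.sym.Embedding(data=data_t, weight=embed_weight,
                                     input_dim=10000, output_dim=256,
                                     name='embed_%d' % t))
```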

Thanks!

@mli
Contributor

mli commented Nov 26, 2015

@antinucleon probably this is one of the reasons why our LSTM isn't as fast as expected?

@antinucleon
Contributor

Will check it tomorrow. Today & tomorrow morning I am benchmarking ELU.

@tqchen
Member

tqchen commented Nov 26, 2015

This could be the reason: the weight is bound to a shared variable, and the current graph optimizer is not yet smart enough to change the gradient summation into an addto on the original variable.
Either we bind directly by NDArray, or we make the graph executor smarter so it handles the addto case more gracefully.
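To make the distinction concrete, here is a minimal runnable sketch (the shapes and three-step unroll are purely illustrative) of summing separately materialized per-step gradients versus accumulating in place into one buffer:

```python
import mxnet as mx

steps, vocab, dim = 3, 10000, 256

# Gradient summation: one full-size gradient per timestep is materialized,
# so `steps` buffers of shape (vocab, dim) are alive before the final sum.
per_step = [mx.nd.ones((vocab, dim)) for _ in range(steps)]
summed = mx.nd.add_n(*per_step)

# Addto: each step accumulates into the original variable's gradient buffer,
# so only a single (vocab, dim) buffer is ever needed.
acc = mx.nd.zeros((vocab, dim))
for _ in range(steps):
    acc += mx.nd.ones((vocab, dim))
```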

@JianboTang

Has this problem been fixed?

@Kublai-Jing
Contributor Author

Probably not. It seems that solving this requires diving into the mshadow code, which is harder?

@piiswrong
Contributor

@JianboTang @Kublai-Jing As tqchen pointed out, I think you can replace simple_bind with bind and directly provide the NDArrays as a temporary solution.
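A hedged sketch of that workaround (the names, shapes, and two-step unroll are illustrative): bind the unrolled symbol with explicit NDArrays so the shared weight and its gradient are each allocated exactly once, with grad_req='add' letting every timestep accumulate into the same gradient buffer.

```python
import mxnet as mx

# Unroll two timesteps that reuse one embedding weight symbol.
weight = mx.sym.Variable('embed_weight')
steps = [mx.sym.Embedding(data=mx.sym.Variable('data_%d' % t), weight=weight,
                          input_dim=10000, output_dim=256, name='embed_%d' % t)
         for t in range(2)]
net = mx.sym.Group(steps)

args = {'data_0': mx.nd.zeros((32,)),                # batch of 32 token ids
        'data_1': mx.nd.zeros((32,)),
        'embed_weight': mx.nd.zeros((10000, 256))}   # one shared weight buffer
args_grad = {'embed_weight': mx.nd.zeros((10000, 256))}  # one gradient buffer
grad_req = {'embed_weight': 'add', 'data_0': 'null', 'data_1': 'null'}

# bind (unlike simple_bind) uses the buffers we pass in, rather than
# allocating its own per use of the weight.
exe = net.bind(ctx=mx.cpu(), args=args, args_grad=args_grad, grad_req=grad_req)
```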

@Kublai-Jing
Contributor Author

Yeah, that's what I am doing now :)

@tqchen
Member

tqchen commented Mar 22, 2016

Hopefully fixed by #1697.

@tqchen tqchen closed this as completed Mar 22, 2016