
can mxnet provide the sparse gradient update for word embedding #1237

Closed
everwind opened this issue Jan 11, 2016 · 12 comments

@everwind

embed = mx.sym.Embedding(data=input, input_dim=1000000,output_dim=50, name="embed" )
The backward pass of this Embedding symbol is very slow for a large vocabulary.

Sparse gradient updates for word embeddings are very important in NLP!
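A minimal sketch of the problem (not from the thread; it assumes the symbolic API of that MXNet era and a hypothetical binding): the gradient buffer for the embedding weight is dense and spans the entire vocabulary, even when a batch touches only a handful of rows.

import mxnet as mx

data = mx.sym.Variable("data")
embed = mx.sym.Embedding(data=data, input_dim=1000000, output_dim=50, name="embed")

# bind with a batch of 32 word ids
exe = embed.simple_bind(ctx=mx.cpu(), data=(32,))
exe.forward(is_train=True, data=mx.nd.array([1, 2, 3] * 10 + [4, 5]))
exe.backward(out_grads=mx.nd.ones((32, 50)))

# the weight gradient covers all 1,000,000 rows, not just the 5 distinct ids used
print(exe.grad_dict["embed_weight"].shape)  # -> (1000000, 50)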

@piiswrong
Contributor

This is due to mshadow's assignment mechanism. You should be able to fix it by writing a standalone CUDA kernel (and a CPU loop for the CPU version) to replace take_grad. Maybe copy a kernel from Caffe?
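For illustration, a NumPy sketch (not MXNet code) of what such a dedicated backward kernel would compute: scatter-add the output gradients into only the rows the input indices touch, instead of producing work proportional to the whole vocabulary.

import numpy as np

def embedding_backward(indices, grad_out, grad_in):
    # indices: (batch,) int word ids; grad_out: (batch, dim); grad_in: (vocab, dim)
    for i, idx in enumerate(indices):
        grad_in[idx] += grad_out[i]  # only the touched rows are written
    return grad_in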

@tqchen
Member

tqchen commented Jan 11, 2016

Related to #722; this is mainly due to gradient aggregation in the backward graph. We need to optimize the graph executor or change the gradient request to add. This should not be a problem with mshadow.
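A hedged sketch of the "change the gradient request to add" part (argument names assumed): grad_req can be set per argument at bind time so the weight gradient is accumulated in place rather than rewritten on every backward pass; the caller then has to zero it explicitly between updates.

import mxnet as mx

data = mx.sym.Variable("data")
embed = mx.sym.Embedding(data=data, input_dim=1000000, output_dim=50, name="embed")

exe = embed.simple_bind(
    ctx=mx.cpu(),
    data=(32,),
    grad_req={"data": "null", "embed_weight": "add"},  # accumulate instead of overwrite
)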

@piiswrong
Contributor

@tqchen I think @everwind is referring to a different problem.

// grad_in is a dense tensor with the same shape as the full weight matrix
Tensor<xpu, 2> grad_in = in_grad[embedding::kWeight].get<xpu, 2, real_t>(s);
// take_grad writes a gradient covering the whole (input_dim x output_dim) weight
grad_in = take_grad(data, grad_out, param_.input_dim);

The gradient has the same shape as the weight, but it actually only needs to update the rows referred to by the input indices. If you have a huge vocabulary, this can be really slow.

We probably need to support a general row-sparse matrix to fix this.
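For reference, a sketch of the row-sparse idea using the RowSparseNDArray format that later landed in MXNet: only the touched rows are stored, together with their row indices, so a gradient over a 1,000,000-row weight stays small.

import mxnet as mx

# values for 3 touched rows of a (1000000, 50) weight gradient
values = mx.nd.ones((3, 50))
row_ids = mx.nd.array([7, 42, 999999], dtype="int64")

grad = mx.nd.sparse.row_sparse_array((values, row_ids), shape=(1000000, 50))
print(grad.stype)              # -> 'row_sparse'
print(grad.indices.asnumpy())  # -> [     7     42 999999]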

@Godricly
Contributor

@piiswrong Is there any sparse matrix available now for this embedding layer? I just read the embedding layer code, and it seems the problem is still there.

@sxjscience
Member

+1 for general row sparse matrix support.

@phunterlau
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

@leopd
Contributor

leopd commented Sep 28, 2017

Please don't close valid & useful feature requests just because they are taking a long time to implement. To quote the OP: "Sparse gradient updates for word embeddings are very important in NLP!"

@sxjscience
Member

@leopd I believe the sparse gradient update feature has now been added by @eric-haibin-lin. See #7921 for a tutorial.

@leopd
Contributor

leopd commented Sep 28, 2017

Great! I'm super glad to hear that.

@eric-haibin-lin
Member

eric-haibin-lin commented Oct 2, 2017

The sparse updaters are in place, but we also need to override the Embedding operator to produce gradients in RowSparse format, which is planned for the next few weeks.
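For completeness, a hedged sketch of how this eventually looks in later MXNet releases (the sparse_grad flag and the Gluon API shown here postdate this comment): the Embedding block requests a row_sparse weight gradient, and the SGD updater only touches the stored rows.

import mxnet as mx
from mxnet import autograd, gluon

embed = gluon.nn.Embedding(input_dim=1000000, output_dim=50, sparse_grad=True)
embed.initialize()
trainer = gluon.Trainer(embed.collect_params(), "sgd", {"learning_rate": 0.1})

word_ids = mx.nd.array([3, 15, 3, 42])
with autograd.record():
    loss = embed(word_ids).sum()
loss.backward()

# the weight gradient is row_sparse: only rows 3, 15 and 42 are stored and updated
print(embed.weight.grad().stype)  # -> 'row_sparse'
trainer.step(batch_size=4)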

@jamesliu
Contributor

jamesliu commented Oct 4, 2017

Looking forward to it.

@tqchen tqchen closed this as completed Oct 19, 2017
@eric-haibin-lin eric-haibin-lin self-assigned this Oct 28, 2017
@eric-haibin-lin
Member

#8460
