
can mxnet provide the sparse gradient update for word embedding #1237

Closed
everwind opened this issue Jan 11, 2016 · 12 comments

@everwind

embed = mx.sym.Embedding(data=input, input_dim=1000000,output_dim=50, name="embed" )
The backward pass of this Embedding symbol is very slow for a large vocabulary.

Sparse gradient updates for word embeddings are very important in NLP!
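A minimal sketch of the problem (not from the thread; it assumes the symbolic API of that MXNet era and a hypothetical binding): the gradient buffer for the embedding weight is dense and spans the entire vocabulary, even when a batch touches only a handful of rows.

import mxnet as mx

data = mx.sym.Variable("data")
embed = mx.sym.Embedding(data=data, input_dim=1000000, output_dim=50, name="embed")

# bind with a batch of 32 word ids
exe = embed.simple_bind(ctx=mx.cpu(), data=(32,))
exe.forward(is_train=True, data=mx.nd.array([1, 2, 3] * 10 + [4, 5]))
exe.backward(out_grads=mx.nd.ones((32, 50)))

# the weight gradient covers all 1,000,000 rows, not just the 5 distinct ids used
print(exe.grad_dict["embed_weight"].shape)  # -> (1000000, 50)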

@piiswrong
Contributor

This is due to mshadow's assignment mechanism. You should be able to fix it by writing a standalone CUDA kernel (and a CPU loop for the CPU version) to replace take_grad. Maybe copy a kernel from Caffe?
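For illustration, a NumPy sketch (not MXNet code) of what such a dedicated backward kernel would compute: scatter-add the output gradients into only the rows the input indices touch, instead of producing work proportional to the whole vocabulary.

import numpy as np

def embedding_backward(indices, grad_out, grad_in):
    # indices: (batch,) int word ids; grad_out: (batch, dim); grad_in: (vocab, dim)
    for i, idx in enumerate(indices):
        grad_in[idx] += grad_out[i]  # only the touched rows are written
    return grad_in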

@tqchen
Member

tqchen commented Jan 11, 2016

Related to #722; this is mainly due to gradient aggregation in the backward graph. We need to optimize the graph executor or change the gradient request to add. This should not be a problem with mshadow.
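A hedged sketch of the "change the gradient request to add" part (argument names assumed): grad_req can be set per argument at bind time so the weight gradient is accumulated in place rather than rewritten on every backward pass; the caller then has to zero it explicitly between updates.

import mxnet as mx

data = mx.sym.Variable("data")
embed = mx.sym.Embedding(data=data, input_dim=1000000, output_dim=50, name="embed")

exe = embed.simple_bind(
    ctx=mx.cpu(),
    data=(32,),
    grad_req={"data": "null", "embed_weight": "add"},  # accumulate instead of overwrite
)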

@piiswrong
Contributor

@tqchen I think @everwind is referring to a different problem.

// grad_in is a dense tensor with the same shape as the full weight matrix
Tensor<xpu, 2> grad_in = in_grad[embedding::kWeight].get<xpu, 2, real_t>(s);
// take_grad writes a gradient covering the whole (input_dim x output_dim) weight
grad_in = take_grad(data, grad_out, param_.input_dim);

The gradient has the same shape as the weight, but it actually only needs to update the rows referred to by the input indices. If you have a huge vocabulary, this can be really slow.

We probably need to support a general row-sparse matrix to fix this.
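For reference, a sketch of the row-sparse idea using the RowSparseNDArray format that later landed in MXNet: only the touched rows are stored, together with their row indices, so a gradient over a 1,000,000-row weight stays small.

import mxnet as mx

# values for 3 touched rows of a (1000000, 50) weight gradient
values = mx.nd.ones((3, 50))
row_ids = mx.nd.array([7, 42, 999999], dtype="int64")

grad = mx.nd.sparse.row_sparse_array((values, row_ids), shape=(1000000, 50))
print(grad.stype)              # -> 'row_sparse'
print(grad.indices.asnumpy())  # -> [     7     42 999999]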

@Godricly
Contributor

@piiswrong Is there any sparse matrix available now for this embedding layer? I just read the embedding layer code, and it seems the problem is still there.

@sxjscience
Member

+1 for general row sparse matrix support.

@phunterlau
Contributor

This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!

@leopd
Contributor

leopd commented Sep 28, 2017

Please don't close valid & useful feature requests just because they are taking a long time to implement. To quote the OP: "Sparse gradient updates for word embeddings are very important in NLP!"

@sxjscience
Member

@leopd I believe the sparse gradient update feature has now been added by @eric-haibin-lin. See #7921 for a tutorial.

@leopd
Contributor

leopd commented Sep 28, 2017

Great! I'm super glad to hear that.

@eric-haibin-lin
Member

eric-haibin-lin commented Oct 2, 2017

The sparse updaters are in place, but we also need to override the Embedding operator to produce gradients in RowSparse format, which is planned for the next few weeks.
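For completeness, a hedged sketch of how this eventually looks in later MXNet releases (the sparse_grad flag and the Gluon API shown here postdate this comment): the Embedding block requests a row_sparse weight gradient, and the SGD updater only touches the stored rows.

import mxnet as mx
from mxnet import autograd, gluon

embed = gluon.nn.Embedding(input_dim=1000000, output_dim=50, sparse_grad=True)
embed.initialize()
trainer = gluon.Trainer(embed.collect_params(), "sgd", {"learning_rate": 0.1})

word_ids = mx.nd.array([3, 15, 3, 42])
with autograd.record():
    loss = embed(word_ids).sum()
loss.backward()

# the weight gradient is row_sparse: only rows 3, 15 and 42 are stored and updated
print(embed.weight.grad().stype)  # -> 'row_sparse'
trainer.step(batch_size=4)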

@jamesliu
Contributor

jamesliu commented Oct 4, 2017

Looking forward to it.

@tqchen tqchen closed this as completed Oct 19, 2017
@eric-haibin-lin eric-haibin-lin self-assigned this Oct 28, 2017
@eric-haibin-lin
Member

#8460
