Can MXNet provide sparse gradient updates for word embeddings? #1237
```python
embed = mx.sym.Embedding(data=input, input_dim=1000000, output_dim=50, name="embed")
```

The backward pass of this embed symbol is very slow for a large vocabulary. Sparse gradient updates for word embeddings in NLP are so important!
Comments
This is due to mshadow's assignment mechanism. You should be able to do it by writing a standalone CUDA kernel (and a CPU loop for the CPU version) to replace take_grad. Maybe copy a kernel from Caffe?
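To illustrate what such a kernel has to do, here is a minimal NumPy sketch of the scatter-accumulate backward. The shapes and names (`vocab_size`, `indices`, `grad_out`) are illustrative, not MXNet's actual take_grad signature:

```python
import numpy as np

# Illustrative shapes: a batch of token indices and the gradient
# flowing back from the embedding output.
vocab_size, embed_dim = 1000, 50
indices = np.array([3, 17, 3, 42])                   # token ids in the batch
grad_out = np.random.rand(len(indices), embed_dim)   # d(loss)/d(output)

# What the dense backward effectively computes: a gradient over the
# whole vocabulary, even though only a few rows are nonzero.
grad_in = np.zeros((vocab_size, embed_dim))
# Duplicate indices must accumulate rather than overwrite, so use
# np.add.at instead of fancy-indexed assignment.
np.add.at(grad_in, indices, grad_out)
```

A CUDA version would parallelize this loop, using atomic adds (or a sort plus segmented reduction) to handle repeated indices correctly.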
Related to #722. This is mainly due to the gradient aggregation in the backward graph; we need to optimize the graph executor, or change the gradient request to add. This should not be a problem with mshadow.
@tqchen I think @everwind is referring to a different problem:

```cpp
Tensor<xpu, 2> grad_in = in_grad[embedding::kWeight].get<xpu, 2, real_t>(s);
grad_in = take_grad(data, grad_out, param_.input_dim);
```

The gradient has the same dimensions as the weight, but it actually only needs to update the rows referred to by the input indices. If you have a huge vocabulary this can be really slow. We probably need to support a general row sparse matrix to fix this.
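A minimal sketch of the row sparse idea, in illustrative NumPy (the names here are not an actual MXNet API): keep only the rows that appear in the batch, together with their indices.

```python
import numpy as np

indices = np.array([3, 17, 3, 42])           # token ids in the batch
grad_out = np.random.rand(len(indices), 50)  # d(loss)/d(output)

# Deduplicate: `rows` are the touched vocabulary entries, `inverse`
# maps each batch position to its slot in `rows`.
rows, inverse = np.unique(indices, return_inverse=True)
grad_data = np.zeros((len(rows), grad_out.shape[1]))
np.add.at(grad_data, inverse, grad_out)

# (rows, grad_data) represents the row sparse gradient: storage and
# update cost scale with the batch size, not the vocabulary size.
```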
@piiswrong Is there any sparse matrix available now for this embedding layer? I just read the embedding layer code, and it seems that the problem is still there.
+1 for general row sparse matrix support.
This issue is closed due to lack of activity in the last 90 days. Feel free to reopen if this is still an active issue. Thanks!
Please don't close valid and useful feature requests just because they are taking a long time to implement. To quote the OP: "Sparse gradient updates for word embeddings in NLP are so important!"
@leopd I believe the sparse gradient update feature has now been added by @eric-haibin-lin. See #7921 for a tutorial.
Great! I'm super glad to hear that.
The sparse updaters are in place, but we also need to override the Embedding operator to produce gradients in row_sparse format.
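For later readers: MXNet eventually exposed this through a `sparse_grad` flag on the Embedding operator (availability depends on your release; check the docs). A sketch of the usage:

```python
import mxnet as mx

data = mx.sym.Variable('data')
# With sparse_grad=True the gradient w.r.t. the weight is returned in
# row_sparse storage, so the updater only touches rows seen in the batch.
embed = mx.sym.Embedding(data=data, input_dim=1000000, output_dim=50,
                         sparse_grad=True, name='embed')
```

Combined with the sparse-aware updaters (e.g. SGD with lazy updates), the per-step cost then depends on the batch, not on input_dim.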
Looking forward to it.