This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Numpy] The gradient of einsum is wrong #18102

Closed
sxjscience opened this issue Apr 20, 2020 · 4 comments

Comments

@sxjscience
Member

sxjscience commented Apr 20, 2020

The gradient of einsum is not reliable; the following is just one example, and there are multiple scenarios in which the gradient is wrong. The operator has both performance issues, as noted in #18043, and numerical problems.

We should recommend that users not use einsum in MXNet until these issues are fixed.
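
For reference (not part of the original report), the expected gradients for this contraction follow from the chain rule for a batched matrix product: with

$$C_{bnic} = \sum_{j} A_{bnij} B_{bnjc},$$

the chain rule gives

$$\frac{\partial L}{\partial A_{bnij}} = \sum_{c} \frac{\partial L}{\partial C_{bnic}} B_{bnjc}, \qquad \frac{\partial L}{\partial B_{bnjc}} = \sum_{i} A_{bnij} \frac{\partial L}{\partial C_{bnic}},$$

i.e. per (b, n) slice, grad_A = out_grad @ B.T and grad_B = A.T @ out_grad, which is what the script below uses as ground truth.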

import numpy as np
import mxnet as mx
from numpy.testing import assert_allclose
mx.npx.set_np()
mx.npx.random.seed(123)

ctx = mx.cpu()

# Batched matrix product: A is (b, n, 5, 3), B is (b, n, 3, 2) with b = n = 1.
A = mx.np.random.normal(0, 1, (1, 1, 5, 3), ctx=ctx)
B = mx.np.random.normal(0, 1, (1, 1, 3, 2), ctx=ctx)
out_grad = mx.np.random.normal(0, 1, (1, 1, 5, 2), ctx=ctx)

A.attach_grad()
B.attach_grad()

# Backprop through einsum.
with mx.autograd.record():
    out = mx.np.einsum('bnij,bnjc->bnic', A, B)
    out.backward(out_grad)

# Ground truth computed with plain NumPy on the single (5, 3) x (3, 2) slice.
out_gt = A.asnumpy()[0, 0].dot(B.asnumpy()[0, 0])
A_gt_grad = out_grad.asnumpy()[0, 0].dot(B.asnumpy()[0, 0].T)
B_gt_grad = A.asnumpy()[0, 0].T.dot(out_grad.asnumpy()[0, 0])
A_einsum_grad = A.grad.asnumpy()
B_einsum_grad = B.grad.asnumpy()

# Backprop through matmul for the same contraction.
A.grad[:] = 0
B.grad[:] = 0
with mx.autograd.record():
    out = mx.np.matmul(A, B)
    out.backward(out_grad)
A_matmul_grad = A.grad.asnumpy()
B_matmul_grad = B.grad.asnumpy()

# matmul agrees with the ground truth; einsum does not.
assert_allclose(A_gt_grad, A_matmul_grad[0, 0], 1E-5, 1E-5)
assert_allclose(B_gt_grad, B_matmul_grad[0, 0], 1E-5, 1E-5)
assert_allclose(A_gt_grad, A_einsum_grad[0, 0], 1E-5, 1E-5)
assert_allclose(B_gt_grad, B_einsum_grad[0, 0], 1E-5, 1E-5)

Error:

AssertionError: 
Not equal to tolerance rtol=1e-05, atol=1e-05

Mismatched elements: 15 / 15 (100%)
Max absolute difference: 1.7815545
Max relative difference: 656.66394
 x: array([[ 1.226955,  0.715323, -0.593543],
       [-0.479941, -0.21298 ,  0.192811],
       [-1.428259, -0.64951 ,  0.583035],...
 y: array([[-5.545996e-01,  1.247264e-02,  7.049765e-02],
       [-5.368751e-02,  1.207402e-03,  6.824460e-03],
       [-9.618776e-02,  2.163209e-03,  1.222686e-02],...
@sxjscience
Member Author

sxjscience commented Apr 20, 2020

@yzhliu @hzfan @szha @leezu FYI

@hzfan
Contributor

hzfan commented Apr 20, 2020

Thanks for bringing this up. Will check it out.

yzhliu added the v2.0 label Apr 29, 2020
@yzhliu
Member

yzhliu commented Apr 29, 2020

Assignee: @hanke580

@hanke580
Contributor

hanke580 commented Jun 3, 2020

PR #18419 fixes the gradient.
All checks passed.

sxjscience pushed a commit that referenced this issue Jun 3, 2020
* Fix einsum Bug
* Fix sanity
* Fix one dim start bug
* Fix test case gt
yijunc pushed a commit to yijunc/incubator-mxnet that referenced this issue Jun 9, 2020
AntiZpvoh pushed a commit to AntiZpvoh/incubator-mxnet that referenced this issue Jul 6, 2020