TestNeighbors.py failures with pytorch 1.13 #84

Closed
sef43 opened this issue Feb 27, 2023 · 3 comments · Fixed by #91
Labels
bug Something isn't working

Comments

@sef43
Contributor

sef43 commented Feb 27, 2023

If I install NNPOps with pytorch 1.13

conda install -c conda-forge nnpops pytorch=1.13

or build it from source with pytorch 1.13, the gradient tests in TestNeighbors.py fail with a pytorch runtime error, e.g.:

FAILED TestNeighbors.py::test_neighbor_grads[distances-1-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [0]], which is output 0 of NormBackward1, is at version 1;...

Output of

pytest TestNeighbors.py
=================================================================================== short test summary info ====================================================================================
FAILED TestNeighbors.py::test_neighbor_grads[distances-1-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [0]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-1-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [0]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-2-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-2-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [1]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-3-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-3-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [3]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-4-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [6]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[distances-4-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [6]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-5-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-5-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [10]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[distances-10-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [45]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[distances-10-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [45]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[distances-100-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4950]], which is output 0 of NormBackward1, is at version...
FAILED TestNeighbors.py::test_neighbor_grads[distances-100-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [4950]], which is output 0 of NormBackward1, is at versio...
FAILED TestNeighbors.py::test_neighbor_grads[distances-1000-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [499500]], which is output 0 of NormBackward1, is at versi...
FAILED TestNeighbors.py::test_neighbor_grads[distances-1000-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [499500]], which is output 0 of NormBackward1, is at vers...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [0]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [0]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-2-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-2-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [1]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-3-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [3]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-3-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [3]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-4-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [6]], which is output 0 of NormBackward1, is at version 1;...
FAILED TestNeighbors.py::test_neighbor_grads[combined-4-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [6]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-5-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [10]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-5-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [10]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[combined-10-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [45]], which is output 0 of NormBackward1, is at version 1...
FAILED TestNeighbors.py::test_neighbor_grads[combined-10-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [45]], which is output 0 of NormBackward1, is at version ...
FAILED TestNeighbors.py::test_neighbor_grads[combined-100-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4950]], which is output 0 of NormBackward1, is at version...
FAILED TestNeighbors.py::test_neighbor_grads[combined-100-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [4950]], which is output 0 of NormBackward1, is at versio...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1000-dtype0] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [499500]], which is output 0 of NormBackward1, is at versi...
FAILED TestNeighbors.py::test_neighbor_grads[combined-1000-dtype1] - RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.DoubleTensor [499500]], which is output 0 of NormBackward1, is at vers...
========================================================================== 32 failed, 214 passed, 2 warnings in 8.07s ==========================================================================

The tests pass with pytorch=1.12.

Is anyone else able to reproduce this?
I am running on Linux with CUDA 11.7.
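For context, here is a minimal sketch of how this class of error can arise in plain PyTorch. It is not the NNPOps code; the function names and the scaling by 2.0 are hypothetical and only illustrate the mechanism. `norm` keeps its output for the backward pass, so modifying that output in place invalidates the saved tensor, while the out-of-place variant leaves it untouched.

```python
import torch as pt


def distances_inplace(positions):
    # Hypothetical example: norm() saves its output for the backward pass.
    distances = pt.linalg.norm(positions[1:] - positions[:1], dim=1)
    # The in-place edit bumps the version counter of the saved output.
    distances.mul_(2.0)
    return distances


def distances_out_of_place(positions):
    distances = pt.linalg.norm(positions[1:] - positions[:1], dim=1)
    # A new tensor is created, so the saved output stays at version 0.
    return distances * 2.0


def try_backward(fn):
    """Run fn on two test positions and return (gradient, error message)."""
    positions = pt.tensor([[0.0, 0.0, 0.0], [3.0, 4.0, 0.0]], requires_grad=True)
    try:
        fn(positions).sum().backward()
    except RuntimeError as err:
        return None, str(err)
    return positions.grad, None


if __name__ == "__main__":
    print(try_backward(distances_inplace)[1])
    print(try_backward(distances_out_of_place)[0])
```

Whether the NNPOps CPU implementation triggers the error in the same way is not established by this sketch.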

@RaulPPelaez
Contributor

I am compiling from source and I can reproduce this, also using Linux with CUDA 11.7.

@RaulPPelaez
Contributor

The Torch error message suggests rerunning the offending code with anomaly detection enabled:

    pt.autograd.set_detect_anomaly(True)

Running the tests with that enabled flags the following lines as offending:

distances_cpu.sum().backward()

(deltas_cpu.sum() + distances_cpu.sum()).backward()

Is this meaningful to you? @sef43 @raimis
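As a rough illustration of what anomaly detection does, here is a self-contained sketch that uses the scoped context manager `pt.autograd.detect_anomaly()` instead of the global switch. The tensor and the in-place `add_` are made up for the example and have nothing to do with the NNPOps tests. With anomaly detection on, PyTorch also emits a warning with the traceback of the forward operation whose backward failed, which is how the offending lines are found.

```python
import torch as pt


def failing_backward_with_anomaly_detection():
    """Trigger an in-place autograd error under anomaly detection.

    Returns the error message, or None if backward() unexpectedly succeeds.
    """
    x = pt.tensor([1.0, 4.0, 9.0], requires_grad=True)
    # Anomaly detection is only active inside this block.
    with pt.autograd.detect_anomaly():
        y = x.sqrt()   # sqrt's backward reuses its own output
        y.add_(1.0)    # hypothetical in-place edit of that saved output
        try:
            y.sum().backward()
        except RuntimeError as err:
            return str(err)
    return None


if __name__ == "__main__":
    print(failing_backward_with_anomaly_detection())
```

Anomaly detection slows execution down, so it is best kept to debugging runs.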

@raimis raimis added the bug Something isn't working label Mar 2, 2023
@RaulPPelaez RaulPPelaez mentioned this issue Mar 3, 2023
@sef43
Contributor Author

sef43 commented Mar 7, 2023

This only happens with the NNPOps CPU implementation. The changes in #91 seem to fix it.

@raimis raimis closed this as completed in #91 Mar 8, 2023