Test failures with GPUs #276

Closed
bwohlberg opened this issue Apr 13, 2022 · 2 comments
Labels: bug (Something isn't working), tests (Pertaining to SCICO tests)

Comments

@bwohlberg
Collaborator

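Two unit tests on main are failing when run on GPUs. One of them (test_binary_op) should be easy to resolve via the usual strategy of changing a comparison tolerance, but the other (test_minimize, where np.linalg.pinv raises "SVD did not converge" while computing the expected result) looks a bit more serious.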

=================================== FAILURES ===================================
__________________________ test_minimize[CG-float32] ___________________________

dtype = <class 'jax.numpy.float32'>, method = 'CG'

    @pytest.mark.parametrize("dtype", [snp.float32, snp.complex64])
    @pytest.mark.parametrize("method", ["CG", "L-BFGS-B"])
    def test_minimize(dtype, method):
        from scipy.linalg import block_diag

        B, M, N = (4, 3, 2)

        # Models a 12x8 block-diagonal matrix with 4x3 blocks
        A, key = random.randn((B, M, N), dtype=dtype)
        x, key = random.randn((B, N), dtype=dtype)
        y = snp.sum(A * x[:, None], axis=2)  # contract along the N axis

        # result by directly inverting the dense matrix
        A_mat = block_diag(*A)
>       expected = np.linalg.pinv(A_mat) @ y.ravel()

scico/test/test_solver.py:172:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
<__array_function__ internals>:5: in pinv
    ???
/miniconda3/envs/py39gpu/lib/python3.9/site-packages/numpy/linalg/linalg.py:2002: in pinv
    u, s, vt = svd(a, full_matrices=False, hermitian=hermitian)
<__array_function__ internals>:5: in svd
    ???
/miniconda3/envs/py39gpu/lib/python3.9/site-packages/numpy/linalg/linalg.py:1660: in svd
    u, s, vh = gufunc(a, signature=signature, extobj=extobj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

err = 'invalid value', flag = 8

    def _raise_linalgerror_svd_nonconvergence(err, flag):
>       raise LinAlgError("SVD did not converge")
E       numpy.linalg.LinAlgError: SVD did not converge

/miniconda3/envs/py39gpu/lib/python3.9/site-packages/numpy/linalg/linalg.py:97: LinAlgError
_________________________ test_binary_op[float32-add] __________________________

testobj = <test_linop.LinearOperatorTestObj object at 0x7faa2049e160>
operator = <built-in function add>

    @pytest.mark.parametrize("operator", [op.add, op.sub])
    def test_binary_op(testobj, operator):
        # Our AbsMatOp class does not override the __add__, etc
        # so AbsMatOp + AbsMatOp -> LinearOperator
        # So to verify results, we evaluate the new LinearOperator on a random input

        comp_mat = operator(testobj.A, testobj.B)  # composite matrix
        comp_op = operator(testobj.Ao, testobj.Bo)  # composite linop

        assert isinstance(comp_op, linop.LinearOperator)  # Ensure we don't get a Map
        assert comp_op.input_dtype == testobj.A.dtype
>       np.testing.assert_allclose(comp_mat @ testobj.x, comp_op @ testobj.x, rtol=5e-5)
E       AssertionError:
E       Not equal to tolerance rtol=5e-05, atol=0
E
E       Mismatched elements: 1 / 8 (12.5%)
E       Max absolute difference: 9.536743e-07
E       Max relative difference: 7.0016424e-05
E        x: array([-2.417045,  1.69006 ,  2.91617 ,  0.009365,  3.242247,  9.085916,
E               7.729687, -7.39012 ], dtype=float32)
E        y: array([-2.417045,  1.69006 ,  2.916171,  0.009364,  3.242248,  9.085917,
E               7.729687, -7.39012 ], dtype=float32)

scico/test/linop/test_linop.py:108: AssertionError
=========================== short test summary info ============================
FAILED scico/test/test_solver.py::test_minimize[CG-float32] - numpy.linalg.Li...
FAILED scico/test/linop/test_linop.py::test_binary_op[float32-add] - Assertio...
====== 2 failed, 3071 passed, 3 skipped, 11 xfailed in 742.76s (0:12:22) =======
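For the second failure, the log reports a maximum relative difference of 7.0016424e-05 against rtol=5e-05 (with atol=0) on a single element out of 8. A purely relative tolerance is unforgiving for entries close to zero, such as the 0.009365 entry shown. Below is a minimal sketch of one way to loosen the comparison, assuming a small absolute tolerance is acceptable for this test; the helper `within_tolerance` and the perturbed values are hypothetical and are not taken from the SCICO test suite.

```python
import numpy as np


def within_tolerance(actual, desired, rtol, atol=0.0):
    """Hypothetical helper: True if np.testing.assert_allclose accepts the pair."""
    try:
        np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)
    except AssertionError:
        return False
    return True


# Illustrative float32 values (not the exact arrays from the log): only the
# near-zero third entry is perturbed, by an absolute amount of about 6.5e-7.
desired = np.array([-2.417045, 1.69006, 0.009365], dtype=np.float32)
actual = desired.copy()
actual[2] += np.float32(6.5e-7)

# Relative-only check: 6.5e-7 exceeds 5e-5 * 0.009365 (about 4.7e-7), so it fails.
print(within_tolerance(actual, desired, rtol=5e-5))  # False
# A small absolute tolerance absorbs the rounding error near zero.
print(within_tolerance(actual, desired, rtol=5e-5, atol=1e-6))  # True
```

Simply raising rtol would also make this pass; adding an absolute term instead keeps the check tight for the larger entries.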
bwohlberg added the bug (Something isn't working) and tests (Pertaining to SCICO tests) labels on Apr 13, 2022
bwohlberg added this to the Release 0.0.3 milestone on Jul 1, 2022
Michael-T-McCann self-assigned this on Jul 28, 2022
@Michael-T-McCann
Contributor

I only get the second (easier to fix) failure when I run these tests, with jaxlib == 0.3.5+cuda11.cudnn82 and jax == 0.3.5. @bwohlberg, can you help me reproduce the first one?

@bwohlberg
Collaborator Author

I was able to replicate the error with Python 3.9.12, jaxlib 0.3.5 (0.3.5+cuda11.cudnn82) and jax 0.3.6. It does not occur if one just runs pytest -x scico/test/test_solver.py, so there must be some sort of dependency on state left behind by tests that run before the failing test.
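Since the first failure arises while computing the reference solution rather than in the solver under test, one way to make that step less fragile is sketched below. It assumes the SVD problem is tied to single-precision inputs or to non-finite values in A or y, which the log does not confirm. The hypothetical helper `reference_solution` promotes the inputs to double precision, rejects non-finite values explicitly, and uses the fact that the pseudoinverse of a block-diagonal matrix is the block-diagonal matrix of the blocks' pseudoinverses, so it never forms the dense 12x8 matrix.

```python
import numpy as np


def reference_solution(A_blocks, y):
    """Hypothetical helper: pinv solution of a block-diagonal system.

    A_blocks has shape (B, M, N) and y has shape (B, M). The result matches
    pinv(block_diag(*A_blocks)) @ y.ravel(), but is computed block by block
    in double precision.
    """
    A = np.asarray(A_blocks)
    b = np.asarray(y)
    # Promote float32 -> float64 and complex64 -> complex128.
    A = A.astype(np.result_type(A.dtype, np.float64))
    b = b.astype(np.result_type(b.dtype, np.float64))
    # Fail with a clear message instead of an opaque LinAlgError.
    if not (np.all(np.isfinite(A)) and np.all(np.isfinite(b))):
        raise ValueError("non-finite values in reference-solution inputs")
    # np.linalg.pinv works on stacks of matrices: (B, M, N) -> (B, N, M).
    x = np.linalg.pinv(A) @ b[:, :, None]  # shape (B, N, 1)
    return x.ravel()


# Usage with the same shapes as test_minimize: (B, M, N) = (4, 3, 2).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3, 2)).astype(np.float32)
x_true = rng.standard_normal((4, 2)).astype(np.float32)
y = np.sum(A * x_true[:, None], axis=2)  # same contraction as the test
x_ref = reference_solution(A, y)
```

This does not by itself explain why the failure depends on earlier tests, but if the inputs do turn out to be non-finite, the ValueError points at the data rather than at the SVD.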

bwohlberg pushed a commit that referenced this issue Aug 1, 2022