[Bug] Inv for matrice bigger than 1000 #92

mfauvel · 2024-06-21T05:17:20Z

🐛 Bug

There is an issue with inv function for matrices bigger than 1000. For size larger or equal to 1001, the function hangs for ever and memory issues happends for size larger than 1300 (on my laptop).

To reproduce

** Code snippet to reproduce **

for n_samples in [999, 1000, 1300, 2000]:
    # Generate dummy data
    A = torch.rand(n_samples, n_samples)
    A = (A.T + A) / 2
    Ac = Dense(A)

    # Solve with torch
    ts = time.time()
    inv_A = torch.linalg.inv(A)
    proc_time_torch = time.time() - ts

    # Solve with cola
    ts = time.time()
    inv_Ac = inv(Ac).to_dense()
    proc_time_cola = time.time() - ts

    if torch.allclose(inv_A, inv_Ac):
        print(
            f"Processing time for n_samples {n_samples} with "
            f"torch and cola: {proc_time_torch} _ {proc_time_cola}"
        )
    else:
        print(f"inv does not match for n_samples: {n_samples}")

** Stack trace/error message **

Processing time for n_samples 999 with torch and cola: 0.03584551811218262 _ 0.03255629539489746
Processing time for n_samples 1000 with torch and cola: 0.022265911102294922 _ 0.02802872657775879
Process stopped

Expected Behavior

I would say the process should not hang.

System information

Please complete the following information:

Cola version: 0.0.5
Torch version: 2.0.1+cu117
OS: Debian GNU/Linux 12 (bookworm)

Additional context

It seems related to #41

I encounter the problem first with Sparse matrix, but the problem is not related to Sparse since I can reproduce the error with Dense matrix.

The text was updated successfully, but these errors were encountered:

mfauvel · 2024-06-21T08:00:13Z

I try another check, using solverather than inv. I don't have the memory issue and the crash, but the results are not accurate for dim higher than 1000 and the processing time is really more important with cola than torch.

After checking how the inverse algorithm is selected (

cola/cola/linalg/inverse/inv.py

Line 71 in 9562ae1

def inv(A: LinearOperator, alg: Auto):

) I force the alg to be LU and now everything is fine.

import time
import torch
import cola
from cola.ops import Sparse, Dense, Identity
from cola.linalg import solve
from cola.linalg.inverse.cg import CG
from cola.linalg.decompositions.decompositions import LU
print(cola.__version__)
print(torch.__version__)

for n_samples in [999, 1000, 1001, 1300, 2000]:
    # Generate dummy data
    A = torch.rand(n_samples, n_samples, dtype=torch.float64)
    b = torch.rand(n_samples, 1, dtype=torch.float64)
    A = (A.T + A) / 2
    Ac = Dense(A)

    # Solve with torch
    ts = time.time()
    x = torch.linalg.solve(A, b)
    proc_time_torch = time.time() - ts

    # Solve with cola
    ts = time.time()
    xc = solve(Ac, b, alg=LU()).to_dense()
    proc_time_cola = time.time() - ts

    print(
        f"Processing time for n_samples {n_samples} with "
        f"torch and cola: {proc_time_torch} _ {proc_time_cola}"
    )
    print(f"inv does not match for n_samples: {n_samples}")
    print(f"Residual for torch: {torch.mean((A @ x -b)**2)}")
    print(f"Residual for cola: {torch.mean((A @ xc-b)**2)}")

which leads the following outputs:

0.0.5
2.0.1+cu117
Processing time for n_samples 999 with torch and cola: 0.009793996810913086 _ 0.02040863037109375
inv does not match for n_samples: 999
Residual for torch: 7.200492875557123e-25
Residual for cola: 2.4516683688616998e-25
Processing time for n_samples 1000 with torch and cola: 0.008498907089233398 _ 0.016102075576782227
inv does not match for n_samples: 1000
Residual for torch: 1.4491477357523913e-26
Residual for cola: 3.9325588238346365e-27
Processing time for n_samples 1001 with torch and cola: 0.009015798568725586 _ 0.01718282699584961
inv does not match for n_samples: 1001
Residual for torch: 4.434132455473258e-26
Residual for cola: 8.911297474077102e-27
Processing time for n_samples 1300 with torch and cola: 0.015895366668701172 _ 0.026092052459716797
inv does not match for n_samples: 1300
Residual for torch: 7.919092478759107e-25
Residual for cola: 1.8887265316154846e-25
Processing time for n_samples 2000 with torch and cola: 0.04905390739440918 _ 0.06647753715515137
inv does not match for n_samples: 2000
Residual for torch: 1.8726508482806117e-25
Residual for cola: 4.7504923991884954e-26

So i would say something is wrong with GMRES ?

mfinzi · 2024-07-09T21:42:41Z

Hi @mfauvel,

Sorry for the late reply. I think I can see what is going on in your test here.

I believe the key problem is from calling to_dense() on inv(A), which implicitly is a wrapper on performing linear solves iteratively with GMRES (or CG in the symmetric case). When you call to_dense the iterative algorithm is run on the massive RHS being the (n x n) identity matrix, defeating the purpose of using an iterative algorithm in the first place, requiring a massive memory allocation and a very long runtime. The function hanging is likely due more to memory than compute, and we can look at adding an additional batching loop to to_dense() to make sure that even in unlikely cases such as these, the computation will eventually terminate. However, one should be very careful with using to_dense() (often to be used just for debugging purposes), as it will require forming the entire matrices in memory.

Instead, the intended use case of inv is to multiply it with a vector (n, ) or skinny matrix (n,k) with k<<n: inv(A)@b (a sometimes more convenient way writing cola.linalg.solve(A,b)). When to_dense() is called instead you get a multiply with a (n,n) matrix. We will try to make this more evident in the documentation.

The reason why this only happens at n>1000 is that the cola.Auto algorithm switches from dense methods (such as LU) to iterative methods (such as GMRES) at n=1000.

Hope this answers your question.

mfauvel · 2024-07-10T14:37:01Z

Thanks @mfinzi for your answer. I think it explained my initial comment. However, in my second comment (#92 (comment)), I effectively used cola.linalg.solve and as you said i did not face memory issues anymore. But the numerical precision is not accurate if I did not impose LU algorithm:

for n_samples in [999, 1000, 1001, 1300, 2000]:
    # Generate dummy data
    A = torch.rand(n_samples, n_samples, dtype=torch.float64)
    b = torch.rand(n_samples, 1, dtype=torch.float64)
    # A = (A.T + A) / 2
    Ac = Dense(A)

    # Solve with torch
    ts = time.time()
    x = torch.linalg.solve(A, b)
    proc_time_torch = time.time() - ts

    # Solve with cola
    ts = time.time()
    # xc = solve(Ac, b, alg=LU()).to_dense()
    xc = solve(Ac, b).to_dense()
    proc_time_cola = time.time() - ts

    print(
        f"Processing time for n_samples {n_samples} with "
        f"torch and cola: {proc_time_torch} _ {proc_time_cola}"
    )
    print(f"inv does not match for n_samples: {n_samples}")
    print(f"Residual for torch: {torch.mean((A @ x -b)**2)}")
    print(f"Residual for cola: {torch.mean((A @ xc-b)**2)}")

Processing time for n_samples 999 with torch and cola: 0.015241384506225586 _ 0.025005340576171875
inv does not match for n_samples: 999
Residual for torch: 3.769979443344865e-26
Residual for cola: 4.5632820650815264e-26
Processing time for n_samples 1000 with torch and cola: 0.01306605339050293 _ 0.023195981979370117
inv does not match for n_samples: 1000
Residual for torch: 6.740551482619981e-27
Residual for cola: 6.062314891002271e-27
Processing time for n_samples 1001 with torch and cola: 0.011551856994628906 _ 13.982511281967163
inv does not match for n_samples: 1001
Residual for torch: 1.3138882670132605e-26
Residual for cola: 4.7555078392359486e-05
Processing time for n_samples 1300 with torch and cola: 0.02622056007385254 _ 24.619579076766968
inv does not match for n_samples: 1300
Residual for torch: 1.0700557343056313e-26
Residual for cola: 3.9548693482973776
Processing time for n_samples 2000 with torch and cola: 0.04886674880981445 _ 24.41248393058777
inv does not match for n_samples: 2000
Residual for torch: 1.1910256580485765e-25
Residual for cola: 78.85581323368416

I may misused the algorithm GMRES ?

Anyway thanks a lot for the Cola package, it simplifies a lot my work !

mfauvel added the bug Something isn't working label Jun 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] Inv for matrice bigger than 1000 #92

[Bug] Inv for matrice bigger than 1000 #92

mfauvel commented Jun 21, 2024

mfauvel commented Jun 21, 2024 •

edited

Loading

mfinzi commented Jul 9, 2024

mfauvel commented Jul 10, 2024

[Bug] Inv for matrice bigger than 1000 #92

[Bug] Inv for matrice bigger than 1000 #92

Comments

mfauvel commented Jun 21, 2024

🐛 Bug

To reproduce

Expected Behavior

System information

Additional context

mfauvel commented Jun 21, 2024 • edited Loading

mfinzi commented Jul 9, 2024

mfauvel commented Jul 10, 2024

mfauvel commented Jun 21, 2024 •

edited

Loading