
Debugging LinAlgError - any idea what is going on? #132

Open
martinfleis opened this issue Nov 2, 2023 · 7 comments

@martinfleis
Member

I've been occasionally hitting LinAlgError as reported in #94 or #116, and I wanted to better understand what is causing it and when it happens. This is one of the toy examples I came up with where I can reproduce it but have no idea why.

import numpy as np

arange = np.arange(0, 10).reshape(-1, 1)
coords = np.column_stack([arange, arange])
y = np.random.random(size=(10,1))
X = np.concatenate([arange, arange * 2, arange * 3], axis=1)

Then using this very specific bandwidth, I get the singular matrix error

import mgwr.gwr

mgwr.gwr.GWR(coords, y, X, bw=12.391785706039375, fixed=True).fit()

Changing it even slightly to a larger (12.392) or smaller (12.390) value makes it work again, but I am just not able to figure out where this number comes from. My first idea was that it is some specific pairwise distance, but it is not.

The requirement for this to happen is collinearity within X, but why it happens for this specific bandwidth is unclear to me. Does anyone have an idea?

I started digging into this to either fix it, as @ljwolf suggested in #116, or at least provide an informative error message, but given that I am not sure what exactly is going on, I don't even know how to formulate the error.

@TaylorOshan
Collaborator

I think there are multiple ways the data can become ill-conditioned enough to throw an error like this, and it is sometimes tricky to diagnose because it can arise from the original data set itself or only after the data are transformed by the weights. In my experience, it sometimes arises from small samples/bandwidths, a lack of variation within samples, or both.

I suppose it would be possible to pull out the individual weighted sample for the particular local regression to try to diagnose things more formally, but implementing a fix as @ljwolf suggested in #116 may help stabilize things even without fully understanding the cause.
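
For example, something along these lines could pull out one local weighted sample and check its conditioning. This is a rough sketch only: local_condition is a hypothetical helper using a plain Gaussian kernel on the toy data from the first comment, not mgwr's actual kernel code, so the weights may not match mgwr exactly.

import numpy as np

# toy data from the first comment
arange = np.arange(0, 10).reshape(-1, 1)
coords = np.column_stack([arange, arange])
X = np.concatenate([arange, arange * 2, arange * 3], axis=1)

def local_condition(coords, X, i, bw):
    # Gaussian-style kernel weights from calibration point i to all observations
    d = np.sqrt(((coords - coords[i]) ** 2).sum(axis=1))
    w = np.exp(-0.5 * (d / bw) ** 2)
    Xw = np.sqrt(w)[:, None] * X          # weighted design matrix for point i
    xtx = Xw.T @ Xw                       # the matrix handed to the inner solve
    return np.linalg.cond(xtx), np.linalg.matrix_rank(xtx)

for i in range(len(coords)):
    cond, rank = local_condition(coords, X, i, bw=12.391785706039375)
    print(i, f"cond={cond:.3e}", f"rank={rank}")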

@martinfleis
Member Author

> implementing a fix as @ljwolf suggested in #116 may help stabilize things even without fully understanding the cause.

Possibly, but it would need to go beyond what @ljwolf suggested, as this specific error comes from linalg.solve here: https://github.com/pysal/spglm/blob/ee2d424156118278a3c3f292bf55a9f5d3ff6f7e/spglm/iwls.py#L27-L39. I don't see how replacing inv with a pseudo-inverse would help (I actually even tried it; the tests pass but the error remains).
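
To illustrate on the toy data above (plain numpy, not the spglm code itself): the error is raised by the solve against the singular normal matrix, so only replacing that particular call, e.g. with a pseudo-inverse or a least-squares solver, would change the behaviour.

import numpy as np

arange = np.arange(0, 10).reshape(-1, 1)
X = np.concatenate([arange, arange * 2, arange * 3], axis=1)

np.linalg.pinv(X.T @ X) @ X.T   # a pseudo-inverse returns a minimum-norm answer
np.linalg.solve(X.T @ X, X.T)   # whereas solve raises LinAlgError: Singular matrix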

@martinfleis
Member Author

@TaylorOshan do you have any idea how to formulate an error message that is generic enough but gives a user a bit more idea of what has happened than the current LinAlgError: Singular matrix?

@knaaptime
Member

well, Taylor may disagree with me ;P but multicollinearity can happen in unforeseen ways in GWR

the opaque but foreboding error is a reasonable way to let people know they need to think more about the model, imo :)

@ljwolf
Member

ljwolf commented Nov 5, 2023 via email

@martinfleis
Member Author

This is way too deep into stats/maths for me 🙃. I'll make a PR with a custom error message and ask you to review its text, but will leave the solution using a pseudo-inverse or anything else to someone more capable.
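
As a rough illustration of the kind of message I have in mind (a sketch only; the function name is loosely modelled on the inner solve in spglm's iwls, but the body here is simplified and is not the actual code):

import numpy as np

def _compute_betas(y, x):
    # inner weighted least squares step; x is the (weighted) design matrix
    xtx = x.T @ x
    try:
        xtx_inv_xt = np.linalg.solve(xtx, x.T)
    except np.linalg.LinAlgError as err:
        raise np.linalg.LinAlgError(
            "Singular matrix in the weighted least squares fit. This usually "
            "means the (weighted) design matrix is collinear; check X for "
            "(near-)collinear columns, drop redundant variables, or try a "
            "different bandwidth."
        ) from err
    return xtx_inv_xt @ y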

@ljwolf
Member

ljwolf commented Nov 6, 2023

Looking at this today: this is not an mgwr-specific issue, and it is not related to local collinearity as something separate from global collinearity.

The global regression here is ill-posed: the X matrix is totally collinear, so it's not going to work as a global regression.

>>> from spreg import OLS
>>> OLS(y,X) # raises LinAlgError

The singularity is due to the X.T @ X matrix:

>>> numpy.linalg.inv(X.T @ X)  # raises LinAlgError: Singular matrix

Some argue you should try to avoid direct calls to inv... so dedicated least squares solvers (numpy.linalg.lstsq, scipy.linalg.lstsq, and scipy.sparse.linalg.lsqr) are designed to handle this. The dedicated lstsq solvers will indicate that the output is poorly conditioned, but still return a solution.

>>> coefs, resids, rank, svs = numpy.linalg.lstsq(X, y)
>>> coefs
array([[0.00656886],
       [0.01313772],
       [0.01970658]])
>>> rank
1
>>> resids
array([], dtype=float64)

Note that the rank (effectively, the number of unique columns) is 1, and resids is empty. Our use of numpy.linalg.solve() is still an exact solver, and will fail just as numpy.linalg.inv() fails.

>>> numpy.linalg.solve(X.T @ X, X.T) # raises LinAlgError

Note that if we use the Tikhonov trick, we get the same betas, and no warning about the very small singular values:

>>> ridge = numpy.eye(X.shape[1]) * 1e-5 # this is the "ridge" in a ridge regression
>>> tikhonov_betas = numpy.linalg.inv(X.T @ X + ridge) @ X.T @ y
>>> tikhonov_betas 
array([[0.00656886],
       [0.01313772],
       [0.01970658]])

So, what's the fix here?

  1. GWR doesn't check the rank of the input X matrix. Maybe it should? If the global regression is totally collinear, then there is no guarantee that local regressions will not be collinear.
  2. Generally, mgwr could swap to more robust least squares solvers across the board, relying on the linear algebra infrastructure in numpy/scipy? I think numpy.linalg.lstsq(wi * X, wi @ y) with a warning raised when rank < n_features would be a fast, rank-safe replacement for the inner solve; a rough sketch of both ideas follows below.
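
A rough sketch of both ideas (hypothetical helpers, not existing mgwr/spglm API; square-root weights are used here to get the standard weighted least squares form):

import warnings
import numpy as np

def check_global_rank(X):
    # (1) refuse to fit when the global design matrix is rank deficient
    rank = np.linalg.matrix_rank(X)
    if rank < X.shape[1]:
        raise ValueError(
            f"X has rank {rank} but {X.shape[1]} columns; the design "
            "matrix is collinear, so the (M)GWR fit cannot proceed."
        )

def weighted_lstsq(X, y, wi):
    # (2) rank-safe replacement for the inner solve at one calibration point,
    # where wi is a 1-d array of kernel weights and y is an (n, 1) column vector
    sw = np.sqrt(wi)[:, None]
    betas, _, rank, _ = np.linalg.lstsq(sw * X, sw * y, rcond=None)
    if rank < X.shape[1]:
        warnings.warn(
            "weighted design matrix is rank deficient; "
            "local estimates may be unstable",
            RuntimeWarning,
        )
    return betas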
