DSAUPD() fails with ARPACK 3.9.0, but not with ARPACK 3.8.0 #401
Comments
Try to test the eigensolve by yourself, outside of any software that may do "stuff" and set parameters (that you don't know about) for you: you need to understand which parameters your solve needs in order to converge (or which parameter makes the solve diverge), and be sure they are set the way you need for your particular problem. For instance, use the arpack C++ solver provided by
Now you can play with all parameters and understand which ones you need:
Now you see that using a CG solver is OK:
Using other solvers (BiCG, LU, ...) seems OK too
All works fine on my side, all solves are OK. I do not see any problem. Do you reproduce all this on your side? Above I asked for nbEV = 2 because I am a human and I know that the matrix may have only 2 eigenvalues. Now say octave can't infer that and "automatically" sets nbEV = 5 for you (or n = 10, the size of the matrix)... You'll get a failure, because arpack will not be able to compute all the EV in this case: you are asking for too much and it no longer makes sense (rank(A) = 2: does asking for 5 or more EV make sense? Can you blame arpack for failing if what you ask for doesn't make sense?...)
I do not know which parameters octave / igraph set or not, and I do not know if they use ICB (which prevents a bunch of bugs since, with ICB, the compiler can check types #394 (comment))
Not sure what this tells?! I never use octave or scipy. I have always used arpack from Fortran or C, setting parameters myself. Which kind of solver does this use in the backend? CG? BiCG? MINRES? GMRES? QR? LU? LLt? LDLt? With or without preconditioning? If preconditioning, which one: Jacobi? ILU? Depending on the case, the wrong solver (used in the backend during reverse-communication-interface steps) may cause failures. If solving iteratively, what is the tolerance you ask for (failures may occur if tol is too small)? How many eigenvalues do you ask for? 1? 2? 10? Was the CV workspace increased accordingly (2*nbEV+1 at least)? Sometimes, you can't get a result unless you shift/invert: did you try? Especially when you look for small eigenvalues, invert is often key AFAIR. Shift may help if convergence is difficult. Personal experience: depending on your matrices, you need to play with a whole bunch of parameters. Some sets of parameters will fail, some others get a result but are slow, and some others are the ones that enable an efficient / fast solve. I do not know how and which parameters in octave/scipy/igraph are set (or not) according to the nature of the matrices of your problem: this is crucial to get good / fast convergence. For sure, there is no unique set of parameters that works for all matrices...
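To make the workspace question above concrete, here is a minimal sketch of a sanity check on the subspace size. It assumes that the "CV workspace" mentioned above corresponds to ARPACK's NCV parameter; the function names `check_ncv` and `suggest_ncv` are hypothetical and are not part of ARPACK, igraph, or Octave. The hard constraint NEV < NCV <= N is the one documented for DSAUPD, while 2*nbEV+1 is only the rule of thumb quoted in the comment.

```python
def check_ncv(n, nev, ncv):
    """Return a list of problems with an (n, nev, ncv) choice for a symmetric solve."""
    problems = []
    # Documented DSAUPD requirement: NEV < NCV <= N.
    if not (nev < ncv <= n):
        problems.append("ncv must satisfy nev < ncv <= n")
    # Rule of thumb from the comment above (an assumption, not an ARPACK requirement).
    if ncv < 2 * nev + 1:
        problems.append("ncv is below the 2*nev+1 rule of thumb")
    return problems


def suggest_ncv(n, nev):
    """Hypothetical helper: the rule-of-thumb value, clamped to the matrix size."""
    return min(n, 2 * nev + 1)


if __name__ == "__main__":
    print(check_ncv(10, 2, 5))   # a 10x10 matrix, 2 eigenvalues, ncv = 5
    print(check_ncv(10, 5, 5))   # ncv equal to nev violates the hard constraint
    print(suggest_ncv(10, 9))    # clamped to n
```

A check like this does not replace understanding the problem, but it makes an automatically chosen value visible before the solver is called.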
For reference, this is the Regarding solvers, I am not sure what you are referring to. We use the reverse communication interface, i.e. calling I will give the detailed parameters passed to
EDIT: Adding some missing info:
Here's igraph's automatic NCV selection, modelled on Octave's, in case you have any comments on it: https://github.com/igraph/igraph/blob/master/src/linalg/arpack.c#L855 Is there any other information you need about how |
Did you try to run the exact same command lines detailed here #401 (comment) using LA magnitude? Does it work on your side? If so, you may consider that the problem is not arpack but the way it's used (= the way it is set up in igraph/octave/scipy/..., as they likely use generic heuristics that may no longer suit your particular problem?). All this works fine on my side:
Now yesterday this So now you know how to get this to work: OK with both nbEV = 5 or 9
I encourage you to set as many relevant parameters (according to the nature of your problem) as needed to maximize your chances of getting a stable result (one that will not change with changes in octave / igraph / ...). The less you let igraph/octave set default parameters for you, the safer you are.
In between iterative calls, arpack asks you to update the vector it works with. To do these updates you may need to solve some linear system using any solver you choose, such as LAPACK: arpack-ng/EXAMPLES/SIMPLE/dssimp.f Line 318 in 3a9a9ce
Yes, when there is no B, one always uses mode 1
Exact same result with LA added here #402
This is likely what kills you. If your RCI uses an iterative solver (?) you have no chance to get to a solution: relax your tol. Use 1e-3 for instance. Try different tol.
Sorry, can't afford to go on debugging igraph / octave... Moreover if the problem is not in arpack but the way it's used
Also, a common misuse I have seen: are you sure you handle all possible cases in RCI? That is, all cases (ido=-1 or 1 or 2 - easily checked with an assert) other than the most frequent one
I'm sorry, I am not familiar with Above, I gave complete details about how to call DSAUPD() to reproduce the issue. If anything specific is missing, please point it out and I will make an effort to provide it. I hope you can work with this.
The parameters you use here are not consistent with what I gave you (neither
The specific case I described (Mode 1 with DSAUPD()) only requires a matrix multiplication, not solving a linear system.
Yes, all IDO values are handled. If an unexpected IDO value came up, it would result in printing an informative error message.
There is no solver needed, only a matrix multiplication is done.
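The exchange above is about how a caller dispatches on `ido` in a reverse-communication loop. The following is a toy sketch of that control flow, not ARPACK: `ToyRCI` is a hypothetical power iteration that borrows ARPACK's `ido` numbering (-1 and 1 request y = OP*x, 2 requests y = B*x, 99 means finished), and the driver raises on any value it does not expect. In the mode 1 case discussed here, the only work the caller does is a matrix-vector product.

```python
import math


class ToyRCI:
    """Hypothetical reverse-communication power iteration. This is NOT ARPACK."""

    def __init__(self, v0, maxiter=500, tol=1e-12):
        norm = math.sqrt(sum(t * t for t in v0))
        self.x = [t / norm for t in v0]   # vector the caller must multiply
        self.y = [0.0] * len(v0)          # caller writes OP*x here
        self.maxiter, self.tol = maxiter, tol
        self.iters, self.value, self.started = 0, None, False

    def step(self):
        """Return the next ido code, mimicking ARPACK's numbering."""
        if not self.started:
            self.started = True
            return -1                     # initial request: y = OP*x
        rayleigh = sum(a * b for a, b in zip(self.x, self.y))
        norm = math.sqrt(sum(t * t for t in self.y))
        done = self.value is not None and abs(rayleigh - self.value) < self.tol
        self.value = rayleigh
        self.iters += 1
        if done or norm == 0.0 or self.iters >= self.maxiter:
            return 99                     # finished
        self.x = [t / norm for t in self.y]
        return 1                          # request: y = OP*x


def matvec(A, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def solve(A, v0):
    """Driver loop: handle every ido value explicitly, fail loudly otherwise."""
    rci = ToyRCI(v0)
    while True:
        ido = rci.step()
        if ido in (-1, 1):
            rci.y = matvec(A, rci.x)
        elif ido == 2:
            rci.y = list(rci.x)           # B = I for a standard problem
        elif ido == 99:
            return rci.value
        else:
            raise RuntimeError(f"unexpected ido = {ido}")


if __name__ == "__main__":
    A = [[2.0, 1.0], [1.0, 2.0]]          # eigenvalues 1 and 3
    print(solve(A, [1.0, 0.0]))
```

The explicit `else` branch is the point of the sketch: an unhandled `ido` value becomes an immediate error instead of a silently wrong iteration.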
Checking out last master cd49a7f, did you try to run the exact same command lines detailed here #401 (comment) using LA magnitude?
Checked out cd49a7f
OK. Good to know. Easier case.
Assert if you can to be sure!
At my side
The difference appears to be in the starting vector. We provide an explicit starting vector, while For this specific problem, the failure is reproducible with relatively high probability if providing a random starting vector sampled uniformly from [-1, 1], or from [0, 1]. It's also reproducible if we set the first two elements to One specific starting vector that reproduces the problem for me has I cannot reproduce the problem if we leave it up to ARPACK to set a random starting vector, instead of providing it ourselves. |
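For readers who want to generate comparable starting vectors outside Octave, here is a small sketch in Python. It covers the two uniform samplings mentioned above and a rendering of the Octave one-liner quoted in the issue summary (`opt.v0 = ones(n,1); opt.v0(1:2) = 1e-4 * (1-2*rand(2,1))`). The function names and the seed are hypothetical, and Python's generator will not reproduce the same numbers as Octave's `rand`, so this only reproduces the shape of the vectors, not a specific failing instance.

```python
import random


def uniform_start(n, low, high, rng):
    """Starting vector with entries sampled uniformly from [low, high]."""
    return [rng.uniform(low, high) for _ in range(n)]


def igraph_like_start(n, rng):
    """All ones, except the first two entries, which are small random values.

    Mirrors: opt.v0 = ones(n,1); opt.v0(1:2) = 1e-4 * (1-2*rand(2,1))
    """
    v0 = [1.0] * n
    for i in range(2):
        v0[i] = 1e-4 * (1.0 - 2.0 * rng.random())
    return v0


if __name__ == "__main__":
    rng = random.Random(2023)             # arbitrary seed, for repeatability only
    print(uniform_start(10, -1.0, 1.0, rng))
    print(uniform_start(10, 0.0, 1.0, rng))
    print(igraph_like_start(10, rng))
```

Since the failure is described as occurring with "relatively high probability", a caller would typically loop over several such vectors and record the ones that trigger the error.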
I encourage you to use ICB: this prevents the debugging hell that may happen as compilers/linkers are heavily developed (optim, LTO, ...). Using ICB makes sure the types passed from/to C/F77 are the expected ones. Check #405
Would have used shift for that instead of providing initial vector: test it with arpackmm.
May be in relation with #80 which should not be a problem for me AFAIU (see also JuliaLinearAlgebra/Arpack.jl#138). Not sure. |
Are you able to reproduce the issue with a custom starting vector?
Not now. Away from home and quite busy this week-end. You can test that with --restart and the dumpToFile/restartFromFile features in arpackmm AFAIR: first dump to file, replace your init vector in the file, re-run arpack with --restart. You'll be able to answer your question.
Rephrasing, would have used shift with estimated eigen value (shift is a scalar), and also, setting estimated eigen vector as initial vector. Doing one thing but not the other may screw arpack. If you can estimate eigen value, shifting is likely enough (and simpler) |
TL;DR: the problem comes from a change in arpack between 3.8.0 and 3.9.0 connected to a change in the openblas implementation occurring between 0.3.13 and 0.3.17. You can test it with your example on msys2 by downgrading openblas to a version available here https://repo.msys2.org/mingw/mingw64 The longer story:
@szhorvat: could you try with another blas (netlib or MKL)?
I tried with two different BLAS:
These are the two that MacPorts supports conveniently. There was no difference in the behaviour. If you think it's worth it, I can look into trying a third option tomorrow. |
Thanks for looking into it @FabienPean
To be accurate, most ports in MacPorts, including both Octave and ARPACK, use vecLib by default (the +accelerate variant). But they can be configured to use OpenBLAS.
Do you mean that you managed to reproduce the issue with OpenBLAS 0.3.21 but not 0.3.13? Did you use igraph or Octave to reproduce it? |
What do you call / use-as the starting vector?... And do you update associated uncertainty? |
What is this error? Is it |
By "starting vector" I mean the initial residual vector passed in the
What do you mean by "associated uncertainty"? |
I would need more time to figure out for certain what the error is specifically with Octave, but very likely it is the same as with igraph, i.e. not |
Strangely, I was not able to reproduce it in this Codespace either. For reference, this is Ubuntu Jammy. I tried both with OpenBLAS 0.3.20 as well as with reference BLAS/LAPACK 3.10. I could not reproduce it with either. I can reproduce it both on macOS 10.14 as well as with msys2 (which causes the igraph CI tests to keep failing at the moment). On macOS, I can reproduce it either when building ARPACK manually, or through MacPorts, and with both GCC 11 and GCC 12 (used as the Fortran compiler).
More precisely, on msys2/mingw64, the error you described with ARPACK 3.9.0 could not be reproduced with OpenBLAS 0.3.13, but it could be reproduced with OpenBLAS 0.3.17. Versions 0.3.14 to 0.3.16 are absent from the msys2 repository, so they couldn't be tested. In short, the change in ARPACK between 3.8.0 and 3.9.0 interacts badly with a change in OpenBLAS somewhere in 0.3.14 ≤ version ≤ 0.3.17.
@fghoussen I tried with netlib LAPACK/BLAS version 3.11.0, on macOS 10.14, and I cannot reproduce the problem. So here's the current list of findings from my side:
@szhorvat: I may have some clue. But, need time to run some tests. Hope to get this done by the week-end. |
Thanks so much for looking into this @fghoussen ! Let me know if you need anything else from me. |
Just tried: works OK at least 3 successive times |
FYI first commit where the issue appears is ce2e69a |
@fghoussen I have a question about the correct use of the reverse communication interface. The ARPACK manual (e.g. here), as well as the simple example driver (here) seems to suggest that when calling Does the ARPACK-NG project provide the ARPACK manual somewhere so that we wouldn't have to get it from other, potentially ephemeral sources? The original ARPACK site is no longer accessible, unfortunately. |
All that said, handling |
@fghoussen Regarding ce2e69a, as far as I can tell, this commit causes First, note that Is there a bug here? |
The ARPACK User's Guide refers to the The But anyway, this is tangential to the main issue in #401 (comment) |
Yes, I noticed also that the original site is no longer accessible.
I had to compute eigenvalues/vectors on difficult matrices where From this experience, I've seen lots of code samples out there (in research labs and enterprises) misusing arpack and / or doing just what is necessary to get a result for some given cases (= some kinds of matrices only = the kind they need depending on their application / field). arpack is difficult to use because the doc is cumbersome, as is the language (F77)... In this sense, I believe most of the problems using arpack do not come from arpack, but do come from misuses (after reading ugly doc that nobody can understand): this is what I tried to say in the above comments. Now, I'll try to answer (what I understand from) your questions. Keep in mind that I can misuse arpack myself and that I am aware of that: this is why I tested arpackmm on the maximum number of cases (improving the way I had to use arpack each time). Put another way, I may be wrong answering your questions...
AFAIR, ce2e69a fixes both #332, #371 and MKL related problems.
I would say yes. Whatever
My understanding is that (recall, I may be wrong):
Lehoucq et al are mathematicians. All such people I met are known to be rigorous and precise: this is why they are good at what they do. They call a cat a cat, not a dog. A vector is a (non-zero) vector, not a residual. A residual is a residual (a difference of vectors targeting zero = leftover to converge), not a vector. My understanding is that (I may be wrong):
Which would echo with #215, where A is zero and the lambda to find turns out to be zero too, such that the initial For me, once again, the doc here is to blame: the error "starting vector is zero" from the doc should be actually read like-so "starting residual (associated to the vector) is zero". I may be wrong.
Isn't your
If you do not use ICB (linking to "underscored" symbols), compiler/linker optims/LTO may screw data (incorrect types) when passing from C to F77 or the other way. Did you try to compile with Using ICB ensures the compiler/linker know the data types and run optims/LTO accordingly (
Let me be more concise regarding ce2e69a, to avoid misunderstandings, and to make sure we keep the discussion on topic. There is a piece of code in In addition to the issues this may cause in the algorithm, it also clearly changes ARPACK's behaviour across repeated calls. Previously, if a first solution attempt would fail (e.g. converge to the wrong eigenvector) due to bad luck in the random choice of Please address this.
I do not understand this sentence.
The commit we are discussing made it so that the "random" vector will always be exactly the same.
This is not true. If you give the same seed, the output will be the same.
This is not correct. |
This is precisely where I see things going wrong: https://github.com/opencollab/arpack-ng/blob/master/SRC/dsaitr.f#L417
But of course if I don't know how to make this any clearer. Please fix this in 3.9.1. You simply need to allow the RNG's state to be propagated normally instead of resetting it on each call to |
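The behavioural difference being argued about can be illustrated without ARPACK at all. The sketch below is an assumption-laden toy: `StartVectorSource` is a hypothetical class, the seed value is arbitrary, and Python's `random.Random` stands in for the Fortran generator. The `initialized` attribute plays the role of a saved "already seeded" flag: when the generator is re-seeded on every call, every "random" starting vector is identical; when it is seeded once and its state is propagated, successive calls differ while a whole run remains reproducible.

```python
import random


class StartVectorSource:
    """Toy model of a routine that draws a random starting vector."""

    SEED = 1357  # arbitrary fixed seed, chosen for illustration only

    def __init__(self, reset_each_call):
        self.reset_each_call = reset_each_call
        self.initialized = False   # stands in for a saved "seed already set" flag
        self.rng = None

    def getv0(self, n):
        """Return n values in [-1, 1]; re-seed first if configured to do so."""
        if self.reset_each_call or not self.initialized:
            self.rng = random.Random(self.SEED)
            self.initialized = True
        return [self.rng.uniform(-1.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    resetting = StartVectorSource(reset_each_call=True)
    propagating = StartVectorSource(reset_each_call=False)

    # Re-seeding on every call: a retry gets exactly the same vector.
    print(resetting.getv0(4) == resetting.getv0(4))

    # Seeding once: a retry gets a different vector.
    print(propagating.getv0(4) == propagating.getv0(4))

    # Determinism is kept either way: a fresh source replays the same sequence.
    a, b = StartVectorSource(False), StartVectorSource(False)
    print([a.getv0(4), a.getv0(4)] == [b.getv0(4), b.getv0(4)])
```

In the resetting variant, retrying after an unlucky starting vector cannot help, which is the change in behaviour across repeated calls described above.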
@fghoussen @sylvestre I am happy to submit a PR that reverts the change in ce2e69a. Will you consider merging it? |
Reverting is not a good move to me. As explained:
Once again you are not responding to my comments, and not paying attention to or not reading what I wrote. I cannot continue here until you do this. |
Reverting will trigger regressions (possibly unseen by CI).
... But once again, ce2e69a can not be responsible for your problem: it's elsewhere. |
Bug: opencollab/arpack-ng#401 Bug: opencollab/arpack-ng#410 Bug: opencollab/arpack-ng#411 Bug: igraph/igraph#2311 Signed-off-by: Sam James <[email protected]>
- fixes opencollab#401, opencollab#410, opencollab#411 - restores 'inits' variable removed in ce2e69a, ensuring that the RNG state is propagated - reverts e0d6705 to ensure that seed is different on each parallel thread - updates seed initialization of parallel pdgetv0/psgetv0 so that they match that of pzgetv0/pcgetv0
Fixed by #423
This is a summary of the issue described at igraph/igraph#2311, i.e. DSAUPD() failing when running the igraph test suite with ARPACK 3.9.0. With ARPACK 3.8.0, there is no problem.
We are solving the following eigenvalue problem:
which='LA'
The error
An error occurs depending on the starting vector that is used. The error is easy to reproduce after trying a few random starting vectors.
Details of the error:
-9999 is returned in info. The value of iparam(5) is 4 when
Reproducing
The issue can be reproduced independently of igraph, using Octave. This suggests that the problem may be in ARPACK itself, and not with how igraph uses ARPACK.
Transcript of Octave session:
The purpose of the for loop is to find a random starting vector that reproduces the issue. The actual starting vector used in the igraph code can be replicated like so: opt.v0 = ones(n,1); opt.v0(1:2) = 1e-4 * (1-2*rand(2,1)). However, the simpler random starting vector used in the above example also reproduces the issue.
Details of my environment
I am using ARPACK on macOS 10.14.6 on x86_64 with MacPorts, specifically this setup: macports/macports-ports#17716
The problem occurs regardless of the BLAS implementation used. Tried both Apple's vecLib (default in MacPorts) and OpenBLAS.
The ARPACK 3.9.0 test suite passes in this environment.
Affected ARPACK versions
Only 3.9.0, not 3.8.0.