Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

icb_arpack_cpp: testsuite fails on i386 #123

Closed
sylvestre opened this issue Jun 5, 2018 · 26 comments · Fixed by #125
Closed

icb_arpack_cpp: testsuite fails on i386 #123

sylvestre opened this issue Jun 5, 2018 · 26 comments · Fixed by #125

Comments

@sylvestre
Copy link
Contributor

See: https://buildd.debian.org/status/fetch.php?pkg=arpack&arch=i386&ver=3.6.0-1&stamp=1528229606&raw=0

../build-aux/test-driver: line 107:  7229 Segmentation fault      "$@" > $log_file 2>&1
FAIL: icb_arpack_cpp
../build-aux/test-driver: line 107:  7200 Segmentation fault      "$@" > $log_file 2>&1
FAIL: icb_arpack_c

@dbeurle @fghoussen Rings a bell?

@10110111
Copy link
Contributor

10110111 commented Jun 5, 2018

For reference, here's the backtrace of the crash and a bit more of info:

Thread 1 "lt-icb_arpack_c" received signal SIGSEGV, Segmentation fault.
cneupd (rvec=.TRUE., howmny=..., select=<error reading variable: value requires 538300256 bytes, which is more than max-value-size>, d=<error reading variable: value requires 1076211256 bytes, which is more than max-value-size>, 
    z=<error reading variable: value requires 3545643256 bytes, which is more than max-value-size>, ldz=1001, sigma=(0,0), workev=<error reading variable: value requires 2153201024 bytes, which is more than max-value-size>, bmat=..., n=134526416, 
    which=<error reading variable: Cannot access memory at address 0x3e8>, nev=134526407, tol=1.26116862e-44, resid=<error reading variable: value requires 1076211328 bytes, which is more than max-value-size>, ncv=134575064, 
    v=<error reading variable: value requires 1102378496 bytes, which is more than max-value-size>, ldv=-134864888, iparam=<error reading variable: Cannot access memory at address 0x3e8>, ipntr=..., workd=<error reading variable: value requires 3228633984 bytes, which is more than max-value-size>, 
    workl=<error reading variable: value requires 1076856640 bytes, which is more than max-value-size>, lworkl=134607080, rwork=<error reading variable: value requires 538300256 bytes, which is more than max-value-size>, info=0, _howmny=1, _bmat=1, _which=1) at cneupd.f:340
340           mode = iparam(7)
(gdb) p iparam
Cannot access memory at address 0x3e8
(gdb) bt
#0  cneupd (rvec=.TRUE., howmny=..., select=<error reading variable: value requires 538300256 bytes, which is more than max-value-size>, d=<error reading variable: value requires 1076211256 bytes, which is more than max-value-size>,                                                                                    
    z=<error reading variable: value requires 3545643256 bytes, which is more than max-value-size>, ldz=1001, sigma=(0,0), workev=<error reading variable: value requires 2153201024 bytes, which is more than max-value-size>, bmat=..., n=134526416,                                                                      
    which=<error reading variable: Cannot access memory at address 0x3e8>, nev=134526407, tol=1.26116862e-44, resid=<error reading variable: value requires 1076211328 bytes, which is more than max-value-size>, ncv=134575064,                                                                                            
    v=<error reading variable: value requires 1102378496 bytes, which is more than max-value-size>, ldv=-134864888, iparam=<error reading variable: Cannot access memory at address 0x3e8>, ipntr=..., workd=<error reading variable: value requires 3228633984 bytes, which is more than max-value-size>,                  
    workl=<error reading variable: value requires 1076856640 bytes, which is more than max-value-size>, lworkl=134607080, rwork=<error reading variable: value requires 538300256 bytes, which is more than max-value-size>, info=0, _howmny=1, _bmat=1, _which=1) at cneupd.f:340                                          
#1  0xf7faff53 in cneupd_c (rvec=<optimized out>, howmny=..., select=..., d=..., z=..., ldz=<optimized out>, sigma=<optimized out>, workev=..., bmat=..., n=<optimized out>, which=<error reading variable: Cannot access memory at address 0x3e8>, nev=<optimized out>, tol=<optimized out>, resid=...,                    
    ncv=<optimized out>, v=..., ldv=<optimized out>, iparam=<error reading variable: Cannot access memory at address 0x3e8>, ipntr=..., workd=..., workl=..., lworkl=<optimized out>, rwork=..., info=0) at icbacn.f90:61                                                                                                   
#2  0x0804a4f6 in arpack::neupd (rvec=true, howmny_option=arpack::howmny::ritz_vectors, select=0x8077ea0, d=0x8061658, z=0x80616b0, ldz=1001, sigma=..., workev=0x8075028, bmat_option=arpack::bmat::identity, n=1000, ritz_option=arpack::which::largest_magnitude, nev=9, tol=0, resid=0x80573d8, ncv=19, v=0xf7f62008,   
    ldv=1000, iparam=0xffffd1d8, ipntr=0xffffd1a0, workd=0x8059320, workl=0x805f0e8, lworkl=1197, rwork=0x8074f88, info=@0xffffd19c: 0) at ../arpack.hpp:276                                                                                                                                                                
#3  0x08049c80 in complex_symmetric_runner () at icb_arpack_cpp.cpp:160                                                                                                                                                                                                                                                     
#4  0x0804a1ba in main () at icb_arpack_cpp.cpp:202

Note how iparam changes from being a pointer to 0xffffd1d8 in frame 2 (C++) to being an integer at address 0x3e8 in frame 1 (Fortran).

Also, 0x3e8 is the value of n and ldv in frame 2.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

Interesting. Does it only fail on a 32 bit build? And it also fails for the C test?

@10110111
Copy link
Contributor

10110111 commented Jun 6, 2018

@dbeurle
The C test crashes in zneupd.

For the C++ case it seems I've found the culprit: the passage of sigma complex number by value results in two DWORDs on the stack instead the single DWORD if passed by reference, and thus the next argument, workev, already has address of 0x0 (sigma is 0.0+0.0i).

For x86_64, I guess, this wouldn't crash (didn't check though) because floating-point parameters are passed by value in registers, so there'd be no bad stack accesses, although the results will likely be garbage (rings a bell: why does the test not find this?)

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

In the C case it's also passed by value right?

extern void zneupd_c(bool rvec, char * howmny, int * select,
                     double _Complex * d, double _Complex * z, int ldz, double _Complex sigma, double _Complex * workev,
                     char * bmat, int n, char * which, int nev,
                     double tol, double _Complex * resid, int ncv, double _Complex * v,
                     int ldv, int * iparam, int * ipntr, double _Complex * workd,
                     double _Complex * workl, int lworkl, double _Complex * rwork, int * info);

I'm not 100% sure. This means that the bindings explicitly depends on the size of the parameters passed to it? If so, both the C bindings and the C++ need to pass by pointer for this value?

Non-zero shifts (sigma) aren't tested so we didn't discover the UB here.

@10110111
Copy link
Contributor

10110111 commented Jun 6, 2018

This means that the bindings explicitly depends on the size of the parameters passed to it?

No, this means the prototype is simply wrong: sigma must have a * before it.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

Okay, this is fairly easy to fix then? Do we have a test to check this before it's pushed again?

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

@sylvestre You can go ahead and assign this to me, I'll do this in my lunch break if that works for you.

@10110111
Copy link
Contributor

10110111 commented Jun 6, 2018

The following patch fixes the segmentation faults in the C and C++ tests, but the C test still fails for some reason:

diff --git a/TESTS/icb_arpack_c.c b/TESTS/icb_arpack_c.c
index 419227a..84bfa55 100644
--- a/TESTS/icb_arpack_c.c
+++ b/TESTS/icb_arpack_c.c
@@ -116,7 +116,7 @@ int zn() {
   int select[ncv];
   double _Complex z[(N+1)*(nev+1)];
   BLASINT ldz = N+1;
-  double sigma=0;
+  double _Complex sigma=0;
   int k;
   for (k=0; k < 3*N; ++k )
     workd[k] = 0;
@@ -144,7 +144,7 @@ int zn() {
   if (iparam[4] != nev) return 1; // check number of ev found by arpack.
 
   /* call arpack like you would have, but, use zneupd_c instead of zneupd_ */
-  zneupd_c(rvec, howmny, select, d, z, ldz, sigma, workev,
+  zneupd_c(rvec, howmny, select, d, z, ldz, &sigma, workev,
            bmat, N, which, nev, tol, resid, ncv, V, ldv, iparam, ipntr,
            workd, workl, lworkl, rwork, &info);
   int i;
diff --git a/arpack.h b/arpack.h
index f85ec7b..9cfaf40 100644
--- a/arpack.h
+++ b/arpack.h
@@ -55,7 +55,7 @@ extern void cnaupd_c(int * ido, char * bmat, int n, char * which, int nev,
                      float _Complex * workl, int lworkl, float _Complex * rwork, int * info);
 
 extern void cneupd_c(bool rvec, char * howmny, int * select,
-                     float _Complex * d, float _Complex * z, int ldz, float _Complex sigma, float _Complex * workev,
+                     float _Complex * d, float _Complex * z, int ldz, float _Complex const* sigma, float _Complex * workev,
                      char * bmat, int n, char * which, int nev,
                      float tol, float _Complex * resid, int ncv, float _Complex * v,
                      int ldv, int * iparam, int * ipntr, float _Complex * workd,
@@ -67,7 +67,7 @@ extern void znaupd_c(int * ido, char * bmat, int n, char * which, int nev,
                      double _Complex * workl, int lworkl, double _Complex * rwork, int * info);
 
 extern void zneupd_c(bool rvec, char * howmny, int * select,
-                     double _Complex * d, double _Complex * z, int ldz, double _Complex sigma, double _Complex * workev,
+                     double _Complex * d, double _Complex * z, int ldz, double _Complex const* sigma, double _Complex * workev,
                      char * bmat, int n, char * which, int nev,
                      double tol, double _Complex * resid, int ncv, double _Complex * v,
                      int ldv, int * iparam, int * ipntr, double _Complex * workd,
diff --git a/arpack.hpp b/arpack.hpp
index 6ce800f..ac2dd29 100644
--- a/arpack.hpp
+++ b/arpack.hpp
@@ -94,7 +94,7 @@ void cnaupd_c(int& ido, const char* bmat, int n, const char* which, int nev,
               int& info);
 
 void cneupd_c(bool rvec, const char* howmny, int* select, float _Complex* d,
-              float _Complex* z, int ldz, float _Complex sigma,
+              float _Complex* z, int ldz, float _Complex const* sigma,
               float _Complex* workev, const char* bmat, int n,
               const char* which, int nev, float tol, float _Complex* resid,
               int ncv, float _Complex* v, int ldv, int* iparam, int* ipntr,
@@ -108,7 +108,7 @@ void znaupd_c(int& ido, const char* bmat, int n, const char* which, int nev,
               int& info);
 
 void zneupd_c(bool rvec, const char* howmny, int* select, double _Complex* d,
-              double _Complex* z, int ldz, double _Complex sigma,
+              double _Complex* z, int ldz, double _Complex const* sigma,
               double _Complex* workev, const char* bmat, int n,
               const char* which, int nev, double tol, double _Complex* resid,
               int ncv, double _Complex* v, int ldv, int* iparam, int* ipntr,
@@ -273,10 +273,11 @@ inline void neupd(bool rvec, howmny const howmny_option, int* select,
                   std::complex<float>* v, int ldv, int* iparam, int* ipntr,
                   std::complex<float>* workd, std::complex<float>* workl,
                   int lworkl, std::complex<float>* rwork, int& info) {
+  const _Complex float csigma=std::real(sigma) + std::imag(sigma) * I;
   internal::cneupd_c(rvec, internal::convert_to_char(howmny_option), select,
                      reinterpret_cast<_Complex float*>(d),
                      reinterpret_cast<_Complex float*>(z), ldz,
-                     std::real(sigma) + std::imag(sigma) * I,
+                     &csigma,
                      reinterpret_cast<_Complex float*>(workev),
                      internal::convert_to_char(bmat_option), n,
                      internal::convert_to_char(ritz_option), nev, tol,
@@ -310,10 +311,11 @@ inline void neupd(bool rvec, howmny const howmny_option, int* select,
                   std::complex<double>* v, int ldv, int* iparam, int* ipntr,
                   std::complex<double>* workd, std::complex<double>* workl,
                   int lworkl, std::complex<double>* rwork, int& info) {
+  const _Complex double csigma=std::real(sigma) + _Complex_I * std::imag(sigma);
   internal::zneupd_c(rvec, internal::convert_to_char(howmny_option), select,
                      reinterpret_cast<_Complex double*>(d),
                      reinterpret_cast<_Complex double*>(z), ldz,
-                     std::real(sigma) + _Complex_I * std::imag(sigma),
+                     &csigma,
                      reinterpret_cast<_Complex double*>(workev),
                      internal::convert_to_char(bmat_option), n,
                      internal::convert_to_char(ritz_option), nev, tol,

@10110111
Copy link
Contributor

10110111 commented Jun 6, 2018

Hmm, the C test fails with this patch due to the following symptoms: in icb_arpack_c.c:153 it gets d[i] = -1.8314593742069275e+263 + 993.00000000000648 * I, which of course means something blew up badly. Maybe some additional C-prototype vs Fortran incompatibility has been missed.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

It should be possible to change sigma in the tests to std::complex<> directly, pass this by value to the C++ bindings and reinterpret_cast to a pointer to avoid the local declaration?

P.S <3 the east const.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

Is there a problem with the setting of complex sigma in the C code? Are both components set to 0.0?

EDIT: No this works fine.

@10110111
Copy link
Contributor

10110111 commented Jun 6, 2018

Is there a problem with the setting of complex sigma in the C code? Are both components set to 0.0?

According to cppreference,

Implicit conversions are defined between complex types and other arithmetic types.

And the implicit conversions are mostly intuitive enough so that e.g. (double)x -> 0.0*I+(double)x. Mind you, even the expression x+I*y where x and y are real already demonstrates this.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

True - I checked this myself and updated my comment (a little too late). Do the ARPACK error codes give you hint?

@sylvestre
Copy link
Contributor Author

fyi, a bunch of other archs are failing:
https://buildd.debian.org/status/package.php?p=arpack
I guess this is the same source issue.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

@sylvestre Have the ISO C bindings been tested before (apart from x86_64)? Is this linked to the introduction of C++ bindings or the C / FORTRAN bindings?

@sylvestre
Copy link
Contributor Author

Nope, 3.6.0 is the first and travis ci only provides amd64 systems...

@10110111
Copy link
Contributor

10110111 commented Jun 6, 2018

Actually, as of cd63660, and one commit before, the segfaults are already there.

@sylvestre
Copy link
Contributor Author

@dbeurle sure, please feel free to fix it :)
(we should fix that asap to avoid people downloading the incorrect 3.6.0)

10110111 added a commit to 10110111/arpack-ng that referenced this issue Jun 6, 2018
…as Fortran-C bindings expect it

This fixes opencollab#123. Both C and C++ tests will no longer
crash. But the C test will fail. This might be due to an error in the
test itself.
@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

If this was before my time I'm not sure I can fix it -_-.

@10110111
Copy link
Contributor

10110111 commented Jun 6, 2018

@dbeurle No worries, I'm taking care of it.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 6, 2018

@10110111 Just shout out if you need anything from me - I'm not too familiar with FORTRAN / C intricacies.

@10110111
Copy link
Contributor

10110111 commented Jun 7, 2018

OK, the C bindings are actually totally broken. Not only the C prototypes don't completely match the descriptions icba*.f90, but also the descriptions there sometimes don't match the corresponding Fortran sources. Fixing them will require quite some effort. I'm going to do this, but this might require more than one day.

@dbeurle
Copy link
Contributor

dbeurle commented Jun 7, 2018

Nice catch - if you get the C bindings running I'll take care of the C++ ones.

@sylvestre
Copy link
Contributor Author

Ok, thanks.
FYI, I removed the 3.6.0 tag in the repo.
I will add it back on the same revision + do a 3.6.1 then when the regression is fixed

@matzeri
Copy link
Contributor

matzeri commented Jun 16, 2018

3.6.0 is back, but 3.6.1 in not in the releases

@sylvestre
Copy link
Contributor Author

I tagged 3.6.1 today and uploaded it in Debian.
Most of the issues are fixed:
https://buildd.debian.org/status/package.php?p=arpack

I opened #128 for the remaining issues

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants