
Examples all end with SIGSEGV #512

Closed
lopsided opened this issue Sep 14, 2021 · 17 comments

Comments

@lopsided

Hi,

I'm trying to test my installation and check that the different solvers are compiled correctly, but every example exits with Segmentation fault (core dumped).

This is an example log from journalctl:

[datetime] [pcname] systemd-coredump[458973]: [🡕] Process 458971 (hs071_c) of user 1000 dumped core.
                                                 Stack trace of thread 458971:
                                                 #0  0x00007f091c8b4c18 dcopy_ (libblas.so.3 + 0x4c18)
                                                 #1  0x00007f091decd87d _ZN5Ipopt10IpBlasCopyEiPKdiPdi (libipopt.so.3 + 0xd987d)
                                                 #2  0x00007f091dfca5a9 CreateIpoptProblem (libipopt.so.3 + 0x1d65a9)
                                                 #3  0x000000000040118d n/a (/path/to/Ipopt/build/examples/hs071_c/hs071_c + 0x118d)

Some examples print the licence banner and the selected linear solver first. In ScalableProblems I see the "Number of nonzeros in..." lines, and larger problems at least seem to take longer, but I can't tell whether they complete and just fail to exit cleanly, or whether they actually fail.

@lopsided
Author

Also (hopefully related), I'm seeing lots of failures in make test, including some segfaults:

> make test
Making all in src
make[1]: Entering directory '/path/to/Ipopt/build/src'
make  all-recursive
make[2]: Entering directory '/path/to/Ipopt/build/src'
make[3]: Entering directory '/path/to/Ipopt/build/src'
make[3]: Nothing to be done for 'all-am'.
make[3]: Leaving directory '/path/to/Ipopt/build/src'
make[2]: Leaving directory '/path/to/Ipopt/build/src'
make[1]: Leaving directory '/path/to/Ipopt/build/src'
Making all in contrib/sIPOPT
make[1]: Entering directory '/path/to/Ipopt/build/contrib/sIPOPT'
Making all in src
make[2]: Entering directory '/path/to/Ipopt/build/contrib/sIPOPT/src'
make[2]: Nothing to be done for 'all'.
make[2]: Leaving directory '/path/to/Ipopt/build/contrib/sIPOPT/src'
make[2]: Entering directory '/path/to/Ipopt/build/contrib/sIPOPT'
make[2]: Nothing to be done for 'all-am'.
make[2]: Leaving directory '/path/to/Ipopt/build/contrib/sIPOPT'
make[1]: Leaving directory '/path/to/Ipopt/build/contrib/sIPOPT'
make[1]: Entering directory '/path/to/Ipopt/build'
make[1]: Nothing to be done for 'all-am'.
make[1]: Leaving directory '/path/to/Ipopt/build'
cd test; make test
make[1]: Entering directory '/path/to/Ipopt/build/test'
  CXX      hs071_main.o
  CXX      hs071_nlp.o
  CXXLD    hs071_cpp
  CC       hs071_c.o
  CCLD     hs071_c
  CXX      emptynlp.o
  CXXLD    emptynlp
  CXX      getcurr.o
  CXXLD    getcurr
ln -s ../examples/hs071_f/hs071_f.f hs071_f.f
  F77      hs071_f.o
  F77LD    hs071_f
  CXX      parametricTNLP.o
  CXX      parametric_driver.o
  CXXLD    parametric_cpp
  CXX      MySensTNLP.o
  CXX      redhess_cpp.o
  CXXLD    redhess_cpp
chmod u+x ./run_unitTests
./run_unitTests
 
Running unitTests...
 
Testing AMPL Solver Executable...
    no AMPL solver executable found, skipping test...
Testing C++ Example...
    Test passed!
Testing C Example...
0 
 ---- 8< ---- Start of test program output ---- 8< ----

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.4, running with linear solver ma27.


EXIT: Problem has inconsistent variable bounds or constraint sides.


ERROR OCCURRED DURING IPOPT OPTIMIZATION.
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing Fortran Example...
 
 ---- 8< ---- Start of test program output ---- 8< ----

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.4, running with linear solver ma27.


EXIT: Problem has inconsistent variable bounds or constraint sides.
Note: The following floating-point exceptions are signalling: IEEE_DENORMAL

 An error occoured.
 The error code is          -11

 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Output of the test program does not contain 'EXIT: Optimal Solution Found'.
Skip testing Java Example (Java interface not build)
Testing sIpopt Example parametric_cpp...
./run_unitTests: line 21: 475777 Aborted                 (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.4, running with linear solver ma27.

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        5

Total number of variables............................:        5
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        4
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0


Number of Iterations....: 0

Number of objective function evaluations             = 1
Number of objective gradient evaluations             = 1
Number of equality constraint evaluations            = 1
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 1
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 0
Total seconds in IPOPT                               = 0.002

EXIT: Invalid number in NLP function or derivative detected.
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing sIpopt Example redhess_cpp...
./run_unitTests: line 21: 475801 Segmentation fault      (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing EmptyNLP Example...
./run_unitTests: line 21: 475828 Segmentation fault      (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----

*** Solve for 0 variables, feasible constraint, feasible bounds
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing GetCurr Example...
./run_unitTests: line 21: 475852 Aborted                 (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----
Line 449: Wrong compl_x_L[0] = 1.88436637324e-320, expected z_L[0] * (x[0] + 10.0) = 10
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
make[1]: *** [Makefile:816: test] Error 1
make[1]: Leaving directory '/path/to/Ipopt/build/test'
make: *** [Makefile:848: test] Error 2

Should I be expecting these tests to pass?

@svigerske
Member

Maybe try a debug build (--enable-debug --disable-shared) and run under valgrind.

@lopsided
Author

Results:

$ valgrind --leak-check=yes ./run_unitTests
==642274== Memcheck, a memory error detector
==642274== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==642274== Using Valgrind-3.17.0 and LibVEX; rerun with -h for copyright info
==642274== Command: ./run_unitTests
==642274== 
 
Running unitTests...
 
Testing AMPL Solver Executable...
    no AMPL solver executable found, skipping test...
Testing C++ Example...
./run_unitTests: line 21: 642276 Segmentation fault      (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing C Example...
0 
 ---- 8< ---- Start of test program output ---- 8< ----

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.4, running with linear solver ma27.


EXIT: Problem has inconsistent variable bounds or constraint sides.


ERROR OCCURRED DURING IPOPT OPTIMIZATION.
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing Fortran Example...
./run_unitTests: line 21: 642287 Segmentation fault      (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----

Program received signal SIGSEGV: Segmentation fault - invalid memory reference.

Backtrace for this error:
#0  0x7f6439ef07c2 in ???
#1  0x7f6439eef995 in ???
#2  0x7f643990131f in ???
#3  0x7f643a1e8c18 in ???
#4  0x43c306 in _ZN5Ipopt10IpBlasCopyEiPKdiPdi
	at ../../src/LinAlg/IpBlas.cpp:252
#5  0x40b6e6 in CreateIpoptProblem
	at ../../src/Interfaces/IpStdCInterface.cpp:65
#6  0x40af76 in ipcreate_
	at ../../src/Interfaces/IpStdFInterface.c:349
#7  0x408bdd in example
	at /path/to/Ipopt/build/test/hs071_f.f:108
#8  0x40990e in main
	at /path/to/Ipopt/build/test/hs071_f.f:212
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Skip testing Java Example (Java interface not build)
Testing sIpopt Example parametric_cpp...
./run_unitTests: line 21: 642297 Aborted                 (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----

******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit https://github.com/coin-or/Ipopt
******************************************************************************

This is Ipopt version 3.14.4, running with linear solver ma27.

Number of nonzeros in equality constraint Jacobian...:       10
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        5

Total number of variables............................:        5
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        4
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0


Number of Iterations....: 0

Number of objective function evaluations             = 1
Number of objective gradient evaluations             = 1
Number of equality constraint evaluations            = 1
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 1
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 0
Total seconds in IPOPT                               = 0.000

EXIT: Invalid number in NLP function or derivative detected.
terminate called after throwing an instance of 'std::bad_array_new_length'
  what():  std::bad_array_new_length
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing sIpopt Example redhess_cpp...
    Test passed!
Testing EmptyNLP Example...
./run_unitTests: line 21: 642309 Segmentation fault      (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----

*** Solve for 0 variables, feasible constraint, feasible bounds
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
Testing GetCurr Example...
./run_unitTests: line 21: 642319 Segmentation fault      (core dumped) $@ > tmpfile 2>&1
0 
 ---- 8< ---- Start of test program output ---- 8< ----
 ---- 8< ----  End of test program output  ---- 8< ----
 
    ******** Test FAILED! ********
Test program existed with nonzero status.
==642274== 
==642274== HEAP SUMMARY:
==642274==     in use at exit: 169,595 bytes in 1,531 blocks
==642274==   total heap usage: 7,298 allocs, 5,767 frees, 331,013 bytes allocated
==642274== 
==642274== LEAK SUMMARY:
==642274==    definitely lost: 0 bytes in 0 blocks
==642274==    indirectly lost: 0 bytes in 0 blocks
==642274==      possibly lost: 0 bytes in 0 blocks
==642274==    still reachable: 169,595 bytes in 1,531 blocks
==642274==         suppressed: 0 bytes in 0 blocks
==642274== Reachable blocks (those to which a pointer was found) are not shown.
==642274== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==642274== 
==642274== For lists of detected and suppressed errors, rerun with: -s
==642274== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I also ran with --leak-check=full --show-leak-kinds=all as suggested, but got the same summary as above, plus hundreds of blocks like this:

==642631== 8,224 bytes in 2 blocks are still reachable in loss record 812 of 812
==642631==    at 0x484086F: malloc (vg_replace_malloc.c:380)
==642631==    by 0x1E6A30: rl_generic_bind (in /usr/bin/bash)
==642631==    by 0x1EFBE3: ??? (in /usr/bin/bash)
==642631==    by 0x1FB9E1: _rl_init_terminal_io (in /usr/bin/bash)
==642631==    by 0x1FBC1B: _rl_set_screen_size (in /usr/bin/bash)
==642631==    by 0x1D6CF8: get_new_window_size (in /usr/bin/bash)
==642631==    by 0x16DFB2: wait_for (in /usr/bin/bash)
==642631==    by 0x155D1D: execute_command_internal (in /usr/bin/bash)
==642631==    by 0x156577: execute_command (in /usr/bin/bash)
==642631==    by 0x154ADD: execute_command_internal (in /usr/bin/bash)
==642631==    by 0x156577: execute_command (in /usr/bin/bash)
==642631==    by 0x1551D5: execute_command_internal (in /usr/bin/bash)

@svigerske
Member

svigerske commented Sep 15, 2021

This ran valgrind on bash only, since run_unitTests is a shell script. You need --trace-children=yes as well, so that the test programs it launches are checked too.

@lopsided
Author

Ah OK, new output here: https://pastebin.com/nwwf7Dnd
(too long for github!)

@svigerske
Copy link
Member

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0  0.0000000e+00 0.00e+00 0.00e+00  -1.0 0.00e+00    -  0.00e+00 0.00e+00   0
==643185== Invalid read of size 8
==643185==    at 0x453B88: Ipopt::DenseVector::AmaxImpl() const (IpDenseVector.cpp:260)
==643185==    by 0x481F2F: Ipopt::Vector::Amax() const (IpVector.hpp:664)
==643185==    by 0x4B8346: Ipopt::IpoptCalculatedQuantities::CalcNormOfType(Ipopt::ENormType, Ipopt::Vector const&, Ipopt::Vector const&) (IpIpoptCalculatedQuantities.cpp:2527)
==643185==    by 0x4BA50B: Ipopt::IpoptCalculatedQuantities::unscaled_curr_dual_infeasibility(Ipopt::ENormType) (IpIpoptCalculatedQuantities.cpp:2792)
==643185==    by 0x40D66A: Ipopt::IpoptApplication::call_optimize() (IpIpoptApplication.cpp:667)
==643185==    by 0x40C3C5: Ipopt::IpoptApplication::OptimizeNLP(Ipopt::SmartPtr<Ipopt::NLP> const&, Ipopt::SmartPtr<Ipopt::AlgorithmBuilder>&) (IpIpoptApplication.cpp:530)
==643185==    by 0x40C090: Ipopt::IpoptApplication::OptimizeNLP(Ipopt::SmartPtr<Ipopt::NLP> const&) (IpIpoptApplication.cpp:486)
==643185==    by 0x40BC71: Ipopt::IpoptApplication::OptimizeTNLP(Ipopt::SmartPtr<Ipopt::TNLP> const&) (IpIpoptApplication.cpp:466)
==643185==    by 0x405CA3: runEmpty(int, bool, bool) (emptynlp.cpp:316)
==643185==    by 0x4064DD: main (emptynlp.cpp:666)
==643185==  Address 0x5df8e28 is 8 bytes before a block of size 8 alloc'd
==643185==    at 0x484222F: operator new[](unsigned long) (vg_replace_malloc.c:579)
==643185==    by 0x42B5D9: Ipopt::DenseVectorSpace::AllocateInternalStorage() const (IpDenseVector.hpp:487)
==643185==    by 0x42B576: Ipopt::DenseVector::values_allocated() (IpDenseVector.hpp:478)

This points to the following line in Ipopt::DenseVector::AmaxImpl():

   return std::abs(values_[IpBlasIamax(Dim(), values_, 1) - 1]);

Somehow IpBlasIamax, which just calls idamax from BLAS, seems to return 0 instead of some index >= 1, so the code tries to access values_[-1] ("8 bytes before a block of size 8 alloc'd").

Indices in BLAS are 1-based and Dim() > 0 here, so something fishy seems to be going on in your BLAS installation.
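To make the 1-based contract concrete, here is a minimal C sketch. `ref_idamax` is a hypothetical reimplementation of the reference BLAS `idamax` semantics written for illustration (it is not the library routine), and `checked_amax` is a hypothetical defensive variant of the AmaxImpl line quoted above, not Ipopt's actual code. A conforming `idamax` never returns 0 when n >= 1 and incx >= 1, so a 0 from the real library in that situation points at the BLAS build rather than at Ipopt.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without libm, so the sketch links with no extra flags. */
static double abs_d(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical reimplementation of reference BLAS idamax semantics (NOT the
 * library routine): returns the 1-based index of the first element with the
 * largest absolute value, or 0 when n < 1 or incx < 1. */
static int ref_idamax(int n, const double *x, int incx)
{
    if (n < 1 || incx < 1)
        return 0;
    int best = 1;
    double maxabs = abs_d(x[0]);
    for (int i = 1; i < n; ++i) {
        double v = abs_d(x[(size_t)i * (size_t)incx]);
        if (v > maxabs) { /* strict '>' keeps the first of equal maxima */
            maxabs = v;
            best = i + 1; /* 0-based loop index -> 1-based BLAS index */
        }
    }
    return best;
}

/* Hypothetical defensive version of the AmaxImpl pattern
 *   return std::abs(values_[IpBlasIamax(Dim(), values_, 1) - 1]);
 * values[idx - 1] is only valid for 1 <= idx <= n; an idx of 0 would read
 * values[-1], the out-of-bounds read valgrind reported. Returns 0 on
 * success, -1 if the index is out of range. */
static int checked_amax(int n, const double *values, double *out)
{
    assert(n >= 0 && (n == 0 || values != NULL));
    if (n == 0) {
        *out = 0.0; /* empty vector: nothing to index */
        return 0;
    }
    int idx = ref_idamax(n, values, 1);
    if (idx < 1 || idx > n)
        return -1; /* broken BLAS result: refuse to dereference */
    *out = abs_d(values[idx - 1]);
    return 0;
}
```

With x = {1.0, -7.0, 3.0, 7.0}, ref_idamax(4, x, 1) returns 2 (ties keep the first occurrence) and checked_amax stores 7.0; for a non-empty vector neither ever yields an index of 0.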

@lopsided
Author

lopsided commented Sep 15, 2021

I'm trying to compile with MKL BLAS, but I don't think it is working. I've downloaded the MKL libs and can see the .so files in /opt/intel/oneapi/mkl/latest/lib/intel64. So I export MKLROOT=/opt/intel/oneapi/mkl/latest and then configure again with:

../configure --prefix=${PWD} --with-hsl-lflags="-L${HSLDIR}/.libs -lcoinhsl" --with-hsl-cflags="-I${HSLDIR}" \
  --with-lapack-lflags="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm"

as per the Ipopt instructions. But in the configure output I see:

checking for LAPACK... yes: generic module (lapack.pc blas.pc)

Does this mean it didn't find MKL and is just using the netlib/openblas versions?

(Tests still segfault as before)

@svigerske
Copy link
Member

Something seems to be wrong in configure, since it didn't pick up your flags. I'll fix it. In the meantime, change --with-lapack-lflags to --with-lapack.

Figuring out what the issue could be with your /usr/lib/libblas.so.3 would also be worthwhile. Maybe you installed a BLAS with 64-bit integers by accident.
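One way a BLAS built with 64-bit integers (ILP64) can produce crashes like the dcopy_ one above: the Fortran BLAS interface takes integer arguments by reference, so a caller built for 32-bit integers passes a pointer to a 4-byte int, while an ILP64 library reads 8 bytes through it. The C sketch below only simulates that mismatched read with `memcpy`; `ilp64_view_of_lp64_arg` is a hypothetical helper, not part of Ipopt or BLAS, and whether this is what actually happened with this system's libblas.so.3 is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not Ipopt or BLAS code): simulates what an ILP64 BLAS
 * "sees" when an LP64 caller passes a pointer to a 32-bit integer argument n.
 * next_bytes stands for whatever 4 bytes follow n in the caller's memory. */
static int64_t ilp64_view_of_lp64_arg(int32_t n, int32_t next_bytes)
{
    int32_t caller_memory[2] = { n, next_bytes }; /* n and its neighbour */
    int64_t seen_by_ilp64;

    /* The 8-byte read covers exactly both 4-byte slots. */
    assert(sizeof seen_by_ilp64 == sizeof caller_memory);
    memcpy(&seen_by_ilp64, caller_memory, sizeof seen_by_ilp64);
    return seen_by_ilp64;
}
```

On a little-endian machine such as x86-64, ilp64_view_of_lp64_arg(5, 0) happens to yield 5, but any nonzero neighbouring bytes turn n into a value above INT32_MAX (ilp64_view_of_lp64_arg(5, 1) is 4294967301), which a routine like dcopy would then use as its vector length.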

@lopsided
Copy link
Author

With
../configure --prefix=${PWD} --with-hsl-lflags="-L${HSLDIR}/.libs -lcoinhsl" --with-hsl-cflags="-I${HSLDIR}" --with-lapack="-L${MKLROOT}/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm"

I get:
checking for LAPACK... yes: user-specified (-L/opt/intel/oneapi/mkl/latest/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm)

and also:
checking for function pardiso_ in -L/opt/intel/oneapi/mkl/latest/lib/intel64 -Wl,--no-as-needed -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm ... yes

but now all the tests fail with:

Testing C++ Example...
0 
 ---- 8< ---- Start of test program output ---- 8< ----
/path/to/Ipopt/build/test/.libs/hs071_cpp: error while loading shared libraries: libmkl_intel_lp64.so.1: cannot open shared object file: No such file or directory

But it is present:

$ ll $MKLROOT/lib/intel64/libmkl_intel_lp64.so.1
-rwxr-xr-x. 1 root root 13M Jun 17 22:20 /opt/intel/oneapi/mkl/latest/lib/intel64/libmkl_intel_lp64.so.1

@svigerske
Copy link
Member

But it is not in the library search path. Add ${MKLROOT}/lib/intel64 to your LD_LIBRARY_PATH.

@lopsided
Copy link
Author

I see! OK, that gets me back to some tests still failing with segfaults. Full valgrind debug output here: https://pastebin.com/zvtJ7g4D

svigerske added a commit that referenced this issue Sep 15, 2021
@svigerske
Member

Did you also rebuild HSL to use MKL?

@lopsided
Copy link
Author

That fixed it! Happy days :)

$ ./run_unitTests 
 
Running unitTests...
 
Testing AMPL Solver Executable...
    no AMPL solver executable found, skipping test...
Testing C++ Example...
    Test passed!
Testing C Example...
    Test passed!
Testing Fortran Example...
    Test passed!
Skip testing Java Example (Java interface not build)
Testing sIpopt Example parametric_cpp...
    Test passed!
Testing sIpopt Example redhess_cpp...
    Test passed!
Testing EmptyNLP Example...
    Test passed!
Testing GetCurr Example...
    Test passed!

So I guess there is a problem with OpenBLAS from the Fedora 34 repos. I saw a few tickets on it (e.g., OpenMathLib/OpenBLAS#2839), though I can't tell whether they are related. But at least the issues seem fixed with MKL.

Thanks for all your help and patience!

@lopsided
Copy link
Author

Interesting development. I recompiled without --enable-debug --disable-shared. The tests still pass, but the examples are segfaulting again!

Recompiling with the --disable-shared flag fixes the examples again.

@svigerske
Copy link
Member

Did you install before building the examples?

@lopsided
Copy link
Author

Yes, I think it fails otherwise..?

@svigerske
Copy link
Member

Or the examples are built against some older Ipopt installation.
Check which libraries the examples depend on.

It could also be that both (broken) shared and (working) static libs are now installed, but the examples link against the shared libs.
