hiop~mpi includes MPI headers #662

Open
cameronrutherford opened this issue Sep 25, 2023 · 6 comments

@cameronrutherford
Collaborator

https://github.com/LLNL/hiop/blob/develop/src/Interface/hiopInterface.hpp#L60

https://github.com/pnnl/ExaGO/actions/runs/6304772661/job/17116842345?pr=15

This is a really weird bug, as even when building petsc~mpi in the exago package here, petsc insists on having an mpi.h lying around that is also picked up...

I am still trying to figure out who to blame here, but this seemed like the right place to start.

@cnpetra
Collaborator

cnpetra commented Sep 26, 2023

Are you somehow building HiOp without MPI? Or are HiOp and PETSc being built with different MPI headers?

@cameronrutherford
Collaborator Author

> Are you somehow building HiOp without MPI? Or are HiOp and PETSc being built with different MPI headers?

From the ExaGO pipeline, we are building:

exago@develop+hiop~ipopt~mpi~python+raja+tests arch=None-None-x86_64
 -   tx7nd5d  exago@develop%gcc@9.4.0~cuda+hiop~ipo~ipopt+logging~mpi~python+raja~rocm+tests build_system=cmake build_type=RelWithDebInfo dev_path=/__w/ExaGO/ExaGO arch=linux-ubuntu20.04-x86_64
 -   ybikngp      ^[email protected]%[email protected]~cuda~ipo+openmp~rocm~tests build_system=cmake build_type=RelWithDebInfo arch=linux-ubuntu20.04-x86_64
 -   kl43gwj          ^[email protected]%[email protected] build_system=generic arch=linux-ubuntu20.04-x86_64
 -   7bzaewm      ^[email protected]%[email protected]~doc+ncurses+ownlibs~qt build_system=generic build_type=Release arch=linux-ubuntu20.04-x86_64
 -   3bxcabf          ^[email protected]%[email protected]~symlinks+termlib abi=none build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   yekhgie          ^[email protected]%[email protected]~docs~shared build_system=generic certs=mozilla arch=linux-ubuntu20.04-x86_64
 -   djeruao              ^ca-certificates-mozilla@2023-01-10%[email protected] build_system=generic arch=linux-ubuntu20.04-x86_64
 -   jymwj6w              ^[email protected]%[email protected]+optimize+pic+shared build_system=makefile arch=linux-ubuntu20.04-x86_64
 -   vcsqn5o      ^hiop@0.7.1%gcc@9.4.0~cuda+deepchecking~ginkgo~ipo~jsrun~kron~mpi+raja~rocm~shared~sparse build_system=cmake build_type=RelWithDebInfo arch=linux-ubuntu20.04-x86_64
 -   wtvhbiz      ^[email protected]%[email protected]~bignuma~consistent_fpcsr+fortran~ilp64+locking+pic+shared build_system=makefile patches=114f95f,a4c642f,c20f518,d3d9b15 symbol_suffix=none threads=none arch=linux-ubuntu20.04-x86_64
 -   5qydzbx          ^[email protected]%[email protected]+cpanm+open+shared+threads build_system=generic arch=linux-ubuntu20.04-x86_64
 -   e5g7oef              ^[email protected]%[email protected]+cxx~docs+stl build_system=autotools patches=26090f4,b231fcc arch=linux-ubuntu20.04-x86_64
 -   gs4r33x              ^[email protected]%[email protected]~debug~pic+shared build_system=generic arch=linux-ubuntu20.04-x86_64
 -   7wdyruu              ^[email protected]%[email protected] build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   wslvyrk      ^petsc@3.18.3%gcc@9.4.0~X~batch~cgns~complex~cuda~debug+double~exodusii~fftw+fortran~giflib~hdf5~hpddm~hwloc~hypre~int64~jpeg~knl~kokkos~libpng~libyaml~memkind+metis~mkl-pardiso~mmg~moab~mpfr~mpi~mumps~openmp~p4est~parmmg~ptscotch~random123~rocm~saws~scalapack+shared~strumpack~suite-sparse~superlu-dist~tetgen~trilinos~valgrind build_system=generic clanguage=C arch=linux-ubuntu20.04-x86_64
 -   kwz7ftm          ^[email protected]%[email protected] build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   y4xrp3s              ^[email protected]%[email protected] build_system=autotools libs=shared,static arch=linux-ubuntu20.04-x86_64
 -   wnqabk7          ^[email protected]%[email protected]~gdb~int64~ipo~real64+shared build_system=cmake build_type=RelWithDebInfo patches=4991da9,93a7903,b1225da arch=linux-ubuntu20.04-x86_64
 -   quyjgw3          ^[email protected]%[email protected]+bz2+crypt+ctypes+dbm~debug+libxml2+lzma~nis~optimizations+pic+pyexpat+pythoncmd+readline+shared+sqlite3+ssl~tkinter+uuid+zlib build_system=generic patches=0d98e93,7d40923,f2fd060 arch=linux-ubuntu20.04-x86_64
 -   pgvwni4              ^[email protected]%[email protected]+libbsd build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   en3zuay                  ^[email protected]%[email protected] build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   ps7sxlx                      ^[email protected]%[email protected] build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   wlq5rko              ^[email protected]%[email protected]+bzip2+curses+git~libunistring+libxml2+tar+xz build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   j6aqcps                  ^[email protected]%[email protected]~python build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   zt4ocio                  ^[email protected]%[email protected] build_system=autotools zip=pigz arch=linux-ubuntu20.04-x86_64
 -   xoxeujp                      ^[email protected]%[email protected] build_system=makefile arch=linux-ubuntu20.04-x86_64
 -   3vtuapf                      ^[email protected]%[email protected]+programs build_system=makefile compression=none libs=shared,static arch=linux-ubuntu20.04-x86_64
 -   6sswith              ^[email protected]%[email protected] build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   2evlwmd              ^[email protected]%[email protected]~obsolete_api build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   iuswzm4              ^[email protected]%[email protected] build_system=autotools patches=bbf97f1 arch=linux-ubuntu20.04-x86_64
 -   ghcuaen              ^[email protected]%[email protected]+column_metadata+dynamic_extensions+fts~functions+rtree build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   swhrnzy              ^[email protected]%[email protected] build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   qkxtzoa              ^[email protected]%[email protected]~pic build_system=autotools libs=shared,static arch=linux-ubuntu20.04-x86_64
 -   w6opye6      ^[email protected]%[email protected] build_system=autotools arch=linux-ubuntu20.04-x86_64
 -   7xqyl5b      ^[email protected]%[email protected]~cuda+examples+exercises~ipo+openmp~rocm+shared~tests build_system=cmake build_type=RelWithDebInfo arch=linux-ubuntu20.04-x86_64
 -   4s36yj3      ^[email protected]%[email protected]+c~cuda~device_alloc~deviceconst+examples~fortran~ipo~numa~openmp~rocm+shared build_system=cmake build_type=RelWithDebInfo tests=none arch=linux-ubuntu20.04-x86_64

And so we get the following build error:


     459    In file included from /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/hiop-0.7.1-vcsqn5ocwcwfihlrbjqozv3ku2rkgzo7/include/hiopInterface.hpp:60,
     460                     from /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/hiop-0.7.1-vcsqn5ocwcwfihlrbjqozv3ku2rkgzo7/include/hiopNlpFormulation.hpp:59,
     461                     from /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/hiop-0.7.1-vcsqn5ocwcwfihlrbjqozv3ku2rkgzo7/include/hiopAlgFilterIPM.hpp:59,
     462                     from /__w/ExaGO/ExaGO/src/opflow/solver/hiop/opflow_hiop.h:7,
     463                     from /__w/ExaGO/ExaGO/src/opflow/solver/hiop/opflow_hiop.cpp:4:
  >> 464    /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/petsc-3.18.3-wslvyrkkwofiig24a5rm7gctadb7g4fk/include/petsc/mpiuni/mpi.h:186:13: error: multiple types in one declaration
     465      186 | typedef int MPI_Comm;
     466          |             ^~~~~~~~
  >> 467    /__w/ExaGO/ExaGO/tpl/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.4.0/petsc-3.18.3-wslvyrkkwofiig24a5rm7gctadb7g4fk/include/petsc/mpiuni/mpi.h:186:13: error: declaration does not declare anything [-fpermissive]
  >> 468    make[2]: *** [src/opflow/CMakeFiles/OPFLOW_obj_static.dir/build.make:261: src/opflow/CMakeFiles/OPFLOW_obj_static.dir/solver/hiop/opflow_hiop.cpp.o] Error 1

So the HiOp header hiopInterface.hpp, on line 60 (linked in the original issue description), includes hiopMPI.h, which in turn includes mpi.h. That include matches whatever mpi.h is on the include path, and here it picks up PETSc's petsc/mpiuni/mpi.h, which then fails to compile.

We are building PETSc and HiOp without MPI here, so I honestly think this could be a HiOp and a PETSc bug?

@nychiang
Collaborator

@cnpetra @cameronrutherford
I can successfully build HiOp without MPI.
In hiopMPI.h, mpi.h is not included if we set HIOP_USE_MPI = OFF.

From your log file, I think the problems are:

  1. When HIOP_USE_MPI = OFF, both HiOp and PETSc define their own MPI_Comm.
  2. Not sure where mpi.h is included. It seems to be included via PETSc.

See here
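
To illustrate point 1, here is a minimal sketch of one way two independent definitions of MPI_Comm can collide: one side uses a preprocessor macro and the other a typedef. Whether HiOp and PETSc define MPI_Comm this way in these versions is an assumption; the snippet uses stand-ins rather than either library's real headers, and `SHOW_CLASH` and `make_comm` are hypothetical names used only for this example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for library A's serial fallback:
// MPI_Comm provided as a preprocessor macro.
#define MPI_Comm int

// Hypothetical stand-in for library B's serial stub: MPI_Comm as a typedef.
// While the macro above is active, the preprocessor rewrites the next
// typedef to `typedef int int;`, which is ill-formed. Define SHOW_CLASH
// to let the compiler reject it.
#ifdef SHOW_CLASH
typedef int MPI_Comm;
#endif

// Dropping the macro before the typedef is seen lets the typedef stand.
#undef MPI_Comm
typedef int MPI_Comm;

static_assert(std::is_same<MPI_Comm, int>::value,
              "MPI_Comm should now be the typedef for int");

// Trivial helper so the type is actually used.
inline MPI_Comm make_comm(int value) {
  assert(value >= 0);
  return value;
}
```

Compiling with `-DSHOW_CLASH` should fail at the first typedef, which is the same shape of conflict as a macro and a typedef for MPI_Comm meeting in one translation unit.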

@cameronrutherford
Collaborator Author

cameronrutherford commented Sep 26, 2023

> @cnpetra @cameronrutherford
>
> I can successfully build HiOp without MPI.
>
> In hiopMPI.h, mpi.h is not included if we set HIOP_USE_MPI = OFF.
>
> From your log file, I think the problems are:
>
>   1. When HIOP_USE_MPI = OFF, both HiOp and PETSc define their own MPI_Comm.
>
>   2. Not sure where mpi.h is included. It seems to be included via PETSc.
>
> See here

I follow, but to clarify: I am also able to build hiop~mpi on its own. The issue only happens when exago~mpi tries to build against both petsc~mpi and hiop~mpi.

Why do HiOp and PETSc both need to define MPI_Comm in these non-mpi builds?

@cameronrutherford
Collaborator Author

cameronrutherford commented Sep 26, 2023

Again, this might technically be an ExaGO (or PETSc or HiOp) issue, but I'm trying to figure out who's to blame here.

@cnpetra
Collaborator

cnpetra commented Sep 27, 2023

We had this issue before with mfem, if I recall correctly. One of the defines has to go. I think HiOp can work with however PETSc defines MPI_Comm, so an easy fix would be for HiOp to check whether MPI_Comm is already defined. This applies when HIOP_USE_MPI is off.
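
A rough sketch of what such a check could look like, under stated assumptions: the other library's stub is inlined as a stand-in, and `OTHER_LIB_MPI_STUB_H` and `default_comm` are hypothetical names, not identifiers from HiOp or PETSc. Only MPI_Comm, MPI_COMM_WORLD, and HIOP_USE_MPI come from the discussion above.

```cpp
#include <cassert>
#include <type_traits>

// --- Stand-in for another library's serial MPI stub (hypothetical) ---
// A real stub would arrive via #include; it is inlined here so the
// example is self-contained.
#define OTHER_LIB_MPI_STUB_H  // hypothetical include-guard macro
typedef int MPI_Comm;
#define MPI_COMM_WORLD 2

// --- Stand-in for the fallback used when HIOP_USE_MPI is off ---
#ifndef HIOP_USE_MPI
// Only supply fallback definitions if nobody else has done so already.
// A typedef is invisible to the preprocessor, so the check also tests
// the other header's guard macro, not just MPI_Comm itself.
#if !defined(MPI_Comm) && !defined(OTHER_LIB_MPI_STUB_H)
#define MPI_Comm int
#endif
#ifndef MPI_COMM_WORLD
#define MPI_COMM_WORLD 0
#endif
#endif

static_assert(std::is_same<MPI_Comm, int>::value,
              "MPI_Comm should resolve to the pre-existing int typedef");

// Returns the communicator a serial build would pass around.
inline MPI_Comm default_comm() {
  MPI_Comm comm = MPI_COMM_WORLD;
  assert(comm >= 0);
  return comm;
}
```

Because a plain `#ifndef MPI_Comm` cannot see a typedef, a check of this kind only helps if the other header exposes some macro to test, or if the fallback header is always included after the stub. Which of those holds for PETSc's header is not established in this thread.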
