This repository has been archived by the owner on Jul 27, 2020. It is now read-only.

MPI build and segfault #190

Closed
keith923 opened this issue Sep 29, 2014 · 10 comments

@keith923
Contributor

I was working on a release for SDSC and ran into a few problems.

The first is that the FETK configure.in files do not include the MPI libraries when performing their MALOC linkage test. As a result, configure wrongly concludes that MALOC is not available, when in fact MALOC is present but contains undefined MPI symbols that the test cannot resolve.

The second is that once FETK and MALOC are properly built and an MPI run is attempted, a segfault occurs while the output of the run is being printed to stdout.

@sobolevnrm
Member

Any updates on this? @lizutah @kmonson

@sobolevnrm sobolevnrm added this to the APBS 1.4.2 release milestone Aug 30, 2015
@keith923
Contributor Author

I never successfully tracked down the cause of the segfault. I do not remember what it was, but something else preempted this task, and I did not get back around to finding the problem.

I do remember that it appears that some process is attempting to write to a file descriptor that it didn't open. My suspicion at the time was that this was somehow related to MALOC. Given that MALOC and FETK don't even include the MPI libs properly during their configuration process, it's possible that MPI support is broken in ways that go beyond the missing libraries.

@sobolevnrm
Member

OK; thanks for the update.

I think we need to do a better job documenting these problems via issues.


@keith923
Contributor Author

Very much agreed. I'm disappointed that I let this one slip.

@keith923
Contributor Author

keith923 commented Sep 1, 2015

My plan is to create MALOC and FETK repositories here under Electrostatics, and populate them with what I used for the 1.4.1 release. At that point, I can update the configure scripts to actually link against the MPI libs and update our APBS build to depend on these two repos.

When that's all done, I can dig into the code and find the bit that's causing this segfault. Sound OK @sobolevnrm? @lizutah?

@lizutah

lizutah commented Sep 1, 2015

Sounds reasonable to me.

@sobolevnrm
Member

Yes. I'm in favor of ditching MPI eventually; there are large enough computers available now that it's not needed.


@keith923
Contributor Author

keith923 commented Sep 2, 2015

Status Update

Gaaahhhhh!! It all comes flooding back...

Sadly, the MALOC in the FETK I'm using is built with autotools. That's a nonstarter for Windows. I think the best route is to replace the MALOC source in the FETK tree with the CMake-enabled version that Andrew and Kyle created.

@keith923
Contributor Author

keith923 commented Sep 3, 2015

The master branch now has support for FETK. It depends on a Git submodule that points to our FETK repository. If you invoke cmake with -DENABLE_FETK=ON, it will (on non-Win32 boxes) build FETK using autotools and link against that. If you don't enable FETK, it will use the CMake build system we bolted onto MALOC to build MALOC and use that.

Now to update the configure scripts to include the MPI libs...

keith923 added a commit that referenced this issue Sep 4, 2015
…t used to use) when building with FETK. I also fixed a spelling error in the CMakeLists.txt files that probably seriously wrecked FETK builds. Finally, I bumped the APBS version to what it should be.
@lizutah lizutah assigned kozlac and unassigned keith923 Sep 10, 2015
@kozlac kozlac assigned keith923 and unassigned kozlac Oct 15, 2015
keith923 added a commit that referenced this issue Nov 18, 2015
keith923 added a commit that referenced this issue Dec 11, 2015
…dded HAVE_MPI_H define so that the APBS MPI code thinks it can. I think I can, I think I can, ... For issue #190.
keith923 added a commit that referenced this issue Dec 11, 2015
…. This (hopefully, once and for all) closes issue #190.
@keith923
Copy link
Contributor Author

This turned out to be a combination of many different issues. The configure.ac files in FETk weren't including the MPI libs during the final link, and the libs were also missing from some intermediate compilation tests. Compile-time defines were not being set in the main CMake file; they were, however, always being set in the MALOC CMake file. It should now work both with FETk (using its built-in MALOC) and without it (using the external, CMake-based MALOC, which is now source-integrated with FETk's, BTW). There are probably other problems that got fixed that I don't remember, but at least the commit history exists.

NB: Complications may arise if you build both with and without FETk from the same repo clone.

At any rate, I've had this build and run successfully on Constance and Olympus. YMMV, and I won't be surprised if we end up fielding support requests in the future. But hopefully not.


4 participants