MPI_Init segfaults when using OpenMPI 3.1.3 with flux-core 0.11.0 #2170
The code inside the openmpi Flux plugin is:

```c
static int PMI_KVS_Put (const char *kvsname, const char *key, const char *value)
{
    int (*f)(const char *, const char *, const char *);
    *(void **)(&f) = dso ? dlsym (dso, "PMI_KVS_Put") : NULL;
    return f ? f (kvsname, key, value) : PMI_FAIL;
}
```

Also, what is with that cast? It seems like `f = dso ? dlsym (dso, "PMI_KVS_Put") : NULL` would have worked just fine. Did I write that? Ugh.

Plan of attack: well, it's hard to know what might be going on, although our other efforts to abuse ...
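A plausible reason for the cast is the old POSIX dlsym idiom: ISO C does not define converting a `void *` to a function pointer, so older guidance recommended writing through a `void **`. Below is a minimal, self-contained sketch of both styles; it is illustrative only and not the plugin source.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef int (*kvs_put_f)(const char *, const char *, const char *);

int main (void)
{
    void *dso = dlopen ("libpmi.so", RTLD_NOW);
    if (!dso) {
        fprintf (stderr, "dlopen: %s\n", dlerror ());
        return 1;
    }

    /* Style 1: old POSIX idiom; write through a void** to sidestep the
     * ISO C object-pointer/function-pointer conversion issue. */
    kvs_put_f f1;
    *(void **)(&f1) = dlsym (dso, "PMI_KVS_Put");

    /* Style 2: plain cast-and-assign, which most code uses today. */
    kvs_put_f f2 = (kvs_put_f) dlsym (dso, "PMI_KVS_Put");

    printf ("style1 %s, style2 %s\n",
            f1 ? "resolved" : "missing",
            f2 ? "resolved" : "missing");
    dlclose (dso);
    return 0;
}
```

Both forms end up holding the same address; the double-pointer form merely avoids the conversion warning on picky compilers. Build with `cc demo.c -ldl`.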
It seems OpenMPI dies trying to dlopen/dlsym a symbol from a PMI library. Is flux's libpmi in the library search path? Is there an easy way to check which libpmi your OpenMPI dlopen'ed?
@damora: You are not using Spectrum MPI at all because of the recent PMIx bug?
Final thought before taking my kids to school: for Spectrum MPI we had to nuke some environment variables to make it work, and we capture that logic in our wreck plugin. We might need a similar trick to redirect OpenMPI's requests from its PMIx to Flux. This is in case you are really using OpenMPI. This whole thing about PMIx, PMI, and bootstrapping is becoming increasingly complex and convoluted, and I do hope the standardization effort fixes the interoperability issues for all of us. Sigh.
@dongahn not using Spectrum MPI. I checked out PMIx 2.1 and built it. I then checked out OpenMPI 3.1.2 and configured it with --with-pmix=/pmix2.1
@damora: yes, that should be good for launching flux with mpirun/openmpi. But I am not sure that is good enough for flux to bootstrap an MPI app that was built with the openmpi configured this way. Perhaps you can build another OpenMPI with a non-PMIx option which is more compatible with flux. I forgot. Was it --with-flux? @garlick? If you build your MPI applications with that version of OpenMPI, you would have a better chance to launch them under flux IMHO.
Ah that is useful info.
Your trace appears to be from an application linked against openmpi that is trying to bootstrap under Flux and failing in PMI_KVS_Put().
I think the flux mca plugin gets built by default. The trace shows it is being used. I guess the question is whether building openmpi without pmix results in an mpirun that can start flux. I haven't tried that.
@garlick yes, using (open) mpirun seems to start flux correctly... at least I see no errors in the logs. I can also do flux hwloc info and it returns the correct core counts. After I launch the jobs, I can use flux wreck commands to query and look at output. I don't actually see any errors in flux.log. Should I have built openmpi without specifying pmix? Should I configure openmpi to use --with-flux-pmi and --with-flux-pmi-library?
Before going after a big-hammer solution like building a differently configured OpenMPI, could you quickly check which libpmi.so is being dlopen'ed here? I am wondering if another libpmi or similar has been dlopen'ed that led to this error. If you set your LD_DEBUG environment variable, the dynamic loader should show which libraries and symbols get picked up.
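One way to check this independently of OpenMPI is a tiny dlopen/dladdr probe; the sketch below assumes glibc's dladdr extension and that a libpmi.so is reachable on the default search path.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Report which file the dynamic loader resolves libpmi.so / PMI_KVS_Put
 * from, using the same search path an application would get. */
int main (void)
{
    void *dso = dlopen ("libpmi.so", RTLD_NOW | RTLD_GLOBAL);
    if (!dso) {
        fprintf (stderr, "dlopen libpmi.so: %s\n", dlerror ());
        return 1;
    }
    void *sym = dlsym (dso, "PMI_KVS_Put");
    Dl_info info;
    if (sym && dladdr (sym, &info))
        printf ("PMI_KVS_Put resolved from %s\n", info.dli_fname);
    else
        fprintf (stderr, "could not resolve PMI_KVS_Put: %s\n", dlerror ());
    dlclose (dso);
    return 0;
}
```

Compiling with -ldl and running it under the same LD_LIBRARY_PATH as the failing job should reveal whether Flux's libpmi.so or some other PMI library is being picked up.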
@dongahn same error. I can try rebuilding openmpi using --with-flux-pmi-lib
flux builds its own libpmi.so. When configuring openmpi with --with-flux-pmi=yes (which is the default), will it automatically use the libpmi.so that got built with flux? If so, I wonder if that is the problem? If the default is --with-flux-pmi=yes, is mpirun trying to use the libpmi.so built with flux rather than the one I provided via pmix 2.1?
So here is the convoluted nature of this problem.
That's why I suggested building one more OpenMPI configuration (built with the Flux PMI option rather than PMIx). Does this make sense?
@dongahn yes, I tried this combination, but it still did not work.
@damora: Thank you for the report. Some of us may need to look at whether flux can still bootstrap OpenMPI.
Yes. Great, thanks.
@dongahn I was reviewing how I configured flux to work in containers (where I can launch flux in containers, exec into a container, and run an MPI application across multiple containers using flux wreckrun), and I noticed a couple of things:
@damora: at this point, I propose we hold a call to try to resolve your issue.
@dongahn how about if I set up a Webex so I can show you the steps I'm taking?
Sounds good. Please send that info to me and @SteVwonder. Thanks.
I have been able to reproduce (or at least create a very similar scenario) on an LC system (opal) using the OpenMPI 4.0.0 installed in /usr/tce. So it appears that this is a more general problem with OpenMPI than we first anticipated. Backtrace from my segfault:
Does it mean ...
I am confused. One is dying in ...
Ah... then your theory makes sense to me.
Idk if this helps, but here is a subset of the output (I removed most of the bindings and left the init/fini) from running with LD_DEBUG=bindings:
It is worth noting that there is another set of PMI_Init and PMI_Finalize calls earlier in the trace:
If the outputs are printed in chronological order, it appears the finalizer function of our ...
Maybe OpenMPI dlopens a series of PMIs (available in the library search path), registers them in its DSO table, and activates whatever is required given environment variables and such. To debug this further, I would think we may need to install a debug version of OpenMPI and step through the source code. My current debug support branch may be of help here.
@SteVwonder: we can try to debug this using the new debugger support branch tomorrow if you have some time. flux-framework/flux-core-v0.11#12 (comment). It is less likely I will be able to spend time on this next week, as I will be traveling for a week.
@SteVwonder @dongahn I tried setting LD_DEBUG=all and then searched for PMI_KVS_Put: it seems like it is looking in the right library.
@damora: @SteVwonder can give more details, but he and I had a debugging session with the new parallel debugger support I added. We determined that this was caused by what appears to be a bug within OpenMPI. Essentially, they seem to have introduced a bug in an upper layer with respect to PMI init/fini reference counting, such that Flux's PMI library was prematurely finalized and closed with dlclose.

The real trouble is that there are a number of OpenMPI releases around 4.0.0 that break Flux this way. So your W/R should be either to avoid these versions or to patch your local flux source code to make its PMI_Finalize a no-op.

@SteVwonder has a good idea about which OpenMPI commit introduced this bug, so he may file an issue ticket on the OpenMPI repo.
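To make the suspected failure mode concrete, here is an illustrative reference-counting sketch (assumed names only, not OpenMPI or Flux source): an unbalanced finalize drops the count to zero early, the PMI DSO gets dlclose'd, and any later dlsym through the stale handle can fault the way the backtrace above does.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Illustrative only: a PMI client whose lifetime is managed by a
 * reference count shared between components. */
static void *dso;      /* cached handle to the dlopen'ed libpmi.so */
static int refcount;   /* how many components think PMI is in use  */

static void pmi_ref (void)
{
    if (refcount++ == 0)
        dso = dlopen ("libpmi.so", RTLD_NOW | RTLD_GLOBAL);
}

static void pmi_unref (void)
{
    /* If an extra, unmatched finalize sneaks in, this hits zero while
     * another component still holds (and later uses) the cached handle. */
    if (--refcount == 0 && dso)
        dlclose (dso);   /* handle is now stale but still non-NULL */
}

int main (void)
{
    pmi_ref ();          /* component A: PMI_Init                      */
    pmi_unref ();        /* component B: premature, unmatched finalize */

    /* Component A still believes PMI is live; calling
     * dlsym (dso, "PMI_KVS_Put") here is undefined behavior and can
     * segfault inside _dl_lookup_symbol_x, as in the reported trace. */
    printf ("refcount=%d, dso=%p (stale)\n", refcount, dso);

    pmi_unref ();        /* component A's own finalize, now unbalanced */
    return 0;
}
```

The real fix has to happen on the OpenMPI side by balancing the init/fini calls; the sketch only illustrates why the crash shows up inside dlsym.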
@dongahn @SteVwonder I saw Steve's GitHub issues regarding this, but it seemed like he was using 4.x OpenMPI, and the error I'm seeing is with OpenMPI 3.1.3. I can easily patch my local flux source code, though, to see if it fixes it.
@damora: I don't remember in which version this bug was introduced, but I wouldn't be surprised if it goes all the way back to 3.1.3. Yes, make PMI_Finalize a no-op and see if that works around it.
Yeah. We believe this commit is the one that introduced the problem, which means OpenMPI 3.1.0+ are all affected.
Just posted an issue on the OpenMPI GitHub issue tracker. I also think I found the source of the reference counting error. After fixing it, OpenMPI now segfaults when finalizing. So it is at least progress towards a complete solution.
Is there anything we need to fix in our pmi-1 library, or maybe something we can do to make it more robust with respect to how OpenMPI is using it?
@garlick: It wasn't clear if there was anything we could do in flux. To work around it, I suggested @damora make our PMI_Finalize a no-op (i.e., comment out the inside of the function) for now. I know, bad... but at least this may allow him to run MuMMI with the OpenMPI version he is using. One of the things our PMI_Finalize does is close the PMI file descriptor. So if there is anything the server side does that assumes close(PMI_FD) from the client, we could make it a bit more robust by not assuming it, for a hack like this. I don't think that would be the case though.
That will leak some state on the server side (in wrexecd in 0.11). Not ideal, but not the end of the world either. Maybe we could introduce an environment variable that turns PMI_Finalize into a no-op.
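A rough sketch of what that escape hatch could look like in a PMI-1 client's finalize path; the environment variable name and cleanup body below are hypothetical, not flux-core code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PMI_SUCCESS 0

/* Hypothetical: if FLUX_PMI_NOOP_FINALIZE is set, skip the real teardown
 * (closing the PMI fd, freeing client state) so an unbalanced finalize
 * from the MPI layer cannot tear down a still-active PMI client. */
int PMI_Finalize (void)
{
    const char *noop = getenv ("FLUX_PMI_NOOP_FINALIZE");
    if (noop && strcmp (noop, "0") != 0)
        return PMI_SUCCESS;     /* knowingly leak a little server state */

    /* Normal path: the real library would close the PMI fd and free
     * client state here. */
    printf ("PMI_Finalize: performing real teardown\n");
    return PMI_SUCCESS;
}

int main (void)
{
    return PMI_Finalize () == PMI_SUCCESS ? 0 : 1;
}
```

Gating the hack on an environment variable keeps the default behavior intact while giving users of affected OpenMPI releases an escape hatch.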
@dongahn I actually reverted to ompi 3.0.4. I can run a CUDA MPI single-task app:
When @SteVwonder and I debugged this bug, we actually used a 1-process hello world. Ugh... great, another bug... Would this have anything to do with CUDA-aware MPI? How did you compile your code?
Since it is dying within ... BTW, if you still have 3.1.3 around, it may be good to replicate the issue with Flux's ...
Now that I think about this, it might be a good opportunity to evaluate whether our new tool support can enable remote debugging in any capacity. I don't think TotalView is there to enable this, but I can certainly evaluate it. Run ...
@damora: sorry, I never found a good time to look into this. I will give this priority now that I am back in my office.
@damora: OK, a couple of things. With your OMPI 3.0.4, could you set the following environment variables to see if this works around the issue?

```
OMPI_MCA_osc = "pt2pt"
OMPI_MCA_pml = "yalla"
OMPI_MCA_btl = "self"
```

Now, for OpenMPI 3.1.3 and higher, it turned out the W/R suggestion of making our PMI_Finalize a no-op is not sufficient. So if you want to use these newer versions of OpenMPI, you will need the patches posted at open-mpi/ompi#6730 (both commits). Let me know how this works.
I checked with these env vars set and still get the same error when running more than one MPI task.
OK. Time to do real debugging on our open Sierra systems then...
This is against the older execution system, so closing.
Built the following:

- flux-core-0.11.0
- flux-sched-0.7.0
- OpenMPI 3.1.3
- PMIX 2.1 (using libpmi)

I can launch flux with mpirun without any errors, and basic flux informational commands seem to work, but attempting to run an MPI application I get the following segfault:
```
[c699c056:139471:0] Caught signal 11 (Segmentation fault)
==== backtrace ====
 2 0x00000000000745c4 mxm_handle_error()  /var/tmp/OFED_topdir/BUILD/mxm-3.7.3111/src/mxm/util/debug/debug.c:641
 3 0x0000000000074d04 mxm_error_signal_handler()  /var/tmp/OFED_topdir/BUILD/mxm-3.7.3111/src/mxm/util/debug/debug.c:616
 4 0x000000000000fe38 _dl_lookup_symbol_x()  :0
 5 0x0000000000182c98 do_sym()  dl-sym.c:0
 6 0x000000000000139c dlsym_doit()  dlsym.c:0
 7 0x00000000000170d0 _dl_catch_error()  :0
 8 0x0000000000001c18 _dlerror_run()  :0
 9 0x0000000000001448 __dlsym()  :0
10 0x00000000000026dc PMI_KVS_Put()  /home/damora/openmpi-3.1.3/build/opal/mca/pmix/flux/../../../../../opal/mca/pmix/flux/pmix_flux.c:245
11 0x0000000000002b84 kvs_put()  /home/damora/openmpi-3.1.3/build/opal/mca/pmix/flux/../../../../../opal/mca/pmix/flux/pmix_flux.c:294
12 0x0000000000108190 opal_pmix_base_partial_commit_packed()  /home/damora/openmpi-3.1.3/build/opal/mca/pmix/../../../../opal/mca/pmix/base/pmix_base_fns.c:384
13 0x0000000000003fe8 flux_put()  /home/damora/openmpi-3.1.3/build/opal/mca/pmix/flux/../../../../../opal/mca/pmix/flux/pmix_flux.c:654
14 0x0000000000005084 mca_pml_ucx_send_worker_address()  /home/damora/openmpi-3.1.3/build/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:95
15 0x0000000000005898 mca_pml_ucx_init()  /home/damora/openmpi-3.1.3/build/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx.c:230
16 0x000000000000b988 mca_pml_ucx_component_init()  /home/damora/openmpi-3.1.3/build/ompi/mca/pml/ucx/../../../../../ompi/mca/pml/ucx/pml_ucx_component.c:134
17 0x000000000018003c mca_pml_base_select()  /home/damora/openmpi-3.1.3/build/ompi/mca/pml/../../../../ompi/mca/pml/base/pml_base_select.c:126
18 0x0000000000071a4c ompi_mpi_init()  /home/damora/openmpi-3.1.3/build/ompi/../../ompi/runtime/ompi_mpi_init.c:640
19 0x00000000000e34d4 PMPI_Init()  /home/damora/openmpi-3.1.3/build/ompi/mpi/c/profile/pinit.c:66
20 0x000000001000114c main()  ??:0
21 0x0000000000025100 generic_start_main.isra.0()  libc-start.c:0
22 0x00000000000252f4 __libc_start_main()  ??:0
```