You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I'm trying to create a shared MPI COMM between two executables which are both started independently, e.g.
mpiexec -n 1 ./exe1
mpiexec -n 1 ./exe2
I use MPI_Open_port to generate port details and write these to a file in exe1 and then read with exe2. This is followed by MPI_Comm_connect/MPI_Comm_accept and then send/recv communication (minimal example below).
My question is: can we write port information to file in this way, or is the MPI_Publish_name/MPI_Lookup_name required for MPI to work as in this, this and this? As supercomputers usually share a file system, this file based approach seems simpler and maybe avoids establishing a server. It seems this should work according to the MPI_Open_Port documentation in the MPI 3.1 standard,
port_name is essentially a network address. It is unique within the communication universe to which it belongs (determined by the implementation), and may be used by any client within that communication universe. For instance, if it is an internet (host:port) address, it will be unique on the internet. If it is a low level switch address on an IBM SP, it will be unique to that SP
The following should be compatible with MPI: The server prints out an address to the terminal, the user gives this address to the client program.
MPI does not require a nameserver
A port_name is a system-supplied string that encodes a low-level network address at which a server can be contacted.
By itself, the port_name mechanism is completely portable ...
which all suggests that it should be possible to work in this manner, but I cannot get OpenMPI to work in this way (or maybe a specific solution exists?). Writing the port information to file does work as expected, i.e creates a shared communicator and exchanges information using MPICH (3.2) but hangs at the MPI_Comm_connect/MPI_Comm_accept line when using OpenMPI versions 2.0.1 and 4.0.1 (on my local workstation running Ubuntu 12.04 but eventually needs to work on a tier 1 supercomputer).
Further Information
If I use the MPMD mode with OpenMPI,
mpiexec -n 1 ./exe1 : -n 1 ./exe2
this works correctly, so must be an issue with allowing the jobs to share ompi_global_scope as in this question. I've also tried adding,
with info passed to all commands, with no success. I'm not running a server/client model as both codes run simultaneously so sharing a URL/PID from one is not ideal, although I cannot get this to work even using the suggested approach, which for OpenMPI 2.0.1,
ORTE_ERROR_LOG: Bad parameter in file base/rml_base_contact.c at line 161
This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
pmix server init failed
--> Returned value Bad parameter (-5) instead of ORTE_SUCCESS
ORTE_ERROR_LOG: Bad parameter in file base/rml_base_contact.c at line 50
...
A publish/lookup server was provided, but we were unable to connect
to it - please check the connection info and ensure the server
is alive:
Using 4.0.1 means the error should not be related to this bug in OpenMPI.
Minimal code
#include"mpi.h"
#include<stdio.h>
#include<stdlib.h>
#include<string.h>
#include<unistd.h>
#include<iostream>
#include<fstream>usingnamespacestd;intmain( int argc, char *argv[] )
{
int num_errors = 0;
int rank, size;
char port1[MPI_MAX_PORT_NAME];
char port2[MPI_MAX_PORT_NAME];
MPI_Status status;
MPI_Comm comm1, comm2;
int data = 0;
char *ptr;
int runno = strtol(argv[1], &ptr, 10);
for (int i = 0; i < argc; ++i)
printf("inputs %d %d %s \n", i,runno, argv[i]);
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &size);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (runno == 0)
{
printf("0: opening ports.\n");fflush(stdout);
MPI_Open_port(MPI_INFO_NULL, port1);
printf("opened port1: <%s>\n", port1);
//Write port file
ofstream myfile;
myfile.open("port");
if( !myfile )
cout << "Opening file failed" << endl;
myfile << port1 << endl;
if( !myfile )
cout << "Write failed" << endl;
myfile.close();
printf("Port %s written to file \n", port1); fflush(stdout);
printf("Attempt to accept port1.\n");fflush(stdout);
//Establish connection and send dataMPI_Comm_accept(port1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &comm1);
printf("sending 5 \n");fflush(stdout);
data = 5;
MPI_Send(&data, 1, MPI_INT, 0, 0, comm1);
MPI_Close_port(port1);
}
elseif (runno == 1)
{
//Read port filesize_t chars_read = 0;
ifstream myfile;
//Wait until file exists and is avaialble
myfile.open("port");
while(!myfile){
myfile.open("port");
cout << "Opening file failed" << myfile << endl;
usleep(30000);
}
while( myfile && chars_read < 255 ) {
myfile >> port1[ chars_read ];
if( myfile )
++chars_read;
if( port1[ chars_read - 1 ] == '\n' )
break;
}
printf("Reading port %s from file \n", port1); fflush(stdout);
remove( "port" );
//Establish connection and recieve dataMPI_Comm_connect(port1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &comm1);
MPI_Recv(&data, 1, MPI_INT, 0, 0, comm1, &status);
printf("Received %d 1\n", data); fflush(stdout);
}
//Barrier on intercomm before disconnectingMPI_Barrier(comm1);
MPI_Comm_disconnect(&comm1);
MPI_Finalize();
return0;
}
The 0 and 1 simply specify if this code writes a port file or reads it in the example above. This is then run with,
mpiexec -n 1 ./a.out 0
mpiexec -n 1 ./a.out 1
The text was updated successfully, but these errors were encountered:
Background information
What version of Open MPI are you using?
Tried with OpenMPI 2.0.1 and 4.0.1 release versions
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from source tarball.
Please describe the system on which you are running
Ubuntu 12.04 on HP Z620 (as well as a few others including 16.04 and CentOS)
Details of the problem
This is posted on stackoverflow (https://stackoverflow.com/questions/57039793/can-i-link-two-separate-executables-with-mpi-open-port-and-share-port-informatio), apologies for cross posting but after a week with bounty, I realise it is probably an issue rather than me being stupid :)
I'm trying to create a shared
MPI COMM
between two executables which are both started independently, e.g.I use
MPI_Open_port
to generate port details and write these to a file in exe1 and then read with exe2. This is followed byMPI_Comm_connect
/MPI_Comm_accept
and then send/recv communication (minimal example below).My question is: can we write port information to file in this way, or is the
MPI_Publish_name/MPI_Lookup_name
required for MPI to work as in this, this and this? As supercomputers usually share a file system, this file based approach seems simpler and maybe avoids establishing a server. It seems this should work according to theMPI_Open_Port
documentation in the MPI 3.1 standard,In addition, according to documentation on the MPI forum: https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node213.htm#Node213
which all suggests that it should be possible to work in this manner, but I cannot get OpenMPI to work in this way (or maybe a specific solution exists?). Writing the port information to file does work as expected, i.e creates a shared communicator and exchanges information using MPICH (3.2) but hangs at the
MPI_Comm_connect
/MPI_Comm_accept
line when using OpenMPI versions 2.0.1 and 4.0.1 (on my local workstation running Ubuntu 12.04 but eventually needs to work on a tier 1 supercomputer).Further Information
If I use the MPMD mode with OpenMPI,
this works correctly, so must be an issue with allowing the jobs to share
ompi_global_scope
as in this question. I've also tried adding,with info passed to all commands, with no success. I'm not running a server/client model as both codes run simultaneously so sharing a URL/PID from one is not ideal, although I cannot get this to work even using the suggested approach, which for OpenMPI 2.0.1,
gives,
and with OpenMPI 4.0.1,
gives,
Using 4.0.1 means the error should not be related to this bug in OpenMPI.
Minimal code
The 0 and 1 simply specify if this code writes a port file or reads it in the example above. This is then run with,
The text was updated successfully, but these errors were encountered: