Linking separate executables with MPI_open_port sharing port information in a text file #6878

Open
edwardsmith999 opened this issue Aug 7, 2019 · 0 comments


Background information

What version of Open MPI are you using?

Tried with OpenMPI 2.0.1 and 4.0.1 release versions

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

Built from source tarball.

Please describe the system on which you are running

Ubuntu 12.04 on an HP Z620 workstation (as well as a few other systems, including Ubuntu 16.04 and CentOS)


Details of the problem

This is also posted on Stack Overflow (https://stackoverflow.com/questions/57039793/can-i-link-two-separate-executables-with-mpi-open-port-and-share-port-informatio). Apologies for cross-posting, but after a week with a bounty, I realise it is probably an issue rather than me being stupid :)

I'm trying to create a shared MPI communicator between two executables that are both started independently, e.g.

mpiexec -n 1 ./exe1 
mpiexec -n 1 ./exe2

In exe1, I use MPI_Open_port to generate the port details and write them to a file, which exe2 then reads. This is followed by MPI_Comm_connect/MPI_Comm_accept and then send/recv communication (minimal example below).
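A file-based exchange like this can also fail in a way unrelated to MPI: the reader may open the file while the writer is still writing it, or read a string that is not null-terminated. A minimal sketch of more robust helpers follows; `write_port_file` and `read_port_file` are hypothetical names, not part of MPI or Open MPI. It assumes a POSIX file system where `rename` within one directory replaces the target in one step (shared network file systems may give weaker guarantees). The writer publishes to a temporary name and renames it into place; the reader only accepts a complete, newline-terminated first line.

```cpp
#include <cassert>
#include <cstdio>   // std::rename, std::remove
#include <fstream>
#include <string>

// Hypothetical helper: write the port name to "<path>.tmp", close it, then
// rename it onto <path> so a reader never sees a half-written file.
bool write_port_file(const std::string& path, const std::string& port_name)
{
    const std::string tmp = path + ".tmp";
    std::ofstream out(tmp.c_str());
    if (!out)
        return false;
    out << port_name << '\n';   // trailing newline marks a complete record
    out.close();                // sets failbit if flushing/closing failed
    if (!out)
        return false;
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Hypothetical helper: read the port name back. Returns false if the file is
// missing, empty, or its first line is not newline-terminated yet.
bool read_port_file(const std::string& path, std::string& port_name)
{
    std::ifstream in(path.c_str());
    std::string line;
    // getline sets eofbit when it hits end-of-file before finding '\n',
    // which means the line is (possibly) incomplete.
    if (!in || !std::getline(in, line) || in.eof() || line.empty())
        return false;
    port_name = line;
    return true;
}
```

In the minimal code, exe1 would call `write_port_file("port", port1)` after MPI_Open_port, and exe2 would retry `read_port_file` until it succeeds before calling MPI_Comm_connect. This only makes the file exchange itself robust; it is not expected to fix the connect/accept hang, since the same file approach already works with MPICH.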

My question is: can port information be written to a file in this way, or are MPI_Publish_name/MPI_Lookup_name required for MPI to work, as in this, this and this? As supercomputers usually share a file system, this file-based approach seems simpler and may avoid having to establish a name server. It seems this should work according to the MPI_Open_port documentation in the MPI 3.1 standard:

port_name is essentially a network address. It is unique within the communication universe to which it belongs (determined by the implementation), and may be used by any client within that communication universe. For instance, if it is an internet (host:port) address, it will be unique on the internet. If it is a low level switch address on an IBM SP, it will be unique to that SP

In addition, the MPI 2.2 report on the MPI Forum website (https://www.mpi-forum.org/docs/mpi-2.2/mpi22-report/node213.htm#Node213) says:

  • The following should be compatible with MPI: The server prints out an address to the terminal, the user gives this address to the client program.

  • MPI does not require a nameserver

  • A port_name is a system-supplied string that encodes a low-level network address at which a server can be contacted.

  • By itself, the port_name mechanism is completely portable ...

All of this suggests that it should be possible to work in this manner, but I cannot get Open MPI to do so (or maybe a specific solution exists?). Writing the port information to a file works as expected with MPICH (3.2): a shared communicator is created and data is exchanged. With Open MPI 2.0.1 and 4.0.1, however, the code hangs at the MPI_Comm_connect/MPI_Comm_accept line. This is on my local workstation running Ubuntu 12.04, but it eventually needs to work on a tier 1 supercomputer.

Further Information

If I use MPMD mode with Open MPI,

mpiexec -n 1 ./exe1 : -n 1 ./exe2

this works correctly, so the problem must be in allowing the two jobs to share ompi_global_scope, as in this question. I've also tried adding

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "ompi_global_scope", "true");

and passing info to all the relevant calls, with no success. I'm not running a server/client model, as both codes run simultaneously, so sharing a URL/PID from one of them is not ideal. In any case, I cannot get this to work even using the suggested approach, which for Open MPI 2.0.1,

mpirun -n 1 --report-pid + ./OpenMPI_2.0.1 0
1234

mpirun -n 1 --ompi-server pid:1234 ./OpenMPI_2.0.1 1

gives,

ORTE_ERROR_LOG: Bad parameter in file base/rml_base_contact.c at line 161

This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  pmix server init failed
  --> Returned value Bad parameter (-5) instead of ORTE_SUCCESS

and with OpenMPI 4.0.1,

mpirun -n 1 --report-pid + ./OpenMPI_4.0.1 0
1234

mpirun -n 1 --ompi-server pid:1234 ./OpenMPI_4.0.1 1

gives,

ORTE_ERROR_LOG: Bad parameter in file base/rml_base_contact.c at line 50

...

A publish/lookup server was provided, but we were unable to connect
to it - please check the connection info and ensure the server
is alive:

Since I am using 4.0.1, the error should not be related to this bug in Open MPI.

Minimal code

    #include "mpi.h"
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <iostream>
    #include <fstream>
    #include <string>

    using namespace std;

    int main( int argc, char *argv[] )
    {
        int rank, size;
        char port1[MPI_MAX_PORT_NAME];
        MPI_Status status;
        MPI_Comm comm1;
        int data = 0;

        //0 = open a port and write it to file, 1 = read the port file and connect
        int runno = (argc > 1) ? (int)strtol(argv[1], NULL, 10) : -1;
        if (runno != 0 && runno != 1) {
            fprintf(stderr, "usage: %s 0|1\n", argv[0]);
            return 1;
        }
        for (int i = 0; i < argc; ++i)
            printf("inputs %d %d %s \n", i, runno, argv[i]);

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (runno == 0)
        {
            printf("0: opening ports.\n");fflush(stdout);
            MPI_Open_port(MPI_INFO_NULL, port1);
            printf("opened port1: <%s>\n", port1);

            //Write port file under a temporary name, then rename it into
            //place so the reader never sees a partially written file
            ofstream myfile;
            myfile.open("port.tmp");
            if( !myfile )
                cout << "Opening file failed" << endl;
            myfile << port1 << endl;
            if( !myfile )
                cout << "Write failed" << endl;
            myfile.close();
            if( rename("port.tmp", "port") != 0 )
                cout << "Rename failed" << endl;

            printf("Port %s written to file \n", port1); fflush(stdout);

            printf("Attempt to accept port1.\n");fflush(stdout);

            //Establish connection and send data
            MPI_Comm_accept(port1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &comm1);

            printf("sending 5 \n");fflush(stdout);
            data = 5;
            MPI_Send(&data, 1, MPI_INT, 0, 0, comm1);
            MPI_Close_port(port1);
        }
        else
        {
            //Read port file: wait until it exists and holds a complete
            //(newline-terminated) line. getline keeps any whitespace in the
            //port string, unlike reading character by character with >>.
            string line;
            while (true) {
                ifstream myfile("port");
                if (myfile && getline(myfile, line) && !myfile.eof() && !line.empty())
                    break;
                cout << "Waiting for port file" << endl;
                usleep(30000);
            }
            //Copy into the fixed-size buffer and always null-terminate it
            strncpy(port1, line.c_str(), MPI_MAX_PORT_NAME - 1);
            port1[MPI_MAX_PORT_NAME - 1] = '\0';
            printf("Reading port %s from file \n", port1); fflush(stdout);
            remove( "port" );

            //Establish connection and receive data
            MPI_Comm_connect(port1, MPI_INFO_NULL, 0, MPI_COMM_WORLD, &comm1);
            MPI_Recv(&data, 1, MPI_INT, 0, 0, comm1, &status);
            printf("Received %d 1\n", data); fflush(stdout);
        }

        //Barrier on intercomm before disconnecting
        MPI_Barrier(comm1);
        MPI_Comm_disconnect(&comm1);
        MPI_Finalize();
        return 0;
    }

The argument 0 or 1 simply specifies whether the code writes the port file or reads it. It is then run as two separate jobs:

mpiexec -n 1 ./a.out 0 
mpiexec -n 1 ./a.out 1
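The reader loop in the minimal code polls forever if the port file never appears (for example, if the first job fails before writing it). A small sketch of a bounded wait, assuming C++11 `<chrono>` and `<thread>`; `wait_for_port_file`, its timeout and its poll interval are hypothetical, chosen only for illustration. It treats the file as ready once it contains a newline-terminated first line.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>   // std::remove
#include <fstream>
#include <string>
#include <thread>

// Hypothetical helper: poll `path` until its first line is complete
// (newline-terminated) or `timeout` elapses. Returns true and fills
// `port_name` on success, false on timeout.
bool wait_for_port_file(const std::string& path, std::string& port_name,
                        std::chrono::milliseconds timeout,
                        std::chrono::milliseconds poll = std::chrono::milliseconds(30))
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        std::ifstream in(path.c_str());
        std::string line;
        // eof() after getline means no '\n' was found: line may be partial
        if (in && std::getline(in, line) && !in.eof() && !line.empty()) {
            port_name = line;
            return true;
        }
        if (std::chrono::steady_clock::now() >= deadline)
            return false;
        std::this_thread::sleep_for(poll);
    }
}
```

On timeout the caller can report an error and call MPI_Abort instead of hanging before MPI_Comm_connect is ever reached, which also helps separate a missing port file from the connect/accept hang described above.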