standardize the MPI_Status object #2

Open
jeffhammond opened this issue Nov 18, 2022 · 2 comments

@jeffhammond

jeffhammond commented Nov 18, 2022

Problem

We must standardize the layout of the MPI_Status object to have an ABI.

Background (from Jeff's blog)

Let's look at three different implementations of the MPI_Status object:

New MPICH

This is the status object after this commit, which made MPICH consistent with Intel MPI, in order to establish the MPICH ABI initiative. This meant that applications and libraries compiled against Intel MPI could be run using many implementations.

typedef struct MPI_Status {
    int count_lo;
    int count_hi_and_cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} MPI_Status;

Old MPICH

Prior to being consistent with Intel MPI, MPICH had the following status object.

// dnl    EXTRA_STATUS_DECL     - Any extra declarations that the device
// dnl                            needs added to the definition of MPI_Status.
...
typedef struct MPI_Status {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    MPI_Count count;
    int cancelled;
    int abi_slush_fund[2];
    @EXTRA_STATUS_DECL@
} MPI_Status;

Open-MPI

This is from Open-MPI as of 65bb9e6.
I have not attempted to track the history of the Open-MPI status object.

typedef struct ompi_status_public_t MPI_Status;
...
struct ompi_status_public_t {
    /* These fields are publicly defined in the MPI specification.
       User applications may freely read from these fields. */
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    /* The following two fields are internal to the Open MPI
       implementation and should not be accessed by MPI applications.
       They are subject to change at any time.  These are not the
       droids you're looking for. */
    int _cancelled;
    size_t _ucount;
};
typedef struct ompi_status_public_t ompi_status_public_t;

The wi4mpi ABI for the status object is the same as Open-MPI's:

struct CCC_mpi_status_struct {
    /* These fields are publicly defined in the MPI specification.
       User applications may freely read from these fields. */
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    /* The following two fields are internal to the Open MPI
       implementation and should not be accessed by MPI applications.
       They are subject to change at any time.  These are not the
       droids you're looking for. */
    int _cancelled;
    size_t _ucount;
};
typedef struct CCC_mpi_status_struct MPI_Status;

Proposal

Standardize this:

// N is 3 or 5
typedef struct MPI_Status {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int extra[N];
} MPI_Status;

Put the public fields first and use 24 to 32 bytes in total (N = 3 or N = 5 with a 4-byte int), which is sufficient for what both of the major ABIs do right now.

I'm not aware of any architectural advantage of the 20 bytes Intel MPI uses.

One could be conservative and round up to 32 bytes, which has some architectural advantages, since many modern CPUs have 256-bit data paths.
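
The size arithmetic above can be checked with a small sketch. The type names below (`status_mpich`, `status_ompi`, `status_proposed3`, `status_proposed5`) are hypothetical stand-ins that copy the layouts quoted in this issue; they are not names from any implementation. The byte counts in the comments assume a typical LP64 platform (4-byte `int`, 8-byte `size_t`).

```c
#include <stddef.h>  /* offsetof, size_t */

/* Layout copied from the "New MPICH" definition quoted above. */
typedef struct {
    int count_lo;
    int count_hi_and_cancelled;
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
} status_mpich;       /* 20 bytes on LP64; public fields start at offset 8 */

/* Layout copied from the Open-MPI definition quoted above. */
typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int _cancelled;
    size_t _ucount;
} status_ompi;        /* 24 bytes on LP64 */

/* The proposed layout with N = 3. */
typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int extra[3];
} status_proposed3;   /* 24 bytes with a 4-byte int */

/* The proposed layout with N = 5. */
typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int extra[5];
} status_proposed5;   /* 32 bytes with a 4-byte int */

/* Checks that do not depend on the platform's int width:
   the public fields come first and the struct has no padding. */
_Static_assert(offsetof(status_proposed3, MPI_SOURCE) == 0, "public fields first");
_Static_assert(sizeof(status_proposed3) == 6 * sizeof(int), "no padding for N = 3");
_Static_assert(sizeof(status_proposed5) == 8 * sizeof(int), "no padding for N = 5");
```

Under those assumptions, N = 3 matches the Open-MPI footprint exactly, and either choice of N leaves at least as many private bytes as the two existing layouts use.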

Changes to the Text

Modify MPI 4.0 §3.2.5 Return Status appropriately.

Impact on Implementations

All implementations will have to change. Since it is likely that they use macros for manipulating status objects, the amount of code that needs to change is small.
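
As a rough illustration of why the change could stay small, here is a minimal sketch of accessor helpers that keep a 64-bit count and a cancelled flag in the `extra` array. Everything here is an assumption for illustration: the `sketch_*` names, the choice of `extra[0..1]` for the count and `extra[2]` for the flag, and the use of inline functions instead of macros. No implementation is known to use this exact scheme.

```c
#include <stdint.h>
#include <string.h>

/* Proposed layout with N = 3 (hypothetical type name). */
typedef struct {
    int MPI_SOURCE;
    int MPI_TAG;
    int MPI_ERROR;
    int extra[3];
} sketch_status;

/* The count is stored across extra[0] and extra[1], so two ints
   must be able to hold a 64-bit value. */
_Static_assert(2 * sizeof(int) >= sizeof(int64_t), "extra[0..1] too small for count");

/* Store the element count. memcpy is used because extra[] is only
   int-aligned, so a direct int64_t store could be misaligned. */
static inline void sketch_set_count(sketch_status *s, int64_t count)
{
    memcpy(&s->extra[0], &count, sizeof count);
}

static inline int64_t sketch_get_count(const sketch_status *s)
{
    int64_t count;
    memcpy(&count, &s->extra[0], sizeof count);
    return count;
}

/* Store the cancelled flag, normalized to 0 or 1, in extra[2]. */
static inline void sketch_set_cancelled(sketch_status *s, int flag)
{
    s->extra[2] = (flag != 0);
}

static inline int sketch_get_cancelled(const sketch_status *s)
{
    return s->extra[2];
}
```

If an implementation already routes all private-field access through helpers like these, only their bodies would need to change when the struct layout changes; callers would be untouched.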

This will break existing ABIs, which some implementers will not like (e.g., https://www.mpich.org/abi/), but ABI standardization requires this.

Impact on Users

Standard ABI is good.

References and Pull Requests

https://github.com/jeffhammond/blog/blob/main/MPI_Needs_ABI_Part_2.md

@cniethammer

Extending the struct to have a size of 32 bytes (extra[5]) seems like a good idea to help with cache line alignment, e.g., when handling arrays of status objects in MPI_Waitall or MPI_Testall.
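
A small sketch of the alignment argument, assuming a 64-byte cache line and an array whose first element starts on a line boundary (both are assumptions, and `count_straddlers` is a hypothetical helper, not part of MPI):

```c
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size in bytes */

/* Count how many of the first n elements of a densely packed array
   cross a cache line boundary, given the element size in bytes.
   The array is assumed to start at a cache-line-aligned address. */
static size_t count_straddlers(size_t elem_size, size_t n)
{
    size_t straddlers = 0;
    for (size_t i = 0; i < n; i++) {
        size_t first = i * elem_size;          /* offset of first byte */
        size_t last = first + elem_size - 1;   /* offset of last byte  */
        if (first / CACHE_LINE != last / CACHE_LINE)
            straddlers++;
    }
    return straddlers;
}
```

With these assumptions, `count_straddlers(20, 16)` is 4 and `count_straddlers(24, 8)` is 2, while `count_straddlers(32, 8)` is 0, since two 32-byte objects fill one line exactly.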

@jeffhammond

Yes, I think 32B is the nicest option since it gives implementations the most flexibility to add things. One hopes that the increase of 8-12B versus existing implementations does not cause too much angst over memory consumption 😬
