The MPI-3 standard introduced functions that enable multiple ranks on the same node / NUMA domain to communicate through shared local memory.
- Can be used, for example, to share input data.
- Faster than RDMA put and get, even on a single node.
- Partitioning by NUMA domain is currently not standardized.
- The shared memory region has a different virtual address in each MPI process. Some caveats and even more caveats.
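Because each process maps the region at its own base address, raw pointers stored inside it are only meaningful to the process that wrote them. A common workaround is to store offsets relative to the segment base and convert them per process. The sketch below shows that idea; `to_offset`, `from_offset`, `ListNode` and `kNoNext` are hypothetical names for illustration. Note that this is a base-relative scheme, a simpler cousin of the self-relative `offset_ptr` from Boost.Interprocess, not Boost's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Marks "no next node" in the offset-linked list below (hypothetical sentinel).
constexpr std::size_t kNoNext = static_cast<std::size_t>(-1);

// Converts a pointer into the shared segment into an offset relative to the
// segment base *as mapped in the calling process*. Only the offset is
// meaningful to other processes, since each one has its own base address.
inline std::size_t to_offset(const void* segment_base, const void* p) {
    return static_cast<std::size_t>(static_cast<const char*>(p) -
                                    static_cast<const char*>(segment_base));
}

// Converts an offset back into a pointer that is valid in the calling
// process, given this process's base address of the same segment.
template <typename T>
T* from_offset(void* segment_base, std::size_t offset) {
    return reinterpret_cast<T*>(static_cast<char*>(segment_base) + offset);
}

// Example payload: a singly linked list stored inside the segment and linked
// by offsets instead of raw pointers.
struct ListNode {
    int value;
    std::size_t next;  // offset of the next node relative to the segment base
};
```

To simulate two processes seeing the same bytes at different addresses, one can build the list in one array and copy it into another: following `next` in the copy lands inside the copy, which would not be the case with raw pointers.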
Supporting this functionality in KaMPIng could be a unique selling point and very useful. There are probably multiple levels of support:
1. Wrap the MPI calls to provide a `shmalloc()`.
2. Implement a C++ allocator and `offset_ptr`. These might not work with the STL, but possibly with the Boost containers created specifically for shared memory.
3. Provide faster communication using (a) shared send/recv buffers with parallel serialization/deserialization and fewer messages per node, and (b) faster intra-node communication (according to early experiments by @mschimek, MPI seems to have some problems with intra-node communication). All of this would, however, just be a way of avoiding MPI+OpenMP and make it easier to claim that you're using hybrid parallelization.
For the sake of completeness: it seems one could also remap the shared memory region to another virtual address. On a 64-bit system, there might even be a block of virtual addresses that is large enough and free on all ranks; we could then map the shared memory region there and use raw pointers again.
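Finding such a block boils down to intersecting the free address ranges of all ranks and picking a sufficiently large, aligned piece. The sketch below shows only that step, under stated assumptions: each rank's free ranges are already known as sorted, non-overlapping half-open intervals (on Linux they could, for example, be derived from `/proc/self/maps` and exchanged with an allgather), and the actual remapping to a fixed address is out of scope. `Range`, `intersect` and `common_free_block` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Half-open range [begin, end) of virtual addresses unused in one process.
struct Range {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Intersects two sorted, non-overlapping lists of free ranges.
std::vector<Range> intersect(const std::vector<Range>& a,
                             const std::vector<Range>& b) {
    std::vector<Range> out;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        std::uintptr_t lo = std::max(a[i].begin, b[j].begin);
        std::uintptr_t hi = std::min(a[i].end, b[j].end);
        if (lo < hi) out.push_back({lo, hi});
        // Advance whichever range ends first.
        if (a[i].end < b[j].end) ++i; else ++j;
    }
    return out;
}

// Returns the start of the first block of `bytes` bytes, aligned to `align`
// (must be a power of two), that is free on every rank; nullopt if none.
std::optional<std::uintptr_t> common_free_block(
        const std::vector<std::vector<Range>>& per_rank,
        std::uintptr_t bytes, std::uintptr_t align) {
    if (per_rank.empty()) return std::nullopt;
    std::vector<Range> common = per_rank[0];
    for (std::size_t r = 1; r < per_rank.size(); ++r)
        common = intersect(common, per_rank[r]);
    for (const Range& rg : common) {
        std::uintptr_t start = (rg.begin + align - 1) & ~(align - 1);
        // The first check guards against wrap-around when rounding up.
        if (start >= rg.begin && start < rg.end && rg.end - start >= bytes)
            return start;
    }
    return std::nullopt;
}
```

Since address spaces keep changing as processes allocate, a block found this way would have to be reserved on all ranks promptly; the sketch does not address that.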
I think this is a very interesting proposal and I share Lukas's view that this could be a beneficial feature for KaMPIng.
(@lukashuebner you mean MPI+OpenMP, don't you?)