MPI and System
OpenMPI from the NVIDIA HPC SDK. IIUC, built from HPC-X 2.15 sources.
uname -r: 5.4.0-84-generic
Computer hardware: Intel(R) Xeon(R) Silver 4314 CPU @ 2.40GHz, single node, no HCA.
Network type: N/A.
Details of the problem
Using MPI I/O to write to a file from a device-only memory allocation (e.g., allocated with cudaMalloc) fails. Allocating that same memory in a host-accessible way, e.g., with cudaMallocManaged, works.
The reproducer file is here:
reproducer.cpp
#include <iostream>
#include <mpi.h>
#include <cuda_runtime_api.h>

int main(int argc, char* argv[]) {
  int N = 10;
  int* p;
  if (auto e = cudaMalloc(&p, sizeof(int) * N); e != cudaSuccess) std::cerr << __LINE__, abort();
  if (auto e = cudaMemset(p, (int)'7', sizeof(int) * N); e != cudaSuccess) std::cerr << __LINE__, abort();
  if (auto e = cudaDeviceSynchronize(); e != cudaSuccess) std::cerr << __LINE__, abort();

  int mt = -1;
  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &mt);
  if (mt != MPI_THREAD_MULTIPLE) std::cerr << __LINE__, abort();

  int nranks, rank;
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_File f;
  MPI_File_open(MPI_COMM_WORLD, "output", MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &f);
  MPI_Offset bytes = sizeof(int) * (MPI_Offset)N;
  MPI_Offset total_bytes = bytes * (MPI_Offset)nranks;
  MPI_Offset off = bytes * (MPI_Offset)rank;
  MPI_File_set_size(f, total_bytes);

  MPI_Request req;
  MPI_File_iwrite_at(f, off, p, bytes, MPI_INT, &req);
  MPI_Waitall(1, &req, MPI_STATUSES_IGNORE);
  MPI_File_close(&f);

  MPI_Finalize();
  return 0;
}
Compile it with any CUDA C++ compiler, e.g., nvcc or nvc++. Running it fails with this error:
The call to cuMemcpyAsync failed. This is a unrecoverable error and will
cause the program to abort.
cuMemcpyAsync(0x1b1b5f8, 0x7f25f4a00000, 160) returned value 1
The expected behavior is for this to work correctly.
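For reference, here is a sketch of one way to build and launch the reproducer; the exact compiler and flags are assumptions, while the binary name mpi_io_bug and the two ranks match the backtrace below:

# Assumed commands; mpicxx is the Open MPI wrapper bundled with the HPC SDK, adjust paths/flags for your install.
mpicxx -o mpi_io_bug reproducer.cpp -lcudart
mpirun -np 2 ./mpi_io_bug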
Full Error Message
--------------------------------------------------------------------------
The call to cuMemcpyAsync failed. This is a unrecoverable error and will
cause the program to abort.
cuMemcpyAsync(0x1b1b5f8, 0x7f25f4a00000, 160) returned value 1
Check the cuda.h file for what the return value means.
--------------------------------------------------------------------------
[ipp2-0153.nvidia.com:00861] CUDA: Error in cuMemcpy: res=-1, dest=0x1b1b5f8, src=0x7f25f4a00000, size=160
[ipp2-0153:00861] *** Process received signal ***
[ipp2-0153:00861] Signal: Aborted (6)
[ipp2-0153:00861] Signal code: (-6)
[ipp2-0153.nvidia.com:00860] CUDA: Error in cuMemcpy: res=-1, dest=0x30f1908, src=0x7fc2f6a00000, size=160
[ipp2-0153:00860] *** Process received signal ***
[ipp2-0153:00860] Signal: Aborted (6)
[ipp2-0153:00860] Signal code: (-6)
[ipp2-0153:00861] [ 0] [ipp2-0153:00860] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f281ae1a520]
[ipp2-0153:00861] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f281ae6ea7c]
[ipp2-0153:00861] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7fc51ca1a520]
[ipp2-0153:00860] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7fc51ca6ea7c]
[ipp2-0153:00860] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f281ae1a476]
[ipp2-0153:00861] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7fc51ca1a476]
[ipp2-0153:00860] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f281ae007f3]
[ipp2-0153:00861] [ 4] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7fc51ca007f3]
[ipp2-0153:00860] [ 4] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(+0x55829)[0x7f281a655829]
[ipp2-0153:00861] [ 5] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(opal_convertor_pack+0x18f)[0x7f281a647bcf]
[ipp2-0153:00861] [ 6] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite+0x281)[0x7f25c340aae1]
[ipp2-0153:00861] [ 7] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite_at+0x49)[0x7f25c340ae39]
[ipp2-0153:00861] [ 8] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/openmpi/mca_io_ompio.so(mca_io_ompio_file_iwrite_at+0x26)[0x7f25c3805b56]
[ipp2-0153:00861] [ 9] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(+0x55829)[0x7fc51c255829]
[ipp2-0153:00860] [ 5] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libopen-pal.so.40(opal_convertor_pack+0x18f)[0x7fc51c247bcf]
[ipp2-0153:00860] [ 6] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite+0x281)[0x7fc2c140aae1]
[ipp2-0153:00860] [ 7] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmca_common_ompio.so.41(mca_common_ompio_file_iwrite_at+0x49)[0x7fc2c140ae39]
[ipp2-0153:00860] [ 8] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/openmpi/mca_io_ompio.so(mca_io_ompio_file_iwrite_at+0x26)[0x7fc2c9405b56]
[ipp2-0153:00860] [ 9] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmpi.so.40(PMPI_File_iwrite_at+0x5e)[0x7f281f2679ce]
[ipp2-0153:00861] [10] ./mpi_io_bug[0x4024c6]
[ipp2-0153:00861] [11] /opt/nvidia/hpc_sdk/Linux_x86_64/23.5/comm_libs/12.1/hpcx/hpcx-2.15/ompi/lib/libmpi.so.40(PMPI_File_iwrite_at+0x5e)[0x7fc520e679ce]
[ipp2-0153:00860] [10] ./mpi_io_bug[0x4024c6]
[ipp2-0153:00860] [11] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7f281ae01d90]
[ipp2-0153:00861] [12] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90)[0x7fc51ca01d90]
[ipp2-0153:00860] [12] /usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7f281ae01e40]
[ipp2-0153:00861] [13] ./mpi_io_bug[0x402295]
[ipp2-0153:00861] *** End of error message ***
/usr/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80)[0x7fc51ca01e40]
[ipp2-0153:00860] [13] ./mpi_io_bug[0x402295]
[ipp2-0153:00860] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node ipp2-0153 exited on signal 6 (Aborted).
--------------------------------------------------------------------------
[ipp2-0153.nvidia.com:00856] 1 more process has sent help message help-mpi-common-cuda.txt / cuMemcpyAsync failed
[ipp2-0153.nvidia.com:00856] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
I wrote this code a few years back, but unfortunately I currently have no way to test it on an NVIDIA device (nor do I have access to the HPC-X build of Open MPI). I would need help from somebody at NVIDIA to debug this; I am more than happy to assist or answer any questions. I briefly went over the code in ompi 4.1.5, and it does look correct to me, i.e., it should actually work.
I did test the ompi 5.0 version of the same code (which, however, uses the accelerator framework API functions), and I can confirm that it worked as expected on our devices.
@gonzalobg
The issue is in the reproducer's write call. It passes bytes as the count while using MPI_INT as the datatype, so the write reads sizeof(int) times more data than was allocated: 40 elements of MPI_INT is 160 bytes read from a 40-byte buffer, which matches the cuMemcpyAsync(..., 160) failure in the log.
MPI_File_iwrite_at(f, off, p, bytes, MPI_INT, &req);
It should pass either bytes with MPI_BYTE, or N with MPI_INT.
It works in either case.
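For clarity, a sketch of the two corrected variants, reusing the variables from the reproducer above (either call on its own is the fix; they are alternatives, not both):

// Count is given in elements of the datatype: N elements of MPI_INT ...
MPI_File_iwrite_at(f, off, p, N, MPI_INT, &req);
// ... or, equivalently, `bytes` elements of MPI_BYTE:
// MPI_File_iwrite_at(f, off, p, bytes, MPI_BYTE, &req);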
Closing the issue. @edgargabriel, thanks for the debugging.