Merge branch 'fix.InitParticleGeometry.unlock.and.free' into 'master.dev'

[fix.InitParticleGeometry.unlock.and.free] Fix missing UNLOCK_AND_FREE calls

See merge request piclas/piclas!896
scopplestone committed Jan 17, 2024
2 parents 0e9325a + e5dc547 commit 137bb79
Showing 33 changed files with 911 additions and 645 deletions.
10 changes: 9 additions & 1 deletion .gitlab-ci.yml
@@ -3,7 +3,6 @@
# Load modules on new boltzplatz reggie
before_script:
- ulimit -s unlimited
- module list
- python3 -V || true
- python2 -V || true
- if [ -n "${DO_CORE_SPLIT}" ]; then
@@ -27,6 +26,15 @@ before_script:
export GENERATOR=make;
export NCORES=;
fi
- if [ -n "${DO_MPICH}" ]; then
module purge;
module load cmake/3.26.4 gcc/13.2.0 mpich/4.1.2/gcc/13.2.0 hdf5/1.14.0/gcc/13.2.0/mpich/4.1.2 hopr/master/gcc/13.2.0/mpich/4.1.2/hdf5/1.14.0 petsc/3.19.3/gcc/13.2.0/mpich/4.1.2;
fi
- if [ -n "${DO_MPICH_DEBUG}" ]; then
module purge;
module load cmake/3.26.4 gcc/13.2.0 mpich-debug/4.1.2/gcc/13.2.0 hdf5/1.14.0/gcc/13.2.0/mpich-debug/4.1.2 hopr/master/gcc/13.2.0/mpich-debug/4.1.2/hdf5/1.14.0 petsc/3.19.3/gcc/13.2.0/mpich-debug/4.1.2;
fi
- module list
# ----------------------------------------------------------------------------------------------------------------------------------------------------
# Stages
# ----------------------------------------------------------------------------------------------------------------------------------------------------
7 changes: 5 additions & 2 deletions CMakeListsLib.txt
@@ -60,7 +60,7 @@ IF(LIBS_USE_MPI)
ENDIF()

MARK_AS_ADVANCED(FORCE MPI_LIBRARY MPI_EXTRA_LIBRARY)
REMOVE_DEFINITIONS(-DLIBS_MPICH)
REMOVE_DEFINITIONS(-DLIBS_MPICH_FIX_SHM_INTERFACE)

# Detect MPI implementation and version since it changes some MPI definitions
IF(MPI_C_LIBRARY_VERSION_STRING MATCHES ".*CRAY MPICH.*" AND MPI_C_VERSION_MAJOR VERSION_EQUAL "3")
@@ -69,7 +69,10 @@ IF(LIBS_USE_MPI)
ELSEIF(MPI_C_LIBRARY_VERSION_STRING MATCHES ".*MPICH.*" AND MPI_C_VERSION_MAJOR VERSION_GREATER_EQUAL "3")
SET(LIBS_MPI_NAME "MPICH")
STRING(REGEX MATCH "([0-9]+)\\.([0-9]+)" MPI_C_LIBRARY_VERSION ${MPI_C_LIBRARY_VERSION_STRING})
ADD_DEFINITIONS(-DLIBS_MPICH=1)
# Missing interface added in 4.2, see https://github.com/pmodels/mpich/pull/6727
IF(${MPI_C_LIBRARY_VERSION} VERSION_LESS_EQUAL "4.1")
ADD_DEFINITIONS(-DLIBS_MPICH_FIX_SHM_INTERFACE=1)
ENDIF()
ELSEIF(MPI_C_LIBRARY_VERSION_STRING MATCHES ".*Open MPI.*" AND MPI_C_VERSION_MAJOR VERSION_EQUAL "3")
SET(LIBS_MPI_NAME "OpenMPI")
STRING(REGEX MATCH "([0-9]+)\\.([0-9]+)\\.([0-9]+)" MPI_C_LIBRARY_VERSION ${MPI_C_LIBRARY_VERSION_STRING})
60 changes: 60 additions & 0 deletions docs/documentation/developerguide/troubleshooting.md
@@ -77,3 +77,63 @@ at best using the multi-node feature `PICLAS_SHARED_MEMORY=OMPI_COMM_TYPE_CORE`
**git hashes**
- One of these bugs was specifically fixed in
[8d1129be95abc91bf56e94bf7f12987e4c214666](https://github.com/piclas-framework/piclas/commit/8d1129be95abc91bf56e94bf7f12987e4c214666)


## Possible memory leak detection when using MPICH
**Error Description**
- The error output looks like this

====================================================================================================================================
PICLAS FINISHED! [ 2.41 sec ] [ 0:00:00:02 ]
====================================================================================================================================
Abort(810645775): Fatal error in internal_Finalize: Other MPI error, error stack:
internal_Finalize(50)...........: MPI_Finalize failed
MPII_Finalize(400)..............:
MPID_Finalize(652)..............:
MPIDI_OFI_mpi_finalize_hook(856):
destroy_vni_context(1094).......: OFI domain close failed (ofi_init.c:1094:destroy_vni_context:Device or resource busy)
[WARNING] yaksa: 4 leaked handle pool objects

and shows that piclas finishes successfully, but an MPI error is raised afterwards.
Note that the last line containing "[WARNING] yaksa: 4 leaked handle pool objects" might not always be present and that the
reported number of leaked objects sometimes corresponds to the number of processes minus one.

**Necessary Conditions**
- MPICH must be used instead of OpenMPI. The problem occurs even when only a single process is used.


**Finding the Bug**
- Activate `PICLAS_DEBUG_MEMORY=ON` and check all the pairs of, e.g.,

myrank= 0 Allocated ElemBaryNGeo_Shared_Win with WIN_SIZE = 240

with

myrank= 0 Unlocking ElemBaryNGeo_Shared_Win with MPI_WIN_UNLOCK_ALL()
myrank= 0 Freeing window ElemBaryNGeo_Shared_Win with MPI_WIN_FREE()

to find the missing `CALL UNLOCK_AND_FREE(ElemBaryNGeo_Shared_Win)`. To check all windows at once, run piclas, store the output in, e.g., `std.out`,
and then run the following command

STDOUT='std.out'; dashes='----------------------------------------'; for line in $(grep -o -P '(?<=Allocated).*(?=with)' ${STDOUT} | sort -u | xargs); do printf 'Checking [%s] %s' "$line" "${dashes:${#line}}"; NbrOfA=$(grep -iw "${line}" ${STDOUT} | grep -ic "Allocated"); printf ' allocated %sx' "$NbrOfA"; NbrOfDA=$(grep -iw "${line}" ${STDOUT} | grep -ic "Unlocking"); printf ' deallocated %sx' "$NbrOfDA"; if [[ $NbrOfDA -lt $NbrOfA ]]; then echo " ---> Could not find MPI_WIN_UNLOCK_ALL() and MPI_WIN_FREE() for this variable"; else echo "... okay"; fi; done

Replace `std.out` in the command if a different file name is used.
If a variable is not correctly freed, the output of the script should look like this

Checking [ElemSideNodeID_Shared_Win] --------------- allocated 2x deallocated 2x... okay
Checking [ElemMidPoint_Shared_Win] ----------------- allocated 2x deallocated 2x... okay
Checking [ElemNodeID_Shared_Win] ------------------- allocated 2x deallocated 1x ---> Could not find MPI_WIN_UNLOCK_ALL() and MPI_WIN_FREE() for this variable
Checking [ElemBaryNGeo_Shared_Win] ----------------- allocated 2x deallocated 2x... okay
Checking [ElemRadius2NGeo_Shared_Win] -------------- allocated 2x deallocated 2x... okay
Checking [ElemCurved_Shared_Win] ------------------- allocated 2x deallocated 2x... okay

**Resolution**
- Add the missing `CALL UNLOCK_AND_FREE(MYNAME_Win)`, where `MYNAME` is the name of the shared memory window.
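
A minimal sketch of the fix for the `ElemBaryNGeo` window from the debug output above, following the pattern used in `FinalizeMeshReadin()` added by this commit (whether the preceding `MPI_BARRIER` already exists at the call site depends on the surrounding code):

    #if USE_MPI
    ! Free the shared memory window; this requires MPI_BARRIER as per the MPI 3.1 specification
    CALL MPI_BARRIER(MPI_COMM_SHARED,iERROR)
    CALL UNLOCK_AND_FREE(ElemBaryNGeo_Shared_Win)   ! unlock and free the shared memory window
    #endif /*USE_MPI*/
    ! Then release the associated pointer or array
    ADEALLOCATE(ElemBaryNGeo_Shared)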

**Explanation**
- `MPI_WIN_UNLOCK_ALL()` and `MPI_WIN_FREE()` must be applied to shared memory windows before `CALL MPI_FINALIZE(iError)` is called.
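
For illustration, a self-contained sketch of what such an unlock-and-free step boils down to (the subroutine name and the plain `USE mpi` binding are assumptions for this example, not the actual PICLAS macro):

    ! Sketch only: the two MPI calls that must be issued for each shared memory window
    ! before MPI_FINALIZE is reached.
    SUBROUTINE UnlockAndFreeSketch(SharedWin)
      USE mpi                                    ! legacy Fortran MPI bindings
      IMPLICIT NONE
      INTEGER,INTENT(INOUT) :: SharedWin         ! handle of the shared memory window
      INTEGER               :: iError
      CALL MPI_WIN_UNLOCK_ALL(SharedWin,iError)  ! end the passive-target access epoch
      CALL MPI_WIN_FREE(SharedWin,iError)        ! free the window itself
    END SUBROUTINE UnlockAndFreeSketch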

**git hashes**
- One of these bugs was specifically fixed in
[1a151c24bab7ea22809d3d7554ff5ddf18379cf1](https://github.com/piclas-framework/piclas/commit/1a151c24bab7ea22809d3d7554ff5ddf18379cf1)

10 changes: 5 additions & 5 deletions src/CMakeLists.txt
@@ -62,14 +62,14 @@ IF(PICLAS_SHARED_MEMORY STREQUAL "MPI_COMM_TYPE_SHARED")
UNSET(PICLAS_SHARED_MEMORY_CORES CACHE)
ADD_DEFINITIONS(-DCORE_SPLIT=0)
MESSAGE(STATUS "Shared memory split type for subcommunicators set to node-level")
ELSEIF(PICLAS_SHARED_MEMORY STREQUAL "PICLAS_COMM_TYPE_NODE")
SET(PICLAS_SHARED_MEMORY_CORES "2" CACHE STRING "Number of cores per node when setting PICLAS_SHARED_MEMORY=PICLAS_COMM_TYPE_NODE. All cores must be on the same physical node!")
ADD_DEFINITIONS(-DCORE_SPLIT=${PICLAS_SHARED_MEMORY_CORES})
MESSAGE(STATUS "Shared memory split type for subcommunicators set to sub-node-level with user-specific value [PICLAS_SHARED_MEMORY_CORES = ${PICLAS_SHARED_MEMORY_CORES}] cores per node")
ELSEIF(PICLAS_SHARED_MEMORY STREQUAL "OMPI_COMM_TYPE_CORE")
UNSET(PICLAS_SHARED_MEMORY_CORES CACHE)
ADD_DEFINITIONS(-DCORE_SPLIT=1)
MESSAGE(STATUS "Shared memory split type for subcommunicators set to core-level")
ELSEIF(PICLAS_SHARED_MEMORY STREQUAL "PICLAS_COMM_TYPE_NODE")
SET(PICLAS_SHARED_MEMORY_CORES "2" CACHE STRING "Number of cores per node when setting PICLAS_SHARED_MEMORY=PICLAS_COMM_TYPE_NODE. All cores must be on the same physical node!")
ADD_DEFINITIONS(-DCORE_SPLIT=${PICLAS_SHARED_MEMORY_CORES})
MESSAGE(STATUS "Shared memory split type for subcommunicators set to sub-node-level with user-specific value [PICLAS_SHARED_MEMORY_CORES = ${PICLAS_SHARED_MEMORY_CORES}] cores per node")
ENDIF()

# =========================================================================
@@ -610,4 +610,4 @@ IF(PICLAS_CTAGS)
MESSAGE(STATUS "Found ctags: ${CTAGS_PATH}")
ADD_DEPENDENCIES(piclas tags)
ENDIF(CTAGS_PATH)
ENDIF(PICLAS_CTAGS)
ENDIF(PICLAS_CTAGS)
2 changes: 1 addition & 1 deletion src/equations/maxwell/equation.f90
@@ -1591,7 +1591,7 @@ PPURE SUBROUTINE ExactFunc_TE_Circular_Waveguide(t,x,resu)
!===================================================================================================================================
! MODULES
USE MOD_Globals
USE MOD_Globals_Vars ,ONLY: c,c2,eps0,PI,c_inv,mu0
USE MOD_Globals_Vars ,ONLY: c,PI,c_inv,mu0
USE MOD_Equation_Vars ,ONLY: TEScale,TEPulse,TEFrequency,TEPolarization
USE MOD_Equation_Vars ,ONLY: TERadius,TEModeRoot,TEDelay,TEPulseSigma,TEPulseSeriesFrequence
USE MOD_Equation_Vars ,ONLY: TEPulseNumber,TEDirection,TEMode,TEPulseShape
6 changes: 3 additions & 3 deletions src/globals/globals.f90
@@ -69,7 +69,7 @@ MODULE MOD_Globals
! Overload the MPI interface because MPICH fails to provide it
! > https://github.com/pmodels/mpich/issues/2659
! > https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node263.htm
#if LIBS_MPICH
#if LIBS_MPICH_FIX_SHM_INTERFACE
INTERFACE MPI_WIN_ALLOCATE_SHARED
SUBROUTINE PMPI_WIN_ALLOCATE_SHARED(SIZE, DISP_UNIT, INFO, COMM, BASEPTR, WIN, IERROR)
USE, INTRINSIC :: ISO_C_BINDING, ONLY : C_PTR
@@ -89,7 +89,7 @@ SUBROUTINE PMPI_WIN_SHARED_QUERY(WIN, RANK, SIZE, DISP_UNIT, BASEPTR, IERROR)
TYPE(C_PTR) :: BASEPTR
END SUBROUTINE
END INTERFACE
#endif /*LIBS_MPICH*/
#endif /*LIBS_MPICH_FIX_SHM_INTERFACE*/

INTERFACE Abort
MODULE PROCEDURE AbortProg
@@ -1511,4 +1511,4 @@ END SUBROUTINE DisplayNumberOfParticles
#endif /*defined(PARTICLES)*/


END MODULE MOD_Globals
END MODULE MOD_Globals
8 changes: 0 additions & 8 deletions src/io_hdf5/hdf5_output_state.f90
@@ -835,14 +835,6 @@ SUBROUTINE WriteIMDStateToHDF5()
CALL FillParticleData()
CALL WriteStateToHDF5(TRIM(MeshFile),t,tFuture)
SWRITE(UNIT_StdOut,'(A)')' Particles: StateFile (IMD MD data) created. Terminating successfully!'
#if USE_MPI
CALL FinalizeMPI()
CALL MPI_FINALIZE(iERROR)
IF(iERROR.NE.0)THEN
CALL abort(__STAMP__, ' MPI_FINALIZE(iERROR) returned non-zero integer value',iERROR)
END IF
#endif /*USE_MPI*/
STOP 0 ! terminate successfully
ELSE
CALL abort(__STAMP__, ' IMDLengthScale.LE.0.0 which is not allowed')
END IF
100 changes: 100 additions & 0 deletions src/mesh/mesh_readin.f90
@@ -42,6 +42,11 @@ MODULE MOD_Mesh_ReadIn
MODULE PROCEDURE INVMAP
END INTERFACE

INTERFACE FinalizeMeshReadin
MODULE PROCEDURE FinalizeMeshReadin
END INTERFACE

PUBLIC :: FinalizeMeshReadin
PUBLIC::ReadMesh,Qsort1Int,INVMAP
!===================================================================================================================================

@@ -1328,4 +1333,99 @@ SUBROUTINE Partition1Int(A,marker)
RETURN
END SUBROUTINE Partition1Int


SUBROUTINE FinalizeMeshReadin(meshMode)
!===================================================================================================================================
! Finalizes the shared mesh readin
!===================================================================================================================================
! MODULES
USE MOD_Globals
USE MOD_Mesh_Vars
USE MOD_Particle_Mesh_Vars
#if USE_MPI
USE MOD_MPI_Shared
USE MOD_MPI_Shared_Vars
USE MOD_Particle_Vars ,ONLY: Symmetry
#endif
#if USE_LOADBALANCE
USE MOD_LoadBalance_Vars ,ONLY: PerformLoadBalance
#endif /*USE_LOADBALANCE*/
! IMPLICIT VARIABLE HANDLING
IMPLICIT NONE
!-----------------------------------------------------------------------------------------------------------------------------------
! INPUT/OUTPUT VARIABLES
INTEGER,INTENT(IN) :: meshMode !< 0: only read and build Elem_xGP,
!< -1: as 0 + build connectivity and read node info (automatically read for PARTICLES=ON)
!< 1: as 0 + build connectivity
!< 2: as 1 + calc metrics
!< 3: as 2 but skip InitParticleMesh
!-----------------------------------------------------------------------------------------------------------------------------------
! LOCAL VARIABLES
!===================================================================================================================================

! First, free every shared memory window. This requires MPI_BARRIER as per MPI3.1 specification
#if USE_MPI
CALL MPI_BARRIER(MPI_COMM_SHARED,iERROR)

! symmetry sides and elem volumes/characteristic lengths
IF(ABS(meshMode).GT.1)THEN
IF(Symmetry%Order.EQ.2) CALL UNLOCK_AND_FREE(SideIsSymSide_Shared_Win)
CALL UNLOCK_AND_FREE(ElemVolume_Shared_Win)
CALL UNLOCK_AND_FREE(ElemCharLength_Shared_Win)
END IF ! ABS(meshMode).GT.1
#endif /*USE_MPI*/

! Then, free the pointers or arrays
ADEALLOCATE(SideIsSymSide_Shared)
ADEALLOCATE(ElemVolume_Shared)
ADEALLOCATE(ElemCharLength_Shared)

#if USE_MPI
! Free communication arrays
SDEALLOCATE(displsElem)
SDEALLOCATE(recvcountElem)
SDEALLOCATE(displsSide)
SDEALLOCATE(recvcountSide)

#if USE_LOADBALANCE
IF (PerformLoadBalance) THEN
CALL MPI_BARRIER(MPI_COMM_SHARED,iERROR)
RETURN
END IF
#endif /*USE_LOADBALANCE*/

! elems
CALL UNLOCK_AND_FREE(ElemInfo_Shared_Win)

! sides
#if !defined(PARTICLES)
IF(ABS(meshMode).GT.1)THEN
#endif /*!defined(PARTICLES)*/
CALL UNLOCK_AND_FREE(SideInfo_Shared_Win)
#if !defined(PARTICLES)
END IF ! ABS(meshMode).GT.1
#endif /*!defined(PARTICLES)*/

! nodes
CALL UNLOCK_AND_FREE(NodeInfo_Shared_Win)
CALL UNLOCK_AND_FREE(NodeCoords_Shared_Win)

CALL MPI_BARRIER(MPI_COMM_SHARED,iERROR)
#endif /*USE_MPI*/

ADEALLOCATE(ElemInfo_Shared)
ADEALLOCATE(SideInfo_Shared)
ADEALLOCATE(NodeInfo_Shared)
ADEALLOCATE(NodeCoords_Shared)

! Free communication arrays
#if USE_MPI
SDEALLOCATE(displsNode)
SDEALLOCATE(recvcountNode)
#endif /*USE_MPI*/

END SUBROUTINE FinalizeMeshReadin



END MODULE MOD_Mesh_ReadIn
9 changes: 6 additions & 3 deletions src/mpi/mpi.f90
@@ -195,13 +195,16 @@ SUBROUTINE InitMPIvars()
GroupSize=GETINT('GroupSize','0')
IF(GroupSize.LT.1)THEN ! group procs by node
! Split the node communicator (shared memory) from the global communicator on physical processor or node level
#if (CORE_SPLIT==1)
CALL MPI_COMM_SPLIT(MPI_COMM_PICLAS,myRank,0,MPI_COMM_NODE,iError)
#elif (CORE_SPLIT==0)
#if (CORE_SPLIT==0)
! 1.) MPI_COMM_TYPE_SHARED
! Note that using SharedMemoryMethod=OMPI_COMM_TYPE_CORE somehow does not work in every case (intel/amd processors)
! Also note that OMPI_COMM_TYPE_CORE is undefined when not using OpenMPI
CALL MPI_COMM_SPLIT_TYPE(MPI_COMM_PICLAS,SharedMemoryMethod,0,MPI_INFO_NULL,MPI_COMM_NODE,IERROR)
#elif (CORE_SPLIT==1)
! 2.) OMPI_COMM_TYPE_CORE
CALL MPI_COMM_SPLIT(MPI_COMM_PICLAS,myRank,0,MPI_COMM_NODE,iError)
#else
! 3.) PICLAS_COMM_TYPE_NODE
! Check if more nodes than procs are required or
! if the resulting split would create unequal procs per node
IF((CORE_SPLIT.GE.nProcessors_Global).OR.(MOD(nProcessors_Global,CORE_SPLIT).GT.0))THEN