It seems that moving to POSIX shared memory may have added limitations to pshmem. One of my workflows is failing with
Process 0: MPIShared_e5ec3957d523 failed MMap of 115168 bytes (14396 elements of 8 bytes each): [Errno 24] Too many open files
TOAST ERROR: Proc 0: Traceback (most recent call last):
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/mpi.py", line 509, in exception_guard
yield
Proc 0: File "/home/reijo/.conda/envs/toastdev/bin/toast_so_sim.py", line 934, in <module>
main()
Proc 0: File "/home/reijo/.conda/envs/toastdev/bin/toast_so_sim.py", line 913, in main
data = simulate_data(job, args, toast_comm, telescope, schedule)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Proc 0: File "/home/reijo/.conda/envs/toastdev/bin/toast_so_sim.py", line 327, in simulate_data
ops.sim_ground.apply(data)
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/timing.py", line 107, in df
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/ops/operator.py", line 107, in apply
self.exec(data, detectors, use_accel=use_accel, **kwargs)
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/timing.py", line 107, in df
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/ops/operator.py", line 47, in exec
self._exec(
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/timing.py", line 81, in df
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/ops/sim_ground.py", line 597, in _exec
ob.shared.create_column(
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/toast/observation_data.py", line 1352, in create_column
MPIShared(
Proc 0: File "/home/reijo/.conda/envs/toastdev/lib/python3.11/site-packages/pshmem/shmem.py", line 195, in __init__
self._shmap = mmap.mmap(
^^^^^^^^^^
Proc 0: OSError: [Errno 24] Too many open files
When I run against the pshmem version from right before the merge of the POSIX branch, there is no problem.
I guess it is not surprising that the OS limit on shared memory segments (i.e. open files) gets hit eventually. The system-wide ceiling is set by a kernel config parameter for the whole node, and the per-process soft limit is shown with ulimit -Sn. Here is a page with more discussion.
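The same limit can be inspected from Python, which is handy for logging it at the start of a job. A minimal sketch using the standard `resource` module (RLIMIT_NOFILE is the per-process open-file limit that `shm_open`/`mmap` run into):

```python
import resource

# Soft limit is what ulimit -Sn reports; hard limit is the ceiling the
# soft limit can be raised to without privileges.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft (ulimit -Sn): {soft}, hard (ulimit -Hn): {hard}")
```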
The surprising thing is that this per-process limit is (apparently) smaller than the limit on MPI shared memory windows, which applies globally across the system.
Is this at NERSC or on the simons1 machine?
Can you try increasing the limit in the same shell where you are running the workflow? Something like ulimit -n 8192 before mpirun / srun.
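If editing the launch shell is awkward (e.g. under a batch scheduler), an alternative is to raise the soft limit from inside the Python process at startup. This is a sketch, not something pshmem does itself; an unprivileged process may raise its soft limit up to, but not beyond, the hard limit:

```python
import resource

def raise_nofile_limit(target=8192):
    # Raise the soft open-file limit toward `target`, capped at the hard limit.
    # No privileges are needed because the hard limit is never exceeded.
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    if new_soft > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]

effective = raise_nofile_limit(8192)
print(f"effective soft limit: {effective}")
```

Calling this before creating the observation data would have the same effect as `ulimit -n 8192` in the parent shell, though each MPI rank has to do it for itself.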