Hey Team,
We are using DLIO to simulate Unet3d. While the DLIO command was running, we monitored the memory profile of our machine and found that:
- Total memory utilization of user-space processes is around 40 GiB.
- Total memory reported by the OS reaches 370-380 GiB, of which up to 330-340 GiB is taken by shared memory (`/dev/shm`). We confirmed the numbers with `cat /proc/meminfo` and `df -h`.
- Dropping the page cache with `echo 3 | sudo tee /proc/sys/vm/drop_caches` does not clear this shared memory either.
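For anyone reproducing the measurement, here is a minimal sketch of how the same figures could be read programmatically instead of by eye. It assumes a Linux host; the helper names `parse_meminfo` and `kib_to_gib` are ours for illustration and are not part of DLIO.

```python
import shutil
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kib_to_gib(kib):
    """Convert a kB (KiB) value from /proc/meminfo to GiB."""
    return kib / (1024 ** 2)


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        # Shmem covers tmpfs-backed memory, which includes files under /dev/shm.
        for key in ("MemTotal", "MemAvailable", "Cached", "Shmem"):
            if key in info:
                print(f"{key}: {kib_to_gib(info[key]):.1f} GiB")

    shm = Path("/dev/shm")
    if shm.exists():
        usage = shutil.disk_usage(shm)  # sizes in bytes, like `df` on the mount
        print(f"/dev/shm used: {usage.used / 1024 ** 3:.1f} GiB "
              f"of {usage.total / 1024 ** 3:.1f} GiB")
```

Sampling this periodically while the benchmark runs should show whether `Shmem` grows with the number of reader processes.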
Given these observations, the issue looks like pytorch/pytorch#13246 (comment), i.e. it may be necessary to use `multiprocessing.Array` to avoid different processes duplicating data in shared memory.
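To make the suggestion concrete, below is a generic sketch of the pattern using only the Python standard library. It is not DLIO or PyTorch code, and we have not verified that it matches DLIO's data path. The function names (`chunk_bounds`, `partial_sum`, `sum_with_shared_array`) are hypothetical. The parent allocates one `multiprocessing.Array`, and each worker reads its slice of that single buffer rather than receiving its own copy.

```python
import multiprocessing as mp


def chunk_bounds(n, num_workers):
    """Split range(n) into num_workers contiguous (start, stop) slices."""
    chunk = (n + num_workers - 1) // num_workers
    bounds = []
    for w in range(num_workers):
        start = min(w * chunk, n)
        stop = min(start + chunk, n)
        bounds.append((start, stop))
    return bounds


def partial_sum(shared, start, stop, results, slot):
    """Worker: read a slice of the shared buffer and store its sum."""
    total = 0.0
    for i in range(start, stop):
        total += shared[i]
    results[slot] = total


def sum_with_shared_array(values, num_workers=4):
    # One shared allocation for the data; lock=False because workers only read it.
    shared = mp.Array("d", values, lock=False)
    # Each worker writes to its own slot, so no lock is needed here either.
    results = mp.Array("d", num_workers, lock=False)

    procs = []
    for slot, (start, stop) in enumerate(chunk_bounds(len(values), num_workers)):
        p = mp.Process(target=partial_sum,
                       args=(shared, start, stop, results, slot))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
    return sum(results)


if __name__ == "__main__":
    print(sum_with_shared_array([float(i) for i in range(1000)]))
```

The shared arrays are passed as `Process` arguments because `multiprocessing` only allows them to be shared with children at process creation, not sent later through a queue or pool task.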
Could you please look into this?
Thanks,
Ayush