It seems that root_trees.py has some memory leaks.
When reading a large number of files in a loop, memory usage increases constantly and the job gets killed for running out of memory.
For example, the simple code below illustrates it:
```python
from grand.dataio.root_trees import *
from pathlib import Path
import psutil

id = 0
p = psutil.Process()
print(p.memory_info())

pathlist = Path("/sps/grand/data/gp13/jul2023").rglob('*.root')
for pathdir in pathlist:
    path = str(pathdir)
    print(str(id) + " " + path)

    # Read the ADC counts tree and iterate over its events
    tadccounts = TADC(path)
    list_of_events = tadccounts.get_list_of_events()
    print(len(list_of_events))
    for event, run in list_of_events:
        pass

    # Read the raw voltage tree and iterate over its events
    trawvoltage = TRawVoltage(path)
    list_of_events = trawvoltage.get_list_of_events()
    for event, run in list_of_events:
        pass

    id = id + 1
    print(p.memory_info())
```
The output shows that vms (Virtual Memory Size, the total amount of virtual memory used by the process) grows constantly, from 6187302912 to 71741800448 bytes after reading 400 files, and that data (aka DRS, the data resident set: the amount of physical memory devoted to anything other than executable code, matching top's DATA column) grows from 5477257216 to 71020777472 bytes, so roughly ×10 in both cases.
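For reference, a minimal sketch of how to read just those two fields with psutil (on Linux, memory_info() exposes vms and data among other fields):

```python
import psutil

p = psutil.Process()
mi = p.memory_info()
# vms = total virtual memory; data = data resident set (top's DATA column)
print(f"vms:  {mi.vms} bytes")
print(f"data: {mi.data} bytes")
```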
When running this code @ccin2p3, it gets killed after a while with the error: slurmstepd: error: Detected 1 oom-kill event(s) in StepId=54442604.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.
At least half of this is not really a memory leak: the reader simply loads all the trees into memory. Making them exist only in the loop context would be pretty tricky, so please call the new function stop_using() on each tree instance at the end of the loop.
However, there is indeed a memory leak somewhere. It seems that some lists are not dereferenced, but at the moment I can't find which lists they are or where they would be created. So please let me know whether stop_using() is enough, or whether I need to trace the memory leak as a high priority.
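For concreteness, a minimal sketch of the loop body with the suggested calls added (assuming stop_using() releases the trees held by each instance, as described above):

```python
for pathdir in pathlist:
    path = str(pathdir)

    tadccounts = TADC(path)
    for event, run in tadccounts.get_list_of_events():
        pass
    # Release the trees held by this instance before the next iteration
    tadccounts.stop_using()

    trawvoltage = TRawVoltage(path)
    for event, run in trawvoltage.get_list_of_events():
        pass
    trawvoltage.stop_using()
```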