-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
10k+ leaked objects from simple test #1343
Comments
Admittedly I just quickly glanced at it. But there seems to be a lot of DV3D ref in there. Why are they even here? Meshfill shouldn't be using it. |
Not sure, but my guess is that it's related to bloat in the |
It would be useful to distinguish between leaked objects that are allocated on a per-plot basis and those that are allocated once at app initialization. The leaked objects that are allocated once at app startup are probably relatively small and ignorable . From: David Lonie <[email protected]mailto:[email protected]> Not sure, but my guess is that it's related to bloat in the vcs/init.py file. — |
Ignoring leaks is a bad practice, as is initializing objects that aren't needed. It would be good to clean up both of these, as they have performance implications. Leaks are especially dangerous when mixing python with other languages, as the issue we're now seeing is that leaks in vcs are causing segfaults in external modules that are trying to clean up the leaked memory after the python session finishes, but need to release their allocations while the interpreter is active. |
@ThomasMaxwell I don't know how do we do that but yes, I agree that having that information might be helpful. However, like @dlonie said, we should fix both of these leaks with priority for per-plot basis. |
FYIW, running this script import gc
gc.enable()
gc.set_debug(gc.DEBUG_LEAK)
def scoped_import():
import vcs
scoped_import()
del scoped_import
gc.collect()
print len(gc.garbage) Results in The aggregated types are as follows:
|
Just trying to understand. When can we have dict or similar python data types getting leaked? (what scenarios etc) |
Circular dependencies are one case, e.g.
will leak. This is sort of rare, though, and indicative of a bad design. More likely, some other object is getting leaked that references a member dict/list. |
just for convenience updated script leak_detector.py
usage:
|
commenting out all the
I get this while running the "import vcs" test
@ThomasMaxwell I think it is yours to clean up sorry... |
running:
shows a lot of AutoAPI leaks. I think it was used way back then to generate vistrails xml strings. But I doubt it is still true. Will see if I can take off all that AutoAPI that anyway only makes the whole cdms2/vcs heavier than it should. |
@doutriaux1 Any ideas about the 14 remaining leaks without the dv3d imports? Also, take a look at the animation tests. I know of at least one leak from there, resulting from this line: https://github.com/UV-CDAT/uvcdat/blob/master/Packages/vcs/Lib/VTKAnimate.py#L45
What this does is tell the interpreter to call The reason this code was added needs to be readdressed and a better solution put in place. Also, the rest of the vcs code should be checked to make sure this pattern isn't used anywhere else (just grep for atexit and make sure we aren't registering any bound methods or closures with strong references to it). |
@doutriaux1 @dlonie this is great information and very helpful.. thanks for looking into it. Let's try to get it cleaned up 😄 |
@dlonie my guess is that they come from libcf (inside cdms2) that's the only thing that I know of that is using ctype (bad idea i think, since they use ctype to access netcdf and libcf whose sources we have, so we could have done a better job at wrapping it directly) |
@dlonie the register at exit was the only way I could find at the time to force the function to be ran when the object is deleted. This function needs to be ran in order to clean the temporary pngs from ${HOME}/.uvcdat |
It'd be good to check if the leaks are coming from their library, or if we're holding on to references to them that vcs should be cleaning up. The easiest way to check this would be to put some code that uses their library (and not vcs) in the leak-test program and see if there's any garbage left over. If not, then there's an issue with vcs leaking references. |
@dlonie yes will do that. VCS has no ctype so it has to come from somewhere else, but the AutoAPI brings some leaks so I will clean that first, then try to see why cdms2 leaks (I don't think we are using any libcf file anyway) |
Trouble is, it doesn't really clean up the pngs when the object is deleted, it causes the object to never get deleted until the process exits. What happens if a user creates many animations in a batch script? All of the temporary files from each animation would persist until the process exits, along with the memory being held by the animation classes that are bound to atexit. This would usually be fixed by having something explicitly manage the png lifetimes and free them when they're no longer needed, rather than deferring the cleanup to the process's lifetime. One of the animation classes could do this, and then it's a matter of ensuring that the animation class is properly gc'd when it goes out of scope. Sounds like a good plan for the ctype / AutoAPI leaks. |
no I think if you call the "close" manually which you should, it will clear the objects. |
That works too, as long as users will do it ;) That's a fine solution, though, since if people start running into resource issues you can tell them to make sure they're calling close when they're finished 👍 |
FYI: I have put together some slideshttps://www.dropbox.com/s/lg3k1ap1605rpsl/CDAS-Overview-1.pptx?dl=0 describing the work I have been doing lately developing climate data analysis services in support of NASA CDS and ESFG CWT. This work involves cdms/cdutil parallelization and the development of an analysis server implementation for the ESGF compute working team’s WPS API. Aashish and I have been putting together a mailing list ([email protected]) to discuss climate data analysis services for large datasets. Please let Aashish know if you would like to be added to the list. — Tom |
Did you mean to send that to the UVCDAT mailing list? This is a github issue discussing memory leaks. |
interstingly enough removing the AutoAPI bit, triggered a lot of niceties (slots being respected for gms and Canvas for example).
Does that help @ThomasMaxwell |
http://docs.python.org/2/reference/datamodel.html#slots
That is very obscure. |
@dlonie @aashish24 I tracked the leaks I got to import numpy
import numpy.ma
a = numpy.ma.array([1,2,34,5,7.])
b = numpy.ma.masked_equal(a,0.) Leads to:
While disconcerting I don't see any way to avoid this leak. Maybe the upgrade to a newer python will help. @dlonie can you try to bring this branch (with drawColorbar commented out) in your branch and see if it helps? |
oops branch is: issue_1343_leaks |
numpy leak reported at: numpy/numpy#6010 |
Good catch on the numpy arrays! That would explain a lot of the leaks (although I'm curious why so many of these arrays would be created/leaked from just importing vcs). I tested the new changes and it still segfault on my branch when I run |
@dlonie would it be because vcs imports numpy at the top? |
I am wondering if we use numpy + pure VTK combo, then we should see similar issues (seg fault). @dlonie can you create an example as such? |
@aashish24 VTK uses numpy internally without issue, so that's not it. Numpy and VTK have been used together countless times in numerous projects -- it's safe to say that most projects using VTK's python bindings also use numpy. |
@aashish24 |
I see. When VTK segfaults, is there a way to know what objects its trying to release? |
Yep, |
I think you should focus on the VTK objects that are leaked. Those are almost certainly the cause of the seg fault. As mentioned in the issue you posted, leaking a small number of static objects at the module level is going to be pretty common. It is highly unlikely these are preventing the VTK objects from being freed because as @dlonie mentioned VTK uses numpy underneath already. My advice is that you should look for is anywhere a vtk object is allocated and stored statically, either inside a module or a static class attribute. Another possibility is that it exists in the closure of a function being passed to a global event handler such as https://mg.pov.lt/blog/hunting-python-memleaks.html I don't have any personal experience with this, so I can't really offer any further advice. |
To add to @jbeezley's input, I've mentioned a couple of times that the likely culprit is the dictionary returned from After that patch (Which I wouldn't submit for inclusion in master, btw -- it's a hack. The |
@dlonie the 10k are done to 105 when you take out DV3D so the VTKPlot objects are not the culprit for this, the 105 leaks left after a vcs.plot are all the ones from the numpy leak. In the branch I sent you DV3D is gone. Also it was my understanding that the vtk_object thing was just a "hack" until we reimplement the pieline object and return these instead. So we should really work hard on taking these out. Also you try to not return them they are only used by the "update" function when animating, so for now try to not return them and see if it fixes the issue. Again these have been (poorly we knew it back then and agreed to go with it until this re-implementation is done) implemented for the sake of animation only. |
I agree with @doutriaux1. We should remove those hacks! Let me know if I can be of any help. |
@doutriaux1 Have you tried running a more complex example (e.g. the meshfill animation test)? The original 10k leaks were just from importing the module, it would be good to ensure there aren't more leaks generated by a less-trivial example. If all of the leaks have been tracked down to numpy or DV3D then I'll take another pass at the refactor sometime soon, but make sure that more realistic usecases have been tested and checked for vcs leaks, because something was leaking that filter when the animation was run. |
@dlonie isn't the def of a leak is that something that should be released isn't released? If soI fail to see how it affects you. You seem to be releasing memory that has already been released and used for something else. No? |
The memory has not been released and used for something else. The issue is:
|
BTW, I suggested using There are lot of very noticable problems regarding circular references and It also looks like objects are getting stuck in the Generated with import gc
gc.enable()
def runTest():
import vcs
import cdms2
import os
import sys
import time
pth = os.path.join(os.path.dirname(__file__),"..")
sys.path.append(pth)
import checkimage
f=cdms2.open(os.path.join(vcs.sample_data,"clt.nc"))
s=f("clt",slice(0,1)) # read only 12 times steps to speed up things
x=vcs.init()
x.drawlogooff()
x.setbgoutputdimensions(1200,1091,units="pixels")
gm=x.createboxfill()
x.plot(s,gm,bg=1)
x.animate.create()
print "Saving now"
prefix= os.path.split(__file__)[1][:-3]
x.animate.save("%s.mp4"%prefix)
pngs = x.animate.close(preserve_pngs = True) # so we can look at them again
src_pth = sys.argv[1]
pth = os.path.join(src_pth,prefix)
ret = 0
for p in pngs:
print "Checking:",p
ret += checkimage.check_result_image(p,os.path.join(pth,os.path.split(p)[1]),checkimage.defaultThreshold)
if ret == 0:
os.removedirs(os.path.split(p)[0])
os.remove("%s.mp4" % prefix)
x.close()
return ret
ret = runTest()
del runTest
gc.collect()
import objgraph
import random
import sys
import inspect
objgraph.show_most_common_types(5000)
objType = "BoxfillPipeline"
objCount = len(objgraph.by_type(objType))
print "Generating %d graphs for '%s'"%(objCount, objType)
for i in range(objCount):
obj = objgraph.by_type(objType)[i]
print "Object:",str(obj)
print "RefCount:",sys.getrefcount(obj)
objgraph.show_backrefs(obj, max_depth=100, refcounts=True,
filename='/tmp/debug/objgraph-%d.png'%i)
print "ObjGraphs written."
sys.exit(ret) |
Running the
test_meshfill_draw_mesh.py
test leaks about 10,900 objects. This is very significant, as it means that rendering multiple large datasets, even properly scoped so no vcs objects are reachable, will quickly exhaust the available system memory. This is also causing issues interfacing with other modules that need objects to be freed before the interpreter shuts down.This is easy to test for manually:
import
s and all, into adef runTest():
block.sys.exit(x)
calls intoreturn x
.The results of
test_meshfill_draw_mesh.py
are here: https://gist.github.com/dlonie/2d72d0cfa4a52271bb7aThe text was updated successfully, but these errors were encountered: