Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

__del__ destructor for Universes? #297

Closed
mnmelo opened this issue Jun 4, 2015 · 18 comments · Fixed by #321
Closed

__del__ destructor for Universes? #297

mnmelo opened this issue Jun 4, 2015 · 18 comments · Fixed by #321
Assignees
Milestone

Comments

@mnmelo
Copy link
Member

mnmelo commented Jun 4, 2015

The title says it all. @richardjgowers mentioned it'd be interesting to implement, but currently that leads to mem leaks as the garbage collector can't clean up the Universe.

See the discussion on #293.

If Universe has a del I end up with two uncollectables: the Universe itself and the trajectory (which also has a del). I tried to find any sort of circular reference between the two, but found nothing.
Is it worth pursuing this?

@richardjgowers
Copy link
Member

So adding the del destructor does create a memory leak.. :)

@richardjgowers richardjgowers removed their assignment Jun 16, 2015
@mnmelo
Copy link
Member Author

mnmelo commented Jun 16, 2015

So, I think I was able to finally track this down. I always thought it was a problem of having the Reader also having a __del__, because that's apparently the easiest way to have uncollectable circular references. But the snippet below shows I was wrong:

import gc
class Universe():
     def __init__(self):
         self.trajectory = Reader()
     def __del__(self):
         print "Universe is dying"

class Reader():
    def __init__(self):
        self.somevar = 3
    def __del__(self):
        print "Trajectory is dying"

u = Universe()
u = None
gc.collect() # This is empty, meaning all got properly collected

Now let's add some more realism to our toy Universe:

import gc
class Universe():
     def __init__(self):
         self.trajectory = Reader()
         self.atom = Atom(self) # Having this or the following lines breaks garbage collection
         self.universe = self
     def __del__(self):
         print "Universe is dying"

class Reader():
    def __init__(self):
        self.somevar = 3
    def __del__(self):
        print "Trajectory is dying"

class Atom():
    def __init__(self, parent):
        self.universe = parent

u = Universe()
u = None
gc.collect()
gc.garbage # [<__main__.Universe instance at 0x1004a91b8>, <__main__.Reader instance at 0x1004a9200>]

As far as I know there's no way to clean this automatically. A user would not only have to set u.universe = None but also do so for each Atom.universe, AtomGroup.universe, Residue.universe etc. listed under the Universe object.
Hopeless?

@richardjgowers
Copy link
Member

https://docs.python.org/2/library/weakref.html

If we made all the references to Universe (from Atom etc) a weakref.ref, it might fix everything?

Can you try?

class Atom():
   self.universe = weakref.ref(parent)

@mnmelo
Copy link
Member Author

mnmelo commented Jun 16, 2015

Ah, was just writing to propose that!
The downside: if the Universe goes out-of-scope, all the referrers will become hollow. I don't know how much of a use-case problem that is.

I'll try it out and let you know.
EDIT: I got the idea from the nice article at http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python/

@mnmelo mnmelo self-assigned this Jun 16, 2015
@richardjgowers
Copy link
Member

import gc
import weakref

class Universe():
     def __init__(self):
         self.trajectory = Reader()
         self.atom = Atom(self) # Having this or the following lines breaks garbage collection
         self.universe = weakref.ref(self)
     def __del__(self):
         print "Universe is dying"

class Reader():
    def __init__(self):
        self.somevar = 3
    def __del__(self):
        print "Trajectory is dying"

class Atom():
    def __init__(self, parent):
         self.universe = weakref.ref(parent)

Leads to

In [1]: u = Universe()

In [2]: del u
Universe is dying
Trajectory is dying

In [3]: gc.collect()
Out[3]: 0

In [4]: gc.garbage
Out[4]: []

So possibly fixed?

I think deleting Universe should break everything that refers to Universe... makes sense to me

@mnmelo
Copy link
Member Author

mnmelo commented Jun 16, 2015

Well, but now what happens is that if you let the Universe go out-of-scope but still have AtomGroups lying around, the Universe is still alive and you can still use those AtomGroups:

def first10(top,traj):
    u = Universe(top,traj)
    return u.atoms[:10]
ag = first10("top", "traj")
ag.positions  #This works now, but wouldn't anymore with the weakrefs

I see no problem with breaking this. It'll have to be well-documented. What's your take?

@richardjgowers
Copy link
Member

Ah, I didn't think of scopes like that.... I don't think AtomGroups or any other products of Universe are designed to live standalone though, so I think the above code should break.

I guess the next step is to implement this onto the real code and see which tests break

@mnmelo
Copy link
Member Author

mnmelo commented Jun 16, 2015

I'm on it. Might take a bit of fine-combing to find all that refers back to the Universe, though.

@richardjgowers
Copy link
Member

I think it should be just Atoms...

"git grep '.universe ='" might make it easier to find

@orbeckst
Copy link
Member

Just to chime in: I am fine with standalone AtomGroups dying; it's fine to say that they have to live in some Universe.

FYI, the AtomGroup.Merge() hack creates explicitly copies and then assigns them individually to the new Universe:

    # create a list of Atoms, then convert it to an AtomGroup
    atoms = [copy.copy(a) for gr in args for a in gr]
    for a in atoms:
        a.universe = u

which looks to me like the canonical way to do this kind of thing.

@mnmelo
Copy link
Member Author

mnmelo commented Jun 21, 2015

Ok, all tests are passing.
However, as I commented in #312, I don't need a __del__ destructor anymore for pickling/unpickling AtomGroups (#293).

I am against keeping a __del__ method that does nothing, because it's easy to inadvertently create a memleak down the line. My question now is what to do about the weakrefs I introduced to make the __del__ possible:

  • Shall I revert them to the previous behavior?
  • Or shall I leave them in place, ready for the moment a Universe.__del__ is needed?

Option 2 is more forethoughtful, but has the side-effect of breaking the API in the terms we discussed above (Atoms and AtomGroups becoming orphaned if their home Universe goes out of scope).

I'm for option 2, mostly because I don't use AtomGroups in such a Universe-detached way, and think this will enforce clean, explicit handling of Universes. In any case I would really like some backing or dissent before committing. @richardjgowers? @orbeckst?

@richardjgowers
Copy link
Member

Push the branch to this repo and make a pull request so we can see the changes

mnmelo added a commit that referenced this issue Jun 21, 2015
…verses

AtomGroups can now be pickled/unpickled (closes #293)
A weakref system was implemented so that Universes can have a __del__
method without memleaking (closes #297)
ChainReaders no longer leak (and also no longer have a __del__  method)
(closes #312)
Tests now specifically look for uncollectable objects at the end. Breaks
parallelization, though.
@orbeckst
Copy link
Member

I favour (2):

  1. use weakrefs and
  2. remove unnecessary __del__ but
  3. add a comment briefly explaining the problem and referencing this issue __del__ destructor for Universes? #297) and
  4. add documentation explaining the relationship between Universe, AtomGroup, Atom and Reader instances with a note that once a Universe goes out of scope, its constituents are not expected to work anymore.

I don't think we ever promised that AtomGroup and friends could survive when the parent Universe goes out of scope but obviously I have no way of knowing if people inadvertently relied on such a feature. It is possible that this has been used implicitly (especially in code that is not MDAnalysis-based but just uses the library as a swiss-army knife) such as

def get_protein(*args):
   return Universe(*args).selectAtoms("protein")
protein = get_protein("some.pdb")
# contrived analysis...
protein_coords = protein.positions
Rg = protein.radiusOfGyration()

The above will possibly fail with weakref, right? (Depending on whenever garbage is collected, which will lead to really difficult to debug problems and potential Heisenbugs when you run this in a debugger or test case that disabled gc).

One would have to rewrite it to return the Universe.

For this it would be really useful to give a good error message once the proverbial manure hits the fan and a Atom tries to access the dead Universe. Can we try ... except that access without serious performance impact or does anyone else have a smart idea how to give users good feedback when this breaks?

Otherwise: as @richardjgowers said: push a feature-issue297 branch, which will engage Travis and allows to directly comment on the code.

@orbeckst
Copy link
Member

@mnmelo is everything on feature-pickleAtomGroup in #321 ?

@mnmelo
Copy link
Member Author

mnmelo commented Jun 21, 2015

1- I'll look into failing gracefully. Since the access to the universe is a managed property in Atom and AtomGroup I think it's as easy as catching it there.

2- Yup, everything should be in that same pull request. Did I forget something again?

@richardjgowers
Copy link
Member

I can't remember what the problem with this was. As long as we only have the __del__ method on Reader, then this works fine:

import gc
class Universe():
     def __init__(self):
          self.trajectory = Reader()
          self.atom = Atom(self)
          self.universe = self

class Reader():
     def __init__(self):
          self.somevar = 3
     def __del__(self):
          print 'Closing'

class Atom():
     def __init__(self, parent):
          self.universe = parent
          self.name = 'Alfred'

u = Universe()
at = u.atom
del u

gc.collect()
print gc.garbage


# at should still work here
print "Hello I'm {}".format(at.name)
print "My daddy is called {}".format(at.universe)

del at

gc.collect()
print gc.garbage

@orbeckst
Copy link
Member

Following up from #447 : @kain88-de , when we discussed this problem, we thought that the use cases that you're presenting are not very common (see e.g. my summary comment). I agree with you that the current behavior leads to extremely obscure errors and we never put code in place to provide better error messages.

We can certainly re-open the discussion and perhaps you see a way by which we can have our cake (picklable AtomGroups, no memory leaks) and eat it, too (#447). Perhaps some of the problems will go away with #363 ?

@richardjgowers
Copy link
Member

Ok, I removed the weakref thing from Atoms and made a PR. I think it was only necessary when we had a __del__ on both Universe and Reader. This isn't breaking any of the memleak tests, and it seems to allow Universes to leave scope.

@orbeckst I think #363 is just going to be rearranging data structures, not their relationships. So hopefully I'm only responsible for performance there :D

@kain88-de Can you check this allows your use case? I did a little test locally and I think it does

@mnmelo Can you check that this isn't leaking horribly?

import MDAnalysis as mda
import gc


def get_ag():
    u = mda.Universe('adk.psf','adk_dims.dcd')

    return u.atoms[:10]

ag = get_ag()

# Universe should still exist at this point
print ag.positions

del ag

# Universe should now be dead
gc.collect()
print gc.garbage

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants