Serialization of AtomGroup #2893
Conversation
I may be missing the point in #321 about lightweight serialization. Don't we always have to serialize (or recreate) a Universe …? And due to the nature of …, one advantage of it is: for reference … Besides, another advantage of ditching anchoring is that we no longer need to worry about the order in which the AtomGroup and the Universe are pickled:

```python
>>> import multiprocessing
>>> import numpy as np
>>> # u is assumed to be an MDAnalysis Universe created beforehand
>>> def print_id(ag, u):
...     print('id u:', id(u))
...     print('id ag._u:', id(ag._u))
...     print('')

>>> ag = u.atoms[2:5]
>>> print('before pickle:')
>>> print('id u:', id(u))
>>> print('id ag_u:', id(ag._u))
>>> print('')

>>> print('after pickle:')
>>> p = multiprocessing.Pool(2)
>>> res = np.array([p.apply(print_id, args=(ag, u))  # this won't work in the old version with AtomGroup before Universe.
...                 for ts in u.trajectory[:3]])
>>> p.close()
```

New version (this PR):

```
before pickle:
id u: 139955715000496
id ag_u: 139955715000496

after pickle:
id u: 139956140148912
id ag._u: 139956140148912
id u: 139956140148912
id ag._u: 139956140148912
id u: 139955676441232
id ag._u: 139955676441232
```

Old version:

```
before pickle:
id u: 139990197591440
id ag_u: 139990197591440

after pickle:
id u: 139990190787024
id ag._u: 139990197591440
id u: 139990190788224
id ag._u: 139990197591440
id u: 139990202526352
id ag._u: 139990190787024
```
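For context, here is a minimal sketch of the idea behind this change (toy classes, not the actual MDAnalysis implementation): if an AtomGroup serializes its parent Universe together with its indices, unpickling always yields a self-contained copy and no anchor lookup is needed.

```python
import pickle

class Universe:
    """Toy stand-in for MDAnalysis.Universe (illustration only)."""
    def __init__(self, n_atoms):
        self.n_atoms = n_atoms

class AtomGroup:
    """Toy AtomGroup that pickles its indices together with its Universe."""
    def __init__(self, indices, universe):
        self.indices = indices
        self._u = universe

    def __reduce__(self):
        # The Universe is serialized alongside the indices, so unpickling
        # rebuilds a new, independent Universe instead of looking for an
        # anchored one in global state.
        return (self.__class__, (self.indices, self._u))

u = Universe(100)
ag = AtomGroup([2, 3, 4], u)
ag_copy = pickle.loads(pickle.dumps(ag))
print(ag_copy._u is u)  # False: the copy carries its own Universe
```

The cost is a heavier pickle payload; the benefit is that an AtomGroup can be shipped to a worker process on its own, in any order relative to its Universe.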
Hello @yuxuanzhuang! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found that there are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 (Comment last updated at 2020-08-19 08:56:48 UTC)
LGTM
Codecov Report

```
@@            Coverage Diff             @@
##           develop    #2893      +/-   ##
===========================================
+ Coverage    92.91%   92.93%    +0.02%
===========================================
  Files          187      187
  Lines        24591    24548       -43
  Branches      3185     3182        -3
===========================================
- Hits         22848    22814       -34
+ Misses        1697     1688        -9
  Partials        46       46
```

Continue to review the full report at Codecov.
Ditching the anchors makes sense, now that we can serialize universes. Your new approach is more streamlined and transparent than keeping state in _ANCHOR_UNIVERSES.

But I don't understand why we still need

package/MDAnalysis/__init__.py:
```python
_ANCHOR_UNIVERSES = weakref.WeakValueDictionary()
```

and any of the other occurrences of _ANCHOR_UNIVERSES: can't we get rid of all of them now?

EDIT: Or do we keep anchoring to avoid having duplicated universes?

EDIT 2: ... yes.
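For readers who haven't seen the old mechanism: roughly, anchoring kept a module-level weak-value registry of live universes and pickled only the group's indices plus an anchor name. The following is a simplified, hypothetical sketch of that pattern (not the real _ANCHOR_UNIVERSES code):

```python
import pickle
import weakref

# Module-level registry of live Universes. A WeakValueDictionary drops an
# entry automatically once the Universe is garbage collected, so the
# registry itself never keeps universes alive.
_ANCHOR_REGISTRY = weakref.WeakValueDictionary()

class Universe:
    """Toy Universe that registers itself under an anchor name."""
    def __init__(self, anchor_name):
        self.anchor_name = anchor_name
        _ANCHOR_REGISTRY[anchor_name] = self

class AtomGroup:
    """Toy group that re-attaches to an anchored Universe when unpickled."""
    def __init__(self, indices, anchor_name):
        self.indices = indices
        self._u = _ANCHOR_REGISTRY[anchor_name]  # KeyError if no live anchor

    def __reduce__(self):
        # Lightweight payload: only the indices and the anchor name. The
        # receiving side must already hold a matching, live Universe.
        return (self.__class__, (self.indices, self._u.anchor_name))

u = Universe("traj0")
ag = AtomGroup([2, 3, 4], "traj0")
ag_copy = pickle.loads(pickle.dumps(ag))
print(ag_copy._u is u)  # True: the copy re-attached to the existing Universe
```

The pickle stays small and duplicated universes are avoided, but the price is exactly the global state (and the ordering constraints) discussed in this thread.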
I am not sure how it helps to prevent duplicated universes; could you elaborate a bit more? Also, I am not certain whether it is fine to ditch the anchor completely, although all the tests still pass without it. That said, #297 also mentioned that it helps with the memory-leak issue (is that tested anywhere?)
We used to be able to test for memleaks when we were using nose for testing but didn't manage to port it to pytest. I don't remember why. So we are not really testing for memleaks – this would be a useful thing to look at. I don't think that the ANCHOR_UNIVERSES were related to #297, though, that was more about weakrefs inside Universe. The ANCHOR_UNIVERSES came from PR #321.
You're right, it doesn't. I had thought that serializing a universe and unpacking it in the following way

```python
import pickle
import MDAnalysis as mda
from MDAnalysis.tests import datafiles as data

u = mda.Universe(data.PSF, data.DCD)
u_copy = pickle.loads(pickle.dumps(u))

u is u_copy  # nope
```

would make `u is u_copy` hold, but that's not the case. When doing it with a group …
@yuxuanzhuang, with this PR, what does the following code do?

```python
import pickle
import MDAnalysis as mda
from MDAnalysis.tests import datafiles as data

u = mda.Universe(data.PSF, data.DCD)
g = u.atoms
h = u.atoms[:10]
g_copy = pickle.loads(pickle.dumps(g))
h_copy = pickle.loads(pickle.dumps(h))
print("g_copy.universe:", u is g_copy.universe)
print("h_copy:", u is h_copy.universe)
```

I.e., do groups attach to their own universe if it is already present? I vaguely remember that you had already checked that previously but just wanted to confirm.
Assuming all looks good, then try removing any remnants of ANCHOR_UNIVERSE.
No, it is not, after ditching anchoring:

```python
import pickle
import MDAnalysis as mda
from MDAnalysis.tests import datafiles as data

u = mda.Universe(data.PSF, data.DCD)
g = u.atoms
h = u.atoms[:10]
g_copy = pickle.loads(pickle.dumps(g))
h_copy = pickle.loads(pickle.dumps(h))
print("g_copy.universe is u:", u is g_copy.universe)
print("h_copy.universe is u:", u is h_copy.universe)
print("g_copy.universe is h_copy.universe", g_copy.universe == h_copy.universe)
```

```
g_copy.universe is u: False
h_copy.universe is u: False
g_copy.universe is h_copy.universe False
```

```python
g_copy, h_copy, u_copy = pickle.loads(pickle.dumps([g, h, u]))
print("g_copy.universe is u_copy:", u_copy is g_copy.universe)
print("h_copy.universe is u_copy:", u_copy is h_copy.universe)
print("g_copy.universe is h_copy.universe", g_copy.universe == h_copy.universe)
```

```
g_copy.universe is u_copy: True
h_copy.universe is u_copy: True
g_copy.universe is h_copy.universe True
```

They have to be pickled together. Are there any use cases for the former condition?
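The reason pickling everything together preserves the shared Universe is pickle's memo: objects referenced more than once within a single dumps() call are serialized once and restored as one object. A plain-Python illustration (no MDAnalysis involved):

```python
import pickle

shared = {"coords": [1.0, 2.0, 3.0]}
a = {"name": "g", "parent": shared}
b = {"name": "h", "parent": shared}

# Pickled separately: each round trip rebuilds its own copy of 'shared'.
a2 = pickle.loads(pickle.dumps(a))
b2 = pickle.loads(pickle.dumps(b))
print(a2["parent"] is b2["parent"])  # False

# Pickled together: the memo stores 'shared' once, so identity is preserved.
a3, b3 = pickle.loads(pickle.dumps([a, b]))
print(a3["parent"] is b3["parent"])  # True
```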
No performance difference is observed when using pmda:

```python
u = mda.Universe('serialize_mda/benchmarking/YiiP_system.pdb',
                 'serialize_mda/benchmarking/YiiP_system_90ns_center.xtc')
ref = mda.Universe('serialize_mda/benchmarking/YiiP_system.pdb')
rmsd_ana = pmda.rms.RMSD(u.atoms, ref.atoms)
rmsd_ana.run(n_jobs=12, n_blocks=12)
print(rmsd_ana.timing._total)
```

Before: 26.77
Thanks for the illuminating examples. I've been thinking more about the question of whether unpickling an AtomGroup should attach to an existing Universe or create a new one.

I don't really envisage a use case where I would keep serializing separate AtomGroups, but users come up with all kinds of creative solutions, so I can see people getting confused if they find that two AtomGroups are now diverging in their history (and don't update when iterating over the trajectory, for instance). On the other hand, I would say that from a Python point of view, serialize-deserialize should behave like a deepcopy, and by reattaching to existing universes we break this expectation. Furthermore, carrying global state around in ANCHOR_UNIVERSES is awkward. With all of this I am now strongly in favor of removing all the ANCHOR_UNIVERSES machinery and being very explicit about what happens when we serialize: …

If the docs don't say this explicitly already, then that should be added. @MDAnalysis/coredevs, any thoughts?
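As a plain-Python reference point for the deepcopy argument (generic objects, nothing MDAnalysis-specific): a pickle round trip, like copy.deepcopy, always yields a new, detached object that no longer tracks the original.

```python
import copy
import pickle

state = {"frame": 0, "indices": [2, 3, 4]}

via_pickle = pickle.loads(pickle.dumps(state))
via_deepcopy = copy.deepcopy(state)

# Equal in content, but brand-new objects.
print(via_pickle == state, via_pickle is state)      # True False
print(via_deepcopy == state, via_deepcopy is state)  # True False

# Changes to the original are not seen by either copy.
state["frame"] = 99
print(via_pickle["frame"], via_deepcopy["frame"])    # 0 0
```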
I am in favor of removing _ANCHOR_UNIVERSES. Please add documentation that makes clear how AtomGroups are to be pickled.
… (PR MDAnalysis#2893)
* AtomGroups are now pickled/unpickled without looking for their anchored Universe
* removed core.universe._ANCHOR_UNIVERSES (originally introduced with MDAnalysis#293) and the related machinery for keeping global state; no global state is kept any more
* update docs for AtomGroup pickling
* update CHANGELOG
* update tests
Fixes #
Changes made in this Pull Request:
AtomGroup is now pickled without finding its anchored Universe
PR Checklist