cannot pickle universe object #173

GoogleCodeExporter · 2015-04-04T00:43:35Z

What steps will reproduce the problem?
1. Create a universe object by passing a psf file and a dcd file
          u = universe(psf, dcd)
2. Try to serialize this object by using python pickle module
          u_blob = pickle.dumps(u)

What is the expected output? What do you see instead?
Expected output should be a byte stream, but I got error message 'TypeError: 
'AtomGroup' object is not callable'

What version of the product are you using? On what operating system?

MDAnalysis 0.8.0
Python 2.7.6 / Python 2.6.2
Arch Linux / CentOS

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 27 Mar 2014 at 9:18

The text was updated successfully, but these errors were encountered:

GoogleCodeExporter · 2015-04-04T00:43:35Z

I've certainly brought this kind of thing up in the past, but I think the fact 
that universe objects are closely tied to open files makes it difficult to 
allow proper serialization via pickle. This also makes it challenging to pass 
universe objects between cores in a parallel workflow. I generally end up 
adjusting my workflow to produce something more tractable like a numpy array of 
coordinates and pickle that for storage and / or interprocess communication.

Original comment by [email protected] on 27 Mar 2014 at 10:14

GoogleCodeExporter · 2015-04-04T00:43:35Z

The way Universes are built at the moment makes it impossible to pickle them as 
they contain trajectory reader objects which in turn contain open file 
descriptors.

This won't change any time soon unless someone comes up with a smart way to do 
this. Therefore I am closing this with 'WontFix' – but feel free to start a 
discussion on the developer mailing list or in the comments to this issue. If a 
sensible approach and consensus emerges we will reopen.

Oliver

Original comment by orbeckst on 27 Mar 2014 at 10:16

GoogleCodeExporter · 2015-04-04T00:43:35Z

Original comment by orbeckst on 27 Mar 2014 at 10:16

Changed state: WontFix

GoogleCodeExporter · 2015-04-04T00:43:36Z

Just to clarify the cryptic error message:

pickle checks whether a class has a __getstate__ function, and then executes it 
if it does. The way this is done is a sort of duck-typing, where 
object.__getstate__ is assigned to a variable, which is subsequently called.
If there is no __getstate__ an AttributeError is raised, in which case pickle's 
default behavior ensues.

The thing with some MDAnalysis onjects is that they manage their attributes and 
never raise an AttributeError. In particular if you try AtomGroup.__getstate__ 
you get a SelectionError which pickle does not handle; if you do the same with 
Segment.__getstate__ you get no error (!!) and an empty AtomGroup is returned. 
It is this last case that causes the 'TypeError: 'AtomGroup' object is not 
callable' when pickle tries to execute what it got for Segment.__getstate__.

This is a dangerous side-effect of the syntactic sugar for selection shortcuts. 
A lot of things can go silently ignored.

On the topic of parallelization, I'll post soon some code I have geared 
specifically for parallelizing trajectory reads. The serialization of 
MDAnalysis objects only becomes a problem if they are to be passed back and 
forth between workers. If the code avoids that (and takes care of renewing file 
descriptors) multiprocessing works fine.

Original comment by [email protected] on 27 Mar 2014 at 10:29

GoogleCodeExporter · 2015-04-04T00:43:36Z

Is that possible to extract all the fields of the universe object, and 
serialize them to byte stream. When we wanna regenerate this object, we 
deserialize all the fields and construct the object.

Original comment by [email protected] on 27 Mar 2014 at 11:18

GoogleCodeExporter · 2015-04-04T00:43:36Z

That is pickle's default approach. MDAnalysis universes are unpicklable in this 
way due to, among other things, open file descriptors and unpicklable function 
objects (SWIG stuff).

This can all be managed instead of leaving pickle to do its default process. 
Look into the __getstate__ and __setstate__ functions. There probably is a way 
of coercing the serialization of enough information for __setstate__ to 
recreate the universe. As far as I went it looked messy and, in my case, not 
worth the trouble.

Original comment by [email protected] on 27 Mar 2014 at 11:34

GoogleCodeExporter · 2015-04-04T00:43:36Z

For __getstate__():
* all the arguments of Universe()
* current frame in the trajectory (often that will just be frame 1)
* optional: index of the XTC/TRR reader

For __setstate__():
1) build the Universe again
2) optional: recreate the XTC/TRR reader index
3) go to the saved frame

Original comment by orbeckst on 27 Mar 2014 at 11:42

GoogleCodeExporter · 2015-04-04T00:43:37Z

I added explicit __setstate__() and __getstate__() methods to Universe and 
AtomGroup which raise a NotImplementedError. This should at least make this 
particular error message clearer (in 0.8.2-dev, commit 
4bb36e64e8d182226c9303777a166bdb56d08b34)

Original comment by orbeckst on 15 Apr 2014 at 8:25

GoogleCodeExporter · 2015-04-04T00:43:37Z

Maybe it would be worthwhile making pickling work along the lines of recreating 
a copy of the universe using the constructor information and information from 
the TrajectoryReader (which would need its own __getstate__/__setstate__) — 
see comments in this issue for more details.

I reopen as an enhancement and anyone interested can grab the ticket.

Original comment by orbeckst on 15 Apr 2014 at 8:29

Changed state: Accepted
Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter · 2015-04-04T00:43:37Z

There is a _dcd_c_ptr attribute inside the DCDReader object. It is a C object 
defined in dcd.c, if we wanna serialize the DCDReader object, we have to 
serialize this c pointer as well. But it seems like pickle cannot dumps a 
pyobject.

Original comment by [email protected] on 15 Apr 2014 at 6:54

GoogleCodeExporter · 2015-04-04T00:43:37Z

I don't think that we should serialize a Reader wholesale but instead provide a 
way to re-instantiate. In this way you only serialize state information such as

- filename
- number of atoms
- current frame
- ... all other parameters set in __init__()

and then essentially create a brand new Reader with this information.

Original comment by orbeckst on 15 Apr 2014 at 7:08

GoogleCodeExporter · 2015-04-04T00:43:37Z

If we wanna initialize a reader object, we must call the _read_dcd_header() 
inside the initialize function which is implemented by c in dcd.c and it will 
set a new attribute named __dcd_c_ptr which is a PyObject in the reader object. 

This attribute is used by function like MDAnalysis.analysis.align.rmsd(A, B) 
and most other functions. So I think if we wanna serialize DCDReader we will 
have to serialize this attribute or we can just serialize the coordinate and 
topology file for regenerating universe object later.

Original comment by [email protected] on 15 Apr 2014 at 7:54

GoogleCodeExporter · 2015-04-04T00:43:38Z

Come to think it will be difficult to pickle the exact state of a Universe, 
e.g. if I renamed atoms or changed coordinates. 

What is quite do-able (I think) is to re-create the Universe in the same way as 
it was created in the first place as you just need to know the constructor 
arguments (namely, topology files and coordinate files and any keyword args). 
This might or might not be sufficient for many applications but it certainly is 
not true serialization of the current state of an object.

Original comment by orbeckst on 16 Apr 2014 at 8:53

orbeckst · 2016-04-29T23:35:51Z

Is this going to be solved by #643 ?

richardjgowers · 2016-04-30T13:50:26Z

Sort of, Universe pickle will just call topology.pickle and
trajectory.pickle. 643 is most of the hard work though

On Sat, 30 Apr 2016 00:35 Oliver Beckstein, [email protected]
wrote:

Is this going to be solved by #643
#643 ?

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#173 (comment)

dotsdl · 2016-05-06T09:17:05Z

As currently written Topology can be pickled, but the 10M atom system I've been playing with takes much longer to deserialize from disk and the filesize of the pickle is nearly 7 times larger than the JSON. I think if we decide to stick with JSON as the preferred serialization for topologies, we can just define the JSONified topology as the output of __setstate__ (and likewise de-JSONifying for __getstate__).

Trajectory pickling is probably just going to have to be pickling/unpickling the filepaths. Will be sufficient for multiprocessing or distributed computing on a common filesystem.

kain88-de · 2016-05-06T09:37:13Z

@dotsdl did you enable compression for the pickle? The standard pickle protocol is super inefficient. If you change it the results are comparable to JSON. I had the same problems deciding how to store offsets in the xdrlib refactor.

dotsdl · 2016-05-06T09:39:53Z

@kain88-de I didn't; I'll give that a try and report back. I suppose if there's no benefit to JSONifying the topology before pickling we can skip that step entirely? Or is there still value in only serializing what we know we need to regenerate the Topology object completely?

richardjgowers · 2017-01-12T12:35:37Z

This is a duplicate of #878 as that revolves around unpickling a Universe

GoogleCodeExporter added Priority-Medium Type-Enhancement labels Apr 4, 2015

orbeckst mentioned this issue Apr 7, 2015

File descriptors that are opened and closed as needed for Universes #239

Open

orbeckst added enhancement and removed Type-Enhancement labels Apr 9, 2015

richardjgowers mentioned this issue Apr 14, 2015

Saving (pickling) Topology between sessions #253

Closed

orbeckst mentioned this issue May 26, 2015

Sim should take a real MDAnalysis.Universe instance, too datreant/MDSynthesis#25

Closed

mnmelo mentioned this issue Jun 1, 2015

Pickling AtomGroups #293

Closed

orbeckst added the persistence label Jun 2, 2015

orbeckst added the Component-Topology label Dec 15, 2015

richardjgowers closed this as completed Jan 12, 2017

orbeckst added the GSoC GSoC project label Aug 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

cannot pickle universe object #173

cannot pickle universe object #173

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

orbeckst commented Apr 29, 2016

richardjgowers commented Apr 30, 2016

dotsdl commented May 6, 2016

kain88-de commented May 6, 2016

dotsdl commented May 6, 2016

richardjgowers commented Jan 12, 2017

cannot pickle universe object #173

cannot pickle universe object #173

Comments

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

GoogleCodeExporter commented Apr 4, 2015

orbeckst commented Apr 29, 2016

richardjgowers commented Apr 30, 2016

dotsdl commented May 6, 2016

kain88-de commented May 6, 2016

dotsdl commented May 6, 2016

richardjgowers commented Jan 12, 2017