rewrite xdrlib #419

kain88-de · 2015-09-07T22:23:13Z

I've been thinking about the way to best port the old xdrlib code to cython. I noticed that the old code basically reads like no information about the xtc in the beginning and after that only checks the xyz data and time.

My current idea is to write a class that acts as a file handle with the follwing API methods.

def number_atoms(self)
    return self._natoms

def frame(self, SEEK)
    # magic
    return xyz, time

def next_frame(self):
    # magic
    return xyz, time

Inside this class is all of the error handling from the xdrlib done, if anything happens an exception is raised. The class should also work with context managers. This should best encapsulate xtc parsing.

This should also allow simple usages like this for quick and dirty xtc parsing

with xtc_parser(fname) as xtc:
    while not xtc.eof():
        xyz, time = xtc.next_frame()

Did I miss some important information that we need to get from the xtc as well? And could this be a good approach for the DCD parser as well? If so then we could simplify/unify quite a bit of the coordinate parsing.

richardjgowers · 2015-09-08T08:07:05Z

Is this to live inside a Reader or to replace the Reader?

kain88-de · 2015-09-08T08:16:01Z

I thought it could be a variable used inside the reader. But I honestly need to have a closer look into the whole reader API and how all the subclasses work. What currently exists is a little bit messy.

richardjgowers · 2015-09-08T08:33:20Z

I've been tidying the API from the top down, so various implementations (DCD) are a little messy at the bottom. The basic gist of it is a Reader holds a Timestep object. The Reader fills the Timestep object with data and returns it.

If you look in coordinates/base.py there's the ProtoReader class which basically needs the following two methods to be defined by subclasses:

_read_next_timestep() - fill Timestep with whatever is next in the trajectory
_read_frame() - seek randomly and grab this frame (not everyone can seek easily, so optional)

kain88-de · 2015-09-15T13:55:50Z

How does the conversion setup in base.Reader work? This is my current XTCReader class

kain88-de@9f10530

class XTCReader(base.Reader):
    format = 'XTC'
    units = {'time': 'ps', 'length': 'nm'}

    def __init__(self, filename, convert_units=True, **kwargs):
        super(XTCReader, self).__init__(filename, convert_units=convert_units,
                                        **kwargs)
        self.xtc = XTCFile(filename)

        xyz, box, step, time, prec = self.xtc.read()
        self.n_frames = len(self.xtc)
        self._n_atoms = self.xtc.n_atoms
        self.n_atoms = self.xtc.n_atoms
        self.xtc.seek(0)

        self.ts = self._Timestep(self._n_atoms, **self._ts_kwargs)
        self.ts.frame = 0  # 0-based frame number as starting frame
        self.ts._pos = xyz

        if self.convert_units:
            self.convert_pos_from_native(self.ts._pos)
            self.convert_pos_from_native(self.ts._unitcell[:3])

    def close(self):
        self.xtc.close()

    def _read_frame(self, i):
        self.xtc.seek(i)
        xyz, box, step, time, prec = self.xtc.read()
        self.ts._pos = xyz
        self.ts.frame = i
        return self.ts

    def _read_next_timestep(self, ts=None):
        if ts is None:
            ts = self.ts
        else:
            # TODO: cleanup _read_frame() to use a "free" Timestep
            raise NotImplementedError("XTCReader cannot assign to a timestep")
        xyz, box, step, time, prec = self.xtc.read()
        self.ts._pos = xyz
        self.ts.frame += 1
        return ts

Just by reading PrimitivePDBReader I expected that setting the units dict and convert_units to True would be enough.

kain88-de · 2015-09-15T13:57:08Z

While I'm at it. What is the difference between self.n_atoms and self._n_atoms? I got that I have to specify both.

richardjgowers · 2015-09-15T14:24:50Z

You don't need _n_atoms, I think a lot of the Readers have it as a property to make it "read only".

This lists the API
http://pythonhosted.org/MDAnalysis/documentation_pages/coordinates/init.html?%20api#trajectory-reader-class

With convert_units, we also have a flag system (that I haven't really used) which lets you set defaults for a session. So the logic is kind of keyword > flag > default

kain88-de · 2015-09-15T14:41:53Z

Ah I forgot to convert the coordinates again in the read_next_frame function.

Without _n_atoms one of the tests is failing, see test_analysis.py line 425

richardjgowers · 2015-09-15T15:04:57Z

You should probably refactor a little so frame reading only happens in one method, so __init__ and _read_frame just call _read_next_frame (and the file IO is done there)

richardjgowers · 2015-09-15T15:10:15Z

I'm not sure about the failing test, _n_atoms is "private" so analysis shouldn't be using it... If you can show me a travis fail I'll have a look

kain88-de · 2015-09-15T15:18:11Z

never mind. it failed in my own XTC class. I reference the _n_atoms without having it set.

dotsdl · 2015-09-20T18:31:15Z

This is a big undertaking, and the result so far is looking very good. What do you think of the conversation in #238, @kain88-de? Do you think chunked reads at the C-level could happen in a single IO call such that we see any performance gains when iterating through the frames of a long trajectory?

kain88-de · 2015-09-20T18:44:20Z

Sure we could do that at the cython-level. But supporting the slicing and everything correctly would take even more work. We could add this after merging the current PR though I imagine this is better to implement this in base.reader so that everything benefits from such chunked reads.

richardjgowers · 2015-09-20T18:44:21Z

I think a clever way to look at that would be to compare the read times of
different size frames, so does 1,000,000 atoms take twice as long as
500,000. The scaling of that should tell you what the overheads are like

On Sun, 20 Sep 2015 19:31 David Dotson [email protected] wrote:

This is a big undertaking, and the result so far is looking very good.
What do you think of the conversation in #238
#238, @kain88-de
https://github.com/kain88-de? Do you think chunked reads at the C-level
could happen in a single IO call such that we see any performance gains
when iterating through the frames of a long trajectory?

—
Reply to this email directly or view it on GitHub
#419 (comment)
.

rathann · 2015-10-14T12:09:51Z

Have you tried talking to xdrfile upstream (i.e. Gromacs)? In the process of creating a package of MDAnalysis for inclusion in Fedora, I did find the bundled xdrfile and started talking to Gromacs developers to see if they were open to including your changes: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-developers/2015-May/008300.html . However, your changes to the python parts were quite extensive and I wasn't able to provide a clean patch, so the discussion died out rather quickly. Maybe it's a good idea to revive that thread so that xdrfile can be brought up to date for the benefit of all of its users.

orbeckst · 2015-10-18T22:36:11Z

Hi @rathann ,

I saw that you also revived the thread at https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-developers/2015-October/008721.html . We haven't formally talked to the Gromacs folks, partly because we (= @mnmelo ) added some fairly specific functionality, i.e. returning an array of frame offsets when the length of the trajectory is determined. MDAnalysis stores these offsets as a cache file on disk to implement fast random access, something that XTCs do not natively do. It's a bit of a hack but works really well in practice when one analyzes trajectories repeatedly.

Furthermore, mdtraj contains it's own modified version (bug fixes etc) so we would probably also need to merge the mdtraj and the MDAnalysis version (which might not be a terrible idea because we should at least take their bug fixes on board but it's not clear to me if mdtraj necessarily wants/needs our hacks — @rmcgibbo ?).

FYI, the swig bindings to libxdr2 are certain to go because they make it hard for us to go to Py3. We will also always bundle the library inside MDAnalysis because it's essential for us – I don't know if this is something that you can deal with when packaging for Fedora. (I also agree with @rmcgibbo 's comment that we expect to be maintaining xtc-related code for a good long time to come).

Cheers,
Oliver

rmcgibbo · 2015-10-18T23:01:04Z

My sense is that getting this to happen w/ upstream probably requires someone who is motivated to work with upstream to add git://git.gromacs.org/libxdrfile.git to the gromacs gerrit server, and then begin making patch requests through the standard gerrit / redmine system.

In the meantime, y'all are more than welcome to to take any/all of the bug fixes that I added to libxrdfile. I think mdtraj will probably pass on adding the indexing capabilities that you've developed, but that could change if other developers express interest.

rathann · 2015-10-27T09:24:02Z

@orbeckst We'd prefer to unbundle xdrfile, package it separately and link all its consumers (gromacs, python(3)-MDAnalysis, which is under review and mdtraj, if/when it's packaged) to the system version. If you support it directly, that's great. If not and it's not too complicated to patch the code to use the system version, then I'll probably do it myself.

orbeckst · 2015-10-27T21:26:27Z

Just fyi, as far as MDAnalysis is concerned, we rely on our additions to libxdrfile and that's why we called it libxdrfile2, to make clear that there's additional functionality in it. If you were to patch MDAnalysis to use the vanilla libxdrfile you would cripple MDAnalysis' XTC/TRR reader.

The ideal situation would be an upstream libxdrfile with everything that we need but it seems unclear if this is ever going to happen.

richardjgowers · 2016-01-19T14:34:39Z

This is finished now right? @kain88-de

orbeckst added Format-Gromacs Component-Readers labels Sep 8, 2015

richardjgowers added the Python3 label Sep 10, 2015

richardjgowers assigned kain88-de Sep 10, 2015

kain88-de mentioned this issue Sep 19, 2015

Cython xdrlib #441

Merged

15 tasks

orbeckst mentioned this issue Sep 27, 2015

SWIG -> Cython rewrite of the XDRReader interface #211

Closed

orbeckst added this to the 1.0 milestone Sep 27, 2015

orbeckst mentioned this issue Jan 7, 2016

new 0.13.0 release #600

Closed

kain88-de closed this as completed Jan 19, 2016

orbeckst added the downstream label Feb 10, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

rewrite xdrlib #419

rewrite xdrlib #419

kain88-de commented Sep 7, 2015

richardjgowers commented Sep 8, 2015

kain88-de commented Sep 8, 2015

richardjgowers commented Sep 8, 2015

kain88-de commented Sep 15, 2015

kain88-de commented Sep 15, 2015

richardjgowers commented Sep 15, 2015

kain88-de commented Sep 15, 2015

richardjgowers commented Sep 15, 2015

richardjgowers commented Sep 15, 2015

kain88-de commented Sep 15, 2015

dotsdl commented Sep 20, 2015

kain88-de commented Sep 20, 2015

richardjgowers commented Sep 20, 2015

rathann commented Oct 14, 2015

orbeckst commented Oct 18, 2015

rmcgibbo commented Oct 18, 2015

rathann commented Oct 27, 2015

orbeckst commented Oct 27, 2015

richardjgowers commented Jan 19, 2016

rewrite xdrlib #419

rewrite xdrlib #419

Comments

kain88-de commented Sep 7, 2015

richardjgowers commented Sep 8, 2015

kain88-de commented Sep 8, 2015

richardjgowers commented Sep 8, 2015

kain88-de commented Sep 15, 2015

kain88-de commented Sep 15, 2015

richardjgowers commented Sep 15, 2015

kain88-de commented Sep 15, 2015

richardjgowers commented Sep 15, 2015

richardjgowers commented Sep 15, 2015

kain88-de commented Sep 15, 2015

dotsdl commented Sep 20, 2015

kain88-de commented Sep 20, 2015

richardjgowers commented Sep 20, 2015

rathann commented Oct 14, 2015

orbeckst commented Oct 18, 2015

rmcgibbo commented Oct 18, 2015

rathann commented Oct 27, 2015

orbeckst commented Oct 27, 2015

richardjgowers commented Jan 19, 2016