Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rewrite xdrlib #419

Closed
kain88-de opened this issue Sep 7, 2015 · 19 comments
Closed

rewrite xdrlib #419

kain88-de opened this issue Sep 7, 2015 · 19 comments

Comments

@kain88-de
Copy link
Member

I've been thinking about the way to best port the old xdrlib code to cython. I noticed that the old code basically reads like no information about the xtc in the beginning and after that only checks the xyz data and time.

My current idea is to write a class that acts as a file handle with the follwing API methods.

def number_atoms(self)
    return self._natoms

def frame(self, SEEK)
    # magic
    return xyz, time

def next_frame(self):
    # magic
    return xyz, time

Inside this class is all of the error handling from the xdrlib done, if anything happens an exception is raised. The class should also work with context managers. This should best encapsulate xtc parsing.

This should also allow simple usages like this for quick and dirty xtc parsing

with xtc_parser(fname) as xtc:
    while not xtc.eof():
        xyz, time = xtc.next_frame()

Did I miss some important information that we need to get from the xtc as well? And could this be a good approach for the DCD parser as well? If so then we could simplify/unify quite a bit of the coordinate parsing.

@richardjgowers
Copy link
Member

Is this to live inside a Reader or to replace the Reader?

@kain88-de
Copy link
Member Author

I thought it could be a variable used inside the reader. But I honestly need to have a closer look into the whole reader API and how all the subclasses work. What currently exists is a little bit messy.

@richardjgowers
Copy link
Member

I've been tidying the API from the top down, so various implementations (DCD) are a little messy at the bottom. The basic gist of it is a Reader holds a Timestep object. The Reader fills the Timestep object with data and returns it.

If you look in coordinates/base.py there's the ProtoReader class which basically needs the following two methods to be defined by subclasses:

  • _read_next_timestep() - fill Timestep with whatever is next in the trajectory
  • _read_frame() - seek randomly and grab this frame (not everyone can seek easily, so optional)

@kain88-de
Copy link
Member Author

How does the conversion setup in base.Reader work? This is my current XTCReader class

kain88-de@9f10530

class XTCReader(base.Reader):
    format = 'XTC'
    units = {'time': 'ps', 'length': 'nm'}

    def __init__(self, filename, convert_units=True, **kwargs):
        super(XTCReader, self).__init__(filename, convert_units=convert_units,
                                        **kwargs)
        self.xtc = XTCFile(filename)

        xyz, box, step, time, prec = self.xtc.read()
        self.n_frames = len(self.xtc)
        self._n_atoms = self.xtc.n_atoms
        self.n_atoms = self.xtc.n_atoms
        self.xtc.seek(0)

        self.ts = self._Timestep(self._n_atoms, **self._ts_kwargs)
        self.ts.frame = 0  # 0-based frame number as starting frame
        self.ts._pos = xyz

        if self.convert_units:
            self.convert_pos_from_native(self.ts._pos)
            self.convert_pos_from_native(self.ts._unitcell[:3])

    def close(self):
        self.xtc.close()

    def _read_frame(self, i):
        self.xtc.seek(i)
        xyz, box, step, time, prec = self.xtc.read()
        self.ts._pos = xyz
        self.ts.frame = i
        return self.ts

    def _read_next_timestep(self, ts=None):
        if ts is None:
            ts = self.ts
        else:
            # TODO: cleanup _read_frame() to use a "free" Timestep
            raise NotImplementedError("XTCReader cannot assign to a timestep")
        xyz, box, step, time, prec = self.xtc.read()
        self.ts._pos = xyz
        self.ts.frame += 1
        return ts

Just by reading PrimitivePDBReader I expected that setting the units dict and convert_units to True would be enough.

@kain88-de
Copy link
Member Author

While I'm at it. What is the difference between self.n_atoms and self._n_atoms? I got that I have to specify both.

@richardjgowers
Copy link
Member

You don't need _n_atoms, I think a lot of the Readers have it as a property to make it "read only".

This lists the API
http://pythonhosted.org/MDAnalysis/documentation_pages/coordinates/init.html?%20api#trajectory-reader-class

With convert_units, we also have a flag system (that I haven't really used) which lets you set defaults for a session. So the logic is kind of keyword > flag > default

@kain88-de
Copy link
Member Author

Ah I forgot to convert the coordinates again in the read_next_frame function.

Without _n_atoms one of the tests is failing, see test_analysis.py line 425

@richardjgowers
Copy link
Member

You should probably refactor a little so frame reading only happens in one method, so __init__ and _read_frame just call _read_next_frame (and the file IO is done there)

@richardjgowers
Copy link
Member

I'm not sure about the failing test, _n_atoms is "private" so analysis shouldn't be using it... If you can show me a travis fail I'll have a look

@kain88-de
Copy link
Member Author

never mind. it failed in my own XTC class. I reference the _n_atoms without having it set.

@kain88-de kain88-de mentioned this issue Sep 19, 2015
15 tasks
@dotsdl
Copy link
Member

dotsdl commented Sep 20, 2015

This is a big undertaking, and the result so far is looking very good. What do you think of the conversation in #238, @kain88-de? Do you think chunked reads at the C-level could happen in a single IO call such that we see any performance gains when iterating through the frames of a long trajectory?

@kain88-de
Copy link
Member Author

Sure we could do that at the cython-level. But supporting the slicing and everything correctly would take even more work. We could add this after merging the current PR though I imagine this is better to implement this in base.reader so that everything benefits from such chunked reads.

@richardjgowers
Copy link
Member

I think a clever way to look at that would be to compare the read times of
different size frames, so does 1,000,000 atoms take twice as long as
500,000. The scaling of that should tell you what the overheads are like

On Sun, 20 Sep 2015 19:31 David Dotson [email protected] wrote:

This is a big undertaking, and the result so far is looking very good.
What do you think of the conversation in #238
#238, @kain88-de
https://github.com/kain88-de? Do you think chunked reads at the C-level
could happen in a single IO call such that we see any performance gains
when iterating through the frames of a long trajectory?


Reply to this email directly or view it on GitHub
#419 (comment)
.

@rathann
Copy link
Contributor

rathann commented Oct 14, 2015

Have you tried talking to xdrfile upstream (i.e. Gromacs)? In the process of creating a package of MDAnalysis for inclusion in Fedora, I did find the bundled xdrfile and started talking to Gromacs developers to see if they were open to including your changes: https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-developers/2015-May/008300.html . However, your changes to the python parts were quite extensive and I wasn't able to provide a clean patch, so the discussion died out rather quickly. Maybe it's a good idea to revive that thread so that xdrfile can be brought up to date for the benefit of all of its users.

@orbeckst
Copy link
Member

Hi @rathann ,

I saw that you also revived the thread at https://mailman-1.sys.kth.se/pipermail/gromacs.org_gmx-developers/2015-October/008721.html . We haven't formally talked to the Gromacs folks, partly because we (= @mnmelo ) added some fairly specific functionality, i.e. returning an array of frame offsets when the length of the trajectory is determined. MDAnalysis stores these offsets as a cache file on disk to implement fast random access, something that XTCs do not natively do. It's a bit of a hack but works really well in practice when one analyzes trajectories repeatedly.

Furthermore, mdtraj contains it's own modified version (bug fixes etc) so we would probably also need to merge the mdtraj and the MDAnalysis version (which might not be a terrible idea because we should at least take their bug fixes on board but it's not clear to me if mdtraj necessarily wants/needs our hacks — @rmcgibbo ?).

FYI, the swig bindings to libxdr2 are certain to go because they make it hard for us to go to Py3. We will also always bundle the library inside MDAnalysis because it's essential for us – I don't know if this is something that you can deal with when packaging for Fedora. (I also agree with @rmcgibbo 's comment that we expect to be maintaining xtc-related code for a good long time to come).

Cheers,
Oliver

@rmcgibbo
Copy link
Contributor

My sense is that getting this to happen w/ upstream probably requires someone who is motivated to work with upstream to add git://git.gromacs.org/libxdrfile.git to the gromacs gerrit server, and then begin making patch requests through the standard gerrit / redmine system.

In the meantime, y'all are more than welcome to to take any/all of the bug fixes that I added to libxrdfile. I think mdtraj will probably pass on adding the indexing capabilities that you've developed, but that could change if other developers express interest.

@rathann
Copy link
Contributor

rathann commented Oct 27, 2015

@orbeckst We'd prefer to unbundle xdrfile, package it separately and link all its consumers (gromacs, python(3)-MDAnalysis, which is under review and mdtraj, if/when it's packaged) to the system version. If you support it directly, that's great. If not and it's not too complicated to patch the code to use the system version, then I'll probably do it myself.

@orbeckst
Copy link
Member

Just fyi, as far as MDAnalysis is concerned, we rely on our additions to libxdrfile and that's why we called it libxdrfile2, to make clear that there's additional functionality in it. If you were to patch MDAnalysis to use the vanilla libxdrfile you would cripple MDAnalysis' XTC/TRR reader.

The ideal situation would be an upstream libxdrfile with everything that we need but it seems unclear if this is ever going to happen.

@richardjgowers
Copy link
Member

This is finished now right? @kain88-de

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants