Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Chemfiles as a coordinate reader/writer #1862

Merged
merged 21 commits into from
Mar 10, 2020

Conversation

Luthaf
Copy link
Contributor

@Luthaf Luthaf commented Apr 10, 2018

Here is a first working version of the integration of chemfiles as a coordinate reader. See #1194 for some initial discussion on this. There are still a few TODO in the code, but I think this is ready for an initial round of review.

Currently I get three test failures: two related to aux, and one to container format. I don't really understand what is getting tested here, could someone give me a little pointer?

Using chemfiles might help with:

Unresolved questions:

Chemfiles supports reading/writing files with a non constant number of atoms. Currently I error when reading if the number of atoms changes, and I do nothing and write the new number of atoms when writing. Is this the desired behaviour?

PR Checklist

  • Tests?
    • Formats with no extension association on chemfiles side (LAMMPS Data)
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

dimensions = cell.lengths() + cell.angles()
ts.dimensions = np.array(dimensions, dtype=np.float32)

# TODO: should we make sure to keep the frame alive / copy the data?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

frame.positions()/frame.velocities() returns a view into C++ memory, that is only valid as long as the underlying frame does not change. So we might want to keep it alive to prevent invalid memory access. But at the same time, it is a numpy array of float64, so maybe .astype(np.float32) already performs a copy of the data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ts.dimensions = np.array(dimensions, dtype=np.float32)

# TODO: should we make sure to keep the frame alive / copy the data?
# TODO: docs mention FORTRAN order, chemfiles uses C order
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work well, is there an issue with the doc or is the conversion automatically done for me?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No like this you assign a new reference. If you would to ts.positions[:] then python would to an inplace update. You are just doing a reference update.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the inplace update should also do the type conversion automatically.


cell = frame.cell()
dimensions = cell.lengths() + cell.angles()
ts.dimensions = np.array(dimensions, dtype=np.float32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ts.dimensions[:] = cell.lengths() + cell.angles() a inplace update that does the type conversion for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, great! I changed the code to use it.

ts.dimensions = np.array(dimensions, dtype=np.float32)

# TODO: should we make sure to keep the frame alive / copy the data?
# TODO: docs mention FORTRAN order, chemfiles uses C order
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the inplace update should also do the type conversion automatically.

ts.positions = frame.positions().astype(np.float32)
if frame.has_velocities():
ts.has_velocities = True
ts.velocities = frame.velocities().astype(np.float32)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

inplace update see comment above.

@kain88-de
Copy link
Member

Thanks for this addon. You still need to install chemfiles on travis. Otherwise we can't see what the error is.

@Luthaf
Copy link
Contributor Author

Luthaf commented Apr 13, 2018

Done! Here are the errors about auxwhich I don't understand: https://travis-ci.org/MDAnalysis/mdanalysis/jobs/365014671#L1947

@richardjgowers
Copy link
Member

@Luthaf this looks really interesting!

So the aux thing is you can have a file which runs alongside the trajectory, maybe another time based measurement which isn't in the trajectory file, and you can then iterate wrt the frames in the aux file. It's not the most used feature, so I'll find some time to take a look to see what's going wrong, it's not immediately obvious to me.

def _open(self):
self._file = Trajectory(self.filename, 'r', self._format)
self._closed = False
self._step = 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding self._frame = -1 here allow the failing tests about auxiliaries to pass. I still need to figure out why, though.

It does not fix the test about containers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for investigating! I'll try to get some time to give the containers test a look

"""
Convert a Timestep to a chemfiles Frame
"""
if ts.n_atoms != self.n_atoms:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should I do in this case ? Chemfiles have no issue with varying number of atoms in a trajectory (except if the underlying format does not support it), so writing would work. But then it would not be possible to read it again with MDAnalysis.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The writer can be free to vary n_atoms, it's only on the input side that we can't have it vary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll document this behavior then.

@coveralls
Copy link

coveralls commented Jul 3, 2018

Coverage Status

Coverage decreased (-0.01%) to 89.869% when pulling 7833927 on Luthaf:chemfiles into 060e2c2 on MDAnalysis:develop.

@Luthaf Luthaf force-pushed the chemfiles branch 3 times, most recently from 84e2911 to dbb4877 Compare March 29, 2019 09:39
@codecov
Copy link

codecov bot commented Mar 29, 2019

Codecov Report

Merging #1862 into develop will decrease coverage by <.01%.
The diff coverage is 90.11%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1862      +/-   ##
===========================================
- Coverage    90.61%   90.61%   -0.01%     
===========================================
  Files          173      174       +1     
  Lines        23381    23554     +173     
  Branches      3038     3072      +34     
===========================================
+ Hits         21187    21343     +156     
- Misses        1577     1585       +8     
- Partials       617      626       +9
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/__init__.py 100% <100%> (ø) ⬆️
package/MDAnalysis/coordinates/chemfiles.py 90.05% <90.05%> (ø)
coordinates/__init__.py 100% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9a78a4...7f6395f. Read the comment docs.

Copy link
Member

@richardjgowers richardjgowers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hey @Luthaf

Sorry for letting this slip off my radar. I've been playing with this this morning. I think it's very cool, and is fast, I just need to go through and check that it can catch all corner cases....

@Luthaf
Copy link
Contributor Author

Luthaf commented May 15, 2019

I think I have added all that I wanted here, so this is ready for review !

@Luthaf Luthaf force-pushed the chemfiles branch 3 times, most recently from 64a7322 to 0e8485a Compare May 16, 2019 10:54
@Luthaf Luthaf changed the title [WIP] Add Chemfiles as a coordinate reader/writer Add Chemfiles as a coordinate reader/writer May 20, 2019
@Luthaf
Copy link
Contributor Author

Luthaf commented Jul 5, 2019

Travis failure looks weird, could someone restart the build?

@richardjgowers
Copy link
Member

Yeah that’s something we fixed recently, some dependency problem. Did a restart, otherwise you can rebase and push to this branch

@richardjgowers
Copy link
Member

Ok I've rebased this on top of the FORMAT_HINT stuff. Chemfiles is now optional, but can be used in MDA if it is installed.

ie this is what this branch does:

In [1]: import MDAnalysis as mda                                                                                                            

In [2]: mda._READER_HINTS                                                                                                                   
Out[2]: 
{'CHAIN': <function MDAnalysis.coordinates.chain.ChainReader._format_hint(thing)>,
 'CHEMFILES': <function MDAnalysis.coordinates.chemfiles.ChemfilesReader._format_hint(thing)>,
 'PARMED': <function MDAnalysis.coordinates.ParmEd.ParmEdReader._format_hint(thing)>,
 'MEMORY': <function MDAnalysis.coordinates.memory.MemoryReader._format_hint(thing)>,
 'MMTF': <function MDAnalysis.coordinates.MMTF.MMTFReader._format_hint(thing)>}

In [3]: import chemfiles                                                                                                                    

In [4]: c = chemfiles.Trajectory('../data/2r9r-1b.xyz', 'r')                                                                                

In [5]: mda.Universe('../data/2r9r-1b.xyz', c)                                                                                              
Out[5]: <Universe with 1284 atoms>

In [6]: u = mda.Universe('../data/2r9r-1b.xyz', c)                                                                                          

In [7]: u_ref = mda.Universe('../data/2r9r-1b.xyz')                                                                                         

In [8]: u_ref.atoms.positions == u.atoms.positions                                                                                          
Out[8]: 
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       ...,
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [9]:  u_other = mda.Universe('../data/2r9r-1b.xyz', '../data/2r9r-1b.xyz', format='CHEMFILES') 

Slightly annoying is that you need the two arguments there, I'll look into that. Probably needs some tests around passing in chemfiles.Trajectory objects.

@@ -79,12 +88,13 @@ def __init__(self, filename, chemfiles_format="", **kwargs):
"""
Parameters
----------
filename : str
trajectory filename
filename : chemfiles.Trajectory or str
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why passing a chemfiles.Trajectory should work. What would be the use case for this? Users creating chemfiles trajectories and then wanting to do analysis with MDA?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's so you can open a chemfiles thing as you want then pass this as the trajectory MDA will use. Passing a filename and specifying format='CHEMFILES' still works, but that's relying on the baked in way to open a trajectory

@richardjgowers
Copy link
Member

Ok I think this is ready, @kain88-de do you want to give it a last look over?

@kain88-de
Copy link
Member

Why do the tests fail?

@Luthaf
Copy link
Contributor Author

Luthaf commented Feb 23, 2020

This is strange... in https://travis-ci.com/MDAnalysis/mdanalysis/jobs/289697062, conda install chemfiles version 0.7.4 from conda-forge. I initially thought is was due to the builder using python 3.5, which is no longer supported by conda-forge, but the latest release with support for Python 3.5 is chemfiles 0.8 (https://anaconda.org/conda-forge/chemfiles-python/files).

Maybe conda is using "minimal compatible version" when resolving versions to install?

Anyway, the test should be skipped in that case, since chemfiles version is not compatible.

@richardjgowers richardjgowers merged commit e416aa7 into MDAnalysis:develop Mar 10, 2020
@Luthaf
Copy link
Contributor Author

Luthaf commented Mar 10, 2020

Yay! Thanks a lot @richardjgowers and @kain88-de for helping with this 😃

@Luthaf Luthaf deleted the chemfiles branch March 10, 2020 12:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants