Add Chemfiles as a coordinate reader/writer #1862

Luthaf · 2018-04-10T15:58:30Z

Here is a first working version of the integration of chemfiles as a coordinate reader. See #1194 for some initial discussion on this. There are still a few TODOs in the code, but I think this is ready for an initial round of review.

~~Currently I get three test failures: two related to aux, and one to container format. I don't really understand what is getting tested here, could someone give me a little pointer?~~

Using chemfiles might help with:

- Implementation of time-dependent topology #864: time-dependent topology is supported in chemfiles, and could be brought over to MDAnalysis
- Support for Gromacs's TNG format #865: the TNG format is implemented
- Problem with reading/writing CONECT records in PDB files with many atoms #988: chemfiles handles this by emitting a warning if the frame has too many atoms for the PDB format, and not writing the corresponding bonds
- Reading a multi-frame PDB file created by VMD #1133: VMD-style PDB files are supported
- strict PDB parsing #1966: I am pretty sure this is working, I'll need to double check it
- Allow PDBWriter (and similar) to write record_types #1753: supported in chemfiles through the is_hetatm atomic property
  - TODO: translate recordtype to chemfiles property
- Write compressed ascii outputs #2216: already supported for all text formats, for .gz, .xz and .bz2

Unresolved questions:

Chemfiles supports reading/writing files with a non constant number of atoms. Currently I error when reading if the number of atoms changes, and I do nothing and write the new number of atoms when writing. Is this the desired behaviour?

PR Checklist

Luthaf · 2018-04-10T16:00:31Z

package/MDAnalysis/coordinates/CHEMFILES.py

+        dimensions = cell.lengths() + cell.angles()
+        ts.dimensions = np.array(dimensions, dtype=np.float32)
+
+        # TODO: should we make sure to keep the frame alive / copy the data?


frame.positions()/frame.velocities() return a view into C++ memory that is only valid as long as the underlying frame does not change. So we might want to keep the frame alive to prevent invalid memory access. But at the same time, the view is a numpy array of float64, so maybe .astype(np.float32) already performs a copy of the data.

astype does a copy. "Copy of the array, cast to a specified type"
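A quick way to convince yourself of this, as a sketch that uses a plain NumPy array as a stand-in for the chemfiles-owned memory (the names `buffer`, `view` and `copied` are made up for the illustration):

```python
import numpy as np

# Stand-in for memory owned by the C++ frame (float64 in this sketch).
buffer = np.zeros((4, 3), dtype=np.float64)
view = buffer[:]                      # a view: shares memory with `buffer`

copied = view.astype(np.float32)      # astype allocates a new array by default

# Mutating the original buffer afterwards does not touch the converted copy.
buffer[0, 0] = 42.0
shares_memory = np.shares_memory(copied, buffer)
```

After this runs, `view` sees the new value while `copied` does not, so the converted array should stay valid even if the source frame changes.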

Luthaf · 2018-04-10T16:01:04Z

package/MDAnalysis/coordinates/CHEMFILES.py

+        ts.dimensions = np.array(dimensions, dtype=np.float32)
+
+        # TODO: should we make sure to keep the frame alive / copy the data?
+        # TODO: docs mention FORTRAN order, chemfiles uses C order


This seems to work well, is there an issue with the doc or is the conversion automatically done for me?

No, like this you assign a new reference. If you used ts.positions[:] then Python would do an in-place update. You are just doing a reference update.

the inplace update should also do the type conversion automatically.
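The difference between the two kinds of assignment can be sketched with plain NumPy. `Holder` below is a hypothetical stand-in with an ordinary attribute, not MDAnalysis' Timestep, whose own assignment handling may differ; the Fortran-ordered float32 buffer is likewise an assumption made only to show that the destination keeps its dtype and layout.

```python
import numpy as np

class Holder:
    """Hypothetical container with a preallocated coordinate buffer."""
    def __init__(self, n_atoms):
        # float32 storage in Fortran order, chosen for illustration
        self.positions = np.zeros((n_atoms, 3), dtype=np.float32, order="F")

# Incoming data: float64, C order
source = np.arange(6, dtype=np.float64).reshape(2, 3)

rebound = Holder(2)
original_buffer = rebound.positions
rebound.positions = source        # rebinding: the attribute now points at `source`

inplace = Holder(2)
kept_buffer = inplace.positions
inplace.positions[:] = source     # in-place: values are cast and copied into the buffer
```

In the rebinding case the preallocated buffer is simply dropped and the attribute becomes a float64 C-order array; in the in-place case the buffer object is unchanged and NumPy casts the values to its dtype.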

kain88-de · 2018-04-10T20:54:17Z

package/MDAnalysis/coordinates/CHEMFILES.py

+
+        cell = frame.cell()
+        dimensions = cell.lengths() + cell.angles()
+        ts.dimensions = np.array(dimensions, dtype=np.float32)


ts.dimensions[:] = cell.lengths() + cell.angles() is an in-place update that does the type conversion for you.

Oh, great! I changed the code to use it.
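A small sketch of that pattern, assuming the lengths and angles come back as plain tuples (so that `+` concatenates them); the numbers are invented for the example:

```python
import numpy as np

# Hypothetical cell values: three lengths and three angles as tuples.
lengths = (10.0, 20.0, 30.0)
angles = (90.0, 90.0, 120.0)

dimensions = np.zeros(6, dtype=np.float32)   # preallocated six-value buffer
dimensions[:] = lengths + angles             # concatenate, then cast into float32

# Caveat: with NumPy arrays instead of tuples, `+` adds element-wise.
elementwise = np.array(lengths) + np.array(angles)
```

If the values were NumPy arrays, the sum would have three elements rather than six, so something like `np.concatenate` would be needed instead.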

kain88-de · 2018-04-10T20:54:37Z

package/MDAnalysis/coordinates/CHEMFILES.py

+        ts.dimensions = np.array(dimensions, dtype=np.float32)
+
+        # TODO: should we make sure to keep the frame alive / copy the data?
+        # TODO: docs mention FORTRAN order, chemfiles uses C order


the inplace update should also do the type conversion automatically.

kain88-de · 2018-04-10T20:54:56Z

package/MDAnalysis/coordinates/CHEMFILES.py

+        ts.positions = frame.positions().astype(np.float32)
+        if frame.has_velocities():
+            ts.has_velocities = True
+            ts.velocities = frame.velocities().astype(np.float32)


In-place update, see comment above.

kain88-de · 2018-04-10T20:58:07Z

Thanks for this addon. You still need to install chemfiles on travis. Otherwise we can't see what the error is.

Luthaf · 2018-04-13T19:21:54Z

Done! Here are the errors about aux which I don't understand: https://travis-ci.org/MDAnalysis/mdanalysis/jobs/365014671#L1947

richardjgowers · 2018-04-16T14:34:16Z

@Luthaf this looks really interesting!

So the aux thing is you can have a file which runs alongside the trajectory, maybe another time based measurement which isn't in the trajectory file, and you can then iterate wrt the frames in the aux file. It's not the most used feature, so I'll find some time to take a look to see what's going wrong, it's not immediately obvious to me.
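To make the idea concrete, here is a toy sketch of pairing a separate time series with trajectory frames. It is not how MDAnalysis implements auxiliaries; the data and the nearest-in-time rule are assumptions chosen for the illustration.

```python
import numpy as np

# Hypothetical data: trajectory frames every 2 ps, auxiliary samples every 1 ps.
frame_times = np.array([0.0, 2.0, 4.0])
aux_times = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
aux_values = np.array([5.0, 6.0, 7.0, 8.0, 9.0])

def closest_aux_value(frame_time):
    """Return the auxiliary value whose time is nearest to frame_time."""
    index = np.argmin(np.abs(aux_times - frame_time))
    return aux_values[index]

# One auxiliary value per trajectory frame.
per_frame = [float(closest_aux_value(t)) for t in frame_times]
```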

jbarnoud · 2018-06-17T15:43:19Z

package/MDAnalysis/coordinates/CHEMFILES.py

+    def _open(self):
+        self._file = Trajectory(self.filename, 'r', self._format)
+        self._closed = False
+        self._step = 0


Adding self._frame = -1 here allows the failing tests about auxiliaries to pass. I still need to figure out why, though.

It does not fix the test about containers.

Thanks for investigating! I'll try to get some time to give the containers test a look

Luthaf · 2018-07-03T13:07:10Z

package/MDAnalysis/coordinates/CHEMFILES.py

+        """
+        Convert a Timestep to a chemfiles Frame
+        """
+        if ts.n_atoms != self.n_atoms:


What should I do in this case? Chemfiles has no issue with a varying number of atoms in a trajectory (unless the underlying format does not support it), so writing would work. But then it would not be possible to read the file again with MDAnalysis.

The writer can be free to vary n_atoms, it's only on the input side that we can't have it vary

Ok, I'll document this behavior then.
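The asymmetry agreed on here could be sketched as follows. The function and exception names are hypothetical and are not taken from the PR; the sketch only shows a strict check on the reading side and none on the writing side.

```python
class AtomCountError(ValueError):
    """Hypothetical error type used only in this sketch."""

def check_read_frame(expected_n_atoms, frame_n_atoms):
    """Reader side: refuse a frame whose atom count differs from the expected one."""
    if frame_n_atoms != expected_n_atoms:
        raise AtomCountError(
            f"frame has {frame_n_atoms} atoms, expected {expected_n_atoms}"
        )
    return frame_n_atoms

def prepare_write_frame(frame_n_atoms):
    """Writer side: accept whatever atom count the current frame has."""
    return frame_n_atoms
```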

coveralls · 2018-07-03T13:34:42Z

Coverage decreased (-0.01%) to 89.869% when pulling 7833927 on Luthaf:chemfiles into 060e2c2 on MDAnalysis:develop.

codecov · 2019-03-29T09:39:59Z

Codecov Report

Merging #1862 into develop will decrease coverage by <.01%.
The diff coverage is 90.11%.

@@             Coverage Diff             @@
##           develop    #1862      +/-   ##
===========================================
- Coverage    90.61%   90.61%   -0.01%     
===========================================
  Files          173      174       +1     
  Lines        23381    23554     +173     
  Branches      3038     3072      +34     
===========================================
+ Hits         21187    21343     +156     
- Misses        1577     1585       +8     
- Partials       617      626       +9

Impacted Files	Coverage Δ
package/MDAnalysis/coordinates/__init__.py	`100% <100%> (ø)`	⬆️
package/MDAnalysis/coordinates/chemfiles.py	`90.05% <90.05%> (ø)`
coordinates/__init__.py	`100% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9a78a4...7f6395f. Read the comment docs.

richardjgowers

hey @Luthaf

Sorry for letting this slip off my radar. I've been playing with this this morning. I think it's very cool, and is fast, I just need to go through and check that it can catch all corner cases....

Luthaf · 2019-05-15T20:20:04Z

I think I have added all that I wanted here, so this is ready for review !

Luthaf · 2019-07-05T10:07:04Z

Travis failure looks weird, could someone restart the build?

richardjgowers · 2019-07-05T10:14:11Z

Yeah that’s something we fixed recently, some dependency problem. Did a restart, otherwise you can rebase and push to this branch

richardjgowers · 2020-02-18T10:37:11Z

Ok I've rebased this on top of the FORMAT_HINT stuff. Chemfiles is now optional, but can be used in MDA if it is installed.

ie this is what this branch does:

In [1]: import MDAnalysis as mda                                                                                                            

In [2]: mda._READER_HINTS                                                                                                                   
Out[2]: 
{'CHAIN': <function MDAnalysis.coordinates.chain.ChainReader._format_hint(thing)>,
 'CHEMFILES': <function MDAnalysis.coordinates.chemfiles.ChemfilesReader._format_hint(thing)>,
 'PARMED': <function MDAnalysis.coordinates.ParmEd.ParmEdReader._format_hint(thing)>,
 'MEMORY': <function MDAnalysis.coordinates.memory.MemoryReader._format_hint(thing)>,
 'MMTF': <function MDAnalysis.coordinates.MMTF.MMTFReader._format_hint(thing)>}

In [3]: import chemfiles                                                                                                                    

In [4]: c = chemfiles.Trajectory('../data/2r9r-1b.xyz', 'r')                                                                                

In [5]: mda.Universe('../data/2r9r-1b.xyz', c)                                                                                              
Out[5]: <Universe with 1284 atoms>

In [6]: u = mda.Universe('../data/2r9r-1b.xyz', c)                                                                                          

In [7]: u_ref = mda.Universe('../data/2r9r-1b.xyz')                                                                                         

In [8]: u_ref.atoms.positions == u.atoms.positions                                                                                          
Out[8]: 
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       ...,
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])

In [9]:  u_other = mda.Universe('../data/2r9r-1b.xyz', '../data/2r9r-1b.xyz', format='CHEMFILES')

Slightly annoying is that you need the two arguments there, I'll look into that. Probably needs some tests around passing in chemfiles.Trajectory objects.

Luthaf · 2020-02-18T11:35:25Z

package/MDAnalysis/coordinates/chemfiles.py

@@ -79,12 +88,13 @@ def __init__(self, filename, chemfiles_format="", **kwargs):
        """
        Parameters
        ----------
-        filename : str
-            trajectory filename
+        filename : chemfiles.Trajectory or str


I am not sure I understand why passing a chemfiles.Trajectory should work. What would be the use case for this? Users creating chemfiles trajectories and then wanting to do analysis with MDA?

Yeah, it's so you can open a chemfiles thing as you want then pass this as the trajectory MDA will use. Passing a filename and specifying format='CHEMFILES' still works, but that's relying on the baked in way to open a trajectory
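A rough sketch of how a hint-based lookup like the `_READER_HINTS` mapping shown above could recognise an already-open trajectory object. `FakeTrajectory`, `SketchChemfilesReader` and `guess_format` are hypothetical stand-ins so that the example runs without chemfiles or MDAnalysis installed; the real dispatch code may look different.

```python
class FakeTrajectory:
    """Stand-in for chemfiles.Trajectory so the sketch has no dependencies."""

class SketchChemfilesReader:
    """Hypothetical reader; only the hint logic is shown."""

    @staticmethod
    def _format_hint(thing):
        # Claim the input when it is already an open trajectory object.
        return isinstance(thing, FakeTrajectory)

# Format name -> hint function, mirroring the shape of the mapping printed above.
READER_HINTS = {"CHEMFILES": SketchChemfilesReader._format_hint}

def guess_format(thing):
    """Return the first format whose hint accepts `thing`, else None."""
    for name, hint in READER_HINTS.items():
        if hint(thing):
            return name
    return None
```

With this shape, a filename string is not claimed by the hint, which matches the remark that a filename still needs format='CHEMFILES' to be routed to this reader.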

richardjgowers · 2020-02-18T19:39:04Z

Ok I think this is ready, @kain88-de do you want to give it a last look over?

kain88-de · 2020-02-22T10:43:04Z

Why do the tests fail?

Luthaf · 2020-02-23T14:32:45Z

This is strange... in https://travis-ci.com/MDAnalysis/mdanalysis/jobs/289697062, conda installs chemfiles version 0.7.4 from conda-forge. I initially thought it was due to the builder using Python 3.5, which is no longer supported by conda-forge, but the latest release with support for Python 3.5 is chemfiles 0.8 (https://anaconda.org/conda-forge/chemfiles-python/files).

Maybe conda is using "minimal compatible version" when resolving versions to install?

Anyway, the test should be skipped in that case, since chemfiles version is not compatible.
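A minimal sketch of the kind of version gate this implies. The minimum version below is a placeholder, since the thread does not state the real requirement, and the helper names are hypothetical.

```python
def parse_version(text):
    """Turn '0.7.4' into (0, 7, 4); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))

def version_is_supported(installed, minimum):
    """True when the installed version is at least the required minimum."""
    return parse_version(installed) >= parse_version(minimum)

# Placeholder minimum for illustration only (an assumption, not from the PR).
ASSUMED_MINIMUM = "0.8"

# With the 0.7.4 install seen on the CI job, the tests would be skipped.
should_skip = not version_is_supported("0.7.4", ASSUMED_MINIMUM)
```

A boolean like `should_skip` could then be used as the condition of a `pytest.mark.skipif` marker on the chemfiles tests.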

Luthaf · 2020-03-10T12:24:41Z

Yay! Thanks a lot @richardjgowers and @kain88-de for helping with this 😃

Luthaf force-pushed the chemfiles branch from 24b7563 to 4f17349 Compare April 11, 2018 09:07

orbeckst added Component-Readers Work in progress labels Jun 18, 2018

Luthaf force-pushed the chemfiles branch from 4f17349 to 7833927 Compare July 3, 2018 13:05

orbeckst mentioned this pull request Mar 27, 2019

[WIP] Cythonizes GROParser & GROReader very minimally #2227

Closed

4 tasks

Luthaf force-pushed the chemfiles branch 3 times, most recently from 84e2911 to dbb4877 Compare March 29, 2019 09:39

richardjgowers reviewed May 8, 2019

Luthaf force-pushed the chemfiles branch from dbb4877 to 31bf9dc Compare May 8, 2019 16:56

Luthaf force-pushed the chemfiles branch from 31bf9dc to 1baa8f5 Compare May 15, 2019 19:08

Luthaf force-pushed the chemfiles branch 3 times, most recently from 64a7322 to 0e8485a Compare May 16, 2019 10:54

Luthaf changed the title ~~[WIP] Add Chemfiles as a coordinate reader/writer~~ Add Chemfiles as a coordinate reader/writer May 20, 2019

Luthaf force-pushed the chemfiles branch from 0e8485a to 806fa04 Compare May 20, 2019 15:27

Luthaf force-pushed the chemfiles branch from 806fa04 to f00ee28 Compare July 5, 2019 10:20

Luthaf and others added 12 commits February 18, 2020 09:54

Allow passing the format name to chemfiles

ad5d199

Add documentation to the code

b450ee3

Convert MDAnalysis topology to chemfiles when writing

634e326

added standard XYZ tests via Chemfiles

ca33ab2

some failures for .Writer and .dimensions (with no dimensions for xyz)

minor tweaks to get tests passing

5eadec7

Also translate segindex attribute to property

b4410db

Move version checking inside a function

93ddfd7

Documentation improvements

6f195ae

Add a test when opening a file without extension

c3d5321

Add a comment about unit conversion

eaf172c

Add a test checking that velocities and unit cell are written

359ea4e

made chemfiles optional dependency using FORMAT_HINT

b1aa7d2

richardjgowers force-pushed the chemfiles branch from 4369421 to b1aa7d2 Compare February 18, 2020 10:33

richardjgowers added 2 commits February 18, 2020 19:32

properly skip chemfiles tests if bad version too

55e97e7

changelog and authors

8fcdc23

richardjgowers approved these changes Feb 18, 2020

kain88-de approved these changes Feb 18, 2020

fixed logic in chemfiles skipping

0338863

richardjgowers and others added 3 commits February 26, 2020 10:25

Merge branch 'develop' into chemfiles

85b7ac1

TST: skip test if wrong chemfiles version is installed.

570e191

Merge branch 'develop' into chemfiles

7f6395f

richardjgowers merged commit e416aa7 into MDAnalysis:develop Mar 10, 2020

Luthaf deleted the chemfiles branch March 10, 2020 12:24

Luthaf mentioned this pull request Jun 10, 2020

Future of chemfiles reader in MDA #2731

Open
