Add Chemfiles as a coordinate reader/writer #1862
dimensions = cell.lengths() + cell.angles()
ts.dimensions = np.array(dimensions, dtype=np.float32)

# TODO: should we make sure to keep the frame alive / copy the data?
frame.positions()/frame.velocities() return a view into C++ memory that is only valid as long as the underlying frame does not change. So we might want to keep the frame alive to prevent invalid memory access. But at the same time, the view is a numpy array of float64, so maybe .astype(np.float32) already performs a copy of the data.
astype does a copy: "Copy of the array, cast to a specified type".
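To make the point about astype concrete, here is a minimal sketch using a plain NumPy array as a stand-in for the float64 view that frame.positions() would return (no chemfiles objects are involved, so it says nothing about frame lifetime itself). It only shows that the converted array does not share memory with its source.

```python
import numpy as np

# Stand-in for the float64 array that frame.positions() would return.
source = np.zeros((4, 3), dtype=np.float64)

# astype copies by default (copy=True), so the result owns its memory.
converted = source.astype(np.float32)

# Later changes to the source buffer are not visible through the copy.
source[0, 0] = 42.0

shares_memory = np.shares_memory(source, converted)
first_value = float(converted[0, 0])
print(shares_memory, first_value)
```

Since the converted array is independent of the source buffer, keeping the original frame alive should not be needed for the converted data to stay valid.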
ts.dimensions = np.array(dimensions, dtype=np.float32)

# TODO: should we make sure to keep the frame alive / copy the data?
# TODO: docs mention FORTRAN order, chemfiles uses C order
This seems to work well, is there an issue with the doc or is the conversion automatically done for me?
No, written like this you assign a new reference. If you used ts.positions[:], then Python would do an in-place update. Here you are just doing a reference update.
The in-place update should also do the type conversion automatically.
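The difference between rebinding a name and updating in place can be sketched with plain NumPy arrays. This is an illustration only: `positions` here is an ordinary variable standing in for a preallocated float32 buffer, not a Timestep attribute, so it does not show what the Timestep property setter does.

```python
import numpy as np

# Stand-in for a preallocated float32 coordinate buffer.
positions = np.zeros((3, 3), dtype=np.float32)
buffer_before = positions

# Stand-in for float64 data coming from another library.
incoming = np.arange(9, dtype=np.float64).reshape(3, 3)

# In-place update: values are cast to float32 and written into the
# existing buffer, whatever the memory layout of the right-hand side
# (a Fortran-ordered array is used here on purpose).
positions[:] = np.asfortranarray(incoming)

same_buffer = positions is buffer_before
kept_dtype = positions.dtype == np.float32

# Rebinding: the name now points at a different, float64 array and the
# original buffer is left untouched.
positions = incoming
rebound_dtype = positions.dtype
```

Slice assignment copies element by element using indices, which is why neither the dtype nor the C/Fortran layout of the right-hand side changes the target buffer's own dtype or layout.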
cell = frame.cell()
dimensions = cell.lengths() + cell.angles()
ts.dimensions = np.array(dimensions, dtype=np.float32)
ts.dimensions[:] = cell.lengths() + cell.angles() is an in-place update that does the type conversion for you.
Oh, great! I changed the code to use it.
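A small sketch of the suggested one-liner, under the assumption that the lengths and angles come back as plain tuples, as the `+` in the diff implies. The tuples and the `dimensions` array below are stand-ins, not chemfiles or MDAnalysis objects.

```python
import numpy as np

# Assumed return values: plain tuples of three floats each.
lengths = (10.0, 20.0, 30.0)
angles = (90.0, 90.0, 120.0)

# Stand-in for the six-element float32 dimensions buffer.
dimensions = np.zeros(6, dtype=np.float32)

# Tuple "+" concatenates, giving six values that are cast on assignment.
dimensions[:] = lengths + angles

# If both values were NumPy arrays instead, "+" would add elementwise and
# produce three values, which cannot be assigned into six slots.
try:
    dimensions[:] = np.array(lengths) + np.array(angles)
    elementwise_failed = False
except ValueError:
    elementwise_failed = True
```

The second half is why the one-liner depends on the return type: it is only a concatenation when both operands are sequences such as tuples or lists.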
ts.positions = frame.positions().astype(np.float32)
if frame.has_velocities():
    ts.has_velocities = True
    ts.velocities = frame.velocities().astype(np.float32)
Use an in-place update here too; see the comment above.
Thanks for this addon. You still need to install chemfiles on Travis. Otherwise we can't see what the error is.
Done! Here are the errors about |
@Luthaf this looks really interesting! So the |
def _open(self):
    self._file = Trajectory(self.filename, 'r', self._format)
    self._closed = False
    self._step = 0
Adding self._frame = -1 here allows the failing tests about auxiliaries to pass. I still need to figure out why, though.
It does not fix the test about containers.
Thanks for investigating! I'll try to get some time to give the containers test a look
""" | ||
Convert a Timestep to a chemfiles Frame | ||
""" | ||
if ts.n_atoms != self.n_atoms: |
What should I do in this case? Chemfiles has no issue with a varying number of atoms in a trajectory (unless the underlying format does not support it), so writing would work. But then it would not be possible to read the file again with MDAnalysis.
The writer is free to vary n_atoms; it's only on the input side that we can't have it vary.
Ok, I'll document this behavior then.
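The asymmetry agreed on in this thread (strict on input, permissive on output) can be sketched as follows. Both classes and their method names are hypothetical illustrations, not the actual ChemfilesReader or ChemfilesWriter code, and frames are represented as simple lists of positions.

```python
class FixedSizeReader:
    """Hypothetical reader: rejects frames whose atom count changes."""

    def __init__(self, n_atoms):
        self.n_atoms = n_atoms

    def check_frame(self, frame_positions):
        # Reading side: the number of atoms must stay constant.
        n_atoms = len(frame_positions)
        if n_atoms != self.n_atoms:
            raise IOError(
                "number of atoms changed from {} to {}".format(
                    self.n_atoms, n_atoms
                )
            )
        return n_atoms


class FlexibleWriter:
    """Hypothetical writer: accepts any atom count per frame."""

    def __init__(self):
        self.n_atoms_written = []

    def write(self, frame_positions):
        # Writing side: just record (or write out) the new atom count.
        self.n_atoms_written.append(len(frame_positions))


reader = FixedSizeReader(n_atoms=2)
reader.check_frame([(0, 0, 0), (1, 1, 1)])

writer = FlexibleWriter()
writer.write([(0, 0, 0), (1, 1, 1)])
writer.write([(0, 0, 0)])
```

A file produced this way with varying atom counts would then fail the reader-side check, which is the behaviour the thread proposes to document.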
84e2911
to
dbb4877
Compare
Codecov Report
@@            Coverage Diff             @@
##           develop    #1862     +/-   ##
===========================================
- Coverage    90.61%   90.61%   -0.01%
===========================================
  Files          173      174       +1
  Lines        23381    23554     +173
  Branches      3038     3072      +34
===========================================
+ Hits         21187    21343     +156
- Misses        1577     1585       +8
- Partials       617      626       +9
hey @Luthaf
Sorry for letting this slip off my radar. I've been playing with it this morning. I think it's very cool, and it is fast; I just need to go through and check that it catches all the corner cases.
I think I have added all that I wanted here, so this is ready for review ! |
64a7322
to
0e8485a
Compare
Travis failure looks weird, could someone restart the build? |
Yeah that’s something we fixed recently, some dependency problem. Did a restart, otherwise you can rebase and push to this branch |
some failures for .Writer and .dimensions (with no dimensions for xyz)
4369421
to
b1aa7d2
Compare
Ok I've rebased this on top of the FORMAT_HINT stuff. Chemfiles is now optional, but can be used in MDA if it is installed, i.e. this is what this branch does:
In [1]: import MDAnalysis as mda
In [2]: mda._READER_HINTS
Out[2]:
{'CHAIN': <function MDAnalysis.coordinates.chain.ChainReader._format_hint(thing)>,
'CHEMFILES': <function MDAnalysis.coordinates.chemfiles.ChemfilesReader._format_hint(thing)>,
'PARMED': <function MDAnalysis.coordinates.ParmEd.ParmEdReader._format_hint(thing)>,
'MEMORY': <function MDAnalysis.coordinates.memory.MemoryReader._format_hint(thing)>,
'MMTF': <function MDAnalysis.coordinates.MMTF.MMTFReader._format_hint(thing)>}
In [3]: import chemfiles
In [4]: c = chemfiles.Trajectory('../data/2r9r-1b.xyz', 'r')
In [5]: mda.Universe('../data/2r9r-1b.xyz', c)
Out[5]: <Universe with 1284 atoms>
In [6]: u = mda.Universe('../data/2r9r-1b.xyz', c)
In [7]: u_ref = mda.Universe('../data/2r9r-1b.xyz')
In [8]: u_ref.atoms.positions == u.atoms.positions
Out[8]:
array([[ True, True, True],
[ True, True, True],
[ True, True, True],
...,
[ True, True, True],
[ True, True, True],
[ True, True, True]])
In [9]: u_other = mda.Universe('../data/2r9r-1b.xyz', '../data/2r9r-1b.xyz', format='CHEMFILES')
Slightly annoying is that you need the two arguments there, I'll look into that. Probably needs some tests around passing in |
@@ -79,12 +88,13 @@ def __init__(self, filename, chemfiles_format="", **kwargs):
"""
Parameters
----------
filename : str
trajectory filename
filename : chemfiles.Trajectory or str
I am not sure I understand why passing a chemfiles.Trajectory should work. What would be the use case for this? Users creating chemfiles trajectories and then wanting to do analysis with MDA?
Yeah, it's so you can open a chemfiles thing as you want then pass this as the trajectory MDA will use. Passing a filename and specifying format='CHEMFILES' still works, but that's relying on the baked-in way to open a trajectory.
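How a format hint lets an already-open object select its reader can be sketched as below. This is an assumption-laden illustration: `FakeTrajectory`, `HintedReader`, `READER_HINTS` and `guess_format` are hypothetical names, with `FakeTrajectory` standing in for chemfiles.Trajectory so the sketch runs without chemfiles. Only the `_format_hint(thing)` shape is taken from the session output shown earlier.

```python
class FakeTrajectory:
    """Stand-in for chemfiles.Trajectory (hypothetical, for illustration)."""


class HintedReader:
    """Hypothetical reader that recognises an already-open trajectory."""

    format = "CHEMFILES"

    @staticmethod
    def _format_hint(thing):
        # True when the object itself identifies the reader to use.
        return isinstance(thing, FakeTrajectory)


# Hypothetical registry mapping a format name to its hint function.
READER_HINTS = {HintedReader.format: HintedReader._format_hint}


def guess_format(thing):
    """Return the first format whose hint accepts `thing`, else None."""
    for name, hint in READER_HINTS.items():
        if hint(thing):
            return name
    return None


from_object = guess_format(FakeTrajectory())
from_filename = guess_format("../data/2r9r-1b.xyz")
```

In this sketch a plain filename matches no hint, which mirrors why the string form still needs an explicit format='CHEMFILES'.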
Ok I think this is ready, @kain88-de do you want to give it a last look over? |
Why do the tests fail? |
This is strange... in https://travis-ci.com/MDAnalysis/mdanalysis/jobs/289697062, conda installs chemfiles version 0.7.4 from conda-forge. I initially thought it was due to the builder using Python 3.5, which is no longer supported by conda-forge, but the latest release with support for Python 3.5 is chemfiles 0.8 (https://anaconda.org/conda-forge/chemfiles-python/files). Maybe conda is using a "minimal compatible version" when resolving versions to install? Anyway, the test should be skipped in that case, since the chemfiles version is not compatible.
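One way to decide whether to skip tests for an incompatible installed version is a small version comparison like the sketch below. The helper names are hypothetical and the minimum version "0.9" is a placeholder chosen for the example, not a requirement stated in this thread.

```python
from itertools import takewhile


def version_tuple(version):
    """Parse leading numeric components, e.g. '0.7.4' -> (0, 7, 4)."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_supported(installed, minimum):
    """True if `installed` is at least `minimum` (both version strings)."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '0.9' and '0.9.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


# "0.9" is a placeholder minimum for illustration only.
old_ok = is_supported("0.7.4", "0.9")
new_ok = is_supported("0.9.1", "0.9")
```

With pytest, a result like this could feed the condition of pytest.mark.skipif so that the chemfiles tests are skipped rather than failed on an incompatible install.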
Yay! Thanks a lot @richardjgowers and @kain88-de for helping with this 😃 |
Here is a first working version of the integration of chemfiles as a coordinate reader. See #1194 for some initial discussion on this. There are still a few TODO in the code, but I think this is ready for an initial round of review.
Currently I get three test failures: two related to aux, and one to container format. I don't really understand what is getting tested here, could someone give me a little pointer?
Using chemfiles might help with:
is_hetatm atomic property
Unresolved questions:
Chemfiles supports reading/writing files with a non constant number of atoms. Currently I error when reading if the number of atoms changes, and I do nothing and write the new number of atoms when writing. Is this the desired behaviour?
PR Checklist