-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Netcdf with no time #4074
Netcdf with no time #4074
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on the developer mailing list so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS
as part of this PR.
package/AUTHORS
Outdated
@@ -209,6 +209,7 @@ Chronological list of authors | |||
- Vishal Parmar | |||
- Moritz Schaeffler | |||
- Xu Hong Chen | |||
- DrDomenicoMarson |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would prefer your real name rather than github handle here, but no obligation :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I edited it, I also prefer the real name, I just didn't notice real names were used :-)
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #4074 +/- ##
========================================
Coverage 93.56% 93.56%
========================================
Files 191 191
Lines 25075 25083 +8
Branches 4049 4050 +1
========================================
+ Hits 23462 23470 +8
Misses 1093 1093
Partials 520 520
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @DrDomenicoMarson
I've added a couple of comments on the current code changes.
The other thing that needs to be added here are tests for these new code branches, these can be added to testsuite/MDAnalysisTests/coordinates/test_netcdf.py
Also please do introduce yourself on the mailing list as detailed by: #4074 (review)
@@ -640,7 +643,10 @@ def _read_frame(self, frame): | |||
self.n_frames)) | |||
# note: self.trjfile.variables['coordinates'].shape == (frames, n_atoms, 3) | |||
ts._pos[:] = self._get_var_and_scale('coordinates', frame) | |||
ts.time = self._get_var_and_scale('time', frame) | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to avoid hitting a try except at each read (which can be quite expensive) and keep to the same formatting as the other optional variables, could you instead just assign an (edit: an not as) has_time
flag on line 550?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, done as you suggested!
try: | ||
ts.time = self._get_var_and_scale('time', frame) | ||
except KeyError: | ||
ts.time = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to ping @MDAnalysis/coredevs for ideas on this one. Technically this case should only happen for non-temporal trajectory data, i.e. "a bunch of static frames that aren't related to each other by time".
Should we either a) use the timestep default, not set time and let it be set to dt * frame + offset, or b) set it to -1, c) some other standard we've been using somewhere that I've not kept up with.
I originally advocated for b), but (edit: after a) look at the PDB reader I'm starting to convince myself that a) might be the right approach (since PDB trajectories are also meant to be frames not linked via time).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed this, the time is now frame * dt
, but if I read multiple trajectories the "time" is restarted at each trajectory (in the example I'm adding to the tests, the trajectories have 3 frames, and I get a time as
0.0 1.0 2.0 0.0 1.0 2.0
Is there already somewhere saved the offset from the previous trajectory that I can use here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, I should have probably clarified, the default action for ts.time
is to do dt * frame + offset
so I believe the answer is just not to set ts.time
at all if the time variable doesn't exist (I think, it needs a bit of digging around the base and timestep files and I'm a bit short on time to be 100% sure today sorry!).
I think the chainreader should overwrite frames
when the default happens, although I know there were some issues opened about its behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried as you suggested and it worked as intended, thanks!
@IAlibay I think this sounds most similar in spirit to the PDB reader which leads me to a). If we get consensus we should write down somewhere that that should be the setup for "collection of frames in a trajectory format". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some early questions / comments on the test files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bigger than all the other prmtops we have in our testsuite (going by lines alone, I think the biggest thing we have is ~ 18k lines.
If there's no smaller systems available, is there any way you could possibly bz2 this to save a bit of space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right, I forgot to edit the files. I removed all but the first 3 residues, everything is much smaller now!
@@ -102,6 +102,7 @@ | |||
"PRM12", "TRJ12_bz2", # Amber (v12 format, Issue 100) | |||
"PRMncdf", "TRJncdf", "NCDF", # Amber (netcdf) | |||
"PFncdf_Top", "PFncdf_Trj", # Amber ncdf with Positions and Forces | |||
"CPPTRAJ_TRAJ_TOP", "CPPTRAJ_TRAJ_1", "CPPTRAJ_TRAJ_2", # Amber ncdf extracted from CPPTRAJ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I may ask, what is the difference between traj 1 and 2? Could you not just chain traj 1 twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are way smarter than me, you are absolutely right :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm definitely not smarter 🤣 I just spend far too much time on this repository.
@@ -38,13 +38,13 @@ | |||
|
|||
__all__ = [ | |||
"PSF", "DCD", "CRD", # CHARMM (AdK example, DIMS trajectory from JMB 2009 paper) | |||
"DCD2", # CHARMM (AdK example, DIMS trajectory from PLOS Comput Biol paper) | |||
"DCD2", # CHARMM (AdK example, DIMS trajectory from PLOS Comput Biol paper) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just ignore linting things for anything you aren't changing. Especially for this file given we're in the processing of changing it (see: #4056).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple of small changes, then I think we're good to merge!
Co-authored-by: Irfan Alibay <[email protected]>
Co-authored-by: Irfan Alibay <[email protected]>
Co-authored-by: Irfan Alibay <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two tiny things + that small extra test from the previous review.
Also the merge conflict needs resolving (mainly just squashing things in the changelog). |
Co-authored-by: Irfan Alibay <[email protected]>
Co-authored-by: Irfan Alibay <[email protected]>
[CPPTRAJ_TRAJ, CPPTRAJ_TRAJ]) | ||
|
||
def test_chain_times(self, u): | ||
"""Check times entries for a chain of trajectories without a defined time variable"" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Check times entries for a chain of trajectories without a defined time variable"" | |
"""Check times entries for a chain of trajectories without a defined time variable""" |
Missing an extra quote mark.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the delay, this looks good to me thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM also :)
Fixes #4073
Changes made in this Pull Request:
Now NetCDF trajectories without
time
variable will be successfully read.In such cases:
time
will be set to '-1' for each framedt
will be set to1
(this relies on the mechanism already in place, as we raise anAttributeError
which trigger the default value of1
.Running tests on my machine (python 3.10) I see no failures except for one which I can't see if it's related to the new changes. The failing test is
test_parmed_parser.py
, which relies only on topology and not trajectory data, as far as I can see.PR Checklist