Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Netcdf with no time #4074

Merged
merged 18 commits into from
Mar 22, 2023
Merged

Conversation

DrDomenicoMarson
Copy link
Contributor

@DrDomenicoMarson DrDomenicoMarson commented Mar 15, 2023

Fixes #4073

Changes made in this Pull Request:
Now NetCDF trajectories without time variable will be successfully read.
In such cases:

  • time will be set to '-1' for each frame
  • dt will be set to 1 (this relies on the mechanism already in place, as we raise an AttributeError which trigger the default value of 1.

Running tests on my machine (python 3.10) I see no failures except for one which I can't see if it's related to the new changes. The failing test is test_parmed_parser.py, which relies only on topology and not trajectory data, as far as I can see.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello there first time contributor! Welcome to the MDAnalysis community! We ask that all contributors abide by our Code of Conduct and that first time contributors introduce themselves on the developer mailing list so we can get to know you. You can learn more about participating here. Please also add yourself to package/AUTHORS as part of this PR.

package/AUTHORS Outdated
@@ -209,6 +209,7 @@ Chronological list of authors
- Vishal Parmar
- Moritz Schaeffler
- Xu Hong Chen
- DrDomenicoMarson
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We would prefer your real name rather than github handle here, but no obligation :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I edited it, I also prefer the real name, I just didn't notice real names were used :-)

@codecov
Copy link

codecov bot commented Mar 15, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (c452da7) 93.56% compared to head (15a5441) 93.56%.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #4074   +/-   ##
========================================
  Coverage    93.56%   93.56%           
========================================
  Files          191      191           
  Lines        25075    25083    +8     
  Branches      4049     4050    +1     
========================================
+ Hits         23462    23470    +8     
  Misses        1093     1093           
  Partials       520      520           
Impacted Files Coverage Δ
package/MDAnalysis/coordinates/TRJ.py 95.85% <100.00%> (+0.07%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @DrDomenicoMarson

I've added a couple of comments on the current code changes.

The other thing that needs to be added here are tests for these new code branches, these can be added to testsuite/MDAnalysisTests/coordinates/test_netcdf.py

Also please do introduce yourself on the mailing list as detailed by: #4074 (review)

@@ -640,7 +643,10 @@ def _read_frame(self, frame):
self.n_frames))
# note: self.trjfile.variables['coordinates'].shape == (frames, n_atoms, 3)
ts._pos[:] = self._get_var_and_scale('coordinates', frame)
ts.time = self._get_var_and_scale('time', frame)
try:
Copy link
Member

@IAlibay IAlibay Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to avoid hitting a try except at each read (which can be quite expensive) and keep to the same formatting as the other optional variables, could you instead just assign an (edit: an not as) has_time flag on line 550?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, done as you suggested!

try:
ts.time = self._get_var_and_scale('time', frame)
except KeyError:
ts.time = -1
Copy link
Member

@IAlibay IAlibay Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to ping @MDAnalysis/coredevs for ideas on this one. Technically this case should only happen for non-temporal trajectory data, i.e. "a bunch of static frames that aren't related to each other by time".

Should we either a) use the timestep default, not set time and let it be set to dt * frame + offset, or b) set it to -1, c) some other standard we've been using somewhere that I've not kept up with.

I originally advocated for b), but (edit: after a) look at the PDB reader I'm starting to convince myself that a) might be the right approach (since PDB trajectories are also meant to be frames not linked via time).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this, the time is now frame * dt, but if I read multiple trajectories the "time" is restarted at each trajectory (in the example I'm adding to the tests, the trajectories have 3 frames, and I get a time as
0.0 1.0 2.0 0.0 1.0 2.0

Is there already somewhere saved the offset from the previous trajectory that I can use here?

Copy link
Member

@IAlibay IAlibay Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry, I should have probably clarified, the default action for ts.time is to do dt * frame + offset so I believe the answer is just not to set ts.time at all if the time variable doesn't exist (I think, it needs a bit of digging around the base and timestep files and I'm a bit short on time to be 100% sure today sorry!).

I think the chainreader should overwrite frames when the default happens, although I know there were some issues opened about its behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried as you suggested and it worked as intended, thanks!

@hmacdope
Copy link
Member

@IAlibay I think this sounds most similar in spirit to the PDB reader which leads me to a). If we get consensus we should write down somewhere that that should be the setup for "collection of frames in a trajectory format".

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some early questions / comments on the test files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bigger than all the other prmtops we have in our testsuite (going by lines alone, I think the biggest thing we have is ~ 18k lines.

If there's no smaller systems available, is there any way you could possibly bz2 this to save a bit of space?

Copy link
Contributor Author

@DrDomenicoMarson DrDomenicoMarson Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you are right, I forgot to edit the files. I removed all but the first 3 residues, everything is much smaller now!

@@ -102,6 +102,7 @@
"PRM12", "TRJ12_bz2", # Amber (v12 format, Issue 100)
"PRMncdf", "TRJncdf", "NCDF", # Amber (netcdf)
"PFncdf_Top", "PFncdf_Trj", # Amber ncdf with Positions and Forces
"CPPTRAJ_TRAJ_TOP", "CPPTRAJ_TRAJ_1", "CPPTRAJ_TRAJ_2", # Amber ncdf extracted from CPPTRAJ
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may ask, what is the difference between traj 1 and 2? Could you not just chain traj 1 twice?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are way smarter than me, you are absolutely right :-)

Copy link
Member

@IAlibay IAlibay Mar 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm definitely not smarter 🤣 I just spend far too much time on this repository.

@@ -38,13 +38,13 @@

__all__ = [
"PSF", "DCD", "CRD", # CHARMM (AdK example, DIMS trajectory from JMB 2009 paper)
"DCD2", # CHARMM (AdK example, DIMS trajectory from PLOS Comput Biol paper)
"DCD2", # CHARMM (AdK example, DIMS trajectory from PLOS Comput Biol paper)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just ignore linting things for anything you aren't changing. Especially for this file given we're in the processing of changing it (see: #4056).

@IAlibay IAlibay self-requested a review March 16, 2023 12:29
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of small changes, then I think we're good to merge!

package/MDAnalysis/coordinates/TRJ.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_netcdf.py Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/coordinates/test_netcdf.py Outdated Show resolved Hide resolved
Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two tiny things + that small extra test from the previous review.

package/CHANGELOG Outdated Show resolved Hide resolved
testsuite/MDAnalysisTests/datafiles.py Outdated Show resolved Hide resolved
@IAlibay
Copy link
Member

IAlibay commented Mar 16, 2023

Also the merge conflict needs resolving (mainly just squashing things in the changelog).

[CPPTRAJ_TRAJ, CPPTRAJ_TRAJ])

def test_chain_times(self, u):
"""Check times entries for a chain of trajectories without a defined time variable""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Check times entries for a chain of trajectories without a defined time variable""
"""Check times entries for a chain of trajectories without a defined time variable"""

Missing an extra quote mark.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, this looks good to me thanks!

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM also :)

@hmacdope hmacdope merged commit 72e3cdb into MDAnalysis:develop Mar 22, 2023
@DrDomenicoMarson DrDomenicoMarson deleted the netcdf-with-no-time branch March 24, 2023 08:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Can't open NetCDF trajectories created by cpptraj
3 participants