Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AuxReader for EDR Files #3749

Merged
merged 63 commits into from
Sep 19, 2022
Merged

AuxReader for EDR Files #3749

merged 63 commits into from
Sep 19, 2022

Conversation

BFedder
Copy link
Contributor

@BFedder BFedder commented Jul 9, 2022

Fixes #3629

With panedrlite now closer to pyedr now being a thing, and the actual code itself less likely to still change, I have now finally started work on the EDR auxiliary reader that I want to implement as part of my GSoC project.
Currently, this is just a skeleton, but it already allows EDR data to be read and assigned to AuxStep instances.

To Do:

  • Finish implementation, allow similar features as XVGReader
  • Write tests
  • Improve documentation (currently barely there)
  • Write self-contained example in documentation
  • Change the add_auxiliary method to accommodate EDR data

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks
Copy link

pep8speaks commented Jul 9, 2022

Hello @BFedder! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 171:80: E501 line too long (88 > 79 characters)
Line 174:80: E501 line too long (86 > 79 characters)
Line 175:80: E501 line too long (91 > 79 characters)

Line 56:80: E501 line too long (91 > 79 characters)
Line 57:80: E501 line too long (91 > 79 characters)
Line 528:1: E402 module level import not at top of file

Comment last updated at 2022-09-12 10:12:49 UTC

Copy link
Member

@hmacdope hmacdope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good so far. Just some small comments and nitpicks. Would be good to add type hints also when you have decided on final design. :)

package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
@IAlibay IAlibay requested a review from fiona-naughton July 14, 2022 15:42
@codecov
Copy link

codecov bot commented Jul 20, 2022

Codecov Report

Base: 94.31% // Head: 94.33% // Increases project coverage by +0.02% 🎉

Coverage data is based on head (828b2dc) compared to base (93c0aac).
Patch coverage: 99.35% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #3749      +/-   ##
===========================================
+ Coverage    94.31%   94.33%   +0.02%     
===========================================
  Files          192      193       +1     
  Lines        24781    24975     +194     
  Branches      3341     3369      +28     
===========================================
+ Hits         23371    23561     +190     
- Misses        1362     1365       +3     
- Partials        48       49       +1     
Impacted Files Coverage Δ
package/MDAnalysis/auxiliary/EDR.py 98.90% <98.90%> (ø)
package/MDAnalysis/auxiliary/XVG.py 95.61% <100.00%> (+0.07%) ⬆️
package/MDAnalysis/auxiliary/__init__.py 100.00% <100.00%> (ø)
package/MDAnalysis/auxiliary/base.py 94.01% <100.00%> (+2.69%) ⬆️
package/MDAnalysis/coordinates/base.py 94.37% <100.00%> (+0.38%) ⬆️
package/MDAnalysis/units.py 100.00% <100.00%> (ø)
package/MDAnalysis/lib/pkdtree.py 91.91% <0.00%> (-2.59%) ⬇️
MDAnalysisTests/auxiliary/base.py 89.74% <0.00%> (-1.58%) ⬇️
package/MDAnalysis/lib/mdamath.py 100.00% <0.00%> (ø)
MDAnalysisTests/auxiliary/__init__.py 100.00% <0.00%> (ø)
... and 4 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@@ -76,6 +76,8 @@ inputs:
default: 'pytest-cov<=2.10.1'
pytest-xdist:
default: 'pytest-xdist'
pyedr:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we making pyedr an optional dependency? I'm somewhat ok either way, if we could drop the 14MB test files in the next release it's small enough that it could be core if we wanted to. Thoughts @MDAnalysis/coredevs ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, pyedr needs to go into the setup.py either as an extra_requires (if not a core dependency), or just as an install_requires otherwise

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can decrease the size to <1 MB then it's just more convenient to have it as a core dependency — it's light weight and we maintain it.

@@ -76,6 +76,8 @@ inputs:
default: 'pytest-cov<=2.10.1'
pytest-xdist:
default: 'pytest-xdist'
pyedr:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also needs adding to the azure pipelines file:

python -m pip install --only-binary=h5py
cython
hypothesis
matplotlib
numpy
packaging
pytest
pytest-cov
pytest-xdist
scikit-learn
scipy
h5py>=2.10
tqdm
threadpoolctl
fasteners

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not at this point making pyedr a core dependency - going by the fact that we aren't touching setup.py etc... (I would like to get more coredev oks on this anyways)

As a result, I would ask that this be moved to the pip optional dependencies section.

:class:`~MDAnalysis.auxiliary.base.AuxStep`
"""

def __init__(self, time_selector="Time", data_selector=None, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should go ahead and add type hints here as you go along

package/MDAnalysis/auxiliary/EDR.py Show resolved Hide resolved
@hmacdope hmacdope added the GSoC GSoC project label Jul 23, 2022
@BFedder
Copy link
Contributor Author

BFedder commented Jul 24, 2022

I have now adapted the tests for the trajectory-independent functionality. Next step will be the association of energy data to trajectory frames.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor drive-by comments

testsuite/MDAnalysisTests/auxiliary/base.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Outdated Show resolved Hide resolved
package/MDAnalysis/auxiliary/EDR.py Show resolved Hide resolved
testsuite/MDAnalysisTests/auxiliary/test_edr.py Outdated Show resolved Hide resolved
@BFedder
Copy link
Contributor Author

BFedder commented Sep 7, 2022

The remaining PEP8 complaints are URLs, a table, and the import done to match what's already present.

@hmacdope
Copy link
Member

hmacdope commented Sep 8, 2022

@IAlibay you happy with moving forward with the PEP8 the way it is?

@IAlibay
Copy link
Member

IAlibay commented Sep 9, 2022

@IAlibay you happy with moving forward with the PEP8 the way it is?

I'll try to give it a read through tomorrow, but technically I'm on holiday so no promises (I'm on "pressing matters or stuff that I want to play with" only atm)

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First review set - base maintenance stuff.

Pyedr should remain an optional dependency for now.

@@ -76,6 +76,8 @@ inputs:
default: 'pytest-cov<=2.10.1'
pytest-xdist:
default: 'pytest-xdist'
pyedr:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are not at this point making pyedr a core dependency - going by the fact that we aren't touching setup.py etc... (I would like to get more coredev oks on this anyways)

As a result, I would ask that this be moved to the pip optional dependencies section.

- name: pip_install
shell: bash -l {0}
env:
PIP_MIN_DEPS: |
${{ inputs.coverage }}
${{ inputs.pytest-cov }}
${{ inputs.pytest-xdist }}
${{ inputs.pyedr }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, optional dep please.

@@ -96,6 +96,7 @@ jobs:
tqdm
threadpoolctl
fasteners
pyedr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not common practice to have optional deps in azure pipeline runners, I'd remove it for now.

@@ -35,7 +47,6 @@ Deprecations

08/29/22 IAlibay, PicoCentauri, orbeckst, hmacdope, rmeli, miss77jun, rzhao271,
yuxuanzhuang, hsadia538, lilyminium

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't remove this.

@IAlibay
Copy link
Member

IAlibay commented Sep 9, 2022

First review set - base maintenance stuff.

Pyedr should remain an optional dependency for now.

I see that it is being added to setup.py... honestly I don't know, here @orbeckst @hmacdope can you please run this through the business channel? I've become increasingly reluctant to add new dependencies to core (indeed my plan is to remove half the ones we already have by 3.0). I'll agree to whatever the majority agreement is.

edit: I should note, if we make it a core requirement, that means we need to ensure that we either ship it with very small test files, or we don't ship the test files at all. The ~ 20 MB tarball with the tests is not appropriate for the core library. That also needs to be done before 2.4.0 is released.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unfortunately a quick review, but there are a few things that need addressing.

package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
Comment on lines 403 to 405
with pytest.warns(UserWarning, match="AuxReader: memory usage "
"warning! Auxiliary data takes up 3.328e-06 GB of "
r"memory \(Warning limit: 1e-08 GB\)"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String match on a float seems like it might be a bit of a flaky test. I'd be happy with just matching through to warning!

Copy link
Member

@orbeckst orbeckst Sep 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or use regular expressions for something vaguely float-like. Btw, the . is already matched as "any character".

Maybe r"warning! Auxiliary data takes up 3[0-9.]*e-06 GB of " will roughly capture the message and will fail if the size changes dramatically

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the regular expression as suggested, thanks! The reason I match through to the warning limit here is to also test if changing the warning limit actually works as expected.


.. note::
The following will change with the release of MDAnalysis 3.0, which is
scheduled for September 2023. From then on, the order of these two
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't add specific dates for releases in the docs. We never know what might happen and I prefer knowing exactly where our public facing schedule is (which right now is a work-in progress blog post).

Note
----
The file is assumed to be of a size such that reading and storing the full
contents is practical.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please elaborate based on the memory warning stuff

Comment on lines 328 to 330
raise ValueError('Step {0} has {1} columns instead of '
'{2}'.format(self.step, len(auxstep._data),
'{2}'.format(self.step,
len(auxstep._data),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is going to change, use f-strings

Comment on lines 120 to 122



Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

One line between class methods?

package/setup.py Outdated
@@ -603,6 +603,7 @@ def long_description(readme):
'packaging',
'fasteners',
'gsd>=1.9.3',
'pyedr',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go ahead with pyedr as a core dep, please pin to >=0.7.0 everywhere.

Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@BFedder , sorry to come back with changes. After a quick discussion I became convinced that we should start out with pyedr being an optional dependency. We want to reduce the initial dependency footprint of MDAnalysis anyway and we especially cannot require people to install 20 MB of test files for pyedr. Until PR MDAnalysis/panedr#55 (or something similar that reduces the test set) is merged, we definitely want to keep pyedr optional.

Please change the PR accordingly. For some inspiration, see @hmacdope 's TNG PR #3765 to see how he set up the whole "optional reader" thing. For example, the dependency then goes under the "extra_formats" in setup.py and you typically guard your imports with a documented HAS_PYEDR module variable.

@BFedder
Copy link
Contributor Author

BFedder commented Sep 12, 2022

Thanks for the reviews, this should address them @IAlibay @orbeckst

@BFedder
Copy link
Contributor Author

BFedder commented Sep 12, 2022

I'm not sure what to do about the Cirrus CI failing, the error message seems like it's not due to any change I made... Could I ask for advice, please?

@hmacdope
Copy link
Member

Ping @tylerjereddy? RE cirrus? I'm not so sure either.

@IAlibay
Copy link
Member

IAlibay commented Sep 13, 2022

I've restarted the task, as Tyler previous detailed somewhere else, the issue seems to be purely down to Cirrus getting hammered pretty heavily by a bunch of folks that suddenly realised they had free M1 runner available. Hopefully it should complete now.

@tylerjereddy
Copy link
Member

Looks good now, feel free to deactivate Cirrus if it becomes annoying of course.

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more things from a quick review.

package/CHANGELOG Show resolved Hide resolved
package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
with pytest.raises(ImportError, match="please install pyedr"):
aux = mda.auxiliary.EDR.EDRReader(AUX_EDR)

# The EDRReader behaves differently from the XVGReader, creating dummy test
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this comment needs to be here, could it possibly be docstring instead?

return_time_diff=True)

assert frame == idx
np.testing.assert_allclose(time_diff, idx * 0.002)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why call the full signature here when you import assert_allclose specifically?

package/MDAnalysis/units.py Show resolved Hide resolved
Copy link
Member

@orbeckst orbeckst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a bunch of comments, mostly in response to @IAlibay 's review. Please see inline.

package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
**kwargs):
# allow auxname to be optional for when using reader separate from
# trajectory.
self.auxname = auxname
self.represent_ts_as = represent_ts_as
self.cutoff = cutoff
# no cutoff was previously passed as -1. This check is to maintain this
# behaviour: negative number is appropriately turned to None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is not quite the old behavior: we now use any negative number as None. Previously only -1 would be seen as default. Arguably, that was a bug and we would have gotten weird behavior for -2.

Therefore, I support the change as is.

I think you can add a CHANGELOG entry for fixes that you removed undefined behavior for negative numbers <-1 for cutoff.

package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
Comment on lines 800 to 803
if term not in value:
value[term] = cutoff_data[dataset][term]
else:
value[term] += cutoff_data[dataset][term]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a defaultdict avoids the if/else. Or initialize a dict with all the terms and zeros. But avoid branches insides loops if you can.

(It might not be a big performance penalty here but I think it's always good to strive for cleaner code.)

package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
@BFedder
Copy link
Contributor Author

BFedder commented Sep 14, 2022

I have now used a defaultdict in calc_representative, moved the comments in test_edr into class docstrings, and opened issue #3830 as a place to discuss the future of calc_representative with averages.

@hmacdope
Copy link
Member

@IAlibay @orbeckst are we satisfied that pyedr is now an optional dependency and that your review questions have been addressed? I had a read over and am satisfied. :)

Copy link
Member

@IAlibay IAlibay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few final couple of things and I think we're good.

I'll approve now so that I'm not blocking once this get addressed.


Enhancements
* AuxReaders not have a memory_limit parameter to control when memory usage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* AuxReaders not have a memory_limit parameter to control when memory usage
* AuxReaders now have a memory_limit parameter to control when memory usage


* 2.4.0

Fixes
* Fixes distances.between not always returning AtomGroup (Issue #3794)
* Upgrade to chemfiles 0.10.3 (Issue #3798)
* Auxiliary; determination of representative frames: Removed undefined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm kinda lost in the comment sea, I'd have thought it'd be fine to only have one of these in "changes", but if someone requested it here too then that's fine. Although maybe newest entry first ;)

aux_spec = {term: term for term in self.terms}

elif isinstance(aux_spec, str):
# This is to keep XVGReader functioning as-is, working with strings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment not review thing - The fact that we have a special case for XVG kinda breaks the idea that this should be auxreader independent (I know we've discussed this before). We should keep an eye on this when we add new readers, if this ever expands we'll need to revisit it.

package/MDAnalysis/auxiliary/base.py Show resolved Hide resolved
.github/actions/setup-deps/action.yaml Outdated Show resolved Hide resolved
maintainer/conda/environment.yml Outdated Show resolved Hide resolved
@BFedder
Copy link
Contributor Author

BFedder commented Sep 19, 2022

This latest batch of reviews is now also addressed, and the checks pass. I believe the EDRReader is now ready to be merged, then 🥳 Thanks everyone for your mentorship, reviews, and help over these past few months! I have learnt a lot and am looking forward to continue contributing to MDAnalysis.

@hmacdope
Copy link
Member

Congratulations @BFedder!

@hmacdope hmacdope merged commit 22ceb5a into MDAnalysis:develop Sep 19, 2022
@BFedder BFedder deleted the edr_reader branch September 21, 2022 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement an AuxReader that supports GROMACS EDR files
8 participants