-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding H5MDWriter to H5MD.py #3189
Conversation
Hello @edisj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2021-08-21 10:28:58 UTC |
It's not crucial for 2.0 and could go into 2.1 (at this point there will likely be a citation for a SciPy paper); if @edisj wants to work double-time to attempt to squeeze it into a 2.0 then I won't object, though. |
Hi @IAlibay , I'll push to have it done by 2.0 (May 10th, right?). I'm still testing some things with HPC benchmarks before updating this PR |
So unless I misunderstood our workshop targets, the 10th is really the day it needs to be on conda-forge. I think we'd have to put down the code freeze at the latest the Friday before, i.e. May 7th. |
…to issue2866-h5mdwriter
Hey @orbeckst , ready for review! :) (also @IAlibay if you wanted to take a look) I've made a lot of changes over the last few days and think it's complete other than maybe a couple exceptions to add and documentation to edit. I'll finish up the tests for codecov tomorrow There are a few parts of the code I'm not sure about, I'll open some discussions... |
Codecov Report
@@ Coverage Diff @@
## develop #3189 +/- ##
===========================================
+ Coverage 93.70% 93.79% +0.08%
===========================================
Files 177 177
Lines 22990 23206 +216
Branches 3247 3299 +52
===========================================
+ Hits 21542 21765 +223
+ Misses 1397 1390 -7
Partials 51 51
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry only had time to have a quick glance over, so it's mostly superficial comments.
I'll probably wait until you've added the extra tests before I re-review.
import MDAnalysis as mda | ||
import numpy as np | ||
from . import base, core | ||
from .base import Timestep |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you're using Timestep anywhere right? (it's all just inherited from ReaderBase and WriterBase)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@richardjgowers added that in #3132 but I think it can be removed, unless there was a specific reason for it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow I didn't realize I had to hit "submit review" for my comments to show up, I thought I had already replied to your review comments sorry @IAlibay :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not used explicitly, remove it.
self.h5md_file = None | ||
|
||
# check which datasets are to be written | ||
self.has_positions = kwargs.get('positions', False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it's something we do for the other writers (although I'm trying to change this for the NetCDF writer), so I'm kinda asking to get @orbeckst's opinion on this.
There's a lot of kwargs here, and it's not immediately clear to me that they have user-facing documentation. Would it be worth not making these kwargs but instead named optional arguments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's a good idea. We still need the catch-all **kwargs
for anything else that we do not care about but having explicit kwarg names in the signature (+ defaults) is good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a deeper look but here are some initial comments from a quick scan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--> review through to line 1240 of H5MD.py
Sorry I'm going to have to break up my reviews into chunks. Here is the first part of my review, I'll follow up with the rest probably tomorrow at this rate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second half of review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try to re-review a bit later today, although if you think all my comments were addressed please do go ahead with the merge @orbeckst |
Your point about "check exception messages" was a good one — I am currently watching @edisj find out which exceptions are actually triggered .... so my comments had been addressed, yours should be done in the next hour. |
Hi @IAlibay , I just pushed my final changes, all comments should be addressed. All error tests now have a If you have time to review, there are a couple of decisions made while talking to @orbeckst that might need a little justification:
Thanks again for your review :D, if there's anything else that remains let me know |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @edisj I'm just having a quick look through now. Just doing this quick pre-review comment to include this duecredit thing - not sure if it needs testing or not ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just the one comment from a quick scan though. I'll directly approve now since it's just a documentation style thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a couple doc quibbles that can slide
W.write(u) | ||
|
||
To write an H5MD file with contiguous datasets, you must specifly the | ||
number of frames to be written and set ``chunks=False``: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a quick one line hint as to why someone would want to do this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a note for it on line 948, although not sure if it fully covers what you're asking @richardjgowers ?
Seems to be everything, I'll merge once CI returns green - thanks for all the work @edisj 🎉 |
Congratulations @edisj — PR with >200 comments and what turned out to be one of the more complicated formats. Given that this could be considered, together with the SciPy paper, the capstone of your NSF REU experience, would you want to write a blog post for https://www.mdanalysis.org/blog/ where you summarize your experience and achievements? |
Fixes #2866
Changes made in this Pull Request:
class H5MDWriter
toH5MD.py
This is a continuation from #2787 and #2869. I closed #2869 because I tried rebasing to current develop and ended up making a mess of things, so it was easier to make a new pull request.
At the moment, the writer has all of the functionality it needs. It writes all of the data from the
timestep
(positions, velocities, etc), it writes everything from thets.data
dictionary that isn't'time', 'step', 'dt'
into the 'observables' group. It can write datasets with chunked and/or compressed configurations, and files can be opened with MPI drivers.The current to-do list is:
Benchmark the writer on multiple nodes with different configurations (chunked, compression)Figure out what to do the shape argument forstep
andtime
datasets when the number of frames written isn't known by the writer. This could be solved by chunking, but I'm not sure what the default chunk size should be for these datasets.PR Checklist