adding H5MDWriter to H5MD.py #3189

Merged
merged 49 commits into from
Aug 21, 2021

edisj
Contributor

@edisj edisj commented Mar 24, 2021

Fixes #2866

Changes made in this Pull Request:

  • adds class H5MDWriter to H5MD.py

This is a continuation from #2787 and #2869. I closed #2869 because I tried rebasing to current develop and ended up making a mess of things, so it was easier to make a new pull request.

At the moment, the writer has all of the functionality it needs. It writes all of the data from the timestep (positions, velocities, etc.), and it writes everything in the ts.data dictionary except 'time', 'step', and 'dt' into the 'observables' group. It can write datasets with chunked and/or compressed configurations, and files can be opened with MPI drivers.
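As a rough illustration of the 'observables' rule described above, here is a minimal sketch. The helper name `split_observables` and the `RESERVED_KEYS` constant are made up for this example and are not taken from the writer's actual code.

```python
# Hypothetical helper, for illustration only; not the H5MDWriter implementation.
RESERVED_KEYS = {"time", "step", "dt"}


def split_observables(ts_data):
    """Return the entries of a ts.data-like dict that would go to 'observables'."""
    return {key: value for key, value in ts_data.items()
            if key not in RESERVED_KEYS}


ts_data = {"time": 0.5, "step": 10, "dt": 0.05, "temperature": 300.0}
observables = split_observables(ts_data)
print(observables)  # {'temperature': 300.0}
```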

The current to-do list is:

  • Benchmark the writer on multiple nodes with different configurations (chunked, compression)
  • Have the code reviewed and clean things up
  • Figure out what to do with the shape argument for the step and time datasets when the writer doesn't know the number of frames to be written. This could be solved by chunking, but I'm not sure what the default chunk size should be for these datasets.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@pep8speaks

pep8speaks commented Mar 24, 2021

Hello @edisj! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 107:80: E501 line too long (94 > 79 characters)
Line 973:80: E501 line too long (101 > 79 characters)
Line 974:80: E501 line too long (83 > 79 characters)
Line 1335:80: E501 line too long (80 > 79 characters)

Line 262:9: E128 continuation line under-indented for visual indent
Line 415:9: E128 continuation line under-indented for visual indent
Line 416:9: E128 continuation line under-indented for visual indent
Line 509:80: E501 line too long (80 > 79 characters)

Line 55:80: E501 line too long (89 > 79 characters)
Line 84:80: E501 line too long (89 > 79 characters)
Line 110:80: E501 line too long (89 > 79 characters)

Comment last updated at 2021-08-21 10:28:58 UTC

@edisj edisj mentioned this pull request Mar 24, 2021
4 tasks
@IAlibay
Member

IAlibay commented Apr 22, 2021

@edisj @orbeckst what's the target milestone for this? (i.e. is this a long term goal or are you planning 2.0/2.1?)

@orbeckst
Member

It's not crucial for 2.0 and could go into 2.1 (at this point there will likely be a citation for a SciPy paper); if @edisj wants to work double-time to try to squeeze it into 2.0, I won't object, though.

@IAlibay IAlibay added this to the 2.1.0 milestone Apr 22, 2021
@edisj
Contributor Author

edisj commented Apr 22, 2021

Hi @IAlibay, I'll push to have it done by 2.0 (May 10th, right?). I'm still testing some things with HPC benchmarks before updating this PR.

@IAlibay
Member

IAlibay commented Apr 22, 2021

> Hi @IAlibay, I'll push to have it done by 2.0 (May 10th, right?).

So unless I misunderstood our workshop targets, the 10th is really the day it needs to be on conda-forge. I think we'd have to put down the code freeze at the latest the Friday before, i.e. May 7th.

@orbeckst orbeckst self-assigned this May 5, 2021
@orbeckst orbeckst added the NSF REU NSF Research Experience for Undergraduates project label May 5, 2021
@edisj edisj marked this pull request as draft May 7, 2021 23:28
@edisj edisj marked this pull request as ready for review June 6, 2021 07:51
@edisj
Contributor Author

edisj commented Jun 6, 2021

Hey @orbeckst , ready for review! :) (also @IAlibay if you wanted to take a look)

I've made a lot of changes over the last few days and think it's complete, other than maybe a couple of exceptions to add and documentation to edit. I'll finish up the tests for codecov tomorrow.

There are a few parts of the code I'm not sure about; I'll open some discussions...

@codecov

codecov bot commented Jun 6, 2021

Codecov Report

Merging #3189 (2f4038c) into develop (bd0ed5d) will increase coverage by 0.08%.
The diff coverage is 100.00%.

@@             Coverage Diff             @@
##           develop    #3189      +/-   ##
===========================================
+ Coverage    93.70%   93.79%   +0.08%     
===========================================
  Files          177      177              
  Lines        22990    23206     +216     
  Branches      3247     3299      +52     
===========================================
+ Hits         21542    21765     +223     
+ Misses        1397     1390       -7     
  Partials        51       51              
Impacted Files | Coverage Δ
package/MDAnalysis/coordinates/base.py | 96.06% <ø> (+0.71%) ⬆️
package/MDAnalysis/coordinates/H5MD.py | 97.53% <100.00%> (+3.32%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update bd0ed5d...2f4038c.

Member

@IAlibay IAlibay left a comment

Sorry, I only had time for a quick glance, so these are mostly superficial comments.

I'll probably wait until you've added the extra tests before I re-review.

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
import MDAnalysis as mda
import numpy as np
from . import base, core
from .base import Timestep
Member

I don't think you're using Timestep anywhere, right? (It's all just inherited from ReaderBase and WriterBase.)

Contributor Author

@richardjgowers added that in #3132 but I think it can be removed, unless there was a specific reason for it

Contributor Author

@edisj edisj Jun 11, 2021

Wow I didn't realize I had to hit "submit review" for my comments to show up, I thought I had already replied to your review comments sorry @IAlibay :(

Member

If it's not used explicitly, remove it.

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (resolved)
self.h5md_file = None

# check which datasets are to be written
self.has_positions = kwargs.get('positions', False)
Member

I know it's something we do for the other writers (although I'm trying to change this for the NetCDF writer), so I'm kinda asking to get @orbeckst's opinion on this.

There's a lot of kwargs here, and it's not immediately clear to me that they have user-facing documentation. Would it be worth not making these kwargs but instead named optional arguments?

Member

Yes, that's a good idea. We still need the catch-all **kwargs for anything else that we do not care about but having explicit kwarg names in the signature (+ defaults) is good.
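The suggestion above could look roughly like the following sketch. `SketchWriter` is a hypothetical stand-in, not the real H5MDWriter; the `velocities` and `forces` flags and all defaults are assumptions made for illustration (only `positions` appears in the excerpt under review).

```python
class SketchWriter:
    """Hypothetical stand-in for a trajectory writer; not the real H5MDWriter."""

    def __init__(self, filename, n_atoms, positions=False, velocities=False,
                 forces=False, **kwargs):
        # Explicit keyword arguments show up in the signature and in help(),
        # so their names and defaults are documented without reading the body.
        self.filename = filename
        self.n_atoms = n_atoms
        self.has_positions = positions
        self.has_velocities = velocities
        self.has_forces = forces
        # Catch-all for options this writer does not care about.
        self.extra_options = kwargs


writer = SketchWriter("out.h5md", n_atoms=10, positions=True, unrelated=1)
print(writer.has_positions, writer.extra_options)  # True {'unrelated': 1}
```

Compared with `kwargs.get('positions', False)`, a misspelled option name no longer sets the flag silently to its default inside the body; it simply lands in the catch-all dictionary, where it can be inspected or warned about.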

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
Member

@IAlibay IAlibay left a comment

Having a deeper look but here are some initial comments from a quick scan.

Member

@IAlibay IAlibay left a comment

--> review through to line 1240 of H5MD.py

Sorry I'm going to have to break up my reviews into chunks. Here is the first part of my review, I'll follow up with the rest probably tomorrow at this rate.

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (resolved)
Member

@IAlibay IAlibay left a comment

Second half of review.

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (resolved)
testsuite/MDAnalysisTests/coordinates/test_h5md.py (outdated, resolved)
Member

@orbeckst orbeckst left a comment

Minor things & mostly agreeing with @IAlibay (except that I think i can stay ;-) ).

You'll have to ask @IAlibay how fast you need to be in order to try and slip this into 2.0.0.

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (resolved)
package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
package/MDAnalysis/coordinates/H5MD.py (resolved)
@IAlibay
Member

IAlibay commented Aug 20, 2021

I can try to re-review a bit later today, although if you think all my comments were addressed please do go ahead with the merge @orbeckst

@orbeckst
Member

Your point about "check exception messages" was a good one — I am currently watching @edisj find out which exceptions are actually triggered .... so my comments had been addressed, yours should be done in the next hour.

@edisj
Contributor Author

edisj commented Aug 20, 2021

Hi @IAlibay , I just pushed my final changes, all comments should be addressed. All error tests now have a match. I hope there's still time to make it into 2.0.0...
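For readers unfamiliar with what "a match" means for an error test: the test checks the exception message as well as the exception type. In a pytest suite this is done with `pytest.raises(..., match=...)`. The sketch below shows the same idea using only the standard library's `assertRaisesRegex`; the function `open_for_writing` and its error message are invented for illustration and are not taken from the PR.

```python
import unittest


def open_for_writing(n_atoms):
    """Hypothetical function that rejects an empty system."""
    if n_atoms == 0:
        raise ValueError("no atoms in output trajectory")
    return n_atoms


class TestErrorMessages(unittest.TestCase):
    def test_zero_atoms_message(self):
        # Passes only if a ValueError is raised AND its message matches the
        # pattern, so an unrelated ValueError cannot satisfy the test.
        with self.assertRaisesRegex(ValueError, "no atoms in output"):
            open_for_writing(0)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestErrorMessages)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```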

If you have time to review, there are a couple of decisions made while talking to @orbeckst that might need a little justification:

  • Removed the has_* setters from the writer because we realized that in order to actually write a test to activate them, we had to write code that wouldn't ever be used for an H5MD trajectory, so the setter itself was pointless
  • Passed defaults (compression, compression_opts, driver) from the input H5MD file into H5MDReader.Writer(), but intentionally left out chunks. This is because H5MDWriter chooses (1, n_atoms, 3) by default, and accidentally choosing a bad chunk shape can be really bad for performance. I added a detailed note in the docstring of H5MDReader.Writer() to explain the situation and to say how to set the chunk shape manually if you want to.
    EDIT: more reasoning: if the original file was written with a poor chunk layout (which, it turns out, can easily happen due to h5py's auto-chunking), then H5MDReader.Writer() will at the very least not propagate a bad chunk shape, which is good if the user isn't aware of chunking. For advanced users who know how to use chunking, the option to set it manually is still there (and documented).
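To make the performance concern concrete, here is a back-of-the-envelope sketch of how much data must be fetched to read a single frame under two chunk shapes. It assumes that whole chunks are fetched, ignores the HDF5 chunk cache and compression, and assumes 4-byte floats. The "poor" chunk shape is invented for illustration and is not what h5py's auto-chunking would necessarily pick.

```python
from math import ceil, prod


def bytes_read_for_one_frame(shape, chunks, itemsize=4):
    """Bytes fetched to read one full frame, assuming whole chunks are read.

    Every chunk that overlaps the frame is counted in full, including the
    other frames stored in that chunk that were not asked for.
    """
    n_frames, n_atoms, n_dims = shape
    c_frames, c_atoms, c_dims = chunks
    chunks_touched = ceil(n_atoms / c_atoms) * ceil(n_dims / c_dims)
    return chunks_touched * prod(chunks) * itemsize


shape = (1000, 50_000, 3)  # (n_frames, n_atoms, 3)
per_frame = bytes_read_for_one_frame(shape, (1, 50_000, 3))
poor = bytes_read_for_one_frame(shape, (125, 3125, 1))

print(per_frame)          # 600000
print(poor)               # 75000000
print(poor // per_frame)  # 125
```

With a (1, n_atoms, 3) chunk, one frame is exactly one chunk. With the invented layout, each chunk spans 125 frames, so reading one frame pulls in 125 times as much data under these assumptions.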

Thanks again for your review :D, if there's anything else that remains let me know

Member

@IAlibay IAlibay left a comment

Thanks @edisj I'm just having a quick look through now. Just doing this quick pre-review comment to include this duecredit thing - not sure if it needs testing or not ?

Member

@IAlibay IAlibay left a comment

Just the one comment from a quick scan though. I'll directly approve now since it's just a documentation style thing.

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
@IAlibay IAlibay mentioned this pull request Aug 21, 2021
2 tasks
@edisj
Contributor Author

edisj commented Aug 21, 2021

@IAlibay @orbeckst just merged with current develop for good measure, but I think the final points regarding the duecredit test and the backticks are resolved

Member

@richardjgowers richardjgowers left a comment

Looks good, just a couple doc quibbles that can slide

package/MDAnalysis/coordinates/H5MD.py (outdated, resolved)
W.write(u)

To write an H5MD file with contiguous datasets, you must specify the
number of frames to be written and set ``chunks=False``:
Member

Maybe a quick one line hint as to why someone would want to do this

Member

There's a note for it on line 948, although not sure if it fully covers what you're asking @richardjgowers ?

package/MDAnalysis/coordinates/base.py (resolved)
@IAlibay
Member

IAlibay commented Aug 21, 2021

Seems to be everything, I'll merge once CI returns green - thanks for all the work @edisj 🎉

@IAlibay IAlibay merged commit ddb1d95 into MDAnalysis:develop Aug 21, 2021
@orbeckst
Member

Congratulations @edisj — PR with >200 comments and what turned out to be one of the more complicated formats.

Given that this could be considered, together with the SciPy paper, the capstone of your NSF REU experience, would you want to write a blog post for https://www.mdanalysis.org/blog/ where you summarize your experience and achievements?

@edisj edisj deleted the issue2866-h5mdwriter branch April 7, 2024 07:28
Labels
enhancement NSF REU NSF Research Experience for Undergraduates project
Development

Successfully merging this pull request may close these issues.

Adding H5MDWriter to H5MD.py
6 participants