read APBS .dx.gz generated grids #70
Travis shows a 'pending' state on GitHub, but it has already finished.
This could be used like this:
from gridData import Grid
grid = Grid("data.dx.gz", file_format='DXGZ')
Is this something that you would merge into master? Thanks |
Hi @eloyfelix this looks like a nice addition. Definitely good for merging. Can you add a test for this and add yourself to the authors? I wonder if we should just allow this for all files. If it ends in |
Codecov Report
@@            Coverage Diff            @@
##           master      #70     +/-   ##
=========================================
+ Coverage   87.03%   87.34%   +0.3%
=========================================
  Files           5        5
  Lines         779      798     +19
  Branches      113      118      +5
=========================================
+ Hits          678      697     +19
  Misses         60       60
  Partials       41       41
Codecov Report
@@             Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
+ Coverage   87.03%   87.31%   +0.27%
==========================================
  Files           5        5
  Lines         779      796      +17
  Branches      113      115       +2
==========================================
+ Hits          678      695      +17
  Misses         60       60
  Partials       41       41
Added capability to write .dx.gz files and tests |
The functionality is very useful – not sure why I didn't put it in the first place because I gzip my dx files, too. But I don't like additional kwargs to indicate compression. I agree with @kain88-de that this should be universal and should just be based on the extension.
We have functionality in MDAnalysis |
@eloyfelix you can either wait and see if we can just take the |
Relicensing of
|
Many thanks for your contribution. As already mentioned in the comments, we would like this to work by filename extension instead of explicit kwargs.
- Please change your code to use filename extension for deciding how to read/write the file.
- Simplify the tests by running the same tests for compressed and uncompressed files: use
pytest.mark.parametrize('filename', [datafiles.DX, datafiles.DXGZ])
and
pytest.mark.parametrize('outfile', ["grid.dx", "grid.dx.gz"])
- Add an entry to CHANGELOG
Many thanks!
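For illustration, choosing the reader/writer from the filename extension could look roughly like the sketch below. The helper name `open_by_extension` and the `roundtrip` function are hypothetical and are not part of gridData; the sketch assumes Python 3, where `gzip.open` accepts text modes.

```python
import gzip
import os
import tempfile


def open_by_extension(filename, mode="r"):
    """Hypothetical helper: open `filename` as text, via gzip if it ends in '.gz'."""
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    ext = os.path.splitext(filename)[1]
    if ext == ".gz":
        # gzip.open defaults to binary, so request text mode explicitly
        return gzip.open(filename, mode + "t")
    return open(filename, mode)


def roundtrip(filename, text):
    """Write `text` to `filename` and read it back through the same helper."""
    with open_by_extension(filename, "w") as out:
        out.write(text)
    with open_by_extension(filename, "r") as inp:
        return inp.read()


# The same call works for both names; no extra keyword argument is needed.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("grid.dx", "grid.dx.gz"):
        path = os.path.join(tmp, name)
        assert roundtrip(path, "object 1 class gridpositions\n") == "object 1 class gridpositions\n"
```

With a helper of this kind the caller never states the compression; the `.gz` suffix alone decides it, which is the behaviour requested in the review.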
Many thanks for your suggestions @orbeckst. I'll take a look at it asap and try to make the changes to better fit to your existing codebase. |
this should be, at least, a bit closer :P |
Thanks for the changes. Some additional things:
- move the logic for writing compressed files into the field.write() method
  GridDataFormats/gridData/OpenDX.py, line 462 in 818c8da:
  def write(self, filename):
  DXclass.write()
  GridDataFormats/gridData/OpenDX.py, line 180 in 818c8da:
  def write(self,file,optstring="",quote=False):
- other minor issues: see comments
for l in in_file:
    out_file.write(l)
os.remove(filename)
if ext == '.gz':
The compressed writing should be handled in dx.write()
instead of working on the uncompressed file. I would generally try to avoid any intermediate file writing.
Sorry, I didn't notice on the first round of reviews.
I moved the logic to dx.write, but given how it was originally implemented it was a bit tricky. I needed to add a few isinstance checks to decide whether to write a string or bytes depending on the stream.
Also needed to implement a simple version of your 'openany' to deal with it:
https://github.com/eloyfelix/GridDataFormats/blob/DXGZ_input_format/gridData/OpenDX.py#L486
I see. I don't particularly like having to repeat the same code everywhere. But that's because of the poor structure of the original code (i.e., my fault ;-) where writing is actually done in each element of the DXfield. Can you think of a better code structure where we only need to do the encoding etc once? E.g., instead of having each element write, return a text representation that is then concatenated at the top level.
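A rough sketch of that alternative structure, under stated assumptions: the classes `GridPositions` and `DataArray` and the functions `render` and `write_field` are hypothetical stand-ins, not the actual OpenDX.py classes, and the text they produce is only a fragment rather than a complete DX file. Each component returns its text, and encoding happens once at the top level.

```python
import gzip
import os
import tempfile


class GridPositions:
    """Hypothetical component: renders itself as text instead of writing."""

    def __init__(self, shape):
        self.shape = shape

    def to_text(self):
        counts = " ".join(str(n) for n in self.shape)
        return "object 1 class gridpositions counts {}\n".format(counts)


class DataArray:
    """Hypothetical component holding the grid values."""

    def __init__(self, values):
        self.values = values

    def to_text(self):
        return "".join("{}\n".format(v) for v in self.values)


def render(components):
    """Concatenate the text of all components at the top level."""
    return "".join(c.to_text() for c in components)


def write_field(components, filename):
    """Write the rendered text once, encoding only if the target is gzipped."""
    text = render(components)
    if filename.endswith(".gz"):
        with gzip.open(filename, "wb") as out:
            out.write(text.encode("utf-8"))
    else:
        with open(filename, "w") as out:
            out.write(text)


components = [GridPositions((2, 2, 2)), DataArray([0.0, 1.5])]
with tempfile.TemporaryDirectory() as tmp:
    write_field(components, os.path.join(tmp, "grid.dx"))
    write_field(components, os.path.join(tmp, "grid.dx.gz"))
```

The trade-off is that the whole file is held in memory as one string before writing, which may matter for very large grids.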
added write_line function: https://github.com/eloyfelix/GridDataFormats/blob/DXGZ_input_format/gridData/OpenDX.py#L193
It's probably not optimal but it's the best that I could do without doing major codebase changes (which I'm not sure I'm ready to do)
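For readers who do not follow the link, a helper of this kind might look roughly like the sketch below. This is not the code from the PR; the signature and the `io.TextIOBase` check are assumptions chosen for illustration. The point is only that the str/bytes decision lives in one place.

```python
import gzip
import io


def write_line(stream, line="", encoding="utf-8"):
    """Hypothetical helper: write one line to either a text or a binary stream."""
    text = line + "\n"
    if isinstance(stream, io.TextIOBase):
        # Text streams (e.g. open(name, "w"), io.StringIO) take str directly
        stream.write(text)
    else:
        # Binary streams (e.g. gzip.GzipFile, io.BytesIO) need bytes
        stream.write(text.encode(encoding))


# Usage: the caller no longer needs to know which kind of stream it has.
text_buffer = io.StringIO()
write_line(text_buffer, "object 1 class gridpositions counts 2 2 2")

raw = io.BytesIO()
with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
    write_line(gz, "object 1 class gridpositions counts 2 2 2")
```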
Your approach is fine, I just added a few minor change requests to clean up the code. Importantly, this is really cleaning up my mess – I would be very grateful if you could do it as part of this PR!
gridData/tests/test_dx.py
def test_read_dxgz():
    g = Grid(datafiles.DXGZ, file_format='DXGZ')
@pytest.mark.parametrize("infile", [datafiles.DX, datafiles.DX+'.gz'])
Be explicit and use datafiles.DXGZ
I think you also need to
- add a test.dx.gz to test files
- add DXGZ to datafiles/__init__.py
I don't know if one can make a fixture work in which you generate the gzipped file on the fly. I can think of the first part, but not how to use the fixture together with datafiles.DX in a parameterized test or fixture. But in any case, here's the code I can think of (uses tmpdir_factory because the temporary file should be created in a temporary location but the tmpfile fixture is not module level):
import gzip
import os

import pytest

@pytest.fixture(scope="module")
def dxgz_file(tmpdir_factory, src=datafiles.DX):
    # Compress the reference DX file into a module-scoped temporary directory
    basename = os.path.basename(src) + '.gz'
    fn = tmpdir_factory.mktemp("compressed").join(basename)
    with open(src, 'rb') as inp:
        with gzip.open(str(fn), "wb") as out:
            out.write(inp.read())
    return str(fn)
# use dxgz_file fixture when needed ... although I am actually not sure how
# you use a fixture as part of a parameterized fixture.
I added the test.dx.gz file and DXGZ to datafiles/__init__.py, but I'm not sure about the fixture...
I'm okay with the changes. |
hope we are getting closer with the latest changes |
LGTM. @orbeckst I leave you to merge it @eloyfelix Nice work. Thanks |
This is good in principle, but it would be really nice if you could make some things clearer and cleaner. (The latter is just cleaning up my old crappy code – I would be very grateful if you could do this as part of polishing the PR.) — see comments.
You also need to make sure that your test file is bundled in the Python package, so you need to add a pattern such as "tests/datafiles/*.gz" to setup.py near lines 41 to 42 in 818c8da:
package_data={'gridData': ['tests/datafiles/*.dx', 'tests/datafiles/*.ccp4',
                           'tests/datafiles/*.plt']},
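A small sketch of why the extra pattern is needed, assuming the new pattern is simply appended to the existing list. The `is_bundled` helper is hypothetical, and `fnmatch` is used here only as an approximation of the glob matching that setuptools applies to `package_data`.

```python
from fnmatch import fnmatch

# package_data as it might look after the change; the last entry is the
# assumed addition, the first three are the patterns quoted above.
package_data = {
    "gridData": [
        "tests/datafiles/*.dx",
        "tests/datafiles/*.ccp4",
        "tests/datafiles/*.plt",
        "tests/datafiles/*.gz",
    ]
}


def is_bundled(path, patterns):
    """Hypothetical check: does any pattern match the given relative path?"""
    return any(fnmatch(path, pattern) for pattern in patterns)


old_patterns = package_data["gridData"][:3]
new_patterns = package_data["gridData"]

# '*.dx' must match the whole name, so 'test.dx.gz' is missed without '*.gz'.
missed_before = not is_bundled("tests/datafiles/test.dx.gz", old_patterns)
found_after = is_bundled("tests/datafiles/test.dx.gz", new_patterns)
```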
Thanks for the review, I believe I did all the proposed changes. The whole thing looks definitely cleaner now. |
Thank you for all the work and extra work addressing the review comments. This is a nice addition! |
Always happy to contribute to nice libraries :)
Thanks for reviewing and merging.
On Thu, 7 Nov 2019, 16:53, Oliver Beckstein <[email protected]> wrote:
… Thank you for all the work and extra work addressing the review comments.
This is a nice addition!
APBS can write gz compressed dx files: https://apbs-pdb2pqr.readthedocs.io/en/latest/apbs/input/elec/write.html
It's quite convenient when dealing with very big grids as it needs less disk space/RAM.
This PR adds DXGZ input format functionality.