read APBS .dx.gz generated grids #70

eloyfelix · 2019-10-30T12:41:27Z

APBS can write gz compressed dx files: https://apbs-pdb2pqr.readthedocs.io/en/latest/apbs/input/elec/write.html
It's quite convenient when dealing with very big grids as it needs less disk space/RAM.

This PR adds DXGZ input format functionality.

eloyfelix · 2019-11-01T11:19:33Z

Travis shows a 'pending' state on GitHub, but the build has already finished.

This could be used like this:

from gridData import Grid

grid = Grid("data.dx.gz", file_format='DXGZ')

Is this something that you would merge into master?

Thanks

data.dx.gz

kain88-de · 2019-11-03T21:19:13Z

Hi @eloyfelix this looks like a nice addition. Definitely good for merging. Can you add a test for this and add yourself to the authors?

I wonder if we should just allow this for all files. If it ends in .gz we just assume it is zipped.

codecov · 2019-11-05T17:56:58Z

Codecov Report

Merging #70 into master will increase coverage by 0.3%.
The diff coverage is 100%.

@@            Coverage Diff            @@
##           master      #70     +/-   ##
=========================================
+ Coverage   87.03%   87.34%   +0.3%     
=========================================
  Files           5        5             
  Lines         779      798     +19     
  Branches      113      118      +5     
=========================================
+ Hits          678      697     +19     
  Misses         60       60             
  Partials       41       41

Impacted Files	Coverage Δ
gridData/OpenDX.py	`82.88% <100%> (+0.66%)`	⬆️
gridData/core.py	`94.16% <100%> (+0.11%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 818c8da...5dbd3e2.

codecov · 2019-11-05T18:02:11Z

Codecov Report

Merging #70 into master will increase coverage by 0.27%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master      #70      +/-   ##
==========================================
+ Coverage   87.03%   87.31%   +0.27%     
==========================================
  Files           5        5              
  Lines         779      796      +17     
  Branches      113      115       +2     
==========================================
+ Hits          678      695      +17     
  Misses         60       60              
  Partials       41       41

Impacted Files	Coverage Δ
gridData/OpenDX.py	`82.41% <100%> (+0.19%)`	⬆️
gridData/core.py	`94.33% <100%> (+0.29%)`	⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 818c8da...40abc1b.

eloyfelix · 2019-11-05T18:03:43Z

Added capability to write .dx.gz files and tests

orbeckst · 2019-11-05T22:33:59Z

The functionality is very useful – not sure why I didn't put it in in the first place, because I gzip my dx files, too.

But I don't like additional kwargs to indicate compression. I agree with @kain88-de that this should be universal and should just be based on the extension.

> I wonder if we should just allow this for all files. If it ends in .gz we just assume it is zipped.

MDAnalysis has a function, openany() (https://github.com/MDAnalysis/mdanalysis/blob/7f82b88f769f3882e44b5121463c93b899ef5b55/package/MDAnalysis/lib/util.py#L265-L454), that does this seamlessly. The problem is that MDA is GPL2 and GridDataFormats is LGPL. My understanding is that we cannot copy code from GPL to LGPL. Instead, I have to find out who wrote the code and get everyone who contributed to agree to relicense; then we can take it.

orbeckst · 2019-11-05T22:35:38Z

@eloyfelix you can either wait and see if we can just take the openany() code, which will just solve the problem, or if you are keen to move forward, change your code so that it looks at the filename extension to decide if it should uncompress or compress. In this way, the API will stay the same even if we add openany() later.
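A minimal sketch of the extension-based approach discussed above, using only the standard library. The helper names `open_by_extension` and `roundtrip` are hypothetical; this is neither MDAnalysis's `openany()` nor the code that was merged in this PR, just an illustration of deciding on compression from the filename alone.

```python
import gzip
import os


def open_by_extension(filename, mode="r"):
    """Open ``filename`` in text mode, transparently handling gzip.

    Hypothetical helper for illustration only: a file whose name ends
    in ``.gz`` is assumed to be gzip-compressed; anything else is
    opened as a plain text file.
    """
    if mode not in ("r", "w"):
        raise ValueError("mode must be 'r' or 'w'")
    _, ext = os.path.splitext(filename)
    if ext == ".gz":
        # "rt"/"wt" make gzip.open() yield str instead of bytes
        return gzip.open(filename, mode + "t")
    return open(filename, mode)


def roundtrip(filename, text):
    """Write ``text`` to ``filename`` and read it back."""
    with open_by_extension(filename, "w") as out:
        out.write(text)
    with open_by_extension(filename) as inp:
        return inp.read()
```

Because the decision depends only on the filename, callers would not need an extra keyword argument, which is the API property asked for in this thread.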

orbeckst · 2019-11-05T22:39:47Z

Relicensing of `openany()` under LGPL

From blame: lib/util.py L265-L454: Do you agree to relicense openany() (see link for exact lines of code) under LGPL? Please reply below (you can also tick the box but there must be a reply with your handle in this thread):

Thanks!

orbeckst

Many thanks for your contribution. As already mentioned in the comments, we would like this to work by filename extension instead of explicit kwargs.

- Please change your code to use the filename extension for deciding how to read/write the file.
- Simplify the tests by running the same tests for compressed and uncompressed files: use pytest.mark.parametrize('filename', [datafiles.DX, datafiles.DXGZ]) and pytest.mark.parametrize('outfile', ["grid.dx", "grid.dx.gz"]).
- Add an entry to CHANGELOG.

Many thanks!

gridData/OpenDX.py

gridData/core.py

gridData/tests/datafiles/__init__.py

gridData/tests/test_dx.py

eloyfelix · 2019-11-05T23:11:41Z

Many thanks for your suggestions @orbeckst. I'll take a look at it asap and try to make the changes so that they better fit your existing codebase.

eloyfelix · 2019-11-06T00:01:16Z

this should be, at least, a bit closer :P

orbeckst

Thanks for the changes. Some additional things:

- Move the logic for writing compressed files into the field.write() method (GridDataFormats/gridData/OpenDX.py, line 462 in 818c8da: def write(self, filename):), or rather into DXclass.write() (GridDataFormats/gridData/OpenDX.py, line 180 in 818c8da: def write(self,file,optstring="",quote=False):).
- Other minor issues: see comments.

orbeckst · 2019-11-06T00:06:25Z

gridData/core.py

-                for l in in_file:
-                    out_file.write(l)
-        os.remove(filename)
+        if ext == '.gz':


The compressed writing should be handled in dx.write() instead of working on the uncompressed file. I would generally try to avoid any intermediate file writing.

Sorry, I didn't notice on the first round of reviews.

eloyfelix · 2019-11-06

I moved the logic to dx.write, but given how it was originally implemented it was a bit tricky. I needed to add a few isinstance checks to decide whether to write a string or bytes, depending on the stream.

I also needed to implement a simple version of your 'openany' to deal with it:

https://github.com/eloyfelix/GridDataFormats/blob/DXGZ_input_format/gridData/OpenDX.py#L486

orbeckst · 2019-11-06

I see. I don't particularly like having to repeat the same code everywhere. But that's because of the poor structure of the original code (i.e., my fault ;-), where writing is actually done in each element of the DXfield. Can you think of a better code structure where we only need to do the encoding etc. once? E.g., instead of having each element write, return a text representation that is then concatenated at the top level.

eloyfelix · 2019-11-06

Added a write_line function: https://github.com/eloyfelix/GridDataFormats/blob/DXGZ_input_format/gridData/OpenDX.py#L193

It's probably not optimal, but it's the best that I could do without making major codebase changes (which I'm not sure I'm ready to do).

orbeckst · 2019-11-07

Your approach is fine; I just added a few minor change requests to clean up the code. Importantly, this is really cleaning up my mess – I would be very grateful if you could do it as part of this PR!
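The restructuring floated in this thread (each element returns a text representation, and the top level concatenates and encodes once) could look roughly like the sketch below. The `Component` class, its `to_text()` method, and `write_field()` are hypothetical stand-ins rather than the actual classes in gridData/OpenDX.py, and the output format is simplified rather than real OpenDX syntax.

```python
import gzip


class Component:
    """Stand-in for one element of a DX field (hypothetical)."""

    def __init__(self, ident, classname, body):
        self.ident = ident
        self.classname = classname
        self.body = body

    def to_text(self):
        # Return the text representation instead of writing to a stream,
        # so no element has to know whether the stream wants str or bytes.
        return "object {0} class {1}\n{2}\n".format(
            self.ident, self.classname, self.body)


def write_field(filename, components):
    """Concatenate all component texts, then encode and write once."""
    text = "".join(c.to_text() for c in components)
    data = text.encode("utf-8")
    # Pick the opener from the filename extension; both accept "wb".
    opener = gzip.open if filename.endswith(".gz") else open
    with opener(filename, "wb") as out:
        out.write(data)
```

With this layout the str-versus-bytes decision is made in exactly one place, so the per-element isinstance checks mentioned above would no longer be needed. The trade-off is that the whole text is held in memory before writing.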

orbeckst · 2019-11-06T00:08:13Z

gridData/tests/test_dx.py

-
-def test_read_dxgz():
-    g = Grid(datafiles.DXGZ, file_format='DXGZ')
+@pytest.mark.parametrize("infile", [datafiles.DX, datafiles.DX+'.gz'])


Be explicit and use datafiles.DXGZ

I think you also need to:

- add a test.dx.gz to the test files
- add DXGZ to datafiles/__init__.py

I don't know if one can make a fixture work in which you generate the gzipped file on the fly. I can think of the first part, but not how to use the fixture together with datafiles.DX in a parameterized test or fixture. In any case, here's the code I can think of (it uses tmpdir_factory because the temporary file should be created in a temporary location, but the tmpdir fixture is not module-level):

    @pytest.fixture(scope="module")
    def dxgz_file(tmpdir_factory, src=datafiles.DX):
        basename = os.path.basename(src) + '.gz'
        fn = tmpdir_factory.mktemp("compressed").join(basename)
        with open(src, 'rb') as inp:
            with gzip.open(str(fn), "wb") as out:
                out.write(inp.read())
        return str(fn)

    # use dxgz_file fixture when needed ... although I am actually not sure how
    # you use a fixture as part of a parameterized fixture.

eloyfelix · 2019-11-06

I added the test.dx.gz file and DXGZ to datafiles/__init__.py, but I'm not sure about the fixture...

CHANGELOG

gridData/tests/test_dx.py

utkbansal · 2019-11-06T07:35:34Z

I'm okay with the changes.

eloyfelix · 2019-11-06T13:53:02Z

hope we are getting closer with the latest changes

gridData/OpenDX.py

kain88-de · 2019-11-06T20:57:17Z

LGTM. @orbeckst I leave you to merge it

@eloyfelix Nice work. Thanks

orbeckst

This is good in principle, but it would be really nice if you could make some things clearer and cleaner. (The latter is just cleaning up my old crappy code – I would be very grateful if you could do this as part of polishing the PR.) — see comments.

You also need to make sure that your test file is bundled in the python package, so you need to add a pattern to setup.py near

GridDataFormats/setup.py, lines 41 to 42 in 818c8da:

    package_data={'gridData': ['tests/datafiles/*.dx', 'tests/datafiles/*.ccp4',
                               'tests/datafiles/*.plt']},

e.g., something like "tests/datafiles/*.gz".
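As a rough illustration of that suggestion, the sketch below extends the pattern list with a `*.gz` entry and checks which file names it would cover. The added pattern and the `is_bundled` helper are assumptions made for illustration; `fnmatch` only approximates the glob matching that setuptools performs, and the pattern finally used in setup.py may differ.

```python
from fnmatch import fnmatch

# Existing patterns from setup.py plus one additional pattern for the
# gzipped test file; the exact pattern is an assumption.
package_data = {
    "gridData": [
        "tests/datafiles/*.dx",
        "tests/datafiles/*.ccp4",
        "tests/datafiles/*.plt",
        "tests/datafiles/*.gz",
    ]
}


def is_bundled(path, patterns=package_data["gridData"]):
    """Return True if ``path`` matches at least one glob pattern."""
    return any(fnmatch(path, pattern) for pattern in patterns)
```

The point of the check is that a name such as tests/datafiles/test.dx.gz ends in .gz, so the existing `*.dx` pattern does not cover it.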

gridData/OpenDX.py

gridData/core.py

orbeckst · 2019-11-07T01:14:16Z

gridData/core.py

-                for l in in_file:
-                    out_file.write(l)
-        os.remove(filename)
+        if ext == '.gz':


Your approach is fine, I just added a few minor change requests to clean up the code. Importantly, this is really cleaning up my mess – I would be very grateful if you could do it as part of this PR!

gridData/tests/test_dx.py

eloyfelix · 2019-11-07T11:21:09Z

Thanks for the review, I believe I did all the proposed changes. The whole thing looks definitely cleaner now.
Hope to see a release soon! :P

orbeckst · 2019-11-07T16:53:01Z

Thank you for all the work and extra work addressing the review comments. This is a nice addition!

eloyfelix · 2019-11-07T17:03:03Z

Always happy to contribute to nice libraries :) Thanks for reviewing and merging.

read APBS .dx.gz generated grids

a3919c7

add DXGZ export capability and tests

40abc1b

orbeckst requested changes Nov 5, 2019

code review fixes

2e9b0b2

eloyfelix added 2 commits November 6, 2019 00:05

update CHANGELOG

e838aac

update CHANGELOG

5ab7f2e

eloyfelix requested a review from orbeckst November 6, 2019 00:06

keep DXGZ datafile...

1989629

orbeckst requested changes Nov 6, 2019

fixed CHANGELOG, compressed writing moved to dx.write(), added DXGZ datafile

cbdbc36

orbeckst reviewed Nov 6, 2019

orbeckst reviewed Nov 6, 2019

add write_line function

fbfd0b8

kain88-de approved these changes Nov 6, 2019

orbeckst requested changes Nov 7, 2019

orbeckst self-assigned this Nov 7, 2019

some cleanup

32b6e24

some cleanup

0f4b6c7

some cleanup

5dbd3e2

orbeckst approved these changes Nov 7, 2019

orbeckst merged commit 37e6d0c into MDAnalysis:master Nov 7, 2019
