Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove save() methods from analysis classes #1745

Closed
13 of 15 tasks
orbeckst opened this issue Dec 18, 2017 · 18 comments
Closed
13 of 15 tasks

remove save() methods from analysis classes #1745

orbeckst opened this issue Dec 18, 2017 · 18 comments
Labels
API Component-Analysis deprecation Deprecated functionality to give advance warning for API changes. persistence
Milestone

Comments

@orbeckst
Copy link
Member

orbeckst commented Dec 18, 2017

Many classes in MDAnalysis.analysis have a save() method as a convenience to persist calculated data such as a timeseries to disk. However, storing data structures to disk in a portable manner is difficult and typical approaches such as using Python pickle are not portable (not even between Python 2 and Python 3). Instead of having to worry about the details of storing data, we should just let the user decide how to store data (e.g., convert to pandas dataframes and let pandas handle all this, or write as a custom text format).

(This issue came up in discussion on PR #1744 (for issue #1743).)

Proposal

Remove save() methods from analysis classes if the method is only a convenience functions. If the saved files can be read in again by the class, e.g., for re-analysis, then a more detailed case-by-case evaluation should be performed.

EDIT: Generally accepted, I rewrote the list of cases below.

approach

  1. survey the list of save methods below: decide which ones should be removed and note here
  2. add a deprecation notice for 0.17.1 (once this is done, update the Milestone to 1.0 instead of closing)
    • docs (add a .. deprecated:: 0.17.1 saying that the save() functionality will be removed in 1.0 because users generally know best how they want to persist the data.
    • DeprecationWarning
  3. add docs that explain which attributes might be worthwhile to save to disk and show an example for how to to this (basically, adding the code from the save() method to the docs)
  4. remove in 1.0

cases

With git grep 'save\w*(self'): Add a checkmark for any method that should be removed.

To be removed

Check methods once they have been removed.

To stay

Check if they are covered by tests

  • MDAnalysis/analysis/encore/utils.py: def savez(self, fname):
  • MDAnalysis/analysis/legacy/x3dna.py: def save(self, filename="x3dna.pickle"):
  • MDAnalysis/analysis/psa.py: def save_paths(self, filename=None):
@orbeckst
Copy link
Member Author

@richardjgowers @kain88-de have we got a nice decorator for deprecating methods?

Can we add a code snippet in the comments that anyone can quickly apply to any method they come across?

@orbeckst
Copy link
Member Author

@sseyler can you please comment on the necessity to keep the save_paths() and save_result() methods in the PSA module?

Are they commonly used?

Are they used in the published tutorials and docs?

@orbeckst
Copy link
Member Author

We are not going to touch anything in legacy, i.e., we will keep x3dna.X3DNA.save().

@kain88-de
Copy link
Member

numpy has a deprecated decorator for functions we use in several places. That should be good enough here as well.

@kain88-de
Copy link
Member

@mtiberti can you comment about the need for savez in encore?

@sseyler
Copy link
Contributor

sseyler commented Dec 19, 2017

For psa.py, the save_result() method can probably be removed since that's just a convenience function for saving a final result (i.e., distance matrix). But this can be done manually easily. The save_paths() method might be worth saving since it's storing intermediate results that can take some time to re-compute (i.e., the paths that have been transformed/fitted prior to measuring path distances).

The paths are saved in their original or (optionally) npz format, so I wouldn't think compatibility is much of an issue—but what do I know? The save_paths() method is also used (by default) in the published tutorial (via the store keyword in generate_paths()).

@mtiberti
Copy link
Contributor

mtiberti commented Dec 20, 2017

The idea behind savez is to have an easy way to write to disk the matrix computed by get_distance_matrix, so that the user can load it from disk when they want to try runs of ces or dres. It writes, together with the array containing the data, some metadata which is used to do a consistency check on the file when it's loaded while initialising a TriangularMatrix with the loadfile option. It is not meant to be a user-friendly or output format (even though it can be opened using numpy as it's a standard numpy npz file), more a way to store a midpoint of the calculation for later reuse. We could always get rid of it, specify a format that the user needs to use when saving arrays for TriangularMatrix and expect the user to conform, however it sounds needlessly complicated and more error-prone. What do you think?

@orbeckst
Copy link
Member Author

orbeckst commented Dec 21, 2017 via email

@orbeckst
Copy link
Member Author

orbeckst commented Dec 21, 2017 via email

@sseyler
Copy link
Contributor

sseyler commented Jan 1, 2018

Can the files from save_paths() read into any of the functions in the PSA module?

Yes, in that save_paths() is responsible for generating the (fitted) DCDs that can be reused directly since they're numbered and can be dumped into a PSA object easily.

Do we have a test that covers it?

Not specifically.

@orbeckst
Copy link
Member Author

orbeckst commented Jan 3, 2018 via email

@richardjgowers richardjgowers modified the milestones: 0.17.0, 1.0 Jan 25, 2018
@orbeckst
Copy link
Member Author

@richardjgowers I think the deprecations need to go into 0.17.x.

@orbeckst orbeckst mentioned this issue Jul 6, 2018
12 tasks
@orbeckst orbeckst added the deprecation Deprecated functionality to give advance warning for API changes. label Aug 12, 2018
@orbeckst orbeckst mentioned this issue Oct 19, 2018
4 tasks
@IAlibay
Copy link
Member

IAlibay commented Jan 29, 2020

@orbeckst unless someone is already working on this one, I'm going to give removing the save() methods a go so that a PR is ready in case we decide to go for v1.0 as per #2443

@IAlibay IAlibay mentioned this issue Jan 29, 2020
6 tasks
@orbeckst
Copy link
Member Author

That would be super-helpful! Thanks @IAlibay !

@xiki-tempula
Copy link
Contributor

The save method in water bridge analysis has already been removed.

@IAlibay IAlibay mentioned this issue Feb 2, 2020
5 tasks
orbeckst pushed a commit that referenced this issue Feb 6, 2020
* Partly addresses #1745 (completes save() removal with PR #2487 and PR #2486 )
* removes save methods
  - removes save method from psa.py
  - also minor change to test_hbonds.py
* deprecates hbond_analysis
  - close #2492
* updates changelog
@orbeckst
Copy link
Member Author

orbeckst commented Mar 5, 2020

@IAlibay when you have a moment, could you check the save methods that were removed in the merged PRs? (If you cannot modify my issue i.e., you cannot click the check marks then let me know, perhaps copy and paste the list into a comment where you can modify). I want to know what needs to be done to close this issue.

@IAlibay
Copy link
Member

IAlibay commented Mar 5, 2020

@orbeckst Unfortunately I don't have modify access (I think the settings are set so that this is a core-dev only thing).

Here is a copy of the list w/ changes:

To be removed

Check methods once they have been removed.

To stay

Check if they are covered by tests

  • MDAnalysis/analysis/encore/utils.py: def savez(self, fname):
    Should be covered by:

assert_equal(triangular_matrix[0,1], triangular_matrix[1,0],
err_msg="Data error in TriangularMatrix: matrix non symmetrical")
triangular_matrix.savez(filename)
triangular_matrix_2 = encore.utils.TriangularMatrix(size = size, loadfile = filename)
assert_equal(triangular_matrix_2[0,1], expected_value,
err_msg="Data error in TriangularMatrix: loaded matrix non symmetrical")

  • MDAnalysis/analysis/legacy/x3dna.py: def save(self, filename="x3dna.pickle"):
    As far as I am aware, there are no tests for x3dna at all. Can add them if we want though.

  • MDAnalysis/analysis/psa.py: def save_paths(self, filename=None):
    Should be covered by the code below, but it's set to xfail. I looked into it more and it looks like the issue is just that np.load isn't setting allow_pickle to True and there's some shenanigans going on with path_names. I'm working on a PR for this.

@pytest.mark.xfail
def test_load(self, psa):
"""Test that the automatically saved files can be loaded"""
expected_path_names = psa.path_names[:]
expected_paths = [p.copy() for p in psa.paths]
psa.save_paths()
psa.load()
assert psa.path_names == expected_path_names
# manually compare paths because
# assert_almost_equal(psa.paths, expected_paths, decimal=6)
# raises a ValueError in the assertion code itself
assert len(psa.paths) == len(expected_paths)
for ipath, (observed, expected) in enumerate(zip(psa.paths, expected_paths)):
assert_almost_equal(observed, expected, decimal=6,
err_msg="loaded path {} does not agree with input".format(ipath))

@orbeckst
Copy link
Member Author

orbeckst commented Mar 5, 2020

Many thanks @IAlibay , I copied the bare information to the top of the issue. Because everything was removed that needed removing I will close the issue. Yay! Thank you so much for doing most of the work!

For anything related, please open new issues.

Specifically

  • x3dna: we cannot test it because it uses an external tool with a non-free licence... it's legacy code and I think it's omitted from coverage
  • PSA: IIRC there was something with the psa save, comments in PR more PSA tests #2049, although PR Addresses failure of test_load() in PSA (#2049) #2239 is supposed to have fixed it (from my cursory reading). If you can get rid of the xfail then by all means do it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Component-Analysis deprecation Deprecated functionality to give advance warning for API changes. persistence
Projects
None yet
Development

No branches or pull requests

7 participants