Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameters not saved in Analysis classes #2206

Closed
PicoCentauri opened this issue Feb 17, 2019 · 21 comments
Closed

Parameters not saved in Analysis classes #2206

PicoCentauri opened this issue Feb 17, 2019 · 21 comments
Labels
close? Evaluate if issue/PR is stale and can be closed. Component-Analysis deprecation Deprecated functionality to give advance warning for API changes. Difficulty-easy maintainability usability

Comments

@PicoCentauri
Copy link
Contributor

PicoCentauri commented Feb 17, 2019

In all analysis classes the parameters start, stop, step and verbose are not saved if they are changed after the object was run once.

After looking into the code I found that for the start, stop and step parameters this is an issue due to the fix of the old deprecated syntax. For the verbose one just have to save this in the _verbose attribute of the object. All these things are easy to fix and I could provide a patch.

Code to reproduce the behavior

from MDAnalysis.analysis.lineardensity import LinearDensity
from MDAnalysis.tests.datafiles import PSF_TRICLINIC, DCD_TRICLINIC

u = mda.Universe(PSF_TRICLINIC, DCD_TRICLINIC)
ldens = LinearDensity(u.atoms, verbose=True)

ldens.run(start=0, stop=10)
print(ldens.start, ldens.stop)

ldens.run(start=0, stop=5, verbose=False)
print(ldens.start, ldens.stop)

Currently version of MDAnalysis

Python 3.6, mda 0.19.3, MacOS

@richardjgowers
Copy link
Member

Yeah we used to save them on __init__ and use this during run(). We could save the last used values in run() I guess.

@PicoCentauri
Copy link
Contributor Author

PicoCentauri commented Feb 17, 2019

Yes. I would just add a simple test in _setup_frames. See PR #2207.

@orbeckst
Copy link
Member

I don't think we should save them. That's a left over when we set them in init. When providing parameters to run I would expect that they only apply to this invocation.

@PicoCentauri
Copy link
Contributor Author

PicoCentauri commented Feb 18, 2019

On the one hand I could imagine that sometimes its useful to access this information i.e. at what frame the analysis started. On the other the user already put the information in so maybe its superfluous.

@zemanj
Copy link
Member

zemanj commented Feb 18, 2019

@orbeckst

I don't think we should save them. That's a left over when we set them in init. When providing parameters to run I would expect that they only apply to this invocation.

I agree. Since the start/stop/step parameters are not passed to the initializer, they shouldn't be reused. The run() method should always work as documented, including its kwarg defaults. Permanently storing and reusing the parameters would invalidate the documented signature.

@PicoCentauri

On the other the user already put the information in so maybe its superfluous.

IMHO it is.

@PicoCentauri
Copy link
Contributor Author

@zemanj I'm also fine with not storing them if this consistent with what is in the documentation. The total number of analysed frames is saved in any case. But this means to drop the old code completly, right?

@orbeckst
Copy link
Member

orbeckst commented Feb 18, 2019

Yes, any old code that stores this information should go away. The whole verbose thing is/was deprecated, I think. Not sure when it ought to get officially removed.

EDIT: 2020-02-05: verbose in constructor and run() conforms to the new API as required by #1463

@PicoCentauri
Copy link
Contributor Author

I see. I quickly checked the usage of verbose and it seems that it is not used anymore by the modules. We could also kick this out...

@richardjgowers
Copy link
Member

@PicoCentauri I think verbose is deprecated, it just needs finally removing (eventually). It's frustrating, but the idea is that old scripts (which use verbose) shouldn't break overnight because people have updated. I think we're planning on ripping out most of the tagged deprecations for the 1.0 release.

WRT start/stop/step, these parameters should be saved in the .results attribute, but it's not the base class' job to do this. Ie an individual implementation should be labelling frames when it's appropriate.

@orbeckst
Copy link
Member

WRT start/stop/step, these parameters should be saved in the .results attribute

Under which circumstances is it useful to safe start/stop/step? I don't think that there's an expectation that run() knows how to reproduce the last run?

As long as the result contains e.g. the times or the frame numbers for a time series calculation, this should be fine. Or am I missing something?

@richardjgowers
Copy link
Member

@orbeckst I run some analysis on frames 1-10, pickle the Analysis class object (containing the results), unpickle it a month later and don't know what frames it was ran on? Or more generally, it'd be nice if the object had full provenance of the results?

@orbeckst
Copy link
Member

I don't like pickling analysis class objects as a form of data storage – that's just abuse of the pickle format. I wouldn't want to encourage this approach.

For timeseries it is eas(ier) to see what data you used. For anything that averages or does statistics this is harder and their I can see a valid reason for storing a metadata dict with, e.g., number of frames analyzed etc. because you would need to report this in a paper. But then this can be up to the individual class to do it.

Thinking about it, you summarized it succinctly

WRT start/stop/step, these parameters should be saved in the .results attribute, but it's not the base class' job to do this. Ie an individual implementation should be labelling frames when it's appropriate.

@richardjgowers
Copy link
Member

Ok, then I think this issue is solved by recommending in the AnalysisBase docs that frame indices are stored for reproducibility.

@PicoCentauri
Copy link
Contributor Author

Okay I’m fine with storing the information in the results attribute. Unfortunately, only changing the docs this will not resolve the issue I think. We still have to change the code in such a way that only the current parameters of the .run method are used. I also don’t see an „easy“ and clear way for an individual class to get acces to the information for saving start/stop/step in the results section if its not stored in object itself...

@orbeckst
Copy link
Member

I think you ever only need to store anything if the class does not store the frames or times.

In my opinion, the default should be that start/stop/step are not stored. If a class has a special need to store this information then it has to store it .results (or wherever it wants to) and document it. It is then up to the user to store this information in a permanent way. We also deprecated and removed typical "save()" methods because the user knows best how to archive their data. That's not a decision that we can make – we tried and then ran into pickle hell etc.

@zemanj
Copy link
Member

zemanj commented Feb 28, 2019

@PicoCentauri

We still have to change the code in such a way that only the current parameters of the .run method are used. I also don’t see an „easy“ and clear way for an individual class to get acces to the information for saving start/stop/step in the results section if its not stored in object itself...

In pre- or postprocessing, that will only be possible with an awful introspection hack using inspect.getframeinfo and friends in order to access data of AnalysisBase.run() from the _prepare() or _conclude() methods of the derived class. In other words: I'd strongly discourage anyone from doing that.
However, it is easily possible to access frame information from within the derived class' _single_frame() method by examining self._frame_index or self._tson the go. Therefore I'm pretty sure we don't need to change existing code for that.

Anyway, I don't see the value of gathering frame information during executions of _single_frame() if it is not required by the particular analysis to have that data available for the computations it performs.
After all, frame indices are worth nothing if they're not stored along with trajectory/topology file paths and, in almost any case, many additional parameters. In the end, reproducibility can only be achieved by the user via storing all input and output data, simulation software, analysis scripts, the commands which were used to call the software/scripts, and so on. Storing start/stop/step in the results of an analysis class IMHO doesn't really contribute a great deal here.

So... any objections against closing this?

@zemanj
Copy link
Member

zemanj commented Feb 28, 2019

Maybe we do need to change something because @PicoCentauri's LinearDensity example is definitely confusing. My suggestion:

  • make self.start, self.stop, self.step, self.n_frames "private" by prepending an underscore
  • Until the 1.0 release, add @propertys start, stop, etc. returning their private counterparts (no setters, only getters!) and deprecate these properties. The respective DeprecationWarning should also inform the user that the returned values no longer correspond to the last used ones.
  • Adjust AnalysisBase.__init__() so that start/stop/step are stored in the respective private variables. Also make sure that the DeprecationWarning is raised even if any of the deprecated kwargs is None (currently missing).

@zemanj zemanj added usability maintainability Component-Analysis deprecation Deprecated functionality to give advance warning for API changes. Difficulty-easy labels Feb 28, 2019
@Purva-Chaudhari Purva-Chaudhari mentioned this issue Mar 17, 2020
4 tasks
@lilyminium lilyminium added the close? Evaluate if issue/PR is stale and can be closed. label May 20, 2020
@lilyminium
Copy link
Member

@IAlibay does #2505 fix this or does it require #2525 as well?

@lilyminium lilyminium mentioned this issue Jun 5, 2020
6 tasks
@IAlibay
Copy link
Member

IAlibay commented Jun 5, 2020

Ok, so an update on this. In part because we've removed setting start/stop/step from __init__ (I think some other PR altered the behaviour but I can't work out exactly which), the behaviour here has changed.

Currently the way it seems to behave is that self.start, self.stop, and self.step are stored each time run() is called, but then get overwritten the next time that run() gets called. Anything stored in self.start/self.stop/self.step doesn't get re-used (verbose defaults to whatever was set in __init__).

So in version 0.21.1 you'd get:

import MDAnalysis as mda
from MDAnalysis.analysis.lineardensity import LinearDensity
from MDAnalysisTests.datafiles import PSF_TRICLINIC, DCD_TRICLINIC

u = mda.Universe(PSF_TRICLINIC, DCD_TRICLINIC)

ldens = LinearDensity(u.atoms, verbose=True)
ldens.run(start=0, stop=10) # pm on
print(ldens.start, ldens.stop) # 0 10
ldens.run(start=0, stop=5, verbose=False) # pm on
print(ldens.start, ldens.stop) # 0 10

ldens2 = LinearDensity(u.atoms, verbose=True)
ldens2.run(start=0, stop=5, verbose=False) # pm on
print(ldens2.start, ldens2.stop) # 0 5
ldens.run(start=0, stop=10) # pm on
print(ldens2.start, ldens2.stop) # 0 5

And in the current develop version you get:

import MDAnalysis as mda
from MDAnalysis.analysis.lineardensity import LinearDensity
from MDAnalysisTests.datafiles import PSF_TRICLINIC, DCD_TRICLINIC

u = mda.Universe(PSF_TRICLINIC, DCD_TRICLINIC)

ldens = LinearDensity(u.atoms, verbose=True)
ldens.run(start=0, stop=10) # pm on
print(ldens.start, ldens.stop) # 0 10
ldens.run(start=0, stop=5, verbose=False) # pm off
print(ldens.start, ldens.stop) # 0 5

ldens2 = LinearDensity(u.atoms, verbose=True)
ldens2.run(start=0, stop=5, verbose=False) # pm off
print(ldens2.start, ldens2.stop) # 0 5
ldens.run(start=0, stop=10) # pm on
print(ldens2.start, ldens2.stop) # 0 10

Based on these changes, I'm not sure where everyone's opinions on this issue stand. We could switch to @zemanj's proposal: #2206 (comment), making "private" variables, but I'm not sure if it's still necessary.

We'd probably want a decision here prior to v1.0, so pinging @MDAnalysis/coredevs here.

@orbeckst
Copy link
Member

orbeckst commented Jun 5, 2020

The current (develop, second example) behavior looks sensible to me. That's exactly what it should be doing, shouldn't it? The run() method acts like a function that uses init's verbose as default and always runs with the values of start, stop, step given to the function; it just stores all its results in the instance instead of returning them.

With the current AnalysisBase, we store start/stop/step

start, stop, step = trajectory.check_slice_indices(start, stop, step)
self.start = start
self.stop = stop
self.step = step
as instance variables so in principle any code who really wants to know start/stop/step to save in results can access self in _conclude().

I think we can just close this issue as fixed by #2505 and the changes to verbose... but I am feeling that I am overlooking something obvious. Additional 👀 are welcome!

(I kind of like #2206 (comment) to make self.start, self.stop, self.step read-only to emphasize the point that they are "internal" but visible for user code if neccessary but that's not really relevant for the discussion and could be done any time.)

@orbeckst
Copy link
Member

Nobody seems to think that anything else needs doing here, so I am closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
close? Evaluate if issue/PR is stale and can be closed. Component-Analysis deprecation Deprecated functionality to give advance warning for API changes. Difficulty-easy maintainability usability
Projects
None yet
Development

No branches or pull requests

6 participants