Parameters not saved in Analysis classes #2206

PicoCentauri · 2019-02-17T20:06:37Z

In all analysis classes the parameters start, stop, step and verbose are not saved if they are changed after the object was run once.

After looking into the code I found that for the start, stop and step parameters this is an issue due to the fix of the old deprecated syntax. For the verbose one just have to save this in the _verbose attribute of the object. All these things are easy to fix and I could provide a patch.

Code to reproduce the behavior

from MDAnalysis.analysis.lineardensity import LinearDensity
from MDAnalysis.tests.datafiles import PSF_TRICLINIC, DCD_TRICLINIC

u = mda.Universe(PSF_TRICLINIC, DCD_TRICLINIC)
ldens = LinearDensity(u.atoms, verbose=True)

ldens.run(start=0, stop=10)
print(ldens.start, ldens.stop)

ldens.run(start=0, stop=5, verbose=False)
print(ldens.start, ldens.stop)

Currently version of MDAnalysis

Python 3.6, mda 0.19.3, MacOS

The text was updated successfully, but these errors were encountered:

richardjgowers · 2019-02-17T20:11:43Z

Yeah we used to save them on __init__ and use this during run(). We could save the last used values in run() I guess.

PicoCentauri · 2019-02-17T20:38:41Z

Yes. I would just add a simple test in _setup_frames. See PR #2207.

orbeckst · 2019-02-18T02:58:45Z

I don't think we should save them. That's a left over when we set them in init. When providing parameters to run I would expect that they only apply to this invocation.

PicoCentauri · 2019-02-18T08:24:25Z

On the one hand I could imagine that sometimes its useful to access this information i.e. at what frame the analysis started. On the other the user already put the information in so maybe its superfluous.

zemanj · 2019-02-18T10:24:04Z

@orbeckst

I don't think we should save them. That's a left over when we set them in init. When providing parameters to run I would expect that they only apply to this invocation.

I agree. Since the start/stop/step parameters are not passed to the initializer, they shouldn't be reused. The run() method should always work as documented, including its kwarg defaults. Permanently storing and reusing the parameters would invalidate the documented signature.

@PicoCentauri

On the other the user already put the information in so maybe its superfluous.

IMHO it is.

PicoCentauri · 2019-02-18T14:01:40Z

@zemanj I'm also fine with not storing them if this consistent with what is in the documentation. The total number of analysed frames is saved in any case. But this means to drop the old code completly, right?

orbeckst · 2019-02-18T17:33:37Z

Yes, any old code that stores this information should go away. ~~The whole verbose thing is/was deprecated, I think. Not sure when it ought to get officially removed.~~

EDIT: 2020-02-05: verbose in constructor and run() conforms to the new API as required by #1463

PicoCentauri · 2019-02-19T18:28:56Z

I see. I quickly checked the usage of verbose and it seems that it is not used anymore by the modules. We could also kick this out...

richardjgowers · 2019-02-25T17:59:17Z

@PicoCentauri I think verbose is deprecated, it just needs finally removing (eventually). It's frustrating, but the idea is that old scripts (which use verbose) shouldn't break overnight because people have updated. I think we're planning on ripping out most of the tagged deprecations for the 1.0 release.

WRT start/stop/step, these parameters should be saved in the .results attribute, but it's not the base class' job to do this. Ie an individual implementation should be labelling frames when it's appropriate.

orbeckst · 2019-02-25T18:15:51Z

WRT start/stop/step, these parameters should be saved in the .results attribute

Under which circumstances is it useful to safe start/stop/step? I don't think that there's an expectation that run() knows how to reproduce the last run?

As long as the result contains e.g. the times or the frame numbers for a time series calculation, this should be fine. Or am I missing something?

richardjgowers · 2019-02-25T18:17:50Z

@orbeckst I run some analysis on frames 1-10, pickle the Analysis class object (containing the results), unpickle it a month later and don't know what frames it was ran on? Or more generally, it'd be nice if the object had full provenance of the results?

orbeckst · 2019-02-25T18:25:38Z

I don't like pickling analysis class objects as a form of data storage – that's just abuse of the pickle format. I wouldn't want to encourage this approach.

For timeseries it is eas(ier) to see what data you used. For anything that averages or does statistics this is harder and their I can see a valid reason for storing a metadata dict with, e.g., number of frames analyzed etc. because you would need to report this in a paper. But then this can be up to the individual class to do it.

Thinking about it, you summarized it succinctly

WRT start/stop/step, these parameters should be saved in the .results attribute, but it's not the base class' job to do this. Ie an individual implementation should be labelling frames when it's appropriate.

richardjgowers · 2019-02-25T18:29:29Z

Ok, then I think this issue is solved by recommending in the AnalysisBase docs that frame indices are stored for reproducibility.

PicoCentauri · 2019-02-25T21:35:49Z

Okay I’m fine with storing the information in the results attribute. Unfortunately, only changing the docs this will not resolve the issue I think. We still have to change the code in such a way that only the current parameters of the .run method are used. I also don’t see an „easy“ and clear way for an individual class to get acces to the information for saving start/stop/step in the results section if its not stored in object itself...

orbeckst · 2019-02-25T21:39:31Z

I think you ever only need to store anything if the class does not store the frames or times.

In my opinion, the default should be that start/stop/step are not stored. If a class has a special need to store this information then it has to store it .results (or wherever it wants to) and document it. It is then up to the user to store this information in a permanent way. We also deprecated and removed typical "save()" methods because the user knows best how to archive their data. That's not a decision that we can make – we tried and then ran into pickle hell etc.

zemanj · 2019-02-28T17:01:52Z

@PicoCentauri

We still have to change the code in such a way that only the current parameters of the .run method are used. I also don’t see an „easy“ and clear way for an individual class to get acces to the information for saving start/stop/step in the results section if its not stored in object itself...

In pre- or postprocessing, that will only be possible with an awful introspection hack using inspect.getframeinfo and friends in order to access data of AnalysisBase.run() from the _prepare() or _conclude() methods of the derived class. In other words: I'd strongly discourage anyone from doing that.
However, it is easily possible to access frame information from within the derived class' _single_frame() method by examining self._frame_index or self._tson the go. Therefore I'm pretty sure we don't need to change existing code for that.

Anyway, I don't see the value of gathering frame information during executions of _single_frame() if it is not required by the particular analysis to have that data available for the computations it performs.
After all, frame indices are worth nothing if they're not stored along with trajectory/topology file paths and, in almost any case, many additional parameters. In the end, reproducibility can only be achieved by the user via storing all input and output data, simulation software, analysis scripts, the commands which were used to call the software/scripts, and so on. Storing start/stop/step in the results of an analysis class IMHO doesn't really contribute a great deal here.

~~So... any objections against closing this?~~

zemanj · 2019-02-28T17:39:09Z

Maybe we do need to change something because @PicoCentauri's LinearDensity example is definitely confusing. My suggestion:

make self.start, self.stop, self.step, self.n_frames "private" by prepending an underscore
Until the 1.0 release, add @propertys start, stop, etc. returning their private counterparts (no setters, only getters!) and deprecate these properties. The respective DeprecationWarning should also inform the user that the returned values no longer correspond to the last used ones.
Adjust AnalysisBase.__init__() so that start/stop/step are stored in the respective private variables. Also make sure that the DeprecationWarning is raised even if any of the deprecated kwargs is None (currently missing).

lilyminium · 2020-05-20T02:15:41Z

@IAlibay does #2505 fix this or does it require #2525 as well?

IAlibay · 2020-06-05T22:05:29Z

Ok, so an update on this. In part because we've removed setting start/stop/step from __init__ (I think some other PR altered the behaviour but I can't work out exactly which), the behaviour here has changed.

Currently the way it seems to behave is that self.start, self.stop, and self.step are stored each time run() is called, but then get overwritten the next time that run() gets called. Anything stored in self.start/self.stop/self.step doesn't get re-used (verbose defaults to whatever was set in __init__).

So in version 0.21.1 you'd get:

import MDAnalysis as mda
from MDAnalysis.analysis.lineardensity import LinearDensity
from MDAnalysisTests.datafiles import PSF_TRICLINIC, DCD_TRICLINIC

u = mda.Universe(PSF_TRICLINIC, DCD_TRICLINIC)

ldens = LinearDensity(u.atoms, verbose=True)
ldens.run(start=0, stop=10) # pm on
print(ldens.start, ldens.stop) # 0 10
ldens.run(start=0, stop=5, verbose=False) # pm on
print(ldens.start, ldens.stop) # 0 10

ldens2 = LinearDensity(u.atoms, verbose=True)
ldens2.run(start=0, stop=5, verbose=False) # pm on
print(ldens2.start, ldens2.stop) # 0 5
ldens.run(start=0, stop=10) # pm on
print(ldens2.start, ldens2.stop) # 0 5

And in the current develop version you get:

import MDAnalysis as mda
from MDAnalysis.analysis.lineardensity import LinearDensity
from MDAnalysisTests.datafiles import PSF_TRICLINIC, DCD_TRICLINIC

u = mda.Universe(PSF_TRICLINIC, DCD_TRICLINIC)

ldens = LinearDensity(u.atoms, verbose=True)
ldens.run(start=0, stop=10) # pm on
print(ldens.start, ldens.stop) # 0 10
ldens.run(start=0, stop=5, verbose=False) # pm off
print(ldens.start, ldens.stop) # 0 5

ldens2 = LinearDensity(u.atoms, verbose=True)
ldens2.run(start=0, stop=5, verbose=False) # pm off
print(ldens2.start, ldens2.stop) # 0 5
ldens.run(start=0, stop=10) # pm on
print(ldens2.start, ldens2.stop) # 0 10

Based on these changes, I'm not sure where everyone's opinions on this issue stand. We could switch to @zemanj's proposal: #2206 (comment), making "private" variables, but I'm not sure if it's still necessary.

We'd probably want a decision here prior to v1.0, so pinging @MDAnalysis/coredevs here.

orbeckst · 2020-06-05T22:40:43Z

The current (develop, second example) behavior looks sensible to me. That's exactly what it should be doing, shouldn't it? The run() method acts like a function that uses init's verbose as default and always runs with the values of start, stop, step given to the function; it just stores all its results in the instance instead of returning them.

With the current AnalysisBase, we store start/stop/step

mdanalysis/package/MDAnalysis/analysis/base.py

Lines 141 to 144 in 8d7d77f

    
           start, stop, step = trajectory.check_slice_indices(start, stop, step) 
        
           self.start = start 
        
           self.stop = stop 
        
           self.step = step

as instance variables so in principle any code who really wants to know start/stop/step to save in results can access self in _conclude().

I think we can just close this issue as fixed by #2505 and the changes to verbose... but I am feeling that I am overlooking something obvious. Additional 👀 are welcome!

(I kind of like #2206 (comment) to make self.start, self.stop, self.step read-only to emphasize the point that they are "internal" but visible for user code if neccessary but that's not really relevant for the discussion and could be done any time.)

orbeckst · 2020-08-16T06:56:55Z

Nobody seems to think that anything else needs doing here, so I am closing.

PicoCentauri mentioned this issue Feb 17, 2019

Fixed parameters not safed in Analysis class #2207

Closed

4 tasks

zemanj added usability maintainability Component-Analysis deprecation Deprecated functionality to give advance warning for API changes. Difficulty-easy labels Feb 28, 2019

lilyminium mentioned this issue Feb 5, 2020

verbose not set in analysis.AnalysisBase.run #2504

Closed

Purva-Chaudhari mentioned this issue Mar 17, 2020

Setting Verbose #2631

Closed

4 tasks

lilyminium added the close? Evaluate if issue/PR is stale and can be closed. label May 20, 2020

lilyminium mentioned this issue Jun 5, 2020

Release 1.0 #2443

Closed

6 tasks

orbeckst closed this as completed Aug 16, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parameters not saved in Analysis classes #2206

Parameters not saved in Analysis classes #2206

PicoCentauri commented Feb 17, 2019 •

edited

Loading

richardjgowers commented Feb 17, 2019

PicoCentauri commented Feb 17, 2019 •

edited

Loading

orbeckst commented Feb 18, 2019

PicoCentauri commented Feb 18, 2019 •

edited

Loading

zemanj commented Feb 18, 2019

PicoCentauri commented Feb 18, 2019

orbeckst commented Feb 18, 2019 •

edited

Loading

PicoCentauri commented Feb 19, 2019

richardjgowers commented Feb 25, 2019

orbeckst commented Feb 25, 2019

richardjgowers commented Feb 25, 2019

orbeckst commented Feb 25, 2019

richardjgowers commented Feb 25, 2019

PicoCentauri commented Feb 25, 2019

orbeckst commented Feb 25, 2019

zemanj commented Feb 28, 2019 •

edited

Loading

zemanj commented Feb 28, 2019 •

edited

Loading

lilyminium commented May 20, 2020

IAlibay commented Jun 5, 2020

orbeckst commented Jun 5, 2020

orbeckst commented Aug 16, 2020

Parameters not saved in Analysis classes #2206

Parameters not saved in Analysis classes #2206

Comments

PicoCentauri commented Feb 17, 2019 • edited Loading

richardjgowers commented Feb 17, 2019

PicoCentauri commented Feb 17, 2019 • edited Loading

orbeckst commented Feb 18, 2019

PicoCentauri commented Feb 18, 2019 • edited Loading

zemanj commented Feb 18, 2019

PicoCentauri commented Feb 18, 2019

orbeckst commented Feb 18, 2019 • edited Loading

PicoCentauri commented Feb 19, 2019

richardjgowers commented Feb 25, 2019

orbeckst commented Feb 25, 2019

richardjgowers commented Feb 25, 2019

orbeckst commented Feb 25, 2019

richardjgowers commented Feb 25, 2019

PicoCentauri commented Feb 25, 2019

orbeckst commented Feb 25, 2019

zemanj commented Feb 28, 2019 • edited Loading

zemanj commented Feb 28, 2019 • edited Loading

lilyminium commented May 20, 2020

IAlibay commented Jun 5, 2020

orbeckst commented Jun 5, 2020

orbeckst commented Aug 16, 2020

PicoCentauri commented Feb 17, 2019 •

edited

Loading

PicoCentauri commented Feb 17, 2019 •

edited

Loading

PicoCentauri commented Feb 18, 2019 •

edited

Loading

orbeckst commented Feb 18, 2019 •

edited

Loading

zemanj commented Feb 28, 2019 •

edited

Loading

zemanj commented Feb 28, 2019 •

edited

Loading