Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collective analysis of different analysis classes on the same trajectory #3569

Open
PicoCentauri opened this issue Mar 21, 2022 · 6 comments · May be fixed by #4017
Open

Collective analysis of different analysis classes on the same trajectory #3569

PicoCentauri opened this issue Mar 21, 2022 · 6 comments · May be fixed by #4017
Assignees

Comments

@PicoCentauri
Copy link
Contributor

PicoCentauri commented Mar 21, 2022

Disclaimer: I thought there was an issue in the past with this idea but I was not able to find it. Sorry if the following idea is
a duplication

Is your feature request related to a problem?

If one wants to run several analyses on the same trajectory one has to loop through the trajectory several times. For example, one naive way of analyzing two RDFs is the following:

import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF
from MDAnalysis.analysis.base import AnalysisCollection

# Load simulation results
u = mda.Universe('topol.tpr','traj.trr')

# Select atoms
O = u.select_atoms('name O')
H = u.select_atoms('name H')

# Create individual analysis objects
rdf_OO = InterRDF(O,O)
rdf_OH = InterRDF(O,H)

# Run the individual analysis
rdf_OO.run(start=0, end=100, step=10)
rdf_OH.run(start=0, end=100, step=10)

# Results are stored in indivial objects
print(rdf_OO.results)
print(rdf_OH.results)

I know that there is the InterRDF_s class which exactly does this job. But, the problem of a collective analysis loop holds for any analysis class and I only chose the RDF since it a straightforward example. In addition, a collected Analysis could make InterRDF_s superfluous.

Describe the solution you'd like

I would introduce an AnalysisCollection(*analysis_object) class. When creating an object one can register several
analysis_objects as arguments. The run method of AnalysisCollection loops through the
trajectory and calls the single_frame method of each analysis_object in every frame. Results are stored in each method.
The code from above will simplify to

import MDAnalysis as mda
from MDAnalysis.analysis.rdf import InterRDF
from MDAnalysis.analysis.base import AnalysisCollection

# Load simulation results
u = mda.Universe('topol.tpr','traj.trr')

# Select atoms
O = u.select_atoms('name O')
H = u.select_atoms('name H')

# Create individual analysis objects
rdf_OO = InterRDF(O,O)
rdf_OH = InterRDF(O,H)

# Create collection for common trajectory loop
collection = AnalysisCollection(rdf_OO, rdf_OH)

# Run the collected analysis
collection.run(start=0, end=100, step=10)

# Results are stored in indivial objects
print(rdf_OO.results)
print(rdf_OH.results)

I think this is approach is fairly easy to implement since the managing of the data remains in the name space of each analysis_object .

Describe alternatives you've considered

Keep everything as it is.

@IAlibay
Copy link
Member

IAlibay commented Mar 21, 2022

We had a discussion about this some time ago, I thought I had a PR open about doing an analysis collection somewhere but I might have closed it or just never gotten around to it.

I can re-open it if we want to come back to this.

@PicoCentauri
Copy link
Contributor Author

@MDAnalysis/coredevs are there any opinions or ideas? I will work on this if we want to have this and decide on a strategy.

@IAlibay
Copy link
Member

IAlibay commented May 16, 2022

I was going to open that PR but I'm swamped with 2.2.0 edit: so I guess at this point I'm going to have to give up on it..

I'd appreciate it if we did that (releasing 2.2.0 and all the associated work around it) as a priority.

@richardjgowers
Copy link
Member

@PicoCentauri Sorry I missed this the first time round. This is a great idea, and I am anticipating that this pattern of only looping through the trajectory once will be much faster than two loops. Problems I can foresee:

  • this assumes that each Analysis class has the "base" run method (can check this through inspection)
  • this assumes that an analysis method doesn't alter the Timestep (or Universe). This is harder to check, but you could copy Timestep and "restore" it between different Analysis classes' _single_frame. E.g. if AnalysisClass 1 rotates the system as part of its analysis, then AnalysisClass2 is getting a different input than expected.

I think somewhere we talked about having an __add__ method even for combining classes (this seems like a bad idea now), but maybe going back through the original design discussions of AnalysisBase would be useful.

@PicoCentauri
Copy link
Contributor Author

No problem! Your second point is indeed something I haven't thought about. However, altering the timestep could be something one wants to do. E.g. if AnalysisClass 1 unwraps the systems, which could be a costly operation, AnalysisClass 2 does not have to do it again.

I am in favor of not having an __add__ method. The AnalysisBase is clean and looks more or less structured right now.

@PicoCentauri
Copy link
Contributor Author

@IAlibay if you still have your code it would be nice starting point.

@PicoCentauri PicoCentauri linked a pull request Jan 30, 2023 that will close this issue
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants