Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow analysis classes to run on arbitrary selections of frames #1985

Open
jbarnoud opened this issue Jul 13, 2018 · 13 comments
Open

Allow analysis classes to run on arbitrary selections of frames #1985

jbarnoud opened this issue Jul 13, 2018 · 13 comments

Comments

@jbarnoud
Copy link
Contributor

It would be convenient to be able to run some analyses on an arbitrary set of frames. I recently encounter a case where I wanted to build an RSMD matrix with mda.analysis.diffusionmap.DistanceMatrix on a selection of frame I obtained from clustering. For now, the only ways I could find to do that was to either write a new trajectory (on file or in memory), or to hack a SelectionReader. Adding a frames argument to run alonside start, stop, and step (see #1979) would allow to pass an array of frame indices, or a mask, or an already indexed trajectory using #1934. The interface would be the same as for ag.write.

# API as envisioned

u = mda.Universe(TPR, XTC)
ag = u.select_atoms(...)

labels = my_clustering_method(ag)  # returns np.array( [1, 1, 0, 0, 2, ...])
frames_group0 = u.trajectory[labels == 0]

mat0 = MySuperAnalysis(ag).run(frames=frames_group0)
# or
mat0 = MySuperAnalysis(ag).run(frames=(labels == 0))

# Should prevent non-sensical calls such as mixing `frames` with `start`, `stop`, `step`.
mat0 = MySuperAnalysis(ag).run(frames=frames_group0, step=3)  # ValueError

Some analyses rely on the time step to be constant, so the use of the frames must be limited for these analyses. An slices trajectory can be used, but not an indexed one. In these case, a NotImplementedError should be raised. This could be achieved by using a class attribute to switch off arbitrary frame selections:

class MyConstentDTAnalysis(AnalysisBase):
    needs_constant_dt = True

   ...
@orbeckst
Copy link
Member

👍 I like the idea of using the new trajectory iterables as a concise way to work with trajectories. I suppose, this is now being called "working with frames".

@orbeckst
Copy link
Member

orbeckst commented Jul 13, 2018

Sidenote: I think this approach

frames_group0 = u.trajectory[labels == 0]

is beautiful.

Would it make sense to add a copy() method to the trajectory (EDIT 👎 )

t2 = u.trajectory.copy()
frames2 = t2[labels == 2]

so that we can have multiple frames iterables that do not interfere with each other, e.g., for cross correlation time analysis... zip(frames1, frames2) and whatever else one finds in itertools.

EDIT: Yes, this is flawed... it is not clear where one AtomGroup should get its coordinates from unless we bring back AtomGroup.coordinates(ts).

@kain88-de
Copy link
Member

For this you can use the universe copy method and individual selections for each.

@orbeckst
Copy link
Member

We have a Universe.copy() ... apparently so. And we also have trajectory.copy(). I didn't know... (and the docs don't tell me when this was added, so must have been a while.)

@jbarnoud
Copy link
Contributor Author

jbarnoud commented Jul 13, 2018 via email

@orbeckst
Copy link
Member

Fair enough ;-)

@kain88-de
Copy link
Member

and the docs don't tell me when this was added, so must have been a while

#1538

Nope fairly recent. It's mentioned in the changelog though.

@orbeckst
Copy link
Member

orbeckst commented Jul 13, 2018 via email

@orbeckst
Copy link
Member

orbeckst commented Sep 6, 2021

@rsexton2 , as discussed, this might be an outline for changing _setup_frames():

from MDAnalysis.tests import datafiles as data
import MDAnalysis as mda

u = mda.Universe(data.TPR, data.XTC)
frames = [0, 2, 4, 5]

def setup_frames(start, stop, step, frames):
    if frames is not None:
         if any([start, stop, step]):
            raise ValueError("start/stop/step cannot be combined with frames")
         slicer = frames
     else:
         slicer = slice(start, stop, step)
     sliced_traj = u.trajectory[slicer]
     n_frames = len(sliced_traj)
     return n_frames, sliced_traj

@IAlibay
Copy link
Member

IAlibay commented Sep 6, 2021

@orbeckst that solution might need care if combined with check_slice_indices?

The returned values `start`, `stop` and `step` give the expected result
when passed in :func:`range` but gives unexpected behavior when passed
in a :class:`slice` when ``stop=None`` and ``step=-1``

rsexton2 added a commit to rsexton2/mdanalysis that referenced this issue Sep 6, 2021
@orbeckst
Copy link
Member

orbeckst commented Sep 6, 2021

The way that @rsexton2 and I discussed it (which should be reflected in the snippet) is that using frames is exclusive with using any of start, stop, step. That's a lot easier to handle than thinking about what start/stop/step mean for a sliced trajectory. Fundamentally, I see this as two alternative ways of slicing and not something that should be combined, especially because a user who really only wants "every third frame of my custom frame selection" can just do so themselves with frames=myframes[::3].

@orbeckst
Copy link
Member

orbeckst commented Sep 6, 2021

For right now we also did not include the original suggestion to be able to pass a sliced trajectory in frames. Perhaps that should be allowed because we allow writing trajectories with AtomGroup.write(..., frames=...)

ag.write("traj.xtc", frames=ag.universe.trajectory[::2])

where frames can be a sliced trajectory (a FrameIterator) or a array-like of indices or a boolean array of length n_frames, or a slice(). There's something to be said for being consistent but I don't know if there are some odd interactions possible if we pass in a trajectory that belongs to a different universe than the atomgroup or universe that the analysis is supposed to run with.

@orbeckst orbeckst removed the proposal label Sep 8, 2021
@orbeckst
Copy link
Member

orbeckst commented Sep 8, 2021

For the purpose of PR #3415 @rsexton2 and I decided to keep things simple and only implement fancy indexing via the frames kwarg and not using a sliced trajectory. If anyone thinks the latter should be included then please say something. Reviews of the PR are also very welcome!

rsexton2 added a commit to rsexton2/mdanalysis that referenced this issue Sep 10, 2021
rsexton2 added a commit to rsexton2/mdanalysis that referenced this issue Sep 10, 2021
- Fix MDAnalysis#1985
- Added changes to AUTHORS CHANGELOG
- Added tests to test_base.py and changes to base.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants