-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diffusion Maps rebase, new PR for final review #894
Changes from 21 commits
ffcd3f2
5fdd7b6
ce7dd68
6f8e8cc
b6f8632
e773500
f2ae4a4
a142422
38f2352
19c3675
786ef01
32a9655
aea25a7
a9a5dbe
7492455
9418cdc
aa98fa7
b24364e
a4c185a
b299d89
cc384f1
313a174
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,364 @@ | ||
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- | ||
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 | ||
# | ||
# MDAnalysis --- http://www.MDAnalysis.org | ||
# Copyright (c) 2006-2015 Naveen Michaud-Agrawal, Elizabeth J. Denning, Oliver Beckstein | ||
# and contributors (see AUTHORS for the full list) | ||
# | ||
# Released under the GNU Public Licence, v2 or any higher version | ||
# | ||
# Please cite your use of MDAnalysis in published work: | ||
# | ||
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. | ||
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. | ||
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787 | ||
# | ||
|
||
""" | ||
Diffusion map --- :mod:`MDAnalysis.analysis.diffusionmap` | ||
===================================================================== | ||
|
||
:Authors: Eugen Hruska, John Detlefs | ||
:Year: 2016 | ||
:Copyright: GNU Public License v2 | ||
|
||
This module contains the non-linear dimension reduction method diffusion map. | ||
The eigenvectors of a diffusion matrix represent the 'collective coordinates' | ||
of a molecule; the largest eigenvalues are the more dominant collective | ||
coordinates. Assigning phyiscal meaning to the 'collective coordinates' is a | ||
fundamentally difficult problem. The time complexity of the diffusion map is | ||
:math:`O(N^3)`, where N is the number of frames in the trajectory, and the in-memory | ||
storage complexity is :math:`O(N^2)`. Instead of a single trajectory a sample of | ||
protein structures can be used. The sample should be equiblibrated, at least | ||
locally. The order of the sampled structures in the trajectory is irrelevant. | ||
|
||
The :ref:`Diffusion-Map-tutorial` shows how to use diffusion map for dimension | ||
reduction. | ||
|
||
More details about diffusion maps are in [deLaPorte1]_, [Lafon1]_ , | ||
[Ferguson1]_, and [Clementi1]_. | ||
|
||
.. _Diffusion-Map-tutorial: | ||
|
||
Diffusion Map tutorial | ||
---------------------- | ||
|
||
The example uses files provided as part of the MDAnalysis test suite | ||
(in the variables :data:`~MDAnalysis.tests.datafiles.PSF` and | ||
:data:`~MDAnalysis.tests.datafiles.DCD`). This tutorial shows how to use the | ||
Diffusion Map class. | ||
|
||
First load all modules and test data :: | ||
|
||
>>> import MDAnalysis as mda | ||
>>> import numpy as np | ||
>>> import MDAnalysis.analysis.diffusionmap as diffusionmap | ||
>>> from MDAnalysis.tests.datafiles import PSF, DCD | ||
|
||
Given a universe or atom group, we can create and eigenvalue decompose | ||
the Diffusion Matrix from that trajectory using :class:`DiffusionMap`:: and get | ||
the corresponding eigenvalues and eigenvectors. | ||
|
||
>>> u = mda.Universe(PSF,DCD) | ||
|
||
We leave determination of the appropriate scale parameter epsilon to the user, | ||
[Clementi1]_ uses a complex method involving the k-nearest-neighbors of a | ||
trajectory frame, whereas others simple use a trial-and-error approach with | ||
a constant epsilon. Currently, the constant epsilon method is implemented | ||
by MDAnalysis. | ||
|
||
>>> dmap = diffusionmap.DiffusionMap(u, select='backbone', epsilon=2) | ||
>>> dmap.run() | ||
|
||
From here we can perform an embedding onto the k dominant eigenvectors. The | ||
non-linearity of the map means there is no explicit relationship between the | ||
lower dimensional space and our original trajectory. However, this is an | ||
isometry (distance preserving map), which means that points close in the lower | ||
dimensional space are close in the higher-dimensional space and vice versa. | ||
In order to embed into the most relevant low-dimensional space, there should | ||
exist some number of dominant eigenvectors, whose corresponding eigenvalues | ||
diminish at a constant rate until falling off, this is referred to as a | ||
spectral gap and should be somewhat apparent for a system at equilibrium with a | ||
high number of frames. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Show how to plot the spectrum. |
||
|
||
>>> # first cell of a jupyter notebook should contain: %matplotlib inline | ||
>>> import matplotlib.pyplot as plt | ||
>>> f, ax = plt.subplots() | ||
>>> upper_limit = # some reasonably high number less than the n_eigenvectors | ||
>>> ax.plot(dmap.eigenvalues[:upper_limit]) | ||
>>> ax.set(xlabel ='eigenvalue index', ylabel='eigenvalue') | ||
>>> plt.tight_layout() | ||
|
||
From here we can transform into the diffusion space | ||
|
||
>>> num_eigenvectors = # some number less than the number of frames after | ||
>>> # inspecting for the spectral gap | ||
>>> fit = dmap.transform(num_eigenvectors, time=1) | ||
|
||
It can be difficult to interpret the data, and is left as a task | ||
for the user. The `diffusion distance` between frames i and j is best | ||
approximated by the euclidean distance between rows i and j of | ||
self.diffusion_space. | ||
|
||
|
||
.. _Distance-Matrix-tutorial: | ||
|
||
Distance Matrix tutorial | ||
------------------------ | ||
|
||
Often a, a custom distance matrix could be useful for local | ||
epsilon determination or other manipulations on the diffusion | ||
map method. The :class:`DistanceMatrix` exists in | ||
:mod:`~MDAnalysis.analysis.diffusionmap` and can be passed | ||
as an initialization argument for :class:`DiffusionMap`. | ||
>>> import MDAnalysis as mda | ||
>>> import numpy as np | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto |
||
>>> import MDAnalysis.analysis.diffusionmap as diffusionmap | ||
>>> from MDAnalysis.tests.datafiles import PSF, DCD | ||
|
||
Now create the distance matrix and pass it as an argument to | ||
:class:`DiffusionMap`. | ||
|
||
>>> u = mda.Universe(PSF,DCD) | ||
>>> dist_matrix = diffusionmap.DistanceMatrix(u, select='all') | ||
>>> dist_matrix.run() | ||
>>> dmap = diffusionmap.DiffusionMap(dist_matrix) | ||
>>> dmap.run() | ||
|
||
Classes | ||
------- | ||
|
||
.. autoclass:: DiffusionMap | ||
.. autoclass:: DistanceMatrix | ||
|
||
References | ||
---------- | ||
|
||
If you use this Dimension Reduction method in a publication, please | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it is enough when people cite the original paper. Otherwise please only list papers you refer to in the docs |
||
cite: | ||
|
||
.. [Lafon1] | ||
Coifman, Ronald R., Lafon, Stephane Diffusion maps. Appl. Comput. Harmon. | ||
Anal. 21, 5–30 (2006). | ||
|
||
For more information | ||
-------------------- | ||
|
||
.. [deLaPorte1] | ||
J. de la Porte, B. M. Herbst, W. Hereman, S. J. van der Walt. | ||
An Introduction to Diffusion Maps. | ||
|
||
.. [Clementi1] | ||
Rohrdanz, M. A, Zheng, W, Maggioni, M, & Clementi, C. | ||
Determination of reaction coordinates via locally scaled | ||
diffusion map. J. Chem. Phys. 134, 124116 (2011). | ||
|
||
.. [Ferguson1] | ||
Ferguson, A. L.; Panagiotopoulos, A. Z.; Kevrekidis, I. G. | ||
Debenedetti, P. G. Nonlinear dimensionality reduction in molecular simulation: | ||
The diffusion map approach Chem. Phys. Lett. 509, 1−11 (2011) | ||
|
||
.. If you choose the default metric, this module uses the fast QCP algorithm | ||
[Theobald2005]_ to calculate the root mean square distance (RMSD) between | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You can add an anchor for this paper here like you do for the other papers. That way it will be linked for sure and we don't have to deal with possible errors in sphinx. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For what it's worth I have been compiling the docs and it links fine. I can go ahead and add it again here but it seems unnecessary. |
||
two coordinate sets (as implemented | ||
in :func:`MDAnalysis.lib.qcprot.CalcRMSDRotationalMatrix`). | ||
When using this module in published work please cite [Theobald2005]_. | ||
|
||
|
||
""" | ||
from six.moves import range | ||
import logging | ||
import warnings | ||
|
||
from MDAnalysis.core.AtomGroup import Universe | ||
import numpy as np | ||
|
||
from .rms import rmsd | ||
from .base import AnalysisBase | ||
|
||
logger = logging.getLogger("MDAnalysis.analysis.diffusionmap") | ||
|
||
|
||
class DistanceMatrix(AnalysisBase): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are docs missing how to run only dist matrix alone There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should I add info in the diffusion map tutorial or the docs for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the class is ok. We might might this anyway later. |
||
"""Calculate the pairwise distance between each frame in a trajectory | ||
using a given metric | ||
|
||
A distance matrix can be initialized on its own and used as an | ||
initialization argument in :class:`DiffusionMap`. Refer to the | ||
:ref:`Distance-Matrix-tutorial` for a demonstration. | ||
|
||
Attributes | ||
---------- | ||
atoms : AtomGroup | ||
Selected atoms in trajectory subject to dimension reduction | ||
dist_matrix : array | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you explicitly state the size of the array here, I think it's nframes x nframes |
||
Array of all possible ij metric distances between frames in trajectory. | ||
This matrix is symmetric with zeros on the diagonal. | ||
|
||
Methods | ||
------- | ||
save(filename) | ||
Save the `dist_matrix` to a given filename | ||
""" | ||
def __init__(self, u, select='all', metric=rmsd, cutoff=1E0-5, | ||
weights=None, start=None, stop=None, step=None): | ||
""" | ||
Parameters | ||
---------- | ||
u : trajectory `~MDAnalysis.core.AtomGroup.Universe` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shouldn't call U a trajectory to avoid confusion with the Reader object |
||
The MD Trajectory for dimension reduction, remember that | ||
computational cost of eigenvalue decomposition | ||
scales at O(N^3) where N is the number of frames. | ||
Cost can be reduced by increasing step interval or specifying a | ||
start and stop. | ||
select: str, optional | ||
Any valid selection string for | ||
:meth:`~MDAnalysis.core.AtomGroup.AtomGroup.select_atoms` | ||
This selection of atoms is used to calculate the RMSD between | ||
different frames. Water should be excluded. | ||
metric : function, optional | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need to document the API for the metric function. For example it has to take weights |
||
Maps two numpy arrays to a float, is positive definite and | ||
symmetric. The API for a metric requires that the arrays must have | ||
equal length, and that the function should have weights as an | ||
optional argument. Weights give each index value its own weight for | ||
the metric calculation over the entire arrays. Default: metric is | ||
set to rms.rmsd(). | ||
cutoff : float, optional | ||
Specify a given cutoff for metric values to be considered equal, | ||
Default: 1EO-5 | ||
weights : array, optional | ||
Weights to be given to coordinates for metric calculation | ||
start : int, optional | ||
First frame of trajectory to analyse, Default: 0 | ||
stop : int, optional | ||
Last frame of trajectory to analyse, Default: -1 | ||
step : int, optional | ||
Step between frames to analyse, Default: 1 | ||
""" | ||
self._u = u | ||
traj = self._u.trajectory | ||
self.atoms = self._u.select_atoms(select) | ||
self._metric = metric | ||
self._cutoff = cutoff | ||
self._weights = weights | ||
self._calculated = False | ||
# remember that this must be called before referencing self.n_frames | ||
self._setup_frames(traj, start, stop, step) | ||
|
||
def _prepare(self): | ||
self.dist_matrix = np.zeros((self.n_frames, self.n_frames)) | ||
|
||
def _single_frame(self): | ||
iframe = self._ts.frame | ||
i_ref = self.atoms.positions | ||
# diagonal entries need not be calculated due to metric(x,x) == 0 in | ||
# theory, _ts not updated properly. Possible savings by setting a | ||
# cutoff for significant decimal places to sparsify matrix | ||
for j, ts in enumerate(self._u.trajectory[iframe:self.stop:self.step]): | ||
self._ts = ts | ||
j_ref = self.atoms.positions | ||
dist = self._metric(i_ref, j_ref, weights=self._weights) | ||
self.dist_matrix[self._frame_index, j+self._frame_index] = ( | ||
dist if dist > self._cutoff else 0) | ||
self.dist_matrix[j+self._frame_index, self._frame_index] = ( | ||
self.dist_matrix[self._frame_index, j+self._frame_index]) | ||
self._ts = self._u.trajectory[iframe] | ||
|
||
def _conclude(self): | ||
self._calculated = True | ||
|
||
def save(self, filename): | ||
np.save(filename, self.dist_matrix) | ||
logger.info("Wrote the distance-squared matrix to file %r", filename) | ||
|
||
|
||
class DiffusionMap(object): | ||
"""Non-linear dimension reduction method | ||
|
||
Dimension reduction with diffusion mapping of selected structures in a | ||
trajectory. | ||
|
||
Attributes | ||
---------- | ||
eigenvalues: array () | ||
Eigenvalues of the diffusion map | ||
|
||
Methods | ||
------- | ||
run() | ||
Constructs an anisotropic diffusion kernel and performs eigenvalue | ||
decomposition on it. | ||
transform(n_eigenvectors, time) | ||
Perform an embedding of a frame into the eigenvectors representing | ||
the collective coordinates. | ||
""" | ||
|
||
def __init__(self, u, epsilon=1, **kwargs): | ||
""" | ||
Parameters | ||
------------- | ||
u : MDAnalysis Universe or DistanceMatrix object | ||
Can be a Universe, in which case one must supply kwargs for the | ||
initialization of a DistanceMatrix. Otherwise, this can be a | ||
DistanceMatrix already initialized. Either way, this will be made | ||
into a diffusion kernel. | ||
epsilon : Float | ||
Specifies the method used for the choice of scale parameter in the | ||
diffusion map. More information in [Lafon1]_, [Ferguson1]_ and | ||
[Clementi1]_, Default: 1. | ||
**kwargs | ||
Parameters to be passed for the initialization of a | ||
:class:`DistanceMatrix`. | ||
""" | ||
if isinstance(u, Universe): | ||
self._dist_matrix = DistanceMatrix(u, **kwargs) | ||
elif isinstance(u, DistanceMatrix): | ||
self._dist_matrix = u | ||
else: | ||
raise ValueError("U is not a Universe or DistanceMatrix and" | ||
" so the DiffusionMap has no data to work with.") | ||
self._epsilon = epsilon | ||
# important for transform function and length of .run() method | ||
self._n_frames = self._dist_matrix.n_frames | ||
if self._n_frames > 5000: | ||
warnings.warn("The distance matrix is very large, and can " | ||
"be very slow to compute. Consider picking a larger " | ||
"step size in distance matrix initialization.") | ||
|
||
|
||
def run(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Doc string missing |
||
""" Create and decompose the diffusion matrix in preparation | ||
for a diffusion map.""" | ||
# run only if distance matrix not already calculated | ||
if not self._dist_matrix._calculated: | ||
self._dist_matrix.run() | ||
self._scaled_matrix = (self._dist_matrix.dist_matrix ** 2 / | ||
self._epsilon) | ||
# take negative exponent of scaled matrix to create Isotropic kernel | ||
self._kernel = np.exp(-self._scaled_matrix) | ||
D_inv = np.diag(1 / self._kernel.sum(1)) | ||
self._diff = np.dot(D_inv, self._kernel) | ||
self._eigenvals, self._eigenvectors = np.linalg.eig(self._diff) | ||
sort_idx = np.argsort(self._eigenvals)[::-1] | ||
self.eigenvalues = self._eigenvals[sort_idx] | ||
self._eigenvectors = self._eigenvectors[sort_idx] | ||
self._calculated = True | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this needs to |
||
def transform(self, n_eigenvectors, time): | ||
""" Embeds a trajectory via the diffusion map | ||
|
||
Parameter | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. parameters |
||
--------- | ||
n_eigenvectors : int | ||
The number of dominant eigenvectors to be used for | ||
diffusion mapping | ||
time : float | ||
Exponent that eigenvalues are raised to for embedding, for large | ||
values, more dominant eigenvectors determine diffusion distance. | ||
Return | ||
------ | ||
diffusion_space : array (n_frames, n_eigenvectors) | ||
The diffusion map embedding as defined by [Ferguson1]_. | ||
""" | ||
return (self._eigenvectors[1:n_eigenvectors+1,].T * | ||
(self.eigenvalues[1:n_eigenvectors+1]**time)) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
.. automodule:: MDAnalysis.analysis.diffusionmap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we actually use np in the code below