-
Notifications
You must be signed in to change notification settings - Fork 664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diffusion Map Refactor/Implementation #863
Changes from 43 commits
c511ead
3be892e
3ce705a
328c27a
d2d4593
ed5f4c5
2947c0f
7e94a58
ab7f93c
43144e0
18c84ee
0da96fa
2c93f2d
49578a0
7ec486a
f350eaa
245d2e8
17fced2
14b08e4
cfcbfe0
07a83d1
6faec56
48c27a9
fe08866
cafeb68
bb394ec
074a950
68f2928
98e899b
b893244
7b12ece
e5a3487
cbf685d
e0ed7b4
303b816
01dc0d5
2f2db09
34b1e7c
7c56863
bd40f38
420edd0
c058c6c
b6c309a
8ba1a58
aa22772
97c3105
b3b2ce9
c5d1958
d7cd454
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,371 @@ | ||
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*- | ||
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4 | ||
# | ||
# MDAnalysis --- http://www.MDAnalysis.org | ||
# Copyright (c) 2006-2015 Naveen Michaud-Agrawal, Elizabeth J. Denning, Oliver Beckstein | ||
# and contributors (see AUTHORS for the full list) | ||
# | ||
# Released under the GNU Public Licence, v2 or any higher version | ||
# | ||
# Please cite your use of MDAnalysis in published work: | ||
# | ||
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. | ||
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. | ||
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787 | ||
# | ||
|
||
""" | ||
Diffusion map --- :mod:`MDAnalysis.analysis.diffusionmap` | ||
===================================================================== | ||
|
||
:Authors: Eugen Hruska, John Detlefs | ||
:Year: 2016 | ||
:Copyright: GNU Public License v3 | ||
|
||
The module contains the non-linear dimension reduction method diffusion map. | ||
The diffusion map provides an estimate of the slowest collective | ||
coordinates for a trajectory. This non-linear dimension reduction method | ||
assumes that the trajectory is long enough to represent a probability | ||
distribution of a protein close to the equilibrium. Furthermore, the diffusion | ||
map assumes that the diffusion coefficients associated with the dynamical | ||
motion of molecules in the system are constant. The eigenvectors with | ||
the largest eigenvalues are the more dominant collective coordinates. Assigning | ||
phyiscal meaning of the 'collective coordinates' is a fundamentally difficult | ||
problem. The time complexity of the diffusion map is O(N^3), where N is the | ||
number of frames in the trajectory, and the in-memory storage complexity is | ||
O(N^2). Instead of a single trajectory a sample of protein structures | ||
can be used. The sample should be equiblibrated, at least locally. Different | ||
weights can be used to determine the anisotropy of the diffusion kernel. | ||
The motivation for the creation of an anisotropic kernel is given on page 14 of | ||
[Lafon1]_. The order of the sampled structures in the trajectory is irrelevant. | ||
|
||
The :ref:`Diffusion-Map-tutorial` shows how to use diffusion map for dimension | ||
reduction. | ||
|
||
More details about diffusion maps are in [Lafon1]_ , [Ferguson1]_, and | ||
[Clementi1]_. | ||
|
||
.. _Diffusion-Map-tutorial: | ||
|
||
Diffusion Map tutorial | ||
-------------------- | ||
|
||
The example uses files provided as part of the MDAnalysis test suite | ||
(in the variables :data:`~MDAnalysis.tests.datafiles.PSF` and | ||
:data:`~MDAnalysis.tests.datafiles.DCD`). Notice that this trajectory | ||
does not represent a sample from equilibrium. This violates a basic | ||
assumption of the anisotropic kernel. The anisotropic kernel is created to act | ||
as a discrete approximation for the Fokker-Plank equation from statistical | ||
mechanics, results from a non-equilibrium trajectory shouldn't be interpreted | ||
for this reason. This tutorial shows how to use the diffusionmap function. | ||
First load all modules and test data :: | ||
|
||
>>> import MDAnalysis | ||
>>> import numpy as np | ||
>>> import MDAnalysis.analysis.diffusionmap as diffusionmap | ||
>>> from MDAnalysis.tests.datafiles import PSF,DCD | ||
|
||
Given a universe or atom group, we can calculate the diffusion map from | ||
that trajectory using :class:`DiffusionMap`:: and get the corresponding | ||
eigenvalues and eigenvectors. | ||
|
||
>>> u = MDAnalysis.Universe(PSF,DCD) | ||
|
||
We leave determination of the appropriate scale parameter epsilon to the user, | ||
[Clementi1]_ uses a complex method involving the k-nearest-neighbors of a | ||
trajectory frame, whereas others simple use a trial-and-error approach with | ||
a constant epsilon. Users wishing to use a complex method to determine epsilon | ||
will have to initialize a distance matrix and scale it appropriately themselves | ||
and then pass the new scaled distance matrix as a parameter. | ||
|
||
>>> dmap = diffusionmap.DiffusionMap(dist_matrix, epsilon =2) | ||
>>> dmap.run() | ||
>>> eigenvalues = dmap.eigenvalues | ||
>>> eigenvectors = dmap.eigenvectors | ||
|
||
From here we can perform an embedding onto the k dominant eigenvectors. This | ||
is similar to the idea of a transform in Principal Component Analysis, but the | ||
non-linearity of the map means there is no explicit relationship between the | ||
lower dimensional space and our original trajectory. However, this is an | ||
isometry (distance preserving map), which means that points close in the lower | ||
dimensional space are close in the higher-dimensional space and vice versa. | ||
In order to embed into the most relevant low-dimensional space, there should | ||
exist some number of dominant eigenvectors, whose corresponding eigenvalues | ||
diminish at a constant rate until falling off, this is referred to as a | ||
spectral gap and should be somewhat apparent for a system at equilibrium with a | ||
high number of frames. | ||
|
||
>>> num_eigenvectors = # some number less than the number of frames-1 | ||
>>> dmap.transform(num_eigenvectors) | ||
|
||
From here it can be difficult to interpret the data, and is left as a task | ||
for the user. The `diffusion distance` between frames i and j is best | ||
approximated by the euclidean distance between rows i and j of | ||
self.diffusion_space. A Jupyter notebook providing an analysis of protein | ||
opening and closing is provided [here](#TODO url to notebook) | ||
|
||
Classes | ||
------- | ||
|
||
.. autoclass:: DiffusionMap | ||
.. autoclass:: DistMatrix | ||
|
||
References | ||
--------- | ||
|
||
If you use this Dimension Reduction method in a publication, please | ||
reference: | ||
..[Ferguson1] Ferguson, A. L.; Panagiotopoulos, A. Z.; Kevrekidis, I. G.; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is the title of this paper? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Nonlinear dimensionality reduction in molecular simulation: The diffusion map approach |
||
Debenedetti, P. G. Chem. Phys. Lett. 2011, 509, 1−11 | ||
..[Lafon1] Coifman, Ronald R., Lafon, Stephane (2006) Diffusion maps. | ||
Appl. Comput. Harmon. Anal. 21, 5–30. | ||
..[Clementi1] Rohrdanz, M. A, Zheng, W, Maggioni, M, & Clementi, C. (2013) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please check your references. For example this paper is from 2011 |
||
Determination of reaction coordinates via locally scaled | ||
diffusion map. Journal of Chemical Physics. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think you can also cite the 'introduction to diffusion maps` paper from de la Porte. That paper is to me the clearest explanation I found so far. |
||
|
||
.. If you choose the default metric, this module uses the fast QCP algorithm | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. where is this chown? |
||
[Theobald2005]_ to calculate the root mean square distance (RMSD) between | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please check that this is correctly cross referenced in the produced docs. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hi Max, I'm currently working on that 1 page paper. Will do. |
||
two coordinate sets (as implemented | ||
in :func:`MDAnalysis.lib.qcprot.CalcRMSDRotationalMatrix`). | ||
When using this module in published work please cite [Theobald2005]_. | ||
|
||
|
||
""" | ||
from six.moves import range | ||
import logging | ||
import warnings | ||
|
||
import MDAnalysis as mda | ||
import numpy as np | ||
|
||
from .rms import rmsd | ||
from .base import AnalysisBase | ||
|
||
logger = logging.getLogger("MDAnalysis.analysis.diffusionmap") | ||
|
||
|
||
class DistanceMatrix(AnalysisBase): | ||
""" Calculate the pairwise distance between each frame in a trajectory using | ||
a given metric | ||
|
||
Attributes | ||
---------- | ||
atoms : AtomGroup | ||
Selected atoms in trajectory subject to dimension reduction | ||
dist_matrix : array | ||
Array of all possible ij metric distances between frames in trajectory. | ||
This matrix is symmetric with zeros on the diagonal. | ||
|
||
Methods | ||
------- | ||
save(filename) | ||
Save the `dist_matrix` to a given filename | ||
""" | ||
def __init__(self, u, select='all', metric=rmsd, cutoff=1E0-5, | ||
weights=None, start=None, stop=None, step=None): | ||
""" | ||
Parameters | ||
---------- | ||
u : trajectory `~MDAnalysis.core.AtomGroup.Universe` | ||
The MD Trajectory for dimension reduction, remember that | ||
computational cost of eigenvalue decomposition | ||
scales at O(N^3) where N is the number of frames. | ||
Cost can be reduced by increasing step interval or specifying a | ||
start and stop. | ||
select: str, optional | ||
Any valid selection string for | ||
:meth:`~MDAnalysis.core.AtomGroup.AtomGroup.select_atoms` | ||
This selection of atoms is used to calculate the RMSD between | ||
different frames. Water should be excluded. | ||
metric : function, optional | ||
Maps two numpy arrays to a float, is positive definite and | ||
symmetric, Default: metric is set to rms.rmsd(). | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this should expand what a metric is and the API it follows. |
||
cutoff : float, optional | ||
Specify a given cutoff for metric values to be considered equal, | ||
Default: 1EO-5 | ||
weights : array, optional | ||
Weights to be given to coordinates for metric calculation | ||
start : int, optional | ||
First frame of trajectory to analyse, Default: 0 | ||
stop : int, optional | ||
Last frame of trajectory to analyse, Default: -1 | ||
step : int, optional | ||
Step between frames to analyse, Default: 1 | ||
""" | ||
|
||
self._u = u | ||
traj = self._u.trajectory | ||
self.atoms = self._u.select_atoms(select) | ||
self._metric = metric | ||
self._cutoff = cutoff | ||
self._weights = weights | ||
self._calculated = False | ||
# remember that this must be called before referencing self.nframes | ||
self._setup_frames(traj, start, stop, step) | ||
|
||
def _prepare(self): | ||
self.dist_matrix = np.zeros((self.nframes, self.nframes)) | ||
|
||
def _single_frame(self): | ||
iframe = self._ts.frame | ||
i_ref = self.atoms.positions | ||
# diagonal entries need not be calculated due to metric(x,x) == 0 in | ||
# theory, _ts not updated properly. Possible savings by setting a | ||
# cutoff for significant decimal places to sparsify matrix | ||
for j, ts in enumerate(self._u.trajectory[iframe:self.stop:self.step]): | ||
self._ts = ts | ||
j_ref = self.atoms.positions | ||
dist = self._metric(i_ref, j_ref, weights=self._weights) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think we can allow optional arguments for the metric. Because some might want to center as well. |
||
self.dist_matrix[self._frame_index, j+self._frame_index] = ( | ||
dist if dist > self._cutoff else 0) | ||
self.dist_matrix[j+self._frame_index, self._frame_index] = ( | ||
self.dist_matrix[self._frame_index, j+self._frame_index]) | ||
self._ts = self._u.trajectory[iframe] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please make a comment why you are resetting the trajectory |
||
|
||
def _conclude(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this isn't good. If I want to change my choice of epsilon I still have to recalculate the complete distance matrix! Or close knowledge about the inner working of this class. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for all this valuable input @kain88-de! Perhaps you're right and I went the wrong way. I want to allow the function to be tuneable for custom purposes such that the way epsilon is calculated is left up to the user. In addition, it would be nice if someone working in a scripting environment could just write a function in there script and pass it as an argument. Given some explicit instructions, a user passes epsilon=function_written_for_choice_of_epsilon and the arguments necessary to execute the function. (Would this be in the form of **kwargs?, I am not entirely sure of the formalism). Of course, many epsilon functions would take the distance matrix as a parameter. Hopefully that makes sense and indicates why I think this wouldn't require recalculating the complete distance matrix. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. write up some usage examples that you are thinking of. They can later be used as documentation as well. Then it becomes clearer what you want to do. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, good suggestion thanks. |
||
self._calculated = True | ||
|
||
def save(self, filename): | ||
np.save(filename, self.dist_matrix) | ||
logger.info("Wrote the distance-squared matrix to file %r", filename) | ||
|
||
|
||
class DiffusionMap(object): | ||
"""Non-linear dimension reduction method | ||
|
||
Dimension reduction with diffusion mapping of selected structures in a | ||
trajectory. | ||
|
||
Attributes | ||
---------- | ||
eigenvalues: array | ||
Eigenvalues of the diffusion map | ||
eigenvectors: array | ||
Eigenvectors of the diffusion map | ||
diffusion_space : array | ||
After calling `transform(num_eigenvectors)` the diffusion map embedding | ||
into the lower dimensional diffusion space will exist here. | ||
|
||
Methods | ||
------- | ||
run() | ||
Constructs an anisotropic diffusion kernel and performs eigenvalue | ||
decomposition on it. | ||
transform(num_eigenvectors) | ||
Perform an embedding of a frame into the eigenvectors representing | ||
the collective coordinates. | ||
""" | ||
|
||
def __init__(self, u, epsilon=1, manifold_density=None, | ||
timescale=1, **kwargs): | ||
""" | ||
Parameters | ||
------------- | ||
u : MDAnalysis Universe or DistanceMatrix object | ||
Can be a Universe, in which case one must supply kwargs for the | ||
initialization of a DistanceMatrix. Otherwise, this can be a | ||
DistanceMatrix already initialized. Either way, this will be made | ||
into a diffusion kernel. | ||
epsilon : Float | ||
Specifies the method used for the choice of scale parameter in the | ||
diffusion map. More information in [1], [2] and [3], Default: 1. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why a default of 1? What unit is the epsilon? |
||
manifold_density: list, optional | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is a manifold density? What would be good weights? Why would a user change the weight of a frame? From which paper did you get that this should be done? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a cool result from Lafon's Diffusion Maps Paper, Page 14. He provides a proof to show how diffusion maps can reflect a system that is modeled by the Fokker-Plank equation. (Does that sentence make sense?) If I'm not mistaken MD simulations are one such area, and it is taken into consideration in Dr. Clementi's locally scaled diffusion and free energy landscapes paper. I will gladly admit that these more advanced topics are a bit beyond me. I can't tell if you want to see this gotten rid of or if I should work on a better description of this in the documentation. I can provide a link to my blog in the docs where I talked about this derivation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah Focker-Plank equation and diffusion are related I know (also good the correspondence can be shown for diffusion maps). Yes MD-simulations can be modeled with Fokker-Plank equations. What I would like to know is what is a manifold_density? Your answer doesn't give me the description. I still don't know what this parameter does from reading the docs nor do I know what sensible values are as a user nor could I guess them. Please explain what the manifold_density is for the diffusion maps and how they influence the result. If we can't explain it think about leaving it out. Please don't link to blog posts. We don't know how long they will be around and MDAnalysis will likely be around for another 10 years so I rather keep everything in the docs self contained. References to publications are OK. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're absolutely right, I will do some research and get back to you with a
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay so with 10 minutes of reading another Lafon paper he gives the density at a point x to be equal to $\int d(x,y) Pr{y} dy $ Pr{y} is a probability measure equal to e^U(y), where U is the potential function on the diffusion process. Does this make sense? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok that is a better answer. So you are saying you want to implement a anisotropic algorithm for the diffusion maps. I have so far not found that information in the docstrings for this class. As I read the paper they provide an equation (11) for p_{epsilon} that can be calculated from the data. And I assume you are referring to the first unnumbered equations in chapter 4. anisotropic diffusion. Then the integral you have written is wrong. Second from the equation in the paper I read $\int d(y) k_{epsilon}(x,y) Pr{y} dy $ that the probability to be in state x is equal to the "sum" of all transitions to x from any point y considering the probability to be in y. Which sounds unnecessary confusing. Or do you mean equation 13? Even in that case we can hardly give any estimate because the p_{epsilon} is some normalization and not a probability density. Giving this as a user is really hard, because having a good estimate of \mu is hard in high dimensions (more then 3) and would help us solve the complete state equation to a large extend. I currently think your implementation has algorithms from different papers mixed up. Please once write up in latex the formulas you use. What each symbol stands for and from what paper you have copied the algorithm. Since they all provide test-case data we can compare against that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before we have any more math discussion I think it's the best when you once write down what exactly you are doing on one page in latex. That page will then be used as a reference to check your implementation. At least to remove the confusion in my head in regard to what you are doing ;-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please prepare the one sheet paper with citations. Then I can say more where I have questions about the details. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, this certainly the best medium for an easy-to-follow dialogue. |
||
The list has to have the same length as the trajectory. | ||
With 'None' the weight of each frame of the trajectory will be the | ||
same, Default : None | ||
timescale: int, optional | ||
The number of steps in the random walk, large t reflects global | ||
structure whereas small t indicates local structure, Default: 1 | ||
""" | ||
if isinstance(u, mda.Universe): | ||
self._dist_matrix = DistanceMatrix(u, **kwargs) | ||
elif isinstance(u, DistanceMatrix): | ||
self._dist_matrix = u | ||
else: | ||
raise ValueError("U is not a Universe or DistanceMatrix and" | ||
" so the DiffusionMap has no data to work with.") | ||
self._epsilon = epsilon | ||
# important for transform function and length of .run() method | ||
self._nframes = self._dist_matrix.nframes | ||
if self._nframes > 2000: | ||
warnings.warn("The distance matrix is very large, and can " | ||
"be very slow to compute. Consider picking a larger " | ||
"step size in distance matrix initialization.") | ||
|
||
# determines length of diffusion process | ||
self._t = timescale | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please use the same names as the input variables. This makes them easier to look up later. |
||
|
||
if manifold_density is None: | ||
# weights do not apply to metric but density of data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is something you can read about in page 14 of the Lafon Paper or my diffusion maps blog post. The weight quantity is supposed to reflect how crowded points are on the free energy surface in the case of the adk simulation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well good if I'd know that my problem would be solved. The density is what determines the free energy landscape and hard to calculate accurately, especially if you talk about the density in the full conformation space. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And this is not documented which would be incredible important here since the meaning of weight deviates from the usual meaning of weight inside of MDAnalysis. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, I'll go back and have a look at the literature before I make any changes. I'll email @euhruska to see if he has any examples of literature using a density estimate. |
||
self._weights_ker = np.ones((self._nframes,)) | ||
else: | ||
if manifold_density.shape[0] != self._nframes: | ||
raise ValueError("The weight should have the same length as " | ||
'the trajectory') | ||
else: | ||
# density weights are constructed as relative to the mean | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why? |
||
self._weights_ker = (np.asarray(manifold_density, | ||
dtype=np.float64) / | ||
np.mean(manifold_density)) | ||
|
||
def run(self): | ||
# will only run if distance matrix not already calculated | ||
self._dist_matrix.run() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why wouldn't this trigger a rerun of the distmatrix? There is nothing in the AnalysisBase code or DistMatrix that would stop it from being run again. You have to do the check here manually. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Totally right, I don't know why these mistakes are popping up, I'll try to be more careful. |
||
self._scaled_matrix = self._dist_matrix.dist_matrix / self._epsilon | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. distance squared There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you looking at the Lafon paper and seeing h(||x-y||^2/ epsilon)? This is the thing you made me fix a while back. |
||
# this should be a reference to the same object as | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure when this comment used to make sense/ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Relic of Epsilon class, would permanently change dist_matrix.dist_matrix to reflect scaling. There weren't two matrices in memory. |
||
# self.dist_matrix.dist_matrix | ||
# take negative exponent of scaled matrix to create Isotropic kernel | ||
self._kernel = np.exp(-self._scaled_matrix) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why saving the |
||
# define an anistropic diffusion term q | ||
q_vector = np.zeros((self._nframes, )) | ||
# weights should reflect the density of the points on the manifold | ||
for i in range(self._nframes): | ||
q_vector[i] = np.dot(self._kernel[i, :], self._weights_ker) | ||
|
||
# Form a new kernel from the anisotropic diffusion term q | ||
self._kernel /= np.sqrt(q_vector[:, np.newaxis].dot(q_vector[np.newaxis])) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. From which paper is this. I'm not familiar what q_vector is doing here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Lafon page 14 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are saying this q_vector comes from this paper? Coifman, R. R., & Lafon, S. (2006). Diffusion maps. Applied and Computational Harmonic Analysis, 21(1), 5–30. http://doi.org/10.1016/j.acha.2006.04.006 This wouldn't calculate a kernel but some d vector according to that page. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. |
||
|
||
# Weighted Graph Laplacian normalization on this new graph | ||
d_vector = np.zeros((self._nframes, )) | ||
for i in range(self._nframes): | ||
d_vector[i] = np.dot(self._kernel[i, :], self._weights_ker) | ||
|
||
for i in range(self._nframes): | ||
self._kernel[i, :] = self._kernel[i, :] * self._weights_ker | ||
|
||
# Define anisotropic transition by dividing kernel by this term | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You are creating a row-stochastic matrix here that "defines" a discrete markov process There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. These aren't mutually exclusive terms. If you think that better indicates the underlying framework of diffusion maps I'd probably agree, but these comments were just left by me to indicate how I was following the algorithm supplied in page 14 of Lafon's paper for the construction of an anisotropic kernel. |
||
self._kernel /= np.sqrt(d_vector[:, np.newaxis].dot(d_vector[np.newaxis])) | ||
|
||
# Apply timescaling | ||
for i in range(self._t): | ||
if i > 0: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Another thing. Isn't there a matrix power somewhere in numpy/scipy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Urgh I don't know why I didn't find this but there is numpy.linalg.matrix_power. Will change. |
||
self._kernel = self._kernel.dot(self._kernel) | ||
|
||
eigenvals, eigenvectors = np.linalg.eig(self._kernel) | ||
eg_arg = np.argsort(eigenvals) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can already change the sorting direction here. Since these are indices it would also be nice if the name reflects that. |
||
self.eigenvalues = eigenvals[eg_arg[::-1]] | ||
self.eigenvalues = self.eigenvalues[1:] | ||
self.eigenvectors = eigenvectors[eg_arg[::-1], :] | ||
self.eigenvectors = self.eigenvectors[1:] | ||
self._calculated = True | ||
|
||
def transform(self, num_eigenvectors): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This function doesn't check that decomposition was called. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And how would someone perform such a check? (Asking for a friend...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are two ways. Either check that variables exists that are only created in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. noted thanks |
||
""" Embeds a trajectory via the diffusion map | ||
|
||
Parameter | ||
--------- | ||
num_eigenvectors : int | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. standard naming for number of something is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I had no idea, thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is what we use in MDAnalysis. See |
||
The number of dominant eigenvectors to be used for | ||
diffusion mapping | ||
|
||
Return | ||
------ | ||
diffusion_space : array | ||
The diffusion map embedding as defined by [Ferguson1]_. | ||
This isn't a linear transformation, but an isometry | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this explanaition doesn't belong here. The first sentence is OK at this point but the rest should be at some other place. Either the general module docstring or in the class docstring. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay |
||
between the higher dimensional space and the space spanned by | ||
the eigenvectors. | ||
""" | ||
self.diffusion_space = (self.eigenvectors.T[:,:num_eigenvectors] * | ||
self.eigenvalues[:num_eigenvectors]) | ||
return self.diffusion_space | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. to a straight return self.eigenvectors.T[:, :n_eigenvectors] * self.eigenvalues[:n_eigenvectors] There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This makes total sense, will fix. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
.. automodule:: MDAnalysis.analysis.diffusionmap |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So for a documentation page to get built you also need to add a rst stub in
package/doc/sphinx/source/documentation_pages/analysis
(look at others in that dir) and a link to the page indocumentation_pages/analysis_modules.rst
Each rst file becomes a page at docs.mdanalysis.org, and it's the
automodule
thing in the rst that basically just points to this source here for content.