-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Diffusion Map Refactor/Implementation #863
Changes from 1 commit
c511ead
3be892e
3ce705a
328c27a
d2d4593
ed5f4c5
2947c0f
7e94a58
ab7f93c
43144e0
18c84ee
0da96fa
2c93f2d
49578a0
7ec486a
f350eaa
245d2e8
17fced2
14b08e4
cfcbfe0
07a83d1
6faec56
48c27a9
fe08866
cafeb68
bb394ec
074a950
68f2928
98e899b
b893244
7b12ece
e5a3487
cbf685d
e0ed7b4
303b816
01dc0d5
2f2db09
34b1e7c
7c56863
bd40f38
420edd0
c058c6c
b6c309a
8ba1a58
aa22772
97c3105
b3b2ce9
c5d1958
d7cd454
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
Changed code further to make transform a simple iteration. Provided functionality for doing a diffusion map with just a distance matrix.
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -160,8 +160,7 @@ class DistanceMatrix(AnalysisBase): | |
dist_matrix : array | ||
Array of all possible ij metric distances between frames in trajectory. | ||
This matrix is symmetric with zeros on the diagonal. | ||
calculated : boolean | ||
Boolean indicating if the distance matrix has been calculated. | ||
|
||
Methods | ||
------- | ||
save(filename) | ||
|
@@ -205,7 +204,7 @@ def __init__(self, u, select='all', metric=rmsd, cutoff=1E0-5, | |
self._metric = metric | ||
self._cutoff = cutoff | ||
self._weights = weights | ||
self.calculated = False | ||
self._calculated = False | ||
# remember that this must be called before referencing self.nframes | ||
self._setup_frames(traj, start, stop, step) | ||
|
||
|
@@ -214,23 +213,22 @@ def _prepare(self): | |
|
||
def _single_frame(self): | ||
iframe = self._ts.frame | ||
i_ref = self.atoms.positions - self.atoms.center_of_mass() | ||
i_ref = self.atoms.positions | ||
# diagonal entries need not be calculated due to metric(x,x) == 0 in | ||
# theory, _ts not updated properly. Possible savings by setting a | ||
# cutoff for significant decimal places to sparsify matrix | ||
for j, ts in enumerate(self._u.trajectory[iframe:self.stop:self.step]): | ||
self._ts = ts | ||
j_ref = self.atoms.positions-self.atoms.center_of_mass() | ||
j_ref = self.atoms.positions | ||
dist = self._metric(i_ref, j_ref, weights=self._weights) | ||
self.dist_matrix[self._index, j+self._index] = (dist if dist > | ||
self._cutoff else 0) | ||
self.dist_matrix[j+self._index, self._index] = (self.dist_matrix[ | ||
self._index, | ||
j+self._index]) | ||
self.dist_matrix[self._frame_index, j+self._frame_index] = ( | ||
dist if dist > self._cutoff else 0) | ||
self.dist_matrix[j+self._frame_index, self._frame_index] = ( | ||
self.dist_matrix[self._frame_index, j+self._frame_index]) | ||
self._ts = self._u.trajectory[iframe] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please make a comment why you are resetting the trajectory |
||
|
||
def _conclude(self): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this isn't good. If I want to change my choice of epsilon I still have to recalculate the complete distance matrix! Or close knowledge about the inner working of this class. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks for all this valuable input @kain88-de! Perhaps you're right and I went the wrong way. I want to allow the function to be tuneable for custom purposes such that the way epsilon is calculated is left up to the user. In addition, it would be nice if someone working in a scripting environment could just write a function in there script and pass it as an argument. Given some explicit instructions, a user passes epsilon=function_written_for_choice_of_epsilon and the arguments necessary to execute the function. (Would this be in the form of **kwargs?, I am not entirely sure of the formalism). Of course, many epsilon functions would take the distance matrix as a parameter. Hopefully that makes sense and indicates why I think this wouldn't require recalculating the complete distance matrix. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. write up some usage examples that you are thinking of. They can later be used as documentation as well. Then it becomes clearer what you want to do. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, good suggestion thanks. |
||
self.calculated = True | ||
self._calculated = True | ||
|
||
def save(self, filename): | ||
np.save(filename, self.dist_matrix) | ||
|
@@ -242,49 +240,53 @@ class Epsilon(object): | |
|
||
Attributes | ||
---------- | ||
scaled_matrix : DistanceMatrix object | ||
scaled_matrix : numpy array | ||
A matrix with each term divided by a local scale parameter | ||
calculated : boolean | ||
True after determine epsilon called | ||
|
||
Methods | ||
------- | ||
determine_epsilon() | ||
Determine local scale parameters using a chosen algorithm | ||
""" | ||
def __init__(self, DistanceMatrix, **kwargs): | ||
self._dist_matrix = DistMatrix | ||
def __init__(self, dist_matrix): | ||
if dist_matrix._calculated: | ||
# the DistanceMatrix object | ||
self.dist_matrix = dist_matrix | ||
# the actual numpy array containing the data | ||
self.scaled_matrix = dist_matrix.dist_matrix | ||
else: | ||
raise AttributeError('Distance Matrix does not exist, was' | ||
'DistanceMatrix.run() called?') | ||
self._calculated = False | ||
|
||
def determine_epsilon(self): | ||
# set self.calculated = True here | ||
pass | ||
|
||
|
||
class EpsilonConstant(Epsilon): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't use this anymore so I think it can go. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is so weird. I had deleted it in a previous commit but I think I mistakenly brought it back into diffusion when creating a new Epsilon class. Will get rid of it. |
||
"""Premade function for the determination of epsilon based on providing | ||
an arbitrary constant""" | ||
|
||
def __init__(self, DistanceMatrix, epsilon): | ||
def __init__(self, dist_matrix, epsilon): | ||
""" | ||
Parameters | ||
---------- | ||
epsilon : int | ||
The value of epsilon to be used as a local scale parameter | ||
""" | ||
if DistanceMatrix.calculated: | ||
self.scaled_matrix = DistanceMatrix.dist_matrix | ||
else: | ||
raise AttributeError('Distance Matrix does not exist, was' | ||
'DistanceMatrix.run() called?') | ||
|
||
Epsilon.__init__(self, dist_matrix) | ||
self._epsilon = epsilon | ||
self.calculated = False | ||
|
||
def determine_epsilon(self): | ||
self.scaled_matrix /= self._epsilon | ||
self.calculated = True | ||
# set self.calculated = True here | ||
if not self._calculated: | ||
self.scaled_matrix /= self._epsilon | ||
self._calculated = True | ||
return | ||
|
||
|
||
class DiffusionMap(AnalysisBase): | ||
class DiffusionMap(object): | ||
"""Non-linear dimension reduction method | ||
|
||
Dimension reduction with diffusion mapping of selected structures in a | ||
|
@@ -310,59 +312,63 @@ class DiffusionMap(AnalysisBase): | |
the collective coordinates. | ||
""" | ||
|
||
def __init__(self, DistanceMatrix, Epsilon, weights=None, timescale=1): | ||
def __init__(self, dist_matrix, epsilon=1, manifold_density=None, timescale=1): | ||
""" | ||
Parameters | ||
------------- | ||
DistanceMatrix : DistanceMatrix object | ||
dist_matrix : DistanceMatrix object | ||
Distance matrix to be made into a diffusion kernel and perform | ||
an eigenvector decomposition on. | ||
Epsilon : Epsilon object | ||
epsilon : Float or Epsilon object | ||
Specifies the method used for the choice of scale parameter in the | ||
diffusion map. More information in [1], [2] and [3]. | ||
weights: list, optional | ||
diffusion map. More information in [1], [2] and [3]. If a float | ||
is given, this will be the scale parameter used for | ||
the kernel. | ||
manifold_density: list, optional | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what is a manifold density? What would be good weights? Why would a user change the weight of a frame? From which paper did you get that this should be done? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a cool result from Lafon's Diffusion Maps Paper, Page 14. He provides a proof to show how diffusion maps can reflect a system that is modeled by the Fokker-Plank equation. (Does that sentence make sense?) If I'm not mistaken MD simulations are one such area, and it is taken into consideration in Dr. Clementi's locally scaled diffusion and free energy landscapes paper. I will gladly admit that these more advanced topics are a bit beyond me. I can't tell if you want to see this gotten rid of or if I should work on a better description of this in the documentation. I can provide a link to my blog in the docs where I talked about this derivation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah Focker-Plank equation and diffusion are related I know (also good the correspondence can be shown for diffusion maps). Yes MD-simulations can be modeled with Fokker-Plank equations. What I would like to know is what is a manifold_density? Your answer doesn't give me the description. I still don't know what this parameter does from reading the docs nor do I know what sensible values are as a user nor could I guess them. Please explain what the manifold_density is for the diffusion maps and how they influence the result. If we can't explain it think about leaving it out. Please don't link to blog posts. We don't know how long they will be around and MDAnalysis will likely be around for another 10 years so I rather keep everything in the docs self contained. References to publications are OK. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're absolutely right, I will do some research and get back to you with a
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay so with 10 minutes of reading another Lafon paper he gives the density at a point x to be equal to $\int d(x,y) Pr{y} dy $ Pr{y} is a probability measure equal to e^U(y), where U is the potential function on the diffusion process. Does this make sense? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok that is a better answer. So you are saying you want to implement a anisotropic algorithm for the diffusion maps. I have so far not found that information in the docstrings for this class. As I read the paper they provide an equation (11) for p_{epsilon} that can be calculated from the data. And I assume you are referring to the first unnumbered equations in chapter 4. anisotropic diffusion. Then the integral you have written is wrong. Second from the equation in the paper I read $\int d(y) k_{epsilon}(x,y) Pr{y} dy $ that the probability to be in state x is equal to the "sum" of all transitions to x from any point y considering the probability to be in y. Which sounds unnecessary confusing. Or do you mean equation 13? Even in that case we can hardly give any estimate because the p_{epsilon} is some normalization and not a probability density. Giving this as a user is really hard, because having a good estimate of \mu is hard in high dimensions (more then 3) and would help us solve the complete state equation to a large extend. I currently think your implementation has algorithms from different papers mixed up. Please once write up in latex the formulas you use. What each symbol stands for and from what paper you have copied the algorithm. Since they all provide test-case data we can compare against that. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Before we have any more math discussion I think it's the best when you once write down what exactly you are doing on one page in latex. That page will then be used as a reference to check your implementation. At least to remove the confusion in my head in regard to what you are doing ;-) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please prepare the one sheet paper with citations. Then I can say more where I have questions about the details. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure, this certainly the best medium for an easy-to-follow dialogue. |
||
The list has to have the same length as the trajectory. | ||
With 'None' the weight of each frame of the trajectory will be the | ||
same. | ||
timescale: int, optional | ||
The number of steps in the random walk, large t reflects global | ||
structure whereas small t indicates local structure. | ||
""" | ||
# check .run() called | ||
if DistanceMatrix.calculated: | ||
self._dist_matrix = DistanceMatrix | ||
else: | ||
raise AttributeError('Distance Matrix does not exist, was' | ||
'DistanceMatrix.run() called?') | ||
|
||
# .run() called in _prepare | ||
self._dist_matrix = dist_matrix | ||
# because we check for .run() in Epsilon class, we must be sure to call | ||
# dist_matrix.run() before determine_epsilon, if a user wishes to | ||
# provide their own epsilon class, dist_matrix.run() must be run | ||
# prior to Diffusion Map initilization | ||
self._epsilon = epsilon | ||
# important for transform function | ||
self._nframes = self._dist_matrix.nframes | ||
# determines length of diffusion process | ||
self._t = timescale | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. keep names as from the init arguments for consistency and easier understanding of other methods that use the init arguments. (best practice) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. please use the same names as the input variables. This makes them easier to look up later. |
||
|
||
# check determine_epsilon called | ||
if Epsilon.calculated: | ||
self._kernel = Epsilon.scaled_matrix | ||
else: | ||
raise AttributeError('scaled_matrix does not exist, was' | ||
'determine_epsilon() called?') | ||
self._epsilon = Epsilon | ||
|
||
if weights is None: | ||
if manifold_density is None: | ||
# weights do not apply to metric but density of data | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. what? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is something you can read about in page 14 of the Lafon Paper or my diffusion maps blog post. The weight quantity is supposed to reflect how crowded points are on the free energy surface in the case of the adk simulation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well good if I'd know that my problem would be solved. The density is what determines the free energy landscape and hard to calculate accurately, especially if you talk about the density in the full conformation space. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And this is not documented which would be incredible important here since the meaning of weight deviates from the usual meaning of weight inside of MDAnalysis. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Okay, I'll go back and have a look at the literature before I make any changes. I'll email @euhruska to see if he has any examples of literature using a density estimate. |
||
self._weights_ker = np.ones((self._nframes,)) | ||
else: | ||
if weights.shape[0] != self._nframes: | ||
raise ValueError("The weight should have the same length as " | ||
'the trajectory') | ||
else: | ||
# weights are constructed as relative to the mean | ||
# density weights are constructed as relative to the mean | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why? |
||
self._weights_ker = (np.asarray(weights, dtype=np.float64) / | ||
np.mean(weights)) | ||
|
||
def decompose_kernel(self): | ||
def run(self): | ||
# will only run if distance matrix not already calculated | ||
self._dist_matrix.run() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why wouldn't this trigger a rerun of the distmatrix? There is nothing in the AnalysisBase code or DistMatrix that would stop it from being run again. You have to do the check here manually. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Totally right, I don't know why these mistakes are popping up, I'll try to be more careful. |
||
# this is the case when user provides no Epsilon class | ||
if isinstance(self._epsilon, float) or isinstance(self._epsilon, int): | ||
self._epsilon = EpsilonConstant(self._dist_matrix, | ||
self._epsilon) | ||
|
||
self._epsilon.determine_epsilon() | ||
# this should be a reference to the same object as | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. not sure when this comment used to make sense/ There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Relic of Epsilon class, would permanently change dist_matrix.dist_matrix to reflect scaling. There weren't two matrices in memory. |
||
# self.dist_matrix.dist_matrix | ||
# take negative exponent of scaled matrix to create Isotropic kernel | ||
self._kernel = np.exp(-self._kernel) | ||
self._kernel = np.exp(-self._epsilon.scaled_matrix) | ||
|
||
# define an anistropic diffusion term q | ||
q_vector = np.zeros((self._nframes, )) | ||
|
@@ -394,18 +400,7 @@ def decompose_kernel(self): | |
eg_arg = np.argsort(eigenvals) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can already change the sorting direction here. Since these are indices it would also be nice if the name reflects that. |
||
self.eigenvalues = eigenvals[eg_arg[::-1]] | ||
self.eigenvectors = eigenvectors[eg_arg[::-1], :] | ||
|
||
def _prepare(self): | ||
self.diffusion_space = np.zeros((self.eigenvectors.shape[0], | ||
self.num_eigenvectors)) | ||
|
||
def _single_frame(self): | ||
# The diffusion map embedding takes the ith sample in the | ||
# data matrix and maps it to each of the ith coordinates | ||
# in the set of k-dominant eigenvectors | ||
for k in range(self.num_eigenvectors): | ||
self.diffusion_space[self._index][k] = (self.eigenvectors[k][ | ||
self._index]) | ||
self._calculated = True | ||
|
||
def transform(self, num_eigenvectors): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This function doesn't check that decomposition was called. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And how would someone perform such a check? (Asking for a friend...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are two ways. Either check that variables exists that are only created in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. noted thanks |
||
""" Embeds a trajectory via the diffusion map | ||
|
@@ -424,9 +419,11 @@ def transform(self, num_eigenvectors): | |
between the higher dimensional space and the space spanned by | ||
the eigenvectors. | ||
""" | ||
self.num_eigenvectors = num_eigenvectors | ||
self._setup_frames(self._dist_matrix._u.trajectory, | ||
self._dist_matrix.start, self._dist_matrix.stop, | ||
self._dist_matrix.step) | ||
self.run() | ||
self.diffusion_space = np.zeros((self.eigenvectors.shape[0], | ||
num_eigenvectors)) | ||
|
||
for i in range(self.nframes): | ||
for j in range(num_eigenvectors): | ||
self.diffusion_space[i][j] = self.eigenvectors[j][i] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is just a transpose operation. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're totally right, will fix. |
||
|
||
return self.diffusion_space | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. to a straight return self.eigenvectors.T[:, :n_eigenvectors] * self.eigenvalues[:n_eigenvectors] There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This makes total sense, will fix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can allow optional arguments for the metric. Because some might want to center as well.