Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diffusion Map Refactor/Implementation #863

Closed
wants to merge 49 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
Show all changes
49 commits
Select commit Hold shift + click to select a range
c511ead
new analysis feature: diffusion map
Mar 2, 2016
3be892e
description of diffusion map
Mar 2, 2016
3ce705a
diffusionmap minor fix
Mar 2, 2016
328c27a
test for diffusionmap
Mar 2, 2016
d2d4593
diffusionmap faster test
Mar 2, 2016
ed5f4c5
included notes
Mar 11, 2016
2947c0f
initial work for refactor
jdetle May 24, 2016
7e94a58
functional refactor
jdetle May 24, 2016
ab7f93c
updated changes
jdetle May 24, 2016
43144e0
iteration error
jdetle May 29, 2016
18c84ee
refactor of @euhruska diffusion map
jdetle May 29, 2016
0da96fa
removed deco import
jdetle May 29, 2016
2c93f2d
added tests, fixed logger issue
jdetle May 30, 2016
49578a0
style fixes, protection fixes, added docs
jdetle Jun 3, 2016
7ec486a
fix of some broken areas
jdetle Jun 3, 2016
f350eaa
small docs fix
jdetle Jun 4, 2016
245d2e8
Constant epsilon tests
jdetle Jun 6, 2016
17fced2
updated CHANGELOG
jdetle Jun 6, 2016
14b08e4
fixed selections and updated tests accordingly
jdetle Jun 8, 2016
cfcbfe0
switched order of imports [skip ci]
jdetle Jun 8, 2016
07a83d1
Separated calculation of distance matrix from diffusion
jdetle Jun 10, 2016
6faec56
Working refactor
jdetle Jun 10, 2016
48c27a9
Change of direction in API, more work to be done implementing everything
jdetle Jun 16, 2016
fe08866
added timescaling, fixed some parts
jdetle Jun 16, 2016
cafeb68
Automodule added, Changed some object oriented stuff around
jdetle Jun 16, 2016
bb394ec
Changed things around to get embedding working
jdetle Jun 16, 2016
074a950
Docs fixes
jdetle Jun 17, 2016
68f2928
Added tests, found bugs, squashed bugs.
jdetle Jun 17, 2016
98e899b
Eliminated square of distance, random mistake
jdetle Jun 17, 2016
b893244
Now witness the firepower of this fully armed and
jdetle Jun 18, 2016
7b12ece
Satisfying the linter
jdetle Jun 18, 2016
e5a3487
Updated docs
jdetle Jun 18, 2016
cbf685d
Fixed bug in _single_frame()
jdetle Jun 19, 2016
e0ed7b4
Eliminated spectral gap function... for now
jdetle Jun 21, 2016
303b816
Fixed matrix multiplication bug
jdetle Jun 23, 2016
01dc0d5
Better single_frame loop
jdetle Jun 27, 2016
2f2db09
Refactor of code to stop DiffusionMap inheriting from AnalysisBase
jdetle Jun 29, 2016
34b1e7c
Removed epsilon class from pull request, now must supply a constant a…
jdetle Jun 29, 2016
7c56863
Added exception to be thrown for large distance matrices.
jdetle Jun 29, 2016
bd40f38
Updated diffusion map after some test-driven development, added error…
jdetle Jun 29, 2016
420edd0
Fixed some docs [skip ci]
jdetle Jun 30, 2016
c058c6c
Change to warning
jdetle Jun 30, 2016
b6c309a
Updated CHANGELOG
jdetle Jun 30, 2016
8ba1a58
Fixes as suggested to be more pythonic and elegant
jdetle Jul 1, 2016
aa22772
Fixes to docs, Transform function.
jdetle Jul 3, 2016
97c3105
Removed anisotropic kernel creation. Figured out that I was
jdetle Jul 3, 2016
b3b2ce9
Small style change
jdetle Jul 5, 2016
c5d1958
Updated docs to remove references to anisotropy, update tutorial [ski…
jdetle Jul 5, 2016
d7cd454
Added link to jupyter notebook [skip ci]
jdetle Jul 6, 2016
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
52 changes: 4 additions & 48 deletions package/MDAnalysis/analysis/diffusionmap.py
Original file line number Diff line number Diff line change
Expand Up @@ -260,8 +260,7 @@ class DiffusionMap(object):
the collective coordinates.
"""

def __init__(self, u, epsilon=1, manifold_density=None,
timescale=1, **kwargs):
def __init__(self, u, epsilon=1, **kwargs):
"""
Parameters
-------------
Expand All @@ -273,10 +272,6 @@ def __init__(self, u, epsilon=1, manifold_density=None,
epsilon : Float
Specifies the method used for the choice of scale parameter in the
diffusion map. More information in [1], [2] and [3], Default: 1.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why a default of 1? What unit is the epsilon?

manifold_density: list, optional
The list has to have the same length as the trajectory.
With 'None' the weight of each frame of the trajectory will be the
same, Default : None
timescale: int, optional
The number of steps in the random walk, large t reflects global
structure whereas small t indicates local structure, Default: 1
Expand All @@ -296,21 +291,6 @@ def __init__(self, u, epsilon=1, manifold_density=None,
"be very slow to compute. Consider picking a larger "
"step size in distance matrix initialization.")

# determines length of diffusion process
self._timescale = timescale

if manifold_density is None:
# weights do not apply to metric but density of data
self._weights_ker = np.ones((self._nframes,))
else:
if manifold_density.shape[0] != self._nframes:
raise ValueError("The weight should have the same length as "
'the trajectory')
else:
# density weights are constructed as relative to the mean
self._weights_ker = (np.asarray(manifold_density,
dtype=np.float64) /
np.mean(manifold_density))

def run(self):
# run only if distance matrix not already calculated
Expand All @@ -320,33 +300,9 @@ def run(self):
self._epsilon)
# take negative exponent of scaled matrix to create Isotropic kernel
self._kernel = np.exp(-self._scaled_matrix)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why saving the _scaled_matrix and not just calculate the kernel directly. Let cpython reclaim some memory. I think cpython is also good with temporary objects.

# define an anistropic diffusion term q
q_vector = np.zeros((self._nframes, ))
# weights should reflect the density of the points on the manifold
for i in range(self._nframes):
q_vector[i] = np.dot(self._kernel[i, :], self._weights_ker)

# Form a new kernel from the anisotropic diffusion term q
self._kernel /= (np.sqrt(q_vector[:, np.newaxis
].dot(q_vector[np.newaxis])))

# Weighted Graph Laplacian normalization on this new graph
d_vector = np.zeros((self._nframes, ))
for i in range(self._nframes):
d_vector[i] = np.dot(self._kernel[i, :], self._weights_ker)

for i in range(self._nframes):
self._kernel[i, :] = self._kernel[i, :] * self._weights_ker

# Define anisotropic transition by dividing kernel by this term
self._kernel /= np.sqrt(d_vector[:, np.newaxis].dot(d_vector[np.newaxis]))

# Apply timescaling
if self._timescale > 1:
self._kernel = np.linalg.matrix_power(self._kernel,
self._timescale)

eigenvals, eigenvectors = np.linalg.eig(self._kernel)
D_inv = np.diag(1 / self._kernel.sum(1))
self._diff = np.dot(D_inv, self._kernel)
eigenvals, eigenvectors = np.linalg.eig(self._diff)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you say it's better to use eigh?

Copy link
Contributor Author

@jdetle jdetle Jul 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did and apparently I may have been wrong, the two give different results:
Eigenvalues using epsilon = 5, eigh:

[ 1.02532315  0.63865029  0.50461156  0.46652167  0.43577238  0.33804695
  0.29255447  0.23455876  0.2047588   0.18443111  0.14899126  0.14097275
  0.12780406  0.11636266  0.11326329  0.11052915  0.10414109  0.10212337
  0.09150211  0.08408036]

Eigenvalues using epsilon = 5, eig:

[ 1.          0.69843825  0.51574675  0.47646882  0.40558677  0.3209098
  0.25232349  0.21522759  0.20871485  0.17725692  0.14328863  0.13184095
  0.11912811  0.11392278  0.10615283  0.10202534  0.10149993  0.09744823
  0.08432603  0.07256051]

Given that the first eigenvalue should be 1 I am weary of using eigh

eg_arg = np.argsort(eigenvals)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can already change the sorting direction here. Since these are indices it would also be nice if the name reflects that.

self.eigenvalues = eigenvals[eg_arg[::-1]]
self.eigenvectors = eigenvectors[eg_arg[::-1]]
Expand Down
27 changes: 1 addition & 26 deletions testsuite/MDAnalysisTests/analysis/test_diffusionmap.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,43 +37,18 @@ def test_eg(self):
# number of frames is trajectory is now 10 vs. 98
assert_equal(self.eigvals.shape, (self.dist.nframes, ))
# makes no sense to test values here, no physical meaning
del self.dmap

def test_dist_weights(self):
backbone = self.u.select_atoms('backbone')
weights_atoms = np.ones(len(backbone.atoms))
self.dist = diffusionmap.DistanceMatrix(self.u, select='backbone',
weights=weights_atoms)
self.dist.run()
del self.dist

def test_kernel_weights(self):
self.dist = diffusionmap.DistanceMatrix(self.u, select='backbone')
self.weights_ker = np.ones((self.dist.nframes, ))
self.dmap = diffusionmap.DiffusionMap(self.dist,
manifold_density=self.weights_ker)
self.dmap.run()
assert_array_almost_equal(self.eigvals, self.dmap.eigenvalues, decimal=5)
assert_array_almost_equal(self.eigvects, self.dmap.eigenvectors, decimal=6)
del self.dist, self.dmap

@raises(ValueError)
def test_wrong_kernel_weights(self):
self.dmap = diffusionmap.DiffusionMap(self.dist,
manifold_density=np.ones((2,)))
del self.dmap

def test_timescaling(self):
self.dmap = diffusionmap.DiffusionMap(self.u, timescale=2)
self.dmap.run()
assert_equal(self.dmap.eigenvalues.shape, (self.dmap._nframes, ))
del self.dmap

def test_different_steps(self):
self.dmap = diffusionmap.DiffusionMap(self.u, select='backbone', step=3)
self.dmap.run()
del self.dmap, self.dist


def test_transform(self):
self.n_eigenvectors = 4
self.dmap = diffusionmap.DiffusionMap(self.u)
Expand Down