Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Diffusion Map Refactor/Implementation #863

Closed
wants to merge 49 commits into from

Conversation

jdetle
Copy link
Contributor

@jdetle jdetle commented May 29, 2016

Fixes #857

Changes made in this Pull Request:

  • Refactor of @euhruska's work to fit bauhaus style
  • Currently working through literature to ensure module fits researcher needs, agrees with theory.

TODO:

  • Finish Tranform Function
  • Jupyter notebook explanation of diffusion maps analysis
  • Figure out if matrix exponentiation actually needs to be done for timescaling, (it does not, http://www.stat.cmu.edu/~cshalizi/350/lectures/15/lecture-15.pdf) The eigenvectors for A^m are the same as those for A. Stems from diagonizability of matrix)
  • Write up some notes in TeX for algorithm
  • Check all citations in docs
  • Switch to isotropic diffusion maps method.

PR Checklist

  • Tests?
  • Docs?
  • CHANGELOG updated?
  • Issue raised/referenced?

@jdetle
Copy link
Contributor Author

jdetle commented May 29, 2016

Should parallelization be something we do in another PR or should we tackle it in this?

@@ -63,6 +63,7 @@ Fixes

Changes

<<<<<<< HEAD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge conflicts left over from rebasing

@kain88-de
Copy link
Member

Great I'll have a look at it soon.

@jdetle
Copy link
Contributor Author

jdetle commented May 29, 2016

@kain88-de, I still need to update tests so you can let it wait a bit. Decrementing loop in single frame is a simple fix to make sure that all frames are hit in run method. At first, a normal doubly nested loop would stop after one _single_frame call because self._ts would increment every time the j_th trajectory frame was accessed. (I don't know if what I just said made any sense but it was a funky little problem)

@coveralls
Copy link

coveralls commented May 30, 2016

Coverage Status

Coverage increased (+0.03%) to 80.235% when pulling 0671f02 on jdetle:diffusion into d68cf73 on MDAnalysis:develop.

@coveralls
Copy link

coveralls commented May 30, 2016

Coverage Status

Coverage increased (+0.04%) to 80.25% when pulling 0671f02 on jdetle:diffusion into d68cf73 on MDAnalysis:develop.

@coveralls
Copy link

coveralls commented May 30, 2016

Coverage Status

Coverage increased (+0.04%) to 80.25% when pulling 6553583 on jdetle:diffusion into d68cf73 on MDAnalysis:develop.

@richardjgowers
Copy link
Member

@jdetle is this ready for a final review?

@jdetle
Copy link
Contributor Author

jdetle commented Jun 3, 2016

No, tests are passing but I had some conversations with @euhruska and there are some things that I need to make more clear in the docs. I'll have something up tomorrow with fixed docs and more tests.

@jdetle
Copy link
Contributor Author

jdetle commented Jun 3, 2016

Style and such is satisfactory for me, need more tests to address coverage. I will get on it after school hours.

@coveralls
Copy link

coveralls commented Jun 3, 2016

Coverage Status

Coverage increased (+0.05%) to 80.255% when pulling 0e15dcf on jdetle:diffusion into d68cf73 on MDAnalysis:develop.

@jdetle
Copy link
Contributor Author

jdetle commented Jun 6, 2016

This is ready for a review.

@jdetle
Copy link
Contributor Author

jdetle commented Jun 6, 2016

I found a bug and I believe it has to do with the fact that eigenvectors and eigenvalues are not unique, when running test_constant_epsilon eigenvectors are different because some coordinates are the negative of the other, even though the epsilon values and eigenvalues are the same. I'm not exactly sure how to test for the equality of results given this issue, so I deleted the test until I figure it out.

@jdetle jdetle changed the title [WIP] Diffusion Map Refactor Diffusion Map Refactor Jun 6, 2016
@kain88-de
Copy link
Member

Are you saying for an eigenvector v_i you are seeing in two different runs v_i == -v_i?
That would be normal this just means that they point in opposite directions (which is OK the eigenvalue problem says nothing about the direction of a vector). Mathematically you can insert -v_i into the eigenvalue problem, the minus 1 will cancel out.

@jdetle
Copy link
Contributor Author

jdetle commented Jun 6, 2016

No, I am saying I am seeing the jth coordinate of v_i equal to the the -jth
coordinate of v_i. But the vectors have the same norm but components aren't
necessarily all of the same sign.

On Mon, Jun 6, 2016 at 2:14 PM kain88-de [email protected] wrote:

Are you saying for an eigenvector v_i you are seeing in two different
runs v_i == -v_i?
That would be normal this just means that they point in opposite
directions (which is OK the eigenvalue problem says nothing about the
direction of a vector). Mathematically you can insert -v_i into the
eigenvalue problem, the minus 1 will cancel out.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#863 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AKcARt7YC45mPcCF9gkYO0C32CVxn-qPks5qJI2pgaJpZM4IpYIS
.

Have a wonderful day,
John J. Detlefs

@jdetle jdetle force-pushed the diffusion branch 2 times, most recently from 4723d6a to 6d80248 Compare June 6, 2016 21:29
@jdetle jdetle changed the title Diffusion Map Refactor Diffusion Map Refactor/Implementation Jun 6, 2016
@coveralls
Copy link

coveralls commented Jun 7, 2016

Coverage Status

Coverage increased (+0.1%) to 80.436% when pulling 5cfee8b on jdetle:diffusion into debae2b on MDAnalysis:develop.

@jdetle
Copy link
Contributor Author

jdetle commented Jun 7, 2016

@kain88-de, @richardjgowers, these failures are a pain. @richardjgowers, @dotsdl, will splitting up new tests for #363 help to fix this problem? Is there a way I can hit the ground running and help out?

@jbarnoud
Copy link
Contributor

jbarnoud commented Jun 7, 2016

I just restarted the failing travis build. It complained about timeout errors. Can we just remove timeouts in nose?

@richardjgowers
Copy link
Member

I think the failures are caused by file I/O problems, so not really a 363 thing.

# will only run if distance matrix not already calculated
self._dist_matrix.run()
self._scaled_matrix = self._dist_matrix.dist_matrix / self._epsilon
# this should be a reference to the same object as
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure when this comment used to make sense/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relic of Epsilon class, would permanently change dist_matrix.dist_matrix to reflect scaling. There weren't two matrices in memory.

@coveralls
Copy link

coveralls commented Jul 3, 2016

Coverage Status

Changes Unknown when pulling aa22772 on jdetle:diffusion into * on MDAnalysis:develop*.

@coveralls
Copy link

coveralls commented Jul 3, 2016

Coverage Status

Changes Unknown when pulling aa22772 on jdetle:diffusion into * on MDAnalysis:develop*.

unneccesarily exponentiating a diagonalizable matrix. Removed
timescaling init, removed tests for anisotropic kernel and
timescaling.
@coveralls
Copy link

coveralls commented Jul 3, 2016

Coverage Status

Changes Unknown when pulling 97c3105 on jdetle:diffusion into * on MDAnalysis:develop*.

eigenvals, eigenvectors = np.linalg.eig(self._kernel)
D_inv = np.diag(1 / self._kernel.sum(1))
self._diff = np.dot(D_inv, self._kernel)
eigenvals, eigenvectors = np.linalg.eig(self._diff)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't you say it's better to use eigh?

Copy link
Contributor Author

@jdetle jdetle Jul 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did and apparently I may have been wrong, the two give different results:
Eigenvalues using epsilon = 5, eigh:

[ 1.02532315  0.63865029  0.50461156  0.46652167  0.43577238  0.33804695
  0.29255447  0.23455876  0.2047588   0.18443111  0.14899126  0.14097275
  0.12780406  0.11636266  0.11326329  0.11052915  0.10414109  0.10212337
  0.09150211  0.08408036]

Eigenvalues using epsilon = 5, eig:

[ 1.          0.69843825  0.51574675  0.47646882  0.40558677  0.3209098
  0.25232349  0.21522759  0.20871485  0.17725692  0.14328863  0.13184095
  0.11912811  0.11392278  0.10615283  0.10202534  0.10149993  0.09744823
  0.08432603  0.07256051]

Given that the first eigenvalue should be 1 I am weary of using eigh

@kain88-de
Copy link
Member

Strange but yeah we should stick with the normal eig function then.

@coveralls
Copy link

coveralls commented Jul 5, 2016

Coverage Status

Changes Unknown when pulling b3b2ce9 on jdetle:diffusion into * on MDAnalysis:develop*.

@kain88-de
Copy link
Member

There are still some comments left to address. Can you say when you are done with them.

@jdetle
Copy link
Contributor Author

jdetle commented Jul 7, 2016

I think I lost track of them! I will look for them here, but in the mean time I opened up a new PR to help with the final pieces.

@jdetle jdetle closed this Jul 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants