add encore blog post #33
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,67 @@ | ||
--- | ||
layout: post | ||
title: ENCORE ensemble similarity | ||
--- | ||
|
||
The **ENCORE** ensemble similarity library has been integrated in the next | ||
version of MDAnalysis | ||
as | ||
[MDAnalysis.analysis.encore](http://www.mdanalysis.org/mdanalysis/documentation_pages/analysis/encore.html). | ||
It implements a variety of techniques for calculating similarities between | ||
structural ensembles (trajectories), as described in this publication: | ||
|
||
Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE: | ||
Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10): | ||
e1004415. | ||
doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415). | ||
|
||
Using the similarity measures is simply a matter of loading the trajectories or | ||
experimental ensembles that one would like to compare as MDAnalysis.Universe | ||
objects: | ||
|
||
```python | ||
>>> from MDAnalysis import Universe | ||
>>> import MDAnalysis.analysis.encore as encore | ||
>>> from MDAnalysis.tests.datafiles import PSF, DCD, DCD2 | ||
>>> u1 = Universe(PSF, DCD) | ||
>>> u2 = Universe(PSF, DCD2) | ||
``` | ||
|
||
and then running a similarity measure on them, for example the Harmonic | ||
Ensemble Similarity (HES) measure: | ||
|
||
```python | ||
>>> hes_similarities, details = encore.hes([u1, u2]) | ||
Review comment: What is actually stored in `details`? |
||
>>> print(hes_similarities) | ||
[[ 0. 38279683.9587939] | ||
[ 38279683.9587939 0. ]] | ||
``` | ||
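For readers wondering where values of this magnitude can come from, here is a minimal sketch of a symmetrized Kullback-Leibler divergence between Gaussian fits of two samples. As far as we understand the ENCORE paper, this is the idea behind the harmonic measure, but the code below is not the `encore.hes` implementation. The function name `harmonic_divergence` and the toy data are assumptions made for illustration only.

```python
import numpy as np


def harmonic_divergence(a, b):
    """Symmetrized KL divergence between Gaussian fits of two samples.

    a, b: arrays of shape (n_frames, n_features), with n_features >= 2.
    Hypothetical helper for illustration; not part of MDAnalysis.
    """
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    cov_a = np.cov(a, rowvar=False)  # features are columns
    cov_b = np.cov(b, rowvar=False)
    inv_a, inv_b = np.linalg.inv(cov_a), np.linalg.inv(cov_b)

    d_mu = mu_a - mu_b
    n_features = a.shape[1]
    # Term penalizing different mean structures.
    mean_term = d_mu @ (inv_a + inv_b) @ d_mu
    # Term penalizing different covariances (zero when cov_a == cov_b).
    cov_term = np.trace(inv_a @ cov_b + inv_b @ cov_a) - 2 * n_features
    return 0.25 * (mean_term + cov_term)


# Toy "ensembles": 500 frames, 3 coordinates each.
rng = np.random.default_rng(0)
ens_a = rng.normal(0.0, 1.0, size=(500, 3))
ens_b = rng.normal(0.5, 1.0, size=(500, 3))

d_same = harmonic_divergence(ens_a, ens_a)
d_diff = harmonic_divergence(ens_a, ens_b)
print(d_same, d_diff)
```

The value is zero for identical ensembles and has no upper bound, so narrow distributions with clearly separated means can produce very large numbers.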
|
||
Similarities are returned as a square symmetric matrix with the same dimensions | ||
and ordering as the input list; each element is the similarity value for | ||
one pair of input ensembles. Other available measures are the clustering | ||
ensemble similarity measure `encore.ces` and the dimensionality reduction | ||
ensemble similarity measure `encore.dres`. | ||
Review comment: It would be nice to have a rule of thumb for which method fits which use case. Links to papers or to the relevant chapter of the implementation paper would also be fine. |
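To make the matrix layout concrete, the sketch below reads pairwise values out of a matrix of this shape with plain NumPy. The numbers are copied from the example output in the post; the `labels` list is a hypothetical convenience and not something `encore` returns.

```python
import numpy as np

# Values copied from the example output; rows and columns follow the
# order of the input list [u1, u2].
hes_similarities = np.array([[0.0, 38279683.9587939],
                             [38279683.9587939, 0.0]])
labels = ["u1", "u2"]  # hypothetical names for the input ensembles

# The matrix is symmetric, so the upper triangle holds every unique pair.
is_symmetric = np.allclose(hes_similarities, hes_similarities.T)
rows, cols = np.triu_indices(len(labels), k=1)
pairs = [(labels[i], labels[j], hes_similarities[i, j])
         for i, j in zip(rows, cols)]

for first, second, value in pairs:
    print(first, second, value)
```

With more than two input ensembles the same loop lists every unique pair once.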
||
|
||
The encore library includes a general interface to various clustering and | ||
dimensionality reduction algorithms (through | ||
the [scikit-learn](http://scikit-learn.org/) package), which makes it easy to | ||
switch between clustering and dimensionality reduction algorithms when using the | ||
`ces` and `dres` functions. The clustering and dimensionality reduction | ||
functionality is also directly available through the `cluster` and | ||
`reduce_dimensionality` functions. For instance, to cluster the conformations | ||
from the two universes defined above, we can write: | ||
|
||
```python | ||
>>> cluster_collection = encore.cluster([u1,u2]) | ||
Review comment: What clustering algorithm is chosen? Can I change it? |
||
>>> print(cluster_collection) | ||
0 (size:5,centroid:1): array([ 0, 1, 2, 3, 98]) | ||
1 (size:5,centroid:6): array([4, 5, 6, 7, 8]) | ||
2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15]) | ||
Review comment: The blog doesn't really explain the output. It suggests I could infer from it which trajectory a centroid belongs to. |
||
… | ||
``` | ||
|
||
In addition to standard cluster membership information, the `cluster_collection` | ||
output keeps track of the origin of each conformation, so you can check how the | ||
different trajectories are represented in each cluster. For further details, see | ||
the documentation of the individual functions within ENCORE. | ||
Review comment: This should link to the exact docs.
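The sketch below illustrates what tracking the origin of each conformation makes possible. It uses plain NumPy arrays rather than the `cluster_collection` object, whose attributes are not described here; the `labels` and `origin` arrays are hypothetical data. It also computes a Jensen-Shannon divergence between the per-trajectory cluster populations, which, as far as we understand the ENCORE paper, is the idea behind the clustering ensemble similarity. It is not the `encore.ces` implementation.

```python
import numpy as np

# Hypothetical data: one cluster label per pooled frame, and the index of
# the trajectory each frame came from.
labels = np.array([0, 0, 1, 1, 2, 0, 1, 2, 2, 2])
origin = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

n_clusters = labels.max() + 1
n_traj = origin.max() + 1

# counts[c, t] = number of frames from trajectory t assigned to cluster c.
counts = np.zeros((n_clusters, n_traj), dtype=int)
np.add.at(counts, (labels, origin), 1)

# Normalize each column to get the cluster populations per trajectory.
fractions = counts / counts.sum(axis=0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) of two discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(x, y):
        mask = x > 0  # terms with x == 0 contribute nothing
        return np.sum(x[mask] * np.log(x[mask] / y[mask]))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


js = jensen_shannon(fractions[:, 0], fractions[:, 1])
print(counts)
print(js)
```

The divergence is zero when both trajectories populate the clusters identically and cannot exceed ln 2.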
Review comment: I think we should make a list of everything that encore adds. That would include clustering. Anything else?
Review comment: There is also PCA. Are there any other dimensionality reduction algorithms?
Review comment: Bootstrapping is also good to know about, but that requires better docs that explain what it does and where its limitations are.
A nicer link is http://devdocs.mdanalysis.org/documentation_pages/analysis/encore.html
This can be easily changed to the release docs when necessary with
s/devdocs/docs/g
Review comment: Thanks for the work and the comments - I'm reviewing the blog post. @kain88-de, for the bootstrapping, I guess you mean better docs in the bootstrapping documentation itself rather than in the blog post?
Review comment: How about both? The docs need some work for sure. But I think it should also be introduced here, since it is a powerful method for estimating errors when sampling is low (and more sampling can be quite expensive for MD).