add encore blog post #33 (merged Dec 17, 2016)

`_posts/2016-12-04-encore.md` (new file, 67 lines):

---
layout: post
title: ENCORE ensemble similarity
---

The **ENCORE** ensemble similarity library has been integrated into the next
version of MDAnalysis as
[MDAnalysis.analysis.encore](http://devdocs.mdanalysis.org/documentation_pages/analysis/encore.html).
It implements three measures for quantifying the similarity between
structural ensembles (trajectories), as described in this publication:

Tiberti M, Papaleo E, Bengtsen T, Boomsma W, Lindorff-Larsen K (2015), ENCORE:
Software for Quantitative Ensemble Comparison. PLoS Comput Biol 11(10):
e1004415.
doi:[10.1371/journal.pcbi.1004415](http://doi.org/10.1371/journal.pcbi.1004415).

Using the similarity measures is simply a matter of loading the trajectories or
experimental ensembles that one would like to compare as `MDAnalysis.Universe`
objects:

```python
>>> from MDAnalysis import Universe
>>> import MDAnalysis.analysis.encore as encore
>>> from MDAnalysis.tests.datafiles import PSF, DCD, DCD2
>>> u1 = Universe(PSF, DCD)
>>> u2 = Universe(PSF, DCD2)
```

and running one of the similarity measures on them, for instance the Harmonic
Ensemble Similarity (HES) measure:

```python
>>> hes_similarities, details = encore.hes([u1, u2])
>>> print(hes_similarities)
[[ 0. 38279683.9587939]
 [ 38279683.9587939 0. ]]
```
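
The harmonic measure models each ensemble as a Gaussian distribution and compares the two Gaussians with a symmetrized Kullback-Leibler divergence. As a minimal sketch of that idea, assuming each ensemble has been reduced to a single scalar coordinate (so each Gaussian is one-dimensional), the calculation could look like this. The helper `hes_1d` is hypothetical; `encore.hes` works with full covariance matrices over the selected atoms' coordinates and may use a different normalization.

```python
import statistics


def hes_1d(samples_a, samples_b):
    """Toy, one-dimensional harmonic ensemble similarity (hypothetical helper).

    Each ensemble is modelled as a Gaussian with the sample mean and
    population variance of its values; the result is the average of
    KL(A||B) and KL(B||A). Constant ensembles (zero variance) are not handled.
    """
    mu_a, var_a = statistics.mean(samples_a), statistics.pvariance(samples_a)
    mu_b, var_b = statistics.mean(samples_b), statistics.pvariance(samples_b)
    delta = mu_a - mu_b
    # Squared mean difference, weighted by both inverse variances
    mean_term = delta * delta * (1.0 / var_a + 1.0 / var_b)
    # Variance-mismatch term; it vanishes when var_a == var_b
    var_term = var_a / var_b + var_b / var_a - 2.0
    return 0.25 * (mean_term + var_term)


a = [1.0, 2.0, 3.0, 4.0]
b = [1.5, 2.5, 3.5, 4.5]
print(hes_1d(a, a))            # 0.0, an ensemble is identical to itself
print(round(hes_1d(a, b), 6))  # 0.1, same spread but shifted means
```

The log-determinant terms of the two KL divergences cancel in the symmetrized sum, which is why only the mean and variance-ratio terms remain.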

Similarities are returned as a square, symmetric matrix with the same dimensions
and ordering as the input list: element (i, j) is the similarity value for input
ensembles i and j. The values behave like distances, so identical ensembles score
0 and larger values indicate more dissimilar ensembles. The other two measures are
the clustering-based ensemble similarity, `encore.ces`, and the dimensionality
reduction-based ensemble similarity, `encore.dres`.
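
The clustering-based measure rests on a simple idea: cluster the conformations of all ensembles together, then compare how each ensemble populates the clusters. A minimal sketch of that idea follows, assuming the cluster label of every frame is already known (the labels below are made up) and using the Jensen-Shannon divergence between cluster-occupancy distributions. It also assembles the pairwise values into a square symmetric matrix ordered like the input list, mirroring the layout described above. All helper names are hypothetical; this is not the `encore.ces` implementation.

```python
import math
from collections import Counter


def occupancy(labels, n_clusters):
    """Fraction of an ensemble's frames that fall in each cluster."""
    counts = Counter(labels)
    return [counts[k] / len(labels) for k in range(n_clusters)]


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i >= x_i / 2 > 0 otherwise
        return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def similarity_matrix(ensemble_labels, n_clusters):
    """Square symmetric matrix; entry [i][j] compares ensembles i and j."""
    dists = [occupancy(labels, n_clusters) for labels in ensemble_labels]
    n = len(dists)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal left at zero
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = jensen_shannon(dists[i], dists[j])
    return matrix


# Hypothetical cluster labels for the frames of three ensembles
labels_1 = [0, 0, 1, 1]
labels_2 = [0, 0, 1, 1]
labels_3 = [2, 2, 2, 2]
print(similarity_matrix([labels_1, labels_2, labels_3], n_clusters=3))
```

Ensembles 1 and 2 populate the clusters identically and score 0, while ensemble 3 shares no cluster with the others and reaches the maximum value, ln 2.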

The encore library includes a general interface to various clustering and
dimensionality reduction algorithms (through
the [scikit-learn](http://scikit-learn.org/) package), which makes it easy to
switch between clustering and dimensionality reduction algorithms when using the
`ces` and `dres` functions. The clustering and dimensionality reduction
functionality is also directly available through the `cluster` and
`reduce_dimensionality` functions. For instance, to cluster the conformations
from the two universes defined above, we can write:

```python
>>> cluster_collection = encore.cluster([u1, u2])
>>> print(cluster_collection)
0 (size:5,centroid:1): array([ 0, 1, 2, 3, 98])
1 (size:5,centroid:6): array([4, 5, 6, 7, 8])
2 (size:7,centroid:12): array([ 9, 10, 11, 12, 13, 14, 15])
```

In addition to standard cluster membership information, the `cluster_collection`
output keeps track of the origin of each conformation, so you can check how the
different trajectories are represented in each cluster. For further details, see
the [documentation](http://devdocs.mdanalysis.org/documentation_pages/analysis/encore.html)
of the individual functions within ENCORE.
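
One way to read such membership lists, assuming the frames of all input ensembles are pooled in input order before clustering (so the frames of the second ensemble follow those of the first), is to map each pooled index back to its ensemble. The sketch below assumes, purely for illustration, ensembles of 98 and 102 frames; the helper names are hypothetical and not part of encore, so check the encore documentation for how `cluster_collection` actually records origins.

```python
from bisect import bisect_right
from collections import Counter
from itertools import accumulate


def frame_origin(global_index, ensemble_sizes):
    """Map a pooled frame index to (ensemble index, local frame index)."""
    # Cumulative end offsets, e.g. [98, 200] for sizes [98, 102]
    ends = list(accumulate(ensemble_sizes))
    if not 0 <= global_index < ends[-1]:
        raise IndexError("frame index out of range")
    ensemble = bisect_right(ends, global_index)
    start = ends[ensemble - 1] if ensemble > 0 else 0
    return ensemble, global_index - start


def cluster_composition(members, ensemble_sizes):
    """Count how many members of one cluster come from each ensemble."""
    return Counter(frame_origin(i, ensemble_sizes)[0] for i in members)


sizes = [98, 102]              # assumed frame counts of the two ensembles
cluster_0 = [0, 1, 2, 3, 98]   # members of cluster 0, as printed above
print(frame_origin(98, sizes))                # (1, 0)
print(cluster_composition(cluster_0, sizes))  # Counter({0: 4, 1: 1})
```

Under these assumptions, cluster 0 would hold four frames from the first trajectory and the first frame of the second one.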