Changed theme to RTD and created quick start guide #11

Merged
merged 4 commits into from
Sep 22, 2019
6 changes: 4 additions & 2 deletions .gitignore
@@ -1,11 +1,13 @@
*~
.vagrant
notebooks/.ipynb_checkpoints
notebooks/*.png
notebooks/*.svg
notebooks/*.pdf
notebooks/*.pdb
notebooks/*.ndx
tmp
doc/build
doc/source/.ipynb_checkpoints
**/.ipynb_checkpoints
**/.vscode
.vscode
**/.DS_Store
6 changes: 6 additions & 0 deletions doc/examples/README.rst
@@ -0,0 +1,6 @@

========
Examples
========

Here are some examples.
344 changes: 344 additions & 0 deletions doc/examples/analysis/pca.ipynb
@@ -0,0 +1,344 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Principal Component Analysis in MDAnalysis\n",
"\n",
"2019\n",
"\n",
"Author: [Lily Wang](http://minium.com.au) [(@lilyminium)](https://github.com/lilyminium)\n",
"\n",
"Inspired by the MDAnalysis PCA tutorial by [Kathleen Clark](https://becksteinlab.physics.asu.edu/people/75/kathleen-clark) [(@kaceyreidy)](https://github.com/kaceyreidy)\n",
"\n",
"In this tutorial we:\n",
"\n",
"* use PCA to analyse and visualise large macromolecular conformational changes in the enzyme adenylate kinase (AdK)\n",
"* use PCA to compare the conformational differences between ???"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Background"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Principal component analysis (PCA) is a statistical technique that decomposes a system of observations into linearly uncorrelated variables called **principal components**. These components are ordered so that the first principal component accounts for the largest variance in the data, and each following component accounts for less and less variance. PCA is often applied to molecular dynamics trajectories to extract the large-scale conformational motions or \"essential dynamics\" of a protein. The frame-by-frame conformational fluctuation can be considered a linear combination of the essential dynamics yielded by the PCA.\n",
"\n",
"In MDAnalysis, the method is as follows:\n",
"\n",
"1. Optionally align each frame in your trajectory to the first frame.\n",
"2. Construct a 3N x 3N covariance matrix for the N atoms in your trajectory. Optionally, you can provide a mean structure; otherwise the covariance is computed relative to the average structure over the trajectory.\n",
"3. Diagonalise the covariance matrix. The eigenvectors are the principal components, and their eigenvalues are the associated variances.\n",
"4. Sort the eigenvalues so that the principal components are ordered by variance.\n",
"\n",
"<div class=\"alert alert-warning\">\n",
" \n",
"**Note**\n",
" \n",
"Principal component analysis algorithms are deterministic, but their solutions are not unique. For example, you can flip the sign of an eigenvector without altering the PCA. Different implementations are therefore likely to produce different answers: `MDAnalysis` may well return different solutions from, say, `cpptraj`.\n",
"</div>"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Large conformational changes in adenylate kinase"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In MDAnalysis, analysis modules usually need to be imported explicitly. The `pca` module contains the `PCA` class that we will use for analysis. We also import the AdK files from the MDAnalysis test suite."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "3996de84aab74b3aa903e977a8ac80b7",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"_ColormakerRegistry()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import MDAnalysis as mda\n",
"import MDAnalysis.analysis.pca as pca\n",
"from MDAnalysisTests.datafiles import PSF, DCD\n",
"\n",
"import nglview as nv"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As usual, we start off by creating a universe."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"u = mda.Universe(PSF, DCD)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike other analyses, `pca.PCA` can only be applied to `Universe`s. The default `PCA` arguments are:\n",
"\n",
"```python\n",
"my_pca = pca.PCA(u, select='all', align=False, mean=None, n_components=None)\n",
"```\n",
"\n",
"By default (`align=False`), your trajectory will not be aligned to any structure. If you set `align=True`, every frame will be aligned to the first frame of your trajectory, based on the atoms in your `select` string. \n",
"\n",
"As PCA is usually used to extract large-scale conformational motions, we select only the backbone atoms here."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"pc = pca.PCA(u, select=\"backbone\", align=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once you have set up the class, you can run the analysis with `.run(start=None, stop=None, step=None, verbose=None)`. The `start`, `stop` and `step` arguments specify which frames to compute the analysis over; the default arguments compute over every frame."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<MDAnalysis.analysis.pca.PCA at 0x11c506ef0>"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pc.run(verbose=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The principal components are accessible in `.p_components`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"(2565, 2565)\n"
]
},
{
"data": {
"text/plain": [
"array([ 0.02725098, 0.00156086, 0.00816821, ..., -0.01783826,\n",
" 0.04746114, 0.04257271])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"print(pc.p_components.shape)\n",
"pc.p_components[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The variance of each principal component is in `.variance`. For example, to get the variance explained by the first principal component:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"281443.5086197605"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pc.variance[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This variance is not very informative by itself. It is much more intuitive to consider the variance of a principal component as a fraction of the total variance in the data. MDAnalysis tracks this cumulative fraction of the variance in `.cumulated_variance`. As shown below, the first principal component contains 98.7% of the total trajectory variance. As indexing starts at 0, `cumulated_variance[3]` covers the first four components, which combined account for 99.9% of the total variance."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.9873464381554058\n",
"0.999419901112709\n"
]
}
],
"source": [
"print(pc.cumulated_variance[0])\n",
"print(pc.cumulated_variance[3])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The average structure is also saved as an `AtomGroup` in `.mean_atoms`."
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[[16.297781 6.8397956 -7.622989 ]\n",
" [14.900139 7.062459 -7.235277 ]\n",
" [14.185768 5.8268375 -6.879689 ]\n",
" ...\n",
" [13.035071 15.354209 -3.8042812]\n",
" [13.695147 15.725297 -4.988666 ]\n",
" [12.63667 15.566869 -6.1185045]]\n"
]
}
],
"source": [
"print(pc.mean_atoms.positions)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "a33209712ed34b32b969c1f8258aed91",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"NGLWidget()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"mean_structure = mda.Merge(pc.mean_atoms)\n",
"nv.show_mdanalysis(mean_structure)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python (mdanalysis)",
"language": "python",
"name": "mdanalysis"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"toc": {
"base_numbering": 1,
"nav_menu": {},
"number_sections": false,
"sideBar": true,
"skip_h1_title": false,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}
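A note on the `.cumulated_variance` cell in the notebook above: the indexing is easy to misread, so here is a minimal sketch of how a cumulative variance fraction can be computed from a list of eigenvalues. It uses only the Python standard library. The function name `cumulative_variance_fraction` and the eigenvalues are hypothetical and for illustration only; this is not the MDAnalysis implementation and these are not the AdK results.

```python
from itertools import accumulate


def cumulative_variance_fraction(variances):
    """Return the running fraction of the total variance, largest component first.

    Hypothetical helper for illustration; not part of MDAnalysis.
    """
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


# Made-up eigenvalues for illustration only (not the AdK results).
example_variances = [6.0, 3.0, 1.0]
fractions = cumulative_variance_fraction(example_variances)

# fractions[k] covers the first k + 1 components, because indexing starts at 0.
for k, fraction in enumerate(fractions):
    print(f"first {k + 1} component(s): {fraction:.2f}")
```

With this convention, element `[0]` is the fraction explained by the first component alone, and element `[3]` would cover the first four components.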
Binary file added doc/source/_static/.DS_Store
Binary file not shown.