[WIP] Rewrite of old documentation #110

Merged
merged 18 commits into from
Jan 29, 2016
182 changes: 182 additions & 0 deletions doc-api/source/PBassign.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,182 @@
PBassign
==========

``PBassign`` assigns a PB sequence to a protein structure.

.. note:: The following examples use ``PBdata`` and the demo files.
See :ref:`Demo files <demo>` for more information.


Example
-------

.. code-block:: bash

$ PBassign -p `PBdata`/3ICH.pdb -o 3ICH
Read 1 chain(s) in demo/3ICH.pdb
wrote 3ICH.PB.fasta

Content of `3ICH.PB.fasta`: ::

>demo1/3ICH.pdb | chain A
ZZccdfbdcdddddehjbdebjcdddddfklmmmlmmmmmmmmnopnopajeopacfbdc
ehibacehiamnonopgocdfkbjbdcdfblmbccfbghiacdddebehiafkbccddfb
dcfklgokaccfbdcfbhklmmmmmmmpccdfkopafbacddfbgcddddfbacddddZZ

Note that Protein Blocks assignment is only possible for proteins (as its name suggests).
As a consequence, processed PDB files must contain protein structures **only** (please remove any other molecule).
In addition, the PDB parser implemented here is quite simple.
Be sure your PDB files comply with the `ATOM field <http://www.wwpdb.org/documentation/format33/sect9.html#ATOM>`_
of the `PDB format <http://www.wwpdb.org/documentation/format33/v3.3.html>`_ and that the protein structure is coherent.
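
As an illustration of the clean-up step mentioned above, here is a minimal sketch that keeps only protein ``ATOM`` records from a PDB file. It assumes fixed-column PDB records; ``keep_protein_atoms`` and ``STANDARD_RESIDUES`` are hypothetical helpers written for this example and are not part of ``PBassign``. Modified residues would be dropped by this filter, so adapt the residue set to your data.

```python
# Three-letter codes of the 20 standard amino acids.
STANDARD_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

def keep_protein_atoms(pdb_lines):
    """Keep ATOM records of standard residues plus MODEL/ENDMDL/TER/END lines."""
    kept = []
    for line in pdb_lines:
        record = line[:6].strip()
        if record == "ATOM":
            # The residue name occupies columns 18-20 of an ATOM record.
            if line[17:20].strip() in STANDARD_RESIDUES:
                kept.append(line)
        elif record in {"MODEL", "ENDMDL", "TER", "END"}:
            kept.append(line)
    return kept

# Toy records (illustrative coordinates, not taken from the demo files).
example = [
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N",
    "ATOM      2  P    DA B   1      12.000  14.000  16.000  1.00 30.00           P",
    "HETATM  500  O   HOH A 101      10.000  10.000  10.000  1.00 20.00           O",
    "END",
]
for line in keep_protein_atoms(example):
    print(line)
```

The filtered lines can then be written to a new file and passed to ``PBassign`` with the `-p` option.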


Usage
-----

Here’s the ``PBassign`` help text. ::

    Usage: PBassign [options] -p file.pdb|dir [-p file2.pdb] -o output_root_name -g gro_file -x xtc_file

    Options:
      --version   show program's version number and exit
      -h, --help  show this help message and exit

      Mandatory arguments:
        -p P      name of pdb file or directory containing pdb files
        -o O      root name for results
        -x X      name of xtc file (Gromacs)
        -g G      name of gro file (Gromacs)

      Optional arguments:
        --phipsi  writes phi and psi angle
        --flat    writes one PBs sequence per line


`-p` option
```````````

The `-p` option can be used several times. For instance:

.. code-block:: bash

$ PBassign -p `PBdata`/3ICH.pdb -p `PBdata`/1BTA.pdb -p `PBdata`/1AY7.pdb -o test1
3 PDB file(s) to process
Read 1 chain(s) in demo/3ICH.pdb
Read 1 chain(s) in demo/1BTA.pdb
Read 2 chain(s) in demo/1AY7.pdb
wrote test1.PB.fasta


All PB assignments are written to the same output file. If a PDB file contains several chains
and/or models, their PB assignments are also written to that single output file.
From the previous example, the content of `test1.PB.fasta` is: ::

>demo/3ICH.pdb | chain A
ZZccdfbdcdddddehjbdebjcdddddfklmmmlmmmmmmmmnopnopajeopacfbdc
ehibacehiamnonopgocdfkbjbdcdfblmbccfbghiacdddebehiafkbccddfb
dcfklgokaccfbdcfbhklmmmmmmmpccdfkopafbacddfbgcddddfbacddddZZ
>demo/1BTA.pdb | chain A
ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmm
mngoilmmmmmmmmmmmmnopacdcddZZ
>demo/1AY7.pdb | chain A
ZZbjadfklmcfklmmmmmmmmnnpaafbfkgopacehlnomaccddehjaccdddddeh
klpnbjadcdddfbehiacddfegolaccdddfkZZ
>demo/1AY7.pdb | chain B
ZZcddfklpcbfklmmmmmmmmnopafklgoiaklmmmmmmmmpacddddddehkllmmm
mnnommmmmmmmmmmmmmnopacddddZZ
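
Since all assignments end up in one multi-record file, downstream scripts usually need to split it back into individual sequences. The sketch below shows one way to do so, assuming the file follows the plain FASTA layout shown above (a ``>`` header line followed by wrapped sequence lines); ``read_pb_fasta`` is a hypothetical helper written for this example.

```python
def read_pb_fasta(lines):
    """Return a list of (header, sequence) pairs from FASTA-formatted lines."""
    records = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

sample = """\
>demo/1BTA.pdb | chain A
ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmm
mngoilmmmmmmmmmmmmnopacdcddZZ
""".splitlines()

for header, sequence in read_pb_fasta(sample):
    print(header, len(sequence))
```

To read a real file, pass an open file object instead of ``sample``, for example ``read_pb_fasta(open("test1.PB.fasta"))``.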


One can also use the `-p` option to provide a directory containing PDB files as an input.
``PBassign`` will process all PDB files located in the `PBdata` directory:

.. code-block:: bash

$ PBassign -p `PBdata`/ -o test2
8 PDB file(s) to process
Read 2 chain(s) in demo/1AY7.pdb
Read 90 chain(s) in demo/psi_md_traj_1.pdb
Read 10 chain(s) in demo/2LFU.pdb
Read 90 chain(s) in demo/psi_md_traj_2.pdb
Read 1 chain(s) in demo/3ICH.pdb
Read 90 chain(s) in demo/psi_md_traj_3.pdb
Read 190 chain(s) in demo/beta3_IEGF12.pdb
Read 1 chain(s) in demo/1BTA.pdb
wrote test2.PB.fasta


`-x` and `-g` options
`````````````````````

.. warning:: These options require the Python library `MDAnalysis <http://www.mdanalysis.org/>`_ to be installed.

Instead of using the `-p` option, the protein structures can come
from a Gromacs molecular dynamics trajectory.
In that case, specify a '.xtc' file with the `-x` option and a '.gro' file with the `-g` option.

.. code-block:: bash

$ PBassign -x `PBdata`/md_traj_4.xtc -g `PBdata`/md_traj_4.gro -o md_traj_4
PBs assigned for demo/md.xtc | frame 1
PBs assigned for demo/md.xtc | frame 2
PBs assigned for demo/md.xtc | frame 3
PBs assigned for demo/md.xtc | frame 4
...
PBs assigned for demo/md.xtc | frame 198
PBs assigned for demo/md.xtc | frame 199
PBs assigned for demo/md.xtc | frame 200
PBs assigned for demo/md.xtc | frame 201
wrote md_traj_4.PB.fasta


`--phipsi` option
`````````````````

The `--phipsi` option generates an additional file with the
`phi and psi angles <http://en.wikipedia.org/wiki/Dihedral_angle#Dihedral_angles_of_biological_molecules>`_
for each residue.

.. code-block:: bash

$ PBassign -p `PBdata`/1BTA.pdb -o 1BTA --phipsi
1 PDB file(s) to process
Read 1 chain(s) in demo/1BTA.pdb
wrote 1BTA.PB.fasta
wrote 1BTA.PB.phipsi

Content of `1BTA.PB.phipsi`: ::

demo/1BTA.pdb | chain A 1 None -171.66
demo/1BTA.pdb | chain A 2 -133.80 153.74
demo/1BTA.pdb | chain A 3 -134.66 157.30
demo/1BTA.pdb | chain A 4 -144.49 118.60
demo/1BTA.pdb | chain A 5 -100.13 92.99
demo/1BTA.pdb | chain A 6 -83.49 104.24
demo/1BTA.pdb | chain A 7 -64.77 -43.25
demo/1BTA.pdb | chain A 8 -44.48 -25.89
demo/1BTA.pdb | chain A 9 -94.91 -47.18
demo/1BTA.pdb | chain A 10 -41.31 133.74
[snip]

The first part of the line is the comment also found in the fasta file.
The last three columns are, from left to right, the residue number, the phi angle and the psi angle.
The phi angle of the first residue and the psi angle of the last residue cannot be computed.
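
The column layout described above can be read back with a few lines of Python. This is a minimal sketch that assumes the three last columns are separated by whitespace (spaces or tabs) and that a missing angle is written as ``None``, as in the excerpt; ``parse_phipsi_line`` is a hypothetical helper, not part of ``PBassign``.

```python
def to_angle(text):
    """Convert an angle field to float, mapping the literal 'None' to None."""
    return None if text == "None" else float(text)

def parse_phipsi_line(line):
    """Split one line into (comment, residue number, phi, psi)."""
    # The comment itself contains spaces, so split from the right.
    comment, resid, phi, psi = line.rsplit(None, 3)
    return comment, int(resid), to_angle(phi), to_angle(psi)

sample = [
    "demo/1BTA.pdb | chain A 1 None -171.66",
    "demo/1BTA.pdb | chain A 2 -133.80 153.74",
]
for line in sample:
    print(parse_phipsi_line(line))
```

Splitting from the right keeps the comment intact even though it contains spaces and a ``|`` character.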


`--flat` option
```````````````

The `--flat` option writes the PB assignment with one sequence per line.

.. code-block:: bash

$ PBassign -p `PBdata`/1BTA.pdb -o 1BTA --flat
1 PDB file(s) to process
Read 1 chain(s) in demo/1BTA.pdb
wrote 1BTA.PB.fasta
wrote 1BTA.PB.flat

Content of `1BTA.PB.flat`: ::

ZZdddfklonbfklmmmmmmmmnopafklnoiaklmmmmmnoopacddddddehkllmmmmngoilmmmmmmmmmmmmnopacdcddZZ


121 changes: 121 additions & 0 deletions doc-api/source/PBclust.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,121 @@
PBclust
=======

.. warning:: NOT UPDATED.

Once converted to PB sequences, conformations of the same protein can be clustered
based on PB similarities.


Example
-------

.. code-block:: bash

$ PBclust -f `PBdata`/psi_md_traj_all.PB.fasta -o psi_md_traj_all --clusters 5
read 270 sequences in demo2/psi_md_traj_all.PB.fasta
read substitution matrix
Building distance matrix
100%
wrote psi_md_traj_all.PB.dist
R clustering: OK
cluster 1: 90 sequences (33%)
cluster 2: 55 sequences (20%)
cluster 3: 35 sequences (13%)
cluster 4: 35 sequences (13%)
cluster 5: 55 sequences (20%)
wrote psi_md_traj_all.PB.clust


Cluster 1 is the biggest cluster with 33% of all conformations.
`psi_md_traj_all.PB.dist` contains the distance matrix between all PB sequences.

Content of `psi_md_traj_all.PB.clust` (clustering results): ::

SEQ_CLU "psi_md_traj_1.pdb | model 0" 1
SEQ_CLU "psi_md_traj_1.pdb | model 1" 1
SEQ_CLU "psi_md_traj_1.pdb | model 2" 1
[snip]
...
[snip]
SEQ_CLU "psi_md_traj_3.pdb | model 31" 4
SEQ_CLU "psi_md_traj_3.pdb | model 32" 4
SEQ_CLU "psi_md_traj_3.pdb | model 33" 5
SEQ_CLU "psi_md_traj_3.pdb | model 34" 5
[snip]
...
[snip]
SEQ_CLU "psi_md_traj_3.pdb | model 88" 5
SEQ_CLU "psi_md_traj_3.pdb | model 89" 5
MED_CLU "psi_md_traj_1.pdb | model 65" 1
MED_CLU "psi_md_traj_2.pdb | model 33" 2
MED_CLU "psi_md_traj_2.pdb | model 74" 3
MED_CLU "psi_md_traj_3.pdb | model 0" 4
MED_CLU "psi_md_traj_3.pdb | model 87" 5
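
The clustering file shown above can be loaded into Python for further analysis. The sketch below assumes that each line holds a tag, a quoted sequence name and a cluster number, and that ``MED_CLU`` lines name the representative sequence of each cluster (this interpretation is an assumption, as the format is not described here). ``parse_clust`` is a hypothetical helper.

```python
import shlex
from collections import defaultdict

def parse_clust(lines):
    """Return (members, representatives) keyed by cluster number."""
    members = defaultdict(list)
    representatives = {}
    for line in lines:
        # shlex keeps the quoted name, spaces and '|' included, as one field.
        fields = shlex.split(line)
        if len(fields) != 3:
            continue  # skip blank lines and markers such as "[snip]"
        tag, name, cluster = fields[0], fields[1], int(fields[2])
        if tag == "SEQ_CLU":
            members[cluster].append(name)
        elif tag == "MED_CLU":
            representatives[cluster] = name
    return dict(members), representatives

sample = [
    'SEQ_CLU "psi_md_traj_1.pdb | model 0" 1',
    'SEQ_CLU "psi_md_traj_1.pdb | model 1" 1',
    "[snip]",
    'SEQ_CLU "psi_md_traj_3.pdb | model 33" 5',
    'MED_CLU "psi_md_traj_1.pdb | model 65" 1',
]
members, representatives = parse_clust(sample)
for cluster, names in sorted(members.items()):
    print(cluster, len(names), representatives.get(cluster))
```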


Usage
-----

Here’s the ``PBclust`` help text. ::

    usage: PBclust [-h] -f F -o O (--clusters CLUSTERS | --compare)

    Cluster protein structures based on their PB sequences.

    optional arguments:
      -h, --help           show this help message and exit
      -f F                 name(s) of the PBs file (in fasta format)
      -o O                 name for results
      --clusters CLUSTERS  number of wanted clusters
      --compare            compare the first sequence versus all others


`--compare` option
``````````````````

The `--compare` option compares, position by position, the first sequence found in the fasta file against all others.
The result of the comparison is a score between 0 (identical) and 9 (different).

.. code-block:: bash

$ PBclust -f `PBdata`/psi_md_traj_all.PB.fasta -o psi_md_traj_all --compare
read 270 sequences in demo2/psi_md_traj_all.PB.fasta
read substitution matrix
Normalized substitution matrix (between 0 and 9)
[[0 3 2 3 4 3 3 4 2 3 5 3 5 4 3 3]
[3 0 3 3 3 4 3 2 2 3 3 2 5 3 3 2]
[2 3 0 3 4 3 2 4 3 4 5 5 5 4 3 2]
[3 3 3 0 2 3 4 4 3 3 5 5 9 6 5 4]
[4 3 4 2 0 2 2 2 4 3 3 4 7 4 5 5]
[3 4 3 3 2 0 3 3 4 2 3 3 5 5 4 5]
[3 3 2 4 2 3 0 3 3 3 4 3 3 2 2 1]
[4 2 4 4 2 3 3 0 3 1 2 3 5 4 2 4]
[2 2 3 3 4 4 3 3 0 2 2 2 5 3 3 2]
[3 3 4 3 3 2 3 1 2 0 2 2 4 4 3 3]
[5 3 5 5 3 3 4 2 2 2 0 3 3 3 4 4]
[3 2 5 5 4 3 3 3 2 2 3 0 3 2 2 4]
[5 5 5 9 7 5 3 5 5 4 3 3 0 2 3 3]
[4 3 4 6 4 5 2 4 3 4 3 2 2 0 2 2]
[3 3 3 5 5 4 2 2 3 3 4 2 3 2 0 2]
[3 2 2 4 5 5 1 4 2 3 4 4 3 2 2 0]]
Compare first sequence (psi_md_traj_1.pdb | model 0) with others
wrote psi_md_traj_all.PB.compare.fasta

Content of `psi_md_traj_all.PB.compare.fasta`: ::

>psi_md_traj_1.pdb | model 0 vs psi_md_traj_1.pdb | model 1
00000002000000000020000000000002000200000000000230002000
>psi_md_traj_1.pdb | model 0 vs psi_md_traj_1.pdb | model 2
00000002000000000005000000000002000243000000055230000000
>psi_md_traj_1.pdb | model 0 vs psi_md_traj_1.pdb | model 3
00000002000000000020000000000002000200000000055230002000
[snip]
...
[snip]
>psi_md_traj_1.pdb | model 0 vs psi_md_traj_3.pdb | model 87
00302523340000000005000000035032000323300000335220000000
>psi_md_traj_1.pdb | model 0 vs psi_md_traj_3.pdb | model 88
00302523350500000005000000032232000323300000555225000000
>psi_md_traj_1.pdb | model 0 vs psi_md_traj_3.pdb | model 89
00333522250000000025000000035032000323300002035020002000
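
To make the position-by-position comparison concrete, here is a minimal sketch that scores two PB sequences with the normalized substitution matrix printed in the output above. Two points are assumptions of this example and not documented behaviour: rows and columns of the matrix are taken to follow the order ``a`` to ``p``, and identical letters (including ``Z`` against ``Z``) are scored 0. ``compare`` is a hypothetical helper, not the function used by ``PBclust``.

```python
PB_NAMES = "abcdefghijklmnop"

# Normalized substitution matrix copied from the PBclust output;
# the a..p row/column order is an assumption of this sketch.
MATRIX = [
    [0, 3, 2, 3, 4, 3, 3, 4, 2, 3, 5, 3, 5, 4, 3, 3],
    [3, 0, 3, 3, 3, 4, 3, 2, 2, 3, 3, 2, 5, 3, 3, 2],
    [2, 3, 0, 3, 4, 3, 2, 4, 3, 4, 5, 5, 5, 4, 3, 2],
    [3, 3, 3, 0, 2, 3, 4, 4, 3, 3, 5, 5, 9, 6, 5, 4],
    [4, 3, 4, 2, 0, 2, 2, 2, 4, 3, 3, 4, 7, 4, 5, 5],
    [3, 4, 3, 3, 2, 0, 3, 3, 4, 2, 3, 3, 5, 5, 4, 5],
    [3, 3, 2, 4, 2, 3, 0, 3, 3, 3, 4, 3, 3, 2, 2, 1],
    [4, 2, 4, 4, 2, 3, 3, 0, 3, 1, 2, 3, 5, 4, 2, 4],
    [2, 2, 3, 3, 4, 4, 3, 3, 0, 2, 2, 2, 5, 3, 3, 2],
    [3, 3, 4, 3, 3, 2, 3, 1, 2, 0, 2, 2, 4, 4, 3, 3],
    [5, 3, 5, 5, 3, 3, 4, 2, 2, 2, 0, 3, 3, 3, 4, 4],
    [3, 2, 5, 5, 4, 3, 3, 3, 2, 2, 3, 0, 3, 2, 2, 4],
    [5, 5, 5, 9, 7, 5, 3, 5, 5, 4, 3, 3, 0, 2, 3, 3],
    [4, 3, 4, 6, 4, 5, 2, 4, 3, 4, 3, 2, 2, 0, 2, 2],
    [3, 3, 3, 5, 5, 4, 2, 2, 3, 3, 4, 2, 3, 2, 0, 2],
    [3, 2, 2, 4, 5, 5, 1, 4, 2, 3, 4, 4, 3, 2, 2, 0],
]

def compare(reference, other):
    """Score two equal-length PB sequences position by position."""
    if len(reference) != len(other):
        raise ValueError("sequences must have the same length")
    scores = []
    for pb_ref, pb_other in zip(reference, other):
        if pb_ref == pb_other:
            scores.append("0")  # identical letters, Z vs Z included (assumption)
        else:
            # str.index raises ValueError for letters outside a..p (e.g. Z vs d).
            row = PB_NAMES.index(pb_ref)
            col = PB_NAMES.index(pb_other)
            scores.append(str(MATRIX[row][col]))
    return "".join(scores)

print(compare("ZZdddfklm", "ZZdddfklo"))
```

Because every score is a single digit, the result has the same length as the input sequences and can be stored as one line per comparison, as in the `.PB.compare.fasta` file.
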
89 changes: 89 additions & 0 deletions doc-api/source/PBcount.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,89 @@
PBcount
=======

``PBcount`` computes the frequency of PBs at each position along the amino acid sequence.

.. note:: The following examples use ``PBdata`` and the demo files.
See :ref:`Demo files <demo>` for more information.

Example
-------

.. code-block:: bash

$ PBcount -f `PBdata`/psi_md_traj_1.PB.fasta -o psi_md_traj_1
read 90 sequences in demo/psi_md_traj_1.PB.fasta
wrote psi_md_traj_1.PB.count

Content of `psi_md_traj_1.PB.count`: ::

a b c d e f g h i j k l m n o p
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 1 0 0 0 0 89 0 0 0 0 0
[snip]
51 0 0 0 0 0 22 0 40 0 0 28 0 0 0 0 0
52 0 23 0 0 0 0 0 0 38 1 1 27 0 0 0 0
53 62 0 21 0 0 0 0 0 0 0 0 0 0 0 0 7
54 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0
55 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Note that residues 1, 2, 55 and 56 have a count of zero for every PB.
They are the first two and last two residues of the structure, to which no PB can be assigned.
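
The counting itself can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used by ``PBcount``: all sequences are taken to have the same length, and the letter ``Z`` (which marks positions without a PB in the FASTA files) is simply not counted. ``count_pbs`` and its ``first_residue`` parameter are hypothetical names chosen for this example.

```python
PB_NAMES = "abcdefghijklmnop"

def count_pbs(sequences, first_residue=1):
    """Return {residue number: {PB letter: count}} for equal-length sequences."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    counts = {}
    for offset in range(length):
        row = dict.fromkeys(PB_NAMES, 0)
        for seq in sequences:
            letter = seq[offset]
            if letter in row:  # 'Z' (no PB assigned) is not counted
                row[letter] += 1
        counts[first_residue + offset] = row
    return counts

# Toy sequences, not taken from the demo files.
sequences = ["ZZfkbcZZ", "ZZfklcZZ", "ZZfkbcZZ"]
table = count_pbs(sequences)
print("res", *PB_NAMES)
for resid, row in table.items():
    print(resid, *(row[pb] for pb in PB_NAMES))
```

With this toy input, the rows for the first two and last two positions contain only zeros, which mirrors the behaviour described above.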

Usage
-----

Here’s the ``PBcount`` help text. ::

    usage: PBcount [-h] -f F -o O [--first-residue FIRST_RESIDUE]

    Compute PB frequency along protein sequence.

    optional arguments:
      -h, --help            show this help message and exit
      -f F                  name(s) of the PBs file (in fasta format)
      -o O                  name for results
      --first-residue FIRST_RESIDUE
                            define first residue number (1 by default)


`-f` option
```````````

The `-f` option can be used several times:

.. code-block:: bash

$ PBcount -f `PBdata`/psi_md_traj_1.PB.fasta -f `PBdata`/psi_md_traj_2.PB.fasta -f `PBdata`/psi_md_traj_3.PB.fasta -o psi_md_traj_all
read 90 sequences in demo/psi_md_traj_1.PB.fasta
read 90 sequences in demo/psi_md_traj_2.PB.fasta
read 90 sequences in demo/psi_md_traj_3.PB.fasta
wrote psi_md_traj_all.PB.count


`--first-residue` option
````````````````````````

By default, the first residue is numbered 1. This option sets the number
assigned to the first residue; the following residues are renumbered accordingly.

.. code-block:: bash

$ PBcount --first-residue 5 -f `PBdata`/psi_md_traj_1.PB.fasta -o psi_md_traj_1_shifted
read 90 sequences in demo/psi_md_traj_1.PB.fasta
wrote psi_md_traj_1_shifted.PB.count


Content of `psi_md_traj_1_shifted.PB.count`: ::

a b c d e f g h i j k l m n o p
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 1 0 0 0 0 89 0 0 0 0 0
9 0 89 0 0 0 0 0 0 0 0 0 1 0 0 0 0
10 0 0 86 0 0 3 0 0 0 0 0 0 1 0 0 0
[snip]
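
A count file can be read back for further processing, for instance to find the most frequent PB at each position. The sketch below assumes a whitespace-separated table whose first line lists the PB letters and whose following lines start with the residue number, as in the excerpts above; ``read_pb_count`` and ``dominant_pb`` are hypothetical helpers.

```python
def read_pb_count(lines):
    """Return {residue number: {PB letter: count}} from a whitespace-separated table."""
    rows = [line.split() for line in lines if line.strip()]
    header = rows[0]
    table = {}
    for fields in rows[1:]:
        resid = int(fields[0])
        table[resid] = dict(zip(header, map(int, fields[1:])))
    return table

def dominant_pb(row):
    """Return the most frequent PB of a row, or None when every count is zero."""
    letter = max(row, key=row.get)
    return letter if row[letter] > 0 else None

sample = """\
a b c d e f g h i j k l m n o p
5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 90 0 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 1 0 0 0 0 89 0 0 0 0 0
""".splitlines()

table = read_pb_count(sample)
for resid, row in table.items():
    print(resid, dominant_pb(row))
```

The residue numbers are read from the file as-is, so a table produced with `--first-residue` keeps its shifted numbering.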