This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

[WIP] PDB file handling enhancements #483

Merged on Aug 16, 2017 (36 commits)
Commits:
- `f7afe95` Added option to download pdb to another directory (Jul 9, 2017)
- `f393893` Added option to download multiple pdb files (Jul 9, 2017)
- `aa1a5f5` Added option to get all PDB entries available in RCSB PDB (Jul 9, 2017)
- `f67339c` Added option to download entire PDB files available in RCSB (Jul 9, 2017)
- `61e5a56` Added option to get status of new and modified pdb entries (Jul 11, 2017)
- `4c4da97` Updated docstrings (Jul 12, 2017)
- `9319cdb` Added option to overwrite existing PDB files when downloading (Jul 12, 2017)
- `e729bf1` Updated docstrings (Jul 12, 2017)
- `90208c5` Added option to update PDB files based on weekly status list (Jul 12, 2017)
- `3d9ac64` Added option to get all obsolete entries in PDB (Jul 16, 2017)
- `c669711` PDB directory automatically created if it does not exist (Jul 16, 2017)
- `730dbf1` Added option to download all obsolete pdb files from RCSB PDB server (Jul 16, 2017)
- `545f110` Updated docstrings (Jul 16, 2017)
- `b90f905` Added obsolete pdb file handling (Jul 20, 2017)
- `574c896` minor enhancements on print statements (Jul 24, 2017)
- `c671663` Updated comments and docstrings (Aug 3, 2017)
- `f0d2878` Added function retrievepdb to download and read PDB file (Aug 3, 2017)
- `18a7043` Code Refactoring and recommended changes (Aug 4, 2017)
- `01c8d81` Overrided downloadpdb function for downloading multiple PDB files (Aug 5, 2017)
- `e5f52ff` Added - download and update pdb files in PDB, XML and mmCIF formats (Aug 5, 2017)
- `a120b81` Added - Download PDB,XML,mmCIF compressed and MMTF uncompressed format (Aug 6, 2017)
- `01873ad` Recommended code fixes (Aug 7, 2017)
- `37f31be` Minor Bug fixes and recommended changes (Aug 8, 2017)
- `95e95d6` Exception handling improvements and Minor fixes (Aug 9, 2017)
- `928f769` PDB Extraction fix-compatible for julia 0.5 & 0.6 (Aug 9, 2017)
- `9a25cf7` Updated Docstrings and Minor code changes (Aug 10, 2017)
- `958afb4` Merge pull request #3 from BioJulia/master (Aug 11, 2017)
- `ae0ae4c` simple ci test (Aug 11, 2017)
- `03f80c6` Test cases and Bug fixes (Aug 12, 2017)
- `b90e0e6` Bug fixes and Test case updates (Aug 12, 2017)
- `a460de1` Test - Fix (Aug 12, 2017)
- `8a292a9` Small test case added (Aug 12, 2017)
- `a3ff2ae` Merge branch 'pdb_test' into pdb_enhancements (Aug 12, 2017)
- `1a97bf3` Added Documentation (Aug 13, 2017)
- `06acee3` Updated docs and Minor code changes (Aug 15, 2017)
- `22231b3` Minor document corrections (Aug 15, 2017)
`docs/src/man/structure.md`: 194 changes (177 additions, 17 deletions)

`@@ -15,18 +15,19 @@ end`

The `Bio.Structure` module provides functionality to manipulate macromolecular structures, and in particular to read and write [Protein Data Bank](http://www.rcsb.org/pdb/home/home.do) (PDB) files. It is designed to be used for standard structural analysis tasks, as well as acting as a platform on which others can build to create more specific tools. It compares favourably with other PDB parsers in terms of performance; see these [benchmarks](https://github.com/jgreener64/pdb-benchmarks).


## Basics

> **Review comment (Member):** Perhaps change this section title from "Parsing PDB files" to "Basics" as this makes more sense in the new arrangement.

To download a PDB file:

```julia
using Bio.Structure

# Stored in the current working directory by default
downloadpdb("1EN2")
```

To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework:

```julia
julia> struc = read("/path/to/pdb/file.pdb", PDB)
Bio.Structure.ProteinStructure
Name - 1EN2.pdb
Number of models - 1
Chain(s) - A
Number of residues - 85
Number of point mutations - 5
Number of other molecules - 5
Number of water molecules - 76
Number of atoms - 614
Number of hydrogens - 0
Number of disordered atoms - 27
```

> **Review comment (Member):** For reference, this was initially set as `filepath_1EN2` to pass the doctests but I don't think they are being tested here so this can stay as "/path/to/pdb/file.pdb".

**Note**: Refer to the [Downloading PDB files](#downloading-pdb-files) and [Reading PDB files](#reading-pdb-files) sections for more options.

The elements of `struc` can be accessed as follows:

| Command | Returns | Return type |

`@@ -194,21 +197,6 @@ RCGSQGGGSTCPGLRCCSIWGWCGDSEPYCGRTCENKCWSGERSDHRCGAAVGNPPCGQDRCCSVHGWCGGGNDYCSGGN`


## Spatial calculations

Various functions are provided to calculate spatial quantities for proteins:

`@@ -244,6 +232,178 @@ julia> rad2deg(psiangle(struc['A'][50], struc['A'][51]))`
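
The ψ angle returned by `psiangle` is a dihedral angle over four backbone atoms: N, Cα and C of one residue, then N of the next. A minimal sketch of the standard dihedral formula on plain coordinate vectors, assuming Julia 1.x and the `LinearAlgebra` standard library; `dihedral` is a hypothetical name and this is not necessarily how Bio.Structure computes it:

```julia
using LinearAlgebra

# Hypothetical helper: dihedral angle (radians) defined by four points.
# Uses atan(y, x) with y = |b2| * b1 ⋅ (b2 × b3) and x = (b1 × b2) ⋅ (b2 × b3),
# so clockwise rotation (viewed along p2 -> p3) gives a positive angle.
function dihedral(p1, p2, p3, p4)
    b1 = p2 - p1
    b2 = p3 - p2
    b3 = p4 - p3
    n1 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)  # normal of the plane through p2, p3, p4
    return atan(norm(b2) * dot(b1, n2), dot(n1, n2))
end
```

For a ψ angle, pass the N, Cα and C coordinates of residue *i* followed by the N coordinates of residue *i + 1*, and convert with `rad2deg` as in the `psiangle` example.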


## Downloading PDB files

To download a PDB file to a specified directory:

```julia
downloadpdb("1EN2", pdb_dir="path/to/pdb/directory/")
```

To download multiple PDB files to a specified directory:

```julia
downloadpdb(["1EN2","1ALW","1AKE"], pdb_dir="path/to/pdb/directory/")
```

To download a PDB file in PDB, XML, MMCIF or MMTF format:

```julia
# PDB file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDB)
# XML file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDBXML)
# MMCIF file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=MMCIF)
# MMTF file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=MMTF)
```
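
For orientation, the sketch below shows how a PDB ID and a format could map to a download URL. It is not Bio.Structure's code: the function name `rcsb_url`, the use of symbols in place of the `PDB`/`PDBXML`/`MMCIF`/`MMTF` types, and the URL patterns (the RCSB `files.rcsb.org/download` file server for compressed files, and a separate MMTF endpoint that RCSB has since retired) are all assumptions.

```julia
# Hypothetical helper (Julia 1.x): build a download URL for one entry.
# The URL patterns are assumptions about the RCSB servers, not Bio.Structure internals.
function rcsb_url(pdbid::AbstractString, file_format::Symbol)
    id = uppercase(pdbid)
    if file_format == :PDB
        return "https://files.rcsb.org/download/$id.pdb.gz"   # gzipped PDB format
    elseif file_format == :PDBXML
        return "https://files.rcsb.org/download/$id.xml.gz"   # gzipped PDBML/XML
    elseif file_format == :MMCIF
        return "https://files.rcsb.org/download/$id.cif.gz"   # gzipped mmCIF
    elseif file_format == :MMTF
        return "https://mmtf.rcsb.org/v1.0/full/$id"          # assumed MMTF endpoint (retired)
    else
        throw(ArgumentError("unsupported format: $file_format"))
    end
end
```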

Various options can be set through optional keyword arguments when downloading PDB files as follows:

| Keyword Argument | Description |
| :----------------------------- | :-------------------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB file is downloaded |
| `file_format::Type=PDB` | The format of the PDB file. Options are PDB, PDBXML, MMCIF or MMTF |
| `obsolete::Bool=false` | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir` |
| `overwrite::Bool=false`        | If set `true`, overwrites the PDB file if it already exists in `pdb_dir`; by default, an existing file is not downloaded again |
| `ba_number::Integer=0`         | If set > 0, downloads the corresponding biological assembly; by default, downloads the standard PDB file |
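
The `pdb_dir`, `obsolete` and `overwrite` options combine roughly as in the hypothetical sketch below (Julia 1.x). `target_path`, `needs_download` and the file extensions are assumptions, formats are stood in for by symbols, and Bio.Structure's own logic may differ; it mirrors the documented behaviour that the directory is created automatically, obsolete files go into an "obsolete" subdirectory, and file names use the upper-case PDB ID.

```julia
# Assumed file extension for each format (illustrative only)
const EXTENSIONS = Dict(:PDB => "pdb", :PDBXML => "xml", :MMCIF => "cif", :MMTF => "mmtf")

# Hypothetical: where a downloaded file would be stored
function target_path(pdbid::AbstractString; pdb_dir::AbstractString=pwd(),
                     file_format::Symbol=:PDB, obsolete::Bool=false)
    # Obsolete entries go into an "obsolete" subdirectory of pdb_dir
    dir = obsolete ? joinpath(pdb_dir, "obsolete") : pdb_dir
    # Create the directory (and any parents) if it does not exist yet
    mkpath(dir)
    # File names use the upper-case PDB ID, e.g. "1EN2.pdb"
    return joinpath(dir, "$(uppercase(pdbid)).$(EXTENSIONS[file_format])")
end

# Hypothetical: skip the download when the file exists, unless overwrite=true
needs_download(path::AbstractString; overwrite::Bool=false) = overwrite || !isfile(path)
```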


## Reading PDB files

- To parse an existing PDB file into a Structure-Model-Chain-Residue-Atom framework:

```julia
julia> struc = read("/path/to/pdb/file.pdb", PDB)
Bio.Structure.ProteinStructure
Name - 1EN2.pdb
Number of models - 1
Chain(s) - A
Number of residues - 85
Number of point mutations - 5
Number of other molecules - 5
Number of water molecules - 76
Number of atoms - 614
Number of hydrogens - 0
Number of disordered atoms - 27
```

Various options can be set through optional keyword arguments when parsing a PDB file as follows:

| Keyword Argument | Description |
| :------------------------------------------- | :------------------------------------------------------------------------------ |
| `structure_name::AbstractString="$pdbid.pdb"`| The name of the PDB Structure read. Defaults to "< PDBID >.pdb" |
| `remove_disorder::Bool=false`                | If set `true`, disordered atoms are not parsed                                  |
| `read_std_atoms::Bool=true`                  | If set `false`, standard ATOM records are not parsed                            |
| `read_het_atoms::Bool=true`                  | If set `false`, HETATM records are not parsed                                   |
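
The `read_std_atoms` and `read_het_atoms` switches decide which coordinate records are kept. The hypothetical sketch below is not Bio.Structure's parser (`AtomRecord` and `parse_atom_records` are invented names, Julia 1.x): it reads ATOM and HETATM records using the fixed column layout of the PDB format and applies the two switches, while ignoring alternate locations, insertion codes and everything else a real parser handles.

```julia
# Hypothetical minimal record type holding a few ATOM/HETATM fields
struct AtomRecord
    het::Bool                  # true for HETATM, false for ATOM
    serial::Int                # columns 7-11
    name::String               # columns 13-16
    resname::String            # columns 18-20
    chain::Char                # column 22
    resnum::Int                # columns 23-26
    coords::NTuple{3,Float64}  # columns 31-38, 39-46, 47-54
end

function parse_atom_records(lines; read_std_atoms::Bool=true, read_het_atoms::Bool=true)
    records = AtomRecord[]
    for line in lines
        rec = length(line) >= 6 ? line[1:6] : ""
        het = rec == "HETATM"
        # Skip non-coordinate records, then any record type switched off
        if !(rec == "ATOM  " || het)
            continue
        elseif (het && !read_het_atoms) || (!het && !read_std_atoms)
            continue
        end
        push!(records, AtomRecord(
            het,
            parse(Int, strip(line[7:11])),
            String(strip(line[13:16])),
            String(strip(line[18:20])),
            line[22],
            parse(Int, strip(line[23:26])),
            (parse(Float64, strip(line[31:38])),
             parse(Float64, strip(line[39:46])),
             parse(Float64, strip(line[47:54])))))
    end
    return records
end
```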

- To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework by specifying its PDB ID and the PDB directory (the file name must be in upper case, e.g. "1EN2.pdb"):

The function `readpdb` provides a uniform way to read PDB files by PDB ID from a local directory. For example:

```julia
struc = readpdb("1EN2", pdb_dir="/path/to/pdb/directory")
```

`readpdb` takes the same keyword arguments as `read` above, plus `pdb_dir` and `ba_number`.

- To download and parse a PDB file into a Structure-Model-Chain-Residue-Atom framework in a single line:

```julia
julia> struc = retrievepdb("1ALW", pdb_dir="path/to/pdb/directory")
INFO: Downloading PDB : 1ALW
INFO: Parsing the PDB file...
Bio.Structure.ProteinStructure
Name - 1ALW.pdb
Number of models - 1
Chain(s) - AB
Number of residues - 346
Number of point mutations - 0
Number of other molecules - 10
Number of water molecules - 104
Number of atoms - 2790
Number of hydrogens - 0
Number of disordered atoms - 0
```

Various options can be set through optional keyword arguments when downloading and parsing a PDB file as follows:

| Keyword Argument | Description |
| :--------------------------------------------| :--------------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()`              | The directory to which the PDB file is downloaded and from which it is read                                      |
| `obsolete::Bool=false`                       | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir` |
| `overwrite::Bool=false`                      | If set `true`, overwrites the PDB file if it already exists in `pdb_dir`; by default, downloading is skipped if the file exists |
| `ba_number::Integer=0`                       | If set > 0, reads the corresponding biological assembly; by default, reads the standard PDB file                  |
| `structure_name::AbstractString="$pdbid.pdb"`| The name of the PDB Structure read. Defaults to "< PDBID >.pdb"                                                  |
| `remove_disorder::Bool=false`                | If set `true`, disordered atoms are not parsed                                                                   |
| `read_std_atoms::Bool=true`                  | If set `false`, standard ATOM records are not parsed                                                             |
| `read_het_atoms::Bool=true`                  | If set `false`, HETATM records are not parsed                                                                    |


## Writing PDB files

PDB format files can be written:

```julia
writepdb("1EN2_out.pdb", struc)
```

Any element type can be given as input to `writepdb`. Atom selectors can also be given as additional arguments:

```julia
writepdb("1EN2_out.pdb", struc, backboneselector)
```
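
Selectors such as `backboneselector` act as predicates: an atom is written only if every selector returns `true` for it. The sketch below illustrates that idea together with the fixed-width ATOM record layout, assuming Julia 1.x and the `Printf` standard library. `SimpleAtom`, `isbackbone`, `atomline` and `writeatoms` are hypothetical names, the backbone atom set (N, CA, C, O) is an assumption, and the atom-name padding uses the common one-letter-element heuristic rather than the full PDB rules.

```julia
using Printf

# Hypothetical atom type for illustration; not Bio.Structure's Atom
struct SimpleAtom
    serial::Int
    name::String
    resname::String
    chain::Char
    resnum::Int
    coords::NTuple{3,Float64}
    occupancy::Float64
    tempfactor::Float64
    element::String
end

# A selector is a predicate on an atom; backbone names are an assumption
isbackbone(a::SimpleAtom) = a.name in ("N", "CA", "C", "O")

# Names shorter than 4 characters conventionally start in column 14
padname(name) = length(name) >= 4 ? name : rpad(" " * name, 4)

# Format one ATOM record in the 78-column fixed-width layout
function atomline(a::SimpleAtom)
    x, y, z = a.coords
    return @sprintf("ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s",
                    a.serial, padname(a.name), a.resname, a.chain, a.resnum,
                    x, y, z, a.occupancy, a.tempfactor, a.element)
end

# Write only atoms that pass every selector (all atoms if none are given)
function writeatoms(io::IO, atoms, selectors...)
    for a in atoms
        if all(sel(a) for sel in selectors)
            println(io, atomline(a))
        end
    end
end
```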


## RCSB PDB Utility Functions

- To download the entire RCSB PDB database in your preferred file format:

```julia
downloadentirepdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF, overwrite=false)
```

The keyword arguments are described below:

| Keyword Argument | Description |
| :------------------------------- | :------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()` | The directory to which the PDB files are downloaded |
| `file_format::Type=PDB` | The format of the PDB file. Options are PDB, PDBXML, MMCIF or MMTF |
| `overwrite::Bool=false`          | If set `true`, overwrites the PDB file if it already exists in `pdb_dir`; by default, existing files are not downloaded again |

- To update your local PDB directory based on the weekly status lists of new, modified and obsolete PDB files from the RCSB server:

```julia
updatelocalpdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF)
```

The `file_format` specifies the format of the PDB files present in the local PDB directory. Obsolete PDB files are stored in the autogenerated `obsolete` directory inside the specified local PDB directory.
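
One plausible way to turn the weekly lists into actions is sketched below (Julia 1.x). It assumes the lists arrive as the `(added, modified, obsolete)` tuple described for `pdbrecentchanges` and that the IDs already in the local directory are known. `plan_update` is a hypothetical name, and the policy (re-download new and modified entries, move only locally present obsolete entries) is an assumption rather than what `updatelocalpdb` is documented to do.

```julia
# Hypothetical: decide what a weekly update has to do
function plan_update(added, modified, obsolete, local_ids)
    have = Set(uppercase.(local_ids))
    # New and modified entries are (re)downloaded, e.g. with overwrite=true
    to_download = sort(unique(uppercase.(vcat(added, modified))))
    # Obsolete entries present locally are moved to the "obsolete" directory
    to_obsolete = sort([id for id in uppercase.(obsolete) if id in have])
    return (download = to_download, obsolete = to_obsolete)
end
```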

- To download all obsolete PDB files from the RCSB server:

```julia
downloadallobsoletepdb(;obsolete_dir="/path/to/obsolete/directory/", file_format=MMCIF, overwrite=false)
```

The `file_format` specifies the format in which the PDB files are downloaded. Options are PDB, PDBXML, MMCIF or MMTF.

If `overwrite=true`, existing PDB files in the obsolete directory are overwritten by the newly downloaded ones.

- To maintain a local copy of the entire RCSB PDB database:

Run `downloadentirepdb` once to download all PDB files, then set up a cron job (or similar) to run `updatelocalpdb` once a week to keep the local PDB directory up to date with the RCSB server.

There are a few more functions that may help.

| Function | Returns | Return type |
| :----------------- | :------------------------------------------------------------------------------ | :------------------------------------------------------- |
| `pdbentrylist`     | List of all PDB entries from the RCSB server                                    | `Array{String,1}`                                        |
| `pdbstatuslist`    | List of PDB entries from specified RCSB weekly status list URL                  | `Array{String,1}`                                        |
| `pdbrecentchanges` | Added, modified and obsolete PDB lists from the recent RCSB weekly status files | `Tuple{Array{String,1},Array{String,1},Array{String,1}}` |
| `pdbobsoletelist`  | List of all obsolete PDB entries on the RCSB server                             | `Array{String,1}`                                        |


## Examples

A few further examples of `Bio.Structure` usage are given below.