This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

[WIP] PDB file handling enhancements #483

Merged: 36 commits, Aug 16, 2017
Changes from 34 commits
Commits
f7afe95
Added option to download pdb to another directory
Jul 9, 2017
f393893
Added option to download multiple pdb files
Jul 9, 2017
aa1a5f5
Added option to get all PDB entries available in RCSB PDB
Jul 9, 2017
f67339c
Added option to download entire PDB files available in RCSB
Jul 9, 2017
61e5a56
Added option to get status of new and modified pdb entries
Jul 11, 2017
4c4da97
Updated docstrings
Jul 12, 2017
9319cdb
Added option to overwrite existing PDB files when downloading
Jul 12, 2017
e729bf1
Updated docstrings
Jul 12, 2017
90208c5
Added option to update PDB files based on weekly status list
Jul 12, 2017
3d9ac64
Added option to get all obsolete entries in PDB
Jul 16, 2017
c669711
PDB directory automatically created if it does not exist
Jul 16, 2017
730dbf1
Added option to download all obsolete pdb files from RCSB PDB server
Jul 16, 2017
545f110
Updated docstrings
Jul 16, 2017
b90f905
Added obsolete pdb file handling
Jul 20, 2017
574c896
minor enhancements on print statements
Jul 24, 2017
c671663
Updated comments and docstrings
Aug 3, 2017
f0d2878
Added function retrievepdb to download and read PDB file
Aug 3, 2017
18a7043
Code Refactoring and recommended changes
Aug 4, 2017
01c8d81
Overrided downloadpdb function for downloading multiple PDB files
Aug 5, 2017
e5f52ff
Added - download and update pdb files in PDB, XML and mmCIF formats
Aug 5, 2017
a120b81
Added - Download PDB,XML,mmCIF compressed and MMTF uncompressed format
Aug 6, 2017
01873ad
Recommended code fixes
Aug 7, 2017
37f31be
Minor Bug fixes and recommended changes
Aug 8, 2017
95e95d6
Exception handling improvements and Minor fixes
Aug 9, 2017
928f769
PDB Extraction fix-compatible for julia 0.5 & 0.6
Aug 9, 2017
9a25cf7
Updated Docstrings and Minor code changes
Aug 10, 2017
958afb4
Merge pull request #3 from BioJulia/master
Aug 11, 2017
ae0ae4c
simple ci test
Aug 11, 2017
03f80c6
Test cases and Bug fixes
Aug 12, 2017
b90e0e6
Bug fixes and Test case updates
Aug 12, 2017
a460de1
Test - Fix
Aug 12, 2017
8a292a9
Small test case added
Aug 12, 2017
a3ff2ae
Merge branch 'pdb_test' into pdb_enhancements
Aug 12, 2017
1a97bf3
Added Documentation
Aug 13, 2017
06acee3
Updated docs and Minor code changes
Aug 15, 2017
22231b3
Minor document corrections
Aug 15, 2017
140 changes: 137 additions & 3 deletions docs/src/man/structure.md
@@ -15,18 +15,97 @@ end
The `Bio.Structure` module provides functionality to manipulate macromolecular structures, and in particular to read and write [Protein Data Bank](http://www.rcsb.org/pdb/home/home.do) (PDB) files. It is designed to be used for standard structural analysis tasks, as well as acting as a platform on which others can build to create more specific tools. It compares favourably in terms of performance to other PDB parsers - see some [benchmarks](https://github.com/jgreener64/pdb-benchmarks).


## Parsing PDB files
## Downloading PDB files

To download a PDB file:
Review comment (Member): Perhaps change this section title from "Parsing PDB files" to "Basics" as this makes more sense in the new arrangement.


```julia
# Stored in the current working directory by default
downloadpdb("1EN2")
```

To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework:
To download a PDB file to a specified directory:

```julia
downloadpdb("1EN2", pdb_dir="path/to/pdb/directory/")
```

To download multiple PDB files to a specified directory:

```julia
downloadpdb(["1EN2","1ALW","1AKE"], pdb_dir="path/to/pdb/directory/")
```

To download a PDB file in PDB, XML, mmCIF or MMTF format:

```julia
# PDB file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDB)
# XML file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDBXML)
# mmCIF file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=mmCIF)
# MMTF file format
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=MMTF)
```

Various options can be set through optional keyword arguments when downloading PDB files as follows:

| Keyword Argument | Description |
| :----------------------------- | :-------------------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB file is downloaded |
| `file_format::Type=PDB` | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF> |
Review comment (Member): I think "Options are PDB, PDBXML, mmCIF or MMTF" is more readable.

| `obsolete::Bool=false` | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir` |
| `overwrite::Bool=false` | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default, the download is skipped when the file already exists |
| `ba_number::Integer=0` | If set > 0, downloads the respective biological assembly; by default downloads the PDB file |
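
The `pdb_dir`, `obsolete` and `overwrite` keywords together decide where a file lands and whether it is fetched again. The following is a minimal sketch of that logic in plain Julia, not the package's implementation: `targetpath` and `needsdownload` are hypothetical helpers, and the `<PDBID>.pdb` file name is an assumption made for illustration.

```julia
# Hypothetical helper: where a download would be stored.
# Assumes files are named "<PDBID>.pdb"; obsolete entries go into an
# "obsolete" subdirectory of `pdb_dir`.
function targetpath(pdbid::AbstractString; pdb_dir::AbstractString=pwd(), obsolete::Bool=false)
    dir = obsolete ? joinpath(pdb_dir, "obsolete") : pdb_dir
    return joinpath(dir, "$(pdbid).pdb")
end

# Hypothetical helper: an existing file is only fetched again when `overwrite=true`.
needsdownload(path::AbstractString; overwrite::Bool=false) = overwrite || !isfile(path)

path = targetpath("1EN2", pdb_dir="path/to/pdb/directory/")
needsdownload(path)                  # true unless a file already exists at `path`
needsdownload(path, overwrite=true)  # always true
```

Keeping the path computation separate from the existence check makes the default behaviour (skip files that are already present) easy to see.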

To download all obsolete PDB files from the RCSB server:


```julia
downloadallobsoletepdb(;obsolete_dir="/path/to/obsolete/directory/", file_format=mmCIF, overwrite=false)
```

The `file_format` keyword specifies the format in which the PDB files are downloaded; the options are PDB, PDBXML, mmCIF or MMTF.

If `overwrite=true`, existing PDB files in the obsolete directory are overwritten by the newly downloaded ones.


## Maintaining a Local Copy of the entire RCSB PDB Database
Review comment (Member): Pedantry - no capitals on local copy.


BioJulia can download and update a local copy of the entire RCSB PDB database in your preferred file format.

To download the entire RCSB PDB database in your preferred file format:

```julia
julia> struc = read(filepath_1EN2, PDB)
downloadentirepdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF, overwrite=false)
```

The keyword arguments are described below:

| Keyword Argument | Description |
| :----------------------------- | :------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB files are downloaded |
| `file_format::Type=PDB` | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF> |
Review comment (Member): Same as above.

| `overwrite::Bool=false` | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default, the download is skipped when the file already exists |

To update your local PDB directory based on the weekly status list of new, modified and obsolete PDB files from the RCSB server:

```julia
updatelocalpdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF)
```

The `file_format` specifies the format of the PDB files present in the local PDB directory. Obsolete PDB files are stored in the autogenerated `obsolete` directory inside the specified local PDB directory.

Run `downloadentirepdb()` once, then set up a cron job or similar to run `updatelocalpdb()` once a week to keep the local PDB directory up to date with the RCSB server.


## Parsing PDB files

To parse an existing PDB file into a Structure-Model-Chain-Residue-Atom framework:

```julia
julia> struc = readpdb("1EN2", pdb_dir="path/to/pdb/directory")
Bio.Structure.ProteinStructure
Name - 1EN2.pdb
Number of models - 1
@@ -40,6 +119,49 @@ Number of hydrogens - 0
Number of disordered atoms - 27
```

Various options can be set through optional keyword arguments when parsing a PDB file as follows:

| Keyword Argument | Description |
| :------------------------------------------- | :------------------------------------------------------------------------------ |
| `pdb_dir::AbstractString=pwd()` | The directory from which the PDB file is read |
| `ba_number::Integer=0` | If set > 0, reads the respective biological assembly; by default reads the PDB file |
| `structure_name::AbstractString="$pdbid.pdb"`| The name of the PDB structure read. Defaults to "< PDBID >.pdb" |
| `remove_disorder::Bool=false` | If set `true`, disordered atoms won't be parsed |
| `read_std_atoms::Bool=true` | If set `false`, standard ATOM records won't be parsed |
| `read_het_atoms::Bool=true` | If set `false`, HETATM records won't be parsed |
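
To make the effect of `read_std_atoms` and `read_het_atoms` concrete, here is a rough sketch in plain Julia that filters raw PDB lines by record name. It is an illustration only and does not claim to show how the parser is implemented: `selectrecords` is a hypothetical helper, and the three sample lines are made up and truncated.

```julia
# Hypothetical helper mirroring the meaning of the two keywords:
# keep ATOM lines when `read_std_atoms` is true and HETATM lines
# when `read_het_atoms` is true; every other record is dropped.
function selectrecords(lines; read_std_atoms::Bool=true, read_het_atoms::Bool=true)
    return filter(lines) do line
        (read_std_atoms && startswith(line, "ATOM  ")) ||
        (read_het_atoms && startswith(line, "HETATM"))
    end
end

# Made-up, truncated sample lines (not taken from a real entry)
lines = [
    "ATOM      1  N   ALA A   1",
    "HETATM    2  O   HOH A 101",
    "REMARK sample line",
]

selectrecords(lines)                        # keeps the ATOM and HETATM lines
selectrecords(lines, read_het_atoms=false)  # keeps only the ATOM line
```

The record name occupies the first six columns of a PDB line, which is why the sketch matches on `"ATOM  "` (with two trailing spaces) and `"HETATM"`.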

To download and parse a PDB file into a Structure-Model-Chain-Residue-Atom framework:

```julia
julia> struc = retrievepdb("1ALW", pdb_dir="path/to/pdb/directory")
INFO: Downloading PDB : 1ALW
INFO: Parsing the PDB file...
Bio.Structure.ProteinStructure
Name - 1ALW.pdb
Number of models - 1
Chain(s) - AB
Number of residues - 346
Number of point mutations - 0
Number of other molecules - 10
Number of water molecules - 104
Number of atoms - 2790
Number of hydrogens - 0
Number of disordered atoms - 0
```

Various options can be set through optional keyword arguments when downloading and parsing a PDB file as follows:

| Keyword Argument | Description |
| :--------------------------------------------| :--------------------------------------------------------------------------------------------------------------- |
| `pdb_dir::AbstractString=pwd()` | The directory from which the PDB file is read |
| `obsolete::Bool=false` | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir`|
| `overwrite::Bool=false` | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default, the download is skipped when the file already exists |
| `ba_number::Integer=0` | If set > 0, reads the respective biological assembly; by default reads the PDB file |
| `structure_name::AbstractString="$pdbid.pdb"`| The name of the PDB structure read. Defaults to "< PDBID >.pdb" |
| `remove_disorder::Bool=false` | If set `true`, disordered atoms won't be parsed |
| `read_std_atoms::Bool=true` | If set `false`, standard ATOM records won't be parsed |
| `read_het_atoms::Bool=true` | If set `false`, HETATM records won't be parsed |

The elements of `struc` can be accessed as follows:

| Command | Returns | Return type |
@@ -244,6 +366,18 @@ julia> rad2deg(psiangle(struc['A'][50], struc['A'][51]))
```


## RCSB PDB Metadata

Few functions that may help fetching information about the RCSB PDB Database.
Review comment (Member): Pedantry - "There are a few functions that may help" etc.


| Function | Returns | Return type |
| :----------------- | :------------------------------------------------------------------------------ | :------------------------------------------------------- |
| `pdbentrylist` | List of all PDB entries from RCSB Server | `Array{String,1}` |
| `pdbstatuslist` | List of PDB entries from specified RCSB weekly status list URL | `Array{String,1}` |
| `pdbrecentchanges` | Added, modified and obsolete PDB lists from the recent RCSB weekly status files | `Tuple{Array{String,1},Array{String,1},Array{String,1}}` |
| `pdbobsoletelist` | List of all obsolete PDB entries in the RCSB server | `Array{String,1}` |
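
Since `pdbrecentchanges` is documented as returning a tuple of three string arrays (added, modified and obsolete), the result can be destructured directly. The sketch below shows this on a placeholder tuple rather than a live call, so it runs without network access; `summarizechanges` is a hypothetical helper and the IDs are placeholders, not an actual status list.

```julia
# Hypothetical helper: count the entries in each list of an
# (added, modified, obsolete) tuple.
function summarizechanges(changes::Tuple{Vector{String},Vector{String},Vector{String}})
    added, modified, obsolete = changes
    return Dict("added" => length(added),
                "modified" => length(modified),
                "obsolete" => length(obsolete))
end

# Placeholder tuple standing in for the result of `pdbrecentchanges()`
changes = (["1EN2", "1ALW"], ["1AKE"], String[])
counts = summarizechanges(changes)
```

With a real status list, the three arrays could then be passed on to the download functions described earlier.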


## Examples

A few further examples of `Bio.Structure` usage are given below.