This repository has been archived by the owner on Aug 26, 2023. It is now read-only.
[WIP] PDB file handling enhancements #483
Merged
Changes from 34 commits
36 commits
f7afe95
Added option to download pdb to another directory
f393893
Added option to download multiple pdb files
aa1a5f5
Added option to get all PDB entries available in RCSB PDB
f67339c
Added option to download entire PDB files available in RCSB
61e5a56
Added option to get status of new and modified pdb entries
4c4da97
Updated docstrings
9319cdb
Added option to overwrite existing PDB files when downloading
e729bf1
Updated docstrings
90208c5
Added option to update PDB files based on weekly status list
3d9ac64
Added option to get all obsolete entries in PDB
c669711
PDB directory automatically created if it does not exist
730dbf1
Added option to download all obsolete pdb files from RCSB PDB server
545f110
Updated docstrings
b90f905
Added obsolete pdb file handling
574c896
minor enhancements on print statements
c671663
Updated comments and docstrings
f0d2878
Added function retrievepdb to download and read PDB file
18a7043
Code Refactoring and recommended changes
01c8d81
Overrided downloadpdb function for downloading multiple PDB files
e5f52ff
Added - download and update pdb files in PDB, XML and mmCIF formats
a120b81
Added - Download PDB,XML,mmCIF compressed and MMTF uncompressed format
01873ad
Recommended code fixes
37f31be
Minor Bug fixes and recommended changes
95e95d6
Exception handling improvements and Minor fixes
928f769
PDB Extraction fix-compatible for julia 0.5 & 0.6
9a25cf7
Updated Docstrings and Minor code changes
958afb4
Merge pull request #3 from BioJulia/master
ae0ae4c
simple ci test
03f80c6
Test cases and Bug fixes
b90e0e6
Bug fixes and Test case updates
a460de1
Test - Fix
8a292a9
Small test case added
a3ff2ae
Merge branch 'pdb_test' into pdb_enhancements
1a97bf3
Added Documentation
06acee3
Updated docs and Minor code changes
22231b3
Minor document corrections
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -15,18 +15,97 @@ end | |
The `Bio.Structure` module provides functionality to manipulate macromolecular structures, and in particular to read and write [Protein Data Bank](http://www.rcsb.org/pdb/home/home.do) (PDB) files. It is designed to be used for standard structural analysis tasks, as well as acting as a platform on which others can build to create more specific tools. It compares favourably in terms of performance to other PDB parsers - see some [benchmarks](https://github.com/jgreener64/pdb-benchmarks). | ||
|
||
|
||
## Parsing PDB files | ||
## Downloading PDB files | ||
|
||
To download a PDB file: | ||
|
||
```julia | ||
# Stored in the current working directory by default | ||
downloadpdb("1EN2") | ||
``` | ||
|
||
To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework: | ||
To download a PDB file to a specified directory: | ||
|
||
```julia | ||
downloadpdb("1EN2", pdb_dir="path/to/pdb/directory/") | ||
``` | ||
|
||
To download multiple PDB files to a specified directory: | ||
|
||
```julia | ||
downloadpdb(["1EN2","1ALW","1AKE"], pdb_dir="path/to/pdb/directory/") | ||
``` | ||
|
||
To download a PDB file in PDB, XML, mmCIF or MMTF format: | ||
|
||
```julia | ||
# PDB file format | ||
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDB) | ||
# XML file format | ||
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=PDBXML) | ||
# mmCIF file format | ||
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=mmCIF) | ||
# MMTF file format | ||
downloadpdb("1ALW", pdb_dir="path/to/pdb/directory/", file_format=MMTF) | ||
``` | ||
|
||
Various options can be set through optional keyword arguments when downloading PDB files as follows: | ||
|
||
| Keyword Argument | Description | | ||
| :----------------------------- | :-------------------------------------------------------------------------------------------------------------------- | | ||
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB file is downloaded | | ||
| `file_format::Type=PDB` | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF> | | ||
Review comment: I think "Options are PDB, PDBXML, mmCIF or MMTF" is more readable.
||
| `obsolete::Bool=false` | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir` | | ||
| `overwrite::Bool=false` | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default skips downloading the PDB file | | ||
| `ba_number::Integer=0` | If set > 0, downloads the respective biological assembly; by default downloads the PDB file | | ||
|
||
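As a rough illustration of how the `pdb_dir`, `obsolete` and `overwrite` keywords described above could interact, here is a minimal sketch in current Julia syntax. The helper `plan_download`, its `extension` keyword and the upper-case file naming are assumptions made for this example. They are not part of `Bio.Structure`, and the package may resolve paths differently.

```julia
# Hypothetical helper, not part of Bio.Structure: works out where a file
# would be stored and whether a download is needed.
function plan_download(pdbid::AbstractString;
                       pdb_dir::AbstractString=pwd(),
                       obsolete::Bool=false,
                       overwrite::Bool=false,
                       extension::AbstractString=".pdb")
    # Obsolete entries go into an "obsolete" subdirectory of pdb_dir
    target_dir = obsolete ? joinpath(pdb_dir, "obsolete") : pdb_dir
    # Assumed naming scheme: upper-case PDB ID plus file extension
    target = joinpath(target_dir, uppercase(pdbid) * extension)
    # Skip the download when the file already exists, unless overwriting
    should_download = overwrite || !isfile(target)
    return target, should_download
end

# Usage: inspect the plan without touching the network
dir = mktempdir()
path, needed = plan_download("1en2", pdb_dir=dir)
```

Separating the decision from the download itself keeps the skip-or-overwrite rule easy to check without network access.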
To download all obsolete PDB files from the RCSB server: | ||
|
||
|
||
```julia | ||
downloadallobsoletepdb(;obsolete_dir="/path/to/obsolete/directory/", file_format=mmCIF, overwrite=false) | ||
``` | ||
|
||
The `file_format` specifies the format in which the PDB files are downloaded; Options <PDB, PDBXML, mmCIF or MMTF>. | ||
|
||
If `overwrite=true`, the existing PDB files in the obsolete directory will be overwritten by the newly downloaded ones. | ||
|
||
|
||
## Maintaining a Local Copy of the entire RCSB PDB Database | ||
Review comment: Pedantry - no capitals on local copy.
||
|
||
BioJulia can download and update a local copy of the entire RCSB PDB database in your preferred file format. | ||
|
||
To download the entire RCSB PDB database in your preferred file format: | ||
|
||
```julia | ||
julia> struc = read(filepath_1EN2, PDB) | ||
downloadentirepdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF, overwrite=false) | ||
``` | ||
|
||
The keyword arguments are described below: | ||
|
||
| Keyword Argument | Description | | ||
| :----------------------------- | :------------------------------------------------------------------------------------------------------- | | ||
| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB files are downloaded | | ||
| `file_format::Type=PDB` | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF> | | ||
Review comment: Same as above.
||
| `overwrite::Bool=false` | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default skips downloading the PDB file | | ||
|
||
To update your local PDB directory based on the weekly status list of new, modified and obsolete PDB files from the RCSB server: | ||
|
||
```julia | ||
updatelocalpdb(pdb_dir="path/to/pdb/directory/", file_format=MMTF) | ||
``` | ||
|
||
The `file_format` specifies the format of the PDB files present in the local PDB directory. Obsolete PDB files are stored in the autogenerated `obsolete` directory inside the specified local PDB directory. | ||
|
||
Run `downloadentirepdb()` once, then set up a cron job or similar to run `updatelocalpdb()` once a week to keep the local PDB directory up to date with the RCSB server. | ||
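Where a cron job is not available, the "or similar" option could be a small guard inside a script that is started more often than weekly. The sketch below, in current Julia syntax, only decides whether a refresh is due. The function `update_due` and its `interval` keyword are hypothetical names for this example and are not provided by `Bio.Structure`.

```julia
using Dates

# Hypothetical helper: returns true when at least `interval` has passed
# since the previous successful update.
function update_due(last_update::DateTime, now_time::DateTime=Dates.now();
                    interval::Period=Day(7))
    return last_update + interval <= now_time
end

# Usage: a script would call updatelocalpdb(...) only when this is true
due = update_due(DateTime(2017, 1, 1), DateTime(2017, 1, 8))
```

How the time of the last update is recorded (for example in a small text file inside the PDB directory) is left open here.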
|
||
|
||
## Parsing PDB files | ||
|
||
To parse an existing PDB file into a Structure-Model-Chain-Residue-Atom framework: | ||
|
||
```julia | ||
julia> struc = readpdb("1EN2", pdb_dir="path/to/pdb/directory") | ||
Bio.Structure.ProteinStructure | ||
Name - 1EN2.pdb | ||
Number of models - 1 | ||
|
@@ -40,6 +119,49 @@ Number of hydrogens - 0 | |
Number of disordered atoms - 27 | ||
``` | ||
|
||
Various options can be set through optional keyword arguments when parsing a PDB file as follows: | ||
|
||
| Keyword Argument | Description | | ||
| :------------------------------------------- | :------------------------------------------------------------------------------ | | ||
| `pdb_dir::AbstractString=pwd()` | The directory from which the PDB file is read | | ||
| `ba_number::Integer=0` | If set > 0, reads the respective biological assembly; by default reads the PDB file | | ||
| `structure_name::AbstractString="$pdbid.pdb"`| The name of the PDB Structure read. Defaults to "< PDBID >.pdb" | | ||
| `remove_disorder::Bool=false` | If set `true`, disordered atoms won't be parsed | | ||
| `read_std_atoms::Bool=true` | If set `false`, standard ATOM records won't be parsed | | ||
| `read_het_atoms::Bool=true` | If set `false`, HETATM records won't be parsed | | ||
|
||
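To make the effect of `read_std_atoms` and `read_het_atoms` concrete, the following sketch in current Julia syntax filters raw PDB lines by record name. The function `select_records` is a hypothetical stand-in and not the parser used by `Bio.Structure`. It relies only on the PDB convention that the record name occupies the first six columns, and the sample lines are shortened placeholders rather than records from a real entry.

```julia
# Hypothetical sketch: keep coordinate lines according to the two flags.
function select_records(lines; read_std_atoms::Bool=true, read_het_atoms::Bool=true)
    return filter(lines) do line
        # Record names are padded to six characters in the PDB format
        (read_std_atoms && startswith(line, "ATOM  ")) ||
        (read_het_atoms && startswith(line, "HETATM"))
    end
end

# Usage with shortened placeholder lines
sample = ["HEADER    EXAMPLE",
          "ATOM      1  N   ALA A   1",
          "HETATM    2  O   HOH A 101"]
protein_only = select_records(sample, read_het_atoms=false)
```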
To download and parse a PDB file into a Structure-Model-Chain-Residue-Atom framework: | ||
|
||
```julia | ||
julia> struc = retrievepdb("1ALW", pdb_dir="path/to/pdb/directory") | ||
INFO: Downloading PDB : 1ALW | ||
INFO: Parsing the PDB file... | ||
Bio.Structure.ProteinStructure | ||
Name - 1ALW.pdb | ||
Number of models - 1 | ||
Chain(s) - AB | ||
Number of residues - 346 | ||
Number of point mutations - 0 | ||
Number of other molecules - 10 | ||
Number of water molecules - 104 | ||
Number of atoms - 2790 | ||
Number of hydrogens - 0 | ||
Number of disordered atoms - 0 | ||
``` | ||
|
||
Various options can be set through optional keyword arguments when downloading and parsing a PDB file as follows: | ||
|
||
| Keyword Argument | Description | | ||
| :--------------------------------------------| :--------------------------------------------------------------------------------------------------------------- | | ||
| `pdb_dir::AbstractString=pwd()` | The directory from which the PDB file is read | | ||
| `obsolete::Bool=false` | If set `true`, the PDB file is downloaded into the auto-generated "obsolete" directory inside the specified `pdb_dir` | | ||
| `overwrite::Bool=false` | If set `true`, overwrites the PDB file if it exists in `pdb_dir`; by default skips downloading the PDB file if it exists | | ||
| `ba_number::Integer=0` | If set > 0, reads the respective biological assembly; by default reads the PDB file | | ||
| `structure_name::AbstractString="$pdbid.pdb"`| The name of the PDB Structure read. Defaults to "< PDBID >.pdb" | | ||
| `remove_disorder::Bool=false` | If set `true`, disordered atoms won't be parsed | | ||
| `read_std_atoms::Bool=true` | If set `false`, standard ATOM records won't be parsed | | ||
| `read_het_atoms::Bool=true` | If set `false`, HETATM records won't be parsed | | ||
|
||
The elements of `struc` can be accessed as follows: | ||
|
||
| Command | Returns | Return type | | ||
|
@@ -244,6 +366,18 @@ julia> rad2deg(psiangle(struc['A'][50], struc['A'][51])) | |
``` | ||
|
||
|
||
## RCSB PDB Metadata | ||
|
||
Few functions that may help fetching information about the RCSB PDB Database. | ||
Review comment: Pedantry - "There are a few functions that may help" etc.
||
|
||
| Function | Returns | Return type | | ||
| :----------------- | :------------------------------------------------------------------------------ | :------------------------------------------------------- | | ||
| `pdbentrylist` | List of all PDB entries from RCSB Server | `Array{String,1}` | | ||
| `pdbstatuslist` | List of PDB entries from specified RCSB weekly status list URL | `Array{String,1}` | | ||
| `pdbrecentchanges` | Added, modified and obsolete PDB lists from the recent RCSB weekly status files | `Tuple{Array{String,1},Array{String,1},Array{String,1}}` | | ||
| `pdbobsoletelist` | List of all obsolete PDB entries in the RCSB server | `Array{String,1}` | | ||
|
||
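The three-array tuple listed for `pdbrecentchanges` can be destructured directly. The sketch below, in current Julia syntax where `Array{String,1}` is written `Vector{String}`, shows one possible way to turn such a tuple into a work list. The helper `plan_update`, its policy of re-downloading every added or modified entry, and the IDs `AAAA` to `DDDD` are assumptions for illustration only. The IDs are deliberately not valid PDB identifiers.

```julia
# Hypothetical helper: combine an (added, modified, obsolete) tuple with the
# IDs already stored locally.
function plan_update(changes::Tuple{Vector{String},Vector{String},Vector{String}},
                     local_ids::Vector{String})
    added, modified, obsolete = changes
    # Assumed policy: fetch everything that is new or has changed
    to_download = union(added, modified)
    # Only entries that exist locally need moving to the obsolete directory
    to_retire = intersect(obsolete, local_ids)
    return to_download, to_retire
end

# Usage with placeholder IDs
changes = (["AAAA"], ["BBBB"], ["CCCC", "DDDD"])
to_download, to_retire = plan_update(changes, ["BBBB", "CCCC"])
```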
|
||
## Examples | ||
|
||
A few further examples of `Bio.Structure` usage are given below. | ||
|
Review comment: Perhaps change this section title from "Parsing PDB files" to "Basics" as this makes more sense in the new arrangement.