[WIP] PDB file handling enhancements #483

joelselvaraj · 2017-08-03T22:36:57Z

I have been working on a few enhancements to PDB file handling which, to be honest, are based on the Biopython library. These include:

Downloading PDB file to a specific directory instead of only current directory
Skip or Overwrite if the PDB file already exists in the directory
Downloading the entire PDB database
Downloading multiple PDB files by passing a list of PDB IDs
Downloading all obsolete PDB entries from the RCSB server
Updating a PDB directory based on the weekly status files from the RCSB server
Retrieving a PDB file (i.e. downloading the PDB file if it does not exist and then reading it)
Obsolete PDB file handling (A directory named "obsolete" is automatically created inside the specified "pdb_dir" to store and handle them)

Also added a new Base.read() method for reading PDB files by PDB ID and directory.
Bug fixes.

jgreener64 · 2017-08-03T22:52:02Z

Great, thanks for making this PR. I will have a look at this tomorrow.

joelselvaraj · 2017-08-03T23:06:28Z

There are a few things to discuss.

1. Is the out_filepath argument in downloadpdb() required?
Removing it would make the code more uniform and easier to use. It would be better to keep the PDB ID as the file name so that files are easy to handle. Users can specify a different structure_name when reading the file if they want to distinguish it.

2. Should we write test cases?
The code has grown more complex than before; in particular, there are many options and combinations of options, which may lead to unexpected bugs.

3. Should getstatuslist() function be exported in the module?
As of now it's just a helper function for the getrecentchanges() function.

jgreener64 · 2017-08-04T09:57:57Z

src/structure/pdb.jl

-function downloadpdb(pdbid::AbstractString,
-                    out_filepath::AbstractString="$pdbid.pdb";
-                    ba_number::Integer=0)
+function getallpdbentries()


In general we avoid `get` in function names. The name should describe what is returned, perhaps pdbentrylist?

That should be fine. I will change it accordingly.

jgreener64 · 2017-08-04T09:58:41Z

src/structure/pdb.jl

+"""
+Returns a list of pdb codes in the weekly pdb status file from the given URL. 
+"""
+function getstatuslist(url::AbstractString)


The function name should reference the PDB, perhaps pdbstatuslist?

What URL is the user supposed to use here? If there is a certain one perhaps put it in the docstring?

Actually see comment below.

The URL is supposed to be a weekly status file from the RCSB server. I will add it to the docstrings.

jgreener64 · 2017-08-04T09:59:00Z

src/structure/pdb.jl

+"""
+Returns three lists of the newest weekly files (added,modified,obsolete) from RCSB PDB Server
+"""
+function getrecentchanges()


Perhaps pdbrecentchanges?

jgreener64 · 2017-08-04T09:59:24Z

src/structure/pdb.jl

+"""
+Returns a list of all obsolete entries ever in the RCSB PDB server
+"""
+function getallobsolete()


Perhaps pdbobsoletelist?

jgreener64 · 2017-08-04T10:02:37Z

src/structure/pdb.jl

+if the keyword argument `obsolete` is set `true`, the PDB files are downloaded into the obsolete directory inside `pdb_dir`;
+if the keyword argument `overwrite` is set `true`, then it will overwrite the PDB file if it exists in the `pdb_dir`;
+"""
+function downloadmultiplepdb(pdbidlist::AbstractArray{String,1}; pdb_dir::AbstractString=pwd(), obsolete::Bool=false, overwrite::Bool=false)


This name seems a bit long. We could even make this another method of downloadpdb, so that function can take either a string or a list.

I think this should be pdbidlist::Array{String,1} for the first argument.

Nice. I will overload downloadpdb with a method for downloading multiple PDB files. Regarding pdbidlist::Array{String,1}, I was getting an error; only defining it as AbstractArray worked. I will look into it and update accordingly.
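A minimal sketch of the two-method approach suggested above, written for current Julia (1.x) rather than the 0.5/0.6 releases used in this PR. `fetchpdbpath` is a hypothetical stand-in for downloadpdb that only builds the target path, so it runs offline. The vector method is typed as `AbstractVector{<:AbstractString}`: because Julia's parametric types are invariant, an `Array{String,1}` signature rejects, for example, the `Vector{SubString{String}}` returned by `split`. Whether that was the cause of the error mentioned above is not confirmed in the thread.

```julia
# Hypothetical stand-in for downloadpdb: returns the path the file would be
# saved to instead of downloading it.
function fetchpdbpath(pdbid::AbstractString; pdb_dir::AbstractString=pwd())
    return joinpath(pdb_dir, uppercase(pdbid) * ".pdb")
end

# Second method of the same function for a list of IDs. Keyword arguments are
# forwarded, so their defaults live only in the single-ID method above.
function fetchpdbpath(pdbidlist::AbstractVector{<:AbstractString}; kwargs...)
    return [fetchpdbpath(pdbid; kwargs...) for pdbid in pdbidlist]
end

fetchpdbpath("1en2"; pdb_dir="pdbs")              # single ID
fetchpdbpath(split("1EN2 1AKE"); pdb_dir="pdbs")  # Vector{SubString{String}}
```

Dispatch picks the method from the argument type, so callers use one function name for both cases.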

jgreener64 · 2017-08-04T10:09:40Z

src/structure/pdb.jl

@@ -107,6 +321,17 @@ function Base.read(filepath::AbstractString,
    end
 end

+# Read PDB file based on PDB ID and pdb_dir. 
+function Base.read(pdbid::AbstractString,


This method clashes with the above read method. I wonder whether it is necessary, but if it goes in then the directory should be a full argument (so the args are pdbid, directory, PDB).

I added it so that it follows the same format as the rest: users who want to download first and read later will have a uniform way of calling the functions. We can change the arguments as you mentioned.

jgreener64 · 2017-08-04T10:10:31Z

src/structure/pdb.jl

+
+
+"""
+Download a PDB file or biological assembly from the RCSB PDB server. 


Use line breaks in the doc strings to keep line length to 80.

OK, I was wondering about that. I will keep the line length to 80 in docstrings.

jgreener64 · 2017-08-04T10:13:06Z

src/structure/pdb.jl

+    pdbidlist = Array{String,1}()
+    info("Fetching list of all PDB Entries from RCSB PDB Server...")
+    download("ftp://ftp.wwpdb.org/pub/pdb/derived_data/index/entries.idx","entries.idx")
+    open("entries.idx") do input


Given that we make then delete the file here, I wonder if we should use a temporary filepath? If the user has entries.idx in their directory here it is getting overwritten without warning. You can use tempname() to save a name to a variable, write to this then delete it (I think there are examples of this in the PDB tests).

Nice, I didn't know about it. Thank you.

jgreener64 · 2017-08-04T10:13:42Z

src/structure/pdb.jl

+
+
+"""
+Returns a list of pdb codes in the weekly pdb status file from the given URL. 


Pedantry - PDB should be capital in docstrings.

jgreener64 · 2017-08-04T10:16:25Z

src/structure/pdb.jl

+if the keyword argument `obsolete` is set `true`, the PDB file is downloaded into the obsolete directory inside `pdb_dir`;
+if the keyword argument `overwrite` is set `true`, then it will overwrite the PDB file if it exists in the `pdb_dir`;
+"""
+function retrievepdb(pdbid::AbstractString;


This is a nice function to have, difficult to find a descriptive name for it though. A descriptive one would be downloadpdbandread but that is way too long. Maybe this name is okay.

Yes, I was unsure too. In the end I decided to keep the Biopython name, so Biopython users may find it familiar. As mentioned, downloadpdbandread would be too long, so we will keep retrievepdb for now. Suggestions for a better name are welcome.

One possibility is readfrompdb.

jgreener64 · 2017-08-04T10:19:18Z

So overall looks good, thanks for adding this useful feature. There are a couple more things but the above is most of my issues. For reference the Biopython implementation is here: https://github.com/biopython/biopython/blob/master/Bio/PDB/PDBList.py

In answer to your specific questions:

jgreener64 · 2017-08-04T10:24:43Z

1. out_filepath in my initial implementation was generally meant for the user to specify a directory rather than a filename. Since you are adding a separate directory argument, you can remove out_filepath for uniformity of filenames.
2. Yes, there should be some tests. These will rely on an internet connection, but I think that is okay. Obviously we cannot download the whole PDB in the tests, but we can certainly test the other functions.
3. Maybe don't export it; if we do, there should be information on its use in the docstring.

jgreener64 · 2017-08-04T10:27:13Z

There is one more thing that could be added now you are looking at this code, namely download of mmCIF and MMTF files from the PDB. Since mmCIF is now the standard, this is an important feature (I am writing an mmCIF parser for Bio.Structure now too). Would you be okay to implement this as a file_format (or similar) keyword argument to downloadpdb?

joelselvaraj · 2017-08-04T11:46:57Z

@jgreener64 Thank you so much for your review. I will update the code accordingly in future commits. Adding a file_format option to downloadpdb will be useful. Nice that you are working on an mmCIF parser. We can discuss further changes as the code grows.

jgreener64 · 2017-08-04T12:05:16Z

src/structure/pdb.jl

+Returns a list of pdb codes in the weekly pdb status file from the given URL. 
+"""
+function getstatuslist(url::AbstractString)
+    statuslist = Array{String,1}()


I think statuslist = String[] is better, it seems to allocate less (this applies to all similar ones below too).

OK, I didn't know that.
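For reference, on current Julia both spellings construct the same `Vector{String}` (an alias for `Array{String,1}`); the sketch below only checks type and behaviour. The allocation difference mentioned in the review has not been verified here and may have depended on the Julia version of the time; `@allocated` is one way to measure it.

```julia
a = String[]            # typed empty array literal
b = Array{String,1}()   # explicit constructor call
c = Vector{String}()    # Vector{T} is an alias for Array{T,1}

push!(a, "1EN2")
push!(b, "1EN2")
push!(c, "1EN2")

println(typeof(a) == typeof(b) == typeof(c) == Vector{String})  # true
println(a == b == c)                                            # true
```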

jgreener64 · 2017-08-04T12:07:23Z

src/structure/pdb.jl

+            push!(failedlist,pdbid)
+        end
+    end
+    warn(length(failedlist)," PDB file failed to download : ", failedlist)


Only do this if length(failedlist) > 0.

Yes, I missed that.
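A sketch of the guarded warning in current Julia, where the 0.6-era `warn(...)` function has been replaced by the `@warn` logging macro. `reportfailures` is a hypothetical helper name; it returns the number of failures so the behaviour is easy to check.

```julia
# Hypothetical helper: emit a warning only when some downloads failed.
function reportfailures(failedlist::AbstractVector{<:AbstractString})
    if !isempty(failedlist)
        @warn "$(length(failedlist)) PDB file(s) failed to download: $(join(failedlist, ", "))"
    end
    return length(failedlist)
end

reportfailures(String[])          # silent, returns 0
reportfailures(["1ABC", "2XYZ"])  # logs one warning listing both IDs
```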

joelselvaraj · 2017-08-06T19:30:03Z

PDB, XML and mmCIF files are now downloaded in compressed format to reduce bandwidth usage.
PDB files can now also be downloaded and updated in MMTF format.

jgreener64

Thanks for the great changes @joels94 ! In my view this is near completion. I have made some more comments. A few more things before merge:

Write some tests (not required for downloadentirepdb, updatelocalpdb or downloadallobsoletepdb I would say).
Add a section to the docs (http://biojulia.net/Bio.jl/latest/man/structure/) by editing https://github.com/BioJulia/Bio.jl/blob/master/docs/src/man/structure.md . Perhaps have a table with the functions similar to other sections.
It would be good to get feedback from someone else, e.g. @bicycle1885 ?

jgreener64 · 2017-08-07T12:43:35Z

src/structure/pdb.jl

-Download a Protein Data Bank (PDB) file or biological assembly from the RCSB
-PDB. By default downloads the PDB file; if `ba_number` is set the biological
-assembly with that number will be downloaded.
+Returns a list of all PDB entries from RCSB PDB server


Pedantry - full stop at the end (and in some other docstrings).

jgreener64 · 2017-08-07T12:44:44Z

src/structure/pdb.jl

+            end
+            linecount +=1
+        end
+    end


You do need to explicitly remove the temp file I think - tempname() gets you an available name then you download to it.

This applies wherever else you have used temp files too.

jgreener64 · 2017-08-07T12:45:44Z

src/structure/pdb.jl

+from RCSB PDB Server
+"""
+function pdbrecentchanges()
+    addedlist = String[]


These lines are not required as pdbstatuslist returns the array.

jgreener64 · 2017-08-07T12:46:37Z

src/structure/pdb.jl

+if the keyword argument `obsolete` is set `true`, the PDB file is downloaded
+into the obsolete directory inside `pdb_dir`;
+if the keyword argument `overwrite` is set `true`, then it will overwrite the
+PDB file if it exists in the `pdb_dir`;


End with full stop.

jgreener64 · 2017-08-07T12:47:39Z

src/structure/pdb.jl

+"""
+function downloadpdb(pdbid::AbstractString; pdb_dir::AbstractString=pwd(), file_format::Type=PDB, obsolete::Bool=false, overwrite::Bool=false, ba_number::Integer=0)
+     # A Dict mapping the type to their file extensions
+    pdbextension = Dict{Type,String}( PDB => ".pdb", PDBXML => ".xml", mmCIF => ".cif", MMTF => ".mmtf")


This Dict is defined in a couple of places so can be taken out of the functions and defined as const Dict near the top.
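A minimal sketch of hoisting the mapping into a module-level constant. The four format types are declared here as empty stand-in structs (hypothetical, so the example runs without Bio.jl); in the PR they are the Bio.Structure format types. `pdbextension` is an illustrative helper name.

```julia
# Empty stand-ins for the Bio.Structure format types used in the PR.
struct PDB end
struct PDBXML end
struct mmCIF end
struct MMTF end

# Defined once near the top of the file and shared by every function that
# needs a file extension.
const PDB_EXTENSIONS = Dict{Type, String}(
    PDB    => ".pdb",
    PDBXML => ".xml",
    mmCIF  => ".cif",
    MMTF   => ".mmtf",
)

# Illustrative accessor: look up the extension for a format type.
pdbextension(file_format::Type) = PDB_EXTENSIONS[file_format]
```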

jgreener64 · 2017-08-07T12:53:22Z

src/structure/pdb.jl

+        # check if PDB file is downloaded and extracted properly
+        if !ispath(pdbpath) || filesize(pdbpath)==0
+            # If the file size is 0, its deleted. force=true ensures error is not thrown when file does not exists
+            rm(pdbpath, force=true)


I would avoid forcing deletion and check again if the file exists here.

jgreener64 · 2017-08-07T12:57:09Z

src/structure/pdb.jl

+if the keyword argument `overwrite` is set `true`, then it will overwrite the
+PDB file if it exists in the `pdb_dir`;
+"""
+function downloadpdb(pdbidlist::AbstractArray{String,1}; pdb_dir::AbstractString=pwd(), file_format::Type=PDB, obsolete::Bool=false, overwrite::Bool=false)


I think this should be pdbidlist::Array{String,1}, did you say this caused a problem?

jgreener64 · 2017-08-07T13:00:52Z

src/structure/pdb.jl

+            pdb_dir::AbstractString=pwd(),
+            ba_number::Integer=0,
+            structure_name::AbstractString="$pdbid.pdb",
+            remove_disorder::Bool=false,


These 3 arguments can be replaced by kwargs... and then pass that to the inner read function (this means the defaults for remove_disorder etc. are defined in one place only).
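A sketch of the forwarding pattern in current Julia. `parsepdbfile` and `readpdbbyid` are hypothetical names; the inner function returns a NamedTuple of the options it received instead of parsing anything, so the example runs offline. The defaults for `structure_name`, `remove_disorder` and `read_het_atoms` are written only once, in the inner function.

```julia
# Hypothetical inner reader: the only place the parsing defaults are written.
function parsepdbfile(path::AbstractString;
                      structure_name::AbstractString=basename(path),
                      remove_disorder::Bool=false,
                      read_het_atoms::Bool=true)
    # A real implementation would parse `path`; this sketch echoes the options.
    return (path=path, structure_name=structure_name,
            remove_disorder=remove_disorder, read_het_atoms=read_het_atoms)
end

# Hypothetical outer wrapper: handles its own arguments and forwards the rest
# unchanged through `kwargs...`.
function readpdbbyid(pdbid::AbstractString; pdb_dir::AbstractString=pwd(), kwargs...)
    path = joinpath(pdb_dir, uppercase(pdbid) * ".pdb")
    return parsepdbfile(path; kwargs...)
end

readpdbbyid("1en2"; pdb_dir="pdbs", remove_disorder=true)
```

Changing a default then only requires editing `parsepdbfile`; an unknown keyword still raises an error in the inner call.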

jgreener64 · 2017-08-07T13:01:45Z

src/structure/pdb.jl

+            read_het_atoms::Bool=true)
+    filepath = joinpath(pdb_dir,"$pdbid.pdb")
+    pdbpath = ba_number == 0 ? filepath : filepath*"$ba_number"
+    open(pdbpath, "r") do input


This open is not actually required as calling read with the filepath is already defined below.

jgreener64 · 2017-08-07T13:02:55Z

src/structure/pdb.jl

+        filepath = joinpath(pdb_dir,"$pdbid.pdb")
+    end
+    pdbpath = ba_number == 0 ? filepath : filepath*"$ba_number"
+    open(pdbpath, "r") do input


This open is not actually required as calling read with the filepath is already defined below.

codecov-io · 2017-08-07T22:31:50Z

Codecov Report

Merging #483 into master will increase coverage by 0.6%.
The diff coverage is 80%.

```diff
@@            Coverage Diff            @@
##           master     #483     +/-   ##
=========================================
+ Coverage   70.34%   70.94%   +0.6%
=========================================
  Files          34       34
  Lines        2421     2537    +116
=========================================
+ Hits         1703     1800     +97
- Misses        718      737     +19
```

| Impacted Files | Coverage Δ |
| :--- | :--- |
| src/structure/pdb.jl | `89.22% <80%> (-5.61%)` ⬇️ |


Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e08f2a3...22231b3.

jgreener64

I have tried out some of the functions and they appear okay. I have left a few more small comments. Thanks for addressing everything. I'll look again when tests and docs are up 👍

jgreener64 · 2017-08-08T12:25:42Z

src/structure/pdb.jl

+if the keyword argument `overwrite` is set `true`, then it will overwrite the
+PDB file if it exists in the `pdb_dir`.
+"""
+function downloadpdb(pdbidlist::Array{String,1}; pdb_dir::AbstractString=pwd(), file_format::Type=PDB, obsolete::Bool=false, overwrite::Bool=false)


The last 4 arguments here could just be kwargs... so their defaults are defined in one place above.

jgreener64 · 2017-08-08T12:26:47Z

src/structure/pdb.jl

+        if ispath(archivefilepath) && filesize(archivefilepath) > 0 && file_format != MMTF           
+            input = open(archivefilepath) |> ZlibInflateInputStream
+            open(pdbpath,"w") do output
+                for line in eachline(input)


This prints each newline character twice so the files alternate line/blank line. Could use print rather than println.

Could this be a system-specific issue? When I use println() the file is generated properly and I can read it, but if I use print(), all the lines are concatenated and I cannot read the file. I have attached the two files for reference (renamed to .txt because I could not upload them with a .pdb extension). Let me know what you find.

1ENT_print().txt
1ENT_println().txt

Thanks for this. It seems to be a Julia 0.5/0.6 issue, see the first breaking change for Julia v0.6.0 here: https://github.com/JuliaLang/julia/blob/master/NEWS.md .

In 0.5 the line break is not removed by eachline, in 0.6 it is. In order to be compatible with both I think we can use eachline(..., chomp=false) and use print inside.

OK. I guess I should keep an eye on the release notes, as Julia changes a lot between versions. Thank you. I will update accordingly.

Actually, I just tested that on 0.5 and eachline can't take the chomp argument. It should be

```julia
for line in eachline(input)
    println(output, chomp(line))
end
```

Sorry about that.
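In Julia 0.7 and later the `chomp` keyword of `eachline` was renamed to `keep` (default `false`, i.e. line endings stripped), so the loop above still works but `chomp` is no longer needed. A sketch of two equivalent ways to copy lines on current Julia, using in-memory `IOBuffer`s in place of the decompressed archive and the output file:

```julia
# Option 1: let eachline strip the newline and println add exactly one back.
function copylines(input::IO, output::IO)
    for line in eachline(input)            # keep=false by default
        println(output, line)
    end
end

# Option 2: keep the original line endings and print them unchanged.
function copylines_keep(input::IO, output::IO)
    for line in eachline(input; keep=true)
        print(output, line)
    end
end

out = IOBuffer()
copylines(IOBuffer("ATOM      1\nATOM      2\nEND\n"), out)
print(String(take!(out)))
```

Option 2 reproduces the input byte for byte, including a missing final newline; option 1 always ends every line with a newline.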

jgreener64 · 2017-08-08T12:28:29Z

src/structure/pdb.jl

+            throw(ErrorException("Error downloading PDB : $pdbid"))
+        end
+    end
+    rm(archivefilepath)


If the condition is satisfied on line 180 then this will error as the file is not created. Either check the file exists here or move it up in the logic to where you know it exists.

I don't think so. archivefilepath exists independently of the if condition on line 180, because archivefilepath=tempname() on line 172 actually creates an empty temporary file rather than just returning a path.

Interesting, it doesn't do that on my machine. I wonder if that is system specific. For example if I attempt to overwrite when overwrite is false I get

```
INFO: PDB Exists : 1AKE
ERROR: unlink: no such file or directory (ENOENT)
 in unlink(::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in #rm#7(::Bool, ::Bool, ::Function, ::String) at /Applications/Julia-0.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 in #downloadpdb#1(::String, ::Type{T}, ::Bool, ::Bool, ::Int64, ::Function, ::String) at ./REPL[1]:61
 in (::#kw##downloadpdb)(::Array{Any,1}, ::#downloadpdb, ::String) at ./<missing>:0
```

OK. Can you check by trying ispath(tempname())? I get true on Windows 10.

I get false on macOS with Julia 0.6. Searching the Julia issue tracker turns up JuliaLang/julia#9053, which addresses this.

I suggest checking whether it exists and removing it if so, which will work either way.

jgreener64 · 2017-08-08T12:30:04Z

src/structure/pdb.jl

    else
-        # Will download error page if ba_number is too high
-        download("http://www.rcsb.org/pdb/files/$pdbid.pdb$ba_number", out_filepath)
+        pdbpath = joinpath(pdb_dir,"$pdbid"*pdbextension[file_format]*"$ba_number")


Thinking a bit more about it, the biological number should not change the extension of the file. Ideally it would change the filename, e.g. 1ABC_ba1.pdb or something like that rather than 1ABC.pdb1.
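A sketch of the naming scheme suggested here, where the assembly number goes into the file name (`1ABC_ba1.pdb`) rather than the extension (`1ABC.pdb1`). `assemblyfilename` is a hypothetical helper, and the `_ba` suffix is only the example spelling from the comment.

```julia
# Hypothetical helper: build a local file name for an entry or one of its
# biological assemblies (ba_number == 0 means the asymmetric unit).
function assemblyfilename(pdbid::AbstractString, ext::AbstractString;
                          ba_number::Integer=0)
    ba_number >= 0 || throw(ArgumentError("ba_number must be non-negative, got $ba_number"))
    suffix = ba_number == 0 ? "" : "_ba$(ba_number)"
    return "$(uppercase(pdbid))$(suffix)$(ext)"
end

assemblyfilename("1abc", ".pdb")                # "1ABC.pdb"
assemblyfilename("1abc", ".pdb"; ba_number=1)   # "1ABC_ba1.pdb"
```

Keeping the extension intact means the file still carries the usual `.pdb` or `.cif` ending.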

bicycle1885 · 2017-08-09T02:36:43Z

src/structure/pdb.jl

+                # The first 4 characters in the line is the PDB ID
+                pdbid = uppercase(line[1:4])
+                # Check PDB ID is 4 characters long and only consists of alphanumeric characters
+                if length(pdbid) != 4 || ismatch(r"[^a-zA-Z0-9]", pdbid)


r"^[a-zA-Z0-9]{4}$" will check the length as well.

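A sketch of the anchored regex in context, using current Julia names (`occursin` replaced `ismatch` in 0.7). `parseentries` is a hypothetical helper that reads an `entries.idx`-style stream; it assumes a fixed number of header lines (two here, which should be checked against the real file) followed by one entry per line starting with the four-character ID. Because the pattern is anchored with `^` and `$` and requires exactly four characters, no separate length check is needed, and `first(line, 4)` avoids the bounds error that `line[1:4]` would raise on a short line.

```julia
# Anchored pattern: exactly four alphanumeric characters, nothing else.
const PDBID_REGEX = r"^[a-zA-Z0-9]{4}$"

isvalidpdbid(s::AbstractString) = occursin(PDBID_REGEX, s)

# Hypothetical parser for an entries.idx-style stream. Assumes `headerlines`
# header lines (two by default; check this against the real file) followed by
# one entry per line that starts with the PDB ID.
function parseentries(io::IO; headerlines::Integer=2)
    ids = String[]
    for (i, line) in enumerate(eachline(io))
        i <= headerlines && continue
        pdbid = uppercase(first(line, 4))   # safe even if the line is shorter
        isvalidpdbid(pdbid) && push!(ids, pdbid)
    end
    return ids
end
```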
bicycle1885 · 2017-08-09T02:40:06Z

src/structure/pdb.jl

+            linecount +=1
+        end
+    end
+    rm(tempfilepath)


rm(tempfilepath, force=true) would be better because it works even when the file does not exist.

Also, to make sure that the temporary file will be deleted, you need to use the try-catch-finally statement.

Wouldn't try-finally be sufficient? That way the error still propagates and we can see what went wrong, which might be helpful for debugging in the initial stages.
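A sketch of the try/finally variant discussed here, in current Julia syntax. `withtempfile` is a hypothetical helper: it writes the given contents to a fresh `tempname()` path where the real code would download into that path, passes the path to a callback, and removes the file in `finally`, so cleanup happens on success or failure while the original exception still propagates because nothing is caught. `rm(...; force=true)` also covers the platform difference noted earlier, where the file may or may not exist. Base's `mktemp()` do-block form is an alternative that also deletes the file when the block exits.

```julia
# Hypothetical helper: create a temporary file, hand it to `f`, always delete it.
function withtempfile(f::Function, contents::AbstractString)
    path = tempname()
    try
        write(path, contents)   # the real code would download into `path` here
        return f(path)
    finally
        rm(path; force=true)    # no error if the file was never created
    end
end

nlines = withtempfile("100D\n101M\n") do path
    countlines(path)
end
```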

bicycle1885 · 2017-08-09T02:50:10Z

src/structure/pdb.jl

+                # MMTF is downloaded in uncompressed form, thus directly stored in pdbpath
+                download("http://mmtf.rcsb.org/v1.0/full/$pdbid", pdbpath)
+            else
+                warn("Invalid PDB file format!")


This should throw an ArgumentError exception.

bicycle1885 · 2017-08-09T02:50:26Z

src/structure/pdb.jl

+            elseif file_format == mmCIF
+                download("http://files.rcsb.org/download/$pdbid-assembly$ba_number"*pdbextension[file_format]*".gz", archivefilepath)
+            else
+                warn("Biological Assembly is available only in PDB and mmCIF formats!")


ArgumentError.
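A sketch of the suggested error handling: invalid input raises an `ArgumentError` instead of logging a warning and carrying on. `checkbaformat` is a hypothetical helper, and formats are plain `Symbol`s here so the example does not depend on the Bio.Structure types.

```julia
# Formats that the PR's warning message lists as supporting biological assemblies.
const BA_FORMATS = (:PDB, :mmCIF)

# Hypothetical validator: return the format unchanged or throw ArgumentError.
function checkbaformat(file_format::Symbol)
    file_format in BA_FORMATS ||
        throw(ArgumentError("Biological assembly is available only in PDB and mmCIF formats, got $file_format"))
    return file_format
end

checkbaformat(:mmCIF)    # returns :mmCIF
# checkbaformat(:MMTF)   # would throw an ArgumentError
```

Throwing lets callers such as the multi-ID download loop catch the error and record the failure, instead of continuing with an invalid request.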

bicycle1885 · 2017-08-09T02:52:22Z

src/structure/pdb.jl

+
+
+"""
+Download a PDB file or biological assembly from the RCSB PDB server. 


The first line should be the signature of the function (see: https://docs.julialang.org/en/stable/manual/documentation).
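A sketch of the docstring layout recommended by the Julia manual: the first line is the signature, indented by four spaces so it renders as code, followed by a short description. `pdbdownloadurl` is a hypothetical helper; the URL pattern is the `files.rcsb.org/download` one used elsewhere in this PR.

```julia
"""
    pdbdownloadurl(pdbid::AbstractString, ext::AbstractString=".pdb")

Return the URL of the gzip-compressed file for `pdbid` with extension `ext`
on the RCSB PDB server.
"""
function pdbdownloadurl(pdbid::AbstractString, ext::AbstractString=".pdb")
    return "http://files.rcsb.org/download/$(uppercase(pdbid))$(ext).gz"
end
```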

bicycle1885 · 2017-08-09T02:52:58Z

src/structure/pdb.jl

+            end
+        end
+        # Verify if the compressed PDB file is downloaded properly and extract it. For MMTF no extraction is needed
+        if ispath(archivefilepath) && filesize(archivefilepath) > 0 && file_format != MMTF           


Use isfile, not ispath.
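A short sketch of why `isfile` is the safer check: `ispath` is also `true` for directories, while `isfile` is `true` only for regular files. Combined with a size check it also rejects empty downloads. `isusabledownload` is a hypothetical helper name.

```julia
# Hypothetical check: the path is a regular file with non-zero size.
isusabledownload(path::AbstractString) = isfile(path) && filesize(path) > 0

dir = mktempdir()                       # an existing directory
file = joinpath(dir, "1EN2.pdb.gz")
touch(file)                             # an empty file, like a failed download

println((ispath(dir), isfile(dir)))     # (true, false)
println(isusabledownload(file))         # false: the file exists but is empty
write(file, "not empty")
println(isusabledownload(file))         # true
rm(dir; recursive=true)
```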

bicycle1885 · 2017-08-09T02:54:28Z

src/structure/pdb.jl

+function downloadentirepdb(;pdb_dir::AbstractString=pwd(), file_format::Type=PDB, overwrite::Bool=false)
+    # Get the list of all pdb entries from RCSB PDB Server using getallpdbentries() and downloads them
+    pdblist = pdbentrylist()
+    info("About to download "*string(length(pdblist))*" PDB files. Make sure to have enough disk space and time!")


Why don't you use string interpolation? "... download $(length(pdblist)) PDB files..."
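Both forms build the same string; interpolation with `$(...)` is shorter and avoids the explicit `string()` call. The same applies to the download URLs later in the thread. `pdblist` here is just a stand-in vector, and `@info` is the current-Julia replacement for 0.6's `info(...)`.

```julia
pdblist = ["100D", "101M", "1EN2"]      # stand-in for the real entry list

msg_concat = "About to download " * string(length(pdblist)) * " PDB files."
msg_interp = "About to download $(length(pdblist)) PDB files."

@info msg_interp    # logs the message "About to download 3 PDB files."
```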


joelselvaraj · 2017-08-12T16:29:54Z

I have written the test cases and made a few changes to the code. Kindly take a look and let me know if any changes are required.

joelselvaraj · 2017-08-13T16:29:45Z

TO DO

Update docstrings
Write test cases
Add documentation

Finally, I have completed the documentation. Let me know if any changes are required before merging the code.

jgreener64

So I am happy with the code now, thanks for making those changes. The tests look good too.

I think the doc changes need a bit of re-ordering though. The content is good but I think that the first few things the user reads should be how to download a PDB file and read it in the manner consistent with the other BioJulia read interfaces, i.e. read("path.pdb", PDB).

So I suggest keeping the start of the docs the same, apart from maybe changing "To download a PDB file" to "To download a PDB file - see below for more options".

Then after the struc = read(filepath_1EN2, PDB) box have a line or box talking about retrievepdb as a shortcut. The other options for downloadpdb, maintaining a local PDB copy etc. can go in the "RCSB PDB Metadata" section at the bottom, which could be renamed to "RCSB PDB utility functions".

jgreener64 · 2017-08-14T10:50:42Z

src/structure/pdb.jl

+            info("Downloading PDB : $pdbid")
+            if ba_number == 0            
+                if file_format == PDB || file_format == PDBXML || file_format == mmCIF
+                    download("http://files.rcsb.org/download/$pdbid"*pdbextension[file_format]*".gz", archivefilepath)


May as well use string interpolation as "http://files.rcsb.org/download/$(pdbid)$(pdbextension[file_format]).gz" here (and throughout).

jgreener64 · 2017-08-14T10:54:33Z

docs/src/man/structure.md

+| Keyword Argument               | Description                                                                                                           |
+| :----------------------------- | :-------------------------------------------------------------------------------------------------------------------- |
+| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB file is downloaded                                                                     |
+| `file_format::Type=PDB`        | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF>                                                        |


I think "Options are PDB, PDBXML, mmCIF or MMTF" is more readable.

jgreener64 · 2017-08-14T10:55:24Z

docs/src/man/structure.md

+If `overwrite=true`, the existing PDB files in obsolete directory will be overwritten by the newly downloaded ones.
+
+
+## Maintaining a Local Copy of the entire RCSB PDB Database


Pedantry - no capitals on local copy.

jgreener64 · 2017-08-14T10:55:36Z

docs/src/man/structure.md

+| Keyword Argument               | Description                                                                                              |
+| :----------------------------- | :------------------------------------------------------------------------------------------------------- |
+| `pdb_dir::AbstractString=pwd()`| The directory to which the PDB files are downloaded                                                      |
+| `file_format::Type=PDB`        | The format of the PDB file. Options <PDB, PDBXML, mmCIF, MMTF>                                           |


Same as above.

jgreener64 · 2017-08-14T10:57:20Z

docs/src/man/structure.md

@@ -244,6 +366,18 @@ julia> rad2deg(psiangle(struc['A'][50], struc['A'][51]))
 ```


+## RCSB PDB Metadata
+
+Few functions that may help fetching information about the RCSB PDB Database.


Pedantry - "There are a few functions that may help" etc.

jgreener64 · 2017-08-14T15:51:33Z

On further thought, should the mmCIF type be called MMCIF? I think the Julia convention of capital types might supersede the technical name, and also this would bring it in line with MMTF where the MMs mean the same thing.

joelselvaraj · 2017-08-15T15:18:21Z

Updated docs and code as discussed. Let me know if further changes are required.

jgreener64

Thanks for making the changes. I am happy with the code and tests and would suggest a few more small changes to the docs before merge. Hopefully they won't take long.

Sorry to spend a while on the docs but I think it's very important to give a concise and useful overview of the module.

jgreener64 · 2017-08-15T19:57:19Z

docs/src/man/structure.md

+    struc = readpdb("1EN2", pdb_dir="/path/to/pdb/directory")
+    ```
+
+    **Note:** This requires the PDB file name to be uppercase PDB ID. Example : "1EN2.pdb"


Maybe remove this line and change the above line to 'To parse a PDB file by specifying the PDB ID and PDB directory into a Structure-Model-Chain-Residue-Atom framework (file name must be in upper case, e.g. "1EN2.pdb")'.

jgreener64 · 2017-08-15T19:58:11Z

docs/src/man/structure.md

+    Number of disordered atoms  -  27
+    ```
+
+    Various options can be set through optional keyword arguments when parsing a PDB file as follows:


This is indented - to be in line with the rest of the file I don't think any indentation is needed, even for code.

This applies to other lines too.

jgreener64 · 2017-08-15T19:59:11Z

docs/src/man/structure.md

@@ -40,6 +41,8 @@ Number of hydrogens         -  0
 Number of disordered atoms  -  27
 ```

+**Note** : Refer [Downloading PDB files](#downloading-pdb-files) and [Reading PDB files](#reading-pdb-files) sections for more options.


"Refer to..."

jgreener64 · 2017-08-15T20:03:19Z

docs/src/man/structure.md

+
+    **Note:** This requires the PDB file name to be uppercase PDB ID. Example : "1EN2.pdb"
+
+    The function `readpdb` provides an uniform way to download and read PDB files. For example:


I'm not sure lines 309-337 are required. Perhaps we could just say "The function readpdb provides a uniform way to download and read PDB files, for example readpdb("1EN2",pdb_dir="/path/to/pdb/directory"). The same keyword arguments are taken as read above, plus pdb_dir and ba_number."

jgreener64 · 2017-08-15T20:06:19Z

docs/src/man/structure.md

+| `pdbobsoletelist`  | List of all obsolete PDB entries in the RCSB server                             | `Array{String,1}`                                        |
+
+
+## Maintaining a local copy of the entire RCSB PDB Database


This section title is quite long, and the section mainly refers to the above section. Perhaps remove this section and put the sentence "Run the downloadentirepdb function..." into the above section.

jgreener64 · 2017-08-15T20:12:15Z

docs/src/man/structure.md

@@ -20,13 +20,14 @@ The `Bio.Structure` module provides functionality to manipulate macromolecular s
 To download a PDB file:


Perhaps change this section title from "Parsing PDB files" to "Basics" as this makes more sense in the new arrangement.

jgreener64 · 2017-08-15T20:17:28Z

docs/src/man/structure.md

 downloadpdb("1EN2")
 ```

 To parse a PDB file into a Structure-Model-Chain-Residue-Atom framework:

 ```julia
-julia> struc = read(filepath_1EN2, PDB)
+julia> struc = read("/path/to/pdb/file.pdb", PDB)


For reference, this was initially set as filepath_1EN2 to pass the doctests but I don't think they are being tested here so this can stay as "/path/to/pdb/file.pdb".

joelselvaraj · 2017-08-15T21:00:41Z

I have updated the docs. Let me know if any changes are required.

jgreener64 · 2017-08-15T21:33:34Z

Great, I'm happy with this to go in. Thanks for all the work. I will wait until tomorrow in case anyone else has any comments, then I'll merge.

Joel S and others added 17 commits July 9, 2017 14:37

Added option to download pdb to another directory

f7afe95

Added option to download multiple pdb files

f393893

Added option to get all PDB entries available in RCSB PDB

aa1a5f5

Added option to download entire PDB files available in RCSB

f67339c

Added option to get status of new and modified pdb entries

61e5a56

Updated docstrings

4c4da97

Added option to overwrite existing PDB files when downloading

9319cdb

Updated docstrings

e729bf1

Added option to update PDB files based on weekly status list

90208c5

Added option to get all obsolete entries in PDB

3d9ac64

PDB directory automatically created if it does not exist

c669711

Added option to download all obsolete pdb files from RCSB PDB server

730dbf1

Updated docstrings

545f110

Added obsolete pdb file handling

b90f905

minor enhancements on print statements

574c896

Updated comments and docstrings

c671663

Added function retrievepdb to download and read PDB file

f0d2878

>Also added new Base.read() for reading PDB files with PDB ID and directory. >Bug Fixes

joelselvaraj changed the title ~~PDB file handling enhancements~~ [WIP] PDB file handling enhancements Aug 3, 2017

jgreener64 reviewed Aug 4, 2017


Joel S added 3 commits August 5, 2017 04:32

Code Refactoring and recommended changes

18a7043

Overrided downloadpdb function for downloading multiple PDB files

01c8d81

Added - download and update pdb files in PDB, XML and mmCIF formats

e5f52ff

Added - Download PDB,XML,mmCIF compressed and MMTF uncompressed format

a120b81

jgreener64 reviewed Aug 7, 2017


Recommended code fixes

01873ad

jgreener64 reviewed Aug 8, 2017


Minor Bug fixes and recommended changes

37f31be

bicycle1885 reviewed Aug 9, 2017


Joel S added 10 commits August 9, 2017 12:42

Exception handling improvements and Minor fixes

95e95d6

PDB Extraction fix-compatible for julia 0.5 & 0.6

928f769

Updated Docstrings and Minor code changes

9a25cf7

Merge pull request #3 from BioJulia/master

958afb4

Decompose Bio.Align and Bio.Intervals (BioJulia#482)

simple ci test

ae0ae4c

Test cases and Bug fixes

03f80c6

Bug fixes and Test case updates

b90e0e6

Test - Fix

a460de1

Small test case added

8a292a9

Merge branch 'pdb_test' into pdb_enhancements

a3ff2ae

Added Documentation

1a97bf3

jgreener64 reviewed Aug 14, 2017


Updated docs and Minor code changes

06acee3

jgreener64 reviewed Aug 15, 2017


Minor document corrections

22231b3

jgreener64 merged commit 1c89218 into BioJulia:master Aug 16, 2017