Add top-level function `from_PDB` for obtaining structure directly from PDB #810

dotsdl · 2016-03-31T01:40:18Z

It would be nice to be able to get a Universe directly from the PDB without first writing the file to disk. This could be done using perhaps a NamedStream so that we could do:

import MDAnalysis as mda

u = mda.Universe(mda.from_PDB('4AKE'))

where u.filename just returns the web address the NamedStream pulled from.

The text was updated successfully, but these errors were encountered:

jbarnoud · 2016-03-31T08:48:30Z

I would call the function fetch_PDB. I feel from is too generic, while fetch is most commonly used in that context.

kain88-de · 2016-03-31T09:24:11Z

from_PDB could also be confused that it generates a universe from a PDB. And I would rather go that the API is u = mda.fetch_PDB('4AKE').

where u.filename just returns the web address the NamedStream pulled from.

Sure this won't confuse analysis algorithms that assume u.filename is an existing file on hard drive? Wouldn't it be better to save the pdb in a temporary directory?

richardjgowers · 2016-03-31T09:33:48Z

U. Filename is always the topology, which is read once and never used
again. So pulling from Web matches this pattern too? Or should the filename
be a URL to make it clear?

On Thu, 31 Mar 2016 10:24 kain88-de, [email protected] wrote:

from_PDB could also be confused that it generates a universe from a PDB.
And I would rather go that the API is u = mda.fetch_PDB('4AKE').

where u.filename just returns the web address the NamedStream pulled
from. Sure this won't confuse analysis algorithms that assume u.filename
is an existing file on hard drive? Wouldn't it be better to save the pdb in
a temporary directory?

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#810 (comment)

jbarnoud · 2016-03-31T09:36:17Z

Sure this won't confuse analysis algorithms that assume u.filename is an existing file on hard drive?

Most of the time, if the analysis does not assume it receives a stream, it does something wrong.

kain88-de · 2016-03-31T09:45:44Z

Most of the time, if the analysis does not assume it receives a stream, it does something wrong.

The old soon to be deprecated ContactAnalysis for example uses u.filename to create a new copy of the universe, this wouldn't work with that approach or our universe object needs to recognize urls. I would almost bet that any of the other older analysis classes does something similar.

Not to mention all the custom analysis code users have build on top of MDAnalysis and that might be shared inside of a research group. I'm pretty sure that they implicitly assume that u.filename is an actually existing path because it is in 99% of all cases.

orbeckst · 2016-03-31T19:12:27Z

Given that we don't really universaly support streams everywhere (it's still a hack) I'd go with the unambiguous

u = mda.fetch_PDB('4AKE', cache=True, cachedir=os.curdir)

If you decide to cache, u.filename is the cached file on disk, otherwise the URL.

I'd probably cache in the current directory and provide another kwarg to specify an alternative cache dir.

(It would be cool if we could use streams interchangeably with files everywhere, although I am not sure if the benefits really balance the time to implement a good solution.)

mnmelo · 2016-04-01T07:55:33Z

There's a compact alternative (that I'm not entirely sure I prefer), which is to overload the universe constructor with it:

u = mda.Universe('4AKE', fetch=True)

We can even make it more flexible, database-wise:

u = mda.Universe('4AKE', fetch='PDB')

And maybe allow further keywords to differentiate between fetching topology and trajectory.

hainm · 2016-04-01T07:56:34Z

+1 vote for fetch_PDB, +2 for fetch_pdb (pytraj uses it :)) ).

kain88-de · 2016-04-01T08:40:58Z

which is to overload the universe constructor with it:

No that will just be more confusing.

orbeckst · 2016-04-01T09:15:59Z

On 1 Apr, 2016, at 00:56, Hai Nguyen wrote:

+1 vote for fetch_PDB, +2 for fetch_pdb (pytraj uses it :)) ).

I don't mind fetch_pdb().

richardjgowers · 2016-04-01T10:15:46Z

I don't want to add another kwarg to Universe, just because of complexity,
but another good reason is if we make fetch return a stream,
fetch_pdb('thing').save('./mycopy.pdb') is a way of downloading the files.

On Fri, 1 Apr 2016 10:16 Oliver Beckstein, [email protected] wrote:

On 1 Apr, 2016, at 00:56, Hai Nguyen wrote:

+1 vote for fetch_PDB, +2 for fetch_pdb (pytraj uses it :)) ).

I don't mind fetch_pdb().

—
You are receiving this because you commented.

Reply to this email directly or view it on GitHub
#810 (comment)

kain88-de · 2016-04-01T11:22:39Z

is a way of downloading the files.

wouldn't we need to download it anyway to create the stream? So better would be fetch_pdb('4AKE', fname='foo.pdb'). The we don't write a temporary file but foo.pdb directly.

hainm · 2016-04-02T05:08:28Z

@orbeckst does MDA have intention to add on memory Universe? pytraj has both on disk and on memory objects. The on-memory is good for interactive work (such like visualization).

orbeckst · 2016-05-04T00:49:22Z

FYI, @tomnewport showed one way how to do the PDB downloading in his issue report for #839.

richardjgowers · 2016-07-22T10:05:24Z

FWIW, mmtf at #907 has this built into their API (according to website) and is arguably a better solution for downloading than bulky PDB files?

orbeckst · 2016-07-22T15:30:08Z

On 22 Jul, 2016, at 03:05, Richard Gowers [email protected] wrote:

FWIW, mmtf at #907 has this built into their API (according to website) and is arguably a better solution for downloading than bulky PDB files?

Good idea, although users probably still want to have a PDB file locally, simply because many other tools won’t have MMTF support (yet). Does MMTF have a way to convert to a “native looking” PDB, i.e., one with all the metadata?

Oliver Beckstein * [email protected]
skype: orbeckst * [email protected]

orbeckst · 2016-08-15T22:09:56Z

Simple code (from #839):

import urllib 
downloader = urllib.URLopener()

def fetch_pdb(pdb_id, download_url="http://files.rcsb.org/download"):
    filename = "{0}.pdb".format(pdb_id.lower())
    downloader.retrieve("/".join([download_url, filename]), filename)
    return filename

# example use:
fetch_pdb("1osm")

orbeckst · 2016-11-20T17:31:28Z

We should use the functionality provided by MMTF, see #907 (and PR #1069).

import mmtf
m = mmtf.fetch('1ake')

... although this m is not immediately useable so I assume that the MMTFParser and MMTFReader would need to be changed so that they can deal with a <mmtf.api.mmtf_reader.MMTFDecoder instance>.

dotsdl added the Difficulty-easy label Mar 31, 2016

orbeckst added the Component-Readers label Apr 19, 2016

orbeckst mentioned this issue Aug 15, 2016

Fails to make a universe from 2Y2N.pdb #936

Closed

orbeckst mentioned this issue Nov 11, 2016

Second draft of MMTF reading (Issue #907) #1069

Merged

4 tasks

orbeckst mentioned this issue Nov 20, 2016

Blog Post for release 0.16.0 MDAnalysis/MDAnalysis.github.io#26

Merged

6 tasks

richardjgowers self-assigned this Nov 21, 2016

richardjgowers mentioned this issue Nov 21, 2016

Issue 810 fetchmmtf #1082

Merged

4 tasks

richardjgowers added a commit that referenced this issue Nov 21, 2016

CHANGELOG for #810

0981704

richardjgowers closed this as completed in #1082 Nov 29, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add top-level function `from_PDB` for obtaining structure directly from PDB #810

Add top-level function `from_PDB` for obtaining structure directly from PDB #810

dotsdl commented Mar 31, 2016

jbarnoud commented Mar 31, 2016

kain88-de commented Mar 31, 2016

richardjgowers commented Mar 31, 2016

jbarnoud commented Mar 31, 2016

kain88-de commented Mar 31, 2016

orbeckst commented Mar 31, 2016

mnmelo commented Apr 1, 2016

hainm commented Apr 1, 2016

kain88-de commented Apr 1, 2016

orbeckst commented Apr 1, 2016

richardjgowers commented Apr 1, 2016

kain88-de commented Apr 1, 2016

hainm commented Apr 2, 2016

orbeckst commented May 4, 2016

richardjgowers commented Jul 22, 2016

orbeckst commented Jul 22, 2016

orbeckst commented Aug 15, 2016

orbeckst commented Nov 20, 2016 •

edited

Loading

Add top-level function from_PDB for obtaining structure directly from PDB #810

Add top-level function from_PDB for obtaining structure directly from PDB #810

Comments

dotsdl commented Mar 31, 2016

jbarnoud commented Mar 31, 2016

kain88-de commented Mar 31, 2016

richardjgowers commented Mar 31, 2016

jbarnoud commented Mar 31, 2016

kain88-de commented Mar 31, 2016

orbeckst commented Mar 31, 2016

mnmelo commented Apr 1, 2016

hainm commented Apr 1, 2016

kain88-de commented Apr 1, 2016

orbeckst commented Apr 1, 2016

richardjgowers commented Apr 1, 2016

kain88-de commented Apr 1, 2016

hainm commented Apr 2, 2016

orbeckst commented May 4, 2016

richardjgowers commented Jul 22, 2016

orbeckst commented Jul 22, 2016

orbeckst commented Aug 15, 2016

orbeckst commented Nov 20, 2016 • edited Loading

Add top-level function `from_PDB` for obtaining structure directly from PDB #810

Add top-level function `from_PDB` for obtaining structure directly from PDB #810

orbeckst commented Nov 20, 2016 •

edited

Loading