Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add top-level function from_PDB for obtaining structure directly from PDB #810

Closed
dotsdl opened this issue Mar 31, 2016 · 18 comments
Closed

Comments

@dotsdl
Copy link
Member

dotsdl commented Mar 31, 2016

It would be nice to be able to get a Universe directly from the PDB without first writing the file to disk. This could be done using perhaps a NamedStream so that we could do:

import MDAnalysis as mda

u = mda.Universe(mda.from_PDB('4AKE'))

where u.filename just returns the web address the NamedStream pulled from.

@jbarnoud
Copy link
Contributor

I would call the function fetch_PDB. I feel from is too generic, while fetch is most commonly used in that context.

@kain88-de
Copy link
Member

from_PDB could also be confused that it generates a universe from a PDB. And I would rather go that the API is u = mda.fetch_PDB('4AKE').

where u.filename just returns the web address the NamedStream pulled from.

Sure this won't confuse analysis algorithms that assume u.filename is an existing file on hard drive? Wouldn't it be better to save the pdb in a temporary directory?

@richardjgowers
Copy link
Member

U. Filename is always the topology, which is read once and never used
again. So pulling from Web matches this pattern too? Or should the filename
be a URL to make it clear?

On Thu, 31 Mar 2016 10:24 kain88-de, [email protected] wrote:

from_PDB could also be confused that it generates a universe from a PDB.
And I would rather go that the API is u = mda.fetch_PDB('4AKE').

where u.filename just returns the web address the NamedStream pulled
from. Sure this won't confuse analysis algorithms that assume u.filename
is an existing file on hard drive? Wouldn't it be better to save the pdb in
a temporary directory?


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#810 (comment)

@jbarnoud
Copy link
Contributor

Sure this won't confuse analysis algorithms that assume u.filename is an existing file on hard drive?

Most of the time, if the analysis does not assume it receives a stream, it does something wrong.

@kain88-de
Copy link
Member

Most of the time, if the analysis does not assume it receives a stream, it does something wrong.

The old soon to be deprecated ContactAnalysis for example uses u.filename to create a new copy of the universe, this wouldn't work with that approach or our universe object needs to recognize urls. I would almost bet that any of the other older analysis classes does something similar.

Not to mention all the custom analysis code users have build on top of MDAnalysis and that might be shared inside of a research group. I'm pretty sure that they implicitly assume that u.filename is an actually existing path because it is in 99% of all cases.

@orbeckst
Copy link
Member

Given that we don't really universaly support streams everywhere (it's still a hack) I'd go with the unambiguous

u = mda.fetch_PDB('4AKE', cache=True, cachedir=os.curdir)

If you decide to cache, u.filename is the cached file on disk, otherwise the URL.

I'd probably cache in the current directory and provide another kwarg to specify an alternative cache dir.

(It would be cool if we could use streams interchangeably with files everywhere, although I am not sure if the benefits really balance the time to implement a good solution.)

@mnmelo
Copy link
Member

mnmelo commented Apr 1, 2016

There's a compact alternative (that I'm not entirely sure I prefer), which is to overload the universe constructor with it:

u = mda.Universe('4AKE', fetch=True)

We can even make it more flexible, database-wise:

u = mda.Universe('4AKE', fetch='PDB')

And maybe allow further keywords to differentiate between fetching topology and trajectory.

@hainm
Copy link
Contributor

hainm commented Apr 1, 2016

+1 vote for fetch_PDB, +2 for fetch_pdb (pytraj uses it :)) ).

@kain88-de
Copy link
Member

which is to overload the universe constructor with it:

No that will just be more confusing.

@orbeckst
Copy link
Member

orbeckst commented Apr 1, 2016

On 1 Apr, 2016, at 00:56, Hai Nguyen wrote:

+1 vote for fetch_PDB, +2 for fetch_pdb (pytraj uses it :)) ).

I don't mind fetch_pdb().

@richardjgowers
Copy link
Member

I don't want to add another kwarg to Universe, just because of complexity,
but another good reason is if we make fetch return a stream,
fetch_pdb('thing').save('./mycopy.pdb') is a way of downloading the files.

On Fri, 1 Apr 2016 10:16 Oliver Beckstein, [email protected] wrote:

On 1 Apr, 2016, at 00:56, Hai Nguyen wrote:

+1 vote for fetch_PDB, +2 for fetch_pdb (pytraj uses it :)) ).

I don't mind fetch_pdb().


You are receiving this because you commented.

Reply to this email directly or view it on GitHub
#810 (comment)

@kain88-de
Copy link
Member

is a way of downloading the files.

wouldn't we need to download it anyway to create the stream? So better would be fetch_pdb('4AKE', fname='foo.pdb'). The we don't write a temporary file but foo.pdb directly.

@hainm
Copy link
Contributor

hainm commented Apr 2, 2016

@orbeckst does MDA have intention to add on memory Universe? pytraj has both on disk and on memory objects. The on-memory is good for interactive work (such like visualization).

@orbeckst
Copy link
Member

orbeckst commented May 4, 2016

FYI, @tomnewport showed one way how to do the PDB downloading in his issue report for #839.

@richardjgowers
Copy link
Member

FWIW, mmtf at #907 has this built into their API (according to website) and is arguably a better solution for downloading than bulky PDB files?

@orbeckst
Copy link
Member

On 22 Jul, 2016, at 03:05, Richard Gowers [email protected] wrote:

FWIW, mmtf at #907 has this built into their API (according to website) and is arguably a better solution for downloading than bulky PDB files?

Good idea, although users probably still want to have a PDB file locally, simply because many other tools won’t have MMTF support (yet). Does MMTF have a way to convert to a “native looking” PDB, i.e., one with all the metadata?

Oliver Beckstein * [email protected]
skype: orbeckst * [email protected]

@orbeckst
Copy link
Member

Simple code (from #839):

import urllib 
downloader = urllib.URLopener()

def fetch_pdb(pdb_id, download_url="http://files.rcsb.org/download"):
    filename = "{0}.pdb".format(pdb_id.lower())
    downloader.retrieve("/".join([download_url, filename]), filename)
    return filename

# example use:
fetch_pdb("1osm")

@orbeckst
Copy link
Member

orbeckst commented Nov 20, 2016

We should use the functionality provided by MMTF, see #907 (and PR #1069).

import mmtf
m = mmtf.fetch('1ake')

... although this m is not immediately useable so I assume that the MMTFParser and MMTFReader would need to be changed so that they can deal with a <mmtf.api.mmtf_reader.MMTFDecoder instance>.

@richardjgowers richardjgowers self-assigned this Nov 21, 2016
richardjgowers added a commit that referenced this issue Nov 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants