Remove the BioPython PDBParser for topology and trajectory readers #777

kain88-de · 2016-03-17T22:56:54Z

In the discussion in #775 it became clear that the BioPython Parser should be removed completely.

jdetle · 2016-03-28T02:34:29Z

I will go ahead and take care of this in the coming days.

jbarnoud · 2016-03-28T07:38:25Z

I do not know if anybody calls the PDB parser directly, but it may be best to have a deprecation period for it. Anyway, no other part of the code should call it, and the documentation should be updated.

kain88-de · 2016-03-28T09:30:44Z

You should also deprecate the permissive flag in the universe creation.

richardjgowers · 2016-03-28T10:53:32Z

I think the point here is we're trying to make our parser do everything the other one does, so it doesn't need deprecating as no functionality is lost?

@jdetle as step 0 for this, I'd try reading every .pdb file we have in the test suite with PrimitivePDBParser and see if you get any failures. Then try comparing to the results of PDBParser.

kain88-de · 2016-03-28T11:26:10Z

@richardjgowers does our testsuite even use the BioPython trajectory reader? As far as I can tell from the coordinates tests, only the primitive reader is used. I actually started to notice a bunch of PDB-related issues when I tried to use the new common-API reader test class defined in MDAnalysisTests/coordinates/base.

richardjgowers · 2016-03-28T11:41:22Z

This is getting a little confusing; I thought we were talking about the Parser (which reads topology). Either way, I find the whole permissive/primitive thing confusing and don't really understand what one does that the other doesn't.

kain88-de · 2016-03-28T11:49:25Z

Oh sorry, I meant the trajectory reader. I never looked at the topology parsers, so I don't have a clear idea about them.

richardjgowers · 2016-03-28T11:59:31Z

@kain88-de there's a similar split and it could do with removing too. I guess you could rename this issue to cover both

orbeckst · 2016-03-31T22:31:57Z

We want to get rid of the Bio.PDB.PDBParser which is being used when using Universe(..., permissive=False). The permissive keyword was supposed to indicate that the original Bio.PDB parser is very strict about the PDB format whereas our PrimitivePDBReader is supposedly more lenient.

jdetle · 2016-03-31T22:39:48Z

Duly noted, I am working on what @richardjgowers said earlier. I think in these initial stages everything is going to take longer than I think it should, because I'm getting familiar with the codebase and Python itself. I just ran into an interesting error regarding object equality that I'm investigating. By my understanding:

def setUp(self):
    self.universe = mda.Universe(PDB_small, permissive=True)
    # 3 decimals in PDB spec
    # http://www.wwpdb.org/documentation/format32/sect9.html#ATOM
    self.prec = 3

def test_PDB(self):
    from MDAnalysis.coordinates.PDB import PrimitivePDBReader
    primitiveReader = PrimitivePDBReader(PDB_small, n_atoms=3341)
    print("checking equal:" + str(self.universe.trajectory == primitiveReader))
    assert_equal(self.universe.trajectory, primitiveReader)

the print statement should yield "checking equal: True" but instead yields False, furthermore the test doesn't pass:

======================================================================
FAIL: test_PDB (MDAnalysisTests.coordinates.test_pdb.TestPrimitivePDBReader)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jdetlefs/github/MDAnalysis/testsuite/MDAnalysisTests/coordinates/test_pdb.py", line 126, in test_PDB
    assert_equal(self.universe.trajectory, primitiveReader)
  File "/usr/lib/python2.7/dist-packages/numpy/testing/utils.py", line 317, in assert_equal
    raise AssertionError(msg)
AssertionError: 
Items are not equal:
 ACTUAL: <PrimitivePDBReader /home/jdetlefs/github/MDAnalysis/testsuite/MDAnalysisTests/data/adk_open.pdb with 1 frames of 3341 atoms>
 DESIRED: <PrimitivePDBReader /home/jdetlefs/github/MDAnalysis/testsuite/MDAnalysisTests/data/adk_open.pdb with 1 frames of 3341 atoms>

Is this done intentionally for some reason, or are we missing __eq__(self, other) methods in places?

jbarnoud · 2016-04-01T05:56:08Z

I am not sure readers implement any comparison operator. If they do not, the Python documentation tells us that objects are compared by their identity, so == behaves like is:

If no __cmp__(), __eq__() or __ne__() operation is defined, class instances are compared by object identity (“address”).

This means an object will only be equal to itself. In your example the two objects represent the same thing (a PDB reader for the same file), but they are different instances: i.e. id(self.universe.trajectory) != id(primitiveReader).

In that case, the distinction is important, because a reader stores a state, and changing the state of one reader does not change the state of an identical reader.

kain88-de · 2016-04-01T06:59:21Z

We don't implement a comparison for the readers, and it wouldn't make much sense to have one. From your code snippet, I guess you want to check whether the reader attached to the universe is of type PrimitivePDBReader; that can be done with the isinstance function.
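
To make the identity-versus-type point concrete, here is a minimal sketch. DummyReader is a hypothetical stand-in class invented for illustration, not the real PrimitivePDBReader; it only assumes that a reader holds a filename and some mutable state.

```python
class DummyReader(object):
    """Hypothetical stand-in for a trajectory reader (not MDAnalysis code)."""

    def __init__(self, filename):
        self.filename = filename
        self.frame = 0  # mutable state, like the current frame of a reader


a = DummyReader("adk_open.pdb")
b = DummyReader("adk_open.pdb")

# No __eq__ is defined, so == falls back to object identity.
same_object = (a == a)       # an object is equal to itself
equal_instances = (a == b)   # two separate instances are not equal

# Changing the state of one reader leaves the other untouched.
a.frame = 5
b_frame = b.frame

# Checking the type is what the test most likely wants.
type_ok = isinstance(b, DummyReader)
```

In a test, an assertion on isinstance(universe.trajectory, PrimitivePDBReader) would express the intended check without relying on equality between two reader instances.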

jdetle · 2016-04-02T20:02:44Z

@richardjgowers maybe I misunderstood, but I went ahead and tried to start testing the parsers, however I ran into an issue that I don't know how to get around. ~~all the modules are cythonized~~

from MDAnalysis.topology import PDBParser, PrimitivePDBParser
......
BioPParser = PDBParser(filename)

throws the exception "Module not callable". How do I go about testing the .py versions of the modules? Does this require changing how my packages are linked, or something far simpler?

dotsdl · 2016-04-02T20:23:13Z

@jdetle none of the parsers are written in Cython; that's not the issue. The problem here is that, if you have a look at the source tree, PDBParser and PrimitivePDBParser are modules, which is what the exception tells you. You will need to do something like:

from MDAnalysis.topology.PDBParser import PDBParser

to make this work.
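
The same pitfall can be reproduced with the standard library, where the datetime module contains a class that is also named datetime. This is only an analogy to the PDBParser module/class pair, sketched here so it runs without MDAnalysis installed; the alias datetime_cls is a name chosen for the example.

```python
import datetime  # a module that contains a class of the same name

# Calling the module itself fails, just like calling the PDBParser module.
try:
    datetime(2016, 4, 2)
except TypeError as err:
    message = str(err)  # mentions that a 'module' object is not callable

# Importing the class from inside the module gives something callable.
from datetime import datetime as datetime_cls

stamp = datetime_cls(2016, 4, 2)
```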

jdetle · 2016-04-02T20:34:42Z

@dotsdl ah okay! Thank you.

jdetle · 2016-04-02T21:44:16Z

@dotsdl Did you write the test_topology file? It seems like you have already done most of what @richardjgowers suggested I do with the _TestTopology and TestPDB classes in test_topology.py. I wrote a kind of shoddy script for inspecting failures with this:

class testParsers(object):
    def test_PrimitiveAndBioPython(self):
        files = [PDB_NAMD,
                 PDB_small,
                 PDB_closed,
                 PDB_multiframe,
                 PDB_helix,
                 PDB_conect,
                 PDB_full,
                 PDB_HOLE,
                 unordered_res]

        for filename in files:
            print(filename)
            ParsePrim = PrimitivePDBParser(filename).parse()
            ParseBIO = PDBParser(filename).parse()

            if filename in (PDB_multiframe, PDB_conect, PDB_full):
                # the two parsers disagree on these files; skip the comparison
                print('skip')
            else:
                assert_equal(ParsePrim['atoms'] == ParseBIO['atoms'], True, "Atoms not equal")

My gut tells me that I should separate this out into a test for each filename, so that there would be three failures and the rest would pass. I can do that, but since we are removing the BioPython parser anyway it seems moot. The parsed atom arrays are not equal for PDB_multiframe, PDB_conect and PDB_full. Given that the PrimitiveParser is tested extensively in test_topology, I think it's safe to say that these are instances in which the strict parser fails, and they further reinforce why we are getting rid of it. If this conclusion seems valid, I'll go ahead and delete the BioPython parser where it comes up and make a pull request.

richardjgowers · 2016-04-02T21:51:56Z

@jdetle so what you've described is called a test generator. An example of using them is here:

https://github.com/MDAnalysis/mdanalysis/blob/develop/testsuite/MDAnalysisTests/test_util.py#L643

Where the for loop goes over the list called formats and yields many tests

So a simpler example is...

class TestAddition(object):  # can't use TestCase!
    def _check_addition(self, a, b, ref):
        assert_equal(a+b, ref)

    def test_addition(self):
        for x, y, z in ((1, 2, 3), (7, 10, 1), (3, 4, 7)):
            yield self._check_addition, x, y, z

So the yield statement is creating tests: 3 tests will get created, with the 2nd test failing but the 3rd still running.
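
To see why a failure in the 2nd case does not stop the 3rd, here is a rough sketch of what a test runner does with such a generator. The run_generated function is a hypothetical, heavily simplified runner written for this illustration; it is not how any real test framework is implemented.

```python
def check_addition(a, b, ref):
    # the actual check; raises AssertionError when the sum is wrong
    assert a + b == ref


def generate_addition_tests():
    # each yielded tuple is (check function, *arguments), one per test case
    for x, y, z in ((1, 2, 3), (7, 10, 1), (3, 4, 7)):
        yield check_addition, x, y, z


def run_generated(generator):
    """Hypothetical minimal runner: run every yielded case independently."""
    results = []
    for case in generator:
        func, args = case[0], case[1:]
        try:
            func(*args)
            results.append("pass")
        except AssertionError:
            # record the failure and carry on with the remaining cases
            results.append("fail")
    return results


results = run_generated(generate_addition_tests())
```

Because every yielded case is called on its own, each one is reported separately, which is the behaviour wanted for the per-file parser comparison.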

jdetle · 2016-04-02T21:57:42Z

@richardjgowers Awesome, I will certainly use that for a problem in the future. Big question: do we still need the BioPython trajectory writer? If not, by my understanding we could get rid of the permissive flag entirely, but the pull request would be somewhat more substantial.

richardjgowers · 2016-04-02T22:03:17Z

Yeah I think we're trying to get rid of permissive variants of everything. But it's sometimes a lot easier for 1 issue to have 3 PRs if you can split it up nicely.

jdetle · 2016-04-02T22:07:03Z

Sorry, I think I'm having a jargon issue. My understanding is that BioPython.PDBReader is a strict reader, in that it is not 'permissive' of weird pdb formats, so we are getting rid of the permissive=False variants of everything. Oh, I think I also forgot that there are more than just PDB readers, so I'm no longer sure whether we could get rid of the permissive flag.

richardjgowers · 2016-04-02T22:11:01Z

Yeah, I get them confused too. We're killing the bio versions, whatever they're called.

jdetle · 2016-04-02T22:14:13Z

Yeah, okay, so now I am fairly certain that we could get rid of the 'permissive_pdb_reader' flag in 'MDAnalysis/core/__init__.py'.

kain88-de · 2016-04-03T09:00:51Z

Yes we can get rid of that

- 'permissive'=False has no effect anymore
- Added deprecation warnings for Primitive Readers/Writers and Parsers.
- Changed doc strings to eliminate references to BioPython Reader/Writer.
- Updated CHANGELOG to reflect changes.

- Changed tests to eliminate known failures caused by BioPython
- Updated CHANGELOGs and fixed wrong version number in AtomGroup.py
- fixed indentation issue in PDB.py
- Fixed doc references to version strings and the permissive flags
- got rid of extraneous text in PrimitivePDBParser, fixed scope of warnings
- Used boolean property of collections as suggested by QC
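
The commit notes mention deprecation warnings for the Primitive readers, writers and parsers. One common way to do this is sketched below. It assumes the Primitive class is kept as a thin alias of a replacement class; the class bodies and the warning text are illustrative assumptions, not the code from the pull request.

```python
import warnings


class PDBReader(object):
    """Hypothetical stand-in for the reader that remains after the cleanup."""

    def __init__(self, filename):
        self.filename = filename


class PrimitivePDBReader(PDBReader):
    """Illustrative deprecated alias so that old code keeps working."""

    def __init__(self, filename):
        warnings.warn(
            "PrimitivePDBReader is deprecated; use PDBReader instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this line
        )
        super(PrimitivePDBReader, self).__init__(filename)


# Capture the warning to show that old code still works but is flagged.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    reader = PrimitivePDBReader("adk_open.pdb")
```

With this pattern, existing callers get a working object plus a warning, which matches the deprecation period suggested earlier in the thread.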

remove Bio.PDBParser (Issue #777)

orbeckst · 2016-04-24T17:41:21Z

@jdetle well done, that was a sizable chunk of work. I look forward to your GSoC contribution!

jdetle · 2016-04-24T19:19:32Z

@orbeckst Thanks!

kain88-de added Format-PDB Component-Readers Difficulty-easy labels Mar 17, 2016

kain88-de added this to the 0.15.0 milestone Mar 17, 2016

mnmelo mentioned this issue Mar 23, 2016

setuptools setup_requires installs numpy/cython in unwanted locations #798

Closed

orbeckst mentioned this issue Mar 25, 2016

setup.py now cleanly handles setup-time cython dependency (closes #768) #799

Closed

orbeckst changed the title ~~Remove the BioPython PDBParser~~ Remove the BioPython PDBParser for topology and trajectory readers Mar 31, 2016

jdetle mentioned this issue Apr 2, 2016

updated to return Permissive PDB Reader and Parser for all PDBs #812

Closed

4 tasks

orbeckst assigned jdetle Apr 16, 2016

orbeckst mentioned this issue Apr 22, 2016

remove Bio.PDBParser (Issue #777) #832

Merged

4 tasks

richardjgowers closed this as completed in #832 Apr 24, 2016

richardjgowers added a commit that referenced this issue Apr 24, 2016

Merge pull request #832 from MDAnalysis/issue-777-remove-BioPDB

2de1290

remove Bio.PDBParser (Issue #777)
