PQR Topology Parser does not respect insertion codes #1317

Closed
kaceyaurum opened this issue Apr 24, 2017 · 6 comments

kaceyaurum commented Apr 24, 2017

Expected behaviour

Should create a Universe object.

Actual behaviour

ValueError                                Traceback (most recent call last)
<ipython-input-92-bd129cc992c5> in <module>()
----> 1 mda.Universe('1A2C.pqr')

/nfs/homes/kreidy/.local/lib/python2.7/site-packages/MDAnalysis/core/universe.pyc in __init__(self, *args, **kwargs)
    246                     raise ValueError("Failed to construct topology from file {0}"
    247                                      " with parser {1} \n"
--> 248                                      "Error: {2}".format(self.filename, parser, err))
    249 
    250             # generate and populate Universe version of each class

ValueError: Failed to construct topology from file 1A2C.pqr with parser <class 'MDAnalysis.topology.PQRParser.PQRParser'> 
Error: invalid literal for long() with base 10: '36A'

Code to reproduce the behaviour

import MDAnalysis as mda
mda.Universe('1A2C.pqr')

PQR file was generated with pdb2pqr --whitespace --ff=charmm and is attached as

Current version of MDAnalysis:

0.16.0-dev0

orbeckst commented Apr 24, 2017

The PDB from 1A2C loads fine and it generates two residues with resnum 36. According to the PDB docs on ATOM:

Alphabet letters are commonly used for insertion code. The insertion code is used when two residues have the same numbering. The combination of residue numbering and insertion code defines the unique residue.

the residue with resnum 36A has to be treated as a normal residue between resnum 36 and resnum 37.

Thus, the PDBParser works but the PQR one does not.

@richardjgowers

So I had a quick go at this and it got a little problematic. The attached file has different column alignment to the existing PQR file in the test files. The PQR in the tests is from PDB2PQR v1.5, whereas the new one is v2.1.1, so I guess we've been bitten by that.

Can somebody link to a guide on what this format should/could be? I couldn't really find a good definition myself.

@orbeckst

The main thing is that the columns are white-space separated. But there isn't a good definition... our docs and the APBS/PDB2PQR ones still say pretty much the same thing:

  • http://www.mdanalysis.org/mdanalysis/documentation_pages/coordinates/PQR.html

  • https://www.poissonboltzmann.org/docs/file-format-info/#pqr :

    PQR

    This format is a modification of the PDB format which allows users to add charge and radius parameters to existing PDB data while keeping it in a format amenable to visualization with standard molecular graphics programs. The origins of the PQR format are somewhat uncertain, but has been used by several computational biology software programs, including MEAD and AutoDock. UHBD uses a very similar format called QCD.

    APBS reads very loosely-formatted PQR files: all fields are whitespace-delimited rather than the strict column formatting mandated by the PDB format. This more liberal formatting allows coordinates which are larger/smaller than ± 999 Å.

    APBS reads data on a per-line basis from PQR files using the following format:

    Field_name Atom_number Atom_name Residue_name Chain_ID Residue_number X Y Z Charge Radius

    where the whitespace is the most important feature of this format. The fields are:

    Field_name A string which specifies the type of PQR entry and should either be ATOM or HETATM in order to be parsed by APBS.

    Atom_number An integer which provides the atom index.

    Atom_name A string which provides the atom name.

    Residue_name A string which provides the residue name.

    Chain_ID An optional string which provides the chain ID of the atom. Note chain ID support is a new feature of APBS 0.5.0 and later versions.

    Residue_number An integer which provides the residue index.

    X Y Z 3 floats which provide the atomic coordinates.

    Charge A float which provides the atomic charge (in electrons).

    Radius A float which provides the atomic radius (in Å).

    Clearly, this format can deviate wildly from PDB due to the use of whitespaces rather than specific column widths and alignments. This deviation can be particularly significant when large coordinate values are used. However, in order to maintain compatibility with most molecular graphics programs, the PDB2PQR program and the utilities provided with APBS (see the Parameterization section) attempt to preserve the PDB format as much as possible.

This is pretty much what we have in our docs.

Our docs linked to http://www.poissonboltzmann.org/file-formats/biomolecular-structurw/pqr but that is now a 404, so we should update the broken PQR link.
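
As a rough illustration of the whitespace-delimited layout quoted above, here is a minimal Python sketch. The function name `parse_pqr_line` and the sample line are hypothetical, and this is not the MDAnalysis parser. It assumes the optional Chain_ID is detected purely from the field count (11 fields with a chain ID, 10 without), and it keeps the residue field as a string because, as this issue shows, it may carry an insertion code.

```python
def parse_pqr_line(line):
    """Parse one whitespace-delimited PQR record into a dict.

    Hypothetical helper for illustration only. Returns None for
    lines that are not ATOM or HETATM records.
    """
    fields = line.split()
    if not fields or fields[0] not in ("ATOM", "HETATM"):
        return None

    if len(fields) == 11:
        # Chain_ID is present.
        (record, serial, name, resname, chain,
         residue, x, y, z, charge, radius) = fields
    elif len(fields) == 10:
        # Chain_ID is omitted.
        chain = ""
        (record, serial, name, resname,
         residue, x, y, z, charge, radius) = fields
    else:
        raise ValueError("unexpected number of fields: {!r}".format(line))

    return {
        "record": record,
        "serial": int(serial),
        "name": name,
        "resname": resname,
        "chain": chain,
        # Left as a string on purpose: it may look like '36A'.
        "residue": residue,
        "position": (float(x), float(y), float(z)),
        "charge": float(charge),
        "radius": float(radius),
    }


# Made-up record, not taken from the attached file.
atom = parse_pqr_line("ATOM 1 N ILE A 36A 1.0 2.0 3.0 -0.3 1.85")
print(atom["chain"], atom["residue"], atom["charge"])
```

Deciding on the chain ID by field count alone is a simplification; a file whose columns run together without whitespace would need more careful handling.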

orbeckst commented Apr 26, 2017

The main problem appears to be that the format spec states for Residue_number

Residue_number An integer which provides the residue index

that it ought to be an integer. However, pdb2pqr will apparently happily process residue number + insertion code as a unit. That makes sense, because according to the PDB standard, this combination identifies a residue.

Our PDB parser will assign the same resnum to two residues with the same residue number but different insertion codes (I am not even sure if we store insertion codes...?). The residues get different resids, though, which makes them distinguishable in MDAnalysis. At a minimum we would need similar behavior for the PQR parser.
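
To make the requirement concrete: atoms that share a residue number but differ in insertion code should end up in different residues. The sketch below is one simple way to do that, assuming each atom comes with a hypothetical `(resnum, icode)` tuple. It is not how MDAnalysis builds its topology, and it ignores chains and segments for brevity.

```python
def assign_residue_indices(atom_residue_keys):
    """Give each atom a 0-based residue index.

    Hypothetical helper: unique (resnum, icode) keys are numbered
    in order of first appearance, so (36, '') and (36, 'A') map to
    different residues even though the residue number is the same.
    """
    index_of = {}
    indices = []
    for key in atom_residue_keys:
        if key not in index_of:
            index_of[key] = len(index_of)
        indices.append(index_of[key])
    return indices


# Two atoms in residue 36, one in 36A, one in 37 (made-up data).
keys = [(36, ""), (36, ""), (36, "A"), (37, "")]
print(assign_residue_indices(keys))
```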

@richardjgowers

@orbeckst ok thanks, looks like we just need to check if a resid has an icode appended and split it off if so. We can make PQR respect icodes fairly easily.
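
The check described here could look roughly like the following sketch. The helper name `split_resid_icode` and the regular expression are assumptions made for illustration, not the code merged for this issue. It assumes the insertion code is at most one letter directly appended to an integer residue number.

```python
import re

# Hypothetical pattern: an integer, optionally followed by one letter.
_RESID_RE = re.compile(r"^(-?\d+)([A-Za-z]?)$")


def split_resid_icode(field):
    """Split a residue field such as '36A' into (36, 'A').

    A plain number such as '36' yields (36, ''). Anything else
    raises ValueError rather than being silently guessed at.
    """
    match = _RESID_RE.match(field.strip())
    if match is None:
        raise ValueError("unrecognised residue field: {!r}".format(field))
    number, icode = match.groups()
    return int(number), icode


print(split_resid_icode("36"))
print(split_resid_icode("36A"))
```

With this in place, the `'36A'` field that triggered the `invalid literal` error in the traceback would be converted to an integer residue number plus a separate insertion code.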

@richardjgowers richardjgowers self-assigned this Apr 27, 2017
@richardjgowers richardjgowers added this to the 0.16.x milestone Apr 27, 2017
richardjgowers added a commit that referenced this issue May 1, 2017
richardjgowers added a commit that referenced this issue May 1, 2017
orbeckst pushed a commit that referenced this issue May 1, 2017
* PQRParser now reads icodes
* Fixes Issue #1317
* Updated PQR docs

orbeckst commented May 1, 2017

Closed in PR #1328

@orbeckst orbeckst closed this as completed May 1, 2017