PQR Topology Parser does not respect insertion codes #1317

kaceyaurum · 2017-04-24T22:31:51Z

Expected behaviour

Should create a Universe object.

Actual behaviour

ValueError                                Traceback (most recent call last)
<ipython-input-92-bd129cc992c5> in <module>()
----> 1 mda.Universe('1A2C.pqr')

/nfs/homes/kreidy/.local/lib/python2.7/site-packages/MDAnalysis/core/universe.pyc in __init__(self, *args, **kwargs)
    246                     raise ValueError("Failed to construct topology from file {0}"
    247                                      " with parser {1} \n"
--> 248                                      "Error: {2}".format(self.filename, parser, err))
    249 
    250             # generate and populate Universe version of each class

ValueError: Failed to construct topology from file 1A2C.pqr with parser <class 'MDAnalysis.topology.PQRParser.PQRParser'> 
Error: invalid literal for long() with base 10: '36A'

Code to reproduce the behaviour

import MDAnalysis as mda
mda.Universe('1A2C.pqr')

PQR file was generated with pdb2pqr --whitespace --ff=charmm and is attached as

1A2C.pqr.zip

Current version of MDAnalysis:

0.16.0-dev0


orbeckst · 2017-04-24T22:36:26Z

The PDB from 1A2C loads fine and it generates two residues with resnum 36. According to the PDB docs on ATOM

Alphabet letters are commonly used for insertion code. The insertion code is used when two residues have the same numbering. The combination of residue numbering and insertion code defines the unique residue.

the residue with resnum 36A has to be treated as a normal residue between resnum 36 and resnum 37.

Thus, the PDBParser works but the PQR one does not.
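As a rough illustration of the point above (not MDAnalysis code; the residue list and the tuple layout are made up for this sketch), treating residue number and insertion code as a pair keeps 36A distinct from 36 and orders it before 37, whereas a plain integer conversion of '36A' fails just like in the traceback:

```python
# Hypothetical residue identifiers as (resnum, icode) pairs; an empty
# string stands for "no insertion code".
residues = [(37, ''), (36, 'A'), (36, '')]

# Sorting on the pair places 36A between 36 and 37.
ordered = sorted(residues)

# Converting the combined field directly to an integer fails, which is
# the kind of error the PQR parser ran into.
try:
    int('36A')
    conversion_failed = False
except ValueError:
    conversion_failed = True
```

Here `ordered` ends up as `[(36, ''), (36, 'A'), (37, '')]`.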

richardjgowers · 2017-04-26T16:58:48Z

So I had a quick go at this and it got a little problematic. The attached file has different column alignment to the existing PQR file in the test files. The PQR in the tests is from PDB2PQR v1.5, whereas the new one is v2.1.1, so I guess we've been bitten by that.

Can somebody link to a guide on what this format should/could be? I couldn't really find a good definition myself.

orbeckst · 2017-04-26T17:19:36Z

The main thing is that the columns are white-space separated. But there isn't a good definition... our docs and the APBS/PDB2PQR ones still say pretty much the same thing:

http://www.mdanalysis.org/mdanalysis/documentation_pages/coordinates/PQR.html
https://www.poissonboltzmann.org/docs/file-format-info/#pqr :

PQR

This format is a modification of the PDB format which allows users to add charge and radius parameters to existing PDB data while keeping it in a format amenable to visualization with standard molecular graphics programs. The origins of the PQR format are somewhat uncertain, but has been used by several computational biology software programs, including MEAD and AutoDock. UHBD uses a very similar format called QCD.

APBS reads very loosely-formatted PQR files: all fields are whitespace-delimited rather than the strict column formatting mandated by the PDB format. This more liberal formatting allows coordinates which are larger/smaller than ± 999 Å.

APBS reads data on a per-line basis from PQR files using the following format:

Field_name Atom_number Atom_name Residue_name Chain_ID Residue_number X Y Z Charge Radius

where the whitespace is the most important feature of this format. The fields are:

Field_name A string which specifies the type of PQR entry and should either be ATOM or HETATM in order to be parsed by APBS.

Atom_number An integer which provides the atom index.

Atom_name A string which provides the atom name.

Residue_name A string which provides the residue name.

Chain_ID An optional string which provides the chain ID of the atom. Note chain ID support is a new feature of APBS 0.5.0 and later versions.

Residue_number An integer which provides the residue index.

X Y Z 3 floats which provide the atomic coordinates.

Charge A float which provides the atomic charge (in electrons).

Radius A float which provides the atomic radius (in Å).

Clearly, this format can deviate wildly from PDB due to the use of whitespaces rather than specific column widths and alignments. This deviation can be particularly significant when large coordinate values are used. However, in order to maintain compatibility with most molecular graphics programs, the PDB2PQR program and the utilities provided with APBS (see the Parameterization section) attempt to preserve the PDB format as much as possible.

This is pretty much what we have in our docs.
In our docs we linked to http://www.poissonboltzmann.org/file-formats/biomolecular-structurw/pqr but that is now a 404... we should update the broken PQR link in our docs.
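To make the whitespace-delimited layout quoted above concrete, here is a minimal sketch of reading one line. It is not the MDAnalysis parser: the function name `parse_pqr_line`, the dictionary keys, and the sample line are assumptions for illustration, and it assumes the optional Chain_ID is present exactly when the line has 11 fields. The residue number is kept as a string because, as this issue shows, it may carry an insertion code.

```python
def parse_pqr_line(line):
    """Parse one whitespace-delimited PQR line (illustrative sketch only).

    Returns a dict for ATOM/HETATM lines and None for anything else.
    """
    fields = line.split()
    if not fields or fields[0] not in ('ATOM', 'HETATM'):
        return None
    if len(fields) == 11:
        # Chain_ID present
        (record, serial, name, resname, chain,
         resnum, x, y, z, charge, radius) = fields
    elif len(fields) == 10:
        # Chain_ID omitted
        (record, serial, name, resname,
         resnum, x, y, z, charge, radius) = fields
        chain = ''
    else:
        raise ValueError("unexpected number of fields: {0}".format(len(fields)))
    return {
        'record': record,
        'serial': int(serial),
        'name': name,
        'resname': resname,
        'chain': chain,
        'resnum': resnum,  # left as text; may include an insertion code
        'position': (float(x), float(y), float(z)),
        'charge': float(charge),
        'radius': float(radius),
    }


# Made-up example line, not taken from 1A2C.pqr
atom = parse_pqr_line("ATOM 1 N MET A 36A 26.8 24.4 1.9 -0.3 1.85")
```

Splitting on whitespace is what makes coordinates beyond ± 999 Å workable, but it also means the Chain_ID can only be detected by counting fields, which is a simplification in this sketch.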

orbeckst · 2017-04-26T17:25:00Z

The main problem appears to be that the format spec states for Residue_number

Residue_number An integer which provides the residue index

that it ought to be an integer. However, pdb2pqr will apparently happily process residue number + insertion code as a unit. That makes sense, because according to PDB standard, this combination identifies a residue.

Our PDB parser will assign the same resnum to two residues with the same residue number but different insertion codes (I am not even sure if we store insertion codes...?). The residues get different resids, though, which makes them distinguishable in MDAnalysis. At a minimum we would need similar behavior for the PQR parser.

richardjgowers · 2017-04-27T09:19:24Z

@orbeckst ok thanks, it looks like we just need to check if a resid has an icode appended and split it off if so. We can make PQR respect icodes fairly easily.
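A minimal sketch of that check, assuming the insertion code is a single trailing letter on the residue field. The helper name `split_resid_icode` is hypothetical and this is not necessarily how the change in the PQRParser is written:

```python
def split_resid_icode(token):
    """Split a residue field such as '36A' into (36, 'A').

    Fields without a trailing letter return an empty insertion code.
    """
    if token and token[-1].isalpha():
        return int(token[:-1]), token[-1]
    return int(token), ''


with_icode = split_resid_icode('36A')
without_icode = split_resid_icode('37')
```

With this, `with_icode` is `(36, 'A')` and `without_icode` is `(37, '')`, so the integer part can still be used as the resid while the letter is stored separately.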

Fixes Issue #1317

* PQRParser now reads icodes * Fixes Issue #1317 * Updated PQR docs

orbeckst · 2017-05-01T20:06:43Z

Closed in PR #1328

orbeckst added Component-Topology Format-PQR defect labels Apr 24, 2017

richardjgowers self-assigned this Apr 27, 2017

richardjgowers added this to the 0.16.x milestone Apr 27, 2017

richardjgowers added a commit that referenced this issue May 1, 2017

PQRParser now reads icodes

657451b

Fixes Issue #1317

richardjgowers mentioned this issue May 1, 2017

PQRParser now reads icodes #1328

Merged

4 tasks

richardjgowers added a commit that referenced this issue May 1, 2017

PQRParser now reads icodes

809832f

Fixes Issue #1317

orbeckst pushed a commit that referenced this issue May 1, 2017

PQRParser now reads icodes (#1328)

ac27e28

* PQRParser now reads icodes * Fixes Issue #1317 * Updated PQR docs

orbeckst closed this as completed May 1, 2017
