[WIP] first draft of PDBx Reader #4303

Open. Wants to merge 7 commits into base: develop
75 changes: 75 additions & 0 deletions package/MDAnalysis/coordinates/PDBx.py
@@ -0,0 +1,75 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
# (see the file AUTHORS for the full list of names)
#
# Released under the GNU Public Licence, v2 or any higher version
#
# Please cite your use of MDAnalysis in published work:
#
# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
# doi: 10.25080/majora-629e541a-00e
#
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#

"""
PDBx (mmCIF) files in MDAnalysis --- :mod:`MDAnalysis.coordinates.PDBx`
=======================================================================

Reads coordinates from a PDBx_ (mmCIF) format file. The Universe positions are
populated from the ``_atom_site.Cartn_x``, ``_atom_site.Cartn_y`` and
``_atom_site.Cartn_z`` fields, and the unit cell dimensions from the ``_cell``
category.


.. _PDBx:
    https://pdb101.rcsb.org/learn/guide-to-understanding-pdb-data/beginner’s-guide-to-pdb-structures-and-the-pdbx-mmcif-format
"""
import gemmi
import numpy as np

from . import base


class PDBxReader(base.SingleFrameReaderBase):
    format = ['cif', 'pdbx']
    units = {'time': None, 'length': 'Angstrom'}

    def _read_first_frame(self):
        doc = gemmi.cif.read(self.filename)

        block = doc.sole_block()

        coords = block.find('_atom_site.', ['Cartn_x', 'Cartn_y', 'Cartn_z'])
        # the reader API exposes the atom count as ``n_atoms``
        self.n_atoms = len(coords)

        xyz = np.zeros((self.n_atoms, 3), dtype=np.float32)

        for i, (x, y, z) in enumerate(coords):
            xyz[i, :] = x, y, z

        ts = self.ts = base.Timestep.from_coordinates(xyz, **self._ts_kwargs)
        ts.frame = 0

        box = block.find('_cell.', ['length_a', 'length_b', 'length_c',
                                    'angle_alpha', 'angle_beta', 'angle_gamma'])
        if box:
            unitcell = np.zeros(6, dtype=np.float64)
            unitcell[:] = box[0]

            ts.dimensions = unitcell

        if self.convert_units:
            # in-place !
            self.convert_pos_from_native(self.ts._pos)
            if self.ts.dimensions is not None:
                self.convert_pos_from_native(self.ts.dimensions[:3])

        return ts
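Note on the fields consumed by `_read_first_frame`: the sketch below illustrates which mmCIF entries the reader pulls out, using only the standard library. It is not how the reader works internally, since gemmi does the actual parsing. `read_items_and_loops` and `find_columns` are hypothetical helpers built on a deliberately naive whitespace tokenizer (no quoting, no multi-line values), and the sample data is made up.

```python
# Made-up mmCIF fragment with a _cell category and an _atom_site loop.
MMCIF_SNIPPET = """\
data_demo
_cell.length_a 10.0
_cell.length_b 20.0
_cell.length_c 30.0
_cell.angle_alpha 90.0
_cell.angle_beta 90.0
_cell.angle_gamma 90.0
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N 1.000 2.000 3.000
ATOM 2 CA 4.500 5.500 6.500
"""

CELL_NAMES = ["length_a", "length_b", "length_c",
              "angle_alpha", "angle_beta", "angle_gamma"]


def read_items_and_loops(text):
    """Hypothetical helper: split text into single items and loop tables."""
    items = {}
    loops = []  # list of (tags, rows) pairs
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    i = 0
    while i < len(lines):
        line = lines[i]
        if line == "loop_":
            i += 1
            tags = []
            # tag names follow the loop_ keyword, one per line
            while i < len(lines) and lines[i].startswith("_"):
                tags.append(lines[i])
                i += 1
            rows = []
            # value rows run until the next tag, loop or data block
            while i < len(lines) and not lines[i].startswith(("_", "loop_", "data_")):
                rows.append(lines[i].split())
                i += 1
            loops.append((tags, rows))
        elif line.startswith("_"):
            tag, value = line.split(None, 1)
            items[tag] = value
            i += 1
        else:
            i += 1  # e.g. the data_ block header
    return items, loops


def find_columns(loops, prefix, names):
    """Hypothetical helper: return the requested columns of the first matching loop."""
    wanted = [prefix + name for name in names]
    for tags, rows in loops:
        if all(tag in tags for tag in wanted):
            idx = [tags.index(tag) for tag in wanted]
            return [[row[j] for j in idx] for row in rows]
    return []


items, loops = read_items_and_loops(MMCIF_SNIPPET)

# positions: one (x, y, z) triple per _atom_site row
coords = [[float(v) for v in row]
          for row in find_columns(loops, "_atom_site.", ["Cartn_x", "Cartn_y", "Cartn_z"])]

# unit cell: three lengths followed by three angles, or None if incomplete
cell_values = [items.get("_cell." + name) for name in CELL_NAMES]
unitcell = ([float(v) for v in cell_values]
            if all(v is not None for v in cell_values) else None)
```

Values arrive as strings, so both the sketch and the reader rely on a string-to-float conversion before the coordinates are usable.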
122 changes: 122 additions & 0 deletions package/MDAnalysis/topology/PDBxParser.py
@@ -0,0 +1,122 @@
# -*- Mode: python; tab-width: 4; indent-tabs-mode:nil; coding:utf-8 -*-
# vim: tabstop=4 expandtab shiftwidth=4 softtabstop=4
#
# MDAnalysis --- https://www.mdanalysis.org
# Copyright (c) 2006-2017 The MDAnalysis Development Team and contributors
# (see the file AUTHORS for the full list of names)
#
# Released under the GNU Public Licence, v2 or any higher version
#
# Please cite your use of MDAnalysis in published work:
#
# R. J. Gowers, M. Linke, J. Barnoud, T. J. E. Reddy, M. N. Melo, S. L. Seyler,
# D. L. Dotson, J. Domanski, S. Buchoux, I. M. Kenney, and O. Beckstein.
# MDAnalysis: A Python package for the rapid analysis of molecular dynamics
# simulations. In S. Benthall and S. Rostrup editors, Proceedings of the 15th
# Python in Science Conference, pages 102-109, Austin, TX, 2016. SciPy.
# doi: 10.25080/majora-629e541a-00e
#
# N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein.
# MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations.
# J. Comput. Chem. 32 (2011), 2319--2327, doi:10.1002/jcc.21787
#
"""
PDBx topology parser
====================


See Also
--------
:mod:`MDAnalysis.coordinates.PDBx`

"""
import gemmi
import numpy as np

from .base import TopologyReaderBase, change_squash
from ..core.topology import Topology
from ..core.topologyattrs import (
    Atomnames,
    Atomids,
    AltLocs,
    Elements,
    ICodes,
    RecordTypes,
    Resids,
    Resnames,
    Segids,
)


class PDBxParser(TopologyReaderBase):
    """Read a Topology from a PDBx file

    Creates the following attributes from these "_atom_site" PDBx loop entries:

    - "group_PDB"         RecordType
    - "id"                AtomId
    - "label_alt_id"      AltLoc
    - "type_symbol"       Element
    - "label_atom_id"     AtomName
    - "auth_seq_id"       Resid
    - "auth_comp_id"      Resname
    - "pdbx_PDB_ins_code" ICode
    - "auth_asym_id"      Segid (chain ID)
    """
    format = ['PDBx', 'cif']

    def parse(self, **kwargs) -> Topology:
        doc = gemmi.cif.read(self.filename)
        block = doc.sole_block()

        attrs = []

        def objarr(x):
            return np.array(x, dtype=object)

        # hierarchy correspondence:
        # seq_id -> residues
        # entity_id -> chains
        if recordtypes := block.find('_atom_site.group_PDB'):
            attrs.append(RecordTypes(recordtypes))
        ids = block.find_loop('_atom_site.id')
        n_atoms = len(ids)
        attrs.append(Atomids(ids))
        if altlocs := block.find_loop('_atom_site.label_alt_id'):
            altlocs = np.array(altlocs, dtype=object)
            # '.' is the mmCIF placeholder for "not applicable"
            altlocs[altlocs == '.'] = ''
            attrs.append(AltLocs(altlocs))
        if elements_loop := block.find_loop('_atom_site.type_symbol'):
            attrs.append(Elements(objarr(elements_loop)))
        if names_loop := block.find_loop('_atom_site.label_atom_id'):
            attrs.append(Atomnames(objarr(names_loop)))

        # sort out residues/segments
        # label_seq_id seems to not cover entire model unlike author versions
        resids = block.find_loop('_atom_site.auth_seq_id')
        resnames = block.find_loop('_atom_site.auth_comp_id')
        icodes = block.find_loop('_atom_site.pdbx_PDB_ins_code')
        chainids = block.find_loop('_atom_site.auth_asym_id')

        # collapse per-atom columns to per-residue, then per-segment, values
        residx, (resids, icodes, resnames, chainids) = change_squash(
            (resids, icodes), (resids, icodes, resnames, chainids)
        )
        segidx, (chainids,) = change_squash((chainids,), (chainids,))

        attrs.extend((
            Resids(resids),
            Resnames(objarr(resnames)),
            ICodes(objarr(icodes)),
            Segids(chainids),
        ))

        n_residues = len(resids)
        n_segments = len(chainids)

        return Topology(
            n_atoms=n_atoms,
            n_res=n_residues,
            n_seg=n_segments,
            attrs=attrs,
            atom_resindex=residx,
            residue_segindex=segidx,
        )
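Note on the residue and segment grouping in `parse`: the two `change_squash` calls first collapse per-atom columns into per-residue values, then collapse the per-residue chain IDs into per-segment values. The sketch below illustrates that idea in plain Python. `squash_on_change` is a hypothetical stand-in written for this note, not MDAnalysis's implementation, and the sample columns are made up.

```python
def squash_on_change(criteria, to_squash):
    """Hypothetical stand-in: start a new group whenever any criterion changes.

    Returns the group index of every row, plus each column in ``to_squash``
    reduced to the value found on the first row of each group.
    """
    n_rows = len(criteria[0])
    starts = []       # row index at which each group begins
    group_index = []  # group number for every row
    for i in range(n_rows):
        if i == 0 or any(col[i] != col[i - 1] for col in criteria):
            starts.append(i)
        group_index.append(len(starts) - 1)
    squashed = [[col[s] for s in starts] for col in to_squash]
    return group_index, squashed


# Made-up per-atom columns: two residues in chain A, one water in chain B.
resids = ["1", "1", "2", "2", "2", "1"]
icodes = ["?", "?", "?", "?", "?", "?"]
resnames = ["ALA", "ALA", "GLY", "GLY", "GLY", "HOH"]
chainids = ["A", "A", "A", "A", "A", "B"]

# atoms -> residues, keyed on (resid, icode) as in the parser
residx, (res_ids, res_icodes, res_names, res_chains) = squash_on_change(
    (resids, icodes), (resids, icodes, resnames, chainids)
)

# residues -> segments, keyed on the per-residue chain ID
segidx, (seg_ids,) = squash_on_change((res_chains,), (res_chains,))
```

In this sketch grouping is driven by changes between consecutive rows rather than by unique values, which is why the second occurrence of resid `"1"` forms its own residue instead of being merged with the first.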
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
.. automodule:: MDAnalysis.coordinates.PDBx
   :members: