MOL2 parser populates elements attribute #3063

Merged on Aug 13, 2021 (116 commits)

Commits (116; the file diffs below show changes from 109 of them):

7af3352
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Aug 7, 2019
51624f3
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Aug 9, 2019
fa7777e
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Aug 13, 2019
94ac6cc
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Aug 20, 2019
80370af
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Sep 5, 2019
36fa2ea
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Sep 10, 2019
5111103
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Oct 2, 2019
ad0c149
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Oct 3, 2019
def7c14
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Oct 16, 2019
2d693e3
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Oct 30, 2019
f7d5bc6
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Nov 21, 2019
64bcae5
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Nov 28, 2019
2e2561e
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Dec 23, 2019
2184e2b
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 14, 2020
de7afcc
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 16, 2020
ff4ed8d
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 17, 2020
c515e89
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 17, 2020
7d27138
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 18, 2020
3c35af0
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 21, 2020
22def6a
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 23, 2020
93413dc
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 24, 2020
8c5dc00
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 1, 2020
ad1766e
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 4, 2020
65e4d66
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 4, 2020
549c1a2
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 6, 2020
f527e62
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 9, 2020
4d5f7de
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 9, 2020
d7eda78
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 13, 2020
ccd77e9
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 15, 2020
7546c3e
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 16, 2020
af65b17
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 26, 2020
5944b83
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 27, 2020
640ce81
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Feb 29, 2020
4d639f9
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Mar 3, 2020
30effbd
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Mar 13, 2020
a2951ee
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Mar 15, 2020
51d1347
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Mar 19, 2020
8023811
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Apr 10, 2020
591bf39
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Apr 13, 2020
2e248fb
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Apr 24, 2020
9009c82
Merge remote-tracking branch 'upstream/develop' into develop
RMeli May 15, 2020
a0b2a79
Merge remote-tracking branch 'upstream/develop' into develop
RMeli May 19, 2020
833aff3
Merge remote-tracking branch 'upstream/develop' into develop
RMeli May 19, 2020
d263d1f
Merge remote-tracking branch 'upstream/develop' into develop
RMeli May 25, 2020
a6a1976
Merge remote-tracking branch 'upstream/develop' into develop
RMeli May 29, 2020
888b1f9
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 3, 2020
13d6c5e
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 5, 2020
9581ee4
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 5, 2020
d1ff4c4
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 6, 2020
dbe7750
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 8, 2020
c454e4f
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 8, 2020
2002570
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 9, 2020
b11c517
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 10, 2020
264b564
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 10, 2020
a5d7d68
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 11, 2020
d7902cf
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 21, 2020
fe8420e
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jul 8, 2020
7330ca3
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Aug 11, 2020
da33ac6
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Aug 21, 2020
698b2b1
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Sep 2, 2020
9192604
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Sep 2, 2020
ca745a3
Merge branch 'develop' into upstream/develop
RMeli Sep 2, 2020
206b547
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Sep 2, 2020
58e79d2
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Sep 19, 2020
54c2af9
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Oct 7, 2020
488484c
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Oct 7, 2020
0359fec
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Oct 13, 2020
751d441
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Nov 9, 2020
1ec8b20
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Dec 3, 2020
3f8c5a6
Adds basic GH Actions CI workflow (#3040)
IAlibay Dec 2, 2020
fac470c
write failing test
RMeli Dec 8, 2020
7109ebe
revert coordinates
RMeli Dec 8, 2020
bbbd6f4
add correct failing test
RMeli Dec 8, 2020
b44ca84
populate elements from MOL2
RMeli Dec 8, 2020
2f08e60
pep8
RMeli Dec 8, 2020
3e60ebe
changelog
RMeli Dec 8, 2020
2661c9b
validate elements and test
RMeli Dec 8, 2020
660750c
fix pep8
RMeli Dec 8, 2020
cca4ac6
Try to fix git mess
RMeli Dec 8, 2020
d87f651
try to fix git mess
RMeli Dec 8, 2020
c354c12
one warning per invalid element
RMeli Dec 8, 2020
815a399
print only a single warning
RMeli Dec 8, 2020
e64fdd9
print set directly
RMeli Dec 8, 2020
7717963
do not capitalize elements
RMeli Dec 12, 2020
2dafafb
Merge remote-tracking branch 'upstream/develop' into mol2-elements
RMeli Dec 12, 2020
db032a1
Merge remote-tracking branch 'upstream/develop' into mol2-elements
RMeli Dec 12, 2020
69585b3
uncomment tests for now merged PRs
RMeli Dec 12, 2020
df10a1b
prepare fix for rdkit test from PR #3069
RMeli Dec 12, 2020
c57697e
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jan 13, 2021
d7bbd32
Merge branch 'MDAnalysis:develop' into develop
RMeli May 6, 2021
b6bcd3c
Merge remote-tracking branch 'upstream/develop' into develop
RMeli May 7, 2021
a4e9f15
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Jun 17, 2021
ec71a6d
Merge remote-tracking branch 'upstream/develop' into develop
RMeli Aug 3, 2021
cdb1fdd
Merge branch 'develop' into mol2-elements
RMeli Aug 3, 2021
41a8025
delete topology attribute for RDKit test
RMeli Aug 3, 2021
fa96cbc
add dictionary of SYBYL atom types
RMeli Aug 3, 2021
899717c
check types are SYBYL atom sypes
RMeli Aug 3, 2021
be39fe1
conform SYBYL2SYMB and test mol2 files to standard
RMeli Aug 3, 2021
0bad3f6
support alternative S types
RMeli Aug 4, 2021
a8f70f3
fix typo
RMeli Aug 4, 2021
7c600f8
cleanup old code
RMeli Aug 4, 2021
b69877f
remove carbon dummy atom as valid element
RMeli Aug 5, 2021
3161a7c
fix docstring
RMeli Aug 5, 2021
c2ca5aa
test with all supported elements and a few unsupported ones
RMeli Aug 5, 2021
d06f110
changelog enhancement instead of fix
RMeli Aug 5, 2021
ea0bee8
revert changes on test files to non-standard ones
RMeli Aug 5, 2021
ee3a91c
add comment on dictionary
RMeli Aug 5, 2021
ad50c49
add elements to list of attributes
RMeli Aug 5, 2021
cde5269
add comment
RMeli Aug 5, 2021
7411a5b
Update package/MDAnalysis/topology/MOL2Parser.py
RMeli Aug 12, 2021
bca37da
link to internet archive
RMeli Aug 12, 2021
59dce63
pep8 import
RMeli Aug 12, 2021
1f39cdc
move versionchanged
RMeli Aug 12, 2021
0e95bf3
fix problem where element attibute was added also with all invalid el…
RMeli Aug 12, 2021
a170c44
switch try/except for if/else
RMeli Aug 12, 2021
500ccef
Update package/CHANGELOG
RMeli Aug 13, 2021

package/CHANGELOG: 1 change (1 addition, 0 deletions)

@@ -117,6 +117,7 @@ Fixes
   * new `Results` class now can be pickled/unpickled. (PR #3309)

 Enhancements
+  * MOL2 parser populates elements attribute from SYBYL atom types (Issue #3062)
   * Added guessers for aromaticity and Gasteiger partial charges (Issue #2468,
     PR #2926)
   * Added `Results` class for storing analysis results (#3115, PR #3233)

package/MDAnalysis/topology/MOL2Parser.py: 35 changes (33 additions, 2 deletions)

@@ -53,13 +53,17 @@
     Atomtypes,
     Bonds,
     Charges,
+    Elements,
     Masses,
     Resids,
     Resnums,
     Resnames,
     Segids,
 )
 from ..core.topology import Topology
+from .tables import SYBYL2SYMB
+
+import warnings


 class MOL2Parser(TopologyReaderBase):
@@ -70,13 +74,18 @@ class MOL2Parser(TopologyReaderBase):
      - Atomnames
      - Atomtypes
      - Charges
-     - Resids,
+     - Resids
      - Resnames
      - Bonds
+     - Elements

     Guesses the following:
      - masses

+    Notes
+    -----
+    Elements are obtained directly from the SYBYL atom types.
+
     .. versionchanged:: 0.9
        Now subclasses TopologyReaderBase
     .. versionchanged:: 0.20.0
@@ -93,6 +102,10 @@ def parse(self, **kwargs):
         Returns
         -------
         A MDAnalysis Topology object
+
+
+        .. versionchanged:: 2.0.0
+           Parse elements from atom types.
         """
         blocks = []

@@ -148,7 +161,22 @@ def parse(self, **kwargs):

         n_atoms = len(ids)

-        masses = guessers.guess_masses(types)
+        validated_elements = []
+        invalid_elements = set()
+        for at in types:
+            try:
+                validated_elements.append(SYBYL2SYMB[at])
+            except KeyError:
+                invalid_elements.add(at)
+                validated_elements.append('')
+
+        # Print single warning for all unknown elements, if any
+        if invalid_elements:
+            warnings.warn("Unknown elements found for some "
+                          f"atoms: {invalid_elements}. "
+                          "These have been given an empty element record.")
+
+        masses = guessers.guess_masses(validated_elements)

         attrs = []
         attrs.append(Atomids(np.array(ids, dtype=np.int32)))
@@ -157,6 +185,9 @@ def parse(self, **kwargs):
         attrs.append(Charges(np.array(charges, dtype=np.float32)))
         attrs.append(Masses(masses, guessed=True))

+        if not np.all(np.array(validated_elements) == ''):
+            attrs.append(Elements(np.array(validated_elements, dtype="U3")))
+
         resids = np.array(resids, dtype=np.int32)
         resnames = np.array(resnames, dtype=object)

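The element-assignment logic in the `parse()` hunk above can be sketched in isolation. This is a minimal, standalone illustration rather than the MDAnalysis code: `assign_elements` and the three-entry `SYBYL_SUBSET` table are hypothetical stand-ins for the parser and the full `SYBYL2SYMB` dictionary. As in the diff, unknown types receive an empty string and are reported in one aggregated `UserWarning`. The final check uses `any()` because comparing a plain Python list with `''` yields a single scalar `False` rather than an elementwise result, so `np.all(some_list == '')` is always `False`.

```python
import warnings

# Hypothetical subset of the SYBYL -> element table, for illustration only.
SYBYL_SUBSET = {"N.am": "N", "S.o2": "S", "O.2": "O"}


def assign_elements(atom_types, table=SYBYL_SUBSET):
    """Map SYBYL atom types to element symbols.

    Unknown types get an empty string; a single warning lists all of them.
    Returns the list of elements and a flag telling whether any type was known.
    """
    elements = []
    unknown = set()
    for atom_type in atom_types:
        if atom_type in table:
            elements.append(table[atom_type])
        else:
            unknown.add(atom_type)
            elements.append("")

    # One warning for the whole file rather than one per atom.
    if unknown:
        warnings.warn(f"Unknown elements found for some atoms: {sorted(unknown)}. "
                      "These have been given an empty element record.")

    # `elements == ""` would compare the whole list to a string and always be
    # False; any() checks whether at least one element string is non-empty.
    has_elements = any(elements)
    return elements, has_elements


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        elements, has_elements = assign_elements(["N.am", "X.o2", "XX.am"])
    print(elements, has_elements, len(caught))
    # ['N', '', ''] True 1
```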

package/MDAnalysis/topology/tables.py: 34 changes (34 additions, 0 deletions)

@@ -394,3 +394,37 @@ def kv2dict(s, convertor=str):
           113: 'Nh', 114: 'Fl', 115: 'Mc', 116: 'Lv', 117: 'Ts', 118: 'Og'}

 SYMB2Z = {v:k for k, v in Z2SYMB.items()}
+
+# Conversion between SYBYL atom types and corresponding elements
+# Tripos MOL2 file format: https://zhanglab.ccmb.med.umich.edu/DockRMSD/mol2.pdf
+SYBYL2SYMB = {
+    "H": "H", "H.spc": "H", "H.t3p": "H",
+    "C.3": "C", "C.2": "C", "C.1": "C", "C.ar": "C", "C.cat": "C",
+    "N.3": "N", "N.2": "N", "N.1": "N", "N.ar": "N",
+    "N.am": "N", "N.pl3": "N", "N.4": "N",
+    "O.3": "O", "O.2": "O", "O.co2": "O", "O.spc": "O", "O.t3p": "O",
+    "S.3": "S", "S.2": "S", "S.O": "S", "S.O2": "S",
+    "S.o": "S", "S.o2": "S",  # Non-standard but often found in the wild...
+    "P.3": "P",
+    "F": "F",
+    "Li": "Li",
+    "Na": "Na",
+    "Mg": "Mg",
+    "Al": "Al",
+    "Si": "Si",
+    "K": "K",
+    "Ca": "Ca",
+    "Cr.th": "Cr",
+    "Cr.oh": "Cr",
+    "Mn": "Mn",
+    "Fe": "Fe",
+    "Co.oh": "Co",
+    "Cu": "Cu",
+    "Cl": "Cl",
+    "Br": "Br",
+    "I": "I",
+    "Zn": "Zn",
+    "Se": "Se",
+    "Mo": "Mo",
+    "Sn": "Sn",
+}
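Each key in `SYBYL2SYMB` is an element symbol, optionally followed by a dot and a hybridization or geometry suffix, so deriving the element by splitting on the dot looks tempting. The sketch below, using hypothetical helper names and a small excerpt of the table, contrasts that shortcut with the explicit whitelist lookup the parser relies on. Both agree on every listed type, but splitting also turns pseudo-atom types into bogus symbols: for example the Tripos dummy-carbon type `Du.C` (the commit list mentions removing the carbon dummy atom as a valid element; the `Du.C` spelling is assumed from the Tripos specification) or the made-up `X.o2` used in the tests. The table simply leaves them unmapped.

```python
# Excerpt of the SYBYL2SYMB table from the diff (not the full table).
SYBYL_EXCERPT = {
    "C.3": "C", "C.ar": "C", "N.am": "N", "O.co2": "O",
    "S.O2": "S", "S.o2": "S", "Cr.oh": "Cr", "Co.oh": "Co", "Cl": "Cl",
}


def element_by_prefix(atom_type):
    """Naive shortcut (hypothetical): keep whatever precedes the first dot."""
    return atom_type.split(".", 1)[0]


def element_by_table(atom_type, table=SYBYL_EXCERPT):
    """Whitelist lookup: unknown or pseudo-atom types map to ''."""
    return table.get(atom_type, "")


# For every listed type, both approaches agree...
assert all(element_by_prefix(t) == e for t, e in SYBYL_EXCERPT.items())

# ...but only the table rejects types that are not real elements.
for atom_type in ["C.ar", "Du.C", "X.o2"]:
    print(atom_type, repr(element_by_prefix(atom_type)),
          repr(element_by_table(atom_type)))
```

The explicit table costs some maintenance (non-standard spellings such as `S.o2` must be listed by hand), but it makes unknown types visible instead of silently producing invalid element symbols.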

testsuite/MDAnalysisTests/converters/test_rdkit.py: 4 changes (4 additions, 0 deletions)

@@ -229,6 +229,10 @@ def test_identical_topology(self, rdmol):

     def test_raise_requires_elements(self):
         u = mda.Universe(mol2_molecule)
+
+        # Delete topology attribute (PR #3069)
+        u.del_TopologyAttr('elements')
+
         with pytest.raises(
             AttributeError,
             match="`elements` attribute is required for the RDKitConverter"

testsuite/MDAnalysisTests/topology/test_mol2.py: 103 changes (101 additions, 2 deletions)

@@ -30,13 +30,15 @@
     mol2_molecule,
     mol2_molecules,
 )
 from numpy.testing import assert_equal
-
+import numpy as np
+from io import StringIO

 class TestMOL2Base(ParserBase):
     parser = mda.topology.MOL2Parser.MOL2Parser
     expected_attrs = [
-        'ids', 'names', 'types', 'charges', 'resids', 'resnames', 'bonds'
+        'ids', 'names', 'types', 'charges', 'resids', 'resnames', 'bonds',
+        'elements',
     ]
     guessed_attrs = ['masses']
     expected_n_atoms = 49
@@ -50,6 +52,7 @@ def test_attr_size(self, top):
         assert len(top.charges) == top.n_atoms
         assert len(top.resids) == top.n_residues
         assert len(top.resnames) == top.n_residues
+        assert len(top.elements) == top.n_atoms

     def test_bonds(self, top):
         assert len(top.bonds) == 49  # bonds for 49 atoms
@@ -68,3 +71,99 @@ def test_bond_orders():
     u = mda.Universe(mol2_molecule)
     orders = [bond.order for bond in u.atoms.bonds]
     assert_equal(orders, ref_orders)
+
+
+def test_elements():
+    u = mda.Universe(mol2_molecule)
+
+    assert_equal(
+        u.atoms.elements[:5],
+        np.array(["N", "S", "N", "N", "O"], dtype="U3")
+    )
+
+
+# Test for #2927
+def test_elements_selection():
+    u = mda.Universe(mol2_molecule)
+    ag = u.select_atoms("element S")
+
+    assert_equal(
+        ag.elements,
+        np.array(["S", "S"], dtype="U3")
+    )
+
+
+mol2_wrong_element = """\
+@<TRIPOS>MOLECULE
+FXA101_1
+49 51 1 0 0
+SMALL
+USER_CHARGES
+
+
+@<TRIPOS>ATOM
+1 N1 6.8420 9.9900 22.7430 N.am 1 Q101 -0.8960
+2 S1 8.1400 9.2310 23.3330 X.o2 1 Q101 1.3220
+3 N2 4.4000 9.1300 20.4710 XX.am 1 Q101 -0.3970
+"""
+
+
+def test_wrong_elements_warnings():
+    with pytest.warns(UserWarning, match='Unknown elements found') as record:
+        u = mda.Universe(StringIO(mol2_wrong_element), format='MOL2')
+
+    # One warning from invalid elements, one from invalid masses
+    assert len(record) == 2
+    print(record[0].message)
+
+    expected = np.array(['N', '', ''], dtype=object)
+    assert_equal(u.atoms.elements, expected)
+
+
+mol2_fake = """\
+@<TRIPOS>MOLECULE
+FXA101_1
+49 0 0 0 0
+SMALL
+USER_CHARGES
+
+
+@<TRIPOS>ATOM
+1 H1 0.0000 1.0000 10.0000 H.spc 1 XXXX 5.0000
+2 H2 0.0000 1.0000 10.0000 H.t3p 1 XXXX 5.0000
+3 H3 0.0000 1.0000 10.0000 H.xyz 1 XXXX 5.0000
+4 C1 0.0000 1.0000 10.0000 C.1 1 XXXX 5.0000
+5 C2 0.0000 1.0000 10.0000 C.2 1 XXXX 5.0000
+5 C3 0.0000 1.0000 10.0000 C.3 1 XXXX 5.0000
+6 C4 0.0000 1.0000 10.0000 C.ar 1 XXXX 5.0000
+7 C5 0.0000 1.0000 10.0000 C.cat 1 XXXX 5.0000
+8 C6 0.0000 1.0000 10.0000 C.xyz 1 XXXX 5.0000
+9 N1 0.0000 1.0000 10.0000 N.1 1 XXXX 5.0000
+10 N2 0.0000 1.0000 10.0000 N.2 1 XXXX 5.0000
+11 N3 0.0000 1.0000 10.0000 N.3 1 XXXX 5.0000
+12 N4 0.0000 1.0000 10.0000 N.ar 1 XXXX 5.0000
+13 O1 0.0000 1.0000 10.0000 O.2 1 XXXX 5.0000
+14 O2 0.0000 1.0000 10.0000 O.3 1 XXXX 5.0000
+15 O3 0.0000 1.0000 10.0000 O.co2 1 XXXX 5.0000
+16 O4 0.0000 1.0000 10.0000 O.spc 1 XXXX 5.0000
+16 O5 0.0000 1.0000 10.0000 O.t3p 1 XXXX 5.0000
+17 S1 0.0000 1.0000 10.0000 S.3 1 XXXX 5.0000
+18 S2 0.0000 1.0000 10.0000 S.2 1 XXXX 5.0000
+19 S3 0.0000 1.0000 10.0000 S.O 1 XXXX 5.0000
+20 S4 0.0000 1.0000 10.0000 S.O2 1 XXXX 5.0000
+21 S5 0.0000 1.0000 10.0000 S.o 1 XXXX 5.0000
+22 S6 0.0000 1.0000 10.0000 S.o2 1 XXXX 5.0000
+23 P1 0.0000 1.0000 10.0000 P.3 1 XXXX 5.0000
+24 Cr1 0.0000 1.0000 10.0000 Cr.th 1 XXXX 5.0000
+25 Cr2 0.0000 1.0000 10.0000 Cr.oh 1 XXXX 5.0000
+26 Co1 0.0000 1.0000 10.0000 Co.oh 1 XXXX 5.0000
+"""
+
+def test_all_elements():
+    with pytest.warns(UserWarning, match='Unknown elements found') as record:
+        u = mda.Universe(StringIO(mol2_fake), format='MOL2')
+
+    expected = ["H"] * 2 + [""] + ["C"] * 5 + [""] + ["N"] * 4 + ["O"] * 5 + \
+        ["S"] * 6 + ["P"] + ["Cr"] * 2 + ["Co"]
+    expected = np.array(expected, dtype=object)
+    assert_equal(u.atoms.elements, expected)
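The test fixtures above are MOL2 snippets in which each `@<TRIPOS>ATOM` record lists, in order, the atom id, atom name, x, y and z coordinates, SYBYL atom type, substructure id, substructure name and partial charge, so the atom type is the sixth whitespace-separated field. The sketch below pulls that column out of the `mol2_wrong_element` snippet and reports which types fall outside a small excerpt of `SYBYL2SYMB`. The `read_atom_types` helper is hypothetical and far less robust than `MOL2Parser`; for instance, it ignores comment lines and multi-molecule files.

```python
# Atom records copied from the mol2_wrong_element test fixture.
MOL2_TEXT = """\
@<TRIPOS>MOLECULE
FXA101_1
49 51 1 0 0
SMALL
USER_CHARGES


@<TRIPOS>ATOM
1 N1 6.8420 9.9900 22.7430 N.am 1 Q101 -0.8960
2 S1 8.1400 9.2310 23.3330 X.o2 1 Q101 1.3220
3 N2 4.4000 9.1300 20.4710 XX.am 1 Q101 -0.3970
"""

# Small excerpt of SYBYL2SYMB (not the full table).
KNOWN_TYPES = {"N.am": "N", "S.o2": "S", "O.2": "O"}


def read_atom_types(text):
    """Return the SYBYL atom type (6th field) of every @<TRIPOS>ATOM record."""
    types = []
    in_atoms = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@<TRIPOS>"):
            # Only the ATOM section holds atom records.
            in_atoms = line == "@<TRIPOS>ATOM"
            continue
        if in_atoms and line:
            types.append(line.split()[5])
    return types


types = read_atom_types(MOL2_TEXT)
unknown = sorted({t for t in types if t not in KNOWN_TYPES})
print(types)    # ['N.am', 'X.o2', 'XX.am']
print(unknown)  # ['X.o2', 'XX.am']
```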