Issue reading custom GROMACS TOP/ITP files in MDAnalysis 2.0.0 #3419

Closed
mdd31 opened this issue Sep 22, 2021 · 5 comments · Fixed by #3425

Comments

@mdd31

mdd31 commented Sep 22, 2021

Expected behaviour

I'm using MDAnalysis to process coarse-grained simulations run in GROMACS. I need to assign the correct topology to each system to enable further processing. The easiest way to do this in MDAnalysis is to read the topology from the TOP/ITP files and apply it to the PDB/TRR files produced by GROMACS. This worked in MDAnalysis 1.1.1, although charges were not detected.

Actual behaviour

In MDAnalysis 2.0.0, a ValueError is raised while reading charges from the TOP/ITP files.

Traceback (most recent call last):
  File "/home/mark/anaconda3/envs/1bpa2/lib/python3.8/site-packages/MDAnalysis/core/universe.py", line 122, in _topology_from_file_like
    topology = p.parse(**kwargs)
  File "/home/mark/anaconda3/envs/1bpa2/lib/python3.8/site-packages/MDAnalysis/topology/ITPParser.py", line 584, in parse
    attrs.append(Attr(np.array(vals, dtype=dtype)))
ValueError: could not convert string to float: '-'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/tmp/ipykernel_7090/573278789.py", line 1, in <module>
    mdanalysis.Universe("polymer.top", "examplegromacs.pdb", topology_format="ITP", include_dir="./")
  File "/home/mark/anaconda3/envs/1bpa2/lib/python3.8/site-packages/MDAnalysis/core/universe.py", line 336, in __init__
    topology = _topology_from_file_like(self.filename,
  File "/home/mark/anaconda3/envs/1bpa2/lib/python3.8/site-packages/MDAnalysis/core/universe.py", line 137, in _topology_from_file_like
    raise ValueError(
ValueError: Failed to construct topology from file polymer.top with parser <class 'MDAnalysis.topology.ITPParser.ITPParser'>.
Error: could not convert string to float: '-'

I added some extra logging statements to localise the issue and found that the behaviour stems from ITPParser when it reads charges into the array (detailed output is in the attached log file mdanalysisissue.log).
The array creation on line 534 seems to cause the issue: the dtype is set to '<U1', so later only the '-' sign of any negative charge is stored, which triggers the observed error when the value is cast to a float.
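
The truncation described here can be reproduced with NumPy alone. The following is a minimal sketch, independent of MDAnalysis; the array contents and variable names are illustrative assumptions, not the parser's actual data. An array built only from empty strings gets a one-character Unicode dtype, and later assignments are silently cut to fit.

```python
import numpy as np

# Illustrative stand-in for a charge column that was empty for every atom.
charges = np.array(["", "", ""])
print(charges.dtype)  # one-character Unicode dtype ('<U1' on little-endian machines)

# Filling in values later truncates each string to a single character.
charges[:] = ["-1.0", "0.5", "1.0"]
print(charges.tolist())

# Converting the truncated strings to floats then fails on the lone '-'.
try:
    np.array(charges, dtype=np.float32)
    conversion_error = None
except ValueError as exc:
    conversion_error = str(exc)
print(conversion_error)
```

The error only appears at the float conversion, which is why the traceback points at line 584 rather than at the place where the array was created.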

This breaks my current analysis workflow. I'm upgrading to 2.0.0 to make use of the parallelisation support you have added.

I found that setting dtype=object in the array created on line 534 resolves this issue. If the issue is confirmed, how would I go about opening a pull request to get this change committed into develop?

Code to reproduce the behavior

import MDAnalysis as mda

u = mda.Universe("polymer.top", "examplegromacs.pdb",  topology_format="ITP", include_dir="./")


....

Example coarse-grained files that cause the issue are attached. Any residue with a negative charge results in the error above.
mdanalysisissueexamplefiles.zip

Current version of MDAnalysis

  • Which version are you using? (run python -c "import MDAnalysis as mda; print(mda.__version__)")

mdanalysis.__version__
Out[18]: '2.0.0'

  • Which version of Python (python -V)?

Python 3.8.11

  • Which operating system?

Ubuntu 20.04

@mdd31
Author

mdd31 commented Sep 22, 2021

The other packages in the environment are:

name: 1bpa
channels:
  - conda-forge
  - defaults
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=1_llvm
  - alabaster=0.7.12=pyhd3eb1b0_0
  - appdirs=1.4.4=pyhd3eb1b0_0
  - argh=0.26.2=py38_0
  - arrow=0.13.1=py38_0
  - astroid=2.6.6=py38h06a4308_0
  - async_generator=1.10=pyhd3eb1b0_0
  - atomicwrites=1.4.0=py_0
  - attrs=21.2.0=pyhd3eb1b0_0
  - autopep8=1.5.6=pyhd3eb1b0_0
  - babel=2.9.1=pyhd3eb1b0_0
  - backcall=0.2.0=pyhd3eb1b0_0
  - binaryornot=0.4.4=pyhd3eb1b0_1
  - biopython=1.79=py38h497a2fe_0
  - black=19.10b0=py38_0
  - blas=1.0=mkl
  - bleach=4.0.0=pyhd3eb1b0_0
  - bokeh=2.3.3=py38h06a4308_0
  - bottleneck=1.3.2=py38heb32a55_1
  - brotlipy=0.7.0=py38h27cfd23_1003
  - bzip2=1.0.8=h7f98852_4
  - c-ares=1.17.2=h7f98852_0
  - ca-certificates=2021.7.5=h06a4308_1
  - certifi=2021.5.30=py38h06a4308_0
  - cffi=1.14.6=py38h400218f_0
  - cftime=1.5.0=py38hb5d20a5_0
  - chardet=4.0.0=py38h06a4308_1003
  - charset-normalizer=2.0.4=pyhd3eb1b0_0
  - click=8.0.1=pyhd3eb1b0_0
  - cloudpickle=1.6.0=pyhd3eb1b0_0
  - colorama=0.4.4=pyh9f0ad1d_0
  - cookiecutter=1.7.2=pyhd3eb1b0_0
  - cryptography=3.4.7=py38hd23ed53_0
  - cudatoolkit=11.2.2=he111cf0_8
  - curl=7.79.0=hea6ffbf_0
  - cycler=0.10.0=py_2
  - cython=0.29.24=py38h295c915_0
  - cytoolz=0.11.0=py38h7b6447c_0
  - dask=2021.8.1=pyhd3eb1b0_0
  - dask-core=2021.8.1=pyhd3eb1b0_0
  - dbus=1.13.18=hb2f20db_0
  - debugpy=1.4.1=py38h295c915_0
  - decorator=5.0.9=pyhd3eb1b0_0
  - defusedxml=0.7.1=pyhd3eb1b0_0
  - deprecated=1.2.12=pyh44b312d_0
  - diff-match-patch=20200713=pyhd3eb1b0_0
  - distributed=2021.8.1=py38h06a4308_0
  - docutils=0.17.1=py38h06a4308_1
  - entrypoints=0.3=py38_0
  - expat=2.4.1=h2531618_2
  - fftw=3.3.9=h27cfd23_1
  - flake8=3.9.0=pyhd3eb1b0_0
  - fontconfig=2.13.1=h6c09931_0
  - freetype=2.10.4=h5ab3b9f_0
  - fsspec=2021.7.0=pyhd3eb1b0_0
  - future=0.18.2=py38_1
  - glib=2.69.1=h5202010_0
  - griddataformats=0.5.0=py_0
  - gsd=2.4.2=py38hb5d20a5_0
  - gst-plugins-base=1.14.0=h8213a91_2
  - gstreamer=1.14.0=h28cd5cc_2
  - hdf4=4.2.15=h10796ff_3
  - hdf5=1.10.6=nompi_h7c3c948_1111
  - heapdict=1.0.1=pyhd3eb1b0_0
  - icu=58.2=he6710b0_3
  - idna=3.2=pyhd3eb1b0_0
  - imagesize=1.2.0=pyhd3eb1b0_0
  - importlib-metadata=4.8.1=py38h06a4308_0
  - importlib_metadata=4.8.1=hd3eb1b0_0
  - inflection=0.5.1=py38h06a4308_0
  - iniconfig=1.1.1=pyhd3eb1b0_0
  - intel-openmp=2021.3.0=h06a4308_3350
  - intervaltree=3.1.0=pyhd3eb1b0_0
  - ipykernel=6.2.0=py38h06a4308_1
  - ipython=7.27.0=py38hb070fc8_0
  - ipython_genutils=0.2.0=pyhd3eb1b0_1
  - isort=5.9.3=pyhd3eb1b0_0
  - jbig=2.1=h7f98852_2003
  - jedi=0.17.2=py38h06a4308_1
  - jeepney=0.7.1=pyhd3eb1b0_0
  - jinja2=2.11.3=pyhd3eb1b0_0
  - jinja2-time=0.2.0=pyhd3eb1b0_2
  - joblib=1.0.1=pyhd8ed1ab_0
  - jpeg=9d=h7f8727e_0
  - jsonschema=3.2.0=pyhd3eb1b0_2
  - jupyter_client=7.0.1=pyhd3eb1b0_0
  - jupyter_core=4.7.1=py38h06a4308_0
  - jupyterlab_pygments=0.1.2=py_0
  - keyring=23.1.0=py38h06a4308_0
  - kiwisolver=1.3.2=py38h1fd1430_0
  - krb5=1.19.2=hcc1bbae_0
  - lazy-object-proxy=1.6.0=py38h27cfd23_0
  - lcms2=2.12=hddcbb42_0
  - ld_impl_linux-64=2.35.1=h7274673_9
  - lerc=2.2.1=h9c3ff4c_0
  - libblas=3.9.0=11_linux64_mkl
  - libcblas=3.9.0=11_linux64_mkl
  - libcurl=7.79.0=h2574ce0_0
  - libdeflate=1.7=h7f98852_5
  - libedit=3.1.20191231=he28a2e2_2
  - libev=4.33=h516909a_1
  - libffi=3.3=he6710b0_2
  - libgcc-ng=11.2.0=h1d223b6_8
  - libgfortran-ng=7.5.0=ha8ba4b0_17
  - libgfortran4=7.5.0=ha8ba4b0_17
  - libnetcdf=4.8.1=nompi_hcd642e3_100
  - libnghttp2=1.43.0=h812cca2_0
  - libpng=1.6.37=hbc83047_0
  - libsodium=1.0.18=h7b6447c_0
  - libspatialindex=1.9.3=h2531618_0
  - libssh2=1.10.0=ha56f1ee_0
  - libstdcxx-ng=11.2.0=he4da1e4_8
  - libtiff=4.3.0=hf544144_1
  - libuuid=1.0.3=h1bed415_2
  - libwebp-base=1.2.1=h7f98852_0
  - libxcb=1.14=h7b6447c_0
  - libxml2=2.9.12=h03d6c58_0
  - libxslt=1.1.34=hc22bd24_0
  - libzip=1.8.0=h4de3113_0
  - llvm-openmp=12.0.1=h4bd325d_1
  - locket=0.2.1=py38h06a4308_1
  - lxml=4.6.3=py38hf1fe3a4_0
  - lz4-c=1.9.3=h9c3ff4c_1
  - markupsafe=1.1.1=py38h7b6447c_0
  - matplotlib=3.4.3=py38h578d9bd_0
  - matplotlib-base=3.4.3=py38hf4fb855_0
  - matplotlib-inline=0.1.2=pyhd3eb1b0_2
  - mccabe=0.6.1=py38_1
  - mistune=0.8.4=py38h7b6447c_1000
  - mkl=2021.3.0=h06a4308_520
  - mkl-service=2.4.0=py38h7f8727e_0
  - mkl_fft=1.3.0=py38h42c9631_2
  - mkl_random=1.2.2=py38h51133e4_0
  - mmtf-python=1.1.2=py_0
  - more-itertools=8.8.0=pyhd3eb1b0_0
  - msgpack-python=1.0.2=py38h1fd1430_1
  - mypy_extensions=0.4.3=py38_0
  - nbclient=0.5.3=pyhd3eb1b0_0
  - nbconvert=6.1.0=py38h06a4308_0
  - nbformat=5.1.3=pyhd3eb1b0_0
  - ncurses=6.2=he6710b0_1
  - nest-asyncio=1.5.1=pyhd3eb1b0_0
  - netcdf4=1.5.7=nompi_py38hcc16cfe_101
  - networkx=2.3=py_0
  - numexpr=2.7.3=py38h22e1b3c_1
  - numpy=1.20.3=py38hf144106_0
  - numpy-base=1.20.3=py38h74d4b33_0
  - numpydoc=1.1.0=pyhd3eb1b0_1
  - ocl-icd=2.3.1=h7f98852_0
  - ocl-icd-system=1.0.0=1
  - olefile=0.46=pyh9f0ad1d_1
  - openmm=7.6.0=py38h30ff9b7_0
  - openssl=1.1.1l=h7f8727e_0
  - packaging=21.0=pyhd3eb1b0_0
  - pandas=1.3.2=py38h8c16a72_0
  - pandocfilters=1.4.3=py38h06a4308_1
  - parso=0.7.0=py_0
  - partd=1.2.0=pyhd3eb1b0_0
  - pathspec=0.7.0=py_0
  - patsy=0.5.1=py_0
  - pcre=8.45=h295c915_0
  - pexpect=4.8.0=pyhd3eb1b0_3
  - pickleshare=0.7.5=pyhd3eb1b0_1003
  - pillow=7.2.0=py38h9776b28_2
  - pip=21.0.1=py38h06a4308_0
  - pluggy=0.13.1=py38h06a4308_0
  - poyo=0.5.0=pyhd3eb1b0_0
  - prompt-toolkit=3.0.17=pyhca03da5_0
  - psutil=5.8.0=py38h27cfd23_1
  - ptyprocess=0.7.0=pyhd3eb1b0_2
  - py=1.10.0=pyhd3eb1b0_0
  - pycodestyle=2.6.0=pyhd3eb1b0_0
  - pycparser=2.20=py_2
  - pydocstyle=6.1.1=pyhd3eb1b0_0
  - pyflakes=2.2.0=pyhd3eb1b0_0
  - pygments=2.10.0=pyhd3eb1b0_0
  - pylint=2.9.6=py38h06a4308_1
  - pyls-black=0.4.6=hd3eb1b0_0
  - pyls-spyder=0.3.2=pyhd3eb1b0_0
  - pyopenssl=20.0.1=pyhd3eb1b0_1
  - pyparsing=2.4.7=pyhd3eb1b0_0
  - pyqt=5.9.2=py38h05f1152_4
  - pyrsistent=0.17.3=py38h7b6447c_0
  - pysocks=1.7.1=py38h06a4308_0
  - pytest=6.2.4=py38h06a4308_2
  - python=3.8.11=h12debd9_0_cpython
  - python-dateutil=2.8.2=pyhd3eb1b0_0
  - python-jsonrpc-server=0.4.0=py_0
  - python-language-server=0.36.2=pyhd3eb1b0_0
  - python-slugify=5.0.2=pyhd3eb1b0_0
  - python_abi=3.8=2_cp38
  - pytz=2021.1=pyhd3eb1b0_0
  - pyxdg=0.27=pyhd3eb1b0_0
  - pyyaml=5.4.1=py38h27cfd23_1
  - pyzmq=22.2.1=py38h295c915_1
  - qdarkstyle=3.0.2=pyhd3eb1b0_0
  - qstylizer=0.1.10=pyhd3eb1b0_0
  - qt=5.9.7=h5867ecd_1
  - qtawesome=1.0.2=pyhd3eb1b0_0
  - qtconsole=5.1.0=pyhd3eb1b0_0
  - qtpy=1.10.0=pyhd3eb1b0_0
  - readline=8.1=h27cfd23_0
  - regex=2021.8.3=py38h7f8727e_0
  - requests=2.26.0=pyhd3eb1b0_0
  - rope=0.19.0=pyhd3eb1b0_0
  - rtree=0.9.7=py38h06a4308_1
  - scikit-learn=0.24.2=py38hacb3eff_1
  - scipy=1.7.1=py38h292c36d_2
  - seaborn=0.11.2=hd8ed1ab_0
  - seaborn-base=0.11.2=pyhd8ed1ab_0
  - secretstorage=3.3.1=py38h06a4308_0
  - setuptools=58.0.4=py38h06a4308_0
  - sip=4.19.13=py38he6710b0_0
  - six=1.16.0=pyhd3eb1b0_0
  - snowballstemmer=2.1.0=pyhd3eb1b0_0
  - sortedcontainers=2.4.0=pyhd3eb1b0_0
  - sphinx=4.0.2=pyhd3eb1b0_0
  - sphinxcontrib-applehelp=1.0.2=pyhd3eb1b0_0
  - sphinxcontrib-devhelp=1.0.2=pyhd3eb1b0_0
  - sphinxcontrib-htmlhelp=2.0.0=pyhd3eb1b0_0
  - sphinxcontrib-jsmath=1.0.1=pyhd3eb1b0_0
  - sphinxcontrib-qthelp=1.0.3=pyhd3eb1b0_0
  - sphinxcontrib-serializinghtml=1.1.5=pyhd3eb1b0_0
  - spyder=5.0.5=py38h06a4308_2
  - spyder-kernels=2.0.5=py38h06a4308_0
  - sqlite=3.36.0=hc218d9a_0
  - statsmodels=0.12.2=py38h6c62de6_0
  - tblib=1.7.0=pyhd3eb1b0_0
  - testpath=0.5.0=pyhd3eb1b0_0
  - text-unidecode=1.3=pyhd3eb1b0_0
  - textdistance=4.2.1=pyhd3eb1b0_0
  - threadpoolctl=2.2.0=pyh8a188c0_0
  - three-merge=0.1.1=pyhd3eb1b0_0
  - tinycss=0.4=pyhd3eb1b0_1002
  - tk=8.6.10=hbc83047_0
  - toml=0.10.2=pyhd3eb1b0_0
  - toolz=0.11.1=pyhd3eb1b0_0
  - tornado=6.1=py38h27cfd23_0
  - tqdm=4.62.3=pyhd8ed1ab_0
  - traitlets=5.0.5=pyhd3eb1b0_0
  - typed-ast=1.4.3=py38h7f8727e_1
  - typing_extensions=3.10.0.2=pyh06a4308_0
  - ujson=4.0.2=py38h2531618_0
  - unidecode=1.2.0=pyhd3eb1b0_0
  - urllib3=1.26.6=pyhd3eb1b0_1
  - watchdog=2.1.3=py38h06a4308_0
  - wcwidth=0.2.5=pyhd3eb1b0_0
  - webencodings=0.5.1=py38_1
  - wheel=0.37.0=pyhd3eb1b0_1
  - whichcraft=0.6.1=pyhd3eb1b0_0
  - wrapt=1.12.1=py38h7b6447c_1
  - wurlitzer=2.1.1=py38h06a4308_0
  - xz=5.2.5=h7b6447c_0
  - yaml=0.2.5=h7b6447c_0
  - yapf=0.31.0=pyhd3eb1b0_0
  - zeromq=4.3.4=h2531618_0
  - zict=2.0.0=pyhd3eb1b0_0
  - zipp=3.5.0=pyhd3eb1b0_0
  - zlib=1.2.11=h7b6447c_3
  - zstd=1.5.0=ha95c52a_0

@orbeckst
Member

Hello @mdd31, I have tentatively labeled this issue as a bug (but haven't confirmed it beyond reading your report).

Fixes are very welcome. The User Guide has a section to get you started (https://userguide.mdanalysis.org/stable/contributing_code.html). In short, create a PR and we can discuss the details there.

@orbeckst
Member

Quick comment from looking at the ITP parser code: If you had to change anything in line 534

self.charges = np.array(self.charges)

then I suspect that the actual problem is elsewhere in how charges are (or are not) picked up.

@mdd31
Author

mdd31 commented Sep 23, 2021

@orbeckst Yes, you are correct. From my logging and investigation, I found that the issue is actually caused by the assignment of charge values into the array, in the block around lines 538-546:

empty = self.charges == ''
self.charges[empty] = [
    (
        self.atomtypes.get(x)["charge"]
        if x in self.atomtypes.keys()
        else ''
    )
    for x in self.types[empty]
]

Because of the dtype set earlier, only the first character of each charge value is stored. Explicitly setting the dtype on line 534 means the array already has an appropriate type, so the later assignment no longer truncates the values.
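
The effect of setting the dtype explicitly can be sketched outside MDAnalysis. The snippet below mirrors the shape of the fill-in block quoted above, but `fill_missing_charges`, the sample atom types, and the charge values are hypothetical names and data chosen for illustration. It is not the code of the actual fix.

```python
import numpy as np


def fill_missing_charges(charges, types, atomtypes, dtype=None):
    """Fill empty charge strings from per-type defaults, then convert to floats.

    Hypothetical helper for illustration only. `dtype` is passed to np.array
    so that the inferred dtype and dtype=object can be compared.
    """
    charges = np.array(charges, dtype=dtype)
    types = np.array(types)
    empty = charges == ""
    charges[empty] = [
        atomtypes[t]["charge"] if t in atomtypes else ""
        for t in types[empty]
    ]
    return charges.astype(np.float32)


# Made-up atom types: one negatively charged bead and one neutral bead.
atomtypes = {"Qa": {"charge": "-1.000"}, "C1": {"charge": "0.000"}}
types = ["Qa", "C1"]

# With an object array the full strings survive and the conversion succeeds.
filled = fill_missing_charges(["", ""], types, atomtypes, dtype=object)
print(filled)

# With the inferred fixed-width dtype, '-1.000' is cut down to '-' and fails.
try:
    fill_missing_charges(["", ""], types, atomtypes)
    failed = False
except ValueError:
    failed = True
print(failed)
```

An object array stores references to full Python strings, so the width of the values first seen does not limit what can be assigned later.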

@mdd31
Author

mdd31 commented Sep 23, 2021

I will commit my changes to a local branch and submit a PR tomorrow morning (CEST).

Thank you for your help.

mdd31 pushed a commit to mdd31/mdanalysis that referenced this issue Sep 24, 2021
… to solve issue MDAnalysis#3419.

Update to test as extra decimal place is now picked up from file.
@mdd31 mdd31 mentioned this issue Sep 24, 2021
4 tasks
mdd31 pushed a commit to mdd31/mdanalysis that referenced this issue Sep 24, 2021