mmtf parser concatenates models #1496

kain88-de · 2017-07-17T09:09:14Z

EDIT by @jbarnoud: The issue actually comes from the handling of models by the mmtf parser. See discussion.

Expected behaviour

Both the mmtf parser and the pdb parser give the same number of segments for 1UBQ.

Actual behaviour

The mmtf parser gives two segments and the pdb parser gives one.

Code to reproduce the behaviour

import MDAnalysis as mda

u = mda.fetch_mmtf('1UBQ')
print(u.atoms.segments)
u = mda.Universe('1ubq.pdb')
print(u.atoms.segments)

....

Current version of MDAnalysis:

0.17.0dev (couple days old)


kain88-de · 2017-07-17T09:11:49Z

Behavior is even worse for 1Q0W, where the mmtf parser finds 40 segments while the PDB parser finds only 2.

jbarnoud · 2017-07-24T14:33:55Z

It seems that the problem is not so much the segments, but the whole content of the files:

import MDAnalysis as mda

u_mmtf = mda.fetch_mmtf('1Q0W')
print(len(u_mmtf.atoms))  # 32420
u_pdb = mda.Universe('/home/jon/Downloads/1q0w.pdb')
print(len(u_pdb.atoms))  # 1621

The MMTF file contains 20 times the number of atoms the PDB file has. Looking at the PDB file, there are 20 models. The issue seems to be that we do not account for models in the same way in the PDB and the MMTF parsers.
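To check how many models a PDB file holds and how many atoms each one has, something like the following could be used. This is only a minimal sketch in plain Python, not MDAnalysis code: the function name `atoms_per_model` and the two-model example text are made up for illustration, and the sketch assumes that models are delimited by standard `MODEL` records.

```python
def atoms_per_model(pdb_text):
    """Count ATOM/HETATM records in each MODEL block of a PDB-format string.

    Hypothetical helper for illustration; it only looks at record names.
    """
    counts = {}
    current = 1  # a file without MODEL records holds a single implicit model
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # the model serial number follows the record name
            current = int(line[6:].split()[0])
        elif record in ("ATOM", "HETATM"):
            counts[current] = counts.get(current, 0) + 1
    return counts


# Made-up two-model example (not taken from 1Q0W).
example = """\
MODEL        1
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
ENDMDL
MODEL        2
ATOM      1  N   MET A   1      27.100  24.200   2.500
ATOM      2  CA  MET A   1      26.100  25.300   2.700
ENDMDL
"""

counts = atoms_per_model(example)
total = sum(counts.values())
print(counts)  # atoms found in each model
print(total)   # what a parser that concatenates all models would report
```

If every model has the same number of atoms, the concatenated total is simply that number times the number of models, which matches the factor of 20 seen above.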

richardjgowers · 2017-07-24T19:42:02Z

Yeah, so PDB uses models to differentiate between frames in a trajectory. In MMTF we don't do this, but maybe we should? I.e. we could use u.trajectory[i] to select the ith model. I think the problem with this was that models didn't necessarily have the same number of atoms, which breaks our trajectory iteration model.

kain88-de · 2017-07-24T20:46:51Z

We could default to the first model though.
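As a rough sketch of what "default to the first model" could mean for MMTF data: the MMTF specification stores `chainsPerModel`, `groupsPerChain`, `groupTypeList` and `groupList`, so the atoms of model 0 can be counted by walking only the chains of the first model. The code below is an assumption-laden illustration; it works on a plain dict that uses the spec field names, not on the mmtf-python decoder object or the MDAnalysis parser, and `first_model_atom_count` and the toy data are hypothetical.

```python
def first_model_atom_count(mmtf):
    """Number of atoms in model 0 of a decoded MMTF-like mapping.

    `mmtf` is assumed to be a plain dict keyed by MMTF spec field names.
    """
    n_chains = mmtf["chainsPerModel"][0]               # chains in the first model
    n_groups = sum(mmtf["groupsPerChain"][:n_chains])  # groups in those chains
    return sum(
        len(mmtf["groupList"][group_type]["atomNameList"])
        for group_type in mmtf["groupTypeList"][:n_groups]
    )


# Toy structure: two models, each with one chain of two groups (residues).
toy = {
    "numAtoms": 10,
    "chainsPerModel": [1, 1],
    "groupsPerChain": [2, 2],
    "groupTypeList": [0, 1, 0, 1],
    "groupList": [
        {"groupName": "MET", "atomNameList": ["N", "CA"]},
        {"groupName": "GLN", "atomNameList": ["N", "CA", "C"]},
    ],
}

n_first = first_model_atom_count(toy)
print(n_first, "of", toy["numAtoms"], "atoms belong to the first model")
```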

jbarnoud · 2017-07-25T08:38:51Z

That is what we do with the PDB parser. Actually, we do not even read after the first model when reading the topology: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/topology/PDBParser.py#L158.

richardjgowers · 2017-07-25T15:58:10Z

@jbarnoud for topology purposes, we read the first model and assume that all other models have the same topology. The coordinate reader then iterates over the models, reading only the coordinates.
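The split described here (topology from the first model, coordinates from every model) can be sketched in plain Python. This is not the actual PDBParser or PDBReader code; `read_models_as_frames` and `pdb_atom_line` are hypothetical helpers, and the residue fields in the generated lines are placeholders. The sketch raises an error when a later model has a different number of atoms, which is the case that breaks the models-as-frames assumption.

```python
def pdb_atom_line(serial, name, x, y, z):
    """Build a fixed-column ATOM record (residue fields are placeholders)."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f" % (
        serial, name, "MET", "A", 1, x, y, z)


def read_models_as_frames(pdb_text):
    """Return (atom_names, frames): names from model 1, one frame per model."""
    names = []
    frames = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            frames.append([])
        elif record in ("ATOM", "HETATM"):
            if not frames:            # no MODEL record: single implicit model
                frames.append([])
            if len(frames) == 1:      # topology is taken from the first model only
                names.append(line[12:16].strip())
            # x, y, z occupy columns 31-38, 39-46 and 47-54
            xyz = tuple(float(line[i:i + 8]) for i in (30, 38, 46))
            frames[-1].append(xyz)
    for index, frame in enumerate(frames):
        if len(frame) != len(names):
            raise ValueError(
                "model %d has %d atoms, expected %d"
                % (index + 1, len(frame), len(names)))
    return names, frames


text = "\n".join([
    "MODEL        1",
    pdb_atom_line(1, "N", 27.340, 24.430, 2.614),
    pdb_atom_line(2, "CA", 26.266, 25.413, 2.842),
    "ENDMDL",
    "MODEL        2",
    pdb_atom_line(1, "N", 27.100, 24.200, 2.500),
    pdb_atom_line(2, "CA", 26.100, 25.300, 2.700),
    "ENDMDL",
])

names, frames = read_models_as_frames(text)
print(names)        # topology from the first model
print(len(frames))  # number of frames, one per model
```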

orbeckst · 2017-08-21T23:53:12Z

One issue is the handling of models, the other the handling of segments, or did I misunderstand?

For PDB, segments are parts of a model. Is the same true for mmtf? Are there even segments defined in MMTF?

Assuming that a model broadly means the same thing in both formats then I would also go with the current approach to choose the first model for topology building. I think MMTFParser effectively switches the topology when another model is selected, doesn't it?

richardjgowers · 2017-08-22T12:57:12Z

@orbeckst the problem is that different models can have a different number of atoms. We currently can't change the topology with frame (model) changes.

The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj.

kain88-de · 2017-08-22T15:08:51Z

> The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj.

That is what most PDBs do. Actually I do not know of a rule in PDB that forbids that models have a different number of atoms. It's just what most people implicitly do, especially since it's abused as a trajectory format.

orbeckst · 2017-08-22T17:10:57Z

This sounds like a discussion that we already had and where we consulted Alex Rose; I *think* the conclusion was exactly what Max is saying but it might be worthwhile to look through the issue tracker again. Ideally we come up with a smart way to deal with changing topologies. But we also need a short term solution that makes these formats useable in a reasonably consistent way. What are the user expectations here?


richardjgowers · 2024-08-22T09:07:11Z

probably not relevant now mmtf is going extinct

jbarnoud mentioned this issue Jul 24, 2017

Molecule type as a topology attribute when reading a TPR #1555

Closed

richardjgowers added the Format-MMTF label Jul 24, 2017

jbarnoud changed the title ~~mmtf parser gives different number of segments then pdb parser~~ mmtf parser concatenates models Jul 25, 2017

richardjgowers added the wontfix label Aug 22, 2024

richardjgowers closed this as completed Aug 22, 2024
