mmtf parser concatenates models #1496
Comments
Behavior is even worse for `1Q0W`, where the MMTF parser finds 40 segments while the PDB parser finds only 2. |
It seems that the problem is not so much the segments, but the whole content of the files:
The MMTF file contains 20 times the number of atoms the PDB file has. Looking at the PDB file, there are 20 models. The issue seems to be that we do not account for models in the same way in the PDB and the MMTF parsers. |
Yeah, so PDB uses models to differentiate between frames in a trajectory. In MMTF we don't do this, but maybe we should? I.e., we could use |
We could default to the first model though. |
That is what we do with the PDB parser. Actually, we do not even read after the first model when reading the topology: https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/topology/PDBParser.py#L158. |
@jbarnoud for topology purposes, we read the first model and assume that all other models have the same topology; the coordinate reader then iterates over the coordinates of the models. |
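To make the difference concrete, here is a minimal sketch in plain Python of the two ways of counting atoms discussed above: stopping at the first `ENDMDL` (first model only) versus reading every model. It is not the MDAnalysis parser code; `count_atoms` and the toy two-model file are made up for illustration, and the coordinates are placeholder values.

```python
def count_atoms(pdb_text, first_model_only=True):
    """Count ATOM/HETATM records in PDB-formatted text.

    Hypothetical helper for illustration only, not part of MDAnalysis.
    With first_model_only=True, reading stops at the first ENDMDL record.
    """
    n_atoms = 0
    for line in pdb_text.splitlines():
        record = line[:6].strip()  # record name lives in columns 1-6
        if record == "ENDMDL" and first_model_only:
            break
        if record in ("ATOM", "HETATM"):
            n_atoms += 1
    return n_atoms


# Toy file with two models of two atoms each (placeholder coordinates).
TOY_PDB = """\
MODEL        1
ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N
ATOM      2  CA  MET A   1       1.000   0.000   0.000  1.00  0.00           C
ENDMDL
MODEL        2
ATOM      1  N   MET A   1       0.100   0.000   0.000  1.00  0.00           N
ATOM      2  CA  MET A   1       1.100   0.000   0.000  1.00  0.00           C
ENDMDL
END
"""

print(count_atoms(TOY_PDB))                          # first model only
print(count_atoms(TOY_PDB, first_model_only=False))  # all models concatenated
```

With a 20-model file, the second call would report 20 times as many atoms as the first, which matches the factor seen between the MMTF and PDB results in this issue.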
One issue is the handling of models, the other the handling of segments, or did I misunderstand? For PDB, segments are parts of a model. Is the same true for MMTF? Are there even segments defined in MMTF? Assuming that a model broadly means the same thing in both formats, I would also go with the current approach of choosing the first model for topology building. I think MMTFParser effectively switches the topology when another model is selected, doesn't it? |
@orbeckst the problem is that different models can have a different number of atoms. We currently can't change the topology with frame (model) changes. The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj. |
That is what most PDBs do. Actually I do not know of a rule in PDB that forbids that models have a different number of atoms. It's just what most people implicitly do, especially since it's abused as a trajectory format. |
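A quick way to check whether a given file follows that implicit convention is to count atoms per model and compare. The sketch below is plain Python, not MDAnalysis API; `atoms_per_model` and `models_usable_as_frames` are hypothetical names, and the toy inputs are made up.

```python
def atoms_per_model(pdb_text):
    """Return the number of ATOM/HETATM records in each MODEL ... ENDMDL block."""
    counts = []
    current = None  # None while we are outside any MODEL block
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = 0
        elif record == "ENDMDL":
            if current is not None:
                counts.append(current)
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            current += 1
    return counts


def models_usable_as_frames(pdb_text):
    """True if every model has the same atom count.

    Files without MODEL records yield an empty list and also return True.
    Equal counts are necessary for treating models as frames, but this
    check does not compare atom names or residues.
    """
    return len(set(atoms_per_model(pdb_text))) <= 1


ATOM_LINE = "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N"

# Two models with one atom each: fine as a trajectory.
CONSISTENT = "\n".join(["MODEL        1", ATOM_LINE, "ENDMDL",
                        "MODEL        2", ATOM_LINE, "ENDMDL", "END"])

# Second model has an extra atom: the topology would have to change.
INCONSISTENT = "\n".join(["MODEL        1", ATOM_LINE, "ENDMDL",
                          "MODEL        2", ATOM_LINE, ATOM_LINE, "ENDMDL", "END"])

print(atoms_per_model(CONSISTENT), models_usable_as_frames(CONSISTENT))
print(atoms_per_model(INCONSISTENT), models_usable_as_frames(INCONSISTENT))
```

A file that fails this check is exactly the case where a fixed topology plus per-model coordinates breaks down.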
This sounds like a discussion that we already had and where we consulted Alex Rose; I *think* the conclusion was exactly what Max is saying but it might be worthwhile to look through the issue tracker again.
Ideally we come up with a smart way to deal with changing topologies.
But we also need a short term solution that makes these formats useable in a reasonably consistent way. What are the user expectations here?
… On Aug 22, 2017, at 11:08, Max Linke ***@***.***> wrote:
> The PDB format using models as frames is a special case, as the number of atoms remains constant, which is why we can treat it as a traj.
> That is what most PDBs do. Actually I do not know of a rule in PDB that forbids that models have a different number of atoms. It's just what most people implicitly do, especially since it's abused as a trajectory format.
|
Probably not relevant now that MMTF is going extinct. |
EDIT by @jbarnoud: The issue actually comes from the handling of models by the MMTF parser. See discussion.
Expected behaviour
Both parsers give the same number of segments for `1ubq`.
Actual behaviour
The MMTF parser gives two segments and the PDB parser gives one.
Code to reproduce the behaviour
Current version of MDAnalysis:
0.17.0dev (couple days old)