Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removing guess by default in TOPParser #2651

Closed
2 tasks
IAlibay opened this issue Mar 29, 2020 · 3 comments · Fixed by #2714
Closed
2 tasks

Removing guess by default in TOPParser #2651

IAlibay opened this issue Mar 29, 2020 · 3 comments · Fixed by #2714

Comments

@IAlibay
Copy link
Member

IAlibay commented Mar 29, 2020

Is your feature request related to a problem?

So a mix between a bug and feature request.
In light of #2630, especially #2630 (comment) I went back and looked at the element parser for the TOPParser. I noticed that not only is it by default guessing elements (using guess_types if the ATOMIC_NUMBER record isn't provided), but it also reports bad elements for two of our test files (NA is reported instead of N for PRM and PRM7).

Describe the solution you'd like

I would like to remove the guess by default functionality, and make it so that elements are only assigned if a full ATOMIC_NUMBER record is present (possibly assigning "empty" records when some of the elements are partially missing [i.e. if the ATOMIC_NUMBER contains negative values].

Guessing can then be done a later user-defined point if necessary.

Additional context

I am complicit in this "bug" so I'm willing to fix this, but we probably need a consensus on what an "empty" element record would look like (this might be agreed upon and I've missed the discussion, pinging @lilyminium who would probably know better here).

Update

So the way I see it there are three things that probably need to be done here:

  • Remove guess by default for elements in the TOPParser (this should also partly fix EP handling in parm7 files #2449 [at least users will be told that it wasn't recognised as P]
  • Add use of the atomnum attribute to the TOPParser
@IAlibay
Copy link
Member Author

IAlibay commented Mar 29, 2020

To add context here, I suddenly realised that AMBER does offer a united atom force field (ff03ua) [as far as I am aware no CG though?]. It looks like the default here is to set the ATOMIC_NUMBER to the element of the heavy atom (i.e. C in CH3). See the following parm7 snippet fo alanine dipeptide:

%FLAG ATOM_NAME
%FORMAT(20a4)
CH3 C   O   N   H   CA  HA  CB  C   O   N   H   CH3
%FLAG CHARGE
%FORMAT(5E16.8)
  1.04557735E+00  8.25646946E+00 -9.85343539E+00 -7.37589504E+00  5.36238555E+00
 -1.57422450E-01  1.91674907E+00 -2.08827558E-02  1.03907928E+01 -1.01157272E+01
 -8.77327211E+00  5.21844761E+00  4.10619486E+00
%FLAG ATOMIC_NUMBER
%FORMAT(10I8)
       6       6       8       7       1       6       1       6       6       8
       7       1       6
%FLAG MASS

I'm not a united atom FF user, so I don't know if using the ATOMIC_NUMBER to get the elements would be problematic in this case. Thoughts?

@zemanj
Copy link
Member

zemanj commented Mar 29, 2020

@IAlibay Yeah that looks like a tricky one. I don't know whether there exists a convention on defining atomic numbers for united-atom groups. Doing so seems just blatantly wrong to me, but here's my two cents on the topic:

I think there is no more exact way of defining an element than by supplying its atomic number. Thus, the AMBER ff03ua file you pasted claims that CH3 groups are actually carbon atoms (which they clearly aren't). Nevertheless, whoever wrote this force field decided that it's a good idea to do so, and I think that is what matters. If I decide to feed a water topology to MDAnalysis and claim that all my atoms are Darmstadtium, I want MDA to happily follow along.

Atomic numbers are unambiguous information, so there is no need to guess anything here. Whether we want to offer consistency checks is probably a different issue.

@lilyminium
Copy link
Member

I agree with @zemanj re: validation. PDB elements are validated in #2553 which is necessary for the MDAnalysis paradigm for now, but if/when the guessing system is overhauled, we should take the input as canon. If I've gone to the trouble of labelling some of my atoms 'X' in the element column, I would want that information in MDAnalysis as well. If you squash and replace invalid values with an empty record when you read the file in, then you inhibit later guessing/analysis/postprocessing because the guesser no longer has access to the original info, e.g. in the hypothetical situation where an atom number of 0 means something different (e.g. virtual sites) to an atom number of -1 (e.g. CG bead). Possibly we could put the atomnum attribute to better use here.

what an "empty" element record would look like

I think the MDAnalysis has been using the empty string so far.

@IAlibay IAlibay mentioned this issue May 27, 2020
4 tasks
orbeckst added a commit that referenced this issue May 28, 2020
- Fixes #2699
- Changes made in this Pull Request:
    - Removes MDAnalysis.topology._elements.py (Remove `NUMBER_TO_ELEMENTS`)
    - Switches `NUMBER_TO_ELEMENTS` to `Z2SYMB` in TOPParser
    - Adds a more thorough elements test and starts cleaning up tests in anticipation of #2651
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants