-
Notifications
You must be signed in to change notification settings - Fork 663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Removing guess by default in TOPParser #2651
Comments
To add context here, I suddenly realised that AMBER does offer a united atom force field (ff03ua) [as far as I am aware no CG though?]. It looks like the default here is to set the ATOMIC_NUMBER to the element of the heavy atom (i.e. C in CH3). See the following parm7 snippet fo alanine dipeptide:
I'm not a united atom FF user, so I don't know if using the ATOMIC_NUMBER to get the elements would be problematic in this case. Thoughts? |
@IAlibay Yeah that looks like a tricky one. I don't know whether there exists a convention on defining atomic numbers for united-atom groups. Doing so seems just blatantly wrong to me, but here's my two cents on the topic: I think there is no more exact way of defining an element than by supplying its atomic number. Thus, the AMBER ff03ua file you pasted claims that CH3 groups are actually carbon atoms (which they clearly aren't). Nevertheless, whoever wrote this force field decided that it's a good idea to do so, and I think that is what matters. If I decide to feed a water topology to MDAnalysis and claim that all my atoms are Darmstadtium, I want MDA to happily follow along. Atomic numbers are unambiguous information, so there is no need to guess anything here. Whether we want to offer consistency checks is probably a different issue. |
I agree with @zemanj re: validation. PDB elements are validated in #2553 which is necessary for the MDAnalysis paradigm for now, but if/when the guessing system is overhauled, we should take the input as canon. If I've gone to the trouble of labelling some of my atoms 'X' in the element column, I would want that information in MDAnalysis as well. If you squash and replace invalid values with an empty record when you read the file in, then you inhibit later guessing/analysis/postprocessing because the guesser no longer has access to the original info, e.g. in the hypothetical situation where an atom number of 0 means something different (e.g. virtual sites) to an atom number of -1 (e.g. CG bead). Possibly we could put the atomnum attribute to better use here.
I think the MDAnalysis has been using the empty string so far. |
Is your feature request related to a problem?
So a mix between a bug and feature request.
In light of #2630, especially #2630 (comment) I went back and looked at the element parser for the TOPParser. I noticed that not only is it by default guessing elements (using
guess_types
if theATOMIC_NUMBER
record isn't provided), but it also reports bad elements for two of our test files (NA
is reported instead ofN
for PRM and PRM7).Describe the solution you'd like
I would like to remove the guess by default functionality, and make it so that elements are only assigned if a full
ATOMIC_NUMBER
record is present (possibly assigning "empty" records when some of the elements are partially missing [i.e. if the ATOMIC_NUMBER contains negative values].Guessing can then be done a later user-defined point if necessary.
Additional context
I am complicit in this "bug" so I'm willing to fix this, but we probably need a consensus on what an "empty" element record would look like (this might be agreed upon and I've missed the discussion, pinging @lilyminium who would probably know better here).
Update
So the way I see it there are three things that probably need to be done here:
The text was updated successfully, but these errors were encountered: