Add contexts to guess attributes better (especially elements and masses) #2630
Comments
I did not look at the code, but I find the proposal sound. Guessers have always been what frustrates me the most in MDAnalysis. One thing to be aware of is that some contexts will require maintenance and careful versioning. I am thinking especially of a context for the Martini coarse-grained force field: to guess masses from a gro file, your context needs to know the mapping for all the residues in your universe. Yet those get updated from time to time. Similar issues will come with united-atom force fields. The context should be as easy as possible to create, extend, and modify without editing the code of MDAnalysis.
@jbarnoud Thanks for having a look!
Yes, this is why I think being able to read from a file (preferably straight from a force field file) would be important. In that scenario the user would ideally be able to override whatever masses are generated from the parser with:
>>> context = Context.from_files('martini.itp', 'martini_aa.itp',
...                              'martini_ions.itp', name='my_martini')
>>> u = mda.Universe('martini.gro')
>>> u.guess_TopologyAttr('masses', from_attrs=['names', 'resnames'],
...                      context='my_martini')
perhaps prompted by a helpful warning message. I don't think it would be practical to try to keep up with force field updates.
I like this idea. To play devil's advocate, I'm not keen on adding more lines to load a Universe - part of what the package does is try and hide complexity, i.e. masses are often magically guessed. But maybe it's cleaner and more correct to not guess anything by default to force users to understand what is from their file and what is derived information.
@richardjgowers as
>>> context = Context.from_files('martini.itp', 'martini_aa.itp',
...                              'martini_ions.itp', name='my_martini')
>>> u = mda.Universe('martini.gro', context='my_martini')
Under the hood, GROParser dumps all the available topology attributes into
If the user chooses not to load a new context, the result would wind up being the status quo with masses of 0.0 (assuming MARTINI particles aren't in
>>> u = mda.Universe('martini.gro', context='base')
Edit: I agree that masses are important and should be guessed, and this may also apply to elements -- but I think a way to easily guess using different information to the tables in guessers, and to easily overwrite the attribute with a guess that the user manually makes, would be really convenient.
So would it make sense for the context to do the guessing? i.e.
context = Context.from_files(...)
masses = context.guess_masses(topology)  # modifies topology in place?
Where the default
MARTINI particles all have a mass of either 72 (in the non-polarizable version) or 36 and 72 (in the polarizable version). The particles with a mass of 36 can easily be identified by their names.
for residue in protein.select_atoms('name D').residues:
    if residue.resname == 'LYS':
        residue.atoms.select_atoms('name Qd D').masses = 36
    elif residue.resname == 'THR':
        residue.atoms.select_atoms('name N0 D').masses = 36
I really like that. I would just advocate to ditch the name here, and to refer directly to the object. Just for the sake of reducing complexity.
It could! I left it as a classmethod because I like x.from(y), but it’s probably easier to subclass and modify as a Context method. However my method allows users to match values that are not in the topology by passing it in themselves, which I thought might matter in some way.
Sure. I would leave the name as an option just because it follows the topology format and because MDAnalysis might like to offer some default contexts.
This is not completely accurate. The "small" beads used (mostly) in rings have a mass of 45 amu. You need to know the topology of the residue to know which beads are small. Added to that, the upcoming version of the force field has 3 bead sizes with different masses and many more molecules. But it is just an example; other CG force fields do have different masses for each particle.
tl;dr: We should not guess anything by default. I'd like to take up what @richardjgowers said: "But maybe it's cleaner and more correct to not guess anything by default to force users to understand what is from their file and what is derived information."
In my opinion, this is not a "maybe". It may appear like a nice convenience feature if properties can be guessed automatically. However, there is no scientifically sound way to get around actually providing the correct numbers to obtain reliable data. Results obtained with the help of MDAnalysis are regularly published in scientific papers. Any such result must not rely on educated guesses but on facts. And guessing masses is not "deriving". It is guessing. After all, we cannot know for sure which masses/charges/whatever were used in the simulation unless the user provides this information.
@lilyminium can we close this issue with #3704 merged (and raise individual ones for specific guessers) or should this remain open as an omnibus issue?
I started writing this as a comment in #2553 but it got, wow, really long. Let me know if I should move it there to carry on the discussion, or to #598.
Is your feature request related to a problem?
I think any effort to work with element-dependent stuff like finding hydrogens from H-bond donors (#2521) will be hampered by the element/type guesser, which is not good (...noting that I wrote the current version). Workarounds like looking at mass only work if the mass is not derived from an incorrectly guessed element. Few file formats directly provide element information, so it needs to be guessed from other information. In practice elements are only guessed from atom names, leading to all sorts of fun results.
Here I round up a bit of what has been discussed in the past and (re-)propose how it could be improved. This is more of an initial idea to gather suggestions than a real solution. Please let me know what you think!
Summary
The problem of guessing elements (and topology attributes in general) has come up many times (#598, #942, #1808, #2331, #2348, #2364, #2553). Some non-comprehensive history of discussion and current state of affairs:
guess_masses to do it, we simply check if the input (an atom type or name) is an element. If not, the mass defaults to 0.0 and a warning is raised.
[guess_atom_mass(a.type) for a in ag], it passes the atom type to guess_atom_element(), which does some string fudging to see if subsets of the string might be an element. If an element is not found, it just returns the original input, but with symbols stripped. No warning is raised by guess_atom_element, but validate_atom_types does.
In #2553 there has been interesting discussion of how to guess elements appropriately, and what we can infer from them (e.g. element <-> mass is no longer so straightforward, given HMR systems).
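The guessing behaviour summarised above can be sketched roughly as follows. This is not the MDAnalysis implementation: the function names `guess_element_sketch` and `guess_mass_sketch`, the tiny `MASSES` table, and the "try shorter and shorter prefixes" rule are all assumptions made for illustration. It only aims to show why name-based guessing gives surprising results.

```python
import re
import warnings

# Tiny illustrative table (assumption); real tables cover far more elements.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "CA": 40.078}


def guess_element_sketch(atom_type):
    """Strip symbols and digits, then try progressively shorter prefixes."""
    stripped = re.sub(r"[^A-Za-z]", "", atom_type).upper()
    for end in range(len(stripped), 0, -1):
        if stripped[:end] in MASSES:
            return stripped[:end]
    # No element found: return the stripped input, without any warning.
    return stripped


def guess_mass_sketch(atom_type):
    """Look up the mass of the guessed element; default to 0.0 with a warning."""
    element = guess_element_sketch(atom_type)
    if element not in MASSES:
        warnings.warn(f"Failed to guess the mass for {atom_type!r}")
        return 0.0
    return MASSES[element]
```

In this sketch an alpha carbon named `CA` is matched as calcium, and a coarse-grained bead type such as `Qd` falls through to a mass of 0.0, which are the kinds of failure the issue is about.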
Describe the solution you'd like
We need a better element guesser (which then results in better guessing of related properties). With that solution, (imo) we should consider these:
And these would be nice to have:
Proposal
This is just @jbarnoud and @mnmelo's proposal but with more spitballing --
Add a Context class:
Basically a database for straightforward element-type-mass-name-radii-etc relationships. Something like a Pandas dataframe rather than dicts, to support looking up any attribute by any other attribute, e.g.
Should be easily subclassed or otherwise modified by users. Users should be able to add new attributes (columns) to the table, and attribute combinations (rows), with minimal fuss
Should be read/writable to file
Registered with the same metaclass trick as Parsers and Readers
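A minimal sketch of the table idea, assuming a plain list of dicts instead of a Pandas dataframe so that it runs with the standard library only. The class name `ContextSketch`, its methods, and the `HMR_H` row (including its mass) are hypothetical and only illustrate looking up any attribute from any other, adding rows, and round-tripping through a serialised form.

```python
import json


class ContextSketch:
    """Toy lookup table: each row relates elements, masses, types, and so on."""

    def __init__(self, rows):
        self.rows = [dict(row) for row in rows]

    def add_row(self, **attrs):
        """Add a new attribute combination (a row) to the table."""
        self.rows.append(attrs)

    def lookup(self, target, **known):
        """Return `target` from every row matching all `known` attributes."""
        return [row[target] for row in self.rows
                if target in row
                and all(row.get(key) == value for key, value in known.items())]

    def to_json(self):
        """Serialise the table so it could be written to a file."""
        return json.dumps(self.rows)

    @classmethod
    def from_json(cls, text):
        """Rebuild a table from its serialised form."""
        return cls(json.loads(text))


base = ContextSketch([
    {"element": "H", "mass": 1.008},
    {"element": "C", "mass": 12.011},
    {"element": "O", "mass": 15.999},
])
# Hypothetical row for a mass-repartitioned hydrogen (value chosen for illustration).
base.add_row(element="H", mass=3.024, type="HMR_H")
```

With this, `base.lookup("mass", element="C")` and `base.lookup("element", mass=15.999)` go through the same code path, which is the point of keeping the relationships in one table rather than in one dict per direction.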
Add more flexible guesser methods
These could either be part of the Context class, be their own class, or (my favourite) be class methods of TopologyAttr. I don't think bundling them with Context is so helpful because most of the CG/HMR issues can be solved by just changing the values in the table. This also means we don't have to validate categorical values like elements ourselves, we just look them up in the table.
API
An API that matches add_TopologyAttr would be convenient for the user:
and developers when working with Parsers, which all contribute different information:
Additional context
Below is an example implementation that could work well with that desired API. It is far from the real thing, just to show what I'm thinking.
Context class that contains the data and looks up close matches and stuff
able to look up any attribute from any combination of other attributes
can return close matches, again for any attribute, from any other attribute. e.g. get the element from a close match to the mass
would be nice: matching ranges of values (e.g. HE atom is H if mass is < 4)
may need different matching methods based on the type of the value passed in
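The close-match and range ideas in the list above could look something like this. It is a sketch under assumptions: `closest_match`, `guess_element`, the `ROWS` table, and the tolerance of 0.5 are invented for illustration and are not part of any existing API.

```python
ROWS = [
    {"element": "H", "mass": 1.008},
    {"element": "He", "mass": 4.0026},
    {"element": "C", "mass": 12.011},
]


def closest_match(rows, target, by, value, tolerance):
    """Return `target` from the row whose `by` value is nearest `value`.

    Returns None when even the nearest row is further away than `tolerance`.
    """
    candidates = [row for row in rows if by in row and target in row]
    if not candidates:
        return None
    best = min(candidates, key=lambda row: abs(row[by] - value))
    if abs(best[by] - value) <= tolerance:
        return best[target]
    return None


def guess_element(name, mass):
    """Apply the range rule first, then fall back to the nearest mass."""
    # Range rule from the list above: an atom named HE is H if its mass is < 4.
    if name.upper() == "HE" and mass < 4:
        return "H"
    return closest_match(ROWS, "element", "mass", mass, tolerance=0.5)
```

A numeric attribute such as mass needs a distance and a tolerance, whereas a string attribute such as a name would need a different notion of similarity, which is why the matching method may have to depend on the type of the value.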
Example guessing method for Elements
This is actually pretty general and I guess there could be some base method where you pass in match_exact and match_similar. It takes both instantiated TopologyAttr objects and keyword-named arrays, and returns an Element instance.
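A rough sketch of such a base method, assuming plain Python lists stand in for both TopologyAttr objects and keyword-named arrays, and that it returns a list rather than an Element instance. The names `guess_attr`, `_guess_one`, `TABLE`, and the `tolerance` parameter are hypothetical; only `match_exact` and `match_similar` come from the description above.

```python
TABLE = [
    {"name": "CA", "element": "C", "mass": 12.011},
    {"name": "OW", "element": "O", "mass": 15.999},
    {"name": "HW1", "element": "H", "mass": 1.008},
]


def guess_attr(table, target, match_exact=None, match_similar=None, tolerance=0.5):
    """Guess `target` per atom: exact lookups first, nearest numeric match second."""
    match_exact = match_exact or {}
    match_similar = match_similar or {}
    arrays = list(match_exact.values()) + list(match_similar.values())
    n_atoms = len(arrays[0]) if arrays else 0
    return [_guess_one(table, target, i, match_exact, match_similar, tolerance)
            for i in range(n_atoms)]


def _guess_one(table, target, i, match_exact, match_similar, tolerance):
    # 1. Exact match on every attribute the caller supplied.
    for row in table:
        if match_exact and all(row.get(k) == v[i] for k, v in match_exact.items()):
            return row[target]
    # 2. Otherwise fall back to the nearest numeric value within the tolerance.
    for attr, values in match_similar.items():
        rows = [row for row in table if attr in row]
        if rows:
            best = min(rows, key=lambda row: abs(row[attr] - values[i]))
            if abs(best[attr] - values[i]) <= tolerance:
                return best[target]
    # 3. Nothing matched: leave the value unset rather than inventing one.
    return None
```

For example, `guess_attr(TABLE, "element", match_exact={"name": ["CA", "XX"]}, match_similar={"mass": [40.078, 16.0]})` resolves the first atom by name and the second by mass. Letting the exact match win over the similar one is a design choice of this sketch, not something the proposal prescribes.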