-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to use information in topology files #942
Comments
Since you mention guesses, I link #507 for the record. |
In general I like being able to work with reasonable automatic guesses if
|
I think I'm in very much favour of 2. If someone provides a file full of information we should make it all available. Some stuff might not work, but as long as this is properly documented (ie what you need to provide for each calculation) it should be fine. If you only provide an XYZ as your topology, you can't go crying when Resname based stuff doesn't work! WRT masses, I think we've removed that currently in 363 just because we've not implemented guessers yet. I'd happily backtrack on this because they can be guessed with a fair degree of confidence |
But we should fail with an error message saying that this isn't available information in the chosen topology instead of just throwing an attribute error. Another thing is this makes code fragile that depends on MDAnalysis. We should have a defined set of values that have to exist independent of the topology and a set of values that are guessed when non are found. This will help a lot for people who write their own libraries on top of MDAnalysis. |
So we could hack into the |
I think we should start a wiki page where we document names and what default behavior we expect from them. A wiki page might be easier to handle as an overview than a long issue thread. The discussion can still happen here. |
In #598 I'm working on implementing the guessers for #363. Some decisions must be made at this point, even after choosing option 2. above. I need feedback for the following:
|
I guess a good starting point is to try and follow the previous api. So all attributes that had to exist before are handled gracefully when they do not exist. This can probably get done through hacking into Eg if I try and access WRT masses, if these are guessed by default, they should still raise some sort of warning on Universe creation ( Bonds/Angles/Dihedrals are kind of a special case as they don't belong to the Atoms in the same way as masses, so they deserve their own first class position like |
I don't agree with the I'd propose to let the user decide whether to guess bonds or not at |
Anyway, summarizing this thread: attributes that are absent from the topology at Universe creation time can fall in one of four categories:
Let's then decide which attributes go into each category? What's the best way to gather a consensus vote for this, a shared Google Docs spreadsheet or @kain88-de's suggestion for a Wiki page? |
I guess make a wiki page so it's all on the repo still |
There, I made a wiki page at https://github.com/MDAnalysis/mdanalysis/wiki/Topology-attribute-handling |
My rationale for the classification I already added was to have a few attributes with VIP status, like mass, charge, and element, which we can expect in a format-independent manner. These are either guessed or left missing if guessing fails. Bond guessing also follows this. Most other attributes can either be safely set to defaults, or, for the more format-specific ones, left missing. |
Thanks for the page @mnmelo. I added my categories and and also added a Comments heading below. Maybe paste your comment regarding your rationale into the comments section, too? Btw, tempfactors and bfactors are the same thing. I'd keep bfactors because this is what we used in the past.
Oliver Beckstein * [email protected] |
@mnmelo can you shepherd the discussion towards a conclusion? |
Sure! I was waiting for @jbarnoud's input, but he just told me that he's somewhere in between my and @richardjgowers' opinion, so let's leave him as an advisor in case we can't reach a consensus in some cases. I appreciate your opinions, and patience reading this relatively long post. As a summary: there is mostly agreement on how to proceed with missing attributes, but there's also some divergent points. As a first approach I assigned the categories democratically, noting the minority reports when appropriate (see below). A number of points might warrant some attribute reassignment:
Category assignmentHere's the 3-vote democratic category assignment on how to handle missing attributes (see the wiki page for category reference; not all categories were used). Interestingly, @orbeckst always sided with the majority: 1. Left out of the Topology, with a general
|
Ok I'll try and follow this as a squash more tests. |
@mnmelo I'm in agreement with this summary. Thanks for taking the time to compile this in a straightforward way. I think the business of figuring out how missing attributes should be handled is really quite fuzzy, but this is very clear despite that. |
Great! I may fall with the minority reports for 'types' and 'radii', but they are the two attributes I care the less about so you can safely ignore me. I have two questions, though:
|
So, this is what I meant in point 2 above, and I need your feedback to decide: either we make I have also given some thought to your other question. About |
Any guessed attributes will have to issue a warning on Universe creation. I guess for debug purposes I could also add the fact that the attributes were guessed into the I'm not sure on failed guesses, you could set them to a default (ie charges = 0.0?) but I prefer getting an actual warnings.warn("Trying to guess masses...")
try:
masses = guess_masses(atoms)
except NoDataError:
warnings.warn("Couldn't guess masses, sorry")
else:
attrs.append(masses) # masses only enter Topology if guessed Ultimately there'll have to be a bit of defensive programming at the start of analysis to check that all the required attributes exist.. eg def my_amazing_mass_thing(atoms):
try:
masses = atoms.masses
except NoDataError:
raise ValueError("Sorry, I need masses") We could even make decorators that do this |
If a guess fails, I expect to get a NoDataError, default dummy values are utterly misleading. If an attribute on category 2 fails to be guessed, I would like the error to reflect it. My concern is the poor user that get the MissingTopologyAttributeError, run the guesser as suggested, only to get the MissingTopologyAttributeError again.
If the masses remain in category 5, then Something like I like @richardjgowers decorator. |
And what's your take on lazy guessing? That way we only guess when accessed (it avoids spurious warnings on Universe loading about attributes that have nothing to do with the analysis at hand). |
I like lazy guessing! It makes it even more interesting to know before hand if an attribute exists, though. If I know bond guessing can be slow and I can do without it if needed, then I want to check bonds are there before I trigger the guesser unexpectedly. |
@jbarnoud: The idea is that failing guessers raise warnings. This way the poor user will ave some more feedback. Regarding cat. 1 vs. cat 2. behavior: cat. 2 are VIP attributes that we decided are always present. Cat. 1 attributes ( We can think of an attribute state as well, which would solve your knowing-if-there-are-bonds-beforehand problem: u = mda.Universe(top, traj)
# This just tells you whether bonds exist and won't blow up with a
# MissingTopologyAttributeError in your face if they don't.
if u.topology.bonds:
...
# If there are no bonds I'm not sure whether this should return False
# or raise an AttributeError.
if u.topology.bonds.guessed:
...
# Non VIP attributes need a different approach:
try:
attr = u.topology.chocolate_propensity
except AttributeError:
...
#or
if hasattr(u.topology, 'chocolate_propensity'):
... What do you think? |
Also, to help the end user in the non-VIP cases, we can have a check in
and
Looks fancy but all that's really needed is a table linking formats and non-VIP attribute names. |
Hopefully it will be enough to prevent a poor user entering an infinite loop. Your fancy exception are more inline with what I had in mind.
I guess you are answering to:
My point is that keeping a list of the attribute that are read from the topology and of those that are guessed would be an easy way to see what happens with the topology, independently from the categories. Dealing with exceptions will of course have to deal with the categories. |
Ok, I guess that's possible: a list for what's from the topology file and a list for what's guessed. Technically, we can have attributes from the topology that we later override by guessing/setting by hand. Should this be in both lists? Or should the lists be slightly different? As in: a list for every available attribute (irrespective of how it was added to the topology) and a list for which of those were guessed. |
|
It has come up in #941 and #815 that we do have a problem with abstracting the away the differences between topology formats because they save different information. There are in two types of problems
1. Set of common properties
In #815 I noticed that I right now write code under the assumption that a field like
masses
is always available even if they are not specified in the topology (eg PDB). While the new topology system only provides data actually present in the topology.2. properties specific to a format
In #941 @jbarnoud noted that he would like to make all information in a topology available to the user. This means going away from being completely format independent.
To me this suggests that we need to make a list of properties that are always available independent of the format and a special property under which we can add format specific properties like so:
ag.special.format_properties
.What do others think for this? Should we have properties like
masses, charges
always available with a guess function for some format?The text was updated successfully, but these errors were encountered: