Adding support for bonds #87
Comments
Maybe those are two different discussions? Start a separate thread for the trajectories? (Btw, I have only ever needed to treat trajectories as a list, e.g. a Vector of structures.) |
Re bonds : that's usually information stored in a neighbourlist. Are you asking how to cache the neighbourlist? I see no barrier to doing this right now. |
@cortner I think the question is about adding functions to the interface for specifying bonds, which could be useful in a number of interoperability scenarios including visualization, but also for things like constructing molecular graphs for ML applications, etc. |
Also, a neighbor list is a bit different from bonds, since one has to do with spatial proximity and the other with specific chemical interactions (and, in the molecular simulation context, often geometric constraints). |
Yes, I'm more after adding functions to the interface to standardize bonds in AtomsBase. |
To my mind this sort of interface necessarily involves neighbourlists. I thought it is the responsibility of the neighbourlist to provide lists of bonds - ideally lazily since anything beyond pair bonds would scale very badly. But if that's not how people commonly think about it, that's fine of course. The op talks about "potential ways to store bond information" hence I asked about caching. If one wants to repeatedly access the list of bonds and if that list is provided via a neighbourlist, then I think this is related. |
In molecular mechanics terms the bonds are defined during setup and fixed throughout the whole simulation. The question is whether we should represent that information (a graph, effectively) in AtomsBase. The neighbour list calculates close pairs from all possible atom pairs and is used to calculate non-bonded interactions (Lennard-Jones and Coulomb). There is the complexity that pre-defined bonds are important in this context, since atoms linked by one or two bonds typically have their non-bonded interactions excluded and atoms linked by three bonds typically have their non-bonded interactions down-weighted (huge hacks). |
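To make the exclusion rule above concrete, here is a minimal sketch that classifies atom pairs by how many bonds separate them, starting from a fixed bond list. The names `bond_separations` and `nonbonded_weight` are hypothetical and not part of AtomsBase, and the 0.5 scaling for pairs three bonds apart is only a placeholder, since each force field picks its own value.

```julia
# Hypothetical helper, not part of AtomsBase: classify atom pairs by the number
# of bonds separating them, given a fixed list of bonded index pairs.
function bond_separations(natoms::Int, bonds::Vector{Tuple{Int,Int}}; maxdepth::Int = 3)
    # Adjacency list built once from the static bond list.
    neighbors = [Int[] for _ in 1:natoms]
    for (i, j) in bonds
        push!(neighbors[i], j)
        push!(neighbors[j], i)
    end

    separations = Dict{Tuple{Int,Int},Int}()
    for start in 1:natoms
        # Breadth-first search out to `maxdepth` bonds from `start`.
        dist = Dict(start => 0)
        frontier = [start]
        for depth in 1:maxdepth
            next = Int[]
            for a in frontier, b in neighbors[a]
                if !haskey(dist, b)
                    dist[b] = depth
                    push!(next, b)
                end
            end
            frontier = next
        end
        for (atom, d) in dist
            # Store each unordered pair once, with the smaller index first.
            if atom > start
                separations[(start, atom)] = d
            end
        end
    end
    return separations
end

# Illustrative weights only: pairs one or two bonds apart are excluded,
# pairs three bonds apart are scaled. The 0.5 factor is a placeholder.
nonbonded_weight(nbonds::Int) = nbonds <= 2 ? 0.0 : (nbonds == 3 ? 0.5 : 1.0)

# Linear chain of five atoms: 1-2-3-4-5
chain = [(1, 2), (2, 3), (3, 4), (4, 5)]
seps = bond_separations(5, chain)
```

Pairs further apart than `maxdepth` bonds are simply absent from the result, so a caller would treat them as ordinary non-bonded pairs with full weight.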
Thank you for clarifying this for me. If that's the concrete context, I still think this is something to be calculated in a separate object and then stored somewhere. But why in a structure and not in the calculator? I can see arguments for both. Anyhow, any object can already be stored? The calculator then needs to know how to find it. If there really is a need for an interface (which I don't see yet) then I think this is actually not unrelated to #84 and #86 --- i.e. providing interfaces for calculating things. |
The "standard" way of doing molecular mechanics is that you give an initial structure. Then you set up the topology for the system. "Topology" here means that you give the bonding information for the system. These bonds are not broken during the simulation, and (usually) harmonic forces are set up for the bonds. So the molecular mechanical force field has two parts: one that comes from the "topology", and a dispersion/Coulomb part that is added with the help of a pair-list calculation. The question here seems to be: should we have some standard way of implementing topology for the system? I would say that a separate structure might be the way to go. The issue I see here is that most file formats don't support topology information, and if we have it in System structures it could make saving/loading difficult. But we could make an abstract structure
Chatted with @ejmeitz a bit more about this today and we may put together a PR for a function to return bond information. A few questions for discussion and our preliminary answers:
Ethan's primary interest right now is in using this for visualization purposes, but I could imagine it being useful for building graphs for ML things, defining potentials for Molly in other packages, etc. @jgreener64, do you have any opinions on this? More as a note to self, things to add in PR once these decisions are made:
If nobody has strong opinions (or at least nobody is vehemently opposed to this existing 🤪), we'll probably just make a prototype in a PR sometime soon and move further discussion there. |
I was also wondering if it was important to add some way to differentiate between types of bonds (at a minimum double/triple bonds). |
No strong opinions on this, I guess it could be nice in some situations, it could go in AtomsBase.jl provided that sensible defaults are defined so existing systems don't need to change. List of bonds seems best since an adjacency matrix would want to be sparse to avoid N^2 scaling, and sparse matrices are basically lists of indices anyway. Maybe |
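As a rough illustration of the point that a sparse adjacency matrix and a list of bonds carry the same information, the sketch below converts between the two using the `SparseArrays` standard library. The helper names `adjacency` and `bond_list` are made up for this example and are not an AtomsBase API.

```julia
using SparseArrays

# Hypothetical helper: build a symmetric sparse adjacency matrix from a bond list.
function adjacency(natoms::Int, bonds::Vector{Tuple{Int,Int}})
    rows = vcat(first.(bonds), last.(bonds))
    cols = vcat(last.(bonds), first.(bonds))
    return sparse(rows, cols, fill(true, length(rows)), natoms, natoms)
end

# Hypothetical helper: recover the bond list from the stored (row, column) indices,
# keeping each unordered pair once.
function bond_list(A::SparseMatrixCSC)
    I, J, _ = findnz(A)
    return [(i, j) for (i, j) in zip(I, J) if i < j]
end

bondlist = [(1, 2), (2, 3)]
A = adjacency(3, bondlist)
```

Storage grows with the number of bonds rather than with N^2, and the round trip through `findnz` shows that the matrix is essentially the index list in another layout.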
Despite resistance to my comments above - I still think a list of bonds and a neighbourlist are very closely related objects (connectivity). It would be great to give some thought to a general interface. This shouldn't prevent experimentation of course. Regarding the questions above:
I think this is all about structure, so AtomsBase is fine for me.
an iterator I think. One iterator for pairs, one for triplets and so forth. I'm thinking of
    compute_bonds!(structure, ...)
    for (i, j, ...) in pairs(structure)
        # do something
    end
    for (i, j1, j2, ...) in nclusters{3}(structure, ...)
        # do something else
    end
Of course they can always be collected and there could be convenience wrappers for that, which can also be overloaded. But with iterators you don't presume how the list is stored.
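One way to read the iterator suggestion above is sketched below: a thin wrapper that forwards iteration to whatever container holds the pairs. The type name `BondedPairs` is an assumption made for this example; nothing like it is defined in AtomsBase, and the pseudocode above does not prescribe this design.

```julia
# Hypothetical wrapper type; the name `BondedPairs` is an assumption.
struct BondedPairs{T}
    storage::T   # any iterable whose elements are (i, j) index pairs
end

# Forward iteration to the underlying storage so callers never see its layout.
Base.iterate(bp::BondedPairs, state...) = iterate(bp.storage, state...)
Base.length(bp::BondedPairs) = length(bp.storage)
Base.eltype(::Type{BondedPairs{T}}) where {T} = eltype(T)

# Two different storage choices behind the same interface:
# an eagerly stored vector and a lazily evaluated generator.
from_vector = BondedPairs([(1, 2), (1, 3)])
from_generator = BondedPairs(((1, j) for j in 2:3))

# Callers iterate the same way in both cases, or collect when they need a list.
partners = [j for (i, j) in from_generator]
```

Because only `iterate`, `length`, and `eltype` are defined, the caller cannot depend on how the pairs are stored, which is the property argued for above.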
This is not clear to me and highly application specific. I think some thought should go into designing this. see above.
the problem with a single call to all possible bond types is that it will necessarily be type-unstable. |
Why can't this be stored in the meta-data of the existing systems? If there are no bonds stored in the meta-data then the call to |
PLEASE PLEASE do not re-implement the neighbourlist for the 5th time? Can we rather talk about integrating one or all of the existing ones? |
I agree that bonds and neighbor lists are related conceptually, but I think the way they are used in actual simulation (and also in visualization, which is the particular use case Ethan is working to implement here) is quite different. As a few examples (which largely summarize comments already made earlier in this thread):
We could certainly store either piece of information in metadata (and I think it's a good idea to include that as an example and/or in the tests, since the interface is of course agnostic to where/how the information is actually stored in the object), but I think it's an important enough piece of information in certain contexts that having its own function (and hence standardized format) to access it is justified. @ejmeitz, maybe you could point folks to your visualization stuff to make clear why this would be nice at least in this particular use case? |
Not if we have a sensible fallback default implementation that just says "there are no bonds" (as described in my earlier comment). |
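A minimal sketch of what such a fallback could look like, using stand-in types rather than the real AtomsBase ones. The accessor name `bonds` and the types `AbstractToySystem`, `PlainSystem`, and `BondedSystem` are all assumptions for illustration.

```julia
# Stand-ins for illustration only; AtomsBase's real types are not used here.
abstract type AbstractToySystem end

struct PlainSystem <: AbstractToySystem
    natoms::Int
end

struct BondedSystem <: AbstractToySystem
    natoms::Int
    bondlist::Vector{Tuple{Int,Int}}
end

# Generic fallback: a system that stores no bond information reports none.
bonds(::AbstractToySystem) = Tuple{Int,Int}[]

# Systems that do carry a topology overload the accessor.
bonds(sys::BondedSystem) = sys.bondlist

water = BondedSystem(3, [(1, 2), (1, 3)])
argon = PlainSystem(1)
```

With this arrangement existing system types need no changes: they inherit the empty answer, and only types that actually know their bonds add a method.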
That is a fair point. If we do want to support designating bond types (as opposed to just the presence of a bond via a pair of indices), we should talk about how to deal with that. One option is of course to set up an initial implementation that doesn't support this and punt the discussion to later...or just a limited specification like what Joe suggested above (i.e. just a number that indicates bond order). |
That doesn't change the fact that the fundamental "thing" that is a list of bonds remains the same in those use-cases. It's only how they are used that differs. I therefore disagree that they are different objects and should be treated differently. |
The justification for a separate But to my mind, a Sorry if I missed this in the long thread above. |
then you have only pairs. Sure, that's fine. Except don't use |
We are not talking about writing new functionality, but about how to store such a list. A list of pairs is exactly the default output from a neighbourlist. |
I'm not trying to be contrary for fun here and as soon as I see a convincing argument that a list of bonded pairs is fundamentally something different from a neighbourlist, I'll back off. So far I don't see it. |
We can use an enum for the type of the bond then to make it isbits. What bonds aren't between pairs of atoms? Like if I have a methane molecule that information is almost always stored as 4 separate bonds in MD packages. Yeah a list of pairs is the default neighbor list but not all neighbors in the neighbor list are literally bonded to each other. They're just near each other spatially. A bond list is a more specific piece of information than a neighbor list. |
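For what it's worth, the enum idea can be sketched as follows. The names `BondOrder` and `Bond` and the particular set of enum values are assumptions for this example, not an agreed interface.

```julia
# Hypothetical bond-order enum backed by a single byte.
@enum BondOrder::UInt8 SingleBond = 1 DoubleBond = 2 TripleBond = 3

# Hypothetical pair-bond record: two atom indices plus the bond order.
struct Bond
    i::Int
    j::Int
    order::BondOrder
end

# Methane stored as four separate C-H bonds, with carbon as atom 1.
methane = [Bond(1, k, SingleBond) for k in 2:5]

# An immutable struct whose fields are all plain bits is itself a bits type,
# so a Vector{Bond} is stored inline without per-element allocation.
bond_is_bits = isbitstype(Bond)
```

Since every element has the same concrete type, iterating over such a list stays type-stable even when bonds of different orders are mixed.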
They are the same (bar extra data) in that they represent connectivity, but different in that a The key difference when implementing is that bonds are computed once at the start whereas neighbours are re-computed during simulation. This puts more restraints on the internal details for the neighbour list since the data structures have to be updated whilst considering performance. It would be nice to have a unified interface for accessing pairs of connected entities in molecular objects, which would encompass both. It will be harder to get consensus for neighbour lists though, since a performant interface may put requirements on the internal storage. |
yes, exactly.
that's what I'm trying to achieve
I do not want consistent use of internal storage; everybody should do what they want. Hence the suggestion to use an iterator interface. But that's a weak suggestion and should be thought about more carefully. |
Generally speaking, |
One last comment from me:
First, I'm not sure it really matters. It's just two different neighbour lists for the same system. Secondly, there are also dynamic neighbourlists that are adaptive to the various properties of the system such as chemical species. |
One question I haven't seen a compelling answer to as yet (and I think is important to think about to guard against premature optimization) is: what is the actual use case of a generic neighbor list interface? The only use case of neighbor lists I'm aware of is internal to an MD simulation, where it doesn't need to be passed around to other codes anyway. I suppose in constructing graphs for ML applications one might make use of neighbor lists but given that a neighbor list is not an entirely unambiguous quantity anyway (i.e. there are "hyperparameters" such as cutoff distance, whether to consider PBC's, etc.), they would likely be manually recomputed in such a situation anyway. Put another way, I think defining a set of bonds is both:
These two points, to me, justify bonds being considered "special" in this sense. As I said before, I do agree that these are similar objects in the sense of the shape of the data, which I think is @cortner's core point. In a certain sense, I think we've been talking past each other a bit from a terminology perspective, because I think Christoph is thinking about these things purely from a data structures perspective, whereas some of us are also incorporating our physical understanding of what these ideas actually mean (e.g. sharing electrons vs. just being near each other in space). Obviously, in the sense that these are both "lists of pairs of atoms," that distinction perhaps shouldn't matter, but my feeling is still that what they're actually used for in practice (i.e. what we do with that data) is sufficiently different as to consider them to be different things. Of course, if we do decide that we want the interface to support both, I do agree that it should function similarly for both kinds of information. I'm just not currently convinced that AtomsBase needs a function for neighbor lists at all. Sorry, this one ended up a bit longer than I imagined. 😅 |
sure - there is no rush.
Swapping out different nlist implementations for different simulation scenarios such as large-scale, small-scale, MIC, no MIC, different architectures, performance, and so forth.
No, in fact the point is that there can be many different data structures to represent neighbours / topology / bonds / whatever you want to call them. I am saying that conceptually they are the same objects. I don't think we've been talking past each other. I am just challenging some of the assumptions made in this thread, which, as Teemu educated me, stem from the fact that I'm the only non-chemist in this room. |
Just wanted to start a discussion about potential ways to store bond information in the current AtomsBase object as well as how trajectories could be standardized.
- Trajectories: I'm a little less certain how to include these, as they are dynamic and AtomsBase fundamentally is not. That said, a lot of analysis functions act on atomic trajectories rather than systems, and it could be nice to unify trajectory formats as well.