Autodiff the graph building for "forces" #10
What would we need for that? Which functions would need differentiating?
I haven't thought it out in a ton of detail yet, but loosely here's the concept: interatomic forces come from derivatives of potential energy with respect to atomic separation. Those same atomic separations are what we use to build the adjacency matrices of the graphs (edge weights set by some decaying function of the distance). So if we had a model that took in a graph and output a total energy, and you could differentiate all the way through, in principle you could get forces (whether they'd be any good without forces as part of the training is a separate, though interesting, question).

To do this, the main thing we'd need would be to differentiate through the different weight computation functions. Then, those adjacency matrices go into building the graph Laplacians, which influence how the convolutional layers work, but since those are already inside Flux, I assume that differentiation shouldn't be too hard? Though the Laplacians aren't trainable parameters, so IDK if that throws any kind of wrench in the works syntactically.

If you'd be interested in diving into this in more detail, I think it could be really cool, and potentially super impactful if it turns out to work well! We'd have to figure out what a good first test case would be (presumably some snapshots from DFT relaxation trajectories), but @venkvis might have some good ideas...
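To make that chain of derivatives concrete, here is a rough, self-contained sketch of differentiating straight through the graph construction with Zygote. Everything in it is invented for illustration (the Gaussian weight function, the toy energy, and all names); it is not the package's actual API.

```julia
using Zygote

# Gaussian-decay edge weights from pairwise distances; a stand-in for whatever
# decaying weight function the real graph builder uses. Self-edges aren't
# masked here, since their weight is constant and doesn't affect the gradient.
function edge_weights(positions; σ = 1.0)          # positions: 3×N Cartesian coords
    sq = sum(positions .^ 2; dims = 1)             # 1×N squared norms
    d2 = max.(sq' .+ sq .- 2 .* (positions' * positions), 0.0)  # N×N squared distances
    return exp.(-d2 ./ σ^2)                        # N×N weighted adjacency matrix
end

# Toy "energy model": any differentiable function of the weighted adjacency
# matrix. A real graph network acting on the Laplacian would slot in here.
toy_energy(W) = sum(W .^ 2)

energy(positions) = toy_energy(edge_weights(positions))

positions = rand(3, 5)
forces = -Zygote.gradient(energy, positions)[1]    # 3×N array, F = -∂E/∂r
```

The point is just that nothing in the chain positions → distances → weights → energy needs to be opaque to the AD system.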
Absolutely! Sounds like a good idea. I would imagine that the forces calculated would still need to be part of a learning scheme with some trainable parameters.
Have been staring at this and thinking for a bit before I make my new branch too messy...I think the shortest set of steps to get a prototype of this working is:
A bigger question is how to actually integrate this stuff into the API of ChemistryFeaturization as it currently exists. There are questions about incorporating the kind of "backend" architecture we were talking about that come into this, but also things that could be really helped along by the abstract interface discussions that have come out of the BoF session. Hopefully we'll have a basic prototype of that by the end of the month... for now, I think doing this fast and somewhat hacky is probably okay just to get a proof of concept...
Wouldn't this be required all the same, regardless of (1) and (2), since (2) should ideally produce the same outputs and
I think before actually building a hacky proof of concept, it might be a good idea to take a step back and try to ascertain the role ChemistryFeaturization should play here. A few questions that come to mind which I think may be worth thinking about in this context are:
Dhairya mentioned I could also possibly be misunderstanding what either of you has in mind when you say backend, and if I am, please correct me. 😅
Trying to answer more or less in order...
Yes? I'm not sure exactly what the question is, as it feels like you're asking me whether I think I need to do the thing that I said I needed to do. 😛
These are exactly the right kinds of questions. I would hope it would (eventually) be a one-stop shop as you say, but the "specific domain" would be featurization for ML models, and obviously ML is only one part of the broader ecosystem, which will also have a big focus on simulations like DFT and MD, for which the notion of "featurization" is not really needed... or at least not in the same way; there are obviously choices to make regarding basis sets, energy cutoffs and other parameters, (pseudo)potentials, etc. I think those things are well outside the domain of what I'm hoping to do with ChemistryFeaturization and Chemellia, at least right now.

My hope would be that the abstract types we define are fairly universally applicable, and then we either purpose-build or perhaps even use a more generally applicable concrete type for our own structure representations that we featurize for ML.

As for the backend question, I don't necessarily have a clear answer of how it should look, but I'll share a few thoughts:
Yet another somewhat separate question (but related to the abstract structure API stuff) is one of data structure for this stuff going forward. I'm thinking more and more that it probably makes sense to (at least have the option to) attach full structural information to any ML representation (e.g. an AtomGraph). This would allow compatibility with things like this autodiff scheme, but also featurization by rdkit functions, provided we can convert our own structural representation into an rdkit one relatively easily. The other option for these kinds of featurizations would be to "start from scratch" (i.e. from a structure file), which for just a SMILES string or something may not be a big deal, but seems rather inelegant as a long-term practice.

Tagging @cortner in case he has any thoughts on this, as some of this is relevant both to our conversation just now as well as the ongoing ones about structure representations. No pressure to read all this, Christoph, just in case you're interested!
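For what it's worth, here is a hypothetical sketch of what "attaching full structural information" to an AtomGraph-like type could look like. The type name, fields, and layout are all invented for illustration; this is not the real AtomGraph definition.

```julia
# Purely illustrative; not ChemistryFeaturization's actual API.
struct AtomGraphWithStructure{T<:Real}
    laplacian::Matrix{T}                   # normalized graph Laplacian used by the conv layers
    features::Matrix{T}                    # featurized node (atom) matrix
    elements::Vector{String}               # atomic species, one per node
    # full structural info carried alongside the ML representation, so that
    # autodiff w.r.t. positions or conversion to e.g. an rdkit object stays possible
    positions::Union{Nothing, Matrix{T}}   # 3×N Cartesian coordinates, if available
    cell::Union{Nothing, Matrix{T}}        # 3×3 lattice vectors for periodic systems
end

# Convenience constructor for when no structural information is attached.
AtomGraphWithStructure(lapl, feats, els) =
    AtomGraphWithStructure(lapl, feats, els, nothing, nothing)
```

The design point is just that the structural fields are optional, so existing graph-only workflows don't have to carry them.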
I would like that. I'm having a discussion about it with my student tomorrow. His first reaction was he'd like to just start building the model but eventually put ChemistryFeaturization on top of it (and independently maybe supply our model as a layer for your ecosystem as well...)
Once you have forces you can do force matching, which will vastly improve your fit accuracy and in particular generalisation. It is much harder to overfit forces than energies. (Some researchers do just force-matching and forget about the energy altogether... "force domain learning" or something like that...)
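To make the force-matching idea concrete, here is a rough sketch of a combined energy + force loss. All names are illustrative: `model_energy(positions, ps)` stands in for whatever differentiable positions → energy pipeline comes out of the steps above, and the reference values would come from e.g. DFT snapshots.

```julia
using Zygote

# model_energy(positions, ps) is assumed to be differentiable in both the atomic
# positions and the trainable parameters ps (purely hypothetical here).
function energy_force_loss(ps, positions, E_ref, F_ref; w_E = 1.0, w_F = 10.0)
    E_pred = model_energy(positions, ps)
    # predicted forces: negative gradient of the predicted energy w.r.t. positions
    F_pred = -Zygote.gradient(r -> model_energy(r, ps), positions)[1]
    return w_E * (E_pred - E_ref)^2 + w_F * sum(abs2, F_pred .- F_ref)
end
```

One caveat: training on such a loss means differentiating something that already contains a gradient with respect to ps, i.e. nested/second-order AD, which Zygote can do but tends to be the fiddliest part in practice.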
Made some progress with the graph building functions with some handy adjoints. The gradients are still incorrect, but I have to do a little bit of math to fix that.

```julia
julia> gradient(collect(1:10), collect(1:10), rand(10)) do i, j, dist
           sum(GraphBuilding.weights_cutoff(i, j, dist))
       end
([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], [0.21753773508968766, 0.2952598053193285, 0.28951826203415876, 0.7383348139259314, 0.7245524645420596, 0.7866558412828093, 0.6683988349998842, 0.6922819179751303, 0.3047111113015635, 0.19151690048933245])
```
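In case it helps for comparing notes, here is roughly what hand-writing such an adjoint can look like, using a made-up linear cutoff in place of the real `GraphBuilding.weights_cutoff` (whose actual form, and hence the correct pullback, is exactly the "little bit of math" mentioned above). Everything here is a sketch, not the package's code.

```julia
using Zygote

const RC = 2.0   # arbitrary cutoff radius for this sketch

# edge weights for edge list (i[k], j[k]) with distances dist[k]: linear decay to RC
simple_weights_cutoff(i, j, dist) = max.(1 .- dist ./ RC, 0.0)

Zygote.@adjoint function simple_weights_cutoff(i, j, dist)
    w = simple_weights_cutoff(i, j, dist)
    function pullback(Δ)
        # the index vectors are non-differentiable; only the distances get a
        # gradient, and only inside the cutoff where the weight is nonzero
        ddist = @. -Δ * (dist < RC) / RC
        return (nothing, nothing, ddist)
    end
    return w, pullback
end

# mirrors the REPL call above
gradient((i, j, d) -> sum(simple_weights_cutoff(i, j, d)),
         collect(1:10), collect(1:10), rand(10))
```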
What I meant was, (1), (2), (3) need not necessarily be the order in which we do these things, as (3) doesn't really need to worry about how (1) and (2) are implemented, because whatever is passed for performing (3) remains the same regardless.
Ah, I see. I suppose that's true, but in practice I'll do them in that order, because currently (1) and (2) are done in one function (though without using Xtals.jl), so that general logic is all there already and just needs to be modularized further.
Came up in a group meeting discussion. Could be a neat idea to try to differentiate with respect to graph weights and see if you can get something like a force by propagating that through a pretrained model...