Particle Mesh Ewald #103
Conversation
Would it be possible to implement the reciprocal space energy calculations in Python/PyTorch as well? It would be easier to test things like forces by taking advantage of automatic differentiation (as opposed to just comparing against hard-coded floats).
You could try, but I couldn't think of any way to implement it that wouldn't be horribly inefficient. For example, you either end up with a loop in Python to spread charges from atoms one at a time, or you write a vectorized version that writes to every grid point for every atom. Can you think of a better way? Letting PyTorch compute derivatives through generic backpropagation is also likely to be extremely expensive: the FFT means that every point in the reciprocal space grid depends on every point in the real space grid. What sorts of tests would you want to do?
This is ready for review. I originally hoped to implement the direct space calculation in Python, but I couldn't figure out any way to do it that was compatible with TorchScript. The implementation still isn't especially fast, since it just uses the NNPOps neighbor list structure, which isn't well suited to GPU computation, but it should be plenty fast enough for our purposes. The reciprocal space calculation is based on the implementation in OpenMM. I simplified it a bit, removing optimizations that add a lot of complexity for only modest speedup, but the performance should still be pretty good. If someone can double-check my values for Coulomb's constant, that would be helpful!
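For anyone double-checking the constant, here is a quick independent recomputation from CODATA values, assuming the module uses OpenMM-style units (kJ/mol, nm, elementary charges):

```python
from scipy import constants as c

# Coulomb's constant 1/(4*pi*eps0) times e^2 and Avogadro's number,
# converted from J*m/mol to kJ/(mol*nm).
coulomb = c.e**2 * c.Avogadro / (4 * c.pi * c.epsilon_0) * 1e9 / 1e3
print(coulomb)  # ~138.935456, matching OpenMM's ONE_4PI_EPS0
```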
```python
            m[i] = (m[(i-1+ndata)%ndata]+m[(i+1)%ndata])*0.5
        self.moduli.append(m)

    def compute_direct(self, positions: torch.Tensor, charges: torch.Tensor, cutoff: float, box_vectors: torch.Tensor, max_num_pairs: int = -1):
```
The grid size is specified in the constructor, I guess for performance reasons. Does it really make sense, then, to let the box size change at every forward call? Grid size and box size are kind of tied to each other, right?
This is how pretty much all MD codes work. In an NPT simulation, the box size can change in each step, though usually only within a narrow range. But you would need to be very careful about changing the grid dimensions. That would cause a discontinuous change in the energy that would be hard to account for in analysis.
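For reference, grid size and box size are usually tied together through a heuristic like the one OpenMM documents; since this PR takes the grid size in the constructor, the choice is left to the caller. A sketch (the function name is made up):

```python
import math

def pme_parameters(cutoff, box_lengths, tolerance=5e-4):
    # Ewald splitting parameter from the error tolerance and cutoff
    alpha = math.sqrt(-math.log(2 * tolerance)) / cutoff
    # grid dimensions scale linearly with the box edge lengths
    grid = [math.ceil(2 * alpha * length / (3 * tolerance ** 0.2))
            for length in box_lengths]
    return alpha, grid

alpha, grid = pme_parameters(cutoff=1.0, box_lengths=[4.0, 4.0, 4.0])
print(alpha, grid)  # small NPT box fluctuations usually leave the grid unchanged
```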
```python
        self.exclusions = self.exclusions.to(positions.device)
        return torch.ops.pme.pme_direct(positions, charges, neighbors, deltas, distances, self.exclusions, self.alpha, self.coulomb)

    def compute_reciprocal(self, positions: torch.Tensor, charges: torch.Tensor, box_vectors: torch.Tensor):
```
Besides testing, is there any situation in which one would call compute_direct() and compute_reciprocal() separately, rather than always combining the two?
I believe it would be useful for the PmeModule you wrote for the test to be defined in this file instead.
One common situation is multiple time step integrators, where you evaluate the direct space part more often than the reciprocal space part. I also want to experiment with models that include only the reciprocal space part, which covers the long-range interactions for which point charges are a pretty good approximation, while letting the neural network try to learn a more accurate short-range part.
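To make the first use case concrete, here is a rough sketch of an r-RESPA-style loop. Only the two method signatures come from this PR; the `FakePme` stub and the integrator details are made up:

```python
import torch

class FakePme:
    # Stand-in for the real module, matching the signatures in this PR.
    def compute_direct(self, positions, charges, cutoff, box_vectors, max_num_pairs=-1):
        return torch.zeros(())  # placeholder for the direct space energy
    def compute_reciprocal(self, positions, charges, box_vectors):
        return torch.zeros(())  # placeholder for the reciprocal space energy

pme = FakePme()
positions = torch.randn(10, 3)
charges = torch.randn(10)
box_vectors = torch.eye(3) * 3.0

for outer_step in range(100):
    # slow, long-range part: evaluated once per outer step
    e_slow = pme.compute_reciprocal(positions, charges, box_vectors)
    for inner_step in range(4):
        # fast, short-range part: evaluated on every inner step
        e_fast = pme.compute_direct(positions, charges, 1.0, box_vectors)
        # ... advance positions from the forces of e_fast, folding the
        # forces of e_slow in once per outer step ...
```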
```cpp
        // Store data for later use.
        ctx->save_for_backward({posDeriv, chargeDeriv});
        return {torch::tensor(energy, options)};
    }

    static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
        auto saved = ctx->get_saved_variables();
        Tensor posDeriv = saved[0];
        Tensor chargeDeriv = saved[1];
        torch::Tensor ignore;
        return {posDeriv*grad_outputs[0], chargeDeriv*grad_outputs[0], ignore, ignore, ignore, ignore, ignore, ignore};
    }
};
```
I am facing an issue in TorchMD-net where saving intermediate variables makes a double backward pass silently incorrect. There are several tricks one can do to get around it (here). Will this implementation work correctly if one grads twice?
I'm not sure it's possible to support second derivatives, except by writing code to compute them explicitly. This corresponds to the last example on the page you linked. Since the calculations are done by custom code, PyTorch doesn't know how to define gradients.
This backward is quite involved; the only way I see is to write the double backward yourself, as you say.
The dangerous thing for me is that it will be silently wrong if you try to grad twice. Maybe there is a short way to at least raise an error if something tries to double backward. Sadly, as far as I can see, even in the simplest form you need to implement a second autograd function like PME_Backwards.
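For the record, here is a toy sketch of that pattern. The energy is just E = sum(q * |r|^2) so the double-backward terms can be derived by hand; all names are made up, and the real PME derivatives would be far more involved:

```python
import torch

class PmeBackward(torch.autograd.Function):
    # The backward pass is itself an autograd Function, so grad-of-grad
    # has something to differentiate.
    @staticmethod
    def forward(ctx, grad_output, positions, charges):
        ctx.save_for_backward(grad_output, positions, charges)
        pos_grad = grad_output * 2 * charges[:, None] * positions   # dE/dr
        charge_grad = grad_output * (positions ** 2).sum(dim=1)     # dE/dq
        return pos_grad, charge_grad

    @staticmethod
    def backward(ctx, grad_pos, grad_charge):
        # explicit double-backward terms, derived by hand for the toy energy
        grad_output, positions, charges = ctx.saved_tensors
        gg_out = ((grad_pos * 2 * charges[:, None] * positions).sum()
                  + (grad_charge * (positions ** 2).sum(dim=1)).sum())
        gg_pos = grad_output * (2 * charges[:, None] * grad_pos
                                + 2 * grad_charge[:, None] * positions)
        gg_charges = grad_output * (2 * positions * grad_pos).sum(dim=1)
        return gg_out, gg_pos, gg_charges

class PmeEnergy(torch.autograd.Function):
    @staticmethod
    def forward(ctx, positions, charges):
        ctx.save_for_backward(positions, charges)
        return (charges * (positions ** 2).sum(dim=1)).sum()

    @staticmethod
    def backward(ctx, grad_output):
        positions, charges = ctx.saved_tensors
        return PmeBackward.apply(grad_output, positions, charges)

positions = torch.randn(4, 3, requires_grad=True)
charges = torch.randn(4, requires_grad=True)
energy = PmeEnergy.apply(positions, charges)
forces = -torch.autograd.grad(energy, positions, create_graph=True)[0]
forces.pow(2).sum().backward()  # double backward now works
```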
```cpp
int atom2 = exclusions[atom1][index-atom1*maxExclusions];
if (atom2 > atom1) {
    for (int j = 0; j < 3; j++)
        dr[j] = pos[atom1][j]-pos[atom2][j];
```
Don't you have to apply PBC here?
Exclusions apply only to a single copy of each other atom, the one in the same periodic copy. See the discussion of this in the class documentation.
```cpp
                           Accessor<float, 3> grid, int gridx, int gridy, int gridz, float sqrtCoulomb) {
    float recipBoxVectors[3][3];
    invertBoxVectors(box, recipBoxVectors);
    float data[PME_ORDER][3];
```
The number of registers in these kernels is really high. I wonder if computing these in another kernel and fetching from global memory here would be better.
I could cut down the registers by putting `recipBoxVectors` in shared memory. But in practice it pretty much doesn't matter: this kernel is overwhelmingly dominated by the memory bandwidth from all the atomic adds.
src/pytorch/pme/pmeCPU.cpp:

```cpp
// Take the inverse Fourier transform.
Tensor realGrid = torch::fft::irfftn(recipGrid, {gridSize[0], gridSize[1], gridSize[2]}, c10::nullopt, "forward");
```
torch<2 refuses to compile this line because it cannot convert from `{gridSize[0], gridSize[1], gridSize[2]}` to the expected `c10::optional<c10::ArrayRef<long int>>` type. Maybe passing `long int gridSizeTemp[3] = {gridSize[0], gridSize[1], gridSize[2]}` makes it happy?
I have an open issue about this on the PyTorch GitHub (pytorch/pytorch#102753). I plan to investigate more today and see if I can figure out a solution.
Really clean implementation @peastman! Just some comments/questions from me, but looks great!
```
included in reciprocal space. The sum of the two terms thus yields the correct energy with the interaction fully
excluded.

When performing backpropagation, this class computes derivatives with respect to atomic positions and charges, but
```
Maybe I am overamplifying this because it made me waste a lot of time recently, but I believe there should be a mention here that this module supports only a single backward pass.
If one trains with forces, the energy will be grad'ed during the forward pass, which requires a second grad during backprop.
Yeah, that's an annoying limitation. It shouldn't be an issue for the sorts of problems I'm interested in, since you wouldn't use PME during training. You would be training on isolated (not periodic) small molecules, for which all interactions are computed in direct space. PME would only be used for inference, if you want to apply the model to larger systems. It would be more of an issue for materials applications, if you wanted to train on crystals.
I'll see if I can get it to throw an exception if you try to take a second derivative.
Python-side there is the @once_differentiable decorator: https://discuss.pytorch.org/t/what-does-the-function-wrapper-once-differentiable-do/31513
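Something like this minimal toy (not the PR's actual code; the energy is made up). With the decorator, a second backward raises a clear error instead of silently returning wrong values:

```python
import torch
from torch.autograd.function import once_differentiable

class PmeFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, positions, charges):
        # toy energy E = sum(q * |r|^2), with its derivatives computed
        # during the forward pass, as the real C++ kernel does
        energy = (charges * (positions ** 2).sum(dim=1)).sum()
        pos_deriv = 2 * charges[:, None] * positions   # dE/dr
        charge_deriv = (positions ** 2).sum(dim=1)     # dE/dq
        ctx.save_for_backward(pos_deriv, charge_deriv)
        return energy

    @staticmethod
    @once_differentiable  # double backward now raises instead of being silently wrong
    def backward(ctx, grad_output):
        pos_deriv, charge_deriv = ctx.saved_tensors
        return pos_deriv * grad_output, charge_deriv * grad_output
```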
It turns out that it already throws an exception:

`RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`

I added a test case to check for it.
I'm working on an implementation of PME. This will be useful for models that include explicit Coulomb terms, especially when the charges are predicted by the model and require extra chain rule terms.
The direct space interaction is implemented in PyTorch and is very simple. The reciprocal space part is implemented as a C++ extension. I've written the CPU version. I still have to do the CUDA version.
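To make the "extra chain rule terms" concrete: the backward pass returns a gradient with respect to the charges as well as the positions, so gradients flow back into whatever model predicted the charges. A toy illustration, with `pme_energy` standing in for the real extension:

```python
import torch

charge_model = torch.nn.Linear(3, 1)  # toy stand-in for a charge-predicting model

def pme_energy(positions, charges):
    # placeholder energy with nonzero dE/dr and dE/dq
    return (charges * (positions ** 2).sum(dim=1)).sum()

positions = torch.randn(8, 3, requires_grad=True)
charges = charge_model(positions).squeeze(-1)  # charges depend on model parameters
energy = pme_energy(positions, charges)
energy.backward()
print(charge_model.weight.grad)  # dE/dq has been chained through to the parameters
```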