
Particle Mesh Ewald #103

Merged: 15 commits into master from pme, Jun 8, 2023

Conversation

peastman (Member)

I'm working on an implementation of PME. This will be useful for models that include explicit Coulomb terms, especially when the charges are predicted by the model and require extra chain rule terms.

The direct space interaction is implemented in PyTorch and is very simple. The reciprocal space part is implemented as a C++ extension. I've written the CPU version. I still have to do the CUDA version.
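
For reference, the direct-space contribution described here is the erfc-screened pairwise Coulomb sum. A minimal PyTorch sketch (illustrative names, periodic wrapping and the exclusion correction omitted; not the code in this PR) might look like:

```python
import torch

def direct_space_energy(positions, charges, pairs, alpha, coulomb_constant):
    # pairs: (2, n_pairs) tensor of atom indices within the cutoff.
    # Periodic wrapping and excluded-pair corrections are omitted for brevity.
    delta = positions[pairs[0]] - positions[pairs[1]]
    r = torch.linalg.vector_norm(delta, dim=1)
    qq = charges[pairs[0]] * charges[pairs[1]]
    return coulomb_constant * torch.sum(qq * torch.erfc(alpha * r) / r)
```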

@proteneer

Would it be possible to implement the reciprocal space energy calculations in python/torch as well? It would be easier to do tests of things like forces by taking advantage of automatic differentiation (as opposed to just comparing against hard coded floats)
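
A sketch of the sort of autodiff-based check being suggested (`energy_fn` is a stand-in for the energy evaluation, not part of this PR):

```python
import torch

def check_forces_by_autograd(energy_fn, positions, charges):
    # gradcheck compares the analytic gradient of the energy with respect to the
    # positions against a finite-difference estimate (double precision is needed).
    pos = positions.detach().double().requires_grad_(True)
    return torch.autograd.gradcheck(lambda p: energy_fn(p, charges.double()), (pos,))
```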

@peastman (Member Author)

You could try, but I couldn't think of any way to implement it that wouldn't be horribly inefficient. For example, you either end up with a loop in Python to spread charges from atoms one at a time, or you write a vectorized version that writes to every grid point for every atom. Can you think of a better way?
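
For illustration, the per-atom loop described above would look roughly like this (hypothetical names, with the B-spline weight computation and periodic wrapping omitted); the Python-level loop over atoms is what makes it slow:

```python
import torch

def spread_charges_loop(grid, stencil_indices, stencil_weights, charges):
    # grid: (nx, ny, nz) real-space charge grid.
    # stencil_indices[atom]: three (order,) integer vectors of grid indices.
    # stencil_weights[atom]: (order, order, order) B-spline weights.
    for atom in range(charges.shape[0]):
        ix, iy, iz = stencil_indices[atom]
        w = stencil_weights[atom]
        grid[ix[:, None, None], iy[None, :, None], iz[None, None, :]] += charges[atom] * w
    return grid
```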

Letting PyTorch compute derivatives through generic backpropagation is also likely to be extremely expensive. The FFT means that every point in the reciprocal space grid depends on every point in the real space grid.

What sorts of tests would you want to do?

@peastman changed the title from "[WIP] Particle Mesh Ewald" to "Particle Mesh Ewald" on Jun 1, 2023
@peastman marked this pull request as ready for review on June 1, 2023 18:01
@peastman (Member Author) commented Jun 1, 2023

This is ready for review.

I originally hoped to implement the direct space calculation in Python, but I couldn't figure out any way to do it that was compatible with TorchScript. The implementation still isn't especially fast, since it just uses the NNPOps neighbor list structure, which isn't well suited to GPU computation, but it should be plenty fast enough for our purposes. The reciprocal space calculation is based on the implementation in OpenMM. I simplified it a bit, removing optimizations that add a lot of complexity for only modest speedup, but the performance should still be pretty good.

If someone can double check my values for Coulomb's constant, that would be helpful!
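
One way to double-check it, assuming the kJ/mol and nm unit system OpenMM uses: derive the constant from CODATA values via scipy.constants, which should come out near 138.935 kJ/mol·nm per e².

```python
from scipy import constants

# e^2 * N_A / (4*pi*eps0), converted from J*m to kJ/mol * nm.
k_coulomb = (constants.e**2 * constants.Avogadro
             / (4 * constants.pi * constants.epsilon_0)
             / 1000.0 * 1e9)
print(k_coulomb)  # roughly 138.935 kJ/mol * nm / e^2
```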

m[i] = (m[(i-1+ndata)%ndata]+m[(i+1)%ndata])*0.5
self.moduli.append(m)

def compute_direct(self, positions: torch.Tensor, charges: torch.Tensor, cutoff: float, box_vectors: torch.Tensor, max_num_pairs: int = -1):
Contributor

The grid size is specified in the constructor, I guess for performance reasons. Does it really make sense then to let the box size change at every forward call? Grid size and box size are kind of tied to each other, right?

Member Author

This is how pretty much all MD codes work. In an NPT simulation, the box size can change in each step, though usually only within a narrow range. But you would need to be very careful about changing the grid dimensions. That would cause a discontinuous change in the energy that would be hard to account for in analysis.

self.exclusions = self.exclusions.to(positions.device)
return torch.ops.pme.pme_direct(positions, charges, neighbors, deltas, distances, self.exclusions, self.alpha, self.coulomb)

def compute_reciprocal(self, positions: torch.Tensor, charges: torch.Tensor, box_vectors: torch.Tensor):
Contributor

Besides testing purposes, is there any situation in which one would call compute_direct() and compute_reciprocal() for anything other than subtracting one from the other?
I believe it would be useful for the PmeModule you wrote for the test to be defined in this file instead.

Member Author

One common situation is multiple time step integrators, where you evaluate the direct space part more often than the reciprocal space part. I also want to experiment with models where I include only the reciprocal space part, which covers long range interactions for which point charges are a pretty good approximation, while letting the neural network try to learn a more accurate short range part.
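
A rough sketch of that multiple time step pattern, using the compute_direct()/compute_reciprocal() methods shown in the diff above (`pme` stands in for an instance of the module; `reciprocal_interval` is a hypothetical parameter):

```python
def mts_energy(pme, positions, charges, cutoff, box_vectors, step, reciprocal_interval=4):
    # Direct space is evaluated every step; the slowly varying reciprocal-space
    # part only every `reciprocal_interval` steps.
    energy = pme.compute_direct(positions, charges, cutoff, box_vectors)
    if step % reciprocal_interval == 0:
        energy = energy + pme.compute_reciprocal(positions, charges, box_vectors)
    return energy
```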

Comment on lines +159 to +172
        // Store data for later use.

        ctx->save_for_backward({posDeriv, chargeDeriv});
        return {torch::tensor(energy, options)};
    }

    static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
        auto saved = ctx->get_saved_variables();
        Tensor posDeriv = saved[0];
        Tensor chargeDeriv = saved[1];
        torch::Tensor ignore;
        return {posDeriv*grad_outputs[0], chargeDeriv*grad_outputs[0], ignore, ignore, ignore, ignore, ignore, ignore};
    }
};
Contributor

I am facing an issue in TorchMD-net where saving intermediate variables makes a double backwards pass silently incorrect.
There are several tricks one can do to get around it (here).
Will this implementation work correctly if one grads twice?

Member Author

I'm not sure it's possible to support second derivatives, except by writing code to compute them explicitly. This corresponds to the last example in the page you linked. Since the calculations are done by custom code, PyTorch doesn't know how to define gradients.

Contributor

This backwards is quite involved; the only way I see is to write the double backwards yourself, as you say.
The dangerous thing for me is that it will be silently wrong if you try to grad twice. Maybe there is a short way to at least raise an error if something tries to double backward. Sadly, as far as I can see, even in the simplest form you need to implement a second autograd function like PME_Backwards.
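
On the Python side, the analogous safeguard would be marking backward with once_differentiable, which makes a second backward raise instead of silently returning wrong gradients. A toy sketch (not the PR's C++ autograd function; the energy is a placeholder):

```python
import torch
from torch.autograd.function import once_differentiable

class ExampleEnergy(torch.autograd.Function):
    @staticmethod
    def forward(ctx, positions, charges):
        # Toy energy E = sum_i q_i * |r_i|, standing in for the PME kernels.
        norms = positions.norm(dim=1)
        ctx.save_for_backward(positions, charges, norms)
        return (charges * norms).sum()

    @staticmethod
    @once_differentiable  # a second backward through this op raises an error
    def backward(ctx, grad_output):
        positions, charges, norms = ctx.saved_tensors
        pos_grad = grad_output * charges.unsqueeze(1) * positions / norms.unsqueeze(1)
        charge_grad = grad_output * norms
        return pos_grad, charge_grad
```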

int atom2 = exclusions[atom1][index-atom1*maxExclusions];
if (atom2 > atom1) {
    for (int j = 0; j < 3; j++)
        dr[j] = pos[atom1][j]-pos[atom2][j];
Contributor

Don't you have to apply PBC here?

Member Author

Exclusions apply only to a single copy of each other atom, the one in the same periodic copy. See the discussion of this in the class documentation.

                            Accessor<float, 3> grid, int gridx, int gridy, int gridz, float sqrtCoulomb) {
    float recipBoxVectors[3][3];
    invertBoxVectors(box, recipBoxVectors);
    float data[PME_ORDER][3];
Contributor

The number of registers in these kernels is really high. I wonder if computing these in another kernel and fetching from global memory here would be better.

Member Author

I could cut down the registers by putting recipBoxVectors in shared memory. But in practice it pretty much doesn't matter. This kernel is overwhelmingly dominated by the memory bandwidth from all the atom adds.


// Take the inverse Fourier transform.

Tensor realGrid = torch::fft::irfftn(recipGrid, {gridSize[0], gridSize[1], gridSize[2]}, c10::nullopt, "forward");
Contributor (@RaulPPelaez), Jun 2, 2023

torch<2 refuses to compile this line because it cannot convert from {gridSize[0], gridSize[1], gridSize[2]} to the expected c10::optional<c10::ArrayRef<long int> > type.
Maybe passing long int gridSizeTemp[3] = {gridSize[0], gridSize[1], gridSize[2]} makes it happy?

Member Author

I have an open issue about this on the PyTorch github (pytorch/pytorch#102753). I plan to investigate more today and see if I can figure out a solution.

@RaulPPelaez (Contributor)

Really clean implementation @peastman! Just some comments/questions from me, but looks great!

included in reciprocal space. The sum of the two terms thus yields the correct energy with the interaction fully
excluded.

When performing backpropagation, this class computes derivatives with respect to atomic positions and charges, but
Contributor

Maybe I am overamplifying this because it made me waste a lot of time recently, but I believe there should be a mention here that this module supports only a single backwards pass.
If one trains with forces, this will be grad'ed during the forward pass, which requires a second grad during backprop.
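
To spell out the concern, force training looks roughly like this (`energy_fn` is a stand-in for the model's energy); backpropagating this loss requires a second backward through the energy:

```python
import torch

def force_loss(energy_fn, positions, charges, target_forces):
    positions = positions.detach().requires_grad_(True)
    energy = energy_fn(positions, charges)
    # create_graph=True keeps the graph of this gradient, so backpropagating the
    # loss below needs a second (double) backward through energy_fn.
    forces = -torch.autograd.grad(energy, positions, create_graph=True)[0]
    return ((forces - target_forces) ** 2).mean()
```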

Member Author

Yeah, that's an annoying limitation. It shouldn't be an issue for the sorts of problems I'm interested in, since you wouldn't use PME during training. You would be training on isolated (not periodic) small molecules, for which all interactions are computed in direct space. PME would only be used for inference, if you want to apply the model to larger systems. It would be more of an issue for materials applications, if you wanted to train on crystals.

I'll see if I can get it to throw an exception if you try to take a second derivative.


Member Author

It turns out that it already throws an exception:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I added a test case to check for it.
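
A sketch of the kind of check such a test might perform (`pme_energy` is a stand-in for evaluating the module's energy):

```python
import pytest
import torch

def check_second_backward_raises(pme_energy, positions, charges):
    positions = positions.detach().requires_grad_(True)
    energy = pme_energy(positions, charges)
    grad = torch.autograd.grad(energy, positions, create_graph=True)[0]
    # The custom op's backward returns tensors detached from the graph, so a
    # second backward fails rather than producing silently wrong gradients.
    with pytest.raises(RuntimeError):
        grad.sum().backward()
```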

@peastman merged commit a02edec into master on Jun 8, 2023
@peastman deleted the pme branch on June 8, 2023 19:40
@raimis mentioned this pull request on Jul 24, 2023