
Particle Mesh Ewald #103

Merged: 15 commits into master from pme, Jun 8, 2023

Conversation

peastman (Member)

I'm working on an implementation of PME. This will be useful for models that include explicit Coulomb terms, especially when the charges are predicted by the model and require extra chain rule terms.

The direct space interaction is implemented in PyTorch and is very simple. The reciprocal space part is implemented as a C++ extension. I've written the CPU version. I still have to do the CUDA version.
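
For reference, the direct-space contribution described here is the erfc-screened pairwise Coulomb sum. A minimal PyTorch sketch (illustrative names, periodic wrapping and the exclusion correction omitted; not the code in this PR) might look like:

```python
import torch

def direct_space_energy(positions, charges, pairs, alpha, coulomb_constant):
    # pairs: (2, n_pairs) tensor of atom indices within the cutoff.
    # Periodic wrapping and excluded-pair corrections are omitted for brevity.
    delta = positions[pairs[0]] - positions[pairs[1]]
    r = torch.linalg.vector_norm(delta, dim=1)
    qq = charges[pairs[0]] * charges[pairs[1]]
    return coulomb_constant * torch.sum(qq * torch.erfc(alpha * r) / r)
```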

@proteneer

Would it be possible to implement the reciprocal space energy calculations in python/torch as well? It would be easier to do tests of things like forces by taking advantage of automatic differentiation (as opposed to just comparing against hard coded floats)
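
A sketch of the sort of autodiff-based check being suggested (`energy_fn` is a stand-in for the energy evaluation, not part of this PR):

```python
import torch

def check_forces_by_autograd(energy_fn, positions, charges):
    # gradcheck compares the analytic gradient of the energy with respect to the
    # positions against a finite-difference estimate (double precision is needed).
    pos = positions.detach().double().requires_grad_(True)
    return torch.autograd.gradcheck(lambda p: energy_fn(p, charges.double()), (pos,))
```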

@peastman (Member Author)

You could try, but I couldn't think of any way to implement it that wouldn't be horribly inefficient. For example, you either end up with a loop in Python to spread charges from atoms one at a time, or you write a vectorized version that writes to every grid point for every atom. Can you think of a better way?
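
For illustration, the per-atom loop described above would look roughly like this (hypothetical names, with the B-spline weight computation and periodic wrapping omitted); the Python-level loop over atoms is what makes it slow:

```python
import torch

def spread_charges_loop(grid, stencil_indices, stencil_weights, charges):
    # grid: (nx, ny, nz) real-space charge grid.
    # stencil_indices[atom]: three (order,) integer vectors of grid indices.
    # stencil_weights[atom]: (order, order, order) B-spline weights.
    for atom in range(charges.shape[0]):
        ix, iy, iz = stencil_indices[atom]
        w = stencil_weights[atom]
        grid[ix[:, None, None], iy[None, :, None], iz[None, None, :]] += charges[atom] * w
    return grid
```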

Letting PyTorch compute derivatives through generic backpropagation is also likely to be extremely expensive. The FFT means that every point in the reciprocal space grid depends on every point in the real space grid.

What sorts of tests would you want to do?

@peastman changed the title from "[WIP] Particle Mesh Ewald" to "Particle Mesh Ewald" on Jun 1, 2023
@peastman marked this pull request as ready for review on June 1, 2023 18:01
@peastman (Member Author) commented Jun 1, 2023

This is ready for review.

I originally hoped to implement the direct space calculation in Python, but I couldn't figure out any way to do it that was compatible with TorchScript. The implementation still isn't especially fast, since it just uses the NNPOps neighbor list structure, which isn't well suited to GPU computation, but it should be plenty fast enough for our purposes. The reciprocal space calculation is based on the implementation in OpenMM. I simplified it a bit, removing optimizations that add a lot of complexity for only modest speedup, but the performance should still be pretty good.

If someone can double check my values for Coulomb's constant, that would be helpful!
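
One way to double-check it, assuming the kJ/mol and nm unit system OpenMM uses: derive the constant from CODATA values via scipy.constants, which should come out near 138.935 kJ/mol·nm per e².

```python
from scipy import constants

# e^2 * N_A / (4*pi*eps0), converted from J*m to kJ/mol * nm.
k_coulomb = (constants.e**2 * constants.Avogadro
             / (4 * constants.pi * constants.epsilon_0)
             / 1000.0 * 1e9)
print(k_coulomb)  # roughly 138.935 kJ/mol * nm / e^2
```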

m[i] = (m[(i-1+ndata)%ndata]+m[(i+1)%ndata])*0.5
self.moduli.append(m)

def compute_direct(self, positions: torch.Tensor, charges: torch.Tensor, cutoff: float, box_vectors: torch.Tensor, max_num_pairs: int = -1):
Contributor

The grid size is specified in the constructor, I guess for performance reasons. Does it really make sense then to let the box size change at every forward call? Grid size and box size are kind of tied to each other, right?

Member Author

This is how pretty much all MD codes work. In an NPT simulation, the box size can change in each step, though usually only within a narrow range. But you would need to be very careful about changing the grid dimensions. That would cause a discontinuous change in the energy that would be hard to account for in analysis.

self.exclusions = self.exclusions.to(positions.device)
return torch.ops.pme.pme_direct(positions, charges, neighbors, deltas, distances, self.exclusions, self.alpha, self.coulomb)

def compute_reciprocal(self, positions: torch.Tensor, charges: torch.Tensor, box_vectors: torch.Tensor):
Contributor

Besides testing purposes, is there any situation in which one would call compute_direct() and compute_reciprocal() for anything other than subtracting one from the other?
I believe it would be useful for the PmeModule you wrote for the test to be defined in this file instead.

Member Author

One common situation is multiple time step integrators, where you evaluate the direct space part more often than the reciprocal space part. I also want to experiment with models where I include only the reciprocal space part, which covers long range interactions for which point charges are a pretty good approximation, while letting the neural network try to learn a more accurate short range part.
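
A rough sketch of that multiple time step pattern, using the compute_direct()/compute_reciprocal() methods shown in the diff above (`pme` stands in for an instance of the module; `reciprocal_interval` is a hypothetical parameter):

```python
def mts_energy(pme, positions, charges, cutoff, box_vectors, step, reciprocal_interval=4):
    # Direct space is evaluated every step; the slowly varying reciprocal-space
    # part only every `reciprocal_interval` steps.
    energy = pme.compute_direct(positions, charges, cutoff, box_vectors)
    if step % reciprocal_interval == 0:
        energy = energy + pme.compute_reciprocal(positions, charges, box_vectors)
    return energy
```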

Comment on lines +159 to +172
        // Store data for later use.

        ctx->save_for_backward({posDeriv, chargeDeriv});
        return {torch::tensor(energy, options)};
    }

    static tensor_list backward(AutogradContext *ctx, tensor_list grad_outputs) {
        auto saved = ctx->get_saved_variables();
        Tensor posDeriv = saved[0];
        Tensor chargeDeriv = saved[1];
        torch::Tensor ignore;
        return {posDeriv*grad_outputs[0], chargeDeriv*grad_outputs[0], ignore, ignore, ignore, ignore, ignore, ignore};
    }
};
Contributor

I am facing an issue in TorchMD-net where saving intermediate variables makes a double backwards pass silently incorrect.
There are several tricks one can do to get around it (here).
Will this implementation work correctly if one grads twice?

Member Author

I'm not sure it's possible to support second derivatives, except by writing code to compute them explicitly. This corresponds to the last example in the page you linked. Since the calculations are done by custom code, PyTorch doesn't know how to define gradients.

Contributor

This backwards is quite involved; the only way I see is to write the double backwards yourself, as you say.
The dangerous thing for me is that it will be silently wrong if you try to grad twice. Maybe there is a short way to at least raise an error if something tries to double backward. Sadly, as far as I can see, even in the simplest form you need to implement a second autograd function like PME_Backwards.
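
On the Python side, the analogous safeguard would be marking backward with once_differentiable, which makes a second backward raise instead of silently returning wrong gradients. A toy sketch (not the PR's C++ autograd function; the energy is a placeholder):

```python
import torch
from torch.autograd.function import once_differentiable

class ExampleEnergy(torch.autograd.Function):
    @staticmethod
    def forward(ctx, positions, charges):
        # Toy energy E = sum_i q_i * |r_i|, standing in for the PME kernels.
        norms = positions.norm(dim=1)
        ctx.save_for_backward(positions, charges, norms)
        return (charges * norms).sum()

    @staticmethod
    @once_differentiable  # a second backward through this op raises an error
    def backward(ctx, grad_output):
        positions, charges, norms = ctx.saved_tensors
        pos_grad = grad_output * charges.unsqueeze(1) * positions / norms.unsqueeze(1)
        charge_grad = grad_output * norms
        return pos_grad, charge_grad
```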

int atom2 = exclusions[atom1][index-atom1*maxExclusions];
if (atom2 > atom1) {
    for (int j = 0; j < 3; j++)
        dr[j] = pos[atom1][j]-pos[atom2][j];
Contributor

Don't you have to apply PBC here?

Member Author

Exclusions apply only to a single copy of each other atom, the one in the same periodic copy. See the discussion of this in the class documentation.

                            Accessor<float, 3> grid, int gridx, int gridy, int gridz, float sqrtCoulomb) {
    float recipBoxVectors[3][3];
    invertBoxVectors(box, recipBoxVectors);
    float data[PME_ORDER][3];
Contributor

The number of registers in these kernels is really high. I wonder if computing these in another kernel and fetching from global memory here would be better.

Member Author

I could cut down the registers by putting recipBoxVectors in shared memory. But in practice it pretty much doesn't matter. This kernel is overwhelmingly dominated by the memory bandwidth from all the atom adds.


// Take the inverse Fourier transform.

Tensor realGrid = torch::fft::irfftn(recipGrid, {gridSize[0], gridSize[1], gridSize[2]}, c10::nullopt, "forward");
Contributor (@RaulPPelaez), Jun 2, 2023

torch<2 refuses to compile this line because it cannot convert from {gridSize[0], gridSize[1], gridSize[2]} to the expected c10::optional<c10::ArrayRef<long int> > type.
Maybe passing long int gridSizeTemp[3] = {gridSize[0], gridSize[1], gridSize[2]} makes it happy?

Member Author

I have an open issue about this on the PyTorch github (pytorch/pytorch#102753). I plan to investigate more today and see if I can figure out a solution.

@RaulPPelaez (Contributor)

Really clean implementation @peastman! Just some comments/questions from me, but looks great!

included in reciprocal space. The sum of the two terms thus yields the correct energy with the interaction fully
excluded.

When performing backpropagation, this class computes derivatives with respect to atomic positions and charges, but
Contributor

Maybe I am overamplifying this because it made me waste a lot of time recently, but I believe there should be a mention here that this module supports only a single backwards pass.
If one trains with forces, this will be grad'ed during the forward pass, which requires a second grad during backprop.
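
To spell out the concern, force training looks roughly like this (`energy_fn` is a stand-in for the model's energy); backpropagating this loss requires a second backward through the energy:

```python
import torch

def force_loss(energy_fn, positions, charges, target_forces):
    positions = positions.detach().requires_grad_(True)
    energy = energy_fn(positions, charges)
    # create_graph=True keeps the graph of this gradient, so backpropagating the
    # loss below needs a second (double) backward through energy_fn.
    forces = -torch.autograd.grad(energy, positions, create_graph=True)[0]
    return ((forces - target_forces) ** 2).mean()
```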

Member Author

Yeah, that's an annoying limitation. It shouldn't be an issue for the sorts of problems I'm interested in, since you wouldn't use PME during training. You would be training on isolated (not periodic) small molecules, for which all interactions are computed in direct space. PME would only be used for inference, if you want to apply the model to larger systems. It would be more of an issue for materials applications, if you wanted to train on crystals.

I'll see if I can get it to throw an exception if you try to take a second derivative.


Member Author

It turns out that it already throws an exception:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I added a test case to check for it.
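
A sketch of the kind of check such a test might perform (`pme_energy` is a stand-in for evaluating the module's energy):

```python
import pytest
import torch

def check_second_backward_raises(pme_energy, positions, charges):
    positions = positions.detach().requires_grad_(True)
    energy = pme_energy(positions, charges)
    grad = torch.autograd.grad(energy, positions, create_graph=True)[0]
    # The custom op's backward returns tensors detached from the graph, so a
    # second backward fails rather than producing silently wrong gradients.
    with pytest.raises(RuntimeError):
        grad.sum().backward()
```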

@peastman merged commit a02edec into master on Jun 8, 2023
@peastman deleted the pme branch on June 8, 2023 19:40
@raimis mentioned this pull request on Jul 24, 2023