Surprising differences between nnpops and torchani #82
Comments
The relevant packages in my environment are
|
@RaulPPelaez could you try to reproduce and verify the issue? |
Out of curiosity, what are you using for visualization? |
Could you try running with nnpops=0.3? |
I am using I have started simulations with |
with the updated This is a minimum example to reproduce the issue:
|
I am able to reproduce this. We believe it might be a problem with periodic boundary conditions in nnpops. |
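For readers unfamiliar with why periodic boundary conditions matter here, the following is a minimal sketch of the minimum image convention for a cubic box. It is not taken from NNPOps or TorchANI; the function name `pair_distance`, the coordinates, and the cutoff value are hypothetical and chosen only for illustration. If a neighbor search ignores periodicity, atoms near opposite faces of the box look far apart and drop out of the cutoff sphere.

```python
import numpy as np

def pair_distance(r_i, r_j, box_edge=None):
    """Distance between two points, in the same length unit as the inputs.

    If box_edge is given, the minimum image convention for a cubic box
    is applied; otherwise the plain Euclidean distance is returned.
    """
    delta = np.asarray(r_j, dtype=float) - np.asarray(r_i, dtype=float)
    if box_edge is not None:
        # Shift each component by a whole number of box lengths so that
        # it ends up in the range [-box_edge/2, box_edge/2].
        delta = delta - box_edge * np.round(delta / box_edge)
    return float(np.linalg.norm(delta))

# Hypothetical values: a 15 Angstrom cubic box and an illustrative cutoff.
box = 15.0
cutoff = 5.0
a = [0.5, 7.5, 7.5]
b = [14.5, 7.5, 7.5]

d_open = pair_distance(a, b)           # periodicity ignored
d_periodic = pair_distance(a, b, box)  # periodicity respected

print(f"non-periodic distance: {d_open:.1f}, within cutoff: {d_open < cutoff}")
print(f"periodic distance:     {d_periodic:.1f}, within cutoff: {d_periodic < cutoff}")
```

In this toy case the two atoms interact only when the box is treated as periodic, which is the kind of discrepancy that would show up as missing interactions across the box faces.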
I turned off each of the optimized components one by one here: NNPOps/src/pytorch/OptimizedTorchANI.py Lines 41 to 45 in 3c96f5b
The error arises when using the optimized aev_computer. Replacing that line with: self.aev_computer = model.aev_computer results in a relative error around machine precision, as one would expect. |
Thanks for tracking this down! I will give it a try! |
Many of the CUDA tests in NNPOps fail for me. @peastman @raimis, could you confirm?
Test project /shared/raul/NNPOps/build
Start 1: TestCpuANISymmetryFunctions
1/13 Test #1: TestCpuANISymmetryFunctions ...... Passed 1.50 sec
Start 2: TestCpuCFConv
2/13 Test #2: TestCpuCFConv .................... Passed 0.48 sec
Start 3: TestCudaANISymmetryFunctions
3/13 Test #3: TestCudaANISymmetryFunctions .....Subprocess aborted***Exception: 1.80 sec
Start 4: TestCudaCFConv
4/13 Test #4: TestCudaCFConv ...................Subprocess aborted***Exception: 1.82 sec
Start 5: TestBatchedNN
5/13 Test #5: TestBatchedNN ....................***Failed 203.79 sec
Start 6: TestCFConv
6/13 Test #6: TestCFConv ....................... Passed 5.28 sec
Start 7: TestCFConvNeighbors
7/13 Test #7: TestCFConvNeighbors .............. Passed 3.38 sec
Start 8: TestEnergyShifter
8/13 Test #8: TestEnergyShifter ................ Passed 108.41 sec
Start 9: TestOptimizedTorchANI
9/13 Test #9: TestOptimizedTorchANI ............***Failed 216.59 sec
Start 10: TestSpeciesConverter
10/13 Test #10: TestSpeciesConverter ............. Passed 111.40 sec
Start 11: TestSymmetryFunctions
11/13 Test #11: TestSymmetryFunctions ............***Failed 129.94 sec
Start 12: TestNeighbors
12/13 Test #12: TestNeighbors ....................***Exception: SegFault 5.57 sec
Start 13: TestGetNeighborPairs
13/13 Test #13: TestGetNeighborPairs ............. Passed 2.47 sec
54% tests passed, 6 tests failed out of 13
Total Test time (real) = 792.56 sec
The following tests FAILED:
3 - TestCudaANISymmetryFunctions (Subprocess aborted)
4 - TestCudaCFConv (Subprocess aborted)
5 - TestBatchedNN (Failed)
9 - TestOptimizedTorchANI (Failed)
11 - TestSymmetryFunctions (Failed)
12 - TestNeighbors (SEGFAULT) |
Mine mostly pass:
The failures are just a couple of floating point assertion errors that might be stochastic. This was using a build environment created with |
I think these lines in
if (device.is_cpu()) {
impl = std::make_shared<CpuANISymmetryFunctions>(numAtoms, numSpecies, Rcr, Rca, false, atomSpecies_, radialFunctions, angularFunctions, true);
^^^^^
#ifdef ENABLE_CUDA
} else if (device.is_cuda()) {
// PyTorch allow to chose GPU with "torch.device", but it doesn't set as the default one.
CHECK_CUDA_RESULT(cudaSetDevice(device.index()));
impl = std::make_shared<CudaANISymmetryFunctions>(numAtoms, numSpecies, Rcr, Rca, false, atomSpecies_, radialFunctions, angularFunctions, true);
^^^^^
#endif
} else
mean that the |
I believe you're correct about that. If we replace |
Amazing catch @sef43! @peastman's suggestion fixes the issue. The following test now passes:
from openmm.app import Simulation
from openmm import unit, LangevinIntegrator, Platform
from openmmml import MLPotential
from openmmtools.testsystems import WaterBox
import numpy as np

box_edge = 15 * unit.angstrom
testsystem = WaterBox(box_edge, cutoff=7 * unit.angstrom)
potential = MLPotential("ani2x")
platform = Platform.getPlatformByName("CPU")
prop = dict(CudaPrecision="mixed")
forces = {}
positions = []
# Build the same system with each implementation and record the forces.
for s in ("nnpops", "torchani"):
    system = potential.createSystem(
        testsystem.topology, implementation=s
    )
    print(f"Implementation {s}")
    file = open(f"{s}.dat", 'w')
    integrator = LangevinIntegrator(300 * unit.kelvin, 1 / unit.picosecond, 0 * unit.picoseconds)
    simulation = Simulation(testsystem.topology, system, integrator, platform, prop)
    simulation.context.setPositions(testsystem.positions)
    forces[s] = simulation.context.getState(getForces=True).getForces().value_in_unit(unit.kilojoules/unit.mole/unit.nanometer)
    positions = simulation.context.getState(getPositions=True, enforcePeriodicBox=True).getPositions().value_in_unit(unit.nanometer)

# Compare per-atom force magnitudes between the two implementations.
fnorms_nnpops = np.linalg.norm(forces["nnpops"], axis=1)
fnorms_torchani = np.linalg.norm(forces["torchani"], axis=1)
error = np.abs((fnorms_nnpops - fnorms_torchani) / fnorms_torchani)
print(f"Maximum error: {np.max(error)}")
print(f"Mean error: {np.mean(error)}")
print(f"Std error: {np.std(error)}")
Printing:
|
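The test above compares force magnitudes per atom. As a complementary check, and only as a sketch that is not part of the thread's test, one could also compare the full force vectors, since two forces with equal norms can still point in different directions. The helper name `force_errors` and the toy arrays are hypothetical, and the sketch assumes no reference force is exactly zero.

```python
import numpy as np

def force_errors(forces_ref, forces_test):
    """Return two per-atom relative errors against a reference force array.

    rel_vec  : |F_test - F_ref| / |F_ref|      (sensitive to direction)
    rel_norm : | |F_test| - |F_ref| | / |F_ref| (magnitudes only)
    Assumes every reference force has a non-zero norm.
    """
    ref = np.asarray(forces_ref, dtype=float)
    test = np.asarray(forces_test, dtype=float)
    ref_norm = np.linalg.norm(ref, axis=1)
    rel_vec = np.linalg.norm(test - ref, axis=1) / ref_norm
    rel_norm = np.abs(np.linalg.norm(test, axis=1) - ref_norm) / ref_norm
    return rel_vec, rel_norm

# Toy data: the first atom's force is rotated by 90 degrees, the second is identical.
ref = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
test = np.array([[0.0, 1.0, 0.0], [0.0, 2.0, 0.0]])

rel_vec, rel_norm = force_errors(ref, test)
print("vector-based relative error:", rel_vec)
print("norm-based relative error:  ", rel_norm)
```

For the rotated force the norm-based error is zero while the vector-based error is not, so the vector-based measure is the stricter of the two.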
this is great, thank you! |
Hi,
I have been getting surprising results running waterbox simulations with the torchani vs the nnpops implementation of ani2x. I used the openmmtools waterbox testsystem with an edge length of 20 Å and a 1 fs timestep, and simulated for 1 ns using a Langevin integrator with a 1/ps collision rate. The system was set up with potential.createSystem().
When I run simulations in NpT with the torchani implementation at 300 K everything looks relatively normal (density is a bit too high, and the rdf has some surprising signal though):
When I perform the same simulation with the nnpops implementation I see this:
and the simulation box has shrunk (the initial box size is the yellow outlined square). Also, note the difference in the y-axis for the potential energy.
In NVT I observe vacuum bubbles with nnpops
https://user-images.githubusercontent.com/31651017/218335894-0254ed80-e51f-4189-9bfc-ae94637cfd85.mp4
compared to the same simulation with
torchani
https://user-images.githubusercontent.com/31651017/218335817-3e911757-d19d-4f71-b922-8b9de913237e.mp4
I attached a minimal example to reproduce the simulations.
min_example.py.zip
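To put a number on observations like "the density is a bit too high" or "the box has shrunk", the density of a cubic water box can be estimated from the molecule count and the box edge. This is a minimal sketch and not part of the attached example; the function name `water_density` and the molecule count used below are hypothetical and not read from the actual test system.

```python
AVOGADRO = 6.02214076e23      # particles per mole
WATER_MOLAR_MASS = 18.015     # g/mol
ANGSTROM3_TO_CM3 = 1.0e-24    # 1 cubic Angstrom in cubic centimeters

def water_density(n_molecules, box_edge_angstrom):
    """Density in g/cm^3 of n_molecules water molecules in a cubic box."""
    volume_cm3 = box_edge_angstrom ** 3 * ANGSTROM3_TO_CM3
    mass_g = n_molecules * WATER_MOLAR_MASS / AVOGADRO
    return mass_g / volume_cm3

# Hypothetical molecule count, chosen only for illustration.
n_molecules = 267
print(f"20 A box: {water_density(n_molecules, 20.0):.3f} g/cm^3")
print(f"19 A box: {water_density(n_molecules, 19.0):.3f} g/cm^3")
```

Because the volume scales with the cube of the edge length, even a small shrinkage of the box edge translates into a noticeably higher density.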