Add support for NequIP models #60

sef43 · 2023-10-04T10:14:37Z

This PR adds support for NequIP models to openmm-ml. No pre-trained models are available, but the model framework is well defined, so users can run their own trained NequIP models in OpenMM simulations.

It also adds code to compute neighbor lists with PyTorch, which will be used for MACE models too. (The NNPOps neighbor list can be added later.)

Addresses #48 and see mir-group/nequip#288 for further discussion.

TODO: Need to add testing, but I am not sure how to do this cleanly in CI, considering NequIP needs to be installed via pip.

* create openmmml/models/nequippotential.py
* create example for toluene example/run_nequip.py

* implement PBC
* cleanup nequip types
* user specified unit conversions
* should work on GPU and CPU
* add example for toy model with PBC

* uses nequip 0.6.0 @develop branch
* uses torch-nl compute_neighborlist
* sets torch dtype from the loaded model metadata

peastman

This will be a really nice feature to have. I did a first pass through the code and made some comments.

Reading through this, it occurs to me that we really need proper documentation. The README gives a brief overview, but as we expand to more options than just ANI, and especially as we add models that are more complex to use and require installing other packages, that won't be enough.

peastman · 2023-10-05T23:11:30Z

openmmml/models/nequippotential.py

+        return NequIPPotentialImpl(name, model_path, distance_to_nm, energy_to_kJ_per_mol, atom_types)
+
+class NequIPPotentialImpl(MLPotentialImpl):
+    """This is the MLPotentialImpl implementing the NequIP potential.


We should clarify what this means, since there isn't really such a thing as "the NequIP potential". NequIP is a code, not a potential. Presumably this class can be used for any model implemented with that code, including Allegro models for example?

The class can be used with any model generated by NequIP. I'm currently verifying if this also applies to Allegro, though I anticipate it should.

peastman · 2023-10-05T23:13:43Z

openmmml/models/nequippotential.py

+                input_dict["pos"] = positions
+
+                # compute edges
+                mapping, shifts_idx = simple_nl(positions, input_dict["cell"], pbc, self.r_max)


Is there a reason not to use the NNPOps neighbor list? In mir-group/nequip#288 (comment) you found it was much faster. NNPOps is already a dependency of this module.
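
For readers following the neighbor-list discussion, the sketch below shows what such a call computes: every ordered pair (i, j) within the cutoff, plus the integer cell shift that moves j to its nearest periodic image. It is a brute-force, pure-Python illustration for an orthorhombic box, not the `simple_nl` routine from the diff and not the NNPOps kernel. The function name and the restriction of `r_max` to at most half the shortest box edge are assumptions made for brevity.

```python
def brute_force_neighbors(positions, box, r_max):
    """Return (pairs, shifts) for all ordered pairs within r_max.

    positions: list of (x, y, z) tuples
    box:       (Lx, Ly, Lz) edge lengths of an orthorhombic periodic box
    r_max:     cutoff distance, assumed <= min(box) / 2 (minimum image)
    """
    if r_max > min(box) / 2:
        raise ValueError("this sketch only handles r_max <= half the box edge")
    pairs, shifts = [], []
    n = len(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            shift, dist2 = [], 0.0
            for k in range(3):
                d = positions[j][k] - positions[i][k]
                s = -round(d / box[k])  # cell shift giving the nearest image of j
                d += s * box[k]
                shift.append(s)
                dist2 += d * d
            if dist2 <= r_max * r_max:
                pairs.append((i, j))
                shifts.append(tuple(shift))
    return pairs, shifts


# Two atoms that are neighbors only through the periodic boundary along x.
pairs, shifts = brute_force_neighbors([(0.1, 0.0, 0.0), (0.9, 0.0, 0.0)],
                                      box=(1.0, 1.0, 1.0), r_max=0.3)
```

The double loop scales as O(N^2), which is why a dedicated GPU neighbor list matters for systems of more than a few hundred atoms.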

peastman · 2023-10-05T23:16:01Z

openmmml/models/nequippotential.py

+
+    """
+
+    def __init__(self, name, model_path, distance_to_nm, energy_to_kJ_per_mol, atom_types):


Does NequIP have default units it uses, or are the units arbitrary? If it has a preferred set of units, we can put the conversion factors here as default values.

Based on the discussion at mir-group/nequip#288 (comment), it appears that NequIP/Allegro is entirely agnostic to units and preserves those of the training dataset. I think it's better for users to receive a TypeError indicating missing arguments rather than potentially proceeding with incorrect conversions.
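
A minimal sketch of that design choice, assuming a hypothetical factory `make_potential_config` (not the function in this PR): the conversion factors are keyword-only arguments without defaults, so omitting them fails immediately with a TypeError. The constants are standard conversions into OpenMM's units of nm and kJ/mol.

```python
# Standard conversion factors into OpenMM's internal units (nm, kJ/mol).
ANGSTROM_TO_NM = 0.1
KCAL_PER_MOL_TO_KJ_PER_MOL = 4.184
EV_TO_KJ_PER_MOL = 96.485  # rounded


def make_potential_config(model_path, *, distance_to_nm, energy_to_kJ_per_mol):
    """Hypothetical factory: both conversion factors are required keywords."""
    if distance_to_nm <= 0 or energy_to_kJ_per_mol <= 0:
        raise ValueError("conversion factors must be positive")
    return {
        "model_path": model_path,
        "distance_to_nm": distance_to_nm,
        "energy_to_kJ_per_mol": energy_to_kJ_per_mol,
    }


# A model trained on a dataset in angstroms and eV:
config = make_potential_config(
    "model.pth",
    distance_to_nm=ANGSTROM_TO_NM,
    energy_to_kJ_per_mol=EV_TO_KJ_PER_MOL,
)

# The same factor converts the model cutoff, e.g. 5 angstroms -> 0.5 nm.
r_max_nm = 5.0 * config["distance_to_nm"]
```

Calling `make_potential_config("model.pth")` with no factors raises a TypeError rather than silently assuming a unit system.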

peastman · 2023-10-05T23:17:34Z

openmmml/models/nequippotential.py

+                self.register_buffer('nm_to_distance', torch.tensor(1.0/distance_to_nm))
+                self.register_buffer('distance_to_nm', torch.tensor(distance_to_nm))


Why register two redundant buffers with the same information?

peastman · 2023-10-05T23:18:27Z

openmmml/models/nequippotential.py

+    model_path: str
+        path to deployed NequIP model


Should we also allow the user to directly pass in the model as a PyTorch object?

jchodera · 2023-11-11T16:15:54Z

Can we train a NequIP model on SPICE and enable that to be usable through openmm-ml?

svarner9 · 2024-04-10T04:16:42Z

Hello,

Has there been any further progress on this? I have used NequIP in LAMMPS but would like to instead use OpenMM because it is more compatible with the enhanced sampling packages that I use.

I have tried running simulations with a NequIP potential with openmm-ml in its current state; however, the speed is significantly slower than in LAMMPS. Both simulations run on a single GPU, but in LAMMPS I also use 32 CPU threads and Kokkos.

I am not sure if I am doing something incorrect in running openmm-ml, but currently it is unusable for my rather simple system of 645 atoms. Is it expected for it to be slow on a system of this size in its current state?

I can provide further information if needed. Thank you so much in advance!

Best,
Sam

JMorado · 2024-04-23T11:04:40Z

@svarner9, could you try the current implementation available here? It uses the NNPOps neighbor list, so I anticipate it might be slightly faster for a system of the size you're working with. You can create the MLPotential using something along these lines:

potential = MLPotential('nequip', modelPath='model.pth', lengthScale=0.1, energyScale=4.184)

What speed-up did you observe in your LAMMPS simulations compared to OpenMM/OpenMM-ML?

… directory

JMorado · 2024-05-07T10:07:28Z

This is done from my side. If someone could take a look and review the changes, that would be great. Performance benchmarks on test models can be found here.

Many thanks!

peastman · 2024-05-07T17:12:51Z

openmmml/models/nequippotential.py

+    ``atomTypes`` parameter. This must be a list containing an integer specifying
+    the atom type of each particle in the system. Note that by default the model


Based on the code, I think this description is incorrect. It actually should contain the atom type for each particle that will be modeled with the ML potential. So if you call createMixedSystem(), it should contain one element for each element of the atoms argument, not one for each particle in the System. Can you make this clear both here and in the description of the atomTypes argument below?

Thanks for noticing this. I have now clarified it. Now, atomTypes is also an argument passed during system creation. This allows creating systems with varying ML regions from the same MLPotential in cases where custom nequip atom types are being used.
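
The length rule discussed above can be sketched as a small check. The helper `count_ml_particles` is hypothetical and only illustrates the rule that `atomTypes` needs one entry per ML-modeled particle, not one per particle in the System.

```python
def count_ml_particles(num_system_particles, ml_atoms=None, atom_types=None):
    """Return the number of ML-modeled particles and validate atom_types.

    ml_atoms:   indices passed as the `atoms` argument of createMixedSystem(),
                or None when the whole System is modeled with the ML potential.
    atom_types: optional list of integer atom types, one per ML particle.
    """
    n_ml = num_system_particles if ml_atoms is None else len(ml_atoms)
    if atom_types is not None and len(atom_types) != n_ml:
        raise ValueError(
            f"atomTypes has {len(atom_types)} entries but {n_ml} particles "
            "are modeled with the ML potential"
        )
    return n_ml


# Example: a 2269-particle System in which only the first 22 atoms are ML atoms.
n_ml = count_ml_particles(2269, ml_atoms=list(range(22)), atom_types=[0] * 22)
```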

peastman · 2024-05-07T17:15:17Z

openmmml/models/nequippotential.py

+        Parameters
+        ----------
+        name : str
+            The name of the deployed model.


Actually the name that was specified in the MLPotential constructor, which in this case will always be nequip.

peastman · 2024-05-07T17:20:26Z

openmmml/models/nequippotential.py

+            typeNameToTypeIndex = {
+                typeNames: i for i, typeNames in enumerate(typeNames)
+            }
+            self.atomTypes = [


Modifying the object this method is called on is dangerous. If you create a MLPotential and then create multiple Systems from it, this will lead to incorrect results on the second and later calls.

Agreed. Fixed.
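
To illustrate the hazard, the sketch below contrasts a method that caches derived atom types on `self` with one that keeps them local. Both classes and their method names are hypothetical stand-ins, not the code in this PR.

```python
class MutatingPotential:
    """Caches derived types on self, so later calls reuse stale values."""

    def __init__(self, atom_types=None):
        self.atom_types = atom_types

    def create_system(self, symbols, type_names):
        if self.atom_types is None:
            index = {name: i for i, name in enumerate(type_names)}
            self.atom_types = [index[s] for s in symbols]  # mutates self
        return list(self.atom_types)


class ImmutablePotential:
    """Derives types into a local variable; self is never modified."""

    def __init__(self, atom_types=None):
        self.atom_types = atom_types

    def create_system(self, symbols, type_names):
        atom_types = self.atom_types
        if atom_types is None:
            index = {name: i for i, name in enumerate(type_names)}
            atom_types = [index[s] for s in symbols]
        return list(atom_types)


type_names = ["H", "C"]
bad, good = MutatingPotential(), ImmutablePotential()

first_bad = bad.create_system(["C", "H"], type_names)
second_bad = bad.create_system(["H", "H", "C"], type_names)    # stale result
first_good = good.create_system(["C", "H"], type_names)
second_good = good.create_system(["H", "H", "C"], type_names)  # correct
```

With the mutating version, the second System silently receives the atom types of the first.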

peastman · 2024-05-07T17:43:33Z

openmmml/models/nequippotential.py

+            model : str
+                The path to the deployed NequIP model.
+            lengthScale : float
+                The energy conversion factor from the model units to kJ/mol.
+            energyScale : float
+                The length conversion factor from the model units to nanometers.
+            dtype : torch.dtype
+                The precision of the model.
+            r_max : torch.Tensor
+                The maximum distance for the neighbor search.
+            inputDict : dict
+                The input dictionary passed to the model.


This doesn't match the actual list of attributes.

Fixed. Since buffers can also be accessed as attributes, should those be included in the docstring?

peastman · 2024-05-07T17:48:17Z

openmmml/models/nequippotential.py

+        self.model, metadata = nequip.scripts.deploy.load_deployed_model(
+            self.modelPath, device="cpu", freeze=False
+        )


Here is another instance of modifying self inside a method that should treat it as immutable.

peastman · 2024-05-07T17:56:27Z

test/test_mlpotential.py



 @pytest.mark.parametrize("implementation,platform_int", list(itertools.product(['nnpops', 'torchani'], list(platform_ints))))
 class TestMLPotential:

    def testCreateMixedSystem(self, implementation, platform_int):
-        pdb = app.PDBFile('alanine-dipeptide-explicit.pdb')
+        pdb = app.PDBFile(os.path.join(test_data_dir, 'alanine-dipeptide/alanine-dipeptide-explicit.pdb'))


The hardcoded unix path separator will fail on Windows.

peastman · 2024-05-07T17:58:22Z

test/test_mlpotential.py

+test_data_dir = os.path.dirname(os.path.abspath(__file__))
+test_data_dir = os.path.join(test_data_dir, "data")


The first line is confusing: you set test_data_dir to a directory that doesn't contain the test data. It's better to give it a different name, or just combine these two lines.
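
One way to address both path comments on this test file is sketched below, using a hypothetical helper `data_file`. The data directory is built in a single expression, and each path component is passed to `os.path.join` separately so that no separator is hardcoded.

```python
import os


def data_file(test_file, *parts):
    """Return the path of a file inside the 'data' directory next to test_file."""
    data_dir = os.path.join(os.path.dirname(os.path.abspath(test_file)), "data")
    return os.path.join(data_dir, *parts)


# Inside the test module one would pass __file__; a literal path is used here
# so that the snippet runs on its own.
pdb_path = data_file("test/test_mlpotential.py",
                     "alanine-dipeptide", "alanine-dipeptide-explicit.pdb")
```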

peastman · 2024-05-07T18:17:56Z

examples/nequip/toluene.pdb

@@ -0,0 +1,23 @@
+HETATM    1  C1  UNL     1       2.199  -0.143   0.062  1.00  0.00           C  


The correct residue name for toluene is MBN. See http://ligand-expo.rcsb.org/reports/M/MBN/index.html.

peastman · 2024-05-07T18:22:18Z

examples/nequip/README.md

+conda install -c conda-forge openmm-torch nnpops
+```
+
+Then install the development versions of NequIP and `openmm-ml` using pip:


We shouldn't be telling people to install pre-release versions of packages unless there's a really good reason. Any release of OpenMM-ML that doesn't include NequIP support also doesn't include these examples, so we should always direct people to the latest release. Why is the pre-release NequIP needed, and when will the necessary features be in a release?

Makes sense. I rewrote the instructions so that the packages from the OpenMM ecosystem, viz. NNPOps and openmm-ml, are installed from conda-forge.

Regarding the pre-release of NequIP, the one currently available through pip is version 0.5.6, while pip install git+https://github.com/mir-group/nequip@develop installs version 0.6.0. There's some discussion in this thread as to why the development version of NequIP might be better (or necessary) to use with this interface. I am sure @Linux-cpp-lisp is in a better position to provide more informed answers to your questions. @Linux-cpp-lisp, could you please clarify why the development version of NequIP is required and whether there are any plans to make it available through pip and/or conda? Many thanks!

Sorry for coming around to this thread late, and thanks both for your efforts on this!

This is something I need to fix and hope to have fixed and available normally on PyPI in the very near future; I'll let you know as soon as I have that up. Since you are using load_deployed_model, this will probably work with the current main as well, but ideally I will just get that released and we will restrict to >=0.6.0. Thanks for your flexibility while I clean this up.

OK, 0.6.0 is now available from PyPI: https://pypi.org/project/nequip/

Let me know if this resolves the issues for you.

(I would recommend restricting the nequip version to 0.6.0+ just for simplicity's sake)

Many thanks @Linux-cpp-lisp. That's really useful! I'll integrate it soon.
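
A minimal sketch of enforcing the recommended `nequip>=0.6.0` restriction at runtime. The helpers `parse_release` and `meets_minimum` are hypothetical, and the parser is a deliberate simplification that only reads the leading dotted integers of a version string.

```python
def parse_release(version):
    """Parse leading dotted integers, e.g. '0.6.0' -> (0, 6, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="0.6.0"):
    """Return True if the installed version is at least the minimum."""
    a, b = parse_release(installed), parse_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))  # pad so that '0.6' compares equal to '0.6.0'
    b += (0,) * (width - len(b))
    return a >= b


ok_release = meets_minimum("0.6.0")
old_release = meets_minimum("0.5.6")
```

The installed version string can be obtained with `importlib.metadata.version("nequip")` from the standard library; in packaging metadata the same constraint is simply `nequip>=0.6.0`.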

peastman · 2024-05-07T19:26:21Z

examples/nequip/run_nequip.py

+simulation.minimizeEnergy()
+
+# Run the simulation
+simulation.step(1000)


Before starting the simulation it's a good idea to call

simulation.context.setVelocitiesToTemperature(300*unit.kelvin)

Assume people will be copying your code, so we want to make sure it follows best practices. For the same reason, it's better to use a DCDReporter instead of a PDBReporter (PDB being a terrible format for trajectories).
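
Put together, the suggested run sequence might look like the sketch below. The wrapper function `run_production` and its default values are assumptions; the OpenMM calls themselves (`minimizeEnergy`, `setVelocitiesToTemperature`, `DCDReporter`, `step`) are the standard ones.

```python
def run_production(simulation, n_steps=1000, temperature_kelvin=300.0,
                   dcd_path="trajectory.dcd", report_interval=100):
    """Minimize, assign velocities, attach a DCD reporter, then run."""
    # Imported lazily so this helper can be defined without OpenMM installed.
    from openmm import unit
    from openmm.app import DCDReporter

    simulation.minimizeEnergy()
    # Draw initial velocities from a Maxwell-Boltzmann distribution.
    simulation.context.setVelocitiesToTemperature(temperature_kelvin * unit.kelvin)
    # DCD is a compact binary trajectory format, unlike PDB.
    simulation.reporters.append(DCDReporter(dcd_path, report_interval))
    simulation.step(n_steps)
```

Here `simulation` is an existing `openmm.app.Simulation` whose positions have already been set.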

peastman · 2024-05-07T19:29:55Z

examples/nequip/run_nequip.ipynb

+    "!mamba install -c conda-forge openmm-torch nnpops pytorch=*=cuda*\n",
+    "\n",
+    "!pip install git+https://github.com/mir-group/nequip@develop\n",
+    "!pip install git+https://github.com/sef43/openmm-ml@nequip"


Let's not have the example install OpenMM-ML from your personal fork! Remember that whatever you do in the example, users will copy it.

Fixed. And thanks for the heads up, will keep that in mind :)

svarner9 · 2024-05-08T02:16:23Z

@JMorado
I went ahead and tested out the version on the nequip branch, however I am unable to get it to run on a GPU. When I specify the potential and the platform in the following way,

potential = MLPotential("nequip",
                            modelPath='model.pth',
                            lengthScale=0.1,
                            energyScale=96.48)
...

plat = openmm.Platform.getPlatformByName("CUDA")
properties = {"Precision": "double", "DeviceIndex": "0",
              "UseBlockingSync": "false"}
simulation = app.Simulation(topology, system, integrator, plat, properties)

I get the following set of warnings and errors:

/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/torchani/aev.py:16: UserWarning: cuaev not installed
  warnings.warn("cuaev not installed")
/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/nequip/scripts/deploy.py:138: UserWarning: Models deployed before v0.6.0 don't contain information about their default_dtype or model_dtype; assuming the old default of float32 for both, but this might not be right if you had explicitly set default_dtype=float64.
  warnings.warn(
/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/nequip/utils/_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. At present we *strongly* recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
  warnings.warn(
/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/nequip/utils/_global_options.py:70: UserWarning: Setting the GLOBAL value for jit fusion strategy to `[('DYNAMIC', 3)]` which is different than the previous value of `[('STATIC', 2), ('DYNAMIC', 10)]`
  warnings.warn(
Traceback (most recent call last):
  File "/home/svarner/Practicum/sim.py", line 174, in <module>
    run(1,1,1,1,1)
  File "/home/svarner/Practicum/sim.py", line 145, in run
    simulation = app.Simulation(topology, system, integrator, plat, properties)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/openmm/app/simulation.py", line 106, in __init__
    self.context = mm.Context(self.system, self.integrator, platform, platformProperties)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/openmm/openmm.py", line 12171, in __init__
    _openmm.Context_swiginit(self, _openmm.new_Context(*args))
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
openmm.OpenMMException: Specified a Platform for a Context which does not support all required kernels

Here is my mamba list:

# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
ase                       3.22.1             pyhd8ed1ab_1    conda-forge
blinker                   1.8.2              pyhd8ed1ab_0    conda-forge
brotli                    1.1.0                hd590300_1    conda-forge
brotli-bin                1.1.0                hd590300_1    conda-forge
brotli-python             1.1.0           py311hb755f60_1    conda-forge
bzip2                     1.0.8                hd590300_5    conda-forge
c-ares                    1.28.1               hd590300_0    conda-forge
ca-certificates           2024.2.2             hbcca054_0    conda-forge
cached-property           1.5.2                hd8ed1ab_1    conda-forge
cached_property           1.5.2              pyha770c72_1    conda-forge
certifi                   2024.2.2           pyhd8ed1ab_0    conda-forge
charset-normalizer        3.3.2              pyhd8ed1ab_0    conda-forge
click                     8.1.7           unix_pyh707e725_0    conda-forge
contourpy                 1.2.1           py311h9547e67_0    conda-forge
cudatoolkit               11.5.2              hbdc67f6_13    conda-forge
cycler                    0.12.1             pyhd8ed1ab_0    conda-forge
e3nn                      0.5.1                    pypi_0    pypi
filelock                  3.14.0             pyhd8ed1ab_0    conda-forge
flask                     3.0.3              pyhd8ed1ab_0    conda-forge
fonttools                 4.51.0          py311h459d7ec_0    conda-forge
freetype                  2.12.1               h267a509_2    conda-forge
fsspec                    2024.3.1           pyhca7485f_0    conda-forge
gmp                       6.3.0                h59595ed_1    conda-forge
gmpy2                     2.1.5           py311he48d604_0    conda-forge
h5py                      3.11.0          nompi_py311hebc2b07_100    conda-forge
hdf5                      1.14.3          nompi_h4f84152_101    conda-forge
idna                      3.7                pyhd8ed1ab_0    conda-forge
importlib-metadata        7.1.0              pyha770c72_0    conda-forge
importlib_metadata        7.1.0                hd8ed1ab_0    conda-forge
itsdangerous              2.2.0              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.3              pyhd8ed1ab_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.5           py311h9547e67_1    conda-forge
krb5                      1.21.2               h659d440_0    conda-forge
lark-parser               0.12.0             pyhd8ed1ab_0    conda-forge
lcms2                     2.16                 hb7c19ff_0    conda-forge
ld_impl_linux-64          2.40                 h55db66e_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libabseil                 20230802.1      cxx17_h59595ed_0    conda-forge
libaec                    1.1.3                h59595ed_0    conda-forge
libblas                   3.9.0           22_linux64_openblas    conda-forge
libbrotlicommon           1.1.0                hd590300_1    conda-forge
libbrotlidec              1.1.0                hd590300_1    conda-forge
libbrotlienc              1.1.0                hd590300_1    conda-forge
libcblas                  3.9.0           22_linux64_openblas    conda-forge
libcurl                   8.7.1                hca28451_0    conda-forge
libdeflate                1.20                 hd590300_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 hd590300_2    conda-forge
libexpat                  2.6.2                h59595ed_0    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.2.0               h77fa898_7    conda-forge
libgfortran-ng            13.2.0               h69a702a_7    conda-forge
libgfortran5              13.2.0               hca663fb_7    conda-forge
libgomp                   13.2.0               h77fa898_7    conda-forge
libjpeg-turbo             3.0.0                hd590300_1    conda-forge
liblapack                 3.9.0           22_linux64_openblas    conda-forge
libnghttp2                1.58.0               h47da74e_1    conda-forge
libnsl                    2.0.1                hd590300_0    conda-forge
libopenblas               0.3.27          pthreads_h413a1c8_0    conda-forge
libpng                    1.6.43               h2797004_0    conda-forge
libprotobuf               4.25.1               hf27288f_2    conda-forge
libsqlite                 3.45.3               h2797004_0    conda-forge
libssh2                   1.11.0               h0841786_0    conda-forge
libstdcxx-ng              13.2.0               hc0a3c3a_7    conda-forge
libtiff                   4.6.0                h1dd3fc0_3    conda-forge
libtorch                  2.1.2           cpu_generic_ha017de0_3    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libuv                     1.48.0               hd590300_0    conda-forge
libwebp-base              1.4.0                hd590300_0    conda-forge
libxcb                    1.15                 h0b41bf4_0    conda-forge
libxcrypt                 4.4.36               hd590300_1    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
markupsafe                2.1.5           py311h459d7ec_0    conda-forge
matplotlib-base           3.8.4           py311h54ef318_0    conda-forge
mpc                       1.3.1                hfe3b2da_0    conda-forge
mpfr                      4.2.1                h9458935_1    conda-forge
mpmath                    1.3.0              pyhd8ed1ab_0    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
ncurses                   6.4.20240210         h59595ed_0    conda-forge
nequip                    0.6.0                    pypi_0    pypi
networkx                  3.3                pyhd8ed1ab_1    conda-forge
nnpops                    0.6             cpu_py311h7697b17_7    conda-forge
nomkl                     1.0                  h5ca1d4c_0    conda-forge
numpy                     1.26.4          py311h64a7726_0    conda-forge
ocl-icd                   2.3.2                hd590300_1    conda-forge
ocl-icd-system            1.0.0                         1    conda-forge
openjpeg                  2.5.2                h488ebb8_0    conda-forge
openmm                    8.1.1           py311h28d7ac7_1    conda-forge
openmm-torch              1.4             cpu_py311h446247e_4    conda-forge
openmmml                  1.1                      pypi_0    pypi
openssl                   3.3.0                hd590300_0    conda-forge
opt-einsum                3.3.0                    pypi_0    pypi
opt-einsum-fx             0.1.4                    pypi_0    pypi
packaging                 24.0               pyhd8ed1ab_0    conda-forge
pillow                    10.3.0          py311h18e6fac_0    conda-forge
pip                       24.0               pyhd8ed1ab_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pyparsing                 3.1.2              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
python                    3.11.9          hb806964_0_cpython    conda-forge
python-dateutil           2.9.0              pyhd8ed1ab_0    conda-forge
python_abi                3.11                    4_cp311    conda-forge
pytorch                   2.1.2           cpu_generic_py311h1584bb0_3    conda-forge
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h8228510_1    conda-forge
requests                  2.31.0             pyhd8ed1ab_0    conda-forge
scipy                     1.13.0          py311h517d4fd_1    conda-forge
setuptools                65.3.0             pyhd8ed1ab_1    conda-forge
setuptools-scm            6.3.2              pyhd8ed1ab_0    conda-forge
setuptools_scm            6.3.2                hd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sleef                     3.5.1                h9b69904_2    conda-forge
sympy                     1.12            pypyh9d50eac_103    conda-forge
tk                        8.6.13          noxft_h4845f30_101    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
torch-ema                 0.3                      pypi_0    pypi
torch-runstats            0.2.0                    pypi_0    pypi
torchani                  2.2.4           cpu_py311h12a0d1d_3    conda-forge
tqdm                      4.66.4                   pypi_0    pypi
typing_extensions         4.11.0             pyha770c72_0    conda-forge
tzdata                    2024a                h0c530f3_0    conda-forge
urllib3                   2.2.1              pyhd8ed1ab_0    conda-forge
werkzeug                  3.0.3              pyhd8ed1ab_0    conda-forge
wheel                     0.43.0             pyhd8ed1ab_1    conda-forge
xorg-libxau               1.0.11               hd590300_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
zipp                      3.17.0             pyhd8ed1ab_0    conda-forge
zstd                      1.5.6                ha6fb4c9_0    conda-forge

If I don't specify any platform, then the simulation runs, but extremely slowly since it is on CPU.

Thank you so much in advance!

Best,
Sam

peastman · 2024-05-08T03:38:03Z

That means a plugin couldn't be loaded. Try printing the value of Platform.getPluginLoadFailures(). It will tell you which ones failed, and what the errors were.

Usually it's because some library they depended on couldn't be found, and it can be fixed by adding the directory containing the library to LD_LIBRARY_PATH.

svarner9 · 2024-05-08T03:55:57Z

Thank you for the quick response!

I tried that based on some previous replies of yours that I found. I ran the following:

print(pluginLoadedLibNames)
print(Platform.getPluginLoadFailures())

and the output was:

('/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMPME.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMCPU.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMCUDA.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMOpenCL.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMRPMDCUDA.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMDrudeCUDA.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMAmoebaCUDA.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMRPMDOpenCL.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMTorchOpenCL.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMDrudeOpenCL.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMAmoebaOpenCL.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMRPMDReference.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMTorchReference.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMDrudeReference.so', '/home/svarner/miniconda3/envs/practicum/lib/plugins/libOpenMMAmoebaReference.so')

()

The failures command returned an empty tuple.

Best,
Sam

peastman · 2024-05-08T04:05:30Z

The versions of PyTorch and OpenMM-Torch you have installed are CPU only:

openmm-torch              1.4             cpu_py311h446247e_4    conda-forge
pytorch                   2.1.2           cpu_generic_py311h1584bb0_3    conda-forge

That might be because you have an older version of cudatoolkit:

cudatoolkit               11.5.2              hbdc67f6_13    conda-forge

If you upgrade it to 11.8, you might be able to get it to install the CUDA version of PyTorch. Conda installation issues like this tend to be frustrating and hard to figure out. They often depend on the precise order you install packages in.
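
A quick way to spot this situation is to scan the output of `conda list` (or `mamba list`) for GPU-relevant packages whose build string starts with `cpu`, as in the listing above. The helper below is a hypothetical sketch, and the default package names are an assumption based on the packages in this thread.

```python
def cpu_only_packages(listing,
                      names=("pytorch", "libtorch", "openmm-torch",
                             "nnpops", "torchani")):
    """Return the watched packages whose build string starts with 'cpu'."""
    found = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0].startswith("#"):
            continue  # skip header, comment, and malformed lines
        name, _version, build = fields[:3]
        if name in names and build.startswith("cpu"):
            found.append(name)
    return found


sample = """\
# Name                    Version                   Build  Channel
cudatoolkit               11.5.2              hbdc67f6_13    conda-forge
openmm-torch              1.4             cpu_py311h446247e_4    conda-forge
pytorch                   2.1.2           cpu_generic_py311h1584bb0_3    conda-forge
"""
cpu_builds = cpu_only_packages(sample)
```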

svarner9 · 2024-05-08T05:09:50Z

Ahhh I see. Thank you!

I went ahead and uninstalled openmm-torch and pytorch. I upgraded the cudatoolkit, and then installed the cuda version of pytorch:

install pytorch pytorch-cuda=11.8 -c pytorch -c nvidia

Installing openmm-torch downgraded it back to the cpu version, but then installing nnpops upgraded it back to the cuda version. I agree, conda installations are very frustrating.

It is working on GPU now, but only getting about 0.2 ns/day, whereas on lammps I was getting 1.5 ns/day. To your knowledge, could any of the following warnings have to do with it being slow?

/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/nequip/scripts/deploy.py:138: UserWarning: Models deployed before v0.6.0 don't contain information about their default_dtype or model_dtype; assuming the old default of float32 for both, but this might not be right if you had explicitly set default_dtype=float64.
  warnings.warn(
/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/nequip/utils/_global_options.py:59: UserWarning: !! Upstream issues in PyTorch versions >1.11 have been seen to cause unusual performance degredations on some CUDA systems that become worse over time; see https://github.com/mir-group/nequip/discussions/311. At present we *strongly* recommend the use of PyTorch 1.11 if using CUDA devices; while using other versions if you observe this problem, an unexpected lack of this problem, or other strange behavior, please post in the linked GitHub issue.
  warnings.warn(
/home/svarner/miniconda3/envs/practicum/lib/python3.11/site-packages/nequip/utils/_global_options.py:70: UserWarning: Setting the GLOBAL value for jit fusion strategy to `[('DYNAMIC', 3)]` which is different than the previous value of `[('STATIC', 2), ('DYNAMIC', 10)]`
  warnings.warn(

I tried to install the packages in a way that would allow me to use pytorch 1.11.0 (which, according to the warning, is the most stable version with nequip). However, as far as I can tell, there is no way to use pytorch 1.11.0 with openmm-torch: every time I installed openmm-torch, it installed pytorch 2.1.2.

This is the order that I did everything:

mamba create -n env
mamba activate env
mamba install python=3.10
mamba install -c conda-forge openmm cudatoolkit=11.8
pip install git+https://github.com/mir-group/nequip@develop
pip install git+https://github.com/sef43/openmm-ml@nequip
mamba install pytorch=1.11 pytorch-cuda=11.8 -c pytorch -c nvidia
mamba install -c conda-forge openmm-torch nnpops

JMorado · 2024-05-08T12:47:06Z

Many thanks for the thorough review, @peastman! Most of it should be now resolved.

Thanks for testing, @svarner9. I think the slow performance you're seeing is not related to that warning, the underlying issue of which is described here. You could test if the issue that underlies that warning is indeed present by identifying a slowdown in performance over time. I ran some performance benchmarks on systems much smaller than yours and did not see any decrease in performance over time, and the simulation speed is around what I would expect.

If that is your baseline OpenMM performance, I wonder what could be causing that. Do you remember, by any chance, what performance you were getting with the previous neighbor list? Does anyone have any ideas about whether it's possible to improve performance here?

svarner9 · 2024-05-08T16:16:48Z

Yes many thanks @peastman for the help!

@JMorado I am not sure, but I can think of a few things that might be the issue. I am not an expert and have not looked through the code, so these might be a bit naive.

In LAMMPS the nequip pairstyle works with Kokkos, so in that case I was using 1 GPU + 32 CPUs.

mpiexec -n 1 ./lmp -in in.script -k on g 1 t 32 -sf kk -pk kokkos newton on neigh full

The LAMMPS nequip pairstyle uses libtorch instead of pytorch, which could make a difference?
When reading in the model, is the cutoff set to the cutoff of the MLP? Most of them have very short cutoffs of around 5 Angstroms, so if that cutoff is not being used for neighborlists, then that could be leading to slow performance. Is that something that should be set separately?
I am getting this warning for jit but I am not sure if it is important or could be affecting performance. I have seen the NequIP devs say that it can usually be silently ignored.

/home/svarner/miniconda3/envs/practicum/lib/python3.10/site-packages/nequip/utils/_global_options.py:70: UserWarning: Setting the GLOBAL value for jit fusion strategy to `[('DYNAMIC', 3)]` which is different than the previous value of `[('STATIC', 2), ('DYNAMIC', 10)]`
  warnings.warn(

Best,
Sam
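
To illustrate why the neighbor-list cutoff matters for cost, here is a brute-force sketch of a periodic neighbor search. It is not the neighbor list used in this PR or in NequIP: it is O(N²), assumes an orthorhombic box, and the function name `neighbor_pairs` is hypothetical. The number of pairs passed to the model grows roughly with the cube of the cutoff, so a cutoff larger than the model's own adds work without changing the result.

```python
import itertools


def neighbor_pairs(positions, box, cutoff):
    """Return all pairs (i, j), i < j, closer than cutoff under minimum image.

    positions: list of (x, y, z) tuples; box: (Lx, Ly, Lz) of an orthorhombic
    cell; all in the same length unit. Only valid when cutoff is at most half
    the shortest box length.
    """
    pairs = []
    for i, j in itertools.combinations(range(len(positions)), 2):
        d2 = 0.0
        for axis in range(3):
            delta = positions[j][axis] - positions[i][axis]
            # Shift by whole box lengths to reach the nearest periodic image.
            delta -= box[axis] * round(delta / box[axis])
            d2 += delta * delta
        if d2 < cutoff * cutoff:
            pairs.append((i, j))
    return pairs
```

For example, two atoms at x = 0.1 and x = 0.9 in a unit box are 0.2 apart through the periodic boundary, so they are neighbors for a cutoff of 0.3 but not for a cutoff of 0.1.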

Linux-cpp-lisp · 2024-05-10T17:23:38Z

Is there an option to predict a formation energy instead of total energy, or to subtract off per-atom mean energies? That leads to a much smaller output value and better accuracy.

We actually do this internally, at least from develop onward---single precision calculations are done in a more numerically favorable range, and the final energy scalings, shiftings, and sums are done in float64, regardless of the precision of the weights. The final predictions you get should be float64, and if they aren't, something might be off.
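
The effect of the output range on single-precision accuracy can be seen with plain arithmetic. The sketch below uses made-up energies in arbitrary units and is not NequIP code: it rounds values to float32 with the standard `struct` module and compares a float64 sum of a small float32 output plus a large shift against storing the large total in float32.

```python
import struct


def to_float32(x):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical numbers: small model outputs plus a large constant energy shift.
shift = -250000.0
small_a = to_float32(-0.123456)  # float32 output in a favorable range
small_b = to_float32(-0.120456)

# Adding the shift in float64 keeps the small difference between the two.
total_a = small_a + shift
total_b = small_b + shift
diff_float64_sum = total_a - total_b

# Storing the totals in float32 loses it: the float32 spacing near 250000
# is 2**-6 = 0.015625, larger than the 0.003 difference.
diff_float32_total = to_float32(total_a) - to_float32(total_b)

print(diff_float64_sum, diff_float32_total)
```

Here the float64 sum preserves the difference of about -0.003, while the float32 totals are identical.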

Regarding the reproducibility of energies between ASE and OpenMM: you can try turning off TF32, or even better using a fully F64 model (default_dtype: float64 and model_dtype: float64) to ensure that this is just numerics as a sanity check.

Linux-cpp-lisp · 2024-05-10T17:25:58Z

@svarner9 a few questions on performance:

What are the actual LAMMPS vs OpenMM numbers? Not sure where they were in this thread.
Yes, there will be additional Python and doubled neighborlist overhead in OpenMM, both of which are absent in pair_allegro. This should be more important for smaller models and smaller systems.
You can ignore that particular warning about the fusion strategy safely, it is just there to ensure that nequip never silently sets global state when called from someone else's program

peastman · 2024-05-10T17:48:16Z

There shouldn't be any overhead from Python. The model gets compiled to torchscript, and the simulation gets run by C++ code.

Linux-cpp-lisp · 2024-05-10T18:06:47Z

Do you call TorchScript from Python here, or directly from C++? Not that I would expect a roundtrip through Python to matter much, just curious.

peastman · 2024-05-10T18:34:42Z

It's called directly from C++.

JMorado · 2024-05-14T16:09:16Z

@peastman @Linux-cpp-lisp, I've trained a model with these settings:

default_dtype: float64
model_dtype: float64
allow_tf32: true

and the energy and force differences between ASE and OpenMM are indeed very small, on the order of $10^{-10}$, when combined with {"Precision": "double"} in the simulation settings.
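
Comparing the two codes requires putting both results in the same units first. The sketch below assumes the model works in eV and Å (an assumption, since the PR exposes `distance_to_nm` and `energy_to_kJ_per_mol` as user-specified conversions) and converts to OpenMM's kJ/mol and nm; the helper names are hypothetical.

```python
# 1 eV in kJ/mol, from the exact SI values of the elementary charge and
# the Avogadro constant.
EV_TO_KJ_PER_MOL = 1.602176634e-19 * 6.02214076e23 / 1000.0
ANGSTROM_TO_NM = 0.1


def energy_to_kj_per_mol(energy_ev):
    """Convert an energy from eV to kJ/mol."""
    return energy_ev * EV_TO_KJ_PER_MOL


def force_to_kj_per_mol_nm(force_ev_per_angstrom):
    """Convert a force component from eV/Angstrom to kJ/mol/nm."""
    return force_ev_per_angstrom * EV_TO_KJ_PER_MOL / ANGSTROM_TO_NM


def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))
```

For example, `max_abs_diff` applied to the flattened, converted ASE forces and the OpenMM forces gives a single number to compare against the expected numerical noise.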

Linux-cpp-lisp · 2024-05-15T03:18:52Z

@JMorado great!

(Note that allow_tf32: true is a no-op when model_dtype: float64 and we should probably error on this configuration, but that doesn't change the results.)

svarner9 · 2024-05-22T20:12:30Z

@Linux-cpp-lisp I was getting 1.5 ns/day on LAMMPS and 0.2 ns/day on OpenMM for a system with 645 atoms.
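
For reference, these figures can be turned into a speed ratio and a wall-clock cost per step. The timestep is not stated in this thread, so the 1 fs used below is purely an assumption for illustration.

```python
SECONDS_PER_DAY = 86400.0


def ms_per_step(ns_per_day, timestep_fs):
    """Wall-clock milliseconds per MD step for a given simulation speed."""
    steps_per_day = ns_per_day * 1.0e6 / timestep_fs  # 1 ns = 1e6 fs
    return SECONDS_PER_DAY * 1000.0 / steps_per_day


lammps_ns_per_day = 1.5
openmm_ns_per_day = 0.2
assumed_timestep_fs = 1.0  # assumption: not given in the thread

ratio = lammps_ns_per_day / openmm_ns_per_day
print(f"LAMMPS is about {ratio:.1f}x faster")
print(f"LAMMPS: {ms_per_step(lammps_ns_per_day, assumed_timestep_fs):.1f} ms/step")
print(f"OpenMM: {ms_per_step(openmm_ns_per_day, assumed_timestep_fs):.1f} ms/step")
```

The ratio (about 7.5) does not depend on the assumed timestep; the per-step times do.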

sef43 and others added 18 commits February 15, 2023 12:01

Add NequIP support, WIP

018f1d4

* create openmmml/models/nequippotential.py * create example for toluene example/run_nequip.py

update links for colab

e89729a

add example output

11d3477

Update NequIP MLP

2593949

* implement PBC * cleanup nequip types * user specified unit conversions * should work on GPU and CPU * add example for toy model with PBC

fix links

e8681cb

updated nequippotential

fe87d84

* uses nequip 0.6.0 @develop branch * uses torch-nl compute_neighborlist * sets torch dtype from the loaded model metadata

fix colab links

c299b46

Add NequIP README

3457016

change neighborlist to a simple but correct version

c7ccaf9

improve simple_nl code

e1b4a37

triclinic simple_nl

7fb5c15

Merge branch 'main' of github.com:sef43/openmm-ml into nequip

404503e

update implementation

72f0903

cleanup files

f023eb9

update readme

639918b

update docstring

ebda004

run on gpu

e1f0801

fix colab

e0db0f9

peastman reviewed Oct 5, 2023

sef43 mentioned this pull request Nov 3, 2023

Add Documentation #65

Merged

JMorado added 2 commits April 23, 2024 11:49

Updated NequIP potential implementation

60e33ea

Updated examples

b7ab9ed

JMorado added 5 commits April 23, 2024 12:07

Updated docstrings

e847412

Removed utils.py (simple_nl)

e010324

Fixed PBC

b1f30fa

Added NequIP tests, updated MLPotential tests, and restructured tests directory

14fc1d9

Changed friction coefficient value in examples

a807679

JMorado added 4 commits May 3, 2024 19:29

Recreate inputDict each time to prevent issues if it gets modified

a777f5b

Updated examples

c699980

Merge branch 'openmm:main' into nequip

fdd8cf0

Updated dosctrings

cdedeb0

peastman reviewed May 7, 2024

Fixes to NequIP interface

aae22a8

JMorado added 2 commits May 14, 2024 15:27

Fixed forces output for hybrid ML/MM simulations. Padded forces must be returned.

62c7cf9

Updated toluene-explicit files

e54aa93

JMorado force-pushed the nequip branch from 44e8621 to e54aa93 Compare May 14, 2024 15:44

JMorado added 3 commits May 14, 2024 16:45

Avoid changing the positions dtype to prevent any decrease in precision

e85828c

Avoid changing the cell dtype to prevent any decrease in precision due to downcasting to the model's dtype

470f3cf

Updated references to pip package

e3c0cf2

JMorado mentioned this pull request Sep 22, 2024

🌟 [FEATURE] OpenMM mir-group/nequip#288

Open
