pd: support paddle backend and water/se_e2_a (#4302)
Split <#4157> into several pull requests. This one:

1. Adds the core modules of the Paddle backend (`deepmd.pd.*`) and the related backend unit tests.
2. Supports training/testing/freezing for the water/se_e2_a example (C++ inference will be supported in a subsequent pull request); see the descriptor sketch after this list.
3. Adds se_e2_a-related unit tests.
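
For context, the descriptor section of the water/se_e2_a example roughly looks like the sketch below, written as a Python dict mirroring the `"descriptor"` block of `input.json`. The values are those of the standard bundled example and are not part of this diff; treat them as illustrative.

```python
# Illustrative descriptor config for the water/se_e2_a example (assumed
# values from the standard bundled example, not taken from this diff).
descriptor = {
    "type": "se_e2_a",
    "sel": [46, 92],          # max neighbors per atom type (O, H)
    "rcut_smth": 0.50,        # smoothing starts here (Angstrom)
    "rcut": 6.00,             # cutoff radius (Angstrom)
    "neuron": [25, 50, 100],  # embedding-net widths
    "resnet_dt": False,
    "axis_neuron": 16,
    "seed": 1,
}
```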

Related PR to be merged:

- [x] <PaddlePaddle/Paddle#69139>

## Accuracy test

### PyTorch

![image](https://github.com/user-attachments/assets/cea8f313-4a57-4575-b55a-b6cf577654a2)

### Paddle
``` log
deepmd.utils.batch_size                       Adjust batch size from 1024 to 2048
deepmd.utils.batch_size                       Adjust batch size from 2048 to 4096
deepmd.entrypoints.test                       # number of test data : 30 ,
deepmd.entrypoints.test                       Energy MAE         : 7.467160e-02 eV
deepmd.entrypoints.test                       Energy RMSE        : 8.981154e-02 eV
deepmd.entrypoints.test                       Energy MAE/Natoms  : 3.889146e-04 eV
deepmd.entrypoints.test                       Energy RMSE/Natoms : 4.677685e-04 eV
deepmd.entrypoints.test                       Force  MAE         : 4.495974e-02 eV/A
deepmd.entrypoints.test                       Force  RMSE        : 5.883696e-02 eV/A
deepmd.entrypoints.test                       Virial MAE         : 4.683873e+00 eV
deepmd.entrypoints.test                       Virial RMSE        : 6.298489e+00 eV
deepmd.entrypoints.test                       Virial MAE/Natoms  : 2.439517e-02 eV
deepmd.entrypoints.test                       Virial RMSE/Natoms : 3.280463e-02 eV
```
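
The per-atom metrics above are simply the totals divided by the number of atoms in a frame. A quick sanity check, assuming the 192-atom water system used by the example:

```python
# Sanity check of the per-atom metrics. Assumption: each test frame
# contains 192 atoms (the standard water example system).
natoms = 192
energy_mae = 7.467160e-02   # eV, "Energy MAE" from the log
virial_rmse = 6.298489e+00  # eV, "Virial RMSE" from the log

print(energy_mae / natoms)   # ~3.889e-04 eV, matches "Energy MAE/Natoms"
print(virial_rmse / natoms)  # ~3.280e-02 eV, matches "Virial RMSE/Natoms"
```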
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Release Notes

- **New Features**
  - Introduced support for PaddlePaddle in the DeePMD framework, enhancing model training and evaluation capabilities.
  - Added new backend options and configuration files for multitask models.
  - Implemented new classes and methods for handling Paddle-specific functionalities, including descriptor calculations and model evaluations.
  - Enhanced the command-line interface to include Paddle as a backend option.
  - Expanded the functionality for managing Paddle dependencies and configurations in the testing framework.

- **Bug Fixes**
  - Improved error handling and robustness in various components across the framework.

- **Tests**
  - Expanded the test suite to include Paddle-specific tests, ensuring consistency and reliability across different backends.
  - Introduced unit tests for new functionalities related to Paddle, including model evaluations and descriptor calculations.
  - Added tests to validate force gradient calculations and smoothness properties in models.
  - Implemented tests for neighbor statistics and region transformations, ensuring accuracy in calculations.

- **Documentation**
  - Updated documentation across multiple modules to reflect new features and usage instructions.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: HydrogenSulfate <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
HydrogenSulfate and pre-commit-ci[bot] authored Nov 27, 2024
1 parent 3cdf407 commit 4a45fe5
Showing 136 changed files with 21,039 additions and 25 deletions.
1 change: 1 addition & 0 deletions .github/workflows/test_cuda.yml
@@ -51,6 +51,7 @@ jobs:
- run: |
export PYTORCH_ROOT=$(python -c 'import torch;print(torch.__path__[0])')
export TENSORFLOW_ROOT=$(python -c 'import importlib,pathlib;print(pathlib.Path(importlib.util.find_spec("tensorflow").origin).parent)')
source/install/uv_with_retry.sh pip install --system --pre paddlepaddle-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/cu123/
source/install/uv_with_retry.sh pip install --system -v -e .[gpu,test,lmp,cu12,torch,jax] mpi4py
env:
DP_VARIANT: cuda
1 change: 1 addition & 0 deletions .github/workflows/test_python.yml
@@ -31,6 +31,7 @@ jobs:
export PYTORCH_ROOT=$(python -c 'import torch;print(torch.__path__[0])')
source/install/uv_with_retry.sh pip install --system -e .[test,jax] mpi4py
source/install/uv_with_retry.sh pip install --system horovod --no-build-isolation
source/install/uv_with_retry.sh pip install --system --pre "paddlepaddle" -i https://www.paddlepaddle.org.cn/packages/nightly/cpu/
env:
# Please note that uv has some issues with finding
# existing TensorFlow package. Currently, it uses
133 changes: 133 additions & 0 deletions backend/find_paddle.py
@@ -0,0 +1,133 @@
# SPDX-License-Identifier: LGPL-3.0-or-later
import importlib
import os
import site
from functools import (
    lru_cache,
)
from importlib.machinery import (
    FileFinder,
)
from importlib.util import (
    find_spec,
)
from pathlib import (
    Path,
)
from sysconfig import (
    get_path,
)
from typing import (
    Optional,
    Union,
)


@lru_cache
def find_paddle() -> tuple[Optional[str], list[str]]:
    """Find PaddlePaddle library.

    Tries to find PaddlePaddle in the order of:

    1. Environment variable `PADDLE_ROOT` if set
    2. The current Python environment.
    3. user site packages directory if enabled
    4. system site packages directory (purelib)

    Considering the default PaddlePaddle package still uses old CXX11 ABI, we
    cannot install it automatically.

    Returns
    -------
    str, optional
        PaddlePaddle library path if found.
    list of str
        Paddle requirement if not found. Empty if found.
    """
    if os.environ.get("DP_ENABLE_PADDLE", "0") == "0":
        return None, []
    requires = []
    pd_spec = None

    if (pd_spec is None or not pd_spec) and os.environ.get("PADDLE_ROOT") is not None:
        site_packages = Path(os.environ.get("PADDLE_ROOT")).parent.absolute()
        pd_spec = FileFinder(str(site_packages)).find_spec("paddle")

    # get paddle spec
    # note: isolated build will not work for backend
    if pd_spec is None or not pd_spec:
        pd_spec = find_spec("paddle")

    if not pd_spec and site.ENABLE_USER_SITE:
        # first search Paddle from user site-packages before global site-packages
        site_packages = site.getusersitepackages()
        if site_packages:
            pd_spec = FileFinder(site_packages).find_spec("paddle")

    if not pd_spec:
        # purelib gets site-packages path
        site_packages = get_path("purelib")
        if site_packages:
            pd_spec = FileFinder(site_packages).find_spec("paddle")

    # get install dir from spec
    try:
        pd_install_dir = pd_spec.submodule_search_locations[0]  # type: ignore
        # AttributeError if pd_spec is None
        # TypeError if submodule_search_locations are None
        # IndexError if submodule_search_locations is an empty list
    except (AttributeError, TypeError, IndexError):
        pd_install_dir = None
        requires.extend(get_pd_requirement()["paddle"])
    return pd_install_dir, requires


@lru_cache
def get_pd_requirement(pd_version: str = "") -> dict:
    """Get PaddlePaddle requirement when Paddle is not installed.

    If pd_version is not given and the environment variable `PADDLE_VERSION` is set, use it as the requirement.

    Parameters
    ----------
    pd_version : str, optional
        Paddle version

    Returns
    -------
    dict
        PaddlePaddle requirement.
    """
    if pd_version is None:
        return {"paddle": []}
    if pd_version == "":
        pd_version = os.environ.get("PADDLE_VERSION", "")

    return {
        "paddle": [
            "paddlepaddle>=3.0.0b1" if pd_version != "" else "paddlepaddle>=3.0.0b1",
        ],
    }


@lru_cache
def get_pd_version(pd_path: Optional[Union[str, Path]]) -> str:
    """Get Paddle version from a Paddle Python library path.

    Parameters
    ----------
    pd_path : str or Path
        Paddle Python library path, e.g. "/python3.10/site-packages/paddle/"

    Returns
    -------
    str
        version
    """
    if pd_path is None or pd_path == "":
        return ""
    version_file = Path(pd_path) / "version" / "__init__.py"
    spec = importlib.util.spec_from_file_location("paddle.version", version_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.full_version
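
A minimal usage sketch of these helpers follows. It is not part of the diff; it assumes the repository root is importable (e.g. during a build) and that `DP_ENABLE_PADDLE` is switched on.

```python
# Minimal sketch (assumptions noted above): locate Paddle the way a build
# script might, then report its version or the fallback requirement.
import os

os.environ.setdefault("DP_ENABLE_PADDLE", "1")  # find_paddle() returns (None, []) otherwise

from backend.find_paddle import find_paddle, get_pd_version

pd_install_dir, requires = find_paddle()
if pd_install_dir is not None:
    print("Paddle found at", pd_install_dir, "version", get_pd_version(pd_install_dir))
else:
    print("Paddle not found; build requirement:", requires)
```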
124 changes: 124 additions & 0 deletions deepmd/backend/paddle.py
@@ -0,0 +1,124 @@
# SPDX-License-Identifier: LGPL-3.0-or-later
from importlib.util import (
    find_spec,
)
from typing import (
    TYPE_CHECKING,
    Callable,
    ClassVar,
)

from deepmd.backend.backend import (
    Backend,
)

if TYPE_CHECKING:
    from argparse import (
        Namespace,
    )

    from deepmd.infer.deep_eval import (
        DeepEvalBackend,
    )
    from deepmd.utils.neighbor_stat import (
        NeighborStat,
    )


@Backend.register("pd")
@Backend.register("paddle")
class PaddleBackend(Backend):
    """Paddle backend."""

    name = "Paddle"
    """The formal name of the backend."""
    features: ClassVar[Backend.Feature] = (
        Backend.Feature.ENTRY_POINT
        | Backend.Feature.DEEP_EVAL
        | Backend.Feature.NEIGHBOR_STAT
        | Backend.Feature.IO
    )
    """The features of the backend."""
    suffixes: ClassVar[list[str]] = [".json", ".pd"]
    """The suffixes of the backend."""

    def is_available(self) -> bool:
        """Check if the backend is available.

        Returns
        -------
        bool
            Whether the backend is available.
        """
        return find_spec("paddle") is not None

    @property
    def entry_point_hook(self) -> Callable[["Namespace"], None]:
        """The entry point hook of the backend.

        Returns
        -------
        Callable[[Namespace], None]
            The entry point hook of the backend.
        """
        from deepmd.pd.entrypoints.main import main as deepmd_main

        return deepmd_main

    @property
    def deep_eval(self) -> type["DeepEvalBackend"]:
        """The Deep Eval backend of the backend.

        Returns
        -------
        type[DeepEvalBackend]
            The Deep Eval backend of the backend.
        """
        from deepmd.pd.infer.deep_eval import DeepEval as DeepEvalPD

        return DeepEvalPD

    @property
    def neighbor_stat(self) -> type["NeighborStat"]:
        """The neighbor statistics of the backend.

        Returns
        -------
        type[NeighborStat]
            The neighbor statistics of the backend.
        """
        from deepmd.pd.utils.neighbor_stat import (
            NeighborStat,
        )

        return NeighborStat

    @property
    def serialize_hook(self) -> Callable[[str], dict]:
        """The serialize hook to convert the model file to a dictionary.

        Returns
        -------
        Callable[[str], dict]
            The serialize hook of the backend.
        """
        from deepmd.pd.utils.serialization import (
            serialize_from_file,
        )

        return serialize_from_file

    @property
    def deserialize_hook(self) -> Callable[[str, dict], None]:
        """The deserialize hook to convert the dictionary to a model file.

        Returns
        -------
        Callable[[str, dict], None]
            The deserialize hook of the backend.
        """
        from deepmd.pd.utils.serialization import (
            deserialize_to_file,
        )

        return deserialize_to_file
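
A short illustration of how the registered backend exposes its hooks. This is not from the PR; the model path is hypothetical and the `DeepEval` constructor call is an assumption following the pattern of the existing backends.

```python
# Illustrative sketch only. "water_se_e2_a.json" is a hypothetical serialized
# model path; DeepEval is assumed to take the model file as its first argument,
# as in the other deepmd backends.
from deepmd.backend.paddle import PaddleBackend

backend = PaddleBackend()
if backend.is_available():          # True when `import paddle` would succeed
    DeepEvalPD = backend.deep_eval  # lazily imports deepmd.pd.infer.deep_eval
    evaluator = DeepEvalPD("water_se_e2_a.json")
```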
2 changes: 1 addition & 1 deletion deepmd/dpmodel/model/make_model.py
@@ -457,7 +457,7 @@ def format_nlist(
         Returns
         -------
-        formated_nlist
+        formatted_nlist
             the formatted nlist.
         """
3 changes: 2 additions & 1 deletion deepmd/main.py
@@ -99,9 +99,10 @@ def main_parser() -> argparse.ArgumentParser:
         formatter_class=RawTextArgumentDefaultsHelpFormatter,
         epilog=textwrap.dedent(
             """\
-            Use --tf or --pt to choose the backend:
+            Use --tf, --pt or --pd to choose the backend:
                 dp --tf train input.json
                 dp --pt train input.json
+                dp --pd train input.json
             """
         ),
     )
11 changes: 11 additions & 0 deletions deepmd/pd/__init__.py
@@ -0,0 +1,11 @@
# SPDX-License-Identifier: LGPL-3.0-or-later

# import customized OPs globally

from deepmd.utils.entry_point import (
    load_entry_point,
)

load_entry_point("deepmd.pd")

__all__ = []
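
For reference, `load_entry_point("deepmd.pd")` roughly amounts to iterating an entry-point group and importing whatever plugins register themselves there. The sketch below is a hedged approximation; the real helper lives in `deepmd.utils.entry_point` and may differ.

```python
# Rough approximation (assumption) of load_entry_point("deepmd.pd"):
# import every plugin registered in the "deepmd.pd" entry-point group,
# e.g. packages shipping customized OPs. The `group=` keyword of
# importlib.metadata.entry_points requires Python 3.10+.
from importlib.metadata import entry_points

for ep in entry_points(group="deepmd.pd"):
    ep.load()
```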
1 change: 1 addition & 0 deletions deepmd/pd/entrypoints/__init__.py
@@ -0,0 +1 @@
# SPDX-License-Identifier: LGPL-3.0-or-later