pd: add CPP inference with LAMMPS #4467
Conversation
📝 Walkthrough
This pull request introduces comprehensive support for the PaddlePaddle (Paddle) deep learning framework in the DeePMD project. The changes span multiple files across the project, adding a new backend option.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (8)
source/api_cc/include/DeepPotPD.h (3)
238-239: Typo in documentation: correct "arraay" to "array"
There is a typographical error in the documentation comment. The word "arraay" should be corrected to "array" for clarity.
Apply this diff to fix the typo:

```diff
- * @brief Get the buffer arraay of this model.
+ * @brief Get the buffer array of this model.
```
251-251: Typo in documentation: missing period at the end of the sentence
The documentation comment is missing a period at the end of the sentence. For consistency and readability, please add a period.
Apply this diff:

```diff
- * @param[out] buffer_scalar Buffer scalar
+ * @param[out] buffer_scalar Buffer scalar.
  */
```
342-342: Variable naming: rename `inited` to `initialized`
The private member variable `inited` is a non-standard abbreviation. Renaming it to `initialized` enhances readability and adheres to common naming conventions.
Apply this diff to rename the variable:

```diff
-int inited;
+bool initialized;
```

Also, update all occurrences of `inited` throughout the class to `initialized`.
source/api_cc/src/DeepPotPD.cc (2)
323-324: Incorrect exception message: replace "fparam" with "aparam"
In the exception messages, "fparam is not supported as input yet." should be "aparam is not supported as input yet." when handling `aparam`. This correction avoids confusion and provides accurate information.
Apply these diffs:

```diff
-throw deepmd::deepmd_exception("fparam is not supported as input yet.");
+throw deepmd::deepmd_exception("aparam is not supported as input yet.");
```

Repeat this correction in both occurrences where `aparam` is involved.
Also applies to: 330-331
47-52: Hardcoded GPU settings: make GPU memory and ID configurable
The GPU memory size and device ID are hardcoded, which may not be suitable for all environments. Consider making these parameters configurable to enhance flexibility and usability.
Suggest modifying the code to accept GPU settings from configuration or input parameters:

```diff
-int gpu_num = 1;
-if (gpu_num > 0) {
-  gpu_id = gpu_rank % gpu_num;
-} else {
-  gpu_id = 0;
-}
+gpu_id = gpu_rank;
```

Similarly, allow the GPU memory size to be specified:

```diff
-config->EnableUseGpu(
-    4096, 0);  // annotate it if use cpu, default use gpu with 4G mem
+config->EnableUseGpu(memory_size_in_mb, gpu_id);
```

Ensure that `memory_size_in_mb` and `gpu_id` are properly defined and passed to the `init` method.
source/api_cc/include/version.h.in (1)
13-13: Update comments to reflect new variable
A new variable `global_pd_lib` has been added. Consider updating any documentation or comments to include this variable, ensuring that all global library variables are documented consistently.
Add a comment describing `global_pd_lib`, similar to the existing variables:

```diff
+// Path to the Paddle libraries
 const std::string global_pd_lib="@PADDLE_LIBRARIES@";
```
source/api_cc/CMakeLists.txt (1)
30-37: Consider making the ROCm library path configurable.
While the Paddle integration looks good, the hardcoded path to `libgalaxyhip.so` might cause issues across different systems. Consider making this path configurable through a CMake variable.

```diff
 if(ENABLE_PADDLE AND NOT BUILD_PY_IF)
   target_link_libraries(${libname} PUBLIC "${PADDLE_LIBRARIES}")
   target_compile_definitions(${libname} PUBLIC BUILD_PADDLE)
   if(DP_VARIANT STREQUAL "rocm")
+    set(GALAXYHIP_LIBRARY "${hip_LIB_INSTALL_DIR}/libgalaxyhip.so" CACHE PATH "Path to libgalaxyhip.so")
     target_link_libraries(${libname}
-                          PUBLIC "${hip_LIB_INSTALL_DIR}/libgalaxyhip.so")
+                          PUBLIC "${GALAXYHIP_LIBRARY}")
   endif()
 endif()
```

source/api_cc/src/DeepPot.cc (1)
59-63: Improve error message for better user guidance.
While the initialization logic is correct, consider making the error message more informative by including build instructions.

```diff
- throw deepmd::deepmd_exception("PaddlePaddle backend is not supported yet");
+ throw deepmd::deepmd_exception("PaddlePaddle backend is not built. Please rebuild with -DENABLE_PADDLE=ON");
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (9)
- source/CMakeLists.txt (3 hunks)
- source/api_cc/CMakeLists.txt (1 hunks)
- source/api_cc/include/DeepPotPD.h (1 hunks)
- source/api_cc/include/version.h.in (1 hunks)
- source/api_cc/src/DeepPot.cc (2 hunks)
- source/api_cc/src/DeepPotPD.cc (1 hunks)
- source/api_cc/src/common.cc (2 hunks)
- source/config/CMakeLists.txt (1 hunks)
- source/config/run_config.ini (1 hunks)
🔇 Additional comments (8)
source/api_cc/src/DeepPotPD.cc (1)
185-186: Ownership mismatch: potential issue with `unique_ptr` assignment
Assigning the result of `GetInputHandle` to a `std::unique_ptr` may cause ownership issues if `GetInputHandle` does not return a `unique_ptr`. Verify the return type of `GetInputHandle` and ensure proper ownership semantics.
Please check if `GetInputHandle` returns a `std::unique_ptr`. If it returns a raw pointer or shared pointer, adjust the declaration of `firstneigh_tensor` accordingly:

```diff
-std::unique_ptr<paddle_infer::Tensor> firstneigh_tensor;
+auto firstneigh_tensor = predictor->GetInputHandle("nlist");
```

Or, if a `shared_ptr` is appropriate:

```diff
-std::unique_ptr<paddle_infer::Tensor> firstneigh_tensor;
+std::shared_ptr<paddle_infer::Tensor> firstneigh_tensor;
```
source/config/CMakeLists.txt (1)
17-21: Consistency in handling boolean options
The addition of the `ENABLE_PADDLE` option follows the existing pattern, but ensure that all boolean variables are consistently handled throughout the configuration. Verify that `ENABLE_PADDLE` is properly set and used in other CMake scripts.
source/config/run_config.ini (1)
17-18: LGTM! Configuration parameters for Paddle integration.
The new configuration parameters follow the established naming convention and structure.
source/CMakeLists.txt (2)
12-12: LGTM! Option declaration follows CMake conventions.
The ENABLE_PADDLE option is properly declared with a safe default value of OFF.
335-337: LGTM! Backend messaging is properly updated.
The backend messaging section is correctly updated to include Paddle in the list of enabled backends.
Also applies to: 341-341
source/api_cc/src/DeepPot.cc (1)
18-20: LGTM! Header inclusion follows existing backend patterns.
The Paddle backend header inclusion is properly guarded with BUILD_PADDLE and follows the same pattern as other backend includes.
source/api_cc/src/common.cc (2)
1395-1397: LGTM! Summary printing follows existing patterns.
The addition of the Paddle library path to the summary output is consistent with how other backends are handled.
1415-1417: Consider using a more specific file extension for Paddle models.
Using `.json` as the model file extension could lead to conflicts since it's a very common format used for various purposes. Consider using a more specific extension like `.pdmodel` or `.pd.json`.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
source/api_cc/include/DeepPotPD.h (2)
259-262: Inconsistent data type: declare `aparam_nall` as `bool`
The member variable `aparam_nall` is declared as an `int` but is used as a boolean in the `is_aparam_nall()` method. To improve code clarity and correctness, consider changing its type to `bool`.
Apply the following change:

```diff
- int aparam_nall;
+ bool aparam_nall;
```

Ensure that all assignments and usages of `aparam_nall` are updated accordingly.
Also applies to: 347-347
17-18: Delete copy constructor and assignment operator to prevent unintended copying
The `DeepPotPD` class manages resources using `std::shared_ptr`. To prevent unintended copying, which could lead to resource management issues, consider deleting the copy constructor and copy assignment operator.
Add the following declarations to the class:

```cpp
DeepPotPD(const DeepPotPD&) = delete;
DeepPotPD& operator=(const DeepPotPD&) = delete;
```

source/lmp/plugin/CMakeLists.txt (1)
11-11: Add space in the message for clarity
There is no space between "TO" and `${LAMMPS_SOURCE_ROOT}` in the message, causing the output to be concatenated without a space. Please add a space for improved readability.
Apply the following change:

```diff
-message(STATUS "STARTING DOWNLOAD LAMMPS TO" ${LAMMPS_SOURCE_ROOT})
+message(STATUS "STARTING DOWNLOAD LAMMPS TO " ${LAMMPS_SOURCE_ROOT})
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- source/api_cc/include/DeepPotPD.h (1 hunks)
- source/api_cc/src/common.cc (2 hunks)
- source/lmp/plugin/CMakeLists.txt (1 hunks)
🔇 Additional comments (3)
source/api_cc/include/DeepPotPD.h (1)
27-27: Avoid passing small built-in types by const reference
Passing small built-in types like `int` by const reference (`const int&`) is less efficient than passing them by value. It is recommended to pass them by value to improve performance and code clarity.
Apply the following changes to update the parameter declarations.
At line 27:

```diff
- const int& gpu_rank = 0,
+ int gpu_rank = 0,
```

At line 38:

```diff
- const int& gpu_rank = 0,
+ int gpu_rank = 0,
```

At line 114:

```diff
- const int& ago,
+ int ago,
```

At line 145:

```diff
- const int& nframes,
+ int nframes,
```

At line 183:

```diff
- const int& nframes,
+ int nframes,
```

Also applies to: 38-38, 114-114, 145-145, 183-183
source/api_cc/src/common.cc (2)
1395-1397: Paddle library integration is correctly implemented
The addition of the Paddle library path in the `print_summary` function is properly included and follows the existing pattern for other backends.
1415-1417: Backend detection for Paddle models is appropriate
The modification to the `get_backend` function to detect `.json` files for the Paddle backend is correctly implemented and ensures proper backend selection.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

```
@@            Coverage Diff             @@
##            devel    #4467      +/-   ##
==========================================
+ Coverage   84.57%   84.60%   +0.02%
==========================================
  Files         677      680       +3
  Lines       63916    64467     +551
  Branches     3486     3538      +52
==========================================
+ Hits        54060    54545     +485
- Misses       8730     8781      +51
- Partials     1126     1141      +15
```

☔ View full report in Codecov by Sentry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tests should be added and enabled, and then we could see the C++ coverage.
Actionable comments posted: 0
🧹 Nitpick comments (7)
.pre-commit-config.yaml (1)
66-66: Consider narrowing the JSON exclusion pattern.
The current pattern excludes all JSON files from clang-format checks. Consider narrowing this to specific JSON files or directories if only certain JSON files need to be excluded.

```diff
- exclude: ^(source/3rdparty|source/lib/src/gpu/cudart/.+\.inc|.+\.ipynb$|.+\.json$)
+ exclude: ^(source/3rdparty|source/lib/src/gpu/cudart/.+\.inc|.+\.ipynb$|source/.*deeppot.*\.json$)
```

deepmd/pd/utils/serialization.py (2)
40-40: Preserve or remove commented-out serialization code.
If the `model_def_script` is no longer relevant, consider removing this line entirely to keep the codebase clean. If it remains relevant, please clarify why it's commented out.
65-66: Clarify the saved model file extension.
Using `model_file.split(".json")[0]` strips the ".json" extension; ensure downstream tools support this naming convention for Paddle models (usually ".pdmodel/.pdiparams").
source/api_cc/tests/test_deeppot_pd.cc (4)
56-75: Consider adjusting EPSILON for different data types.
You're comparing floating-point results using the same EPSILON for all typed tests. If TypeParam can be float or double, you might want to use different tolerances to reduce false positives in tests.
122-136: Possible duplication in model testing.
You might benefit from factoring out repeated test logic into a shared utility or fixture (like the `MyModel` approach). This would improve maintainability and reduce repetitive code blocks.
418-427: Avoid modifying the global test input in-place.
Here, you prepend "dummy" virtual atoms to the main coordinate/atype arrays, which can complicate test isolation. Consider copying or leveraging a fixture to keep the base arrays unmodified.
529-532: Add coverage for error scenarios or corner cases.
A `print_summary` test is helpful for demonstration purposes, but it might be beneficial to include negative or boundary-case tests, e.g., missing file, invalid JSON, etc.
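For instance, a sketch of a missing-file case (the test name and the exact exception type the backend throws on a bad path are assumptions):

```cpp
#include <gtest/gtest.h>

#include "DeepPot.h"

// Illustrative negative test: constructing a DeepPot from a nonexistent
// model path should throw instead of failing silently.
TEST(TestInferDeepPotPD, missing_model_file) {
  EXPECT_THROW(deepmd::DeepPot dp("no_such_model.json"),
               deepmd::deepmd_exception);
}
```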
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .pre-commit-config.yaml (2 hunks)
- deepmd/pd/utils/serialization.py (2 hunks)
- source/api_cc/tests/test_deeppot_pd.cc (1 hunks)
🔇 Additional comments (4)
.pre-commit-config.yaml (1)
10-10: Verify the existence of the deeppot_sea.json file.
The exclude pattern has been updated to ignore `deeppot_sea.json`. Let's verify this file is part of the PR changes.
✅ Verification successful
The deeppot_sea.json file exists and is correctly excluded: the file is present in the PR at `source/tests/infer/deeppot_sea.json`, confirming that the exclude pattern in the pre-commit config is valid and necessary.
🏁 Scripts executed
The following scripts were executed for the analysis:

```bash
#!/bin/bash
# Check if deeppot_sea.json exists in the PR
gh pr view 4467 --json files -q '.files[].path' | grep 'deeppot_sea.json'

# Find any potential locations of this file
fd 'deeppot_sea.json'
```

Length of output: 167
deepmd/pd/utils/serialization.py (2)
51-54: Import statements look good.
Importing `InputSpec` directly from `paddle.static` is correct for specifying dynamic model inputs.
55-63: Handle dynamic shapes cautiously.
You are explicitly specifying `[-1, -1, 3]`, `[-1, -1]`, and `[-1, -1, -1]` in the input spec. Consider whether you need more flexible dimensions (e.g., for variable numbers of atoms or neighbors in other scenarios).
source/api_cc/tests/test_deeppot_pd.cc (1)
1-15: General test structure is well-organized.
Using Google Test for multiple data types is a solid approach. The test scaffolding is thorough and sets the stage for verifying all relevant aspects of the `DeepPot` API.
Thanks for the comments; I have added inference model files and a unit test file.
Please check.
Actionable comments posted: 2
🧹 Nitpick comments (7)
source/api_cc/src/DeepPotPD.cc (6)
14-20: Consider using a more generalized approach or direct insertion for building the flattened list.
Currently, `createNlistTensor` flattens a nested vector via insertion into `ret`. This approach works fine for small to medium data. However, for very large sizes, consider reserving capacity or using bulk insertion to reduce repeated memory allocations and copying.

```diff
 std::vector<int> createNlistTensor(const std::vector<std::vector<int>>& data) {
   std::vector<int> ret;
+  // Optionally reserve capacity to avoid repeated reallocations
+  size_t total_size = 0;
+  for (const auto& row : data) {
+    total_size += row.size();
+  }
+  ret.reserve(total_size);
   for (const auto& row : data) {
     ret.insert(ret.end(), row.begin(), row.end());
   }
   return ret;
 }
```
22-33: Graceful error handling is properly implemented, but consider logging context details.
The constructor uses a `try/catch` block to ensure a correct rethrow if initialization fails. This is good for preventing partial initialization. However, capturing and logging the actual exception content (e.g., via `std::current_exception()`) before the rethrow may provide additional debugging context.
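A minimal sketch of that pattern, assuming any `std::exception`-derived error and a plain stream for the log (the helper name is illustrative):

```cpp
#include <exception>
#include <iostream>
#include <string>

// Illustrative: run an initialization step, log the failure context, then
// rethrow so the caller still sees the original exception.
template <typename InitFn>
void init_with_context(InitFn init_fn, const std::string& model) {
  try {
    init_fn();
  } catch (const std::exception& e) {
    std::cerr << "DeepPotPD init failed for model '" << model
              << "': " << e.what() << std::endl;
    throw;  // rethrows the active exception unchanged
  }
}
```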
190-203: Remove or convert commented-out code sections if not needed.
It seems that handling for `fparam` and `aparam` is commented out. Consider removing this code if no longer necessary or converting it into a clearly marked TODO/FIXME if future support is planned.
291-359: Reduce duplication between the two compute methods.
The second `compute` method in lines 291-359 is structurally similar to the one in lines 113-290, differing mostly in parameters like `nghost` and `InputNlist`. Consider refactoring into a shared internal function to avoid code duplication.
387-400: Memory-safety consideration for string building in `get_type_map`.
Appending each character via `std::string(1, char_c)` is correct but might be less efficient. You could push back into a `std::vector<char>` and convert at the end. Also, ensure that all returned `char_c` values are valid printable characters if used in logs.
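A sketch of the cheaper accumulation (the int-typed input stands in for however the characters arrive from the model buffer):

```cpp
#include <string>
#include <vector>

// Illustrative: reserve once and append characters in a single pass instead
// of building a temporary std::string(1, char_c) per character.
std::string build_type_map(const std::vector<int>& raw) {
  std::string type_map;
  type_map.reserve(raw.size());
  for (int code : raw) {
    type_map.push_back(static_cast<char>(code));  // assumes printable ASCII
  }
  return type_map;
}
```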
422-514: Consider unifying the `computew` and `computew_mixed_type` logic.
The many overloads in lines 422-484 and lines 485-514 could be simplified using templates or bridging functions to minimize duplication. If you eventually implement mixed-type logic for Paddle, maintaining a single interface would be more manageable.
source/CMakeLists.txt (1)
12-12: Provide clarity on `ENABLE_PADDLE` usage.
You introduce `ENABLE_PADDLE` with a default of OFF, which is fine. It would be helpful to clarify in documentation or comments how users can set `PADDLE_INFERENCE_DIR` if they enable this option manually (since you provide an auto-download fallback).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
- .devcontainer/build_cxx.sh (1 hunks)
- deepmd/pd/utils/serialization.py (1 hunks)
- source/CMakeLists.txt (3 hunks)
- source/api_cc/src/DeepPotPD.cc (1 hunks)
- source/install/test_cc_local.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pd/utils/serialization.py
🔇 Additional comments (6)
source/api_cc/src/DeepPotPD.cc (2)
34-42
: Avoid repeated initialization in highly concurrent calls.
The code checks if (inited) ... return;
to skip re-initialization. If users call init
from multiple threads, more robust concurrency safeguards (such as a mutex or an atomic check) might be required to prevent partial re-initialization.
242-256
: Consistent with past feedback: atomic virial storage is still unsupported.
The code rightly throws an exception for unimplemented features. You might mark this code path with a clear TODO if you plan to implement atomic virial output later, or remove it if it's not intended to be supported.
.devcontainer/build_cxx.sh (1)
14-14
: Confirm whether Paddle should always be enabled in container builds.
You've set -D ENABLE_PADDLE=ON
unconditionally. If Paddle is optional or if some environments don’t require it, consider a runtime switch or environment variable to toggle Paddle support.
source/install/test_cc_local.sh (1)
24-24
: Include tests for Paddle-based C++ inference.
Enabling Paddle in the build is a great step. Ensure that ctest
and your new test_deeppot_pd.cc
are properly invoked to validate Paddle-based functionality, especially since the PR aims to increase coverage.
source/CMakeLists.txt (2)
339-345
: Consistent logging for enabled backends.
You’ve added a check for ENABLE_PADDLE
to log “- Paddle” among other backends, ensuring consistent feedback to users. This looks good.
30-84
: 🛠️ Refactor suggestion
Security and maintainability improvements for third-party downloads.
Downloading Paddle from a hardcoded URL raises the risk of broken links or malicious tampering. Consider verifying a checksum or providing a fallback if the download fails. Also, check whether your license policy requires disclaimers for included third-party code.
```diff
 if(ENABLE_PADDLE)
   if(NOT DEFINED PADDLE_INFERENCE_DIR)
     message(STATUS "PADDLE_INFERENCE_DIR is not defined. Downloading...")
+    # Consider verifying a known hash or fallback if the URL is unavailable
     set(DOWNLOAD_URL
         "https://paddle-qa.bj.bcebos.com/paddle-pipeline/..."
     )
```

Likely invalid or redundant comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
source/api_cc/src/DeepPotPD.cc (1)
509-511: Document unimplemented features with TODO comments.
Add TODO comments to indicate future implementation plans for mixed-type compute methods.

```diff
+  // TODO(#4467): Implement mixed-type compute methods for paddle backend
   throw deepmd::deepmd_exception(
       "computew_mixed_type is not implemented in paddle backend yet");
```

Also applies to: 524-526
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- source/CMakeLists.txt (3 hunks)
- source/api_cc/src/DeepPotPD.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- source/CMakeLists.txt
🔇 Additional comments (1)
source/api_cc/src/DeepPotPD.cc (1)
219-221: 🛠️ Refactor suggestion
Use output names instead of hard-coded indices.
Accessing output tensors by fixed indices is fragile. Consider using named outputs or validating the tensor names.

```diff
- auto energy_ = predictor->GetOutputHandle(output_names[1]);
- auto force_ = predictor->GetOutputHandle(output_names[2]);
- auto virial_ = predictor->GetOutputHandle(output_names[3]);
+ const char* ENERGY_OUTPUT = "energy";
+ const char* FORCE_OUTPUT = "force";
+ const char* VIRIAL_OUTPUT = "virial";
+
+ auto energy_ = predictor->GetOutputHandle(ENERGY_OUTPUT);
+ auto force_ = predictor->GetOutputHandle(FORCE_OUTPUT);
+ auto virial_ = predictor->GetOutputHandle(VIRIAL_OUTPUT);
```

Likely invalid or redundant comment.
Actionable comments posted: 1
🧹 Nitpick comments (5)
source/api_cc/include/DeepPotPD.h (3)
64-75: Improve extension handling for model files
When checking `model.find(".json")` or `model.find(".pdmodel")`, any mismatch throws a generic "do not exist, please check it." It may be clearer to throw an exception about an "unsupported file extension," especially if the file does exist but with an unexpected extension.
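A sketch of what the clearer check could look like (the helper and the message wording are assumptions, not the PR's code):

```cpp
#include <stdexcept>
#include <string>

// Illustrative: separate "unsupported extension" from "file does not exist".
bool has_suffix(const std::string& s, const std::string& suffix) {
  return s.size() >= suffix.size() &&
         s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

void check_model_extension(const std::string& model) {
  if (!has_suffix(model, ".json") && !has_suffix(model, ".pdmodel")) {
    throw std::runtime_error("unsupported model file extension in '" + model +
                             "': expected .json or .pdmodel");
  }
}
```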
110-117: Consider retrieving hyperparameters from model buffers
Currently, `rcut`, `ntypes`, `dfparam`, etc., are hard-coded. This might cause desynchronization if the model defines different parameters. Consider using `get_buffer<int>` or `get_buffer<double>` calls to keep these values in sync with the model.
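A sketch of the shape this could take, with the buffer accessor mocked out — the `get_buffer`-style reader, the buffer names, and the struct layout are all assumptions for illustration:

```cpp
#include <string>

// Stand-in for the model's named-buffer accessor (assumed interface).
struct BufferReader {
  template <typename T>
  T read_scalar(const std::string& name) const;  // assumption: reads a buffer
};

// Pull rcut/ntypes/dfparam from the saved model so the C++ side cannot
// drift out of sync with what the exported model actually defines.
struct HyperParams {
  double rcut;
  int ntypes;
  int dfparam;
};

HyperParams load_hyperparams(const BufferReader& model) {
  return {model.read_scalar<double>("buffer_rcut"),
          model.read_scalar<int>("buffer_ntypes"),
          model.read_scalar<int>("buffer_dfparam")};
}
```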
240-244: Clarify naming convention for methods
Methods such as `compute`, `compute_mixed_type`, and `computew` are similarly named but vary in both parameters and behavior. Consider consolidating or clarifying method names to reduce confusion.
source/api_cc/src/DeepPotPD.cc (2)
247-265: Atomic virial not supported yet
You throw an exception for atomic virial outputs. If this is a planned feature, document it clearly or remove the partial code until fully supported.
332-337: Remove or finalize commented-out sections
Large blocks of commented-out code for `fparam_tensor` and `aparam_tensor` remain. It may be best to remove them or wrap them in `#ifdef DEBUG` to maintain clarity.
Also applies to: 339-344
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- source/CMakeLists.txt (3 hunks)
- source/api_cc/include/DeepPotPD.h (1 hunks)
- source/api_cc/src/DeepPotPD.cc (1 hunks)
🔇 Additional comments (12)
source/api_cc/include/DeepPotPD.h (5)
12-12: Class overview looks good
Overall, this newly introduced `DeepPotPD` class provides a clear and well-documented interface for the Paddle-based backend, indicating a solid design for managing model inference within deepmd.
1-2: License header acknowledgment
The SPDX identifier is properly included, ensuring license clarity.
26-28: Avoid passing small built-in types by const reference
This repeats a prior recommendation: passing small built-in types by const reference can be less efficient than passing by value.

```diff
- DeepPotPD(const std::string& model,
-           const int& gpu_rank = 0,
-           const std::string& file_content = "");
+ DeepPotPD(const std::string& model,
+           int gpu_rank = 0,
+           const std::string& file_content = "");
```
36-38: Avoid passing small built-in types by const reference
Same issue for the `init` method's `gpu_rank` parameter.

```diff
- void init(const std::string& model,
-           const int& gpu_rank = 0,
-           const std::string& file_content = "");
+ void init(const std::string& model,
+           int gpu_rank = 0,
+           const std::string& file_content = "");
```
140-145: Check multi-atom parameters for consistency
In templated `compute` methods, the function references `nall`, `aparam_nall`, `ntypes`, and related parameters. Ensure all array-like parameters have the correct dimensions and that the code properly handles corner cases (e.g., zero atoms in a frame).
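One such guard could look like this sketch (the layout assumptions are spelled out in the comments; this is not the PR's code):

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Illustrative upfront check for the templated compute paths. Assumed
// layout: coord holds nframes * natoms * 3 values, atype one entry per atom.
template <typename VALUETYPE>
void check_input_sizes(const std::vector<VALUETYPE>& coord,
                       const std::vector<int>& atype,
                       int nframes) {
  if (atype.empty()) {
    throw std::invalid_argument("no atoms in frame");
  }
  const std::size_t natoms = atype.size();
  if (coord.size() != static_cast<std::size_t>(nframes) * natoms * 3) {
    throw std::invalid_argument("coord size != nframes * natoms * 3");
  }
}
```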
source/CMakeLists.txt (4)
12-12: New option for Paddle support
Introducing `ENABLE_PADDLE` is clear and maintains the project's modular approach.
30-51: Consider verifying and hashing downloaded Paddle inference archives
This process downloads Paddle libraries but does not verify checksums. Corrupt or tampered files could compromise the build.
From prior feedback: adding a fallback or verification mechanism (e.g., `EXPECTED_HASH SHA256=<hash>`) could improve security.
67-70: Review path-based linking
When referencing library directories (`${PADDLE_INFERENCE_DIR}/paddle/lib`, etc.), please ensure these paths remain valid across different systems and do not overwrite user-provided settings.
Also applies to: 71-71
352-354: Clear backend listing
Listing Paddle among enabled backends is consistent. This helps users confirm the project's build configuration.
source/api_cc/src/DeepPotPD.cc (3)
25-25: Use int instead of const int& for small integral parameters
Same best practice: avoid `const int&` for `gpu_rank`.

```diff
- DeepPotPD::DeepPotPD(const std::string& model,
-                      const int& gpu_rank,
  ...
+ DeepPotPD::DeepPotPD(const std::string& model,
+                      int gpu_rank,
  ...
```
65-78: More descriptive exception message for unsupported file extension
Currently, the code says "do not exist, please check it." For clarity, mention an unsupported extension if `.json` or `.pdmodel` is missing.
219-223: Validate output handle indices
You're fetching `output_names[1]`, `[2]`, `[3]`, etc. Without verifying `output_names.size()`, out-of-range access is a risk if the model's output changes.
Also applies to: 226-228, 230-233
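A sketch of such a bounds check (free-standing here for illustration; in the real code it would run just before the `GetOutputHandle` calls):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Verify the predictor exposes enough outputs before indexing
// output_names[1..3]; throws with a descriptive message otherwise.
void check_output_count(const std::vector<std::string>& output_names,
                        std::size_t required) {
  if (output_names.size() < required) {
    throw std::runtime_error(
        "model exposes " + std::to_string(output_names.size()) +
        " outputs, expected at least " + std::to_string(required));
  }
}
```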
Actionable comments posted: 0
🧹 Nitpick comments (3)
source/lmp/tests/test_lammps_pd.py (3)
247-278: Refactor duplicate error messages and document magic numbers.
Consider the following improvements:
- Extract the repeated error message into a constant to avoid duplication
- Document or extract the hardcoded neighbor cutoff value (2.0) into a named constant with explanation

```diff
+# At the top of the file with other constants
+ALLOWED_UNITS = ["metal", "real", "si"]
+NEIGHBOR_CUTOFF = 2.0  # Angstroms for metal/real, meters for si
+NEIGHBOR_CUTOFF_SI = NEIGHBOR_CUTOFF * 1e-10  # Convert to meters for si units
+
 def _lammps(data_file, units="metal") -> PyLammps:
     lammps = PyLammps()
     lammps.units(units)
     lammps.boundary("p p p")
     lammps.atom_style("atomic")
     if units == "metal" or units == "real":
-        lammps.neighbor("2.0 bin")
+        lammps.neighbor(f"{NEIGHBOR_CUTOFF} bin")
     elif units == "si":
-        lammps.neighbor("2.0e-10 bin")
+        lammps.neighbor(f"{NEIGHBOR_CUTOFF_SI} bin")
     else:
-        raise ValueError("units should be metal, real, or si")
+        raise ValueError(f"units should be one of {ALLOWED_UNITS}")
```
343-691: Extract common model deviation validation logic.
The test functions for model deviation calculations contain repeated patterns for validating forces and virial tensors. Consider extracting these common validation patterns into helper functions to improve maintainability and reduce duplication.
Example refactor:

```python
def validate_model_deviation_forces(md, expected_f, expected_f2, unit_conversion=1.0):
    """Helper to validate force-related model deviations."""
    expected_md_f = np.linalg.norm(np.std([expected_f, expected_f2], axis=0), axis=1)
    assert md[7:] == pytest.approx(expected_md_f * unit_conversion)
    assert md[4] == pytest.approx(np.max(expected_md_f) * unit_conversion)
    assert md[5] == pytest.approx(np.min(expected_md_f) * unit_conversion)
    assert md[6] == pytest.approx(np.mean(expected_md_f) * unit_conversion)


def validate_model_deviation_virial(md, expected_v, expected_v2, unit_conversion=1.0):
    """Helper to validate virial-related model deviations."""
    expected_md_v = (
        np.std([np.sum(expected_v, axis=0), np.sum(expected_v2, axis=0)], axis=0) / 6
    )
    assert md[1] == pytest.approx(np.max(expected_md_v) * unit_conversion)
    assert md[2] == pytest.approx(np.min(expected_md_v) * unit_conversion)
    assert md[3] == pytest.approx(
        np.sqrt(np.mean(np.square(expected_md_v))) * unit_conversion
    )
```
706-752: Parameterize the number of MPI processes.
The test hardcodes the number of MPI processes to 2. Consider parameterizing this value to test with different numbers of processes and document why 2 was chosen as the default.

```diff
+# At the top of the file with other constants
+DEFAULT_MPI_PROCS = 2  # Minimum number of processes needed to test MPI functionality
+
 @pytest.mark.skipif(
     shutil.which("mpirun") is None, reason="MPI is not installed on this system"
 )
 @pytest.mark.skipif(
     importlib.util.find_spec("mpi4py") is None, reason="mpi4py is not installed"
 )
 @pytest.mark.parametrize(
-    ("balance_args",),
-    [(["--balance"],), ([],)],
+    ("balance_args", "n_procs"),
+    [
+        (["--balance"], DEFAULT_MPI_PROCS),
+        ([], DEFAULT_MPI_PROCS),
+        (["--balance"], DEFAULT_MPI_PROCS * 2),  # Test with more processes
+    ],
 )
-def test_pair_deepmd_mpi(balance_args: list):
+def test_pair_deepmd_mpi(balance_args: list, n_procs: int):
     with tempfile.NamedTemporaryFile() as f:
         sp.check_call(
             [
                 "mpirun",
                 "-n",
-                "2",
+                str(n_procs),
                 sys.executable,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- source/lmp/tests/test_lammps_pd.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (7)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Analyze (python)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (4)
source/lmp/tests/test_lammps_pd.py (4)
1-21: LGTM! Well-organized imports.
The file has an appropriate license header and well-organized imports that follow Python best practices.
226-228: Handle exceptions when converting model files.
The `sp.check_output` call that converts the model files does not handle exceptions. If the conversion fails, it could cause the tests to crash unexpectedly.
242-245: Ensure all test artifacts are properly cleaned up.
`data_file_si` is created during setup but is never removed in `teardown_module`, which could leave behind temporary files.
280-306: LGTM! Well-structured test fixtures.
The fixtures follow pytest best practices:
- Proper resource management with yield and cleanup
- Clear separation of concerns for different unit systems
- Consistent pattern across all fixtures
Actionable comments posted: 0
🧹 Nitpick comments (1)
source/lmp/tests/test_lammps_pd.py (1)
308-309: Refactor to reduce code duplication in test functions
Multiple test functions, such as `test_pair_deepmd`, `test_pair_deepmd_virial`, and others, contain repeated code for setting up the `pair_style`, `pair_coeff`, running the simulation, and performing assertions. To improve maintainability and reduce duplication, consider extracting common code into helper functions or parameterized tests.
Also applies to: 320-321, 343-344, 369-370, 408-409, 438-439, 471-472, 483-484, 497-498, 530-531, 560-561, 624-625, 657-658, 694-695
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- source/lmp/tests/test_lammps_pd.py (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (14)
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Test Python (3, 3.9)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Analyze (python)
- GitHub Check: Build C++ (cpu, cpu)
- GitHub Check: Analyze (c-cpp)
🔇 Additional comments (2)
source/lmp/tests/test_lammps_pd.py (2)
226-228: Handle exceptions when converting model files
The `sp.check_output` call that converts the model files does not handle exceptions. If the conversion fails, it could cause the tests to crash unexpectedly. Consider adding a try-except block to handle potential errors gracefully.
242-244: Ensure all test artifacts are properly cleaned up
`data_file_si` is created during `setup_module()` but is not removed in `teardown_module()`. This could leave behind temporary files after tests are run. Please add the removal of `data_file_si` to prevent cluttering the filesystem.
Apply this diff to clean up the test artifact:

```diff
 def teardown_module():
     os.remove(data_file)
     os.remove(data_type_map_file)
+    os.remove(data_file_si)
```
rerun ci Signed-off-by: HydrogenSulfate <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (3)
.github/workflows/test_cuda.yml (1)
88-90: Enhance robustness of library file copying.
While the copy commands are correctly placed after the LD_LIBRARY_PATH setup, consider these improvements for better reliability:
- Add error checking for source directories
- Clean the destination before copying to prevent conflicts
- Use specific file patterns instead of copying all files

```diff
- cp ${{ github.workspace }}/source/build_tests/paddle_inference_install_dir/paddle/lib/* ${{ github.workspace }}/dp_test/lib/
- cp ${{ github.workspace }}/source/build_tests/paddle_inference_install_dir/third_party/install/onednn/lib/* ${{ github.workspace }}/dp_test/lib/
- cp ${{ github.workspace }}/source/build_tests/paddle_inference_install_dir/third_party/install/mklml/lib/* ${{ github.workspace }}/dp_test/lib/
+ PADDLE_LIB_DIR="${{ github.workspace }}/source/build_tests/paddle_inference_install_dir"
+ TARGET_LIB_DIR="${{ github.workspace }}/dp_test/lib"
+
+ for dir in "$PADDLE_LIB_DIR/paddle/lib" "$PADDLE_LIB_DIR/third_party/install/onednn/lib" "$PADDLE_LIB_DIR/third_party/install/mklml/lib"; do
+   if [ ! -d "$dir" ]; then
+     echo "Error: Directory $dir does not exist"
+     exit 1
+   fi
+   echo "Copying libraries from $dir"
+   cp -v "$dir"/*.so* "$TARGET_LIB_DIR"/ || echo "No .so files in $dir"
+   cp -v "$dir"/*.dylib* "$TARGET_LIB_DIR"/ || echo "No .dylib files in $dir"
+ done
```

source/CMakeLists.txt (2)
85-88: Add version checks and make paths configurable.
Consider these improvements:
- Add version checks for critical dependencies (protobuf, glog, etc.)
- Make third-party library paths configurable via CMake variables
- Consolidate multiple link_directories calls

```diff
+# Add version checks
+find_package(Protobuf REQUIRED)
+if(Protobuf_VERSION VERSION_LESS "3.0.0")
+  message(FATAL_ERROR "Protobuf >= 3.0.0 required")
+endif()
+# Make paths configurable
+set(PADDLE_THIRD_PARTY_PATH "${PADDLE_INFERENCE_DIR}/third_party/install/"
+    CACHE PATH "Path to Paddle third-party libraries")
+# Consolidate link directories
 link_directories(
   ${PADDLE_INFERENCE_DIR}/paddle/lib
   ${PADDLE_INFERENCE_DIR}/third_party/install/onednn/lib
-  ${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib)
+  ${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib
+  ${PADDLE_THIRD_PARTY_PATH}/protobuf/lib
+  ${PADDLE_THIRD_PARTY_PATH}/glog/lib
+  ${PADDLE_THIRD_PARTY_PATH}/gflags/lib
+  ${PADDLE_THIRD_PARTY_PATH}/xxhash/lib)
```

Also applies to: 126-140
90-115: Reduce code duplication in RPATH configuration.
The current implementation duplicates the same paths for both CMAKE_BUILD_RPATH and CMAKE_INSTALL_RPATH. Consider refactoring to reduce duplication.

```diff
+# Define paths once
+set(PADDLE_RPATH_DIRS
+    "${PADDLE_INFERENCE_DIR}/paddle/lib"
+    "${PADDLE_INFERENCE_DIR}/third_party/install/onednn/lib"
+    "${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib")
-# Check and append to existing CMAKE_BUILD_RPATH
 if(DEFINED CMAKE_BUILD_RPATH)
-  list(APPEND CMAKE_BUILD_RPATH "${PADDLE_INFERENCE_DIR}/paddle/lib"
-       "${PADDLE_INFERENCE_DIR}/third_party/install/onednn/lib"
-       "${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib")
+  list(APPEND CMAKE_BUILD_RPATH ${PADDLE_RPATH_DIRS})
 else()
-  set(CMAKE_BUILD_RPATH
-      "${PADDLE_INFERENCE_DIR}/paddle/lib"
-      "${PADDLE_INFERENCE_DIR}/third_party/install/onednn/lib"
-      "${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib")
+  set(CMAKE_BUILD_RPATH ${PADDLE_RPATH_DIRS})
 endif()
-# Check and append to existing CMAKE_INSTALL_RPATH
 if(DEFINED CMAKE_INSTALL_RPATH)
-  list(APPEND CMAKE_INSTALL_RPATH "${PADDLE_INFERENCE_DIR}/paddle/lib"
-       "${PADDLE_INFERENCE_DIR}/third_party/install/onednn/lib"
-       "${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib")
+  list(APPEND CMAKE_INSTALL_RPATH ${PADDLE_RPATH_DIRS})
 else()
-  set(CMAKE_INSTALL_RPATH
-      "${PADDLE_INFERENCE_DIR}/paddle/lib"
-      "${PADDLE_INFERENCE_DIR}/third_party/install/onednn/lib"
-      "${PADDLE_INFERENCE_DIR}/third_party/install/mklml/lib")
+  set(CMAKE_INSTALL_RPATH ${PADDLE_RPATH_DIRS})
 endif()
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- .github/workflows/test_cuda.yml (1 hunks)
- source/CMakeLists.txt (3 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (21)
- GitHub Check: Build wheels for cp310-manylinux_aarch64
- GitHub Check: Build wheels for cp311-win_amd64
- GitHub Check: Build wheels for cp311-macosx_x86_64
- GitHub Check: Test Python (4, 3.12)
- GitHub Check: Test Python (3, 3.12)
- GitHub Check: Build C++ (clang, clang)
- GitHub Check: Test Python (2, 3.12)
- GitHub Check: Build C++ (rocm, rocm)
- GitHub Check: Test Python (2, 3.9)
- GitHub Check: Analyze (python)
- GitHub Check: Build C++ (cuda120, cuda)
- GitHub Check: Build C library (2.14, >=2.5.0rc0,<2.15, libdeepmd_c_cu11.tar.gz)
- GitHub Check: Build C++ (cuda, cuda)
- GitHub Check: Test C++ (false)
- GitHub Check: Analyze (javascript-typescript)
- GitHub Check: Test Python (1, 3.12)
- GitHub Check: Analyze (c-cpp)
- GitHub Check: Test C++ (true)
- GitHub Check: Test Python (1, 3.9)
- GitHub Check: Build C library (2.18, libdeepmd_c.tar.gz)
- GitHub Check: Build C++ (cpu, cpu)
🔇 Additional comments (3)
source/CMakeLists.txt (3)
12-12: LGTM! Option declaration follows project conventions.
The ENABLE_PADDLE option is correctly declared following the same pattern as other backend options.
30-83: Improve robustness of Paddle library download and configuration.
The current implementation has several potential issues:
- Hardcoded download URLs that might become outdated
- No verification of downloaded file integrity
- No fallback mechanism if download fails
396-402: LGTM! Backend messaging is consistent.
The changes correctly integrate Paddle into the backend messaging system, following the established pattern for other backends.
@njzjz I found that
Actionable comments posted: 0
🧹 Nitpick comments (1)
source/CMakeLists.txt (1)
154-154: Consider adding ROCm support
The commented-out line suggests ROCm support was considered but not implemented. Consider adding ROCm support for broader hardware compatibility.
Would you like me to help implement ROCm support or create a GitHub issue to track this enhancement?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- source/CMakeLists.txt (3 hunks)
🔇 Additional comments (4)
source/CMakeLists.txt (4)
12-12: LGTM: Option declaration follows CMake conventions
The `ENABLE_PADDLE` option is correctly declared and defaults to OFF.
85-153: LGTM: Library paths and RPATHs are properly configured
The implementation correctly:
- Sets up library and include directories
- Configures build and install RPATHs
- Adds Paddle paths to backend paths as requested in previous reviews
410-416: LGTM: Backend configuration is consistent
The Paddle backend is correctly integrated into the existing backend configuration system.
31-83: 🛠️ Refactor suggestion
Improve robustness of Paddle library download
The current implementation has several potential issues:
- No verification of downloaded file integrity
- No error handling for failed downloads
- Hardcoded URLs that might become outdated
- No caching of downloads

Consider implementing these improvements:

```diff
 if(NOT DEFINED PADDLE_INFERENCE_DIR)
+  set(PADDLE_DOWNLOAD_HASH_CUDA11 "expected_hash_here" CACHE STRING "Expected hash for CUDA 11 Paddle inference library")
+  set(PADDLE_DOWNLOAD_HASH_CUDA12 "expected_hash_here" CACHE STRING "Expected hash for CUDA 12 Paddle inference library")
+  set(PADDLE_DOWNLOAD_HASH_CPU "expected_hash_here" CACHE STRING "Expected hash for CPU Paddle inference library")
+
   if(USE_CUDA_TOOLKIT)
     # ... existing CUDA version check ...
     file(DOWNLOAD ${DOWNLOAD_URL} ${TGZ_FILE}
+         EXPECTED_HASH SHA256=${PADDLE_DOWNLOAD_HASH_CUDA11}  # or ..._CUDA12, chosen per CUDA_MAJOR_VERSION
+         STATUS DOWNLOAD_STATUS
+         SHOW_PROGRESS)
+
+    list(GET DOWNLOAD_STATUS 0 STATUS_CODE)
+    if(NOT STATUS_CODE EQUAL 0)
+      message(FATAL_ERROR "Failed to download Paddle inference library from ${DOWNLOAD_URL}")
+    endif()
   else()
     # ... CPU version download ...
     file(DOWNLOAD ${DOWNLOAD_URL} ${TGZ_FILE}
+         EXPECTED_HASH SHA256=${PADDLE_DOWNLOAD_HASH_CPU}
+         STATUS DOWNLOAD_STATUS
+         SHOW_PROGRESS)
   endif()
```

Likely invalid or redundant comment.
deepmd/pd/utils/serialization.py and paddle inference files can be deserialized using `dp convert-backend deeppot_sea.yaml deeppot_sea.json`.
Related PR:
Summary by CodeRabbit
Based on the comprehensive changes, here are the updated release notes:
- New Features
- Configuration: added the build option `ENABLE_PADDLE` to toggle Paddle support.
- Testing
- Documentation
- Performance