Refactor property fitting interface #4471
Conversation
📝 Walkthrough
This pull request introduces a comprehensive enhancement to DeepMD-kit's property fitting functionality. The changes primarily focus on introducing a dynamic property naming mechanism across the fitting, loss, and statistics components.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (3)
deepmd/pt/utils/stat.py (1)
506-507: Simplify dictionary membership checks
Use direct dictionary membership tests for better readability and efficiency.
Apply this diff to simplify the code:
- assert ii in bias_atom_e.keys()
- assert ii in std_atom_e.keys()
+ assert ii in bias_atom_e
+ assert ii in std_atom_e
🧰 Tools
🪛 Ruff (0.8.2)
506-506: Use key in dict instead of key in dict.keys(); remove .keys() (SIM118)
507-507: Use key in dict instead of key in dict.keys(); remove .keys() (SIM118)
deepmd/utils/argcheck.py (1)
1584-1584: Provide documentation for doc_property_name
The placeholder "TODO" indicates missing documentation. Please provide a meaningful description for doc_property_name to enhance clarity.
Apply this diff to address the issue:
- doc_property_name = "TODO"
+ doc_property_name = "The name(s) of the property(ies), which should correspond to the properties in the dataset."
deepmd/pt/model/task/property.py (1)
131-131: Consider documenting the sub_var_name parameter
The sub_var_name parameter is being passed to OutputVariableDef, but its purpose and expected values aren't documented in the class docstring.
Add documentation for the new parameter:
  """Fitting the rotationally invariant properties of `task_dim` of the system.

  Parameters
  ----------
+ property_name : str|list
+     Name of the property or list of property names.
  """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- deepmd/dpmodel/output_def.py (2 hunks)
- deepmd/pt/loss/property.py (4 hunks)
- deepmd/pt/model/atomic_model/property_atomic_model.py (1 hunks)
- deepmd/pt/model/task/property.py (3 hunks)
- deepmd/pt/utils/stat.py (4 hunks)
- deepmd/utils/argcheck.py (4 hunks)
- deepmd/utils/out_stat.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py
506-506: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
507-507: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (8)
deepmd/pt/model/atomic_model/property_atomic_model.py (1)
38-41
: Application of output statistics is correctly streamlined
The updated code simplifies the application of output statistics to the atomic outputs, enhancing readability and maintainability.
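For readers outside the diff, a minimal standalone sketch of the pattern being described, i.e. scaling atomic outputs by a standard deviation and then adding a bias (the array shapes and values below are illustrative assumptions, not taken from the PR):
import numpy as np

out_std = np.array([1.5, 2.0, 0.5])    # hypothetical per-component std for a 3-dim property
out_bias = np.array([0.1, -0.2, 0.0])  # hypothetical per-component bias
atomic_out = np.ones((4, 3))           # raw outputs for 4 atoms, shape (natoms, task_dim)

# The streamlined application of output statistics: multiply by std, then add bias.
atomic_out = atomic_out * out_std + out_bias
print(atomic_out)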
deepmd/pt/loss/property.py (4)
27-28: Effective addition of property_name and property_dim parameters
Introducing property_name and property_dim enhances the flexibility to handle multiple properties with varying dimensions.
49-55: Proper handling of property names and dimensions
Converting property_name and property_dim to lists ensures that multiple properties are managed consistently. The assertion confirms that the sum of property_dim matches task_dim, maintaining integrity.
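To make the conversion concrete, a small self-contained sketch of the list promotion and dimension check described above (the values are hypothetical; the real constructor performs these steps on its own attributes):
property_name = "band_property"  # may also be a list of names
property_dim = 3                 # may also be a list of dimensions
task_dim = 3

# Promote scalars to lists so single- and multi-property cases share one code path.
if isinstance(property_name, str):
    property_name = [property_name]
if isinstance(property_dim, int):
    property_dim = [property_dim]

# The summed per-property dimensions must equal the fitting task dimension.
assert task_dim == sum(property_dim)
property_name_dim_mapping = dict(zip(property_name, property_dim))
print(property_name_dim_mapping)  # {'band_property': 3}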
81-95: Correct concatenation and normalization of properties
The code appropriately concatenates the property labels and normalizes predictions and labels using out_bias and out_std. The assertions ensure dimensional consistency.
163-171: Dynamic label requirements for multiple properties
The label_requirement method correctly generates data requirements for each property, facilitating proper data handling during training.
deepmd/utils/argcheck.py (2)
Line range hint 1618-1623: Addition of property_name parameter
Including property_name as an argument aligns with modifications in other components, facilitating flexible property handling.
Line range hint 2486-2523: Appropriate integration of property parameters in loss configuration
The addition of property_name and property_dim parameters in the loss configuration allows for precise specification of property names and dimensions during loss computation.
deepmd/pt/model/task/property.py (1)
85-85: LGTM: Property name parameter addition
The addition of the property_name parameter with a default value "property" maintains backward compatibility while adding flexibility for property naming.
Also applies to: 99-99
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
deepmd/pt/loss/property.py (3)
27-28: LGTM! Consider using typing.Union for type hints
The implementation correctly handles both single and multiple property configurations. The validation ensures property dimensions sum up to task_dim.
Consider using typing.Union instead of the | operator for better compatibility with older Python versions:
- property_name: str | list = "property",
- property_dim: int | list = 1,
+ property_name: Union[str, list] = "property",
+ property_dim: Union[int, list] = 1,
Also applies to: 49-55
81-98: Consider optimizing property concatenation and enhancing validation
The implementation correctly handles multiple properties, but there's room for improvement in validation and performance.
Consider these improvements:
- Validate dimensions before concatenation:
  def _validate_property_shapes(self, label, nbz):
      for name, dim in self.property_name_dim_mapping.items():
          if label[name].shape != (nbz, dim):
              raise ValueError(
                  f"Property {name} shape mismatch. "
                  f"Expected ({nbz}, {dim}), got {label[name].shape}"
              )
- Use torch.cat with a pre-allocated list for better performance:
  concat_property = [None] * len(self.property_name)
  for i, name in enumerate(self.property_name):
      concat_property[i] = label[name]
  label["property"] = torch.cat(concat_property, dim=1)
105-106: Consider reducing code duplication in loss calculations
The normalization logic is correctly implemented but duplicated across different loss functions.
Consider extracting the normalization logic:
  def _normalize_predictions(self, label_tensor, pred_tensor, out_bias, out_std):
      return (label_tensor - out_bias) / out_std, (pred_tensor - out_bias) / out_std

  # Usage in loss functions
  norm_label, norm_pred = self._normalize_predictions(
      label["property"], model_pred["property"], out_bias, out_std
  )
  loss += F.smooth_l1_loss(norm_label, norm_pred, reduction="sum", beta=self.beta)
Also applies to: 112-114, 118-119, 125-126
deepmd/utils/argcheck.py (3)
1584-1584: Add proper documentation for property_name parameter
The documentation for property_name is marked as TODO. Please add proper documentation explaining the purpose and usage of this parameter.
2492-2493: Add proper documentation for property parameters
Both property_name and property_dim parameters need proper documentation to explain:
- Their purpose and usage
- Expected value types and formats
- Relationship with dataset properties
Line range hint 515-516: Simplify dictionary key checks
Remove unnecessary .keys() calls when checking for key existence in dictionaries. Apply this diff:
- assert ii in bias_atom_e.keys()
- assert ii in std_atom_e.keys()
+ assert ii in bias_atom_e
+ assert ii in std_atom_e
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- deepmd/pt/loss/property.py (4 hunks)
- deepmd/pt/model/atomic_model/property_atomic_model.py (1 hunks)
- deepmd/pt/model/task/property.py (3 hunks)
- deepmd/pt/utils/stat.py (4 hunks)
- deepmd/utils/argcheck.py (4 hunks)
- deepmd/utils/out_stat.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/model/task/property.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py
515-515: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
516-516: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (7)
deepmd/pt/loss/property.py (1)
168-176
: LGTM! Clear and correct implementation of label requirements
The implementation correctly generates requirements for each property with appropriate dimensions.
deepmd/utils/out_stat.py (1)
133-179
: 🛠️ Refactor suggestion
Fix documentation and optimize implementation
The implementation has similar issues to those previously identified.
- Update the docstring to accurately reflect the implementation:
- compute the least-squares solution as the atomic output bias and std.
+ compute the mean as the atomic output bias and std.
- Optimize the implementation using numpy's broadcasting:
- computed_output_bias = np.repeat(
- np.mean(output_redu, axis=0)[np.newaxis, :], ntypes, axis=0
- )
+ computed_output_bias = np.broadcast_to(
+ np.mean(output_redu, axis=0), (ntypes, output_redu.shape[1])
+ )
- Consider adding input validation for assigned_bias parameter since it's currently unused:
  if assigned_bias is not None:
      warnings.warn("assigned_bias parameter is not used in compute_stats_property")
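As a quick standalone check of the broadcasting suggestion in the list above (not code from the PR; the array here is a made-up 2x3 reduced-output example), np.broadcast_to produces the same values as np.repeat while returning a read-only view instead of a copy:
import numpy as np

ntypes = 4
output_redu = np.arange(6, dtype=float).reshape(2, 3)  # hypothetical reduced outputs

bias_repeat = np.repeat(np.mean(output_redu, axis=0)[np.newaxis, :], ntypes, axis=0)
bias_broadcast = np.broadcast_to(np.mean(output_redu, axis=0), (ntypes, output_redu.shape[1]))

assert np.array_equal(bias_repeat, bias_broadcast)  # identical values, no extra copy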
deepmd/pt/model/atomic_model/property_atomic_model.py (1)
38-41
: LGTM! Clean simplification of output statistics application
The implementation is now more straightforward and consistent with the changes in other files.
deepmd/pt/utils/stat.py (4)
32-32
: LGTM: Import added for property statistics support
The import of compute_stats_property
is properly placed and used in the property statistics computation logic.
294-298
: LGTM: Property sub-variable handling added
The code correctly extends the keys list with sub-variable names from property definitions, enabling proper handling of property statistics.
495-527
: LGTM: Property statistics computation implemented correctly
The implementation properly:
- Computes property statistics using the appropriate method
- Concatenates bias and standard deviation values
- Handles tensor shape transformations correctly
🧰 Tools
🪛 Ruff (0.8.2)
515-515: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
516-516: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
382-391: Fix generator expression evaluation
The generator expression is used directly in a boolean context, which doesn't evaluate as intended. Wrap it with any() to properly evaluate the condition.
Apply this diff to fix the issue:
- if (
- "property" in atomic_output.var_defs
- and (
- ii in keys for ii in atomic_output.var_defs["property"].sub_var_name
- )
- )
+ if (
+ "property" in atomic_output.var_defs
+ and any(
+ ii in keys for ii in atomic_output.var_defs["property"].sub_var_name
+ )
+ )
Likely invalid or redundant comment.
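For context on why any() is needed here, a standalone Python illustration (not code from the PR): a bare generator expression is an object and is therefore always truthy, so using it directly in a boolean condition never filters anything.
keys = ["energy"]
sub_var_name = ["band_property"]

# A generator object is truthy regardless of what it would yield.
print(bool(ii in keys for ii in sub_var_name))   # True, even though no element matches

# any() actually consumes the generator and evaluates the membership tests.
print(any(ii in keys for ii in sub_var_name))    # False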
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
deepmd/utils/argcheck.py (1)
1584-1584: Documentation needed for property_name parameter
The TODO comment indicates missing documentation for the property_name parameter. Please add a description explaining its purpose and usage.
deepmd/utils/out_stat.py (2)
171-174: Optimize numpy operations
The current implementation can be optimized using numpy's broadcasting capabilities.
- computed_output_bias = np.repeat(
-     np.mean(output_redu, axis=0)[np.newaxis, :], ntypes, axis=0
- )
+ computed_output_bias = np.broadcast_to(np.mean(output_redu, axis=0), (ntypes, output_redu.shape[1]))
133-137: Document or remove unused parameter
The assigned_bias parameter is declared but never used in the function implementation. Either document why this parameter exists but is unused, or remove it if it's not needed.
deepmd/pt/utils/stat.py (2)
294-298: Simplify sub-keys extraction
The sub-keys extraction can be written more concisely using a list comprehension.
- sub_keys = []
- for key in keys:
-     if atomic_output.var_defs[key].sub_var_name is not None:
-         sub_keys.extend(atomic_output.var_defs[key].sub_var_name)
- keys.extend(sub_keys)
+ keys.extend([
+     sub_key for key in keys
+     if atomic_output.var_defs[key].sub_var_name is not None
+     for sub_key in atomic_output.var_defs[key].sub_var_name
+ ])
495-506: Simplify dictionary key checks
The code can be simplified by removing unnecessary .keys() calls.
- if "property" in atomic_output.var_defs:
+ if "property" in atomic_output.var_defs and kk in stats_input:
      bias_atom_e[kk], std_atom_e[kk] = compute_stats_property(
          stats_input[kk],
          merged_natoms[kk],
          assigned_bias=assigned_atom_ener[kk],
      )
- else:
+ elif kk in stats_input:
      bias_atom_e[kk], std_atom_e[kk] = compute_stats_from_redu(
          stats_input[kk],
          merged_natoms[kk],
          assigned_bias=assigned_atom_ener[kk],
          rcond=rcond,
      )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
- deepmd/pt/loss/property.py (4 hunks)
- deepmd/pt/model/atomic_model/property_atomic_model.py (1 hunks)
- deepmd/pt/model/task/property.py (3 hunks)
- deepmd/pt/utils/stat.py (4 hunks)
- deepmd/utils/argcheck.py (4 hunks)
- deepmd/utils/out_stat.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/pt/model/task/property.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py
515-515: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
516-516: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (10)
deepmd/pt/model/atomic_model/property_atomic_model.py (1)
38-41
: LGTM! Clean refactor of bias application logic.
The simplified implementation is more maintainable while preserving the same functionality. The direct multiplication by standard deviation followed by bias addition is cleaner than the previous conditional approach.
deepmd/pt/loss/property.py (5)
81-83
: LGTM! Proper shape validation.
The shape validation ensures the property predictions match the expected dimensions.
84-92
: LGTM! Well-structured property concatenation.
The implementation properly:
- Handles multiple properties
- Validates individual property dimensions
- Concatenates properties while maintaining their order
94-97
: LGTM! Proper normalization setup.
The code correctly retrieves and validates the normalization parameters (out_std and out_bias).
105-106
: LGTM! Consistent normalization across loss functions.
The normalization is consistently applied across all supported loss functions (smooth_mae, mae, mse, rmse).
Also applies to: 112-113, 118-119, 125-126
168-176
: LGTM! Well-structured label requirements.
The implementation properly generates label requirements for each property with appropriate dimensions.
deepmd/utils/argcheck.py (3)
1618-1624: LGTM! Property configuration parameters.
The addition of the property_name parameter with proper type checking and defaults is well-structured.
2492-2493: LGTM! Clear documentation for property parameters.
The documentation clearly explains the relationship between property names/dimensions and the dataset.
2516-2529: LGTM! Well-structured property configuration.
The implementation properly handles both single and multiple property configurations with appropriate defaults.
deepmd/utils/out_stat.py (1)
138-142: Fix incorrect docstring description
The docstring states that the function computes "the least-squares solution" but the implementation actually computes the mean value.
- compute the least-squares solution as the atomic output bias and std.
+ compute the mean value as the atomic output bias and std.
Likely invalid or redundant comment.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (3)
deepmd/pt/loss/property.py (3)
Line range hint 28-56: Consider adding validation for property dimensions
While the code correctly validates that the sum of property dimensions equals task_dim, consider adding additional validation:
- Ensure property_dim values are positive integers
- Verify that property_name and property_dim lists have the same length
  if isinstance(property_dim, int):
      property_dim = [property_dim]
  self.property_name = property_name
+ if not all(isinstance(dim, int) and dim > 0 for dim in property_dim):
+     raise ValueError("All property dimensions must be positive integers")
+ if len(property_name) != len(property_dim):
+     raise ValueError("Length of property_name and property_dim must match")
  assert self.task_dim == sum(property_dim)
95-99: Consider adding shape validation for normalization parameters
The code assumes out_bias has the same shape as out_std. Add explicit validation:
  out_std = model.atomic_model.out_std[0][0]
  out_bias = model.atomic_model.out_bias[0][0]
  assert len(out_std.shape) == 1
+ assert out_bias.shape == out_std.shape, "out_bias and out_std must have the same shape"
  assert out_std.shape[0] == self.task_dim
106-127: Consider refactoring repeated normalization logic
The normalization logic (x - out_bias) / out_std is repeated across different loss functions. Consider extracting it to a helper method:
+ def _normalize(self, x):
+     """Normalize input using out_bias and out_std."""
+     return (x - self.out_bias) / self.out_std

  def forward(self, input_dict, model, label, natoms, learning_rate=0.0, mae=False):
      # ... existing code ...
      if self.loss_func == "smooth_mae":
          loss += F.smooth_l1_loss(
-             (label["property"] - out_bias) / out_std,
-             (model_pred["property"] - out_bias) / out_std,
+             self._normalize(label["property"]),
+             self._normalize(model_pred["property"]),
              reduction="sum",
              beta=self.beta,
          )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- deepmd/dpmodel/output_def.py (2 hunks)
- deepmd/pt/loss/property.py (5 hunks)
- deepmd/pt/model/task/property.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/dpmodel/output_def.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/task/property.py
85-85: Undefined name Union
(F821)
🔇 Additional comments (4)
deepmd/pt/model/task/property.py (2)
Line range hint 85-99
: LGTM: Constructor parameter addition
The addition of property_name
parameter with a sensible default value maintains backward compatibility while enabling more flexible property handling.
🧰 Tools
🪛 Ruff (0.8.2)
85-85: Undefined name Union
(F821)
131-131
: LGTM: Output definition enhancement
The sub_var_name
parameter is correctly utilized to pass the property name to the output definition.
deepmd/pt/loss/property.py (2)
85-94
: LGTM: Property concatenation and validation
The property concatenation logic is well-implemented with proper shape validation for each property.
169-177
: LGTM: Dynamic label requirements
The label requirements are correctly updated to handle multiple properties, with proper dimension mapping and settings.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
deepmd/pt/loss/property.py (2)
Line range hint 30-58: Enhance type hints and validation messages
The constructor changes look good overall, but could benefit from some improvements:
- Consider using more specific type hints:
  - property_name: Union[str, list] = "property"
  - property_dim: Union[int, list] = 1
  + property_name: Union[str, List[str]] = "property"
  + property_dim: Union[int, List[int]] = 1
- Add more descriptive validation:
  - assert self.task_dim == sum(property_dim)
  + assert self.task_dim == sum(property_dim), (
  +     f"Sum of property dimensions {sum(property_dim)} does not match task_dim {self.task_dim}"
  + )
97-129: Refactor loss computation for better maintainability
Consider extracting the normalization logic and improving the loss function selection:
- Extract normalization to a separate method:
  def _normalize_property(self, property_tensor: torch.Tensor, out_bias: torch.Tensor, out_std: torch.Tensor) -> torch.Tensor:
      assert len(out_std.shape) == 1, f"Expected 1D tensor for out_std, got shape {out_std.shape}"
      assert out_std.shape[0] == self.task_dim, (
          f"Dimension mismatch: out_std has shape {out_std.shape}, "
          f"expected ({self.task_dim},)"
      )
      return (property_tensor - out_bias) / out_std
- Consider using a dictionary for loss functions to improve maintainability:
  LOSS_FUNCTIONS = {
      "smooth_mae": lambda x, y, beta: F.smooth_l1_loss(x, y, reduction="sum", beta=beta),
      "mae": lambda x, y, _: F.l1_loss(x, y, reduction="sum"),
      "mse": lambda x, y, _: F.mse_loss(x, y, reduction="sum"),
      "rmse": lambda x, y, _: torch.sqrt(F.mse_loss(x, y, reduction="mean")),
  }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- deepmd/pt/loss/property.py (5 hunks)
🔇 Additional comments (2)
deepmd/pt/loss/property.py (2)
171-179
: LGTM! Label requirements properly handle multiple properties
The implementation correctly generates requirements for each property while maintaining consistency with the must=True and high_prec=True flags.
Line range hint 1-179
: Verify integration with dependent components
Please ensure that all components using PropertyLoss have been updated to handle multiple properties correctly.
✅ Verification successful
Let me gather more information about the test files and training integration to ensure proper handling of multiple properties.
Based on the gathered information, I can now provide the final response:
Multiple property support has been properly integrated
The changes to PropertyLoss are well-integrated with the rest of the codebase:
- Training integration is clean through deepmd/pt/train/training.py
- Test coverage exists in source/tests/pt/model/test_property_fitting.py and source/tests/pt/test_training.py
- Property access patterns are consistently used only within the PropertyLoss class
- The property concatenation logic is self-contained and doesn't affect other components
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check usage of PropertyLoss and property handling in the codebase
# Check for PropertyLoss usage
echo "Checking PropertyLoss usage..."
rg -l "PropertyLoss" --type py
# Check for property access patterns that might need updating
echo "Checking property access patterns..."
rg -A 2 "model_pred\[.property.\]|label\[.property.\]" --type py
# Check for potential property-related tests
echo "Checking property-related tests..."
rg -l "test.*property" --type py
Length of output: 3309
Script:
#!/bin/bash
# Check test implementations and training integration
# Check property-related test implementations
echo "Checking property test implementations..."
rg -A 10 "test.*property" source/tests/pt/model/test_property_fitting.py source/tests/pt/test_training.py
# Check training integration
echo "Checking training integration..."
rg -A 5 "PropertyLoss" deepmd/pt/train/training.py
# Check property model predictions
echo "Checking property model predictions..."
ast-grep --pattern 'def forward($$$) { $$$ property $$$ }'
Length of output: 2695
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (5)
deepmd/pt/loss/property.py (5)
30-31: Add documentation for new parameters
The docstring is missing documentation for the new parameters property_name and property_dim.
Add the following to the Parameters section of the docstring:
  beta: The 'beta' parameter in 'smooth_mae' loss.
+ property_name : Union[str, list]
+     Name(s) of the property/properties to fit. Can be a single string or list of strings.
+ property_dim : Union[int, list]
+     Dimension(s) of the property/properties. Can be a single integer or list of integers.
52-58: Enhance input validation
The validation logic could be more robust to handle edge cases.
Consider adding these validations:
  if isinstance(property_name, str):
      property_name = [property_name]
  if isinstance(property_dim, int):
      property_dim = [property_dim]
+ if not property_name:
+     raise ValueError("property_name cannot be empty")
+ if not property_dim:
+     raise ValueError("property_dim cannot be empty")
+ if len(property_name) != len(property_dim):
+     raise ValueError("Length of property_name must match property_dim")
+ if any(dim <= 0 for dim in property_dim):
+     raise ValueError("All property dimensions must be positive")
  self.property_name = property_name
  assert self.task_dim == sum(property_dim)
84-95: Improve error messages and performance
The shape assertions could be more informative, and the property concatenation could be optimized.
Consider these improvements:
- assert model_pred["property"].shape == (nbz, self.task_dim)
+ assert model_pred["property"].shape == (nbz, self.task_dim), \
+     f"Expected property shape ({nbz}, {self.task_dim}), got {model_pred['property'].shape}"
- concat_property = []
- for property_name in self.property_name:
-     assert label[property_name].shape == (
-         nbz,
-         self.property_name_dim_mapping[property_name],
-     )
-     concat_property.append(label[property_name])
- label["property"] = torch.cat(concat_property, dim=1)
+ # Pre-allocate tensor for better performance
+ label["property"] = torch.empty((nbz, self.task_dim),
+                                 dtype=label[self.property_name[0]].dtype,
+                                 device=label[self.property_name[0]].device)
+ start_idx = 0
+ for property_name in self.property_name:
+     dim = self.property_name_dim_mapping[property_name]
+     prop_tensor = label[property_name]
+     expected_shape = (nbz, dim)
+     assert prop_tensor.shape == expected_shape, \
+         f"Expected {property_name} shape {expected_shape}, got {prop_tensor.shape}"
+     label["property"][:, start_idx:start_idx + dim] = prop_tensor
+     start_idx += dim
108-129: Reduce code duplication in loss normalization
The normalization logic (x - out_bias) / out_std is repeated across different loss functions.
Consider extracting the normalization into a helper method:
+ def _normalize(self, x, out_bias, out_std):
+     return (x - out_bias) / out_std

  def forward(self, input_dict, model, label, natoms, learning_rate=0.0, mae=False):
      # ... existing code ...
      if self.loss_func == "smooth_mae":
          loss += F.smooth_l1_loss(
-             (label["property"] - out_bias) / out_std,
-             (model_pred["property"] - out_bias) / out_std,
+             self._normalize(label["property"], out_bias, out_std),
+             self._normalize(model_pred["property"], out_bias, out_std),
              reduction="sum",
              beta=self.beta,
          )
171-179: Consider pre-allocating the requirements list
The current implementation builds the list one item at a time.
Consider pre-allocating the list for better performance:
  @property
  def label_requirement(self) -> list[DataRequirementItem]:
      """Return data label requirements needed for this loss calculation."""
-     label_requirement = []
+     label_requirement = [None] * len(self.property_name)
+     for i, property_name in enumerate(self.property_name):
-     for property_name in self.property_name:
-         label_requirement.append(
+         label_requirement[i] = (
              DataRequirementItem(
                  property_name,
                  ndof=self.property_name_dim_mapping[property_name],
                  atomic=False,
                  must=True,
                  high_prec=True,
              )
          )
      return label_requirement
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/dpmodel/output_def.py (2 hunks)
- deepmd/pt/loss/property.py (5 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/dpmodel/output_def.py
200-200: Do not use mutable data structures for argument defaults
Replace with None
; initialize within function
(B006)
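For context on the B006 finding above, a standalone illustration of the pitfall (function and variable names here are hypothetical, not from the PR):
def append_bad(item, bucket=[]):
    # The default list is created once at definition time and shared across calls.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # Initializing inside the function gives each call a fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_bad(1), append_bad(2))    # [1, 2] [1, 2] -- shared state leaks between calls
print(append_good(1), append_good(2))  # [1] [2]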
🔇 Additional comments (2)
deepmd/dpmodel/output_def.py (1)
Line range hint 200-224
: Verify integration of sub_var_name parameter
The sub_var_name
parameter has been added but appears to be unused in the class methods. Please clarify:
- The intended purpose of this parameter
- How it integrates with the existing output variable handling
- Whether additional methods need to be updated to support sub-variables
Let's check for any existing usage or similar patterns in the codebase:
🧰 Tools
🪛 Ruff (0.8.2)
200-200: Do not use mutable data structures for argument defaults
Replace with None
; initialize within function
(B006)
deepmd/pt/loss/property.py (1)
30-31
: Verify backward compatibility
The new parameters have default values that should maintain backward compatibility, but we should verify this doesn't break existing models.
Let's check for existing usage patterns:
Also applies to: 52-58
✅ Verification successful
The initial results show PropertyLoss usage in training.py with simple instantiation. Let's examine the test configurations and test files to understand existing usage patterns and verify backward compatibility implications.
The configuration files show simple property loss configurations with just "type": "property" without any specific property_name or property_dim parameters, which is good for backward compatibility. Let's examine the test files to complete our verification.
We've found several test files that could show how PropertyLoss is used in tests. Let's examine their content to verify backward compatibility implications.
Based on the comprehensive test implementations and configuration files examined, I can now generate the final response:
Backward compatibility is maintained
The changes to PropertyLoss are backward compatible for the following reasons:
- All test cases use default values for property_name and property_dim
- Configuration files (input.json) only specify "type": "property" without any additional parameters
- Test implementations verify various scenarios (intensive/non-intensive, different dimensions) using the default parameters
- PropertyLoss instantiations in training.py use **loss_params without explicit property_name/property_dim
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for existing PropertyLoss instantiations that might be affected
# Search for PropertyLoss instantiations
rg -A 5 "PropertyLoss\("
# Search for property-related configuration files
fd -e json -e yaml | xargs rg -l "property"
Length of output: 817
Script:
#!/bin/bash
# Let's check the actual content of these configuration files and any test files
# to understand the existing usage patterns
# Check property test configurations
cat source/tests/pt/property/input.json
cat examples/property/train/input_torch.json
# Look for test files that might show usage patterns
rg -l "test.*property" --type python
Length of output: 3144
Script:
#!/bin/bash
# Let's check test files with a more reliable pattern
rg -l "test.*property" --type-add 'py:*.py' --type py
# Also check for actual property loss test implementations
fd -e py | xargs rg -l "PropertyLoss"
# And look at the actual test implementations
fd -e py -p "test.*property.*\.py"
Length of output: 538
Script:
#!/bin/bash
# Let's examine the actual test implementations to verify usage patterns
cat source/tests/universal/pt/loss/test_loss.py
cat source/tests/pt/model/test_property_fitting.py
cat source/tests/consistent/fitting/test_property.py
# Also check the training implementation for any specific property handling
rg -A 5 "property.*loss" source/tests/pt/test_training.py
Length of output: 24056
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
deepmd/dpmodel/fitting/property_fitting.py (1)
78-78: Document the new property_name parameter
The new parameter needs documentation in the class docstring to explain its purpose and accepted values.
Add this to the class docstring parameters section:
  Parameters
  ----------
+ property_name : Union[str, list], optional
+     Name(s) of the property. Can be a single string or list of strings. Defaults to "property".
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- deepmd/dpmodel/fitting/property_fitting.py (2 hunks)
- deepmd/dpmodel/output_def.py (3 hunks)
- deepmd/pt/utils/stat.py (4 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/utils/stat.py
516-516: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
517-517: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
🔇 Additional comments (3)
deepmd/dpmodel/fitting/property_fitting.py (1)
94-94
: LGTM: Property name assignment
The property name is correctly assigned to the instance variable.
deepmd/pt/utils/stat.py (2)
512-528
: 🛠️ Refactor suggestion
Optimize property concatenation logic
The property concatenation logic can be improved for better efficiency and safety.
Apply this diff to optimize the code:
if "property" in atomic_output.var_defs:
concat_bias = []
concat_std = []
for ii in atomic_output.var_defs["property"].sub_var_name:
- assert ii in bias_atom_e.keys()
- assert ii in std_atom_e.keys()
+ if ii not in bias_atom_e or ii not in std_atom_e:
+ raise ValueError(f"Missing property {ii} in bias or std arrays")
+ if not concat_bias or concat_bias[-1].shape[0] == bias_atom_e[ii].shape[0]:
+ concat_bias.append(bias_atom_e[ii])
+ concat_std.append(std_atom_e[ii])
+ else:
+ raise ValueError(f"Inconsistent shapes in property arrays for {ii}")
- del bias_atom_e, std_atom_e
- bias_atom_e = {}
- std_atom_e = {}
+ bias_atom_e = {"property": np.concatenate(concat_bias, axis=-1)}
+ std_atom_e = {"property": np.tile(
+ np.concatenate(concat_std, axis=-1),
+ (bias_atom_e["property"].shape[0], 1)
+ )}
- bias_atom_e["property"] = np.concatenate(concat_bias, axis=-1)
- std_atom_e["property"] = np.concatenate(concat_std, axis=-1)
- std_atom_e["property"] = np.tile(
- std_atom_e["property"], (bias_atom_e["property"].shape[0], 1)
- )
The changes:
- Replace assertions with proper error handling
- Add shape validation
- Optimize dictionary operations
- Improve memory usage by avoiding intermediate variables
Likely invalid or redundant comment.
🧰 Tools
🪛 Ruff (0.8.2)
516-516: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
517-517: Use key in dict
instead of key in dict.keys()
Remove .keys()
(SIM118)
383-392
:
Fix generator expression evaluation
The condition using a generator expression is incorrect and won't evaluate as intended.
Apply this diff to fix the issue:
keys = (
["property"]
if (
"property" in atomic_output.var_defs
and (
- ii in keys for ii in atomic_output.var_defs["property"].sub_var_name
+ any(ii in keys for ii in atomic_output.var_defs["property"].sub_var_name)
)
)
else keys
)
Likely invalid or redundant comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
deepmd/dpmodel/output_def.py (2)
6-8: Consider using modern Python type hints
For better readability and to align with modern Python practices, consider using the union operator (|) instead of Union if you're targeting Python 3.10+:
- from typing import (
-     Union,
- )
And update the type hint:
- sub_var_name: Union[list[str], str, None] = None,
+ sub_var_name: list[str] | str | None = None,
Line range hint 203-229: Add tests for sub_var_name functionality
The new sub_var_name parameter needs test coverage to ensure it works correctly with:
- String input
- List of strings input
- None input
- Invalid inputs (validation)
Would you like me to help create test cases for this new functionality?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- deepmd/dpmodel/output_def.py (3 hunks)
🔇 Additional comments (2)
deepmd/dpmodel/output_def.py (2)
203-203
:
Add parameter documentation
The sub_var_name
parameter needs to be documented in the class docstring. Add the following to the Parameters section:
Parameters
----------
+ sub_var_name : Union[list[str], str, None], optional
+ List of sub-variable names or a single sub-variable name for the output variable.
+ If a string is provided, it will be converted to a single-item list.
+ Defaults to None.
Likely invalid or redundant comment.
227-229
:
Add type validation for sub_var_name
The current implementation lacks type validation, which could lead to runtime errors. Add validation to ensure the input is either a string, list of strings, or None:
if isinstance(sub_var_name, str):
sub_var_name = [sub_var_name]
+ if sub_var_name is not None and not (
+ isinstance(sub_var_name, list)
+ and all(isinstance(x, str) for x in sub_var_name)
+ ):
+ raise ValueError(
+ "sub_var_name must be a list of strings, a single string, or None"
+ )
self.sub_var_name = sub_var_name
Likely invalid or redundant comment.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (2)
deepmd/pt/loss/property.py (1)
59-60: Simplify kwargs.get() calls
The None default is redundant as it's the default behavior of dict.get().
- self.out_bias = kwargs.get("out_bias", None)
- self.out_std = kwargs.get("out_std", None)
+ self.out_bias = kwargs.get("out_bias")
+ self.out_std = kwargs.get("out_std")
🧰 Tools
🪛 Ruff (0.8.2)
59-59: Use kwargs.get("out_bias") instead of kwargs.get("out_bias", None) (SIM910)
60-60: Use kwargs.get("out_std") instead of kwargs.get("out_std", None) (SIM910)
source/tests/pt/property/input.json (1)
30-30: Add comments to document property configuration
Consider adding comments to explain the purpose and requirements of the new property configuration parameters.
  "fitting_net": {
      "type": "property",
      "intensive": true,
+     "_comment": "property_name defines the name of the property to be fitted",
      "property_name": "band_property",
      "task_dim": 3,
      ...
  },
  "loss": {
      "type": "property",
+     "_comment": "property_name and property_dim define the property to be fitted and its dimensionality",
      "property_name": "band_property",
      "property_dim": [3],
      ...
  }
Also applies to: 52-53
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- checkpoint (1 hunks)
- deepmd/pt/loss/property.py (5 hunks)
- source/tests/pt/model/test_permutation.py (1 hunks)
- source/tests/pt/property/double/type.raw (1 hunks)
- source/tests/pt/property/double/type_map.raw (1 hunks)
- source/tests/pt/property/input.json (2 hunks)
- source/tests/pt/test_training.py (1 hunks)
- source/tests/universal/dpmodel/loss/test_loss.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
- source/tests/pt/property/double/type.raw
- checkpoint
- source/tests/pt/property/double/type_map.raw
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/loss/property.py
59-59: Use kwargs.get("out_bias")
instead of kwargs.get("out_bias", None)
Replace kwargs.get("out_bias", None)
with kwargs.get("out_bias")
(SIM910)
60-60: Use kwargs.get("out_std")
instead of kwargs.get("out_std", None)
Replace kwargs.get("out_std", None)
with kwargs.get("out_std")
(SIM910)
🔇 Additional comments (5)
deepmd/pt/loss/property.py (3)
3-5
: LGTM: Type hints and new parameters are well-defined
The addition of Union type hints and new parameters property_name
and property_dim
improves the interface flexibility.
Also applies to: 30-31
86-98
: Improve error handling in property concatenation
The previous review comment about improving error handling is still valid. Consider implementing the suggested validation method for better robustness.
173-181
: LGTM: Label requirements are properly updated
The dynamic generation of label requirements based on property names is well-implemented.
source/tests/pt/test_training.py (1)
467-467
: Verify test data compatibility with the new property interface.
The change from "single" to "double" suggests testing with more complex property data. Ensure that the test data at the new path exists and is formatted correctly for the refactored property interface.
✅ Verification successful
Based on the output, I can see that the "property/double" directory exists and contains numerous NumPy array files with various shapes and data types. The test data appears to be well-structured with consistent formats, containing both float (f4, f8) and integer (i8) arrays of different dimensions that are commonly used in property testing.
The presence of arrays with shapes like (2, 441), (2, 9), (2, 1), etc., suggests that the data is properly formatted for the property interface, with paired inputs and outputs. The data types and dimensions are consistent with what would be expected for property testing, including feature vectors, labels, and property-related data.
Test data is properly structured and compatible with the property interface.
The change from "single" to "double" appears to be well-supported by the existing test data files, which show appropriate dimensionality and data types for property testing.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the existence and structure of property test data
# Check if the test data directory exists and examine its contents
fd -t d "property/double$"
fd -t f . -e json -e npy --exec head -n 1 {} \; 2>/dev/null
Length of output: 34323
source/tests/pt/model/test_permutation.py (1)
334-334
: Verify integration with property interface changes
The addition of property_name
should be verified against the broader property interface refactor.
Let's verify the property interface implementation:
✅ Verification successful
Let me gather more information about the property interface and its implementation.
Based on the gathered information, I can now provide the final response:
Property interface implementation is consistent
The band_property
usage in the test file is consistent with the broader property interface implementation. The codebase shows a well-structured property model system with:
- Proper inheritance hierarchy through
PropertyModel
,PropertyFittingNet
, and related classes - Consistent implementation across different backends (PyTorch, JAX)
- Test coverage for property-related functionality
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for property interface implementation and usage
# Check for property interface implementation
echo "Checking property interface implementation..."
rg -l "property_name.*band_property" --type py
# Check for property model usage
echo "Checking property model usage..."
ast-grep --pattern 'class $name implements PropertyModel {
$$$
}'
# Check for test coverage
echo "Checking test coverage..."
rg -l "test.*property" --type py
Length of output: 571
Script:
#!/bin/bash
# Check the actual implementation and usage of property_name and band_property
echo "Checking property_name implementation..."
rg -A 5 "property_name.*=.*band_property" --type py
echo "\nChecking band_property usage..."
rg -A 5 "band_property" --type py
echo "\nChecking property model definitions..."
rg -A 10 "class.*Property" --type py
Length of output: 21097
Actionable comments posted: 1
🧹 Nitpick comments (3)
deepmd/infer/deep_property.py (2)
49-74: Add error handling for edge cases
While the implementation is correct, consider adding error handling for potential edge cases:
- Validate that get_var_name() returns a non-empty string
- Handle potential exceptions during output definition creation
  def change_output_def(self) -> None:
+     var_name = self.get_var_name()
+     if not var_name:
+         raise ValueError("Variable name cannot be empty")
+     try:
          self.output_def = ModelOutputDef(
              FittingOutputDef(
                  [
                      OutputVariableDef(
-                         self.get_var_name(),
+                         var_name,
                          shape=[self.get_task_dim()],
                          reducible=True,
                          atomic=True,
                          intensive=self.get_intensive(),
                      ),
                  ]
              )
          )
+     except Exception as e:
+         raise RuntimeError(f"Failed to create output definition: {e}")
139-144: Add shape validation before reshaping
Consider adding shape validation before reshaping to provide clearer error messages:
+ expected_shape = (-1, -1, self.get_task_dim())
+ if atomic_property.ndim != 3 or atomic_property.shape[2] != expected_shape[2]:
+     raise ValueError(
+         f"Expected atomic_property shape (*,*,{self.get_task_dim()}), "
+         f"got {atomic_property.shape}"
+     )
  atomic_property = results[self.get_var_name()].reshape(
      nframes, natoms, self.get_task_dim()
  )
deepmd/pt/loss/property.py (1)
Line range hint 129-181: Standardize reduction methods across loss calculations
The code uses different reduction methods (sum vs mean) across different loss calculations, which could affect loss scaling. Consider standardizing to mean reduction for consistency:
- reduction="sum",
+ reduction="mean",
This change should be applied consistently across all loss calculations to ensure proper scaling.
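To make the scaling concern concrete, a small standalone PyTorch check (illustrative only, not code from the PR): with reduction="sum" the loss grows with the number of labelled values, while reduction="mean" stays comparable across batch sizes.
import torch
import torch.nn.functional as F

pred_small, label_small = torch.ones(4, 3), torch.zeros(4, 3)
pred_large, label_large = torch.ones(16, 3), torch.zeros(16, 3)

print(F.mse_loss(pred_small, label_small, reduction="sum"))   # tensor(12.)
print(F.mse_loss(pred_large, label_large, reduction="sum"))   # tensor(48.)
print(F.mse_loss(pred_small, label_small, reduction="mean"))  # tensor(1.)
print(F.mse_loss(pred_large, label_large, reduction="mean"))  # tensor(1.)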
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
- deepmd/dpmodel/fitting/property_fitting.py (5 hunks)
- deepmd/entrypoints/test.py (4 hunks)
- deepmd/infer/deep_eval.py (1 hunks)
- deepmd/infer/deep_property.py (3 hunks)
- deepmd/pt/infer/deep_eval.py (2 hunks)
- deepmd/pt/loss/property.py (6 hunks)
- deepmd/pt/model/model/property_model.py (4 hunks)
- deepmd/pt/model/task/property.py (5 hunks)
- deepmd/pt/train/training.py (1 hunks)
- source/tests/consistent/fitting/test_property.py (5 hunks)
- source/tests/pt/model/test_property_fitting.py (11 hunks)
- source/tests/pt/test_dp_test.py (1 hunks)
- source/tests/universal/dpmodel/loss/test_loss.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (9)
- source/tests/pt/test_dp_test.py
- source/tests/universal/dpmodel/loss/test_loss.py
- source/tests/consistent/fitting/test_property.py
- deepmd/entrypoints/test.py
- deepmd/infer/deep_eval.py
- deepmd/pt/infer/deep_eval.py
- deepmd/pt/model/task/property.py
- source/tests/pt/model/test_property_fitting.py
- deepmd/dpmodel/fitting/property_fitting.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/loss/property.py
29-29: Do not use mutable data structures for argument defaults
Replace with None
; initialize within function
(B006)
🔇 Additional comments (6)
deepmd/pt/model/model/property_model.py (3)
40-41
: LGTM: Dynamic property name handling implemented correctly
The change from static "property" keys to dynamic keys using get_var_name()
improves flexibility while maintaining consistency.
65-66
: LGTM: Consistent dynamic property handling in forward method
The changes maintain consistency with the output definition changes, properly using dynamic property names.
81-84
: LGTM: Well-documented getter method
The get_var_name
method is clearly documented and provides essential functionality for dynamic property handling.
deepmd/infer/deep_property.py (1)
41-48
: LGTM: Clear documentation of design choice
The placeholder implementation with detailed documentation clearly explains why the output definition is deferred until model loading.
deepmd/pt/loss/property.py (1)
97-99
:
Add validation for natoms to prevent division by zero
When dividing by natoms for non-intensive properties, add validation:
if not self.intensive:
+ if torch.any(natoms == 0):
+ raise ValueError("natoms contains zero values, which would cause division by zero")
model_pred[var_name] = model_pred[var_name] / natoms
label[var_name] = label[var_name] / natoms
Likely invalid or redundant comment.
deepmd/pt/train/training.py (1)
1243-1247
: LGTM! Verify model interface implementation.
The addition of var_name
and intensive
parameters enhances the property loss configuration, aligning with the PR's goal of flexible property fitting. The changes look correct and well-structured.
Let's verify that the required model interface methods are implemented:
✅ Verification successful
Based on the search results, I can see that both get_var_name()
and get_intensive()
methods are properly implemented in multiple model classes, particularly in the property model classes that are relevant to this PR:
- In
deepmd/pt/model/model/property_model.py
:
def get_var_name(self) -> str:
"""Get the name of the property."""
return self.get_fitting_net().var_name
def get_intensive(self) -> bool:
"""Get whether the property is intensive."""
return self.model_output_def()[self.get_var_name()].intensive
- In
deepmd/pt/model/task/property.py
:
def get_intensive(self) -> bool:
"""Whether the fitting property is intensive."""
return self.intensive
The methods are properly implemented in the model hierarchy and provide the required functionality for property fitting. Therefore, I can now provide the final response:
LGTM! Required model interface methods are properly implemented.
The get_var_name()
and get_intensive()
methods are correctly implemented in the property model classes, providing the necessary interface for property loss configuration. The implementation properly propagates the property name and type information through the model hierarchy.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check if the model interface methods are implemented
# Test: Search for the getter method implementations
ast-grep --pattern 'def get_var_name($$$) {
$$$
}'
ast-grep --pattern 'def get_intensive($$$) {
$$$
}'
Length of output: 110
Script:
#!/bin/bash
# Let's try a different approach using ripgrep to search for these method definitions
# The methods might be defined with different indentation or style
# Search for get_var_name method
echo "Searching for get_var_name:"
rg "def get_var_name" -A 5
# Search for get_intensive method
echo -e "\nSearching for get_intensive:"
rg "def get_intensive" -A 5
# Also search for class definitions that might contain these methods
echo -e "\nSearching for relevant model classes:"
rg "class.*Model" -A 10
Length of output: 66402
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- deepmd/infer/deep_eval.py (1 hunks)
- deepmd/pt/model/model/property_model.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/infer/deep_eval.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/model/model/property_model.py
84-84: Undefined name obj
(F821)
84-84: Do not call getattr
with a constant attribute value. It is not any safer than normal property access.
Replace getattr
with attribute access
(B009)
84-84: Undefined name obj
(F821)
🔇 Additional comments (5)
deepmd/pt/model/model/property_model.py (5)
40-41
: LGTM: Dynamic property naming implementation looks good!
The changes correctly implement dynamic property naming by using get_var_name()
to construct output definition keys, which aligns with the PR's objective of supporting flexible property names.
65-66
: LGTM: Consistent implementation of dynamic property naming!
The changes maintain consistency with the translated_output_def
method and correctly implement dynamic property naming in the forward pass.
79-79
: LGTM: Correctly adapted intensive property access!
The method properly uses the dynamic property name to access the intensive flag from the model output definition.
113-114
: LGTM: Consistent implementation in lower-level forward pass!
The changes maintain consistency with other methods in implementing dynamic property naming for the lower-level forward pass.
Line range hint 1-117
: Verify integration with normalization features
While the dynamic property naming is well implemented, let's verify the integration with the normalization features mentioned in the PR objectives.
✅ Verification successful
Let me search more specifically for normalization in the property model implementation.
Let me check one more specific aspect about the property model's normalization implementation.
Property model correctly integrates with normalization features
Based on the verification results, the property model properly integrates with normalization features through:
- Proper handling of statistical normalization in DPPropertyAtomicModel:
  - Uses out_bias and out_std for normalization
  - Applies normalization through ret[kk] = ret[kk] * out_std[kk][0] + out_bias[kk][0]
- Support for property-specific statistics:
  - Handles bias and standard deviation per property type
  - Properly manages intensive vs. extensive properties via model_output_def()
- Integration with the fitting framework:
  - Supports type-specific normalization through type_map
  - Handles vacuum contribution removal when needed
  - Properly manages statistical parameters through the fitting network
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for normalization-related code to ensure proper integration
echo "Searching for normalization implementation..."
rg -l "mean|std|norm" --type py
# Look for statistical calculations
echo "Checking for statistical calculations..."
ast-grep --pattern 'def $_(self, $$$) {
$$$
mean = $$$
std = $$$
$$$
}'
Length of output: 11730
Script:
#!/bin/bash
# Check for normalization in property model and related files
echo "Searching for normalization in property model..."
rg -A 5 "norm|mean|std" deepmd/pt/model/atomic_model/property_atomic_model.py deepmd/pt/model/model/property_model.py
# Look for statistical handling in property fitting
echo "Checking property fitting implementation..."
rg -A 5 "fitting|stat" deepmd/pt/model/task/fitting.py
Length of output: 4814
Script:
#!/bin/bash
# Check for output definition and normalization handling
echo "Checking output definition handling..."
rg -A 10 "model_output_def" deepmd/pt/model/model/property_model.py
# Look for base implementation of normalization
echo "Checking base model implementation..."
ast-grep --pattern 'class DPPropertyAtomicModel($$$) {
$$$
}'
Length of output: 1084
🧰 Tools
🪛 Ruff (0.8.2)
84-84: Undefined name obj
(F821)
84-84: Do not call getattr
with a constant attribute value. It is not any safer than normal property access.
Replace getattr
with attribute access
(B009)
84-84: Undefined name obj
(F821)
Actionable comments posted: 0
🧹 Nitpick comments (1)
deepmd/pt/infer/deep_eval.py (1)
187-195
: Simplify attribute check in get_var_nameThe current implementation uses unnecessary getattr. Simplify the check.
Apply this change:
def get_var_name(self) -> str: """Get the name of the property.""" - if hasattr(self.dp.model["Default"], "get_var_name") and callable( - getattr(self.dp.model["Default"], "get_var_name") - ): + if hasattr(self.dp.model["Default"], "get_var_name"): return self.dp.model["Default"].get_var_name() - else: - raise NotImplementedError + raise NotImplementedError🧰 Tools
🪛 Ruff (0.8.2)
190-190: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- deepmd/dpmodel/fitting/property_fitting.py (5 hunks)
- deepmd/pt/infer/deep_eval.py (2 hunks)
- deepmd/pt/model/model/property_model.py (4 hunks)
- deepmd/pt/model/task/property.py (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deepmd/dpmodel/fitting/property_fitting.py
🧰 Additional context used
🪛 Ruff (0.8.2)
deepmd/pt/infer/deep_eval.py
190-190: Do not call getattr with a constant attribute value. It is not any safer than normal property access. Replace getattr with attribute access (B009)
🔇 Additional comments (8)
deepmd/pt/model/model/property_model.py (4)
40-41: LGTM: Dynamic property name handling in output definition
The changes correctly use get_var_name() to dynamically construct output keys, improving flexibility.
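As an illustration only, here is a tiny sketch of how an output key could be assembled from such a dynamic name; the class and key layout are hypothetical, and only the get_var_name() method name comes from the change under review:

class PropertyModelSketch:
    """Hypothetical stand-in for the model side of the interface."""

    def __init__(self, var_name: str) -> None:
        self._var_name = var_name  # e.g. "band_prop"

    def get_var_name(self) -> str:
        return self._var_name

    def output_keys(self) -> list[str]:
        # Per-atom and per-frame outputs share the dynamic property name.
        name = self.get_var_name()
        return [f"atom_{name}", name]

print(PropertyModelSketch("band_prop").output_keys())  # ['atom_band_prop', 'band_prop']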
65-66: LGTM: Consistent property name handling in forward method
The forward method properly uses get_var_name() to maintain consistency with the output definition.
79-80: LGTM: Property intensity check uses dynamic property name
The get_intensive method correctly uses get_var_name() to check the property's intensive status.
81-84: LGTM: Clean implementation of get_var_name
Simple and straightforward implementation that delegates to the fitting network.
deepmd/pt/model/task/property.py (3)
128-130: LGTM: Clear implementation of get_intensive
The method provides a clean way to check if the property is intensive.
46-49: Add validation for property_name parameter
The property_name parameter needs validation to prevent security issues when loading files.
Add this validation at the beginning of init:
 def __init__(
     self,
     ntypes: int,
     dim_descrpt: int,
     property_name: str,
     task_dim: int = 1,
     ...
 ) -> None:
+    if not isinstance(property_name, str) or not property_name.strip():
+        raise ValueError("property_name must be a non-empty string")
+    if any(c in property_name for c in '/\\:*?"<>|') or '..' in property_name:
+        raise ValueError("property_name contains invalid characters")
     self.task_dim = task_dim
Likely invalid or redundant comment.
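To show what the suggested check would accept and reject, here is the same pair of conditions pulled into a standalone helper (the function name is made up for illustration):

def validate_property_name(property_name) -> None:
    # Same two conditions as in the suggested diff above.
    if not isinstance(property_name, str) or not property_name.strip():
        raise ValueError("property_name must be a non-empty string")
    if any(c in property_name for c in '/\\:*?"<>|') or '..' in property_name:
        raise ValueError("property_name contains invalid characters")

validate_property_name("band_prop")  # accepted silently
try:
    validate_property_name("../band_prop")  # path-like names are rejected
except ValueError as err:
    print(err)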
135-135: Verify version compatibility changes
The version has been bumped from 3 to 4 and serialization format updated.
Run this script to check version compatibility:
Also applies to: 137-137, 148-148, 150-150
✅ Verification successful
Version compatibility changes are safe
Based on the codebase analysis, the version bump from 3 to 4 in deepmd/pt/model/task/property.py
is a safe change:
- The version compatibility check is properly implemented using the check_version_compatibility() function, which ensures backward compatibility
- The version bump is consistently applied across related files:
  - deepmd/pt/model/task/property.py (version 4)
  - deepmd/dpmodel/fitting/property_fitting.py (version 4)
  - deepmd/pt/model/task/polarizability.py (version 4)
  - deepmd/dpmodel/fitting/polarizability_fitting.py (version 4)
- No existing models with older versions (1-3) were found in the codebase that would be affected by this change
- The version bump follows the established pattern used throughout the codebase for managing model versioning
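For reference, a minimal standalone sketch of the kind of guard such a check provides; the real check_version_compatibility() in DeePMD-kit may have a different signature, so treat this as illustrative only:

def check_version_compatibility(current, maximum_supported, minimal_supported=1):
    # Reject serialized data whose format version falls outside the supported window.
    if not (minimal_supported <= current <= maximum_supported):
        raise ValueError(
            f"unsupported version {current}: expected between "
            f"{minimal_supported} and {maximum_supported}"
        )

# A deserializer bumped to version 4 can still accept older data down to its declared minimum.
check_version_compatibility(3, 4)  # ok
check_version_compatibility(4, 4)  # ok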
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for any existing models that might break with the new version
rg -l '"@version":\s*[123]' --type json
Length of output: 41
Script:
#!/bin/bash
# Let's check for version compatibility checks in the codebase
rg -A 2 "check_version_compatibility"
# Also search for version-related constants or variables
rg "SUPPORTED_VERSION|MODEL_VERSION|@version"
Length of output: 66887
deepmd/pt/infer/deep_eval.py (1)
212-212: LGTM: Model type check uses dynamic property name
The model_type property correctly uses get_var_name() for consistency.
Delete `fparam` in doc train-fitting-property.md Signed-off-by: Chenqqian Zhang <[email protected]>
Change json format Signed-off-by: Chenqqian Zhang <[email protected]>
Actionable comments posted: 0
🧹 Nitpick comments (6)
doc/model/train-fitting-property.md (6)
7-8: Enhance the introduction with specific examples
Consider expanding the introduction with concrete examples of properties that have been successfully fitted using this model, along with typical use cases. This would help users better understand the model's capabilities and limitations.
25-35: Fix JSON formatting and enhance configuration example
- Replace hard tabs with spaces for consistent formatting.
- Consider adding validation rules for each parameter (e.g., valid ranges for task_dim, required neuron array length).
 "fitting_net" : {
-	"type": "property",
-	"intensive": true,
-	"property_name": "band_prop",
-	"task_dim": 3,
-	"neuron": [240,240,240],
-	"resnet_dt": true,
-	"seed": 1,
+    "type": "property",
+    "intensive": true,
+    "property_name": "band_prop",
+    "task_dim": 3,
+    "neuron": [240, 240, 240],
+    "resnet_dt": true,
+    "seed": 1,
 },
🧰 Tools
🪛 Markdownlint (0.37.0)
27-27: Column: 1
Hard tabs(MD010, no-hard-tabs)
31-31: Column: 1
Hard tabs(MD010, no-hard-tabs)
32-32: Column: 1
Hard tabs(MD010, no-hard-tabs)
33-33: Column: 1
Hard tabs(MD010, no-hard-tabs)
49-55: Fix JSON formatting and enhance loss function documentation
- Replace hard tabs with spaces.
- Consider adding examples and explanations for each loss function, particularly the differences between 'smooth_mae' and 'mae'.
"loss" : { - "type": "property", - "metric": ["mae"], - "loss_func": "smooth_mae" + "type": "property", + "metric": ["mae"], + "loss_func": "smooth_mae" },🧰 Tools
🪛 Markdownlint (0.37.0)
51-51: Column: 1
Hard tabs(MD010, no-hard-tabs)
52-52: Column: 1
Hard tabs(MD010, no-hard-tabs)
53-53: Column: 1
Hard tabs(MD010, no-hard-tabs)
67-115: Enhance data preparation example
- Add error handling examples for common scenarios (e.g., missing files, incorrect dimensions).
- Consider adding validation for property values.
- Include examples of handling multiple properties simultaneously.
Here's an enhanced version of the code with error handling:
import dpdata
import numpy as np
from dpdata.data_type import (
    Axis,
    DataType,
)

def register_property(property_name: str, task_dim: int):
    try:
        # Validate inputs
        if not isinstance(task_dim, int) or task_dim <= 0:
            raise ValueError(f"task_dim must be a positive integer, got {task_dim}")
        datatypes = [
            DataType(
                property_name,
                np.ndarray,
                shape=(Axis.NFRAMES, task_dim),
                required=False,
            ),
        ]
        # ... rest of the registration code ...
        return True
    except Exception as e:
        print(f"Error registering property: {e}")
        return False

def prepare_training_data(poscar_path: str, property_values: np.ndarray):
    try:
        # Validate property values
        if property_values.shape[1] != task_dim:
            raise ValueError(
                f"Property values must have shape (n, {task_dim}), "
                f"got {property_values.shape}"
            )
        ls = dpdata.MultiSystems()
        frame = dpdata.System(poscar_path, fmt="vasp/poscar")
        # ... rest of the preparation code ...
        return ls
    except FileNotFoundError:
        print(f"POSCAR file not found: {poscar_path}")
    except Exception as e:
        print(f"Error preparing training data: {e}")
    return None
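A possible way to drive the two helpers above; the POSCAR path and property values are made up, and the final export step is left to whatever dpdata writer the project actually uses:

import numpy as np

task_dim = 3  # must match the dimension registered below
values = np.array([[0.11, 0.22, 0.33]])  # one frame, task_dim values per frame

if register_property("band_prop", task_dim):
    systems = prepare_training_data("POSCAR", values)
    if systems is not None:
        # Export with a dpdata MultiSystems writer of your choice here.
        print(systems)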
135-149: Add language specification to code blocks and enhance output examples
- Add language specification to the output example:
-```
+```text
 # step mae_val mae_trn lr
- Consider adding examples of:
- How to interpret the metrics
- Common convergence patterns
- Warning signs of overfitting/underfitting
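A minimal sketch of how those columns could be scanned for an overfitting signal, assuming a whitespace-separated lcurve.out with the header shown above:

import numpy as np

# Columns follow the header "# step mae_val mae_trn lr".
data = np.loadtxt("lcurve.out")
step, mae_val, mae_trn = data[:, 0], data[:, 1], data[:, 2]

print(f"final mae_val = {mae_val[-1]:.4e}, final mae_trn = {mae_trn[-1]:.4e}")
if mae_val[-1] > 2.0 * mae_trn[-1]:
    print("Validation error is much larger than training error: possible overfitting.")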
🧰 Tools
🪛 Markdownlint (0.37.0)
135-135: null
Fenced code blocks should have a language specified(MD040, fenced-code-language)
185-194: Enhance data normalization documentation
Consider adding:
- Practical examples showing how normalization affects the training process
- Guidelines for when normalization is particularly important
- Code examples showing how to access and interpret the normalization parameters
Example addition:
# Example of accessing normalization parameters
import numpy as np

def analyze_normalization(property_values):
    mean = np.mean(property_values)
    std = np.std(property_values)
    print(f"Property statistics:")
    print(f"- Mean: {mean:.6f}")
    print(f"- Std: {std:.6f}")
    print(f"- Range: [{np.min(property_values):.6f}, {np.max(property_values):.6f}]")
    # Guidelines
    if std < 1e-6:
        print("Warning: Very small standard deviation might cause numerical issues")
    if abs(mean) > 1e3:
        print("Warning: Large mean values might need pre-normalization")
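The helper above could then be pointed at a prepared data file, for example (the file name is hypothetical):

import numpy as np

band_prop = np.load("band_prop.npy")  # shape (nframes, task_dim)
analyze_normalization(band_prop)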
🪛 LanguageTool
[uncategorized] ~193-~193: Loose punctuation mark.
Context: ...(D_i))= MLP(D_i) * std+ mean$$ 1.std
: The standard deviation of the property ...(UNLIKELY_OPENING_PUNCTUATION)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
doc/model/train-fitting-property.md
(1 hunks)
🧰 Additional context used
🪛 LanguageTool
doc/model/train-fitting-property.md
[uncategorized] ~58-~58: Loose punctuation mark.
Context: ...distinction from ener
mode. - metric
: The metric for display, which will be p...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~59-~59: Loose punctuation mark.
Context: ..., 'mae', 'mse' and 'rmse'. - loss_func
: The loss function to minimize, you can ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~193-~193: Loose punctuation mark.
Context: ...(D_i))= MLP(D_i) * std+ mean$$ 1. std
: The standard deviation of the property ...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 Markdownlint (0.37.0)
doc/model/train-fitting-property.md
27-27: Column: 1
Hard tabs
(MD010, no-hard-tabs)
31-31: Column: 1
Hard tabs
(MD010, no-hard-tabs)
32-32: Column: 1
Hard tabs
(MD010, no-hard-tabs)
33-33: Column: 1
Hard tabs
(MD010, no-hard-tabs)
51-51: Column: 1
Hard tabs
(MD010, no-hard-tabs)
52-52: Column: 1
Hard tabs
(MD010, no-hard-tabs)
53-53: Column: 1
Hard tabs
(MD010, no-hard-tabs)
135-135: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
172-172: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
178-178: null
Fenced code blocks should have a language specified
(MD040, fenced-code-language)
Two main changes:
1. Now the property data can be stored in npy format with any prefix (tc.npy, band_gap.npy, ...). One just needs to write the name of the property and the corresponding dimension in model/fitting in input.json.
2. The output of fitting_net is multiplied by the standard deviation and then added to the mean to get the output of the PropertyModel, and when calculating the loss, the loss is again normalised (see the numerical sketch below).
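A tiny numerical sketch of that convention, with made-up numbers; mlp_out stands for the raw fitting-net output, and the normalised loss is read here as an error measured in units of the standard deviation:

import numpy as np

mean, std = 2.0, 0.5                       # property statistics from the training data
mlp_out = np.array([0.4])                  # raw (normalized) fitting-net output
label = np.array([2.3])                    # property label in original units

prediction = mlp_out * std + mean          # PropertyModel output: 2.2
loss = np.abs((prediction - label) / std)  # error back in normalized units: 0.2
print(prediction, loss)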
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
Tests
Chores