feat(pt): DPA-2 repinit compress #4329

Merged
merged 3 commits into from
Nov 10, 2024


njzjz (Member) commented Nov 9, 2024

Summary by CodeRabbit

  • New Features

    • Introduced model compression functionality in the descriptor class, allowing users to enable compression with specific parameters.
    • Enhanced handling of serialized data for different descriptor types, improving flexibility and efficiency.
  • Tests

    • Added a new test suite for the descriptor class, ensuring robust testing of functionality with various configurations and floating-point precisions.

coderabbitai bot (Contributor) commented Nov 9, 2024

📝 Walkthrough

This pull request adds an enable_compression method to the DescrptDPA2 class. Going by the checks quoted later in this thread, compression is allowed only when the repinit block has resnet_dt set to false, attn_layer equal to 0, and tebd_input_mode set to "strip". The DPTabulate class is also updated to accept serialized data from DescrptDPA2, and a new test file checks the behavior of DescrptDPA2 with and without compression.

Changes

File | Change Summary
deepmd/pt/model/descriptor/dpa2.py | Added method enable_compression to DescrptDPA2 class for model compression.
deepmd/pt/utils/tabulate.py | Updated DPTabulate class to handle serialized data for DescrptDPA2.
source/tests/pt/model/test_compressed_descriptor_dpa2.py | Added new test class TestDescriptorDPA2 and function eval_pt_descriptor for testing compression functionality.

Possibly related PRs

  • Add 4 pt descriptor compression #4227: The addition of the enable_compression method in the DescrptDPA1 class parallels the new method in DescrptDPA2, indicating shared functionality for model compression.
  • Enable Hybrid Descriptor to be compressed #4297: The introduction of the enable_compression method in the DescrptHybrid class expands the compression capabilities within the descriptor framework, similar to the changes made in DescrptDPA2.
  • Add compression API to BaseModel and AtomicModel #4298: The addition of the enable_compression method to the DPAtomicModel class indicates a broader implementation of compression features across different model types, including those that may interact with DescrptDPA2.
  • Add pt compress commad line #4300: The new command line option for model compression enhances the usability of the compression feature introduced in DescrptDPA2.
  • fix(dp/pt): support auto sel for dpa2 #4323: Modifications to the update_sel method in DescrptDPA2 to support three-body interactions may relate to how compression is handled, as both functionalities aim to enhance the descriptor's capabilities.

Suggested labels

OP, Docs

Suggested reviewers

  • wanghan-iapcm

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (3)
source/tests/pt/model/test_compressed_descriptor_dpa2.py (2)

72-100: Document the test data structure.

The hardcoded coordinates and box values would benefit from documentation explaining their significance and structure (e.g., what molecular system they represent).

Add a comment explaining the test system:

+        # Test system configuration:
+        # - 6 atoms in total (2 of type 0, 4 of type 1)
+        # - Cubic box of size 13.0
         self.coords = np.array([

122-146: Consider adding more test cases for compression.

While the current test verifies basic compression functionality, consider adding tests for:

  • Different compression ratios
  • Edge cases (e.g., very small or large compression ratios)
  • Error cases (e.g., invalid compression values)

Example additional test:

def test_compression_invalid_ratio(self):
    with self.assertRaises(ValueError):
        self.descriptor.enable_compression(-1.0)
    with self.assertRaises(ValueError):
        self.descriptor.enable_compression(1.5)
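
The signature quoted later in this thread takes min_nbor_dist as its first argument rather than a compression ratio, so the example above may not apply as written. One error case the thread does show is the guard against enabling compression twice. Below is a minimal sketch of a check for that guard. ToyDescriptor and enabling_twice_raises are hypothetical stand-ins written for illustration; they copy only the guard and the default arguments, not the real DescrptDPA2.

```python
class ToyDescriptor:
    """Hypothetical stand-in; only the re-entry guard is modeled."""

    def __init__(self) -> None:
        # Mirrors the flag that the thread says is initialized to False.
        self.compress = False

    def enable_compression(
        self,
        min_nbor_dist: float,
        table_extrapolate: float = 5,
        table_stride_1: float = 0.01,
        table_stride_2: float = 0.1,
        check_frequency: int = -1,
    ) -> None:
        if self.compress:
            raise ValueError("Compression is already enabled.")
        # The real method builds lookup tables here; the stand-in only flips the flag.
        self.compress = True


def enabling_twice_raises(descriptor: ToyDescriptor) -> bool:
    """Return True if a second enable_compression call raises ValueError."""
    descriptor.enable_compression(0.5)
    try:
        descriptor.enable_compression(0.5)
    except ValueError:
        return True
    return False
```

In a unittest-based suite the same check would usually be written with assertRaises(ValueError) around the second call.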
deepmd/pt/model/descriptor/dpa2.py (1)

890-890: Typo in comment: 'mocel' should be 'model'

There's a typographical error in the inline comment. The word 'mocel' should be corrected to 'model'.

Apply this diff to fix the typo:

-        # do some checks before the mocel compression process
+        # do some checks before the model compression process
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 0c5ab07 and 22d0192.

📒 Files selected for processing (3)
  • deepmd/pt/model/descriptor/dpa2.py (2 hunks)
  • deepmd/pt/utils/tabulate.py (2 hunks)
  • source/tests/pt/model/test_compressed_descriptor_dpa2.py (1 hunks)
🔇 Additional comments (3)
source/tests/pt/model/test_compressed_descriptor_dpa2.py (1)

1-50: LGTM! Well-structured helper function with proper type hints.

The helper function eval_pt_descriptor is well-implemented with:

  • Clear type hints and parameters
  • Proper device handling for PyTorch tensors
  • Logical workflow for descriptor evaluation
deepmd/pt/utils/tabulate.py (2)

297-303: LGTM! Clean implementation of descriptor type handling.

The changes elegantly extend support for DescrptDPA2 while maintaining consistency with DescrptDPA1. The implementation follows best practices by using isinstance with a tuple for multiple type checks.
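
For readers unfamiliar with the idiom mentioned here, a minimal sketch follows. The classes FakeDPA1, FakeDPA2 and FakeOther and the function is_dpa_family are hypothetical names used only for illustration; the actual branch in tabulate.py is not reproduced in this thread.

```python
class FakeDPA1:
    """Hypothetical stand-in for a DPA-1 style descriptor."""


class FakeDPA2:
    """Hypothetical stand-in for a DPA-2 style descriptor."""


class FakeOther:
    """Hypothetical stand-in for any other descriptor type."""


def is_dpa_family(descrpt: object) -> bool:
    # isinstance accepts a tuple of classes and returns True
    # if the object is an instance of any one of them.
    return isinstance(descrpt, (FakeDPA1, FakeDPA2))
```

Passing a tuple keeps the two types on one shared code path without chaining separate isinstance calls with `or`.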


98-103: Verify compression functionality for DescrptDPA2.

Since these changes enable compression for DescrptDPA2, it would be valuable to verify:

  1. The compression process works as expected
  2. The compressed model produces consistent results
✅ Verification successful

Compression functionality for DescrptDPA2 is properly tested

The test file test_compressed_descriptor_dpa2.py contains comprehensive tests that:

  1. Verify the model's output shape remains consistent after compression
  2. Validate that compressed model results match the uncompressed results within acceptable tolerance
  3. Test both float32 and float64 precision levels
  4. Include proper test setup with realistic input data

The implementation includes proper error handling and precision-specific tolerances, ensuring reliable compression functionality.
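
A comparison of this kind can be sketched as follows. The tolerance values and the helper name outputs_match are assumptions made for illustration; the thread does not state the tolerances the test file actually uses.

```python
import numpy as np

# Hypothetical per-precision tolerances; not taken from the test file.
TOLERANCES = {np.float32: 1e-4, np.float64: 1e-8}


def outputs_match(reference, compressed, dtype) -> bool:
    """Check that two descriptor outputs agree in shape and value.

    Both arrays are cast to `dtype`, and the tolerance is chosen
    according to that precision.
    """
    tol = TOLERANCES[dtype]
    reference = np.asarray(reference, dtype=dtype)
    compressed = np.asarray(compressed, dtype=dtype)
    if reference.shape != compressed.shape:
        return False
    return bool(np.allclose(reference, compressed, rtol=tol, atol=tol))
```

A looser bound for float32 reflects its coarser rounding; the same difference that passes in single precision can legitimately fail the float64 check.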

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Search for tests that verify DescrptDPA2 compression functionality

# Look for test files related to DescrptDPA2
echo "Searching for DescrptDPA2 test files..."
fd -e py "test.*dpa2.*" source/tests/

# Look for compression-related test cases
echo "Searching for compression test cases..."
rg -l "compress.*dpa2|dpa2.*compress" source/tests/

Length of output: 509


Script:

#!/bin/bash
# Let's examine the content of the compression test file
echo "Checking compression test content..."
cat source/tests/pt/model/test_compressed_descriptor_dpa2.py

# Let's also check if there are any compression-related tests in the descriptor test file
echo -e "\nChecking descriptor test content..."
rg "compress" source/tests/pt/model/test_descriptor_dpa2.py -A 5

# Look for actual compression implementation
echo -e "\nChecking compression implementation..."
rg "compress.*dpa2|dpa2.*compress" -A 5 --type py

Length of output: 4803


codecov bot commented Nov 9, 2024

Codecov Report

Attention: Patch coverage is 79.31034% with 6 lines in your changes missing coverage. Please review.

Project coverage is 84.45%. Comparing base (0c5ab07) to head (51e8696).
Report is 5 commits behind head on devel.

Files with missing lines | Patch % | Lines
deepmd/pt/model/descriptor/dpa2.py | 71.42% | 6 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##            devel    #4329      +/-   ##
==========================================
- Coverage   84.60%   84.45%   -0.16%     
==========================================
  Files         571      571              
  Lines       53163    53187      +24     
  Branches     3059     3059              
==========================================
- Hits        44981    44918      -63     
- Misses       7218     7306      +88     
+ Partials      964      963       -1     


Signed-off-by: Jinzhe Zeng <[email protected]>
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
deepmd/pt/model/descriptor/dpa2.py (3)

876-876: Fix typo in docstring: 'statisitcs' → 'statistics'

The docstring contains a spelling error.

-        """Receive the statisitcs (distance, max_nbor_size and env_mat_range) of the training data.
+        """Receive the statistics (distance, max_nbor_size and env_mat_range) of the training data.

868-875: Add return type hint to method signature

The method signature should include a return type hint for better type safety.

-    def enable_compression(
+    def enable_compression(
         self,
         min_nbor_dist: float,
         table_extrapolate: float = 5,
         table_stride_1: float = 0.01,
         table_stride_2: float = 0.1,
         check_frequency: int = -1,
-    ) -> None:
+    ) -> None:

935-940: Consider using a dictionary for table configuration

Using a list for configuration makes the code less maintainable and more prone to errors. Consider using a dictionary instead.

-        self.table_config = [
-            table_extrapolate,
-            table_stride_1,
-            table_stride_2,
-            check_frequency,
-        ]
+        self.table_config = {
+            "extrapolate": table_extrapolate,
+            "stride_1": table_stride_1,
+            "stride_2": table_stride_2,
+            "check_frequency": check_frequency,
+        }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 22d0192 and fe28429.

📒 Files selected for processing (1)
  • deepmd/pt/model/descriptor/dpa2.py (3 hunks)
🔇 Additional comments (3)
deepmd/pt/model/descriptor/dpa2.py (3)

34-36: LGTM: New imports are properly organized.

The additions of DPTabulate and ActivationFn imports are well-organized and follow the project's import style.

Also applies to: 41-42


313-313: LGTM: Proper initialization of compression flag.

The compress flag is correctly initialized to False in __init__, addressing potential AttributeError concerns.


922-924: ⚠️ Potential issue

Fix inconsistent error message

The error message doesn't match the condition being checked. The condition checks if tebd_input_mode != "strip", but the error message refers to "concat".

-            raise RuntimeError(
-                "Cannot compress model when repinit tebd_input_mode == 'concat'"
-            )
+            raise RuntimeError(
+                "Cannot compress model when repinit tebd_input_mode != 'strip'"
+            )

Likely invalid or redundant comment.
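
One way to keep the message from drifting away from the condition is to report the offending value itself. The sketch below is a hypothetical standalone helper, check_tebd_input_mode, not code from the pull request; the message wording is an assumption.

```python
def check_tebd_input_mode(tebd_input_mode: str) -> None:
    """Raise if the mode is anything other than "strip"."""
    if tebd_input_mode != "strip":
        # Echo the actual value so the message stays accurate
        # for "concat" and for any mode added in the future.
        raise RuntimeError(
            "Cannot compress model when repinit tebd_input_mode is "
            f"{tebd_input_mode!r}; only 'strip' is supported"
        )
```

With this form, passing "concat" produces a message that names "concat", and any other unsupported mode is reported by its own name.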

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Jinzhe Zeng <[email protected]>
coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
deepmd/pt/model/descriptor/dpa2.py (2)

868-875: Add return type hint to method signature.

The method signature should include a return type hint for better type safety.

    def enable_compression(
        self,
        min_nbor_dist: float,
        table_extrapolate: float = 5,
        table_stride_1: float = 0.01,
        table_stride_2: float = 0.1,
        check_frequency: int = -1,
-    ) -> None:
+    ) -> None:  # noqa: D202

891-925: Consider refactoring validation checks into separate methods.

The pre-compression validation logic is complex and could be modularized for better maintainability. Consider extracting the checks into separate validation methods.

+ def _validate_compression_state(self) -> None:
+     if self.compress:
+         raise ValueError("Compression is already enabled.")

+ def _validate_repinit_settings(self) -> None:
+     if self.repinit.resnet_dt:
+         raise RuntimeError("Model compression error: repinit resnet_dt must be false!")
+     if self.repinit.attn_layer != 0:
+         raise RuntimeError("Cannot compress model when repinit attention layer is not 0.")
+     if self.repinit.tebd_input_mode != "strip":
+         raise RuntimeError("Cannot compress model when repinit tebd_input_mode != 'strip'")

+ def _validate_exclude_types(self) -> None:
+     for tt in self.repinit.exclude_types:
+         if (tt[0] not in range(self.repinit.ntypes)) or (tt[1] not in range(self.repinit.ntypes)):
+             raise RuntimeError(
+                 f"Repinit exclude types {tt} must within the number of atomic types {self.repinit.ntypes}!"
+             )
+     if self.repinit.ntypes * self.repinit.ntypes - len(self.repinit.exclude_types) == 0:
+         raise RuntimeError("Repinit empty embedding-nets are not supported in model compression!")

    def enable_compression(
        self,
        min_nbor_dist: float,
        table_extrapolate: float = 5,
        table_stride_1: float = 0.01,
        table_stride_2: float = 0.1,
        check_frequency: int = -1,
    ) -> None:
-        # do some checks before the mocel compression process
-        if self.compress:
-            raise ValueError("Compression is already enabled.")
-        assert (
-            not self.repinit.resnet_dt
-        ), "Model compression error: repinit resnet_dt must be false!"
-        for tt in self.repinit.exclude_types:
-            if (tt[0] not in range(self.repinit.ntypes)) or (
-                tt[1] not in range(self.repinit.ntypes)
-            ):
-                raise RuntimeError(
-                    "Repinit exclude types"
-                    + str(tt)
-                    + " must within the number of atomic types "
-                    + str(self.repinit.ntypes)
-                    + "!"
-                )
-        if (
-            self.repinit.ntypes * self.repinit.ntypes - len(self.repinit.exclude_types)
-            == 0
-        ):
-            raise RuntimeError(
-                "Repinit empty embedding-nets are not supported in model compression!"
-            )
-
-        if self.repinit.attn_layer != 0:
-            raise RuntimeError(
-                "Cannot compress model when repinit attention layer is not 0."
-            )
-
-        if self.repinit.tebd_input_mode != "strip":
-            raise RuntimeError(
-                "Cannot compress model when repinit tebd_input_mode == 'concat'"
-            )
+        self._validate_compression_state()
+        self._validate_repinit_settings()
+        self._validate_exclude_types()
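
The exclude-types check in the suggestion can be exercised on its own as a pure function. The sketch below is a hypothetical standalone version, validate_exclude_types, that takes the values as arguments instead of reading them from self.repinit; the logic follows the diff above, with the message grammar lightly adjusted.

```python
from typing import Iterable, Tuple


def validate_exclude_types(
    exclude_types: Iterable[Tuple[int, int]], ntypes: int
) -> None:
    """Reject out-of-range type pairs and fully excluded type tables."""
    exclude_types = list(exclude_types)
    for tt in exclude_types:
        if tt[0] not in range(ntypes) or tt[1] not in range(ntypes):
            raise RuntimeError(
                f"Repinit exclude types {tt} must be within "
                f"the number of atomic types {ntypes}!"
            )
    # If every (type_i, type_j) pair is excluded, no embedding net is left.
    if ntypes * ntypes - len(exclude_types) == 0:
        raise RuntimeError(
            "Repinit empty embedding-nets are not supported in model compression!"
        )
```

Note that the suggested refactor also replaces the original assert on resnet_dt with an explicit raise. Unlike assert statements, which Python skips when run with the -O flag, an explicit raise is always executed.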
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between fe28429 and 51e8696.

📒 Files selected for processing (1)
  • deepmd/pt/model/descriptor/dpa2.py (3 hunks)
🔇 Additional comments (5)
deepmd/pt/model/descriptor/dpa2.py (5)

34-36: LGTM: New imports are properly organized.

The added imports for DPTabulate and ActivationFn are correctly placed and necessary for the new compression functionality.

Also applies to: 41-41


313-313: LGTM: Properly initialized compression flag.

The compress attribute is correctly initialized with a default value of False, addressing the potential AttributeError issue mentioned in past review comments.


876-876: Fix typo in docstring.

The docstring spells "statistics" as "statisitcs".


926-948: LGTM: Compression setup looks correct.

The compression setup logic is well-structured:

  1. Serializes the model data
  2. Initializes the tabulation with proper parameters
  3. Builds the tables with provided configuration
  4. Enables compression on the repinit module
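
The table-building step can be pictured with a toy example. The sketch below is not the DPTabulate implementation, which this thread does not show. It assumes one plausible reading of the parameter names: a fine grid with spacing stride_1 over the range seen in training, and a coarse grid with spacing stride_2 out to extrapolate times the upper bound. Linear interpolation and the function names are likewise assumptions chosen for brevity.

```python
import numpy as np


def build_two_stride_grid(lower, upper, extrapolate, stride_1, stride_2):
    """Fine grid on [lower, upper), coarse grid on [upper, extrapolate * upper]."""
    n_fine = int(round((upper - lower) / stride_1))
    n_coarse = int(np.ceil((extrapolate * upper - upper) / stride_2))
    fine = lower + stride_1 * np.arange(n_fine)
    coarse = upper + stride_2 * np.arange(n_coarse + 1)
    return np.concatenate([fine, coarse])


def build_table(fn, grid):
    """Evaluate the expensive function once at every grid point."""
    return fn(grid)


def lookup(x, grid, table):
    """Approximate fn(x) by piecewise-linear interpolation in the table."""
    return np.interp(x, grid, table)


# Usage: tabulate tanh as a stand-in for an embedding network.
grid = build_two_stride_grid(0.0, 1.0, extrapolate=5, stride_1=0.01, stride_2=0.1)
table = build_table(np.tanh, grid)
approx = lookup(np.array([0.505, 2.55]), grid, table)
```

The trade-off is the usual one for lookup tables: a smaller stride gives a more accurate approximation at the cost of a larger table, which is why the finer spacing is reserved for the range where inputs are expected to fall.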

921-924: ⚠️ Potential issue

Fix inconsistent error message.

The error message doesn't match the condition being checked. When tebd_input_mode != "strip", the error message incorrectly states it's "concat".

@njzjz njzjz added this pull request to the merge queue Nov 9, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Nov 9, 2024
@njzjz njzjz added this pull request to the merge queue Nov 10, 2024
Merged via the queue into deepmodeling:devel with commit 023bb9c Nov 10, 2024
60 checks passed
@njzjz njzjz deleted the dpa2-repinit-compress branch November 10, 2024 04:15
This was referenced Nov 12, 2024