Model: Device-id and data-parallel inference in CLI and Torch #452

michaelfeil · 2024-11-04T17:35:43Z

Description

Please provide a clear and concise description of the changes in this PR.

Related Issue

If applicable, link the issue this PR addresses.

Types of Change

[ ] Bug fix
[x] New feature
[ ] Documentation update

Checklist

I have read the CONTRIBUTING guidelines.
My code follows the code style of this project.
I have added tests to cover my changes.
All new and existing tests passed.
My changes generate no new warnings.
I have updated the documentation accordingly.

Additional Notes

Add any other context about the PR here.

License

By submitting this PR, I confirm that my contribution is made under the terms of the MIT license.

greptile-apps

PR Summary

This PR adds device-id selection and data-parallel inference so that models can run across multiple GPUs/devices. Key changes:

Added new --device-id CLI option that accepts comma-separated device IDs (e.g. "0,1") for model placement across multiple GPUs/devices
Introduced LoadingStrategy class to manage device mapping, dtype configuration, and quantization settings across different hardware
Modified BatchHandler to support multiple model replicas running in parallel across specified devices
Updated test suite with retry logic and adjusted tolerance parameters to handle numerical differences from parallel processing
Added proper error handling for device validation and unavailable hardware configurations

The changes let inference scale across devices by running model replicas in parallel, while keeping the existing API unchanged.
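
A minimal sketch of the data-parallel idea summarized above, assuming each listed device receives a full replica of the model and batches are handed out round-robin. `parse_device_ids`, `Replica`, and `run_data_parallel` are hypothetical names chosen for illustration and are not taken from infinity_emb; the actual `LoadingStrategy` and `BatchHandler` in this PR may work differently.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def parse_device_ids(value: str) -> list[int]:
    """Turn a string such as "0,1" into [0, 1]; reject empty or duplicate ids."""
    ids = [int(part) for part in value.split(",") if part.strip()]
    if not ids:
        raise ValueError("expected at least one device id")
    if len(set(ids)) != len(ids):
        raise ValueError(f"duplicate device id in {value!r}")
    return ids


class Replica:
    """Hypothetical stand-in for one full model copy pinned to one device."""

    def __init__(self, device_id: int) -> None:
        self.device_id = device_id

    def embed(self, batch: list[str]) -> list[tuple[int, int]]:
        # Placeholder "embedding": (device id, text length) per input.
        return [(self.device_id, len(text)) for text in batch]


def run_data_parallel(replicas: list[Replica], batches: list[list[str]]) -> list:
    """Assign batches to replicas round-robin and return results in input order."""
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [
            pool.submit(replica.embed, batch)
            for replica, batch in zip(cycle(replicas), batches)
        ]
        return [future.result() for future in futures]


replicas = [Replica(i) for i in parse_device_ids("0,1")]
results = run_data_parallel(replicas, [["a"], ["bb"], ["ccc"]])
```

In this sketch every replica holds the whole model, so only the inputs are partitioned; splitting a single model across devices would be a different technique (model parallelism).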

23 file(s) reviewed, 32 comment(s)

greptile-apps · 2024-11-04T17:36:29Z

docs/docs/cli_v2.md

+│ --device-id                                            TEXT                           device id defines the model    │
+│                                                                                       placement. e.g. `0,1` will     │
+│                                                                                       place the model on             │
+│                                                                                       MPS/CUDA/GPU 0 and 1 each      │


style: The word 'each' at the end of this line is ambiguous: is the model replicated on each device or split across the devices?

greptile-apps · 2024-11-04T17:37:20Z

libs/infinity_emb/infinity_emb/args.py

+    def update_loading_stategy(self):
+        """Assign a device id to the EngineArgs object."""
+        from infinity_emb.inference import loading_strategy  # type: ignore


syntax: update_loading_stategy() has a typo in its name (should be update_loading_strategy)

greptile-apps · 2024-11-04T17:37:21Z

libs/infinity_emb/infinity_emb/args.py

+        if self._loading_strategy is None:
+            self.update_loading_stategy()
+        elif isinstance(self._loading_strategy, dict):
+            object.__setattr__(self, "_loading_strategy", LoadingStrategy(**self._loading_strategy))


style: loading strategy initialization should happen before pydantic validation to ensure the complete object is validated
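
A minimal sketch of the pattern in the quoted diff, assuming the arguments class is a frozen dataclass (the use of `object.__setattr__` suggests this, but the source does not confirm it) and that the coercion runs in `__post_init__`. `EngineArgsSketch`, `LoadingStrategySketch`, and the `device_ids` field are hypothetical stand-ins, not the project's actual classes.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class LoadingStrategySketch:
    # Hypothetical field; the real LoadingStrategy carries more settings.
    device_ids: tuple = (0,)


@dataclass(frozen=True)
class EngineArgsSketch:
    model_name: str = "example-model"
    _loading_strategy: Optional[Union[LoadingStrategySketch, dict]] = None

    def __post_init__(self) -> None:
        # Frozen dataclasses block normal attribute assignment, so
        # object.__setattr__ is used to fill in or coerce the field.
        if self._loading_strategy is None:
            object.__setattr__(self, "_loading_strategy", LoadingStrategySketch())
        elif isinstance(self._loading_strategy, dict):
            object.__setattr__(
                self,
                "_loading_strategy",
                LoadingStrategySketch(**self._loading_strategy),
            )


default_args = EngineArgsSketch()
from_dict = EngineArgsSketch(_loading_strategy={"device_ids": (0, 1)})
```

Because the coercion happens after the object is constructed, any validation that runs at construction time only sees the raw `None` or `dict`, which is the ordering concern the review comment raises.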

greptile-apps · 2024-11-04T17:37:22Z

libs/infinity_emb/infinity_emb/args.py

@@ -61,6 +67,8 @@ class EngineArgs:
    embedding_dtype: EmbeddingDtype = EmbeddingDtype[MANAGER.embedding_dtype[0]]
    served_model_name: str = MANAGER.served_model_name[0]

+    _loading_strategy: Optional[LoadingStrategy] = None


style: _loading_strategy should be documented with a type hint comment explaining its purpose and structure

libs/infinity_emb/tests/end_to_end/test_openapi_client_compat.py

libs/infinity_emb/tests/end_to_end/test_sentence_transformers.py

libs/infinity_emb/tests/end_to_end/test_torch_vision.py

libs/infinity_emb/tests/unit_test/test_engine.py

codecov-commenter · 2024-11-05T18:32:45Z

⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 80.12821% with 31 lines in your changes missing coverage. Please review.

Project coverage is 79.15%. Comparing base (a178460) to head (e4bb6a4).
Report is 1 commit behind head on main.

Files with missing lines	Patch %	Lines
...ity_emb/infinity_emb/inference/loading_strategy.py	59.67%	25 Missing ⚠️
...infinity_emb/transformer/quantization/interface.py	50.00%	3 Missing ⚠️
...y_emb/transformer/embedder/sentence_transformer.py	71.42%	2 Missing ⚠️
libs/infinity_emb/infinity_emb/primitives.py	95.45%	1 Missing ⚠️
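
As a quick consistency check on the numbers above, the per-file missing lines can be summed and the patch percentage recomputed. The total of 156 changed lines is not stated in the report; it is inferred here from the 31 missing lines and the reported 80.12821%, so treat it as an assumption.

```python
import math

# Missing lines per file, copied from the table above.
missing_per_file = [25, 3, 2, 1]
missing = sum(missing_per_file)

reported_patch_coverage = 80.12821  # percent, as reported by Codecov

# Inferred (not reported): total changed lines implied by the two figures.
total_lines = round(missing / (1 - reported_patch_coverage / 100))
covered = total_lines - missing

recomputed = 100 * covered / total_lines
print(missing, total_lines, covered, round(recomputed, 5))
```

Under that assumption, 125 of 156 changed lines are covered, which matches the reported patch coverage.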

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #452      +/-   ##
==========================================
- Coverage   79.18%   79.15%   -0.04%     
==========================================
  Files          41       42       +1     
  Lines        3248     3363     +115     
==========================================
+ Hits         2572     2662      +90     
- Misses        676      701      +25


michaelfeil added 2 commits November 3, 2024 11:48

inital commit

052378f

fmt

b11ea98

greptile-apps bot reviewed Nov 4, 2024

lint

a873b21

michaelfeil and others added 2 commits November 5, 2024 10:46

Merge branch 'main' into model-parallel-device-interface

791996e

fix: typos, more docs

e4bb6a4

michaelfeil merged commit 941105f into main Nov 11, 2024
36 checks passed

michaelfeil deleted the model-parallel-device-interface branch November 11, 2024 06:24
