[Misc] Add a wrapper for torch.inference_mode #6618
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, please make sure to run full CI, as it is required to merge (or just use auto-merge). To run full CI, you can do one of these: … 🚀
Is it possible to put it into https://github.com/vllm-project/vllm/tree/main/vllm/platforms? e.g.:

```python
class CudaPlatform(Platform):
    inference_mode = torch.inference_mode


class TpuPlatform(Platform):
    inference_mode = torch.no_grad
```
@youkaichao Sorry, I don't get your point. What will the API look like in your proposal? Orthogonally, …
The usage will be a unified decorator:

```python
from vllm.platforms import current_platform


class ModelRunner:

    @current_platform.inference_mode
    def execute_model(self, *args, **kwargs):
        ...
```

It dispatches to the platform-specific implementation (`torch.inference_mode` on CUDA, `torch.no_grad` on TPU), so we don't need to mimic the full functionality of `torch.inference_mode`. This is how we unify the heterogeneity of hardware platforms.
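For readers following along, here is a minimal self-contained sketch of the dispatch idea described in the comment above. It is not the actual vLLM code: the class layout, the hard-coded platform selection, and the `execute_model` body are assumptions for illustration only.

```python
import torch


class Platform:
    # Default: use torch.inference_mode (supported on CUDA-like backends).
    inference_mode = torch.inference_mode


class CudaPlatform(Platform):
    inference_mode = torch.inference_mode


class TpuPlatform(Platform):
    # TPU/XLA backends do not support inference_mode, so fall back to no_grad.
    inference_mode = torch.no_grad


# Hard-coded here for illustration; the real package would detect the
# platform at import time.
current_platform = CudaPlatform()


class ModelRunner:

    # Calling the attribute with parentheses makes the decorator form work
    # for both torch.inference_mode and torch.no_grad.
    @current_platform.inference_mode()
    def execute_model(self, x: torch.Tensor) -> torch.Tensor:
        # Gradients are disabled inside this method regardless of backend.
        return x * 2
```

Under this scheme, switching backends only changes which `Platform` subclass `current_platform` points to; the model-runner code never mentions `torch.inference_mode` or `torch.no_grad` directly.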
@youkaichao Got it. Thanks for the explanation.
You're right. I simplified the code based on your suggestion. PTAL.
I also agree that this could be a better idea. However, because …
Not necessary. We are doing it step by step. For example, I only implemented …; for the problem you mentioned, we can have another platform called … (see vllm/vllm/platforms/__init__.py, line 16 at b6df37f).
I can do it in a follow-up PR if you don't have the bandwidth. I think starting the code in a unified way is better than gathering the code together later.
@youkaichao I see. Updated the PR. PTAL.
thanks for addressing my comments! let's see if tests pass smoothly 👍
@youkaichao Thanks for your feedback! Let me merge the PR now that it has passed the tests.
Signed-off-by: Alvant <[email protected]>
`torch.inference_mode` is not supported by some hardware backends such as TPU. To address this, this PR introduces a wrapper class that falls back to `torch.no_grad` for the unsupported backends.
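As a rough sketch of the fallback idea described above (the class name, the detection helper, and its reliance on importing `torch_xla` are assumptions for illustration, not the PR's actual code):

```python
import functools

import torch


def _inference_mode_supported() -> bool:
    # Assumed detection: treat the presence of torch_xla (TPU/XLA) as
    # "inference_mode unsupported". The real PR decides this per platform.
    try:
        import torch_xla  # noqa: F401
        return False
    except ImportError:
        return True


class inference_mode_or_no_grad:
    """Usable as a context manager or decorator; picks torch.inference_mode
    when supported and torch.no_grad otherwise."""

    def __init__(self) -> None:
        self._cm_cls = (torch.inference_mode
                        if _inference_mode_supported() else torch.no_grad)

    def __enter__(self):
        self._cm = self._cm_cls()
        return self._cm.__enter__()

    def __exit__(self, exc_type, exc_val, exc_tb):
        return self._cm.__exit__(exc_type, exc_val, exc_tb)

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self._cm_cls():
                return func(*args, **kwargs)
        return wrapper


@inference_mode_or_no_grad()
def run(x: torch.Tensor) -> torch.Tensor:
    # Gradients are disabled here on every backend.
    return x + 1
```

It can be used either as `with inference_mode_or_no_grad(): ...` or as a decorator, as shown; either way, the calling code never needs to know which backend is in use.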