Add device-specific request validation in LLMEngine and modify request-specific asserts in TTModelRunner to not crash server instances #41

skhorasganiTT · 2024-12-10T21:23:58Z

Issue: #29

Added _validate_device_inputs to LLMEngine to perform device-specific request validation (the existing _validate_model_inputs and custom input processors remain available for device-agnostic or model-specific request validation). It is currently supported only for TT models, through TTExecutor::validate_seq_group.
Changed the sampling param assertions in TTModelRunner to ValueErrors, which are raised when _validate_device_inputs is invoked, so that bad requests return 400 errors instead of crashing the server.
Changed the batch-level assertions (during step execution) that require the same temp/top-p/top-k params across requests to warnings, to avoid server crashes. These cannot be handled through _validate_device_inputs because the request has already reached step execution, at which point any assertion or error will crash the server.
Note: This PR also relies on "[Llama3-70b] Separate vllm generator class and add prompt length validation in input processor" (tt-metal#15880).
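The two strategies described above (reject at request time with a ValueError, warn at step time) can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: the SamplingParams stand-in, the check_batch_params helper, the specific best_of constraint, and the fallback to the first request's parameters are all assumptions made for the example.

```python
import logging
from dataclasses import dataclass
from typing import Sequence

logger = logging.getLogger("tt_validation_sketch")


@dataclass(frozen=True)
class SamplingParams:
    """Hypothetical stand-in for the sampling params carried by a request."""
    temperature: float = 1.0
    top_p: float = 1.0
    top_k: int = -1
    best_of: int = 1


def validate_seq_group(params: SamplingParams) -> None:
    """Request-time check (hypothetical constraint).

    Raising ValueError here, before the request is scheduled, lets the
    serving layer reject the request (for example with an HTTP 400)
    instead of hitting an assert later during execution.
    """
    if params.best_of != 1:
        raise ValueError(
            f"best_of={params.best_of} is not supported on this device")


def check_batch_params(batch: Sequence[SamplingParams]) -> SamplingParams:
    """Step-time check (hypothetical helper).

    The batch is already executing, so an assert or exception would take
    the server down. Log a warning instead and, as an assumption of this
    sketch, fall back to the first request's parameters.
    """
    first = batch[0]
    key = (first.temperature, first.top_p, first.top_k)
    if any((p.temperature, p.top_p, p.top_k) != key for p in batch[1:]):
        logger.warning(
            "Requests in the batch have different temperature/top-p/top-k; "
            "using the first request's values for the whole batch")
    return first
```

The design point is where the check runs: a failure during request validation can be reported to a single client, while a failure during step execution affects every request in flight, so only a non-fatal warning is safe there.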

github-actions · 2024-12-10T21:24:08Z

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

Add ready label to the PR
Enable auto-merge.

🚀

Commit a423383: Add device-specific request validation in LLMEngine and modify request-specific asserts in TTModelRunner to not crash server instances
Signed-off-by: Salar Hosseini <[email protected]>

skhorasganiTT requested a review from cglagovichTT December 10, 2024 21:23

cglagovichTT approved these changes Dec 10, 2024

skhorasganiTT merged commit 9531611 into dev Dec 10, 2024
1 check passed

skhorasganiTT deleted the skhorasgani/fix_request_validation branch December 10, 2024 21:57

skhorasganiTT mentioned this pull request Dec 10, 2024

[Bug] vLLM server crashes upon assertions instead of throwing errors to client (e.g. fails when requests with different temperatures are sent) #29

Closed
