
nvidia-trt: no stop request, stop response for decoupled model #17764

Closed

Conversation

mkhludnev (Contributor) commented Feb 19, 2024

• Description: detect a decoupled model and request an explicit stop (empty final) response.

• Issue: currently the client sends stop: True as an input, which Triton rejects; then, without triton_enable_empty_final_response=True, the stream hangs forever.

• Add tests and docs: if you're adding a new integration, please include

    1. a test for the integration, preferably unit tests that do not rely on network access, and
    2. an example notebook showing its use; it lives in the docs/docs/integrations directory.

• Lint and test: run make format, make lint, and make test from the root of the package(s) you've modified. See the contribution guidelines for more: https://python.langchain.com/docs/contributing/

If no one reviews your PR within a few days, please @-mention one of baskaryan, efriis, eyurtsev, hwchase17.

 - detect decoupled model and request explicit stop response (see the sketch below)
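At a high level, the change checks whether the target model is decoupled and, if so, asks Triton for an explicit empty final response so the client knows when the stream is done. A minimal sketch of that idea with tritonclient.grpc; the server URL, model name, input list, and callback are illustrative assumptions, not the PR's exact code:

```python
import tritonclient.grpc as grpcclient

# Assumed server URL and model name; both are illustrative.
client = grpcclient.InferenceServerClient(url="localhost:8001")
model_name = "my_decoupled_model"

# A decoupled model may return zero, one, or many responses per request, so
# the client cannot tell from the responses alone when the stream is done.
config = client.get_model_config(model_name).config
is_decoupled = config.model_transaction_policy.decoupled

def on_response(result, error):
    if error is not None:
        print(f"stream error: {error}")
        return
    params = result.get_response().parameters
    # Triton marks the explicit final response with this parameter.
    if "triton_final_response" in params and params["triton_final_response"].bool_param:
        print("stream finished")

client.start_stream(callback=on_response)
client.async_stream_infer(
    model_name=model_name,
    inputs=[],  # real input tensors omitted from this sketch
    request_id="1",
    enable_empty_final_response=is_decoupled,
)
client.stop_stream()
```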
@efriis efriis added the partner label Feb 19, 2024
@efriis efriis self-assigned this Feb 19, 2024

vercel bot commented Feb 19, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment

Name       Status                Preview         Comments    Updated (UTC)
langchain  ⬜️ Ignored (Inspect)  Visit Preview               Feb 20, 2024 9:43pm

@@ -218,7 +218,7 @@ def _invoke_triton(self, model_name, inputs, outputs, stop_words):
         result_queue = StreamingResponseGenerator(
             self,
             request_id,
-            force_batch=False,
+            signal_stop=False,
mkhludnev (Contributor, Author) commented:

Renaming this to something meaningful; it gets inverted at line 405.
TL;DR: Triton doesn't let one cancel an in-flight request (triton-inference-server/server#4818), so stopping has to be handled on the client side.
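Since the server can't cancel an in-flight request, "stop" has to mean "stop consuming" on the client side. A hedged sketch of that pattern; the class shares the PR's StreamingResponseGenerator name, but its fields and methods are assumptions:

```python
import queue
from typing import Optional

class StreamingResponseGenerator:
    """Illustrative stand-in; the fields here are assumptions, not the PR's code."""

    def __init__(self, request_id: str, signal_stop: bool = False) -> None:
        self.request_id = request_id
        self.signal_stop = signal_stop  # flipped to True when the caller wants to stop
        self._queue: "queue.Queue[Optional[str]]" = queue.Queue()

    def put(self, token: Optional[str]) -> None:
        # The gRPC callback posts tokens here; None is the end-of-stream
        # sentinel posted when the empty final response arrives.
        self._queue.put(token)

    def __iter__(self) -> "StreamingResponseGenerator":
        return self

    def __next__(self) -> str:
        if self.signal_stop:
            # Triton can't cancel the in-flight request, so we simply stop
            # consuming; any remaining responses are ignored.
            raise StopIteration
        token = self._queue.get()
        if token is None:
            raise StopIteration
        return token
```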

@@ -238,6 +238,7 @@ def _invoke_triton(self, model_name, inputs, outputs, stop_words):
             inputs=inputs,
             outputs=outputs,
             request_id=request_id,
+            enable_empty_final_response=self.client.get_model_config(model_name).config.model_transaction_policy.decoupled,
mkhludnev (Contributor, Author) commented:

I want to improve this here. Maybe cache the model config, or reuse it for the model_ready call above. I also don't know how null-safe this gRPC code is.
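One possible shape for both ideas (caching the lookup, and not trusting the proto to be fully populated); the helper below is hypothetical, assuming a model's config is stable for the lifetime of the client:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def _is_decoupled(client, model_name: str) -> bool:
    # Cache the result so each stream doesn't re-issue a get_model_config RPC.
    response = client.get_model_config(model_name)
    # Be defensive about partially populated responses rather than assuming
    # every nested field is set.
    config = getattr(response, "config", None)
    policy = getattr(config, "model_transaction_policy", None) if config else None
    return bool(policy and policy.decoupled)
```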

@mkhludnev mkhludnev marked this pull request as ready for review February 20, 2024 21:45
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. 🤖:improvement Medium size change to existing code to handle new use-cases labels Feb 20, 2024
efriis (Member) commented Mar 19, 2024

Should be against the https://github.com/langchain-ai/langchain-nvidia repo instead!

@efriis efriis closed this Mar 19, 2024