
How to terminate a gRPC streaming request immediately during Triton Server inference with the FasterTransformer backend? #5833

Closed
songkq opened this issue May 23, 2023 · 5 comments


songkq commented May 23, 2023

I'm wondering whether a gRPC streaming request can be terminated immediately during Triton Server inference with the FasterTransformer backend, for example by calling client.stop_stream in response to an asynchronous command.
If so, will Triton Server stop inference immediately after stop_stream is called?
Could you please give some advice?

from functools import partial
import tritonclient.grpc as grpcclient

with grpcclient.InferenceServerClient(self.model_url) as client:
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer(self.model_name, request_data)
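
For concreteness, here is a fuller sketch of the pattern I have in mind. cancel_event is a hypothetical threading.Event that the user-facing side would set when the user asks to stop; the other names come from the snippet above:

from functools import partial
import threading

import tritonclient.grpc as grpcclient

cancel_event = threading.Event()  # hypothetical: set from the UI / control thread

with grpcclient.InferenceServerClient(model_url) as client:
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer(model_name, request_data)
    cancel_event.wait()   # block until the user asks to cancel
    client.stop_stream()  # tears down the client stream; the question
                          # is whether the server also stops computing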
dyastremsky added the question label May 23, 2023
dyastremsky (Contributor) commented:

I don't believe so. Once a request is enqueued, there should not be a way to stop it from being executed aside from a timeout. stop_stream should stop the client's stream, but that won't affect the requests that have already been sent.

CC: @rmccorm4
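
If a timeout is the route you take, the gRPC client accepts one when the stream is opened. A minimal sketch, reusing the client and callback from the snippet above (the 5-second value is arbitrary):

# stream_timeout sets an end-to-end deadline on the stream, in seconds;
# requests still outstanding when it expires should fail in the callback
# with an error rather than block forever.
client.start_stream(
    callback=partial(stream_callback, result_queue),
    stream_timeout=5.0,  # arbitrary value for illustration
)
client.async_stream_infer(model_name, request_data)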

rmccorm4 (Contributor) commented:

client.stop_stream() will block until the responses have been received for the enqueued requests.
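
Continuing the snippet above, the observable semantics are (a sketch of the behavior, not new API):

client.async_stream_infer(model_name, request_data)
client.stop_stream()  # returns only after the callback has fired for
                      # every request already sent on the stream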


songkq commented May 24, 2023

@rmccorm4 @dyastremsky Thanks.
In a production environment like ChatGPT, terminating a conversation early in response to a user-side command is a major requirement. Is there another way to terminate Triton Server inference immediately? Could you give some advice?


dyastremsky commented May 24, 2023

Quoting Ryan in a different issue:

"Currently, there is no way to cancel a request once it has been scheduled/placed in the queue on the server side. Once the request is scheduled, it's up to the backend to do the right thing.

You may be able to use some custom logic to determine if that request should just return an empty response / not waste more time computing the response. But this would require some custom backend logic and metadata to do so on your end."

Source issue with more information: #4818
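
To make the "custom backend logic" idea concrete, here is a rough sketch written in the style of a Triton Python backend. This is hypothetical: the FasterTransformer backend is C++ and would need the equivalent change, and the cancellation set, OUTPUT tensor name, and decoding stub are made up for illustration. The idea is to track cancelled request IDs, populated through some out-of-band control path, and to check the set between generation steps:

# model.py -- hypothetical sketch, not the FasterTransformer backend
import threading

import numpy as np
import triton_python_backend_utils as pb_utils

MAX_STEPS = 128            # placeholder generation budget
CANCELLED = set()          # request IDs cancelled out of band (hypothetical)
CANCELLED_LOCK = threading.Lock()


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            req_id = request.request_id()
            tokens = []
            for _ in range(MAX_STEPS):
                with CANCELLED_LOCK:
                    if req_id in CANCELLED:
                        break  # stop computing; return what we have so far
                tokens.append(self._next_token(tokens))
            out = pb_utils.Tensor("OUTPUT", np.array(tokens, dtype=np.int32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def _next_token(self, tokens):
        # placeholder for the real decoding step
        return 0

Clients would then pass a request ID (the request_id argument of async_stream_infer) and call a separate cancel path that adds that ID to CANCELLED.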


songkq commented Jun 1, 2023

@dyastremsky Thanks.

songkq closed this as completed Jun 1, 2023