
How to terminate a gRPC streaming request immediately during Triton Server inference with the FasterTransformer backend? #5833

Closed
songkq opened this issue May 23, 2023 · 5 comments


songkq commented May 23, 2023

I'm wondering whether a gRPC streaming request can be terminated immediately during Triton Server inference with the FasterTransformer backend, for example by calling client.stop_stream in response to an asynchronous command.
If so, will Triton Server stop inference immediately after stop_stream is called?
Could you please give some advice?

from functools import partial
import tritonclient.grpc as grpcclient

with grpcclient.InferenceServerClient(self.model_url) as client:
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer(self.model_name, request_data)
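
For concreteness, here is a fuller sketch of the pattern I have in mind. cancel_event is a hypothetical threading.Event that the user-facing side would set when the user asks to stop; the other names come from the snippet above:

from functools import partial
import threading

import tritonclient.grpc as grpcclient

cancel_event = threading.Event()  # hypothetical: set from the UI / control thread

with grpcclient.InferenceServerClient(model_url) as client:
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer(model_name, request_data)
    cancel_event.wait()   # block until the user asks to cancel
    client.stop_stream()  # tears down the client stream; the question
                          # is whether the server also stops computing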
dyastremsky added the question label May 23, 2023
dyastremsky (Contributor) commented:

I don't believe so. Once a request is enqueued, there should not be a way to stop it from being executed aside from a timeout. stop_stream should stop the client's stream, but that won't affect the requests that have already been sent.

CC: @rmccorm4
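
If a timeout is the route you take, the gRPC client accepts one when the stream is opened. A minimal sketch, reusing the client and callback from the snippet above (the 5-second value is arbitrary):

# stream_timeout sets an end-to-end deadline on the stream, in seconds;
# requests still outstanding when it expires should fail in the callback
# with an error rather than block forever.
client.start_stream(
    callback=partial(stream_callback, result_queue),
    stream_timeout=5.0,  # arbitrary value for illustration
)
client.async_stream_infer(model_name, request_data)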

rmccorm4 (Contributor) commented:

client.stop_stream() will block until the responses have been received for the enqueued requests.
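
Continuing the snippet above, the observable semantics are (a sketch of the behavior, not new API):

client.async_stream_infer(model_name, request_data)
client.stop_stream()  # returns only after the callback has fired for
                      # every request already sent on the stream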


songkq commented May 24, 2023

@rmccorm4 @dyastremsky Thanks.
In a production environment like ChatGPT, terminating a conversation early in response to a user-side command is a major requirement. Is there another way to terminate Triton Server inference immediately? Could you give some advice?


dyastremsky commented May 24, 2023

Quoting Ryan in a different issue:

"Currently, there is no way to cancel a request once it has been scheduled/placed in the queue on the server side. Once the request is scheduled, it's up to the backend to do the right thing.

You may be able to use some custom logic to determine if that request should just return an empty response / not waste more time computing the response. But this would require some custom backend logic and metadata to do so on your end."

Source issue with more information: #4818
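
To make the "custom backend logic" idea concrete, here is a rough sketch written in the style of a Triton Python backend. This is hypothetical: the FasterTransformer backend is C++ and would need the equivalent change, and the cancellation set, OUTPUT tensor name, and decoding stub are made up for illustration. The idea is to track cancelled request IDs, populated through some out-of-band control path, and to check the set between generation steps:

# model.py -- hypothetical sketch, not the FasterTransformer backend
import threading

import numpy as np
import triton_python_backend_utils as pb_utils

MAX_STEPS = 128            # placeholder generation budget
CANCELLED = set()          # request IDs cancelled out of band (hypothetical)
CANCELLED_LOCK = threading.Lock()


class TritonPythonModel:
    def execute(self, requests):
        responses = []
        for request in requests:
            req_id = request.request_id()
            tokens = []
            for _ in range(MAX_STEPS):
                with CANCELLED_LOCK:
                    if req_id in CANCELLED:
                        break  # stop computing; return what we have so far
                tokens.append(self._next_token(tokens))
            out = pb_utils.Tensor("OUTPUT", np.array(tokens, dtype=np.int32))
            responses.append(pb_utils.InferenceResponse(output_tensors=[out]))
        return responses

    def _next_token(self, tokens):
        # placeholder for the real decoding step
        return 0

Clients would then pass a request ID (the request_id argument of async_stream_infer) and call a separate cancel path that adds that ID to CANCELLED.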


songkq commented Jun 1, 2023

@dyastremsky Thanks.

songkq closed this as completed Jun 1, 2023