I'm wondering whether a gRPC streaming request can be terminated immediately while tritonserver is running inference with a FasterTransformer backend. For example, by calling client.stop_stream after an asynchronous request has been sent.
If so, will the tritonserver stop inference immediately after calling the stop_stream?
Could you please give some advice?
with grpcclient.InferenceServerClient(self.model_url) as client:
    client.start_stream(callback=partial(stream_callback, result_queue))
    client.async_stream_infer(self.model_name, request_data)
I don't believe so. Once a request is enqueued, there should not be a way to stop it from being executed aside from a timeout. stop_stream should stop the client's stream, but that won't affect the requests that have already been sent.
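Since stop_stream only affects the client side, one thing the client can still do is ignore responses that arrive after the user cancels. Here is a minimal sketch of that idea using only the standard library. The `cancelled` event and the extra callback argument are assumptions added for illustration, not part of the original snippet, and this does not stop the server from computing the response.

```python
import queue
import threading
from functools import partial


def stream_callback(result_queue, cancelled, result, error):
    """Forward a streamed response unless the user has already cancelled."""
    if cancelled.is_set():
        # Late responses are dropped; the server has still computed them.
        return
    result_queue.put(error if error is not None else result)


result_queue = queue.Queue()
cancelled = threading.Event()  # hypothetical flag set by the UI on "stop"

# Same partial() pattern as in the question, with the flag bound as well.
callback = partial(stream_callback, result_queue, cancelled)

# Simulate two streamed responses with a cancel in between.
callback("token-1", None)
cancelled.set()
callback("token-2", None)
```

In a real client this `callback` would be what gets passed to `client.start_stream(callback=...)`; only the first response ends up in `result_queue`.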
@rmccorm4 @dyastremsky Thanks.
In a production environment like ChatGPT, early termination of a conversation based on user-client commands can be a major requirement. Is there another way to immediately terminate TritonServer inference? Could you give some advice?
"Currently, there is no way to cancel a request once it has been scheduled/placed in the queue on the server side. Once the request is scheduled, it's up to the backend to do the right thing.
You may be able to use some custom logic to determine if that request should just return an empty response / not waste more time computing the response. But this would require some custom backend logic and metadata to do so on your end."
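To make the "custom backend logic and metadata" suggestion a bit more concrete, here is a rough sketch of cooperative cancellation in plain Python. Everything in it is hypothetical (`CancellationRegistry`, `generate`, `step_fn`, the request ID string); it is not Triton or FasterTransformer API, just the general pattern of checking a shared flag between decoding steps and returning early.

```python
import threading


class CancellationRegistry:
    """Thread-safe set of request IDs that the client asked to cancel."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancelled = set()

    def cancel(self, request_id):
        with self._lock:
            self._cancelled.add(request_id)

    def is_cancelled(self, request_id):
        with self._lock:
            return request_id in self._cancelled

    def discard(self, request_id):
        with self._lock:
            self._cancelled.discard(request_id)


def generate(request_id, max_tokens, registry, step_fn):
    """Run up to max_tokens steps, stopping early if the request is cancelled."""
    tokens = []
    for step in range(max_tokens):
        if registry.is_cancelled(request_id):
            break  # stop spending compute on a response nobody wants
        tokens.append(step_fn(step))
    registry.discard(request_id)  # clean up so the set does not grow forever
    return tokens


registry = CancellationRegistry()


def fake_step(step):
    # Stand-in for one decoding step; simulates a cancel arriving at step 2.
    if step == 2:
        registry.cancel("req-1")
    return f"tok{step}"


out = generate("req-1", 10, registry, fake_step)
print(out)
```

How the cancel signal reaches the backend (a second request carrying the ID, a side channel, etc.) is left open here, and the check only takes effect at step boundaries, so an in-flight step still runs to completion.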